Re: [casper] Compiler merging SRLs -- Timing performance

2014-12-05 Thread Jack Hickish

Hi Everyone,

Thanks all for the advice. Based on a few experiments so far (skip to
4 for what I think is a disappointingly simple solution, which I was too
stupid to see in the XST manual):

1. My fabric utilisation isn't that high, although digging around
planAhead there are some areas with high routing congestion. I wonder
how much this throws off the compiler.

2. Making the shift registers that are causing problems implement as
cores, rather than behavioural HDL, doesn't seem to solve the problem:
the tools will quite happily combine two such cores into one LUT.

3. Explicitly disabling SRLs (by putting lots of single delays as
cores, adding resets, or (*shreg_extract = "NO"*)-ing HDL code) makes
the problem go away for the individual delay (since now there aren't
any LUTs to combine). But mostly the symptom will just appear
somewhere else. (I haven't tried the nuclear SRL global disable, but
I'd be amazed if that didn't just cause my design to explode.)

4. Resynthesizing the netlists with "-lc off" seems to have made all
the issues I was having disappear. At least in the timing report I've
read the headline spectacular fails have gone. Map reports that there
are still some SRLs using both O5 and O6 outputs, but I've got a bunch
of pcores, and I haven't resynth'd them all yet.
I'm a bit surprised that this option is needed in XST to avoid
combining luts that exist in different pcores, I would have thought
turning it off in map would be sufficient, but I guess I was wrong.
Maybe I would have figured this out sooner if I'd properly read an
up-to-date XST manual -- it appears that the default behaviour of lut
combining in XST has gone from 'off' in Virtex 5, to 'auto' in Virtex
6.

So bottom line, maybe -lc is an option worth playing with in future if
designs are failing timing with bizarre signal paths.
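
For anyone wanting to try the same thing, the option change in point 4 can be
sketched in Python as below. This is only a minimal sketch under assumptions:
it treats the XST script as plain text with one "-name value" option per line,
the helper name override_xst_options and the file name top.prj are made up for
illustration, and it is not part of resynth_netlist or the casper library.

```python
import re


def override_xst_options(script_text, overrides):
    """Return script_text with each option in `overrides` set to its new value.

    Assumption: the script holds one "-name value" pair per line.
    Options that are not already present are appended at the end.
    """
    lines = script_text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        match = re.match(r"^(\s*)-(\w+)\b", line)
        if match and match.group(2) in overrides:
            name = match.group(2)
            # Keep the original indentation, replace the value.
            lines[i] = f"{match.group(1)}-{name} {overrides[name]}"
            seen.add(name)
    # Append any requested option that the script did not already contain.
    lines.extend(f"-{n} {v}" for n, v in overrides.items() if n not in seen)
    return "\n".join(lines) + "\n"


# Hypothetical script fragment; a real one will have many more options.
example = "run\n-ifn top.prj\n-opt_mode Speed\n-lc auto\n"
patched = override_xst_options(example, {"lc": "off"})
print(patched)
```

The same helper could be handed several options at once (for example
{"lc": "off", "shreg_extract": "no"}), but whether a given combination helps
timing is something only a recompile will show.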

Thanks again for the help (and big shoutout for resynth_netlist, which
I certainly didn't realise was added by Dave 4 YEARS AGO!).

Jack

On 5 December 2014 at 07:01, Jason Manley  wrote:
> I often re-run XST with:
>
> register_balancing yes
> optimize_primitives yes
> read_cores yes
> shreg_extract no
>
> Setting shreg_extract to no prevents adjacent registers from being combined into SRL16s.
>
> Jason Manley
> CBF Manager
> SKA-SA
>
> Cell: +27 82 662 7726
> Work: +27 21 506 7300
>
> On 05 Dec 2014, at 6:27, Henno Kriel  wrote:
>
>> Hi Jack,
>>
>> In Simulink I have seen similar issues when trying to add more "register"
>> pipelining, to decrease routing delays and thus increase Fmax.
>> However, ISE just collapses all the pipelining into a single SRL, which
>> yields the frustrations you mentioned.
>> You can prevent this from happening by adding a synchronous reset (one of
>> the tick boxes on the delay block) to your pipelining registers.
>> You will have to connect up a reset signal from a register (but you don't 
>> actually need to use it),
>> to ensure that it does not get optimized away.
>> In my case this normally resolves the routing issue and achieves timing 
>> closure.
>>
>> Hope this helps.
>> HK
>>
>>
>>
>> On Thu, Dec 4, 2014 at 9:37 PM, Jack Hickish  wrote:
>> Hey Mark,
>>
>> Yeah, I guess I could manually force the locations of the two offending 
>> shift-regs to stop the combination, but the problem SRLs seem to be a fairly 
>> arbitrary selection of those in the design. I don't really want to have to 
>> start constraining at the LUT level if I can help it. But maybe I'll try and 
>> see if the problem goes away, or just emerges somewhere else.
>>
>> Hi Dave,
>>
>> I have been through all the planAhead options, as well as the 
>> fast_runtime.opt settings in the base package (I've been using both flows) 
>> and (tried to) set everything to optimize for speed. The -lt option to me 
>> seems like it should control the behaviour I'm seeing, but it doesn't seem 
>> to. I'm using pblocks, but have been almost exclusively constraining
>> only rams/dsps. As above, I'm about to try forcing the placements. I haven't 
>> run resynth netlist on my simulink design, but equivalent register removal 
>> is turned off in planAhead and some of the signals it appears to be 
>> LUT-combining belong to different pcores, so I thought that planahead 
>> settings should be enough. (obviously I could be wrong).
>> In any case, I didn't think this was an equivalent register removal problem. 
>> It's not like multiple copies of the same register are being merged at the 
>> expense of fanout, just a 2-clock data delay inside an X-engine might be 
>> merged with a 2-clock delay of some data signal in an FFT. But again, maybe 
>> I'm understanding the options wrong, so I'll try resynthing the netlist and 
>> see if that helps.
>>
>> Thanks for your help, both.
>>
>> Jack
>>
>>
>>
>> On Thu Dec 04 2014 at 19:18:35 David MacMahon  
>> wrote:
>> Hi, Jack,
>>
>> Are the tools optimizing for area instead of speed?  Are you using
>> Pblocks?
>>
>> I don't know if this is relevant to your situation, but I've run

Re: [casper] Compiler merging SRLs -- Timing performance

2014-12-04 Thread Jason Manley

I often re-run XST with:

register_balancing yes
optimize_primitives yes
read_cores yes
shreg_extract no

Setting shreg_extract to no prevents adjacent registers from being combined into SRL16s.

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 05 Dec 2014, at 6:27, Henno Kriel  wrote:

> Hi Jack,
> 
> In Simulink I have seen similar issues when trying to add more "register"
> pipelining, to decrease routing delays and thus increase Fmax.
> However, ISE just collapses all the pipelining into a single SRL, which
> yields the frustrations you mentioned.
> You can prevent this from happening by adding a synchronous reset (one of
> the tick boxes on the delay block) to your pipelining registers.
> You will have to connect up a reset signal from a register (but you don't 
> actually need to use it), 
> to ensure that it does not get optimized away. 
> In my case this normally resolves the routing issue and achieves timing 
> closure.
> 
> Hope this helps.
> HK
> 
> 
> 
> On Thu, Dec 4, 2014 at 9:37 PM, Jack Hickish  wrote:
> Hey Mark,
> 
> Yeah, I guess I could manually force the locations of the two offending 
> shift-regs to stop the combination, but the problem SRLs seem to be a fairly 
> arbitrary selection of those in the design. I don't really want to have to 
> start constraining at the LUT level if I can help it. But maybe I'll try and 
> see if the problem goes away, or just emerges somewhere else.
> 
> Hi Dave,
> 
> I have been through all the planAhead options, as well as the 
> fast_runtime.opt settings in the base package (I've been using both flows) 
> and (tried to) set everything to optimize for speed. The -lt option to me 
> seems like it should control the behaviour I'm seeing, but it doesn't seem 
> to. I'm using pblocks, but have been almost exclusively constraining
> only rams/dsps. As above, I'm about to try forcing the placements. I haven't 
> run resynth netlist on my simulink design, but equivalent register removal is 
> turned off in planAhead and some of the signals it appears to be 
> LUT-combining belong to different pcores, so I thought that planahead 
> settings should be enough. (obviously I could be wrong). 
> In any case, I didn't think this was an equivalent register removal problem. 
> It's not like multiple copies of the same register are being merged at the 
> expense of fanout, just a 2-clock data delay inside an X-engine might be 
> merged with a 2-clock delay of some data signal in an FFT. But again, maybe 
> I'm understanding the options wrong, so I'll try resynthing the netlist and 
> see if that helps.
> 
> Thanks for your help, both.
> 
> Jack
> 
> 
> 
> On Thu Dec 04 2014 at 19:18:35 David MacMahon  
> wrote:
> Hi, Jack,
> 
> Are the tools optimizing for area instead of speed?  Are you using
> Pblocks?
> 
> I don't know if this is relevant to your situation, but I've run into 
> annoyances when the tools use "equivalent register removal" to save a few 
> flip-flops but end up causing fan-out/routing issues.  That can be turned 
> off, but it's a synthesis option so if you want to apply it to a System 
> Generator netlist, you have to use the "resynth_netlist" Matlab function from 
> the casper library to re-synthesize the entire netlist.
> 
> Dave
> 
> On Dec 4, 2014, at 10:48 AM, Jack Hickish wrote:
> 
> > Hi all,
> >
> > This is something I've been fighting with for a while now, and I wonder if 
> > anyone on this maillist has any insight (because I'm pretty sure I may just 
> > be doing something wrong with the tools).
> >
> > The problem:
> > I'm playing with a ROACH2 design that (sometimes) compiles at 312 MHz. 
> > However, every now and then I'll make a small change to the design and the 
> > compile will fail timing catastrophically, with paths failing sometimes 
> > with -2 ns (or worse) slack.
> > When I look at the failing path(s), the delays are usually ~80% routing. 
> > I'll see a signal take a huge detour to use a shift register in some 
> > arbitrary location on the chip. Upon closer inspection of the relevant SRL, 
> > it appears that the LUT concerned is being used for two signal paths, one 
> > on the O5 output, one on the O6. The result seems to be that it is poorly 
> > placed for both its roles.
> >
> > I'm only using ~50% of the slices and about 30% of the registers / luts on 
> > the FPGA, and there are plenty of sensibly located SLICEMs the placer could 
> > use if it so desired. I've switched lut combining off (with the -lt flag)
> > in planahead, which doesn't seem to have made any difference.
> >
> > Can anyone offer me any words of advice / wisdom which might reduce my 
> > confusion at what's going on (or, even better, help me solve the problem)?
> >
> > Despairingly yours,
> > Jack
> >
> >
> 
> 
> 
> 
> -- 
> Kind regards,
> Henno Kriel
> 
> DBE: Hardware Manager
> 
> SKA South Africa
> Third Floor
> The Park
> Park Road (off Alexandra Road)
> Pinelands
> 7405
> Western Cape
> South Africa
> 
> Latitu

Re: [casper] Compiler merging SRLs -- Timing performance

2014-12-04 Thread Henno Kriel

Hi Jack,

In Simulink I have seen similar issues when trying to add more "register"
pipelining, to decrease routing delays and thus increase Fmax.
However, ISE just collapses all the pipelining into a single SRL, which
yields the frustrations you mentioned.
You can prevent this from happening by adding a synchronous reset (one of
the tick boxes on the delay block) to your pipelining registers.
You will have to connect up a reset signal from a register (but you don't
actually need to use it),
to ensure that it does not get optimized away.
In my case this normally resolves the routing issue and achieves timing
closure.

Hope this helps.
HK



On Thu, Dec 4, 2014 at 9:37 PM, Jack Hickish  wrote:

> Hey Mark,
>
> Yeah, I guess I could manually force the locations of the two offending
> shift-regs to stop the combination, but the problem SRLs seem to be a
> fairly arbitrary selection of those in the design. I don't really want to
> have to start constraining at the LUT level if I can help it. But maybe
> I'll try and see if the problem goes away, or just emerges somewhere else.
>
> Hi Dave,
>
> I have been through all the planAhead options, as well as the
> fast_runtime.opt settings in the base package (I've been using both flows)
> and (tried to) set everything to optimize for speed. The -lt option to me
> seems like it should control the behaviour I'm seeing, but it doesn't seem
> to. I'm using pblocks, but have been almost exclusively constraining
> only rams/dsps. As above, I'm about to try forcing the placements. I
> haven't run resynth netlist on my simulink design, but equivalent register
> removal is turned off in planAhead and some of the signals it appears to be
> LUT-combining belong to different pcores, so I thought that planahead
> settings should be enough. (obviously I could be wrong).
> In any case, I didn't think this was an equivalent register removal
> problem. It's not like multiple copies of the same register are being
> merged at the expense of fanout, just a 2-clock data delay inside an
> X-engine might be merged with a 2-clock delay of some data signal in an
> FFT. But again, maybe I'm understanding the options wrong, so I'll try
> resynthing the netlist and see if that helps.
>
> Thanks for your help, both.
>
> Jack
>
>
>
> On Thu Dec 04 2014 at 19:18:35 David MacMahon 
> wrote:
>
>> Hi, Jack,
>>
>> Are the tools optimizing for area instead of speed?  Are you using
>> Pblocks?
>>
>> I don't know if this is relevant to your situation, but I've run into
>> annoyances when the tools use "equivalent register removal" to save a few
>> flip-flops but end up causing fan-out/routing issues.  That can be turned
>> off, but it's a synthesis option so if you want to apply it to a System
>> Generator netlist, you have to use the "resynth_netlist" Matlab function
>> from the casper library to re-synthesize the entire netlist.
>>
>> Dave
>>
>> On Dec 4, 2014, at 10:48 AM, Jack Hickish wrote:
>>
>> > Hi all,
>> >
>> > This is something I've been fighting with for a while now, and I wonder
>> if anyone on this maillist has any insight (because I'm pretty sure I may
>> just be doing something wrong with the tools).
>> >
>> > The problem:
>> > I'm playing with a ROACH2 design that (sometimes) compiles at 312 MHz.
>> However, every now and then I'll make a small change to the design and the
>> compile will fail timing catastrophically, with paths failing sometimes
>> with -2 ns (or worse) slack.
>> > When I look at the failing path(s), the delays are usually ~80%
>> routing. I'll see a signal take a huge detour to use a shift register in
>> some arbitrary location on the chip. Upon closer inspection of the relevant
>> SRL, it appears that the LUT concerned is being used for two signal paths,
>> one on the O5 output, one on the O6. The result seems to be that it is
>> poorly placed for both its roles.
>> >
>> > I'm only using ~50% of the slices and about 30% of the registers / luts
>> on the FPGA, and there are plenty of sensibly located SLICEMs the placer
>> could use if it so desired. I've switched lut combining off (with the -lt
>> flag) in planahead, which doesn't seem to have made any difference.
>> >
>> > Can anyone offer me any words of advice / wisdom which might reduce my
>> confusion at what's going on (or, even better, help me solve the problem)?
>> >
>> > Despairingly yours,
>> > Jack
>> >
>> >
>>
>>


-- 
Kind regards,
Henno Kriel

DBE: Hardware Manager

SKA South Africa
Third Floor
The Park
Park Road (off Alexandra Road)
Pinelands
7405
Western Cape
South Africa

Latitude: -33.94329 (South); Longitude: 18.48945 (East).

(p) +27 (0)21 506 7300
(p) +27 (0)21 506 7374 (direct)
(f) +27 (0)21 506 7375
(m) +27 (0)84 504 5050

Re: [casper] Compiler merging SRLs -- Timing performance

2014-12-04 Thread Jack Hickish

Hey Mark,

Yeah, I guess I could manually force the locations of the two offending
shift-regs to stop the combination, but the problem SRLs seem to be a
fairly arbitrary selection of those in the design. I don't really want to
have to start constraining at the LUT level if I can help it. But maybe
I'll try and see if the problem goes away, or just emerges somewhere else.

Hi Dave,

I have been through all the planAhead options, as well as the
fast_runtime.opt settings in the base package (I've been using both flows)
and (tried to) set everything to optimize for speed. The -lt option to me
seems like it should control the behaviour I'm seeing, but it doesn't seem
to. I'm using pblocks, but have been almost exclusively constraining
only rams/dsps. As above, I'm about to try forcing the placements. I
haven't run resynth netlist on my simulink design, but equivalent register
removal is turned off in planAhead and some of the signals it appears to be
LUT-combining belong to different pcores, so I thought that planahead
settings should be enough. (obviously I could be wrong).
In any case, I didn't think this was an equivalent register removal
problem. It's not like multiple copies of the same register are being
merged at the expense of fanout, just a 2-clock data delay inside an
X-engine might be merged with a 2-clock delay of some data signal in an
FFT. But again, maybe I'm understanding the options wrong, so I'll try
resynthing the netlist and see if that helps.

Thanks for your help, both.

Jack

On Thu Dec 04 2014 at 19:18:35 David MacMahon 
wrote:

> Hi, Jack,
>
> Are the tools optimizing for area instead of speed?  Are you using
> Pblocks?
>
> I don't know if this is relevant to your situation, but I've run into
> annoyances when the tools use "equivalent register removal" to save a few
> flip-flops but end up causing fan-out/routing issues.  That can be turned
> off, but it's a synthesis option so if you want to apply it to a System
> Generator netlist, you have to use the "resynth_netlist" Matlab function
> from the casper library to re-synthesize the entire netlist.
>
> Dave
>
> On Dec 4, 2014, at 10:48 AM, Jack Hickish wrote:
>
> > Hi all,
> >
> > This is something I've been fighting with for a while now, and I wonder
> if anyone on this maillist has any insight (because I'm pretty sure I may
> just be doing something wrong with the tools).
> >
> > The problem:
> > I'm playing with a ROACH2 design that (sometimes) compiles at 312 MHz.
> However, every now and then I'll make a small change to the design and the
> compile will fail timing catastrophically, with paths failing sometimes
> with -2 ns (or worse) slack.
> > When I look at the failing path(s), the delays are usually ~80% routing.
> I'll see a signal take a huge detour to use a shift register in some
> arbitrary location on the chip. Upon closer inspection of the relevant SRL,
> it appears that the LUT concerned is being used for two signal paths, one
> on the O5 output, one on the O6. The result seems to be that it is poorly
> placed for both its roles.
> >
> > I'm only using ~50% of the slices and about 30% of the registers / luts
> on the FPGA, and there are plenty of sensibly located SLICEMs the placer
> could use if it so desired. I've switched lut combining off (with the -lt
> flag) in planahead, which doesn't seem to have made any difference.
> >
> > Can anyone offer me any words of advice / wisdom which might reduce my
> confusion at what's going on (or, even better, help me solve the problem)?
> >
> > Despairingly yours,
> > Jack
> >
> >
>
>

Re: [casper] Compiler merging SRLs -- Timing performance

2014-12-04 Thread David MacMahon

Hi, Jack,

Are the tools optimizing for area instead of speed?  Are you using Pblocks?

I don't know if this is relevant to your situation, but I've run into 
annoyances when the tools use "equivalent register removal" to save a few 
flip-flops but end up causing fan-out/routing issues.  That can be turned off, 
but it's a synthesis option so if you want to apply it to a System Generator 
netlist, you have to use the "resynth_netlist" Matlab function from the casper 
library to re-synthesize the entire netlist.

Dave

On Dec 4, 2014, at 10:48 AM, Jack Hickish wrote:

> Hi all,
> 
> This is something I've been fighting with for a while now, and I wonder if 
> anyone on this maillist has any insight (because I'm pretty sure I may just 
> be doing something wrong with the tools).
> 
> The problem:
> I'm playing with a ROACH2 design that (sometimes) compiles at 312 MHz. 
> However, every now and then I'll make a small change to the design and the 
> compile will fail timing catastrophically, with paths failing sometimes with 
> -2 ns (or worse) slack.
> When I look at the failing path(s), the delays are usually ~80% routing. I'll 
> see a signal take a huge detour to use a shift register in some arbitrary 
> location on the chip. Upon closer inspection of the relevant SRL, it appears 
> that the LUT concerned is being used for two signal paths, one on the O5 
> output, one on the O6. The result seems to be that it is poorly placed for 
> both its roles.
> 
> I'm only using ~50% of the slices and about 30% of the registers / luts on 
> the FPGA, and there are plenty of sensibly located SLICEMs the placer could 
> use if it so desired. I've switched lut combining off (with the -lt flag) in
> planahead, which doesn't seem to have made any difference.
> 
> Can anyone offer me any words of advice / wisdom which might reduce my 
> confusion at what's going on (or, even better, help me solve the problem)?
> 
> Despairingly yours,
> Jack
> 
>

Re: [casper] Compiler merging SRLs -- Timing performance

2014-12-04 Thread Mark Wagner

Hi Jack,

Not sure if this will help, but in Planahead I would try to click and drag
that LUT as close as possible to each of the outputs.  And if that doesn't
help or makes it worse, you could also try to duplicate the logic going to
each of those outputs, forcing separate LUTs to be used.

Cheers,
Mark


On Thu, Dec 4, 2014 at 10:48 AM, Jack Hickish  wrote:

> Hi all,
>
> This is something I've been fighting with for a while now, and I wonder if
> anyone on this maillist has any insight (because I'm pretty sure I may just
> be doing something wrong with the tools).
>
> The problem:
> I'm playing with a ROACH2 design that (sometimes) compiles at 312 MHz.
> However, every now and then I'll make a small change to the design and the
> compile will fail timing catastrophically, with paths failing sometimes
> with -2 ns (or worse) slack.
> When I look at the failing path(s), the delays are usually ~80% routing.
> I'll see a signal take a huge detour to use a shift register in some
> arbitrary location on the chip. Upon closer inspection of the relevant SRL,
> it appears that the LUT concerned is being used for two signal paths, one
> on the O5 output, one on the O6. The result seems to be that it is poorly
> placed for both its roles.
>
> I'm only using ~50% of the slices and about 30% of the registers / luts on
> the FPGA, and there are plenty of sensibly located SLICEMs the placer could
> use if it so desired. I've switched lut combining off (with the -lt flag)
> in planahead, which doesn't seem to have made any difference.
>
> Can anyone offer me any words of advice / wisdom which might reduce my
> confusion at what's going on (or, even better, help me solve the problem)?
>
> Despairingly yours,
> Jack
>
>
>

[casper] Compiler merging SRLs -- Timing performance

2014-12-04 Thread Jack Hickish

Hi all,

This is something I've been fighting with for a while now, and I wonder if
anyone on this maillist has any insight (because I'm pretty sure I may just
be doing something wrong with the tools).

The problem:
I'm playing with a ROACH2 design that (sometimes) compiles at 312 MHz.
However, every now and then I'll make a small change to the design and the
compile will fail timing catastrophically, with paths failing sometimes
with -2 ns (or worse) slack.
When I look at the failing path(s), the delays are usually ~80% routing.
I'll see a signal take a huge detour to use a shift register in some
arbitrary location on the chip. Upon closer inspection of the relevant SRL,
it appears that the LUT concerned is being used for two signal paths, one
on the O5 output, one on the O6. The result seems to be that it is poorly
placed for both its roles.
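
To put rough numbers on the figures above, here is a back-of-envelope sketch
in Python. It assumes slack is simply the clock period minus the path delay
(ignoring clock skew and uncertainty), and the function name timing_budget is
made up for illustration.

```python
def timing_budget(clock_mhz, slack_ns, routing_fraction):
    """Split a failing path's delay into routing and logic parts.

    Assumption: slack = clock period - path delay (no skew or uncertainty).
    """
    period_ns = 1000.0 / clock_mhz           # clock period in ns
    path_delay_ns = period_ns - slack_ns     # negative slack lengthens the path
    routing_ns = routing_fraction * path_delay_ns
    logic_ns = path_delay_ns - routing_ns
    return period_ns, path_delay_ns, routing_ns, logic_ns


period, delay, routing, logic = timing_budget(312.0, -2.0, 0.8)
print(f"period {period:.2f} ns, path delay {delay:.2f} ns, "
      f"routing {routing:.2f} ns, logic {logic:.2f} ns")
# prints: period 3.21 ns, path delay 5.21 ns, routing 4.16 ns, logic 1.04 ns
```

Under those assumptions the routing delay alone (about 4.16 ns) is already
longer than the whole 3.21 ns clock period, which is consistent with the
failures being a placement problem rather than a logic-depth problem.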

I'm only using ~50% of the slices and about 30% of the registers / luts on
the FPGA, and there are plenty of sensibly located SLICEMs the placer could
use if it so desired. I've switched lut combining off (with the -lt flag)
in planahead, which doesn't seem to have made any difference.

Can anyone offer me any words of advice / wisdom which might reduce my
confusion at what's going on (or, even better, help me solve the problem)?

Despairingly yours,
Jack
