Re: [casper] Compiler merging SRLs -- Timing performance
Hi Everyone,

Thanks all for the advice. Based on a few experiments so far (skip to 4 for what I think is a disappointingly simple solution, which I was too stupid to see in the XST manual):

1. My fabric utilisation isn't that high, although digging around PlanAhead there are some areas with high routing congestion. I wonder how much this throws off the compiler.

2. Implementing the problem shift registers as cores, rather than as behavioural HDL, doesn't seem to solve the problem; the tools will quite happily combine two such cores into one LUT.

3. Explicitly disabling SRLs (by putting in lots of single delays as cores, adding resets, or marking the HDL code with (* shreg_extract = "NO" *)) makes the problem go away for the individual delay (since now there aren't any LUTs to combine). But mostly the symptom will just appear somewhere else. (I haven't tried the nuclear option of disabling SRLs globally, but I'd be amazed if that didn't just cause my design to explode.)

4. Resynthesizing the netlists with "-lc off" seems to have made all the issues I was having disappear. At least in the timing report I've read, the headline spectacular fails have gone. Map reports that there are still some SRLs using both O5 and O6 outputs, but I've got a bunch of pcores, and I haven't resynth'd them all yet.

I'm a bit surprised that this option is needed in XST to avoid combining LUTs that exist in different pcores. I would have thought turning it off in map would be sufficient, but I guess I was wrong. Maybe I would have figured this out sooner if I'd properly read an up-to-date XST manual: it appears that the default behaviour of LUT combining in XST has gone from 'off' in Virtex 5 to 'auto' in Virtex 6.

So, bottom line: maybe -lc is an option worth playing with in future if designs are failing timing with bizarre signal paths.

Thanks again for the help (and a big shoutout for resynth_netlist, which I certainly didn't realise was added by Dave 4 YEARS AGO!).
Jack

On 5 December 2014 at 07:01, Jason Manley wrote:
> I often re-run XST with:
>
> register_balancing yes
> optimize_primitives yes
> read_cores yes
> shreg_extract no
>
> shreg_extract prevents adjacent registers from being combined into SRL16s.
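
For readers unfamiliar with the per-signal attribute mentioned in point 3, here is a minimal sketch of how it might be applied to a short delay line in Verilog. The module name, port names, and width are hypothetical and not taken from Jack's design; the only thing drawn from the thread is the `shreg_extract` attribute itself. As Jack notes, this only stops SRL inference for the marked registers, so the LUT-combining symptom may simply move elsewhere.

```verilog
// Hypothetical 2-clock delay line; names and width are illustrative only.
module delay2_no_srl #(
    parameter WIDTH = 8
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] din,
    output wire [WIDTH-1:0] dout
);
    // XST attribute: ask synthesis to keep these registers as flip-flops
    // rather than inferring a shift-register LUT (SRL) for the chain.
    (* shreg_extract = "no" *) reg [WIDTH-1:0] d1;
    (* shreg_extract = "no" *) reg [WIDTH-1:0] d2;

    always @(posedge clk) begin
        d1 <= din;  // first clock of delay
        d2 <= d1;   // second clock of delay
    end

    assign dout = d2;
endmodule
```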
Re: [casper] Compiler merging SRLs -- Timing performance
I often re-run XST with:

register_balancing yes
optimize_primitives yes
read_cores yes
shreg_extract no

shreg_extract prevents adjacent registers from being combined into SRL16s.

Jason Manley
CBF Manager
SKA-SA

On 05 Dec 2014, at 6:27, Henno Kriel wrote:
> [...]
> However, ISE just collapses all the pipelining into a single SRL, which
> yields the frustrations you mentioned.
> You can prevent this from happening by adding a synchronous reset (one of
> the tick boxes on the delay block) to your pipelining registers.
> [...]
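
As a rough illustration, the options Jason lists might look like this when written in the dash-prefixed form XST uses for its run options. This is a sketch of the option lines only: how they are handed to XST (for example through resynth_netlist) is not shown in this thread, so that part is left out rather than guessed. The final `-lc off` line is not from Jason's message; it is the setting Jack reports in his follow-up.

```
-register_balancing yes
-optimize_primitives yes
-read_cores yes
-shreg_extract no
-lc off
```

Note that `-shreg_extract no` here is the global SRL disable, which Jack says he has not tried; `-lc off` only turns off LUT combining and leaves SRL inference alone.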
Re: [casper] Compiler merging SRLs -- Timing performance
Hi Jack,

In Simulink I have seen similar issues when trying to add more "register" pipelining to decrease routing delays and thus increase Fmax. However, ISE just collapses all the pipelining into a single SRL, which yields the frustrations you mentioned.

You can prevent this from happening by adding a synchronous reset (one of the tick boxes on the delay block) to your pipelining registers. You will have to connect up a reset signal from a register (but you don't actually need to use it) to ensure that it does not get optimized away. In my case this normally resolves the routing issue and achieves timing closure.

Hope this helps.
HK

On Thu, Dec 4, 2014 at 9:37 PM, Jack Hickish wrote:
> [...]
> It's not like multiple copies of the same register are being merged at the
> expense of fanout, just a 2-clock data delay inside an X-engine might be
> merged with a 2-clock delay of some data signal in an FFT.
> [...]

--
Kind regards,
Henno Kriel
DBE: Hardware Manager
SKA South Africa
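
The synchronous-reset trick works because the SRL primitives have no reset input, so registers that carry a reset cannot be packed into one. Henno describes doing this with the tick box on the Simulink delay block; the sketch below shows roughly the equivalent in hand-written Verilog. The module and signal names are hypothetical, and `rst` is assumed to be driven from a real register elsewhere so that synthesis does not trim it as a constant.

```verilog
// Hypothetical 2-stage pipeline with a synchronous reset; names are
// illustrative only. Because every stage has a reset, the stages stay
// as individual flip-flops instead of collapsing into an SRL.
module pipe2_sync_rst #(
    parameter WIDTH = 8
) (
    input  wire             clk,
    input  wire             rst,   // synchronous; drive from a register
    input  wire [WIDTH-1:0] din,
    output reg  [WIDTH-1:0] dout
);
    reg [WIDTH-1:0] stage1;

    always @(posedge clk) begin
        if (rst) begin
            stage1 <= {WIDTH{1'b0}};
            dout   <= {WIDTH{1'b0}};
        end else begin
            stage1 <= din;     // first pipeline stage
            dout   <= stage1;  // second pipeline stage
        end
    end
endmodule
```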
Re: [casper] Compiler merging SRLs -- Timing performance
Hey Mark,

Yeah, I guess I could manually force the locations of the two offending shift-regs to stop the combination, but the problem SRLs seem to be a fairly arbitrary selection of those in the design. I don't really want to have to start constraining at the LUT level if I can help it. But maybe I'll try and see if the problem goes away, or just emerges somewhere else.

Hi Dave,

I have been through all the PlanAhead options, as well as the fast_runtime.opt settings in the base package (I've been using both flows), and (tried to) set everything to optimize for speed. The -lt option seems to me like it should control the behaviour I'm seeing, but it doesn't seem to. I'm using pblocks, but have been almost exclusively constraining only RAMs/DSPs. As above, I'm about to try forcing the placements. I haven't run resynth_netlist on my Simulink design, but equivalent register removal is turned off in PlanAhead, and some of the signals it appears to be LUT-combining belong to different pcores, so I thought that the PlanAhead settings should be enough (obviously I could be wrong).

In any case, I didn't think this was an equivalent register removal problem. It's not like multiple copies of the same register are being merged at the expense of fanout; rather, a 2-clock data delay inside an X-engine might be merged with a 2-clock delay of some data signal in an FFT. But again, maybe I'm misunderstanding the options, so I'll try resynthing the netlist and see if that helps.

Thanks for your help, both.

Jack

On Thu Dec 04 2014 at 19:18:35 David MacMahon wrote:
> Are the tools optimizing for area instead of speed? Are you using
> Pblocks?
>
> I don't know if this is relevant to your situation, but I've run into
> annoyances when the tools use "equivalent register removal" to save a few
> flip-flops but end up causing fan-out/routing issues.
> [...]
Re: [casper] Compiler merging SRLs -- Timing performance
Hi, Jack,

Are the tools optimizing for area instead of speed? Are you using Pblocks?

I don't know if this is relevant to your situation, but I've run into annoyances when the tools use "equivalent register removal" to save a few flip-flops but end up causing fan-out/routing issues. That can be turned off, but it's a synthesis option, so if you want to apply it to a System Generator netlist, you have to use the "resynth_netlist" Matlab function from the casper library to re-synthesize the entire netlist.

Dave

On Dec 4, 2014, at 10:48 AM, Jack Hickish wrote:
> [...]
> I'm only using ~50% of the slices and about 30% of the registers / luts on
> the FPGA, and there are plenty of sensibly located SLICEMs the placer could
> use if it so desired. I've switched lut combining off (with the -lt flag) in
> planahead, which doesn't seem to have made any difference.
> [...]
Re: [casper] Compiler merging SRLs -- Timing performance
Hi Jack,

Not sure if this will help, but in PlanAhead I would try to click and drag that LUT as close as possible to each of the outputs. And if that doesn't help or makes it worse, you could also try to duplicate the logic going to each of those outputs, forcing separate LUTs to be used.

Cheers,
Mark

On Thu, Dec 4, 2014 at 10:48 AM, Jack Hickish wrote:
> [...]
> Upon closer inspection of the relevant SRL, it appears that the LUT
> concerned is being used for two signal paths, one on the O5 output, one on
> the O6. The result seems to be that it is poorly placed for both its roles.
> [...]
[casper] Compiler merging SRLs -- Timing performance
Hi all,

This is something I've been fighting with for a while now, and I wonder if anyone on this mailing list has any insight (because I'm pretty sure I may just be doing something wrong with the tools).

The problem:
I'm playing with a ROACH2 design that (sometimes) compiles at 312 MHz. However, every now and then I'll make a small change to the design and the compile will fail timing catastrophically, with paths sometimes failing with -2 ns (or worse) slack.

When I look at the failing path(s), the delays are usually ~80% routing. I'll see a signal take a huge detour to use a shift register in some arbitrary location on the chip. Upon closer inspection of the relevant SRL, it appears that the LUT concerned is being used for two signal paths, one on the O5 output, one on the O6. The result seems to be that it is poorly placed for both its roles.

I'm only using ~50% of the slices and about 30% of the registers / LUTs on the FPGA, and there are plenty of sensibly located SLICEMs the placer could use if it so desired. I've switched LUT combining off (with the -lt flag) in PlanAhead, which doesn't seem to have made any difference.

Can anyone offer me any words of advice / wisdom which might reduce my confusion at what's going on (or, even better, help me solve the problem)?

Despairingly yours,
Jack
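
To put the numbers above in perspective, here is a back-of-envelope estimate. It assumes the slack is measured against a single 312 MHz clock period, ignores clock skew and uncertainty, and applies the quoted ~80% routing share to the whole path delay; none of these figures come from an actual timing report.

```latex
\begin{align*}
T_{\mathrm{clk}} &= \frac{1}{312\ \mathrm{MHz}} \approx 3.205\ \mathrm{ns} \\
t_{\mathrm{path}} &\approx T_{\mathrm{clk}} - \mathrm{slack}
  = 3.205\ \mathrm{ns} - (-2\ \mathrm{ns}) \approx 5.2\ \mathrm{ns} \\
f_{\max} &\approx \frac{1}{5.2\ \mathrm{ns}} \approx 192\ \mathrm{MHz} \\
t_{\mathrm{route}} &\approx 0.8 \times 5.2\ \mathrm{ns} \approx 4.2\ \mathrm{ns}
\end{align*}
```

Under those assumptions, a path with -2 ns slack would only close at about 192 MHz, and its routing delay alone (about 4.2 ns) already exceeds the whole 3.2 ns clock period.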