Re: [casper] Problem setting parameters in fft blocks using mlib_devel
Hi Kenneth, I have managed to replicate your experience with the fft, and will look into it a bit later today. Regards Andrew On Fri, 2013-01-18 at 16:20 +, Kenneth R. Treptow wrote: > Restoring the links did not fix the problem. > > I have found that the fft_wideband_real is also not working although I > can set its parameters. > > I can compile tutorial 3 which contains the fft_wideband_real block. > > I had to replace this block with the new one from the library to get > it to compile. > > My guess is that there is something cached in this design that makes it > work. > > > > Thanks Ken > > > > Afternoon Ken > > > > > I've seen similar behaviour when the block I'm using is cached by > Matlab oddly. Or that's how I'd describe what happens, anyway. Two > quick things to try: > > > > > > 1. Browse to the CASPER libraries in the library browser. Do they > refresh when you open them? > > > 2. Right click on the offending FFT in the model, click Link Options > -> Restore Link, and then click Use Library Block. I recall on > occasion having to actually delete the block from the model and add it > again from the library. > > > > > > It could also be a problem with the init scripts, though, so post back > if you're still having trouble. > > > > > > Regards > > > Paul > > > > > > > > On 18 January 2013 17:02, Andrew Martens wrote: > > Hey Ken > > The problem is a bit hard to get my head around; I don't seem to be > getting > the same results. It may be a version problem, I still am (rather > ashamedly) using a very old Matlab version. It would be cool if > someone > with similar versions to you could try to replicate your results... > > I will prod the problem again on Monday if no-one has managed to get > further, the init scripts can be made to generate logs as they run > that > may be helpful. > > Regards > Andrew > > > > > > >
Re: [casper] number of coefficients needed in PFB and FFT
PS3. You could also have done the 2^16 FFT's coefficients as a narrow cmult... you'd need to use a BRAM to store the 2^13 "reset points", but it would still represent a reduction in memory use by a factor of 4 -- not trivial by any means. On Mon, Jan 21, 2013 at 9:39 PM, Ryan Monroe wrote: > It would work well for the PFB, but what we *really* need is a solid > "Direct Digital Synth (DDS) coefficient generator". FFT coefficients are > really just sampled points around the unit circle, so you could, in > principle, use a recursive complex multiplier to generate the coefficients > on the fly. You'll lose log2(sqrt(K)) bits for a recursion count of K, but > that's probably OK most of the time. Say you're doing a 2^14 point FFT, > you need 2^13 coeffs. You start with 18 bits of resolution and can do 1024 > iterations before you degrade down to the est. 2^13 resolution. So you'll > only need to store 8 "reset points". Four of those will be 1, j, -1 and -j > in this case. You could thus replace 8 BRAM36'es with three DSPs. > > If you had a much larger FFT, say 2^16... you would have to use a wider > recursive multiplier. You can achieve a wide cmult in no more than 10 > DSPs...I think. In that case, you would start with 25 bits and be able to > droop to 16 bits -- so up to 2^(2*9) = 2^18 iterations of recursion. You would only > need to have one "reset point" and your noise performance would be more > than sufficient. 1, j, -1 and -j are easy to store though, so I would > probably go with that > > In addition, for the FFT direct, the first stage has only one shared > coefficient pattern, second stage has 2, third 4, etc. You can, of course, > share coefficients amongst a stage where possible. The real winnings occur > when you realize that the other coefficient banks within later stages are > actually the same coeffs as the first stage, with a constant phase rotation > (again, I'm 90% sure but I'll check tomorrow morning). 
So, you could > generate your coefficients once, and then use a couple of complex > multipliers to make the coeffs for the other stages. BAM! FFT Direct's > coefficient memory utilization is *gone* > > You could also do this for the FFT Biplex, but it would be a bit more > complicated. Whoever designed the biplex FFT used in-order inputs. This > is OK, but it means that the coefficients are in bit-reverse order. So, > you would have to move the biplex unscrambler to the beginning, change the > mux logic, and replace the delay elements in the delay-commutator with some > flavor of "delay, bit-reversed". I don't know how that would look quite > yet. If you did that, your coefficients would become in-order, and you > could achieve the same savings I described with the FFT-Direct. Also, I > implement coefficient and control logic sharing in my biplex and direct FFT > and it works *really well* at managing the fabric and memory utilization. > Worth a shot. > > :-) > > --Ryan Monroe > > PS, Sorry, I'm a bit busy right now so I can't implement a coefficient > interpolator for you guys right now. I'll write back when I'm more free > > PS2. I'm a bit anal about noise performance so I usually use a couple > more bits then Dan prescribes, but as he demonstrated in the asic talks, > his comments about bit widths are 100% correct. I would recommend them as > a general design practice as well. > > > > > On Mon, Jan 21, 2013 at 3:48 PM, Dan Werthimer wrote: > >> >> agreed. anybody already have, or want to develop, a coefficient >> interpolator? >> >> dan >> >> On Mon, Jan 21, 2013 at 3:44 PM, Aaron Parsons < >> apars...@astron.berkeley.edu> wrote: >> >>> Agreed. >>> >>> The coefficient interpolator, however, could get substantial savings >>> beyond that, even, and could be applicable to many things besides PFBs. 
>>> >>> On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote: >>> hi aaron, if you use xilinx brams for coefficients, they can be configured as dual port memories, so you can get the PFB reverse and forward coefficients both at the same time, from the same memory, almost for free, without any memory size penalty over single port, dan On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons < apars...@astron.berkeley.edu> wrote: > You guys probably appreciate this already, but although the > coefficients in the PFB FIR are generally symmetric around the center tap, > the upper and lower taps use these coefficients in reverse order from one > another. In order to take advantage of the symmetry, you'll have to use > dual-port ROMs that support two different addresses (one counting up and > one counting down). In the original core I wrote, I instead just shared > coefficients between the real and imaginary components. This was an easy > factor of 2 savings. After that first factor of two, we found it was kind > of diminishing returns... > > Another thought coul
Re: [casper] number of coefficients needed in PFB and FFT
It would work well for the PFB, but what we *really* need is a solid "Direct Digital Synth (DDS) coefficient generator". FFT coefficients are really just sampled points around the unit circle, so you could, in principle, use a recursive complex multiplier to generate the coefficients on the fly. You'll lose log2(sqrt(K)) bits for a recursion count of K, but that's probably OK most of the time. Say you're doing a 2^14 point FFT, you need 2^13 coeffs. You start with 18 bits of resolution and can do 1024 iterations before you degrade down to the est. 2^13 resolution. So you'll only need to store 8 "reset points". Four of those will be 1, j, -1 and -j in this case. You could thus replace 8 BRAM36'es with three DSPs. If you had a much larger FFT, say 2^16... you would have to use a wider recursive multiplier. You can achieve a wide cmult in no more than 10 DSPs...I think. In that case, you would start with 25 bits and be able to droop to 16 bits -- so up to 2^(2*9) = 2^18 iterations of recursion. You would only need to have one "reset point" and your noise performance would be more than sufficient. 1, j, -1 and -j are easy to store though, so I would probably go with that In addition, for the FFT direct, the first stage has only one shared coefficient pattern, second stage has 2, third 4, etc. You can, of course, share coefficients amongst a stage where possible. The real winnings occur when you realize that the other coefficient banks within later stages are actually the same coeffs as the first stage, with a constant phase rotation (again, I'm 90% sure but I'll check tomorrow morning). So, you could generate your coefficients once, and then use a couple of complex multipliers to make the coeffs for the other stages. BAM! FFT Direct's coefficient memory utilization is *gone* You could also do this for the FFT Biplex, but it would be a bit more complicated. Whoever designed the biplex FFT used in-order inputs. This is OK, but it means that the coefficients are in bit-reverse order. 
So, you would have to move the biplex unscrambler to the beginning, change the mux logic, and replace the delay elements in the delay-commutator with some flavor of "delay, bit-reversed". I don't know how that would look quite yet. If you did that, your coefficients would become in-order, and you could achieve the same savings I described with the FFT-Direct. Also, I implement coefficient and control logic sharing in my biplex and direct FFT and it works *really well* at managing the fabric and memory utilization. Worth a shot. :-) --Ryan Monroe PS, Sorry, I'm a bit busy right now so I can't implement a coefficient interpolator for you guys right now. I'll write back when I'm more free PS2. I'm a bit anal about noise performance so I usually use a couple more bits than Dan prescribes, but as he demonstrated in the ASIC talks, his comments about bit widths are 100% correct. I would recommend them as a general design practice as well. On Mon, Jan 21, 2013 at 3:48 PM, Dan Werthimer wrote: > > agreed. anybody already have, or want to develop, a coefficient > interpolator? > > dan > > On Mon, Jan 21, 2013 at 3:44 PM, Aaron Parsons < > apars...@astron.berkeley.edu> wrote: > >> Agreed. >> >> The coefficient interpolator, however, could get substantial savings >> beyond that, even, and could be applicable to many things besides PFBs. 
>> >> On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote: >> >>> >>> hi aaron, >>> >>> if you use xilinx brams for coefficients, they can be configured as dual >>> port memories, >>> so you can get the PFB reverse and forward coefficients both at the same >>> time, >>> from the same memory, almost for free, without any memory size penalty >>> over single port, >>> >>> dan >>> >>> >>> >>> >>> On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons < >>> apars...@astron.berkeley.edu> wrote: >>> You guys probably appreciate this already, but although the coefficients in the PFB FIR are generally symmetric around the center tap, the upper and lower taps use these coefficients in reverse order from one another. In order to take advantage of the symmetry, you'll have to use dual-port ROMs that support two different addresses (one counting up and one counting down). In the original core I wrote, I instead just shared coefficients between the real and imaginary components. This was an easy factor of 2 savings. After that first factor of two, we found it was kind of diminishing returns... Another thought could be a small BRAM with a linear interpolator between addresses. This would be a block with a wide range of uses, and could easily cut the size of the PFB coefficients by an order of magnitude. The (hamming/hanning) window and the sinc that the PFB uses for its coefficients are smooth functions, making all the fine subdivisions for N>32 samples rather unnecessary. On Mon, Jan 21, 2013 at 2:56 PM, Dan We
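Ryan's recursive-multiplier scheme is easy to model numerically. The sketch below is a floating-point Python illustration of the idea (not the fixed-point DSP gateware); N = 2^14 and the 1024-step recursion between reset points follow his example:

```python
import cmath

N = 2 ** 14      # FFT length, per Ryan's example
M = N // 2       # 2^13 twiddle factors needed
RESET = 1024     # recursion length before accumulated noise costs ~5 bits

# Store one exact "reset point" every RESET coefficients: 8 values here.
resets = [cmath.exp(-2j * cmath.pi * k / N) for k in range(0, M, RESET)]
step = cmath.exp(-2j * cmath.pi / N)  # one step around the unit circle

# Regenerate all M twiddles by recursive complex multiplication.
coeffs = []
for r in resets:
    w = r
    for _ in range(RESET):
        coeffs.append(w)
        w *= step

# Compare against directly computed twiddles.
max_err = max(abs(coeffs[k] - cmath.exp(-2j * cmath.pi * k / N))
              for k in range(M))
print(len(resets), max_err)
```

In fixed point the amplitude error after K recursion steps grows roughly like sqrt(K), i.e. you lose about log2(sqrt(K)) bits, which is where the 18-bit / 1024-iteration budget in the thread comes from.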
Re: [casper] ngc file generated by 14.3 is not recognized by planahead 14.3
Hi David: probably not. The inputs to PlanAhead are only the system.ngc and ucf files; there is no need of OPB cores or other pcores. The problem is that the filenames of the wrappers generated by 14.3 are not correct: they shouldn't have the prefix "system_". If you compare the filenames from 14.3 and older versions under /implementation/, you will find the difference. cheers homin On 01/22/2013 01:11 PM, David MacMahon wrote: Hi, Homin, Could this problem possibly be caused by planAhead not finding the necessary OPB cores? Dave On Jan 20, 2013, at 6:01 PM, homin wrote: Hello: I am trying to push the fabric clock faster and faster, so I am trying the newest version 14.3 and planahead. I got a problem while using planahead 14.3. If the system.ngc is compiled by the 14.3 toolflow (matlab 2012a, Xilinx 14.3), planahead can't parse the ngc file. The problem is that v14.3 puts the prefix "system_" in the wrapper files; the old versions didn't. I have tried system.ngc files built by 11.4, and planahead 14.3 can run them without problem. There should be somewhere the prefix "system_" can be removed, but I didn't have good luck. Has anyone met this problem? regards homin jiang - [NgdBuild 604] logical block 'epb_opb_bridge_inst' with type 'system_epb_opb_bridge_inst_wrapper' could not be resolved. A pin name misspelling can cause this, a missing edif or ngc file, case mismatch between the block name and the edif or ngc file name, or the misspelling of a type name. Symbol 'system_epb_opb_bridge_inst_wrapper' is not supported in target 'virtex6'.
Re: [casper] ngc file generated by 14.3 is not recognized by planahead 14.3
Hi, Homin, Could this problem possibly be caused by planAhead not finding the necessary OPB cores? Dave On Jan 20, 2013, at 6:01 PM, homin wrote: > Hello: > > I am trying to push the fabric clock faster and faster, so that i am trying > the newest version 14.3 and planahead. > I got a problem while using planahead 14.3. If the system.ngc compiled by > 14.3 toolflow(matlab 2012a, Xilinx 14.3), planahead can't parse the ngc file. > The problem is the V14.3 put the prefix "system_" in the wrapper files, the > old versions didn't. I have tried the system.ngc files by 11.4, the planahead > 14.3 can run it without problem. > > There should be somewhere the prefix "system_" can be removed, but i didn't > have good luck. > Anyone have met this problem ? > > regards > homin jiang > > - >>> [NgdBuild 604] logical block 'epb_opb_bridge_inst' with type >>> 'system_epb_opb_bridge_inst_wrapper' could not be resolved. A pin name >>> misspelling can cause this, a missing edif or ngc file, case mismatch >>> between the block name and the edif or ngc file name, or the misspelling of >>> a type name. Symbol 'system_epb_opb_bridge_inst_wrapper' is not supported >>> in target 'virtex6'. > >
Re: [casper] debugging communication with one_GbE from roach-2 's fpga
Hi The 1GbE core is configured to work in the same way as the 10GbE core with regards to tgtap. Regards Henno On Mon, Jan 21, 2013 at 3:52 PM, Marc Welz wrote: > Hello > > > A question to anyone in the know: is there a runtime way to configure the > > source IP/mac settings on roach 2 -- i.e. is tap_start implemented for > the > > 1GbE core? > > So if the 1GbE offers the same register layout as the 10GbE core then > this should work out of the box (just use a different name as tgtap > parameter)... but I think there might be some bus width differences > which might have to be hidden ? > > regards > > marc > > -- Henno Kriel DSP Engineer Digital Back End meerKAT SKA South Africa Third Floor The Park Park Road (off Alexandra Road) Pinelands 7405 Western Cape South Africa Latitude: -33.94329 (South); Longitude: 18.48945 (East). (p) +27 (0)21 506 7300 (p) +27 (0)21 506 7365 (direct) (f) +27 (0)21 506 7375 (m) +27 (0)84 504 5050
Re: [casper] number of coefficients needed in PFB and FFT
agreed. anybody already have, or want to develop, a coefficient interpolator? dan On Mon, Jan 21, 2013 at 3:44 PM, Aaron Parsons wrote: > Agreed. > > The coefficient interpolator, however, could get substantial savings > beyond that, even, and could be applicable to many things besides PFBs. > > On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote: > >> >> hi aaron, >> >> if you use xilinx brams for coefficients, they can be configured as dual >> port memories, >> so you can get the PFB reverse and forward coefficients both at the same >> time, >> from the same memory, almost for free, without any memory size penalty >> over single port, >> >> dan >> >> >> >> >> On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons < >> apars...@astron.berkeley.edu> wrote: >> >>> You guys probably appreciate this already, but although the coefficients >>> in the PFB FIR are generally symmetric around the center tap, the upper and >>> lower taps use these coefficients in reverse order from one another. In >>> order to take advantage of the symmetry, you'll have to use dual-port ROMs >>> that support two different addresses (one counting up and one counting >>> down). In the original core I wrote, I instead just shared coefficients >>> between the real and imaginary components. This was an easy factor of 2 >>> savings. After that first factor of two, we found it was kind of >>> diminishing returns... >>> >>> Another thought could be a small BRAM with a linear interpolator between >>> addresses. This would be a block with a wide range of uses, and could >>> easily cut the size of the PFB coefficients by an order of magnitude. The >>> (hamming/hanning) window and the sinc that the PFB uses for its >>> coefficients are smooth functions, making all the fine subdivisions for >>> N>32 samples rather unnecessary. 
>>> >>> On Mon, Jan 21, 2013 at 2:56 PM, Dan Werthimer wrote: >>> hi danny and ryan, i suspect if you are only doing small FFT's and PFB FIR's, 1K points or so, then BRAM isn't likely to be the limiting resource, so you might as well store all the coefficients with high precision. but for long transforms, perhaps >4K points or so, then BRAM's might be in short supply, and then one could consider storing fewer coefficients (and also taking advantage of sin/cos and mirror symmetries, which don't degrade SNR at all). for any length FFT or PFB/FIR, even millions of points, if you store 1K coefficients with at least at least 10 bit precision, then the SNR will only be degraded slightly. quantization error analysis is nicely written up in memo #1, at https://casper.berkeley.edu/wiki/Memos best wishes, dan On Mon, Jan 21, 2013 at 4:33 AM, Danny Price wrote: > Hey Jason, > > Rewinding the thread a bit: > > On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley wrote: > >> Andrew and I have also spoken about symmetrical co-efficients in the >> pfb_fir and I'd very much like to see this done. We recently added the >> option to share co-efficient generators across multiple inputs, which has >> helped a lot for designs with multiple ADCs. It seems to me that bigger >> designs are going to be BRAM limited (FFT BRAM requirements scale >> linearly), so we need to optimise cores to go light on this resource. >> > > Agreed that BRAM is in general more precious than compute. In addition > to using symmetrical coefficients, it might be worth looking at generating > coefficients. I did some tests this morning with a simple moving average > filter to turn 256 BRAM coefficients into 1024 (see attached model file), > and it looks pretty promising: errors are a max of about 2.5%. > > Coupling this with symmetric coefficients could cut coefficient > storage to 1/8th, at the cost of a few extra adders for the interpolation > filter. Thoughts? 
> > Cheers > Danny > >>> >>> >>> -- >>> Aaron Parsons >>> 510-306-4322 >>> Hearst Field Annex B54, UCB >>> >> >> > > > -- > Aaron Parsons > 510-306-4322 > Hearst Field Annex B54, UCB >
Re: [casper] number of coefficients needed in PFB and FFT
Agreed. The coefficient interpolator, however, could get substantial savings beyond that, even, and could be applicable to many things besides PFBs. On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote: > > hi aaron, > > if you use xilinx brams for coefficients, they can be configured as dual > port memories, > so you can get the PFB reverse and forward coefficients both at the same > time, > from the same memory, almost for free, without any memory size penalty > over single port, > > dan > > > > > On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons < > apars...@astron.berkeley.edu> wrote: > >> You guys probably appreciate this already, but although the coefficients >> in the PFB FIR are generally symmetric around the center tap, the upper and >> lower taps use these coefficients in reverse order from one another. In >> order to take advantage of the symmetry, you'll have to use dual-port ROMs >> that support two different addresses (one counting up and one counting >> down). In the original core I wrote, I instead just shared coefficients >> between the real and imaginary components. This was an easy factor of 2 >> savings. After that first factor of two, we found it was kind of >> diminishing returns... >> >> Another thought could be a small BRAM with a linear interpolator between >> addresses. This would be a block with a wide range of uses, and could >> easily cut the size of the PFB coefficients by an order of magnitude. The >> (hamming/hanning) window and the sinc that the PFB uses for its >> coefficients are smooth functions, making all the fine subdivisions for >> N>32 samples rather unnecessary. >> >> On Mon, Jan 21, 2013 at 2:56 PM, Dan Werthimer wrote: >> >>> >>> >>> hi danny and ryan, >>> >>> i suspect if you are only doing small FFT's and PFB FIR's, >>> 1K points or so, then BRAM isn't likely to be the limiting resource, >>> so you might as well store all the coefficients with high precision. 
>>> >>> but for long transforms, perhaps >4K points or so, >>> then BRAM's might be in short supply, and then one could >>> consider storing fewer coefficients (and also taking advantage >>> of sin/cos and mirror symmetries, which don't degrade SNR at all). >>> >>> for any length FFT or PFB/FIR, even millions of points, >>> if you store 1K coefficients with at least at least 10 bit precision, >>> then the SNR will only be degraded slightly. >>> quantization error analysis is nicely written up in memo #1, at >>> https://casper.berkeley.edu/wiki/Memos >>> >>> best wishes, >>> >>> dan >>> >>> >>> >>> On Mon, Jan 21, 2013 at 4:33 AM, Danny Price wrote: >>> Hey Jason, Rewinding the thread a bit: On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley wrote: > Andrew and I have also spoken about symmetrical co-efficients in the > pfb_fir and I'd very much like to see this done. We recently added the > option to share co-efficient generators across multiple inputs, which has > helped a lot for designs with multiple ADCs. It seems to me that bigger > designs are going to be BRAM limited (FFT BRAM requirements scale > linearly), so we need to optimise cores to go light on this resource. > Agreed that BRAM is in general more precious than compute. In addition to using symmetrical coefficients, it might be worth looking at generating coefficients. I did some tests this morning with a simple moving average filter to turn 256 BRAM coefficients into 1024 (see attached model file), and it looks pretty promising: errors are a max of about 2.5%. Coupling this with symmetric coefficients could cut coefficient storage to 1/8th, at the cost of a few extra adders for the interpolation filter. Thoughts? Cheers Danny >>> >>> >> >> >> -- >> Aaron Parsons >> 510-306-4322 >> Hearst Field Annex B54, UCB >> > > -- Aaron Parsons 510-306-4322 Hearst Field Annex B54, UCB
Re: [casper] number of coefficients needed in PFB and FFT
hi aaron, if you use xilinx brams for coefficients, they can be configured as dual port memories, so you can get the PFB reverse and forward coefficients both at the same time, from the same memory, almost for free, without any memory size penalty over single port, dan On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons wrote: > You guys probably appreciate this already, but although the coefficients > in the PFB FIR are generally symmetric around the center tap, the upper and > lower taps use these coefficients in reverse order from one another. In > order to take advantage of the symmetry, you'll have to use dual-port ROMs > that support two different addresses (one counting up and one counting > down). In the original core I wrote, I instead just shared coefficients > between the real and imaginary components. This was an easy factor of 2 > savings. After that first factor of two, we found it was kind of > diminishing returns... > > Another thought could be a small BRAM with a linear interpolator between > addresses. This would be a block with a wide range of uses, and could > easily cut the size of the PFB coefficients by an order of magnitude. The > (hamming/hanning) window and the sinc that the PFB uses for its > coefficients are smooth functions, making all the fine subdivisions for > N>32 samples rather unnecessary. > > On Mon, Jan 21, 2013 at 2:56 PM, Dan Werthimer wrote: > >> >> >> hi danny and ryan, >> >> i suspect if you are only doing small FFT's and PFB FIR's, >> 1K points or so, then BRAM isn't likely to be the limiting resource, >> so you might as well store all the coefficients with high precision. >> >> but for long transforms, perhaps >4K points or so, >> then BRAM's might be in short supply, and then one could >> consider storing fewer coefficients (and also taking advantage >> of sin/cos and mirror symmetries, which don't degrade SNR at all). 
>> >> for any length FFT or PFB/FIR, even millions of points, >> if you store 1K coefficients with at least at least 10 bit precision, >> then the SNR will only be degraded slightly. >> quantization error analysis is nicely written up in memo #1, at >> https://casper.berkeley.edu/wiki/Memos >> >> best wishes, >> >> dan >> >> >> >> On Mon, Jan 21, 2013 at 4:33 AM, Danny Price wrote: >> >>> Hey Jason, >>> >>> Rewinding the thread a bit: >>> >>> On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley wrote: >>> Andrew and I have also spoken about symmetrical co-efficients in the pfb_fir and I'd very much like to see this done. We recently added the option to share co-efficient generators across multiple inputs, which has helped a lot for designs with multiple ADCs. It seems to me that bigger designs are going to be BRAM limited (FFT BRAM requirements scale linearly), so we need to optimise cores to go light on this resource. >>> >>> Agreed that BRAM is in general more precious than compute. In addition >>> to using symmetrical coefficients, it might be worth looking at generating >>> coefficients. I did some tests this morning with a simple moving average >>> filter to turn 256 BRAM coefficients into 1024 (see attached model file), >>> and it looks pretty promising: errors are a max of about 2.5%. >>> >>> Coupling this with symmetric coefficients could cut coefficient storage >>> to 1/8th, at the cost of a few extra adders for the interpolation filter. >>> Thoughts? >>> >>> Cheers >>> Danny >>> >> >> > > > -- > Aaron Parsons > 510-306-4322 > Hearst Field Annex B54, UCB >
Re: [casper] number of coefficients needed in PFB and FFT
You guys probably appreciate this already, but although the coefficients in the PFB FIR are generally symmetric around the center tap, the upper and lower taps use these coefficients in reverse order from one another. In order to take advantage of the symmetry, you'll have to use dual-port ROMs that support two different addresses (one counting up and one counting down). In the original core I wrote, I instead just shared coefficients between the real and imaginary components. This was an easy factor of 2 savings. After that first factor of two, we found it was kind of diminishing returns... Another thought could be a small BRAM with a linear interpolator between addresses. This would be a block with a wide range of uses, and could easily cut the size of the PFB coefficients by an order of magnitude. The (hamming/hanning) window and the sinc that the PFB uses for its coefficients are smooth functions, making all the fine subdivisions for N>32 samples rather unnecessary. On Mon, Jan 21, 2013 at 2:56 PM, Dan Werthimer wrote: > > > hi danny and ryan, > > i suspect if you are only doing small FFT's and PFB FIR's, > 1K points or so, then BRAM isn't likely to be the limiting resource, > so you might as well store all the coefficients with high precision. > > but for long transforms, perhaps >4K points or so, > then BRAM's might be in short supply, and then one could > consider storing fewer coefficients (and also taking advantage > of sin/cos and mirror symmetries, which don't degrade SNR at all). > > for any length FFT or PFB/FIR, even millions of points, > if you store 1K coefficients with at least at least 10 bit precision, > then the SNR will only be degraded slightly. 
> quantization error analysis is nicely written up in memo #1, at > https://casper.berkeley.edu/wiki/Memos > > best wishes, > > dan > > > > On Mon, Jan 21, 2013 at 4:33 AM, Danny Price wrote: > >> Hey Jason, >> >> Rewinding the thread a bit: >> >> On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley wrote: >> >>> Andrew and I have also spoken about symmetrical co-efficients in the >>> pfb_fir and I'd very much like to see this done. We recently added the >>> option to share co-efficient generators across multiple inputs, which has >>> helped a lot for designs with multiple ADCs. It seems to me that bigger >>> designs are going to be BRAM limited (FFT BRAM requirements scale >>> linearly), so we need to optimise cores to go light on this resource. >>> >> >> Agreed that BRAM is in general more precious than compute. In addition to >> using symmetrical coefficients, it might be worth looking at generating >> coefficients. I did some tests this morning with a simple moving average >> filter to turn 256 BRAM coefficients into 1024 (see attached model file), >> and it looks pretty promising: errors are a max of about 2.5%. >> >> Coupling this with symmetric coefficients could cut coefficient storage >> to 1/8th, at the cost of a few extra adders for the interpolation filter. >> Thoughts? >> >> Cheers >> Danny >> > > -- Aaron Parsons 510-306-4322 Hearst Field Annex B54, UCB
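Aaron's dual-port addressing point can be demonstrated with a toy example: store only half of a symmetric PFB window in the "ROM" and read it through two address counters, one counting up and one counting down. The 4-tap, 8-channel sizes below are hypothetical, chosen only to keep the Python sketch small:

```python
import math

TAPS, CHANS = 4, 8           # hypothetical toy PFB
L = TAPS * CHANS             # total FIR length

# Hamming-windowed sinc, symmetric about the centre tap: h[n] == h[L-1-n].
def coeff(n):
    x = n - L / 2 + 0.5      # half-integer offset, so the sinc never hits 0/0
    return ((0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)))
            * math.sin(math.pi * x / CHANS) / (math.pi * x / CHANS))

h = [coeff(n) for n in range(L)]
rom = h[:L // 2]             # store only half the coefficients

# Dual-port read: port A counts up, port B counts down.
recovered = ([rom[a] for a in range(L // 2)] +
             [rom[L // 2 - 1 - a] for a in range(L // 2)])

max_err = max(abs(x - y) for x, y in zip(h, recovered))
```

This halving stacks with Aaron's original trick of sharing one coefficient set between the real and imaginary paths, and with Dan's point that a true dual-port BRAM provides the second read port essentially for free.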
[casper] number of coefficients needed in PFB and FFT
hi danny and ryan, i suspect if you are only doing small FFT's and PFB FIR's, 1K points or so, then BRAM isn't likely to be the limiting resource, so you might as well store all the coefficients with high precision. but for long transforms, perhaps >4K points or so, then BRAM's might be in short supply, and then one could consider storing fewer coefficients (and also taking advantage of sin/cos and mirror symmetries, which don't degrade SNR at all). for any length FFT or PFB/FIR, even millions of points, if you store 1K coefficients with at least 10 bit precision, then the SNR will only be degraded slightly. quantization error analysis is nicely written up in memo #1, at https://casper.berkeley.edu/wiki/Memos best wishes, dan On Mon, Jan 21, 2013 at 4:33 AM, Danny Price wrote: > Hey Jason, > > Rewinding the thread a bit: > > On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley wrote: > >> Andrew and I have also spoken about symmetrical co-efficients in the >> pfb_fir and I'd very much like to see this done. We recently added the >> option to share co-efficient generators across multiple inputs, which has >> helped a lot for designs with multiple ADCs. It seems to me that bigger >> designs are going to be BRAM limited (FFT BRAM requirements scale >> linearly), so we need to optimise cores to go light on this resource. >> > > Agreed that BRAM is in general more precious than compute. In addition to > using symmetrical coefficients, it might be worth looking at generating > coefficients. I did some tests this morning with a simple moving average > filter to turn 256 BRAM coefficients into 1024 (see attached model file), > and it looks pretty promising: errors are a max of about 2.5%. > > Coupling this with symmetric coefficients could cut coefficient storage to > 1/8th, at the cost of a few extra adders for the interpolation filter. > Thoughts? > > Cheers > Danny >
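The coefficient-interpolator idea Danny describes is easy to model. The Python sketch below uses plain linear interpolation between table entries (rather than his moving-average filter, so the error figure differs from his 2.5%) to expand a 256-entry window table by 4x:

```python
import math

COARSE, FACTOR = 256, 4
FINE = COARSE * FACTOR       # 1024 interpolated coefficients

# Coarse table of a Hanning window -- smooth, so interpolation works well.
coarse = [0.5 - 0.5 * math.cos(2 * math.pi * n / (COARSE - 1))
          for n in range(COARSE)]

def interp(k):
    """Linearly interpolate fine coefficient k from the coarse table."""
    pos = k * (COARSE - 1) / (FINE - 1)
    i = min(int(pos), COARSE - 2)
    frac = pos - i
    return coarse[i] * (1 - frac) + coarse[i + 1] * frac

exact = [0.5 - 0.5 * math.cos(2 * math.pi * k / (FINE - 1))
         for k in range(FINE)]
max_err = max(abs(interp(k) - exact[k]) for k in range(FINE))
print(max_err)
```

For a smooth window like this, one multiply-and-add per output keeps the interpolation error far below a percent; the same trick applies to the sinc term, and it composes with the symmetric-coefficient savings discussed above.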
Re: [casper] debugging communication with one_GbE from roach-2 's fpga
Hi Casper group,

We managed to bring the FPGA 1GbE on ROACH2 fully under control; great thanks to John, Marc, Danny, Jack, and Jason! Wireshark was extremely helpful, as it can see the packets as long as the receiver's IP is set correctly. For the python grabbing to work, it turned out that we simply had to configure the sender's IP to be on the same subnet as the receiver's IP (meaning the first three segments of the IP match).

Thanks again!

From: casper-boun...@lists.berkeley.edu [casper-boun...@lists.berkeley.edu] on behalf of Danny Price [danny.pr...@astro.ox.ac.uk]
Sent: Monday, January 21, 2013 3:39 AM
To: casper@lists.berkeley.edu list
Subject: Re: [casper] debugging communication with one_GbE from roach-2 's fpga

Hi Ioana, Jeff

In addition to John and Marc's suggestions, I'd recommend checking:

1) If you're using SELinux, disable it, as it causes all sorts of grief.
2) Check that the MTU on your ethernet port is set to be larger than your packet size.
3) Make sure that you don't have a firewall up on that port.
4) Check whether your Ethernet port goes down when the FPGA is reprogrammed (if you've got a switch in between, this shouldn't happen).

If you've already checked your socket with roach1 UDP code, then most of these points are moot. Hopefully one stands out though...

Cheers
Danny
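The subnet rule described above (first three octets matching, i.e. a /24 network) is quick to sanity-check from Python before reaching for Wireshark. A minimal sketch using the standard-library ipaddress module (the example addresses are made up):

```python
import ipaddress

def same_subnet(sender_ip, receiver_ip, prefix=24):
    """True if both addresses fall inside the same /prefix network.
    For the default /24 this is exactly the 'first three octets
    of the IP must match' rule that fixed the python grabbing."""
    net = ipaddress.ip_network(f"{receiver_ip}/{prefix}", strict=False)
    return ipaddress.ip_address(sender_ip) in net

# e.g. a ROACH2 10GbE/1GbE core at 10.0.0.20 sending to a PC at 10.0.0.1
ok = same_subnet("10.0.0.20", "10.0.0.1")       # same /24
bad = same_subnet("10.0.1.20", "10.0.0.1")      # third octet differs
```

If the check fails, the PC's kernel will try to route the reply via its gateway (or drop it), which matches the symptom of packets visible in Wireshark but never reaching the python socket.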
Re: [casper] debugging communication with one_GbE from roach-2 's fpga
We haven't actually used the 1GbE core at all at SKA-SA, but AFAIK it's the same, and tgtap should work out of the box. Henno is actually the authoritative source for all things 1GbE; he might be able to offer more tomorrow.

Jason

On 21 Jan 2013, at 15:52, Marc Welz wrote:
> Hello
>
>> A question to anyone in the know: is there a runtime way to configure the
>> source IP/mac settings on roach 2 -- i.e. is tap_start implemented for the
>> 1GbE core?
>
> So if the 1GbE offers the same register layout as the 10GbE core then
> this should work out of the box (just use a different name as the tgtap
> parameter)... but I think there might be some bus width differences
> which might have to be hidden?
>
> regards
> marc
Re: [casper] debugging communication with one_GbE from roach-2 's fpga
Hello

> A question to anyone in the know: is there a runtime way to configure the
> source IP/mac settings on roach 2 -- i.e. is tap_start implemented for the
> 1GbE core?

So if the 1GbE offers the same register layout as the 10GbE core then this should work out of the box (just use a different name as the tgtap parameter)... but I think there might be some bus width differences which might have to be hidden?

regards
marc
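If the register layout really is the same, the call would presumably mirror the usual 10GbE tgtap setup via the corr package; a hedged sketch, where the hostname and the core/block name ('one_gbe_core') are hypothetical and only the integer-conversion helpers below are actually exercised:

```python
import socket
import struct

def mac_to_int(mac_str):
    """'02:44:01:02:0c:01' -> integer form traditionally passed to tap_start()."""
    return int(mac_str.replace(":", ""), 16)

def ip_to_int(ip_str):
    """'10.0.0.20' -> big-endian integer form traditionally passed to tap_start()."""
    return struct.unpack(">I", socket.inet_aton(ip_str))[0]

# With those helpers, starting tgtap on the 1GbE core would (if Marc is
# right about the register layout) look just like the 10GbE case, only
# naming the 1GbE block. Commented out since it needs real hardware:
#
#   import corr
#   fpga = corr.katcp_wrapper.FpgaClient('roach2-hostname', 7147)
#   fpga.tap_start('tap0', 'one_gbe_core',
#                  mac_to_int('02:44:01:02:0c:01'),
#                  ip_to_int('10.0.0.20'), 8888)
```

The bus-width caveat Marc raises would live inside tgtap/katcp rather than in this script, so the Python side likely would not change even if hiding is needed.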
Re: [casper] debugging communication with one_GbE from roach-2 's fpga
Hi Ioana,

I'm attaching a model and python script that Guy Kenfack and I knocked together at the Green Bank workshop. We sent a counter and saw the data arrive at the right IP/port in wireshark.

A question to anyone in the know: is there a runtime way to configure the source IP/mac settings on roach 2 -- i.e. is tap_start implemented for the 1GbE core?

Cheers,
Jack

On 21 January 2013 08:07, Marc Welz wrote:
> Hello
>
>> However, we are stuck in debugging the system: we think we are sending
>> stuff properly, but we cannot read anything from the python socket
>> coming from our roach 2 fpga.
>
> Try running tcpdump on the receiving PC. If you are using tgtap logic,
> you should see occasional arp traffic from the roach to work out where
> to send its data.
>
> Some switches have UDP flood protection - if they see lots of UDP traffic,
> especially broadcast UDP, they throttle it down. If tgtap is running and I
> remember things correctly, data destined for machines which do not respond
> to arp traffic will be broadcast. Alternatively, if the destination MAC
> is all zeros, switches typically discard traffic instead of sending it on.
>
> I believe you are one of the first people to use the GbE port, so please
> let us know of your progress.
>
> regards
> marc
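For anyone wanting to reproduce the counter test without hardware, here is a hedged loopback sketch (not Jack's attached script, whose packet format I don't know): a stand-in "sender" plays the role of the FPGA, emitting UDP packets that each carry an 8-byte big-endian counter, and the receiving side checks that the values increment:

```python
import socket
import struct

def send_counter(dest, n):
    """Stand-in for the FPGA design: n UDP packets, each carrying an
    8-byte big-endian counter value (the format is an assumption)."""
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(n):
        tx.sendto(struct.pack(">Q", i), dest)
    tx.close()

# receiver: bind *before* sending so no packets are missed
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
rx.settimeout(2.0)
port = rx.getsockname()[1]

send_counter(("127.0.0.1", port), 10)

vals = []
try:
    while len(vals) < 10:
        data, addr = rx.recvfrom(9000)   # larger than any expected packet
        vals.append(struct.unpack(">Q", data)[0])
except socket.timeout:
    pass        # on real hardware, stopping here would point at the network
rx.close()
```

Against a real ROACH2 you would bind to the PC interface's IP and the port configured in the yellow block; checking that the received counter increments by exactly one is the quickest way to spot dropped or reordered packets.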
Re: [casper] debugging communication with one_GbE from roach-2 's fpga
Hi Ioana, Jeff

In addition to John and Marc's suggestions, I'd recommend checking:

1) If you're using SELinux, disable it, as it causes all sorts of grief.
2) Check that the MTU on your ethernet port is set to be larger than your packet size.
3) Make sure that you don't have a firewall up on that port.
4) Check whether your Ethernet port goes down when the FPGA is reprogrammed (if you've got a switch in between, this shouldn't happen).

If you've already checked your socket with roach1 UDP code, then most of these points are moot. Hopefully one stands out though...

Cheers
Danny
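Point 2 in the list above can be checked numerically: with the standard 20-byte IPv4 header and 8-byte UDP header, the payload must fit in MTU minus 28 bytes or it gets fragmented (or dropped, if the don't-fragment bit is set). A quick sketch:

```python
def fits_in_mtu(payload_len, mtu=1500):
    """True if a UDP payload of payload_len bytes fits in one frame on a
    link with the given MTU, assuming standard IPv4 (20 B) and UDP (8 B)
    headers and no IP options."""
    IP_HEADER, UDP_HEADER = 20, 8
    return payload_len <= mtu - IP_HEADER - UDP_HEADER

# a 4096-byte FPGA payload needs jumbo frames on the receiving NIC:
standard = fits_in_mtu(4096)         # default 1500-byte MTU
jumbo = fits_in_mtu(4096, mtu=9000)  # jumbo frames enabled
```

So on a default 1500-byte MTU anything over 1472 bytes of payload is a problem, which is why large FPGA packets often need `ifconfig eth0 mtu 9000` (or equivalent) on the receiving PC.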
Re: [casper] debugging communication with one_GbE from roach-2 's fpga
Hello

> However, we are stuck in debugging the system: we think we are sending
> stuff properly, but we cannot read anything from the python socket
> coming from our roach 2 fpga.

Try running tcpdump on the receiving PC. If you are using tgtap logic, you should see occasional arp traffic from the roach to work out where to send its data.

Some switches have UDP flood protection - if they see lots of UDP traffic, especially broadcast UDP, they throttle it down. If tgtap is running and I remember things correctly, data destined for machines which do not respond to arp traffic will be broadcast. Alternatively, if the destination MAC is all zeros, switches typically discard traffic instead of sending it on.

I believe you are one of the first people to use the GbE port, so please let us know of your progress.

regards
marc
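Marc's all-zeros-MAC point is easy to check once you have a capture: the destination MAC is simply the first six bytes of the raw Ethernet frame. A small sketch (the example frame bytes here are made up):

```python
def classify_dst_mac(frame):
    """Classify the destination MAC (bytes 0-5 of a raw Ethernet frame).
    All-zeros frames are typically discarded by switches; all-ones
    frames are broadcast and get flooded to every port."""
    dst = frame[:6]
    if dst == b"\x00" * 6:
        return "zero"
    if dst == b"\xff" * 6:
        return "broadcast"
    return "unicast " + ":".join(f"{b:02x}" for b in dst)

# made-up example: a frame sent before the core was given a destination MAC
# (6 B dst MAC + 6 B src MAC + 2 B EtherType 0x0800 for IPv4)
unconfigured = b"\x00" * 6 + b"\x02\x44\x01\x02\x0c\x01" + b"\x08\x00"
```

Feeding tcpdump/wireshark captures through a check like this quickly distinguishes the "switch is discarding zero-MAC frames" failure from the "traffic is being broadcast because ARP never resolved" one.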