Re: [casper] Problem setting parameters in fft blocks using mlib_devel

2013-01-21 Thread Andrew Martens
Hi Kenneth,

I have managed to replicate your experience with the fft, and will look
into it a bit later today. 

Regards
Andrew

On Fri, 2013-01-18 at 16:20 +, Kenneth R. Treptow wrote:
> Restoring the links did not fix the problem.
> 
> I have found that the fft_wideband_real is also not working although I
> can set its parameters.
> 
> I can compile tutorial 3 which contains the fft_wideband_real block.
> 
> I had to replace this block with the new one from the library to get
> it to compile.
> 
> My guess is that there is something cached in this design that makes it
> work.
> 
>  
> 
> Thanks Ken
> 
>  
> 
> Afternoon Ken
> 
>  
> 
> 
> I've seen similar behaviour when the block I'm using is cached by
> Matlab oddly. Or that's how I'd describe what happens, anyway. Two
> quick things to try:
> 
> 
>  
> 
> 
> 1. Browse to the CASPER libraries in the library browser. Do they
> refresh when you open them?
> 
> 
> 2. Right click on the offending FFT in the model, click Link Options
> -> Restore Link, and then click Use Library Block. I recall on
> occasion having to actually delete the block from the model and add it
> again from the library.
> 
> 
>  
> 
> 
> It could also be a problem with the init scripts, though, so post back
> if you're still having trouble.
> 
> 
>  
> 
> 
> Regards
> 
> 
> Paul
> 
> 
>  
> 
> 
>  
> 
> On 18 January 2013 17:02, Andrew Martens  wrote:
> 
> Hey Ken
> 
> The problem is a bit hard to get my head around; I don't seem to be
> getting the same results. It may be a version problem -- I am still
> (rather ashamedly) using a very old Matlab version. It would be cool if
> someone with versions similar to yours could try to replicate your
> results...
> 
> I will prod the problem again on Monday if no-one has managed to get
> further; the init scripts can be made to generate logs as they run,
> which may be helpful.
> 
> Regards
> Andrew
> 
> 
> 
> 
>  
> 
> 





Re: [casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Ryan Monroe
PS3.  You could also have done the 2^16 FFT's coefficients as a narrow
cmult... you'd need a BRAM to store the 2^13 "reset points", but it
would still represent a factor-of-4 reduction in memory use -- not
trivial by any means.


On Mon, Jan 21, 2013 at 9:39 PM, Ryan Monroe wrote:

> It would work well for the PFB, but what we *really* need is a solid
> "Direct Digital Synth (DDS) coefficient generator".  FFT coefficients are
> really just sampled points around the unit circle, so you could, in
> principle, use a recursive complex multiplier to generate the coefficients
> on the fly.  You'll lose log2(sqrt(K)) bits for a recursion count of K, but
> that's probably OK most of the time.  Say you're doing a 2^14 point FFT,
> you need 2^13 coeffs.  You start with 18 bits of resolution and can do 1024
> iterations before you degrade down to the est. 2^13 resolution.  So you'll
> only need to store 8 "reset points".  Four of those will be 1, j, -1 and -j
> in this case.  You could thus replace 8 BRAM36'es with three DSPs.
>
> If you had a much larger FFT, say 2^16... you would have to use a wider
> recursive multiplier.  You can achieve a wide cmult in no more than 10
> DSPs...I think.  In that case, you would start with 25 bits and be able to
> droop to 16 bits -- so up to 2^(2*9) = 262144 iterations of recursion.  You would only
> need to have one "reset point" and your noise performance would be more
> than sufficient.  1, j, -1 and -j are easy to store though, so I would
> probably go with that
>
> In addition, for the FFT direct, the first stage has only one shared
> coefficient pattern, second stage has 2, third 4, etc.  You can, of course,
> share coefficients amongst a stage where possible.  The real winnings occur
> when you realize that the other coefficient banks within later stages are
> actually the same coeffs as the first stage, with a constant phase rotation
> (again, I'm 90% sure but I'll check tomorrow morning).  So, you could
> generate your coefficients once, and then use a couple of complex
> multipliers to make the coeffs for the other stages.  BAM!  FFT Direct's
> coefficient memory utilization is *gone*
>
> You could also do this for the FFT Biplex, but it would be a bit more
> complicated.  Whoever designed the biplex FFT used in-order inputs.  This
> is OK, but it means that the coefficients are in bit-reverse order.  So,
> you would have to move the biplex unscrambler to the beginning, change the
> mux logic, and replace the delay elements in the delay-commutator with some
> flavor of "delay, bit-reversed".  I don't know how that would look quite
> yet.  If you did that, your coefficients would become in-order, and you
> could achieve the same savings I described with the FFT-Direct.  Also, I
> implement coefficient and control logic sharing in my biplex and direct FFT
> and it works *really well* at managing the fabric and memory utilization.
>  Worth a shot.
>
> :-)
>
> --Ryan Monroe
>
> PS, Sorry, I'm a bit busy right now so I can't implement a coefficient
> interpolator for you guys right now.  I'll write back when I'm more free
>
> PS2.  I'm a bit anal about noise performance so I usually use a couple
> more bits than Dan prescribes, but as he demonstrated in the ASIC talks,
> his comments about bit widths are 100% correct.   I would recommend them as
> a general design practice as well.
>
>
>
>
> On Mon, Jan 21, 2013 at 3:48 PM, Dan Werthimer wrote:
>
>>
>> agreed.   anybody already have, or want to develop, a coefficient
>> interpolator?
>>
>> dan
>>
>> On Mon, Jan 21, 2013 at 3:44 PM, Aaron Parsons <
>> apars...@astron.berkeley.edu> wrote:
>>
>>> Agreed.
>>>
>>> The coefficient interpolator, however, could get substantial savings
>>> beyond that, even, and could be applicable to many things besides PFBs.
>>>
>>> On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote:
>>>

 hi aaron,

 if you use xilinx brams for coefficients, they can be configured as
 dual port memories,
 so you can get the PFB reverse and forward coefficients both at the
 same time,
 from the same memory,  almost for free, without any memory size penalty
 over single port,

 dan




 On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons <
 apars...@astron.berkeley.edu> wrote:

> You guys probably appreciate this already, but although the
> coefficients in the PFB FIR are generally symmetric around the center tap,
> the upper and lower taps use these coefficients in reverse order from one
> another.  In order to take advantage of the symmetry, you'll have to use
> dual-port ROMs that support two different addresses (one counting up and
> one counting down).  In the original core I wrote, I instead just shared
> coefficients between the real and imaginary components.  This was an easy
> factor of 2 savings.  After that first factor of two, we found it was kind
> of diminishing returns...
>

Re: [casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Ryan Monroe
It would work well for the PFB, but what we *really* need is a solid
"Direct Digital Synth (DDS) coefficient generator".  FFT coefficients are
really just sampled points around the unit circle, so you could, in
principle, use a recursive complex multiplier to generate the coefficients
on the fly.  You'll lose log2(sqrt(K)) bits for a recursion count of K, but
that's probably OK most of the time.  Say you're doing a 2^14 point FFT,
you need 2^13 coeffs.  You start with 18 bits of resolution and can do 1024
iterations before you degrade down to the est. 2^13 resolution.  So you'll
only need to store 8 "reset points".  Four of those will be 1, j, -1 and -j
in this case.  You could thus replace 8 BRAM36'es with three DSPs.

If you had a much larger FFT, say 2^16... you would have to use a wider
recursive multiplier.  You can achieve a wide cmult in no more than 10
DSPs...I think.  In that case, you would start with 25 bits and be able to
droop to 16 bits -- so up to 2^(2*9) = 262144 iterations of recursion.  You would only
need to have one "reset point" and your noise performance would be more
than sufficient.  1, j, -1 and -j are easy to store though, so I would
probably go with that.
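Ryan's recursion can be sketched in numpy (a floating-point model only; sizes follow his 2^14-point example, and the fixed-point bit loss is only noted in a comment):

```python
import numpy as np

N = 2**14              # FFT length (the 2^14-point example above)
M = N // 2             # twiddle factors needed
R = 1024               # iterations between exact "reset points"
w = np.exp(-2j * np.pi / N)   # elementary rotation: one cmult per step

# Small ROM of exact reset points: M // R = 8 entries in this case
resets = np.exp(-2j * np.pi * np.arange(0, M, R) / N)

coeffs = np.empty(M, dtype=complex)
acc = resets[0]
for k in range(M):
    if k % R == 0:
        acc = resets[k // R]   # reload an exact value from the ROM
    coeffs[k] = acc
    acc *= w                   # recursive complex multiply

exact = np.exp(-2j * np.pi * np.arange(M) / N)
print(len(resets), np.abs(coeffs - exact).max())
# In 18-bit fixed point you would lose roughly log2(sqrt(R)) = 5 bits
# over each run of R iterations; in float the drift here is negligible.
```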

In addition, for the FFT direct, the first stage has only one shared
coefficient pattern, second stage has 2, third 4, etc.  You can, of course,
share coefficients amongst a stage where possible.  The real winnings occur
when you realize that the other coefficient banks within later stages are
actually the same coeffs as the first stage, with a constant phase rotation
(again, I'm 90% sure but I'll check tomorrow morning).  So, you could
generate your coefficients once, and then use a couple of complex
multipliers to make the coeffs for the other stages.  BAM!  FFT Direct's
coefficient memory utilization is *gone*
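The "constant phase rotation" claim rests on the twiddle identity W_N^(k+c) = W_N^k * W_N^c, which is easy to check numerically (the bank size and offset below are arbitrary):

```python
import numpy as np

N = 1024
k = np.arange(N // 8)                       # one coefficient bank
W = lambda e: np.exp(-2j * np.pi * e / N)   # twiddle W_N^e

bank0  = W(k)             # base bank: W_N^k
offset = 3 * (N // 8)     # where some later bank starts
bank3  = W(k + offset)    # that bank's coefficients

# bank3 is just bank0 times one constant phase rotation, W_N^offset
assert np.allclose(bank3, bank0 * W(offset))
```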

You could also do this for the FFT Biplex, but it would be a bit more
complicated.  Whoever designed the biplex FFT used in-order inputs.  This
is OK, but it means that the coefficients are in bit-reverse order.  So,
you would have to move the biplex unscrambler to the beginning, change the
mux logic, and replace the delay elements in the delay-commutator with some
flavor of "delay, bit-reversed".  I don't know how that would look quite
yet.  If you did that, your coefficients would become in-order, and you
could achieve the same savings I described with the FFT-Direct.  Also, I
implement coefficient and control logic sharing in my biplex and direct FFT
and it works *really well* at managing the fabric and memory utilization.
 Worth a shot.
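The bit-reversed coefficient order mentioned above is easy to picture with a small helper (plain Python, sizes arbitrary):

```python
def bit_reverse(x, nbits):
    """Reverse the low `nbits` bits of integer x."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

# A biplex FFT with in-order inputs ends up addressing its coefficient
# ROM in bit-reversed order:
order = [bit_reverse(k, 3) for k in range(8)]
print(order)   # [0, 4, 2, 6, 1, 5, 3, 7]
```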

:-)

--Ryan Monroe

PS, Sorry, I'm a bit busy right now so I can't implement a coefficient
interpolator for you guys right now.  I'll write back when I'm more free

PS2.  I'm a bit anal about noise performance so I usually use a couple more
bits than Dan prescribes, but as he demonstrated in the ASIC talks, his
comments about bit widths are 100% correct.   I would recommend them as a
general design practice as well.




On Mon, Jan 21, 2013 at 3:48 PM, Dan Werthimer wrote:

>
> agreed.   anybody already have, or want to develop, a coefficient
> interpolator?
>
> dan
>
> On Mon, Jan 21, 2013 at 3:44 PM, Aaron Parsons <
> apars...@astron.berkeley.edu> wrote:
>
>> Agreed.
>>
>> The coefficient interpolator, however, could get substantial savings
>> beyond that, even, and could be applicable to many things besides PFBs.
>>
>> On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote:
>>
>>>
>>> hi aaron,
>>>
>>> if you use xilinx brams for coefficients, they can be configured as dual
>>> port memories,
>>> so you can get the PFB reverse and forward coefficients both at the same
>>> time,
>>> from the same memory,  almost for free, without any memory size penalty
>>> over single port,
>>>
>>> dan
>>>
>>>
>>>
>>>
>>> On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons <
>>> apars...@astron.berkeley.edu> wrote:
>>>
 You guys probably appreciate this already, but although the
 coefficients in the PFB FIR are generally symmetric around the center tap,
 the upper and lower taps use these coefficients in reverse order from one
 another.  In order to take advantage of the symmetry, you'll have to use
 dual-port ROMs that support two different addresses (one counting up and
 one counting down).  In the original core I wrote, I instead just shared
 coefficients between the real and imaginary components.  This was an easy
 factor of 2 savings.  After that first factor of two, we found it was kind
 of diminishing returns...

 Another thought could be a small BRAM with a linear interpolator
 between addresses.  This would be a block with a wide range of uses, and
 could easily cut the size of the PFB coefficients by an order of magnitude.
  The (hamming/hanning) window and the sinc that the PFB uses for its
 coefficients are smooth functions, making all the fine subdivisions for
 N>32  samples rather unnecessary.


Re: [casper] ngc file generated by 14.3 is not recognized by planahead 14.3

2013-01-21 Thread homin

Hi David:

Probably not. The only inputs to PlanAhead are the system.ngc and ucf files;
no OPB cores or other pcores are needed. The problem is that the filenames
of the wrappers generated by 14.3 are not correct: they shouldn't have the
prefix "system_".
If you compare the filenames from 14.3 and from older versions under
/implementation/, you will see the difference.
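A hypothetical workaround sketch, in Python for illustration: strip the prefix from the wrapper filenames. The directory layout is assumed (a scratch directory stands in for the implementation folder), and whether NgdBuild then resolves the blocks is untested:

```python
import os
import tempfile

# Scratch directory standing in for the toolflow's implementation folder
impl = tempfile.mkdtemp()
for name in ("system_epb_opb_bridge_inst_wrapper.ngc",
             "system_other_inst_wrapper.ngc"):
    open(os.path.join(impl, name), "w").close()

# Strip the "system_" prefix the 14.3 flow adds to wrapper netlists
for fname in os.listdir(impl):
    if fname.startswith("system_") and fname.endswith("_wrapper.ngc"):
        os.rename(os.path.join(impl, fname),
                  os.path.join(impl, fname[len("system_"):]))

print(sorted(os.listdir(impl)))
```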


cheers
homin

On 01/22/2013 01:11 PM, David MacMahon wrote:

Hi, Homin,

Could this problem possibly be caused by planAhead not finding the necessary 
OPB cores?

Dave

On Jan 20, 2013, at 6:01 PM, homin wrote:


Hello:

I am trying to push the fabric clock faster and faster, so I am trying the
newest version, 14.3, and PlanAhead.
I ran into a problem while using PlanAhead 14.3: if the system.ngc is compiled by the
14.3 toolflow (Matlab 2012a, Xilinx 14.3), PlanAhead can't parse the ngc file. The
problem is that v14.3 puts the prefix "system_" into the wrapper filenames; the old
versions didn't. I have tried system.ngc files built by 11.4, and PlanAhead 14.3 can
process them without problems.

There should be somewhere that the prefix "system_" can be removed, but I haven't
had any luck finding it.
Has anyone else met this problem?

regards
homin jiang

-

[NgdBuild 604] logical block 'epb_opb_bridge_inst' with type 
'system_epb_opb_bridge_inst_wrapper' could not be resolved. A pin name 
misspelling can cause this, a missing edif or ngc file, case mismatch between 
the block name and the edif or ngc file name, or the misspelling of a type 
name. Symbol 'system_epb_opb_bridge_inst_wrapper' is not supported in target 
'virtex6'.








Re: [casper] ngc file generated by 14.3 is not recognized by planahead 14.3

2013-01-21 Thread David MacMahon
Hi, Homin,

Could this problem possibly be caused by planAhead not finding the necessary 
OPB cores?

Dave

On Jan 20, 2013, at 6:01 PM, homin wrote:

> Hello:
> 
> I am trying to push the fabric clock faster and faster, so I am trying
> the newest version, 14.3, and PlanAhead.
> I ran into a problem while using PlanAhead 14.3: if the system.ngc is compiled by
> the 14.3 toolflow (Matlab 2012a, Xilinx 14.3), PlanAhead can't parse the ngc file.
> The problem is that v14.3 puts the prefix "system_" into the wrapper filenames; the
> old versions didn't. I have tried system.ngc files built by 11.4, and PlanAhead
> 14.3 can process them without problems.
> 
> There should be somewhere that the prefix "system_" can be removed, but I haven't
> had any luck finding it.
> Has anyone else met this problem?
> 
> regards
> homin jiang
> 
> -
>>> [NgdBuild 604] logical block 'epb_opb_bridge_inst' with type 
>>> 'system_epb_opb_bridge_inst_wrapper' could not be resolved. A pin name 
>>> misspelling can cause this, a missing edif or ngc file, case mismatch 
>>> between the block name and the edif or ngc file name, or the misspelling of 
>>> a type name. Symbol 'system_epb_opb_bridge_inst_wrapper' is not supported 
>>> in target 'virtex6'.
> 
> 




Re: [casper] debugging communication with one_GbE from roach-2 's fpga

2013-01-21 Thread Henno Kriel
Hi

The 1GbE core is configured to work in the same way as the 10GbE core with
regards to tgtap.

Regards
Henno

On Mon, Jan 21, 2013 at 3:52 PM, Marc Welz  wrote:

> Hello
>
> > A question to anyone in the know: is there a runtime way to configure the
> > source IP/mac settings on roach 2 -- i.e. is tap_start implemented for
> the
> > 1GbE core?
>
> So if the 1GbE offers the same register layout as the 10GbE core then
> this should work out of the box (just use a different name as tgtap
> parameter)... but I think there might be some bus width differences
> which might have to be hidden ?
>
> regards
>
> marc
>
>


-- 
Henno Kriel

DSP Engineer
Digital Back End
meerKAT

SKA South Africa
Third Floor
The Park
Park Road (off Alexandra Road)
Pinelands
7405
Western Cape
South Africa

Latitude: -33.94329 (South); Longitude: 18.48945 (East).

(p) +27 (0)21 506 7300
(p) +27 (0)21 506 7365 (direct)
(f) +27 (0)21 506 7375
(m) +27 (0)84 504 5050


Re: [casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Dan Werthimer
agreed.   anybody already have, or want to develop, a coefficient
interpolator?

dan

On Mon, Jan 21, 2013 at 3:44 PM, Aaron Parsons  wrote:

> Agreed.
>
> The coefficient interpolator, however, could get substantial savings
> beyond that, even, and could be applicable to many things besides PFBs.
>
> On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote:
>
>>
>> hi aaron,
>>
>> if you use xilinx brams for coefficients, they can be configured as dual
>> port memories,
>> so you can get the PFB reverse and forward coefficients both at the same
>> time,
>> from the same memory,  almost for free, without any memory size penalty
>> over single port,
>>
>> dan
>>
>>
>>
>>
>> On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons <
>> apars...@astron.berkeley.edu> wrote:
>>
>>> You guys probably appreciate this already, but although the coefficients
>>> in the PFB FIR are generally symmetric around the center tap, the upper and
>>> lower taps use these coefficients in reverse order from one another.  In
>>> order to take advantage of the symmetry, you'll have to use dual-port ROMs
>>> that support two different addresses (one counting up and one counting
>>> down).  In the original core I wrote, I instead just shared coefficients
>>> between the real and imaginary components.  This was an easy factor of 2
>>> savings.  After that first factor of two, we found it was kind of
>>> diminishing returns...
>>>
>>> Another thought could be a small BRAM with a linear interpolator between
>>> addresses.  This would be a block with a wide range of uses, and could
>>> easily cut the size of the PFB coefficients by an order of magnitude.  The
>>> (hamming/hanning) window and the sinc that the PFB uses for its
>>> coefficients are smooth functions, making all the fine subdivisions for
>>> N>32  samples rather unnecessary.
>>>
>>> On Mon, Jan 21, 2013 at 2:56 PM, Dan Werthimer wrote:
>>>


 hi danny and ryan,

 i suspect if you are only doing small FFT's and PFB FIR's,
 1K points or so,  then BRAM isn't likely to be the limiting resource,
 so you might as well store all the coefficients with high precision.

 but for long transforms, perhaps >4K points or so,
 then BRAM's might be in short supply, and then one could
 consider storing fewer coefficients (and also taking advantage
 of sin/cos and mirror symmetries, which don't degrade SNR at all).

 for any length FFT or PFB/FIR, even millions of points,
 if you store 1K coefficients with at least 10-bit precision,
 then the SNR will only be degraded slightly.
 quantization error analysis is nicely written up in memo #1, at
 https://casper.berkeley.edu/wiki/Memos

 best wishes,

 dan



 On Mon, Jan 21, 2013 at 4:33 AM, Danny Price 
 wrote:

> Hey Jason,
>
> Rewinding the thread a bit:
>
> On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley wrote:
>
>> Andrew and I have also spoken about symmetrical co-efficients in the
>> pfb_fir and I'd very much like to see this done. We recently added the
>> option to share co-efficient generators across multiple inputs, which has
>> helped a lot for designs with multiple ADCs. It seems to me that bigger
>> designs are going to be BRAM limited (FFT BRAM requirements scale
>> linearly), so we need to optimise cores to go light on this resource.
>>
>
> Agreed that BRAM is in general more precious than compute. In addition
> to using symmetrical coefficients, it might be worth looking at generating
> coefficients. I did some tests this morning with a simple moving average
> filter to turn 256 BRAM coefficients into 1024 (see attached model file),
> and it looks pretty promising: errors are a max of about 2.5%.
>
> Coupling this with symmetric coefficients could cut coefficient
> storage to 1/8th, at the cost of a few extra adders for the interpolation
> filter. Thoughts?
>
> Cheers
> Danny
>


>>>
>>>
>>> --
>>> Aaron Parsons
>>> 510-306-4322
>>> Hearst Field Annex B54, UCB
>>>
>>
>>
>
>
> --
> Aaron Parsons
> 510-306-4322
> Hearst Field Annex B54, UCB
>


Re: [casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Aaron Parsons
Agreed.

The coefficient interpolator, however, could get substantial savings beyond
that, even, and could be applicable to many things besides PFBs.

On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote:

>
> hi aaron,
>
> if you use xilinx brams for coefficients, they can be configured as dual
> port memories,
> so you can get the PFB reverse and forward coefficients both at the same
> time,
> from the same memory,  almost for free, without any memory size penalty
> over single port,
>
> dan
>
>
>
>
> On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons <
> apars...@astron.berkeley.edu> wrote:
>
>> You guys probably appreciate this already, but although the coefficients
>> in the PFB FIR are generally symmetric around the center tap, the upper and
>> lower taps use these coefficients in reverse order from one another.  In
>> order to take advantage of the symmetry, you'll have to use dual-port ROMs
>> that support two different addresses (one counting up and one counting
>> down).  In the original core I wrote, I instead just shared coefficients
>> between the real and imaginary components.  This was an easy factor of 2
>> savings.  After that first factor of two, we found it was kind of
>> diminishing returns...
>>
>> Another thought could be a small BRAM with a linear interpolator between
>> addresses.  This would be a block with a wide range of uses, and could
>> easily cut the size of the PFB coefficients by an order of magnitude.  The
>> (hamming/hanning) window and the sinc that the PFB uses for its
>> coefficients are smooth functions, making all the fine subdivisions for
>> N>32  samples rather unnecessary.
>>
>> On Mon, Jan 21, 2013 at 2:56 PM, Dan Werthimer wrote:
>>
>>>
>>>
>>> hi danny and ryan,
>>>
>>> i suspect if you are only doing small FFT's and PFB FIR's,
>>> 1K points or so,  then BRAM isn't likely to be the limiting resource,
>>> so you might as well store all the coefficients with high precision.
>>>
>>> but for long transforms, perhaps >4K points or so,
>>> then BRAM's might be in short supply, and then one could
>>> consider storing fewer coefficients (and also taking advantage
>>> of sin/cos and mirror symmetries, which don't degrade SNR at all).
>>>
>>> for any length FFT or PFB/FIR, even millions of points,
>>> if you store 1K coefficients with at least 10-bit precision,
>>> then the SNR will only be degraded slightly.
>>> quantization error analysis is nicely written up in memo #1, at
>>> https://casper.berkeley.edu/wiki/Memos
>>>
>>> best wishes,
>>>
>>> dan
>>>
>>>
>>>
>>> On Mon, Jan 21, 2013 at 4:33 AM, Danny Price wrote:
>>>
 Hey Jason,

 Rewinding the thread a bit:

 On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley  wrote:

> Andrew and I have also spoken about symmetrical co-efficients in the
> pfb_fir and I'd very much like to see this done. We recently added the
> option to share co-efficient generators across multiple inputs, which has
> helped a lot for designs with multiple ADCs. It seems to me that bigger
> designs are going to be BRAM limited (FFT BRAM requirements scale
> linearly), so we need to optimise cores to go light on this resource.
>

 Agreed that BRAM is in general more precious than compute. In addition
 to using symmetrical coefficients, it might be worth looking at generating
 coefficients. I did some tests this morning with a simple moving average
 filter to turn 256 BRAM coefficients into 1024 (see attached model file),
 and it looks pretty promising: errors are a max of about 2.5%.

 Coupling this with symmetric coefficients could cut coefficient storage
 to 1/8th, at the cost of a few extra adders for the interpolation filter.
 Thoughts?

 Cheers
 Danny

>>>
>>>
>>
>>
>> --
>> Aaron Parsons
>> 510-306-4322
>> Hearst Field Annex B54, UCB
>>
>
>


-- 
Aaron Parsons
510-306-4322
Hearst Field Annex B54, UCB


Re: [casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Dan Werthimer
hi aaron,

if you use xilinx brams for coefficients, they can be configured as dual
port memories,
so you can get the PFB reverse and forward coefficients both at the same
time,
from the same memory,  almost for free, without any memory size penalty
over single port,

dan
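Dan's dual-port scheme can be modelled in a few lines of numpy (the Hamming-windowed sinc prototype and the sizes are arbitrary stand-ins for real PFB coefficients): store only half the symmetric taps, then read forward on one port and backward on the other.

```python
import numpy as np

ntaps, nbins = 4, 256
M = ntaps * nbins
n = np.arange(M)
# Prototype PFB FIR: Hamming-windowed sinc, exactly symmetric about the
# centre of the filter (an arbitrary but representative choice)
coeffs = np.hamming(M) * np.sinc((n - (M - 1) / 2.0) / nbins)

half = coeffs[: M // 2]   # store only half the taps in the "BRAM"

# Dual-port ROM: port A's address counts up, port B's counts down,
# giving forward and reverse coefficient streams from one memory.
port_a = half             # lower taps, forward order
port_b = half[::-1]       # upper taps, reverse order

# The reversed read exactly reproduces the upper half of the filter
assert np.allclose(port_b, coeffs[M // 2:])
print(len(half), "words stored instead of", M)
```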




On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons  wrote:

> You guys probably appreciate this already, but although the coefficients
> in the PFB FIR are generally symmetric around the center tap, the upper and
> lower taps use these coefficients in reverse order from one another.  In
> order to take advantage of the symmetry, you'll have to use dual-port ROMs
> that support two different addresses (one counting up and one counting
> down).  In the original core I wrote, I instead just shared coefficients
> between the real and imaginary components.  This was an easy factor of 2
> savings.  After that first factor of two, we found it was kind of
> diminishing returns...
>
> Another thought could be a small BRAM with a linear interpolator between
> addresses.  This would be a block with a wide range of uses, and could
> easily cut the size of the PFB coefficients by an order of magnitude.  The
> (hamming/hanning) window and the sinc that the PFB uses for its
> coefficients are smooth functions, making all the fine subdivisions for
> N>32  samples rather unnecessary.
>
> On Mon, Jan 21, 2013 at 2:56 PM, Dan Werthimer wrote:
>
>>
>>
>> hi danny and ryan,
>>
>> i suspect if you are only doing small FFT's and PFB FIR's,
>> 1K points or so,  then BRAM isn't likely to be the limiting resource,
>> so you might as well store all the coefficients with high precision.
>>
>> but for long transforms, perhaps >4K points or so,
>> then BRAM's might be in short supply, and then one could
>> consider storing fewer coefficients (and also taking advantage
>> of sin/cos and mirror symmetries, which don't degrade SNR at all).
>>
>> for any length FFT or PFB/FIR, even millions of points,
>> if you store 1K coefficients with at least 10-bit precision,
>> then the SNR will only be degraded slightly.
>> quantization error analysis is nicely written up in memo #1, at
>> https://casper.berkeley.edu/wiki/Memos
>>
>> best wishes,
>>
>> dan
>>
>>
>>
>> On Mon, Jan 21, 2013 at 4:33 AM, Danny Price wrote:
>>
>>> Hey Jason,
>>>
>>> Rewinding the thread a bit:
>>>
>>> On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley  wrote:
>>>
 Andrew and I have also spoken about symmetrical co-efficients in the
 pfb_fir and I'd very much like to see this done. We recently added the
 option to share co-efficient generators across multiple inputs, which has
 helped a lot for designs with multiple ADCs. It seems to me that bigger
 designs are going to be BRAM limited (FFT BRAM requirements scale
 linearly), so we need to optimise cores to go light on this resource.

>>>
>>> Agreed that BRAM is in general more precious than compute. In addition
>>> to using symmetrical coefficients, it might be worth looking at generating
>>> coefficients. I did some tests this morning with a simple moving average
>>> filter to turn 256 BRAM coefficients into 1024 (see attached model file),
>>> and it looks pretty promising: errors are a max of about 2.5%.
>>>
>>> Coupling this with symmetric coefficients could cut coefficient storage
>>> to 1/8th, at the cost of a few extra adders for the interpolation filter.
>>> Thoughts?
>>>
>>> Cheers
>>> Danny
>>>
>>
>>
>
>
> --
> Aaron Parsons
> 510-306-4322
> Hearst Field Annex B54, UCB
>


Re: [casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Aaron Parsons
You guys probably appreciate this already, but although the coefficients in
the PFB FIR are generally symmetric around the center tap, the upper and
lower taps use these coefficients in reverse order from one another.  In
order to take advantage of the symmetry, you'll have to use dual-port ROMs
that support two different addresses (one counting up and one counting
down).  In the original core I wrote, I instead just shared coefficients
between the real and imaginary components.  This was an easy factor of 2
savings.  After that first factor of two, we found it was kind of
diminishing returns...

Another thought could be a small BRAM with a linear interpolator between
addresses.  This would be a block with a wide range of uses, and could
easily cut the size of the PFB coefficients by an order of magnitude.  The
(hamming/hanning) window and the sinc that the PFB uses for its
coefficients are smooth functions, making all the fine subdivisions for
N>32  samples rather unnecessary.
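A numpy model of the linear-interpolation idea (the table sizes are arbitrary; here 1024 stored words reconstruct 8192 coefficients):

```python
import numpy as np

N_full, N_store = 8192, 1024          # store 8x fewer coefficients
M = N_full // N_store
n = np.arange(N_full)
# Smooth prototype: Hamming window times sinc, as in the PFB FIR
full = np.hamming(N_full) * np.sinc(n / N_full * 8.0 - 4.0)

table = full[::M]                     # the small "BRAM"
idx = n // M                          # table address
frac = (n % M) / M                    # interpolation fraction
nxt = np.minimum(idx + 1, N_store - 1)
approx = table[idx] * (1 - frac) + table[nxt] * frac

err = np.abs(approx - full).max() / np.abs(full).max()
print(err)   # small: the window/sinc product is smooth
```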

On Mon, Jan 21, 2013 at 2:56 PM, Dan Werthimer wrote:

>
>
> hi danny and ryan,
>
> i suspect if you are only doing small FFT's and PFB FIR's,
> 1K points or so,  then BRAM isn't likely to be the limiting resource,
> so you might as well store all the coefficients with high precision.
>
> but for long transforms, perhaps >4K points or so,
> then BRAM's might be in short supply, and then one could
> consider storing fewer coefficients (and also taking advantage
> of sin/cos and mirror symmetries, which don't degrade SNR at all).
>
> for any length FFT or PFB/FIR, even millions of points,
> if you store 1K coefficients with at least 10-bit precision,
> then the SNR will only be degraded slightly.
> quantization error analysis is nicely written up in memo #1, at
> https://casper.berkeley.edu/wiki/Memos
>
> best wishes,
>
> dan
>
>
>
> On Mon, Jan 21, 2013 at 4:33 AM, Danny Price wrote:
>
>> Hey Jason,
>>
>> Rewinding the thread a bit:
>>
>> On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley  wrote:
>>
>>> Andrew and I have also spoken about symmetrical co-efficients in the
>>> pfb_fir and I'd very much like to see this done. We recently added the
>>> option to share co-efficient generators across multiple inputs, which has
>>> helped a lot for designs with multiple ADCs. It seems to me that bigger
>>> designs are going to be BRAM limited (FFT BRAM requirements scale
>>> linearly), so we need to optimise cores to go light on this resource.
>>>
>>
>> Agreed that BRAM is in general more precious than compute. In addition to
>> using symmetrical coefficients, it might be worth looking at generating
>> coefficients. I did some tests this morning with a simple moving average
>> filter to turn 256 BRAM coefficients into 1024 (see attached model file),
>> and it looks pretty promising: errors are a max of about 2.5%.
>>
>> Coupling this with symmetric coefficients could cut coefficient storage
>> to 1/8th, at the cost of a few extra adders for the interpolation filter.
>> Thoughts?
>>
>> Cheers
>> Danny
>>
>
>


-- 
Aaron Parsons
510-306-4322
Hearst Field Annex B54, UCB


[casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Dan Werthimer
hi danny and ryan,

i suspect if you are only doing small FFT's and PFB FIR's,
1K points or so,  then BRAM isn't likely to be the limiting resource,
so you might as well store all the coefficients with high precision.

but for long transforms, perhaps >4K points or so,
then BRAM's might be in short supply, and then one could
consider storing fewer coefficients (and also taking advantage
of sin/cos and mirror symmetries, which don't degrade SNR at all).

for any length FFT or PFB/FIR, even millions of points,
if you store 1K coefficients with at least 10-bit precision,
then the SNR will only be degraded slightly.
quantization error analysis is nicely written up in memo #1, at
https://casper.berkeley.edu/wiki/Memos
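As a rough numerical check of the rule of thumb (this measures only the coefficient quantization SNR, roughly 6 dB per bit, not the SNR of the full transform; see memo #1 for the proper analysis -- the sizes here are arbitrary):

```python
import numpy as np

def quantize(x, bits):
    """Round to a signed fixed-point grid with `bits` bits total."""
    scale = 2 ** (bits - 1)
    return np.round(x * scale) / scale

# 1K stored twiddle coefficients, as in the rule of thumb above
k = np.arange(1024)
coeffs = np.exp(-2j * np.pi * k / 1024)

snr = {}
for bits in (8, 10, 18):
    q = quantize(coeffs.real, bits) + 1j * quantize(coeffs.imag, bits)
    err = np.mean(np.abs(q - coeffs) ** 2)
    snr[bits] = 10 * np.log10(1.0 / err)   # signal power is 1 on the unit circle
    print(bits, "bits ->", round(snr[bits], 1), "dB")
```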

best wishes,

dan



On Mon, Jan 21, 2013 at 4:33 AM, Danny Price  wrote:

> Hey Jason,
>
> Rewinding the thread a bit:
>
> On Fri, Jan 4, 2013 at 7:39 AM, Jason Manley  wrote:
>
>> Andrew and I have also spoken about symmetrical co-efficients in the
>> pfb_fir and I'd very much like to see this done. We recently added the
>> option to share co-efficient generators across multiple inputs, which has
>> helped a lot for designs with multiple ADCs. It seems to me that bigger
>> designs are going to be BRAM limited (FFT BRAM requirements scale
>> linearly), so we need to optimise cores to go light on this resource.
>>
>
> Agreed that BRAM is in general more precious than compute. In addition to
> using symmetrical coefficients, it might be worth looking at generating
> coefficients. I did some tests this morning with a simple moving average
> filter to turn 256 BRAM coefficients into 1024 (see attached model file),
> and it looks pretty promising: errors are a max of about 2.5%.
>
> Coupling this with symmetric coefficients could cut coefficient storage to
> 1/8th, at the cost of a few extra adders for the interpolation filter.
> Thoughts?
>
> Cheers
> Danny
>
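[Danny's coefficient-interpolation test can be sketched roughly like this -- a
python stand-in (using linear interpolation rather than his moving-average
model, so the error figure will differ from his 2.5%) that stores every 4th
coefficient of a windowed-sinc PFB-FIR response and regenerates the rest:]

```python
import numpy as np

# Full-resolution PFB-FIR coefficients: 4-tap windowed sinc, 1024 points
N = 1024
n = np.arange(N)
full = np.sinc(4 * (n / N - 0.5)) * np.hamming(N)

# What would actually sit in BRAM: every 4th coefficient (256 values)
stored = full[::4]

# Regenerate the 1024-point set from the 256 stored values
# (linear interpolation stands in for the moving-average filter)
interp = np.interp(n, n[::4], stored)

# Peak error relative to the largest coefficient
max_err = np.abs(interp - full).max() / np.abs(full).max()
print(f"max relative error: {max_err:.5f}")
```

[The interpolator costs a few adders but the coefficient BRAM drops by 4x;
folding in the symmetry trick halves the stored table again.]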


Re: [casper] debugging communication with one_GbE from roach-2 's fpga

2013-01-21 Thread Haoxuan Zheng
Hi Casper group,


We managed to get the FPGA 1GbE on ROACH2 fully under control, great thanks to 
John, Marc, Danny, Jack, and Jason! Wireshark was extremely helpful, as it can 
see the packets as long as the receiver's IP is set correctly. For the python 
grab to work, it turned out that we simply had to configure the sender's IP to 
be on the same subnet as the receiver's IP (i.e. matching in the first three 
octets, for a /24 netmask).


Thanks again!
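[The subnet condition Haoxuan describes can be checked like this -- a small
python sketch (hypothetical addresses) of the "same first three octets" rule,
assuming a /24 netmask:]

```python
import ipaddress

def same_subnet(ip_a, ip_b, prefix=24):
    """True if ip_b lies in the same /prefix network as ip_a."""
    net = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in net

# Sender (FPGA) and receiver on the same /24: python can grab the packets
print(same_subnet("10.0.0.10", "10.0.0.20"))
# Third octet differs: traffic gets routed (or dropped) instead
print(same_subnet("10.0.1.10", "10.0.0.20"))
```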


From: casper-boun...@lists.berkeley.edu [casper-boun...@lists.berkeley.edu] on 
behalf of Danny Price [danny.pr...@astro.ox.ac.uk]
Sent: Monday, January 21, 2013 3:39 AM
To: casper@lists.berkeley.edu list
Subject: Re: [casper] debugging communication with one_GbE from roach-2 's fpga

Hi Ioana, Jeff

In addition to John and Marc's suggestions, I'd recommend checking:
1) If you're using SELinux, disable it, as it causes all sorts of grief.
2) Check that the MTU on your ethernet port is larger than your packet size.
3) Make sure that you don't have a firewall up on that port.
4) Check whether your Ethernet port goes down when the FPGA is reprogrammed 
(if you've got a switch in between, this shouldn't happen).

If you've already checked your socket with roach1 UDP code, then most of these 
points are moot. Hopefully one stands out though...

Cheers
Danny


Re: [casper] debugging communication with one_GbE from roach-2 's fpga

2013-01-21 Thread Jason Manley
We haven't actually used the 1GbE core at all at SKA-SA, but AFAIK it's the 
same, and tgtap should work out of the box.

Henno's actually the authoritative source for all things 1GbE. He might be able 
to offer more tomorrow.

Jason


On 21 Jan 2013, at 15:52, Marc Welz wrote:

> Hello
> 
>> A question to anyone in the know: is there a runtime way to configure the
>> source IP/mac settings on roach 2 -- i.e. is tap_start implemented for the
>> 1GbE core?
> 
> So if the 1GbE offers the same register layout as the 10GbE core then
> this should work out of the box (just use a different name as tgtap
> parameter)... but I think there might be some bus width differences
> which might have to be hidden ?
> 
> regards
> 
> marc
> 




Re: [casper] debugging communication with one_GbE from roach-2 's fpga

2013-01-21 Thread Marc Welz
Hello

> A question to anyone in the know: is there a runtime way to configure the
> source IP/mac settings on roach 2 -- i.e. is tap_start implemented for the
> 1GbE core?

So if the 1GbE offers the same register layout as the 10GbE core, then
this should work out of the box (just use a different name as the tgtap
parameter)... but I think there might be some bus width differences
which might have to be hidden?

regards

marc



Re: [casper] debugging communication with one_GbE from roach-2 's fpga

2013-01-21 Thread Jack Hickish
Hi Ioana,

I'm attaching a model and python script that Guy Kenfack and I knocked
together at the Green Bank workshop. We sent a counter and saw the data
arriving at the right IP/port in wireshark.

A question to anyone in the know: is there a runtime way to configure the
source IP/mac settings on roach 2 -- i.e. is tap_start implemented for the
1GbE core?

Cheers,
Jack


On 21 January 2013 08:07, Marc Welz  wrote:

> Hello
>
> > However, we are stuck in debugging the system: we think we are sending
> stuff
> > properly, but we can not read anything from the python socket, coming
> from
> > our roach 2 fpga.
>
> Try running tcpdump on the receiving PC. If you are using tgtap logic,
> you should see occasional arp traffic from the roach to work out where
> to send its
> data.
>
> Some switches have UDP flood protection - if they see lots of UDP traffic,
> especially broadcast UDP, they throttle it down. If tgtap is running and I
> remember things correctly, data destined for machines which do not respond
> to arp traffic will be broadcast. Alternatively if the destination MAC
> is all zeros,
> switches typically discard traffic instead of sending it on.
>
> I believe you are one of the first people to use the GbE port, so please
> let us know of your progress.
>
> regards
>
> marc
>
>
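[The kind of send-a-counter / grab-it-in-python loop Jack describes looks
roughly like this -- a self-contained sketch in which loopback stands in for
the ROACH, and the 32-bit counter payload format is an assumption, not taken
from his attached script:]

```python
import socket
import struct

# Receive side: bind the port the ROACH design is sending to.
# (Port 0 lets the OS pick one here; a real script binds the port
# configured in the FPGA design.)
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx_addr = rx.getsockname()
rx.settimeout(2.0)

# Stand-in for the FPGA: send a few 32-bit big-endian counter values
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for count in range(4):
    tx.sendto(struct.pack(">I", count), rx_addr)

# Grab and unpack the packets (wireshark/tcpdump can watch the same traffic)
received = []
for _ in range(4):
    payload, _ = rx.recvfrom(9000)  # buffer larger than any expected packet
    received.append(struct.unpack(">I", payload)[0])

print(received)
rx.close()
tx.close()
```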


Re: [casper] debugging communication with one_GbE from roach-2 's fpga

2013-01-21 Thread Danny Price
Hi Ioana, Jeff

In addition to John and Marc's suggestions, I'd recommend checking:
1) If you're using SELinux, disable it, as it causes all sorts of grief.
2) Check that the MTU on your ethernet port is larger than your packet size.
3) Make sure that you don't have a firewall up on that port.
4) Check whether your Ethernet port goes down when the FPGA is reprogrammed 
(if you've got a switch in between, this shouldn't happen).

If you've already checked your socket with roach1 UDP code, then most of these 
points are moot. Hopefully one stands out though...

Cheers
Danny


Re: [casper] debugging communication with one_GbE from roach-2 's fpga

2013-01-21 Thread Marc Welz
Hello

> However, we are stuck in debugging the system: we think we are sending stuff
> properly, but we can not read anything from the python socket, coming from
> our roach 2 fpga.

Try running tcpdump on the receiving PC. If you are using tgtap logic,
you should see occasional arp traffic from the roach to work out where
to send its data.

Some switches have UDP flood protection - if they see lots of UDP traffic,
especially broadcast UDP, they throttle it down. If tgtap is running and I
remember things correctly, data destined for machines which do not respond
to arp traffic will be broadcast. Alternatively, if the destination MAC is
all zeros, switches typically discard traffic instead of sending it on.

I believe you are one of the first people to use the GbE port, so please
let us know of your progress.

regards

marc