[casper] adc083000 on ROACH2

2013-06-10 Thread Suraj Gowda
Hi everyone,
Does anyone have a ROACH2-compatible pcore for the adc083000?  The
auto-update of the clock manager from DCM to MMCM doesn't seem to work
properly.  I remember discussing this issue before but I can't remember if
it was ever resolved.
Thanks,
Suraj


Re: [casper] 38 PFB's on a virtex-6?

2013-06-10 Thread Jack Hickish
On 10 June 2013 17:07, David Saroff  wrote:

> Short of compiling a design, can the resource usage of a PFB yellow block
> be seen?


> If the data rate is some submultiple of the FPGA clock, say 50 MSPS and
> 200 MHz is there a natural way to share resources?
>
> The question's context:
> 38 dipole antennae of the focal plane array for the green bank telescope.
> Signal from each of the 38 is sampled at 50 MSPS, digitized to 12 bits.
> What frequency resolution fits on a virtex-6? That is what is the number
> of taps P and fft bins n that will fit?
>
> Count multipliers as dsp48 blocks.
>
> 2000/38 =~ 50 dsp48 blocks per signal. That doesn't seem like enough. If
> there is a way to resource share, is there a factor of 200MHz/50MHz = 4
> available ideally?


Since I think you're talking about using the 64-input ADC, this is the
default input format. I.e., the 64x50MSa/s channels are presented to the
FPGA as 16x200MSa/s streams. There is a version of the CASPER PFB-FIR block
designed to deal with this, and with appropriate reordering, a normal
CASPER FFT works too.



> Then it looks more like 4 * 50 = 200 dsp48's per
> channel. That still doesn't seem like enough.


This is actually quite a lot. A PFB only costs 1 multiplier per tap per
stream, and the cost of an FFT scales only with log(N).


>
> What about a sample rate of 2.5 MHz? Then the potential reuse multiplier
> is 200MHz/2.5MHz = 80 and the number of multiplies per channel is a more
> comfortable 80 * 50 = 4000 The bandwidth is less for the 2.5 MSPS vrs 50
> MSPS, by a factor of 1/20th. That makes fewer fft bins n necessary, so
> might the advantage of lower sampling rates and narrower band widths be
> quadratic?
>
> Summary
> 1)when we parametrize a PFB, can we conveniently see how many dsp48s and
> slices, etc it requires?
>

As a rule of thumb the PFB FIR will cost you one multiplier per tap, and
the FFT ~2log(N) (where the 2 is from 4 multipliers per complex mult / 2
real streams FFT'd as one complex). [I'm sure someone will correct me if
this isn't right]
Rurik wrote a nice memo about PFB utilization (though geared towards wide
bandwidth implementation):
https://www.cfa.harvard.edu/twiki/pub/SMAwideband/MemoSeries/sma_wideband_utilization_1.pdf
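To make the rule of thumb above concrete, here is a minimal Python sketch.
It assumes that log(N) means log base 2 of the FFT length and that the
per-stream costs simply add up across streams. The function name and its
arguments are invented for illustration; it counts multipliers only and says
nothing about RAM or slices.

```python
import math


def pfb_multiplier_estimate(n_taps, fft_points, n_streams=1):
    """Rule-of-thumb multiplier count for a PFB (FIR + FFT).

    Assumptions (not from any CASPER tool): one multiplier per FIR tap per
    stream, and about 2*log2(N) multipliers per stream for an N-point FFT.
    """
    fir_cost = n_taps
    fft_cost = 2 * math.log2(fft_points)
    return n_streams * (fir_cost + fft_cost)


# Example: 4 taps, 2048-point FFT, 16 time-multiplexed 200 MSa/s streams.
print(pfb_multiplier_estimate(n_taps=4, fft_points=2048, n_streams=16))
```

Treat the result as an order-of-magnitude figure to compare against the
DSP48 count of the target part, and check it against a compiled test design.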


> 2)if a PFB receives samples slower than the system clock, is there a way
> to share it between channels?
>

I don't know what the capabilities of Xilinx libraries are, but with the
64-input ADC we have been using a slightly modified version of the PFB
block (https://github.com/oxfork/mlib_devel/tree/master/ox_library) which
accepts time multiplexed data streams. We then reorder and use the normal
CASPER FFT.

FWIW, with this setup we can comfortably do 4-taps, 1024 channels (2048 pt
FFTs) with 32 antennas on ROACH 1. We run out of RAM long before
multipliers.

Cheers,
Jack


>
> some embarrassment for the beginner's questions.


Re: [casper] 38 PFB's on a virtex-6?

2013-06-10 Thread Danny Price
Hi David

In terms of resource usage, I'd recommend having a read of this memo by 
Rurik Primiani on the topic:
https://www.cfa.harvard.edu/twiki/pub/SMAwideband/MemoSeries/sma_wideband_utilization_1.pdf

There is a pfb_fir_mux which I suspect will do what you're after. 
There's a copy in
https://github.com/oxfork/mlib_devel/tree/master/ox_library
(maintained by https://github.com/jack-h)

Regards
Danny 


   	   
On 10 June 2013 12:07, David Saroff wrote:
> Short of compiling a design, can the resource usage of a PFB yellow block
> be seen?
>
> If the data rate is some submultiple of the FPGA clock, say 50 MSPS and
> 200 MHz is there a natural way to share resources?
>
> The question's context:
> 38 dipole antennae of the focal plane array for the green bank telescope.
> Signal from each of the 38 is sampled at 50 MSPS, digitized to 12 bits.
> What frequency resolution fits on a virtex-6? That is what is the number
> of taps P and fft bins n that will fit?
>
> Count multipliers as dsp48 blocks.
>
> 2000/38 =~ 50 dsp48 blocks per signal. That doesn't seem like enough. If
> there is a way to resource share, is there a factor of 200MHz/50MHz = 4
> available ideally? Then it looks more like 4 * 50 = 200 dsp48's per
> channel. That still doesn't seem like enough.
>
> What about a sample rate of 2.5 MHz? Then the potential reuse multiplier
> is 200MHz/2.5MHz = 80 and the number of multiplies per channel is a more
> comfortable 80 * 50 = 4000 The bandwidth is less for the 2.5 MSPS vrs 50
> MSPS, by a factor of 1/20th. That makes fewer fft bins n necessary, so
> might the advantage of lower sampling rates and narrower band widths be
> quadratic?
>
> Summary
> 1)when we parametrize a PFB, can we conveniently see how many dsp48s and
> slices, etc it requires?
>
> 2)if a PFB receives samples slower than the system clock, is there a way
> to share it between channels?
>
> some embarrassment for the beginner's questions.




Re: [casper] 38 PFB's on a virtex-6?

2013-06-10 Thread G Jones
Hi David,
Traditionally, the CASPER libraries are much better at processing data
many times the rate of the FPGA rather than the other way around. So
while your suggestion of reusing multipliers is of course a good one,
it's generally not well implemented in the CASPER libraries. You might
consider using Xilinx filters for the PFB_FIR section (perhaps)
followed by CASPER FFTs. The FFTs could be shared amongst many lower
bandwidth signals by sending one frame of data at a time through the
FFT.

You can estimate the resource usage by drilling down into the blocks
(look under mask) and manually counting. Or you can make a couple of test
designs, count the resources (PlanAhead is convenient for doing this)
and then figure out how it's scaling. It's good to check this against your
estimate to make sure it's doing things the way you expect. Most
blocks have an option to explicitly use DSP48s or general logic for
the implementation.

Glenn

On Mon, Jun 10, 2013 at 12:07 PM, David Saroff  wrote:
> Short of compiling a design, can the resource usage of a PFB yellow block
> be seen?
>
> If the data rate is some submultiple of the FPGA clock, say 50 MSPS and
> 200 MHz is there a natural way to share resources?
>
> The question's context:
> 38 dipole antennae of the focal plane array for the green bank telescope.
> Signal from each of the 38 is sampled at 50 MSPS, digitized to 12 bits.
> What frequency resolution fits on a virtex-6? That is what is the number
> of taps P and fft bins n that will fit?
>
> Count multipliers as dsp48 blocks.
>
> 2000/38 =~ 50 dsp48 blocks per signal. That doesn't seem like enough. If
> there is a way to resource share, is there a factor of 200MHz/50MHz = 4
> available ideally? Then it looks more like 4 * 50 = 200 dsp48's per
> channel. That still doesn't seem like enough.
>
> What about a sample rate of 2.5 MHz? Then the potential reuse multiplier
> is 200MHz/2.5MHz = 80 and the number of multiplies per channel is a more
> comfortable 80 * 50 = 4000 The bandwidth is less for the 2.5 MSPS vrs 50
> MSPS, by a factor of 1/20th. That makes fewer fft bins n necessary, so
> might the advantage of lower sampling rates and narrower band widths be
> quadratic?
>
> Summary
> 1)when we parametrize a PFB, can we conveniently see how many dsp48s and
> slices, etc it requires?
>
> 2)if a PFB receives samples slower than the system clock, is there a way
> to share it between channels?
>
> some embarrassment for the beginner's questions.



[casper] 38 PFB's on a virtex-6?

2013-06-10 Thread David Saroff
Short of compiling a design, can the resource usage of a PFB yellow block
be seen?

If the data rate is some submultiple of the FPGA clock, say 50 MSPS and
200 MHz is there a natural way to share resources?

The question's context:
38 dipole antennae of the focal plane array for the Green Bank Telescope.
The signal from each of the 38 is sampled at 50 MSPS, digitized to 12 bits.
What frequency resolution fits on a Virtex-6? That is, what number
of taps P and FFT bins n will fit?

Count multipliers as dsp48 blocks.

2000/38 =~ 50 dsp48 blocks per signal. That doesn't seem like enough. If
there is a way to resource share, is there a factor of 200MHz/50MHz = 4
available ideally? Then it looks more like 4 * 50 = 200 dsp48's per
channel. That still doesn't seem like enough.

What about a sample rate of 2.5 MHz? Then the potential reuse multiplier
is 200MHz/2.5MHz = 80 and the number of multiplies per channel is a more
comfortable 80 * 50 = 4000. The bandwidth is smaller for 2.5 MSPS vs. 50
MSPS, by a factor of 20. That makes fewer FFT bins n necessary, so
might the advantage of lower sampling rates and narrower bandwidths be
quadratic?
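The arithmetic above can be checked with a few lines of Python. This is only
a sketch of the budget calculation: the function name is invented, the 2000
DSP48 figure is the round number used above, and "ideal reuse" assumes every
multiplier can be fully time-shared, which real designs will not achieve.

```python
def dsp48_budget_per_signal(total_dsp48, n_signals, fpga_clock_hz, sample_rate_hz):
    """Return (reuse_factor, effective_multiplies_per_signal).

    reuse_factor is the number of FPGA clock cycles available per input
    sample; the effective figure is the per-signal share of DSP48s times
    that factor (an ideal upper bound, assuming perfect time-sharing).
    """
    reuse_factor = fpga_clock_hz / sample_rate_hz
    share = total_dsp48 / n_signals
    return reuse_factor, share * reuse_factor


for rate in (50e6, 2.5e6):
    reuse, effective = dsp48_budget_per_signal(2000, 38, 200e6, rate)
    print("%.1f MSPS: reuse x%d, ~%d multiplies per signal per sample"
          % (rate / 1e6, reuse, effective))
```

Without rounding 2000/38 down to 50, the two cases come out a little above
the 200 and 4000 quoted in the text.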

Summary
1) When we parametrize a PFB, can we conveniently see how many DSP48s,
slices, etc. it requires?

2) If a PFB receives samples slower than the system clock, is there a way
to share it between channels?

some embarrassment for the beginner's questions.






Re: [casper] Real-time control of ROACH2 with KATCP

2013-06-10 Thread Gary, Dale E.
Ah, thanks Glenn!  I should have figured that out...

Dale


On Mon, Jun 10, 2013 at 1:46 PM, G Jones  wrote:

> I think this is a common thing run into: it takes a moment for the
> FpgaClient to actually connect to the ROACH and this happens in the
> background. The FpgaClient instance is returned immediately, and if
> you try to talk to the ROACH too soon, it will complain about not
> being connected.
> In my opinion, the proper thing to do is to use the .is_connected()
> method to see when it gets connected. I.e. something like:
>
> tic = time.time()
> while (time.time() - tic < timeout):
>     if fpga[-1].is_connected():
>         break
>     time.sleep(0.01)
> else:
>     print "fpga connection timed out"
>
> Glenn
>
> On Mon, Jun 10, 2013 at 9:40 AM, Gary, Dale E. wrote:
> > Hi All,
> >
> > This may be a Python question rather than a CASPER/KATCP one, but I have
> > not been able to solve it.  I am trying to connect to multiple ROACHes in
> > order to write delays to the coarse delay registers once per second.  I
> > attempted to do the following (minimal code that displays the bug) to
> > check the availability of each ROACH, and this works fine when typed into
> > an interactive session, but when used in a program it fails with a
> > "Client not connected" error.
> >
> > roach_ip = ('roach1.solar.pvt','roach2.solar.pvt')
> > fpga = []
> > for roach in roach_ip:
> >     # Make connection to ROACHes
> >     fpga.append( corr.katcp_wrapper.FpgaClient(roach) )
> >     fpga[-1].ping()
> >
> > I am guessing that the difference is that something goes awry when run as
> > compiled code.  Still, this must be a fairly common thing to do.  I could
> > get around this by just connecting to each client once a second, sending
> > the delay values, and disconnecting, but this seems unnecessary overhead
> > compared with maintaining the open connection all of the time.
> >
> > Can someone suggest a fix, or a better method for what I want to do?
> >
> > Thanks,
> > Dale
>


Re: [casper] Real-time control of ROACH2 with KATCP

2013-06-10 Thread G Jones
I think this is a common thing run into: it takes a moment for the
FpgaClient to actually connect to the ROACH and this happens in the
background. The FpgaClient instance is returned immediately, and if
you try to talk to the ROACH too soon, it will complain about not
being connected.
In my opinion, the proper thing to do is to use the .is_connected()
method to see when it gets connected. I.e. something like:

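import time

timeout = 10.0  # seconds to wait; an assumed value, adjust to taste
tic = time.time()
while (time.time() - tic < timeout):
    # Poll until the background connection has actually been made
    if fpga[-1].is_connected():
        break
    time.sleep(0.01)
else:
    # The else branch runs only if the loop ended without hitting break
    print("fpga connection timed out")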
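The polling loop above can be wrapped in a small helper so that it is easy
to reuse for several boards. This is a sketch, not part of corr: the helper
name, its defaults, and the stand-in client class are invented here, and the
only thing it relies on is the .is_connected() method mentioned above.

```python
import time


def wait_until_connected(client, timeout=10.0, poll=0.01):
    """Poll client.is_connected() until it is true or timeout expires.

    Returns True if the client connected in time, False otherwise.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if client.is_connected():
            return True
        time.sleep(poll)
    return False


class FakeClient(object):
    """Stand-in for an FpgaClient, used only to exercise the helper.

    It reports "connected" after is_connected() has been called
    polls_needed times.
    """

    def __init__(self, polls_needed):
        self.remaining = polls_needed

    def is_connected(self):
        self.remaining -= 1
        return self.remaining < 0


print(wait_until_connected(FakeClient(3), timeout=1.0, poll=0.001))
```

In a real script one would call the helper right after appending each
FpgaClient to the list, and only ping or write registers once it returns
True.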

Glenn

On Mon, Jun 10, 2013 at 9:40 AM, Gary, Dale E.  wrote:
> Hi All,
>
> This may be a Python question rather than a CASPER/KATCP one, but I have not
> been able to solve it.  I am trying to connect to multiple ROACHes in order
> to write delays to the coarse delay registers once per second.  I attempted
> to do the following (minimal code that displays the bug) to check the
> availability of each ROACH, and this works fine when typed into an
> interactive session, but when used in a program it fails with a "Client not
> connected" error.
>
> roach_ip = ('roach1.solar.pvt','roach2.solar.pvt')
> fpga = []
> for roach in roach_ip:
>     # Make connection to ROACHes
>     fpga.append( corr.katcp_wrapper.FpgaClient(roach) )
>     fpga[-1].ping()
>
> I am guessing that the difference is that something goes awry when run as
> compiled code.  Still, this must be a fairly common thing to do.  I could
> get around this by just connecting to each client once a second, sending the
> delay values, and disconnecting, but this seems unnecessary overhead
> compared with maintaining the open connection all of the time.
>
> Can someone suggest a fix, or a better method for what I want to do?
>
> Thanks,
> Dale



[casper] Real-time control of ROACH2 with KATCP

2013-06-10 Thread Gary, Dale E.
Hi All,

This may be a Python question rather than a CASPER/KATCP one, but I have
not been able to solve it.  I am trying to connect to multiple ROACHes in
order to write delays to the coarse delay registers once per second.  I
attempted to do the following (minimal code that displays the bug) to check
the availability of each ROACH, and this works fine when typed into an
interactive session, but when used in a program it fails with a "Client not
connected" error.

import corr

roach_ip = ('roach1.solar.pvt','roach2.solar.pvt')
fpga = []
for roach in roach_ip:
    # Make connection to ROACHes
    fpga.append( corr.katcp_wrapper.FpgaClient(roach) )
    fpga[-1].ping()

I am guessing that the difference is that something goes awry when run as
compiled code.  Still, this must be a fairly common thing to do.  I could
get around this by just connecting to each client once a second, sending
the delay values, and disconnecting, but this seems unnecessary overhead
compared with maintaining the open connection all of the time.

Can someone suggest a fix, or a better method for what I want to do?

Thanks,
Dale


Re: [casper] Setting up 10gbe core for ROACH2

2013-06-10 Thread Marc Welz
On Mon, Jun 10, 2013 at 11:38 AM, Gary, Dale E.  wrote:
> Thanks for pointing out the ?tap-info command.  In fact, the problem I am
> having does not seem to be the tgtap server alone (although it may be
> implicated), and is certainly not due to too much ARP traffic.  I found that
> after the startup script is done the ARP table is not correct, but I can
> start an interactive Python session and issue the exact same commands as
> were run in the script and it works.  I tried moving the tap-start around in
> the script, adding time.sleep() in various places in case timing was an
> issue, but nothing seems to help.

Hmm... that is rather odd. If you feel like debugging it, note that you can
telnet to a roach multiple times - on a separate session type ?log-level trace
to dial up the debug output and then run your interactive or scripted commands,
and then compare the difference ... The debug output can be quite a lot; if
you wish to log directly to file and then diff things, use kcplog.

Anyway, and as an aside - glad you managed to get your system going

regards

marc



Re: [casper] Setting up 10gbe core for ROACH2

2013-06-10 Thread Jason Manley
There's a function in corr for manually configuring the 10GbE cores if you'd 
like to avoid tgtap: 

def config_10gbe_core(self,device_name,mac,ip,port,arp_table,gateway=1):
    """Hard-codes a 10GbE core with the provided params. It does a
    blindwrite, so there is no verification that configuration was
    successful (this is necessary since some of these registers are set by
    the fabric depending on traffic received).

    @param self  This object.
    @param device_name  String: name of the device.
    @param mac   integer: MAC address, 48 bits.
    @param ip    integer: IP address, 32 bits.
    @param port  integer: port of fabric interface (16 bits).
    @param arp_table  list of integers: MAC addresses (48 bits ea).
    """

Obviously if something in your network changes, you'll need to manually update 
this, whereas tgtap will auto-learn it. For this reason, we prefer to use 
tgtap, so that your ROACH board behaves like a normal computer would.
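Since the function takes the MAC and IP addresses as plain integers, a couple
of small conversion helpers can save some hex arithmetic. The sketch below is
not part of corr; the helper names are invented, and it only uses the
standard library.

```python
import socket
import struct


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' (or with dashes) to a 48-bit integer."""
    value = int(mac.replace(":", "").replace("-", ""), 16)
    if not 0 <= value < (1 << 48):
        raise ValueError("MAC address does not fit in 48 bits: %r" % mac)
    return value


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    return struct.unpack(">I", socket.inet_aton(ip))[0]


print(hex(mac_to_int("00:60:dd:46:23:fe")))
print(hex(ip_to_int("10.0.0.101")))
```

The resulting integers are in the form the docstring above asks for in the
mac, ip, and arp_table arguments.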

Jason

On 10 Jun 2013, at 13:38, Gary, Dale E. wrote:

> Hi Marc,
> 
> Thanks for pointing out the ?tap-info command.  In fact, the problem I am 
> having does not seem to be the tgtap server alone (although it may be 
> implicated), and is certainly not due to too much ARP traffic.  I found that 
> after the startup script is done the ARP table is not correct, but I can 
> start an interactive Python session and issue the exact same commands as were 
> run in the script and it works.  I tried moving the tap-start around in the 
> script, adding time.sleep() in various places in case timing was an issue, 
> but nothing seems to help.
> 
> I finally gave up and figured out how to set the ARP table manually.  The 
> Python code below does it, where tx_cor_name is a string corresponding to the 
> tap-device in the design.
> 
> # Manually set ARP table
> arp = fpga.read( tx_core_name, 256*8, 0x3000 )
> arp_tab = numpy.array( struct.unpack('>256Q', arp) )
> arp_tab[101] = 0x0060dd4623fe   # MAC address for 10.0.0.101 (DPP eth2)
> arp_tab[102] = 0x0060dd4623ff   # MAC address for 10.0.0.102 (DPP eth3)
> arp = struct.pack( '>256Q', *arp_tab.tolist() )
> fpga.write( tx_core_name, arp, 0x3000 )
> 
> I am still puzzled, but at least we have a solution that allows us to move 
> forward.
> 
> Regards,
> Dale




Re: [casper] Setting up 10gbe core for ROACH2

2013-06-10 Thread Gary, Dale E.
Hi Marc,

Thanks for pointing out the ?tap-info command.  In fact, the problem I am
having does not seem to be the tgtap server alone (although it may be
implicated), and is certainly not due to too much ARP traffic.  I found
that after the startup script is done the ARP table is not correct, but I
can start an interactive Python session and issue the exact same commands
as were run in the script and it works.  I tried moving the tap-start
around in the script, adding time.sleep() in various places in case timing
was an issue, but nothing seems to help.

I finally gave up and figured out how to set the ARP table manually.  The
Python code below does it, where tx_core_name is a string corresponding to
the tap-device in the design.

import struct
import numpy

# Manually set ARP table
arp = fpga.read( tx_core_name, 256*8, 0x3000 )
arp_tab = numpy.array( struct.unpack('>256Q', arp) )
arp_tab[101] = 0x0060dd4623fe   # MAC address for 10.0.0.101 (DPP eth2)
arp_tab[102] = 0x0060dd4623ff   # MAC address for 10.0.0.102 (DPP eth3)
arp = struct.pack( '>256Q', *arp_tab.tolist() )
fpga.write( tx_core_name, arp, 0x3000 )
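The unpack/modify/pack step above can be tried out without a ROACH attached.
The sketch below works on a plain byte string in the same layout (256
big-endian 64-bit entries, indexed by the last octet of the IP address, as in
the code above). The helper name and the all-zero starting table are
assumptions made for illustration; on real hardware the bytes would come from
fpga.read() and go back via fpga.write().

```python
import struct

ARP_ENTRIES = 256
ARP_FORMAT = ">%dQ" % ARP_ENTRIES  # 256 big-endian unsigned 64-bit words


def set_arp_entries(raw_table, updates):
    """Return a copy of raw_table with some MAC entries replaced.

    raw_table: byte string of ARP_ENTRIES * 8 bytes.
    updates:   dict mapping last IP octet -> 48-bit MAC as an integer.
    """
    macs = list(struct.unpack(ARP_FORMAT, raw_table))
    for last_octet, mac in updates.items():
        if not 0 <= mac < (1 << 48):
            raise ValueError("MAC does not fit in 48 bits: %#x" % mac)
        macs[last_octet] = mac
    return struct.pack(ARP_FORMAT, *macs)


blank = b"\x00" * (ARP_ENTRIES * 8)  # stand-in for the table read back
patched = set_arp_entries(blank, {101: 0x0060dd4623fe, 102: 0x0060dd4623ff})
print(len(patched))
```

Using a plain list avoids the numpy dependency, and the range check catches a
mistyped MAC before it is written to the core.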

I am still puzzled, but at least we have a solution that allows us to move
forward.

Regards,
Dale


Re: [casper] Setting up 10gbe core for ROACH2

2013-06-10 Thread Marc Welz
On Fri, Jun 7, 2013 at 1:14 PM, Gary, Dale E.  wrote:
> Hi All,

Hello

> We previously were successful in setting up the 10gbe core for ROACH1 using
> tap_start, which I believe resulted in a process tgtap appearing in the
> ROACH.
>
> I am using the same method for ROACH2 (using the latest ROACH2 rev 2 file
> system loaded via netboot), and there is no reported error, but the ARP
> table does not contain the MAC address for the destination IP address, and
> when I check the processes running on the ROACH2 I do not see tgtap.

As mentioned by others: The tgtap logic on roach2 runs
internal to tcpborphserver3.

There should be a ?tap-info command you could run
to look at the arp table as seen by the roach2 - it will
show timeout information not visible using the current python
interface.

A while ago I tried improving the logic which generates arp traffic,
the code is at github.com/ska-sa/katcp_devel/blob/master/tcpborphserver3/tg.c,
in particular in function run_timer_tap - the code path which might still be
problematic is on line 821 (if(burst < 1)). You could increase 1 to some larger
number, or remove the test entirely.

During the week I may add some logic to check for starvation problems.

However, there are fundamental limits at play: The interface between the PPC
and FPGA is pretty slow/inefficient when compared to what can come in on
10GbE, so it is important to make sure that the PPC doesn't get swamped - and
the greatest sources of traffic in this regard are other roaches sending arp
requests (an N^2 problem) - so on roach2 I have tried to send out arp traffic
reasonably conservatively - maybe even too conservatively ...
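To put a number on the N^2 remark, here is a back-of-the-envelope sketch in
Python. It assumes each roach sends one arp request per peer in every refresh
round, which is a simplification and not a description of what tg.c actually
does; the function name is invented.

```python
def arp_requests_per_round(n_roaches):
    """Total arp requests if every roach queries every other roach once."""
    return n_roaches * (n_roaches - 1)


for n in (4, 16, 64):
    print("%3d roaches -> %5d requests per round" % (n, arp_requests_per_round(n)))
```

Doubling the number of boards roughly quadruples the request count, which is
why the request rate has to be kept low.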

regards

marc