[casper] adc083000 on ROACH2
Hi everyone,

Does anyone have a ROACH2-compatible pcore for the adc083000? The auto-update of the clock manager from DCM to MMCM doesn't seem to work properly. I remember this issue being discussed before, but I can't recall whether it was ever resolved.

Thanks,
Suraj
Re: [casper] 38 PFB's on a virtex-6?
On 10 June 2013 17:07, David Saroff wrote:
> Short of compiling a design, can the resource usage of a PFB yellow block
> be seen?
> If the data rate is some submultiple of the FPGA clock, say 50 MSPS and
> 200 MHz, is there a natural way to share resources?
>
> The question's context:
> 38 dipole antennae of the focal plane array for the Green Bank Telescope.
> Signal from each of the 38 is sampled at 50 MSPS, digitized to 12 bits.
> What frequency resolution fits on a Virtex-6? That is, what is the number
> of taps P and FFT bins n that will fit?
>
> Count multipliers as DSP48 blocks.
>
> 2000/38 =~ 50 DSP48 blocks per signal. That doesn't seem like enough. If
> there is a way to resource share, is there a factor of 200MHz/50MHz = 4
> available ideally?

Since I think you're talking about using the 64-input ADC, this is the default input format. I.e., the 64 x 50 MSa/s channels are presented to the FPGA as 16 x 200 MSa/s streams. There is a version of the CASPER PFB-FIR block designed to deal with this, and with appropriate reordering, a normal CASPER FFT works too.

> Then it looks more like 4 * 50 = 200 DSP48's per
> channel. That still doesn't seem like enough.

This is actually quite a lot. A PFB only costs 1 multiplier per tap per stream, and an FFT's cost scales only with log(N).

> What about a sample rate of 2.5 MHz? Then the potential reuse multiplier
> is 200MHz/2.5MHz = 80 and the number of multiplies per channel is a more
> comfortable 80 * 50 = 4000. The bandwidth is less for 2.5 MSPS vs. 50
> MSPS, by a factor of 1/20th. That makes fewer FFT bins n necessary, so
> might the advantage of lower sampling rates and narrower bandwidths be
> quadratic?
>
> Summary
> 1) When we parametrize a PFB, can we conveniently see how many DSP48s and
> slices, etc. it requires?

As a rule of thumb the PFB FIR will cost you one multiplier per tap, and the FFT ~2log(N) (where the 2 is from 4 multipliers per complex mult / 2 real streams FFT'd as one complex). [I'm sure someone will correct me if this isn't right.] Rurik wrote a nice memo about PFB utilization (though geared towards wide-bandwidth implementations): https://www.cfa.harvard.edu/twiki/pub/SMAwideband/MemoSeries/sma_wideband_utilization_1.pdf

> 2) If a PFB receives samples slower than the system clock, is there a way
> to share it between channels?

I don't know what the capabilities of the Xilinx libraries are, but with the 64-input ADC we have been using a slightly modified version of the PFB block (https://github.com/oxfork/mlib_devel/tree/master/ox_library) which accepts time-multiplexed data streams. We then reorder and use the normal CASPER FFT. FWIW, with this setup we can comfortably do 4 taps, 1024 channels (2048-pt FFTs) with 32 antennas on ROACH 1. We run out of RAM long before multipliers.

Cheers,
Jack

> Some embarrassment for the beginner's questions.
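To put the rule of thumb above into numbers, a rough back-of-the-envelope estimator might look like the following sketch. The constants are only the approximations quoted in this thread (one multiplier per tap per stream for the PFB FIR, ~2*log2(N) for the FFT), and the example figures of 4 taps, 1024 channels and 16 time-multiplexed streams are taken from the discussion; nothing here is an exact Xilinx figure.

import math

# Rough DSP48 estimator based on the rules of thumb quoted above:
#   PFB FIR : ~1 multiplier per tap per (real) stream
#   FFT     : ~2*log2(N) multipliers per FFT (4 mults per complex multiply,
#             two real streams packed into one complex FFT)
# Actual usage depends on coefficient widths, whether multipliers map to
# DSP48s or fabric, and the FFT architecture chosen.

def pfb_fir_dsp48(n_taps, n_streams):
    """Approximate DSP48s for a CASPER PFB FIR front end."""
    return n_taps * n_streams

def fft_dsp48(n_channels, n_streams):
    """Approximate DSP48s for the FFT stage (two real streams share one FFT)."""
    fft_size = 2 * n_channels                  # real-sampled input
    per_fft = 2 * int(math.log2(fft_size))     # ~2*log2(N) multipliers
    return per_fft * int(math.ceil(n_streams / 2.0))

if __name__ == "__main__":
    taps, channels, streams = 4, 1024, 16      # 64 x 50 MSa/s inputs presented
                                               # as 16 x 200 MSa/s streams
    total = pfb_fir_dsp48(taps, streams) + fft_dsp48(channels, streams)
    print("Estimated DSP48s: %d" % total)

For the 4-tap, 1024-channel, 16-stream case this comes out to roughly 240 DSP48s, well inside the ~2000 DSP48s counted for the Virtex-6 in the original question, consistent with the observation that RAM runs out before multipliers do.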
Re: [casper] 38 PFB's on a virtex-6?
Hi David,

In terms of resource usage, I'd recommend having a read of this memo by Rurik Primiani on the topic:
https://www.cfa.harvard.edu/twiki/pub/SMAwideband/MemoSeries/sma_wideband_utilization_1.pdf

There is a pfb_fir_mux which I suspect will do what you're after. There's a copy in https://github.com/oxfork/mlib_devel/tree/master/ox_library (maintained by https://github.com/jack-h).

Regards,
Danny
Re: [casper] 38 PFB's on a virtex-6?
Hi David,

Traditionally, the CASPER libraries are much better at processing data at many times the FPGA clock rate than the other way around. So while your suggestion of reusing multipliers is of course a good one, it's generally not well supported in the CASPER libraries. You might consider using Xilinx filters for the PFB_FIR section (perhaps) followed by CASPER FFTs. The FFTs could be shared amongst many lower-bandwidth signals by sending one frame of data at a time through the FFT.

You can estimate the resource usage by drilling down into the blocks (look under mask) and manually counting. Or you can make a couple of test designs, count the resources (PlanAhead is convenient for this) and then work out how the usage scales. It's good to check this against your estimate to make sure it's doing things the way you expect. Most blocks have an option to explicitly use DSP48s or general logic for the implementation.

Glenn

On Mon, Jun 10, 2013 at 12:07 PM, David Saroff wrote:
> Short of compiling a design, can the resource usage of a PFB yellow block
> be seen?
>
> If the data rate is some submultiple of the FPGA clock, say 50 MSPS and
> 200 MHz, is there a natural way to share resources?
>
> The question's context:
> 38 dipole antennae of the focal plane array for the Green Bank Telescope.
> Signal from each of the 38 is sampled at 50 MSPS, digitized to 12 bits.
> What frequency resolution fits on a Virtex-6? That is, what is the number
> of taps P and FFT bins n that will fit?
>
> Count multipliers as DSP48 blocks.
>
> 2000/38 =~ 50 DSP48 blocks per signal. That doesn't seem like enough. If
> there is a way to resource share, is there a factor of 200MHz/50MHz = 4
> available ideally? Then it looks more like 4 * 50 = 200 DSP48's per
> channel. That still doesn't seem like enough.
>
> What about a sample rate of 2.5 MHz? Then the potential reuse multiplier
> is 200MHz/2.5MHz = 80 and the number of multiplies per channel is a more
> comfortable 80 * 50 = 4000. The bandwidth is less for 2.5 MSPS vs. 50
> MSPS, by a factor of 1/20th. That makes fewer FFT bins n necessary, so
> might the advantage of lower sampling rates and narrower bandwidths be
> quadratic?
>
> Summary
> 1) When we parametrize a PFB, can we conveniently see how many DSP48s and
> slices, etc. it requires?
>
> 2) If a PFB receives samples slower than the system clock, is there a way
> to share it between channels?
>
> Some embarrassment for the beginner's questions.
[casper] 38 PFB's on a virtex-6?
Short of compiling a design, can the resource usage of a PFB yellow block be seen?

If the data rate is some submultiple of the FPGA clock, say 50 MSPS and 200 MHz, is there a natural way to share resources?

The question's context: 38 dipole antennae of the focal plane array for the Green Bank Telescope. Signal from each of the 38 is sampled at 50 MSPS, digitized to 12 bits. What frequency resolution fits on a Virtex-6? That is, what is the number of taps P and FFT bins n that will fit? Count multipliers as DSP48 blocks.

2000/38 =~ 50 DSP48 blocks per signal. That doesn't seem like enough. If there is a way to resource share, is there a factor of 200MHz/50MHz = 4 available ideally? Then it looks more like 4 * 50 = 200 DSP48's per channel. That still doesn't seem like enough.

What about a sample rate of 2.5 MHz? Then the potential reuse multiplier is 200MHz/2.5MHz = 80 and the number of multiplies per channel is a more comfortable 80 * 50 = 4000. The bandwidth is less for 2.5 MSPS vs. 50 MSPS, by a factor of 1/20th. That makes fewer FFT bins n necessary, so might the advantage of lower sampling rates and narrower bandwidths be quadratic?

Summary
1) When we parametrize a PFB, can we conveniently see how many DSP48s and slices, etc. it requires?
2) If a PFB receives samples slower than the system clock, is there a way to share it between channels?

Some embarrassment for the beginner's questions.
Re: [casper] Real-time control of ROACH2 with KATCP
Ah, thanks Glenn! I should have figured that out...

Dale

On Mon, Jun 10, 2013 at 1:46 PM, G Jones wrote:
> I think this is a common thing to run into: it takes a moment for the
> FpgaClient to actually connect to the ROACH, and this happens in the
> background. The FpgaClient instance is returned immediately, and if
> you try to talk to the ROACH too soon, it will complain about not
> being connected.
> In my opinion, the proper thing to do is to use the .is_connected()
> method to see when it gets connected. I.e. something like:
>
> tic = time.time()
> while (time.time() - tic < timeout):
>     if fpga[-1].is_connected():
>         break
>     time.sleep(0.01)
> else:
>     print "fpga connection timed out"
>
> Glenn
>
> On Mon, Jun 10, 2013 at 9:40 AM, Gary, Dale E. wrote:
> > Hi All,
> >
> > This may be a Python question rather than a CASPER/KATCP one, but I have not
> > been able to solve it. I am trying to connect to multiple ROACHes in order
> > to write delays to the coarse delay registers once per second. I attempted
> > to do the following (minimal code that displays the bug) to check the
> > availability of each ROACH, and this works fine when typed into an
> > interactive session, but when used in a program it fails with a "Client not
> > connected" error.
> >
> > roach_ip = ('roach1.solar.pvt','roach2.solar.pvt')
> > fpga = []
> > for roach in roach_ip:
> >     # Make connection to ROACHes
> >     fpga.append( corr.katcp_wrapper.FpgaClient(roach) )
> >     fpga[-1].ping()
> >
> > I am guessing that the difference is that something goes awry when run as
> > compiled code. Still, this must be a fairly common thing to do. I could
> > get around this by just connecting to each client once a second, sending the
> > delay values, and disconnecting, but this seems unnecessary overhead
> > compared with maintaining the open connection all of the time.
> >
> > Can someone suggest a fix, or a better method for what I want to do?
> >
> > Thanks,
> > Dale
Re: [casper] Real-time control of ROACH2 with KATCP
I think this is a common thing to run into: it takes a moment for the FpgaClient to actually connect to the ROACH, and this happens in the background. The FpgaClient instance is returned immediately, and if you try to talk to the ROACH too soon, it will complain about not being connected.

In my opinion, the proper thing to do is to use the .is_connected() method to see when it gets connected. I.e. something like:

tic = time.time()
while (time.time() - tic < timeout):
    if fpga[-1].is_connected():
        break
    time.sleep(0.01)
else:
    print "fpga connection timed out"

Glenn

On Mon, Jun 10, 2013 at 9:40 AM, Gary, Dale E. wrote:
> Hi All,
>
> This may be a Python question rather than a CASPER/KATCP one, but I have not
> been able to solve it. I am trying to connect to multiple ROACHes in order
> to write delays to the coarse delay registers once per second. I attempted
> to do the following (minimal code that displays the bug) to check the
> availability of each ROACH, and this works fine when typed into an
> interactive session, but when used in a program it fails with a "Client not
> connected" error.
>
> roach_ip = ('roach1.solar.pvt','roach2.solar.pvt')
> fpga = []
> for roach in roach_ip:
>     # Make connection to ROACHes
>     fpga.append( corr.katcp_wrapper.FpgaClient(roach) )
>     fpga[-1].ping()
>
> I am guessing that the difference is that something goes awry when run as
> compiled code. Still, this must be a fairly common thing to do. I could
> get around this by just connecting to each client once a second, sending the
> delay values, and disconnecting, but this seems unnecessary overhead
> compared with maintaining the open connection all of the time.
>
> Can someone suggest a fix, or a better method for what I want to do?
>
> Thanks,
> Dale
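Putting this suggestion together with the snippet from the original post, a complete version might look something like the sketch below. It assumes the same corr.katcp_wrapper.FpgaClient interface used throughout this thread; the hostnames and the 10-second timeout are only illustrative.

import time
import corr

# Sketch: connect to several ROACHes and wait for each client to come up
# before using it. Hostnames and the timeout value are placeholders.
roach_ips = ('roach1.solar.pvt', 'roach2.solar.pvt')
timeout = 10.0   # seconds to wait for each connection

fpgas = [corr.katcp_wrapper.FpgaClient(ip) for ip in roach_ips]

for ip, fpga in zip(roach_ips, fpgas):
    tic = time.time()
    while time.time() - tic < timeout:
        if fpga.is_connected():
            break
        time.sleep(0.01)
    else:
        raise RuntimeError('Connection to %s timed out' % ip)

# The clients can now be used repeatedly, e.g. once per second:
for fpga in fpgas:
    fpga.ping()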
[casper] Real-time control of ROACH2 with KATCP
Hi All,

This may be a Python question rather than a CASPER/KATCP one, but I have not been able to solve it. I am trying to connect to multiple ROACHes in order to write delays to the coarse delay registers once per second. I attempted to do the following (minimal code that displays the bug) to check the availability of each ROACH, and this works fine when typed into an interactive session, but when used in a program it fails with a "Client not connected" error.

roach_ip = ('roach1.solar.pvt','roach2.solar.pvt')
fpga = []
for roach in roach_ip:
    # Make connection to ROACHes
    fpga.append( corr.katcp_wrapper.FpgaClient(roach) )
    fpga[-1].ping()

I am guessing that the difference is that something goes awry when run as compiled code. Still, this must be a fairly common thing to do. I could get around this by just connecting to each client once a second, sending the delay values, and disconnecting, but this seems unnecessary overhead compared with maintaining the open connection all of the time.

Can someone suggest a fix, or a better method for what I want to do?

Thanks,
Dale
Re: [casper] Setting up 10gbe core for ROACH2
On Mon, Jun 10, 2013 at 11:38 AM, Gary, Dale E. wrote:
> Thanks for pointing out the ?tap-info command. In fact, the problem I am
> having does not seem to be the tgtap server alone (although it may be
> implicated), and is certainly not due to too much ARP traffic. I found that
> after the startup script is done the ARP table is not correct, but I can
> start an interactive Python session and issue the exact same commands as
> were run in the script and it works. I tried moving the tap-start around in
> the script, adding time.sleep() in various places in case timing was an
> issue, but nothing seems to help.

Hmm... that is rather odd. If you feel like debugging it, note that you can telnet to a ROACH multiple times: in a separate session, type ?log-level trace to dial up the debug output, then run your interactive or scripted commands and compare the difference. The debug output can be quite a lot; if you wish to log directly to a file and then diff things, use kcplog.

Anyway, and aside: glad you managed to get your system going.

regards
marc
Re: [casper] Setting up 10gbe core for ROACH2
There's a function in corr for manually configuring the 10GbE cores if you'd like to avoid tgtap:

def config_10gbe_core(self, device_name, mac, ip, port, arp_table, gateway=1):
    """Hard-codes a 10GbE core with the provided params. It does a blindwrite,
    so there is no verification that configuration was successful (this is
    necessary since some of these registers are set by the fabric depending
    on traffic received).
    @param self This object.
    @param device_name String: name of the device.
    @param mac integer: MAC address, 48 bits.
    @param ip integer: IP address, 32 bits.
    @param port integer: port of fabric interface (16 bits).
    @param arp_table list of integers: MAC addresses (48 bits ea).
    """

Obviously if something in your network changes, you'll need to manually update this, whereas tgtap will auto-learn it. For this reason, we prefer to use tgtap, so that your ROACH board behaves like a normal computer would.

Jason

On 10 Jun 2013, at 13:38, Gary, Dale E. wrote:
> Hi Marc,
>
> Thanks for pointing out the ?tap-info command. In fact, the problem I am
> having does not seem to be the tgtap server alone (although it may be
> implicated), and is certainly not due to too much ARP traffic. I found that
> after the startup script is done the ARP table is not correct, but I can
> start an interactive Python session and issue the exact same commands as were
> run in the script and it works. I tried moving the tap-start around in the
> script, adding time.sleep() in various places in case timing was an issue,
> but nothing seems to help.
>
> I finally gave up and figured out how to set the ARP table manually. The
> Python code below does it, where tx_core_name is a string corresponding to
> the tap device in the design.
>
> # Manually set ARP table
> arp = fpga.read( tx_core_name, 256*8, 0x3000 )
> arp_tab = numpy.array( struct.unpack('>256Q', arp) )
> arp_tab[101] = 0x0060dd4623fe  # MAC address for 10.0.0.101 (DPP eth2)
> arp_tab[102] = 0x0060dd4623ff  # MAC address for 10.0.0.102 (DPP eth3)
> arp = struct.pack( '>256Q', *arp_tab.tolist() )
> fpga.write( tx_core_name, arp, 0x3000 )
>
> I am still puzzled, but at least we have a solution that allows us to move
> forward.
>
> Regards,
> Dale
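For reference, a call to that function might look roughly like the following. This is a sketch only: the device name 'gbe0', the source MAC, IP and port values, and the default ARP entry are placeholders, while the two destination MACs are borrowed from the message quoted above.

import corr

# Sketch of a manual 10GbE core configuration using config_10gbe_core.
# All addresses, the port and the device name are placeholders.
fpga = corr.katcp_wrapper.FpgaClient('roach1.solar.pvt')  # example hostname
# (wait for the connection, as discussed in the KATCP thread earlier in this digest)

arp_table = [0xFFFFFFFFFFFF] * 256        # 256 entries, indexed by last IP octet
arp_table[101] = 0x0060dd4623fe           # MAC for 10.0.0.101 (from this thread)
arp_table[102] = 0x0060dd4623ff           # MAC for 10.0.0.102 (from this thread)

fpga.config_10gbe_core('gbe0',            # name of the 10GbE device in the design
                       0x02004A303130,    # source MAC, 48 bits (placeholder)
                       10*(2**24) + 100,  # source IP 10.0.0.100 as a 32-bit int
                       60000,             # fabric UDP port (placeholder)
                       arp_table)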
Re: [casper] Setting up 10gbe core for ROACH2
Hi Marc,

Thanks for pointing out the ?tap-info command. In fact, the problem I am having does not seem to be the tgtap server alone (although it may be implicated), and is certainly not due to too much ARP traffic. I found that after the startup script is done the ARP table is not correct, but I can start an interactive Python session and issue the exact same commands as were run in the script and it works. I tried moving the tap-start around in the script, adding time.sleep() in various places in case timing was an issue, but nothing seems to help.

I finally gave up and figured out how to set the ARP table manually. The Python code below does it, where tx_core_name is a string corresponding to the tap device in the design.

# Manually set ARP table
arp = fpga.read( tx_core_name, 256*8, 0x3000 )
arp_tab = numpy.array( struct.unpack('>256Q', arp) )
arp_tab[101] = 0x0060dd4623fe  # MAC address for 10.0.0.101 (DPP eth2)
arp_tab[102] = 0x0060dd4623ff  # MAC address for 10.0.0.102 (DPP eth3)
arp = struct.pack( '>256Q', *arp_tab.tolist() )
fpga.write( tx_core_name, arp, 0x3000 )

I am still puzzled, but at least we have a solution that allows us to move forward.

Regards,
Dale
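The same read-modify-write can be wrapped in a small helper if several entries need to be set by IP octet. Here is one possible sketch, built directly on the snippet above: the 0x3000 offset and the '>256Q' layout come from that snippet, while the function name and arguments are illustrative.

import struct
import numpy

# Sketch: read-modify-write helper for the 10GbE core's ARP table. Assumes
# the table sits at offset 0x3000 and holds 256 big-endian 64-bit entries
# indexed by the last octet of the IP address, as in the snippet above.
ARP_OFFSET = 0x3000
ARP_FORMAT = '>256Q'

def set_arp_entries(fpga, core_name, mac_by_octet):
    """Set ARP entries on core_name; mac_by_octet maps last IP octet -> MAC."""
    raw = fpga.read(core_name, 256 * 8, ARP_OFFSET)
    arp_tab = numpy.array(struct.unpack(ARP_FORMAT, raw))
    for octet, mac in mac_by_octet.items():
        arp_tab[octet] = mac
    fpga.write(core_name, struct.pack(ARP_FORMAT, *arp_tab.tolist()), ARP_OFFSET)

# Example, using the MAC addresses from the message above:
# set_arp_entries(fpga, tx_core_name,
#                 {101: 0x0060dd4623fe, 102: 0x0060dd4623ff})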
Re: [casper] Setting up 10gbe core for ROACH2
On Fri, Jun 7, 2013 at 1:14 PM, Gary, Dale E. wrote:
> Hi All,

Hello

> We previously were successful in setting up the 10gbe core for ROACH1 using
> tap_start, which I believe resulted in a process tgtap appearing in the
> ROACH.
>
> I am using the same method for ROACH2 (using the latest ROACH2 rev 2 file
> system loaded via netboot), and there is no reported error, but the ARP
> table does not contain the MAC address for the destination IP address, and
> when I check the processes running on the ROACH2 I do not see tgtap.

As mentioned by others: the tgtap logic on ROACH2 runs internal to tcpborphserver3. There should be a ?tap-info command you can run to look at the ARP table as seen by the ROACH2; it will show timeout information not visible through the current Python interface.

A while ago I tried improving the logic which generates ARP traffic; the code is at github.com/ska-sa/katcp_devel/blob/master/tcpborphserver3/tg.c, in particular in the function run_timer_tap. The code path which might still be problematic is on line 821 (if(burst < 1)). You could increase the 1 to some larger number, or remove the test entirely. During the week I may add some logic to check for starvation problems.

However, there are fundamental limits at play: the interface between the PPC and the FPGA is pretty slow/inefficient compared to what can come in on 10GbE, so it is important to make sure that the PPC doesn't get swamped. The greatest sources of traffic in this regard are other ROACHes sending ARP requests (an N^2 problem), so on ROACH2 I have tried to send out ARP traffic reasonably conservatively - maybe even too conservatively...

regards
marc