Re: [casper] wideband conversion and correlation
hi bob,

i agree with dave that option #2 in your email is best, because you only have to oversample each PFB sub-band by 15% to 20%, instead of using 50% overlap, so the F engine computing cost is only slightly increased, and you can still get the flat passband response you want in your FFX correlator.

here's a strawperson FFX correlator using oversampling:

1) divide a 16 GHz band up into two 8 GHz bands using analog techniques. if your ADC has 16 GHz bandwidth, then you can use a diplexer - no mixers are needed.

2) digitize each 8 GHz band using a 16 Gsps ADC board and a Roach II. (you will need to design this ADC board, or perhaps Dave H. will do this).

3) break the 8 GHz bands up into 8 sub-bands of 1 GHz each using an 8 tap PFB or 8 DDCs. oversample the data by 20% (1.2 GHz bandwidth per sub-band) and transmit the eight sub-bands using Roach II's eight ports. (use 4 bit real, 4 bit imaginary, XAUI protocol).

4) feed the above data into 8 FX correlators, each with 1.2 GHz bandwidth:

4a) each 1.2 GHz FX correlator consists of 8 Roach II boards and a 16 port 10Gbit switch and is implemented as follows:

4b) each Roach II board in the 1.2 GHz FX correlator serves as both a 1.2 GHz bandwidth dual pol single antenna F engine and a 125 MHz bandwidth eight antenna dual pol X engine. the libraries for these F and X blocks are available from the casper packetized correlator designs. the Roach II board receives 1.2 GHz sub-bands via two XAUI links from two polarizations from one antenna (from step 3 above), then breaks the two 1.2 GHz bands up into about 4K channels, packetizes the data and transmits 1 GHz of the 1.2 GHz band out over a pair of 10Gbit ethernet links to a 10GbE switch (4 bit real, 4 bit imaginary data). (there's no need to transmit or correlate the overlapping parts of the 1.2 GHz band, so 100 MHz on each side is discarded).
the switch implements the FX corner turn, and sends 125 MHz bands back to each Roach II for correlation over the same pair of 10Gbit ethernet links. each Roach II FPGA also contains an 8 antenna X engine for 125 MHz bandwidth.

if you'd like to discuss some time, please give me a call.

best wishes,

dan

On 12/29/2010 9:09 AM, David Hawkins wrote:

Hi Bob,

I believe the CASPER implementation of the PFB resamples the (output) channel data streams at a rate consistent with the channel spacing, resulting in an overall output data rate that matches the input data rate.

The PFB output channels can alternatively be resampled at a higher data rate. This allows for a wider channel transition on the individual channels, but would result in a higher total output data rate than input data rate. This higher data rate would need to be accommodated between the output of the coarse channelizer PFB and the second fine channelizer PFB. The transition channels would be discarded before sending the data to the cross-correlator.

Given that the FFX F-to-F path is point-to-point, there would be no need to use packetized data, so the higher bandwidth might be accommodated by operating FPGA-to-FPGA transceiver links as synchronous links, e.g. given a XAUI lane nominally operated at 3.125 Gbps, operate it as a synchronous link at say 6.5 Gbps. The maximum bandwidth of the F-to-F path would help determine what your output channel resample rate should be.

I believe Fred Harris' book [1] has a discussion on this, in Chapter 9 'Polyphase Channelizers'.

Cheers,
Dave

[1] F. J. Harris, "Multirate Signal Processing for Communication Systems", 2004.

Robert Wilson wrote:

Dear Dan et al.,

Seasons Greetings. I hope that you have had less rain than we had snow.

Previously I have suggested stacking two layers of PFBs in the case where the sampler runs faster than a single FPGA can be a complete F engine for.
Although I can't find an email with your earlier suggestion, I believe that this is what you are calling an FFX correlator. A couple of weeks ago, I was thinking about this and realized that, at least for our purposes it will not work. We would like to cover a broad band with relatively fine spectral resolution and do not want holes in our spectral coverage unless there is a big penalty for avoiding them. Think about the first PFB which divides up the original band into, say, 8 blocks. Suppose we use the Micram ADC30 and convert 9 GHz at a time. Then each block is 1.125 GHz wide. We will want to be able to divide that into 32K (or perhaps even 64K) channels. Now consider the edge of the band covered by that block. The PFB can be designed to have a very sharp cutoff at the edge, perhaps 30 dB. This will avoid aliasing from adjacent bands, but the channels near each edge will be ~ 30 dB down and effectively useless. The block will be sampled at the Nyquist rate, so one can not design the filter a bit wider and throw away the edge channels in the second
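
A quick arithmetic check of the strawperson numbers in dan's proposal at the top of this message may be useful. The following is only a sketch: the figures (8 GHz per ADC, 8 sub-bands, 20% oversampling, 4 bit real plus 4 bit imaginary samples, 8 X engines) are taken from the message, while the function name and dictionary keys are made up for illustration and are not CASPER library identifiers.

```python
# Back-of-envelope data-rate check for the oversampled FFX strawperson.
# All figures come from the message; the names here are illustrative
# assumptions, not CASPER library identifiers.

def ffx_budget(adc_band_mhz=8000, n_subbands=8, oversample_pct=20,
               bits_per_sample=8, n_x_engines=8):
    useful_mhz = adc_band_mhz // n_subbands                 # 1 GHz kept per sub-band
    sent_mhz = useful_mhz * (100 + oversample_pct) // 100   # 1.2 GHz sent on each port
    guard_mhz = (sent_mhz - useful_mhz) // 2                # discarded on each side
    return {
        "useful_mhz": useful_mhz,
        "sent_mhz": sent_mhz,
        "guard_mhz_per_side": guard_mhz,
        # complex samples: sample rate equals bandwidth, 4+4 bits per sample
        "f_to_f_link_mbps": sent_mhz * bits_per_sample,
        # after the fine PFB only the useful 1 GHz is sent on to the switch
        "f_to_x_mbps_per_pol": useful_mhz * bits_per_sample,
        # each X engine takes an equal share of the useful band
        "x_engine_mhz": useful_mhz // n_x_engines,
    }

budget = ffx_budget()
for key, value in budget.items():
    print(f"{key:>20}: {value}")
```

With these inputs each coarse sub-band needs 9600 Mbit/s, which sits just under the 10 Gbit/s payload of a XAUI port (4 lanes at 3.125 Gbaud with 8b/10b coding), and each polarization sends 8000 Mbit/s towards the switch before any packet overhead is counted.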
Re: [casper] wideband conversion and correlation
Hi Bob,

I believe the CASPER implementation of the PFB resamples the (output) channel data streams at a rate consistent with the channel spacing, resulting in an overall output data rate that matches the input data rate.

The PFB output channels can alternatively be resampled at a higher data rate. This allows for a wider channel transition on the individual channels, but would result in a higher total output data rate than input data rate. This higher data rate would need to be accommodated between the output of the coarse channelizer PFB and the second fine channelizer PFB. The transition channels would be discarded before sending the data to the cross-correlator.

Given that the FFX F-to-F path is point-to-point, there would be no need to use packetized data, so the higher bandwidth might be accommodated by operating FPGA-to-FPGA transceiver links as synchronous links, e.g. given a XAUI lane nominally operated at 3.125 Gbps, operate it as a synchronous link at say 6.5 Gbps. The maximum bandwidth of the F-to-F path would help determine what your output channel resample rate should be.

I believe Fred Harris' book [1] has a discussion on this, in Chapter 9 'Polyphase Channelizers'.

Cheers,
Dave

[1] F. J. Harris, "Multirate Signal Processing for Communication Systems", 2004.

Robert Wilson wrote:

Dear Dan et al.,

Seasons Greetings. I hope that you have had less rain than we had snow.

Previously I have suggested stacking two layers of PFBs in the case where the sampler runs faster than a single FPGA can be a complete F engine for. Although I can't find an email with your earlier suggestion, I believe that this is what you are calling an FFX correlator. A couple of weeks ago, I was thinking about this and realized that, at least for our purposes, it will not work. We would like to cover a broad band with relatively fine spectral resolution and do not want holes in our spectral coverage unless there is a big penalty for avoiding them.
Think about the first PFB which divides up the original band into, say, 8 blocks. Suppose we use the Micram ADC30 and convert 9 GHz at a time. Then each block is 1.125 GHz wide. We will want to be able to divide that into 32K (or perhaps even 64K) channels. Now consider the edge of the band covered by that block. The PFB can be designed to have a very sharp cutoff at the edge, perhaps 30 dB. This will avoid aliasing from adjacent bands, but the channels near each edge will be ~ 30 dB down and effectively useless. The block will be sampled at the Nyquist rate, so one can not design the filter a bit wider and throw away the edge channels in the second stage. I have seen two designs which I believe are attempts to deal with this problem: Mark Torres described a design in which the first stage PFB is actually duplicated as two PFBs shifted by half of a channel width. The PFBs can have simple FIR filters as only the central half of each will be used after the second stages. This certainly solves the problem, but it looks to me as though it requires almost twice as much computing in the F engines and the data rate out of the first stage will be twice the input rate. I believe that I have also seen designs in which the first stage is done with overlapping FIR filters. I don't know how much more computing that requires than the PFB, but the data rate is only modestly increased as is the computing in the second stage PFBs. The latter is probably the preferred solution, but there are two places where the CASPER PFB could be split. After the FIR filter and between the two stages of the FFT. This would allow sharing the load with up to three FPGAs. There would be no increase in data communications rate in this option. I have discussed this with Alan Rogers who offered to think about efficient solutions to the problem. In their VLBI processors, they split the input band into separate channels with a PFB to emulate the original analog filters. 
Apparently they have not worried about complete spectral coverage with the multitap off-line cross correlator. I wonder if there are other solutions to this problem. Regards, Bob Wilson On Fri, 24 Dec 2010, Dan Werthimer wrote: hi jason, jonathan, regarding jason's concerns below about corner turns and 10Gbit links: in the FFX model that i propose, where the first FGPA breaks up the 9 GHz band up into 8 pieces of 1.25 GHz each, there is no corner turner needed, as the 8 frequency bands emerge from the PFB in eight parallel paths and each path goes separately to it's own XAUI or 10Gbit ethernet port. the FFX design doesn't require any block ram or QDR: all the coefficients in the 8 channel PFB/FFT are constants, and there are no BRAM delays in the FFT, only registers, as the FFT is implemented with fully parallel inputs and outputs. (the FFT is implemented like a text book diagram of an 8 input FFT with all the butterfly's done in parallel). or instead of an 8 channel PFB, the channelization can
Re: [casper] wideband conversion and correlation
Dear Dan et al.,

Seasons Greetings. I hope that you have had less rain than we had snow.

Previously I have suggested stacking two layers of PFBs in the case where the sampler runs faster than a single FPGA can be a complete F engine for. Although I can't find an email with your earlier suggestion, I believe that this is what you are calling an FFX correlator. A couple of weeks ago, I was thinking about this and realized that, at least for our purposes, it will not work. We would like to cover a broad band with relatively fine spectral resolution and do not want holes in our spectral coverage unless there is a big penalty for avoiding them.

Think about the first PFB, which divides up the original band into, say, 8 blocks. Suppose we use the Micram ADC30 and convert 9 GHz at a time. Then each block is 1.125 GHz wide. We will want to be able to divide that into 32K (or perhaps even 64K) channels. Now consider the edge of the band covered by that block. The PFB can be designed to have a very sharp cutoff at the edge, perhaps 30 dB. This will avoid aliasing from adjacent bands, but the channels near each edge will be ~ 30 dB down and effectively useless. The block will be sampled at the Nyquist rate, so one cannot design the filter a bit wider and throw away the edge channels in the second stage.

I have seen two designs which I believe are attempts to deal with this problem:

Mark Torres described a design in which the first stage PFB is actually duplicated as two PFBs shifted by half of a channel width. The PFBs can have simple FIR filters, as only the central half of each will be used after the second stages. This certainly solves the problem, but it looks to me as though it requires almost twice as much computing in the F engines, and the data rate out of the first stage will be twice the input rate.

I believe that I have also seen designs in which the first stage is done with overlapping FIR filters. I don't know how much more computing that requires than the PFB, but the data rate is only modestly increased, as is the computing in the second stage PFBs.

The latter is probably the preferred solution, but there are two places where the CASPER PFB could be split: after the FIR filter, and between the two stages of the FFT. This would allow sharing the load with up to three FPGAs. There would be no increase in data communications rate in this option.

I have discussed this with Alan Rogers, who offered to think about efficient solutions to the problem. In their VLBI processors, they split the input band into separate channels with a PFB to emulate the original analog filters. Apparently they have not worried about complete spectral coverage with the multitap off-line cross correlator.

I wonder if there are other solutions to this problem.

Regards,
Bob Wilson

On Fri, 24 Dec 2010, Dan Werthimer wrote:

> hi jason, jonathan,
>
> regarding jason's concerns below about corner turns
> and 10Gbit links:
>
> in the FFX model that i propose, where the
> first FPGA breaks the 9 GHz band up into 8 pieces of
> 1.25 GHz each, there is no corner turner needed, as
> the 8 frequency bands emerge from the PFB in eight
> parallel paths and each path goes separately to its own
> XAUI or 10Gbit ethernet port.
>
> the FFX design doesn't require any block ram or QDR:
> all the coefficients in the 8 channel PFB/FFT are constants,
> and there are no BRAM delays in the FFT, only registers,
> as the FFT is implemented with fully parallel inputs and outputs.
> (the FFT is implemented like a text book diagram of an
> 8 input FFT with all the butterflies done in parallel).
> or instead of an 8 channel PFB, the channelization can
> be implemented as 8 DDCs, again with no BRAMs.
>
> i think the Roach II's eight 10Gbit links can just barely
> support 9 GHz of bandwidth, with 4 bit real, 4 bit imaginary data:
> (1.25 GHz each * 8 bits = 10Gbits/sec on each link).
> this will work with XAUI, but for 10GbE, the extra overhead
> from headers, time stamps, etc will reduce the bandwidth slightly.
>
> jonathan,
>
> suraj's concern about achieving high clock rates at high demux values
> is for large FFTs (you asked about 32K points).
> if you are just doing an 8 point PFB or FFT, or implementing 8 DDCs,
> for an FFX correlator, the routing is pretty straightforward -
> you won't be using the CASPER PFB or FFT blocks,
> it's all fully parallel implementation.
>
> best wishes,
>
> dan
>
> On 12/23/2010 11:17 PM, Jason Manley wrote:
> > To the best of my knowledge, nobody's built a CASPER correlator that
> > processes such high bandwidths. I took a closer look at bringing 20Gsps
> > into a ROACH2 for MeerKAT use a few months back. My conclusion was that
> > this would be possible with current libraries with minimal changes.
> > However, we weren't aiming for 32k PFBs and we weren't aiming to process
> > the entire 10GHz band (we'd use a DDC and only process a couple of GHz).
> >
> > I believe t
Re: [casper] wideband conversion and correlation
hi jason, jonathan,

regarding jason's concerns below about corner turns and 10Gbit links:

in the FFX model that i propose, where the first FPGA breaks the 9 GHz band up into 8 pieces of 1.25 GHz each, there is no corner turner needed, as the 8 frequency bands emerge from the PFB in eight parallel paths and each path goes separately to its own XAUI or 10Gbit ethernet port.

the FFX design doesn't require any block ram or QDR: all the coefficients in the 8 channel PFB/FFT are constants, and there are no BRAM delays in the FFT, only registers, as the FFT is implemented with fully parallel inputs and outputs. (the FFT is implemented like a text book diagram of an 8 input FFT with all the butterflies done in parallel). or instead of an 8 channel PFB, the channelization can be implemented as 8 DDCs, again with no BRAMs.

i think the Roach II's eight 10Gbit links can just barely support 9 GHz of bandwidth, with 4 bit real, 4 bit imaginary data: (1.25 GHz each * 8 bits = 10Gbits/sec on each link). this will work with XAUI, but for 10GbE, the extra overhead from headers, time stamps, etc will reduce the bandwidth slightly.

jonathan,

suraj's concern about achieving high clock rates at high demux values is for large FFTs (you asked about 32K points). if you are just doing an 8 point PFB or FFT, or implementing 8 DDCs, for an FFX correlator, the routing is pretty straightforward - you won't be using the CASPER PFB or FFT blocks, it's all fully parallel implementation.

best wishes,

dan

On 12/23/2010 11:17 PM, Jason Manley wrote:

To the best of my knowledge, nobody's built a CASPER correlator that processes such high bandwidths. I took a closer look at bringing 20Gsps into a ROACH2 for MeerKAT use a few months back. My conclusion was that this would be possible with current libraries with minimal changes. However, we weren't aiming for 32k PFBs and we weren't aiming to process the entire 10GHz band (we'd use a DDC and only process a couple of GHz).
I believe that you could put a PFB on the whole band if you tweak and optimise the library block as Billy has just done with the FFT. Though the FPGA might not run at very high speeds and you might not get the spectral resolution that you want directly due to resource consumption of pipelining, I think it would be possible do break this up into subbands (FFX approach) on a single board. I will highlight the following limitations with processing such large bandwidths on the ROACH2 platform: 1) QDR corner-turn bandwidth. You don't mention how many inputs you're planning and so you might not need the packetised infrastructure at all (perhaps you're considering something like Billy's point-to-point 3GHz 3-input correlator). ROACH2 will have four 36-bit QDR interfaces. These can be ganged together and demuxed to produce a single 288bit SDR interface so that your limits would be: 32 parallel_streams * 4bit * 2complex = need 256-bit interface. These 32 parallel streams are complex, post-FFT (after the imag half of spectrum has been tossed) so that it would represent ~300MHz*32=9.6GHz of real band. So you might be OK here. 2) QDR capacity for the corner turn is much less of a concern. With a packet length of 128 (what everyone's using right now), you can have up to 64 antennas: 128pkt_len * 32768chan * 4bit * 2complex = 32Mbit per dual-pol antenna. *) With the huge BRAM reserves on the V6, it might even be possible to bypass the QDR and do the whole corner-turn in BRAM, especially if you opt for smaller packet sizes (which'd result in smaller buffers and potentially faster dump rates but with reduced network efficiencies due to smaller payload/header ratio). 3) Another consideration, and possible deal-breaker, is the interconnect: ROACH-II will have 8 10GbE links (or maybe later two 40Gbps links) which could carry a little over 7GHz bandwidth after network overhead. Again, if you're not aiming for a packetised system, then you can do a little better. 
If your ADC is going to use some of the SERDES lines though (as many of the new high speed samplers do), then you might have to forfeit some of this interconnect. But basically, I think you're going to run out of bandwidth to get 10GHz out. WRT clock rates, I think that 300MHz should be achievable on ROACH-II with a little tweaking. ROACH-1 is able to do 250MHz with much less fiddling than the iBOBs at these speeds. The iBOB with the old libraries used to start choking around just 200MHz. So the clock rates are improving a little generation-to-generation and I don't think it's unreasonable to hope for 300MHz from V6 but I'm conservatively banking on at least 250MHz. My conservative conclusion after going through this whole exercise for KAT was that ROACH-2 could comfortably handle 4GHz bandwidth chunks at ~8Gsps (8000/32=250MHz clk rate) and that we'd start hitting various limits not long after that. So I would say that if you'r
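
The link and memory figures quoted in this message can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the numbers as stated by dan and Jason; the function names are hypothetical, and reading the 288-bit SDR width as 4 interfaces x 36 bits x 2 transfers per clock is an interpretation of Jason's sentence, not something the thread spells out.

```python
# Arithmetic behind the link, QDR and clock figures in the thread.
# Function names are hypothetical helpers, not CASPER library calls.

def link_rate_mbps(bandwidth_mhz, bits_per_component=4):
    """Complex samples at the band's rate, real + imaginary parts."""
    return bandwidth_mhz * bits_per_component * 2

def qdr_ganged_width_bits(n_interfaces=4, bits_per_interface=36, transfers_per_clock=2):
    """Assumed reading of 'ganged together and demuxed' into one SDR word."""
    return n_interfaces * bits_per_interface * transfers_per_clock

def corner_turn_width_bits(parallel_streams=32, bits_per_component=4):
    """Width needed to write all parallel complex streams each clock."""
    return parallel_streams * bits_per_component * 2

def corner_turn_buffer_bits(pkt_len=128, n_chan=32768, bits_per_component=4):
    """Buffer for one packet length of every channel, complex samples."""
    return pkt_len * n_chan * bits_per_component * 2

def fpga_clock_mhz(sample_rate_msps, demux):
    """Fabric clock needed when the ADC stream is demultiplexed by `demux`."""
    return sample_rate_msps / demux

print("per-link rate for 1.25 GHz:", link_rate_mbps(1250), "Mbit/s")
print("QDR width available:", qdr_ganged_width_bits(), "bits")
print("QDR width needed:", corner_turn_width_bits(), "bits")
print("corner-turn buffer:", corner_turn_buffer_bits() // 2**20, "Mbit")
print("clock at 8 Gsps, demux 32:", fpga_clock_mhz(8000, 32), "MHz")
```

The 1.25 GHz sub-band exactly fills a 10 Gbit/s link with no room for headers, which is why dan notes it works with XAUI but not quite with 10GbE.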
Re: [casper] wideband conversion and correlation
Dear all who responded,

First, I apologize for inadvertently cc'ing the entire list with a message to my internal team. A consequence of using autocomplete in the cc field to make sure I got the list address right. Thankfully I think I only said nice things ;)

Second, I really appreciate all the responses, which are very enlightening. I have little time to read carefully, and less time to respond, as I leave with my family for Cape Town this morning, and don't expect to surface for a good few days. I do look forward to connecting with the SA SKA/KAT group, probably in January, to discuss this and other things in person. Perhaps the discussion will continue nonetheless.

A few quick comments, based only on a scan of the responses.

--the SMA is an 8 antenna array, with two active receivers per antenna. In particular, these might be dual pol, thus 16 "ant-pols".

--we are certainly open to distributing the processing in the manner suggested by Dan, Mel, and possibly others. Even in such a scheme, though, an understanding of PFB fit and limits, and, related, increasing clock rates to improve performance is warranted. We are also open to not packetizing (on-board corner turn).

--I made mention of 500 MHz FFT cores; those were advertised by industry DSP specialists we have had discussions with. Not designed with CASPER methods. Multiple clock domains are required, and perhaps we could "black box" one of these cores. I don't think anyone has commented on multiple clock domains in CASPER yet, Billy, anyone? (may have missed it on scanning).

--We need to understand memory util, including bram, qdr, ddr, amount and bandwidth. Will read your comments carefully.

--Andrew, our finding is that *both* multipliers and adders scale as Dlog2D (other terms, but this one dominates). If N is the size of the PFB they scale only as logN (I may mis-remember if this is dominant term). I don't understand the implication of the condition "(for large FFT sizes, i.e. not doing straight butterfly)". I would very much like to discuss all of this with you, and others who might be interested, in CT if possible.

--Dan, your statement that D=64 or 128 would be possible is very encouraging, but appears to contradict what Suraj said. Would very much like to resolve this.

Thanks to all who contributed. In a huge rush, please excuse mis-statement or typos, or questions on matters already addressed.

Merry Christmas to those who celebrate it. And look forward to picking up this thread again.

Jonathan

On Dec 24, 2010, at 4:34 AM, Andrew Martens wrote:

Hi Jonathan

> To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quantized.

Some thoughts on resource usage with the CASPER pfb_fir (for large FFT sizes, i.e. not doing straight butterfly);

complex multiplier usage;
- scales linearly with the demux factor (often bandwidth)
- scales linearly with number of FIR taps
- is not affected by the FFT size

adder usage (the final adder tree);
- scales by nlogn with the demux factor. Will dominate adder usage for large demux factors
- scales by nlogn with the number of FIR taps
- is not affected by the FFT size

BRAM usage;
- scales linearly with demux factor but should not be affected (barring constraints set by underlying hardware).
(BRAMs are currently not used efficiently - a separate set of coefficient and data storage BRAMs is not needed for each data input. The storage requirements should be completely dependent on FFT size and number of FIR taps). - scales linearly with the number of FIR taps. The current design could be improved so that BRAMs are more efficiently used though. - scales linearly with FFT size. Routing constraints; The design is simple, highly pipelined (almost no feedback) with very low fanout. Major constraints are BRAM to DSP slice, DSP slice to DSP slice and rounding, all of which are parameterised. Optimisations possible; The efficiency of BRAM use can be improved with some small logic savings. Resource usage in the CASPER FFT (when using the biplex FFT (eg fft_wideband_real and fft for 'large' FFTs); complex multiplier usage; - dominated by (n/2)*log2n (n = demux factor) needed in fft_direct for large FFTs. - scales linearly with increase in FFT size. BRAM usage; - scales linearly with bandwidth for large FFTs if FFT size kept const
Re: [casper] wideband conversion and correlation
Hi Jonathan

> To start we are looking closely at the FPGA resource utilization of large
> PFBs. Something that probably is common knowledge amongst those experienced
> in FX correlator design is that the demux factor drives the utilization much
> faster than the size of the PFB. In that sense bandwidth is far more
> expensive than spectral resolution. We've put some effort into accurately
> quantifying the utilization, at least as far as multipliers and adders are
> concerned, and are expanding this analysis to block ram and other resources.
> And demux factor is typically radix 2, so it is very much quantized.

Some thoughts on resource usage with the CASPER pfb_fir (for large FFT sizes, i.e. not doing straight butterfly);

complex multiplier usage;
- scales linearly with the demux factor (often bandwidth)
- scales linearly with number of FIR taps
- is not affected by the FFT size

adder usage (the final adder tree);
- scales by nlogn with the demux factor. Will dominate adder usage for large demux factors
- scales by nlogn with the number of FIR taps
- is not affected by the FFT size

BRAM usage;
- scales linearly with demux factor but should not be affected (barring constraints set by underlying hardware). (BRAMs are currently not used efficiently - a separate set of coefficient and data storage BRAMs is not needed for each data input. The storage requirements should be completely dependent on FFT size and number of FIR taps).
- scales linearly with the number of FIR taps. The current design could be improved so that BRAMs are more efficiently used though.
- scales linearly with FFT size.

Routing constraints;
The design is simple, highly pipelined (almost no feedback) with very low fanout. Major constraints are BRAM to DSP slice, DSP slice to DSP slice and rounding, all of which are parameterised.

Optimisations possible;
The efficiency of BRAM use can be improved with some small logic savings.

Resource usage in the CASPER FFT (when using the biplex FFT, e.g. fft_wideband_real and fft for 'large' FFTs);

complex multiplier usage;
- dominated by (n/2)*log2n (n = demux factor) needed in fft_direct for large FFTs.
- scales linearly with increase in FFT size.

BRAM usage;
- scales linearly with bandwidth for large FFTs if FFT size kept constant.
- for constant (large) FFT size, unaffected by demux factor. Biplex cores shrink in length by one stage while doubling in number for each doubling in demux factor.
- scales roughly like n^2 with increase in FFT size.

Routing constraints;
The FFT is highly pipelined with low fanout except for in the unscrambler (although some work has been done here and the unscrambler is now optional). Major constraints are BRAM to DSP slice, DSP slice to DSP slice and rounding, all of which are parameterised.

Optimisations possible;
Various optimisations are still possible;
- Coefficients could be shared between twiddles, reducing the number of BRAMs required by the demux factor. This would be significant for large demux factor designs at the expense of some fanout.
- BRAMs used for delaying data could be shared between input streams, saving some BRAMs at the expense of extra routing constraints.
- As Dan has suggested, grow the bits in the FFT at each stage as needed to reduce logic (and BRAM) use and probably help timing. Care should be taken however, as data quality is directly related to the width of the data path through the FFT.

As noted by Jason, please also remember that other constraints such as QDR SRAM and XAUI bandwidth need to be considered when building such a large system.

Dan's suggestion of FFX is worth considering. It is upgradeable, allowing the addition of newer, more capable boards as they come online until you end up with a simple FX correlator again. I would love to see a correlator like that in action.

Regards
Andrew
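
To make the multiplier scaling rules above concrete, here is a rough sketch that tabulates them against the demux factor. The (n/2)*log2(n) expression for fft_direct is taken from the message; the pfb_fir model uses a proportionality constant of 1 (one multiplier per tap per parallel input), which is an assumption made here purely for illustration. The function names are hypothetical and are not CASPER block parameters.

```python
# Illustrative scaling of complex-multiplier counts with demux factor.
# Constants of proportionality are assumed, only the trends are from the thread.

def log2_exact(n):
    """Integer log2 for a power-of-two demux factor."""
    if n <= 0 or n & (n - 1):
        raise ValueError("demux factor must be a power of two")
    return n.bit_length() - 1

def pfb_fir_multipliers(demux, taps):
    """Linear in demux and in taps, independent of FFT size (assumed constant 1)."""
    return demux * taps

def fft_direct_multipliers(demux):
    """Dominant term quoted for fft_direct: (n/2) * log2(n), n = demux factor."""
    return (demux // 2) * log2_exact(demux)

print(f"{'demux':>6} {'pfb_fir (4 taps)':>18} {'fft_direct':>12}")
for demux in (16, 32, 64, 128):
    print(f"{demux:>6} {pfb_fir_multipliers(demux, 4):>18} {fft_direct_multipliers(demux):>12}")
```

Doubling the demux factor doubles the FIR multiplier count but more than doubles the fft_direct term, which is consistent with the point that bandwidth costs more than spectral resolution.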
Re: [casper] wideband conversion and correlation
To the best of my knowledge, nobody's built a CASPER correlator that processes such high bandwidths. I took a closer look at bringing 20Gsps into a ROACH2 for MeerKAT use a few months back. My conclusion was that this would be possible with current libraries with minimal changes. However, we weren't aiming for 32k PFBs and we weren't aiming to process the entire 10GHz band (we'd use a DDC and only process a couple of GHz).

I believe that you could put a PFB on the whole band if you tweak and optimise the library block as Billy has just done with the FFT. Though the FPGA might not run at very high speeds and you might not get the spectral resolution that you want directly due to resource consumption of pipelining, I think it would be possible to break this up into subbands (FFX approach) on a single board.

I will highlight the following limitations with processing such large bandwidths on the ROACH2 platform:

1) QDR corner-turn bandwidth. You don't mention how many inputs you're planning and so you might not need the packetised infrastructure at all (perhaps you're considering something like Billy's point-to-point 3GHz 3-input correlator). ROACH2 will have four 36-bit QDR interfaces. These can be ganged together and demuxed to produce a single 288bit SDR interface so that your limits would be: 32 parallel_streams * 4bit * 2complex = need 256-bit interface. These 32 parallel streams are complex, post-FFT (after the imag half of spectrum has been tossed) so that it would represent ~300MHz*32=9.6GHz of real band. So you might be OK here.

2) QDR capacity for the corner turn is much less of a concern. With a packet length of 128 (what everyone's using right now), you can have up to 64 antennas: 128pkt_len * 32768chan * 4bit * 2complex = 32Mbit per dual-pol antenna.
*) With the huge BRAM reserves on the V6, it might even be possible to bypass the QDR and do the whole corner-turn in BRAM, especially if you opt for smaller packet sizes (which'd result in smaller buffers and potentially faster dump rates but with reduced network efficiencies due to smaller payload/header ratio).

3) Another consideration, and possible deal-breaker, is the interconnect: ROACH-II will have 8 10GbE links (or maybe later two 40Gbps links) which could carry a little over 7GHz bandwidth after network overhead. Again, if you're not aiming for a packetised system, then you can do a little better. If your ADC is going to use some of the SERDES lines though (as many of the new high speed samplers do), then you might have to forfeit some of this interconnect. But basically, I think you're going to run out of bandwidth to get 10GHz out.

WRT clock rates, I think that 300MHz should be achievable on ROACH-II with a little tweaking. ROACH-1 is able to do 250MHz with much less fiddling than the iBOBs at these speeds. The iBOB with the old libraries used to start choking around just 200MHz. So the clock rates are improving a little generation-to-generation and I don't think it's unreasonable to hope for 300MHz from V6 but I'm conservatively banking on at least 250MHz.

My conservative conclusion after going through this whole exercise for KAT was that ROACH-2 could comfortably handle 4GHz bandwidth chunks at ~8Gsps (8000/32=250MHz clk rate) and that we'd start hitting various limits not long after that. So I would say that if you're considering ROACH-2 as a platform, you'd be safe if aiming for IF chunks around 4 or 5 GHz.

Jason

On 24 Dec 2010, at 07:51, Dan Werthimer wrote:
>
>> On 2. it seems to me that if we are digitizing a 9 GHz and using 20 Gsps,
>> one still needs substantial demux (at least 64) no matter how small the PFB.
>> As Sura points out this is far in excess of practical limits. This stacks
>> with what we have found: BW is the difficult part, large PFB for high res
>> less so.
>
>
> hi jonathan,
>
> i agree you need to demux 20 Gsps by 64 or 128, but i don't think this will
> be a problem.
> 20 Gsps should fit pretty easily into an FPGA an FFX correlator:
>
> in my example of the FFX, you'd need to implement an 8 point PFB
> on the first FPGA to break the 10 GHz band into 8 sub-bands.
> let's assume you do demux of 64, and clock the FPGA at 312.5 MHz:
> you'd need 64*8 multipliers to implement the FIR part of an 8 tap PFB.
> and 64 * 16 multipliers to implement the real to complex FFT part of the PFB.
> all the multipliers have fixed coefficients - no need to use block rams to
> store coefficients - no block rams are needed for delays or coefficients, as
> you'd
> implement the butterfly diagram directly.
>
> so there's no coefficient routing, but there is data routing.
> the data paths can all be 8 bit, and you can add pipeline registers
> where needed, so you should be able to get to 312.5 MHz.
>
> if you can't get the FPGA to route at 312.5 MHz, then you'd have
> to demux by 128, and you'd need twice as many multipliers.
> (instead of 1536 multi
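The ROACH2 budget figures in Jason's message can be re-derived with a few lines of arithmetic. The sketch below is only a sanity check of the numbers quoted there (QDR interface width, corner-turn buffer size, FPGA clock for a given demux). The function and constant names are made up for illustration, and the 300 MHz clock is the hoped-for rate from the message, not a measured one.

```python
# Sanity check of the ROACH2 numbers quoted in Jason's message.
# All names are illustrative; the values are taken from the message text.

QDR_PORTS = 4            # "four 36-bit QDR interfaces"
QDR_WIDTH_BITS = 36
DDR_TO_SDR = 2           # ganging and demuxing DDR to SDR doubles the width

PARALLEL_STREAMS = 32    # post-FFT complex streams
BITS_PER_COMPONENT = 4   # 4-bit real, 4-bit imaginary
FPGA_CLOCK_MHZ = 300     # assumed clock rate (the hoped-for figure)


def qdr_sdr_width_bits():
    """Width of the single SDR interface made from all QDR ports."""
    return QDR_PORTS * QDR_WIDTH_BITS * DDR_TO_SDR


def corner_turn_width_bits():
    """Bits per FPGA clock needed to carry all parallel complex streams."""
    return PARALLEL_STREAMS * BITS_PER_COMPONENT * 2


def real_band_ghz():
    """Real bandwidth represented by the parallel complex streams."""
    return PARALLEL_STREAMS * FPGA_CLOCK_MHZ / 1000


def corner_turn_buffer_bits(pkt_len=128, n_chan=32768):
    """Corner-turn buffer size for one input: pkt_len * chan * 4bit * 2."""
    return pkt_len * n_chan * BITS_PER_COMPONENT * 2


def fpga_clock_mhz(sample_rate_msps, demux):
    """FPGA clock needed to absorb a sample stream at a given demux."""
    return sample_rate_msps / demux


print(qdr_sdr_width_bits(), "bit SDR interface available")
print(corner_turn_width_bits(), "bit interface needed")
print(real_band_ghz(), "GHz of real band")
print(corner_turn_buffer_bits() / 2**20, "Mbit corner-turn buffer")
print(fpga_clock_mhz(8000, 32), "MHz clock for 8 Gsps at demux 32")
```

The 256 bits needed fit inside the 288 bits available, which is the basis for the "you might be OK here" remark in item 1).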
Re: [casper] wideband conversion and correlation
On 2. it seems to me that if we are digitizing a 9 GHz and using 20 Gsps, one still needs substantial demux (at least 64) no matter how small the PFB. As Sura points out this is far in excess of practical limits. This stacks with what we have found: BW is the difficult part, large PFB for high res less so.

hi jonathan,

i agree you need to demux 20 Gsps by 64 or 128, but i don't think this will be a problem. 20 Gsps should fit pretty easily into an FPGA in an FFX correlator:

in my example of the FFX, you'd need to implement an 8 point PFB on the first FPGA to break the 10 GHz band into 8 sub-bands. let's assume you do demux of 64, and clock the FPGA at 312.5 MHz: you'd need 64*8 multipliers to implement the FIR part of an 8 tap PFB, and 64 * 16 multipliers to implement the real to complex FFT part of the PFB. all the multipliers have fixed coefficients - no need to use block rams to store coefficients - no block rams are needed for delays or coefficients, as you'd implement the butterfly diagram directly.

so there's no coefficient routing, but there is data routing. the data paths can all be 8 bit, and you can add pipeline registers where needed, so you should be able to get to 312.5 MHz.

if you can't get the FPGA to route at 312.5 MHz, then you'd have to demux by 128, and you'd need twice as many multipliers (instead of 1536 multipliers, it would take 3072 multipliers). you can use block rams for many of the multipliers, as most of the computations are multiplying 8 bit data by a fixed coefficient, so an 8 input, 8 output look up table is all you need.

if you don't want to implement an 8 channel PFB, you could also implement this as eight DDC's running in parallel from the same ADC data, each DDC with a different downmix frequency. the mixer coefficients are fixed, and many of the coefficients are 0, 1, -1. the DDC's low pass filter coefficients are fixed as well - you can use look up tables for the low pass filter multipliers and the mixer multipliers if you are short on DSP48's.

best wishes,

dan

BTW I realize as I write that my 6 GHz BW demux 32 case suggested in response to Suraj still requires > 400 MHz FPGA clock, thus not so practical. Can one gain a factor of 2 in demux doing quadrature sampling, and having I and Q inputs to a complex input PFB each at 1/2 the rate?

Jonathan

On Dec 23, 2010, at 5:24 PM, Dan Werthimer wrote:

hi jonathan, some ideas for your correlator:

1) 300 MHz is a good target, especially for V6. suraj has shown how to achieve 375 MHz for V5 by using floor planning and auto-placing. suraj or i can send you his draft paper on this if you'd like.

2) you might want to consider FFX instead of FX: eg: digitizing your 9 GHz band and using a PFB to break it up into eight sub-bands of 1.25 GHz each, and then sending the sub-bands into eight 1.25 GHz FX correlators. this will simplify your switch requirements and each correlator now has only 4K channels, which is better suited for cornering turn in a roach II.

3) also, be sure to use billy's latest FFT, (recently checked in), which moves all the adders and multipliers into DSP48's makes routing easier. you should also consider bit growth FFT's and PFB's, which start out with the 4 or 5 or 8 bits from your ADC, and add bits gradually as you move the frequency domain. dave mcmahon and hong chen have done work on this.

best wishes, dan

On 12/23/2010 1:47 PM, Jonathan Weintroub wrote:

Hi CASPERites, Here's a somewhat fluffy RFI which I hope might start a little thought and/or discussion over the season (acknowledging that not all in the global collaboration celebrate the traditional Western winter holidays):

At SMA we are looking into the use of CASPER methods to build a ultra wideband high spectral resolution correlator.
Typical specs are, say, 18 GHz bandwidth with roughly 300 KHz spectral resolution, by two polarizations, full Stokes. We are considering using a standard CASPER packetized FX architecture (FX much better for high res than XF), but in the relatively unexplored "small number of antennas, wide bandwidth" regime. If the entire 18 GHz were eaten by one ADC, this would require a sample rate of 40 Gsps and 64 kpoint PFB. Perhaps more reasonable would be two 9 GHz BW blocks and a 32 k PFB sampled at about 20 Gsps, or three 6 GHz / 16 or 32 k PFB / 14 Gsps. To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quan
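Dan's multiplier estimate at the top of this message (64*8 for the FIR, 64*16 for the FFT, doubling at demux 128) can be tallied as below. This is a rough counting sketch, assuming the per-input costs scale linearly with the demux factor exactly as stated in the message. `pfb_multipliers` and `lut_bits` are illustrative names, and the lookup-table size assumes one 8-bit-in, 8-bit-out table per fixed-coefficient multiplier.

```python
# Tally of the multiplier counts quoted for an 8-channel coarse PFB.
# Assumption: costs scale linearly with demux, as in the message.

def pfb_multipliers(demux, fir_taps=8, fft_mults_per_input=16):
    """Return (fir, fft, total) multiplier counts for a given demux."""
    fir = demux * fir_taps              # FIR part of an 8 tap PFB
    fft = demux * fft_mults_per_input   # real-to-complex FFT part
    return fir, fft, fir + fft


def lut_bits(in_bits=8, out_bits=8):
    """Storage for one fixed-coefficient multiplier done as a lookup table."""
    return (2 ** in_bits) * out_bits


def fpga_clock_mhz(sample_rate_gsps, demux):
    """FPGA clock implied by a sample rate and demux factor."""
    return sample_rate_gsps * 1000.0 / demux


for demux in (64, 128):
    fir, fft, total = pfb_multipliers(demux)
    clock = fpga_clock_mhz(20, demux)
    print(f"demux {demux}: {fir} FIR + {fft} FFT = {total} multipliers "
          f"at {clock} MHz")

print("bits per lookup-table multiplier:", lut_bits())
```

Halving the clock by going to demux 128 doubles the multiplier count, which is the trade Dan describes.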
Re: [casper] wideband conversion and correlation
Another reason to consider FFX is more flexible selection of how many channels you want in each subband.

ii) Equalization across a 16 GHz band may be an issue. If there is too much slope across the band then, in an extreme case, a change at the low-gain end might make no change in the digitized signal. Again, there might be some specification on bandpass flatness requirements.

mel

On 12/23/10, Dan Werthimer wrote:
>
>
> hi jonathan,
>
> some ideas for your correlator:
>
> 1)
> 300 MHz is a good target, especially for V6.
> suraj has shown how to achieve 375 MHz for V5
> by using floor planning and auto-placing.
> suraj or i can send you his draft paper on this if you'd like.
>
> 2)
> you might want to consider FFX instead of FX:
> eg: digitizing your 9 GHz band and using a PFB to break it up into eight
> sub-bands
> of 1.25 GHz each, and then sending the sub-bands into eight 1.25 GHz
> FX correlators. this will simplify your switch requirements and each
> correlator
> now has only 4K channels, which is better suited for cornering turn in a
> roach II.
>
> 3)
> also, be sure to use billy's latest FFT, (recently checked in),
> which moves all the adders and multipliers into DSP48's makes routing
> easier.
> you should also consider bit growth FFT's and PFB's, which start
> out with the 4 or 5 or 8 bits from your ADC, and add bits gradually
> as you move the frequency domain. dave mcmahon and hong chen
> have done work on this.
>
> best wishes,
>
> dan
>
> On 12/23/2010 1:47 PM, Jonathan Weintroub wrote:
>> Hi CASPERites,
>>
>> Here's a somewhat fluffy RFI which I hope might start a little thought
>> and/or discussion over the season (acknowledging that not all in the
>> global collaboration celebrate the traditional Western winter holidays):
>>
>> At SMA we are looking into the use of CASPER methods to build a ultra
>> wideband high spectral resolution correlator. Typical specs are, say,
>> 18 GHz bandwidth with roughly 300 KHz spectral resolution, by two
>> polarizations, full Stokes. We are considering using a standard
>> CASPER packetized FX architecture (FX much better for high res than
>> XF), but in the relatively unexplored "small number of antennas, wide
>> bandwidth" regime. If the entire 18 GHz were eaten by one ADC, this
>> would require a sample rate of 40 Gsps and 64 kpoint PFB. Perhaps
>> more reasonable would be two 9 GHz BW blocks and a 32 k PFB sampled at
>> about 20 Gsps, or three 6 GHz / 16 or 32 k PFB / 14 Gsps.
>>
>> To start we are looking closely at the FPGA resource utilization of
>> large PFBs. Something that probably is common knowledge amongst those
>> experienced in FX correlator design is that the demux factor drives
>> the utilization much faster than the size of the PFB. In that sense
>> bandwidth is far more expensive than spectral resolution. We've put
>> some effort into accurately quantifying the utilization, at least as
>> far as multipliers and adders are concerned, and are expanding this
>> analysis to block ram and other resources. And demux factor is
>> typically radix 2, so it is very much quantized.
>>
>> For example at 20 Gsps one might consider a demux factor of 128
>> resulting in an FPGA clock rate of 156 MHz, which is quite comfortable
>> for the FPGA. Alternatively a demux factor of 64 with corresponding
>> FPGA clock of twice that, or over 300 MHz. Traditionally a rather
>> uncomfortable regime for CASPER (we're unusual, I believe, in running
>> iBOBs at 256 MHz for the VLBI phased array). The trouble is our
>> analysis shows that the difference between these two demux setting in
>> the size of PFB one can fit in a Virtex 6 is really quite large, and
>> 128 definitely won't allow us to do what we need to do.
>>
>> So we are increasingly highly motivated to run the FPGAs faster
>> still. Just a 20% increment from the 256 MHz which we currently view
>> as a practical upper limit allows us to cross a clock rate threshold
>> which then enables a factor of two decrease in demux factor, and
>> consequent even larger increment in the realizable PFB size.
>>
>> Which is just a long winded way of asking if there are any others in
>> the collaboration motivated to run the FPGAs faster, and whether any
>> tricks can be shared? In particular, does the CASPER toolflow support
>> multiple clock domains? Our understanding is not yet, but that's based
>> on incomplete information. We know that there exists Virtex 5 (?) IP
>> FFT cores which supposably run at greater than 500 MHz rates, using
>> the enhanced interconnect between DSP slices.
>>
>> While on this topic of high demux factors, the tool flow largely
>> chokes on demux factors of 32 or greater. Any tips here would also be
>> appreciated.
>>
>> If anyone can cast light on this general topic and related concerns it
>> would be very much appreciated.
>>
>> Jonathan Weintroub
>> SAO
Re: [casper] wideband conversion and correlation
Hi Dan,

Thanks for the input. As you see, Suraj has responded, and I will explore his techniques with him. Yes, very interested in any papers. 3. is good advice.

On 2. it seems to me that if we are digitizing a 9 GHz band and using 20 Gsps, one still needs substantial demux (at least 64) no matter how small the PFB. As Suraj points out this is far in excess of practical limits. This stacks with what we have found: BW is the difficult part, large PFB for high res less so.

BTW I realize as I write that my 6 GHz BW demux 32 case suggested in response to Suraj still requires > 400 MHz FPGA clock, thus not so practical. Can one gain a factor of 2 in demux doing quadrature sampling, and having I and Q inputs to a complex input PFB each at 1/2 the rate?

Jonathan

On Dec 23, 2010, at 5:24 PM, Dan Werthimer wrote:

hi jonathan, some ideas for your correlator:

1) 300 MHz is a good target, especially for V6. suraj has shown how to achieve 375 MHz for V5 by using floor planning and auto-placing. suraj or i can send you his draft paper on this if you'd like.

2) you might want to consider FFX instead of FX: eg: digitizing your 9 GHz band and using a PFB to break it up into eight sub-bands of 1.25 GHz each, and then sending the sub-bands into eight 1.25 GHz FX correlators. this will simplify your switch requirements and each correlator now has only 4K channels, which is better suited for cornering turn in a roach II.

3) also, be sure to use billy's latest FFT, (recently checked in), which moves all the adders and multipliers into DSP48's makes routing easier. you should also consider bit growth FFT's and PFB's, which start out with the 4 or 5 or 8 bits from your ADC, and add bits gradually as you move the frequency domain. dave mcmahon and hong chen have done work on this.
best wishes, dan On 12/23/2010 1:47 PM, Jonathan Weintroub wrote: Hi CASPERites, Here's a somewhat fluffy RFI which I hope might start a little thought and/or discussion over the season (acknowledging that not all in the global collaboration celebrate the traditional Western winter holidays): At SMA we are looking into the use of CASPER methods to build a ultra wideband high spectral resolution correlator. Typical specs are, say, 18 GHz bandwidth with roughly 300 KHz spectral resolution, by two polarizations, full Stokes. We are considering using a standard CASPER packetized FX architecture (FX much better for high res than XF), but in the relatively unexplored "small number of antennas, wide bandwidth" regime. If the entire 18 GHz were eaten by one ADC, this would require a sample rate of 40 Gsps and 64 kpoint PFB. Perhaps more reasonable would be two 9 GHz BW blocks and a 32 k PFB sampled at about 20 Gsps, or three 6 GHz / 16 or 32 k PFB / 14 Gsps. To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quantized. For example at 20 Gsps one might consider a demux factor of 128 resulting in an FPGA clock rate of 156 MHz, which is quite comfortable for the FPGA. Alternatively a demux factor of 64 with corresponding FPGA clock of twice that, or over 300 MHz. Traditionally a rather uncomfortable regime for CASPER (we're unusual, I believe, in running iBOBs at 256 MHz for the VLBI phased array). 
The trouble is our analysis shows that the difference between these two demux setting in the size of PFB one can fit in a Virtex 6 is really quite large, and 128 definitely won't allow us to do what we need to do. So we are increasingly highly motivated to run the FPGAs faster still. Just a 20% increment from the 256 MHz which we currently view as a practical upper limit allows us to cross a clock rate threshold which then enables a factor of two decrease in demux factor, and consequent even larger increment in the realizable PFB size. Which is just a long winded way of asking if there are any others in the collaboration motivated to run the FPGAs faster, and whether any tricks can be shared? In particular, does the CASPER toolflow support multiple clock domains? Our understanding is not yet, but that's based on incomplete information. We know that there exists Virtex 5 (?) IP FFT cores which supposably run at greater than 500 MHz rates, using the enhanced interconnect between DSP slices. While on this topic of high demux factors, the tool flow largely chokes on demux factors of 3
Re: [casper] wideband conversion and correlation
Thanks, Suraj, it is good to hear of your experiences. I may ask more in time about the details of your implementation.

Also, on the practical limit: our analysis so far is purely the number of multiplies and adds, and does not yet look at routing. However, is your practical limit for Virtex 5, and might a demux 32 work on a Virtex 6? Demux 32 is an interesting case for us (6 GHz blocks and 14 GSa/s or so).

Jonathan

On Dec 23, 2010, at 5:09 PM, Suraj Gowda wrote:

Hi Jonathan, I have been able to build spectrometers (FFT only) that operate at 375 MHz FPGA clock rate (3 GHz bandwidth). I don't know of anyone who has operated faster designs. 16x is a practical limit for demux factors for the FFT. The reason is that the fft_direct block for 16 inputs uses 32 butterflies, which can be fit in 3 DSP48E columns. A demux factor of 32x would substantially increase the routing complexity, probably reducing the overall speed. But this is only my best guess, I haven't actually tried. -Suraj

On Dec 23, 2010, at 4:47 PM, Jonathan Weintroub wrote:

Hi CASPERites, Here's a somewhat fluffy RFI which I hope might start a little thought and/or discussion over the season (acknowledging that not all in the global collaboration celebrate the traditional Western winter holidays):

At SMA we are looking into the use of CASPER methods to build a ultra wideband high spectral resolution correlator. Typical specs are, say, 18 GHz bandwidth with roughly 300 KHz spectral resolution, by two polarizations, full Stokes. We are considering using a standard CASPER packetized FX architecture (FX much better for high res than XF), but in the relatively unexplored "small number of antennas, wide bandwidth" regime. If the entire 18 GHz were eaten by one ADC, this would require a sample rate of 40 Gsps and 64 kpoint PFB. Perhaps more reasonable would be two 9 GHz BW blocks and a 32 k PFB sampled at about 20 Gsps, or three 6 GHz / 16 or 32 k PFB / 14 Gsps.
To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quantized. For example at 20 Gsps one might consider a demux factor of 128 resulting in an FPGA clock rate of 156 MHz, which is quite comfortable for the FPGA. Alternatively a demux factor of 64 with corresponding FPGA clock of twice that, or over 300 MHz. Traditionally a rather uncomfortable regime for CASPER (we're unusual, I believe, in running iBOBs at 256 MHz for the VLBI phased array). The trouble is our analysis shows that the difference between these two demux setting in the size of PFB one can fit in a Virtex 6 is really quite large, and 128 definitely won't allow us to do what we need to do. So we are increasingly highly motivated to run the FPGAs faster still. Just a 20% increment from the 256 MHz which we currently view as a practical upper limit allows us to cross a clock rate threshold which then enables a factor of two decrease in demux factor, and consequent even larger increment in the realizable PFB size. Which is just a long winded way of asking if there are any others in the collaboration motivated to run the FPGAs faster, and whether any tricks can be shared? In particular, does the CASPER toolflow support multiple clock domains? Our understanding is not yet, but that's based on incomplete information. We know that there exists Virtex 5 (?) IP FFT cores which supposably run at greater than 500 MHz rates, using the enhanced interconnect between DSP slices. 
While on this topic of high demux factors, the tool flow largely chokes on demux factors of 32 or greater. Any tips here would also be appreciated. If anyone can cast light on this general topic and related concerns it would be very much appreciated. Jonathan Weintroub SAO
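The demux/clock trade-off running through this exchange is simply the sample rate divided by the demux factor. A minimal sketch, assuming power-of-two demux factors as the original post describes; the helper names are hypothetical and the clock ceilings passed in are just the figures mentioned in the thread.

```python
# FPGA clock rate versus demux factor for the sample rates in the thread.
# Assumption: demux factors are restricted to powers of two.

def fpga_clock_mhz(sample_rate_gsps, demux):
    """FPGA clock (MHz) needed to absorb sample_rate_gsps at this demux."""
    return sample_rate_gsps * 1000.0 / demux


def min_demux(sample_rate_gsps, max_clock_mhz):
    """Smallest power-of-two demux that keeps the clock at or below a limit."""
    demux = 1
    while fpga_clock_mhz(sample_rate_gsps, demux) > max_clock_mhz:
        demux *= 2
    return demux


# The two 20 Gsps cases from the original post.
print(fpga_clock_mhz(20, 128))   # the "156 MHz" case
print(fpga_clock_mhz(20, 64))    # the "over 300 MHz" case

# The 6 GHz block / 14 GSa/s / demux 32 case from this reply.
print(fpga_clock_mhz(14, 32))

# Crossing a clock ceiling halves the demux factor that is required.
print(min_demux(20, 256))
print(min_demux(20, 312.5))
```

Because the demux factor moves in powers of two, a modest gain in achievable clock rate only pays off once it crosses one of these thresholds.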
Re: [casper] wideband conversion and correlation
hi jonathan,

some ideas for your correlator:

1) 300 MHz is a good target, especially for V6. suraj has shown how to achieve 375 MHz for V5 by using floor planning and auto-placing. suraj or i can send you his draft paper on this if you'd like.

2) you might want to consider FFX instead of FX: eg: digitizing your 9 GHz band and using a PFB to break it up into eight sub-bands of 1.25 GHz each, and then sending the sub-bands into eight 1.25 GHz FX correlators. this will simplify your switch requirements, and each correlator now has only 4K channels, which is better suited for corner turning in a roach II.

3) also, be sure to use billy's latest FFT (recently checked in), which moves all the adders and multipliers into DSP48's, which makes routing easier. you should also consider bit growth FFT's and PFB's, which start out with the 4 or 5 or 8 bits from your ADC, and add bits gradually as you move into the frequency domain. dave mcmahon and hong chen have done work on this.

best wishes,

dan

On 12/23/2010 1:47 PM, Jonathan Weintroub wrote:

Hi CASPERites, Here's a somewhat fluffy RFI which I hope might start a little thought and/or discussion over the season (acknowledging that not all in the global collaboration celebrate the traditional Western winter holidays):

At SMA we are looking into the use of CASPER methods to build a ultra wideband high spectral resolution correlator. Typical specs are, say, 18 GHz bandwidth with roughly 300 KHz spectral resolution, by two polarizations, full Stokes. We are considering using a standard CASPER packetized FX architecture (FX much better for high res than XF), but in the relatively unexplored "small number of antennas, wide bandwidth" regime. If the entire 18 GHz were eaten by one ADC, this would require a sample rate of 40 Gsps and 64 kpoint PFB. Perhaps more reasonable would be two 9 GHz BW blocks and a 32 k PFB sampled at about 20 Gsps, or three 6 GHz / 16 or 32 k PFB / 14 Gsps.
To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quantized. For example at 20 Gsps one might consider a demux factor of 128 resulting in an FPGA clock rate of 156 MHz, which is quite comfortable for the FPGA. Alternatively a demux factor of 64 with corresponding FPGA clock of twice that, or over 300 MHz. Traditionally a rather uncomfortable regime for CASPER (we're unusual, I believe, in running iBOBs at 256 MHz for the VLBI phased array). The trouble is our analysis shows that the difference between these two demux setting in the size of PFB one can fit in a Virtex 6 is really quite large, and 128 definitely won't allow us to do what we need to do. So we are increasingly highly motivated to run the FPGAs faster still. Just a 20% increment from the 256 MHz which we currently view as a practical upper limit allows us to cross a clock rate threshold which then enables a factor of two decrease in demux factor, and consequent even larger increment in the realizable PFB size. Which is just a long winded way of asking if there are any others in the collaboration motivated to run the FPGAs faster, and whether any tricks can be shared? In particular, does the CASPER toolflow support multiple clock domains? Our understanding is not yet, but that's based on incomplete information. We know that there exists Virtex 5 (?) IP FFT cores which supposably run at greater than 500 MHz rates, using the enhanced interconnect between DSP slices. 
While on this topic of high demux factors, the tool flow largely chokes on demux factors of 32 or greater. Any tips here would also be appreciated. If anyone can cast light on this general topic and related concerns it would be very much appreciated. Jonathan Weintroub SAO
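The FFX split in suggestion 2) of Dan's message can be checked numerically. This is a sketch under stated assumptions: the band being divided is the full Nyquist band of a 20 Gsps sampler (10 GHz, hence 1.25 GHz sub-bands), and a "32 k PFB" is read as 32768 output channels across that band. `ffx_plan` is an illustrative name, not a CASPER library function.

```python
# Sub-band width, channels per sub-band and resolution for an FFX split.
# Assumptions: the Nyquist band (sample_rate / 2) is divided evenly, and
# total_channels counts output channels across the whole Nyquist band.

def ffx_plan(sample_rate_gsps=20.0, n_subbands=8, total_channels=32 * 1024):
    """Return (subband_ghz, channels_per_subband, resolution_khz)."""
    nyquist_ghz = sample_rate_gsps / 2
    subband_ghz = nyquist_ghz / n_subbands
    channels_per_subband = total_channels // n_subbands
    resolution_khz = subband_ghz * 1e6 / channels_per_subband
    return subband_ghz, channels_per_subband, resolution_khz


subband_ghz, channels, resolution_khz = ffx_plan()
print(f"{subband_ghz} GHz per sub-band")
print(f"{channels} channels per sub-band")
print(f"{resolution_khz:.1f} kHz per channel")
```

Under these assumptions each of the eight FX correlators handles 4K channels, and the channel width stays close to the roughly 300 kHz target in the original post.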
Re: [casper] wideband conversion and correlation
Hi Jonathan, I have been able to build spectrometers (FFT only) that operate at 375 MHz FPGA clock rate (3 GHz bandwidth). I don't know of anyone who has operated faster designs. 16x is a practical limit for demux factors for the FFT. The reason is that the fft_direct block for 16 inputs uses 32 butterflies, which can be fit in 3 DSP48E columns. A demux factor of 32x would substantially increase the routing complexity, probably reducing the overall speed. But this is only my best guess, I haven't actually tried. -Suraj On Dec 23, 2010, at 4:47 PM, Jonathan Weintroub wrote: Hi CASPERites, Here's a somewhat fluffy RFI which I hope might start a little thought and/or discussion over the season (acknowledging that not all in the global collaboration celebrate the traditional Western winter holidays): At SMA we are looking into the use of CASPER methods to build a ultra wideband high spectral resolution correlator. Typical specs are, say, 18 GHz bandwidth with roughly 300 KHz spectral resolution, by two polarizations, full Stokes. We are considering using a standard CASPER packetized FX architecture (FX much better for high res than XF), but in the relatively unexplored "small number of antennas, wide bandwidth" regime. If the entire 18 GHz were eaten by one ADC, this would require a sample rate of 40 Gsps and 64 kpoint PFB. Perhaps more reasonable would be two 9 GHz BW blocks and a 32 k PFB sampled at about 20 Gsps, or three 6 GHz / 16 or 32 k PFB / 14 Gsps. To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. 
We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quantized. For example at 20 Gsps one might consider a demux factor of 128 resulting in an FPGA clock rate of 156 MHz, which is quite comfortable for the FPGA. Alternatively a demux factor of 64 with corresponding FPGA clock of twice that, or over 300 MHz. Traditionally a rather uncomfortable regime for CASPER (we're unusual, I believe, in running iBOBs at 256 MHz for the VLBI phased array). The trouble is our analysis shows that the difference between these two demux setting in the size of PFB one can fit in a Virtex 6 is really quite large, and 128 definitely won't allow us to do what we need to do. So we are increasingly highly motivated to run the FPGAs faster still. Just a 20% increment from the 256 MHz which we currently view as a practical upper limit allows us to cross a clock rate threshold which then enables a factor of two decrease in demux factor, and consequent even larger increment in the realizable PFB size. Which is just a long winded way of asking if there are any others in the collaboration motivated to run the FPGAs faster, and whether any tricks can be shared? In particular, does the CASPER toolflow support multiple clock domains? Our understanding is not yet, but that's based on incomplete information. We know that there exists Virtex 5 (?) IP FFT cores which supposably run at greater than 500 MHz rates, using the enhanced interconnect between DSP slices. While on this topic of high demux factors, the tool flow largely chokes on demux factors of 32 or greater. Any tips here would also be appreciated. If anyone can cast light on this general topic and related concerns it would be very much appreciated. Jonathan Weintroub SAO
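Suraj's two figures (32 butterflies for a 16-input direct FFT, and 3 GHz of bandwidth at 375 MHz) are consistent with simple formulas. The sketch below assumes a radix-2 fully parallel FFT, where the butterfly count is (N/2)*log2(N), and real sampling, where the bandwidth is half the sample rate. Whether the fft_direct block follows exactly this count for other sizes is an assumption; only the 16-input figure is confirmed by the message.

```python
# Butterfly count and bandwidth for a demuxed, fully parallel FFT.
# Assumptions: radix-2 butterflies, real-sampled input (Nyquist = fs / 2).

def direct_fft_butterflies(n_inputs):
    """Radix-2 butterflies in a fully parallel FFT: (N/2) * log2(N)."""
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two >= 2")
    stages = n_inputs.bit_length() - 1   # exact log2 for powers of two
    return (n_inputs // 2) * stages


def real_bandwidth_ghz(clock_mhz, demux):
    """Bandwidth of a real-sampled stream at clock_mhz * demux samples/s."""
    sample_rate_msps = clock_mhz * demux
    return sample_rate_msps / 2 / 1000


print(direct_fft_butterflies(16), "butterflies at demux 16")
print(direct_fft_butterflies(32), "butterflies at demux 32")
print(real_bandwidth_ghz(375, 16), "GHz at 375 MHz, demux 16")
```

Under this count, going from demux 16 to demux 32 multiplies the butterflies by 2.5 rather than 2, which fits the concern about routing complexity.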
Re: [casper] wideband conversion and correlation
Hi Jonathan,

Other specs?

For SMA: 8 or 10 antennas x 2 pols analog input possible.
For CARMA: 15 antennas x 2 pols, or 23 ants x 1 pol, or 23 ants x 2 pols.

Output sample and accumulation time for cross correlation: typical 10 sec, fast for longer baselines 1 sec.

Mel.

On Thu, Dec 23, 2010 at 1:47 PM, Jonathan Weintroub wrote:
> Hi CASPERites,
>
> Here's a somewhat fluffy RFI which I hope might start a little thought
> and/or discussion over the season (acknowledging that not all in the global
> collaboration celebrate the traditional Western winter holidays):
>
> At SMA we are looking into the use of CASPER methods to build a ultra
> wideband high spectral resolution correlator. Typical specs are, say, 18
> GHz bandwidth with roughly 300 KHz spectral resolution, by two
> polarizations, full Stokes. We are considering using a standard CASPER
> packetized FX architecture (FX much better for high res than XF), but in the
> relatively unexplored "small number of antennas, wide bandwidth" regime.
> If the entire 18 GHz were eaten by one ADC, this would require a sample rate
> of 40 Gsps and 64 kpoint PFB. Perhaps more reasonable would be two 9 GHz
> BW blocks and a 32 k PFB sampled at about 20 Gsps, or three 6 GHz / 16 or 32
> k PFB / 14 Gsps.
>
> To start we are looking closely at the FPGA resource utilization of large
> PFBs. Something that probably is common knowledge amongst those experienced
> in FX correlator design is that the demux factor drives the utilization much
> faster than the size of the PFB. In that sense bandwidth is far more
> expensive than spectral resolution. We've put some effort into accurately
> quantifying the utilization, at least as far as multipliers and adders are
> concerned, and are expanding this analysis to block ram and other resources.
> And demux factor is typically radix 2, so it is very much quantized.
>
> For example at 20 Gsps one might consider a demux factor of 128 resulting in
> an FPGA clock rate of 156 MHz, which is quite comfortable for the FPGA.
> Alternatively a demux factor of 64 with corresponding FPGA clock of twice
> that, or over 300 MHz. Traditionally a rather uncomfortable regime for
> CASPER (we're unusual, I believe, in running iBOBs at 256 MHz for the VLBI
> phased array). The trouble is our analysis shows that the difference
> between these two demux setting in the size of PFB one can fit in a Virtex 6
> is really quite large, and 128 definitely won't allow us to do what we need
> to do.
>
> So we are increasingly highly motivated to run the FPGAs faster still. Just
> a 20% increment from the 256 MHz which we currently view as a practical
> upper limit allows us to cross a clock rate threshold which then enables a
> factor of two decrease in demux factor, and consequent even larger increment
> in the realizable PFB size.
>
> Which is just a long winded way of asking if there are any others in the
> collaboration motivated to run the FPGAs faster, and whether any tricks can
> be shared? In particular, does the CASPER toolflow support multiple clock
> domains? Our understanding is not yet, but that's based on incomplete
> information. We know that there exists Virtex 5 (?) IP FFT cores which
> supposably run at greater than 500 MHz rates, using the enhanced
> interconnect between DSP slices.
>
> While on this topic of high demux factors, the tool flow largely chokes on
> demux factors of 32 or greater. Any tips here would also be appreciated.
>
> If anyone can cast light on this general topic and related concerns it would
> be very much appreciated.
>
> Jonathan Weintroub
> SAO