Dave,

Many thanks to you and others for help.

Perhaps I’d better start from scratch and design my own large-n cross-correlator, with single-bit samplers, looking to minimise risk.

Cheers,
Neil

From: casper-boun...@lists.berkeley.edu [mailto:casper-boun...@lists.berkeley.edu] On Behalf Of David Hawkins
Sent: 19 December 2015 19:46
To: casper@lists.berkeley.edu
Subject: Re: [casper] building 300-receiver channel cross-correlator

Hi Neil,

Here are my back-of-the-envelope calculations.

1. Complex-valued multiplier (CMULT)

   a = a_re + j*a_im
   b = b_re + j*b_im
   c = a*conj(b)
     = (a_re + j*a_im)*(b_re - j*b_im)
     = (a_re*b_re + a_im*b_im) + j*(a_im*b_re - a_re*b_im)

Given 1-bit sampled I+Q components, there are 4 inputs, so the logic will map nicely to a 4-input LUT. The number of LUTs required depends on the number of bits out of the product table.

If the 1-bit values are assigned -1, 1, then each of the products takes on the values -1 or 1. The sums in the complex product each take on the values -2, 0, or 2, i.e., three possible output values. Divide by 2 and these three values become -1, 0, 1, i.e., the signed 2-bit codes 11, 00, 01; or you could add 1 to get the values 0, 1, and 2, and use codes 00, 01, 10.

Given that each complex-valued product component has a 2-bit output, a complex-valued multiplier requires 4 x 4-LUTs. Your correlator needs 300x299/2 = 44850 of these CMULTs, i.e., 179400 4-LUTs.

The 2-bit plus 2-bit output of these CMULTs would feed accumulators.

2. Accumulators

If your 1-bit ADCs get stuck in one state, e.g., always outputting -1 or 1, then you will get a static product out of your CMULT. For an integration time of 10 ms at 300 MHz, you have 3M samples, so your accumulator will have a bit growth of log2(3M) = 22 bits, i.e., each accumulator would need to have a worst-case bit-width of around 24 bits. For random noise the bit growth would be half this. So your solution will depend on whether you expect to have to handle coherent signals, e.g., RFI.

The first thing to realize is that you do not need to use LUTs to implement these accumulators.
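The CMULT product table and the accumulator sizing above can be checked with a short sketch. This is only an illustration in Python, not HDL; the names `cmult_1bit` and `lut_code` are made up for the example, and the unsigned 0, 1, 2 coding is assumed. For unsigned codes the worst case comes out at 23 bits, and a signed -2..2 accumulation would need 24, which is consistent with the "around 24 bits" estimate.

```python
import math
from itertools import product


def cmult_1bit(a_re, a_im, b_re, b_im):
    """Return (re, im) of a*conj(b) for 1-bit inputs coded as -1 or +1."""
    re = a_re * b_re + a_im * b_im
    im = a_im * b_re - a_re * b_im
    return re, im


def lut_code(x):
    """Map a sum in {-2, 0, 2} to the unsigned 2-bit code 0, 1, 2."""
    return x // 2 + 1


# Enumerate all 16 input combinations, as a 4-input LUT would see them.
table = {bits: cmult_1bit(*bits) for bits in product((-1, 1), repeat=4)}
re_values = sorted({re for re, _ in table.values()})
im_values = sorted({im for _, im in table.values()})
codes = sorted({lut_code(v) for pair in table.values() for v in pair})

# Resource count: one CMULT per antenna pair, 4 LUT outputs per CMULT.
n_rx = 300
n_cmult = n_rx * (n_rx - 1) // 2
n_lut4 = 4 * n_cmult

# Worst-case accumulator growth for a stuck input (code 2 on every sample).
sample_rate_hz = 300_000_000
t_int_ms = 10
n_samples = sample_rate_hz * t_int_ms // 1000
growth_bits = math.ceil(math.log2(n_samples))
worst_case_bits = (2 * n_samples).bit_length()

print(re_values, im_values, codes)
print(n_cmult, n_lut4, n_samples, growth_bits, worst_case_bits)
```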
You can use a combination of all the resources available to you on your selected FPGA. For example, you can use DSP blocks split into sub-accumulators, or you can use short counters implemented in LUTs, and then use RAM for long-term accumulators.

For example, let's say you have an FPGA with a 48-bit DSP block, and you consider that 48-bit DSP block as 4 x 12-bit counters. If your CMULT outputs the unsigned codes 0, 1, 2, then the input to your DSP block accumulator for two complex-valued products would be:

   0000_0000_00aa_0000_0000_00bb_0000_0000_00cc_0000_0000_00dd

where aa, bb, cc, dd are the 2-bit outputs of two CMULTs. The unsigned 0, 1, 2 values can be accumulated for 2^12/2 = 2048 clocks before they overflow. (An overflow would carry into the next 12-bit data word and corrupt the contents of the 48-bit DSP block accumulator.) You could dump the DSP blocks every 2048 clocks into RAM, and then a long-term accumulator could read the RAM and accumulate the data further.

The 44850 CMULTs could feed into 22425 of these 48-bit accumulators. Assuming a device with, say, 4000 DSP blocks, you'd need to implement the other accumulators in the fabric, e.g., 18425 x 48 bits = 880k registers.

Clearly your dominant resource is going to be your accumulation logic, so you'll want to carefully investigate methods of performing a low-bit-width accumulation of the CMULT output, then use RAM-based long-term accumulation, and then get the data off the chip.

What is the advantage of RAM accumulation? If the fast accumulation occurs for, say, 1000 clocks, your RAM accumulation logic has 1000 clocks in which to do its work, so one accumulator can read two RAMs and accumulate the two values for 1000 different partial products, i.e., you save 999 accumulators by reusing one.

This system sounds like it could fit into one FPGA. You'd have to figure out how to get all 300 inputs onto the FPGA, e.g., perhaps using 600 LVDS receivers (assuming you could find a device with that many).
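The packed sub-accumulator idea above can be modelled in software as a rough sketch. Everything here is an assumption for illustration: the names `pack`, `unpack` and `integrate`, the field ordering, and the Python lists standing in for the long-term RAM accumulators. The dump period is set to 2047 rather than 2048, because 2048 clocks of the value 2 reach 4096, one past the largest 12-bit value.

```python
FIELD_BITS = 12                  # width of each sub-accumulator
N_FIELDS = 4                     # four fields in one 48-bit word
FIELD_MASK = (1 << FIELD_BITS) - 1
WORD_MASK = (1 << (FIELD_BITS * N_FIELDS)) - 1
DUMP_PERIOD = 2047               # 2047 * 2 = 4094 still fits in 12 bits


def pack(codes):
    """Pack four codes (each 0, 1 or 2) into one 48-bit word, first code on top."""
    word = 0
    for code in codes:
        word = (word << FIELD_BITS) | code
    return word


def unpack(word):
    """Split a 48-bit word back into its four 12-bit fields, top field first."""
    return [(word >> (FIELD_BITS * i)) & FIELD_MASK
            for i in reversed(range(N_FIELDS))]


def integrate(samples, dump_period=DUMP_PERIOD):
    """Accumulate 4-tuples of codes with one wide add per clock.

    The short accumulator is dumped into long-term totals every
    dump_period clocks, and once more at the end.
    """
    totals = [0] * N_FIELDS
    acc = 0
    count = 0
    for codes in samples:
        acc = (acc + pack(codes)) & WORD_MASK
        count += 1
        if count == dump_period:
            totals = [t + f for t, f in zip(totals, unpack(acc))]
            acc = 0
            count = 0
    return [t + f for t, f in zip(totals, unpack(acc))]


totals = integrate([(2, 1, 0, 2)] * 5000)
# Without a dump, 2048 clocks of code 2 carry into the neighbouring field.
overflowed = unpack(2048 * pack((0, 0, 0, 2)))
print(totals, overflowed)
```

The last line shows the corruption mechanism: the lowest field wraps to 0 and the field above it gains a spurious count.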
The other option to consider is several lower-cost FPGAs, e.g., 4 x FPGAs with 150 LVDS pairs each, linked together with enough serdes links to take the data between the devices that need it.

You should start by prototyping a simple design in HDL and testing it using an existing development kit.

Cheers,
Dave

On 12/18/2015 8:23 AM, Neil Salmon wrote:

Thank you for your response.

The system is part of a generic microwave/mm-wave aperture synthesis imaging system, so there’s an array of front-end heterodyne receivers with an IF earmarked at a centre frequency of 3 GHz (away from Wi-Fi and mobile comms), but the bandwidth is 300 MHz. The front end may initially be receiving at a centre frequency of 20 GHz, but I could change this to 10 GHz or go up to 35 GHz. (I’ll be taking a single polarisation, say horizontal or right-hand circular; I’ve not decided yet.)

Either way, I’ll need to digitise this 300 MHz bandwidth on each channel, and I’m quite happy with the loss in SNR from using single-bit digitisation. To satisfy the Nyquist criterion there will be I & Q channels, each generating a data stream at 300 M samples per second, i.e. a total of 600 Mbps for both I & Q per receiver channel, giving a total data rate of 180 Gbps. (The sampling clock and mixing LOs will be synched to a master oscillator.) (As for the 3 GHz centre frequency, the I & Q digitisation could be bandpass sampling/digital down conversion, or a second analogue downshift using a matched pair of mixers and then comparators in each section to generate the I and Q digits.)

So there will be this huge rate of I & Q data from 300 channels that needs to be cross-correlated in real time with 95% duty cycle to avoid loss of SNR. (Software correlation would generate just too much data for hard disk, and a GPU PCIe bus solution couldn’t cope with the data rate, or at least I’d be uncomfortable about working close to the data rate ceilings of PCIe.) That leaves the FPGA solution.
So I need some high-speed data bus to get the data into the FPGAs for cross-correlation. As I’m working with single bits, XOR gates will do nicely for the cross-multiplies, and I want to store the four components of the cross-multiply in separate registers, just for diagnostics / troubleshooting. This gives an XOR op rate of 54 T ops/sec and a requirement for 180,000 accumulation registers.

For me the challenges will be getting the arrays of single-bit digitisers, linking them to the cross-correlators, and doing the cross-correlation at this huge rate. The build of the analogue front-end heterodyne array and the image formation algorithms I’ve done before. It’s just the digital hardware I need to sort.

I’ve got a few researchers and postgrads around me in the engineering department who have general educational interests in FPGA technologies, and the Xilinx and Altera University representative support under the university agreement. So I’m just wondering if I can do this with the CASPER tools, or others if necessary.

Hope this extra information helps.

Many thanks for your help.

Neil

From: James Smith [mailto:jsm...@ska.ac.za]
Sent: 18 December 2015 14:25
To: Neil Salmon
Cc: casper@lists.berkeley.edu
Subject: Re: [casper] building 300-receiver channel cross-correlator

Hello Neil,

CASPER tools could probably do what you're looking for, but I found your description a bit confusing. You're going to need to clarify somewhat.

Regards,
James

On Fri, Dec 18, 2015 at 4:15 PM, Neil Salmon <n.sal...@mmu.ac.uk> wrote:

Anyone help?

I’m working in academia and need to build a 300-receiver channel single-bit digitiser / cross-correlator with a single frequency channel having a bandwidth of 300 MHz, centre frequency ~3 GHz. The single-bit digitisers sample I&Q, giving a total data rate of 180 Gbps, and using XOR gates to do the cross-correlations, the total computation rate is 54 T XOR operations per second.
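The headline figures quoted in this thread (180 Gbps, about 54 T XOR operations per second, about 180,000 accumulators) can be reproduced with a few lines of arithmetic. The sketch below assumes autocorrelations are excluded, as in the 300x299/2 count used elsewhere in the thread, and assumes a bit coding of 1 for +1 and 0 for -1. The function names are made up for the example.

```python
N_RX = 300                      # receiver channels
SAMPLE_RATE_HZ = 300_000_000    # samples per second on each of I and Q
BITS_PER_CLOCK = 2              # one I bit plus one Q bit per receiver


def bit_to_value(bit):
    """Assumed coding: bit 1 means +1, bit 0 means -1."""
    return 2 * bit - 1


def xor_product(x, y):
    """Product of two 1-bit samples computed with XOR (0 maps to +1, 1 to -1)."""
    return 1 - 2 * (x ^ y)


# XOR reproduces the +/-1 multiply for every bit combination.
xor_matches_multiply = all(
    xor_product(x, y) == bit_to_value(x) * bit_to_value(y)
    for x in (0, 1) for y in (0, 1)
)

input_rate_bps = N_RX * BITS_PER_CLOCK * SAMPLE_RATE_HZ
n_pairs = N_RX * (N_RX - 1) // 2        # cross-correlations only
xor_per_pair = 4                        # II, IQ, QI, QQ
n_accumulators = n_pairs * xor_per_pair
xor_rate_per_s = n_accumulators * SAMPLE_RATE_HZ

print(input_rate_bps, n_pairs, n_accumulators, xor_rate_per_s)
```

Excluding autocorrelations gives 179,400 accumulators and about 53.8 T XOR operations per second, which round to the 180,000 and 54 T quoted above.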
I need to accumulate cross-correlations typically for times ranging from 10 ms to a few seconds. The system would comprise an array of single-bit digitisers linked via a high-speed data bus to FPGA boards for the cross-correlation/accumulation.

I’ve no skills in board design but could probably learn VHDL. I don’t have funding to commission a design and build, but wondered if anyone in this community could advise how I should go about building this system at our university.

Thank you for any help you can provide.