Re: [ARTIQ] DSP gateware

2016-08-03 Thread j arl
Thank you for the detailed study, Robert.


> This setup can -- for example -- generate a two-tone signal at 162 MHz
> and 238 MHz by setting f0=157 MHz, f1=5 MHz, f2=81 MHz. The attached
> plot has the data and the spectrum from a bit-accurate simulation of
> the full FPGA gateware. Units are "natural" (sample rate=1, full
> scale=1): the relevant tones are close to 0.1 and 0.15 sample rate.
> Output amplitude is below clipping.
>

Thank you for the specific example.


> * 200 MHz is a bit under maximum achievable speed for this logic on a
> -2 speed grade kintex 7.
>

Can -1 speed grade on UltraScale handle generation at the 1 Gb/s data
rate?


> * 1.6 GHz * 4 channels is more than we can push to a DAC. The design
> can obviously also run at 1 GHz (f1,f2 at 125 MHz, f0 at 1 GHz) which
> would just about fill eight JESD204B pipes.
>

That is, each DAC requires 2 parallel JESD channels at 10 Gb/s.
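
The lane count follows from simple arithmetic. A minimal sketch, assuming 16-bit samples, 8b/10b line coding (which JESD204B uses) and the 10 Gb/s lane rate quoted above; framing overhead beyond the line coding is ignored and the names are only illustrative:

```python
SAMPLE_BITS = 16            # sample width used throughout the thread
LANE_RATE_BPS = 10e9        # per-lane line rate quoted above
LINE_CODING = 10 / 8        # 8b/10b: 10 line bits per 8 payload bits


def lanes_needed(sample_rate_hz, channels=1):
    """Lanes required to carry `channels` streams at the given sample rate."""
    line_rate = sample_rate_hz * SAMPLE_BITS * LINE_CODING * channels
    return line_rate / LANE_RATE_BPS


for rate in (1.6e9, 1.0e9, 0.8e9):
    print(f"{rate / 1e9:.1f} GS/s: {lanes_needed(rate):.2f} lanes per channel, "
          f"{lanes_needed(rate, channels=4):.1f} lanes for 4 channels")
```

Under these assumptions 1 GS/s needs 2 lanes per channel and 8 lanes for 4 channels, while 1.6 GS/s on 4 channels would need more than 8.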

> * The design can also be built for 800 MHz with significantly lower
> resource usage (then running the f1,f2 NCOs at 200 MHz, f0 at 4*200
> MHz = 800 MHz). This would free a lot of room on the FPGA, fit the
> JESD pipes, and would still be able to comfortably generate the signal
> above.
>
> This demonstrates that we can actually get very good high-data-rate
> two-tone signals for eight channels out of gateware that fits on
> currently available development boards.

Splendid! This leaves room for future features like PID.

-Joe
___
ARTIQ mailing list
https://ssl.serverraum.org/lists/listinfo/artiq


Re: [ARTIQ] DSP gateware

2016-08-03 Thread Robert Jördens
Hi Dave,

On Mon, Aug 1, 2016 at 5:15 PM, Leibrandt, David R. (Fed) wrote:
> 1. I assume this logic would be followed by some sort of digital filter to 
> remove the unwanted Nyquist images.  Have you thought about how good of 
> suppression you might be able to achieve, and at what FPGA resource and phase 
> distortion cost?

That AA filter would be a better interpolator between the summing of
the f1/f2 oscillators and that data being fed into the f0 oscillator.
That filter would suppress the images. Currently there is just a
zeroth-order interpolator. I have played with designing a higher-order
interpolator, and for a CIC the math will survive up to second
order but most likely not for third and higher. Same for FIR.
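
To get a feel for how much the images are suppressed, here is a minimal sketch using the textbook CIC magnitude response |sin(pi*f*R) / (R*sin(pi*f))|^N, with f in units of the output sample rate, R the rate change and N the number of stages. In that description N=1 is a plain sample-and-hold. The rates (200 MHz in, 8x up) and tones (5 MHz, 81 MHz) are taken from the design study; how the stage count maps onto the "order" mentioned above is not settled here, and the function names are hypothetical.

```python
import math


def cic_gain(f_hz, fs_out_hz, ratio, stages=1):
    """Magnitude response of a CIC interpolator, normalized to unity at DC."""
    x = math.pi * f_hz / fs_out_hz
    den = ratio * math.sin(x)
    if abs(den) < 1e-15:
        return 1.0  # DC and multiples of the output rate
    return abs(math.sin(ratio * x) / den) ** stages


def image_suppression_db(tone_hz, fs_in_hz, ratio, stages):
    """First image (at fs_in - tone) relative to the wanted tone, in dB."""
    fs_out = fs_in_hz * ratio
    wanted = cic_gain(tone_hz, fs_out, ratio, stages)
    image = cic_gain(fs_in_hz - tone_hz, fs_out, ratio, stages)
    return 20 * math.log10(image / wanted)


for tone in (5e6, 81e6):
    for stages in (1, 2):
        db = image_suppression_db(tone, 200e6, 8, stages)
        print(f"{tone / 1e6:.0f} MHz tone, {stages} stage(s): image at {db:.1f} dB")
```

With a single stage the first image of the 5 MHz tone comes out roughly 32 dB down, but that of the 81 MHz tone only about 3 dB down; each extra stage multiplies the dB figure accordingly.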

> 2. Do you have an idea of the latency of the signal chain?  Say I wanted to 
> do a phase lock by feeding new p1 values into the RTIO.  What sort of 
> bandwidth could I achieve?

The p1 latency is about 37 cycles at 5 ns/cycle: a few miscellaneous
cycles here and there plus two CORDICs' worth of latency, each 16 bits + 3
guard bits.
Currently I have the latencies of all components matched so that RTIO
events on the different spline interpolators would automatically
arrive in the data stream time-aligned. For local feedback in e.g. PID
loops I would inject that feedback signal so that there is minimal
latency. For e.g. feedback on the p0 term that would be around 20
cycles, also at 5 ns/cycle. The u term can probably be as fast as one
or two cycles, the entire signal loop latency limited by other things.
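
For a rough sense of scale, the cycle counts above can be converted to time and to a delay-limited loop bandwidth. This is only a sketch: it counts the gateware delay alone (DAC, analog path, detection and the RTIO round trip add to it), and the 45 degree phase budget is an arbitrary assumption, not a number from this thread.

```python
CYCLE_NS = 5.0  # one 200 MHz clock cycle


def latency_ns(cycles):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles * CYCLE_NS


def delay_limited_bandwidth_hz(delay_ns, phase_budget_deg=45.0):
    """Frequency at which a pure delay uses up the given phase budget."""
    # A delay tau lags a tone at frequency f by 360 * f * tau degrees.
    return phase_budget_deg / (360.0 * delay_ns * 1e-9)


for name, cycles in (("p1 (about 37 cycles)", 37), ("p0 feedback (about 20 cycles)", 20)):
    ns = latency_ns(cycles)
    bw = delay_limited_bandwidth_hz(ns)
    print(f"{name}: {ns:.0f} ns, delay-limited bandwidth ~{bw / 1e3:.0f} kHz")
```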

Robert.


Re: [ARTIQ] DSP gateware

2016-08-03 Thread Robert Jördens
On Mon, Aug 1, 2016 at 3:59 PM, Jonathan Mizrahi  wrote:
> I have one question, just out of curiosity: What is the motivation of
> linking two "buddy" channels in the way you described, with the b and c
> flags to turn these on and off? What application uses this feature?

A pair of buddy channels gives you all the features of two of those
signal generators in an IQ signal stream (thus also twice the
bandwidth and 3 dB more SNR -- if I am not wrong). That means four
tones with two full-bandwidth oscillators. The IQ stream is something
that can be naturally fed into the DAC in question. Take a look at its
block diagram in the datasheet. Only an IQ stream can be coarse
modulated, shifted in frequency by the DAC's NCO, sinc-shaped, and
easily fed to e.g. an analog IQ mixer to get you to another frequency
window.
I just didn't call that pair of buddy channels an "IQ pair" because
when coupled, each channel's signal generators feed both I and Q.
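
As a generic illustration of why an IQ stream carries more than a single real stream (this is textbook IQ mixing, not a model of the DAC's actual datapath), the sketch below upconverts a complex baseband tone and compares it with feeding only the I component. The rates and frequencies are arbitrary, hypothetical values.

```python
import cmath
import math

FS = 1600   # sample rate, arbitrary units
FC = 400    # carrier (e.g. an NCO or analog LO), hypothetical value
F = 50      # baseband tone, hypothetical value
N = 1600    # whole number of cycles for every frequency used here


def upconvert(i, q, n):
    """Ideal IQ mixing: Re((i + 1j*q) * exp(1j*2*pi*FC*n/FS))."""
    w = 2 * math.pi * FC * n / FS
    return i * math.cos(w) - q * math.sin(w)


def amplitude(samples, freq):
    """Amplitude of a real tone at freq, from a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * k / FS)
              for k, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def baseband(n):
    """Complex baseband tone at +F."""
    return cmath.exp(2j * math.pi * F * n / FS)


iq = [upconvert(baseband(n).real, baseband(n).imag, n) for n in range(N)]
i_only = [upconvert(baseband(n).real, 0.0, n) for n in range(N)]

for name, data in (("IQ", iq), ("I only", i_only)):
    print(f"{name}: upper sideband {amplitude(data, FC + F):.2f}, "
          f"lower sideband {amplitude(data, FC - F):.2f}")
```

With both I and Q the energy lands in one sideband only, so positive and negative baseband frequencies stay distinguishable; with I alone it splits evenly between both sidebands.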

Robert.


Re: [ARTIQ] DSP gateware

2016-08-01 Thread Leibrandt, David R. (Fed)
Hi Robert,

This is a nice writeup.  A couple questions for now:
1. I assume this logic would be followed by some sort of digital filter to 
remove the unwanted Nyquist images.  Have you thought about how good of 
suppression you might be able to achieve, and at what FPGA resource and phase 
distortion cost?
2. Do you have an idea of the latency of the signal chain?  Say I wanted to do 
a phase lock by feeding new p1 values into the RTIO.  What sort of bandwidth 
could I achieve?

Thanks,
Dave


Re: [ARTIQ] DSP gateware

2016-07-31 Thread Robert Jördens
Hello,

to fuel the discussion and planning of the smart arbitrary waveform
generator requirements for the different applications, I did another
extended design study for the proposed ARTIQ/Sayma DSP gateware and
signal flow, looking at actual signal quality, resource usage and
possible parametrizations.

This time, take the following parametrization of a channel's output o:

z = (a1*exp(i*(f1*t+p1)) + a2*exp(i*(f2*t+p2))) * exp(i*(f0*t+p0))
o = u + b*Re(z) + c*Im(z_buddy)

* u and a are 16 bit cubic spline interpolators
* p are 16 bit constant (non-) interpolators
* f are 48 bit linear interpolators
* z_buddy refers to the (complex, IQ) z data coming from each
channel's "buddy" channel, ignore it for now
* b, c are switches (with values 0 or 1) that allow a bunch of
different configurations, ignore them for now
* all spline interpolators (u, a, f, p) sample at 200 MHz
* the f1/p1 and f2/p2 oscillators sample at 200 MHz and their data is
fed to the f0/p0 oscillator without interpolation
* the f0/p0 oscillator samples at 8*200 MHz = 1.6 GHz
* data width is at least 16 bit everywhere

This setup can -- for example -- generate a two-tone signal at 162 MHz
and 238 MHz by setting f0=157 MHz, f1=5 MHz, f2=81 MHz. The attached
plot has the data and the spectrum from a bit-accurate simulation of
the full FPGA gateware. Units are "natural" (sample rate=1, full
scale=1): the relevant tones are close to 0.1 and 0.15 sample rate.
Output amplitude is below clipping.
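
The parametrization and the two-tone example can be mimicked in floating point. This is a minimal sketch, not the bit-accurate gateware simulation: it assumes f is given in Hz (so the phase is 2*pi*f*t), constant amplitudes of 0.4 per tone (an arbitrary choice that stays below clipping), zero phases, and a simple sample-and-hold between the 200 MHz and 1.6 GHz domains. All function names are hypothetical.

```python
import cmath
import math

FS_SLOW = 200e6              # f1/f2 oscillator sample rate (from the post)
RATIO = 8                    # f0 oscillator runs at RATIO * FS_SLOW
FS_FAST = RATIO * FS_SLOW    # 1.6 GHz


def channel_output(n, f0, f1, f2, a1=0.4, a2=0.4, p0=0.0, p1=0.0, p2=0.0,
                   u=0.0, b=1, c=0, z_buddy=0j):
    """Floating-point model of one output sample o at fast-sample index n."""
    t_fast = n / FS_FAST
    # Sample-and-hold: the slow oscillators only update every RATIO samples.
    t_slow = (n // RATIO) / FS_SLOW
    slow = (a1 * cmath.exp(1j * (2 * math.pi * f1 * t_slow + p1))
            + a2 * cmath.exp(1j * (2 * math.pi * f2 * t_slow + p2)))
    z = slow * cmath.exp(1j * (2 * math.pi * f0 * t_fast + p0))
    return u + b * z.real + c * z_buddy.imag


def tone_amplitude(samples, freq):
    """Amplitude of a real tone at freq, from a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * k / FS_FAST)
              for k, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


N = 1600  # 1 us of data, a whole number of cycles for all tones used here
samples = [channel_output(n, f0=157e6, f1=5e6, f2=81e6) for n in range(N)]
for f in (162e6, 238e6, 100e6):
    print(f"{f / 1e6:5.0f} MHz ({f / FS_FAST:.3f} of sample rate): "
          f"amplitude {tone_amplitude(samples, f):.3f}")
```

The tones show up at f0+f1 = 162 MHz and f0+f2 = 238 MHz, i.e. close to 0.1 and 0.15 of the 1.6 GHz sample rate. The measured amplitudes need not equal a1 and a2 exactly because the sample-and-hold shapes the spectrum and produces images.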

This is a bit-accurate representation of the data that would be sent
to the DAC. Actual analog output would only differ by the DAC's
interpolation, its analog output transfer function, and DAC noise.
Don't be confused by the way the samples look: this is only due to the
un-interpolated data from the f1/f2 oscillators. Same goes for the
Nyquist images all around. A very rough and conservative estimate for
wideband SNR is > 85 dB not counting the images. There are a lot of
things that can still be tweaked; this demo is not supposed to show
the optimum.

* 200 MHz is a bit under the maximum achievable speed for this logic on a
-2 speed grade Kintex-7.
* 1.6 GHz * 4 channels is more than we can push to a DAC. The design
can obviously also run at 1 GHz (f1,f2 at 125 MHz, f0 at 1 GHz) which
would just about fill eight JESD204B pipes.
* The design can also be built for 800 MHz with significantly lower
resource usage (then running the f1,f2 NCOs at 200 MHz, f0 at 4*200
MHz = 800 MHz). This would free a lot of room on the FPGA, fit the
JESD pipes, and would still be able to comfortably generate the signal
above.
* DAC interpolation could be 2x if desired to get to 2 GHz or 1.6 GHz
DAC sample rate depending on the choice of scenario.
* Eight channels of this 1.6 GHz design occupy about 62% of the LUTs
of a xc7k325t (without _any_ other logic like everything related to
the transceivers, ARTIQ, DRTIO, FIFOs...).
* Wrapping it in a minimal ARTIQ system brings the LUT resource usage
to about 72%.
* On a xcku040 the utilization estimate (same gateware as for the 62%
xc7k325t system) is below 51% (can't get a good number because of a
Xilinx-Vivado bug).
* Take the LUT usage percentages with a grain of salt. They don't
react kindly to extrapolation.
* Interpolation schemes for the f1/p1, f2/p2 oscillator data before it
reaches the f0 oscillator might be interesting to look at.
* Spline knot behavior (ramping, switching, synchronization, latency
matching, interpolation) for frequency, phase, amplitude is as
expected (see e.g. the pdq2 documentation).
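
The three clocking scenarios in the list above can be tabulated with a few lines. A minimal sketch using only numbers from the list; the 2x DAC interpolation is applied to every row for uniformity, although the list mentions it for the 1 GHz and 800 MHz cases, and the dictionary keys are just labels.

```python
TONES_HZ = (162e6, 238e6)   # two-tone example from the post

# label: (f1/f2 oscillator rate in Hz, ratio of f0 rate to f1/f2 rate)
SCENARIOS = {
    "1.6 GHz": (200e6, 8),
    "1 GHz": (125e6, 8),
    "800 MHz": (200e6, 4),
}


def describe(slow_rate_hz, ratio, dac_interpolation=2):
    """Derive data rate, Nyquist limit and DAC rate for one scenario."""
    data_rate = slow_rate_hz * ratio
    return {
        "data_rate_hz": data_rate,
        "nyquist_hz": data_rate / 2,
        "dac_rate_hz": data_rate * dac_interpolation,
        "tones_fit": all(f < data_rate / 2 for f in TONES_HZ),
    }


for label, (slow, ratio) in SCENARIOS.items():
    d = describe(slow, ratio)
    print(f"{label}: data {d['data_rate_hz'] / 1e6:.0f} MHz, "
          f"Nyquist {d['nyquist_hz'] / 1e6:.0f} MHz, "
          f"DAC at 2x {d['dac_rate_hz'] / 1e9:.1f} GHz, "
          f"two-tone example fits: {d['tones_fit']}")
```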

This demonstrates that we can actually get very good high-data-rate
two-tone signals for eight channels out of gateware that fits on
currently available development boards. The parametrization is
intuitive and extremely flexible (you can e.g. rewire it at run-time
to exploit and feed the full IQ datapaths of the DACs giving you twice
the bandwidth on half the channels and all the other features in the
DAC and downstream). Any set of spline interpolators can receive new
knot data at the same time from their RTIO FIFOs: there is no
contention. The design works just as well for driving electrodes (the
u spline and maybe one of the oscillators to prod an ion). It is
broadband (the f0/p0 oscillator covers the entire data bandwidth). The
gateware as-is could also feed two IQ pairs at 1 GHz giving you full
and instant broadband access with each pair to 1 GHz IQ baseband in
the first Nyquist zone which you can up-convert in analog RF to
wherever you want. Or you can rethink it and feed it two IQ pairs at
600 MHz (4*150 MHz), use 4x interpolation and cover 2.4 GHz IQ
baseband with each pair using the DAC's fine or coarse modulation
schemes.

If there are questions about this, I'd be happy to answer them. We'd
also be happy to generate a quote for an implementation and/or a
hardware demonstrator system.

Regards,

-- 
Robert Jördens.


phaser_2fd7bfd.pdf
Description: Adobe PDF document

Re: [ARTIQ] DSP gateware

2016-04-01 Thread Robert Jördens
On Fri, Apr 1, 2016 at 1:09 AM, Slichter, Daniel H. (Fed) wrote:
>> And a 16 ns pulse would be just about 20 samples. Why would you want to
>> describe that using ~4 spline knots each being maybe 16 times 16 bits in
>> data?
>> If you need the full bandwidth, the idea of compression using splines is not
>> very helpful. In that case you would need to design in a little "real" AWG
>> player that plays snippets from a wide BRAM.
>
> Sure, that is a better solution for these kinds of things.  I am just saying 
> that unless we have some suitable feature like this, no superconducting 
> people will be interested in the system.  So we should design things in such 
> a way that this is a possibility, to maximize the target audience.

Sure. If people want "real" sample-based AWG, it should go into the
specification. I was just pointing out that trying to do it with
spline interpolators is not particularly bright.
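
The size argument quoted above can be made concrete. A minimal sketch, assuming the 1.25 GS/s, 16 bit output of the original proposal and the rough "~4 knots, 16 times 16 bits each" figure from the quote; the function names are hypothetical.

```python
SAMPLE_RATE_GSPS = 1.25     # output rate from the original proposal
SAMPLE_BITS = 16


def sample_bits(pulse_ns):
    """Bits needed to store the pulse as raw samples."""
    return int(pulse_ns * SAMPLE_RATE_GSPS) * SAMPLE_BITS


def spline_bits(knots, words_per_knot=16, word_bits=16):
    """Bits needed for the spline knots, using the rough sizes quoted above."""
    return knots * words_per_knot * word_bits


print("raw samples :", sample_bits(16), "bits")
print("spline knots:", spline_bits(4), "bits")
```

For a 16 ns pulse the raw samples take 320 bits against 1024 bits for the knots, so at full bandwidth the spline description is the larger one.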


Re: [ARTIQ] DSP gateware

2016-03-31 Thread Slichter, Daniel H. (Fed)
> to allow for FPGA selection and to rush the funding I have done a design
> study and implemented a basic DSP output channel for the ARTIQ DSP
> hardware. A 1.25 GS/s, 16 bit, "smart" channel pair would do
> 
> o0 = u0 + i0 * a0 * cos(f0 * t + p0) + q1 * a1 * sin(f1 * t + p1)
> o1 = u1 + q0 * a0 * sin(f0 * t + p0) + i1 * a1 * cos(f1 * t + p1)
> 
> * u and a are 16 bit cubic spline interpolators
> * p are 16 bit constant (non-) interpolators
> * f are 48 bit linear interpolators
> * i and q are switches (0 or 1) that allow many different configurations,
> among them single tone independent, two-tone, single tone iq, and two-
> tone iq all with independent dc offsets
> * the interpolators interpolate at 1/8 output rate, the DUCs output at full
> rate (effectively).
> * all designed for 16 bit spline knot duration resolution and scalable spline
> interpolation clock

This looks like a good general purpose method for defining signals, which 
accommodates most possible use cases in a clean and concise way.  A few 
potential comments:
- For sc qubit applications, it would be necessary to update the u and a 
interpolators at the full output rate (~1.25 GSPS), since pulses are often only 
10-20 ns long and require nontrivial shaping over those periods of time 
(sometimes 2 envelope oscillations up and down, see e.g. 
http://arxiv.org/pdf/1405.0450v2.pdf on "wah-wah" pulses, which are commonly 
used).  In general, having u and a only updated at 1/8 clock rate will give 
rise to spurs at 1/4 of the Nyquist frequency and harmonics, which is 
undesirable for any application.  Perhaps I am misunderstanding what you mean 
by the interpolators running at 1/8 output rate.
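
For reference, the frequencies in question are easy to list. A minimal sketch that only does the arithmetic for a 1.25 GS/s output with interpolators updating at 1/8 of that rate; whether and how strongly spurs actually appear there depends on how the interpolator outputs are held or filtered, which is not modeled.

```python
FS_HZ = 1.25e9          # output sample rate from the proposal
UPDATE_DIVIDER = 8      # interpolators run at 1/8 of the output rate

nyquist_hz = FS_HZ / 2
update_rate_hz = FS_HZ / UPDATE_DIVIDER

# Multiples of the update rate that fall at or below Nyquist.
harmonics_hz = [k * update_rate_hz for k in range(1, UPDATE_DIVIDER // 2 + 1)]

print(f"update rate: {update_rate_hz / 1e6:.2f} MHz "
      f"= {update_rate_hz / nyquist_hz:.2f} of Nyquist")
print("multiples up to Nyquist (MHz):", [h / 1e6 for h in harmonics_hz])
```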


> This uses about 28 kLUT, 14% of a xc7k325t. The timing, parsing, serial link,
> rtlink, drtio, jesd phy, gearbox, monitoring, digital servo, adc logic will
> probably add another 10-20 kLUT per channel pair but this is the dominant
> chunk.
> 
> This looks good for the xc7a200t or a xc7k325t as the building block and 4
> channels (two smart channel pairs).

Will changing the update rate for the spline interpolators make things much 
larger?  I assume they would have to be physically parallelized.  
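
For a sense of the headroom behind this question, the LUT figures quoted above can be turned into a rough budget. This is a sketch under assumptions: the 28 kLUT figure is taken to be per channel pair, the device LUT counts are taken from the Xilinx 7 series product tables (worth double-checking), and nothing here estimates the cost of parallelized interpolators.

```python
# LUT counts per device, from the Xilinx 7 series product tables (assumed)
DEVICE_LUTS = {"xc7k325t": 203_800, "xc7a200t": 134_600}

DSP_KLUT_PER_PAIR = 28              # DSP channel pair, from the post
OVERHEAD_KLUT_PER_PAIR = (10, 20)   # estimated range for everything else


def utilization(device, pairs, overhead_klut):
    """Fraction of the device's LUTs used by `pairs` channel pairs."""
    used = pairs * (DSP_KLUT_PER_PAIR + overhead_klut) * 1000
    return used / DEVICE_LUTS[device]


for device in DEVICE_LUTS:
    low, high = (utilization(device, 2, o) for o in OVERHEAD_KLUT_PER_PAIR)
    print(f"{device}: two channel pairs use {low:.0%} to {high:.0%} of the LUTs")
```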


Re: [ARTIQ] DSP gateware

2016-03-31 Thread Grzegorz Kasprowicz
Yes, but for such speed you don't need to match better than several mm.
Greg

On 31 March 2016 at 13:51, Robert Jördens  wrote:

> On Thu, Mar 31, 2016 at 8:51 AM, Florent Kermarrec
>  wrote:
> > When choosing between Artix7 and Kintex7 you also have to consider that
> > Artix7 only has HR IOs, which means it doesn't have ODELAYE2 primitives,
> > and we are currently using them in the actual DDR PHY for leveling.
> >
> > Also, when choosing XC7A200T you will be stuck with this FPGA on your
> > board because its package is different from the other Artix7 parts. With
> > Kintex7, in the slice range you are targeting, you will have more
> > flexibility:
> > - FBG676 (8 transceivers): from XC7K70T to XC7K410T
> > - FFG676 (8 transceivers): from XC7K160T to XC7K410T
> > - FFG900 (16 transceivers): from XC7K325T to XC7K410T
>
> ACK. Good to know about the IODELAY in Artix.
> I guess the alignment on e.g. http://www.ohwr.org/projects/afc/wiki is
> done by trace length matching then, right?
>
> Robert.


Re: [ARTIQ] DSP gateware

2016-03-31 Thread Florent Kermarrec
Hello,

When choosing between Artix7 and Kintex7 you also have to consider that
Artix7 only has HR IOs, which means it doesn't have ODELAYE2 primitives, and
we are currently using them in the actual DDR PHY for leveling.

Also, when choosing XC7A200T you will be stuck with this FPGA on your board
because its package is different from the other Artix7 parts. With Kintex7,
in the slice range you are targeting, you will have more flexibility:
- FBG676 (8 transceivers): from XC7K70T to XC7K410T
- FFG676 (8 transceivers): from XC7K160T to XC7K410T
- FFG900 (16 transceivers): from XC7K325T to XC7K410T

Florent

2016-03-30 21:49 GMT+02:00 Robert Jördens:

> Hello,
>
> to allow for FPGA selection and to rush the funding I have done a
> design study and implemented a basic DSP output channel for the ARTIQ
> DSP hardware. A 1.25 GS/s, 16 bit, "smart" channel pair would do
>
> o0 = u0 + i0 * a0 * cos(f0 * t + p0) + q1 * a1 * sin(f1 * t + p1)
> o1 = u1 + q0 * a0 * sin(f0 * t + p0) + i1 * a1 * cos(f1 * t + p1)
>
> * u and a are 16 bit cubic spline interpolators
> * p are 16 bit constant (non-) interpolators
> * f are 48 bit linear interpolators
> * i and q are switches (0 or 1) that allow many different
> configurations, among them single tone independent, two-tone, single
> tone iq, and two-tone iq
> all with independent dc offsets
> * the interpolators interpolate at 1/8 output rate, the DUCs output at
> full rate (effectively).
> * all designed for 16 bit spline knot duration resolution and scalable
> spline interpolation clock
>
> This uses about 28 kLUT, 14% of a xc7k325t. The timing, parsing,
> serial link, rtlink, drtio, jesd phy, gearbox, monitoring, digital
> servo, adc logic will probably add another 10-20 kLUT per channel pair
> but this is the dominant chunk.
>
> This looks good for the xc7a200t or a xc7k325t as the building block
> and 4 channels (two smart channel pairs).
>
> I haven't implemented, benchmarked, or tested the latest X suggestions
> and design tweaks from article Y in journal Z.
>
> Robert.