Re: [casper] Decrease in Spectrometer Performance / PlanAhead

2014-09-16 Thread Ryan Monroe
I support what Dan and Danny have said here.  It's likely that I've used
PlanAhead more than anyone else in CASPER and have never experienced this
problem ("Use of planahead reduces system dynamic range").  Here are things
I *have* seen:

1. Slow designs don't meet timing on some ADC input bits -> errors occur when
signals with enough input power toggle those bits
2. Some ADC (katadc I think?) has an SPI interface to control various
settings.  I couldn't make it clock faster than ~280 MHz (even though the
rest of the design ran at 325).  Since ADC configuration usually seems to
survive an FPGA reprogram, we had to configure the ADC with a slower design
and then reprogram to the faster one
3. Design didn't meet timing.  ADC samples came out right, but I started
seeing errors when I cranked up the input power.  I have a theory involving
propagation delays into upper bits which explains this problem, but it's
not going to fit into this email

I would suggest adding snap blocks to the ADCs in both designs, so that
you can get raw ADC samples.  Then pick the "working" design, and make sure
you can produce ADC sample waveforms which look right (full dynamic range
exercised, waveform of the correct period).  Now that you have the snap
decoding scheme correct, go to the non-working design and check whether
the raw samples coming in are good as well.  I'm betting that you'll find
your problem there.
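To make that concrete, here's a minimal sketch of how I'd check a raw snap capture.  The hardware calls in the comment are assumptions about your setup (the snap-block name 'adc_snap' is hypothetical, and `snapshot_get` is from the corr package's `FpgaClient`); the unpack/sanity helpers below are plain Python and run offline:

```python
import struct

def unpack_adc8(raw):
    """Unpack a raw snapshot byte buffer into signed 8-bit ADC samples."""
    return list(struct.unpack('>%db' % len(raw), raw))

def looks_healthy(samples, min_span=32):
    """Crude sanity check: the capture should exercise a reasonable
    fraction of the 8-bit range and not be stuck at a rail."""
    span = max(samples) - min(samples)
    railed = all(s in (-128, 127) for s in samples)
    return span >= min_span and not railed

# On hardware (hypothetical names -- 'adc_snap' must match the snap
# block's name in your model; FpgaClient is from the corr package):
#   fpga = corr.katcp_wrapper.FpgaClient('roach_hostname')
#   raw = fpga.snapshot_get('adc_snap', man_trig=True)['data']
#   print(looks_healthy(unpack_adc8(raw)))
```

Run the same decode against both bofs; if the "bad" design's raw samples already look wrong, your problem is upstream of the DSP.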

Cheers

--Ryan

On Tue, Sep 16, 2014 at 10:44 AM, Raul Sapunar Opazo 
wrote:

> Danny,
>
> I am actually using that ADC on a ROACH 2 rev 2 board... but I am using the
> same ADC calibration for all my spectrometers, and as I said before, for
> the one where PlanAhead was not needed, the dynamic range is fine.
>
> Regards,
> Raul
>
> 2014-09-16 14:13 GMT-03:00 Danny Price :
>
> Hi Raul
>>
>> As Dan mentioned, planahead shouldn't make a difference to things like
>> dynamic range. If you're using an ADC that interleaves several cores (eg
>> 5GSPS ADC), this could be something to do with poor ADC calibration. I'd
>> suggest looking into that as a possible cause?
>>
>> Regards
>> Danny
>>
>>> Raul Sapunar Opazo 
>>> September 16, 2014 at 10:21 AM
>>>
>>> Hello everyone,
>>>
>>> I used PlanAhead to fix the timing problems of a 1.8 GHz 32k-channel
>>> spectrometer on ROACH 2. I did it without trouble, using 38% of the DSPs
>>> and less than 50% of the RAMs.
>>>
>>> After I compiled the model with the new constraints/placement and
>>> tested the bof, I found that the numeric noise increased a lot, as did
>>> the power of the harmonics...
>>>
>>> This issue results in a big decrease of the dynamic range of the
>>> spectrometer across the whole BW (around 25 dB, against 40-45 dB for a
>>> 1.8 GHz 16k-ch spectrometer that didn't need PlanAhead)
>>>
>>> I encountered this problem with a 1.8 GHz 8k-ch design as well, so the
>>> issue is not related to the increase in the number of channels.
>>>
>>> Has anyone encountered this kind of problem when using PlanAhead? As I
>>> understand it, PlanAhead shouldn't change the performance because it
>>> doesn't make any changes to the designs...
>>>
>>> Any help will be appreciated!
>>>
>>> Best Regards,
>>> Raul Sapunar
>>>
>>
>


[casper] Starburst, an open-source 10gsps low-N correlator for ROACH2

2014-10-29 Thread Ryan Monroe
Hey guys,

The CASPER community has been a great help to me in the past few years.
People have asked for my libraries and due to JPL policy, I've always had
to turn them away.  Thanks to help from Bob Jarnot, Jonathon Kocz and
others, I'm now free to open-source some of my designs/libraries.

For my PhD, I'm designing a 10gsps correlator.  I'd really like for this to
be an extremely versatile design, useful for radio astronomy and
Earth-observing science, and good for all broadband, low-N applications.  *If
there are any special features you'd like to see in this design, beyond
what is listed below, tell me now!*  I'm willing to add them, but I have to
know before everything is finished up.

Stats are:

(note: N bits complex means N bits for each of real and imag)

Mode "A":
Dual-polarization full-stokes,
2.5 GHz per pol
8192-channel (per pol)
8-tap hamming PFB

Mode "B":
I/Q separating spectrometer
5 GHz total bandwidth
16384 channels across entire band
8-tap hamming PFB

Features common to both:
Time-domain delay tracking (sample resolution; 48k-sample range)
Frequency domain delay tracking (linear interpolation, set two registers to
update)
Bandpass calibration (applied before I/Q separation): unique 16-bit complex
gain applied to each signal, giving sideband rejection much greater than the
ADC SNR
10GbE full-duty-cycle dump rate (4 bits complex per sample)
1GbE accumulation dumps.  Accumulations supported: [10ms -> 100s for
spectrometer only]; [10ms -> 1s for correlator]
Everything is synchronized off 1pps and the end of an FFT.
Triggered accumulations via GPIO, software register or 1pps (accumulations
can be one-off or continuous)
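For the curious, the frequency-domain delay tracking listed above boils down to a per-channel linear phase ramp; a quick numpy sketch (my illustration of the math, not the actual firmware):

```python
import numpy as np

def apply_fractional_delay(spectrum, delay_samples):
    """Apply a sub-sample delay to a full FFT spectrum by multiplying
    each channel k by the linear phase ramp exp(-2j*pi*k*d/N)."""
    n = len(spectrum)
    k = np.fft.fftfreq(n) * n  # signed channel indices 0..N/2-1, -N/2..-1
    ramp = np.exp(-2j * np.pi * k * delay_samples / n)
    return spectrum * ramp

# delay a complex tone by 0.5 samples; compare against the analytic shift
n = 1024
t = np.arange(n)
x = np.exp(2j * np.pi * 5 * t / n)            # tone in bin 5
y = np.fft.ifft(apply_fractional_delay(np.fft.fft(x), 0.5))
expected = np.exp(2j * np.pi * 5 * (t - 0.5) / n)
print(np.allclose(y, expected))               # True
```

In firmware the ramp reduces to one phase-increment per channel, which is why updating the delay only takes setting a couple of registers.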

In addition, the design will include an X-engine correlator (2 antennas,
each 2-pol).  The corner turn is performed simply by wiring 10GbE cables.
The design can be used as just a spectrometer, though.  The design requires
an FPGA clock rate of 312.5 MHz, but I'm going to try for 375 MHz so that we
can overclock if we want to (or if we get better ADCs later)

I really want to make this a versatile, general purpose, broadband,
spectrometer/low-N correlator.

Features I could add if people want:

DDR circular buffer (4 bits of each ADC sample, 1.6 s of buffer @ 16 GB of
RAM) [requested by tom kuiper/majin walid]
Larger X-engine (4 dual-pol antennas for charity; I could do 8, but it would
be lots of work, so we'll have to talk in that case)
ADC core matching (if my old firmware for this still works!)
*your feature request here*


I look forward to your input!  As a friendly reminder, my track record for
designing FPGA firmware is extremely good, but this might not all pan out
as expected.  I'm making no promises quite yet.

Timeline is currently to have simulated firmware which meets timing at
312.5 MHz (equals 5 GHz total bandwidth) by Dec 1.  Fingers crossed!


--Ryan


Re: [casper] Starburst, an open-source 10gsps low-N correlator for ROACH2

2014-10-29 Thread Ryan Monroe
> If you really are able to get a comparable
> design running at 375 MHz with -1 speed grade parts, honestly you’d deserve
> an attaboy or two.  And we’d gladly learn from how you got there, so please
> keep us in the loop.
>
> By the way, assuming you are using the ADC referenced with the
> architecture you describe I’d suggest it is appropriate to cite all above
> referenced and other relevant prior work in your PhD.
>
> Best of luck with it.
>
> Jonathan and SMA / EHT team
>
>
>
>
>
> > On Oct 29, 2014, at 6:25 PM, Ryan Monroe 
> wrote:
> >
> > Hey guys,
> >
> > The CASPER community has been a great help to me in the past few years.
> People have asked for my libraries and due to JPL policy, I've always had
> to turn them away.  Thanks to help from Bob Jarnot, Jonathon Kocz and
> others, I'm now free to open-source some of my designs/libraries.
> >
> > For my PhD, I'm designing a 10gsps correlator.  I'd really like for this
> to be an extremely versatile design, useful for radioastronomy and
> earth-observing-science, good for all broadband, low-N applications.  If
> there are any special features you'd like to see in this design, beyond
> what is listed below, tell me now!  I'm willing to add it, but I have to
> know before everything is finished up.
> >
> > Stats are:
> >
> > (note: N bits complex means N bits for each of real and imag)
> >
> > Mode "A":
> > Dual-polarization full-stokes,
> > 2.5 GHz per pol
> > 8192-channel (per pol)
> > 8-tap hamming PFB
> >
> > Mode "B":
> > I/Q separating spectrometer
> > 5 GHz total bandwidth
> > 16384 channels across entire band
> > 8-tap hamming PFB
> >
> > Features common to both:
> > Time-domain delay tracking (sample resolution; 48k-sample range)
> > Frequency domain delay tracking (linear interpolation, set two registers
> to update)
> > Bandpass calibration (applied before I/Q separation): unique 16 bit
> complex gain applied to each signal= sideband rejection much greater than
> ADC SNR
> > 10GBE full-duty cycle dump rate (4bits complex per sample)
> > 1GBE accumulation dumps.  accumulations supported [10ms -> 100s for
> spectrometer only]; [10ms -> 1s for correlator]
> > Everything is synchronized off 1pps and the end of an FFT.
> > Triggered accumulations via GPIO, software register or 1pps
> (accumulations can be one-off or continuous)
> >
> > In addition, the design will include a X-engine correlator (2 antennas,
> each 2-pol).  The corner turn is performed simply by wiring 10gbe cables.
> The design can be used as a spectrometer though.  The design requires an
> FPGA clock rate of 312.5 MHz, but I'm going to try for 375 MHz so that we
> can overclock if we want to (or if we get better ADCs later)
> >
> > I really want to make this a versatile, general purpose, broadband,
> spectrometer/low-N correlator.
> >
> > Features I could add if people want:
> >
> > DDR circular buffer (4bits of each adc sample, 1.6s of buffer@16 GB of
> ram) [requested by tom kuiper/majin walid]
> > Larger x-engine (4 dual-pol antennas for charity, I could do 8 but it
> would be lots of work so we'll have to talk in that case)
> > ADC core matching (if my old firmware for this still works!)
> > your feature request here
> >
> >
> > I look forward to your input!  As a friendly reminder, my track record
> for designing FPGA firmware is extremely good, but this might not all pan
> out as expected.  I'm making no promises quite yet.
> >
> > Timeline is currently to have simulated firmware which meets timing at
> 312.5 MHz (equals 5 GHz total bandwidth) by dec1.  Fingers crossed!
> >
> >
> > --Ryan
>
>


Re: [casper] Starburst, an open-source 10gsps low-N correlator for ROACH2

2014-10-29 Thread Ryan Monroe
They're not released yet; I'm going to deal with that once I've gotten the
design up and running :-)

Thanks for all your help as well!

On Wed, Oct 29, 2014 at 6:21 PM, Jonathan Weintroub <
jweintr...@cfa.harvard.edu> wrote:

> Hi Ryan,
>
> Thanks for the response.
>
> To answer your question we use 2^15 = 32 k FFTs operating on 8 bit real
> time samples, to channelize our visibility spectrum to 2^14 = 16k complex
> points.  There is a pair of these 2^15 point PFBs on each Virtex 6, one for
> each 5 Gsps ADC input.
>
> We’d certainly be interested in learning about your custom FFT libraries
> especially if these may be helpful in getting to timing closure.  We do
> seem to be I/O bound in this design, by the way.
>
> I need to leave it there for tonight.
>
> Best wishes,
>
> Jonathan
>
>
> > On Oct 29, 2014, at 9:00 PM, Ryan Monroe 
> wrote:
> >
> > Hi Jonathan!  Reply is inline (in blue)
> >
> >
> >
> > Hi Ryan,
> >
> > That does look cool!   You don’t mention which ADC you plan to use.  Is
> it this one?
> >
> > https://casper.berkeley.edu/wiki/ADC1x5000-8
> >
> > That's the one.
> >
> > Just to mention in case it proves useful that our group at Submillimeter
> Array (SMA) and Event Horizon Telescope (EHT)  has been working on a
> correlator / phased array system with what appear to be rather similar
> features (low N, wideband, high spectral resolution 32 k PFB etc) using the
> above ADC (DMUX 1:1 version) and ROACH2.   We view it as dual 5 Gsps, but I
> suppose one might interpret that as 10Gsps.  There are specs, a little
> outdated, here:
> >
> > https://www.cfa.harvard.edu/twiki5/view/SMAwideband/DigitalBackEnd
> >
> > This page includes a link to our open source github repo with all model
> files.
> >
> > We have done a fair amount of work on ADC core calibration too, also on
> the wiki, poke around.  The key results were recently published here:
> >
> http://www.worldscientific.com/doi/pdfplus/10.1142/S2251171714500019?src=recsys
> >
> > I've seen your work here and it's going to be extremely helpful.  Thanks!
> >
> >
> > There is also a recent publication by Jiang et al  on the ADC in PASP:
> > Vol. 126, No. 942 (August 2014), pp. 761-768
> >
> > At this point we have the logic for this correlator reduced to a fully
> working V6 bit code with all features except the phased array (design in
> progress).  In fact, we are routinely taking observational data at SMA, and
> plan to field it for science in mid-November. However it is not yet running
> at our eventual design speed goal of 286 MHz, corresponding to 4.6 Gsps at
> the ADC—a little more modest than your 5 Gsps.  Our experience attempting
> to meet 286 MHz with this complex of a design has been sobering so far,
> though we have not given up.  If you really are able to get a comparable
> design running at 375 MHz with -1 speed grade parts, honestly you’d deserve
> an attaboy or two.  And we’d gladly learn from how you got there, so please
> keep us in the loop.
> >
> > I have custom FFT libraries I've written, which consume far fewer
> resources than stock CASPER stuff.  I've used them to close timing to 400
> MHz before, but I'm worried that bussing signals around the FPGA is going
> to be rough at 375.  I can talk to one of you, or direct you to reference
> designs, if you want help closing timing.
> >
> > Is it 2^15 point FFT, or 2^15 channel FFT?  Can you handle 2^14 points
> (equals 2^13 channels) per 2.5 GHz?  You are 8 single-pol antennas, each
> processing 2.5 GHz of bandwidth right?  I could build my design with you
> guys in mind, and close to 312.5 MHz.  My design supports all of your
> features and should be more-or-less plug and play once I'm finished.  My
> output format will be different from yours though.
> >
> > By the way, assuming you are using the ADC referenced with the
> architecture you describe I’d suggest it is appropriate to cite all above
> referenced and other relevant prior work in your PhD.
> >
> > For sure!  The ADC work is extremely relevant and we couldn't do it
> without you.
> >
> > Best of luck with it.
> >
> > Jonathan and SMA / EHT team
> >
> > On Wed, Oct 29, 2014 at 5:34 PM, Jonathan Weintroub <
> jweintr...@cfa.harvard.edu> wrote:
> > Hi Ryan,
> >
> > That does look cool!   You don’t mention which ADC you plan to use.  Is
> it this one?
> >
> > https://casper.berkeley.edu/wiki/ADC1x5000-8
> >
> > Just to mention in case it proves useful that our group at Submillimeter
> Ar

Re: [casper] 10gb Ethernet issue

2014-11-06 Thread Ryan Monroe
Hey Ross, I had a bunch of problems when I tried to use 10GbE, and
eventually found that there are some problems with the distro that comes
with the ROACH1s.  I made an image which I can send to you if you'd like.

On Thu, Nov 6, 2014 at 3:50 PM, Ross Williamson <
rwilliam...@astro.caltech.edu> wrote:

> I'm trying to get 10Gb ethernet up and running on a ROACH-1.  I'm
> using tutorial-2 as a test. I've plugged a cable between ports 0 and 3
> and the red led on the ROACH has lit up.  Unfortunately when I run
> tut2.py it claims:
>
> Port 0 linkup: False.
>
> Also if I try and use tcpborphserver I get the following:
> tap-start gbe0 02:02:0A:00:00:14 192.168.5.20 6
> !tap-start ok
> #log warn 1415316436391 poco tgtap_exited_with_code_71
>
> ifconfig -a shows nothing
>
> Any thoughts much appreciated
>
> --
> Ross Williamson
> Research Scientist - Sub-mm Group
> California Institute of Technology
> 626-395-2647 (office)
> 312-504-3051 (Cell)
>
>


Re: [casper] inverse PFB

2014-12-10 Thread Ryan Monroe
IIRC, an inverse FFT can be implemented as
1. Complex conjugate
2. FFT
3. Complex conjugate

which is mathematically identical (up to the usual 1/N scaling) to an IFFT,
if slightly less efficient computationally.

In general, the output will not be real-valued, of course
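A one-liner numpy check of the identity (note that numpy's ifft carries the 1/N factor, hence the division):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(256) + 1j * rng.standard_normal(256)

# inverse FFT via conjugate -> forward FFT -> conjugate;
# divide by N to match numpy's ifft normalization
y = np.conj(np.fft.fft(np.conj(X))) / len(X)

print(np.allclose(y, np.fft.ifft(X)))  # True
```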

On Tue, Dec 9, 2014, 2:45 PM Jonathan Weintroub 
wrote:

> Thanks to Richard and everyone who responded earlier for the comments,
> which in some cases are very detailed. It is good to know we are not the
> only ones worrying about this.  Our DSP group is digesting the material and
> looking at options, and other followup will likely follow.  I  did not want
> to delay thanks and acknowledgment.
>
> One basic question which did come up is it appears that even an inverse
> FFT would present some challenges.  We stuff the 32k forward FFT with real
> time series data and extract 16k complex frequency domain points.   Might I
> ask if any CASPER folks have experience implementing an inverse FFT
> relevant to this case, as a real time FPGA bit code?
>
> Thanks again.
>
> Jonathan Weintroub
> SAO
>
>
> > On Dec 8, 2014, at 9:50 PM, Richard Shaw  wrote:
> >
> > Hi,
> >
> > I thought I'd comment as this is a problem we've been having to deal
> > with recently for some VLBI observations. Fortunately we've had some
> > success with an offline least-squares inversion of the PFB. This is
> > probably not the scheme that you want, as it essentially operates on
> > the whole PFB'd timestream at once, so realistically you need a
> > cluster to do it. However, there is prototype code available here [1]
> > if it's useful.
> >
> > The rationale for doing this is that when you look at the whole PFB
> > timestream very little information is actually lost (essentially only
> > a few samples at the ends), though it may be spread across frequency
> > and time samples. For N PFB samples of length M, there are roughly
> > 2*N*M total numbers measured, which depend on 2*(N+P-1)*M numbers in
> > the underlying timestream (where P is the number of taps). As
> > typically P << N, there are very few unmeasured linear combinations,
> > and so a statistical inversion can be pretty accurate. Fortunately it
> > turns out this inversion can also be done pretty efficiently.
> >
> > The general scheme is this:
> >
> > 1. inverse FFT to generate a pseudo-timestream
> > 2. the coupling matrix between elements in this pseudo-timestream and
> > the real timestream is sparse diagonal, and is trivially calculable
> > from the window function
> > 3. Perform a shuffle on the timestream to turn this into a series of
> > band diagonal matrices (bandwidth ~ 2*P)
> > 4. Use a band diagonal least-squares solve to invert the
> > pseudo-timestream back to the underlying timestream.
> >
> > A fuller description is here [2].
> >
> > The complexity is O(N), and as the inversion breaks into blocks it
> > parallelises pretty trivially up to M processes (where M is the number
> > of samples in the window function).
> >
> > We did look at some iterative ways that step through the PFB
> > timestream, but they seem to accumulate errors as they go, and become
> > horribly inaccurate very quickly. This avoids it by treating the whole
> > timestream at once. Your accuracy improves the longer the length you
> > use at once.
> >
> > Juan Mena Parra and Kevin Bandura (cc'd) have also been looking at
> > what would need to change about the PFB to make it more easily
> > invertible in a streaming fashion (rather than having to touch the
> > whole timestream at once). My memory is that changing the window
> > functions seems to be a big help, so hopefully one of them could chime
> > in to clarify that.
> >
> > Anyway, hope that is of some help,
> > Richard
> >
> > [1]: http://github.com/jrs65/pfb-inverse/
> > [2]: http://nbviewer.ipython.org/github/jrs65/pfb-inverse/blob/
> master/notes.ipynb
> >
>
>
>


Re: [casper] inverse PFB

2014-12-11 Thread Ryan Monroe
Hey Laura, that technique sounds just fine.  You're right that the
fft_wideband_real block wouldn't do it for you in this case; you'd have to
do a complex FFT.  This would be pretty easy to stitch together from
fft_biplex and fft_direct modules (consider how they are stitched together
in the fft_wideband_real, replacing fft_biplex_real blocks with twice as
many fft_biplex blocks)

For a long FFT (a 32k complex FFT), this would be very large and would
consume a significant portion of the entire FPGA.

>It would be really neat if there was a dsp trick out there that used the
wideband_real as an inverse, but we'd like to go with simplest solution
regardless.
I betcha you can do this (mathematically, not saying there's an actual
block out there), if you are sure that your output signal is going to be
real.  There are similar techniques for performing an n-point DCT using an
n-point FFT.
But there's no guarantee of that, especially considering that we won't have
the information at frequency pi (good catch, there)
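For reference, the DCT-via-FFT trick I'm alluding to is Makhoul's reordering (illustrative numpy, not a CASPER block):

```python
import numpy as np

def dct2_via_fft(x):
    """Type-II DCT of a real vector via one same-length complex FFT
    (Makhoul's trick): reorder as [evens, reversed odds], FFT, then
    rotate each bin by exp(-j*pi*k/(2N)) and take twice the real part."""
    n = len(x)
    v = np.empty(n)
    v[:(n + 1) // 2] = x[::2]          # even-index samples, in order
    v[(n + 1) // 2:] = x[1::2][::-1]   # odd-index samples, reversed
    V = np.fft.fft(v)
    k = np.arange(n)
    return 2.0 * np.real(np.exp(-1j * np.pi * k / (2 * n)) * V)

# check against the direct O(n^2) DCT-II definition
x = np.random.default_rng(1).standard_normal(64)
n = len(x)
k, m = np.arange(n)[:, None], np.arange(n)[None, :]
direct = 2.0 * (np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x)
print(np.allclose(dct2_via_fft(x), direct))  # True
```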

Cheers!

--Ryan

On Thu, Dec 11, 2014 at 6:59 AM, Vertatschitsch, Laura E. <
lvertatschit...@cfa.harvard.edu> wrote:

> Hi Ryan,
>
> I have used a method of similar simplicity that involves swapping the real
> and imaginary parts of samples before and after the fft, so a mathematical
> equivalent of multiplying by j after taking the conjugate of the samples.
> For that design I used the fft_direct block and operated only on 32
> incoming parallel samples.
>
> The issue is more that we aren't sure which fft block to place in that
> algorithm for the case Jonathan describes, or if there is a clever
> algorithm to use another block.  We use the fft_wideband_real to generate
> half of the full fft, so 16k points coming out over many clock cycles.
> This block expects input data that is real and produces output data that is
> complex.  It strikes me that this block will not natively slide into the
> real/imag-swap algorithm.
>
> We could obviously try and produce the full fft output from the data by
> flipping and concatenation (and find the value at pi?), but we are still
> left with complex data in need of an fft block that will accept it and
> perform a 32k point transform.
>
> Do others use such a block with success?  It was suggested to me that the
> wideband real block was much more widely used than the other blocks, thus
> it is up to date, tested, and working.
>
> It would be really neat if there was a dsp trick out there that used the
> wideband_real as an inverse, but we'd like to go with simplest solution
> regardless.
>
> -Laura
>
>
>
> On Thursday, December 11, 2014, Ryan Monroe 
> wrote:
>
>> IIRC, an inverse FFT can be implemented as
>> 1. Complex conjugate
>> 2. Fft
>> 3. Complex conjugate
>>
>> Which is mathematically identical iirc to an ifft, if slightly less
>> efficient computationally.
>>
>> In general, the output will not be real valued of course
>>
>> On Tue, Dec 9, 2014, 2:45 PM Jonathan Weintroub <
>> jweintr...@cfa.harvard.edu> wrote:
>>
>>> Thanks to Richard and everyone who responded earlier for the comments,
>>> which in some cases are very detailed. It is good to know we are not the
>>> only ones worrying about this.  Our DSP group is digesting the material and
>>> looking at options, and other followup will likely follow.  I  did not want
>>> to delay thanks and acknowledgment.
>>>
>>> One basic question which did come up is it appears that even an inverse
>>> FFT would present some challenges.  We stuff the 32k forward FFT with real
>>> time series data and extract 16k complex frequency domain points.   Might I
>>> ask if any CASPER folks have experience implementing an inverse FFT
>>> relevant to this case, as a real time FPGA bit code?
>>>
>>> Thanks again.
>>>
>>> Jonathan Weintroub
>>> SAO
>>>
>>>
>>> > On Dec 8, 2014, at 9:50 PM, Richard Shaw 
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I thought I'd comment as this is a problem we've been having to deal
>>> > with recently for some VLBI observations. Fortunately we've had some
>>> > success with an offline least-squares inversion of the PFB. This is
>>> > probably not the scheme that you want, as it essentially operates on
>>> > the whole PFB'd timestream at once, so realistically you need a
>>> > cluster to do it. However, there is prototype code available here [1]
>>> > if it's useful.
>>> >
>>> > The rationale for doing this is i

Re: [casper] Confusion over fft_biplex_real_2x

2015-01-14 Thread Ryan Monroe
Hey Ross, this is all going off memory, but...

fft_biplex takes two arbitrary complex-valued input streams and computes a
separate FFT on each one.  In general, the output is also complex-valued.

fft_biplex_real_4x uses a clever trick with FFTs: for the price of one
(complex-valued input) FFT, you can instead do two (real-valued input)
FFTs.  Because the input is real, the output is conjugate-symmetric, so
half of the data coming from each FFT is redundant in a sense.
fft_biplex_real_2x does the same operations as fft_biplex_real_4x, only it
does not generate the conjugate-symmetric part of each FFT.  Therefore,
there are half as many samples as before, and you can stuff all the data
into two streams instead of four.
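The two-reals-in-one-complex-FFT trick looks like this in numpy (illustrative sketch of the math, not the actual block internals):

```python
import numpy as np

def two_real_ffts(a, b):
    """Compute the FFTs of two real vectors with one complex FFT:
    pack them as a + 1j*b, then split the result using the
    conjugate symmetry of real-input spectra."""
    X = np.fft.fft(a + 1j * b)
    Xr = np.roll(X[::-1], 1)          # X[-k], i.e. X[(N-k) mod N]
    A = 0.5 * (X + np.conj(Xr))       # spectrum of a
    B = -0.5j * (X - np.conj(Xr))     # spectrum of b
    return A, B

a = np.random.default_rng(2).standard_normal(128)
b = np.random.default_rng(3).standard_normal(128)
A, B = two_real_ffts(a, b)
print(np.allclose(A, np.fft.fft(a)) and np.allclose(B, np.fft.fft(b)))  # True
```

The _2x variant then simply drops the conjugate-symmetric halves of A and B, which is why its output fits in half as many streams.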

Doing FFTs with a demux factor usually involves doing a separate FFT on
each demuxed stream, followed by an FFT across streams.  The ways we do
that vary across library versions, but the math all looks pretty much the
same.  If you're doing a fft_biplex_real implementation, you'll almost
certainly be using the fft_biplex_real_4x version for that.

Give me a call if you need some help, you have my number ;-)

--Ryan

On Tue, Jan 13, 2015 at 2:58 PM, Ross Williamson <
rwilliam...@astro.caltech.edu> wrote:

> I think Glenn touched on this in an earlier post but I still have some
> questions.
> I'm using the ska-sa fork of mlib_devel (last pull probably 1 month ago).
>
> 1) Is the output labeling still wrong? i.e. First output claims to be
> pol02 and second is pol13.  Is it in fact pol01 then pol23 or am I
> missing how the FFT works?
>
> 2) If I have 500 Mhz sampled data that is demuxed by two (i.e. two
> 250Mhz streams interleaved), can I simply connect them up to pol0 and
> pol1 and the output of pol02 (pol01?) would be the FFT? I could then
> do the same with a second channel on pol2 and pol3.
>
> 3) If the above is correct what is the channel ordering?  for example
> if I set NFFT to 13 I have 2^12 or 4096 channels.  Does each clock
> cycle output in sequential order 0-4096 on each clock cycle the repeat
> back to 0?
>
> Cheers,
>
> Ross
> --
> Ross Williamson
> Research Scientist - Sub-mm Group
> California Institute of Technology
> 626-395-2647 (office)
> 312-504-3051 (Cell)
>
>


Re: [casper] Confusion over fft_biplex_real_2x

2015-01-14 Thread Ryan Monroe
Oh, and I don't know which inputs match with which outputs; I'd probably
just connect three to '0' and one to white noise, and then check which
output is non-zero.  Rinse and repeat!  Also, test!  I've experienced
flakiness with CASPER libraries lately, so a zeroth-order test is not a bad
idea

On Wed, Jan 14, 2015 at 1:03 PM, Ryan Monroe 
wrote:

> Hey ross, this is all going off memory, but...
>
> fft_biplex takes two arbitrary complex-valued input streams and computes a
> separate FFT on each one.  In general, the output is also complex-valued.
>
> fft_biplex_real_4x uses a clever trick with FFTs, where for any given FFT,
> you can instead do two (real-valued input) FFTs instead of one
> (complex-valued input) fft.  Because the input is real, the output is
> conjugate-symmetric.  So, half of the data coming from each FFT is
> redundant in a sense.
>
> fft_biplex_real_2x does the same operations as fft_biplex_real_4x, only it
> does not generate the conjugate-symmetric part of each FFT.  Therefore,
> there are half as many samples as before, and you can stuff all the data
> into two streams instead of four.
>
> Doing FFTs with a demux factor usually involves doing a separate FFT on
> each demuxed stream, followed by an FFT across streams.  The ways we do
> that vary across library versions, but the math all looks pretty much the
> same.  If you're doing a fft_biplex_real implementation, you'll almost
> certainly be using the fft_biplex_real_4x version for that.
>
> Give me a call if you need some help, you have my number ;-)
>
> --Ryan
>
> On Tue, Jan 13, 2015 at 2:58 PM, Ross Williamson <
> rwilliam...@astro.caltech.edu> wrote:
>
>> I think Glenn touched on this in an earlier post but I still have some
>> questions.
>> I'm using the ska-sa fork of mlib_devel (last pull probably 1 month ago).
>>
>> 1) Is the output labeling still wrong? i.e. First output claims to be
>> pol02 and second is pol13.  Is it in fact pol01 then pol23 or am I
>> missing how the FFT works?
>>
>> 2) If I have 500 Mhz sampled data that is demuxed by two (i.e. two
>> 250Mhz streams interleaved), can I simply connect them up to pol0 and
>> pol1 and the output of pol02 (pol01?) would be the FFT? I could then
>> do the same with a second channel on pol2 and pol3.
>>
>> 3) If the above is correct what is the channel ordering?  for example
>> if I set NFFT to 13 I have 2^12 or 4096 channels.  Does each clock
>> cycle output in sequential order 0-4096 on each clock cycle the repeat
>> back to 0?
>>
>> Cheers,
>>
>> Ross
>> --
>> Ross Williamson
>> Research Scientist - Sub-mm Group
>> California Institute of Technology
>> 626-395-2647 (office)
>> 312-504-3051 (Cell)
>>
>>
>


[casper] timing failure on epb_clk??

2015-02-23 Thread Ryan Monroe
[pardon the triple-post, if that happened; I appear to have mailing list
issues]

Hey all,

I'm designing an FX correlator on ROACH2 boards, which is targeting a user
clock rate of 312.5 MHz.  However, I see TONS of failures similar to the
one posted below (see image)

It appears that there is a super-high fanout bus which connects all the
registers and shared-memories on the processor side.  I tried mitigating
this by minimizing the number of yellow blocks, but this only made the
problem worse.

We might be able to add a cycle of latency to this bus, resolving the
timing error and making my day.  Anyone have the experience to know how to
do this?  Or another way to resolve the issue?  I'm all out of ideas on my
end

Thanks in advance!

Image: https://dl.dropboxusercontent.com/u/2832602/epb_timing_fail.png

--Ryan


Re: [casper] timing failure on epb_clk??

2015-03-02 Thread Ryan Monroe
Dear all, I believe that I have resolved this problem:

The OPB bus connects many components which never talk directly to each
other.  Since the timing analysis is static, the tools are unaware of this
fact, so many of the reported paths are false.  I strongly believe that the
paths in question, for instance, are all false.  Anyone who sees a similar
issue might resolve the problem by adding constraints similar to these:

timespec ts_false_path0 = FROM
"sbs_v8_pfx_XSG_core_config/sbs_v8_pfx_XSG_core_config/sbs_v8_pfx_x0/*" TO
"*tig*" TIG;
timespec ts_false_path1 = FROM
"sbs_v8_pfx_*/tge_rx_inst/rx_cpu_enabled.cpu_rx_buffer/BU2/U0/blk_mem_generator/valid.cstr/ramloop*.ram.r/v*_noinit.ram/*"
THRU "*OPB*" TO
"sbs_v8_pfx_freqrespo_lut*_spec_cal?_?_ramblk/sbs_v8_pfx_freqrespo_lut*_spec_cal?_?_ramblk/ramb36e*"
TIG;
timespec ts_false_path2 = FROM
"sbs_v8_pfx_*/arp_cache_inst/BU2/U0/blk_mem_generator/valid.cstr/ramloop*.ram.r/v*_init.ram/TRUE_DP.SIMPLE_PRIM18.ram*"
THRU "*OPB*" TO
"sbs_v8_pfx_freqrespo_lut*_spec_cal?_?_ramblk/sbs_v8_pfx_freqrespo_lut*_spec_cal?_?_ramblk/ramb36e*"
TIG;




In addition, I got two very useful replies to this question; while both were
private, I received permission to post them publicly.  I am posting them here
so that others can find the answer if this comes up again.




David George
Feb 23 (7 days ago)
to me
Hi Ryan.

> Hey David!  This is an excellent suggestion, and now that you mention it,
> when I closed timing on a subsection of my design, epb_clk errors went away.
> Good to think about!  I'll attempt to close timing on that domain first, and
> then apply epb_clk constraints.

Ok good - that's what I would have expected. The tools will spend all
the effort on the 'impossible' logic path and leave the easy ones
unattended; hence the easy ones pollute the results. The trick is to
identify the ones which the compiler spent effort on. These ones will
have low values for routes etc. Figuring out timing is a dark art.

> Another idea: it seems that it's just one (32-bit wide) net being bussed
> everywhere; I might be able to mitigate the problem by sending several
> misbehaving bits over the global clock nets ;-)  Only problem is that we
> can only use 12 bits in each clock region, which is pretty limiting for
> this purpose...

At 67 MHz the EPB bus shouldn't really be a problem, though I'll bet
there is room for improvement, but that comes at the cost of changing
something that has remained unmoved for a while and is well proven.

> Mind if I include your responses in an email to the whole casper group?
> That way, the next one to have this problem can just search the archives!

That's fine.

> Thanks for the help!

No problem; the issue caught my eye three times ;)

Cheers,
David



Russ McWhirter 
Feb 23 (7 days ago)
to me
Hi Ryan,

I have had timing closure issues for our Roach firmware and used an
inelegant solution.

It is possible to set the PPC to add cycles to the external bus
transactions. This can be done in uboot but doesn't survive a reboot. We
eventually added it to the kernel build.

I don't really recommend it since it relies on extra timing constraints in
the fpga that could be hard to get right. In our case, a timing constraint
was added to the fpga bus signals to reflect the larger setup and hold time.

One of my settings for example was: setidcr 0x012 0x0011 0x1000380

The details are on page 578 of:
PPC440EPx/GRx Embedded Processor
Revision 1.15 – September 22, 2008
Preliminary User’s Manual

Hope you find a better solution.





On Mon, Feb 23, 2015 at 2:47 AM, Ryan Monroe 
wrote:

> [pardon the triple-post, if that happened; I appear to have mailing list
> issues]
>
> Hey all,
>
> I'm designing an FX correlator on ROACH2 boards, which is targeting a user
> clock rate of 312.5 MHz.  However, I see TONS of failures similar to the
> one posted below (see image)
>
> It appears that there is a super-high fanout bus which connects all the
> registers and shared-memories on the processor side.  I tried mitigating
> this by minimizing the number of yellow blocks, but this only made the
> problem worse.
>
> We might be able to add a cycle of latency to this bus, resolving the
> timing error and making my day.  Anyone have the experience to know how to
> do this?  Or another way to resolve the issue?  I'm all out of ideas on my
> end
>
> Thanks in advance!
>
> Image: https://dl.dropboxusercontent.com/u/2832602/epb_timing_fail.png
>
> --Ryan
>
>


Re: [casper] Skewed data samples

2015-04-24 Thread Ryan Monroe
Yeah that'll just be some power showing up in the DC bin (which you should
throw away anyways)
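[Editorial aside: the effect Ryan describes is easy to reproduce numerically.
A quick numpy sketch (illustrative numbers, not from this thread) showing
that a constant offset lands entirely in bin 0, leaving the tone untouched:]

```python
import numpy as np

N = 1024                    # samples per spectrum (arbitrary choice)
t = np.arange(N)
# A tone in bin 37 plus a small DC bias, mimicking an ADC offset
x = np.sin(2 * np.pi * 37 * t / N) + 0.05

spectrum = np.abs(np.fft.rfft(x))
# All of the offset's power lands in bin 0 (the DC bin)
print(int(np.argmax(spectrum[1:]) + 1))     # strongest non-DC bin: 37
print(round(float(spectrum[0]), 3))         # DC bin: 0.05 * N = 51.2
```

Discarding bin 0, as Ryan suggests, removes the offset without touching the rest of the band.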

--Ryan

On Fri, Apr 24, 2015 at 5:38 PM, Kuiper, Thomas (3266) 
wrote:

> Thanks, Dan.  Yes, we're using KAT ADCs.  I'm not worried about a DC
> offset and I know about the slight ADC bias.  It's the skewness I'm
> wondering about.  It's just barely detectable by eye in a histogram.
>
> Tom
> 
> From: dan.werthi...@gmail.com [dan.werthi...@gmail.com] on behalf of Dan
> Werthimer [d...@ssl.berkeley.edu]
> Sent: Friday, April 24, 2015 5:34 PM
> To: Kuiper, Thomas (3266)
> Cc: G Jones; Casper Lists
> Subject: Re: [casper] Skewed data samples
>
> hi tom,
>
> if you are using casper adcs:
>
> all the casper adc boards are AC coupled
> (they have baluns and coupling capacitors),
> so  even if your input signal has a DC offset, it won't couple
> into the ADC.   however, there are slight DC offsets in the ADC,
> so there will be a small spike in the DC bin, but probably
> not from the signal you are injecting.
>
> best wishes,
>
> dan
>
>


Re: [casper] Timing Errors ROACH2

2015-11-04 Thread Ryan Monroe
np!  The 10gbe core wasn't really intended to run at fast clock rates.  
Be sure to constrain it to the east-ish side of the chip.  This is 
probably either a
1. device utilization issue (you are trying to do too much stuff on the 
chip), or
2. placement issue (probably this)-- the tools are terrible at placing 
things in the right spots


Send me any questions you have, but I'm pretty busy and reserve the 
right to be bad about answering


--Ryan

On 11/04/2015 12:32 AM, Amit Bansod wrote:

Hi Ryan,

Thanks a lot! I will give this a try!

Cheers,
Amit

On 04-Nov-15 9:31 AM, Ryan Monroe wrote:

Dear Amit,

Please consider my memo 50 on the CASPER list: "Performance optimization
for Virtex 6 CASPER designs" [1]

As it turns out, I have a design open right now, which must close 8
10gbe cores at 312.5 mhz.  Attached is an image of where I placed my
tge_tx_inst and tge_rx_inst pblocks, which I hope will be helpful for
you.  Wish I had time to do more for you, but such is life :-/

Cheers!

--Ryan Monroe

[1] https://casper.berkeley.edu/wiki/Memos

On 11/04/2015 12:17 AM, aban...@mpifr-bonn.mpg.de wrote:

Dear All,

I am getting the following timing errors on 10GbE yellow blocks, which I am
finding hard to get rid of. I am running my design at 200 MHz. I
have included the device utilization summary in case it is useful. I have also
enclosed the files.

Best Regards,
Amit

Timing Errors:
Timing constraint: PERIOD analysis for net
  "d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/
  mmcm_clkout1" derived from  PERIOD analysis for net
  "d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/
  adc_clk_div" derived from NET

"d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/adc_clk"

 PERIOD = 2.5 ns HIGH 50%; multiplied by 2.00 to 5 nS and duty cycle
  corrected to HIGH 2.500 nS
  For more information, see Period Analysis in the Timing Closure User
Guide (UG612).

   1594959 paths analyzed, 343705 endpoints analyzed, 63 failing endpoints
   63 timing errors detected. (63 setup errors, 0 hold errors, 0
component switching limit errors)
   Minimum period is   5.727ns.



  Slack:  -0.727ns (requirement - (data path - clock
path skew + uncertainty))
Source:
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/app_tx_validR
(FF)
Destination:
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/mem/gbm.gbmg.gbmga.ngecc.bmg/blk_mem_generator/valid.cstr/ramloop[1].ram.r/v5_noinit.ram/SDP.SINGLE_PRIM36.TDP
(RAM)
Requirement:  5.000ns
Data Path Delay:  5.721ns (Levels of Logic = 1)
Clock Path Skew:  0.054ns (2.112 - 2.058)
Source Clock: adc0_clk rising at 0.000ns
Destination Clock:adc0_clk rising at 5.000ns
Clock Uncertainty:0.060ns

Clock Uncertainty:  0.060ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
  Total System Jitter (TSJ):  0.070ns
  Discrete Jitter (DJ):   0.097ns
  Phase Error (PE):   0.000ns

Maximum Data Path at Slow Process Corner:
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/app_tx_validR
to
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/mem/gbm.gbmg.gbmga.ngecc.bmg/blk_mem_generator/valid.cstr/ramloop[1].ram.r/v5_noinit.ram/SDP.SINGLE_PRIM36.TDP

  LocationDelay type Delay(ns) Physical
Resource
Logical
Resource(s)
  
---
  SLICE_X81Y192.CQTcko  0.337
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/app_tx_validR

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/app_tx_validR

  SLICE_X23Y278.A6net (fanout=5)4.173
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/app_tx_validR

  SLICE_X23Y278.A Tilo  0.068
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/ram_wr_en

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/gl0.wr/ram_wr_en_i1

  RAMB36_X1Y54.WEAU3  net (fanout=16)   0.628
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/ram_wr_en

  RAMB36_X1Y54.CLKARDCLKU Trcck_WEA 0.515
d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/mem/gbm.gbmg.gbmga.ngecc.bmg/blk_mem_generator/valid.cstr/ramloop[1].ram.r/v5_noinit.ram/SDP.SINGLE_PRIM36.TDP

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00
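[Editorial aside: the reported slack follows directly from the other numbers
in the report above. A quick check of the arithmetic (all values in ns,
taken from the report):]

```python
import math

# Values copied from the timing report above (ns)
requirement = 5.000          # 200 MHz design clock
data_path   = 5.721
clock_skew  = 0.054          # 2.112 - 2.058
tsj, dj, pe = 0.070, 0.097, 0.000

# Clock uncertainty: ((TSJ^2 + DJ^2)^1/2) / 2 + PE
uncertainty = math.sqrt(tsj**2 + dj**2) / 2 + pe
# Slack: requirement - (data path - clock path skew + uncertainty)
slack = requirement - (data_path - clock_skew + uncertainty)

print(f"{uncertainty:.3f}")  # 0.060
print(f"{slack:.3f}")        # -0.727
```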

Re: [casper] Timing Errors ROACH2

2015-11-05 Thread Ryan Monroe

Hi Amit,

FYI, I am CC'ing the CASPER list on all of these emails, so that they 
can be searchable for people in the future.


The placements I gave you were for my personal design, and may not work 
for yours.  When choosing pblock size, be sure to look at the pblock 
utilization in planahead.  odds are that you had either
1. a pblock which was outright too small for the components placed 
within (tools error out almost instantly)
2. multiple overlapping pblocks, in such a way that it's impossible to 
satisfy both constraints simultaneously (tools error out after much 
longer).  pblock statistics don't help with this one, but you can bash 
it out by hand with more difficulty.


look at the error from line 327-561 on your report.  this indicates the 
constraints which caused the issue, as well as some of the components 
involved.  that's a good place to start!


--Ryan

On 11/05/2015 12:20 AM, aban...@mpifr-bonn.mpg.de wrote:

Hi Ryan,

I tried to place the 10gbe pblocks as you suggested. Unfortunately, 
the tool was unable to place all components. The pblock statistics 
were more than enough. Do you know how to avoid this problem ? The 
tool took a long time to give out errors (~11 hrs).


I have enclosed the map report.

Regards,
Amit

On 04.11.2015 09:37, Ryan Monroe wrote:

np!  the 10gbe core wasn't really intended to run at fast clock
rates.  Be sure to constrain it to the east-ish side of the chip, this
is almost either a
1. device utilization issue (you are trying to do too much stuff on
the chip), or
2. placement issue (probably this)-- the tools are terrible at
placing things in the right spots

Send me any questions you have, but I'm pretty busy and reserve the
right to be bad about answering ;-)

--Ryan



On 11/04/2015 12:32 AM, Amit Bansod wrote:

Hi Ryan,

Thanks a lot! I will give this a try!

Cheers,
Amit

On 04-Nov-15 9:31 AM, Ryan Monroe wrote:

Dear Amit,

Please consider my memo 50 on the CASPER list: "Performance 
optimization

for Virtex 6 CASPER designs" [1]

As it turns out, I have a design open right now, which must close 8
10gbe cores at 312.5 mhz.  Attached is an image of where I placed my
tge_tx_inst and tge_rx_inst pblocks, which I hope will be helpful for
you.  Wish I had time to do more for you, but such is life :-/

Cheers!

--Ryan Monroe

[1] https://casper.berkeley.edu/wiki/Memos

On 11/04/2015 12:17 AM, aban...@mpifr-bonn.mpg.de wrote:

Dear All,

I am getting following timing errors on 10GbE yellow blocks which 
I am

finding it hard to get rid off. I am running my design at 200 MHz. I
have given the device utilization summary if it is useful. I have 
also

enclosed the files.

Best Regards,
Amit

Timing Errors:
Timing constraint: PERIOD analysis for net
"d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/
  mmcm_clkout1" derived from  PERIOD analysis for net
"d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/
  adc_clk_div" derived from NET


"d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/adc_clk" 



 PERIOD = 2.5 ns HIGH 50%; multiplied by 2.00 to 5 nS and duty 
cycle

  corrected to HIGH 2.500 nS
  For more information, see Period Analysis in the Timing Closure 
User

Guide (UG612).

   1594959 paths analyzed, 343705 endpoints analyzed, 63 failing 
endpoints

   63 timing errors detected. (63 setup errors, 0 hold errors, 0
component switching limit errors)
   Minimum period is   5.727ns.


 



  Slack:  -0.727ns (requirement - (data path - clock
path skew + uncertainty))
Source:

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/app_tx_validR 


(FF)
Destination:

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/mem/gbm.gbmg.gbmga.ngecc.bmg/blk_mem_generator/valid.cstr/ramloop[1].ram.r/v5_noinit.ram/SDP.SINGLE_PRIM36.TDP 


(RAM)
Requirement:  5.000ns
Data Path Delay:  5.721ns (Levels of Logic = 1)
Clock Path Skew:  0.054ns (2.112 - 2.058)
Source Clock: adc0_clk rising at 0.000ns
Destination Clock:adc0_clk rising at 5.000ns
Clock Uncertainty:0.060ns

Clock Uncertainty:  0.060ns  ((TSJ^2 + DJ^2)^1/2) / 2 
+ PE

  Total System Jitter (TSJ):  0.070ns
  Discrete Jitter (DJ):   0.097ns
  Phase Error (PE):   0.000ns

Maximum Data Path at Slow Process Corner:

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/app_tx_validR 


to

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/mem/gbm.gbmg.gbmga.ngecc.bmg/blk_mem_generator/valid.cstr/ramloop[1].ram.r/v5_noinit.ram/SDP.SINGLE_PRIM36.TDP 



  Location   

Re: [casper] Timing Errors ROACH2

2015-11-09 Thread Ryan Monroe
Assuming that the pblock is not overlapping with any other pblocks, and 
that you have not constrained any other resources in the pblock, I have 
used 99% of the resources in a pblock (simply made it as small as 
possible).  The trouble comes into play when there are resources which 
are not in the pblock, but also somehow constrained in the region.
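[Editorial aside: in the ISE flow, pblocks are expressed as AREA_GROUP
constraints in the UCF. A minimal sketch; the instance pattern and the
SLICE/RAMB ranges below are hypothetical placeholders that must be adapted
to your own design hierarchy and part:]

```
# Hypothetical example: confine a 10GbE TX core to one region.
# "my_top/tge_tx_inst/*" and the ranges are placeholders.
INST "my_top/tge_tx_inst/*" AREA_GROUP = "pb_tge_tx";
AREA_GROUP "pb_tge_tx" RANGE = SLICE_X148Y160:SLICE_X191Y239;
AREA_GROUP "pb_tge_tx" RANGE = RAMB36_X8Y32:RAMB36_X9Y47;
```

PlanAhead generates equivalent constraints when you draw pblocks graphically, and its statistics view shows the utilization of each range.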


If my errors are especially egregious, I'll often relax my timing 
constraints, try to close timing at a lower speed, and then increase the 
goal once it becomes more reasonable.  The tools really do not cope well 
with constraints that fail miserably.


Cheers,
-Ryan

On 11/09/2015 07:40 AM, Amit Bansod wrote:

Hi Ryan,

I could get rid of those errors, but the tool had a hard time placing,
with huge setup timing errors. How much utilization is good to set on
p-blocks?

Currently, I had 60-70% for different components.

Cheers,
Amit

On 05-Nov-15 9:46 AM, Ryan Monroe wrote:

Hi Amit,

FYI, I am CC'ing the CASPER list on all of these emails, so that they
can be searchable for people in the future.

The placements I gave you were for my personal design, and may not work
for yours.  When choosing pblock size, be sure to look at the pblock
utilization in planahead.  odds are that you had either
1. a pblock which was outright too small for the components placed
within (tools error out almost instantly)
2. multiple overlapping pblocks, in such a way that it's impossible to
satisfy both constraints simultaneously (tools error out after much
longer).  pblock statistics don't help with this one, but you can bash
it out by hand with more difficulty.

look at the error from line 327-561 on your report.  this indicates the
constraints which caused the issue, as well as some of the components
involved.  that's a good place to start!

--Ryan

On 11/05/2015 12:20 AM, aban...@mpifr-bonn.mpg.de wrote:

Hi Ryan,

I tried to place the 10gbe pblocks as you suggested. Unfortunately,
the tool was unable to place all components. The pblock statistics
were more than enough. Do you know how to avoid this problem ? The
tool took long time to give out errors (~11 hrs).

I have enclosed the map report.

Regards,
Amit

On 04.11.2015 09:37, Ryan Monroe wrote:

np!  the 10gbe core wasn't really intended to run at fast clock
rates.  Be sure to constrain it to the east-ish side of the chip, this
is almost either a
1. device utilization issue (you are trying to do too much stuff on
the chip), or
2. placement issue (probably this)-- the tools are terrible at
placing things in the right spots

Send me any questions you have, but I'm pretty busy and reserve the
right to be bad about answering ;-)

--Ryan



On 11/04/2015 12:32 AM, Amit Bansod wrote:

Hi Ryan,

Thanks a lot! I will give this a try!

Cheers,
Amit

On 04-Nov-15 9:31 AM, Ryan Monroe wrote:

Dear Amit,

Please consider my memo 50 on the CASPER list: "Performance
optimization
for Virtex 6 CASPER designs" [1]

As it turns out, I have a design open right now, which must close 8
10gbe cores at 312.5 mhz.  Attached is an image of where I placed my
tge_tx_inst and tge_rx_inst pblocks, which I hope will be helpful for
you.  Wish I had time to do more for you, but such is life :-/

Cheers!

--Ryan Monroe

[1] https://casper.berkeley.edu/wiki/Memos

On 11/04/2015 12:17 AM, aban...@mpifr-bonn.mpg.de wrote:

Dear All,

I am getting following timing errors on 10GbE yellow blocks which
I am
finding it hard to get rid off. I am running my design at 200 MHz. I
have given the device utilization summary if it is useful. I have
also
enclosed the files.

Best Regards,
Amit

Timing Errors:
Timing constraint: PERIOD analysis for net
"d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/
   mmcm_clkout1" derived from  PERIOD analysis for net
"d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/
   adc_clk_div" derived from NET


"d_codd_64ch_2_P0_ADC_asiaa_adc5g/d_codd_64ch_2_P0_ADC_asiaa_adc5g/adc_clk"


  PERIOD = 2.5 ns HIGH 50%; multiplied by 2.00 to 5 nS and duty
cycle
   corrected to HIGH 2.500 nS
   For more information, see Period Analysis in the Timing Closure
User
Guide (UG612).

1594959 paths analyzed, 343705 endpoints analyzed, 63 failing
endpoints
63 timing errors detected. (63 setup errors, 0 hold errors, 0
component switching limit errors)
Minimum period is   5.727ns.





   Slack:  -0.727ns (requirement - (data path - clock
path skew + uncertainty))
 Source:

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/app_tx_validR

(FF)
 Destination:

d_codd_64ch_2_10G_10GBE2_gbe00/d_codd_64ch_2_10G_10GBE2_gbe00/tge_tx_inst/tx_packet_fifo_inst/BU2/U0/grf.rf/mem/gbm.gbmg.gbmga.ngecc.bmg/blk_mem_generator/valid.cstr/ramloop[1].ram.r/v5_noinit.ram/SDP.SINGLE_PRIM36.TDP

(RAM)
 Requirement:  

Re: [casper] FFT woes

2015-11-11 Thread Ryan Monroe
I would conjecture that reinstalling ISE will not solve your problem, 
since nothing in the ISE directories is normally changed anyways (unless 
you hacked something?)  I have also personally experienced issues 
involving upper-case characters in either my model-name or block names 
(I think it was model name).


I would suggest getting in touch with Jonathon Kocz (CC'ed), who has 
experience with this issue.  I believe he resolved it by black-boxing 
the FFT.


Cheers!

https://casper.berkeley.edu/wiki/images/a/a4/Black_box_memo.pdf

--Ryan Monroe

On 11/11/2015 10:43 AM, Michael D'Cruze wrote:


Hi Dan,

I have read about such problems. I’m using Red Hat version 6.7. We 
were originally using Debian but switched exclusively because it was, 
at the time, the only linux O/S that Xilinx would support.


Reinstalling the O/S isn’t really an option, but trashing and 
reinstalling ISE might be…. This hasn’t worked for Andrew Martens et 
al., however.


BW
Michael

*From:*dan.werthi...@gmail.com [mailto:dan.werthi...@gmail.com] *On 
Behalf Of *Dan Werthimer

*Sent:* 11 November 2015 18:38
*To:* Michael D'Cruze
*Cc:* Jack Hickish; casper@lists.berkeley.edu
*Subject:* Re: [casper] FFT woes

hi michael,

what operating system are you using?

we have seen problems where the FFT works in simulation,

and doesn't produce correct results on the FPGA when we were compiling 
using a non-xilinx supported


operating system.

the problem occurred only for large FFT's -  i think 8K or larger.

best wishes,

dan

On Wed, Nov 11, 2015 at 7:34 AM, Michael D'Cruze 
<michael.dcr...@postgrad.manchester.ac.uk> wrote:


Hi Jack

Sorry it’s taken me so long to come back (I’m going to write back to 
everyone shortly). I’ve been chasing a few hunches I’ve had which 
might have exonerated the FFT, but to no avail. Indeed the FFT does 
simulate OK, but in the majority of cases in hardware every other 
channel is a zero. I say in the majority of cases, because in one or 
two cases the design works correctly. I have not been able to find a 
reason for this yet.


BW
Michael

*From:* Jack Hickish [mailto:jackhick...@gmail.com]

*Sent:* 03 November 2015 01:01
*To:* Michael D'Cruze
*Cc:* casper@lists.berkeley.edu
*Subject:* Re: [casper] FFT woes

Hi Michael,

Just so everyone is on the same page -- does your issue only show up 
in hardware like Andrew/Jonathon's - i.e., in simulation the FFT works ok?


Jack

On 3 November 2015 at 00:57, Michael D'Cruze 
<michael.dcr...@postgrad.manchester.ac.uk> wrote:


Dear all,

Following on from the email thread from Jonathan Kocz and Andrew 
Martens about odd FFT outputs….


I’ve been experiencing similar inexplicable problems for a while now. 
Every other channel in my output is invariably a zero. I’ve tried 
everything I can think of, including solutions along the lines of 
those observed to work by Jonathan and Andrew (black-boxing, changing 
mask parameters etc.), in addition to wiping clean my libraries and 
re-syncing with casper-astro-soak-test. I’ve even re-drawn the entire 
model from scratch. The results are always the same. Below is a link 
to an example output.


https://dl.dropboxusercontent.com/u/38103354/32k_test_image.png

Hopefully it’s clear from a_0 (note that a_0 is zoomed in, a_1 is not) 
that every other channel outputs zero, and the interleaved a_0 and a_1 
spectra (to form the full 32k channel spectrum) are interleaving 
correctly to produce pairs of zeroes. I’ve been trying various things 
for quite a while now, without success and would appreciate some 
suggestions…!


Thanks

Michael





Re: [casper] fft_biplex_real_2x

2016-01-20 Thread Ryan Monroe
I don't use the stock CASPER FFTs anymore, but I'm pretty sure that 
there's no way to use them for anything less than {2 complex inputs 
--OR-- 4 real inputs}.  If you want less, you can drive an input with a 
constant '0', but resource-wise, they're the same.  This is because of 
algorithmic limitations; there is a resource efficiency you gain by 
doing two complex FFTs at once.
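[Editorial aside: one classical source of the efficiency Ryan mentions is
that a single complex FFT can carry two real streams at once, untangled
afterwards via conjugate symmetry. A numpy sketch of the math (illustrative
only, not the CASPER implementation):]

```python
import numpy as np

rng = np.random.default_rng(42)
N = 16
a = rng.standard_normal(N)   # two independent real streams
b = rng.standard_normal(N)

# One complex FFT of a + j*b ...
X = np.fft.fft(a + 1j * b)
# ... then split the two spectra using conjugate symmetry:
Xc = np.conj(X[(-np.arange(N)) % N])   # conj(X[(N-k) mod N])
A = 0.5 * (X + Xc)                     # == fft(a)
B = -0.5j * (X - Xc)                   # == fft(b)

print(np.allclose(A, np.fft.fft(a)) and np.allclose(B, np.fft.fft(b)))  # True
```

This is why the block's natural granularity is pairs of real inputs: padding unused inputs with zeros costs the same resources.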


This is a time for a streaming Xilinx FFT.

--Ryan

On 01/19/2016 09:49 PM, James Smith wrote:

Hi Rolando,

I can't recall that it does, off the top of my head, but the Casper 
one can be set up to use just one input. This is what I've done in the 
past, I think.


Regards,
James


On Wed, Jan 20, 2016 at 7:39 AM, Rolando Paz wrote:


Hi James and Andrew

Thanks for your advice.

I'm trying to recompile the design of Peter McMahon:
https://casper.berkeley.edu/wiki/Parspec

I'm using these libraries:

https://github.com/casper-astro/mlib_devel/tree/mlib_devel-2010-09-20

and I use a virtual machine "windows XP SP3", on ubuntu 14.04LTS,
Matlab R2007b,
ISE, EDK, SG 10.1, with respective updates, IBOB+QUADC.

With this configuration, I cannot compile this new design.

I'll try with Xilinx FFT...
Is there a Xilinx block version for the PFB too?

Thank you.



2016-01-19 23:23 GMT-06:00 Andrew Martens <and...@ska.ac.za>:

Hi Rolando

You may want to look at the Xilinx FFT for your use case. The
CASPER FFT is optimised so that minimal resources are used
when processing high bandwidths (either many inputs, or inputs
captured at high sample rates). In this case you may find that
the Xilinx FFT actually uses fewer resources.

Regards
Andrew

On Tue, Jan 19, 2016 at 11:11 PM, Rolando Paz
<flx...@gmail.com> wrote:

Hi

Is there any other FFT block that I can use with ADC4x250-8?

https://casper.berkeley.edu/wiki/ADC4x250-8

I am using the "fft_biplex_real_2x" block, however I need
only one input of the four that this block has. I placed
at zero the others three inputs.

I need more FPGA resources from IBOB, and I think using
another FFT block may be one solution.

Best Regards

RP








Re: [casper] Compiler merging SRLs

2016-01-25 Thread Ryan Monroe
Turning off Behavioral HDL on the relevant instances *is* the correct 
answer.  This should not impact resource utilization significantly.


If it does, that means you have a large-fanout net, and can improve 
timing and utilization simultaneously by doing the fanout in a couple of 
steps (i.e., one step with a fanout of 6, followed by six steps with a 
fanout of 6 each, giving a total fanout of 36 across two stages).
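[Editorial aside: a small sketch of the staging arithmetic.
`fanout_stages` is a hypothetical helper for illustration, not a CASPER
tool; each stage multiplies the number of loads one register can reach:]

```python
def fanout_stages(n_loads, max_fanout=6):
    """Register stages needed to drive n_loads when each
    register copy may drive at most max_fanout loads."""
    stages, reach = 1, max_fanout
    while reach < n_loads:
        stages += 1
        reach *= max_fanout
    return stages

print(fanout_stages(36))    # 2 stages: 1 -> 6 -> 36
print(fanout_stages(200))   # 3 stages: 6**3 = 216 >= 200
```

Each added stage costs a cycle of latency but keeps every individual net's fanout, and hence its routing delay, small.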


I agree with Jack.  DOWN WITH ISE!!!  SKARAB looked good, if a tiny bit 
pricey.


--Ryan

On 01/25/2016 07:17 AM, Jack Hickish wrote:


Ha, I just read my email in the thread you linked. I guess turning off 
behavioural hdl isn't (ever? always?) the solution.



On Mon, 25 Jan 2016 5:14 pm Jack Hickish wrote:


Hi Matt,

You can resynthesize the "main" simulink netlist, but off the top
of my head I don't know the exact way to go about this. I think
you can dig out the netlist from the sysgen build directory and
use the resynth script on that. Perhaps Dave MacMahon (who I
believe wrote that script) could comment further.
My experience was that adding the lc_off flag helped sometimes,
but I still found luts combined on some occasions - I never got
satisfactorily to the bottom of this.
You can explicitly prevent combining of some blocks (eg. delays)
by turning off any behavioural hdl options they have, but this
isn't viable if it needs doing to so many blocks that resource
utilisation ends up being too horrifically impacted.

Having just seen the skarab/roach3 presentation in South Africa, I
fondly await the demise of virtex 6 and ISE.

Jack


On Sat, 23 Jan 2016 10:48 pm Matt Strader
mailto:mstra...@physics.ucsb.edu>> wrote:

Hello Casperites (especially Jack),

I've run into the problem that Jack describes in this thread:
http://www.mail-archive.com/casper%40lists.berkeley.edu/msg05581.html
The compiler keeps wanting to combine unrelated LUTs on
opposite sides of the Virtex 6, resulting in timing errors
that are 90% routing.

At the end of the thread Jack says the solution is to use the
"-lc off" option in resynth_netlist and in map.
I'm using only one black box containing my pfb and fft.  The
rest of my design is not black boxed.  I used resynth_netlist
on the black box's ngc file, but it looks like I can't use it
on the netlists generated by casper_xps.  Is that right?
I also added "-lc off" to the map options in
XPS_ROACH2_base/etc/fast_runtime.opt

After doing this, I'm still getting timing errors from LUT
combining.  Is there somewhere else I need to turn this off? 
Any suggestions?


Thanks,
Matt Strader








Re: [casper] arp: unknown or malformed arp packet

2016-03-31 Thread Ryan Monroe
This may not be the answer you're looking for, but IIRC I eventually 
just populated the ARP tables manually.  I don't know if this was the 
problem I was having though.


On 03/31/2016 07:23 AM, Amit Bansod wrote:

Hi All,

We are seeing, "arp: unknown or malformed arp packet" messages on ROACH2
board, quite frequently.

In our setup, we have ROACH2 board sending data out via a  40G switch.
Sometimes the data is broadcast from ROACH2 boards instead of sending to
a particular ip address.

The 10GbE core details show ARP table having unknown MAC addresses tied
to unused IP addresses. Is this usual ? We do not have anything else
connected in this closed network.

We are trying to figure out if the above message is related to this issue.

How can we debug the root of this message ?

Thanks,
Amit






[casper] ROACH2 inconsistently crashes?

2016-07-25 Thread Ryan Monroe

Hi all!

I have two ROACH2s which I am operating remotely.  I have a bit file and 
script which I have previously successfully tested in the past.


Currently, it works with one of the two ROACH2s, but fails at an 
inconsistent location with the other.  Specifically, the bad ROACH will 
stop responding to all telnet connections, causing a "client not 
connected" via python.  The only resolution is to power-cycle the ROACH, 
to my knowledge.  Both ROACHes are using soloboot, and they have the 
same clock, 1pps source, similar input signal characteristics, etc.


To reiterate, this issue never happens with the other ROACH2 o_0

Any experience with this issue?  Thanks in advance!

--


Ryan Monroe
PhD Student | Electrical Engineering
California Institute of Technology
Cahill 255 | 626.773.0805




Re: [casper] Programming a ROACH2

2016-10-07 Thread Ryan Monroe

rmonroe@rmonroe-ThinkPad-P50:~$ sudo pip install spead
[sudo] password for rmonroe:
The directory '/home/rmonroe/.cache/pip/http' or its parent directory is 
not owned by the current user and the cache has been disabled. Please 
check the permissions and owner of that directory. If executing pip with 
sudo, you may want sudo's -H flag.
The directory '/home/rmonroe/.cache/pip' or its parent directory is not 
owned by the current user and caching wheels has been disabled. check 
the permissions and owner of that directory. If executing pip with sudo, 
you may want sudo's -H flag.

Collecting spead
  Downloading spead-0.5.1.tar.gz (61kB)
100% || 71kB 1.5MB/s
Installing collected packages: spead
  Running setup.py install for spead ... done
Successfully installed spead-0.5.1
rmonroe@rmonroe-ThinkPad-P50:~$ ipython
Python 2.7.12 (default, Jul  1 2016, 15:12:24)
Type "copyright", "credits" or "license" for more information.

IPython 2.4.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help  -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import spead

In [2]:


On 10/07/2016 02:04 AM, Heystek Grobler wrote:

Hi James

I installed the PySpead package but I get the following error when I 
run the tut3.py script:


ImportError: No module named spead

Do you perhaps have any ideas on how to solve it?

Thank you!!

Heystek

On Fri, Oct 7, 2016 at 11:01 AM, James Smith wrote:


Hello Heystek,

If you're still in the Python environment, then PySpead is the one
you want.

Regards,
James


On Fri, Oct 7, 2016 at 10:59 AM, Heystek Grobler
<heystekgrob...@gmail.com> wrote:

Good Day

After a while of troubleshooting I determined the connection
with TCP/IP into the board was correct. I updated the file
system and kernel of the ROACH2 and now I can program it.

I only need to install a package called spead in order to use
the .bof file with tutorial 3 of CASPER.

Where can a download the spead package from? I can only find
PySpead and Spead2.

Thanks for everyone's help

Heystek!

On Fri, Sep 30, 2016 at 5:28 PM, Adam Isaacson
<aisaac...@ska.ac.za> wrote:

Hi Heystek,

If you want to telnet, which is another way of configuring
your board, then you need to state the port. Are you doing
the following from the terminal:

1)Telnet to port 7147: "telnet  7147".

2)?progremote fpgfile.fpg

You say you can ping your board, so you should be able to
connect via casperfpga, as you mentioned above. Did you do
what James suggested i.e. try running fpga.is_connected()?
if it reports "True" then you are connected and if false
then you will need to debug further. Are you sure that the
IP you are pinging is your roach2 - may sound like a silly
question, but I don't know your setup.

Kind Regards,

Adam


On Fri, Sep 30, 2016 at 3:38 PM, Heystek Grobler
<heystekgrob...@gmail.com> wrote:

Hi James

I will try it. Through the terminal I can ping the
board, but I cant open a Telnet connection.

When I open a ttyUSB connection to the Roach and
monitor it, and try to upload the fpg file, the Roach
gives the same error "progremote"

That's why I'm confused

Thanks for your help!

I really appreciate it

Heystek


On Friday, 30 September 2016, James Smith
<jsm...@ska.ac.za> wrote:

Hello Heystek,

Before you program the ROACH2, I'd suggest trying
fpga.is_connected() and fpga.est_clk_frequency()
to check whether you can actually communicate with
the ROACH2. It might be a network cable that's
been unplugged by accident - that's where I've
seen those errors before. The
fpga=casperfpga.katcp_fpga.KatcpFpga('roachname or
ip_address') doesn't actually throw an error if it
can't connect to the ROACH2. This information
would at least help you narrow down the
possibilities as to what's wrong (i.e. whether
it's the kernel on the ROACH2).

Disclaimer: I work only on ROACH, but I'm fairly
certain the procedure would be the same.

Regards,
James


On Fri, Sep 30, 2016 at

Re: [casper] Programming a ROACH2

2016-10-07 Thread Ryan Monroe
I would suggest using "pip uninstall spead" instead -- I don't recall 
ever using it myself, but it appears to be the pip-sanctioned way of 
removing something.



On 10/07/2016 02:24 AM, James Smith wrote:

Hello Heystek,

Pip is seeing that you've already got a version of Spead installed, 
which might not have worked. You can delete the directory to 
'uninstall' it (Request for comment: is this a safe approach? It's 
what I've always done with no problems.)


Before you try that though, perhaps just try importing spead in 
ipython as Ryan did. What are the error messages?


Regards,
James


On Fri, Oct 7, 2016 at 11:23 AM, Heystek Grobler 
mailto:heystekgrob...@gmail.com>> wrote:


Hi James and Ryan

I tried sudo pip install spead and I get the following

Requirement already satisfied (use --upgrade): spead in
/usr/local/lib/python2.7/dist-packages
Cleaning up...

Any ideas?

I am a bit lost to be honest.

On Fri, Oct 7, 2016 at 11:09 AM, James Smith mailto:jsm...@ska.ac.za>> wrote:

Hello Heystek,

I vaguely recall installing spead from pip as well, as Ryan
has done here. Give that a whirl.

Regards,
    James


On Fri, Oct 7, 2016 at 11:06 AM, Ryan Monroe
mailto:ryan.m.mon...@gmail.com>> wrote:

rmonroe@rmonroe-ThinkPad-P50:~$ sudo pip install spead
[sudo] password for rmonroe:
The directory '/home/rmonroe/.cache/pip/http' or its
parent directory is not owned by the current user and the
cache has been disabled. Please check the permissions and
owner of that directory. If executing pip with sudo, you
may want sudo's -H flag.
The directory '/home/rmonroe/.cache/pip' or its parent
directory is not owned by the current user and caching
wheels has been disabled. check the permissions and owner
of that directory. If executing pip with sudo, you may
want sudo's -H flag.
Collecting spead
  Downloading spead-0.5.1.tar.gz (61kB)
100% || 71kB 1.5MB/s
Installing collected packages: spead
  Running setup.py install for spead ... done
Successfully installed spead-0.5.1
rmonroe@rmonroe-ThinkPad-P50:~$ ipython
Python 2.7.12 (default, Jul  1 2016, 15:12:24)
Type "copyright", "credits" or "license" for more information.

IPython 2.4.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help  -> Python's own help system.
object?   -> Details about 'object', use 'object??' for
extra details.

In [1]: import spead

In [2]:


On 10/07/2016 02:04 AM, Heystek Grobler wrote:

Hi James

I installed the PySpead package but I get the following
error when I run the tut3.py script:

ImportError: No module named spead

Do you perhaps have any ideas on how to solve it?

Thank you!!

Heystek

On Fri, Oct 7, 2016 at 11:01 AM, James Smith
mailto:jsm...@ska.ac.za>> wrote:

Hello Heystek,

If you're still in the Python environment, then
PySpead is the one you want.

Regards,
James


On Fri, Oct 7, 2016 at 10:59 AM, Heystek Grobler
mailto:heystekgrob...@gmail.com>> wrote:

Good Day

After a while of troubleshooting I determined the
connection with TCP/IP into the board was
correct. I updated the file system and kernel of
the ROACH2 and now I can program it.

I only need to install a package called spead in
order to use the .bof file with tutorial 3 of
CASPER.

Where can I download the spead package from? I
can only find PySpead and Spead2.

Thanks for everyone's help

Heystek!


Re: [casper] Programmable fractional delay block

2017-01-11 Thread Ryan Monroe

I have something for the frequency domain
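
For anyone following along without such a block: the standard frequency-domain approach is a per-bin phase ramp, multiplying bin k by exp(-j*2*pi*k*d/N) for a delay of d samples. Below is a pure-Python numerical sketch of the idea, not the Simulink implementation being discussed.

```python
# Frequency-domain fractional delay: delaying a periodic signal x[n] by d
# samples multiplies DFT bin k by exp(-j*2*pi*k*d/N), with k interpreted
# as a signed frequency.  Numerical illustration only (note the Nyquist
# bin of a real signal needs extra care for general inputs).
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def fractional_delay(x, d):
    """Delay a (periodic) real signal by d samples; d need not be an integer."""
    N = len(x)
    X = dft(x)
    for k in range(N):
        kf = k if k <= N // 2 else k - N          # signed bin index
        X[k] *= cmath.exp(-2j * math.pi * kf * d / N)
    return [y.real for y in idft(X)]

# Delay one tone by a tenth of a sample and compare against theory.
N, f, d = 64, 3, 0.1
x = [math.cos(2 * math.pi * f * n / N) for n in range(N)]
y = fractional_delay(x, d)
expected = [math.cos(2 * math.pi * f * (n - d) / N) for n in range(N)]
err = max(abs(a - b) for a, b in zip(y, expected))
```

In gateware the same phase ramp would be applied after an FFT and undone with an inverse FFT; the sketch just demonstrates the arithmetic.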


On 01/11/2017 02:22 PM, Daniel C Price wrote:

Hi all

Does anyone have an implementation of a runtime-programmable 
fractional delay simulink block (e.g. 1/10th of a clock cycle) that 
they would be willing to share?


Regards
Danny

--
Danny Price | dan...@berkeley.edu | +1 
617-386-3700




Re: [casper] contact

2017-08-09 Thread Ryan Monroe

Next workshop is in Pasadena this upcoming week!



On 08/09/2017 02:02 PM, Madden, Timothy J. wrote:



We have been using a ROACH for several years, and I have found the 
CASPER mailing lists to be useful, and the community helpful. Not sure 
what being a "member" of Casper is. We do not pay any fee. We just 
joined the mailing list. There are ROACH conferences once a year. I
have not gone, but wish I had, because they seem useful.


Tim Madden
Argonne Lab



*From:* Beatriz Garcia 
*Sent:* Wednesday, August 9, 2017 3:04 PM
*To:* casper@lists.berkeley.edu
*Subject:* [casper] contact
Hello
We are starting to work on some development in radio astronomy, and
people from Cyntony Corporation suggested joining CASPER: at this
stage because we are buying a ROACH system (and being a member assures
a discount), but also in the future, because we will need a discussion
group to improve our development.


Could you help me with this?

Is this enough to be considered a member of CASPER?

Thanks,

Beatriz Garcia
--
You received this message because you are subscribed to the Google 
Groups "casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send 
an email to casper+unsubscr...@lists.berkeley.edu 
.
To post to this group, send email to casper@lists.berkeley.edu 
.



Re: [casper] 10 GbE SPF+ to 1 GbE RJ45 adapter

2017-08-10 Thread Ryan Monroe
Why not use the 1GbE port on the ROACH2?  No need to go for 10GbE (SFP+) 
ports



On 08/10/2017 01:16 PM, Xavier Bosch wrote:

Hi,

As I explained in a previous email, I am looking for a way to send the 
slow-throughput ROACH2 data (~25 Mbps) to a PC without using the SFP+ 
ports on the PC side.  Mobility is important for the project that I am 
working on, so I want to be able to use any laptop for that purpose 
without the burden of having to have a 10 GbE SFP+ card.


I first considered the 1 GbE but this implies a big reconfiguration of 
the design since the outputs are 8 bits instead of the 64 that the 10 GbE 
interface uses.


Has anyone tried to use an SFP+ to 1 GbE RJ45 adapter like the ones 
in the attached picture?


I did some research and I found that the Quad SFP+ Mezzanine board 
uses the Vitesse VSC8488XJU chip which in theory should allow such 
connection, as you can see in this spec. sheet 
http://www.mouser.com/ds/2/523/microsemi_VSC8489-01_PB-883836.pdf


So far, I tested 2 different adapters at port 0 with no luck.
Does anyone know if this is possible?
Is the 1 GbE option enabled on the mezzanine board?

Thank you,
XB


--

Ryan Monroe
PhD Student | Electrical Engineering
California Institute of Technology
Cahill 255 | 626.773.0805



Re: [casper] 10 GbE SPF+ to 1 GbE RJ45 adapter

2017-08-10 Thread Ryan Monroe
I have not attempted the tutorial, but if I were testing the adapters, I 
would do something like this:


1. Take a known-working demo (one which does loopback, for example).
2. Test the demo yourself and confirm it is doing loopback successfully.
3. Disconnect the TX port from the ROACH and plug in your converter.
   Connect to your PC via 1GbE and use wireshark to see if packets are
   coming through.
   - If this works, then you know that the connector works and your
     code is broken -- knowing what I do about writing 10GbE code,
     it's probably that!!
4. Now, test your own code with the device.


I have used 1GbE and the interface is almost identical to the 10GbE 
interface.  Only difference is that you fill the interface one byte at a 
time.  So, toss an 8-mux in front of the input, driving each of the mux 
inputs with a different one-byte slice of the 8-byte 10GbE input word.  
Attach a 3-bit counter to the "select" port, and you're off to the races!
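
In software, the byte-serialising mux just described looks something like this. The most-significant-byte-first ordering is an assumption; check it against your own wireshark captures.

```python
# Software model of the mux described above: serialise each 64-bit 10GbE
# word into eight bytes for the 1GbE core, selecting one byte slice per
# clock with a 3-bit counter on the mux "select" port.  Byte order
# (MSB first) is an assumption -- verify endianness on the wire.
def words_to_bytes(words):
    out = []
    for word in words:                 # one 64-bit word per "10GbE clock"
        for sel in range(8):           # 3-bit counter driving the mux select
            shift = 8 * (7 - sel)      # most-significant byte first (assumed)
            out.append((word >> shift) & 0xFF)
    return out
```

Feeding it a counter word shows the slicing directly: words_to_bytes([0x0102030405060708]) yields the bytes 1 through 8 in order.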



I strongly suggest starting off by connecting a simple constant or 
counter to your input at first.  Change it to something more 
sophisticated once you get your endianness and basic packet reading 
sorted out.



On 08/10/2017 01:32 PM, Xavier Bosch wrote:

Ryan,
Thank you for your response.
You are right, I first considered the 1 GbE port, but this implies 
restructuring the data packaging: instead of the 64-bit word length I 
am using right now, I would have to change it to 8 bits.
Besides, there is not much information on the 1 GbE port, whereas 
there is a tutorial for the 10 GbE, which made it quite easy to 
implement.
It's not that it cannot be done, but design and test would take me a 
significant amount of time.

If I could solve the problem using an adapter that would be great !  :-)

XB


On Thu, Aug 10, 2017 at 1:22 PM Ryan Monroe <mailto:ryan.m.mon...@gmail.com>> wrote:


Why not use the 1GbE port on the ROACH2?  No need to go for 10GbE
(SFP+) ports


On 08/10/2017 01:16 PM, Xavier Bosch wrote:

Hi,

As I explained in a previous email, I am looking for a way to
send the slow-throughput ROACH2 data  (~25 Mbps) to a PC without
using the SPF+ ports on the PC side.  Mobility is important for
the project that I am working, so I want to be able to use any
laptop for that purpose without the burden of having to have a 10
GeB SPF+ card

I first considered the 1 GbE but this implies a big
reconfiguration of the design sine the outputs are 8 bit instead
of 64 that the 10 GbE interface uses.

Have anyone tried to use a SPF+ to 1 GbE RJ45 connector like the
ones in the attached picture?

I did some research and I found that the Quad SFP+ Mezzanine
board uses the Vitesse VSC8488XJU chip which in theory should
allow such connection, as you can see in this spec. sheet
http://www.mouser.com/ds/2/523/microsemi_VSC8489-01_PB-883836.pdf

So far, I tested 2 different adapters at port 0 with no luck.
Does anyone know if this is possible?
Is the 1 GbE option enable in the mezzanine board?

Thank you,
XB
-- 
You received this message because you are subscribed to the

Google Groups "casper@lists.berkeley.edu"
<mailto:casper@lists.berkeley.edu> group.
To unsubscribe from this group and stop receiving emails from it,
send an email to casper+unsubscr...@lists.berkeley.edu
<mailto:casper+unsubscr...@lists.berkeley.edu>.
To post to this group, send email to casper@lists.berkeley.edu
<mailto:casper@lists.berkeley.edu>.


-- 


Ryan Monroe
PhD Student | Electrical Engineering
California Institute of Technology
Cahill 255 | 626.773.0805





Re: [casper] Matlab - Xilinx Help

2017-08-11 Thread Ryan Monroe
I just dealt with this problem yesterday!  Somewhere in your startup.m 
file, you should have a line that looks like this:

xlAddSysgen([getenv('XILINX_PATH'), '/ISE'])

Toss in an extra slash at the end of the path:
xlAddSysgen([getenv('XILINX_PATH'), '/ISE/'])

Now, I still haven't sorted the REST of the Debian/Xilinx issues; 
that's tomorrow's problem :-)



On 08/11/2017 01:58 AM, Heystek Grobler wrote:

Good day everyone

For the last year I have been working on an Ubuntu system with Xilinx and 
Matlab. Currently I am trying to set up a system on a Debian (Jessie) 
system, but I get the following error message when starting up Matlab:


Cannot access directory lib/lin64. The libraries under the
path are needed to simulate and netlist designs using blocks
from Xilinx System Generator for DSP blockset.

Does anyone perhaps know how to solve this?

Thanks for the help

Heystek Grobler




Re: [casper] Dealing with extreme RFI

2017-12-08 Thread Ryan Monroe


On 12/08/2017 01:04 AM, Jean Borsenberger wrote:
I knew that correlators could operate using only the sign bit through 
the van Vleck correction and total-power denormalization, but I did 
not know that this also applies to spectrometers. I will look into it. 


To the extent of my (imperfect) knowledge, this does not apply to 
spectrometers.  IIRC, even for a correlator, this holds only under 
"no-interference" and "white Gaussian input" assumptions, which are 
unlikely to be true in a heavy-RFI environment.
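
The van Vleck relation itself is easy to check numerically under exactly those Gaussian assumptions: for zero-mean jointly Gaussian inputs with correlation rho, the sign-only correlation is (2/pi)*asin(rho), so rho = sin(pi*rho_1bit/2). A pure-Python sketch with made-up parameters:

```python
# Numerical check of the van Vleck relation: for zero-mean jointly
# Gaussian inputs with correlation rho, the 1-bit (sign-only) correlation
# is (2/pi)*asin(rho), recovered as sin(pi/2 * rho_1bit).  Sketch only --
# strong RFI breaks the Gaussianity this relies on.
import math
import random

random.seed(1)                       # deterministic demo
rho_true = 0.4                       # correlation we try to recover (made up)
n = 200_000
acc = 0
for _ in range(n):
    g = random.gauss(0.0, 1.0)
    x = g
    y = rho_true * g + math.sqrt(1.0 - rho_true ** 2) * random.gauss(0.0, 1.0)
    acc += (1 if x >= 0 else -1) * (1 if y >= 0 else -1)

rho_1bit = acc / n                                 # suppressed sign correlation
rho_corrected = math.sin(math.pi / 2 * rho_1bit)   # van Vleck correction
```

Replacing the Gaussian draws with a strong CW interferer makes the recovered value drift from rho_true, which illustrates the caveat above.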


PS: RFI power at 17x signal power is a bit high, but not insane.  In 
several systems I've worked on, the interference power is O(8x) signal.  
You might want to worry about nonlinearity in your ADCs as the RFI 
drives them close to the rails.  If you have strong narrowband RFI, be 
especially concerned with intermodulation products mixing signal and RFI 
to interesting places, which we believe are apparent at the OVRO-LWA (we 
are fortunate that they mix into a non-science band).



Cheers!

--Ryan



[casper] NVIDIA open sources CUDA

2011-12-14 Thread Ryan Monroe

Hey all,

Not sure if this is interesting/helpful to anyone, but NVIDIA just 
announced that they are open-sourcing CUDA and its compiler.  This is 
used in high-performance GPU computing and might provide some insights.


http://developer.nvidia.com/content/cuda-platform-source-release

--Ryan Monroe



[casper] Dram problems?

2012-03-13 Thread Ryan Monroe

Hey all,

I'm trying to use the dram as a coefficient buffer, but I'm having some 
problems with the demos.


I tried to use test_dram_10_1 to fill the dram and subsequently read out 
the results, but it looks like the upper eight bits of every 32-bit word 
are being corrupted (or used as some kind of parity bits??)


Attached is a printout of the first couple of words read from the dram.  
As expected, the lower bits are a counter and the upper two bits 
increase across the row, so to speak.  However, the following six bits 
(which should be '0's, since they're part of a counter which starts at 
'0'), are all 1s!


Anyone else seen this?  (Note: I'm using ISE 13.4.  Might be the 
problem...)


Thanks guys!

--Ryan Monroe
<>

Re: [casper] Dram problems?

2012-03-13 Thread Ryan Monroe

Hi Laura,

Thanks for the quick response!  To clarify, in the image I posted, each 
row is a 'word' in the DRAM.  They are separated into 32-bit chunks for 
clarity.  So, there are as many rows under this (conceptually) as there 
are addresses in the dram.


The first row is address 0, so we shouldn't see counter wrapping.  IMHO, 
there shouldn't be any '1's in the upper bits -- besides the top two, 
which are set by the concat block in the model (picture attached for 
convenience)


Thanks for the tip about the controller!  For this application, we only 
need 2^20 memory locations, so we're good this time.


--Ryan Monroe


On 03/13/2012 12:47 PM, Laura Spitler wrote:

Hi Ryan,

I believe I understand what's going on assuming that your output
should be read by columns and not by rows (and that there are more
rows for each column below what you printed out).

The counter in the design is something like 29-bits (I may have the
exact value off). For a DRAM with 1 GB of memory, there are only 2^25
address, so extra 1's you're seeing at the top of the word is the
counter wrapping.
That said, there was a bug in the original Verilog code

( 
mlib_devel_10_1/xps_lib/XPS_ROACH_base/pcores/dram_controller_v1_00_a/hdl/verilog/dram_controller.v
)

where ROW_WIDTH = 13 instead of ROW_WIDTH = 14.

This should have been fixed, but you should probably double check it.
If the bug is still there, only 2^24 addresses are written.

Hope that helps.
Laura


On Tue, Mar 13, 2012 at 3:09 PM, Ryan Monroe  wrote:

Hey all,

I'm trying to use the dram as a coefficient buffer, but I'm having some
problems with the demos.

I tried to use test_dram_10_1 to fill the dram and subsequently read out the
results, but it looks like the upper eight bits of every 32-bit word are
being corrupted (or used as some kind of parity bits??)

Attached is a printout of the first couple of words read from the dram.  As
expected, the lower bits are a counter and the upper two bits increase
across the row, so to speak.  However, the following six bits (which should
be '0's, since they're part of a counter which starts at '0'), are all 1s!

Anyone else seen this?(note: I'm using ISE 13.4.  Might be the
problem...)

Thanks guys!

--Ryan Monroe


<>

Re: [casper] Dram problems?

2012-03-13 Thread Ryan Monroe
Ouch!  I feel pretty silly right now... looks like that was the 
problem.  Good call!


--Ryan


On 03/13/2012 01:15 PM, Laura Spitler wrote:

Thanks for the png of the design; I was trying to remember how it
works from memory.

But that said, I still think I'm right. A single word is a
concatenation of 4 32-bit integers consisting of a 30-bit counter and
the top two bits reserved as an "integer counter". Note that the
freeze counter is 29 bits, but the DRAM can only see 2^24 addresses.
The DRAM is therefore written 2^5 times before it stops. This is why
you see the five 1's in bits 25-29.

Laura
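
Laura's wrap arithmetic can be checked numerically. The widths below are taken from this thread, not verified against the gateware, so treat the sketch as illustrative:

```python
# Sketch of the wrap arithmetic: a free-running counter is written into a
# DRAM that only decodes 2**24 addresses, so every address is overwritten
# on each pass and ends up holding its value from the final pass -- with
# the pass count (all ones by then) sitting just above the address bits.
# ADDR_BITS and COUNT_BITS come from the thread, not from the HDL.
ADDR_BITS = 24                            # addresses the controller decodes
COUNT_BITS = 29                           # freeze counter width in the demo
PASSES = 1 << (COUNT_BITS - ADDR_BITS)    # 2**5 = 32 passes over the DRAM

def final_word(addr):
    """Counter value last written to `addr` before the counter tops out."""
    return addr + (PASSES - 1) * (1 << ADDR_BITS)

upper_bits = final_word(0) >> ADDR_BITS   # the "mystery" ones: 0b11111
```

The five set bits above the 24-bit address field are exactly the ones observed in the DRAM printout.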









Re: [casper] Matlab Crashing wiht CASPER Library

2012-04-05 Thread Ryan Monroe
Last year, I was using an old-ish version of the CASPER libraries and 
had the same problems.  MATLAB segfaulted on the simulation or generation 
of large models (specifically, fft_wideband_real with n_sim_inputs=4 and 
FFTSize >= 13).


Since switching to a newer version of MATLAB and ISE 13.*,  (currently 
MATLAB 2011b and ISE 13.4), I no longer have this issue.



--Ryan Monroe

On 04/05/2012 03:12 PM, r...@physics.ucsb.edu wrote:

There was a complaint at my campus about Matlab crashing when complex 
designs use the new (Fall 2011) CASPER libraries. I tried re-constructing 
one of my designs using the new libraries and saw Matlab crash when I 
tried to simulate, compile or even sometimes when I tried to look under 
the mask of a CASPER block in some of my designs. I investigated further 
into the matter and found that certain blocks cause Matlab to crash even 
in simple designs. For example, if I add a snap block to a design, Matlab 
crashes whenever I simulate, use bee_xps, change display formats in 
Simulink, or try to look under the mask of certain blocks. I think I 
remember this problem being mentioned in the CASPER lists, but I don't 
remember the subject title or when messages about it were posted. Does 
anybody remember this problem being mentioned?








Re: [casper] 1-2 GHz sampler

2012-05-28 Thread Ryan Monroe
I've used the ADC083000s before, and the interleave can be pretty bad -- 
as much as 15% difference across their 3 GHz spectrum!  You should expect 
better performance because you only need to tune for 1 GHz of bandwidth, 
however.  I can't speak for the other parts.


I have a technique I've developed which cancels out all cross-board ADC 
interleaving issues.  I've been trying to get JPL to let me release it 
to the public domain: I'll see what I can do, and please tell me if you 
have issues.


--Ryan Monroe




On 5/28/2012 5:47 PM, Dan Werthimer wrote:

hi bill,

i think all the boards you mention have analog bandwidth out to 2 GHz,
so they should work well for your 1-2 GHz band.

the asiaa board is the least expensive, but this board does not have
programmable attenuators like the Kat-ADC.  the asiaa board can be used
as a single 5 Gsps ADC, or as a dual 2.5 Gsps ADC.
we have used the asiaa board as a single 5 gsps adc, and it works quite
well, but we have never tested it as a dual adc - perhaps others reading
this email can give you advice about using it in dual mode.  if you are
using roach I, you can't get the 8 bit version of the asiaa board
working at the full 5 gsps.  if you are using roach II, you can use it
at 5 Gsps.

best wishes,

dan




On Thu, May 24, 2012 at 11:30 AM, Bill Petrachenko  wrote:

I'm designing a digital data acquisition system using a ROACH1 board. I need
to sample two Nyquist zones at 1024-2048 MHz. It appears that in the Casper
group of products, a pair of ASIAA, ADC1x3000-8, or KatADC boards would work
well and interface nicely to a single ROACH1 board (although the ASIAA
board is not mentioned explicitly on the web-site). Is there any reason to
choose one board over another? The gain adjustment stage is attractive on
the KatADC, but the performance of the ADC1x3000-8 chip seems marginally
better at 2-GHz input frequency. The e2v chip seems less established than
the National chips. Is interleaving or calibration an issue for any of the
chips?

I'd be grateful for any opinions on this.
Thanks, -Bill.





Re: [casper] 1-2 GHz sampler

2012-05-29 Thread Ryan Monroe

This is identical to what I developed too...  Sorry to disappoint :-)

I know a fine handful of calibration techniques which correct for 
gain/phase mismatches in a bulk manner (and don't require a calibration 
source), but I don't think that many of them handle frequency-dependent 
variations while not requiring a calibration source.  The only one that 
comes to mind and could be any good is this:


"GENERALIZED BLIND MISMATCH CORRECTION FOR TWO-CHANNEL TIME-INTERLEAVED 
A-TO-D CONVERTERS"
A search on IEEE xplore should do it.  I haven't had the time to figure 
out everything it's saying (so busy lately), so that part's on you.


Also worth reading if you're going to try and solve this problem is:
"Explicit Analysis of Channel Mismatch Effects in Time-Interleaved ADC 
Systems"


In my opinion, you should consider using the technique which Glenn 
described, even if you don't have a calibration source.  You won't be 
able to remove all of the errors, but it'll be a far cry from nothing at 
all.  Also, the correction is dead simple:  just perform an FFT as 
usual, but at the end of the second-to-last stage of our DIT FFT, you 
have the spectra from your two boards.  Simply applying a gain and phase 
correction to one of the two spectra is sufficient.  Once you correct at 
the final stage, you could back up one more stage to correct for the 
four ADC cores.  I haven't done the latter since, in our application, the 
interleave artifacts between the two cores on each 083000 chip fall 
below the quantization noise floor.


In lab, this improves our SFDR by 17 dB.  I should be able to give you 
flight data in a month or two.


It's worth noting that when I tried to solve for the correction factors 
analytically (off of raw ADC data), my corrections were off by a percent 
or so.  Doing a search for the minimum spurious result gave me much 
better performance.
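
One handy check when hunting these artifacts: gain/phase mismatch between the M cores of an interleaved ADC puts images of an input tone fin at k*fs/M +/- fin, aliased into the first Nyquist zone. This is the standard textbook result, offered as a spur-location predictor, not the correction scheme itself:

```python
# Predict where interleave-mismatch spurs land for an m-way interleaved
# ADC sampling at fs: images of a tone fin appear at k*fs/m +/- fin,
# folded into the first Nyquist zone.  Debugging aid only.
def alias(f, fs):
    """Fold a frequency into the first Nyquist zone [0, fs/2]."""
    f = f % fs
    return fs - f if f > fs / 2 else f

def interleave_spurs(fin, fs, m):
    """Sorted spur frequencies for an m-way interleaved ADC at rate fs."""
    spurs = set()
    for k in range(1, m):
        for sign in (+1, -1):
            spurs.add(round(alias(k * fs / m + sign * fin, fs), 9))
    spurs.discard(round(alias(fin, fs), 9))  # drop the tone itself
    return sorted(spurs)

# e.g. a 700 MHz tone in a 4-way interleave at fs = 3000 MHz:
# interleave_spurs(700, 3000, 4) -> [50.0, 800.0, 1450.0]
```

Sweeping a CW tone and watching whether the measured spurs track these predicted bins is a quick way to confirm you are looking at interleave artifacts rather than something else.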


Cheers!

--Ryan

On 05/29/2012 08:25 AM, Jason Manley wrote:

One of my Bell Labs friends told me about a company  (in Finland if I remember 
correctly) which sells IP for FPGAs which breaks out the samples from the 4 
cores and separately frequency compensates them before re-assembling the input 
stream.  The improvement for communications signals was only a few dB, however.

We have been in touch with Signal Processing Devices, a Swedish company that 
does precisely what Glenn describes. Francois went to visit them and they tried 
their IP on our existing ADCs (iADC in 2-way interleaved and KATADC in 4-way 
interleaved). I've attached the results for your reference. We were a little 
underwhelmed. You also need a calibration source to do this on-the-fly which 
wasn't possible for us. The source data was a CW tone at various frequencies. 
You can see the before/after plots.

I'm also interested to hear in any other techniques for doing this.

Jason





Re: [casper] CLKIN1_PERIOD error when building for ROACH II

2012-07-20 Thread Ryan Monroe
Are you seeing it in the actual HDL file, or just in the build logs?  I'm 
presuming that that log file is directly written/altered by a matlab 
script in the build process?  If so, that's a great place to start.


On 07/20/2012 02:13 PM, G Jones wrote:

I agree, but the mystery is how that crazy binary value is getting in there...
I should also note that I was able to build ROACH II designs with the
casper-astro/mlib_devel, but the resulting boffiles caused a kernel
panic sort of error when reading the registers. Using a known good
boffile from Rurik showed that the ROACH II itself was not the cause
of the problem.

Glenn

On Fri, Jul 20, 2012 at 2:09 PM, Ryan Monroe  wrote:

Hey Glenn,

I would guess that the HDL is wrong.  Reference the ISE 13.4 / Virtex 6 HDL
libraries guide, page 249:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_4/virtex6_hdl.pdf

Looks like it wants a float, representing the input period here. Definitely
not a binary value...

But I haven't looked at the HDL myself, I'm just going off of the report in
this email.  Take this all with a pinch of salt.

Anyways, how's life?  I haven't seen you in awhile--don't I still owe you a
beer?  :-)

--Ryan


On 07/20/2012 01:51 PM, G Jones wrote:

Hello,
I am running into a problem (error message below) when trying to build
simple designs for the ROACH II. I am using the ska-sa/mlib_devel
freshly cloned from github. I have double checked that my paths only
point to this version. I am using ISE 13.4 and MATLAB 2011a. The
design is very simple, just a blinking LED and a software register. I
initially tried clocking off of sys_clk at 100 MHz, but found the same
problem when I added an ADC to the design and selected adc0_clk at 200
MHz.
The problem occurs during ngdbuild of system.ngd

Mark Wagner says he has also seen this problem.

I checked the roach_infrastructure.v code in the pcore and it looks
reasonable to me.

Does anyone have any suggestions?

Thanks,
Glenn

Annotating constraints to design from ucf file "system.ucf" ...
Resolving constraint associations...
Checking Constraint Associations...
INFO:ConstraintSystem:178 - TNM 'sys_clk_n', used in period specification
 'TS_sys_clk_n', was traced into MMCM_ADV instance
 infrastructure_inst/MMCM_BASE_clk_200_inst. The following new TNM
groups and
 period specifications were generated at the MMCM_ADV output(s):
 CLKOUT1: 

INFO:ConstraintSystem:178 - TNM 'sys_clk_n', used in period specification
 'TS_sys_clk_n', was traced into MMCM_ADV instance
 infrastructure_inst/MMCM_BASE_inst. The following new TNM groups and
period
 specifications were generated at the MMCM_ADV output(s):
 CLKOUT1: 

INFO:ConstraintSystem - The Period constraint 
 [system.ucf(393)], is specified using the Net Period method which is
not
 recommended. Please use the Timespec PERIOD method.

INFO:ConstraintSystem - The Period constraint 
 [system.ucf(394)], is specified using the Net Period method which is
not
 recommended. Please use the Timespec PERIOD method.

Done...

ERROR:LIT:374 - Attribute CLKIN1_PERIOD on MMCM_ADV instance
 "infrastructure_inst/infrastructure_inst/MMCM_BASE_inst" has invalid
value

"64'SB1010". The
 CLKIN1_PERIOD attribute should have a real number, followed by
optional time
 or frequency units; nS are assumed if no units are given.
WARNING:NgdBuild:1440 - User specified non-default attribute value

(64'SB1010) was
 detected for the CLKIN1_PERIOD attribute on MMCM
 "infrastructure_inst/MMCM_BASE_inst".  This does not match the PERIOD
 constraint value (100 MHz.).  The uncertainty calculation will use the
PERIOD
 constraint value.  This could result in incorrect uncertainty
calculated for
 MMCM output clocks.
Checking expanded design ...






Re: [casper] CLKIN1_PERIOD error when building for ROACH II

2012-07-20 Thread Ryan Monroe

Ahem, by 'log file', I meant 'HDL file'.

period
 specifications were generated at the MMCM_ADV output(s):
 CLKOUT1: 

INFO:ConstraintSystem - The Period constraint 
 [system.ucf(393)], is specified using the Net Period method which is
not
 recommended. Please use the Timespec PERIOD method.

INFO:ConstraintSystem - The Period constraint 
 [system.ucf(394)], is specified using the Net Period method which is
not
 recommended. Please use the Timespec PERIOD method.

Done...

ERROR:LIT:374 - Attribute CLKIN1_PERIOD on MMCM_ADV instance
 "infrastructure_inst/infrastructure_inst/MMCM_BASE_inst" has invalid
value

"64'SB1010". The
 CLKIN1_PERIOD attribute should have a real number, followed by
optional time
 or frequency units; nS are assumed if no units are given.
WARNING:NgdBuild:1440 - User specified non-default attribute value

(64'SB1010) was
 detected for the CLKIN1_PERIOD attribute on MMCM
 "infrastructure_inst/MMCM_BASE_inst".  This does not match the PERIOD
 constraint value (100 MHz.).  The uncertainty calculation will use the
PERIOD
 constraint value.  This could result in incorrect uncertainty
calculated for
 MMCM output clocks.
Checking expanded design ...






Re: [casper] CLKIN1_PERIOD error when building for ROACH II

2012-07-20 Thread Ryan Monroe
Comment: 1000/CLK_FREQ will evaluate to 10 when CLK_FREQ=100.  That 
crazy binary value you see is 0b1010 = 0d10.  Looks like a variable 
casting issue


On 07/20/2012 02:39 PM, G Jones wrote:

Some more information:

The only place the problem value is shown is in
XPS_ROACH2_base/synthesis/infrastructure_inst_wrapper_xst.srp

Elaborating module
.

In the verilog file, CLKIN1_PERIOD  is set to (1000/CLK_FREQ) and the
CLK_FREQ parameter defaults to 100.
The value of CLK_FREQ passed from the .mhs is also 100

In the data/roach_infrastructure_v2_1_0.mpd, CLK_FREQ is set to 100.
The data type is indicated as integer, which is a bit suspicious, but
it's the same in the "working" casper-astro/mlib_devel

Glenn


On Fri, Jul 20, 2012 at 2:17 PM, G Jones  wrote:

Yep, I agree again, but the MHS file that sets the parameters (as far
as I remember) looks OK too. I'll try grepping through the build
directory to see if I can figure out where the crazy value is coming
from.

On Fri, Jul 20, 2012 at 2:15 PM, Ryan Monroe  wrote:

Ahem, by 'log file', I meant 'HDL file'.


On 07/20/2012 02:13 PM, G Jones wrote:

I agree, but the mystery is how that crazy binary value is getting in
there...
I should also note that I was able to build ROACH II designs with the
casper-astro/mlib_devel, but the resulting boffiles caused a kernel
panic sort of error when reading the registers. Using a known good
boffile from Rurik showed that the ROACH II itself was not the cause
of the problem.

Glenn

On Fri, Jul 20, 2012 at 2:09 PM, Ryan Monroe 
wrote:

Hey Glenn,

I would guess that the HDL is wrong.  Reference the ISE 13.4 / Virtex 6
HDL
libraries guide, page 249:

http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_4/virtex6_hdl.pdf

Looks like it wants a float, representing the input period here.
Definitely
not a binary value...

But I haven't looked at the HDL myself, I'm just going off of the report
in
this email.  Take this all with a pinch of salt.

Anyways, how's life?  I haven't seen you in awhile--don't I still owe you
a
beer?  :-)

--Ryan


On 07/20/2012 01:51 PM, G Jones wrote:

Hello,
I am running into a problem (error message below) when trying to build
simple designs for the ROACH II. I am using the ska-sa/mlib_devel
freshly cloned from github. I have double checked that my paths only
point to this version. I am using ISE 13.4 and MATLAB 2011a. The
design is very simple, just a blinking LED and a software register. I
initially tried clocking off of sys_clk at 100 MHz, but found the same
problem when I added an ADC to the design and selected adc0_clk at 200
MHz.
The problem occurs during ngdbuild of system.ngd

Mark Wagner says he has also seen this problem.

I checked the roach_infrastructure.v code in the pcore and it looks
reasonable to me.

Does anyone have any suggestions?

Thanks,
Glenn

Annotating constraints to design from ucf file "system.ucf" ...
Resolving constraint associations...
Checking Constraint Associations...
INFO:ConstraintSystem:178 - TNM 'sys_clk_n', used in period
specification
  'TS_sys_clk_n', was traced into MMCM_ADV instance
  infrastructure_inst/MMCM_BASE_clk_200_inst. The following new TNM
groups and
  period specifications were generated at the MMCM_ADV output(s):
  CLKOUT1: 

INFO:ConstraintSystem:178 - TNM 'sys_clk_n', used in period
specification
  'TS_sys_clk_n', was traced into MMCM_ADV instance
  infrastructure_inst/MMCM_BASE_inst. The following new TNM groups
and
period
  specifications were generated at the MMCM_ADV output(s):
  CLKOUT1: 

INFO:ConstraintSystem - The Period constraint 
  [system.ucf(393)], is specified using the Net Period method which
is
not
  recommended. Please use the Timespec PERIOD method.

INFO:ConstraintSystem - The Period constraint 
  [system.ucf(394)], is specified using the Net Period method which
is
not
  recommended. Please use the Timespec PERIOD method.

Done...

ERROR:LIT:374 - Attribute CLKIN1_PERIOD on MMCM_ADV instance
  "infrastructure_inst/infrastructure_inst/MMCM_BASE_inst" has
invalid
value

"64'SB1010".
The
  CLKIN1_PERIOD attribute should have a real number, followed by
optional time
  or frequency units; nS are assumed if no units are given.
WARNING:NgdBuild:1440 - User specified non-default attribute value

(64'SB1010)
was
  detected for the CLKIN1_PERIOD attribute on MMCM
  "infrastructure_inst/MMCM_BASE_inst".  This does not match the
PERIOD
  constraint value (100 MHz.).  The uncertainty calculation will use
the
PERIOD
  constraint value.  This could result in incorrect uncertainty
calculated for
  MMCM output clocks.
Checking expanded design ...






[casper] ROACH: Stale NFS Handle?

2012-09-10 Thread Ryan Monroe

Hi All,

We recently purchased two ROACH boards from Digicom and we seem to be 
experiencing a bit of an issue.  Upon trying to boot to the SD card on 
either, several processes don't come up, complaining of "Stale NFS 
handle".  If I configure eth0, I can start it once, but it will not 
initialize by default.  Attached is a copy of a dump from minicom via 
serial.


The only command I am using in uboot is mmcboot.  In addition, I'm using 
the SD cards as they arrived (which seems to have the correct 
filesystem/format).  Should I be doing something else in addition to 
mount the SD card?


Finally, I saw a similar thread in the archive (here).


Any advice?  Thank you very much!

--Ryan
CTRL-A Z for help | 115200 8N1 | NOR | Minicom 2.3 | VT102 | Offline

+---+
|  Initializing Modem  |
+---+
(BWelcome to minicom 2.3

OPTIONS: I18n 
Compiled on Nov 23 2010, 13:27:13.
Port /dev/ttyS0Press CTRL-A Z for help on special keys


AT S7=45 S0=0 L1 V1 X4 &C1 E1 Q0

LOGIN INCORRECT
ROACH LOGIN: 

U-Boot 2008.10-svn3231 (Jul 15 2010 - 14:58:38)

CPU:   AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133, OPB=66, EBC=66 MHz)
   Security/Kasumi support
   Bootstrap Option C - Boot ROM Location EBC (16 bits)
   32 kB I-Cache 32 kB D-Cache
Board: Roach
I2C:   ready
DTT:   1 FAILED INIT
DRAM:  (spd v1.3) dram: notice: ecc ignored
 1 GB
FLASH: 64 MB
USB:   Host(int phy) Device(ext phy)
Net:   ppc_4xx_eth0

Roach Information
Serial Number:040308
Monitor Revision: 10.1.1843
CPLD Revision:8.0.1588

type run netboot to boot via dhcp+tftp+nfs
type run soloboot to run from flash without network
type run mmcboot to boot using filesystem on mmc/sdcard
type run usbboot to boot using filesystem on usb
type run bit to run tests

Hit any key to stop autoboot: 10  9  8  7  6  5  4  3  
2  1  0 
WARNING: adjusting available memory to 3000
## Booting kernel from Legacy Image at fc00 ...
   Image Name:   Linux-2.6.25-svn2382-dirty3
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:1399105 Bytes =  1.3 MB
   Load Address: 
   Entry Point:  
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
id mach(): done
MMU:enter
MMU:hw init
MMU:mapin
MMU:setio

MMU:exit

setup_arch: enter

setup_arch: bootmem

ocp: exit

arch: exit

Linux version 2.6.25-svn2382-dirty3 (marc@seif) (gcc version 4.0.0 (DENX ELDK 
4.0 4.0.0)) #23 Tue Nov 10 15:30:48 SAST 2009

AMCC PowerPC 440EPx Roach Platform

Zone PFN ranges:

  DMA 0 ->   262143

  Normal 262143 ->   262143

Movable zone start PFN for each node

early_node_map[1] active PFN ranges

0:0 ->   262143

Built 1 zonelists in Zone order, mobility grouping on.  Total 
pages: 260096

Kernel command line: console=ttyS0,115200 
mtdparts=physmap-flash.0:1792k(linux),256k@0x1c(fdt),8192k@0x20(root),54656k@0xa0(usr),256k@0x3f60000(env),384k@0x3fa0000(uboot)fdt_addr=

Re: [casper] ROACH: Stale NFS Handle?

2012-09-10 Thread Ryan Monroe
For the moment, I scavenged another SD card from a working ROACH, which 
seems to have fixed the issue. I'd say that that FS corruption is a 
pretty good bet.  I'll try this out tomorrow, when I get the chance.  
Thank you!


--Ryan

 On 09/10/2012 05:53 PM, Laura Vertatschitsch wrote:
Our sys admin was able to get our system back up to working condition 
with the following script:


#!/bin/sh
sync
echo
echo "THIS WILL REMOUNT, FSCK, AND REBOOT YOUR COMPUTER, YOU HAVE 5s 
TO CTRL+C"

echo
sleep 6
sync;sync
mount -n -o remount,ro / || exit 1
fsck /dev/mmcblock1
echo
echo "REBOOTING IN 5s"
sleep 6
reboot

If this doesn't work let me know.  We operate mobile, so unfriendly 
shutdowns happen and we wanted to get back up and running in the 
field.  There may be a better way of doing this - so I would love 
feedback from the community as well!


--Laura



On Mon, Sep 10, 2012 at 5:33 PM, G Jones wrote:


I had some luck solving this problem by putting the SD card in another
machine and using fsck.ext2 on it to fix the corrupt FS. My experience
is the SD cards are overly susceptible to this sort of FS corruption.
I've had much better luck netbooting (once getting through the
annoying initial hurdle of getting everything set up).
Glenn

On Mon, Sep 10, 2012 at 8:08 PM, Louis P. Dartez wrote:
> Hello Ryan,
>
> This happened to me as well recently with a new ROACH
board. I
> noticed that the file '/etc/network/run/ifstate' was corrupted.
When I used
> the command 'ls -alh' all I got was question marks in each
field. I was
> unable to rm or mv the file, or even execute 'touch ifstate'.
Another ROACH
> board that I received in the same shipment worked right out of
the box. I
> was going to try completely formatting the troublesome SD card
and copying
> everything on the working SD onto it. I have not tried this yet,
> though...but I think it should work.
>
> -Louis P. Dartez
>
> Arecibo Remote Command Center Scholar
> Center for Advanced Radio Astronomy Researcher
> Department of Physics and Astronomy
> University of Texas at Brownsville
>
> On 09/10/2012 06:42 PM, Ryan Monroe wrote:
>
> Hi All,
>
> We recently purchased two ROACH boards from Digicom and we seem
to be
> experiencing a bit of an issue.  Upon trying to boot to the SD
card on
> either, several processes don't come up, complaining of "Stale
NFS handle".
> If I configure eth0, I can start it that once but it will not
initialize by
> default.  Attached is a copy of a dump from minicom via serial.
>
> The only command I am using in uboot is mmcboot.  In addition,
I'm using the
> SD cards as they arrived (which seems to have the correct
> filesystem/format).  Should I be doing something else in
addition to mount
> the SD card?
>
> Finally, I saw a similar thread in the archive (here).
>
> Any advice?  Thank you very much!
>
> --Ryan
>
>






Re: [casper] virtex5 arithmetic speed

2012-09-18 Thread Ryan Monroe
Just a comment:  It is actually pretty practical to adjust our FFTs to 
run with word-lengths of up to about 27 bits (the last stages would use 
double DSP resources). FFT lengths which would need this are not 
completely implausible on a ROACH 2, so feel free to speak up if the 
need arises.


--Ryan


On 09/18/2012 12:45 AM, Alex Zahn wrote:
Thank you--that's very useful. I didn't know the DSP slices could do 5 
ns multiplies.


Ultimately, what I'm getting at here is trying to estimate how 
many filter taps I can reasonably support on a 5 ns clock, with new 
data words arriving on every clock, questions of available chip 
resources aside.


If I understand this correctly, even with new data arriving on every 5 
ns clock, ROACH should (up to practical considerations) be able to 
operate as many taps as can fit on the FPGA. Is this right?


-Alex

On Mon, Sep 17, 2012 at 11:45 PM, Jason Manley wrote:


The latency through an FPGA will be high relative to a CPU/GPU,
because the FPGA's clock rate is lower (1/200MHz=5ns). But these
operations can be pipelined so that you can do a DSP operation on
every clock cycle. ROACH 1 and ROACH 2 will both run at 200MHz
very easily.

Considering ROACH-1, it has 640 DSP slices and you can do up to an
18 bit x 25 bit multiply in a single DSP slice. So you can do 640
multiply (and/or addition operation) operations every 1/200MHz=5ns.

But then you can also start using the 14720 slices for multipliers
or adders so you can get many more operations per second. And
then, if you're doing low resolution operations, you can fill the
244 BRAMs with lookup tables and just lookup the product for a
given input vector to do even more operations on every clock cycle.

If you wanted to throw the whole FPGA at DSP operations, you could
easily say that a ROACH-1 board is capable of over 2 TeraOps/s for
4-bit operations (common in radio astronomy). But this is an
unrealistic figure of merit because it ignores things like
pipelining registers and data routing requirements, memory
controllers and the like which would all be needed in a practical
design.

Jason

On 18 Sep 2012, at 05:20, Alex Zahn wrote:

> I've been browsing the xilinx literature, but I just can't seem
to get any idea how long one can usually expect addition and
multiplication operations to take. I realize this depends on a lot
of factors in the design, but does anyone know if it's reasonable
to multiply two 16 bit numbers in a single clock with a clock rate
of 200 MHz? I would test this on my ROACH out to find out, but I'm
away from lab for a while, and thus rendered rather helpless for
the time being.
>
> Unrelated, is there any online documentation on the new snapshot
block?
>
> -Alex Zahn






Re: [casper] VEGAS Correlator

2012-10-11 Thread Ryan Monroe

Hey Pedro,

This actually happened to me a few days back.  I'm no expert in linux, 
but it appears that there are some libraries which Xilinx expects--and 
you are not loading.  I added this line to my startup script to resolve 
the problems (you should change yours to suit)


export 
LD_LIBRARY_PATH=/home/Xilinx/13.4/ISE_DS/ISE/sysgen/bin/lin64/dot/:$LD_LIBRARY_PATH


Note that I am using ISE 14.2 as well, but had to load the libraries 
from 13.4.  In my case, it appeared that the 14.2 libraries were 
compiled for 32 bit, where I use 64 bit.  I solved it by simply mapping 
to an older revision's libraries, but your mileage will vary.


Hope this helped!

--Ryan

On 10/11/2012 11:28 AM, Pedro Sánchez wrote:

Dear Casper Group

I'm working with the VEGAS correlator 
(https://github.com/casper-astro/vegas_devel) using Matlab2012a, 
Xilinx 14.2 and Ubuntu 12.04 with these libraries: 
https://github.com/cs150bf/mlib_devel and 
https://github.com/casper-astro/xblocks_devel. Also, I have already 
installed libltdl-dev in Ubuntu. My issue is that when I try to open the 
VEGAS model, this error appears several times in the Matlab window:


"/opt/Xilinx/14.2/ISE_DS/ISE/
sysgen/bin/lin64/dot/dot.bin: error while loading shared libraries: 
libexpat.so.0: cannot open shared object file: No such file or 
directory Error running Dot."


And every time I try to do something in the Simulink window, the error 
appears again several times. So what do you think is going on?


Regards!


--
Pedro Sánchez M.
Universidad de Chile
Facultad de Ciencias Físicas y Matemáticas
Departamento de Ingeniería Eléctrica
Area de Instrumentación Astronómica




Re: [casper] DDR Memory Modules for Roach2

2012-10-25 Thread Ryan Monroe
So I have an application where I'd really love to use a 16 GB dimm.  
What do you think are the chances we can get one of those approved?  (I 
might be able to do some of the legwork myself / hack up the gateware 
controller as needed)


Are there any special addressing considerations for ROACH2?  I remember 
we had an address pin unconnected on ROACH1


Thanks

--Ryan


On 10/25/2012 01:32 PM, Wesley New wrote:

Hi Jason,

I think Mo had different Kingston DIMMs which are not 1gig, but the 
supplier claims that they are compatible. We have to be very careful 
when choosing the DIMM as we can only support a few different modules 
as the gateware controller has the SPD information hard coded.


I am unsure of the status, but Alec will be able to help out, when he 
gets into the office tomorrow morning (South African Time).


I noticed the wiki page heading read ROACH DDR3 Memory. I have changed 
it to ROACH2 DDR3 Memory. So the link is now at 
https://casper.berkeley.edu/wiki/ROACH2_DDR3_Memory_Modules


Regards

Wes



On Thu, Oct 25, 2012 at 10:16 PM, Jason Castro wrote:


The following web page lists two memory modules for use in the ROACH2:

https://casper.berkeley.edu/wiki/ROACH_DDR3_Memory_Modules

Kingston KVR1333D3S8R9S/1G
Samsung M393B5773CH0-CH9

Mo at Digicom was only aware of the Kingston Module as being
approved for use in the ROACH2.  He also reported that the
Kingston Module may have some problems as well.  Can anyone give
me updated information on this topic?

Thanks,

Jason Castro









Re: [casper] DDR Memory Modules for Roach2

2012-10-26 Thread Ryan Monroe
Eh it's a project for a few months down the road at least, but I'd be 
happy to try my hand at putting the controller together if you guys need 
a hand.  Thanks for the help!


On 10/26/2012 07:31 AM, Alec Rust wrote:
Ryan, I'll have to check compatibility with 16 GB and get back to you, 
but currently we don't have a DDR3 yellow block, so you will have to put 
the controller together as you suggested.


On Thu, Oct 25, 2012 at 10:36 PM, Ryan Monroe wrote:


So I have an application where I'd really love to use a 16 GB
dimm.  What do you think are the chances we can get one of those
approved?  (I might be able to do some of the legwork myself /
hack up the gateware controller as needed)

Are there any special addressing considerations for ROACH2?  I
remember we had an address pin unconnected on ROACH1

Thanks

--Ryan



On 10/25/2012 01:32 PM, Wesley New wrote:

Hi Jason,

I think Mo had different Kingston DIMMs which are not 1gig, but
the supplier claims that they are compatible. We have to be very
careful when choosing the DIMM as we can only support a few
different modules as the gateware controller has the SPD
information hard coded.

I am unsure of the status, but Alec will be able to help out,
when he gets into the office tomorrow morning (South African Time).

I noticed the wiki page heading read ROACH DDR3 Memory. I have
changed it to ROACH2 DDR3 Memory. So the link is now at
https://casper.berkeley.edu/wiki/ROACH2_DDR3_Memory_Modules

Regards

Wes



On Thu, Oct 25, 2012 at 10:16 PM, Jason Castro wrote:

The following web page lists two memory modules for use in
the ROACH2:

https://casper.berkeley.edu/wiki/ROACH_DDR3_Memory_Modules

Kingston KVR1333D3S8R9S/1G
Samsung M393B5773CH0-CH9

Mo at Digicom was only aware of the Kingston Module as being
approved for use in the ROACH2.  He also reported that the
Kingston Module may have some problems as well.  Can anyone
give me updated information on this topic?

Thanks,

Jason Castro












Re: [casper] Synchronization issue (zero channel jumping)

2012-11-26 Thread Ryan Monroe

Hi Dave,
I haven't been following all of this discussion, but bear with me:

I saw a problem similar to what you're experiencing in the past. I 
designed a block (my "optimized" PFB) which was meant to only ever 
output one sync pulse (the sync_delays were set up to never respond 
to later pulses).  I later discovered that, for some obscure reason, this 
caused the downstream system to randomly shift channel 0.  Sound familiar?


You can't easily tell if the PFB output is correct, except by running an 
FFT on PFB'ed data and considering the output.  That would at least 
tell you that the PFB is doing its job, but won't easily diagnose the 
problem.  You could also put in a slow sine wave (one that's not in the 
middle of a FFT channel) and look at the PFB output.  I would expect 
that there would be no discontinuities in the time domain (or they 
should be very small).  If you see a big one, that's probably the end of 
your PFB frame, and would indicate a botched sync timing.


That help?

--Ryan Monroe

On 11/26/2012 12:20 PM, Ricardo Finger wrote:

Hello Dave,

The zero channel jumps to a different place every time I run the bof, 
but when running is stable.

I am not using and external sync pulse.

On simulations the pulse period looks ok, but I don't know if it is 
synchronized with the data.  Actually, I don't know how to check this. 
For example: if I use for simulation a band limited white noise; how 
can I know when the first channel is going out of the pfb or fft? I 
guess with a big DC offset I could "mark" channel 0, but anyway I 
wouldn't know how this should look at the pfb output...


Cheers,

r


On 26-11-2012 14:04, David MacMahon wrote:

Hi, Ricardo,

On Nov 23, 2012, at 8:55 AM, Ricardo Finger wrote:

The zero channel jumps around every time I run the bof (It can be 
anywhere).
Is the zero channel different but stable each separate time you run 
the bof or does it jump around while the design is running?



On simulations the pulse looks ok for both cases.
What are you using for the external sync pulse?  Are you sure that 
the design is syncing up to it at startup?  What happens if you run 
the design without the external sync pulse?


Just some ideas,
Dave








[casper] Any experience with the katcp-C interface?

2012-11-27 Thread Ryan Monroe

Hey all,

I'm trying to read/write to ROACH registers and shared_memory elements 
directly from C.  I can read registers correctly, but writing seems to 
have no effect.  Can anyone who has a bit more experience show me what 
I'm doing wrong?  Thanks!


(read command which works)
  result = send_rpc_katcl(l, 5000, KATCP_FLAG_FIRST | KATCP_FLAG_STRING 
, "?write", KATCP_FLAG_STRING, regName, KATCP_FLAG_STRING, "0", 
KATCP_FLAG_LAST|KATCP_FLAG_ULONG, 0,NULL);


(write command which fails quietly)
  result = send_rpc_katcl(l, 5000, KATCP_FLAG_FIRST | KATCP_FLAG_STRING 
, "?write", KATCP_FLAG_STRING, regName, KATCP_FLAG_STRING, "0", 
KATCP_FLAG_LAST|KATCP_FLAG_ULONG, 0,NULL);



--Ryan Monroe



Re: [casper] Any experience with the katcp-C interface?

2012-11-28 Thread Ryan Monroe
It does help, thanks!  I'm interested in how you broke down the call 
though.  I would have parsed it differently:  In telnet, the command 
would be typed like this


"?write"   

where  is a string containing the register name,
 is a string containing the offset, and
 is raw binary data containing the value.

So I was trying to do:
?write
 = "trig_sel" (string)
 = "0" (string)
 = 0x (ulong)


Actually, I get it now: the offset is placed at the /end/ of a write 
command through the C interface, not the beginning.  Do I have that right?


Also, Is there more documentation to be had somewhere?

Thank you very much...

--Ryan Monroe

On 11/28/2012 02:16 AM, Marc Welz wrote:

Hello


(write command which fails quietly)
   result = send_rpc_katcl(l, 5000, KATCP_FLAG_FIRST | KATCP_FLAG_STRING,"?write", 
KATCP_FLAG_STRING, regName, KATCP_FLAG_STRING, "0", KATCP_FLAG_LAST|KATCP_FLAG_ULONG, 
0,NULL);

So this usage isn't quite correct. A "?write" request expects 3 parameters,
a name (which you have and looks correct), and offset (which you have
at 0, also ok), and then *binary* data - what you happened to send is
a text integer which will be interpreted as the binary string ascii
"0". Not only isn't that all zeros, it also is only one byte wide,
while  roach registers are generally 32 bits wide.

So there are two ways of solving this: Use the "?wordwrite" command
which accepts a hex integer as third parameter which is written as
32bits, or write out binary data, using "?write" and a binary buffer,
with a length parameter.

Here is an example of how wordwrite could work. Note that for
wordwrites, the offsets are in multiples of words.

result = send_rpc_katcl(l, 5000, KATCP_FLAG_FIRST | KATCP_FLAG_STRING,
"?wordwrite", KATCP_FLAG_STRING, "sys_scratchpad", KATCP_FLAG_STRING,
"0", KATCP_FLAG_LAST | KATCP_FLAG_XLONG, 0, NULL);

Also in case of failure the result of send_rpc_katcl should be
nonzero, in particular *greater* than zero if the katcp request failed
on the remote side (which it has in your case), and *less* than zero
for local or internal errors.

In case you wish to diagnose problems like that in future, you can
telnet to a roach while you are running your C program. On that
separate connection you can increase the log verbosity to help you see
where your code is failing - to do that issue a "?log-level trace" on
the interactive connection. When I run your code I see

#client-connected 192.168.64.1:58345
#log warn 1354095081805 raw
start\_0\_and\_length\_1\_have\_to\_be\_word\_aligned
#log trace 1354095081805 raw writing\_1\_bytes\_to\_sys_scratchpad
#log error 1354095081806 raw write\_on\_sys_scratchpad\_returns\_zero
#log debug 1354095081806 raw wrote\_-1/1\_bytes\_to\_sys_scratchpad\_at\_0
#log error 1354095081806 raw write\_failed\_-1\_instead\_of\_1\_bytes

Hope that helps

marc




Re: [casper] Any experience with the katcp-C interface?

2012-11-29 Thread Ryan Monroe
Hey Patrick, thanks for the reply:

As it turns out, the offset is *ascii-encoded*, whereas the data is
(presumably) binary-encoded (possibly with some kind of escape characters
added when necessary).  In any case, it turns out that I will only need to
*read* from the C interface, which I can do... so I'll be fine.  I'll
table the C interface for now.

A couple of things I've observed figuring all this out (so that other
people won't have to learn again):

-the bytes returned from all read processes are endianness-reversed.  So,
you'll have to do this to the output buffer:

unsigned int dataRead[8192];
int len, i;
//read fills buffer, sets len
for(i=0;i<len;i++)
dataRead[i] = ntohl(dataRead[i]); /* hypothetical reconstruction: swap each word to host byte order */

On 11/28/2012, Marc Welz wrote:

> Hello
>
> > It does help, thanks!  I'm interested in how you broke down the call
> > though.  I would have parsed it differently:  In telnet, the command
> > would be typed like this
> >
> > "?write"   
> >
> > where  is a string containing the register name,
> >  is a string containing the offset, and
> >  is raw binary data containing the value.
>
> Yes.
>
> > So I was trying to do:
> > ?write
> >  = "trig_sel" (string)
> >  = "0" (string)
> >  = 0x (ulong)
>
> Almost. Raw binary data doesn't look like "0x", raw binary
> data looks like what you get when you do "cat /bin/ls". Here in
> this mail, nul bytes are unlike to to show up, so raw binary
> data, using C escapes would be "\0\0\0\0". And it turns out,
> that is also what you would have had to type using the telnet
> interface, eg "?write trig_sel 0 \0\0\0\0". Now, the C interface does
> the escaping for you, but you still need to provide the data as a
> buffer (pointer to a memory region) and length. This typically
> involves interesting casts or a memcpy which people who are not
> that familiar with C find confusing, so for manipulation of
> single 32bit registers is probably easier using wordwrite. If you wish
> to use buffers, then the syntax (from memory... )
> is "KATCP_FLAG_BUFFER, buffer, len", alternatively
> there should be an append_buffer function, which allows you to
> construct a message in steps (used by the higher level rpc layer).
>
> > Actually, I get it now: the offset is placed at the /end/ of a write
> > command through the C interface, not the beginning.  Do I have that
> right?
>
> No, the C interface doesn't know about parameter ordering for
> individual commands. The C library speaks katcp, not the
> tcpborphserver command set layered on top of it.
>
> > Also, Is there more documentation to be had somewhere?
>
> I believe in examples/*.c there is a case in the server example
> which shows how a message is constructed which contains a
> binary message. Alternatively, do a "?read regname 0 4" to
> retrieve 4 binary bytes from a system.
>
> > Thank you very much...
>
> No problem, regards
>
> marc
>


[casper] Anyone used a ROACH with a 2GB DDR module?

2012-12-13 Thread Ryan Monroe

Hey all,
I'd like to use a ROACH with 1GB of accessible RAM.  I read that in the 
typical case, only 512 MB is available, unless other DIMMs are sought 
out.  Has anyone had success with this?


Thank you!

(for reference, the ROACH DRAM page) 

(for reference, the ROACH supported DIMMs page) 



--Ryan


Re: [casper] spectrometer 1Ghz

2012-12-20 Thread Ryan Monroe
I'll just chime in that I've seen this error before, which I solved by 
switching to a different version of the library.  I can't help with this 
one, just wanted to say it's probably not operator error.


--Ryan


On 12/20/2012 01:24 PM, katherine viviana cortes urbina wrote:

Hi Mark,


Today I change the fft in my design , I also open up the parameter 
boxes of both and make sure that the parameters are the same, I delete 
the old and replace it with the new, but I have the same error,


 Detected Linux OS
#
##  System Update  ##
#
Error using ==> gen_xps_files at 199
Error due to multiple causes.


Cheers

katty

2012/12/19 Mark Wagner >


Hi Kathy,

I think the design you opened up was built with an older version
of the libraries.  All you need to do is open up the simulink
browser and pull that fft into your design, then open up the
paramter boxes of both and make sure that the parameters are the
same.  After that, delete the old one and replace it with the new.

Mark


On Wed, Dec 19, 2012 at 5:14 PM, katherine viviana cortes urbina
<kattycort...@gmail.com> wrote:

Hi mark,

Where can I get the newer versions of fft and pfb_fir?

Cheers

Katty

On 19/12/2012 19:10, "katherine viviana cortes urbina"
<kattycort...@gmail.com> wrote:

Hi mark,

Where can I try the newer versions of fft and pfb_fir?

Cheers

On 19/12/2012 18:52, "Mark Wagner"
<mwag...@ssl.berkeley.edu> wrote:

Hi Katty,

I think it's possible you're using older versions of
the fft and the pfb_fir.  Could you try replacing
those with the ones from the library you have open
(with the same parameter settings)?  And then try ctrl-d?

Mark


On Wed, Dec 19, 2012 at 4:32 PM, katherine viviana
cortes urbina <kattycort...@gmail.com> wrote:

Dear Casperites,

I am designing a 1 GHz spectrometer; I have just set
parameters in the existing blocks. I changed the
parameter 'Number of simultaneous inputs' to
3. But when I compile the design, I get the error:


#
##  System Update  ##
#
Error using ==> gen_xps_files at 199
Error due to multiple causes.


if I do ctrl + d , I see the attachments.


Cheers


Katty








[casper] Are the ROACH1 10GbE ports duplex?

2013-01-20 Thread Ryan Monroe
Hey all,

I couldn't tell from the documentation, but are the ROACH1's 10GbE ports
full duplex at that rate, or is there some caveat, like "10Gb/s, one way or
5Gb/s bidirectional", or "each port is either send OR receive"?  I can't
find documentation to say with confidence, but I'm sure someone here has
tried this...

Thanks guys!

--Ryan


Re: [casper] ngc file generated by 14.3 is not recognized by planahead 14.3

2013-01-20 Thread Ryan Monroe
Total hack, but you could just use planahead to design your constraints
file and then run it through the Casper tool flow.  That's what I've always
done and it's worked out pretty well so far
On Jan 20, 2013 6:02 PM, "homin"  wrote:

> Hello:
>
> I am trying to push the fabric clock faster and faster, so that i am
> trying the newest version 14.3 and planahead.
> I got a problem while using planahead 14.3. If the system.ngc compiled by
> 14.3 toolflow(matlab 2012a, Xilinx 14.3), planahead can't parse the ngc
> file. The problem is the V14.3 put the prefix "system_" in the wrapper
> files, the old versions didn't. I have tried the system.ngc files by 11.4,
> the planahead 14.3 can run it without problem.
>
> There should be somewhere the prefix "system_" can be removed, but i
> didn't have good luck.
> Anyone have met this problem ?
>
> regards
> homin jiang
>
> -----------------------------------------------
>
>> [NgdBuild 604] logical block 'epb_opb_bridge_inst' with type
>>> 'system_epb_opb_bridge_inst_wrapper' could not be resolved. A pin
>>> name misspelling can cause this, a missing edif or ngc file, case mismatch
>>> between the block name and the edif or ngc file name, or the misspelling of
>>> a type name. Symbol 'system_epb_opb_bridge_inst_wrapper' is not
>>> supported in target 'virtex6'.
>>>
>> -----------------------------------------------
>
>


Re: [casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Ryan Monroe
It would work well for the PFB, but what we *really* need is a solid
"Direct Digital Synth (DDS) coefficient generator".  FFT coefficients are
really just sampled points around the unit circle, so you could, in
principle, use a recursive complex multiplier to generate the coefficients
on the fly.  You'll lose log2(sqrt(K)) bits for a recursion count of K, but
that's probably OK most of the time.  Say you're doing a 2^14 point FFT,
you need 2^13 coeffs.  You start with 18 bits of resolution and can do 1024
iterations before you degrade down to the est. 2^13 resolution.  So you'll
only need to store 8 "reset points".  Four of those will be 1, j, -1 and -j
in this case.  You could thus replace 8 BRAM36'es with three DSPs.
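To make the recursion-plus-reset-points idea concrete, here is a minimal NumPy sketch. The parameters (18-bit quantization, 1024-step recursion, 8 reset points for a 2^14-point FFT) follow the numbers above, but the code itself is an illustration, not the actual CASPER block:

```python
import numpy as np

# Recursive twiddle generation with periodic "reset points": 2^14-point
# FFT (2^13 distinct twiddles), 18-bit quantized arithmetic, and a reload
# from a small exact table every 1024 steps so error cannot accumulate
# indefinitely.  Illustrative sketch only.
M = 2**14                 # FFT length
N = M // 2                # number of distinct twiddles needed
R = 1024                  # recursion length between resets

def q18(z):
    """Round real and imaginary parts to 18-bit fixed point."""
    s = 2**17
    return (np.round(z.real * s) + 1j * np.round(z.imag * s)) / s

ideal = np.exp(-2j * np.pi * np.arange(N) / M)
stepq = q18(np.exp(-2j * np.pi / M))   # quantized one-step rotation
resets = ideal[::R]                    # the 8 stored reset points

coeffs = np.empty(N, dtype=complex)
for k in range(N):
    if k % R == 0:
        z = resets[k // R]             # reload an exact value
    else:
        z = q18(z * stepq)             # one quantized recursion step
    coeffs[k] = z

# Error stays bounded because the recursion never runs longer than R steps.
max_err = np.abs(coeffs - ideal).max()
assert max_err < 0.02
```

The assertion bound here is deliberately loose; the log2(sqrt(K)) figure quoted above assumes the per-step rounding errors accumulate as a random walk rather than worst-case.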

If you had a much larger FFT, say 2^16... you would have to use a wider
recursive multiplier.  You can achieve a wide cmult in no more than 10
DSPs...I think.  In that case, you would start with 25 bits and be able to
droop to 16 bits -- so up to 2^(2*9) = 2^18 iterations of recursion.  You would only
need to have one "reset point" and your noise performance would be more
than sufficient.  1, j, -1 and -j are easy to store though, so I would
probably go with that

In addition, for the FFT direct, the first stage has only one shared
coefficient pattern, second stage has 2, third 4, etc.  You can, of course,
share coefficients amongst a stage where possible.  The real winnings occur
when you realize that the other coefficient banks within later stages are
actually the same coeffs as the first stage, with a constant phase rotation
(again, I'm 90% sure but I'll check tomorrow morning).  So, you could
generate your coefficients once, and then use a couple of complex
multipliers to make the coeffs for the other stages.  BAM!  FFT Direct's
coefficient memory utilization is *gone*
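The constant-phase-rotation claim rests on the basic twiddle identity W_N^(k+c) = W_N^k * W_N^c: a bank offset by a constant c equals the base bank times a single constant rotation, so one generated bank plus one complex multiplier per derived bank can replace separate ROMs. A hedged NumPy check of the identity (illustrative indices, not the actual FFT-direct bank layout):

```python
import numpy as np

# One stored/generated bank W_N^k, k = 0..L-1, plus a single constant
# W_N^c, reproduces the bank W_N^(k+c) exactly.
N = 1024
k = np.arange(N // 4)
W = lambda e: np.exp(-2j * np.pi * e / N)

base_bank = W(k)             # "first stage" coefficients
c = 3                        # constant offset for some later bank (example)
derived = base_bank * W(c)   # constant phase rotation
direct = W(k + c)            # what a dedicated ROM would hold

assert np.allclose(derived, direct)
```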

You could also do this for the FFT Biplex, but it would be a bit more
complicated.  Whoever designed the biplex FFT used in-order inputs.  This
is OK, but it means that the coefficients are in bit-reverse order.  So,
you would have to move the biplex unscrambler to the beginning, change the
mux logic, and replace the delay elements in the delay-commutator with some
flavor of "delay, bit-reversed".  I don't know how that would look quite
yet.  If you did that, your coefficients would become in-order, and you
could achieve the same savings I described with the FFT-Direct.  Also, I
implement coefficient and control logic sharing in my biplex and direct FFT
and it works *really well* at managing the fabric and memory utilization.
 Worth a shot.

:-)

--Ryan Monroe

PS, Sorry, I'm a bit busy right now so I can't implement a coefficient
interpolator for you guys right now.  I'll write back when I'm more free

PS2.  I'm a bit anal about noise performance so I usually use a couple more
bits than Dan prescribes, but as he demonstrated in the ASIC talks, his
comments about bit widths are 100% correct.   I would recommend them as a
general design practice as well.




On Mon, Jan 21, 2013 at 3:48 PM, Dan Werthimer wrote:

>
> agreed.   anybody already have, or want to develop, a coefficient
> interpolator?
>
> dan
>
> On Mon, Jan 21, 2013 at 3:44 PM, Aaron Parsons <
> apars...@astron.berkeley.edu> wrote:
>
>> Agreed.
>>
>> The coefficient interpolator, however, could get substantial savings
>> beyond that, even, and could be applicable to many things besides PFBs.
>>
>> On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote:
>>
>>>
>>> hi aaron,
>>>
>>> if you use xilinx brams for coefficients, they can be configured as dual
>>> port memories,
>>> so you can get the PFB reverse and forward coefficients both at the same
>>> time,
>>> from the same memory,  almost for free, without any memory size penalty
>>> over single port,
>>>
>>> dan
>>>
>>>
>>>
>>>
>>> On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons <
>>> apars...@astron.berkeley.edu> wrote:
>>>
>>>> You guys probably appreciate this already, but although the
>>>> coefficients in the PFB FIR are generally symmetric around the center tap,
>>>> the upper and lower taps use these coefficients in reverse order from one
>>>> another.  In order to take advantage of the symmetry, you'll have to use
>>>> dual-port ROMs that support two different addresses (one counting up and
>>>> one counting down).  In the original core I wrote, I instead just shared
>>>> coefficients between the real and imaginary components.  This was an easy
>>>> factor of 2 savings.  After that first factor of two, we found it was kind
>>>> of diminishing returns...
>

Re: [casper] number of coefficients needed in PFB and FFT

2013-01-21 Thread Ryan Monroe
PS3.  You could also have done the 2^16 FFT's coefficients as a narrow
cmult... you'd need to use a BRAM to store the 2^13 "reset points", but it
would still indicate a reduction in memory use by a factor of 4 -- not
trivial by any means.


On Mon, Jan 21, 2013 at 9:39 PM, Ryan Monroe wrote:

> It would work well for the PFB, but what we *really* need is a solid
> "Direct Digital Synth (DDS) coefficient generator".  FFT coefficients are
> really just sampled points around the unit circle, so you could, in
> principle, use a recursive complex multiplier to generate the coefficients
> on the fly.  You'll lose log2(sqrt(K)) bits for a recursion count of K, but
> that's probably OK most of the time.  Say you're doing a 2^14 point FFT,
> you need 2^13 coeffs.  You start with 18 bits of resolution and can do 1024
> iterations before you degrade down to the est. 2^13 resolution.  So you'll
> only need to store 8 "reset points".  Four of those will be 1, j, -1 and -j
> in this case.  You could thus replace 8 BRAM36'es with three DSPs.
>
> If you had a much larger FFT, say 2^16... you would have to use a wider
> recursive multiplier.  You can achieve a wide cmult in no more than 10
> DSPs...I think.  In that case, you would start with 25 bits and be able to
> droop to 16 bits -- so up to 2^(2*9) = 2^18 iterations of recursion.  You would only
> need to have one "reset point" and your noise performance would be more
> than sufficient.  1, j, -1 and -j are easy to store though, so I would
> probably go with that
>
> In addition, for the FFT direct, the first stage has only one shared
> coefficient pattern, second stage has 2, third 4, etc.  You can, of course,
> share coefficients amongst a stage where possible.  The real winnings occur
> when you realize that the other coefficient banks within later stages are
> actually the same coeffs as the first stage, with a constant phase rotation
> (again, I'm 90% sure but I'll check tomorrow morning).  So, you could
> generate your coefficients once, and then use a couple of complex
> multipliers to make the coeffs for the other stages.  BAM!  FFT Direct's
> coefficient memory utilization is *gone*
>
> You could also do this for the FFT Biplex, but it would be a bit more
> complicated.  Whoever designed the biplex FFT used in-order inputs.  This
> is OK, but it means that the coefficients are in bit-reverse order.  So,
> you would have to move the biplex unscrambler to the beginning, change the
> mux logic, and replace the delay elements in the delay-commutator with some
> flavor of "delay, bit-reversed".  I don't know how that would look quite
> yet.  If you did that, your coefficients would become in-order, and you
> could achieve the same savings I described with the FFT-Direct.  Also, I
> implement coefficient and control logic sharing in my biplex and direct FFT
> and it works *really well* at managing the fabric and memory utilization.
>  Worth a shot.
>
> :-)
>
> --Ryan Monroe
>
> PS, Sorry, I'm a bit busy right now so I can't implement a coefficient
> interpolator for you guys right now.  I'll write back when I'm more free
>
> PS2.  I'm a bit anal about noise performance so I usually use a couple
> more bits than Dan prescribes, but as he demonstrated in the ASIC talks,
> his comments about bit widths are 100% correct.   I would recommend them as
> a general design practice as well.
>
>
>
>
> On Mon, Jan 21, 2013 at 3:48 PM, Dan Werthimer wrote:
>
>>
>> agreed.   anybody already have, or want to develop, a coefficient
>> interpolator?
>>
>> dan
>>
>> On Mon, Jan 21, 2013 at 3:44 PM, Aaron Parsons <
>> apars...@astron.berkeley.edu> wrote:
>>
>>> Agreed.
>>>
>>> The coefficient interpolator, however, could get substantial savings
>>> beyond that, even, and could be applicable to many things besides PFBs.
>>>
>>> On Mon, Jan 21, 2013 at 3:36 PM, Dan Werthimer wrote:
>>>
>>>>
>>>> hi aaron,
>>>>
>>>> if you use xilinx brams for coefficients, they can be configured as
>>>> dual port memories,
>>>> so you can get the PFB reverse and forward coefficients both at the
>>>> same time,
>>>> from the same memory,  almost for free, without any memory size penalty
>>>> over single port,
>>>>
>>>> dan
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jan 21, 2013 at 3:18 PM, Aaron Parsons <
>>>> apars...@astron.berkeley.edu> wrote:
>>>>
>>>>> You guys proba

[casper] problems with adc2x400-14 (re-send with attachments)

2013-01-22 Thread Ryan Monroe
(Looks like casper list does not support attachments.  Here are some 
links...let's try this again)

http://dl.dropbox.com/u/2832602/adc2x400_issues_histo.jpg
http://dl.dropbox.com/u/2832602/adc2x400_issues.jpg

===
Hey guys,

I'm trying to use the adc2x400-14, a 14-bit ADC part being clocked at 
256 MSPS.


When I run the adc with zero input, everything looks good.  However, if 
I put in a sine wave (or anything, really) -- I get this (see figure).  
Looks like it has a bunch of outliers.  They're not all the same code 
(see histogram), but they are coherent with a specific phase of the sine 
wave (plus or minus n*pi).


This happens in both ADC channels, although this figure is just of one.

Any ideas?  Thanks!
--

Ryan Monroe
904.923.8776




Re: [casper] problems with adc2x400-14 (re-send with attachments)

2013-01-23 Thread Ryan Monroe

Good call.  I'll give it a shot and report back :-)

On 01/23/2013 01:22 AM, Henno Kriel wrote:

Hi Ryan

I'm not sure which ADC part is being used, but from the MKID_ADC Test 
report it seems the be the TI ADS5474.


From this data sheet the data clock to data delay (t data) is 
typically 1.4ns. Your data clock period is 250 / 2 MHz (DDR) = 8 ns. 
The IDDR blocks are clocked by the 90deg clock from the vhdl file 
adc2x_14_400_interface.vhd.


So the problem is that you are probably violating the setup time on 
the IDDR:


Data delay = 1.4 ns
Clock delay = 8 ns * (90/360) = 2 ns.

This gives a setup time of 0.6 ns (bad). Change the IDDRs to be clocked 
with clk (instead of clk90) in adc2x_14_400_interface.vhd, which 
gives you a setup time of 2.6 ns


 and add the constraint to your ucf file:

OFFSET=IN 2.6 ns VALID 4.0 ns BEFORE "*DRDY_I_p" RISING;
OFFSET=IN 2.6 ns VALID 4.0 ns BEFORE "*DRDY_I_p" FALLING;
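Henno's setup-margin arithmetic, spelled out as a tiny sketch (numbers from his mail; the helper function and the assumption that the clk capture edge falls half a DDR period after the data transition are mine, not from the design):

```python
# Setup margin = capture-clock delay minus ADC clock-to-data delay.
T = 8.0          # data clock period, ns (250/2 MHz DDR)
t_data = 1.4     # ADS5474 clock-to-data delay, ns

def setup_margin(clock_phase_deg):
    """Time between data becoming valid and the capture clock edge."""
    clock_delay = T * clock_phase_deg / 360.0
    return clock_delay - t_data

margin_clk90 = setup_margin(90)   # clk90: 2.0 - 1.4 = 0.6 ns (bad)
margin_clk = setup_margin(180)    # clk, half a period later: 2.6 ns

assert abs(margin_clk90 - 0.6) < 1e-9
assert abs(margin_clk - 2.6) < 1e-9
```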

Hope this helps.

Henno

On Wed, Jan 23, 2013 at 12:11 AM, Ryan Monroe <ryan.m.mon...@gmail.com> wrote:


(Looks like casper list does not support attachments.  Here are
some links...let's try this again)
http://dl.dropbox.com/u/2832602/adc2x400_issues_histo.jpg
http://dl.dropbox.com/u/2832602/adc2x400_issues.jpg

===
Hey guys,

I'm trying to use the adc2x400-14, a 14-bit ADC part being clocked
at 256 MSPS.

When I run the adc with zero input, everything looks good.
 However, if I put in a sine wave (or anything, really) -- I get
this (see figure).  Looks like it has a bunch of outliers.
 They're not all the same code (see histogram), but they are
coherent with a specific phase of the sine wave (plus or minus n*pi).

This happens in both ADC channels, although this figure is just of
one.

Any ideas?  Thanks!
-- 


Ryan Monroe
904.923.8776 





--
Henno Kriel

DSP Engineer
Digital Back End
meerKAT

SKA South Africa
Third Floor
The Park
Park Road (off Alexandra Road)
Pinelands
7405
Western Cape
South Africa

Latitude: -33.94329 (South); Longitude: 18.48945 (East).

(p) +27 (0)21 506 7300
(p) +27 (0)21 506 7365 (direct)
(f) +27 (0)21 506 7375
(m) +27 (0)84 504 5050




Re: [casper] number of coefficients needed in PFB and FFT

2013-01-24 Thread Ryan Monroe
Hey Andrew, thanks for the designs! I'll have to spend some time looking 
them over later, there's some good stuff there.


Nice idea (I think the Goertzel algorithm is often used with this
technique?). I have considered this for the DDC, it allows almost
arbitrary frequency and phase resolution. The only cost is a fair amount
of multipliers. For most applications at the moment we are BRAM limited
so this is not a problem (the very wide bandwidth instruments might be
multiplier limited at some point). It would be good as an option to
trade off multipliers for BRAM.

I haven't seen the Goertzel algorithm before, but it looks like a great 
idea for this: we might be able to produce a coefficient DDS in just two 
DSPs!
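One way such a two-multiplier coefficient DDS could work (my reading, not a claim about any specific CASPER block): the Goertzel-style second-order recursion y[n] = 2cos(w)·y[n-1] - y[n-2] generates an exact sinusoid with one multiply per sample, so two such recursions, seeded a quarter cycle apart, produce a complex twiddle stream:

```python
import numpy as np

# Two second-order recursions (cos and sin tracks), one multiply each,
# together generate e^{-jwn}.  Illustrative sketch only.
w = 2 * np.pi / 64          # phase step per output sample (example value)
a = 2 * np.cos(w)           # the single recursion coefficient

def oscillator(y0, y1, n):
    out = [y0, y1]
    for _ in range(n - 2):
        out.append(a * out[-1] - out[-2])
    return np.array(out)

n = 256
c = oscillator(np.cos(0), np.cos(w), n)   # cosine track
s = oscillator(np.sin(0), np.sin(w), n)   # sine track
twiddles = c - 1j * s                     # e^{-jwn}

assert np.allclose(twiddles, np.exp(-1j * w * np.arange(n)))
```

In fixed point the recursion accumulates rounding error, so a real implementation would still want periodic reseeding, just like the reset-point scheme discussed earlier in the thread.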


For my applications, I'm *totally* DSP limited, but I agree that we 
should try to cater to the greater CASPER community of course.


Coefficient reuse (as you describe between phases) would be nice (at the
cost of some register stages I guess).

The CASPER libraries *hemorrhage* pipeline stages.  A few more won't 
hurt, and you'll be saving the RAM addressing logic.  Not so bad.


I think the reuse of control logic, coefficients etc would potentially
be the biggest saver assuming wide bandwidth systems. Ideally the
compiler would do this for us implicitly, but in the meantime explicit
reuse with optional register stages to reduce fanout would be awesome.

You can change a setting on pipeline registers (and maybe other places 
too) which allows it to do this.  it's called "Implement using 
behavioral HDL" in simulink, or "allow_register_retiming" in the xBlock 
interface.  I had a bad experience with it though: It'll try to optimize 
EVERYTHING.  Got two identical registers which you intend to place on 
opposite sides of the chip? They're now the same register.  In my 
experience, the only good way to control the sharing (or lack thereof) 
was to do it manually. YMMV.


I've got another idea we can consider too.  This one is farther away.  
I'm building radix-4 versions of my FFTs (1/2 as much fabric, 85% as 
much DSP and 100% as much coeff).  Now, for radix 4, you get three 
coefficient banks per butterfly stage, and while the sum total (# 
coefficients stored) is the same, the coefficients are actually in trios 
of (x^1; x^2; x^3 and an implicit x^0).  You could, in principle, store 
just the x^1 and square/cube it into x^2 and x^3.  I haven't tried this 
(just thought of it), so no idea regarding performance.  In addition, 
while Dan and I are working with JPL legal to get my library 
open-sourced, it's looking pretty clear that I won't be able to share 
the really new stuff, so you'd have to do radix-4 on your own :-(
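The radix-4 trio idea reduces to the identities W^(2k) = (W^k)^2 and W^(3k) = (W^k)^3, so only the x^1 bank need be stored. A quick NumPy check (illustrative only; real fixed-point hardware would add quantization noise at each multiply):

```python
import numpy as np

# Each radix-4 stage needs the trio (W^k, W^2k, W^3k).  Derive the second
# and third banks from the first with one squaring and one further
# complex multiply.
N = 4096
k = np.arange(N // 4)
w1 = np.exp(-2j * np.pi * k / N)   # stored bank: W^k

w2 = w1 * w1        # derived: W^(2k)
w3 = w2 * w1        # derived: W^(3k)

assert np.allclose(w2, np.exp(-2j * np.pi * 2 * k / N))
assert np.allclose(w3, np.exp(-2j * np.pi * 3 * k / N))
```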


--Ryan

On 01/22/2013 04:41 AM, Andrew Martens wrote:

Hi all



 It would work well for the PFB, but what we *really* need is a
 solid "Direct Digital Synth (DDS) coefficient generator".
 ...

Nice idea (I think the Goertzel algorithm is often used with this
technique?). I have considered this for the DDC, it allows almost
arbitrary frequency and phase resolution. The only cost is a fair amount
of multipliers. For most applications at the moment we are BRAM limited
so this is not a problem (the very wide bandwidth instruments might be
multiplier limited at some point). It would be good as an option to
trade off multipliers for BRAM.

Coefficient reuse (as you describe between phases) would be nice (at the
cost of some register stages I guess).

I think the reuse of control logic, coefficients etc would potentially
be the biggest saver assuming wide bandwidth systems. Ideally the
compiler would do this for us implicitly, but in the meantime explicit
reuse with optional register stages to reduce fanout would be awesome.


 PS, Sorry, I'm a bit busy right now so I can't implement a
 coefficient interpolator for you guys right now.  I'll write
 back when I'm more free

Got a bit carried away and implemented one. Attached is a model that
allows the comparison between ideal, interpolator, and Dan's reduced
storage idea. The interpolator uses a multiplier, cruder versions might
not at the cost of noise and/or more logic.


 PS2.  I'm a bit anal about noise performance so I usually use
 a couple more bits then Dan prescribes, but as he demonstrated
 in the asic talks, his comments about bit widths are 100%
 correct.   I would recommend them as a general design practice
 as well.

I have also seen papers that show that FFT performance is more dependent
on data path bit width than coefficient bit width. We need a proper
study on how many bits are required for different performance levels.


 but for long
 transforms, perhaps
 >4K points or so,
 then BRAM's might be
   

Re: [casper] number of coefficients needed in PFB and FFT

2013-01-24 Thread Ryan Monroe
For long FFTs, you could also use two BRAM18s (as lookup tables) and two 
complex multiplies (3 dsps each for V6, 4 dsps each for v5) to get a 
coefficient with 17 bits of accuracy and enough resolution for a 
2^19-point FFT.
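A plausible reading of the two-lookup-table trick (my assumed decomposition, not necessarily Ryan's actual design): split the 19-bit phase index as k = hi*2^10 + lo, so that exp(-2j*pi*k/2^19) = exp(-2j*pi*hi/2^9) * exp(-2j*pi*lo/2^19). One 512-entry coarse table and one 1024-entry fine table, combined by a single complex multiply, then cover all 2^19 phases:

```python
import numpy as np

# Coarse/fine angle split: two small tables plus one complex multiply
# replace one 2^19-entry twiddle ROM.  Illustrative sketch only.
BITS = 19
HI, LO = 9, 10                      # HI + LO = BITS
coarse = np.exp(-2j * np.pi * np.arange(2**HI) / 2**HI)
fine = np.exp(-2j * np.pi * np.arange(2**LO) / 2**BITS)

def twiddle(k):
    hi, lo = divmod(k, 2**LO)
    return coarse[hi] * fine[lo]    # one complex multiply per coefficient

ks = np.random.default_rng(0).integers(0, 2**BITS, size=1000)
vals = np.array([twiddle(int(k)) for k in ks])
assert np.allclose(vals, np.exp(-2j * np.pi * ks / 2**BITS))
```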



On 01/24/2013 05:49 PM, Dan Werthimer wrote:


hi ryan, andrew,

we used to use CORDIC for generating coefficients.
not sure how cordic compares to goertzel.
there are a few open source VHDL cordics.

i think dave macmahon or someone developed
a radix4 version of the casper streaming FFT.

dan


On Thu, Jan 24, 2013 at 5:44 PM, Ryan Monroe <ryan.m.mon...@gmail.com> wrote:


Hey Andrew, thanks for the designs! I'll have to spend some time
looking them over later, there's some good stuff there.

Nice idea (I think the Goertzel algorithm is often used with this
technique?). I have considered this for the DDC, it allows almost
arbitrary frequency and phase resolution. The only cost is a fair
amount
of multipliers. For most applications at the moment we are BRAM
limited
so this is not a problem (the very wide bandwidth instruments might be
multiplier limited at some point). It would be good as an option to
trade off multipliers for BRAM.

I haven't seen the Goertzel algorithm before, but it looks like a
great idea for this: we might be able to produce a coefficient DDS
in just two DSPs!

For my applications, I'm *totally* DSP limited, but I agree that
we should try to cater to the greater CASPER community of course.

Coefficient reuse (as you describe between phases) would be nice
(at the
cost of some register stages I guess).

The CASPER libraries *hemorrhage* pipeline stages.  A few more
won't hurt, and you'll be saving the RAM addressing logic.  Not so
bad.

I think the reuse of control logic, coefficients etc would potentially
be the biggest saver assuming wide bandwidth systems. Ideally the
compiler would do this for us implicitly, but in the meantime explicit
reuse with optional register stages to reduce fanout would be awesome.

You can change a setting on pipeline registers (and maybe other
places too) which allows it to do this.  it's called "Implement
using behavioral HDL" in simulink, or "allow_register_retiming" in
the xBlock interface.  I had a bad experience with it though:
It'll try to optimize EVERYTHING.  Got two identical registers
which you intend to place on opposite sides of the chip? They're
now the same register.  In my experience, the only good way to
control the sharing (or lack thereof) was to do it manually. YMMV.

I've got another idea we can consider too.  This one is farther
away.  I'm building radix-4 versions of my FFTs (1/2 as much
fabric, 85% as much DSP and 100% as much coeff).  Now, for radix
4, you get three coefficient banks per butterfly stage, and while
the sum total (# coefficients stored) is the same, the
coefficients are actually in trios of (x^1; x^2; x^3 and an
implicit x^0).  You could, in principle, store just the x^1 and
square/cube it into x^2 and x^3.  I haven't tried this (just
thought of it), so no idea regarding performance.  In addition,
while Dan and I are working with JPL legal to get my library
open-sourced, it's looking pretty clear that I won't be able to
share the really new stuff, so you'd have to do radix-4 on your
own :-(

--Ryan

On 01/22/2013 04:41 AM, Andrew Martens wrote:

Hi all


 It would work well for the PFB, but what we
*really* need is a
 solid "Direct Digital Synth (DDS) coefficient
generator".
 ...

Nice idea (I think the Goertzel algorithm is often used with this
technique?). I have considered this for the DDC, it allows almost
arbitrary frequency and phase resolution. The only cost is a
fair amount
of multipliers. For most applications at the moment we are
BRAM limited
so this is not a problem (the very wide bandwidth instruments
might be
multiplier limited at some point). It would be good as an
option to
trade off multipliers for BRAM.

Coefficient reuse (as you describe between phases) would be
nice (at the
cost of some register stages I guess).

I think the reuse of control logic, coefficients etc would
potentially
be the biggest saver assuming wide bandwidth systems. Ideally the
compiler would do this for us implicitly, but in the meantime
explicit
reuse with optional register stages to reduce fanout would be
awesome.

 PS, Sorry, I'm a bit busy right now so I can't
implement a
  

Re: [casper] number of coefficients needed in PFB and FFT

2013-01-24 Thread Ryan Monroe

Ahem, you could do it with /one/ complex multiply.

On 01/24/2013 05:49 PM, Dan Werthimer wrote:


hi ryan, andrew,

we used to use CORDIC for generating coefficients.
not sure how cordic compares to goertzel.
there are a few open source VHDL cordics.

i think dave macmahon or someone developed
a radix4 version of the casper streaming FFT.

dan


On Thu, Jan 24, 2013 at 5:44 PM, Ryan Monroe <ryan.m.mon...@gmail.com> wrote:


Hey Andrew, thanks for the designs! I'll have to spend some time
looking them over later, there's some good stuff there.

Nice idea (I think the Goertzel algorithm is often used with this
technique?). I have considered this for the DDC, it allows almost
arbitrary frequency and phase resolution. The only cost is a fair
amount
of multipliers. For most applications at the moment we are BRAM
limited
so this is not a problem (the very wide bandwidth instruments might be
multiplier limited at some point). It would be good as an option to
trade off multipliers for BRAM.

I haven't seen the Goertzel algorithm before, but it looks like a
great idea for this: we might be able to produce a coefficient DDS
in just two DSPs!

For my applications, I'm *totally* DSP limited, but I agree that
we should try to cater to the greater CASPER community of course.

Coefficient reuse (as you describe between phases) would be nice
(at the
cost of some register stages I guess).

The CASPER libraries *hemorrhage* pipeline stages.  A few more
won't hurt, and you'll be saving the RAM addressing logic.  Not so
bad.

I think the reuse of control logic, coefficients etc would potentially
be the biggest saver assuming wide bandwidth systems. Ideally the
compiler would do this for us implicitly, but in the meantime explicit
reuse with optional register stages to reduce fanout would be awesome.

You can change a setting on pipeline registers (and maybe other
places too) which allows it to do this.  it's called "Implement
using behavioral HDL" in simulink, or "allow_register_retiming" in
the xBlock interface.  I had a bad experience with it though:
It'll try to optimize EVERYTHING.  Got two identical registers
which you intend to place on opposite sides of the chip? They're
now the same register.  In my experience, the only good way to
control the sharing (or lack thereof) was to do it manually. YMMV.

I've got another idea we can consider too.  This one is farther
away.  I'm building radix-4 versions of my FFTs (1/2 as much
fabric, 85% as much DSP and 100% as much coeff).  Now, for radix
4, you get three coefficient banks per butterfly stage, and while
the sum total (# coefficients stored) is the same, the
coefficients are actually in trios of (x^1; x^2; x^3 and an
implicit x^0).  You could, in principle, store just the x^1 and
square/cube it into x^2 and x^3.  I haven't tried this (just
thought of it), so no idea regarding performance.  In addition,
while Dan and I are working with JPL legal to get my library
open-sourced, it's looking pretty clear that I won't be able to
share the really new stuff, so you'd have to do radix-4 on your
own :-(

--Ryan

On 01/22/2013 04:41 AM, Andrew Martens wrote:

Hi all


 It would work well for the PFB, but what we
*really* need is a
 solid "Direct Digital Synth (DDS) coefficient
generator".
 ...

Nice idea (I think the Goertzel algorithm is often used with this
technique?). I have considered this for the DDC, it allows almost
arbitrary frequency and phase resolution. The only cost is a
fair amount
of multipliers. For most applications at the moment we are
BRAM limited
so this is not a problem (the very wide bandwidth instruments
might be
multiplier limited at some point). It would be good as an
option to
trade off multipliers for BRAM.

Coefficient reuse (as you describe between phases) would be
nice (at the
cost of some register stages I guess).

I think the reuse of control logic, coefficients etc would
potentially
be the biggest saver assuming wide bandwidth systems. Ideally the
compiler would do this for us implicitly, but in the meantime
explicit
reuse with optional register stages to reduce fanout would be
awesome.

 PS, Sorry, I'm a bit busy right now so I can't
implement a
 coefficient interpolator for you guys right now.
 I'll write
 back when I'm more free

Got a bit carried away and 

Re: [casper] Problem setting parameters in fft blocks using mlib_devel

2013-01-25 Thread Ryan Monroe
One common strategy I use in debugging mask scripts is this:

1. Turn on breakpoints and step through the top-level script one line at a
time until you find the one that fails.
2. Run the script in debug mode again, only this time run straight through
until that line (but don't run it).
3. Turn on "stop on errors"
4. Copy-paste the line into the command window and run it (alternatively,
highlight and press F9)

By doing this, you execute that command in the normal MATLAB debug
environment.  When the line errors out, you'll be pulled straight to the
failure point.  Note that because you executed the command outside of the
mask script interface, that instance of the block's behavior may become
undefined (it has to do with how the mask scripts run under the hood, I can
elaborate).  It's best to delete the block and pull out a new one (or copy
it ahead of time) once you do this.


On Fri, Jan 25, 2013 at 11:16 AM, Andrew Martens  wrote:

> Hi Ken
>
> I am at a bit of a loss. From the model you sent me, and the log output,
> it seems that an error occurs as the script for the fft_biplex is
> running. This causes fft_biplex drawing to be incomplete, and the init
> script for the fft then generates the error you see as fft_biplex0 does
> not have the ports it expects. The original error is silent though (as
> often happens with init scripts unfortunately) so we don't have any
> clues as to what went wrong.
>
> The fact that fft_wideband_real is generated successfully may be because
> it does not contain fft_biplex. However, fft_biplex_real_4x does not
> contain fft_biplex but also has problems drawing according to your mail.
>
> I am including the CASPER list in my reply in the hope that someone else
> can help you. Unfortunately we are very busy preparing for a workshop
> next week and my initial poking, and attempts to reproduce the problem,
> has not produced any results.
>
> Regards
> Andrew
>
> On Wed, 2013-01-23 at 14:48 +, Kenneth R. Treptow wrote:
> > Hi Andrew,
> >
> > Here is what I get:
> >
> > >> casper_log_groups={'all'}
> >
> > casper_log_groups =
> >
> > 'all'
> >
> > trace: entering fft_init
> > trace: fft_init post same_state
> > fft_init_debug:
> FFTSize5n_inputs2input_bit_width18coeff_bit_width18unscrambleonadd_latency1mult_latency2bram_latency2conv_latency1quantizationRound
>  (unbiased: +/-
> Inf)overflowSaturatearchVirtex5opt_targetlogiccoeffs_bit_limit8delays_bit_limit8mult_spec2hardcode_shiftsoffshift_scheduledsp48_addersoff
> > trace: entering fft_biplex_init
> > trace: fft_biplex_init post same_state
> > trace: entering biplex_core_init
> > trace: biplex_core_init post same_state
> > trace: entering fft_stage_n_init
> > trace: entering fft_stage_n_init
> > trace: fft_stage_n_init post same_state
> > fft_stage_n_init_debug:
> FFTSize3FFTStage2input_bit_width18coeff_bit_width18delays_bramoffcoeffs_bramoffquantizationRound
>  (unbiased: +/-
> Inf)overflowSaturateadd_latency1mult_latency2bram_latency2conv_latency1archVirtex5opt_targetlogicuse_hdlonuse_embeddedoffhardcode_shiftsoffdownshiftoffdsp48_addersoff
> > reuse_block_debug: butterfly_direct of same type so setting parameters
> > trace: entering butterfly_direct_init
> > trace: butterfly_direct_init post same_state
> > butterfly_direct_init_debug: biplexonFFTSize3Coeffs[0
>
>  
> 1]StepPeriod1coeff_bit_width18input_bit_width18hardcode_shiftsoffdownshiftoffadd_latency1mult_latency2bram_latency2conv_latency1quantizationRound
>  (unbiased: +/-
> Inf)overflowSaturatearchVirtex5opt_targetlogiccoeffs_bramoffuse_hdlonuse_embeddedoffdsp48_addersoff
> > butterfly_direct_init_debug: twiddle_stage_2 for twiddle
> > butterfly_direct_init_debug: Coeffs = [0
>  1] ActualCoeffs = [1+0i
> 6.123233995736766e-17-1i]
> > reuse_block_debug: a of same type so setting parameters
> > reuse_block_debug: b of same type so setting parameters
> > reuse_block_debug: sync of same type so setting parameters
> > reuse_block_debug: shift of same type so setting parameters
> > reuse_block_debug: a+bw of same type so setting parameters
> > reuse_block_debug: a-bw of same type so setting parameters
> > reuse_block_debug: of of same type so setting parameters
> > reuse_block_debug: sync_out of same type so setting parameters
> > reuse_block_debug: AddSub0 of different type so replacing
> > reuse_block_debug: AddSub1 of different type so replacing
> > reuse_block_debug: AddSub2 of different type so replacing
> > reuse_block_debug: AddSub3 of different type so replacing
> > reuse_block_debug: sync_delay of different type so replacing
> > reuse_block_debug: shift_delay of different type so replacing
> > reuse_block_debug: Scale0 of different type so replacing
> > reuse_block_debug: Scale1 of different type so replacing
> > reuse_block_debug: Scale2 of different type so replacing
> > reuse_block_debug: Scale3 of different type so replacing
> > reuse_block_debug: Mux0 of different type so replacing
> > reuse_block_debug: Mux1 of different type so replacing

Re: [casper] ROACH1 QDR size

2013-03-01 Thread Ryan Monroe

Hey David,

I know it's probably too late, but I figured out how to do a QDR-corner 
turn without ping-ponging (thus, doubling the effective size of the 
QDR).  Give me a shout if you need this in the future!


On 03/01/2013 10:23 AM, David MacMahon wrote:

Hi, Jack,

We noticed that one of our designs gave bogus results when run on a ROACH 1 
board with a serial number in the range 02, but valid results when run on a 
ROACH 1 board with a serial number in the ranges 03 or 04.

After much head scratching, we opened up one 02, one 03, and one 04 
roach.  The 02 roach (020247) had Cypress CY7C1263V18 QDR chips, which are 
2Mx18 bits.  The 03 roach (030144) had Cypress CY7C15632KV18 chips, which 
are 4Mx18.  The 04 roach (040122) had NEC D44647186AF5-E25 QDR chips, which 
are also 4Mx18.

The 2Mx18 bit QDR chips of the 02 roaches can only store 1M (1,048,576) 
32-bit words which was too small for our design.
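The capacity arithmetic above can be spot-checked (a sketch; this assumes the CASPER QDR interface stores each 32-bit user word across a pair of 18-bit QDR locations, which is why the usable word count is half the location count):

```python
# Sanity check of the 2Mx18 QDR capacity arithmetic (sketch; assumes one
# 32-bit user word occupies two 18-bit QDR locations).
locations_2m_part = 2 * 2**20        # 2M addressable 18-bit locations
words = locations_2m_part // 2       # one 32-bit word per two locations
print(words)                         # 1048576 -> only 1M 32-bit words
```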

Hope this helps,
Dave

On Mar 1, 2013, at 12:27 AM, Jack Hickish wrote:


Hi All,

Is someone able to confirm that the size of the QDR chips on ROACH 1 boards 
depends solely on the board version? If this is indeed the case, does anyone 
know the QDR specs for the different board iterations?


Cheers,
Jack







Re: [casper] ROACH1 QDR size

2013-03-01 Thread Ryan Monroe
It's not that complete.  I could write a memo about it and throw around 
some matlab code.  I implemented it for a demo several months ago but it 
was never tidy.  Whoever actually needs it would have half of their work 
done though. Basically, you just have to permute the bits of the address 
counter in a specific way.  It's just like what you guys do with the 
"reorder" blocks, where you use luts on the address line to the brams, 
only since your luts would be 2^21 long, I figured out the rule and used 
the permuted counter instead.  EDIT:  It wasn't permuted, I think it was 
a bit-circular-shifted counter.
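The bit-circular-shifted counter idea can be sketched in Python. This is an illustration of the address trick only (not the actual QDR design, which also has to respect burst ordering): for a square 2^n x 2^n block stored row-major, the linear address is (row << n) | col, and rotating the 2n-bit address left by n bits swaps the row and col fields, so reading through the rotated counter yields the transpose without a ping-pong buffer.

```python
# Address-permutation corner turn sketch: rotating the address counter's
# bits swaps the row/col fields, giving an in-place transpose read-out.
def rotl(addr, k, nbits):
    """Circularly shift an nbits-wide address left by k bits."""
    mask = (1 << nbits) - 1
    return ((addr << k) | (addr >> (nbits - k))) & mask

n = 2                      # 4x4 example, as in the memo's pictures
N = 1 << n
mem = list(range(N * N))   # memory holding a row-major N x N block
# read memory through the bit-rotated counter: element (r, c) comes out
# at the position where (c, r) would be, i.e. the transpose
transposed = [mem[rotl(a, n, 2 * n)] for a in range(N * N)]
```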


I'll type that up this weekend.  Probably :-)

On 03/01/2013 10:50 AM, Dan Werthimer wrote:


hi ryan,

can you check this into one of the GIT repositories?

thanks,

dan

On Fri, Mar 1, 2013 at 10:37 AM, Ryan Monroe <mailto:ryan.m.mon...@gmail.com>> wrote:


Hey David,

I know it's probably too late, but I figured out how to do a
QDR-corner turn without ping-ponging (thus, doubling the effective
size of the QDR).  Give me a shout if you need this in the future!

On 03/01/2013 10:23 AM, David MacMahon wrote:

Hi, Jack,

We noticed that one of our designs gave bogus results when run
on a ROACH 1 board with a serial number in the range 02,
but valid results when run on a ROACH 1 board with a serial
number in the ranges 03 or 04.

After much head scratching, we opened up one 02, one
03, and one 04 roach.  The 02 roach (020247) had
Cypress CY7C1263V18 QDR chips, which are 2Mx18 bits.  The
03 roach (030144) had Cypress CY7C15632KV18 chips, which
are 4Mx18.  The 04 roach (040122) had NEC D44647186AF5-E25
QDR chips, which are also 4Mx18.

The 2Mx18 bit QDR chips of the 02 roaches can only store
1M (1,048,576) 32-bit words which was too small for our design.

Hope this helps,
Dave

On Mar 1, 2013, at 12:27 AM, Jack Hickish wrote:

Hi All,

Is someone able to confirm that the size of the QDR chips
on ROACH 1 boards depends solely on the board version? If
this is indeed the case, does anyone know the QDR specs
for the different board iterations?


Cheers,
Jack









[casper] Purpose of FFT-Direct

2013-03-12 Thread Ryan Monroe
Hey all,
Luke Madden was asking me about what's going on in the FFT-direct today.
 I'm pretty sure we have basically zero documentation on this lying around,
so it's a good time to fix that.  I'm going to share what I know, but I'd
appreciate it if other people could add/correct me as needed.

So, you can split the CASPER FFTs into streaming and parallel FFTs:

streaming: 
These FFTs have several independent ports.  Each of these ports is fed with
normal-order, serial time-domain data and produces normal-order, serial
frequency-domain data.  If you know something about how pipelined FFTs
work, you'll probably call it a "Radix 2, Delay-Commutator FFT", or R2DC.
 In the , we follow the R2DC FFT with an
inverse-delay-commutator stage to un-scramble the data (the casper
implementation doesn't have the same structure as an
inverse-delay-commutator, but they do the same thing).  In
, we do the same R2DC FFT, but we treat real and imag as
separate inputs, making four inputs.

parallel: 
If map_tail is not set, then the fft_direct block accepts all the inputs
for an fft on *each clock cycle*.  Natural order in, Natural order out.
If map_tail *is* set, it's a bit more complicated.  Then, this block is
being used with a number of streaming FFTs to achieve a wideband FFT.
Imagine a standard DIT FFT.  The early stages of the FFT only use a few
coefficients.  In fact, they are each FFTs in their own rights, only on a
subset of the data.  These streaming FFTs are just that:  for as long as we
can still process the data in a serial fashion, we process each sample
sequentially.  Then, we do the last 1-4 (typically) stages in a massive
parallel format.  Here, the same structure is drawn as in the 
fft_direct... but the coefficients now change (specifically, their phases
are incrementing).

This is where my understanding gets a bit hazy, but it looks like the last
stages of the FFT are being literally enumerated here.  *If someone wants
to chime in, here is the place to do it*.

In any case, you could actually do these "mixed streaming/parallel FFTs"
(which are ) in a different fashion, by re-casting
them as a split-radix FFT (look it up).  Doing this is computationally
about the same, but saves resources and memory... and is simpler if the
size of  is greater than 2^2.
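The split described above (serial sub-FFTs on decimated samples, then a parallel combine stage) can be sketched in numpy. This is a toy radix-2 decimation-in-time model, not the CASPER blocks themselves:

```python
# Toy radix-2 DIT sketch of the structure above: two independent
# half-size FFTs on the even/odd samples (the "streaming" part), then one
# butterfly stage with twiddles combining them (the "parallel" part).
import numpy as np

def dit_fft(x):
    x = np.asarray(x, dtype=complex)
    N = len(x)
    E = np.fft.fft(x[0::2])                          # sub-FFT, even samples
    O = np.fft.fft(x[1::2])                          # sub-FFT, odd samples
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors
    return np.concatenate([E + W * O, E - W * O])    # final butterflies

x = np.random.randn(16) + 1j * np.random.randn(16)
assert np.allclose(dit_fft(x), np.fft.fft(x))
```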


I hope this helps, Luke (and everyone else)!


--Ryan Monroe


Re: [casper] Purpose of FFT-Direct

2013-03-12 Thread Ryan Monroe
Hey Aaron!

My understanding may be imperfect, but I thought that a "split-radix" FFT
would have a bank of phase rotations (one for each input to fft-direct)
after the biplex FFTs.  If you chose your phase rotation coefficients
correctly, you'd be able to finish the larger FFT with a simple fft-direct
(map_tail=0).  That's the split-radix FFT which I was talking about.  It
simplifies things (all the coefficient storage goes in one place, reduces
routing, counters can be shared more easily, coefficients shared more
easily, etc) but I think the multiplier usage ends up the same.  The
difference would really start to show if you were trying to do like, a
2^21-point FFT... where you'd do the corner turns in QDR and generate
phase-rotate coefficients.  If you had the same coefficient schedule that
is used in fft_direct your FPGA would not be able to hold them all.
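The scheme above (per-stream phase rotations after the sub-FFTs, followed by a plain direct FFT across the streams) is, as far as I can tell, the Cooley-Tukey N = P*Q factorization. A toy numpy model of that idea (illustrative only; the function name is hypothetical and this is not the actual CASPER blocks):

```python
# Cooley-Tukey N = P*Q sketch: Q parallel P-point sub-FFTs, a bank of
# per-stream phase rotations (twiddles), then a Q-point DFT across streams
# (the "plain fft_direct with map_tail=0" step).
import numpy as np

def ct_fft(x, Q):
    x = np.asarray(x, dtype=complex)
    N = len(x)
    P = N // Q
    # Q decimated-in-time streams, each gets a P-point FFT
    sub = np.fft.fft(x.reshape(P, Q).T, axis=1)        # shape (Q, P)
    # per-stream phase rotations (the twiddle bank after the sub-FFTs)
    q = np.arange(Q)[:, None]
    p = np.arange(P)[None, :]
    sub *= np.exp(-2j * np.pi * q * p / N)
    # small Q-point DFT across streams combines them
    out = np.fft.fft(sub, axis=0)                      # shape (Q, P)
    return out.reshape(N)

x = np.random.randn(16) + 1j * np.random.randn(16)
assert np.allclose(ct_fft(x, 4), np.fft.fft(x))
```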

Either way, hats off to you in a serious way; I would never have been able
to design this madness on my own :-)  Finally, as far as I can tell, your
memory utilization is the best that anyone can achieve under the constraint
of normal output order (you can do a bit better if you're okay with taking
a bit-reversal, though).  Ultimately these are all factorizations of the
same basic algorithm; if you do a bit of mental gymnastics, I guess it all
looks pretty similar.

I have a radix-4 fft_wideband_real which uses 65%-85% as many multipliers
and better coefficient sharing, but as you say, you'll need to be doing
many parallel FFTs to take advantage of it (one R4MDC block can eat an
entire KATADC's worth of signal!).  No improvement to memory utilization
though.

*correction on my last post:  *When I said R4DC ("radix-4, Delay
Commutator"), I should have said R4MDC ("radix-4, multi-delay commutator"),
to distinguish it from streaming FFTs, which only process one FFT's worth of
data at a time.

--Ryan

On Tue, Mar 12, 2013 at 5:44 PM, Aaron Parsons  wrote:

> Hi Ryan,
>
> I wrote the various forms of the CASPER FFT, including this one.  The
> broad idea of the architecture was described in:
> http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4840623&tag=1
>
> Basically, (as far as I can tell from the brief perusal of split-radix
> ffts), I think this *is* a split radix FFT.   The mix of serial and
> parallell FFTs is used to evaluate a radix-2 Cooley Tukey FFT that is
> decomposed into several smaller FFTs that can be computed independently
> (without inter-communication of samples), followed by a direct FFT that
> cycles through twiddle coefficients (i.e. it is not truly a stand-alone
> direct FFT) that does the remaining butterflies, drawing on
> samples from all the sub-FFTs.  Data permutation is a bit of a headache in
> these architectures, so I invented a permuting buffer that uses basic group
> theory to automatically generate in-place permuters that do the necessary
> data reordering.
>
> I think you may have been misunderstanding how the architecture worked,
> and that is why you perhaps thought it was inefficient.  The total
> buffering is only 50% higher than the minimum of buffering possible (i.e.
> only storing each sample once), and the multipliers are all used at 100%
> efficiency.  Higher radices can produce some savings if you are doing more
> FFTs in parallel, but barring that, I'd be surprised if there is another
> architecture that substantially outperforms this one (but you are welcome
> to try!  :)
>
> I'm happy you're documenting.
>
> All the best,
> Aaron
>
>
> On Tue, Mar 12, 2013 at 3:39 PM, Ryan Monroe wrote:
>
>> Hey all,
>> Luke Madden was asking me about what's going on in the FFT-direct today.
>>  I'm pretty sure we have basically zero documentation on this lying around,
>> so it's a good time to fix that.  I'm going to share what I know, but I'd
>> appreciate it if other people could add/correct me as needed.
>>
>> So, you can split the CASPER FFTs into streaming and parallel FFTs:
>>
>> streaming: 
>> These FFTs have several independent ports.  Each of these ports is fed
>> with normal-order, serial time-domain data and produces normal-order,
>> serial frequency-domain data.  If you know something about how pipelined
>> FFTs work, you'll probably call it a "Radix 2, Delay-Commutator FFT", or
>> R2DC.  In the , we follow the R2DC FFT with an
>> inverse-delay-commutator stage to un-scramble the data (the casper
>> implementation doesn't have the same structure as an
>> inverse-delay-commutator, but they do the same thing).  In
>> , we do the same R2DC FFT, but we treat real and imag as
>> separate inputs, making four inputs.
>>
>> parallel: 
>>

Re: [casper] Purpose of FFT-Direct

2013-03-12 Thread Ryan Monroe

That makes two of us!  Viva la revolution!

On 03/12/2013 06:35 PM, Dan Werthimer wrote:


it's pretty loud where i'm sitting.





Re: [casper] Question regarding FFT

2013-03-15 Thread Ryan Monroe

Comments:

-The FFT is probably fine.  If it was broken, it would probably be 100% 
broken.  At least this looks like a spectrum
-It appears to me as if the broken section is exactly 1/8th of the 
spectrum.  Did you hook up all of your outputs correctly?
-What's up with the spikes in the spectrum?  If they were interleave 
artifacts, I'd expect to see the one which is at channel ~1800 closer to 
2048.  Maybe they are some other tone though



On 03/15/2013 12:14 PM, Nimish Sane wrote:

Hi all:

I am attaching a plot for Power of two inputs vs frequency channels. 
As can be seen, we have this persistent problem where some channels at 
the end just do not make any sense. We suspect that there is something 
going wrong in the FFT block. Has anybody seen such behavior before or 
can think of what may be going wrong?


Following are some specifications that may be useful:
Hardware: ROACH2 with KATADC.
ADC Clock: 800 MHz,
FPGA clock: 200 MHz
Toolflow: XSG 11.5 with Matlab 2009b with RHEL5.8
Libraries: SKA
FFT green block: fft_wideband_real
FFT size: 2^13 (4096 Output channels)

Thanks,

Nimish
--
Nimish Sane

Center for Solar-Terrestrial Research
New Jersey Institute of Technology
University Heights

Newark, NJ 07102-1982 USA

Tel: (973) 642 4958

Fax: (973) 596 3617

nimish.s...@njit.edu 




Re: [casper] Question regarding FFT

2013-03-18 Thread Ryan Monroe

How about this:  Send me a copy of the model and I'll do a sanity check :-)

On 03/18/2013 08:04 AM, Nimish Sane wrote:

@Dan:
We apply sync pulse only once. From the memo you have mentioned, this 
is an acceptable mode. We do not apply resync pulses, and can also 
confirm that there is just one sync pulse.


@Dave:
We have never seen such behavior in simulations. I am double checking 
though.


@Ryan:
Your observations are correct, but I do not understand what you mean 
by hooking up all outputs correctly. That seems to be fine. We are 
using the block's outputs correctly.


FWIW, we run the design on two different FPGA clocks (150 MHz and 200 
MHz) and see similar behavior.


Thanks,

Nimish


On Fri, Mar 15, 2013 at 2:14 PM, Ryan Monroe <mailto:ryan.m.mon...@gmail.com>> wrote:


Comments:

-The FFT is probably fine.  If it was broken, it would probably be
100% broken.  At least this looks like a spectrum
-It appears to me as if the broken section is exactly 1/8th of the
spectrum.  Did you hook up all of your outputs correctly?
-What's up with the spikes in the spectrum?  If they were
interleave artifacts, I'd expect to see the one which is at
channel ~1800 closer to 2048.  Maybe they are some other tone though



On 03/15/2013 12:14 PM, Nimish Sane wrote:

Hi all:

I am attaching a plot for Power of two inputs vs frequency
channels. As can be seen, we have this persistent problem where
some channels at the end just do not make any sense. We suspect
that there is something going wrong in the FFT block. Has anybody
seen such behavior before or can think of what may be going wrong?

Following are some specifications that may be useful:
Hardware: ROACH2 with KATADC.
ADC Clock: 800 MHz,
FPGA clock: 200 MHz
Toolflow: XSG 11.5 with Matlab 2009b with RHEL5.8
Libraries: SKA
FFT green block: fft_wideband_real
FFT size: 2^13 (4096 Output channels)

Thanks,

Nimish
-- 
Nimish Sane


Center for Solar-Terrestrial Research
New Jersey Institute of Technology
University Heights

Newark, NJ 07102-1982 USA

Tel: (973) 642 4958 

Fax: (973) 596 3617 

nimish.s...@njit.edu <mailto:nimish.s...@njit.edu>







[casper] reading and writing to BORPH

2013-03-18 Thread Ryan Monroe

Hey guys,

I'm trying to read and write to a ROACH1 using the BORPH filesystem.  I 
followed the tutorial at 
https://safe.nrao.edu/wiki/pub/CICADA/FlashLights/BorphGettingStart.pdf, 
but got somewhat different results




root@roach:/boffiles# ./asmls_roach0_2013_Mar_15_1123.bof
^Z
[1]+  Stopped ./asmls_roach0_2013_Mar_15_1123.bof
root@roach:/boffiles# ps -a
  PID TTY      TIME     CMD
  618 pts/0    00:00:00 asmls_roach0_20
  619 pts/0    00:00:00 ps
root@roach:/boffiles# cd /proc/618/hw
root@roach:/proc/618/hw# echo 0 > ioreg_mode
root@roach:/proc/618/hw# cd ioreg
root@roach:/proc/618/hw/ioreg# ls -ltrh
-r--r--r-- 1 root root    4 Dec 26 21:56 spec_count
-rw-rw-rw- 1 root root    4 Dec 26 21:56 acc_always_valid
<>

root@roach:/proc/618/hw/ioreg# cat spec_count
cat: spec_count: Invalid argument
root@roach:/proc/618/hw/ioreg# cat acc_always_valid
cat: acc_always_valid: Invalid argument
root@roach:/proc/618/hw/ioreg#


spec_count is a output reg, while acc_always_valid is an input reg.

Anyone know what I'm doing wrong?  Thanks!

--Ryan



[casper] in-place QDR transpose memo

2013-03-18 Thread Ryan Monroe
Hey all, finally got around to finishing this.  I'm not sure how 
complete/useful it is, so feel free to ask questions if needed. I've 
attached the memo here, but you'll need the zip from my dropbox if you 
want the project/mcode/bof/pictures.


The pictures included show a bit more about the 4x4 corner turn case.  
The same ideas scale to the big versions.


Cheers!

http://dl.dropbox.com/u/2832602/qdr_demo.zip

--Ryan


QDR corner turn memo.odt
Description: application/vnd.oasis.opendocument.text


Re: [casper] reading and writing to BORPH

2013-03-18 Thread Ryan Monroe



root@roach:/proc/618/hw/ioreg# echo 0 > ../ioreg_mode
root@roach:/proc/618/hw/ioreg# echo -e \x00\x00\x00\x00 > acc_always_valid
-bash: echo: write error: Invalid argument
root@roach:/proc/618/hw/ioreg# echo -e \x00\x00\x00\x00 > 
acc_always_validczx

-bash: acc_always_validczx: No such file or directory
root@roach:/proc/618/hw/ioreg# cat ../ioreg_mode
0
root@roach:/proc/618/hw/ioreg# echo 1 > ../ioreg_mode
root@roach:/proc/618/hw/ioreg# echo -e \x00\x00\x00\x00 > acc_always_valid
root@roach:/proc/618/hw/ioreg# cat acc_always_valid
x00xroot@roach:/proc/618/hw/ioreg#


Was that the expected response? "x00x"

Also, the "Invalid argument" seems to happen on write too, and it's 
different from the "file not found" error...



On 03/18/2013 02:12 PM, Adam Barta wrote:

Hi Ryan,


Try echo -e \x00\x00\x00\x00 > register


Adam


On Mon, Mar 18, 2013 at 9:12 PM, Ryan Monroe <mailto:ryan.m.mon...@gmail.com>> wrote:


Hey guys,

I'm trying to read and write to a ROACH1 using the BORPH
filesystem.  I followed the tutorial at
https://safe.nrao.edu/wiki/pub/CICADA/FlashLights/BorphGettingStart.pdf,
but got somewhat different results



root@roach:/boffiles# ./asmls_roach0_2013_Mar_15_1123.bof
^Z
[1]+  Stopped ./asmls_roach0_2013_Mar_15_1123.bof
root@roach:/boffiles# ps -a
  PID TTY      TIME     CMD
  618 pts/0    00:00:00 asmls_roach0_20
  619 pts/0    00:00:00 ps
root@roach:/boffiles# cd /proc/618/hw
root@roach:/proc/618/hw# echo 0 > ioreg_mode
root@roach:/proc/618/hw# cd ioreg
root@roach:/proc/618/hw/ioreg# ls -ltrh
-r--r--r-- 1 root root    4 Dec 26 21:56 spec_count
-rw-rw-rw- 1 root root    4 Dec 26 21:56 acc_always_valid
<>

root@roach:/proc/618/hw/ioreg# cat spec_count
cat: spec_count: Invalid argument
root@roach:/proc/618/hw/ioreg# cat acc_always_valid
cat: acc_always_valid: Invalid argument
root@roach:/proc/618/hw/ioreg#


spec_count is a output reg, while acc_always_valid is an input reg.

Anyone know what I'm doing wrong?  Thanks!

--Ryan




--
*Adam Barta*
c: +27 72 105 8611
e: a...@ska.ac.za <mailto:a...@ska.ac.za>
w: www.ska.ac.za <http://www.ska.ac.za>





Re: [casper] reading and writing to BORPH

2013-03-18 Thread Ryan Monroe

Hi David, thanks for the response!

I guess I wasn't really sure which ioreg_mode value corresponded to 
which, so I just tried them both.  So <1=binary> and <0=ascii>.  Armed 
with that knowledge (plus your mad BASH chops), I successfully 
controlled some registers through BORPH. Still no luck in ASCII mode, 
but I don't really need it.  Also, tcpborphserver didn't work when I 
programmed the ROACH through BASH.  See below:


(tcpborphserver fails)
$ telnet 192.168.1.92 7147
Trying 192.168.1.92...
Connected to 192.168.1.92.
Escape character is '^]'.
#version poco-0.1
#build-state poco-0.2804
?read spec_count 0 4
#log error 4294538676790 poco register\_spec_count\_not\_found
!read fail program


(successful write)
root@roach:/proc/618/hw/ioreg# echo 1 > ../ioreg_mode
root@roach:/proc/618/hw/ioreg# echo -en "\x00\x00\x01\x00" > acc_len_m1
root@roach:/proc/618/hw/ioreg# echo -en "\x00\x00\x00\x01" > 
acc_always_valid

root@roach:/proc/618/hw/ioreg# echo -en "\x00\x00\x00\x01" > acc_sw_rst
root@roach:/proc/618/hw/ioreg# echo -en "\x00\x00\x00\x00" > acc_sw_rst
root@roach:/proc/618/hw/ioreg# cat spec_count
Lroot@roach:/proc/618/hw/ioreg# cat -An spec_count
 1^@^A^UBroot@roach:/proc/618/hw/ioreg#
root@roach:/proc/618/hw/ioreg# cat spec_count | hd
  00 05 d1 4d   |...M|
0004


Looks good!  Thanks everyone

--Ryan

On 03/18/2013 02:49 PM, David MacMahon wrote:

Hi, Ryan,

On Mar 18, 2013, at 2:21 PM, Ryan Monroe wrote:


root@roach:/proc/618/hw/ioreg# echo 0 > ../ioreg_mode
root@roach:/proc/618/hw/ioreg# echo -e \x00\x00\x00\x00 > acc_always_valid
-bash: echo: write error: Invalid argument

You need to either double up the backslashes or put the \x00\x00\x00\x00 in 
quotes.  Also adding -n is probably needed to suppress the newline.  I think 
the invalid argument error is due to writing too many bytes (when ioreg_mode is 
0).


root@roach:/proc/618/hw/ioreg# echo 1 > ../ioreg_mode
root@roach:/proc/618/hw/ioreg# echo -e \x00\x00\x00\x00 > acc_always_valid
root@roach:/proc/618/hw/ioreg# cat acc_always_valid
x00xroot@roach:/proc/618/hw/ioreg#

Here the shell ate the backslashes, so as before you tried to write "x00x00x00x00" to 
the register.  I guess ioreg_mode of 1 is more tolerant of trailing data, so only the first 4 
bytes, "x00x" got written to the register (as confirmed by the output of the cat command).

Hope this helps,
Dave
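The quoting pitfalls Dave describes can be sidestepped by writing the register from a program instead of the shell. A sketch (the /proc path is hypothetical; on the ROACH it would be /proc/<pid>/hw/ioreg/<register> with ioreg_mode set to 1, and the "\x00\x00\x00\x01" examples above suggest big-endian PowerPC byte order):

```python
# Write a 32-bit register value as raw bytes, avoiding shell escaping.
import os
import struct
import tempfile

def write_reg(path, value):
    """Write one 32-bit big-endian word to a BORPH-style register file."""
    with open(path, "wb") as f:
        f.write(struct.pack(">I", value))

# stand-in demo against a temp file instead of a live /proc entry
path = os.path.join(tempfile.mkdtemp(), "acc_always_valid")
write_reg(path, 1)
assert open(path, "rb").read() == b"\x00\x00\x00\x01"
```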






Re: [casper] reading and writing to BORPH

2013-03-19 Thread Ryan Monroe
It's okay, I don't need compatibility this time around.  I just need 
better performance than what a telnet to the ROACH can give me.  
Future revisions of this project will be on a custom board anyways.


Thanks for all of your (collective) help.  Problem solved... for now ;-)

--Ryan

On 03/18/2013 11:16 PM, Jason Manley wrote:

You can only use tcpborphserver's katcp interface if you program the FPGA through that 
too. If you program the FPGA "behind its back" (ie from the command line), then 
you can't read/write registers over katcp.

I strongly advise you to use katcp in your new design, because BORPH is not 
provided with ROACH-2 and will probably not be provided with ROACH-3 etc. So if 
you want forwards compatibility, I recommend katcp for all your interfacing.

Jason

On 19 Mar 2013, at 00:11, Ryan Monroe wrote:


Hi David, thanks for the response!

I guess I wasn't really sure which ioreg_mode value corresponded to which, so I just tried 
them both.  So <1=binary> and <0=ascii>.  Armed with that knowledge (plus your 
mad BASH chops), I successfully controlled some registers through BORPH. Still no luck in 
ASCII mode, but I don't really need it.  Also, tcpborphserver didn't work when I programmed 
the ROACH through BASH.  See below:

(tcpborphserver fails)
$ telnet 192.168.1.92 7147
Trying 192.168.1.92...
Connected to 192.168.1.92.
Escape character is '^]'.
#version poco-0.1
#build-state poco-0.2804
?read spec_count 0 4
#log error 4294538676790 poco register\_spec_count\_not\_found
!read fail program


(successful write)
root@roach:/proc/618/hw/ioreg# echo 1 > ../ioreg_mode
root@roach:/proc/618/hw/ioreg# echo -en "\x00\x00\x01\x00" > acc_len_m1
root@roach:/proc/618/hw/ioreg# echo -en "\x00\x00\x00\x01" > acc_always_valid
root@roach:/proc/618/hw/ioreg# echo -en "\x00\x00\x00\x01" > acc_sw_rst
root@roach:/proc/618/hw/ioreg# echo -en "\x00\x00\x00\x00" > acc_sw_rst
root@roach:/proc/618/hw/ioreg# cat spec_count
Lroot@roach:/proc/618/hw/ioreg# cat -An spec_count
 1^@^A^UBroot@roach:/proc/618/hw/ioreg#
root@roach:/proc/618/hw/ioreg# cat spec_count | hd
  00 05 d1 4d   |...M|
0004


Looks good!  Thanks everyone

--Ryan

On 03/18/2013 02:49 PM, David MacMahon wrote:

Hi, Ryan,

On Mar 18, 2013, at 2:21 PM, Ryan Monroe wrote:


root@roach:/proc/618/hw/ioreg# echo 0 > ../ioreg_mode
root@roach:/proc/618/hw/ioreg# echo -e \x00\x00\x00\x00 > acc_always_valid
-bash: echo: write error: Invalid argument

You need to either double up the backslashes or put the \x00\x00\x00\x00 in 
quotes.  Also adding -n is probably needed to suppress the newline.  I think 
the invalid argument error is due to writing too many bytes (when ioreg_mode is 
0).


root@roach:/proc/618/hw/ioreg# echo 1 > ../ioreg_mode
root@roach:/proc/618/hw/ioreg# echo -e \x00\x00\x00\x00 > acc_always_valid
root@roach:/proc/618/hw/ioreg# cat acc_always_valid
x00xroot@roach:/proc/618/hw/ioreg#

Here the shell ate the backslashes, so as before you tried to write "x00x00x00x00" to 
the register.  I guess ioreg_mode of 1 is more tolerant of trailing data, so only the first 4 
bytes, "x00x" got written to the register (as confirmed by the output of the cat command).

Hope this helps,
Dave








Re: [casper] New error timing

2013-04-08 Thread Ryan Monroe
Or set the rounding options on the block to wrap / truncate, which will 
probably resolve the issue.


*At the very least, set "wrap".*  Any situation which causes a saturate 
will be a catastrophe for the system anyways.  The fact that a wrap is 
twice (or more) as bad is irrelevant at that point.  There's no reason 
for casper FFT blocks to have a saturate option, in my opinion..



To be clear, the "fanout" issue that Isaac and I resolved was atypical.  
You probably shouldn't look to that first if you're having timing issues.



On 04/08/2013 04:52 PM, David MacMahon wrote:

Hi, Katty,

The first error report in the timing file shows this:


   Source:   
tut3_XSG_core_config/tut3_XSG_core_config/tut3_x0/fft_wideband_real_e4c9925378/fft_biplex_real_4x0_3de6c27f63/biplex_core_6535c8c7e0/fft_stage_5_6db5e3967b/butterfly_direct_eff8a62bdf/twiddle_general_4mult_3ae9ad9772/mult/Maddsub_mult_46_56
 (DSP)

   Destination:  
tut3_XSG_core_config/tut3_XSG_core_config/tut3_x0/fft_wideband_real_e4c9925378/fft_biplex_real_4x0_3de6c27f63/biplex_core_6535c8c7e0/fft_stage_5_6db5e3967b/butterfly_direct_eff8a62bdf/twiddle_general_4mult_3ae9ad9772/convert0/convert/latency_lt_4.reg_out/partial_one.last_srl17e/reg_array[3].fde_used.u2
 (FF)

This means that this timing error is occurring in the twiddle_general_4mult 
block between the multiply-add DSP48 and subsequent convert block.  In fact, 
all of the timing errors were related to different bits of this same path.  The 
twiddle_general_4mult block uses the Xilinx convert block which is known to 
have sub-optimal timing.  The twiddle_general_4mult block should probably be 
updated to use the CASPER convert block.  That will likely result in better 
timing, but could have other undesirable side effects (e.g. using too many 
DSP48s).

Until this block is updated, you'll have to somehow ease the timing in some other way.  Either use 
PlanAhead to manually "pre-place" the design or add some pipelining registers to ease 
timing of other tight areas (generate a verbose timing report using ISE or the "trce" 
command from the command line to find other tight areas).

Hope this helps,
Dave

On Apr 5, 2013, at 6:09 AM, katherine viviana cortes urbina wrote:


ERROR: 1 constraint not met.

PAR could not meet all timing constraints. A bitstream will not be generated.

To disable the PAR timing check:

1> Disable the "Treat timing closure failure as error" option from the Project 
Options dialog in XPS.

OR

2> Type following at the XPS prompt:
XPS% xset enable_par_timing_error 0


In the system.twr file I saw the fanout error, but I think that isn't the cause
here. If I have a timing error in some block, can someone help identify and
fix it?

Cheers

Katty

PS: the FPGA clock is 225 MHz and the ADC clock is 900 MHz







Re: [casper] New error timing

2013-04-08 Thread Ryan Monroe
So truncate is actually kind of hard to decide on.  In general, I've had 
mixed results with regard to using round-to-even, versus "truncate, but 
use one extra bit".  I think that the round choice is more important on 
a coherent application, but this is still guesswork.


Saturate is much easier.  Anytime you saturate, your snr is basically 
ruined.  It's like clipping your ADC.  In DSP land, it's like adding a 
huge delta function at each saturate/wrap point (bigger delta for a 
wrap). Do it once in a thousand times and your SNR is instantly down to 
~30 dB.  That being said, the rule of thumb is "never, EVER clip any 
part of your signal chain".  So its reasonable to use wrap... since once 
you saturate, it's so bad (and obvious) that using saturate isn't even 
important anymore.


With bit growth, I used to go for an analytical solution -- or even just 
a strong rule-of-thumb.  I have a few of the latter nowadays, but I 
pretty much just do robust SNR analysis against expected values, reduce 
until I'm just marginal and then add back 2-3 bits to be safe.
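The wrap-vs-saturate distinction above can be sketched for a signed fixed-point word (an illustration, not any particular CASPER block's arithmetic):

```python
# Overflow behavior of a signed `bits`-wide fixed-point word: saturate
# clips to the rail, wrap flips to the far end of the range (the much
# "bigger delta" described above).
def wrap(x, bits):
    m = 1 << bits
    return ((x + (m >> 1)) % m) - (m >> 1)

def saturate(x, bits):
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return min(hi, max(lo, x))

# an 8-bit word holds -128..127; 130 overflows by only 3 counts...
print(wrap(130, 8))      # -126: a near full-scale error
print(saturate(130, 8))  # 127: an error of just 3 counts
```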




On 04/08/2013 09:44 PM, David MacMahon wrote:

Hi, Ryan,

I agree that setting wrap and truncate would probably resolve the timing issue, 
but I'm not so sure it's generally advisable to do so.  Have you done any 
analysis comparing the effects of wrap and truncate vs saturate and 
round-to-even?

That said, I do think the implementation could be smarter about bit growth in 
the twiddle block(s).  The twiddle step is simply a rotation in the complex 
plane, so I think there is no reason to grow more than one non-fractional bit.  
There is also little reason to grow fractional bits since rotation doesn't 
increase precision at all, though maybe one fractional bit could be argued for.
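The one-bit-growth claim can be checked numerically: if both components of a sample are at full scale, a unit-magnitude twiddle rotation can grow the larger component by at most sqrt(2) < 2, so one extra non-fractional bit covers it. A small brute-force check (illustration only):

```python
# Spot-check: rotating a full-scale complex sample by any unit-magnitude
# twiddle grows max(|re|, |im|) by at most sqrt(2) < 2, i.e. one bit.
import cmath
import math

x = complex(1.0, 1.0)  # both components at full scale (normalized to 1)
worst = 0.0
for t in range(360):
    y = x * cmath.exp(1j * math.radians(t))  # rotate by twiddle angle t
    worst = max(worst, abs(y.real), abs(y.imag))
print(worst)  # ~1.4142 = sqrt(2), so less than one bit of growth
```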

Dave

On Apr 8, 2013, at 7:37 PM, Ryan Monroe wrote:


Or set the rounding options on the block to wrap / truncate, which will 
probably resolve the issue.

At the very least, set "wrap".  Any situation which causes a saturate will be a 
catastrophe for the system anyways.  The fact that a wrap is twice (or more) as bad is 
irrelevant at that point.  There's no reason for casper FFT blocks to have a saturate 
option, in my opinion..


To be clear, the "fanout" issue that Isaac and I resolved was atypical.  You 
probably shouldn't look to that first if you're having timing issues.


On 04/08/2013 04:52 PM, David MacMahon wrote:

Hi, Katty,

The first error report in the timing file shows this:



   Source:   
tut3_XSG_core_config/tut3_XSG_core_config/tut3_x0/fft_wideband_real_e4c9925378/fft_biplex_real_4x0_3de6c27f63/biplex_core_6535c8c7e0/fft_stage_5_6db5e3967b/butterfly_direct_eff8a62bdf/twiddle_general_4mult_3ae9ad9772/mult/Maddsub_mult_46_56
 (DSP)

   Destination:  
tut3_XSG_core_config/tut3_XSG_core_config/tut3_x0/fft_wideband_real_e4c9925378/fft_biplex_real_4x0_3de6c27f63/biplex_core_6535c8c7e0/fft_stage_5_6db5e3967b/butterfly_direct_eff8a62bdf/twiddle_general_4mult_3ae9ad9772/convert0/convert/latency_lt_4.reg_out/partial_one.last_srl17e/reg_array[3].fde_used.u2
 (FF)


This means that this timing error is occurring in the twiddle_general_4mult 
block between the multiply-add DSP48 and subsequent convert block.  In fact, 
all of the timing errors were related to different bits of this same path.  The 
twiddle_general_4mult block uses the Xilinx convert block which is known to 
have sub-optimal timing.  The twiddle_general_4mult block should probably be 
updated to use the CASPER convert block.  That will likely result in better 
timing, but could have other undesirable side effects (e.g. using too many 
DSP48s).

Until this block is updated, you'll have to ease timing in some other way.  Either use 
PlanAhead to manually "pre-place" the design or add some pipelining registers to ease 
timing of other tight areas (generate a verbose timing report using ISE or the "trce" 
command from the command line to find other tight areas).

Hope this helps,
Dave

On Apr 5, 2013, at 6:09 AM, katherine viviana cortes urbina wrote:



ERROR: 1 constraint not met.

PAR could not meet all timing constraints. A bitstream will not be generated.

To disable the PAR timing check:

1> Disable the "Treat timing closure failure as error" option from the Project 
Options dialog in XPS.

OR

2> Type following at the XPS prompt:
XPS% xset enable_par_timing_error 0


I saw the fanout error in the system.twr file, but I think that isn't the case here.
If there is a timing error in some block, can someone help me identify and
fix it?

Cheers

Katty

PS: the FPGA clock is 225 MHz and the ADC clock is 900 MHz







Re: [casper] New error timing

2013-04-08 Thread Ryan Monroe
Flame war ENGAGE!

Kidding.  I'll have to look over what you wrote and I'll get back to you.
There's a perfectly good chance I'm missing something.  Thanks for the
detailed reply!
On Apr 8, 2013 10:42 PM, "David Hawkins"  wrote:

> Hi Ryan,
>
>  So truncate is actually kind of hard to decide on.  In general, I've had
>> mixed results with regard to using round-to-even, versus "truncate, but
>> use one extra bit".  I think that the round choice is more important on
>> a coherent application, but this is still guesswork.
>>
>> Saturate is much easier.  Anytime you saturate, your snr is basically
>> ruined.  It's like clipping your ADC.  In DSP land, it's like adding a
>> huge delta function at each saturate/wrap point (bigger delta for a
>> wrap). Do it once in a thousand times and your SNR is instantly down to
>> ~30 dB.  That being said, the rule of thumb is "never, EVER clip any
>> part of your signal chain".  So its reasonable to use wrap... since once
>> you saturate, it's so bad (and obvious) that using saturate isn't even
>> important anymore.
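The clip-vs-wrap comparison above can be sketched numerically.  This is a pure-Python illustration (not from any CASPER tool); the 3.3-sigma level is chosen so that roughly one Gaussian sample in a thousand exceeds it, matching the "once in a thousand" scenario:

```python
import math
import random

random.seed(42)
n = 200_000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
level = 3.3  # ~1 sample in 1000 of a Gaussian exceeds 3.3 sigma

def err_db(ref, out):
    # SNR in dB, treating (out - ref) as the error signal
    pe = sum((o - r) ** 2 for r, o in zip(ref, out)) / len(ref)
    ps = sum(r * r for r in ref) / len(ref)
    return 10.0 * math.log10(ps / pe)

# Saturate: out-of-range samples pin at +/- level.
clipped = [max(-level, min(level, v)) for v in x]
# Wrap: out-of-range samples jump to the other end of the range,
# a much bigger "delta function" per event.
wrapped = [((v + level) % (2 * level)) - level for v in x]

print(round(err_db(x, clipped), 1), round(err_db(x, wrapped), 1))
```

With this seed the clipped signal lands around 40 dB and the wrapped one far lower, consistent with a wrap event being a much larger delta than a saturate event.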
>>
>> With bit growth, I used to go for an analytical solution -- or even just
>> a strong rule-of-thumb.  I have a few of the latter nowadays, but I
>> pretty much just do robust SNR analysis against expected values, reduce
>> until I'm just marginal and then add back 2-3 bits to be safe.
>>
>
> Saturation and quantization noise for noise-like signals are
> intimately related. As you comment above when you clip you
> add a delta function, which is the same as wideband noise.
> So long as that wideband noise does not exceed the quantization
> noise you're doing ok.
>
> http://www.ovro.caltech.edu/~dwh/carma_board/digitizer_tests.pdf
> http://www.ovro.caltech.edu/~dwh/wbsddc/correlator_efficiency.pdf
>
> In these documents you will see noise power ratio (NPR) plots.
>
> How does this factor into anything we care about? Well, the
> "correlator efficiency" of say a 2-bit correlator, or 2-bit
> correlator with deleted inner products is ~87%. This efficiency
> comes about not due to the loss in SNR due to heavy quantization,
> but due to the non-linearity in the correlation estimate caused
> by the heavy quantization.
>
> Anyway, a certain level of saturation is fine. It occurs when you
> sample the input signal, and it should also be happening when you
> requantize (saturate and round) signals within the DSP processing
> pipeline.
>
> Check out this tutorial for a discussion of rounding techniques
> and why only convergent (bankers) rounding should really be used
>
> http://www.ovro.caltech.edu/~dwh/correlator/pdf/ESC-104Paper_Hawkins.pdf
>
> Eg., check out the comparisons of all the MATLAB rounding methods
> in Figure 13 on page 26.
>
> Wrapping should never be allowed in a DSP chain. Truncation adds a
> bias (keeping more bits just decreases the bias) so should not be
> used in applications where you care about the DC offset, eg.
> complex-valued baseband processing. Re-quantization stages should
> saturate and convergent round. If you're saturating  "too hard",
> then that is an error in the power-detection and scaling logic
> preceding the re-quantization stage.
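The truncation bias and the zero mean of convergent rounding are easy to check numerically.  A Python sketch of dropping two LSBs (helper names are mine, not from any CASPER or MATLAB code):

```python
# Requantize fixed-point integers by dropping 2 LSBs (divide by 4)
# and compare the mean error of truncation vs convergent rounding.

def truncate(v, drop=2):
    # Arithmetic right shift: always rounds toward -infinity.
    return v >> drop

def round_even(v, drop=2):
    # Convergent (banker's) rounding: ties go to the even result.
    q, r = divmod(v, 1 << drop)
    half = 1 << (drop - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q

vals = range(-64, 64)
bias_trunc = sum(truncate(v) * 4 - v for v in vals) / len(vals)
bias_even = sum(round_even(v) * 4 - v for v in vals) / len(vals)
print(bias_trunc, bias_even)  # -> -1.5 0.0
```

Truncation carries a constant DC bias of half an LSB-step regardless of the data, while convergent rounding averages to zero, which is Dave's point about complex-valued baseband processing.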
>
> I don't mean to start a DSP flame war, I just thought you might
> like to hear an alternate opinion :)
>
> Cheers,
> Dave
>
>
>


Re: [casper] Latency in PFB and FFT

2013-05-07 Thread Ryan Monroe

Hi Ross,

I just added a memo to the CASPER page.  It was a study I did on meeting 
timing for ROACH2, but it applies to ROACH1 and should be illuminating 
w.r.t. timing closure on these parts.  Using the techniques I describe 
there /might/ get you to 325 MHz.  Keep in mind that I was *completely* 
unprofessional in that document. Don't go looking for something that can 
be published in nature ;-)  BUT you will get to see how I really feel 
about Xilinx tools


That said, your goal is pretty ambitious so be prepared for a struggle.  
I support everything Andrew said below.  Specifically, there is 
generally a hard limit, past which adding more latency *never* helps.  
For instance, with a multiply that is 18x25 or smaller, that limit is 4 
(though Xilinx often doesn't choose the optimal configuration, so it's 
often worse in practice).  For DSP48 adds, it's 3; for fabric adds, it's 
1.  Anything more than these numbers will almost always hurt you, 
although less can help.
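Those rules of thumb could be captured in a small lookup table.  This is a hypothetical helper (names and structure are mine, not part of any CASPER tool), just restating the ceilings from the paragraph above:

```python
# Rule-of-thumb ceilings on useful pipeline latency, per the
# discussion above; beyond these, extra latency stages stop
# helping timing and start costing resources.
MAX_USEFUL_LATENCY = {
    "mult_18x25_or_smaller": 4,  # DSP48 multiply
    "dsp48_add": 3,
    "fabric_add": 1,
}

def clamp_latency(op, requested):
    """Cap a requested latency at the useful ceiling for `op`."""
    return min(requested, MAX_USEFUL_LATENCY[op])

print(clamp_latency("dsp48_add", 20))  # -> 3
```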


I'm planning on being down at Caltech 3:30-4:00 today.  If you want, I 
can meet you and talk about it either before or after that slot?


--Ryan

On 05/06/2013 11:33 PM, Andrew Martens wrote:

Hi Ross

The guys at Berkeley and Ryan would probably have more detailed advice 
(especially regarding hand-placement) but the following are some 
general guidelines if trying to optimise timing from within System 
Generator;


1. Adding latency to an operation in System Generator results in the 
following:
   a. Register stages are added to sections within the operation so as 
to pipeline things and allow higher-speed operation.  This is useful 
where pipeline registers exist in cores, e.g. the DSP48 multiplier 
core.  It is also useful in operations that can be pipelined, e.g. the 
cast/convert block uses a sequence of operations.
   b. Once all possible register stages within the operation are 
exhausted, the remaining latency is allocated after the operation.  
This latency will be limited in the benefit it adds, as it is normally 
implemented in a single slice (the look-up table can act as a shift 
register in Xilinx FPGAs, and the final register stage of latency is 
implemented using the register in the slice).

This leads to the following tips;

1. Avoid long chains of asynchronous logic. Add latency to operations 
involving large fanout or fanin, e.g. muxes, adders, comparators, and 
cast/convert. Do a bit of thinking and research on how the various 
operations would be implemented under the hood.


2. Register inputs and outputs of blocks.

3. Use the CASPER Delays/pipeline block instead of the System 
Generator delay block on critical timing paths (the pipeline block 
forces the latency to be implemented in a pipeline of register stages 
instead of being absorbed into a single slice).


4. BRAMs followed by Multipliers often cause timing problems as they 
are limited and location constrained. These occur in the pfb_fir and 
fft. Add lots of BRAM latency to help (at least 3 to start) and 
Multiplier latency (4 would be a start but adding too much adds 
register stages *after* the Multiplier which is pointless).


5. Any input/output to/from yellow blocks (especially FPGA pins) 
should contain a pipeline block with a latency of at least 2 (allowing 
one register stage near source and one near destination).


6. Check to see if the System Generator block you are using has 
timing-related options, e.g. the cast/convert block has a 'Pipeline for 
maximum performance' option.


Cheers
Andrew



Hi All,

I'm trying to get the ROACH1 to run at 325MHz (5GSPS) in a simple
spectrometer. This is obviously pushing the limits of the hardware -
I'm kind of arbitrarily tweaking the latencies in these blocks to try
and meet timing requirements. Are there any guidelines/notes I should
be following - i.e. should certain latencies match (such as Add and bram)
whereas, say, Fanout doesn't matter? Also, are there any limits on these
- i.e. say 20 rather than 2?

I'm sure if my understanding of the PFB and FFT was more than minimal
this would be obvious...

R


--
Ross Williamson
Research Scientist - Sub-mm Group
California Institute of Technology
626-395-2647 (office)
312-504-3051 (Cell)










Re: [casper] Latency in PFB and FFT

2013-05-07 Thread Ryan Monroe

Also, here's that memo for easy access

dl.dropbox.com/u/2832602/roach2_timing.zip 












[casper] Problems with shared_memory blocks on ROACH2?

2013-05-09 Thread Ryan Monroe
hey guys!  I just got a shiny new ROACH2 and I want to test it out.

Sadly, Xilinx is hating on my shared_memory blocks.  For each instance I
get this:

(depth is 1024, all registering is on, optimize:minimize_area)

--
ERROR:EDK - r4_12ghz_top_out_pow_lower_0_ramblk (bram_block_custom) -
Release
   14.2 - Xilinx CORE Generator P.28xd (lin64)
   Copyright (c) 1995-2012 Xilinx, Inc.  All rights reserved.
   All runtime messages will be recorded in

 /nas/users/monroe/Desktop/new_algo_v3/monroe_library/dev/v6/radix_4/r4_12ghz_
   top

 /XPS_ROACH2_base/implementation/r4_12ghz_top_out_pow_lower_0_ramblk_wrapper/c
   ore
   gen.log
   Wrote CGP file for project 'coregen'.
   child process exited abnormally





I think I can just revise the core to a new version of
block_memory_generator, but I'd like to see if anyone's already seen it.
 Opinions?

Thanks!

--Ryan


Re: [casper] Problems with shared_memory blocks on ROACH2?

2013-05-09 Thread Ryan Monroe
(forgot to reply-list)

Yeah, I should have chased the logs.  Glenn also suggests that I try
reverting to an older revision of the library.  My hypothesis is that 14.2
doesn't support blk_mem_gen:7.3, but I won't know till I'm back in the
office a week from now

WARNING:coreutil:928 - Unable to find 'xilinx.com:ip:blk_mem_gen:7.3' within
   repository.
INFO:encore:314 - Created non-GUI application for batch mode execution.
Wrote CGP file for project 'coregen'.
WARNING:coreutil:928 - Unable to find 'xilinx.com:ip:blk_mem_gen:7.3' within
   repository.
ERROR:sim:1002 - Unable to create component instance ''
   (xilinx.com:ip:blk_mem_gen:7.3), as the specified identifier is not
present
   in the CoreGen repository.
ERROR:sim:918 - Could not create instance of component Block Memory
Generator
   v7.3.


On Thu, May 9, 2013 at 8:56 AM, David MacMahon wrote:

> Hi, Ryan,
>
> What is in the coregen.log file (whose name is terribly mangled)?
>
> On May 9, 2013, at 8:43 AM, Ryan Monroe wrote:
>
> >All runtime messages will be recorded in
> >
>  /nas/users/monroe/Desktop/new_algo_v3/monroe_library/dev/v6/radix_4/r4_12ghz_
> >top
> >
>  /XPS_ROACH2_base/implementation/r4_12ghz_top_out_pow_lower_0_ramblk_wrapper/c
> >ore
> >gen.log
>
> Dave
>
>


Re: [casper] Problems with shared_memory blocks on ROACH2?

2013-05-14 Thread Ryan Monroe
Hey all, just an update on this: First, I reverted to an old copy of the 
libraries -- it didn't work, but had a different problem.  It looks like 
after you have more than a certain number of yellow blocks in use, the 
tools start using the opb2opb_lite pcore, which can be had from the 
pcores_for_ise13 that I made a while back.  I only copied the two cores 
it required (I found out about the second one on the second build 
attempt) -- didn't copy all of them for fear of running into the coregen 
issue for some reason.  Now it's building quite nicely.


So, long story short, be wary of weird behavior when you have /many/ 
yellow blocks.


On 05/09/2013 08:56 AM, David MacMahon wrote:

Hi, Ryan,

What is in the coregen.log file (whose name is terribly mangled)?

On May 9, 2013, at 8:43 AM, Ryan Monroe wrote:


All runtime messages will be recorded in

/nas/users/monroe/Desktop/new_algo_v3/monroe_library/dev/v6/radix_4/r4_12ghz_
top

/XPS_ROACH2_base/implementation/r4_12ghz_top_out_pow_lower_0_ramblk_wrapper/c
ore
gen.log

Dave






Re: [casper] Problems with shared_memory blocks on ROACH2?

2013-05-15 Thread Ryan Monroe
Bah, I should have looked more closely... that is absolutely it.  
Clearly, I was exceeding the 32-slave boundary and triggering the need 
for a bridge.  I don't know what caused the initial error, though.


--Ryan Monroe
626.773.0805

On 05/14/2013 11:13 PM, Andrew Martens wrote:

Hi Ryan

Hey all, just an update on this: First, I reverted to an old copy of 
the libraries -- it didn't work, but had a different problem.  It 
looks like after you have more than a certain number of yellow blocks 
in use, the tools start using the opb2opb_lite pcore, which can be 
had from the pcores_for_ise13 that I made a while back.  I only 
copied the two cores it required (I found out about the second one on 
the second build attempt) -- didn't copy all of them in fear of 
running into the coregen issue for some reason.  Now it's building 
quite nicely.


So, long story short, be wary of weird behavior when you have /many/ 
yellow blocks.


The logic deciding how to set up the yellow blocks in the system (in 
casper_library/gen_xps_mod_mhs.m) works  as follows;


Most yellow blocks are hard-coded to use the primary opb bus (opb0) at 
a fixed address offset, e.g SDRAM, QDR SRAM, 10Ge etc. Shared BRAM and 
software registers as a group are allocated a fixed address space but 
are added as they are required.


The bus controller for the primary opb0 bus can support up to 32 
slaves, and a fixed address space size. If more slaves are 
instantiated, or the address space required is greater than what it 
can support, an opb bridge (opb2opb_lite) is connected to the 
controller, and a new controller connected to this bridge, which adds 
another opb bus (opb1) to the system. This process is repeated (add a 
bridge to opb0, connect it to a controller to make a new bus etc) if 
more opb busses are required allowing many hundreds of Shared BRAMs 
and/or software registers. The upper limit is reached when there are as 
many secondary (opb1, opb2 etc) busses as there are open slots (minus 
any fixed opb0 devices) available on the primary opb0 bus controller.


Regards
Andrew





Re: [casper] ROACH2: Cannot communicate over USB

2013-05-20 Thread Ryan Monroe

Hey David,

We noticed the dialout group thing, and have tested it with me being 
both root (via sudo minicom USBTTY2) and dialout.  Thanks!


--Ryan Monroe
626.773.0805

On 05/20/2013 11:46 AM, David MacMahon wrote:

Hi, Ryan,

Everything seems OK.  Were you root or a member of the dialout group when you 
tried connecting?  I use screen instead of minicom, so I can't offer any extra 
ideas regarding minicom itself.

FWIW, /dev/ttyUSB2 is the one that usually works for me.

Dave

On May 20, 2013, at 11:34 AM, Ryan Monroe wrote:


Hey all, sorry for the spam :-(

I'm trying to communicate with my ROACH2 for initial configuration.  My understanding 
is that we should set up a machine per this .  Then use the USB port the 
same way we used serial for ROACH1... just over either ttyUSB2 or ttyUSB3.

My problem is that I receive no communications from the ROACH through USB.  
After configuring the settings in minicom (filename, flow control, init/reset 
string), I can connect to the four USB virtual serial ports on the ROACH2, but 
I receive no data upon starting it up.  Here's the test I ran:




Start up computer.  Connect ROACH2 to power.  Do not start up ROACH2.


Upon logging in, run "dmesg > tc_dmesg_preUSB.txt" (see file)
run "ls /dev/ttyUSB*" (see results in "tc_lsOnUSB.txt")

Plug in the USB cable to both ends
run "dmesg > tc_dmesg_postUSB.txt" (see file)
run "ls /dev/ttyUSB*" (see results in "tc_lsOnUSB.txt")


Run "sudo minicom -s TTYUSB2" and "sudo minicom -s TTYUSB3".  Take a screenshot 
to display the state of the configurations (tc_minicom_settings.jpg)

Run:
"sudo minicom TTYUSB0"
"sudo minicom TTYUSB1"
"sudo minicom TTYUSB2"
"sudo minicom TTYUSB3"

Take a screenshot (tc_before_powering_roach.jpg)
Press the ROACH2 power button
Wait several minutes
Take a screenshot to show that we received nothing from the ROACH. 
(tc_before_powering_roach.jpg ... it was the same result)

Unplug the USB cable
Watch as the four minicom terminals fill with error messages 
(tc_usb_removed.png)
=


So ladies and gents, any ideas?  Thanks in advance!
--
--Ryan Monroe
626.773.0805







Re: [casper] Matlab 2013a and ISE 14.4/5

2013-06-06 Thread Ryan Monroe
Try sourcing the ISE settings64.sh and then run the command "sysgen".  This is
the official way to call System Generator.  You should not be calling
matlab directly to use System Generator.
On Jun 6, 2013 4:44 PM, "Ross Williamson" 
wrote:

> Anybody managed to get MATLAB 2013a and ISE 14.5 working?
>
> When I run  xlAddSysgen([getenv('XILINX_PATH'), '/ISE']) matlab
> segfaults. I can post the gory details but the summary is below.
>
>
> 
>Segmentation violation detected at Thu Jun  6 13:40:17 2013
> 
>
> Configuration:
>   Crash Decoding : Disabled
>   Current Visual : 0x21 (class 4, depth 24)
>   Default Encoding   : UTF-8
>   GNU C Library  : 2.17 stable
>   MATLAB Architecture: glnxa64
>   MATLAB Root: /opt/MATLAB/R2013a
>   MATLAB Version : 8.1.0.604 (R2013a)
>   Operating System   : Linux 3.8.0-23-generic #34-Ubuntu SMP Wed May
> 29 20:22:58 UTC 2013 x86_64
>   Processor ID   : x86 Family 6 Model 58 Stepping 9, GenuineIntel
>   Virtual Machine: Java 1.6.0_17-b04 with Sun Microsystems Inc.
> Java HotSpot(TM) 64-Bit Server VM mixed mode
>   Window System  : The X.Org Foundation (11303000), display :0.0
>
> Fault Count: 1
>
> ...
>
> This error was detected while a MEX-file was running. If the MEX-file
> is not an official MathWorks function, please examine its source code
> for errors. Please consult the External Interfaces Guide for information
> on debugging MEX-files.
>
>
>
> --
> Ross Williamson
> Research Scientist - Sub-mm Group
> California Institute of Technology
> 626-395-2647 (office)
> 312-504-3051 (Cell)
>
>


Re: [casper] Matlab 2013a and ISE 14.4/5

2013-06-06 Thread Ryan Monroe
Try just System Generator before worrying about the CASPER libraries.  I'm out
of town right now but I'll help you out in a week ;-)
On Jun 6, 2013 8:06 PM, "Ross Williamson" 
wrote:

> Tried that - no luck. I think it might be some kind of mismatch in
> stdlibc++ between matlab, ISE and the native ubuntu damn it - time
> to start hacking libraries again.
>
>
>
> On Thu, Jun 6, 2013 at 2:35 PM, Ryan Monroe 
> wrote:
> > Try calling the ise system64.sh and then run the command "sysgen".  This
> is
> > the official way to call system generator.  You should not be calling
> matlab
> > directly to use system generator
> >
> > On Jun 6, 2013 4:44 PM, "Ross Williamson"  >
> > wrote:
> >>
> >> Anybody managed to get MATLAB 2013a and ISE 14.5 working?
> >>
> >> When I run  xlAddSysgen([getenv('XILINX_PATH'), '/ISE']) matlab
> >> segfaults. I can post the gory details but the summary is below.
> >>
> >>
> >> 
> >>Segmentation violation detected at Thu Jun  6 13:40:17 2013
> >> 
> >>
> >> Configuration:
> >>   Crash Decoding : Disabled
> >>   Current Visual : 0x21 (class 4, depth 24)
> >>   Default Encoding   : UTF-8
> >>   GNU C Library  : 2.17 stable
> >>   MATLAB Architecture: glnxa64
> >>   MATLAB Root: /opt/MATLAB/R2013a
> >>   MATLAB Version : 8.1.0.604 (R2013a)
> >>   Operating System   : Linux 3.8.0-23-generic #34-Ubuntu SMP Wed May
> >> 29 20:22:58 UTC 2013 x86_64
> >>   Processor ID   : x86 Family 6 Model 58 Stepping 9, GenuineIntel
> >>   Virtual Machine: Java 1.6.0_17-b04 with Sun Microsystems Inc.
> >> Java HotSpot(TM) 64-Bit Server VM mixed mode
> >>   Window System  : The X.Org Foundation (11303000), display :0.0
> >>
> >> Fault Count: 1
> >>
> >> ...
> >>
> >> This error was detected while a MEX-file was running. If the MEX-file
> >> is not an official MathWorks function, please examine its source code
> >> for errors. Please consult the External Interfaces Guide for information
> >> on debugging MEX-files.
> >>
> >>
> >>
> >> --
> >> Ross Williamson
> >> Research Scientist - Sub-mm Group
> >> California Institute of Technology
> >> 626-395-2647 (office)
> >> 312-504-3051 (Cell)
> >>
> >
>
>
>
> --
> Ross Williamson
> Research Scientist - Sub-mm Group
> California Institute of Technology
> 626-395-2647 (office)
> 312-504-3051 (Cell)
>


Re: [casper] How to avoid design placement/implementation in casper_xps command

2013-07-01 Thread Ryan Monroe
Nope!  But we should change the choices in casper_xps (imo) to:

Update design
XSG
(Everything "copy base package" through Synthesis)
The rest
On Jul 1, 2013 9:28 AM, "Haoxuan Zheng"  wrote:

>  Hi Casper,
> [Question]:
> Is there a way to ask casper_xps to finish the synthesized design and stop
> there (meaning not to attempt placement), so that I can take over in PlanAhead?
> This will save me a few hours each run. I checked the casper wiki for
> casper_xps but I'm not sure which step corresponds to completion of
> synthesized design.
>
>  [Background]:
> We have an X-engine design that is guaranteed not to meet timing in
> casper_xps compiling, but we have found a reliable floor planning strategy
> (pblocks etc) in PlanAhead to make it compile. I have been debugging it
> thus making minor changes that makes no difference as far as compiling is
> concerned, so it is a waste of time in casper_xps to go from a synthesised
> design to an implemented design, given that its implementation will fail
> timing anyways. (I'm using planahead language here, sorry if that's not
> clear.)
>
>  Thank you guys so much!
> Jeff
>


Re: [casper] How to avoid design placement/implementation in casper_xps command

2013-07-01 Thread Ryan Monroe
Wait until it hits MAP and hit CTRL-C? Then do your business and it'll 
usually skip synthesis on the next run


--Ryan Monroe
626.773.0805

On 07/01/2013 05:18 PM, Jeff Zheng wrote:

Hi Ryan,
Thanks a lot for the quick answer! Given the current casper_xps, is 
there any command I can skip to avoid placement? Does IP Synthesis 
finish the synthesized design?


Jeff


On Mon, Jul 1, 2013 at 12:37 PM, Ryan Monroe wrote:


Nope!  But we should change the choices on Casper xps (imo) to

Update design
XSG
(Everything "copy base package" through Synthesis)
The rest

On Jul 1, 2013 9:28 AM, "Haoxuan Zheng" wrote:

Hi Casper,
[Question]:
Is there a way to ask casper_xps to finish synthesized design
and stop there (meaning not to attempt placement), and I take
over in PlanAhead? This will save me a few hours each run. I
checked the casper wiki for casper_xps but I'm not sure which
step corresponds to completion of synthesized design.

[Background]:
We have an X-engine design that is guaranteed not to meet
timing in casper_xps compiling, but we have found a reliable
floor planning strategy (pblocks etc) in PlanAhead to make it
compile. I have been debugging it thus making minor changes
that makes no difference as far as compiling is concerned, so
it is a waste of time in casper_xps to go from a synthesised
design to an implemented design, given that its implementation
will fail timing anyways. (I'm using planahead language here,
sorry if that's not clear.)

Thank you guys so much!
Jeff






Re: [casper] capser_xps "Cannot find any compiled XSG netlist"

2013-07-08 Thread Ryan Monroe

Spaces..  could have been that too

--Ryan Monroe
626.773.0805

On 07/08/2013 11:05 AM, Haoxuan Zheng wrote:

Hi Casper,
I started getting this strange casper_xps fail yesterday:
"
XSG generation complete.
#
## Copying base system ##
#

source_dir =

/mnt/data0/omniscope/programs/mlib_devel/xps_base/XPS_ROACH2_base

Copying base package from:
 /mnt/data0/omniscope/programs/mlib_devel/xps_base/XPS_ROACH2_base

## Copying custom IPs ##

##
## Creating Simulink IP ##
##
*Error using gen_xps_create_pcore (line 41)
Cannot find any compiled XSG netlist. Have you run the Xilinx System 
Generator on your design ?

*"


Has anyone seen this error before?

Thanks a lot!
Jeff




Re: [casper] capser_xps "Cannot find any compiled XSG netlist"

2013-07-08 Thread Ryan Monroe
Yup, look to see if you have numbers or capital letters in your mdl 
file.  I don't remember what it was, but for some cases, XSG silently 
renames the output netlist in a way that casper_xps does not recognize


--Ryan Monroe
626.773.0805

On 07/08/2013 11:05 AM, Haoxuan Zheng wrote:

Hi Casper,
I started getting this strange casper_xps fail yesterday:
"
XSG generation complete.
#
## Copying base system ##
#

source_dir =

/mnt/data0/omniscope/programs/mlib_devel/xps_base/XPS_ROACH2_base

Copying base package from:
 /mnt/data0/omniscope/programs/mlib_devel/xps_base/XPS_ROACH2_base

## Copying custom IPs ##

##
## Creating Simulink IP ##
##
*Error using gen_xps_create_pcore (line 41)
Cannot find any compiled XSG netlist. Have you run the Xilinx System 
Generator on your design ?

*"


Has anyone seen this error before?

Thanks a lot!
Jeff




Re: [casper] (no subject)

2013-07-08 Thread Ryan Monroe
Maybe katcp has some uncertainty on operation times? Write some C or put
some code on the board and use the old bash scripting way
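If you do put a free-running counter in the design, the host-side arithmetic is simple.  A Python sketch of what katcp's est_brd_clk presumably does under the hood (the function name and wrap handling here are my assumptions, not the actual katcp code):

```python
def est_clk_mhz(c0, c1, dt_s, width=32):
    """Estimate a clock rate in MHz from two reads of a
    free-running counter taken dt_s seconds apart.

    The modulo handles the counter wrapping once, as long as
    fewer than 2**width ticks elapse between the reads.
    """
    ticks = (c1 - c0) % (1 << width)
    return ticks / dt_s / 1e6

# A 202 MHz clock read 10 s apart, with the counter starting
# near the top of its 32-bit range so it wraps between reads:
c0 = (1 << 32) - 1000
c1 = (c0 + int(202e6 * 10)) % (1 << 32)
print(est_clk_mhz(c0, c1, 10.0))  # -> 202.0
```

The main uncertainty in practice is the timing of the two reads themselves (katcp round trips), which is presumably why a longer interval between reads gives a more reliable estimate.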
On Jul 8, 2013 1:21 PM, "Haoxuan Zheng"  wrote:

>  Hi Casper,
> Is there any way to find out the FPGA clock rate precisely? We are
> suspecting that our ROACH2s are not running at the frequency (202MHz) as we
> specified in the design. We tried the python function in katcp, but it does
> not look reliable enough. We hacked that function and increased the time
> between counter reads to 30 seconds (instead of 2 seconds), and the reading
> is still not that reliable. They all read roughly around 200.2MHz, and if
> that's the case, then we must be doing something wrong regarding the clocks
> and our 202MHz did not go into the design. We are using internal FPGA
> clocks with no ADC or anything.
>
> Background:
> We are testing 4 ROACH1 F-engine to 4 ROACH2 X-engine 10GBE set up, 16
> connections in between. Each F-engine is sending to all 4 X-engines, and
> each X-engine is receiving from all 4 F-engines. We have a very simple
> sending and receiving FIFO buffer logic. The thing we see is that one
> particular X-engine always has all 4 buffers filled up in a fixed amount of
> time (32K buffer fills up in ~270 seconds). We swapped a lot of things
> around to test which part is wrong in the system, and we nailed it down to
> something physical in that X-engine, and the fact that it fills up indicates
> that the X-engine clock is running slower than we set it to, because we set
> the X-engine clock to be 202MHz vs 200MHz on the F-engine. Therefore we would
> like a definitive measurement of the actual X-engine clock rate.
>
> Plot explanation:
> It's plotting one of the four FIFO buffers filling up over 270 seconds. The
> vertical axis is the literal number of values currently stored in the
> FIFO. This FIFO should on average get 1 number every 8 clocks from the
> F-engine (200MHz) receiver, and should get 1 number pumped out every 8
> clocks on the X-engine (202MHz), so this long-term build-up definitely
> suggests that the clocks are not what we think.
>
>
> Thanks a lot and sorry for the long email!
>
> Jeff
>


Re: [casper] (no subject)

2013-07-08 Thread Ryan Monroe
While you're at it, check what clock source you have in xsg core config.
Also, adjust your source clock and see if the result from est-brd-clk
changes
On Jul 8, 2013 7:20 PM, "Dan Werthimer"  wrote:

>
>
> it might be difficult in simulink to connect clock to sync_out,
> but you could use a divide by 16 counter, enabled all the time,
> and then connect the divider output to the sync output
> and measure with a frequency counter.
>
> dan
>
>
> On Mon, Jul 8, 2013 at 1:33 PM, Matt Dexter  wrote:
>
>> How about making a design that delivers a copy of the
>> FPGA clock to the ROACH2's SYNC_OUT pin J11?  Then you
>> could connect that signal into a frequency counter,
>> oscilloscope, or ...
>> This won't be super low jitter but otherwise it will
>> be a fair representation.
>>
>>
>> On Mon, 8 Jul 2013, Haoxuan Zheng wrote:
>>
>>  Date: Mon, 8 Jul 2013 20:20:50 +
>>> From: Haoxuan Zheng 
>>> To: "casper@lists.berkeley.edu" 
>>> Subject: [casper] (no subject)
>>>
>>> Hi Casper,
>>> Is there any way to find out the FPGA clock rate precisely? We suspect
>>> that our ROACH2s are not running at the frequency (202MHz) we specified
>>> in the design. We tried the python function in katcp, but it does not
>>> look reliable enough. We hacked that function and increased the time
>>> between counter reads to 30 seconds (instead of 2 seconds), and the
>>> reading is still not very reliable. They all read roughly 200.2MHz, and
>>> if that's the case, then we must be doing something wrong with the
>>> clocks and our 202MHz did not make it into the design. We are using
>>> internal FPGA clocks with no ADC or anything.
>>>
>>> Background:
>>> We are testing 4 ROACH1 F-engine to 4 ROACH2 X-engine 10GBE set up, 16
>>> connections in between. Each F-engine is sending to all 4 X-engines, and
>>> each X-engine is receiving from all 4 F-engines. We have a very simple
>>> sending and receiving FIFO buffer logic. What we see is that one
>>> particular X-engine always has all 4 buffers fill up in a fixed amount
>>> of time (the 32K buffer fills in ~270 seconds). We swapped a lot of
>>> things around to test which part of the system is at fault, and we
>>> nailed it down to something physical in that X-engine. The fact that
>>> the buffers fill up indicates that the X-engine clock is running slower
>>> than we set it to, because we set the X-engine clock to 202MHz vs
>>> 200MHz on the F-engine. Therefore we would like a definitive
>>> measurement of the actual X-engine clock.
>>>
>>> Plot explanation:
>>> It's plotting one of the four FIFO buffers filling up over 270 seconds.
>>> The vertical axis is the literal number of values currently stored in
>>> the FIFO. This FIFO should on average get 1 value in every 8 clocks
>>> from the F-engine (200MHz) receiver, and have 1 value pumped out every
>>> 8 clocks on the X-engine (202MHz), so this long-term build-up
>>> definitely suggests that the clocks are not what we think.
>>>
>>>
>>> Thanks a lot and sorry for the long email!
>>>
>>> Jeff
>>>
>>>
>>>
>>
>
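
One more pitfall with the counter-read approach Jeff describes: a 32-bit
counter at ~200MHz wraps roughly every 21 seconds, so a 30-second interval
between reads spans at least one wraparound, and any wrap beyond the first
is invisible to a modulo correction — that alone could explain the unreliable
readings. A sketch of a wrap-aware measurement (register name sys_clkcounter
and the corr-style read_uint call are assumptions; adapt them to your design):

```python
import time

def clock_from_counts(c0, c1, dt_s, counter_bits=32):
    """Clock estimate from two reads of a free-running counter.

    Only valid if at most ONE wrap occurred between the reads,
    i.e. dt_s < 2**counter_bits / f_clk (about 21 s at 200MHz).
    """
    ticks = (c1 - c0) % (1 << counter_bits)  # modulo absorbs a single wrap
    return ticks / float(dt_s)

def est_clock_hz(read_counter, interval_s=10.0):
    """Measure the clock; read_counter is a callable, e.g. (with the
    corr library) lambda: fpga.read_uint('sys_clkcounter')."""
    t0 = time.time(); c0 = read_counter()
    time.sleep(interval_s)
    t1 = time.time(); c1 = read_counter()
    return clock_from_counts(c0, c1, t1 - t0)

# Arithmetic-only examples (no hardware required):
print(clock_from_counts(100, 100 + 202000000, 1.0))  # 202MHz, no wrap
print(clock_from_counts((1 << 32) - 50, 150, 1e-6))  # wraps once mid-interval
```

Keeping the interval under the wrap period (or counting wraps explicitly with
frequent reads) matters more than making the interval long.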


Re: [casper] ROACH2 dies on fpga.read(...)

2013-07-11 Thread Ryan Monroe

Thanks!  Sounds good

Also: I take back the deterministic part.  The other roach started 
having the problem too, and it might have something to do with read 
lengths.  More to follow (eventually)


--Ryan Monroe
626.773.0805

On 07/11/2013 05:20 PM, John Ford wrote:

Hi Ryan.  We had this problem, which appeared to be a "lockup".  I think
that Glenn and some others corresponded about it, and it was due to trying
to read/write bytes instead of words over the opb bus with a buggy kernel
or a buggy library.

You might search through the mailing list for Glenn's name in about
November of last year.

John



Hey all,

I'm trying to test out a new bit file (it uses the "pcore" feature and
has 4 black boxes under the hood for what it's worth). *On one ROACH2 it
works just fine* (in the context of this problem).

On the other one, for ~1/5 of the registers, upon reading that register
the ROACH2 stops responding to all katcp commands.  From dmesg, it looks
like tcpborphserver is crashing.  It appears that the registers which
kill it are deterministic across programmings. It also looks like the
registers which fail are all shared_brams, but there is nothing
exceptional about the ones which fail, imho

Attached are the results of a python script on the two roaches, and a
dmesg output of the failed board.  In addition, pictures of the
configuration for both roaches.

Anyone seen this before?

--
--Ryan Monroe
626.773.0805









Re: [casper] ROACH2 dies on fpga.read(...)

2013-07-17 Thread Ryan Monroe

Hey Glenn, all,

Just to follow up on this, I reverted back to the old version, as 
indicated by Glenn here.  This solved my problem.  Thanks glenn!


--Ryan Monroe
626.773.0805

On 07/12/2013 07:26 AM, G Jones wrote:

Below is a message I wrote with more about the problems we had at
NRAO, which did not make it to the list. By the way, others at NRAO
are using a recent version of the repository and have had better luck,
but based on your experience I wonder if there is still some subtle
issue with marginal signals or timing on some boards.

Glenn

Previous message:

The problem was because of some errors that crept into the ska-sa
repository. I had to revert to a commit BEFORE this one
https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
  (easy to remember since the hash starts with 'bad' :)
Something about this EPB to OPB optimization they did messes things
up. In theory they reverted these changes, but I found it still was
present last time I looked. And this is of course the least fun kind
of problem to keep checking if it's still there...
Note I had other issues with ROACH1s with this commit too.


On Thu, Jul 11, 2013 at 8:23 PM, Ryan Monroe  wrote:

Thanks!  Sounds good

Also: I take back the deterministic part.  The other roach started having
the problem too, and it might have something to do with read lengths.  More
to follow (eventually)

--Ryan Monroe
626.773.0805


On 07/11/2013 05:20 PM, John Ford wrote:

Hi Ryan.  We had this problem, which appeared to be a "lockup".  I think
that Glenn and some others corresponded about it, and it was due to trying
to read/write bytes instead of words over the opb bus with a buggy kernel
or a buggy library.

You might search through the mailing list for Glenn's name in about
November of last year.

John



Hey all,

I'm trying to test out a new bit file (it uses the "pcore" feature and
has 4 black boxes under the hood for what it's worth). *On one ROACH2 it
works just fine* (in the context of this problem).

On the other one, for ~1/5 of the registers, upon reading that register
the ROACH2 stops responding to all katcp commands.  From dmesg, it looks
like tcpborphserver is crashing.  It appears that the registers which
kill it are deterministic across programmings. It also looks like the
registers which fail are all shared_brams, but there is nothing
exceptional about the ones which fail, imho

Attached are the results of a python script on the two roaches, and a
dmesg output of the failed board.  In addition, pictures of the
configuration for both roaches.

Anyone seen this before?

--
--Ryan Monroe
626.773.0805









[casper] Idea for speeding up CASPER build times

2013-07-23 Thread Ryan Monroe

Hey all,

So I just finished a ROACH2 build, and everything from "copy base 
package" through bitgen took almost 2 hours.  It looks like this is 
because I have 48 "shared memories", plus about another 20 
"snapshots".   The tools seem to be synthesizing each of these separately.


I would expect that many designs have several yellow blocks which are 
instantiated multiple times with identical parameters.  I don't have a 
strong background in the CASPER toolflow, but does anyone know if 
there's a way for us to cache synthesis products of identical yellow blocks?


--
--Ryan Monroe
626.773.0805




Re: [casper] Fwd: error in the slice

2013-08-07 Thread Ryan Monroe
Also, make sure the constant is at least fifteen bits
On Aug 6, 2013 10:38 PM, "Andrew Martens"  wrote:

> Hi Katty
>
>
>
>>
>>
>> Hi All,
>>
>> I tested how the number of channels of existing digital spectrometers
>> scales, but when the size of the PFB and FFT is 2^15 points (with the
>> other blocks, such as acc_cntrl and vacc, resized to match), I get an
>> error:
>>
>> In the console of matlab:
>>
>> Error using gen_xps_files (line 196)
>> The S-function 'sysgen' in
>> 'h1/fft_wideband_real/fft_biplex_real_4x0/biplex_core/fft_stage_1/Slice'
>> has specified the option SS_OPTION_PORT_SAMPLE_TIMES_ASSIGNED and
>> specified inherited for sample time number 0. Inheriting a sample time is
>> not supported when specifying SS_OPTION_PORT_SAMPLE_TIMES_ASSIGNED
>>
>>
> I think the Slice block giving the error is used to extract a bit of the
> Shift input. Are you using a constant block for the FFT shift input? If you
> are, turn on the option to make it a 'Sampled constant' in the mask, and
> make the 'Sample period' 1. I have also seen System Generator give strange
> 'Internal errors' sometimes when trying to propagate sample times that have
> not been set.
>
> Regards
> Andrew
>
>
>
>
>
>


[casper] Are you using the "black box" feature? If so, this will speed up your compile times...

2013-08-13 Thread Ryan Monroe

Hey CASPER,

I've been working with some designs which have an obnoxious number of 
yellow blocks, and the compile times are considerable (there was a 
thread about this and caching netlists earlier).


If you're just updating some of your logic but not changing any
yellow blocks, you can actually do the following to recompile your
design without re-synthesizing all of those yellow blocks:


(initial state: run through EDK/ISE/Bitgen at least once)
1.  For each core which is being revised, click Generate on XSG
2.  Copy the revised cores to XPS_ROACH_BASE/pcores
3.  Check that they are, indeed, revised ("ls -ltrh")
4.  Navigate to XPS_ROACH_BASE/implementation
5. "rm system.bld system.ngc system.ncd system.bit system.par"
6. Run EDK/ISE/Bitgen on casper_xps
7. Thank me for giving you an hour of your life back
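
The file deletions in steps 4-5 are easy to get wrong by hand; here is a
minimal Python sketch of them (the file list mirrors step 5; the
XPS_ROACH_BASE/implementation path is whatever your build tree uses):

```python
import os

# Back-end products that must go so EDK/ISE/bitgen re-runs (step 5 above).
STALE_PRODUCTS = ["system.bld", "system.ngc", "system.ncd",
                  "system.bit", "system.par"]

def clear_stale_products(impl_dir):
    """Delete stale implementation products; returns the names removed."""
    removed = []
    for name in STALE_PRODUCTS:
        path = os.path.join(impl_dir, name)
        if os.path.exists(path):
            os.remove(path)
            removed.append(name)
    return removed

# e.g. clear_stale_products("XPS_ROACH_BASE/implementation")
```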


I tried this for the user_ip core (the main one, which holds everything
that isn't in a pcore), but it seems to be synthesized into a wrapper
earlier in the flow, so you can't use this trick on it.  Maybe some
enterprising CASPERite can refactor the code to make it possible.


--
--Ryan Monroe
626.773.0805




Re: [casper] Random failures on Compile

2013-10-22 Thread Ryan Monroe
I've also experienced these.  The most common cause seems to be a constant
which is not "sampled".  I don't have a good rule, but sometimes when I
start removing components on my design, it starts working, which shows me
the element at fault.  In general, placing a component which is known to
work on its own into a larger system does not cause problems.

Then there are also times when I reduce the entire system to something
trivial, like a counter, and it still happens.  In that case I save copies
of everything and reboot.  Followed by anger management.
On Oct 22, 2013 11:44 AM, "Ross Williamson" 
wrote:

> I'm in a rather frustrating situation where the casper tools will only
> compile in a seemingly random manner.  I can load a design that I
> know compiles into Simulink, run it, and I will get an error like
> the following:
>
> Error reported by S-function 'sysgen' in
> 'ccat_corr_lowbits/ADC0/Constant6':
> An internal error occurred in the Xilinx Blockset Library.
>
> There is nothing obviously wrong with the constant (and the whole .slx
> has compiled as-is before).
>
> Shutting down and restarting matlab sometimes fixes the issue, and
> trying to compile a different .slx and then going back to the one
> that doesn't work sometimes fixes it (sometimes not).  I can
> screw around a whole day trying to get it to compile, and the next day
> it works fine.
>
> Has anyone experienced this (pseudo) random compile problems and know
> of a good sequence to make a model compile again?
>
> Ross
>
>
> --
> Ross Williamson
> Research Scientist - Sub-mm Group
> California Institute of Technology
> 626-395-2647 (office)
> 312-504-3051 (Cell)
>
>


Re: [casper] Library must be saved before creating new library links

2013-11-18 Thread Ryan Monroe
Sounds like a time to set breakpoints and step through the code to me
On Nov 18, 2013 11:00 AM, "G Jones"  wrote:

> Hi,
> When I try to update the FFT (ska-sa F505ED55C8) I'm getting an error
> in fft_wideband_real/fft_direct/butterfly0_0/twiddle/coeff_gen:
> Initialization commands cannot be evaluated. --> Library must be saved
> before creating new library links
>
> This is a new one to me. Any ideas what it means? The libraries are
> all saved (I haven't even opened them). The model has been saved too.
>
> Thanks,
> Glenn
>
>


Re: [casper] Problems with Speed Optimization toolflow

2013-11-22 Thread Ryan Monroe
Hey Andres, my strategy has generally been to use PlanAhead to generate a
UCF file, which I then place at data/system.ucf, then re-run the tools for
EDK/ISE/bitgen.

Works consistently for me
On Nov 22, 2013 11:31 AM, "Andres Alvear"  wrote:

> Hi everyone,
>
> I'm working on speed optimization with PlanAhead. I have a Simulink design
> of a 2048-channel spectrometer with two ADC083000 ADCs at 1GSPS in
> interleaved mode. I want to increase the bandwidth from the current 500MHz
> to 1GHz and of course increase the number of channels to at least 4096,
> but with the conventional toolflow that is impossible.
>
> First I told the system I wanted it to run at 250MHz, but my actual clock
> rate is about 120MHz, which is far too low! However, the system is stable
> up to 125MHz, so I can set the ADC clock rate to 500MHz to get 1GSPS, i.e.
> 500MHz of bandwidth from each ADC.
>
> So I have been floorplanning the hardware implemented in the Virtex-5
> SX95T in PlanAhead, and after the floorplanning I edited my constraint
> file as Ryan Monroe described in his last memo. I got a 23% speed
> improvement, from 120MHz to 148MHz, but I need to meet timing at at least
> 200MHz. However, I have problems generating functional BORPH executables,
> and I'm hoping someone can help me figure out why, since I'm targeting
> high speeds. This is the error from BORPH when I try to run from an ssh
> session:
>
> root@roach:/boffiles# ./system_2.bof
>
> -bash: ./system_2.bof: Input/output error
>
> Then, in an ipython 2.7 terminal, I check whether I managed to connect to
> the ROACH:
>
> In [9]: fpga.is_connected()
>
> Out[9]: True
>
> Let's set the bitstream running using the progdev() command:
>
> In [10]: fpga.progdev('system_2.bof') <---generated from mkbof
>
> Out[10]: 'ok'
>
> Looking at the ROACH, the LEDs are not blinking. I placed them to monitor
> the operation of my design, but neither led0_sync nor led1_new_acc blinks
> at all.
>
>
> Do you think I am on the right track? Does anyone know something about
> these problems?
>
>
> Cheers!
>
> Andres Alvear
>


Re: [casper] Problems with Speed Optimization toolflow

2013-11-26 Thread Ryan Monroe

Hey Andres,
Closing timing on an FPGA design is not easy.  I'd help you, but right 
now I'm a grad student drowning in my own work.  Sorry!  I'll refer 
you to this (highly unprofessional) report I gave someone for closing 
timing on a ROACH2 design once.  That's the best I can offer you now.


https://dl.dropboxusercontent.com/u/2832602/roach2_timing.zip

--Ryan Monroe
626.773.0805

On 11/26/2013 09:04 AM, Andres Alvear wrote:

Thanks Ryan,
I have just generated my first .bof after successfully re-running the 
tools for EDK/ISE/Bitgen, but I have not been able to see any speed 
improvement so far, as you can see from my results in the attached 
picture. After the compilation my design ran with a clock rate of 
54MHz, reaching 216MHz of bandwidth on each spectrometer. The 
constraints generated in the floorplanning process were introduced 
into the "system.ucf" file located in the following folders:

/opt/workspace/spectrometer_dctrl_op/XPS_ROACH_base/data
/opt/workspace/spectrometer_dctrl_op/XPS_ROACH_base/implementation

In the "data" folder I removed system.ucf and system.ucf.bac and put 
my version (with the floorplan) of "system.ucf" in its place. Then, 
in the "implementation" folder, I replaced the "system.ucf" file with 
my version. Finally, I opened the Simulink design and re-ran it with 
just EDK/ISE/Bitgen. I had a successful compilation with a new .bof 
file, and it works on the ROACH 1. However, my timing constraints 
were not met. I'm attaching my constraints to see if you have any 
idea of the possible problems. Given the attached constraint file, 
what values would you put in the constraints so that the system runs 
at 400MHz? What do you think about my global timing constraints? 
Specifically, what are your thoughts on the timing groups generated 
by the casper_xps toolflow compilation? Are they all right?


Cheers

Andres Alvear




2013/11/22 Ryan Monroe :


Hey Andres, my strategy has generally been to use PlanAhead to
generate a UCF file, which I then place at data/system.ucf, then
re-run the tools for EDK/ISE/bitgen.

Works consistently for me

On Nov 22, 2013 11:31 AM, "Andres Alvear" wrote:

Hi everyone,

I'm working on speed optimization with PlanAhead. I have a
Simulink design of a 2048-channel spectrometer with two
ADC083000 ADCs at 1GSPS in interleaved mode. I want to
increase the bandwidth from the current 500MHz to 1GHz and
of course increase the number of channels to at least 4096,
but with the conventional toolflow that is impossible.

First I told the system I wanted it to run at 250MHz, but
my actual clock rate is about 120MHz, which is far too low!
However, the system is stable up to 125MHz, so I can set
the ADC clock rate to 500MHz to get 1GSPS, i.e. 500MHz of
bandwidth from each ADC.

So I have been floorplanning the hardware implemented in
the Virtex-5 SX95T in PlanAhead, and after the
floorplanning I edited my constraint file as Ryan Monroe
described in his last memo. I got a 23% speed improvement,
from 120MHz to 148MHz, but I need to meet timing at at
least 200MHz. However, I have problems generating
functional BORPH executables, and I'm hoping someone can
help me figure out why, since I'm targeting high speeds.
This is the error from BORPH when I try to run from an ssh
session:

root@roach:/boffiles# ./system_2.bof

-bash: ./system_2.bof: Input/output error

Then, in an ipython 2.7 terminal, I check whether I managed
to connect to the ROACH:

In [9]: fpga.is_connected()

Out[9]: True

Let's set the bitstream running using the progdev() command:

In [10]: fpga.progdev('system_2.bof') <---generated
from mkbof

Out[10]: 'ok'

Looking at the ROACH, the LEDs are not blinking. I placed
them to monitor the operation of my design, but neither
led0_sync nor led1_new_acc blinks at all.


Do you think I am on the right track? Does anyone know
something about these problems?


Cheers!

Andres Alvear






Re: [casper] Roach-1 startup problem

2013-12-21 Thread Ryan Monroe
Sounds to me like a corrupted filesystem.  Try this:

http://www.mail-archive.com/casper@lists.berkeley.edu/msg03241.html



On Sat, Dec 21, 2013 at 3:05 PM, Rick Raffanti  wrote:

> Hello caspersphere,
> Can anybody help me?  My old Roach-1 board has suddenly stopped
> networking.  I get a boot message of "setting up networking.. line
> 85:/etc/network/run/ifstate: Stale NFS file handle  ...failed!"
> Though I can configure the FPGA etc via the serial port, networking
> doesn't work.  I haven't used it in a few months, but I've tried booting
> from both a thumb drive and a SD card, both of which used to work.
>
> Thanks
>
> Rick
>

