Re: [casper] spectrometer implementation using LX110T instead of SX95T

2014-10-27 Thread Jack Hickish
Hi Louis,

I've just checked the spec sheet - 64 multipliers!! I'm guessing you ran
out of slices when you (or maybe the compiler) pushed lots of multipliers
into logic? (the lx has more slices than the sx)
Maybe send around your utilisation summary tomorrow - it sounds like you
might need to find some pretty substantial savings from somewhere.
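
If it's useful, a quick-and-dirty script along these lines will pull the
headline numbers out of the map report (untested, and it assumes the report
sits at system.mrp and uses the usual Virtex-5 wording for the utilisation
lines, so adjust the patterns for your ISE version):

import re

PATTERNS = {
    'Slices':        r'Number of occupied Slices:\s*([\d,]+)\s+out of\s+([\d,]+)',
    'DSP48Es':       r'Number of DSP48Es:\s*([\d,]+)\s+out of\s+([\d,]+)',
    'BlockRAM/FIFO': r'Number of BlockRAM/FIFO:\s*([\d,]+)\s+out of\s+([\d,]+)',
}

def utilisation(report='system.mrp'):
    text = open(report).read()
    for name, pat in sorted(PATTERNS.items()):
        m = re.search(pat, text)
        if m:
            used, avail = [int(x.replace(',', '')) for x in m.groups()]
            print('%-14s %6d / %6d  (%3.0f%%)' % (name, used, avail,
                                                  100.0 * used / avail))

if __name__ == '__main__':
    utilisation()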

A few things which might help save logic:
- hard code the fft shift schedule
- lower the number of fft bits, or use bit growth (see the sketch after this
list)
- if the fft uses pipelined convert cores for rounding, a cheap rounding
strategy with low latency should help.
- combine the fir filters into a single block which shares logic (if you
haven't already), same with the fft.
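
To put rough numbers on the bit-growth point (only a back-of-the-envelope
sketch of the usual one-bit-per-stage rule of thumb, not exactly what the
CASPER fft block does):

import math

def fft_output_bits(n_chans, input_bits, shift_stages):
    # rule of thumb: each radix-2 stage can grow the data by one bit, so at
    # each stage the word either grows or gets shifted down by the schedule
    stages = int(round(math.log(2 * n_chans, 2)))   # 2*n_chans-point real FFT
    return input_bits + max(stages - shift_stages, 0)

# e.g. a 2048-channel, 8-bit-input FFT (12 stages):
print(fft_output_bits(2048, 8, shift_stages=12))    # shift every stage -> 8 bits
print(fft_output_bits(2048, 8, shift_stages=0))     # full bit growth   -> 20 bits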

But you might be better off finding a different roach :)

Jack
 On 28 Oct 2014 00:37, "Louis Dartez"  wrote:

> Hi Dan,
> Slices is what seems to be the problem from the error report (which I can
> send around tomorrow morning when I’m back in the lab). I seem to remember
> that the compiler raised an error stating that I was trying to use ~80k
> slices when only ~60k are available. I knew this would be a slippery slope
> when I started. But it would be great if we could salvage our LX110T.
>
> Any chance someone out there has a ROACH1 SX95T that’s just collecting
> dust?
> L
>
> Louis P. Dartez
>
> Graduate Research Assistant
>
> STARGATE
>
> Center for Advanced Radio Astronomy
>
> University of Texas Rio Grande Valley
>
> (956) 372-5812
>
>
> On Oct 27, 2014, at 7:31 PM, Dan Werthimer  wrote:
>
> hi louis,
>
> are you running out of memory?   dsp48's?  slices?
>
> if memory, the easiest thing to do is cut back on
> number of frequency channels.
>
> best,
>
> dan
>
>
> On Mon, Oct 27, 2014 at 4:59 PM, Louis Dartez 
> wrote:
>
> Hi all,
>
> I have implemented a 4 channel 200MHz correlating spectrometer on a ROACH 1
> using the Virtex5 SX95T. Currently, I am trying to get this same design to
> compile and run on an LX110T chip instead. I know that the SX95T is much
> more DSP intensive and more suitable for this sort of thing. During
> compilation for the LX110T I ran into the expected problem of the design
> needing more resources than the LX110T provides. I was wondering if anyone
> had any tips/advice on how to go about this? Has anyone out there run into
> similar situations? What knobs should I be able to tweak to get the design
> to compile for an LX110T? Is it even possible?
>
> I’d be more than happy to share my mdl (slx) files if needed. :)
>
> Thanks in advance!
> L
>
> Louis P. Dartez
>
> Graduate Research Assistant
>
> STARGATE
>
> Center for Advanced Radio Astronomy
>
> University of Texas at Brownsville
>
> (956) 372-5812
>
>
>
>


Re: [casper] spectrometer implementation using LX110T instead of SX95T

2014-10-27 Thread Louis Dartez
Hi Dan, 
Slices is what seems to be the problem from the error report (which I can 
send around tomorrow morning when I’m back in the lab). I seem to remember 
that the compiler raised an error stating that I was trying to use ~80k 
slices when only ~60k are available. I knew this would be a slippery slope 
when I started. But it would be great if we could salvage our LX110T. 

Any chance someone out there has a ROACH1 SX95T that’s just collecting 
dust?
L
> Louis P. Dartez
> Graduate Research Assistant
> STARGATE
> Center for Advanced Radio Astronomy
> University of Texas Rio Grande Valley
> (956) 372-5812

> On Oct 27, 2014, at 7:31 PM, Dan Werthimer  wrote:
> 
> hi louis,
> 
> are you running out of memory?   dsp48's?  slices?
> 
> if memory, the easiest thing to do is cut back on
> number of frequency channels.
> 
> best,
> 
> dan
> 
> 
> On Mon, Oct 27, 2014 at 4:59 PM, Louis Dartez  wrote:
>> Hi all,
>> 
>> I have implemented a 4 channel 200MHz correlating spectrometer on a ROACH 1
>> using the Virtex5 SX95T. Currently, I am trying to get this same design to
>> compile and run on an LX110T chip instead. I know that the SX95T is much more
>> DSP intensive and more suitable for this sort of thing. During compilation
>> for the LX110T I ran into the expected problem of the design needing more
>> resources than the LX110T provides. I was wondering if anyone had
>> any tips/advice on how to go about this? Has anyone out there run into
>> similar situations? What knobs should I be able to tweak to get the design
>> to compile for an LX110T? Is it even possible?
>> 
>> I’d be more than happy to share my mdl (slx) files if needed. :)
>> 
>> Thanks in advance!
>> L
>> 
>> Louis P. Dartez
>> 
>> Graduate Research Assistant
>> 
>> STARGATE
>> 
>> Center for Advanced Radio Astronomy
>> 
>> University of Texas at Brownsville
>> 
>> (956) 372-5812
>> 
>> 



Re: [casper] spectrometer implementation using LX110T instead of SX95T

2014-10-27 Thread Dan Werthimer
hi louis,

are you running out of memory?   dsp48's?  slices?

if memory, the easiest thing to do is cut back on
number of frequency channels.

best,

dan


On Mon, Oct 27, 2014 at 4:59 PM, Louis Dartez  wrote:
> Hi all,
>
> I have implemented a 4 channel 200MHz correlating spectrometer on a ROACH 1
> using the Virtex5 SX95T. Currently, I am trying to get this same design to
> compile and run on an LX110T chip instead. I know that the SX95T is much more
> DSP intensive and more suitable for this sort of thing. During compilation
> for the LX110T I ran into the expected problem of the design needing more
> resources than the LX110T provides. I was wondering if anyone had
> any tips/advice on how to go about this? Has anyone out there run into
> similar situations? What knobs should I be able to tweak to get the design
> to compile for an LX110T? Is it even possible?
>
> I’d be more than happy to share my mdl (slx) files if needed. :)
>
> Thanks in advance!
> L
>
> Louis P. Dartez
>
> Graduate Research Assistant
>
> STARGATE
>
> Center for Advanced Radio Astronomy
>
> University of Texas at Brownsville
>
> (956) 372-5812
>
>



[casper] spectrometer implementation using LX110T instead of SX95T

2014-10-27 Thread Louis Dartez
Hi all, 

I have implemented a 4 channel 200MHz correlating spectrometer on a 
ROACH 1 using the Virtex5 SX95T. Currently, I am trying to get this same design 
to compile and run on an LX110T chip instead. I know that the SX95T is much more 
DSP intensive and more suitable for this sort of thing. During compilation for 
the LX110T I ran into the expected problem of the design needing more resources 
than the LX110T provides. I was wondering if anyone had any tips/advice on how 
to go about this? Has anyone out there run into similar situations? What knobs 
should I be able to tweak to get the design to compile for an LX110T? Is it 
even possible?

I’d be more than happy to share my mdl (slx) files if needed. :)

Thanks in advance!
L
> Louis P. Dartez
> Graduate Research Assistant
> STARGATE
> Center for Advanced Radio Astronomy
> University of Texas at Brownsville
> (956) 372-5812



Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread David MacMahon
Hi, Richard and Peter,

Another possibility that crossed my mind is perhaps your ROACH2s were from the 
batch where the incorrect oscillator was installed for U72.  This seems 
unlikely for Richard based on this email (which also describes the incorrect 
oscillator problem in general):

https://www.mail-archive.com/casper@lists.berkeley.edu/msg04909.html

Maybe it's worth a double check anyway?

Dave

On Oct 27, 2014, at 1:41 PM, Richard Black wrote:

> David,
> 
> We'll take another close look at what model we are actually using, just to be 
> safe.
> 
> I went back and looked at our e-mails, and sure enough, you're right. You 
> were referring to the MTU issue as being the problem you tend to suppress all 
> memory of. It was just that you stated it in a separate paragraph, so, 
> out-of-context, I extrapolated that you have had the same problem before. My 
> bad for dragging your good name through the mud. :)
> 
> We will also update our local repositories, in the event some bizarre race 
> condition exists on our end.
> 
> I didn't know that the buffer could fill up while reset was asserted. We'll 
> definitely have to check up on that too.
> 
> We haven't tried dumping raw ADC data yet since we have been trying to get 
> the data link working first. After that, we were planning to inject signal 
> and examine outputs.
> 
> Thanks,
> 
> Richard Black
> 
> On Mon, Oct 27, 2014 at 2:26 PM, David MacMahon  
> wrote:
> Hi, Richard,
> 
> On Oct 27, 2014, at 9:25 AM, Richard Black wrote:
> 
> > This is a reportedly fully-functional model that shouldn't require any 
> > major changes in order to operate. However, this has clearly not been the 
> > case in at least two independent situations (us and Peter). This begs the 
> > question: what's so different about our use of PAPER?
> 
> I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is the 
> one being used by the PAPER correlator currently fielded in South Africa.  It 
> is definitely a fully functional model.  That image (and all source files for 
> it) is available from the git repo listed on the PAPER Correlator Manifest 
> page of the CASPER Wiki:
> 
> https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest
> 
> > We, at BYU, have made painstakingly sure that our IP addressing schemes, 
> > switch ports, and scripts are all configured correctly (thanks to David 
> > MacMahon for that, btw), but we still have hit the proverbial brick wall of 
> > 10-GbE overflow.  When I last corresponded with David, he explained that he 
> > remembers having a similar issue before, but can't recall exactly what the 
> > problem was.
> 
> Really?  I recall saying that I often forget about increasing the MTU of the 
> 10 GbE switch and NICs.  I don't recall saying that I had a similar issue 
> before but couldn't remember the problem.
> 
> > In any case, the fact that turning down the ADC clock prior to start-up 
> > prevents the 10-GbE core from overflowing is a major lead for us at BYU 
> > (we've been spinning our wheels on this issue for several months now). By 
> > no means are we proposing mid-run ADC clock modifications, but this appears 
> > to be a very subtle (and quite sinister, in my opinion) bug.
> >
> > Any thoughts as to what might be going on?
> 
> I cannot explain the 10 GbE overflow that you and Peter are experiencing.  I 
> have pushed some updates to the rb-papergpu.git repository listed on the 
> PAPER Correlator Manifest page.  The paper_feng_init.rb script now verifies 
> that the ADC clocks are locked and provides options for issuing a software 
> sync (only recommended for lab use) and for not storing the time of 
> synchronization in redis (also only recommended for lab use).
> 
> The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1) 
> while they are held in reset.  Since you are using the paper_feng_init.rb 
> script, this should not be happening (unless something has gone wrong during 
> the running of that script) because that script specifically and explicitly 
> disables the tx_valid signal before putting the cores into reset and it takes 
> the cores out of reset before enabling the tx_valid signal.  So assuming that 
> this is not the cause of the overflows, there must be something else that is 
> causing the 10 GbE cores to be unable to transmit data fast enough to keep up 
> with the data stream it is being fed.  Two things that could cause this are 
> 1) running the design faster than the 200 MHz sample clock that it was built 
> for and/or 2) some link issue that prevents the core from sending data.  
> Unfortunately, I think both of those ideas are also pretty far fetched given 
> all you've done to try to get the system working.  I wonder whether there is 
> some difference in the ROACH2 firmware (u-boot version or CPLD programming) 
> or PPC Linux setup or tcpborphserver revision or ???.
> 
> Have you tried using adc16_dump_chans.rb to dump snapshots of the ADC data to 
> make sure that i

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
David,

We'll take another close look at what model we are actually using, just to
be safe.

I went back and looked at our e-mails, and sure enough, you're right. You
were referring to the MTU issue as being the problem you tend to suppress
all memory of. It was just that you stated it in a separate paragraph, so,
out-of-context, I extrapolated that you have had the same problem before.
My bad for dragging your good name through the mud. :)

We will also update our local repositories, in the event some bizarre race
condition exists on our end.

I didn't know that the buffer could fill up while reset was asserted. We'll
definitely have to check up on that too.

We haven't tried dumping raw ADC data yet since we have been trying to get
the data link working first. After that, we were planning to inject signal
and examine outputs.

Thanks,

Richard Black

On Mon, Oct 27, 2014 at 2:26 PM, David MacMahon 
wrote:

> Hi, Richard,
>
> On Oct 27, 2014, at 9:25 AM, Richard Black wrote:
>
> > This is a reportedly fully-functional model that shouldn't require any
> major changes in order to operate. However, this has clearly not been the
> case in at least two independent situations (us and Peter). This begs the
> question: what's so different about our use of PAPER?
>
> I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is
> the one being used by the PAPER correlator currently fielded in South
> Africa.  It is definitely a fully functional model.  That image (and all
> source files for it) is available from the git repo listed on the PAPER
> Correlator Manifest page of the CASPER Wiki:
>
> https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest
>
> > We, at BYU, have made painstakingly sure that our IP addressing schemes,
> switch ports, and scripts are all configured correctly (thanks to David
> MacMahon for that, btw), but we still have hit the proverbial brick wall of
> 10-GbE overflow.  When I last corresponded with David, he explained that he
> remembers having a similar issue before, but can't recall exactly what the
> problem was.
>
> Really?  I recall saying that I often forget about increasing the MTU of
> the 10 GbE switch and NICs.  I don't recall saying that I had a similar
> issue before but couldn't remember the problem.
>
> > In any case, the fact that turning down the ADC clock prior to
> start-up prevents the 10-GbE core from overflowing is a major lead for us
> at BYU (we've been spinning our wheels on this issue for several months
> now). By no means are we proposing mid-run ADC clock modifications, but
> this appears to be a very subtle (and quite sinister, in my opinion) bug.
> >
> > Any thoughts as to what might be going on?
>
> I cannot explain the 10 GbE overflow that you and Peter are experiencing.
> I have pushed some updates to the rb-papergpu.git repository listed on the
> PAPER Correlator Manifest page.  The paper_feng_init.rb script now verifies
> that the ADC clocks are locked and provides options for issuing a software
> sync (only recommended for lab use) and for not storing the time of
> synchronization in redis (also only recommended for lab use).
>
> The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1)
> while they are held in reset.  Since you are using the paper_feng_init.rb
> script, this should not be happening (unless something has gone wrong
> during the running of that script) because that script specifically and
> explicitly disables the tx_valid signal before putting the cores into reset
> and it takes the cores out of reset before enabling the tx_valid signal.
> So assuming that this is not the cause of the overflows, there must be
> something else that is causing the 10 GbE cores to be unable to transmit
> data fast enough to keep up with the data stream it is being fed.  Two
> things that could cause this are 1) running the design faster than the 200
> MHz sample clock that it was built for and/or 2) some link issue that
> prevents the core from sending data.  Unfortunately, I think both of those
> ideas are also pretty far fetched given all you've done to try to get the
> system working.  I wonder whether there is some difference in the ROACH2
> firmware (u-boot version or CPLD programming) or PPC Linux setup or
> tcpborphserver revision or ???.
>
> Have you tried using adc16_dump_chans.rb to dump snapshots of the ADC data
> to make sure that it looks OK?
>
> Dave
>
>


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread David MacMahon
Hi, Richard,

On Oct 27, 2014, at 9:25 AM, Richard Black wrote:

> This is a reportedly fully-functional model that shouldn't require any major 
> changes in order to operate. However, this has clearly not been the case in 
> at least two independent situations (us and Peter). This begs the question: 
> what's so different about our use of PAPER?

I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is the one 
being used by the PAPER correlator currently fielded in South Africa.  It is 
definitely a fully functional model.  That image (and all source files for it) 
is available from the git repo listed on the PAPER Correlator Manifest page of 
the CASPER Wiki:

https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest

> We, at BYU, have made painstakingly sure that our IP addressing schemes, 
> switch ports, and scripts are all configured correctly (thanks to David 
> MacMahon for that, btw), but we still have hit the proverbial brick wall of 
> 10-GbE overflow.  When I last corresponded with David, he explained that he 
> remembers having a similar issue before, but can't recall exactly what the 
> problem was.

Really?  I recall saying that I often forget about increasing the MTU of the 10 
GbE switch and NICs.  I don't recall saying that I had a similar issue before 
but couldn't remember the problem.

> In any case, the fact that turning down the ADC clock prior to start-up 
> prevents the 10-GbE core from overflowing is a major lead for us at BYU 
> (we've been spinning our wheels on this issue for several months now). By no 
> means are we proposing mid-run ADC clock modifications, but this appears to 
> be a very subtle (and quite sinister, in my opinion) bug.
> 
> Any thoughts as to what might be going on?

I cannot explain the 10 GbE overflow that you and Peter are experiencing.  I 
have pushed some updates to the rb-papergpu.git repository listed on the PAPER 
Correlator Manifest page.  The paper_feng_init.rb script now verifies that the 
ADC clocks are locked and provides options for issuing a software sync (only 
recommended for lab use) and for not storing the time of synchronization in 
redis (also only recommended for lab use).

The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1) 
while they are held in reset.  Since you are using the paper_feng_init.rb 
script, this should not be happening (unless something has gone wrong during 
the running of that script) because that script specifically and explicitly 
disables the tx_valid signal before putting the cores into reset and it takes 
the cores out of reset before enabling the tx_valid signal.  So assuming that 
this is not the cause of the overflows, there must be something else that is 
causing the 10 GbE cores to be unable to transmit data fast enough to keep up 
with the data stream it is being fed.  Two things that could cause this are 1) 
running the design faster than the 200 MHz sample clock that it was built for 
and/or 2) some link issue that prevents the core from sending data.  
Unfortunately, I think both of those ideas are also pretty far fetched given 
all you've done to try to get the system working.  I wonder whether there is 
some difference in the ROACH2 firmware (u-boot version or CPLD programming) or 
PPC Linux setup or tcpborphserver revision or ???.
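
For reference, the order of operations is roughly the following (a sketch only, 
using the corr python package; the register names are placeholders rather than 
the actual names in the f-engine design, and paper_feng_init.rb already does 
all of this for you):

import time
from corr import katcp_wrapper

fpga = katcp_wrapper.FpgaClient('roach2', 7147)   # hostname is a placeholder
fpga.wait_connected()

fpga.write_int('tge_valid_en', 0)   # 1. stop feeding the cores (tx_valid = 0)
fpga.write_int('tge_rst', 1)        # 2. put the 10 GbE cores into reset
time.sleep(0.1)                     #    give the reset a moment to take effect
fpga.write_int('tge_rst', 0)        # 3. take the cores out of reset
fpga.write_int('tge_valid_en', 1)   # 4. only now re-enable transmission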

Have you tried using adc16_dump_chans.rb to dump snapshots of the ADC data to 
make sure that it looks OK?

Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jack Hickish
Hi Richard,

That's my theory, though I doubt it's right. But as you say, an easy
test is just to delay after issuing a sync for a couple more seconds
and see if that helps. But if your PPS is a real PPS (rather than just
a square wave at some vague 1s period) then I can't see what
difference this would make.
When that doesn't help, my inclination would be to start prodding the
10gbe control signals from software to make sure the reset / sw
enables are working / see if a tge reset without a new sync behaves
differently. But I can't imagine how that would be broken unless the
stuff on github is out of date (which I doubt).
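
By "prodding" I mean something along these lines (an untested sketch using the
corr python package; tge_valid_en, tge_rst and tge_tx_overflow_cnt are made-up
register names, so substitute whatever the design actually exposes):

from corr import katcp_wrapper

fpga = katcp_wrapper.FpgaClient('roach2', 7147)   # hostname is a placeholder
fpga.wait_connected()

def show(label):
    # read back the software-controlled enable/reset and an overflow counter
    print('%s valid_en=%d rst=%d tx_overflows=%d' % (
        label,
        fpga.read_int('tge_valid_en'),
        fpga.read_int('tge_rst'),
        fpga.read_int('tge_tx_overflow_cnt')))

show('before reset:')
fpga.write_int('tge_valid_en', 0)   # stop feeding the core
fpga.write_int('tge_rst', 1)        # reset the core without issuing a new sync
fpga.write_int('tge_rst', 0)
fpga.write_int('tge_valid_en', 1)   # re-enable and see if it overflows again
show('after reset:')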

Jack

On 27 October 2014 17:28, Richard Black  wrote:
> Jack,
>
> I appreciate your help. I tend to agree that the issue is likely a hardware
> configuration problem, but we have been trying to match it as closely as
> possible.
>
> We do feed a 1-PPS signal into the board, but I'm hazy on the details of the
> other pulse parameters. I'll look into that as well.
>
> So, if I understand you correctly, you believe that the sync pulse is
> reaching the ethernet interfaces after the cores are enabled? If that is the
> case, couldn't we delay enabling the 10-GbE cores for another second to fix
> it? This might be a quick way to test that theory, but please correct me if
> I've misunderstood.
>
> Richard Black
>
> On Mon, Oct 27, 2014 at 11:05 AM, Jack Hickish 
> wrote:
>>
>> Hi Richard,
>>
>> I've just had a very brief look at the design / software, so take this
>> email with a pinch of salt, but on the off-chance you haven't checked
>> this
>>
>> It looks like the PAPER F-engine setup on running the start script for
>> software / firmware out of the box is --
>>
>> 1. Disable all ethernet interfaces
>> 2. Arm sync generator, wait 1 second for PPS
>> 3. Reset ethernet interfaces
>> 4. Enable interfaces.
>>
>> These four steps seem like they should be safe, yet the behaviour
>> you're describing sounds like the design is midway through sending a packet,
>> then gets a sync, gives up sending an end-of-frame and starts sending
>> a new packet, at which point the old packet + the new packet =
>> overflow.
>>
>> Knowing that the design works for PAPER, my question is whether, after
>> arming the sync generator, syncs are flowing through the design before
>> the ethernet interface is enabled. Do you have a PPS-like input? The
>> fengine initialisation script seems to wait for a second after arming,
>> but if your sync input is something significantly slower, you could
>> have problems.
>>
>> I'm sceptical about this theory (I think the symptoms would be lots of
>> OK packets when you brought up the interface, and then it dying when
>> the sync arrives, rather than a single good packet like you're
>> seeing), but if the firmware + software really is the same as that
>> working with paper, and the wiki hasn't just got out of sync with the
>> paper devs, perhaps the problem is in your hardware setup
>>
>> Cheers,
>> Jack
>>
>> On 27 October 2014 16:38, Richard Black  wrote:
>> > By "enable" port, I assume you mean the "valid" port. I've been looking
>> > at
>> > the PAPER model carefully for some time now, and that is how it
>> > operates. It
>> > has a gated valid signal with a software register on each 10-GbE core.
>> >
>> > Once again, this is not our model. This is one made available on the
>> > CASPER
>> > wiki and run without modifications.
>> >
>> > Richard Black
>> >
>> > On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley 
>> > wrote:
>> >>
>> >> I suspect the 10GbE core's input FIFO is overflowing on startup. One
>> >> key
>> >> thing with this core is to ensure that your design keeps the enable
>> >> port
>> >> held low until the core's been configured. The core becomes unusable
>> >> once
>> >> the TX FIFO overflows. This has been a long-standing bug (my emails
>> >> trace
>> >> back to 2009) but it's so easy to work around that I don't think
>> >> anyone's
>> >> bothered looking into fixing it.
>> >>
>> >> Jason Manley
>> >> CBF Manager
>> >> SKA-SA
>> >>
>> >> Cell: +27 82 662 7726
>> >> Work: +27 21 506 7300
>> >>
>> >> On 27 Oct 2014, at 18:25, Richard Black  wrote:
>> >>
>> >> > Jason,
>> >> >
>> >> > Thanks for your comments. While I agree that changing the ADC
>> >> > frequency
>> >> > mid-operation is non-kosher and could result in uncertain behavior,
>> >> > the
>> >> > issue at hand for us is to figure out what is going on with the PAPER
>> >> > model
>> >> > that has been published on the CASPER wiki. This naturally won't be
>> >> > (and
>> >> > shouldn't be) the end-all solution to this problem.
>> >> >
>> >> > This is a reportedly fully-functional model that shouldn't require
>> >> > any
>> >> > major changes in order to operate. However, this has clearly not been
>> >> > the
>> >> > case in at least two independent situations (us and Peter). This begs
>> >> > the
>> >> > question: what's so different about our use of PAPER?
>> >> >
>> >> > We, at BYU, have made painstakingly su

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jack,

I appreciate your help. I tend to agree that the issue is likely a hardware
configuration problem, but we have been trying to match it as closely as
possible.

We do feed a 1-PPS signal into the board, but I'm hazy on the details of
the other pulse parameters. I'll look into that as well.

So, if I understand you correctly, you believe that the sync pulse is
reaching the ethernet interfaces *after* the cores are enabled? If that is
the case, couldn't we delay enabling the 10-GbE cores for another second to
fix it? This might be a quick way to test that theory, but please correct
me if I've misunderstood.

Richard Black

On Mon, Oct 27, 2014 at 11:05 AM, Jack Hickish 
wrote:

> Hi Richard,
>
> I've just had a very brief look at the design / software, so take this
> email with a pinch of salt, but on the off-chance you haven't checked
> this
>
> It looks like the PAPER F-engine setup on running the start script for
> software / firmware out of the box is --
>
> 1. Disable all ethernet interfaces
> 2. Arm sync generator, wait 1 second for PPS
> 3. Reset ethernet interfaces
> 4. Enable interfaces.
>
> These four steps seem like they should be safe, yet the behaviour
> you're describing sounds like the design is midway through sending a packet,
> then gets a sync, gives up sending an end-of-frame and starts sending
> a new packet, at which point the old packet + the new packet =
> overflow.
>
> Knowing that the design works for PAPER, my question is whether, after
> arming the sync generator, syncs are flowing through the design before
> the ethernet interface is enabled. Do you have a PPS-like input? The
> fengine initialisation script seems to wait for a second after arming,
> but if your sync input is something significantly slower, you could
> have problems.
>
> I'm sceptical about this theory (I think the symptoms would be lots of
> OK packets when you brought up the interface, and then it dying when
> the sync arrives, rather than a single good packet like you're
> seeing), but if the firmware + software really is the same as that
> working with paper, and the wiki hasn't just got out of sync with the
> paper devs, perhaps the problem is in your hardware setup
>
> Cheers,
> Jack
>
> On 27 October 2014 16:38, Richard Black  wrote:
> > By "enable" port, I assume you mean the "valid" port. I've been looking
> at
> > the PAPER model carefully for some time now, and that is how it
> operates. It
> > has a gated valid signal with a software register on each 10-GbE core.
> >
> > Once again, this is not our model. This is one made available on the
> CASPER
> > wiki and run without modifications.
> >
> > Richard Black
> >
> > On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley 
> wrote:
> >>
> >> I suspect the 10GbE core's input FIFO is overflowing on startup. One key
> >> thing with this core is to ensure that your design keeps the enable
> port
> >> held low until the core's been configured. The core becomes unusable
> once
> >> the TX FIFO overflows. This has been a long-standing bug (my emails
> trace
> >> back to 2009) but it's so easy to work around that I don't think
> anyone's
> >> bothered looking into fixing it.
> >>
> >> Jason Manley
> >> CBF Manager
> >> SKA-SA
> >>
> >> Cell: +27 82 662 7726
> >> Work: +27 21 506 7300
> >>
> >> On 27 Oct 2014, at 18:25, Richard Black  wrote:
> >>
> >> > Jason,
> >> >
> >> > Thanks for your comments. While I agree that changing the ADC
> frequency
> >> > mid-operation is non-kosher and could result in uncertain behavior,
> the
> >> > issue at hand for us is to figure out what is going on with the PAPER
> model
> >> > that has been published on the CASPER wiki. This naturally won't be
> (and
> >> > shouldn't be) the end-all solution to this problem.
> >> >
> >> > This is a reportedly fully-functional model that shouldn't require any
> >> > major changes in order to operate. However, this has clearly not been
> the
> >> > case in at least two independent situations (us and Peter). This begs
> the
> >> > question: what's so different about our use of PAPER?
> >> >
> >> > We, at BYU, have made painstakingly sure that our IP addressing
> schemes,
> >> > switch ports, and scripts are all configured correctly (thanks to
> David
> >> > MacMahon for that, btw), but we still have hit the proverbial brick
> wall of
> >> > 10-GbE overflow.  When I last corresponded with David, he explained
> that he
> >> > remembers having a similar issue before, but can't recall exactly
> what the
> >> > problem was.
> >> >
> >> > In any case, the fact that turning down the ADC clock prior to
> >> > start-up prevents the 10-GbE core from overflowing is a major lead
> for us at
> >> > BYU (we've been spinning our wheels on this issue for several months
> now).
> >> > By no means are we proposing mid-run ADC clock modifications, but this
> >> > appears to be a very subtle (and quite sinister, in my opinion) bug.
> >> >
> >> > Any thoughts as to what might be going on?
> >> >
> >> > Richard 

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jack Hickish
Hi Richard,

I've just had a very brief look at the design / software, so take this
email with a pinch of salt, but on the off-chance you haven't checked
this

It looks like the PAPER F-engine setup on running the start script for
software / firmware out of the box is --

1. Disable all ethernet interfaces
2. Arm sync generator, wait 1 second for PPS
3. Reset ethernet interfaces
4. Enable interfaces.

These four steps seem like they should be safe, yet the behaviour
you're describing sounds like the design is midway through sending a packet,
then gets a sync, gives up sending an end-of-frame and starts sending
a new packet, at which point the old packet + the new packet =
overflow.

Knowing that the design works for PAPER, my question is whether, after
arming the sync generator, syncs are flowing through the design before
the ethernet interface is enabled. Do you have a PPS-like input? The
fengine initialisation script seems to wait for a second after arming,
but if your sync input is something significantly slower, you could
have problems.
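
One quick way to check this from software would be something like the
following (a sketch only, using the corr python package; sync_arm and
sync_count are placeholder names for whatever arm register and sync/PPS
counter the design actually exposes):

import time
from corr import katcp_wrapper

fpga = katcp_wrapper.FpgaClient('roach2', 7147)   # hostname is a placeholder
fpga.wait_connected()

before = fpga.read_int('sync_count')     # sync pulses seen so far
fpga.write_int('sync_arm', 0)
fpga.write_int('sync_arm', 1)            # arm on the rising edge
deadline = time.time() + 5.0             # comfortably more than one PPS period
while time.time() < deadline:
    if fpga.read_int('sync_count') > before:
        print('sync arrived - safe to enable the 10 GbE interfaces')
        break
    time.sleep(0.1)
else:
    print('no sync seen - check the PPS input before enabling the cores')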

I'm sceptical about this theory (I think the symptoms would be lots of
OK packets when you brought up the interface, and then it dying when
the sync arrives, rather than a single good packet like you're
seeing), but if the firmware + software really is the same as that
working with paper, and the wiki hasn't just got out of sync with the
paper devs, perhaps the problem is in your hardware setup

Cheers,
Jack

On 27 October 2014 16:38, Richard Black  wrote:
> By "enable" port, I assume you mean the "valid" port. I've been looking at
> the PAPER model carefully for some time now, and that is how it operates. It
> has a gated valid signal with a software register on each 10-GbE core.
>
> Once again, this is not our model. This is one made available on the CASPER
> wiki and run without modifications.
>
> Richard Black
>
> On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley  wrote:
>>
>> I suspect the 10GbE core's input FIFO is overflowing on startup. One key
>> thing with this core is to ensure that your design keeps the enable port
>> held low until the core's been configured. The core becomes unusable once
>> the TX FIFO overflows. This has been a long-standing bug (my emails trace
>> back to 2009) but it's so easy to work around that I don't think anyone's
>> bothered looking into fixing it.
>>
>> Jason Manley
>> CBF Manager
>> SKA-SA
>>
>> Cell: +27 82 662 7726
>> Work: +27 21 506 7300
>>
>> On 27 Oct 2014, at 18:25, Richard Black  wrote:
>>
>> > Jason,
>> >
>> > Thanks for your comments. While I agree that changing the ADC frequency
>> > mid-operation is non-kosher and could result in uncertain behavior, the
>> > issue at hand for us is to figure out what is going on with the PAPER model
>> > that has been published on the CASPER wiki. This naturally won't be (and
>> > shouldn't be) the end-all solution to this problem.
>> >
>> > This is a reportedly fully-functional model that shouldn't require any
>> > major changes in order to operate. However, this has clearly not been the
>> > case in at least two independent situations (us and Peter). This begs the
>> > question: what's so different about our use of PAPER?
>> >
>> > We, at BYU, have made painstakingly sure that our IP addressing schemes,
>> > switch ports, and scripts are all configured correctly (thanks to David
>> > MacMahon for that, btw), but we still have hit the proverbial brick wall of
>> > 10-GbE overflow.  When I last corresponded with David, he explained that he
>> > remembers having a similar issue before, but can't recall exactly what the
>> > problem was.
>> >
>> > In any case, the fact that turning down the ADC clock prior to
>> > start-up prevents the 10-GbE core from overflowing is a major lead for us 
>> > at
>> > BYU (we've been spinning our wheels on this issue for several months now).
>> > By no means are we proposing mid-run ADC clock modifications, but this
>> > appears to be a very subtle (and quite sinister, in my opinion) bug.
>> >
>> > Any thoughts as to what might be going on?
>> >
>> > Richard Black
>> >
>> > On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley  wrote:
>> > Just a note that I don't recommend you adjust FPGA clock frequencies
>> > while it's operating. In theory, you should do a global reset in case the
>> > PLL/DLLs lose lock during clock transitions, in which case the logic could
>> > be in an uncertain state. But the Sysgen flow just does a single POR.
>> >
>> > A better solution might be to keep the 10GbE cores turned off (enable
>> > line pulled low) on initialisation, until things are configured (tgtap
>> > started etc), and only then enable the transmission using a SW register.
>> >
>> > Jason Manley
>> > CBF Manager
>> > SKA-SA
>> >
>> > Cell: +27 82 662 7726
>> > Work: +27 21 506 7300
>> >
>> > On 25 Oct 2014, at 10:34, peter  wrote:
>> >
>> > > Hi Richard, Joe, & all,
>> > > Thanks for your help, it can finally receive packets now!
>> > > As you pointed out, after enabling the

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jason,

Fair point. One of our guys is currently trying to get ChipScope configured
to make sure all our control signals are correct. We'll definitely look at
that signal too. Hopefully that will finally put this issue to rest.

Thanks for the tip,

Richard Black

On Mon, Oct 27, 2014 at 10:47 AM, Jason Manley  wrote:

> Yep, ok, so whoever did it (Dave?) already knows about this issue and has
> dealt with it. So scratch that idea then! Only other thing to check is to
> make sure you don't actually toggle that software register until the core
> is configured.
>
> Jason Manley
> CBF Manager
> SKA-SA
>
> Cell: +27 82 662 7726
> Work: +27 21 506 7300
>
> On 27 Oct 2014, at 18:38, Richard Black  wrote:
>
> > By "enable" port, I assume you mean the "valid" port. I've been looking
> at the PAPER model carefully for some time now, and that is how it
> operates. It has a gated valid signal with a software register on each
> 10-GbE core.
> >
> > Once again, this is not our model. This is one made available on the
> CASPER wiki and run without modifications.
> >
> > Richard Black
> >
> > On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley 
> wrote:
> > I suspect the 10GbE core's input FIFO is overflowing on startup. One key
> thing with this core is to ensure that your design keeps the enable
> port held low until the core's been configured. The core becomes unusable
> once the TX FIFO overflows. This has been a long-standing bug (my emails
> trace back to 2009) but it's so easy to work around that I don't think
> anyone's bothered looking into fixing it.
> >
> > Jason Manley
> > CBF Manager
> > SKA-SA
> >
> > Cell: +27 82 662 7726
> > Work: +27 21 506 7300
> >
> > On 27 Oct 2014, at 18:25, Richard Black  wrote:
> >
> > > Jason,
> > >
> > > Thanks for your comments. While I agree that changing the ADC
> frequency mid-operation is non-kosher and could result in uncertain
> behavior, the issue at hand for us is to figure out what is going on with
> the PAPER model that has been published on the CASPER wiki. This naturally
> won't be (and shouldn't be) the end-all solution to this problem.
> > >
> > > This is a reportedly fully-functional model that shouldn't require any
> major changes in order to operate. However, this has clearly not been the
> case in at least two independent situations (us and Peter). This begs the
> question: what's so different about our use of PAPER?
> > >
> > > We, at BYU, have made painstakingly sure that our IP addressing
> schemes, switch ports, and scripts are all configured correctly (thanks to
> David MacMahon for that, btw), but we still have hit the proverbial brick
> wall of 10-GbE overflow.  When I last corresponded with David, he explained
> that he remembers having a similar issue before, but can't recall exactly
> what the problem was.
> > >
> > > In any case, the fact that turning down the ADC clock prior to
> start-up prevents the 10-GbE core from overflowing is a major lead for us
> at BYU (we've been spinning our wheels on this issue for several months
> now). By no means are we proposing mid-run ADC clock modifications, but
> this appears to be a very subtle (and quite sinister, in my opinion) bug.
> > >
> > > Any thoughts as to what might be going on?
> > >
> > > Richard Black
> > >
> > > On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley 
> wrote:
> > > Just a note that I don't recommend you adjust FPGA clock frequencies
> while it's operating. In theory, you should do a global reset in case the
> PLL/DLLs lose lock during clock transitions, in which case the logic could
> be in an uncertain state. But the Sysgen flow just does a single POR.
> > >
> > > A better solution might be to keep the 10GbE cores turned off (enable
> line pulled low) on initialisation, until things are configured (tgtap
> started etc), and only then enable the transmission using a SW register.
> > >
> > > Jason Manley
> > > CBF Manager
> > > SKA-SA
> > >
> > > Cell: +27 82 662 7726
> > > Work: +27 21 506 7300
> > >
> > > On 25 Oct 2014, at 10:34, peter  wrote:
> > >
> > > > Hi Richard, Joe, & all,
> > > > Thanks for your help, it can finally receive packets now!
> > > > As you pointed out, after enabling the ADC card and running the bof file
> > > > (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
> > > > fengine init script at about 75 MHz (./paper_feng_init.rb roach1:0); that
> > > > allows the packets to transfer. Then we can turn the frequency higher.
> > > > However, the ADC clock frequency only goes up to 120 MHz in my experiment,
> > > > while our final ADC frequency target is 250 MHz. Maybe I need to run the
> > > > bof file at a higher ADC frequency first to end up with a steady 250 MHz
> > > > ADC clock frequency.
> > > > Why does it need to be initialised at a lower frequency and then turned up?
> > > > That doesn't make sense. Is the hardware going wrong? Since the yellow block
> > > > adc16*250-8 is designed for 250 MHz, it should be OK at 200 MHz or 250 MHz.
> > > > What is the final frequency in your experiment?
> > > > Any reply will be helpful!
> > > > Best regards!
> > > > 

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
Yep, ok, so whoever did it (Dave?) already knows about this issue and has dealt 
with it. So scratch that idea then! Only other thing to check is to make sure 
you don't actually toggle that software register until the core is configured.

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 27 Oct 2014, at 18:38, Richard Black  wrote:

> By "enable" port, I assume you mean the "valid" port. I've been looking at 
> the PAPER model carefully for some time now, and that is how it operates. It 
> has a gated valid signal with a software register on each 10-GbE core.
> 
> Once again, this is not our model. This is one made available on the CASPER 
> wiki and run without modifications.
> 
> Richard Black
> 
> On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley  wrote:
> I suspect the 10GbE core's input FIFO is overflowing on startup. One key 
> thing with this core is to ensure that your design keeps the enable port
> held low until the core's been configured. The core becomes unusable once the 
> TX FIFO overflows. This has been a long-standing bug (my emails trace back to 
> 2009) but it's so easy to work around that I don't think anyone's bothered 
> looking into fixing it.
> 
> Jason Manley
> CBF Manager
> SKA-SA
> 
> Cell: +27 82 662 7726
> Work: +27 21 506 7300
> 
> On 27 Oct 2014, at 18:25, Richard Black  wrote:
> 
> > Jason,
> >
> > Thanks for your comments. While I agree that changing the ADC frequency 
> > mid-operation is non-kosher and could result in uncertain behavior, the 
> > issue at hand for us is to figure out what is going on with the PAPER model 
> > that has been published on the CASPER wiki. This naturally won't be (and 
> > shouldn't be) the end-all solution to this problem.
> >
> > This is a reportedly fully-functional model that shouldn't require any 
> > major changes in order to operate. However, this has clearly not been the 
> > case in at least two independent situations (us and Peter). This begs the 
> > question: what's so different about our use of PAPER?
> >
> > We, at BYU, have made painstakingly sure that our IP addressing schemes, 
> > switch ports, and scripts are all configured correctly (thanks to David 
> > MacMahon for that, btw), but we still have hit the proverbial brick wall of 
> > 10-GbE overflow.  When I last corresponded with David, he explained that he 
> > remembers having a similar issue before, but can't recall exactly what the 
> > problem was.
> >
> > In any case, the fact that turning down the ADC clock prior to start-up 
> > prevents the 10-GbE core from overflowing is a major lead for us at BYU 
> > (we've been spinning our wheels on this issue for several months now). By 
> > no means are we proposing mid-run ADC clock modifications, but this appears 
> > to be a very subtle (and quite sinister, in my opinion) bug.
> >
> > Any thoughts as to what might be going on?
> >
> > Richard Black
> >
> > On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley  wrote:
> > Just a note that I don't recommend you adjust FPGA clock frequencies while 
> > it's operating. In theory, you should do a global reset in case the 
> > PLL/DLLs lose lock during clock transitions, in which case the logic could 
> > be in an uncertain state. But the Sysgen flow just does a single POR.
> >
> > A better solution might be to keep the 10GbE cores turned off (enable line 
> > pulled low) on initialisation, until things are configured (tgtap started 
> > etc), and only then enable the transmission using a SW register.
> >
> > Jason Manley
> > CBF Manager
> > SKA-SA
> >
> > Cell: +27 82 662 7726
> > Work: +27 21 506 7300
> >
> > On 25 Oct 2014, at 10:34, peter  wrote:
> >
> > > Hi Richard, Joe, & all,
> > > Thanks for your help, it can finally receive packets now!
> > > As you pointed out, after enabling the ADC card and running the bof file
> > > (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
> > > fengine init script at about 75 MHz (./paper_feng_init.rb roach1:0); that
> > > allows the packets to transfer. Then we can turn the frequency higher.
> > > However, the ADC clock frequency only goes up to 120 MHz in my experiment,
> > > while our final ADC frequency target is 250 MHz. Maybe I need to run the
> > > bof file at a higher ADC frequency first to end up with a steady 250 MHz
> > > ADC clock frequency.
> > > Why does it need to be initialised at a lower frequency and then turned up?
> > > That doesn't make sense. Is the hardware going wrong? Since the yellow block
> > > adc16*250-8 is designed for 250 MHz, it should be OK at 200 MHz or 250 MHz.
> > > What is the final frequency in your experiment?
> > > Any reply will be helpful!
> > > Best regards!
> > > peter
> > >
> > >
> > >
> > >
> > >
> > >
> > > At 2014-10-25 00:36:52, "Richard Black"  wrote:
> > > Peter,
> > >
> > > That's correct. We downloaded the FPGA firmware and programmed the ROACH 
> > > with the precompiled bitstream. When we didn't get any data beyond that 
> > > single packet, we stuck some overflow status r

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
By "enable" port, I assume you mean the "valid" port. I've been looking at
the PAPER model carefully for some time now, and that is how it operates.
It has a gated valid signal with a software register on each 10-GbE core.

Once again, this is not our model. This is one made available on the CASPER
wiki and run without modifications.

Richard Black

On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley  wrote:

> I suspect the 10GbE core's input FIFO is overflowing on startup. One key
> thing with this core is to ensure that your design keeps the enable
> port held low until the core's been configured. The core becomes unusable
> once the TX FIFO overflows. This has been a long-standing bug (my emails
> trace back to 2009) but it's so easy to work around that I don't think
> anyone's bothered looking into fixing it.
>
> Jason Manley
> CBF Manager
> SKA-SA
>
> Cell: +27 82 662 7726
> Work: +27 21 506 7300
>
> On 27 Oct 2014, at 18:25, Richard Black  wrote:
>
> > Jason,
> >
> > Thanks for your comments. While I agree that changing the ADC frequency
> mid-operation is non-kosher and could result in uncertain behavior, the
> issue at hand for us is to figure out what is going on with the PAPER model
> that has been published on the CASPER wiki. This naturally won't be (and
> shouldn't be) the end-all solution to this problem.
> >
> > This is a reportedly fully-functional model that shouldn't require any
> major changes in order to operate. However, this has clearly not been the
> case in at least two independent situations (us and Peter). This begs the
> question: what's so different about our use of PAPER?
> >
> > We, at BYU, have made painstakingly sure that our IP addressing schemes,
> switch ports, and scripts are all configured correctly (thanks to David
> MacMahon for that, btw), but we still have hit the proverbial brick wall of
> 10-GbE overflow.  When I last corresponded with David, he explained that he
> remembers having a similar issue before, but can't recall exactly what the
> problem was.
> >
> > In any case, the fact that turning down the ADC clock prior to
> start-up prevents the 10-GbE core from overflowing is a major lead for us
> at BYU (we've been spinning our wheels on this issue for several months
> now). By no means are we proposing mid-run ADC clock modifications, but
> this appears to be a very subtle (and quite sinister, in my opinion) bug.
> >
> > Any thoughts as to what might be going on?
> >
> > Richard Black
> >
> > On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley  wrote:
> > Just a note that I don't recommend you adjust FPGA clock frequencies
> while it's operating. In theory, you should do a global reset in case the
> PLL/DLLs lose lock during clock transitions, in which case the logic could
> be in an uncertain state. But the Sysgen flow just does a single POR.
> >
> > A better solution might be to keep the 10GbE cores turned off (enable
> line pulled low) on initialisation, until things are configured (tgtap
> started etc), and only then enable the transmission using a SW register.
> >
> > Jason Manley
> > CBF Manager
> > SKA-SA
> >
> > Cell: +27 82 662 7726
> > Work: +27 21 506 7300
> >
> > On 25 Oct 2014, at 10:34, peter  wrote:
> >
> > > Hi Richard, Joe, & all,
> > > Thanks for your help, it can finally receive packets now!
> > > As you pointed out, after enabling the ADC card and running the bof file
> > > (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
> > > fengine init script at about 75 MHz (./paper_feng_init.rb roach1:0); that
> > > allows the packets to transfer. Then we can turn the frequency higher.
> > > However, the ADC clock frequency only goes up to 120 MHz in my experiment,
> > > while our final ADC frequency target is 250 MHz. Maybe I need to run the
> > > bof file at a higher ADC frequency first to end up with a steady 250 MHz
> > > ADC clock frequency.
> > > Why does it need to be initialised at a lower frequency and then turned up?
> > > That doesn't make sense. Is the hardware going wrong? Since the yellow block
> > > adc16*250-8 is designed for 250 MHz, it should be OK at 200 MHz or 250 MHz.
> > > What is the final frequency in your experiment?
> > > Any reply will be helpful!
> > > Best regards!
> > > peter
> > >
> > >
> > >
> > >
> > >
> > >
> > > At 2014-10-25 00:36:52, "Richard Black"  wrote:
> > > Peter,
> > >
> > > That's correct. We downloaded the FPGA firmware and programmed the
> ROACH with the precompiled bitstream. When we didn't get any data beyond
> that single packet, we stuck some overflow status registers in the model
> and found that we were overflowing at 1025 64-bit words (i.e. 8200 bytes).
> > >
> > > We have actually found a way to get packets to flow, but it isn't a
> good fix. When we turn the ADC clock frequency down to about 75 MHz, the
> packets begin to flow. There is an opinion in our group that the 10-GbE
> buffer overflow is a transient behavior, and, hence, if we slowly turn up
> the clock frequency after the ROACH has started up, packets may continue to
> flow in steady-state operation. We have

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
I suspect the 10GbE core's input FIFO is overflowing on startup. One key thing 
with this core is to ensure that your design keeps the enable port held low 
until the core's been configured. The core becomes unusable once the TX FIFO 
overflows. This has been a long-standing bug (my emails trace back to 2009) but 
it's so easy to work around that I don't think anyone's bothered looking into 
fixing it.

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 27 Oct 2014, at 18:25, Richard Black  wrote:

> Jason,
> 
> Thanks for your comments. While I agree that changing the ADC frequency 
> mid-operation is non-kosher and could result in uncertain behavior, the issue 
> at hand for us is to figure out what is going on with the PAPER model that 
> has been published on the CASPER wiki. This naturally won't be (and shouldn't 
> be) the end-all solution to this problem.
> 
> This is a reportedly fully-functional model that shouldn't require any major 
> changes in order to operate. However, this has clearly not been the case in 
> at least two independent situations (us and Peter). This begs the question: 
> what's so different about our use of PAPER?
> 
> We, at BYU, have made painstakingly sure that our IP addressing schemes, 
> switch ports, and scripts are all configured correctly (thanks to David 
> MacMahon for that, btw), but we still have hit the proverbial brick wall of 
> 10-GbE overflow.  When I last corresponded with David, he explained that he 
> remembers having a similar issue before, but can't recall exactly what the 
> problem was.
> 
> In any case, the fact that turning down the ADC clock prior to start-up 
> prevents the 10-GbE core from overflowing is a major lead for us at BYU 
> (we've been spinning our wheels on this issue for several months now). By no 
> means are we proposing mid-run ADC clock modifications, but this appears to 
> be a very subtle (and quite sinister, in my opinion) bug.
> 
> Any thoughts as to what might be going on?
> 
> Richard Black
> 
> On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley  wrote:
> Just a note that I don't recommend you adjust FPGA clock frequencies while 
> it's operating. In theory, you should do a global reset in case the PLL/DLLs 
> lose lock during clock transitions, in which case the logic could be in an 
> uncertain state. But the Sysgen flow just does a single POR.
> 
> A better solution might be to keep the 10GbE cores turned off (enable line 
> pulled low) on initialisation, until things are configured (tgtap started 
> etc), and only then enable the transmission using a SW register.
> 
> Jason Manley
> CBF Manager
> SKA-SA
> 
> Cell: +27 82 662 7726
> Work: +27 21 506 7300
> 
> On 25 Oct 2014, at 10:34, peter  wrote:
> 
> > Hi Richard, Joe, & all,
> > Thanks for your help, it can finally receive packets now!
> > As you pointed out, after enabling the ADC card and running the bof file
> > (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
> > fengine init script at about 75 MHz (./paper_feng_init.rb roach1:0); that
> > allows the packets to transfer. Then we can turn the frequency higher.
> > However, the ADC clock frequency only goes up to 120 MHz in my experiment,
> > while our final ADC frequency target is 250 MHz. Maybe I need to run the
> > bof file at a higher ADC frequency first to end up with a steady 250 MHz
> > ADC clock frequency.
> > Why does it need to be initialised at a lower frequency and then turned up?
> > That doesn't make sense. Is the hardware going wrong? Since the yellow block
> > adc16*250-8 is designed for 250 MHz, it should be OK at 200 MHz or 250 MHz.
> > What is the final frequency in your experiment?
> > Any reply will be helpful!
> > Best regards!
> > peter
> >
> >
> >
> >
> >
> >
> > At 2014-10-25 00:36:52, "Richard Black"  wrote:
> > Peter,
> >
> > That's correct. We downloaded the FPGA firmware and programmed the ROACH 
> > with the precompiled bitstream. When we didn't get any data beyond that 
> > single packet, we stuck some overflow status registers in the model and 
> > found that we were overflowing at 1025 64-bit words (i.e. 8200 bytes).
> >
> > We have actually found a way to get packets to flow, but it isn't a good 
> > fix. When we turn the ADC clock frequency down to about 75 MHz, the packets 
> > begin to flow. There is an opinion in our group that the 10-GbE buffer 
> > overflow is a transient behavior, and, hence, if we slowly turn up the 
> > clock frequency after the ROACH has started up, packets may continue to 
> > flow in steady-state operation. We haven't tested this yet, though.
> >
> > Richard Black
> >
> > On Thu, Oct 23, 2014 at 8:39 PM, peter  wrote:
> > Hi Richard, & all,
> > As you said, the size of the isolated packet is changing every time. ):
> > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
> > 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
> > Did you 

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jason,

Thanks for your comments. While I agree that changing the ADC frequency
mid-operation is non-kosher and could result in uncertain behavior, the
issue at hand for us is to figure out what is going on with the PAPER model
that has been published on the CASPER wiki. This naturally won't be (and
shouldn't be) the end-all solution to this problem.

This is a reportedly fully-functional model that shouldn't require any
major changes in order to operate. However, this has clearly not been the
case in at least two independent situations (us and Peter). This begs the
question: what's so different about our use of PAPER?

We, at BYU, have made painstakingly sure that our IP addressing schemes,
switch ports, and scripts are all configured correctly (thanks to David
MacMahon for that, btw), but we still have hit the proverbial brick wall of
10-GbE overflow.  When I last corresponded with David, he explained that he
remembers having a similar issue before, but can't recall exactly what the
problem was.

In any case, the fact that turning down the ADC clock prior to start-up
prevents the 10-GbE core from overflowing is a major lead for us at BYU
(we've been spinning our wheels on this issue for several months now). By
no means are we proposing mid-run ADC clock modifications, but this appears
to be a very subtle (and quite sinister, in my opinion) bug.
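
One way to chase that lead is to watch a TX overflow counter while the clock is
brought up. A rough sketch using the corr KATCP wrapper; 'gbe0_tx_ofctr' is a
hypothetical register name, standing in for whatever overflow/status counter is
actually compiled into the model:

    # Sketch: poll an overflow counter so the point at which the 10GbE core
    # starts overflowing is easy to spot ('gbe0_tx_ofctr' is hypothetical).
    import corr, time

    fpga = corr.katcp_wrapper.FpgaClient('roach1', 7147)
    time.sleep(0.5)

    while True:
        ofctr = fpga.read_uint('gbe0_tx_ofctr')
        clk = fpga.est_brd_clk()      # rough estimate of the fabric clock in MHz
        print('clock ~%.1f MHz, tx overflow count %d' % (clk, ofctr))
        time.sleep(1)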

Any thoughts as to what might be going on?

Richard Black

On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley  wrote:

> Just a note that I don't recommend you adjust FPGA clock frequencies while
> it's operating. In theory, you should do a global reset in case the
> PLL/DLLs lose lock during clock transitions, in which case the logic could
> be in an uncertain state. But the Sysgen flow just does a single POR.
>
> A better solution might be to keep the 10GbE cores turned off (enable line
> pulled low) on initialisation, until things are configured (tgtap started
> etc), and only then enable the transmission using a SW register.
>
> Jason Manley
> CBF Manager
> SKA-SA
>
> Cell: +27 82 662 7726
> Work: +27 21 506 7300
>
> On 25 Oct 2014, at 10:34, peter  wrote:
>
> > Hi Richard, Joe, & all,
> > Thanks for your help, we can finally receive packets now!
> > As you pointed out, after enabling the ADC card and running the bof file
> > (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
> > f-engine init script at about 75 MHz (./paper_feng_init.rb roach1:0); that
> > allows the packets to flow, and then we can turn the frequency back up.
> > However, in my experiment the final ADC clock frequency only gets up to
> > 120 MHz, and our target final ADC frequency is 250 MHz. Maybe I need to run
> > the bof file at a higher ADC frequency first to end up with a steady 250 MHz
> > ADC clock frequency.
> > Why does it need to be initialised at a lower frequency and then turned up?
> > That doesn't make sense. Is the hardware going wrong? Since the yellow block
> > adc16x250-8 is designed for 250 MHz, it should be fine at 200 MHz or 250 MHz.
> > What final frequency did you reach in your experiment?
> > Any reply will be helpful!
> > Best Regards!
> > peter
> >
> >
> >
> >
> >
> >
> > At 2014-10-25 00:36:52, "Richard Black"  wrote:
> > Peter,
> >
> > That's correct. We downloaded the FPGA firmware and programmed the ROACH
> with the precompiled bitstream. When we didn't get any data beyond that
> single packet, we stuck some overflow status registers in the model and
> found that we were overflowing at 1025 64-bit words (i.e. 8200 bytes).
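
For reference, the arithmetic behind these numbers, checked against the 8K+512
TX buffer boundary Peter reports further down the thread:

    # Quick check of the sizes quoted in this thread.
    words = 1025                      # 64-bit words at the point of overflow
    overflow_bytes = words * 8        # 8 bytes per 64-bit word
    print(overflow_bytes)             # -> 8200, matching the packet payload size

    tx_buffer_limit = 8 * 1024 + 512  # the "8K+512" boundary Peter measured
    print(tx_buffer_limit)            # -> 8704, so an 8200-byte payload should fit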
> >
> > We have actually found a way to get packets to flow, but it isn't a good
> fix. When we turn the ADC clock frequency down to about 75 MHz, the packets
> begin to flow. There is an opinion in our group that the 10-GbE buffer
> overflow is a transient behavior, and, hence, if we slowly turn up the
> clock frequency after the ROACH has started up, packets may continue to
> flow in steady-state operation. We haven't tested this yet, though.
> >
> > Richard Black
> >
> > On Thu, Oct 23, 2014 at 8:39 PM, peter  wrote:
> > Hi Richard & all,
> > As you said, the size of the isolated packet changes every time. ):
> > tcpdump: verbose output suppressed, use -v or -vv for full protocol
> decode
> > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
> > 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
> > Did you download the PAPER gateware from the CASPER wiki
> > (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest) directly? How
> > does the PAPER bof file run on your system? Have you seen the overflow
> > before? I downloaded and installed the PAPER model as the website describes,
> > but the overflow shows up when I run paper_feng_netstat.rb.
> > Thanks for your information.
> > peter
> >
> >
> >
> >
> >
> > At 2014-10-24 09:59:12, "Richard Black"  wrote:
> > Peter,
> >
> > I don't mean to hijack your thread, but we've been having a very similar
> (and time-absorbing) issue with the PAPER f-engine FPGA firmware here at
> BYU. Out of curiosity, does this single packet that you're receiving i

Re: [casper] OS for development: Ubuntu 14.04?

2014-10-27 Thread Jack Hickish
Hi Adam,

I'm using Ubuntu 14.04 and things seem to work as they should, as long
as you follow the instructions on that wiki page. Though not used by
the toolflow, Vivado 2014.3 officially supports Ubuntu 14.04, if
that's a concern to you.

Having said that, I think if I were to go through the setup process
again, on a machine that wasn't my everyday desktop, I'd probably go
for one of the free RedHat-like distros, such as CentOS, just to try and
minimize unforeseen headaches (I've occasionally had Ubuntu updates
break things, though nothing more serious than having to go back and
redo some of the steps in the wiki).

Good luck!
Jack


On 27 October 2014 14:17, Schoenwald, Adam (GSFC-5640)[GSFC -  HIGHER
EDUCATION]  wrote:
> Hi everyone,
>
> I’m just getting started with a ROACH2 board and am about to
> set up a development station. So far I have been using Windows 7 and
> encountering issues. A new computer will be coming in soon, but has Ubuntu
> 14.04 LTS on it. Has anyone had any issues with this? Should I be planning
> to wipe it and install 12.04, or are the compatibility issues minimal and
> easily resolved?
>
>
>
> My plan was to just follow the instructions at
> https://casper.berkeley.edu/wiki/MSSGE_Setup_with_Xilinx_14.x_and_Matlab_2012b
> but then I saw there were some problems with 13.03
> (http://www.mail-archive.com/casper%40lists.berkeley.edu/msg04260.html).
>
>
>
> Any input here would be helpful,
>
>
>
> Thanks,
>
> Adam Schoenwald



[casper] OS for development: Ubuntu 14.04?

2014-10-27 Thread Schoenwald, Adam (GSFC-5640)[GSFC - HIGHER EDUCATION]
Hi everyone,
I'm just getting started with a ROACH2 board and am about to 
set up a development station. So far I have been using Windows 7 and 
encountering issues. A new computer will be coming in soon, but has Ubuntu 
14.04 LTS on it. Has anyone had any issues with this? Should I be planning to 
wipe it and install 12.04, or are the compatibility issues minimal and easily 
resolved?

My plan was to just follow the instructions at 
https://casper.berkeley.edu/wiki/MSSGE_Setup_with_Xilinx_14.x_and_Matlab_2012b 
but then I saw there were some problems with 13.03 
(http://www.mail-archive.com/casper%40lists.berkeley.edu/msg04260.html).

Any input here would be helpful,

Thanks,
Adam Schoenwald


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
Just a note that I don't recommend you adjust FPGA clock frequencies while it's 
operating. In theory, you should do a global reset in case the PLL/DLLs lose 
lock during clock transitions, in which case the logic could be in an uncertain 
state. But the Sysgen flow just does a single POR. 

A better solution might be to keep the 10GbE cores turned off (enable line 
pulled low) on initialisation, until things are configured (tgtap started etc), 
and only then enable the transmission using a SW register.
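
A minimal sketch of that start-up ordering, using the corr KATCP wrapper. The
boffile name, the 'eth_enable' register, and the MAC/IP/port values are
assumptions here; 'eth_enable' stands for whatever software register gates the
10GbE core's enable line in a given design:

    # Sketch only: hold the 10GbE core off until tgtap is configured, then
    # enable transmission via a software register ('eth_enable' is assumed).
    import corr, time

    fpga = corr.katcp_wrapper.FpgaClient('roach1', 7147)
    time.sleep(0.5)

    fpga.progdev('paper_feng.bof')      # program the FPGA
    fpga.write_int('eth_enable', 0)     # keep transmission gated off

    # configure the core: start tgtap on 'gbe0' with a MAC, IP and port
    fpga.tap_start('gbe0', 'gbe0', 0x020200000001, 0x0a0a0201, 8511)

    # ... remaining configuration (destination IPs, sync, etc.) ...

    fpga.write_int('eth_enable', 1)     # only now let packets flow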

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 25 Oct 2014, at 10:34, peter  wrote:

> Hi Richard, Joe, & all,
> Thanks for your help, we can finally receive packets now!
> As you pointed out, after enabling the ADC card and running the bof file 
> (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the 
> f-engine init script at about 75 MHz (./paper_feng_init.rb roach1:0); that 
> allows the packets to flow, and then we can turn the frequency back up. 
> However, in my experiment the final ADC clock frequency only gets up to 
> 120 MHz, and our target final ADC frequency is 250 MHz. Maybe I need to run 
> the bof file at a higher ADC frequency first to end up with a steady 250 MHz 
> ADC clock frequency.
> Why does it need to be initialised at a lower frequency and then turned up? 
> That doesn't make sense. Is the hardware going wrong? Since the yellow block 
> adc16x250-8 is designed for 250 MHz, it should be fine at 200 MHz or 250 MHz. 
> What final frequency did you reach in your experiment? 
> Any reply will be helpful!
> Best Regards!
> peter
> 
> 
> 
> 
> 
> 
> At 2014-10-25 00:36:52, "Richard Black"  wrote:
> Peter,
> 
> That's correct. We downloaded the FPGA firmware and programmed the ROACH with 
> the precompiled bitstream. When we didn't get any data beyond that single 
> packet, we stuck some overflow status registers in the model and found that 
> we were overflowing at 1025 64-bit words (i.e. 8200 bytes).
> 
> We have actually found a way to get packets to flow, but it isn't a good fix. 
> When we turn the ADC clock frequency down to about 75 MHz, the packets begin 
> to flow. There is an opinion in our group that the 10-GbE buffer overflow is 
> a transient behavior, and, hence, if we slowly turn up the clock frequency 
> after the ROACH has started up, packets may continue to flow in steady-state 
> operation. We haven't tested this yet, though.
> 
> Richard Black
> 
> On Thu, Oct 23, 2014 at 8:39 PM, peter  wrote:
> Hi Richard & all,
> As you said, the size of the isolated packet changes every time. ):
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
> 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
> Did you download the PAPER gateware from the CASPER wiki 
> (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest) directly? How 
> does the PAPER bof file run on your system? Have you seen the overflow before? 
> I downloaded and installed the PAPER model as the website describes, but the 
> overflow shows up when I run paper_feng_netstat.rb.
> Thanks for your information.
> peter
> 
> 
> 
> 
> 
> At 2014-10-24 09:59:12, "Richard Black"  wrote:
> Peter,
> 
> I don't mean to hijack your thread, but we've been having a very similar (and 
> time-absorbing) issue with the PAPER f-engine FPGA firmware here at BYU. Out 
> of curiosity, does this single packet that you're receiving in tcpdump change 
> in size every time you reprogram the ROACH? We've seen this happen, and we're 
> pretty sure that this isolated packet is the 10-GbE buffer flushing when the 
> 10-GbE core is initialized (i.e. the enable signal isn't sync'd with the 
> start of new packet).
> 
> Regardless of whether we have the same issue, I'm very interested to see this 
> problem's resolution.
> 
> Good luck,
> 
> Richard Black
> 
> On Thu, Oct 23, 2014 at 7:50 PM, peter  wrote:
> Hi Joe & all,
> I noticed something this morning: there is one packet sent out from the ROACH 
> when I run the PAPER model, which I captured with tcpdump on the HPC:
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
> 09:04:02.757813 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 6456
> 
> The length is not the expected 8200+8, and it is well below the full TX buffer 
> size of 8K+512. The other packets are stopped by the overflow.
> I have tried changing the tutorial 2 packet size to 8200 bytes and to 8K+512 
> bytes; both transfer fine. I also made sure the boundary really is 8K+512, 
> because when I change the size to 8K+513 bytes no data is sent at all. So the 
> packet received this morning, with length 6456, is well under the limit. But 
> what causes the other packets to overflow? 
> Any suggestions would be helpful!
> peter
> 
> 
> 
> 
> 
> 
> At 2014-10-24 00:37:14, "Kujawski, Joseph"  wrote:
> Peter,
> 
> By cadence of the broadcast, I mean how often are the 8200 byte packets sent. 

Re: [casper] about boffile download using tut3.py

2014-10-27 Thread Marc Welz
On Sat, Oct 25, 2014 at 2:33 PM, Wang Jinqing  wrote:

> For tut1 I can telnet to the roach2, then use a command like
>
> nc -w 2 -q 2 192.168.111.10  < name.bof
>
> to download the bof file. But tut3.py doesn't seem to work that way. What
> should I do?
>

I would try using the same approach as for tut 1.


>  By the way, is there a Linux system on the roach2?
>

Yes, there are flash chips soldered onto the roach. They contain several
partitions, and one of them is a writable filesystem.


> I can't even find an SD card on the board.
>
> error information:
>
> 192.168.40.60: ?progdev tut3_2014_Oct_24_0848.bof
>
>
>
> 192.168.40.60: #log info 992952462866 raw attempting\_to\_empty\_fpga
>
> 192.168.40.60: #log info 992952462866 raw
> attempting\_to\_program\_tut3_2014_Oct_24_0848.bof
>
> 192.168.40.60: #log error 992952462867 raw
> unable\_to\_open\_boffile\_./tut3_2014_Oct_24_0848.bof:\_No\_such\_file\_or\_directory
>

Progdev requires a file on the ROACH's local filesystem - if the bof file
hasn't been transferred/uploaded to it previously, it won't be found. Use the
upload* requests to transfer the bof file onto the roach
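
One workable route is to copy the bof onto the ROACH2's own filesystem first and
then progdev it by name; a sketch only, with the address and filename taken from
the log above as placeholders (the exact upload* request syntax is not shown
here, and the /boffiles path is an assumption):

    # Sketch: copy the boffile onto the ROACH2 itself (it runs Linux with a
    # writable flash filesystem) into the directory the borph server searches -
    # the log above shows it looking for ./tut3_... relative to its own working
    # directory; /boffiles is a common choice but is an assumption here.
    #
    #   scp tut3_2014_Oct_24_0848.bof root@192.168.40.60:/boffiles/
    #
    import corr, time

    fpga = corr.katcp_wrapper.FpgaClient('192.168.40.60', 7147)
    time.sleep(0.5)
    fpga.progdev('tut3_2014_Oct_24_0848.bof')   # file exists locally now, so progdev can find it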

regards

marc