Jack,

I appreciate your help. I tend to agree that the issue is likely a hardware
configuration problem, but we have been trying to match it as closely as
possible.

We do feed a 1-PPS signal into the board, but I'm hazy on the details of
the other pulse parameters. I'll look into that as well.

So, if I understand you correctly, you believe that the sync pulse is
reaching the ethernet interfaces *after* the cores are enabled? If that is
the case, couldn't we delay enabling the 10-GbE cores for another second to
fix it? This might be a quick way to test that theory, but please correct
me if I've misunderstood.
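
A minimal sketch of that test, assuming the start-up order from Jack's summary quoted below. `StubBoard`, `bring_up`, and the register names (`eth_enable`, `sync_arm`, `eth_reset`) are hypothetical placeholders, not the names used in the PAPER design or its init script; the stub only records writes so the ordering can be checked without hardware.

```python
import time


class StubBoard:
    """Stand-in for a real FPGA control client; it only records register writes."""

    def __init__(self):
        self.log = []

    def write_int(self, name, value):
        # Hypothetical register write; a real client would talk to the ROACH.
        self.log.append((name, value))


def bring_up(board, pps_wait_s=1.0, extra_delay_s=1.0, sleep=time.sleep):
    """Run the four start-up steps, holding the cores off for extra_delay_s."""
    board.write_int("eth_enable", 0)   # 1. disable all ethernet interfaces
    board.write_int("sync_arm", 1)     # 2. arm the sync generator ...
    board.write_int("sync_arm", 0)
    sleep(pps_wait_s)                  #    ... and wait one second for the PPS
    sleep(extra_delay_s)               #    extra wait under test
    board.write_int("eth_reset", 1)    # 3. reset the ethernet interfaces
    board.write_int("eth_reset", 0)
    board.write_int("eth_enable", 1)   # 4. enable the interfaces


if __name__ == "__main__":
    board = StubBoard()
    bring_up(board, sleep=lambda s: None)  # skip real sleeping in the dry run
    for name, value in board.log:
        print(name, value)
```

If the overflow disappears with a longer `extra_delay_s`, that would support the idea that the sync arrives after the cores are enabled.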

Richard Black

On Mon, Oct 27, 2014 at 11:05 AM, Jack Hickish <jackhick...@gmail.com>
wrote:

> Hi Richard,
>
> I've just had a very brief look at the design / software, so take this
> email with a pinch of salt, but on the off-chance you haven't checked
> this....
>
> It looks like the PAPER F-engine setup on running the start script for
> software / firmware out of the box is --
>
> 1. Disable all ethernet interfaces
> 2. Arm sync generator, wait 1 second for PPS
> 3. Reset ethernet interfaces
> 4. Enable interfaces.
>
> These four steps seem like they should be safe, yet the behaviour
> you're describing sounds like the design is midway sending a packet,
> then gets a sync, gives up sending an end-of-frame and starts sending
> a new packet, at which point the old packet + the new packet =
> overflow.
>
> Knowing that the design works for PAPER, I'm wondering whether, after
> arming the sync generator, syncs are flowing through the design before
> the ethernet interface is enabled. Do you have a PPS-like input? The
> fengine initialisation script seems to wait for a second after arming,
> but if your sync input is something significantly slower, you could
> have problems.
>
> I'm sceptical about this theory (I think the symptoms would be lots of
> OK packets when you brought up the interface, and then it dying when
> the sync arrives, rather than a single good packet like you're
> seeing), but if the firmware + software really is the same as that
> working with paper, and the wiki hasn't just got out of sync with the
> paper devs, perhaps the problem is in your hardware setup....
>
> Cheers,
> Jack
>
> On 27 October 2014 16:38, Richard Black <aeldstes...@gmail.com> wrote:
> > By "enable" port, I assume you mean the "valid" port. I've been looking at
> > the PAPER model carefully for some time now, and that is how it operates.
> > It has a gated valid signal with a software register on each 10-GbE core.
> >
> > Once again, this is not our model. This is one made available on the
> > CASPER wiki and run without modifications.
> >
> > Richard Black
> >
> > On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley <jman...@ska.ac.za> wrote:
> >>
> >> I suspect the 10GbE core's input FIFO is overflowing on startup. One key
> >> thing with this core is to ensure that your design keeps the enable port
> >> held low until the core's been configured. The core becomes unusable once
> >> the TX FIFO overflows. This has been a long-standing bug (my emails trace
> >> back to 2009) but it's so easy to work around that I don't think anyone's
> >> bothered looking into fixing it.
> >>
> >> Jason Manley
> >> CBF Manager
> >> SKA-SA
> >>
> >> Cell: +27 82 662 7726
> >> Work: +27 21 506 7300
> >>
> >> On 27 Oct 2014, at 18:25, Richard Black <aeldstes...@gmail.com> wrote:
> >>
> >> > Jason,
> >> >
> >> > Thanks for your comments. While I agree that changing the ADC frequency
> >> > mid-operation is non-kosher and could result in uncertain behavior, the
> >> > issue at hand for us is to figure out what is going on with the PAPER
> >> > model that has been published on the CASPER wiki. This naturally won't
> >> > be (and shouldn't be) the end-all solution to this problem.
> >> >
> >> > This is a reportedly fully-functional model that shouldn't require any
> >> > major changes in order to operate. However, this has clearly not been
> >> > the case in at least two independent situations (us and Peter). This
> >> > raises the question: what's so different about our use of PAPER?
> >> >
> >> > We, at BYU, have made painstakingly sure that our IP addressing schemes,
> >> > switch ports, and scripts are all configured correctly (thanks to David
> >> > MacMahon for that, btw), but we still have hit the proverbial brick wall
> >> > of 10-GbE overflow. When I last corresponded with David, he explained
> >> > that he remembers having a similar issue before, but can't recall
> >> > exactly what the problem was.
> >> >
> >> > In any case, the fact that turning down the ADC clock prior to start-up
> >> > prevents the 10-GbE core from overflowing is a major lead for us at BYU
> >> > (we've been spinning our wheels on this issue for several months now).
> >> > By no means are we proposing mid-run ADC clock modifications, but this
> >> > appears to be a very subtle (and quite sinister, in my opinion) bug.
> >> >
> >> > Any thoughts as to what might be going on?
> >> >
> >> > Richard Black
> >> >
> >> > On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley <jman...@ska.ac.za> wrote:
> >> > Just a note that I don't recommend you adjust FPGA clock frequencies
> >> > while it's operating. In theory, you should do a global reset in case
> >> > the PLL/DLLs lose lock during clock transitions, in which case the logic
> >> > could be in an uncertain state. But the Sysgen flow just does a single
> >> > POR.
> >> >
> >> > A better solution might be to keep the 10GbE cores turned off (enable
> >> > line pulled low) on initialisation, until things are configured (tgtap
> >> > started etc), and only then enable the transmission using a SW register.
> >> >
> >> > Jason Manley
> >> > CBF Manager
> >> > SKA-SA
> >> >
> >> > Cell: +27 82 662 7726
> >> > Work: +27 21 506 7300
> >> >
> >> > On 25 Oct 2014, at 10:34, peter <peterniu...@163.com> wrote:
> >> >
> >> > > Hi Richard, Joe, & all,
> >> > > Thanks for your help. It can finally receive packets now!
> >> > > As you pointed out, after enabling the ADC card and running the bof
> >> > > file (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need
> >> > > to run the fengine init script (./paper_feng_init.rb roach1:0) at
> >> > > about 75 MHz. That allows the packets to flow, and then we can turn
> >> > > the frequency up. However, the highest final ADC clock frequency I
> >> > > have reached in my experiment is 120 MHz. Our target ADC frequency is
> >> > > 250 MHz. Maybe I need to run the bof file at a higher ADC frequency
> >> > > first to get a steady final ADC clock frequency of 250 MHz.
> >> > > Why does it need to be initialised at a lower frequency and then
> >> > > turned up? That doesn't make sense. Is the hardware going wrong? As
> >> > > the yellow block adc16*250-8 is designed for 250 MHz, it should be OK
> >> > > at 200 MHz or 250 MHz. What is the final frequency in your experiment?
> >> > > Any reply will be helpful!
> >> > > Best Regards!
> >> > > peter
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > At 2014-10-25 00:36:52, "Richard Black" <aeldstes...@gmail.com> wrote:
> >> > > Peter,
> >> > >
> >> > > That's correct. We downloaded the FPGA firmware and programmed the
> >> > > ROACH with the precompiled bitstream. When we didn't get any data
> >> > > beyond that single packet, we stuck some overflow status registers in
> >> > > the model and found that we were overflowing at 1025 64-bit words
> >> > > (i.e. 8200 bytes).
> >> > >
> >> > > We have actually found a way to get packets to flow, but it isn't a
> >> > > good fix. When we turn the ADC clock frequency down to about 75 MHz,
> >> > > the packets begin to flow. There is an opinion in our group that the
> >> > > 10-GbE buffer overflow is a transient behavior, and, hence, if we
> >> > > slowly turn up the clock frequency after the ROACH has started up,
> >> > > packets may continue to flow in steady-state operation. We haven't
> >> > > tested this yet, though.
> >> > >
> >> > > Richard Black
> >> > >
> >> > > On Thu, Oct 23, 2014 at 8:39 PM, peter <peterniu...@163.com> wrote:
> >> > > Hi Richard, & All,
> >> > > As you said, the size of the isolated packet changes every time:
> >> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> >> > > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
> >> > > 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
> >> > > Did you download the PAPER gateware from the CASPER wiki
> >> > > (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest) directly?
> >> > > How does the PAPER bof file run on your system? Have you met the
> >> > > overflow before? I downloaded and installed the PAPER model as the
> >> > > website says, but the overflow shows up when I run
> >> > > paper_feng_netstat.rb.
> >> > > Thanks for your information.
> >> > > peter
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > At 2014-10-24 09:59:12, "Richard Black" <aeldstes...@gmail.com> wrote:
> >> > > Peter,
> >> > >
> >> > > I don't mean to hijack your thread, but we've been having a very
> >> > > similar (and time-absorbing) issue with the PAPER f-engine FPGA
> >> > > firmware here at BYU. Out of curiosity, does this single packet that
> >> > > you're receiving in tcpdump change in size every time you reprogram
> >> > > the ROACH? We've seen this happen, and we're pretty sure that this
> >> > > isolated packet is the 10-GbE buffer flushing when the 10-GbE core is
> >> > > initialized (i.e. the enable signal isn't sync'd with the start of new
> >> > > packet).
> >> > >
> >> > > Regardless of whether we have the same issue, I'm very interested to
> >> > > see this problem's resolution.
> >> > >
> >> > > Good luck,
> >> > >
> >> > > Richard Black
> >> > >
> >> > > On Thu, Oct 23, 2014 at 7:50 PM, peter <peterniu...@163.com> wrote:
> >> > > Hi Joe, & All,
> >> > > I found something this morning: one packet is sent out from the ROACH
> >> > > when I run the PAPER model. This is what tcpdump on the HPC shows:
> >> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> >> > > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
> >> > > 09:04:02.757813 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 6456
> >> > >
> >> > > The length is not the expected 8200+8, and it is far from the full TX
> >> > > buffer size of 8K+512. The other packets are stopped by the overflow.
> >> > > I have tried changing the tutorial 2 packet size to 8200 bytes and to
> >> > > 8K+512 bytes, and both transfer fine. I also made sure that the
> >> > > boundary size is indeed 8K+512, because when I change the size to
> >> > > 8K+513 bytes, no data is sent. So the packet received this morning,
> >> > > with length 6456, is well under the limit. But what caused the other
> >> > > packets to overflow?
> >> > > Any suggestions would be helpful!
> >> > > peter
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > At 2014-10-24 00:37:14, "Kujawski, Joseph" <jkujaw...@siena.edu>
> >> > > wrote:
> >> > > Peter,
> >> > >
> >> > > By cadence of the broadcast, I mean how often are the 8200 byte
> >> > > packets sent.  Basically, I would like to determine how close your
> >> > > system is to the maximum data rate of the 10Gbe.
> >> > >
> >> > > Also, it would be instructive to know the following:
> >> > >
> >> > > 1) What transmission protocol are you using? (the One_GBe module uses
> >> > > UDP; are you using that or TCP?)
> >> > >
> >> > > 2) What NICs are you using on the receive side?
> >> > >
> >> > > At this time, I am working on the theory that the issue is related to
> >> > > the network itself not being able to sustain the data volume you are
> >> > > generating and would like to get a better idea of how much data is
> >> > > generated and how often it is sent.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > -Joe Kujawski
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Oct 23, 2014 at 12:01 PM, peter <peterniu...@163.com> wrote:
> >> > > Hi Joe,
> >> > > 1. Yes, actually we have 3 ROACH2 boards with 8 NICs.
> >> > > 2. Well, each ROACH has 4 of its 8 NICs connected directly to a PC;
> >> > > the other 4 connect to a 10 Gb switch. I have connected the SFP cable
> >> > > (which should go to the switch) directly to the PC to see whether data
> >> > > comes out, but no data comes out because of the overflow.
> >> > > 3. Could you give an example of the cadence of a broadcast? I am not
> >> > > familiar with this.
> >> > > It does indeed require more data, but each packet is limited to 8200
> >> > > bytes.
> >> > > Thanks for your reply!
> >> > > peter
> >> > > --
> >> > > Sent from NetEase Mail for Android
> >> > >
> >> > >
> >> > >
> >> > > On 2014-10-23 23:16 , Kujawski, Joseph Wrote:
> >> > >
> >> > > Peter,
> >> > >
> >> > > I am downloading it now.  Can you answer these questions:
> >> > >
> >> > > 1) Do you have a standard PAPER architecture with two ROACH boards
> >> > > each containing 8 10GBe ports?
> >> > >
> >> > > 2) Please describe your internet architecture i.e. how are each of the
> >> > > ports connected.
> >> > >
> >> > > 3) What is the cadence of each broadcast?
> >> > >
> >> > > My current suspicion is that you are generating more data than you can
> >> > > push through your interface(s).  It may be that the higher data volume
> >> > > in your implementation requires more of a network infrastructure than
> >> > > was required by the original system.
> >> > >
> >> > > -Joe Kujawski
> >> > >
> >> > > On Thu, Oct 23, 2014 at 11:01 AM, peter <peterniu...@163.com> wrote:
> >> > > This is a little big. roach2_tl8511port is the one that can send data
> >> > > normally. The environment should be OK now; last time the crc32x64_con
> >> > > may have been missing.
> >> > > Good night!
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > At 2014-10-23 22:52:54, "Kujawski, Joseph" <jkujaw...@siena.edu>
> >> > > wrote:
> >> > > Peter,
> >> > >
> >> > > 1) For reference, here is a list of the errors:
> >> > >
> >> > > --------------------------------- Version Log ----------------------------------
> >> > > Version                                 Path
> >> > > System Generator 14.6                   C:/Xilinx/14.6/ISE_DS/ISE/sysgen
> >> > > Matlab 8.0.0.783 (R2012b)               C:/MATLAB/R2012b
> >> > > ISE                                     C:/Xilinx/14.6/ISE_DS/ISE
> >> > >
> >> > > --------------------------------------------------------------------------------
> >> > > Summary of Errors:
> >> > > Error 0001: Could not find the configuration m-function "crc32x64_con...
> >> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose1/crc/crc32x64'
> >> > > Error 0002: Could not find the configuration m-function "crc32x64_con...
> >> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose2/crc/crc32x64'
> >> > > Error 0003: Could not find the configuration m-function "crc32x64_con...
> >> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose3/crc/crc32x64'
> >> > > Error 0004: Could not find the configuration m-function "crc32x64_con...
> >> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose4/crc/crc32x64'
> >> > >
> >> > > --------------------------------------------------------------------------------
> >> > >
> >> > > 2) Your email did not have an attachment.  I have more comments, but
> >> > > wanted to let you know about the attachment before you went to bed.
> >> > >
> >> > > -Joe Kujawski
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Oct 23, 2014 at 10:33 AM, peter <peterniu...@163.com> wrote:
> >> > >
> >> > > Hi Joe,
> >> > > Thanks for your warm help!
> >> > > What error shows up when you compile my model? Is some file missing? I
> >> > > will pack up all my files and send them to you as an attachment. And
> >> > > how about the PAPER one? Did it report an overflow message? Ruby needs
> >> > > to be installed and used to control it.
> >> > > Leaving the PAPER model aside, let's talk about the 10Gb block on ROACH
> >> > > v2. Your model is good for seeing Data_valid, EOF, etc., but I don't
> >> > > know how to add it to the PAPER model, as I realize that PAPER
> >> > > generates data valid and EOF from a counter. So I don't know where to
> >> > > put the model. For example, if I put the data_valid or EOF control
> >> > > process you designed on the 10Gbe port in the PAPER model, then I
> >> > > think it is equivalent to using a 10Gbe block instead of the One_GBe
> >> > > block in yours. *_*!!
> >> > > I changed the number 50 to 1025 in tutorial 2 to make the packet size
> >> > > 8200 bytes, and it seems to transfer well without error. That is at a
> >> > > frequency of 1.3*1025, which means one packet is sent every 1.3*1025
> >> > > clocks. I found the boundary frequency of 1.3*1025 through a lot of
> >> > > testing. When I change the frequency to lower than 1.3*1025, the first
> >> > > few packets are sent out, but then the overflow comes. I think it is
> >> > > the transfer frequency that determines the overflow.
> >> > > Thanks for your suggestions and advice!
> >> > > peter
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > At 2014-10-23 00:14:29, "Kujawski, Joseph" <jkujaw...@siena.edu>
> >> > > wrote:
> >> > >
> >> > > Peter,
> >> > >
> >> > > I find that I can not compile and simulate your design, however,
> >> > > looking at the code structure, I can't tell if tx_val and tx_EOF are
> >> > > high at the same time:
> >> > >
> >> > >
> >> > >
> >> > > Also, I modified the design to send out a packet of size 8200 once per
> >> > > second (model attached) and added a register that latches the GBE
> >> > > tx_aful and tx_overrun lines so they can be read through the KATCP
> >> > > interface. Modify the model to remove the oscilloscope and Xilinx out
> >> > > gateways before compiling it for your platform.  Note that this model
> >> > > does not check for overflow, though the latch will let you know if you
> >> > > have had one.
> >> > >
> >> > > Let me know how this works for you.
> >> > >
> >> > > -Joe Kujawski
> >> > > --
> >> > > **************************************
> >> > > * Joe Kujawski
> >> > > * Siena College
> >> > > * Dept. of Physics and Astronomy, RB 113
> >> > > * 515 Loudon Road
> >> > > * Loudonville, NY 12211-1462
> >> > > *
> >> > > * Email: jkujaw...@siena.edu
> >> > > * Phone: 518-867-7509  <-- NEW NUMBER
> >> > > * Fax: 518-783-2986
> >> > > **************************************
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > Cloud attachment sent from NetEase 163 Mail
> >> > >
> >> > > paperfengine.zip (126.71M, expires 7 November 2014, 22:58)
> >> > > Download
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >> >
> >>
> >
>
