Jack, I appreciate your help. I tend to agree that the issue is likely a hardware configuration problem, but we have been trying to match the PAPER hardware setup as closely as possible.
We do feed a 1-PPS signal into the board, but I'm hazy on the details of the other pulse parameters. I'll look into that as well.

So, if I understand you correctly, you believe that the sync pulse is reaching the Ethernet interfaces *after* the cores are enabled? If that is the case, couldn't we delay enabling the 10-GbE cores for another second to fix it? This might be a quick way to test that theory, but please correct me if I've misunderstood.

Richard Black

On Mon, Oct 27, 2014 at 11:05 AM, Jack Hickish <jackhick...@gmail.com> wrote:
> Hi Richard,
>
> I've just had a very brief look at the design / software, so take this
> email with a pinch of salt, but on the off-chance you haven't checked
> this....
>
> It looks like the PAPER F-engine setup on running the start script for
> software / firmware out of the box is --
>
> 1. Disable all ethernet interfaces
> 2. Arm sync generator, wait 1 second for PPS
> 3. Reset ethernet interfaces
> 4. Enable interfaces.
>
> These four steps seem like they should be safe, yet the behaviour
> you're describing sounds like the design is midway sending a packet,
> then gets a sync, gives up sending an end-of-frame and starts sending
> a new packet, at which point the old packet + the new packet =
> overflow.
>
> Knowing that the design works for PAPER, my wondering is whether after
> arming the sync generator syncs are flowing through the design before
> the ethernet interface is enabled. Do you have a PPS-like input? The
> fengine initialisation script seems to wait for a second after arming,
> but if your sync input is something significantly slower, you could
> have problems.
>
> I'm sceptical about this theory (I think the symptoms would be lots of
> OK packets when you brought up the interface, and then it dying when
> the sync arrives, rather than a single good packet like you're
> seeing), but if the firmware + software really is the same as that
> working with PAPER, and the wiki hasn't just got out of sync with the
> PAPER devs, perhaps the problem is in your hardware setup....
>
> Cheers,
> Jack
>
> On 27 October 2014 16:38, Richard Black <aeldstes...@gmail.com> wrote:
> > By "enable" port, I assume you mean the "valid" port. I've been looking
> > at the PAPER model carefully for some time now, and that is how it
> > operates. It has a gated valid signal with a software register on each
> > 10-GbE core.
> >
> > Once again, this is not our model. This is one made available on the
> > CASPER wiki and run without modifications.
> >
> > Richard Black
> >
> > On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley <jman...@ska.ac.za> wrote:
> >>
> >> I suspect the 10GbE core's input FIFO is overflowing on startup. One key
> >> thing with this core is to ensure that your design keeps the enable port
> >> held low until the core's been configured. The core becomes unusable once
> >> the TX FIFO overflows. This has been a long-standing bug (my emails trace
> >> back to 2009) but it's so easy to work around that I don't think anyone's
> >> bothered looking into fixing it.
> >>
> >> Jason Manley
> >> CBF Manager
> >> SKA-SA
> >>
> >> Cell: +27 82 662 7726
> >> Work: +27 21 506 7300
> >>
> >> On 27 Oct 2014, at 18:25, Richard Black <aeldstes...@gmail.com> wrote:
> >>
> >> > Jason,
> >> >
> >> > Thanks for your comments. While I agree that changing the ADC frequency
> >> > mid-operation is non-kosher and could result in uncertain behavior, the
> >> > issue at hand for us is to figure out what is going on with the PAPER
> >> > model that has been published on the CASPER wiki.
> >> > This naturally won't be (and shouldn't be) the end-all solution to this
> >> > problem.
> >> >
> >> > This is a reportedly fully-functional model that shouldn't require any
> >> > major changes in order to operate. However, this has clearly not been the
> >> > case in at least two independent situations (us and Peter). This begs the
> >> > question: what's so different about our use of PAPER?
> >> >
> >> > We, at BYU, have made painstakingly sure that our IP addressing schemes,
> >> > switch ports, and scripts are all configured correctly (thanks to David
> >> > MacMahon for that, btw), but we still have hit the proverbial brick wall
> >> > of 10-GbE overflow. When I last corresponded with David, he explained
> >> > that he remembers having a similar issue before, but can't recall exactly
> >> > what the problem was.
> >> >
> >> > In any case, the fact that turning down the ADC clock prior to start-up
> >> > prevents the 10-GbE core from overflowing is a major lead for us at BYU
> >> > (we've been spinning our wheels on this issue for several months now).
> >> > By no means are we proposing mid-run ADC clock modifications, but this
> >> > appears to be a very subtle (and quite sinister, in my opinion) bug.
> >> >
> >> > Any thoughts as to what might be going on?
> >> >
> >> > Richard Black
> >> >
> >> > On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley <jman...@ska.ac.za> wrote:
> >> > Just a note that I don't recommend you adjust FPGA clock frequencies
> >> > while it's operating. In theory, you should do a global reset in case the
> >> > PLL/DLLs lose lock during clock transitions, in which case the logic
> >> > could be in an uncertain state. But the Sysgen flow just does a single
> >> > POR.
> >> >
> >> > A better solution might be to keep the 10GbE cores turned off (enable
> >> > line pulled low) on initialisation, until things are configured (tgtap
> >> > started etc), and only then enable the transmission using a SW register.
> >> >
> >> > Jason Manley
> >> > CBF Manager
> >> > SKA-SA
> >> >
> >> > Cell: +27 82 662 7726
> >> > Work: +27 21 506 7300
> >> >
> >> > On 25 Oct 2014, at 10:34, peter <peterniu...@163.com> wrote:
> >> >
> >> > > Hi Richard, Joe, & all,
> >> > > Thanks for your help. It can finally receive packets now!
> >> > > As you pointed out, after enabling the ADC card and running the bof
> >> > > file (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need
> >> > > to run the F-engine init script (./paper_feng_init.rb roach1:0) at
> >> > > about 75 MHz. That allows the packet transfer. Then we can turn the
> >> > > frequency higher. However, the final ADC clock frequency only gets up
> >> > > to 120 MHz in my experiment. Our final ADC frequency standard is
> >> > > 250 MHz. Maybe I need to run the bof file at a higher ADC frequency
> >> > > first to get a final steady 250 MHz ADC clock frequency.
> >> > > Why does it need to be initialised at a lower frequency and then
> >> > > turned up? That doesn't make sense. Is the hardware going wrong? As the
> >> > > yellow block adc16*250-8 is designed for 250 MHz, it should be OK at
> >> > > 200 MHz or 250 MHz. What final frequency did you reach in your
> >> > > experiment?
> >> > > Any reply will be helpful!
> >> > > Best Regards!
> >> > > peter
> >> > >
> >> > > At 2014-10-25 00:36:52, "Richard Black" <aeldstes...@gmail.com> wrote:
> >> > > Peter,
> >> > >
> >> > > That's correct. We downloaded the FPGA firmware and programmed the
> >> > > ROACH with the precompiled bitstream. When we didn't get any data
> >> > > beyond that single packet, we stuck some overflow status registers in
> >> > > the model and found that we were overflowing at 1025 64-bit words
> >> > > (i.e. 8200 bytes).
> >> > >
> >> > > We have actually found a way to get packets to flow, but it isn't a
> >> > > good fix. When we turn the ADC clock frequency down to about 75 MHz,
> >> > > the packets begin to flow.
> >> > > There is an opinion in our group that the 10-GbE buffer overflow is a
> >> > > transient behavior, and, hence, if we slowly turn up the clock
> >> > > frequency after the ROACH has started up, packets may continue to
> >> > > flow in steady-state operation. We haven't tested this yet, though.
> >> > >
> >> > > Richard Black
> >> > >
> >> > > On Thu, Oct 23, 2014 at 8:39 PM, peter <peterniu...@163.com> wrote:
> >> > > Hi Richard, & All,
> >> > > As you said, the size of the isolated packet changes every time:
> >> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> >> > > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
> >> > > 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
> >> > > Did you download the PAPER gateware from the CASPER wiki
> >> > > (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest) directly?
> >> > > How does the PAPER bof file run on your system? Have you met the
> >> > > overflow before? I downloaded and installed the PAPER model as the
> >> > > website says, but the overflow shows up when I run
> >> > > paper_feng_netstat.rb.
> >> > > Thanks for your information.
> >> > > peter
> >> > >
> >> > > At 2014-10-24 09:59:12, "Richard Black" <aeldstes...@gmail.com> wrote:
> >> > > Peter,
> >> > >
> >> > > I don't mean to hijack your thread, but we've been having a very
> >> > > similar (and time-absorbing) issue with the PAPER f-engine FPGA
> >> > > firmware here at BYU. Out of curiosity, does this single packet that
> >> > > you're receiving in tcpdump change in size every time you reprogram
> >> > > the ROACH? We've seen this happen, and we're pretty sure that this
> >> > > isolated packet is the 10-GbE buffer flushing when the 10-GbE core is
> >> > > initialized (i.e. the enable signal isn't sync'd with the start of a
> >> > > new packet).
> >> > >
> >> > > Regardless of whether we have the same issue, I'm very interested to
> >> > > see this problem's resolution.
> >> > >
> >> > > Good luck,
> >> > >
> >> > > Richard Black
> >> > >
> >> > > On Thu, Oct 23, 2014 at 7:50 PM, peter <peterniu...@163.com> wrote:
> >> > > Hi Joe, & All,
> >> > > I found something this morning: one packet is sent out from the ROACH
> >> > > when I run the PAPER model. I got this from tcpdump on the HPC:
> >> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> >> > > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
> >> > > 09:04:02.757813 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 6456
> >> > >
> >> > > The length is not the expected 8200+8, and it is far from the full TX
> >> > > buffer size of 8K+512. The other packets are stopped by the overflow.
> >> > > I have tried changing the tutorial 2 packet size to 8200 bytes and to
> >> > > 8K+512 bytes. Both transfer well. I also made sure the boundary size
> >> > > is indeed 8K+512, because when I change the size to 8K+513 bytes, no
> >> > > data is sent. So the packet received this morning, with length 6456, is
> >> > > well under the limit. But what caused the other packets to overflow?
> >> > > Any suggestions would be helpful!
> >> > > peter
> >> > >
> >> > > At 2014-10-24 00:37:14, "Kujawski, Joseph" <jkujaw...@siena.edu> wrote:
> >> > > Peter,
> >> > >
> >> > > By cadence of the broadcast, I mean how often are the 8200 byte
> >> > > packets sent. Basically, I would like to determine how close your
> >> > > system is to the maximum data rate of the 10Gbe.
> >> > >
> >> > > Also, it would be instructive to know the following:
> >> > >
> >> > > 1) What transmission protocol are you using? (The One_GBe module uses
> >> > > UDP; are you using that or TCP?)
> >> > >
> >> > > 2) What NICs are you using on the receive side?
> >> > >
> >> > > At this time, I am working on the theory that the issue is related to
> >> > > the network itself not being able to sustain the data volume you are
> >> > > generating and would like to get a better idea of how much data is
> >> > > generated and how often it is sent.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > -Joe Kujawski
> >> > >
> >> > > On Thu, Oct 23, 2014 at 12:01 PM, peter <peterniu...@163.com> wrote:
> >> > > Hi Joe,
> >> > > 1. Yes, actually we have 3 ROACH2 boards with 8 NICs.
> >> > > 2. Well, each ROACH has 4 of its 8 NICs connected directly to a PC; the
> >> > > other 4 connect to a 10 Gb switch. I have connected the SFP cable (which
> >> > > should connect to the switch) directly to the PC to see whether data
> >> > > comes out, but no data comes out because of the overflow.
> >> > > 3. Could you give an example of the broadcast cadence? I am not
> >> > > familiar with this.
> >> > > It does indeed require more data, but each packet is limited to 8200
> >> > > bytes.
> >> > > Thanks for your reply!
> >> > > peter
> >> > > --
> >> > > Sent from NetEase Mail for Android
> >> > >
> >> > > On 2014-10-23 23:16, Kujawski, Joseph wrote:
> >> > >
> >> > > Peter,
> >> > >
> >> > > I am downloading it now. Can you answer these questions:
> >> > >
> >> > > 1) Do you have a standard PAPER architecture with two ROACH boards
> >> > > each containing 8 10GBe ports?
> >> > >
> >> > > 2) Please describe your network architecture, i.e. how each of the
> >> > > ports is connected.
> >> > >
> >> > > 3) What is the cadence of each broadcast?
> >> > >
> >> > > My current suspicion is that you are generating more data than you can
> >> > > push through your interface(s). It may be that the higher data volume
> >> > > in your implementation requires more of a network infrastructure than
> >> > > was required by the original system.
> >> > >
> >> > > -Joe Kujawski
> >> > >
> >> > > On Thu, Oct 23, 2014 at 11:01 AM, peter <peterniu...@163.com> wrote:
> >> > > This is a little big. roach2_tl8511port is the one that can send data
> >> > > normally. The environment should be OK now; last time the crc32x64_con
> >> > > may have been missing.
> >> > > Good night!
> >> > >
> >> > > At 2014-10-23 22:52:54, "Kujawski, Joseph" <jkujaw...@siena.edu> wrote:
> >> > > Peter,
> >> > >
> >> > > 1) For reference, here is a list of the errors:
> >> > >
> >> > > --------------------------------- Version Log ----------------------------------
> >> > > Version                      Path
> >> > > System Generator 14.6        C:/Xilinx/14.6/ISE_DS/ISE/sysgen
> >> > > Matlab 8.0.0.783 (R2012b)    C:/MATLAB/R2012b
> >> > > ISE                          C:/Xilinx/14.6/ISE_DS/ISE
> >> > > --------------------------------------------------------------------------------
> >> > > Summary of Errors:
> >> > > Error 0001: Could not find the configuration m-function "crc32x64_con...
> >> > >    Block: 'roach2_fengine_tl8511port/transpose/Transpose1/crc/crc32x64'
> >> > > Error 0002: Could not find the configuration m-function "crc32x64_con...
> >> > >    Block: 'roach2_fengine_tl8511port/transpose/Transpose2/crc/crc32x64'
> >> > > Error 0003: Could not find the configuration m-function "crc32x64_con...
> >> > >    Block: 'roach2_fengine_tl8511port/transpose/Transpose3/crc/crc32x64'
> >> > > Error 0004: Could not find the configuration m-function "crc32x64_con...
> >> > >    Block: 'roach2_fengine_tl8511port/transpose/Transpose4/crc/crc32x64'
> >> > > --------------------------------------------------------------------------------
> >> > >
> >> > > 2) Your email did not have an attachment. I have more comments, but
> >> > > wanted to let you know about the attachment before you went to bed.
> >> > >
> >> > > -Joe Kujawski
> >> > >
> >> > > On Thu, Oct 23, 2014 at 10:33 AM, peter <peterniu...@163.com> wrote:
> >> > >
> >> > > Hi Joe,
> >> > > Thanks for your warm help!
> >> > > What error shows when you compile my model? Is some file missing? I
> >> > > will pack up my whole project and send it to you as an attachment. And
> >> > > how about the PAPER one? Did it report an overflow message? It needs
> >> > > Ruby to be installed and used to control it.
> >> > > Leaving the PAPER model aside, let's talk about the 10Gb block on ROACH
> >> > > v2. Your model is good for seeing Data_valid, EOF, etc., but I don't
> >> > > know how to add it to the PAPER model, as I realize PAPER already has a
> >> > > data valid and EOF driven by a counter. So I don't know where to put
> >> > > the model. For example, if I put the data_valid or EOF control process
> >> > > you designed on the 10Gbe port in the PAPER model, then I think it is
> >> > > equivalent to using a 10Gbe block instead of the One_GBe block in
> >> > > yours. *_*!!
> >> > > I changed the number 50 to 1025 in tutorial 2 to make the packet size
> >> > > 8200 bytes, and it seems to transfer well without errors. The interval
> >> > > is 1.3*1025, which means one packet is sent every 1.3*1025 clocks. I
> >> > > found the boundary value of 1.3*1025 by testing many times. When I set
> >> > > it lower than 1.3*1025 clocks per packet, the first few packets are
> >> > > sent out, but then the overflow comes. I think it is the transfer rate
> >> > > that determines the overflow.
> >> > > Thanks for your suggestions and advice!
> >> > > peter
> >> > >
> >> > > At 2014-10-23 00:14:29, "Kujawski, Joseph" <jkujaw...@siena.edu> wrote:
> >> > >
> >> > > Peter,
> >> > >
> >> > > I find that I cannot compile and simulate your design; however,
> >> > > looking at the code structure, I can't tell if tx_val and tx_EOF are
> >> > > high at the same time:
> >> > >
> >> > > Also, I modified the design to send out a packet of size 8200 once per
> >> > > second (model attached) and added a register that latches the GBE
> >> > > tx_aful and tx_overrun lines so they can be read through the KATCP
> >> > > interface. Modify the model to remove the oscilloscope and Xilinx out
> >> > > gateways before compiling it for your platform. Note that this model
> >> > > does not check for overflow, though the latch will let you know if you
> >> > > have had one.
> >> > >
> >> > > Let me know how this works for you.
> >> > >
> >> > > -Joe Kujawski
> >> > > --
> >> > > **************************************
> >> > > * Joe Kujawski
> >> > > * Siena College
> >> > > * Dept. of Physics and Astronomy, RB 113
> >> > > * 515 Loudon Road
> >> > > * Loudonville, NY 12211-1462
> >> > > *
> >> > > * Email: jkujaw...@siena.edu
> >> > > * Phone: 518-867-7509 <-- NEW NUMBER
> >> > > * Fax: 518-783-2986
> >> > > **************************************
> >> > >
> >> > > Cloud attachment sent from NetEase 163 Mail:
> >> > > paperfengine.zip (126.71M, expires 2014-11-07 22:58)
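
The packet sizes and pacing quoted in this thread can be turned into rough numbers. The sketch below is only arithmetic on figures from the messages above (1025 64-bit words per packet, the 8K+512 TX buffer limit, and one packet every 1.3*1025 clocks in the modified tutorial 2 design). It assumes the clock counted in "1.3*1025 clocks" is the fabric clock feeding the 10-GbE core, and the 75 MHz and 200 MHz values are used purely as illustrative inputs, not as measured fabric clock rates. The function names are made up for this example, and UDP/IP/Ethernet header overhead is ignored.

```python
WORD_BYTES = 8  # the thread quotes packet sizes in 64-bit words


def packet_bytes(words: int) -> int:
    """Payload size in bytes for a packet of `words` 64-bit words."""
    return words * WORD_BYTES


def payload_rate_gbps(words: int, clocks_per_packet: float, clock_hz: float) -> float:
    """Average payload rate (Gb/s) when one packet is sent every
    `clocks_per_packet` cycles of a clock running at `clock_hz`."""
    packets_per_second = clock_hz / clocks_per_packet
    return packet_bytes(words) * 8 * packets_per_second / 1e9


if __name__ == "__main__":
    print(packet_bytes(1025))   # 8200 bytes, the PAPER packet payload
    print(8 * 1024 + 512)       # 8704 bytes, the 8K+512 TX buffer limit
    # Assumed clock rates, for illustration only.
    for clock_mhz in (75, 200):
        rate = payload_rate_gbps(1025, 1.3 * 1025, clock_mhz * 1e6)
        print(f"{clock_mhz} MHz -> {rate:.2f} Gb/s")
```

Under these assumptions the pacing works out to 64/1.3, or about 49 payload bits per clock, regardless of packet length, so the average rate scales directly with the clock: roughly 3.7 Gb/s at 75 MHz and 9.8 Gb/s at 200 MHz. Whether that is what limits the PAPER design is not established anywhere in this thread.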