Re: [casper] Problem about the adc frequency in PAPER model.

peter Tue, 28 Oct 2014 05:35:40 -0700

Hi all,
Sorry for the late reply.
First, although the serial numbers of all 8 of our ROACH boards fall in the
range that might have been affected, fortunately ours have the correct
crystals installed (Epson EEG-2121-100.000L).
I read yesterday's discussion. My project's final frequency is 250 MHz, but I
did not turn the clock up to 250 MHz when I ran the PAPER model.
The initialization output is shown below:



[peter@roachserver rb_test]$ ./paper_feng_init.rb roach1:0
initializing roach1 as FID 0
connecting to roach1
roach1 roach2_fengine app/lib revision 47c59e2/cd26bd2
disabling network transmission
setting roach1 FID to 0
setting fftshift to 2047
setting eq to 600/1
configuring 10 GbE interfaces
setting corner turner mode 0 (8 F engines)
arming sync generator(s)
arming sync generator(s)
storing sync time in redis on redishost
seeding noise generators
arming noise generator(s)
Setting F-Engine inputs to ADC signals
resetting network interfaces
enable transmission to X engines
enable transmission to switch
all done
The configuration looks OK, but no data is sent out because of the overflow. I
agree with David that the script may not be the problem, because I can use the
same script to initialize my own model, which is modified from the PAPER model
for our use. What's more, my model can send data packets from the ROACH at
200 MHz (even at 250 MHz), and the overflow problem has never happened with it.
My model sends data in packets of 4112 bytes.
I also find that neither the PAPER model at 75 MHz nor my model at 200 MHz
delivers the correct data structure to my system: the header appears in the
middle of the packet. I saw this in Wireshark.
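
Because the MTU of the switch and NICs comes up elsewhere in this thread, a quick sanity check on the packet size may be useful. This is only a sketch: it assumes the 4112 bytes are the UDP payload and that the packets use IPv4 without options (20-byte header) plus UDP (8-byte header). The helper names `required_mtu` and `fits_in_mtu?` are hypothetical and do not come from the PAPER scripts.

```ruby
# Header sizes for IPv4 (no options) and UDP.
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES  = 8

# Smallest IP MTU that carries the payload without fragmentation.
# Assumption: the given size is the UDP payload, not the whole frame.
def required_mtu(udp_payload_bytes)
  udp_payload_bytes + UDP_HEADER_BYTES + IPV4_HEADER_BYTES
end

# True when a link with the given MTU can carry the payload unfragmented.
def fits_in_mtu?(udp_payload_bytes, mtu)
  required_mtu(udp_payload_bytes) <= mtu
end

payload = 4112
puts "required MTU: #{required_mtu(payload)}"
puts "fits in 1500: #{fits_in_mtu?(payload, 1500)}"
puts "fits in 9000: #{fits_in_mtu?(payload, 9000)}"
```

Under those assumptions a default 1500-byte MTU is too small, so every NIC and switch port on the path would need jumbo frames enabled.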


I ran adc16_dump_chans.rb while running the PAPER model. The result is as
follows:


[peter@roachserver bin]$ ./adc16_dump_chans.rb -r -v pf1
data snap took 0.363328416 seconds
111.5 112.0 112.1 112.1 127.1 127.1 127.3 127.4
112.2 112.3 111.8 112.0 112.1 112.2 111.6 112.0
112.4 111.6 112.1 112.0 127.0 127.4 127.1 127.3
112.1 111.4 112.0 111.7 127.3 126.7 127.4 126.6
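
The 32 values fall into two clusters, near 112 and near 127. A small sketch like the one below can make that pattern easier to see. It assumes the values are listed in channel order and that grouping them in fours is meaningful (for example, four channels per ADC chip); neither point is stated in the message. The 120.0 threshold is an arbitrary cut between the two clusters, and the helper names are hypothetical.

```ruby
# Values copied from the adc16_dump_chans.rb output above.
READINGS = [
  111.5, 112.0, 112.1, 112.1, 127.1, 127.1, 127.3, 127.4,
  112.2, 112.3, 111.8, 112.0, 112.1, 112.2, 111.6, 112.0,
  112.4, 111.6, 112.1, 112.0, 127.0, 127.4, 127.1, 127.3,
  112.1, 111.4, 112.0, 111.7, 127.3, 126.7, 127.4, 126.6
].freeze

CHANNELS_PER_GROUP = 4     # assumption: four channels per group
THRESHOLD          = 120.0 # arbitrary cut between the two clusters

# Mean of each consecutive group of readings.
def group_means(readings, group_size)
  readings.each_slice(group_size).map { |group| group.sum / group.size }
end

# Zero-based indices of the groups whose mean exceeds the threshold.
def groups_above(readings, group_size, threshold)
  group_means(readings, group_size)
    .each_with_index
    .select { |mean, _index| mean > threshold }
    .map { |_mean, index| index }
end

group_means(READINGS, CHANNELS_PER_GROUP).each_with_index do |mean, index|
  puts format("group %d: mean %.1f", index, mean)
end
high = groups_above(READINGS, CHANNELS_PER_GROUP, THRESHOLD)
puts "groups above #{THRESHOLD}: #{high.inspect}"
```

With this grouping the higher readings land in whole groups rather than being scattered across channels, which may help narrow down where to look.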
I also downloaded the new script that David pointed to, but I got a NameError:


[peter@roachserver bin]$ ./paper_feng_init.rb pf1
initializing pf1 as FID 0
connecting to pf1
./paper_feng_init.rb:130:in `block in <main>': undefined local variable or method `a' for main:Object (NameError)
        from ./paper_feng_init.rb:112:in `map'
        from ./paper_feng_init.rb:112:in `<main>'
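
For readers unfamiliar with this kind of Ruby error, the sketch below reproduces the same shape of failure: a name that was never defined is referenced inside a `map` block. It is not the code from paper_feng_init.rb; `init_hosts` and its body are hypothetical and only illustrate how such a NameError arises and how to find the missing name.

```ruby
# Hypothetical stand-in for a script that maps over host names.
def init_hosts(hosts)
  hosts.map do |host|
    # `a` was never assigned, so Ruby treats it as a method call with no
    # receiver and raises NameError when this line runs.
    "#{host}:#{a}"
  end
end

begin
  init_hosts(["pf1"])
rescue NameError => e
  puts "#{e.class}: #{e.message}"
  puts "missing name: #{e.name}"
end
```

The traceback points at the line inside the block (line 130 in the real script), so that line is the place to look for a stray or misspelled `a`.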


Thanks for your communication and suggestions!
peter

At 2014-10-28 05:03:14, "David MacMahon" <dav...@astro.berkeley.edu> wrote:
>Hi, Richard and Peter,
>
>Another possibility that crossed my mind is perhaps your ROACH2s were from the 
>batch where the incorrect oscillator was installed for U72.  This seems 
>unlikely for Richard based on this email (which also describes the incorrect 
>oscillator problem in general):
>
>https://www.mail-archive.com/casper@lists.berkeley.edu/msg04909.html
>
>Maybe it's worth a double check anyway?
>
>Dave
>
>On Oct 27, 2014, at 1:41 PM, Richard Black wrote:
>
>> David,
>> 
>> We'll take another close look at what model we are actually using, just to 
>> be safe.
>> 
>> I went back and looked at our e-mails, and sure enough, you're right. You 
>> were referring to the MTU issue as being the problem you tend to suppress 
>> all memory of. It was just that you stated it in a separate paragraph, so, 
>> out-of-context, I extrapolated that you have had the same problem before. My 
>> bad for dragging your good name through the mud. :)
>> 
>> We will also update our local repositories, in the event some bizarre race 
>> condition exists on our end.
>> 
>> I didn't know that the buffer could fill up while reset was asserted. We'll 
>> definitely have to check up on that too.
>> 
>> We haven't tried dumping raw ADC data yet since we have been trying to get 
>> the data link working first. After that, we were planning to inject signal 
>> and examine outputs.
>> 
>> Thanks,
>> 
>> Richard Black
>> 
>> On Mon, Oct 27, 2014 at 2:26 PM, David MacMahon <dav...@astro.berkeley.edu> 
>> wrote:
>> Hi, Richard,
>> 
>> On Oct 27, 2014, at 9:25 AM, Richard Black wrote:
>> 
>> > This is a reportedly fully-functional model that shouldn't require any 
>> > major changes in order to operate. However, this has clearly not been the 
>> > case in at least two independent situations (us and Peter). This begs the 
>> > question: what's so different about our use of PAPER?
>> 
>> I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is the 
>> one being used by the PAPER correlator currently fielded in South Africa.  
>> It is definitely a fully functional model.  That image (and all source files 
>> for it) is available from the git repo listed on the PAPER Correlator 
>> Manifest page of the CASPER Wiki:
>> 
>> https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest
>> 
>> > We, at BYU, have made painstakingly sure that our IP addressing schemes, 
>> > switch ports, and scripts are all configured correctly (thanks to David 
>> > MacMahon for that, btw), but we still have hit the proverbial brick wall 
>> > of 10-GbE overflow.  When I last corresponded with David, he explained 
>> > that he remembers having a similar issue before, but can't recall exactly 
>> > what the problem was.
>> 
>> Really?  I recall saying that I often forget about increasing the MTU of the 
>> 10 GbE switch and NICs.  I don't recall saying that I had a similar issue 
>> before but couldn't remember the problem.
>> 
>> > In any case, the fact that turning down the ADC clock prior to start-up 
>> > prevents the 10-GbE core from overflowing is a major lead for us at BYU 
>> > (we've been spinning our wheels on this issue for several months now). By 
>> > no means are we proposing mid-run ADC clock modifications, but this 
>> > appears to be a very subtle (and quite sinister, in my opinion) bug.
>> >
>> > Any thoughts as to what might be going on?
>> 
>> I cannot explain the 10 GbE overflow that you and Peter are experiencing.  I 
>> have pushed some updates to the rb-papergpu.git repository listed on the 
>> PAPER Correlator Manifest page.  The paper_feng_init.rb script now verifies 
>> that the ADC clocks are locked and provides options for issuing a software 
>> sync (only recommended for lab use) and for not storing the time of 
>> synchronization in redis (also only recommended for lab use).
>> 
>> The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1) 
>> while they are held in reset.  Since you are using the paper_feng_init.rb 
>> script, this should not be happening (unless something has gone wrong during 
>> the running of that script) because that script specifically and explicitly 
>> disables the tx_valid signal before putting the cores into reset and it 
>> takes the cores out of reset before enabling the tx_valid signal.  So 
>> assuming that this is not the cause of the overflows, there must be 
>> something else that is causing the 10 GbE cores to be unable to transmit 
>> data fast enough to keep up with the data stream it is being fed.  Two 
>> things that could cause this are 1) running the design faster than the 200 
>> MHz sample clock that it was built for and/or 2) some link issue that 
>> prevents the core from sending data.  Unfortunately, I think both of those 
>> ideas are also pretty far fetched given all you've done to try to get the 
>> system working.  I wonder whether there is some difference in the ROACH2 
>> firmware (u-boot version or CPLD programming) or PPC Linux setup or 
>> tcpborphserver revision or ???.
>> 
>> Have you tried using adc16_dump_chans.rb to dump snapshots of the ADC data 
>> to make sure that it looks OK?
>> 
>> Dave
>> 
>> 
>

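The reset ordering David describes in the quoted message (disable tx_valid, put the cores into reset, release reset, then re-enable tx_valid) can be sketched as follows. This is an illustration only: `FakeFpga` is a stand-in that merely records writes, and the register names `tx_valid_enable` and `gbe_reset` are hypothetical, not the names used in the PAPER design or in paper_feng_init.rb.

```ruby
# Stand-in for a real FPGA client; it only records register writes.
class FakeFpga
  attr_reader :writes

  def initialize
    @writes = []
  end

  def write(register, value)
    @writes << [register, value]
  end
end

# Reset the 10 GbE cores without feeding them valid data while in reset.
# Register names are hypothetical.
def reset_network(fpga)
  fpga.write(:tx_valid_enable, 0) # stop feeding valid data first
  fpga.write(:gbe_reset, 1)       # only now put the cores into reset
  fpga.write(:gbe_reset, 0)       # release reset ...
  fpga.write(:tx_valid_enable, 1) # ... before enabling valid data again
end

fpga = FakeFpga.new
reset_network(fpga)
fpga.writes.each { |register, value| puts "#{register} <- #{value}" }
```

Recording the writes makes the ordering easy to verify: tx_valid is never 1 while the reset bit is asserted.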