Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-19 Thread Marc Welz
On Wed, Nov 19, 2014 at 6:50 AM, Peter Niu peterniu...@163.com wrote:

 Hi Dave,
 Sorry for the late reply.
 The little trouble I encountered in netboot turned out to be that the
 uImage I was using had changed. As a test, I downloaded the latest
 uImage from
 https://github.com/ska-sa/roach2_nfs_uboot/tree/master/boot, using
 uImage-roach2-3.16-hwmon
 (https://github.com/ska-sa/roach2_nfs_uboot/blob/master/boot/uImage-roach2-3.16-hwmon)
 as the uImage in netboot.
 The file looks like this:
 [peter@roachserver ~]$ file -L /srv/roach_boot/boot/uImage
 /srv/roach_boot/boot/uImage: u-boot legacy uImage,
 Linux-3.16.0-saska-03675-g1c70f, Linux/PowerPC, OS Kernel Image (gzip),
 3034204 bytes, Tue Aug 26 14:54:14 2014, Load Address: 0x0070, Entry
 Point: 0x007010C4, Header CRC: 0x66EDCF88, Data CRC: 0x42A230BA
 I changed the uImage to uImage-r2borph3
 (https://github.com/ska-sa/roach2_nfs_uboot/blob/master/boot/uImage-r2borph3),


 There should be an even newer uImage (i.e. Linux kernel) and romfs (i.e.
flash filesystem, containing tcpborphserver3) at that location.

I think the most notable change is that we have changed the kernel memory
model, so that the full 128 MB FPGA address space is visible in one go.
There are probably some other fixes and changes too - the commit logs in
katcp_devel should have some information.

Things are rather busy here, so apologies for not updating the NFS
filesystem - we currently don't use it, so it is likely to remain out of
date, though Dave (I think?) maintains a more recent version.

regards

marc


Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-19 Thread Marc Welz
On Wed, Nov 19, 2014 at 8:37 AM, Marc Welz m...@ska.ac.za wrote:

 There should be an even newer uImage (ie linux kernel) and romfs (ie flash
 filesystem, containing tcpborphserver3) at that location.

 I think the most notable change is that we have changed the kernel memory
 model, so
 that the full 128Mb fpga address space is visible in one go.


... meaning that you would need to update both the kernel and
tcpborphserver3 to the revisions checked in a week ago or so, to map the
full address space - just updating one will not be sufficient.

regards

marc


Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-19 Thread Marc Welz
Hello



  I found an updated roach2-root-fullmap-2014-08-12.romfs. Could you please
 tell me what I should do to make it work?
  Should I put this file in the same place as tcpborphserver3 in the ROACH2
 file system (/usr/local/sbin)?
 Thanks for your answer; I am totally new to this. :)
 Peter



If you are not solobooting, then on a linux pc somewhere

# mkdir -p /mnt/tmp && mount -o loop roach2-root-fullmap-2014-08-12.romfs /mnt/tmp

... now copy out /mnt/tmp/sbin/tcpborphserver3 to where you need it

regards

marc


Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-13 Thread Marc Welz
On Thu, Nov 13, 2014 at 5:49 AM, Richard Black aeldstes...@gmail.com
wrote:

 Wow. Well that seemed to be the magic bullet. Thanks!

 Any ideas why this works? Is it because of an NFS lock-out or a 10-GbE
 driver issue in the NFS kernel image?


So I don't know. It could also be a version difference? The things to look
at are the kernel and tcpborphserver (the former is a file in its own
right, the latter can be gotten by mounting a romfs image via loopback and
copying out /sbin/tcpborphserver3).

We have also had interesting cases where the fpga doesn't quite do what the
bus controller on the power pc expects - in those cases random
perturbations change the behaviour, although pathological cases can have
the fpga contend with flash accesses, which then corrupts things.

Also look in https://github.com/ska-sa/roach2_nfs_uboot, particularly the
boot directory - occasionally prebuilt images get uploaded there, though
for the change information you will
have to read the ska-sa/katcp_devel commits.

Final, unrelated, tip: It is fine to have another (interactive) telnet
connection to port 7147
on the roach while your scripts are doing things - this connection can be
used to see failures or problems, and for detailed debugging messages, try
typing ?log-level trace - just be mindful
of the performance impact. There is a tool (kcplog) which can be built for
a remote machine
to automate this.
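The interactive check Marc describes can also be scripted. A minimal sketch of building katcp request lines (the same text you would type over telnet to port 7147) is below; the helper name is illustrative, not part of katcp_devel:

```python
# Hedged sketch: compose katcp text-protocol requests such as the
# "?log-level trace" command mentioned above. The helper is illustrative
# only; it is not a katcp_devel API.
def katcp_request(name, *args):
    """Return a katcp request line: '?' + name + space-separated arguments."""
    return "?" + " ".join((name,) + args) + "\n"

# Over a TCP connection to port 7147 you would send, for example:
print(katcp_request("log-level", "trace"), end="")
```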

regards

marc


Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-13 Thread David MacMahon
Hi, Richard,

I'm glad this fixed your problem as well!  This is definitely one for the 
wiki!!!

Dave

On Nov 12, 2014, at 2:34 PM, Richard Black wrote:

 Wow. Well that seemed to be the magic bullet. Thanks!
 
 Any ideas why this works? Is it because of an NFS lock-out or a 10-GbE driver 
 issue in the NFS kernel image?
 
 In any case, this is a tremendous discovery! Thanks to all for all the effort!
 
 Richard
 




Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-13 Thread David MacMahon
Hi, Marc,

On Nov 13, 2014, at 12:08 AM, Marc Welz wrote:

 On Thu, Nov 13, 2014 at 5:49 AM, Richard Black aeldstes...@gmail.com wrote:
 
 Any ideas why this works? Is it because of an NFS lock-out or a 10-GbE 
 driver issue in the NFS kernel image?

None of the control stuff goes over NFS so I don't think that's likely to be 
the problem, but at this point (almost) nothing would surprise me.

 So I don't know. It could also be a version difference ? The things to look 
 at are
 the kernel and tcpborphserver  (the former is a file in its own right, the 
 latter can
 be gotten by mounting a romfs image via loopback and copying out 
 /sbin/tcpborphserver3).

Are the drivers that provide the /dev/roach/mem and /dev/roach/config nodes 
compiled into the kernel image?

 We also have had interesting cases where the fpga doesn't quite do what the 
 bus controller
 on the power pc expects to happen - in those cases random perturbations 
 change the behaviour, 
 although pathological cases can have the fpga contend with flash accesses 
 which then corrupts things. 
 
 Also look in https://github.com/ska-sa/roach2_nfs_uboot, particularly the 
 boot directory - occasionally prebuilt images get uploaded there, though for 
 the change information you will
 have to read the ska-sa/katcp_devel commits. 
 
 Final, unrelated, tip: It is fine to have another (interactive) telnet 
 connection to port 7147 
 on the roach while your scripts are doing things - this connection can be 
 used to see failures or problems, and for detailed debugging messages, try 
 typing ?log-level trace - just be mindful
 of the performance impact. There is a tool (kcplog) which can be built for a 
 remote machine 
 to automate this.

Thanks for the tips!

Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-13 Thread Marc Welz
On Thu, Nov 13, 2014 at 8:32 AM, David MacMahon dav...@astro.berkeley.edu
wrote:


 Are the drivers that provide the /dev/roach/mem and /dev/roach/config
 nodes compiled into the kernel image?


Yes, the roach kernels have never used modules

regards

marc






Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-13 Thread David MacMahon
Thanks, Marc,

On Nov 13, 2014, at 12:08 AM, Marc Welz wrote:

 Also look in https://github.com/ska-sa/roach2_nfs_uboot, particularly the 
 boot directory - occasionally prebuilt images get uploaded there, though for 
 the change information you will
 have to read the ska-sa/katcp_devel commits. 

FWIW, we are using the boot/uImage-r2borph3 kernel image from commit a8da6b6 
of that repository.  The file command shows it as:

$ file -L /srv/tftpboot/uboot-roach2/uImage-r2borph3
/srv/tftpboot/uboot-roach2/uImage-r2borph3: u-boot legacy uImage, 
Linux-3.7.0-rc2+, Linux/PowerPC, OS Kernel Image (gzip), 2231485 bytes, Sun Nov 
18 23:30:35 2012, Load Address: 0x0050, Entry Point: 0x005010D4, Header 
CRC: 0x9BDC0E32, Data CRC: 0xF3A1DC96

Interestingly, the (NOT used by PAPER) soloboot uImage kernel image in 
/dev/mtdblock0 on one of our ROACH2s deployed in South Africa is:

root@r2d020808:~# file -s /dev/mtdblock0
/dev/mtdblock0: u-boot legacy uImage, Linux-3.4.0-rc3+, Linux/PowerPC, OS 
Kernel Image (gzip), 2429134 bytes, Tue May 29 17:05:09 2012, Load Address: 
0x0050, Entry Point: 0x00500460, Header CRC: 0xCAB17B63, Data CRC: 
0x096FD3C7

...while the (NOT used by PAPER) soloboot uImage kernel image in /dev/mtdblock0 
on two ROACH2s in our lab is:

root@r2d020813:~# file -s /dev/mtdblock0 
/dev/mtdblock0: u-boot legacy uImage, Linux-3.9.0-rc1+, Linux/PowerPC, OS 
Kernel Image (gzip), 2345540 bytes, Wed Mar  6 02:54:34 2013, Load Address: 
0x0050, Entry Point: 0x005010D4, Header CRC: 0xC0B47AFF, Data CRC: 
0x9247592F

root@r2d020669:~# file -s /dev/mtdblock0
/dev/mtdblock0: u-boot legacy uImage, Linux-3.9.0-rc1+, Linux/PowerPC, OS 
Kernel Image (gzip), 2345540 bytes, Wed Mar  6 02:54:34 2013, Load Address: 
0x0050, Entry Point: 0x005010D4, Header CRC: 0xC0B47AFF, Data CRC: 
0x9247592F

These two ROACH2s were repaired by Digicom (813 for the U72 fix and 669 for 
vehicular stress).  It looks like Digicom is populating the ROACH2 soloboot 
with a new uImage that is not available in the roach2_nfs_uboot repo.  Are 
different kernels required for netboot vs soloboot or is this just an oversight?

Richard and/or Peter,

I'm curious to know what versions of uImage you have for both your netboot 
environment and in /dev/mtdblock0 on your ROACH2s.  Can you please run the 
above file commands on your uImages and report back with the results?  This 
will hopefully help us zero in on where the problem is (and where/when it was 
corrected).
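When comparing such reports, the decisive fields are the byte count and the two CRCs; the timestamp is timezone-dependent. A small sketch that pulls those fields out of `file` output (the regexes are written against the output format quoted in this thread, an assumption on my part) could mechanize the comparison:

```python
# Hedged sketch: extract the identity-relevant fields (size, header CRC,
# data CRC) from `file` output on a uImage. The regexes assume the output
# format quoted in this thread.
import re

def uimage_fingerprint(file_output):
    size = int(re.search(r"(\d+) bytes", file_output).group(1))
    hcrc = re.search(r"Header CRC: (0x[0-9A-Fa-f]+)", file_output).group(1)
    dcrc = re.search(r"Data CRC: (0x[0-9A-Fa-f]+)", file_output).group(1)
    return (size, hcrc, dcrc)

# Two reports describe the same image iff their fingerprints match,
# regardless of the (local-timezone) timestamp.
```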

Thanks,
Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-13 Thread Peter Niu



Hi Dave,
Though I am quite new to the uImage system, I did suspect the uImage could
cause this problem. Do you remember the roach I mentioned previously that
doesn't work normally in netboot? It works fine with soloboot.
Interestingly, one of my roaches that worked fine with netboot before now
does not work in soloboot! (Using telnet, /boffiles could not be found in
the soloboot roach Linux, while it could on the others.)
I checked the uImage on the not-working-in-soloboot roach. Well, I use
soloboot now, so the file command cannot be found in busybox:
~ # file
-sh: file: not found

but the boot messages in the soloboot process show the uImage like this:
Image Name: Linux-3.4.0-rc3+
Image Type: PowerPC Linux Kernel Image (gzip compressed)
Data Size: 2429134 Bytes = 2.3 MiB
Load Address: 0050
Entry Point: 00500460
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK
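When busybox lacks `file`, the legacy uImage header printed in the boot messages above can also be decoded directly from the first 64 bytes of /dev/mtdblock0. A hedged Python sketch of the standard legacy-uImage header layout (the helper name is illustrative):

```python
# Hedged sketch: decode the 64-byte legacy u-boot uImage header by hand,
# e.g. read from /dev/mtdblock0, for systems whose busybox lacks `file`.
# Layout: magic, header CRC, timestamp, data size, load address, entry
# point, data CRC (7 x 32-bit big-endian), then os/arch/type/comp bytes,
# then a 32-byte NUL-padded image name.
import struct

UIMAGE_MAGIC = 0x27051956

def parse_uimage_header(hdr):
    fields = struct.unpack(">7I4B", hdr[:32])
    magic, _hcrc, _time, size, load, entry, _dcrc = fields[:7]
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage")
    name = hdr[32:64].split(b"\0", 1)[0].decode()
    return {"name": name, "size": size, "load": load, "entry": entry}
```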

Running the same roach in netboot and logging in as root over ssh, the
information looks like this:

root@pf1:~# file -s /dev/mtdblock0
/dev/mtdblock0: u-boot legacy uImage, Linux-3.4.0-rc3+, Linux/PowerPC, OS 
Kernel Image (gzip), 2429134 bytes, Tue May 29 15:05:09 2012, Load Address: 
0x00507

For netboot, we also use uImage-r2borph3.
On our PC, the information looks like this:
[peter@roachserver ~]$ file -L /srv/roach_boot/boot/uImage 
/srv/roach_boot/boot/uImage: u-boot legacy uImage, Linux-3.7.0-rc2+, 
Linux/PowerPC, OS Kernel Image (gzip), 2231485 bytes, Mon Nov 19 15:30:35 2012, 
Load Address: 0x0050, Entry Point: 0x005010D4, Header CRC: 0x9BDC0E32, Data 
CRC: 0xF3A1DC96
(I am not sure why the date is not the same as yours: Mon Nov 19 15:30:35 2012)

For further information: the other roaches can work in both soloboot and
netboot. The soloboot information during the set-up process:
## Booting kernel from Legacy Image at f800 ...
   Image Name:   Linux-3.9.0-rc1+
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:2345540 Bytes = 2.2 MiB
   Load Address: 0050
   Entry Point:  005010d4
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
Since the not-working-in-soloboot roach shows an image name of
Linux-3.4.0-rc3+ in soloboot, I am not sure whether it is the Linux version
in soloboot that matters.
Jason once mentioned a similar question to me; he also sent me the latest
binary romfs for soloboot:
https://www.mail-archive.com/casper%40lists.berkeley.edu/msg05393.html
I hope this information is helpful for our question!
Thanks for your warm help with the PAPER model!

Peter

PS: I also found a new version on https://github.com/ska-sa/roach2_nfs_uboot,
uploaded on Nov 12, 2014. I will try it later.






Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-13 Thread David MacMahon
Hi, Peter,

Thanks for this information!

On Nov 13, 2014, at 7:22 PM, Peter Niu wrote:

 In our PC, the information like this:
 [peter@roachserver ~]$ file -L /srv/roach_boot/boot/uImage 
 /srv/roach_boot/boot/uImage: u-boot legacy uImage, Linux-3.7.0-rc2+, 
 Linux/PowerPC, OS Kernel Image (gzip), 2231485 bytes, Mon Nov 19 15:30:35 
 2012, Load Address: 0x0050, Entry Point: 0x005010D4, Header CRC: 
 0x9BDC0E32, Data CRC: 0xF3A1DC96
 (I am not sure why the date is not the same as yours: Mon Nov 19 15:30:35 2012)

 At 2014-11-14 08:40:35, David MacMahon dav...@astro.berkeley.edu wrote:
 
 $ file -L /srv/tftpboot/uboot-roach2/uImage-r2borph3
 /srv/tftpboot/uboot-roach2/uImage-r2borph3: u-boot legacy uImage, 
 Linux-3.7.0-rc2+, Linux/PowerPC, OS Kernel Image (gzip), 2231485 bytes, Sun 
 Nov 18 23:30:35 2012, Load Address: 0x0050, Entry Point: 0x005010D4, 
 Header CRC: 0x9BDC0E32, Data CRC: 0xF3A1DC96

These are the same uImage.  The length, header CRC, and data CRC match.  The 
timestamps differ by 16 hours, but I think that's because the timestamp is 
printed in the local timezone.  If you do:

env TZ=UTC file -L /srv/roach_boot/boot/uImage

...you will get a timestamp of Mon Nov 19 07:30:35 2012.
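The 16-hour gap is exactly the spread between the two local zones. Assuming Berkeley was on UTC-8 (PST) and Peter's server on UTC+8 (both inferred from the outputs, not stated in the thread), the two timestamps denote the same instant:

```python
# Hedged check: the two `file` timestamps are one instant rendered in two
# local zones. UTC-8 (Berkeley, PST) and UTC+8 (Peter's server) are
# assumptions inferred from the 16-hour difference.
from datetime import datetime, timedelta, timezone

dave = datetime(2012, 11, 18, 23, 30, 35,
                tzinfo=timezone(timedelta(hours=-8)))
peter = datetime(2012, 11, 19, 15, 30, 35,
                 tzinfo=timezone(timedelta(hours=8)))

utc = dave.astimezone(timezone.utc)
# Both correspond to Mon Nov 19 07:30:35 2012 in UTC.
```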

This means that the uImage file is NOT the cause of the problem since the same 
version works for us but not for you.  I think this might leave only the 
tcpborphserver version as the cause of the problem.  Could it be anything else?

Can you please run:

telnet pf1 7147

(Type CTRL-] then q then ENTER to quit.)

against both the soloboot and netboot environments and let me know the results?
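A scriptable stand-in for the interactive telnet check is a plain TCP connect. A minimal sketch (the host name `pf1` comes from Peter's output; 7147 is the tcpborphserver port used throughout this thread; the function name is my own):

```python
# Hedged sketch: check whether tcpborphserver answers on port 7147, as a
# scriptable alternative to the interactive telnet check above.
import socket

def katcp_reachable(host, port=7147, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: katcp_reachable("pf1") for the netboot environment.
```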

Thanks again,
Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-12 Thread 牛晨辉


Hi All,
I'm happy to tell you the PAPER model can finally run without overflow!
I found that the bof file - whether the PAPER model or our own - can run at
200 MHz, and the packet structure is right.
It is the system setup on the roach that matters (thanks to Marc's help
with soloboot!). I tried soloboot on the roach, and it works fine for the
model.
I don't know why the setup on netboot is not OK (it influenced the
frequency too much, I guess); however, FWIW, the overflow problem that has
accompanied me for a few weeks is finally solved!
I can have a good sleep tonight. Thanks for your warm help!
Peter









At 2014-11-08 03:10:47, David MacMahon dav...@astro.berkeley.edu wrote:
Hi, Richard,

I think that your 1 PPS should be very usable.  I think we typically generate 
the 1 PPS from a GPS clock.

If you want to try a test, you could disconnect the 1 PPS and use the software 
generated sync signal as per the earlier emails.  If that works and using the 
external 1 PPS doesn't then you will have found the problem.  I'd be surprised 
(but happy!) if that turns out to be the problem.

Dave

On Nov 7, 2014, at 10:55 AM, Richard Black wrote:

 Thanks David and all,
 
 I unfortunately misspoke when it came to the power in the ADC clock signal. 
 In fact, we had it at 9 dBm, not -9. Sorry for any confusion.
 
 I set up the pulse generator to swing from +0.0 to +3.0 V at 1 us. To check 
 on possible ringing, I also hooked up our pulse generator to an oscilloscope 
 (I increased the pulse width to 10 ms, so I could see it). The waveform I 
 observe has some severe overshoot both on the uptake and down. I've attached 
 a drawing to explain what I mean.
 
 I can't seem to mitigate this overshoot with our little Agilent arbitrary 
 waveform generator. Is this similar to the ringing seen at NRAO? If so, how 
 is the 1 PPS generated by casperites?
 
 Thanks,
 
 Richard Black
 
 On Fri, Nov 7, 2014 at 11:29 AM, David MacMahon dav...@astro.berkeley.edu 
 wrote:
 Hi, Richard,
 
 On Nov 7, 2014, at 9:03 AM, Richard Black wrote:
 
  Haven't heard anything for a while, so I thought I would add some more 
  detail about our system setup to see if it might shed some light on the 
  problem:
 
  1 PPS Signal
  -
  Square pulse
  Frequency: 1 Hz
  Amplitude: 3 Vpp
  Offset: 0 V
  Width: 10 ms
  Edge Time: 5 ns
 
 That should be fine assuming the 3 Vpp is measured with the 50 ohm 
 termination in place.  If you want to try a software sync, you can pass -S 
 (UPPERcase!) to the latest paper_feng_init.rb script.  Check the output of 
 paper_feng_init.rb --help to see whether your version supports that option.
 
  ADC Clock
  -
  CW Tone
  Frequency: 200 MHz
  Power: -9 dBm
 
 It would be a good idea to increase the power level to +6 dBm as described 
 on this wiki page:
 
 https://casper.berkeley.edu/wiki/ADC16x250-8_coax_rev_2#ADC16x250-8_coax_rev_2_Inputs
 
 But if the paper_feng_init.rb script reports that the ADC clocks are locked 
 and they measure approximately 200 MHz, then I think this is unlikely to be 
 the cause of the 10 GbE overflow problems (though it would be great if the 
 fix were this simple!).
 
  For David, are there any red flags with our UBoot version or ROACH CPLD? 
  Here they are again for reference:
 
  From serial interface after ROACH reboot
  ==
  U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
  ...
  CPLD: 2.1
  ==
 
 This matches one of our ROACH2s that is running and sending 10 GbE packets 
 in our lab:
 
 U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
 
 CPU:   AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
No Security/Kasumi support
Bootstrap Option C - Boot ROM Location EBC (16 bits)
32 kB I-Cache 32 kB D-Cache
 Board: ROACH2
 I2C:   ready
 DRAM:  512 MiB
 Flash: 128 MiB
 In:serial
 Out:   serial
 Err:   serial
 CPLD:  2.1
 USB:   Host(int phy)
 SN:ROACH2.2 batch=D#6#69 software fixups match
 MAC:   02:44:01:02:06:45
 DTT:   1 is 23 C
 DTT:   2 is 26 C
 Net:   ppc_4xx_eth0
 
 Hope this helps,
 Dave
 
 
 pulse_profile.png



Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-12 Thread Richard Black
Wow. Well that seemed to be the magic bullet. Thanks!

Any ideas why this works? Is it because of an NFS lock-out or a 10-GbE
driver issue in the NFS kernel image?

In any case, this is a tremendous discovery! Thanks to all for all the
effort!

Richard

On Wednesday, November 12, 2014, 牛晨辉 peterniu...@163.com wrote:


 Hi All,
 I'm happy to tell you the PAPER model can run without overflow finally!
 I find the bof file no matter PAPER model or own could run in 200Mhz and
 the packet structure is right.
 That is the System setup on roach it matters,(Thanks to Marc's help in
 soloboot!).I try the soloboot on the roach,
 and it works fine for the model.
 I don't know why the setup on netboot is not ok ,(it influenced the
 frequency too much I guess)however, FWIW,The overflow problem company with
 me for few weeks finally solved out!
 I could have a good sleep tonight,Thanks for your warm help!
 Peter










-- 
Richard Black


Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread Richard Black
Hi all,

Haven't heard anything for a while, so I thought I would add some more
detail about our system setup to see if it might shed some light on the
problem:

1 PPS Signal
-
Square pulse
Frequency: 1 Hz
Amplitude: 3 Vpp
Offset: 0 V
Width: 10 ms
Edge Time: 5 ns

ADC Clock
-
CW Tone
Frequency: 200 MHz
Power: -9 dBm

For David, are there any red flags with our U-Boot version or ROACH CPLD?
Here they are again for reference:

From serial interface after ROACH reboot
==
U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
...
CPLD: 2.1
==

Thanks!


Richard Black

On Tue, Nov 4, 2014 at 12:05 PM, Richard Black aeldstes...@gmail.com
wrote:

 Hi David,

 Comments below:

 Richard Black

 On Mon, Nov 3, 2014 at 3:51 PM, David MacMahon dav...@astro.berkeley.edu
 wrote:

 Hi, Richard,

 On Nov 3, 2014, at 11:47 AM, Richard Black wrote:

  So, it's been a little while now, but not much has changed yet. We've
 gotten Chipscope working, and, so far, there aren't any red flags with the
 FPGA firmware 10-GbE control signals.

 That's good to know, although maybe in some way it would have been nice
 if you had found some red flags.

  We also confirmed that the bitstream we are using is in fact
 roach2_fengine_2013_Oct_14_1756.bof.gz, so that is unfortunately not the
 problem.

 At least you are using a known good BOF file, so that eliminates a source
 of potential errors.

  I also took a look at the ROACH2 PPC setup: we pulled from the .git
 repository on February 12, 2014 (commit number =
 e14df9016c3b7ccba62cc6d0cae05405f4929c94). There haven't been any changes
 to that repository since August 2013, so unless the SKA-SA ROACH-2s are
 using a pull from before then, I don't think that is our issue.

 We use our own homegrown NFS root filesystem for the ROACH2s, so I can't
 comment on the status of the one you refer to (
 https://github.com/ska-sa/roach2_nfs_uboot.git).  I am more interested
 in the U-Boot version you have (see
 https://github.com/ska-sa/roach2_uboot.git) and which version of the
 ROACH2 CPLD image you are using (not sure where to get this).  I think
 these are unlikely to be problematic, but we've already checked all the
 likely problems.


 When I rebooted the ROACH-2, I got the following header for U-Boot:

 U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
 ...
 CPLD: 2.1

 Hope this is informative.




  We also tried out Jason Manley's suggestion of delaying the enabling of
 the 10-GbE cores to ensure that the sync pulse propagated through the
 entire system before buffering up data, but the problem persisted.

 Do you have an external 1 PPS sync pulse connected or have you tried the
 latest rb-papergpu software that supports a software-generated sync?  The
 paper_feng_init.rb script already disables the data flow to the 10 GbE
 cores until the sync pulse has propagated through and the cores have been
 taken out of reset.


 We are using an external 1 PPS sync pulse. However, we are certain that
 it's set up correctly. Although, this could just be me grasping at straws
 since nothing else seems to solve the problem. How would we go about
 setting up the software-generated pulse?


 Does the latest rb-papergpu code show that the ADC clocks (MMCMs) are
 locked?  Does it estimate the clock frequency correctly?  Does
 adc16_dump_chans.rb show samples that correspond correctly to the analog
 inputs (e.g. a CW tone)?


 I've attached an image of the output from xtor_up.sh -f 1 with the latest
 rb-papergpu code. Nothing significant to note: the clock reads ~200 MHz.

 I've also attached an image of the output from adc16_dump_chans.rb, where
 A1 has a CW tone with a 10-MHz 40-V emf signal. You can see the
 oscillations in the first column and noise everywhere else.



  Just to rule it out, I double-checked (or more accurately
 triple-checked) the U72 part, and, sure enough, it is the correct
 oscillator, model number EEG-2121.

 Does it have the L suffix on the 100.000L frequency part of the chip
 markings?


 Yes, it does.


 On a related note, as I sent off-list to you and Peter earlier today:
 The fact that Peter can send small packets at 200 MHz without overflow,
 but large packets give overflow is very interesting and puzzling.  I assume
 that the smaller packets are just fewer channels of the same length
 spectrum and that the number of packets per second remains the same (I
 think we discussed this previously).  In that case, the small packets
 reduce the data rate, which suggests that the 156.25 MHz xaui_ref_clk
 clock is maybe not really 156.25 MHz but something somewhat slower.  This
 clock is driven by the oscillator at U56 and the clock splitter at U54 (see
 attached schematic snippet).  Can you please inspect those parts on your
 board(s)?  I will be able to inspect a ROACH2 this afternoon and report
 what I have on a known working system.

 On one 
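A back-of-the-envelope check of that hypothesis (all numbers below are assumed for illustration; the thread does not give the actual packet rates): the 10 GbE serial rate scales with xaui_ref_clk, so a clock a few percent low removes the same few percent of link capacity, and only a design running near capacity, i.e. with full-size packets, would overflow.

```python
NOMINAL_CLK = 156.25e6  # Hz, nominal xaui_ref_clk
NOMINAL_RATE = 10e9     # b/s line rate at the nominal clock

def capacity_bps(actual_clk_hz):
    """Effective line rate when the reference clock is off-nominal."""
    return NOMINAL_RATE * actual_clk_hz / NOMINAL_CLK

full_size_demand = 9.8e9  # b/s, assumed data rate with full-size packets
small_pkt_demand = 6.0e9  # b/s, assumed data rate with fewer channels per packet

slow_clk = 152.0e6        # a hypothetical ~2.7% slow reference clock
print(full_size_demand < capacity_bps(slow_clk))  # False: large packets overflow
print(small_pkt_demand < capacity_bps(slow_clk))  # True: small packets still fit
```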

Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread Dan Werthimer
you mentioned your 1 PPS is a square wave.
that's different from everyone else's 1 PPS:

standard 1 PPS systems output a pulse that is
high for about 1 uS.  (extremely low duty cycle).

i don't know if a square wave could be a problem - my guess
is that the correlator design uses an edge detection block,
so is only sensitive to edges, not levels, but it might
be worth investigating.

best wishes,

dan


On Fri, Nov 7, 2014 at 9:03 AM, Richard Black aeldstes...@gmail.com wrote:
 Hi all,

 Haven't heard anything for a while, so I thought I would add some more
 detail about our system setup to see if it might shed some light on the
 problem:

 1 PPS Signal
 -
 Square pulse
 Frequency: 1 Hz
 Amplitude: 3 Vpp
 Offset: 0 V
 Width: 10 ms
 Edge Time: 5 ns

 ADC Clock
 -
 CW Tone
 Frequency: 200 MHz
 Power: -9 dBm


Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread Richard Black
Dan,

We aren't using a square wave. It's a pulse function, but that pulse's
shape can be easily described as a very thin square pulse.

However, you are saying that the pulse is high for only 1 us? That is much
shorter than what we are doing. I'll see if I can twiddle that down.

Thanks,

Richard Black

On Fri, Nov 7, 2014 at 10:09 AM, Dan Werthimer d...@ssl.berkeley.edu
wrote:


Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread G Jones
Also, at least for many ADC boards that have a PPS input, the signal is
connected to a 50 ohm resistor to ground and then goes into a TTL to LVDS
converter chip. You mentioned 3 Vpp and 0 V offset, so it sounds like the
signal sits mostly at -1.5 V and pulses up to +1.5 V. I would suggest a
positive-only waveform: 0 V pulsing up to 3 V would be better.
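A quick way to check the level arithmetic (an illustrative sketch; the helper below is not from any CASPER tool): a generator set to peak-to-peak amplitude Vpp with DC offset V_off swings between V_off ± Vpp/2.

```python
def pulse_levels(vpp, offset):
    """Low/high output levels for a generator set to a given Vpp and DC offset."""
    return offset - vpp / 2, offset + vpp / 2

print(pulse_levels(3.0, 0.0))  # (-1.5, 1.5): swings negative, as described above
print(pulse_levels(3.0, 1.5))  # (0.0, 3.0): positive-only, as suggested
```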

Glenn

On Fri, Nov 7, 2014 at 12:12 PM, Richard Black aeldstes...@gmail.com
wrote:


Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread Dan Werthimer
seconding glenn,

the 1 PPS pulse should be 0 to +3 volts
when terminated in 50 ohms.  (when connected to the roach board).
(that's 0 to 5 or 6 volts when not terminated).
the 1 PPS pulse should not go negative.

i suggest a pulse width of 1 uS  (not 10 ms).

best wishes,

dan
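The terminated/unterminated factor of two follows from the usual source/load voltage divider (a sketch assuming a standard 50 ohm generator output impedance; the function name is illustrative):

```python
def terminated_voltage(v_open, r_source=50.0, r_load=50.0):
    """Voltage across the load for a source with open-circuit voltage v_open."""
    return v_open * r_load / (r_source + r_load)

# A generator whose open-circuit pulse is 6 V delivers 3 V into the 50 ohm
# termination on the ROACH board, matching the levels quoted above.
print(terminated_voltage(6.0))  # 3.0
```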




On Fri, Nov 7, 2014 at 9:12 AM, Richard Black aeldstes...@gmail.com wrote:

Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread 牛晨辉
Hi Glenn, Richard, and all,

First, do you think -9 dBm is a proper ADC clock level? I checked the manual
on the CASPER website and it said +6 dBm; I worried that was too big, so I
use -1 dBm.

Second, could it be that the data received by wireshark on the HPC is out of
order? Is the wireshark read order correct? I received packets in which the
header shows up in the middle of the packet when I use wireshark, so I
suspect the wireshark read order is not correct...

Best Regards!

peter

-- Sent from NetEase Mail for Android

On 2014-11-08 01:15, G Jones wrote:

Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread David MacMahon
Hi, Richard,

On Nov 7, 2014, at 9:03 AM, Richard Black wrote:

 Haven't heard anything for a while, so I thought I would add some more detail 
 about our system setup to see if it might shed some light on the problem:
 
 1 PPS Signal
 -
 Square pulse
 Frequency: 1 Hz
 Amplitude: 3 Vpp
 Offset: 0 V
 Width: 10 ms
 Edge Time: 5 ns

That should be fine assuming the 3 Vpp is measured with the 50 ohm termination 
in place.  If you want to try a software sync, you can pass -S (UPPERcase!) 
to the latest paper_feng_init.rb script.  Check the output of 
paper_feng_init.rb --help to see whether your version supports that option.

 ADC Clock
 -
 CW Tone
 Frequency: 200 MHz
 Power: -9 dBm

It would be a good idea to increase the power level to +6 dBm as described on 
this wiki page:

https://casper.berkeley.edu/wiki/ADC16x250-8_coax_rev_2#ADC16x250-8_coax_rev_2_Inputs

But if the paper_feng_init.rb script reports that the ADC clocks are locked and 
they measure approximately 200 MHz, then I think this is unlikely to be the 
cause of the 10 GbE overflow problems (though it would be great if the fix were 
this simple!).
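For reference, sine-wave power into the ADC's 50 ohm clock input converts to peak-to-peak voltage as Vpp = 2·sqrt(2·R·P), with P = 1 mW·10^(dBm/10) (an illustrative conversion added here, not from the original message):

```python
import math

def dbm_to_vpp(dbm, r_ohms=50.0):
    """Peak-to-peak voltage of a sine wave of the given power into r_ohms."""
    p_watts = 1e-3 * 10 ** (dbm / 10)
    v_rms = math.sqrt(p_watts * r_ohms)
    return 2 * math.sqrt(2) * v_rms

print(round(dbm_to_vpp(-9), 3))  # 0.224 V: relatively little clock drive
print(round(dbm_to_vpp(6), 3))   # 1.262 V: the +6 dBm level recommended above
```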

 For David, are there any red flags with our UBoot version or ROACH CPLD? Here 
 they are again for reference:
 
 From serial interface after ROACH reboot
 ==
 U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
 ...
 CPLD: 2.1
 ==

This matches one of our ROACH2s that is running and sending 10 GbE packets in 
our lab:

U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)

CPU:   AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
   No Security/Kasumi support
   Bootstrap Option C - Boot ROM Location EBC (16 bits)
   32 kB I-Cache 32 kB D-Cache
Board: ROACH2
I2C:   ready
DRAM:  512 MiB
Flash: 128 MiB
In:serial
Out:   serial
Err:   serial
CPLD:  2.1
USB:   Host(int phy)
SN:ROACH2.2 batch=D#6#69 software fixups match
MAC:   02:44:01:02:06:45
DTT:   1 is 23 C
DTT:   2 is 26 C
Net:   ppc_4xx_eth0

Hope this helps,
Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread Richard Black
Thanks David and all,

I unfortunately misspoke when it came to the power in the ADC clock signal.
In fact, we had it at 9 dBm, not -9. Sorry for any confusion.

I set up the pulse generator to swing from 0.0 V to +3.0 V with a 1 us pulse
width. To check for possible ringing, I also hooked our pulse generator up to
an oscilloscope (I increased the pulse width to 10 ms so I could see it). The
waveform I observe has severe overshoot on both the rising and falling edges.
I've attached a drawing to explain what I mean.

I can't seem to mitigate this overshoot with our little Agilent arbitrary
waveform generator. Is this similar to the ringing seen at NRAO? If so, how
is the 1 PPS generated by casperites?

Thanks,

Richard Black

On Fri, Nov 7, 2014 at 11:29 AM, David MacMahon dav...@astro.berkeley.edu
wrote:





Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread David MacMahon
Hi, Peter,

Here is a tcpdump snapshot of the first part of a PAPER packet.  The data from 
tcpdump includes headers from other network layers that encapsulate the 
application data.

Here is the output:

$ sudo tcpdump -i eth4 -s 100 -xx -c 1 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth4, link-type EN10MB (Ethernet), capture size 100 bytes
20:36:04.678013 IP 10.10.4.1.8511 > 10.0.4.54.8511: UDP, length 8208
0x0000:  ffff ffff ffff 0202 c0a8 0401 0800 4500
0x0010:  202c 0000 4000 ff11 3f80 0a0a 0401 0a00
0x0020:  0436 213f 213f 2018 0000 0006 e74d 2d6d
0x0030:  0510 c003 1f1f f200 eefe 0fed e2dd dbf0
0x0040:  e00f e5c3 eef4 03e2 ff11 31ed 1011 1e3c
0x0050:  4ce5 f342 10bf 1ff9 1f2a 9f26 e334 4e60
0x0060:  1010 1ff2 ...

Here is a breakdown of what is there...

# Ethernet Header

  Note the broadcast destination MAC (ff:ff:ff:ff:ff:ff) is used because this is
  a direct connection from ROACH2 to 10 GbE NIC (i.e. no switch).

0x0000:  ffff ffff ffff 0202 c0a8 0401 0800

# IP Header

  Note the source IP (10.10.4.1) and destination IP (10.0.4.54) in the last 8
octets.

0x0000:  4500
0x0010:  202c 0000 4000 ff11 3f80 0a0a 0401 0a00
0x0020:  0436

# UDP Header

  PAPER uses port 8511 (0x213f) because US Letter Size paper is 8.5x11 
inches. :-)
  The same port number is used for both source and destination ports.
  0x2018 is UDP packet length == UDP header length + application packet length.
  Here we have 8216 == 8 + 8208.

0x0020:  213f 213f 2018 0000

# PAPER Packet (finally!)

  The first 6 bytes are MCOUNT (0x0006e74d2d6d).
  The next 1 byte is FID (5).
  The next 1 byte is XID (16).
  The next 8192 bytes (not all shown) are the data.
  The final 8 bytes (not shown) are 4 bytes CRC + 4 bytes of zeros.
  The CRC is of the PAPER header and data (mcount+fid+xid+data).

0x0020:   0006 e74d 2d6d
0x0030:  0510 c003 1f1f f200 eefe 0fed e2dd dbf0
0x0040:  e00f e5c3 eef4 03e2 ff11 31ed 1011 1e3c
0x0050:  4ce5 f342 10bf 1ff9 1f2a 9f26 e334 4e60
0x0060:  1010 1ff2 ...
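As a quick cross-check, the header fields in the breakdown above can be parsed back out of the raw payload. This is just a sketch that re-reads the 8 header bytes copied from the capture; it assumes nothing beyond the big-endian layout Dave describes (6-byte MCOUNT, 1-byte FID, 1-byte XID):

```python
# First 8 bytes of the UDP payload from the capture above.
header = bytes.fromhex("0006e74d2d6d0510")

mcount = int.from_bytes(header[0:6], "big")  # 48-bit spectrum counter
fid = header[6]                              # F-engine ID
xid = header[7]                              # X-engine ID
print(hex(mcount), fid, xid)  # 0x6e74d2d6d 5 16

# UDP length sanity check: UDP header (8) + PAPER packet (8208) = 8216 = 0x2018
assert 8 + 8208 == 0x2018
```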

Hope this helps,
Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread David MacMahon
Hi, Richard,

I think that your 1 PPS should be very usable.  I think we typically generate 
the 1 PPS from a GPS clock.

If you want to try a test, you could disconnect the 1 PPS and use the software 
generated sync signal as per the earlier emails.  If that works and using the 
external 1 PPS doesn't then you will have found the problem.  I'd be surprised 
(but happy!) if that turns out to be the problem.

Dave

On Nov 7, 2014, at 10:55 AM, Richard Black wrote:

 Thanks David and all,
 
 I unfortunately misspoke when it came to the power in the ADC clock signal. 
 In fact, we had it at 9 dBm, not -9. Sorry for any confusion.
 
 I set up the pulse generator to swing from +0.0 to +3.0 V with a 1 us width. To 
 check for possible ringing, I also hooked our pulse generator up to an 
 oscilloscope (I increased the pulse width to 10 ms so I could see it). The 
 waveform I observe has severe overshoot on both the rising and falling edges. 
 I've attached a drawing to explain what I mean.
 
 I can't seem to mitigate this overshoot with our little Agilent arbitrary 
 waveform generator. Is this similar to the ringing seen at NRAO? If so, how 
 is the 1 PPS generated by casperites?
 
 Thanks,
 
 Richard Black
 
 On Fri, Nov 7, 2014 at 11:29 AM, David MacMahon dav...@astro.berkeley.edu 
 wrote:
 Hi, Richard,
 
 On Nov 7, 2014, at 9:03 AM, Richard Black wrote:
 
  Haven't heard anything for a while, so I thought I would add some more 
  detail about our system setup to see if it might shed some light on the 
  problem:
 
  1 PPS Signal
  -
  Square pulse
  Frequency: 1 Hz
  Amplitude: 3 Vpp
  Offset: 0 V
  Width: 10 ms
  Edge Time: 5 ns
 
 That should be fine assuming the 3 Vpp is measured with the 50 ohm 
 termination in place.  If you want to try a software sync, you can pass -S 
 (UPPERcase!) to the latest paper_feng_init.rb script.  Check the output of 
 paper_feng_init.rb --help to see whether your version supports that option.
 
  ADC Clock
  -
  CW Tone
  Frequency: 200 MHz
  Power: -9 dBm
 
 It would be a good idea to increase the power level to +6 dBm as described on 
 this wiki page:
 
 https://casper.berkeley.edu/wiki/ADC16x250-8_coax_rev_2#ADC16x250-8_coax_rev_2_Inputs
 
 But if the paper_feng_init.rb script reports that the ADC clocks are locked 
 and they measure approximately 200 MHz, then I think this is unlikely to be 
 the cause of the 10 GbE overflow problems (though it would be great if the 
 fix were this simple!).
 
  For David, are there any red flags with our UBoot version or ROACH CPLD? 
  Here they are again for reference:
 
  From serial interface after ROACH reboot
  ==
  U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
  ...
  CPLD: 2.1
  ==
 
 This matches one of our ROACH2s that is running and sending 10 GbE packets in 
 our lab:
 
 U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
 
 CPU:   AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
No Security/Kasumi support
Bootstrap Option C - Boot ROM Location EBC (16 bits)
32 kB I-Cache 32 kB D-Cache
 Board: ROACH2
 I2C:   ready
 DRAM:  512 MiB
 Flash: 128 MiB
 In:serial
 Out:   serial
 Err:   serial
 CPLD:  2.1
 USB:   Host(int phy)
 SN:ROACH2.2 batch=D#6#69 software fixups match
 MAC:   02:44:01:02:06:45
 DTT:   1 is 23 C
 DTT:   2 is 26 C
 Net:   ppc_4xx_eth0
 
 Hope this helps,
 Dave
 
 
 pulse_profile.png




Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-07 Thread Richard Black
David,

Well, unfortunately, using only the software-generated sync did not fix the
packet overflow issue. :-(

Richard Black

On Fri, Nov 7, 2014 at 12:10 PM, David MacMahon dav...@astro.berkeley.edu
wrote:

 Hi, Richard,

 I think that your 1 PPS should be very usable.  I think we typically
 generate the 1 PPS from a GPS clock.

 If you want to try a test, you could disconnect the 1 PPS and use the
 software generated sync signal as per the earlier emails.  If that works
 and using the external 1 PPS doesn't then you will have found the problem.
 I'd be surprised (but happy!) if that turns out to be the problem.

 Dave

 On Nov 7, 2014, at 10:55 AM, Richard Black wrote:

  Thanks David and all,
 
  I unfortunately misspoke when it came to the power in the ADC clock
 signal. In fact, we had it at 9 dBm, not -9. Sorry for any confusion.
 
  I set up the pulse generator to swing from +0.0 to +3.0 V at 1 us. To
 check on possible ringing, I also hooked up our pulse generator to an
 oscilloscope (I increased the pulse width to 10 ms, so I could see it). The
 waveform I observe has some severe overshoot both on the uptake and down.
 I've attached a drawing to explain what I mean.
 
  I can't seem to mitigate this overshoot with our little Agilent
 arbitrary waveform generator. Is this similar to the ringing seen at NRAO?
 If so, how is the 1 PPS generated by casperites?
 
  Thanks,
 
  Richard Black
 
  On Fri, Nov 7, 2014 at 11:29 AM, David MacMahon 
 dav...@astro.berkeley.edu wrote:
  Hi, Richard,
 
  On Nov 7, 2014, at 9:03 AM, Richard Black wrote:
 
   Haven't heard anything for a while, so I thought I would add some more
 detail about our system setup to see if it might shed some light on the
 problem:
  
   1 PPS Signal
   -
   Square pulse
   Frequency: 1 Hz
   Amplitude: 3 Vpp
   Offset: 0 V
   Width: 10 ms
   Edge Time: 5 ns
 
  That should be fine assuming the 3 Vpp is measured with the 50 ohm
 termination in place.  If you want to try a software sync, you can pass
 -S (UPPERcase!) to the latest paper_feng_init.rb script.  Check the
 output of paper_feng_init.rb --help to see whether your version supports
 that option.
 
   ADC Clock
   -
   CW Tone
   Frequency: 200 MHz
   Power: -9 dBm
 
  It would be a good idea to increase the power level to +6 dBm as
 described on this wiki page:
 
 
 https://casper.berkeley.edu/wiki/ADC16x250-8_coax_rev_2#ADC16x250-8_coax_rev_2_Inputs
 
  But if the paper_feng_init.rb script reports that the ADC clocks are
 locked and they measure approximately 200 MHz, then I think this is
 unlikely to be the cause of the 10 GbE overflow problems (though it would
 be great if the fix were this simple!).
 
   For David, are there any red flags with our UBoot version or ROACH
 CPLD? Here they are again for reference:
  
   From serial interface after ROACH reboot
   ==
   U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
   ...
   CPLD: 2.1
   ==
 
  This matches one of our ROACH2s that is running and sending 10 GbE
 packets in our lab:
 
  U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
 
  CPU:   AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
 No Security/Kasumi support
 Bootstrap Option C - Boot ROM Location EBC (16 bits)
 32 kB I-Cache 32 kB D-Cache
  Board: ROACH2
  I2C:   ready
  DRAM:  512 MiB
  Flash: 128 MiB
  In:serial
  Out:   serial
  Err:   serial
  CPLD:  2.1
  USB:   Host(int phy)
  SN:ROACH2.2 batch=D#6#69 software fixups match
  MAC:   02:44:01:02:06:45
  DTT:   1 is 23 C
  DTT:   2 is 26 C
  Net:   ppc_4xx_eth0
 
  Hope this helps,
  Dave
 
 
  pulse_profile.png




Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-03 Thread Richard Black
David,

So, it's been a little while now, but not much has changed yet. We've
gotten Chipscope working, and, so far, there aren't any red flags with the
FPGA firmware 10-GbE control signals.

We also confirmed that the bitstream we are using is in fact
roach2_fengine_2013_Oct_14_1756.bof.gz, so that is unfortunately not the
problem.

I also took a look at the ROACH2 PPC setup: we pulled from the .git
repository on February 12, 2014 (commit number =
e14df9016c3b7ccba62cc6d0cae05405f4929c94). There haven't been any changes
to that repository since August 2013, so unless the SKA-SA ROACH-2s are
using a pull from before then, I don't think that is our issue.

We also tried out Jason Manley's suggestion of delaying the enabling of the
10-GbE cores to ensure that the sync pulse propagated through the entire
system before buffering up data, but the problem persisted.

Just to rule it out, I double-checked (or more accurately triple-checked)
the U72 part, and, sure enough, it is the correct oscillator, model number
EEG-2121.

There is another possibility, albeit an unlikely problem: we currently have
the ROACH-2 board booting off another PC (i.e. not the same PC that the
ruby control scripts are running on). I can't imagine that this is the
problem, but I'm planning on trying to consolidate the NFS and ruby scripts
onto a single PC to rule it out.

So I suppose at this point, my questions are:

(1) What version of the roach2_nfs_uboot .git repository are SKA-SA using?
(2) Is SKA-SA using the same PCs for ROACH-2 net boots and file systems as
the ruby control scripts?
(3) Are there any additional steps that need to be taken when installing
the Quad SFP+ mezzanine cards onto the ROACH-2 board? Are there potentially
some drivers or configuration steps that are needed to make sure they
function properly? As I recall, when we got the boards, we didn't do
anything special with the cards outside of simply plugging them in.

Again, thanks for your patient advice and suggestions.


Richard Black

On Mon, Oct 27, 2014 at 2:26 PM, David MacMahon dav...@astro.berkeley.edu
wrote:

 Hi, Richard,

 On Oct 27, 2014, at 9:25 AM, Richard Black wrote:

  This is a reportedly fully-functional model that shouldn't require any
 major changes in order to operate. However, this has clearly not been the
 case in at least two independent situations (us and Peter). This begs the
 question: what's so different about our use of PAPER?

 I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is
 the one being used by the PAPER correlator currently fielded in South
 Africa.  It is definitely a fully functional model.  That image (and all
 source files for it) is available from the git repo listed on the PAPER
 Correlator Manifest page of the CASPER Wiki:

 https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest

  We, at BYU, have made painstakingly sure that our IP addressing schemes,
 switch ports, and scripts are all configured correctly (thanks to David
 MacMahon for that, btw), but we still have hit the proverbial brick wall of
 10-GbE overflow.  When I last corresponded with David, he explained that he
 remembers having a similar issue before, but can't recall exactly what the
 problem was.

 Really?  I recall saying that I often forget about increasing the MTU of
 the 10 GbE switch and NICs.  I don't recall saying that I had a similar
 issue before but couldn't remember the problem.

  In any case, the fact that by turning down the ADC clock prior to
 start-up prevents the 10-GbE core from overflowing is a major lead for us
 at BYU (we've been spinning our wheels on this issue for several months
 now). By no means are we proposing mid-run ADC clock modifications, but
 this appears to be a very subtle (and quite sinister, in my opinion) bug.
 
  Any thoughts as to what might be going on?

 I cannot explain the 10 GbE overflow that you and Peter are experiencing.
 I have pushed some updates to the rb-papergpu.git repository listed on the
 PAPER Correlator Manifest page.  The paper_feng_init.rb script now verifies
 that the ADC clocks are locked and provides options for issuing a software
 sync (only recommended for lab use) and for not storing the time of
 synchronization in redis (also only recommended for lab use).

 The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1)
 while they are held in reset.  Since you are using the paper_feng_init.rb
 script, this should not be happening (unless something has gone wrong
 during the running of that script) because that script specifically and
 explicitly disables the tx_valid signal before putting the cores into reset
 and it takes the cores out of reset before enabling the tx_valid signal.
 So assuming that this is not the cause of the overflows, there must be
 something else that is causing the 10 GbE cores to be unable to transmit
 data fast enough to keep up with the data stream it is being fed.  Two
 things that could cause this are 1) running the 
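The ordering described above is the crux: data must never be marked valid while a core sits in reset. A minimal sketch of that sequence follows; the register names here are invented for illustration (the real paper_feng_init.rb drives the design's own registers over KATCP), and the fake client exists only to make the ordering visible:

```python
class FakeFpga:
    """Stand-in for a KATCP/FPGA client; records writes so the ordering is visible."""
    def __init__(self):
        self.log = []

    def write_int(self, reg, val):
        self.log.append((reg, val))

def restart_10gbe(fpga):
    # Safe sequence: disable tx_valid, assert reset, release reset, re-enable.
    fpga.write_int("eth_tx_valid_en", 0)  # 1. stop feeding the cores first
    fpga.write_int("eth_rst", 1)          # 2. only then assert core reset
    fpga.write_int("eth_rst", 0)          # 3. release reset
    fpga.write_int("eth_tx_valid_en", 1)  # 4. finally let data flow again

fpga = FakeFpga()
restart_10gbe(fpga)
print(fpga.log)
```

Running the steps in any other order is exactly the "valid data while held in reset" condition that can leave the cores overflowed.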

Re: [casper] Problem about the adc frequency in PAPER model.

2014-11-03 Thread David MacMahon
Hi, Richard,

On Nov 3, 2014, at 11:47 AM, Richard Black wrote:

 So, it's been a little while now, but not much has changed yet. We've gotten 
 Chipscope working, and, so far, there aren't any red flags with the FPGA 
 firmware 10-GbE control signals.

That's good to know, although maybe in some way it would have been nice if you 
had found some red flags.

 We also confirmed that the bitstream we are using is in fact 
 roach2_fengine_2013_Oct_14_1756.bof.gz, so that is unfortunately not the 
 problem.

At least you are using a known good BOF file, so that eliminates a source of 
potential errors.

 I also took a look at the ROACH2 PPC setup: we pulled from the .git 
 repository on February 12, 2014 (commit number = 
 e14df9016c3b7ccba62cc6d0cae05405f4929c94). There haven't been any changes to 
 that repository since August 2013, so unless the SKA-SA ROACH-2s are using a 
 pull from before then, I don't think that is our issue.

We use our own homegrown NFS root filesystem for the ROACH2s, so I can't 
comment on the status of the one you refer to 
(https://github.com/ska-sa/roach2_nfs_uboot.git).  I am more interested in the 
U-Boot version you have (see https://github.com/ska-sa/roach2_uboot.git) and 
which version of the ROACH2 CPLD image you are using (not sure where to get 
this).  I think these are unlikely to be problematic, but we've already checked 
all the likely problems.

 We also tried out Jason Manley's suggestion of delaying the enabling of the 
 10-GbE cores to ensure that the sync pulse propagated through the entire 
 system before buffering up data, but the problem persisted.

Do you have an external 1 PPS sync pulse connected or have you tried the latest 
rb-papergpu software that supports a software-generated sync?  The 
paper_feng_init.rb script already disables the data flow to the 10 GbE cores 
until the sync pulse has propagated through and the cores have been taken out 
of reset.

Does the latest rb-papergpu code show that the ADC clocks (MMCMs) are locked?  
Does it estimate the clock frequency correctly?  Does adc16_dump_chans.rb show 
samples that correspond correctly to the analog inputs (e.g. a CW tone)?

 Just to rule it out, I double-checked (or more accurately triple-checked) the 
 U72 part, and, sure enough, it is the correct oscillator, model number 
 EEG-2121.

Does it have the "L" suffix on the "100.000L" frequency part of the chip 
markings?

On a related note, as I sent off-list to you and Peter earlier today: the fact 
that Peter can send small packets at 200 MHz without overflow, while large 
packets overflow, is very interesting and puzzling.  I assume that the 
smaller packets are just fewer channels of the same length spectrum and that 
the number of packets per second remains the same (I think we discussed this 
previously).  In that case, the small packets reduce the data rate, which 
suggests that the 156.25 MHz xaui_ref_clk clock is maybe not really 156.25 
MHz but something somewhat slower.  This clock is driven by the oscillator at 
U56 and the clock splitter at U54 (see attached schematic snippet).  Can you 
please inspect those parts on your board(s)?  I will be able to inspect a 
ROACH2 this afternoon and report what I have on a known working system.

On one of our ROACH2s U56 is labeled like this:

EEG-2121
156.250L
OGPN1Z5C

Again, note the L suffix.  I think that signifies LVDS, which is what is 
expected/required for the ROACH2.  That's very important.  I am not 100% sure 
about my transcription of the third line, it could have typos.

 There is another possibility, albeit an unlikely problem: we currently have 
 the ROACH-2 board booting off another PC (i.e. not the same PC that the ruby 
 control scripts are running on). I can't imagine that this is the problem, 
 but I'm planning on trying to consolidate the NFS and ruby scripts onto a 
 single PC to rule it out.

The scripts communicate with the ROACH2 over the network via KATCP.  There is 
no requirement that the scripts be running on the same server that is providing 
the NFS root filesystem to the ROACH2s.

 So I suppose at this point, my questions are:
 
 (1) What version of the roach2_nfs_uboot .git repository are SKA-SA using?

I don't know.

 (2) Is SKA-SA using the same PCs for ROACH-2 net boots and file systems as 
 the ruby control scripts?

I doubt SKA-SA is using ruby, but as stated above the ruby scripts can be run 
on any system that can reach the ROACH2 via KATCP.

 (3) Are there any additional steps that need to be taken when installing the 
 Quad SFP+ mezzanine cards onto the ROACH-2 board? Are there potentially some 
 drivers or configuration steps that are needed to make sure they function 
 properly? As I recall, when we got the boards, we didn't do anything special 
 with the cards outside of simply plugging them in.

Just plugging them in is all that is necessary.  There is a slight complication 
in that the standoffs might not be exactly the right height and some 

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-28 Thread peter
Hi all,
Sorry to reply you late.
First, though the serial numbers of all 8 ROACHes we have are in the range that 
might be affected, ours fortunately have the correct crystals (Epson 
EEG-2121-100.000L) installed.
I read the discussion yesterday. My project's final frequency is 250 MHz, but I 
didn't turn it up to 250 MHz when I ran the PAPER model.
As the initialization shows:


[peter@roachserver rb_test]$ ./paper_feng_init.rb roach1:0
initializing roach1 as FID 0
connecting to roach1
roach1 roach2_fengine app/lib revision 47c59e2/cd26bd2
disabling network transmission
setting roach1 FID to 0
setting fftshift to 2047
setting eq to 600/1
configuring 10 GbE interfaces
setting corner turner mode 0 (8 F engines)
arming sync generator(s)
arming sync generator(s)
storing sync time in redis on redishost
seeding noise generators
arming noise generator(s)
Setting F-Engine inputs to ADC signals
resetting network interfaces
enable transmission to X engines
enable transmission to switch
all done
The configuration looks OK, but no data is sent out because of the overflow. I 
agree with David that it may not be the script that matters, because I can use 
this script to initialize my own model, which is modified from PAPER for our 
use. What's more, that model can send data packets from the ROACH at 200 MHz 
(even at 250 MHz), and the overflow problem has never happened. My model sends 
data in packets of 4112 bytes.
I also find that neither the PAPER model at 75 MHz nor my model at 200 MHz 
yields the correct data structure on my system; I mean the header appears in 
the middle of the packet. I saw this in Wireshark.


I have run adc16_dump_chans.rb while running the PAPER model. The result is as 
follows:

[peter@roachserver bin]$ ./adc16_dump_chans.rb -r -v pf1
data snap took 0.363328416 seconds
111.5 112.0 112.1 112.1 127.1 127.1 127.3 127.4 112.2 112.3 111.8 112.0 112.1
112.2 111.6 112.0 112.4 111.6 112.1 112.0 127.0 127.4 127.1 127.3 112.1 111.4
112.0 111.7 127.3 126.7 127.4 126.6
I also downloaded the new script as David pointed out, but I hit a NameError:

[peter@roachserver bin]$ ./paper_feng_init.rb pf1
initializing pf1 as FID 0
connecting to pf1
./paper_feng_init.rb:130:in `block in main': undefined local variable or 
method `a' for main:Object (NameError)
  from ./paper_feng_init.rb:112:in `map'
  from ./paper_feng_init.rb:112:in `main'


Thanks for your communication and suggestions!
peter










At 2014-10-28 05:03:14, David MacMahon dav...@astro.berkeley.edu wrote:
Hi, Richard and Peter,

Another possibility that crossed my mind is perhaps your ROACH2s were from the 
batch where the incorrect oscillator was installed for U72.  This seems 
unlikely for Richard based on this email (which also describes the incorrect 
oscillator problem in general):

https://www.mail-archive.com/casper@lists.berkeley.edu/msg04909.html

Maybe it's worth a double check anyway?

Dave

On Oct 27, 2014, at 1:41 PM, Richard Black wrote:

 David,
 
 We'll take another close look at what model we are actually using, just to 
 be safe.
 
 I went back and looked at our e-mails, and sure enough, you're right. You 
 were referring to the MTU issue as being the problem you tend to suppress 
 all memory of. It was just that you stated it in a separate paragraph, so, 
 out-of-context, I extrapolated that you have had the same problem before. My 
 bad for dragging your good name through the mud. :)
 
 We will also update our local repositories, in the event some bizarre race 
 condition exists on our end.
 
 I didn't know that the buffer could fill up while reset was asserted. We'll 
 definitely have to check up on that too.
 
 We haven't tried dumping raw ADC data yet since we have been trying to get 
 the data link working first. After that, we were planning to inject signal 
 and examine outputs.
 
 Thanks,
 
 Richard Black
 
 On Mon, Oct 27, 2014 at 2:26 PM, David MacMahon dav...@astro.berkeley.edu 
 wrote:
 Hi, Richard,
 
 On Oct 27, 2014, at 9:25 AM, Richard Black wrote:
 
  This is a reportedly fully-functional model that shouldn't require any 
  major changes in order to operate. However, this has clearly not been the 
  case in at least two independent situations (us and Peter). This begs the 
  question: what's so different about our use of PAPER?
 
 I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is the 
 one being used by the PAPER correlator currently fielded in South Africa.  
 It is definitely a fully functional model.  That image (and all source files 
 for it) is available from the git repo listed on the PAPER Correlator 
 Manifest page of the CASPER Wiki:
 
 https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest
 
  We, at BYU, have made painstakingly sure that our IP addressing schemes, 
  switch ports, and scripts are all configured correctly (thanks to David 
  MacMahon for that, btw), but we still have hit the proverbial brick wall 
  of 10-GbE overflow.  When I last corresponded with David, he explained 
  that he 

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-28 Thread David MacMahon
Hi, Peter,

On Oct 28, 2014, at 5:34 AM, peter wrote:

 First, though the serial numbers of all 8 ROACHes we have are in the range
 that might be affected, ours fortunately have the correct crystals
 (Epson EEG-2121-100.000L) installed.

Thanks for checking.  That eliminates one potential cause of the problem.

 I have run adc16_dump_chans.rb while running the PAPER model. The result is
 as follows:
 
 [peter@roachserver bin]$ ./adc16_dump_chans.rb -r  -v pf1
 data snap took 0.363328416 seconds
 111.5 112.0 112.1 112.1 127.1 127.1 127.3 127.4 112.2 112.3 111.8 112.0 112.1 
 112.2 111.6 112.0 112.4 111.6 112.1 112.0 127.0 127.4 127.1 127.3 112.1 111.4 
 112.0 111.7 127.3 126.7 127.4 126.6

The '-r' option tells the script to output the RMS of the 32 inputs.  Those RMS 
values are very, very high.  A full scale sine wave would have an RMS of only 
90.
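Dave's figure of 90 is just 127/sqrt(2) for an 8-bit full-scale sine. A quick numeric check (plain Python, no CASPER tooling assumed):

```python
import math

# An 8-bit ADC full-scale sine has amplitude 127, so its RMS is 127/sqrt(2) ~ 89.8.
# Per-input RMS readings of 111-127, as in the dump above, therefore indicate
# inputs at or beyond full scale.
N = 4096
cycles = 8  # integer number of cycles so the RMS is exact
samples = [127 * math.sin(2 * math.pi * cycles * n / N) for n in range(N)]
rms = math.sqrt(sum(s * s for s in samples) / N)
print(round(rms, 1))  # 89.8
```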

What signals are driving the ADC inputs?

If you don't pass '-r' then it will dump 1K of samples from each input (one 
column per input, one row per sample).  What does that show?

 [peter@roachserver bin]$ ./paper_feng_init.rb pf1
 initializing pf1 as FID 0
 connecting to pf1
 ./paper_feng_init.rb:130:in `block in main': undefined local variable or 
 method `a' for main:Object (NameError)
   from ./paper_feng_init.rb:112:in `map'
   from ./paper_feng_init.rb:112:in `main'

Sorry about that copy/paste error!  I have pushed a fix.

Hope this helps to get us closer to understanding this problem,
Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
Just a note that I don't recommend you adjust FPGA clock frequencies while the 
FPGA is operating. In theory, you should do a global reset in case the PLL/DLLs 
lose lock during clock transitions, in which case the logic could be left in an 
uncertain state. But the Sysgen flow just does a single POR.

A better solution might be to keep the 10GbE cores turned off (enable line 
pulled low) on initialisation, until things are configured (tgtap started etc), 
and only then enable the transmission using a SW register.

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 25 Oct 2014, at 10:34, peter peterniu...@163.com wrote:

 Hi Richard, Joe, all,
 Thanks for your help! It can finally receive packets now.
 As you pointed out: after enabling the ADC card and running the bof file
 (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
 F-engine init script at about 75 MHz (./paper_feng_init.rb roach1:0). That
 allows the packets to transfer; then we can turn the frequency up. However,
 the ADC clock only reaches about 120 MHz in my experiment, and our final ADC
 frequency standard is 250 MHz. Maybe I need to run the bof file at a higher
 ADC frequency first to end up with a steady 250 MHz ADC clock.
 Why does it need to be initialized at a lower frequency and then turned up?
 That doesn't make sense. Is the hardware going wrong? Since the adc16x250-8
 yellow block is designed for 250 MHz, it should be fine at 200 MHz or 250
 MHz. What is the final frequency in your experiment?
 Any reply will be helpful!
 Best Regards!
 peter
 
 
 
 
 
 
 At 2014-10-25 00:36:52, Richard Black aeldstes...@gmail.com wrote:
 Peter,
 
 That's correct. We downloaded the FPGA firmware and programmed the ROACH with 
 the precompiled bitstream. When we didn't get any data beyond that single 
 packet, we stuck some overflow status registers in the model and found that 
 we were overflowing at 1025 64-bit words (i.e. 8200 bytes).
 
 We have actually found a way to get packets to flow, but it isn't a good fix. 
 When we turn the ADC clock frequency down to about 75 MHz, the packets begin 
 to flow. There is an opinion in our group that the 10-GbE buffer overflow is 
 a transient behavior, and, hence, if we slowly turn up the clock frequency 
 after the ROACH has started up, packets may continue to flow in steady-state 
 operation. We haven't tested this yet, though.
 
 Richard Black
 
 On Thu, Oct 23, 2014 at 8:39 PM, peter peterniu...@163.com wrote:
 Hi Richard, All,
 As you said, the size of the isolated packet changes every time. :-(
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
 Did you download the PAPER gateware from the CASPER wiki
 (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest) directly? How
 does the PAPER bof file run on your system? Have you seen the overflow
 before? I downloaded and installed the PAPER model as the website says, but
 the overflow shows when I run paper_feng_netstat.rb.
 Thanks for your information.
 peter
 
 
 
 
 
 At 2014-10-24 09:59:12, Richard Black aeldstes...@gmail.com wrote:
 Peter,
 
 I don't mean to hijack your thread, but we've been having a very similar (and 
 time-absorbing) issue with the PAPER f-engine FPGA firmware here at BYU. Out 
 of curiosity, does this single packet that you're receiving in tcpdump change 
 in size every time you reprogram the ROACH? We've seen this happen, and we're 
 pretty sure that this isolated packet is the 10-GbE buffer flushing when the 
 10-GbE core is initialized (i.e. the enable signal isn't sync'd with the 
 start of new packet).
 
 Regardless of whether we have the same issue, I'm very interested to see this 
 problem's resolution.
 
 Good luck,
 
 Richard Black
 
 On Thu, Oct 23, 2014 at 7:50 PM, peter peterniu...@163.com wrote:
 Hi Joe, all,
 I found something this morning: there is one packet sent out from the ROACH
 when I run the PAPER model, which I captured with tcpdump on the HPC:
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
 09:04:02.757813 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 6456
 
 The length is not the expected 8200+8, and it is far from the full TX buffer
 size of 8K+512. The other packets are stopped by the overflow.
 I have tried changing the tutorial 2 packet size to 8200 bytes and to 8K+512
 bytes; both transfer fine. I also made sure the boundary is indeed 8K+512,
 because when I change the size to 8K+513 bytes, no data is sent. So the
 packet received this morning with length 6456 is well under the limit. But
 what caused the other packets to overflow?
 Any suggestions would be helpful!
 peter
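The buffer arithmetic in Peter's message is easy to check. This sketch assumes, as the thread does, that "8K+512" means 8*1024 + 512 bytes:

```python
# TX buffer boundary arithmetic from the thread (all sizes in bytes).
tx_buffer = 8 * 1024 + 512   # "8K+512" boundary found empirically
paper_packet = 8200 + 8      # expected PAPER payload: data + trailer
print(tx_buffer, paper_packet)  # 8704 8208

assert paper_packet <= tx_buffer  # the full PAPER packet fits the buffer
assert 6456 < tx_buffer           # the stray 6456-byte packet is well under the limit
assert 8 * 1024 + 513 > tx_buffer  # 8K+513 is one byte over, which is why it failed
```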
 
 
 
 
 
 
 At 2014-10-24 00:37:14, Kujawski, Joseph jkujaw...@siena.edu wrote:
 Peter,
 
 By cadence of the broadcast, I mean how often are the 8200 byte packets 

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jason,

Thanks for your comments. While I agree that changing the ADC frequency
mid-operation is non-kosher and could result in uncertain behavior, the
issue at hand for us is to figure out what is going on with the PAPER model
that has been published on the CASPER wiki. This naturally won't be (and
shouldn't be) the end-all solution to this problem.

This is a reportedly fully-functional model that shouldn't require any
major changes in order to operate. However, this has clearly not been the
case in at least two independent situations (us and Peter). This begs the
question: what's so different about our use of PAPER?

We, at BYU, have made painstakingly sure that our IP addressing schemes,
switch ports, and scripts are all configured correctly (thanks to David
MacMahon for that, btw), but we still have hit the proverbial brick wall of
10-GbE overflow.  When I last corresponded with David, he explained that he
remembers having a similar issue before, but can't recall exactly what the
problem was.

In any case, the fact that turning down the ADC clock prior to start-up
prevents the 10-GbE core from overflowing is a major lead for us at BYU
(we've been spinning our wheels on this issue for several months now). By
no means are we proposing mid-run ADC clock modifications, but this appears
to be a very subtle (and quite sinister, in my opinion) bug.

Any thoughts as to what might be going on?

Richard Black

On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley jman...@ska.ac.za wrote:

 Just a note that I don't recommend you adjust FPGA clock frequencies while
 it's operating. In theory, you should do a global reset in case the
 PLL/DLLs lose lock during clock transitions, in which case the logic could
 be in an uncertain state. But the Sysgen flow just does a single POR.

 A better solution might be to keep the 10GbE cores turned off (enable line
 pulled low) on initialisation, until things are configured (tgtap started
 etc), and only then enable the transmission using a SW register.

 Jason Manley
 CBF Manager
 SKA-SA

 Cell: +27 82 662 7726
 Work: +27 21 506 7300

 On 25 Oct 2014, at 10:34, peter peterniu...@163.com wrote:

  Hi Richard, Joe, all,
  Thanks for your help. It can finally receive packets now!
  As you pointed out, after enabling the ADC card and running the bof file
 (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
 f-engine init script at about 75 MHz (./paper_feng_init.rb roach1:0); that
 allows the packets to transfer. Then we can turn the frequency higher.
 However, the final ADC clock frequency only reaches 120 MHz in my
 experiment. Our target ADC frequency is 250 MHz. Maybe I need to run the
 bof file at a higher ADC frequency first to reach a steady 250 MHz ADC
 clock frequency.
  Why does it need to be initialised at a lower frequency and then turned
 up? That doesn't make sense. Is the hardware going wrong? As the yellow
 block adc16x250-8 is designed for 250 MHz, it should be fine at 200 MHz or
 250 MHz. What was the final frequency in your experiment?
  Any reply will be helpful!
  Best Regards!
  peter
  Any reply will be helpful!
  Best Regards!
  peter
 
 
 
 
 
 
  At 2014-10-25 00:36:52, Richard Black aeldstes...@gmail.com wrote:
  Peter,
 
  That's correct. We downloaded the FPGA firmware and programmed the ROACH
 with the precompiled bitstream. When we didn't get any data beyond that
 single packet, we stuck some overflow status registers in the model and
 found that we were overflowing at 1025 64-bit words (i.e. 8200 bytes).
 
  We have actually found a way to get packets to flow, but it isn't a good
 fix. When we turn the ADC clock frequency down to about 75 MHz, the packets
 begin to flow. There is an opinion in our group that the 10-GbE buffer
 overflow is a transient behavior, and, hence, if we slowly turn up the
 clock frequency after the ROACH has started up, packets may continue to
 flow in steady-state operation. We haven't tested this yet, though.
 
  Richard Black
 
  On Thu, Oct 23, 2014 at 8:39 PM, peter peterniu...@163.com wrote:
  Hi Richard, All,
  As you said, the size of the isolated packet changes every time. :(
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
  10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
  Did you download the PAPER gateware from the CASPER wiki (
 https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest ) directly?
 How does the PAPER bof file run on your system? Have you seen the overflow
 before? I downloaded and installed the PAPER model as the website says, but
 the overflow appears when I run paper_feng_netstat.rb.
  Thanks for your information.
  peter
  peter
 
 
 
 
 
  At 2014-10-24 09:59:12, Richard Black aeldstes...@gmail.com wrote:
  Peter,
 
  I don't mean to hijack your thread, but we've been having a very similar
 (and time-absorbing) issue with the PAPER f-engine FPGA firmware here at
 BYU. Out of curiosity, does this single packet that you're receiving in
 tcpdump change in size every 

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
I suspect the 10GbE core's input FIFO is overflowing on startup. One key thing 
with this core is to ensure that your design keeps the enable port held low 
until the core's been configured. The core becomes unusable once the TX FIFO 
overflows. This has been a long-standing bug (my emails trace back to 2009) but 
it's so easy to work around that I don't think anyone's bothered looking into 
fixing it.
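
A toy model of the failure mode described above may make it concrete: a TX FIFO that wedges permanently once it overflows (a sketch only; the depth and word counts are illustrative, not the real core's parameters):

```python
# Toy model of the bug described above: a TX FIFO that becomes
# permanently unusable once it overflows.
class TxFifo:
    def __init__(self, depth_words=1024):
        self.depth = depth_words
        self.level = 0
        self.wedged = False          # set once on overflow, never cleared

    def push(self, words):
        if self.wedged:
            return False             # core is unusable after an overflow
        self.level += words
        if self.level > self.depth:
            self.wedged = True       # the long-standing bug: no recovery
            return False
        return True

    def drain(self):
        self.level = 0               # a transmitted packet frees the buffer

# Enable held high before the core is configured: data streams in with
# nothing draining it, so the FIFO overflows and stays wedged.
fifo = TxFifo()
for _ in range(3):
    fifo.push(512)                   # 3 x 512 words > 1024-word depth
assert fifo.wedged

# Enable held low until configuration is done: nothing is pushed early,
# and the first real packet goes through cleanly.
fifo2 = TxFifo()
enable = False
if enable:
    fifo2.push(512)
# ... configure the core (tgtap etc.), then raise enable:
enable = True
assert fifo2.push(512) and not fifo2.wedged
```

This is why the workaround is cheap: nothing about the core changes, only the order in which the enable is raised.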

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 27 Oct 2014, at 18:25, Richard Black aeldstes...@gmail.com wrote:


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
By "enable port", I assume you mean the "valid" port. I've been looking at
the PAPER model carefully for some time now, and that is how it operates.
It has a gated "valid" signal with a software register on each 10-GbE core.
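
The gating described above amounts to a one-line AND (a sketch; the names are illustrative, not the PAPER model's actual signal names):

```python
# The 10-GbE core's data-valid line is ANDed with a software-register
# enable, so software decides when the core ever sees data.
def gated_valid(data_valid: bool, sw_enable: bool) -> bool:
    return data_valid and sw_enable

assert gated_valid(True, False) is False   # held off until software enables TX
assert gated_valid(True, True) is True     # normal operation once enabled
```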

Once again, this is not our model. This is one made available on the CASPER
wiki and run without modifications.

Richard Black

On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley jman...@ska.ac.za wrote:


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
Yep, ok, so whoever did it (Dave?) already knows about this issue and has dealt 
with it. So scratch that idea then! The only other thing to check is to make sure 
you don't actually toggle that software register until the core is configured.

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 27 Oct 2014, at 18:38, Richard Black aeldstes...@gmail.com wrote:


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jason,

Fair point. One of our guys is currently trying to get ChipScope configured
to make sure all our control signals are correct. We'll definitely look at
that signal too. Hopefully that will finally put this issue to rest.

Thanks for the tip,

Richard Black

On Mon, Oct 27, 2014 at 10:47 AM, Jason Manley jman...@ska.ac.za wrote:


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jack Hickish
Hi Richard,

I've just had a very brief look at the design / software, so take this
email with a pinch of salt, but on the off-chance you haven't checked
this...

It looks like the PAPER F-engine start-up sequence, when running the start
script for the software / firmware out of the box, is:

1. Disable all ethernet interfaces
2. Arm sync generator, wait 1 second for PPS
3. Reset ethernet interfaces
4. Enable interfaces.
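
The four steps above can be sketched as code. This is a stand-in, not the real script: the register names and the FPGA-client stub are illustrative, and the point is only the ordering.

```python
# Sketch of the start-up order in steps 1-4 above, against a stand-in
# FPGA client (register names here are illustrative, not PAPER's).
class FakeFpga:
    def __init__(self):
        self.log = []                     # record the order of writes

    def write_int(self, reg, val):
        self.log.append((reg, val))

def init_fengine(fpga, wait=lambda s: None):
    fpga.write_int('eth_enable', 0)       # 1. disable all ethernet interfaces
    fpga.write_int('sync_arm', 1)         # 2. arm the sync generator...
    wait(1)                               #    ...and wait a second for the PPS
    fpga.write_int('eth_rst', 1)          # 3. reset the ethernet interfaces
    fpga.write_int('eth_rst', 0)
    fpga.write_int('eth_enable', 1)       # 4. finally enable the interfaces

fpga = FakeFpga()
init_fengine(fpga)
# The enable must be the very last write, after the sync has been armed:
assert fpga.log[0] == ('eth_enable', 0)
assert fpga.log[-1] == ('eth_enable', 1)
assert fpga.log.index(('sync_arm', 1)) < fpga.log.index(('eth_enable', 1))
```

If a sync can still arrive between steps 2 and 4 (e.g. a slow sync input), the ordering above no longer guarantees the cores see a clean stream.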

These four steps seem like they should be safe, yet the behaviour
you're describing sounds like the design is midway through sending a
packet, then gets a sync, gives up on sending an end-of-frame and
starts sending a new packet, at which point the old packet + the new
packet = overflow.

Knowing that the design works for PAPER, my question is whether, after
arming the sync generator, syncs are flowing through the design before
the ethernet interface is enabled. Do you have a PPS-like input? The
f-engine initialisation script seems to wait for a second after arming,
but if your sync input is something significantly slower, you could
have problems.

I'm sceptical about this theory (I think the symptoms would be lots of
OK packets when you brought up the interface, and then it dying when
the sync arrives, rather than a single good packet like you're
seeing), but if the firmware + software really is the same as that
working with PAPER, and the wiki hasn't just got out of sync with the
PAPER devs, perhaps the problem is in your hardware setup.

Cheers,
Jack

On 27 October 2014 16:38, Richard Black aeldstes...@gmail.com wrote:

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jack,

I appreciate your help. I tend to agree that the issue is likely a hardware
configuration problem, but we have been trying to match it as closely as
possible.

We do feed a 1-PPS signal into the board, but I'm hazy on the details of
the other pulse parameters. I'll look into that as well.

So, if I understand you correctly, you believe that the sync pulse is
reaching the ethernet interfaces *after* the cores are enabled? If that is
the case, couldn't we delay enabling the 10-GbE cores for another second to
fix it? This might be a quick way to test that theory, but please correct
me if I've misunderstood.

Richard Black

On Mon, Oct 27, 2014 at 11:05 AM, Jack Hickish jackhick...@gmail.com
wrote:

 Hi Richard,

 I've just had a very brief look at the design / software, so take this
 email with a pinch of salt, but on the off-chance you haven't checked
 this

 It looks like the PAPER F-engine setup on running the start script for
 software / firmware out of the box is --

 1. Disable all ethernet interfaces
 2. Arm sync generator, wait 1 second for PPS
 3. Reset ethernet interfaces
 4. Enable interfaces.

 These four steps seem like they should be safe, yet the behaviour
 you're describing sounds like the design is midway sending a packet,
 then gets a sync, gives up sending an end-of-frame and starts sending
 a new packet, at which point the old packet + the new packet =
 overflow.

 Knowing that the design works for paper, my wondering is whether after
 arming the sync generator syncs are flowing through the design before
 the ethernet interface is enabled. Do you have a PPS-like input? the
 fengine initialisation script seems to wait for a second after arming,
 but if your sync input is something significantly slower, you could
 have problems.

 I'm sceptical about this theory (I think the symptoms would be lots of
 OK packets when you brought up the interface, and then it dying when
 the sync arrives, rather than a single good packet like you're
 seeing), but if the firmware + software really is the same as that
 working with paper, and the wiki hasn't just got out of sync with the
 paper devs, perhaps the problem is in your hardware setup

 Cheers,
 Jack

 On 27 October 2014 16:38, Richard Black aeldstes...@gmail.com wrote:
  By enable port, I assume you mean the valid port. I've been looking at
  the PAPER model carefully for some time now, and that is how it operates.
  It has a gated valid signal with a software register on each 10-GbE core.
 
  Once again, this is not our model. This is one made available on the
  CASPER wiki and run without modifications.
 
  Richard Black
 
  On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley jman...@ska.ac.za
 wrote:
 
  I suspect the 10GbE core's input FIFO is overflowing on startup. One key
  thing with this core is to ensure that your design keeps the enable port
  held low until the core's been configured. The core becomes unusable once
  the TX FIFO overflows. This has been a long-standing bug (my emails trace
  back to 2009) but it's so easy to work around that I don't think anyone's
  bothered looking into fixing it.
 
  Jason Manley
  CBF Manager
  SKA-SA
 
  Cell: +27 82 662 7726
  Work: +27 21 506 7300
 
  On 27 Oct 2014, at 18:25, Richard Black aeldstes...@gmail.com wrote:
 
   Jason,
  
    Thanks for your comments. While I agree that changing the ADC frequency
    mid-operation is non-kosher and could result in uncertain behavior, the
    issue at hand for us is to figure out what is going on with the PAPER
    model that has been published on the CASPER wiki. This naturally won't
    be (and shouldn't be) the end-all solution to this problem.
   
    This is a reportedly fully-functional model that shouldn't require any
    major changes in order to operate. However, this has clearly not been
    the case in at least two independent situations (us and Peter). This
    begs the question: what's so different about our use of PAPER?
   
    We, at BYU, have made painstakingly sure that our IP addressing schemes,
    switch ports, and scripts are all configured correctly (thanks to David
    MacMahon for that, btw), but we still have hit the proverbial brick wall
    of 10-GbE overflow.  When I last corresponded with David, he explained
    that he remembers having a similar issue before, but can't recall
    exactly what the problem was.
   
    In any case, the fact that turning down the ADC clock prior to start-up
    prevents the 10-GbE core from overflowing is a major lead for us at BYU
    (we've been spinning our wheels on this issue for several months now).
    By no means are we proposing mid-run ADC clock modifications, but this
    appears to be a very subtle (and quite sinister, in my opinion) bug.
   
    Any thoughts as to what might be going on?
   
    Richard Black
   
    On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley jman...@ska.ac.za
    wrote:
   Just a note that I don't recommend you adjust FPGA clock frequencies
   while it's 

Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jack Hickish
Hi Richard,

That's my theory, though I doubt it's right. But as you say, an easy
test is just to delay after issuing a sync for a couple more seconds
and see if that helps. But if your PPS is a real PPS (rather than just
a square wave at some vague 1s period) then I can't see what
difference this would make.
When that doesn't help, my inclination would be to start prodding the
10gbe control signals from software to make sure the reset / sw
enables are working / see if a tge reset without a new sync behaves
differently. But I can't imagine how that would be broken unless the
stuff on github is out of date (which I doubt).
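Jack's suggestion of prodding the control signals from software might look like this (a sketch assuming a katcp-style client exposing `write_int`, such as `corr.katcp_wrapper.FpgaClient`; the register names `eth_0_rst` and `eth_0_enable` are placeholders — check your design's actual register names with `listdev`). The `FakeFpga` stand-in lets the sequence be exercised offline:

```python
def cycle_tge_core(fpga, prefix='eth_0'):
    """Pulse a 10-GbE core's reset with tx data disabled, then re-enable.

    `fpga` is any object exposing write_int(name, value), e.g. a
    corr.katcp_wrapper.FpgaClient. Register names are placeholders.
    """
    fpga.write_int(prefix + '_enable', 0)  # stop feeding the core first
    fpga.write_int(prefix + '_rst', 1)     # assert reset
    fpga.write_int(prefix + '_rst', 0)     # release reset
    fpga.write_int(prefix + '_enable', 1)  # only now allow tx data again

class FakeFpga:
    """Stand-in that records register writes, for trying the sequence offline."""
    def __init__(self):
        self.writes = []
    def write_int(self, name, value):
        self.writes.append((name, value))

fpga = FakeFpga()
cycle_tge_core(fpga)
print(fpga.writes)
```

The point of the ordering is the same one made elsewhere in the thread: the valid/enable path must be quiescent before the core goes into (and comes out of) reset.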

Jack

On 27 October 2014 17:28, Richard Black aeldstes...@gmail.com wrote:
 Jack,

 I appreciate your help. I tend to agree that the issue is likely a hardware
 configuration problem, but we have been trying to match it as closely as
 possible.

 We do feed a 1-PPS signal into the board, but I'm hazy on the details of the
 other pulse parameters. I'll look into that as well.

 So, if I understand you correctly, you believe that the sync pulse is
 reaching the ethernet interfaces after the cores are enabled? If that is the
 case, couldn't we delay enabling the 10-GbE cores for another second to fix
 it? This might be a quick way to test that theory, but please correct me if
 I've misunderstood.

 Richard Black


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread David MacMahon
Hi, Richard,

On Oct 27, 2014, at 9:25 AM, Richard Black wrote:

 This is a reportedly fully-functional model that shouldn't require any major 
 changes in order to operate. However, this has clearly not been the case in 
 at least two independent situations (us and Peter). This begs the question: 
 what's so different about our use of PAPER?

I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is the one 
being used by the PAPER correlator currently fielded in South Africa.  It is 
definitely a fully functional model.  That image (and all source files for it) 
is available from the git repo listed on the PAPER Correlator Manifest page of 
the CASPER Wiki:

https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest

 We, at BYU, have made painstakingly sure that our IP addressing schemes, 
 switch ports, and scripts are all configured correctly (thanks to David 
 MacMahon for that, btw), but we still have hit the proverbial brick wall of 
 10-GbE overflow.  When I last corresponded with David, he explained that he 
 remembers having a similar issue before, but can't recall exactly what the 
 problem was.

Really?  I recall saying that I often forget about increasing the MTU of the 10 
GbE switch and NICs.  I don't recall saying that I had a similar issue before 
but couldn't remember the problem.
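For reference, the MTU fix Dave alludes to is a configuration change on every hop of the 10 GbE path (the interface name `eth2` and the value 9000 below are examples only — substitute your own NIC and your switch's jumbo-frame limit):

```shell
# Jumbo frames must be enabled end to end: every receive NIC *and* the switch.
ip link set dev eth2 mtu 9000                  # on each receive host (needs root)
ip link show dev eth2 | grep -o 'mtu [0-9]*'   # verify the new MTU took effect
# The switch's jumbo-frame setting is vendor-specific (often per-port).
```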

 In any case, the fact that by turning down the ADC clock prior to start-up 
 prevents the 10-GbE core from overflowing is a major lead for us at BYU 
 (we've been spinning our wheels on this issue for several months now). By no 
 means are we proposing mid-run ADC clock modifications, but this appears to 
 be a very subtle (and quite sinister, in my opinion) bug.
 
 Any thoughts as to what might be going on?

I cannot explain the 10 GbE overflow that you and Peter are experiencing.  I 
have pushed some updates to the rb-papergpu.git repository listed on the PAPER 
Correlator Manifest page.  The paper_feng_init.rb script now verifies that the 
ADC clocks are locked and provides options for issuing a software sync (only 
recommended for lab use) and for not storing the time of synchronization in 
redis (also only recommended for lab use).

The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1) 
while they are held in reset.  Since you are using the paper_feng_init.rb 
script, this should not be happening (unless something has gone wrong during 
the running of that script) because that script specifically and explicitly 
disables the tx_valid signal before putting the cores into reset and it takes 
the cores out of reset before enabling the tx_valid signal.  So assuming that 
this is not the cause of the overflows, there must be something else that is 
causing the 10 GbE cores to be unable to transmit data fast enough to keep up 
with the data stream it is being fed.  Two things that could cause this are 1) 
running the design faster than the 200 MHz sample clock that it was built for 
and/or 2) some link issue that prevents the core from sending data.  
Unfortunately, I think both of those ideas are also pretty far-fetched given 
all you've done to try to get the system working.  I wonder whether there is 
some difference in the ROACH2 firmware (u-boot version or CPLD programming) or 
PPC Linux setup or tcpborphserver revision or ???.
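Dave's cause (1) is a simple rate argument: the core drains at a fixed line rate, so any sustained offered load above that must eventually overflow the TX FIFO, and the offered load scales linearly with the FPGA clock. A back-of-envelope check (the 64-bit interface width and duty-cycle figure here are illustrative, not taken from the PAPER design):

```python
def demand_gbps(fclk_hz, bytes_per_clk, duty):
    """Average payload rate (Gb/s) offered to a 10-GbE core."""
    return fclk_hz * bytes_per_clk * duty * 8 / 1e9

# Illustrative figures only: 64-bit (8-byte) interface, ~75% of clocks
# carrying valid data.
print(demand_gbps(200e6, 8, 0.75))   # 9.6   -> fits under the 10 Gb/s line rate
print(demand_gbps(220e6, 8, 0.75))   # 10.56 -> sustained overflow
```

With numbers like these, even a 10% overclock (or an ADC clock other than the one the design was built for) tips the core from just-fitting into guaranteed overflow, which is consistent with the observation that turning the ADC clock down makes the overflows go away.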

Have you tried using adc16_dump_chans.rb to dump snapshots of the ADC data to 
make sure that it looks OK?

Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
David,

We'll take another close look at what model we are actually using, just to
be safe.

I went back and looked at our e-mails, and sure enough, you're right. You
were referring to the MTU issue as being the problem you tend to suppress
all memory of. It was just that you stated it in a separate paragraph, so,
out-of-context, I extrapolated that you have had the same problem before.
My bad for dragging your good name through the mud. :)

We will also update our local repositories, in the event some bizarre race
condition exists on our end.

I didn't know that the buffer could fill up while reset was asserted. We'll
definitely have to check up on that too.

We haven't tried dumping raw ADC data yet since we have been trying to get
the data link working first. After that, we were planning to inject signal
and examine outputs.

Thanks,

Richard Black





Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread David MacMahon
Hi, Richard and Peter,

Another possibility that crossed my mind is perhaps your ROACH2s were from the 
batch where the incorrect oscillator was installed for U72.  This seems 
unlikely for Richard based on this email (which also describes the incorrect 
oscillator problem in general):

https://www.mail-archive.com/casper@lists.berkeley.edu/msg04909.html

Maybe it's worth a double check anyway?

Dave
