Re: [casper] Problem about the adc frequency in PAPER model.
On Wed, Nov 19, 2014 at 6:50 AM, Peter Niu peterniu...@163.com wrote:

> Hi Dave,
>
> Sorry for the late reply. The trouble I encountered in netboot turned out to be that the uImage I was using had changed. As a test, I downloaded the latest uImage from https://github.com/ska-sa/roach2_nfs_uboot/tree/master/boot and used uImage-roach2-3.16-hwmon (https://github.com/ska-sa/roach2_nfs_uboot/blob/master/boot/uImage-roach2-3.16-hwmon) as the netboot uImage. The file looks like this:
>
> [peter@roachserver ~]$ file -L /srv/roach_boot/boot/uImage
> /srv/roach_boot/boot/uImage: u-boot legacy uImage, Linux-3.16.0-saska-03675-g1c70f, Linux/PowerPC, OS Kernel Image (gzip), 3034204 bytes, Tue Aug 26 14:54:14 2014, Load Address: 0x0070, Entry Point: 0x007010C4, Header CRC: 0x66EDCF88, Data CRC: 0x42A230BA
>
> I changed the uImage to uImage-r2borph3 (https://github.com/ska-sa/roach2_nfs_uboot/blob/master/boot/uImage-r2borph3).

There should be an even newer uImage (ie linux kernel) and romfs (ie flash filesystem, containing tcpborphserver3) at that location. I think the most notable change is that we have changed the kernel memory model, so that the full 128 MB fpga address space is visible in one go. There are probably some other fixes and changes too - the commit logs in katcp_devel should have some information.

Things are rather busy here, so apologies for not updating the NFS filesystem - we currently don't use it, so it is likely to remain out of date, though Dave (I think?) maintains a more recent version.

regards

marc
Re: [casper] Problem about the adc frequency in PAPER model.
On Wed, Nov 19, 2014 at 8:37 AM, Marc Welz m...@ska.ac.za wrote:

> There should be an even newer uImage (ie linux kernel) and romfs (ie flash filesystem, containing tcpborphserver3) at that location. I think the most notable change is that we have changed the kernel memory model, so that the full 128 MB fpga address space is visible in one go.

... meaning that you would need to update both the kernel and tcpborphserver3 to the revisions checked in a week ago or so, to map the full address space - just updating one will not be sufficient.

regards

marc
Re: [casper] Problem about the adc frequency in PAPER model.
> Hello,
>
> I found an updated roach2-root-fullmap-2014-08-12.romfs. Could you please tell me what I should do to make it work? Should I put this file in the same place as tcpborphserver3 in the ROACH2 file system (/usr/local/sbin)? Thanks for your answer - I am totally new to this. :)
>
> Peter

If you are not solobooting, then on a linux pc somewhere

# mkdir -p /mnt/tmp
# mount -o loop roach2-root-fullmap-2014-08-12.romfs /mnt/tmp

... now copy out /mnt/tmp/sbin/tcpborphserver3 to where you need it

regards

marc
Re: [casper] Problem about the adc frequency in PAPER model.
On Thu, Nov 13, 2014 at 5:49 AM, Richard Black aeldstes...@gmail.com wrote:

> Wow. Well that seemed to be the magic bullet. Thanks! Any ideas why this works? Is it because of an NFS lock-out or a 10-GbE driver issue in the NFS kernel image?

So I don't know. It could also be a version difference? The things to look at are the kernel and tcpborphserver (the former is a file in its own right; the latter can be gotten by mounting a romfs image via loopback and copying out /sbin/tcpborphserver3).

We also have had interesting cases where the fpga doesn't quite do what the bus controller on the power pc expects to happen - in those cases random perturbations change the behaviour, although pathological cases can have the fpga contend with flash accesses, which then corrupts things.

Also look in https://github.com/ska-sa/roach2_nfs_uboot, particularly the boot directory - occasionally prebuilt images get uploaded there, though for the change information you will have to read the ska-sa/katcp_devel commits.

Final, unrelated, tip: it is fine to have another (interactive) telnet connection to port 7147 on the roach while your scripts are doing things - this connection can be used to see failures or problems, and for detailed debugging messages, try typing ?log-level trace - just be mindful of the performance impact. There is a tool (kcplog) which can be built for a remote machine to automate this.

regards

marc
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Richard,

I'm glad this fixed your problem as well! This is definitely one for the wiki!!!

Dave

On Nov 12, 2014, at 2:34 PM, Richard Black wrote:

Wow. Well that seemed to be the magic bullet. Thanks! Any ideas why this works? Is it because of an NFS lock-out or a 10-GbE driver issue in the NFS kernel image? In any case, this is a tremendous discovery! Thanks to all for all the effort!

Richard

On Wednesday, November 12, 2014, 牛晨辉 peterniu...@163.com wrote:

Hi All,

I'm happy to tell you that the PAPER model can finally run without overflow! I found that the bof file, whether the PAPER model or my own, could run at 200 MHz with the correct packet structure; it is the system setup on the roach that matters (thanks to Marc's help with soloboot!). I tried soloboot on the roach, and it works fine for the model. I don't know why the netboot setup is not OK (it influenced the frequency too much, I guess). However, FWIW, the overflow problem that accompanied me for a few weeks is finally solved! I can have a good sleep tonight. Thanks for your warm help!

Peter

At 2014-11-08 03:10:47, David MacMahon dav...@astro.berkeley.edu wrote:

Hi, Richard,

I think that your 1 PPS should be very usable. I think we typically generate the 1 PPS from a GPS clock. If you want to try a test, you could disconnect the 1 PPS and use the software generated sync signal as per the earlier emails. If that works and using the external 1 PPS doesn't, then you will have found the problem. I'd be surprised (but happy!) if that turns out to be the problem.

Dave

On Nov 7, 2014, at 10:55 AM, Richard Black wrote:

Thanks David and all,

I unfortunately misspoke when it came to the power in the ADC clock signal. In fact, we had it at 9 dBm, not -9. Sorry for any confusion.

I set up the pulse generator to swing from +0.0 to +3.0 V at 1 us. To check on possible ringing, I also hooked up our pulse generator to an oscilloscope (I increased the pulse width to 10 ms, so I could see it). The waveform I observe has some severe overshoot both on the uptake and down. I've attached a drawing to explain what I mean. I can't seem to mitigate this overshoot with our little Agilent arbitrary waveform generator. Is this similar to the ringing seen at NRAO? If so, how is the 1 PPS generated by casperites?

Thanks,

Richard Black

On Fri, Nov 7, 2014 at 11:29 AM, David MacMahon dav...@astro.berkeley.edu wrote:

Hi, Richard,

On Nov 7, 2014, at 9:03 AM, Richard Black wrote:

Haven't heard anything for a while, so I thought I would add some more detail about our system setup to see if it might shed some light on the problem:

1 PPS Signal - Square pulse
  Frequency: 1 Hz
  Amplitude: 3 Vpp
  Offset: 0 V
  Width: 10 ms
  Edge Time: 5 ns

That should be fine assuming the 3 Vpp is measured with the 50 ohm termination in place. If you want to try a software sync, you can pass -S (UPPERcase!) to the latest paper_feng_init.rb script. Check the output of paper_feng_init.rb --help to see whether your version supports that option.

ADC Clock - CW Tone
  Frequency: 200 MHz
  Power: -9 dBm

It would be a good idea to increase the power level to +6 dBm as described on this wiki page: https://casper.berkeley.edu/wiki/ADC16x250-8_coax_rev_2#ADC16x250-8_coax_rev_2_Inputs

But if the paper_feng_init.rb script reports that the ADC clocks are locked and they measure approximately 200 MHz, then I think this is unlikely to be the cause of the 10 GbE overflow problems (though it would be great if the fix were this simple!).

For David, are there any red flags with our UBoot version or ROACH CPLD? Here they are again for reference:

From serial interface after ROACH reboot
==
U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
...
CPLD: 2.1
==

This matches one of our ROACH2s that is running and sending 10 GbE packets in our lab:

U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
CPU: AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
No Security/Kasumi support
Bootstrap Option C - Boot ROM Location EBC (16 bits)
32 kB I-Cache 32 kB D-Cache
Board: ROACH2
I2C: ready
DRAM: 512 MiB
Flash: 128 MiB
In: serial
Out: serial
Err: serial
CPLD: 2.1
USB: Host(int phy)
SN: ROACH2.2 batch=D#6#69
software fixups match
MAC: 02:44:01:02:06:45
DTT: 1 is 23 C
DTT: 2 is 26 C
Net: ppc_4xx_eth0

Hope this helps,

Dave

pulse_profile.png

-- Richard Black
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Marc,

On Nov 13, 2014, at 12:08 AM, Marc Welz wrote:

> On Thu, Nov 13, 2014 at 5:49 AM, Richard Black aeldstes...@gmail.com wrote:
>> Any ideas why this works? Is it because of an NFS lock-out or a 10-GbE driver issue in the NFS kernel image?

None of the control stuff goes over NFS, so I don't think that's likely to be the problem, but at this point (almost) nothing would surprise me.

> So I don't know. It could also be a version difference? The things to look at are the kernel and tcpborphserver (the former is a file in its own right, the latter can be gotten by mounting a romfs image via loopback and copying out /sbin/tcpborphserver3).

Are the drivers that provide the /dev/roach/mem and /dev/roach/config nodes compiled into the kernel image?

> We also have had interesting cases where the fpga doesn't quite do what the bus controller on the power pc expects to happen - in those cases random perturbations change the behaviour, although pathological cases can have the fpga contend with flash accesses which then corrupts things.
>
> Also look in https://github.com/ska-sa/roach2_nfs_uboot, particularly the boot directory - occasionally prebuilt images get uploaded there, though for the change information you will have to read the ska-sa/katcp_devel commits.
>
> Final, unrelated, tip: It is fine to have another (interactive) telnet connection to port 7147 on the roach while your scripts are doing things - this connection can be used to see failures or problems, and for detailed debugging messages, try typing ?log-level trace - just be mindful of the performance impact. There is a tool (kcplog) which can be built for a remote machine to automate this.

Thanks for the tips!

Dave
Re: [casper] Problem about the adc frequency in PAPER model.
On Thu, Nov 13, 2014 at 8:32 AM, David MacMahon dav...@astro.berkeley.edu wrote:

> Are the drivers that provide the /dev/roach/mem and /dev/roach/config nodes compiled into the kernel image?

Yes, the roach kernels have never used modules.

regards

marc
Re: [casper] Problem about the adc frequency in PAPER model.
Thanks, Marc,

On Nov 13, 2014, at 12:08 AM, Marc Welz wrote:

> Also look in https://github.com/ska-sa/roach2_nfs_uboot, particularly the boot directory - occasionally prebuilt images get uploaded there, though for the change information you will have to read the ska-sa/katcp_devel commits.

FWIW, we are using the boot/uImage-r2borph3 kernel image from commit a8da6b6 of that repository. The file command shows it as:

$ file -L /srv/tftpboot/uboot-roach2/uImage-r2borph3
/srv/tftpboot/uboot-roach2/uImage-r2borph3: u-boot legacy uImage, Linux-3.7.0-rc2+, Linux/PowerPC, OS Kernel Image (gzip), 2231485 bytes, Sun Nov 18 23:30:35 2012, Load Address: 0x0050, Entry Point: 0x005010D4, Header CRC: 0x9BDC0E32, Data CRC: 0xF3A1DC96

Interestingly, the (NOT used by PAPER) soloboot uImage kernel image in /dev/mtdblock0 on one of our ROACH2s deployed in South Africa is:

root@r2d020808:~# file -s /dev/mtdblock0
/dev/mtdblock0: u-boot legacy uImage, Linux-3.4.0-rc3+, Linux/PowerPC, OS Kernel Image (gzip), 2429134 bytes, Tue May 29 17:05:09 2012, Load Address: 0x0050, Entry Point: 0x00500460, Header CRC: 0xCAB17B63, Data CRC: 0x096FD3C7

...while the (NOT used by PAPER) soloboot uImage kernel image in /dev/mtdblock0 on two ROACH2s in our lab is:

root@r2d020813:~# file -s /dev/mtdblock0
/dev/mtdblock0: u-boot legacy uImage, Linux-3.9.0-rc1+, Linux/PowerPC, OS Kernel Image (gzip), 2345540 bytes, Wed Mar 6 02:54:34 2013, Load Address: 0x0050, Entry Point: 0x005010D4, Header CRC: 0xC0B47AFF, Data CRC: 0x9247592F

root@r2d020669:~# file -s /dev/mtdblock0
/dev/mtdblock0: u-boot legacy uImage, Linux-3.9.0-rc1+, Linux/PowerPC, OS Kernel Image (gzip), 2345540 bytes, Wed Mar 6 02:54:34 2013, Load Address: 0x0050, Entry Point: 0x005010D4, Header CRC: 0xC0B47AFF, Data CRC: 0x9247592F

These two ROACH2s were repaired by Digicom (813 for the U72 fix and 669 for vehicular stress). It looks like Digicom is populating the ROACH2 soloboot with a new uImage that is not available in the roach2_nfs_uboot repo. Are different kernels required for netboot vs soloboot, or is this just an oversight?

Richard and/or Peter, I'm curious to know what versions of uImage you have for both your netboot environment and in /dev/mtdblock0 on your ROACH2s. Can you please run the above file commands on your uImages and report back with the results? This will hopefully help us zero in on where the problem is (and where/when it was corrected).

Thanks,

Dave
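[Editor's illustration] The version, build date, size, and CRC fields that `file` reports above all come from the 64-byte legacy uImage header at the start of the image, so they can also be read without the `file` utility. Below is a minimal, hedged parser sketch in Python. The packed header here is synthetic, built only to show the layout; the load address 0x00500000 and the os/arch/type/comp byte values are assumptions for illustration, not taken from a real image dump.

```python
import struct
from datetime import datetime, timezone

# Legacy uImage header: 64 bytes, big-endian (per U-Boot's image.h):
# magic, header CRC, build time, data size, load addr, entry point,
# data CRC, then os/arch/type/comp bytes and a 32-byte image name.
UIMAGE_FMT = ">7I4B32s"
UIMAGE_MAGIC = 0x27051956

def parse_uimage_header(hdr):
    fields = struct.unpack(UIMAGE_FMT, hdr[:64])
    magic, hcrc, build_time, size, load, ep, dcrc = fields[:7]
    assert magic == UIMAGE_MAGIC, "not a legacy uImage"
    return {
        "built": datetime.fromtimestamp(build_time, tz=timezone.utc),
        "size": size,
        "load_addr": load,
        "entry_point": ep,
        "data_crc": dcrc,
    }

# Synthetic header using the build time Dave's uImage reports
# (Mon Nov 19 07:30:35 2012 UTC); other values are illustrative only.
ts = int(datetime(2012, 11, 19, 7, 30, 35, tzinfo=timezone.utc).timestamp())
hdr = struct.pack(UIMAGE_FMT, UIMAGE_MAGIC, 0, ts, 2231485,
                  0x00500000, 0x005010D4, 0xF3A1DC96,
                  5, 7, 2, 1, b"Linux-3.7.0-rc2+".ljust(32, b"\0"))
info = parse_uimage_header(hdr)
print(info["built"].strftime("%a %b %d %H:%M:%S %Y"))  # Mon Nov 19 07:30:35 2012
```

On a ROACH running busybox (where `file` may be absent), the same 64 bytes could be read straight from /dev/mtdblock0 and fed to a parser like this.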
Re: [casper] Problem about the adc frequency in PAPER model.
Hi Dave,

Though I am quite new to the uImage system, I did suspect that the uImage could be causing this problem. Do you remember the roach of mine that does not work normally in netboot, which I mentioned previously? It works fine with soloboot. Interestingly, another of my roaches worked fine with netboot before but now does not work in soloboot! (Using telnet, /boffiles could not be found in the soloboot roach linux, while it could on the others.)

I checked the uImage on the does-not-work-in-soloboot roach. Since I am using soloboot now, the file command is not available in busybox:

~ # file
-sh: file: not found

but the messages during the soloboot boot process report the uImage like this:

Image Name: Linux-3.4.0-rc3+
Image Type: PowerPC Linux Kernel Image (gzip compressed)
Data Size: 2429134 Bytes = 2.3 MiB
Load Address: 0050
Entry Point: 00500460
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK

With the same roach running netboot, logged in as root over ssh, the information looks like this:

root@pf1:~# file -s /dev/mtdblock0
/dev/mtdblock0: u-boot legacy uImage, Linux-3.4.0-rc3+, Linux/PowerPC, OS Kernel Image (gzip), 2429134 bytes, Tue May 29 15:05:09 2012, Load Address: 0x00507

For netboot, we also use uImage-r2borph3. On our PC, the information looks like this:

[peter@roachserver ~]$ file -L /srv/roach_boot/boot/uImage
/srv/roach_boot/boot/uImage: u-boot legacy uImage, Linux-3.7.0-rc2+, Linux/PowerPC, OS Kernel Image (gzip), 2231485 bytes, Mon Nov 19 15:30:35 2012, Load Address: 0x0050, Entry Point: 0x005010D4, Header CRC: 0x9BDC0E32, Data CRC: 0xF3A1DC96

(I am not sure why the date is not the same as yours: Mon Nov 19 15:30:35 2012.)

For more information, the other roaches work in both soloboot and netboot. Their soloboot information during the boot process:

## Booting kernel from Legacy Image at f800 ...
Image Name: Linux-3.9.0-rc1+
Image Type: PowerPC Linux Kernel Image (gzip compressed)
Data Size: 2345540 Bytes = 2.2 MiB
Load Address: 0050
Entry Point: 005010d4
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK

Since the does-not-work-in-soloboot roach reports the image name Linux-3.4.0-rc3+ in soloboot, I am not sure whether it is the Linux version in soloboot that matters. Jason once mentioned a similar question to me; he also sent me the latest binary romfs for soloboot: https://www.mail-archive.com/casper%40lists.berkeley.edu/msg05393.html

Hope this information is helpful to our question! Thanks for your warm help with the PAPER model!

Peter

PS: I also found a new version on https://github.com/ska-sa/roach2_nfs_uboot uploaded on Nov 12, 2014. I will try it later.

At 2014-11-14 08:40:35, David MacMahon dav...@astro.berkeley.edu wrote:

> Thanks, Marc,
> [...]
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Peter,

Thanks for this information!

On Nov 13, 2014, at 7:22 PM, Peter Niu wrote:

> In our PC, the information like this:
>
> [peter@roachserver ~]$ file -L /srv/roach_boot/boot/uImage
> /srv/roach_boot/boot/uImage: u-boot legacy uImage, Linux-3.7.0-rc2+, Linux/PowerPC, OS Kernel Image (gzip), 2231485 bytes, Mon Nov 19 15:30:35 2012, Load Address: 0x0050, Entry Point: 0x005010D4, Header CRC: 0x9BDC0E32, Data CRC: 0xF3A1DC96
>
> (I am not sure why the date is not the same as yours: Mon Nov 19 15:30:35 2012)

At 2014-11-14 08:40:35, David MacMahon dav...@astro.berkeley.edu wrote:

> $ file -L /srv/tftpboot/uboot-roach2/uImage-r2borph3
> /srv/tftpboot/uboot-roach2/uImage-r2borph3: u-boot legacy uImage, Linux-3.7.0-rc2+, Linux/PowerPC, OS Kernel Image (gzip), 2231485 bytes, Sun Nov 18 23:30:35 2012, Load Address: 0x0050, Entry Point: 0x005010D4, Header CRC: 0x9BDC0E32, Data CRC: 0xF3A1DC96

These are the same uImage. The length, header CRC, and data CRC match. The timestamps differ by 16 hours, but I think that's because the timestamp is printed in the local timezone. If you do:

env TZ=UTC file -L /srv/roach_boot/boot/uImage

...you will get a timestamp of Mon Nov 19 07:30:35 2012.

This means that the uImage file is NOT the cause of the problem, since the same version works for us but not for you. I think this might leave only the tcpborphserver version as the cause of the problem. Could it be anything else?

Can you please run:

telnet pf1 7147

(Type CTRL-] then q then ENTER to quit.) against both the soloboot and netboot environments and let me know the results?

Thanks again,

Dave
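[Editor's illustration] Dave's 16-hour explanation can be reproduced outside of `file`: the uImage header stores a single UTC build timestamp, and only the local rendering differs between sites. A small Python check; the two fixed offsets below are assumptions standing in for a US Pacific (UTC-8, winter) and a China Standard Time (UTC+8) workstation, which is exactly a 16-hour spread.

```python
from datetime import datetime, timezone, timedelta

# One UTC build timestamp, as stored in the uImage header.
utc = datetime(2012, 11, 19, 7, 30, 35, tzinfo=timezone.utc)

pst = timezone(timedelta(hours=-8))  # assumed: US Pacific standard time
cst = timezone(timedelta(hours=8))   # assumed: China Standard Time

# The same instant rendered in each local zone, 16 hours apart:
print(utc.astimezone(pst).strftime("%a %b %d %H:%M:%S %Y"))  # Sun Nov 18 23:30:35 2012
print(utc.astimezone(cst).strftime("%a %b %d %H:%M:%S %Y"))  # Mon Nov 19 15:30:35 2012
```

These match the two `file` outputs in the thread, which is why the length and CRCs, not the printed date, are the fields to compare.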
Re: [casper] Problem about the adc frequency in PAPER model.
Hi All,

I'm happy to tell you that the PAPER model can finally run without overflow! I found that the bof file, whether the PAPER model or my own, could run at 200 MHz with the correct packet structure; it is the system setup on the roach that matters (thanks to Marc's help with soloboot!). I tried soloboot on the roach, and it works fine for the model. I don't know why the netboot setup is not OK (it influenced the frequency too much, I guess). However, FWIW, the overflow problem that accompanied me for a few weeks is finally solved! I can have a good sleep tonight. Thanks for your warm help!

Peter

At 2014-11-08 03:10:47, David MacMahon dav...@astro.berkeley.edu wrote:

> Hi, Richard,
>
> I think that your 1 PPS should be very usable. I think we typically generate the 1 PPS from a GPS clock. If you want to try a test, you could disconnect the 1 PPS and use the software generated sync signal as per the earlier emails. If that works and using the external 1 PPS doesn't, then you will have found the problem. I'd be surprised (but happy!) if that turns out to be the problem.
>
> Dave
> [...]
Re: [casper] Problem about the adc frequency in PAPER model.
Wow. Well that seemed to be the magic bullet. Thanks! Any ideas why this works? Is it because of an NFS lock-out or a 10-GbE driver issue in the NFS kernel image? In any case, this is a tremendous discovery! Thanks to all for all the effort!

Richard

On Wednesday, November 12, 2014, 牛晨辉 peterniu...@163.com wrote:

> Hi All,
>
> I'm happy to tell you that the PAPER model can finally run without overflow! I found that the bof file, whether the PAPER model or my own, could run at 200 MHz with the correct packet structure; it is the system setup on the roach that matters (thanks to Marc's help with soloboot!). I tried soloboot on the roach, and it works fine for the model. I don't know why the netboot setup is not OK (it influenced the frequency too much, I guess). However, FWIW, the overflow problem that accompanied me for a few weeks is finally solved! I can have a good sleep tonight. Thanks for your warm help!
>
> Peter
>
> At 2014-11-08 03:10:47, David MacMahon dav...@astro.berkeley.edu wrote:
> [...]

-- Richard Black
Re: [casper] Problem about the adc frequency in PAPER model.
Hi all,

Haven't heard anything for a while, so I thought I would add some more detail about our system setup to see if it might shed some light on the problem:

1 PPS Signal - Square pulse
  Frequency: 1 Hz
  Amplitude: 3 Vpp
  Offset: 0 V
  Width: 10 ms
  Edge Time: 5 ns

ADC Clock - CW Tone
  Frequency: 200 MHz
  Power: -9 dBm

For David, are there any red flags with our UBoot version or ROACH CPLD? Here they are again for reference:

From serial interface after ROACH reboot
==
U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
...
CPLD: 2.1
==

Thanks!

Richard Black

On Tue, Nov 4, 2014 at 12:05 PM, Richard Black aeldstes...@gmail.com wrote:

Hi David,

Comments below:

Richard Black

On Mon, Nov 3, 2014 at 3:51 PM, David MacMahon dav...@astro.berkeley.edu wrote:

Hi, Richard,

On Nov 3, 2014, at 11:47 AM, Richard Black wrote:

So, it's been a little while now, but not much has changed yet. We've gotten Chipscope working, and, so far, there aren't any red flags with the FPGA firmware 10-GbE control signals.

That's good to know, although maybe in some way it would have been nice if you had found some red flags.

We also confirmed that the bitstream we are using is in fact roach2_fengine_2013_Oct_14_1756.bof.gz, so that is unfortunately not the problem.

At least you are using a known good BOF file, so that eliminates a source of potential errors.

I also took a look at the ROACH2 PPC setup: we pulled from the .git repository on February 12, 2014 (commit number = e14df9016c3b7ccba62cc6d0cae05405f4929c94). There haven't been any changes to that repository since August 2013, so unless the SKA-SA ROACH-2s are using a pull from before then, I don't think that is our issue.

We use our own homegrown NFS root filesystem for the ROACH2s, so I can't comment on the status of the one you refer to (https://github.com/ska-sa/roach2_nfs_uboot.git). I am more interested in the U-Boot version you have (see https://github.com/ska-sa/roach2_uboot.git) and which version of the ROACH2 CPLD image you are using (not sure where to get this). I think these are unlikely to be problematic, but we've already checked all the likely problems.

When I rebooted the ROACH-2, I got the following header for U-Boot:

U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
...
CPLD: 2.1

Hope this is informative.

We also tried out Jason Manley's suggestion of delaying the enabling of the 10-GbE cores to ensure that the sync pulse propagated through the entire system before buffering up data, but the problem persisted.

Do you have an external 1 PPS sync pulse connected, or have you tried the latest rb-papergpu software that supports a software-generated sync? The paper_feng_init.rb script already disables the data flow to the 10 GbE cores until the sync pulse has propagated through and the cores have been taken out of reset.

We are using an external 1 PPS sync pulse. However, we are certain that it's set up correctly. Although, this could just be me grasping at straws since nothing else seems to solve the problem. How would we go about setting up the software-generated pulse?

Does the latest rb-papergpu code show that the ADC clocks (MMCMs) are locked? Does it estimate the clock frequency correctly? Does adc16_dump_chans.rb show samples that correspond correctly to the analog inputs (e.g. a CW tone)?

I've attached an image of the output from xtor_up.sh -f 1 with the latest rb-papergpu code. Nothing significant to note: the clock reads ~200 MHz. I've also attached an image of the output from adc16_dump_chans.rb, where A1 has a CW tone with a 10-MHz 40-V emf signal. You can see the oscillations in the first column and noise everywhere else.

Just to rule it out, I double-checked (or more accurately triple-checked) the U72 part, and, sure enough, it is the correct oscillator, model number EEG-2121.

Does it have the L suffix on the 100.000L frequency part of the chip markings?

Yes, it does.

On a related note, as I sent off-list to you and Peter earlier today: The fact that Peter can send small packets at 200 MHz without overflow, but large packets give overflow, is very interesting and puzzling. I assume that the smaller packets are just fewer channels of the same length spectrum and that the number of packets per second remains the same (I think we discussed this previously). In that case, the small packets reduce the data rate, which suggests that the 156.25 MHz xaui_ref_clk clock is maybe not really 156.25 MHz but something somewhat slower. This clock is driven by the oscillator at U56 and the clock splitter at U54 (see attached schematic snippet). Can you please inspect those parts on your board(s)? I will be able to inspect a ROACH2 this afternoon and report what I have on a known working system. On one
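[Editor's illustration] Dave's xaui_ref_clk reasoning can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes the usual 10 GbE arrangement of a 64-bit datapath clocked at the 156.25 MHz reference; the F-engine output rate used below is a made-up illustrative figure, not a measured PAPER number.

```python
# Nominal 10 GbE capacity from the XAUI reference clock: 64 bits per
# 156.25 MHz clock cycle gives exactly 10 Gb/s.
xaui_ref_hz = 156.25e6
capacity_bps = xaui_ref_hz * 64
assert capacity_bps == 10e9

# If the U56 oscillator ran, say, 1% slow, capacity drops by 100 Mb/s.
slow_capacity_bps = capacity_bps * 0.99

# Hypothetical steady F-engine output rate just under line rate:
# fine at nominal capacity, but overflowing on the slow clock.
feng_output_bps = 9.95e9
print(feng_output_bps <= capacity_bps)       # True  (no overflow)
print(feng_output_bps <= slow_capacity_bps)  # False (TX FIFO overflows)
```

This matches the observed symptom: shrinking the packets (and hence the data rate) would hide a slightly slow transmit clock, while full-size packets would overflow.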
Re: [casper] Problem about the adc frequency in PAPER model.
you mentioned your 1 PPS is a square wave. that's different from everyone else's 1 PPS: standard 1 PPS systems output a pulse that is high for about 1 uS. (extremely low duty cycle). i don't know if a square wave could be a problem - my guess is that the correlator design uses an edge detection block, so is only sensitive to edges, not levels, but it might be worth investigating. best wishes, dan On Fri, Nov 7, 2014 at 9:03 AM, Richard Black aeldstes...@gmail.com wrote: Hi all, Haven't heard anything for a while, so I thought I would add some more detail about our system setup to see if it might shed some light on the problem: 1 PPS Signal - Square pulse Frequency: 1 Hz Amplitude: 3 Vpp Offset: 0 V Width: 10 ms Edge Time: 5 ns ADC Clock - CW Tone Frequency: 200 MHz Power: -9 dBm For David, are there any red flags with our UBoot version or ROACH CPLD? Here they are again for reference: From serial interface after ROACH reboot == U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06) ... CPLD: 2.1 == Thanks! Richard Black On Tue, Nov 4, 2014 at 12:05 PM, Richard Black aeldstes...@gmail.com wrote: Hi David, Comments below: Richard Black On Mon, Nov 3, 2014 at 3:51 PM, David MacMahon dav...@astro.berkeley.edu wrote: Hi, Richard, On Nov 3, 2014, at 11:47 AM, Richard Black wrote: So, it's been a little while now, but not much has changed yet. We've gotten Chipscope working, and, so far, there aren't any red flags with the FPGA firmware 10-GbE control signals. That's good to know, although maybe in some way it would have been nice if you had found some red flags. We also confirmed that the bitstream we are using is in fact roach2_fengine_2013_Oct_14_1756.bof.gz, so that is unfortunately not the problem. At least you are using a known good BOF file, so that eliminates a source of potential errors. I also took a look at the ROACH2 PPC setup: we pulled from the .git repository on February 12, 2014 (commit number = e14df9016c3b7ccba62cc6d0cae05405f4929c94). 
There haven't been any changes to that repository since August 2013, so unless the SKA-SA ROACH-2s are using a pull from before then, I don't think that is our issue. We use our own homegrown NFS root filesystem for the ROACH2s, so I can't comment on the status of the one you refer to (https://github.com/ska-sa/roach2_nfs_uboot.git).
Re: [casper] Problem about the adc frequency in PAPER model.
Dan, We aren't using a square wave. It's a pulse function, but that pulse's shape can be easily described as a very thin square pulse. However, you are saying that the pulse is high for only 1 us? That is much shorter than what we are doing. I'll see if I can twiddle that down. Thanks, Richard Black
Re: [casper] Problem about the adc frequency in PAPER model.
Also, at least for many ADC boards that have a PPS input, the signal is connected to a 50 ohm resistor to ground and then goes into a TTL to LVDS converter chip. You mentioned 3 Vpp and 0 V offset, so that sounds like the signal is mostly at -1.5 V and then pulses up to +1.5 V. I would suggest a positive only waveform; 0 V pulsing up to 3 V would be better. Glenn
Re: [casper] Problem about the adc frequency in PAPER model.
seconding glenn, the 1 PPS pulse should be 0 to +3 volts when terminated in 50 ohms (when connected to the roach board). (that's 0 to 5 or 6 volts when not terminated). the 1 PPS pulse should not go negative. i suggest a pulse width of 1 uS (not 10 ms). best wishes, dan
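Dan's terminated vs. unterminated levels follow from a simple voltage divider: a generator with a 50-ohm output impedance driving the ROACH's 50-ohm PPS termination delivers half its open-circuit voltage. A minimal sketch (the function name is mine):

```python
def terminated_level_v(open_circuit_v, source_ohms=50.0, term_ohms=50.0):
    """Voltage across the termination for a source with the given output
    impedance (simple resistive divider)."""
    return open_circuit_v * term_ohms / (source_ohms + term_ohms)

# A 6 V open-circuit pulse lands at 3 V across a matched 50-ohm input,
# consistent with Dan's "0 to 5 or 6 volts when not terminated" figure.
print(terminated_level_v(6.0))  # 3.0
```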
Re: [casper] Problem about the adc frequency in PAPER model.
Hi Glenn, Richard, and all, First, do you think -9 dBm is a proper ADC clock level? I checked the manual on the CASPER website and it said +6 dBm; I suspected that was too big, so I used -1 dBm. Second, is it possible that the data received by Wireshark on the HPC is out of order? Is Wireshark's read order correct? When I use Wireshark, I receive packets where the header shows up in the middle of the packet, so I suspect the read order is not correct. Best Regards! peter -- Sent from NetEase Mail for Android
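To put the dBm figures being discussed in perspective, power into a 50-ohm load converts to the peak-to-peak amplitude of a sine wave as follows (a standard conversion, not something stated in the thread):

```python
import math

def dbm_to_vpp(dbm, load_ohms=50.0):
    """Peak-to-peak voltage of a sine wave at `dbm` into `load_ohms`."""
    watts = 10.0 ** (dbm / 10.0) / 1000.0     # dBm -> watts
    v_rms = math.sqrt(watts * load_ohms)      # P = Vrms^2 / R
    return 2.0 * math.sqrt(2.0) * v_rms       # Vpp = 2*sqrt(2)*Vrms

# +6 dBm is only ~1.26 Vpp into 50 ohms, while -9 dBm is ~0.22 Vpp,
# which illustrates how much weaker a -9 dBm clock drive is.
```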
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Richard, On Nov 7, 2014, at 9:03 AM, Richard Black wrote: Haven't heard anything for a while, so I thought I would add some more detail about our system setup to see if it might shed some light on the problem: 1 PPS Signal - Square pulse Frequency: 1 Hz Amplitude: 3 Vpp Offset: 0 V Width: 10 ms Edge Time: 5 ns That should be fine assuming the 3 Vpp is measured with the 50 ohm termination in place. If you want to try a software sync, you can pass -S (UPPERcase!) to the latest paper_feng_init.rb script. Check the output of paper_feng_init.rb --help to see whether your version supports that option. ADC Clock - CW Tone Frequency: 200 MHz Power: -9 dBm It would be a good idea to increase the power level to +6 dBm as described on this wiki page: https://casper.berkeley.edu/wiki/ADC16x250-8_coax_rev_2#ADC16x250-8_coax_rev_2_Inputs But if the paper_feng_init.rb script reports that the ADC clocks are locked and they measure approximately 200 MHz, then I think this is unlikely to be the cause of the 10 GbE overflow problems (though it would be great if the fix were this simple!). For David, are there any red flags with our UBoot version or ROACH CPLD? Here they are again for reference: From serial interface after ROACH reboot == U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06) ... CPLD: 2.1 == This matches one of our ROACH2s that is running and sending 10 GbE packets in our lab:

U-Boot 2011.06-rc2-0-g2694c9d-dirty (Dec 04 2013 - 20:58:06)
CPU: AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
No Security/Kasumi support
Bootstrap Option C - Boot ROM Location EBC (16 bits)
32 kB I-Cache 32 kB D-Cache
Board: ROACH2
I2C: ready
DRAM: 512 MiB
Flash: 128 MiB
In: serial
Out: serial
Err: serial
CPLD: 2.1
USB: Host(int phy)
SN:ROACH2.2 batch=D#6#69
software fixups match
MAC: 02:44:01:02:06:45
DTT: 1 is 23 C
DTT: 2 is 26 C
Net: ppc_4xx_eth0

Hope this helps, Dave
Re: [casper] Problem about the adc frequency in PAPER model.
Thanks David and all, I unfortunately misspoke about the power of the ADC clock signal. In fact, we had it at 9 dBm, not -9. Sorry for any confusion. I set up the pulse generator to swing from +0.0 to +3.0 V at 1 us. To check for possible ringing, I also hooked up our pulse generator to an oscilloscope (I increased the pulse width to 10 ms so I could see it). The waveform I observe has some severe overshoot on both the rising and falling edges. I've attached a drawing to explain what I mean. I can't seem to mitigate this overshoot with our little Agilent arbitrary waveform generator. Is this similar to the ringing seen at NRAO? If so, how do casperites generate their 1 PPS? Thanks, Richard Black
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Peter, Here is a tcpdump snapshot of the first part of a PAPER packet. The data from tcpdump includes the headers from the other network layers that encapsulate the application data. Here is the output:

$ sudo tcpdump -i eth4 -s 100 -xx -c 1 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth4, link-type EN10MB (Ethernet), capture size 100 bytes
20:36:04.678013 IP 10.10.4.1.8511 > 10.0.4.54.8511: UDP, length 8208
0x0000: ffff ffff ffff 0202 c0a8 0401 0800 4500
0x0010: 202c 0000 4000 ff11 3f80 0a0a 0401 0a00
0x0020: 0436 213f 213f 2018 0000 0006 e74d 2d6d
0x0030: 0510 c003 1f1f f200 eefe 0fed e2dd dbf0
0x0040: e00f e5c3 eef4 03e2 ff11 31ed 1011 1e3c
0x0050: 4ce5 f342 10bf 1ff9 1f2a 9f26 e334 4e60
0x0060: 1010 1ff2 ...

Here is a breakdown of what is there...

# Ethernet Header
Note the broadcast destination MAC (ff:ff:ff:ff:ff:ff) is used because this is a direct connection from ROACH2 to 10 GbE NIC (i.e. no switch).
0x0000: ffff ffff ffff 0202 c0a8 0401 0800

# IP Header
Note the source IP (10.10.4.1) and destination IP (10.0.4.54) in the last 8 octets.
0x0000: 4500
0x0010: 202c 0000 4000 ff11 3f80 0a0a 0401 0a00
0x0020: 0436

# UDP Header
PAPER uses port 8511 (0x213f) because US Letter Size paper is 8.5x11 inches. :-) The same port number is used for both source and destination ports. 0x2018 is the UDP packet length == UDP header length + application packet length. Here we have 8216 == 8 + 8208.
0x0020: 213f 213f 2018 0000

# PAPER Packet (finally!)
The first 6 bytes are MCOUNT (0x0006e74d2d6d). The next 1 byte is FID (5). The next 1 byte is XID (16). The next 8192 bytes (not all shown) are the data. The final 8 bytes (not shown) are 4 bytes CRC + 4 bytes of zeros. The CRC is of the PAPER header and data (mcount+fid+xid+data).
0x0020: 0006 e74d 2d6d
0x0030: 0510 c003 1f1f f200 eefe 0fed e2dd dbf0
0x0040: e00f e5c3 eef4 03e2 ff11 31ed 1011 1e3c
0x0050: 4ce5 f342 10bf 1ff9 1f2a 9f26 e334 4e60
0x0060: 1010 1ff2 ...

Hope this helps, Dave
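Dave's byte-level breakdown can be turned into a tiny parser for the PAPER application header (6-byte MCOUNT, 1-byte FID, 1-byte XID at the start of the UDP payload). A hedged sketch; the function name is mine:

```python
def parse_paper_header(payload: bytes):
    """Decode MCOUNT/FID/XID from the first 8 bytes of a PAPER UDP payload,
    following the byte layout in Dave's tcpdump walkthrough."""
    if len(payload) < 8:
        raise ValueError("payload too short for a PAPER header")
    mcount = int.from_bytes(payload[:6], "big")  # 6-byte big-endian counter
    fid = payload[6]                             # F-engine ID
    xid = payload[7]                             # X-engine ID
    return mcount, fid, xid

# The first payload bytes from the capture above:
mcount, fid, xid = parse_paper_header(bytes.fromhex("0006e74d2d6d0510"))
# mcount == 0x0006e74d2d6d, fid == 5, xid == 16
```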
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Richard, I think that your 1 PPS should be very usable. I think we typically generate the 1 PPS from a GPS clock. If you want to try a test, you could disconnect the 1 PPS and use the software generated sync signal as per the earlier emails. If that works and using the external 1 PPS doesn't, then you will have found the problem. I'd be surprised (but happy!) if that turns out to be the problem. Dave
Re: [casper] Problem about the adc frequency in PAPER model.
David, Well, unfortunately, using only the software-generated sync did not fix the packet overflow issue. :-( Richard Black On Fri, Nov 7, 2014 at 12:10 PM, David MacMahon dav...@astro.berkeley.edu wrote: [...]
Re: [casper] Problem about the adc frequency in PAPER model.
David, So, it's been a little while now, but not much has changed yet. We've gotten Chipscope working, and, so far, there aren't any red flags with the FPGA firmware 10-GbE control signals. We also confirmed that the bitstream we are using is in fact roach2_fengine_2013_Oct_14_1756.bof.gz, so that is unfortunately not the problem. I also took a look at the ROACH2 PPC setup: we pulled from the .git repository on February 12, 2014 (commit number = e14df9016c3b7ccba62cc6d0cae05405f4929c94). There haven't been any changes to that repository since August 2013, so unless the SKA-SA ROACH-2s are using a pull from before then, I don't think that is our issue. We also tried out Jason Manley's suggestion of delaying the enabling of the 10-GbE cores to ensure that the sync pulse propagated through the entire system before buffering up data, but the problem persisted. Just to rule it out, I double-checked (or more accurately triple-checked) the U72 part, and, sure enough, it is the correct oscillator, model number EEG-2121. There is another possibility, albeit an unlikely problem: we currently have the ROACH-2 board booting off another PC (i.e. not the same PC that the ruby control scripts are running on). I can't imagine that this is the problem, but I'm planning on trying to consolidate the NFS and ruby scripts onto a single PC to rule it out. So I suppose at this point, my questions are: (1) What version of the roach2_nfs_uboot .git repository are SKA-SA using? (2) Is SKA-SA using the same PCs for ROACH-2 net boots and file systems as the ruby control scripts? (3) Are there any additional steps that need to be taken when installing the Quad SFP+ mezzanine cards onto the ROACH-2 board? Are there potentially some drivers or configuration steps that are needed to make sure they function properly? As I recall, when we got the boards, we didn't do anything special with the cards outside of simply plugging them in. Again, thanks for your patient advice and suggestions. 
Richard Black On Mon, Oct 27, 2014 at 2:26 PM, David MacMahon dav...@astro.berkeley.edu wrote: Hi, Richard, On Oct 27, 2014, at 9:25 AM, Richard Black wrote: This is a reportedly fully-functional model that shouldn't require any major changes in order to operate. However, this has clearly not been the case in at least two independent situations (us and Peter). This begs the question: what's so different about our use of PAPER? I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is the one being used by the PAPER correlator currently fielded in South Africa. It is definitely a fully functional model. That image (and all source files for it) is available from the git repo listed on the PAPER Correlator Manifest page of the CASPER Wiki: https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest We, at BYU, have made painstakingly sure that our IP addressing schemes, switch ports, and scripts are all configured correctly (thanks to David MacMahon for that, btw), but we still have hit the proverbial brick wall of 10-GbE overflow. When I last corresponded with David, he explained that he remembers having a similar issue before, but can't recall exactly what the problem was. Really? I recall saying that I often forget about increasing the MTU of the 10 GbE switch and NICs. I don't recall saying that I had a similar issue before but couldn't remember the problem. In any case, the fact that by turning down the ADC clock prior to start-up prevents the 10-GbE core from overflowing is a major lead for us at BYU (we've been spinning our wheels on this issue for several months now). By no means are we proposing mid-run ADC clock modifications, but this appears to be a very subtle (and quite sinister, in my opinion) bug. Any thoughts as to what might be going on? I cannot explain the 10 GbE overflow that you and Peter are experiencing. I have pushed some updates to the rb-papergpu.git repository listed on the PAPER Correlator Manifest page. 
The paper_feng_init.rb script now verifies that the ADC clocks are locked and provides options for issuing a software sync (only recommended for lab use) and for not storing the time of synchronization in redis (also only recommended for lab use). The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1) while they are held in reset. Since you are using the paper_feng_init.rb script, this should not be happening (unless something has gone wrong during the running of that script) because that script specifically and explicitly disables the tx_valid signal before putting the cores into reset and it takes the cores out of reset before enabling the tx_valid signal. So assuming that this is not the cause of the overflows, there must be something else that is causing the 10 GbE cores to be unable to transmit data fast enough to keep up with the data stream it is being fed. Two things that could cause this are 1) running the
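The ordering constraint David describes (tx_valid deasserted before the cores go into reset, reset released before tx_valid is re-enabled) can be illustrated with a toy model. This is purely illustrative: the class and methods below are hypothetical and do not correspond to any real KATCP or gateware API.

```ruby
# Toy model of a 10 GbE TX core: presenting valid data while the core
# is held in reset corrupts its FIFO (mimicking the reported failure mode).
class ToyTxCore
  attr_reader :overflowed

  def initialize
    @in_reset   = true
    @overflowed = false
  end

  def release_reset
    @in_reset = false
  end

  # One fabric clock cycle; tx_valid high during reset is the hazard
  # that paper_feng_init.rb explicitly avoids.
  def clock(tx_valid)
    @overflowed = true if tx_valid && @in_reset
  end
end

# Wrong order: data enabled before the core leaves reset.
bad = ToyTxCore.new
bad.clock(true)
bad.release_reset

# Script order: tx_valid held low through reset, enabled only afterwards.
good = ToyTxCore.new
good.clock(false)
good.release_reset
good.clock(true)

puts bad.overflowed    # true
puts good.overflowed   # false
```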
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Richard, On Nov 3, 2014, at 11:47 AM, Richard Black wrote: So, it's been a little while now, but not much has changed yet. We've gotten Chipscope working, and, so far, there aren't any red flags with the FPGA firmware 10-GbE control signals. That's good to know, although maybe in some way it would have been nice if you had found some red flags. We also confirmed that the bitstream we are using is in fact roach2_fengine_2013_Oct_14_1756.bof.gz, so that is unfortunately not the problem. At least you are using a known good BOF file, so that eliminates a source of potential errors. I also took a look at the ROACH2 PPC setup: we pulled from the .git repository on February 12, 2014 (commit number = e14df9016c3b7ccba62cc6d0cae05405f4929c94). There haven't been any changes to that repository since August 2013, so unless the SKA-SA ROACH-2s are using a pull from before then, I don't think that is our issue. We use our own homegrown NFS root filesystem for the ROACH2s, so I can't comment on the status of the one you refer to (https://github.com/ska-sa/roach2_nfs_uboot.git). I am more interested in the U-Boot version you have (see https://github.com/ska-sa/roach2_uboot.git) and which version of the ROACH2 CPLD image you are using (not sure where to get this). I think these are unlikely to be problematic, but we've already checked all the likely problems. We also tried out Jason Manley's suggestion of delaying the enabling of the 10-GbE cores to ensure that the sync pulse propagated through the entire system before buffering up data, but the problem persisted. Do you have an external 1 PPS sync pulse connected or have you tried the latest rb-papergpu software that supports a software-generated sync? The paper_feng_init.rb script already disables the data flow to the 10 GbE cores until the sync pulse has propagated through and the cores have been taken out of reset. Does the latest rb-papergpu code show that the ADC clocks (MMCMs) are locked? 
Does it estimate the clock frequency correctly? Does adc16_dump_chans.rb show samples that correspond correctly to the analog inputs (e.g. a CW tone)? Just to rule it out, I double-checked (or more accurately triple-checked) the U72 part, and, sure enough, it is the correct oscillator, model number EEG-2121. Does it have the L suffix on the 100.000L frequency part of the chip markings? On a related note, as I sent off-list to you and Peter earlier today: The fact that Peter can send small packets at 200 MHz without overflow, but large packets overflow, is very interesting and puzzling. I assume that the smaller packets are just fewer channels of the same length spectrum and that the number of packets per second remains the same (I think we discussed this previously). In that case, the small packets reduce the data rate, which suggests that the 156.25 MHz xaui_ref_clk clock is maybe not really 156.25 MHz but something somewhat slower. This clock is driven by the oscillator at U56 and the clock splitter at U54 (see attached schematic snippet). Can you please inspect those parts on your board(s)? I will be able to inspect a ROACH2 this afternoon and report what I have on a known working system. On one of our ROACH2s U56 is labeled like this: EEG-2121 156.250L OGPN1Z5C Again, note the L suffix. I think that signifies LVDS, which is what is expected/required for the ROACH2. That's very important. I am not 100% sure about my transcription of the third line; it could have typos. There is another possibility, albeit an unlikely problem: we currently have the ROACH-2 board booting off another PC (i.e. not the same PC that the ruby control scripts are running on). I can't imagine that this is the problem, but I'm planning on trying to consolidate the NFS and ruby scripts onto a single PC to rule it out. The scripts communicate with the ROACH2 over the network via KATCP. 
There is no requirement that the scripts be running on the same server that is providing the NFS root filesystem to the ROACH2s. So I suppose at this point, my questions are: (1) What version of the roach2_nfs_uboot .git repository are SKA-SA using? I don't know. (2) Is SKA-SA using the same PCs for ROACH-2 net boots and file systems as the ruby control scripts? I doubt SKA-SA is using ruby, but as stated above the ruby scripts can be run on any system that can reach the ROACH2 via KATCP. (3) Are there any additional steps that need to be taken when installing the Quad SFP+ mezzanine cards onto the ROACH-2 board? Are there potentially some drivers or configuration steps that are needed to make sure they function properly? As I recall, when we got the boards, we didn't do anything special with the cards outside of simply plugging them in. Just plugging them in is all that is necessary. There is a slight complication in that the standoffs might not be exactly the right height and some
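David's xaui_ref_clk suspicion is quantitatively plausible: a TX core that drains one 64-bit word per reference-clock cycle needs exactly 156.25 MHz to sustain 10 Gb/s, so a slow reference clock directly lowers the drain rate. (The one-word-per-cycle figure is an assumption for illustration, not something stated in the thread.)

```ruby
# Line rate sustained by a core that moves one 64-bit word per cycle
# of its reference clock.
def line_rate_bps(ref_clk_hz, bits_per_cycle = 64)
  ref_clk_hz * bits_per_cycle
end

puts line_rate_bps(156.25e6)   # exactly 10 Gb/s
puts line_rate_bps(150.00e6)   # 9.6 Gb/s: a slow clock caps throughput
```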
Re: [casper] Problem about the adc frequency in PAPER model.
Hi all, Sorry for the late reply. First, although the serial numbers of all 8 ROACHes we have are in the range that might be affected, fortunately ours have the correct crystals installed (Epson EEG-2121-100.000L). I reviewed the discussion yesterday. My project's final frequency is 250 MHz, but I didn't turn it up to 250 MHz when I ran the PAPER model. As the initialization shows: [peter@roachserver rb_test]$ ./paper_feng_init.rb roach1:0 initializing roach1 as FID 0 connecting to roach1 roach1 roach2_fengine app/lib revision 47c59e2/cd26bd2 disabling network transmission setting roach1 FID to 0 setting fftshift to 2047 setting eq to 600/1 configuring 10 GbE interfaces setting corner turner mode 0 (8 F engines) arming sync generator(s) arming sync generator(s) storing sync time in redis on redishost seeding noise generators arming noise generator(s) Setting F-Engine inputs to ADC signals resetting network interfaces enable transmission to X engines enable transmission to switch all done The configuration looks OK, but no data is sent out because of the overflow. I agree with David that it may not be the script that matters, because I can use this script to initialize my own model, which is modified from PAPER for our use. What's more, my model can send out data packets from the ROACH at 200 MHz (even at 250 MHz), and the overflow problem has never happened. My model sends data in 4112-byte packets. I also find that neither the PAPER model at 75 MHz nor my model at 200 MHz delivers the correct data structure on my system; I mean the header appears in the middle of the packet. I saw this in Wireshark. I have run adc16_dump_chans.rb while running the PAPER model. 
The result is like the following: [peter@roachserver bin]$ ./adc16_dump_chans.rb -r -v pf1 data snap took 0.363328416 seconds 111.5 112.0 112.1 112.1 127.1 127.1 127.3 127.4 112.2 112.3 111.8 112.0 112.1 112.2 111.6 112.0 112.4 111.6 112.1 112.0 127.0 127.4 127.1 127.3 112.1 111.4 112.0 111.7 127.3 126.7 127.4 126.6 I also downloaded the new script as David suggested, but I hit a NameError: [peter@roachserver bin]$ ./paper_feng_init.rb pf1 initializing pf1 as FID 0 connecting to pf1 ./paper_feng_init.rb:130:in `block in main': undefined local variable or method `a' for main:Object (NameError) from ./paper_feng_init.rb:112:in `map' from ./paper_feng_init.rb:112:in `main' Thanks for your communication and suggestions! peter At 2014-10-28 05:03:14, David MacMahon dav...@astro.berkeley.edu wrote: Hi, Richard and Peter, Another possibility that crossed my mind is perhaps your ROACH2s were from the batch where the incorrect oscillator was installed for U72. This seems unlikely for Richard based on this email (which also describes the incorrect oscillator problem in general): https://www.mail-archive.com/casper@lists.berkeley.edu/msg04909.html Maybe it's worth a double check anyway? Dave On Oct 27, 2014, at 1:41 PM, Richard Black wrote: David, We'll take another close look at what model we are actually using, just to be safe. I went back and looked at our e-mails, and sure enough, you're right. You were referring to the MTU issue as being the problem you tend to suppress all memory of. It was just that you stated it in a separate paragraph, so, out-of-context, I extrapolated that you have had the same problem before. My bad for dragging your good name through the mud. :) We will also update our local repositories, in the event some bizarre race condition exists on our end. I didn't know that the buffer could fill up while reset was asserted. We'll definitely have to check up on that too. 
We haven't tried dumping raw ADC data yet since we have been trying to get the data link working first. After that, we were planning to inject signal and examine outputs. Thanks, Richard Black On Mon, Oct 27, 2014 at 2:26 PM, David MacMahon dav...@astro.berkeley.edu wrote: [...]
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Peter, On Oct 28, 2014, at 5:34 AM, peter wrote: First, Though the serial number of all 8 roaches we have are in the range that might got wrong,fortunately, ours are installed the correct crystals (Epson EEG-2121-100.000L). Thanks for checking. That eliminates one potential cause of the problem. I have run the adc16_dump_chans.rb when I run PAPER model. The result is like flowing: [peter@roachserver bin]$ ./adc16_dump_chans.rb -r -v pf1 data snap took 0.363328416 seconds 111.5 112.0 112.1 112.1 127.1 127.1 127.3 127.4 112.2 112.3 111.8 112.0 112.1 112.2 111.6 112.0 112.4 111.6 112.1 112.0 127.0 127.4 127.1 127.3 112.1 111.4 112.0 111.7 127.3 126.7 127.4 126.6 The '-r' option tells the script to output the RMS of the 32 inputs. Those RMS values are very, very high. A full scale sine wave would have an RMS of only 90. What signals are driving the ADC inputs? If you don't pass '-r' then it will dump 1K of samples from each input (one column per input, one row per sample). What does that show? [peter@roachserver bin]$ ./paper_feng_init.rb pf1 initializing pf1 as FID 0 connecting to pf1 ./paper_feng_init.rb:130:in `block in main': undefined local variable or method `a' for main:Object (NameError) from ./paper_feng_init.rb:112:in `map' from ./paper_feng_init.rb:112:in `main' Sorry about that copy/paste error! I have pushed a fix. Hope this helps to get us closer to understanding this problem, Dave
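David's rule of thumb (a full-scale sine in an 8-bit ADC has an RMS of only about 90) is just amplitude divided by the square root of two, i.e. 127/sqrt(2) ≈ 89.8. A quick Ruby check:

```ruby
# RMS of a full-scale sine in an 8-bit signed ADC (amplitude 127).
n = 4096
amplitude = 127.0
samples = (0...n).map { |i| amplitude * Math.sin(2 * Math::PI * i / n) }
rms = Math.sqrt(samples.reduce(0.0) { |acc, s| acc + s * s } / n)
puts rms.round(1)   # 89.8, i.e. 127 / sqrt(2)
```

RMS values well above this, like the 111–127 figures in Peter's dump, indicate the inputs are saturating the ADC.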
Re: [casper] Problem about the adc frequency in PAPER model.
Just a note that I don't recommend you adjust FPGA clock frequencies while it's operating. In theory, you should do a global reset in case the PLL/DLLs lose lock during clock transitions, in which case the logic could be in an uncertain state. But the Sysgen flow just does a single POR. A better solution might be to keep the 10GbE cores turned off (enable line pulled low) on initialisation, until things are configured (tgtap started etc), and only then enable the transmission using a SW register. Jason Manley CBF Manager SKA-SA Cell: +27 82 662 7726 Work: +27 21 506 7300 On 25 Oct 2014, at 10:34, peter peterniu...@163.com wrote: Hi Richard, Joe, all, Thanks for your help; it can finally receive packets now! As you pointed out, after enabling the ADC card and running the bof file (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the f-engine init script (./paper_feng_init.rb roach1:0) at about 75 MHz; that allows the packets to transfer. Then we can turn the frequency higher. However, the final ADC clock frequency only reaches 120 MHz in my experiment; our target ADC frequency is 250 MHz. Maybe I need to run the bof file at a higher ADC frequency first to reach a steady 250 MHz ADC clock. Why does it need to be initialized at a lower frequency and then turned up? That doesn't make sense. Is the hardware going wrong? Since the yellow block adc16x250-8 is designed for 250 MHz, it should be fine at 200 MHz or 250 MHz. What is the final frequency in your experiment? Any reply will be helpful! Best Regards! peter At 2014-10-25 00:36:52, Richard Black aeldstes...@gmail.com wrote: Peter, That's correct. We downloaded the FPGA firmware and programmed the ROACH with the precompiled bitstream. When we didn't get any data beyond that single packet, we stuck some overflow status registers in the model and found that we were overflowing at 1025 64-bit words (i.e. 8200 bytes). We have actually found a way to get packets to flow, but it isn't a good fix. 
When we turn the ADC clock frequency down to about 75 MHz, the packets begin to flow. There is an opinion in our group that the 10-GbE buffer overflow is a transient behavior, and, hence, if we slowly turn up the clock frequency after the ROACH has started up, packets may continue to flow in steady-state operation. We haven't tested this yet, though. Richard Black On Thu, Oct 23, 2014 at 8:39 PM, peter peterniu...@163.com wrote: Hi Richard, All, As you said, the size of the isolated packet changes every time. ): tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes 10:10:55.622053 IP 10.10.2.1.8511 10.10.2.9.8511: UDP, length 4616 Did you download the PAPER gateware from the CASPER wiki (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest ) directly? How does the PAPER bof file run on your system? Have you encountered the overflow before? I downloaded and installed the PAPER model as the website says, but the overflow shows up when I run paper_feng_netstat.rb. Thanks for your information. peter At 2014-10-24 09:59:12, Richard Black aeldstes...@gmail.com wrote: Peter, I don't mean to hijack your thread, but we've been having a very similar (and time-absorbing) issue with the PAPER f-engine FPGA firmware here at BYU. Out of curiosity, does this single packet that you're receiving in tcpdump change in size every time you reprogram the ROACH? We've seen this happen, and we're pretty sure that this isolated packet is the 10-GbE buffer flushing when the 10-GbE core is initialized (i.e. the enable signal isn't sync'd with the start of a new packet). Regardless of whether we have the same issue, I'm very interested to see this problem's resolution. 
Good luck, Richard Black On Thu, Oct 23, 2014 at 7:50 PM, peter peterniu...@163.com wrote: Hi Joe, All, I found something this morning: there is one packet sent out from the ROACH when I run the PAPER model, which I captured on the HPC with tcpdump: tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes 09:04:02.757813 IP 10.10.2.1.8511 10.10.2.9.8511: UDP, length 6456 The length is not the expected 8200+8, and is far from the full TX buffer size of 8K+512. And the other packets are stopped by the overflow. I have tried changing the tutorial 2 packet size to 8200 bytes and to 8K+512 bytes; both transfer fine. I also made sure the boundary is indeed 8K+512, because when I change the size to 8K+513 bytes, no data is sent. So the received packet this morning, with length 6456, is well under the limit. But what caused the other packets to overflow? Any suggestions would be helpful! peter At 2014-10-24 00:37:14, Kujawski, Joseph jkujaw...@siena.edu wrote: Peter, By cadence of the broadcast, I mean how often are the 8200 byte packets
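The numbers in this exchange are consistent with each other: the 8200-byte F-engine payload is 1025 64-bit words, one word past an 8 KiB (1024-word) boundary, while Peter's empirically determined TX buffer limit is 8 KiB + 512 bytes. A quick check of the arithmetic:

```ruby
payload_bytes = 8200
word_bytes    = 8                       # 64-bit words
puts payload_bytes / word_bytes         # 1025 words: one past 1024 (8 KiB)

tx_limit = 8 * 1024 + 512               # boundary Peter found empirically
puts tx_limit                           # 8704 bytes
puts payload_bytes <= tx_limit          # true: 8200 fits under the limit
```

So the payload itself fits the buffer, which supports the view that the overflow is about the rate data is offered to the core, not the size of a single packet.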
Re: [casper] Problem about the adc frequency in PAPER model.
Jason, Thanks for your comments. While I agree that changing the ADC frequency mid-operation is non-kosher and could result in uncertain behavior, the issue at hand for us is to figure out what is going on with the PAPER model that has been published on the CASPER wiki. This naturally won't be (and shouldn't be) the end-all solution to this problem. This is a reportedly fully-functional model that shouldn't require any major changes in order to operate. However, this has clearly not been the case in at least two independent situations (us and Peter). This begs the question: what's so different about our use of PAPER? We, at BYU, have painstakingly made sure that our IP addressing schemes, switch ports, and scripts are all configured correctly (thanks to David MacMahon for that, btw), but we still have hit the proverbial brick wall of 10-GbE overflow. When I last corresponded with David, he explained that he remembers having a similar issue before, but can't recall exactly what the problem was. In any case, the fact that turning down the ADC clock prior to start-up prevents the 10-GbE core from overflowing is a major lead for us at BYU (we've been spinning our wheels on this issue for several months now). By no means are we proposing mid-run ADC clock modifications, but this appears to be a very subtle (and quite sinister, in my opinion) bug. Any thoughts as to what might be going on? Richard Black On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley jman...@ska.ac.za wrote: [...]
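A rough way to see why packets might flow at a 75 MHz ADC clock but overflow at 200 MHz is that the rate offered to each 10 GbE core scales with the fabric clock. The sketch below is built on hypothetical assumptions (fabric clock equal to the ADC clock, one 64-bit word offered per fabric cycle at some duty cycle); the real PAPER F-engine's numbers may differ, but the scaling argument is the point.

```ruby
# Offered rate into a 10 GbE core fed one 64-bit word per fabric clock
# cycle at duty cycle `duty` (illustrative assumptions, not the real design).
def offered_gbps(fabric_clk_mhz, duty = 1.0)
  fabric_clk_mhz * 1e6 * 64 * duty / 1e9
end

puts offered_gbps(75)    # 4.8 Gb/s: comfortably under the 10 Gb/s drain rate
puts offered_gbps(200)   # 12.8 Gb/s: overruns unless the duty cycle is < ~0.78
```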
Re: [casper] Problem about the adc frequency in PAPER model.
I suspect the 10GbE core's input FIFO is overflowing on startup. One key thing with this core is to the ensure that your design keeps the enable port held low until the core's been configured. The core becomes unusable once the TX FIFO overflows. This has been a long-standing bug (my emails trace back to 2009) but it's so easy to work around that I don't think anyone's bothered looking into fixing it. Jason Manley CBF Manager SKA-SA Cell: +27 82 662 7726 Work: +27 21 506 7300 On 27 Oct 2014, at 18:25, Richard Black aeldstes...@gmail.com wrote: Jason, Thanks for your comments. While I agree that changing the ADC frequency mid-operation is non-kosher and could result in uncertain behavior, the issue at hand for us is to figure out what is going on with the PAPER model that has been published on the CASPER wiki. This naturally won't be (and shouldn't be) the end-all solution to this problem. This is a reportedly fully-functional model that shouldn't require any major changes in order to operate. However, this has clearly not been the case in at least two independent situations (us and Peter). This begs the question: what's so different about our use of PAPER? We, at BYU, have made painstakingly sure that our IP addressing schemes, switch ports, and scripts are all configured correctly (thanks to David MacMahon for that, btw), but we still have hit the proverbial brick wall of 10-GbE overflow. When I last corresponded with David, he explained that he remembers having a similar issue before, but can't recall exactly what the problem was. In any case, the fact that by turning down the ADC clock prior to start-up prevents the 10-GbE core from overflowing is a major lead for us at BYU (we've been spinning our wheels on this issue for several months now). By no means are we proposing mid-run ADC clock modifications, but this appears to be a very subtle (and quite sinister, in my opinion) bug. Any thoughts as to what might be going on? 
Richard Black On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley jman...@ska.ac.za wrote: Just a note that I don't recommend you adjust FPGA clock frequencies while it's operating. In theory, you should do a global reset in case the PLL/DLLs lose lock during clock transitions, in which case the logic could be in a uncertain state. But the Sysgen flow just does a single POR. A better solution might be to keep the 10GbE cores turned off (enable line pulled low) on initialisation, until things are configured (tgtap started etc), and only then enable the transmission using a SW register. Jason Manley CBF Manager SKA-SA Cell: +27 82 662 7726 Work: +27 21 506 7300 On 25 Oct 2014, at 10:34, peter peterniu...@163.com wrote: Hi Richard,Joe, all, Thanks for your help,It finally can receive packets now! As you point,After enabled the ADC card and run bof file(./adc_init.rb roach1 bof file)in 200 Mhz (or higher than it), We need run init fengien script in about 75 Mhz ,(./paper_feng_init.rb roach1:0 ) ,That will allow the packet transfer. then we can turn the frequency higher.However the finally ADC clock frequency is up to 120 Mhz in my experiment.Our final ADC frequency standard is 250 Mhz. Maybe I need run the bof file in a higher ADC frequency first to make a final steady 250 Mhz ADC clock frequncy. Why it need init in a lower frequency and turn it up? That didn't make sense.Is the hardware going wrong?As the yellow block adc16*250-8 is designed for 250 Mhz, it should be ok for 200Mhz or 250 Mhz.How about the final frequency in your experiment? Any reply will be helpful! Best Regards! peter At 2014-10-25 00:36:52, Richard Black aeldstes...@gmail.com wrote: Peter, That's correct. We downloaded the FPGA firmware and programmed the ROACH with the precompiled bitstream. When we didn't get any data beyond that single packet, we stuck some overflow status registers in the model and found that we were overflowing at 1025 64-bit words (i.e. 8200 bytes). 
We have actually found a way to get packets to flow, but it isn't a good fix. When we turn the ADC clock frequency down to about 75 MHz, the packets begin to flow. There is an opinion in our group that the 10-GbE buffer overflow is a transient behavior, and, hence, if we slowly turn up the clock frequency after the ROACH has started up, packets may continue to flow in steady-state operation. We haven't tested this yet, though. Richard Black On Thu, Oct 23, 2014 at 8:39 PM, peter peterniu...@163.com wrote: Hi Richard, All, As you said, the size of the isolated packet changes every time: tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616 Did you download the PAPER gateware on the casper
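The numbers quoted in this thread fit together in a way worth checking. Below is a back-of-envelope sketch, illustrative only: it assumes the fabric offers the core one 64-bit word per FPGA clock with tx_valid held high continuously, which the real PAPER design need not do in steady state. Under that assumption, the observed 8200-byte overflow point, the 75 MHz and 120 MHz "working" clocks, and the 200 MHz failure are all consistent with a 10 Gb/s drain rate:

```python
# Back-of-envelope numbers for the 10GbE TX buffer behaviour described above.
# Assumption (not from the thread): the fabric presents one 64-bit word per
# FPGA clock with tx_valid held high, a worst-case duty cycle.

WORD_BITS = 64
LINE_RATE = 10e9        # approximate 10GbE drain rate, bits/s
FIFO_WORDS = 1025       # overflow point Richard observed

FIFO_BYTES = FIFO_WORDS * WORD_BITS // 8

def input_rate(fclk_hz):
    """Bits per second offered to the core at a given fabric clock."""
    return fclk_hz * WORD_BITS

def overflows(fclk_hz):
    """True if a continuously-valid stream outpaces the 10G link."""
    return input_rate(fclk_hz) > LINE_RATE

print(FIFO_BYTES)                   # 8200 bytes, matching the report
print(overflows(75e6))              # False: packets flow at 75 MHz
print(overflows(120e6))             # False: Peter's 120 MHz also works
print(overflows(200e6))             # True: 200 MHz exceeds line rate
print(LINE_RATE / WORD_BITS / 1e6)  # break-even fabric clock, 156.25 MHz
```

Interestingly, Peter's design stopped working somewhere above 120 MHz, which sits just below the 156.25 MHz break-even point of this naive model, though since the fielded PAPER correlator runs this bitstream at 200 MHz, sustained rate alone cannot be the whole story.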
Re: [casper] Problem about the adc frequency in PAPER model.
By enable port, I assume you mean the valid port. I've been looking at the PAPER model carefully for some time now, and that is how it operates. It has a gated valid signal with a software register on each 10-GbE core. Once again, this is not our model. This is one made available on the CASPER wiki and run without modifications. Richard Black On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley jman...@ska.ac.za wrote: ...
Re: [casper] Problem about the adc frequency in PAPER model.
Yep, ok, so whoever did it (Dave?) already knows about this issue and has dealt with it. So scratch that idea then! Only other thing to check is to make sure you don't actually toggle that software register until the core is configured. Jason Manley CBF Manager SKA-SA Cell: +27 82 662 7726 Work: +27 21 506 7300 On 27 Oct 2014, at 18:38, Richard Black aeldstes...@gmail.com wrote: ...
Re: [casper] Problem about the adc frequency in PAPER model.
Jason, Fair point. One of our guys is currently trying to get ChipScope configured to make sure all our control signals are correct. We'll definitely look at that signal too. Hopefully that will finally put this issue to rest. Thanks for the tip, Richard Black On Mon, Oct 27, 2014 at 10:47 AM, Jason Manley jman...@ska.ac.za wrote: ...
Re: [casper] Problem about the adc frequency in PAPER model.
Hi Richard, I've just had a very brief look at the design / software, so take this email with a pinch of salt, but on the off-chance you haven't checked this: it looks like the PAPER F-engine setup, on running the start script for the out-of-the-box software / firmware, is -- 1. Disable all ethernet interfaces 2. Arm sync generator, wait 1 second for PPS 3. Reset ethernet interfaces 4. Enable interfaces. These four steps seem like they should be safe, yet the behaviour you're describing sounds like the design is midway through sending a packet, then gets a sync, gives up sending an end-of-frame and starts sending a new packet, at which point the old packet + the new packet = overflow. Knowing that the design works for PAPER, my wondering is whether, after arming the sync generator, syncs are flowing through the design before the ethernet interface is enabled. Do you have a PPS-like input? The F-engine initialisation script seems to wait for a second after arming, but if your sync input is something significantly slower, you could have problems. I'm sceptical about this theory (I think the symptoms would be lots of OK packets when you brought up the interface, and then it dying when the sync arrives, rather than a single good packet like you're seeing), but if the firmware + software really is the same as that working with PAPER, and the wiki hasn't just got out of sync with the PAPER devs, perhaps the problem is in your hardware setup. Cheers, Jack On 27 October 2014 16:38, Richard Black aeldstes...@gmail.com wrote: ...
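Jack's four-step bring-up order can be sketched as a function against a KATCP-style client object. The register names here (eth%d_ctrl, eth%d_rst, sync_arm) and the single-argument control scheme are placeholders for illustration, not the actual PAPER register map; only the ordering is the point:

```python
# Sketch of the start-up ordering Jack describes. The register names are
# hypothetical; 'fpga' is any object exposing write_int(name, value),
# e.g. a KATCP client wrapper.
import time

def init_fengine(fpga, n_ifaces=4, pps_wait=1.0):
    """Bring up the F-engine 10GbE cores in a safe order (sketch)."""
    # 1. Disable all ethernet interfaces so nothing transmits yet.
    for i in range(n_ifaces):
        fpga.write_int('eth%d_ctrl' % i, 0)
    # 2. Arm the sync generator, then wait out the next PPS edge.
    fpga.write_int('sync_arm', 1)
    time.sleep(pps_wait)
    # 3. Pulse reset on each interface while transmission is still gated off.
    for i in range(n_ifaces):
        fpga.write_int('eth%d_rst' % i, 1)
        fpga.write_int('eth%d_rst' % i, 0)
    # 4. Only now enable transmission via the software register.
    for i in range(n_ifaces):
        fpga.write_int('eth%d_ctrl' % i, 1)
```

Because the function only assumes a write_int method, the ordering invariant (enables strictly after resets, resets strictly after disables and the sync arm) is easy to verify off-hardware with a stub object that records writes.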
Re: [casper] Problem about the adc frequency in PAPER model.
Jack, I appreciate your help. I tend to agree that the issue is likely a hardware configuration problem, but we have been trying to match it as closely as possible. We do feed a 1-PPS signal into the board, but I'm hazy on the details of the other pulse parameters. I'll look into that as well. So, if I understand you correctly, you believe that the sync pulse is reaching the ethernet interfaces *after* the cores are enabled? If that is the case, couldn't we delay enabling the 10-GbE cores for another second to fix it? This might be a quick way to test that theory, but please correct me if I've misunderstood. Richard Black On Mon, Oct 27, 2014 at 11:05 AM, Jack Hickish jackhick...@gmail.com wrote: ...
Re: [casper] Problem about the adc frequency in PAPER model.
Hi Richard, That's my theory, though I doubt it's right. But as you say, an easy test is just to delay after issuing a sync for a couple more seconds and see if that helps. But if your PPS is a real PPS (rather than just a square wave at some vague 1s period) then I can't see what difference this would make. When that doesn't help, my inclination would be to start prodding the 10gbe control signals from software to make sure the reset / sw enables are working / see if a tge reset without a new sync behaves differently. But I can't imagine how that would be broken unless the stuff on github is out of date (which I doubt). Jack On 27 October 2014 17:28, Richard Black aeldstes...@gmail.com wrote: ...
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Richard, On Oct 27, 2014, at 9:25 AM, Richard Black wrote: This is a reportedly fully-functional model that shouldn't require any major changes in order to operate. However, this has clearly not been the case in at least two independent situations (us and Peter). This begs the question: what's so different about our use of PAPER? I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is the one being used by the PAPER correlator currently fielded in South Africa. It is definitely a fully functional model. That image (and all source files for it) is available from the git repo listed on the PAPER Correlator Manifest page of the CASPER Wiki: https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest We, at BYU, have made painstakingly sure that our IP addressing schemes, switch ports, and scripts are all configured correctly (thanks to David MacMahon for that, btw), but we still have hit the proverbial brick wall of 10-GbE overflow. When I last corresponded with David, he explained that he remembers having a similar issue before, but can't recall exactly what the problem was. Really? I recall saying that I often forget about increasing the MTU of the 10 GbE switch and NICs. I don't recall saying that I had a similar issue before but couldn't remember the problem. In any case, the fact that by turning down the ADC clock prior to start-up prevents the 10-GbE core from overflowing is a major lead for us at BYU (we've been spinning our wheels on this issue for several months now). By no means are we proposing mid-run ADC clock modifications, but this appears to be a very subtle (and quite sinister, in my opinion) bug. Any thoughts as to what might be going on? I cannot explain the 10 GbE overflow that you and Peter are experiencing. I have pushed some updates to the rb-papergpu.git repository listed on the PAPER Correlator Manifest page. 
The paper_feng_init.rb script now verifies that the ADC clocks are locked and provides options for issuing a software sync (only recommended for lab use) and for not storing the time of synchronization in redis (also only recommended for lab use). The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1) while they are held in reset. Since you are using the paper_feng_init.rb script, this should not be happening (unless something has gone wrong during the running of that script) because that script specifically and explicitly disables the tx_valid signal before putting the cores into reset, and it takes the cores out of reset before enabling the tx_valid signal. So assuming that this is not the cause of the overflows, there must be something else that is causing the 10 GbE cores to be unable to transmit data fast enough to keep up with the data stream they are being fed. Two things that could cause this are 1) running the design faster than the 200 MHz sample clock that it was built for and/or 2) some link issue that prevents the core from sending data. Unfortunately, I think both of those ideas are also pretty far-fetched given all you've done to try to get the system working. I wonder whether there is some difference in the ROACH2 firmware (u-boot version or CPLD programming) or PPC Linux setup or tcpborphserver revision or ???. Have you tried using adc16_dump_chans.rb to dump snapshots of the ADC data to make sure that it looks OK? Dave
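David's MTU reminder can be sanity-checked against Peter's capture. A small back-of-envelope calculation, assuming a minimal IPv4 header with no options and that tcpdump's "length 4616" reports the UDP payload size:

```python
# Why the MTU matters for this design: the F-engine's 4616-byte UDP
# payloads cannot fit in a default 1500-byte MTU, so every hop (switch
# ports and receiving NICs) must be configured for jumbo frames.

UDP_PAYLOAD = 4616   # from Peter's tcpdump capture
UDP_HEADER = 8
IP_HEADER = 20       # minimal IPv4 header, no options (assumption)

required_mtu = UDP_PAYLOAD + UDP_HEADER + IP_HEADER
print(required_mtu)          # 4644: MTU must be at least this on every hop
print(required_mtu > 1500)   # True: default-MTU links will not pass these frames
```

In practice this is usually handled by enabling jumbo frames (MTU 9000) on the 10 GbE switch and NICs, which is exactly the step David says he often forgets.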
Re: [casper] Problem about the adc frequency in PAPER model.
David,

We'll take another close look at what model we are actually using, just to be safe.

I went back and looked at our e-mails, and sure enough, you're right. You were referring to the MTU issue as being the problem you tend to suppress all memory of. It was just that you stated it in a separate paragraph, so, out of context, I extrapolated that you had had the same problem before. My bad for dragging your good name through the mud. :)

We will also update our local repositories, in the event some bizarre race condition exists on our end. I didn't know that the buffer could fill up while reset was asserted. We'll definitely have to check up on that too.

We haven't tried dumping raw ADC data yet, since we have been trying to get the data link working first. After that, we were planning to inject a signal and examine the outputs.

Thanks,
Richard Black
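[Editor's sketch] Dave's first suggested cause — running the design faster than the 200 MHz sample clock it was built for — can be sanity-checked with back-of-envelope arithmetic. The 10 GbE output rate scales linearly with the fabric clock, which in ROACH2 designs like this tracks the ADC sample clock, so a design that just fits at 200 MHz is pushed past line rate by any faster clock. The bits-per-cycle figure below is purely illustrative, not the actual PAPER F-engine packet format.

```ruby
# Illustrative only: assume the core ships an average of 48 payload bits
# per fabric clock cycle (hypothetical figure, not the real design's).
LINE_RATE_GBPS = 10.0

def payload_gbps(fabric_clock_mhz, bits_per_cycle = 48)
  fabric_clock_mhz * 1e6 * bits_per_cycle / 1e9
end

[180, 200, 220].each do |mhz|
  rate = payload_gbps(mhz)
  verdict = rate > LINE_RATE_GBPS ? "OVERFLOWS" : "ok"
  puts format("%d MHz -> %.2f Gb/s (%s)", mhz, rate, verdict)
end
```

With these assumed numbers, 200 MHz stays just under line rate while 220 MHz exceeds it, which is consistent with the observation that turning the ADC clock down makes the overflows stop.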
Re: [casper] Problem about the adc frequency in PAPER model.
Hi, Richard and Peter,

Another possibility that crossed my mind is that perhaps your ROACH2s were from the batch where the incorrect oscillator was installed for U72. This seems unlikely for Richard based on this email (which also describes the incorrect oscillator problem in general):

https://www.mail-archive.com/casper@lists.berkeley.edu/msg04909.html

Maybe it's worth a double-check anyway?

Dave