from:"Gary, Dale E."

Re: [casper] NFS server + Casper tutorials

2012-08-22 Thread Gary, Dale E.

Hi Fabien,

I am working through your ROACH NFS guide, and have found it very useful so 
far.  Thanks!  Not everything is yet working, but I wanted to report a couple 
of issues that I found so far.  First, there are a couple of typos in the 
section titled The filesystem, where the directory roach_boot is given as 
boot_roach.  Second, our test of DHCP did not work, and there are some useful 
things we discovered for checking why.  We ran tcpdump (has to be run as root) 
to see the DHCP requests being made by the ROACH, but dnsmasq did not respond.  
This was traced to the required ports not being open in the firewall.  We 
opened tcp port 53 and udp port 67.  To see if dnsmasq has those ports open for 
listening, one can issue the netstat -l command.  I am a complete novice at 
this, so you may know a better approach, but something along these lines would 
be helpful to add to the wiki page.
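
A minimal sketch of the port check described above, in Python 3. It parses text in the layout printed by netstat -ln (the -n flag, which prints numeric ports, is an addition to the netstat -l command mentioned in the message) and reports whether anything is listening on TCP port 53 and UDP port 67. The sample text is illustrative, not captured from the system discussed here.

```python
def listening_ports(netstat_text):
    """Return {(proto, port)} for every socket row in `netstat -ln` style text."""
    found = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Socket rows start with the protocol; skip headers and blank lines.
        if len(fields) < 4 or fields[0] not in ("tcp", "tcp6", "udp", "udp6"):
            continue
        port = fields[3].rsplit(":", 1)[-1]  # local address is the 4th column
        if port.isdigit():
            found.add((fields[0].rstrip("6"), int(port)))
    return found


# Illustrative output only; real text comes from running `netstat -ln`.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN
udp        0      0 0.0.0.0:53              0.0.0.0:*
udp        0      0 0.0.0.0:67              0.0.0.0:*
"""

ports = listening_ports(SAMPLE)
for wanted in [("tcp", 53), ("udp", 67)]:
    print(wanted, "listening" if wanted in ports else "NOT listening")
```

A listening socket only shows that dnsmasq is running; the firewall still has to admit the traffic, which is the failure reported above.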

Also a question.  At the Green Bank CASPER workshop last week it was suggested 
that if you are running a system with a lot of ROACHes, to use the dnsmasq.conf 
entry read-ethers, and put MAC address/IP address pairings in the 
/etc/ethers file.  Is this an alternative to the dhcp-host= assignment, or in 
addition to it?  An example of an /etc/ethers file might be useful to include.
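
A sketch of what such a file could look like, generated with Python 3. Each /etc/ethers line is a hardware address followed by a hostname or dotted-quad IP address. As far as the dnsmasq man page describes it, lines read via read-ethers have the same effect as dhcp-host entries carrying the same information, so the two can be treated as alternatives. The MAC and IP values below are hypothetical placeholders.

```python
import re

MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def format_ethers(pairs):
    """Render (mac, ip_or_hostname) pairs as /etc/ethers lines."""
    lines = []
    for mac, host in pairs:
        mac = mac.lower()
        if not MAC_RE.fullmatch(mac):
            raise ValueError("bad MAC address: %r" % mac)
        lines.append("%s %s" % (mac, host))
    return "\n".join(lines) + "\n"


# Hypothetical ROACH boards; substitute the real MAC addresses and IPs.
ROACHES = [
    ("02:00:00:00:00:01", "192.168.100.11"),
    ("02:00:00:00:00:02", "192.168.100.12"),
]

print(format_ethers(ROACHES), end="")
```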

I will report back as I work through the rest of it, if I find any other issues.

Regards,
Dale


From: casper-boun...@lists.berkeley.edu [casper-boun...@lists.berkeley.edu] On 
Behalf Of fabien [fabdefra...@asiaa.sinica.edu.tw]
Sent: Monday, August 06, 2012 6:16 AM
To: casper@lists.berkeley.edu
Subject: Re: [casper] NFS server + Casper tutorials

Dear all,

I updated the ROACH NFS guide (https://casper.berkeley.edu/wiki/ROACH_NFS_guide)
on the wiki.
Please tell me if you find any mistake.

After the ROACH NFS Guide, I added a section 'Getting started with CASPER 
tutorials' (which solves some of the problems you can have when doing the 
tutorials 1, 2 or 3).
It completes the ROACH NFS Guide by specifying what additional packages should 
be installed, or what file should be modified.
If you want this section to be added to another wiki page, or to create a wiki 
page only with this section, please tell me.

Regards,

Fabien



On 07/13/2012 01:13 AM, Mark Wagner wrote:
Hi Fabien,

Thanks! This looks really good, and definitely covers issues that might come 
up.  Scientific Linux is a variant of Enterprise Linux (RHEL, Suse, etc.), 
which many of us use, so I think having instructions for both is really 
helpful.  Would you be willing to update the ROACH NFS guide
(https://casper.berkeley.edu/wiki/ROACH_NFS_guide) on the wiki to reflect 
what you've done?

Mark


On Sun, Jul 8, 2012 at 8:32 PM, fabien fabdefra...@asiaa.sinica.edu.tw wrote:
Dear all,

Some months ago I installed an NFS server on a computer running
Scientific Linux, in order to boot a ROACH. Then I worked through some of
the CASPER tutorials.

I encountered some problems, solved them, and wrote a note about it
(attached).
A part of this note (NFS server) is based on the ROACH NFS guide.

I hope it can be useful.
If you find any mistakes or have any suggestions, please let me know.

Regards,

Fabien

Re: [casper] netbooting ROACH2

2013-06-06 Thread Gary, Dale E.

Hi John,

Thanks for the reply.  My uImage checksum for uImage-r2borph3 agrees with
yours.  I do not have any uImage-current, and do not understand why there
would be two uImages.  I will assume you are running two different ROACH2
versions and that I should be using only the uImage-r2borph3 one, but
please correct me if not.
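
A minimal Python 3 sketch of the checksum comparison discussed here, equivalent to running md5sum on the file. The expected digest is the one John posted for uImage-r2borph3; the file location on your own server is whatever path you pass in.

```python
import hashlib

# Digest quoted in John's message for uImage-r2borph3.
EXPECTED = {"uImage-r2borph3": "ac6feb36b96c410a336fb3103fafb82c"}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, name):
    """True if the file at `path` has the digest recorded for `name`."""
    return md5_of_file(path) == EXPECTED[name]
```

Calling matches('uImage-r2borph3', 'uImage-r2borph3') in the directory holding the image returns True when the local copy agrees with the posted checksum.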

I have the u-boot.bin (with the same checksum), but am not using it.  I am
confused about what steps I need to take, and thought I could just follow
the same procedure for ROACH-1, but do I first have to update the u-boot as
in these instructions?

http://www.mail-archive.com/casper@lists.berkeley.edu/msg03351.html

I actually tried it, but our network seems to be down at the moment.  What
about romfs?  Is this only needed for soloboot?

Thanks,
Dale


On Thu, Jun 6, 2013 at 4:44 AM, John Ford jf...@nrao.edu wrote:

 In our R2 systems, we have the following kernel:

 root@vegasr2-2:~# uname -a
 Linux vegasr2-2 3.7.0-rc2+ #20 Fri Jan 4 18:04:26 SAST 2013 ppc GNU/Linux

 I'm pretty sure the Roach-2 kernels are all 3.x.

 Can you get a checksum (md5sum) of the uimage? I think you have the right
 one, but it may be too old.

 Yes, Master1017 md5sum uImage-current
 300753db387044c3cb8f5f9d92c4fb37  uImage-current

 We also used the following:
 Yes, Master1018 md5sum uImage-r2borph3
 ac6feb36b96c410a336fb3103fafb82c  uImage-r2borph3

 Our uboot is:

 Yes, Master1022 md5sum u-boot.bin
 e396454ffa9d1e2dcbf87abf9eafccba  u-boot.bin

 Hope this helps!

 John

  Posting on behalf of Dale. Please see the message below.
 
  Thanks,
 
  Nimish
 
 
  -- Forwarded message --
  From: Gary, Dale E. dale.e.g...@njit.edu
  To: casper list casper@lists.berkeley.edu
  Cc:
  Date: Thu, 6 Jun 2013 00:40:25 +
  Subject: netbooting ROACH2
  Hi Casperites,
 
  I am setting up a ROACH2 rev2 for netbooting, and I think I am almost
  there, but the ROACH hangs during the boot process, which suggests either a
  setup problem or the wrong Image or filesystem.  The uImage is
  uImage-r2borph3, and the filesystem is
  roach2-debian-fs-snapshot-24-10-2012.tar.gz.  According to the output
  during the boot process (see below), I appear to have bootp and tftp set up
  correctly.  Can anyone spot the problem based on this information, or
  suggest a way to get more information on what might have gone wrong?
  Perhaps something to do with the USB device?
 
  Thanks,
  Dale
 
  ===Output of boot process
  Waiting for PHY auto negotiation to complete done
  ENET Speed is 1000 Mbps - FULL duplex connection (EMAC0)
  BOOTP broadcast 1
  DHCP client bound to address 192.168.24.121
  Using ppc_4xx_eth0 device
  TFTP from server 192.100.16.206; our IP address is 192.168.24.121; sending
  through gateway 192.168.24.1
  Filename 'uImage'.
  Load address: 0x400
  Loading: #
  done
  Bytes transferred = 1390149 (153645 hex)
  ## Booting kernel from Legacy Image at 0400 ...
 Image Name:   Linux-2.6.25-svn3489
 Image Type:   PowerPC Linux Kernel Image (gzip compressed)
 Data Size:1390085 Bytes = 1.3 MiB
 Load Address: 
 Entry Point:  
 Verifying Checksum ... OK
 Uncompressing Kernel Image ... OK
  id mach(): done
  MMU:enter
  MMU:hw init
  MMU:mapin
  MMU:setio
  MMU:exit
  setup_arch: enter
  setup_arch: bootmem
  ocp: exit
  arch: exit
  Linux version 2.6.25-svn3489 (dave@lapster) (gcc version 4.2.2) #6 Fri Aug
  12 09:36:28 SAST 2011
  AMCC PowerPC 440EPx Roach Platform
  Zone PFN ranges:
DMA 0 -   131071
Normal 131071 -   131071
  Movable zone start PFN for each node
  early_node_map[1] active PFN ranges
  0:0 -   131071
  Built 1 zonelists in Zone order, mobility grouping on.  Total pages:
  130048
  Kernel command line: console=ttyS0,115200 root=/dev/nfs
  rootpath=192.100.16.206:/srv/roach2_boot/etch ip=dhcp
  PID hash table entries: 2048 (order: 11, 8192 bytes)
  console [ttyS0] enabled
  Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
  Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
  Memory: 516608k available (2084k kernel code, 720k data, 132k init, 0k
  highmem)
  Mount-cache hash table entries: 512
  BORPH version CVS-$Revision: 1.10 $ Initialized
  net_namespace: 152 bytes
  NET: Registered protocol family 16
 
  PCI: Probing PCI hardware
  SCSI subsystem initialized
  usbcore: registered new interface driver usbfs
  usbcore: registered new interface driver hub
  usbcore: registered new device driver usb
  NET: Registered protocol family 2
  IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
  TCP established hash table entries: 16384 (order: 5

Re: [casper] netbooting ROACH2

2013-06-06 Thread Gary, Dale E.

Ah, now I see. I also have the ROACH-1 uImage on the system, and our tftp
server used to point to it. I changed the tftp server-args to point to the
new directory, but I'll bet the server did not restart and is still
pointing to the old one. I will track that down.
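
One way to confirm which kernel a TFTP directory actually holds, without waiting for the ROACH to boot, is to read the image name out of the 64-byte legacy uImage header. This Python 3 sketch does roughly what the strings command in John's reply shows. The header layout is the standard U-Boot legacy image format; the demonstration header is synthetic.

```python
import struct

UIMAGE_MAGIC = 0x27051956
HEADER_FMT = ">7I4B32s"  # 64-byte big-endian legacy image header


def uimage_name(header):
    """Return the image name stored in a legacy uImage header."""
    fields = struct.unpack(HEADER_FMT, header[:64])
    if fields[0] != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage header")
    # The name is the last field, NUL-padded to 32 bytes.
    return fields[11].split(b"\0", 1)[0].decode("ascii", "replace")


# Synthetic header for demonstration; for a real file use
#   with open(path_to_uImage, "rb") as f: print(uimage_name(f.read(64)))
fake = struct.pack(HEADER_FMT, UIMAGE_MAGIC, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, b"Linux-3.7.0-rc2+")
print(uimage_name(fake))
```

A name starting with Linux-2.6.25 in the served directory would point to the ROACH-1 image still being handed out.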

Thanks!
Dale

On Thu, Jun 6, 2013 at 1:51 PM, John Ford jf...@nrao.edu wrote:

Hi John,

Thanks for the reply. My uImage checksum for uImage-r2borph3 agrees with
yours. I do not have any uImage-current, and do not understand why there
would be two uImages. I will assume you are running two different ROACH2
versions and that I should be using only the uImage-r2borph3 one, but
please correct me if not.

I should have explained more fully. We have a custom uImage that enables
both of the IIC ports on the ROACH2 for use by users. That's the
uImage-current that we use, but the other one worked fine besides the IIC
issue.

I have the u-boot.bin (with the same checksum), but am not using it. I
am
confused about what steps I need to take, and thought I could just follow
the same procedure for ROACH-1, but do I first have to update the u-boot
as
in these instructions?

http://www.mail-archive.com/casper@lists.berkeley.edu/msg03351.html

I think you do need the newest uboot, but I'm not positive. But I think
that one main thing is that you don't have the right kernel somehow being
loaded. I think that your server is serving the wrong one, since what was
in the mail showed an older kernel being started up:

Linux version 2.6.25-svn3489 (dave@lapster) (gcc version 4.2.2) #6 Fri
Aug
12 09:36:28 SAST 2011

Which is (nearly) the same as our roach1 kernel image:

Yes, Master1009 strings uImage-roach1 | more
Linux-2.6.25-svn2338
...

Whereas our roach2 kernel looks like this:

Yes, Master1007 strings uImage-r2borph3 | more
Linux-3.7.0-rc2+
...

I actually tried it, but our network seems to be down at the moment.
What
about romfs? Is this only needed for soloboot?

I think that's right. The file system should be on the NFS server.
Here's what our mounts look like:

root@vegasr2-2:~# mount
rootfs on / type rootfs (rw)
10.16.96.203:/export/home/tofu/cicadaroots/vegasr2/ on / type nfs

(ro,noatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.16.96.203,mountvers=1,mountproto=udp,local_lock=all,addr=10.16.96.203)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
devpts on /dev/pts type devpts (rw,noexec,relatime,mode=620)
tmpfs0 on /var/tmp type tmpfs (rw,relatime,size=32768k)
tmpfs1 on /tmp type tmpfs (rw,relatime,size=16384k)
root@vegasr2-2:~#

John

Thanks,
Dale


Re: [casper] netbooting ROACH2

2013-06-06 Thread Gary, Dale E.

Hi Dave,

Hmm, I was just trying to follow the instructions in the roach nfs guide,
which says to create a directory named etch to put the root file system.  I
probably renamed the directory to etch.  Will that matter?  It is the
correct file system--just a different directory name.

However, I fixed the problem with loading the wrong uImage, and it now
boots to the point where I get this output:

Sending DHCP requests ., OK
IP-Config: Got DHCP answer from 192.100.16.206, my address is 192.168.24.121
IP-Config: Complete:
 device=eth0, addr=192.168.24.121, mask=255.255.255.0, gw=192.168.24.1
 host=roach1.solar.pvt, domain=solar.pvt solar.ovro.caltech.edu,
nis-domain=(none)
 bootserver=192.100.16.206, rootserver=192.100.16.206,
rootpath=/srv/roach2_boot/etch
 nameserver0=192.100.16.2, nameserver1=192.168.9.15
VFS: Unable to mount root fs via NFS, trying floppy.
VFS: Cannot open root device nfs or unknown-block(2,0): error -6
Please append a correct root= boot option; here are the available
partitions:
1f004096 mtdblock0  (driver?)
1f01   65536 mtdblock1  (driver?)
1f02   49152 mtdblock2  (driver?)
1f03   11264 mtdblock3  (driver?)
1f04 256 mtdblock4  (driver?)
1f05 512 mtdblock5  (driver?)
Kernel panic - not syncing: VFS: Unable to mount root fs on
unknown-block(2,0)
Rebooting in 180 seconds..

So indeed the file system may not be right.  Still working on it...

Regards,
Dale


On Thu, Jun 6, 2013 at 3:50 PM, David MacMahon dav...@astro.berkeley.edu wrote:

 Hi, Dale and Nimish,

 In addition to running an older kernel as others have pointed out, I think
 you might also be using an older root file system:

 On Jun 5, 2013, at 5:59 PM, Nimish Sane wrote:

  Kernel command line: console=ttyS0,115200 root=/dev/nfs
 rootpath=192.100.16.206:/srv/roach2_boot/etch ip=dhcp

 Having the rootpath end with etch suggests that you are using a root
 file system based on Debian Etch, which was used for ROACH1.  The ROACH2
 filesystem is now based on Debian Squeeze.

 Hope this helps,
 Dave

Re: [casper] netbooting ROACH2

2013-06-06 Thread Gary, Dale E.

 Good luck!  I don't have any further ideas other than verifying that the
root file system is mountable by
 computers on the ROACH2 subnet.

You are a genius!  I hadn't updated exportfs...  It is now booted!
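
A Python 3 sketch of the check that would have caught this: does some line in /etc/exports cover the root path for the ROACH's address? The export line shown is a hypothetical example, not the actual file from this server, and the parser only understands IP addresses, CIDR networks and the * wildcard. After editing /etc/exports the table still has to be reloaded, for example with exportfs -ra.

```python
import ipaddress


def export_allows(exports_text, root_path, client_ip):
    """True if an /etc/exports line exports root_path (or a parent) to client_ip."""
    client = ipaddress.ip_address(client_ip)
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        exported = fields[0].rstrip("/")
        if not (root_path == exported or root_path.startswith(exported + "/")):
            continue
        for spec in fields[1:]:
            host = spec.split("(", 1)[0]
            if host == "*":
                return True
            try:
                if client in ipaddress.ip_network(host, strict=False):
                    return True
            except ValueError:
                pass  # hostnames and netgroups are not handled in this sketch
    return False


# Hypothetical export line for the ROACH2 subnet seen in the boot log.
SAMPLE_EXPORTS = "/srv/roach2_boot 192.168.24.0/24(rw,no_root_squash,sync)\n"

print(export_allows(SAMPLE_EXPORTS, "/srv/roach2_boot/etch", "192.168.24.121"))
```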


Thanks,
Dale


On Thu, Jun 6, 2013 at 4:39 PM, David MacMahon dav...@astro.berkeley.edu wrote:

 Hi, Gary,

 On Jun 6, 2013, at 9:20 AM, Gary, Dale E. wrote:

  Hmm, I was just trying to follow the instructions in the roach nfs
 guide, which says to create a directory named etch to put the root file
 system.  I probably renamed the directory to etch.  Will that matter?  It
 is the correct file system--just a different directory name.

 It won't matter for the computer (a rose by any other name...), but it
 might confuse humans! :-)

  However, I fixed the problem with loading the wrong uImage, and it now
 boots to the point where I get this output:
 
  Sending DHCP requests ., OK
  IP-Config: Got DHCP answer from 192.100.16.206, my address is
 192.168.24.121
  IP-Config: Complete:
   device=eth0, addr=192.168.24.121, mask=255.255.255.0,
 gw=192.168.24.1
   host=roach1.solar.pvt, domain=solar.pvt solar.ovro.caltech.edu,
 nis-domain=(none)
   bootserver=192.100.16.206, rootserver=192.100.16.206,
 rootpath=/srv/roach2_boot/etch
   nameserver0=192.100.16.2, nameserver1=192.168.9.15
  VFS: Unable to mount root fs via NFS, trying floppy.
  VFS: Cannot open root device nfs or unknown-block(2,0): error -6
  Please append a correct root= boot option; here are the available
 partitions:
  1f004096 mtdblock0  (driver?)
  1f01   65536 mtdblock1  (driver?)
  1f02   49152 mtdblock2  (driver?)
  1f03   11264 mtdblock3  (driver?)
  1f04 256 mtdblock4  (driver?)
  1f05 512 mtdblock5  (driver?)
  Kernel panic - not syncing: VFS: Unable to mount root fs on
 unknown-block(2,0)
  Rebooting in 180 seconds..
 
  So indeed the file system may not be right.  Still working on it...

 Good luck!  I don't have any further ideas other than verifying that the
 root file system is mountable by computers on the ROACH2 subnet.

 Dave

Re: [casper] netbooting ROACH2

2013-06-06 Thread Gary, Dale E.

Sorry for all of the questions, but now when I try to program the FPGA with
katcp it complains about a read-only file system.  /etc/exports declares it
rw, so I seem to be stuck again.
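
An rw entry in /etc/exports only permits read-write access; the client still chooses how it mounts the file system. The mount output John posted earlier in this thread shows an NFS root mounted ro. A small Python 3 sketch for reading the options out of mount output on the ROACH follows. The sample lines are abbreviated from John's output, and whether a read-only root mount is the cause of this particular error is not established in the thread.

```python
def mount_options(mount_text, mount_point="/", fs_type="nfs"):
    """Return the option list for the given mount point and type, or None."""
    for line in mount_text.splitlines():
        parts = line.split()
        # Expected layout: <device> on <mount point> type <fs type> (<options>)
        if len(parts) >= 6 and parts[1] == "on" and parts[3] == "type":
            if parts[2] == mount_point and parts[4] == fs_type:
                return parts[5].strip("()").split(",")
    return None


# Abbreviated from the mount output quoted earlier in the thread.
SAMPLE_MOUNT = """\
rootfs on / type rootfs (rw)
10.16.96.203:/export/home/tofu/cicadaroots/vegasr2/ on / type nfs (ro,noatime,vers=2,rsize=4096,wsize=4096)
proc on /proc type proc (rw,relatime)
"""

opts = mount_options(SAMPLE_MOUNT)
print("root is read-only" if opts and "ro" in opts else "root is read-write")
```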

Thanks,
Dale





Re: [casper] Setting up 10gbe core for ROACH2

2013-06-07 Thread Gary, Dale E.

Hi Glenn,

I recalled your original messages, and just located them again.  There it
says you were going to disable burst checking.  How does one do that?  We
do not have a BEE2 on the network, but it is the main LAN with a lot of
devices on it, so this could be our problem.

Thanks,
Dale


On Fri, Jun 7, 2013 at 1:23 PM, G Jones glenn.calt...@gmail.com wrote:

 Hi Dale,
 Do you have a lot of other ARP traffic on the network (from a BEE2 for
 example?) We ran into an issue where the current implementation of
 tcpborphserver3 (which does all of the tgtap stuff; there is no
 separate process for it now) will not send enough ARP requests to
 populate its ARP table if there is significant other ARP traffic on
 the network. This is an optimization for situations where you have a
 large number of ROACH2s on a network to avoid being overwhelmed with
 ARP broadcasts. However, it also means it flat out doesn't work if you
 have a BEE2 on the network. If this sounds like your problem (try
 sniffing the network and see what's flowing around), I'll try to dig
 up old emails with the solution we concocted. It should be in the
 casper list archive.

 Glenn

 On Fri, Jun 7, 2013 at 9:14 AM, Gary, Dale E. dale.e.g...@njit.edu
 wrote:
  Hi All,
 
  We previously were successful in setting up the 10gbe core for ROACH1
 using
  tap_start, which I believe resulted in a process tgtap appearing in the
  ROACH.
 
  I am using the same method for ROACH2 (using the latest ROACH2 rev 2 file
  system loaded via netboot), and there is no reported error, but the ARP
  table does not contain the MAC address for the destination IP address,
 and
  when I check the processes running on the ROACH2 I do not see tgtap.
 
  Has the procedure changed at all, or is there something I am doing wrong?
 
  Thanks,
  Dale
 
  Here are the relevant katcp python calls we are using:
 
  # Configuration Parameters for 10 GbE ports.
  dest_ip = 10 * ( 2 ** 24 ) + 101 # DPP_IP 10.0.0.101
  fabric_port = 6
  source_ip = 10 * ( 2 ** 24) + 11 # ROACH1, SLOT 0, CHANNEL 0: 10.0.0.11
  mac_base = (2 << 40) + (2 << 32)
 
  # Configuring 10 Gbe core Tx_P
  print 'Configuring transmitter core...',
  sys.stdout.flush()
  fpga.tap_start('tapP1',tx_core_name, mac_base + source_ip, source_ip,
  fabric_port)
  print 'done'
 
  print 'Setting-up DPP IP and port information...',
  sys.stdout.flush()
  fpga.wordwrite('f_ctrl_dpp_ip', 0, hex(dest_ip))
  fpga.wordwrite('f_ctrl_dpp_port', 0, hex(fabric_port))
  print 'done'
 
  # Reset 10 GbE cores
  print 'Resetting 10 GbE core...',
  sys.stdout.flush()
  fpga.wordwrite('f_ctrl_rst', 0, hex(1))
  fpga.wordwrite('f_ctrl_rst', 0, hex(0))
  print 'done'
 
  time.sleep(2)
 
  # Print ARP table
  print '\n===\n'
  print '10GbE Transmitter core details:'
  print '\n===\n'
  print 'Note that for some IP address values, only the lower 8 bits are valid!'
  fpga.print_10gbe_core_details( tx_core_name, arp=True )
 
  This ARP table is correct for the source_ip, but all FF for the MAC
 address
  of the dest_ip.
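
The integer arithmetic in the script above can be checked offline. This Python 3 sketch converts the packed values back to dotted-quad and colon-separated form. It takes mac_base to be (2 << 40) + (2 << 32), on the assumption that the shift operators were lost from the archived text; that reading is consistent with the source MAC 02:02:0a:00:00:0b that wireshark reports for 10.0.0.11 later in this archive.

```python
def ip_to_str(ip):
    """Format a 32-bit integer as a dotted-quad IPv4 address."""
    return ".".join(str((ip >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def mac_to_str(mac):
    """Format a 48-bit integer as a colon-separated MAC address."""
    return ":".join("%02x" % ((mac >> shift) & 0xFF) for shift in range(40, -8, -8))


dest_ip = 10 * (2 ** 24) + 101     # 10.0.0.101
source_ip = 10 * (2 ** 24) + 11    # 10.0.0.11
mac_base = (2 << 40) + (2 << 32)   # assumed reading of the garbled line

print(ip_to_str(dest_ip))                 # 10.0.0.101
print(ip_to_str(source_ip))               # 10.0.0.11
print(mac_to_str(mac_base + source_ip))   # 02:02:0a:00:00:0b
```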

Re: [casper] Setting up 10gbe core for ROACH2

2013-06-10 Thread Gary, Dale E.

Hi Marc,

Thanks for pointing out the ?tap-info command.  In fact, the problem I am
having does not seem to be the tgtap server alone (although it may be
implicated), and is certainly not due to too much ARP traffic.  I found
that after the startup script is done the ARP table is not correct, but I
can start an interactive Python session and issue the exact same commands
as were run in the script and it works.  I tried moving the tap-start
around in the script, adding time.sleep() in various places in case timing
was an issue, but nothing seems to help.

I finally gave up and figured out how to set the ARP table manually.  The
Python code below does it, where tx_core_name is a string corresponding to
the tap-device in the design.

# Manually set ARP table
arp = fpga.read( tx_core_name, 256*8, 0x3000 )
arp_tab = numpy.array( struct.unpack('256Q', arp) )
arp_tab[101] = 0x0060dd4623fe   # MAC address for 10.0.0.101 (DPP eth2)
arp_tab[102] = 0x0060dd4623ff   # MAC address for 10.0.0.102 (DPP eth3)
arp = struct.pack( '256Q', *arp_tab.tolist() )
fpga.write( tx_core_name, arp, 0x3000 )
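
The table manipulation can be tried without hardware. This Python 3 sketch patches a 256-entry table held in a bytes object, indexing entries by the last octet of the IP address as in the code above. It uses an explicit big-endian format ('>256Q'), which is an assumption made here for reproducibility; the message above uses '256Q', which follows the byte order of the machine running the script. The set_arp_entries helper is hypothetical and not part of katcp or corr.

```python
import struct

ARP_FMT = ">256Q"  # assumption: 256 big-endian 64-bit entries


def set_arp_entries(table, entries):
    """Return a copy of the packed ARP table with {last_octet: mac} applied."""
    arp_tab = list(struct.unpack(ARP_FMT, table))
    for last_octet, mac in entries.items():
        arp_tab[last_octet] = mac
    return struct.pack(ARP_FMT, *arp_tab)


# Stand-in for fpga.read(tx_core_name, 256*8, 0x3000): an all-FF table.
blank = b"\xff" * (256 * 8)
patched = set_arp_entries(blank, {101: 0x0060dd4623fe, 102: 0x0060dd4623ff})
print(patched[101 * 8:102 * 8].hex())  # 00000060dd4623fe
```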

I am still puzzled, but at least we have a solution that allows us to move
forward.

Regards,
Dale

[casper] Problem receiving ROACH2 packets

2013-06-14 Thread Gary, Dale E.

Hi All,

I am having a problem receiving 10gbe packets from one of the interfaces on
a dual-port Myricom NIC.  I believe the packets are properly addressed, and
wireshark sees them fine, but programmatically we cannot receive them on
10.0.0.102 via C or Python (recvfrom() just hangs), while on 10.0.0.101
everything is working fine.

Below is the wireshark output, and the output of ifconfig.  Packet 4080 is
from 10.0.0.21 to 10.0.0.101 (eth2) and packet 4081 is from 10.0.0.11 to
10.0.0.102 (eth3).  If the problem is not here, can someone suggest where
else to look?
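
A small Python 3 receive test that binds to one specific interface address and times out instead of hanging can help separate socket-level problems from routing or filtering ones. The sketch below exercises itself on the loopback interface so that it runs anywhere; on the real system the bind address would be 10.0.0.101 or 10.0.0.102 and the port would be the fabric port, whose value appears truncated in this archive.

```python
import socket


def open_receiver(bind_ip, port, timeout=2.0):
    """Open a UDP socket bound to one local address, with a receive timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((bind_ip, port))
    return sock


def receive_one(sock, bufsize=9000):
    """Return (sender, n_bytes) for one datagram, or None on timeout."""
    try:
        data, sender = sock.recvfrom(bufsize)
    except socket.timeout:
        return None
    return sender, len(data)


# Loopback self-check; port 0 lets the OS pick a free port.
rx = open_receiver("127.0.0.1", 0, timeout=0.5)
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"\0" * 856, rx.getsockname())
print(receive_one(rx))   # one 856-byte datagram
print(receive_one(rx))   # nothing further arrives, so this times out
tx.close()
rx.close()
```

If wireshark sees the packets but this test times out on one interface only, the packets are being dropped between the capture point and the socket layer.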

Thanks,
Dale

No.   Time      Source     Destination  Protocol  Length  Info
4080  1.149644  10.0.0.21  10.0.0.101   UDP       900     Source port: 6  Destination port: 6

Frame 4080: 900 bytes on wire (7200 bits), 900 bytes captured (7200 bits)
Linux cooked capture
Packet type: Unicast to us (0)
Link-layer address type: 1
Link-layer address length: 6
Source: MS-NLB-PhysServer-02_0a:00:00:15 (02:02:0a:00:00:15)
Protocol: IP (0x0800)
Internet Protocol Version 4, Src: 10.0.0.21 (10.0.0.21), Dst: 10.0.0.101
(10.0.0.101)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00:
Not-ECT (Not ECN-Capable Transport))
Total Length: 884
Identification: 0x (0)
Flags: 0x02 (Don't Fragment)
Fragment offset: 0
Time to live: 255
Protocol: UDP (17)
Header checksum: 0x63ff [correct]
Source: 10.0.0.21 (10.0.0.21)
Destination: 10.0.0.101 (10.0.0.101)
User Datagram Protocol, Src Port: 6 (6), Dst Port: 6 (6)
Source port: 6 (6)
Destination port: 6 (6)
Length: 864
Checksum: 0x (none)
Data (856 bytes)

No.   Time      Source     Destination  Protocol  Length  Info
4081  1.149659  10.0.0.11  10.0.0.102   UDP       900     Source port: 6  Destination port: 6

Frame 4081: 900 bytes on wire (7200 bits), 900 bytes captured (7200 bits)
Linux cooked capture
Packet type: Unicast to us (0)
Link-layer address type: 1
Link-layer address length: 6
Source: MS-NLB-PhysServer-02_0a:00:00:0b (02:02:0a:00:00:0b)
Protocol: IP (0x0800)
Internet Protocol Version 4, Src: 10.0.0.11 (10.0.0.11), Dst: 10.0.0.102
(10.0.0.102)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00:
Not-ECT (Not ECN-Capable Transport))
Total Length: 884
Identification: 0x (0)
Flags: 0x02 (Don't Fragment)
Fragment offset: 0
Time to live: 255
Protocol: UDP (17)
Header checksum: 0x6408 [correct]
Source: 10.0.0.11 (10.0.0.11)
Destination: 10.0.0.102 (10.0.0.102)
User Datagram Protocol, Src Port: 6 (6), Dst Port: 6 (6)
Source port: 6 (6)
Destination port: 6 (6)
Length: 864
Checksum: 0x (none)
Data (856 bytes)

===Output of ifconfig
eth2  Link encap:Ethernet  HWaddr 00:60:dd:46:23:fe
  inet addr:10.0.0.101  Bcast:10.0.0.255  Mask:255.255.255.0
  inet6 addr: fe80::260:ddff:fe46:23fe/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
  RX packets:2764212717 errors:0 dropped:0 overruns:0 frame:0
  TX packets:218163 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:2481980540060 (2.4 TB)  TX bytes:66859062 (66.8 MB)
  Interrupt:60

eth3  Link encap:Ethernet  HWaddr 00:60:dd:46:23:ff
  inet addr:10.0.0.102  Bcast:10.0.0.255  Mask:255.255.255.0
  inet6 addr: fe80::260:ddff:fe46:23ff/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
  RX packets:684995316 errors:0 dropped:0 overruns:0 frame:0
  TX packets:178 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:614750268368 (614.7 GB)  TX bytes:20024 (20.0 KB)
  Interrupt:60

[casper] Netbooting ROACH and ROACH2 from same server

2013-06-19 Thread Gary, Dale E.

Hi All,

I would like to get a ROACH1 to netboot on the same network as our ROACH2s,
but I am not sure how to specify two different directories from which to
serve uImages to different machines.  Is it possible?  I am using the
Ubuntu tftp server, whose config file (/etc/xinetd.d/tftp) specifies the
boot directory as
server_args = -s /srv/roach2_boot/boot
but I would need to specify /srv/roach_boot/boot for the ROACH1.
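
One possible approach, sketched in Python 3 under several assumptions: serve a common parent directory over TFTP (for example server_args = -s /srv) and let the DHCP server hand each board a different boot filename. The generator below emits dnsmasq dhcp-host and dhcp-boot lines using tags; it assumes dnsmasq is the DHCP server, as in the ROACH NFS guide, and the MAC and IP values are hypothetical placeholders.

```python
def boot_config(hosts):
    """Render dnsmasq lines giving each tagged host its own boot file.

    hosts maps tag -> (mac, ip, boot_file relative to the TFTP root).
    """
    lines = []
    for tag, (mac, ip, boot_file) in sorted(hosts.items()):
        lines.append("dhcp-host=%s,set:%s,%s" % (mac, tag, ip))
        lines.append("dhcp-boot=tag:%s,%s" % (tag, boot_file))
    return "\n".join(lines) + "\n"


# Hypothetical boards; paths are relative to a TFTP root of /srv.
HOSTS = {
    "roach1": ("02:00:00:00:00:01", "192.168.24.120", "roach_boot/boot/uImage"),
    "roach2": ("02:00:00:00:00:02", "192.168.24.121", "roach2_boot/boot/uImage"),
}

print(boot_config(HOSTS), end="")
```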

Thanks,
Dale

[casper] SFP+ Cables

2013-07-25 Thread Gary, Dale E.

Hi All,

I need to order the cables for connecting our ROACH2 SFP+ mezzanine cards
to a 10GBE switch.  We need only 1 m cables.  I find some rather cheap
passive copper cables (e.g.
http://www.cablesondemand.com/pcategory/91/category/SFP%2B+CBL/URvars/Catalog/Library/InfoManage/SFP+_CABLES_%28DIRECT_ATTACH%29.htm?gclid=CMSTkrqNy7gCFYdxQgodhnkAuw),
but will these work, or do I need optical cables with transceivers on each
end?  Seems overkill for 1 m!  The 10 GBe switch is not yet defined, but I
am looking for surplus Dell 8024F switches recommended by Dan W.

Thanks,
Dale

[casper] Configuring a DELL 8024F 10gbe SFP+ switch

2013-09-14 Thread Gary, Dale E.

Hi All,

I would like to configure a DELL 8024F switch to get its out-of-band IP
address via DHCP, and just forward packets on the 10GBe side between ROACH
boards and a Myricom dual-port board on a linux PC (two subnets:
192.168.1.x and 192.168.2.x).  Does anyone have a succinct set of
instructions to do that?  The Dell instructions for this managed switch are
bewildering to me, and doing the first step does not seem to work.

Thanks,
Dale

Re: [casper] Matlab components for toolflow

2013-09-26 Thread Gary, Dale E.

Hi Andrew,

We are desperate to complete a design by this week in order to support our
software development team over the weekend, but we are stuck right now due
to this issue. If you do manage to find a work-around and can provide a
block update, please let us know.

Many Thanks,
Dale

On Thu, Sep 26, 2013 at 5:43 AM, Andrew Martens and...@ska.ac.za wrote:

Hi Dave

I use fi to convert coefficients to be stored from parallel matlab double
precision format into single unsigned words that will accurately be
converted back by slicing and using convert blocks. I think it is possible
to create the same functionality, I will have a quick try.

Cheers
Andrew

Hi, Andrew,

Can you elaborate on how you use fi? I think much of the functionality
can be accomplished by judicious use of the multiplication, rounding, and
modulo math (for wrapping) or x(x>max)=max (for saturation). It might be
possible to create a clean room replacement function that does what you
need.
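
The idea in the paragraph above can be sketched in a few lines. The example is in Python 3 rather than MATLAB, to keep the examples in this archive in one language, and the to_fixed helper is hypothetical. It scales by the binary point, rounds half up, and then either wraps with modulo arithmetic or saturates by clipping. It does not claim to reproduce every rounding and overflow mode of fi.

```python
import math


def to_fixed(x, n_bits, bin_pt, signed=True, saturate=False):
    """Quantize x to an n_bits integer with bin_pt fractional bits."""
    scaled = int(math.floor(x * (1 << bin_pt) + 0.5))  # round half up
    if signed:
        lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << n_bits) - 1
    if saturate:
        return min(max(scaled, lo), hi)            # clip to the valid range
    return (scaled - lo) % (1 << n_bits) + lo      # modulo wrap


print(to_fixed(0.5, 8, 7))                  # 64
print(to_fixed(1.0, 8, 7))                  # -128 (wraps)
print(to_fixed(1.0, 8, 7, saturate=True))   # 127 (saturates)
```

The resulting integers can then be reinterpreted as unsigned words for storage, which is the part of the job fi was doing for the coefficient tables.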

Thanks,
Dave

On Sep 25, 2013, at 7:36 AM, Andrew Martens wrote:

Thanks Andrew! Would you know approximately how much time it would take
for you to look into this? Meanwhile, would you know the latest commit that
does not use fi, and hence, does not need the Fixed-Point Toolbox?

Replacing fi will be difficult. I will probably rather provide a
parameter that allows the user to choose not to pack coefficients next to
each other in the same BRAM (which fi is part of now). This will allow the
user to trade off BRAM for license fees :)

The last commit not using fi would be before my FFT upgrade, so
somewhere near the first quarter of the year.

Regards
Andrew

Thanks,

Nimish

On Wed, Sep 25, 2013 at 1:24 AM, David MacMahon
dav...@astro.berkeley.edu wrote:
Thanks, Andrew!

Dave

On Sep 24, 2013, at 10:13 PM, Andrew Martens wrote:

I do use the fi constructor to generate fixed point values when
generating twiddle values for storage.

I will see if it can be done in another way as it seems wasteful to
require a license just for that.

Regards
Andrew

Thanks, Nimish,

Looking at the recently updated casper-astro repository, the
following mask init scripts use the fi function (technically a
constructor, I think) to create fixed point objects:

casper_library/cosin_init.m
casper_library/feedback_osc_init.m
casper_library/pfb_fir_coeff_gen_init.m

Any block that uses the scripts (e.g. via sub-blocks) will probably
need a Fixed Point Toolbox license. I suspect it wouldn't be too
difficult
to rewrite these files in a way that maintain the functionality, but
avoids
the fi (and any related) call(s).

It looks like Andrew Martens introduced at least some of the fi
dependencies, so maybe he would be willing to redo the relevant bits of
these files?

Dave

On Sep 24, 2013, at 5:00 PM, Nimish Sane wrote:

Some more investigation:

I am seeing these messages:

License checkout failed.
License Manager Error -5
Cannot find a license for Fixed_Point_Toolbox.

Troubleshoot this issue by visiting:
http://www.mathworks.com/support/lme/R2012b/5

Diagnostic Information:
Feature: Fixed_Point_Toolbox
License path: /home/observer/.matlab/R2012b_licenses:/home/observer/tools/MATLAB/R2012b/licenses/license.dat:/home/observer/tools/MATLAB/R2012b/licenses/license_fpgadev_277254_R2012b.lic
Licensing error: -5,357.
Simulink:Masking:Bad_Init_Commands: Error in
'fft_wideband_real_core/fft_wideband_real/fft_direct/butterfly0_0/twiddle/coeff_gen/feedback_osc':
Initialization commands cannot be evaluated.
Backtrace 1: reuse_block:138
Backtrace 2: coeff_gen_init:498
Backtrace 3: reuse_block:51
Backtrace 4: add_convert_init:496
Backtrace 5: draw_basic_partial_cycle:407
Backtrace 6: cosin_init:165
Backtrace 7: xlUpdateIcon:207
Backtrace 8: xlBlockLoadCallback:79
Backtrace 9: UpdateDiagramCB:221

If I turn OFF the option Generate coeffs with multipliers where
useful, these messages do not appear. Still, I get the following error:
Error in
'fft_wideband_real_core/fft_wideband_real/fft_biplex_real_4x/biplex_core/fft_stage_10/butterfly_direct/twiddle/coeff_gen':
Initialization commands cannot be evaluated.

Caused by:
Error in
'fft_wideband_real_core/fft_wideband_real/fft_biplex_real_4x/biplex_core/fft_stage_10/butterfly_direct/twiddle/coeff_gen/cosin':
Initialization commands cannot be evaluated.

Unable to check out a license for the Fixed-Point Toolbox.

I will let you know if I find something more.

Thanks,

Nimish

On Tue, Sep 24, 2013 at 7:36 PM, David MacMahon
dav...@astro.berkeley.edu wrote:
Thanks. I was hoping to narrow it down a little more than that.
There's a lot of stuff inside that little green block!

Dave

On Sep 24, 2013, at 4:33 PM, Nimish Sane wrote:

To be precise, that is the only green block in the design apart
from bunch of

Re: [casper] Questions of getting Xilinx 14.5 license

2013-10-01 Thread Gary, Dale E.

I just updated our old license, which had ended in Dec. 2012, by entering a
request on XUP (Xilinx University Program), and they fulfilled the request
in about 1 day.  However, the license they gave me was for
UEF-VIVADO-SYSTEM-25,
which I hope includes ISE, but I fear it may not.

Nimish, did you try using this license file for ISE under 14.5?  If it does
not work then I have to put in another request.

Thanks,
Dale


On Tue, Oct 1, 2013 at 9:46 PM, Weiwei Sun su...@uw.edu wrote:

 Hi all,

 I'm upgrading xilinx 11.5 to 14.5. I downloaded the ISE Design Suite (14.5
 Full Product Installation) from the following link:
 http://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/design-tools/v2012_4---14_5.html

 Is it the right and only xilinx software I need to install?

 I also have a question about the license.  Is the Suite license free to us?
 If not, where can we buy the license? Some instructions would be much
 appreciated!

 Thanks!

 Weiwei

Re: [casper] adc5g block run at 2500MHz

2014-01-24 Thread Gary, Dale E.

Hi All,

Could someone summarize this discussion for me re: the case I am interested
in, which is as follows?  I have a need for a dual-pol spectrometer
covering an RF of at least 1-18 GHz (0.5-18 GHz is better).  From the
discussion, it sounds like one option is to use the 1x5GSPs board at 1:1
(8-bit) on ROACH-2, using two such ADCs per ROACH, giving 2.5 GHz
dual-polarization per ROACH.  Using 7 such ROACH systems would cover 17.5
GHz, with suitable downconversion upstream.  However, it is likely that I
can only afford 4 ROACH-2s for the project, which would mean I would have
to time-multiplex to cover the band, but could then relax the clock speed
to 2.2 GHz and cover 2.2 GHz x 4 ROACH x 2 times = 17.6 GHz.

Is this a viable solution?  What is the FPGA clock speed resulting from
this?  Is it 8 times less than the ADC clock speed?

Thanks,
Dale
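
The coverage arithmetic above can be written out as a quick check. This is only a sketch: the function name and arguments are illustrative, and it assumes the sub-bands tile the RF range with no overlap or guard bands.

```python
def total_coverage_ghz(bw_per_roach_ghz, n_roach, n_time_slots=1):
    """Total RF span covered, assuming non-overlapping sub-bands."""
    return bw_per_roach_ghz * n_roach * n_time_slots

# Seven ROACH-2s at 2.5 GHz each, no time multiplexing.
full_rate = total_coverage_ghz(2.5, 7)

# Four ROACH-2s at 2.2 GHz each, time-multiplexed over two tunings.
multiplexed = total_coverage_ghz(2.2, 4, n_time_slots=2)

print("7 boards, 2.5 GHz each: %.1f GHz" % full_rate)
print("4 boards, 2.2 GHz each, 2 tunings: %.1f GHz" % multiplexed)
```

In practice some overlap between adjacent sub-bands is usually wanted, so the usable span would be somewhat less than these totals.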


On Fri, Jan 24, 2014 at 7:04 PM, Dan Werthimer d...@ssl.berkeley.edu wrote:



 you can probably sample faster than 3.2Gsps 8 bit on a roach1
 if you use the -2 speed grade fpga.   you can look at the fpga
 data sheets to find their max lvds bit rate for the -1 and -2 speed
 grades.

 note that if you want to use the 5Gsps 4 bit ADC, that's
 a different adc board - you can not make an 8 bit board into a 4 bit board
 or vice versa.   order the one you need.

 i agree that if you want to sample 5Gsps 8 bits, the easiest thing to
 do is get a roach2 board.   several people have developed instruments
 using a pair of 5 Gsps ADC's feeding a roach2.

 best wishes,

 dan


 On Fri, Jan 24, 2014 at 3:43 PM, Ross Williamson 
 rwilliam...@astro.caltech.edu wrote:

 The roach 1 will only work at the speeds you are interested at with
 the dmux 2:1 option at 4 bits.  I think the max speed at 8 bits is
 about 3.2GSPS and that is due to the ZDOK interface.

 I'm sure with significant effort in planahead you could achieve 5GSPS
 on a low resolution correlator in a ROACH-1 (I have it working at
 4GSPS).  My advice would be to just go the ROACH-2 route and eat the
 cost - I believe there are correlator designs already working out
 there at 5GSPS using an adc5g on a ROACH2


 On Fri, Jan 24, 2014 at 8:19 AM, John Ford jf...@nrao.edu wrote:
  i suggest you contact mo ohady at digicom electronics
  for pricing and availability of adc's and roach boards.
  m...@digicom.org
 
  casper collaborators might already have a design similar to
  what you need.
  are you building a correlator?  or a spectrometer?
  how many frequency channels?  how many adc inputs?
  readout rate?   full stokes?
 
  it might be hard to get the roach1 to sample all the way up to 5 Gsps.
  it's been accomplished by several people on roach2, but i don't know
  anyone that has a roach1 and adc08-5000 working at 5 Gsps.
 
  Agreed, although I think it would be easy to port a working spectrometer
  with an iADC to use the 5gs sampler sampling at only 800 MS/s.
 
  Using it in the demux x8 it should work fine.  the demux x16 is more of a
  problem for roach1 due to routing resource usage, I think.
 
  John
 
 
  best wishes,
 
  dan
 
  On Fri, Jan 24, 2014 at 2:16 AM, Marco Bartolini
  mbartol...@med.ira.inaf.it
  wrote:
 
  Hi everyone,
  from what I understand the adc5g can easily sample 2.2GHz bandwidth at 8
  bit without a great effort in the design phase.
  I'd like to understand how portable a design using roach1+iADC
  with 400MHz bandwidth would be to a roach1+adc5g setup sampling 2.2GHz
  bandwidth, just by increasing the processed bandwidth and tweaking the
  necessary bits in the model file, like the number of parallel streams, etc.
 
  Also, can anyone provide information about pricing and availability of
  these high freq samplers?
 
  thanks for the info
  cheers
  Marco
 
 
 
  2014/1/15 Weiwei Sun su...@uw.edu
 
  Oh, that's clear!  I followed it and got the design compiled successfully
  in Simulink. Thanks Jack!
 
  Best,
 
  Weiwei
 
 
  On Wed, Jan 15, 2014 at 12:35 PM, Jack Hickish
  jackhick...@gmail.com wrote:
 
  Hi Weiwei,
 
  I'll leave it up to Rurik to merge or not the changes into
  sma-wideband, but in the meantime, the changes (along with LOTS of
  other modifications to other parts of the library) are in my repo at
  https://github.com/jack-h/mlib_devel
 
  Alternatively, if you've got the latest mlib_devel from sma-wideband,
  you can apply the patch I emailed using the command (run within the
  repository)
 
  git apply /path/to/adc5g-mmcm.patch
 
  Cheers,
  Jack
 
  On 15 January 2014 20:24, Weiwei Sun su...@uw.edu wrote:
   Hi Jack and Rurik,
  
   I'm very excited to hear that the adc could run at exact 5GSPS mode (2
   channels, demux 1:1). I'd like to follow the modification, but VHDL is
   something that I'm not familiar with. I wonder if it is possible that the
   module could be merged into the sma-wideband github, or if you could help
   with some more details to instruct me to make the modification. That would
   be a BIG help for my project.

Re: [casper] adc5g block run at 2500MHz

2014-01-24 Thread Gary, Dale E.

Thanks Dan.  I think I should get a couple of these ADCs and maybe John's
design and give it a try!  John, how many channels does your design use
over 2.5 GHz?  We would want of order 20,000, if that is possible.

Regards,
Dale


On Fri, Jan 24, 2014 at 7:48 PM, Dan Werthimer d...@ssl.berkeley.edu wrote:



 hi dale,

 your plan sounds good to me.

 the adc08-5000 yellow block for roach2 demuxes by 16,
 so a 5Gsps sample rates means a 312.5 MHz FPGA rate.

 312 MHz fpga rate is not easy to achieve - you need to learn how
 to use the plan ahead software, and you can't pack the fpga full.

 how many spectral channels do you want?
 there may be some roach2/adc08-5000 spectrometer designs
 you can use or adapt.
 if you are lucky, you can use a design as is,
 and not have to learn floor planning to get to 312 MHz clock rates.

 i think NRAO's DiBAS design (led by John Ford) uses
 a Roach2 board to sample a pair of ADC's at 5 Gsps,
 and one of the DiBAS modes does stokes spectroscopy.
 and Jack Hickish's AMI correlator also samples a
 pair of 5 Gsps ADC's using Roach2.

 best wishes,

 dan
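
The demux arithmetic in the reply above can be checked in a couple of lines. This is a sketch with an illustrative function name; the 4.4 Gsps figure assumes that 2.2 GHz of bandwidth is obtained by Nyquist sampling at twice that rate.

```python
def fpga_clock_mhz(sample_rate_msps, demux):
    """FPGA fabric clock when the ADC samples are demultiplexed by `demux`."""
    return sample_rate_msps / float(demux)

# 5 Gsps demuxed by 16, as described for the roach2 yellow block.
full_rate = fpga_clock_mhz(5000.0, 16)

# Relaxed case: 4.4 Gsps (2.2 GHz of bandwidth), same demux factor.
relaxed = fpga_clock_mhz(4400.0, 16)

print("5.0 Gsps / 16 = %.1f MHz" % full_rate)
print("4.4 Gsps / 16 = %.1f MHz" % relaxed)
```

With a demux factor of 16 on the sample rate, the FPGA clock is 1/8 of an ADC clock that runs at half the sample rate, which matches the factor of 8 asked about in the quoted question.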





 On Fri, Jan 24, 2014 at 4:30 PM, Gary, Dale E. dale.e.g...@njit.edu wrote:

 Hi All,

 Could someone summarize this discussion for me re: the case I am
 interested in, which is as follows?  I have a need for a dual-pol
 spectrometer covering an RF of at least 1-18 GHz (0.5-18 GHz is better).
 From the discussion, it sounds like one option is to use the 1x5GSPs board
 at 1:1 (8-bit) on ROACH-2, using two such ADCs per ROACH, giving 2.5 GHz
 dual-polarization per ROACH.  Using 7 such ROACH systems would cover 17.5
 GHz, with suitable downconversion upstream.  However, it is likely that I
 can only afford 4 ROACH-2s for the project, which would mean I would have
 to time-multiplex to cover the band, but could then relax the clock speed
 to 2.2 GHz and cover 2.2 GHz x 4 ROACH x 2 times = 17.6 GHz.

 Is this a viable solution?  What is the FPGA clock speed resulting from
 this?  Is it 8 times less than the ADC clock speed?

 Thanks,
 Dale


 On Fri, Jan 24, 2014 at 7:04 PM, Dan Werthimer d...@ssl.berkeley.edu wrote:



 you can probably sample faster than 3.2Gsps 8 bit on a roach1
 if you use the -2 speed grade fpga.   you can look at the fpga
 data sheets to find their max lvds bit rate for the -1 and -2 speed
 grades.

 note that if you want to use the 5Gsps 4 bit ADC, that's
 a different adc board - you can not make an 8 bit board into a 4 bit
 board or vice versa.   order the one you need.

 i agree that if you want to sample 5Gsps 8 bits, the easiest thing to
 do is get a roach2 board.   several people have developed instruments
 using a pair of 5 Gsps ADC's feeding a roach2.

 best wishes,

 dan


 On Fri, Jan 24, 2014 at 3:43 PM, Ross Williamson 
 rwilliam...@astro.caltech.edu wrote:

 The roach 1 will only work at the speeds you are interested at with
 the dmux 2:1 option at 4 bits.  I think the max speed at 8 bits is
 about 3.2GSPS and that is due to the ZDOK interface.

 I'm sure with significant effort in planahead you could achieve 5GSPS
 on a low resolution correlator in a ROACH-1 (I have it working at
 4GSPS).  My advice would be to just go the ROACH-2 route and eat the
 cost - I believe there are correlator designs already working out
 there at 5GSPS using an adc5g on a ROACH2


 On Fri, Jan 24, 2014 at 8:19 AM, John Ford jf...@nrao.edu wrote:
  i suggest you contact mo ohady at digicom electronics
  for pricing and availability of adc's and roach boards.
  m...@digicom.org
 
  casper collaborators might already have a design similar to
  what you need.
  are you building a correlator?  or a spectrometer?
  how many frequency channels?  how many adc inputs?
  readout rate?   full stokes?
 
  it might be hard to get the roach1 to sample all the way up to 5 Gsps.
  it's been accomplished by several people on roach2, but i don't know
  anyone that has a roach1 and adc08-5000 working at 5 Gsps.
 
  Agreed, although I think it would be easy to port a working spectrometer
  with an iADC to use the 5gs sampler sampling at only 800 MS/s.
 
  Using it in the demux x8 it should work fine.  the demux x16 is more of a
  problem for roach1 due to routing resource usage, I think.
 
  John
 
 
  best wishes,
 
  dan
 
  On Fri, Jan 24, 2014 at 2:16 AM, Marco Bartolini
  mbartol...@med.ira.inaf.it
  wrote:
 
  Hi everyone,
  from what I understand the adc5g can easily sample 2.2GHz bandwidth at 8
  bit without a great effort in the design phase.
  I'd like to understand how portable a design using roach1+iADC
  with 400MHz bandwidth would be to a roach1+adc5g setup sampling 2.2GHz
  bandwidth, just by increasing the processed bandwidth and tweaking the
  necessary bits in the model file, like the number of parallel streams, etc.
 
  Also, can anyone provide information about pricing and availability of
  these high freq samplers?
 
  thanks for the info
  cheers
  Marco
 
 
 
  2014/1/15 Weiwei Sun su

[casper] KatADC est_brd_clk

2014-03-11 Thread Gary, Dale E.

Hi All,

We are bringing multiple ROACH2 boards online, each equipped with two
KatADC boards.  To my surprise, 3 of our 8 ROACH2s show a 401 MHz FPGA
clock rate, as returned from the Python KATCP est_brd_clk() function, when
supplying an 800 MHz clock to the ADCs.  The others show the correct value
of 200 MHz.  There is a difference in ROACH2 LED status lights on these
boards as well (LED #1 is out on the misbehaving boards).  Does anyone know
what might cause this, or have a suggestion of what to try?  KatADC
problem, or ROACH problem, or something else?

Thanks,
Dale
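
A check like the one described above can be automated across all boards. The sketch below is a hypothetical helper, not part of the corr or katcp packages. It assumes the FPGA clock should be the ADC clock divided by 4 (800 MHz giving 200 MHz, as on the good boards) and takes the measured values as plain numbers, for example those returned by est_brd_clk(). The board names and the tolerance are made up for illustration.

```python
def check_board_clock(est_mhz, adc_clock_mhz, divide_by=4, tol_mhz=2.0):
    """Return (ok, expected_mhz) for one board's estimated FPGA clock."""
    expected = adc_clock_mhz / float(divide_by)
    return abs(est_mhz - expected) <= tol_mhz, expected

# Example estimates in MHz, keyed by hypothetical board names.
estimates = {"roach1": 200.0, "roach2": 401.0, "roach3": 199.6}

bad_boards = []
for name in sorted(estimates):
    ok, expected = check_board_clock(estimates[name], adc_clock_mhz=800.0)
    if not ok:
        bad_boards.append(name)
    print("%s: %.1f MHz (expected %.1f) %s"
          % (name, estimates[name], expected, "OK" if ok else "CHECK"))
```

Flagging boards this way makes it easier to correlate a wrong clock estimate with other symptoms, such as the LED difference noted above.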

[casper] Problem with reading ROACH-2 sensors

2015-10-28 Thread Gary, Dale E.

Hi All,

We have 8 identical (I hope) ROACH-2 boards.  We have been using 4 of them
for a long time, and I just brought two more online, but one of them is
behaving differently than the others.  One problem is that the sensor list
is different.  If I run the katcp routine to get the sensor list

reply, sensors = ro[5].fpga.blocking_request(Message.request('sensor-list'))

the message returns quickly but has a short list:

sensors:
[,
 ,
 ,
 ,
 ,
 ]


If I run the katcp routine to get the sensor values, the command takes ~ 8
s and returns bad values

reply, vals = ro[5].fpga.blocking_request(Message.request('sensor-value'), timeout=10)

vals:
[,
 ,
 ,
 ,
 ,
 ]

Doing this on a good board returns immediately and gives the much longer
sensor list:

vals:
[,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ,
 ]

I also tried telnetting into the bad ROACH, and got the same short list,
long sensor value request time, and bad sensor values.

Another symptom is that setting the ADC registers on the KATADC board does
not seem to work, or at least one of the ADCs is misbehaving in the same
way they did when we were not setting the registers correctly.

Has anyone seen this before?  Is it hardware, firmware, software?

Thanks,
Dale
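
To see exactly which sensors the misbehaving board is missing, the inform lists returned by the two sensor-list requests can be compared by name. This is a minimal sketch: it assumes each inform carries the sensor name as its first argument (as in the KATCP sensor-list inform layout) and that the message objects expose an arguments attribute. The stand-in class and the sensor names are invented so that the example runs without a ROACH.

```python
from collections import namedtuple

def sensor_names(informs):
    """Set of sensor names, taken from the first argument of each inform."""
    return set(msg.arguments[0] for msg in informs if msg.arguments)

def missing_sensors(good_informs, suspect_informs):
    """Sensors reported by the good board but absent on the suspect one."""
    return sorted(sensor_names(good_informs) - sensor_names(suspect_informs))

# Stand-in for katcp messages; real informs would come from blocking_request().
FakeInform = namedtuple("FakeInform", "arguments")
good = [FakeInform(["temp.example", "a temperature", "degC", "float"]),
        FakeInform(["volt.example", "a voltage", "V", "float"])]
suspect = [FakeInform(["temp.example", "a temperature", "degC", "float"])]

print(missing_sensors(good, suspect))
```

With real boards, the two lists would be the second return values of the sensor-list requests on a good board and on the suspect board.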

Re: [casper] Colombian Radio Interferometer

2016-03-29 Thread Gary, Dale E.

Hi Juan Camilo,

This is certainly doable with ROACH hardware.  Before embarking on it,
though, you should do some tests of the radio-frequency interference (RFI)
environment there.  Your target frequency range is one of the toughest in
terms of RFI, so you may be frustrated in your goal even if everything
works well.  The details of hardware implementation will depend a lot on
your choice of frequency range, so the check of RFI should be done first.
After the study, you can either restrict the frequency range or contemplate
using notch filters for the worst offenders (generally not cheap, though).

Regards,
Dale Gary

On Tue, Mar 29, 2016 at 11:42 AM, Juan Camilo Guevara Gomez <
jcgueva...@unal.edu.co> wrote:

> Dear all,
>
> Let me introduce myself: my name is Juan Camilo Guevara, and I am a Master's
> student in Astronomy at the National University of Colombia. I am working
> on the design and implementation of a two-element radio interferometer
> to observe the Sun over a frequency range of 100 MHz to 1 GHz. The
> main idea is to observe the Sun with high spectral and temporal
> resolution, so the data acquisition has to cover the full
> bandwidth, preferably simultaneously. ROACH hardware has been identified
> as the best system for implementing the desired features.
>
> I would like to ask you two things. The first is what you think
> about the project and its feasibility with a ROACH system.
> The second is whether someone has a ROACH 1 system that is
> not being used, and whether there is any possibility that it could be
> donated or sold for a symbolic price; this would surely be a crucial piece
> in the adoption and development of technology applied to scientific research.
>
> Thank you so much,
>
>
> Juan Camilo
>

Re: [casper] VEGAS PFB / FFT

2016-12-05 Thread Gary, Dale E.

Hi Richard,

I will let someone else say how the PFB block is actually implemented, but
the correct method is as you have described, so that each sample in a 4-tap
PFB will be used 4 times.  I do not see how it could work otherwise, and
indeed the whole point of the PFB is to give the effect of a window 4 times
as wide as the sample window without lowering the sample rate.

Regards,
Dale
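
The sample reuse described above can be demonstrated with a small polyphase FIR front end. This is a sketch in plain Python, not the VEGAS or CASPER implementation. The Hamming-windowed sinc prototype filter and the small sizes (8-point frames, 4 taps) are illustrative assumptions. Each output frame would then be passed to an n_fft-point FFT.

```python
import math

def pfb_fir_frames(x, n_fft, n_taps):
    """Polyphase FIR front end: one n_fft-long frame per n_fft new samples.

    Returns the frames and a per-sample count of how many frames used it.
    """
    win_len = n_fft * n_taps
    # Hamming-windowed sinc prototype filter (an illustrative choice).
    coeffs = []
    for k in range(win_len):
        t = (k - win_len / 2.0 + 0.5) / n_fft
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * k / (win_len - 1))
        coeffs.append(sinc * hamming)

    frames = []
    usage = [0] * len(x)
    # Slide the n_taps * n_fft window forward by n_fft samples each time.
    for start in range(0, len(x) - win_len + 1, n_fft):
        frame = [0.0] * n_fft
        for tap in range(n_taps):
            for j in range(n_fft):
                idx = start + tap * n_fft + j
                frame[j] += coeffs[tap * n_fft + j] * x[idx]
                usage[idx] += 1
        frames.append(frame)
    return frames, usage

samples = [math.sin(0.3 * n) for n in range(80)]
frames, usage = pfb_fir_frames(samples, n_fft=8, n_taps=4)
print(len(frames), usage[24:56])
```

Away from the start and end of the data, every sample contributes to 4 consecutive frames, once under each tap of the window, while a new frame still comes out every n_fft samples.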

On Mon, Dec 5, 2016 at 9:39 AM, Richard Prestage  wrote:

> Hi All –
>
>
>
> [ Long time CASPER groupie, first time posting. :) ]
>
>
>
> I am trying to understand the VEGAS implementations of the PFB/FFT
> technique, both in the FPGA and the CPU. For simplicity, I will use the
> example on the CASPER wiki page,
> https://casper.berkeley.edu/wiki/The_Polyphase_Filter_Bank_Technique,
> and specifically Figure 3, to pose my question.
>
>
>
> In the FFT-only approach, I would take the FFT of 256 samples, running
> from i = 384 to 639. Let us refer to the time this operation takes as T.
>
>
>
> In this case, the PFB/FFT approach makes use of 1024 samples, but still
> only produces the equivalent of a 256 point FFT.
>
>
>
> My initial assumption was that VEGAS would slide the 1024 window by 256
> points, each time. Thus, every T seconds, it would generate a new FFT, and
> each one would correspond to the conventional approach, but with the window
> function centered on it, hence reducing spectral leakage. This is how Joe
> Brandt believes the GPU code works.
>
>
>
> But, Randy McCullough understands that in the FPGA implementation, each
> sample is only used once. That would imply that one FFT comes out every P x
> T seconds.
>
>
>
> These two approaches are not the same. For example, imagine a large RFI
> spike at i = 896.  If we do not use a sliding window, then the value of the
> window function w(896) will be quite small, and the RFI will be attenuated.
> The next spectrum will correspond to i = 1024 – 2047, and there will be no
> RFI present.  If we use a sliding window, keeping the same origin for i,
> then when the window runs from i = 512 to 1535, w(896) will have a high
> value, and so the RFI would be obvious in the corresponding spectrum. This
> seems intuitively correct to me.
>
>
>
> Can someone definitively explain what is actually implemented in the FPGA
> (HBW modes), and the GPU (LBW modes) for VEGAS? If it does not use a
> sliding window, then is my above example correct, and the RFI would be
> attenuated? If not, can you explain that also?
>
>
>
> Thanks,
>
>Richard
>
>
>
> Richard Prestage
>
> Scientist
>
> Green Bank Observatory
>
> Green Bank, WV 24944, USA
>
>
>
>
>
>
>

Re: [casper] Application of SMA correlator design to a larger array

2017-11-16 Thread Gary, Dale E.

 your needs, and it has no yellow block yet
> (though Peralex promised to deliver one at the last CASPER workshop if a
> customer wanted it).
> >
> > Another consideration is that the newer platforms (SKARAB, SNAP etc) are
> using the new JASPER toolflow, which is under active development and has
> support from the bigger CASPER developers. The older CASPER flow is still
> supported on ROACH2s, but new features and bug-fixes are not being
> back-ported and development there has stagnated. Since Xilinx will not
> support Virtex 6 in Vivado, ROACH2 (and earlier boards) can never be
> supported by all the new tools. I don't know of anyone using JASPER with
> ISE (though, it was possible at one time).
> >
> > If you wanted to build your correlator out of ROACH2s or SKARABs (they
> have similar processing capacities), a quick back-of-the-envelope,
> worst-case calculation suggests that, for a 16 dual-pol, 4k channel 8-tap,
> 2GHz BW correlator (I can almost hear the infomercial already: "act now,
> and we'll throw in a free beamformer"; you'd fit a beam or two in the spare
> capacity within the X-engine boards, FWIW), a packetised design would need
> something like:
> >
> > 32x F-engine boards (though I'd say there's a good chance you could
> squeeze it into 16 boards).
> > 32x X-engine boards (30 if you don't want to process the band edges)
> > 1x Arista 7250QX-64 or similar ~64 port 40G switch (or just a 32-port
> switch with some loopback trickery, which would be much cheaper).
> >
> > It looks like you will be BRAM limited in both cases (otherwise you
> could halve the board counts). You could also opt for some BRAM-saving
> tradeoffs to ease fitment of two polarisations onto an F-engine. For
> example you could drop down to a 4-tap PFB, or reduce the delay-correction
> resolution (MeerKAT's specs, upon which I based the numbers above, are
> overkill for most applications). If you're using a single network switch,
> you might also be able to reduce packet buffer requirements, which
> currently use BRAM, too.
> >
> > The beauty of this design is in its flexibility. You can access the raw,
> intermediate data streams on the switch, swap out portions of the design
> for computers/GPUs (eg LEDA and HERA), add more antennas incrementally,
> increase or decrease processed bandwidth, change spectral resolution etc.
> all with a quick parameter change and a recompile.
> >
> > Jason Manley
> > Functional Manager: DSP
> > SKA-SA
> >
> > Cell: +27 82 662 7726
> > Work: +27 21 506 7300
> >
> > On 07 Nov 2017, at 20:02, Jonathan Weintroub <jweintr...@cfa.harvard.edu>
> wrote:
> >
> >> Hi Dale,
> >>
> >> I’ll offer a few bullets on SWARM, the new SMA system.
> >>
> >> 1.  SWARM is all open source and shared via CASPER and you are welcome
> to use it as is, or develop it further to adapt it to a new application,
> indeed it would be very pleasing to see the design used in some other
> instrument.
> >>
> >> 2.  There is a paper which is worth reading to understand what SWARM is
> and what it does.  Take a careful look if you are contemplating using the
> design.
> >> http://www.worldscientific.com/doi/pdf/10.1142/S2251171716410063
> >> You can get insight from the paper without looking at the gory details of
> source codes, both bitcodes and associated software, but if you want to dig
> even deeper, sources are all shared here:
> >> http://www.github.com/sma-wideband.
> >>
> >> 3.  You are correct SWARM processes 2 GHz blocks of "usable"
> bandwidth.  The Nyquist band is somewhat wider, 2.288 GHz.   That Nyquist
> band is divided into 16,384 channels (not 1024), so in fact it exceeds
> (rather than falls short of) your requirement for at least 4096 channels.
> >>
> >> 4.  With all of the above positive aspects, now comes the
> cautionary remark: it is by no means trivial to expand SWARM from 8 dual
> polarization antennas to 16 antennas.  The X-engine would then have to
> process roughly 4x the number of baselines as for SWARM.  This may well
> push the ROACH2 too far—we struggled to meet timing on the highly utilized
> ROACH2 for SWARM (286 MHz FPGA fabric clock).
> >>
> >> We are also looking at porting SWARM to newer platforms, primarily to
> expand bandwidth in the SMA’s case, rather than number of antennas.  We
> have also studied application of CASPER-like methods to ALMA, which of
> course has far more than 8 antennas, but those studies were on paper, we
> have yet to reduce to real design. Taking SWARM as-is (8 antennas 2 GHz
> 16384 channels on ROACH2) is fairly simple.  Expanding SWARM to

Re: [casper] Application of SMA correlator design to a larger array

2017-11-08 Thread Gary, Dale E.

lution etc.
> all with a quick parameter change and a recompile.
>
> Jason Manley
> Functional Manager: DSP
> SKA-SA
>
> Cell: +27 82 662 7726
> Work: +27 21 506 7300
>
> On 07 Nov 2017, at 20:02, Jonathan Weintroub <jweintr...@cfa.harvard.edu>
> wrote:
>
> > Hi Dale,
> >
> > I’ll offer a few bullets on SWARM, the new SMA system.
> >
> > 1.  SWARM is all open source and shared via CASPER and you are welcome
> to use it as is, or develop it further to adapt it to a new application,
> indeed it would be very pleasing to see the design used in some other
> instrument.
> >
> > 2.  There is a paper which is worth reading to understand what SWARM is
> and what it does.  Take a careful look if you are contemplating using the
> design.
> > http://www.worldscientific.com/doi/pdf/10.1142/S2251171716410063
> > You can get insight from the paper without looking at the gory details of
> source codes, both bitcodes and associated software, but if you want to dig
> even deeper, sources are all shared here:
> > http://www.github.com/sma-wideband.
> >
> > 3.  You are correct SWARM processes 2 GHz blocks of "usable" bandwidth.
> The Nyquist band is somewhat wider, 2.288 GHz.   That Nyquist band is
> divided into 16,384 channels (not 1024), so in fact it exceeds (rather than
> falls short of) your requirement for at least 4096 channels.
> >
> > 4.  With all of the above positive aspects, now comes the cautionary
> remark: it is by no means trivial to expand SWARM from 8 dual polarization
> antennas to 16 antennas.  The X-engine would then have to process roughly
> 4x the number of baselines as for SWARM.  This may well push the ROACH2 too
> far—we struggled to meet timing on the highly utilized ROACH2 for SWARM
> (286 MHz FPGA fabric clock).
> >
> > We are also looking at porting SWARM to newer platforms, primarily to
> expand bandwidth in the SMA’s case, rather than number of antennas.  We
> have also studied application of CASPER-like methods to ALMA, which of
> course has far more than 8 antennas, but those studies were on paper, we
> have yet to reduce to real design. Taking SWARM as-is (8 antennas 2 GHz
> 16384 channels on ROACH2) is fairly simple.  Expanding SWARM to 16 antennas
> and/or porting to a new FPGA platform will be a significant project—the
> SWARM design may be an excellent starting point, but even so.
> >
> > SKARAB is an interesting platform but doesn’t presently support the
> appropriate ADC.  Not sure about SNAP2 I’ll leave that assessment to others.
> >
> > Best wishes.
> >
> > Jonathan
> >
> >
> >
> >> On Nov 7, 2017, at 12:29 PM, Gary, Dale E. <dale.e.g...@njit.edu>
> wrote:
> >>
> >> Dear Jonathan (and the rest of the CASPER list, in case anyone has
> additional comments),
> >>
> >> I am looking into a new project that would require processing around 2
> GHz of bandwidth on of-order 10 (but more than 8) dual-polarization
> antennas.  Our science case calls for at least 4096 frequency channels.  My
> understanding is that the SMA correlator design is for a similar bandwidth,
> for 8 dual-pol antennas, but 1024 channels or something similar. We do not
> want to spend a lot of resources on correlator design, so my question is
> whether it is possible and would it make sense to adapt the SMA design to a
> 16-antenna, dual-pol, 4096-channel system, or whether it is better (or
> necessary) to leave the ROACH-2 designs behind and move to one of the newer
> platforms?  If the latter, what digitizer bandwidths are available, and
> which board (SNAP2, Scarab, others?) would be most appropriate to a new
> project of this scope?
> >>
> >> Thanks,
> >> Dale
> >
> >
> > --
> > You received this message because you are subscribed to the Google
> Groups "casper@lists.berkeley.edu" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> an email to casper+unsubscr...@lists.berkeley.edu.
> > To post to this group, send email to casper@lists.berkeley.edu.
>
>

[casper] Application of SMA correlator design to a larger array

2017-11-07 Thread Gary, Dale E.

Dear Jonathan (and the rest of the CASPER list, in case anyone has
additional comments),

I am looking into a new project that would require processing around 2 GHz
of bandwidth on of-order 10 (but more than 8) dual-polarization antennas.
Our science case calls for at least 4096 frequency channels.  My
understanding is that the SMA correlator design is for a similar bandwidth,
for 8 dual-pol antennas, but 1024 channels or something similar. We do not
want to spend a lot of resources on correlator design, so my question is
whether it is possible and would it make sense to adapt the SMA design to a
16-antenna, dual-pol, 4096-channel system, or whether it is better (or
necessary) to leave the ROACH-2 designs behind and move to one of the newer
platforms?  If the latter, what digitizer bandwidths are available, and
which board (SNAP2, Scarab, others?) would be most appropriate to a new
project of this scope?

Thanks,
Dale
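
One quick way to gauge the step from 8 to 16 antennas is to count baselines, since the cross-multiplication load grows with the number of antenna pairs. The sketch below assumes the usual N(N-1)/2 cross-correlations, plus N autocorrelations if those are kept. The function name is illustrative, and polarization products multiply both cases by the same factor.

```python
def n_baselines(n_ant, include_autos=False):
    """Number of antenna pairs, optionally counting autocorrelations."""
    if include_autos:
        return n_ant * (n_ant + 1) // 2
    return n_ant * (n_ant - 1) // 2

for n_ant in (8, 10, 16):
    print("%2d antennas: %3d cross, %3d with autos"
          % (n_ant, n_baselines(n_ant), n_baselines(n_ant, include_autos=True)))

# Growth factor in cross-correlations when going from 8 to 16 antennas.
growth = n_baselines(16) / float(n_baselines(8))
print("growth factor 8 -> 16: %.2f" % growth)
```

Doubling the antenna count therefore roughly quadruples the number of baselines to be processed.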

Re: [casper] KAT-7 KatADC on ROACH2

2018-03-07 Thread Gary, Dale E.

Hi Yan,

This sounds like the problem we had until Matt Dexter and Dave MacMahon
determined that some KatADC registers need to be set for correct ADC
operation.  Below is the python code we use to set them:

addr = [0x0000, 0x0001, 0x0002, 0x0003, 0x0009, 0x000A, 0x000B, 0x000E, 0x000F]  # first entry reads "0x" in the archive; 0x0000 is assumed
val  = [0x7FFF, 0xBAFF, 0x007F, 0x807F, 0x03FF, 0x007F, 0x807F, 0x00FF, 0x007F]
#val  = [0x7FFF, 0xB2FF, 0x007F, 0x807F, 0x03FF, 0x007F, 0x807F, 0x00FF, 0x007F]  # 300 MHz
#if interleaved: val[4] = 0x23FF # Uncomment this line for interleaved mode
for i in range(len(addr)):
    print('Setting ADC register %04Xh to 0x%04X' % (addr[i], val[i]))
    # Program both ZDOKs (this could be made smarter if needed).
    corr.katadc.spi_write_register(self.fpga, 0, addr[i], val[i])
    corr.katadc.spi_write_register(self.fpga, 1, addr[i], val[i])
# (The corr module must be imported; self.fpga is the connected ROACH client.)

Note that at higher clock speeds, the value of the second register is
changed (commented third line), as determined by Jack Hickish.  If you want
more detailed information, you will have to consult these folks, but I am
optimistic that this will help you.  This uses the older corr python
module, but some equivalent command to spi_write_register() must exist in
the more up-to-date casperfpga module.

Regards,
Dale

On Wed, Mar 7, 2018 at 7:09 AM, 朱岩  wrote:

> Hi all,
>
> We have encountered some issues while using the KAT-7 KatADC on ROACH2.
> Does anyone have experience using the KatADC on a ROACH2 board?
>
> Below is what we got while testing the KatADC on ROACH2.
>
> The attached pictures katadc-snap-r1817*.png come from a Simulink model
> that I built to capture the sampled data from the KatADC.
> Both ZDOKs are populated with KatADC boards. The sampling clock is 1GHz
> and the input signals are all 5MHz sine waves. The FPGA clock is derived
> from ADC0.
> It is clear that there is something wrong with the ZDOK1 KatADC board.
>
> After that, I swapped the 2 cards; none of the captured signals are good
> this time, as seen in katadc-snap-r1807-swapped.png.
>
> We also tested a single board (the bad one) connected to ZDOK0, and found
> that while feeding a 1GHz sampling clock, the estimated FPGA clock speed
> from katcp_wrapper.py:est_brd_clk() is very unstable, as seen in
> bad-card-fpga-clock.png. Compare this to the good board in
> good-card-fpga-clock.png.
>
> This suggests some clock/timing issue with these boards. We also double
> checked the sampling clock and input signal; they are all correct.
>
> Then I tried lowering the sampling clock, and found that when the sampling
> clock is lower than about 950MHz, the estimated FPGA clock tends to be
> stable, but the FPGA clock speed is 1/2 of the sampling clock, not the 1/4
> that I think it should be. This was very strange to me, and might be a
> clue for further investigation.
>
> Any suggestions are welcome.
>
>
> Thanks
> Yan
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To post to this group, send email to casper@lists.berkeley.edu.

Re: [casper] KAT-7 KatADC on ROACH2

2018-04-18 Thread Gary, Dale E.

Hi Tom,

I am resending my reply of about a month ago to another, similar question.

Regards,
Dale

On Wed, Mar 7, 2018 at 8:14 AM, Gary, Dale E. <dale.e.g...@njit.edu> wrote:

> Hi Yan,
>
> This sounds like the problem we had until Matt Dexter and Dave MacMahon
> determined that some KatADC registers need to be set for correct ADC
> operation.  Below is the python code we use to set them:
>
>     addr = [0x0000, 0x0001, 0x0002, 0x0003, 0x0009, 0x000A, 0x000B,
>             0x000E, 0x000F]
>     val  = [0x7FFF, 0xBAFF, 0x007F, 0x807F, 0x03FF, 0x007F, 0x807F,
>             0x00FF, 0x007F]
>     #val  = [0x7FFF, 0xB2FF, 0x007F, 0x807F, 0x03FF, 0x007F,
>     #        0x807F, 0x00FF, 0x007F]  # 300 MHz
>     #if interleaved: val[4] = 0x23FF # Uncomment this line for interleaved mode
>     for i in range(len(addr)):
>         print('Setting ADC register %04Xh to 0x%04X' % (addr[i], val[i]))
>         # Program both ZDOKs (this could be made smarter if needed).
>         corr.katadc.spi_write_register(self.fpga, 0, addr[i], val[i])
>         corr.katadc.spi_write_register(self.fpga, 1, addr[i], val[i])
>
> Note that at higher clock speeds, the value of the second register is
> changed (commented third line), as determined by Jack Hickish.  If you want
> more detailed information, you will have to consult these folks, but I am
> optimistic that this will help you.  This uses the older corr python
> module, but some equivalent command to spi_write_register() must exist in
> the more up-to-date casperfpga module.
>
> Regards,
> Dale
>
> On Wed, Mar 7, 2018 at 7:09 AM, 朱岩 <zhu...@nao.cas.cn> wrote:
>
>> Hi all,
>>
>> We have encountered some issues while using the KAT-7 KatADC on ROACH2.
>> Does anyone have experience using the KatADC on a ROACH2 board?
>>
>> Below is what we found while testing the KatADC on ROACH2.
>>
>> The attached pictures katadc-snap-r1817*.png come from a Simulink model
>> I built to capture the sampled data from the KatADC. Both ZDOKs are
>> populated with KatADC boards. The sampling clock is 1 GHz and the input
>> signals are all 5 MHz sine waves. The FPGA clock is derived from ADC0.
>> It is clear that there is something wrong with the ZDOK1 KatADC board.
>>
>> After that, I swapped the two cards, and this time none of the captured
>> signals were good, as seen in katadc-snap-r1807-swapped.png.
>>
>> We also tested a single board (the bad one) connected to ZDOK0, and found
>> that with a 1 GHz sampling clock, the estimated FPGA clock speed from
>> katcp_wrapper.py:est_brd_clk() is very unstable, as seen in
>> bad-card-fpga-clock.png. Compare this to the good board in
>> good-card-fpga-clock.png.
>>
>> This suggests a clock/timing issue with these boards. We also
>> double-checked the sampling clock and input signal, and they are all
>> correct.
>>
>> Then I tried lowering the sampling clock, and found that when the sampling
>> clock is below about 950 MHz, the estimated FPGA clock tends to be stable,
>> but the FPGA clock speed is 1/2 of the sampling clock rather than the 1/4
>> I expected. This was very strange to me, and might be a clue for further
>> investigation.
>>
>> Any suggestions are welcome.
>>
>>
>> Thanks
>> Yan
>>

Re: [casper] ROACH1 est_brd_clk()

2018-04-18 Thread Gary, Dale E.

Hi Tom,

What digitizer are these using?  We have had issues with the KATADC on
ROACH2, which turns out to need some register initializations in order to
function properly, and the symptom is a bad est_brd_clk() result.

Regards,
Dale

On Wed, Apr 18, 2018 at 1:48 PM, Tom Kuiper  wrote:

> What situations could cause est_brd_clk() to give the wrong answer?
>
> I have two ROACH1s and a Valon 5007.  Every time we check the Valon it
> puts out the required clock frequency, 1020 MHz, at about +7 dBm.
>
> When I initialize the ROACHs (roach1, roach2) from an ipython command
> line, est_brd_clk() in the __init__() routine finds the expected system
> clock of 255 MHz.  When the two ROACHs are initialized as part of a server
> initialization, the first ROACH initialized passes the system clock test
> and the second fails. I've reversed the order of initialization so it's not
> a specific ROACH that passes or fails.  Then at the end of the server
> initialization, I check both system clocks and they are then both wrong.
> Then, if I shut down the server and re-initialize the ROACHs, they both
> have the right system clock.
>
> If no one has an explanation, perhaps there might be some suggestions for
> what I should try next.
>
> Thanks and warm regards,
>
> Tom
>

Re: [casper] 10 GBE Network Slowdown with Ubuntu 18.04

2018-10-03 Thread Gary, Dale E.

Hi All,

I thought I would send an update to this problem, which still persists.
Jonathan's suggestion did not seem to work, since each ethernet interface
does not send packets to multiple processors.  If I specify two cpus in the
SMP_AFFINITY files, the board sends to only one of them.  Also, I removed
irqbalance as Jean suggested, but that had no effect.
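
For reference, the value written to /proc/irq/<N>/smp_affinity is a
hexadecimal CPU bitmask, with bit n standing for cpu n.  A minimal sketch of
computing that mask from a list of CPU numbers follows; the function name is
hypothetical, and the comma-separated 32-bit grouping for masks wider than 32
CPUs follows the way the kernel prints such masks.

```python
def cpus_to_affinity_mask(cpus):
    """Return the smp_affinity hex string selecting the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # bit n corresponds to cpu n
    hex_digits = "%x" % mask
    # Split into comma-separated 32-bit words (8 hex digits each),
    # most significant word first.
    groups = []
    while len(hex_digits) > 8:
        groups.insert(0, hex_digits[-8:])
        hex_digits = hex_digits[:-8]
    groups.insert(0, hex_digits)
    return ",".join(groups)
```

For example, cpus_to_affinity_mask([15]) gives "8000", and
cpus_to_affinity_mask([30, 31]) gives "c0000000", a mask naming two cpus.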

I wrote a python script to read and plot the number of packets handled by
each interface from /proc/net/softnet_stat once per second, and then
started two packet-reading processes on different cpus.  The attached file
is a good example of what I find.  The packet-readers run normally until
about 335 s in, and then the number of packets on both interfaces suddenly
drops by about 30,000, and the packet readers dutifully complain that they
are getting too few packets per accumulation.  At about 360 s, I killed one
of the packet-reader processes, and the number of packets on the interface
it was reading jumps immediately up to normal.  It is interesting that the
*other* interface also shows more packets arriving, but not up to normal.
After killing the second process, all is well again.  When this process is
repeated, the timing of the failures changes, but seems always to be longer
than 5 minutes--I'm not sure I ever saw a failure within the first 5
minutes.

This seems to confirm that

   1. The interfaces are running fine, and it is the act of reading them
   that somehow is associated with the problem
   2. The failure is sudden, and leads to a lower, but stable number of
   packets being handled by the interface.
   3. The failure is usually, but not always, on both interfaces at the
   same time.
   4. Killing the process brings the packets back, without resetting
   anything else.
   5. The probability of failure seems to be near 0 within the first 5
   minutes, and near 1 by 10 minutes, yet the timing of the glitch is quite
   random between those limits.
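
The once-per-second monitoring described above can be sketched roughly as
follows.  This is not the actual script, only an illustration: the function
names are hypothetical, the row index is taken as the cpu number (which
assumes all cpus are online), and only the first three hexadecimal columns of
/proc/net/softnet_stat (packets processed, dropped, and time squeezes) are
used.  With each interface's interrupts steered to one cpu, the per-cpu count
stands in for the per-interface count.

```python
import time


def parse_softnet_stat(text):
    """Parse the contents of /proc/net/softnet_stat into one dict per cpu."""
    rows = []
    lines = [line for line in text.splitlines() if line.strip()]
    for cpu, line in enumerate(lines):
        fields = line.split()
        rows.append({
            "cpu": cpu,
            "total": int(fields[0], 16),     # packets processed
            "dropped": int(fields[1], 16),   # dropped, backlog queue full
            "squeezed": int(fields[2], 16),  # budget or time ran out
        })
    return rows


def packets_between(previous, current):
    """Per-cpu packets processed between two snapshots."""
    return [cur["total"] - prev["total"]
            for prev, cur in zip(previous, current)]


def monitor(cpus, interval=1.0, samples=10, path="/proc/net/softnet_stat"):
    """Print packets per interval for the chosen cpus (Linux only)."""
    with open(path) as f:
        previous = parse_softnet_stat(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            current = parse_softnet_stat(f.read())
        delta = packets_between(previous, current)
        print(" ".join("cpu%d=%d" % (c, delta[c]) for c in cpus))
        previous = current
```

Calling monitor([30, 31]) would print one line per second for the two cpus
handling the NIC interrupts; the counts could equally be appended to a list
and plotted.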

Note that we have plenty of resources (top shows cpu idle time on the
packet-reading processes is near 50%, and on the packet-handling processes
is 75%, with 25% si).  Memory usage is also minuscule compared to the 65 GB
available.

So far, that is all I have.  The myricom folks (CSPi) have opened a ticket,
but so far have not had any suggestions.  They did say that they were about
to embark on some tests of Ubuntu 18.04 compatibility, so perhaps they will
find something.  Meanwhile, we have no solution.

Regards,
Dale

On Wed, Oct 3, 2018 at 4:19 AM Jean Borsenberger 
wrote:

> First, sorry for the delay; I was away for a while.
>
> We do not use Ubuntu but Debian, but the two distributions are in fact two
> flavours of the same thing.
>
> On each machine we manage a UDP download link from a ROACH2. The ROACH2
> does nothing but add an 8-byte counter to each 8K data block. That way we
> can precisely measure the packet loss rate.
>
> First, notice that driver writers for 10GbE NICs found it wise to split
> the interrupts over either six or seven IRQ numbers. Why? I do not have
> the slightest idea. Then comes the worst part. We run at 1.1 Gsamples/sec,
> so we are very close to the link capacity. With the standard setup the
> loss may rise to 5%, with a usual value of 1%. I suspect a cache problem.
> On our 8 (real) core system, the IRQs can be split across the cores, but
> each core has to be aware of what the others are currently doing. This
> coherence issue may take some cycles. Acting on that guess, I assigned all
> IRQs of a given interface to a single core (/proc/irq/xx/smp_affinity).
> At the same time I removed everything else from this core (smp affinity
> and taskset). It worked: the loss is now around 10^-6, which we find
> acceptable.
>
> The new plague is named irqbalance, which takes over from you on IRQ
> assignment. To remove it:
>
> aptitude remove irqbalance
>
> That's harmless.
>
>
> You may also wish to get rid of systemd, which takes cycles for a
> questionable purpose, but doing so is hazardous. Anyhow, we took this
> option.
>
> systemd gets worse at each OS release.
>
>
> Jean Borsenberger
>
> On 22/09/18 01:56 PM, Gary, Dale E. wrote:
>
> Hi All,
>
> We are running a multi-core (32-core) system at Owens Valley that has a
> dual-port Myricom 10GBe NIC.  We ran the system very successfully under
> Ubuntu 12.04 for more than 1 year, but after upgrading to Ubuntu 18.04
> (generic) we are now experiencing reliability problems, despite the tuning
> parameters and smp_affinity adjustments being (as far as we can tell) the
> same.  The problem seems to be somehow associated with system load and
> packet handling rather than receipt of the packets by the interface, since
> things run fine for up to 10 minutes, then start to deteriorate.  In
> researching this, I see various other flavors of Ubuntu (low-latency,
> realtime, rt, preempt) that make kernel adjustments that might help, but I
> am not able to te

Re: [casper] 10 GBE Network Slowdown with Ubuntu 18.04

2018-10-08 Thread Gary, Dale E.

Hi Dave,

I just had some time to investigate your comment, and run the script you
linked to, and indeed there may be some problem here.  The output of the
script is shown below, which seems to indicate that all of the NICs are
connected to cpus 0-7 (socket 0).  We steered the interrupts (47 and 48) to
cpus 15 and 31 as a test, although we were using 30 and 31 in an earlier
test. However, I just tried assigning the NICs to cpus 14 and 15, and the
packet reading to cpus 12 and 13, and the problem is unchanged.  Does this
setup satisfy your expectation as the correct one?

Thanks,
Dale

Sockets/cores to CPUs:
socket 0, core  0 -> cpu  0
socket 0, core  0 -> cpu  8
socket 0, core  1 -> cpu  1
socket 0, core  1 -> cpu  9
socket 0, core  2 -> cpu  2
socket 0, core  2 -> cpu 10
socket 0, core  3 -> cpu  3
socket 0, core  3 -> cpu 11
socket 0, core  4 -> cpu  4
socket 0, core  4 -> cpu 12
socket 0, core  5 -> cpu  5
socket 0, core  5 -> cpu 13
socket 0, core  6 -> cpu  6
socket 0, core  6 -> cpu 14
socket 0, core  7 -> cpu  7
socket 0, core  7 -> cpu 15
socket 1, core  0 -> cpu 16
socket 1, core  0 -> cpu 24
socket 1, core  1 -> cpu 17
socket 1, core  1 -> cpu 25
socket 1, core  2 -> cpu 18
socket 1, core  2 -> cpu 26
socket 1, core  3 -> cpu 19
socket 1, core  3 -> cpu 27
socket 1, core  4 -> cpu 20
socket 1, core  4 -> cpu 28
socket 1, core  5 -> cpu 21
socket 1, core  5 -> cpu 29
socket 1, core  6 -> cpu 22
socket 1, core  6 -> cpu 30
socket 1, core  7 -> cpu 23
socket 1, core  7 -> cpu 31

Ethernet interfaces to CPUs:
eth0: 0-7
eth1: 0-7
eth2: 0-7
eth3: 0-7
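
The "Ethernet interfaces to CPUs" lines above use the kernel's cpu-list
notation (for example "0-7").  As a minimal sketch, assuming the same
information is available from /sys/class/net/<iface>/device/local_cpulist,
one can expand such a list and check which of the cpus chosen for interrupts
or packet reading fall outside it.  The function names here are hypothetical.

```python
def parse_cpulist(text):
    """Expand a cpu list such as '0-7' or '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_local_cpus(iface):
    """Read the cpus local to a NIC from sysfs (Linux, PCI devices only)."""
    path = "/sys/class/net/%s/device/local_cpulist" % iface
    with open(path) as f:
        return parse_cpulist(f.read())


def non_local_cpus(chosen, local):
    """Return the chosen cpus that are not in the NIC's local set, sorted."""
    return sorted(set(chosen) - set(local))
```

With the output above, non_local_cpus([12, 13, 14, 15], parse_cpulist("0-7"))
returns all four cpus, i.e. none of the cpus used in the latest test are in
the reported 0-7 list.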


On Wed, Oct 3, 2018 at 10:12 PM Dale Gary  wrote:

> Hi Dave,
>
> When you say multi-socket, do you mean multi-processor?  There are two 16
> core AMD Opteron processors.  We are using taskset, and have tried every
> permutation we could think of.  I’ll check out the script.
>
> Thanks,
> Dale
>
> Sent from my iPhone
>
> On Oct 3, 2018, at 9:39 PM, David MacMahon  wrote:
>
> Hi, Dale,
>
> Is this a multi-socket system?  If so, are you using "numactl" or
> "taskset" to bind the packet reading processes to CPU(s) on the same socket
> that the NIC is connected to?  Are you sure you are sending the NIC
> interrupts to CPU(s) on the socket that the NIC is connected to?
>
> FWIW, the Hashpipe program includes a script (hashpipe_topology.sh) that
> will summarize the NUMA topology of a system vis a vis network cards and/or
> GPUs.
>
>
> https://github.com/david-macmahon/hashpipe/blob/master/src/hashpipe_topology.sh
>
> HTH,
> Dave
>
> On Oct 3, 2018, at 14:22, Gary, Dale E.  wrote:
>
> Hi All,
>
> I thought I would send an update to this problem, which still persists.
> Jonathan's suggestion did not seem to work, since each ethernet interface
> does not send packets to multiple processors.  If I specify two cpus in the
> SMP_AFFINITY files, the board sends to only one of them.  Also, I removed
> irqbalance as Jean suggested, but that had no effect.
>
> I wrote a python script to read and plot the number of packets handled by
> each interface from /proc/net/softnet_stat once per second, and then
> started two packet-reading processes on different cpus.  The attached file
> is a good example of what I find.  The packet-readers run normally until
> about 335 s in, and then the number of packets on both interfaces suddenly
> drops by about 30,000, and the packet readers dutifully complain that they
> are getting too few packets per accumulation.  At about 360 s, I killed one
> of the packet-reader processes, and the number of packets on the interface
> it was reading jumps immediately up to normal.  It is interesting that the
> *other* interface also shows more packets arriving, but not up to normal.
> After killing the second process, all is well again.  When this process is
> repeated, the timing of the failures changes, but seems always to be longer
> than 5 minutes--I'm not sure I ever saw a failure within the first 5
> minutes.
>
> This seems to confirm that
>
>    1. The interfaces are running fine, and it is the act of reading them
>    that somehow is associated with the problem
>    2. The failure is sudden, and leads to a lower, but stable number of
>    packets being handled by the interface.
>    3. The failure is usually, but not always, on both interfaces at the
>    same time.
>    4. Killing the process brings the packets back, without resetting
>    anything else.
>    5. The probability of failure seems to be near 0 within the first 5
>    minutes, and near 1 by 10 minutes, yet the timing of the glitch is quite
>    random between those limits.
>
> Note that we have plenty of resources (top shows cpu idle time on the
&

[casper] 10 GBE Network Slowdown with Ubuntu 18.04

2018-09-22 Thread Gary, Dale E.

Hi All,

We are running a multi-core (32-core) system at Owens Valley that has a
dual-port Myricom 10GBe NIC.  We ran the system very successfully under
Ubuntu 12.04 for more than 1 year, but after upgrading to Ubuntu 18.04
(generic) we are now experiencing reliability problems, despite the tuning
parameters and smp_affinity adjustments being (as far as we can tell) the
same.  The problem seems to be somehow associated with system load and
packet handling rather than receipt of the packets by the interface, since
things run fine for up to 10 minutes, then start to deteriorate.  In
researching this, I see various other flavors of Ubuntu (low-latency,
realtime, rt, preempt) that make kernel adjustments that might help, but I
am not able to tell from the descriptions which if any of these might
address the problem.  Has anyone had a similar experience, and/or have
advice about what options we might have?  I am using the myri10ge driver
that came with Ubuntu 18.04.

One thing I might mention is that I ran this script:
https://github.com/majek/dump/blob/master/how-to-receive-a-packet/softnet.sh,
and find a certain number of "squeezed" packets, which are "# of times
ksoftirq ran out of netdev_budget or time slice with work remaining."  I
don't know if this is something to worry about?  The output of softnet.sh
is like this.  Note we had the NIC assigned to cpus 1 and 2, but changed to
30 and 31.

user@dpp:~$ ./softnet.sh
cpu       total  dropped  squeezed  collision  rps  flow_limit
  0     1328082        0      3729          0    0           0
  1  1716559544        0   7208929          0    0           0
  2  1793125842        0   8158475          0    0           0
  3     1069150        0      3714          0    0           0
  4     1400569        0      5443          0    0           0
  5     6988379        0      5985          0    0           0
  6     6466640        0      5950          0    0           0
  7     1070366        0      4097          0    0           0
  8      878808        0      3906          0    0           0
  9      933541        0      4207          0    0           0
 10        1229        0         4          0    0           0
 11         848        0         0          0    0           0
 12        1310        0         5          0    0           0
 13         662        0         0          0    0           0
 14        1304        0         2          0    0           0
 15         680        0         3          0    0           0
 16        1817        0         2          0    0           0
 17         648        0         3          0    0           0
 18         742        0         2          0    0           0
 19         605        0         2          0    0           0
 20         690        0         2          0    0           0
 21         536        0         3          0    0           0
 22         860        0         0          0    0           0
 23         493        0         3          0    0           0
 24        1657        0         4          0    0           0
 25     9244642        0      1487          0    0           0
 26         912        0         2          0    0           0
 27         287        0         0          0    0           0
 28     5252171        0       877          0    0           0
 29         339        0         3          0    0           0
 30  3378532079        0  17299324          0    0           0
 31  3390959304        0  16129528          0    0           0

Thanks,
Dale


[casper] CASPER design question

2018-12-24 Thread Gary, Dale E.

Hi Casperites,

I received an inquiry from Neal Hurlburt, from Lockheed Martin (Palo Alto),
who wants to look into the possibility of using CASPER hardware and tools
to design a prototype system for interferometry of optical signals.  So far
the specifications are quite fluid (see message below)--they would like a
500 MHz bandwidth, but would accept as low as 50 MHz.  However, he is
looking for 60-100 inputs, so I guess he is driven to the lower end of this
bandwidth range.  Neal is not an engineer, but he can relay some
suggestions from this list to engineering support at Lockheed.  If anyone
has some suggestions for Neal, especially any projects of a similar nature,
he would be delighted to hear from you.  I added his name to the cc list.

Many thanks,
Dale

Hi Dale,

Thanks for the info on your experience with CASPER. At a minimum, we are
looking for a system that can digitize and correlate about 60 elements at
50 to 500Mhz each. It sounds like CASPER is a good starting point. Who do
you suggest we talk to at Berkeley?

Thanks,
Neal


Neal Hurlburt
Manager, Space Science & Software
Space Science & Instrumentation
Lockheed Martin ATC
650-354-5504
hurlb...@lmsal.com


Re: [casper] seeking high accuracy GPS disciplined time/frequency standards ?

2019-03-07 Thread Gary, Dale E.

Hi Dan,

We use a PRS10 Rubidium clock from Stanford Research Systems (
https://www.thinksrs.com/products/prs10.html).  This takes a 1 pps from a
GPS clock, and produces a highly accurate (and stable) 10 MHz signal, which
we use to lock all of our time and frequency sources.  You can read about
the precision at the above link and decide if it is good enough.

Regards,
Dale

On Thu, Mar 7, 2019 at 11:51 AM Dan Werthimer  wrote:

>
>
> in a somewhat related question.
>
> can anybody give us advice about GPS disciplined oscillators time/freq
> standards that are very accurate wrt UTC?
> we don't want to buy a hydrogen maser (too pricy).
> we have been looking at a company called endrun technologies that sell
> time/freq standards accurate to about +-10 ns wrt UTC.
> they might be able to match a pair of them that track each other +- 3ns
> RMS.
> we need a pair of well matched time/freq standards for coincidence time
> stamping/correlation between two observatories for our panoseti experiment.
> (the two optical/IR observatories are 500 km apart, and don't have
> masers).
>
> thanks for any advice on this.
>
> btw, we are using white rabbit for time/frequency distribution over 1 Gbe
> bidi fiber,
> and we put the white rabbit hardware (VCO and DAC chips) and software on
> our FPGA boards for this project.
> (we made our own FPGA boards with white rabbit and kintex7 because we need
> a few thousand boards)
> white rabbit does sub-ns accuracy in timing distribution - some white
> rabbit users have measured 30 ps RMS.
>
> best wishes,
>
> dan
>
>
> On Thu, Mar 7, 2019 at 12:05 AM Michael Inggs  wrote:
>
>> Hi Franco
>>
>> Simon Lewis in the RRSG at UCT has White Rabbit hardware and expertise
>> (PhD incubating). Snag is that it runs on 1GE Fibre. We also have a GPS
>> version. The former gives sub ns precision, the latter about 4 ns rms. Send
>> me a message off line and I can link you. We also have a scheme of aligning
>> a trigger to both a local MHz clock and the 1 pps. This is all open source
>> hardware and software.
>>
>> Regards
>>
>> On Thu, 7 Mar 2019 at 08:52, James Smith  wrote:
>>
>>> Hello Franco,
>>>
>>> As I understand it, PTP wasn't terribly useful in our application
>>> (though I wasn't involved with this directly). You can probably sync the
>>> little Linux instance that runs on the ROACH2, but getting the time
>>> information onto your FPGA may prove somewhat tricky.
>>>
>>> Are you using an ADC card in the ROACH2? Or is the data digitised
>>> separately?
>>>
>>> What we've done with ROACH and ROACH2 designs in the past is more or
>>> less this:
>>>
>>>    - FPGA's clock comes from a timing & frequency reference (TFR).
>>>    - ROACH2 gets a 1PPS input from the same TFR.
>>>    - In the FPGA logic there's a counter which is reset as part of the
>>>    initialisation, and some logic that starts the counter going after a set
>>>    number of 1PPS pulses (two to three, I forget exactly now).
>>>    - The output of this counter is pipelined along with the data and
>>>    then sent out as part of the SPEAD data on the 10GbE network.
>>>
>>> The idea here being that you know with a fairly high degree of precision
>>> which pulse your ROACH was initialised on. The counter that comes through
>>> on the SPEAD packet counts in FPGA clock cycles (or multiples thereof,
>>> perhaps you might want to count in spectra), and then you can use the start
>>> time to calculate the timestamp of each packet (Unix time, MJD, whichever
>>> your preferred reference is).
>>>
>>> Hope that helps.
>>>
>>> Regards,
>>> James
>>>
>>>
>>> On Wed, Mar 6, 2019 at 7:41 PM Franco  wrote:
>>>
>>>> Dear Casperites,
>>>>
>>>> I was given the task of timestamping ROACH2 spectral data in a
>>>> telescope that uses PTP (precision time protocol) as a synchronization
>>>> protocol. I understand that the ROACH's BORPH comes preloaded with NTP
>>>> (network time protocol) libraries/daemons, but PTP is preferred because
>>>> it is already in use in the telescope, and it achieves greater time
>>>> precision.
>>>>
>>>> Does somebody know if it is feasible to compile/install PTP libraries
>>>> in BORPH?
>>>>
>>>> Alternatively, we have thought of sending the ROACH the current time
>>>> through a GPIO pin using the IRIG-B timecode standard. Has anybody done
>>>> something similar in the past?
>>>>
>>>> Thanks,
>>>>
>>>> Franco


[casper] Bringing up a new ROACH2

2019-02-03 Thread Gary, Dale E.

Hi All,

We need to bring up a new ROACH2 board from scratch for the purpose of
introducing students to the tools based on the Casper 2018 tutorials.
Although I did all of this 3-4 years ago at our observatory, I need to
refresh my memory, and I would like to take advantage of any new
developments.  Are the instructions (ca 2015) still correct as listed
here:
https://casper.ssl.berkeley.edu/wiki/ROACH-2_Revision_2#Updating_tcpborphserver_3_etc.
?  If there are updated instructions, please let me know.

Thanks,
Dale Gary


Re: [casper] Bringing up a new ROACH2

2019-02-04 Thread Gary, Dale E.

Hi James,

Thanks for the reply.  The ROACHes at the observatory are netbooted, and I
do not have an SD card for booting, so would have to create one.  I suppose
I could clone the uboot system from the observatory to campus and try to
set up tftp etc. -- that is pretty much what I intended to do using the
instructions I had linked to, but my question is whether there are new
instructions somewhere that I should use.  It seems the answer is no, so I
will proceed with the old instructions.

Thanks,
Dale

On Mon, Feb 4, 2019 at 5:37 AM James Smith  wrote:

> Hello Dale,
>
> Do you have any existing, functioning ROACH2 boards? Are they booting from
> an SD card or from network? If it's an SD card then it's a simple
> matter to copy the one to the other; if from network then the most you may
> need to do is update your dnsmasq.conf to assign an IP address to the new
> ROACH2 (its MAC address should be the same as existing ones but the last
> few digits reflect the serial number of the ROACH2).
>
> The instructions linked look as though they should work but AFAIK it's
> been a while since anyone put in a new ROACH2 anywhere, so YMMV. At SKA,
> we've mostly moved on to SKARAB. However, there's quite a bit of
> organisational memory, so if you battle, just shout.
>
> Regards,
> James
>
>
> On Sun, Feb 3, 2019 at 8:51 PM Gary, Dale E.  wrote:
>
>> Hi All,
>>
>> We need to bring up a new ROACH2 board from scratch for the purpose of
>> introducing students to the tools based on the Casper 2018 tutorials.
>> Although I did all of this 3-4 years ago at our observatory, I need to
>> refresh my memory, and I would like to take advantage of any new
>> developments.  Are the instructions (ca 2015) still correct as listed
>> here:
>> https://casper.ssl.berkeley.edu/wiki/ROACH-2_Revision_2#Updating_tcpborphserver_3_etc.
>> ?  If there are updated instructions, please let me know.
>>
>> Thanks,
>> Dale Gary
>>

Re: [casper] Weird python problem

2020-09-04 Thread Gary, Dale E.

It is always good practice to Google the error.  Others have met with it,
e.g.
https://stackoverflow.com/questions/12805044/python-console-importerror-pyunicodeucs4-fromstring
.
This might help (suggests there is a .pyc file compiled under another
version of Python), or check other links on a Google search.

Regards,
Dale
Dale E. Gary
Distinguished Professor, Physics
Center for Solar-Terrestrial Research
Director, Owens Valley Solar Array
dg...@njit.edu • (973) 642-7878


On Fri, Sep 4, 2020 at 6:43 AM Heystek Grobler 
wrote:

> Hey Mike
>
> Thanks for your reply. I have tried that as well. Even if I run it
> straight out of the home directory it still gives the same error.  This
> has been frustrating me for the last week.
>
> Heystek
>
> On 04 Sep 2020, at 12:40, Michael D'Cruze 
> wrote:
>
> Hi Heystek,
>
> I haven’t seen this before. Is there anything in your working directory
> that the import statement could be confusing with your desired module?
> Perhaps try changing your working directory.
>
> GL
> Mike
>
> *From:* Heystek Grobler [mailto:heystekgrob...@gmail.com
> ]
> *Sent:* 04 September 2020 11:37
> *To:* 'Siddharth Savyasachi Malu' via casper@lists.berkeley.edu
> *Subject:* [casper] Weird python problem
>
> Good day everyone.
>
> I have encountered a weird problem. I have a python script that makes use
> of pylab. When importing pylab, I get this error:
>
> PyUnicodeUCS4_FromString
>
>
> I have tried to use import matplotlib.pyplot as plt and I get the same
> error. I have googled it, but nothing seems to help.
>
>
> I am using python 2.7
>
>
> Thanks for the help.
>
>
> Heystek

Re: [casper] Help with ROACH2 uboot issue

2022-08-24 Thread Gary, Dale E

Marc, I tried soloboot and it did let me log in, but I couldn't seem to
create a mount point.  The filesystem shows

Filesystem      1K-blocks  Used Available Use% Mounted on
/dev/root            6641  6641         0 100% /
/dev/mtdblock2      49152  1352     47800   3% /usr
tmpfs              387004     8    386996   0% /var

and I was able to make a directory /usr/local/ovsasrv, but when I tried to
mount into that I got

~ # mount 192.100.16.206/srv/roach2_boot/etch /usr/local/ovsasrv
mount: mounting 192.100.16.206/srv/roach2_boot/etch on /usr/local/ovsasrv
failed: No such file or directory


I also tried to mount into /tmp, /usr/tmp, and other places but got the
same result.  I created an empty file in /usr/local/ovsasrv thinking I
would get a different error like "directory not empty" but it still gave
the above error.  Can you suggest something else to test local mounting in
soloboot?
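
For what it is worth, the source in that mount command is written without the colon between the server address and the export path, so mount probably treated it as a local path, which would explain the "No such file or directory" message. A small sketch of a check for the host:/path form follows; is_nfs_spec is a made-up helper name, not a busybox command.

```shell
# Return success only if the argument looks like an NFS source of the
# form host:/path (hypothetical helper, for illustration only).
is_nfs_spec() {
    case "$1" in
        ?*:/*) return 0 ;;
        *)     return 1 ;;
    esac
}

for src in 192.100.16.206/srv/roach2_boot/etch 192.100.16.206:/srv/roach2_boot/etch; do
    if is_nfs_spec "$src"; then
        echo "$src: looks like host:/path"
    else
        echo "$src: missing ':' between host and path"
    fi
done
```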

Regards,
Dale

On Wed, Aug 24, 2022 at 3:05 AM Marc  wrote:

> On Wed, Aug 24, 2022 at 12:57 AM Gary, Dale E 
> wrote:
> >
> > Hi Dave,
> >
> > There is only one interface on the server.  I am not actually sure how
> it works with the private network, but it is not a dedicated NIC.  I was
> using dnsmasq on the old server but when I changed to the new one I was
> trying not to use it.  Since I can mount the share on another client on the
> private network I don't understand why the ROACH can't do it.  I tried the
> nolock option also, and specifying vers=3.  Nothing makes any difference.
>
> Perhaps try soloboot the roach and try to mount the nfs share in some
> random subdirectory - maybe the error messages are more informative
> there ? I'd try to go with nolock and an earlier NFS version across
> UDP, at least initially. That might give you an idea what
> versions/options the roaches understand.
>
> regards
>
> marc
>
> --
> https://katfs.kat.ac.za/~marc/
>

Re: [casper] Help with ROACH2 uboot issue

2022-08-24 Thread Gary, Dale E

Hi Marc and Dave,

Here are responses to your suggestions while in soloboot:
> showmount -a 192.100.16.206
There is no showmount command in the soloboot.
> running a firewall?
Yes, I am running ufw on the server.  Its status gives:

dgary@ovsa:~$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW IN    Anywhere
80/tcp                     ALLOW IN    Anywhere
443/tcp                    ALLOW IN    Anywhere
Anywhere                   ALLOW IN    192.168.24.0/24
22/tcp (v6)                ALLOW IN    Anywhere (v6)
80/tcp (v6)                ALLOW IN    Anywhere (v6)
443/tcp (v6)               ALLOW IN    Anywhere (v6)

Maybe this disabling of routed connections is related to the problem?  I
then edited the ufw sysctl.conf and restarted the firewall, and the status
is

dgary@ovsa:~$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), deny (routed)

However, the default is still deny.  The example I found for setting an
allow rule is

ufw route allow in on eth1 out on eth2

but there is only one interface so I can't make such a rule.  I am not sure
how Rick Hobbs at Caltech has this set up.  I'll send him a request about
it.

> cat /proc/filesystems
I see nfs and nfs4 in the list, so I assume that means it is there and
known.
> mount -t nfs 192.100.16.206:/srv/roach2_boot/etch /usr/local/ovsasrv
Here I get something potentially useful:

svc: failed to register lockdv1 RPC service (errno 111).
mount: mounting 192.100.16.206:/srv/roach2_boot/etch on /usr/local/ovsasrv
failed: Connection refused

In the server's syslog I get

Aug 24 19:54:30 ovsa rpc.mountd[1022]: authenticated mount request from
192.168.24.121:714 for /mnt/data0/srv/roach2_boot/etch
(/mnt/data0/srv/roach2_boot)

in response, so it did send the mount request.  I then tried setting
options for tcp, vers=3, and got the same error.  Finally, I tried vers=4
and got

mount: NFSv4 not supported
mount: mounting 192.100.16.206:/srv/roach2_boot/etch on /usr/local/ovsasrv
failed
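
The lockd message (errno 111 is "Connection refused") usually means there is no portmapper running on the client for the lock daemon to register with, in which case the nolock option is the usual workaround. A sketch of the command to try next, assuming the busybox mount in soloboot accepts the nolock, vers= and tcp/udp options; nfs_mount_cmd is a made-up helper that only prints the command so it can be inspected before it is run.

```shell
# Print (not run) an NFS mount command with locking disabled.
# Hypothetical helper; arguments:
#   $1 = server:/export   $2 = mount point
#   $3 = NFS version      $4 = transport (tcp or udp)
nfs_mount_cmd() {
    echo "mount -t nfs -o nolock,vers=$3,$4 $1 $2"
}

nfs_mount_cmd 192.100.16.206:/srv/roach2_boot/etch /usr/local/ovsasrv 3 udp
# prints: mount -t nfs -o nolock,vers=3,udp 192.100.16.206:/srv/roach2_boot/etch /usr/local/ovsasrv
```

Stepping through versions 3 and 2 and both transports this way should show which combinations the ROACH kernel and the server have in common.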


All of this does indeed seem to point to the firewall as the culprit.  But
what bothers me is that I *can* mount the share on an Ubuntu machine on the
private network, and that has to negotiate this same thicket, so I still
have doubts.

Anyway, I hope we are starting to narrow the problem.

Regards,
Dale

On Wed, Aug 24, 2022 at 1:10 PM Marc  wrote:

> Hello
>
> So first check if the kernel you are booting knows about nfs by doing
> a "cat /proc/filesystems".
>
> If so then try specifying the filesystem type explicitly, using a "-t
> nfs". If I recall correctly the mount executable is busybox, so might
> not be as featureful as the mount we normally encounter.
>
> regards
>
> marc
>
> On Wed, Aug 24, 2022 at 3:54 PM Gary, Dale E  wrote:
> >
> > Marc, I tried soloboot and it did let me log in, but I couldn't seem to
> create a mount point.  The filesystem shows
> >
> > Filesystem      1K-blocks  Used Available Use% Mounted on
> > /dev/root            6641  6641         0 100% /
> > /dev/mtdblock2      49152  1352     47800   3% /usr
> > tmpfs              387004     8    386996   0% /var
> >
> > and I was able to make a directory /usr/local/ovsasrv, but when I tried
> to mount into that I got
> >
> > ~ # mount 192.100.16.206/srv/roach2_boot/etch /usr/local/ovsasrv
> > mount: mounting 192.100.16.206/srv/roach2_boot/etch on
> /usr/local/ovsasrv failed: No such file or directory
> >
> >
> > I also tried to mount into /tmp, /usr/tmp, and other places but got the
> same result.  I created an empty file in /usr/local/ovsasrv thinking I
> would get a different error like "directory not empty" but it still gave
> the above error.  Can you suggest something else to test local mounting in
> soloboot?
> >
> > Regards,
> > Dale
> >
> > On Wed, Aug 24, 2022 at 3:05 AM Marc  wrote:
> >>
> >> On Wed, Aug 24, 2022 at 12:57 AM Gary, Dale E 
> wrote:
> >> >
> >> > Hi Dave,
> >> >
> >> > There is only one interface on the server.  I am not actually sure
> how it works with the private network, but it is not a dedicated NIC.  I
> was using dnsmasq on the old server but when I changed to the new one I was
> trying not to use it.  Since I can mount the share on another client on the
> private network I don't understand why the ROACH can't do it.  I tried the
> nolock option also, and specifying vers=3.  Nothing makes any difference.
> >>
> >> Perhaps try soloboot the roach

Re: [casper] Help with ROACH2 uboot issue

2022-08-23 Thread Gary, Dale E

Thanks for the reply Marc.  I am able to mount the share from another
machine on the network without problems.  I interrupted the uboot and typed
printenv, which produces this output:

Hit any key to stop autoboot:  0
=> printenv
baudrate=115200
bootargs=console=ttyS0,115200
bootcmd=run netboot
bootdelay=2
bootfile=uImage
clearenv=protect off fff4 fff7;era fff4 fff7;protect on
fff4 fff7
ethact=ppc_4xx_eth0
ethaddr=02:44:01:02:05:06
hostname=roach1
initboot=echo; echo type  run netboot  to boot via dhcp+tftp+nfs; echo type
 run soloboot  to run from flash independent of network; echo
mem=524264k
mmcboot=setenv bootargs ${bootargs} rootdelay=2 root=b301;bootm 0xf800
netboot=dhcp 0x400; setenv bootargs ${bootargs} root='/dev/nfs'
rootpath=${rootpath} ip=dhcp; bootm 0x400
netdev=eth0
newkernel=run yget; run writekernel
newroot=run yget; run writeroot
newuboot=run yget; run writeuboot
preboot=run initboot
rootpath=/dev/mtdblock1
soloboot=setenv bootargs ${bootargs} root=/dev/mtdblock1; bootm 0xf800
tftpkernel=dhcp; tftp 0x400 uImage; run writekernel
tftproot=dhcp; tftp 0x400 romfs; run writeroot
tftpuboot=dhcp; tftp 0x400 u-boot.bin; run writeuboot
usbboot=setenv bootargs ${bootargs} rootdelay=8 root='/dev/sda1 rw'; bootm
0xf800
ver=U-Boot 2011.06-rc2-0-gd422dc0-dirty (Nov 26 2012 - 12:08:53)
writekernel=era 0xf800 0xf83f; cp.b 0x400 0xf800 ${filesize}
writeroot=era 0xf840 0xfc3f; cp.b 0x400 0xf840 ${filesize}
writeuboot=protect off 0xfff8 0x; era 0xfff8 0x;
cp.b 0x400 0xfff8 ${filesize}; protect on 0xfff8 0x
yget=loady 0x400

Environment size: 1516/8187 bytes


The line "netboot=dhcp 0x400; setenv bootargs ${bootargs}
root='/dev/nfs' rootpath=${rootpath} ip=dhcp; bootm" looks promising, so
are you saying I should edit this line and save it using saveenv?  Are
there instructions anywhere on how to do this?

Thanks,
Dale

On Tue, Aug 23, 2022 at 3:40 PM Marc  wrote:

> Hi
>
> So it might be worth checking if your new server actually exports the
> NFS filesystem - "showmount -e localhost" might be a first start, but
> perhaps a better approach would be to try and mount the filesystem on
> a normal PC ?
>
> Other things to try might be the "nolock" option - I am hazy on the
> detail, but I don't think the roaches are running a lock manager
>
> The roaches get their command line via uboot, which can be
> interrupted, and the commandline inspected by typing "printenv". I
> remember the variables being nested - so the "bootargs" variable would
> contain references to other ${variables}. Be careful to escape them
> properly, and only use saveenv once you are sure they do what you want.
>
> All the best and regards
>
> marc
>
>
>
> On Tue, Aug 23, 2022 at 7:08 PM Gary, Dale E  wrote:
> >
> > Hi All,
> >
> > I upgraded my remote boot server to a new machine running ubuntu 20.04,
> and although I tried to set everything up the same for remote booting the
> ROACH2s, the process fails as shown in the attached file because the root
> file system could not be mounted.  I found one suggestion on the web to
> edit the Linux/PPC load configuration to add vers=4,tcp to the line, i.e.
> it might look like this:
> >
> > root=/dev/nfs nfsroot=192.100.16.206:/srv/roach2_boot/etch,vers=4,tcp
> ip=dhcp
> >
> > but I cannot find any file where that configuration is set.  There is a
> file in /srv/roach2_boot called pxelinux.cfg that looks promising, but it
> is an empty file.  Am I on the right track?
> >
> > Any suggestions?
> > Thanks,
> > Dale
> >
> >
>
>
>
> --
> https://katfs.kat.ac.za/~marc/
>

[casper] Help with ROACH2 uboot issue

2022-08-23 Thread Gary, Dale E

Hi All,

I upgraded my remote boot server to a new machine running ubuntu 20.04, and
although I tried to set everything up the same for remote booting the
ROACH2s, the process fails as shown in the attached file because the root
file system could not be mounted.  I found one suggestion on the web to
edit the Linux/PPC load configuration to add vers=4,tcp to the line, i.e.
it might look like this:

root=/dev/nfs nfsroot=192.100.16.206:/srv/roach2_boot/etch,vers=4,tcp
ip=dhcp

but I cannot find any file where that configuration is set.  There is a
file in /srv/roach2_boot called pxelinux.cfg that looks promising, but it
is an empty file.  Am I on the right track?

Any suggestions?
Thanks,
Dale

U-Boot 2011.06-rc2-0-gd422dc0-dirty (Nov 26 2012 - 12:08:53)

CPU:   AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
   No Security/Kasumi support
   Bootstrap Option C - Boot ROM Location EBC (16 bits)
   32 kB I-Cache 32 kB D-Cache
Board: ROACH2
I2C:   ready
DRAM:  512 MiB
Flash: 128 MiB
In:    serial
Out:   serial
Err:   serial
CPLD:  2.1
USB:   Host(int phy)
SN:    ROACH2.2 batch=D#5#6 software fixups match
MAC:   02:44:01:02:05:06
DTT:   1 is 27 C
DTT:   2 is 27 C
Net:   ppc_4xx_eth0
Sensors Config
type run netboot to boot via dhcp+tftp+nfs
type run soloboot to run from flash independent of network

Hit any key to stop autoboot:  0
Waiting for PHY auto negotiation to complete... done
ENET Speed is 1000 Mbps - FULL duplex connection (EMAC0)
BOOTP broadcast 1
DHCP client bound to address 192.168.24.121
Using ppc_4xx_eth0 device
TFTP from server 192.100.16.206; our IP address is 192.168.24.121; sending 
through gateway 192.168.24.1
Filename 'uImage-roach2'.
Load address: 0x400
Loading: #
 #
 ###
done
Bytes transferred = 2231549 (220cfd hex)
## Booting kernel from Legacy Image at 0400 ...
   Image Name:   Linux-3.7.0-rc2+
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    2231485 Bytes = 2.1 MiB
   Load Address: 0050
   Entry Point:  005010d4
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
CPU clock-frequency <- 0x1fca0550 (533MHz)
CPU timebase-frequency <- 0x1fca0550 (533MHz)
/plb: clock-frequency <- 7f28154 (133MHz)
/plb/opb: clock-frequency <- 3f940aa (67MHz)
/plb/opb/ebc: clock-frequency <- 3f940aa (67MHz)
/plb/opb/serial@ef600300: clock-frequency <- 54c563 (6MHz)
/plb/opb/serial@ef600400: clock-frequency <- 54c563 (6MHz)
Memory <- <0x0 0x0 0x3000> (1023MB)
ethernet0: local-mac-address <- 02:44:01:02:05:06

zImage starting: loaded at 0x0050 (sp: 0x1fe22b38)
Allocating 0x4b0508 bytes for kernel ...
gunzipping (0x <- 0x0050e000:0x00966034)...done 0x447880 bytes

Linux/PowerPC load: console=ttyS0,115200 root=/dev/nfs 
rootpath=192.100.16.206:/srv/roach2_boot/etch ip=dhcp
Finalizing device tree... flat tree at 0x973160
roach2: claiming matching platform
Using PowerPC 44x Platform machine description
Linux version 3.7.0-rc2+ (shanly@shanly-HP8710w) (gcc version 4.6.1 20110627 
(prerelease) (GCC) ) #21 Mon Nov 19 09:30:32 SAST 2012
bootconsole [udbg0] enabled
setup_arch: bootmem
arch: exit
Zone ranges:
  DMA  [mem 0x-0x2fff]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x2fff]
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 195072
Kernel command line: console=ttyS0,115200 root=/dev/nfs 
rootpath=192.100.16.206:/srv/roach2_boot/etch ip=dhcp
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 774576k/786432k available (4176k kernel code, 11856k reserved, 208k 
data, 417k bss, 144k init)
Kernel virtual memory layout:
  * 0xfffdf000..0xf000  : fixmap
  * 0xfde0..0xfe00  : consistent mem
  * 0xfddfe000..0xfde0  : early ioremap
  * 0xf100..0xfddfe000  : vmalloc & ioremap
SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS:512 nr_irqs:512 16
UIC0 (32 IRQ sources) at DCR 0xc0
UIC1 (32 IRQ sources) at DCR 0xd0
UIC2 (32 IRQ sources) at DCR 0xe0
clocksource: timebase mult[1e0] shift[24] registered
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
regulator-dummy: no parameters
NET: Registered

Re: [casper] Help with ROACH2 uboot issue

2022-08-23 Thread Gary, Dale E

Hi Dave,

I think you are right that I found the place to do it, but I don't know
what syntax to use to set the environment variable for

netboot=dhcp 0x400; setenv bootargs ${bootargs} root='/dev/nfs'
rootpath=${rootpath} ip=dhcp; bootm 0x400

I am not sure what needs escaping, etc.  Can I just add single quotes
around the whole and escape the ticks?  I guess I'll play with it.

Regards,
Dale

On Tue, Aug 23, 2022 at 5:55 PM David Harold Edward MacMahon <
dav...@berkeley.edu> wrote:

> Hi, Dale,
>
> It sounds like you’ve found where to set the `vers=4` option.  If that
> doesn’t work, it’s possible that the ROACH2s don’t support NFSv4.  If
> that's the case (I’m not sure it is), then you may have to look into
> exporting the root filesystem via NFSv3 instead.
>
> I always have to double check the MTU configuration on the NFS server’s
> network interface.  I think setting it to larger than 1500 won’t work for
> the ROACH2s.
>
> I’m curious to know what the solution ends up being!
>
> Cheers,
> Dave
>
> On Aug 23, 2022, at 12:08 PM, Gary, Dale E  wrote:
>
> Hi All,
>
> I upgraded my remote boot server to a new machine running ubuntu 20.04,
> and although I tried to set everything up the same for remote booting the
> ROACH2s, the process fails as shown in the attached file because the root
> file system could not be mounted.  I found one suggestion on the web to
> edit the Linux/PPC load configuration to add vers=4,tcp to the line, i.e.
> it might look like this:
>
> root=/dev/nfs nfsroot=192.100.16.206:/srv/roach2_boot/etch,vers=4,tcp
> ip=dhcp
>
> but I cannot find any file where that configuration is set.  There is a
> file in /srv/roach2_boot called pxelinux.cfg that looks promising, but it
> is an empty file.  Am I on the right track?
>
> Any suggestions?
> Thanks,
> Dale
>
>
>

Re: [casper] Help with ROACH2 uboot issue

2022-08-23 Thread Gary, Dale E

Hi Dave,

There is only one interface on the server.  I am not actually sure how it
works with the private network, but it is not a dedicated NIC.  I was using
dnsmasq on the old server but when I changed to the new one I was trying
not to use it.  Since I can mount the share on another client on the
private network I don't understand why the ROACH can't do it.  I tried the
nolock option also, and specifying vers=3.  Nothing makes any difference.

Regards,
Dale

On Tue, Aug 23, 2022 at 8:31 PM David Harold Edward MacMahon <
dav...@berkeley.edu> wrote:

> I notice these excerpts from your u-boot log:
>
> TFTP from server 192.100.16.206; our IP address is 192.168.24.121; sending
> through gateway 192.168.24.1
>
>
> …and...
>
> IP-Config: Complete:
>  device=eth0, addr=192.168.24.121, mask=255.255.255.0, gw=192.168.24.1
>  host=roach1.solar.pvt, domain=solar.pvt solar.ovro.caltech.edu,
> nis-domain=(none)
>  bootserver=192.100.16.206, rootserver=192.100.16.206,
> rootpath=/srv/roach2_boot/etch
>  nameserver0=192.100.16.2VFS: Unable to mount root fs via NFS, trying
> floppy.
>
>
> It is not technically wrong to have the boot and root server be on a
> different subnet (192.100.16.x) from the ROACH2s, but I wonder if this is
> what you really intend.  The IP address 192.100.16.206 appears to be the
> public facing IP address of your server.  I suspect maybe you really want
> those to be using IP address 192.168.24.1 (or some other address on the
> 192.168.24.0/24 subnet).  This is something that would get changed in the
> DHCP config files.  Are you using dnsmasq for DHCP/TFTP/DNS for the ROACH2s?
>
> Dave
>
>
> On Aug 23, 2022, at 5:08 PM, Gary, Dale E  wrote:
>
> Hi Dave,
>
> Okay, this "worked" to set those two recommended settings (I also had to
> set the rootpath environment variable correctly first):
>
> setenv netboot 'dhcp 0x400; setenv bootargs ${bootargs} root=/dev/nfs
> rootpath=${rootpath},vers=4,tcp ip=dhcp; bootm 0x400'
>
> but unfortunately the result was the same.  There may be some different
> problem.
>
> Regards,
> Dale
>
> On Tue, Aug 23, 2022 at 6:42 PM David Harold Edward MacMahon <
> dav...@berkeley.edu> wrote:
>
>> I don’t think you need the single quotes in the middle.  I think at the
>> u-boot prompt you can just run:
>>
>> setenv ’netboot=dhcp 0x400; setenv bootargs ${bootargs} root=/dev/nfs
>> rootpath=${rootpath},nfsver=4 ip=dhcp; bootm 0x400’
>>
>> You can run printenv afterwards to see whether it did what we hope it
>> will do.  If it does, I suggest running `run netboot` to test it out.  If
>> it works, you’ll have to reboot, go into netboot, enter the `setenv`
>> command again, then run `saveenv`.  I’m reluctant to change a
>> known-though-non-working config to something that may or may not work, so
>> that’s why I suggest testing before running `saveenv`.  When in doubt, you
>> can always reboot to “undo” any unsaved changes.
>>
>> Good luck,
>> Dave
>>
>> On Aug 23, 2022, at 3:19 PM, Gary, Dale E  wrote:
>>
>> Hi Dave,
>>
>> I think you are right that I found the place to do it, but I don't know
>> what syntax to use to set the environment variable for
>>
>> netboot=dhcp 0x400; setenv bootargs ${bootargs} root='/dev/nfs'
>> rootpath=${rootpath} ip=dhcp; bootm 0x400
>>
>> I am not sure what needs escaping, etc.  Can I just add single quotes
>> around the whole and escape the ticks?  I guess I'll play with it.
>>
>> Regards,
>> Dale
>>
>> On Tue, Aug 23, 2022 at 5:55 PM David Harold Edward MacMahon <
>> dav...@berkeley.edu> wrote:
>>
>>> Hi, Dale,
>>>
>>> It sounds like you’ve found where to set the `vers=4` option.  If that
>>> doesn’t work, it’s possible that the ROACH2s don’t support NFSv4.  If
>>> that's the case (I’m not sure it is), then you may have to look into
>>> exporting the root filesystem via NFSv3 instead.
>>>
>>> I always have to double check the MTU configuration on the NFS server’s
>>> network interface.  I think setting it to larger than 1500 won’t work for
>>> the ROACH2s.
>>>
>>> I’m curious to know what the solution ends up being!
>>>
>>> Cheers,
>>> Dave
>>>
>>> On Aug 23, 2022, at 12:08 PM, Gary, Dale E  wrote:
>>>
>>> Hi All,
>>>
>>> I upgraded my remote boot server to a new machine running ubuntu 20.04,
>>> and although I tried to set everything up the same for remote booting the
>>> ROACH2s, the process fails as shown in

Re: [casper] Help with ROACH2 uboot issue

2022-08-23 Thread Gary, Dale E

Hi Dave,

Okay, this "worked" to set those two recommended settings (I also had to
set the rootpath environment variable correctly first):

setenv netboot 'dhcp 0x400; setenv bootargs ${bootargs} root=/dev/nfs
rootpath=${rootpath},vers=4,tcp ip=dhcp; bootm 0x400'

but unfortunately the result was the same.  There may be some different
problem.
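
One thing that may matter here: as far as I know, the kernel parameter that carries NFS root options is nfsroot= (the form in the web suggestion), with the syntax nfsroot=[server-ip:]root-dir[,nfs-options]. A rootpath= entry on the kernel command line may simply be ignored, with the path taken from DHCP instead, which would explain why the result did not change. A sketch that builds the argument string in the nfsroot= form; nfsroot_args is a made-up helper, and the option list shown is only an example to try, not a known-good setting.

```shell
# Print kernel arguments for an NFS root in the nfsroot= form.
# Hypothetical helper; arguments:
#   $1 = server IP   $2 = exported path
#   $3 = comma-separated NFS options (may be empty)
nfsroot_args() {
    if [ -n "$3" ]; then
        echo "root=/dev/nfs nfsroot=$1:$2,$3 ip=dhcp"
    else
        echo "root=/dev/nfs nfsroot=$1:$2 ip=dhcp"
    fi
}

nfsroot_args 192.100.16.206 /srv/roach2_boot/etch vers=3,tcp,nolock
# prints: root=/dev/nfs nfsroot=192.100.16.206:/srv/roach2_boot/etch,vers=3,tcp,nolock ip=dhcp
```

The printed string would take the place of the "root=/dev/nfs rootpath=... ip=dhcp" part of the netboot variable, tested with "run netboot" before any saveenv.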

Regards,
Dale

On Tue, Aug 23, 2022 at 6:42 PM David Harold Edward MacMahon <
dav...@berkeley.edu> wrote:

> I don’t think you need the single quotes in the middle.  I think at the
> u-boot prompt you can just run:
>
> setenv ’netboot=dhcp 0x400; setenv bootargs ${bootargs} root=/dev/nfs
> rootpath=${rootpath},nfsver=4 ip=dhcp; bootm 0x400’
>
> You can run printenv afterwards to see whether it did what we hope it will
> do.  If it does, I suggest running `run netboot` to test it out.  If it
> works, you’ll have to reboot, go into netboot, enter the `setenv` command
> again, then run `saveenv`.  I’m reluctant to change a
> known-though-non-working config to something that may or may not work, so
> that’s why I suggest testing before running `saveenv`.  When in doubt, you
> can always reboot to “undo” any unsaved changes.
>
> Good luck,
> Dave
>
> On Aug 23, 2022, at 3:19 PM, Gary, Dale E  wrote:
>
> Hi Dave,
>
> I think you are right that I found the place to do it, but I don't know
> what syntax to use to set the environment variable for
>
> netboot=dhcp 0x400; setenv bootargs ${bootargs} root='/dev/nfs'
> rootpath=${rootpath} ip=dhcp; bootm 0x400
>
> I am not sure what needs escaping, etc.  Can I just add single quotes
> around the whole and escape the ticks?  I guess I'll play with it.
>
> Regards,
> Dale
>
> On Tue, Aug 23, 2022 at 5:55 PM David Harold Edward MacMahon <
> dav...@berkeley.edu> wrote:
>
>> Hi, Dale,
>>
>> It sounds like you’ve found where to set the `vers=4` option.  If that
>> doesn’t work, it’s possible that the ROACH2s don’t support NFSv4.  If
>> that's the case (I’m not sure it is), then you may have to look into
>> exporting the root filesystem via NFSv3 instead.
>>
>> I always have to double check the MTU configuration on the NFS server’s
>> network interface.  I think setting it to larger than 1500 won’t work for
>> the ROACH2s.
>>
>> I’m curious to know what the solution ends up being!
>>
>> Cheers,
>> Dave
>>
>> On Aug 23, 2022, at 12:08 PM, Gary, Dale E  wrote:
>>
>> Hi All,
>>
>> I upgraded my remote boot server to a new machine running ubuntu 20.04,
>> and although I tried to set everything up the same for remote booting the
>> ROACH2s, the process fails as shown in the attached file because the root
>> file system could not be mounted.  I found one suggestion on the web to
>> edit the Linux/PPC load configuration to add vers=4,tcp to the line, i.e.
>> it might look like this:
>>
>> root=/dev/nfs nfsroot=192.100.16.206:/srv/roach2_boot/etch,vers=4,tcp
>> ip=dhcp
>>
>> but I cannot find any file where that configuration is set.  There is a
>> file in /srv/roach2_boot called pxelinux.cfg that looks promising, but it
>> is an empty file.  Am I on the right track?
>>
>> Any suggestions?
>> Thanks,
>> Dale
>>
>>
>>

Re: [casper] Help with ROACH2 uboot issue

2022-08-24 Thread Gary, Dale E

Hi Dave,

This is the information in the server syslog:

Aug 24 12:41:59 ovsa in.tftpd[35999]: RRQ from 192.168.24.121 filename
uImage-roach2
Aug 24 12:42:16 ovsa rpc.mountd[1022]: authenticated mount request from
192.168.24.121:892 for /mnt/data0/srv/roach2_boot/etch
(/mnt/data0/srv/roach2_boot)
Aug 24 12:42:21 ovsa rpc.mountd[1022]: authenticated mount request from
192.168.24.121:773 for /mnt/data0/srv/roach2_boot/etch
(/mnt/data0/srv/roach2_boot)
Aug 24 12:42:31 ovsa rpc.mountd[1022]: authenticated mount request from
192.168.24.121:932 for /mnt/data0/srv/roach2_boot/etch
(/mnt/data0/srv/roach2_boot)
Aug 24 12:42:51 ovsa rpc.mountd[1022]: authenticated mount request from
192.168.24.121:733 for /mnt/data0/srv/roach2_boot/etch
(/mnt/data0/srv/roach2_boot)
Aug 24 12:43:21 ovsa rpc.mountd[1022]: authenticated mount request from
192.168.24.121:935 for /mnt/data0/srv/roach2_boot/etch
(/mnt/data0/srv/roach2_boot)
Aug 24 12:43:51 ovsa rpc.mountd[1022]: authenticated mount request from
192.168.24.121:799 for /mnt/data0/srv/roach2_boot/etch
(/mnt/data0/srv/roach2_boot)

so it seems that it tried 6 times and succeeded 6 times, but it doesn't
seem to know it.  I guess this suggests a transport issue (version
incompatibility or something) rather than a configuration or permissions
error.
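
To count the requests, something like the following sketch works on the syslog text (count_mount_requests is a made-up helper). Note that "authenticated mount request" only says that rpc.mountd accepted the request; the NFS traffic that follows uses a separate service, so the mount can still fail if the server does not offer the NFS version or transport the client asks for.

```shell
# Count rpc.mountd "authenticated mount request" lines for one client.
# Hypothetical helper; $1 = client IP, syslog text on stdin.
# -F treats the pattern as a fixed string so the dots in the IP are literal.
count_mount_requests() {
    grep -F -c "authenticated mount request from $1:"
}

# Usage on the server:
#   count_mount_requests 192.168.24.121 < /var/log/syslog
#
# To see which NFS versions and transports the server registers:
#   rpcinfo -p localhost
```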

I am curious about the root=/dev/nfs part of the netboot command.  I assume
the ROACH is trying to mount /srv/roach2_boot/etch as /.  There is a
directory /srv/roach2_boot/etch/dev, but not a
/srv/roach2_boot/etch/dev/nfs.  But of course that hasn't changed from
the previous working version, so I don't see how it can be a problem.
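
For what it is worth, in the Linux kernel's NFS-root support root=/dev/nfs is a pseudo-device name that simply tells the kernel to mount the root filesystem over NFS, so no dev/nfs node is expected under the exported directory. The sketch below assembles a kernel command line using the kernel's documented nfsroot=[server:]path[,options] form. It is an assumption-laden illustration, not the command used in this thread: the helper name is invented, and the server address and NFS options are example values only.

```python
def nfs_root_bootargs(server_ip, root_dir, nfs_options=(), base=""):
    """Build kernel arguments for an NFS root using the nfsroot= syntax.

    server_ip, root_dir and nfs_options are caller-supplied; nothing here
    is read from the u-boot environment.
    """
    nfsroot = f"{server_ip}:{root_dir}"
    if nfs_options:
        nfsroot += "," + ",".join(nfs_options)
    parts = [base, "root=/dev/nfs", f"nfsroot={nfsroot}", "ip=dhcp"]
    # Drop the empty base string so the result has no leading space.
    return " ".join(p for p in parts if p)


# Example values: the export path is the one from this thread; the server
# address and the options are hypothetical.
print(nfs_root_bootargs("192.168.24.1", "/srv/roach2_boot/etch",
                        nfs_options=("vers=3", "tcp")))
```

When nfsroot= is omitted and ip=dhcp is used, the kernel falls back to the root path handed out by the DHCP server.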

Regards,
Dale

On Wed, Aug 24, 2022 at 1:26 AM David Harold Edward MacMahon <
dav...@berkeley.edu> wrote:

> Have you looked for any messages in /var/log/syslog on the NFS server?
> When things work you’ll see a line that says something like “authenticated
> mount request from w.x.y.z” (though I haven’t tried this on a modern Ubuntu
> version so that message might have changed), but if there’s a problem you
> might see a helpful(?) error message instead.
>
> Dave
>
> On Aug 23, 2022, at 5:57 PM, Gary, Dale E wrote:
>
> Hi Dave,
>
> There is only one interface on the server.  I am not actually sure how it
> works with the private network, but it is not a dedicated NIC.  I was using
> dnsmasq on the old server but when I changed to the new one I was trying
> not to use it.  Since I can mount the share on another client on the
> private network I don't understand why the ROACH can't do it.  I tried the
> nolock option also, and specifying vers=3.  Nothing makes any difference.
>
> Regards,
> Dale
>
> On Tue, Aug 23, 2022 at 8:31 PM David Harold Edward MacMahon <
> dav...@berkeley.edu> wrote:
>
>> I notice these excerpts from your u-boot log:
>>
>> TFTP from server 192.100.16.206; our IP address is 192.168.24.121;
>> sending through gateway 192.168.24.1
>>
>>
>> …and...
>>
>> IP-Config: Complete:
>>  device=eth0, addr=192.168.24.121, mask=255.255.255.0, gw=192.168.24.1
>>  host=roach1.solar.pvt, domain=solar.pvt solar.ovro.caltech.edu,
>> nis-domain=(none)
>>  bootserver=192.100.16.206, rootserver=192.100.16.206,
>> rootpath=/srv/roach2_boot/etch
>>  nameserver0=192.100.16.2
>> VFS: Unable to mount root fs via NFS, trying floppy.
>>
>>
>> It is not technically wrong to have the boot and root server be on a
>> different subnet (192.100.16.x) from the ROACH2s, but I wonder if this is
>> what you really intend.  The IP address 192.100.16.206 appears to be the
>> public facing IP address of your server.  I suspect maybe you really want
>> those to be using IP address 192.168.24.1 (or some other address on the
>> 192.168.24.0/24 subnet).  This is something that would get changed in
>> the DHCP config files.  Are you using dnsmasq for DHCP/TFTP/DNS for the
>> ROACH2s?
>>
>> Dave
>>
>>
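
The subnet observation in the message above can be checked mechanically. The following Python sketch uses the standard ipaddress module with the addresses and netmask from the quoted IP-Config output; the function name is made up for this example, and 192.168.24.1 is only the candidate address suggested above, not a confirmed server address.

```python
import ipaddress


def same_subnet(client_ip, netmask, server_ip):
    """Return True if server_ip lies in the client's directly attached subnet."""
    # strict=False lets us pass a host address and derive its network.
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(server_ip) in network


roach_ip = "192.168.24.121"   # addr= from the IP-Config output
mask = "255.255.255.0"        # mask= from the IP-Config output

for server in ("192.100.16.206", "192.168.24.1"):
    where = "on" if same_subnet(roach_ip, mask, server) else "off"
    print(f"{server} is {where} the ROACH's subnet")
```

A server that is off the subnet is reached through the gateway, which is consistent with the "sending through gateway" line in the u-boot log.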
>> On Aug 23, 2022, at 5:08 PM, Gary, Dale E wrote:
>>
>> Hi Dave,
>>
>> Okay, this "worked" to set those two recommended settings (I also had to
>> set the rootpath environment variable correctly first):
>>
>> setenv netboot 'dhcp 0x400; setenv bootargs ${bootargs} root=/dev/nfs
>> rootpath=${rootpath},vers=4,tcp ip=dhcp; bootm 0x400'
>>
>> but unfortunately the result was the same.  There may be some different
>> problem.
>>
>> Regards,
>> Dale
>>
>> On Tue, Aug 23, 2022 at 6:42 PM David Harold Edward MacMahon <
>> dav...@berkeley.edu> wrote:
>>
>>> I don’t think you need the single quotes in the middle.  I think at the
>>> u-boot prompt you can just run:
>>>
>>> setenv ’netboot=
