Re: [casper] Used to operate the ROACH2 operating system

2022-06-13 Thread David MacMahon
Hi, Wang,

I noticed that it looks like the "rootserver" IP address returned by DHCP is 
actually the same as the ROACH2's IP address (192.168.100.1). I would have 
expected the "rootserver" IP address to be the same as "bootserver" 
(192.168.100.100).

If you can connect the ROACH2 to the DHCP server directly (without any switch 
in between) then you can watch what it is trying to do on the network using 
tcpdump or wireshark on the DHCP server.

HTH,
Dave

> On Jun 13, 2022, at 03:19, 王钊 wrote:
> 
> Hi Heystek,
> 
> Thank you for your reply! Can you use ISE 14.7 with Ubuntu 18.04? I think 
> the official website says this version is not supported. I was using Ubuntu 
> 16.04 and had a problem with NFS mounting during netboot.
> 
> In terms of hardware, I consulted the after-sales technician for the 
> computer and they said that my computer does not support CentOS 6.5 or 
> Fedora 23.
> 
> I would like to take a look at your Ubuntu 18 file settings. Could you 
> please send a screenshot of: 1. dnsmasq.conf  2. exports  3. interfaces, 
> and the ifconfig output?
> 
> BW,
> Wang
> 
>> Heystek Grobler wrote on Mon, Jun 13, 2022 at 17:40:
>> 
>> Good day Wang,
>> 
>> I use Ubuntu 18.04 LTS with Matlab 2012b with my ROACH2 and it is working 
>> fine. What are the problems that you are encountering with booting up? 
>> What hardware are you running?
>> 
>> Have a great day!
>> 
>> Heystek
>> -
>> Heystek Grobler
>> 0832721009
>> heystekgrob...@gmail.com
>> 
>>> On 13 Jun 2022, at 05:32, Wang wrote:
>>> 
>>> Hello CASPER,
>>> 
>>> How's it going? I am currently using ROACH2. However, there are always 
>>> some problems when booting up, and I want to try another Linux system. I 
>>> tried installing CentOS 6.5 and Fedora 23, but my computer couldn't 
>>> install them due to hardware problems. When I used Ubuntu 14.04, my 
>>> computer was very slow, which affected my work. What version of Linux can 
>>> you recommend?
>>> 
>>> Thanks.
>>> Regards,
>>> Wang
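For reference, a minimal sketch of the capture Dave suggests above, assuming 
the boot server's interface facing the ROACH2 is eth1 (substitute your own 
interface name and adjust the filter as needed):

  sudo tcpdump -i eth1 -n -e arp or port 67 or port 68 or port 69 or port 2049

This watches ARP, DHCP (67/68), TFTP (69) and NFS (2049) traffic, which covers 
the usual ROACH2 netboot sequence and should make it obvious where the boot 
stalls.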



Re: [casper] low cost academic xilinx RFSOC board - feb 28 tutorial 8am pacific

2021-02-10 Thread David MacMahon
The RFSoC 2x2 sounds intriguing.  It will be interesting to see all its 
capabilities (e.g. will it have high speed networking or other digital I/O?).

I enjoyed the web site's positive spin on the conference's virtual nature:

"Breakfast is available daily in the kitchen. Menu may vary by locale."

Bon appétit,
Dave

> On Feb 10, 2021, at 11:00, Dan Werthimer  wrote:
> 
> 
> 
> dear casper community, 
> 
> please see email below from xilinx's patrick lysaght,
> about their new low cost RFSOC board for academia,  
> and the february 28 tutorial about this board.  
> 
> best wishes,
> 
> dan
> 
> -- Forwarded message -
> Date: Wed, Feb 10, 2021 at 10:00 AM
> Subject: Emailing: isfpga_rfsoc_2x2_tutorial.jpg
> 
> 
>  Dear Friends
> 
> I hope you are all doing well in 2021.  I have some good news that I would 
> like to share with you.  At the end of Feb, we will launch a new RFSoC 
> platform for academia.  It features a new board, the RFSoC 2x2, supported 
> with open source designs and teaching material.  We will be hosting a 
> tutorial on Sunday 28th Feb from 8 - 10 AM PST as part of the ISFPGA 2021 
> conference (https://isfpga.org/ ) to announce and 
> introduce the new RFSoC 2x2 platform.  Everyone is welcome to attend the 
> tutorial and registration is free for students.
> 
> Would you kindly share this invitation with the CASPER community?  For those 
> who would like to attend, links to the tutorial site and the registration 
> pages are provided below.  I have also attached a graphic summarizing the 
> tutorial.
> 
> Best .. Patrick
> 
> More details ...
> A Low-Cost Teaching and Research Platform Based on Xilinx RFSoC Technology 
> and the PYNQ Framework
> Time: February 28, 8:00 AM - 10:00 AM PST
> Organizer: Patrick Lysaght (Xilinx), Robert W. Stewart (Strathclyde)
> 
> The Xilinx Zynq(r) UltraScale+(tm) RFSoC architecture integrates ZU+ MPSoCs 
> with state-of-the-art, analog-to-digital (ADC) and digital-to-analog (DAC) 
> data converters. The combination of banks of high-precision data converters, 
> capable of processing multi giga samples of data per second, along with FPGA 
> fabric and ARM processors creates a uniquely powerful family architecture.  
> RFSoC technology re-defines what is possible in applications such as software 
> defined radio (SDR) and advanced instrumentation.
> This tutorial introduces a new low-cost teaching and research platform for 
> RFSoC, designed especially for academia. The platform exploits the PYNQ 
> open-source framework to provide a highly intuitive user system interface 
> incorporating Linux, Python and Jupyter notebooks. It also comes with a suite 
> of open-source teaching resources including videos, notebooks and design 
> examples.
> We will demonstrate the benefits of integrating direct RF sampling data 
> converters by introducing  a novel, open-source spectrum analyzer built using 
> the new board. This RFSoC design exploits advanced signal processing 
> techniques, including higher-order Nyquist zones, to demonstrate performance 
> that has only previously been achieved on very high-end instrumentation. 
> Using the spectrum analyzer example, we will also demonstrate new approaches 
> to the rapid prototyping of graphical user interfaces for research 
> demonstrators.
> 
> Links ...
> ISFPGA tutorial page: http://bit.ly/ISFPGA_rfsoc2x2 
> 
> Register for ISFPGA & tutorial here: http://bit.ly/ISFPGA_rfsoc2x2 
>  #Xilinx #RFSoC
> 
> 
> 
> 
> This email and any attachments are intended for the sole use of the named 
> recipient(s) and contain(s) confidential information that may be proprietary, 
> privileged or copyrighted under applicable law. If you are not the intended 
> recipient, do not read, copy, or forward this email message or any 
> attachments. Delete this email message and any attachments immediately.
> 



Re: [casper] Issue compiling design from previous version.

2021-02-02 Thread David MacMahon
I wish I could make all my software work just by wanting it to fail! :)

Cheers,
Dave

> On Feb 2, 2021, at 12:27, Guillermo Gancio  wrote:
> 
> Hi Adam,
> 
> Thanks for your answer, and as Murphy said, if it can fail, it
> will... but in this case it didn't fail...
> I was preparing the files from scratch to attach them correctly and to
> copy the error, but in the process it worked ok, I mean, I open the
> m2016a version in the m2018a, updated with "update_casper_blocks",
> save the m2018a model, close everything, reopen the recently updated
> and saved m2018a model, and it compiled ok...
> I guess that with the several tests that I did I have some mix of versions
> 
> If I find what the actual problem was I'll let you know.
> 
> Thanks!
> 
> 
> 
> On Tue, Feb 2, 2021 at 3:05, Adam Isaacson () wrote:
>> 
>> Hi Guillermo,
>> 
>> Interesting. Please send me the following:
>> 
>> 1) screen capture of the errors you get before you do "update_casper_blocks".
>> 2) please attach your R2016a slx file
>> 3) please attach your new saved R2018a slx file.
>> 
>> I will then investigate and get back to you.
>> 
>> Kind regards,
>> 
>> Adam
>> 
>> 
>> On Tue, 02 Feb 2021, 2:59 AM Guillermo Gancio,  wrote:
>>> 
>>> Dearest CasperAmigos,
>>> 
>>> I'm having the following issue: I'm trying to compile a Simulink
>>> design from m2016a in m2018a. If I compile it directly I get
>>> several errors, but if I run "update_casper_blocks(bdroot)" the
>>> model compiles OK.
>>> 
>>> Now if I save the updated model and reopen it, I have to run
>>> "update_casper_blocks(bdroot)" again. What I mean is that every time
>>> the model is opened, I have to update it with update_casper_blocks.
>>> 
>>> I'm not sure if this is the normal procedure, or if there is a way to
>>> save the model so as to avoid the update every time.
>>> 
>>> Cheers.
>>> 
> 
> 
> 
> -- 
> Instituto Argentino de Radioastronomia
> [Argentine Institute of Radioastronomy]
> 
> Guillermo M. Gancio
> Responsable Área Observatorio
> [Head of Observatory]
> 
> Tel: (0054-0221) 482-4903 Int: 106
> Mail laboralggan...@iar.unlp.edu.ar
> 



Re: [casper] New setup installation problems (python setup.py egg_info" failed with error code 1)

2021-01-27 Thread David MacMahon
Wow, thanks for these truly awesome forensics, Adam!  It sounds like you went 
down a rabbit hole and lived to tell the tale.  I'm sure many of us will 
benefit from these details.

Sorry if this is a FAQ, but what are the prospects for moving the tool flow 
beyond Ubuntu 16.04?  That release is really starting to show its age, 
especially since its stock Python3 is only 3.5.  (Though to be fair, Python is 
at least partly to blame due to its poor track record on version compatibility. 
 :P)

Cheers,
Dave

> On Jan 27, 2021, at 13:00, Adam Isaacson  wrote:
> 
> Hi Kaj,
> 
> I am including the CASPER community in this email thread as it applies to 
> everyone.
> 
> Interesting, so I have run into another person with the same virtualenv 
> install issue that you encountered as shown in red below. I have been helping 
> him debug too on a new machine and I am pleased to say that his virtualenv is 
> now working. It seems like the new Ubuntu 16.04LTS installs come with a newer 
> version of virtualenv which is not compatible with python 3.5 - fun, fun, 
> fun. This definitely applies to all Casperites who are installing the 
> toolflow on their new machines.
> 
> kjwiik@casperx:~/work$ virtualenv -p python3 casper_venv
> Traceback (most recent call last):
>   File "/usr/local/bin/virtualenv", line 7, in <module>
>     from virtualenv.__main__ import run_with_catch
>   File "/home/kjwiik/.local/lib/python3.5/site-packages/virtualenv/__init__.py", line 3, in <module>
>     from .run import cli_run, session_via_cli
>   File "/home/kjwiik/.local/lib/python3.5/site-packages/virtualenv/run/__init__.py", line 13, in <module>
>     from .plugin.activators import ActivationSelector
>   File "/home/kjwiik/.local/lib/python3.5/site-packages/virtualenv/run/plugin/activators.py", line 6, in <module>
>     from .base import ComponentBuilder
>   File "/home/kjwiik/.local/lib/python3.5/site-packages/virtualenv/run/plugin/base.py", line 9, in <module>
>     from importlib_metadata import entry_points
>   File "/home/kjwiik/.local/lib/python3.5/site-packages/importlib_metadata/__init__.py", line 88
>     dist: Optional['Distribution'] = None
>         ^
> SyntaxError: invalid syntax
> kjwiik@casperx:~/work$
> 
> I want you to check the following versions for me:
> 
> 1) virtualenv -> type "virtualenv --version" at the prompt. You should get 
> 16.7.5. If not then you will need to uninstall the virtualenv by typing: 
> "pip3 uninstall virtualenv" and then we will need to install the 16.7.5 
> version. To install an exact version then type: "pip3 install 
> virtualenv==16.7.5". The use of "sudo" with "-H" arguments may or may not be 
> needed. The installs will guide you.
> 
> 2) What is your Ubuntu 16.04LTS version? -> type "lsb_release -a". I am using 
> Ubuntu 16.04.7 LTS, Xenial.
> 
> 3) What is your pip3 version? -> type "pip3 --version". I am using pip 8.1.1.
> 
> 4) What is your python version? -> type: "python --version". I am using 
> python version 2.7.12
> 
> 5) What is your python3 version? -> type: "python3 --version". I am using 
> python3 version 3.5.2.
> 
> Please make sure all these versions are the same before continuing. I suspect 
> that the latest Ubuntu 16.04LTS installs have made some upgrades that are 
> incompatible.  
> 
> Once these versions are as above then try the following step to create the 
> virtual env:
> 
> 1) Type: "virtualenv -p python3 ". This should create a 
> folder with the same name as "name_of_virtual_env". There should be no issues 
> with the install. It should complete without issues.
> 2) To activate the virtual environment -> type: "source 
> //bin/activate
> 3) Once activated then you can deactivate it by typing "deactivate".
> 4) If the virtual environment is activated then if you type "python" you 
> should get version 3.5.2
> 5) If the virtual environment is deactivated then if you type "python" you 
> should get 2.7.12.
> 
> I think if you can achieve this then you are ready to install the toolflow by 
> following these steps:
> 
> 1) https://casper-toolflow.readthedocs.io/en/latest/src/Installing-the-Toolflow.html#
> 2) https://casper-toolflow.readthedocs.io/en/latest/src/How-to-install-Matlab.html
> 3) https://casper-toolflow.readthedocs.io/en/latest/src/How-to-install-Xilinx-Vivado.html
> 4) https://casper-toolflow.readthedocs.io/en/latest/src/Configuring-the-Toolflow.html
> 5) https://casper-toolflow.readthedocs.io/en/latest/src/Running-the-Toolflow.html
> 6) 
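A consolidated sketch of the version pinning Adam describes above, assuming a 
stock Ubuntu 16.04 machine with Python 3.5 (adjust the environment name to 
taste; "sudo -H" may be needed for the pip3 steps, as noted above):

  pip3 --version                      # expect pip 8.1.1
  python3 --version                   # expect Python 3.5.2
  pip3 uninstall virtualenv           # remove an incompatible newer virtualenv
  pip3 install virtualenv==16.7.5     # the version reported to work with Python 3.5 here
  virtualenv -p python3 casper_venv   # create the environment
  source casper_venv/bin/activate     # activate it; type "deactivate" to leave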

[casper] Happy New Year!

2021-01-04 Thread David MacMahon
Happy New Year to the CASPER community!!!  It's hard to imagine 2021 not being 
better than 2020... :P

Here's some intriguing AMD/Xilinx news for you:

https://hothardware.com/news/amd-patent-hybrid-cpu-fpga-design-xilinx 


In addition to (instead of?) embedding CPUs in FPGAs it looks like FPGAs will 
be embedded in CPUs.  Not sure how useful this will be for CASPER, but it will 
bet interesting to see what this looks like when it's a real product and not 
just a patent.

Cheers,
Dave



Re: [casper] Hashpipe_databuf_* control functions semctl/semop errors.

2020-12-18 Thread David MacMahon
Hi, Ross,

> On Dec 18, 2020, at 10:39, Ross Andrew Donnachie  
> wrote:
> 
>  Is it possible that backend code that doesn't enclose its shared memory 
> access (hput*() or hget*()) between hashpipe_status_un/lock_safe() would 
> cause semaphore errors? This could be reasonably deduced as logical as the 
> semaphore error occurred when reading from the hashpipe_status_buffer 

The errors you showed earlier were data buffer related, not status buffer 
related...

> On Dec 16, 2020, at 22:08, Ross Andrew Donnachie  
> wrote:
> 
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_filled): semctl error 
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_free_timeout): semop 
> error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pktsock_thread): error 
> waiting for free databuf [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_free): semctl error 
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_filled_timeout): 
> semop error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pkt_to_FTP_transpose): 
> error waiting for input buffer, rv: -2 [Invalid argument] 

That said, there was a bug fix made to the hget.c file last April in commit 
75a9a17:

https://github.com/david-macmahon/hashpipe/commit/75a9a17b52b265e8caf398640c410f8f0004ac8f
 
<https://github.com/david-macmahon/hashpipe/commit/75a9a17b52b265e8caf398640c410f8f0004ac8f>

This bug could cause corruption of values returned by hget functions iff any 
hget functions were used without locking the semaphore (e.g. if using hget 
functions on memory that isn't the status buffer even if it is only accessed by 
a single thread).  Not sure whether this is related to the problem you're 
experiencing, but all Hashpipe installations should use this latest hget.c file.
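A quick, hedged way to check whether a given checkout already contains that 
fix, assuming you built from a git clone (the path is a placeholder):

  cd /path/to/hashpipe
  git merge-base --is-ancestor 75a9a17 HEAD && echo "hget.c fix present" || echo "hget.c fix missing"

If you installed from a tarball instead, just diff your hget.c against the 
file in the commit linked above.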

HTH,
Dave



Re: [casper] Hashpipe_databuf_* control functions semctl/semop errors.

2020-12-18 Thread David MacMahon
Hi, Ross,

> On Dec 16, 2020, at 22:08, Ross Andrew Donnachie  
> wrote:
> 
> Been working on a hashpipe with a pipeline of network, transposition and then 
> disk-dump threads. We have 24 data-buffers that we rotate through. 
> 
> An inconsistent (happens after various amounts of time) crash occurs with 
> this printout:
> -
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_filled): semctl error 
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_free_timeout): semop 
> error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pktsock_thread): error 
> waiting for free databuf [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_free): semctl error 
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_filled_timeout): 
> semop error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pkt_to_FTP_transpose): 
> error waiting for input buffer, rv: -2 [Invalid argument] 
> ---

This can happen if data are erroneously written to the header of the data 
buffer and clobber the semaphore ID that is stored there.  One way (but 
certainly not the only way) this can happen is due to bad pointer arithmetic.  
You can check for this "corruption" by running "hashpipe_check_databuf".  It 
should show something like the following example (though obviously with values 
specific to your application):

$ hashpipe_check_databuf -K /root
databuf 1 stats:
  data_type='unknown'
  header_size=4096

  block_size=134422528
  n_block=24
  shmid=32769
  semid=0

semaphore mask: 00

Specifically, the shmid and semid value shown should match the values displayed 
by "ipcs -a".

> Other times an error is caught but no full printout from hashpipe_error() is 
> made:
> 
> Code calls:
> 
> hpguppi_databuf_data(struct hpguppi_input_databuf *d, int block_id) {
> if(block_id < 0 || d->header.n_block < block_id) {
> hashpipe_error(__FUNCTION__,
> "block_id %s out of range [0, %d)",
> block_id, d->header.n_block);
> return NULL;
> 
> 
> 
> Printout:
> 
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_databuf_data)~/src/hpguppi_daq/src: 
> 
> 
> Only once have I seen the above printout complete showing that 
> d->header.n_block = -23135124... Which indicates some deep rooted rot 
> somewhere.

Indeed, it looks like corruption of the data buffer header (which can also be 
verified as shown above).

HTH,
Dave



Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-18 Thread David MacMahon
Hi, Mark,

Glad to hear that your segfault issue has gone away :) even though it sounds 
frustrating not to understand why :(.  Here are some additional responses for 
you:

> On Dec 15, 2020, at 21:00, Mark Ruzindana  wrote:
> 
> I'm taking note of the following change for documentation purposes. It's not 
> the reason for my issue. Feel free to ignore or comment on it. This change 
> was made before and remained after I observed the segfault issue. To flush 
> the packets in the port before the thread is run, I am using 
> "p_frame=hashpipe_pktsock_recv_udp_frame_nonblock(p_ps, bindport)" instead of 
> "p_frame=hashpipe_pktsock_recv_frame_nonblock(p_ps, bindport)" in the while 
> loop, otherwise, there's an infinite loop because there are packets with 
> other protocols constantly being captured by the port. 

Looping until hashpipe_recv_udp_frame_nonblock() returns NULL will only discard 
the initial UDP packets and one non-UDP packet.  What sort of packet rate is 
the interface receiving?  I find it hard to imagine packets being received so 
fast that the discard loop never completes.

> Okay, so now, I'm still experiencing dropped packets. Given a kernel page 
> size of 4096 bytes and a frame size of 16384 bytes, I have tried buffer 
> parameters ranging from, 480 to 128000 total number of frames and 60 to 1000 
> blocks respectively. With improvements in throughput in one instance, but not 
> the other three that I have running. The one instance with improvements, on 
> the upper end of that range, exceeds the number of packets expected in a 
> hashpipe shared memory buffer block (the ring buffers in between threads), 
> but only for about four or so of them at the very beginning of a scan. No 
> dropped packets for the rest of the scan. While the other instances, with no 
> recognizable improvements, drop packets through out the scan with one of them 
> dropping significantly more than the other two.

If you are you running four instances on the same host, do they each bind to a 
different interface?  Multiple instances binding to the same interface is not 
going improve performance because each instance will receive copies of all 
packets that arrive at the interface.  This is almost certainly not what you 
want.  The way the packet socket buffer is specified by frames and blocks is a 
bit unusual and the rationale for it could be better explained in the kernel 
docs IMHO.

> I'm currently trying a few things to debug this, but I figured that I would 
> ask sooner rather than later. Is there a configuration or step that I may 
> have missed in the implementation of packet sockets? My understanding is that 
> it should handle my current data rates with no problem. So with multiple 
> instances running (four in my case), I should be able to capture data with 0 
> dropped packets (100% data throughput).

What is the incoming packet rate and data rate?  What packet and data rate are 
your instances achieving?  Obviously the latter have to be higher than the 
former or things won't work.

>  Just a note, with a packet size of 8168 bytes, and a frame size of 8192 
> bytes, hashpipe was crashing, but in a completely unrelated way to how it did 
> before. It was not a segfault after capturing the exact number of packets 
> that correspond to the number of frames in the packet socket ring buffer as I 
> described in previous emails. The crashes were more inconsistent and I think 
> it's because the frame size needs to be considerably larger than the packet 
> size. An order of 2 seemed to be enough. I currently have the frame size set 
> to 16384 (also a multiple of the kernel page size), and do not have an issue 
> with hashpipe crashing.

The frame size used when sizing the buffers needs to be large enough to hold 
the entire packet (including network headers) plus TPACKET_HDRLEN.  A 
frame_size of 8192 bytes and a packet size of 8168 bytes leaves just 24 bytes, 
which is definitely less than TPACKET_HDRLEN.  You could probably use 12,288 
bytes (3*4096) instead of 16,384 for a frame size if you really want/need to 
minimize memory usage.  I'm not sure what happens if the frame size is not 
large enough.  At best the packets will get truncated, but that's still not 
good.

Dave



Re: [casper] Multi-instance hashpipe "pktsock" on single interface

2020-12-03 Thread David MacMahon
Hi, Wael,

I think I know what's going on here.  You don't say how the reported data rate 
differed from expected, but I suspect the reported data rate was higher than 
expected.  Packet sockets are a low level packet delivery mechanism supported 
by the kernel.  It allows the kernel to copy packets directly into memory that 
is mapped into the memory space of a user process (e.g. hashpipe).  The kernel 
does no filtering (by default) on the incoming packets before delivering them 
to the user process(es) that have requested them.  The selection by port 
happens (by default) at the application layer.  This means that two hashpipe 
instances using packet sockets to listen to the same network interface will 
each receive copies of all packets, regardless of the destination UDP port, 
even if they only want a specific UDP destination port.  This is very similar 
to how two tcpdump instances will get copies of all packets.

Alessio Magro has done some work to use the "Berkeley Packet Filter" 
(https://www.kernel.org/doc/html/latest/networking/filter.html) to perform 
low-level packet filtering in the kernel with packet sockets in hashpipe.  I 
think that approach could allow you to achieve the packet filtering that you 
want, but it's somewhat non-trivial to implement.
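A small sketch of one way to obtain such a filter without writing BPF by hand, 
assuming you want only UDP packets to destination port 10000 on eth4 (both 
values are just examples):

  sudo tcpdump -i eth4 -dd 'udp dst port 10000'

tcpdump's -dd option prints the compiled classic-BPF program as a C array of 
"struct sock_filter" entries, which can then be attached to the packet socket 
with setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, ...) so the kernel discards 
non-matching packets before they ever reach hashpipe.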

As for 100% CPU utilization, that could be due to using "busywait" versions of 
the status buffer locking and/or data buffer access functions or it could just 
be due to the net threads being very busy processing packets.

HTH,
Dave


> On Dec 2, 2020, at 19:06, Wael Farah  wrote:
> 
> Hi Folks,
> 
> Hope everyone's doing well.
> 
> I have an application I am trying to develop using hashpipe, and one of the 
> solutions might be using multiple instances of hashpipe on a single 40 GbE 
> interface.
> 
> When I tried running 2 instances of hashpipe I faced a problem. The data rate 
> reported by the instances does not match that expected from the TXs. No 
> issues were seen if I reconfigure the TXs to send data to a single port, 
> rather than 2, and initialising a single hashpipe thread. Can the 2 
> netthreads compete for resources on the NIC even if they are bound to 
> different ports? I've also noticed that the CPU usage for the 2 netthreads is 
> always 100%.
> I am using "hashpipe_pktsock_recv_udp_frame" for the acquisition.
> 
> Has anyone seen this/similar issue before?
> 
> Thanks!
> Wael
> 


Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-03 Thread David MacMahon
Hi, Mark,

Sorry to hear you're still getting a segfault.  It sounds like you made some 
progress with gdb, but the fact that you ended up with a different sort of 
error suggests that you were starting hashpipe in the debugger.  To debug your 
initial segfault problem, you can run hashpipe without the debugger, let it 
segfault and generate a core file, then use gdb and the core file (and 
hashpipe) to examine the state of the program when the segfault occurred.  The 
tricky part is getting the core file to be generated on a segfault.  You 
typically have to increase the core file size limit using "ulimit -c unlimited" 
and (because hashpipe is typically installed with the suid bit set) you have to 
let the kernel know it's OK to dump core files for suid programs using "sudo 
sysctl -w fs.suid_dumpable=1" (or maybe 2 if 1 doesn't quite do it).  You can 
read more about these steps with "help ulimit" (ulimit is a bash builtin) and 
"man 5 proc".

Once you have the core file (typically named "core" but it may have a numeric 
extension from the PID of the crashing process) you can debug things with "gdb 
/path/to/hashpipe /path/to/core/file".  Note that the core file may be created 
with permissions that only let root read it, so you might have to "sudo chmod 
a+r core" or similar to get read access to it.  This starts the debugger in a 
sort of forensic mode using the core file as a snapshot of the process and its 
memory space at the time of the segfault.  You can use "info threads" to see 
which threads existed, "thread N" to switch between threads (N is a thread 
number as shown by "info threads"), "bt" to see the function call backtrace of 
the current thread, and "frame N" to switch to a specific frame in the function 
call backtrace.  Once you zero in on which part of your code was executing when 
the segfault occurred you can examine variables to see what exactly caused the 
segfault to occur.  You might find that the "interesting" or "relevant" 
variables have been optimized away, so you may want/need to recompile with a 
lower optimization level (e.g. -O1 or maybe even -O0?) to prevent that from 
happening.
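The steps above, collected into one hedged sketch (paths are placeholders):

  ulimit -c unlimited                  # allow core files in this shell
  sudo sysctl -w fs.suid_dumpable=1    # permit core dumps from suid binaries
  hashpipe ...                         # run as usual and let it segfault
  gdb /path/to/hashpipe /path/to/core  # post-mortem on the core file

and then inside gdb:

  (gdb) info threads      # list the threads that existed at the crash
  (gdb) thread 3          # switch to a thread of interest
  (gdb) bt                # backtrace of the current thread
  (gdb) frame 2           # jump to a specific frame and inspect variables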

Because this happens when you reach the end of your data buffer, I have to 
think it's a pointer arithmetic error of some sort.  If you can't figure out 
the problem from the core file, please create a "minimum working example" 
(well, in this case I guess a minimum non-working example), including a dummy 
packet generator script that creates suitable packets, and I'll see if I can 
recreate the problem.

HTH,
Dave

> On Nov 30, 2020, at 14:45, Mark Ruzindana  wrote:
> 
> I'm currently using gdb to debug and it either tells me that I have a 
> segmentation fault at the memcpy() in process_packet() or something very 
> strange happens where the starting mcnt of a block greatly exceeds the mcnt 
> corresponding to the packet being processed and there's no segmentation fault 
> because the mcnt distance becomes negative so the memcpy() is skipped. 
> Hopefully that wasn't too hard to track. Very strange problem that only 
> occurs with gdb and not when I run hashpipe without it. Without gdb, I get 
> the same segmentation fault at the end of the circular buffer as mentioned 
> above.
> 



Re: [casper] ROACH 2 10 GbE Troubleshooting

2020-10-23 Thread David MacMahon
Thanks for sharing, Ben!

It's not easy to acknowledge solving "silly" problems of one's own making, but 
it is very valuable/helpful to have these explained on the mailing list.  
Things like this are far more common than one would perhaps care to admit, 
especially among more experienced CASPER folks :P, so it's nice to have a 
reminder in the archives for future victims of such self-induced silliness.

FWIW, my Achilles heel is usually forgetting to increase the MTU setting on the 
switch ports and/or the network interfaces of the receiving computer.
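For the NIC side that is a one-liner (interface name and MTU are just 
examples, and the switch ports need the matching setting too):

  sudo ip link set dev eth4 mtu 9000
  ip link show eth4 | grep mtu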

It might be useful and maybe even fun(ny?) to start a poll on the CASPER wiki 
to find out which is the most common and/or silliest silly problem.  In the end 
I find these types of problems have equal parts frustration ("why isn't this 
working?!"), embarrassment/astonishment ("I can't believe I did/forgot 
that!?"), and relief/joy ("yay, it's working now!!"), usually in that order. :)

Cheers,
Dave

> On Oct 22, 2020, at 17:06, 'Benjamin Godfrey' via casper@lists.berkeley.edu 
>  wrote:
> 
> Hi Marc and Jack, 
>Thank you again for all the suggestions. After fiddling for a while, the 
> answer ended up being sillier than I expected: I had selected the wrong slot 
> number on the ten gbe yellow block, and I was also using the wrong SFP+ port 
> on the PC side of things. No doubt there will be other issues that come up, 
> but I can at least digitize mock data using the ROACH2 and read it using a 
> Python script now. 
> 
> - Ben G.
> 
> On Tue, Oct 20, 2020 at 3:02 AM Marc <m...@ska.ac.za> wrote:
> Hi
> 
> Hmm... if you are capable of pinging things in one direction, then
> tcpborphserver is at least partially up - amongst other things, it is
> responsible for picking up frames from the fpga and handing them
> off to the kernel, which then does the IP logic and vice versa.
> 
> You seem to have problems with arp, and say that you have
> prepopulated the arp tables on the roach with set arp - maybe
> you will have to do the same on the PC side. Linux at least
> as an "arp -s" command to hardcode them into the PC arp
> cache (cat /proc/net/arp).
> 
> Note that the roaches do arp in an unusual way - they iterate over
> the subnet (fixed size) and query the hardware addresses
> periodically and pre-emptively, unlike normal arp which only does
> that on demand. This is needed as the ppc/tcpborphserver
> might have no idea which stations the fpga is trying
> to reach. So if you run tcpdump on a PC, you should
> see these queries all the time, if the tap device is up.
> 
> There are commands like ?tap-info and ?tap-arp-reload
> which might give you more detail, either on the roach
> type "kcpcmd tap-info", or remotely
> 
> echo "?tap-info" | nc -q 2 -w 2 ip-of-roach 7147
> 
> Note that you will have to use those commands, rather
> then looking in /proc/net/arp on the roach, as arp
> isn't handled by the ppc linux kernel - those tables
> have to be shared with the fpga.
> 
> regards
> 
> marc
> 
> On Tue, Oct 20, 2020 at 8:42 AM 'Benjamin Godfrey' via casper@lists.berkeley.edu wrote:
> >
> > Hi Jack,
> >Thank you for all your suggestions. Really appreciate all the 
> > troubleshooting help. Going through your suggestions in order:
> >
> > - EOF is going low with the final valid signal in simulation
> > - But valid is always high when I read the snapshot block, which is 
> > unexpected (need to dig further to figure out why this is happening). EOF, 
> > though, is still going high for one clock cycle at the  expected time.
> > - Reading from the transmit full output reports false, but I don't really 
> > understand this since the valid signal is always high.
> >
> > I was having issues with the tap interface populating the ARP table with 
> > correct addresses so I've now taken to populating it manually (using 
> > set_arp_table, which I found in the docs). Furthermore, I've had problems 
> > being able to ping the ROACH from the PC. I am now able to ping the PC 
> > logged into the ROACH, but I am unable to ping the ROACH from the PC side. 
> > Do you know why this may be the case?
> >
> > I definitely have some paths to explore.
> >
> > Thanks,
> > Ben G.
> >
> > On Tue, Oct 20, 2020 at 12:56 AM Jack Hickish wrote:
> >>
> >> Hi Ben,
> >>
> >> Before getting too far into the power PC software side, some basic checks 
> >> in firmware which are probably worth doing -
> >>
> >> - does EOF go high with (not after) the last valid sample?
> >> - can you (using a snapshot block) verify that what is happening in 
> >> firmware with the vld / EOF signals matches your simulation?
> >> - do you have the ability to read the Tge overflow outputs, which are a 
> >> good indicator of something going awry?
> >>
> >> If you compile with the "enable core on startup" option on the 10GbE block 
> >> checked, you should be able to transmit regardless of the 

Re: [casper] NIC tuning and IRQ binding : Regarding

2020-09-10 Thread David MacMahon
Hi, Hari,

I think modern Linux network drivers use a "polling" approach rather than an 
interrupt driven approach, so I've found IRQ affinity to be less important than 
it used to be.  This can be observed as relatively low interrupt counts in 
/proc/interrupts.  The main things that I've found beneficial are:

1. Ensuring that the processing code runs on CPU cores in the same socket that 
the NIC's PCIe slot is connected to.  If you have a multi-socket NUMA system 
you will want to become familiar with its NUMA topology.  The "hwloc" package 
includes the cool "lstopo" utility that will show you a lot about your system's 
topology.  Even on a single socket system it can help to stay away from core 0 
where many OS things tend to run.

2. Ensuring that memory allocations happen after your processes/threads have 
had their CPU affinity set, either by "taskset" or "numactl" or its own 
built-in CPU affinity setting code.  This is mostly for NUMA systems.

3. Ensuring that various buffers are sized appropriately.  There are a number 
of settings that can be tweaked in this category, most via "sysctl".  I won't 
dare to make any specific recommendations here.  Everybody seems to have their 
own set of "these are the settings I used last time".  One of the most 
important things you can do in your packet receiving code is to keep track of 
how many packets you receive over a certain time interval.  If this value does 
not match the expected number of packets then you have a problem.  Any 
difference usually will be that the received packet count is lower than the 
expected packet count.  Some people call these dropped packets, but I prefer to 
call them "missed packets" at this point because all we can say is that we 
didn't get them.  We don't yet know what happened to them (maybe they were 
dropped, maybe they were misdirected, maybe they were never sent), but it helps 
to know where to look to find out.

4. Places to check for missing packets getting "dropped":

4.1 If you are using "normal" (aka SOCK_DGRAM) sockets to receive UDP packets, 
you will see a line in /proc/net/udp for your socket.  The last number on that 
line will be the number of packets that the kernel wanted to give to your 
socket but couldn't because the socket's receive buffer was full so the kernel 
had to drop the packet.

4.2 If you are using "packet" (aka SOCK_RAW) sockets to receive UDP packets, 
there are ways to get the total number of packets the kernel has handled for 
that socket and the number it had to drop because of lack of kernel/application 
buffer space.  I forget the details, but I'm sure you can google for it.  If 
you're using Hashpipe's packet socket support it has a function that will fetch 
these values for you.

4.3 The ifconfig utility will give you a count of "RX errors".  This is a 
generic category and I don't know all possible contributions to it, but one is 
that the NIC couldn't pass packets to the kernel.

4.4 Using "ethtool -S IFACE" (eg "ethtool -S eth4") will show you loads of 
stats.  These values all come from counters on the NIC.  Two interesting ones 
are called something like "rx_dropped" and "rx_fifo_errors".  A non-zero 
rx_fifo_errors value means that the kernel was not keeping up with the packet 
rate for long enough that the NIC/kernel buffers filled up and packets had to 
be dropped.

4.5 If you're using a lower-level kernel bypass approach (e.g. IBVerbs or 
DPDK), then you may have to dig a little harder to find the packet drop 
counters as the kernel is no longer involved and all the previously mentioned 
counters will be useless (with the possible exception of the NIC counters).

4.6 You may be able to login to and query your switch for interface statistics. 
 That can show various data and packet rates as well as bytes sent, packets 
sent, and some various error counters.
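A few of the checks above, collected into one quick sketch (the interface name 
eth4 is just the example from 4.4; adjust to your system):

  cat /proc/net/udp                 # last column: drops due to a full socket buffer (4.1)
  ifconfig eth4 | grep -i errors    # generic RX error counter (4.3)
  ethtool -S eth4 | grep -Ei 'rx_dropped|rx_fifo_errors'   # NIC counters (4.4)
  lstopo                            # NUMA/PCIe topology, from the hwloc package (item 1)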

One thing to remember about buffer sizes is that if your average processing 
rate isn't keeping up with the data rate, larger buffers won't solve your 
problem.  Larger buffers will only allow the system to withstand slightly 
longer temporary lulls in throughput ("hiccups") if the overall throughput of 
the system (including the lulls/hiccups) is as fast or (ideally) faster than 
the incoming data rate.

Hope this helps,
Dave

> On Sep 9, 2020, at 22:15, Hariharan Krishnan  
> wrote:
> 
> Hello Everyone,
> 
>   I'm trying to tune the NIC on a server with Ubuntu 18.04 OS 
> to listen to a multicast network and optimize it for throughput through IRQ 
> affinity binding. It is a Mellanox card and I tried using the "mlnx_tune" for 
> doing this, but haven't been successful. 
> I would really appreciate any help in this regard.
> 
> Looking forward to responses from the group.
> 
> Thank you.
> 
> Regards,
> 
> Hari
> 
Re: [casper] Help with timing constraint

2020-08-27 Thread David MacMahon
Hi, Heystek,

I think the build will generate a timing report in a file that ends with 
".twr".  If it's not, you can generate one using the "trce" utility (part of 
ISE).  This will tell you how many nets failed timing and show details of the N 
worst offenders (I think N defaults to 3 or maybe 10?).  If you have a large 
number of nets failing timing it's likely more difficult to rectify, but if 
there's only a few it's usually not too hard to resolve.
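If you do need to run it by hand, the invocation is roughly of this form; 
flags and filenames are from memory and should be checked against the ISE 
documentation for your version:

  trce -v 10 implementation/system.ncd implementation/system.pcf -o system.twr

where "-v 10" asks for a verbose report of the 10 worst offending paths.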

Happy hunting,
Dave

> On Aug 27, 2020, at 04:46, Heystek Grobler  wrote:
> 
> Hey Andrew and James and everyone. 
> 
> After probing around, I found that the following timing constraint is not met: 
> 
> TS_sys_clk_n
> 
> I assume that my system clock is not running at an appropriate frequency?
> 
> Thanks for the help! 
> 
> Heystek 
> 
> 
> 
> On Wed, Aug 26, 2020 at 10:59 AM Andrew Martens  > wrote:
> Hi Heystek
> 
> Output reports and their location change over versions, between ISE and 
> Vivado etc. I think the output reports for ISE are located in the 
> 'implementation' folder. I think the timing related ones have 'timing' in the 
> name... A quick Google search of the error will help.
> 
> Note that there are archives of the mailing list available at 
> https://www.mail-archive.com/casper@lists.berkeley.edu/ 
>  - your problem has 
> probably been answered already previously.
> 
> Regards
> Andrew
> 
> On Wed, Aug 26, 2020 at 10:49 AM Andrew van der Byl  > wrote:
> Hi Heystek,
> 
> It's possible that you then have another issue that causes the build process 
> to exit prior to generating that file. You'll need to debug that first.
> 
> Regards,
> Andrew
> 
> On Wed, Aug 26, 2020 at 10:40 AM Heystek Grobler  > wrote:
> Hey Andrew
> 
> It is strange, I cant seem to locate top_timing_summary_routed.rpt
> 
> I am running Matlab 2012B with ISE 14.7 
> 
> 
> 
>> On 26 Aug 2020, at 10:27, Andrew van der Byl > > wrote:
>> 
>> Hi Heystek,
>> 
>> 1) Navigate to your project folder
>> 2) Then go to and open: 
>> /myproj/myproj.runs/impl_1/top_timing_summary_routed.rpt
>> 
>> Just a note - this file is usually fairly large as text files go ~20MB.
>> 
>> Regards,
>> Andrew
>> 
>> On Wed, Aug 26, 2020 at 10:22 AM Heystek Grobler > > wrote:
>> Hey James and Andrew
>> 
>> Thank you so much for the advice! 
>> 
>> @Andrew, this might be a stupid question, but where do I locate the 
>> top_timing_summary_routed.rpt file? 
>> 
>> Heystek 
>> 
>> 
>>> On 26 Aug 2020, at 10:17, Andrew van der Byl >> > wrote:
>>> 
>>> Hi Heystek,
>>> 
>>> Have a look in top_timing_summary_routed.rpt and search for 'VIOLATED' - 
>>> this usually shows up which paths are hurting your design. Then, as James 
>>> said, start pipelining your design.
>>> 
>>> Hope this helps.
>>> 
>>> Regards,
>>> Andrew
>>> 
>>> On Wed, Aug 26, 2020 at 10:13 AM James Smith >> > wrote:
>>> Hello Heystek,
>>> 
>>> You will have to go through the timing reports and see which signal path is 
>>> failing timing, and by how much.
>>> 
>>> Once you have an idea, you will need to sprinkle delay blocks and / or 
>>> adjust latencies in your logic to get to a point where the place-and-route 
>>> can find a layout that satisfies timing requirements.
>>> 
>>> It's a bit of a black art, always hit and miss for me.
>>> 
>>> Regards,
>>> James
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Aug 26, 2020 at 8:00 AM Heystek Grobler >> > wrote:
>>> Good day everyone
>>> 
>>> I am running a design but ran into this problem:
>>> 
>>> xflow done!
>>> touch __xps/system_routed
>>> xilperl /opt/Xilinx_ISE/14.7/ISE_DS/EDK/data/fpga_impl/observe_par.pl 
>>>  -error yes implementation/system.par
>>> Analyzing implementation/system.par
>>> 
>>> ERROR: 1 constraint not met.
>>> 
>>> PAR could not meet all timing constraints. A bitstream will not be 
>>> generated.
>>> 
>>> To disable the PAR timing check:
>>> 
>>> 1> Disable the "Treat timing closure failure as error" option from the 
>>> Project Options dialog in XPS.
>>> 
>>> OR
>>> 
>>> 2> Type following at the XPS prompt:
>>> XPS% xset enable_par_timing_error 0
>>> 
>>> system.make:140: recipe for target 'implementation/system.bit' failed
>>> gmake: *** [implementation/system.bit] Error 1
>>> ERROR:EDK -  
>>>Error while running "gmake -f system.make bits".
>>> 
>>> It seems to be a timing constraint. 
>>> 
>>> How do I deal with this?
>>> 
>>> Thanks for the help! 
>>> 
>>> Heystek 
>>> 

Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread David MacMahon
Is it April already? :) :) :)

> On Aug 18, 2020, at 10:43, Jack Hickish  wrote:
> 
> There is, of course, always the compromise option of using half 
> network-endianness and half little-endianness. For example, all positive 
> numbers could be encoded with big-endian and negative numbers could be 
> encoded little-endian. This would incur a similar overhead on both little- 
> and big-endian CPU platforms, and would also be easily parallelizable on a 
> GPU decoder.
> 
> Yours,
> 
> Nathan Poe
> 
> On Tue, 18 Aug 2020 at 17:18, James Smith <jsm...@ska.ac.za> wrote:
> Hi Dave,
> 
> Yes of course! Though it makes little sense IMO to do the conversion on the 
> host CPU, as GPUs are pretty well-equipped to do this operation pretty 
> quickly if the need arises.
> 
> In some cases being pragmatic is important - if your instrument is small, for 
> example, and you don't have any user-supplied equipment. In the MeerKAT case 
> however, we specifically cater for having third-party computers connecting to 
> our network, then some sort of standards-compliance comes in very handy. 
> Though most of our data is 8- (or 10-) bit anyway so byte order makes little 
> difference.
> 
> Regards,
> James
> 
> 
> On Tue, Aug 18, 2020 at 3:30 PM David MacMahon <dav...@berkeley.edu> wrote:
> I guess I’m going to play angel’s advocate and suggest the pragmatic over 
> the dogmatic. :)
> 
> Some standards mandate network byte order, aka big endian, but if you’re not 
> constrained in that way and you know that the data will be processed 
> downstream by a little-endian system for the foreseeable future, then I think 
> it makes sense to send it out in little-endian form. You can use `le32toh()` 
> etc in the receiving code to make it host-endian agnostic, but on 
> little-endian systems that is optimized away to nothing. Sure, that might 
> only be saving 1 CPU cycle per value, but when you’re dealing with billions 
> of values per second that can start adding up!
> 
> Of course, the packet format should be documented regardless of which 
> endianess is used. Future users will thank you.
> 
> Cheers,
> Dave
> 
>> On Aug 18, 2020, at 07:21, James Smith <jsm...@ska.ac.za> wrote:
>> 
>> 
>> Hello Nitish,
>> 
>> So I'm going to play devil's advocate and say that while you could do the 
>> byte swapping in the FPGA, it would be morally wrong ;-)
>> 
>> Ideally, all data that goes out on a network will be network order, and you 
>> use the ntohl or htohs functions to get it in host format. That way the code 
>> stays more portable - if you one day find yourself on a big-endian system, 
>> it would work without modification.
>> (https://en.wikipedia.org/wiki/Endianness#Networking)
>> 
>> Sometimes for performance reasons you may have to make these kinds of 
>> compromises, and if you do you should document them well! But most modern 
>> servers should have no issue with 10Gb/s datarates. You could probably even 
>> do the swaps in the GPUs using Nvidia's primitives.
>> 
>> Regards,
>> James
>> 
>> 
>> 
>> 
>> On Tue, Aug 18, 2020 at 1:28 PM Nitish Ragoomundun <nitish.ragoomun...@gmail.com> wrote:
>> Hi,
>> 
>> Thanks a lot Jack. It makes sense.
>> And thank you very much for the note on the 2x32-bit pair. It is exactly how 
>> our data is formatted.
>> Ok, we will go with an FPGA correction instead of a CPU byteswap. I am 
>> guessing it will be faster this way.
>> 
>> Thanks again.
>> Cheers
>> Nitish
>> 
>> 
>> On Tue, Aug 18, 2020 at 4:47 PM Jack Hickish <jackhick...@gmail.com> wrote:
>> Hi Nitish,
>> 
>> To try and answer your first question without adding confusion --
>> 
>> If you send a UFix64_0 value into the 10GbE block, you will need to 
>> interpret it on the other end via an appropriate 64-bit byte swap if your 
>> CPU is little-endian.
>> If you send a 64-bit input into the 10GbE block where the most significant 
>> 32 bits are the value A, and the least significant bits are value B, you 
>> should interpret the 64-bits  on your little endian CPU as the struct
>> 
>> typedef struct pkt {
>>   uint32_t A;
>>   uint32_t B;
>> } pkt;
>> 
>> where each of the A and B will need byteswapping before you use them.
>> 
>> To answer your second question --
>> Yes, you can absolutely flip the endianness on the FPGA prior to 
>> transmission so you don

Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread David MacMahon
I guess I’m going to play angel’s advocate and suggest the pragmatic over the 
dogmatic. :)

Some standards mandate network byte order, aka big endian, but if you’re not 
constrained in that way and you know that the data will be processed downstream 
by a little-endian system for the foreseeable future, then I think it makes 
sense to send it out in little-endian form. You can use `le32toh()` etc in the 
receiving code to make it host-endian agnostic, but on little-endian systems 
that is optimized away to nothing. Sure, that might only be saving 1 CPU cycle 
per value, but when you’re dealing with billions of values per second that can 
start adding up!

Of course, the packet format should be documented regardless of which endianess 
is used. Future users will thank you.

Cheers,
Dave

> On Aug 18, 2020, at 07:21, James Smith  wrote:
> 
> 
> Hello Nitish,
> 
> So I'm going to play devil's advocate and say that while you could do the 
> byte swapping in the FPGA, it would be morally wrong ;-)
> 
> Ideally, all data that goes out on a network will be network order, and you 
> use the ntohl or ntohs functions to get it in host format. That way the code 
> stays more portable - if you one day find yourself on a big-endian system, it 
> would work without modification.
> (https://en.wikipedia.org/wiki/Endianness#Networking)
> 
> Sometimes for performance reasons you may have to make these kinds of 
> compromises, and if you do you should document them well! But most modern 
> servers should have no issue with 10Gb/s datarates. You could probably even 
> do the swaps in the GPUs using Nvidia's primitives.
> 
> Regards,
> James
> 
> 
> 
> 
>> On Tue, Aug 18, 2020 at 1:28 PM Nitish Ragoomundun 
>>  wrote:
>> Hi,
>> 
>> Thanks a lot Jack. It makes sense.
>> And thank you very much for the note on the 2x32-bit pair. It is exactly how 
>> our data is formatted.
>> Ok, we will go with an FPGA correction instead of a CPU byteswap. I am 
>> guessing it will be faster this way.
>> 
>> Thanks again.
>> Cheers
>> Nitish
>> 
>> 
>>> On Tue, Aug 18, 2020 at 4:47 PM Jack Hickish  wrote:
>>> Hi Nitish,
>>> 
>>> To try and answer your first question without adding confusion --
>>> 
>>> If you send a UFix64_0 value into the 10GbE block, you will need to 
>>> interpret it on the other end via an appropriate 64-bit byte swap if your 
>>> CPU is little-endian.
>>> If you send a 64-bit input into the 10GbE block where the most significant 
>>> 32 bits are the value A, and the least significant bits are value B, you 
>>> should interpret the 64 bits on your little-endian CPU as the struct
>>> 
>>> typedef struct pkt {
>>>   uint32_t A;
>>>   uint32_t B;
>>> } pkt;
>>> 
>>> where each of the A and B will need byteswapping before you use them.
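
For instance, the receive side of that byteswap might look roughly like this 
sketch (assuming the payload begins with the A/B pair described above, and using 
the glibc be32toh() conversion macro):

#include <endian.h>   /* be32toh() (glibc) */
#include <stdint.h>
#include <string.h>

typedef struct pkt {
  uint32_t A;   /* most significant 32 bits of the FPGA's 64-bit word */
  uint32_t B;   /* least significant 32 bits */
} pkt;

/* Sketch: copy the first 8 payload bytes, then convert each 32-bit field
 * from network (big-endian) order to host order. */
static pkt read_pkt(const uint8_t *payload)
{
    pkt p;
    memcpy(&p, payload, sizeof(p));
    p.A = be32toh(p.A);
    p.B = be32toh(p.B);
    return p;
}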
>>> 
>>> To answer your second question --
>>> Yes, you can absolutely flip the endianness on the FPGA prior to 
>>> transmission so you don't have to byteswap on your CPU. You can either do 
>>> this with a bus-expand + bus-create blocks, using the first to split your 
>>> words into bytes, and then flipping them before concatenating. The Xilinx 
>>> "bitbasher" block would also be good for this, using the Verilog (for a 
>>> 64-bit input):
>>> 
>>> out = {in[7:0], in[15:8], in[23:16], in[31:24], in[39:32], in[47:40], 
>>> in[55:48], in[63:56]}
>>> 
>>> If your 64 bit data streams are not made up of 64-bit integers (eg, they 
>>> are pairs of 32-bit integers) then you should flip the 4 bytes of each 
>>> value individually, but leave the ordering of the two values within the 64 
>>> bits unchanged.
>>> 
>>> Hopefully that makes sense
>>> 
>>> Jack
>>> 
>>> 
 On Tue, 18 Aug 2020 at 13:28, Nitish Ragoomundun 
  wrote:
 
 Hello,
 
 We are setting up the digital back-end of a low-frequency telescope 
 consisting of SNAP boards and GPUs. The SNAP boards packetize the data and 
 send to the GPU processing nodes via 10 GbE links. We are currently 
 programming the packetizer/depacketizer.
 I have a few questions about the 10gbe yellow blocks and endianness. We 
 observed from the tutorials that the data stored in bram is big-endian. I 
 would like to know how the data is handled by the 10gbe and in what form 
 is it sent over the network.
 Our depacketizers run on Intel processors, which are little-endian. We are 
 aware that network byte order is big-endian, but we noticed that integer 
 data can be sent from one Intel machine to another via network without 
 ever calling ntohl( ) or htonl( ) and the data was preserved. So, we would 
 like to know if we need to correct the endianness when receiving the data 
 from the SNAP.
 
 If we need to perform this correction, is there a way we could possibly 
 correct the endianness on the FPGA itself before input to the 10gbe block?
 
 Thanks,
 Nitish
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 

Re: [casper] Installation of Matlab 2012B

2020-08-18 Thread David MacMahon
The consistent albeit cryptic names like “enp0s5” might make life easier for 
automating Linux installations, but I don’t think they make life easier for 
sysadmins or power users. Fortunately, this naming scheme is optional and it’s 
easy to switch to the more human-friendly names by adding “net.ifnames=0” to 
the kernel command line (probably in some grub config file, depending on your 
distribution). You might also need “biosdevname=0”.  Googling those terms 
plus your distro’s name will get you some relevant pages with more details.
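
As a concrete illustration (the exact file and regeneration command vary by 
distribution), on a GRUB-based system this usually amounts to adding the options 
to the kernel command line in /etc/default/grub, e.g.

GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"

and then regenerating the GRUB config (update-grub on Debian/Ubuntu, or 
grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL/CentOS) and rebooting.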

A rose by any other name would smell as sweet,
Dave

> On Aug 18, 2020, at 04:00, James Smith  wrote:
> 
> 
> Hi Heystek,
> 
> Unfortunately not - I have had this in the past as well IIRC, some of the 
> more modern Linux distributions will give you something like "en0s1" or the 
> like. Matlab is stuck in the past, looking for eth0.
> 
> It's easy enough to change the name, but bear in mind that you may have some 
> funnies elsewhere that you will need to change as well (e.g. if you have 
> /etc/network/interfaces - you'll need to update that too).
> 
> Regards,
> james
> 
> 
>> On Tue, Aug 18, 2020 at 10:51 AM Heystek Grobler  
>> wrote:
>> Hey Mike
>> 
>> Thank you for your reply! 
>> 
>> On the Mathworks forums some of the folks suggest “forcing” a name change. 
>> Apparently the license is looking for “eth0” but on my machine it is “em1”.  
>> That is what is causing the error. 
>> 
>> I was just wondering if there is perhaps a more elegant solution to this. 
>> 
>> Thanks for the help! 
>> 
>> Heystek 
>> 
>> 
>>  
>> 
>>> On 18 Aug 2020, at 12:44, Michael D'Cruze  
>>> wrote:
>>> 
>>> Hi Heystek,
>>>  
>>> I’ve seen a similar thing recently installing ISE on a Linux 7 machine. It 
>>> looks like a complaint about the naming convention of your primary NIC. You 
>>> can force a name-change if you want using the network manager (I did it in 
>>> RHEL, unsure about Ubuntu) but better to find a solution from Mathworks if 
>>> you can. What does the indicated solution say?
>>>  
>>> Good luck,
>>> Mike
>>>  
>>>  
>>> From: Heystek Grobler [mailto:heystekgrob...@gmail.com] 
>>> Sent: 18 August 2020 11:34
>>> To: Casper Lists
>>> Subject: [casper] Installation of Matlab 2012B
>>>  
>>> Hello everyone
>>>  
>>> I have a bit of a problem that I am experiencing for the first time. I am 
>>> trying to install Matlab 2012B on an Ubuntu machine (that I reinstalled), but the 
>>> installation gives this error:
>>>  
>>> [inline screenshot of the installer error -- not preserved in the archive]
>>>  
>>> Does anyone perhaps know how to fix this?
>>>  
>>> Heystek 
>>>  
>>>  
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "casper@lists.berkeley.edu" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to casper+unsubscr...@lists.berkeley.edu.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CALWRf%3DTsifQZubv1Fbo0jEhY0LSba%2BgApk%3DkE6qKy_CO1izVpQ%40mail.gmail.com.
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "casper@lists.berkeley.edu" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to casper+unsubscr...@lists.berkeley.edu.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/VI1PR01MB4799E1A85E37F7297C293AFCAC5C0%40VI1PR01MB4799.eurprd01.prod.exchangelabs.com.
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "casper@lists.berkeley.edu" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to casper+unsubscr...@lists.berkeley.edu.
>> To view this discussion on the web visit 
>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/5C04BF02-14A5-4BB7-8137-CD4CD729FBF9%40gmail.com.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu.
> To view this discussion on the web visit 
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CAG67D36rSJhJ-8ifUWFpGqUH0SKbHeu6XL7%2BNLVYTWn39Xo2vw%40mail.gmail.com.

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/76021786-3B81-4A11-91E7-0C08C0ACD5D1%40berkeley.edu.


Re: [casper] A bug in the Xilinx FFT frame sync

2020-08-14 Thread David MacMahon
Hi, Sean,

Thanks for sharing that link to the interesting (though not overly helpful :P) 
Xilinx support issue.

One thing that might work for you in the meantime (while waiting for CASPER 
support of 2020.1) is to use 2020.1 to build an FFT-only model and generate it 
to a netlist (maybe now called a “design checkpoint”?) suitable for “black 
boxing” into the overall model that you built in the earlier 
(CASPER-compatible) version of Matlab.

Not sure if that’s the easiest alternative/work-around for you, but it may be 
an option worth exploring.

Good luck,
Dave

> On Aug 14, 2020, at 08:55, Sean Mckee  wrote:
> 

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/B64791D4-B510-48CE-AD35-086A4BBCBE19%40berkeley.edu.


Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-05-25 Thread David MacMahon
A few more suggestions:

1) Enable core dumps.  Usually you have to run "ulimit -c unlimited" and for 
suid executables there's an extra step related to /proc/sys/fs/suid_dumpable.  
See "man 5 core" and "man 5 proc" for details.  Once you have a core file, you 
can use gdb to examine the state of things when the segfault happened.  You 
might want to recompile your plug-in with debugging enabled and fewer 
optimizations to get the most out of this approach: "gdb /path/to/hashpipe 
/path/to/core".  (Gotta love how it's still called "core"!).  gdb can be a bit 
cryptic, but it's also very powerful.

2) Another idea, just for diagnostic purposes, is to omit the "+ 
input_databuf_idx(...)" part of the dest_p assignment.  That will write all 
payloads to the first part of the data block, so no buffer overflow for sure 
(assuming idx is in range :)).  It's just a way to eliminate a variable.

3) Make sure the packet socket blocks are large enough for the packet frames.  
I agree it looks like you're not reading past the end of the packet payload 
size, but maybe the payload itself goes beyond the end of the packet socket 
blocks?  The kernel might silently truncate the packets in that case.

4) If you're using tagged VLANs the PKT_UDP_xxx macros won't work right.  It 
sounds like that's not happening because you're seeing the expected size, but 
it's worth mentioning for mail archive completeness.

5) You can use hashpipe_dump_databuf to examine the 159 payloads you were able 
to copy before the segfault to see whether every byte is properly positioned and 
has believable values.  You could change memcpy(..) to memset(p_dest, 'X', 
PKT_UDP_SIZE(frame)-16) so you'll know the exact value that every byte should 
have. Instead of 'X' you could use pkt_num+1 (i.e. a 1-based packet counter) so 
you'll know which bytes correspond to which packets.  Using memset() would also 
eliminate reading from the packet socket blocks (another variable gone).
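
For example, using the dest_p/frame names from Mark's snippet below (pkt_num here 
is assumed to be a running packet counter in the net thread):

/* Diagnostic only: fill the destination with a known per-packet value instead
 * of copying, so hashpipe_dump_databuf shows exactly which bytes each packet
 * touched.  The modulo keeps the value within a single byte. */
memset(dest_p, (int)(pkt_num % 255) + 1, PKT_UDP_SIZE(frame) - 16);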

Happy hunting,
Dave

> On May 25, 2020, at 16:33, Mark Ruzindana  wrote:
> 
> Thanks for the suggestions. I neglected to mention that I'm printing out the 
> PKT_UDP_SIZE() and PKT_UDP_DST() right before the memcpy(). I take into 
> account the 8-byte UDP header, and the size and port are correct. When 
> performing the memcpy(), I am taking into account that PKT_UDP_DATA() returns 
> a pointer of the payload and excludes the UDP header. However, I also have an 
> 8 byte packet header within that payload (this gives me the mcnt, f-engine, 
> and x-engine indices) and I exclude it when performing the memcpy(). This is 
> what it looks like:
> 
> uint8_t * dest_p = db->block[idx].data + input_databuf_idx(m, f, 0,0,0); // 
> This macro index shifts every mcnt and f-engine index
> const uint8_t * payload = (uint8_t *)(PKT_UDP_DATA(frame)+8); // Ignore 
> packet header
> 
> fprintf(...); // prints PKT_UDP_SIZE() and PKT_UDP_DST()
> memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16)  // Ignore both UDP (8 
> bytes) and packet header (8 bytes)
> 
> I will look into the other possible issues that you suggested, but as far as 
> I can tell, it doesn't seem like there should be a segfault given what I'm 
> doing before that memcpy(). I will let you know what else I find.
> 
> Thanks again, I really appreciate the help.
> 
> Mark
> 
> On Mon, May 25, 2020 at 4:30 PM David MacMahon  <mailto:dav...@berkeley.edu>> wrote:
> Hi, Mark,
> 
> Sounds like progress!
> 
>> On May 25, 2020, at 13:56, Mark Ruzindana > <mailto:ruziem...@gmail.com>> wrote:
>> 
>> I have been able to capture data with the first round of frames of the 
>> circular buffer i.e. if I have 160 frames, I am able to capture packets of 
>> frames 0 to 159 at which point right at the memcpy() in the process_packet() 
>> function of the net thread, I get a segmentation fault.
> 
> The fact that you get the segfault right at the memcpy of the final frame 
> of the ring buffer suggests that there is a problem with the parameters passed 
> to memcpy.  Most likely src+length-1 exceeds the end of the frame so you get 
> a segfault when memcpy tries to read from beyond the allocated memory.  This 
> would explain why it segfaults on the final frame and not the previous frames 
> because reading beyond a previous frame still reads from "legal" (though 
> incorrect) memory locations.  It's also possible that the segfault happens 
> due to a bad address on the destination side of the memcpy(), but unless the 
> destination buffer is also 160 frames in size that seems less likely.
> 
> The release_frame function is not likely to be a culprit here unless the 
> pointer you are passing it differs from the pointer that the pktsock_recv 
> function returned.
> 
> For debu

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-05-25 Thread David MacMahon
Hi, Mark,

Sounds like progress!

> On May 25, 2020, at 13:56, Mark Ruzindana  wrote:
> 
> I have been able to capture data with the first round of frames of the 
> circular buffer i.e. if I have 160 frames, I am able to capture packets of 
> frames 0 to 159 at which point right at the memcpy() in the process_packet() 
> function of the net thread, I get a segmentation fault.

The fact that you get the segfault right at the memcpy of the final frame of 
the ring buffer suggests that there is a problem with the parameters passed to 
memcpy.  Most likely src+length-1 exceeds the end of the frame so you get a 
segfault when memcpy tries to read from beyond the allocated memory.  This 
would explain why it segfaults on the final frame and not the previous frames 
because reading beyond a previous frame still reads from "legal" (though 
incorrect) memory locations.  It's also possible that the segfault happens due 
to a bad address on the destination side of the memcpy(), but unless the 
destination buffer is also 160 frames in size that seems less likely.

The release_frame function is not likely to be a culprit here unless the 
pointer you are passing it differs from the pointer that the pktsock_recv 
function returned.

For debugging, I suggest logging dst, src, len before calling memcpy.  Normally 
you wouldn't generate a log message for every packet because that would ruin 
your throughput, but since you know it's going to crash after the first 160 
packets there's not much throughput to ruin. :)

One thing to remember is that PKT_UDP_DATA() evaluates to a pointer to the UDP 
payload of the packet, but PKT_UDP_SIZE() evaluates to the total UDP size (i.e. 
8 bytes for the UDP header plus the length of the UDP payload).  Passing 
PKT_UDP_SIZE() as "len" to memcpy without subtracting 8 for the header bytes is 
not correct and could potentially cause this problem.
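
Putting those two points together, a debugging version of the copy might look 
something like this sketch (dest_p, payload and frame follow the naming in Mark's 
code quoted earlier in this thread; pkt_num is an assumed running packet counter, 
and <stdio.h> is assumed to be included for fprintf):

/* Log the memcpy parameters before the copy so the faulting call can be
 * identified.  PKT_UDP_SIZE(frame) includes the 8-byte UDP header, and the
 * application prepends its own 8-byte header, hence subtracting 16. */
size_t len = PKT_UDP_SIZE(frame) - 8 - 8;
fprintf(stderr, "pkt %lu: dst=%p src=%p len=%zu\n",
        (unsigned long)pkt_num, (void *)dest_p, (const void *)payload, len);
memcpy(dest_p, payload, len);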

HTH,
Dave

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/297C1709-AE9C-488D-9110-FD0832BF5951%40berkeley.edu.


Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-04-17 Thread David MacMahon
Hi, Mark,

Yeah packet sockets do require extra privileges.  The solution/workaround that 
Hashpipe uses is to install hashpipe with the suid bit set.  The init() 
functions of the threads will be called with the privileges of the suid user.  
Then hashpipe will drop the suid privileges before invoking the run() functions 
of the threads.  If you set up the packet sockets in the init() function, you 
can then use them in the run() functions.  It's not an ideal solution and could 
be considered a security hole, but given the limited and generally tightly 
controlled environments in which Hashpipe is typically used this is a working 
compromise.  The other option, as you indicated, is to do something with the 
CAP_NET_RAW privilege, but I've not explored how to utilize that.  My limited 
understanding of that is that it is for users rather than executables, but like 
I said I haven't explored that route so I'm not really sure what's possible 
there.  If you figure out something useful, please post here.
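
As a rough illustration of that init()/run() split (a generic sketch using the 
plain Linux AF_PACKET socket API rather than Hashpipe's pktsock helpers; the 
state struct and function names here are hypothetical):

#include <sys/types.h>
#include <sys/socket.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <arpa/inet.h>        /* htons() */

struct net_state {
    int sock_fd;
};

/* Runs while the process still has the suid user's privileges: opening a
 * raw packet socket requires root or CAP_NET_RAW. */
static int my_net_thread_init(struct net_state *st)
{
    st->sock_fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    return (st->sock_fd < 0) ? -1 : 0;
}

/* Runs after privileges have been dropped: the already-open socket can
 * still be read without any special privileges. */
static void my_net_thread_run(struct net_state *st)
{
    char buf[9000];
    ssize_t n = recv(st->sock_fd, buf, sizeof(buf), 0);
    (void)n;   /* packet processing would go here */
}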

Cheers,
Dave

> On Apr 17, 2020, at 14:25, Mark Ruzindana  wrote:
> 
> Hi all,
> 
> Hope you're doing fine. I was able to add packet sockets and the functions 
> provided by Hashpipe in hashpipe_pktsock.h, but I get permission issues when 
> trying to capture packets as a non-root user. 
> 
> The method I am trying to use to overcome this is owning the 
> plugins/executables as root and using the setuid flag to give root privileges 
> to hashpipe. At this point, I still get an 'operation not permitted' when 
> trying to open the socket. Then when trying to use the CAP_NET_RAW privilege 
> (setcap cap_net_raw=pe 'program'), I'm told that the operation is not 
> supported.
> 
> Just to be clear, I don't have any of these issues when running the process 
> as root, but I'd rather have non-root users running hashpipe. How were you 
> able to overcome the permission issues when trying to capture raw packets 
> with hashpipe as a non-root user? If you were running it as a non-root user.
> 
> Let me know whether you need any more information or whether I'm not stating 
> anything clearly.
> 
> Thanks a lot for the help.
> 
> Mark Ruzindana
> 
> On Tue, Mar 31, 2020 at 5:08 PM Mark Ruzindana  <mailto:ruziem...@gmail.com>> wrote:
> Thanks a lot for the quick responses John and David! I really appreciate it.
> 
> I will definitely be updating the version of Hashpipe that I currently have 
> on the server as well as ensure that the network tuning is good.
> 
> I'm currently using the standard "socket()" function, and a switch to packet 
> sockets, with the description that you gave, seems like it will definitely be 
> beneficial.
> 
> I also currently pin the threads to the desired cores with a "-c #" on the 
> command line, but thank you for mentioning it, I might have not been doing 
> so. The NUMA info is also very helpful. I'll make sure that the architecture 
> is as optimal as it should be.
> 
> Thanks again! This was very helpful and I'll update you with the progress 
> that I make.
> 
> Mark
> 
> 
> 
> 
> On Tue, Mar 31, 2020 at 4:38 PM David MacMahon  <mailto:dav...@berkeley.edu>> wrote:
> Just to expand on John's excellent tips, Hashpipe does lock its shared memory 
> buffers with mlock.  These buffers will have the NUMA node affinity of the 
> thread that created them so be sure to pin the threads to the desired core or 
> cores by preceding the thread names on the command line with a -c # (set 
> thread affinity to a single core) or -m # (set thread affinity to multiple 
> cores) option.  Alternatively (or additionally) you can run the entire hashpipe 
> process with numactl.  For example...
> 
> numactl --cpunodebind=1 --membind=1 hashpipe [...]
> 
> ...will restrict hashpipe and all its threads to run on NUMA node 1 and all 
> memory allocations will (to the extent possible) be made within memory that 
> is affiliated with NUMA node 1.  You can use various tools to find out which 
> hardware is associated with which NUMA node such as "numactl --hardware" or 
> "lstopo".  Hashpipe includes its own such utility: "hashpipe_topology.sh".
> 
> On NUMA (i.e. multi-socket) systems, each PCIe slot is associated with a 
> specific NUMA node.  It can be beneficial to have relevant peripherals (e.g. 
> NIC and GPU) be in PCIe slots that are on the same NUMA node.
> 
> Of course, if you have a single-socket mainboard, then all this NUMA stuff 
> is irrelevant. :P
> 
> Cheers,
> Dave
> 
>> On Mar 31, 2020, at 15:04, John Ford > <mailto:jmfor...@gmail.com>> wrote:
>> 
>> 
>> 
>> Hi Mark.  Since the newer version has a script called 
>> "hashpipe_irqaffinity.sh" I would t

Re: [casper] Red Pitaya Tutorial 1 reg

2020-04-02 Thread David MacMahon
Hi, Aravind,

> On Apr 2, 2020, at 17:19, Aravind Venkitasubramony  
> wrote:
> 
>  am stuck on the secure copy step mentioned in the tutorial 1 - "As per the 
> previous figure, navigate to the outputs folder and (secure)copy this across 
> to a test folder on the workshop server. Instructions to do this are 
> available here 
> ."
> 
> What is the "workshop server" mentioned here according to my setup? Do I copy 
> directly to rp-xx/katcp ? Can you please clarify that part?

I'm not a Red Pitaya expert, but I think that part of the instructions was 
written for a specific setup that was running at a workshop where the Red 
Pitayas were being shared between multiple participants.  I think all you need 
to do for your local setup is ensure that the fpg file gets to a machine which 
can communicate with the Red Pitaya using casperfpga (if it's not already on 
such a machine).

I'm sure others with more Red Pitaya experience can expand on or correct that 
as needed. :)

HTH,
Dave

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/346E510A-6B20-42B0-B743-B2BF3B87EE15%40berkeley.edu.


Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread David MacMahon
Just to expand on John's excellent tips, Hashpipe does lock its shared memory 
buffers with mlock.  These buffers will have the NUMA node affinity of the 
thread that created them so be sure to pin the threads to the desired core or 
cores by preceding the thread names on the command line with a -c # (set thread 
affinity to a single core) or -m # (set thread affinity to multiple cores) 
option.  Alternatively (or additionally) you can run the entire hashpipe process 
with numactl.  For example...

numactl --cpunodebind=1 --membind=1 hashpipe [...]

...will restrict hashpipe and all its threads to run on NUMA node 1 and all 
memory allocations will (to the extent possible) be made within memory that is 
affiliated with NUMA node 1.  You can use various tools to find out which 
hardware is associated with which NUMA node such as "numactl --hardware" or 
"lstopo".  Hashpipe includes its own such utility: "hashpipe_topology.sh".

On NUMA (i.e. multi-socket) systems, each PCIe slot is associated with a 
specific NUMA node.  It can be beneficial to have relevant peripherals (e.g. 
NIC and GPU) be in PCIe slots that are on the same NUMA node.

Of course, if you have a single-socket mainboard, then all this NUMA stuff is 
irrelevant. :P

Cheers,
Dave

> On Mar 31, 2020, at 15:04, John Ford  wrote:
> 
> 
> 
> Hi Mark.  Since the newer version has a script called 
> "hashpipe_irqaffinity.sh" I would think that the most expedient thing to do 
> is to upgrade to the newer version.  It's likely to fix some or all of this.
> 
> That said, there are a lot of things that you can check, and not only the irq 
> affinity, but also make sure that your network tuning is good, that your 
> network card irqs are attached to processes where the memory is local to that 
> processor, and that the hashpipe threads are mapped to processor cores that 
> are also local to that memory.   Sometimes it's counterproductive to map 
> processes to processor cores by themselves if they need data that is produced 
> by a different core that's far away, NUMA-wise.  And lock all the memory in 
> core with mlockall() or one of his friends.
> 
> Good luck with it!
> 
> John
> 
> 
> 
> 
> On Tue, Mar 31, 2020 at 12:09 PM Mark Ruzindana  > wrote:
> Hi all,
> 
> I am fairly new to asking questions on a forum so if I need to provide more 
> details, please let me know. 
> 
> Worth noting that just as I was about to send this out, I checked and I don't 
> have the most recent version of HASHPIPE with hashpipe_irqaffinity.sh among 
> other additions and modifications. So this might fix my problem, but maybe 
> not and someone else has more insight. I will update everyone if it does.
> 
> I am trying to reduce the number of packets lost/dropped when running 
> HASHPIPE on a 32 core RHEL 7 server. I have run enough tests and diagnostics 
> to be confident that the problem is not any HASHPIPE thread running for too 
> long. Also, the percentage of packets dropped on any given scan is between 
> about 0.3 and 0.8%. Approx. 5,000 packets in a 30 second scan with a total of 
> 1,650,000 packets. So while it's a small percentage, the number of packets 
> lost is still quite large. I have also done enough tests with 'top', 'iostat' 
> as well as timing HASHPIPE in between time windows where there are no packets 
> dropped to diagnose the issue further. I (as well as my colleagues) have come 
> to the conclusion that the kernel is allowing processes to interrupt HASHPIPE 
> as it is running. 
> 
> So I have researched and run tests involving 'niceness' and I am currently 
> trying to configure smp affinities and irq balancing, but the changes that I 
> make to the smp_affinity files aren't doing anything. My plan was to have the 
> interrupts run on the 20 cores that aren't being used by HASHPIPE. Also, 
> disabling 'irqbalance' didn't do anything either. I also restarted the 
> machine to see whether the changes made are permanent, but the system reverts 
> back to what it was.
> 
> I might be missing something, or trying the wrong things. Has anyone 
> experienced this? And could you point me in the right direction if you have 
> any insight?
> 
> If you need anymore details, please let me know. I didn't add as much as I 
> could because I wanted this to be a reasonably sized message.
> 
> Thanks,
> 
> Mark Ruzindana
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu " group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu 
> .
> To view this discussion on the web visit 
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CA%2B41hpxcwSQT-EsjuyqXpGmmBzykDeLt6JbfUUg_ZYpkXyat2w%40mail.gmail.com
>  
> 

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-03-31 Thread David MacMahon
Hi, Mark,

That packet rate should be very manageable.  Are you using the standard 
"socket()" and "recv()" functions or are you using packet sockets?  Packet 
sockets are a more efficient way to get packets from the kernel that bypasses 
the kernel's IP stack.  It's not as efficient as IBVerbs or DPDK, but it is 
widely supported and should be more than adequate for the packet/data rates you 
are dealing with.  Hashpipe has functions that make it easy to work with packet 
sockets by providing a somewhat higher level interface to them.  If your 
version of Hashpipe doesn't have a "hashpipe_pktsock.h" then you should update 
for sure.

HTH,
Dave

> On Mar 31, 2020, at 12:09, Mark Ruzindana  wrote:
> 
> Hi all,
> 
> I am fairly new to asking questions on a forum so if I need to provide more 
> details, please let me know. 
> 
> Worth noting that just as I was about to send this out, I checked and I don't 
> have the most recent version of HASHPIPE with hashpipe_irqaffinity.sh among 
> other additions and modifications. So this might fix my problem, but maybe 
> not and someone else has more insight. I will update everyone if it does.
> 
> I am trying to reduce the number of packets lost/dropped when running 
> HASHPIPE on a 32 core RHEL 7 server. I have run enough tests and diagnostics 
> to be confident that the problem is not any HASHPIPE thread running for too 
> long. Also, the percentage of packets dropped on any given scan is between 
> about 0.3 and 0.8%. Approx. 5,000 packets in a 30 second scan with a total of 
> 1,650,000 packets. So while it's a small percentage, the number of packets 
> lost is still quite large. I have also done enough tests with 'top', 'iostat' 
> as well as timing HASHPIPE in between time windows where there are no packets 
> dropped to diagnose the issue further. I (as well as my colleagues) have come 
> to the conclusion that the kernel is allowing processes to interrupt HASHPIPE 
> as it is running. 
> 
> So I have researched and run tests involving 'niceness' and I am currently 
> trying to configure smp affinities and irq balancing, but the changes that I 
> make to the smp_affinity files aren't doing anything. My plan was to have the 
> interrupts run on the 20 cores that aren't being used by HASHPIPE. Also, 
> disabling 'irqbalance' didn't do anything either. I also restarted the 
> machine to see whether the changes made are permanent, but the system reverts 
> back to what it was.
> 
> I might be missing something, or trying the wrong things. Has anyone 
> experienced this? And could you point me in the right direction if you have 
> any insight?
> 
> If you need anymore details, please let me know. I didn't add as much as I 
> could because I wanted this to be a reasonably sized message.
> 
> Thanks,
> 
> Mark Ruzindana
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu 
> .
> To view this discussion on the web visit 
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CA%2B41hpxcwSQT-EsjuyqXpGmmBzykDeLt6JbfUUg_ZYpkXyat2w%40mail.gmail.com
>  
> .

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/C3EFBF88-75DA-477C-A28A-D3235996E0FB%40berkeley.edu.


Re: [casper] Use of xBlock for block scripting

2019-08-30 Thread David MacMahon
I could be wrong, but as I recall, the block diagrams created via xBlocks were 
(at least at the time) not very conducive to visualizing the structure of the 
diagrams (e.g. lots of lines/traces overlaid on top of each other).  This was 
not a serious problem when everything worked as it should, but it was a 
nightmare when it didn't.

Dave

> On Aug 30, 2019, at 15:21, Dan Werthimer  wrote:
> 
> 
> 
> i don't know much about xblocks, so can't add much to jack's comments, 
> except: 
> 
> about a dozen years ago chris dick and others at xilinx recommended casper 
> use xblocks, 
> so hong chen tried it out, and ported several of the casper dsp blocks. 
> i think it worked well,  and hong chen liked xblocks, but it didn't catch on 
> in the casper community. 
> not sure why. 
> 
> best wishes,
> 
> dan
> 
> 
> 
> 
> On Fri, Aug 30, 2019 at 1:39 PM Jack Hickish  > wrote:
> Hi Franco,
> 
> I don't think there's any reason not to use xblocks. Someone can
> correct me if I'm wrong.
> 
> Several years ago there was a quest to move the whole casper library
> to xblocks -- https://github.com/casper-astro/xblocks_devel/ 
>  -- but it
> never seemed to get traction and the original libraries won over. I
> suspect had the project been more aggressive about just replacing the
> casper libraries it would have caught on.
> 
> Cheers
> Jack
> 
> On Thu, 29 Aug 2019 at 12:34, Franco  > wrote:
> >
> > Dear Casperites,
> >
> > I've recently been playing around with the creation of block libraries and 
> > I found out about Xilinx's API for programmatic model creation (xBlock). I 
> > find it particularly convenient because you don't have to explicitly 
> > position the blocks, as the software does all the positioning for you, and 
> > from what I tested the results are pretty nice.
> >
> > However when I checked in the CASPER library, only a few blocks are created 
> > using xBlock, and moreover, some blocks were re-implemented from xBlock to 
> > standard Matlab block scripting.
> >
> > So my question is: is there any reason why I should avoid using xBlock? The 
> > only inconvenience I have had with it so far is that you have to install 
> > some additional libraries in Linux to make it work, which I didn't find 
> > documented anywhere.
> >
> > Thanks,
> >
> > Franco Curotto
> >
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "casper@lists.berkeley.edu " group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to casper+unsubscr...@lists.berkeley.edu 
> > .
> > To view this discussion on the web visit 
> > https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CACBfcEmKHeDGr5ny5HH2ruqvHgYr-VbNbyyPjpJz5eR1C-JaRA%40mail.gmail.com
> >  
> > .
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu " group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu 
> .
> To view this discussion on the web visit 
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CAG1GKSn%2Bgy-f4HFRXYsNAoheHDJxYA_EzFcXyxUgwnM_STxWtA%40mail.gmail.com
>  
> .
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu 
> .
> To view this discussion on the web visit 
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CAGHS_vGWWsG8y9RShzBaji3-3B5wD8Gd6K-gdOL-_jwsR8j%3DHg%40mail.gmail.com
>  
> .

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/C230AA84-F656-4536-B101-5E07C3B391BD%40berkeley.edu.


Re: [casper] ROACH Network is unreachable

2019-06-20 Thread David MacMahon
Assuming you want to use DHCP, what happens if you comment out (or remove) the 
“iface eth0 inet static” line (and the “address” and “netmask” lines)?
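
That is, a DHCP-only /etc/network/interfaces stanza would look something like this 
(assuming eth0 is the interface name, as in the file quoted below):

auto eth0
iface eth0 inet dhcp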

> On Jun 20, 2019, at 18:17, zhang laiyu  wrote:
> 
>   Thank you.
>   The contents of /etc/network/interfaces:
> 
>   auto eth0
>   iface eth0 inet dhcp
>   iface eth0 inet static
>   address 10.0.0.21
>   netmask 255.255.255.0
> 
>   When I try to start the network, I get some warnings:
> 
> roach:~# /etc/init.d/networking restart
> Reconfiguring network interfaces...ifdown: failed to open statefile /etc/ne
> ifup: failed to open statefile /etc/network/run/ifstate: Stale NFS file hae
> failed. 
> 
>   I also used this:
>auto eth0
>   #iface eth0 inet dhcp
>   iface eth0 inet static
>   address 10.0.0.21
>   netmask 255.255.255.0
> 
>   But it still cannot start the network.
>   If I run dhclient I can get an IP address.
> 
> 
>> -Original Messages-
>> From: "David MacMahon" 
>> Sent Time: 2019-06-20 22:26:41 (Thursday)
>> To: casper@lists.berkeley.edu
>> Cc: "zhang laiyu" , jackhick...@gmail.com
>> Subject: Re: [casper] ROACH Network is unreachable
>> 
>> To further Marc’s query: when you mmcboot into the broken system, what are 
>> the contents of /etc/network/interfaces?
>> 
>> Dave
>> 
>>>> On Jun 20, 2019, at 04:06, Marc  wrote:
>>>> 
>>>> On 6/20/19, zhang laiyu  wrote:
>>>> Hi Marc, Jack
>>>>   I made some progress but it is not solved.
>>>>   I booted the ROACH by 'run mmcboot' and did not get an IP address. I
>>>> then logged in to the ROACH as root and tried issuing the commands:
>>>>   dhclient -r
>>>>   dhclient
>>>>   ifconfig
>>>>   Then the ROACH was assigned an IP address, and I can use telnet to log in
>>>> to the ROACH.
>>>>   But when I reboot the ROACH, the ROACH network is still unreachable. I have
>>>> to issue dhclient again.
>>>>   I also opened two ports (TCP port 53 and UDP port 67) on the server, but
>>>> it still does not work.
>>>>   I think that the DHCP connection doesn't work during boot, but I do not
>>>> know the reason.
>>> 
>>> So if you boot into the broken system and connect via serial cable
>>> what does the output of
>>> 
>>> /sbin/ifconfig eth0
>>> 
>>> say ? Is there no IP address configured, or is it set to the wrong one
>>> ? If it is set incorrectly there may be some startup script which has
>>> some old/stale values set.
>>> You could try to add a
>>> 
>>> set -x
>>> 
>>> to some of the startup scripts, so that they echo the commands they execute
>>> 
>>> regards
>>> 
>>> marc
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "casper@lists.berkeley.edu" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to casper+unsubscr...@lists.berkeley.edu.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CAGrhWaQPV8NBHcrvV8rQaUGdyQZJTFcW5aG1Vyujrn0HcBr7xg%40mail.gmail.com.
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "casper@lists.berkeley.edu" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to casper+unsubscr...@lists.berkeley.edu.
>> To view this discussion on the web visit 
>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/F2389975-B9E7-4BA6-B797-9D2421624736%40berkeley.edu.
> 
> 
> --
> Cheers!
>> 
> ZHANG Laiyu   
> Phone(China)   010-88236415   
> Cellphone(China)   13681385567
> E-mail:zhan...@ihep.ac.cn
> Address:   19B Yuquan Road,Shijingshan District,Beijing,China
> Department:Center for Particle Astrophysics 
> Office:Astrophysics Building 205Institute of High Energy Physics, 
> CAS  
> web: 
> http://www.ihep.cas.cn>
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu.
> To view this discussion on the web visit 
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/308856dd.3900.16b779c8676.Coremail.zhangly%40ihep.ac.cn.

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/98E2F9D7-40C1-4823-822A-D8D0C253%40berkeley.edu.


Re: [casper] ROACH Network is unreachable

2019-06-20 Thread David MacMahon
To further Marc’s query: when you mmcboot into the broken system, what are the 
contents of /etc/network/interfaces?

Dave

> On Jun 20, 2019, at 04:06, Marc  wrote:
> 
>> On 6/20/19, zhang laiyu  wrote:
>> Hi Marc, Jack
>> I made some progress but it is not solved.
>> I booted the ROACH by 'run mmcboot' and did not get an IP address. I
>> then logged in to the ROACH as root and tried issuing the commands:
>> dhclient -r
>> dhclient
>> ifconfig
>> Then the ROACH was assigned an IP address, and I can use telnet to log in
>> to the ROACH.
>> But when I reboot the ROACH, the ROACH network is still unreachable. I have
>> to issue dhclient again.
>> I also opened two ports (TCP port 53 and UDP port 67) on the server, but
>> it still does not work.
>> I think that the DHCP connection doesn't work during boot, but I do not
>> know the reason.
> 
> So if you boot into the broken system and connect via serial cable
> what does the output of
> 
> /sbin/ifconfig eth0
> 
> say ? Is there no IP address configured, or is it set to the wrong one
> ? If it is set incorrectly there may be some startup script which has
> some old/stale values set.
> You could try to add a
> 
> set -x
> 
> to some of the startup scripts, so that they echo the commands they execute
> 
> regards
> 
> marc
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu.
> To view this discussion on the web visit 
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CAGrhWaQPV8NBHcrvV8rQaUGdyQZJTFcW5aG1Vyujrn0HcBr7xg%40mail.gmail.com.

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/F2389975-B9E7-4BA6-B797-9D2421624736%40berkeley.edu.


Re: [casper] Data-Width Conversion in FIFO

2019-04-26 Thread David MacMahon
I thought that System Generator included a FIFO block with different input 
and output widths. As I recall, the only tricky parts were ensuring that the 
overall input/output throughputs are commensurate (true for any FIFO, really) 
and, for cases with input 2x wider than output, ensuring that the input is 
“half-word swapped” appropriately so that the output stream is properly ordered.

Maybe I’m thinking of the dual port BRAM block that can easily be coerced to 
act as a FIFO?

HTH (despite the lack of actionable details),
Dave

> On Apr 25, 2019, at 14:28, Jack Hickish  wrote:
> 
> Hi Indrajit,
> 
> I'm surprised that the Xilinx FIFO block doesn't give the option of
> having ports of two different widths. However, if it doesn't, the
> easiest thing to do might be to use a dual port RAM, which does allow
> the two interfaces to have different widths. If you can explain a bit
> more about what you're trying to achieve someone may already have a
> solution (for example, lots of designs have logic to turn N-bit data
> streams into 64-bit streams which can be used to feed the 10GbE
> block).
> 
> Cheers
> Jack
> 
>> On Thu, 25 Apr 2019 at 06:53, Indrajit Barve  wrote:
>> 
>> Hello all,
>> 
>> I would like to implement a FIFO with an input port depth and width 
>> of 2048 x 32 and an output port of 1024 x 64. Basically I am looking for a 
>> module similar to this: 
>> https://www.xilinx.com/support/documentation/application_notes/xapp261.pdf 
>> or for how to implement/configure data-width conversion for a FIFO on ROACH1.
>> 
>> Thanks
>> Indrajit
>> 
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "casper@lists.berkeley.edu" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to casper+unsubscr...@lists.berkeley.edu.
>> To post to this group, send email to casper@lists.berkeley.edu.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu.
> To post to this group, send email to casper@lists.berkeley.edu.

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To post to this group, send email to casper@lists.berkeley.edu.


Re: [casper] adc16x250-8 duplicated samples

2019-04-03 Thread David MacMahon
The adc16 scripts that you are using will work with the adc16_test design (all 
adc16 designs really).  The adc16 yellow block includes built-in snapshot 
functionality so all adc16 designs can get ADC snapshots using the built-in 
snapshot functionality.  The adc16_test design has an extra (much larger) 
snapshot block, but you don't have to use it.

HTH,
Dave

> On Apr 3, 2019, at 14:27, Franco  wrote:
> 
> Yes, it happens on all 16 inputs. Unfortunately, I don't have another adc 
> board. I have another ROACH2 but it is used for the moment. I'll try testing 
> in the other ROACH2 when it gets available.
> I notice that there is a sample compiled bof in: 
> https://casper.ssl.berkeley.edu/wiki/ADC16x250-8_coax_rev_2 
> <https://casper.ssl.berkeley.edu/wiki/ADC16x250-8_coax_rev_2> , maybe I could 
> try testing that model to see if it is a problem with my compilation tools, 
> but I haven't found the script that performs the data acquisition to the PC. 
> Does such a script exist?
> 
> Thanks, 
> 
> Franco
> 
> On Wed, Apr 3, 2019 at 4:52 PM David MacMahon  <mailto:dav...@berkeley.edu>> wrote:
> Does this symptom appear on all 16 inputs?  Do you have another adc16x250-8 
> card and/or another ROACH2 you could try instead?
> 
> Dave
> 
>> On Apr 3, 2019, at 11:51, Franco > <mailto:francocuro...@gmail.com>> wrote:
>> 
>> Hi Jack,
>> 
>> To answer your questions:
>> - Yes I'm using the latest version of mlib_devel, roach2 branch.
>> - No, the initialization script doesn't suggest any error. Here is the 
>> script output:
>> 
>> Connecting to 192.168.1.13...
>> Programming 192.168.1.13 with adc16snap.bof.gz...
>> Design built for ROACH2 rev2 with 4 ADCs (ZDOK rev2)
>> Gateware supports demux modes (using demux by 1)
>> Resetting ADC, power cycling ADC, and reprogramming FPGA...
>> ZDOK0 clock OK
>> Calibrating SERDES blocks...ABCD
>> SERDES calibration successful.
>> Selecting analog inputs...
>> Using default digital gain of 1...
>> Done!
>> 
>> - I plotted the adc output with adc16_plot_chans.rb, and it seems to present 
>> the same data duplication. Here is an image of the plot:
>> 
>> 
>> - Yes, the User IP Clock is correctly set to adc0_clk.
>> 
>> I also tried with a different model and a different clock rate 
>> (200MHz/MSPS), and using the initialization code from here: 
>> http://w.astro.berkeley.edu/~davidm/gems/ 
>> <http://w.astro.berkeley.edu/~davidm/gems/> and I had the same result. It 
>> seems the I stumbled into some mysterious behavior of the ADC board. Has 
>> anybody else experienced this behavior?
>> 
>> Thanks,
>> 
>> Franco
>> 
>> 
>> On Wed, Apr 3, 2019 at 2:02 PM Jack Hickish > <mailto:jackhick...@gmail.com>> wrote:
>> Hi Franco,
>> 
>> Sorry, I missed your note in your first email where you already said you 
>> used the ruby init code with demux 1. This is correct for the 16-input 
>> configuration you are using. In this configuration, the FPGA and ADC should 
>> both run at the same clock rate.
>> If you put in a 140MHz clock and est_brd_clk() returns ~140 that is a good 
>> sign.
>> I assume you're using the latest roach2 branch on 
>> github.com/casper-astro/mlib_devel 
>> <http://github.com/casper-astro/mlib_devel>?
>> Does the ruby initialization script suggest anything is wrong in the 
>> initialization? 
>> In the same repository as the adc16_init script, there is also a script to 
>> plot adc outputs -- adc16_plot_chans.rb . Does this give the same sample 
>> duplication you see in your snapshots?
>> I assume that the User IP Clock source in your design is correctly set to 
>> adc0? (not user_clk/sys_clk?)
>> 
>> 
>> Thats all I can think to check right now...
>> 
>> Good luck!
>> Jack
>> 
>> On Wed, 3 Apr 2019 at 16:57, Franco > <mailto:francocuro...@gmail.com>> wrote:
>> Hi David and Jack,
>> 
>> Interesting. Yes I'm using a 140MHz clock (I'm injecting a 140MHz tone into 
>> the adc clock input). I'm sure the FPGA is running at 140MHz because I 
>> checked it with fpga.est_brd_clk(). Also, the data duplication occurs for 
>> all 16 inputs, so my guess is that it is a problem at the adc board level. I'm 
>> using the adc_init.rb code with the '--demux 1' flag (I understand that this 
>> is the 16-input mode); however, I copied this code from someone else, so maybe 
>> it is an old version. I'll try to use the latest version to see if that is the 
>> problem. I'll also try a different (valid) samplin

Re: [casper] adc16x250-8 duplicated samples

2019-04-03 Thread David MacMahon
Does this symptom appear on all 16 inputs?  Do you have another adc16x250-8 
card and/or another ROACH2 you could try instead?

Dave

> On Apr 3, 2019, at 11:51, Franco  wrote:
> 
> Hi Jack,
> 
> To answer your questions:
> - Yes I'm using the latest version of mlib_devel, roach2 branch.
> - No, the initialization script doesn't suggest any error. Here is the script 
> output:
> 
> Connecting to 192.168.1.13...
> Programming 192.168.1.13 with adc16snap.bof.gz...
> Design built for ROACH2 rev2 with 4 ADCs (ZDOK rev2)
> Gateware supports demux modes (using demux by 1)
> Resetting ADC, power cycling ADC, and reprogramming FPGA...
> ZDOK0 clock OK
> Calibrating SERDES blocks...ABCD
> SERDES calibration successful.
> Selecting analog inputs...
> Using default digital gain of 1...
> Done!
> 
> - I plotted the adc output with adc16_plot_chans.rb, and it seems to present 
> the same data duplication. Here is an image of the plot:
> 
> 
> - Yes, the User IP Clock is correctly set to adc0_clk.
> 
> I also tried with a different model and a different clock rate (200MHz/MSPS), 
> and using the initialization code from here: 
> http://w.astro.berkeley.edu/~davidm/gems/ 
> <http://w.astro.berkeley.edu/~davidm/gems/> and I had the same result. It 
> seems that I stumbled into some mysterious behavior of the ADC board. Has 
> anybody else experienced this behavior?
> 
> Thanks,
> 
> Franco
> 
> 
> On Wed, Apr 3, 2019 at 2:02 PM Jack Hickish  <mailto:jackhick...@gmail.com>> wrote:
> Hi Franco,
> 
> Sorry, I missed your note in your first email where you already said you used 
> the ruby init code with demux 1. This is correct for the 16-input 
> configuration you are using. In this configuration, the FPGA and ADC should 
> both run at the same clock rate.
> If you put in a 140MHz clock and est_brd_clk() returns ~140 that is a good 
> sign.
> I assume you're using the latest roach2 branch on 
> github.com/casper-astro/mlib_devel 
> <http://github.com/casper-astro/mlib_devel>?
> Does the ruby initialization script suggest anything is wrong in the 
> initialization? 
> In the same repository as the adc16_init script, there is also a script to 
> plot adc outputs -- adc16_plot_chans.rb . Does this give the same sample 
> duplication you see in your snapshots?
> I assume that the User IP Clock source in your design is correctly set to 
> adc0? (not user_clk/sys_clk?)
> 
> 
> Thats all I can think to check right now...
> 
> Good luck!
> Jack
> 
> On Wed, 3 Apr 2019 at 16:57, Franco  <mailto:francocuro...@gmail.com>> wrote:
> Hi David and Jack,
> 
> Interesting. Yes I'm using a 140MHz clock (I'm injecting a 140MHz tone into 
> the adc clock input). I'm sure the FPGA is running at 140MHz because I 
> checked it with fpga.est_brd_clk(). Also, the data duplication occurs for all 
> 16 inputs, so my guess is that is a problem at the adc board level. I'm using 
> the adc_init.rb code with the '--demux 1' flag (I understand that this is the 
> 16 in mode), however I copied this code from someone else, so maybe is an old 
> version. I'll try to use the latest version to see if that is the problem. 
> I'll also try a different (valid) sampling frequency.
> 
> Thanks for the suggestions,
> 
> Franco
> 
> On Wed, Apr 3, 2019 at 9:07 AM Jack Hickish  <mailto:jackhick...@gmail.com>> wrote:
> Hi Franco,
> 
> In addition to Dave's advice-- how are you configuring your board? After 
> programming the FPGA, you'll need to appropriately configure the ADC to 
> operate in the right mode. The code seems to be linked here -- 
> https://casper.ssl.berkeley.edu/wiki/ADC16x250-8_coax_rev_2 
> <https://casper.ssl.berkeley.edu/wiki/ADC16x250-8_coax_rev_2>
> 
> Cheers
> Jack
> 
> On Wed, 3 Apr 2019 at 00:53, Franco  <mailto:francocuro...@gmail.com>> wrote:
> Hi Casperites,
> 
> I'm working on a project that uses a ROACH2 and an adc16x250-8 ADC board. 
> When I check the raw data from the ADC using a snapshot block I see this 
> weird effect where two consecutive samples have always the same value, as 
> shown in this image:
> https://my.pcloud.com/publink/show?code=XZMRx67Zpj7XjnkE5PypVuuDCB9Mhu8IJJ37 
> <https://my.pcloud.com/publink/show?code=XZMRx67Zpj7XjnkE5PypVuuDCB9Mhu8IJJ37>
> 
> According to an ex-coworker, this is the expected behavior of the adc16x250-8 
> board in 16 input mode, because of some constraints in the communication 
> between the ADC and the FPGA, the FPGA must run at twice the speed to 
> correctly receive the sampled data. However, I couldn't find any explicit 
> mention of this phenomenon in the CASPER website or mailing list. Can someone 
> confir

Re: [casper] adc16x250-8 duplicated samples

2019-04-02 Thread David MacMahon
Hi, Franco,

Are you trying to use a 140 MHz sample clock?  140 Msps operation in 16 input 
mode is supported.  It should be very similar to how we ran this ADC board for 
the PAPER correlator (200 Msps in 16 input mode).  We did not have duplicate 
samples, so that sounds kind of strange to me.

FWIW, there are some frequency limitations based on the MMCM limitations in the 
Virtex 6 FPGA on the ROACH2.  These are described here:

https://casper.ssl.berkeley.edu/wiki/ADC16x250-8#ADC16_Sample_Rate_vs_Virtex-6_MMCM_Limitations
 
<https://casper.ssl.berkeley.edu/wiki/ADC16x250-8#ADC16_Sample_Rate_vs_Virtex-6_MMCM_Limitations>

The text reads somewhat cryptically, but if you read it in conjunction with the 
Virtex 6 MMCM documentation and the HMCAD1511 documentation, it will hopefully 
be clear.

HTH,
Dave

> On Apr 2, 2019, at 15:53, Franco  wrote:
> 
> Hi Casperites,
> 
> I'm working on a project that uses a ROACH2 and an adc16x250-8 ADC board. 
> When I check the raw data from the ADC using a snapshot block I see this 
> weird effect where two consecutive samples have always the same value, as 
> shown in this image:
> https://my.pcloud.com/publink/show?code=XZMRx67Zpj7XjnkE5PypVuuDCB9Mhu8IJJ37 
> <https://my.pcloud.com/publink/show?code=XZMRx67Zpj7XjnkE5PypVuuDCB9Mhu8IJJ37>
> 
> According to an ex-coworker, this is the expected behavior of the adc16x250-8 
> board in 16 input mode, because of some constraints in the communication 
> between the ADC and the FPGA, the FPGA must run at twice the speed to 
> correctly receive the sampled data. However, I couldn't find any explicit 
> mention of this phenomenon in the CASPER website or mailing list. Can someone 
> confirm this is the correct behavior so I can get peace of mind :)?
> 
> Some info of my test:
> - Board: ROACH2-rev2
> - ADC: ADC16x250-8 coax rev2
> - ADC mode: 16 inputs (demux 1, using David Macmahon initalization code)
> - User IP Clock Rate: 140 MHz
> - Actual clock frequency used in the adc board: 140MHz
> 
> Thanks,
> 
> Franco Curotto
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu 
> <mailto:casper+unsubscr...@lists.berkeley.edu>.
> To post to this group, send email to casper@lists.berkeley.edu 
> <mailto:casper@lists.berkeley.edu>.

-- 
You received this message because you are subscribed to the Google Groups 
"casper@lists.berkeley.edu" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to casper+unsubscr...@lists.berkeley.edu.
To post to this group, send email to casper@lists.berkeley.edu.


Re: [EXTERNAL] [casper] seeking high accuracy GPS disciplined time/frequency standards ?

2019-03-13 Thread David MacMahon
Hi, Dan,

Thanks for summarizing all the great suggestions from the collective wisdom of 
the CASPER mailing list!

I think getting super precise wrt some external time reference between multiple 
observatories is an admirable goal, but I think it's also important to keep the 
ultimate goal in mind: coincident detection of transient events at astronomical 
distances.  No matter how good you think you're doing with the external time 
reference synchronization, you will never really know you got it right until 
you can demonstrate coincident detection at multiple sites of a known 
astronomical transient event.  I'm not sure what kinds of events your 
instrument is sensitive to, but something like the occultation of a star by an 
asteroid or other solar system object would be very convincing.  Tracking the 
movement of the limb of the moon across your detector arrays would perhaps be 
another possibility.  Not only would these demonstrations be a very reassuring 
confirmation that you've nailed the time synchronization problem, they would 
also provide a nice way to measure/calibrate the residual errors.

Fun stuff!!!

Dave

> On Mar 12, 2019, at 20:17, Dan Werthimer  wrote:
> 
> 
> dear casper time keepers, 
> 
> thanks for all your good ideas on how to get a pair of time/freq standards 
> 500 km apart to agree to a few ns. 
> 
> some of you suggested common view GPS, which can get clocks to agree to 
> within about 2 or 3 ns: 
> https://www.nist.gov/pml/time-and-frequency-division/atomic-standards/common-view-gps-time-transfer
>  
> 
> also see papers at the bottom of that page.
> 
> some suggested NIST TMAS service, (time-measurement and analysis service),
> where you pay a monthly fee for NIST to keep your GPSDO to < 10ns RMS wrt 
> NIST-UTC. 
> https://www.nist.gov/programs-projects/time-measurement-and-analysis-service-tmas
>  
> 
>   
> 
> i like john's suggestion of letting the oscillator drift, rather than 
> continually trying to correct it.
> i think several observatories do that -  makes sense.
> 
> best wishes,
> 
> dan
> 
> 
> 
> 
>  
> 
> From: John Ford <jmfor...@gmail.com> 
> Sent: Monday, March 11, 2019 9:41 AM
> To: LIST <casper@lists.berkeley.edu>
> Subject: Re: [EXTERNAL] [casper] seeking high accuracy GPS disciplined 
> time/frequency standards ?
> 
>  
> 
> Hi Dan.  I've been thinking about this a bit over the weekend, and I think 
> the problem can be solved by dividing the problem.  I think the frequency 
> standard should not be coupled to GPS, rather a free-running rubidium or 
> better oscillator could provide sufficient frequency stability and could also 
> generate a local clock.  The CLOCK, then can be either referenced or 
> calibrated somehow (postprocessed GPS time + ?) to UTC.  The local clocks at 
> the N sites then would not be slaved to GPS, but rather to their own 
> frequency standard, corrected as needed by filtered/corrected GPS time.
> 
>  
> 
> Green Bank does this to some extent, without closing the loop.  The hydrogen 
> maser drives a counter, which provides 1 PPS.  But this one PPS is NOT slaved 
> to GPS, but rather the GPS signal is used to measure the offset from GPS.  
> The offset is then supplied to the users for their calculations.  The system 
> drifts some microseconds per year.  Eventually the time gets resynced to GPS, 
> when someone decides it's time, or when something happens to force it.  Like 
> a power/UPS outage.
> 
>  
> 
> I think you could build such a system that would be accurate to << 10 ns with 
> respect to UTC using GPS and doing corrections to the GPS TIME after the 
> fact, and feeding forward the errors into your clock.  The corrections would 
> be a day late, but presumably you are not going to rely on raw GPS, rather on 
> the time-corrected GPS time.  In the meantime your clock will freewheel with 
> your oscillator, which will be extremely accurate anyway.
> 
>  
> 
> John
> 
>  
> 
>   
> 
>  
> 
> On Sat, Mar 9, 2019 at 3:08 PM Dan Werthimer  > wrote:
> 
>  
> 
> hi david, 
> 
>  
> 
> i'm not an expert in atmospheric delay correction and gps,  but if you are 
> interested, 
> 
> i think there are several papers about what corrections gps can do and what 
> it can't do. 
> 
> some references are listed at the end of: 
> 
> https://en.wikipedia.org/wiki/Error_analysis_for_the_Global_Positioning_System
>  
> 
>  
> 
> the best GPSDO's on the market provide errors of around 10 ns RMS wrt UTC. 
> 
> i think most of this error is due to variable atmospheric delays that can not 
> be removed
> 
> (eg:  dispersion errors 

Re: [casper] Timestamp in ROACH2 and PTP

2019-03-11 Thread David MacMahon
Hi, Franco,

> On Mar 11, 2019, at 07:37, Franco  wrote:
> 
> why you call "ARM" the signal/register that resets the time-tracking counter?

One of the many definitions of the verb “to arm” is...

to equip or prepare for any specific purpose or effective use: 
to arm a security system; to arm oneself with persuasive arguments.

In the CASPER case, it means “to prepare for the next external sync pulse 
(usually a 1 PPS signal) to reset the timebase counter”.  As I’m sure you’ll 
agree, it would be problematic if the counter reset on every 1 PPS pulse.
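
For illustration, the arm-then-sync sequence from the control software typically 
looks something like the sketch below.  This is only a minimal sketch using 
casperfpga's write_int; the hostname and the register name "arm" are assumptions 
about your particular design, so adjust them to whatever your model exposes.

    import time
    import casperfpga

    fpga = casperfpga.CasperFpga('roach2-hostname')  # hypothetical hostname

    # A rising edge on the (assumed) "arm" software register arms the sync logic.
    fpga.write_int('arm', 0)
    fpga.write_int('arm', 1)

    # The timebase counter will now reset on the *next* 1 PPS edge only, so the
    # integer second after arming is the epoch to associate with count == 0.
    sync_epoch = int(time.time()) + 1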

HTH,
Dave



Re: [casper] ROACH2 10GbE Multicasting

2019-01-28 Thread David MacMahon
Hi, Luke,

I have not used a ROACH2 to send multicast packets, but the MeerKAT folks 
certainly have.  My understanding is that the 10 GbE block does the proper 
mapping of multicast destination IP address to multicast destination MAC 
address so it should "just work".

I recommend starting with a direct connection between a ROACH2 and a PC.  That 
way the PC's NIC will get the packets without having to interact with an 
intervening switch.  That will let you examine the packets using tcpdump or 
wireshark.  To get the kernel to pass the multicast packets up the network 
stack, I think you will still need to use the IP_ADD_MEMBERSHIP socket option 
(see "man 7 ip" for details).  When you have that working, then you can add a 
switch into the path.  The IP_ADD_MEMBERSHIP option will inform the switch (via 
IGMP) of the multicast group (i.e. address) that you wish to receive packets 
for.
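
For reference, a minimal receive-side sketch in Python is below.  The group 
address and UDP port are placeholders for whatever your ROACH2 design actually 
transmits to; the IP_ADD_MEMBERSHIP call is the part that generates the IGMP 
membership report mentioned above.

    import socket
    import struct

    MCAST_GRP = '239.1.2.3'   # placeholder multicast group
    MCAST_PORT = 60000        # placeholder UDP destination port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('', MCAST_PORT))

    # Join the group; this is what makes the kernel pass the packets up the
    # stack and what informs an IGMP-snooping switch of the membership.
    mreq = struct.pack('4sl', socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    data, addr = sock.recvfrom(9000)   # one (possibly jumbo) packet
    print(len(data), addr)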

HTH,
Dave

> On Jan 28, 2019, at 08:08, Luke Hawkins  wrote:
> 
> CASPERites,
> 
> Has anybody been using ROACH2s to transmit multicast packets?
> 
> I found the following two threads on the mailing archive, and was
> wondering if the 10GbE blocks had been updated since then in ways that
> could impact multicasting, or if anybody had any comments on preferred
> IGMP snooping switches, etc...
> 
> https://www.mail-archive.com/casper@lists.berkeley.edu/msg06132.html
> https://www.mail-archive.com/casper@lists.berkeley.edu/msg04527.html
> 
> -Luke Hawkins
> 



Re: [casper] 10 GBE Network Slowdown with Ubuntu 18.04

2018-10-03 Thread David MacMahon
Hi, Dale,

Is this a multi-socket system?  If so, are you using "numactl" or "taskset" to 
bind the packet reading processes to CPU(s) on the same socket that the NIC is 
connected to?  Are you sure you are sending the NIC interrupts to CPU(s) on the 
socket that the NIC is connected to?

FWIW, the Hashpipe program includes a script (hashpipe_topology.sh) that will 
summarize the NUMA topology of a system vis a vis network cards and/or GPUs.

https://github.com/david-macmahon/hashpipe/blob/master/src/hashpipe_topology.sh
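
If you prefer to do the pinning from Python rather than with numactl/taskset, a 
rough sketch is below.  It assumes Linux sysfs paths and uses "eth2" purely as a 
placeholder interface name; hashpipe_topology.sh remains the easier way to just 
inspect the topology.

    import os

    iface = 'eth2'   # placeholder: the NIC receiving the packets

    # NUMA node the NIC is attached to (-1 means no NUMA information).
    with open('/sys/class/net/%s/device/numa_node' % iface) as f:
        node = int(f.read())

    # CPUs belonging to that node, e.g. "0-7,16-23".
    with open('/sys/devices/system/node/node%d/cpulist' % node) as f:
        cpulist = f.read().strip()

    cpus = set()
    for part in cpulist.split(','):
        lo, _, hi = part.partition('-')
        cpus.update(range(int(lo), int(hi or lo) + 1))

    # Bind this process (and the packet-reading threads it spawns) to those CPUs.
    os.sched_setaffinity(0, cpus)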

HTH,
Dave

> On Oct 3, 2018, at 14:22, Gary, Dale E.  wrote:
> 
> Hi All,
> 
> I thought I would send an update to this problem, which still persists.  
> Jonathan's suggestion did not seem to work, since each ethernet interface 
> does not send packets to multiple processors.  If I specify two cpus in the 
> SMP_AFFINITY files, the board sends to only one of them.  Also, I removed 
> irqbalance as Jean suggested, but that had no effect.
> 
> I wrote a python script to read and plot the number of packets handled by 
> each interface from /proc/net/softnet_stat once per second, and then started 
> two packet-reading processes on different cpus.  The attached file is a good 
> example of what I find.  The packet-readers run normally until about 335 s 
> in, and then the number of packets on both interfaces suddenly drops by about 
> 30,000, and the packet readers dutifully complain that they are getting too 
> few packets per accumulation.  At about 360 s, I killed one of the 
> packet-reader processes, and the number of packets on the interface it was 
> reading jumps immediately up to normal.  It is interesting that the *other* 
> interface also shows more packets arriving, but not up to normal.  After 
> killing the second process, all is well again.  When this process is 
> repeated, the timing of the failures changes, but seems always to be longer 
> than 5 minutes--I'm not sure I ever saw a failure within the first 5 minutes.
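
(A minimal version of that kind of monitoring script is sketched below for 
reference.  Note that /proc/net/softnet_stat is per-CPU rather than 
per-interface, so the mapping to interfaces only holds when each NIC's 
interrupts are pinned to their own CPU.)

    import time

    def read_softnet():
        # First (hex) column of each line: packets processed by that CPU.
        with open('/proc/net/softnet_stat') as f:
            return [int(line.split()[0], 16) for line in f]

    prev = read_softnet()
    while True:
        time.sleep(1)
        cur = read_softnet()
        print(' '.join(str(c - p) for c, p in zip(cur, prev)))
        prev = cur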
> 
> This seems to confirm that:
> - The interfaces are running fine, and it is the act of reading them that 
>   somehow is associated with the problem.
> - The failure is sudden, and leads to a lower, but stable number of packets 
>   being handled by the interface.
> - The failure is usually, but not always, on both interfaces at the same time.
> - Killing the process brings the packets back, without resetting anything else.
> - The probability of failure seems to be near 0 within the first 5 minutes, and 
>   near 1 by 10 minutes, yet the timing of the glitch is quite random between 
>   those limits.
> Note that we have plenty of resources (top shows cpu idle time on the 
> packet-reading processes is near 50%, and on the packet-handling processes is 
> 75%, with 25% si).  Memory usage is also miniscule compared to the 65 GB 
> available.
> 
> So far, that is all I have.  The myricom folks (CSPi) have opened a ticket, 
> but so far have not had any suggestions.  They did say that they were about 
> to embark on some tests of Ubuntu 18.04 compatibility, so perhaps they will 
> find something.  Meanwhile, we have no solution.
> 
> Regards,
> Dale
> 
> On Wed, Oct 3, 2018 at 4:19 AM Jean Borsenberger  <mailto:jean.borsenber...@obspm.fr>> wrote:
> First sorry for the delay, I was off for a time.
> 
> We do not use UBUNTU, but DEBIAN, but the two distribs are in fact two 
> flavours of the same thing.
> 
> We manage on each machine a UDP download link from a ROACH2. The ROACH2 
> does nothing but add an 8-byte counter to each 8K data block. That way we 
> can precisely measure the packet loss rate.
> 
> First, notice that the driver writers for 10GbE NICs found it wise to split the 
> IRQs across six or seven IRQ numbers; why this is, I do not have the slightest 
> idea. Then comes the worse part. We run at 1.1 Gsamples/sec, so we are very close 
> to the link capacity. With the standard setup the loss may rise to 5%, with a 
> current value of 1%. I suspect a cache problem. On our 8 (real) core system, 
> IRQs can be split across each one, but each core has to be aware of what is 
> currently done by the others. This coherence issue may take some cycles. 
> Using that guess I assigned all IRQs of a given I/F to a single core 
> (/proc/irq/xx/smp_affinity). Concurrently I removed all other things from this 
> core (smp_affinity and taskset). It worked: the loss is now around 10^-6, 
> which we find acceptable.
> 
> The new pledge is named irqbalance, which takes over you on IRQ
> 
> aptitude remove irqbalance.
> 
> That's harmless.
> 
> 
> You may wish also to get rid of systemd, which takes cycles for a 
> questionable purpose, but the issue is hazardous. Anyhow we took this option.
> 
> systemd gets worse at each OS release.
> 

Re: [casper] Inverse of fft_wideband_real

2018-08-14 Thread David MacMahon
Hi, Jonathan,

The biplex FFT has two complex inputs and two complex outputs.  The two input 
signals are presented "in parallel" (i.e. simultaneously) one sample per input 
per FPGA cycle, but the output spectra appear serially (i.e. one signal's 
spectrum followed by the other signal's spectrum) two frequency channels per 
FPGA cycle.  This block and its various sub-blocks comprise the basic FFT 
functionality of the CASPER library.  All the other FFT blocks exist to perform 
various "tricks" to perform FFTs on other types of input data (e.g. real and/or 
"wideband"), but the underlying core is the biplex FFT.

One of the tricks is to combine two real inputs into a single complex input.  
The positive (non-redundant) frequency channels of each real input can be 
recovered from the complex output spectrum by adding/subtracting channel X with 
channel -X.  I am not aware of any CASPER FFT that processes a real input as a 
complex input with the imaginary component set to 0.
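
That add/subtract recovery can be checked numerically.  Here is a minimal numpy 
sketch of the trick (purely illustrative, not the CASPER implementation); note 
that the "subtract" side involves the complex conjugate of channel -X.

    import numpy as np

    N = 16
    a = np.random.randn(N)
    b = np.random.randn(N)

    C = np.fft.fft(a + 1j * b)              # one complex FFT of two real inputs
    Cneg = np.conj(np.roll(C[::-1], 1))     # conj(C[-k]) for each channel k

    A = 0.5 * (C + Cneg)                    # spectrum of a
    B = -0.5j * (C - Cneg)                  # spectrum of b
    assert np.allclose(A, np.fft.fft(a))
    assert np.allclose(B, np.fft.fft(b))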

Getting back to Alex's question, I think Jack is on the right track that 
performing inverse FFTs of two N point complex spectra of real inputs can 
probably be accomplished by creating a 2N point complex spectrum that is not 
symmetric (via linear combinations of the two input spectra) and then inverse 
FFTing and taking the real and imaginary outputs as the time series.  I would 
recommend working this out analytically first (to ensure this is in fact a 
valid option), then experimenting with MATLAB to ensure that the mechanics are 
understood, then implement in simulink and simulate to ensure that the 
implementation is correct.  There will be some details about fft_shift that 
might be hard to solve if the signals have different power levels or spectral 
shapes.
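
As a starting point for that analytical check, here is the simplest flavour of 
the inverse-direction idea in a minimal numpy sketch (illustrative only): combine 
the two spectra into one non-symmetric complex spectrum, inverse FFT once, and 
read the two real time series off the real and imaginary parts.

    import numpy as np

    N = 16
    a = np.random.randn(N)
    b = np.random.randn(N)
    A = np.fft.fft(a)
    B = np.fft.fft(b)

    c = np.fft.ifft(A + 1j * B)   # linear combination of the two input spectra
    assert np.allclose(c.real, a)
    assert np.allclose(c.imag, b)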

HTH,
Dave

> On Aug 14, 2018, at 13:35, Jonathan Weintroub  
> wrote:
> 
> Thanks for the response, Jack.   Hope you are properly caffeinated today ;)
> 
> I thought Alex put our application question really clearly, and he’s building 
> some very nice bitcodes to see how the various processes work.  I’d like to 
> ask you, and the community, some simpler meta questions which came up for me 
> as we dug into the CASPER blocks.   Personnel changes have caused some loss 
> of institutional memory, which is the context.
> 
> Jack you said: "you should be able to do FFTs of two streams, A, B, in 
> parallel.”  I was under the impression CASPER streaming FFTs are very 
> efficient, so that if a real FFT is needed, they are indeed done this way, in 
> pairs and stuffed into the real and imaginary part of a complex-in 
> complex-out FFT.   Alex is using two CASPER blocks, one called simply “fft” 
> which is complex-in complex-out and works as expected. The other called 
> “wideband_fft_real”.  What is curious to me about the latter block is that it 
> takes a real input stream, and produces half the number of complex points at 
> the output (which is the same amount of data, of course).
> 
> There is, of course, a redundant copy of the spectrum produced by 
> wideband_fft_real too, but that data seems to be stubbed out in the inner 
> workings.  What worries me about this is that it seems to be working by 
> effectively setting the imaginary part of the inner complex FFT to zero, 
> zeroing the redundant hermitian conjugate, and in so doing, is apparently 
> doing a factor of two more computation than it needs to in principle.  I feel 
> I must be mistaken in this, since I really thought CASPER FFTs would not 
> waste computation in this way.
> 
> Can someone help guide me to the source of my assumed misunderstanding?  It 
> may be helpful to know something of the history of wideband_fft_real, and 
> it’s relationship to “fft” and any other blocks which might be useful.
> 
> As a sidelight, I’ve been trying to find a venerable paper I recall Aaron 
> Parsons et al. wrote on the CASPER Biplex FFT.  I can’t seem to find the one 
> I recall, this is the closest I have come, but doesn’t have the singular 
> focus on the FFT I recall in another useful reference.
> 
> https://arxiv.org/pdf/0809.2266.pdf 
> 
> Can someone point me in the right direction.
> 
> 
> 
>> On Aug 14, 2018, at 2:27 AM, Jack Hickish > > wrote:
>> 
>> Hi Alex,
>> 
>> I don't know if there's a way to avoid generating the full spectra prior to 
>> taking the fft (I suspect if there is, it just pushes the buffering 
>> somewhere else), but it certainly seems like you should be able to do FFTs 
>> of two streams, A, B, in parallel. I would think this would work by (after 
>> generating the full spectra) computing FFT(A+iB) such that FFT(A) emerges in 
>> the real part of the output, and FFT(B) in the imag.
>> If you don't need to FFT two independent streams you can probably do 
>> something smart (like the wideband real fft) to leverage the above to do the 
>> multiple serial FFTs on a wideland input. 
>> 
>> I think? I haven't had coffee yet. 
>> 
>> 

Re: [casper] Problem when programming ROACH2

2018-08-13 Thread David MacMahon
Just to add some more details to the problems seen at NRAO, it turns out that 
some of the uBoot settings passed a "mem=<size>" command line option to the Linux 
kernel, where <size> was actually greater than the amount of RAM available.  It 
seems that the memory allocator would not reclaim the memory used during BOF 
programming since it thought there was still plenty available.  When it went to 
use memory beyond the actual memory available, but within <size>, the system 
would crash.  The fix we used at the time was to change <size> in the uBoot 
environment to the amount of memory actually available (or maybe we just did 
away with the "mem=" option altogether?).

At any rate, I'm glad you got it working now!

Cheers,
Dave

> On Aug 10, 2018, at 08:48, John Ford  > wrote:
> 
> Hi Raimondo,
> 
> We saw this years ago at NRAO.  Marc is right about the solution.
> 
> John
> 
> On Fri, Aug 10, 2018 at 7:20 AM, Marc Welz  > wrote:
> As per previous email: Either start it again, or upgrade it
> 
> regards
> 
> marc
> 
> 
> On Fri, Aug 10, 2018 at 12:45 PM, Concu, Raimondo
> mailto:raimondo.co...@inaf.it>> wrote:
> > Hi Mark,
> >
> > Hi everyone,
> >
> > maybe you're right
> >
> > when the problem happens
> >
> > and i restart tcpborphserver3, the problem disappears.
> >
> > and this is the output:
> >
> > root@192:~# ps -ALL | grep tcp
> >   820   820 ?00:00:53 tcpborphserver3
> > root@192:~# kill 820
> > root@192:~#
> > roach VMA close
> > roach release mem called
> > roach VMA close
> > roach release mem called
> > [the two lines above repeat, roughly 59 times in total]
> >
> >
> >
> > Any suggestions?
> >
> > Thanks in advance
> > Raimondo
> >
> >
> > 2018-08-10 13:46 GMT+02:00 Marc Welz  > >:
> >>
> >> Hello
> >>
> >>
> >> The "raw unable\_to\_map\_file\_/dev/roach/mem:\_Cannot\_allocate\_memory"
> >> is the interesting line. I suspect you are running a slightly older
> >> version of the filesystem or romfs which had a bug where it didn't
> >> unmap the previous image, and so suffered address space exhaustion
> >> after a dozen or so programs. The quick way to solve this is to reboot
> >> occasionally, the long term solution is to program a newer 

Re: [casper] New Wideband Spectrometer Designs

2018-08-06 Thread David MacMahon
Hi, John,

I was speaking with Homin last week at the CASPER Hardware Porting Workshop 
hosted in Cape Town.  He mentioned the Analog Devices 10 Gsps 12-bit ADC that I 
think is the same one Dan referenced. My recollection of our conversation was 
that the pricing was not so bad, but I could be conflating that assessment with 
a different ADC. In any event, I think future ADC boards will move to FMC (or 
FMC+) connectors.  This will necessitate new host FPGA boards which is why the 
next gen CASPER toolflow emphasizes ease of porting new boards to the CASPER 
toolflow. It’s an exciting upgrade in processing capacity, but it’s still early 
days. Definitely fodder for CASPER Workshop discussions! :)

Dave

> On Aug 6, 2018, at 20:25, Dan Werthimer  wrote:
> 
> 
> 
> hi john,
> 
> how many bits do you need?
> 
> here are some possibilities:
> 
> a)
> if 4 bits, then homin's 15 Gsps adc board might be a good choice.
> 
> b)
> if 12 bits, and you don't mind breaking the band up into 2 GHz chunks,
> then this $9K RFsoc eval board has eight 12 bit 4Gsps ADC's and a big FPGA to 
> channelize and packetize: 
> https://www.xilinx.com/products/boards-and-kits/zcu111.html
> 
> c)
> if 12 bits, and you want to digitize the whole band at once, 
> you might consider AD's new 10.25GSps 12-bits ADC with a 6.5GHz bandwidth: 
> http://www.analog.com/en/products/analog-to-digital-converters/standard-adc/high-speed-ad-10msps/ad9213.html
> but it's pricy, data sheet is preliminary, and i don't know if anybody has an 
> FMC board with this chip yet. 
> http://www.analog.com/en/about-adi/news-room/press-releases/2018/5-21-2018-12-bit-10-point-25-gsps-radio-frequency-adc-sets-new-performance.html
>  shows the pricing as $3,652.49 in 1000 piece quantities. 
> 
> 
> best wishes,
> 
> dan
> 
> 
> 
> 
> 
>  
> 
> 
>> On Mon, Aug 6, 2018 at 11:13 AM, John Ford  wrote:
>> Hi all.  We're interested in wideband moderate performance spectrometers.  
>> Something that can digitize 2 (or 4) polarizations at at ~ 8 to 10 GS/s, and 
>> provide ~4K channels, full stokes, with a moderate dump rate (1 GB/sec or 
>> less).  
>> 
>> We could use 8 ROACH-2/ADC-5GS VEGAS-style spectrometer blades, 2 for each 4 
>> GHz polarization and sideband. (simultaneous upper and lower sidebands are 
>> required)
>> 
>> Or we could move on to newer ADCs and processing boards, and get the full 4 
>> GHz from each polarization in one go.  I think this is preferable, for many 
>> reasons.
>> 
>> The key here, I think, is the ADCs that may have come to market (or to the 
>> casper community) since VEGAS was built.  I know Homin Jiang and Jonathan 
>> Weintroub's groups are working on wideband ADCs and integration with FPGA 
>> boards.
>> 
>> So the questions that I am posing are:  What ADCs and hardware configuration 
>> would you choose for a minimum-effort maximum-effect project to be built and 
>> deployed in the near future?  What projects could we assist with in getting 
>> the next fast ADCs developed and tested?
>> 
>> Is this all fodder for the CASPER workshop discussions?
>> 
>> John
>> 
> 
> 



Re: [casper] Programming the SNAP board via the 10GbE interface.

2018-05-23 Thread David MacMahon
Hi, Jake,

I think the problem might be that your DHCP server is offering a 169.254.x.y 
address. This address range is reserved for “auto-config” IPs so the LWIP code 
on the microblaze might be refusing to use it. I suspect that it is then timing 
out and inventing its own auto-config IP address, but far too late for the 
TAPCP stuff to know about it.
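
(As a side note, the golden image prints the IP as hex in the serial debug 
output quoted below; a quick, purely illustrative way to translate it:)

    import socket
    import struct

    # 0xA9FE0313 is the value printed as "IP A9FE0313" by the golden image.
    print(socket.inet_ntoa(struct.pack('!I', 0xA9FE0313)))   # -> 169.254.3.19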

Can you serve a different IP address range to the SNAP(s)?  Maybe that will 
fix/avoid the problem?

HTH,
Dave

P.S. There is some ancient history of ancient CASPER hardware using this 
auto-config address range, but that was not configured via DHCP and was not 
using the LWIP IP implementation. 

> On May 22, 2018, at 23:35, Jake Jones  wrote:
> 
> Hi Jack,
> 
> Is it normal for the debug messages to say 'ICAP init FAIL'?
> 
> Unfortunately syslog is not very helpful either, here's a summary:
> May 23 10:48:32 Antenna-PC dnsmasq-dhcp[29048]: DHCPDISCOVER(eth3) 
> 00:40:89:41:13:02
> May 23 10:48:32 Antenna-PC dnsmasq-dhcp[29048]: DHCPOFFER(eth3) 
> 169.254.39.125 00:40:89:41:13:02
> May 23 10:48:45 Antenna-PC dnsmasq-dhcp[29048]: DHCPDISCOVER(eth3) 
> 00:40:89:41:13:02
> May 23 10:48:45 Antenna-PC dnsmasq-dhcp[29048]: DHCPOFFER(eth3) 
> 169.254.39.125 00:40:89:41:13:02
> ...
> 
> After some testing I can't see anything wrong with the NIC I'm using, same 
> for the DHCP server. I also compiled a modified golden image that used SFP 
> port 1 instead of port 0, but ran into the same problems.
> 
> I also noticed that the following debug message appears consistently 240 
> seconds after programming:
> IP A9FE0313  NM   GW 
> TAPCP server ready
> but there is no network traffic or logs suggesting that this IP was obtained 
> by the DHCP server. Nonetheless, I tried connecting to it using this IP but 
> had no success there either.
> fpga = casperfpga.CasperFpga('169.254.3.19', port=69)
> 
> 
> Thanks,
> Jake Jones.
> 
>> On Tue, May 22, 2018 at 6:07 PM, Jack Hickish  wrote:
>> Well that's not wildly instructive!
>> Is there anything telling from the dhcp server, perhaps in syslog?
>> 
>> Cheers
>> Jack
>> 
>>> On Tue, 22 May 2018 at 11:00 Jake Jones  wrote:
>>> Hi Jack,
>>> 
>>> This is the output I get immediately after programming the golden image:
>>> 
>>> ICAP init FAIL
>>> # JAM starting
>>> using ethernet core gbe_port0
>>> MAC 0x004089411302
>>> IP   NM   GW 
>>> link is UP
>>> FPGA at 33.6 C [ms 1]
>>> FPGA at 33.6 C [ms 2]
>>> ...
>>> 
>>> 
>>> Cheers,
>>> Jake Jones.
>>> 
 On Tue, May 22, 2018 at 3:48 PM, Jack Hickish  
 wrote:
 Hi Jake,
 
 That's interesting, I don't think I've seen this failure mode before. If 
 you plug a mini usb connector into the SNAP, you can read debug messages 
 over this port, using it as an 8N1 115200 baud serial interface. If you 
 have a xilinx programmer, you could also burn the flash manually with 
 vivado and the .bin file in the repository, but I don't see why that 
 should make any difference.
 
 Cheers
 Jack
 
> On Tue, 22 May 2018 at 04:24 Jake Jones  wrote:
> Hi All,
> 
> I have an issue trying to program the SNAP board via the 10GbE port. When 
> I program the board with a golden image the 10GbE port does not obtain an 
> ip address.
> 
> Investigating further, I:
> 1) Programed the golden image (using snap160t_golden_2018-02-17_1540.fpg 
> found in the casper-astro/mlib_devel repository) via the raspberry pi.
> 2) Monitoring the traffic on the 10GbE port using wireshark it can be 
> seen that the SNAP board sends DHCP Discover packets which is followed by 
> a DHCP offer from the server. However the SNAP board never responds with 
> a DHCP Request. Additionally it doesn't respond to any arp requests.
> 
> Is there something I'm missing here? Any help is much appreciated!
> 
> Thanks,
> Jake Jones.
 
>>> 

Re: [casper] Problem with ROACH2 netboot

2018-05-09 Thread David MacMahon
Hi, Bela,

I'm not sure what the problem is, but I can offer a few comments that might be 
helpful...

> On May 8, 2018, at 22:50, Bela Dixit  wrote:
> 
> U-Boot 2011.06-rc2-0-gd422dc0-dirty (Nov 08 2012 - 16:04:14)
> 
> CPU:   AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
>No Security/Kasumi support
>Bootstrap Option C - Boot ROM Location EBC (16 bits)
>32 kB I-Cache 32 kB D-Cache
> Board: ROACH2
> I2C:   ready
> DRAM:  512 MiB
> Flash: 128 MiB
> *** Warning - bad CRC, using default environment

Does the working ROACH2 also give this "bad CRC, using default environment" 
message?

> Linux/PowerPC load: console=ttyS0,115200 root=/dev/nfs 
> rootpath=192.168.100.1:/srv/roach_boot/etch ip=dhcp

These kernel command line options seem good.

> [1.626566] IP-Config: Failed to open eth0
> [1.630764] IP-Config: No network devices available
> [1.636493] Root-NFS: no NFS server address
> [1.640794] VFS: Unable to mount root fs via NFS, trying floppy.
> [1.647713] VFS: Cannot open root device "nfs" or unknown-block(2,0): 
> error -6
> [1.655011] Please append a correct "root=" boot option; here are the 
> available partitions:
> [1.663380] 1f004096 mtdblock0  (driver?)
> [1.668452] 1f01   65536 mtdblock1  (driver?)
> [1.673513] 1f02   49152 mtdblock2  (driver?)
> [1.678577] 1f03   11264 mtdblock3  (driver?)
> [1.683640] 1f04 256 mtdblock4  (driver?)
> [1.688710] 1f05 512 mtdblock5  (driver?)
> [1.693773] Kernel panic - not syncing: VFS: Unable to mount root fs on 
> unknown-block(2,0)

It's weird that it can't mount the root filesystem.  You could check the 
"/etc/exports" file in the NFS server to make sure it is exporting the root 
filesystem to this ROACH2's IP address.   I also wonder whether the working 
ROACH2 also has these MTD partitions.   Maybe they are confusing the boot 
process?  The "Failed to open eth0" message also seems weird.  I suspect 
comparing with the console output from the working ROACH2 could offer some more 
clues.

HTH,
Dave



Re: [casper] temporary ROACH2 faults after power dips and spikes

2018-04-19 Thread David MacMahon
Hi, Jonathan,

The ROACH2s at Green Bank routinely output over all 8 SFP+ ports without problems.  Not 
sure whether this matters, but they are connected via fiber optic transceivers 
rather than copper cables.

HTH,
Dave

> On Apr 19, 2018, at 11:00, Jonathan Weintroub  
> wrote:
> 
> Dear kind CASPER Colleagues,
> 
> To offer a little more feedback on this:
> 
> —We reiterate that all advice is appreciated and useful.  They may well be 
> relevant to prior weird experiences, however in the current case . . .
> 
> — . . . after assorted power cycles removing all inputs, confirmation that 
> the unit has an approved FSP power supply, and swapping in spares both at the 
> LRU and NIC level, we are now convinced that our current issues with one 
> 10GigE port of 8 going down are not ROACH2 hardware related, but rather 
> something to do with the environment in which it is installed (i.e. related 
> to external stimuli).  Still investigating.
> 
> —One unusual aspect of this application is we are using all 8 SFP+ ports on 
> the ROACH2, though we are not stressing the rates. It is a long shot, but are 
> there any insights into possible stresses or snafus we might run into when 
> fully utilizing the ROACH 10GigE NIC ports?
> 
> Thanks again.
> 
> Jonathan & crew
> 
> 
> 
> 
>> On Apr 18, 2018, at 10:13 AM, Jonathan Weintroub > > wrote:
>> 
>> Hi Jonathon,
>> 
>> Your important input here warrants cc to the mailing list, hereby 
>> accomplished.
>> 
>> We have switched to the FSP power supplies for new builds, and have repaired 
>> older ROACH2s a number of which have had failing XEALs (mostly) by replacing 
>> same with FSPs.  We have I think done some prophylactic FSP replacements in 
>> offline spare stock.  But we’ve ordered and deployed probably over 100 
>> ROACH2s over about a four, perhaps even five year period, they are used at 
>> SMA for SWARM, and also distributed all over the world for the EHT.  So we 
>> have NOT retrofitted every unit out there with FSP power supplies.  
>> 
>> While the XEAL are known to be not reliable, when a unit is working, it's 
>> not that straightforward to recall it for a power supply replacement—ain’t 
>> broke don’t fix applies.
>> 
>> Thanks for your input. Thanks also for input from Dan, Jason, Matt and Mike, 
>> which is valuable and relevant advice.  I was holding off on responding, 
>> we’re at SMA running tests, and don’t yet know the resolution for the units 
>> in question.  
>> 
>> Jonathon’s email triggered this interim response. I’ll let all know the 
>> outcome on the lightning damage when we have one.
>> 
>> Thanks,
>> 
>> Jonathan
>> 
>> 
>> 
>>> On Apr 18, 2018, at 9:44 AM, Jonathon Kocz >> > wrote:
>>> 
>>> Hi Jonathan,
>>> 
>>> I think you've already addressed this, but to double check, are these R2s 
>>> after you switched to the SP25-60FAG power supply?
>>> 
>>> I've had a lot of trouble with R2s using istar/xeal supplies getting into 
>>> strange situations that always seem fixable with a new power supply. 
>>> 
>>> Cheers,
>>> Jonathon
>>> 
>>> On 17 April 2018 at 16:22, Jonathan Weintroub >> > wrote:
>>> Hi CASPERites,
>>> 
>>> With experience on quite a few ROACH2s in the lab and in the field for some 
>>> years, and a pattern has emerged which warrants a question to the ROACH2 
>>> experts on this list. The SAO team has seen strange faults happen on 
>>> multiple ROACH2 units after power failures, dips and lightning storms.   
>>> I’ll list the various weirdnesses below, but the key point is that while a full 
>>> power cycle, including removing power from the line input, does not reset 
>>> and cure the units, an extended power down (like overnight, or 24 hours, 
>>> or more) does seem to bring the units back to life again.  This was 
>>> discovered serendipitously, and has happened often enough that the pattern 
>>> seems repeatable (though controlled experiments aren’t really possible, we 
>>> try not to stress our equipment this way).
>>> 
>>> Has anyone else seen this, and does someone perhaps have a suggestion as to 
>>> root cause, or some way to accelerate the reset?
>>> 
>>> Example faults have included:
>>> 
>>> —ADC5G clock not being correctly received, or not being transmitted to 
>>> FPGA, or being transmitted at incorrect speed.
>>> 
>>> —A particular ADC would refuse to calibrate its digital interface to the 
>>> FPGA.
>>> 
>>> —QDRs which don’t calibrate
>>> 
>>> —After a lightning storm on Maunakea we have two units with a single SFP+ 
>>> port among 8 failing to transmit packets, though we have yet to see if an 
>>> extended power down will cure this.
>>> 
>>> Again these faults have been distributed across multiple units, and in all 
>>> cases have eventually been cleared, after extended power down.  Which is 
>>> good, but the 

Re: [casper] Black box compilation error using Casper XPS flow

2018-04-02 Thread David MacMahon
Hi, Vijay,

Since your HDL is not combinatorial you should comment out this line in your 
config.m file:

>   % System Generator has to assume that your entity  has a combinational feed 
> through; 
>   %   if it  doesn't, then comment out the following line:
>   this_block.tagAsCombinational;

I'm not sure this will fix your problem, but there are no other obvious problems 
that I can see.

HTH,
Dave

> On Apr 2, 2018, at 13:12, Vijay Kumar <vijaykumarcas...@gmail.com> wrote:
> 
> Hi Dave,
> 
> I referred to the black box guide, so I have a single clock and a clock 
> enable. 
> And the config file was generated from the wizard. 
> I am attaching herewith my model which is a simple one to generate a sinusoid 
> signal.
> 
> I would be much grateful if you could please have look at this. 
> 
> Thanks,
> Vijay.
> 
> 
> 
> 
> On Sun, Apr 1, 2018 at 4:36 PM, David MacMahon <dav...@berkeley.edu 
> <mailto:dav...@berkeley.edu>> wrote:
> I'm not sure what's causing your problem, but here are some other ideas...  
> Did you use the Black Box Wizard to create a "...config.m" file for your HDL 
> or did you hand code it?  Does your top level HDL have a single clock input 
> and a clock enable input?  Does your config.m file list all the HDL files 
> needed to build your HDL?  You could check the system generator synthesis 
> report to ensure that it is synthesizing your HDL as you would expect.
> 
> Dave
> 
> 
>> On Apr 1, 2018, at 13:14, Vijay Kumar <vijaykumarcas...@gmail.com 
>> <mailto:vijaykumarcas...@gmail.com>> wrote:
>> 
>> Hi David,
>> 
>> Yes, the normal designs that don't have black boxes compile without any 
>> problem. I am not sure why this "Possible deprecated  ..." warning is given. 
>> I have seen it in few emails for other Casperites, so I thought it was 
>> normal. 
>> 
>> The OS is: Red Hat  (release 6.7)
>> 
>> Thanks for your help on this.
>> 
>> Regards,
>> Vijay.
>> 
>> On Sun, Apr 1, 2018 at 3:34 PM, David MacMahon <dav...@berkeley.edu 
>> <mailto:dav...@berkeley.edu>> wrote:
>> Hi, Vijay,
>> 
> Are you able to compile a simple model without using the black box block?  
> The "Possible deprecated use of get on a Java object with an HG Property 
> 'UserData'" warning seems like a simulink and/or system generator issue.  And 
>> the segmentation fault when running xps is not a good sign at all.  What OS 
>> are you using?
>> 
>> Dave
>> 
>> 
>>> On Apr 1, 2018, at 10:58, Vijay Kumar <vijaykumarcas...@gmail.com 
>>> <mailto:vijaykumarcas...@gmail.com>> wrote:
>>> 
>>> Hi Jack,
>>> 
>>> Thanks a lot for your reply. 
>>> 
>>> I am trying to incorporate a verilog design with the Casper design. It's a 
>>> simple test design and works in simulation. So probably no HDL syntax 
>>> issues. 
>>> The setup I am using is Xilinx 14.4 based one with Matlab 2012b and 
>>> unfortunately, I don't have the 14.7 setup.  Do you think its an issue with 
>>> the tool setup? 
>>> 
>>> Please see the below outputs for the mlib_devel version.
>>> 
>>> $ git rev-parse HEAD
>>> a949c9d5c1761078b4c884699ff52c1497a17ff6
>>> 
>>> git describe --tags
>>> mlib_devel-2010-09-20-1369-ga949c9d
>>> 
>>> 
>>> Thanks again for your help.
>>> 
>>> Vijay.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Sun, Apr 1, 2018 at 9:34 AM, Jack Hickish <jackhick...@gmail.com 
>>> <mailto:jackhick...@gmail.com>> wrote:
>>> Hi Vijay,
>>> 
>>> Are you trying to black box a system generator model, or plain 
>>> verilog/vhdl? In the latter case, any error in the HDL syntax or issues 
>>> with port declarations will probably cause the tools to implode like you 
>>> see.
>>> 
>>> Do you still get this error with the latest (14.7) of the xilinx tools, and 
>>> an appropriate version of Matlab (2013b would be ideal)?
>>> 
>>> What version of mlib_devel are you using?
>>> 
>>> Cheers
>>> Jack
>>> 
>>> On Sun, Apr 1, 2018, 6:24 AM Vijay Kumar <vijaykumarcas...@gmail.com 
>>> <mailto:vijaykumarcas...@gmail.com>> wrote:
>>> Dear Casperites,
>>> 
>>> I am trying to run a simple model with a black box to see if I can get an 
>>> RTL to work with a Simulink model in the Casper fl

Re: [casper] Black box compilation error using Casper XPS flow

2018-04-01 Thread David MacMahon
I'm not sure what's causing your problem, but here are some other ideas...  Did 
you use the Black Box Wizard to create a "...config.m" file for your HDL or did 
you hand code it?  Does your top level HDL have a single clock input and a 
clock enable input?  Does your config.m file list all the HDL files needed to 
build your HDL?  You could check the system generator synthesis report to 
ensure that it is synthesizing your HDL as you would expect.

Dave

> On Apr 1, 2018, at 13:14, Vijay Kumar <vijaykumarcas...@gmail.com> wrote:
> 
> Hi David,
> 
> Yes, the normal designs that don't have black boxes compile without any 
> problem. I am not sure why this "Possible deprecated  ..." warning is given. 
> I have seen it in few emails for other Casperites, so I thought it was 
> normal. 
> 
> The OS is: Red Hat  (release 6.7)
> 
> Thanks for your help on this.
> 
> Regards,
> Vijay.
> 
> On Sun, Apr 1, 2018 at 3:34 PM, David MacMahon <dav...@berkeley.edu 
> <mailto:dav...@berkeley.edu>> wrote:
> Hi, Vijay,
> 
> Are you able to compile a simple model without using the black box block?  The 
> "Possible deprecated use of get on a Java object with an HG Property 
> 'UserData'" warning seems like a simulink and/or system generator issue.  And 
> the segmentation fault when running xps is not a good sign at all.  What OS 
> are you using?
> 
> Dave
> 
> 
>> On Apr 1, 2018, at 10:58, Vijay Kumar <vijaykumarcas...@gmail.com 
>> <mailto:vijaykumarcas...@gmail.com>> wrote:
>> 
>> Hi Jack,
>> 
>> Thanks a lot for your reply. 
>> 
>> I am trying to incorporate a verilog design with the Casper design. It's a 
>> simple test design and works in simulation. So probably no HDL syntax 
>> issues. 
>> The setup I am using is Xilinx 14.4 based one with Matlab 2012b and 
>> unfortunately, I don't have the 14.7 setup.  Do you think its an issue with 
>> the tool setup? 
>> 
>> Please see the below outputs for the mlib_devel version.
>> 
>> $ git rev-parse HEAD
>> a949c9d5c1761078b4c884699ff52c1497a17ff6
>> 
>> git describe --tags
>> mlib_devel-2010-09-20-1369-ga949c9d
>> 
>> 
>> Thanks again for your help.
>> 
>> Vijay.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sun, Apr 1, 2018 at 9:34 AM, Jack Hickish <jackhick...@gmail.com 
>> <mailto:jackhick...@gmail.com>> wrote:
>> Hi Vijay,
>> 
>> Are you trying to black box a system generator model, or plain verilog/vhdl? 
>> In the latter case, any error in the HDL syntax or issues with port 
>> declarations will probably cause the tools to implode like you see.
>> 
>> Do you still get this error with the latest (14.7) of the xilinx tools, and 
>> an appropriate version of Matlab (2013b would be ideal)?
>> 
>> What version of mlib_devel are you using?
>> 
>> Cheers
>> Jack
>> 
>> On Sun, Apr 1, 2018, 6:24 AM Vijay Kumar <vijaykumarcas...@gmail.com 
>> <mailto:vijaykumarcas...@gmail.com>> wrote:
>> Dear Casperites,
>> 
>> I am trying to run a simple model with a black box to see if I can get an 
>> RTL to work with a Simulink model in the Casper flow. 
>> 
>> I am getting the following error at the very beginning of compilation. 
>> Please see the following output. Appreciate your help to resolve this.
>> 
>> Thanks a lot.
>> 
>> Detected Linux OS
>> #
>> ##  System Update  ##
>> #
>> #
>> ## Block objects creation  ##
>> #
>> ##
>> ## Checking objects ##
>> ##
>> Running system generator ...
>> Warning: Possible deprecated use of get on a Java object with an HG Property 
>> 'UserData'. 
>> > In xlNGCPostGeneration at 57
>>   In 
>> /opt/Xilinx/14.4/ISE_DS/ISE/sysgen/bin/lin64/xlruntargetfcn.p>xlruntargetfcn 
>> at 12
>>   In 
>> /opt/Xilinx/14.4/ISE_DS/ISE/sysgen/bin/lin64/xlGenerateButton.p>xlGenerateButton
>>  at 478
>>   In gen_xps_files at 323
>>   In casper_xps>run_Callback at 155
>>   In casper_xps at 88
>>   In 
>> @(hObject,eventdata)casper_xps('run_Callback',hObject,eventdata,guidata(hObject))
>>  
>> Warning: Possible deprecated use of set on a Java object with an HG Property 
>> 'UserData'. 
>> > In xlNGCPostGeneration at 60
>>   In 
>> /op

Re: [casper] Black box compilation error using Casper XPS flow

2018-04-01 Thread David MacMahon
Hi, Vijay,

Are you able to compile a simple model without using the black box block?  The 
"Possible deprecated use of get on a Java object with an HG Property 
'UserData'" warning seems like a simulink and/or system generator issue.  And 
the segmentation fault when running xps is not a good sign at all.  What OS are 
you using?

Dave

> On Apr 1, 2018, at 10:58, Vijay Kumar  wrote:
> 
> Hi Jack,
> 
> Thanks a lot for your reply. 
> 
> I am trying to incorporate a verilog design with the Casper design. It's a 
> simple test design and works in simulation. So probably no HDL syntax issues. 
> The setup I am using is Xilinx 14.4 based one with Matlab 2012b and 
> unfortunately, I don't have the 14.7 setup.  Do you think its an issue with 
> the tool setup? 
> 
> Please see the below outputs for the mlib_devel version.
> 
> $ git rev-parse HEAD
> a949c9d5c1761078b4c884699ff52c1497a17ff6
> 
> git describe --tags
> mlib_devel-2010-09-20-1369-ga949c9d
> 
> 
> Thanks again for your help.
> 
> Vijay.
> 
> 
> 
> 
> 
> 
> 
> On Sun, Apr 1, 2018 at 9:34 AM, Jack Hickish  > wrote:
> Hi Vijay,
> 
> Are you trying to black box a system generator model, or plain verilog/vhdl? 
> In the latter case, any error in the HDL syntax or issues with port 
> declarations will probably cause the tools to implode like you see.
> 
> Do you still get this error with the latest (14.7) of the xilinx tools, and 
> an appropriate version of Matlab (2013b would be ideal)?
> 
> What version of mlib_devel are you using?
> 
> Cheers
> Jack
> 
> On Sun, Apr 1, 2018, 6:24 AM Vijay Kumar  > wrote:
> Dear Casperites,
> 
> I am trying to run a simple model with a black box to see if I can get an RTL 
> to work with a Simulink model in the Casper flow. 
> 
> I am getting the following error at the very beginning of compilation. Please 
> see the following output. Appreciate your help to resolve this.
> 
> Thanks a lot.
> 
> Detected Linux OS
> #
> ##  System Update  ##
> #
> #
> ## Block objects creation  ##
> #
> ##
> ## Checking objects ##
> ##
> Running system generator ...
> Warning: Possible deprecated use of get on a Java object with an HG Property 
> 'UserData'. 
> > In xlNGCPostGeneration at 57
>   In 
> /opt/Xilinx/14.4/ISE_DS/ISE/sysgen/bin/lin64/xlruntargetfcn.p>xlruntargetfcn 
> at 12
>   In 
> /opt/Xilinx/14.4/ISE_DS/ISE/sysgen/bin/lin64/xlGenerateButton.p>xlGenerateButton
>  at 478
>   In gen_xps_files at 323
>   In casper_xps>run_Callback at 155
>   In casper_xps at 88
>   In 
> @(hObject,eventdata)casper_xps('run_Callback',hObject,eventdata,guidata(hObject))
>  
> Warning: Possible deprecated use of set on a Java object with an HG Property 
> 'UserData'. 
> > In xlNGCPostGeneration at 60
>   In 
> /opt/Xilinx/14.4/ISE_DS/ISE/sysgen/bin/lin64/xlruntargetfcn.p>xlruntargetfcn 
> at 12
>   In 
> /opt/Xilinx/14.4/ISE_DS/ISE/sysgen/bin/lin64/xlGenerateButton.p>xlGenerateButton
>  at 478
>   In gen_xps_files at 323
>   In casper_xps>run_Callback at 155
>   In casper_xps at 88
>   In 
> @(hObject,eventdata)casper_xps('run_Callback',hObject,eventdata,guidata(hObject))
>  
> XSG generation complete.
> XSG generation complete.
> #
> ## Copying base system ##
> #
> Copying base package from:
>  /opt/mlib_devel/xps_base/XPS_ROACH2_base
> 
> ## Copying custom IPs ##
> 
> ##
> ## Creating Simulink IP ##
> ##
> ##
> ## Creating EDK files   ##
> ##
> Running off sys_clk @100MHz
> Running off sys_clk @100MHz
> Running off sys_clk @100MHz
> #
> ## Elaborating objects ##
> #
> ##
> ## Preparing software files ##
> ##
> #
> ## Running EDK backend ##
> #
> Warning: File 
> '/home/vijay/work/testing/blackboxcheck/testing1/XPS_ROACH2_base/implementation/system.bit'
>  not found. 
> > In gen_xps_files at 664
>   In casper_xps>run_Callback at 155
>   In casper_xps at 88
>   In 
> @(hObject,eventdata)casper_xps('run_Callback',hObject,eventdata,guidata(hObject))
>  
> Warning: File 
> '/home/vijay/work/testing/blackboxcheck/testing1/XPS_ROACH2_base/implementation/download.bit'
>  not found. 
> > In gen_xps_files at 665
>   In casper_xps>run_Callback at 155
>   In casper_xps at 88
>   In 
> @(hObject,eventdata)casper_xps('run_Callback',hObject,eventdata,guidata(hObject))
>  
> xps -nw -scr run_xps.tcl system.xmp: Segmentation fault
> Error using gen_xps_files (line 685)
> XPS failed.
> 
> 
> 

Re: [casper] packets lost of a packetized correlator

2018-03-12 Thread David MacMahon
I think the tx overflow will be OK since the FPGA won't try to send more than 
10 Gbps.  I think the "rx overrun" flag would be more interesting.  But 
probably best to check both of course! :)

Is the X engine clock an exact copy of the F engine clock (i.e. a common clock 
that goes through a massive splitter) or just a clock of the same frequency 
locked to the same reference (but not the exact same clock)?  Things get more 
complicated once you run F and X at different rates, so I wouldn't recommend 
that path if you can avoid it.

HTH,
Dave

> On Mar 12, 2018, at 22:01, Homin Jiang <ho...@asiaa.sinica.edu.tw> wrote:
> 
> Hi Dave:
> 
> Thanks of prompt response and suggestion.
> The X engine is running the same clock as the F engine, 2.24GHz/8 = 280MHz. 
> Perhaps I should increase the clock in X engine ?
> Yes, there is Tx overflow flag in the model, it will be the first thing for 
> me to check.
> 
> best
> homin
> 
> 
> 
> On Tue, Mar 13, 2018 at 12:42 PM, David MacMahon <dav...@berkeley.edu 
> <mailto:dav...@berkeley.edu>> wrote:
> Hi, Homin,
> 
> The first thing to do is figure out where packet loss is actually happening.  
> The fact that you have to reset the 10G yellow blocks to get things going 
> again suggests that the X engines are not keeping up with the data rate 
> (since the F engines will happily churn out 8.96 Gbps data regardless of the 
> receivers' states and the X engines will happily churn out data regardless of 
> the PC's state, it seems that the only way for the 10 GbE blocks to get 
> confused is if the X engines are not keep up with the incoming data rate).  I 
> assume the F engine ROACH2s are being clocked via their ADCs.  How are the X 
> engine ROACH2s being clocked?
> 
> Assuming the F-to-X packets are going through a switch, you could query the 
> switch to see what it thinks the incoming and outgoing data rates are on the 
> various ports involved.
> 
> Does your design have any way of capturing the overflow flags of the 10 GbE 
> cores?
> 
> Dave
> 
>> On Mar 12, 2018, at 19:39, Homin Jiang <ho...@asiaa.sinica.edu.tw 
>> <mailto:ho...@asiaa.sinica.edu.tw>> wrote:
>> 
>> Dear Casperite:
>> 
>> We have deployed a 7 (actually 8) antenna packetized correlator on Mauna 
>> Loa, Hawaii. Running at a 2.24 GHz clock, that means 8.96 Gbits per second for 
>> each 10G ethernet. The packet size is 2K. There are 8 sets of ROACH2 as F 
>> engines, the other 8 sets of ROACH2 as X engines. Data packets from F to X 
>> looks fine, the problem of lost packets is the integration data from X 
>> engine to the computer. The 10G yellow blocks in X engines handle the 
>> incoming data packets from F engine at the data rate of 8.96 Gbps, and 
>> output the integration data to PC, the outgoing data rate depends on the 
>> integration time, usually it is longer than 0.5 second. The symptom is that 
>> packet loss happens on specific X engines after 10-20 minutes or a couple of 
>> hours. Once it happened, we reset all the 10G yellow blocks in F and X, then 
>> the system revived.
>> 
>> I have no idea about the 10G ethernet yellow block. Any comments of 
>> suggestions are highly welcome.
>> 
>> best
>> homin jiang
>>   
>> 
> 
> 
> 
> 



Re: [casper] packets lost of a packetized correlator

2018-03-12 Thread David MacMahon
Hi, Homin,

The first thing to do is figure out where packet loss is actually happening.  
The fact that you have to reset the 10G yellow blocks to get things going again 
suggests that the X engines are not keeping up with the data rate (since the F 
engines will happily churn out 8.96 Gbps data regardless of the receivers' 
states and the X engines will happily churn out data regardless of the PC's 
state, it seems that the only way for the 10 GbE blocks to get confused is if 
the X engines are not keeping up with the incoming data rate).  I assume the F 
engine ROACH2s are being clocked via their ADCs.  How are the X engine ROACH2s 
being clocked?

Assuming the F-to-X packets are going through a switch, you could query the 
switch to see what it thinks the incoming and outgoing data rates are on the 
various ports involved.

Does your design have any way of capturing the overflow flags of the 10 GbE 
cores?

Dave

> On Mar 12, 2018, at 19:39, Homin Jiang  wrote:
> 
> Dear Casperite:
> 
> We have deployed a 7 (actually 8) antenna packetized correlator on Mauna 
> Loa, Hawaii. Running at a 2.24 GHz clock, that means 8.96 Gbits per second for 
> each 10G ethernet. The packet size is 2K. There are 8 sets of ROACH2 as F 
> engines, the other 8 sets of ROACH2 as X engines. Data packets from F to X 
> looks fine, the problem of lost packets is the integration data from X engine 
> to the computer. The 10G yellow blocks in X engines handle the incoming data 
> packets from F engine at the data rate of 8.96 Gbps, and output the 
> integration data to PC, the outgoing data rate depends on the integration 
> time, usually it is longer than 0.5 second. The syndrome is that packets lost 
> happened by specific X engines after 10,20 minutes or couple of hours. Once 
> it happened, we reset all the 10G yellow blocks in F and X, then the system 
> revived.
> 
> I have no idea about the 10G ethernet yellow block. Any comments of 
> suggestions are highly welcome.
> 
> best
> homin jiang
>   
> 


Re: [casper] FFT biplex_core block

2018-02-22 Thread David MacMahon
I think we need to take bets on the even/odd vs high/low division of channels 
between out1 and out2... :)

> On Feb 21, 2018, at 23:33, Jack Hickish <jackhick...@gmail.com> wrote:
> 
> Well, if Dave, Glenn and Aaron all agree, then I'm sold.
> 
> Thanks.
> 
>> On Wed, Feb 21, 2018, 10:50 PM David MacMahon <dav...@berkeley.edu> wrote:
>> Hi, Jack,
>> 
>> I haven’t used the biplex_core block in a while, but I believe the inputs, 
>> pol1 and pol2, are two independent complex input signals. The outputs, out1 
>> and out2, first output the frequency channels for input pol1, with the low 
>> half of the band being output in bit reversed order on out1 and the high 
>> half of the band being output on out2 in bit reversed order. After 
>> outputting the channels for pol1, out1 and out2 output the channels for pol2 
>> in a similar order. 
>> 
>> For a 16 channel fft, I think out1 will have channels: pol1[0 4 2 6 1 5 3 7] 
>> followed by pol2[0 4 2 6 1 5 3 7]. Out2 will have the same output order but 
>> 8 (ie N/2) channels higher.  I hope that makes sense, but if not I can make 
>> a better diagram for you tomorrow. 
>> 
>> Of course this should be verified with simulation, but I think it’s a good 
>> starting point. BTW, this assumes the inputs are presented in natural tone 
>> order at both inputs with the t=0 samples of pol1 and pol2 being presented 
>> at the respective inputs simultaneously. 
>> 
>> Dave
>> 
>>> On Feb 21, 2018, at 18:04, Jack Hickish <jackhick...@gmail.com> wrote:
>>> 
>> 
>>> Howdy,
>>> 
>>> Partly motivated by a search for RAM savings, and partly for fun, I'm 
>>> looking through the innards of the fft_biplex_real_4x block. Can someone 
>>> tell me, using short words and/or pictures, what the relationship 
>>> between the inputs (pol1, pol2) and the outputs (out1, out2) on the 
>>> biplex_core block is?
>>> 
>>> I'm in the midst of reverse engineering the block by simulation / staring 
>>> at the unscrambler / reading about fft biplex implementations, but surely 
>>> someone must(!) know what this block actually does (or claims to do)?
>>> 
>>> Yours, optimistically,
>>> 
>>> Jack
>> 


Re: [casper] FFT biplex_core block

2018-02-21 Thread David MacMahon
Hi, Jack,

I haven’t used the biplex_core block in a while, but I believe the inputs, pol1 
and pol2, are two independent complex input signals. The outputs, out1 and 
out2, first output the frequency channels for input pol1, with the low half of 
the band being output in bit reversed order on out1 and the high half of the 
band being output on out2 in bit reversed order. After outputting the channels 
for pol1, out1 and out2 output the channels for pol2 in a similar order. 

For a 16 channel fft, I think out1 will have channels: pol1[0 4 2 6 1 5 3 7] 
followed by pol2[0 4 2 6 1 5 3 7]. Out2 will have the same output order but 8 
(ie N/2) channels higher.  I hope that makes sense, but if not I can make a 
better diagram for you tomorrow. 

Of course this should be verified with simulation, but I think it’s a good 
starting point. BTW, this assumes the inputs are presented in natural tone 
order at both inputs with the t=0 samples of pol1 and pol2 being presented at 
the respective inputs simultaneously. 
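
If it helps, here is a tiny python sketch of just the ordering described above 
(not of the block itself) for the 16 channel case:

# Expected biplex_core output ordering for a 2N = 16 point FFT (N = 8
# channels per output port), per the description above.
def bit_reverse(x, nbits):
    """Reverse the lowest nbits bits of x."""
    return int(bin(x)[2:].zfill(nbits)[::-1], 2)

N = 8
order = [bit_reverse(i, 3) for i in range(N)]
print('out1: pol1%s then pol2%s' % (order, order))           # [0, 4, 2, 6, 1, 5, 3, 7]
print('out2: pol1%s then pol2%s' % ([c + N for c in order],
                                    [c + N for c in order])) # [8, 12, 10, 14, 9, 13, 11, 15]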

Dave

> On Feb 21, 2018, at 18:04, Jack Hickish  wrote:
> 
> Howdy,
> 
> Partly motivated by a search for RAM savings, and partly for fun, I'm looking 
> through the innards of the fft_biplex_real_4x block. Can someone tell me, 
> using short words and/or pictures, what the relationship between the 
> inputs (pol1, pol2) and the outputs (out1, out2) on the biplex_core block is?
> 
> I'm in the midst of reverse engineering the block by simulation / staring at 
> the unscrambler / reading about fft biplex implementations, but surely 
> someone must(!) know what this block actually does (or claims to do)?
> 
> Yours, optimistically,
> 
> Jack


Re: [casper] sharing CASPER equipment

2018-01-29 Thread David MacMahon
To further John's comments, Breakthrough Listen has a completely separate set 
of HPC servers installed at Green Bank that are connected to the same Ethernet 
switch as the ROACH2s and the VEGAS spectrometer's HPC servers.  When using the 
BL backend, up to 4 of the 8 ROACH2s are configured to send data the the BL 
backends rather than the VEGAS backends and that all works fine.  There is a 
vision to multicast the UDP packets so that BL and VEGAS can process the same 
IF signal, but we haven't gone down that path yet.  One thing to be careful of 
is ensuring that no system spams the network (e.g. by sending packets to an MAC 
address that is down resulting in the switch sending the packets to all 
connected devices), but that is not very onerous.  IMHO, maintaining a large 
shared environment is easier than maintaining several smaller environments, 
plus the upstream analog stuff can all be shared rather than having to 
duplicate/split it for multiple digital systems.

I guess the executive summary is that sharing the hardware with experiments 
that will not run simultaneously is relatively straightforward.  Sharing the 
hardware output to multiple back ends simultaneously is a bit trickier but 
still possible by using multicast (provided that the FPGA functionality is 
suitable for all consumers).

Dave

> On Jan 29, 2018, at 10:20, John Ford  wrote:
> 
> Hi Tom.
> 
> I think this is reasonably easy to manage.  At Green Bank, the spectrometer 
> consists of 8 ROACH-2s that are all reprogrammed for different observing 
> modes.  The modes are personalities stored on disk and loaded on command.  It 
> works fine.  You do have to coordinate to make sure only one computer is 
> commanding things.  If you're not hitting the wall performance-wise, rebooting 
> your control computer into different virtual machines is an interesting way 
> to make sure you don't get wrapped around that particular axle.  We never 
> attempted to run stuff on a virtual machine because we were trying to wring 
> all the performance we could out of our 10 gig ports.  It would be an 
> interesting experiment to see how much running in a VM costs in 
> performance...  
> 
> Managing the FPGA personalities is easy, I think.  Managing the software is 
> probably pretty easy as well if you package things up and have scripts to 
> start and stop the different experiments.
> 
> John
> 
> 
> On Mon, Jan 29, 2018 at 12:17 AM, Jason Manley  > wrote:
> Hi Tom
> 
> We switch firmware around on our boards regularly (~20min observation 
> windows) on KAT-7 and MeerKAT. But we maintain control of the various 
> bitstreams ourselves, and manage the boards internally.
> 
> There is a master controller which handles allocation of processing nodes to 
> various projects, and loads the appropriate firmware onto those boards for 
> their use. The master controller has a standardised KATCP external-facing 
> interface. But we write to the registers on the FPGAs ourselves -- ie only 
> this controller has direct, raw access to the FPGAs. This "master controller" 
> software process kills and starts separate software sub-processes as needed 
> in order to run the various instruments. Sometimes they operate 
> simultaneously by sub-dividing the available boards into separate resource 
> pools.
> 
> We have one special case where two completely independent computers access 
> the hardware. We use this for internal testing and verification. But I 
> wouldn't use that for a deployed, science-observation system due to risk of 
> collisions. You'd have to implement some sort of semaphore/lock/token system, 
> which would require co-ordination between the computers. To me, that seems 
> like a complicated effort for such a simple task.
> 
> Jason Manley
> Functional Manager: DSP
> SKA-SA
> 
> 
> On 28 Jan 2018, at 21:04, Tom Kuiper  > wrote:
> 
> > I'm interested in experience people have had using the same equipment 
> > installed on a telescope for different projects using different firmware 
> > and software.  Have there been issues with firmware swapping?  Are there 
> > software issues that cannot be managed by using a different control 
> > computer or a virtual environment in the same controller?  In addition to 
> > your experience I'd like a summary opinion: yes, it can be done without 
> > risking observations, or no, better to have separate hardware.
> >
> > Many thanks and best regards,
> >
> > Tom
> >

Re: [casper] adc5g Compilation Frequencies

2017-09-12 Thread David MacMahon
Hi, Franco,

I'm not extensively familiar with the inner workings of the ADC5G yellow block, 
but I suspect the limitation is caused by the somewhat obscure constraints 
imposed by the MMCM in the Virtex 6.  This limitation also affects the ADC16 
yellow block.  More details can be found here:

https://casper.berkeley.edu/wiki/ADC16x250-8#ADC16_Sample_Rate_vs_Virtex-6_MMCM_Limitations
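
As a rough illustration, a quick check against the compile ranges Franco 
reports (taken from his email, not verified against the MMCM datasheet) might 
look like:

# Frequency ranges (MHz) at which the adc5g block reportedly compiles;
# these bounds come from Franco's observation, so treat them as approximate.
VALID_RANGES_MHZ = [(540, 960), (1080, 2500)]

def adc5g_freq_ok(f_mhz):
    return any(lo <= f_mhz <= hi for lo, hi in VALID_RANGES_MHZ)

print(adc5g_freq_ok(1000))   # False -- falls in the gap Franco hit
print(adc5g_freq_ok(1080))   # True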

Hope this helps,
Dave

> On Sep 12, 2017, at 14:37, Franco  wrote:
> 
> Dear Casperites,
> 
> Recently I've been testing the adc5g block at different compilation 
> frequencies. I figured out that the block can be compiled at [540, 960] U 
> [1080, 2500] MHz; for every other frequency it gives you an "An optimum PLL 
> solution is not available!" error. These restrictions come from the Matlab 
> script 'xps_adc5g.m' that generates the block. The block does some computation 
> I don't understand to set the PLL parameters (or fails to do so). Does someone 
> have information about why this block has that particular behavior? I want to 
> compile a model at 1000MHz, and I wonder if it is possible.
> 
> Many Thanks,
> 
> Franco Curotto
> 


Re: [casper] Bof files

2017-09-01 Thread David MacMahon
Maybe "./mkbof" instead of just "mkbof"?

Dave

> On Aug 31, 2017, at 23:50, Heystek Grobler  wrote:
> 
> Hi Michael
> 
> If I run mkbof -o system.bof -s core_info.tab -t 3 system.bit it gives the 
> following error/message
> 
> -bash: mkbof: command not found
> 
> I run it like this :
> 
> dserver@rserver:~/Simulink/wb_spectrometer_17/XPS_ROACH_base$ mkbof -o 
> system.bof -s core_info.tab -t 3 system.bit
> 
> Am I doing something wrong?
> 
> Thanks for the help
> 
> Heystek :-)
> 
> 
> On Thu, Aug 31, 2017 at 9:39 PM, Michael D'Cruze 
>  > wrote:
> Hi Heystek,
> 
>  
> 
> If a bit file is created – and I think it is (someone correct me if I’m 
> wrong?) – you can use the following code to generate a bof file from it:
> 
>  
> 
> mkbof -o system.bof -s core_info.tab -t 3 system.bit
> 
>  
> 
> where core_info.tab is in XPS_ROACH2_base/.
> 
>  
> 
> If that doesn’t work you can use the ISE suite itself to perform the 
> creation. Have a look at the memo I wrote on the Casper memos page (the one 
> with SmartXplorer in the title) which tells you how to do this.
> 
>  
> 
> Cheers
> 
> Michael
> 
>  
> 
> From: Heystek Grobler [mailto:heystekgrob...@gmail.com] 
> Sent: 31 August 2017 20:22
> To: Casper Lists
> Subject: [casper] Bof files
> 
>  
> 
> Hi Everyone :-)
> 
>  
> 
> My apologies for bugging everyone again. 
> 
>  
> 
> I want to know whether I would be able to compile/create a .bof file if I 
> un-tick the last box on the casper_xps screen?
> 
>  
> 
> I have a unique problem where casper_xps does not run if the last box is 
> ticked. 
> 
>  
> 
> Thanks for the help
> 
>  
> 
> Heystek :-)
> 


Re: [casper] Re: FFT Nyquist Freq

2017-05-22 Thread David MacMahon
Hi, Andrew,

It's been a while since I've looked at the CASPER FFT implementation, but my 
understanding is that the "real" N-channel FFT actually packs two real FFTs 
into one 2N-channel complex FFT.  Corresponding positive and negative bins of 
the complex FFT's output are added/subtracted to get the N channels for each of 
the two inputs.  The "Nyquist channel" (channel +N or -N of the 2N complex FFT) 
has no corresponding channel and I think gets ignored in the current CASPER 
"real" FFT implementation (but not the complex FFT implementation).

The Nyquist channel is a bit of an oddity.  The Fourier coefficients for that 
channel are all real (+1, -1, +1, -1...) so for a purely real (or imaginary) 
input the Nyquist channel will also be real (or imaginary).  That means that 
the Nyquist channel of the two real inputs could easily (in theory) be 
separated.  The idea Jonathan asks about (including the Nyquist channel's data 
in the imaginary component of the DC channel) is interesting, but due to 
aliasing the utility of that channel is somewhat diminished so I'm not sure how 
useful it would be.
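
For what it's worth, here is a small numpy sketch of the generic 
two-real-FFTs-in-one-complex-FFT trick (not the exact CASPER wiring), showing 
the add/subtract recovery and the purely real Nyquist channel:

import numpy as np

# Pack two real inputs into one complex FFT, then recover each spectrum by
# combining corresponding positive and negative bins.
n = 16
x = np.random.randn(n)
y = np.random.randn(n)

Z = np.fft.fft(x + 1j * y)
Zr = np.conj(Z[-np.arange(n) % n])   # conj(Z[-k])

X = 0.5 * (Z + Zr)                   # spectrum of x
Y = -0.5j * (Z - Zr)                 # spectrum of y

print(np.allclose(X, np.fft.fft(x)))   # True
print(np.allclose(Y, np.fft.fft(y)))   # True
# At the Nyquist bin (k = n/2), X is just the real part of Z and Y the
# imaginary part, which is why those channels are separable in principle.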

Dave

> On May 22, 2017, at 02:13, Andrew Martens  wrote:
> 
> Hey Jonathon
> 
> I am copying my reply to the list to expose my own potential ignorance.
> 
> The CASPER FFT implements a DFT where each resultant bin is the same as 
> mixing with a complex exponential and then low pass filtering the result (as 
> per the DFT definition). The complex exponentials have frequencies centred at 
> 0, 2B/N, 2(2B/N) etc where B is the Bandwidth of your signal and N is the 
> number of FFT bins. So the 'DC' FFT bin goes from -(1/2)*(2B/N) to 
> (1/2)*(2B/N) or -B/N to B/N. For example, a signal sampled at 1024GS/s gives 
> a B of 512MHz. Assuming a 1k point FFT we have a 'DC' bin from -0.5MHz to 
> 0.5MHz. The N/2 FFT bin, in our example, contains 510.5MHz to 511.5MHz. So, 
> all of the FFT bins are shifted down by 1/2 bin from where you might 
> intuitively think they are located. Because we sample a real signal, and due 
> to aliasing, we discard the last N/2 FFT bins as they contain the same 
> information, but as shown above, they are not actually the same as the first 
> N/2.
> 
> You probably know all of this but it does help answer your question. We 
> simply discard the last N/2 channels. So you can never get the last half an 
> FFT bin's worth of frequency info up to Nyquist (in our example, from 
> 511.5MHz to 512MHz. The DC bin is strange as it contains the aliased band 
> from 1023.5MHz to 1024 MHz (hopefully nicely filtered out in analogue 
> stages), as well as 0 MHz to 0.5MHz. So far, we have treated it as something 
> we throw away and have not tried to extract anything from it. 
> 
> This might be a problem for people wanting to do a very coarse FFT, as half 
> an FFT bin might be a lot of bandwidth to discard.
> 
> Regards
> Andrew
> 
> On Fri, May 19, 2017 at 6:17 PM, Jonathon Kocz  > wrote:
> Hi Andrew,
> 
> Sorry to bug you, but I thought it was easier to ask first before looking at 
> the FFT in detail.
> 
> Do you remember, in the CASPER FFT, is the Nyquist frequency (the purely 
> imaginary ch N/2 + 1) put in the imaginary part of the DC or discarded?
> 
> Cheers,
> Jonathon
> 
> 


Re: [casper] sync_gen parameters questions

2017-04-27 Thread David MacMahon
Hi, Franco,

Maybe CASPER Memo 26 will provide some relevant background info:

http://casper.berkeley.edu/memos/sync_memo_v1.pdf 


HTH,
Dave

> On Apr 27, 2017, at 07:22, Franco  wrote:
> 
> Dear Casper Community,
> 
> 
> I was playing with the sync_gen block, and I have some questions regarding its 
> parameters. I read the sync_gen memo 
> (https://casper.berkeley.edu/memos/sync_memo_v1.pdf) but it didn't help much:
> 
> 
> - Simulation Accumulation Length: how do you set this parameter when you can 
> change the accumulation length dynamically? Does this value matter only for 
> simulation? (It is the word "simulation" that is throwing me off.)
> 
> - Reorder Orders: In the updated library the Reorder Orders parameter gets 
> initialized with the code: getfield( getfield( get_param( gcb,'UserData' ), 
> 'parameters'),'reorder_vec'). Does this mean that the block gets the orders 
> automatically and I don't have to care about it?
> 
> - Scale: I don't understand what this parameter is used for. Why would you 
> want to use a scale other than 1?
> 
> 
> My apologies if these questions have been answered already and I wasn't able 
> to find them.
> 
> 
> Thanks,
> 
> 
> Franco
> 


Re: [casper] RE: ten_Gbe_v2 usage

2017-04-18 Thread David MacMahon
You may also have to enable "jumbo packets" on any switches between the roach2 
and your computer. The default "maximum transmission unit" (MTU) for Ethernet 
is 1500.  Most implementations start with that value by default, but virtually 
all can be configured to be larger. I'm not sure what the absolute maximum is 
(there's not an official limit that I'm aware of), but I've never seen a jumbo 
limit smaller than 9000, which is
more than enough for IP/UDP headers plus your application's packet header 
followed by an 8192 byte payload. I wouldn't recommend planning to get much 
more than 9000. 
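
As a rough budget (the 8 byte application header is just an example):

# Bytes that count against the MTU for one jumbo UDP packet.
payload    = 8192   # data bytes per packet
app_header = 8      # example application header size
udp_header = 8
ip_header  = 20     # IPv4, no options

print(ip_header + udp_header + app_header + payload)   # 8228, well under 9000
# (The 14-byte Ethernet header does not count against the MTU.)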

HTH,
Dave

PS This MTU thing seems to be a personal nemesis of mine as it frequently bites 
me when I start using a new switch. Maybe some day I'll remember my own advice! 
:)

> On Apr 18, 2017, at 06:26, Madden, Timothy J.  wrote:
> 
> You also have to enable larger packet sizes on the enet card receiving the 
> packets. There are some linux commands to do this, but I don't know them. Ask 
> your IT guy.
> 
> Tim
> 
> From: Mike Movius [mi...@reutech.co.za]
> Sent: Tuesday, April 18, 2017 4:20 AM
> To: casper@lists.berkeley.edu
> Subject: [casper] ten_Gbe_v2 usage
> 
> 
> Hi all,
> I have a roach2 design using a ten_Gbe_v2 yellow block to transmit data to an 
> external pc. Everything works fine until the packet size reaches ~1500 bytes 
> after which data stops being transmitted. I did the obvious thing and enabled 
> large packets on the block mask but the problem persists. Any ideas? Thanks, 
> MM.  
> 
> 


Re: [casper] Roach2 Tutorial 4 Troubles (Can't Compile .slx or Upload .fpg)

2016-11-10 Thread David MacMahon
Hi, Alec,

I suspect your MLIB_DEVEL_PATH environment variable is not getting set.  This 
error message...

> An error or warning occurred during a callback while saving 
> '/casper_library/casper_library_bus.slx'.

…shows that matlab is trying to save the casper_library_bus.slx file in 
"/casper_library" (i.e. one level down from the root directory).  I suspect 
that’s not really where your mlib_devel lives.  The matlab code that writes 
this file is shown in this part of the error message...

> Error in casper_library_bus_init (line 285)
> filename = save_system(mdl,[getenv('MLIB_DEVEL_PATH'), 
> '/casper_library/', 'casper_library_bus']);

As you can see, if "getenv('MLIB_DEVEL_PATH')" returns an empty string, then the 
file name which will be used is "/casper_library/casper_library_bus" (plus any 
extension that gets added by matlab/simulink).  This is the path that is being 
used (see previous error message) so that indicates that MLIB_DEVEL_PATH is not 
being set in your environment (or maybe set but not "exported").  I think this 
is supposed to happen automatically as part of the "startsg" script.
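
You can see the effect with a quick python sketch of the same logic (matlab's 
getenv behaves the same way when the variable is unset):

import os

# If MLIB_DEVEL_PATH is unset (or set but not exported), the constructed
# path collapses to a directory just below the filesystem root.
mlib = os.environ.get('MLIB_DEVEL_PATH', '')
filename = mlib + '/casper_library/' + 'casper_library_bus'
print(filename)   # '/casper_library/casper_library_bus' when the variable is empty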

HTH,
Dave



Re: [casper] High noise floor

2016-11-10 Thread David MacMahon
Hi, Paulina,

That spectrum has way too much structure in it. Something is not working right. 
I'm not that familiar with tut3, but if there is a way to get a snapshot of 
samples from the ADC it would be good to verify whether the data from the ADC 
looks reasonable.  Like most systems, the PFB is GIGO (garbage in, garbage out) 
so it's good to verify that what's going on is not garbage. 

HTH,
Dave

> On Nov 10, 2016, at 09:36, Paulina Unanue  wrote:
> 
> 
> Hi all
> 
> I'm working with Tutorial 3: Wideband Spectrometer on ROACH 1. I'm using a 
> signal generator to feed the ADC (2x1000-8, tuned at 1 GHz) with a tone 
> (-40 dBm, 200 MHz), but I'm obtaining a high noise floor. I reduced the gain 
> as recommended in the instructions ("decrease the value (for a -10dBm input 
> 0x100) to not saturate the spectrum"), trying different values, but the same 
> problem persists. Any idea what may be causing this?
> 
> To see the graphic, please check the attached.
> best wishes
> 
> 


Re: [casper] Programming a ROACH2

2016-10-11 Thread David MacMahon
I think the intent of exit_fail() is to try to close the connection, ignore any 
exceptions raised while trying to close the connection, and then re-raise the 
original exception that happened before exit_fail was called, but I think the 
implementation is flawed. Here’s the definition of exit_fail() as it appears on 
GitHub:

def exit_fail():
    print 'FAILURE DETECTED. Log entries:\n',lh.printMessages()
    try:
        fpga.stop()
    except: pass
    raise
    exit()

I think this try/except block (with "pass" in the except part) followed by 
"raise" is completely superfluous.  I think it means try to do something and if 
an exception is raised while trying, ignore it but then re-raise it, which 
seems exactly the same as not having the try/except block there at all!  Not to 
mention that the exit() call will never be reached.  I’m also not a fan of 
functions that can only be called while an exception is being handled 
(otherwise the no-arg form of "raise" will bomb out I think).

It would probably be preferable to pass the original exception to exit_fail() 
as an argument so that the original exception can be re-raised.  I can make 
that change when I get back to Berkeley next week (unless someone beats me to 
it).
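
Something along these lines is what I have in mind (untested sketch, reusing 
the script's existing lh and fpga globals):

def exit_fail(orig_exc):
    # Pass the original exception in so it can be re-raised after cleanup.
    print 'FAILURE DETECTED. Log entries:\n', lh.printMessages()
    try:
        fpga.stop()
    except:
        pass            # ignore errors while cleaning up
    raise orig_exc      # re-raise the exception that got us here

# ...and at the call site:
# try:
#     main()
# except Exception as e:
#     exit_fail(e)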

Sorry for veering so far off topic,
Dave


> On Oct 11, 2016, at 10:16, Jason Manley <jman...@ska.ac.za> wrote:
> 
> Some of the earlier scripts had bad error handling. If anything fails before 
> the host object was successfully created, then you get this error because it 
> tries to close the connection before exiting.
> 
> Jason
> 
> On 11 Oct 2016, at 16:09, David MacMahon <dav...@berkeley.edu> wrote:
> 
>> 
>>> On Oct 11, 2016, at 06:46, Heystek Grobler <heystekgrob...@gmail.com> wrote:
>>> 
>>> Connecting to server 192.168.33.7 on port 7147...  FAILURE DETECTED
>> 
>> Editorial comments on error handling in tut3.py aside, I think the fact that 
>> "FAILURE DETECTED" follows "Connecting to server…" on the same line (i.e. no 
>> newline character inbetween) means that something went wrong when 
>> constructing the FpgaClient object which connects to TCP port 7147 of the 
>> ROACH2 with IP address 192.168.33.7.  This is expecting the ROACH2 to have a 
>> tcpborphserver process listening on that port.
>> 
>> What happens when you run:
>> 
>> telnet 192.168.33.7 7147
>> 
>> HTH,
>> Dave
>> 
> 




Re: [casper] Programming a ROACH2

2016-10-11 Thread David MacMahon

> On Oct 11, 2016, at 06:46, Heystek Grobler  wrote:
> 
> Connecting to server 192.168.33.7 on port 7147...  FAILURE DETECTED

Editorial comments on error handling in tut3.py aside, I think the fact that 
"FAILURE DETECTED" follows "Connecting to server…" on the same line (i.e. no 
newline character inbetween) means that something went wrong when constructing 
the FpgaClient object which connects to TCP port 7147 of the ROACH2 with IP 
address 192.168.33.7.  This is expecting the ROACH2 to have a tcpborphserver 
process listening on that port.

What happens when you run:

telnet 192.168.33.7 7147

HTH,
Dave



Re: [casper] Question about complex_addsub block

2016-10-03 Thread David MacMahon
I just went into the casper_library directory and ran "git grep -l 
complex_addsub".  I now realize that maybe the block is used by a block in a 
.slx format file which that command will not find.  So maybe it is used.  To be 
sure, one would have to use some form of the "find_system" command at the 
matlab prompt.

Dave

> On Oct 3, 2016, at 16:10, Franco Curotto <francocuro...@gmail.com> wrote:
> 
> Thanks! That's a relief, for a moment there I doubted my high school math 
> knowledge. Just out of curiosity, how do you check if a block is used by 
> another block in the library?
> 
> Franco  
> 
> On 10/03/2016 06:14 PM, David MacMahon wrote:
>> Hi, Franco,
>> 
>> I can understand your confusion!  This block is very unusual.  In all 
>> fairness, that bock does what it says it will do, but it is unclear what 
>> it’s intended purpose is.  I think it really computes "a+(b*)" and 
>> "-i(a-(b*))".
>> 
>> It does not appear to be used by any other blocks in the CASPER library.  I 
>> think this block should be removed from the library (or modified to match 
>> the expectations from its name) unless anyone can justify its continued 
>> existence in its current state.
>> 
>> Hope this helps,
>> Dave
>> 
>>> On Oct 3, 2016, at 10:44, Franco <francocuro...@gmail.com 
>>> <mailto:francocuro...@gmail.com>> wrote:
>>> 
>>> Hi
>>> 
>>> Well I feel a little bit stupid asking this, but from the addsub block:
>>> 
>>> 
>>> Adds and subtracts the complex inputs. 
>>> If a = w + ix, b = y + iz then 
>>> a+b = (w+y)/2 + i(x-z)/2 and 
>>> a-b = (x+z)/2 + i(y-w)/2
>>> 
>>> Why does it do it like this? Shouldn't it be simply:
>>> 
>>> a+b = w+y + i(x+z)
>>> 
>>> a-b = w-y + i(x-z)
>>> 
>>> I could understand the division by two as a way to avoid overflow, but not 
>>> the rest of the changes.
>>> 
>>> Franco
>>> 
>> 
> 



Re: [casper] Question about complex_addsub block

2016-10-03 Thread David MacMahon
Hi, Franco,

I can understand your confusion!  This block is very unusual.  In all fairness, 
that block does what it says it will do, but it is unclear what its intended 
purpose is.  I think it really computes "a+(b*)" and "-i(a-(b*))".

It does not appear to be used by any other blocks in the CASPER library.  I 
think this block should be removed from the library (or modified to match the 
expectations from its name) unless anyone can justify its continued existence 
in its current state.
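
For what it's worth, a quick numpy check of that reading against the formulas 
quoted in the block's documentation:

import numpy as np

# Verify that the documented outputs equal (a + conj(b))/2 and
# -1j*(a - conj(b))/2 for random complex a and b.
w, x, y, z = np.random.randn(4)
a = w + 1j * x
b = y + 1j * z

doc_sum  = (w + y) / 2 + 1j * (x - z) / 2   # the block's "a+b" output
doc_diff = (x + z) / 2 + 1j * (y - w) / 2   # the block's "a-b" output

print(np.isclose(doc_sum,  (a + np.conj(b)) / 2))        # True
print(np.isclose(doc_diff, -1j * (a - np.conj(b)) / 2))  # True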

Hope this helps,
Dave

> On Oct 3, 2016, at 10:44, Franco  wrote:
> 
> Hi
> 
> Well I feel a little bit stupid asking this, but from the addsub block:
> 
> 
> Adds and subtracts the complex inputs. 
> If a = w + ix, b = y + iz then 
> a+b = (w+y)/2 + i(x-z)/2 and 
> a-b = (x+z)/2 + i(y-w)/2
> 
> Why does it do it like this? Shouldn't it be simply:
> 
> a+b = w+y + i(x+z)
> 
> a-b = w-y + i(x-z)
> 
> I could understand the division by two as a way to avoid overflow, but not 
> the rest of the changes.
> 
> Franco
> 



Re: [casper] FFT speed optimizations

2016-09-23 Thread David MacMahon

> On Sep 23, 2016, at 04:17, Guenter Knittel  wrote:
> 
> I guess normally one would do this by going to the Function Block Parameters 
> and setting
> 
> the latencies accordingly. However, I have made so many manual ad-hoc changes 
> to the
> 
> Simulink model that this is not an option any more.
> 
> I’m sure this is not the best way in general to apply optimizations to the 
> design.
> 
Many of the high level CASPER DSP blocks allow the user to specify latency 
values for things like multipliers, adders, and BRAMs in the block's mask 
dialog box. That's probably the easiest place to change the amount of latency 
for the underlying components.  It's a bit of a blunt tool in that it applies 
to entire high level block, but it's easy to use. Delving into the innards to 
make ad hoc changes makes it hard to go back because changing the mask dialog 
parameters will obliterate all the ad hoc changes. Ad hoc changes to latencies 
can also be tricky because other (possibly not so obvious) latencies might need 
corresponding changes.

Dave

Re: [casper] ADC16x250-8 coax rev 2 noise

2016-09-20 Thread David MacMahon
Hi, Adam,

I haven't looked at the spectral content of a terminated input before so I 
don’t have any comparative results, but I think the spikes you are seeing are 
caused by mismatched gains and/or offsets of the ADC’s interleaved cores (I 
think there are a total of 8 cores, also called "branches" in the datasheet).  
Have you tried sampling band limited noise (e.g. low pass at Fs/20)?  I think 
the severity of the spikes will be less in the presence of a signal.  In the 
worst case you will have to ignore the 8 frequency channels with these spikes, 
but you might be able to improve things by tweaking the coarse gain and/or fine 
gain registers in the ADC.
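
Here is a toy numpy illustration of the mechanism (made-up offset numbers, and 
8 interleaved branches assumed just to keep it simple):

import numpy as np

# Eight interleaved cores with slightly different DC offsets: the offset
# pattern repeats every 8 samples, so its spectrum is a comb at multiples
# of Fs/8, i.e. evenly spaced spikes like the ones described above.
ncores  = 8
nsamp   = 2 ** 14
offsets = 0.05 * np.random.randn(ncores)        # per-core offset error

samples = offsets[np.arange(nsamp) % ncores]    # terminated input ~ 0
spec = np.abs(np.fft.rfft(samples))

peaks = np.argsort(spec[1:])[-4:] + 1           # strongest non-DC bins
print(sorted(peaks / float(nsamp)))             # [0.125, 0.25, 0.375, 0.5]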

HTH,
Dave

> On Sep 20, 2016, at 11:46 AM, Schoenwald, Adam (GSFC-564.0)[AS and D, Inc.] 
>  wrote:
> 
> Hi All,
> I have a question regarding the ADC16x250-8 coax rev 2 running in demux by 4 
> mode. When I terminate the inputs with a 50 ohm connector and collect data 
> using a basic snapshot block design, I get a little bit of noise on the LSB. 
> I export the data to a csv file and find that it is a collection of 0’s and 
> -1’s, indicating that the level being converted by the adc is sitting between 
> these two values and there is some sort of internal noise causing us to jump 
> between values. What is alarming is that when I collected 2^18 samples and 
> run spectral analysis I see large spikes at intervals of FS/16 (See Attached, 
> pwelch(x,1024,256,[],800)).
>  
> I am running the ADC at 800MHZ and the FPGA at 200MHZ. I am using one of the 
> more recent casper builds after the bitslipping commit on Mar 25th. I also 
> tried to use the ruby script located at 
> https://github.com/jack-h/casper_adc16/tree/master/ruby/lib.
>  
> I run the bof file with “adc16_init.rb --reg=0x3a=0x0202,0x3b=0x0202 -d 4 
> roach_ip boffile”, where I change the ic inputs. We only connected 8 of the 
> 16 coax cables to the board and I had to switch off the default.
>  
> Has anyone else had a similar experience or ideas? 
> 
> Adam Schoenwald
> 
>  
> 



Re: [casper] Help with Xilinx and Simulink

2016-08-16 Thread David MacMahon
Hi, Heystek,

> On Aug 16, 2016, at 8:36 AM, Heystek Grobler  wrote:
> 
> Details:
> standard exception: XNetlistEngine:
> Exception message could not be parsed:
> com.xilinx.sysgen.netlist.NetlistInternal: couldn't open first
> pass text file at
> /home/heystek/Desktop/tut1/sysgen/sysgen/masterScript4237968397574398825.pl
> line 559'
> 

I suggest you open this .pl file with a text editor and look at (and around) 
line 559 to see what that line is trying to do.  That might provide some clues.

HTH,
Dave



Re: [casper] gpu correlator

2016-07-26 Thread David MacMahon
Hi, Gerry,

You could start with this:

https://github.com/GPU-correlators/xGPU

I don’t think Kate Clark has optimized it yet for the GTX 1080, but it should 
still work reasonably well.

Dave

> On Jul 26, 2016, at 4:16 PM, Gerald Harp  wrote:
> 
> Hi
> 
> Although this is not really a question involving CASPER hardware...
> 
> Does anyone have a software correlator written for latter generation NVidia 
> cards? Can I share your work?
> 
> Gerry
> 
> -- 
> Gerald (Gerry) R. Harp
> Senior Astrophysicist
> SETI Institute
> 189 Bernardo, Ste. 100
> Mountain View, CA 94043
> 650-960-4576
> 
> 




Re: [casper] ROACH status queries

2016-07-15 Thread David MacMahon
Hi, Eric,

It sure sounds like your second BOF file is not being properly clocked. I think 
even roach1 designs include a built-in register (maybe named 
"says_clk_counter") for estimating the FPGA clock frequency. The python corr 
package has a function named something like est_clock_freq()  (I'm not on my 
computer right now so I can't check the exact name) that reads this register, 
sleeps 1 second, and reads the register again. The difference in register 
counts divided by the time interval provides the estimate of the FPGA clock 
frequency.
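
From memory, the estimate amounts to something like this (the register name is 
whatever your design actually exposes, so check ?listdev first):

import time
import corr

fpga = corr.katcp_wrapper.FpgaClient('roach_hostname')   # example hostname
time.sleep(1)

c0 = fpga.read_uint('sys_clkcounter')
time.sleep(1.0)
c1 = fpga.read_uint('sys_clkcounter')

delta = (c1 - c0) % 2 ** 32          # allow for 32-bit counter wrap-around
print('Approx FPGA clock: %.1f MHz' % (delta / 1e6))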

When you run one BOF file, "set things up", and then run a different BOF file, 
the loading of the second BOF file reinitializes everything inside the FPGA 
that got setup with the first BOF file. It is possible in some scenarios for 
the first BOF file to configure resources external to the FPGA. In that case 
the settings of the external resources can persist from one BOF to the next 
BOF. 

Is there anything unusual about your clocking scheme?  Have you verified that 
the init_clock design and the "take data" design are clocked the same way (eg 
ADC0 clock)?

Thanks,
Dave

> On Jul 14, 2016, at 12:59, Miller, Eric H.  wrote:
> 
> Thanks all for the replies and suggestions.  
> 
> "?listdev" does indeed show a number of register names, so it looks like the 
> ROACH was programmed successfully, and that "program" was an indication of 
> success rather than an error.  
> 
> Currently I am unable to take data (attempts return only zeros).  Perhaps 
> more significantly, the clock returns \x00\x00\x00\x00.  
> 
> Some additional information:
> I am using roach1. 
> tcpborphserver2 is running on the ROACH.
> I have been controlling the ROACH through the Casper python interface on the 
> control computer.  
> I have input a clock signal at 746 MHz through the digitization card.  
> I can successfully read and write to any registers on the ROACH after it is 
> programmed. 
> My first step is to load and run a .bof file called init_clock.  After 
> loading this, clock values increment reasonably.  
> My second step is to load and run a .bof file to take data.  This appears to 
> load correctly, but bram registers are all zeros after forcing a trigger, and 
> the clock value sits at zero always.  
> 
> This system had been run previously with success using the same python 
> operations and .bof files.  It's possible that there are required packages 
> that are not installed on the ROACH, because the USB drive had been loaded as 
> read-only (so packages installed previously may now be absent).  
> 
> Best,
> Eric Miller
> 
> 
> 
> From: Marc Welz 
> Sent: Tuesday, June 28, 2016 1:52 AM
> To: Miller, Eric H.
> Cc: To: casper@lists.berkeley.edu
> Subject: Re: [casper] ROACH status queries
> 
> What does
> 
> "?listdev" (issued via telnet roach-ip 7147, discard the quotes) have
> to say after a program ? If there are lots of register names, then it
> probably programmed successfully.
> 
> You don't specify if you are using a roach1 or roach2...  Generally
> you can telnet to the roach on port 7147 via a separate connection
> which you can keep open indefinitely, and it should give you some
> feedback on what it is attempting to do. If you issue a "?log-level
> debug" or even "?log-level trace" you should get even more detailed
> feedback. Programming on a roach1 happens in a subprocess, so that is
> one of the cases where there is a bit less detail... common problems
> include that the executable in the bof file doesn't match the
> libraries installed on the roach.
> 
> regards
> 
> marc
> 
> 
>> On Mon, Jun 27, 2016 at 3:45 PM, Miller, Eric H.  
>> wrote:
>> Hello ROACHers,
>> 
>> 
>> I'm having some trouble operating a ROACH I've inherited, which used to
>> work.  Currently, after loading a .bof file via progdev, a status query
>> returns "program."  I understand this to be an error message of some sort,
>> but cannot find any documentation explaining what the various status errors
>> mean.  Can anyone shed some light on this, or point me to where these error
>> messages are detailed?
>> 
>> 
>> Thanks,
>> 
>> Eric
> 



Re: [casper] XST options

2016-07-07 Thread David MacMahon
Hi, Gunter,

I think you are looking for the resynth_netlist function found in mlib_devel as 
the file resynth_netlist.m. I think the comments make it rather 
self-explanatory, but please let me know (via the mailing list) if it's not 
quite what you're after. 

Cheers,
Dave

> On Jul 7, 2016, at 08:38, Guenter Knittel  wrote:
> 
> Hi,
>  
> I’m new to this list, and I would be grateful if somebody could give me a 
> hint.
> I’m trying to speed-optimize a completed and working SL design, and it appears
> as if an old topic is a main problem. This is the default XST option to merge
> a chain of FFs into a shift register.
> What I’m trying to accomplish is to run casper_xps with the right XST options
> from the start. What I have learned so far is that the XST options are written
> into the file system_xst.scr, which is re-generated before each run. The file
> fast_runtime.opt only applies to tools running after XST.
> Now I’m trying to figure out which tool is actually assembling this scr file, 
> and
> where it gets the options from. In the hope that I can change the default 
> behavior.
> Can somebody give me a pointer? Or is my approach fundamentally wrong?
>  
> Thanks a lot
> Gunter
> from MPIfR Bonn
>  


Re: [casper] Error in Simulating Roach 2 Tutorial 1 .mdl file

2016-06-14 Thread David MacMahon
Hi, Christopher,

I don’t have any insights into working with the "Scope" blocks other than to 
switch to the "WaveScope" block.  It is much nicer than the Scope block, IMHO.  
It’s a little different to setup, but I think it’s well worth it!

FWIW, I think all the tutorials should be modified to use WaveScope instead of 
Scope.

HTH,
Dave

> On Jun 14, 2016, at 11:17 AM, Christopher Barnes  wrote:
> 
> The axes were not auto-scaled for the adder; you were correct.
> 
> For the counter, the settings are correct; I just checked them again.  Do you 
> have any more ideas on this?
> 
> On Tue, Jun 14, 2016 at 2:04 PM, Jack Hickish  > wrote:
> For the adder, have you clicked the icon at the top marked with a pair of 
> binoculars to auto-scale the axes?
> 
> For the counter, are your control signal and slicing definitely correct -- 
> i.e., is the reset of the counter 0, and the enable of the counter 1?
> 
> Cheers
> Jack
> 
> On Tue, 14 Jun 2016 at 11:00 Christopher Barnes  > wrote:
> Jack,
> 
> I'm not able to reproduce the counter output 
> (https://casper.berkeley.edu/wiki/File:Counter_sim.png 
> ) or the Adder Output 
> (https://casper.berkeley.edu/wiki/File:Adder_sim.png 
> ).  Instead of those 
> two, I get a line at 0 for the counter output and then just an empty set of 
> axes for the adder output.
> 
> On Tue, Jun 14, 2016 at 1:56 PM, Jack Hickish  > wrote:
> Hi Christopher,
> 
> Which scope outputs are you not able to reproduce from the wiki? What outputs 
> did you see?
> 
> Cheers
> Jack
> 
> On Tue, 14 Jun 2016 at 10:21 Christopher Barnes  > wrote:
> Hello,
> 
> My name is Christopher Barnes, and I'm a graduate student at the University 
> of Michigan.  I'm working through the first four tutorials on the Casper 
> website (located at https://casper.berkeley.edu/wiki/Tutorials 
> ) to program a ROACH2 as a 
> wideband pocket correlator.  I've finished building the .mdl file from the 
> first tutorial, and my output from the scopes did not match what is displayed 
> on the webpage, so I'm requesting some help with this.  I'm a beginner, so 
> I'm not sure what the problem could be aside from the cookbook instructions 
> on the website.
> 
> If you're willing to help me, then please email me back and I can show you my 
> file.  My suspicion is that the discrepancy between my output and the output 
> on the webpage is caused by an outdated release of the tools in Simulink.
> 
> 
> 
> 
> 
> 



Re: [casper] Working with demux modes of 'ADC16x250-8 coax rev 2'

2016-03-11 Thread David MacMahon
Hi Nilan,

I think the problem is that your design appears (based on the timing report) to 
require a signal to propagate from the output of “delay42” through “convert8” 
through “mult2” through “addsub1" through “addsub2" and through “mult1” all 
within one clock cycle.  That is simply asking too much from the FPGA. As the 
timing report indicates, “Component delays alone exceed the constraint”, which 
means that the timing constraint could not be met even if the signal had zero 
propagation delay between components.  You somehow have to restructure the 
filter implementation so that it can be realized within the limits of the FPGA. 
 You might be able to gain some improvements through clever use of the DSP48 
block in Matlab, which can provide more control of DSP48 blocks rather than 
relying on the tools to merge multipliers and adders in an optimal way, but I 
suspect that alone will not be sufficient to get this design to meet timing.  I 
also suspect that the bit width of the signal is rather large as well, which 
also makes things harder.

Hope this helps,
Dave

On Mar 10, 2016, at 16:49, Nilan Udayanga <g...@zips.uakron.edu> wrote:

> Hi Jack,
> 
> Thank you very much for your suggestions. The block t_z2 is a 2nd order 
> feedback loop (figure is attached, Even though it shows 3 delays in 
> multipliers, it does not have any delays). 
> But I don't think this feedback loop may cause that much of delay. 
> 
> Regards,
> Nilan Udayanga.
> 
> On Thu, Mar 10, 2016 at 7:18 PM, Jack Hickish <jackhick...@gmail.com> wrote:
> Hi Nilan,
> 
> It looks like there's a block called (something like) ppcm12/block_t_z2 with 
> a huge logic delay -- from line 135 of the failing twr file --
> 
>   Data Path Delay:  20.585ns (Levels of Logic = 12)(Component delays 
> alone exceeds constraint)
> 
> What is this block? It looks like it has some multipliers and adders and 
> stuff...
> 
> There's also a timing error in the adc yellow block, but my guess is this is 
> just because the place and route tool gave up when it hit impossible 
> constraints elsewhere.
> 
> Cheers,
> Jack
> 
> On Thu, 10 Mar 2016 at 23:49 Nilan Udayanga <g...@zips.uakron.edu> wrote:
> Hi all,
> 
> We are having a little weird problem during the compilation of a roach 2 
> design with the adc16 block. I have a design for a specific application. It 
> is well pipelined and we are using ADC interfaces clocked at 200 MHz. When I 
> just terminate the output without using any software registers at the output, 
> there is no timing error (all timing constraints have been met). And when I 
> compile the design using the software register at the output (just a one 
> software register), it has a timing error, and says the maximum frequency 
> that can be achieved is around 50 MHz. I am wondering whether it is a problem 
> with the software register or not. Please find the following attachments for 
> the .twr and .twx files for each cases. 
> 
> I have tried using snapshot blocks too. That's giving the same timing error. 
> 
> Your help will be greatly appreciated.
> 
> Regards,
> Nilan Udayanga.
> 
> On Wed, Mar 9, 2016 at 4:03 PM, Nilan Udayanga <g...@zips.uakron.edu> wrote:
> Hi All,
> 
> Thank you very much for all your suggestions.
> 
> I have two more questions,
> 
> Since the ADCs need to be clocked at 480 MHz for the demux=2 mode, how is the 
> FPGA clocked at 240 MHz? Does it use a clock divider internally?
> 
> Is there any maximum operating frequency for the FPGA, when we use the adc16 
> block? 
> 
> Regards,
> Nilan Udayanga.
> 
> On Wed, Mar 9, 2016 at 3:22 PM, Jack Hickish <jackhick...@gmail.com> wrote:
> With regards to the demux option, for the system you describe you want -d 2 
> (I.e. demux by = run the FPGA at half the sample rate, and process two 
> samples in parallel on every FPGA clock cycle). Basically, provided you have 
> the up to date ruby package, all you need to do is run adc16_init.rb with 
> appropriate options, and that will program your roach and set everything up 
> for you. 
> 
> I think the default mode of the adc16 ruby script assumes that, whatever mode 
> you're using the ADC in, the external clock provided is at the sample rate. 
> Though, as Matt added, the ADC supports other dividing options if they're 
> useful to you and you're willing to read the ADC data sheet to work out how 
> to set the divider properties. 
> 
> Cheers
> Jack
> 
> 
> On Wed, 9 Mar 2016, 09:49 David MacMahon, <dav...@astro.berkeley.edu 
> &

Re: [casper] Working with demux modes of 'ADC16x250-8 coax rev 2'

2016-03-09 Thread David MacMahon
Hi Vishwa,

I am not at my computer right now, so this is from memory, but I think you want 
to specify an IP clock rate of 240 MHz and supply a 480 MHz clock to the ADC 
card(s). The IP clock rate is sometimes called the fabric clock rate. It is the 
rate at which the FPGA logic elements (aka fabric) operate. The ADC chips need 
a sample clock that is commensurate with the sampling frequency. When you 
initialize the ADCs using adc16_init.rb, be sure to pass the "-d" option. If 
your version of adc16_init.rb does not support the "-d" option, then you will 
need to update it. 

Hope this helps,
Dave 

> On Mar 9, 2016, at 08:33, Vishwa Seneviratne <mp...@zips.uakron.edu> wrote:
> 
> Hi David/Jack,
> 
> We are working on a beam former and we use the 'ADC16x250-8 coax rev 2' to 
> sample RF signals using ROACH2-Rev 2. The operating BW is 240MHz. Thus, we 
> need to sample the signals at 480 MSamples/s. We have few queries regarding 
> the adc16 yellow block and how to setup the input clock.
> 
> 1. Can we compile a design by setting the IP clock rate to 480MHz?
> 2. Should we supply a IP clock frequency of 480MHz to the ADC board to 
> achieve a sampling rate of 480MSamples/s.  If not, at what clock rate should 
> we supply? And what other parameters needed to setup when running the bof 
> file.  
> 
> Thank you 
> 
> 
> Sincerely,
> 
> Vishwa Seneviratne
> Graduate Student
> Dept. of Electrical and Computer Engineering
> University of Akron
> 
>> On Wed, Feb 3, 2016 at 12:38 PM, David MacMahon <dav...@astro.berkeley.edu> 
>> wrote:
>> Hi, Vishwa,
>> 
>> The software installed by following the ADC16 user guide had not been 
>> updated with the newer version of the adc16 code that supports demux mode.  
>> I have updated the software that the user guide points to, so if you 
>> reinstall the adc16 gem as per the user guide you should get version 0.4.0 
>> which supports demux mode.
>> 
>> Thanks for bringing this issue to my attention.
>> 
>> Dave
>> 
>>> On Feb 3, 2016, at 7:02 PM, Vishwa Seneviratne <mp...@zips.uakron.edu> 
>>> wrote:
>>> 
>>> Hi Dave,
>>> 
>>> Here is the output.
>>> 
>>> vishwa@server3:~/Desktop/roach/poly$ adc16_init.rb -h
>>> Usage: adc16_init.rb [OPTIONS] HOSTNAME BOF
>>> 
>>> Programs HOSTNAME with ADC16-based design BOF and then calibrates
>>> the serdes receivers.
>>> 
>>> Options:
>>> -i, --iters=NNumber of snaps per tap [1]
>>> -r, --reg=R1=V1[,R2=V2...]   Register addr=value pairs to set
>>> -v, --[no-]verbose   Display more info [false]
>>> -h, --help   Show this message
>>>  
>>> vishwa@server3:~/Desktop/roach/poly$ gem list adc16
>>> 
>>> *** LOCAL GEMS ***
>>> 
>>> adc16 (0.3.6)
>>> 
>>> 
>>> 
>>> 
>>> Sincerely,
>>> 
>>> Vishwa Seneviratne
>>> Graduate Student
>>> Dept. of Electrical and Computer Engineering
>>> University of Akron
>>> 
>>>> On Wed, Feb 3, 2016 at 11:28 AM, David MacMahon 
>>>> <dav...@astro.berkeley.edu> wrote:
>>>> What does "adc16_init.rb -h" show?  What does "gem list adc16" show?  
>>>> Maybe you need a newer version of the adc16 code. 
>>>> 
>>>> Dave
>>>> 
>>>>> On Feb 3, 2016, at 18:20, Vishwa Seneviratne <mp...@zips.uakron.edu> 
>>>>> wrote:
>>>>> 
>>>>> Hi Jack,
>>>>> 
>>>>> I'm thinking that the ruby script 'adc16_init.rb' does not identify the 
>>>>> '--demux' parameter. I used the code at 
>>>>> 'git://github.com/david-macmahon/casper_adc16.git'. What can I do to set 
>>>>> the parameter?
>>>>> 
>>>>> Thank you
>>>>> 
>>>>> 
>>>>> Sincerely,
>>>>> 
>>>>> Vishwa Seneviratne
>>>>> Graduate Student
>>>>> Dept. of Electrical and Computer Engineering
>>>>> University of Akron
>>>>> 
>>>>>> On Wed, Feb 3, 2016 at 11:02 AM, Vishwa Seneviratne 
>>>>>> <mp...@zips.uakron.edu> wrote:
>>>>>> Hi Jack,
>>>>>> 
>>>>>> I did try all the combinations. The error remains the same. 
>>>>>> 
>>>>>> $ adc16_init.rb -v --demux=1 192.168.10.5 poly_design.bo

Re: [casper] PlanAhead to a working bof

2016-02-03 Thread David MacMahon
Hi, Johnathon,

> On Feb 3, 2016, at 2:26 AM, Gard, Johnathon D.  
> wrote:
> 
> There are options in the PlanAhead bitfile generation and I could have those 
> wrong. This could be very likely. 

The bitgen options that the CASPER flow uses are in 
mlib_devel/xps_base/XPS_ROACH2_base/etc/bitgen.ut.

> I  could alternatively use the system.ucf file updated by PlanAhead through 
> the casper_xps process in matlab. However this would drop my control of the 
> PAR which seems to have a strong influence on how well it meets timing. 
> Ironically, timing driven placement gets the far worse timing results. 

Newer versions of mlib_devel support the inclusion of user-specified UCF 
snippets into the overall UCF file.  To do this, you create a 
“<model_name>/ucf” directory and then place files with a “.ucf” extension into 
that directory.  All .ucf files in that directory will be included in the 
overall system.ucf file that the casper flow generates.

For example, if your model is "/path/to/my_model.slx", then all *.ucf files in 
“/path/to/my_model/ucf/” will be automatically included in the system.ucf 
constraints file.

HTH,
Dave



Re: [casper] Working with demux modes of 'ADC16x250-8 coax rev 2'

2016-02-03 Thread David MacMahon
What does "adc16_init.rb -h" show?  What does "gem list adc16" show?  Maybe you 
need a newer version of the adc16 code. 

Dave

> On Feb 3, 2016, at 18:20, Vishwa Seneviratne <mp...@zips.uakron.edu> wrote:
> 
> Hi Jack,
> 
> I'm thinking that the ruby script 'adc16_init.rb' does not identify the 
> '--demux' parameter. I used the code at 
> 'git://github.com/david-macmahon/casper_adc16.git'. What can I do to set the 
> parameter?
> 
> Thank you
> 
> 
> Sincerely,
> 
> Vishwa Seneviratne
> Graduate Student
> Dept. of Electrical and Computer Engineering
> University of Akron
> 
>> On Wed, Feb 3, 2016 at 11:02 AM, Vishwa Seneviratne <mp...@zips.uakron.edu> 
>> wrote:
>> Hi Jack,
>> 
>> I did try all the combinations. The error remains the same. 
>> 
>> $ adc16_init.rb -v --demux=1 192.168.10.5 poly_design.bof
>> /var/lib/gems/1.9.1/gems/adc16-0.3.6/bin/adc16_init.rb:40:in `<top (required)>': invalid option: --demux=2 (OptionParser::InvalidOption)
>>  from /usr/local/bin/adc16_init.rb:19:in `load'
>>  from /usr/local/bin/adc16_init.rb:19:in `<main>'
>> 
>> 
>> Sincerely,
>> 
>> Vishwa Seneviratne
>> Graduate Student
>> Dept. of Electrical and Computer Engineering
>> University of Akron
>> 
>>> On Wed, Feb 3, 2016 at 2:25 AM, Jack Hickish <jackhick...@gmail.com> wrote:
>>> Hi Vishwa,
>>> 
>>> Is the syntax definitely -demux=1 and not either --demux=1 or -d 1 ?
>>> 
>>> 
>>> 
>>> Jack
>>> 
>>> 
>>>> On Wed, 3 Feb 2016, 12:39 a.m. Vishwa Seneviratne <mp...@zips.uakron.edu> 
>>>> wrote:
>>>> Hi,
>>>> 
>>>> I am working on how to work with different operating modes of the 'ADC16x250-8 
>>>> coax rev 2' for a very simple design to test how the ADC works. The design 
>>>> is compiled at an IP clock rate setting of 200MHz. My objective is to 
>>>> sample my input signal at higher sampling rate (preferably 400, 800 MHz).
>>>> 
>>>> According to the user guide 
>>>> "https://casper.berkeley.edu/wiki/images/4/4c/ADC16_user_guide.txt" by 
>>>> setting the demux parameter I should be able to switch between different 
>>>> sampling rates. I get the following error.
>>>> 
>>>> $ adc16_init.rb -v -demux=1 192.168.10.5 poly_design.bof
>>>> /var/lib/gems/1.9.1/gems/adc16-0.3.6/bin/adc16_init.rb:40:in `<top (required)>': invalid option: -demux=2 (OptionParser::InvalidOption)
>>>>from /usr/local/bin/adc16_init.rb:19:in `load'
>>>>from /usr/local/bin/adc16_init.rb:19:in `<main>'
>>>> 
>>>> When I don't pass the 'demux' parameter the ADC board gets initialized to 8 
>>>> analog inputs by default.
>>>> 
>>>> How do I resolve this issue? Or how can I set the ADCs to operate at 
>>>> different sampling rates?
>>>> 
>>>> Thank you in advance
>>>> 
>>>> Sincerely,
>>>> 
>>>> Vishwa Seneviratne
>>>> Graduate Student
>>>> Dept. of Electrical and Computer Engineering
>>>> University of Akron
> 


Re: [casper] Working with demux modes of 'ADC16x250-8 coax rev 2'

2016-02-03 Thread David MacMahon
Hi, Vishwa,

The software installed by following the ADC16 user guide had not been updated 
with the newer version of the adc16 code that supports demux mode.  I have 
updated the software that the user guide points to, so if you reinstall the 
adc16 gem as per the user guide you should get version 0.4.0 which supports 
demux mode.

Thanks for bringing this issue to my attention.

Dave

> On Feb 3, 2016, at 7:02 PM, Vishwa Seneviratne <mp...@zips.uakron.edu> wrote:
> 
> Hi Dave,
> 
> Here is the output.
> 
> vishwa@server3:~/Desktop/roach/poly$ adc16_init.rb -h
> Usage: adc16_init.rb [OPTIONS] HOSTNAME BOF
> 
> Programs HOSTNAME with ADC16-based design BOF and then calibrates
> the serdes receivers.
> 
> Options:
> -i, --iters=N    Number of snaps per tap [1]
> -r, --reg=R1=V1[,R2=V2...]   Register addr=value pairs to set
> -v, --[no-]verbose   Display more info [false]
> -h, --help   Show this message
>  
> vishwa@server3:~/Desktop/roach/poly$ gem list adc16
> 
> *** LOCAL GEMS ***
> 
> adc16 (0.3.6)
> 
> 
> 
> 
> Sincerely,
> 
> Vishwa Seneviratne
> Graduate Student
> Dept. of Electrical and Computer Engineering
> University of Akron
> 
> On Wed, Feb 3, 2016 at 11:28 AM, David MacMahon <dav...@astro.berkeley.edu> wrote:
> What does "adc16_init.rb -h" show?  What does "gem list adc16" show?  Maybe 
> you need a newer version of the adc16 code. 
> 
> Dave
> 
> On Feb 3, 2016, at 18:20, Vishwa Seneviratne <mp...@zips.uakron.edu> wrote:
> 
>> Hi Jack,
>> 
>> I'm thinking that the ruby script 'adc16_init.rb' does not identify the 
>> '--demux' parameter. I used the code at 
>> 'git://github.com/david-macmahon/casper_adc16.git'. What can I do to set 
>> the parameter?
>> 
>> Thank you
>> 
>> 
>> Sincerely,
>> 
>> Vishwa Seneviratne
>> Graduate Student
>> Dept. of Electrical and Computer Engineering
>> University of Akron
>> 
>> On Wed, Feb 3, 2016 at 11:02 AM, Vishwa Seneviratne <mp...@zips.uakron.edu> wrote:
>> Hi Jack,
>> 
>> I did try all the combinations. The error remains the same. 
>> 
>> $ adc16_init.rb -v --demux=1 192.168.10.5 poly_design.bof
>> /var/lib/gems/1.9.1/gems/adc16-0.3.6/bin/adc16_init.rb:40:in `<top (required)>': invalid option: --demux=2 (OptionParser::InvalidOption)
>>  from /usr/local/bin/adc16_init.rb:19:in `load'
>>  from /usr/local/bin/adc16_init.rb:19:in `<main>'
>> 
>> 
>> Sincerely,
>> 
>> Vishwa Seneviratne
>> Graduate Student
>> Dept. of Electrical and Computer Engineering
>> University of Akron
>> 
>> On Wed, Feb 3, 2016 at 2:25 AM, Jack Hickish <jackhick...@gmail.com> wrote:
>> Hi Vishwa,
>> 
>> Is the syntax definitely -demux=1 and not either --demux=1 or -d 1 ?
>> 
>> 
>> 
>> Jack
>> 
>> 
>> On Wed, 3 Feb 2016, 12:39 a.m. Vishwa Seneviratne <mp...@zips.uakron.edu> wrote:
>> Hi,
>> 
>> I am working on how to work with different operating modes of the 'ADC16x250-8 
>> coax rev 2' for a very simple design to test how the ADC works. The design 
>> is compiled at an IP clock rate setting of 200MHz. My objective is to sample 
>> my input signal at higher sampling rate (preferably 400, 800 MHz).
>> 
>> According to the user guide 
>> "https://casper.berkeley.edu/wiki/images/4/4c/ADC16_user_guide.txt" by 
>> setting the demux parameter I should be able to switch between different 
>> sampling rates. I get the following error.
>> 
>> $ adc16_init.rb -v -demux=1 192.168.10.5 poly_design.bof
>> /var/lib/gems/1.9.1/gems/adc16-0.3.6/bin/adc16_init.rb:40:in `<top (required)>': invalid option: -demux=2 (OptionParser::InvalidOption)
>>  from /usr/local/bin/adc16_init.rb:19:in `load'
>>  from /usr/local/bin/adc16_init.rb:19:in `<main>'
>> 
>> When I don't pass the 'demux' parameter the ADC board gets initialized to 8 
>> analog inputs by default.
>> 
>> How do I resolve this issue? Or how can I set the ADCs to operate at 
>> different sampling rates?
>> 
>> Thank you in advance
>> 
>> Sincerely,
>> 
>> Vishwa Seneviratne
>> Graduate Student
>> Dept. of Electrical and Computer Engineering
>> University of Akron
>> 
>> 
> 



Re: [casper] Matlab / Xilinx startup error

2016-01-11 Thread David MacMahon
Hi, Brad,

I think the problem is with the “awk” utility.  It’s probably something like 
the tools are running the awk binary from the OS, but it ends up using a shared 
library from your MATLAB or Xilinx installation (or vice versa) and there is 
something of a mismatch.  I recommend trying to find out which awk binary is 
being used and then use “ldd /path/to/the/used/awk” to find out which libraries 
the dynamic linker is using with it.

FWIW, on Ubuntu 12.04 I found that I needed to use the OS version of libstdc++ 
so I renamed /opt/Xilinx/14.7/ISE_DS/ISE/lib/lin64/libstdc++.so.6 (to save it 
as a backup) then recreated it as a symlink to the OS version at 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.
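
In other words, something along these lines (a sketch; the awk and library paths are the Ubuntu 12.04 / ISE 14.7 ones mentioned above and may differ on your system):

  $ which awk                                  # find the awk binary being run
  $ ldd $(which awk)                           # see which shared libraries it picks up
  $ cd /opt/Xilinx/14.7/ISE_DS/ISE/lib/lin64
  $ sudo mv libstdc++.so.6 libstdc++.so.6.orig # keep the Xilinx copy as a backup
  $ sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6 .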

HTH,
Dave

> On Jan 7, 2016, at 4:52 PM, Brad Dober  wrote:
> 
> awk: symbol lookup error: awk: undefined symbol: mpfr_z_sub
> 




Re: [casper] FFT woes

2015-11-11 Thread David MacMahon
Hi, Dan,

I think the anecdote you mention happened with older Xilinx tools and an 
older/ancient mlib_devel version.  I could be wrong about this, but I’m not 
convinced it applies to the current versions of things.  Any evidence one way 
or the other would be most welcome.

Cheers,
Dave

> On Nov 11, 2015, at 10:37 AM, Dan Werthimer  wrote:
> 
> 
> hi michael,
> 
> what operating system are you using?
> we have seen problems where the FFT works in simulation,
> and doesn't produce correct results on the FPGA when we were compiling using 
> a non-xilinx supported
> operating system.
> the problem occurred only for large FFT's -  i think 8K or larger.  
> 
> best wishes,
> 
> dan
> 
> 
> On Wed, Nov 11, 2015 at 7:34 AM, Michael D'Cruze wrote:
> Hi Jack
> 
>  
> 
> Sorry it’s taken me so long to come back (I’m going to write back to everyone 
> shortly). I’ve been chasing a few hunches I’ve had which might have 
> exonerated the FFT, but to no avail. Indeed the FFT does simulate OK, but in 
> the majority of cases in hardware every other channel is a zero. I say in the 
> majority of cases, because in one or two cases the design works correctly. I 
> have not been able to find a reason for this yet.
> 
>  
> 
> BW
> Michael
> 
>  
> 
> From: Jack Hickish [mailto:jackhick...@gmail.com] 
> Sent: 03 November 2015 01:01
> To: Michael D'Cruze
> Cc: casper@lists.berkeley.edu 
> Subject: Re: [casper] FFT woes
> 
>  
> 
> Hi Michael,
> 
>  
> 
> Just so everyone is on the same page -- does your issue only show up in 
> hardware like Andrew/Jonathon's - i.e., in simulation the FFT works ok?
> 
>  
> 
> Jack
> 
>  
> 
> On 3 November 2015 at 00:57, Michael D'Cruze wrote:
> 
> Dear all,
> 
>  
> 
> Following on from the email thread from Jonathan Kocz and Andrew Martens 
> about odd FFT outputs….
> 
>  
> 
> I’ve been experiencing similar inexplicable problems for a while now. Every 
> other channel in my output is invariably a zero. I’ve tried everything I can 
> think of, including solutions along the lines of those observed to work by 
> Jonathan and Andrew (black-boxing, changing mask parameters etc.), in 
> addition to wiping clean my libraries and re-syncing with 
> casper-astro-soak-test. I’ve even re-drawn the entire model from scratch. The 
> results are always the same. Below is a link to an example output.
> 
>  
> 
> https://dl.dropboxusercontent.com/u/38103354/32k_test_image.png 
> 
>  
> 
> Hopefully it’s clear from a_0 (note that a_0 is zoomed in, a_1 is not) that 
> every other channel outputs zero, and the interleaved a_0 and a_1 spectra (to 
> form the full 32k channel spectrum) are interleaving correctly to produce 
> pairs of zeroes. I’ve been trying various things for quite a while now, 
> without success and would appreciate some suggestions…!
> 
>  
> 
> Thanks
> 
> Michael
> 
>  
> 
> 



Re: [casper] Multicast on 10 gbe on ROACH-2?

2015-11-05 Thread David MacMahon
As I think Jack commented, I was remembering an old trick that predated 
multicast support in the core.




> On Nov 5, 2015, at 02:51, Marc Welz  wrote:
> 
> Not sure if the arp table is involved in this - the destination mac is
> (should be)
> generated algorithmically from the destination multicast IP address, though
> the above might be a unusual workaround.



Re: [casper] ADC16x250 SERDES calibration issues

2015-11-04 Thread David MacMahon
Hi, Danny,

If you have its dependencies installed, the adc16_plot_taps.rb script can be 
useful for showing the deskew stuff.  The sync errors seem really strange.  
What is the ADC clock frequency?  Have you tried swapping ADC clock cables 
between a “good” ROACH2 and a “bad/marginal” ROACH2?

Dave

> On Nov 4, 2015, at 3:07 PM, Danny Price  wrote:
> 
> Hi all
> 
> We are seeing some errors where our ADC16x250 cards don’t seem to be 
> completing SERDES calibration successfully when programmed (using the 
> adc16_init.rb script). 
> 
> Running the adc16_status.rb script we see errors like this:
> 
> rofl1: Design built for ROACH2 rev2 with 8 ADCs (ZDOK rev2)
> rofl1: Gateware does not support demux modes
> rofl1: ZDOK0 clock OK, ZDOK1 clock OK
> rofl1: 
> rofl1: 12341234123412341234123412341234
> rofl1: .XX. deskew
> rofl1: .XXX.X...XX. sync
> 
> Interestingly, out of our 16 roach2 boards, it seems to only/mainly affect 
> boards #1 and #7. The FPGAs themselves seem to be clocking correctly, and 
> we’re reasonably sure that we have good clock signal distribution.
> 
> Iteratively reprogramming the roach we can eventually get these boards to 
> calibrate successfully, but it takes multiple trials.
> 
> Any ideas of things to check/consider?
> 
> Thanks
> Danny 



Re: [casper] Problems with ADC captured data.

2015-09-07 Thread David MacMahon
Hi, Sharat,

On Sep 6, 2015, at 11:02 PM, Jack Hickish wrote:

> As the code suggests, the error comes because bit 1 of core 3 appears to 
> never be glitch free, no matter what the delay setting. It's not obvious to 
> me what could cause this.

Just to expand on what Jack said, here are a few possible ideas (some of which 
are sheer speculation):

1. Verify the pinout of the ADC data pins in system_pad.txt.  Revision 1 of the 
ROACH2 connected one of the ZDOK differential pairs to FPGA pins that were in a 
different bank than the others.  This sub-optimal situation was "discovered" 
after a small number of the "rev 1" boards had been made.  The design was 
quickly re-done for "rev 2".  Virtually all ROACH2s now are "rev 2", but in the 
interest of advancing ROACH2 development (and their own development) SMA took 
delivery of the "rev 1" boards.  It could be possible that you are using an 
mlib_devel that is targeting a "rev 1" board, but using the resultant BOF file 
on a "rev 2" board.

2. Try to compile your design for a slower clock setting.  For example, maybe 2 
or 3 Gsps instead of 5 Gsps.  While this may not meet your application's needs, 
it could be a useful diagnostic.  Different sample clock rates would result in 
different MMCM parameters.  The MMCM is quite complex and sometimes the clock 
rates used result in internal MMCM parameters that are suboptimal.  Although 
not directly relevant to the ADC5G, the following ADC16 write-up expands on 
this idea:

https://casper.berkeley.edu/wiki/ADC16x250-8#ADC16_Sample_Rate_vs_Virtex-6_MMCM_Limitations

Caveat: I am not very familiar with the ADC5G's MMCM configuration, so this may 
not be an issue at all.

3. Swap the ADCs between the two ZDOK connectors and see whether the same 
bit(s) from the same core(s) fail(s) in the same way(s).  If so, then the 
problem is probably on the ROACH2 side of things.

4. Find another ROACH2 with ADC5G cards on it and try your BOF file there.  If 
it fails the same way then it is most likely a gateware problem and not a 
hardware problem.

HTH,
Dave




Re: [casper] VHDL black-boxing: basic issue

2015-08-10 Thread David MacMahon
Thanks for the update, Michael!  I'm glad you got it resolved.

Dave

On Aug 10, 2015, at 5:47 AM, Michael D'Cruze wrote:

 Hi guys,
  
 Just to say that Xilinx came back with a pretty standard response, evidently 
 not reading through my message thoroughly enough. After explaining a second 
 time, the only suggestion they could come back with was to reinstall 
 everything (Matlab, Xilinx ISE etc.) which I’ve now done, and the black 
 boxing seems to be working.
  
 If anyone comes across this in future….”brute force” reinstallation worked 
 for me. Though I still have no idea what caused the problem in the first 
 place.
  
 Thanks
 Michael
  
 From: Homin Jiang [mailto:ho...@asiaa.sinica.edu.tw] 
 Sent: 10 August 2015 04:24
 To: Michael D'Cruze
 Subject: [casper] VHDL black-boxing: basic issue
  
 Hi Michael:
  
 Which version of toolflow you are using ?
 If V14.7, following is my own solution, hope that help:
   • Or right click the black box on the library icon, then select the 
 VHD file, for example, amiba2_v147/fft_1k_core.vhd. This vhd file has the 
 same name as the one under amiba2_v147/fft_1k_core. Don't get mess up.
   • copy the *.m file from the subdirectories to upper level. i.e. copy 
 fgain_core_config.m from /amiba2_v147/fgain_core to amiba2_v147. Because the 
 black box only search the config file in the same directory.
   • Check the  this_block.addFile('vhd') in the config.m file, it has 
 to be correct vhd filename, for example: cdelay_core.vhd, not just vhd. 
 homin jiang
  
  
  
 Message: 4
 Date: Fri, 7 Aug 2015 14:50:41 +
 From: Michael D'Cruze michael.dcr...@postgrad.manchester.ac.uk
 Subject: [casper] VHDL black-boxing: basic issue
 To: 'casper@lists.berkeley.edu' casper@lists.berkeley.edu
 Message-ID:
 
 am2pr01mb0385dc4804c6367dcdff39828a...@am2pr01mb0385.eurprd01.prod.exchangelabs.com
  
 Content-Type: text/plain; charset=us-ascii
  
 Hi Casper,
  
 I'm trying to black box some of my larger blocks (PFB, FFT), initially 
 following tutorial 6 and the memo on the wiki. Both say that, as soon as the 
 black box is dropped into the model, it should throw up a dialog box allowing 
 me to link to the pre-compiled code. However, this dialog doesn't come up. I 
 can't see any other instances of this on the mail archive... does anyone have 
 any ideas how to solve this, or get the block to manually link to the .vhd 
 file? I can't see anything obvious from within Simulink. This seems a really 
 silly problem!
  
 Thanks
 Michael
 -- next part --
 An HTML attachment scrubbed and removed.
 HTML attachments are only available in MIME digests.
  
 End of casper Digest, Vol 93, Issue 7
 *




Re: [casper] VHDL black-boxing: basic issue

2015-08-09 Thread David MacMahon
Hi, Michael,

That sounds very strange.  Did you drag-and-drop the black box block into the 
model from the library or did you right-click it in the library and pick Add 
to model?  Are you sure that the dialog box didn't pop up behind other windows 
(sometimes happens on Linux with X11)?  Does your model already have a System 
Generator block in it?  Have you tried exiting Matlab and restarting?

Please post if/when you solve it so others will be able to find the solution in 
the archive!

Hope this helps,
Dave

On Aug 7, 2015, at 7:50 AM, Michael D'Cruze wrote:

 Hi Casper,
  
 I’m trying to black box some of my larger blocks (PFB, FFT), initially 
 following tutorial 6 and the memo on the wiki. Both say that, as soon as the 
 black box is dropped into the model, it should throw up a dialog box allowing 
 me to link to the pre-compiled code. However, this dialog doesn’t come up. I 
 can’t see any other instances of this on the mail archive… does anyone have 
 any ideas how to solve this, or get the block to manually link to the .vhd 
 file? I can’t see anything obvious from within Simulink. This seems a really 
 silly problem!
  
 Thanks
 Michael




Re: [casper] Roach2 aux clocking and Bram's

2015-08-04 Thread David MacMahon
Hi, Vereese,

That's a very curious failure mode!  It's very interesting that everything 
worked fine when clocking via iADC, but not when clocking via your mezzanine 
board (or aux_clk) even though the fabric clock rate was the same.  I can think 
of two possible theories for what's going on (both alluded to in your email).

The first theory is that the problem is somehow related to the address mapping. 
 Maybe under certain circumstances the memory map works out such that bram7 is 
accessed via an extra opb_to_opb bridge (or some other slight variation) 
compared to the other brams?  It would be interesting to compare the system.mhs 
and core_info.tab files for your various test builds to see what's different 
between working builds and non-working builds.  What if you add additional 
unrelated BRAMs such that they appear earlier in the address map (maybe give 
them a lexicographically early name like aaa_bram0) or maybe enlarge the 
existing BRAMs?  That might push bram6 (and others?) into the hypothetically 
troublesome address range.  Maybe clocking from the iADC changed the address 
map such that this hypothesized problem was avoided.
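
(A sketch, assuming the two builds are kept side by side in directories named something like working/ and broken/ and the generated files are where the toolflow normally leaves them:

  $ diff working/XPS_ROACH2_base/system.mhs    broken/XPS_ROACH2_base/system.mhs
  $ diff working/XPS_ROACH2_base/core_info.tab broken/XPS_ROACH2_base/core_info.tab

would show whether the bram addresses or bus topology changed between the two.)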

The other theory is that some sort of MMCM (mis-)configuration is causing the 
MMCM to behave in a weird way.  I don't know what kind of problem that would 
be.  I've only seen issues when trying to use the ISERDES resources at fairly 
high bit rates, but the ~155 MHz clock should be fine.  I think the MMCM config 
options are shown in the system_map.mrp file, so maybe comparing working vs 
non-working versions of those files would be illuminating.

When you tried to run aux_clk at faster than 143 MHz, what values did you use 
for MULTIPLY and DIVIDE?  It looks like the defaults would result in the same 
settings as the hardcoded values (with the exception of CLKOUT5) .  I think the 
MULTIPLY and DIVIDE parameters are intended to be used for sys_clk, so I'm not 
sure how they get set (or passed on) for aux_clk.

Hope this helps,
Dave

On Aug 4, 2015, at 12:13 PM, Vereese Van Tonder wrote:

 Hi Everyone,
 
 I'm clocking a ROACH2 design from a mezzanine board at 155.52MHz. In the 
 design I have 8 bram's each with a 16 bit address width and data width. I'm 
 writing the bram's address counter value into the bram, (so at the first 
 address write 0, at the second address write 1, ... at the last address write 
 2^16-1). The output bram's 0-6 shows the expected linear relationship however 
 bram7 does the following (also see attached .png's):
 
 Address 0-16387: expected linear relationship with gradient = 1
 Address 16388 - 20476 (diff=4088): the output toggles between the expected 
 behavior and the following set 
 (4,12,20,...,252,4,12,20,...,252,..4,12,,252)
 Address 20477 - 36867: expected linear relationship with gradient = 1
 Address 36868 - 40956 (diff=4088): the output toggles between the expected 
 behavior and the following set 
 (4,12,20,...,252,4,12,20,...,252,..4,12,,252)
 
 The design works in simulation and when I clock the same design from the on 
 board 100MHz system clock I don't get the problem anymore. I also saw that 
 when I include only 7 brams and occupy the memory space 102-0103 for 
 bram0 . 0108-0109 for bram6 then I don't get the error but when I 
 do include the 8th bram which occupies the space 010E-010F then I get 
 the error. I'm unsure whether this is a memory problem or a clocking speed 
 error. I also clocked the design off an iadc running at 4*155.52=622.08 MHz 
 and then the design worked.
 
 Then I tried compiling a design that clocks from the aux_clk input and the 
 design fails because the FVCO is out of range, I found the input limit to be 
 143MHz. From an earlier post I saw that Dave said the limit is between 
 100MHz-200MHz as there are some hard-coded parameters in the MMCM of the 
 roach_infrastructure_v1_00_a pcore. I modified the MMCM_BASE_aux_clk 
 instantiation to take the parameter values instead of having the hard coded 
 values as follows:
 
 .CLKFBOUT_MULT_F  (6), -> .CLKFBOUT_MULT_F  (MULTIPLY)
 .CLKOUT1_DIVIDE (6), -> .CLKOUT1_DIVIDE (DIVIDE), //THIS IS THE DIVISOR
 .CLKOUT2_DIVIDE (6), -> .CLKOUT2_DIVIDE (DIVIDE),
 .CLKOUT3_DIVIDE (6), -> .CLKOUT3_DIVIDE (DIVIDE),
 .CLKOUT4_DIVIDE (6), -> .CLKOUT4_DIVIDE (DIVIDE),
 .CLKOUT5_DIVIDE (6), -> .CLKOUT5_DIVIDE (DIVIDE/2),
 .CLKOUT6_DIVIDE (6), -> .CLKOUT6_DIVIDE (MULTIPLY/DIVCLK),
 .DIVCLK_DIVIDE(1), -> .DIVCLK_DIVIDE(DIVCLK),
 
 but it didn't solve the problem. Has anyone clocked a ROACH2 board from the 
 aux_clk input at a  frequency higher than 143MHz?
 
 Thanks,
 Vereese
 all_brams.pngbram7.pngslx.png




Re: [casper] Roach-2 crashing fix

2015-07-28 Thread David MacMahon
Hi, Marc,

On Jul 28, 2015, at 1:34 AM, Marc Welz wrote:

 So I confess to relying on third parties for this information, but isn't the 
 board populated with 1Gb RAM after all ? 

When U-Boot starts up it reports that the system has 512 KB of memory.  I 
assume (uh-oh!) that uboot is detecting that size dynamically at run time.  Is 
it possible that later production runs of the ROACH2 were populated with larger 
capacity memory chips?

 Would the crash be trigged by a kernel memory layout of 3Gb+1Gb rather than  
 2Gb+2Gb ?

I don't understand this question.  Can you please clarify?  I don't think it's 
a layout issue, but rather a size issue.

 Have you tried the kernel from 9 months ago at github ska-sa/roach2_nfs_uboot 
 ? 

I'll have to double check the kernel version that we used.

Thanks,
Dave




Re: [casper] Call for awesome commits

2015-05-29 Thread David MacMahon
Thanks, Jack!!!  That all sounds awesome!!!

Dave

On May 29, 2015, at 5:27 PM, Jack Hickish wrote:

 Howdy,
 
 I've just merged a bunch of stuff into the casper-astro repository. I haven't 
 yet merged it into master, but it's the casper-astro-soak-test -- 
 https://github.com/casper-astro/mlib_devel/tree/casper-astro-soak-test
 
 I'm going to run a few compiles against it next week to check all is well.
 
 Highlights, and major changes:
 
 - Merged sma-wideband
 - Merged github
 - Added/updated x64 adc and mkid adc/dac for ROACH2
 - Added UCF yellow block
 - Updated QDR interface
 - Added software in mlib_devel/xps_support_sw for QDR calibration
 - Removed spaces from yellow block names (This will almost certainly cause 
 your MSSGE block to become unlinked. update_casper_blocks(bdroot) should fix 
 this. And is probably a good idea if you're updating your libraries anyway)
 
 I've tried the QDR calibration between 120MHz and 325 MHz with good results. 
 Any feedback bug reports welcome.
 
 Cheers,
 Jack
 
 
 On Wed, 27 May 2015 at 21:35 Jack Hickish jackhick...@gmail.com wrote:
 Hi All,
 
 I believe I have a version of the QDR block / software that works at
 every conceivable clock frequency anyone could want. Tomorrow
 (Berkeley time), I'm going to merge this into the main casper-astro
 github repository. This seems like as good a time as any to ask: does
 anyone have any bugfixes/features/new blocks they would like to add to
 the main repository? So far I've got the following
 
 -- everything in ska-sa as of now. (I based my fix off ska-sa:master)
 -- tweaked qdr interface
 -- remove spaces from xps blocks (because that now throws errors)
 -- add a ucf yellow block to add custom constraints to models
 
 If you have suggestions/requests, feel free to raise github pull
 requests or email me repo addresses / commit hashes / patches / other
 useful info. If you want to contribute something but aren't sure
 exactly how best to do this, just drop me an email and we'll figure
 something out!
 
 Cheers,
 Jack




Re: [casper] Roach1 Host name lookup error.

2015-05-27 Thread David MacMahon
Hi, Brad,

On May 27, 2015, at 9:37 AM, Brad Dober wrote:

 I have no issues with the Roach2 that is also connected to this host computer.

No issues meaning that you can tftp the uImage file from the server to the 
roach2?  Does tcpdump/wireshark show any clues vis a vis the ROACH1's attempted 
tftp transfer?

 I switched the Roach1's and Roach'2 ethernet cables and still have the same 
 problem on the Roach1 and Roach2 is still working fine.

Did this also swap the switch ports?

 Which pins / voltages should I be checking on the power supply?

I'd check all of them.  I think it's a standard ATX power supply, but the wiki 
might have more details.

Maybe try reseating (or replacing) the ROACH1 PPC's DIMM module?  I guess it 
was working OK with soloboot/usbboot until the root filesystem got corrupted, 
so maybe this isn't really relevant?

 U-Boot 2008.10-svn3231 (Jul 15 2010 - 14:58:38)

I don't know of any specific problems, but you might want to consider updating 
uboot...

https://casper.berkeley.edu/wiki/ROACH_kernel_uboot_update

https://casper.berkeley.edu/wiki/LatestVersions

Dave




Re: [casper] Roach1 Host name lookup error.

2015-05-27 Thread David MacMahon
Hi, Brad,

On May 27, 2015, at 1:15 PM, Brad Dober wrote:

 Kernel command line: console=ttyS0,115200 
 mtdparts=physmap-flash.0:1792k(linux),256k@0x1c(fdt),8192k@0x20(root),54656k@0xa0(usr),256k@0x3f6(env),384k@0x3fa(uboot)fdt_addr=0xfc1c
  root=192.168.40.1:/srv/roach_boot/etch ip=dhcp

I think you want root=/dev/nfs rootpath=192.168.40.1:/srv/roach_boot/etch 
instead of root=192.168.40.1:/srv/roach_boot/etch.

Interrupt the u-boot startup and run printenv to see how this line gets 
constructed, then fix it, then run saveenv and reboot.
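
At the u-boot prompt that would be something along these lines (a sketch only; the exact variable names differ between installs, so check what printenv actually shows before changing anything):

  => printenv
  => setenv bootargs console=ttyS0,115200 root=/dev/nfs rootpath=192.168.40.1:/srv/roach_boot/etch ip=dhcp
  => saveenv
  => reset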

The fact that this roach1 gets this far indicates that something is bad with 
the other one.

Dave




Re: [casper] Roach1 Host name lookup error.

2015-05-26 Thread David MacMahon
Nothing obvious comes to mind (yet).  Can you watch the roach1 boot process via 
serial console?  What does that show?  Are you using dnsmasq for the DHCP 
server?  What if you try direct connect with mii-tool to set the speed of eth1 
to 100 Mbps?

Dave

On May 26, 2015, at 5:00 PM, Brad Dober wrote:

 Hi Dave,
 
 Here is the configuration of the network. The host computer, a ROACH1 and a 
 working ROACH2 running in soloboot are the only ones connected. The host is 
 192.168.40.1 and is offering 192.168.100.50 and the ROACH2 is assigned to 
 192.168.40.50. Is there anything weird about how the ROACH1 handles larger 
 subnets like below? Or maybe infinite address leases?
 
 eth1  Link encap:Ethernet  HWaddr 00:08:54:54:d3:f5  
   inet addr:192.168.40.1  Bcast:192.168.255.255  Mask:255.255.0.0
   inet6 addr: fe80::208:54ff:fe54:d3f5/64 Scope:Link
   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
   RX packets:705 errors:0 dropped:0 overruns:0 frame:0
   TX packets:1377 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000 
   RX bytes:210574 (210.5 KB)  TX bytes:302407 (302.4 KB)
   Interrupt:20 Base address:0x6000
 
 Here is the tcpdump:
 
 tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 
 bytes
 19:46:49.968388 02:00:00:03:01:91 > ff:ff:ff:ff:ff:ff, ethertype IPv4 
 (0x0800), length 343: (tos 0x0, ttl 255, id 225, offset 0, flags [DF], proto 
 UDP (17), length 329)
 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 
 02:00:00:03:01:91, length 301, xid 0x144252, secs 1130, Flags [none]
 Client-Ethernet-Address 02:00:00:03:01:91
 Vendor-rfc1048 Extensions
   Magic Cookie 0x63825363
   DHCP-Message Option 53, length 1: Discover
   MSZ Option 57, length 2: 576
   Parameter-Request Option 55, length 5: 
 Subnet-Mask, Default-Gateway, Hostname, BS
 RP
 19:46:49.968953 00:08:54:54:d3:f5 > 02:00:00:03:01:91, ethertype IPv4 
 (0x0800), length 349: (tos 0x0, ttl 64, id 33388, offset 0, flags [none], 
 proto UDP (17), length 335)
 192.168.40.1.67 > 192.168.100.50.68: BOOTP/DHCP, Reply, length 307, xid 
 0x144252, secs 1130, Flags [none]
 Your-IP 192.168.100.50
 Server-IP 192.168.40.1
 Client-Ethernet-Address 02:00:00:03:01:91
 file uImage
 Vendor-rfc1048 Extensions
   Magic Cookie 0x63825363
   DHCP-Message Option 53, length 1: Offer
   Server-ID Option 54, length 4: 192.168.40.1
   Lease-Time Option 51, length 4: 4294967295
   Subnet-Mask Option 1, length 4: 255.255.0.0
   Hostname Option 12, length 8: roach1-4
   RP Option 17, length 33: 192.168.40.1:/srv/roach_boot/etch
 19:46:52.971116 02:00:00:03:01:91 > ff:ff:ff:ff:ff:ff, ethertype IPv4 
 (0x0800), length 343: (tos 0x0, ttl 255, id 226, offset 0, flags [DF], proto 
 UDP (17), length 329)
 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 
 02:00:00:03:01:91, length 301, xid 0x144e0d, secs 1133, Flags [none]
 Client-Ethernet-Address 02:00:00:03:01:91
 Vendor-rfc1048 Extensions
   Magic Cookie 0x63825363
   DHCP-Message Option 53, length 1: Discover
   MSZ Option 57, length 2: 576
   Parameter-Request Option 55, length 5: 
 Subnet-Mask, Default-Gateway, Hostname, BS
 RP
 19:46:52.971744 00:08:54:54:d3:f5 > 02:00:00:03:01:91, ethertype IPv4 
 (0x0800), length 349: (tos 0x0, ttl 64, id 34083, offset 0, flags [none], 
 proto UDP (17), length 335)
 192.168.40.1.67 > 192.168.100.50.68: BOOTP/DHCP, Reply, length 307, xid 
 0x144e0d, secs 1133, Flags [none]
 Your-IP 192.168.100.50
 Server-IP 192.168.40.1
 Client-Ethernet-Address 02:00:00:03:01:91
 file uImage
 Vendor-rfc1048 Extensions
   Magic Cookie 0x63825363
   DHCP-Message Option 53, length 1: Offer
   Server-ID Option 54, length 4: 192.168.40.1
   Lease-Time Option 51, length 4: 4294967295
   Subnet-Mask Option 1, length 4: 255.255.0.0
   Hostname Option 12, length 8: roach1-4
   RP Option 17, length 33: 192.168.40.1:/srv/roach_boot/etch
 
 
 Brad Dober
 Ph.D. Candidate
 Department of Physics and Astronomy
 University of Pennsylvania
 Cell: 262-949-4668
 
 On Tue, May 26, 2015 at 7:42 PM, David MacMahon dav...@astro.berkeley.edu 
 wrote:
 Weird.  Are there any other hosts on the network that might be also sending 
 (non-netboot-aware) DHCP offers?
 
 What does sudo tcpdump -i eth0 -n -e -v port bootps or port bootpc show 
 (replacing eth0 with the actual network interface name where the DHCP 
 activity is).
 
 Dave
 
 On May 26, 2015, at 4:34 PM, Brad Dober wrote:
 
  Hi Dave,
 
  I switched to a 100 Mbps switch, and now I'm still getting the ROACH1 
  continuously sending DCHP  discovers, and my host computer continuously 
  sending offers

Re: [casper] Roach1 Host name lookup error.

2015-05-26 Thread David MacMahon
Are you trying to run the ROACH1 on 1 GbE?  ROACH1 is not reliable on 1 GbE.  
You have to force it to be 100 Mbps.  This can be done by using an unmanaged 
non-gigabit switch (or hub) or a managed switch that can force its port for the 
ROACH1 to be 100 Mbps only.  For direct connect, you'll have to use mii-tool 
on the server.

Another thing that always gets me is MTU.  I don't think the ROACH1 u-boot 
supports jumbo frames, so you'll have to run the server with MTU==1500 to 
netboot ROACH1.
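
For a direct connection that would look something like this on the server (the interface name is just an example):

  $ sudo mii-tool -F 100baseTx-FD eth1   # force the port facing the ROACH1 to 100 Mbps full duplex
  $ sudo ifconfig eth1 mtu 1500          # make sure jumbo frames are off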

Dave

On May 26, 2015, at 3:51 PM, Brad Dober wrote:

 I've switched to NFS boot to avoid SD card corruptions. 
 
 However, when attempting to run netboot, the roach will send an IP discover, 
 the host will offer one, and then the roach will send a discover again.
 This goes on for 10-15 times when finally the roach will request the correct 
 IP, and the host will acknowledge. The roach will then begin the tftp of the 
 uboot image, but will request block 1 multiple times, gets sent it, 
 acknowledges once starts getting block 1 and 2 sent and then restarts the 
 whole process asking for an IP request.
 
 The whole process seems very strange and I'm having trouble wrapping my head 
 around what could be causing it.
 
 Has anyone encountered something similar???
 
 
 Brad Dober
 Ph.D. Candidate
 Department of Physics and Astronomy
 University of Pennsylvania
 Cell: 262-949-4668
 
 On Thu, May 21, 2015 at 3:32 AM, Marc Welz m...@ska.ac.za wrote:
 
 
 On Wed, May 20, 2015 at 5:19 PM, Brad Dober do...@sas.upenn.edu wrote:
 Hi Casperites,
 
 I have a Roach1 which is booting from an SD card.
 I booted it up yesterday, and it was displaying the STALE NFS handle error 
 that other people have seen in the past (which suggested a corrupt flash 
 card). I ran fsck and fixed several errors, and when rebooting, the stale nfs 
 handle error went away. However, now the ROACH could not connect to the 
 network. 
 
 When I run ifconfig 128.91.46.20 netmask 255.255.248.0 gateway 128.91.4, I 
 get:
 gateway: Host name lookup failure
 
 root@(none):~# hostname -v
 (none)
 
 If you require a hostname, put it in /etc/hostname or similar and then run
 
 hostname -f /etc/hostname
 
 regards
 
 marc
 
 
 




Re: [casper] Roach1 Host name lookup error.

2015-05-26 Thread David MacMahon
Weird.  Are there any other hosts on the network that might be also sending 
(non-netboot-aware) DHCP offers?

What does sudo tcpdump -i eth0 -n -e -v port bootps or port bootpc show 
(replacing eth0 with the actual network interface name where the DHCP activity 
is).

Dave

On May 26, 2015, at 4:34 PM, Brad Dober wrote:

 Hi Dave,
 
 I switched to a 100 Mbps switch, and now I'm still getting the ROACH1 
 continuously sending DCHP  discovers, and my host computer continuously 
 sending offers, but now the occasional request/acknowledge and uboot download 
 is no longer happening.
 
 For what it's worth, I am not using jumbo frames.
 
 
 Brad Dober
 Ph.D. Candidate
 Department of Physics and Astronomy
 University of Pennsylvania
 Cell: 262-949-4668
 
 On Tue, May 26, 2015 at 7:21 PM, David MacMahon dav...@astro.berkeley.edu 
 wrote:
 Are you trying to run the ROACH1 on 1 GbE?  ROACH1 is not reliable on 1 GbE.  
 You have to force it to be 100 Mbps.  This can be done by using an unmanaged 
 non-gigabit switch (or hub) or a managed switch that can force its port for the 
 ROACH1 to be 100 Mbps only.  For direct connect, you'll have to use 
 mii-tool on the server.
 
 Another thing that always gets me is MTU.  I don't think the ROACH1 u-boot 
 supports jumbo frames, so you'll have to run the server with MTU==1500 to 
 netboot ROACH1.
 
 Dave
 
 On May 26, 2015, at 3:51 PM, Brad Dober wrote:
 
  I've switched to NFS boot to avoid SD card corruptions.
 
  However, when attempting to run netboot, the roach will send an IP 
  discover, the host will offer one, and then the roach will send a discover 
  again.
  This goes on for 10-15 times when finally the roach will request the 
  correct IP, and the host will acknowledge. The roach will then begin the 
  tftp of the uboot image, but will request block 1 multiple times, gets sent 
  it, acknowledges once starts getting block 1 and 2 sent and then restarts 
  the whole process asking for an IP request.
 
  The whole process seems very strange and I'm having trouble wrapping my 
  head around what could be causing it.
 
  Has anyone encountered something similar???
 
 
  Brad Dober
  Ph.D. Candidate
  Department of Physics and Astronomy
  University of Pennsylvania
  Cell: 262-949-4668
 
  On Thu, May 21, 2015 at 3:32 AM, Marc Welz m...@ska.ac.za wrote:
 
 
  On Wed, May 20, 2015 at 5:19 PM, Brad Dober do...@sas.upenn.edu wrote:
  Hi Casperites,
 
  I have a Roach1 which is booting from an SD card.
  I booted it up yesterday, and it was displaying the STALE NFS handle 
  error that other people have seen in the past (which suggested a corrupt 
  flash card). I ran fsck and fixed several errors, and when rebooting, the 
  stale nfs handle error went away. However, now the ROACH could not connect 
  to the network.
 
  When I run ifconfig 128.91.46.20 netmask 255.255.248.0 gateway 128.91.4, 
  I get:
  gateway: Host name lookup failure
 
  root@(none):~# hostname -v
  (none)
 
  If you require a hostname, put it in /etc/hostname or similar and then run
 
  hostname -f /etc/hostname
 
  regards
 
  marc
 
 
 
 
 




Re: [casper] Select IO DDR Puzzle

2015-05-18 Thread David MacMahon
Hi, Rich,

What are the frequencies of the various clocks?  I think you want the OSERDES 
CLKDIV to be the same freq as the parallel data's clock.  For 6 bit parallel 
data, you want the freq of the OSERDES CLK to be 6*freq(CLKDIV) in SDR mode and 
3*freq(CLKDIV) in DDR mode.  I think the same is true for the ISERDES.  I'm not 
sure I'm discerning things correctly from the VHDL, but it looks like maybe you 
are running CLKDIV at 87.5 MHz instead of 125 MHz?
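
Just to make the arithmetic concrete (assuming I have the ratios right): with 6-bit parallel data and a 125 MHz CLKDIV, CLK would need to be 6*125 = 750 MHz in SDR mode or 3*125 = 375 MHz in DDR mode, whereas an 87.5 MHz CLKDIV would instead imply 525 MHz or 262.5 MHz respectively.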

Dave




Re: [casper] Roach1 not working

2015-05-08 Thread David MacMahon
Hi, Nishanth,

On May 7, 2015, at 8:23 PM, Nishanth Shivashankaran wrote:

 ENET Speed is 1000 Mbps - FULL duplex connection (EMAC0)

I don't know whether this is causing your problem, but I believe the ROACH1 
ethernet port is not reliable at 1 Gbps.  I think you somehow need to limit it 
to 100 Mbps (e.g. use a switch/hub that only supports 100 Mbps or use the 
mii-tool program on the server for direct connections).

HTH,
Dave




Re: [casper] Skewed data samples

2015-04-27 Thread David MacMahon
Hi, Tom,

Have you calculated the skewness for some largish number of samples or are you 
just going by the appearance of the histogram?  If the latter, are you sure 
that the apparent skewness is not due to artifacts from the histogram bin 
limits vs discrete sample values?

If you swap ADCs, does the same input signal show the same skewness?

Just some ideas,
Dave

On Apr 24, 2015, at 5:38 PM, Kuiper, Thomas (3266) wrote:

 Thanks, Dan.  Yes, we're using KAT ADCs.  I'm not worried about a DC offset 
 and I know about the slight ADC bias.  It's the skewness I'm wondering about. 
  It's just barely detectable by eye in a histogram.
 
 Tom
 
 From: dan.werthi...@gmail.com [dan.werthi...@gmail.com] on behalf of Dan 
 Werthimer [d...@ssl.berkeley.edu]
 Sent: Friday, April 24, 2015 5:34 PM
 To: Kuiper, Thomas (3266)
 Cc: G Jones; Casper Lists
 Subject: Re: [casper] Skewed data samples
 
 hi tom,
 
 if you are using casper adcs:
 
 all the casper adc boards are AC coupled
 (they have baluns and coupling capacitors),
 so  even if your input signal has a DC offset, it won't couple
 into the ADC.   however, there are slight DC offsets in the ADC,
 so there will be a small spike in the DC bin, but probably
 not from the signal your are injecting.
 
 best wishes,
 
 dan
 




[casper] New CASPER toolflow features for planAhead

2015-04-23 Thread David MacMahon
I pushed a few changes to the casper-astro mlib_devel repository to make life 
easier when working with Pblocks and planAhead on CASPER designs.

## casper_create_ppr.sh

The casper_create_ppr.sh shell script has been added to the ROACH2 base 
package (XPS_ROACH2_base).  After completing a build (typically one that failed 
to meet timing) you can cd into the design's XPS_ROACH2_base directory and run 
the casper_create_ppr.sh script to create and populate a planAhead project 
(i.e. a .ppr file) that can be used to explore the results of the build and 
then perhaps define Pblocks.  See the log message for more details:

https://github.com/casper-astro/mlib_devel/commit/a949c9d

* UPDATE: The log's example invocation of planAhead is not quite right.  It 
shows "planAhead ../planahead/foo-g1234567.ppr", but it should be "planAhead 
./planahead/foo-g1234567.ppr" (i.e. it should be "./" not "../").  The name of 
the actual ppr file will depend on your design name and its git status.
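
A typical session would look something like this (the design name and git hash below are placeholders):

  $ cd /path/to/my_design/XPS_ROACH2_base
  $ ./casper_create_ppr.sh
  $ planAhead ./planahead/my_design-g1234567.ppr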

## Auto-include of user-provided UCF snippets

With the new changes, the toolflow now provides the ability to automatically 
include UCF snippets, such as one might use to define Pblocks and assign 
various components to them.  Once you have created a UCF snippet that defines 
Pblocks and assigns components to them, all you need to do is store that UCF 
snippet in the <model_name>/ucf subdirectory.  For example, if your model 
file is named .../foo.mdl, then you would place your UCF snippet(s) in 
.../foo/ucf/.  When the toolflow next creates the overall system.ucf file 
(e.g. on the next build), it will automatically include these UCF snippet(s) in 
the overall system.ucf file.  This provides a convenient way to apply 
previously defined Pblocks to future builds as well as to version control the 
user-defined UCF snippet(s).  The location of the UCF snippets can be 
overridden by environment variables (if desired).  For more details, see the 
commit log message:

https://github.com/casper-astro/mlib_devel/commit/057d65b
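
As a concrete (and entirely hypothetical) example of such a snippet -- the instance names and slice range below are placeholders, not from any real design:

  # .../foo/ucf/pblocks.ucf
  INST "foo_x0/fft_core/*" AREA_GROUP = "pblock_fft";
  AREA_GROUP "pblock_fft" RANGE = SLICE_X0Y0:SLICE_X49Y99;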

Thanks go to Rurik Primiani for the original 
environment-variable-specified-auto-include feature upon which this 
auto-include feature is based!  Note that this new auto-include feature is 100% 
backwards compatible with Rurik's, so if you are using that already, this new 
one will not break anything for you.

Enjoy,
Dave

P.S. AFAICT, these changes have not yet been pushed to the ska-sa mlib_devel 
repository.  If you cloned your mlib_devel from there then you will have to 
wait until these changes get there.




[casper] QDR simulations

2015-03-31 Thread David MacMahon
Does anyone know whether simulations involving the QDR yellow block are 
supposed to work?

They don't seem to be working for me.  Maybe I'm doing something wrong?

Thanks,
Dave




Re: [casper] Operands to the || and operators must be convertible to logical scalar values

2015-03-09 Thread David MacMahon
Hi, Charles,

Is it when you click OK on a block's mask dialog or when you run update 
diagram or ???

Maybe you can find additional details in the model_sysgen.log or 
model_sysgen_warning.log or model_sysgen_error.log files?  It would 
really help to know which block or file is causing this error.

Thanks,
Dave


On Mar 9, 2015, at 6:24 AM, Charles Copley wrote:

 Hi all,
 
 I have a strange error that arises after a small change to a design:
 
 I have used the sync pulse to reset a counter, that in turn (together with a 
 comparator coupled to a register) produces a higher frequency pulse for use 
 in snap blocks i.e. to write the snap block data more frequently than simply 
 using the sync pulse for synchronizing the data to the FFT windows.  
 
 As soon as I do this I get the error in the subject line:
 Operands to the || and  operators must be convertible to logical scalar 
 values
 
 This does not happen if I use the sync pulse directly, without the counter.
 
 All the data types are identical after a port update.
 
 Does anyone have this problem with the 2014 July Casper toolflow?
 
 Thanks in advance...
 
 
 Charles Copley
 
 Cell: +27 (0) 84 430 1160
  ---
  




Re: [casper] unable to run tcpborphserver2 from the command line

2015-02-25 Thread David MacMahon
Hi, Paul,

Probably you are, but just to verify, are you running tcpborphserver2 from the 
command line as root?

It sounds like something changes between boot time and command line time.  Are 
you sure it's the same tcpborphserver2 executable that is getting executed?  
Are you running additional processes post-boot that might be 
locking/hogging/clobbering some resource(s) that tcpborphserver2 needs?  Have 
you tried running tcpborphserver2 under strace to see if that helps pinpoint 
what is failing?
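
(Something like

  $ strace -f -o /tmp/tcpborphserver2.strace tcpborphserver2

run the same way you normally start it from the command line -- the output file name is arbitrary -- and then look near the end of the trace for the failing system call.)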

Just some ideas,
Dave

On Feb 25, 2015, at 3:16 AM, Paul Marganian wrote:

 Thanks Marc,
 No, the *only* difference between being able to program the fpga with 
 'progdev' is whether or not I'm running tcpborphserver2 from the 
 command-line, or whether it is running from when the board booted up.  It 
 makes no difference what bof I use.
 Paul
 
 On 02/25/2015 02:46 AM, Marc Welz wrote:
 
 
 
 ?progdev mba15_obs2d_2014_Jan_31_1052.bof
 
 
 So this is the same bof file which sometimes works and sometimes does not ? 
 Or is this subsequent to some upgrade of the roach ? If it is a sometimes 
 work/not work issue maybe
 you are not marking it executable (chmod +x) on transfer or the transfer is 
 garbled.
 
 If it is subsequent to an upgrade: Note that at some point we moved to a 
 different kernel+driver+tcpborphserver  combination, which uses a mmap 
 interface - if you change to that you will have to update both kernel and 
 tcpborphserver, but for roach1 this is a backport, so probably something for 
 experts ...
 
 regards
 
 marc
 
 
 




Re: [casper] Question About The ADC Clock Frequency

2015-01-12 Thread David MacMahon
Hi, Peter,

On Jan 12, 2015, at 1:11 AM, Peter Niu wrote:

 In our model, We need ADC clock frequency up to 250Mhz.  Our ADC boards are 
 ADC16*250-8.We are using adc16*250-8 yellow block in our model modified based 
 PAPER model .However when I  changed the XSG core config/User IP Clock 
 Rate(MHz) to 250 Mhz and System Generator/FPGA Clock Period(ns) to 4ns,   it 
 could not create bof file,something like the following:
 
 ERROR:LIT:667 - Block 'MMCM_ADV symbol [...] has its target frequency, FVCO, 
 out of range.

I'm surprised that you did not get a DRC (design rule check) error earlier in 
the build process.  The ADC chips on the ADC16 board can sample as high as 250 
Msps when the ADC16 is running in 16 input mode, but MMCM limitations prevent 
using some sample rate ranges.  In short, the ADC16 board is limited to a 
maximum of 240 Msps in 16 input mode.  For details, see this new section of the 
CASPER wiki:

https://casper.berkeley.edu/wiki/ADC16x250-8#ADC16_Sample_Rate_vs_Virtex-6_MMCM_Limitations

 Now, the system works at a 250 MHz clock rate while the model bof file was 
 created at 200 MHz. It looks like there is no problem in sending the correct data packets, 
 but I am not sure whether it runs normally. In theory, the input data rate is 
 250 MHz * 8 bits * 32 = 64 Gbits/s; after the FFT and EQ, the data rate becomes 32 Gbits/s. We 
 have 4 10GbE ports to send out data, so each port will have 32/4 = 8 Gbits/s (if we 
 use 200 MHz, this data rate is about 6.4 Gbits/s). I don't know whether this is OK 
 for the transmission capability of the 10GbE NICs (10 Gbits/s). 
 Could anyone help me please?

It may seem like it works at 250 MHz, but you are asking for problems if you 
clock the FPGA faster than the design was built for.  You could rebuild the 
design for 240 Msps and it should work OK (assuming the build can meet timing 
at 240 MHz).

Hope this helps,
Dave




Re: [casper] Compiler merging SRLs -- Timing performance

2014-12-04 Thread David MacMahon
Hi, Jack,

Are the tools are optimizing for area instead of speed?  Are you using Pblocks?

I don't know if this is relevant to your situation, but I've run into 
annoyances when the tools use equivalent register removal to save a few 
flip-flops but end up causing fan-out/routing issues.  That can be turned off, 
but it's a synthesis option so if you want to apply it to a System Generator 
netlist, you have to use the resynth_netlist Matlab function from the casper 
library to re-synthesize the entire netlist.

Dave

On Dec 4, 2014, at 10:48 AM, Jack Hickish wrote:

 Hi all,
 
 This is something I've been fighting with for a while now, and I wonder if 
 anyone on this maillist has any insight (because I'm pretty sure I may just 
 be doing something wrong with the tools).
 
 The problem:
 I'm playing with a ROACH2 design that (sometimes) compiles at 312 MHz. 
 However, every now and then I'll make a small change to the design and the 
 compile will fail timing catastrophically, with paths failing sometimes with 
 -2 ns (or worse) slack.
 When I look at the failing path(s), the delays are usually ~80% routing. I'll 
 see a signal take a huge detour to use a shift register in some arbitrary 
 location on the chip. Upon closer inspection of the relevant SRL, it appears 
 that the LUT concerned is being used for two signal paths, one on the O5 
 output, one on the O6. The result seems to be that it is poorly placed for 
 both it's roles.
 
 I'm only using ~50% of the slices and about 30% of the registers / luts on 
 the FPGA, and there are plenty of sensibly located SLICEMs the placer could 
 use if it so desired. I've switched lut combining off (with the -lt flag), in 
 planahead which doesn't seem to have made any difference.
 
 Can anyone offer me any words of advice / wisdom which might reduce my 
 confusion at what's going on (or, even better, help me solve the problem)?
 
 Despairingly yours,
 Jack
 
 




Re: [casper] Question of chose Correlator Architecture

2014-12-02 Thread David MacMahon
Hi, Peter,

If you have enough ports on the switch then you certainly can configure things 
to send the packets directly from the ROACHs to the various X boxes.  The 
pre-built PAPER model will support this if properly configured.  The one thing 
to keep in mind is the IP-to-MAC address table in the 10 GbE cores.  These are 
setup by paper_feng_init.rb.

You could use the factory MAC addresses of the X-box 10 GbE interfaces or you 
could configure the X boxes to set their 10 GbE MAC addresses to predefined 
values.  If using the latter approach, we often use 02:02:ww:xx:yy:zz where 
ww:xx:yy:zz corresponds to the IP address.
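
For example, under that convention an X-box 10 GbE interface with IP address 192.168.10.34 would be given MAC address 02:02:c0:a8:0a:22, since c0:a8:0a:22 is just 192.168.10.34 written in hex.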

Hope this helps,
Dave

On Dec 1, 2014, at 5:21 AM, Peter Niu wrote:

 Hi,Dave,
 Thanks for your Document about EQ,and suggestion about the sample rate.Now I 
 have a question about the correlator architecture.
 I have saw your PPT : 
 Correlator Architectures
 Present and Future
 CASPER Workshop 2011
 
 The structure mentioned in the PPT is the structure PAPER used now .Using a 
 set of precise IP assignment to avoid Loop Back is ok. However,If we use the 
 Packetized F/X Concept:Uses two ports on switch per F/X pair.It may not meet 
 the Loop Back problem.On the switch, The IP address will tell the packet 
 which Xeng to go .The structure which The PAPER model using now is the 
 eth_?_gpu port on ROACH connect HPC port directly.Is this only for saving 
 ports on switch?Well,Our switch have 64 ports,If we use the  two ports on 
 switch per F/X pair Concept,the ports may be sufficient .
 This is the question asked by my teacher Wu fengquan. As PAPER provide a lot 
 of ruby control scripts online to use,I'd rather use this model exits 
 now.What should I say to him?Is there some more advantages to use this 
 structure instead the two ports on switch per F/X pair?
 Thanks for your help!
 Best wishes!
 Peter
 
 
 
 




Re: [casper] ROACH serial connection issues

2014-12-02 Thread David MacMahon
Hi, Norbert,

If you hit any key to stop autoboot when it says Hit any key to stop 
autoboot, does it in fact stop the autoboot?  If so, you could use u-boot's 
printenv command to see what commands get run as part of autoboot and then 
try to run them by hand to try to figure out where things go bad.

HTH,
Dave

On Dec 2, 2014, at 12:16 AM, Norbert Bonnici wrote:

 Hi Marc,
 
 The USB dongle's baud rate should have been set properly. When set to
 different baud rates no readable data is received through the serial
 port. Added line wrapping but it didn't change anything.
 
 In addition, recently the communications are being disabled when the
 GND wire is connected to the USB dongle. Data is only being received
 when only the tx and rx wires are connected.
 
 Regards,
 Norbert
 
 On 2 December 2014 at 08:39, Marc Welz m...@ska.ac.za wrote:
 
 
 On Mon, Dec 1, 2014 at 2:47 PM, Norbert Bonnici
 norbert.bonnici...@um.edu.mt wrote:
 
 Dear Marc,
 
 I've have tried all the possible CR+LF combinations.
 
 
 
 Any ideas?
 
 
 Then I am not sure - I know that some USB dongles attempt to autodetect the
 serial
 speed - maybe something is going wrong there ? Also, maybe enable line
 wrapping (Control-A W) might help.
 
 BTW:  CC'ing the mailing list is good form - it helps others who might have
 the same problem, and you might also get suggestions from other people
 
 regards
 
 marc
 
 
 




Re: [casper] NFS setup: TFTP permissions problem

2014-12-02 Thread David MacMahon
Hi, Michael,

In addition to the other suggestions, you should check whether you are running 
dnsmasq in tftp-secure mode.  That might impose ownership and/or permission 
restrictions.  See man dnsmasq for more details.
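
The TFTP-related part of /etc/dnsmasq.conf typically looks something like this (the root path here is the one from your setup; tftp-secure is the line to check):

  enable-tftp
  tftp-root=/srv/roach_boot/boot
  # tftp-secure   # if enabled, served files must be owned by the user dnsmasq runs as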

Dave

On Dec 2, 2014, at 6:07 AM, Michael D'Cruze wrote:

 Hi everyone
 
 I'm following the NFS setup guide, and have come across a problem with the 
 /srv/roach_boot/boot directory permissions. I restart the dnsmasq service and 
 receive the following error:
 
 Starting dnsmasq: 
 dnsmasq: TFTP directory /srv/roach_boot/boot inaccessible: Permission denied
[FAILED]
 
 The output of ls -l from /srv/roach_boot is
 
 [root@roach-workstation roach_boot]# ls -l
 total 8
 drwxrwxrwx.  2 root root 4096 Dec  1 16:31 boot
 drwxrwxrwx. 23 root root 4096 Feb  2  2009 etch
 
 and from within /boot is
 
 [root@roach-workstation boot]# ls -l
 total 1360
 -rwxrwxrwx. 1 michael michael 1390149 Dec  1 15:35 uImage-20110812-mmcomitfix
 
 The output of ls --context from within /boot is
 
 [root@roach-workstation boot]# ls --context
 -rwxrwxrwx. michael michael unconfined_u:object_r:tftpdir_t:s0 
 uImage-20110812-mmcomitfix
 
 All of these permissions and contexts look correct according to the 
 guideso I'm at a bit of a loss. Has anyone seen this problem before, 
 given all of the above conditions?
 
 Does the /boot directory have to have the same context as the uImage file 
 within it?
 
 Suggestions or guidance greatly appreciated.
 
 Michael




Re: [casper] Question of chose Correlator Architecture

2014-12-02 Thread David MacMahon
Hi, Peter,

On Dec 2, 2014, at 9:34 AM, Peter Niu wrote:

 Thanks for your reply. Our switch has enough ports, but the NICs on the roach are 
 not enough (we only have 4 ports on each roach), so if we send packets to the x boxes 
 through the switch, we only need 4 10GbE ports on each roach. That is why we want to 
 use the pre-built paper model.

Yes, to do F -> SWITCH -> X you only need four 10 GbE ports on the ROACH2.  
In theory this could be done with 1 SFP+ card in the ROACH2, but if you want to 
use the pre-built model available on the internet to do this, you will still 
need 2 SFP+ cards in each ROACH2 due to how the interfaces are allocated.  If 
you modify the PAPER model, you can reassign the four eth_N_sw cores to be on 
1 SFP+ card.  The eth_N_gpu cores would be unused and could be deleted to 
save resources.

 if we want to use the pre-built model, the init ruby scripts may need to be 
 modified: the eth_n_gpu code should be deleted and the ARP code for eth_n_sw 
 should be changed. Besides the ethernet part of the ruby code, do the other 
 parts, like the PFB and EQ parts, need to be changed?

Yes, the network config stuff will have to change somewhat, but the rest of the 
configuration will remain the same.  Once you have settled on a network 
configuration, it should be fairly straightforward to change the 
paper_feng_init.rb script accordingly.

 could the pre-built model and scripts be found on the website?

All my pre-built PAPER F engine models and scripts are on the internet and I 
think you already have them.

Dave




Re: [casper] How to use the EQ model in PAPER?

2014-11-26 Thread David MacMahon
Hi, Peter,

I created a wiki page describing the EQ settings:

https://casper.berkeley.edu/wiki/PAPER_Correlator_EQ

For the -1 speed grade FPGAs on the RPACH2, the MMCMs on the ROACH2 cannot be 
configured in HIGH Bandwidth Mode when the input clock frequency is 250 MHz, 
so they must run in LOW Bandwidth Mode at that frequency.  Running with the 
MMCMs in LOW bandwidth mode does not give reliable capture of the high speed 
serial data bits that come from the ADC16 cards.  Maybe it is somehow possible 
to coax the MMCM into high bandwidth mode with a 250 MHz sample clock, but we 
gave up trying.  The ADC16 yellow block in 16-input mode supports a maximum 
sample rate of 240 MSPS.  This is explained here:

https://casper.berkeley.edu/wiki/ADC16x250-8_coax_rev_2#ADC16x250-8_coax_rev_2_Operating_Modes

It could probably use a little clarification that 16 inputs by 250 MSPS by 8 
bits is the ADC chip's max sample rate, but that rate is not supported by the 
ADC16 yellow block gateware due to limitations in the -1 speed grade FPGA MMCM.

Dave

On Nov 25, 2014, at 1:01 AM, Peter Niu wrote:

 Hi Dave,
 Thanks for your help! I have a question about the EQ model. If I guess right, the 
 EQ model has the capability to initialize the value for every input signal channel.
 The paper_feng_init.rb could use 
 ./paper_feng_init.rb -e  eq_value 
 to initialize the value of the EQ. I don't quite understand how it could initialize 
 each channel. I think the eq_value of each channel is different, but we just 
 got one value here.
 I gathered from former mails on the mailing list that the value of the EQ could be got by this 
 method: set the value to zero first, turn the value bigger, and stop when the 
 data value of the received packets is not zero; then that EQ value is what we 
 need. Is this method right?
 I have a second question to ask you. Our project now needs to raise the clock rate to 
 250 MHz. I found the bof file could run correctly at a 250 MHz clock rate. Do you 
 think it is OK for long-term running? I tried to change the model to 250 MHz 
 when I compiled, but the bof file could not be compiled. Matlab said it is an ADC 
 block problem.
 Best wishes
 Peter
 
 
 



