Re: [casper] Question regarding FFT

2013-03-15 Thread Ryan Monroe

Comments:

-The FFT is probably fine.  If it was broken, it would probably be 100% 
broken.  At least this looks like a spectrum
-It appears to me as if the broken section is exactly 1/8th of the 
spectrum.  Did you hook up all of your outputs correctly?
-What's up with the spikes in the spectrum?  If they were interleave 
artifacts, I'd expect to see the one which is at channel ~1800 closer to 
2048.  Maybe they are some other tone though
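
A quick sanity check of the numbers in the comments above, as a minimal sketch. It assumes the 4096 output channels span 0 to half of the 800 MHz ADC clock (both figures come from the original post); the function names are illustrative only and not part of any CASPER library.

```python
# Sketch only: assumes 4096 channels spanning 0..fs/2 of an 800 MHz real-sampled input.
ADC_CLOCK_HZ = 800e6
N_CHANNELS = 4096

def channel_to_freq_hz(chan, fs=ADC_CLOCK_HZ, n_chan=N_CHANNELS):
    """Start frequency of a channel for a real-input FFT with n_chan outputs."""
    return chan * (fs / 2.0) / n_chan

def last_fraction_channels(fraction, n_chan=N_CHANNELS):
    """Half-open channel range (start, stop) covering the final `fraction` of the band."""
    start = int(round(n_chan * (1.0 - fraction)))
    return start, n_chan

print(last_fraction_channels(1.0 / 8))   # channels making up the final eighth
print(channel_to_freq_hz(2048) / 1e6)    # mid-band channel, in MHz
print(channel_to_freq_hz(1800) / 1e6)    # the spike near channel 1800, in MHz
```

Under these assumptions channel 2048 sits at fs/4 (200 MHz), which is why a spike near channel 1800 does not line up with it.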



On 03/15/2013 12:14 PM, Nimish Sane wrote:

Hi all:

I am attaching a plot for Power of two inputs vs frequency channels. 
As can be seen, we have this persistent problem where some channels at 
the end just do not make any sense. We suspect that there is something 
going wrong in the FFT block. Has anybody seen such behavior before or 
can think of what may be going wrong?


Following are some specifications that may be useful:
Hardware: ROACH2 with KATADC.
ADC Clock: 800 MHz,
FPGA clock: 200 MHz
Toolflow: XSG 11.5 with Matlab 2009b with RHEL5.8
Libraries: SKA
FFT green block: fft_wideband_real
FFT size: 2^13 (4096 Output channels)

Thanks,

Nimish
--
Nimish Sane

Center for Solar-Terrestrial Research
New Jersey Institute of Technology
University Heights

Newark, NJ 07102-1982 USA

Tel: (973) 642 4958

Fax: (973) 596 3617

nimish.s...@njit.edu 




Re: [casper] ADC 1x5000-8 Dux 1:2 Notes

2013-03-15 Thread Ross Williamson
Hi Rurik,

Thanks - this is really useful. I've started delving into the
datasheets - head hurts a little - it took me far too long to work out
why we can only do 4-bits in 1:2 mode (number of pins! on ZDOK). This
is making me wonder why the adc_bit_width in the 1:2 version is 8 and
not 4?

Also the ROACH2 version of the 1:2 doesn't have the FIFO in (although
I have no idea why anyone would use the 1:2 on a ROACH2)

I'll keep you updated

Best regards,

Ross

On Thu, Mar 14, 2013 at 4:14 PM, Rurik A Primiani
 wrote:
> Hi Ross,
>
> I developed a large part of the 5GSps ADC yellow block(s) in the
> sma-wideband repository using previous code kindly provided by Homin Jiang
> and Kim Guzzino. Unfortunately, as Jonathan mentioned in a previous email,
> we basically left the ROACH1, 1:2 DMUX yellow block behind when we realized
> we weren't going to use it. In its present state it's basically broken.
>
> I would suggest getting it up to date by bringing in the clock-domain
> crossing FIFO that was added into the other blocks. If I remember correctly
> this was the last feature left when I stopped developing that particular
> block. You should be able to copy the FIFO-related VHDL code and the FIFO
> netlist over from the 1:1, just make sure to adjust the MPD, PAO, etc as
> needed. Using the ROACH2, 1:2 block as a comparison is also a good idea.
> Basically the biggest difference between the two is that the R2 version uses
> a MMCM while the R1 version uses DCM/PLL.
>
> About the "adc1_dcm_locked" error you're receiving: this is a bug and I
> guess I forgot to fix it for ROACH1. If you look at line 203 in
> "system.mhs",
> https://github.com/sma-wideband/mlib_devel/blob/master/xps_base/XPS_ROACH_base/system.mhs#L203,
> you'll notice that the conditional statement checks for the presence of
> "adc0":
>
> #IF# (strcmp(get(b,'type'),'xps_adc5g'))  && get(b,'use_adc0')# PORT
> adc1_dcm_locked   = adc1_dcm_locked
>
> This should actually read:
>
> #IF# (strcmp(get(b,'type'),'xps_adc5g'))  && get(b,'use_adc1')# PORT
> adc1_dcm_locked   = adc1_dcm_locked
>
> If you make this change it should get rid of your error and allow you to use
> just adc0 or adc1 without needing both present. If you do make this change
> please feel free to issue a pull-request to sma-wideband and we'll merge it
> into the repo.
>
> Best,
> Rurik
>
>
>
> On 3/14/2013 12:33 PM, Ross Williamson wrote:
>>
>> Hi All,
>>
>> I'm starting to look into getting the ADC 1x5000-8 DMUX 1:2 to work on
>> a roach 1. I'm just posting a couple of comments here that I've
>> uncovered so far - I think most of these stem from the fact that a lot
>> of work has gone into developing the 1:1 demux with 2 cards for the
>> sma-wideband project and ROACH-2.
>>
>> Notes:
>>
>> 1) I'm using the git repo from the sma-wideband project - If I should
>> be looking elsewhere then let me know.
>> 2) The 1:1 version uses the Xilinx FIFO IP core whereas the 1:2 does
>> not - This causes the 1:2 to not compile
>> 3) If you hack to remove the FIFO ports from the system.mhs (bad idea)
>> then you quickly notice that the opb_adc5g_controller has ports for
>> the FIFO and also for 2 adc's - i.e. I don't think it will compile if
>> you only have adc0 and not adc1 - error is  "adc1_dcm_locked - port is
>> driven by a sourceless connector "
>> 4) I'm going to look at the ROACH-2 implementation as that might help
>> a lot but I haven't got to it yet.
>>
>> Anyone know of a quick fix before I delve into the vhdl - different
>> repo/earlier version?
>>
>> I know most people are pushing ahead with the 1:1 on the ROACH-2 with
>> this board and so I'm happy looking into these issues but if anyone
>> has some quick good ideas it would be great to hear them.
>>
>> Cheers
>>
>> Ross
>
>
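
The guard mismatch described in the quoted message (a `use_adc0` test enabling an `adc1_` port) can be searched for mechanically. A minimal sketch, assuming each `#IF#` guard and its `PORT` sit on a single line in system.mhs (the email wraps them); the regular expressions and the function name are hypothetical helpers, not part of the toolflow.

```python
import re

# Hypothetical patterns for the template guard and the guarded port name.
GUARD_RE = re.compile(r"get\(b,\s*'use_adc(\d)'\)")
PORT_RE = re.compile(r"PORT\s+adc(\d)_\w+")

def mismatched_guards(lines):
    """Return (line_number, guard_adc, port_adc) where the guard tests a different ADC than the port."""
    problems = []
    for number, line in enumerate(lines, start=1):
        guard = GUARD_RE.search(line)
        port = PORT_RE.search(line)
        if guard and port and guard.group(1) != port.group(1):
            problems.append((number, int(guard.group(1)), int(port.group(1))))
    return problems

sample = [
    "#IF# (strcmp(get(b,'type'),'xps_adc5g'))  && get(b,'use_adc0')# PORT adc1_dcm_locked   = adc1_dcm_locked",
    "#IF# (strcmp(get(b,'type'),'xps_adc5g'))  && get(b,'use_adc1')# PORT adc1_dcm_locked   = adc1_dcm_locked",
]
print(mismatched_guards(sample))  # flags only the first (buggy) line
```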



-- 
Ross Williamson
Research Scientist - Sub-mm Group
California Institute of Technology
626-395-2647 (office)
312-504-3051 (Cell)



Re: [casper] ADC 1x5000-8 Dux 1:2 Notes

2013-03-15 Thread Ross Williamson
Thanks Kim for the info - let me know if you get something up and running.

R

On Thu, Mar 14, 2013 at 6:35 PM, Kim Guzzino  wrote:
> Ross,
> I did a little work on the 1:2 after Rurik had finished it, although I
> abandoned it also.
> I will check my code and see if it compiles ok for Roach 1. I do remember it
> having some issue with using both ZDOKs but I'm not sure.
> I will let you know in the morning.
>
> Kim
>
>
> -----Original Message-----
> From: casper-boun...@lists.berkeley.edu
> [mailto:casper-boun...@lists.berkeley.edu] On Behalf Of Rurik A Primiani
> Sent: Thursday, March 14, 2013 4:14 PM
> To: Ross Williamson
> Cc: casper
> Subject: Re: [casper] ADC 1x5000-8 Dux 1:2 Notes
>
> Hi Ross,
>
> I developed a large part of the 5GSps ADC yellow block(s) in the
> sma-wideband repository using previous code kindly provided by Homin Jiang
> and Kim Guzzino. Unfortunately, as Jonathan mentioned in a previous email,
> we basically left the ROACH1, 1:2 DMUX yellow block behind when we realized
> we weren't going to use it. In its present state it's basically broken.
>
> I would suggest getting it up to date by bringing in the clock-domain
> crossing FIFO that was added into the other blocks. If I remember correctly
> this was the last feature left when I stopped developing that particular
> block. You should be able to copy the FIFO-related VHDL code and the FIFO
> netlist over from the 1:1, just make sure to adjust the MPD, PAO, etc as
> needed. Using the ROACH2, 1:2 block as a comparison is also a good idea.
> Basically the biggest difference between the two is that the R2 version uses
> a MMCM while the R1 version uses DCM/PLL.
>
> About the "adc1_dcm_locked" error you're receiving: this is a bug and I
> guess I forgot to fix it for ROACH1. If you look at line 203 in
> "system.mhs",
> https://github.com/sma-wideband/mlib_devel/blob/master/xps_base/XPS_ROACH_base/system.mhs#L203,
> you'll notice that the conditional statement checks for the presence of
> "adc0":
>
> #IF# (strcmp(get(b,'type'),'xps_adc5g'))  && get(b,'use_adc0')# PORT
> adc1_dcm_locked   = adc1_dcm_locked
>
> This should actually read:
>
> #IF# (strcmp(get(b,'type'),'xps_adc5g'))  && get(b,'use_adc1')# PORT
> adc1_dcm_locked   = adc1_dcm_locked
>
> If you make this change it should get rid of your error and allow you to use
> just adc0 or adc1 without needing both present. If you do make this change
> please feel free to issue a pull-request to sma-wideband and we'll merge it
> into the repo.
>
> Best,
> Rurik
>
>
> On 3/14/2013 12:33 PM, Ross Williamson wrote:
>> Hi All,
>>
>> I'm starting to look into getting the ADC 1x5000-8 DMUX 1:2 to work on
>> a roach 1. I'm just posting a couple of comments here that I've
>> uncovered so far - I think most of these stem from the fact that a lot
>> of work has gone into developing the 1:1 demux with 2 cards for the
>> sma-wideband project and ROACH-2.
>>
>> Notes:
>>
>> 1) I'm using the git repo from the sma-wideband project - If I should
>> be looking elsewhere then let me know.
>> 2) The 1:1 version uses the Xilinx FIFO IP core whereas the 1:2 does
>> not - This causes the 1:2 to not compile
>> 3) If you hack to remove the FIFO ports from the system.mhs (bad idea)
>> then you quickly notice that the opb_adc5g_controller has ports for
>> the FIFO and also for 2 adc's - i.e. I don't think it will compile if
>> you only have adc0 and not adc1 - error is  "adc1_dcm_locked - port is
>> driven by a sourceless connector "
>> 4) I'm going to look at the ROACH-2 implementation as that might help
>> a lot but I haven't got to it yet.
>>
>> Anyone know of a quick fix before I delve into the vhdl - different
>> repo/earlier version?
>>
>> I know most people are pushing ahead with the 1:1 on the ROACH-2 with
>> this board and so I'm happy looking into these issues but if anyone
>> has some quick good ideas it would be great to hear them.
>>
>> Cheers
>>
>> Ross
>
>



-- 
Ross Williamson
Research Scientist - Sub-mm Group
California Institute of Technology
626-395-2647 (office)
312-504-3051 (Cell)



Re: [casper] Question regarding FFT

2013-03-15 Thread David MacMahon
Hi, Nimish,

Does this occur in simulation as well?

Dave

On Mar 15, 2013, at 12:14 PM, Nimish Sane wrote:

> Hi all:
> 
> I am attaching a plot for Power of two inputs vs frequency channels. As can 
> be seen, we have this persistent problem where some channels at the end just 
> do not make any sense. We suspect that there is something going wrong in the 
> FFT block. Has anybody seen such behavior before or can think of what may be 
> going wrong?
> 
> Following are some specifications that may be useful:
> Hardware: ROACH2 with KATADC.
> ADC Clock: 800 MHz, 
> FPGA clock: 200 MHz
> Toolflow: XSG 11.5 with Matlab 2009b with RHEL5.8
> Libraries: SKA
> FFT green block: fft_wideband_real
> FFT size: 2^13 (4096 Output channels)
> 
> Thanks,
> 
> Nimish
> -- 
> Nimish Sane
> 
> Center for Solar-Terrestrial Research
> New Jersey Institute of Technology
> University Heights
> Newark, NJ 07102-1982 USA
> Tel: (973) 642 4958
> Fax: (973) 596 3617
> nimish.s...@njit.edu
> 




Re: [casper] Question regarding FFT

2013-03-15 Thread Dan Werthimer
hi nimish,

this could be a sync pulse problem???

how often do you inject a sync  pulse?
did you check the re-order block in your fft to figure out the
minimum period sync pulse that's allowed for that particular FFT?
the casper memo on sync pulse generation has info on this.
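
As a rough illustration of the kind of check involved, here is a minimal sketch. The rule it encodes (the sync period must be a multiple of LCM(reorder orders) x FFT length / simultaneous inputs) is an assumption about the form of the constraint, and the reorder orders used are made-up placeholders; the memo and the reorder blocks in the actual model are the authority.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def min_sync_period(fft_len, n_simult_inputs, reorder_orders):
    """Smallest sync period in FPGA clocks under the ASSUMED rule described above."""
    clocks_per_spectrum = fft_len // n_simult_inputs
    return reduce(lcm, reorder_orders, 1) * clocks_per_spectrum

def is_valid_sync_period(period, fft_len, n_simult_inputs, reorder_orders):
    """True if `period` is a whole multiple of the assumed minimum period."""
    return period % min_sync_period(fft_len, n_simult_inputs, reorder_orders) == 0

# 2^13-point FFT with 4 samples per FPGA clock (800 MHz ADC / 200 MHz FPGA).
# The reorder orders (2, 3) are hypothetical; read the real ones from the model.
print(min_sync_period(2**13, 4, [2, 3]))
```

A sync period that fails this kind of check would be a candidate explanation worth ruling out before suspecting the FFT itself.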

dan


On Fri, Mar 15, 2013 at 12:14 PM, Nimish Sane  wrote:

> Hi all:
>
> I am attaching a plot for Power of two inputs vs frequency channels. As
> can be seen, we have this persistent problem where some channels at the end
> just do not make any sense. We suspect that there is something going wrong
> in the FFT block. Has anybody seen such behavior before or can think of
> what may be going wrong?
>
> Following are some specifications that may be useful:
> Hardware: ROACH2 with KATADC.
> ADC Clock: 800 MHz,
> FPGA clock: 200 MHz
> Toolflow: XSG 11.5 with Matlab 2009b with RHEL5.8
> Libraries: SKA
> FFT green block: fft_wideband_real
> FFT size: 2^13 (4096 Output channels)
>
> Thanks,
>
> Nimish
> --
> Nimish Sane
>
> Center for Solar-Terrestrial Research
> New Jersey Institute of Technology
> University Heights
>
> Newark, NJ 07102-1982 USA
>
> Tel: (973) 642 4958
>
> Fax: (973) 596 3617
> nimish.s...@njit.edu
>


Re: [casper] Question regarding FFT

2013-03-15 Thread Nimish Sane
Hi Glenn,

Yes, that is the plan, but we have some issues using snap blocks
independent of this. Hence, wanted to know if someone has seen such
behavior before as we continue to probe as you have suggested.

Thanks,

Nimish

On Fri, Mar 15, 2013 at 3:22 PM, G Jones  wrote:

> Hi Nimish,
> My suggestion is to add snapshot blocks triggered on the sync pulse
> directly after the FFT and after the blocks that follow it. You can
> then look at the signal at each stage and see at which stage that
> strange behavior is occurring. You can also add a "test vector
> generator" that puts in a known sequence and see if what comes out
> matches what you expect.
>
> Glenn
>
> On Fri, Mar 15, 2013 at 3:14 PM, Nimish Sane  wrote:
> > Hi all:
> >
> > I am attaching a plot for Power of two inputs vs frequency channels. As can
> > be seen, we have this persistent problem where some channels at the end just
> > do not make any sense. We suspect that there is something going wrong in the
> > FFT block. Has anybody seen such behavior before or can think of what may be
> > going wrong?
> >
> > Following are some specifications that may be useful:
> > Hardware: ROACH2 with KATADC.
> > ADC Clock: 800 MHz,
> > FPGA clock: 200 MHz
> > Toolflow: XSG 11.5 with Matlab 2009b with RHEL5.8
> > Libraries: SKA
> > FFT green block: fft_wideband_real
> > FFT size: 2^13 (4096 Output channels)
> >
> > Thanks,
> >
> > Nimish
> > --
> > Nimish Sane
> >
> > Center for Solar-Terrestrial Research
> > New Jersey Institute of Technology
> > University Heights
> >
> > Newark, NJ 07102-1982 USA
> >
> > Tel: (973) 642 4958
> >
> > Fax: (973) 596 3617
> >
> > nimish.s...@njit.edu
>



-- 
Nimish Sane

Center for Solar-Terrestrial Research
New Jersey Institute of Technology
University Heights

Newark, NJ 07102-1982 USA

Tel: (973) 642 4958

Fax: (973) 596 3617
nimish.s...@njit.edu


Re: [casper] Question regarding FFT

2013-03-15 Thread G Jones
Hi Nimish,
My suggestion is to add snapshot blocks triggered on the sync pulse
directly after the FFT and after the blocks that follow it. You can
then look at the signal at each stage and see at which stage that
strange behavior is occurring. You can also add a "test vector
generator" that puts in a known sequence and see if what comes out
matches what you expect.
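
Once a snapshot has been read back, comparing it against the spectrum expected from the known test vector can be automated. A minimal sketch, assuming both spectra are already available as plain lists of per-channel power; the function name, tolerance and toy data are illustrative and not taken from any CASPER library.

```python
def bad_channel_ranges(expected, measured, rel_tol=0.05):
    """Return inclusive (start, stop) channel ranges where measured deviates from expected."""
    n = min(len(expected), len(measured))
    ranges = []
    start = None
    for chan in range(n):
        exp, got = expected[chan], measured[chan]
        bad = abs(got - exp) > rel_tol * max(abs(exp), 1e-12)
        if bad and start is None:
            start = chan                      # a bad run begins here
        elif not bad and start is not None:
            ranges.append((start, chan - 1))  # the bad run ended on the previous channel
            start = None
    if start is not None:
        ranges.append((start, n - 1))         # bad run extends to the last channel
    return ranges

# Toy data: 16 flat channels, with the final eighth (2 channels) corrupted.
expected = [1.0] * 16
measured = [1.0] * 14 + [7.5, 0.0]
print(bad_channel_ranges(expected, measured))
```

Running the same comparison on the snapshot after each stage shows the first stage at which a bad range appears.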

Glenn

On Fri, Mar 15, 2013 at 3:14 PM, Nimish Sane  wrote:
> Hi all:
>
> I am attaching a plot for Power of two inputs vs frequency channels. As can
> be seen, we have this persistent problem where some channels at the end just
> do not make any sense. We suspect that there is something going wrong in the
> FFT block. Has anybody seen such behavior before or can think of what may be
> going wrong?
>
> Following are some specifications that may be useful:
> Hardware: ROACH2 with KATADC.
> ADC Clock: 800 MHz,
> FPGA clock: 200 MHz
> Toolflow: XSG 11.5 with Matlab 2009b with RHEL5.8
> Libraries: SKA
> FFT green block: fft_wideband_real
> FFT size: 2^13 (4096 Output channels)
>
> Thanks,
>
> Nimish
> --
> Nimish Sane
>
> Center for Solar-Terrestrial Research
> New Jersey Institute of Technology
> University Heights
>
> Newark, NJ 07102-1982 USA
>
> Tel: (973) 642 4958
>
> Fax: (973) 596 3617
>
> nimish.s...@njit.edu



[casper] Question regarding FFT

2013-03-15 Thread Nimish Sane
Hi all:

I am attaching a plot for Power of two inputs vs frequency channels. As can
be seen, we have this persistent problem where some channels at the end
just do not make any sense. We suspect that there is something going wrong
in the FFT block. Has anybody seen such behavior before or can think of
what may be going wrong?

Following are some specifications that may be useful:
Hardware: ROACH2 with KATADC.
ADC Clock: 800 MHz,
FPGA clock: 200 MHz
Toolflow: XSG 11.5 with Matlab 2009b with RHEL5.8
Libraries: SKA
FFT green block: fft_wideband_real
FFT size: 2^13 (4096 Output channels)

Thanks,

Nimish
-- 
Nimish Sane

Center for Solar-Terrestrial Research
New Jersey Institute of Technology
University Heights

Newark, NJ 07102-1982 USA

Tel: (973) 642 4958

Fax: (973) 596 3617
nimish.s...@njit.edu

Re: [casper] SOLVED: ROACH 2's suddenly freezing left and right

2013-03-15 Thread G Jones
Hi Henno,
I'll send the model separately. Typically the crash occurs within a minute
or two, which corresponds to ~10-30 register/bram read/writes.
Glenn
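
To put a number on "how many accesses before it crashes", a read/write loop along these lines may help. This is a sketch only: the `write_int`/`read_int` method names are assumed to mirror the register calls of the Python katcp client in use (check the real client's API), the register name is hypothetical, and `FakeFpga` is a stand-in so the example runs without hardware.

```python
import random

def stress_registers(fpga, reg_name, n_iterations, seed=0):
    """Write then read back random 31-bit values; return the first mismatching iteration or None."""
    rng = random.Random(seed)
    for i in range(n_iterations):
        value = rng.getrandbits(31)
        fpga.write_int(reg_name, value)
        if fpga.read_int(reg_name) != value:
            return i
    return None

class FakeFpga(object):
    """Stand-in for a real client object (assumption: same method names)."""
    def __init__(self):
        self.regs = {}

    def write_int(self, name, value):
        self.regs[name] = value

    def read_int(self, name):
        return self.regs[name]

print(stress_registers(FakeFpga(), "scratch_reg", 1000))
```

On real hardware a crash of the kind described would more likely surface as a timeout or exception from the client than as a mismatch, so the iteration count at the point of failure is the useful number to log.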


On Fri, Mar 15, 2013 at 10:25 AM, Henno Kriel  wrote:

> Hi Glenn
>
> Is it possible to send me your model file?
>
> I have a fairly sizable design running with these changes, that has many
> register, shared BRAMs and snap blocks, without issues.
>
> You mentioned that the design crashes after a while - could you give me a
> more precise indication of the time span?
>
> Regards
> Henno
>
> On Fri, Mar 15, 2013 at 3:28 PM, G Jones  wrote:
>
>> Hi,
>> It should have occurred to me sooner, but I checked through the commit
>> logs for mlib_devel and remembered I had updated from ska-sa a couple of
>> weeks ago to get the bugfix for the rcs block. In doing so, I had also
>> pulled down this commit:
>>
>>
>> https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
>> "Simplified the EPB to OPB 32bit bus cycle and now supports legacy byte
>> enable support for ROACH 1 modules on ROACH 2."
>>
>> which sounds suspicious since the problem seemed to be related to reading
>> and writing brams/software registers.
>>
>> Indeed, when I switched over to the commit right before that one and
>> compiled the same test design, I ended up with a boffile that has not yet
>> crashed (the bad bof would have certainly crashed by now).
>>
>> The design is simply two ADC5Gs connected to snapshot blocks. The ADCs
>> are clocked at 2880 MHz, so the FPGA is running at 180 MHz.  I'm not sure
>> if the problem is some interaction between the ADC5Gs and this commit, or
>> the clock rate or what.
>>
>> Henno, can you double check the code in this commit and see if you can
>> ascertain where the bug might be?
>>
>> Glenn
>>
>> On Thu, Mar 14, 2013 at 12:00 PM, G Jones wrote:
>>
>>> Hi,
>>> For some unknown reason, boffiles I generate with my toolflow cause
>>> ROACH 2's to freeze up after a few minutes (I think related to I/O to
>>> software registers and shared BRAMs rather than any specific amount of
>>> time). I don't know of any changes I made to my toolflow since the
>>> last time I compiled working boffiles. Previously working boffiles
>>> still work, but recompiled designs do not work. The symptom is that
>>> the python katcp client stops responding. SSHing to the ROACH and
>>> running ps shows that tcpborphserver3 is no longer running. It finally
>>> occurred to me to check dmesg, and on all crashed ROACHs, I see this
>>> in dmesg:
>>>
>>> ...
>>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value
>>> 1
>>> attempting led toggle
>>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value
>>> 0
>>> attempting led toggle
>>> About to toggle cpu_rdy pinMachine check in kernel mode.
>>> Data Read PLB Error
>>> Oops: Machine check, sig: 7 [#1]
>>> PowerPC 44x Platform
>>> Modules linked in:
>>> NIP: 0fea4048 LR: 0fea3f88 CTR: 0004
>>> REGS: ef00bf10 TRAP: 0214   Not tainted  (3.7.0-rc2+)
>>> MSR: 0002d000   CR: 2224  XER: 
>>> TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
>>> GPR00:  bfcb7290 48031e20 10628bf9 4802c010 0004 0018
>>> 7f7f7f7f
>>> GPR08:  10628bf0 10628ba0 0fea3f80 2222 1006ba18 
>>> 
>>> GPR16:       
>>> 
>>> GPR24:    0004 10628bf9 10628bf9 0ff91ff4
>>> 4802c011
>>> NIP [0fea4048] 0xfea4048
>>> LR [0fea3f88] 0xfea3f88
>>> Call Trace:
>>> ---[ end trace 59d28c137ef7dde2 ]---
>>>
>>> roach VMA close
>>> roach release mem called
>>>
>>> -
>>>
>>> If I then try to reboot the ROACH with shutdown -r now, it hardfreezes
>>> and requires a power cycle to get it running again.
>>>
>>> Any ideas where to look for this problem?
>>>
>>> Thanks,
>>> Glenn
>>>
>>
>>
>
>
> --
> Henno Kriel
>
> DSP Engineer
> Digital Back End
> meerKAT
>
> SKA South Africa
> Third Floor
> The Park
> Park Road (off Alexandra Road)
> Pinelands
> 7405
> Western Cape
> South Africa
>
> Latitude: -33.94329 (South); Longitude: 18.48945 (East).
>
> (p) +27 (0)21 506 7300
> (p) +27 (0)21 506 7365 (direct)
> (f) +27 (0)21 506 7375
> (m) +27 (0)84 504 5050
>


Re: [casper] SOLVED: ROACH 2's suddenly freezing left and right

2013-03-15 Thread G Jones
Hi Wes,
The problem shows up with both the latest pull from ska-sa and an earlier
one from a couple of months ago. The crashing bof crashes with both and the
working bof works with both. That's why I'm wondering if it's some
interaction with the ADC5G since I presume your designs are mostly with the
katADC?

Has anyone else compiled/run bofs using ADC5G with these latest changes?

Glenn


On Fri, Mar 15, 2013 at 10:19 AM, Wesley New  wrote:

> Hi Glenn,
>
> We are running many bof files with that change and are doing plenty of
> register and bram reads and writes and have not experienced any issues with
> these bus accesses. What version of TCPBorphServer are you running?
>
> Wes
>
>
> On Fri, Mar 15, 2013 at 3:28 PM, G Jones  wrote:
>
>> Hi,
>> It should have occurred to me sooner, but I checked through the commit
>> logs for mlib_devel and remembered I had updated from ska-sa a couple of
>> weeks ago to get the bugfix for the rcs block. In doing so, I had also
>> pulled down this commit:
>>
>>
>> https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
>> "Simplified the EPB to OPB 32bit bus cycle and now supports legacy byte
>> enable support for ROACH 1 modules on ROACH 2."
>>
>> which sounds suspicious since the problem seemed to be related to reading
>> and writing brams/software registers.
>>
>> Indeed, when I switched over to the commit right before that one and
>> compiled the same test design, I ended up with a boffile that has not yet
>> crashed (the bad bof would have certainly crashed by now).
>>
>> The design is simply two ADC5Gs connected to snapshot blocks. The ADCs
>> are clocked at 2880 MHz, so the FPGA is running at 180 MHz.  I'm not sure
>> if the problem is some interaction between the ADC5Gs and this commit, or
>> the clock rate or what.
>>
>> Henno, can you double check the code in this commit and see if you can
>> ascertain where the bug might be?
>>
>> Glenn
>>
>> On Thu, Mar 14, 2013 at 12:00 PM, G Jones wrote:
>>
>>> Hi,
>>> For some unknown reason, boffiles I generate with my toolflow cause
>>> ROACH 2's to freeze up after a few minutes (I think related to I/O to
>>> software registers and shared BRAMs rather than any specific amount of
>>> time). I don't know of any changes I made to my toolflow since the
>>> last time I compiled working boffiles. Previously working boffiles
>>> still work, but recompiled designs do not work. The symptom is that
>>> the python katcp client stops responding. SSHing to the ROACH and
>>> running ps shows that tcpborphserver3 is no longer running. It finally
>>> occurred to me to check dmesg, and on all crashed ROACHs, I see this
>>> in dmesg:
>>>
>>> ...
>>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value
>>> 1
>>> attempting led toggle
>>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value
>>> 0
>>> attempting led toggle
>>> About to toggle cpu_rdy pinMachine check in kernel mode.
>>> Data Read PLB Error
>>> Oops: Machine check, sig: 7 [#1]
>>> PowerPC 44x Platform
>>> Modules linked in:
>>> NIP: 0fea4048 LR: 0fea3f88 CTR: 0004
>>> REGS: ef00bf10 TRAP: 0214   Not tainted  (3.7.0-rc2+)
>>> MSR: 0002d000   CR: 2224  XER: 
>>> TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
>>> GPR00:  bfcb7290 48031e20 10628bf9 4802c010 0004 0018
>>> 7f7f7f7f
>>> GPR08:  10628bf0 10628ba0 0fea3f80 2222 1006ba18 
>>> 
>>> GPR16:       
>>> 
>>> GPR24:    0004 10628bf9 10628bf9 0ff91ff4
>>> 4802c011
>>> NIP [0fea4048] 0xfea4048
>>> LR [0fea3f88] 0xfea3f88
>>> Call Trace:
>>> ---[ end trace 59d28c137ef7dde2 ]---
>>>
>>> roach VMA close
>>> roach release mem called
>>>
>>> -
>>>
>>> If I then try to reboot the ROACH with shutdown -r now, it hardfreezes
>>> and requires a power cycle to get it running again.
>>>
>>> Any ideas where to look for this problem?
>>>
>>> Thanks,
>>> Glenn
>>>
>>
>>
>


Re: [casper] SOLVED: ROACH 2's suddenly freezing left and right

2013-03-15 Thread Henno Kriel
Hi Glenn

Is it possible to send me your model file?

I have a fairly sizable design running with these changes, that has many
register, shared BRAMs and snap blocks, without issues.

You mentioned that the design crashes after a while - could you give me a
more precise indication of the time span?

Regards
Henno

On Fri, Mar 15, 2013 at 3:28 PM, G Jones  wrote:

> Hi,
> It should have occurred to me sooner, but I checked through the commit
> logs for mlib_devel and remembered I had updated from ska-sa a couple of
> weeks ago to get the bugfix for the rcs block. In doing so, I had also
> pulled down this commit:
>
>
> https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
> "Simplified the EPB to OPB 32bit bus cycle and now supports legacy byte
> enable support for ROACH 1 modules on ROACH 2."
>
> which sounds suspicious since the problem seemed to be related to reading
> and writing brams/software registers.
>
> Indeed, when I switched over to the commit right before that one and
> compiled the same test design, I ended up with a boffile that has not yet
> crashed (the bad bof would have certainly crashed by now).
>
> The design is simply two ADC5Gs connected to snapshot blocks. The ADCs
> are clocked at 2880 MHz, so the FPGA is running at 180 MHz.  I'm not sure
> if the problem is some interaction between the ADC5Gs and this commit, or
> the clock rate or what.
>
> Henno, can you double check the code in this commit and see if you can
> ascertain where the bug might be?
>
> Glenn
>
> On Thu, Mar 14, 2013 at 12:00 PM, G Jones  wrote:
>
>> Hi,
>> For some unknown reason, boffiles I generate with my toolflow cause
>> ROACH 2's to freeze up after a few minutes (I think related to I/O to
>> software registers and shared BRAMs rather than any specific amount of
>> time). I don't know of any changes I made to my toolflow since the
>> last time I compiled working boffiles. Previously working boffiles
>> still work, but recompiled designs do not work. The symptom is that
>> the python katcp client stops responding. SSHing to the ROACH and
>> running ps shows that tcpborphserver3 is no longer running. It finally
>> occurred to me to check dmesg, and on all crashed ROACHs, I see this
>> in dmesg:
>>
>> ...
>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 1
>> attempting led toggle
>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 0
>> attempting led toggle
>> About to toggle cpu_rdy pinMachine check in kernel mode.
>> Data Read PLB Error
>> Oops: Machine check, sig: 7 [#1]
>> PowerPC 44x Platform
>> Modules linked in:
>> NIP: 0fea4048 LR: 0fea3f88 CTR: 0004
>> REGS: ef00bf10 TRAP: 0214   Not tainted  (3.7.0-rc2+)
>> MSR: 0002d000   CR: 2224  XER: 
>> TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
>> GPR00:  bfcb7290 48031e20 10628bf9 4802c010 0004 0018
>> 7f7f7f7f
>> GPR08:  10628bf0 10628ba0 0fea3f80 2222 1006ba18 
>> 
>> GPR16:       
>> 
>> GPR24:    0004 10628bf9 10628bf9 0ff91ff4
>> 4802c011
>> NIP [0fea4048] 0xfea4048
>> LR [0fea3f88] 0xfea3f88
>> Call Trace:
>> ---[ end trace 59d28c137ef7dde2 ]---
>>
>> roach VMA close
>> roach release mem called
>>
>> -
>>
>> If I then try to reboot the ROACH with shutdown -r now, it hardfreezes
>> and requires a power cycle to get it running again.
>>
>> Any ideas where to look for this problem?
>>
>> Thanks,
>> Glenn
>>
>
>


-- 
Henno Kriel

DSP Engineer
Digital Back End
meerKAT

SKA South Africa
Third Floor
The Park
Park Road (off Alexandra Road)
Pinelands
7405
Western Cape
South Africa

Latitude: -33.94329 (South); Longitude: 18.48945 (East).

(p) +27 (0)21 506 7300
(p) +27 (0)21 506 7365 (direct)
(f) +27 (0)21 506 7375
(m) +27 (0)84 504 5050


Re: [casper] SOLVED: ROACH 2's suddenly freezing left and right

2013-03-15 Thread Wesley New
Hi Glenn,

We are running many bof files with that change and are doing plenty of
register and bram reads and writes and have not experienced any issues with
these bus accesses. What version of TCPBorphServer are you running?

Wes


On Fri, Mar 15, 2013 at 3:28 PM, G Jones  wrote:

> Hi,
> It should have occurred to me sooner, but I checked through the commit
> logs for mlib_devel and remembered I had updated from ska-sa a couple of
> weeks ago to get the bugfix for the rcs block. In doing so, I had also
> pulled down this commit:
>
>
> https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
> "Simplified the EPB to OPB 32bit bus cycle and now supports legacy byte
> enable support for ROACH 1 modules on ROACH 2."
>
> which sounds suspicious since the problem seemed to be related to reading
> and writing brams/software registers.
>
> Indeed, when I switched over to the commit right before that one and
> compiled the same test design, I ended up with a boffile that has not yet
> crashed (the bad bof would have certainly crashed by now).
>
> The design is simply two ADC5Gs connected to snapshot blocks. The ADCs
> are clocked at 2880 MHz, so the FPGA is running at 180 MHz.  I'm not sure
> if the problem is some interaction between the ADC5Gs and this commit, or
> the clock rate or what.
>
> Henno, can you double check the code in this commit and see if you can
> ascertain where the bug might be?
>
> Glenn
>
> On Thu, Mar 14, 2013 at 12:00 PM, G Jones  wrote:
>
>> Hi,
>> For some unknown reason, boffiles I generate with my toolflow cause
>> ROACH 2's to freeze up after a few minutes (I think related to I/O to
>> software registers and shared BRAMs rather than any specific amount of
>> time). I don't know of any changes I made to my toolflow since the
>> last time I compiled working boffiles. Previously working boffiles
>> still work, but recompiled designs do not work. The symptom is that
>> the python katcp client stops responding. SSHing to the ROACH and
>> running ps shows that tcpborphserver3 is no longer running. It finally
>> occurred to me to check dmesg, and on all crashed ROACHs, I see this
>> in dmesg:
>>
>> ...
>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 1
>> attempting led toggle
>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 0
>> attempting led toggle
>> About to toggle cpu_rdy pinMachine check in kernel mode.
>> Data Read PLB Error
>> Oops: Machine check, sig: 7 [#1]
>> PowerPC 44x Platform
>> Modules linked in:
>> NIP: 0fea4048 LR: 0fea3f88 CTR: 0004
>> REGS: ef00bf10 TRAP: 0214   Not tainted  (3.7.0-rc2+)
>> MSR: 0002d000   CR: 2224  XER: 
>> TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
>> GPR00:  bfcb7290 48031e20 10628bf9 4802c010 0004 0018
>> 7f7f7f7f
>> GPR08:  10628bf0 10628ba0 0fea3f80 2222 1006ba18 
>> 
>> GPR16:       
>> 
>> GPR24:    0004 10628bf9 10628bf9 0ff91ff4
>> 4802c011
>> NIP [0fea4048] 0xfea4048
>> LR [0fea3f88] 0xfea3f88
>> Call Trace:
>> ---[ end trace 59d28c137ef7dde2 ]---
>>
>> roach VMA close
>> roach release mem called
>>
>> -
>>
>> If I then try to reboot the ROACH with shutdown -r now, it hardfreezes
>> and requires a power cycle to get it running again.
>>
>> Any ideas where to look for this problem?
>>
>> Thanks,
>> Glenn
>>
>
>


Re: [casper] problems with adc2x400-14 (re-send with attachments)

2013-03-15 Thread Guy kenfack
Hi Ryan,
Another solution for you (if you don't want to change your UCF file):
you can modify the DCM phase-shift parameter of your ADC yellow block's
sampling clock by editing line 345 of adc2x14_400_interface.vhd:
  PHASE_SHIFT   => 0,  <-  set the appropriate value here
The allowed values are explained in the Virtex-5 DCM user guide. For example,
a value of 32 will sample your ADC ZDOK data bus with a phase shift of
45 degrees. Try several values until you get rid of the spikes in your plot;
each value requires generating a new boffile. Once you find a value that
works, keep it for your future designs. Of course, if you change your ADC
sampling clock you will have to make sure that the ADC data are still good.

adc2x14_400_interface.vhd (if you are using the "adc2x14_400" yellow block)
is located in your CASPER XPS library:
/home/roach/mlib_git/mlib_devel/xps_base/XPS_ROACH_base/pcores/adc2x14_400_interface_v1_00_a/hdl/vhdl
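
To see what a given PHASE_SHIFT setting means before committing to a recompile, the arithmetic can be sketched as below. It assumes fixed-mode DCM steps of 1/256 of the clock period and an allowed range of -255..255 (consistent with the "32 gives 45 degrees" example above, but check the Virtex-5 user guide); the function names and the 400 MHz clock in the usage line are illustrative assumptions.

```python
def dcm_phase_degrees(phase_shift):
    """Phase shift in degrees, assuming steps of 1/256 of the clock period."""
    if not -255 <= phase_shift <= 255:
        raise ValueError("PHASE_SHIFT assumed to lie in -255..255")
    return phase_shift * 360.0 / 256.0

def dcm_phase_delay_ns(phase_shift, clock_hz):
    """Equivalent delay in nanoseconds for a given clock frequency."""
    return phase_shift / 256.0 / clock_hz * 1e9

# Candidate settings to try, one boffile each.
for value in (0, 32, 64, 96, 128):
    print(value, dcm_phase_degrees(value), dcm_phase_delay_ns(value, 400e6))
```

Stepping through a coarse list like this first and then refining around the best value keeps the number of recompiles down.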

Regards. I hope this trick will also help someone else.


[casper] SOLVED: ROACH 2's suddenly freezing left and right

2013-03-15 Thread G Jones
Hi,
It should have occurred to me sooner, but I checked through the commit logs
for mlib_devel and remembered I had updated from ska-sa a couple of weeks
ago to get the bugfix for the rcs block. In doing so, I had also pulled
down this commit:

https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
"Simplified the EPB to OPB 32bit bus cycle and now supports legacy byte
enable support for ROACH 1 modules on ROACH 2."

which sounds suspicious since the problem seemed to be related to reading
and writing brams/software registers.

Indeed, when I switched over to the commit right before that one and
compiled the same test design, I ended up with a boffile that has not yet
crashed (the bad bof would have certainly crashed by now).

The design is simply two ADC5Gs connected to snapshot blocks. The ADCs
are clocked at 2880 MHz, so the FPGA is running at 180 MHz.  I'm not sure
if the problem is some interaction between the ADC5Gs and this commit, or
the clock rate or what.

Henno, can you double check the code in this commit and see if you can
ascertain where the bug might be?

Glenn

On Thu, Mar 14, 2013 at 12:00 PM, G Jones  wrote:

> Hi,
> For some unknown reason, boffiles I generate with my toolflow cause
> ROACH 2's to freeze up after a few minutes (I think related to I/O to
> software registers and shared BRAMs rather than any specific amount of
> time). I don't know of any changes I made to my toolflow since the
> last time I compiled working boffiles. Previously working boffiles
> still work, but recompiled designs do not work. The symptom is that
> the python katcp client stops responding. SSHing to the ROACH and
> running ps shows that tcpborphserver3 is no longer running. It finally
> occurred to me to check dmesg, and on all crashed ROACHs, I see this
> in dmesg:
>
> ...
> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 1
> attempting led toggle
> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 0
> attempting led toggle
> About to toggle cpu_rdy pinMachine check in kernel mode.
> Data Read PLB Error
> Oops: Machine check, sig: 7 [#1]
> PowerPC 44x Platform
> Modules linked in:
> NIP: 0fea4048 LR: 0fea3f88 CTR: 0004
> REGS: ef00bf10 TRAP: 0214   Not tainted  (3.7.0-rc2+)
> MSR: 0002d000   CR: 2224  XER: 
> TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
> GPR00:  bfcb7290 48031e20 10628bf9 4802c010 0004 0018
> 7f7f7f7f
> GPR08:  10628bf0 10628ba0 0fea3f80 2222 1006ba18 
> 
> GPR16:       
> 
> GPR24:    0004 10628bf9 10628bf9 0ff91ff4
> 4802c011
> NIP [0fea4048] 0xfea4048
> LR [0fea3f88] 0xfea3f88
> Call Trace:
> ---[ end trace 59d28c137ef7dde2 ]---
>
> roach VMA close
> roach release mem called
>
> -
>
> If I then try to reboot the ROACH with shutdown -r now, it hardfreezes
> and requires a power cycle to get it running again.
>
> Any ideas where to look for this problem?
>
> Thanks,
> Glenn
>