Hi Glenn

Is it possible to send me your model file?

I have a fairly sizable design running with these changes, with many
registers, shared BRAMs and snap blocks, without issues.

You mentioned that the design crashes after a while - could you give me a
more precise indication of the time span?

Regards
Henno

On Fri, Mar 15, 2013 at 3:28 PM, G Jones <glenn.calt...@gmail.com> wrote:

> Hi,
> It should have occurred to me sooner, but I checked through the commit
> logs for mlib_devel and remembered I had updated from ska-sa a couple of
> weeks ago to get the bugfix for the rcs block. In doing so, I had also
> pulled down this commit:
>
>
> https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
> "Simplified the EPB to OPB 32bit bus cycle and now supports legacy byte
> enable support for ROACH 1 modules on ROACH 2."
>
> which sounds suspicious since the problem seemed to be related to
> reading/writing BRAMs/software registers.
>
> Indeed, when I switched over to the commit right before that one and
> compiled the same test design, I ended up with a boffile that has not yet
> crashed (the bad bof would have certainly crashed by now).
>
> The design is simply two ADC5Gs connected to snapshot blocks. The ADCs
> are clocked at 2880 MHz, so the FPGA is running at 180 MHz.  I'm not sure
> if the problem is some interaction between the ADC5Gs and this commit, or
> the clock rate or what.
>
> Henno, can you double check the code in this commit and see if you can
> ascertain where the bug might be?
>
> Glenn
>
> On Thu, Mar 14, 2013 at 12:00 PM, G Jones <glenn.calt...@gmail.com> wrote:
>
>> Hi,
>> For some unknown reason, boffiles I generate with my toolflow cause
>> ROACH 2s to freeze up after a few minutes (I think related to I/O to
>> software registers and shared BRAMs rather than any specific amount of
>> time). I don't know of any changes I made to my toolflow since the
>> last time I compiled working boffiles. Previously working boffiles
>> still work, but recompiled designs do not work. The symptom is that
>> the python katcp client stops responding. SSHing to the ROACH and
>> running ps shows that tcpborphserver3 is no longer running. It finally
>> occurred to me to check dmesg, and on all crashed ROACHs, I see this
>> in the dmesg:
>>
>> ...
>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 1
>> attempting led toggle
>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 0
>> attempting led toggle
>> About to toggle cpu_rdy pinMachine check in kernel mode.
>> Data Read PLB Error
>> Oops: Machine check, sig: 7 [#1]
>> PowerPC 44x Platform
>> Modules linked in:
>> NIP: 0fea4048 LR: 0fea3f88 CTR: 00000004
>> REGS: ef00bf10 TRAP: 0214   Not tainted  (3.7.0-rc2+)
>> MSR: 0002d000 <CE,EE,PR,ME>  CR: 20000224  XER: 00000000
>> TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
>> GPR00: 00000000 bfcb7290 48031e20 10628bf9 4802c010 00000004 00000018 7f7f7f7f
>> GPR08: 00000000 10628bf0 10628ba0 0fea3f80 20000222 1006ba18 00000000 00000000
>> GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> GPR24: 00000000 00000000 00000000 00000004 10628bf9 10628bf9 0ff91ff4 4802c011
>> NIP [0fea4048] 0xfea4048
>> LR [0fea3f88] 0xfea3f88
>> Call Trace:
>> ---[ end trace 59d28c137ef7dde2 ]---
>>
>> roach VMA close
>> roach release mem called
>>
>> -----
>>
>> If I then try to reboot the ROACH with shutdown -r now, it hard-freezes
>> and requires a power cycle to get it running again.
>>
>> Any ideas where to look for this problem?
>>
>> Thanks,
>> Glenn
>>
>
>


-- 
Henno Kriel

DSP Engineer
Digital Back End
meerKAT

SKA South Africa
Third Floor
The Park
Park Road (off Alexandra Road)
Pinelands
7405
Western Cape
South Africa

Latitude: -33.94329 (South); Longitude: 18.48945 (East).

(p) +27 (0)21 506 7300
(p) +27 (0)21 506 7365 (direct)
(f) +27 (0)21 506 7375
(m) +27 (0)84 504 5050
