[casper] ROACH 2's suddenly freezing left and right

2013-03-14 Thread G Jones
Hi,
For some unknown reason, boffiles I generate with my toolflow cause
ROACH 2s to freeze up after a few minutes (I think the freeze is related
to I/O to software registers and shared BRAMs rather than to any specific
amount of time). I am not aware of any changes to my toolflow since the
last time I compiled working boffiles. Previously working boffiles
still work, but recompiled designs do not. The symptom is that
the Python katcp client stops responding. SSHing to the ROACH and
running ps shows that tcpborphserver3 is no longer running. It finally
occurred to me to check dmesg, and on all crashed ROACHs I see this
in the dmesg output:

...
About to toggle cpu_rdy pin7r2case_event(): Got type 11, code 8, value 1
attempting led toggle
About to toggle cpu_rdy pin7r2case_event(): Got type 11, code 8, value 0
attempting led toggle
About to toggle cpu_rdy pinMachine check in kernel mode.
Data Read PLB Error
Oops: Machine check, sig: 7 [#1]
PowerPC 44x Platform
Modules linked in:
NIP: 0fea4048 LR: 0fea3f88 CTR: 0004
REGS: ef00bf10 TRAP: 0214   Not tainted  (3.7.0-rc2+)
MSR: 0002d000 CE,EE,PR,ME  CR: 2224  XER: 
TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
GPR00:  bfcb7290 48031e20 10628bf9 4802c010 0004 0018 7f7f7f7f
GPR08:  10628bf0 10628ba0 0fea3f80 2222 1006ba18  
GPR16:        
GPR24:    0004 10628bf9 10628bf9 0ff91ff4 4802c011
NIP [0fea4048] 0xfea4048
LR [0fea3f88] 0xfea3f88
Call Trace:
---[ end trace 59d28c137ef7dde2 ]---

roach VMA close
roach release mem called
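
To check several boards for the same signature, the dmesg output can be scanned for the machine-check oops and the task it names. The following is a minimal sketch, assuming the dmesg text has already been saved or captured as a string; the function name find_machine_checks and the sample text are illustrative and not part of any CASPER tool.

```python
import re

# Matches the "TASK = <addr>[<pid>] '<name>'" line printed in a kernel oops.
TASK_RE = re.compile(r"TASK = [0-9a-f]+\[(\d+)\] '([^']+)'")


def find_machine_checks(dmesg_text):
    """Return a list of (pid, task_name) tuples, one per machine-check oops.

    If an oops ends without a TASK line, (None, None) is recorded for it.
    """
    hits = []
    in_oops = False
    for line in dmesg_text.splitlines():
        if "Oops: Machine check" in line:
            in_oops = True
            continue
        if in_oops:
            match = TASK_RE.search(line)
            if match:
                hits.append((int(match.group(1)), match.group(2)))
                in_oops = False
            elif "end trace" in line:
                # The oops finished before any TASK line was seen.
                hits.append((None, None))
                in_oops = False
    return hits


# Excerpt copied from the log above, used only to exercise the parser.
sample = """\
About to toggle cpu_rdy pinMachine check in kernel mode.
Data Read PLB Error
Oops: Machine check, sig: 7 [#1]
PowerPC 44x Platform
TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
---[ end trace 59d28c137ef7dde2 ]---
"""

print(find_machine_checks(sample))
```

On the excerpt above this reports PID 516 and tcpborphserver3 as the task that was running when the machine check occurred.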

If I then try to reboot the ROACH with shutdown -r now, it hard-freezes
and requires a power cycle to get it running again.

Any ideas where to look for this problem?

Thanks,
Glenn



Re: [casper] ROACH 2's suddenly freezing left and right

2013-03-14 Thread G Jones
Also, I meant to mention that I've checked coreinfo.tab etc., and they
are identical between the working and non-working bofs.
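
A comparison like this can be scripted so it is easy to repeat after each recompile. The following is a minimal sketch, assuming the two coreinfo.tab files have already been read into strings; the function name diff_tables and the file labels are hypothetical, and the sketch does a plain line-by-line text comparison without assuming anything about the file's column layout.

```python
import difflib


def diff_tables(working_text, broken_text):
    """Return unified-diff lines between two table files (empty if identical)."""
    return list(
        difflib.unified_diff(
            working_text.splitlines(),
            broken_text.splitlines(),
            fromfile="working/coreinfo.tab",  # label only, hypothetical path
            tofile="broken/coreinfo.tab",     # label only, hypothetical path
            lineterm="",
        )
    )


def tables_match(working_text, broken_text):
    """True when the two files have identical lines."""
    return not diff_tables(working_text, broken_text)
```

An empty result from diff_tables corresponds to the "identical" outcome described above; any returned lines show exactly which entries differ.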
