Hi Glenn,

Is it possible to send me your model file?
I have a fairly sizable design running with these changes, with many registers, shared BRAMs and snap blocks, without issues. You mentioned that the design crashes after a while - could you give me a more precise indication of the time span?

Regards
Henno

On Fri, Mar 15, 2013 at 3:28 PM, G Jones <glenn.calt...@gmail.com> wrote:

> Hi,
> It should have occurred to me sooner, but I checked through the commit
> logs for mlib_devel and remembered I had updated from ska-sa a couple of
> weeks ago to get the bugfix for the rcs block. In doing so, I had also
> pulled down this commit:
>
> https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
> "Simplified the EPB to OPB 32bit bus cycle and now supports legacy byte
> enable support for ROACH 1 modules on ROACH 2."
>
> which sounds suspicious since the problem seemed to be related to
> reading/writing BRAMs and software registers.
>
> Indeed, when I switched over to the commit right before that one and
> compiled the same test design, I ended up with a boffile that has not yet
> crashed (the bad bof would have certainly crashed by now).
>
> The design is simply two ADC5Gs connected to snapshot blocks. The ADCs
> are clocked at 2880 MHz, so the FPGA is running at 180 MHz. I'm not sure
> if the problem is some interaction between the ADC5Gs and this commit, or
> the clock rate or what.
>
> Henno, can you double check the code in this commit and see if you can
> ascertain where the bug might be?
>
> Glenn
>
> On Thu, Mar 14, 2013 at 12:00 PM, G Jones <glenn.calt...@gmail.com> wrote:
>
>> Hi,
>> For some unknown reason, boffiles I generate with my toolflow cause
>> ROACH 2s to freeze up after a few minutes (I think related to I/O to
>> software registers and shared BRAMs rather than any specific amount of
>> time). I don't know of any changes I made to my toolflow since the
>> last time I compiled working boffiles. Previously working boffiles
>> still work, but recompiled designs do not work. The symptom is that
>> the python katcp client stops responding. SSHing to the ROACH and
>> running ps shows that tcpborphserver3 is no longer running. It finally
>> occurred to me to check dmesg, and on all crashed ROACHs, I see this
>> in the dmesg output:
>>
>> ...
>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 1
>> attempting led toggle
>> About to toggle cpu_rdy pin<7>r2case_event(): Got type 11, code 8, value 0
>> attempting led toggle
>> About to toggle cpu_rdy pinMachine check in kernel mode.
>> Data Read PLB Error
>> Oops: Machine check, sig: 7 [#1]
>> PowerPC 44x Platform
>> Modules linked in:
>> NIP: 0fea4048 LR: 0fea3f88 CTR: 00000004
>> REGS: ef00bf10 TRAP: 0214 Not tainted (3.7.0-rc2+)
>> MSR: 0002d000 <CE,EE,PR,ME> CR: 20000224 XER: 00000000
>> TASK = efb54060[516] 'tcpborphserver3' THREAD: ef00a000
>> GPR00: 00000000 bfcb7290 48031e20 10628bf9 4802c010 00000004 00000018 7f7f7f7f
>> GPR08: 00000000 10628bf0 10628ba0 0fea3f80 20000222 1006ba18 00000000 00000000
>> GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> GPR24: 00000000 00000000 00000000 00000004 10628bf9 10628bf9 0ff91ff4 4802c011
>> NIP [0fea4048] 0xfea4048
>> LR [0fea3f88] 0xfea3f88
>> Call Trace:
>> ---[ end trace 59d28c137ef7dde2 ]---
>>
>> roach VMA close
>> roach release mem called
>>
>> -----
>>
>> If I then try to reboot the ROACH with shutdown -r now, it hard-freezes
>> and requires a power cycle to get it running again.
>>
>> Any ideas where to look for this problem?
>>
>> Thanks,
>> Glenn
>>
>
>

--
Henno Kriel

DSP Engineer
Digital Back End
meerKAT

SKA South Africa
Third Floor
The Park
Park Road (off Alexandra Road)
Pinelands
7405
Western Cape
South Africa

Latitude: -33.94329 (South); Longitude: 18.48945 (East).

(p) +27 (0)21 506 7300
(p) +27 (0)21 506 7365 (direct)
(f) +27 (0)21 506 7375
(m) +27 (0)84 504 5050