Re: [casper] ROACH hangs before Uboot prompt
Hi Jason,

thanks for your email. All your help getting our systems up and running is very much appreciated.

I have upgraded to the latest version of uBoot as per your suggestion. I also reinstalled the latest filesystem I could find: filesystem_etch_2009_10_28_tgtap.bz

Unfortunately, it still dies before the end of the boot sequence. With both usbboot and nfsboot, the boot sequence gets almost to the end before it dies with some malloc problem. The exact cause is hard to determine, but I have attached some logs for your reference. (Note that these were taken before I upgraded the filesystem, but doing so changed nothing.)

The main things to note are:

Memory error message before the uboot prompt:
    Memory error at 0150, wrote , read ffeb !
Malloc error at the end of the boot sequence

So, at this stage I have run out of ideas. Unless you have any other suggestions, the best option may be to wait for Wan to come back, so that he can use his newfound knowledge to update everything on this board to the same working state that his ROACH will hopefully be in.

thanks again, and cheers
Kjetil

Jason Manley wrote:

Hi Kjetil

Since decreasing the bus speed back to the normal 66.67MHz (133MHz memory) and using registered DIMMs, we have not had any further memory troubles. This has been checked on 5 different ROACH boards from production runs 1 and 2. Dave's even put one of the boards in a burn-in bootloop and after over a thousand boots, still not a single failure. But perhaps this sample size is too small to be meaningful.

Preparation H was trying to operate the memory right at the PPC's limits (167MHz), and it seems to be an unrealistic target no matter how carefully the board is designed. The PPC system just does not run reliably at those speeds. We knew this might be a problem from the outset and so made provision for booting at slower speeds through the use of DIP switches or remotely reconfiguring from the XPORT.

I am surprised that you're still seeing memory errors.
Both registered and unregistered DIMMs work for me at these speeds.

I doubt there will be a hardware fix for this. We could rev the board and try'n tighten-up the timing, but I am confident in the hardware design at this stage and suspect a software issue on your side. For one, there is an error in the floating-point-unit test that Uboot doesn't like which might be causing some of your troubles. Could you please provide a printout of Uboot's error messages?

There is now a new version of Uboot in SVN (uboot-clkfix-20091113.bin) which has this test disabled. Please give that a go.

You do not need any DIMM in the FPGA to boot the PPC. And with registered DIMMs and bootstrap option C, you should not be seeing any more memory errors.

If updating uboot doesn't fix your problem, I suggest you send all your ROACHs and memory modules with Wan next week and we'll get 'em all up and running with the latest firmware versions for you.

Jason

On 13 Nov 2009, at 02:58, Kjetil Wormnes wrote:

Hi again Jason,

thanks again. Swapping to the FPGA DIMM did indeed get me to the uboot prompt (although with the same memory error messages along the way). However, I have not been able to then proceed to boot the kernel, but this may be because I now have no memory available to the FPGA. It may also be the other bugs you spoke of.

I should also mention that I took the opportunity of upgrading uboot to svn2226. The problem still persists, and only using the FPGA DIMM gets me to the uboot prompt.

To be honest, the solutions here, swapping to registered DIMMs or changing to bootstrap C, seem more like hacks than anything else, and certainly not something that inspires confidence in reliability. Do you think the underlying problem, i.e. the poor signal integrity or the aggressive bus timing, is fixable by bug-fixes to uboot? Or is this something that will require an upgrade to the hardware? If the latter, we would love (and need) to know, as it means we will probably need to delay our program until we can get more reliable hardware. If it is a software issue, then I would have to ask whether you think the fix could get quite a high priority?

But on another note, I pretty desperately need to get this thing booting again, even with the reliability issues we had before. This is so that I can work on some of the software interfaces while Wan is over your way. So, I guess I might try to go out and buy a registered DIMM. Could you please advise me on some of the other specs that are important? 512 MB DDR2 at 400 MHz seems very difficult to get hold of, and anything else appears to cause the memory errors I mentioned. Are these important? Or should it work if I got, say, a 1 GB stick at some faster speed? I noticed the FPGA RAM is 800 MHz, which should be obtainable... that is... if those errors are something we can live with for
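On the question raised in Jason's reply of whether over a thousand clean boots is a meaningful sample: one rough way to put a number on it is to ask what per-boot failure probability would still be consistent with seeing zero failures. The sketch below does that calculation. It assumes boots are independent and share one fixed failure probability, which is an assumption of this example and not something established in the thread. The function name `failure_rate_upper_bound` and the 95% confidence level are illustrative choices.

```python
def failure_rate_upper_bound(boots: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-boot failure probability after `boots`
    consecutive boots with zero failures.

    Assumes independent boots with a single fixed failure probability p
    (an assumption of this sketch). With zero failures observed, the bound
    is the p that solves (1 - p) ** boots == 1 - confidence.
    """
    if boots <= 0:
        raise ValueError("boots must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / boots)


if __name__ == "__main__":
    # 1000 is a round stand-in for "over a thousand boots".
    bound = failure_rate_upper_bound(1000)
    print(f"95% upper bound on per-boot failure rate: {bound:.4%}")
```

For 1000 clean boots this gives an upper bound of roughly 0.3% per boot, so a fault that shows up once in several hundred boots could plausibly slip through such a test. It also says nothing about variation between boards, since the bootloop ran on a single unit.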
Re: [casper] ROACH hangs before Uboot prompt
Hi again Jason,

thanks again. Swapping to the FPGA DIMM did indeed get me to the uboot prompt (although with the same memory error messages along the way). However, I have not been able to then proceed to boot the kernel, but this may be because I now have no memory available to the FPGA. It may also be the other bugs you spoke of.

I should also mention that I took the opportunity of upgrading uboot to svn2226. The problem still persists, and only using the FPGA DIMM gets me to the uboot prompt.

To be honest, the solutions here, swapping to registered DIMMs or changing to bootstrap C, seem more like hacks than anything else, and certainly not something that inspires confidence in reliability. Do you think the underlying problem, i.e. the poor signal integrity or the aggressive bus timing, is fixable by bug-fixes to uboot? Or is this something that will require an upgrade to the hardware? If the latter, we would love (and need) to know, as it means we will probably need to delay our program until we can get more reliable hardware. If it is a software issue, then I would have to ask whether you think the fix could get quite a high priority?

But on another note, I pretty desperately need to get this thing booting again, even with the reliability issues we had before. This is so that I can work on some of the software interfaces while Wan is over your way. So, I guess I might try to go out and buy a registered DIMM. Could you please advise me on some of the other specs that are important? 512 MB DDR2 at 400 MHz seems very difficult to get hold of, and anything else appears to cause the memory errors I mentioned. Are these important? Or should it work if I got, say, a 1 GB stick at some faster speed? I noticed the FPGA RAM is 800 MHz, which should be obtainable... that is... if those errors are something we can live with for now.

cheers
Kjetil
[casper] ROACH hangs before Uboot prompt
Hi all,

new week, new problem. I tried to boot my ROACH board this morning after not touching it for about 1.5 weeks. This time, however, it didn't get to the uBoot prompt. It hangs at the memory test. Interestingly, nothing has changed since it last booted. Anyway, this is what is displayed:

    U-Boot 2008.10-svn2212 (Aug 7 2009 - 12:20:58)
    CPU: AMCC PowerPC 440EPx Rev. A at 528 MHz (PLB=132, OPB=66, EBC=66 MHz)
    No Security/Kasumi support
    Bootstrap Option C - Boot ROM Location EBC (16 bits)
    32 kB I-Cache 32 kB D-Cache
    Board: Roach
    I2C: ready
    DTT: 1 is 26 C
    DRAM: (spd v1.0) 512 MB

I noticed that C-H Cheng had posted about this exact same problem in August this year (http://www.mail-archive.com/casper@lists.berkeley.edu/msg00870.html). In their case the problem seems to have been solved by swapping the memory stick and upgrading uboot using a JTAG programmer.

As you can see above, I tried using SW3 to force Bootstrap option C. This did not make a difference.

I have tried to swap the memory. I could not find exactly the same stick, so I got a 1 GB stick at a higher speed. This just caused the system to return a memory error:

    snip
    DRAM: (spd v1.2) 1 GB
    Memory error at 0004, wrote , read 0055 !

I tried to swap with an identical memory stick from our other ROACH board. This did not make a difference.

Before I send this board back to be reprogrammed, I was wondering if anyone would have any other suggestions?

Thank you for all your continuing help

regards
Kjetil
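When collecting serial logs from several boots, it can help to pull the memory-test failures out automatically. Here is a minimal sketch that scans captured console text for lines shaped like the `Memory error at ..., wrote ..., read ... !` message shown above. The pattern is inferred only from the examples in this thread. Treating the fields as hexadecimal and allowing an empty `wrote` field (as it appears here) are assumptions, and the names `MemTestError` and `find_memory_errors` are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class MemTestError(NamedTuple):
    """One memory-test failure parsed from console output (hypothetical type)."""
    address: int
    wrote: Optional[int]  # None when the field is empty in the log
    read: int


# Pattern inferred from the messages in this thread; fields assumed to be hex.
_MEM_ERROR_RE = re.compile(
    r"Memory error at\s+([0-9A-Fa-f]+),\s*wrote\s*([0-9A-Fa-f]*)\s*,"
    r"\s*read\s+([0-9A-Fa-f]+)\s*!"
)


def find_memory_errors(console_text: str) -> List[MemTestError]:
    """Return every memory-test failure found in the captured console text."""
    errors = []
    for match in _MEM_ERROR_RE.finditer(console_text):
        addr, wrote, read = match.groups()
        errors.append(
            MemTestError(
                address=int(addr, 16),
                wrote=int(wrote, 16) if wrote else None,
                read=int(read, 16),
            )
        )
    return errors


if __name__ == "__main__":
    sample = "DRAM: (spd v1.2) 1 GB\nMemory error at 0004, wrote , read 0055 !\n"
    for err in find_memory_errors(sample):
        print(err)
```

Running this over logs from repeated boots would show whether the failing address and the value read back stay the same from boot to boot or vary, which is a detail worth including when reporting the problem.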
Re: [casper] ROACH hangs before Uboot prompt
Hi Kjetil.

We have occasionally observed a similar problem here. Uboot tries to learn the required memory timing when booting. Sometimes it fails. It seems to be due to aggressive bus timing and poor signal integrity. Switching to registered DIMMs (like the one the FPGA uses) solves that problem, but introduces a new one which appears sporadically later in the Uboot boot process.

We've declocked our boards to Bootstrap C and it solved our memory issues. Since you've already tried this without success, I suggest you try registered DIMMs (put the FPGA DIMM in the PPC slot) and see if that solves this problem for you.

If declocking doesn't fix this, we will have to work on a Uboot fix to enable reliable support for registered DIMMs.

Jason

On 11 Nov 2009, at 03:43, Kjetil Wormnes wrote:

Hi all,

new week, new problem. I tried to boot my ROACH board this morning after not touching it for about 1.5 weeks. This time, however, it didn't get to the uBoot prompt. It hangs at the memory test. Interestingly, nothing has changed since it last booted. Anyway, this is what is displayed:

    U-Boot 2008.10-svn2212 (Aug 7 2009 - 12:20:58)
    CPU: AMCC PowerPC 440EPx Rev. A at 528 MHz (PLB=132, OPB=66, EBC=66 MHz)
    No Security/Kasumi support
    Bootstrap Option C - Boot ROM Location EBC (16 bits)
    32 kB I-Cache 32 kB D-Cache
    Board: Roach
    I2C: ready
    DTT: 1 is 26 C
    DRAM: (spd v1.0) 512 MB

I noticed that C-H Cheng had posted about this exact same problem in August this year (http://www.mail-archive.com/casper@lists.berkeley.edu/msg00870.html). In their case the problem seems to have been solved by swapping the memory stick and upgrading uboot using a JTAG programmer.

As you can see above, I tried using SW3 to force Bootstrap option C. This did not make a difference.

I have tried to swap the memory. I could not find exactly the same stick, so I got a 1 GB stick at a higher speed. This just caused the system to return a memory error:

    snip
    DRAM: (spd v1.2) 1 GB
    Memory error at 0004, wrote , read 0055 !

I tried to swap with an identical memory stick from our other ROACH board. This did not make a difference.

Before I send this board back to be reprogrammed, I was wondering if anyone would have any other suggestions?

Thank you for all your continuing help

regards
Kjetil