Re: [casper] ROACH hangs before Uboot prompt

2009-11-17 Thread Kjetil Wormnes

Hi Jason,

thanks for your email. All your help getting our systems up and running 
is very much appreciated.


I have upgraded to the latest version of uBoot, as you suggested.
I also reinstalled the latest filesystem I could find:
filesystem_etch_2009_10_28_tgtap.bz


Unfortunately, it still dies before the end of the boot sequence.

With both usbboot and nfsboot, the boot sequence gets almost to the end 
before it dies with a malloc error. The exact cause is hard to 
determine, but I have attached some logs for your reference. (Note that 
these were taken before I upgraded the filesystem, but the upgrade 
changed nothing.)

The main things to note are:
- the memory error message before the uboot prompt: Memory error at 0150, 
  wrote , read ffeb !
- the malloc error at the end of the boot sequence.

At this stage I have run out of ideas. Unless you have any other 
suggestions, the best option may be to wait for Wan to come back; he 
can then use his newfound knowledge to bring everything on this board 
up to the same working state that his ROACH will hopefully be in.


Thanks again, and cheers,

Kjetil

Jason Manley wrote:

Hi Kjetil

Since decreasing the bus speed back to the normal 66.67MHz (133MHz  
memory) and using registered DIMMs, we have not had any further memory  
troubles. This has been checked on 5 different ROACH boards from  
production run 1 and 2. Dave's even put one of the boards in a burn-in  
bootloop and after over a thousand boots, still not a single failure.  
But perhaps this sample size is too small to be meaningful.


Preparation H was trying to operate the memory right at the PPC's  
limits (167MHz), and it seems to be an unrealistic target no matter  
how carefully the board is designed. The PPC system just does not run  
reliably at those speeds. We knew this might be a problem from the  
outset and so made provision for booting at slower speeds through the  
use of DIP switches or remotely reconfiguring from the XPORT. I am  
surprised that you're still seeing memory errors. Both registered and  
unregistered DIMMs work for me at these speeds.


I doubt there will be a hardware fix for this. We could rev the board  
and try to tighten up the timing, but I am confident in the hardware  
design at this stage and suspect a software issue on your side. For  
one, there is an error in the floating-point-unit test that Uboot  
doesn't like which might be causing some of your troubles. Could you  
please provide a printout of Uboot's error messages? There is now a  
new version of Uboot in SVN (uboot-clkfix-20091113.bin) which has this  
test disabled. Please give that a go.


You do not need any DIMM in the FPGA to boot the PPC. And with  
registered DIMMs and bootstrap option C, you should not be seeing any  
more memory errors. If updating uboot doesn't fix your problem, I  
suggest you send all your ROACHs and memory modules with Wan next week  
and we'll get 'em all up and running with the latest firmware versions  
for you.


Jason

Re: [casper] ROACH hangs before Uboot prompt

2009-11-12 Thread Kjetil Wormnes

Hi again Jason,

thanks again. Swapping to the FPGA DIMM did indeed get me to the uboot 
prompt (although with the same memory error messages along the way).


However, I have not been able to proceed from there to boot the kernel, 
but this may be because I now have no memory available to the FPGA. It 
may also be due to the other bugs you spoke of.


I should also mention that I took the opportunity of upgrading uboot to 
svn2226. The problem still persists, and only using the FPGA dimm gets 
me to the uboot prompt.


To be honest, the proposed solutions (swapping to registered DIMMs or 
changing to bootstrap option C) seem more like hacks than anything else, 
and certainly do not inspire confidence in the system's reliability.


Do you think the underlying problem, i.e. the poor signal integrity or 
the aggressive bus timing, can be fixed by bug fixes to uboot? Or will 
it require a hardware upgrade? If the latter, we would love (and need) 
to know, as it means we will probably need to delay our program until 
we can get more reliable hardware. If it is a software issue, do you 
think the fix could be given a fairly high priority?


On another note, I quite desperately need to get this thing booting 
again, even with the reliability issues we had before, so that I can 
work on some of the software interfaces while Wan is over your way.


So I might go out and buy some registered DIMMs. Could you please 
advise me on the other specs that are important? 512 MB DDR2 at 400 MHz 
seems very difficult to get hold of, and anything else appears to cause 
the memory errors I mentioned. Do these specs matter? Or should it work 
if I got, say, a 1 GB stick at some faster speed? I noticed the FPGA RAM 
is 800 MHz, which should be obtainable... that is, if those errors are 
something we can live with for now.


cheers

Kjetil

[casper] ROACH hangs before Uboot prompt

2009-11-10 Thread Kjetil Wormnes

Hi all,

new week, new problem.

I tried to boot my Roach board this morning after not touching it for 
about 1.5 weeks. This time, however, it did not get to the uBoot prompt; 
it hangs at the memory test. Interestingly, nothing has changed since it 
last booted successfully. Anyway, this is what is displayed:



U-Boot 2008.10-svn2212 (Aug  7 2009 - 12:20:58)

CPU:   AMCC PowerPC 440EPx Rev. A at 528 MHz (PLB=132, OPB=66, EBC=66 MHz)
   No Security/Kasumi support
   Bootstrap Option C - Boot ROM Location EBC (16 bits)
   32 kB I-Cache 32 kB D-Cache
Board: Roach
I2C:   ready
DTT:   1 is 26 C
DRAM:  (spd v1.0) 512 MB


I noticed that C-H Cheng had posted about this exact same problem in 
August this year.

(http://www.mail-archive.com/casper@lists.berkeley.edu/msg00870.html).

In their case the problem seems to have been solved by swapping the 
memory stick and upgrading uboot using a JTAG programmer.


As you can see above, I tried using SW3 to force Bootstrap option C. 
This did not make a difference.


I have tried swapping the memory. I could not find an identical stick, 
so I got a 1 GB stick rated at a higher speed. This just caused the 
system to report a memory error:

snip
DRAM:  (spd v1.2)  1 GB
Memory error at 0004, wrote , read 0055 !


I tried to swap with an identical memory stick from our other Roach 
board. This did not make a difference.


Before I send this board back to be reprogrammed I was wondering if 
anyone would have any other suggestions?


Thank you for all your continuing help

regards

Kjetil



Re: [casper] ROACH hangs before Uboot prompt

2009-11-10 Thread Jason Manley

Hi Kjetil.

We have occasionally observed a similar problem here. Uboot tries to  
learn the required memory timing when booting. Sometimes it fails.  
It seems to be due to aggressive bus timing and poor signal integrity.  
Switching to registered DIMMs (like the one the FPGA uses) solves that  
problem, but introduces a new one which appears sporadically later in  
the Uboot boot process.
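
As a rough illustration of what "learning the memory timing" can involve, the sketch below simulates one common calibration approach. It sweeps a read-capture delay, runs a write/read-back test at each setting, and picks the centre of the widest passing window. It is a hypothetical Python model, not Uboot's actual DDR code. The delay range, the test patterns, and the names `calibrate` and `make_channel` are assumptions made for illustration. The model shows why marginal bus timing or signal integrity causes trouble: the passing window narrows, and if it disappears altogether, calibration fails.

```python
from typing import Callable, Optional, Sequence

# read_word(delay, pattern) -> value read back after writing `pattern`
ReadFn = Callable[[int, int], int]

# Hypothetical test patterns (assumption; real firmware uses its own set).
PATTERNS = (0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555)


def readback_ok(delay: int, read_word: ReadFn) -> bool:
    """True if every pattern reads back unchanged at this delay setting."""
    return all(read_word(delay, p) == p for p in PATTERNS)


def calibrate(delays: Sequence[int], read_word: ReadFn) -> Optional[int]:
    """Return the delay at the centre of the widest passing window, or None."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, d in enumerate(delays):
        if readback_ok(d, read_word):
            if run_len == 0:
                run_start = i  # a new passing run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a failing setting ends the current run
    if best_len == 0:
        return None  # nothing passed: the timing could not be "learned"
    return delays[best_start + (best_len - 1) // 2]


def make_channel(lo: int, hi: int) -> ReadFn:
    """Hypothetical simulated memory path: reads are correct only for lo <= delay <= hi."""
    def read_word(delay: int, pattern: int) -> int:
        # Outside the window, model a marginal signal as one flipped bit.
        return pattern if lo <= delay <= hi else pattern ^ 0x100
    return read_word


if __name__ == "__main__":
    delays = list(range(64))
    print(calibrate(delays, make_channel(20, 40)))  # wide window -> 30
    print(calibrate(delays, make_channel(33, 34)))  # narrow window -> 33
    print(calibrate(delays, make_channel(1, 0)))    # empty window -> None
```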


We've declocked our boards to Bootstrap C and it solved our memory  
issues. Since you've already tried this without success, I suggest you  
try registered DIMMs (put the FPGA dimm in the PPC slot) and see if  
that solves this problem for you.


If declocking doesn't fix this, we will have to work on a Uboot fix to  
enable reliable support for registered DIMMs.


Jason

