Re: [casper] ROACH hangs before Uboot prompt
Hi Jason, thanks for your email. All your help getting our systems up and running is very much appreciated.

I have upgraded to the latest version of uBoot as per your suggestion. I also reinstalled the latest filesystem I could find, filesystem_etch_2009_10_28_tgtap.bz. Unfortunately, it still dies before the end of the boot sequence. With both usbboot and nfsboot, the boot sequence gets almost to the end and then dies with a malloc problem. The exact cause is hard to determine, but I have attached some logs for your reference. (Note that these were taken before I upgraded the filesystem, but upgrading it changed nothing.) The main things to note are:
--the memory error message before the uboot prompt: Memory error at 0150, wrote , read ffeb !
--the malloc error at the end of the boot sequence.

So, at this stage I have run out of ideas. Unless you have any other suggestions, the best option may be to wait for Wan to come back; he can then try to use his newfound knowledge to bring everything on this board to the same working state that his ROACH will hopefully be in.

Thanks again, and cheers,
Kjetil

Jason Manley wrote:
Hi Kjetil

Since decreasing the bus speed back to the normal 66.67 MHz (133 MHz memory) and using registered DIMMs, we have not had any further memory troubles. This has been checked on 5 different ROACH boards from production runs 1 and 2. Dave has even put one of the boards in a burn-in boot loop, and after over a thousand boots there has still not been a single failure. But perhaps this sample size is too small to be meaningful.

Preparation H (bootstrap option H) was trying to operate the memory right at the PPC's limits (167 MHz), and that seems to be an unrealistic target no matter how carefully the board is designed. The PPC system just does not run reliably at those speeds. We knew this might be a problem from the outset, and so made provision for booting at slower speeds through the DIP switches or by remote reconfiguration from the XPORT. I am surprised that you're still seeing memory errors.
Both registered and unregistered DIMMs work for me at these speeds. I doubt there will be a hardware fix for this. We could rev the board and try'n tighten up the timing, but I am confident in the hardware design at this stage and suspect a software issue on your side. For one, there is an error in the floating-point-unit test that Uboot doesn't like, which might be causing some of your trouble. Could you please provide a printout of Uboot's error messages? There is now a new version of Uboot in SVN (uboot-clkfix-20091113.bin) which has this test disabled. Please give that a go.

You do not need any DIMM in the FPGA to boot the PPC. And with registered DIMMs and bootstrap option C, you should not be seeing any more memory errors. If updating Uboot doesn't fix your problem, I suggest you send all your ROACHes and memory modules with Wan next week and we'll get 'em all up and running with the latest firmware versions for you.

Jason

On 13 Nov 2009, at 02:58, Kjetil Wormnes wrote:
Hi again Jason, thanks again. Swapping to the FPGA DIMM did indeed get me to the uboot prompt (although with the same memory error messages along the way). However, I have not been able to proceed from there to boot the kernel, but this may be because I now have no memory available to the FPGA. It may also be the other bugs you spoke of. I should also mention that I took the opportunity to upgrade uboot to svn2226. The problem still persists, and only the FPGA DIMM gets me to the uboot prompt. To be honest, the solutions of swapping to registered DIMMs or changing to bootstrap C seem more like hacks than anything else, and certainly not something that inspires confidence in reliability. Do you think the underlying problem, i.e. the poor signal integrity or the aggressive bus timing, is fixable by bug fixes to uboot? Or is this something that will require an upgrade to the hardware?
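Assuming the "400 MHz" and "800 MHz" figures in this thread are DDR2 speed grades (DDR2-400 and DDR2-800, which are data rates rather than clocks), a little arithmetic relates them to the memory clocks Jason mentions (about 133 MHz with bootstrap C, about 167 MHz with bootstrap option H). This is a hypothetical sketch with function names of our own; it checks only the headline grade, not the SPD timing values U-Boot actually uses to configure the controller.

```python
# Sketch: relate DDR2 speed grades to the PPC memory clocks quoted in this thread.
# Assumption: "400 MHz" / "800 MHz" mean DDR2-400 / DDR2-800, i.e. data rates in MT/s.


def ddr2_io_clock_mhz(grade_mts: float) -> float:
    """DDR2 transfers data on both clock edges, so clock = data rate / 2."""
    return grade_mts / 2


def grade_covers_clock(grade_mts: float, controller_clock_mhz: float) -> bool:
    """True if the DIMM's rated clock is at least the clock the PPC drives.

    Only the headline grade is checked. Whether a faster-rated DIMM actually
    works at a lower clock depends on the timings in its SPD EEPROM, which
    this sketch does not model.
    """
    return ddr2_io_clock_mhz(grade_mts) >= controller_clock_mhz


if __name__ == "__main__":
    # ~133 MHz (bootstrap C) and ~167 MHz (bootstrap option H), from the thread.
    for grade in (400, 800):
        for clock in (133.33, 166.67):
            print(f"DDR2-{grade}: rated clock {ddr2_io_clock_mhz(grade):.0f} MHz, "
                  f"covers {clock} MHz -> {grade_covers_clock(grade, clock)}")
```

On these numbers even a DDR2-400 part is rated well above either clock, which suggests the headline speed grade is probably not the deciding factor; the thread points instead at registered versus unbuffered modules and signal integrity.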
Re: [casper] ROACH hangs before Uboot prompt
Hi again Jason, thanks again. Swapping to the FPGA DIMM did indeed get me to the uboot prompt (although with the same memory error messages along the way). However, I have not been able to proceed from there to boot the kernel, but this may be because I now have no memory available to the FPGA. It may also be the other bugs you spoke of. I should also mention that I took the opportunity to upgrade uboot to svn2226. The problem still persists, and only the FPGA DIMM gets me to the uboot prompt.

To be honest, the solutions of swapping to registered DIMMs or changing to bootstrap C seem more like hacks than anything else, and certainly not something that inspires confidence in reliability. Do you think the underlying problem, i.e. the poor signal integrity or the aggressive bus timing, is fixable by bug fixes to uboot? Or is this something that will require an upgrade to the hardware? If it is the latter, we would love (and need) to know, as it means we will probably need to delay our program until we can get more reliable hardware. If it is a software issue, do you think the fix could be given quite a high priority?

On another note, I pretty desperately need to get this thing booting again, even with the reliability issues we had before, so that I can work on some of the software interfaces while Wan is over your way. So I might go out and buy some registered DIMMs. Could you please advise me on the other specs that are important? 512 MB DDR2 at 400 MHz seems very difficult to get hold of, and anything else appears to cause the memory errors I mentioned. Are these specs important? Or should it work if I got, say, a 1 GB stick at some faster speed? I noticed the FPGA RAM is 800 MHz, which should be obtainable... that is, if those errors are something we can live with for now.

Cheers,
Kjetil

Jason Manley wrote:
Hi Kjetil. We have occasionally observed a similar problem here.
Uboot tries to learn the required memory timing when booting. Sometimes it fails. It seems to be due to aggressive bus timing and poor signal integrity. Switching to registered DIMMs (like the one the FPGA uses) solves that problem, but introduces a new one which appears sporadically later in the Uboot boot process. We've declocked our boards to Bootstrap C and it solved our memory issues. Since you've already tried this without success, I suggest you try registered DIMMs (put the FPGA DIMM in the PPC slot) and see if that solves this problem for you. If declocking doesn't fix this, we will have to work on a Uboot fix to enable reliable support for registered DIMMs.

Jason

On 11 Nov 2009, at 03:43, Kjetil Wormnes wrote:
Hi all, new week, new problem. I tried to boot my ROACH board this morning after not touching it for about 1.5 weeks. This time, however, it didn't get to the uBoot prompt; it hangs at the memory test. Interestingly, nothing has changed since it last booted. Anyway, this is what is displayed:

    U-Boot 2008.10-svn2212 (Aug 7 2009 - 12:20:58)
    CPU:   AMCC PowerPC 440EPx Rev. A at 528 MHz (PLB=132, OPB=66, EBC=66 MHz)
           No Security/Kasumi support
           Bootstrap Option C - Boot ROM Location EBC (16 bits)
           32 kB I-Cache 32 kB D-Cache
    Board: Roach
    I2C:   ready
    DTT:   1 is 26 C
    DRAM:  (spd v1.0) 512 MB

I noticed that C-H Cheng had posted about this exact same problem in August this year (http://www.mail-archive.com/casper@lists.berkeley.edu/msg00870.html). In their case the problem seems to have been solved by swapping the memory stick and upgrading uboot using a JTAG programmer. As you can see above, I tried using SW3 to force Bootstrap option C. This did not make a difference. I have tried to swap the memory; I could not find the exact same stick, so I got a 1 GB stick at a higher speed. This just caused the system to return a memory error:

    [snip]
    DRAM:  (spd v1.2) 1 GB
    Memory error at 0004, wrote , read 0055 !
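Jason's advice hinges on whether a DIMM is registered, and the "(spd v1.0)" / "(spd v1.2)" in the boot logs appear to be the SPD revision that U-Boot reads from the module's EEPROM. As a hypothetical aid for checking a candidate stick, the sketch below decodes a few DDR2 SPD fields from a raw byte dump. The byte offsets are our reading of the JEDEC DDR2 SPD layout and should be verified against the spec; how the bytes are obtained (for example with U-Boot's i2c commands) is not shown, since the SPD's I2C address on ROACH is not given in this thread.

```python
# Sketch: decode a few fields of a DDR2 SPD EEPROM image.
# Offsets follow our reading of the JEDEC DDR2 SPD layout; verify before relying on them.

DDR2_SDRAM = 0x08          # byte 2: fundamental memory type
DIMM_TYPE_BITS = {         # byte 20: DIMM type information
    0x01: "RDIMM (registered)",
    0x02: "UDIMM (unbuffered)",
    0x04: "SO-DIMM",
    0x08: "Micro-DIMM",
    0x10: "Mini-RDIMM (registered)",
    0x20: "Mini-UDIMM",
}
REGISTERED_MASK = 0x01 | 0x10  # RDIMM or Mini-RDIMM


def spd_checksum_ok(spd: bytes) -> bool:
    """Byte 63 should equal the low 8 bits of the sum of bytes 0..62."""
    return (sum(spd[:63]) & 0xFF) == spd[63]


def decode_ddr2_spd(spd: bytes) -> dict:
    if len(spd) < 64:
        raise ValueError("need at least the first 64 SPD bytes")
    if spd[2] != DDR2_SDRAM:
        raise ValueError(f"not a DDR2 SPD (byte 2 = 0x{spd[2]:02x})")
    dimm_bits = spd[20] & 0x3F
    revision = spd[62]  # e.g. 0x12 is revision "1.2"
    return {
        "dimm_type": DIMM_TYPE_BITS.get(dimm_bits, f"unknown (0x{spd[20]:02x})"),
        "registered": bool(dimm_bits & REGISTERED_MASK),
        "spd_revision": f"{revision >> 4}.{revision & 0x0F}",
        "checksum_ok": spd_checksum_ok(spd),
    }
```

A `registered: False` result for a stick intended for the PPC slot would be a reason to look elsewhere before buying, given the experience reported in this thread.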
[casper] ROACH hangs before Uboot prompt
Hi all, new week, new problem. I tried to boot my ROACH board this morning after not touching it for about 1.5 weeks. This time, however, it didn't get to the uBoot prompt; it hangs at the memory test. Interestingly, nothing has changed since it last booted. Anyway, this is what is displayed:

    U-Boot 2008.10-svn2212 (Aug 7 2009 - 12:20:58)
    CPU:   AMCC PowerPC 440EPx Rev. A at 528 MHz (PLB=132, OPB=66, EBC=66 MHz)
           No Security/Kasumi support
           Bootstrap Option C - Boot ROM Location EBC (16 bits)
           32 kB I-Cache 32 kB D-Cache
    Board: Roach
    I2C:   ready
    DTT:   1 is 26 C
    DRAM:  (spd v1.0) 512 MB

I noticed that C-H Cheng had posted about this exact same problem in August this year (http://www.mail-archive.com/casper@lists.berkeley.edu/msg00870.html). In their case the problem seems to have been solved by swapping the memory stick and upgrading uboot using a JTAG programmer.

As you can see above, I tried using SW3 to force Bootstrap option C. This did not make a difference. I have tried to swap the memory; I could not find the exact same stick, so I got a 1 GB stick at a higher speed. This just caused the system to return a memory error:

    [snip]
    DRAM:  (spd v1.2) 1 GB
    Memory error at 0004, wrote , read 0055 !

I also tried swapping in an identical memory stick from our other ROACH board. This did not make a difference. Before I send this board back to be reprogrammed, I was wondering if anyone has any other suggestions?

Thank you for all your continuing help.

Regards,
Kjetil
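The "Memory error at ..., wrote ..., read ..." lines come from a write/read-back test: a known pattern is written, read back, and the first address that returns something different is reported. The sketch below illustrates the idea on simulated memory. It is not U-Boot's actual test code; the patterns, the FakeRAM class and the injected stuck-bit fault are all hypothetical.

```python
# Sketch of a write/read-back memory test on a simulated 16-bit memory.
# Illustrative only: not U-Boot's test; FakeRAM and the fault are hypothetical.
from typing import Callable, Iterable, Optional

PATTERNS = (0x0000, 0xFFFF, 0x5555, 0xAAAA)


def pattern_test(size: int,
                 write: Callable[[int, int], None],
                 read: Callable[[int], int],
                 patterns: Iterable[int] = PATTERNS) -> Optional[str]:
    """Fill memory with each pattern, read it all back, report the first mismatch."""
    for pattern in patterns:
        for addr in range(size):
            write(addr, pattern)
        for addr in range(size):
            got = read(addr)
            if got != pattern:
                return f"Memory error at {addr:04x}, wrote {pattern:04x}, read {got:04x} !"
    return None


class FakeRAM:
    """16-bit words; optionally one address has some bits stuck at 0."""

    def __init__(self, size: int, bad_addr: Optional[int] = None, stuck_low: int = 0):
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.stuck_low = stuck_low

    def write(self, addr: int, value: int) -> None:
        if addr == self.bad_addr:
            value &= ~self.stuck_low & 0xFFFF  # stuck bits always store as 0
        self.cells[addr] = value & 0xFFFF

    def read(self, addr: int) -> int:
        return self.cells[addr]


if __name__ == "__main__":
    good = FakeRAM(256)
    bad = FakeRAM(256, bad_addr=0x10, stuck_low=0x0100)
    print(pattern_test(256, good.write, good.read))  # None
    print(pattern_test(256, bad.write, bad.read))    # Memory error at 0010, wrote ffff, read feff !
```

The all-zeros pattern alone would miss a stuck-low bit, which is why such tests cycle through several complementary patterns.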
Re: [casper] Fwd: Re: SPDO ROACH spectrometer
Hi Jason, just out of curiosity, did you get my last email? I noticed that your reply was not to the last one I sent. In that one I detailed some tests and their results, and it also showed the uboot and bootstrap configuration. It had some attachments, and I am unsure how the list handled those. I can't seem to find it in the archives, but it may just not have been listed yet. If you didn't get that email, let me know and I'll send it again.

So, it looks like the clocks and bootstrap are matching (although, to be honest, I only updated the EEPROM to make it boot off H about a week ago). I checked all the resistors you indicated; all except one are within 1 ohm of 51 ohms. One resistor is at 59 ohms, but I don't suspect that should really be a problem.

To be honest, I am a bit reluctant to point to hardware problems, since we have two ROACH boards and the chance that we would have two dud ones seems slim. It seems much more likely to me that the problem is in the software/kernel or firmware, or alternatively that there is something a bit unreliable in the hardware that the software/kernel/firmware is not handling as well as it could.

I greatly appreciate that you are doing some tests on your hardware. I think what I would like to do now is wait for the new uboot and ensure that *everything* (uboot, kernel, CPLD, filesystem, bootargs, physical setup) is identical between our systems. Then it would be good if we could develop a well-specified, simple test; scp-ing a 2 GB file a few times would probably be fine. If there are still mismatches, then we can start worrying about hardware problems.

I'll be out of action at a course most of next week, but since we are waiting for the new uboot anyway, that should not be much of an issue. Wan and/or Aaron may wish to continue this discussion during that week, though; otherwise I'll be back on the 9th. I've cc'd Aaron on this email. Thanks once again. You are being very helpful and I feel that we are making progress.
Best regards,
Kjetil

Jason Manley wrote:
Hi Kjetil

Since you're not using the FPGA at all, that rules out bus issues. I suspect a memory problem. Please check your memory DIMM as outlined in my earlier email.

WRT Uboot versions: we'll work on releasing a new Uboot with the latest SVN source to be sure we're all running the same version. Expect an update next week, after we've had a chance to verify that the new version works correctly.

Please also check your clocks: make sure you're booting with bootstrap option H with the same bus speeds as listed below (check lines 3 and 5 in the Uboot header). If this is the same, then don't worry about updating the Fusion.

    CPU:   AMCC PowerPC 440EPx Rev. A at 495 MHz (PLB=165, OPB=82, EBC=82 MHz)
           No Security/Kasumi support
           Bootstrap Option H - Boot ROM Location I2C (Addr 0x52)
    ...

If it's something else, the clocks are set up incorrectly. To fix this, first check that all DIP switches are set to off. If the DIP switches are off and it's booting into config C, you might need to flash your Fusion (a flag in its EEPROM toggles between boot option C and boot option H). If it is already boot option H but the speeds are wrong, then the settings in an I2C EEPROM are wrong. Reset 'em as follows:

*) Update your Uboot to the latest version.
*) Interrupt Uboot and clear the environment by executing run clearenv.
*) Reboot.
*) Interrupt the boot and execute run init_eeprom.
*) Reconfigure your MAC address by executing setenv ethaddr 02:00:00:aa:bb:cc (where aabbcc is your board's serial number).
*) Save the environment by executing saveenv.
*) Reboot.

FWIW, Dave's managed to transfer large files (2 GB) without problems, even using SCP, both sending and receiving. Tests are ongoing on this side.

Jason

On 05 Nov 2009, at 00:15, Kjetil Wormnes wrote:
Hi Jason, thanks for your pointers. I am currently not actually using the FPGA; I'm just focusing on being able to talk to the PowerPC reliably at the moment.
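The MAC address in the reset steps follows a fixed pattern. A hypothetical helper that builds it is sketched below, assuming the serial number is already written as six hex digits (the thread does not say how to convert other serial formats). It also shows why the 02 prefix is a safe choice: it marks the address as locally administered and unicast, so it cannot collide with vendor-assigned addresses.

```python
# Sketch: build the ethaddr value from the 02:00:00:aa:bb:cc recipe.
# Assumption: serial_hex is already six hex digits, e.g. "aabbcc".
import re


def roach_ethaddr(serial_hex: str) -> str:
    s = serial_hex.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{6}", s):
        raise ValueError("expected six hex digits, e.g. 'aabbcc'")
    return "02:00:00:" + ":".join(s[i:i + 2] for i in range(0, 6, 2))


def is_local_unicast(mac: str) -> bool:
    """First octet: bit 1 set = locally administered, bit 0 clear = unicast."""
    first = int(mac.split(":")[0], 16)
    return bool(first & 0x02) and not first & 0x01


if __name__ == "__main__":
    mac = roach_ethaddr("aabbcc")
    # These are only the last steps; run clearenv / init_eeprom come first.
    print(f"setenv ethaddr {mac}")  # setenv ethaddr 02:00:00:aa:bb:cc
    print("saveenv")
```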
Re: [casper] Fwd: Re: SPDO ROACH spectrometer
Hi Jason, thanks for your pointers. I am currently not actually using the FPGA; I'm just focusing on being able to talk to the PowerPC reliably at the moment. The system also crashes when using NFS, but as I said and you noted, it is more difficult to trace those crashes directly back to EMAC-related kernel functions. It may very well be a secondary symptom of something else.

Now, your suggested versions for Uboot/CPLD/Monitor are interesting. We have two ROACH boards; the newer one, which I have been testing, reports

    U-Boot 2008.10-svn2157 (Jul 31 2009 - 17:15:22)
    ...
    Monitor Revision: 7.3.0
    CPLD Revision:7.5.6

whereas the older ROACH that Wan has been using reports

    U-Boot 2008.10-svn1923 (May 29 2009 - 17:22:43)
    ...
    Monitor Revision: 6.5.1429
    CPLD Revision:2.0.5

Leaving the older one aside for reference for now, I have upgraded the U-Boot image on the newer ROACH to 20090807-uboot-nohack.bin, which is actually from revision 2212, but seemed to be the closest to the suggested revision I could find without compiling the image myself. I looked around unsuccessfully for how to upgrade the CPLD/Monitor. Would you be able to point me in the right direction? I'll test for any improvements with the new uboot now.

Thanks again,
Kjetil

Jason Manley wrote:
Also, make sure you're running newer versions of Uboot and the CPLD image. Bus settings changed some months back and improved stability significantly. Uboot will report the versions, and I recommend:

    U-Boot 2008.10-svn2226 (Aug 7 2009 - 16:06:44)
    ...
    Monitor Revision: 8.3.1698
    CPLD Revision:8.1.0

At the very least, you should have CPLD Revision 8.0.1588. The only outstanding bug that regularly affects me is that u-boot sometimes doesn't detect the PPC's SDRAM on startup. The system then hangs. Replacing the DIMM with registered memory (the same as the FPGA DIMM) apparently fixes this.
Jason

On 04 Nov 2009, at 07:56, Kjetil Wormnes wrote:
Hi all, for reference I've attached a summary of our problems below, along with a few things I have attempted in order to isolate them. The short of it is that we are unable to transfer large amounts of data across the Ethernet reliably, regardless of:
--kernel version
--whether we USB-mount or NFS-mount the root filesystem
--network protocol used for the transfer

The way the crash happens varies and is not repeatable. Sometimes it seems to be a userspace crash; sometimes it is a kernel panic. I have been unable to see any real pattern in the crash reports. To me this seems to indicate that the root cause may be common: either an obscure kernel problem, or possibly something in the interface between the kernel and the hardware, or in the hardware itself.

It wouldn't be a big effort to reimplement our software to run on a remote machine and talk to the ROACH over KATCP, rather than run locally on the PPC. But since it would require a complete rewrite of the software, we haven't tested this yet. Perhaps it is worth trying. The catch is that I am still really unsure whether we are dealing with many symptoms of the same problem, or many different problems. Anyway, I would like to thank you for all your input, and will let you know if and how we find a satisfactory solution.

Cheers,
Kjetil

Here is the summary:

*The problem*
The system crashes when downloading large files. There appear to be varying causes for this crash that may or may not have a common underlying reason. I have attempted to isolate the problem by:
• downloading using different protocols and software: SSH and two different FTP servers;
• mounting the filesystem over NFS as opposed to USB;
• installing well-known, widely used kernels and comparing them to custom kernels.

*SSH*
SSH always crashes with "Invalid MAC on input" or related error messages. This appears to be a problem with SSH.
*FTP*
System instabilities were observed using two different FTP servers, proftpd and pure-ftpd. In the best case, with pure-ftpd I was able to download 2-3 files, each about 2 GB, before the system crashed. Looking through the call stack seemed to indicate that the crash happened in EMAC interface functions (i.e. Ethernet). However, we have no way of knowing whether these crashes are in fact side effects of the USB subsystem misbehaving. Jason from the Casper mailing list has once again confirmed that USB on PowerPCs is notoriously unreliable.

*DIFFERENT KERNELS - DIFFERENT PROBLEMS*
With some kernels (the latest), the link was unable to come up at all, while with both a custom-compiled older kernel (from a couple of months ago) and a downloaded image, uImage-20091006-mmcfix, the link came up, but with all the crashes described.

*ELIMINATING USB AS A CAUSE*
To eliminate the effects of USB, I mounted the root filesystem remotely using NFS. I made a few observations:

*SSH*
Still dies from time to time with the Invalid MAC error message. This was expected, as we have already pretty much determined that this error is ssh
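When comparing two boards, or checking whether an upgrade took, it can help to pull the versions out of a captured U-Boot banner mechanically. The sketch below is hypothetical: the regular expressions assume the banner format quoted in this thread, and the thresholds are simply Jason's recommended U-Boot svn2226 and minimum CPLD revision 8.0.1588.

```python
# Sketch: extract versions from a captured U-Boot banner and compare them with
# the versions recommended in this thread. Regexes assume the quoted format.
import re

RECOMMENDED_UBOOT_SVN = 2226   # "U-Boot 2008.10-svn2226"
MINIMUM_CPLD = (8, 0, 1588)    # "at the very least ... CPLD Revision 8.0.1588"


def _version_tuple(text: str) -> tuple:
    return tuple(int(part) for part in text.strip(".").split("."))


def parse_banner(banner: str) -> dict:
    found = {}
    if m := re.search(r"U-Boot \S+-svn(\d+)", banner):
        found["uboot_svn"] = int(m.group(1))
    if m := re.search(r"Monitor Revision:\s*([\d.]+)", banner):
        found["monitor"] = _version_tuple(m.group(1))
    if m := re.search(r"CPLD Revision:\s*([\d.]+)", banner):
        found["cpld"] = _version_tuple(m.group(1))
    return found


def check_versions(banner: str) -> list:
    """Return human-readable problems; an empty list means nothing to flag."""
    v = parse_banner(banner)
    problems = []
    if v.get("uboot_svn", 0) < RECOMMENDED_UBOOT_SVN:
        problems.append(f"U-Boot svn{v.get('uboot_svn')} is older than svn{RECOMMENDED_UBOOT_SVN}")
    if v.get("cpld", (0,)) < MINIMUM_CPLD:
        problems.append(f"CPLD {v.get('cpld')} is below 8.0.1588")
    return problems
```

Versions are compared as integer tuples, so 7.5.6 correctly sorts below 8.0.1588 and 8.1.0 above it, which plain string comparison would not guarantee.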
Re: [casper] Fwd: Re: SPDO ROACH spectrometer
Hi Jason, thank you again for your reply. I can use FTP, or even write my own little raw-socket transfer routine, and it seems to work: I can transfer a few gigabyte-sized files. However, at the end of this the other problem kicks in, causing a system crash. I believe this is a kernel problem, as it exhibits itself differently with the different kernels I have tried.

So, putting the ssh problem aside as something we can work around, and returning to the other request I made: I am compiling my own kernel because I seem to need to in order to get EHCI and EXT3 to work properly. However, when I do, the EMAC can't autonegotiate a link, and even forcing a link setting doesn't work. The link comes up, then drops out again... repeatedly. The interesting thing is that this problem *does not* occur when I compile my kernel from an svn checkout from a couple of months ago, even with the exact same .config file. At least, that is the case as far as I can tell.

Now, to be 100% sure that it is in fact a difference in the source that is causing this problem, rather than just the .config, I would love it if you could send me the .config file used to compile the uImage-20091006-mmcfix kernel. The Ethernet interface does appear to be more stable with that kernel, but unfortunately I can't use it, as it doesn't allow USB 2.0 speeds, so its .config file would be very useful.

Thanks again for all your help,
Kjetil

Jason Manley wrote:
There appears to be some issue with ssh on ROACH with large transfers. It is definitely not a hardware problem, as other network transfers work fine. Both Andrew Martens and I regularly transfer large amounts of data (1 GB) using KATCP. This ssh bug has become a low priority for us as we concentrate on other things. If you do not want to try'n debug it yourself, I recommend you try an FTP server. Kjetil, you are correct: at present, KATCP does not support transfer of arbitrary files from the filesystem.
Jason

On 02 Nov 2009, at 00:51, Kjetil Wormnes wrote:
Hi Jason, thank you for your reply. The Sun link was very descriptive. Firstly, it appears the problem is still there with the kernel build you suggested. After a few megabytes, the connection closes, telling me: Corrupted MAC on input. But interestingly, it seems to have solved another problem that I was having with one of our ROACH boards. It would be great if you could send me the .config file for that build so I can compare it with mine. I have a custom kernel, as I like ext3 support and a few other bits and pieces, but have been having some issues getting the network to establish a stable link. Now, back to the problem: we have a locally attached hard drive that we write our data to over USB. Occasionally we want to connect and download the data. That's why I am using ssh. I can't really use KATCP for this, can I? Thanks again, Kjetil

Jason Manley wrote:
Um, no, this is probably a different problem. You are getting these errors while using SSH/SCP, right? The hardware problem with a faulty PHY manifests as one or more of the PHY LEDs flashing on/off (there are three red ones next to the PHY chip). If your link is stable, then I believe the hardware is fine. The MAC problem appears to be software related, and comes and goes depending on the kernel build. It does not refer to the MAC address, but rather to ssh's Message Authentication Code. Check out http://blogs.sun.com/janp/entry/ssh_messages_code_bad_packet for some info. Dave's made various changes to try'n fix it, and increasing some software buffer has solved it for me. I no longer see this problem, but it's probably been masked rather than solved. Also, you never see it using KATCP, which is one more reason to use that method for larger transfers. WRT large (1 GB) transfers, remember that it will take a long time to pull that much data off the FPGA. It does so in pages of ~4000 bytes at a time. Also make sure you're using the latest kernel.
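Kjetil mentions writing his own raw-socket transfer routine. If you go that way, adding an end-to-end checksum turns silent corruption into a clear error at the receiver, much as ssh's MAC check does. A minimal sketch, assuming a hypothetical framing (8-byte length, 32-byte SHA-256 digest, then the payload); the loopback demo uses a local socket pair, so no network is involved.

```python
# Sketch of a raw-socket transfer with an end-to-end integrity check.
# Hypothetical framing: 8-byte big-endian length + 32-byte SHA-256, then data.
import hashlib
import socket
import struct
import threading

HEADER = struct.Struct("!Q32s")


def send_blob(sock: socket.socket, data: bytes) -> None:
    sock.sendall(HEADER.pack(len(data), hashlib.sha256(data).digest()))
    sock.sendall(data)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(n - len(buf), 1 << 16))
        if not chunk:
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


def recv_blob(sock: socket.socket) -> bytes:
    length, expected = HEADER.unpack(_recv_exact(sock, HEADER.size))
    data = _recv_exact(sock, length)
    if hashlib.sha256(data).digest() != expected:
        raise ValueError("SHA-256 mismatch: data was corrupted in transit")
    return data


def loopback_roundtrip(data: bytes) -> bytes:
    """Send data across a local socket pair and return what arrived."""
    a, b = socket.socketpair()
    with a, b:
        sender = threading.Thread(target=send_blob, args=(a, data))
        sender.start()
        received = recv_blob(b)
        sender.join()
    return received
```

For multi-gigabyte files you would stream the file in chunks and hash incrementally rather than hold it all in memory; the framing idea stays the same.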
Re: [casper] Fwd: Re: SPDO ROACH spectrometer
Hi Jason, thank you for your reply. The Sun link was very descriptive.

Firstly, it appears the problem is still there with the kernel build you suggested. After a few megabytes, the connection closes, telling me: Corrupted MAC on input. But interestingly, it seems to have solved another problem that I was having with one of our ROACH boards. It would be great if you could send me the .config file for that build so I can compare it with mine. I have a custom kernel, as I like ext3 support and a few other bits and pieces, but have been having some issues getting the network to establish a stable link.

Now, back to the problem: we have a locally attached hard drive that we write our data to over USB. Occasionally we want to connect and download the data. That's why I am using ssh. I can't really use KATCP for this, can I?

Thanks again,
Kjetil

Jason Manley wrote:
Um, no, this is probably a different problem. You are getting these errors while using SSH/SCP, right? The hardware problem with a faulty PHY manifests as one or more of the PHY LEDs flashing on/off (there are three red ones next to the PHY chip). If your link is stable, then I believe the hardware is fine.

The MAC problem appears to be software related, and comes and goes depending on the kernel build. It does not refer to the MAC address, but rather to ssh's Message Authentication Code. Check out http://blogs.sun.com/janp/entry/ssh_messages_code_bad_packet for some info. Dave's made various changes to try'n fix it, and increasing some software buffer has solved it for me. I no longer see this problem, but it's probably been masked rather than solved. Also, you never see it using KATCP, which is one more reason to use that method for larger transfers.

WRT large (1 GB) transfers, remember that it will take a long time to pull that much data off the FPGA. It does so in pages of ~4000 bytes at a time. Also make sure you're using the latest kernel. We discovered a bug in this paging system during the workshop.
http://casper.berkeley.edu/svn/trunk/roach/sw/binaries/linux/uImage-20091006-mmcfix should be good. I have never tried pulling such volumes over the SSH shell, but it works fine with KATCP. I will ask him to comment further.

Jason

On 30 Oct 2009, at 01:25, John Ford wrote:
casper collaborators, appended below is further info on roach ethernet problems seen at CSIRO: any ideas?

If I recall correctly, Alan mentioned this problem at the workshop, and the problem was that some of the PHY chips were faulty at one point. This may be what's going on. Hopefully someone knows for sure!

John

thanks,
dan

Original Message
Subject: Re: SPDO ROACH spectrometer
Date: Fri, 30 Oct 2009 09:19:01 +1100
From: Kjetil Wormnes kjetil.worm...@csiro.au
To: Dan Werthimer d...@ssl.berkeley.edu

Hi Dan and Wan,

I can confirm that we are seeing at least some of the problems with another ROACH board as well. This time it is connected directly to a computer with a short Cat 5 cable. So maybe this indicates that it is less likely to be a hardware problem? Incidentally, the error message that appears when attempting to download a large file over sftp is Corrupted MAC on input.

Cheers,
Kjetil

Dan Werthimer wrote:
hi wan, i don't know of anyone who has roach ethernet problems at 100 Mbit/sec. i'm cc'ing the casper community to see if anyone has any ideas. in general, it's good to post questions to cas...@lists, so that everyone can help answer, everyone can see the answers, and the info will be captured in the wiki/email archive. if you want, you can buy another National PHY chip (or ask digicom if they can send you one) and see if this helps. also, you might want to try a short cable and/or a cat6 cable. is your roach connected directly to a computer, or going through a switch? it might be interesting to try a different NIC, a different switch, or a different computer.

best,
dan

On 10/29/2009 02:47 PM, wan.ch...@csiro.au wrote:
Hi Dan: I believe you have done a very nice job.
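Jason's point that a 1 GB read takes a long time follows from the page size: every ~4000-byte page is a separate request, so any fixed per-request cost is paid hundreds of thousands of times. The back-of-envelope sketch below makes that explicit; the per-page overhead and link rate are illustrative assumptions, not measured ROACH figures.

```python
# Back-of-envelope sketch for reading data off the FPGA in ~4000-byte pages.
# per_page_overhead_s and link_bytes_per_s are hypothetical inputs.
import math

PAGE_BYTES = 4000


def pages_needed(total_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    return math.ceil(total_bytes / page_bytes)


def transfer_time_s(total_bytes: int, per_page_overhead_s: float,
                    link_bytes_per_s: float, page_bytes: int = PAGE_BYTES) -> float:
    """Fixed cost per page request plus time on the wire for the payload."""
    pages = pages_needed(total_bytes, page_bytes)
    return pages * per_page_overhead_s + total_bytes / link_bytes_per_s


if __name__ == "__main__":
    one_gib = 2 ** 30
    print(pages_needed(one_gib))  # 268436 page reads
    # e.g. an assumed 1 ms per page over a 100 Mbit/s link:
    print(round(transfer_time_s(one_gib, 1e-3, 100e6 / 8)))  # 354 (seconds)
```

With these assumed numbers the per-page overhead, not the wire time, dominates, so reducing round trips matters more than raw link speed.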
Re: [casper] SPDO ROACH spectrometer
Hi Dan and Wan,

I can confirm that we are seeing at least some of the problems with another ROACH board as well. This time it is connected directly to a computer with a short Cat 5 cable. So maybe this indicates that it is less likely to be a hardware problem? Incidentally, the error message that appears when attempting to download a large file over sftp is Corrupted MAC on input.

Cheers,
Kjetil

Dan Werthimer wrote:
hi wan, i don't know of anyone who has roach ethernet problems at 100 Mbit/sec. i'm cc'ing the casper community to see if anyone has any ideas. in general, it's good to post questions to cas...@lists, so that everyone can help answer, everyone can see the answers, and the info will be captured in the wiki/email archive. if you want, you can buy another National PHY chip (or ask digicom if they can send you one) and see if this helps. also, you might want to try a short cable and/or a cat6 cable. is your roach connected directly to a computer, or going through a switch? it might be interesting to try a different NIC, a different switch, or a different computer.

best,
dan

On 10/29/2009 02:47 PM, wan.ch...@csiro.au wrote:
Hi Dan: I believe you have done a very nice job. My problem is that the Ethernet port is not very reliable. Even running at 100 Mbit/s, the Ethernet port gets disconnected at times; normally it resumes after rebooting the whole system. And I could not transfer big files over the Ethernet. Small files of a few MB are all right, but I could not download a 1 GB file from the ROACH at all. So Dan, could this problem be solved by replacing the on-board PHY?

Thanks,
Wan
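Wan's symptom (small files fine, 1 GB fails) and the "well-specified simple test" proposed earlier in this thread can be made repeatable by checksumming the source and the received copy on every run. A minimal sketch: `transfer` is whatever you are testing (a hypothetical wrapper around scp, an FTP client, and so on), and hashing streams the file in chunks so multi-GB files do not need to fit in memory.

```python
# Sketch of the "copy a big file a few times and compare" test.
# `transfer` is any callable that copies src to dst (hypothetical scp wrapper, etc.).
import hashlib
import os
from typing import Callable, List


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_transfer_test(src: str, dst: str,
                           transfer: Callable[[str, str], None],
                           runs: int = 3) -> List[bool]:
    """Return one True/False per run: did dst match src byte for byte?"""
    expected = sha256_file(src)
    results = []
    for _ in range(runs):
        if os.path.exists(dst):
            os.remove(dst)  # never compare against a stale copy
        transfer(src, dst)
        results.append(sha256_file(dst) == expected)
    return results
```

A run of all True results across several multi-GB copies, on both boards with identical software, would be a reasonable baseline before suspecting hardware.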
Re: [casper] Slow HD write speed
Hi David, thank you so much for that. It was extremely helpful. We were indeed mounting the drive synchronously, *and* the USB is falling back to OHCI. Following your suggestions and removing the sync flag seems to have helped a bit, so recompiling the kernel is next...

Cheers,
Kjetil

David George wrote:
Hi Kjetil.

So, we have a ROACH system that we have set up to boot via usbboot into a full debian etch filesystem. The problem is that we get extremely low write speeds to the disk, on the order of a couple of Mbit/s.

The problem might be that your root filesystem is mounted with the 'sync' flag. Edit your /etc/rcSimple file (init runs this script first); if you have something like:

    mount -o remount,rw,sync,noatime /

change it to something like:

    mount -o remount,rw,noatime /

This was my mistake: I thought it would be a good idea for SD/MMC card access to be synchronous. It turns out it wasn't really. Perhaps a filesystem update is on the cards.

The other problem is that the AMCC PPC440EPx USB seems to misbehave quite badly. I have been fiddling around for a day or so, trying to work out why one of our flash sticks doesn't work reliably. Firstly, there is a known issue with the PPC440EPx USB that can lead to screw-ups when both the OHCI and EHCI Linux drivers are loaded: https://kerneltrap.org/mailarchive/linux-usb/2008/11/2/3900114 There are fixes (hacks) in the upstream kernel, but we are running off the old ppc tree, and updating to the new powerpc tree is not an insignificant task (it will probably happen this year, though). I think this leads to USB devices falling back to OHCI (full speed) even when they are EHCI (high speed) compatible, which gives data rates of 1-2 MB per second. This could also be your problem.

Now, in theory, if you compile a kernel with just EHCI (high-speed mode), your USB devices should work at high speed (9+ MB per second). However, I have seen very weird behaviour on one specific flash drive here with just EHCI compiled in.
When I first insert the device, the USB driver spews out errors. Then, if I put in another device, which happens to work, and reinsert the old flaky flash drive, it works fine from then on. This makes me think there is some software/setup problem. The same flaky device always works in OHCI mode when EHCI hasn't been compiled into the kernel.

In summary: USB on ROACH has some known problems, which will hopefully improve when we update to the new mainline kernel. If you are having reliability trouble, try compiling a kernel without EHCI. If you want maximum performance, try compiling without OHCI. Also make sure that your root filesystem isn't mounted 'sync'.

Regards,
David
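To confirm whether the root filesystem is actually mounted 'sync', and to apply David's rcSimple edit mechanically, something like the following would do. It is a sketch that assumes the /proc/mounts field layout (device, mount point, type, options, dump, pass) and a single-line mount command like the one quoted above; the function names are our own.

```python
# Sketch: detect and drop the 'sync' mount option discussed above.


def mounted_sync(proc_mounts: str, mountpoint: str = "/") -> bool:
    """True if the effective entry for mountpoint lists 'sync' in its options."""
    result = False
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # A later entry for the same mount point overrides earlier ones.
            result = "sync" in fields[3].split(",")
    return result


def drop_sync(mount_command: str) -> str:
    """Remove 'sync' from the -o option list of a one-line mount command."""
    tokens = mount_command.split()
    for i in range(len(tokens) - 1):
        if tokens[i] == "-o":
            opts = [o for o in tokens[i + 1].split(",") if o != "sync"]
            tokens[i + 1] = ",".join(opts)
    return " ".join(tokens)  # note: collapses runs of whitespace


if __name__ == "__main__":
    print(drop_sync("mount -o remount,rw,sync,noatime /"))
    # mount -o remount,rw,noatime /
```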
[casper] Slow HD write speed
Hello all, we have a ROACH system and have butted up against an ever so small problem that I was hoping one of you might be able to give some input on. You may notice that I am new to the mailing list, so hello :-). Please don't hesitate to let me know if I am not conforming to the posting policies.

So, we have a ROACH system that we have set up to boot via usbboot into a full Debian etch filesystem. The problem is that we get extremely low write speeds to the disk, on the order of a couple of Mbit/s. Has anyone come across this problem before? Any ideas how to solve it? Here is some information about our setup:

    UBOOT version:  U-Boot 2008.10-svn1923
    bootargs:       bootargs console=ttyS0,115200 mtdparts=${partitions} rootdelay=8 root=/dev/sda1 rw
    kernel version: Linux-2.6.25-svn1867-dirty1

I can attach the full boot log if that would be useful. I haven't done so here, as I wouldn't want to scare you all with an enormous first post.

Best regards,
Kjetil
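To put "a couple of Mbit/s" in context, the arithmetic below converts it to MB/s and sets it against the nominal USB signalling rates (12 Mbit/s full speed, 480 Mbit/s high speed). A sketch with function names of our own: about 2 Mbit/s is 0.25 MB/s, below even the 1.5 MB/s full-speed signalling ceiling, which is consistent with David's suggestion in his reply that the sync mount may be slowing things down on top of any OHCI fallback.

```python
# Arithmetic sketch: compare an observed disk write rate with USB bus ceilings.
# Nominal signalling rates; real-world throughput is lower than either.

FULL_SPEED_BPS = 12e6    # USB 1.1 full speed (OHCI)
HIGH_SPEED_BPS = 480e6   # USB 2.0 high speed (EHCI)


def to_mbytes_per_s(bits_per_s: float) -> float:
    return bits_per_s / 8 / 1e6


def compare(observed_mbit_per_s: float) -> dict:
    observed_bps = observed_mbit_per_s * 1e6
    return {
        "observed_MBps": to_mbytes_per_s(observed_bps),
        "full_speed_ceiling_MBps": to_mbytes_per_s(FULL_SPEED_BPS),
        "high_speed_ceiling_MBps": to_mbytes_per_s(HIGH_SPEED_BPS),
        "below_full_speed_ceiling": observed_bps < FULL_SPEED_BPS,
    }


if __name__ == "__main__":
    print(compare(2.0))
```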