Sorry, here's the issue: Almost two years ago I loaded a cluster of 33 Dell PowerEdge 1750s with RedHat 9 and OSCAR 3. Initially, I couldn't get the nodes to network boot. Searching through this list, I found Jason Hlady's post concerning issues he was having loading a batch of 1750s and the fact that he'd gotten an updated kernel from Frank Crawford which had worked on some of his 1750s. Jason was kind enough to share this tarball with me and it worked great.
Now, what we actually purchased at that time was 38 Dell PowerEdge 1750s. Five of them were kept separate as a small test cluster to test changes, etc. I've loaded this cluster multiple times with the same RedHat 9 and OSCAR 3 using Frank's updated kernel. It has always worked.

Well, in troubleshooting a kernel issue with some of our software, we decided to take the four nodes from my test cluster and add them to the original 32 node cluster (33 actually, but one is the headnode). This is where the problem begins. I ran install_cluster eth0 (eth0 is connected to the cluster Gb Ethernet back end) and added four nodes. When I went to network boot these nodes, it did not work. Here are some of the errors...

(stuff omitted)
tg3: (02:00.0) phy probe failed, err -16
tg3: problem fetching invariants of chip, aborting
tg3: (02:00.1) phy probe failed, err -16
tg3: problem fetching invariants of chip, aborting
(stuff omitted)
FusionMPT base driver 2.03.00
mptbase: Initiating ioc0 bringup
mptbase: ioc0: WARNING: unexpected doorbell active
mptbase: ioc0: ERROR: doorbell ACK timeout (2)
(mptbase stuff repeats a couple times)
Kernel panic....

Some searching on the users list immediately brought me back to Jason Hlady's problem. I read through the thread, but didn't see any solution other than to reload the cluster. This isn't an option for me. I removed the kernel and initrd.img from /tftpboot and verified that without them the nodes don't get a kernel, so I know they are being used. It's just really strange that these have worked over and over for me but suddenly on this cluster they've quit working. I don't get it.

Anyway, if there's anything you suggest checking, please let me know. I have Frank's tarball in /usr/share/systemimager/boot/i386/standard/ and of course the kernel and initrd.img are in /tftpboot. It just doesn't work now for some reason.
John

-----Original Message-----
From: Bernard Li [mailto:[EMAIL PROTECTED]
Posted At: Thursday, February 09, 2006 10:49 PM
Posted To: OSCAR
Conversation: [Oscar-users] Upgrading OSCAR Cluster
Subject: RE: [Oscar-users] Upgrading OSCAR Cluster

Hi John:

boel_binaries.tar.gz is located in /usr/share/systemimager/boot/<arch>/standard and this gets pulled down via rsync by SystemImager during node imaging.

Can you post your specific problem? I get lost in the thread.

Cheers,

Bernard

From: OSCAR [mailto:[EMAIL PROTECTED]
Sent: Thu 09/02/2006 07:46
To: Bernard Li; [email protected]
Subject: RE: [Oscar-users] Upgrading OSCAR Cluster

Thanks Bernard. I think this is what I'll end up doing long term. We have another cluster running OSCAR 4 and RHEL 3; with the support for CentOS in later OSCAR versions, we'll definitely be switching to that. But in the meantime I need to limp this cluster through the end of the project it's currently for.

Do you have any thoughts for the other thread I posted? I'm running RH9 with OSCAR 3 and I am getting the exact issue discussed in that thread. Unfortunately, starting over isn't an option. I need to get these nodes loaded to test the new kernel before I push it out to the rest of the cluster, but for some reason the nodes don't get the right kernel when they PXE boot. I can't find where they are getting it from. It doesn't appear to be the one in /tftpboot, because that's the one I used to successfully load the cluster initially (it's the boel_binaries one discussed in the thread).

Any and all thoughts are appreciated. Thanks in advance for your support.

John

-----Original Message-----
From: Bernard Li [mailto:[EMAIL PROTECTED]
Posted At: Thursday, February 09, 2006 1:42 AM
Posted To: OSCAR
Conversation: Upgrading OSCAR Cluster
Subject: RE: [Oscar-users] Upgrading OSCAR Cluster

Hi John:

There is currently no upgrade path for OSCAR - i.e. if you want to upgrade, you'll have to re-install the OS (on the headnode), re-install OSCAR, re-create the images and re-deploy your compute nodes.

You don't have to do this on the production cluster; if you have 2 spare computers (hopefully with similar hardware as your cluster nodes), you can build a test cluster and create/tweak your images before you perform this on your production cluster.

You probably also want to back up your user files, fstab, as well as other configuration settings from Ganglia and/or TORQUE, etc.

CentOS 3 is based on RHEL3 and should be quite similar to Red Hat Linux 9 - if you want a "newer" distribution I would recommend at least CentOS 4 (provided that your other software works under this OS). Do note that CentOS 4 runs a 2.6 kernel, whereas both CentOS 3 and RHL9 run a 2.4 kernel.

Cheers,

Bernard

From: [EMAIL PROTECTED] on behalf of OSCAR
Sent: Wed 08/02/2006 12:04
To: [email protected]
Subject: [Oscar-users] Upgrading OSCAR Cluster

We're running OSCAR 3.0 on RedHat9. We're running into several issues with some of our software and the solution seems to be upgrading to a newer Linux distribution. This cluster is shared by multiple projects so I want to minimize the impact as much as possible. What I'd like to do is upgrade OSCAR, use it to build an image based on a newer distribution, tweak the image with all the changes we've made for our software, and then deploy it. I'm just starting to look at this, so if anyone has suggestions, please let me know.

Can anybody point me to upgrade instructions? Is it as simple as just installing the latest OSCAR version and proceeding like a new install? Will the installation notice that I already have a cluster deployed and pick up those settings? Any advice on whether to upgrade the headnode OS? Should I do this first? I plan to use CentOS 3, primarily for compatibility with other systems we use.

All advice is greatly appreciated...I'll start digging around for this information myself now too.
:)

Thanks,

John Artman CCNA, MCP, RHCE/CT
Senior Systems Engineer
ENSCO Inc.

The information contained in this email message is intended only for the use of the individuals to whom it is addressed and may contain information that is privileged and sensitive. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by email at the above referenced address. Thank you.

_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users
