Hey guys,

I'm back from the weekend. I tried reinstalling on the new nodes using the *other* Ethernet port, and I no longer receive "Destination Host Unreachable" errors (although it still calls the port "eth0").

I haven't really done any testing yet, but I was able to finish all of the steps without further error. I'll keep you posted.

Now I just have to figure out why xfs is hogging a whole CPU on the nodes...

David

Bernard Li wrote:

Hey David:

So let's say you have 1 headnode and 2 compute nodes (node1 and node2). Can headnode ping either node1 or node2? Can you show the output of ifconfig on all three computers? Do the compute nodes have more than one network card on board? I wonder if the systems got confused between eth0 and eth1...

Cheers,

Bernard
------------------------------------------------------------------------
*From:* David Isaacson [mailto:[EMAIL PROTECTED]
*Sent:* Fri 08/07/2005 5:55 PM
*To:* Bernard Li
*Subject:* Re: [Oscar-devel] New Problem - DHCP not found on client nodes

I used to have SL3 on these machines without any network problems...maybe that doesn't mean anything though.

By the way, I did get the system successfully installed on 2 of the newer nodes, but the nodes and host couldn't ping each other, mount NFS, ssh, etc. I got "Destination Host Unreachable". All 3 are plugged straight into the same private switch, though.

David

On Jul 8, 2005, at 5:28 PM, Bernard Li wrote:

> Another quick way to test is just boot off the first CD of SL3 and see
> if the network adapter is detected.  lspci and friends should work.
>
> Cheers,
>
> Bernard
>
>
>> -----Original Message-----
>> From: Lombard, David N [mailto:[EMAIL PROTECTED]
>> Sent: Friday, July 08, 2005 17:10
>> To: David Isaacson
>> Cc: Bernard Li; [email protected]
>> Subject: RE: [Oscar-devel] New Problem - DHCP not found on client nodes
>>
>> From: David Isaacson on Friday, July 08, 2005 4:12 PM
>>
>>>> Questions, I have questions:
>>>>
>>>> Have you tried another client node?
>>>
>>> I had tried another node with the exact same hardware, with the same
>>> results.  Now, however, I decided to try one of our newer nodes, which
>>> have similar but not exactly the same hardware as the old ones.  It
>>> looks like it's working on this one.  At the very least it got past
>>> the point where it hung before.  Of course I want to use the old nodes
>>> too, so the problem doesn't just go away :(
>>
>> OK, so that would again appear to implicate the client.  The node that
>> did work should either reboot or beep, depending on the config in
>> Step 4 "Build OSCAR Client Image".  If you set it to beep, just reset
>> it, and it should boot from local disk (assuming you have the right
>> boot order).  The best BIOSes have a "boot once" mode, so you just do a
>> network boot when you want to reinstall, and a normal boot from disk
>> otherwise (there's also a method where you can use per-node PXE config
>> files to either do a LOCALBOOT or network boot).
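
A rough sketch of the per-node PXE config idea mentioned above, assuming the boot loader is PXELINUX. PXELINUX looks in its pxelinux.cfg/ directory for a file named after the client's IPv4 address in uppercase hex before falling back to "default", and LOCALBOOT 0 tells it to boot from the local disk. The function names and the example address are hypothetical, not anything OSCAR ships.

```python
import ipaddress


def pxelinux_ip_filename(ip: str) -> str:
    """Return the uppercase-hex file name PXELINUX tries for an IPv4 address."""
    return "%08X" % int(ipaddress.IPv4Address(ip))


def localboot_config() -> str:
    """Return a minimal PXELINUX config that boots from the local disk."""
    return "DEFAULT local\nLABEL local\n  LOCALBOOT 0\n"


if __name__ == "__main__":
    # Hypothetical node address; substitute the address your DHCP server hands out.
    print("pxelinux.cfg/" + pxelinux_ip_filename("192.168.1.10"))
    print(localboot_config())
```

Writing that config under the per-node file name after a successful install, and removing it when a reinstall is wanted, is one way to switch a single node between local and network boot without touching the BIOS.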
>>
>>
>>>> Do you see any messages about the NIC driver loading?
>>>
>>> I did see messages about the NIC driver loading....
>>> I think.  Now I can't seem to see any.  But they all go by pretty
>>> fast, so I really can't be sure.
>>
>> Back to the failing node...
>>
>> After the "Please contribute" lines, you should see
>>
>>   Listening on LPF/eth0/<MAC-address-as-octets>
>>   Sending on   LPF/eth0/<MAC-address-as-octets>
>>   Listening on LPF/lo/<null>
>>   Sending on   LPF/lo/<null>
>>   DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6
>>   DHCPDISCOVER on lo to 255.255.255.255 port 67 interval 6
>>
>> If you don't have the three "eth0" lines, you don't have a NIC (well,
>> you don't have a functioning driver).
>>
>> After the above, you should then have
>>
>>   DHCPOFFER from 192.168.x.y
>>   DHCPREQUEST on eth0 to 255.255.255.255 port 67
>>   DHCPACK from 192.168.x.y
>>
>> These represent the *second* OFFER/REQUEST/ACK set that doesn't appear
>> to be present in /var/log/messages on the server.
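
Since the messages scroll by quickly, the checklist above could be applied to a captured copy of the console text. The following is only a sketch: it assumes the output has been saved as plain text somehow (for example from a serial console), and the function names are made up for illustration.

```python
import re


def dhcp_progress(console_text: str, iface: str = "eth0") -> dict:
    """Report which of the expected dhclient messages appear in the text."""
    lines = [ln.strip() for ln in console_text.splitlines()]
    name = re.escape(iface)

    def seen(pattern: str) -> bool:
        return any(re.match(pattern, ln) for ln in lines)

    return {
        "listening": seen(rf"Listening on\s+LPF/{name}/"),
        "sending": seen(rf"Sending on\s+LPF/{name}/"),
        "discover": seen(rf"DHCPDISCOVER on {name} "),
        "offer": seen(r"DHCPOFFER from "),
        "request": seen(rf"DHCPREQUEST on {name} "),
        "ack": seen(r"DHCPACK from "),
    }


def nic_driver_present(progress: dict) -> bool:
    """True when all three per-interface lines (listen, send, discover) appear."""
    return all(progress[key] for key in ("listening", "sending", "discover"))
```

If nic_driver_present() is false, the problem is the NIC or its driver; if it is true but "offer" or "ack" is missing, the client is sending requests and the place to look is the server or the network in between.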
>>
>>
>>>> What board do you have?
>>>>
>>>>
>>>>
>>> The board is a Tyan S2720.
>>>
>>
>> Hmmm.  I did find this
>>
>> <https://www.redhat.com/archives/redhat-list/2002-July/msg00832.html>
>>
>> Can you see which driver, e100 or e1000, is initializing?
>>
>> You can <Ctrl-S> and <Ctrl-Q> the client to stop/start the display,
>> and <Ctrl-C> to interrupt the process.  Once you do, you could
>>
>>     cat /var/state/dhcp/dhclient.leases
>>
>> to see if anything is present, but I'd guess not, and
>>
>>     ifconfig
>>
>> to see what sort of errors you may be hitting.
>>
>>     cat /proc/bus/pci/devices
>>
>> should dump out device info.  Sadly, lspci isn't on the initrd :-(  At
>> any rate, you should see the NIC if a driver has claimed it.  The
>> first number (4 digit hex) lists the bus, slot, and function number of
>> each device in a packed format; the second number (8 digit hex) will
>> start with "8086" for an Intel device.  Along with the various bridges
>> & etc, you should see the NIC (with the driver that claimed it)--what
>> is the second set of 4 digits?
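
The packed fields described above can be decoded by hand or with a few lines of code. This sketch assumes the usual Linux layout of /proc/bus/pci/devices: 17 whitespace-separated hex fields (bus/devfn, vendor/device, IRQ, seven base addresses, seven sizes), followed by the driver name if a driver has claimed the device. The field count and the function name are assumptions to check against your kernel. Note that a driver name such as "e100" is itself valid hex, so the driver is found by position, not by content.

```python
def parse_pci_line(line: str) -> dict:
    """Decode one line of /proc/bus/pci/devices into readable fields."""
    fields = line.split()
    bus_devfn = int(fields[0], 16)       # 4 hex digits: bus (8 bits) + devfn (8 bits)
    vendor_device = int(fields[1], 16)   # 8 hex digits: vendor (16 bits) + device (16 bits)
    devfn = bus_devfn & 0xFF
    return {
        "bus": bus_devfn >> 8,
        "slot": devfn >> 3,              # upper 5 bits of devfn
        "function": devfn & 0x7,         # lower 3 bits of devfn
        "vendor": "%04x" % (vendor_device >> 16),
        "device": "%04x" % (vendor_device & 0xFFFF),
        # Assumed layout: anything past the 17 numeric fields is the driver name.
        "driver": fields[17] if len(fields) > 17 else None,
    }


if __name__ == "__main__":
    with open("/proc/bus/pci/devices") as handle:
        for entry in handle:
            info = parse_pci_line(entry)
            if info["vendor"] == "8086":  # Intel devices only
                print(info)
```

The "device" value is the second set of 4 digits asked about above, and a "driver" of None on the NIC's line means no driver has claimed it.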
>>
>> --
>> dnl
>>
>> My comments represent my opinions, not those of Intel Corporation.
>>
>




_______________________________________________
Oscar-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel
