Hi Nick,

Can you curl http://127.0.0.1:50075/stacks on one of the stuck nodes and
paste the result?

Sometimes that can give an indication as to where things are getting stuck.
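
If curl isn't handy on the node, a rough Python sketch along these lines
should pull the same page. The helper names (stacks_url, fetch_stacks,
count_thread_states) are made up for illustration, and the tally assumes the
dump contains lines of the form "State: NAME" for each thread, which may not
hold for every version.

```python
import re
import urllib.request
from collections import Counter


def stacks_url(host="127.0.0.1", port=50075):
    """Build the /stacks URL; host and port default to the ones given above."""
    return f"http://{host}:{port}/stacks"


def fetch_stacks(host="127.0.0.1", port=50075, timeout=10):
    """Fetch the thread dump as text. Raises URLError if the daemon is unreachable."""
    with urllib.request.urlopen(stacks_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def count_thread_states(dump):
    """Tally thread states, ASSUMING each thread has a 'State: NAME' line."""
    return Counter(re.findall(r"^\s*State:\s*(\w+)", dump, flags=re.MULTILINE))


if __name__ == "__main__":
    dump = fetch_stacks()
    print(dump)                       # the full dump is what to paste to the list
    print(count_thread_states(dump))  # quick summary of where threads are sitting
```

A large number of threads sitting in a blocked or waiting state around the
same call is usually the first thing to look for, but the full dump is still
the useful thing to paste.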

-Todd

On Mon, Sep 28, 2009 at 7:21 PM, Nick Rathke <n...@sci.utah.edu> wrote:

> FYI, I get the same hanging behavior if I follow the Hadoop quick start for
> a single-node baseline configuration (no modified conf files).
>
> -Nick
>
>
>
> Brian Bockelman wrote:
>
>> Hey Nick,
>>
>> Do you have any error messages appearing in the log files?
>>
>> Brian
>>
>> On Sep 28, 2009, at 2:06 PM, Nick Rathke wrote:
>>
>>> Ted Dunning wrote:
>>>
>>>> I think that the last time you asked this question, the suggestion was
>>>> to
>>>> look at DNS and make sure that everything is exactly correct in the
>>>> net-boot
>>>> configuration.  Hadoop is very sensitive to network routing and naming
>>>> details.
>>>>
>>>> So,
>>>>
>>>> a) in your net-boot, how are IP addresses assigned?
>>>>
>>> We assign static IPs based on a node's MAC address via DHCP so that
>>> when a node is netbooted or booted with a local OS it gets the same IP and
>>> hostname.
>>>
>>>> b) how are DNS names propagated?
>>>>
>>> Cluster DNS names are mixed in with our facility DNS servers.
>>> All nodes have proper forward and reverse DNS lookups.
>>>
>>>> c) how have you guaranteed that (a) and (b) are exactly consistent?
>>>>
>>> Host MAC address. I have also manually confirmed this.
>>>
>>>> d) how have you guaranteed that every node can talk to every other node
>>>> both by name and IP address?
>>>>
>>> Local cluster DNS / DHCP, plus all nodes have all other nodes' host names
>>> and IPs in /etc/hosts. I have compared all the config files for DNS, DHCP,
>>> and /etc/hosts to make sure all information is the same.
>>>
>>>> e) have you assured yourself that any reverse mapping that exists is
>>>> correct?
>>>>
>>> Yes, and tested.
>>>
>>> One more bit of information. The system boots on a 1Gb network; all other
>>> network traffic, i.e. MPI and NFS to data volumes, is via IB.
>>>
>>> The IB network also has proper forward/reverse DNS entries. IB IP
>>> addresses are set up at boot time via a script that takes the host IP and a
>>> fixed offset to calculate the address for the IB interface. I have also
>>> confirmed that the IB IP addresses match our DNS.
>>>
>>> -Nick
>>>
>>>
>>>> On Mon, Sep 28, 2009 at 9:45 AM, Nick Rathke <n...@sci.utah.edu> wrote:
>>>>
>>>>
>>>>> I am hoping that someone can help with this issue. I have a 64-node
>>>>> cluster that we would like to run Hadoop on; most of the nodes are
>>>>> netbooted
>>>>> via NFS.
>>>>>
>>>>> Hadoop runs fine on nodes IF the node uses a local OS install, but
>>>>> doesn't
>>>>> work when nodes are netbooted. Under netboot I can see that the slaves
>>>>> have
>>>>> the correct Java processes running, but the Hadoop web pages never
>>>>> show the
>>>>> nodes as available. The Hadoop logs on the nodes also show that
>>>>> everything
>>>>> is running and started up correctly.
>>>>>
>>>>> On the few nodes that have a local OS installed everything works just
>>>>> fine
>>>>> and I can run the test jobs without issue (so far).
>>>>>
>>>>> I am using the identical Hadoop install and configuration between
>>>>> netbooted nodes and non-netbooted nodes.
>>>>>
>>>>> Has anyone encountered this type of issue?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Nick Rathke
>>> Scientific Computing and Imaging Institute
>>> Sr. Systems Administrator
>>> n...@sci.utah.edu
>>> www.sci.utah.edu
>>> 801-587-9933
>>> 801-557-3832
>>>
>>> "I came I saw I made it possible" Royal Bliss - Here They Come
>>>
>>
>>
>
