I just realized that my original message finally posted moments after I resent it. After that much time I had assumed the system had not received it, since it had not appeared in the archive or on the list itself. Sorry for the double post.

On Thu, Aug 27, 2009 at 3:01 PM, Seb Seith<[email protected]> wrote:
> Hello,
>
> I have been working on setting up a Hadoop on Demand cluster on
> three machines and have run into a bit of a snag. I went through the
> admin and user guides and have successfully installed torque and HOD.
> When I run "hod allocate" it successfully starts hodring on 2 of the
> machines but not the third. The result is that I have a working
> Namenode and Jobtracker (though its UI does not seem to work
> presently) but no slave nodes.
>
> Even at level 4 debug in all sections there is nothing to indicate
> a failure as the ringmaster has no problem communicating with the
> running hodring jobs and pbsdsh returns without error. I can find no
> logs on any of the machines indicating a torque issue (though I admit
> I am not terribly familiar with torque) and no logs at all for HOD on
> the machine that is not running hodring.
>
> It would appear that the pbsdsh job simply isn't starting hodring
> on the one node given the lack of any HOD log on that machine. Either
> it is not recognizing the node (seems somewhat unlikely as it comes up
> in pbsnodes as free) or there is a relatively silent failure
> somewhere. If you have any suggestions I would much appreciate them.
>
> One quick side note is I have successfully run a standard
> hadoop-0.20.0 cluster on these three machines with no difficulty,
> which should rule out connection, ssh or firewall issues.
>
> Thanks,
>
> Seb
>
