Thank you for providing this information. I think the problem may be caused by "too many fetch failures". I have accumulated some experience now, and I think I'll solve it soon. Thanks again.

2009/8/19 Jason Venner <[email protected]>

> I have added small numbers of nodes into running clusters, with running jobs
> without issue - when the machines were correctly configured for the cluster,
> so this is known to work at least in the 0.18 release series (when I was
> doing this operation).
>
> On Mon, Aug 17, 2009 at 6:56 AM, yang song <[email protected]> wrote:
>
> > The situation is I can't find any unusual thing from the logs.
> > Maybe there is a lot of data to transfer since so many new nodes and the
> > jobs are waiting for it
> >
> > 2009/8/17 Ted Dunning <[email protected]>
> >
> > > Have you looked at the logs?
> > >
> > > On Sun, Aug 16, 2009 at 11:36 PM, yang song <[email protected]> wrote:
> > >
> > > > Hi, all
> > > > When I add another 50 nodes into the current cluster(200 nodes) at the
> > > > same time, the jobs run very smoothly at first. However, after a while,
> > > > all the jobs are suspended and never continue.
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
