The problem seems to have gone away, but I cannot offer a solid explanation. At some point after I removed the working directories for the datanode, reformatted the namenode, and restarted the cluster, the issue stopped manifesting. However, I had already done those same steps well before posting about this issue, so it is not clear what small detail was different this time. If this problem were to recur, I would not be able to prescribe a precise solution.

2011/11/29 Stephen Boesch <java...@gmail.com>

> I verified the DN was down via both jps and java. Anyways, it was enough
> to see via "top" since as mentioned DN was consuming 100% of one cpu when
> running.
>
> 2011/11/29 Stephen Boesch <java...@gmail.com>
>
>> Hi Uma,
>> I mentioned that I have restarted the datanode *many* times, and in
>> fact the entire cluster more than ten times.
>>
>> 2011/11/29 Uma Maheswara Rao G <mahesw...@huawei.com>
>>
>>> Looks you are getting HDFS-2553.
>>>
>>> The cause might be that, you cleared the datadirectories directly
>>> without DN restart. Workaround would be to restart DNs.
>>>
>>> Regards,
>>>
>>> Uma
>>>
>>> ------------------------------
>>>
>>> *From:* Stephen Boesch [java...@gmail.com]
>>> *Sent:* Tuesday, November 29, 2011 8:53 PM
>>> *To:* mapreduce-user@hadoop.apache.org
>>> *Subject:* Re: MRv2 DataNode problem: isBPServiceAlive invoked order of
>>> 200K times per second
>>>
>>> Update on this: I've shut down all the servers multiple times. Also
>>> cleared the data directories and reformatted the namenode. Restarted it and
>>> the same results: 100% cpu and millions of these calls to isBPServiceAlive.
>>>
>>> 2011/11/29 Stephen Boesch <java...@gmail.com>
>>>
>>>> I am just trying to get off the ground with MRv2. The first node (in
>>>> pseudo distributed mode) is working fine - ran a couple of TeraSort's on
>>>> it.
>>>>
>>>> The second node has a serious issue with its single DataNode: it
>>>> consumes 100% of one of the CPU's. Looking at it through JVisualVM, there
>>>> are over 8 million invocations of isBPServiceAlive in a matter of a minute
>>>> or so and continually incrementing at a steady clip. A screenshot of the
>>>> JvisualVM cpu profile - showing just shy of 8M invocations is attached.
>>>>
>>>> What kind of configuration error could lead to this? The
>>>> conf/masters and conf/slaves simply say localhost. If need be I'll copy
>>>> the *-site.xml's. They are boilerplate from the Cloudera page by Ahmed
>>>> Radwan.