Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second
The problem seems to have gone away, but I cannot offer a solid explanation. At some point after I removed the datanode's working directories, reformatted the namenode, and restarted the cluster, the issue stopped manifesting. However, I had already done those same steps well before posting about this, so it is not clear what small detail was different this time. If the problem were to recur, I would not be able to prescribe a precise solution.
Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second
I verified the DN was down via both jps and java. Anyway, it was enough to check via "top", since as mentioned the DN was consuming 100% of one CPU when running.
Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second
Hi Uma,

I mentioned that I have restarted the datanode *many* times, and in fact the entire cluster more than ten times.
RE: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second
It looks like you are hitting HDFS-2553.

The cause might be that you cleared the data directories directly without restarting the DN. The workaround would be to restart the DNs.

Regards,
Uma

From: Stephen Boesch [java...@gmail.com]
Sent: Tuesday, November 29, 2011 8:53 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second

> Update on this: I've shut down all the servers multiple times. Also
> cleared the data directories and reformatted the namenode. Restarted it and
> the same results: 100% cpu and millions of these calls to isBPServiceAlive.
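The symptom in this thread (one core pegged, a cheap liveness check called hundreds of thousands of times per second) is what an unthrottled polling loop typically looks like. The sketch below only illustrates that general pattern; it is not the DataNode source, and `PollLoopSketch`, `pollFor`, and its parameters are hypothetical names made up for this example. It counts how often a liveness check is evaluated in a fixed window, with and without a pause between polls.

```java
import java.util.function.BooleanSupplier;

public class PollLoopSketch {
    /**
     * Polls isAlive until durationMillis has elapsed or the check returns false.
     * If pauseMillis > 0 the loop sleeps between polls; if it is 0 the loop spins.
     * Returns the number of polls performed. (Hypothetical helper for illustration.)
     */
    public static long pollFor(long durationMillis, long pauseMillis, BooleanSupplier isAlive) {
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        long polls = 0;
        while (System.nanoTime() < deadline && isAlive.getAsBoolean()) {
            polls++;
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop polling.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return polls;
    }

    public static void main(String[] args) {
        BooleanSupplier alwaysAlive = () -> true; // stand-in for a liveness check
        long spinning = pollFor(200, 0, alwaysAlive);
        long throttled = pollFor(200, 10, alwaysAlive);
        System.out.println("polls without pause:    " + spinning);
        System.out.println("polls with 10 ms pause: " + throttled);
    }
}
```

The unthrottled variant keeps one core busy for the whole window, while the throttled variant performs only a handful of polls. Whether the loop described in HDFS-2553 has this exact shape is not something this thread establishes.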
Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second
Update on this: I've shut down all the servers multiple times. I also cleared the data directories and reformatted the namenode. After restarting, I get the same results: 100% CPU and millions of these calls to isBPServiceAlive.

2011/11/29 Stephen Boesch

> I am just trying to get off the ground with MRv2. The first node (in
> pseudo distributed mode) is working fine - ran a couple of TeraSorts on
> it.
>
> The second node has a serious issue with its single DataNode: it consumes
> 100% of one of the CPUs. Looking at it through JVisualVM, there are over
> 8 million invocations of isBPServiceAlive in a matter of a minute or so,
> continually incrementing at a steady clip. A screenshot of the JVisualVM
> CPU profile, showing just shy of 8M invocations, is attached.
>
> What kind of configuration error could lead to this? The conf/masters and
> conf/slaves simply say localhost. If need be I'll copy the *-site.xml's.
> They are boilerplate from the Cloudera page by Ahmed Radwan.
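The original post found the hot spot with a JVisualVM CPU profile. As a rough complement, the standard `java.lang.management.ThreadMXBean` API can report per-thread CPU time from inside a JVM, which is one way to see which thread is burning a core. The sketch below is an assumption-laden illustration: `HotThreadSketch`, `busiestThreadName`, and the `hypothetical-spinner` thread are invented for this example, and it inspects only its own JVM. Pointing it at a running DataNode would need a remote JMX connection, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class HotThreadSketch {
    /**
     * Returns the name of the live thread in this JVM that has accumulated the
     * most CPU time, or null if per-thread CPU timing is unsupported or disabled.
     */
    public static String busiestThreadName() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return null;
        }
        String busiest = null;
        long maxCpuNanos = -1;
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread is no longer alive
            ThreadInfo info = threads.getThreadInfo(id);  // null if the thread is no longer alive
            if (info != null && cpuNanos > maxCpuNanos) {
                maxCpuNanos = cpuNanos;
                busiest = info.getThreadName();
            }
        }
        return busiest;
    }

    public static void main(String[] args) throws InterruptedException {
        // A deliberately spinning daemon thread stands in for the runaway loop.
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // busy-wait on purpose
            }
        }, "hypothetical-spinner");
        spinner.setDaemon(true);
        spinner.start();
        Thread.sleep(500);
        System.out.println("busiest thread: " + busiestThreadName());
        spinner.interrupt();
    }
}
```

A thread name alone does not say which method is hot; a profiler or a thread dump is still needed for that, as in the original report.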