The problem seems to have gone away, but I cannot offer a solid
explanation.  At some point after I removed the datanode's working
directories, reformatted the namenode, and restarted the cluster, the
issue stopped manifesting.  However, I had already performed those same
steps well before posting about this issue, so it is not clear what small
detail was different this time.  If the problem were to recur, I would not
be able to prescribe a precise solution.

2011/11/29 Stephen Boesch <java...@gmail.com>

> I verified the DN was down via both jps and java. Anyway, it was enough
> to see via "top" since, as mentioned, the DN was consuming 100% of one CPU
> when running.
>
>
> 2011/11/29 Stephen Boesch <java...@gmail.com>
>
>> Hi Uma,
>>    I mentioned that I have restarted the datanode *many* times, and in
>> fact the entire cluster more than ten times.
>>
>>
>> 2011/11/29 Uma Maheswara Rao G <mahesw...@huawei.com>
>>
>>>  Looks like you are hitting HDFS-2553.
>>>
>>> The cause might be that you cleared the data directories directly
>>> without restarting the DN. The workaround would be to restart the DNs.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Uma
>>>
>>>
>>>
>>> ------------------------------
>>>
>>>  *From:* Stephen Boesch [java...@gmail.com]
>>> *Sent:* Tuesday, November 29, 2011 8:53 PM
>>> *To:* mapreduce-user@hadoop.apache.org
>>> *Subject:* Re: MRv2 DataNode problem: isBPServiceAlive invoked order of
>>> 200K times per second
>>>
>>>  Update on this:  I've shut down all the servers multiple times.  I also
>>> cleared the data directories and reformatted the namenode, then restarted
>>> it, with the same results: 100% CPU and millions of these calls to
>>> isBPServiceAlive.
>>>
>>>
>>> 2011/11/29 Stephen Boesch <java...@gmail.com>
>>>
>>>> I am just trying to get off the ground with MRv2.  The first node (in
>>>> pseudo-distributed mode) is working fine - ran a couple of TeraSorts on
>>>> it.
>>>>
>>>>  The second node has a serious issue with its single DataNode: it
>>>> consumes 100% of one of the CPUs.  Looking at it through JVisualVM, there
>>>> are over 8 million invocations of isBPServiceAlive in a matter of a minute
>>>> or so, and the count keeps incrementing at a steady clip.  A screenshot of
>>>> the JVisualVM CPU profile - showing just shy of 8M invocations - is attached.
>>>>
>>>>  What kind of configuration error could lead to this?  The
>>>> conf/masters and conf/slaves simply say localhost.   If need be I'll copy
>>>> the *-site.xml's.  They are boilerplate from the Cloudera page by Ahmed
>>>> Radwan.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
