Impatient as I am, I just shut down the cluster and restarted it with an
empty exclude file.
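For reference, here is roughly the decommission flow I have been using. The
hostname and exclude-file path below are placeholders from my setup; this
assumes dfs.hosts.exclude in hdfs-site.xml already points at the exclude file:

  # add the datanode to the exclude file the namenode reads
  # (hostname and path are placeholders)
  echo "datanode05.mydomain.com" >> /usr/local/hadoop/conf/dfs.exclude

  # tell the namenode to re-read the include/exclude files; the node
  # should then show as "Decommission In Progress" in the web UI
  hadoop dfsadmin -refreshNodes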
If I add the datanode hostname back to the exclude file and run hadoop
dfsadmin -refreshNodes, *the datanode goes straight to the dead nodes list*
without going through the decommission process.

I'm done for today. Maybe someone will have figured it out by the time I
come back tomorrow :)

Best regards,
Ben

On Tue, Jan 22, 2013 at 5:38 PM, Ben Kim <benkimkim...@gmail.com> wrote:

> UPDATE:
>
> The WARN about the edit log had nothing to do with the current problem.
>
> However, the replica placement warnings look suspicious. Please have a
> look at the following logs.
>
> 2013-01-22 09:12:10,885 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2013-01-22 00:02:17,541 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> Block: blk_4844131893883391179_3440513,
> Expected Replicas: 10, live replicas: 9, corrupt replicas: 0,
> decommissioned replicas: 1, excess replicas: 0, Is Open File: false,
> Datanodes having this block: 203.235.211.155:50010 203.235.211.156:50010
> 203.235.211.145:50010 203.235.211.144:50010 203.235.211.146:50010
> 203.235.211.158:50010 203.235.211.159:50010 203.235.211.157:50010
> 203.235.211.160:50010 203.235.211.143:50010,
> Current Datanode: 203.235.211.155:50010, Is current datanode
> decommissioning: true
>
> I have set my replication factor to 3, so I don't understand why Hadoop
> is trying to replicate this block to 10 nodes. I have decommissioned one
> node, so I currently have 9 nodes in operation; the block can never reach
> 10 replicas.
>
> I also see that all of the repeated warning messages like the one above
> are for blk_4844131893883391179_3440513.
>
> How would I delete the block? It's not showing as a corrupted block in
> fsck. :(
>
> BEN
>
> On Tue, Jan 22, 2013 at 9:28 AM, Ben Kim <benkimkim...@gmail.com> wrote:
>
>> Hi Varun, thank you for the response.
>>
>> No, there don't seem to be any corrupted blocks in my cluster.
>> I ran "hadoop fsck -blocks /" and it didn't report any corrupted blocks.
>>
>> However, these two WARNings have been repeating constantly in the
>> namenode log since the decommission:
>>
>> - 2013-01-22 09:16:30,908 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>> edits.new files already exists in all healthy directories:
>> - 2013-01-22 09:12:10,885 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
>> enough replicas, still in need of 1
>>
>> There isn't any WARN or ERROR in the decommissioning datanode's log.
>>
>> Ben
>>
>> On Mon, Jan 21, 2013 at 3:05 PM, varun kumar <varun....@gmail.com> wrote:
>>
>>> Hi Ben,
>>>
>>> Are there any corrupted blocks in your Hadoop cluster?
>>>
>>> Regards,
>>> Varun Kumar
>>>
>>> On Mon, Jan 21, 2013 at 8:22 AM, Ben Kim <benkimkim...@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> I followed the decommissioning guide on the Hadoop HDFS wiki.
>>>>
>>>> The HDFS web UI shows that the decommissioning process has
>>>> successfully begun.
>>>>
>>>> It started redeploying 80,000 blocks across the cluster, but for some
>>>> reason it stopped at 9059 blocks. I've waited 30 hours and still no
>>>> progress.
>>>>
>>>> Anyone with any idea?
>>>> --
>>>>
>>>> *Benjamin Kim*
>>>> *benkimkimben at gmail*
>>>
>>> --
>>> Regards,
>>> Varun Kumar.P
>>
>> --
>>
>> *Benjamin Kim*
>> *benkimkimben at gmail*
>
> --
>
> *Benjamin Kim*
> *benkimkimben at gmail*

--

*Benjamin Kim*
*benkimkimben at gmail*
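P.S. In case it helps whoever picks this up: a rough sketch of how I would
try to track down the file that owns the stubborn block and drop its
replication back to 3. The grep context size is a guess and the target path
is a placeholder:

  # list every file with its blocks, then find the one owning the block
  # from the logs above (-B keeps the preceding lines, where fsck prints
  # the file name; this can be slow on a large namespace)
  hadoop fsck / -files -blocks | grep -B 10 "blk_4844131893883391179"

  # if that file was written with replication 10 (job jars are a common
  # culprit, since mapred.submit.replication defaults to 10), lower it;
  # -w waits until the replication change is actually carried out
  hadoop fs -setrep -w 3 /path/to/offending/file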