Impatient as I am, I just shut down the cluster and restarted it with an
empty exclude file.
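For reference, here is roughly the decommission flow I have been using. The
hostname and exclude-file path below are placeholders from my setup; this
assumes dfs.hosts.exclude in hdfs-site.xml already points at the exclude file:

  # add the datanode to the exclude file the namenode reads
  # (hostname and path are placeholders)
  echo "datanode05.mydomain.com" >> /usr/local/hadoop/conf/dfs.exclude

  # tell the namenode to re-read the include/exclude files; the node
  # should then show as "Decommission In Progress" in the web UI
  hadoop dfsadmin -refreshNodes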
If I add the datanode hostname back to the exclude file and run hadoop
dfsadmin -refreshNodes, *the datanode goes straight to the dead nodes list*
without going through the decommission process.

I'm done for today. Maybe someone will have figured it out by the time I
come back tomorrow :)

Best regards,
Ben

On Tue, Jan 22, 2013 at 5:38 PM, Ben Kim <benkimkim...@gmail.com> wrote:

> UPDATE:
>
> The WARN about the edit log had nothing to do with the current problem.
>
> However, the replica placement warnings look suspicious. Please have a
> look at the following logs.
>
> 2013-01-22 09:12:10,885 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2013-01-22 00:02:17,541 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> Block: blk_4844131893883391179_3440513,
> Expected Replicas: 10, live replicas: 9, corrupt replicas: 0,
> decommissioned replicas: 1, excess replicas: 0, Is Open File: false,
> Datanodes having this block: 203.235.211.155:50010 203.235.211.156:50010
> 203.235.211.145:50010 203.235.211.144:50010 203.235.211.146:50010
> 203.235.211.158:50010 203.235.211.159:50010 203.235.211.157:50010
> 203.235.211.160:50010 203.235.211.143:50010,
> Current Datanode: 203.235.211.155:50010, Is current datanode
> decommissioning: true
>
> I have set my replication factor to 3, so I don't understand why Hadoop
> is trying to replicate this block to 10 nodes. I have decommissioned one
> node, so I currently have 9 nodes in operation; the block can never reach
> 10 replicas.
>
> I also see that all of the repeated warning messages like the one above
> are for blk_4844131893883391179_3440513.
>
> How would I delete the block? It's not showing as a corrupted block in
> fsck. :(
>
> BEN
>
> On Tue, Jan 22, 2013 at 9:28 AM, Ben Kim <benkimkim...@gmail.com> wrote:
>
>> Hi Varun, thank you for the response.
>>
>> No, there don't seem to be any corrupted blocks in my cluster.
>> I ran "hadoop fsck -blocks /" and it didn't report any corrupted blocks.
>>
>> However, these two WARNings have been repeating constantly in the
>> namenode log since the decommission:
>>
>> - 2013-01-22 09:16:30,908 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>> edits.new files already exists in all healthy directories:
>> - 2013-01-22 09:12:10,885 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
>> enough replicas, still in need of 1
>>
>> There isn't any WARN or ERROR in the decommissioning datanode's log.
>>
>> Ben
>>
>> On Mon, Jan 21, 2013 at 3:05 PM, varun kumar <varun....@gmail.com> wrote:
>>
>>> Hi Ben,
>>>
>>> Are there any corrupted blocks in your Hadoop cluster?
>>>
>>> Regards,
>>> Varun Kumar
>>>
>>> On Mon, Jan 21, 2013 at 8:22 AM, Ben Kim <benkimkim...@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> I followed the decommissioning guide on the Hadoop HDFS wiki.
>>>>
>>>> The HDFS web UI shows that the decommissioning process has
>>>> successfully begun.
>>>>
>>>> It started redeploying 80,000 blocks across the cluster, but for some
>>>> reason it stopped at 9059 blocks. I've waited 30 hours and still no
>>>> progress.
>>>>
>>>> Anyone with any idea?
>>>> --
>>>>
>>>> *Benjamin Kim*
>>>> *benkimkimben at gmail*
>>>
>>> --
>>> Regards,
>>> Varun Kumar.P
>>
>> --
>>
>> *Benjamin Kim*
>> *benkimkimben at gmail*
>
> --
>
> *Benjamin Kim*
> *benkimkimben at gmail*

--

*Benjamin Kim*
*benkimkimben at gmail*
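P.S. In case it helps whoever picks this up: a rough sketch of how I would
try to track down the file that owns the stubborn block and drop its
replication back to 3. The grep context size is a guess and the target path
is a placeholder:

  # list every file with its blocks, then find the one owning the block
  # from the logs above (-B keeps the preceding lines, where fsck prints
  # the file name; this can be slow on a large namespace)
  hadoop fsck / -files -blocks | grep -B 10 "blk_4844131893883391179"

  # if that file was written with replication 10 (job jars are a common
  # culprit, since mapred.submit.replication defaults to 10), lower it;
  # -w waits until the replication change is actually carried out
  hadoop fs -setrep -w 3 /path/to/offending/file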