David Watzke created HDFS-9955:
----------------------------------

             Summary: DataNode won't self-heal after some block dirs were manually misplaced
                 Key: HDFS-9955
                 URL: https://issues.apache.org/jira/browse/HDFS-9955
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.6.0
         Environment: CentOS 6, Cloudera 5.4.4 (patched Hadoop 2.6.0)
            Reporter: David Watzke


I accidentally ran this tool on top of a DataNode's datadirs (the datanode was shut down at the time):
https://github.com/killerwhile/volume-balancer

The tool makes assumptions about block directory placement that are no longer valid in Hadoop 2.6.0: it simply moves block directories around between datadirs to balance disk usage. Granted, running it was not a good idea, but my concern is how the datanode handled (or rather failed to handle) the resulting state. The DN log messages below show that the DN knew about the misplaced blocks but did nothing to fix them (i.e. self-heal by copying the other replica), which seems like a bug to me. If you need any additional info, please just ask.
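For context, here is a minimal sketch of the block-ID-based layout that HDFS-6482 introduced in 2.6.0, which is exactly the assumption the balancer tool breaks. This is an illustrative reimplementation of the idToBlockDir mapping (the 0xFF masks match the 256x256 layout of 2.6.0; later releases narrowed it to 32x32 in HDFS-8791), not the exact Hadoop source. The point is that the DataNode derives the one directory a finalized block may live in purely from its block ID, so a block file moved anywhere else becomes invisible to lookups:

{code:java}
import java.io.File;

// Illustrative reimplementation of the 2.6.0 block-ID-based layout
// (compare DatanodeUtil.idToBlockDir in the Hadoop source).
public class BlockDirLayout {
    static File idToBlockDir(File finalizedDir, long blockId) {
        // 256x256 subdir tree keyed purely by block ID in 2.6.0.
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return new File(finalizedDir,
                "subdir" + d1 + File.separator + "subdir" + d2);
    }

    public static void main(String[] args) {
        // Block from the log below: the DataNode will only ever
        // look for it in this one directory.
        File finalized = new File("/data/18/cdfs/dn/current/"
                + "BP-680964103-77.234.46.18-1375882473930/current/finalized");
        System.out.println(idToBlockDir(finalized, 1226781281L));
    }
}
{code}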


2016-03-04 12:40:06,008 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-77.234.46.18-1375882473930:blk_-3159875140074863904_0 on volume /data/18/cdfs/dn
2016-03-04 12:40:06,009 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-77.234.46.18-1375882473930:blk_8369468090548520777_0 on volume /data/18/cdfs/dn
2016-03-04 12:40:06,011 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-77.234.46.18-1375882473930:blk_1226431637_0 on volume /data/18/cdfs/dn
2016-03-04 12:40:06,012 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-77.234.46.18-1375882473930:blk_1169332185_0 on volume /data/18/cdfs/dn
2016-03-04 12:40:06,825 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock BP-680964103-77.234.46.18-1375882473930:blk_1226781281_1099829669050 received exception java.io.IOException: BlockId 1226781281 is not valid.
2016-03-04 12:40:06,825 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(5.45.56.30, datanodeUuid=9da950ca-87ae-44ee-9391-0bca669c796b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster12;nsid=1625487778;c=1438754073236):Got exception while serving BP-680964103-77.234.46.18-1375882473930:blk_1226781281_1099829669050 to /5.45.56.30:48146
java.io.IOException: BlockId 1226781281 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:650)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:641)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:214)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:282)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:529)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
        at java.lang.Thread.run(Thread.java:745)
2016-03-04 12:40:06,826 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: prg04-002.ff.avast.com:50010:DataXceiver error processing READ_BLOCK operation  src: /5.45.56.30:48146 dst: /5.45.56.30:50010
java.io.IOException: BlockId 1226781281 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:650)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:641)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:214)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:282)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:529)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
        at java.lang.Thread.run(Thread.java:745)
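In case it helps anyone who hits the same thing: until the DN can self-heal, a hypothetical manual repair would be to walk each volume's finalized tree (with the datanode stopped) and move every blk_* file back into the subdir the ID-based layout expects. A rough sketch under those assumptions follows; this is not a supported Hadoop tool, and the walk-and-move logic and path layout here are my assumptions:

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical repair sketch: relocate misplaced finalized block files
// back to where the 2.6.0 layout expects them. Run once per volume,
// only against a stopped DataNode.
public class RelocateMisplacedBlocks {
    // Same 2.6.0 mapping as in the sketch above (256x256 layout).
    static File expectedDir(File finalizedDir, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return new File(finalizedDir,
                "subdir" + d1 + File.separator + "subdir" + d2);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: one volume's finalized dir, e.g.
        // /data/18/cdfs/dn/current/BP-.../current/finalized
        File finalized = new File(args[0]);
        try (Stream<Path> files = Files.walk(finalized.toPath())) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().startsWith("blk_"))
                 .forEach(p -> {
                     // Names are blk_<id> or blk_<id>_<genstamp>.meta;
                     // the ID may be negative.
                     String name = p.getFileName().toString();
                     long id = Long.parseLong(name.substring(4).split("[_.]")[0]);
                     Path target = expectedDir(finalized, id).toPath().resolve(name);
                     if (!p.getParent().equals(target.getParent())) {
                         try {
                             Files.createDirectories(target.getParent());
                             Files.move(p, target);
                         } catch (IOException e) {
                             System.err.println("skipping " + p + ": " + e);
                         }
                     }
                 });
        }
    }
}
{code}

Since the layout only constrains where a replica lives within a volume, running this once per datadir should make the replicas findable again; whether the DN ought to do something equivalent itself is the question this ticket raises.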


