Jing Zhao created HDFS-9818:
-------------------------------

             Summary: Correctly handle EC reconstruction work caused by not enough racks
                 Key: HDFS-9818
                 URL: https://issues.apache.org/jira/browse/HDFS-9818
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: datanode, namenode
    Affects Versions: 3.0.0
            Reporter: Takuya Fukudome
            Assignee: Jing Zhao

This is reported by [~tfukudom]:

In a system test where 1 of 7 datanode racks was stopped, a {{HadoopIllegalArgumentException}} was seen on the DataNode side while reconstructing missing EC blocks:

{code}
2016-02-16 11:09:06,672 WARN  datanode.DataNode (ErasureCodingWorker.java:run(482)) - Failed to recover striped block: BP-480558282-172.29.4.13-1453805190696:blk_-9223372036850962784_278270
org.apache.hadoop.HadoopIllegalArgumentException: Inputs not fully corresponding to erasedIndexes in null places. erasedOrNotToReadIndexes: [1, 2, 6], erasedIndexes: [3]
	at org.apache.hadoop.io.erasurecode.rawcoder.RSRawDecoder.doDecode(RSRawDecoder.java:166)
	at org.apache.hadoop.io.erasurecode.rawcoder.AbstractRawErasureDecoder.decode(AbstractRawErasureDecoder.java:84)
	at org.apache.hadoop.io.erasurecode.rawcoder.RSRawDecoder.decode(RSRawDecoder.java:89)
	at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.recoverTargets(ErasureCodingWorker.java:683)
	at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:465)
{code}
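
The exception text suggests that the decoder expects every index listed in erasedIndexes to be a null slot in its inputs array. Here index 3 is requested for reconstruction, but the null slots are 1, 2 and 6. The following is a minimal sketch of such a consistency check, not the actual {{RSRawDecoder}} code: the class and method names are hypothetical, and the 9-unit layout (6 data + 3 parity) is an assumption, since the report does not state the EC schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
public class ErasedIndexCheck {

    /** Returns the positions whose input buffer is null (erased, or not to be read). */
    static int[] nullPositions(byte[][] inputs) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < inputs.length; i++) {
            if (inputs[i] == null) {
                positions.add(i);
            }
        }
        return positions.stream().mapToInt(Integer::intValue).toArray();
    }

    /** True only if every requested erased index is a null slot in inputs. */
    static boolean erasedIndexesConsistent(byte[][] inputs, int[] erasedIndexes) {
        for (int e : erasedIndexes) {
            if (e < 0 || e >= inputs.length || inputs[e] != null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed layout: 9 units, all present except the null slots from the log.
        byte[][] inputs = new byte[9][];
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = new byte[4];
        }
        inputs[1] = null;
        inputs[2] = null;
        inputs[6] = null;

        int[] erasedIndexes = {3}; // value taken from the log message

        System.out.println("null positions: " + Arrays.toString(nullPositions(inputs)));
        System.out.println("consistent: " + erasedIndexesConsistent(inputs, erasedIndexes));
    }
}
```

With the values from the log, the check fails because the input at index 3 is non-null even though it is listed as erased. Whether that input was supplied by mistake, or the target list was built for a rack-placement move rather than for a genuinely missing block, is what this issue needs to determine.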