[ https://issues.apache.org/jira/browse/HDFS-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965550#comment-16965550 ]
Fei Hui commented on HDFS-14946:
--------------------------------

[~ayushtkn] I added the assert because I wanted to tell callers that they should guarantee srcNodes matches liveBlockIndicies. Maybe it is unnecessary.

> Erasure Coding: Block recovery failed during decommissioning
> ------------------------------------------------------------
>
>                 Key: HDFS-14946
>                 URL: https://issues.apache.org/jira/browse/HDFS-14946
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.3, 3.2.1, 3.1.3
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-14946.001.patch, HDFS-14946.002.patch
>
> DataNode logs are as follows:
> {quote}
> org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are provided, not recoverable
>         at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
>         at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:47)
>         at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstructTargets(StripedBlockReconstructor.java:126)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:97)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {quote}
> Block recovery always fails because the srcNodes are in the wrong order.
> Reproduce steps:
> # EC block (b0, b1, b2, b3, b4, b5, b6, b7, b8); b[0-8] are on dn[0-8], and dn[0-3] are decommissioning.
> # dn[1-3] become decommissioned while dn0 is still decommissioning; the EC block is [b0(decommissioning), b[1-3](decommissioned), b[4-8](live), b[0-3](live)].
> # dn4 crashes and b4 must be recovered; the EC block is [b0(decommissioning), b[1-3](decommissioned), null, b[5-8](live), b[0-3](live)].
> We then see the error log above, and b4 is not recovered successfully. The srcNodes transferred to the recovery datanode contain blocks [b0, b[5-8], b[0-3]], and the datanode uses [b0, b[5-8], b0] (the first minRequiredSources readers to reconstruct, where minRequiredSources = Math.min(cellsNum, dataBlkNum)) to recover the missing block, so the chosen inputs contain a duplicate block index and too few distinct blocks to decode.
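A minimal sketch of why taking the first minRequiredSources readers fails in step 3 above. This is plain Java for illustration only; the array literals and the class name are assumptions mirroring the scenario in the description, not the actual HDFS data structures. With a duplicate block index among the chosen sources, fewer than dataBlkNum distinct inputs reach the decoder, which then throws "No enough valid inputs are provided".

{code:java}
import java.util.LinkedHashSet;
import java.util.Set;

public class EcSourceSelectionSketch {
  public static void main(String[] args) {
    int dataBlkNum = 6; // e.g. RS-6-3: 6 distinct data inputs needed to decode

    // Source order reported after decommissioning (step 3 above):
    // b0(decommissioning), b5-b8(live), then the live replicas of b0-b3.
    int[] liveBlockIndices = {0, 5, 6, 7, 8, 0, 1, 2, 3};

    // minRequiredSources = Math.min(cellsNum, dataBlkNum); for a full
    // stripe this is dataBlkNum = 6.
    int minRequiredSources = dataBlkNum;

    // Naive selection: take the first minRequiredSources readers in order.
    Set<Integer> distinct = new LinkedHashSet<>();
    for (int i = 0; i < minRequiredSources; i++) {
      distinct.add(liveBlockIndices[i]);
    }

    // Prints "5 distinct inputs, need 6": block index 0 was picked twice,
    // so reconstruction fails exactly as in the stack trace above.
    System.out.println(distinct.size() + " distinct inputs, need " + dataBlkNum);
  }
}
{code}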
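On the assert mentioned in the comment at the top: a hedged sketch of the kind of caller-side contract it expresses. The class, method, and parameter names here are hypothetical, not the actual patch; the point is only that srcNodes[i] must hold the block whose index is liveBlockIndicies[i], so a mismatch should fail fast rather than surface later inside the decoder.

{code:java}
public final class SourceAlignmentCheck {
  private SourceAlignmentCheck() {}

  /**
   * Callers must keep the two arrays aligned one-to-one: the i-th source
   * node holds the block with the i-th index. The generic type T stands in
   * for the real datanode descriptor type. Run with -ea to enable asserts.
   */
  static <T> void checkAligned(T[] srcNodes, byte[] liveBlockIndicies) {
    assert srcNodes.length == liveBlockIndicies.length
        : "callers must guarantee srcNodes matches liveBlockIndicies";
  }

  public static void main(String[] args) {
    checkAligned(new String[] {"dn0", "dn5"}, new byte[] {0, 5}); // ok
  }
}
{code}

Note that a length check alone cannot catch a wrong ordering of equal-length arrays; the assert documents the alignment contract for callers rather than fully enforcing it.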