[ 
https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16544:
----------------------------
    Description: 
In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we found an EC file decoding bug that occurs when more than one data block read fails.

Now we have found another bug, triggered by #StatefulStripeReader.decode.

If we read an EC file whose {*}length is more than one stripe{*}, and the file has *one data block* and *the first parity block* corrupted, the following error occurs.
{code:java}
org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
    at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
    at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
    at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
    at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
    at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
    at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
    at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
    at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
    at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918)
{code}
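
For context, here is a minimal standalone sketch (not Hadoop code; the class name and method are invented for illustration) of the kind of null check that ByteBufferDecodingState performs, showing why a single unconstructed decode buffer aborts the whole read:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch mirroring the style of check that raises
// "Invalid buffer found, not allowing null" in the stack trace above.
public class BufferCheck {
  static void checkBuffers(ByteBuffer[] buffers) {
    for (ByteBuffer b : buffers) {
      if (b == null) {
        // One null slot is enough to fail the entire decode call.
        throw new IllegalArgumentException(
            "Invalid buffer found, not allowing null");
      }
    }
  }

  public static void main(String[] args) {
    ByteBuffer[] inputs = new ByteBuffer[3];
    inputs[0] = ByteBuffer.allocate(8);
    inputs[1] = ByteBuffer.allocate(8);
    // inputs[2] was never constructed, so the check rejects the array.
    try {
      checkBuffers(inputs);
    } catch (IllegalArgumentException e) {
      System.out.println("decode aborted: " + e.getMessage());
    }
  }
}
```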
 

Let's say we use ec(6+3), and data block[0] and the first parity block[6] are corrupted.
 # The readers for block[0] and block[6] are closed after the first stripe of the EC file is read;
 # When the client reads the second stripe of the EC file, it triggers #prepareParityChunk for block[6];
 # decodeInputs[6] is never constructed, because the reader for block[6] was already closed.

 
{code:java}
boolean prepareParityChunk(int index) {
  Preconditions.checkState(index >= dataBlkNum
      && alignedStripe.chunks[index] == null);
  if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
    alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
    // we have failed the block reader before
    return false;
  }
  final int parityIndex = index - dataBlkNum;
  ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
  buf.position(cellSize * parityIndex);
  buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
  decodeInputs[index] =
      new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
  alignedStripe.chunks[index] =
      new StripingChunk(decodeInputs[index].getBuffer());
  return true;
}
{code}
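
The three steps above can be modeled with a small standalone sketch (hypothetical class and field names, not the Hadoop implementation): once a reader is marked shouldSkip after the first stripe, the early return in prepareParityChunk leaves the corresponding decode input null, which is exactly the buffer the decoder later rejects.

```java
// Hypothetical sketch of the failure sequence; ec(6+3) layout assumed.
public class StripeBugSketch {
  static final int DATA_BLK_NUM = 6;
  static final int TOTAL_BLKS = 9;

  boolean[] shouldSkip = new boolean[TOTAL_BLKS];
  Object[] decodeInputs = new Object[TOTAL_BLKS];

  // Step 1: readers for block[0] and block[6] fail while reading stripe 1
  // and are closed, leaving their shouldSkip flags set.
  void failReaders() {
    shouldSkip[0] = true;
    shouldSkip[6] = true;
  }

  // Steps 2-3: preparing the parity chunk for stripe 2 hits the skip
  // branch and returns before constructing the decode input.
  boolean prepareParityChunk(int index) {
    if (shouldSkip[index]) {
      return false;                     // chunk marked MISSING, input left null
    }
    decodeInputs[index] = new Object(); // stands in for the ECChunk
    return true;
  }

  public static void main(String[] args) {
    StripeBugSketch s = new StripeBugSketch();
    s.failReaders();
    s.prepareParityChunk(6);
    // decodeInputs[6] is still null, so decode() would reject the buffers.
    System.out.println("decodeInputs[6] == null: " + (s.decodeInputs[6] == null));
  }
}
```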
 



> EC decoding failed due to invalid buffer
> ----------------------------------------
>
>                 Key: HDFS-16544
>                 URL: https://issues.apache.org/jira/browse/HDFS-16544
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>            Reporter: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
