[ https://issues.apache.org/jira/browse/HDFS-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338600#comment-15338600 ]
Rakesh R edited comment on HDFS-10460 at 6/20/16 11:41 AM:
-----------------------------------------------------------

Thanks [~drankye] for the review comments.

bq. 1. Could you explain why we need to add actualNumBytes for this, or elaborate a bit in the description for better understanding

I've used the {{actualNumBytes}} parameter to reconstruct the block correctly. Initially I tried the {{requestLength}} value for reconstructing the block, but that hits the following exception. IIUC, this can occur when the requested length conflicts with the target buffer size. You can probably reproduce the exception by applying my patch, commenting out the line that sets {{actualNumBytes}}, and running {{TestFileChecksum#testStripedFileChecksumWithMissedDataBlocksRangeQuery1}}:

{code}
// BlockChecksumHelper.java, line 481
ExtendedBlock reconBlockGroup = new ExtendedBlock(blockGroup);
// reconBlockGroup.setNumBytes(actualNumBytes);
{code}

{code}
2016-06-19 21:37:34,583 [DataXceiver for client /127.0.0.1:5882 [Getting checksum for block group BP-1490511527-10.252.155.196-1466352430600:blk_-9223372036854775792_1001]] ERROR datanode.DataNode (DataXceiver.java:run(316)) - 127.0.0.1:5333:DataXceiver error processing BLOCK_GROUP_CHECKSUM operation src: /127.0.0.1:5882 dst: /127.0.0.1:5333
org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are provided, not recoverable
	at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:107)
{code}

I have taken the following approach to handle a {{requestLength}} less than {{cellSize}} (see the sketch at the end of this comment):
1) reconstruct the target buffers using {{actualNumBytes}}
2) take a copy of the target buffer using the remaining (requested) length
3) calculate the checksum over this copied buffer

{code}
// StripedBlockChecksumReconstructor.java, line 93
if (requestedLen <= toReconstructLen) {
  int remainingLen = (int) requestedLen;
  outputData = Arrays.copyOf(targetBuffer.array(), remainingLen);
}
{code}

bq. 1) you mean less than bytesPerCRC, but in fact you passed bytesPerCRC as the request length. 2) you could get bytesPerCRC and save it in setup method? So you can use it in other tests.

Yes, I will make these modifications in the next patch.
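To make the copy-then-checksum step above concrete, here is a minimal standalone sketch (not the actual patch): the names {{targetBuffer}}, {{requestedLen}} and {{toReconstructLen}} mirror the patch, but the buffer is filled with dummy data and {{java.util.zip.CRC32}} stands in for the real {{DataChecksum}} machinery, so treat it as an illustration of the idea only.

{code}
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

public class RangeChecksumSketch {
  public static void main(String[] args) {
    // Pretend this buffer was just filled by the erasure decoder for one
    // reconstruction round; its capacity reflects actualNumBytes, not the
    // (possibly smaller) requested range.
    ByteBuffer targetBuffer = ByteBuffer.allocate(64 * 1024);
    for (int i = 0; i < targetBuffer.capacity(); i++) {
      targetBuffer.put((byte) i);
    }

    long requestedLen = 10;                         // client asked for 10 bytes
    int toReconstructLen = targetBuffer.capacity(); // what the decoder produced

    byte[] outputData;
    if (requestedLen <= toReconstructLen) {
      // Copy only the requested prefix so the checksum covers exactly the
      // range the client asked for, not the whole reconstructed chunk.
      int remainingLen = (int) requestedLen;
      outputData = Arrays.copyOf(targetBuffer.array(), remainingLen);
    } else {
      outputData = targetBuffer.array();
    }

    // Checksum only the copied bytes (CRC32 stands in for DataChecksum).
    CRC32 crc = new CRC32();
    crc.update(outputData, 0, outputData.length);
    System.out.println("checksum over " + outputData.length
        + " byte(s): " + Long.toHexString(crc.getValue()));
  }
}
{code}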
> Erasure Coding: Recompute block checksum for a particular range less than file size on the fly by reconstructing missed block
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10460
>                 URL: https://issues.apache.org/jira/browse/HDFS-10460
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: HDFS-10460-00.patch, HDFS-10460-01.patch
>
> This jira is a follow-on task to HDFS-9833, addressing reconstruction of a missed block and then recalculation of the block checksum for a particular range query.
> For example:
> {code}
> // create a file 'stripedFile1' with fileSize = cellSize * numDataBlocks = 65536 * 6 = 393216
> FileChecksum stripedFileChecksum = getFileChecksum(stripedFile1, 10, true);
> {code}
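As a hedged illustration of the range query in the description above: the three-argument {{getFileChecksum(stripedFile1, 10, true)}} call is presumably a test helper rather than the public API. Assuming the request ultimately reaches the two-argument {{FileSystem#getFileChecksum(Path, long)}} overload, a client-side request would look roughly like the sketch below; the filesystem configuration and the {{/ec/stripedFile1}} path are hypothetical placeholders.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RangeChecksumQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical cluster configuration; point fs.defaultFS at a real
    // HDFS cluster that holds an erasure-coded (striped) file.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path stripedFile1 = new Path("/ec/stripedFile1");

    // Checksum only the first 10 bytes of the file; with this patch the
    // DataNode reconstructs any missed block for that range on the fly
    // before computing the block-group checksum.
    FileChecksum checksum = fs.getFileChecksum(stripedFile1, 10);
    System.out.println("range checksum: " + checksum);
  }
}
{code}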