[ https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555514#comment-14555514 ]
Yi Liu commented on HADOOP-11847:
---------------------------------

{quote}
Sorry, I missed explaining why the code is like that. The thinking was that it's rarely the first unit that's erased, so in most cases just checking inputs\[0\] will return the wanted result, avoiding entering the loop.
{quote}
If the first element is not null, it returns immediately. Will it still enter the loop?

{quote}
How about simply having maxInvalidUnits = numParityUnits? The benefit is that we don't have to re-allocate the shared buffers for different erasures.
{quote}
We don't need to allocate {{numParityUnits}} buffers; the output should have at least one, right? Maybe more than one. I don't think we have to re-allocate the shared buffers for different erasures: if the existing buffers are not enough, we allocate new ones and add them to the shared pool, which is typical behavior.

{quote}
We don't have or use chunkSize now. Please note the check is:
{quote}
Right, we don't need to use chunkSize now. I think {{bytesArrayBuffers\[0\].length < dataLen}} is OK. {{ensureBytesArrayBuffer}} and {{ensureDirectBuffers}} need to be renamed and rewritten per the above comments.

{quote}
Would you check again, thanks.
{quote}
{code}
for (int i = 0; i < adjustedByteArrayOutputsParameter.length; i++) {
  adjustedByteArrayOutputsParameter[i] =
      resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedOutputOffsets[i] = 0; // Always 0 for such temp output
}

int outputIdx = 0;
for (int i = 0; i < erasedIndexes.length; i++, outputIdx++) {
  for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
    // If this index is one requested by the caller via erasedIndexes, then
    // we use the passed output buffer to avoid copying data thereafter.
    if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
      adjustedByteArrayOutputsParameter[j] =
          resetBuffer(outputs[outputIdx], 0, dataLen);
      adjustedOutputOffsets[j] = outputOffsets[outputIdx];
    }
  }
}
{code}
You call {{resetBuffer}} parityNum + erasedIndexes times, is that true?
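The buffer-pool behavior discussed above (grow the shared pool lazily instead of pre-allocating {{numParityUnits}} buffers, and re-allocate an individual buffer only when {{dataLen}} outgrows it) could be sketched roughly as follows. This is an illustrative sketch, not the patch's actual code; the class and method names here are made up for the example, and {{resetBuffer}}'s zeroing is approximated with {{Arrays.fill}}:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch of a lazily grown shared byte-array buffer pool:
 * allocate a buffer only when the pool runs short, and replace an
 * existing buffer only when the requested dataLen outgrows it.
 */
public class SharedBufferPool {
  private final List<byte[]> buffers = new ArrayList<>();

  /** Return the idx-th shared buffer, at least dataLen bytes long, zeroed. */
  public byte[] ensureBuffer(int idx, int dataLen) {
    while (buffers.size() <= idx) {
      buffers.add(new byte[dataLen]);   // grow the pool on demand
    }
    byte[] buf = buffers.get(idx);
    if (buf.length < dataLen) {
      buf = new byte[dataLen];          // existing buffer too small, replace it
      buffers.set(idx, buf);
    }
    Arrays.fill(buf, 0, dataLen, (byte) 0); // reset before reuse
    return buf;
  }
}
```

With this shape, different erasure patterns simply request however many buffers they need, and the pool keeps whatever it has already allocated for reuse.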
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
>                 Key: HADOOP-11847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11847
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-11847-HDFS-7285-v3.patch, HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least required inputs while decoding. It will also refine and document the relevant APIs for better understanding and usage. Using only the least required inputs may add computing overhead but will possibly outperform overall, since less network traffic and disk IO are involved.
> This is something planned to do but just got reminded by [~zhz]'s question raised in HDFS-7678, also copied here:
> bq. Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
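The input layout behind [~zhz]'s question could be illustrated as follows. This is a hedged sketch of the convention the issue aims to make obvious (one input slot per unit, null for units that are erased or deliberately not read); the helper name and types here are invented for the example and are not the actual {{RawErasureDecoder}} API:

```java
/**
 * Illustrative sketch: build a decode inputs array for an RS(6+3) schema
 * where one slot exists per unit (6 data + 3 parity = 9) and units that
 * are erased or not read are left null.
 */
public class DecodeInputsSketch {
  public static byte[][] buildInputs(byte[][] readBlocks, int[] readIndexes,
                                     int numUnits) {
    byte[][] inputs = new byte[numUnits][]; // null means erased / not read
    for (int i = 0; i < readIndexes.length; i++) {
      inputs[readIndexes[i]] = readBlocks[i]; // place each read block in its slot
    }
    return inputs;
  }
}
```

For the question's scenario (block #2 erased, blocks 0, 1, 3, 4, 5, 8 read), the resulting 9-slot array would carry data at indexes 0, 1, 3, 4, 5, 8 and null at indexes 2, 6, and 7.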