[ https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555514#comment-14555514 ]
Yi Liu commented on HADOOP-11847:
---------------------------------

{quote}
Sorry, I missed explaining why the code is like that. The thinking was that it's rarely the first unit that's erased, so in most cases just checking inputs\[0\] will return the wanted result, avoiding entering the loop.
{quote}
If the first element is not null, it returns immediately. Will it still enter the loop?

{quote}
How about simply having maxInvalidUnits = numParityUnits? The benefit is that we don't have to re-allocate the shared buffers for different erasures.
{quote}
We don't need to allocate {{numParityUnits}} buffers; the output should have at least one, right? Maybe more than one. I don't think we have to re-allocate the shared buffers for different erasures: if the existing buffers are not enough, we allocate new ones and add them to the shared pool, which is typical behavior.

{quote}
We don't have or use chunkSize now. Please note the check is:
{quote}
Right, we don't need to use chunkSize now. I think {{bytesArrayBuffers\[0\].length < dataLen}} is OK. {{ensureBytesArrayBuffer}} and {{ensureDirectBuffers}} need to be renamed and rewritten per the above comments.

{quote}
Would you check again, thanks.
{quote}
{code}
for (int i = 0; i < adjustedByteArrayOutputsParameter.length; i++) {
  adjustedByteArrayOutputsParameter[i] =
      resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedOutputOffsets[i] = 0; // Always 0 for such temp output
}

int outputIdx = 0;
for (int i = 0; i < erasedIndexes.length; i++, outputIdx++) {
  for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
    // If this index is one requested by the caller via erasedIndexes, then
    // we use the passed output buffer to avoid copying data thereafter.
    if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
      adjustedByteArrayOutputsParameter[j] =
          resetBuffer(outputs[outputIdx], 0, dataLen);
      adjustedOutputOffsets[j] = outputOffsets[outputIdx];
    }
  }
}
{code}
You call {{resetBuffer}} parityNum + erasedIndexes times, is that true?
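The buffer-pool behavior discussed above (grow the shared pool lazily instead of pre-allocating {{numParityUnits}} buffers, and re-allocate an individual buffer only when {{dataLen}} outgrows it) could be sketched roughly as follows. This is an illustrative sketch, not the patch's actual code; the class and method names here are made up for the example, and {{resetBuffer}}'s zeroing is approximated with {{Arrays.fill}}:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch of a lazily grown shared byte-array buffer pool:
 * allocate a buffer only when the pool runs short, and replace an
 * existing buffer only when the requested dataLen outgrows it.
 */
public class SharedBufferPool {
  private final List<byte[]> buffers = new ArrayList<>();

  /** Return the idx-th shared buffer, at least dataLen bytes long, zeroed. */
  public byte[] ensureBuffer(int idx, int dataLen) {
    while (buffers.size() <= idx) {
      buffers.add(new byte[dataLen]);   // grow the pool on demand
    }
    byte[] buf = buffers.get(idx);
    if (buf.length < dataLen) {
      buf = new byte[dataLen];          // existing buffer too small, replace it
      buffers.set(idx, buf);
    }
    Arrays.fill(buf, 0, dataLen, (byte) 0); // reset before reuse
    return buf;
  }
}
```

With this shape, different erasure patterns simply request however many buffers they need, and the pool keeps whatever it has already allocated for reuse.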
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
>                 Key: HADOOP-11847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11847
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-11847-HDFS-7285-v3.patch, HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least required inputs while decoding. It will also refine and document the relevant APIs for better understanding and usage. Using only the least required inputs may add computing overhead but will possibly outperform overall, since less network traffic and disk IO are involved.
> This is something planned to do but just got reminded by [~zhz]'s question raised in HDFS-7678, also copied here:
> bq. Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
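The input layout behind [~zhz]'s question could be illustrated as follows. This is a hedged sketch of the convention the issue aims to make obvious (one input slot per unit, null for units that are erased or deliberately not read); the helper name and types here are invented for the example and are not the actual {{RawErasureDecoder}} API:

```java
/**
 * Illustrative sketch: build a decode inputs array for an RS(6+3) schema
 * where one slot exists per unit (6 data + 3 parity = 9) and units that
 * are erased or not read are left null.
 */
public class DecodeInputsSketch {
  public static byte[][] buildInputs(byte[][] readBlocks, int[] readIndexes,
                                     int numUnits) {
    byte[][] inputs = new byte[numUnits][]; // null means erased / not read
    for (int i = 0; i < readIndexes.length; i++) {
      inputs[readIndexes[i]] = readBlocks[i]; // place each read block in its slot
    }
    return inputs;
  }
}
```

For the question's scenario (block #2 erased, blocks 0, 1, 3, 4, 5, 8 read), the resulting 9-slot array would carry data at indexes 0, 1, 3, 4, 5, 8 and null at indexes 2, 6, and 7.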