[ https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530193#comment-14530193 ]
Kai Zheng commented on HADOOP-11847:
------------------------------------

Hi [~hitliuyi],

Thanks for your good thoughts about the decoder API. It's refined as below. How do you like it? Thanks.

{code}
/**
 * Decode with inputs and erasedIndexes, generating outputs.
 * How to prepare the inputs:
 * 1. Create an array containing parity units followed by data units;
 * 2. Set null in the array locations specified via erasedIndexes to indicate
 *    they're erased and no data is to be read from them;
 * 3. Set null in the array locations for extra redundant items, as they're
 *    not necessary to read when decoding. For example in RS-6-3, if only 1
 *    unit is really erased, then we have 2 extra items as redundant. They can
 *    be set to null to indicate no data will be used from them.
 *
 * For an example using RS (6, 3), assume sources (d0, d1, d2, d3, d4, d5)
 * and parities (p0, p1, p2), with d2 erased. We can and may want to use only
 * 6 units like (d1, d3, d4, d5, p0, p2) to recover d2. We will have:
 *   inputs = [p0, null(p1), p2, null(d0), d1, null(d2), d3, d4, d5]
 *   erasedIndexes = [5]  // index of d2 in the inputs array
 *   outputs = [a-writable-buffer]
 *
 * @param inputs inputs to read data from
 * @param erasedIndexes indexes of erased units in the inputs array
 * @param outputs outputs to write into for data generated according to
 *                erasedIndexes
 */
public void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs);
{code}

The impact from the caller's point of view: the caller must provide the input buffers, using null to indicate units that are erased or not to be read; and it must provide erasedIndexes covering only the units that are really erased and need to be recovered.
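To make the layout convention concrete, here is a minimal standalone Java sketch (not Hadoop code; {{buildInputs}} and {{erasedIndex}} are hypothetical helper names used purely for illustration) that prepares the inputs array for the RS (6, 3) example above:

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch of the decode() input-preparation convention:
 * parity units first, then data units, with null marking slots that are
 * erased or deliberately not read. Not part of the Hadoop API.
 */
public class DecodeInputPrep {

  /**
   * Lay out the inputs array: parity units occupy slots [0, parity.length),
   * data units follow. A false entry in useParity/useData leaves null in the
   * corresponding slot, meaning "erased or no data to read from here".
   */
  static ByteBuffer[] buildInputs(ByteBuffer[] parity, ByteBuffer[] data,
                                  boolean[] useParity, boolean[] useData) {
    ByteBuffer[] inputs = new ByteBuffer[parity.length + data.length];
    for (int i = 0; i < parity.length; i++) {
      inputs[i] = useParity[i] ? parity[i] : null;
    }
    for (int i = 0; i < data.length; i++) {
      inputs[parity.length + i] = useData[i] ? data[i] : null;
    }
    return inputs;
  }

  /** Index of data unit d_i in the inputs array laid out above. */
  static int erasedIndex(int numParity, int dataUnitIndex) {
    return numParity + dataUnitIndex;
  }
}
```

With this layout, recovering d2 in RS (6, 3) while skipping the redundant p1 and d0 gives {{erasedIndex(3, 2) == 5}}, matching {{erasedIndexes = [5]}} in the Javadoc example.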
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
>                 Key: HADOOP-11847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11847
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-11847-HDFS-7285-v3.patch,
> HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least
> required inputs while decoding. It will also refine and document the relevant
> APIs for better understanding and usage. Using the least required inputs may
> add computation overhead but will possibly outperform overall, since less
> network traffic and disk IO are involved.
> This is something planned to do but just got reminded by [~zhz]'s question
> raised in HDFS-7678, also copied here:
> bq. Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2
> is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should
> I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)