[ https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530193#comment-14530193 ]
Kai Zheng commented on HADOOP-11847:
------------------------------------

Hi [~hitliuyi],

Thanks for your good thoughts about the decoder API. It's refined as below. How do you like it? Thanks.

{code}
/**
 * Decode with inputs and erasedIndexes, generating outputs.
 * How to prepare the inputs:
 * 1. Create an array containing parity units followed by data units;
 * 2. Set null in the array locations specified via erasedIndexes to indicate
 *    they're erased and no data is to be read from them;
 * 3. Set null in the array locations for extra redundant items, as they're
 *    not necessary to read when decoding. For example in RS-6-3, if only 1
 *    unit is really erased, then we have 2 extra items as redundant. They can
 *    be set to null to indicate no data will be used from them.
 *
 * For an example using RS (6, 3), assume sources (d0, d1, d2, d3, d4, d5)
 * and parities (p0, p1, p2), with d2 erased. We can and may want to use only
 * 6 units like (d1, d3, d4, d5, p0, p2) to recover d2. We will have:
 *   inputs = [p0, null(p1), p2, null(d0), d1, null(d2), d3, d4, d5]
 *   erasedIndexes = [5]  // index of d2 in the inputs array
 *   outputs = [a-writable-buffer]
 *
 * @param inputs inputs to read data from
 * @param erasedIndexes indexes of erased units in the inputs array
 * @param outputs outputs to write into for data generated according to
 *                erasedIndexes
 */
public void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs);
{code}

The impact from the caller's point of view: the caller must provide the input buffers, using null to indicate units that are erased or not to be read; and it must provide erasedIndexes covering only the units that are really erased and need to be recovered.
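To make the layout convention concrete, here is a minimal standalone Java sketch (not Hadoop code; {{buildInputs}} and {{erasedIndex}} are hypothetical helper names used purely for illustration) that prepares the inputs array for the RS (6, 3) example above:

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch of the decode() input-preparation convention:
 * parity units first, then data units, with null marking slots that are
 * erased or deliberately not read. Not part of the Hadoop API.
 */
public class DecodeInputPrep {

  /**
   * Lay out the inputs array: parity units occupy slots [0, parity.length),
   * data units follow. A false entry in useParity/useData leaves null in the
   * corresponding slot, meaning "erased or no data to read from here".
   */
  static ByteBuffer[] buildInputs(ByteBuffer[] parity, ByteBuffer[] data,
                                  boolean[] useParity, boolean[] useData) {
    ByteBuffer[] inputs = new ByteBuffer[parity.length + data.length];
    for (int i = 0; i < parity.length; i++) {
      inputs[i] = useParity[i] ? parity[i] : null;
    }
    for (int i = 0; i < data.length; i++) {
      inputs[parity.length + i] = useData[i] ? data[i] : null;
    }
    return inputs;
  }

  /** Index of data unit d_i in the inputs array laid out above. */
  static int erasedIndex(int numParity, int dataUnitIndex) {
    return numParity + dataUnitIndex;
  }
}
```

With this layout, recovering d2 in RS (6, 3) while skipping the redundant p1 and d0 gives {{erasedIndex(3, 2) == 5}}, matching {{erasedIndexes = [5]}} in the Javadoc example.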
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
>                 Key: HADOOP-11847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11847
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-11847-HDFS-7285-v3.patch,
> HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least
> required inputs while decoding. It will also refine and document the relevant
> APIs for better understanding and usage. Using the least required inputs may
> add computation overhead but will possibly outperform overall, since less
> network traffic and disk IO are involved.
> This is something planned to do but just got reminded by [~zhz]'s question
> raised in HDFS-7678, also copied here:
> bq. Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2
> is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should
> I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)