[ https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273774#comment-14273774 ]
Zhe Zhang commented on HDFS-7344:
---------------------------------

bq. In fact, the code uploaded is about non-striping encode/decode, before we decided to implement striping first. Because the basic idea is similar and BlockReader/BlockWriter will be reused, I hope we can get some feedback to help further development.

Thanks for clarifying.

bq. For DN side EC work, multiple blocks may be generated. If we send these blocks after they're entirely generated, each EC work will consume a lot of memory, typically 4*128M~6*128M, and there may be several EC works for a DN at the same time. So a better choice is to allocate a buffer for each EC work (producer-consumer model). When the buffer is full, the encoder/decoder will wait for BlockWriter to write the buffer locally or remotely.

In most recovery cases, each ECWorker only generates 1 block. In conversion, each ECWorker does need to generate multiple blocks. How about generating all these blocks locally (on disk) and then sending them to remote DNs? The downside is increased disk I/O, but my feeling is that the complex {{DataStreamer}} logic is overkill here. It would be great if other folks could chime in.

bq. BlockReader and BlockWriter will have several subclasses, that is, they operate on data locally or remotely, and work in the datanode or the client. We can refine the logic to get the best efficiency for the different classes.

I see. We can leave it as-is then, and wait until we see the client-side implementation to discuss further details.

bq. When DN receives an encoding/decoding work from namenode, it will send it to ECWorker.

Then the patch should modify the {{DataNode}} class?
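The bounded-buffer idea discussed above can be sketched with a standard {{BlockingQueue}}: the encoder (producer) blocks when the buffer is full, so memory per EC task stays bounded instead of holding whole 128MB blocks. This is a minimal illustration only; the class and method names ({{EcBufferSketch}}, {{transfer}}, the chunk and queue sizes) are hypothetical and not part of the HDFS codebase or the attached patch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the producer-consumer buffer described in the
// comment: the encoder fills a bounded queue of chunks; a writer thread
// (standing in for BlockWriter, writing locally or remotely) drains it.
public class EcBufferSketch {
    static final int CHUNK_SIZE = 64 * 1024;  // 64 KB chunks (illustrative)
    static final int QUEUE_CAPACITY = 16;     // caps in-flight memory at ~1 MB

    // Moves totalChunks chunks through the bounded buffer and returns
    // the number of bytes the "writer" side consumed.
    static long transfer(int totalChunks) throws InterruptedException {
        BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(QUEUE_CAPACITY);
        AtomicLong bytesWritten = new AtomicLong();

        // Consumer thread: take() blocks while the buffer is empty.
        Thread writer = new Thread(() -> {
            try {
                for (int i = 0; i < totalChunks; i++) {
                    bytesWritten.addAndGet(buffer.take().length);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Producer: put() blocks when the buffer is full, so the encoder
        // never gets more than QUEUE_CAPACITY chunks ahead of the writer.
        for (int i = 0; i < totalChunks; i++) {
            buffer.put(new byte[CHUNK_SIZE]);
        }
        writer.join();
        return bytesWritten.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(transfer(100)); // 100 * 64 KB = 6553600 bytes
    }
}
```

With this shape, the memory footprint per EC task is {{QUEUE_CAPACITY * CHUNK_SIZE}} regardless of block size, which is the point of the buffering proposal.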
> Erasure Coding worker and support in DataNode
> ---------------------------------------------
>
>                 Key: HDFS-7344
>                 URL: https://issues.apache.org/jira/browse/HDFS-7344
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Li Bo
>         Attachments: HDFS ECWorker Design.pdf, hdfs-ec-datanode.0108.zip, hdfs-ec-datanode.0108.zip
>
>
> According to HDFS-7285 and the design, this handles the DataNode side extension and related support for Erasure Coding, and implements ECWorker. It mainly covers the following aspects, and separate tasks may be opened to handle each of them.
> * Process encoding work, calculating parity blocks as specified in block groups and the codec schema;
> * Process decoding work, recovering data blocks according to block groups and the codec schema;
> * Handle client requests for passive recovery of block data, serving data on demand while reconstructing;
> * Write parity blocks according to storage policy.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)