[ https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271900#comment-14271900 ]
Zhe Zhang commented on HDFS-7344:
---------------------------------

Some quick comments:

Cosmetics:
# It seems the {{ec}} package should at least be under hdfs/?
# All test classes should be under src/test instead of src/main.

Code logic:
# Looks like we need an updated design doc?
# In general I think the client implementation (HDFS-7545) should go before the DN work:
** Client support is needed in regular I/O, while the DN is only involved in recovery and conversion.
** I see that the DN patch here tries to reuse the client-side striping/codec logic (e.g., {{ECRemoteBlockWriter}}). It would help to finalize the client code itself first.
# Apparently {{ECRemoteBlockWriter}} is a copy of {{DFSOutputStream}} now. Many of the complex components in {{DFSOutputStream}} (e.g., {{DataStreamer}}) are only useful on the client side. For example, it needs a {{dataQueue}} to buffer packets because the client might write data slowly and in small units. The client write pipeline is actually very complicated and should be avoided if possible. How much benefit is there for a DN to transfer recovered/converted data to peer DNs in small units, rather than after the entire block is recovered/converted?
# How does the DN initiate ECWorker?
# ECRemoteBlockReader extends ECBlockReaderBase, which implements ECBlockReader: is this abstraction necessary? That is, apart from ECRemoteBlockReader, what other block readers could extend ECBlockReaderBase?
# In general, rather than referring to and leveraging client-side reader/writer code, I think we should refer to DN-side transfer functions like {{DataNode#transferBlock}}, which are much simpler.
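To make the question about small units versus whole-block transfer concrete, here is a minimal sketch of the whole-block alternative. It is an illustration under stated assumptions, not code from the patch: it uses a single XOR parity block instead of the codec schema from HDFS-7285, and every name in it ({{WholeBlockRecoverySketch}}, {{BlockSink}}, {{recoverAndTransfer}}) is hypothetical. The point is only that once the block is fully recovered in memory, one transfer call is enough and no packet queue is involved.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names come from the HDFS-7344 patch. */
final class WholeBlockRecoverySketch {

    /** XORs equal-length blocks together. With single XOR parity this yields the parity block. */
    static byte[] xor(byte[][] blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            if (block.length != out.length) {
                throw new IllegalArgumentException("blocks must have equal length");
            }
            for (int i = 0; i < out.length; i++) {
                out[i] ^= block[i];
            }
        }
        return out;
    }

    /** Recovers the one missing data block from the surviving data blocks plus the parity block. */
    static byte[] recover(byte[][] survivors, byte[] parity) {
        byte[][] all = Arrays.copyOf(survivors, survivors.length + 1);
        all[survivors.length] = parity;
        return xor(all);
    }

    /** Assumed stand-in for a DN-side whole-block transfer target (not a real HDFS interface). */
    interface BlockSink {
        void transferBlock(byte[] wholeBlock);
    }

    /** Recovers the entire block first, then hands it over in a single call. */
    static void recoverAndTransfer(byte[][] survivors, byte[] parity, BlockSink target) {
        target.transferBlock(recover(survivors, parity));
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2, 3, 4};
        byte[] d1 = {9, 8, 7, 6};
        byte[] parity = xor(new byte[][] {d0, d1});
        // Pretend d1 was lost; recover it from d0 and the parity block.
        recoverAndTransfer(new byte[][] {d0}, parity,
                block -> System.out.println("recovered d1: " + Arrays.equals(block, d1)));
    }
}
```

A streaming design would replace the single {{transferBlock}} call with a queue of packets and a sender thread, which is the complexity the comment suggests avoiding on the DN.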
> Erasure Coding worker and support in DataNode
> ---------------------------------------------
>
>                 Key: HDFS-7344
>                 URL: https://issues.apache.org/jira/browse/HDFS-7344
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Li Bo
>         Attachments: HDFS ECWorker Design.pdf, hdfs-ec-datanode.0108.zip,
> hdfs-ec-datanode.0108.zip
>
>
> According to HDFS-7285 and the design, this handles DataNode side extension
> and related support for Erasure Coding, and implements ECWorker. It mainly
> covers the following aspects, and separate tasks may be opened to handle each
> of them.
> * Process encoding work, calculating parity blocks as specified in block
> groups and codec schema;
> * Process decoding work, recovering data blocks according to block groups and
> codec schema;
> * Handle client requests for passive recovery blocks data and serving data on
> demand while reconstructing;
> * Write parity blocks according to storage policy.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)