[ https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728959#comment-14728959 ]
Walter Su commented on HDFS-8383:
---------------------------------

There was a debate long ago about parallel write versus pipeline write, and parallel write did not look very compelling. If HDFS supported parallel write, implementing DFSStripedOutputStream would have been quite easy. DFSStripedOutputStream/StripedDataStreamer is very much like parallel write: if you change DFSStripedOutputStream.writeChunk(..), you can easily do parallel writes for non-EC files. We have done the heavy lifting (synchronization), but we don't want to change much of the existing pipeline code.

bq. Right now when a DN (e.g. DN_0) fails, we handle other streams (DN_1~DN_5) as if each of them has a failed DN. We trigger processDatanodeError to close the stream and open again with the same DN. This overhead isn't really necessary. IIUC all we want to do is to bump the GenerationStamp for internal blocks 1~5. Can we do it by sending a packet (or piggybacking with a data packet) to DN?

I think that is incompatible, because it changes the protocol of the pipeline mechanism. There is nothing I can do for a single failure. For multiple failures, I do suggest interrupting the ongoing recovery to reduce the number of stream open/close operations. I have added a TODO.

bq. By doing the above we can also simplify the error handling logic. All we need is an AtomicInteger groupGS in DFSStripedOutputStream recording the current GS. Each failed streamer should increment groupGS. Each streamer can compare groupGS with its current GS before sending the next packet.

Without improvement #2, this is just a matter of passive versus active.

bq. Regardless of this change, the write error handling logic is already very complex IMO. Maybe we can consider moving locateFollowingBlock to OutputStream level so the streamer's task is capped within a single block. For non-EC files this refactor will also facilitate HDFS-8955.

OutputStream and streamer have different roles to play. I think {{locateFollowingBlock}} belongs to the streamer.
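
To make the quoted groupGS suggestion concrete, here is a rough sketch of how I read it. The names {{GroupGsSketch}}, {{SketchStreamer}}, {{reportDatanodeFailure}} and {{sendNextPacket}} are made up for illustration and are not in the patch or in HDFS. The step that updates the GS on the DN is left as a placeholder, since that is the part that would need a protocol change.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the quoted groupGS idea; none of these classes exist in HDFS.
class GroupGsSketch {

    /** Stand-in for one streamer of a striped block group. */
    static final class SketchStreamer {
        private final AtomicInteger groupGS; // shared by all streamers in the group
        private int localGS;                 // GS this streamer last synced to
        private int catchUps = 0;            // how often this streamer had to catch up

        SketchStreamer(AtomicInteger groupGS) {
            this.groupGS = groupGS;
            this.localGS = groupGS.get();
        }

        /** A streamer whose DN failed bumps the shared GS. */
        void reportDatanodeFailure() {
            groupGS.incrementAndGet();
        }

        /**
         * Healthy streamers compare the shared GS with their own before each packet.
         * Returns the GS the packet would be sent under.
         */
        int sendNextPacket() {
            int current = groupGS.get();
            if (current != localGS) {
                // Placeholder: real code would have to update the internal
                // block's GS on the DN here, which is the open protocol question.
                localGS = current;
                catchUps++;
            }
            return localGS;
        }

        int catchUps() {
            return catchUps;
        }
    }

    public static void main(String[] args) {
        AtomicInteger groupGS = new AtomicInteger(1001); // arbitrary starting GS
        SketchStreamer failed = new SketchStreamer(groupGS);
        SketchStreamer healthy = new SketchStreamer(groupGS);

        System.out.println(healthy.sendNextPacket()); // 1001
        failed.reportDatanodeFailure();
        System.out.println(healthy.sendNextPacket()); // 1002
        System.out.println(healthy.catchUps());       // 1
    }
}
```

In this sketch a healthy streamer only notices the bump when it is about to send its next packet, so it never has to close and reopen its stream.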
Actually, {{locateFollowingBlock}} should belong to a single {{BlockGroupDataStreamer}} that communicates with the NN to allocate/update blocks, so that {{StripedDataStreamer}} only has to stream blocks to the DN. But I think it's OK not to separate them, and just let the fastest streamer take the job.

> Tolerate multiple failures in DFSStripedOutputStream
> ----------------------------------------------------
>
>                 Key: HDFS-8383
>                 URL: https://issues.apache.org/jira/browse/HDFS-8383
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Walter Su
>         Attachments: HDFS-8383.00.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)