[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728959#comment-14728959
 ] 

Walter Su commented on HDFS-8383:
---------------------------------

There was a debate long ago about parallel write versus pipeline write. Parallel
write did not look very compelling. If HDFS supported parallel write,
implementing DFSStripedOutputStream would be quite easy.
DFSStripedOutputStream/StripedDataStreamer is very similar to parallel write.
If you change DFSStripedOutputStream.writeChunk(..), you can do parallel write
for non-EC files easily. We have done the heavy lifting (synchronization), but
we don't want to change much of the existing code of the pipeline mechanism.

bq. Right now when a DN (e.g. DN_0) fails, we handle other streams (DN_1~DN_5) 
as if each of them has a failed DN. We trigger processDatanodeError to close 
the stream and open again with the same DN. This overhead isn't really 
necessary. IIUC all we want to do is to bump the GenerationStamp for internal 
blocks 1~5. Can we do it by sending a packet (or piggybacking with a data 
packet) to DN?
I think that is incompatible, and it changes the protocol of the pipeline
mechanism. There is nothing I can do for a single failure. For multiple
failures, I do suggest interrupting the ongoing recovery to reduce the number
of stream open/close operations. I have added a TODO.

bq. By doing the above we can also simplify the error handling logic. All we 
need is an AtomicInteger groupGS in DFSStripedOutputStream recording the 
current GS. Each failed streamer should increment groupGS. Each streamer can 
compare groupGS with its current GS before sending the next packet.
Without the #2 improvement, this is just a matter of passive vs. active.
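
For reference, the quoted groupGS idea could be sketched roughly as below. This is only an illustration, not code from the patch: the class names {{GroupGsSketch}}, {{GroupState}} and {{Streamer}} and the methods {{reportFailure}}, {{needsGsUpdate}} and {{syncGs}} are all made up, and the real StripedDataStreamer would also have to talk to the NN and DN to actually update the GS.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the quoted proposal; not the actual HDFS code. */
class GroupGsSketch {

    /** State shared by all streamers of one block group (the quoted "groupGS"). */
    static final class GroupState {
        final AtomicInteger groupGS = new AtomicInteger(0);
    }

    /** Hypothetical stand-in for a single streamer of the block group. */
    static final class Streamer {
        private final GroupState group;
        private int currentGS; // the GS this streamer last synchronized to

        Streamer(GroupState group) {
            this.group = group;
            this.currentGS = group.groupGS.get();
        }

        /** A streamer whose DN failed bumps the shared counter. */
        void reportFailure() {
            group.groupGS.incrementAndGet();
        }

        /** Checked before sending the next packet: has another streamer failed? */
        boolean needsGsUpdate() {
            return currentGS != group.groupGS.get();
        }

        /** Placeholder for the real GS update; here it only records the new value. */
        void syncGs() {
            currentGS = group.groupGS.get();
        }
    }

    public static void main(String[] args) {
        GroupState group = new GroupState();
        Streamer failed = new Streamer(group);
        Streamer healthy = new Streamer(group);

        failed.reportFailure();
        if (healthy.needsGsUpdate()) {
            healthy.syncGs(); // the healthy streamer notices the change on its own
        }
        System.out.println("healthy streamer in sync: " + !healthy.needsGsUpdate());
    }
}
```

In this form each healthy streamer discovers the bump by itself before its next packet, instead of being told about it from outside.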

bq. Regardless of this change, the write error handling logic is already very 
complex IMO. Maybe we can consider moving locateFollowingBlock to OutputStream 
level so the streamer's task is capped within a single block. For non-EC files 
this refactor will also facilitate HDFS-8955.
OutputStream and streamer have different roles to play. I think
{{locateFollowingBlock}} belongs to the streamer. Actually, it should belong to
a single {{BlockGroupDataStreamer}} that communicates with the NN to
allocate/update blocks, while {{StripedDataStreamer}} only has to stream blocks
to the DN. But I think it's OK not to separate them and just let the fastest
streamer take the job.
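
One possible way to let the fastest streamer take the job is a compare-and-set on a shared counter, as in the sketch below. The class {{FastestStreamerSketch}} and the method {{tryClaimAllocation}} are hypothetical names for illustration only, and the sketch assumes block groups are numbered consecutively; the patch may coordinate the streamers differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the actual coordination code in HDFS. */
class FastestStreamerSketch {

    /** Number of block groups for which the follow-up allocation has been claimed. */
    private final AtomicInteger claimedGroups = new AtomicInteger(0);

    /**
     * Called by a streamer when it finishes its internal block of
     * block group {@code finishedGroup} (0-based).
     *
     * @return true if the caller is the first to finish and should ask the NN
     *         for the next block group; false if another streamer already did.
     */
    boolean tryClaimAllocation(int finishedGroup) {
        // Only one thread can move the counter from finishedGroup to finishedGroup + 1.
        return claimedGroups.compareAndSet(finishedGroup, finishedGroup + 1);
    }

    public static void main(String[] args) {
        FastestStreamerSketch coordinator = new FastestStreamerSketch();
        System.out.println("first finisher claims: " + coordinator.tryClaimAllocation(0));
        System.out.println("second finisher claims: " + coordinator.tryClaimAllocation(0));
    }
}
```

The streamers that lose the race would still need a way to wait for the newly allocated blocks, which this sketch leaves out.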



> Tolerate multiple failures in DFSStripedOutputStream
> ----------------------------------------------------
>
>                 Key: HDFS-8383
>                 URL: https://issues.apache.org/jira/browse/HDFS-8383
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Walter Su
>         Attachments: HDFS-8383.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
