[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744443#comment-14744443
 ] 

Zhe Zhang commented on HDFS-9040:
---------------------------------

bq. The reason I prefer not to do locateFollowingBlock in DFSOutputStream is, 
DFSOutputStream is async with DataStreamer
bq. Yeah, I can understand your concern. In the replication mechanism, the 
async implementation matches the single write pipeline model, and the 
datastreamer can handle its failure perfectly. But with 9 streamers in 
parallel, we need to 1) sync all the streamers when writing a new block, and 2) 
stop all the streamers and assign them with new GS when failure happens. Thus I 
think we'd better add some sync code in DFSStripedOutputStream. Also in this 
way it becomes easier to calculate block length and set/reset external error 
state.
Very good discussion here. Jing's patch leaves the behavior of non-EC 
{{DFSOutputStream}} and {{DataStreamer}} unchanged: the streamer is still in 
charge of locating following blocks. I think we should probably change that as 
well so that {{OutputStream}} and streamer have consistent roles under both 
contiguous and striped layouts.

bq. Currently the fastest streamer also has to wait for other streamers before 
requesting a following block group from NN, so I think we may not feel the 
writing speed becomes slow.
Considering the buffer in {{DFSOutputStream}}, the above is only partially 
true. Performance-wise it still makes sense to decouple 
{{locateFollowingBlock}} from the main {{DFSOutputStream}} thread. How about 
starting a separate thread to allocate the new block?
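
To make the separate-thread idea concrete, here is a rough sketch, not taken from any of the attached patches. {{BlockPrefetcher}} and its {{Supplier}}-based allocator are hypothetical names; the supplier only stands in for the NN RPC that {{locateFollowingBlock}} issues. The writer thread calls {{prefetch()}} ahead of time and {{take()}} only when it actually needs the block, so it keeps filling its buffer while the request is in flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of HDFS: requests the next block on a
 * background thread so the writer thread can keep buffering data.
 */
class BlockPrefetcher<B> implements AutoCloseable {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final Supplier<B> allocator; // assumption: stands in for the NN RPC
  private CompletableFuture<B> pending; // at most one request in flight

  BlockPrefetcher(Supplier<B> allocator) {
    this.allocator = allocator;
  }

  /** Start allocating the next block unless a request is already in flight. */
  synchronized void prefetch() {
    if (pending == null) {
      pending = CompletableFuture.supplyAsync(allocator, executor);
    }
  }

  /** Wait for the prefetched block (requesting it now if needed) and return it. */
  B take() {
    CompletableFuture<B> f;
    synchronized (this) {
      prefetch();
      f = pending;
      pending = null;
    }
    // join() rethrows an allocator failure wrapped in CompletionException.
    return f.join();
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

Failure handling (for example, retrying the allocation or bumping the GS) is deliberately left out; in this sketch a failed allocation simply surfaces to the caller of {{take()}}.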

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>         Attachments: HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and 
> StripedDataStreamer s only have to stream blocks to DNs.
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
