[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744645#comment-14744645
 ] 

Walter Su commented on HDFS-9040:
---------------------------------

bq. Yeah, I can understand your concern. In the replication mechanism, the 
async implementation matches the single write pipeline model, and the 
datastreamer can handle its failure perfectly. But with 9 streamers in 
parallel, we need to 1) sync all the streamers when writing a new block, and 2) 
stop all the streamers and assign them with new GS when failure happens. Thus I 
think we'd better add some sync code in DFSStripedOutputStream. Also in this 
way it becomes easier to calculate block length and set/reset external error 
state.
Yeah, streamer synchronization is too slow. Doing it in 
DFSStripedOutputStream can't be any slower. I'll take some time to review the 
patch.
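
For reference, the synchronous handling discussed above (stop the streamers, 
then assign a new GS to the survivors) could look roughly like the sketch 
below. This is only an illustration under assumptions: SketchStreamer and 
SketchCoordinator are hypothetical names, not classes from the patch, and the 
GS is bumped locally here, whereas the real GS would come from the NN.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one internal streamer (not the real StripedDataStreamer). */
class SketchStreamer {
    final int index;
    long generationStamp;
    boolean failed;          // set by the caller when this streamer hits an error
    boolean running = true;  // cleared once the coordinator retires the streamer

    SketchStreamer(int index, long generationStamp) {
        this.index = index;
        this.generationStamp = generationStamp;
    }
}

/** Hypothetical coordinator that would live in the striped output stream. */
class SketchCoordinator {
    private final List<SketchStreamer> streamers = new ArrayList<>();
    private long generationStamp;

    SketchCoordinator(int numStreamers, long initialGenerationStamp) {
        this.generationStamp = initialGenerationStamp;
        for (int i = 0; i < numStreamers; i++) {
            streamers.add(new SketchStreamer(i, initialGenerationStamp));
        }
    }

    SketchStreamer streamer(int index) {
        return streamers.get(index);
    }

    /**
     * Synchronous failure handling: retire newly failed streamers, then give
     * every surviving streamer the same new GS. Returns the survivor count.
     */
    int handleFailures() {
        boolean newFailure = false;
        for (SketchStreamer s : streamers) {
            if (s.failed && s.running) {
                s.running = false;
                newFailure = true;
            }
        }
        if (newFailure) {
            // Assumption: a local increment stands in for the GS from the NN.
            generationStamp++;
        }
        int healthy = 0;
        for (SketchStreamer s : streamers) {
            if (s.running) {
                s.generationStamp = generationStamp;
                healthy++;
            }
        }
        return healthy;
    }
}
```

Because one thread walks all streamers, the survivors always end up with the 
same GS, which is the property that is hard to get with 9 independent 
asynchronous streamers.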

bq. With BlockGroupDataStreamer I can make the 9 internal streamers wait until 
error handling has finished, and only then put empty_last_packet to all 9 
internal streamers to let them close their blockStreams.
bq. I actually did a similar thing: closeImpl() first lets all the streamers 
flush out all the data packets, then calls checkStreamerFailures to handle any 
failure during the data transfer, and in the end sends out the last empty 
packet to close the packet. But the challenge here is that we cannot handle a 
failure of the last empty packet the same way, since successful 
streamers may have closed the block already.
closeImpl() handles the last partial blockGroup well. What if the failure 
happens in the last stripe of a full blockGroup? The first # streamers end, but 
one of the last streamers fails.
{noformat}
writeChunk(..) --> super.writeChunk(..) --> enqueueCurrentPacketFull() --> 
endBlock() --> send empty_last_packet
{noformat}
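
To make the scenario concrete, here is a simplified model of the last stripe of 
a full blockGroup. It is a sketch under assumptions, not code from the patch: 
LastStripeSketch and its methods are hypothetical, and it assumes the streamers 
receive their last cell one after another, each closing its block as soon as 
the block is full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the last stripe of a full block group. */
class LastStripeSketch {
    enum State { OPEN, CLOSED, FAILED }

    /**
     * Writes the last cell to each streamer in order. A streamer whose block
     * becomes full ends its block (empty last packet) and is CLOSED.
     * failingIndex is the streamer that fails on its last cell, or -1 for none.
     */
    static State[] writeLastStripe(int numStreamers, int failingIndex) {
        State[] states = new State[numStreamers];
        Arrays.fill(states, State.OPEN);
        for (int i = 0; i < numStreamers; i++) {
            if (i == failingIndex) {
                states[i] = State.FAILED;
                break; // the failure is noticed here; later streamers stay OPEN
            }
            states[i] = State.CLOSED; // block full, so the block is ended
        }
        return states;
    }

    /** Indices of streamers that had already closed their block. */
    static List<Integer> closedStreamers(State[] states) {
        List<Integer> closed = new ArrayList<>();
        for (int i = 0; i < states.length; i++) {
            if (states[i] == State.CLOSED) {
                closed.add(i);
            }
        }
        return closed;
    }
}
```

With 9 streamers and a failure at index 7, streamers 0 to 6 are already CLOSED 
when the failure is seen, so they can no longer take part in the usual error 
handling.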


> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>         Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040.00.patch, 
> HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and 
> StripedDataStreamer s only have to stream blocks to DNs.
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
