[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741388#comment-14741388
 ] 

Jing Zhao commented on HDFS-9040:
---------------------------------

Thanks for the patch, Walter! I think this looks much clearer compared with the 
current implementation. Some thoughts and comments:
# In general I think it's the correct direction to push all the coordination 
logic into one place, and let all the other streamers simply transfer data.
# Currently the new block allocation step and failure handling steps can still 
be interleaved. To me this makes it too hard to guarantee correctness. For 
example, we need to handle a scenario where some data streamer has not fetched 
the new block yet when the coordinator starts handling a failure. The current 
patch tries to handle this by checking the corresponding following block queue. 
But since a data streamer can be in a state where it fetches the new block but 
has not assigned new values to its nodes/storageTypes, we may still have some 
race condition. Thus I agree with Nicholas's comment 
[here|https://issues.apache.org/jira/browse/HDFS-8383?focusedCommentId=14737962&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737962],
 i.e., we need to add some "barriers" to sync all the data streamers and thereby 
simplify the problem.
# More specifically, my current proposal for failure handling looks like this:
The coordinator side:
#* Periodically check whether there are any failures. If we use 
DFSStripedOutputStream as the coordinator, we can easily do this in 
{{writeChunk}}, e.g., by checking for failures whenever we've received one 
stripe of data.
#* If there is a new failure, first wait till all the healthy streamers have 
fetched the new block and are in the DATA_STREAMING stage.
#* Mark all the healthy streamers as external error.
#* Call updateBlockForPipeline and get the new GS.
#* Wait till all the healthy streamers have fetched the new block from the queue 
and created new block streams.
#* If a new failure happens while creating the new block streams, notify all 
the remaining streamers of the failure and keep them in the external error 
state. Repeat the above steps.
#* Otherwise reset all the external error states and make the updatePipeline 
RPC call. Then notify all the streamers that this failure handling session has 
succeeded.
# The DataStreamer side:
#* When finding itself in external error state, wait and take the new block 
from the blocking queue.
#* Create a new datanode connection using the new block.
#* Notify the coordinator of the result of the new datanode connection creation.
#* If the connection creation succeeded, wait for the overall result from the 
coordinator.
#* If all the involved streamers succeeded, update its block based on the new GS.
#* Otherwise repeat the steps.
#* And instead of overriding updateBlockForPipeline and updatePipeline, it may 
be easier to implement the above logic by overriding 
{{setupPipelineForAppendOrRecovery}}.
# Obviously the above proposal may still have some holes. But the direction 
here is to make sure there is no overlap between different error handling 
efforts and the new block allocation. Please see if this makes sense to you.
# Also I think it is easier to implement the above logic in 
StripedOutputStream: 1) it's easier to determine when to start block allocation 
and failure check, 2) it's easier to handle exceptions during the NN RPCs since 
we do not need to pass the exception from a separate coordinator thread. But we 
can discuss this further; please let me know if I missed something.
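
To make the two-sided handshake above concrete, here is a minimal sketch of the barrier idea. It is not the patch and uses no HDFS classes: {{FailureHandlingSketch}}, {{Streamer}}, {{handleFailure}} and the {{connect}} hook are all hypothetical names, the generation stamp is just a counter standing in for updateBlockForPipeline, and the "wait for DATA_STREAMING" and "mark external error" steps are left out. It only shows the round structure: hand out the new block, wait for every streamer's report, then publish one overall result and retry with the survivors if anything failed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiPredicate;

class FailureHandlingSketch {

    /** Hypothetical stand-in for one streamer; not the real DataStreamer. */
    static final class Streamer {
        final int index;
        final BlockingQueue<Long> newBlocks = new LinkedBlockingQueue<>();
        volatile long committedGS;

        Streamer(int index, long gs) {
            this.index = index;
            this.committedGS = gs;
        }
    }

    /**
     * Runs one failure-handling session and returns the streamers that survived it.
     * connect is a hypothetical hook: (streamer index, new GS) -> whether the new
     * block stream could be created.
     */
    static List<Streamer> handleFailure(List<Streamer> healthy, long currentGS,
            BiPredicate<Integer, Long> connect) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Streamer> remaining = new ArrayList<>(healthy);
            long gs = currentGS;
            while (!remaining.isEmpty()) {
                final long newGS = ++gs; // stand-in for updateBlockForPipeline
                CompletableFuture<Boolean> overall = new CompletableFuture<>();
                Map<Streamer, CompletableFuture<Boolean>> reports = new LinkedHashMap<>();
                List<Future<?>> tasks = new ArrayList<>();
                for (Streamer s : remaining) {
                    CompletableFuture<Boolean> connected = new CompletableFuture<>();
                    reports.put(s, connected);
                    s.newBlocks.put(newGS);
                    // Streamer side: take the new block, try to connect, report,
                    // then wait for the coordinator's overall result.
                    Callable<Void> work = () -> {
                        long taken = s.newBlocks.take();
                        boolean ok;
                        try {
                            ok = connect.test(s.index, taken);
                        } catch (RuntimeException e) {
                            ok = false; // treat any connection error as a failure
                        }
                        connected.complete(ok);
                        if (ok && overall.get()) {
                            s.committedGS = taken; // update block with the new GS
                        }
                        return null;
                    };
                    tasks.add(pool.submit(work));
                }
                // Barrier 1: wait until every streamer has reported its result.
                List<Streamer> survivors = new ArrayList<>();
                for (Map.Entry<Streamer, CompletableFuture<Boolean>> e : reports.entrySet()) {
                    if (e.getValue().get()) {
                        survivors.add(e.getKey());
                    }
                }
                boolean allOk = survivors.size() == remaining.size();
                // On success the real coordinator would call updatePipeline here,
                // before notifying the streamers.
                overall.complete(allOk);
                // Barrier 2: no streamer is still inside this round when the next starts.
                for (Future<?> t : tasks) {
                    t.get();
                }
                if (allOk) {
                    return survivors;
                }
                remaining = survivors; // retry with the streamers that are still healthy
            }
            return remaining;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, with three streamers starting at GS 1000 and a {{connect}} hook that fails only for streamer 1 on the first attempt, the session needs two rounds: nobody commits 1001, and streamers 0 and 2 both end up on 1002. Because a round never starts until the previous one has fully drained, block hand-out and failure handling cannot interleave in this sketch.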

Currently I have an in-progress patch implementing the above proposal. I will 
try to get it into better shape and post it as a demo soon.

> Erasure coding: A BlockGroupDataStreamer to rule all internal blocks streamers
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-9040.00.patch
>
>
> A {{BlockGroupDataStreamer}} to communicate with NN to allocate/update block, 
> and {{StripedDataStreamer}} s only have to stream blocks to DNs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
