[ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733248#comment-14733248
 ] 

Li Bo commented on HDFS-8704:
-----------------------------

I have tried replacing the failed streamer with a new one.  When replacing, the 
outputstream has to stop sending packets to old streamer and start sending 
packets to new streamer after all packets of next block are moved  from old to 
new streamer. It's much more difficult than restarting the failed streamer. The 
auto restart of failed streamer makes ouputstream unnecessary to care about if 
some streamer is failed.

> Erasure Coding: client fails to write large file when one datanode fails
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
> HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, 
> HDFS-8704-HDFS-7285-005.patch, HDFS-8704-HDFS-7285-006.patch, 
> HDFS-8704-HDFS-7285-007.patch
>
>
> I test current code on a 5-node cluster using RS(3,2).  When a datanode is 
> corrupt, client succeeds to write a file smaller than a block group but fails 
> to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests 
> files smaller than a block group, this jira will add more test situations.
> A streamer may encounter some bad datanodes when writing blocks allocated to 
> it. When it fails to connect datanode or send a packet, the streamer needs to 
> prepare for the next block. First it removes the packets of current  block 
> from its data queue. If the first packet of next block has already been in 
> the data queue, the streamer will reset its state and start to wait for the 
> next block allocated for it; otherwise it will just wait for the first packet 
> of next block. The streamer will check periodically if it is asked to 
> terminate during its waiting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to