[ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647239#comment-14647239
 ] 

Li Bo commented on HDFS-8704:
-----------------------------

I have just uploaded a second patch for this problem. The changes in this patch 
include:
1. {{DFSStripedOutputStream}} and the failed status of {{StripedDataStreamer}}: 
it is not correct to take different actions based on the current status of a 
streamer. When a streamer has failed but the packet to be queued belongs to the 
next block (the streamer may write that block successfully because the newly 
assigned datanode may be healthy), the packet should be handled as usual. 
Conversely, when {{DFSStripedOutputStream}} finds a streamer working well and 
queues the packet to it, the streamer may still fail before sending the packet. 
So I removed the logic that checks and sets the failed status of a streamer in 
{{DFSStripedOutputStream}}; when a streamer fails, it knows how to handle the 
failure itself (see the sketch after this list).
2. Extend the functionality of {{StripedDataStreamer}}: if an error occurs, 
{{StripedDataStreamer}} will first handle the remaining trivial packets of the 
current block and then restart, waiting for a new block to be allocated to it 
(also shown in the sketch below).
3. Add a test to {{TestDFSStripedOutputStreamWithFailure}} that writes a file 
with two block groups.
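
To make the division of responsibility in changes 1 and 2 concrete, here is a 
minimal sketch under assumed names ({{SketchOutputStream}}, {{SketchStreamer}}, 
{{drainCurrentBlock}}, {{waitForNextBlock}} are illustrative only, not the 
classes or methods in the patch): the output stream queues every packet 
unconditionally, and the streamer alone reacts to a send failure by finishing 
the current block's remaining packets and then waiting for a new block.

{code:java}
// Hypothetical sketch only -- SketchPacket, SketchOutputStream, SketchStreamer,
// drainCurrentBlock() and waitForNextBlock() are illustrative names and do not
// correspond to the real HDFS-7285 classes or to the code in this patch.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SketchPacket {
  final long blockIndex;   // index of the block this packet belongs to
  SketchPacket(long blockIndex) { this.blockIndex = blockIndex; }
}

/** Output-stream side: hand every packet over; never inspect the streamer's status. */
class SketchOutputStream {
  private final SketchStreamer streamer;
  SketchOutputStream(SketchStreamer streamer) { this.streamer = streamer; }

  void writePacket(SketchPacket p) throws InterruptedException {
    // Even if the streamer failed on its current block, this packet may belong
    // to the next block, which a newly allocated datanode can still accept,
    // so it is queued unconditionally.
    streamer.enqueue(p);
  }
}

/** Streamer side: on a send failure, finish the current block's remaining
 *  packets, then go back to waiting for the next block allocation. */
class SketchStreamer implements Runnable {
  private final BlockingQueue<SketchPacket> queue = new LinkedBlockingQueue<>();
  private volatile boolean running = true;

  void enqueue(SketchPacket p) throws InterruptedException { queue.put(p); }
  void stop() { running = false; }

  @Override
  public void run() {
    while (running) {
      final SketchPacket p;
      try {
        p = queue.take();
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return;
      }
      try {
        send(p);
      } catch (Exception e) {
        // Failure handling lives entirely inside the streamer:
        drainCurrentBlock();   // finish the remaining trivial packets of this block
        waitForNextBlock();    // then restart, waiting for a new block allocation
      }
    }
  }

  private void send(SketchPacket p) throws Exception { /* write to the datanode pipeline */ }
  private void drainCurrentBlock() { /* consume queued packets up to the end of the failed block */ }
  private void waitForNextBlock()  { /* block until a new block is allocated to this streamer */ }
}
{code}

The point of the split is that any status check done by the output stream can 
already be stale by the time the packet is sent, so the only place the failure 
can be handled reliably is inside the streamer itself.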

The unit test occasionally fails because only 8 block locations are given by 
the namenode for the second block group. HDFS-8839 has been created to track 
this problem.


> Erasure Coding: client fails to write large file when one datanode fails
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2).  When a datanode is 
> corrupt, the client succeeds in writing a file smaller than a block group but fails 
> to write a larger one. {{TestDFSStripedOutputStreamWithFailure}} only tests 
> files smaller than a block group; this jira will add more test situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
