[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647239#comment-14647239 ]
Li Bo commented on HDFS-8704:
-----------------------------

I have just uploaded a second patch for this problem. The changes in this patch include:

1. {{DFSStripedOutputStream}} and the failed status of {{StripedDataStreamer}}: it is not right to take different actions according to the current status of a streamer. A streamer may have failed while the packet to be queued belongs to the next block (the streamer can successfully write to that block because the new datanode may be healthy); in this case the packet should be handled as usual. Conversely, when {{DFSStripedOutputStream}} finds that a streamer is working well, it queues the packet to the streamer, but the streamer may still fail before sending the packet. So I removed the logic that checks and sets the failed status of a streamer in {{DFSStripedOutputStream}}. When a streamer fails, it knows how to handle the failure itself.
2. Extend the functionality of {{StripedDataStreamer}}: if an error occurs, {{StripedDataStreamer}} will first handle the remaining trivial packets of the current block, and then go back to waiting for a new block to be allocated to it.
3. Add a test to {{TestDFSStripedOutputStreamWithFailure}} which tests writing a file with two block groups. The unit test occasionally fails because the namenode returns only 8 block locations for the second block group. HDFS-8839 has been created to track this problem.

> Erasure Coding: client fails to write large file when one datanode fails
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2). When a datanode is
> corrupt, the client succeeds in writing a file smaller than a block group but fails
> to write a large one. {{TestDFSStripedOutputStreamWithFailure}} only tests
> files smaller than a block group; this JIRA will add more test situations.
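
The behaviour described in items 1 and 2 of the comment can be illustrated with a small single-threaded model. This is only a sketch under assumptions, not the code in the patch: the class {{StreamerSketch}}, its {{Packet}} type, and the methods {{enqueue}}, {{failCurrentBlock}} and {{drain}} are hypothetical names invented for illustration, and "handling" a leftover packet of a failed block is modelled here as simply dropping it. The point it shows is that the caller queues packets without looking at the streamer's failed status, while the streamer itself discards what remains of the failed block and resumes normally at the next block.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical single-threaded model; NOT the real StripedDataStreamer. */
class StreamerSketch {
    /** A packet tagged with the index of the block it belongs to. */
    static final class Packet {
        final int blockIndex;
        final int seq;
        Packet(int blockIndex, int seq) {
            this.blockIndex = blockIndex;
            this.seq = seq;
        }
    }

    private final Queue<Packet> queue = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();
    private int currentBlock = 0;
    private boolean failedOnCurrentBlock = false;

    /** The caller always queues; it never inspects the failed flag. */
    void enqueue(Packet p) {
        queue.add(p);
    }

    /** Simulates a datanode failure while writing the current block. */
    void failCurrentBlock() {
        failedOnCurrentBlock = true;
    }

    /** Processes every queued packet in order (packets are assumed to arrive in block order). */
    void drain() {
        Packet p;
        while ((p = queue.poll()) != null) {
            if (p.blockIndex > currentBlock) {
                // Assumption: a new block gets a new pipeline, so the
                // earlier failure no longer applies to this streamer.
                currentBlock = p.blockIndex;
                failedOnCurrentBlock = false;
            }
            // Leftover packets of a failed block are dropped; all others are sent.
            String action = failedOnCurrentBlock ? "drop " : "send ";
            log.add(action + p.blockIndex + ":" + p.seq);
        }
    }

    /** Returns the recorded actions, one entry per processed packet. */
    List<String> log() {
        return log;
    }
}
```

In this model, a packet of block 0 that is queued after the failure is dropped, while a packet of block 1 queued right behind it is sent as usual, which matches the case described in item 1.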