[ https://issues.apache.org/jira/browse/HDFS-7889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495614#comment-14495614 ]
Li Bo commented on HDFS-7889:
-----------------------------

Hi Zhe, please see my explanation of the related code below.

The first (leading) streamer is responsible for committing block groups. Before committing, it needs to wait for the other streamers to finish writing their blocks and then count the total number of bytes written in the block group. Because streamers share only {{stripedBlocks}}, an ordinary streamer has to report its work to the leading streamer when it finishes writing its block. It sends a LocatedBlock object (containing the number of bytes it has written for its block) to the blocking queue of the leading streamer (i.e. {{stripedBlocks\[0\]}}). The leading streamer waits on this queue and collects the other streamers' reports. An ordinary streamer could just send an Integer to the leading streamer; I chose LocatedBlock because it may be more convenient for the error handling in HDFS-7786.

bq. hasCommittedBlock is initially false. But once becoming true, it will never be false again. What's the purpose of this flag?

An ordinary streamer sends its report to the leading streamer in {{endBlock}} when it finishes writing a block. The leading streamer at first just requests a block group from the NN. When it has to request another block group, it has to commit the old one first. So {{hasCommittedBlock}} becomes true after the first request.

bq. Why are we always polling the first located block, instead of the i_th?

{{stripedBlocks.get(0)}} is the blocking queue of the leading streamer, which needs to get the results of the other streamers' work before committing the block group to the NN.

bq. Shouldn't we always commit block.getNumBytes() * NUM_DATA_BLOCKS?

The last block group may be smaller than {{block.getNumBytes() * NUM_DATA_BLOCKS}}; {{StripedDataStreamer#countTrailingBlockGroupBytes()}} is used to count the bytes written in the last block group.
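To make the reporting scheme above concrete, here is a minimal Java sketch; it is not the code from the patch. {{BlockReport}}, {{reportToLeader}} and {{collectGroupBytes}} are hypothetical names. The patch sends LocatedBlock objects through {{stripedBlocks}} instead, and the poll timeout and the choice to count only reports from data streamers (parity handling omitted) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BlockGroupCommitSketch {

    /** Hypothetical stand-in for the report an ordinary streamer sends (the patch uses LocatedBlock). */
    static final class BlockReport {
        final int streamerIndex;
        final long numBytes;

        BlockReport(int streamerIndex, long numBytes) {
            this.streamerIndex = streamerIndex;
            this.numBytes = numBytes;
        }
    }

    /** Ordinary streamer: on finishing its block, put a report on the leading streamer's queue. */
    static void reportToLeader(BlockingQueue<BlockReport> leaderQueue,
                               int streamerIndex, long bytesWritten)
            throws InterruptedException {
        leaderQueue.put(new BlockReport(streamerIndex, bytesWritten));
    }

    /**
     * Leading streamer: wait for one report per other streamer, then return the
     * total bytes written in the block group (its own bytes plus the reported ones).
     * Throws if a report does not arrive within the assumed timeout.
     */
    static long collectGroupBytes(BlockingQueue<BlockReport> leaderQueue,
                                  int numOtherStreamers, long leaderBytes,
                                  long timeoutMs) throws InterruptedException {
        long total = leaderBytes;
        for (int i = 0; i < numOtherStreamers; i++) {
            BlockReport report = leaderQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (report == null) {
                throw new IllegalStateException(
                        "timed out after " + i + " of " + numOtherStreamers + " reports");
            }
            total += report.numBytes;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<BlockReport> leaderQueue = new LinkedBlockingQueue<>();
        // Index 0 is the leading streamer; the last streamer wrote a short block.
        long[] bytesWritten = {4096L, 4096L, 1024L};

        List<Thread> workers = new ArrayList<>();
        for (int i = 1; i < bytesWritten.length; i++) {
            final int idx = i;
            Thread t = new Thread(() -> {
                try {
                    reportToLeader(leaderQueue, idx, bytesWritten[idx]);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }

        long total = collectGroupBytes(
                leaderQueue, bytesWritten.length - 1, bytesWritten[0], 5000L);
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("bytes to commit for this block group: " + total);
    }
}
```

Because the leader sums the reported byte counts rather than assuming every block is full, the same loop covers a short trailing block group as well as a full one.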
For each preceding full block group, the leading streamer has to wait for the slowest streamer to finish writing. Otherwise, if the leading streamer commits {{block.getNumBytes() * NUM_DATA_BLOCKS}} bytes to the NN before the slow streamers finish, and one streamer fails after that, the error handling becomes complicated.

The above solution may not be the best, but it works for now. If you have a better solution, we can discuss it and optimize the related logic.

> Subclass DFSOutputStream to support writing striping layout files
> -----------------------------------------------------------------
>
>                 Key: HDFS-7889
>                 URL: https://issues.apache.org/jira/browse/HDFS-7889
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-7889-001.patch, HDFS-7889-002.patch,
> HDFS-7889-003.patch, HDFS-7889-004.patch, HDFS-7889-005.patch,
> HDFS-7889-006.patch, HDFS-7889-007.patch, HDFS-7889-008.patch,
> HDFS-7889-009.patch, HDFS-7889-010.patch, HDFS-7889-011.patch,
> HDFS-7889-012.patch, HDFS-7889-013.patch, HDFS-7889-014.patch
>
>
> After HDFS-7888, we can subclass {{DFSOutputStream}} to support writing
> striping layout files.