[ https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wang updated HDFS-9079:
------------------------------
    Issue Type: Improvement  (was: Sub-task)
        Parent:     (was: HDFS-8031)

> Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9079
>                 URL: https://issues.apache.org/jira/browse/HDFS-9079
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: erasure-coding
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-9079.01.patch, HDFS-9079.02.patch,
> HDFS-9079.03.patch, HDFS-9079.04.patch, HDFS-9079.05.patch,
> HDFS-9079.06.patch, HDFS-9079.07.patch, HDFS-9079.08.patch,
> HDFS-9079.09.patch, HDFS-9079.10.patch, HDFS-9079.11.patch,
> HDFS-9079.12.patch, HDFS-9079.13.patch, HDFS-9079.14.patch,
> HDFS-9079.15.patch, HDFS-9079-HDFS-7285.00.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4)
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6)
> Updates block on NN
> {code}
> With multiple streamer threads running in parallel, we need to correctly handle a
> large number of possible combinations of interleaved thread events. For
> example, {{streamer_B}} starts step 2 in between events {{streamer_A.2}} and
> {{streamer_A.3}}.
> HDFS-9040 moves steps 1, 2, 3, 6 from the streamer to {{DFSStripedOutputStream}}.
> This JIRA proposes some further optimizations based on HDFS-9040:
> # We can preallocate GS when NN creates a new striped block group
> ({{FSN#createNewBlock}}). For each new striped block group we can reserve
> {{NUM_PARITY_BLOCKS}} GS's. If more than {{NUM_PARITY_BLOCKS}} errors have
> happened, we shouldn't try to recover further anyway.
> # We can use a dedicated event processor to offload the error handling logic
> from {{DFSStripedOutputStream}}, which is not a long-running daemon.
> # We can limit the lifespan of a streamer to a single block. A streamer
> ends either after finishing the current block or when encountering a DN
> failure.
> With the proposed change, a {{StripedDataStreamer}}'s flow becomes:
> {code}
> 1) Finds DN error => 2) Notifies coordinator (async, not waiting for response)
> => terminates
> 1) Finds external error => 2) Applies new GS to DN (createBlockOutputStream)
> => 3) Ack from DN => 4) Notifies coordinator (async, not waiting for response)
> {code}
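
To make the first two proposals more concrete, here is a minimal Java sketch of how a coordinator might hand out preallocated generation stamps while serializing failure reports from streamers. It is not taken from any of the attached patches. The class name {{GsCoordinatorSketch}}, its methods, and the assumption that the reserved GS values are the consecutive numbers following the block group's initial GS are all hypothetical and chosen only for illustration.

```java
import java.util.BitSet;
import java.util.OptionalLong;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; none of these names come from the HDFS code base. */
class GsCoordinatorSketch {
    // Assumption: the NN reserved numParityBlocks consecutive GS values
    // right after the block group's initial GS.
    private final long firstReservedGs;
    private final int reserved;

    // State below is touched only by the single worker thread.
    private int used = 0;
    private final BitSet failedStreamers = new BitSet();

    // A single worker thread processes reports one at a time, so
    // interleaved reports from different streamers cannot race.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "gs-coordinator-sketch");
        t.setDaemon(true);
        return t;
    });

    GsCoordinatorSketch(long initialGs, int numParityBlocks) {
        this.firstReservedGs = initialGs + 1;
        this.reserved = numParityBlocks;
    }

    /**
     * Called by a streamer that hit a DN failure. Returns immediately;
     * the streamer does not have to wait on the returned future.
     */
    CompletableFuture<OptionalLong> reportDnFailure(int streamerIndex) {
        return CompletableFuture.supplyAsync(() -> handleFailure(streamerIndex), worker);
    }

    // Runs on the worker thread: consume the next reserved GS, or report
    // that the reserve is exhausted and recovery should not be attempted.
    private OptionalLong handleFailure(int streamerIndex) {
        failedStreamers.set(streamerIndex);
        if (used >= reserved) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(firstReservedGs + used++);
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

With an initial GS of 1000 and three reserved stamps, successive failure reports would receive 1001, 1002 and 1003, and a fourth report would receive an empty result, which corresponds to the "shouldn't try to recover further" case in the description. A real implementation would also need to push the new GS to the healthy streamers and update the block on the NN, which this sketch leaves out.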