[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

Walter Su (JIRA) Thu, 17 Sep 2015 00:52:58 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791739#comment-14791739
 ]


Walter Su commented on HDFS-9040:
---------------------------------

bq. should we just do updatePipeline when completing the block? 1. In the 
read-being-written scenario, there will be a longer window of *false-fresh* 
(meaning a stale internal block is considered as fresh).
We should do it before hflush/hsync as well.

bq. 2. When NUM_PARITY_BLOCKS number of streamers are dead, the OutputStream 
should die immediately instead of waiting for the next writeChunk.
Currently a failed streamer is only detected in writeChunk. We plan to add 
periodic checking, as [~jingzhao] suggested before. 
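
To make the periodic checking concrete, here is a rough sketch of what I have in 
mind. It is not taken from any patch: the class {{StreamerHealthCheck}}, its 
fields, and the {{maxTolerated}} budget are hypothetical names, and whether the 
cutoff should be NUM_PARITY_BLOCKS or one less is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the HDFS implementation.
final class StreamerHealthCheck {
  // One flag per streamer; returns true once that streamer has failed.
  private final BooleanSupplier[] failedFlags;
  // Assumed failure budget; the exact cutoff is a design decision.
  private final int maxTolerated;
  private final AtomicBoolean aborted = new AtomicBoolean(false);

  StreamerHealthCheck(BooleanSupplier[] failedFlags, int maxTolerated) {
    this.failedFlags = failedFlags;
    this.maxTolerated = maxTolerated;
  }

  int countFailed() {
    int n = 0;
    for (BooleanSupplier f : failedFlags) {
      if (f.getAsBoolean()) {
        n++;
      }
    }
    return n;
  }

  // Runs one check; returns true if the output stream should be aborted.
  boolean checkOnce() {
    if (countFailed() > maxTolerated) {
      aborted.set(true);
    }
    return aborted.get();
  }

  // Runs checkOnce() in the background so a failure is noticed
  // without waiting for the next writeChunk call.
  ScheduledExecutorService start(long periodMillis) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleAtFixedRate(this::checkOnce, periodMillis, periodMillis,
        TimeUnit.MILLISECONDS);
    return ses;
  }
}
```

The write path would then only need to look at the aborted flag, and the caller 
has to shut down the returned executor when the stream closes.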

bq. 3. We might want to add the logic to replace a failed StripedDataStreamer 
in the future.
No, I don't think we will, if you're talking about something like Datanode 
replacement for a repl block. There you can transfer a healthy repl RBW to a new 
Datanode, so you still have 3 DNs after replacement. But recovering a corrupted 
RBW internal block is difficult.

I have a question. Instead of delaying it, do we even need to refresh UC.replicas? 
1. A client reading a UC block that is being written can decode the replica if 
it misses some part. (With checksum verification, we are only concerned about 
'missing'.)
2. For a repl block, block recovery/lease recovery truncates all RBWs to the 
minimal length. For striping, assume a corrupted internalBlock has a small 
length, like 200kb, and 8 healthy internalBlocks have long lengths, in the range 
(1mb-cellSize, 1mb+cellSize). Of course after recovery we should truncate the 8 
to 1mb (the 8 healthy internal blocks should be at the same last stripe, but 
should we truncate the last stripe? That's not my point). My point is, we can 
rule out the corrupted internalBlocks by {{commitBlockSynchronization}}.
3. Maintaining the indices of UC.replicas. Updating UC.replicas by BlockReport 
is safe, because the reportedBlock has an ID. If UC.replicas is updated by 
updatePipeline, the indices are derived from the array offset. See 
{{UC.setExpectedLocations()}}. It's error prone. If we don't refresh UC.replicas 
we are pretty safe.
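
To illustrate point 3, here is a small standalone sketch. It is not the real 
BlockManager code: {{ReplicaIndexSketch}} and its methods are hypothetical, and 
it assumes the low 4 bits of an internal block ID carry its index within the 
block group. It only shows why an offset-based index goes wrong once a failed 
location is dropped from the array.

```java
// Hypothetical sketch, not the HDFS implementation.
final class ReplicaIndexSketch {
  // Assumption: the low 4 bits of an internal block ID hold its index.
  static final long INDEX_MASK = 0xF;

  // Index recovered from the ID itself, as with a reported block.
  static int indexFromBlockId(long internalBlockId) {
    return (int) (internalBlockId & INDEX_MASK);
  }

  // Indices derived from each reported ID.
  static int[] indicesFromIds(long[] reportedIds) {
    int[] out = new int[reportedIds.length];
    for (int i = 0; i < reportedIds.length; i++) {
      out[i] = indexFromBlockId(reportedIds[i]);
    }
    return out;
  }

  // Indices as an offset-based update would assign them:
  // simply the position in the array.
  static int[] indicesFromOffsets(long[] reportedIds) {
    int[] out = new int[reportedIds.length];
    for (int i = 0; i < reportedIds.length; i++) {
      out[i] = i;
    }
    return out;
  }

  public static void main(String[] args) {
    long groupId = 0x1000L; // hypothetical block group ID, low 4 bits zero
    // The streamer for index 2 failed and its location was dropped.
    long[] ids = {groupId, groupId + 1, groupId + 3, groupId + 4};
    System.out.println(java.util.Arrays.toString(indicesFromIds(ids)));
    System.out.println(java.util.Arrays.toString(indicesFromOffsets(ids)));
  }
}
```

With the ID-based derivation the surviving blocks keep indices 0, 1, 3, 4. With 
the offset-based one they become 0, 1, 2, 3, so the blocks after the gap get the 
wrong index.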

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>         Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer communicates with the NN to allocate/update blocks, 
> and StripedDataStreamers only have to stream blocks to DNs.
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
