[jira] [Comment Edited] (HDDS-935) Avoid creating an already created container on a datanode in case of disk removal followed by datanode restart

Shashikant Banerjee (JIRA) Tue, 26 Feb 2019 05:48:23 -0800


    [ 
https://issues.apache.org/jira/browse/HDDS-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777923#comment-16777923
 ]


Shashikant Banerjee edited comment on HDDS-935 at 2/26/19 1:47 PM:
-------------------------------------------------------------------

Thanks [~arpitagarwal] for the review. Patch v6 addresses your review comments.
{quote}
Why do we add the container set to DispatcherContext? We just need to update 
this set once the container is successfully created, right?
{quote}
Dispatcher#dispatch, which performs the command execution, is passed two 
pieces of information: the ContainerCommandRequestProto, which is client 
visible, and the DispatcherContext, which carries the Ratis 
StateMachine-specific info needed to execute the ContainerCommandRequest 
properly. The createContainerSet is maintained per ContainerStateMachine (per 
pipeline) and persisted in the snapshot file. Since it is specific to the 
StateMachine, we add the containerSet to the DispatcherContext while executing 
createContainer on the Dispatcher.
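
As a rough illustration of the idea (not the code in the patch), the sketch below passes a per-pipeline set through a context object and updates it only after the container has been created. The class names `Context` and `Dispatcher` and the method `createContainer` are simplified, hypothetical stand-ins for DispatcherContext and the Dispatcher; the real classes carry much more state.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DispatcherContextSketch {

    /** Hypothetical stand-in for the state-machine-specific context. */
    public static final class Context {
        private final Set<Long> createContainerSet;

        public Context(Set<Long> createContainerSet) {
            this.createContainerSet = createContainerSet;
        }

        public Set<Long> getCreateContainerSet() {
            return createContainerSet;
        }
    }

    /** Hypothetical stand-in for the dispatcher; "disk" is just a set here. */
    public static final class Dispatcher {
        private final Set<Long> containersOnDisk = ConcurrentHashMap.newKeySet();

        /**
         * Creates the container if it does not exist yet and, only on
         * success, records its ID in the per-pipeline set from the context.
         */
        public boolean createContainer(long containerId, Context ctx) {
            boolean created = containersOnDisk.add(containerId);
            if (created) {
                ctx.getCreateContainerSet().add(containerId);
            }
            return created;
        }
    }
}
```

Because the set is owned by the state machine rather than by the dispatcher, the dispatcher in this sketch never needs to know which pipeline a request belongs to; it only updates whatever set the context hands it.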


was (Author: shashikant):
Thanks [~arpitagarwal] for the review. Patch v6 addresses your review comments.
{quote}
Why do we add the container set to DispatcherContext? We just need to update 
this set once the container is successfully created, right?
{quote}
The Dispatcher is passed two pieces of information: the 
ContainerCommandRequestProto, which is client visible, and the 
DispatcherContext, which carries the Ratis StateMachine-specific info needed 
to execute the ContainerCommandRequest properly. The createContainerSet is 
maintained per ContainerStateMachine (per pipeline) and persisted in the 
snapshot file. Since it is specific to the StateMachine, we add the 
containerSet to the DispatcherContext while executing createContainer on the 
Dispatcher.

> Avoid creating an already created container on a datanode in case of disk 
> removal followed by datanode restart
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-935
>                 URL: https://issues.apache.org/jira/browse/HDDS-935
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: Ozone Datanode
>    Affects Versions: 0.4.0
>            Reporter: Rakesh R
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: HDDS-935.000.patch, HDDS-935.001.patch, 
> HDDS-935.002.patch, HDDS-935.003.patch, HDDS-935.004.patch, 
> HDDS-935.005.patch, HDDS-935.006.patch
>
>
> Currently, a container is created when a writeChunk request reaches 
> HddsDispatcher and the container does not already exist. If a disk holding a 
> container is removed and the datanode restarts, a subsequent writeChunk 
> request might end up creating the same container again with an updated BCSID, 
> because the datanode won't detect that the disk was removed. SCM won't detect 
> this either, as it will have the latest BCSID. This Jira aims to address this 
> issue.
> The proposed fix is to persist all the containerIds existing in the 
> containerSet in the snapshot file whenever a Ratis snapshot is taken. If the 
> disk is removed and the dn gets restarted, the container set will be rebuilt 
> by scanning all the available disks, while the container list stored in the 
> snapshot file will give all the containers created on the datanode. The diff 
> between these two gives the exact list of containers which were created but 
> were not detected after the restart. Any writeChunk request should now 
> validate its container Id against the list of missing containers. Also, we 
> need to ensure container creation does not happen as part of applyTransaction 
> of a writeChunk request in Ratis.
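
The diff described in the issue summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: `MissingContainerSketch`, `findMissing`, and `validateWrite` are hypothetical names, and container IDs are modelled as plain `Long` values.

```java
import java.util.HashSet;
import java.util.Set;

public class MissingContainerSketch {

    /**
     * Returns the IDs recorded in the snapshot file that were not found
     * when the disks were rescanned after restart.
     */
    public static Set<Long> findMissing(Set<Long> snapshotIds,
                                        Set<Long> scannedIds) {
        Set<Long> missing = new HashSet<>(snapshotIds);
        missing.removeAll(scannedIds); // set difference: snapshot minus scanned
        return missing;
    }

    /**
     * Rejects a write that targets a container known to be missing,
     * instead of silently re-creating it.
     */
    public static void validateWrite(long containerId, Set<Long> missing) {
        if (missing.contains(containerId)) {
            throw new IllegalStateException(
                "Container " + containerId + " is missing; refusing to re-create it");
        }
    }
}
```

With snapshot IDs {1, 2, 3} and scanned IDs {1, 3}, `findMissing` yields {2}, so a write to container 2 is rejected while writes to containers 1 and 3 pass validation.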



