[ https://issues.apache.org/jira/browse/HDDS-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan updated HDDS-4227:
------------------------------------
    Description: 
*Why is this needed?*
HDDS-4143 introduced a generic factory that selects among multiple versions of 
apply transaction implementations based on the layout version. This factory 
can handle versioned requests across layout versions whenever both versions 
need to exist in the code (for example, for HDDS-2939). 

However, the OM Ratis requests are still undergoing a lot of minor changes 
(HDDS-4007, HDDS-3903), and in these cases it will become hard to maintain 2 
versions of the code just to support clean upgrades. 

Hence, the plan is to build a pre-upgrade utility (client API) that makes sure 
that an OM instance has no "un-applied" transactions in its Raft log. Invoking 
this client API makes sure that the upgrade starts with a clean state. This is 
needed only in an HA setup. In a non-HA setup, it can either be skipped or, 
when invoked, will be a no-op (non-Ratis) or cause no harm (single-node 
Ratis).

*How does it work?*
Before updating the software bits, our goal is to bring every OM to the latest 
state with respect to apply transaction. The reason is to make sure that the 
same version of the code executes the apply transaction (AT) step on all 3 
OMs. At a high level, the flow is as follows.

* Before upgrade, *stop* the OMs.
* Start the OMs with a special flag --prepareUpgrade (similar to --init: a 
special mode in which the ephemeral OM instance stops after doing some work).
* When an OM is started with the --prepareUpgrade flag, it does not start the 
RPC server, so no new requests can get in.
* In this state, we give every OM time to apply transactions up to the last 
txn.
* We know that at least 2 OMs (a majority of the 3) have the last client 
request transaction committed in their logs. Hence, those 2 OMs are expected 
to apply transactions up to that index faster.
* At every OM, the Raft log will be purged after this wait period (so that no 
replay happens), and a Ratis snapshot will be taken at the last txn.
* Even if there is a lagging OM which is unable to reach the last applied txn 
index, its logs will be purged after the wait time expires.
* Now, when the OMs are started with the newer version, all of them will start 
using the new code.
* The lagging OM will get the new Ratis snapshot since there are no logs to 
replay from.
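
The flow above can be sketched as a small, self-contained Java model. This is 
only an illustration under stated assumptions, not the proposed 
implementation: FakeOm, Result, prepareForUpgrade and the maxSteps budget are 
hypothetical names, no real Ozone or Ratis API is used, and the wait period is 
modelled as a step budget so that the example is deterministic. The sketch 
also assumes the snapshot is taken at whatever index the OM managed to apply.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the prepare step; not Ozone or Ratis code. */
class PrepareUpgradeSketch {

    /** Outcome of the prepare step on one OM (illustrative only). */
    static final class Result {
        final long snapshotIndex; // index at which the snapshot was taken
        final boolean caughtUp;   // false for an OM that lagged behind
        Result(long snapshotIndex, boolean caughtUp) {
            this.snapshotIndex = snapshotIndex;
            this.caughtUp = caughtUp;
        }
    }

    /** In-memory stand-in for one OM with a Raft log of un-applied txns. */
    static final class FakeOm {
        private final List<Long> raftLog = new ArrayList<>();
        private long lastAppliedIndex;

        FakeOm(long lastAppliedIndex, long lastLoggedIndex) {
            this.lastAppliedIndex = lastAppliedIndex;
            for (long i = lastAppliedIndex + 1; i <= lastLoggedIndex; i++) {
                raftLog.add(i);
            }
        }

        /** Applies one un-applied txn; returns false if the log is empty. */
        private boolean applyNext() {
            if (raftLog.isEmpty()) {
                return false;
            }
            lastAppliedIndex = raftLog.remove(0);
            return true;
        }

        int pendingEntries() {
            return raftLog.size();
        }

        /**
         * Applies txns until lastTxnIndex is reached, the log runs out, or
         * the step budget (stand-in for the wait period) is exhausted.
         * Then takes a "snapshot" and purges the log unconditionally.
         */
        Result prepareForUpgrade(long lastTxnIndex, int maxSteps) {
            int steps = 0;
            while (lastAppliedIndex < lastTxnIndex
                    && steps < maxSteps
                    && applyNext()) {
                steps++;
            }
            boolean caughtUp = lastAppliedIndex >= lastTxnIndex;
            long snapshotIndex = lastAppliedIndex;
            raftLog.clear(); // nothing is left for the new code to replay
            return new Result(snapshotIndex, caughtUp);
        }
    }

    public static void main(String[] args) {
        FakeOm fast = new FakeOm(7, 10);  // has every txn up to index 10
        FakeOm lagging = new FakeOm(2, 6); // never received txns 7..10

        Result r1 = fast.prepareForUpgrade(10, 100);
        Result r2 = lagging.prepareForUpgrade(10, 100);

        System.out.println("fast: snapshot=" + r1.snapshotIndex
                + " caughtUp=" + r1.caughtUp
                + " pending=" + fast.pendingEntries());
        System.out.println("lagging: snapshot=" + r2.snapshotIndex
                + " caughtUp=" + r2.caughtUp
                + " pending=" + lagging.pendingEntries());
    }
}
```

In this model both OMs end with an empty log. The lagging one reports 
caughtUp=false, which corresponds to the case where it has to receive the 
newer snapshot from another OM after the upgrade.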

> Implement a "prepareForUpgrade" step that applies all committed transactions 
> onto the OM state machine.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-4227
>                 URL: https://issues.apache.org/jira/browse/HDDS-4227
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Manager
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>            Priority: Major
>             Fix For: 1.1.0
>


