[GitHub] [hadoop-ozone] fapifta commented on a change in pull request #1430: HDDS-4227. Implement a 'Prepare For Upgrade' step in OM that applies all committed Ratis transactions.
fapifta commented on a change in pull request #1430:
URL: https://github.com/apache/hadoop-ozone/pull/1430#discussion_r492702681

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ##

@@ -994,6 +1005,45 @@ public static boolean omInit(OzoneConfiguration conf) throws IOException,
     }
   }

+  public boolean applyAllPendingTransactions()
+      throws InterruptedException, IOException {
+
+    if (!isRatisEnabled) {
+      LOG.info("Ratis not enabled. Nothing to do.");
+      return true;
+    }
+
+    String purgeConfig = omRatisServer.getServer()
+        .getProperties().get(PURGE_UPTO_SNAPSHOT_INDEX_KEY);
+    if (!Boolean.parseBoolean(purgeConfig)) {
+      throw new IllegalStateException("Cannot prepare OM for Upgrade since " +
+          "raft.server.log.purge.upto.snapshot.index is not true");
+    }
+
+    waitForAllTxnsApplied(omRatisServer.getOmStateMachine(),
+        omRatisServer.getRaftGroup(),
+        (RaftServerProxy) omRatisServer.getServer(),
+        TimeUnit.MINUTES.toSeconds(5));

Review comment:
Are you sure we want to add a configuration for this one? I would argue that this, at least, does not need to become one more configurable thing.
prepareForUpgrade is a special startup mode of OM, during which it applies all transactions that are in the raft log. If 5 minutes is not enough to apply all transactions in the raft log, the process shuts down and lets the user know that some of the transactions were not applied, so that the user can, as a last resort, start the process again to apply further transactions. If we assume that at least a few transactions are applied on each run, then sooner or later the user reaches a state where everything is applied; and if none of the transactions can be applied within 5 minutes, that sounds like a serious problem anyway, independent of the upgrade.
Also, I would expect that in all cases the unapplied transactions can be applied within 5 minutes, as the number of such transactions should not be large as far as I know; if it is, then the system is not healthy anyway.
Can you please elaborate on why it would be useful to make this configurable?

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ##

@@ -1179,15 +1229,22 @@ public void start() throws IOException {
       // Allow OM to start as Http Server failure is not fatal.
       LOG.error("OM HttpServer failed to start.", ex);
     }
-    omRpcServer.start();
-    isOmRpcServerRunning = true;
+    if (!prepareForUpgrade) {
+      omRpcServer.start();
+      isOmRpcServerRunning = true;
+    }

Review comment:
As we discussed with @avijayanhwx during internal design discussions, after OM is started in prepareForUpgrade mode, it shuts down once the last transaction from the raft log is applied and a snapshot is taken in raft. At that point the OM has reached a state where all transactions are applied and none needs to be applied after the next startup. This ensures that all transactions are applied with the code that was in place when the transactions arrived, which in turn keeps the state of the different OM instances consistent.
After this is finished and the OM has shut down from prepareForUpgrade, a normal startup of OM is needed to bring it up again, and at that time the RPC server will start properly.

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerStarter.java ##

@@ -98,6 +98,28 @@ public void initOm()
     }
   }

+
+  /**
+   * This function implements a sub-command to allow the OM to be
+   * "prepared for upgrade".
+   */
+  @CommandLine.Command(name = "--prepareForUpgrade",
+      aliases = {"--prepareForDowngrade", "--flushTransactions"},

Review comment:
This command should be issued when the OM has already been stopped, before the software bits are upgraded. It starts the OM code in a special way, so, as I understand it, it can start only the current local OM.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
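
To make the first review comment in this message concrete (the hard-coded `TimeUnit.MINUTES.toSeconds(5)` wait), here is a minimal sketch of a bounded wait that reports whether the applied index caught up before the deadline. It is not the `waitForAllTxnsApplied` from the PR: the class `BoundedApplyWait`, the method `waitUntilApplied`, and all of its parameters are hypothetical names chosen for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical illustration only; not code from the pull request.
public class BoundedApplyWait {

  /**
   * Polls lastAppliedIndex until it reaches targetIndex or the timeout
   * expires. Returns true if the target was reached, false on timeout.
   */
  static boolean waitUntilApplied(LongSupplier lastAppliedIndex,
      long targetIndex, long timeout, TimeUnit unit, long pollMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (lastAppliedIndex.getAsLong() < targetIndex) {
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(pollMillis);
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    // A stuck state machine: the applied index never advances.
    boolean done = waitUntilApplied(() -> 0L, 10, 50,
        TimeUnit.MILLISECONDS, 5);
    if (!done) {
      // Mirrors the behaviour described in the review comment: report,
      // exit, and let the operator run the prepare step again.
      System.out.println("Not all transactions applied; rerun prepare.");
    }
  }
}
```

With a fixed timeout, a `false` result is simply the signal to shut down and rerun the prepare step, which is the argument the comment makes against adding a configuration key.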
[GitHub] [hadoop-ozone] fapifta commented on a change in pull request #1430: HDDS-4227. Implement a 'Prepare For Upgrade' step in OM that applies all committed Ratis transactions.
fapifta commented on a change in pull request #1430:
URL: https://github.com/apache/hadoop-ozone/pull/1430#discussion_r492706805

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerStarter.java ##

@@ -98,6 +98,28 @@ public void initOm()
     }
   }

+
+  /**
+   * This function implements a sub-command to allow the OM to be
+   * "prepared for upgrade".
+   */
+  @CommandLine.Command(name = "--prepareForUpgrade",
+      aliases = {"--prepareForDowngrade", "--flushTransactions"},

Review comment:
This command should be issued when the OM has already been stopped, before the software bits are upgraded. It starts the OM code in a special way, so, as I understand it, it can start only the current local OM.
[GitHub] [hadoop-ozone] fapifta commented on a change in pull request #1430: HDDS-4227. Implement a 'Prepare For Upgrade' step in OM that applies all committed Ratis transactions.
fapifta commented on a change in pull request #1430:
URL: https://github.com/apache/hadoop-ozone/pull/1430#discussion_r492705325

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ##

@@ -1179,15 +1229,22 @@ public void start() throws IOException {
       // Allow OM to start as Http Server failure is not fatal.
       LOG.error("OM HttpServer failed to start.", ex);
     }
-    omRpcServer.start();
-    isOmRpcServerRunning = true;
+    if (!prepareForUpgrade) {
+      omRpcServer.start();
+      isOmRpcServerRunning = true;
+    }

Review comment:
As we discussed with @avijayanhwx during internal design discussions, after OM is started in prepareForUpgrade mode, it shuts down once the last transaction from the raft log is applied and a snapshot is taken in raft. At that point the OM has reached a state where all transactions are applied and none needs to be applied after the next startup. This ensures that all transactions are applied with the code that was in place when the transactions arrived, which in turn keeps the state of the different OM instances consistent.
After this is finished and the OM has shut down from prepareForUpgrade, a normal startup of OM is needed to bring it up again, and at that time the RPC server will start properly.
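
The lifecycle described in the review comment above can be sketched with a toy model: in prepare mode the RPC server is never started, every log entry is applied, a snapshot is taken at the last applied index, and the process stops. `PrepareModeSketch` and `ToyOm` are hypothetical stand-ins, not the real `OzoneManager`, and the in-memory list only imitates a raft log.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code from the pull request.
public class PrepareModeSketch {

  /** Toy stand-in for an OM instance with an in-memory "raft log". */
  static final class ToyOm {
    final boolean prepareForUpgrade;
    final List<String> raftLog;    // committed entries still to be applied
    long lastAppliedIndex = -1;
    long snapshotIndex = -1;
    boolean rpcServerRunning;
    boolean stopped;

    ToyOm(boolean prepareForUpgrade, List<String> raftLog) {
      this.prepareForUpgrade = prepareForUpgrade;
      this.raftLog = raftLog;
    }

    void start() {
      if (!prepareForUpgrade) {
        // Normal startup: begin serving clients (log replay is omitted
        // from this toy model).
        rpcServerRunning = true;
        return;
      }
      // Prepare mode: apply every entry with the currently installed code.
      for (int i = 0; i < raftLog.size(); i++) {
        lastAppliedIndex = i;
      }
      // Snapshot at the last applied index so that nothing is left to
      // replay after the next startup, then shut down.
      snapshotIndex = lastAppliedIndex;
      stopped = true;
    }
  }

  public static void main(String[] args) {
    ToyOm om = new ToyOm(true, Arrays.asList("txn-0", "txn-1", "txn-2"));
    om.start();
    System.out.println("applied=" + om.lastAppliedIndex
        + " snapshot=" + om.snapshotIndex
        + " rpc=" + om.rpcServerRunning
        + " stopped=" + om.stopped);
  }
}
```

Because the prepare run ends in a stopped state, a separate normal startup is what brings the RPC server up, which matches the `if (!prepareForUpgrade)` guard in the diff.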