[GitHub] [hadoop-ozone] fapifta commented on a change in pull request #1430: HDDS-4227. Implement a 'Prepare For Upgrade' step in OM that applies all committed Ratis transactions.

2020-09-22 GitBox


fapifta commented on a change in pull request #1430:
URL: https://github.com/apache/hadoop-ozone/pull/1430#discussion_r492702681



##
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -994,6 +1005,45 @@ public static boolean omInit(OzoneConfiguration conf) throws IOException,
     }
   }
 
+  public boolean applyAllPendingTransactions()
+      throws InterruptedException, IOException {
+
+    if (!isRatisEnabled) {
+      LOG.info("Ratis not enabled. Nothing to do.");
+      return true;
+    }
+
+    String purgeConfig = omRatisServer.getServer()
+        .getProperties().get(PURGE_UPTO_SNAPSHOT_INDEX_KEY);
+    if (!Boolean.parseBoolean(purgeConfig)) {
+      throw new IllegalStateException("Cannot prepare OM for Upgrade since  " +
+          "raft.server.log.purge.upto.snapshot.index is not true");
+    }
+
+    waitForAllTxnsApplied(omRatisServer.getOmStateMachine(),
+        omRatisServer.getRaftGroup(),
+        (RaftServerProxy) omRatisServer.getServer(),
+        TimeUnit.MINUTES.toSeconds(5));

Review comment:
   Are you sure we want to add a configuration for this timeout? I would argue that we do not need yet another configurable value here.
   prepareForUpgrade is a special startup mode of the OM in which it applies all transactions that are in the Raft log.
   If 5 minutes is not enough to apply all transactions in the Raft log, the process will shut down and tell the user that some transactions were not applied, so that, as a last resort, the user can start the process again to apply the remaining ones. As long as at least some transactions are applied in each run, the user will sooner or later reach a state where everything is applied. If none of the transactions can be applied within 5 minutes, that sounds like a serious problem anyway, independent of the upgrade.
   
   Also, I would expect the unapplied transactions to be applied within 5 minutes in all cases: as far as I know, there should not be many of them, and if there are, the system is not healthy anyway.
   
   Can you please elaborate on why it would be useful to make this configurable?
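The behaviour described above (wait a fixed 5 minutes, then give up and report what is left so that the user can simply rerun the command) can be sketched as a bounded polling loop. This is a minimal sketch under stated assumptions, not the PR's `waitForAllTxnsApplied`: the class `PendingTxnWaiter` and the two `LongSupplier`s standing in for the Raft commit index and the state machine's last applied index are hypothetical.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: waits until the last applied index catches up with
 * the commit index, or until a fixed timeout expires.
 */
final class PendingTxnWaiter {

  // Assumption: stand-in for the Raft log's commit index.
  private final LongSupplier commitIndex;
  // Assumption: stand-in for the OM state machine's last applied index.
  private final LongSupplier lastAppliedIndex;

  PendingTxnWaiter(LongSupplier commitIndex, LongSupplier lastAppliedIndex) {
    this.commitIndex = commitIndex;
    this.lastAppliedIndex = lastAppliedIndex;
  }

  /**
   * Returns the number of committed but unapplied transactions left when the
   * wait ended; 0 means everything was applied before the timeout.
   */
  long await(Duration timeout, Duration pollInterval) {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      long remaining = commitIndex.getAsLong() - lastAppliedIndex.getAsLong();
      if (remaining <= 0) {
        return 0;
      }
      if (System.nanoTime() - deadline >= 0) {
        // Timed out: report what is left so the user can rerun the command.
        return remaining;
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        // Preserve the interrupt flag and report what is still pending.
        Thread.currentThread().interrupt();
        return remaining;
      }
    }
  }
}
```

A caller would pass the fixed value discussed here, e.g. `waiter.await(Duration.ofMinutes(5), Duration.ofMillis(100))`. Returning the count of leftover transactions rather than a bare boolean lets the caller tell the user how much work a rerun still has to do.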

##
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -1179,15 +1229,22 @@ public void start() throws IOException {
       // Allow OM to start as Http Server failure is not fatal.
       LOG.error("OM HttpServer failed to start.", ex);
     }
-    omRpcServer.start();
-    isOmRpcServerRunning = true;
 
+    if (!prepareForUpgrade) {
+      omRpcServer.start();
+      isOmRpcServerRunning = true;
+    }

Review comment:
   As we discussed with @avijayanhwx during internal design discussions, after the OM is started in prepareForUpgrade mode, it shuts down once the last transaction from the Raft log has been applied and a Raft snapshot has been taken. At that point the OM has reached a state where all transactions are applied and none need to be applied after the next startup.
   
   This ensures that every transaction is applied by the same code that was running when the transaction arrived, which keeps the state of the different OM instances consistent.
   
   Once this has finished and the OM has shut down from prepareForUpgrade, a normal OM startup is needed to bring it up again, and at that point the RPC server will start as usual.
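The two startup paths described in this comment can be modelled roughly as follows. This is a hypothetical sketch, not OzoneManager code: `OmStartupSketch`, its `Mode` enum and the returned list of step names are illustrative assumptions; in the PR the relevant pieces are the `prepareForUpgrade` flag, `applyAllPendingTransactions()` and `omRpcServer.start()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the two OM startup paths discussed above. */
final class OmStartupSketch {

  enum Mode { NORMAL, PREPARE_FOR_UPGRADE }

  /** Returns the ordered steps the OM would take in the given mode. */
  static List<String> start(Mode mode, boolean allTxnsApplied) {
    List<String> steps = new ArrayList<>();
    steps.add("start HTTP server");
    if (mode == Mode.PREPARE_FOR_UPGRADE) {
      // Assumption: the RPC server is skipped so that no new requests reach
      // the OM while the Raft log is drained with the old code.
      steps.add("apply all pending Raft log transactions");
      if (allTxnsApplied) {
        steps.add("take Raft snapshot");
      } else {
        steps.add("report unapplied transactions");
      }
      steps.add("shut down");
      return steps;
    }
    // Normal startup (e.g. after the upgrade): nothing is left to replay.
    steps.add("start RPC server");
    return steps;
  }
}
```

Modelling the flow this way makes the key property easy to check: the prepare path always ends in a shutdown and never starts the RPC server, while only a subsequent normal startup does.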

##
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerStarter.java
##
@@ -98,6 +98,28 @@ public void initOm()
     }
   }
 
+
+  /**
+   * This function implements a sub-command to allow the OM to be
+   * "prepared for upgrade".
+   */
+  @CommandLine.Command(name = "--prepareForUpgrade",
+      aliases = {"--prepareForDowngrade", "--flushTransactions"},

Review comment:
   This command should be issued after the OM has been stopped and before the software bits are upgraded. As I understand it, the command starts the OM code in a special way, so it can only start up the current local OM.
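A rough sketch of how the three flag spellings from the diff could resolve to one local-only mode, guarded by the precondition stated in this comment. The class `PrepareCommandSketch`, its methods and the `isLocalOmRunning` input are hypothetical; the real subcommand is declared with picocli's `@CommandLine.Command` in `OzoneManagerStarter`, and how it detects a running OM is not shown here.

```java
import java.util.Set;

/** Hypothetical sketch of the prepare-for-upgrade entry point checks. */
final class PrepareCommandSketch {

  // The name and aliases declared on the @CommandLine.Command in the diff.
  static final Set<String> PREPARE_FLAGS = Set.of(
      "--prepareForUpgrade", "--prepareForDowngrade", "--flushTransactions");

  static boolean isPrepareFlag(String arg) {
    return PREPARE_FLAGS.contains(arg);
  }

  /**
   * Validates the precondition from the review: the local OM must already be
   * stopped, because this command starts only the local OM in a special mode.
   * The isLocalOmRunning input is a hypothetical stand-in.
   */
  static void checkCanPrepare(String arg, boolean isLocalOmRunning) {
    if (!isPrepareFlag(arg)) {
      throw new IllegalArgumentException("Not a prepare flag: " + arg);
    }
    if (isLocalOmRunning) {
      throw new IllegalStateException(
          "Stop the local OM before running " + arg);
    }
  }
}
```

Treating all three spellings as one mode reflects the aliases in the diff: whichever name is used, the command drains the local Raft log and does not touch other OM instances.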





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] fapifta commented on a change in pull request #1430: HDDS-4227. Implement a 'Prepare For Upgrade' step in OM that applies all committed Ratis transactions.

2020-09-22 GitBox


fapifta commented on a change in pull request #1430:
URL: https://github.com/apache/hadoop-ozone/pull/1430#discussion_r492706805






[GitHub] [hadoop-ozone] fapifta commented on a change in pull request #1430: HDDS-4227. Implement a 'Prepare For Upgrade' step in OM that applies all committed Ratis transactions.

2020-09-22 GitBox


fapifta commented on a change in pull request #1430:
URL: https://github.com/apache/hadoop-ozone/pull/1430#discussion_r492705325





