[ https://issues.apache.org/jira/browse/HDFS-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746015#comment-17746015 ]
ASF GitHub Bot commented on HDFS-17116:
---------------------------------------

slfan1989 commented on code in PR #5876:
URL: https://github.com/apache/hadoop/pull/5876#discussion_r1271379772

##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterSafemode.java:
##########
@@ -141,6 +141,31 @@ public void testRouterExitSafemode()
     verifyRouter(RouterServiceState.RUNNING);
   }
 
+  @Test
+  public void testRouterExitSafemodeResetUpTime()
+      throws InterruptedException, IllegalStateException, IOException {
+
+    Calendar calendar = Calendar.getInstance();
+    // Get the future times, add one day to the current date.
+    calendar.add(Calendar.DAY_OF_MONTH, 1);
+    long timestampAfterOneDay = calendar.getTimeInMillis();
+    router.getSafemodeService().setStartupTime(timestampAfterOneDay);
+
+    assertTrue(router.getSafemodeService().isInSafeMode());
+    verifyRouter(RouterServiceState.SAFEMODE);
+
+    // Wait for initial time in milliseconds
+    long interval =
+        conf.getTimeDuration(DFS_ROUTER_SAFEMODE_EXTENSION,
+            TimeUnit.SECONDS.toMillis(2), TimeUnit.MILLISECONDS) +
+        conf.getTimeDuration(DFS_ROUTER_CACHE_TIME_TO_LIVE_MS,
+            TimeUnit.SECONDS.toMillis(1), TimeUnit.MILLISECONDS) * 2;
+    Thread.sleep(interval);

Review Comment:
   Use GenericTestUtils.waitFor here instead of a fixed Thread.sleep.


> Reset startupTime and enterSafeModeTime if check time interval is negative during router safe mode exit check
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17116
>                 URL: https://issues.apache.org/jira/browse/HDFS-17116
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>
> The following problem occurred in our online environment:
> # After the machine restarted, the system time was abnormal: it was set to a time in the future.
> # After the router started, it logged "safemode exit for 24981702 milliseconds..." and remained in the safemode state.
> This happens because startupTime was recorded from the future system time when the router started. The system time returned to normal soon afterwards, which made the delta negative.
> At that point the service could only be restored by restarting the router service.
> The relevant logs are:
> {code:java}
> 2023-07-15 03:15:49,276 INFO ipc.Server xxx
> 2023-07-15 11:21:03,785 INFO router.DFSRouter (LogAdapter.java:info(51)) [main] - STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting Router
> ...
> 2023-07-15 11:21:51,325 INFO xxx
> 2023-07-15 03:22:00,257 INFO xxx
> 2023-07-15 03:22:29,829 INFO router.RouterSafemodeService (RouterSafemodeService.java:periodicInvoke(167)) [RouterSafemodeService-0] - Delaying safemode exit for 28761777 milliseconds...
> {code}
> Maybe we can handle this case at the code level by resetting startupTime and enterSafeModeTime when the delta is negative,
> so that the router service can still exit the safemode state normally after the system time returns to normal.
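
For illustration only, the reset proposed in the issue description could look roughly like the sketch below. It is not the code from PR #5876 and not the real RouterSafemodeService: the class name SafemodeExitCheckSketch, its fields, and the injectable clock are all hypothetical, chosen so the example runs on its own and can simulate a clock that starts in the future and then jumps back.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a safemode exit check; not the Hadoop class. */
public class SafemodeExitCheckSketch {
    private final LongSupplier clock;       // injectable time source, in milliseconds
    private final long startupIntervalMs;   // minimum time to stay in safemode
    private long startupTime;               // baseline recorded at startup
    private boolean inSafeMode = true;

    public SafemodeExitCheckSketch(LongSupplier clock, long startupIntervalMs) {
        this.clock = clock;
        this.startupIntervalMs = startupIntervalMs;
        this.startupTime = clock.getAsLong();
    }

    /** One pass of the periodic check. Returns true while still in safemode. */
    public boolean periodicCheck() {
        long now = clock.getAsLong();
        long delta = now - startupTime;
        if (delta < 0) {
            // The clock moved backwards past the recorded baseline.
            // Reset the baseline so the wait is bounded by the interval again.
            startupTime = now;
            delta = 0;
        }
        if (inSafeMode && delta >= startupIntervalMs) {
            inSafeMode = false;
        }
        return inSafeMode;
    }

    public long getStartupTime() {
        return startupTime;
    }

    public static void main(String[] args) {
        // Simulated clock: the process starts while the clock is far in the future.
        long[] now = {90_000_000L};
        SafemodeExitCheckSketch check = new SafemodeExitCheckSketch(() -> now[0], 2_000L);

        now[0] = 1_000_000L;                        // system time returns to normal
        System.out.println(check.periodicCheck());  // true: baseline reset, still waiting

        now[0] += 2_000L;                           // the configured interval elapses
        System.out.println(check.periodicCheck());  // false: safemode exited
    }
}
```

Without the `delta < 0` branch, the second check in `main` would still see a large negative delta and the sketch would stay in safemode until the clock caught up with the original baseline, which matches the behaviour described in the issue.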