[ https://issues.apache.org/jira/browse/HDFS-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748411#comment-17748411 ]
ASF GitHub Bot commented on HDFS-17116:
---------------------------------------

haiyang1987 commented on PR #5876:
URL: https://github.com/apache/hadoop/pull/5876#issuecomment-1655034285

   The failed unit test seems unrelated to this change. I will follow up on the UT failure and create a new issue to resolve it.

> Reset startupTime and enterSafeModeTime if check time interval is negative during router safe mode exit check
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17116
>                 URL: https://issues.apache.org/jira/browse/HDFS-17116
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>
> The following problem occurred in our online environment:
> # After the machine restarted, the system time was abnormal: it was set to a time in the future.
> # After the router was started, it logged "safemode exit for 24981702 milliseconds..." and stayed in the safemode state.
> This happens because startupTime is recorded from the future system time when the router starts. The system time returns to normal soon afterwards, which results in a negative delta.
> At this point, the service can only be restored by restarting the router service.
> The relevant logs are:
> {code:java}
> 2023-07-15 03:15:49,276 INFO  ipc.Server xxx
> 2023-07-15 11:21:03,785 INFO  router.DFSRouter (LogAdapter.java:info(51)) [main] - STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting Router
> ...
> 2023-07-15 11:21:51,325 INFO  xxx
> 2023-07-15 03:22:00,257 INFO  xxx
> 2023-07-15 03:22:29,829 INFO  router.RouterSafemodeService (RouterSafemodeService.java:periodicInvoke(167)) [RouterSafemodeService-0] - Delaying safemode exit for 28761777 milliseconds...
> {code}
> Maybe we can handle this case at the code level by resetting startupTime and enterSafeModeTime when the delta is negative.
> This would ensure that the router service can still exit the safemode state normally after the system time returns to normal.
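
The reset proposed in the issue description could look roughly like the sketch below. This is an illustration only, not the code from PR #5876 or from RouterSafemodeService: the class SafeModeExitCheck, its methods, and the injected clock are all hypothetical names, and only the field names startupTime and enterSafeModeTime come from the issue title.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the router safe mode exit check; not the actual Hadoop class. */
class SafeModeExitCheck {
  private final LongSupplier clock;      // wall-clock source in ms, injectable for testing (assumption)
  private final long startupIntervalMs;  // minimum time to stay in safe mode after startup (assumption)
  private long startupTime;              // name taken from the issue title
  private long enterSafeModeTime;        // name taken from the issue title

  SafeModeExitCheck(LongSupplier clock, long startupIntervalMs) {
    this.clock = clock;
    this.startupIntervalMs = startupIntervalMs;
    this.startupTime = clock.getAsLong();
    this.enterSafeModeTime = this.startupTime;
  }

  /** Returns true once the startup interval has elapsed since startupTime. */
  boolean canLeaveStartupSafeMode() {
    long now = clock.getAsLong();
    long delta = now - startupTime;
    if (delta < 0) {
      // The clock moved backwards after startup, so the recorded times lie in
      // the future. Restart both timers from the current time instead of
      // waiting for the clock to catch up with the stale values.
      startupTime = now;
      enterSafeModeTime = now;
      delta = 0;
    }
    return delta >= startupIntervalMs;
  }

  long getStartupTime() {
    return startupTime;
  }

  long getEnterSafeModeTime() {
    return enterSafeModeTime;
  }
}
```

With this reset, a router started while the clock is ahead waits at most one startup interval after the clock is corrected, rather than a delay proportional to the size of the clock jump. Reading elapsed time from a monotonic source instead of the wall clock would be another way to avoid the negative delta.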