[ 
https://issues.apache.org/jira/browse/HDFS-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746013#comment-17746013
 ] 

ASF GitHub Bot commented on HDFS-17116:
---------------------------------------

haiyang1987 opened a new pull request, #5876:
URL: https://github.com/apache/hadoop/pull/5876

   ### Description of PR
   https://issues.apache.org/jira/browse/HDFS-17116
   
   The following problem occurred in our production environment:
   
   1. After the machine restarted, the system clock was wrong and reported a 
   time in the future.
   2. After the router started, it logged "safemode exit for 24981702 
   milliseconds..." and stayed in safemode. The cause is that startupTime was 
   recorded from the future system time when the router started. The system 
   time returned to normal shortly afterwards, which made the delta negative. 
   At that point the service could only be restored by restarting the router.
   
   The relevant logs are:
   
   ```
   2023-07-15 03:15:49,276 INFO  ipc.Server xxx
   2023-07-15 11:21:03,785 INFO  router.DFSRouter (LogAdapter.java:info(51)) [main] - STARTUP_MSG:
   /************************************************************
   STARTUP_MSG: Starting Router
   ...
   2023-07-15 11:21:51,325 INFO xxx
   2023-07-15 03:22:00,257 INFO xxx
   2023-07-15 03:22:29,829 INFO router.RouterSafemodeService (RouterSafemodeService.java:periodicInvoke(167)) [RouterSafemodeService-0] - Delaying safemode exit for 28761777 milliseconds...
   ```
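
For scale, the delay reported in the last log line can be converted to hours and minutes. The snippet below is only an illustration using the JDK's `java.time.Duration`; the class name `DelayToDuration` is made up for this note and is not part of Hadoop.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of Hadoop.
class DelayToDuration {
  // Renders a millisecond delay in ISO-8601 duration form.
  static String describe(long delayMs) {
    return Duration.ofMillis(delayMs).toString();
  }

  public static void main(String[] args) {
    // Delay value copied from the RouterSafemodeService log line above.
    System.out.println(describe(28761777L)); // prints PT7H59M21.777S
  }
}
```

That is just under eight hours, which is about the size of the backward clock jump visible in the log timestamps (11:21 back to 03:22).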
   
   We could handle this case in the code by resetting startupTime and 
   enterSafeModeTime when the delta is negative, so that the router can exit 
   safemode normally once the system time returns to normal.
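
The reset proposed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in the pull request: the class `SafemodeExitCheckSketch`, the injectable clock, and the method `remainingDelayMs` are hypothetical, and only startupTime is modeled (enterSafeModeTime would presumably be handled the same way).

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the router's safemode exit check.
// It is NOT the real RouterSafemodeService; names and structure are assumptions.
class SafemodeExitCheckSketch {
  private final LongSupplier clock;      // wall-clock source in ms, injectable for tests
  private final long startupIntervalMs;  // minimum time to stay in safemode after startup
  private long startupTime;              // baseline recorded when the service starts

  SafemodeExitCheckSketch(LongSupplier clock, long startupIntervalMs) {
    this.clock = clock;
    this.startupIntervalMs = startupIntervalMs;
    this.startupTime = clock.getAsLong();
  }

  // Returns the remaining safemode delay in ms, or 0 when safemode may be left.
  long remainingDelayMs() {
    long now = clock.getAsLong();
    long delta = now - startupTime;
    if (delta < 0) {
      // The clock moved backwards since startup: reset the baseline so the
      // wait is measured from the corrected time instead of the future one.
      startupTime = now;
      delta = 0;
    }
    return Math.max(0L, startupIntervalMs - delta);
  }

  public static void main(String[] args) {
    long[] fakeNow = {100_000_000L};  // mutable fake clock, in ms
    SafemodeExitCheckSketch check =
        new SafemodeExitCheckSketch(() -> fakeNow[0], 60_000L);

    fakeNow[0] -= 28_000_000L;        // clock jumps back by several hours
    System.out.println(check.remainingDelayMs()); // prints 60000

    fakeNow[0] += 60_000L;            // one startup interval later
    System.out.println(check.remainingDelayMs()); // prints 0
  }
}
```

Without the reset branch, the first call in `main` would report a delay of the startup interval plus the full size of the clock jump, which matches the symptom in the logs.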




> Reset startupTime and enterSafeModeTime if check time interval is negative 
> during router safe mode exit check
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17116
>                 URL: https://issues.apache.org/jira/browse/HDFS-17116
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>
> The following problem occurred in our production environment:
> # After the machine restarted, the system clock was wrong and reported a 
> time in the future.
> # After the router started, it logged "safemode exit for 24981702 
> milliseconds..." and stayed in safemode.
> The cause is that startupTime was recorded from the future system time when 
> the router started. The system time returned to normal shortly afterwards, 
> which made the delta negative.
> At that point the service could only be restored by restarting the router.
> The relevant logs are:
> {code:java}
> 2023-07-15 03:15:49,276 INFO  ipc.Server xxx
> 2023-07-15 11:21:03,785 INFO  router.DFSRouter (LogAdapter.java:info(51)) [main] - STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting Router
> ...
> 2023-07-15 11:21:51,325 INFO xxx
> 2023-07-15 03:22:00,257 INFO xxx
> 2023-07-15 03:22:29,829 INFO router.RouterSafemodeService (RouterSafemodeService.java:periodicInvoke(167)) [RouterSafemodeService-0] - Delaying safemode exit for 28761777 milliseconds...
> {code}
> We could handle this case in the code by resetting startupTime and 
> enterSafeModeTime when the delta is negative,
> so that the router can exit safemode normally once the system time returns 
> to normal.


