[ 
https://issues.apache.org/jira/browse/HDFS-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747377#comment-17747377
 ] 

ASF GitHub Bot commented on HDFS-17116:
---------------------------------------

haiyang1987 commented on code in PR #5876:
URL: https://github.com/apache/hadoop/pull/5876#discussion_r1274553525


##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterSafemodeService.java:
##########
@@ -161,11 +161,17 @@ protected void serviceInit(Configuration conf) throws Exception {
 
   @Override
   public void periodicInvoke() {
-    long now = Time.now();
+    long now = now();
     long delta = now - startupTime;

Review Comment:
   Thanks @Hexiaoqiao for helping review this.
   Yes, your suggestion is right: monotonicNow() is not affected by 
settimeofday or similar system clock changes, so using monotonicNow() to 
calculate the delta here avoids this failure case.
   
   If startupTime uses monotonicNow(), then cacheLastUpdateTime and 
enterSafeModeTime should probably use monotonicNow() as well, to stay consistent.
   
   What do you think?
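
For illustration, here is a minimal sketch of computing the delta from a monotonic source. It is not the code in this PR: the class and method names (MonotonicDelta, monotonicNowMs, elapsedMs) are hypothetical, and System.nanoTime() is used here in place of monotonicNow() so that the example runs without Hadoop on the classpath.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the elapsed-time part of the safemode check. */
class MonotonicDelta {

  /** Monotonic milliseconds; assumed here as a substitute for monotonicNow(). */
  static long monotonicNowMs() {
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
  }

  /** Elapsed time between two readings taken from the same monotonic clock. */
  static long elapsedMs(long startMs, long nowMs) {
    return nowMs - startMs;
  }

  public static void main(String[] args) throws InterruptedException {
    long startupTime = monotonicNowMs();
    Thread.sleep(20);
    long delta = elapsedMs(startupTime, monotonicNowMs());
    // The readings never go backwards, so delta cannot be negative,
    // even if the wall clock is adjusted between the two calls.
    System.out.println("delta is non-negative: " + (delta >= 0));
  }
}
```

The values returned by a monotonic clock are only meaningful as differences, so every timestamp that is compared against startupTime would need to come from the same source.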





> Reset startupTime and enterSafeModeTime if check time interval is negative 
> during router safe mode exit check
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17116
>                 URL: https://issues.apache.org/jira/browse/HDFS-17116
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>
> The following problem occurred in our production environment:
> # After the machine restarted, the system time was abnormal: it was set to a 
> time in the future.
> # After the router was started, it logged "safemode exit for 24981702 
> milliseconds..." and remained in the safemode state.
> This is mainly because startupTime was recorded from the future system time 
> when the router started, and the system time returned to normal soon 
> afterwards, resulting in a negative delta.
> At that point, the service could only be restored by restarting the router service.
> The relevant logs are:
> {code:java}
> 2023-07-15 03:15:49,276 INFO  ipc.Server xxx
> 2023-07-15 11:21:03,785 INFO  router.DFSRouter (LogAdapter.java:info(51)) 
> [main] - STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting Router
> ...
> 2023-07-15 11:21:51,325 INFO xxx
> 2023-07-15 03:22:00,257 INFO xxx
> 2023-07-15 03:22:29,829 INFO router.RouterSafemodeService 
> (RouterSafemodeService.java:periodicInvoke(167)) [RouterSafemodeService-0] - 
> Delaying safemode exit for 28761777 milliseconds...
> {code}
> Maybe we can handle this case at the code level by resetting startupTime and 
> enterSafeModeTime when the delta is negative,
> which would ensure that the router service can still exit the safemode state 
> normally after the system time returns to normal.
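
The reset described above could look roughly like the following sketch. It is an assumption-laden illustration, not the actual patch: SafemodeTimer, checkDelta and the injected clock are hypothetical names, and only the two fields named in the issue (startupTime, enterSafeModeTime) are modeled.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of the proposed negative-delta reset; not the actual patch. */
class SafemodeTimer {
  private final LongSupplier clock;  // wall-clock source in milliseconds (injected for testing)
  private long startupTime;
  private long enterSafeModeTime;

  SafemodeTimer(LongSupplier clock) {
    this.clock = clock;
    this.startupTime = clock.getAsLong();
    this.enterSafeModeTime = this.startupTime;
  }

  /**
   * Returns the milliseconds elapsed since startupTime. If the clock has
   * moved backwards, both timestamps are reset to the current time and
   * the elapsed time restarts from zero.
   */
  long checkDelta() {
    long now = clock.getAsLong();
    long delta = now - startupTime;
    if (delta < 0) {
      startupTime = now;
      enterSafeModeTime = now;
      delta = 0;
    }
    return delta;
  }

  long getStartupTime() {
    return startupTime;
  }

  long getEnterSafeModeTime() {
    return enterSafeModeTime;
  }

  public static void main(String[] args) {
    long[] fakeNow = {10_000L};           // router starts while the clock is in the future
    SafemodeTimer timer = new SafemodeTimer(() -> fakeNow[0]);
    fakeNow[0] = 4_000L;                  // clock is corrected backwards
    System.out.println("delta after correction: " + timer.checkDelta());
    fakeNow[0] = 4_500L;                  // time advances normally afterwards
    System.out.println("delta after 500 ms: " + timer.checkDelta());
  }
}
```

Passing the clock in as a LongSupplier is only a convenience for simulating the backwards jump; a real service would read its usual time source directly.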



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
