[ https://issues.apache.org/jira/browse/PHOENIX-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117156#comment-17117156 ]

Sandeep Guggilam commented on PHOENIX-4216:
-------------------------------------------

I have spent some time looking at a few builds whose tests failed with the
"Master not initialized" exception. The master cannot complete its
initialization because it is waiting for the region server to report to it.
The region server did report to the master, but the master rejected the
request because of clock skew:

_org.apache.hadoop.hbase.ClockOutOfSyncException: Server 
asf948.gq1.ygridcore.net,36973,1590112065298 has been rejected; Reported time 
is too far out of sync with master. Time difference of 1589507264841ms > max 
allowed of 30000ms_
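The numbers in that message can be decoded with plain arithmetic. The sketch below uses only java.time; it converts the reported difference into days and, under the assumption that one of the two clocks read roughly the start code embedded in the server name (asf948.gq1.ygridcore.net,36973,1590112065298), computes what the other clock would then have read. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

public class SkewArithmetic {
    // Difference and limit copied from the ClockOutOfSyncException message.
    static final long DIFF_MS = 1_589_507_264_841L;
    static final long MAX_ALLOWED_MS = 30_000L;
    // Start code from the server name asf948.gq1.ygridcore.net,36973,1590112065298.
    static final long START_CODE_MS = 1_590_112_065_298L;

    /** Whole days represented by the reported difference. */
    static long diffInDays() {
        return Duration.ofMillis(DIFF_MS).toDays();
    }

    /** Assumption: if one clock read roughly START_CODE_MS, the other read this lower value. */
    static long lowerReadingMs() {
        return START_CODE_MS - DIFF_MS;
    }

    public static void main(String[] args) {
        System.out.println(diffInDays());                           // 18397 (about 50 years)
        System.out.println(DIFF_MS / MAX_ALLOWED_MS);               // 52983575 (times the limit, truncated)
        System.out.println(Instant.ofEpochMilli(lowerReadingMs())); // 1970-01-08T00:00:00.457Z
    }
}
```

A difference of about 50 years is not ordinary clock drift. Under the stated assumption, the other reading lands about 7 days after the Unix epoch (the only alternative is a clock roughly 50 years in the future), which would be consistent with a non-default time source rather than a genuinely skewed host clock.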

 

There are two things to understand here:
 # The region server uses EnvironmentEdgeManager.currentTime() to report its
current time, while HMaster uses System.currentTimeMillis() to get the current
time it compares against the time reported by the RS. Ideally,
EnvironmentEdgeManager should return the same value as
System.currentTimeMillis() here, unless some other delegate is injected, and I
am not sure that is possible during HRegionServer startup. On another note,
should we just use EnvironmentEdgeManager in HMaster for this computation as
well?
 # We compute the difference between the RS and master times as
"abs(a - b) = c" and check whether c is greater than the configured maximum.
The log message only includes "c" (the difference). Should we also log "a" or
"b" so we can tell which side (master or region server) is reporting the wrong
value?
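A simplified, hypothetical sketch of the check discussed in both points: the master's clock is injected as a LongSupplier (standing in for EnvironmentEdgeManager.currentTime()), so the master and the region server can share one time source, and the rejection message carries both readings rather than only c. The class name, message format and Optional return type are assumptions for illustration; this is not HBase's actual ServerManager code.

```java
import java.util.Optional;
import java.util.function.LongSupplier;

public class ClockSkewCheck {
    // Hypothetical injectable clock for the master, standing in for
    // EnvironmentEdgeManager.currentTime() so both sides can share one time source.
    private final LongSupplier masterClock;
    private final long maxSkewMs;

    ClockSkewCheck(LongSupplier masterClock, long maxSkewMs) {
        this.masterClock = masterClock;
        this.maxSkewMs = maxSkewMs;
    }

    /** Returns a rejection message if the skew exceeds the limit, or empty if the report is accepted. */
    Optional<String> check(String serverName, long reportedTimeMs) {
        long masterTimeMs = masterClock.getAsLong();
        long skewMs = Math.abs(masterTimeMs - reportedTimeMs); // c = abs(a - b)
        if (skewMs <= maxSkewMs) {
            return Optional.empty();
        }
        // Include both readings (a and b), not only c, so the log shows which side is off.
        return Optional.of("Server " + serverName + " has been rejected; reported time "
                + reportedTimeMs + "ms, master time " + masterTimeMs + "ms, difference "
                + skewMs + "ms > max allowed of " + maxSkewMs + "ms");
    }

    public static void main(String[] args) {
        // A fixed clock, as a test might inject, keeps the output deterministic.
        ClockSkewCheck check = new ClockSkewCheck(() -> 1_590_112_065_298L, 30_000L);
        System.out.println(check.check("rs-ok", 1_590_112_060_000L).isPresent()); // false
        check.check("rs-bad", 604_800_457L).ifPresent(System.out::println);
    }
}
```

Injecting the clock means a test that swaps in a manual time source affects the master and the region server alike; whether HMaster should actually read its time this way is the open question in the first point.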

[~apurtell] Could you please share your thoughts?

> Figure out why tests randomly fail with master not able to initialize in 200 
> seconds
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4216
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4216
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.15.0, 4.14.3
>            Reporter: Samarth Jain
>            Priority: Major
>              Labels: phoenix-hardening, precommit, quality-improvement
>             Fix For: 5.1.0, 4.16.0
>
>         Attachments: Precommit-3849.log
>
>
> Sample failure:
> https://builds.apache.org/job/PreCommit-PHOENIX-Build/1450//testReport/
> [~apurtell] - Looking at the thread dump in the above link, do you see why 
> master startup failed? I couldn't see any obvious deadlocks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
