[ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:
--------------------------------

    Attachment: 7504-trunk v1.patch
    
> -ROOT- may be offline forever after FullGC of RS
> ------------------------------------------------
>
>                 Key: HBASE-7504
>                 URL: https://issues.apache.org/jira/browse/HBASE-7504
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 7504-trunk v1.patch
>
>
> 1. A full GC happens on the regionserver hosting -ROOT-.
> 2. The ZK session times out; the master expires the regionserver and submits 
> it to ServerShutdownHandler.
> 3. The regionserver completes the full GC.
> 4. While ServerShutdownHandler is processing, verifyRootRegionLocation 
> returns true.
> 5. ServerShutdownHandler skips assigning the -ROOT- region.
> 6. The regionserver aborts itself because it receives a YouAreDeadException 
> after a regionserver report.
> 7. -ROOT- is now offline and won't be assigned again unless we restart the 
> master.
> Master Log:
> {code}
> 2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
> Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted 
> shutdown handler to be executed, root=true, meta=false
> 2012-10-31 19:51:39,045 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
> for dw88.kgb.sqa.cm4,60020,1351671478752
> 2012-10-31 19:51:50,113 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server 
> dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
> 2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
> Server REPORT rejected; currently processing 
> dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
> 2012-10-31 19:52:15,945 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log 
> splitting for dw88.kgb.sqa.cm4,60020,1351671478752
> {code}
> There is no log entry for the assignment of -ROOT-.
> Regionserver log:
> {code}
> 2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
> 229128ms instead of 100000ms, this is likely due to a long garbage collecting 
> pause and it's usually bad, see 
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> {code}
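
The sequence in steps 1-7 above can be sketched as a small toy model. This is only an illustration of the race, not HBase code: ToyRegionServer, ToyMaster, and every method name below are hypothetical stand-ins, and the "guarded" variant is just one conceivable way to avoid the problem, not necessarily what the attached patch does.

```java
/** Toy model of the race from steps 1-7; none of these names are HBase classes. */
class RootAssignmentRace {

    /** Hypothetical stand-in for the regionserver that hosts -ROOT-. */
    static final class ToyRegionServer {
        private boolean rootOnline = true;

        /** Models a location check: answers true while the process is still up. */
        boolean answersRootProbe() { return rootOnline; }

        /** Step 6: the server aborts after being told it is dead. */
        void abortAfterYouAreDead() { rootOnline = false; }
    }

    /** Hypothetical stand-in for the master's bookkeeping. */
    static final class ToyMaster {
        boolean serverMarkedDead = false;
        boolean rootReassigned = false;

        /** Step 2: ZK session expired, server goes on the dead list. */
        void expire() { serverMarkedDead = true; }

        /** Steps 4-5: trust the probe even though the server is already expired. */
        void shutdownHandlerBuggy(ToyRegionServer rs) {
            if (rs.answersRootProbe()) {
                return; // -ROOT- looks fine, so assignment is skipped
            }
            rootReassigned = true;
        }

        /** Assumed guard: a probe answered by an expired server is not trusted. */
        void shutdownHandlerGuarded(ToyRegionServer rs) {
            boolean trustworthy = !serverMarkedDead && rs.answersRootProbe();
            if (!trustworthy) {
                rootReassigned = true;
            }
        }
    }

    /** Replays the sequence and reports whether -ROOT- is served by anyone at the end. */
    static boolean rootAvailableAfterRace(boolean guarded) {
        ToyRegionServer rs = new ToyRegionServer();
        ToyMaster master = new ToyMaster();

        master.expire();                 // steps 1-2: long GC, session timeout
        // step 3: GC finishes, so the server can answer probes again
        if (guarded) {
            master.shutdownHandlerGuarded(rs);
        } else {
            master.shutdownHandlerBuggy(rs);   // steps 4-5
        }
        rs.abortAfterYouAreDead();       // step 6

        return rs.answersRootProbe() || master.rootReassigned; // step 7
    }

    public static void main(String[] args) {
        System.out.println("buggy:   -ROOT- available = " + rootAvailableAfterRace(false));
        System.out.println("guarded: -ROOT- available = " + rootAvailableAfterRace(true));
    }
}
```

In this model the buggy path ends with -ROOT- unavailable, because the only server that answered the probe aborts right after the handler decided to skip assignment.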

