[jira] [Updated] (HBASE-17287) Master becomes a zombie if filesystem object closes

Ted Yu (JIRA) Mon, 27 Mar 2017 13:32:03 -0700

     [ https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ted Yu updated HBASE-17287:
---------------------------
    Attachment: 17287.master.v4.txt

Added testSafemodeBringsDownMaster in patch v4. I plan to move it into a 
separate test class once the new test passes.
Currently the wait for the master thread to exit times out:
{code}
testSafemodeBringsDownMaster(org.apache.hadoop.hbase.master.procedure.TestCreateTableProcedure)  Time elapsed: 61.538 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 60000 milliseconds
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:196)
        at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:143)
        at org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3959)
        at org.apache.hadoop.hbase.master.procedure.TestCreateTableProcedure.testSafemodeBringsDownMaster(TestCreateTableProcedure.java:92)
{code}
Let me see what the cause could be.
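
For context, the trace above shows the test blocked in Thread.sleep inside a 
wait helper, which is what a poll-until-deadline loop looks like while its 
condition never becomes true. A minimal sketch of that pattern follows. The 
class WaitSketch and its waitFor signature are hypothetical names made up for 
illustration; this is not the HBase Waiter API, and it is not taken from the 
patch.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not HBase's Waiter): polls a condition until a deadline. */
final class WaitSketch {
    private WaitSketch() {}

    /**
     * Polls {@code condition} every {@code intervalMs} milliseconds until it
     * holds or {@code timeoutMs} milliseconds have elapsed.
     *
     * @return true if the condition held before the deadline, false on
     *         timeout or if the waiting thread was interrupted
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // deadline passed and the condition never held
            }
            try {
                // Sleep for one interval, but never past the deadline.
                long remainingMs = Math.max(1L, remainingNanos / 1_000_000L);
                Thread.sleep(Math.min(intervalMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

With such a helper, a check like waitFor(60000, 100, () -> !masterThread.isAlive()) 
returns false after 60 seconds if the thread never exits, where masterThread is 
an assumed java.lang.Thread reference. That would match the symptom reported 
here, although the actual cause is still to be determined.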

> Master becomes a zombie if filesystem object closes
> ---------------------------------------------------
>
>                 Key: HBASE-17287
>                 URL: https://issues.apache.org/jira/browse/HBASE-17287
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Clay B.
>            Assignee: Ted Yu
>             Fix For: 1.4.0, 2.0
>
>         Attachments: 17287.branch-1.v3.txt, 17287.master.v2.txt, 
> 17287.master.v3.txt, 17287.master.v4.txt, 17287.v2.txt
>
>
> We have seen an issue whereby if the HDFS is unstable and the HBase master's 
> HDFS client is unable to stabilize before 
> {{dfs.client.failover.max.attempts}} then the master's filesystem object 
> closes. This seems to result in an HBase master which will continue to run 
> (process and znode exists) but no meaningful work can be done (e.g. assigning 
> meta).
> What we saw in our HBase master logs was:
> {code}
> 2016-12-01 19:19:08,192 ERROR org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
> java.io.IOException: failed log splitting for cluster-r5n12.bloomberg.com,60200,1480632863218, will retry
>         at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Filesystem closed
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
