[ 
https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612261#comment-16612261
 ] 

stack commented on HBASE-21035:
-------------------------------

bq. Shouldn't we just trust the meta location in the ZK node?

Maybe I could look there first. If I find something, go with it. Otherwise, 
hunt all WAL dirs for .meta WALs. That'd be best.
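
A minimal sketch of that lookup order, in plain Java rather than real HBase 
code. The class, the method names, and the in-memory map standing in for a 
listing of the WAL dirs are all hypothetical; the only thing taken from the 
comment above is that meta WALs are recognized by a ".meta" suffix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeSet;

/** Hypothetical sketch, not HBase code: ZK hint first, full scan as fallback. */
class MetaWalLocator {

    /** Suffix taken from the ".meta WALs" wording in the comment. */
    static final String META_WAL_SUFFIX = ".meta";

    /**
     * walDirs maps a WAL dir name (one per server) to the file names inside it.
     * If ZooKeeper names a server and its dir holds meta WALs, only those are
     * returned. Otherwise every WAL dir is searched.
     */
    static List<String> findMetaWals(Optional<String> zkMetaServer,
                                     Map<String, List<String>> walDirs) {
        if (zkMetaServer.isPresent()) {
            List<String> hinted = metaWalsIn(zkMetaServer.get(), walDirs);
            if (!hinted.isEmpty()) {
                return hinted;
            }
        }
        // No usable hint: hunt through all WAL dirs, in sorted order.
        List<String> all = new ArrayList<>();
        for (String dir : new TreeSet<String>(walDirs.keySet())) {
            all.addAll(metaWalsIn(dir, walDirs));
        }
        return all;
    }

    /** Returns "dir/file" for each file in the given dir that ends with ".meta". */
    static List<String> metaWalsIn(String dir, Map<String, List<String>> walDirs) {
        List<String> found = new ArrayList<>();
        List<String> files = walDirs.getOrDefault(dir, Collections.<String>emptyList());
        for (String file : files) {
            if (file.endsWith(META_WAL_SUFFIX)) {
                found.add(dir + "/" + file);
            }
        }
        return found;
    }
}
```

A stale hint (a server whose dir is gone or holds no meta WALs) falls through 
to the full scan, so the ZK node is trusted only when it leads somewhere.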

I'll have to take the lease on any WALs I find, if only to read them. Maybe I 
should delete them when done, though I think no damage is done if they are 
replayed. I need to figure out how to split inline rather than ask our 
splitting service to do it for us.

bq. So maybe the problem here is that we should make the master retry for a 
longer time before exiting, and add a new method in the rpc service for hbck2 
to schedule some recovery procedures?

Was thinking on this. Yeah, let me change the isMeta method here so that, 
rather than scheduling the meta assign procedure, it logs a message asking the 
operator to schedule a meta assign... and perhaps a procedure that recovers 
the meta WALs. Then do the same for namespace (Namespace needs to go away).

Let me try it.

> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
>                 Key: HBASE-21035
>                 URL: https://issues.apache.org/jira/browse/HBASE-21035
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21035.branch-2.0.001.patch, 
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed the way we init after master starts. It will 
> only check WAL dirs and compare them to the Zookeeper RS nodes to decide 
> which servers need to expire. For servers whose dir ends with 'SPLITTING', 
> we assume that there will be an SCP for them.
> But if the server with the meta region crashed before the master restarts, 
> and all the procedure WALs are lost (due to a bug, or deleted manually, 
> whatever), the newly restarted master will be stuck when initing, since no 
> one will bring the meta region online.
> Although it is an anomalous case, I think no matter what happens, we need 
> to online the meta region. Otherwise, we are sitting ducks; nothing can be done.
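
The startup check described in the issue can be sketched roughly as below. 
This is plain Java, not the master's actual code: the class and method names 
are hypothetical, server names are bare strings, and the "-splitting" suffix 
is an assumption standing in for the 'SPLITTING' marker mentioned above.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch, not HBase code: compare WAL dirs with live ZK nodes. */
class StartupServerCheck {

    /** Assumed marker for a WAL dir whose server is already being split. */
    static final String SPLITTING_SUFFIX = "-splitting";

    /** Servers that left a WAL dir behind but have no live node in ZooKeeper. */
    static Set<String> serversToExpire(Set<String> walDirNames, Set<String> liveZkServers) {
        Set<String> toExpire = new TreeSet<>();
        for (String dir : walDirNames) {
            if (dir.endsWith(SPLITTING_SUFFIX)) {
                continue; // assumed to be handled by an existing SCP
            }
            if (!liveZkServers.contains(dir)) {
                toExpire.add(dir);
            }
        }
        return toExpire;
    }

    /** Dirs skipped above; nothing is scheduled for them by this check. */
    static Set<String> assumedCoveredByScp(Set<String> walDirNames) {
        Set<String> covered = new TreeSet<>();
        for (String dir : walDirNames) {
            if (dir.endsWith(SPLITTING_SUFFIX)) {
                covered.add(dir);
            }
        }
        return covered;
    }
}
```

If the meta server's dir lands in the second set and the procedure WALs that 
held its SCP are gone, nothing in this check schedules a replacement, which is 
the stuck-master case the issue describes.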



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
