[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585114#comment-16585114
 ] 

stack commented on HBASE-19121:
-------------------------------

Chatting with [~allan163] and [~Apache9], the major concern is loss of the master proc 
WALs. If they are gone, mis-deleted, or damaged, then the cluster is hosed. Can't have 
this. Redundancy? How to have a redundant master proc WAL? Or can we leave 
breadcrumbs, as we used to try in hbck1 days, that allow us to rebuild if all is 
trashed? How? We have some file-based droppings. Will use them for now, though we 
would like to move away from depending on the particularities of our fs persistence. 
For hbase2, minimally:

* A rebuild procedure that can put the cluster back together after a catastrophe. 
The rebuild procedure might be composed of multiple fix-it procedures that an 
operator would run via hbck2. hbck2 would require at least a minimal Master 
running ("maintenance mode"). Best if no dependency on RSs.
* But only ever one master at a time! Even if a minimal one.
* One procedure would repair meta. It would work through the minimal master. It 
would look for meta WAL logs for recovery. It'd run splitting inline rather 
than try to farm it out to the cluster, to minimize dependency on RSs being up. 
It'd dump the recovered.edits into place. It might then open the meta region 
for hbck2 to read.
* hbck2 would make a report of the troublesome RITs, or of unfinished splits or 
merges.
* A procedure to look for -SPLITTING RS dirs for queuing new SCPs.
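
As a rough illustration of the last bullet, the sketch below scans a WAL root for 
server directories carrying a splitting suffix and returns the server names that 
would be candidates for a new SCP. It is a minimal sketch under assumptions not 
stated above: a local directory stands in for the real filesystem, the function 
name find_splitting_servers is hypothetical, and the suffix is matched 
case-insensitively as "-splitting". It queues nothing; it only builds the list.

```python
import os
import tempfile

# Assumed suffix marking a server WAL dir whose logs are being split.
SPLITTING_SUFFIX = "-splitting"


def find_splitting_servers(wal_root):
    """Return, sorted, the server names whose WAL dir ends in the suffix.

    Hypothetical helper: wal_root is a local directory standing in for
    the cluster filesystem; each child dir is named after one server.
    """
    servers = []
    for entry in sorted(os.listdir(wal_root)):
        path = os.path.join(wal_root, entry)
        # Only directories count; stray files are ignored.
        if os.path.isdir(path) and entry.lower().endswith(SPLITTING_SUFFIX):
            # Strip the suffix to recover the server name.
            servers.append(entry[: -len(SPLITTING_SUFFIX)])
    return servers


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "rs1.example.org,16020,1534000000000-splitting"))
    os.mkdir(os.path.join(root, "rs2.example.org,16020,1534000000001"))
    for name in find_splitting_servers(root):
        print("candidate for new SCP:", name)
```

A real procedure would list the directories through the filesystem API the Master 
already uses and would hand each server name to whatever schedules SCPs; both are 
left out here on purpose.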

Other hbck2 features:

* Move aside the master proc wals.
* Force completion of a procedure. Can't kill Procedures. Rollback doesn't always 
work. Procedures may be subprocedures. Need to have them complete so the parent can 
complete. Then the operator does fixup. When force completing, need to release locks 
too, else the operator or new fix-up procedures cannot make progress.
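
To make the force-complete point concrete, here is a toy model of why the locks 
have to be released along with marking the procedure done. Everything in it is 
hypothetical (Procedure, LockTable, force_complete, and can_complete are 
illustration-only names, not the HBase procedure framework): a stuck child holds 
a lock on a resource, which blocks both its parent and any fix-up procedure until 
the child is force completed and its locks are dropped.

```python
class Procedure:
    """Toy stand-in for a master procedure; not the HBase class."""

    def __init__(self, proc_id, parent=None):
        self.proc_id = proc_id
        self.parent = parent
        self.children = []
        self.done = False
        if parent is not None:
            parent.children.append(self)


class LockTable:
    """Maps a resource name to the id of the procedure holding its lock."""

    def __init__(self):
        self.owners = {}

    def acquire(self, resource, proc):
        # Refuse if any procedure already holds the resource.
        if resource in self.owners:
            return False
        self.owners[resource] = proc.proc_id
        return True

    def release_all(self, proc):
        # Drop every lock held by proc and report what was released.
        held = sorted(r for r, o in self.owners.items() if o == proc.proc_id)
        for resource in held:
            del self.owners[resource]
        return held


def force_complete(proc, locks):
    """Mark proc finished without running it and release its locks."""
    proc.done = True
    return locks.release_all(proc)


def can_complete(proc):
    """A parent may complete only once all its subprocedures are done."""
    return all(child.done for child in proc.children)


if __name__ == "__main__":
    locks = LockTable()
    parent = Procedure(1)
    stuck = Procedure(2, parent=parent)
    fixup = Procedure(3)
    locks.acquire("region-a", stuck)
    print(locks.acquire("region-a", fixup), can_complete(parent))
    force_complete(stuck, locks)
    print(locks.acquire("region-a", fixup), can_complete(parent))
```

Marking the child done without calling release_all would unblock the parent but 
leave the fix-up procedure waiting on the lock, which is the failure mode the 
bullet warns about.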



> HBCK for AMv2 (A.K.A HBCK2)
> ---------------------------
>
>                 Key: HBASE-19121
>                 URL: https://issues.apache.org/jira/browse/HBASE-19121
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>            Reporter: stack
>            Assignee: Umesh Agashe
>            Priority: Major
>         Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)