[ https://issues.apache.org/jira/browse/HBASE-21156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620893#comment-16620893 ]
stack edited comment on HBASE-21156 at 9/19/18 5:17 PM:
--------------------------------------------------------

I pushed this to branch-2.1+ after addressing a checkstyle issue.

was (Author: stack):
I pushed this to branch-2.1+

> [hbck2] Queue an assign of hbase:meta and bulk assign/unassign
> --------------------------------------------------------------
>
>                 Key: HBASE-21156
>                 URL: https://issues.apache.org/jira/browse/HBASE-21156
>             Project: HBase
>          Issue Type: Sub-task
>          Components: hbck2
>    Affects Versions: 2.1.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.1.1
>
>         Attachments: HBASE-21156.branch-2.1.001.patch,
> HBASE-21156.branch-2.1.002.patch, HBASE-21156.branch-2.1.003.patch,
> HBASE-21156.branch-2.1.004.patch, HBASE-21156.branch-2.1.005.patch
>
>
> We need this to effect repair when there is damage.
> If procedure WALs AND a server WAL dir are lost or cleaned, or we crashed
> during a partial split (unlikely scenarios but nonetheless possible), a Master
> can be stuck unable to become active because there is no assign procedure for
> hbase:meta in the system.
> The reasonable argument over in HBASE-21035 has it that attempts at
> auto-repair under these extremes could cause other issues, so at least until
> we learn more, for now we punt to the operator for fix-up.
> To reproduce the catastrophe, see notes in HBASE-21035 (and [~allan163]'s
> test).
> UPDATE: HBASE-21191 has the Master assume a "holding pattern" if on startup
> it does not have an assign for meta (possible if we lose all Master WAL
> Procs). The holding pattern is needed because we were exiting after one minute
> of RPC'ing to the old meta location. To inject an assign, Admin#assign won't
> work: it gets rejected because the "Master is Initializing". So we
> need to be able to assign hbase:meta even if "Master is initializing".
> Also, while in here, add the ability to bulk assign, because assigning a
> Region at a time from the shell only works if the offlined region count is
> in the low tens; it fails when thousands are offline.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)