[
https://issues.apache.org/jira/browse/HBASE-21156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619807#comment-16619807
]
stack edited comment on HBASE-21156 at 9/18/18 10:23 PM:
---------------------------------------------------------
Review please. This seems to work at scale. See note at end of HBASE-21192.
was (Author: stack):
Review please. This seems to work at scale.
> [hbck2] Queue an assign of hbase:meta and bulk assign/unassign
> --------------------------------------------------------------
>
> Key: HBASE-21156
> URL: https://issues.apache.org/jira/browse/HBASE-21156
> Project: HBase
> Issue Type: Sub-task
> Components: hbck2
> Affects Versions: 2.1.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 2.1.1
>
> Attachments: HBASE-21156.branch-2.1.001.patch,
> HBASE-21156.branch-2.1.002.patch, HBASE-21156.branch-2.1.003.patch,
> HBASE-21156.branch-2.1.004.patch
>
>
> We need this to effect repair when there is damage.
>
> If the procedure WALs AND a server WAL dir are lost or cleaned up, or if we
> crashed during a partial split (unlikely scenarios, but possible nonetheless),
> a Master can get stuck, unable to become active, because there is no assign
> procedure for hbase:meta in the system.
>
> The reasonable argument over in HBASE-21035 is that attempts at auto-repair
> under these extremes could cause other issues, so, at least until we learn
> more, we punt to the operator for fix-up.
>
> To reproduce the catastrophe, see the notes in HBASE-21035 (and [~allan163]'s
> test).
>
> UPDATE: HBASE-21191 has the Master enter a "holding pattern" if, on startup,
> it does not have an assign for meta (possible if we lose all Master WAL
> procedures). The holding pattern is needed because the Master was exiting
> after one minute of RPC'ing to the old meta location. Admin#assign cannot be
> used to inject an assign because it is rejected with "Master is
> initializing", so we need to be able to assign hbase:meta even while the
> Master is initializing. Also, while in here, add the ability to bulk assign:
> assigning a Region-at-a-time from the shell only works if the offlined region
> count is in the low tens; it fails when thousands are offline.
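A minimal, self-contained Java sketch of the two repair paths described in the UPDATE, under loudly stated assumptions: `HypotheticalMaster`, `assign`, `hbckAssigns` and the procedure-id counter are illustrative stand-ins, not HBase or hbck2 APIs. The model has a Master stuck in its holding pattern that refuses the regular one-region assign with "Master is initializing", while an hbck-style call queues hbase:meta (encoded region name 1588230740) plus thousands of offline regions in a single request. As a simplification, the model leaves the holding pattern as soon as meta's assign is queued; a real Master would wait for meta to actually come online.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the repair paths discussed in HBASE-21156; NOT HBase code.
 * HypotheticalMaster, assign(), hbckAssigns() and the pid counter are illustrative only.
 */
public class HbckAssignSketch {

    /** Encoded name of the hbase:meta region (hbase:meta,,1.1588230740). */
    static final String META_ENCODED_NAME = "1588230740";

    static class HypotheticalMaster {
        // Starts stuck: no assign for meta has been queued yet.
        private boolean initializing = true;
        private final List<String> queued = new ArrayList<>();
        private long nextPid = 1;

        /** Regular Admin-style path: one region per call, refused while initializing. */
        long assign(String encodedRegionName) {
            if (initializing) {
                throw new IllegalStateException("Master is initializing");
            }
            return enqueue(encodedRegionName);
        }

        /** Hypothetical hbck-style path: many regions per call, allowed while initializing. */
        List<Long> hbckAssigns(List<String> encodedRegionNames) {
            List<Long> pids = new ArrayList<>();
            for (String name : encodedRegionNames) {
                pids.add(enqueue(name));
            }
            // Simplification: leave the holding pattern once meta's assign is queued.
            if (queued.contains(META_ENCODED_NAME)) {
                initializing = false;
            }
            return pids;
        }

        /** Records the request and hands back a fake procedure id. */
        private long enqueue(String encodedRegionName) {
            queued.add(encodedRegionName);
            return nextPid++;
        }

        boolean isInitializing() {
            return initializing;
        }
    }

    public static void main(String[] args) {
        HypotheticalMaster master = new HypotheticalMaster();

        // The regular path is rejected while the Master waits for meta.
        try {
            master.assign(META_ENCODED_NAME);
        } catch (IllegalStateException e) {
            System.out.println("assign rejected: " + e.getMessage());
        }

        // One bulk call queues meta plus thousands of offline user regions.
        List<String> regions = new ArrayList<>();
        regions.add(META_ENCODED_NAME);
        for (int i = 0; i < 5000; i++) {
            regions.add("region-" + i); // placeholder names, not real encoded names
        }
        List<Long> pids = master.hbckAssigns(regions);
        System.out.println("queued " + pids.size() + " assigns; initializing="
                + master.isInitializing());
    }
}
```

Running `main` prints that the regular assign was rejected, then that 5001 assigns were queued and the model Master is no longer initializing. The design choice mirrors the issue's request: accept a list of regions in one call instead of requiring one call per region, and keep that path usable while the Master is initializing.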