[jira] [Commented] (HBASE-23282) HBCKServerCrashProcedure for 'Unknown Servers'

Michael Stack (Jira) Wed, 12 Feb 2020 18:58:12 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035872#comment-17035872
 ]


Michael Stack commented on HBASE-23282:
---------------------------------------

[~jfrabaute] What is the name of the region you report COLUMN CELL info for? Is 
it 353ab75c788cd0f77027706900453c49, the Unknown Server/Inconsistent server you 
note?

So the server 'regionserver-2.hbase.hbase.svc.cluster.local,16020,1573519312100' 
is not in your cluster, yet your hbase:meta mentions it as the host for 
353ab75c788cd0f77027706900453c49? Have you tried scheduling a 
scheduleProcedureRecoveries with hbck2? Does that not clean it up?



> HBCKServerCrashProcedure for 'Unknown Servers'
> ----------------------------------------------
>
>                 Key: HBASE-23282
>                 URL: https://issues.apache.org/jira/browse/HBASE-23282
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck2, proc-v2
>    Affects Versions: 2.2.2
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> With an overdriving, sustained load, I can fairly easily manufacture an 
> hbase:meta table that references servers that are neither in the live list 
> nor members of deadservers; i.e. 'Unknown Servers'.  The new 'HBCK 
> Report' UI in Master has a section that lists any 'Unknown Servers' found 
> in hbase:meta.
> Once in this state, the repair is awkward. Our assign/unassign Procedure is 
> particularly dogged about insisting that we confirm close/open of Regions 
> as it goes about its business. That is well and good if the server is in 
> the live/dead sets, but for an 'Unknown Server' we invariably end up trying 
> to confirm against a no-longer-present server (more on this in follow-on 
> issues).
> What is wanted is the queuing of a ServerCrashProcedure for each 'Unknown 
> Server'. It would split any WALs (there shouldn't be any if the server was 
> restarted) and ideally it would cancel out any assigns and reassign regions 
> off the 'Unknown Server'.  But the 'normal' SCP consults the in-memory 
> cluster state to figure out what Regions were on the crashed server, which 
> works fine for the usual case. 'Unknown Servers', however, have no state in 
> the Master's in-memory Maps of Servers to Regions or in the DeadServers list.
> The suggestion here is that hbck2 be able to drive in a special SCP, one 
> which would get the list of Regions by scanning hbase:meta rather than 
> asking Master memory; an HBCKSCP.
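
The idea in the description above (find the Regions for an 'Unknown Server' from hbase:meta instead of from Master memory) can be illustrated with a minimal, self-contained sketch. It does not use the HBase client or the actual HBCKSCP code: the class UnknownServerSketch, the MetaRow type, and the method regionsOnUnknownServers are hypothetical names made up for this illustration, and the hbase:meta contents are modelled as a plain in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class UnknownServerSketch {

    /** Hypothetical stand-in for one hbase:meta row: a region and the server recorded as its host. */
    static final class MetaRow {
        final String encodedRegionName;
        final String serverName;

        MetaRow(String encodedRegionName, String serverName) {
            this.encodedRegionName = encodedRegionName;
            this.serverName = serverName;
        }
    }

    /**
     * Groups regions by hosting server, keeping only servers that are in
     * neither the live set nor the dead set (the 'Unknown Servers').
     * The region list comes from the meta rows, not from any in-memory
     * server-to-regions map.
     */
    static Map<String, List<String>> regionsOnUnknownServers(
            List<MetaRow> metaRows, Set<String> liveServers, Set<String> deadServers) {
        Map<String, List<String>> result = new TreeMap<>();
        for (MetaRow row : metaRows) {
            if (row.serverName == null) {
                continue; // region has no recorded host; nothing to attribute
            }
            boolean known = liveServers.contains(row.serverName)
                    || deadServers.contains(row.serverName);
            if (!known) {
                result.computeIfAbsent(row.serverName, k -> new ArrayList<>())
                        .add(row.encodedRegionName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The first row uses the region and server quoted in this thread;
        // the second row and the live server name are invented for the example.
        List<MetaRow> metaRows = List.of(
                new MetaRow("353ab75c788cd0f77027706900453c49",
                        "regionserver-2.hbase.hbase.svc.cluster.local,16020,1573519312100"),
                new MetaRow("exampleregion0000000000000000001",
                        "regionserver-0.example,16020,1581500000000"));
        Set<String> live = Set.of("regionserver-0.example,16020,1581500000000");
        Set<String> dead = Set.of();

        // Each entry is a candidate for a crash procedure driven from outside the Master.
        regionsOnUnknownServers(metaRows, live, dead)
                .forEach((server, regions) -> System.out.println(server + " -> " + regions));
    }
}
```

In a real cluster the rows would come from a scan of hbase:meta and the live/dead sets from the Master; the sketch only shows the selection logic.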



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
