[ https://issues.apache.org/jira/browse/HBASE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035872#comment-17035872 ]
Michael Stack commented on HBASE-23282:
---------------------------------------

[~jfrabaute] What is the name of the region you report COLUMN CELL info for? Is it 353ab75c788cd0f77027706900453c49, the Unknown Server/Inconsistent server you note? The server 'regionserver-2.hbase.hbase.svc.cluster.local,16020,1573519312100' is not in your cluster? But it is mentioned in your hbase:meta as host for 353ab75c788cd0f77027706900453c49.

You've tried scheduling a scheduleProcedureRecoveries with hbck2? It doesn't clean it up?

> HBCKServerCrashProcedure for 'Unknown Servers'
> ----------------------------------------------
>
>                 Key: HBASE-23282
>                 URL: https://issues.apache.org/jira/browse/HBASE-23282
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck2, proc-v2
>    Affects Versions: 2.2.2
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> With an overdriving, sustained load, I can fairly easily manufacture an
> hbase:meta table that references servers that are no longer in the live list
> nor are members of deadservers; i.e. 'Unknown Servers'. The new 'HBCK
> Report' UI in Master has a section where it lists 'Unknown Servers' if any in
> hbase:meta.
> Once in this state, the repair is awkward. Our assign/unassign Procedure is
> particularly dogged about insisting that we confirm close/open of Regions
> when it is going about its business which is well and good if server is in
> live/dead sets but when an 'Unknown Server', we invariably end up trying to
> confirm against a non-longer present server (More on this in follow-on
> issues).
> What is wanted is queuing of a ServerCrashProcedure for each 'Unknown
> Server'. It would split any WALs (there shouldn't be any if server was
> restarted) and ideally it would cancel out any assigns and reassign regions
> off the 'Unknown Server'. But the 'normal' SCP consults the in-memory
> cluster state figuring what Regions were on the crashed server... And
> 'Unknown Servers' don't have state in in-master memory Maps of Servers to
> Regions or in DeadServers list which works fine for the usual case.
> Suggestion here is that hbck2 be able to drive in a special SCP, one which
> would get list of Regions by scanning hbase:meta rather than asking Master
> memory; an HBCKSCP.
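
For illustration only, the lookup the description asks for (derive the Regions on an 'Unknown Server' from hbase:meta content instead of Master memory) might be sketched roughly as below. This is not the HBase implementation: the class `UnknownServerSketch`, the method `regionsOnUnknownServers`, and the plain `Map` standing in for scanned hbase:meta rows are all hypothetical names made up for this sketch, and the server names other than the one quoted in the comment are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names come from HBase itself. */
public class UnknownServerSketch {

    /**
     * Groups regions by hosting server, keeping only servers that are in
     * neither the live set nor the dead set ("Unknown Servers").
     *
     * @param metaRows simplified stand-in for a scan of hbase:meta:
     *                 encoded region name mapped to the server name recorded for it
     */
    public static Map<String, List<String>> regionsOnUnknownServers(
            Map<String, String> metaRows,
            Set<String> liveServers,
            Set<String> deadServers) {
        Map<String, List<String>> byUnknownServer = new TreeMap<>();
        for (Map.Entry<String, String> row : metaRows.entrySet()) {
            String server = row.getValue();
            // Skip rows with no server recorded, and servers the Master knows about.
            if (server == null
                    || liveServers.contains(server)
                    || deadServers.contains(server)) {
                continue;
            }
            byUnknownServer
                    .computeIfAbsent(server, s -> new ArrayList<>())
                    .add(row.getKey());
        }
        return byUnknownServer;
    }

    public static void main(String[] args) {
        // Assumed sample data: one row from the comment above, one invented row.
        Map<String, String> meta = new TreeMap<>();
        meta.put("353ab75c788cd0f77027706900453c49",
                "regionserver-2.hbase.hbase.svc.cluster.local,16020,1573519312100");
        meta.put("hypothetical-region", "regionserver-0.example,16020,1573600000000");

        Set<String> live = Set.of("regionserver-0.example,16020,1573600000000");
        Set<String> dead = Set.of();

        // Each key is a candidate for a special SCP; each value is the region list
        // that SCP would have to reassign.
        regionsOnUnknownServers(meta, live, dead)
                .forEach((server, regions) -> System.out.println(server + " -> " + regions));
    }
}
```

The only point of the sketch is that the region list comes from the scanned rows, so it still works when the server has no entry in the Master's in-memory maps or in the dead servers list.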