[ https://issues.apache.org/jira/browse/HBASE-19757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325436#comment-16325436 ]
Ted Yu edited comment on HBASE-19757 at 1/15/18 6:16 PM:
---------------------------------------------------------

In master, we have the following code in RSGroupInfoManagerImpl#refresh()
{code:java}
if(!masterServices.isInitialized()) {
  specialTables = Arrays.asList(AccessControlLists.ACL_TABLE_NAME,
      TableName.META_TABLE_NAME, TableName.NAMESPACE_TABLE_NAME, RSGROUP_TABLE_NAME);
} else {
  specialTables =
      masterServices.listTableNamesByNamespace(NamespaceDescriptor.SYSTEM_NAMESPACE_NAME_STR);
}
{code}
If acl table is about to be created, the call in else branch may end up not having hbase:acl as one of the special tables.
In RSGroupBasedLoadBalancer, due to lack of rs group, no server is provided for hbase:acl table, leading to the deadlock.


was (Author: yuzhih...@gmail.com):
In master, we have the following code in RSGroupInfoManagerImpl#refresh()
{code}
if(!masterServices.isInitialized()) {
  specialTables = Arrays.asList(AccessControlLists.ACL_TABLE_NAME,
      TableName.META_TABLE_NAME, TableName.NAMESPACE_TABLE_NAME, RSGROUP_TABLE_NAME);
} else {
  specialTables =
      masterServices.listTableNamesByNamespace(NamespaceDescriptor.SYSTEM_NAMESPACE_NAME_STR);
}
{code}
If acl table is about to be created, the call in else branch may end up not having hbase:acl as one of the special tables.
By always using the assignment in if block, TestRSGroupsWithACL passes.

> System table gets stuck after enabling region server group feature in secure cluster
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-19757
>                 URL: https://issues.apache.org/jira/browse/HBASE-19757
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>         Attachments: 19757.v1.txt, 19757.v2.txt, 19757.v3.txt
>
>
> I was testing on an hbase-2 secure cluster against hadoop 3 where some tables
> were created without region server group feature.
> After adding the RSGroupAdminEndpoint and RSGroupBasedLoadBalancer to
> hbase-site, I restarted the whole cluster.
> After the restart, hbase:meta region got stuck in transition (forever).
> {code}
> 2018-01-10 21:20:16,696 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker-ctr-e137-1514896590304-8706-01-000002.hwx.site,20000,1515619212617] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=ctr-e137-1514896590304-8706-01-000004.hwx.site,16020,1515618538016, exception=org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on ctr-e137-1514896590304-8706-01-000004.hwx.site,16020,1515619181453
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3314)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3291)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1355)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1667)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
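
For illustration only, the race described in the comment can be reproduced without HBase. The sketch below is not the attached patch: the class and method names are hypothetical, table names are plain strings standing in for the TableName constants, and the list passed in stands in for what listTableNamesByNamespace would return before hbase:acl has been created. The second method follows the earlier revision of the comment ("always using the assignment in if block").

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SpecialTablesSketch {
    // Stand-in for the fixed list built in the `if` branch of refresh().
    static final List<String> FIXED_SPECIAL_TABLES = Arrays.asList(
        "hbase:acl", "hbase:meta", "hbase:namespace", "hbase:rsgroup");

    // Mirrors the quoted branch: fixed list before the master is initialized,
    // namespace listing afterwards.
    static Set<String> quotedLogic(boolean masterInitialized, List<String> listedSystemTables) {
        return new LinkedHashSet<>(masterInitialized ? listedSystemTables : FIXED_SPECIAL_TABLES);
    }

    // Hypothetical variant: always use the fixed list, regardless of master state.
    static Set<String> alwaysFixed() {
        return new LinkedHashSet<>(FIXED_SPECIAL_TABLES);
    }

    public static void main(String[] args) {
        // Assumed state: master is initialized, but hbase:acl has not been created yet.
        List<String> listed = Arrays.asList("hbase:meta", "hbase:namespace", "hbase:rsgroup");

        // With the quoted logic, hbase:acl is missing from the special tables.
        System.out.println(quotedLogic(true, listed).contains("hbase:acl")); // false
        // With the fixed list, hbase:acl is always present.
        System.out.println(alwaysFixed().contains("hbase:acl")); // true
    }
}
```

Whether a fixed list alone is sufficient for system tables added later is not covered by this sketch.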