Josh Elser created HBASE-27044:
----------------------------------

             Summary: Serialized procedures which point to users from other Kerberos domains can prevent master startup
                 Key: HBASE-27044
                 URL: https://issues.apache.org/jira/browse/HBASE-27044
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
            Reporter: Josh Elser


We ran into an interesting bug when test teams were running HBase against cloud 
storage without ensuring that the previous location was cleaned. This resulted 
in an hbase.rootdir that had:
 * A valid HBase MasterData Region
 * A valid hbase:meta
 * A valid collection of HBase tables
 * An empty ZooKeeper

The changes we worked on previously, described in HBASE-24286, were effective 
in getting everything _except_ the Procedures back online without issue. 
Parsing the existing procedures produced an interesting error:
{noformat}
java.lang.IllegalArgumentException: Illegal principal name hbase/wrong-hostname.domain@WRONG_REALM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hbase/wrong-hostname.domain@WRONG_REALM
        at org.apache.hadoop.security.User.<init>(User.java:51)
        at org.apache.hadoop.security.User.<init>(User.java:43)
        at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1418)
        at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1402)
        at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.toUserInfo(MasterProcedureUtil.java:60)
        at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.deserializeStateData(ModifyTableProcedure.java:262)
        at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
        at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
        at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:411)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:78)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.load(ProcedureExecutor.java:339)
        at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:285)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:330)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:600)
        at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1581)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:835)
        at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2205)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:514)
        at java.lang.Thread.run(Thread.java:750)
{noformat}
What's actually happening is that we store the {{User}} in the procedure and 
then rely on UserGroupInformation to parse the {{User}} protobuf into a UGI to 
get the "short" username.

When the serialized procedure (whether in the MasterData region or via PV2 
WAL files, I think) gets loaded, we end up needing Hadoop's auth_to_local 
configuration to map that Kerberos principal back to a short name. However, 
absent explicit auth_to_local rules for the foreign realm, Hadoop's 
KerberosName will only unwrap Kerberos principals which match the local 
Kerberos realm (defined by krb5.conf's default_realm, 
[ref|https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/classes/sun/security/krb5/Config.java#L978-L983]).
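To make the failure mode concrete, here is a minimal, self-contained Java sketch of what the DEFAULT auth_to_local rule effectively does. It is a deliberately simplified stand-in, not Hadoop's KerberosName: the class DefaultRuleSketch, its shortName method and the realm EXAMPLE.COM are hypothetical, and explicit RULE:[n:...] entries are not modeled.

```java
/**
 * Hypothetical, simplified model of Hadoop's DEFAULT auth_to_local rule.
 * NOT org.apache.hadoop.security.authentication.util.KerberosName:
 * explicit RULE:[n:...] entries, regex matching and sed-style rewrites
 * are deliberately left out.
 */
final class DefaultRuleSketch {
  private DefaultRuleSketch() {}

  /**
   * Maps "primary[/instance]@REALM" to "primary" only when REALM equals the
   * local default realm; a bare "user" (no instance, no realm) passes through.
   * Anything else fails, as the master did when loading procedures.
   */
  static String shortName(String principal, String defaultRealm) {
    int at = principal.lastIndexOf('@');
    // Everything before '@' is "primary" or "primary/instance".
    String nameAndHost = at < 0 ? principal : principal.substring(0, at);
    String realm = at < 0 ? null : principal.substring(at + 1);
    int slash = nameAndHost.indexOf('/');
    String primary = slash < 0 ? nameAndHost : nameAndHost.substring(0, slash);

    if (realm == null) {
      // Bare local user names are returned unchanged.
      if (slash < 0) {
        return nameAndHost;
      }
    } else if (realm.equals(defaultRealm)) {
      // Same realm as krb5.conf's default_realm: strip instance and realm.
      return primary;
    }
    // A principal from any other realm matches no DEFAULT rule.
    throw new IllegalArgumentException("No rules applied to " + principal);
  }
}
```

With EXAMPLE.COM as the default realm, {{shortName("hbase/master.example.com@EXAMPLE.COM", "EXAMPLE.COM")}} yields {{hbase}}, while {{hbase/wrong-hostname.domain@WRONG_REALM}} throws, which is the same shape of failure as in the stack trace.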

The interesting part is that we never seem to use the user _other_ than to 
display the {{owner}} attribute for procedures on the HBase UI. There is a 
method in hbase-procedure which can filter procedures based on owner, but I 
didn't see any usages of that method.
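If the owner really is display-only, one possible mitigation would be to resolve it leniently: try the strict principal-to-short-name mapping and fall back to the raw principal instead of aborting procedure loading. The sketch below is hypothetical (LenientOwnerSketch and displayOwner are not HBase APIs); the resolver stands in for whatever strict mapping is used, assumed to throw IllegalArgumentException as UGI.createRemoteUser does in the trace.

```java
import java.util.function.Function;

/**
 * Hypothetical sketch (not an HBase API): resolve a procedure owner for
 * display without letting an unparseable principal abort procedure loading.
 */
final class LenientOwnerSketch {
  private LenientOwnerSketch() {}

  /**
   * @param principal the serialized owner, e.g. "hbase/host@REALM"
   * @param resolver  strict principal-to-short-name mapping; assumed to throw
   *                  IllegalArgumentException when no auth_to_local rule applies
   * @return the short name if it resolves, otherwise the raw principal
   */
  static String displayOwner(String principal, Function<String, String> resolver) {
    try {
      return resolver.apply(principal);
    } catch (IllegalArgumentException e) {
      // Keep the full principal; it is still meaningful on the UI.
      return principal;
    }
  }
}
```

This is only safe if the owner is never used for authorization decisions, which the observation above suggests but does not prove.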

Given the pushback against HBASE-24286, I assume that, for the same reasons, we 
would see pushback against fixing this issue. However, I wanted to call it out 
for posterity. Users expect that HBase _should_ implicitly handle this case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
