Josh Elser created HBASE-27044:
----------------------------------

             Summary: Serialized procedures which point to users from other Kerberos domains can prevent master startup
                 Key: HBASE-27044
                 URL: https://issues.apache.org/jira/browse/HBASE-27044
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
            Reporter: Josh Elser

We ran into an interesting bug when test teams were running HBase against cloud storage without ensuring that the previous location was cleaned. This resulted in an hbase.rootdir that had:

* A valid HBase MasterData Region
* A valid hbase:meta
* A valid collection of HBase tables
* An empty ZooKeeper

The changes we worked on previously, described in HBASE-24286, were effective in getting everything _except_ the Procedures back online without issue. Parsing the existing procedures produced an interesting error:

{noformat}
java.lang.IllegalArgumentException: Illegal principal name hbase/wrong-hostname.domain@WRONG_REALM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hbase/wrong-hostname.domain@WRONG_REALM
	at org.apache.hadoop.security.User.<init>(User.java:51)
	at org.apache.hadoop.security.User.<init>(User.java:43)
	at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1418)
	at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1402)
	at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.toUserInfo(MasterProcedureUtil.java:60)
	at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.deserializeStateData(ModifyTableProcedure.java:262)
	at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
	at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
	at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:411)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:78)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.load(ProcedureExecutor.java:339)
	at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:285)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:330)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:600)
	at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1581)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:835)
	at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2205)
	at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:514)
	at java.lang.Thread.run(Thread.java:750)
{noformat}

What's actually happening is that we are storing the {{User}} into the procedure and then relying on UserGroupInformation to parse the {{User}} protobuf into a UGI to get the "short" username. When the serialized procedure (whether in the MasterData region or via PV2 WAL files, I think) gets loaded, we end up needing Hadoop auth_to_local configuration to be able to parse that Kerberos principal back to a name. However, Hadoop's KerberosName will only unwrap Kerberos principals which match the local Kerberos realm (defined by the krb5.conf's default_realm, [ref|https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/classes/sun/security/krb5/Config.java#L978-L983]).

The interesting part is that we don't seem to ever use the user _other_ than to display the {{owner}} attribute for procedures on the HBase UI. There is a method in hbase-procedure which can filter procedures based on owner, but I didn't see any usages of that method.

Given the pushback against HBASE-24286, I assume that, for the same reasons, we would see pushback against fixing this issue. However, I wanted to call it out for posterity. The expectation of users is that HBase _should_ implicitly handle this case.
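
Since the owner appears to be used only for display, one way to tolerate a principal from a foreign realm would be to fall back to its primary component when strict resolution fails. The following is only a sketch of that idea, not a proposed patch: {{OwnerNameFallback}}, {{primaryComponent}} and {{ownerForDisplay}} are hypothetical names, and the {{strict}} resolver in {{main}} is a stand-in for the real UserGroupInformation path, which (per the stack trace above) surfaces the failure as an IllegalArgumentException.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper for illustration; not part of HBase or Hadoop. */
public final class OwnerNameFallback {

    private OwnerNameFallback() {}

    /**
     * Returns the primary component of a Kerberos principal,
     * e.g. "hbase/host.example.com@REALM" yields "hbase".
     */
    public static String primaryComponent(String principal) {
        if (principal == null || principal.isEmpty()) {
            throw new IllegalArgumentException("Empty principal");
        }
        int end = principal.length();
        int at = principal.indexOf('@');
        if (at >= 0) {
            end = at; // drop the "@REALM" suffix
        }
        int slash = principal.indexOf('/');
        if (slash >= 0 && slash < end) {
            end = slash; // drop the "/instance" part
        }
        if (end == 0) {
            throw new IllegalArgumentException("No primary component in " + principal);
        }
        return principal.substring(0, end);
    }

    /**
     * Tries the strict resolver first; if it rejects the principal,
     * falls back to the primary component so loading can continue.
     */
    public static String ownerForDisplay(String principal, UnaryOperator<String> strictResolver) {
        try {
            return strictResolver.apply(principal);
        } catch (IllegalArgumentException e) {
            return primaryComponent(principal);
        }
    }

    public static void main(String[] args) {
        // Assumption: a toy resolver that only accepts the local realm,
        // standing in for the auth_to_local based resolution.
        UnaryOperator<String> strict = p -> {
            if (!p.endsWith("@LOCAL_REALM")) {
                throw new IllegalArgumentException("Illegal principal name " + p);
            }
            return primaryComponent(p);
        };
        // Prints "hbase" even though the realm is not the local one.
        System.out.println(ownerForDisplay("hbase/wrong-hostname.domain@WRONG_REALM", strict));
    }
}
```

The trade-off in this sketch is that the fallback name is not mapped through auth_to_local, so it is only suitable for display purposes such as the {{owner}} attribute, not for authorization decisions.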