[
https://issues.apache.org/jira/browse/ZOOKEEPER-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388789#comment-17388789
]
Emil Kleszcz commented on ZOOKEEPER-4334:
-----------------------------------------
Hi [~ztzg], I think that won't solve the problem as the change considers only
the SASL auth between the quorum members and my case regards the Java client to
server auth. In the current setup, I don't enable the SASL auth for the
server-to-server communication. It's only client-server. Nevertheless, thanks
to your pointer, I have just discovered the extra flag:
`zookeeper.sasl.client.canonicalize.hostname` (which, by the way, is not covered
in the admin doc: https://zookeeper.apache.org/doc/r3.6.0/zookeeperAdmin.html)
that you reference with a corresponding Jira in ZK-4030. According to the code
base, it
is by default enabled. This means that by default we have to strictly use the
canonical names for the principals. What I would like to achieve instead is to
define the aliases in the principals. I will try again by playing with this
flag. Cheers!
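
A minimal sketch of how this flag appears to influence the server principal the
Java client asks for, assuming the client builds it as `<username>/<hostname>`
and falls back to the configured hostname when the reverse lookup only returns
the IP address. The class `PrincipalSketch` and the helper `serverPrincipal` are
hypothetical names for illustration, not ZooKeeper API; only the two system
property names are taken from ZooKeeper, and the realm is left to the Kerberos
defaults.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class PrincipalSketch {
    // Hypothetical helper (not ZooKeeper API): approximates how the client
    // could derive the service principal it requests a ticket for.
    static String serverPrincipal(String user, String host, String canonicalHost,
                                  String hostAddress, boolean canonicalize) {
        String effectiveHost = host;
        // Assumption: canonicalization is skipped when the reverse lookup
        // yields nothing better than the IP address itself.
        if (canonicalize && !canonicalHost.equals(hostAddress)) {
            effectiveHost = canonicalHost;
        }
        return user + "/" + effectiveHost;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Host as it would appear in the connect string, e.g. an alias.
        String host = args.length > 0 ? args[0] : "localhost";
        boolean canonicalize = Boolean.parseBoolean(
            System.getProperty("zookeeper.sasl.client.canonicalize.hostname", "true"));
        String user = System.getProperty("zookeeper.sasl.client.username", "zookeeper");

        InetAddress addr = InetAddress.getByName(host);
        System.out.println(serverPrincipal(
            user, host, addr.getCanonicalHostName(), addr.getHostAddress(), canonicalize));
    }
}
```

Running it against an alias with and without
`-Dzookeeper.sasl.client.canonicalize.hostname=false` on the client JVM should
show whether the alias or the canonical name ends up in the requested
principal, which in turn has to match an entry in the server keytab.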
> SASL authentication fails when using host aliases
> -------------------------------------------------
>
> Key: ZOOKEEPER-4334
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4334
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.6.1
> Reporter: Emil Kleszcz
> Priority: Critical
>
> I faced an issue while trying to use host aliases with a ZooKeeper
> quorum when SASL is enabled. The errors I get in the ZooKeeper log are the
> following:
> ```
> 2021-07-12 21:04:46,437 [myid:3] - WARN [NIOWorkerThread-3:ZooKeeperServer@1661] - Client /<IP addr>:37368 failed to SASL authenticate: {}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
>     at org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:49)
>     at org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1650)
>     at org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:1599)
>     at org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:379)
>     at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:182)
>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
>     at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
>     at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:856)
>     at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
>     at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
>     at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167)
>     ... 11 more
> Caused by: KrbException: Checksum failed
>     at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102)
>     at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94)
>     at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175)
>     at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281)
>     at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
>     at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
>     at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:829)
>     ... 14 more
> Caused by: java.security.GeneralSecurityException: Checksum failed
>     at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451)
>     at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272)
>     at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76)
>     at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100)
>     ... 20 more
> ```
> What did I do?
> 1) Created host aliases for each quorum node (a, b, c): zk1, zk2, zk3
> 2) Changed zoo.cfg from:
> server.1=a
> server.2=b
> server.3=c
> to:
> server.1=zk1
> server.2=zk2
> server.3=zk3
> (At this stage, after restarting the ensemble, everything works as expected.)
> 3) Generated a new keytab with alias-based principals and host-based
> principals in zookeeper.keytab
> 4) Changed the jaas.conf (server) definition from:
> Server {
>   com.sun.security.auth.module.Krb5LoginModule required
>   useKeyTab=true
>   keyTab="/etc/zookeeper/conf/zookeeper.keytab"
>   storeKey=true
>   useTicketCache=false
>   principal="zookeeper/a.com@COM";
> };
> to:
> Server {
>   com.sun.security.auth.module.Krb5LoginModule required
>   useKeyTab=true
>   keyTab="/etc/zookeeper/conf/zookeeper.keytab"
>   storeKey=true
>   useTicketCache=false
>   principal="zookeeper/zk1.com@COM";
> };
> From that moment, after restarting quorum members, I get the above error.
> Now, why do I do this?
> To allow other services such as zkfc, hbase, hdfs, and yarn to connect to the
> quorum using aliases. Interestingly, without changing the zookeeper
> principal, hbase works perfectly, but the other 3 services fail with:
> ```
> <2021-07-12T20:45:19.491+0200> <INFO> <org.apache.zookeeper.ZooKeeper>: <Initiating client connection, connectString=zk01.com:2181,zk02.com:2181,zk03.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3246fb96>
> <2021-07-12T20:45:19.519+0200> <INFO> <org.apache.zookeeper.Login>: <Client successfully logged in.>
> <2021-07-12T20:45:19.521+0200> <INFO> <org.apache.zookeeper.Login>: <TGT refresh thread started.>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT valid starting at: Mon Jul 12 20:45:19 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT expires: Tue Jul 13 21:45:19 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT refresh sleeping until: Tue Jul 13 17:05:16 CEST 2021>
> <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.client.ZooKeeperSaslClient>: <Client will use GSSAPI as SASL mechanism.>
> <2021-07-12T20:45:19.530+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Opening socket connection to server zk02.com/<ip addr>:2181. Will attempt to SASL-authenticate using Login Context section 'Client'>
> <2021-07-12T20:45:19.535+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Socket connection established to zk02.com/<ip addr>:2181, initiating session>
> <2021-07-12T20:45:19.543+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Session establishment complete on server zk02.com/<ip addr>:2181, sessionid = 0x200247870fb0007, negotiated timeout = 10000>
> <2021-07-12T20:45:19.561+0200> <ERROR> <org.apache.zookeeper.client.ZooKeeperSaslClient>: <SASL authentication failed using login context 'Client' with exception: {}>
> javax.security.sasl.SaslException: Error in authenticating with a Zookeeper Quorum member: the quorum member's saslToken is null.
>     at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:279)
>     at org.apache.zookeeper.client.ZooKeeperSaslClient.respondToServer(ZooKeeperSaslClient.java:242)
>     at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:805)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
> <2021-07-12T20:45:19.564+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Unable to read additional data from server sessionid 0x200247870fb0007, likely server has closed socket, closing socket connection and attempting reconnect>
> <2021-07-12T20:45:19.671+0200> <INFO> <org.apache.hadoop.ha.ActiveStandbyElector>: <Session connected.>
> <2021-07-12T20:45:19.672+0200> <ERROR> <org.apache.hadoop.hdfs.tools.DFSZKFailoverController>: <DFSZKFailOverController exiting due to earlier exception java.io.IOException: Couldn't determine existence of znode
> ```
> When I change the ZooKeeper principal, hbase starts failing with this
> error, while the other services, except for ZooKeeper itself, somehow work
> fine. After that, I cannot connect manually to the zk quorum using zkCli or
> zookeeper-client with any combination of principals.
> I wonder if that may have something to do with the "Server
> environment:host.name=" pointing to the canonical name (and not the alias)
> during the startup. The same happens after specifying the alias with
> clientPortAddress=.