arshadmohammad opened a new issue, #17729:
URL: https://github.com/apache/pinot/issues/17729

   
   **ISSUE:**
   Pinot controller startup fails when ZooKeeper Kerberos authentication is enabled.
   
   
   _2026/02/19 10:37:12.994 ERROR [HelixControllerMain] [main] Exception while starting controller
   org.apache.helix.zookeeper.zkclient.exception.ZkTimeoutException: Waiting to be connected to ZK server has timed out.
           at org.apache.helix.zookeeper.zkclient.ZkClient.waitForEstablishedSession(ZkClient.java:1990)
           at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:775)
           at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:817)
           at org.apache.helix.controller.HelixControllerMain.startHelixController(HelixControllerMain.java:159)
           at org.apache.pinot.controller.helix.core.util.HelixSetupUtils.setupHelixController(HelixSetupUtils.java:131)
           at org.apache.pinot.controller.BaseControllerStarter.setUpHelixController(BaseControllerStarter.java:469)
           at org.apache.pinot.controller.BaseControllerStarter.start(BaseControllerStarter.java:440)
           at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:118)
           at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:87)
           at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:240)
           at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:293)
           at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:239)
           at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183)
           at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:180)
           at org.apache.pinot.tools.Command.call(Command.java:33)
           at org.apache.pinot.tools.Command.call(Command.java:29)
           at picocli.CommandLine.executeUserObject(CommandLine.java:2031)
           at picocli.CommandLine.access$1500(CommandLine.java:148)
           at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2469)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:2461)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:2423)
           at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
           at picocli.CommandLine$RunLast.execute(CommandLine.java:2425)
           at picocli.CommandLine.execute(CommandLine.java:2174)
           at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:174)
           at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:210)
           at org.apache.pinot.tools.admin.PinotController.main(PinotController.java:38)_
   
   
   **ANALYSIS:**
   
   The method org.apache.helix.zookeeper.zkclient.ZkClient.waitForEstablishedSession is designed to wait for the ZooKeeper client to reach the SyncConnected state.
   However, as the logs below show, the client ends up in the SaslAuthenticated state instead.
   This causes waitForEstablishedSession to time out, because the code waits specifically for SyncConnected and does not recognize SaslAuthenticated as a valid connected state.
   
   _2026/02/19 10:34:02.799 DEBUG [ZkClient] [main] zkclient5 Awaiting connection to Zookeeper server
   2026/02/19 10:34:02.799 DEBUG [ZkClient] [main] zkclient 5, Waiting for keeper state SyncConnected
   2026/02/19 10:34:02.836 DEBUG [ZkClient] [main-EventThread] zkclient 5, Received event: WatchedEvent state:SyncConnected type:None path:null zxid: -1
   2026/02/19 10:34:02.836 INFO [ZkClient] [main-EventThread] zkclient 5, zookeeper state changed ( SyncConnected )
   ....
   2026/02/19 10:34:02.852 DEBUG [ZkClient] [main-EventThread] zkclient 5, Received event: WatchedEvent state:SaslAuthenticated type:None path:null zxid: -1
   2026/02/19 10:34:02.852 INFO [ZkClient] [main-EventThread] zkclient 5, zookeeper state changed ( SaslAuthenticated )
   2026/02/19 10:34:02.852 DEBUG [ZkClient] [main-EventThread] zkclient 5 Leaving process event_
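
   To make the suspected failure mode concrete, here is a small self-contained Java model of it. This is only a sketch and not Helix code: the class `KeeperStateWaitSketch`, the `State` enum, and the methods `onStateChanged`, `waitForAny`, and `simulate` are all hypothetical names. It assumes the waiter only compares the client's current state against the state it expects, so a client that has already moved on to SaslAuthenticated never satisfies a wait for SyncConnected alone.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class KeeperStateWaitSketch {
    // Hypothetical stand-in for the ZooKeeper keeper states seen in the logs.
    enum State { DISCONNECTED, SYNC_CONNECTED, SASL_AUTHENTICATED }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private State current = State.DISCONNECTED;

    // Called from the (simulated) event thread on every state change.
    void onStateChanged(State newState) {
        lock.lock();
        try {
            current = newState;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits until the current state is one of the accepted states,
    // or returns false once the timeout has elapsed.
    boolean waitForAny(Set<State> accepted, long timeoutMs) {
        long nanosLeft = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (!accepted.contains(current)) {
                if (nanosLeft <= 0) {
                    return false; // timed out
                }
                nanosLeft = stateChanged.awaitNanos(nanosLeft);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Replays the state sequence from the logs, then waits for the accepted states.
    static boolean simulate(Set<State> accepted) {
        KeeperStateWaitSketch client = new KeeperStateWaitSketch();
        client.onStateChanged(State.SYNC_CONNECTED);
        client.onStateChanged(State.SASL_AUTHENTICATED);
        return client.waitForAny(accepted, 200);
    }

    public static void main(String[] args) {
        // Waiting only for SYNC_CONNECTED times out in this model.
        System.out.println("strict:  " + simulate(EnumSet.of(State.SYNC_CONNECTED)));
        // Also accepting SASL_AUTHENTICATED succeeds immediately.
        System.out.println("relaxed: " + simulate(
            EnumSet.of(State.SYNC_CONNECTED, State.SASL_AUTHENTICATED)));
    }
}
```

   In this model the strict wait returns false after the timeout, while the wait that also accepts the SASL state returns true. Whether the real ZkClient behaves the same way needs to be confirmed against the Helix source.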
   
   Connecting Pinot components to a ZooKeeper that uses Kerberos authentication is a common scenario, so I am surprised this problem occurs and wonder whether I am overlooking something fundamental.
   If you have encountered this issue before or have suggestions for a solution, please share your experience.
   
   **SOLUTION:**
   The fix needs to be implemented in the Apache Helix codebase. I will raise an issue with the Helix community and collaborate to resolve it. Helix needs to handle SaslAuthenticated as an internal state.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

