Hello! TL;DR: Could you explain how (and which) Kerberos tokens should be delegated from the driver to the workers? Does it depend on the Spark mode?
I have a Hadoop cluster (HDP 2.3) with Kerberos. I use spark-sql (1.6.1, compiled with Hadoop 2.7.1 and Hive 1.2.1) in yarn-cluster mode to query my Hive tables.

1. When I query a Hive table stored in HDFS, everything is fine (so assume there is no problem with my app, config, or credentials setup).
2. When I query an external HBase table (defined in Hive via the HBaseStorageHandler), I get a permissions error on the RPC call from the Spark workers to the HBase region server. (There is no problem connecting to the HBase master from the driver, nor to the ZooKeepers from either the driver or the workers.)
3. When I query the same HBase table through Hive (beeswax), everything is OK (so assume there is no problem with the HBaseStorageHandler).

After some time debugging (and adding some extra logging) I can see that the driver obtains (and delegates) only:

    16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for:
    16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for: 172.xx.xx102:8188
    16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for: ha-hdfs:dataocean

which means there are credentials only for YARN and HDFS. Is this the expected behavior? I see another user has a similar doubt:
https://issues.apache.org/jira/browse/SPARK-12279?focusedCommentId=15067020&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15067020

Could you explain how (and which) Kerberos tokens should be delegated from the driver to the workers? Does it depend on the Spark mode? As far as I can see in the code, obtainTokenForHBase is called in yarn-client mode, but not in yarn-cluster mode. Am I right? Is that intended?

--
Kind regards / Pozdrawiam,
Wojciech Indyk
http://datacentric.pl
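PS: In case anyone wants to reproduce my check: the tokens that actually reach a worker can be dumped from the current UGI. This is just my own debugging snippet, not a Spark API; it assumes it is run inside the application, e.g. at the start of a task on an executor:

    import org.apache.hadoop.security.UserGroupInformation
    import scala.collection.JavaConverters._

    // Print every delegation token held by the current user.
    // Run on an executor, this shows exactly what the driver shipped.
    val tokens = UserGroupInformation.getCurrentUser.getCredentials.getAllTokens
    tokens.asScala.foreach { t =>
      println(s"token kind=${t.getKind} service=${t.getService}")
    }

In my case this prints only the YARN and HDFS tokens, consistent with the DEBUG log above.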
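PPS: As a possible workaround I am considering obtaining the HBase token myself on the driver and adding it to the current user's credentials before the tasks run. A sketch, assuming the HBase client jars and a kerberized hbase-site.xml are on the driver classpath; as far as I can tell, TokenUtil.obtainToken(Configuration) is the same HBase 0.98/1.x API that Spark's obtainTokenForHBase invokes via reflection:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.security.token.TokenUtil
    import org.apache.hadoop.security.UserGroupInformation

    // Build an HBase configuration; picks up hbase-site.xml from the classpath.
    val hbaseConf = HBaseConfiguration.create()

    // Ask HBase for an authentication token for the current (Kerberos-
    // authenticated) user and add it to the UGI credentials, so that it is
    // shipped to the workers together with the YARN and HDFS tokens.
    val token = TokenUtil.obtainToken(hbaseConf)
    UserGroupInformation.getCurrentUser.addToken(token)

I have not yet verified whether this is sufficient in yarn-cluster mode, where the driver itself runs inside the application master.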