Hello!
TL;DR: Could you explain how (and which) Kerberos tokens should be
delegated from the driver to the workers? Does it depend on the Spark
deploy mode?

I have an HDP 2.3 Hadoop cluster with Kerberos. I use spark-sql (1.6.1,
compiled with Hadoop 2.7.1 and Hive 1.2.1) in yarn-cluster mode to
query my Hive tables.
1. When I query a Hive table stored in HDFS, everything is fine (assume
there is no problem with my app, config and credentials setup).
2. When I try to query an external HBase table (defined in Hive using
the HBaseHandler), I get a permissions problem on the RPC call from the
Spark workers to the HBase region server; a simplified sketch of this
query follows the list. (There is no problem connecting to the HBase
master from the driver, nor to the Zookeepers from either the driver or
the workers.)
3. When I query the HBase table through Hive (Beeswax), everything is OK
(assume there is no problem with the HBaseHandler).
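For completeness, scenario 2 is issued roughly like this (a simplified
sketch; the table name hbase_events is a placeholder, not my real table):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HBaseViaHiveQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-via-hive"))
    val hiveContext = new HiveContext(sc)
    // The scan of the HBase-backed table runs on the workers, and that is
    // where the permission error against the region servers shows up.
    hiveContext.sql("SELECT count(*) FROM hbase_events").show()
  }
}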

After some time of debugging (and writing some additional logging) I
see that the driver has (and delegates) only:
16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for:
16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for: 172.xx.xx102:8188
16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for: ha-hdfs:dataocean
This means there are only credentials for YARN and HDFS. I am curious:
is that the proper behavior? I see another user has a similar doubt:
https://issues.apache.org/jira/browse/SPARK-12279?focusedCommentId=15067020&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15067020
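The additional logging I mention boils down to dumping the tokens held
by the current user's credentials; a simplified sketch (the TokenDump
name is just for illustration):

import scala.collection.JavaConverters._
import org.apache.hadoop.security.UserGroupInformation

object TokenDump {
  // Print every delegation token held by the current UGI, tagged with
  // where the dump was taken (driver or executor).
  def dump(where: String): Unit = {
    val tokens =
      UserGroupInformation.getCurrentUser.getCredentials.getAllTokens.asScala
    tokens.foreach(t =>
      println(s"[$where] token kind=${t.getKind} service=${t.getService}"))
  }
}

// On the driver: TokenDump.dump("driver")
// Inside a task, to see what the executors actually received:
// sc.parallelize(1 to 1, 1).foreach(_ => TokenDump.dump("executor"))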

Could you explain how (and which) Kerberos tokens should be delegated
from the driver to the workers? Does it depend on the Spark deploy mode?
As far as I can see in the code, the method obtainTokenForHBase is
called in yarn-client mode, but not in yarn-cluster mode. Am I right? Is
that intended?

--
Kind regards/ Pozdrawiam,
Wojciech Indyk
http://datacentric.pl
