[ https://issues.apache.org/jira/browse/SPARK-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968238#comment-14968238 ]

Yu Gao commented on SPARK-10181:
--------------------------------

The exception came from the driver process on the local machine, as yarn-client 
mode was used. On startup, UserGroupInformation.loginUserFromKeytab was indeed 
called with the ambari-qa principal and keytab, and thus the static field 
UserGroupInformation.loginUser was set to ambari...@biads.svl.ibm.com; all 
threads within the driver process were supposed to see and use these login 
credentials to authenticate with Hive and Hadoop. However, because of 
IsolatedClientLoader, the UserGroupInformation class was not shared with the 
Hive metastore client: it was loaded separately in the isolated classloader, so 
it could not see the kerberos login credentials prepared in the main thread.
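
To see why an isolated classloader hides the login, here is a minimal sketch 
in plain Python standing in for the JVM behaviour: loading the same module 
through two separate loaders yields two copies of its module-level ("static") 
state, just as two classloaders yield two copies of a class's static fields. 
The file ugi.py and its login_user variable are made up for the illustration.

    import importlib.util, os, tempfile

    # A stand-in for UserGroupInformation: one piece of static state.
    path = os.path.join(tempfile.mkdtemp(), "ugi.py")
    with open(path, "w") as f:
        f.write("login_user = None\n")

    def load_copy(name):
        # Each call plays the role of a separate classloader.
        spec = importlib.util.spec_from_file_location(name, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        return mod

    driver_copy = load_copy("ugi_driver")        # the driver's shared loader
    metastore_copy = load_copy("ugi_metastore")  # IsolatedClientLoader's copy
    driver_copy.login_user = "ambari-qa"         # loginUserFromKeytab ran here
    print(metastore_copy.login_user)             # None: the login is invisible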

A fix would be to add UserGroupInformation and related classes to 
spark.sql.hive.metastore.sharedPrefixes as a default value. I tested this in 
my cluster and the issue was gone.
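
Until that default is in place, the workaround can be applied per job. A 
minimal sketch in PySpark follows; the prefix org.apache.hadoop.security is my 
assumption of what covers UserGroupInformation and its related classes, and 
the setting could equally be passed with --conf on spark-submit:

    #!/usr/bin/python
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import HiveContext

    # Share the Hadoop security classes with the isolated Hive metastore
    # classloader, so the kerberos login performed in the driver via
    # UserGroupInformation.loginUserFromKeytab is visible to the metastore
    # client as well.
    conf = SparkConf().set("spark.sql.hive.metastore.sharedPrefixes",
                           "org.apache.hadoop.security")
    sc = SparkContext(conf=conf)
    sqlContext = HiveContext(sc)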

> HiveContext is not used with keytab principal but with user principal/unix username
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-10181
>                 URL: https://issues.apache.org/jira/browse/SPARK-10181
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: kerberos
>            Reporter: Bolke de Bruin
>              Labels: hive, hivecontext, kerberos
>
> `bin/spark-submit --num-executors 1 --executor-cores 5 --executor-memory 5G  
> --driver-java-options -XX:MaxPermSize=4G --driver-class-path 
> lib/datanucleus-api-jdo-3.2.6.jar:lib/datanucleus-core-3.2.10.jar:lib/datanucleus-rdbms-3.2.9.jar:conf/hive-site.xml
>  --files conf/hive-site.xml --master yarn --principal sparkjob --keytab 
> /etc/security/keytabs/sparkjob.keytab --conf 
> spark.yarn.executor.memoryOverhead=18000 --conf 
> "spark.executor.extraJavaOptions=-XX:MaxPermSize=4G" --conf 
> spark.eventLog.enabled=false ~/test.py`
> With:
> #!/usr/bin/python
> from pyspark import SparkContext
> from pyspark.sql import HiveContext
> sc = SparkContext()
> sqlContext = HiveContext(sc)
> query = """ SELECT * FROM fm.sk_cluster """
> rdd = sqlContext.sql(query)
> rdd.registerTempTable("test")
> sqlContext.sql("CREATE TABLE wcs.test LOCATION '/tmp/test_gl' AS SELECT * 
> FROM test")
> Ends up with:
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=ua80tl, access=READ_EXECUTE, inode="/tmp/test_gl/.hive-staging_hive_2015-08-24_10-43-09_157_7805739002405787834-1/-ext-10000":sparkjob:hdfs:drwxr-x---
> (Our umask denies read access to other by default)


