[ https://issues.apache.org/jira/browse/SPARK-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968238#comment-14968238 ]
Yu Gao commented on SPARK-10181:
--------------------------------

The exception was from the driver process on the local machine, as yarn-client mode was used. On start-up, UserGroupInformation.loginUserFromKeytab was indeed called with the ambari-qa principal and keytab, so the static field UserGroupInformation.loginUser was set to ambari...@biads.svl.ibm.com, and all threads within the driver process were supposed to see and use these login credentials to authenticate with Hive and Hadoop.

However, because of IsolatedClientLoader, the UserGroupInformation class was not shared with the Hive metastore clients; it was loaded separately and therefore could not see the Kerberos login credentials prepared in the main thread. A fix would be to add UserGroupInformation and related classes to spark.sql.hive.metastore.sharedPrefixes by default. I tested this in my cluster and the issue was gone.

> HiveContext is not used with keytab principal but with user principal/unix username
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-10181
>                 URL: https://issues.apache.org/jira/browse/SPARK-10181
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: kerberos
>            Reporter: Bolke de Bruin
>              Labels: hive, hivecontext, kerberos
>
> `bin/spark-submit --num-executors 1 --executor-cores 5 --executor-memory 5G --driver-java-options -XX:MaxPermSize=4G --driver-class-path lib/datanucleus-api-jdo-3.2.6.jar:lib/datanucleus-core-3.2.10.jar:lib/datanucleus-rdbms-3.2.9.jar:conf/hive-site.xml --files conf/hive-site.xml --master yarn --principal sparkjob --keytab /etc/security/keytabs/sparkjob.keytab --conf spark.yarn.executor.memoryOverhead=18000 --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=4G" --conf spark.eventLog.enabled=false ~/test.py`
>
> With:
>
> #!/usr/bin/python
> from pyspark import SparkContext
> from pyspark.sql import HiveContext
> sc = SparkContext()
> sqlContext = HiveContext(sc)
> query = """ SELECT * FROM fm.sk_cluster """
> rdd = sqlContext.sql(query)
> rdd.registerTempTable("test")
> sqlContext.sql("CREATE TABLE wcs.test LOCATION '/tmp/test_gl' AS SELECT * FROM test")
>
> Ends up with:
>
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=ua80tl, access=READ_EXECUTE, inode="/tmp/test_gl/.hive-staging_hive_2015-08-24_10-43-09_157_7805739002405787834-1/-ext-10000":sparkjob:hdfs:drwxr-x---
> (Our umask denies read access to other by default)
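To confirm which Kerberos identity the driver's main classloader is actually using, the login user can be inspected from PySpark through the py4j gateway. A minimal diagnostic sketch, assuming yarn-client mode and a successful --principal/--keytab login; sc._jvm is an internal PySpark handle, not a public API:

    #!/usr/bin/python
    # Diagnostic sketch: print the Kerberos login visible to the driver's main classloader.
    # Note: the Hive metastore client runs with an isolated classloader that holds its own
    # copy of the UserGroupInformation class, so it does not see this login unless the
    # class is shared via spark.sql.hive.metastore.sharedPrefixes.
    from pyspark import SparkContext

    sc = SparkContext()
    ugi_class = sc._jvm.org.apache.hadoop.security.UserGroupInformation
    login_user = ugi_class.getLoginUser()           # static state set by loginUserFromKeytab at startup
    print(login_user.getUserName())                 # expected: the keytab principal, not the unix username
    print(login_user.hasKerberosCredentials())      # True when the keytab login succeeded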
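Until the default changes, the same effect can be obtained per job by extending the shared class prefixes at submit time. A minimal sketch based on the fix described above: UserGroupInformation lives in the org.apache.hadoop.security package; whether this setting appends to or replaces the built-in default prefixes (the JDBC driver classes) may depend on the Spark version, so check before relying on it:

    bin/spark-submit \
      --master yarn \
      --principal sparkjob \
      --keytab /etc/security/keytabs/sparkjob.keytab \
      --conf spark.sql.hive.metastore.sharedPrefixes=org.apache.hadoop.security \
      ~/test.py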