[ https://issues.apache.org/jira/browse/SPARK-14115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210354#comment-15210354 ]
Wojciech Indyk commented on SPARK-14115:
----------------------------------------

[~srowen] thanks for the fast feedback! I've updated the description with some more information. As far as I know, Hive 1.2.1 is the default for Spark (and I see it in the logs). I'm surprised that hbase-handler 1.2.1 has such mismatched-library problems; HBase-handler 2.0.0 has all the methods needed, but it runs into the Kerberos problem I've described.
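For reference, which addTokenForJob overloads are actually available can be checked with a quick reflection run against whatever HBase client jar ends up on the classpath. This is a throwaway sketch (the object name is arbitrary and it is not part of the job): Hive 1.2.1's HBaseStorageHandler links against the old HConnection-based signature, so on a cluster shipping a newer HBase client the check should only print Connection-based overloads.

{code}
// Throwaway diagnostic (sketch): list the TokenUtil.addTokenForJob overloads that are
// actually on the classpath. Hive 1.2.1's HBaseStorageHandler calls the old
// HConnection-based signature; newer HBase clients only ship Connection-based ones.
object CheckTokenUtil {
  def main(args: Array[String]): Unit = {
    val clazz = Class.forName("org.apache.hadoop.hbase.security.token.TokenUtil")
    clazz.getMethods
      .filter(_.getName == "addTokenForJob")
      .foreach { m =>
        println(m.getParameterTypes.map(_.getName).mkString("addTokenForJob(", ", ", ")"))
      }
  }
}
{code}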
> SparkSql + Hive + HBase + Kerberos doesn't work
> -----------------------------------------------
>
>                 Key: SPARK-14115
>                 URL: https://issues.apache.org/jira/browse/SPARK-14115
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>        Environment: Spark 1.6.1 compiled with hadoop 2.7.1, yarn, hive
>                     Hadoop 2.7.1 (HDP 2.3)
>                     Hive 1.2.1 (HDP 2.3)
>                     Kerberos
>           Reporter: Wojciech Indyk
>
> When I try to run SparkSql on Hive, where the table is defined by the HBase handler, I get this error:
> {code}
> ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(Lorg/apache/hadoop/hbase/client/HConnection;Lorg/apache/hadoop/hbase/security/User;Lorg/apache/hadoop/mapreduce/Job;)V
> java.lang.NoSuchMethodError: org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(Lorg/apache/hadoop/hbase/client/HConnection;Lorg/apache/hadoop/hbase/security/User;Lorg/apache/hadoop/mapreduce/Job;)V
> at org.apache.hadoop.hive.hbase.HBaseStorageHandler.addHBaseDelegationToken(HBaseStorageHandler.java:482)
> at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:427)
> at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:328)
> at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:304)
> at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:323)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
> at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1.apply(SortBasedAggregate.scala:72)
> at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1.apply(SortBasedAggregate.scala:69)
> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> at org.apache.spark.sql.execution.aggregate.SortBasedAggregate.doExecute(SortBasedAggregate.scala:69)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryColumnarTableScan.scala:129)
> at org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryColumnarTableScan.scala:118)
> at org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryColumnarTableScan.scala:41)
> at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:93)
> at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:60)
> at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:84)
> at org.apache.spark.sql.DataFrame.persist(DataFrame.scala:1581)
> at org.apache.spark.sql.DataFrame.cache(DataFrame.scala:1590)
> at pl.com.agora.bigdata.recommendations.ranking.App$$anon$1.run(App.scala:40)
> at pl.com.agora.bigdata.recommendations.ranking.App$$anon$1.run(App.scala:35)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> at pl.com.agora.bigdata.recommendations.ranking.App$.main(App.scala:35)
> at pl.com.agora.bigdata.recommendations.ranking.App.main(App.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> {code}
> When I query a similar table defined as Avro on HDFS, everything is OK.
> I tried to use a newer version of the Hive-HBase-handler that has the missing method. With Hive-HBase-handler 2.0.0 I get another issue, this time with Kerberos on the worker machines.
> At the beginning I have:
> {code}
> 16/03/24 15:26:11 DEBUG UserGroupInformation: User entry: "test-dataocean"
> 16/03/24 15:26:11 DEBUG UserGroupInformation: UGI loginUser:test-dataocean (auth:KERBEROS)
> 16/03/24 15:26:11 DEBUG UserGroupInformation: PrivilegedAction as:test-dataocean (auth:SIMPLE) from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
> {code}
> then
> {code}
> 16/03/24 15:26:45 DEBUG AbstractRpcClient: RPC Server Kerberos principal name for service=ClientService is hbase/abc.abc....@abc.abc
> 16/03/24 15:26:45 DEBUG AbstractRpcClient: Use KERBEROS authentication for service ClientService, sasl=true
> 16/03/24 15:26:45 DEBUG AbstractRpcClient: Connecting to abc.abc.abc/abc.abc.abc:16020
> 16/03/24 15:26:45 DEBUG UserGroupInformation: PrivilegedAction as:test-dataocean (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:735)
> 16/03/24 15:26:45 DEBUG HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/abc.abc....@abc.abc
> 16/03/24 15:26:45 DEBUG UserGroupInformation: PrivilegedActionException as:test-dataocean (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 16/03/24 15:26:45 DEBUG UserGroupInformation: PrivilegedAction as:test-dataocean (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:638)
> 16/03/24 15:26:45 WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 16/03/24 15:26:45 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:612)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:157)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:738)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:735)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:735)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:897)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:866)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$CallSender.run(RpcClientImpl.java:267)
> Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
> at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
> at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
> at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
> at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
> at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
> at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
> at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
> ... 12 more
> {code}
> The problem is that the authentication method is overwritten as SIMPLE instead of KERBEROS.
> I submit the job using the parameters:
> {code}
> --principal test-dataocean --keytab /etc/security/keytabs/test-dataocean.keytab
> {code}
> and set up the required Kerberos parameters for Hadoop, Hive and HBase.
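> To give an idea of what "the required Kerberos parameters" means here, this is roughly the kind of configuration involved (a minimal sketch: the property names are the standard Hadoop/HBase ones, but the realm and principals are placeholders, not the exact values from this cluster; normally they come from core-site.xml/hbase-site.xml rather than being set in code):
> {code}
> // Sketch only (spark-shell style). Property names are standard Hadoop/HBase settings;
> // realm/principals are placeholders for the anonymized values in the logs above.
> import org.apache.hadoop.hbase.HBaseConfiguration
>
> val hbaseConf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
> hbaseConf.set("hadoop.security.authentication", "kerberos")
> hbaseConf.set("hbase.security.authentication", "kerberos")
> hbaseConf.set("hbase.master.kerberos.principal", "hbase/_HOST@EXAMPLE.COM")
> hbaseConf.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@EXAMPLE.COM")
> {code}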
> All of those configs work for SparkSql on Hive (on HDFS) and for HBase (without Spark), but not for SparkSql on Hive on HBase (HBaseHandler 2.0.0).
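> For context, what HBaseStorageHandler.addHBaseDelegationToken is trying to do on a kerberized cluster is roughly the following (a minimal sketch against the HBase 1.x Connection-based client API, reusing the principal and keytab from above; it illustrates where the delegation token normally comes from, it is not a verified workaround):
> {code}
> // Sketch: log in from the keytab, obtain an HBase delegation token and attach it to the
> // current UGI. Delegation tokens are what let workers talk to HBase without their own TGT.
> import org.apache.hadoop.hbase.HBaseConfiguration
> import org.apache.hadoop.hbase.client.ConnectionFactory
> import org.apache.hadoop.hbase.security.token.TokenUtil
> import org.apache.hadoop.security.UserGroupInformation
>
> val hbaseConf = HBaseConfiguration.create()
> UserGroupInformation.loginUserFromKeytab(
>   "test-dataocean", "/etc/security/keytabs/test-dataocean.keytab")
> val connection = ConnectionFactory.createConnection(hbaseConf)
> try {
>   val token = TokenUtil.obtainToken(connection)         // delegation token from HBase
>   UserGroupInformation.getCurrentUser.addToken(token)   // make it visible to this JVM's UGI
> } finally {
>   connection.close()
> }
> {code}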