Hi all, I have been puzzling over a Kerberos problem for a while now and wondered if anyone can help.
For spark-submit, I specify --master yarn-client --keytab x --principal y, which creates my SparkContext fine. Connections to the ZooKeeper quorum to find the HBase master work well too. But when it comes to a .count() action on the RDD, I always get the stack trace at the end of this mail.

We are using CDH 5.5.2 (Spark 1.5.0), and com.cloudera.spark.hbase.HBaseContext is a wrapper around TableInputFormat/hadoopRDD (see https://github.com/cloudera-labs/SparkOnHBase), as you can see in the stack trace. Am I doing something obviously wrong here? A similar flow inside test code works well; only going via spark-submit exposes this issue.

Code snippet (I have tried using the commented-out lines in various combinations, without success):

    val conf = new SparkConf().
      set("spark.shuffle.consolidateFiles", "true").
      set("spark.kryo.registrationRequired", "false").
      set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").
      set("spark.kryoserializer.buffer", "30m")
    val sc = new SparkContext(conf)

    val cfg = sc.hadoopConfiguration
    // cfg.addResource(new org.apache.hadoop.fs.Path("/etc/hbase/conf/hbase-site.xml"))
    // UserGroupInformation.getCurrentUser.setAuthenticationMethod(UserGroupInformation.AuthenticationMethod.KERBEROS)
    // cfg.set("hbase.security.authentication", "kerberos")

    val hc = new HBaseContext(sc, cfg)
    val scan = new Scan
    scan.setTimeRange(startMillis, endMillis)
    val matchesInRange = hc.hbaseRDD(MY_TABLE, scan, resultToMatch)
    val cnt = matchesInRange.count()
    log.info(s"matches in range $cnt")

Stack trace / log:

16/05/17 17:04:47 INFO SparkContext: Starting job: count at Analysis.scala:93
16/05/17 17:04:47 INFO DAGScheduler: Got job 0 (count at Analysis.scala:93) with 1 output partitions
16/05/17 17:04:47 INFO DAGScheduler: Final stage: ResultStage 0(count at Analysis.scala:93)
16/05/17 17:04:47 INFO DAGScheduler: Parents of final stage: List()
16/05/17 17:04:47 INFO DAGScheduler: Missing parents: List()
16/05/17 17:04:47 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at HBaseContext.scala:580), which has no missing parents
16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(3248) called with curMem=428022, maxMem=244187136
16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 3.2 KB, free 232.5 MB)
16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(2022) called with curMem=431270, maxMem=244187136
16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2022.0 B, free 232.5 MB)
16/05/17 17:04:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.6.164.40:33563 (size: 2022.0 B, free: 232.8 MB)
16/05/17 17:04:47 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:861
16/05/17 17:04:47 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at HBaseContext.scala:580)
16/05/17 17:04:47 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
16/05/17 17:04:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hpg-dev-vm, partition 0,PROCESS_LOCAL, 2208 bytes)
16/05/17 17:04:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on hpg-dev-vm:52698 (size: 2022.0 B, free: 388.4 MB)
16/05/17 17:04:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on hpg-dev-vm:52698 (size: 26.0 KB, free: 388.4 MB)
16/05/17 17:04:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hpg-dev-vm): org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:155)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:161)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:156)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.restart(TableRecordReaderImpl.java:90)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.initialize(TableRecordReaderImpl.java:167)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReader.initialize(TableRecordReader.java:138)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.initialize(TableInputFormatBase.java:200)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:153)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:124)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Could not set up IO Streams to hpg-dev-vm /127.0.0.1:60020
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:773)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:890)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:859)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1193)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:32627)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1583)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1125)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:299)
    ... 26 more
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:673)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:631)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:739)
    ... 36 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:605)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:154)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:731)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:728)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:728)
    ... 36 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 45 more

--
Philipp Meyerhoefer
Thomson Reuters

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HBase-Spark-Kerberos-problem-tp26982.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.