Some of the Hadoop services cannot make use of the ticket obtained by oginUserFromKeytab.
I was able to get past it using gss Jaas configuration where you can pass either Keytab file or ticketCache to spark executors that access HBase. Sent from my iPhone > On May 19, 2016, at 4:51 AM, Ellis, Tom (Financial Markets IT) > <tom.el...@lloydsbanking.com.INVALID> wrote: > > Yeah we ran into this issue. Key part is to have the hbase jars and > hbase-site.xml config on the classpath of the spark submitter. > > We did it slightly differently from Y Bodnar, where we set the required jars > and config on the env var SPARK_DIST_CLASSPATH in our spark env file (rather > than SPARK_CLASSPATH which is deprecated). > > With this and –principal/--keytab, if you turn DEBUG logging for > org.apache.spark.deploy.yarn you should see “Added HBase security token to > credentials.” > > Otherwise you should at least hopefully see the error where it fails to add > the HBase tokens. > > Check out the source of Client [1] and YarnSparkHadoopUtil [2] – you’ll see > how obtainTokenForHBase is being done. > > It’s a bit confusing as to why it says you haven’t kinited even when you do > loginUserFromKeytab – I haven’t quite worked through the reason for that yet. > > Cheers, > > Tom Ellis > telli...@gmail.com > > [1] > https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala > [2] > https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala > > > From: John Trengrove [mailto:john.trengr...@servian.com.au] > Sent: 19 May 2016 08:09 > To: philipp.meyerhoe...@thomsonreuters.com > Cc: user > Subject: Re: HBase / Spark Kerberos problem > > -- This email has reached the Bank via an external source -- > > Have you had a look at this issue? > > https://issues.apache.org/jira/browse/SPARK-12279 > > There is a comment by Y Bodnar on how they successfully got Kerberos and > HBase working. > > 2016-05-18 18:13 GMT+10:00 <philipp.meyerhoe...@thomsonreuters.com>: > Hi all, > > I have been puzzling over a Kerberos problem for a while now and wondered if > anyone can help. > > For spark-submit, I specify --keytab x --principal y, which creates my > SparkContext fine. > Connections to Zookeeper Quorum to find the HBase master work well too. > But when it comes to a .count() action on the RDD, I am always presented with > the stack trace at the end of this mail. > > We are using CDH5.5.2 (spark 1.5.0), and > com.cloudera.spark.hbase.HBaseContext is a wrapper around > TableInputFormat/hadoopRDD (see > https://github.com/cloudera-labs/SparkOnHBase), as you can see in the stack > trace. > > Am I doing something obvious wrong here? > A similar flow, inside test code, works well, only going via spark-submit > exposes this issue. > > Code snippet (I have tried using the commented-out lines in various > combinations, without success): > > val conf = new SparkConf(). > set("spark.shuffle.consolidateFiles", "true"). > set("spark.kryo.registrationRequired", "false"). > set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"). > set("spark.kryoserializer.buffer", "30m") > val sc = new SparkContext(conf) > val cfg = sc.hadoopConfiguration > // cfg.addResource(new > org.apache.hadoop.fs.Path("/etc/hbase/conf/hbase-site.xml")) > // > UserGroupInformation.getCurrentUser.setAuthenticationMethod(UserGroupInformation.AuthenticationMethod.KERBEROS) > // cfg.set("hbase.security.authentication", "kerberos") > val hc = new HBaseContext(sc, cfg) > val scan = new Scan > scan.setTimeRange(startMillis, endMillis) > val matchesInRange = hc.hbaseRDD(MY_TABLE, scan, resultToMatch) > val cnt = matchesInRange.count() > log.info(s"matches in range $cnt") > > Stack trace / log: > > 16/05/17 17:04:47 INFO SparkContext: Starting job: count at Analysis.scala:93 > 16/05/17 17:04:47 INFO DAGScheduler: Got job 0 (count at Analysis.scala:93) > with 1 output partitions > 16/05/17 17:04:47 INFO DAGScheduler: Final stage: ResultStage 0(count at > Analysis.scala:93) > 16/05/17 17:04:47 INFO DAGScheduler: Parents of final stage: List() > 16/05/17 17:04:47 INFO DAGScheduler: Missing parents: List() > 16/05/17 17:04:47 INFO DAGScheduler: Submitting ResultStage 0 > (MapPartitionsRDD[1] at map at HBaseContext.scala:580), which has no missing > parents > 16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(3248) called with > curMem=428022, maxMem=244187136 > 16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3 stored as values in > memory (estimated size 3.2 KB, free 232.5 MB) > 16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(2022) called with > curMem=431270, maxMem=244187136 > 16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes > in memory (estimated size 2022.0 B, free 232.5 MB) > 16/05/17 17:04:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory > on 10.6.164.40:33563 (size: 2022.0 B, free: 232.8 MB) > 16/05/17 17:04:47 INFO SparkContext: Created broadcast 3 from broadcast at > DAGScheduler.scala:861 > 16/05/17 17:04:47 INFO DAGScheduler: Submitting 1 missing tasks from > ResultStage 0 (MapPartitionsRDD[1] at map at HBaseContext.scala:580) > 16/05/17 17:04:47 INFO YarnScheduler: Adding task set 0.0 with 1 tasks > 16/05/17 17:04:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, > hpg-dev-vm, partition 0,PROCESS_LOCAL, 2208 bytes) > 16/05/17 17:04:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory > on hpg-dev-vm:52698 (size: 2022.0 B, free: 388.4 MB) > 16/05/17 17:04:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory > on hpg-dev-vm:52698 (size: 26.0 KB, free: 388.4 MB) > 16/05/17 17:04:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, > hpg-dev-vm): org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't > get the location > at > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:155) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) > at > org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314) > at > org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289) > at > org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:161) > at > org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:156) > at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888) > at > org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.restart(TableRecordReaderImpl.java:90) > at > org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.initialize(TableRecordReaderImpl.java:167) > at > org.apache.hadoop.hbase.mapreduce.TableRecordReader.initialize(TableRecordReader.java:138) > at > org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.initialize(TableInputFormatBase.java:200) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:153) > at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:124) > at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Could not set up IO Streams to hpg-dev-vm > /127.0.0.1:60020 > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:773) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:890) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:859) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1193) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:32627) > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1583) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1125) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:299) > ... 26 more > Caused by: java.lang.RuntimeException: SASL authentication failed. The most > likely cause is missing or invalid credentials. Consider 'kinit'. > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:673) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:631) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:739) > ... 36 more > Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:605) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:154) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:731) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:728) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:728) > ... 36 more > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos tgt) > at > sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122) > at > sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187) > at > sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224) > at > sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212) > at > sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179) > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192) > ... 45 more > > -- > Philipp Meyerhoefer > Thomson Reuters > philipp.meyerhoe...@tr.com > > > ________________________________ > > This e-mail is for the sole use of the intended recipient and contains > information that may be privileged and/or confidential. If you are not an > intended recipient, please notify the sender by return e-mail and delete this > e-mail and any attachments. Certain required legal entity disclosures can be > accessed on our website.<http://site.thomsonreuters.com/site/disclosures/> > > > > Lloyds Banking Group plc. Registered Office: The Mound, Edinburgh EH1 1YZ. > Registered in Scotland no. SC95000. Telephone: 0131 225 4555. Lloyds Bank > plc. Registered Office: 25 Gresham Street, London EC2V 7HN. Registered in > England and Wales no. 2065. Telephone 0207626 1500. Bank of Scotland plc. > Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no. > SC327000. Telephone: 03457 801 801. Cheltenham & Gloucester plc. Registered > Office: Barnett Way, Gloucester GL4 3RL. Registered in England and Wales > 2299428. Telephone: 0345 603 1637 > > Lloyds Bank plc, Bank of Scotland plc are authorised by the Prudential > Regulation Authority and regulated by the Financial Conduct Authority and > Prudential Regulation Authority. > > Cheltenham & Gloucester plc is authorised and regulated by the Financial > Conduct Authority. > > Halifax is a division of Bank of Scotland plc. Cheltenham & Gloucester > Savings is a division of Lloyds Bank plc. > > HBOS plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in > Scotland no. SC218813. > > This e-mail (including any attachments) is private and confidential and may > contain privileged material. If you have received this e-mail in error, > please notify the sender and delete it (including any attachments) > immediately. You must not copy, distribute, disclose or use any of the > information in it or any attachments. Telephone calls may be monitored or > recorded.