Another thing to check is to make sure each one of your executor nodes has the
JCE unlimited-strength policy jars installed.

A quick way to verify (it evaluates to true only when the unlimited-strength
policy files are in place, i.e. AES key lengths above 128 bits are allowed):

try { javax.crypto.Cipher.getMaxAllowedKeyLength("AES") > 128 }
catch { case e: java.security.NoSuchAlgorithmException => false }
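
To run that check on every executor, something along these lines should work
(an untested sketch, assuming a live SparkContext sc and that the hostname
command exists on the executor hosts):

import scala.sys.process._

// Evaluate the JCE check on the executors and pair the result with each host name.
sc.parallelize(0 until 100).map { _ =>
  val unlimited =
    try { javax.crypto.Cipher.getMaxAllowedKeyLength("AES") > 128 }
    catch { case _: java.security.NoSuchAlgorithmException => false }
  (("hostname".!!).trim, unlimited)
}.collect.distinct

Any host that shows up paired with false is missing the policy files.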

Setting "-Dsun.security.krb5.debug=true" and "-Dsun.security.jgss.debug=true"
in spark.executor.extraJavaOptions and then calling loginUserFromKeytab() will
generate a lot of information in the executor logs, which might also help you
figure out what is going on.
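
For example, the flags can go on the SparkConf before the context is created
(just a sketch; adapt it to however you actually configure and submit your job):

import org.apache.spark.SparkConf

// The Kerberos/JGSS debug output from the executor JVMs ends up in the
// executors' stderr logs.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true")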

Cheers,

Doug


> On Oct 22, 2015, at 7:59 AM, Deenar Toraskar <deenar.toras...@gmail.com> 
> wrote:
> 
> Hi All
> 
> I am trying to access a SQLServer that uses Kerberos for authentication from
> Spark. I can successfully connect to the SQLServer from the driver node, but
> any connection to SQLServer from the executors fails with "Failed to find any
> Kerberos tgt".
> 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser on the driver
> returns myPrincipal (auth:KERBEROS) as expected, whereas running the same
> check on the executors,
> 
> sc.parallelize(0 to 10).map { _ =>
>   (("hostname".!!).trim, UserGroupInformation.getCurrentUser.toString)
> }.collect.distinct
> 
> returns
> 
> Array((hostname1, myprincipal (auth:SIMPLE)), (hostname2, myprincipal
> (auth:SIMPLE)))
> 
> 
> I tried passing the keytab and logging in explicitly from the executors, but
> that didn't help either.
> 
> sc.parallelize(0 to 10).map { _ =>
>   (SparkHadoopUtil.get.loginUserFromKeytab("myprincipal",
>      SparkFiles.get("myprincipal.keytab")),
>    ("hostname".!!).trim,
>    UserGroupInformation.getCurrentUser.toString)
> }.collect.distinct
> 
> Digging deeper, I found SPARK-6207 and came across code in the YARN Client
> that obtains delegation tokens for each Kerberised service accessed from the
> executors, such as
> 
> obtainTokensForNamenodes(nns, hadoopConf, credentials)
> 
> obtainTokenForHiveMetastore(hadoopConf, credentials)
> 
> I was wondering if anyone has been successful in accessing Kerberos-secured
> resources that live outside the Hadoop cluster from Spark executors running
> in YARN.
> 
> 
> 
> Regards
> Deenar
> 
> 
> On 20 April 2015 at 21:58, Andrew Lee <alee...@hotmail.com> wrote:
> Hi All,
> 
> Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1
> 
> Posting this problem to the user group first to see if anyone else is
> encountering the same problem.
> 
> When submitting Spark jobs that invoke HiveContext APIs on a Kerberized
> Hadoop + YARN (2.4.1) cluster, I'm getting this error.
> 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 
> Apparently, the Kerberos ticket is not on the remote data nodes or compute
> nodes, since we don't deploy Kerberos tickets there, and doing so would not
> be good practice anyway. On the other hand, we can't just SSH to every
> machine and run kinit for that user; that is neither practical nor secure.
> 
> The point here is: shouldn't the doAs use a delegation token instead of the
> ticket? I'm trying to understand what is missing in Spark's HiveContext API,
> given that a normal MapReduce job invoking the Hive APIs works, but Spark SQL
> does not. Any insights or feedback are appreciated.
> 
> Anyone got this running without pre-deploying (pre-initializing) all tickets 
> node by node? Is this worth filing a JIRA?
> 
> 
> 
> 15/03/25 18:59:08 INFO hive.metastore: Trying to connect to metastore with 
> URI thrift://alee-cluster.test.testserver.com:9083
> 15/03/25 18:59:08 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>       at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>       at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>       at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>       at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>       at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>       at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>       at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
>       at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
>       at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
>       at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
>       at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
>       at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
>       at scala.Option.orElse(Option.scala:257)
>       at 
> org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
>       at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
>       at 
> org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
>       at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
>       at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.<init>(HiveMetastoreCatalog.scala:55)
>       at 
> org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:253)
>       at 
> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:253)
>       at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:253)
>       at 
> org.apache.spark.sql.hive.HiveContext$$anon$4.<init>(HiveContext.scala:263)
>       at 
> org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:263)
>       at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:262)
>       at 
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
>       at 
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
>       at 
> org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
>       at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
>       at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:102)
>       at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:106)
>       at 
> SparkSQLTestCase2HiveContextYarnClusterApp$.main(sparksql_hivecontext_examples_yarncluster.scala:17)
>       at 
> SparkSQLTestCase2HiveContextYarnClusterApp.main(sparksql_hivecontext_examples_yarncluster.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:441)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tgt)
>       at 
> sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
>       at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
>       at 
> sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
>       at 
> sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
>       at 
> sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
>       at 
> sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
>       at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
>       ... 48 more

