Re: Spark standalone mode and kerberized cluster

2015-06-16 Thread Borja Garrido Bear
Thank you for the answer. It doesn't seem to work either (I haven't logged
into the machine as the spark user, but I ran kinit inside the spark-env
script), and I also tried it inside the job.

I've noticed when I run pyspark that the kerberos token is used for
something, but the same behavior doesn't appear when I start a worker,
so maybe workers aren't meant to use kerberos...
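
One way to check whether a ticket is visible in a given environment is `klist -s`, which in MIT Kerberos prints nothing and exits with status 0 only when the credentials cache is readable and unexpired. The sketch below wraps that call in Python. The function name `has_valid_ticket` and the injectable `run` parameter are illustrative assumptions, not part of Spark. The check only describes the cache seen by the user and environment that run it, so a worker JVM started another way may see a different cache.

```python
import subprocess


def has_valid_ticket(run=subprocess.run):
    """Return True if `klist -s` reports a readable, unexpired credentials cache.

    `run` is injectable (an assumption of this sketch) so the check can be
    exercised without a real KDC or ticket cache.
    """
    try:
        result = run(["klist", "-s"])
    except OSError:
        # klist is missing or cannot be executed by this user.
        return False
    return result.returncode == 0
```

Calling `has_valid_ticket()` from the same shell that sources the spark-env script, right after the kinit line, would show whether that kinit actually produced a usable cache.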

On Tue, Jun 16, 2015 at 12:10 PM, Steve Loughran ste...@hortonworks.com
wrote:


  On 15 Jun 2015, at 15:43, Borja Garrido Bear kazebo...@gmail.com wrote:

  I tried running the job in a standalone cluster and I'm getting this:

  java.io.IOException: Failed on local exception: java.io.IOException:
 org.apache.hadoop.security.AccessControlException: Client cannot
 authenticate via:[TOKEN, KERBEROS]; Host Details : local host is:
 worker-node/0.0.0.0; destination host is: hdfs:9000;


 Both nodes can access the HDFS running spark locally, and have valid kerberos 
 credentials, I know for the moment keytab is not supported for standalone 
 mode, but as long as the tokens I had when initiating the workers and masters 
 are valid this should work, shouldn't it?




 I don't know anything about tokens on standalone. In YARN what we have to
 do is use something called delegation tokens: the client asks (something)
 for tokens granting access to HDFS, and attaches them to the YARN container
 creation request, which is then handed off to the app master, which then
 gets to deal with (a) passing them down to launched workers and (b) token
 refresh (which is where keytabs come into play).

  Why not try sshing in to the worker-node as the spark user and running
 kinit there to see if the problem goes away once you've logged in with
 Kerberos? If that works, you're going to have to automate that process
 across the cluster.



Re: Spark standalone mode and kerberized cluster

2015-06-16 Thread Steve Loughran

On 15 Jun 2015, at 15:43, Borja Garrido Bear kazebo...@gmail.com wrote:

I tried running the job in a standalone cluster and I'm getting this:

java.io.IOException: Failed on local exception: java.io.IOException: 
org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
via:[TOKEN, KERBEROS]; Host Details : local host is: 
worker-node/0.0.0.0; destination host is: hdfs:9000;


Both nodes can access the HDFS running spark locally, and have valid kerberos 
credentials, I know for the moment keytab is not supported for standalone mode, 
but as long as the tokens I had when initiating the workers and masters are 
valid this should work, shouldn't it?



I don't know anything about tokens on standalone. In YARN what we have to do is 
use something called delegation tokens: the client asks (something) for tokens 
granting access to HDFS, and attaches them to the YARN container creation 
request, which is then handed off to the app master, which then gets to deal 
with (a) passing them down to launched workers and (b) token refresh (which is 
where keytabs come into play).

Why not try sshing in to the worker-node as the spark user and running kinit 
there to see if the problem goes away once you've logged in with Kerberos? If 
that works, you're going to have to automate that process across the cluster.
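
If the manual kinit does help, automating it across the cluster could look roughly like the Python sketch below. It assumes passwordless ssh to each worker as the spark user and a keytab file already present on each worker. The host names, principal, and keytab path in the usage note are hypothetical, and `kinit -k -t <keytab> <principal>` is the MIT Kerberos form for logging in from a keytab. Whether this is enough for standalone workers is exactly what the thread leaves open.

```python
import subprocess


def kinit_command(host, principal, keytab):
    """Build the ssh command that runs kinit from a keytab on one worker."""
    return ["ssh", host, "kinit", "-k", "-t", keytab, principal]


def kinit_all(hosts, principal, keytab, run=subprocess.run):
    """Run kinit on every host and return the hosts where it failed.

    `run` is injectable (an assumption of this sketch) so the loop can be
    tested without touching real machines.
    """
    failed = []
    for host in hosts:
        result = run(kinit_command(host, principal, keytab))
        if result.returncode != 0:
            failed.append(host)
    return failed
```

For example, `kinit_all(["worker-1", "worker-2"], "spark@EXAMPLE.COM", "/etc/security/spark.keytab")` returns the list of hosts whose kinit exited with a non-zero status. Tickets expire, so a loop like this would need to be rerun periodically, for instance from cron.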


Re: Spark standalone mode and kerberized cluster

2015-06-15 Thread Borja Garrido Bear
I tried running the job in a standalone cluster and I'm getting this:

java.io.IOException: Failed on local exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot
authenticate via:[TOKEN, KERBEROS]; Host Details : local host is:
worker-node/0.0.0.0; destination host is: hdfs:9000;


Both nodes can access the HDFS running spark locally, and have valid
kerberos credentials, I know for the moment keytab is not supported
for standalone mode, but as long as the tokens I had when initiating
the workers and masters are valid this should work, shouldn't it?



On Thu, Jun 11, 2015 at 10:22 AM, Steve Loughran ste...@hortonworks.com
wrote:

  That's spark on YARN in Kerberos

  In Spark 1.3 you can submit work to a Kerberized Hadoop cluster; once
 the tokens you passed up with your app submission expire (~72 hours) your
 job can't access HDFS any more.

  That's been addressed in Spark 1.4, where you can now specify a kerberos
 keytab for the application master; the AM will then give the workers
 updated tokens when needed.

  The kerberos authentication is all related to the HDFS interaction, YARN
 itself, and the way Kerberized YARN runs your work under your userid, not
 mapred or yarn.
 It will also handle SPNEGO authentication between your web browser and the
 Spark UI (which is redirected via the YARN RM Proxy to achieve this).

  It does not do anything about Akka-based IPC between your client code
 and the Spark application.

  -steve

  On 11 Jun 2015, at 06:47, Akhil Das ak...@sigmoidanalytics.com wrote:

  This might help
 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/Apache_Spark_Quickstart_v224/content/ch_installing-kerb-spark-quickstart.html

  Thanks
 Best Regards

 On Wed, Jun 10, 2015 at 6:49 PM, kazeborja kazebo...@gmail.com wrote:

 Hello all.

 I've been reading some old mails and noticed that the use of kerberos in a
 standalone cluster was not supported. Is this still the case?

 Thanks.
 Borja.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-standalone-mode-and-kerberized-cluster-tp23255.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






Re: Spark standalone mode and kerberized cluster

2015-06-11 Thread Steve Loughran
That's spark on YARN in Kerberos

In Spark 1.3 you can submit work to a Kerberized Hadoop cluster; once the 
tokens you passed up with your app submission expire (~72 hours) your job can't 
access HDFS any more.

That's been addressed in Spark 1.4, where you can now specify a kerberos keytab 
for the application master; the AM will then give the workers updated tokens 
when needed.

The kerberos authentication is all related to the HDFS interaction, YARN 
itself, and the way Kerberized YARN runs your work under your userid, not 
mapred or yarn.
It will also handle SPNEGO authentication between your web browser and the 
Spark UI (which is redirected via the YARN RM Proxy to achieve this).

It does not do anything about Akka-based IPC between your client code and the 
Spark application.

-steve
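
As an illustration of the Spark 1.4 keytab option described above, the Python sketch below assembles a `spark-submit` command line for a Kerberized YARN cluster. It assumes the `--principal` and `--keytab` options that `spark-submit` documents for YARN from Spark 1.4 onward, and the `yarn-cluster` master value of that era. The helper name, application file, principal, and keytab path are hypothetical. None of this applies to standalone mode.

```python
import shlex


def yarn_submit_command(app, principal, keytab):
    """Build a spark-submit argument list for YARN with a Kerberos keytab."""
    return [
        "spark-submit",
        "--master", "yarn-cluster",
        "--principal", principal,  # Kerberos principal to log in as
        "--keytab", keytab,        # keytab used to obtain fresh tokens
        app,
    ]


def as_shell_line(args):
    """Render the argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(arg) for arg in args)
```

Passing the list from `yarn_submit_command("job.py", "spark@EXAMPLE.COM", "/etc/security/spark.keytab")` to `subprocess.run` would submit the job, and `as_shell_line` gives a copy-pasteable form of the same command for logging.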

On 11 Jun 2015, at 06:47, Akhil Das ak...@sigmoidanalytics.com wrote:

This might help 
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/Apache_Spark_Quickstart_v224/content/ch_installing-kerb-spark-quickstart.html

Thanks
Best Regards

On Wed, Jun 10, 2015 at 6:49 PM, kazeborja kazebo...@gmail.com wrote:
Hello all.

I've been reading some old mails and noticed that the use of kerberos in a
standalone cluster was not supported. Is this still the case?

Thanks.
Borja.








Re: Spark standalone mode and kerberized cluster

2015-06-10 Thread Akhil Das
This might help
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/Apache_Spark_Quickstart_v224/content/ch_installing-kerb-spark-quickstart.html

Thanks
Best Regards

On Wed, Jun 10, 2015 at 6:49 PM, kazeborja kazebo...@gmail.com wrote:

 Hello all.

 I've been reading some old mails and noticed that the use of kerberos in a
 standalone cluster was not supported. Is this still the case?

 Thanks.
 Borja.


