Re: Spark standalone mode and kerberized cluster
Thank you for the answer, but it doesn't seem to work either (I didn't log into the machine as the spark user, but ran kinit inside the spark-env script), and I also tried it inside the job. I've noticed that when I run pyspark the kerberos token is used for something, but the same behavior doesn't appear when I start a worker, so maybe workers aren't designed to use kerberos...

On Tue, Jun 16, 2015 at 12:10 PM, Steve Loughran ste...@hortonworks.com wrote:

On 15 Jun 2015, at 15:43, Borja Garrido Bear kazebo...@gmail.com wrote:

I tried running the job in a standalone cluster and I'm getting this:

java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: worker-node/0.0.0.0; destination host is: hdfs:9000;

Both nodes can access HDFS when running Spark locally, and both have valid kerberos credentials. I know that for the moment a keytab is not supported in standalone mode, but as long as the tokens I had when starting the workers and masters are valid this should work, shouldn't it?

I don't know anything about tokens on standalone. In YARN what we have to do is something called delegation tokens: the client asks (something) for tokens granting access to HDFS and attaches them to the YARN container creation request, which is then handed off to the app master, which then gets to deal with (a) passing them down to launched workers and (b) token refresh (which is where keytabs come into play).

Why not try sshing in to the worker node as the spark user and running kinit there, to see if the problem goes away once you've logged in with Kerberos? If that works, you're going to have to automate that process across the cluster.
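The automation Steve suggests could look something like the sketch below: run kinit from a keytab as the spark user on each worker so the ticket cache is populated before the worker starts. The hostnames, keytab path, and principal are placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: refresh the Kerberos ticket cache for the spark user
# on every standalone worker. Hostnames, keytab path, and principal are
# assumed placeholders -- adjust for your cluster.
set -euo pipefail

WORKERS="worker-node-1 worker-node-2"      # assumed worker hostnames
KEYTAB=/etc/security/keytabs/spark.keytab  # assumed keytab location
PRINCIPAL=spark@EXAMPLE.COM                # assumed Kerberos principal

for host in $WORKERS; do
  # kinit -kt obtains a TGT non-interactively from the keytab, so this
  # could also run from cron to renew the ticket before it expires.
  ssh "spark@${host}" "kinit -kt ${KEYTAB} ${PRINCIPAL}"
done
```

Note this only refreshes the login ticket; it does not give standalone Spark the delegation-token plumbing that YARN mode has, so it may still not be enough.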
Re: Spark standalone mode and kerberized cluster
I tried running the job in a standalone cluster and I'm getting this:

java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: worker-node/0.0.0.0; destination host is: hdfs:9000;

Both nodes can access HDFS when running Spark locally, and both have valid kerberos credentials. I know that for the moment a keytab is not supported in standalone mode, but as long as the tokens I had when starting the workers and masters are valid this should work, shouldn't it?

On Thu, Jun 11, 2015 at 10:22 AM, Steve Loughran ste...@hortonworks.com wrote:

That's Spark on YARN in Kerberos. In Spark 1.3 you can submit work to a Kerberized Hadoop cluster; once the tokens you passed up with your app submission expire (~72 hours), your job can't access HDFS any more. That's been addressed in Spark 1.4, where you can now specify a kerberos keytab for the application master; the AM will then give the workers updated tokens when needed.

The kerberos authentication is all related to the HDFS interaction, YARN itself, and the way Kerberized YARN runs your work under your userid, not mapred or yarn. It will also handle SPNEGO authentication between your web browser and the Spark UI (which is redirected via the YARN RM proxy to achieve this). It does not do anything about Akka-based IPC between your client code and the Spark application.

-steve

On 11 Jun 2015, at 06:47, Akhil Das ak...@sigmoidanalytics.com wrote:

This might help: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/Apache_Spark_Quickstart_v224/content/ch_installing-kerb-spark-quickstart.html

Thanks
Best Regards

On Wed, Jun 10, 2015 at 6:49 PM, kazeborja kazebo...@gmail.com wrote:

Hello all. I've been reading some old mails and noticed that the use of kerberos in a standalone cluster was not supported. Is this still the case?

Thanks.
Borja.
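For the YARN case Steve describes, the Spark 1.4 keytab support is exposed through spark-submit; a minimal invocation might look like the following. The principal, keytab path, class name, and jar are illustrative placeholders, not values from this thread.

```shell
# Spark 1.4+ on YARN: hand the AM a keytab so it can re-obtain HDFS
# delegation tokens after the initial ones expire (~72 hours).
# Principal, keytab path, class, and jar below are assumed placeholders.
spark-submit \
  --master yarn-cluster \
  --principal spark@EXAMPLE.COM \
  --keytab /etc/security/keytabs/spark.keytab \
  --class org.example.MyJob \
  my-job.jar
```

This only applies to YARN deployments; as noted above, standalone mode has no equivalent token-renewal mechanism.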