Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
You could have posted just the error, which is at the end of my response. Why are you trying to use WebHDFS? I'm not really sure how authentication works with that. But generally applications use HDFS (which uses a different URI scheme), and Spark should work fine with that. Error:

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Sure - I wanted to check with admin before sharing. I’ve attached it now, does this help? Many thanks again, G Container: container_e34_1479877553404_0174_01_03 on hdp-node12.xcat.cluster_45454_1481228528201

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
Then you probably have a configuration error somewhere. Since you haven't actually posted the error you're seeing, it's kinda hard to help any further. On Thu, Dec 8, 2016 at 11:17 AM, Gerard Casey wrote: > Right. I’m confident that is set up correctly. > > I can run

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Right. I’m confident that is set up correctly. I can run the SparkPi test script. The main difference between it and my application is that it doesn’t access HDFS. > On 8 Dec 2016, at 18:43, Marcelo Vanzin wrote: > > On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey wrote: > To be specific, where exactly should spark.authenticate be set to true? spark.authenticate has nothing to do with Kerberos. It's for authentication between different Spark processes belonging to the same app.

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcin, That seems to be the case. It explains why there is no documentation on this part too! To be specific, where exactly should spark.authenticate be set to true? Many thanks, Gerry > On 8 Dec 2016, at 08:46, Marcin Pastecki wrote: > > My understanding

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcin Pastecki
My understanding is that the token generation is handled by Spark itself as long as you were authenticated in Kerberos when submitting the job and spark.authenticate is set to true. The --keytab and --principal options should be used for long-running jobs, when you may need to do ticket renewal.
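A minimal sketch of the long-running case mentioned above, assuming a YARN cluster-mode submission. `--principal` and `--keytab` are spark-submit options; the principal `user@EXAMPLE.COM`, the jar name `graphx_sp.jar`, and the `DRY_RUN` switch are hypothetical, and `/path/to/keytab` is a placeholder. By default the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: the values below are placeholders, not taken from a real cluster.
PRINCIPAL="user@EXAMPLE.COM"   # hypothetical Kerberos principal
KEYTAB="/path/to/keytab"       # placeholder keytab path
APP_JAR="graphx_sp.jar"        # hypothetical application jar
DRY_RUN="${DRY_RUN:-1}"        # hypothetical switch: 1 = print, anything else = execute

# Print the spark-submit command in dry-run mode, otherwise run it.
submit() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "spark-submit $*"
  else
    spark-submit "$@"
  fi
}

# Passing the principal together with the keytab is intended to let Spark
# log in again on its own when tickets expire during a long-running job.
submit --class graphx_sp --master yarn --deploy-mode cluster \
  --principal "$PRINCIPAL" --keytab "$KEYTAB" "$APP_JAR"
```

Running with `DRY_RUN=0` would execute the real command, which needs a working Spark and YARN client configuration on the submitting host.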

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
I just read an interesting comment on Cloudera: What does it mean by “when the job is submitted,and you have a kinit, you will have TOKEN to access HDFS, you would need to pass that on, or the KERBEROS ticket”? Reference

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcelo. I’ve completely removed it. Ok - even if I read/write from HDFS? Trying the SparkPi example now. G > On 7 Dec 2016, at 22:10, Marcelo Vanzin wrote: > > Have you removed all the code dealing with Kerberos that you posted? > You should not be setting

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcelo Vanzin
Have you removed all the code dealing with Kerberos that you posted? You should not be setting those principal / keytab configs. Literally all you have to do is log in with kinit, then run spark-submit. Try with the SparkPi example for instance, instead of your own code. If that doesn't work, you
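The kinit-then-submit flow described above might look roughly like the sketch below. The principal `user@EXAMPLE.COM`, the jar name `graphx_sp.jar`, and the `DRY_RUN` switch are hypothetical; `kinit`, `klist`, and the spark-submit options shown are standard. By default the script only prints each command.

```shell
#!/bin/sh
# Sketch only: principal and jar name are placeholders.
PRINCIPAL="user@EXAMPLE.COM"   # hypothetical Kerberos principal
APP_JAR="graphx_sp.jar"        # hypothetical application jar
DRY_RUN="${DRY_RUN:-1}"        # hypothetical switch: 1 = print, anything else = execute

# Echo the command in dry-run mode, otherwise execute it unchanged.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run kinit "$PRINCIPAL"   # obtain a ticket-granting ticket (prompts for a password)
run klist                # check that the ticket cache now holds a valid TGT
run spark-submit --class graphx_sp --master yarn --deploy-mode cluster "$APP_JAR"
```

No Kerberos code or principal / keytab settings appear in the application itself here; the submission relies on the ticket cache created by `kinit`.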

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks. I’ve checked the TGT, principal and keytab. Where to next?! > On 7 Dec 2016, at 22:03, Marcelo Vanzin wrote: > > On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey > wrote: >> Can anyone point me to a tutorial or a run through of how to

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcelo Vanzin
On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey wrote: > Can anyone point me to a tutorial or a run through of how to use Spark with > Kerberos? This is proving to be quite confusing. Most search results on the > topic point to what needs inputted at the point of `sparks

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcelo, Turns out I had missed setup steps in the actual file itself. Thanks to Richard for the help here. He pointed me to some Java implementations. I’m using the org.apache.hadoop.security API. I now have: /* graphx_sp.scala */ import scala.util.Try import scala.io.Source

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Marcelo Vanzin
That's not the error, that's just telling you the application failed. You have to look at the YARN logs for application_1479877553404_0041 to see why it failed. On Mon, Dec 5, 2016 at 10:44 AM, Gerard Casey wrote: > Thanks Marcelo, > > My understanding from a few
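As a rough sketch of the log lookup suggested above: `yarn logs -applicationId` is the standard YARN CLI call for aggregated application logs, and the application ID is the one quoted in the message. The `grep` pattern and the `DRY_RUN` switch are assumptions added for illustration. By default the script only prints the pipeline.

```shell
#!/bin/sh
APP_ID="application_1479877553404_0041"   # application ID quoted in the thread
DRY_RUN="${DRY_RUN:-1}"                   # hypothetical switch: 1 = print, anything else = execute

# Fetch the aggregated logs and show a few lines of context around each
# exception; the pattern is a guess at where the real failure is reported.
show_failure() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ yarn logs -applicationId $1 | grep -E -B2 -A10 'Exception|Caused by'"
  else
    yarn logs -applicationId "$1" | grep -E -B2 -A10 'Exception|Caused by'
  fi
}

show_failure "$APP_ID"
```

Executing it for real assumes the YARN client is configured on the host and that log aggregation is enabled on the cluster.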

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Thanks Marcelo, My understanding from a few pointers is that this may be due to insufficient read permissions to the keytab or a corrupt keytab. I have checked the read permissions and they are OK. I can see that it is initially configuring correctly: INFO

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Marcelo Vanzin
There's generally an exception in these cases, and you haven't posted it, so it's hard to tell you what's wrong. The most probable cause, without the extra information the exception provides, is that you're using the wrong Hadoop configuration when submitting the job to YARN. On Mon, Dec 5, 2016

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Jorge Sánchez
Hi Gerard, have you tried running in yarn-client mode? If so, do you still get that same error? Regards. 2016-12-05 12:49 GMT+00:00 Gerard Casey: > Edit. From here > > I > read that

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Edit. From here I read that you can pass a `--keytab` option to spark-submit. I thus tried spark-submit --class "graphx_sp" --master yarn --keytab /path/to/keytab --deploy-mode cluster --executor-memory 13G

Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Hello all, I am using Spark with Kerberos authentication. I can run my code using `spark-shell` fine and I can also use `spark-submit` in local mode (e.g. --master local[16]). Both function as expected. local mode - spark-submit --class "graphx_sp" --master local[16] --driver-memory