Thanks Marcin,

That seems to be the case, which would also explain why there is no 
documentation on this part!

To be specific, where exactly should spark.authenticate be set to true?

Many thanks,

Gerry

> On 8 Dec 2016, at 08:46, Marcin Pastecki <marcin.paste...@gmail.com> wrote:
> 
> My understanding is that the token generation is handled by Spark itself as 
> long as you were authenticated in Kerberos when submitting the job and 
> spark.authenticate is set to true.
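
(For reference, spark.authenticate is an ordinary Spark configuration property, so the usual places to set it would be spark-defaults.conf or a --conf flag at submit time. A minimal sketch, assuming a standard Spark layout; the job arguments are placeholders:)

```shell
# Option 1: persist it in $SPARK_HOME/conf/spark-defaults.conf
#   spark.authenticate  true
# Option 2: pass it per job on the command line:
spark-submit --conf spark.authenticate=true ...
```
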
> 
> The --keytab and --principal options should be used for long-running jobs, 
> where you may need ticket renewal; Spark will then handle the renewal itself. 
> I may be wrong though.
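
(The long-running-job case described above is typically handled entirely at submit time. A hedged sketch; the principal, keytab path, and application class/jar below are placeholders, not anything from this thread:)

```shell
spark-submit \
  --master yarn --deploy-mode cluster \
  --principal alice@EXAMPLE.COM \
  --keytab /home/alice/alice.keytab \
  --class com.example.MyApp \
  myapp.jar
```

With --principal and --keytab supplied, Spark can periodically re-login from the keytab and refresh delegation tokens itself, which is what makes this suitable for jobs that outlive the initial ticket.
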
> 
> I guess it gets even more complicated if you need to access other secured 
> services from Spark, like HBase or Phoenix, but I guess that is for another 
> discussion.
> 
> Regards,
> Marcin
> 
> 
> On Thu, Dec 8, 2016, 08:40 Gerard Casey <gerardhughca...@gmail.com> wrote:
> I just read an interesting comment on cloudera:
> 
> What does it mean by “when the job is submitted, and you have a kinit, you 
> will have TOKEN to access HDFS, you would need to pass that on, or the 
> KERBEROS ticket”?
> 
> Reference 
> <https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/org-apache-hadoop-security-AccessControlException-SIMPLE/td-p/28082>
>  and full quote:
> 
> In a cluster which is kerberised there is no SIMPLE authentication. Make sure 
> that you have run kinit before you run the application.
> Second thing to check: in your application you need to do the right thing and 
> either pass on the TOKEN or a KERBEROS ticket.
> When the job is submitted, and you have done a kinit, you will have a TOKEN 
> to access HDFS; you would need to pass that on, or the KERBEROS ticket.
> You will need to handle this in your code. I cannot see exactly what you are 
> doing at that point in the startup of your code, but any HDFS access will 
> require a TOKEN or KERBEROS ticket.
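
(In practice, the check described above comes down to obtaining a ticket before submitting and verifying it is still valid. A sketch; the user and realm are placeholders:)

```shell
kinit alice@EXAMPLE.COM   # obtain a Kerberos TGT
klist                     # should show a valid, unexpired TGT
spark-submit ...          # HDFS delegation tokens are then obtained for you
```
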
>  
> Cheers,
> Wilfred
> 
>> On 8 Dec 2016, at 08:35, Gerard Casey <gerardhughca...@gmail.com> wrote:
>> 
>> Thanks Marcelo.
>> 
>> I’ve completely removed it. Ok - even if I read/write from HDFS?
>> 
>> Trying the SparkPi example now.
>> 
>> G
>> 
>>> On 7 Dec 2016, at 22:10, Marcelo Vanzin <van...@cloudera.com> wrote:
>>> 
>>> Have you removed all the code dealing with Kerberos that you posted?
>>> You should not be setting those principal / keytab configs.
>>> 
>>> Literally all you have to do is login with kinit then run spark-submit.
>>> 
>>> Try with the SparkPi example for instance, instead of your own code.
>>> If that doesn't work, you have a configuration issue somewhere.
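
(Marcelo's suggestion can be tried roughly as follows; the user/realm are placeholders, and the jar path assumes a stock Spark 2.x layout:)

```shell
kinit alice@EXAMPLE.COM
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
```

If SparkPi succeeds, Kerberos is working end to end and any remaining failure is in the application or its configuration.
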
>>> 
>>> On Wed, Dec 7, 2016 at 1:09 PM, Gerard Casey <gerardhughca...@gmail.com> wrote:
>>>> Thanks.
>>>> 
>>>> I’ve checked the TGT, principal and keytab. Where to next?!
>>>> 
>>>>> On 7 Dec 2016, at 22:03, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>> 
>>>>> On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey <gerardhughca...@gmail.com> wrote:
>>>>>> Can anyone point me to a tutorial or a run-through of how to use Spark 
>>>>>> with Kerberos? This is proving to be quite confusing. Most search 
>>>>>> results on the topic point to what needs to be input at the point of 
>>>>>> spark-submit, and not the changes needed in the actual src/main/.scala file.
>>>>> 
>>>>> You don't need to write any special code to run Spark with Kerberos.
>>>>> Just write your application normally, and make sure you're logged in
>>>>> to the KDC (i.e. "klist" shows a valid TGT) before running your app.
>>>>> 
>>>>> 
>>>>> --
>>>>> Marcelo
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Marcelo
>> 
> 
