Thanks for your prompt responses! @Steve
I actually put my keytabs on all the nodes already, and I used them to kinit on each server. But how can I make Spark use my keytab and principal when I start the cluster or submit the job? Or is there a way to let Spark use the ticket cache on each node? I tried --keytab and --principal when I submitted the job, and still got the same error. I guess those options are for YARN only.

On Fri, Jun 23, 2017 at 18:50 Steve Loughran <ste...@hortonworks.com> wrote:

> On 23 Jun 2017, at 10:22, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
> Spark running with the standalone cluster manager currently doesn't
> support accessing secure Hadoop. Basically, the problem is that
> standalone-mode Spark doesn't have the facility to distribute delegation
> tokens.
>
> Currently only Spark on YARN or local mode supports secure Hadoop.
>
> Thanks
> Jerry
>
>
> There's possibly an ugly workaround where you ssh in to every node and log
> in directly to your KDC using a keytab you pushed out; that would eliminate
> the need for anything related to Hadoop tokens. After all, that's
> essentially what Spark-on-YARN does when you give it a keytab.
>
>
> See also:
> https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details
>
> On Fri, Jun 23, 2017 at 5:10 PM, Mu Kong <kong.mu....@gmail.com> wrote:
>
>> Hi, all!
>>
>> I was trying to read from a Kerberized Hadoop cluster from a standalone
>> Spark cluster.
>> Right now, I'm encountering some authentication issues with Kerberos:
>>
>>
>> java.io.IOException: Failed on local exception: java.io.IOException:
>> org.apache.hadoop.security.AccessControlException: Client cannot
>> authenticate via:[TOKEN, KERBEROS]; Host Details : local host is:
>> "XXXXXXXXXXXX"; destination host is: XXXXXXXXXXXXXXX;
>>
>>
>>
>> I checked with klist, and the principal/realm is correct.
>> I also used the hdfs command line to poke HDFS from all the nodes, and it
>> worked.
>> And if I submit the job using local (client) mode, the job works fine.
>>
>> I tried to put everything from hadoop/conf and hive/conf into spark/conf.
>> I also tried editing spark/conf/spark-env.sh to add
>> SPARK_SUBMIT_OPTS/SPARK_MASTER_OPTS/SPARK_SLAVE_OPTS/HADOOP_CONF_DIR/HIVE_CONF_DIR,
>> and tried to export them in .bashrc as well.
>>
>> However, I'm still experiencing the same exception.
>>
>> Then I read some concerning posts about problems with
>> Kerberized Hadoop, like the following one:
>> http://blog.stratio.com/spark-kerberos-safe-story/
>> which indicates that we cannot access Kerberized HDFS from a
>> standalone Spark cluster.
>>
>> I'm using Spark 2.1.1. Is it still the case that we can't access
>> Kerberized HDFS with 2.1.1?
>>
>> Thanks!
>>
>>
>> Best regards,
>> Mu
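The per-node login workaround Steve describes could be sketched roughly as follows. This is a hedged sketch, not a tested recipe: the hostnames, keytab path, and principal are all placeholders, and it assumes passwordless ssh to each worker and that the keytab has already been pushed out to the same path on every node.

```shell
#!/usr/bin/env bash
# Sketch: obtain a Kerberos ticket on every worker node from a
# pre-distributed keytab, so executors find a valid ticket cache
# when they talk to HDFS. All names below are hypothetical.
set -euo pipefail

PRINCIPAL="spark@EXAMPLE.COM"                  # placeholder principal
KEYTAB="/etc/security/keytabs/spark.keytab"    # placeholder keytab path

for node in worker1 worker2 worker3; do        # placeholder hostnames
  # kinit on the remote host; the ticket lands in that host's
  # default credential cache for the ssh user
  ssh "$node" "kinit -kt $KEYTAB $PRINCIPAL"
done
```

Note that Kerberos tickets expire, so something like this would have to be re-run (e.g. from cron) before the ticket lifetime ends; that renewal is exactly the work Spark-on-YARN does for you when given a keytab.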
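For contrast, the --keytab/--principal options mentioned at the top of the thread are honored when YARN is the cluster manager: YARN-mode Spark ships the keytab to the application master and renews delegation tokens itself. A hedged invocation sketch, where the principal, keytab path, application class, and jar name are all placeholders:

```shell
# Works on YARN (not standalone): Spark distributes the keytab
# and handles delegation-token renewal for long-running jobs.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal spark@EXAMPLE.COM \
  --keytab /etc/security/keytabs/spark.keytab \
  --class com.example.MyApp \
  myapp.jar
```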