[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode

2018-06-19 Thread Jelmer Kuperus (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516856#comment-16516856 ]

Jelmer Kuperus edited comment on SPARK-5158 at 6/19/18 9:33 AM:


I ended up with the following workaround, which at first glance seems to work:

1. Create a _.java.login.config_ file in the home directory of the user 
running Spark, with the following contents:
{noformat}
com.sun.security.jgss.krb5.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache="true"
  ticketCache="/tmp/krb5cc_0"
  keyTab="/path/to/my.keytab"
  principal="u...@foo.com";
};{noformat}
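
Before pointing Spark at this, it's worth sanity-checking the keytab with the 
standard MIT Kerberos tools (the paths, cache, and redacted principal below 
are the ones from the config above):
{noformat}
# list the principals stored in the keytab
klist -kt /path/to/my.keytab
# obtain a ticket from the keytab into the cache named in the JAAS config
kinit -c /tmp/krb5cc_0 -kt /path/to/my.keytab u...@foo.com
# verify the ticket is there
klist -c /tmp/krb5cc_0
{noformat}
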
2. Put a krb5.conf file at /etc/krb5.conf.

3. Place your Hadoop configuration in /etc/hadoop/conf and, in `core-site.xml`, 
set the following (a sketch of the resulting file follows the list):
 * fs.defaultFS to webhdfs://your_hostname:14000/webhdfs/v1
 * hadoop.security.authentication to kerberos
 * hadoop.security.authorization to true
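
For reference, a minimal core-site.xml sketch with just those three properties 
(the webhdfs hostname and port are placeholders for your own endpoint):
{noformat}
<!-- sketch only; hostname and port are placeholders -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://your_hostname:14000/webhdfs/v1</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
{noformat}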

4. Make sure the Hadoop config is on Spark's classpath, e.g. the process 
should have something like this in it:
{noformat}
-cp /etc/spark/:/usr/share/spark/jars/*:/etc/hadoop/conf/{noformat}
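
One way to get it there, assuming a "Hadoop free" Spark build that honours 
SPARK_DIST_CLASSPATH (my assumption, not part of the original workaround), is 
via conf/spark-env.sh:
{noformat}
# conf/spark-env.sh -- prepend the Hadoop config directory to Spark's classpath
export SPARK_DIST_CLASSPATH="/etc/hadoop/conf:$SPARK_DIST_CLASSPATH"
{noformat}
On regular builds, spark.driver.extraClassPath and spark.executor.extraClassPath 
should achieve the same for a single application.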
 

This configures a single principal for the entire Spark process. If you want 
to change the default paths to the configuration files, you can use:
{noformat}
-Djava.security.krb5.conf=/etc/krb5.conf 
-Djava.security.auth.login.config=/path/to/jaas.conf{noformat}
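
If you'd rather not edit the launch scripts, the same flags can presumably be 
passed per application through the standard extraJavaOptions settings (the 
application jar is a placeholder):
{noformat}
# sketch only -- per-application alternative; the standalone master/worker
# daemons would still need these system properties set separately
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/path/to/jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/path/to/jaas.conf" \
  /path/to/app.jar
{noformat}
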
> Allow for keytab-based HDFS security in Standalone mode
> ---
>
> Key: SPARK-5158
> URL: https://issues.apache.org/jira/browse/SPARK-5158
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Matthew Cheah
>Priority: Critical
>
> There have been a handful of patches for allowing access to Kerberized HDFS 
> clusters in standalone mode. The main reason we haven't accepted these 
> patches is that they rely on insecure distribution of token files from 
> the driver to the other components.
> As a simpler solution, I wonder if we should just provide a way to have the 
> Spark driver and executors independently log in and acquire credentials using 
> a keytab. This would work for users who have dedicated, single-tenant 
> Spark clusters (i.e. they are willing to have a keytab on every machine 
> running Spark for their application). It wouldn't address all possible 
> deployment scenarios, but if it's simple I think it's worth considering.
> This would also work for Spark streaming jobs, which often run on dedicated 
> hardware since they are long-running services.






[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode

2017-04-04 Thread Ian Hummel (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955528#comment-15955528 ]

Ian Hummel edited comment on SPARK-5158 at 4/4/17 6:07 PM:
---

At Bloomberg we've been working on a solution to this issue so we can access 
kerberized HDFS clusters from standalone Spark installations we run on our 
internal cloud infrastructure.

Folks who are interested can try out a patch at 
https://github.com/themodernlife/spark/tree/spark-5158.  It extends standalone 
mode to support configuration related to {{\-\-principal}} and {{\-\-keytab}}.
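
Assuming the patch wires these flags up the same way YARN mode does (an 
assumption on my part; master URL, principal, keytab path, and application are 
all placeholders), submission against a standalone master would presumably 
look something like this:
{noformat}
# sketch only -- assumes the patch accepts the same flags as YARN mode;
# master URL, principal, keytab and application jar are placeholders
spark-submit \
  --master spark://master-host:7077 \
  --principal user@FOO.COM \
  --keytab /path/to/my.keytab \
  --class com.example.Main /path/to/app.jar
{noformat}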

The main changes are:
- Refactor {{ConfigurableCredentialManager}} and related 
{{CredentialProviders}} so that they are no longer tied to YARN
- Set up credential renewal/updating from within the 
{{StandaloneSchedulerBackend}}
- Ensure executors/drivers are able to find initial tokens for contacting HDFS 
and renew them at regular intervals

The implementation does basically the same thing as the YARN backend.  The 
keytab is copied to the driver/executors through an environment variable in 
the {{ApplicationDescription}}.  I might be wrong, but I'm assuming a proper 
{{spark.authenticate}} setup would ensure it's encrypted over the wire (can 
anyone confirm?).  Credentials on the executors and the driver (cluster mode) 
are written to disk as whatever user the Spark daemon runs as.  Open to 
suggestions on whether it's worth tightening that up.
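
For reference, the over-the-wire protection in question would, as far as I 
know, come from something like this in spark-defaults.conf (the secret is a 
placeholder; SASL encryption has to be enabled explicitly on top of 
authentication):
{noformat}
# spark-defaults.conf -- sketch only; the secret is a placeholder
spark.authenticate                       true
spark.authenticate.secret                some-shared-secret
spark.authenticate.enableSaslEncryption  true
{noformat}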

Would appreciate any feedback from the community.





[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode

2016-02-16 Thread Henry Saputra (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149309#comment-15149309 ]

Henry Saputra edited comment on SPARK-5158 at 2/16/16 9:14 PM:
---

Hi all, it seems all PRs for this issue are closed.

This PR: 
https://github.com/apache/spark/pull/265 

was closed with the claim that a more recent PR was being worked on, which I 
assume is this one:

https://github.com/apache/spark/pull/4106

but that one was also closed due to inactivity.

Looking at the issues filed as duplicates of this one, there is a clear need 
and interest in letting standalone mode access secured HDFS, given that the 
active user's keytab is already available on the machines that run Spark.

