[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode
[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516856#comment-16516856 ] Jelmer Kuperus edited comment on SPARK-5158 at 6/19/18 9:33 AM:

I ended up with the following workaround, which at first glance seems to work:

1. Create a {{.java.login.config}} file in the home directory of the user running Spark, with the following contents:
{noformat}
com.sun.security.jgss.krb5.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache="true"
  ticketCache="/tmp/krb5cc_0"
  keyTab="/path/to/my.keytab"
  principal="u...@foo.com";
};
{noformat}
2. Put a krb5.conf file at /etc/krb5.conf.
3. Place your Hadoop configuration in /etc/hadoop/conf and, in {{core-site.xml}}, set:
* fs.defaultFS to webhdfs://your_hostname:14000/webhdfs/v1
* hadoop.security.authentication to kerberos
* hadoop.security.authorization to true
4. Make sure the Hadoop config is on Spark's classpath, e.g. the process should have something like this in it:
{noformat}
-cp /etc/spark/:/usr/share/spark/jars/*:/etc/hadoop/conf/
{noformat}

This configures a single principal for the entire Spark process. If you want to change the default paths to the configuration files, you can use:
{noformat}
-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/path/to/jaas.conf
{noformat}

> Allow for keytab-based HDFS security in Standalone mode
> ---
>
> Key: SPARK-5158
> URL: https://issues.apache.org/jira/browse/SPARK-5158
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Patrick Wendell
> Assignee: Matthew Cheah
> Priority: Critical
>
> There have been a handful of patches for allowing access to Kerberized HDFS clusters in standalone mode. The main reason we haven't accepted these patches has been that they rely on insecure distribution of token files from the driver to the other components.
> As a simpler solution, I wonder if we should just provide a way to have the Spark driver and executors independently log in and acquire credentials using a keytab. This would work for users who have dedicated, single-tenant Spark clusters (i.e. they are willing to have a keytab on every machine running Spark for their application). It wouldn't address all possible deployment scenarios, but if it's simple I think it's worth considering.
> This would also work for Spark streaming jobs, which often run on dedicated hardware since they are long-running services.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
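The three settings from step 3 of the workaround above translate into a {{core-site.xml}} fragment along these lines. This is a sketch: the hostname and port are the placeholders from the comment, not values to copy verbatim.
{noformat}
<!-- Sketch of /etc/hadoop/conf/core-site.xml for step 3; values are placeholders -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://your_hostname:14000/webhdfs/v1</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
{noformat}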
[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode
[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955528#comment-15955528 ] Ian Hummel edited comment on SPARK-5158 at 4/4/17 6:07 PM:

At Bloomberg we've been working on a solution to this issue so we can access kerberized HDFS clusters from the standalone Spark installations we run on our internal cloud infrastructure. Folks who are interested can try out a patch at https://github.com/themodernlife/spark/tree/spark-5158. It extends standalone mode to support the {{--principal}} and {{--keytab}} configuration options. The main changes are:
- Refactor {{ConfigurableCredentialManager}} and the related {{CredentialProvider}}s so that they are no longer tied to YARN
- Set up credential renewal/updating from within the {{StandaloneSchedulerBackend}}
- Ensure executors/drivers are able to find initial tokens for contacting HDFS and renew them at regular intervals

The implementation does basically the same thing as the YARN backend. The keytab is copied to the driver/executors through an environment variable in the {{ApplicationDescription}}. I might be wrong, but I'm assuming that a proper {{spark.authenticate}} setup would ensure it's encrypted over the wire (can anyone confirm?). Credentials on the executors and the driver (cluster mode) are written to disk as whatever user the Spark daemon runs as. Open to suggestions on whether it's worth tightening that up.

Would appreciate any feedback from the community.
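The core idea discussed in this thread — each driver and executor logging in independently from a keytab instead of receiving distributed token files — can be sketched with Hadoop's {{UserGroupInformation}} API. This is an illustration, not the patch's actual code; the principal and keytab path are placeholders, and it assumes hadoop-common is on the classpath:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLogin {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml (and its security settings) from the classpath
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in once from the keytab; both arguments are placeholders
        UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/my.keytab");

        // Long-running services (e.g. streaming jobs) can periodically call
        // this to re-login before the TGT expires
        UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
    }
}
{noformat}
On the over-the-wire question raised above: {{spark.authenticate}} by itself enables SASL authentication of Spark's network connections; whether traffic is actually encrypted depends on the additional encryption settings (e.g. {{spark.authenticate.enableSaslEncryption}}), so that assumption is worth verifying rather than relying on.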
[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode
[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149309#comment-15149309 ] Henry Saputra edited comment on SPARK-5158 at 2/16/16 9:14 PM:

Hi all, it seems all PRs for this issue have been closed. This PR: https://github.com/apache/spark/pull/265 was closed with the claim that a more recent PR was being worked on, which I assume is this one: https://github.com/apache/spark/pull/4106, but that one was also closed due to inactivity.

Looking at the issues that were closed as duplicates of this one, there is clear need and interest in letting standalone mode access secured HDFS, given that the active user's keytab is already available on the machines that run Spark.