[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-221970852 thanks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11033
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-221961068 +1
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-220023585 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58775/ Test PASSed.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-220023582 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-220023317 **[Test build #58775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58775/consoleFull)** for PR 11033 at commit [`56571da`](https://github.com/apache/spark/commit/56571da16cc0e71fd68c07abba26476fa51f7ddf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-219998704 **[Test build #58775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58775/consoleFull)** for PR 11033 at commit [`56571da`](https://github.com/apache/spark/commit/56571da16cc0e71fd68c07abba26476fa51f7ddf).
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-219756852 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-219756858 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58699/ Test FAILed.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-219756838 **[Test build #58699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58699/consoleFull)** for PR 11033 at commit [`f14768a`](https://github.com/apache/spark/commit/f14768aa6fce320ba07f7146755a833d651d16ac). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-219756299 **[Test build #58699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58699/consoleFull)** for PR 11033 at commit [`f14768a`](https://github.com/apache/spark/commit/f14768aa6fce320ba07f7146755a833d651d16ac).
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-214881768 I have the comment above about the wording of one of the sentences; otherwise I think it's fine.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-214831899 @tgravescs the logging bit of the patch is in sync with master. Is there anything else you want me to do regarding the documentation to get it into a committable state?
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-214813942 **[Test build #57008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57008/consoleFull)** for PR 11033 at commit [`4707ec6`](https://github.com/apache/spark/commit/4707ec624bf94cd6b2e592fc76640fafb8b137a9). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-214813952 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57008/ Test FAILed.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-214813950 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-214813608 **[Test build #57008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57008/consoleFull)** for PR 11033 at commit [`4707ec6`](https://github.com/apache/spark/commit/4707ec624bf94cd6b2e592fc76640fafb8b137a9).
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/11033#discussion_r57734251 --- Diff: docs/running-on-yarn.md --- @@ -452,3 +452,104 @@ If you need a reference to the proper location to put log files in the YARN so t - In `cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `client` mode, only the Spark executors do. - The `--files` and `--archives` options support specifying file names with the # similar to Hadoop. For example you can specify: `--files localtest.txt#appSees.txt` and this will upload the file you have locally named `localtest.txt` into HDFS but this will be linked to by the name `appSees.txt`, and your application should use the name as `appSees.txt` to reference it when running on YARN. - The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `cluster` mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files. + +# Running in a Secure Cluster + +As covered in [security](security.html), Kerberos is used in a secure Hadoop cluster to +authenticate principals associated with services and clients. This allows clients to +make requests of these authenticated services; the services to grant rights +to the authenticated principals. + +Hadoop services issue *hadoop tokens* to grant access to the services and data, +tokens which the client must supply over Hadoop IPC and REST/Web APIs as proof of access rights. 
+For Spark applications launched in a YARN cluster to interact with HDFS, HBase and Hive, +the application must acquire the relevant tokens +using the Kerberos credentials of the user launching the application, that is, the principal whose +identity will become that of the launched Spark application. + +This is normally done at launch time: in a secure cluster Spark will automatically obtain a +token for the cluster's HDFS filesystem, and potentially for HBase and Hive. + +An HBase token will be obtained if HBase is on the classpath, the HBase configuration declares +the application is secure (i.e. `hbase.security.authentication==kerberos`), +and `spark.yarn.security.tokens.hbase.enabled` is not set to `false`. + +Similarly, a Hive token will be obtained if Hive is on the classpath, its configuration +includes a URI of the metadata store in `hive.metastore.uris`, and +`spark.yarn.security.tokens.hive.enabled` is not set to `false`. + +If an application needs to interact with other secure HDFS clusters, then +the tokens needed to access these clusters must be explicitly requested at +launch time. This is done by listing them in the `spark.yarn.access.namenodes` property. + +``` +spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/ +``` + +Hadoop tokens expire. They can be renewed "for a while". --- End diff -- OK, sorry, I missed your question going by.
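The token rules stated in the quoted documentation can be sketched as a small decision function. This is only an illustration of the conditions as worded in the diff, not Spark's actual implementation: the function name `tokens_to_request`, its parameters, and the use of plain dictionaries for the configurations are all hypothetical.

```python
from typing import Dict, List


def tokens_to_request(
    spark_conf: Dict[str, str],
    hbase_conf: Dict[str, str],
    hive_conf: Dict[str, str],
    hbase_on_classpath: bool,
    hive_on_classpath: bool,
) -> List[str]:
    """Return the services for which a token would be requested at launch.

    Hypothetical helper: it mirrors the three rules in the quoted text,
    using plain dicts in place of real Spark/HBase/Hive configurations.
    """

    def enabled(key: str) -> bool:
        # "is not set to false": an unset property counts as enabled.
        return spark_conf.get(key, "true").strip().lower() != "false"

    # In a secure cluster a token for the cluster's own HDFS is always obtained.
    services = ["hdfs"]

    if (
        hbase_on_classpath
        and hbase_conf.get("hbase.security.authentication") == "kerberos"
        and enabled("spark.yarn.security.tokens.hbase.enabled")
    ):
        services.append("hbase")

    if (
        hive_on_classpath
        and hive_conf.get("hive.metastore.uris", "").strip()
        and enabled("spark.yarn.security.tokens.hive.enabled")
    ):
        services.append("hive")

    return services
```

Under these assumptions, a job with HBase configured for Kerberos but no Hive metastore URI would request HDFS and HBase tokens only.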
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/11033#discussion_r57308003 --- Diff: docs/running-on-yarn.md --- @@ -452,3 +452,104 @@ If you need a reference to the proper location to put log files in the YARN so t - In `cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `client` mode, only the Spark executors do. - The `--files` and `--archives` options support specifying file names with the # similar to Hadoop. For example you can specify: `--files localtest.txt#appSees.txt` and this will upload the file you have locally named `localtest.txt` into HDFS but this will be linked to by the name `appSees.txt`, and your application should use the name as `appSees.txt` to reference it when running on YARN. - The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `cluster` mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files. + +# Running in a Secure Cluster + +As covered in [security](security.html), Kerberos is used in a secure Hadoop cluster to +authenticate principals associated with services and clients. This allows clients to +make requests of these authenticated services; the services to grant rights +to the authenticated principals. + +Hadoop services issue *hadoop tokens* to grant access to the services and data, +tokens which the client must supply over Hadoop IPC and REST/Web APIs as proof of access rights. 
+For Spark applications launched in a YARN cluster to interact with HDFS, HBase and Hive, +the application must acquire the relevant tokens +using the Kerberos credentials of the user launching the application, that is, the principal whose +identity will become that of the launched Spark application. + +This is normally done at launch time: in a secure cluster Spark will automatically obtain a +token for the cluster's HDFS filesystem, and potentially for HBase and Hive. + +An HBase token will be obtained if HBase is on the classpath, the HBase configuration declares +the application is secure (i.e. `hbase.security.authentication==kerberos`), +and `spark.yarn.security.tokens.hbase.enabled` is not set to `false`. + +Similarly, a Hive token will be obtained if Hive is on the classpath, its configuration +includes a URI of the metadata store in `hive.metastore.uris`, and +`spark.yarn.security.tokens.hive.enabled` is not set to `false`. + +If an application needs to interact with other secure HDFS clusters, then +the tokens needed to access these clusters must be explicitly requested at +launch time. This is done by listing them in the `spark.yarn.access.namenodes` property. + +``` +spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/ +``` + +Hadoop tokens expire. They can be renewed "for a while". --- End diff -- ...I'll leave out the renew part.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/11033#discussion_r56998643 --- Diff: docs/running-on-yarn.md --- @@ -452,3 +452,104 @@ If you need a reference to the proper location to put log files in the YARN so t - In `cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `client` mode, only the Spark executors do. - The `--files` and `--archives` options support specifying file names with the # similar to Hadoop. For example you can specify: `--files localtest.txt#appSees.txt` and this will upload the file you have locally named `localtest.txt` into HDFS but this will be linked to by the name `appSees.txt`, and your application should use the name as `appSees.txt` to reference it when running on YARN. - The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `cluster` mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files. + +# Running in a Secure Cluster + +As covered in [security](security.html), Kerberos is used in a secure Hadoop cluster to +authenticate principals associated with services and clients. This allows clients to +make requests of these authenticated services; the services to grant rights +to the authenticated principals. + +Hadoop services issue *hadoop tokens* to grant access to the services and data, +tokens which the client must supply over Hadoop IPC and REST/Web APIs as proof of access rights. 
+For Spark applications launched in a YARN cluster to interact with HDFS, HBase and Hive, +the application must acquire the relevant tokens +using the Kerberos credentials of the user launching the application, that is, the principal whose +identity will become that of the launched Spark application. + +This is normally done at launch time: in a secure cluster Spark will automatically obtain a +token for the cluster's HDFS filesystem, and potentially for HBase and Hive. + +An HBase token will be obtained if HBase is on the classpath, the HBase configuration declares +the application is secure (i.e. `hbase.security.authentication==kerberos`), +and `spark.yarn.security.tokens.hbase.enabled` is not set to `false`. + +Similarly, a Hive token will be obtained if Hive is on the classpath, its configuration +includes a URI of the metadata store in `hive.metastore.uris`, and +`spark.yarn.security.tokens.hive.enabled` is not set to `false`. + +If an application needs to interact with other secure HDFS clusters, then +the tokens needed to access these clusters must be explicitly requested at +launch time. This is done by listing them in the `spark.yarn.access.namenodes` property. + +``` +spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/ +``` + +Hadoop tokens expire. They can be renewed "for a while". --- End diff -- I'm wary of adding more details, since I'd only be repeating my own misunderstandings. What would you suggest?
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-199824035 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53768/ Test PASSed.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-199824030 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-199823547 **[Test build #53768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53768/consoleFull)** for PR 11033 at commit [`9a37d62`](https://github.com/apache/spark/commit/9a37d629133d0743d476019893d886de80dd6d60). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/11033#discussion_r56984737 --- Diff: docs/running-on-yarn.md --- @@ -452,3 +452,104 @@ If you need a reference to the proper location to put log files in the YARN so t - In `cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `client` mode, only the Spark executors do. - The `--files` and `--archives` options support specifying file names with the # similar to Hadoop. For example you can specify: `--files localtest.txt#appSees.txt` and this will upload the file you have locally named `localtest.txt` into HDFS but this will be linked to by the name `appSees.txt`, and your application should use the name as `appSees.txt` to reference it when running on YARN. - The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `cluster` mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files. + +# Running in a Secure Cluster + +As covered in [security](security.html), Kerberos is used in a secure Hadoop cluster to +authenticate principals associated with services and clients. This allows clients to +make requests of these authenticated services; the services to grant rights +to the authenticated principals. + +Hadoop services issue *hadoop tokens* to grant access to the services and data, +tokens which the client must supply over Hadoop IPC and REST/Web APIs as proof of access rights. 
+For Spark applications launched in a YARN cluster to interact with HDFS, HBase and Hive, +the application must acquire the relevant tokens +using the Kerberos credentials of the user launching the application, that is, the principal whose +identity will become that of the launched Spark application. + +This is normally done at launch time: in a secure cluster Spark will automatically obtain a +token for the cluster's HDFS filesystem, and potentially for HBase and Hive. + +An HBase token will be obtained if HBase is on the classpath, the HBase configuration declares +the application is secure (i.e. `hbase.security.authentication==kerberos`), +and `spark.yarn.security.tokens.hbase.enabled` is not set to `false`. + +Similarly, a Hive token will be obtained if Hive is on the classpath, its configuration +includes a URI of the metadata store in `hive.metastore.uris`, and +`spark.yarn.security.tokens.hive.enabled` is not set to `false`. + +If an application needs to interact with other secure HDFS clusters, then +the tokens needed to access these clusters must be explicitly requested at +launch time. This is done by listing them in the `spark.yarn.access.namenodes` property. + +``` +spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/ +``` + +Hadoop tokens expire. They can be renewed "for a while". --- End diff -- I think we should explain this a bit more. Either add more details about hard expiration and the renew period, or just leave out the renew part.
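As a small illustration of the `spark.yarn.access.namenodes` property shown in the quoted diff, the sketch below assembles the comma-separated value from a list of namenode URIs. The helper name `access_namenodes_line` is hypothetical, and the only format assumed is the one visible in the diff: the property name, a space, then the URIs joined by commas.

```python
from typing import Iterable


def access_namenodes_line(namenodes: Iterable[str]) -> str:
    """Build a spark-defaults style line listing extra secure HDFS clusters.

    Hypothetical helper: blank entries are dropped and the remaining URIs
    are joined with commas, matching the example in the quoted diff.
    """
    uris = [uri.strip() for uri in namenodes if uri.strip()]
    if not uris:
        raise ValueError("at least one namenode URI is required")
    return "spark.yarn.access.namenodes " + ",".join(uris)


line = access_namenodes_line(
    [
        "hdfs://ireland.example.org:8020/",
        "hdfs://frankfurt.example.org:8020/",
    ]
)
print(line)
```

With the two example clusters from the diff, this reproduces the property line quoted there.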
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/11033#discussion_r56984185 --- Diff: docs/running-on-yarn.md --- @@ -452,3 +452,104 @@ If you need a reference to the proper location to put log files in the YARN so t - In `cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `client` mode, only the Spark executors do. - The `--files` and `--archives` options support specifying file names with the # similar to Hadoop. For example you can specify: `--files localtest.txt#appSees.txt` and this will upload the file you have locally named `localtest.txt` into HDFS but this will be linked to by the name `appSees.txt`, and your application should use the name as `appSees.txt` to reference it when running on YARN. - The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `cluster` mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files. + +# Running in a Secure Cluster + +As covered in [security](security.html), Kerberos is used in a secure Hadoop cluster to +authenticate principals associated with services and clients. This allows clients to +make requests of these authenticated services; the services to grant rights +to the authenticated principals. + +Hadoop services issue *hadoop tokens* to grant access to the services and data, +tokens which the client must supply over Hadoop IPC and REST/Web APIs as proof of access rights. --- End diff -- These sentences sound a bit weird to me. 
Perhaps something just like: Hadoop services issue tokens to grant access to the services and data. Clients must first acquire tokens for the services they will access and pass them along with their application.
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-199773576 **[Test build #53768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53768/consoleFull)** for PR 11033 at commit [`9a37d62`](https://github.com/apache/spark/commit/9a37d629133d0743d476019893d886de80dd6d60).
[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-195783350

1. I'll update
2. I think the extra credential dump should be pulled up into {{SparkHadoopUtil}}; it's not yarn-specific