[jira] [Commented] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page
[ https://issues.apache.org/jira/browse/SPARK-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976167#comment-14976167 ]

Apache Spark commented on SPARK-11305:

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/9298

> Remove Third-Party Hadoop Distributions Doc Page
>
> Key: SPARK-11305
> URL: https://issues.apache.org/jira/browse/SPARK-11305
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Reporter: Patrick Wendell
> Priority: Critical
>
> There is a fairly old page in our docs that contains a bunch of assorted
> information regarding running Spark on Hadoop clusters. I think this page
> should be removed and merged into other parts of the docs because the
> information is largely redundant and somewhat outdated.
> http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
> There are four sections:
> 1. Compile time Hadoop version - this information I think can be removed in
> favor of that on the "building spark" page. These days most "advanced users"
> are building without bundling Hadoop, so I'm not sure giving them a bunch of
> different Hadoop versions sends the right message.
> 2. Linking against Hadoop - this doesn't seem to add much beyond what is in
> the programming guide.
> 3. Where to run Spark - redundant with the hardware provisioning guide.
> 4. Inheriting cluster configurations - I think this would be better as a
> section at the end of the configuration page.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page
[ https://issues.apache.org/jira/browse/SPARK-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973908#comment-14973908 ]

Sean Owen commented on SPARK-11305:

I support this and would tack on a few more reasons:

- the Hadoop distributions listed here are quite old at this stage anyhow
- it could be perceived as subtly favoring the listed distributions
- I am not clear that, for example, the CDH4 build continues to work with CDH; for all distributions, this might be implying a level of guarantee of compatibility that isn't reflected in testing

Related: what about continuing to package and distribute the cdh4 build? For similar reasons I think this could go in 1.6.
[jira] [Commented] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page
[ https://issues.apache.org/jira/browse/SPARK-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973493#comment-14973493 ]

Patrick Wendell commented on SPARK-11305:

/cc [~srowen] for his thoughts.