Patrick Wendell created SPARK-11305:
---------------------------------------

             Summary: Remove Third-Party Hadoop Distributions Doc Page
                 Key: SPARK-11305
                 URL: https://issues.apache.org/jira/browse/SPARK-11305
             Project: Spark
          Issue Type: Improvement
          Components: Documentation
            Reporter: Patrick Wendell
            Priority: Critical

There is a fairly old page in our docs that contains assorted information about running Spark on Hadoop clusters. I think this page should be removed and its content merged into other parts of the docs, because the information is largely redundant and somewhat outdated.

http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html

There are four sections:

1. Compile-time Hadoop version - this information can be removed in favor of what is on the "building spark" page. These days most advanced users are building without bundling Hadoop, so I'm not sure listing a bunch of different Hadoop versions sends the right message.

2. Linking against Hadoop - this doesn't seem to add much beyond what is in the programming guide.

3. Where to run Spark - redundant with the hardware provisioning guide.

4. Inheriting cluster configurations - this would be better as a section at the end of the configuration page.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)