[jira] [Commented] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page

2015-10-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976167#comment-14976167
 ] 

Apache Spark commented on SPARK-11305:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/9298

> Remove Third-Party Hadoop Distributions Doc Page
> 
>
> Key: SPARK-11305
> URL: https://issues.apache.org/jira/browse/SPARK-11305
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Reporter: Patrick Wendell
> Priority: Critical
>
> There is a fairly old page in our docs that contains a bunch of assorted 
> information regarding running Spark on Hadoop clusters. I think this page 
> should be removed and merged into other parts of the docs because the 
> information is largely redundant and somewhat outdated.
> http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
> There are four sections:
> 1. Compile-time Hadoop version - I think this information can be removed in 
> favor of what is on the "building spark" page. These days most "advanced users" 
> are building without bundling Hadoop, so I'm not sure giving them a bunch of 
> different Hadoop versions sends the right message.
> 2. Linking against Hadoop - this doesn't seem to add much beyond what is in 
> the programming guide.
> 3. Where to run Spark - redundant with the hardware provisioning guide.
> 4. Inheriting cluster configurations - I think this would be better as a 
> section at the end of the configuration page. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page

2015-10-26 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973908#comment-14973908
 ] 

Sean Owen commented on SPARK-11305:
---

I support this and would tack on a few more reasons:

- the Hadoop distributions listed here are quite old at this stage anyhow
- it could be perceived as subtly favoring the listed distributions
- it is not clear to me that, for example, the CDH4 build continues to work with 
CDH; for all distributions, the page might imply a level of compatibility 
guarantee that isn't reflected in testing

Related: what about continuing to package and distribute the cdh4 build? For 
similar reasons I think this could be dropped in 1.6.







[jira] [Commented] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page

2015-10-25 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973493#comment-14973493
 ] 

Patrick Wendell commented on SPARK-11305:
-

/cc [~srowen] for his thoughts.



