Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/629#discussion_r12266937

    --- Diff: docs/building-with-maven.md ---
    @@ -42,22 +54,40 @@ For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions wit
         # Apache Hadoop 0.23.x
         $ mvn -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package
     
    -For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, you can enable the "yarn-alpha" or "yarn" profile and set the "hadoop.version", "yarn.version" property. Note that Hadoop 0.23.X requires a special `-Phadoop-0.23` profile:
    +For Apache Hadoop 2.x, 0.23.x, Cloudera CDH, and other Hadoop versions with YARN, you can enable the "yarn-alpha" or "yarn" profile and optionally set the "yarn.version" property if it is different from "hadoop.version". The additional build profile required depends on the YARN version:
    +
    +<table class="table">
    +  <thead>
    +    <tr><th>YARN version</th><th>Profile required</th></tr>
    +  </thead>
    +  <tbody>
    +    <tr><td>0.23.x to 2.1.x</td><td>yarn-alpha</td></tr>
    +    <tr><td>2.2.x and later</td><td>yarn</td></tr>
    +  </tbody>
    +</table>
    +
    +Examples:
     
         # Apache Hadoop 2.0.5-alpha
         $ mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -DskipTests clean package
     
    -    # Cloudera CDH 4.2.0 with MapReduce v2
    +    # Cloudera CDH 4.2.0
         $ mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package
     
    -    # Apache Hadoop 2.2.X (e.g. 2.2.0 as below) and newer
    -    $ mvn -Pyarn -Dhadoop.version=2.2.0 -DskipTests clean package
    -
         # Apache Hadoop 0.23.x
    -    $ mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -Dyarn.version=0.23.7 -DskipTests clean package
    +    $ mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package
    +
    +    # Apache Hadoop 2.2.X
    +    $ mvn -Pyarn -Phadoop-2.2 -DskipTests clean package
    --- End diff --

    I think it might be better to always ask people to specify `hadoop.version` and then just explain that in some cases they need to add a profile to work around problems in the hadoop dependency graph. Otherwise sometimes we are relying on the profile to set `hadoop.version` and it could be a bit confusing to users what is going on.

    ```
    mvn -Pyarn -Dhadoop.version=2.2.X -Phadoop-2.2 -DskipTests clean package
    ```

    The header here says "Apache Hadoop 2.2.X" but the actual example can't be directly generalized to 2.2.X without them digging around the build more.
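
To make the suggestion concrete, here is a minimal sketch of the convention being proposed: `-Dhadoop.version` is always passed explicitly, and profiles are added on top of it. The function name `spark_mvn_cmd` is hypothetical and not part of the Spark build. The version-to-profile mapping is taken only from the table and examples in the diff. Versions newer than 2.2.x are deliberately left unhandled, since the diff does not say whether `-Phadoop-2.2` applies to them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): print the Maven command for a
# given Hadoop version, always setting -Dhadoop.version explicitly.
spark_mvn_cmd() {
  version="$1"
  case "$version" in
    0.23.*)
      # 0.23.x needs yarn-alpha plus the special hadoop-0.23 profile
      profiles="-Pyarn-alpha -Phadoop-0.23" ;;
    2.0.*|2.1.*)
      # 2.0.x and 2.1.x (including CDH4 versions such as 2.0.0-cdh4.2.0)
      profiles="-Pyarn-alpha" ;;
    2.2.*)
      # 2.2.x uses the yarn profile plus hadoop-2.2, as in the diff
      profiles="-Pyarn -Phadoop-2.2" ;;
    *)
      # Assumption: anything else is out of scope for this sketch
      echo "unhandled Hadoop version: $version" >&2
      return 1 ;;
  esac
  echo "mvn $profiles -Dhadoop.version=$version -DskipTests clean package"
}

spark_mvn_cmd 2.2.0
```

Written this way, the "Apache Hadoop 2.2.X" example generalizes to any 2.2.x release by changing only the version string, without relying on a profile to set `hadoop.version` implicitly.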