Re: [GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Yes, I think this may be all fine already, as the Jenkins jobs just call the run-tests script, and that was already changed to reflect the new configurations. Let's keep an eye on them for failures, but it may be all set already.

On Thu, May 14, 2015 at 7:44 PM, shaneknapp g...@git.apache.org wrote:
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102132344
@srowen -- so i'm looking at the configs and we don't actually specify anything in the builds WRT to mvn or sbt build options[1]. these are all matrix build configs and the hadoop options are all handled in the scripts in dev/... (for the most part).
[1] - not *technically* true, but these guys don't seem like they'll break as the hadoop version 2.2.0:
- https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.4-Maven-with-YARN/configure
- https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-with-YARN/configure
- https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.2-Maven-with-YARN/configure
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102027647
[Test build #32708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32708/consoleFull) for PR 5786 at commit [`11670e5`](https://github.com/apache/spark/commit/11670e5baf49489c2d0e394a32865deff8e3a791).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102027658
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32708/ Test PASSed.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5786
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102027657
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102049341
All tests passed, @srowen. It was, as expected, an unrelated error. Is everything set now to merge this PR?
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102132344
@srowen -- so i'm looking at the configs and we don't actually specify anything in the builds WRT to mvn or sbt build options[1]. these are all matrix build configs and the hadoop options are all handled in the scripts in dev/... (for the most part).
[1] - not *technically* true, but these guys don't seem like they'll break as the hadoop version 2.2.0:
- https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.4-Maven-with-YARN/configure
- https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-with-YARN/configure
- https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.2-Maven-with-YARN/configure
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101980296
Are you sure, Sean? I could make the change and push it, but if it is easier to make the change in the merge, just tell me.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101981241
I'm happy to help. Give me a sec and I'll push the changes.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5786#discussion_r30307006
--- Diff: dev/create-release/create-release.sh ---
@@ -118,14 +118,14 @@ if [[ ! $@ =~ --skip-publish ]]; then
   rm -rf $SPARK_REPO
-  build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
-    -Pyarn -Phive -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
+  build/mvn -DskipTests -Pyarn -Phive \
+    -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
     clean install
   ./dev/change-version-to-2.11.sh
-  build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
-    -Dscala-2.11 -Pyarn -Phive -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
+  build/mvn -DskipTests -Pyarn -Phive \
+    -Dscala-2.11 -Pspark-ganglia-lgpl -Pkinesis-asl \
--- End diff --
This still needs `-Phadoop-2.2`, but maybe I can add that on merge.
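A dropped profile flag like the one flagged in this review comment is easy to miss by eye when a command spans several continuation lines. The following is only a minimal sketch of a check for it, not anything that exists in the Spark repository: `missing_flag` is a hypothetical helper name, and it assumes the build commands are written as `build/mvn` or `build/sbt` invocations with backslash line continuations, as in the hunk above.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of Spark): list build/mvn or build/sbt
# invocations in a script that do NOT contain a given flag.
# Usage: missing_flag <flag> <script>
missing_flag() {
  local flag="$1" script="$2"
  # Join backslash-continued lines so each command is checked as one line,
  # keep only the build invocations, then drop those that carry the flag.
  sed -e ':a' -e '/\\$/N; s/\\\n//; ta' "$script" |
    grep -E 'build/(mvn|sbt)' |
    grep -v -F -e "$flag" || true
}
```

For example, `missing_flag -Phadoop-2.2 dev/create-release/create-release.sh` would print each joined build command that lacks the profile, and print nothing if every command has it. The match is a plain substring test, so it is a rough lint rather than a real parser of Maven arguments.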
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101980703
Go ahead if you have a moment; only if it's not much work. Thanks for your perseverance. This ends up being a great change IMHO. It will go in shortly.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102002203
[Test build #32700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32700/consoleFull) for PR 5786 at commit [`11670e5`](https://github.com/apache/spark/commit/11670e5baf49489c2d0e394a32865deff8e3a791).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102002247
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32700/ Test FAILed.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102002243
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102002721
This has happened before, @srowen. I think this is again an unrelated failure. Could you ask Jenkins to retest this, please?
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102002860
I'm sure it is, as you only made a doc change, but while we are waiting: Jenkins retest this please
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102003335
Merged build started.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102003319
Merged build triggered.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-102003559
[Test build #32708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32708/consoleFull) for PR 5786 at commit [`11670e5`](https://github.com/apache/spark/commit/11670e5baf49489c2d0e394a32865deff8e3a791).
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101983143
Merged build triggered.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101983370
[Test build #32700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32700/consoleFull) for PR 5786 at commit [`11670e5`](https://github.com/apache/spark/commit/11670e5baf49489c2d0e394a32865deff8e3a791).
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101983163
Merged build started.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101637112
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32605/ Test PASSed.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101637110
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101601409
I hope that this is the correct way of making all the changes you suggested. Please check this and thank you @srowen @vanzin and @pwendell. Let me know if there is something else that could be done, or if this finishes the patch.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101603183
[Test build #32605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32605/consoleFull) for PR 5786 at commit [`379f50d`](https://github.com/apache/spark/commit/379f50d63629318d1d0689a155a201a220aa54fe).
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101602744
Merged build triggered.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101602861
Merged build started.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101637097
[Test build #32605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32605/consoleFull) for PR 5786 at commit [`379f50d`](https://github.com/apache/spark/commit/379f50d63629318d1d0689a155a201a220aa54fe).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101554493
@FavioVazquez Other conflicting changes have been merged to `master` since this was opened, which is why GitHub says "This pull request contains merge conflicts that must be resolved." Basically you need to pull the latest `master` changes and `rebase` on it. Are you familiar with that process? I have a remote `origin` for my fork and `upstream` for the main project, and so usually do ...
```
git checkout master
git pull upstream master
git checkout mybranch
git rebase master
```
You have to fix the merge conflicts and then `git push origin mybranch` (with `--force` if the branch was already pushed, since the rebase rewrites its history).

@vanzin I still disagree on the effective POM stuff; you're all correct about what should happen, but this happened in reality: https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.3.1/spark-core_2.10-1.3.1.pom
```
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
...
```
That was what I'm getting at; this shouldn't change backwards now. Whether or not that is a point doesn't matter though if we're on board with proceeding.
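To check which Hadoop version a published POM actually declares, as in the `spark-core` example in the comment above, something along these lines could be used. It is a minimal sketch and not part of the Spark build: `pom_dep_version` is a hypothetical helper name, and it assumes the POM has one element per line, with `<version>` appearing after `<artifactId>` inside the same `<dependency>` block (it does not resolve properties or inherited versions).

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of Spark): print the <version> declared for
# a given artifactId in a POM file, or nothing if it is not declared.
# Usage: pom_dep_version <artifactId> <pom-file>
pom_dep_version() {
  local artifact="$1" pom="$2"
  awk -v art="$artifact" '
    # Track whether we are inside a <dependency> block.
    /<dependency>/ { in_dep = 1; found = 0 }
    # Remember when this block is for the artifact we want.
    in_dep && index($0, "<artifactId>" art "</artifactId>") { found = 1 }
    # Print the first <version> that follows within the same block.
    in_dep && found && /<version>/ {
      line = $0
      sub(/.*<version>/, "", line)
      sub(/<\/version>.*/, "", line)
      print line
      exit
    }
    /<\/dependency>/ { in_dep = 0; found = 0 }
  ' "$pom"
}
```

After downloading the POM linked above to a local file, `pom_dep_version hadoop-client spark-core_2.10-1.3.1.pom` would print the declared `hadoop-client` version, which makes it easy to compare the same artifact across releases.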
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101556712
I think @vanzin was saying the pom for spark-parent does not have 2.2.0, it has 1.0.4. But it's moot because none of the other projects expect to get hadoop.version from it since we use effective poms.
https://repo1.maven.org/maven2/org/apache/spark/spark-parent_2.10/1.3.1/spark-parent_2.10-1.3.1.pom
But really spark-parent affects no one except for someone directly extending spark's build (which we've never expected people to do, really).
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101557370
Yes, agree with all that. But `spark-core` affects people. The proposed change would cause its effective POM to depend on 1.0.4 in Spark 1.4, assuming it's published as intended with no build flags, whereas Spark 1.3 depends on 2.2.0 (which I understand was inadvertent to begin with). Is the disconnect that people think that's fine? I'd be surprised, since the actual, real Hadoop dependency for Spark Core would have flip-flopped from 2.2.0 to 1.0.4 and back to 2.2.0 over three releases.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101560695
I thought the proposal here was to continue publishing artifacts with 2.2.0 (?) If you look at the patch, it moves the default build to 2.2.0 and then it publishes with the default build. I think that's the way to go... but maybe I'm misunderstanding?
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101561958
Yes, I'm arguing that the alternative, changing `hadoop.version` back to 1.0.4, is not a solution. I disagree with "The smallest fix for the issue would be to revert back to 1.0.4 as the default version" -- a fix perhaps for one thing here, but that causes a different problem. I am in favor of this PR.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101562899
Oh yeah sorry - I was wrong about that. Never said so explicitly, but basically I support this patch in its current form (brought up to date) with the only change being to not ask users to rely on default behavior in our docs (i.e. don't delete the docs referring to hadoop-2.2).
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5786#discussion_r30209856
--- Diff: docs/building-spark.md ---
@@ -67,8 +67,8 @@ Because HDFS is not protocol-compatible across versions, if you want to read fro
   </thead>
   <tbody>
     <tr><td>0.23.x</td><td>hadoop-0.23</td></tr>
-    <tr><td>1.x to 2.1.x</td><td>(none)</td></tr>
-    <tr><td>2.2.x</td><td>hadoop-2.2</td></tr>
+    <tr><td>1.x to 2.1.x</td><td>hadoop-1</td></tr>
+    <tr><td>2.2.x</td><td>(none)</td></tr>
--- End diff --
And back here...
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5786#discussion_r30209874
--- Diff: docs/building-spark.md ---
@@ -92,8 +92,6 @@ You can enable the yarn profile and optionally set the yarn.version property
 Examples:
 {% highlight bash %}
-# Apache Hadoop 2.2.X
--- End diff --
This can come back
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5786#discussion_r30209842
--- Diff: dev/scalastyle ---
@@ -20,8 +20,8 @@
 echo -e "q\n" | build/sbt -Phive -Phive-thriftserver scalastyle > scalastyle.txt
 echo -e "q\n" | build/sbt -Phive -Phive-thriftserver test:scalastyle >> scalastyle.txt
 # Check style with YARN built too
-echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 scalastyle >> scalastyle.txt
-echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 test:scalastyle >> scalastyle.txt
+echo -e "q\n" | build/sbt -Pyarn scalastyle >> scalastyle.txt
--- End diff --
`-Phadoop-2.2` here
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5786#discussion_r30209825
--- Diff: dev/create-release/create-release.sh ---
@@ -118,14 +118,14 @@ if [[ ! $@ =~ --skip-publish ]]; then
   rm -rf $SPARK_REPO
-  build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
-    -Pyarn -Phive -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
+  build/mvn -DskipTests -Pyarn -Phive \
--- End diff --
Favio, I think these two now need `-Phadoop-2.2` again to be fully consistent.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5786#discussion_r30209835
--- Diff: dev/run-tests ---
@@ -40,11 +40,11 @@ function handle_error () {
 {
   if [ -n "$AMPLAB_JENKINS_BUILD_PROFILE" ]; then
     if [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop1.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=1.0.4"
+      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=1.0.4"
     elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=2.0.0-mr1-cdh4.1.1"
+      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1"
     elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.2" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0"
+      export SBT_MAVEN_PROFILES_ARGS="-Pyarn"
--- End diff --
`-Phadoop-2.2` here
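For readers following the review, the branch logic in this hunk can be sketched as a small function with `-Phadoop-2.2` restored as requested in the comment. This is only an illustration under assumptions: `select_profile_args` is a hypothetical helper name that does not exist in `dev/run-tests`, and only the three profiles visible in the diff are handled.

```shell
#!/usr/bin/env bash
# Sketch of the profile selection quoted in the dev/run-tests diff.
# select_profile_args is a hypothetical name used for illustration only.
select_profile_args() {
  case "$1" in
    hadoop1.0) echo "-Phadoop-1 -Dhadoop.version=1.0.4" ;;
    hadoop2.0) echo "-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1" ;;
    # -Phadoop-2.2 is a no-op profile after this PR, but naming it keeps the
    # command line explicit, as the review comment asks.
    hadoop2.2) echo "-Pyarn -Phadoop-2.2" ;;
    # Any profile not shown in the diff is treated as unknown in this sketch.
    *) return 1 ;;
  esac
}

# Only set the variable when Jenkins provides a build profile.
if [ -n "${AMPLAB_JENKINS_BUILD_PROFILE:-}" ]; then
  if args="$(select_profile_args "$AMPLAB_JENKINS_BUILD_PROFILE")"; then
    export SBT_MAVEN_PROFILES_ARGS="$args"
  else
    echo "unknown build profile: $AMPLAB_JENKINS_BUILD_PROFILE" >&2
  fi
fi
```

Keeping the mapping in one place makes it easier to see that every Jenkins profile names a Hadoop profile explicitly.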
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101565704 Great, I'm familiar with the process @srowen. Thank you guys for all the suggestions. I'm making the changes and will push them soon.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101355534
> I don't have any problems with a plain vanilla mvn package ... ?
Can you successfully run unit tests with that? I'd expect the library version conflicts to start causing issues at that point. Also, I'm not sure the generated assembly would work on a real cluster.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101377390 Yes, `mvn -DskipTests clean package; mvn test` succeeds in `master` for me now.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101353251 I think this is more than just a cleanup. As Favio points out, if you build with just `mvn package`, the build is broken because of inconsistent versions. The minimum command line to get a working build today is `mvn -Dhadoop.version=1.0.4 package`. It may be that all official build scripts work around that problem inadvertently. But the current code is not correct. So we either need to fix things so that the default profile is *actually* hadoop-2.2, or revert the previous change so that the default profile is hadoop-1. But right now the default profile is in a weird unhappy state.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101354454 I don't have any problems with a plain vanilla `mvn package` ... ? What's the issue? Things that don't care about Hadoop don't care; things that do, well, sometimes do need a Hadoop dependency set to a particular version of course. One problem here is that this version was actually already set to 2.2 in 1.3.0, at least: https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.3.1/spark-core_2.10-1.3.1.pom . If there is any issue, then I think it's best to fix-forward. I have not observed any immediate issue though?
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez closed the pull request at: https://github.com/apache/spark/pull/5786
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101474934 I will make the suggested changes and push them.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101474903 Sorry, I closed it by accident.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
GitHub user FavioVazquez reopened a pull request: https://github.com/apache/spark/pull/5786
[SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions
Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons. Changes proposed by @vanzin resulting from the previous pull request https://github.com/apache/spark/pull/5783, which did not fix the problem correctly. Please let me know if this is the correct way of doing this; the comments of @vanzin are in the pull request mentioned.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/FavioVazquez/spark update-hadoop-dependencies
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5786.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5786
commit ec91ce3c405123818a4c56ef361d9cc82951677d
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-04-29T17:58:09Z
- Updated protobuf-java version of com.google.protobuf dependancy to fix blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for 2.5.0-cdh5.3.3 version)
commit 660decce9d3c2300aee493b605da0da8a74b3ea6
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-04-29T19:16:04Z
- Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
commit 7e9955df29b5d5c9cda950636d51da753e6d17ea
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-04-29T19:35:08Z
- Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
commit 6b4bfafbe4f98c92ac2fe7aeb5f36a37d27a9678
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-04-30T21:41:08Z
- Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
commit 13542929c9cb3ddfec31bbb794e490b44c273df4
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-04-30T22:13:50Z
- Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
commit 287fa2ffc31bb0c9eaf5daf80825ff0093f3f20d
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-04-30T22:17:44Z
- Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default.
- Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
commit 70b8344dcad8f6de71bd6356cd6eec375211fdb3
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-04-30T22:57:16Z
- Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
commit 88a8b88a13a02cbde04792cb63e3c6a81407d915
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-05-01T16:48:27Z
- Simplified Hadoop profiles due to new setting of global properties in the pom.xml file
- Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file
- Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the make-distribution.sh file
commit 199f40b1733015a414eb928b2090f3bf4d0b7a7e
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-05-01T20:44:30Z
- Erased unnecessary CDH5-specific note in docs/building-spark.md
- Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md
- Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default .Added comment in the yarn/pom.xml to specify that.
commit a6507792cc12fc03139be825357f22329773c823
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-05-01T20:50:46Z
- Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml
- Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
commit 0470587ad7af93041e25dcb07954b835d9508a10
Author: FavioVazquez favio.vazqu...@gmail.com
Date: 2015-05-01T21:06:52Z
- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh
- Updated how the releases are made in the create-release.sh no that the default hadoop version is the 2.2.0
- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle
- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests
- Better example given in the hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
commit
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101474861 Perfect, I've been watching all of your conversations. I will make th
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101473151
> let's just keep that line there and always suggest that people use a build profile
Sure, no problems with that at all.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101473496 K - everything else from your comment makes sense, and fine to put it in 1.4. It just wasn't immediately obvious to me that it was really broken since tests and build were okay, but clearly the versions in the pom were not consistent with Hadoop 2.2.0.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101475218 In summary, add the line that @pwendell suggested. But I'm not sure about the default profiles: should I erase the hadoop-1 profile? Will there be no default Hadoop version now? Thanks
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101466543
So, I had a chat offline (well, off-github) with Sean and these are my conclusions:
- There is a real issue, addressed by this PR, that the default build generates an assembly that cannot talk to any version of HDFS.
- In my view, the fix proposed here is the right way forward; it standardizes on hadoop-2 as the preferred hadoop version by making it the default, and having the default build work with a hadoop 2 cluster.
- The smallest fix for the issue would be to revert back to 1.0.4 as the default version. Because we publish effective poms, that would not change the version of Hadoop for any artifacts except for spark-parent; that is not a big problem because it would only affect someone who depends on `${hadoop.version}` and has `spark-parent_2.10` as the parent project of their own project, which I'd guess is a very small set of people (if it even exists).
As for whether the default build should work or we should disallow it, I don't really have a strong opinion. If there's an easy fix, sure, but if it gets complicated, then it's probably not worth it.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101472800
Sounds good - however, I think the issues aren't totally decoupled, because this pull request deletes the following line from the documentation:
```
# Apache Hadoop 2.2.X
-mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
```
What I am suggesting is: look, let's just keep that line there and always suggest that people use a build profile. Otherwise changes like this will not be future proof. Changes to default behavior are annoying for developers and I see no downside in asking people to be explicit about hadoop flags.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101478579 And @srowen, you said some days ago that you knew the places where this PR needed a rebase. Could you point them out to me please?
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101486305
I think there has to be some default version by definition - and yeah let's have the defaults at 2.2.0. But in the instructions I'd just tell people building for hadoop 1 to use the `-Phadoop-1` profile, tell people building for 2.2.0 to use the `-Phadoop-2.2` profile, etc. I.e. let's not encourage people to rely on the default behavior.
On Tue, May 12, 2015 at 6:31 PM, Favio André Vázquez notificati...@github.com wrote:
> And @srowen you said some days ago that you knew the places that this PR needed a Rebase, could you point them out to me please?
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101445511 @vanzin what works when the build says `hadoop.version=1.0.4` that doesn't work when the build says `hadoop.version=2.2.0`? Just running on Hadoop 1.x? Agree but that is no longer supposed to work by default if the default Hadoop version is supposed to be 2.x. Whatever the problem is, is already a problem, since the Spark 1.3 POMs already have 2.2.0 specified. Anyway, maybe that's just violent agreement that something has to be tweaked. If this is merged as a resolution for 1.4, OK by me for sure. I don't like `activeByDefault` merely because it gets disabled if any profile is selected, not just a Hadoop-related profile. I think coaching in the docs to always set these Hadoop profiles is maybe safer and more overt. Then, the net change would be: everywhere in this PR that doesn't say `-Phadoop-x.y` should add `-Phadoop-2.2`, which is actually a no-op profile, but then at least it's explicit. Eventually when, say, Hadoop 1.x support really goes away, the `hadoop-1` profile really goes away and breaks command lines that select this profile, but, that's good.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101437368
Just OOC, I built with no enabled profiles and tried to run spark-shell on a real cluster (standalone since YARN profile wasn't enabled). It fails pretty early with:
    15/05/12 15:08:18 INFO AppClient$ClientActor: Executor updated: app-20150512150817-0001/8 is now RUNNING
    java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$SetOwnerRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
Which is the issue Favio raised (mismatched protobuf versions).
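One way to check which protobuf-java version a given set of build flags resolves to is to filter the output of `mvn dependency:tree`. The following is a minimal sketch, assuming the plugin's usual `group:artifact:type:version:scope` line layout; `artifact_versions` is a hypothetical helper and not part of the Spark build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `mvn dependency:tree` output on stdin and prints
# each distinct version of the artifact given as "groupId:artifactId".
# Assumes lines of the form  ...groupId:artifactId:type:version:scope
artifact_versions() {
  grep -F "$1:" \
    | awk -F: '{ print $(NF-1) }' \
    | sort -u
}
```

For example, `build/mvn dependency:tree -Dincludes=com.google.protobuf:protobuf-java | artifact_versions com.google.protobuf:protobuf-java` would list the versions in play, which can then be compared with what the target Hadoop release expects.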
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101439466
> but haven't we always generally needed to build Spark for Hadoop X to avoid this?
Correct. But the default now claims to be 2.2, whereas the default build (i.e. the build where you do not enable any profiles) will not work on a hadoop-2.2 cluster.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101442338
> Color me surprised that the default Hadoop 1.0.4 build worked on Hadoop 2.2 (really?) though that's a fair point then.
I think there's some line noise somewhere. I didn't say that at all, and I wouldn't expect that to work. But that is not the issue being raised here.
> So, even if the pom.xml went back to saying Hadoop 1.0.4 for 1.4, would this still be a problem for people building against Spark 1.3?
What are you calling a problem here? Reverting that would revert the default build to hadoop-1. It would work on a hadoop 1 cluster, but not on a hadoop 2 cluster. Which is fine, because that's what the build would suggest. The current problem is that the default build says it's a hadoop-2.2 build but it does not work on a hadoop 2.2 cluster.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101442243 Hey so I am okay to merge this into 1.4, but what about not having any publicly advertised default build and just asking people to always use profiles when building Spark in the documentation? Otherwise every time we change the default version we are likely to make someone's life more difficult by silently changing behavior on them.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101459672
> We all understand that POMs define how assemblies are built. This can't have nothing to do with them.
You're mixing published POMs with the actual pom files in the build. They are not the same thing and that's what's making me incredibly confused.
> You're suggesting reverting to restore the default to work with Hadoop 1.x
No, I'm suggesting fixing it, one way or another, which one doesn't matter.
> but then that trips a different version-related problem: the published POM for Spark 1.3 already references Hadoop 2.2.0.
That's completely unrelated; the build that actually does a mvn deploy to update the maven artifacts needs to match the profiles needed to replicate the 1.3 build. That's completely separate from what the default property values are, which is what this PR is about.
To summarize: fixing what the default build should be, or whether we should have a default build at all, is irrelevant to the published poms, and bringing them up just causes confusion. This PR exists for a simple reason: the artifacts generated when you build with the current default properties are broken. Period.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101438765 `spark-shell` works locally for me. You're right, this may not work on Hadoop cluster X, but haven't we always generally needed to build Spark for Hadoop X to avoid this? I get it though, maybe the inconsistent Hadoop client libs don't work whereas a consistent Hadoop 1.x client lib set did, even against a mismatched cluster version. Fair point and all that but this isn't the right way to build Spark anyway, and I'm afraid this change was effectively already released. I'm narrowly arguing against undoing the `hadoop.version=2.2.0` change. I'm also asserting that the 1.4 release artifacts will be fine. And then saying we should fix-forward the rest of this for 1.5, if not 1.4.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101443231

> but what about not having any publicly advertised default build

That's ok too, but it should be enforced somehow, e.g. have a profile with `activeByDefault` set to `true` that causes the build to fail with an error message.
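The "require an explicit profile" idea can also be pictured as a small guard in a wrapper script. This is only a minimal sketch: it is not the Maven `activeByDefault` mechanism suggested in this comment and not anything from the Spark build. The function name `require_hadoop_profile` is hypothetical, and the check is deliberately crude: it only looks for a `-P` argument that names some `hadoop-` profile.

```shell
# Hypothetical guard (not part of Spark): succeed only when the caller
# passed an explicit Hadoop profile such as -Phadoop-1 or -Pyarn,hadoop-2.2.
require_hadoop_profile() {
  for arg in "$@"; do
    case "$arg" in
      # Matches -Phadoop-2.2 as well as comma lists like -Pyarn,hadoop-1.
      -P*hadoop-*) return 0 ;;
    esac
  done
  echo "error: pass an explicit Hadoop profile, e.g. -Phadoop-2.2" >&2
  return 1
}
```

A wrapper could then run `require_hadoop_profile "$@" && mvn "$@"`, so that a bare `mvn clean package` fails fast instead of silently picking up whatever the default happens to be.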
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101458849 We all understand that POMs define how assemblies are built. This can't have nothing to do with them. As I say, the build works fine in, say, local mode (well, no obvious problems). Jenkins tests are happy since they don't rely on defaults. It's not true that nothing works, but there's a problem. This is narrowly about the Hadoop user's perilous expectations of defaults. I don't expect the default assembly to work on a Hadoop 1.x cluster, but it's not supposed to now in Spark 1.4. You're suggesting reverting to restore the default to work with Hadoop 1.x, but then that trips a different version-related problem: the published POM for Spark 1.3 already references Hadoop 2.2.0. Fixing that may make the default assembly work for Hadoop 1.x again as it did in Spark 1.2, but then it yet again changes the transitive deps of anyone relying on Spark Core artifacts in Maven. This is why I don't think reverting to `hadoop.version=1.0.4` is a good solution, and maybe that is the only point still being batted around. But Spark 1.4 is in a no-man's-land where the defaults don't work on 1.x (expected) and apparently don't quite work on 2.x (not expected). You'd think that at least one does. That's plainly suboptimal, and while not a show-stopper, needs fixing. I don't think anyone disputes that this PR would do the trick. Further, I like the idea of encouraging people to do the right thing, what the release has always safely done: specify a Hadoop profile when it matters.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101440685 Color me surprised that the default Hadoop 1.0.4 build worked on Hadoop 2.2 (really?) though that's a fair point then. So, even if the `pom.xml` went back to saying Hadoop 1.0.4 for 1.4, would this still be a problem for people building against Spark 1.3, because that `pom.xml` already had this change? It seems weird to go back but not impossible. This turns into a stronger argument for merging for 1.4, I suppose.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101443347 Sure - that's good too. We can explicitly require it. But even just changing the docs to not mention default behavior would IMO be good.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101445976 Sean, I think you're confused. Or at least you're confusing the hell out of me. This has nothing to do with the POMs. This has to do with the final assembly generated by the build. If you build currently without specifying *anything* - no `-D` overrides, no profiles, no nothing - you end up with a broken build that doesn't work anywhere. That's all.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101511508 I see, @pwendell. I'll push the changes tomorrow; it's a little late here in Venezuela. Greetings and thanks.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101284746 @pwendell I don't think the previous update was wrong, certainly not for development. It was insufficient for creating a Hadoop 2.2 assembly from defaults, but that's not how Hadoop 2.2 assemblies are created. In that sense, this is not required for release 1.4 to be as correct as ever. Still the idea is that it would be better to make the default fully consistent, as if it were ready for a Hadoop 2.2 assembly. I think the cat is out of the bag on #5027; I believe 1.3 was accidentally released with, effectively, this change? So I don't think we should undo that, certainly not if it's solving more problems than it causes. (This is not at all about building for CDH.) This doesn't remove any profiles in order to reduce impact on build scripts, yes -- otherwise `-Phadoop-2.2` would start being an error. However it must add a `hadoop-1` profile to allow selecting the Hadoop 1.x settings. This profile has always silently existed as the unofficial collection of defaults. Adding it does indeed require a developer change -- but only for those who need to build for Hadoop 1.x explicitly. It at least makes this explicit. The cleanup is appealing, of course. I would campaign modestly for introducing this into 1.4. If the above hasn't swayed your second opinion here though, then let's just do nothing for 1.4, and put this into master for 1.5. By that point I think the case will be stronger still, and there will have been time to get used to the change for the small subset of people who need to build for 1.x.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101090681 Hey @FavioVazquez and @srowen. I took a look at this. A few questions:

1. Does this mean that #5027 was just wrong? I guess I don't see how things worked before this patch.
2. It's actually a pain for users when the default build changes. Why not just keep a -Phadoop-2.2 profile in the instructions? I wonder if we should just always advise users to use a Hadoop profile when building. Otherwise, we'll have to go to people and get them to change things, just like we are here.
3. Should we just merge this into master and then just revert #5027 in branch-1.4? From what I understand the change upgrading the Hadoop version was just to make it more convenient for IDE importing. Hardly a user facing feature.

Also, I think it would be good to e-mail the dev list and explain that the default build behavior is changing.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101134292 I think that you guys are all right, you have suggested some great changes and I think I'll let @srowen and @vanzin, with you @pwendell, decide the future of this PR. In my humble opinion it could be good, but it is all up to you guys. I'll keep an eye on the comments on this PR, and please let me know if there is something I can help with, whether making this patch better or fixing these issues in another way. Thanks for teaching me great stuff.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101126703 I think @srowen saw this PR as a cleanup of old dependencies and an update of Spark's defaults to a currently used Hadoop version. This started as a minor fix for inconsistencies in the Hadoop defaults when using the latest CDH5 distribution, and grew into an upgrade of the default Hadoop version, an update of the docs, and a cleanup of the YARN POM and the main POM. I still face problems when building Spark for CDH5 without these changes, and I think it would be helpful to update the versions, since Hadoop 1 is really old, and I really believe this brings Spark up to date with the newest technologies. I'm no expert in this field, but I think this PR could be interesting and useful for a lot of people who are starting with these technologies and would like to build Spark with the newest Hadoop version. I have to remark that if you use the current build process and main POM, you'll get errors when you try to connect to Cloudera's newest HDFS; you can see that at the beginning of the PR. It's really awkward to build Spark with lots of ad hoc and in situ dependencies just to keep old versions, I don't know, maybe it's just me. I really appreciate @srowen's and @vanzin's help with this, and would like to know if you think this is the right track for Spark 1.4.0, @pwendell. I'm up for making any more changes and updates if you think they are necessary, and I repeat, I think this could be a good refresh of Spark's dependencies. I know this is really a minor change, but it could grow into an even better update. Thanks for your comments, I'll wait for your replies.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-101131567 Hey @FavioVazquez - thanks for commenting. I'm happy to have a patch that makes the default settings more coherent. I just looked into the other pull request to understand better the origins (#5783). I was just a bit confused because it seems like if you were building for CDH 5.3 you would need non default settings anyways. But it seems like this was not specifically related to your issue and instead some clean-up suggested by @vanzin. My suggestion was maybe to just make this change in master rather than putting it into the 1.4 branch, since I see basically no benefit to this other than tidiness and it introduces some natural risk of mucking around build stuff. Also, if we are going to make build changes that require developers to build Spark differently, I think we should give ample warning. And I suggested we retain the existing profiles in our documentation in order to avoid having to keep changing developer habits every time we bump the hadoop version.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-99771339 This will take some time to review. It needs another rebase in the meantime I'm afraid. You can see that here in the UI.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-99705433 Hello @srowen, any progress on coordinating if/when to merge this PR? Thanks
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98834110 This patch will need a rebase now, since unfortunately I merged a change that will conflict with it. It should be a simple resolution since I know the one line that will conflict. After you rebase and force-push, it's ready again. It is still not entirely clear when/if to merge this since it will need some coordination and a release is happening. So you can sit tight.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98835084 Ok I see @srowen thank you.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98831666 Hello @srowen, I'm not sure about the next steps you mentioned. Could you please explain to me what's going to happen now with the PR? Thanks.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98477569 Looking good. I think I will need to coordinate closely with @pwendell on this one since it would be useful to put in before the release, but also, will have some tiny implications for the release process. It finishes the process of making Hadoop 2.2 the default and that helps simplify a number of things here, so I like it.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98498096 Great @srowen please let me know if I can help with something else in this patch
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98347073 I noticed a few more things that need to be updated. Kind of confusingly, the Spark build conflates Hadoop 1.0 with Hadoop 2.0 and 2.1, so you'll need to add `-Phadoop-1` to a few more places. All of these need it, I think:

create-release.sh

    make_binary_release cdh4 -Phive -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0 3032

building-spark.md

    # Apache Hadoop 1.2.1
    mvn -Dhadoop.version=1.2.1 -DskipTests clean package
    # Cloudera CDH 4.2.0 with MapReduce v1
    mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package

dev/run-tests

    if [ $AMPLAB_JENKINS_BUILD_PROFILE = hadoop1.0 ]; then
      export SBT_MAVEN_PROFILES_ARGS=-Dhadoop.version=1.0.4
    elif [ $AMPLAB_JENKINS_BUILD_PROFILE = hadoop2.0 ]; then
      export SBT_MAVEN_PROFILES_ARGS=-Dhadoop.version=2.0.0-mr1-cdh4.1.1

Soon I want to get rid of this unsupported CDH4 info/profile (and CDH*3* docs! surely Spark hasn't worked with that in a long time). Separate issue.
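For the `dev/run-tests` fragment quoted in this comment, the suggested edit amounts to adding `-Phadoop-1` next to each pre-2.2 `-Dhadoop.version` setting. A rough sketch of that change follows, written as a standalone function so it can be tried in isolation. The function name `sbt_maven_profiles_args` is made up for illustration, only the two branches quoted in the comment are covered, and the script as actually merged may well look different.

```shell
# Sketch only: the two branches quoted from dev/run-tests, with -Phadoop-1
# added as the comment suggests. The function name is hypothetical.
sbt_maven_profiles_args() {
  case "$1" in
    hadoop1.0) echo "-Phadoop-1 -Dhadoop.version=1.0.4" ;;
    hadoop2.0) echo "-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1" ;;
    *)
      # Other build profiles are not quoted in the comment, so do not guess.
      echo "build profile not covered by this sketch: $1" >&2
      return 1
      ;;
  esac
}

# Possible use inside a script that has AMPLAB_JENKINS_BUILD_PROFILE set:
#   SBT_MAVEN_PROFILES_ARGS="$(sbt_maven_profiles_args "$AMPLAB_JENKINS_BUILD_PROFILE")"
#   export SBT_MAVEN_PROFILES_ARGS
```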
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98347284 @shaneknapp if this goes in, then the essence of the config change is this: anything building for Hadoop 1.x, 2.0, or 2.1 now needs `-Phadoop-1` to maintain the same config. Anything that used to specify `-Phadoop-2.2 -Dhadoop.version=2.2.0` doesn't need to, as that's the default. Anything that doesn't specify profiles will get Hadoop 2.2. I believe this therefore just affects the Spark-Master-Maven-pre-YARN job in Jenkins, and certainly only affects `master`. Both of those jobs could use `-Phadoop-1`. Looks like we don't have a Hadoop 2.2 specific build? that's fine, but then there's nothing along those lines to change.
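The rule described in this comment can be sketched as a small shell helper that turns a target Hadoop version into the extra Maven flags. This is an illustration under assumptions, not part of the Spark build: `hadoop_build_args` is a hypothetical name, and versions the comment does not mention are rejected rather than guessed.

```shell
# Hypothetical helper (not part of Spark): print the extra Maven flags
# implied by the rule in the comment for a given Hadoop version.
hadoop_build_args() {
  version="$1"
  case "$version" in
    1.*|2.0.*|2.1.*)
      # Hadoop 1.x, 2.0 and 2.1 now need the explicit hadoop-1 profile.
      echo "-Phadoop-1 -Dhadoop.version=$version"
      ;;
    2.2.0)
      # Hadoop 2.2.0 is the new default, so no extra flags are printed.
      :
      ;;
    *)
      echo "version not covered by this sketch: $version" >&2
      return 1
      ;;
  esac
}
```

With this helper, `mvn $(hadoop_build_args 1.2.1) -DskipTests clean package` expands to `mvn -Phadoop-1 -Dhadoop.version=1.2.1 -DskipTests clean package`, while `hadoop_build_args 2.2.0` prints nothing and leaves the default build untouched.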
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98371065 Hello @srowen, I was noticing some of those things. Thank you for making it easy for me to change them. I just pushed your suggested changes.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98379921 ok to test
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98380193 [Test build #31667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31667/consoleFull) for PR 5786 at commit [`31bdafa`](https://github.com/apache/spark/commit/31bdafad21674fe5bc582fa678753454b04026ad).
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98384948 [Test build #31667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31667/consoleFull) for PR 5786 at commit [`31bdafa`](https://github.com/apache/spark/commit/31bdafad21674fe5bc582fa678753454b04026ad). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98384951 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98384952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31667/ Test PASSed.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98380139 Merged build started.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98380133 Merged build triggered.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98394731 @srowen @vanzin everything seems fine and the tests passed
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98434090 ah cool, got it. thanks sean!

On Sat, May 2, 2015 at 3:49 AM, Sean Owen notificati...@github.com wrote:

> @shaneknapp https://github.com/shaneknapp if this goes in, then the essence of the config change is this: anything building for Hadoop 1.x, 2.0, or 2.1 now needs -Phadoop-1 to maintain the same config. Anything that used to specify -Phadoop-2.2 -Dhadoop.version=2.2.0 doesn't need to, as that's the default. Anything that doesn't specify profiles will get Hadoop 2.2. I believe this therefore just affects the Spark-Master-Maven-pre-YARN job in Jenkins, and certainly only affects master. Both of those jobs could use -Phadoop-1. Looks like we don't have a Hadoop 2.2 specific build? that's fine, but then there's nothing along those lines to change.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98261735 The `yarn` profile just exists to turn on these modules, right? I meant that the stuff in the `yarn` profile in `yarn/pom.xml` should be pulled into the main config of `yarn/pom.xml`, not `pom.xml`. Isn't that easier (keeps YARN stuff in the `yarn` module) or do I miss something? This is turning into a good cleanup.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98264171 No, the parent should contain the same `yarn` profile which only activates the YARN-related modules. The content of the `hadoop-2.2` profile that was in `yarn/pom.xml` should simply be integrated into `yarn/pom.xml`. Then that `hadoop-2.2` profile can go away.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98260176 @vanzin I think that's what you meant. Please check if everything is OK now.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98257941 I already did but github will make it really hard for you to find it. :-/ Here's the comment: Can we just fold this config into the main POM config for yarn? I guess that could work. If there's any discrepancy in the versions in other hadoop profiles, we can override the versions later, but adding the dependencies for non-hadoop-2.2 is at worst redundant.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98259020 I see. I'll do that right away.
[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...
Github user FavioVazquez commented on the pull request: https://github.com/apache/spark/pull/5786#issuecomment-98262625 So should I remove the `yarn` profile from the root POM and move the entire profile into `yarn/pom.xml`?