[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102027647
  
  [Test build #32708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32708/consoleFull) for PR 5786 at commit [`11670e5`](https://github.com/apache/spark/commit/11670e5baf49489c2d0e394a32865deff8e3a791).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102027658
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32708/
Test PASSed.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5786


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102027657
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102049341
  
All tests passed @srowen. It was, as expected, an unrelated error. Is 
everything set now to merge this PR? 



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread shaneknapp

Github user shaneknapp commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102132344
  
@srowen -- so i'm looking at the configs and we don't actually specify 
anything in the builds WRT mvn or sbt build options[1].  these are all 
matrix build configs and the hadoop options are all handled in the scripts in 
dev/... (for the most part).

[1] - not *technically* true, but these guys don't seem like they'll break 
as the hadoop version  2.2.0:
  - https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.4-Maven-with-YARN/configure
  - https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-with-YARN/configure
  - https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.2-Maven-with-YARN/configure


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101980296
  
Are you sure, Sean? I could make the change and push it, but if it is easier 
to make the change in the merge, you tell me. 


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101981241
  
I'm happy to help, give me a sec and I'll push the changes


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5786#discussion_r30307006
  
--- Diff: dev/create-release/create-release.sh ---
@@ -118,14 +118,14 @@ if [[ ! $@ =~ --skip-publish ]]; then
 
   rm -rf $SPARK_REPO
 
-  build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
--Pyarn -Phive -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
+  build/mvn -DskipTests -Pyarn -Phive \
+-Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
 clean install
 
   ./dev/change-version-to-2.11.sh
   
-  build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
--Dscala-2.11 -Pyarn -Phive -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
+  build/mvn -DskipTests -Pyarn -Phive \
+-Dscala-2.11 -Pspark-ganglia-lgpl -Pkinesis-asl \
--- End diff --

This still needs `-Phadoop-2.2` but maybe I can add that on merge


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101980703
  
Go ahead if you have a moment; only if it's not much work.
Thanks for your perseverance. This ends up being a great change IMHO. It 
will go in shortly.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102002203
  
  [Test build #32700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32700/consoleFull) for PR 5786 at commit [`11670e5`](https://github.com/apache/spark/commit/11670e5baf49489c2d0e394a32865deff8e3a791).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102002247
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32700/
Test FAILed.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102002243
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102002721
  
This has happened before @srowen; I think this is again an unrelated failure. 
Could you ask Jenkins to retest this, please?


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102002860
  
I'm sure it is, as you only made a doc change, but while we are waiting: 
Jenkins retest this please


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102003335
  
Merged build started.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102003319
  
 Merged build triggered.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102003559
  
  [Test build #32708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32708/consoleFull) for PR 5786 at commit [`11670e5`](https://github.com/apache/spark/commit/11670e5baf49489c2d0e394a32865deff8e3a791).


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101983143
  
 Merged build triggered.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101983370
  
  [Test build #32700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32700/consoleFull) for PR 5786 at commit [`11670e5`](https://github.com/apache/spark/commit/11670e5baf49489c2d0e394a32865deff8e3a791).


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101983163
  
Merged build started.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101637112
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32605/
Test PASSed.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101637110
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101601409
  
I hope that this is the correct way of making all the changes you 
suggested. Please check it, and thank you @srowen, @vanzin and @pwendell. Let 
me know if there is something else that could be done, or if this finishes the 
patch.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101603183
  
  [Test build #32605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32605/consoleFull) for PR 5786 at commit [`379f50d`](https://github.com/apache/spark/commit/379f50d63629318d1d0689a155a201a220aa54fe).


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101602744
  
 Merged build triggered.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101602861
  
Merged build started.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101637097
  
  [Test build #32605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32605/consoleFull) for PR 5786 at commit [`379f50d`](https://github.com/apache/spark/commit/379f50d63629318d1d0689a155a201a220aa54fe).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101554493

@FavioVazquez Other conflicting changes have been merged to `master` since
this was opened, which is why GitHub says "This pull request contains merge
conflicts that must be resolved." Basically, you need to pull the latest
`master` changes and `rebase` on them. Are you familiar with that process? I have
a remote `origin` for my fork and `upstream` for the main project and so
usually do ...

```
git checkout master
git pull upstream master
git checkout mybranch
git rebase master
```

You then have to fix the merge conflicts and `git push origin mybranch` (with `-f`, since the rebase rewrites the branch's history).

@vanzin I still disagree on the effective POM stuff; you're all correct about
what should happen, but this is what happened in reality:

https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.3.1/spark-core_2.10-1.3.1.pom
```
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
  ...
```
That was what I'm getting at; this shouldn't change backwards now. Whether
or not that is a point doesn't matter though if we're on board with proceeding.
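
To check what a published POM actually declares, as in the `spark-core_2.10-1.3.1.pom` excerpt above, a small filter like the one below may help. It is a minimal sketch, assuming a POM on stdin with one XML element per line; `hadoop_client_version` is a hypothetical helper name, and the line-oriented `awk` match is not a real XML parser.

```sh
# Hypothetical helper: print the <version> declared for the
# hadoop-client dependency in a POM read from stdin.
# Assumes one XML element per line (as in Maven Central POMs); this is a
# line-oriented match, not a real XML parser.
hadoop_client_version() {
  awk '
    # Start watching once the hadoop-client artifactId is seen.
    /<artifactId>hadoop-client<\/artifactId>/ { found = 1; next }
    # Stop watching if that dependency closes without a version element.
    found && /<\/dependency>/ { found = 0; next }
    # Print the first version inside that dependency, then stop.
    found && /<version>/ {
      sub(/.*<version>/, "")
      sub(/<\/version>.*/, "")
      print
      exit
    }
  '
}
```

For example, piping the output of `curl -s https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.3.1/spark-core_2.10-1.3.1.pom` into `hadoop_client_version` should print `2.2.0`, if the published POM matches the excerpt quoted above.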

[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101556712
  
I think @vanzin was saying the pom for spark-parent does not have 2.2.0; it 
has 1.0.4. But it's moot because none of the other projects expect to get 
hadoop.version from it since we use effective poms.

https://repo1.maven.org/maven2/org/apache/spark/spark-parent_2.10/1.3.1/spark-parent_2.10-1.3.1.pom

But really spark-parent affects no one except for someone directly 
extending spark's build (which we've never expected people to do, really).


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101557370
  
Yes, agree with all that. But `spark-core` affects people. The proposed 
change would cause its effective POM to depend on 1.0.4 in Spark 1.4, assuming 
it's published as intended with no build flags, whereas Spark 1.3 depends on 
2.2.0 (which I understand was inadvertent to begin with).

Is the disconnect that people think that's fine? I'd be surprised, since 
the actual Hadoop dependency for Spark Core would have flip-flopped from 
2.2.0 to 1.0.4 and back to 2.2.0 over three releases.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101560695
  
I thought the proposal here was to continue publishing artifacts with 2.2.0 
(?) If you look at the patch, it moves the default build to 2.2.0 and then it 
publishes with the default build. I think that's the way to go... but maybe I'm 
misunderstanding?


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101561958
  
Yes, I'm arguing that the alternative, changing `hadoop.version` back to 
1.0.4, is not a solution. I disagree with "The smallest fix for the issue would 
be to revert back to 1.0.4 as the default version" -- a fix perhaps for one 
thing here, but that causes a different problem. I am in favor of this PR.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101562899
  
Oh yeah sorry - I was wrong about that. Never said so explicitly, but 
basically I support this patch in its current form (brought up to date) with 
the only change being to not ask users to rely on default behavior in our docs 
(i.e. don't delete the docs referring to hadoop-2.2).


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5786#discussion_r30209856
  
--- Diff: docs/building-spark.md ---
@@ -67,8 +67,8 @@ Because HDFS is not protocol-compatible across versions, if you want to read fro
   /thead
   tbody
 trtd0.23.x/tdtdhadoop-0.23/td/tr
-trtd1.x to 2.1.x/tdtd(none)/td/tr
-trtd2.2.x/tdtdhadoop-2.2/td/tr
+trtd1.x to 2.1.x/tdtdhadoop-1/td/tr
+trtd2.2.x/tdtd(none)/td/tr
--- End diff --

And back here...


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5786#discussion_r30209874
  
--- Diff: docs/building-spark.md ---
@@ -92,8 +92,6 @@ You can enable the yarn profile and optionally set the yarn.version property
 Examples:
 
 {% highlight bash %}
-# Apache Hadoop 2.2.X
--- End diff --

This can come back


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5786#discussion_r30209842
  
--- Diff: dev/scalastyle ---
@@ -20,8 +20,8 @@
 echo -e "q\n" | build/sbt -Phive -Phive-thriftserver scalastyle > scalastyle.txt
 echo -e "q\n" | build/sbt -Phive -Phive-thriftserver test:scalastyle >> scalastyle.txt
 # Check style with YARN built too
-echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 scalastyle >> scalastyle.txt
-echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 test:scalastyle >> scalastyle.txt
+echo -e "q\n" | build/sbt -Pyarn scalastyle >> scalastyle.txt
--- End diff --

`-Phadoop-2.2` here


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5786#discussion_r30209825
  
--- Diff: dev/create-release/create-release.sh ---
@@ -118,14 +118,14 @@ if [[ ! $@ =~ --skip-publish ]]; then
 
   rm -rf $SPARK_REPO
 
-  build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
-    -Pyarn -Phive -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
+  build/mvn -DskipTests -Pyarn -Phive \
--- End diff --

Favio, I think these two now need `-Phadoop-2.2` again to be fully 
consistent.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5786#discussion_r30209835
  
--- Diff: dev/run-tests ---
@@ -40,11 +40,11 @@ function handle_error () {
 {
   if [ -n "$AMPLAB_JENKINS_BUILD_PROFILE" ]; then
     if [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop1.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=1.0.4"
+      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=1.0.4"
     elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=2.0.0-mr1-cdh4.1.1"
+      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1"
     elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.2" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0"
+      export SBT_MAVEN_PROFILES_ARGS="-Pyarn"
--- End diff --

`-Phadoop-2.2` here
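
As a rough sketch only, the branch above with this suggestion applied could be written as a standalone function, which makes the profile mapping easy to try in isolation. The function name `profile_args_for` is hypothetical, a `case` statement stands in for the original `if`/`elif` chain, and only the three Jenkins profiles visible in this diff are covered; the flag strings are copied from the diff, with `-Phadoop-2.2` added as suggested.

```shell
#!/bin/sh
# Hypothetical helper: map an AMPLAB_JENKINS_BUILD_PROFILE value to the
# SBT/Maven profile arguments, with the explicit -Phadoop-2.2 suggested in review.
# Only the three branches shown in the dev/run-tests diff are modeled.
profile_args_for() {
  case "$1" in
    hadoop1.0) printf '%s\n' "-Phadoop-1 -Dhadoop.version=1.0.4" ;;
    hadoop2.0) printf '%s\n' "-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1" ;;
    hadoop2.2) printf '%s\n' "-Pyarn -Phadoop-2.2" ;;
    *)
      # Unknown profile names are reported instead of silently using defaults.
      printf 'unknown build profile: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

In `dev/run-tests` this would be used roughly as `SBT_MAVEN_PROFILES_ARGS="$(profile_args_for "$AMPLAB_JENKINS_BUILD_PROFILE")"` followed by an `export`.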


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101565704
  
Great, I'm familiar with the process, @srowen. Thank you all for the 
suggestions; I'm making the changes and will push them soon.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101355534
  
> I don't have any problems with a plain vanilla mvn package ... ?

Can you successfully run unit tests with that? I'd expect the library 
version conflicts to start causing issues at that point. Also, I'm not sure the 
generated assembly would work on a real cluster.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101377390
  
Yes, `mvn -DskipTests clean package; mvn test` succeeds in `master` for me 
now.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101353251
  
I think this is more than just a cleanup. As Favio points out, if you build 
with just `mvn package`, the build is broken because of inconsistent versions. 
The minimum command line to get a working build today is `mvn 
-Dhadoop.version=1.0.4 package`.

It may be that all official build scripts work around that problem 
inadvertently. But the current code is not correct.

So we either need to fix things so that the default profile is *actually* 
hadoop-2.2, or revert the previous change so that the default profile is 
hadoop-1. But right now the default profile is in a weird unhappy state.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101354454
  
I don't have any problems with a plain vanilla `mvn package` ... ? What's 
the issue? Things that don't care about Hadoop don't care; things that do, 
well, sometimes do need a Hadoop dependency set to a particular version, of 
course.

One problem here is that this version was actually already set to 2.2 in 
1.3.0, at least:

https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.3.1/spark-core_2.10-1.3.1.pom

If there is any issue, then I think it's best to fix-forward. I have not 
observed any immediate issue though?


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez closed the pull request at:

https://github.com/apache/spark/pull/5786


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101474934
  
I will make the suggested changes and push them.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101474903
  
Sorry, I closed it by accident.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

GitHub user FavioVazquez reopened a pull request:

https://github.com/apache/spark/pull/5786

[SPARK-7249] Updated Hadoop dependencies due to inconsistency in the 
versions

Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons.

Changes proposed by @vanzin following the previous pull request 
https://github.com/apache/spark/pull/5783, which did not fix the problem 
correctly.

Please let me know if this is the correct way of doing this; @vanzin's 
comments are in the pull request mentioned above. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FavioVazquez/spark update-hadoop-dependencies

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5786


commit ec91ce3c405123818a4c56ef361d9cc82951677d
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-04-29T17:58:09Z

- Updated protobuf-java version of com.google.protobuf dependency to fix 
blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix 
for 2.5.0-cdh5.3.3 version)

commit 660decce9d3c2300aee493b605da0da8a74b3ea6
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-04-29T19:16:04Z

- Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons

commit 7e9955df29b5d5c9cda950636d51da753e6d17ea
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-04-29T19:35:08Z

- Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons

commit 6b4bfafbe4f98c92ac2fe7aeb5f36a37d27a9678
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-04-30T21:41:08Z

- Cleanup in hadoop-2.x profiles since they contained mostly redundant 
stuff.

commit 13542929c9cb3ddfec31bbb794e490b44c273df4
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-04-30T22:13:50Z

- Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests 
and documentation

commit 287fa2ffc31bb0c9eaf5daf80825ff0093f3f20d
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-04-30T22:17:44Z

- Updated documentation about specifying the hadoop version in 
building-spark. Now it is clear that Spark will build against Hadoop 2.2.0 by 
default.
- Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark 
doc.

commit 70b8344dcad8f6de71bd6356cd6eec375211fdb3
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-04-30T22:57:16Z

- Fixed typo in the make-distribution.sh file and added hadoop-1 in the 
Related profiles

commit 88a8b88a13a02cbde04792cb63e3c6a81407d915
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-05-01T16:48:27Z

- Simplified Hadoop profiles due to new setting of global properties in the 
pom.xml file
- Added comment to specify that the hadoop-2.2 profile is now the default 
hadoop profile in the pom.xml file
- Erased hadoop-2.2 from related hadoop profiles now that it is a no-op in the 
make-distribution.sh file

commit 199f40b1733015a414eb928b2090f3bf4d0b7a7e
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-05-01T20:44:30Z

- Erased unnecessary CDH5-specific note in docs/building-spark.md
- Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in 
docs/building-spark.md
- Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now 
the default. Added comment in the yarn/pom.xml to specify that.

commit a6507792cc12fc03139be825357f22329773c823
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-05-01T20:50:46Z

- Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml
- Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set 
in avro.mapred.classifier in pom.xml

commit 0470587ad7af93041e25dcb07954b835d9508a10
Author: FavioVazquez favio.vazqu...@gmail.com
Date:   2015-05-01T21:06:52Z

- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in 
create-release.sh
- Updated how the releases are made in the create-release.sh now that the 
default hadoop version is 2.2.0
- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in 
scalastyle
- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in 
run-tests
- Better example given in the hadoop-third-party-distributions.md now that 
the default hadoop version is 2.2.0

commit

[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101474861
  
Perfect, I've been watching all of your conversations. I will make th


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101473151
  
> let's just keep that line there and always suggest that people use a build profile

Sure, no problems with that at all.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101473496
  
K - everything else from your comment makes sense, and fine to put it in 
1.4. It just wasn't immediately obvious to me that it was really broken since 
tests and build were okay, but clearly the versions in the pom were not 
consistent with Hadoop 2.2.0.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101475218
  
In summary: add the line that @pwendell suggested. But I'm not sure about 
the default profiles: should I erase the hadoop-1 profile? Will there be no 
default Hadoop version now? Thanks


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101466543
  
So, I had a chat offline (well, off-github) with Sean and these are my 
conclusions:

- There is a real issue, addressed by this PR, that the default build 
generates an assembly that cannot talk to any version of HDFS.

- In my view, the fix proposed here is the right way forward; it 
standardizes on hadoop-2 as the preferred hadoop version by making it the 
default, and having the default build work with a hadoop 2 cluster.

- The smallest fix for the issue would be to revert back to 1.0.4 as the 
default version. Because we publish effective poms, that would not change the 
version of Hadoop for any artifacts except for spark-parent; that is not a big 
problem because it would only affect someone who depends on `${hadoop.version}` 
and has `spark-parent_2.10` as the parent project of their own project, which 
I'd guess is a very small set of people (if it even exists).

As for whether the default build should work or we should disallow it, I 
don't really have a strong opinion. If there's an easy fix, sure, but if it 
gets complicated, then it's probably not worth it.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101472800
  
Sounds good - however, I think the issues aren't totally decoupled, because 
this pull request deletes the following line from the documentation:

```
# Apache Hadoop 2.2.X
-mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
```

I am suggesting: look, let's just keep that line there and always suggest 
that people use a build profile. Otherwise changes like this will not be 
future-proof.

Changes to default behavior are annoying for developers and I see no 
downside in asking people to be explicit about hadoop flags.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101478579
  
And @srowen, you said some days ago that you knew the places where this PR 
needed a rebase; could you point them out to me, please?


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101486305
  
I think there has to be some default version by definition, and yeah, let's
have the defaults at 2.2.0. But in the instructions I'd just tell people
building for Hadoop 1 to use the -Phadoop-1 profile and tell people building
for 2.2.0 to use the -Phadoop-2.2 profile, etc. I.e. let's not encourage
people to rely on the default behavior.
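
To make the "always be explicit" suggestion concrete, here is a minimal sketch, assuming the version-to-profile mapping from the docs table in this thread (0.23.x, 1.x to 2.1.x, 2.2.x) plus the `hadoop-2.3` and `hadoop-2.4` profiles mentioned in the commit log. Both function names are hypothetical; the script only prints a command line and runs nothing.

```shell
#!/bin/sh
# Hypothetical helper: pick the Hadoop profile for a given hadoop.version.
# Mapping assumed from the building-spark.md table discussed in this PR,
# plus the hadoop-2.3 / hadoop-2.4 profiles named in the commit log.
hadoop_profile_for() {
  case "$1" in
    0.23.*)          printf '%s\n' "hadoop-0.23" ;;
    1.*|2.0.*|2.1.*) printf '%s\n' "hadoop-1" ;;
    2.2.*)           printf '%s\n' "hadoop-2.2" ;;
    2.3.*)           printf '%s\n' "hadoop-2.3" ;;
    2.4.*)           printf '%s\n' "hadoop-2.4" ;;
    *)
      printf 'no known profile for Hadoop %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: print a build command that names the profile explicitly
# instead of relying on whatever the POM defaults happen to be.
explicit_build_cmd() {
  profile=$(hadoop_profile_for "$1") || return 1
  printf '%s\n' "build/mvn -P$profile -Dhadoop.version=$1 -DskipTests clean package"
}
```

For a YARN build, `-Pyarn` would still be added separately, as in the documented example `mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package`.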

On Tue, May 12, 2015 at 6:31 PM, Favio André Vázquez 
notificati...@github.com wrote:

> And @srowen you said some days ago that you
> knew the places that this PR needed a Rebase, could you point them out to
> me please?

[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101445511
  
@vanzin what works when the build says `hadoop.version=1.0.4` that doesn't 
work when the build says `hadoop.version=2.2.0`? Just running on Hadoop 1.x? 
Agreed, but that is no longer supposed to work by default if the default Hadoop 
version is supposed to be 2.x. Whatever the problem is, it is already a problem, 
since the Spark 1.3 POMs already have 2.2.0 specified.

Anyway, maybe that's just violent agreement that something has to be 
tweaked. If this is merged as a resolution for 1.4, OK by me for sure.

I don't like `activeByDefault` merely because it gets disabled if any 
profile is selected, not just a Hadoop-related profile.
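
The behavior described here follows Maven's documented rule for `activeByDefault`: such a profile is active for all builds unless another profile in the same POM is activated. The toy function below models only that rule for explicit `-P` selection so the consequence is easy to see; it is a hypothetical illustration and does not call Maven. On a real checkout, `mvn help:active-profiles` (Maven Help Plugin) reports which profiles are actually active.

```shell
#!/bin/sh
# Toy model (not Maven itself) of the activeByDefault rule: the default profile
# stays active only when no profile is requested with -P, or when it is itself
# one of the requested profiles. JDK/property/file activation is ignored.
default_profile_active() {
  default=$1      # name of the activeByDefault profile, e.g. hadoop-2.2
  requested=$2    # comma-separated -P value, e.g. "yarn,hive" (may be empty)
  if [ -z "$requested" ]; then
    printf '%s\n' yes
    return 0
  fi
  # Exact-match the default name inside the comma-separated list.
  case ",$requested," in
    *",$default,"*) printf '%s\n' yes ;;
    *)              printf '%s\n' no ;;
  esac
}
```

For example, `default_profile_active hadoop-2.2 yarn` prints `no`: selecting only `-Pyarn` would switch off a default Hadoop profile even though YARN has nothing to do with the Hadoop version.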

I think coaching in the docs to always set these Hadoop profiles is maybe 
safer and more overt. Then, the net change would be: everywhere in this PR that 
doesn't say `-Phadoop-x.y` should add `-Phadoop-2.2`, which is actually a no-op 
profile, but then at least it's explicit.

Eventually when, say, Hadoop 1.x support really goes away, the `hadoop-1` 
profile really goes away and breaks command lines that select this profile, 
but, that's good.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101437368
  
Just out of curiosity, I built with no profiles enabled and tried to run spark-shell on 
a real cluster (standalone, since the YARN profile wasn't enabled). It fails pretty 
early with:

15/05/12 15:08:18 INFO AppClient$ClientActor: Executor updated: app-20150512150817-0001/8 is now RUNNING
java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$SetOwnerRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

This is the issue Favio raised (mismatched protobuf versions).


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101439466
  
> but haven't we always generally needed to build Spark for Hadoop X to avoid this?

Correct. But the default now claims to be 2.2, whereas the default build 
(i.e. the build where you do not enable any profiles) will not work on a 
hadoop-2.2 cluster.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101442338
  
> Color me surprised that the default Hadoop 1.0.4 build worked on Hadoop 2.2 (really?) though that's a fair point then.

I think there's some line noise somewhere. I didn't say that at all, and I 
wouldn't expect that to work. But that is not the issue being raised here.

> So, even if the pom.xml went back to saying Hadoop 1.0.4 for 1.4, would this still be a problem for people building against Spark 1.3?

What are you calling a problem here? Reverting that would revert the 
default build to hadoop-1. It would work on a hadoop 1 cluster, but not on a 
hadoop 2 cluster. Which is fine, because that's what the build would suggest.

The current problem is that the default build says it's a hadoop-2.2 
build but it does not work on a hadoop 2.2 cluster.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101442243
  
Hey so I am okay to merge this into 1.4, but what about not having any 
publicly advertised default build and just asking people to always use 
profiles when building Spark in the documentation?

Otherwise every time we change the default version we are likely to make 
someone's life more difficult by silently changing behavior on them.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101459672
  
> We all understand that POMs define how assemblies are built. This can't have nothing to do with them.

You're mixing published POMs with the actual pom files in the build. They 
are not the same thing and that's what's making me incredibly confused.

> You're suggesting reverting to restore the default to work with Hadoop 1.x

No, I'm suggesting fixing it, one way or another, which one doesn't matter.

> but then that trips a different version-related problem: the published POM for Spark 1.3 already references Hadoop 2.2.0.

That's completely unrelated; the build that actually does a mvn deploy to 
update the maven artifacts needs to match the profiles needed to replicate the 
1.3 build. That's completely separate from what the default property values 
are, which is what this PR is about.

To summarize: fixing what the default build should be, or whether we 
should have a default build at all, is irrelevant to the published poms, and 
bringing them up just causes confusion.

This PR exists for a simple reason: the artifacts generated when you build 
with the current default properties are broken. Period.


[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101438765
  
`spark-shell` works locally for me. You're right, this may not work on 
Hadoop cluster X, but haven't we always generally needed to build Spark for 
Hadoop X to avoid this? I get it though, maybe the inconsistent Hadoop client 
libs don't work whereas a consistent Hadoop 1.x client lib set did, even 
against a mismatched cluster version.
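One way to check for the inconsistent Hadoop client libs mentioned above is to list the resolved Hadoop artifacts and see whether they agree on a single version. A minimal sketch, assuming input lines in the `groupId:artifactId:type:version:scope` form that `mvn dependency:list` prints (an artifact with a classifier would shift the version field, which this ignores); the helper names `hadoop_versions` and `hadoop_versions_consistent` are hypothetical.

```shell
#!/bin/sh
# hadoop_versions (hypothetical helper): read dependency-list style lines on
# stdin and print each distinct version of any org.apache.hadoop artifact.
# Assumes coordinates look like groupId:artifactId:type:version:scope
# (no classifier), so the version is the 4th colon-separated field.
hadoop_versions() {
  grep -o 'org\.apache\.hadoop:[^ ]*' |
    awk -F: '{ print $4 }' |
    sort -u
}

# hadoop_versions_consistent (hypothetical): succeed only if exactly one
# Hadoop version appears in the input read from stdin.
hadoop_versions_consistent() {
  # tr strips the padding some wc implementations add before the count.
  n=$(hadoop_versions | wc -l | tr -d ' ')
  [ "$n" -eq 1 ]
}
```

Piping `mvn dependency:list` output into `hadoop_versions_consistent` would then report whether a given build resolves a single Hadoop version.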

Fair point and all that but this isn't the right way to build Spark anyway, 
and I'm afraid this change was effectively already released. I'm narrowly 
arguing against undoing the `hadoop.version=2.2.0` change. I'm also asserting 
that the 1.4 release artifacts will be fine.

And then saying we should fix-forward the rest of this for 1.5, if not 1.4.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101443231
  
> but what about not having any publicly advertised default build

That's ok too, but it should be enforced somehow, e.g. have a profile with 
`activeByDefault` set to `true` that causes the build to fail with an error 
message.
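The enforcement suggested above would live in the POM itself; Maven deactivates an `activeByDefault` profile as soon as another profile in the same POM is activated, so such a profile only fires when nothing else is selected. As a rough shell analogue only (not the Maven mechanism), a hypothetical wrapper script could refuse to start a build unless a Hadoop profile is named explicitly. The function name and the rule "any `-P` profile starting with `hadoop-`" are assumptions for illustration.

```shell
#!/bin/sh
# require_hadoop_profile (hypothetical): succeed if any argument selects a
# profile whose name starts with "hadoop-", e.g. -Phadoop-1 or
# -Phive,hadoop-2.2; otherwise print an error to stderr and fail.
# Only the attached form (-Pname) is recognized, not "-P name".
require_hadoop_profile() {
  for arg in "$@"; do
    case "$arg" in
      -P*)
        # Split the comma-separated profile list that follows "-P".
        profiles=$(printf '%s' "${arg#-P}" | tr ',' ' ')
        for p in $profiles; do
          case "$p" in
            hadoop-*) return 0 ;;
          esac
        done
        ;;
    esac
  done
  echo "error: no Hadoop profile given; pass one explicitly, e.g. -Phadoop-2.2" >&2
  return 1
}
```

For example, a wrapper could run `require_hadoop_profile "$@" && mvn "$@"`.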



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101458849
  
We all understand that POMs define how assemblies are built. This can't 
have nothing to do with them.

As I say, the build works fine in, say, local mode (well, no obvious 
problems). Jenkins tests are happy since they don't rely on defaults. It's not 
true that nothing works, but there's a problem. This is narrowly about the 
Hadoop user's perilous expectations of defaults.

I don't expect the default assembly to work on a Hadoop 1.x cluster, but 
it's not supposed to now in Spark 1.4. You're suggesting reverting to restore 
the default to work with Hadoop 1.x, but then that trips a different 
version-related problem: the published POM for Spark 1.3 already references 
Hadoop 2.2.0. Fixing that may make the default assembly work for Hadoop 1.x 
again as it did in Spark 1.2, but then it yet again changes the transitive deps 
of anyone relying on Spark Core artifacts in Maven. This is why I don't think 
reverting to `hadoop.version=1.0.4` is a good solution, and maybe that is the 
only point still being batted around.

But Spark 1.4 is in a no-man's-land where the defaults don't work on 1.x 
(expected) and apparently don't quite work on 2.x (not expected). You'd think 
that at least one does. That's plainly suboptimal, and while not a 
show-stopper, needs fixing. I don't think anyone disputes that this PR would do 
the trick. Further, I like the idea of encouraging people to do the right 
thing, what the release has always safely done: specify Hadoop profile when it 
matters.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101440685
  
Color me surprised that the default Hadoop 1.0.4 build worked on Hadoop 2.2 
(really?) though that's a fair point then. 

So, even if the `pom.xml` went back to saying Hadoop 1.0.4 for 1.4, would 
this still be a problem for people building against Spark 1.3? Because that 
`pom.xml` already had this change. It seems weird to go back but not impossible.

This turns into a stronger argument for merging for 1.4, I suppose.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101443347
  
Sure - that's good too. We can explicitly require it. But even just 
changing the docs to not mention default behavior would IMO be good.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101445976
  
Sean, I think you're confused. Or at least you're confusing the hell out of 
me.

This has nothing to do with the POMs. This has to do with the final 
assembly generated by the build.

If you build currently without specifying *anything* - no `-D` overrides, 
no profiles, no nothing - you end up with a broken build that doesn't work 
anywhere. That's all.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101511508
  
I see, @pwendell. I'll push the changes tomorrow; it's a little late here in 
Venezuela.

Greetings and thanks.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101284746
  
@pwendell I don't think the previous update was wrong, certainly not for 
development. It was insufficient for creating a Hadoop 2.2 assembly from 
defaults, but that's not how Hadoop 2.2 assemblies are created. In that sense, 
this is not required for release 1.4 to be as correct as ever.

Still the idea is that it would be better to make the default fully 
consistent, as if it were ready for a Hadoop 2.2 assembly.

I think the cat is out of the bag on #5027; I believe 1.3 was accidentally 
released with, effectively, this change? So I don't think we should undo that, 
certainly not if it's solving more problems than it causes.

(This is not at all about building for CDH.)

This doesn't remove any profiles in order to reduce impact on build 
scripts, yes -- otherwise `-Phadoop-2.2` would start being an error. However it 
must add a `hadoop-1` profile to allow selecting the Hadoop 1.x settings. This 
profile has always silently existed as the unofficial collection of defaults. 
Adding it does indeed require a developer change -- but only for those who need 
to build for Hadoop 1.x explicitly. It at least makes this explicit.

The cleanup is appealing, of course.

I would campaign modestly for introducing this into 1.4. If the above 
hasn't swayed your second opinion here though, then let's just do nothing for 
1.4, and put this into master for 1.5. By that point I think the case will be 
stronger still, and there will have been time to get used to the change for the 
small subset of people who need to build for 1.x.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-11 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101090681
  
Hey @FavioVazquez and @srowen. I took a look at this. A few questions:

1. Does this mean that #5027 was just wrong? I guess I don't see how things 
worked before this patch.
2. It's actually a pain for users when the default build changes. Why not 
just keep a -Phadoop-2.2 profile in the instructions? I wonder if we should 
just always advise users to use a Hadoop profile when building. Otherwise, 
we'll have to go to people and get them to change things, just like we are 
here. 
3. Should we just merge this into master and then just revert #5027 in 
branch-1.4? From what I understand the change upgrading the Hadoop version was 
just to make it more convenient for IDE importing. Hardly a user-facing 
feature. Also, I think it would be good to e-mail the dev list and explain that 
the default build behavior is changing.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-11 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101134292
  
I think you guys are all right; you have suggested some great changes, and 
I'll let @srowen and @vanzin, along with you, @pwendell, decide the future of 
this PR. In my humble opinion it could be good, but it's all up to you guys.

I'll keep an eye on the comments on this PR; please let me know if there is 
anything I can help with, whether making this patch better or fixing these 
issues another way.

Thanks for teaching me great stuff.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-11 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101126703
  
I think @srowen saw this PR as a cleanup of old dependencies and an update 
of Spark's defaults to a currently used Hadoop version. This started as a minor 
fix for inconsistencies in the Hadoop defaults when using the latest CDH5 
distribution, and grew into an upgrade of the default Hadoop version, updates 
to the docs, and a cleanup of YARN's POM and the main POM. I still face problems 
when building Spark for CDH5 without these changes, and I think it would be 
helpful to update the versions, since Hadoop 1 is really old, and I really 
believe it brings Spark up to date with the newest technologies.

I'm no expert in this field, but I think this PR could be interesting and 
useful for a lot of people who are starting with these technologies and would 
like to build Spark with the newest Hadoop version. I have to point out that if 
you use the current build process and main POM, you'll get errors when trying 
to connect to Cloudera's newest HDFS; you can see that at the beginning of the 
PR. It's really awkward to build Spark with lots of ad hoc and in situ 
dependencies just to keep old versions, though maybe it's just me.

I really appreciated @srowen's and @vanzin's help with this, and would like to 
know if you think this is the right track for Spark 1.4.0, @pwendell. I'm open 
to making any more changes and updates you think are necessary, and I repeat, I 
think this could be a good refresh of Spark's dependencies. I know this is 
really a minor change, but it could grow into an even better update.

Thanks for your comments, I'll wait for your replies. 



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-11 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101131567
  
Hey @FavioVazquez - thanks for commenting. I'm happy to have a patch that 
makes the default settings more coherent. I just looked into the other pull 
request to better understand the origins (#5783). 

I was just a bit confused because it seems like if you were building for 
CDH 5.3 you would need non default settings anyways. But it seems like this was 
not specifically related to your issue and instead some clean-up suggested by 
@vanzin.

My suggestion was maybe to just make this change in master rather than 
putting it into the 1.4 branch, since I see basically no benefit to this other 
than tidiness and it introduces some natural risk of mucking around build 
stuff. Also, if we are going to make build changes that require developers to 
build Spark differently, I think we should give ample warning. And I suggested 
we retain the existing profiles in our documentation in order to avoid having 
to keep changing developer habits every time we bump the hadoop version.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-07 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-99771339
  
This will take some time to review. It needs another rebase in the meantime 
I'm afraid. You can see that here in the UI.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-06 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-99705433
  
Hello @srowen, any progress on coordinating (if/when) the merging of this PR? 
Thanks



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-04 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98834110
  
This patch will need a rebase now, since unfortunately I merged a change 
that will conflict with it. It should be a simple resolution since I know the 
one line that will conflict. After you rebase and force-push, it's ready again.

It is still not entirely clear when/if to merge this since it will need 
some coordination and a release is happening. So you can sit tight.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-04 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98835084
  
OK, I see, @srowen. Thank you.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-04 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98831666
  
Hello @srowen, I'm not sure about the next steps you mentioned; could you 
please explain to me what's going to happen now with the PR? Thanks.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-03 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98477569
  
Looking good. I think I will need to coordinate closely with @pwendell on 
this one since it would be useful to put in before the release, but also, will 
have some tiny implications for the release process. It finishes the process of 
making Hadoop 2.2 the default and that helps simplify a number of things here, 
so I like it.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-03 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98498096
  
Great, @srowen, please let me know if I can help with anything else in this 
patch.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-02 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98347073
  
I noticed a few more things that need to be updated. Kind of confusingly, 
the Spark build conflates Hadoop 1.0 with Hadoop 2.0 and 2.1, so you'll need to 
add `-Phadoop-1` to a few more places. All of these need it, I think:

create-release.sh:

    make_binary_release "cdh4" "-Phive -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0" "3032"

building-spark.md:

    # Apache Hadoop 1.2.1
    mvn -Dhadoop.version=1.2.1 -DskipTests clean package

    # Cloudera CDH 4.2.0 with MapReduce v1
    mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package

dev/run-tests:

    if [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop1.0" ]; then
      export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=1.0.4"
    elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.0" ]; then
      export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=2.0.0-mr1-cdh4.1.1"


Soon I want to get rid of this unsupported CDH4 info/profile (and CDH*3* 
docs! surely Spark hasn't worked with that in a long time). Separate issue.
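Following the suggestion above to add `-Phadoop-1` in these places, the two `dev/run-tests` branches quoted earlier might end up looking roughly like this. A sketch only: wrapping the logic in a function named `sbt_maven_profiles_args` is hypothetical (the real script exports the variable directly), and the script's other profile branches are not shown in the comment, so they are left out here.

```shell
#!/bin/sh
# sbt_maven_profiles_args (hypothetical): print the Maven/SBT arguments for a
# Jenkins build profile name, with -Phadoop-1 added to the Hadoop 1.0 and
# 2.0 builds as suggested above. Other profile names are outside this sketch
# and print an empty line.
sbt_maven_profiles_args() {
  case "$1" in
    hadoop1.0) printf '%s\n' "-Phadoop-1 -Dhadoop.version=1.0.4" ;;
    hadoop2.0) printf '%s\n' "-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1" ;;
    *) printf '\n' ;;
  esac
}

# The ":-" default keeps this line safe when the variable is unset.
export SBT_MAVEN_PROFILES_ARGS="$(sbt_maven_profiles_args "${AMPLAB_JENKINS_BUILD_PROFILE:-}")"
```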



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-02 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98347284
  
@shaneknapp if this goes in, then the essence of the config change is this: 
anything building for Hadoop 1.x, 2.0, or 2.1 now needs `-Phadoop-1` to 
maintain the same config. Anything that used to specify `-Phadoop-2.2 
-Dhadoop.version=2.2.0` doesn't need to, as that's the default. Anything that 
doesn't specify profiles will get Hadoop 2.2.

I believe this therefore just affects the Spark-Master-Maven-pre-YARN job 
in Jenkins, and certainly only affects `master`. Both of those jobs could use 
`-Phadoop-1`.

Looks like we don't have a Hadoop 2.2-specific build? That's fine, but then 
there's nothing along those lines to change.
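The rule summarized above can be written as a small lookup from a target Hadoop version to the Maven flags it needs after this change. A minimal sketch: the function name `hadoop_build_flags` is hypothetical, and it covers only the three cases stated in the comment (1.x/2.0/2.1, 2.2, and no profile at all); any other version is reported as not covered rather than guessed.

```shell
#!/bin/sh
# hadoop_build_flags (hypothetical): print the Maven flags needed for a
# target Hadoop version once this change is in, per the summary above.
hadoop_build_flags() {
  case "$1" in
    1.*|2.0.*|2.1.*)
      # Hadoop 1.x, 2.0 and 2.1 now need the new hadoop-1 profile.
      printf '%s\n' "-Phadoop-1 -Dhadoop.version=$1" ;;
    2.2.0)
      # Hadoop 2.2 is the default, so no flags are required.
      printf '\n' ;;
    *)
      echo "hadoop_build_flags: $1 is not covered by this sketch" >&2
      return 1 ;;
  esac
}
```

With it, a build line could read `mvn $(hadoop_build_flags 1.2.1) -DskipTests clean package`; the substitution is left unquoted on purpose so the flags split into separate arguments.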



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-02 Thread FavioVazquez

Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98371065
  
Hello @srowen, I was noticing some of those things. Thank you for making it 
easy for me to change them. I just pushed your suggested changes.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-02 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98379921
  
ok to test



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-02 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98380193
  
  [Test build #31667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31667/consoleFull) for PR 5786 at commit [`31bdafa`](https://github.com/apache/spark/commit/31bdafad21674fe5bc582fa678753454b04026ad).



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-02 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98384948
  
  [Test build #31667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31667/consoleFull) for PR 5786 at commit [`31bdafa`](https://github.com/apache/spark/commit/31bdafad21674fe5bc582fa678753454b04026ad).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98384951
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98384952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31667/



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98380139
  
Merged build started.



[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...