[GitHub] spark issue #19190: [SPARK-21976][DOC] Fix wrong documentation for Mean Abso...

2017-09-12 Thread FavioVazquez
Github user FavioVazquez commented on the issue:

https://github.com/apache/spark/pull/19190
  
Thanks to Carlos Munguia, Jared Romero and Christhian Flores :). 
@montactuaria @jared275 @chris122flores


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19190: [SPARK-21976][DOC]

2017-09-11 Thread FavioVazquez
GitHub user FavioVazquez opened a pull request:

https://github.com/apache/spark/pull/19190

[SPARK-21976][DOC]

## What changes were proposed in this pull request?

Fixed wrong documentation for Mean Absolute Error.

Even though the code is correct for the MAE:

```scala
@Since("1.2.0")
def meanAbsoluteError: Double = {
  summary.normL1(1) / summary.count
}
```
In the documentation the division by N is missing.
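
For reference, a sketch of the standard MAE definition that the code above appears to implement. The symbols $y_i$ (observed value), $\hat{y}_i$ (predicted value) and $N$ (number of observations) are notation introduced here, and reading `summary.normL1(1)` as the L1 norm of the error column is an assumption based on the snippet:

```latex
\mathrm{MAE}
  = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert
  = \frac{\lVert \mathbf{y} - \hat{\mathbf{y}} \rVert_1}{N}
```

Without the $1/N$ factor the expression is only the sum of absolute errors, which matches the omission described above.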

## How was this patch tested?

All Spark tests were run.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FavioVazquez/spark mae-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19190.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19190


commit b2e2f8caef4f43d62cb48fc09a0fe17e71f3f4dd
Author: FavioVazquez 
Date:   2015-04-30T19:46:40Z

Merge remote-tracking branch 'upstream/master'

commit edab1ef07312cf44e26fccb7fca8f4a4977ad3ee
Author: FavioVazquez 
Date:   2015-05-05T14:16:02Z

Merge remote-tracking branch 'upstream/master'

commit 9af7074235b6c13001924e037772195b640115b8
Author: FavioVazquez 
Date:   2015-05-15T13:58:04Z

Merge remote-tracking branch 'upstream/master'

commit f27a20b9f53b643d5c963729f56ee548d0c8e263
Author: FavioVazquez 
Date:   2015-06-04T16:10:00Z

Merge remote-tracking branch 'upstream/master'

commit ad882a378e24458832b961fd97eb4b7662203ef9
Author: FavioVazquez 
Date:   2015-06-09T12:47:59Z

Merge remote-tracking branch 'upstream/master'

commit 424a92853f44c34f310e0a9e8dd927d246bd9171
Author: FavioVazquez 
Date:   2015-06-17T14:56:35Z

Merge remote-tracking branch 'upstream/master'

commit 5311719db61454eee5e1715f44507c294509ec1c
Author: FavioVazquez 
Date:   2015-09-21T06:43:29Z

Merge remote-tracking branch 'upstream/master'

commit d6b551b1d25cef07096b7e1fc22b659ed753d9dc
Author: faviovazquez 
Date:   2017-09-11T13:36:27Z

Merge remote-tracking branch 'upstream/master'

commit a95cfc6c5e88f44429319b52462b537ae0bc1857
Author: Favio André Vázquez 
Date:   2017-09-11T13:49:03Z

Fix doc for MAE







[GitHub] spark pull request: [SPARK-8274][Documentation-MLlib] Fix wrong UR...

2015-06-09 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/6722#issuecomment-110371513
  
Thanks @srowen!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request: [SPARK-8274][Documentation-MLlib] Fix wrong UR...

2015-06-09 Thread FavioVazquez
GitHub user FavioVazquez opened a pull request:

https://github.com/apache/spark/pull/6722

[SPARK-8274][Documentation-MLlib] Fix wrong URLs in MLlib Frequent Pattern 
Mining Documentation

There is a mistake in the URLs of the Scala section of FP-Growth in the 
MLlib Frequent Pattern Mining documentation. The URL points to 
https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/fpm/FPGrowth.html
which is the Java API; the link should point to the Scala API: 
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowth

There is another mistake for FPGrowthModel in the same section: the link 
points, again, to the Java API 
https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/fpm/FPGrowthModel.html
whereas it should point to the Scala API: 
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowthModel

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FavioVazquez/spark fix-wrog-urls-mllib-fpgrowth

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6722


commit b2e2f8caef4f43d62cb48fc09a0fe17e71f3f4dd
Author: FavioVazquez 
Date:   2015-04-30T19:46:40Z

Merge remote-tracking branch 'upstream/master'

commit edab1ef07312cf44e26fccb7fca8f4a4977ad3ee
Author: FavioVazquez 
Date:   2015-05-05T14:16:02Z

Merge remote-tracking branch 'upstream/master'

commit 9af7074235b6c13001924e037772195b640115b8
Author: FavioVazquez 
Date:   2015-05-15T13:58:04Z

Merge remote-tracking branch 'upstream/master'

commit f27a20b9f53b643d5c963729f56ee548d0c8e263
Author: FavioVazquez 
Date:   2015-06-04T16:10:00Z

Merge remote-tracking branch 'upstream/master'

commit ad882a378e24458832b961fd97eb4b7662203ef9
Author: FavioVazquez 
Date:   2015-06-09T12:47:59Z

Merge remote-tracking branch 'upstream/master'

commit e1ca54dfa69179758166e059177e11155f0310b3
Author: FavioVazquez 
Date:   2015-06-09T12:55:50Z

- Fixed wrong URLs in MLlib Frequent Pattern Mining, FP-Growth Scala section







[GitHub] spark pull request: [SPARK-7671] Fix wrong URLs in MLlib Data Type...

2015-05-15 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/6196#issuecomment-102553184
  
Happy to help @jkbradley





[GitHub] spark pull request: [SPARK-7671] Fix wrong URLs in MLlib Data Type...

2015-05-15 Thread FavioVazquez
GitHub user FavioVazquez opened a pull request:

https://github.com/apache/spark/pull/6196

[SPARK-7671] Fix wrong URLs in MLlib Data Types Documentation

There is a mistake in the URL of Matrices in the MLlib Data Types 
documentation (Local matrix, Scala section). The URL points to 
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices
which is wrong, since Matrices is an object that implements factory 
methods for Matrix and does not have a companion class. The correct link 
should point to 
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$

There is another mistake in the Local Vector section, in Scala, Java and 
Python:

In the Scala section the URL of Vectors points to the trait Vector 
(https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector)
 and not to the factory methods implemented in Vectors. 

The correct link should be: 
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$

In the Java section the URL of Vectors points to the Interface Vector 
(https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html)
 and not to the Class Vectors

The correct link should be:

https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vectors.html

In the Python section the URL of Vectors points to the class Vector 
(https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vector)
 and not the Class Vectors

The correct link should be:

https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FavioVazquez/spark fix-typo-matrices-mllib-datatypes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6196


commit b2e2f8caef4f43d62cb48fc09a0fe17e71f3f4dd
Author: FavioVazquez 
Date:   2015-04-30T19:46:40Z

Merge remote-tracking branch 'upstream/master'

commit edab1ef07312cf44e26fccb7fca8f4a4977ad3ee
Author: FavioVazquez 
Date:   2015-05-05T14:16:02Z

Merge remote-tracking branch 'upstream/master'

commit 9af7074235b6c13001924e037772195b640115b8
Author: FavioVazquez 
Date:   2015-05-15T13:58:04Z

Merge remote-tracking branch 'upstream/master'

commit 3e9efd56b8d6436f2985db24e2074a2662f3ed89
Author: FavioVazquez 
Date:   2015-05-15T18:31:49Z

- Fixed wrong URLs in the MLlib Data Types Documentation







[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102049341
  
All tests passed, @srowen. It was, as expected, an unrelated error. Is 
everything now set to merge this PR? 






[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-102002721
  
This has happened before, @srowen; I think this is again an unrelated 
failure. Could you ask Jenkins to retest this, please?





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101981241
  
I'm happy to help, give me a sec and I'll push the changes





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-14 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101980296
  
Are you sure, Sean? I could make the change and push it, but if it is easier 
to make the change during the merge, just tell me. 





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101601409
  
I hope that this is the correct way of making all the changes you 
suggested. Please check this and thank you @srowen @vanzin and @pwendell. Let 
me know if there is something else that could be done, or if this finishes the 
patch.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-13 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101565704
  
Great, I'm familiar with the process, @srowen. Thank you guys for all the 
suggestions; I'm making the changes and will push them soon.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-12 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101511508
  
I see, @pwendell. I'll push the changes tomorrow; it is a little late here in 
Venezuela.

Greetings and thanks.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-12 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101478579
  
And @srowen, you said some days ago that you knew the places where this PR 
needed a rebase; could you point them out to me, please?





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-12 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101475218
  
In summary: add the line that @pwendell suggested. But I'm not sure about 
the default profiles: should I erase the hadoop-1 profile? Will there be no 
default Hadoop version now? Thanks.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-12 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101474934
  
I will make the suggested changes and push them.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-12 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101474903
  
Sorry, I closed it by accident.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-12 Thread FavioVazquez
Github user FavioVazquez closed the pull request at:

https://github.com/apache/spark/pull/5786





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-12 Thread FavioVazquez
GitHub user FavioVazquez reopened a pull request:

https://github.com/apache/spark/pull/5786

[SPARK-7249] Updated Hadoop dependencies due to inconsistency in the 
versions

Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons.

These changes were proposed by @vanzin following the previous pull request 
https://github.com/apache/spark/pull/5783, which did not fix the problem 
correctly.

Please let me know if this is the correct way of doing this; the comments 
from @vanzin are in the pull request mentioned above. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FavioVazquez/spark update-hadoop-dependencies

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5786


commit ec91ce3c405123818a4c56ef361d9cc82951677d
Author: FavioVazquez 
Date:   2015-04-29T17:58:09Z

- Updated protobuf-java version of com.google.protobuf dependency to fix 
blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix 
for 2.5.0-cdh5.3.3 version)

commit 660decce9d3c2300aee493b605da0da8a74b3ea6
Author: FavioVazquez 
Date:   2015-04-29T19:16:04Z

- Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons

commit 7e9955df29b5d5c9cda950636d51da753e6d17ea
Author: FavioVazquez 
Date:   2015-04-29T19:35:08Z

- Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons

commit 6b4bfafbe4f98c92ac2fe7aeb5f36a37d27a9678
Author: FavioVazquez 
Date:   2015-04-30T21:41:08Z

- Cleanup in hadoop-2.x profiles since they contained mostly redundant 
stuff.

commit 13542929c9cb3ddfec31bbb794e490b44c273df4
Author: FavioVazquez 
Date:   2015-04-30T22:13:50Z

- Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests 
and documentation

commit 287fa2ffc31bb0c9eaf5daf80825ff0093f3f20d
Author: FavioVazquez 
Date:   2015-04-30T22:17:44Z

- Updated documentation about specifying the hadoop version in 
building-spark. Now it is clear that Spark will build against Hadoop 2.2.0 by 
default.
- Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark 
doc.

commit 70b8344dcad8f6de71bd6356cd6eec375211fdb3
Author: FavioVazquez 
Date:   2015-04-30T22:57:16Z

- Fixed typo in the make-distribution.sh file and added hadoop-1 in the 
Related profiles

commit 88a8b88a13a02cbde04792cb63e3c6a81407d915
Author: FavioVazquez 
Date:   2015-05-01T16:48:27Z

- Simplified Hadoop profiles due to new setting of global properties in the 
pom.xml file
- Added comment to specify that the hadoop-2.2 profile is now the default 
hadoop profile in the pom.xml file
- Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the 
make-distribution.sh file

commit 199f40b1733015a414eb928b2090f3bf4d0b7a7e
Author: FavioVazquez 
Date:   2015-05-01T20:44:30Z

- Erased unnecessary CDH5-specific note in docs/building-spark.md
- Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in 
docs/building-spark.md
- Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now 
the default. Added comment in the yarn/pom.xml to specify that.

commit a6507792cc12fc03139be825357f22329773c823
Author: FavioVazquez 
Date:   2015-05-01T20:50:46Z

- Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml
- Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set 
in avro.mapred.classifier in pom.xml

commit 0470587ad7af93041e25dcb07954b835d9508a10
Author: FavioVazquez 
Date:   2015-05-01T21:06:52Z

- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in 
create-release.sh
- Updated how the releases are made in the create-release.sh now that the 
default hadoop version is the 2.2.0
- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in 
scalastyle
- Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in 
run-tests
- Better example given in the hadoop-third-party-distributions.md now that 
the default hadoop version is 2.2.0

commit fda6a51986ed4a656d37539502cd4684c46c8cfe
Author: FavioVazquez 
Date:   2015-05-01T21:42:04Z

- Updated hadoop1 releases in create-release.sh  due to changes in the 
default hadoop version set
- Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh
- Prettify

[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-12 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101474861
  
Perfect, I've been watching all of your conversations. I will make th





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-11 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101134292
  
I think that you guys are all right; you have suggested some great changes, 
and I think I'll let @srowen and @vanzin, with you @pwendell, decide the 
future of this PR. In my humble opinion it could be good, but it is all up to 
you guys. 

I'll keep an eye on the comments on this PR; please let me know if there is 
something I can help with, whether making this patch better or fixing these 
issues in another way. 

Thanks for teaching me great stuff.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-11 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-101126703
  
I think @srowen saw this PR as a cleanup of old dependencies and an update 
of Spark's defaults to a currently used Hadoop version. This started as a 
minor fix for inconsistencies in the Hadoop defaults when using the latest 
CDH5 distribution, and grew into an upgrade of the default Hadoop version, an 
update of the docs, and a cleanup of YARN's POM and the main POM. I still face 
problems when building Spark for CDH5 without these changes, and I think it 
would be helpful to update the versions, since Hadoop-1 is really old, and I 
really believe it brings Spark up to date with the newest technologies.

I'm no expert in this field, but I think this PR could be interesting and 
useful for a lot of people who are starting with these technologies and would 
like to build Spark with the newest Hadoop version. I should point out that if 
you use the current build process and main POM, you'll get errors when you try 
to connect to Cloudera's newest HDFS; you can see that at the beginning of the 
PR. It's really awkward to build Spark with lots of ad hoc and in situ 
dependencies just to keep old versions; I don't know, maybe it's just me. 

I really appreciate the help from @srowen and @vanzin with this, and would 
like to know if you think this is on the right track for Spark 1.4.0, 
@pwendell. I'm open to making any further changes and updates if you think 
they are necessary, and I repeat, I think this could be a good refresh of 
Spark's dependencies. I know this is really a minor change, but it could grow 
into an even better update.

Thanks for your comments; I'll wait for your replies. 





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-06 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-99705433
  
Hello @srowen, any progress on coordinating (if/when) the merging of this 
PR? Thanks.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-04 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98835084
  
OK, I see, @srowen. Thank you.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-04 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98831666
  
Hello @srowen, I'm not sure about the next steps you mentioned. Could you 
please explain to me what's going to happen now with the PR? Thanks.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-03 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98498096
  
Great, @srowen. Please let me know if I can help with anything else in this 
patch.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-02 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98394731
  
@srowen @vanzin everything seems fine and the tests passed.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-02 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98371065
  
Hello @srowen, I had noticed some of those things. Thank you for making it 
easy for me to change them. I just pushed your suggested changes.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98277843
  
 @vanzin @srowen thank you for helping me a lot with this patch. Let me 
know if this finishes the patch please.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98268024
  
Hope that's what you have been talking about @srowen @vanzin





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98262625
  
So should I remove the yarn profile from the root POM and move the entire 
profile into yarn/pom.xml?





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98261176
  
You are right, @vanzin. I just pushed those changes. I'll wait for @srowen's 
comments.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98260176
  
@vanzin I think that's what you meant. Please check whether everything is OK now.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98259020
  
I see. I'll do that right away





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98258799
  
OK @vanzin, so I should change the yarn config in the main POM and leave 
yarn/pom.xml as it was?





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98257656
  
Great, @srowen. Let's wait for @vanzin's comments on this.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98251739
  
Please let me know if this completes the changes in the patch. Thank you.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98244148
  
Yes, I figured it out a second before you told me, by looking at the other 
implementations. Thanks. I'll push the changes in a second.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98242483
  
Great. My confusion is with create-release: what has to be changed there? 
So the names of the releases should be kept, but the implementation should 
change? I'm not quite sure what you meant there, @srowen.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98242063
  
Yes, I saw the hadoop1.jar there as well. Should I keep it like this, or 
change the hadoop-1 profile to hadoop2 as well?






[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98239441
  
Please check whether these are the changes that you suggested, @vanzin @srowen.

Thanks.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-05-01 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98212186
  
Great, I'll do that right away





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-04-30 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-98009452
  
Are these changes what is needed to complete the patch?





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-04-30 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-97959873
  
Oh I see. Thanks. Should I keep the PR open then?





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-04-30 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-97934646
  
Hi, now that the tests have passed, what happens next? I've read the 
documentation at 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest
but I'm still not sure. Will this be merged to master? Should I do something 
else, or just keep the PR open?

Thanks.





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-04-29 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-97618915
  
In the link 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31337/, 
there is no failure in the tests, but in the console output the error thrown 
was this:

FAIL: test_count_by_value_and_window (__main__.WindowFunctionTests)
--
Traceback (most recent call last):
  File "pyspark/streaming/tests.py", line 418, in test_count_by_value_and_window
    self._test_func(input, func, expected)
  File "pyspark/streaming/tests.py", line 133, in _test_func
    self.assertEqual(expected, result)
AssertionError: Lists differ: [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]] != [[1], [2], [3], [4], [5], [6], [6], [6], [6]]

First list contains 1 additional elements.
First extra element 9:
[6]

- [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]]
?  -

+ [[1], [2], [3], [4], [5], [6], [6], [6], [6]]

--
Ran 40 tests in 134.429s

FAILED (failures=1)
('timeout after', 20)
('timeout after', 20)
('timeout after', 20)
('timeout after', 5)
Had test failures; see logs.
[error] Got a return code of 255 on line 240 of the run-tests script.
Archiving unit tests logs...

So I'm not sure what happened.
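
For anyone reading the log above, here is a minimal sketch of what the assertion is reporting. It only replays the two lists copied from the console output; the variable names are illustrative and it does not run the real Spark streaming test. It shows that the expected list has one more batch than the collected result. The `('timeout after', ...)` lines may mean the test timed out before the last batch arrived, but that is an assumption the log does not confirm.

```python
# Lists copied verbatim from the AssertionError in the console output.
expected = [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]]
result = [[1], [2], [3], [4], [5], [6], [6], [6], [6]]

# Elements present in `expected` beyond the end of `result`.
# The log calls this "First extra element 9".
extra = expected[len(result):]

print(len(expected), len(result), extra)
```

The first nine batches agree, so the difference is a missing final batch rather than a wrong count.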





[GitHub] spark pull request: [SPARK-7249] Updated Hadoop dependencies due t...

2015-04-29 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5786#issuecomment-97618173
  
Can someone please explain to me what failed? I'm fairly new to this and 
I'm not sure what went wrong. I'd like to know so that I can fix it and maybe 
create another pull request. Thanks.





[GitHub] spark pull request: Updated Hadoop dependencies due to inconsisten...

2015-04-29 Thread FavioVazquez
GitHub user FavioVazquez opened a pull request:

https://github.com/apache/spark/pull/5786

Updated Hadoop dependencies due to inconsistency in the versions

Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons.

These changes were proposed by @vanzin in the previous pull request, 
https://github.com/apache/spark/pull/5783, which did not fix the problem 
correctly.

Please let me know if this is the correct way of doing this. @vanzin's 
comments are in the pull request mentioned above.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FavioVazquez/spark update-hadoop-dependencies

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5786


commit ec91ce3c405123818a4c56ef361d9cc82951677d
Author: FavioVazquez 
Date:   2015-04-29T17:58:09Z

- Updated protobuf-java version of com.google.protobuf dependancy to fix 
blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix 
for 2.5.0-cdh5.3.3 version)

commit 660decce9d3c2300aee493b605da0da8a74b3ea6
Author: FavioVazquez 
Date:   2015-04-29T19:16:04Z

- Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons

commit 7e9955df29b5d5c9cda950636d51da753e6d17ea
Author: FavioVazquez 
Date:   2015-04-29T19:35:08Z

- Updated Hadoop dependencies due to inconsistency in the versions. Now the 
global properties are the ones used by the hadoop-2.2 profile, and the profile 
was set to empty but kept for backwards compatibility reasons







[GitHub] spark pull request: [SPARK-7238] Update protobuf-java version of c...

2015-04-29 Thread FavioVazquez
Github user FavioVazquez closed the pull request at:

https://github.com/apache/spark/pull/5783





[GitHub] spark pull request: [SPARK-7238] Update protobuf-java version of c...

2015-04-29 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5783#issuecomment-97545038
  
Thank you for clearing that up for me. I'm making the changes that you 
suggested and will open a pull request soon.





[GitHub] spark pull request: [SPARK-7238] Update protobuf-java version of c...

2015-04-29 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5783#issuecomment-97529157
  
I see. So it should be done on the command line when building? The problem 
is that the pre-compiled version for CDH doesn't work with 2.5.0-cdh5.3.3, 
because protobuf inherits its version from the global property, which is fixed 
at 2.4.1, and that throws an error. With this change it worked.





[GitHub] spark pull request: [SPARK-7238] Update protobuf-java version of c...

2015-04-29 Thread FavioVazquez
GitHub user FavioVazquez opened a pull request:

https://github.com/apache/spark/pull/5783

[SPARK-7238] Update protobuf-java version of com.google.protobuf dependancy

This upgrade is needed when building spark for CDH5 2.5.0-cdh5.3.3 due to 
incompatibilities in the protobuf version used by com.google.protobuf and the 
one used in hadoop. The default version of protobuf is set to 2.4.1 in the 
global properties, and this is stated in the pom.xml file:



So this upgrade will only affect the com.google.protobuf version of 
protobuf-java. Tested with the Cloudera distribution 2.5.0-cdh5.3.3 using Mesos 
0.22.0 in cluster mode.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FavioVazquez/spark upgrade-protobuf-version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5783


commit ec91ce3c405123818a4c56ef361d9cc82951677d
Author: FavioVazquez 
Date:   2015-04-29T17:58:09Z

- Updated protobuf-java version of com.google.protobuf dependancy to fix 
blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix 
for 2.5.0-cdh5.3.3 version)







[GitHub] spark pull request: [SPARK-6972][SQL] Add Coalesce to DataFrame

2015-04-29 Thread FavioVazquez
Github user FavioVazquez commented on the pull request:

https://github.com/apache/spark/pull/5545#issuecomment-97453694
  
How can I use this? I've built Spark from the latest master, which 
contains this code, but IntelliJ still doesn't recognize 
coalesce(1).

