[GitHub] spark pull request #17874: [SPARK-20612][SQL] Throw exception when there is ...

2017-05-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17874#discussion_r115135600
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1023,8 +1023,6 @@ class Analyzer(
* clause.  This rule detects such queries and adds the required attributes to the original
* projection, so that they will be available during sorting. Another projection is added to
* remove these attributes after sorting.
-   *
-   * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
--- End diff --

This is by design. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17874: [SPARK-20612][SQL] Throw exception when there is unresol...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17874
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17874: [SPARK-20612][SQL] Throw exception when there is unresol...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17874
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76538/





[GitHub] spark issue #17874: [SPARK-20612][SQL] Throw exception when there is unresol...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17874
  
**[Test build #76538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76538/testReport)** for PR 17874 at commit [`f19976a`](https://github.com/apache/spark/commit/f19976a7e0818f36768d339bdcd883b31197de7e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17885
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76535/





[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17885
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17885
  
**[Test build #76535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76535/testReport)** for PR 17885 at commit [`99414d7`](https://github.com/apache/spark/commit/99414d7ce352d7d4dd32a9ad4eda93c11d360cac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...

2017-05-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17077
  
Also cc @cloud-fan, the author of the original PR that implemented bucketBy.





[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...

2017-05-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17077
  
@zero323 Could you also update the [SQL documentation](http://spark.apache.org/docs/latest/sql-programming-guide.html)?

https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md

Thank you!





[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...

2017-05-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17077#discussion_r115134650
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
 self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
 return self
 
+@since(2.3)
+def bucketBy(self, numBuckets, *cols):
+"""Buckets the output by the given columns on the file system.
--- End diff --

Thank you for adding the wrapper. 

Yes. We should make the Python APIs consistent with Scala APIs, if 
possible. 
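To illustrate what the `bucketBy(numBuckets, *cols)` wrapper asks the writer to do, here is a minimal pure-Python sketch of hash bucketing: each row is assigned to bucket `hash(key) mod numBuckets`, where the key is built from the bucketing columns. It assumes a stand-in hash (`zlib.crc32` over the `repr` of the key tuple) instead of the Murmur3-based hash Spark uses internally, so the bucket IDs will not match Spark's; `bucket_id` and `assign_buckets` are hypothetical helpers, not PySpark APIs.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def bucket_id(row: Row, cols: Sequence[str], num_buckets: int) -> int:
    """Map a row to a bucket in [0, num_buckets) from its bucketing columns.

    Stand-in hash: CRC32 of the key tuple's repr (NOT Spark's Murmur3 hash).
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key = tuple(row[c] for c in cols)
    # zlib.crc32 returns an unsigned 32-bit int, so the modulo is never negative.
    return zlib.crc32(repr(key).encode("utf-8")) % num_buckets


def assign_buckets(rows: Iterable[Row], num_buckets: int, *cols: str) -> Dict[int, List[Row]]:
    """Group rows by bucket id, mirroring the (numBuckets, *cols) argument order."""
    if not cols:
        raise ValueError("at least one bucketing column is required")
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row, cols, num_buckets)].append(row)
    return dict(buckets)


rows = [
    {"user": "a", "v": 1},
    {"user": "b", "v": 2},
    {"user": "a", "v": 3},
]
buckets = assign_buckets(rows, 4, "user")
```

Because the bucket depends only on the key, rows with equal `user` values always land in the same bucket. That property is what allows Spark to skip a shuffle in some joins and aggregations on the bucketing columns.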





[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17882
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76533/





[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17882
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17882
  
**[Test build #76533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76533/testReport)** for PR 17882 at commit [`53d0c25`](https://github.com/apache/spark/commit/53d0c2551ef73dc843a53a088c5c7c835956f490).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17770
  
**[Test build #76541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76541/testReport)** for PR 17770 at commit [`d0a94f4`](https://github.com/apache/spark/commit/d0a94f417bbe22f081772b2518315b367093b81d).





[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

2017-05-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17865
  
Could you please check the documentation we wrote for the Scala APIs? It looks like we forgot to update the Python function descriptions when we changed the Scala APIs.





[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

2017-05-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17865#discussion_r115134581
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -409,7 +432,7 @@ def isnan(col):
 
 @since(1.6)
 def isnull(col):
-"""An expression that returns true iff the column is null.
+"""An expression that returns true if the column is null.
--- End diff --

`the column`? This is misleading. We should make the Python documentation consistent with what we did in Scala.
For example, `isNull` in the Scala APIs is described as
> Returns true if `expr` is null, or false otherwise.

Ref: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L280
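The review point is that `isnull` is evaluated per row on an expression, returning true when that row's value is null and false otherwise, rather than describing a whole column. A minimal pure-Python sketch of those semantics, assuming `None` stands in for SQL NULL and a list stands in for an expression's values across rows; `isnull_values` is a hypothetical helper, not the PySpark function:

```python
from typing import Iterable, List, Optional


def isnull_values(values: Iterable[Optional[object]]) -> List[bool]:
    """For each row's value of an expression: True if it is null, False otherwise.

    None models SQL NULL. NaN is not null; Spark checks it with the separate isnan.
    """
    return [v is None for v in values]


result = isnull_values([1, None, float("nan"), "x"])
```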





[GitHub] spark pull request #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.re...

2017-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17831





[GitHub] spark issue #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.register

2017-05-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17831
  
Thanks! Merging to master.





[GitHub] spark pull request #17835: [SPARK-20557] [SQL] Support JDBC data type Time w...

2017-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17835





[GitHub] spark issue #17835: [SPARK-20557] [SQL] Support JDBC data type Time with Tim...

2017-05-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17835
  
Thanks! Merging to master.





[GitHub] spark issue #17887: [SPARK-20399][SQL][WIP] Add a config to fallback string ...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17887
  
**[Test build #76540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76540/testReport)** for PR 17887 at commit [`d0b2c22`](https://github.com/apache/spark/commit/d0b2c2278ec7d10cc1ab998be489e6553a8dc193).





[GitHub] spark pull request #17736: [SPARK-20399][SQL] Can't use same regex pattern b...

2017-05-06 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/17736





[GitHub] spark issue #17887: [SPARK-20399][SQL][WIP] Add a config to fallback string ...

2017-05-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17887
  
cc @dbtsai @cloud-fan @hvanhovell 





[GitHub] spark pull request #17887: [SPARK-20399][SQL][WIP] Add a config to fallback ...

2017-05-06 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/17887

[SPARK-20399][SQL][WIP] Add a config to fallback string literal parsing 
consistent with old sql parser behavior

## What changes were proposed in this pull request?

Following the discussion in #17736, this patch adds a config to fall back to the Spark 1.6 string literal parsing behavior.

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 add-config-fallback-string-parsing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17887.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17887


commit d0b2c2278ec7d10cc1ab998be489e6553a8dc193
Author: Liang-Chi Hsieh 
Date:   2017-04-19T01:49:47Z

Add a config to fallback string literal parsing consistent with old sql 
parser behavior.







[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76539/





[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17886
  
**[Test build #76539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76539/testReport)** for PR 17886 at commit [`995d9a8`](https://github.com/apache/spark/commit/995d9a864e68febbca7b9541815c5e42735ebd03).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17886
  
**[Test build #76539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76539/testReport)** for PR 17886 at commit [`995d9a8`](https://github.com/apache/spark/commit/995d9a864e68febbca7b9541815c5e42735ebd03).





[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17878
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17878
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76532/





[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17878
  
**[Test build #76532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76532/testReport)** for PR 17878 at commit [`d69c71a`](https://github.com/apache/spark/commit/d69c71a0dacabc47863d49815ee67dc0d5515e5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17881
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76531/





[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76537/





[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17881
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17886
  
**[Test build #76537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76537/testReport)** for PR 17886 at commit [`4bf1443`](https://github.com/apache/spark/commit/4bf1443a65d23b4e470dc4f7c4e57ce34460f551).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17881
  
**[Test build #76531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76531/testReport)** for PR 17881 at commit [`50b4f2d`](https://github.com/apache/spark/commit/50b4f2d2269f03f3650443405a28546843f98f53).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17770
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17770
  
**[Test build #76536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76536/testReport)**
 for PR 17770 at commit 
[`7debd76`](https://github.com/apache/spark/commit/7debd76a0d69758d394a881a932c6714120cc180).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17770
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76536/
Test FAILed.





[GitHub] spark pull request #15466: [SPARK-13983][SQL] HiveThriftServer2 can not get ...

2017-05-06 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/15466





[GitHub] spark issue #17874: [SPARK-20612][SQL][WIP] Throw exception when there is un...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17874
  
**[Test build #76538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76538/testReport)**
 for PR 17874 at commit 
[`f19976a`](https://github.com/apache/spark/commit/f19976a7e0818f36768d339bdcd883b31197de7e).





[GitHub] spark issue #17880: [SPARK-20620][TEST]Add some unit tests into NullExpressi...

2017-05-06 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/17880
  
@gatorsmile Thanks, I will do it.





[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17886
  
**[Test build #76537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76537/testReport)**
 for PR 17886 at commit 
[`4bf1443`](https://github.com/apache/spark/commit/4bf1443a65d23b4e470dc4f7c4e57ce34460f551).





[GitHub] spark issue #17874: [SPARK-20612][SQL][WIP] Throw exception when there is un...

2017-05-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17874
  
@cloud-fan This rule could make the following query work:

```scala
// Test-suite style: assumes spark.implicits._ and a sql(...) helper are in scope.
Seq(1).toDF("c1").createOrReplaceTempView("onerow")
sql(
  """
    | select 1
    |from   (select 1 from onerow t2 LIMIT 1)
    |where  t2.c1=1""".stripMargin)
```

But the `WHERE` condition should not be able to refer to `t2.c1`, which is only available in the inner scope.
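To make the scoping rule in this comment concrete, here is a toy Python model of qualified-name resolution. It is not Spark's Analyzer: `resolve`, `AnalysisError`, and the `__subquery` alias are hypothetical names chosen for illustration. What it shows is that the outer `WHERE` can only see the columns the subquery outputs, and the alias `t2` is not among them.

```python
class AnalysisError(Exception):
    """Hypothetical stand-in for an analysis failure."""


def resolve(qualified_name, scope):
    """Resolve 'alias.column' against a scope mapping alias -> set of columns."""
    alias, column = qualified_name.split(".", 1)
    if column not in scope.get(alias, set()):
        raise AnalysisError("cannot resolve '{0}'".format(qualified_name))
    return alias, column


# Inside the subquery, alias t2 exposes the column c1 of onerow.
inner_scope = {"t2": {"c1"}}

# The outer query sees only the subquery's output: a single column named "1"
# under some subquery alias (hypothetical here). No alias "t2" is visible.
outer_scope = {"__subquery": {"1"}}

print(resolve("t2.c1", inner_scope))  # ('t2', 'c1')
try:
    resolve("t2.c1", outer_scope)
except AnalysisError as e:
    print(e)  # cannot resolve 't2.c1'
```

A rule that silently pulls `t2.c1` up from the inner scope effectively merges `inner_scope` into `outer_scope`, which is the behaviour being questioned here.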






[GitHub] spark pull request #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can...

2017-05-06 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/17886

[SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not get "--hiveconf" and 
''--hivevar" variables since 2.x

## What changes were proposed in this pull request?

Fix HiveThriftServer2 not being able to get "--hiveconf" and "--hivevar" variables since 2.x.
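Both options carry `key=value` pairs on the command line (for example `--hiveconf hive.server2.thrift.port=10001`). As a rough illustration of the input the server is expected to pick up, here is a stdlib-only parser; `parse_hive_args` is hypothetical and not part of Spark or Hive, and how the fix actually forwards these values into HiveThriftServer2 sessions is not modelled.

```python
def parse_hive_args(argv):
    """Hypothetical parser: collect --hiveconf and --hivevar key=value pairs.

    Returns (hiveconf, hivevar, rest); unrelated arguments are kept in order.
    """
    hiveconf, hivevar, rest = {}, {}, []
    targets = {"--hiveconf": hiveconf, "--hivevar": hivevar}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in targets:
            if i + 1 >= len(argv) or "=" not in argv[i + 1]:
                raise ValueError("{0} expects key=value".format(arg))
            # Split on the first '=' only, so values may contain '='.
            key, value = argv[i + 1].split("=", 1)
            targets[arg][key] = value
            i += 2
        else:
            rest.append(arg)
            i += 1
    return hiveconf, hivevar, rest
```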

## How was this patch tested?

manual tests and unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-13983-dev

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17886


commit 4bf1443a65d23b4e470dc4f7c4e57ce34460f551
Author: Yuming Wang 
Date:   2017-05-07T03:35:33Z

Spark 2.x's HiveThriftServer2 support get "--hiveconf" and ''--hivevar" 
variables







[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17884
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17884
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76534/
Test PASSed.





[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17884
  
**[Test build #76534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76534/testReport)**
 for PR 17884 at commit 
[`796a8e7`](https://github.com/apache/spark/commit/796a8e73fdfb986bedf443e69d228782f5e82fa8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...

2017-05-06 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17077#discussion_r115133626
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
         self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
         return self
 
+    @since(2.3)
+    def bucketBy(self, numBuckets, *cols):
+        """Buckets the output by the given columns on the file system.
+
+        :param numBuckets: the number of buckets to save
+        :param cols: name of columns
+
+        .. note:: Applicable for file-based data sources in combination with
+                  :py:meth:`DataFrameWriter.saveAsTable`.
+
+        >>> (df.write.format('parquet')
+        ...     .bucketBy(100, 'year', 'month')
+        ...     .mode("overwrite")
+        ...     .saveAsTable('bucketed_table'))
+        """
+        if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
+            cols = cols[0]
+
+        if not isinstance(numBuckets, int):
+            raise TypeError("numBuckets should be an int, got {0}.".format(type(numBuckets)))
+
+        if not all(isinstance(c, basestring) for c in cols):
+            raise TypeError("cols argument should be a string or a sequence of strings.")
--- End diff --

Good point. We can support arbitrary `Iterable[str]` though. 

```python
import collections.abc
if (len(cols) == 1 and isinstance(cols[0], collections.abc.Iterable)
        and not isinstance(cols[0], str)):  # a bare str is Iterable too
    cols = list(cols[0])
```

The caveat is that we don't allow this anywhere else.
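One subtlety of generalizing the `(list, tuple)` check to `Iterable`: in Python 3 a plain string is itself iterable, so without a `str` exclusion `bucketBy(4, "year")` would bucket by `'y', 'e', 'a', 'r'`. A minimal standalone sketch of the argument handling, assuming Python 3 (`str` in place of the diff's `basestring`); `normalize_bucket_args` is a hypothetical helper, not PySpark API:

```python
from collections.abc import Iterable


def normalize_bucket_args(num_buckets, *cols):
    """Hypothetical helper mirroring the checks discussed for bucketBy.

    Accepts varargs column names or a single non-string iterable of names.
    """
    if len(cols) == 1 and isinstance(cols[0], Iterable) and not isinstance(cols[0], str):
        cols = tuple(cols[0])  # materialize lists, tuples, generators, ...
    if not isinstance(num_buckets, int):
        raise TypeError("numBuckets should be an int, got {0}.".format(type(num_buckets)))
    if not all(isinstance(c, str) for c in cols):
        raise TypeError("cols argument should be a string or a sequence of strings.")
    return num_buckets, list(cols)
```

For example, `normalize_bucket_args(4, "year")` returns `(4, ["year"])` rather than splitting the name into characters.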





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17770
  
**[Test build #76536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76536/testReport)**
 for PR 17770 at commit 
[`7debd76`](https://github.com/apache/spark/commit/7debd76a0d69758d394a881a932c6714120cc180).





[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17885
  
**[Test build #76535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76535/testReport)**
 for PR 17885 at commit 
[`99414d7`](https://github.com/apache/spark/commit/99414d7ce352d7d4dd32a9ad4eda93c11d360cac).





[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...

2017-05-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17885
  
I'll target this for master, branch-2.2, branch-2.1.





[GitHub] spark pull request #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbuti...

2017-05-06 Thread holdenk
GitHub user holdenk opened a pull request:

https://github.com/apache/spark/pull/17885

[SPARK-20627][PYSPARK] Drop the hadoop distirbution name from the Python 
version

## What changes were proposed in this pull request?

Drop the Hadoop distribution name from the Python version string (PEP 440).
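For context: PEP 440 treats everything after a `+` as a *local version label*, and says local versions should not be used when publishing to a public index server such as PyPI. A minimal stdlib-only sketch; the version string `2.2.0+hadoop2.7` is a hypothetical example, not taken from this PR, and the regex is adapted from the canonical public-version pattern given in PEP 440:

```python
import re

# Canonical public version (no local label), adapted from PEP 440.
CANONICAL_PUBLIC = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$"
)


def drop_local_label(version):
    """Return the public part of a version, e.g. '2.2.0+hadoop2.7' -> '2.2.0'."""
    return version.split("+", 1)[0]


def is_canonical_public(version):
    """True if `version` is a canonical PEP 440 public version."""
    return CANONICAL_PUBLIC.match(version) is not None
```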

## How was this patch tested?

Ran `make-distribution` locally

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/holdenk/spark 
SPARK-20627-remove-pip-local-version-string

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17885


commit 4e30ba90a7f14627d098d676f1ee8bf02d62eb9e
Author: Holden Karau 
Date:   2017-05-07T02:40:40Z

Drop the hadoop distirbution name from the Python version packaging string

commit 99414d7ce352d7d4dd32a9ad4eda93c11d360cac
Author: Holden Karau 
Date:   2017-05-07T03:22:02Z

Update comment since we don't have name







[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17884
  
**[Test build #76534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76534/testReport)**
 for PR 17884 at commit 
[`796a8e7`](https://github.com/apache/spark/commit/796a8e73fdfb986bedf443e69d228782f5e82fa8).





[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes

2017-05-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17884
  
@felixcheung I ran a quick QA pass on the vignettes and fixed some additional typos and style issues.





[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...

2017-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17697#discussion_r115133315
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctionsSuite.scala 
---
@@ -22,9 +22,13 @@ import org.apache.spark.mllib.rdd.MLPairRDDFunctions._
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 
 class MLPairRDDFunctionsSuite extends SparkFunSuite with MLlibTestSparkContext {
+  val source_array = Array(
--- End diff --

Also, I think we use the camelCase naming convention here, i.e. `sourceArray`. Perhaps just `data` is enough?





[GitHub] spark issue #17697: [SPARK-20414][MLLIB] avoid creating only 16 reducers whe...

2017-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17697
  
I left some comments here, though I am not confident enough to sign off. Let me defer to @srowen and @tejasapatil.





[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...

2017-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17697#discussion_r115133197
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctionsSuite.scala 
---
@@ -22,9 +22,13 @@ import org.apache.spark.mllib.rdd.MLPairRDDFunctions._
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 
 class MLPairRDDFunctionsSuite extends SparkFunSuite with MLlibTestSparkContext {
+  val source_array = Array(
+(1, 7), (1, 3), (1, 6), (1, 1), (1, 2), (1, -1),
+(3, 2), (3, 7), (5, 1), (3, 5)
+ )
--- End diff --

Indentation should be two spaces here too.





[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...

2017-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17697#discussion_r115133107
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala ---
@@ -49,6 +53,7 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Se
   }
 ).mapValues(_.toArray.sorted(ord.reverse))  // This is a min-heap, so we reverse the order.
   }
+
--- End diff --

It looks like this newline can be removed if you push more commits.





[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...

2017-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17697#discussion_r115133103
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala ---
@@ -40,7 +40,11 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Se
* @return an RDD that contains the top k values for each key
*/
   def topByKey(num: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
-self.aggregateByKey(new BoundedPriorityQueue[V](num)(ord))(
+  topByKey(num, 16)
--- End diff --

To be clear, was this 16 by default before this PR? Adding a parameter is fine, but calls that omit the new parameter should keep the same behaviour as before this PR.
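For readers following the `topByKey` diff: it folds values into a bounded min-heap per key via `aggregateByKey` (a per-partition seqOp plus a cross-partition combOp), then sorts each heap in reverse. A minimal single-process Python sketch of that shape, using `heapq` in place of Spark's `BoundedPriorityQueue`. The helper names are hypothetical, and the number of reducers (the subject of this PR) is not modelled.

```python
import heapq
from collections import defaultdict


def _offer(heap, value, num):
    """seqOp: add one value to a min-heap bounded at `num` elements."""
    if len(heap) < num:
        heapq.heappush(heap, value)
    elif heap and value > heap[0]:
        heapq.heapreplace(heap, value)  # evict the current smallest
    return heap


def _merge(h1, h2, num):
    """combOp: fold the second partial heap into the first."""
    for value in h2:
        _offer(h1, value, num)
    return h1


def top_by_key(partitions, num):
    """partitions: list of lists of (key, value) pairs, one list per partition."""
    partials = []
    for part in partitions:
        acc = defaultdict(list)
        for key, value in part:
            _offer(acc[key], value, num)
        partials.append(acc)
    merged = defaultdict(list)
    for acc in partials:
        for key, heap in acc.items():
            _merge(merged[key], heap, num)
    # Min-heaps iterate smallest-first, so sort in reverse for top-k order.
    return {key: sorted(heap, reverse=True) for key, heap in merged.items()}
```

With the test data from the suite split into two partitions, `top_by_key(parts, 5)` gives `{1: [7, 6, 3, 2, 1], 3: [7, 5, 2], 5: [1]}`.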





[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...

2017-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17697#discussion_r115133174
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala ---
@@ -40,7 +40,11 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Se
* @return an RDD that contains the top k values for each key
*/
   def topByKey(num: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
-self.aggregateByKey(new BoundedPriorityQueue[V](num)(ord))(
+  topByKey(num, 16)
--- End diff --

Also, I believe the indentation here should be two spaces.





[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17882
  
**[Test build #76533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76533/testReport)**
 for PR 17882 at commit 
[`53d0c25`](https://github.com/apache/spark/commit/53d0c2551ef73dc843a53a088c5c7c835956f490).





[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

2017-05-06 Thread debasish83
Github user debasish83 commented on the issue:

https://github.com/apache/spark/pull/17862
  
@hhbyyh Can we smooth the hinge loss using a soft-max (a smooth variant of ReLU) and then use LBFGS?
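One common reading of this suggestion: the hinge loss max(0, 1 - m), with margin m = y * (w . x) and labels y in {-1, +1}, has a kink at m = 1. Replacing that max with the softplus "soft maximum" (1/beta) * log(1 + exp(beta * (1 - m))) gives a loss that is smooth everywhere and approaches the hinge loss as beta grows, so a quasi-Newton method such as L-BFGS sees a continuous gradient. A scalar, stdlib-only sketch; `beta` is an assumed smoothing parameter, not something taken from the PR:

```python
import math


def hinge(margin):
    """Standard hinge loss for a margin m = y * (w . x)."""
    return max(0.0, 1.0 - margin)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def smooth_hinge(margin, beta=10.0):
    """Softplus-smoothed hinge: (1/beta) * log(1 + exp(beta * (1 - m)))."""
    return softplus(beta * (1.0 - margin)) / beta


def smooth_hinge_grad(margin, beta=10.0):
    """d/dm of smooth_hinge, i.e. -sigmoid(beta * (1 - m)); continuous in m."""
    z = beta * (1.0 - margin)
    if z >= 0:
        return -1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return -e / (1.0 + e)
```

The gradient with respect to the weights follows from the chain rule: `smooth_hinge_grad(m) * y * x`.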





[GitHub] spark issue #17697: [SPARK-20414][MLLIB] avoid creating only 16 reducers whe...

2017-05-06 Thread yangyangyyy
Github user yangyangyyy commented on the issue:

https://github.com/apache/spark/pull/17697
  
@HyukjinKwon  @srowen 





[GitHub] spark issue #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.register

2017-05-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17831
  
This change LGTM.

I checked #17848. It seems to me that the PR simply adds two flags to ScalaUDF, and there appears to be no API change regarding existing UDF registration. I agree with @holdenk and @HyukjinKwon that it is orthogonal to this change for now.







[GitHub] spark pull request #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scala...

2017-05-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17801#discussion_r115132676
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2657,22 +2661,27 @@ object functions {
 
   /**
* Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds),
-   * using the default timezone and the default locale, return null if fail.
+   * using the default timezone and the default locale.
+   * Returns `null` if fails.
+   *
* @group datetime_funcs
* @since 1.5.0
*/
   def unix_timestamp(s: Column): Column = withExpr {
 UnixTimestamp(s.expr, Literal("yyyy-MM-dd HH:mm:ss"))
   }
 
+  // scalastyle:off line.size.limit
   /**
-   * Convert time string with given pattern
-   * (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
-   * to Unix time stamp (in seconds), return null if fail.
+   * Converts time string with given pattern to Unix timestamp (in seconds).
+   * Returns `null` if fails.
+   *
+   * @see <a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html">Customizing Formats</a>
--- End diff --

that can avoid having scalastyle:off
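As a rough illustration of the semantics described in the `unix_timestamp` doc comments (parse a time string with a pattern, return seconds since the epoch, or null on failure), here is a minimal Python sketch. It is not Spark code. It uses Python `strptime` codes, where `%Y-%m-%d %H:%M:%S` plays the role of the Java pattern `yyyy-MM-dd HH:mm:ss`. The doc comment says Spark uses the default time zone; this sketch assumes an explicit `tz` argument (UTC by default) instead.

```python
from datetime import datetime, timezone


def unix_timestamp(s, fmt="%Y-%m-%d %H:%M:%S", tz=timezone.utc):
    """Parse `s` with `fmt` and return whole seconds since the epoch, or None.

    Assumption: the parsed wall-clock time is interpreted in `tz`.
    """
    try:
        parsed = datetime.strptime(s, fmt)
    except (TypeError, ValueError):
        return None  # mirrors "Returns null if fails"
    return int(parsed.replace(tzinfo=tz).timestamp())
```

For example, `unix_timestamp("2017-05-06 00:00:00")` returns `1494028800` under the UTC assumption.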





[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...

2017-05-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17878
  
Thanks, @HyukjinKwon. AppVeyor looks good; waiting for Jenkins again (although it has nothing to do with it).





[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17878
  
**[Test build #76532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76532/testReport)**
 for PR 17878 at commit 
[`d69c71a`](https://github.com/apache/spark/commit/d69c71a0dacabc47863d49815ee67dc0d5515e5a).





[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17881
  
**[Test build #76531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76531/testReport)**
 for PR 17881 at commit 
[`50b4f2d`](https://github.com/apache/spark/commit/50b4f2d2269f03f3650443405a28546843f98f53).



[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...

2017-05-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17878
  
Jenkins, retest this please



[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...

2017-05-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17881
  
Jenkins, ok to test



[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes

2017-05-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17884
  
@actuaryzhang thanks - would you have a chance to run a quick QA check on 
the rest of the vignettes, if you haven't already?




[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes

2017-05-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17884
  
This test seems flaky on AppVeyor; I'm not sure why:
```
Failed 
-
1. Error: spark.glm and predict (@test_mllib_regression.R#57) 
--
java.lang.IllegalStateException: SparkContext has been shutdown
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2015)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2044)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2063)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:333)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2923)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2237)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2237)
at org.apache.spark.sql.Dataset$$anonfun$57.apply(Dataset.scala:2907)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2906)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2237)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2244)
at org.apache.spark.sql.Dataset.first(Dataset.scala:2251)

```



[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-06 Thread mariahualiu
Github user mariahualiu commented on the issue:

https://github.com/apache/spark/pull/17854
  
Now I can comfortably use 2500 executors. But when I pushed the executor 
count to 3000, I saw a lot of heartbeat timeout errors. That is something else we 
can improve, probably in another JIRA.



[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-06 Thread mariahualiu
Github user mariahualiu commented on the issue:

https://github.com/apache/spark/pull/17854
  
I re-ran the same application, adding these configurations:
"--conf spark.yarn.scheduler.heartbeat.interval-ms=15000 --conf spark.yarn.launchContainer.count.simultaneously=50".
Though it took 50 iterations to get 2500 containers from YARN, it reached 2500 
executors faster, since there were far fewer executor failures and, as a result, 
little overhead from removing failed executors and fewer allocation requests to YARN.



[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17298
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17298
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76530/
Test FAILed.



[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17298
  
**[Test build #76530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76530/testReport)**
 for PR 17298 at commit 
[`6c22a89`](https://github.com/apache/spark/commit/6c22a895f12a54f8b23f2cbb94c8bad3276a93bd).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-06 Thread mariahualiu
Github user mariahualiu commented on the issue:

https://github.com/apache/spark/pull/17854
  
Let me describe what I've seen when using 2500 executors.

1. In the first few (2~3) requests, the AM received all of the containers 
(2500 in this case) from YARN. 
2. Within a few seconds, 2500 launch-container commands were sent out. 
3. It took 3~4 minutes to start an executor on an NM (most of the time was 
spent on container localization: downloading the Spark jar, the application jar, 
etc. from the HDFS staging folder). 
4. A large number of executors tried to retrieve Spark properties from the 
driver but failed to connect, and a massive wave of failed-executor removals 
followed. It seems to me that RemoveExecutor is handled by the same single thread 
that responds to RetrieveSparkProps and RegisterExecutor. As a result, this thread 
became even busier, and even more executors could not connect or register.
5. YarnAllocator requested more containers to make up for the failed ones. 
More executors tried to retrieve Spark properties and register. However, the 
thread was still overwhelmed by the previous round of executors and could not respond. 

In some cases, we got 5000 executor failures, and the application retried 
and eventually failed.



[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17644
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76529/
Test PASSed.



[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17644
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17644
  
**[Test build #76529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76529/testReport)**
 for PR 17644 at commit 
[`6e6e767`](https://github.com/apache/spark/commit/6e6e767c9a6787965d6eb9a32608aacacd543e23).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-06 Thread mariahualiu
Github user mariahualiu commented on the issue:

https://github.com/apache/spark/pull/17854
  
@squito yes, I capped the number of resources in updateResourceRequests so 
that YarnAllocator asks for fewer resources in each iteration. When 
allocation fails in one iteration, the request is added back, and 
YarnAllocator will try to allocate the leftover (from the previous iteration) 
plus the new requests in the next iteration, which can result in a lot of 
allocated containers. The second change, as you pointed out, is meant to address 
this possibility. On second thought, maybe a better solution is to change 
AMRMClientImpl::allocate so that it does not add all resource requests from ask 
to askList. 
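The capping behaviour described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not YarnAllocator's actual logic: `simulate_allocation`, `cap` and `grant_limit` are hypothetical names, and the resource manager is modelled as granting at most `grant_limit` containers per round, with ungranted requests carried into the next round.

```python
from typing import List, Optional


def simulate_allocation(target: int,
                        cap: Optional[int] = None,
                        grant_limit: Optional[int] = None) -> List[int]:
    """Return how many containers are asked for in each allocation round.

    Hypothetical toy model (not Spark's YarnAllocator):
      target      -- containers still needed
      cap         -- max containers asked for per round (None = uncapped)
      grant_limit -- max containers the RM grants per round (None = grant all)
    """
    if cap is not None and cap <= 0:
        raise ValueError("cap must be positive")
    if grant_limit is not None and grant_limit <= 0:
        raise ValueError("grant_limit must be positive")

    outstanding = target
    asks = []
    while outstanding > 0:
        # With a cap, never ask for more than `cap` containers in one round.
        ask = outstanding if cap is None else min(outstanding, cap)
        granted = ask if grant_limit is None else min(ask, grant_limit)
        asks.append(ask)
        # Ungranted requests stay outstanding and are asked for again next round.
        outstanding -= granted
    return asks


if __name__ == "__main__":
    # Uncapped: everything is requested (and here granted) in a single burst.
    print(simulate_allocation(2500))                 # [2500]
    # Capped at 50 per round: 2500 containers arrive over 50 rounds.
    print(len(simulate_allocation(2500, cap=50)))    # 50
```

Capping trades more allocation rounds for a smaller burst of executors contacting the driver at the same time, which is consistent with the 50 iterations for 2500 containers reported earlier in this thread.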

@tgravescs I tried reducing spark.yarn.containerLauncherMaxThreads, but it 
didn't help much. My understanding is that these threads send container launch 
commands to node managers and return immediately, which is very lightweight 
and can be extremely fast. Launching a container on the NM side is an 
asynchronous operation. 





[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes

2017-05-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17884
  
@HyukjinKwon Thanks for pointing this out. I will keep this in mind next 
time.  



[GitHub] spark issue #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (an...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17801
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76528/
Test PASSed.



[GitHub] spark issue #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (an...

2017-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17801
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (an...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17801
  
**[Test build #76528 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76528/testReport)**
 for PR 17801 at commit 
[`5326ad1`](https://github.com/apache/spark/commit/5326ad1775d3c5467d4684bfa13cbece7cc92ac5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaFPGrowthExample `
  * `class SingularValueDecomposition(JavaModelWrapper):`



[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17298
  
**[Test build #76530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76530/testReport)**
 for PR 17298 at commit 
[`6c22a89`](https://github.com/apache/spark/commit/6c22a895f12a54f8b23f2cbb94c8bad3276a93bd).



[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...

2017-05-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17451
  
Great, let me know if there are any questions @keypointt :)



[GitHub] spark issue #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.register

2017-05-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17831
  
@gatorsmile are you OK with this going into master, or do you still have 
concerns about it if it's targeted to 2.3?



[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...

2017-05-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17077#discussion_r115129682
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
 self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
 return self
 
+@since(2.3)
+def bucketBy(self, numBuckets, *cols):
+"""Buckets the output by the given columns on the file system.
--- End diff --

I'd copy the full description from DataFrameWriter here, since the comparison 
to Hive could help people new to Spark understand what bucketBy does.
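To illustrate the Hive-style layout that `bucketBy` (and `sortBy`) describe, here is a minimal pure-Python sketch: each row is assigned to one of `num_buckets` buckets by hashing its bucket-column values, and rows inside each bucket are optionally sorted. The helper names are hypothetical, and `zlib.crc32` is only a stand-in hash; Spark and Hive use different hash functions of their own, so these bucket ids will not match files written by either system.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def bucket_id(values: Sequence[object], num_buckets: int) -> int:
    """Map one row's bucket-column values to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Stand-in hash: crc32 is deterministic across processes, unlike str hash().
    key = "\x00".join(repr(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_buckets


def bucketize(rows: List[dict], bucket_cols: Sequence[str], num_buckets: int,
              sort_cols: Optional[Sequence[str]] = None) -> Dict[int, List[dict]]:
    """Group rows into buckets, optionally sorting within each bucket (like sortBy)."""
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        buckets[bucket_id([row[c] for c in bucket_cols], num_buckets)].append(row)
    if sort_cols:
        for bucket_rows in buckets.values():
            bucket_rows.sort(key=lambda r: tuple(r[c] for c in sort_cols))
    return dict(buckets)
```

Rows with the same `('year', 'month')` values always land in the same bucket, which is what lets a reader skip a shuffle when joining or aggregating on those columns.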



[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...

2017-05-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17077#discussion_r115129876
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
 self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
 return self
 
+@since(2.3)
+def bucketBy(self, numBuckets, *cols):
+"""Buckets the output by the given columns on the file system.
+
+:param numBuckets: the number of buckets to save
+:param cols: name of columns
+
+.. note:: Applicable for file-based data sources in combination with
+  :py:meth:`DataFrameWriter.saveAsTable`.
+
+>>> (df.write.format('parquet')
+... .bucketBy(100, 'year', 'month')
+... .mode("overwrite")
+... .saveAsTable('bucketed_table'))
+"""
+if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
+cols = cols[0]
+
+if not isinstance(numBuckets, int):
+raise TypeError("numBuckets should be an int, got {0}.".format(type(numBuckets)))
+
+if not all(isinstance(c, basestring) for c in cols):
+raise TypeError("cols argument should be a string or a sequence of strings.")
--- End diff --

I don't think we really support all sequences: the type check above on 
L581 requires a list or tuple, but there are other kinds of sequences.
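One way to make the check match the error message is to accept any non-string sequence. This is a sketch, not the PR's code: `normalize_cols` is a hypothetical helper, and it uses Python 3's `str` and `collections.abc.Sequence` where the diff uses `basestring`. The empty-input guard is there because the diff goes on to take `cols[0]` as the first bucket column.

```python
from collections.abc import Sequence
from typing import Tuple


def normalize_cols(*cols: object) -> Tuple[str, ...]:
    """Accept column names as varargs, or as a single non-string sequence."""
    # A lone argument that is a sequence (list, tuple, UserList, ...) but not
    # itself a string is unpacked; a lone string stays a single column name.
    if len(cols) == 1 and isinstance(cols[0], Sequence) and not isinstance(cols[0], str):
        cols = tuple(cols[0])
    if not cols:
        # The caller takes cols[0] next, so reject empty input explicitly.
        raise ValueError("at least one column name is required")
    if not all(isinstance(c, str) for c in cols):
        raise TypeError("cols argument should be a string or a sequence of strings.")
    return tuple(cols)
```

The alternative is to keep the list/tuple check and change the message to say "a list or tuple of strings" instead.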



[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...

2017-05-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17077#discussion_r115129884
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
 self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
 return self
 
+@since(2.3)
+def bucketBy(self, numBuckets, *cols):
+"""Buckets the output by the given columns on the file system.
+
+:param numBuckets: the number of buckets to save
+:param cols: name of columns
+
+.. note:: Applicable for file-based data sources in combination with
+  :py:meth:`DataFrameWriter.saveAsTable`.
+
+>>> (df.write.format('parquet')
+... .bucketBy(100, 'year', 'month')
+... .mode("overwrite")
+... .saveAsTable('bucketed_table'))
+"""
+if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
+cols = cols[0]
+
+if not isinstance(numBuckets, int):
+raise TypeError("numBuckets should be an int, got {0}.".format(type(numBuckets)))
+
+if not all(isinstance(c, basestring) for c in cols):
+raise TypeError("cols argument should be a string or a sequence of strings.")
+
+col = cols[0]
+cols = cols[1:]
+
+self._jwrite = self._jwrite.bucketBy(numBuckets, col, _to_seq(self._spark._sc, cols))
+return self
+
+@since(2.3)
+def sortBy(self, *cols):
+"""Sorts the output in each bucket by the given columns on the file system.
+
+:param cols: name of columns
+
+>>> (df.write.format('parquet')
+... .bucketBy(100, 'year', 'month')
+... .sortBy('day')
+... .mode("overwrite")
+... .saveAsTable('sorted_bucketed_table'))
+"""
+if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
+cols = cols[0]
+
+if not all(isinstance(c, basestring) for c in cols):
+raise TypeError("cols argument should be a string or a sequence of strings.")
--- End diff --

Same note as above.



[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-05-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r115129786
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -1355,7 +1370,7 @@ def test_java_params(self):
 for name, cls in inspect.getmembers(module, inspect.isclass):
 if not name.endswith('Model') and issubclass(cls, JavaParams)\
 and not inspect.isabstract(cls):
-self.check_params(cls())
+ParamTests.check_params(self, cls(), check_params_exist=False)
--- End diff --

This might make sense to include as a comment in the code for whoever is 
coming to update this.



[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-05-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r115129846
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -263,7 +282,14 @@ def _fit_java(self, dataset):
 
 def _fit(self, dataset):
 java_model = self._fit_java(dataset)
-return self._create_model(java_model)
+model = self._create_model(java_model)
+
+# SPARK-10931: This is a temporary fix to allow models to own params
+# from estimators. Eventually, these params should be in models through
+# using common base classes between estimators and models.
+model._create_params_from_java()
--- End diff --

So right now this would apply to all of the models. Would it make sense to 
make it so that we can selectively move the params forward one at a time?
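A minimal sketch of what selective, opt-in copying could look like, assuming a per-model allow-list. `ModelBase`, `_owned_estimator_params`, `copy_owned_params` and the stub classes are hypothetical illustrations, not the pyspark.ml API, and estimator params are modelled as a plain dict rather than `Param` objects.

```python
from typing import Any, Dict, Tuple


class EstimatorStub:
    """Hypothetical estimator that keeps its params in a plain dict."""

    def __init__(self, **params: Any) -> None:
        self.params: Dict[str, Any] = dict(params)


class ModelBase:
    # Allow-list of estimator params this model owns. Models that have not
    # been migrated yet leave it empty, so nothing is copied for them.
    _owned_estimator_params: Tuple[str, ...] = ()

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def copy_owned_params(self, estimator: EstimatorStub) -> "ModelBase":
        # Copy only the allow-listed params that the estimator actually has.
        for name in self._owned_estimator_params:
            if name in estimator.params:
                self.params[name] = estimator.params[name]
        return self


class MigratedModelStub(ModelBase):
    # Opted in: only these two params are moved from the estimator.
    _owned_estimator_params = ("maxIter", "regParam")
```

With this shape, each model could be migrated in its own change by extending its allow-list, rather than one call applying to every model at once.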



[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2017-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17644
  
**[Test build #76529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76529/testReport)**
 for PR 17644 at commit 
[`6e6e767`](https://github.com/apache/spark/commit/6e6e767c9a6787965d6eb9a32608aacacd543e23).



[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2017-05-06 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/17644
  
Jenkins test this please



[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...

2017-05-06 Thread Yunni
Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/16966
  
@MLnick @jkbradley @sethah Could you take a look? Thanks!



[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-05-06 Thread Yunni
Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/17092
  
@MLnick @jkbradley @sethah Could you take a look? Thanks!

