[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18592
  
**[Test build #81679 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81679/testReport)**
 for PR 18592 at commit 
[`06e306f`](https://github.com/apache/spark/commit/06e306fdb4199a8c7850a6a370ce67aeac0cdf8e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19203: [BUILD] Close stale PRs

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19203
  
@srowen, it looks like `19091` was missed. The rest of mine are a subset of the 
current list.


---




[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19199
  
LGTM


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19132
  
Thanks @HyukjinKwon, I will ping Josh about this 😄.


---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19201
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19132
  
@jerryshao, to trigger tests on Jenkins, I think you need to be added 
manually by its admin as well, if I understood correctly. In my case, I asked 
Josh Rosen about this privately via email. I am fairly sure you are facing 
the same issue that I (and Holden, Felix and Takuya) ran into before.


---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19201
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81671/
Test PASSed.


---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19201
  
**[Test build #81671 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81671/testReport)**
 for PR 19201 at commit 
[`7b414fa`](https://github.com/apache/spark/commit/7b414fafcf53e9e9e79a403a47e409238c0b9761).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19132
  
**[Test build #81678 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81678/testReport)**
 for PR 19132 at commit 
[`25fe22c`](https://github.com/apache/spark/commit/25fe22cddde276f846fd4808de1b575a87b1c059).


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19132
  
ok to test


---




[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19199
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19199
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81673/
Test PASSed.


---




[GitHub] spark pull request #19195: [DOCS] Fix unreachable links in the document

2017-09-12 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/19195#discussion_r138338917
  
--- Diff: docs/building-spark.md ---
@@ -111,7 +111,7 @@ should run continuous compilation (i.e. wait for changes). However, this has not
 extensively. A couple of gotchas to note:
 
 * it only scans the paths `src/main` and `src/test` (see
-[docs](http://scala-tools.org/mvnsites/maven-scala-plugin/usage_cc.html)), so it will only work
+[docs](http://davidb.github.io/scala-maven-plugin/example_compile.html)), so it will only work
--- End diff --

I checked the [Internet 
Archive](https://web.archive.org/web/20160314050540/http://scala-tools.org/mvnsites/maven-scala-plugin/usage_cc.html)
 and found that the link you suggested is more appropriate. I'll update it soon. Thanks.


---




[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19199
  
**[Test build #81673 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81673/testReport)**
 for PR 19199 at commit 
[`b7fbc42`](https://github.com/apache/spark/commit/b7fbc42b5d50cb4380162b19aecd386c786659fd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15544
  
**[Test build #81677 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81677/testReport)**
 for PR 15544 at commit 
[`cd61382`](https://github.com/apache/spark/commit/cd61382aa7f5ef54059edead709da6b818267801).


---




[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...

2017-09-12 Thread wzhfy
GitHub user wzhfy reopened a pull request:

https://github.com/apache/spark/pull/15544

[SPARK-17997] [SQL] Add an aggregation function for counting distinct 
values for multiple intervals

## What changes were proposed in this pull request?

This work is part of 
[SPARK-17074](https://issues.apache.org/jira/browse/SPARK-17074) to compute 
equi-height histograms. An equi-height histogram is an array of bins. Each bin 
consists of two endpoints, which form an interval of values, and the number of 
distinct values (ndv) in that interval.

This PR adds a new aggregate function that, given an array of endpoints, 
counts the distinct values (ndv) in the intervals between those endpoints.
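
As a rough illustration of the per-interval counting described above, here is a minimal Java sketch. It is not the code from this PR: it uses exact `HashSet` counting in place of HyperLogLog++, and the class name `IntervalNdvSketch`, the method `countDistinctPerInterval`, and the interval convention (first interval closed on both sides, later intervals open on the left, out-of-range values ignored) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Spark: exact per-interval distinct counts.
public class IntervalNdvSketch {

    /**
     * Assumed convention: endpoints are strictly increasing; interval 0 is
     * [e0, e1] and interval i > 0 is (e_i, e_{i+1}]. Values outside
     * [e0, eN] are ignored.
     */
    public static long[] countDistinctPerInterval(long[] endpoints, long[] values) {
        if (endpoints.length < 2) {
            throw new IllegalArgumentException("need at least two endpoints");
        }
        int numIntervals = endpoints.length - 1;
        // One exact set per interval; a real implementation would keep an
        // approximate sketch (e.g. HLL++ registers) per interval instead.
        List<Set<Long>> seen = new ArrayList<>();
        for (int i = 0; i < numIntervals; i++) {
            seen.add(new HashSet<>());
        }
        for (long v : values) {
            if (v < endpoints[0] || v > endpoints[numIntervals]) {
                continue; // outside the histogram range
            }
            int idx = Arrays.binarySearch(endpoints, v);
            // Exact hit on endpoint k belongs to the interval ending at k
            // (or interval 0 for the very first endpoint). Otherwise the
            // insertion point is the first endpoint greater than v.
            int interval = (idx >= 0) ? Math.max(idx - 1, 0) : (-idx - 1) - 1;
            seen.get(interval).add(v);
        }
        long[] ndv = new long[numIntervals];
        for (int i = 0; i < numIntervals; i++) {
            ndv[i] = seen.get(i).size();
        }
        return ndv;
    }

    public static void main(String[] args) {
        long[] endpoints = {0, 10, 20};
        long[] values = {0, 5, 5, 10, 11, 20, 25};
        // [0, 10] holds {0, 5, 10}; (10, 20] holds {11, 20}; 25 is ignored.
        System.out.println(Arrays.toString(countDistinctPerInterval(endpoints, values))); // [3, 2]
    }
}
```

Swapping each `HashSet` for an approximate counter keeps the same routing logic while bounding memory per interval, which is presumably why the PR factors the HLL++ logic into a reusable helper.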

This PR also refactors `HyperLogLogPlusPlus` by extracting a helper class 
`HyperLogLogPlusPlusHelper`, where the underlying HLLPP algorithm lives.

## How was this patch tested?

Add new test cases.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark countIntervals

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15544.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15544


commit 9960fab07d2075d2beba1fea7024fe6dd30d9eef
Author: wangzhenhua 
Date:   2016-10-14T06:23:39Z

refactor hllpp

commit 5aa835ce2769a34f88bacb389c4af30f52459226
Author: wangzhenhua 
Date:   2016-10-17T13:18:36Z

add IntervalDistinctApprox

commit 840171efa08c70da83af54bc726079a88fb7a1d2
Author: wangzhenhua 
Date:   2016-10-19T01:58:32Z

add test cases

commit a6417e7df5cf44ba9f75a7d66d46258a56b0082f
Author: wangzhenhua 
Date:   2016-10-20T04:46:57Z

convert HLLPP and IntervalDistinctApprox to ImperativeAggregate

commit 74d7ae7ac817d427a264b67f580fe39bbb49811b
Author: wangzhenhua 
Date:   2016-11-04T08:36:23Z

add negative column type test and update doc




---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19175
  
**[Test build #81676 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81676/testReport)**
 for PR 19175 at commit 
[`709c2d3`](https://github.com/apache/spark/commit/709c2d3d81e331d6f69d8ed7ecdabe035142d296).


---




[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14158
  
Hey @nblintao, do you happen to have some time to continue this one?


---




[GitHub] spark issue #11494: [SPARK-10399][CORE][SQL] Introduce OffHeapMemoryBlock to...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/11494
  
gentle ping @yzotov


---




[GitHub] spark issue #19205: [SPARK-21982] Set locale to US

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19205
  
**[Test build #3918 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3918/testReport)**
 for PR 19205 at commit 
[`22bbb92`](https://github.com/apache/spark/commit/22bbb924eae20b8d3f899008317f5d623c6a49ef).


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19175
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81670/
Test FAILed.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19175
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19175
  
**[Test build #81670 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81670/testReport)**
 for PR 19175 at commit 
[`da36e37`](https://github.com/apache/spark/commit/da36e37df9c31901975c29dfa77cb7d648e94f40).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19205: [SPARK-21982] Set locale to US

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19205
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19205: [SPARK-21982] Set locale to US

2017-09-12 Thread Gschiavon
GitHub user Gschiavon opened a pull request:

https://github.com/apache/spark/pull/19205

[SPARK-21982] Set locale to US

## What changes were proposed in this pull request?

In UtilsSuite, the Locale was set to US by default, but not at format time, 
where the JVM default locale was used instead. That default can differ from 
US, which makes this test fail.
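
To illustrate the kind of failure described above, here is a small Java sketch. It is not the code touched by this PR; the class `LocaleFormatSketch` and its two methods are hypothetical names for the example. It only shows that `String.format` without an explicit `Locale` follows the JVM default, while passing `Locale.US` pins the output.

```java
import java.util.Locale;

// Hypothetical example, not Spark code: locale-dependent number formatting.
public class LocaleFormatSketch {

    // Uses the JVM default format locale, so the decimal separator can vary.
    public static String formatDefault(double value) {
        return String.format("%.1f", value);
    }

    // Pins the locale, so the output is the same on every machine.
    public static String formatUs(double value) {
        return String.format(Locale.US, "%.1f", value);
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate a developer machine whose JVM default is not US.
            Locale.setDefault(Locale.GERMANY);
            System.out.println(formatDefault(1.5)); // 1,5
            System.out.println(formatUs(1.5));      // 1.5
        } finally {
            Locale.setDefault(saved); // restore so other code is unaffected
        }
    }
}
```

A test that compares against a literal such as `"1.5"` passes only with the pinned variant, or if the default locale is set to US before the formatting call runs.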

## How was this patch tested?
Unit test (UtilsSuite)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Gschiavon/spark fix/test-locale

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19205


commit 22bbb924eae20b8d3f899008317f5d623c6a49ef
Author: German Schiavon 
Date:   2017-09-12T12:05:03Z

Set locale to US




---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19201
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81668/
Test PASSed.


---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19201
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19201
  
**[Test build #81668 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81668/testReport)**
 for PR 19201 at commit 
[`036e846`](https://github.com/apache/spark/commit/036e846a571f7aea3ad28b875afd5f9d714c25a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19204
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19204: [SPARK-21981][PYTHON][ML] Added Python interface ...

2017-09-12 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19204

[SPARK-21981][PYTHON][ML] Added Python interface for ClusteringEvaluator

## What changes were proposed in this pull request?

Added Python interface for ClusteringEvaluator

## How was this patch tested?

Manual test, e.g. the example Python code in the comments.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-21981

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19204


commit 31b3c6c7e1298a1b4bf1fc969cee50534970ab0a
Author: Marco Gaido 
Date:   2017-09-05T17:22:21Z

Added python interface for ClusteringEvaluator




---




[GitHub] spark issue #19203: [BUILD] Close stale PRs

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19203
  
**[Test build #81675 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81675/testReport)**
 for PR 19203 at commit 
[`6386e0c`](https://github.com/apache/spark/commit/6386e0c6ef027d2858d0860c6f9dd472e8ede6aa).


---




[GitHub] spark pull request #19203: [BUILD] Close stale PRs

2017-09-12 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/19203

[BUILD] Close stale PRs

Closes #18522
Closes #17722
Closes #18879
Closes #18891
Closes #18806
Closes #18948
Closes #18949
Closes #19070
Closes #19039
Closes #19142
Closes #18515
Closes #19154
Closes #19162
Closes #19187

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark CloseStalePRs3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19203


commit 6386e0c6ef027d2858d0860c6f9dd472e8ede6aa
Author: Sean Owen 
Date:   2017-09-12T11:19:41Z

Close stale PRs.

Closes #18522
Closes #17722
Closes #18879
Closes #18891
Closes #18806
Closes #18948
Closes #18949
Closes #19070
Closes #19039
Closes #19142
Closes #18515
Closes #19154
Closes #19162
Closes #19187




---




[GitHub] spark issue #19134: [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behin...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19134
  
**[Test build #81674 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81674/testReport)**
 for PR 19134 at commit 
[`d888f7b`](https://github.com/apache/spark/commit/d888f7b4b457d537c6875de31cbd77f5460c7d3b).


---




[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19185
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81669/
Test PASSed.


---




[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19185
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19185
  
**[Test build #81669 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81669/testReport)**
 for PR 19185 at commit 
[`eb8f6b4`](https://github.com/apache/spark/commit/eb8f6b431982d6f1f0118965391560f94812ab53).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19130
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81665/
Test PASSed.


---




[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19130
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19130
  
**[Test build #81665 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81665/testReport)**
 for PR 19130 at commit 
[`4bbc09d`](https://github.com/apache/spark/commit/4bbc09d68c21496d97be3e2d9f781e7ca0bbf7e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19198
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81663/
Test PASSed.


---




[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19198
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19198
  
**[Test build #81663 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81663/testReport)**
 for PR 19198 at commit 
[`6f3859c`](https://github.com/apache/spark/commit/6f3859c38392c9d1e5b5be9883610ecb26513736).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19199
  
**[Test build #81673 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81673/testReport)**
 for PR 19199 at commit 
[`b7fbc42`](https://github.com/apache/spark/commit/b7fbc42b5d50cb4380162b19aecd386c786659fd).


---




[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19181
  
**[Test build #81672 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81672/testReport)**
 for PR 19181 at commit 
[`ae7fbc4`](https://github.com/apache/spark/commit/ae7fbc48b349f5608aaef9f66e9e692354b72d18).


---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19201
  
**[Test build #81671 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81671/testReport)**
 for PR 19201 at commit 
[`7b414fa`](https://github.com/apache/spark/commit/7b414fafcf53e9e9e79a403a47e409238c0b9761).


---




[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-09-12 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/19181
  
ok to test


---




[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-12 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/18538
  
@yanboliang yes, thank you very much.


---




[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-12 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18538
  
@mgaido91 I opened 
[SPARK-21981](https://issues.apache.org/jira/browse/SPARK-21981) for the Python 
API. Would you like to work on it? Thanks.


---




[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-09-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18538


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19175
  
**[Test build #81670 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81670/testReport)**
 for PR 19175 at commit 
[`da36e37`](https://github.com/apache/spark/commit/da36e37df9c31901975c29dfa77cb7d648e94f40).


---




[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-12 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18538
  
I'm merging this into master, thanks to everyone. If anyone has more comments, 
we can address them in follow-up PRs.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/19175
  
ok to test


---




[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16422
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81664/
Test PASSed.


---




[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16422
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16422
  
**[Test build #81664 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81664/testReport)**
 for PR 16422 at commit 
[`0d49ee9`](https://github.com/apache/spark/commit/0d49ee91508c908daef672a04768c15a9e5c5dba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread DonnyZone
Github user DonnyZone commented on the issue:

https://github.com/apache/spark/pull/19175
  
Could you help review this PR? @jiangxb1987 @hvanhovell 


---




[GitHub] spark pull request #19200: Get default Locale

2017-09-12 Thread Gschiavon
Github user Gschiavon closed the pull request at:

https://github.com/apache/spark/pull/19200


---




[GitHub] spark issue #19200: Get default Locale

2017-09-12 Thread Gschiavon
Github user Gschiavon commented on the issue:

https://github.com/apache/spark/pull/19200
  
Ok, I got it. I will do that then. 
Thanks.


---




[GitHub] spark issue #19182: [SPARK-21970][Core] Fix Redundant Throws Declarations in...

2017-09-12 Thread original-brownbear
Github user original-brownbear commented on the issue:

https://github.com/apache/spark/pull/19182
  
@srowen done, all changes to `org.apache.hive.*` reverted :)


---




[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19185
  
**[Test build #81669 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81669/testReport)**
 for PR 19185 at commit 
[`eb8f6b4`](https://github.com/apache/spark/commit/eb8f6b431982d6f1f0118965391560f94812ab53).


---




[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19199
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread DonnyZone
Github user DonnyZone commented on the issue:

https://github.com/apache/spark/pull/19202
  
ping @cloud-fan @gatorsmile 


---




[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19199
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81661/
Test PASSed.


---




[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19199
  
**[Test build #81661 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81661/testReport)**
 for PR 19199 at commit 
[`e703fc8`](https://github.com/apache/spark/commit/e703fc8f33d1fde90d790057481f1d23f466f378).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...

2017-09-12 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/19185
  
Jenkins, test this please.


---




[GitHub] spark issue #19182: [SPARK-21970][Core] Fix Redundant Throws Declarations in...

2017-09-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19182
  
Ah OK, one more subtle thing @original-brownbear: the code you see in the 
`org/apache/hive` packages is, I believe, copied from Hive. It's probably 
best to leave it as-is, because it is easier to update if it hasn't diverged 
at all from its source. Could you revert those changes? Otherwise looks 
OK.


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19202
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19202: [SPARK-21980][SQL]References in grouping function...

2017-09-12 Thread DonnyZone
GitHub user DonnyZone opened a pull request:

https://github.com/apache/spark/pull/19202

[SPARK-21980][SQL]References in grouping functions should be indexed with 
resolver

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-21980

This PR fixes an issue in the `ResolveGroupingAnalytics` rule, which indexes 
the column references in grouping functions without taking the 
case-sensitivity configuration into account.
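
To illustrate the kind of lookup involved, here is a minimal Java sketch. It assumes a resolver is just a name-equality predicate chosen from the case-sensitivity setting; `ResolverIndexSketch`, `resolver` and `indexOf` are hypothetical names for illustration, not the analyzer's actual API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical stand-in for the analyzer's resolver, not Spark's actual API.
class ResolverIndexSketch {
  // Picks the name-equality predicate from the case-sensitivity setting.
  static BiPredicate<String, String> resolver(boolean caseSensitive) {
    return caseSensitive ? String::equals : String::equalsIgnoreCase;
  }

  // Returns the position of `name` among the group-by columns, or -1 if absent.
  static int indexOf(List<String> groupByColumns, String name,
                     BiPredicate<String, String> resolver) {
    for (int i = 0; i < groupByColumns.size(); i++) {
      if (resolver.test(groupByColumns.get(i), name)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<String> groupBy = Arrays.asList("a", "b");
    // A plain List.indexOf ignores the configuration and misses "B".
    System.out.println(groupBy.indexOf("B"));                    // -1
    System.out.println(indexOf(groupBy, "B", resolver(false)));  // 1
    System.out.println(indexOf(groupBy, "B", resolver(true)));   // -1
  }
}
```

With a case-insensitive resolver, `B` is found at the position of `b`; a direct `indexOf` would report it as missing regardless of the setting.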

## How was this patch tested?
unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DonnyZone/spark ResolveGroupingAnalytics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19202.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19202


commit ac61a6620e59447c575092bee5d4d7f0af99695c
Author: donnyzone 
Date:   2017-09-12T09:28:01Z

SPARK-21980

commit b08fd9301cdbd4c1a29d5eb322eacd1cf2ffc546
Author: donnyzone 
Date:   2017-09-12T09:34:53Z

rename




---




[GitHub] spark pull request #19190: [SPARK-21976][DOC] Fix wrong documentation for Me...

2017-09-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19190


---




[GitHub] spark issue #19190: [SPARK-21976][DOC] Fix wrong documentation for Mean Abso...

2017-09-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19190
  
Merged to master/2.2/2.1


---




[GitHub] spark issue #19200: Get default Locale

2017-09-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19200
  
Ah, the problem is really the reverse, but there is a problem. 
`"...".format(...)` is locale-sensitive in Scala, and this is a place where 
that matters. The `Utils` method needs to change to use `formatLocal` with 
`Locale.US`. Open a JIRA for it, then close this PR and reopen against 
`master` with that fix.
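
A small Java sketch of the difference, on the assumption that Scala's `format` and `formatLocal` behave like `java.lang.String.format` without and with an explicit `Locale` (as far as I know they delegate to it). The class and method names are made up for the example.

```java
import java.util.Locale;

class LocaleFormatSketch {
  // Formats with whatever the JVM default locale happens to be.
  static String formatDefault(double value) {
    return String.format("%.1f", value);
  }

  // Pins the locale so the output is the same on every machine.
  static String formatUs(double value) {
    return String.format(Locale.US, "%.1f", value);
  }

  public static void main(String[] args) {
    System.out.println(formatUs(1.5));                               // 1.5
    System.out.println(String.format(Locale.GERMANY, "%.1f", 1.5));  // 1,5
    System.out.println(formatDefault(1.5)); // depends on the JVM default locale
  }
}
```

Pinning the locale in the production method, rather than in the test, keeps the output stable for every caller.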


---




[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19191
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81667/
Test PASSed.


---




[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19191
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19191
  
**[Test build #81667 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81667/testReport)**
 for PR 19191 at commit 
[`5f4ce99`](https://github.com/apache/spark/commit/5f4ce997f6f30cd0d59bc2e2f4396f495c3c0fd8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-12 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/18538
  
@yanboliang I addressed them. Thank you very much for your time, help and 
your great reviews.


---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19201
  
**[Test build #81668 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81668/testReport)**
 for PR 19201 at commit 
[`036e846`](https://github.com/apache/spark/commit/036e846a571f7aea3ad28b875afd5f9d714c25a5).


---




[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18538
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81666/
Test PASSed.


---




[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18538
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18538
  
**[Test build #81666 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81666/testReport)**
 for PR 18538 at commit 
[`a7c1481`](https://github.com/apache/spark/commit/a7c14818283467276a8f7eaa30b074a0f25237dc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints fr...

2017-09-12 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/19201

[SPARK-21979][SQL]Improve QueryPlanConstraints framework

## What changes were proposed in this pull request?

Improve the QueryPlanConstraints framework to make it robust and simple.
In https://github.com/apache/spark/pull/15319, constraints for expressions 
like `a = f(b, c)` were resolved. 
However, for expressions like 
```scala
a = f(b, c) && c = g(a, b)
```
The current QueryPlanConstraints framework produces non-converging 
constraints.
Essentially, the problem is caused by having both the name and the child of 
an alias in the same constraint set. We infer constraints and push them down 
as predicates in filters; later on, these predicates are propagated as 
constraints again, and so on.
Simply using only the alias names resolves these problems. The size of the 
constraint set is reduced without losing any information, because we can 
always recover the inferred constraints on the children of aliases when 
pushing down filters.
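
As a rough illustration of why mutually dependent equalities such as `a = f(b, c)` and `c = g(a, b)` need not converge, the Java sketch below repeatedly substitutes one into the other. It is a toy model that works on strings with single-letter names; it is not how the constraint framework is implemented, and `substituteOnce` is a hypothetical helper.

```java
class NonConvergingSubstitutionSketch {
  // One inference round: rewrite c using c = g(a,b), then a using a = f(b,c).
  static String substituteOnce(String expr) {
    return expr.replace("c", "g(a,b)").replace("a", "f(b,c)");
  }

  public static void main(String[] args) {
    String expr = "f(b,c)"; // right-hand side of a = f(b,c)
    for (int round = 1; round <= 3; round++) {
      expr = substituteOnce(expr);
      // The expression keeps growing, so no fixed point is reached.
      System.out.println("round " + round + ": length " + expr.length());
    }
  }
}
```

Each round reintroduces `c`, so the next round always has something left to rewrite.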

Also, the EqualNullSafe constraint between the name and the child of an 
alias, added when propagating aliases, is meaningless:
```scala
allConstraints += EqualNullSafe(e, a.toAttribute)
```
It just produces redundant constraints.

## How was this patch tested?

Unit test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark QueryPlanConstraints

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19201


commit 036e846a571f7aea3ad28b875afd5f9d714c25a5
Author: Wang Gengliang 
Date:   2017-09-12T09:06:09Z

improve QueryPlanConstraints




---




[GitHub] spark issue #19200: Get default Locale

2017-09-12 Thread Gschiavon
Github user Gschiavon commented on the issue:

https://github.com/apache/spark/pull/19200
  
As far as I can see, other test cases set the default locale to US. 
This test case fails when the JVM default locale differs from "US". 
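
For reference, a hedged Java sketch of how a test can run code under a temporary default locale and restore the previous one afterwards. `withDefaultLocale` is a hypothetical helper, not something taken from the Spark test suites.

```java
import java.util.Locale;
import java.util.function.Supplier;

class DefaultLocaleSketch {
  // Runs `body` with the JVM default locale temporarily set to `locale`,
  // restoring the previous default afterwards even if `body` throws.
  static <T> T withDefaultLocale(Locale locale, Supplier<T> body) {
    Locale previous = Locale.getDefault();
    Locale.setDefault(locale);
    try {
      return body.get();
    } finally {
      Locale.setDefault(previous);
    }
  }

  public static void main(String[] args) {
    String german = withDefaultLocale(Locale.GERMANY, () -> String.format("%.1f", 1.5));
    String us = withDefaultLocale(Locale.US, () -> String.format("%.1f", 1.5));
    System.out.println(german + " vs " + us); // 1,5 vs 1.5
  }
}
```

This reproduces the failure mode on any machine: the same format call yields different strings under different default locales.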



---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138291926
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java
 ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.upward;
+
+import java.util.OptionalLong;
+
+/**
+ * An interface to represent statistics for a data source.
+ */
+public interface Statistics {
+  long sizeInBytes();
--- End diff --

and now is a good time to fix it :)


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138291376
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java
 ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.upward;
+
+import java.util.OptionalLong;
+
+/**
+ * An interface to represent statistics for a data source.
+ */
+public interface Statistics {
+  long sizeInBytes();
--- End diff --

like, I get that it's non-optional at the moment, but it's odd that we have 
a method that the normal implementor will have to replace with

```
public long sizeInBytes() {
  return Long.MAX_VALUE;
}
```


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138290363
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java 
---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+
+/**
+ * An immutable case-insensitive string-to-string map, which is used to represent data source
+ * options.
+ */
+public class DataSourceV2Options {
+  private Map<String, String> keyLowerCasedMap;
--- End diff --

nit: final


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138289995
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java 
---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+
+/**
+ * An immutable case-insensitive string-to-string map, which is used to represent data source
+ * options.
+ */
+public class DataSourceV2Options {
+  private Map<String, String> keyLowerCasedMap;
+
+  private String toLowerCase(String key) {
+    return key.toLowerCase(Locale.ROOT);
+  }
+
+  public DataSourceV2Options(Map<String, String> originalMap) {
+    keyLowerCasedMap = new HashMap<>(originalMap.size());
+    for (Map.Entry<String, String> entry : originalMap.entrySet()) {
+      keyLowerCasedMap.put(toLowerCase(entry.getKey()), entry.getValue());
+    }
+  }
+
+  /**
+   * Returns the option value to which the specified key is mapped, case-insensitively,
+   * or {@code null} if there is no mapping for the key.
+   */
+  public String get(String key) {
+    return keyLowerCasedMap.get(toLowerCase(key));
+  }
+
+  /**
+   * Returns the option value to which the specified key is mapped, case-insensitively,
+   * or {@code defaultValue} if there is no mapping for the key.
+   */
+  public String getOrDefault(String key, String defaultValue) {
--- End diff --

if the above returns `Optional`, you probably don't need this method.


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138289921
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java 
---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+
+/**
+ * An immutable case-insensitive string-to-string map, which is used to represent data source
+ * options.
+ */
+public class DataSourceV2Options {
+  private Map<String, String> keyLowerCasedMap;
+
+  private String toLowerCase(String key) {
+    return key.toLowerCase(Locale.ROOT);
+  }
+
+  public DataSourceV2Options(Map<String, String> originalMap) {
+    keyLowerCasedMap = new HashMap<>(originalMap.size());
+    for (Map.Entry<String, String> entry : originalMap.entrySet()) {
+      keyLowerCasedMap.put(toLowerCase(entry.getKey()), entry.getValue());
+    }
+  }
+
+  /**
+   * Returns the option value to which the specified key is mapped, case-insensitively,
+   * or {@code null} if there is no mapping for the key.
--- End diff --

Can we return `Optional<String>` here? The JDK maintainers wish they could 
have returned an Optional from `Map.get`.
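
A minimal sketch of what that suggestion could look like. `CaseInsensitiveOptionsSketch` is a hypothetical variant of the class in the diff, written only to show the shape of the API; it is not the code that was merged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical variant of the options class from the diff; illustrative only.
final class CaseInsensitiveOptionsSketch {
  private final Map<String, String> keyLowerCasedMap;

  CaseInsensitiveOptionsSketch(Map<String, String> originalMap) {
    keyLowerCasedMap = new HashMap<>(originalMap.size());
    for (Map.Entry<String, String> entry : originalMap.entrySet()) {
      keyLowerCasedMap.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
    }
  }

  // Empty when there is no mapping, so callers never see null.
  Optional<String> get(String key) {
    return Optional.ofNullable(keyLowerCasedMap.get(key.toLowerCase(Locale.ROOT)));
  }

  public static void main(String[] args) {
    Map<String, String> raw = new HashMap<>();
    raw.put("Path", "/tmp/data");
    CaseInsensitiveOptionsSketch options = new CaseInsensitiveOptionsSketch(raw);
    System.out.println(options.get("PATH").orElse("none"));       // /tmp/data
    System.out.println(options.get("format").orElse("parquet"));  // parquet
  }
}
```

With this shape, `get(key).orElse(defaultValue)` covers what a separate `getOrDefault` would do.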


---




[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-12 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
Any more comments on this PR? It has been about a month since the last 
modification.


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138289364
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import java.io.Serializable;
+
+/**
+ * A read task returned by a data source reader and is responsible to create the data reader.
+ * The relationship between `ReadTask` and `DataReader` is similar to `Iterable` and `Iterator`.
+ *
+ * Note that, the read task will be serialized and sent to executors, then the data reader will be
+ * created on executors and do the actual reading.
+ */
+public interface ReadTask extends Serializable {
+  /**
+   * The preferred locations for this read task to run faster, but Spark can't guarantee that this
+   * task will always run on these locations. Implementations should make sure that it can
+   * be run on any location.
+   */
+  default String[] preferredLocations() {
--- End diff --

Can we have a class `Host` to represent this? It just makes the API 
clearer.
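
One possible shape for such a type, as a sketch only. `Host` is the name from the comment above; the field, the accessor and `ReadTaskSketch` are assumptions made for the example.

```java
import java.util.Objects;

class HostSketchDemo {
  public static void main(String[] args) {
    ReadTaskSketch task = new ReadTaskSketch() {};
    System.out.println(task.preferredLocations().length); // 0
    System.out.println(new Host("worker-1"));             // Host(worker-1)
  }
}

// Hypothetical value type wrapping a host name instead of a bare String.
final class Host {
  private final String hostname;

  Host(String hostname) {
    this.hostname = Objects.requireNonNull(hostname, "hostname");
  }

  String hostname() {
    return hostname;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof Host && ((Host) other).hostname.equals(hostname);
  }

  @Override
  public int hashCode() {
    return hostname.hashCode();
  }

  @Override
  public String toString() {
    return "Host(" + hostname + ")";
  }
}

// Simplified stand-in for the interface in the diff, with no preferred hosts by default.
interface ReadTaskSketch {
  default Host[] preferredLocations() {
    return new Host[0];
  }
}
```

The signature then says what the strings mean, at the cost of one more small class in the public API.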


---




[GitHub] spark issue #19200: Get default Locale

2017-09-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19200
  
No, because the project is purposely not locale sensitive at this level.


---




[GitHub] spark pull request #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are ...

2017-09-12 Thread jmchung
Github user jmchung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19199#discussion_r138287636
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
 ---
@@ -109,6 +109,20 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister {
   }
 }
 
+    if (requiredSchema.length == 1 &&
+      requiredSchema.head.name == parsedOptions.columnNameOfCorruptRecord) {
+      throw new AnalysisException(
+        "Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the\n" +
+          "referenced columns only include the internal corrupt record column\n" +
+          s"(named ${parsedOptions.columnNameOfCorruptRecord} by default). For example:\n" +
--- End diff --

Thanks @viirya. Should we also replace the weird part in 
`JsonFileFormat` with `_corrupt_record`?
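
The condition in the diff can be sketched in plain Java as follows. This is a stand-in, not the Spark code: it uses `IllegalArgumentException` in place of `AnalysisException`, a list of column names in place of the required schema, and a hypothetical helper name.

```java
import java.util.Arrays;
import java.util.List;

class CorruptRecordCheckSketch {
  // Throws if the scan would read nothing but the corrupt-record column.
  static void checkRequiredColumns(List<String> requiredColumns, String corruptRecordColumn) {
    if (requiredColumns.size() == 1 && requiredColumns.get(0).equals(corruptRecordColumn)) {
      throw new IllegalArgumentException(
          "Queries that reference only the corrupt record column (named "
              + corruptRecordColumn + ") are disallowed");
    }
  }

  public static void main(String[] args) {
    // Accepted: another column is referenced as well.
    checkRequiredColumns(Arrays.asList("id", "_corrupt_record"), "_corrupt_record");
    try {
      checkRequiredColumns(Arrays.asList("_corrupt_record"), "_corrupt_record");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Passing the configured column name into the message, as the diff does, keeps the text accurate when the option is overridden.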


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138287456
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java
 ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.upward;
+
+import java.util.OptionalLong;
+
+/**
+ * An interface to represent statistics for a data source.
+ */
+public interface Statistics {
+  long sizeInBytes();
--- End diff --

OptionalLong for sizeInBytes? It's not obvious that sizeInBytes is well 
defined for e.g. JDBC datasources, but row count can generally be easily 
estimated from the query plan.
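
A sketch of what an optional-valued statistics interface might look like. `StatisticsSketch` and its `numRows` method are assumptions for illustration, not the interface in the PR.

```java
import java.util.OptionalLong;

class StatisticsSketchDemo {
  public static void main(String[] args) {
    // A source that knows nothing simply inherits the defaults.
    StatisticsSketch unknown = new StatisticsSketch() {};
    // A source that knows its size overrides only that method.
    StatisticsSketch known = new StatisticsSketch() {
      @Override
      public OptionalLong sizeInBytes() {
        return OptionalLong.of(1024L);
      }
    };
    System.out.println(unknown.sizeInBytes().isPresent()); // false
    System.out.println(known.sizeInBytes().getAsLong());   // 1024
  }
}

// Hypothetical shape: every statistic is optional and defaults to "unknown".
interface StatisticsSketch {
  default OptionalLong sizeInBytes() {
    return OptionalLong.empty();
  }

  default OptionalLong numRows() {
    return OptionalLong.empty();
  }
}
```

An implementor without a meaningful size then has nothing to override, instead of returning `Long.MAX_VALUE` as a sentinel.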


---




[GitHub] spark issue #19200: Get default Locale

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19200
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138286429
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.Strategy
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, ColumnPruningSupport, FilterPushDownSupport}
+
+object DataSourceV2Strategy extends Strategy {
+  // TODO: write path
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+    case PhysicalOperation(projects, filters, DataSourceV2Relation(output, reader)) =>
+      val attrMap = AttributeMap(output.zip(output))
+
+      val projectSet = AttributeSet(projects.flatMap(_.references))
+      val filterSet = AttributeSet(filters.flatMap(_.references))
+
+      // Match original case of attributes.
+      // TODO: nested fields pruning
+      val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap)
+      reader match {
+        case r: ColumnPruningSupport =>
+          r.pruneColumns(requiredColumns.toStructType)
+        case _ =>
+      }
+
+      val stayUpFilters: Seq[Expression] = reader match {
+        case r: CatalystFilterPushDownSupport =>
+          r.pushCatalystFilters(filters.toArray)
+
+        case r: FilterPushDownSupport =>
--- End diff --

like, we might as well not document it if the code can document it


---




[GitHub] spark pull request #19200: Set default Locale

2017-09-12 Thread Gschiavon
GitHub user Gschiavon opened a pull request:

https://github.com/apache/spark/pull/19200

Set default Locale

## What changes were proposed in this pull request?

Get the default Locale in UtilsSuite.scala so that the suite works with 
Locales other than US.
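
As a general illustration of why a test suite can depend on the JVM's 
Locale (this is not the code changed by this pull request, and 
`LocaleSketch` is a hypothetical name), number formatting differs between 
Locales unless one is passed explicitly:

```java
import java.util.Locale;

class LocaleSketch {
  // Formats with an explicit Locale so the result does not depend on the JVM default.
  static String format(Locale locale, double value) {
    return String.format(locale, "%.1f", value);
  }

  public static void main(String[] args) {
    // The decimal separator differs between these two Locales.
    System.out.println(format(Locale.US, 1.5));
    System.out.println(format(Locale.GERMANY, 1.5));
  }
}
```

A test that compares against a string with a '.' separator passes only when 
the Locale in effect is one that formats numbers that way.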

## How was this patch tested?

Running UtilsSuite.scala 

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Gschiavon/spark fix/locale

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19200


commit 632526ba3e9a4d72133202cf0bfcc8a997dc9cb9
Author: German Schiavon 
Date:   2017-09-12T08:33:00Z

Set default Locale




---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138286323
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.Strategy
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, ColumnPruningSupport, FilterPushDownSupport}
+
+object DataSourceV2Strategy extends Strategy {
+  // TODO: write path
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+    case PhysicalOperation(projects, filters, DataSourceV2Relation(output, reader)) =>
+      val attrMap = AttributeMap(output.zip(output))
+
+      val projectSet = AttributeSet(projects.flatMap(_.references))
+      val filterSet = AttributeSet(filters.flatMap(_.references))
+
+      // Match original case of attributes.
+      // TODO: nested fields pruning
+      val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap)
+      reader match {
+        case r: ColumnPruningSupport =>
+          r.pruneColumns(requiredColumns.toStructType)
+        case _ =>
+      }
+
+      val stayUpFilters: Seq[Expression] = reader match {
+        case r: CatalystFilterPushDownSupport =>
+          r.pushCatalystFilters(filters.toArray)
+
+        case r: FilterPushDownSupport =>
--- End diff --

Can FilterPushDownSupport be an interface which extends 
CatalystFilterPushDownSupport and provides a default impl of pruning the 
Catalyst filters? Like, this code can just go there as a method:

```
interface FilterPushDownSupport extends CatalystFilterPushDownSupport {
    List<Filter> pushFilters(List<Filter> filters);

    @Override
    default List<Expression> pushCatalystFilters(List<Expression> filters) {
        // Translated Filter -> the Catalyst expression it came from.
        Map<Filter, Expression> translatedMap = new HashMap<>();
        List<Expression> nonconvertiblePredicates = new ArrayList<>();

        for (Expression catalystFilter : filters) {
            // Sketch only: the real translateFilter returns a Scala Option.
            Optional<Filter> translatedFilter =
                DataSourceStrategy.translateFilter(catalystFilter);
            if (translatedFilter.isPresent()) {
                translatedMap.put(translatedFilter.get(), catalystFilter);
            } else {
                nonconvertiblePredicates.add(catalystFilter);
            }
        }

        // Push the translated Filters (the map keys), not the Catalyst values.
        List<Filter> unhandledFilters =
            pushFilters(new ArrayList<>(translatedMap.keySet()));
        // Map the rejected Filters back to their Catalyst expressions.
        return Stream.concat(
                nonconvertiblePredicates.stream(),
                unhandledFilters.stream().map(translatedMap::get))
            .collect(toList());
    }
}
```

and we can trivially avoid the interface confusion (it's truly confusing 
if you can implement both interfaces)
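
For anyone who wants to try the default-method idea outside Spark, here is 
a self-contained sketch with stand-in types. `Expr`, `CatalystPushDown`, 
`SimplePushDown`, `translate` and `IdOnlyReader` are all hypothetical names, 
and simple filters are modelled as plain strings; none of this is Spark API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Stand-in for a Catalyst expression (hypothetical, not a Spark class).
final class Expr {
  final String text;
  final boolean translatable;

  Expr(String text, boolean translatable) {
    this.text = text;
    this.translatable = translatable;
  }
}

interface CatalystPushDown {
  // Returns the expressions the source could not handle.
  List<Expr> pushCatalystFilters(List<Expr> filters);
}

interface SimplePushDown extends CatalystPushDown {
  // Returns the simple filters the source could not handle.
  List<String> pushFilters(List<String> filters);

  // Hypothetical translator: only some expressions have a simple form.
  static Optional<String> translate(Expr e) {
    return e.translatable ? Optional.of(e.text) : Optional.empty();
  }

  @Override
  default List<Expr> pushCatalystFilters(List<Expr> filters) {
    Map<String, Expr> translated = new LinkedHashMap<>();
    List<Expr> unhandled = new ArrayList<>();
    for (Expr e : filters) {
      Optional<String> simple = translate(e);
      if (simple.isPresent()) {
        translated.put(simple.get(), e);
      } else {
        unhandled.add(e);
      }
    }
    // Push the translated filters and map the rejected ones back.
    for (String rejected : pushFilters(new ArrayList<>(translated.keySet()))) {
      unhandled.add(translated.get(rejected));
    }
    return unhandled;
  }
}

// Hypothetical reader that only accepts filters on the "id" column.
class IdOnlyReader implements SimplePushDown {
  @Override
  public List<String> pushFilters(List<String> filters) {
    List<String> rejected = new ArrayList<>();
    for (String f : filters) {
      if (!f.startsWith("id ")) {
        rejected.add(f);
      }
    }
    return rejected;
  }
}
```

The planner only ever calls `pushCatalystFilters`, so a reader implements 
exactly one of the two methods and there is no ambiguity about precedence.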


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138282764
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/CatalystFilterPushDownSupport.java
 ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.downward;
+
+import org.apache.spark.annotation.Experimental;
+import org.apache.spark.annotation.InterfaceStability;
+import org.apache.spark.sql.catalyst.expressions.Expression;
+
+/**
+ * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to push down
+ * arbitrary expressions as predicates to the data source.
+ */
+@Experimental
+@InterfaceStability.Unstable
+public interface CatalystFilterPushDownSupport {
+
+  /**
+   * Push down filters, returns unsupported filters.
+   */
+  Expression[] pushCatalystFilters(Expression[] filters);
--- End diff --

Any chance this could push Java lists? They're just more idiomatic in a 
Java interface.
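
A small sketch of the list-based shape, assuming a generic stand-in `E` in 
place of Catalyst's Expression. `ListFilterPushDown` and `ListInteropSketch` 
are hypothetical names, not part of the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical list-based variant; E stands in for Catalyst's Expression.
interface ListFilterPushDown<E> {
  // Push down filters, returns the unsupported ones.
  List<E> pushCatalystFilters(List<E> filters);
}

class ListInteropSketch {
  // An array-based caller can still bridge to the list-based method.
  static String[] pushArray(ListFilterPushDown<String> source, String[] filters) {
    List<String> unsupported = source.pushCatalystFilters(Arrays.asList(filters));
    return unsupported.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // A source that supports nothing returns every filter unchanged.
    ListFilterPushDown<String> supportsNothing = fs -> new ArrayList<>(fs);
    System.out.println(Arrays.toString(pushArray(supportsNothing, new String[] {"a", "b"})));
  }
}
```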


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread j-baker
Github user j-baker commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138281654
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.Strategy
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, ColumnPruningSupport, FilterPushDownSupport}
+
+object DataSourceV2Strategy extends Strategy {
+  // TODO: write path
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+    case PhysicalOperation(projects, filters, DataSourceV2Relation(output, reader)) =>
+      val attrMap = AttributeMap(output.zip(output))
+
+      val projectSet = AttributeSet(projects.flatMap(_.references))
+      val filterSet = AttributeSet(filters.flatMap(_.references))
+
+      // Match original case of attributes.
+      // TODO: nested fields pruning
+      val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap)
+      reader match {
+        case r: ColumnPruningSupport =>
+          r.pruneColumns(requiredColumns.toStructType)
+        case _ =>
+      }
+
+      val stayUpFilters: Seq[Expression] = reader match {
+        case r: CatalystFilterPushDownSupport =>
+          r.pushCatalystFilters(filters.toArray)
+
+        case r: FilterPushDownSupport =>
--- End diff --

Considering that there is a translation between Catalyst filters and 
Filters, it's probably worth _just_ doing the catalyst one, and providing the 
user with the translator if they want to do the Filter approach?
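
A rough sketch of the "single interface plus translator" shape, using 
generic stand-ins rather than Spark types. `FilterTranslator`, 
`translateAll`, `TranslatorSketch` and `toInt` are hypothetical names; here 
strings play the role of Catalyst expressions and integers the role of 
translated Filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper that a source can call from its own push-down method.
final class FilterTranslator {
  // Translated forms are appended to `accepted`; untranslatable inputs are returned.
  static <E, F> List<E> translateAll(
      List<E> exprs, Function<E, Optional<F>> translate, List<F> accepted) {
    List<E> untranslatable = new ArrayList<>();
    for (E expr : exprs) {
      Optional<F> translated = translate.apply(expr);
      if (translated.isPresent()) {
        accepted.add(translated.get());
      } else {
        untranslatable.add(expr);
      }
    }
    return untranslatable;
  }
}

class TranslatorSketch {
  // Hypothetical translator: only integer literals have a simple form.
  static Optional<Integer> toInt(String expr) {
    try {
      return Optional.of(Integer.parseInt(expr));
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    List<Integer> accepted = new ArrayList<>();
    List<String> rest = FilterTranslator.translateAll(
        Arrays.asList("1", "x", "2"), TranslatorSketch::toInt, accepted);
    System.out.println(accepted + " " + rest);
  }
}
```

In this shape the engine exposes only the Catalyst-level method, and a 
source that prefers the simpler Filter form opts in by calling the helper.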


---




[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19191
  
**[Test build #81667 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81667/testReport)**
 for PR 19191 at commit 
[`5f4ce99`](https://github.com/apache/spark/commit/5f4ce997f6f30cd0d59bc2e2f4396f495c3c0fd8).


---




[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...

2017-09-12 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19191
  
ok to test


---




[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18538
  
**[Test build #81666 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81666/testReport)**
 for PR 18538 at commit 
[`a7c1481`](https://github.com/apache/spark/commit/a7c14818283467276a8f7eaa30b074a0f25237dc).


---



