[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18592 **[Test build #81679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81679/testReport)** for PR 18592 at commit [`06e306f`](https://github.com/apache/spark/commit/06e306fdb4199a8c7850a6a370ce67aeac0cdf8e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19203: [BUILD] Close stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19203 @srowen, it looks like `19091` was missed. The rest of mine is a subset of the current list. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19199 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19132 Thanks @HyukjinKwon, I will ping Josh about this thing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19201 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19132 @jerryshao, for triggering tests on Jenkins, I think this also needs to be added manually by a Jenkins admin, if I understood correctly. In my case, I asked Josh Rosen about it privately via email. I am quite sure you are facing the same issue that I (and Holden, Felix and Takuya) ran into before. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19201 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81671/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19201 **[Test build #81671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81671/testReport)** for PR 19201 at commit [`7b414fa`](https://github.com/apache/spark/commit/7b414fafcf53e9e9e79a403a47e409238c0b9761). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19132 **[Test build #81678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81678/testReport)** for PR 19132 at commit [`25fe22c`](https://github.com/apache/spark/commit/25fe22cddde276f846fd4808de1b575a87b1c059). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19132 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19199 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81673/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19195: [DOCS] Fix unreachable links in the document
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/19195#discussion_r138338917 --- Diff: docs/building-spark.md --- @@ -111,7 +111,7 @@ should run continuous compilation (i.e. wait for changes). However, this has not extensively. A couple of gotchas to note: * it only scans the paths `src/main` and `src/test` (see -[docs](http://scala-tools.org/mvnsites/maven-scala-plugin/usage_cc.html)), so it will only work +[docs](http://davidb.github.io/scala-maven-plugin/example_compile.html)), so it will only work --- End diff -- I checked the [Internet Archive](https://web.archive.org/web/20160314050540/http://scala-tools.org/mvnsites/maven-scala-plugin/usage_cc.html) and found that the link you suggested is more appropriate. I'll update it soon. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19199 **[Test build #81673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81673/testReport)** for PR 19199 at commit [`b7fbc42`](https://github.com/apache/spark/commit/b7fbc42b5d50cb4380162b19aecd386c786659fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15544 **[Test build #81677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81677/testReport)** for PR 15544 at commit [`cd61382`](https://github.com/apache/spark/commit/cd61382aa7f5ef54059edead709da6b818267801). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...
GitHub user wzhfy reopened a pull request: https://github.com/apache/spark/pull/15544 [SPARK-17997] [SQL] Add an aggregation function for counting distinct values for multiple intervals ## What changes were proposed in this pull request? This work is a part of [SPARK-17074](https://issues.apache.org/jira/browse/SPARK-17074) to compute equi-height histograms. Equi-height histogram is an array of bins. A bin consists of two endpoints which form an interval of values and the ndv in that interval. This PR creates a new aggregate function, given an array of endpoints, counting distinct values (ndv) in intervals among those endpoints. This PR also refactors `HyperLogLogPlusPlus` by extracting a helper class `HyperLogLogPlusPlusHelper`, where the underlying HLLPP algorithm locates. ## How was this patch tested? Add new test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wzhfy/spark countIntervals Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15544.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15544 commit 9960fab07d2075d2beba1fea7024fe6dd30d9eef Author: wangzhenhua Date: 2016-10-14T06:23:39Z refactor hllpp commit 5aa835ce2769a34f88bacb389c4af30f52459226 Author: wangzhenhua Date: 2016-10-17T13:18:36Z add IntervalDistinctApprox commit 840171efa08c70da83af54bc726079a88fb7a1d2 Author: wangzhenhua Date: 2016-10-19T01:58:32Z add test cases commit a6417e7df5cf44ba9f75a7d66d46258a56b0082f Author: wangzhenhua Date: 2016-10-20T04:46:57Z convert HLLPP and IntervalDistinctApprox to ImperativeAggregate commit 74d7ae7ac817d427a264b67f580fe39bbb49811b Author: wangzhenhua Date: 2016-11-04T08:36:23Z add negative column type test and update doc --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
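[Editor's note] A minimal plain-Scala sketch of the semantics described above, under the assumption that "ndv per interval" means the number of distinct values falling between consecutive endpoints; the actual aggregate approximates the counts with HyperLogLogPlusPlus (via the extracted `HyperLogLogPlusPlusHelper`) and handles bin boundaries more carefully than this simplification.

```scala
// Assumed semantics only: exact distinct counts per interval; the real
// implementation uses HLL++ sketches rather than materializing values.
def ndvPerInterval(endpoints: Seq[Double], values: Seq[Double]): Seq[Long] =
  endpoints.sliding(2).map { case Seq(lo, hi) =>
    values.filter(v => v >= lo && v <= hi).distinct.size.toLong
  }.toSeq

ndvPerInterval(Seq(0.0, 10.0, 20.0), Seq(1.0, 1.0, 5.0, 12.0, 12.0, 19.0))
// => Seq(2, 2): two distinct values in [0, 10], two in [10, 20]
```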
[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19175 **[Test build #81676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81676/testReport)** for PR 19175 at commit [`709c2d3`](https://github.com/apache/spark/commit/709c2d3d81e331d6f69d8ed7ecdabe035142d296). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14158 hey @nblintao, do you maybe happen to have some time to continue this one? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11494: [SPARK-10399][CORE][SQL] Introduce OffHeapMemoryBlock to...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11494 gentle ping @yzotov --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19205: [SPARK-21982] Set locale to US
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19205 **[Test build #3918 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3918/testReport)** for PR 19205 at commit [`22bbb92`](https://github.com/apache/spark/commit/22bbb924eae20b8d3f899008317f5d623c6a49ef). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19175 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81670/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19175 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19175 **[Test build #81670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81670/testReport)** for PR 19175 at commit [`da36e37`](https://github.com/apache/spark/commit/da36e37df9c31901975c29dfa77cb7d648e94f40). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19205: [SPARK-21982] Set locale to US
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19205 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19205: [SPARK-21982] Set locale to US
GitHub user Gschiavon opened a pull request: https://github.com/apache/spark/pull/19205 [SPARK-21982] Set locale to US ## What changes were proposed in this pull request? In UtilsSuite the expected output assumes the US locale, but at format time the locale was not set, so the JVM default locale was used; when that default differs from US, the test fails. ## How was this patch tested? Unit test (UtilsSuite) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Gschiavon/spark fix/test-locale Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19205.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19205 commit 22bbb924eae20b8d3f899008317f5d623c6a49ef Author: German Schiavon Date: 2017-09-12T12:05:03Z Set locale to US --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19201 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81668/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19201 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19201 **[Test build #81668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81668/testReport)** for PR 19201 at commit [`036e846`](https://github.com/apache/spark/commit/036e846a571f7aea3ad28b875afd5f9d714c25a5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19204 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19204: [SPARK-21981][PYTHON][ML] Added Python interface ...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19204 [SPARK-21981][PYTHON][ML] Added Python interface for ClusteringEvaluator ## What changes were proposed in this pull request? Added Python interface for ClusteringEvaluator ## How was this patch tested? Manual test, eg. the example Python code in the comments. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-21981 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19204.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19204 commit 31b3c6c7e1298a1b4bf1fc969cee50534970ab0a Author: Marco Gaido Date: 2017-09-05T17:22:21Z Added python interface for ClusteringEvaluator --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
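[Editor's note] For context, a rough usage sketch of the Scala evaluator that this PR exposes to Python (the `ClusteringEvaluator` merged in #18538, discussed below); the Python interface is expected to mirror it. `spark` is assumed to be an active SparkSession, and the data path and column names are illustrative assumptions, not taken from this PR.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator

// Assumed sample data: any DataFrame with a "features" vector column works.
val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
val predictions = new KMeans().setK(2).fit(dataset).transform(dataset)

// Higher silhouette means better-separated, more cohesive clusters.
val silhouette = new ClusteringEvaluator()
  .setFeaturesCol("features")
  .setPredictionCol("prediction")
  .evaluate(predictions)
```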
[GitHub] spark issue #19203: [BUILD] Close stale PRs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19203 **[Test build #81675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81675/testReport)** for PR 19203 at commit [`6386e0c`](https://github.com/apache/spark/commit/6386e0c6ef027d2858d0860c6f9dd472e8ede6aa). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19203: [BUILD] Close stale PRs
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/19203 [BUILD] Close stale PRs Closes #18522 Closes #17722 Closes #18879 Closes #18891 Closes #18806 Closes #18948 Closes #18949 Closes #19070 Closes #19039 Closes #19142 Closes #18515 Closes #19154 Closes #19162 Closes #19187 You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark CloseStalePRs3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19203.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19203 commit 6386e0c6ef027d2858d0860c6f9dd472e8ede6aa Author: Sean Owen Date: 2017-09-12T11:19:41Z Close stale PRs. Closes #18522 Closes #17722 Closes #18879 Closes #18891 Closes #18806 Closes #18948 Closes #18949 Closes #19070 Closes #19039 Closes #19142 Closes #18515 Closes #19154 Closes #19162 Closes #19187 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19134: [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19134 **[Test build #81674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81674/testReport)** for PR 19134 at commit [`d888f7b`](https://github.com/apache/spark/commit/d888f7b4b457d537c6875de31cbd77f5460c7d3b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81669/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19185 **[Test build #81669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81669/testReport)** for PR 19185 at commit [`eb8f6b4`](https://github.com/apache/spark/commit/eb8f6b431982d6f1f0118965391560f94812ab53). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19130 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81665/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19130 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19130 **[Test build #81665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81665/testReport)** for PR 19130 at commit [`4bbc09d`](https://github.com/apache/spark/commit/4bbc09d68c21496d97be3e2d9f781e7ca0bbf7e7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81663/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19198 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19198 **[Test build #81663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81663/testReport)** for PR 19198 at commit [`6f3859c`](https://github.com/apache/spark/commit/6f3859c38392c9d1e5b5be9883610ecb26513736). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19199 **[Test build #81673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81673/testReport)** for PR 19199 at commit [`b7fbc42`](https://github.com/apache/spark/commit/b7fbc42b5d50cb4380162b19aecd386c786659fd). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19181 **[Test build #81672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81672/testReport)** for PR 19181 at commit [`ae7fbc4`](https://github.com/apache/spark/commit/ae7fbc48b349f5608aaef9f66e9e692354b72d18). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19201 **[Test build #81671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81671/testReport)** for PR 19201 at commit [`7b414fa`](https://github.com/apache/spark/commit/7b414fafcf53e9e9e79a403a47e409238c0b9761). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19181 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18538 @yanboliang yes, thank you very much. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18538 @mgaido91 I opened [SPARK-21981](https://issues.apache.org/jira/browse/SPARK-21981) for the Python API; would you like to work on it? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18538 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19175 **[Test build #81670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81670/testReport)** for PR 19175 at commit [`da36e37`](https://github.com/apache/spark/commit/da36e37df9c31901975c29dfa77cb7d648e94f40). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18538 I'm merging this into master, thanks all. If anyone has more comments, we can address them in follow-up PRs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19175 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81664/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16422 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16422 **[Test build #81664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81664/testReport)** for PR 16422 at commit [`0d49ee9`](https://github.com/apache/spark/commit/0d49ee91508c908daef672a04768c15a9e5c5dba). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/19175 Could you help review this PR? @jiangxb1987 @hvanhovell --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19200: Get default Locale
Github user Gschiavon closed the pull request at: https://github.com/apache/spark/pull/19200 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19200: Get default Locale
Github user Gschiavon commented on the issue: https://github.com/apache/spark/pull/19200 Ok, I got it. I will do that then. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19182: [SPARK-21970][Core] Fix Redundant Throws Declarations in...
Github user original-brownbear commented on the issue: https://github.com/apache/spark/pull/19182 @srowen done, all changes to `org.apache.hive.*` reverted :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19185 **[Test build #81669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81669/testReport)** for PR 19185 at commit [`eb8f6b4`](https://github.com/apache/spark/commit/eb8f6b431982d6f1f0118965391560f94812ab53). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19199 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/19202 ping @cloud-fan @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81661/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19199 **[Test build #81661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81661/testReport)** for PR 19199 at commit [`e703fc8`](https://github.com/apache/spark/commit/e703fc8f33d1fde90d790057481f1d23f466f378). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/19185 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19182: [SPARK-21970][Core] Fix Redundant Throws Declarations in...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19182 Ah OK, one more subtle thing @original-brownbear -- the code you see in org/apache/hive packages is, I believe, copied from Hive. Therefore it's probably best to leave it as-is, because that makes it easier to update if it hasn't varied at all from its source. Could you revert those? Otherwise looks OK. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19202 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19202: [SPARK-21980][SQL]References in grouping function...
GitHub user DonnyZone opened a pull request: https://github.com/apache/spark/pull/19202 [SPARK-21980][SQL]References in grouping functions should be indexed with resolver ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-21980 This PR fixes the issue in ResolveGroupingAnalytics rule, which indexes the column references in grouping functions without considering case sensitive configurations. ## How was this patch tested? unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/DonnyZone/spark ResolveGroupingAnalytics Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19202.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19202 commit ac61a6620e59447c575092bee5d4d7f0af99695c Author: donnyzone Date: 2017-09-12T09:28:01Z SPARK-21980 commit b08fd9301cdbd4c1a29d5eb322eacd1cf2ffc546 Author: donnyzone Date: 2017-09-12T09:34:53Z rename --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
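[Editor's note] A hypothetical reproduction of the case-sensitivity problem described above (illustrative only, not the unit test added by the patch); it assumes the stock `spark.sql.caseSensitive` flag, the DataFrame `cube` API, and the `grouping` function from `org.apache.spark.sql.functions`, with `spark` as an active SparkSession.

```scala
import org.apache.spark.sql.functions.{grouping, sum}
import spark.implicits._

spark.conf.set("spark.sql.caseSensitive", "false")
val df = Seq(("x", 1), ("y", 2)).toDF("a", "b")

// With case-insensitive analysis, the upper-case reference inside grouping()
// should resolve to column `a` instead of failing to be indexed.
df.cube("a").agg(grouping("A"), sum("b")).show()
```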
[GitHub] spark pull request #19190: [SPARK-21976][DOC] Fix wrong documentation for Me...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19190 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19190: [SPARK-21976][DOC] Fix wrong documentation for Mean Abso...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19190 Merged to master/2.2/2.1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19200: Get default Locale
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19200 Ah, the problem is really the reverse, but there is a problem. `"...".format(...)` is locale-sensitive in Scala, and this is a place where that matters. The `Utils` method needs to change to use `formatLocal` with `Locale.US`. Open a JIRA for it, close this, and reopen against `master` with that fix. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
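[Editor's note] A minimal sketch of the point above (not the actual `Utils` code): Scala's `String.format` follows the JVM default locale, while `formatLocal` pins it to an explicit one.

```scala
import java.util.Locale

val mb = 1234.567
"%.1f MB".format(mb)                  // e.g. "1234,6 MB" under Locale.FRANCE
"%.1f MB".formatLocal(Locale.US, mb)  // "1234.6 MB" regardless of the JVM default
```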
[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19191 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81667/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19191 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19191 **[Test build #81667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81667/testReport)** for PR 19191 at commit [`5f4ce99`](https://github.com/apache/spark/commit/5f4ce997f6f30cd0d59bc2e2f4396f495c3c0fd8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18538 @yanboliang I addressed them. Thank you very much for your time, help and your great reviews. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19201 **[Test build #81668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81668/testReport)** for PR 19201 at commit [`036e846`](https://github.com/apache/spark/commit/036e846a571f7aea3ad28b875afd5f9d714c25a5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18538 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81666/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18538 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18538 **[Test build #81666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81666/testReport)** for PR 18538 at commit [`a7c1481`](https://github.com/apache/spark/commit/a7c14818283467276a8f7eaa30b074a0f25237dc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints fr...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/19201 [SPARK-21979][SQL]Improve QueryPlanConstraints framework ## What changes were proposed in this pull request? Improve QueryPlanConstraints framework, make it robust and simple. In https://github.com/apache/spark/pull/15319, constraints for expressions like `a = f(b, c)` is resolved. However, for expressions like ```scala a = f(b, c) && c = g(a, b) ``` The current QueryPlanConstraints framework will produce non-converging constraints. Essentially, the problem is caused by having both the name and child of aliases in the same constraint set. We infer constraints, and push down constraints as predicates in filters, later on these predicates are propagated as constraints, etc.. Simply using the alias names only can resolve these problems. The size of constraints is reduced without losing any information. We can always get these inferred constraints on child of aliases when pushing down filters. Also, the EqualNullSafe between name and child in propagating alias is meaningless ```scala allConstraints += EqualNullSafe(e, a.toAttribute) ``` It just produces redundant constraints. ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark QueryPlanConstraints Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19201.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19201 commit 036e846a571f7aea3ad28b875afd5f9d714c25a5 Author: Wang Gengliang Date: 2017-09-12T09:06:09Z improve QueryPlanConstraints --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
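[Editor's note] An illustrative query shape that exercises the pattern described above; this is an assumption for illustration, not the regression test included in the patch, and `spark` is assumed to be an active SparkSession.

```scala
import spark.implicits._

// The alias `c` plus the filter relate `a` and `c` through expressions over
// each other, i.e. the a = f(b, c) && c = g(a, b) situation described above,
// which can make constraint inference keep producing new combinations when
// both an alias and its child expression are tracked in the constraint set.
val t = Seq((1, 2)).toDF("a", "b")
val out = t
  .select($"a", $"b", ($"a" + $"b").as("c"))
  .where($"c" === $"a" + $"b" && $"a" === $"c" - $"b")
out.explain(true)
```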
[GitHub] spark issue #19200: Get default Locale
Github user Gschiavon commented on the issue: https://github.com/apache/spark/pull/19200 As far as I can see, there are other test cases that set the default locale to US. This test does not pass when your JVM default locale differs from "US". --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138291926 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.upward; + +import java.util.OptionalLong; + +/** + * An interface to represent statistics for a data source. + */ +public interface Statistics { + long sizeInBytes(); --- End diff -- and now is a good time to fix it :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138291376 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.upward; + +import java.util.OptionalLong; + +/** + * An interface to represent statistics for a data source. + */ +public interface Statistics { + long sizeInBytes(); --- End diff -- like, I get that it's non-optional at the moment, but it's odd that we have a method that the normal implementor will have to replace with ``` public long sizeInBytes() { return Long.MAX_VALUE; } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138290363 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; + +import java.util.HashMap; +import java.util.Locale; +import java.util.Map; + +/** + * An immutable case-insensitive string-to-string map, which is used to represent data source + * options. + */ +public class DataSourceV2Options { + private Map keyLowerCasedMap; --- End diff -- nit: final --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138289995 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; + +import java.util.HashMap; +import java.util.Locale; +import java.util.Map; + +/** + * An immutable case-insensitive string-to-string map, which is used to represent data source + * options. + */ +public class DataSourceV2Options { + private Map keyLowerCasedMap; + + private String toLowerCase(String key) { +return key.toLowerCase(Locale.ROOT); + } + + public DataSourceV2Options(Map originalMap) { +keyLowerCasedMap = new HashMap<>(originalMap.size()); +for (Map.Entry entry : originalMap.entrySet()) { + keyLowerCasedMap.put(toLowerCase(entry.getKey()), entry.getValue()); +} + } + + /** + * Returns the option value to which the specified key is mapped, case-insensitively, + * or {@code null} if there is no mapping for the key. + */ + public String get(String key) { +return keyLowerCasedMap.get(toLowerCase(key)); + } + + /** + * Returns the option value to which the specified key is mapped, case-insensitively, + * or {@code defaultValue} if there is no mapping for the key. + */ + public String getOrDefault(String key, String defaultValue) { --- End diff -- if the above returns `Optional`, you probably don't need this method. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138289921 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; + +import java.util.HashMap; +import java.util.Locale; +import java.util.Map; + +/** + * An immutable case-insensitive string-to-string map, which is used to represent data source + * options. + */ +public class DataSourceV2Options { + private Map keyLowerCasedMap; + + private String toLowerCase(String key) { +return key.toLowerCase(Locale.ROOT); + } + + public DataSourceV2Options(Map originalMap) { +keyLowerCasedMap = new HashMap<>(originalMap.size()); +for (Map.Entry entry : originalMap.entrySet()) { + keyLowerCasedMap.put(toLowerCase(entry.getKey()), entry.getValue()); +} + } + + /** + * Returns the option value to which the specified key is mapped, case-insensitively, + * or {@code null} if there is no mapping for the key. --- End diff -- can we return `Optional` here? JDK maintainers wish they could return optional on Map --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
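[Editor's note] The reviewer's suggestion, sketched in Scala for brevity (an illustration of the design point, not the committed Java API): a case-insensitive options lookup that makes absence explicit instead of returning null, which also makes a separate default-value method unnecessary.

```scala
import java.util.Locale

// Case-insensitive options map; a default falls out of Option#getOrElse.
class CaseInsensitiveOptions(original: Map[String, String]) {
  private val lowerCased: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    lowerCased.get(key.toLowerCase(Locale.ROOT))
}

val opts = new CaseInsensitiveOptions(Map("Path" -> "/tmp/data"))  // hypothetical key/value
opts.get("path").getOrElse("<missing>")  // "/tmp/data"
```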
[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/18902 Any more comments on this PR? It has been about one month since the last modification. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138289364 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.io.Serializable; + +/** + * A read task returned by a data source reader and is responsible to create the data reader. + * The relationship between `ReadTask` and `DataReader` is similar to `Iterable` and `Iterator`. + * + * Note that, the read task will be serialized and sent to executors, then the data reader will be + * created on executors and do the actual reading. + */ +public interface ReadTask extends Serializable { + /** + * The preferred locations for this read task to run faster, but Spark can't guarantee that this + * task will always run on these locations. Implementations should make sure that it can + * be run on any location. + */ + default String[] preferredLocations() { --- End diff -- can we have a class Host which represents this? Just makes the API more clear. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19200: Get default Locale
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19200 No, because the project is purposely not locale sensitive at this level. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are ...
Github user jmchung commented on a diff in the pull request: https://github.com/apache/spark/pull/19199#discussion_r138287636 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -109,6 +109,20 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister { } } +if (requiredSchema.length == 1 && + requiredSchema.head.name == parsedOptions.columnNameOfCorruptRecord) { + throw new AnalysisException( +"Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the\n" + + "referenced columns only include the internal corrupt record column\n" + + s"(named ${parsedOptions.columnNameOfCorruptRecord} by default). For example:\n" + --- End diff -- Thanks @viirya. Do we also need to replace the weird part in `JsonFileFormat` with `_corrupt_record`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138287456 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.upward; + +import java.util.OptionalLong; + +/** + * An interface to represent statistics for a data source. + */ +public interface Statistics { + long sizeInBytes(); --- End diff -- OptionalLong for sizeInBytes? It's not obvious that sizeInBytes is well defined for e.g. JDBC datasources, but row count can generally be easily estimated from the query plan. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
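[Editor's note] A sketch of the shape being suggested above, expressed in Scala for brevity (an assumption about the suggestion, not the committed interface): make the estimates optional so a source such as JDBC can omit a byte size while still reporting what it does know.

```scala
import java.util.OptionalLong

trait Statistics {
  def sizeInBytes: OptionalLong
  def numRows: OptionalLong
}

// A JDBC-like source might know only an estimated row count:
val stats = new Statistics {
  def sizeInBytes: OptionalLong = OptionalLong.empty()
  def numRows: OptionalLong = OptionalLong.of(1000L)
}
```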
[GitHub] spark issue #19200: Get default Locale
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19200 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138286429 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.spark.sql.Strategy +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan} +import org.apache.spark.sql.execution.datasources.DataSourceStrategy +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, ColumnPruningSupport, FilterPushDownSupport} + +object DataSourceV2Strategy extends Strategy { + // TODO: write path + override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { +case PhysicalOperation(projects, filters, DataSourceV2Relation(output, reader)) => + val attrMap = AttributeMap(output.zip(output)) + + val projectSet = AttributeSet(projects.flatMap(_.references)) + val filterSet = AttributeSet(filters.flatMap(_.references)) + + // Match original case of attributes. + // TODO: nested fields pruning + val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap) + reader match { +case r: ColumnPruningSupport => + r.pruneColumns(requiredColumns.toStructType) +case _ => + } + + val stayUpFilters: Seq[Expression] = reader match { +case r: CatalystFilterPushDownSupport => + r.pushCatalystFilters(filters.toArray) + +case r: FilterPushDownSupport => --- End diff -- like, we might as well not document it if the code can document it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19200: Set default Locale
GitHub user Gschiavon opened a pull request: https://github.com/apache/spark/pull/19200 Set default Locale ## What changes were proposed in this pull request? Get default Locale in UtilsSuite.scala in order to make it work with different Locales than US. ## How was this patch tested? Running UtilsSuite.scala Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Gschiavon/spark fix/locale Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19200.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19200 commit 632526ba3e9a4d72133202cf0bfcc8a997dc9cb9 Author: German Schiavon Date: 2017-09-12T08:33:00Z Set default Locale --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
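Illustrating the idea (this is not the actual UtilsSuite change): an expected string built with a hard-coded locale diverges from output produced under the JVM's default locale on non-US machines.
```
import java.util.Locale;

public class LocaleFormatExample {
  public static void main(String[] args) {
    // On a machine whose default locale uses a comma as the decimal separator,
    // these two lines print "3,5" and "3.5" respectively.
    System.out.println(String.format(Locale.getDefault(), "%.1f", 3.5d));
    System.out.println(String.format(Locale.US, "%.1f", 3.5d));
  }
}
```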
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138286323 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.spark.sql.Strategy +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan} +import org.apache.spark.sql.execution.datasources.DataSourceStrategy +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, ColumnPruningSupport, FilterPushDownSupport} + +object DataSourceV2Strategy extends Strategy { + // TODO: write path + override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { +case PhysicalOperation(projects, filters, DataSourceV2Relation(output, reader)) => + val attrMap = AttributeMap(output.zip(output)) + + val projectSet = AttributeSet(projects.flatMap(_.references)) + val filterSet = AttributeSet(filters.flatMap(_.references)) + + // Match original case of attributes. + // TODO: nested fields pruning + val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap) + reader match { +case r: ColumnPruningSupport => + r.pruneColumns(requiredColumns.toStructType) +case _ => + } + + val stayUpFilters: Seq[Expression] = reader match { +case r: CatalystFilterPushDownSupport => + r.pushCatalystFilters(filters.toArray) + +case r: FilterPushDownSupport => --- End diff -- can FilterPushDownSupport be an interface which extends CatalystFilterPushDownSupport and provides a default impl of pruning the catalyst flter? 
Like, this code can just go there as a method:
```
interface FilterPushDownSupport extends CatalystFilterPushDownSupport {
  List<Filter> pushFilters(List<Filter> filters);

  default List<Expression> pushCatalystFilters(List<Expression> filters) {
    // Map each catalyst expression that can be translated to a source Filter; collect the rest.
    Map<Filter, Expression> translatedMap = new HashMap<>();
    List<Expression> nonconvertiblePredicates = new ArrayList<>();
    for (Expression catalystFilter : filters) {
      Optional<Filter> translatedFilter = DataSourceStrategy.translateFilter(catalystFilter);
      if (translatedFilter.isPresent()) {
        translatedMap.put(translatedFilter.get(), catalystFilter);
      } else {
        nonconvertiblePredicates.add(catalystFilter);
      }
    }
    // Push the translated Filters down; whatever comes back unhandled is mapped back to catalyst.
    List<Filter> unhandledFilters = pushFilters(new ArrayList<>(translatedMap.keySet()));
    return Stream.concat(
            nonconvertiblePredicates.stream(),
            unhandledFilters.stream().map(translatedMap::get))
        .collect(toList());
  }
}
```
and we can trivially ignore the interface confusion (it's truly confusing if you can implement two interfaces) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
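For what it's worth, a reader built on the sketch above would only need the Filter-based hook; the Catalyst-based one falls out of the default method. The class name below is invented for illustration.
```
/** Hypothetical reader that pushes nothing down; pushCatalystFilters is inherited from the default. */
class NoOpPushDownReader implements FilterPushDownSupport {
  @Override
  public List<Filter> pushFilters(List<Filter> filters) {
    return filters; // report every filter back as unhandled
  }
}
```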
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138282764 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/CatalystFilterPushDownSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.downward; + +import org.apache.spark.annotation.Experimental; +import org.apache.spark.annotation.InterfaceStability; +import org.apache.spark.sql.catalyst.expressions.Expression; + +/** + * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to push down + * arbitrary expressions as predicates to the data source. + */ +@Experimental +@InterfaceStability.Unstable +public interface CatalystFilterPushDownSupport { + + /** + * Push down filters, returns unsupported filters. + */ + Expression[] pushCatalystFilters(Expression[] filters); --- End diff -- any chance this could push java lists? They're just more idiomatic in a java interface --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
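The list-based signature being asked about would presumably look like the following; this is a sketch of the question, not the API in the PR (which uses `Expression[]`).
```
import java.util.List;
import org.apache.spark.sql.catalyst.expressions.Expression;

public interface CatalystFilterPushDownSupport {
  /** Pushes down filters and returns the ones the source cannot handle. */
  List<Expression> pushCatalystFilters(List<Expression> filters);
}
```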
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user j-baker commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138281654 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.spark.sql.Strategy +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan} +import org.apache.spark.sql.execution.datasources.DataSourceStrategy +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, ColumnPruningSupport, FilterPushDownSupport} + +object DataSourceV2Strategy extends Strategy { + // TODO: write path + override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { +case PhysicalOperation(projects, filters, DataSourceV2Relation(output, reader)) => + val attrMap = AttributeMap(output.zip(output)) + + val projectSet = AttributeSet(projects.flatMap(_.references)) + val filterSet = AttributeSet(filters.flatMap(_.references)) + + // Match original case of attributes. + // TODO: nested fields pruning + val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap) + reader match { +case r: ColumnPruningSupport => + r.pruneColumns(requiredColumns.toStructType) +case _ => + } + + val stayUpFilters: Seq[Expression] = reader match { +case r: CatalystFilterPushDownSupport => + r.pushCatalystFilters(filters.toArray) + +case r: FilterPushDownSupport => --- End diff -- Considering that there is a translation between Catalyst filters and Filters, it's probably worth _just_ doing the catalyst one, and providing the user with the translator if they want to do the Filter approach? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
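One way to read "provide the user with the translator" is the sketch below, where a Filter-oriented source implements only the Catalyst hook and calls the translator itself. `JdbcLikeReader` is invented, and whether `DataSourceStrategy.translateFilter` would actually be exposed to implementations is exactly what is being discussed, so treat that call as an assumption.
```
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.catalyst.expressions.Expression;
import org.apache.spark.sql.execution.datasources.DataSourceStrategy;
import org.apache.spark.sql.sources.Filter;

class JdbcLikeReader implements CatalystFilterPushDownSupport {
  @Override
  public Expression[] pushCatalystFilters(Expression[] filters) {
    List<Expression> unsupported = new ArrayList<>();
    for (Expression e : filters) {
      scala.Option<Filter> translated = DataSourceStrategy.translateFilter(e);
      if (translated.isDefined()) {
        // hand translated.get() to the underlying source's native filter API here
      } else {
        unsupported.add(e);
      }
    }
    return unsupported.toArray(new Expression[0]);
  }
}
```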
[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19191 **[Test build #81667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81667/testReport)** for PR 19191 at commit [`5f4ce99`](https://github.com/apache/spark/commit/5f4ce997f6f30cd0d59bc2e2f4396f495c3c0fd8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19191: [SPARK-21958][ML] Word2VecModel save: transform data in ...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19191 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18538 **[Test build #81666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81666/testReport)** for PR 18538 at commit [`a7c1481`](https://github.com/apache/spark/commit/a7c14818283467276a8f7eaa30b074a0f25237dc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org