[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Thanks for the hints. I've taken a deeper look at the possible solutions and the suggested test. The problem is similar but not the same, so I would solve it in a different way. Here is my proposal. `cancelStage` normally sets `reasonIfKilled` in `TaskContext`, but the executor thread keeps running untouched at that point. The thread is killed later, triggered by `killTaskIfInterrupted`, which throws `TaskKilledException`. If `isInterrupted` is checked continuously while `DataFrameRangeSuite.stageToKill` is being set, the race can be avoided. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
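A minimal Scala sketch of that polling idea, reusing the `DataFrameRangeSuite.stageToKill` name from the suite under discussion; this only illustrates the approach (it assumes a local-mode `spark` session) and is not the actual patch:

```scala
import org.apache.spark.TaskContext

// Poll isInterrupted() inside the task so the executor thread observes the kill reason
// set by cancelStage() instead of racing with the assignment of stageToKill.
spark.range(0, 100000L, 1, 1).foreach { _ =>
  val ctx = TaskContext.get()
  if (!ctx.isInterrupted()) {
    DataFrameRangeSuite.stageToKill = ctx.stageId()
  }
}
```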
[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21028 **[Test build #89130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89130/testReport)** for PR 21028 at commit [`e5ebdad`](https://github.com/apache/spark/commit/e5ebdad41645c0058f1cd2788f6cc1d4158ff2e9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/21028 [SPARK-23922][SQL] Add arrays_overlap function ## What changes were proposed in this pull request? The PR adds the function `arrays_overlap`. This function returns `true` if the input arrays contain a non-null common element; otherwise, it returns `null` if either array contains a `null` element, and `false` if neither does. ## How was this patch tested? Added UTs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-23922 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21028.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21028 commit e5ebdad41645c0058f1cd2788f6cc1d4158ff2e9 Author: Marco Gaido  Date: 2018-04-10T13:49:53Z [SPARK-23922][SQL] Add arrays_overlap function --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
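To make the three outcomes concrete, a hedged Scala illustration of the semantics described above (it assumes the function is exposed to SQL by this PR; expected results are shown as comments):

```scala
// true: 3 is a common non-null element
spark.sql("SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5))").show()
// null: no common element, but one array contains a null
spark.sql("SELECT arrays_overlap(array(1, 2), array(CAST(NULL AS INT), 4))").show()
// false: no common element and no nulls
spark.sql("SELECT arrays_overlap(array(1, 2), array(3, 4))").show()
```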
[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20560 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89118/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20560 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2159/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20560 **[Test build #89118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89118/testReport)** for PR 20560 at commit [`1c7cae6`](https://github.com/apache/spark/commit/1c7cae685314bf762b38defb9233dbef315ab0df). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20986: [SPARK-23864][SQL] Add unsafe object writing to UnsafeWr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20986 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89112/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20986: [SPARK-23864][SQL] Add unsafe object writing to UnsafeWr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20986 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20986: [SPARK-23864][SQL] Add unsafe object writing to UnsafeWr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20986 **[Test build #89112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89112/testReport)** for PR 20986 at commit [`352c735`](https://github.com/apache/spark/commit/352c735ea54a17ef55a9740ad3ae9b163f982539). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89108/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21007 **[Test build #89108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89108/testReport)** for PR 21007 at commit [`edb5eea`](https://github.com/apache/spark/commit/edb5eea8501c8348d037b3328229f0cdc078441a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21026 **[Test build #89129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89129/testReport)** for PR 21026 at commit [`0b194ca`](https://github.com/apache/spark/commit/0b194ca4c3ef6b2b6411e123c1153da63a111374). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20871 ping @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21011 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89106/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21011 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21026 **[Test build #89128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89128/testReport)** for PR 21026 at commit [`821e08a`](https://github.com/apache/spark/commit/821e08a988e81b389d454eca01f0cd0b3e3c9463). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89128/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21011 **[Test build #89106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89106/testReport)** for PR 21011 at commit [`e52ff85`](https://github.com/apache/spark/commit/e52ff856d42adc5af2e2b2593c2e63d5c3f3a205). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2158/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21027 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20940: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/20940#discussion_r180446530 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener( } } - // No-op because logging every update would be overkill - override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { } + /** + * Log if there is a new peak value for one of the memory metrics for the given executor. + * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will + * log new peak memory metric values per executor per stage. + */ + override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { --- End diff -- I will make the change to log at stage end, and will update the design doc. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20998: [SPARK-23888][CORE] speculative task should not r...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20998#discussion_r180443917 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -880,6 +880,59 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg assert(manager.resourceOffer("execB", "host2", ANY).get.index === 3) } + test("speculative task should not run on a given host where another attempt " + +"is already running on") { +sc = new SparkContext("local", "test") +sched = new FakeTaskScheduler( + sc, ("execA", "host1"), ("execB", "host2")) +val taskSet = FakeTask.createTaskSet(1, + Seq(TaskLocation("host1", "execA"), TaskLocation("host2", "execB"))) +val clock = new ManualClock +val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES, clock = clock) + +// let task0.0 run on host1 +assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL).get.index == 0) +val info1 = manager.taskAttempts(0)(0) +assert(info1.running === true) +assert(info1.host === "host1") + +// long time elapse, and task0.0 is still running, +// so we launch a speculative task0.1 on host2 +clock.advance(1000) +manager.speculatableTasks += 0 +assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL).get.index === 0) +val info2 = manager.taskAttempts(0)(0) +assert(info2.running === true) +assert(info2.host === "host2") +assert(manager.speculatableTasks.size === 0) + +// now, task0 has two copies running on host1, host2 separately, +// so we can not launch a speculative task on any hosts. +manager.speculatableTasks += 0 +assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL) === None) +assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL) === None) +assert(manager.speculatableTasks.size === 1) + +// after a long long time, task0.0 failed, and task0.0 can not re-run since +// there's already a running copy. +clock.advance(1000) +info1.finishTime = clock.getTimeMillis() --- End diff -- it would be better here for you to call `manager.handleFailedTask`, to more accurately simulate the real behavior, and also makes the purpose of a test a little more clear. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
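A hedged sketch of what that suggestion could look like in the quoted test, reusing `manager` and `info1` from the diff above; the `TaskResultLost` reason is only an illustrative choice, not necessarily what the PR should use:

```scala
import org.apache.spark.{TaskResultLost, TaskState}

// Drive the failure through the scheduler's own code path instead of mutating
// finishTime directly, so the test exercises the real bookkeeping.
manager.handleFailedTask(info1.taskId, TaskState.FAILED, TaskResultLost)
assert(!info1.running)
```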
[GitHub] spark pull request #20998: [SPARK-23888][CORE] speculative task should not r...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20998#discussion_r180439612 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -880,6 +880,59 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg assert(manager.resourceOffer("execB", "host2", ANY).get.index === 3) } + test("speculative task should not run on a given host where another attempt " + +"is already running on") { +sc = new SparkContext("local", "test") +sched = new FakeTaskScheduler( + sc, ("execA", "host1"), ("execB", "host2")) +val taskSet = FakeTask.createTaskSet(1, + Seq(TaskLocation("host1", "execA"), TaskLocation("host2", "execB"))) +val clock = new ManualClock +val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES, clock = clock) + +// let task0.0 run on host1 +assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL).get.index == 0) +val info1 = manager.taskAttempts(0)(0) +assert(info1.running === true) +assert(info1.host === "host1") + +// long time elapse, and task0.0 is still running, +// so we launch a speculative task0.1 on host2 +clock.advance(1000) +manager.speculatableTasks += 0 +assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL).get.index === 0) +val info2 = manager.taskAttempts(0)(0) +assert(info2.running === true) +assert(info2.host === "host2") +assert(manager.speculatableTasks.size === 0) + +// now, task0 has two copies running on host1, host2 separately, +// so we can not launch a speculative task on any hosts. +manager.speculatableTasks += 0 +assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL) === None) +assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL) === None) +assert(manager.speculatableTasks.size === 1) + +// after a long long time, task0.0 failed, and task0.0 can not re-run since +// there's already a running copy. +clock.advance(1000) +info1.finishTime = clock.getTimeMillis() +assert(info1.running === false) + +// time goes on, and task0.1 is still running +clock.advance(1000) +// so we try to launch a new speculative task +// we can not run it on host2, because task0.1 is already running on +assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL) === None) +// we successfully launch a speculative task0.2 on host1, since there's +// no more running copy of task0 +assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL).get.index === 0) +val info3 = manager.taskAttempts(0)(0) +assert(info3.running === true) --- End diff -- `assert(info3.running)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20998: [SPARK-23888][CORE] speculative task should not r...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20998#discussion_r180439559 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -880,6 +880,59 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg assert(manager.resourceOffer("execB", "host2", ANY).get.index === 3) } + test("speculative task should not run on a given host where another attempt " + +"is already running on") { +sc = new SparkContext("local", "test") +sched = new FakeTaskScheduler( + sc, ("execA", "host1"), ("execB", "host2")) +val taskSet = FakeTask.createTaskSet(1, + Seq(TaskLocation("host1", "execA"), TaskLocation("host2", "execB"))) +val clock = new ManualClock +val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES, clock = clock) + +// let task0.0 run on host1 +assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL).get.index == 0) +val info1 = manager.taskAttempts(0)(0) +assert(info1.running === true) +assert(info1.host === "host1") + +// long time elapse, and task0.0 is still running, +// so we launch a speculative task0.1 on host2 +clock.advance(1000) +manager.speculatableTasks += 0 +assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL).get.index === 0) +val info2 = manager.taskAttempts(0)(0) +assert(info2.running === true) +assert(info2.host === "host2") +assert(manager.speculatableTasks.size === 0) + +// now, task0 has two copies running on host1, host2 separately, +// so we can not launch a speculative task on any hosts. +manager.speculatableTasks += 0 +assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL) === None) +assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL) === None) +assert(manager.speculatableTasks.size === 1) + +// after a long long time, task0.0 failed, and task0.0 can not re-run since +// there's already a running copy. +clock.advance(1000) +info1.finishTime = clock.getTimeMillis() +assert(info1.running === false) --- End diff -- `assert(!info1.running)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...
GitHub user pmackles opened a pull request: https://github.com/apache/spark/pull/21027 [SPARK-23943][MESOS][DEPLOY] Improve observability of MesosRestServer/MesosClusterDi… See https://issues.apache.org/jira/browse/SPARK-23943 for details on the proposed changes. Tested manually on branch-2.3. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pmackles/spark new-SPARK-23943 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21027.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21027 commit dc06283885aed247391280a12e2cca1f6c6c22ff Author: Paul Mackles  Date: 2018-04-09T15:09:34Z [SPARK-23943] Improve observability of MesosRestServer/MesosClusterDispatcher --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/21026 cc @viirya @cloud-fan @rednaxelafx --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21026 **[Test build #89128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89128/testReport)** for PR 21026 at commit [`821e08a`](https://github.com/apache/spark/commit/821e08a988e81b389d454eca01f0cd0b3e3c9463). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21026: [SPARK-23951][SQL] Use actual java class instead ...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/21026 [SPARK-23951][SQL] Use actual java class instead of string representation. ## What changes were proposed in this pull request? This PR refactors the newly added `ExprValue` API quite a bit. The following changes are introduced: 1. `ExprValue` now uses the actual class instead of the class name as its type. This should give some more flexibility with generating code in the future. 2. Renamed `StatementValue` to `SimpleExprValue`. The statement concept is broader than an expression (it is untyped and cannot be on the right-hand side of an assignment), and this was not really what we were using it for. I have added a top-level `JavaCode` trait that can be used in the future to reinstate (no pun intended) a statement-like code fragment. 3. Added factory methods to the `JavaCode` companion object to make it slightly less verbose to create `JavaCode`/`ExprValue` objects. This is also what makes the diff quite large. 4. Added one more factory method to `ExprCode` to make it easier to create code-less expressions. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark SPARK-23951 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21026.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21026 commit 821e08a988e81b389d454eca01f0cd0b3e3c9463 Author: Herman van Hovell  Date: 2018-04-10T13:55:30Z Use actual java class instead of string representation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
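A simplified Scala sketch of the shape described in points 1 and 3; the names mirror the PR's terminology, but the signatures here are illustrative and not the exact Spark internals:

```scala
// Top-level trait for any generated Java code fragment.
trait JavaCode {
  def code: String
  override def toString: String = code
}

// An expression value now carries the actual Java class rather than a type-name string.
case class ExprValue(code: String, javaType: Class[_]) extends JavaCode

object JavaCode {
  // Factory helpers so call sites stay terse when building ExprValue objects.
  def variable(name: String, javaType: Class[_]): ExprValue = ExprValue(name, javaType)
  def isNullVariable(name: String): ExprValue = variable(name, java.lang.Boolean.TYPE)
  def literal(value: String, javaType: Class[_]): ExprValue = ExprValue(value, javaType)
}
```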
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21025 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89113/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21025 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21025 **[Test build #89113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89113/testReport)** for PR 21025 at commit [`b176f8d`](https://github.com/apache/spark/commit/b176f8d94a175190f3ef478d418341aa66d8a82c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ArrayMin(child: Expression) extends UnaryExpression with ImplicitCastInputTypes ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20925 **[Test build #4150 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4150/testReport)** for PR 20925 at commit [`262bad8`](https://github.com/apache/spark/commit/262bad88a6d4d6c2513d6da3b2b52e86cd3f5b70). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21024 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2157/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21024 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2156/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21024 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89110/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21024 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21025 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21025 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2155/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21025 **[Test build #89125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89125/testReport)** for PR 21025 at commit [`fbb9dc1`](https://github.com/apache/spark/commit/fbb9dc104a0bf78fc25d7c060f38b5485f279c1c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21024 **[Test build #89126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89126/testReport)** for PR 21024 at commit [`e082f00`](https://github.com/apache/spark/commit/e082f0017dc670441e96a9b7d2ffa527302db2e3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21024 **[Test build #89110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89110/testReport)** for PR 21024 at commit [`a296bc0`](https://github.com/apache/spark/commit/a296bc0db8b8d3befa05b7d0a8faedea4f21a625). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21007 **[Test build #89127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89127/testReport)** for PR 21007 at commit [`e865c88`](https://github.com/apache/spark/commit/e865c883abd1f1e340ef50d149e2defc5636610e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21007 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21025#discussion_r180431349 --- Diff: python/pyspark/sql/functions.py --- @@ -2080,6 +2080,21 @@ def size(col): return Column(sc._jvm.functions.size(_to_java_column(col))) +@since(2.4) +def array_min(col): +""" +Collection function: returns the minimum value of the array. + +:param col: name of column or expression + +>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data']) +>>> df.select(array_min(df.data).alias('min')).collect() +[Row(min=1), Row(min=-1)] + """ --- End diff -- you are right, good catch! I was looking for reference at the `sort_array` function below which has the same issue. I will fix it there too, thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89109/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21007 **[Test build #89109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89109/testReport)** for PR 21007 at commit [`e865c88`](https://github.com/apache/spark/commit/e865c883abd1f1e340ef50d149e2defc5636610e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20956 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21021: [SPARK-23921][SQL] Add array_sort function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21021#discussion_r180429349 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -190,28 +161,118 @@ case class SortArray(base: Expression, ascendingOrder: Expression) if (o1 == null && o2 == null) { 0 } else if (o1 == null) { - 1 + 1 * placeNullAtEnd } else if (o2 == null) { - -1 + -1 * placeNullAtEnd } else { -ordering.compare(o1, o2) } } } } - override def nullSafeEval(array: Any, ascending: Any): Any = { -val elementType = base.dataType.asInstanceOf[ArrayType].elementType + def sortEval(array: Any, ascending: Boolean): Any = { +val elementType = arrayExpression.dataType.asInstanceOf[ArrayType].elementType val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType) if (elementType != NullType) { - java.util.Arrays.sort(data, if (ascending.asInstanceOf[Boolean]) lt else gt) + java.util.Arrays.sort(data, if (ascending) lt else gt) } new GenericArrayData(data.asInstanceOf[Array[Any]]) } +} + +/** + * Sorts the input array in ascending / descending order according to the natural ordering of + * the array elements and returns it. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements.", + examples = """ +Examples: + > SELECT _FUNC_(array('b', 'd', 'c', 'a'), true); + ["a","b","c","d"] + """) +// scalastyle:on line.size.limit +case class SortArray(base: Expression, ascendingOrder: Expression) + extends BinaryExpression with ArraySortUtil { + + def this(e: Expression) = this(e, Literal(true)) + + override def left: Expression = base + override def right: Expression = ascendingOrder + override def dataType: DataType = base.dataType + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, BooleanType) + + override def arrayExpression: Expression = base + override def placeNullAtEnd: Int = 1 + + override def checkInputDataTypes(): TypeCheckResult = base.dataType match { +case ArrayType(dt, _) if RowOrdering.isOrderable(dt) => + ascendingOrder match { +case Literal(_: Boolean, BooleanType) => + TypeCheckResult.TypeCheckSuccess +case _ => + TypeCheckResult.TypeCheckFailure( +"Sort order in second argument requires a boolean literal.") + } +case ArrayType(dt, _) => + val dtSimple = dt.simpleString + TypeCheckResult.TypeCheckFailure( +s"$prettyName does not support sorting array of type $dtSimple which is not orderable") +case _ => + TypeCheckResult.TypeCheckFailure(s"$prettyName only supports array input.") + } + + override def nullSafeEval(array: Any, ascending: Any): Any = { +sortEval(array, ascending.asInstanceOf[Boolean]) + } override def prettyName: String = "sort_array" } +/** + * Sorts the input array in ascending order according to the natural ordering of + * the array elements and returns it. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ +_FUNC_(array) - Sorts the input array in ascending order. The elements of the input array must + be orderable. Null elements will be placed at the end of the returned array.""", + examples = """ +Examples: + > SELECT _FUNC_(array('b', 'd', null, 'c', 'a')); + ["a","b","c","d",null] + """, + since = "2.4.0") +// scalastyle:on line.size.limit +case class ArraySort(child: Expression) extends UnaryExpression with ArraySortUtil { --- End diff -- Yeah, as you said they are doing similar things. 
Therefore, a new trait is now introduced to reuse as much as possible. When one is a subset of another (e.g. `size` vs. `cardinality`), we could take the approach of having one call the other, which is what I am doing in `cardinality`. Good point about the description. I will add a description of how it works with `null`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
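A hedged illustration of the null-placement difference captured in the diff's usage strings (it assumes a SparkSession `spark` with both functions available; expected outputs are shown as comments):

```scala
// array_sort: null elements are placed at the end of the result
spark.sql("SELECT array_sort(array('b', 'd', null, 'c', 'a'))").show(false)
// expected: [a, b, c, d, null]

// sort_array in ascending order: null elements are placed at the beginning
spark.sql("SELECT sort_array(array('b', 'd', null, 'c', 'a'), true)").show(false)
// expected: [null, a, b, c, d]
```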
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21025#discussion_r180429701 --- Diff: python/pyspark/sql/functions.py --- @@ -2080,6 +2080,21 @@ def size(col): return Column(sc._jvm.functions.size(_to_java_column(col))) +@since(2.4) +def array_min(col): +""" +Collection function: returns the minimum value of the array. + +:param col: name of column or expression + +>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data']) +>>> df.select(array_min(df.data).alias('min')).collect() +[Row(min=1), Row(min=-1)] + """ --- End diff -- """ seems having one more leading space .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20925 Flaky test I've seen before: https://issues.apache.org/jira/browse/SPARK-23894 Jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20956 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21025#discussion_r180427313 --- Diff: python/pyspark/sql/functions.py --- @@ -2080,6 +2080,21 @@ def size(col): return Column(sc._jvm.functions.size(_to_java_column(col))) +@since(2.4) +def array_min(col): +""" +Collection function: returns the minimum value of the array. + +:param col: name of column or expression + +>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data']) +>>> df.select(array_min(df.data).alias('min')).collect() +[Row(min=1), Row(min=-1)] + """ --- End diff -- sorry, I can't see what is the problem here. May you please clarify? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20940: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20940 btw you mentioned that some of the issues were fixed, but I haven't seen any more changes, maybe you forgot to push the changes? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2154/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20940: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20940#discussion_r180426692 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener( } } - // No-op because logging every update would be overkill - override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { } + /** + * Log if there is a new peak value for one of the memory metrics for the given executor. + * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will + * log new peak memory metric values per executor per stage. + */ + override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { --- End diff -- Yeah, logging an event per executor at stage end seems good to me. It would be great if we could see how much that version affects log size as well, if you can get those metrics. Also, these tradeoffs should go into the design doc; it's harder to find comments from a PR after this feature has been merged. For now, it would also be nice if you could post a version that everyone can comment on, e.g. a Google doc. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
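As a standalone Scala illustration of the tradeoff being weighed here (one record per executor at stage end rather than one per heartbeat), the following listener sketch tracks a per-executor peak and reports it at stage completion; the tracked value is a placeholder, since the actual memory metrics are exactly what this PR adds to the heartbeat event:

```scala
import scala.collection.mutable

import org.apache.spark.scheduler._

// Remember per-executor peaks between heartbeats and emit one line per executor
// when the stage completes, instead of writing out every update.
class PeakMetricsAtStageEndListener extends SparkListener {
  private val peaks = mutable.Map.empty[String, Long].withDefaultValue(0L)

  override def onExecutorMetricsUpdate(update: SparkListenerExecutorMetricsUpdate): Unit = {
    val observed = 0L // placeholder for the metric value the PR would carry on this event
    peaks(update.execId) = math.max(peaks(update.execId), observed)
  }

  override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
    peaks.foreach { case (execId, peak) =>
      println(s"stage ${stage.stageInfo.stageId}: executor $execId peak=$peak")
    }
    peaks.clear()
  }
}
```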
[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP]Check location of managed t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21001 **[Test build #89124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89124/testReport)** for PR 21001 at commit [`c4f359a`](https://github.com/apache/spark/commit/c4f359a4a7047569a596354eda6ea99f2549c797). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/19881 No, we don't strictly need it in the name; the reasoning behind it was to indicate that this is a divisor based on having fully allocated executors for all the tasks and running at full parallelism. Are you suggesting just using spark.dynamicAllocation.executorAllocationDivisor? Other names thrown around were maxExecutorAllocationDivisor and the like. One thing we were trying to avoid is confusing it with the maxExecutors config. Opinions? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
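For a sense of what such a divisor would do to the executor target, a small illustrative Scala calculation (the config name and exact formula are still under discussion in the thread, so the numbers are only an example):

```scala
// Dynamic allocation normally targets enough executors to run every outstanding task at once;
// the proposed divisor scales that target down.
val pendingPlusRunningTasks = 10000
val tasksPerExecutor = 4  // executor cores / task cpus
val divisor = 2           // the proposed "full allocation divisor"

val withoutDivisor = math.ceil(pendingPlusRunningTasks.toDouble / tasksPerExecutor).toInt            // 2500
val withDivisor    = math.ceil(pendingPlusRunningTasks.toDouble / tasksPerExecutor / divisor).toInt  // 1250
```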
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20981 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2153/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20981 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21024 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21024 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89114/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21024 **[Test build #89114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89114/testReport)** for PR 21024 at commit [`c8c1d03`](https://github.com/apache/spark/commit/c8c1d0385f9ccaa714f5f57d3e65c12bf9586447). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20981#discussion_r180423825 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala --- @@ -119,4 +119,26 @@ object InternalRow { case v: MapData => v.copy() case _ => value } + + /** + * Returns an accessor for an InternalRow with given data type and ordinal. + */ + def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match { --- End diff -- Ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
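For readers following along, a minimal Scala sketch of the accessor idea whose signature appears in the diff above, covering only a few data types (the method added in the PR handles the full set):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types._

// Resolve the getter once per (dataType, ordinal) so interpreted code paths avoid
// re-matching on the data type for every row.
def getAccessor(dataType: DataType, ordinal: Int): InternalRow => Any = dataType match {
  case BooleanType              => (row: InternalRow) => row.getBoolean(ordinal)
  case IntegerType | DateType   => (row: InternalRow) => row.getInt(ordinal)
  case LongType | TimestampType => (row: InternalRow) => row.getLong(ordinal)
  case DoubleType               => (row: InternalRow) => row.getDouble(ordinal)
  case StringType               => (row: InternalRow) => row.getUTF8String(ordinal)
  case _                        => (row: InternalRow) => row.get(ordinal, dataType)
}
```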
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2152/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20981 **[Test build #89123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89123/testReport)** for PR 20981 at commit [`54dd939`](https://github.com/apache/spark/commit/54dd939e4771ca1678a3c9e5ffb9fc56ee119c32). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21025#discussion_r180422191 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,70 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + + +/** + * Returns the minimum value in the array. + */ +@ExpressionDescription( +usage = "_FUNC_(array) - Returns the minimum value in the array.", +examples = """ +Examples: + > SELECT _FUNC_(array(1, 20, null, 3)); + 1 + """, since = "2.4.0") --- End diff -- indentation .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21025#discussion_r180421841 --- Diff: python/pyspark/sql/functions.py --- @@ -2080,6 +2080,21 @@ def size(col): return Column(sc._jvm.functions.size(_to_java_column(col))) +@since(2.4) +def array_min(col): +""" +Collection function: returns the minimum value of the array. + +:param col: name of column or expression + +>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data']) +>>> df.select(array_min(df.data).alias('min')).collect() +[Row(min=1), Row(min=-1)] + """ --- End diff -- quick nit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89122/testReport)** for PR 20984 at commit [`a77128f`](https://github.com/apache/spark/commit/a77128f910eca1e0ced20257fa94ddaef513eae1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20984#discussion_r180420612 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -164,3 +167,46 @@ abstract class ArrayData extends SpecializedGetters with Serializable { } } } + +/** + * Implements an `IndexedSeq` interface for `ArrayData`. Notice that if the original `ArrayData` + * is a primitive array and contains null elements, it is better to ask for `IndexedSeq[Any]`, + * instead of `IndexedSeq[Int]`, in order to keep the null elements. + */ +class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] { + + private def getAccessor(dataType: DataType): (Int) => Any = dataType match { --- End diff -- Ok. I will also want to reuse the accessor getter in #20981 too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
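A hedged usage sketch of the wrapper being added, assuming the class lands with the constructor shown in the diff above:

```scala
import org.apache.spark.sql.catalyst.util.{ArrayData, ArrayDataIndexedSeq}
import org.apache.spark.sql.types.IntegerType

// Wrap an ArrayData so it can be consumed through the standard Scala IndexedSeq API.
val arrayData = ArrayData.toArrayData(Array(1, 2, 3))
val asSeq: IndexedSeq[Int] = new ArrayDataIndexedSeq[Int](arrayData, IntegerType)
assert(asSeq.length == 3 && asSeq(1) == 2)
```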
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20984 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r180419738 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -368,8 +368,7 @@ case class FileSourceScanExec( val bucketed = selectedPartitions.flatMap { p => p.files.map { f => - val hosts = getBlockHosts(getBlockLocations(f), 0, f.getLen) --- End diff -- If we agree that a separate PR is self-contained and helps this PR, I'm also OK with it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @wzhfy I am working on it; when I ran it locally, a few test cases were failing, and I am correcting them. Once done, I will update. Thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hint in p...
Github user DylanGuedes commented on the issue: https://github.com/apache/spark/pull/20788 Hi, any new feedback about this? thank you! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21025 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21025 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2151/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21025 **[Test build #89121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89121/testReport)** for PR 21025 at commit [`626f8cd`](https://github.com/apache/spark/commit/626f8cd49018ccb631e493f4cb3565bdb1415d75). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89103/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89103/testReport)** for PR 20984 at commit [`ac8d5b4`](https://github.com/apache/spark/commit/ac8d5b4e2b95bb058565af0ca14b9226775acb58). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21024 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21024 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2150/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21024 **[Test build #89120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89120/testReport)** for PR 21024 at commit [`d017ccf`](https://github.com/apache/spark/commit/d017ccf05c9787521b4af7489b20e96c69e4b8d5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21024: [SPARK-23917][SQL] Add array_max function
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21024#discussion_r180414332
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression)
   override def prettyName: String = "array_contains"
 }
+
+
+/**
+ * Returns the maximum value in the array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Returns the maximum value in the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 20, null, 3));
+       20
+  """, since = "2.4.0")
+case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
+
+  override def nullable: Boolean =
+    child.nullable || child.dataType.asInstanceOf[ArrayType].containsNull
+
+  override def foldable: Boolean = child.foldable
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(dataType)
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val childGen = child.genCode(ctx)
+    val javaType = CodeGenerator.javaType(dataType)
+    val i = ctx.freshName("i")
+    val item = ExprCode("",
+      isNull = StatementValue(s"${childGen.value}.isNullAt($i)", "boolean"),
+      value = StatementValue(CodeGenerator.getValue(childGen.value, dataType, i), javaType))
+    ev.copy(code =
+      s"""
+         |${childGen.code}
+         |boolean ${ev.isNull} = true;
+         |$javaType ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+         |if (!${childGen.isNull}) {
+         |  for (int $i = 0; $i < ${childGen.value}.numElements(); $i ++) {
+         |    ${ctx.reassignIfGreater(dataType, ev, item)}
+         |  }
+         |}
+      """.stripMargin)
+  }
+
+  override protected def nullSafeEval(input: Any): Any = {
+    var max: Any = null
+    input.asInstanceOf[ArrayData].foreach(dataType, (_, item) =>
+      if (item != null && (max == null || ordering.gt(item, max))) {
+        max = item
+      }
+    )
+    max
+  }
+
+  override def dataType: DataType = child.dataType match {
+    case ArrayType(dt, _) => dt
--- End diff --
I added the check in the `checkInputDataTypes` method, thanks.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
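For context on that last remark, here is a hedged sketch of what an input-type check for `ArrayMax` could look like; the exact helper and error messages used in the PR may differ, and the use of `RowOrdering.isOrderable` is an assumption for illustration.

```scala
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.RowOrdering
import org.apache.spark.sql.types.ArrayType

// Reject non-array inputs and array element types that have no ordering,
// so nullSafeEval and doGenCode never see an unorderable element type.
override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
  case ArrayType(dt, _) if RowOrdering.isOrderable(dt) =>
    TypeCheckResult.TypeCheckSuccess
  case ArrayType(dt, _) =>
    TypeCheckResult.TypeCheckFailure(
      s"function $prettyName does not support ordering on type ${dt.simpleString}")
  case other =>
    TypeCheckResult.TypeCheckFailure(
      s"function $prettyName requires an array argument, got ${other.simpleString}")
}
```

Performing the check here means `dataType` can safely assume an `ArrayType` child, which is what the `case ArrayType(dt, _) => dt` match in the diff relies on.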
[GitHub] spark issue #20938: [SPARK-23821][SQL] Collection function: flatten
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20938 **[Test build #89119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89119/testReport)** for PR 20938 at commit [`b9d99f7`](https://github.com/apache/spark/commit/b9d99f70cabadfaae72102e1d3ca80ccd2a616df). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20938: [SPARK-23821][SQL] Collection function: flatten
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/20938 Any idea why those tests are failing? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19627 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19627 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89111/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19627 **[Test build #89111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89111/testReport)** for PR 19627 at commit [`81473b0`](https://github.com/apache/spark/commit/81473b0846d1054409922f6cc5a0d3242d313c22). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20938: [SPARK-23821][SQL] Collection function: flatten
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20938 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org