[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88442/
[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20851 **[Test build #88454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88454/testReport)** for PR 20851 at commit [`7946bea`](https://github.com/apache/spark/commit/7946bea7c0eed08808696f34732c434e2c8ab4ea).
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #88442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88442/testReport)** for PR 20345 at commit [`895b6a1`](https://github.com/apache/spark/commit/895b6a1595fef31f2028180a36869c6d344e5ac7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 https://user-images.githubusercontent.com/18561820/37696015-b1250bae-2c90-11e8-8ad1-515661487b94.png
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 https://user-images.githubusercontent.com/18561820/37695954-5aacaa2a-2c90-11e8-9f73-f57d0e1b27f6.png
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #88446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88446/testReport)** for PR 19381 at commit [`20b245a`](https://github.com/apache/spark/commit/20b245ad49124d8d8b42c6835859759cd6af7964).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88446/
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Merged build finished. Test PASSed.
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20745 **[Test build #88453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88453/testReport)** for PR 20745 at commit [`214cddc`](https://github.com/apache/spark/commit/214cddc242fbfa9a217d544ec695b062c148cd85).
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18982 Merged build finished. Test PASSed.
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18982 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1667/
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18982 **[Test build #88452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88452/testReport)** for PR 18982 at commit [`9162944`](https://github.com/apache/spark/commit/9162944cf39a61db8060ee83829e7537dc979663).
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20727 **[Test build #88451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88451/testReport)** for PR 20727 at commit [`f1c951f`](https://github.com/apache/spark/commit/f1c951f0c84e334e185a0bcc810c08d48ca726e8).
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Merged build finished. Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1666/
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1665/
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88450/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9).
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed.
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20851#discussion_r175987083

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }

+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

Yup, I think this one is better than the current one.
[GitHub] spark pull request #20870: [SPARK-23760][SQL] CodegenContext.withSubExprElim...
Github user rednaxelafx commented on a diff in the pull request:
https://github.com/apache/spark/pull/20870#discussion_r175986986

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -942,7 +940,7 @@ class CodegenContext {
   def subexpressionEliminationForWholeStageCodegen(expressions: Seq[Expression]): SubExprCodes = {
     // Create a clear EquivalentExpressions and SubExprEliminationState mapping
     val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
-    val subExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState]
+    val localSubExprEliminationExprs = mutable.HashMap.empty[Expression, SubExprEliminationState]
--- End diff --

This renaming isn't necessary for the fix per se, but I'd like to piggyback it on this change so that it's clearer that we're not interfering with the current CSE state of this CodegenContext here.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20579 retest this please
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20851#discussion_r175986514

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }

+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

I think `"2017-08-19".d` is at least better than `1.d`.
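The helper under discussion converts test literals into `java.sql.Date` values for the pushdown tests. As a rough sketch of the string-based variant `"2017-08-19".d` that the reviewer prefers (the object and class names below are illustrative assumptions, not Spark's actual test code):

```scala
import java.sql.Date

// Hypothetical sketch of a ".d" date-literal helper for tests, in the spirit
// of the review comment above. `DateTestLiterals` and `StringToDate` are
// made-up names, not Spark's actual helper.
object DateTestLiterals {
  implicit class StringToDate(s: String) {
    // Parse a yyyy-MM-dd string into a java.sql.Date.
    def d: Date = Date.valueOf(s)
  }
}
```

With this in scope, a test can write `"2017-08-19".d` instead of constructing a `Date` by hand, which reads more clearly than an integer-based literal like `1.d`.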
[GitHub] spark issue #20870: [SPARK-23760][SQL] CodegenContext.withSubExprElimination...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1664/
[GitHub] spark issue #20870: [SPARK-23760][SQL] CodegenContext.withSubExprElimination...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20870 Merged build finished. Test PASSed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test FAILed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88440/
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88440/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEliminatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1663/
[GitHub] spark issue #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEliminatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20870 **[Test build #88449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88449/testReport)** for PR 20870 at commit [`df45286`](https://github.com/apache/spark/commit/df452861d16eb36c9982f6c438ea1dc2f8d9d1fc).
[GitHub] spark issue #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEliminatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20870 Merged build finished. Test PASSed.
[GitHub] spark issue #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEliminatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20870 **[Test build #88448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88448/testReport)** for PR 20870 at commit [`8635969`](https://github.com/apache/spark/commit/863596956ced94c49289c9eaeebf544b2de68f15).
[GitHub] spark pull request #20870: [SPARK-23760][SQL]: CodegenContext.withSubExprEli...
GitHub user rednaxelafx opened a pull request:
https://github.com/apache/spark/pull/20870

[SPARK-23760][SQL]: CodegenContext.withSubExprEliminationExprs should save/restore CSE state correctly

## What changes were proposed in this pull request?

Fixed `CodegenContext.withSubExprEliminationExprs()` so that it saves/restores CSE state correctly.

## How was this patch tested?

Added a new unit test to verify that the old CSE state is indeed saved and restored around the `withSubExprEliminationExprs()` call. Manually verified that this test fails without this patch.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark codegen-subexpr-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20870.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20870
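The fix described above is an instance of the classic save/restore-around-a-block pattern: stash the caller's state, install the temporary mappings, run the body, and put the old state back even if the body throws. A generic sketch of that pattern (this is not Spark's actual `CodegenContext` code; the class and field names are illustrative):

```scala
import scala.collection.mutable

// Illustrative sketch of save/restore around a scoped block, in the spirit of
// withSubExprEliminationExprs. `SubExprContext`, `state`, and `withTempState`
// are hypothetical names, not Spark's API.
class SubExprContext {
  // Mutable mapping standing in for the context's CSE state.
  val state = mutable.HashMap.empty[String, String]

  def withTempState[T](temp: Map[String, String])(body: => T): T = {
    val saved = state.toMap   // save the caller's current state
    state.clear()
    state ++= temp            // install the temporary mappings
    try body
    finally {
      state.clear()
      state ++= saved         // restore the old state, even on exception
    }
  }
}
```

The `try`/`finally` is the important part: without it, a body that throws would leave the context holding the temporary mappings, which is the kind of state leakage the PR title describes.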
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20787#discussion_r175985253

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1115,13 +1115,17 @@ case class AddMonths(startDate: Expression, numMonths: Expression)
   override def prettyName: String = "add_months"
 }

-/**
- * Returns number of months between dates date1 and date2.
- */
+ * Returns number of months between dates `timestamp1` and `timestamp2`.
+ * If `timestamp` is later than `timestamp2`, then the result is positive.
+ * If `timestamp1` and `timestamp2` are on the same day of month, or both
+ * are the last day of month, returns an integer (time of day will be ignored).
+ * Otherwise, the difference is calculated based on 31 days per month, and
+ * rounded to 8 digits.
+*/
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(timestamp1, timestamp2) - Returns number of months between `timestamp1` and `timestamp2`.",
+  usage = "_FUNC_(timestamp1, timestamp2) - Returns number of months between `timestamp1` and `timestamp2`. Positive if `timestamp1` is later than `timestamp2`",
--- End diff --

You could do either

```scala
@ExpressionDescription(
  usage = """
    _FUNC_(timestamp1, timestamp2) - blablabla
      blabla blabla
  """,
  ...
```

Let's add the description here too.
[GitHub] spark issue #20787: Documenting months_between direction
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20787 Seems fine otherwise.
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20787#discussion_r175983804

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -881,10 +881,10 @@ object DateTimeUtils {
    * Returns number of months between time1 and time2. time1 and time2 are expressed in
    * microseconds since 1.1.1970.
    *
-   * If time1 and time2 having the same day of month, or both are the last day of month,
-   * it returns an integer (time under a day will be ignored).
+   * If time1 and time2 are on the same day of month, or both are the last day of month,
+   * returns an integer (time under a day will be ignored).
--- End diff --

It seems a bit awkward because it actually returns a double. Shall we fix this like .. `returns an integer (time under a day will be ignored)` -> `time under a day will be ignored.`?
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20787#discussion_r175983334

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1115,13 +1115,17 @@ case class AddMonths(startDate: Expression, numMonths: Expression)
   override def prettyName: String = "add_months"
 }

-
--- End diff --

Let's revert this change back. Seems unrelated.
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20787#discussion_r175982564

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1115,13 +1115,17 @@ case class AddMonths(startDate: Expression, numMonths: Expression)
   override def prettyName: String = "add_months"
 }

-/**
- * Returns number of months between dates date1 and date2.
- */
+ * Returns number of months between dates `timestamp1` and `timestamp2`.
--- End diff --

Hm, this should have been caught by the Scala linter because we follow Java-style comments. See "Code documentation style" in http://spark.apache.org/contributing.html
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user liufengdb commented on the issue: https://github.com/apache/spark/pull/18666 I asked the following question in https://github.com/apache/spark/pull/20864: is it necessary to create these temp directories when the hive thrift server starts? It sounds like some legacy from Hive, and we can skip creating them in the first place.
[GitHub] spark issue #20864: [SPARK-23745][SQL]Remove the directories of the “hive....
Github user liufengdb commented on the issue: https://github.com/apache/spark/pull/20864 @samartinucci @zuotingbing a high-level question: is it necessary to create these temp directories when the hive thrift server starts? It sounds like some legacy from Hive, and we can skip creating them in the first place.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88447/testReport)** for PR 20795 at commit [`17f7e74`](https://github.com/apache/spark/commit/17f7e741632548f263a933b29a66f67b59af6725).
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Merged build finished. Test PASSed.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1662/
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #88446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88446/testReport)** for PR 19381 at commit [`20b245a`](https://github.com/apache/spark/commit/20b245ad49124d8d8b42c6835859759cd6af7964).
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19381 Jenkins, retest this please.
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20727#discussion_r175982059

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
@@ -30,9 +30,19 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
 /**
  * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], which are all of the lines
  * in that file.
+ *
+ * @param file A part (i.e. "block") of a single file that should be read line by line.
+ * @param lineSeparator A line separator that should be used for each line. If the value is `None`,
+ *                      it covers `\r`, `\r\n` and `\n`.
--- End diff --

Sure.
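The doc comment above says that when no explicit line separator is given, the reader treats `\r`, `\r\n`, and `\n` all as line boundaries. A minimal sketch of that splitting behavior (this is not `HadoopFileLinesReader`'s implementation; the object and method names are assumptions for illustration):

```scala
import java.util.regex.Pattern

// Illustrative sketch of "covers \r, \r\n and \n": with no explicit
// separator, any of the three common line endings delimits a line.
// `LineSplitter` and `splitLines` are hypothetical names.
object LineSplitter {
  def splitLines(text: String, sep: Option[String]): Seq[String] = sep match {
    // Explicit separator: split on it literally (quoted so regex
    // metacharacters in the separator are not interpreted).
    case Some(s) => text.split(Pattern.quote(s), -1).toSeq
    // Default: \r\n must come first in the alternation so it is not
    // consumed as a lone \r followed by a lone \n.
    case None    => text.split("\r\n|\r|\n", -1).toSeq
  }
}
```

Note the ordering of the alternation: putting `\r\n` before `\r` is what keeps a Windows-style ending from producing a spurious empty line.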
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88439/
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Merged build finished. Test PASSed.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20827 **[Test build #88439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88439/testReport)** for PR 20827 at commit [`043d6c1`](https://github.com/apache/spark/commit/043d6c1a888fbe3593dcb98a84c2c8aec4b35a28).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/20028 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19599 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark pull request #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange w...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20831#discussion_r175980451

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ---
@@ -68,6 +69,15 @@ case class InMemoryRelation(

   override protected def innerChildren: Seq[SparkPlan] = Seq(child)

+  override def doCanonicalize(): logical.LogicalPlan =
+    copy(output = output.map(QueryPlan.normalizeExprId(_, child.output)),
+      storageLevel = StorageLevel.NONE,
--- End diff --

It is followed. I just ignored `useCompression` and `batchSize` as they are just primitives and don't need to be canonicalized here.
[GitHub] spark pull request #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange w...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20831#discussion_r175980243

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -169,7 +174,10 @@ case class InMemoryTableScanExec(
   override def outputOrdering: Seq[SortOrder] =
     relation.child.outputOrdering.map(updateAttribute(_).asInstanceOf[SortOrder])

-  private def statsFor(a: Attribute) = relation.partitionStatistics.forAttribute(a)
+  // When we make canonicalized plan, we can't find a normalized attribute in this map.
+  // We return a `ColumnStatisticsSchema` for normalized attribute in this case.
--- End diff --

I tried that at the beginning. However, `partitionFilters` uses `buildFilter`. Making `partitionFilters` lazy doesn't work, because on `copy` the initialization of `InMemoryTableScanExec` will try to materialize `partitionFilters` to copy its value. Making `partitionFilters` and `buildFilter` plain methods is not enough either; we would also need to remove `@transient` from `relation` and `InMemoryRelation.partitionStatistics`. So I don't think it is worth it, and I left it as is.
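The `doCanonicalize` override discussed in this thread follows a general idea: normalize identifiers and blank out fields that do not affect query results, so that two semantically equal plans compare equal. A generic sketch of that idea (hypothetical names throughout; this is not Spark's `QueryPlan` machinery):

```scala
// Illustrative sketch of plan canonicalization for sameResult-style
// comparison: expression ids are normalized to their position, and a
// non-semantic field (here a storage hint, standing in for storageLevel)
// is reset to a fixed value. `CachedRelation` is a made-up class.
case class CachedRelation(outputIds: Seq[Long], storageHint: String) {
  def canonicalized: CachedRelation =
    copy(
      outputIds = outputIds.indices.map(_.toLong), // normalize ids by position
      storageHint = ""                             // ignore non-semantic field
    )

  // Two relations produce the same result iff their canonical forms match.
  def sameResult(other: CachedRelation): Boolean =
    this.canonicalized == other.canonicalized
}
```

This also mirrors the reviewer's point above: plain primitive fields that already compare by value (like `useCompression` or `batchSize`) need no normalization, only ids and incidental fields do.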
[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17280 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 Please advise if this is a good feature to add. If not, I'll close it. Thanks.
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1661/
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20786 **[Test build #88445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88445/testReport)** for PR 20786 at commit [`9707fe5`](https://github.com/apache/spark/commit/9707fe5db5b23f071282dc897adea337a2796c8d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20864: [SPARK-23745][SQL]Remove the directories of the “hive....
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/20864 I took a look at [https://github.com/apache/spark/pull/18666] and found it cannot clean all the `*_resources` directories, because when we start HiveThriftServer2, two resource directories are created: `18/03/21 11:23:33 INFO **SessionState: Created local directory: /data1/zdh/spark/hive/tmp/616f66c9-fa4e-4a0c-a63a-10ff97e5019c_resources** 18/03/21 11:23:33 INFO SessionState: Created HDFS directory: /spark-tmp/scratchdir/root/616f66c9-fa4e-4a0c-a63a-10ff97e5019c 18/03/21 11:23:33 INFO SessionState: Created local directory: /data1/zdh/spark/hive/tmp/616f66c9-fa4e-4a0c-a63a-10ff97e5019c 18/03/21 11:23:33 INFO SessionState: Created HDFS directory: /spark-tmp/scratchdir/root/616f66c9-fa4e-4a0c-a63a-10ff97e5019c/_tmp_space.db 18/03/21 11:23:33 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/media/A/gitspace/spark/dist/sbin/spark-warehouse 18/03/21 11:23:33 INFO HiveMetaStore: 0: get_database: default 18/03/21 11:23:33 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default 18/03/21 11:23:33 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint 18/03/21 11:23:33 INFO HiveUtils: Initializing execution hive, version 1.2.1 18/03/21 11:23:34 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 18/03/21 11:23:34 INFO ObjectStore: ObjectStore, initialize called 18/03/21 11:23:34 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 18/03/21 11:23:34 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored 18/03/21 11:23:36 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 18/03/21 11:23:36 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" 
so does not have its own datastore table. 18/03/21 11:23:36 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 18/03/21 11:23:37 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 18/03/21 11:23:37 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 18/03/21 11:23:37 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY 18/03/21 11:23:37 INFO ObjectStore: Initialized ObjectStore 18/03/21 11:23:37 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 18/03/21 11:23:38 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 18/03/21 11:23:38 INFO HiveMetaStore: Added admin role in metastore 18/03/21 11:23:38 INFO HiveMetaStore: Added public role in metastore 18/03/21 11:23:38 INFO HiveMetaStore: No user is added in admin role, since config is empty 18/03/21 11:23:38 INFO HiveMetaStore: 0: get_all_databases 18/03/21 11:23:38 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases 18/03/21 11:23:38 INFO HiveMetaStore: 0: get_functions: db=default pat=* 18/03/21 11:23:38 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=* 18/03/21 11:23:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table. 
18/03/21 11:23:38 INFO **SessionState: Created local directory: /data1/zdh/spark/hive/tmp/16aa5bb9-33e4-43e6-8bdb-8e0318ab175e_resources** 18/03/21 11:23:38 INFO SessionState: Created HDFS directory: /spark-tmp/scratchdir/root/16aa5bb9-33e4-43e6-8bdb-8e0318ab175e 18/03/21 11:23:38 INFO SessionState: Created local directory: /data1/zdh/spark/hive/tmp/16aa5bb9-33e4-43e6-8bdb-8e0318ab175e 18/03/21 11:23:38 INFO SessionState: Created HDFS directory: /spark-tmp/scratchdir/root/16aa5bb9-33e4-43e6-8bdb-8e0318ab175e/_tmp_space.db 18/03/21 11:23:38 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/media/A/gitspace/spark/dist/sbin/spark-warehouse` but on stop, only the current session's resource directory is removed: `public void close() throws IOException { registry.clear(); if (txnMgr != null) txnMgr.closeTxnManager(); JavaUtils.closeClassLoadersTo(conf.getClassLoader(), parentLoader); **File resourceDir = new File(getConf().getVar(HiveConf.ConfVars.DOWNLOADED_RESOURCES_DIR))**; LOG.debug("Removing resource dir "
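The fix being discussed, sweeping every leftover `*_resources` directory instead of only the current session's, can be sketched as follows. This is an illustrative Python sketch, not Hive's actual cleanup code; the helper name and temp-dir layout are assumptions.

```python
import glob
import os
import shutil

def clean_resource_dirs(hive_tmp_dir):
    """Remove every '<uuid>_resources' directory under the Hive local tmp dir.

    Hive's SessionState.close() removes only the *current* session's resource
    dir; when two sessions are created (as in the log above), the first
    '*_resources' directory is left behind. Sweeping the whole pattern
    avoids that leak.
    """
    removed = []
    for path in glob.glob(os.path.join(hive_tmp_dir, "*_resources")):
        if os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed
```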
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20745 LGTM, can you also attach a web UI SQL tab screenshot? thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20745#discussion_r175978136 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -405,4 +406,53 @@ class FileStreamSinkSuite extends StreamTest { } } } + + test("SPARK-23288 writing and checking output metrics") { +Seq("parquet", "orc", "text", "json").foreach { format => + val inputData = MemoryStream[String] + val df = inputData.toDF() + + val outputDir = Utils.createTempDir(namePrefix = "stream.output").getCanonicalPath --- End diff -- we should use `withTempDir` to clean up the temp directory at the end --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
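The `withTempDir` test helper requested above is a loan pattern: create the directory, run the test body, and delete the directory even if the body throws. A minimal Python sketch of the same idea (the helper name and prefix are illustrative, not Spark's API):

```python
import tempfile

def with_temp_dir(body):
    """Loan pattern analogous to Spark's test helper `withTempDir`:
    create a temp dir, run `body(path)`, and always clean up afterwards,
    even if `body` raises."""
    with tempfile.TemporaryDirectory(prefix="stream.output") as path:
        return body(path)
```

Compare with the diff above, where `Utils.createTempDir` leaves the directory behind unless something deletes it explicitly.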
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88438/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88438/testReport)** for PR 20433 at commit [`5ee6f89`](https://github.com/apache/spark/commit/5ee6f897bc71eac24e086f39549ef3a396059b4d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20727 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1660/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20786 **[Test build #88444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88444/testReport)** for PR 20786 at commit [`2ee7e72`](https://github.com/apache/spark/commit/2ee7e7227bd18ccffbf415e83588a3dde2c8fd3a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r175977344 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala --- @@ -30,9 +30,19 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl /** * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], which are all of the lines * in that file. + * + * @param file A part (i.e. "block") of a single file that should be read line by line. + * @param lineSeparator A line separator that should be used for each line. If the value is `None`, + * it covers `\r`, `\r\n` and `\n`. --- End diff -- We should mention that this default rule is not defined by us, but by hadoop. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
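The default rule discussed above comes from Hadoop's line reader: when no custom separator is set, a line ends at `\n`, `\r`, or `\r\n`, with `\r\n` counting as a single terminator. A small Python sketch of that rule, written from the description above rather than from Hadoop's source:

```python
def split_lines_default(text):
    """Split text the way the default (no custom separator) rule does:
    a line ends at '\\n', '\\r', or '\\r\\n', where the '\\r\\n' pair
    counts as one terminator, not two."""
    lines = []
    buf = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            lines.append("".join(buf))
            buf = []
        elif ch == "\r":
            lines.append("".join(buf))
            buf = []
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1  # swallow the '\n' half of a '\r\n' pair
        else:
            buf.append(ch)
        i += 1
    if buf:
        lines.append("".join(buf))
    return lines
```

For inputs containing only these three terminators, Python's `str.splitlines` agrees with this sketch (though `splitlines` also splits on additional characters such as `\v` and `\f`).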
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1659/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20786 **[Test build #88443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88443/testReport)** for PR 20786 at commit [`3fac42e`](https://github.com/apache/spark/commit/3fac42e4d7713d156b691ffcacaa0519e3e85b77). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/20803#discussion_r175975380 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -166,20 +168,28 @@ private[sql] object Dataset { class Dataset[T] private[sql]( @transient val sparkSession: SparkSession, @DeveloperApi @InterfaceStability.Unstable @transient val queryExecution: QueryExecution, -encoder: Encoder[T]) +encoder: Encoder[T], +val sqlText: String = "") --- End diff -- Your speculation is almost right. First we call `val df = spark.sql()`, then the sql text is separated with pattern matching into three types: count, limit, and other. If count, we invoke `df.showString(2, 20)`; if limit, we just invoke `df.limit(1).foreach`; the last type, other, will do nothing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175975025 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -313,6 +315,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex } } + test("filter pushdown - date") { +implicit class IntToDate(int: Int) { + def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1)) +} + +withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df => + checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row]) + checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], (1 to 4).map(i => Row.apply(i.d))) + + checkFilterPredicate('_1 === 1.d, classOf[Eq[_]], 1.d) --- End diff -- Got it, thanks very much for explanation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #4142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4142/testReport)** for PR 19381 at commit [`20b245a`](https://github.com/apache/spark/commit/20b245ad49124d8d8b42c6835859759cd6af7964). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #88442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88442/testReport)** for PR 20345 at commit [`895b6a1`](https://github.com/apache/spark/commit/895b6a1595fef31f2028180a36869c6d344e5ac7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1658/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20868 **[Test build #88441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88441/testReport)** for PR 20868 at commit [`0d189ab`](https://github.com/apache/spark/commit/0d189ab49b2dcb748b51f875f1a04e6b2fb9f69b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1657/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20868 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20868 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20868 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20868 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88437/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20868 **[Test build #88437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88437/testReport)** for PR 20868 at commit [`0d189ab`](https://github.com/apache/spark/commit/0d189ab49b2dcb748b51f875f1a04e6b2fb9f69b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait CatalogMetadata ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20767 @tdas @zsxwing @koeninger @tedyu do you think it makes sense to make similar step in the DStream area like this and then later follow with the mentioned Apache Common Pool? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20869: Improve implicitNotFound message for Encoder
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20869 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20869: Improve implicitNotFound message for Encoder
GitHub user ceedubs opened a pull request: https://github.com/apache/spark/pull/20869 Improve implicitNotFound message for Encoder The `implicitNotFound` message for `Encoder` doesn't mention the name of the type for which it can't find an encoder. Furthermore, it covers up the fact that `Encoder` is the name of the relevant type class. Hopefully this new message provides a little more specific type detail while still giving the general message about which types are supported. ## What changes were proposed in this pull request? Augment the existing message to mention that it's looking for an `Encoder` and what the type of the encoder is. For example instead of: ``` Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. ``` return this message: ``` Unable to find encoder for type Exception. An implicit Encoder[Exception] is needed to store Exception instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. ``` ## How was this patch tested? It was tested manually in the Scala REPL, since triggering this in a test would cause a compilation error. ``` scala> implicitly[Encoder[Exception]] :51: error: Unable to find encoder for type Exception. An implicit Encoder[Exception] is needed to store Exception instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. 
implicitly[Encoder[Exception]] ^ ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/ceedubs/spark encoder-implicit-msg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20869.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20869 commit 588dffc51df53bcbb885305e8ecd5bf39aa2e465 Author: Cody Allen Date: 2018-03-21T01:05:02Z Improve implicitNotFound message for Encoder The `implicitNotFound` message for `Encoder` doesn't mention the name of the type for which it can't find an encoder. Furthermore, it covers up the fact that `Encoder` is the name of the relevant type class. Hopefully this new message provides a little more specific type detail while still giving the general message about which types are supported. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88436/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88436/testReport)** for PR 20579 at commit [`ecf0865`](https://github.com/apache/spark/commit/ecf08654d4c7b50eb498481011d3c6f856419207). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r175971741 --- Diff: python/pyspark/ml/stat.py --- @@ -132,6 +134,172 @@ def corr(dataset, column, method="pearson"): return _java2py(sc, javaCorrObj.corr(*args)) +class Summarizer(object): +""" +.. note:: Experimental + +Tools for vectorized statistics on MLlib Vectors. +The methods in this package provide various statistics for Vectors contained inside DataFrames. +This class lets users pick the statistics they would like to extract for a given column. + +>>> from pyspark.ml.stat import Summarizer +>>> from pyspark.sql import Row +>>> from pyspark.ml.linalg import Vectors +>>> summarizer = Summarizer.metrics("mean", "count") +>>> df = sc.parallelize([Row(weight=1.0, features=Vectors.dense(1.0, 1.0, 1.0)), +... Row(weight=0.0, features=Vectors.dense(1.0, 2.0, 3.0))]).toDF() +>>> df.select(summarizer.summary(df.features, df.weight)).show(truncate=False) ++---+ +|aggregate_metrics(features, weight)| ++---+ +|[[1.0,1.0,1.0], 1] | ++---+ + +>>> df.select(summarizer.summary(df.features)).show(truncate=False) +++ +|aggregate_metrics(features, 1.0)| +++ +|[[1.0,1.5,2.0], 2] | +++ + +>>> df.select(Summarizer.mean(df.features, df.weight)).show(truncate=False) ++--+ +|mean(features)| ++--+ +|[1.0,1.0,1.0] | ++--+ + +>>> df.select(Summarizer.mean(df.features)).show(truncate=False) ++--+ +|mean(features)| ++--+ +|[1.0,1.5,2.0] | ++--+ + + +.. 
versionadded:: 2.4.0 + +""" +def __init__(self, js): +self._js = js + +@staticmethod +@since("2.4.0") +def mean(col, weightCol=None): +""" +return a column of mean summary +""" +return Summarizer._get_single_metric(col, weightCol, "mean") + +@staticmethod +@since("2.4.0") +def variance(col, weightCol=None): +""" +return a column of variance summary +""" +return Summarizer._get_single_metric(col, weightCol, "variance") + +@staticmethod +@since("2.4.0") +def count(col, weightCol=None): +""" +return a column of count summary +""" +return Summarizer._get_single_metric(col, weightCol, "count") + +@staticmethod +@since("2.4.0") +def numNonZeros(col, weightCol=None): +""" +return a column of numNonZero summary +""" +return Summarizer._get_single_metric(col, weightCol, "numNonZeros") + +@staticmethod +@since("2.4.0") +def max(col, weightCol=None): +""" +return a column of max summary +""" +return Summarizer._get_single_metric(col, weightCol, "max") + +@staticmethod +@since("2.4.0") +def min(col, weightCol=None): +""" +return a column of min summary +""" +return Summarizer._get_single_metric(col, weightCol, "min") + +@staticmethod +@since("2.4.0") +def normL1(col, weightCol=None): +""" +return a column of normL1 summary +""" +return Summarizer._get_single_metric(col, weightCol, "normL1") + +@staticmethod +@since("2.4.0") +def normL2(col, weightCol=None): +""" +return a column of normL2 summary +""" +return Summarizer._get_single_metric(col, weightCol, "normL2") + +@staticmethod +def _check_param(featureCol, weightCol): +if weightCol is None: +weightCol = lit(1.0) +if not isinstance(featureCol, Column) or not isinstance(weightCol, Column): +raise TypeError("featureCol and weightCol should be a Column") +return featureCol, weightCol + +@staticmethod +def _get_single_metric(col, weightCol, metric): +col, weightCol = Summarizer._check_param(col, weightCol) +return Column(JavaWrapper._new_java_obj("org.apache.spark.ml.stat.Summarizer." + metric, +col._jc, weightCol._jc)) + +
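The `_check_param` helper in the diff above defaults a missing weight column to `lit(1.0)` and rejects non-`Column` arguments before dispatching to the JVM. Stripped of the Spark/Py4J machinery, the shape of that check looks like this; the `Column` and `lit` stand-ins below are illustrative placeholders, not pyspark's real classes:

```python
class Column:
    """Minimal stand-in for pyspark.sql.Column, for illustration only."""
    def __init__(self, expr):
        self.expr = expr

def lit(value):
    """Stand-in for pyspark.sql.functions.lit: wrap a literal in a Column."""
    return Column(("lit", value))

def check_param(feature_col, weight_col):
    """Mirror the shape of Summarizer._check_param: default the weight
    to lit(1.0) and reject anything that is not a Column."""
    if weight_col is None:
        weight_col = lit(1.0)
    if not isinstance(feature_col, Column) or not isinstance(weight_col, Column):
        raise TypeError("featureCol and weightCol should be a Column")
    return feature_col, weight_col
```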
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175971417 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -84,19 +84,49 @@ object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper { } } + // Extract a list of logical plans to be joined for join-order comparisons. + // Since `ExtractFiltersAndInnerJoins` handles left-deep trees only, this function has + // the same strategy to extract the plan list. + private def extractLeftDeepInnerJoins(plan: LogicalPlan): Seq[LogicalPlan] = plan match { +case j @ Join(left, right, _: InnerLike, _) => right +: extractLeftDeepInnerJoins(left) +case p @ Project(_, j @ Join(_, _, _: InnerLike, _)) => extractLeftDeepInnerJoins(j) +case _ => Seq(plan) + } + + private def checkSameJoinOrder(plan1: LogicalPlan, plan2: LogicalPlan): Boolean = { +extractLeftDeepInnerJoins(plan1) == extractLeftDeepInnerJoins(plan2) + } + + private def mayCreateOrderedJoin( + originalPlan: LogicalPlan, + input: Seq[(LogicalPlan, InnerLike)], + conditions: Seq[Expression]): LogicalPlan = { +val orderedJoins = createOrderedJoin(input, conditions) +if (!checkSameJoinOrder(orderedJoins, originalPlan)) { --- End diff -- If we don't have this check, the `operatorOptimizationRuleSet` batch cannot reach `fixedPoint`, because `ReorderJoin` is re-applied to the same join trees every time the optimization rule batch is invoked. This case does not happen in the master because reordered joins have `Project` in internal nodes (`Project` is added by subsequent optimization rules, e.g., `ColumnPruning`), and this plan structure guards against this case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
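The `extractLeftDeepInnerJoins` recursion in the diff walks only the left spine of the join tree, collects join inputs right child first, and steps through a `Project` sitting directly on top of an inner join. A toy Python model of that traversal, using nested tuples in place of Catalyst plan nodes:

```python
def extract_left_deep_inner_joins(plan):
    """Collect the leaves of a left-deep inner-join tree, right child first,
    skipping Project nodes that sit directly on a join.

    A toy plan is ("join", left, right), ("project", child), or a leaf string.
    """
    if isinstance(plan, tuple) and plan[0] == "join":
        _, left, right = plan
        return [right] + extract_left_deep_inner_joins(left)
    if (isinstance(plan, tuple) and plan[0] == "project"
            and isinstance(plan[1], tuple) and plan[1][0] == "join"):
        return extract_left_deep_inner_joins(plan[1])
    return [plan]

def same_join_order(plan1, plan2):
    """Toy version of checkSameJoinOrder: two plans join in the same
    order iff their extracted input lists match."""
    return extract_left_deep_inner_joins(plan1) == extract_left_deep_inner_joins(plan2)
```

This makes the comment above concrete: a `Project` interposed between joins no longer changes the extracted list, so the rule can recognize that a "reordered" plan is actually the same order and stop rewriting, which is what lets the batch converge.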
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175971428 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala --- @@ -145,4 +159,15 @@ class JoinOptimizationSuite extends PlanTest { } assert(broadcastChildren.size == 1) } + + test("SPARK-23172 skip projections when flattening joins") { --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175971439 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -141,14 +141,16 @@ object ExtractEquiJoinKeys extends Logging with PredicateHelper { } /** - * A pattern that collects the filter and inner joins. + * A pattern that collects the filter and inner joins (and skip projections in plan sub-trees). --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1656/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88440/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org