[GitHub] [spark] cloud-fan commented on a change in pull request #29152: [SPARK-32356][SQL] Forbid create view with null type
cloud-fan commented on a change in pull request #29152:
URL: https://github.com/apache/spark/pull/29152#discussion_r457078454

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala ##

@@ -1557,6 +1557,17 @@ class PlanResolutionSuite extends AnalysisTest {
     checkFailure("testcat.tab", "foo")
   }
 
+  test("SPARK-32356: forbid null type in create view") {
+    val sql1 = "create view v as select null as c"
+    val sql2 = "alter view v as select null as c"
+    Seq(sql1, sql2).foreach { sql =>
+      val msg = intercept[AnalysisException] {
+        parseAndResolve(sql)
+      }.getMessage
+      assert(msg.contains(s"Cannot create tables with ${NullType.simpleString} type."))

Review comment:
shall we update the error message to be `tables/views`?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
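Editor's sketch of the kind of check the test above exercises, with the message wording the review comment suggests. This is not Spark's implementation: `ColType`, `containsNull`, and `checkSchema` are hypothetical names over a simplified column model, and the assumption that nested types are inspected as well is the editor's.

```scala
object NullTypeCheckSketch {
  // Hypothetical, simplified column model (not Spark's DataType hierarchy).
  sealed trait ColType
  case object NullCol extends ColType
  case object IntCol extends ColType
  final case class ArrayCol(element: ColType) extends ColType

  // True if the type is the null type or contains it as an array element.
  def containsNull(t: ColType): Boolean = t match {
    case NullCol           => true
    case ArrayCol(element) => containsNull(element)
    case IntCol            => false
  }

  // Returns an error message that names both tables and views for the first
  // offending column, or None when no column uses the null type.
  def checkSchema(columns: Seq[(String, ColType)]): Option[String] =
    columns.collectFirst {
      case (name, t) if containsNull(t) =>
        s"Cannot create tables/views with null type (column $name)."
    }
}
```

A single message covering both cases would let the same assertion serve `create table`, `create view`, and `alter view` tests.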
[GitHub] [spark] cloud-fan commented on a change in pull request #29152: [SPARK-32356][SQL] Forbid create view with null type
cloud-fan commented on a change in pull request #29152:
URL: https://github.com/apache/spark/pull/29152#discussion_r457078204

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala ##

@@ -1557,6 +1557,17 @@ class PlanResolutionSuite extends AnalysisTest {
     checkFailure("testcat.tab", "foo")
   }
 
+  test("SPARK-32356: forbid null type in create view") {
+    val sql1 = "create view v as select null as c"
+    val sql2 = "alter view v as select null as c"

Review comment:
can we test temp view as well? also the df api `df.createTempView`
[GitHub] [spark] huaxingao commented on pull request #29159: [SPARK-32310][ML][PySpark] ML params default value parity
huaxingao commented on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-660820452 cc @srowen @viirya @zhengruifeng
[GitHub] [spark] huaxingao commented on a change in pull request #29139: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
huaxingao commented on a change in pull request #29139:
URL: https://github.com/apache/spark/pull/29139#discussion_r457076755

## File path: docs/ml-guide.md ##

@@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing.
-If native libraries[^1] are not available at runtime, you will see a warning message and a pure JVM
-implementation will be used instead.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths.
 
-Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native
-proxies by default.
-To configure `netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your
-project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
-platform's additional installation instructions.
-
-The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model.
-
-Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1.
-
-Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) or [Intel oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). Note that if nativeBLAS is not properly configured in system, java implementation(f2jBLAS) will be used as fallback option.
+Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.md) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message below and a pure JVM implementation will be used instead:

Review comment:
`ml-linalg-guide.html` instead of `ml-linalg-guide.md`?
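Editor's sketch of the threading advice in the removed text (match the BLAS thread count to the cores per Spark task). It only builds configuration key/value pairs and does not start Spark. `BlasThreadConfSketch` and `blasThreadConf` are hypothetical names; the sketch assumes that the BLAS libraries read the `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` environment variables and that executors receive them through `spark.executorEnv.*` settings.

```scala
object BlasThreadConfSketch {
  // Builds settings that keep the native BLAS thread count equal to the
  // number of cores assigned to each Spark task (1 by default).
  def blasThreadConf(taskCpus: Int = 1): Map[String, String] = {
    require(taskCpus >= 1, "taskCpus must be at least 1")
    val threads = taskCpus.toString
    Map(
      "spark.task.cpus" -> threads,
      // Assumption: environment variables are forwarded to executors this way.
      "spark.executorEnv.OPENBLAS_NUM_THREADS" -> threads,
      "spark.executorEnv.MKL_NUM_THREADS" -> threads
    )
  }
}
```

The resulting pairs could be passed as `--conf key=value` arguments or set on a `SparkConf`, depending on how the application is launched.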
[GitHub] [spark] cloud-fan commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions
cloud-fan commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-660817399 The GitHub Actions checks have all passed. We don't need to wait for Jenkins. @wangyum can you do the final sign-off?
[GitHub] [spark] cloud-fan commented on a change in pull request #29158: [SPARK-32362][SQL][TEST] AdaptiveQueryExecSuite misses verifying AE results
cloud-fan commented on a change in pull request #29158:
URL: https://github.com/apache/spark/pull/29158#discussion_r457071998

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ##

@@ -68,7 +68,9 @@ class AdaptiveQueryExecSuite
     val result = dfAdaptive.collect()
     withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
       val df = sql(query)
-      QueryTest.sameRows(result.toSeq, df.collect().toSeq)
+      QueryTest.sameRows(result.toSeq, df.collect().toSeq).foreach {

Review comment:
good catch! can we use `checkAnswer(df, result)`?
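Editor's sketch of why the original line verified nothing. The `.foreach {` added in the diff suggests that the comparison helper reports a mismatch as an optional error message rather than by throwing, so discarding its return value accepts any result. The `sameRows` below is a hypothetical stand-in over `Seq[Int]`, not Spark's `QueryTest.sameRows`.

```scala
object SameRowsSketch {
  // Hypothetical stand-in: returns Some(errorMessage) on mismatch and None on
  // match, comparing rows without regard to order.
  def sameRows(expected: Seq[Int], actual: Seq[Int]): Option[String] =
    if (expected.sorted == actual.sorted) None
    else Some(s"Results do not match: expected $expected, got $actual")

  // Discards the Option, so a mismatch goes unnoticed.
  def uncheckedCompare(expected: Seq[Int], actual: Seq[Int]): Unit = {
    sameRows(expected, actual) // return value ignored: nothing is verified
    ()
  }

  // Consumes the Option, so a mismatch becomes a failure.
  def checkedCompare(expected: Seq[Int], actual: Seq[Int]): Unit =
    sameRows(expected, actual).foreach { errorMessage =>
      throw new AssertionError(errorMessage)
    }
}
```

A helper that asserts internally, as the review comment proposes with `checkAnswer`, avoids this class of mistake because there is no return value to forget.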
[GitHub] [spark] cloud-fan commented on pull request #29158: [SPARK-32362][SQL][TEST] AdaptiveQueryExecSuite misses verifying AE results
cloud-fan commented on pull request #29158: URL: https://github.com/apache/spark/pull/29158#issuecomment-660816143 cc @maryannxue @JkSelf
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r457071605

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ##

@@ -713,13 +714,18 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
         }
         (Seq.empty, Option(name), props.toSeq, recordHandler)
 
-      case null =>
+      case null if conf.getConf(CATALOG_IMPLEMENTATION).equals("hive") =>
         // Use default (serde) format.
         val name = conf.getConfString("hive.script.serde",
           "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")
         val props = Seq("field.delim" -> "\t")
         val recordHandler = Option(conf.getConfString(configKey, defaultConfigValue))
         (Nil, Option(name), props, recordHandler)
+
+      // SPARK-32106: When there is no definition about format, we return empty result
+      // to use a built-in default Serde in SparkScriptTransformationExec.
+      case null =>
+        (Nil, None, Seq.empty, None)

Review comment:
> > CalendarIntervalType/ArrayType/MapType/StructType as input of hive default serde will throw error
>
> btw, we already have end-2-end tests for the unsupported cases in the hive side?

Added
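Editor's sketch of the guarded-pattern ordering used in the diff above: the guarded `case null if ...` must come before the unguarded `case null`, which then acts as the fallback. `SerdeDefaultSketch`, `defaultFormat`, and the simplified two-element result are hypothetical; only the serde class name and the `field.delim` property are taken from the diff.

```scala
object SerdeDefaultSketch {
  // Hypothetical, simplified result: (serde class name, serde properties).
  type Format = (Option[String], Seq[(String, String)])

  def defaultFormat(rowFormat: String, catalogImplementation: String): Format =
    rowFormat match {
      // No format given and Hive is the catalog: use the Hive default serde.
      case null if catalogImplementation == "hive" =>
        (Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"),
          Seq("field.delim" -> "\t"))
      // No format given and no Hive: return an empty result so the caller
      // can fall back to a built-in default.
      case null =>
        (None, Seq.empty)
      // An explicit format was given: pass it through unchanged.
      case explicit =>
        (Some(explicit), Seq.empty)
    }
}
```

If the two `null` cases were swapped, the unguarded one would match first and the Hive branch would never be reached.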
[GitHub] [spark] gengliangwang commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions
gengliangwang commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-660814078 @maropu I checked the output of the optimized query plan of the 3 queries and they are equivalent. I think the performance result should be consistent. [after.txt](https://github.com/apache/spark/files/4945705/after.txt) [before.txt](https://github.com/apache/spark/files/4945706/before.txt)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29159: [SPARK-32310][ML][PySpark] ML params default value parity
AmplabJenkins removed a comment on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-660812454
[GitHub] [spark] AmplabJenkins commented on pull request #29159: [SPARK-32310][ML][PySpark] ML params default value parity
AmplabJenkins commented on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-660812454
[GitHub] [spark] SparkQA commented on pull request #29159: [SPARK-32310][ML][PySpark] ML params default value parity
SparkQA commented on pull request #29159: URL: https://github.com/apache/spark/pull/29159#issuecomment-660812005 **[Test build #126152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126152/testReport)** for PR 29159 at commit [`f657d77`](https://github.com/apache/spark/commit/f657d7778f2574914e955b461d0e4dd8d92c7bcf).
[GitHub] [spark] huaxingao opened a new pull request #29159: [SPARK-32310][ML][PySpark] ML params default value parity
huaxingao opened a new pull request #29159:
URL: https://github.com/apache/spark/pull/29159

### What changes were proposed in this pull request?
Backport the changes to 3.0: set params default values in trait Params for feature and tuning in both Scala and Python.

### Why are the changes needed?
Make ML have the same default param values between an estimator and its corresponding transformer, and also between Scala and Python.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing and modified tests
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activat
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660808078 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126151/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660805371 **[Test build #126151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126151/testReport)** for PR 29117 at commit [`ece8906`](https://github.com/apache/spark/commit/ece89067ebaf67b84f6d3c108ec15c6b569957a1).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activat
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660808073 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660808046 **[Test build #126151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126151/testReport)** for PR 29117 at commit [`ece8906`](https://github.com/apache/spark/commit/ece89067ebaf67b84f6d3c108ec15c6b569957a1).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660808073
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activat
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660805680
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660805680
[GitHub] [spark] SparkQA commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660805371 **[Test build #126151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126151/testReport)** for PR 29117 at commit [`ece8906`](https://github.com/apache/spark/commit/ece89067ebaf67b84f6d3c108ec15c6b569957a1).
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660797226 **[Test build #126149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126149/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activat
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660804936
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660804936
[GitHub] [spark] SparkQA commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660804807 **[Test build #126149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126149/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dbtsai commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
dbtsai commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660802517 Thanks! This is a great milestone.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activat
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660801711 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126150/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activat
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660801706 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660799121 **[Test build #126150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126150/testReport)** for PR 29117 at commit [`7690935`](https://github.com/apache/spark/commit/769093576cf0f5d79e2069df82031310f498e017).
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660801706
[GitHub] [spark] SparkQA commented on pull request #29117: [SPARK-32363][PYTHON][BUILD] Avoid using --user in pip installation test and explicitly choose conda and source for (de)activate
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660801689 **[Test build #126150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126150/testReport)** for PR 29117 at commit [`7690935`](https://github.com/apache/spark/commit/769093576cf0f5d79e2069df82031310f498e017).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660799407
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660799407
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660799121 **[Test build #126150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126150/testReport)** for PR 29117 at commit [`7690935`](https://github.com/apache/spark/commit/769093576cf0f5d79e2069df82031310f498e017).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660798223 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126138/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660798219 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
AmplabJenkins commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660798219
[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
SparkQA commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660797872 **[Test build #126138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126138/testReport)** for PR 29104 at commit [`dc76141`](https://github.com/apache/spark/commit/dc761417a17530fa198d0471902605e6acd70995).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
SparkQA removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660766081 **[Test build #126138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126138/testReport)** for PR 29104 at commit [`dc76141`](https://github.com/apache/spark/commit/dc761417a17530fa198d0471902605e6acd70995).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660797488
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660797488
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660797226 **[Test build #126149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126149/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
[GitHub] [spark] holdenk commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
holdenk commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660796492 Merged to dev branch
[GitHub] [spark] asfgit closed pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
asfgit closed pull request #28708: URL: https://github.com/apache/spark/pull/28708
[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure
HyukjinKwon commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660796154 retest this please
[GitHub] [spark] LantaoJin commented on a change in pull request #29021: [SPARK-32201][SQL] More general skew join pattern matching
LantaoJin commented on a change in pull request #29021: URL: https://github.com/apache/spark/pull/29021#discussion_r457039081
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ##
@@ -340,3 +340,28 @@ case class BroadcastPartitioning(mode: BroadcastMode) extends Partitioning {
     case _ => false
   }
 }
+
+/**

Review comment: Hi @JkSelf, I will provide another approach that removes this `CoalescedHashPartitioning` and simplifies the code. However, the current implementation with `CoalescedHashPartitioning` might be more general and cover more cases.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660794069 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126144/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660794065 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660786632 **[Test build #126144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126144/testReport)** for PR 29085 at commit [`72b2155`](https://github.com/apache/spark/commit/72b215558b5d3e326ebe2416367a9d33455f9d58).
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660794065
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29158: [SPARK-32362][SQL][TEST] AdaptiveQueryExecSuite misses verifying AE results
AmplabJenkins removed a comment on pull request #29158: URL: https://github.com/apache/spark/pull/29158#issuecomment-660793719
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660793992 **[Test build #126144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126144/testReport)** for PR 29085 at commit [`72b2155`](https://github.com/apache/spark/commit/72b215558b5d3e326ebe2416367a9d33455f9d58).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29158: [SPARK-32362][SQL][TEST] AdaptiveQueryExecSuite misses verifying AE results
AmplabJenkins commented on pull request #29158: URL: https://github.com/apache/spark/pull/29158#issuecomment-660793719
[GitHub] [spark] SparkQA commented on pull request #29158: [SPARK-32362][SQL][TEST] AdaptiveQueryExecSuite misses verifying AE results
SparkQA commented on pull request #29158: URL: https://github.com/apache/spark/pull/29158#issuecomment-660793434 **[Test build #126148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126148/testReport)** for PR 29158 at commit [`e6d3083`](https://github.com/apache/spark/commit/e6d308335ef7b4a78a5fcc9cda83e623214d9990).
[GitHub] [spark] LantaoJin commented on pull request #29158: [SPARK-32362][SQL][TEST] AdaptiveQueryExecSuite misses verifying AE results
LantaoJin commented on pull request #29158: URL: https://github.com/apache/spark/pull/29158#issuecomment-660793421 cc @cloud-fan
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660786642 **[Test build #126143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126143/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660792705
[GitHub] [spark] LantaoJin opened a new pull request #29158: [SPARK-32362][SQL][TEST] AdaptiveQueryExecSuite misses verifying AE results
LantaoJin opened a new pull request #29158: URL: https://github.com/apache/spark/pull/29158

### What changes were proposed in this pull request?
Verify results for `AdaptiveQueryExecSuite`

### Why are the changes needed?
`AdaptiveQueryExecSuite` misses verifying AE results

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests.
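The PR text does not show how the results are verified. Purely as an illustration (plain Scala with no Spark dependency; `sameRows` and the sample rows are hypothetical, not taken from the suite), an order-insensitive check that an adaptive run returns the same rows as a non-adaptive run could be sketched like this:

```scala
// Hypothetical helper, not part of Spark: compares two result sets as multisets,
// so row order is ignored but the number of duplicate rows still matters.
def sameRows[A](actual: Seq[A], expected: Seq[A]): Boolean = {
  def counts(rows: Seq[A]): Map[A, Int] =
    rows.groupBy(identity).map { case (row, group) => row -> group.size }
  counts(actual) == counts(expected)
}

// Made-up rows standing in for the output of the same query run twice.
val nonAdaptive = Seq((1, "a"), (2, "b"), (2, "b"))
val adaptive = Seq((2, "b"), (1, "a"), (2, "b"))

println(sameRows(adaptive, nonAdaptive))          // same rows, different order
println(sameRows(adaptive, nonAdaptive.distinct)) // a duplicate row is missing
```

Comparing as multisets rather than sets is a deliberate choice here: a plan change that drops or duplicates a row would otherwise go unnoticed.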
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660792705
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660792637 **[Test build #126143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126143/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660789389
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660790160
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660790160
[GitHub] [spark] maropu commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions
maropu commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-660790062 This PR itself looks okay, so just to check; have you checked that this PR can get the same performance gain?
```
SQL | Before this PR | After this PR
--- | --- | ---
TPCDS 5T Q13 | 84s | 21s
TPCDS 5T q85 | 66s | 34s
TPCH 1T q19 | 37s | 32s
```
https://github.com/apache/spark/pull/28733#issue-428291092
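To make the size of the quoted gain easier to compare across queries, the timings from the table above can be turned into speedup ratios (before divided by after). This is a small plain-Scala sketch; the variable names are illustrative, and the numbers are only the ones quoted in the comment, not new measurements:

```scala
// Timings in seconds, copied from the table quoted in the comment.
val timings = Seq(
  ("TPCDS 5T Q13", 84.0, 21.0),
  ("TPCDS 5T q85", 66.0, 34.0),
  ("TPCH 1T q19", 37.0, 32.0)
)

// Speedup = time before / time after; values above 1.0 mean the query got faster.
val speedups = timings.map { case (query, before, after) => (query, before / after) }

speedups.foreach { case (query, ratio) => println(f"$query%-13s $ratio%.2fx") }
```

Read this way, Q13 is the query with by far the largest gain (84s to 21s is exactly 4x), while q19 improves only slightly.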
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660789956 **[Test build #126147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126147/testReport)** for PR 29085 at commit [`e16c136`](https://github.com/apache/spark/commit/e16c13620032f8062cb0fcd6ecad9836c97febf7).
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660783515 **[Test build #126142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126142/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660789389
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660789324 **[Test build #126142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126142/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29142: [SPARK-32360][SQL] Add MaxMinBy to support eliminate sorts
AmplabJenkins commented on pull request #29142: URL: https://github.com/apache/spark/pull/29142#issuecomment-660788635
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660788675
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660788675
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29142: [SPARK-32360][SQL] Add MaxMinBy to support eliminate sorts
AmplabJenkins removed a comment on pull request #29142: URL: https://github.com/apache/spark/pull/29142#issuecomment-660788635
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660788345 **[Test build #126146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126146/testReport)** for PR 29085 at commit [`a3628ac`](https://github.com/apache/spark/commit/a3628ac576ef9fbe06e87ad4ff36043897e0056a).
[GitHub] [spark] SparkQA commented on pull request #29142: [SPARK-32360][SQL] Add MaxMinBy to support eliminate sorts
SparkQA commented on pull request #29142: URL: https://github.com/apache/spark/pull/29142#issuecomment-660788332 **[Test build #126145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126145/testReport)** for PR 29142 at commit [`cd93b70`](https://github.com/apache/spark/commit/cd93b707dfd9e033a0580d688a19fe044af379f9).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZhuuuu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r457025605
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala ##
@@ -0,0 +1,390 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.sql.{Date, Timestamp}
+
+import org.json4s.DefaultFormats
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+import org.scalatest.Assertions._
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.exceptions.TestFailedException
+
+import org.apache.spark.{SparkException, TaskContext, TestUtils}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{Column, Row}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, Expression, GenericInternalRow}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.CalendarInterval
+
+abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestUtils
+  with BeforeAndAfterEach {
+  import testImplicits._
+  import ScriptTransformationIOSchema._
+
+  protected val uncaughtExceptionHandler = new TestUncaughtExceptionHandler
+
+  private var defaultUncaughtExceptionHandler: Thread.UncaughtExceptionHandler = _
+
+  protected override def beforeAll(): Unit = {
+    super.beforeAll()
+    defaultUncaughtExceptionHandler = Thread.getDefaultUncaughtExceptionHandler
+    Thread.setDefaultUncaughtExceptionHandler(uncaughtExceptionHandler)
+  }
+
+  protected override def afterAll(): Unit = {
+    super.afterAll()
+    Thread.setDefaultUncaughtExceptionHandler(defaultUncaughtExceptionHandler)
+  }
+
+  override protected def afterEach(): Unit = {
+    super.afterEach()
+    uncaughtExceptionHandler.cleanStatus()
+  }
+
+  def isHive23OrSpark: Boolean
+
+  def createScriptTransformationExec(
+      input: Seq[Expression],
+      script: String,
+      output: Seq[Attribute],
+      child: SparkPlan,
+      ioschema: ScriptTransformationIOSchema): BaseScriptTransformationExec
+
+  test("cat without SerDe") {
+    assume(TestUtils.testCommandAvailable("/bin/bash"))
+
+    val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
+    checkAnswer(
+      rowsDf,
+      (child: SparkPlan) => createScriptTransformationExec(
+        input = Seq(rowsDf.col("a").expr),
+        script = "cat",
+        output = Seq(AttributeReference("a", StringType)()),
+        child = child,
+        ioschema = defaultIOSchema
+      ),
+      rowsDf.collect())
+    assert(uncaughtExceptionHandler.exception.isEmpty)
+  }
+
+  test("script transformation should not swallow errors from upstream operators (no serde)") {
+    assume(TestUtils.testCommandAvailable("/bin/bash"))
+
+    val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
+    val e = intercept[TestFailedException] {
+      checkAnswer(
+        rowsDf,
+        (child: SparkPlan) => createScriptTransformationExec(
+          input = Seq(rowsDf.col("a").expr),
+          script = "cat",
+          output = Seq(AttributeReference("a", StringType)()),
+          child = ExceptionInjectingOperator(child),
+          ioschema = defaultIOSchema
+        ),
+        rowsDf.collect())
+    }
+    assert(e.getMessage().contains("intentional exception"))
+    // Before SPARK-25158, uncaughtExceptionHandler will catch IllegalArgumentException
+    assert(uncaughtExceptionHandler.exception.isEmpty)
+  }
+
+  test("SPARK-25990: TRANSFORM should handle different data types correctly") {
+    assume(TestUtils.testCommandAvailable("python"))
+    val scriptFilePath = getTestResourcePath("test_script.py")
+
+    withTempView("v") {
+      val df = Seq(
+        (1, "1", 1.0, BigDecimal(1.0), new Timestamp(1)),
+        (2, "2", 2.0, BigDecimal(2.0), new Timestamp(2)),
+        (3, "3", 3.0, BigDecimal(3.0), new Timestamp(3))
+      ).toDF("a", "b", "c", "d", "e") // Note column d's data type is Decimal(38, 18)
+
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r457025094 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -0,0 +1,49 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW t1 AS SELECT * FROM VALUES +('a'), ('b'), ('v') +as t1(a); + +CREATE OR REPLACE TEMPORARY VIEW t2 AS SELECT * FROM VALUES +('1', true, unhex('537061726B2053514C'), tinyint(1), array_position(array(3, 2, 1), 1), float(1.0), 1.0, Decimal(1.0), timestamp(1), current_date), +('2', false, unhex('537061726B2053514C'), tinyint(2), array_position(array(3, 2, 1), 2), float(2.0), 2.0, Decimal(2.0), timestamp(2), current_date), +('3', true, unhex('537061726B2053514C'), tinyint(3), array_position(array(3, 2, 1), 1), float(3.0), 3.0, Decimal(3.0), timestamp(3), current_date) +as t2(a,b,c,d,e,f,g,h,i,j); + +SELECT TRANSFORM(a) Review comment: Added some cases without SerDe. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r457025094 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -0,0 +1,49 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW t1 AS SELECT * FROM VALUES +('a'), ('b'), ('v') +as t1(a); + +CREATE OR REPLACE TEMPORARY VIEW t2 AS SELECT * FROM VALUES +('1', true, unhex('537061726B2053514C'), tinyint(1), array_position(array(3, 2, 1), 1), float(1.0), 1.0, Decimal(1.0), timestamp(1), current_date), +('2', false, unhex('537061726B2053514C'), tinyint(2), array_position(array(3, 2, 1), 2), float(2.0), 2.0, Decimal(2.0), timestamp(2), current_date), +('3', true, unhex('537061726B2053514C'), tinyint(3), array_position(array(3, 2, 1), 1), float(3.0), 3.0, Decimal(3.0), timestamp(3), current_date) +as t2(a,b,c,d,e,f,g,h,i,j); + +SELECT TRANSFORM(a) Review comment: Added some cases without SerDe. With SerDe, the output differs depending on whether Hive is present.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660786972
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660786972
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660786626
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-660786632 **[Test build #126144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126144/testReport)** for PR 29085 at commit [`72b2155`](https://github.com/apache/spark/commit/72b215558b5d3e326ebe2416367a9d33455f9d58).
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660786626
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660786642 **[Test build #126143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126143/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660779916 **[Test build #126140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126140/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660786336 **[Test build #126140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126140/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660784319
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29157: [SPARK-32344][SQL][2.4] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
AmplabJenkins removed a comment on pull request #29157: URL: https://github.com/apache/spark/pull/29157#issuecomment-660784274
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660784319
[GitHub] [spark] AmplabJenkins commented on pull request #29157: [SPARK-32344][SQL][2.4] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
AmplabJenkins commented on pull request #29157: URL: https://github.com/apache/spark/pull/29157#issuecomment-660784274
[GitHub] [spark] SparkQA commented on pull request #29157: [SPARK-32344][SQL][2.4] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
SparkQA commented on pull request #29157: URL: https://github.com/apache/spark/pull/29157#issuecomment-660783497 **[Test build #126141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126141/testReport)** for PR 29157 at commit [`c190886`](https://github.com/apache/spark/commit/c190886bed931f7084439b0a737c4a1cfeb90bc3).
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660783515 **[Test build #126142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126142/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
[GitHub] [spark] HyukjinKwon removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
HyukjinKwon removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660778721
[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure
HyukjinKwon commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660783001 retest this please
[GitHub] [spark] maropu opened a new pull request #29157: [SPARK-32344][SQL][2.4] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
maropu opened a new pull request #29157: URL: https://github.com/apache/spark/pull/29157

### What changes were proposed in this pull request?

This PR fixes a bug in distinct FIRST/LAST aggregates in v2.4.6:

```
scala> sql("SELECT FIRST(DISTINCT v) FROM VALUES 1, 2, 3 t(v)").show()
...
Caused by: java.lang.UnsupportedOperationException: Cannot evaluate expression: false#37
  at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
  at org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:226)
  at org.apache.spark.sql.catalyst.expressions.aggregate.First.ignoreNulls(First.scala:68)
  at org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions$lzycompute(First.scala:82)
  at org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions(First.scala:81)
  at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$15.apply(HashAggregateExec.scala:268)
```

The root cause of this bug is that the `Aggregation` strategy replaces the foldable boolean `ignoreNullsExpr` expression with an `Unevaluable` expression (`AttributeReference`) for distinct FIRST/LAST aggregate functions. This replacement must not be allowed, because the `Analyzer` has already checked that the expression is foldable: https://github.com/apache/spark/blob/ffdbbae1d465fe2c710d020de62ca1a6b0b924d9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/First.scala#L74-L76

So this PR proposes to change the variable for `IGNORE NULLS` from `Expression` to `Boolean` to avoid this case.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a test in `DataFrameAggregateSuite`.
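The root cause described in the pull request above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Expr`, `Lit`, `AttrRef`, `FirstWithExpr`, `FirstWithFlag`, and `rewrite` are hypothetical stand-ins written for this example and are not Spark's classes. It shows why a flag held as an expression can break after a planner rewrite, while a plain `Boolean` cannot.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
sealed trait Expr { def eval(): Any }

// A constant that can be evaluated at planning time (foldable).
final case class Lit(value: Any) extends Expr {
  def eval(): Any = value
}

// A column reference that has no value until it is bound to a row.
final case class AttrRef(name: String) extends Expr {
  def eval(): Any =
    throw new UnsupportedOperationException(s"Cannot evaluate expression: $name")
}

// Before: the flag is an expression, so a planner rewrite can replace it.
final case class FirstWithExpr(child: Expr, ignoreNullsExpr: Expr) {
  def ignoreNulls: Boolean = ignoreNullsExpr.eval().asInstanceOf[Boolean]
}

// After: the flag is a plain Boolean, so expression rewrites cannot touch it.
final case class FirstWithFlag(child: Expr, ignoreNulls: Boolean)

object IgnoreNullsSketch {
  // Mimics a rewrite that swaps a literal for an attribute reference.
  def rewrite(e: Expr): Expr = e match {
    case Lit(v) => AttrRef(s"$v#37")
    case other  => other
  }

  def main(args: Array[String]): Unit = {
    // The foldable flag is rewritten into an unevaluable reference.
    val before = FirstWithExpr(AttrRef("v"), rewrite(Lit(false)))
    val failed = scala.util.Try(before.ignoreNulls).isFailure
    println(s"expression-typed flag fails after rewrite: $failed")

    // The Boolean flag is unaffected by the same rewrite.
    val after = FirstWithFlag(rewrite(AttrRef("v")), ignoreNulls = false)
    println(s"boolean flag after rewrite: ${after.ignoreNulls}")
  }
}
```

The design point is that a value stored outside the expression tree is invisible to rules that transform child expressions, which appears to be the property the PR relies on.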
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660780221
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660780221
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660779916 **[Test build #126140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126140/testReport)** for PR 29117 at commit [`7a9cf67`](https://github.com/apache/spark/commit/7a9cf6718ae2b7d266ba2e67923fa7fe8ccf8fae).
[GitHub] [spark] gengliangwang commented on a change in pull request #29137: [SPARK-32337][SQL] Show initial plan in AQE plan tree string
gengliangwang commented on a change in pull request #29137: URL: https://github.com/apache/spark/pull/29137#discussion_r457011061 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ## @@ -288,25 +291,59 @@ case class AdaptiveSparkPlanExec( addSuffix, maxFields, printNodeId) -currentPhysicalPlan.generateTreeString( +plans.zipWithIndex.foreach { case ((name, plan), i) => Review comment: Since there are always only two plans, shall we just call `initialPlan.generateTreeString` and `currentPhysicalPlan.generateTreeString` here?
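The review suggestion can be sketched with a toy comparison of the two styles. Everything here is an assumption made for illustration: `Plan` is a hypothetical stand-in whose `generateTreeString` takes only a depth and an append callback (Spark's real method takes more parameters), and the `current`/`initial` labels and `[i]` prefixes are invented. The sketch only shows that, for exactly two plans, two direct calls produce the same text as the `zipWithIndex` loop.

```scala
// Hypothetical stand-in for a physical plan node. Spark's real
// generateTreeString takes more parameters than this toy version.
final case class Plan(name: String) {
  def generateTreeString(depth: Int, append: String => Unit): Unit =
    append(("  " * depth) + name + "\n")
}

object TwoPlansSketch {
  // Loop form, mirroring the `plans.zipWithIndex.foreach` line in the diff.
  def renderWithLoop(initialPlan: Plan, currentPhysicalPlan: Plan): String = {
    val sb = new StringBuilder
    val plans = Seq("current" -> currentPhysicalPlan, "initial" -> initialPlan)
    plans.zipWithIndex.foreach { case ((name, plan), i) =>
      sb.append(s"[$i] $name\n")
      plan.generateTreeString(1, s => { sb.append(s); () })
    }
    sb.toString
  }

  // Direct form, as the reviewer suggests: one explicit call per plan.
  def renderDirect(initialPlan: Plan, currentPhysicalPlan: Plan): String = {
    val sb = new StringBuilder
    val append: String => Unit = s => { sb.append(s); () }
    sb.append("[0] current\n")
    currentPhysicalPlan.generateTreeString(1, append)
    sb.append("[1] initial\n")
    initialPlan.generateTreeString(1, append)
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val initial = Plan("SortMergeJoin")
    val current = Plan("BroadcastHashJoin")
    print(renderDirect(initial, current))
  }
}
```

The direct form trades a little repetition for code that states plainly that there are two plans and in which order they are printed.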
[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure
HyukjinKwon commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-660778707