[GitHub] [spark] SparkQA commented on pull request #30650: [SPARK-24818][CORE] Support delay scheduling for barrier execution
SparkQA commented on pull request #30650: URL: https://github.com/apache/spark/pull/30650#issuecomment-769644475 **[Test build #134640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134640/testReport)** for PR 30650 at commit [`64113c0`](https://github.com/apache/spark/commit/64113c0fa3040251442e6d4089c335547bc277d0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #31264: [SPARK-34144][SQL] Exception thrown when trying to write LocalDate and Instant values to a JDBC relation
SparkQA commented on pull request #31264: URL: https://github.com/apache/spark/pull/31264#issuecomment-769643165 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39231/
[GitHub] [spark] SparkQA commented on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
SparkQA commented on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-769641399 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39232/
[GitHub] [spark] AmplabJenkins commented on pull request #31377: [SPARK-34239][SQL] Unify output of SHOW COLUMNS pass output attributes properly
AmplabJenkins commented on pull request #31377: URL: https://github.com/apache/spark/pull/31377#issuecomment-769635442 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134633/
[GitHub] [spark] AmplabJenkins commented on pull request #31389: [SPARK-34284][CORE][TESTS] Fix deprecated API usage of Apache commons-io
AmplabJenkins commented on pull request #31389: URL: https://github.com/apache/spark/pull/31389#issuecomment-769635443 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134638/
[GitHub] [spark] SparkQA commented on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
SparkQA commented on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-769633875 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39232/
[GitHub] [spark] zhengruifeng commented on a change in pull request #29185: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner
zhengruifeng commented on a change in pull request #29185:
URL: https://github.com/apache/spark/pull/29185#discussion_r566626246

## File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ##

@@ -73,7 +75,21 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+        val outputIter = new InterruptibleIterator(context,
+          sorter.iterator.asInstanceOf[Iterator[(K, V)]])
+        CompletionIterator[(K, V), Iterator[(K, V)]](outputIter, sorter.stop)
+      }, preservesPartitioning = true)

Review comment:
@mridulm sorry for the late reply. I think I am missing something: I don't quite understand why/how to refactor here. Do you mean adding a listener like this?

```
// Use completion callback to stop sorter if task was finished/cancelled.
context.addTaskCompletionListener[Unit](_ => {
  sorter.stop()
})
```
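For readers following the diff above, the idea it implements can be sketched outside Spark. This is a toy model only, in plain Python rather than the Scala of the patch: `hash_partition` and `repartition_and_sort` are hypothetical helpers, and comparing partition counts stands in for the `self.partitioner == Some(partitioner)` check. When the data is already laid out by the target partitioner, sorting each partition in place gives the same result as a shuffle followed by a sort, without moving any records.

```python
def hash_partition(pairs, num_partitions):
    """Toy hash partitioner: route each (key, value) pair by hash(key)."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def repartition_and_sort(partitions, current_n, target_n):
    """Sort by key within partitions; reshuffle only if the layout differs.

    Returns (sorted_partitions, shuffled_flag).
    """
    if current_n == target_n:
        # Same (toy) partitioner: every key already sits in its target
        # partition, so a per-partition sort is enough and no data moves.
        return [sorted(p, key=lambda kv: kv[0]) for p in partitions], False
    # Different layout: redistribute all records, then sort each partition.
    flat = [kv for p in partitions for kv in p]
    reshuffled = hash_partition(flat, target_n)
    return [sorted(p, key=lambda kv: kv[0]) for p in reshuffled], True


data = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]
parts = hash_partition(data, 2)
result, shuffled = repartition_and_sort(parts, current_n=2, target_n=2)
print(result, shuffled)
```

The sketch leaves out what the review thread is actually about, namely when the sorter's resources are released (iterator exhaustion versus task completion).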
[GitHub] [spark] SparkQA commented on pull request #31264: [SPARK-34144][SQL] Exception thrown when trying to write LocalDate and Instant values to a JDBC relation
SparkQA commented on pull request #31264: URL: https://github.com/apache/spark/pull/31264#issuecomment-769631155 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39231/
[GitHub] [spark] maropu commented on a change in pull request #31207: [SPARK-34136][PYTHON][SQL] Add support for complex literals in PySpark
maropu commented on a change in pull request #31207:
URL: https://github.com/apache/spark/pull/31207#discussion_r566624021

## File path: python/pyspark/sql/functions.py ##

@@ -91,13 +92,48 @@ def lit(col):
     Creates a :class:`Column` of literal value.

     .. versionadded:: 1.3.0
+    .. versionchanged:: 3.2.0
+        Added support for complex type literals.

     Examples
     >>> df.select(lit(5).alias('height')).withColumn('spark_user', lit(True)).take(1)
     [Row(height=5, spark_user=True)]
-    """
-    return col if isinstance(col, Column) else _invoke_function("lit", col)
+    >>> df.select(
+    ...     lit({"height": 5}).alias("data"),
+    ...     lit(["python", "scala"]).alias("languages")
+    ... ).take(1)
+    [Row(data={'height': 5}, languages=['python', 'scala'])]
+    """
+    if isinstance(col, Column):
+        return col
+
+    elif isinstance(col, list):
+        return array(*[lit(x) for x in col])
+
+    elif isinstance(col, tuple):
+        fields = (
+            # Named tuple
+            col._fields if hasattr(col, "_fields")
+            # PySpark Row
+            else col.__fields__ if hasattr(col, "__fields__")
+            # Other
+            else [f"_{i + 1}" for i in range(len(col))]
+        )
+
+        return struct(*[

Review comment:
It seems fine, but we might be able to convert these exprs for complex types into literals in advance via `PythonSQLUtils`.
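The field-name selection in the tuple branch of the diff above can be tried in isolation. The following is a minimal sketch without PySpark: `tuple_field_names` is a hypothetical helper name, not part of the patch, and the `__fields__` case is taken from the diff's own "PySpark Row" comment rather than exercised here.

```python
from collections import namedtuple


def tuple_field_names(value):
    """Pick struct field names for a tuple, in the same order as the diff."""
    if hasattr(value, "_fields"):
        # collections.namedtuple instances expose _fields.
        return list(value._fields)
    if hasattr(value, "__fields__"):
        # Per the diff's comment, a PySpark Row exposes __fields__.
        return list(value.__fields__)
    # Plain tuple: fall back to positional names _1, _2, ...
    return [f"_{i + 1}" for i in range(len(value))]


Point = namedtuple("Point", ["x", "y"])
print(tuple_field_names(Point(1, 2)))
print(tuple_field_names(("a", "b", "c")))
```

A plain tuple therefore yields Scala-style positional names, while named tuples keep their declared field names.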
[GitHub] [spark] Ngone51 commented on pull request #31348: [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
Ngone51 commented on pull request #31348:
URL: https://github.com/apache/spark/pull/31348#issuecomment-769621949

> for example to trigger the schedule() function periodically.

I have considered this approach and think it might not work when `requestedCores` is specified by users. Because the Master thinks those leaked executors are alive, the granted cores for the application do not change. So the application has no chance to schedule more executors if the granted cores have already reached `requestedCores`.
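The argument in this comment can be illustrated with a toy model. It is a sketch under stated assumptions, not the Master's real bookkeeping: `schedulable_cores` and the `seen_alive_by_master` field are hypothetical names, and the core counts are made up for the example.

```python
def schedulable_cores(requested_cores, executors):
    """Cores the Master would still grant, based only on its own view."""
    granted = sum(e["cores"] for e in executors if e["seen_alive_by_master"])
    return max(requested_cores - granted, 0)


executors = [
    {"id": 0, "cores": 4, "seen_alive_by_master": True},
    # Leaked: the process is gone, but the Master never saw the finished state.
    {"id": 1, "cores": 4, "seen_alive_by_master": True},
]

# Re-running the scheduling step periodically does not change the Master's view.
for _ in range(3):
    print(schedulable_cores(8, executors))

# Only removing the leaked executor frees cores for a replacement.
executors[1]["seen_alive_by_master"] = False
print(schedulable_cores(8, executors))
```

In this model, repeated scheduling passes keep returning the same answer until the leaked executor is actually removed, which is the point the comment makes about `requestedCores`.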
[GitHub] [spark] skestle closed pull request #31252: [SPARK-33888][SQL][FOLLOWUP] Restored scale metadata for ARRAY type (Postgres)
skestle closed pull request #31252: URL: https://github.com/apache/spark/pull/31252
[GitHub] [spark] skestle commented on pull request #31252: [SPARK-33888][SQL][FOLLOWUP] Restored scale metadata for ARRAY type (Postgres)
skestle commented on pull request #31252: URL: https://github.com/apache/spark/pull/31252#issuecomment-769621669 Thank you @sarutak for following up with #31262 when I didn't have the time ;)
[GitHub] [spark] SparkQA removed a comment on pull request #31389: [SPARK-34284][CORE][TESTS] Fix deprecated API usage of Apache commons-io
SparkQA removed a comment on pull request #31389: URL: https://github.com/apache/spark/pull/31389#issuecomment-769569671 **[Test build #134638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134638/testReport)** for PR 31389 at commit [`1cc27e9`](https://github.com/apache/spark/commit/1cc27e94d6f8884e8b393ad53a06958c5da66dd7).
[GitHub] [spark] SparkQA commented on pull request #31389: [SPARK-34284][CORE][TESTS] Fix deprecated API usage of Apache commons-io
SparkQA commented on pull request #31389: URL: https://github.com/apache/spark/pull/31389#issuecomment-769620214 **[Test build #134638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134638/testReport)** for PR 31389 at commit [`1cc27e9`](https://github.com/apache/spark/commit/1cc27e94d6f8884e8b393ad53a06958c5da66dd7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31377: [SPARK-34239][SQL] Unify output of SHOW COLUMNS pass output attributes properly
SparkQA removed a comment on pull request #31377: URL: https://github.com/apache/spark/pull/31377#issuecomment-769537068 **[Test build #134633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134633/testReport)** for PR 31377 at commit [`81d5a38`](https://github.com/apache/spark/commit/81d5a38ac0209edce01e60e7d6e24aaa66ffa81e).
[GitHub] [spark] SparkQA commented on pull request #31377: [SPARK-34239][SQL] Unify output of SHOW COLUMNS pass output attributes properly
SparkQA commented on pull request #31377: URL: https://github.com/apache/spark/pull/31377#issuecomment-769618672 **[Test build #134633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134633/testReport)** for PR 31377 at commit [`81d5a38`](https://github.com/apache/spark/commit/81d5a38ac0209edce01e60e7d6e24aaa66ffa81e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cristichircu commented on a change in pull request #31264: [SPARK-34144][SQL] Exception thrown when trying to write LocalDate and Instant values to a JDBC relation
cristichircu commented on a change in pull request #31264:
URL: https://github.com/apache/spark/pull/31264#discussion_r566612985

## File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ##

@@ -606,4 +610,70 @@ class JDBCWriteSuite extends SharedSparkSession with BeforeAndAfter {
     sparkContext.removeSparkListener(listener)
     taskMetrics.sum
   }
+
+  test("SPARK-34144: write and read java.sql Date and Timestamp") {
+    val schema = new StructType().add("d", DateType).add("t", TimestampType);
+    val values = Seq(Row.apply(Date.valueOf("2020-01-01"),
+      Timestamp.valueOf("2020-02-02 12:13:14.56789")))
+    val df = spark.createDataFrame(sparkContext.makeRDD(values), schema)
+
+    df.write.jdbc(url, "TEST.TIMETYPES", new Properties())
+
+    val rows = spark.read.jdbc(url, "TEST.TIMETYPES", new Properties()).collect()
+    assert(1 === rows.length);
+    assert(rows(0).getAs[java.sql.Date](0) === java.sql.Date.valueOf("2020-01-01"))
+    assert(rows(0).getAs[java.sql.Timestamp](1)
+      === java.sql.Timestamp.valueOf("2020-02-02 12:13:14.56789"))
+  }

Review comment:
Done. Thanks!
[GitHub] [spark] SparkQA removed a comment on pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
SparkQA removed a comment on pull request #31378: URL: https://github.com/apache/spark/pull/31378#issuecomment-769534379 **[Test build #134632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134632/testReport)** for PR 31378 at commit [`1036b2d`](https://github.com/apache/spark/commit/1036b2d732fdf173f905c117dc7f89d2c2572769).
[GitHub] [spark] AmplabJenkins commented on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
AmplabJenkins commented on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-769614926 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134644/
[GitHub] [spark] AmplabJenkins commented on pull request #31264: [SPARK-34144][SQL] Exception thrown when trying to write LocalDate and Instant values to a JDBC relation
AmplabJenkins commented on pull request #31264: URL: https://github.com/apache/spark/pull/31264#issuecomment-769614925 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134643/
[GitHub] [spark] AmplabJenkins commented on pull request #31391: [SPARK-33212][FOLLOW-UP][BUILD] Fix test "built-in Hadoop version should support shaded client" for hadoop-2.7
AmplabJenkins commented on pull request #31391: URL: https://github.com/apache/spark/pull/31391#issuecomment-769614924
[GitHub] [spark] cloud-fan commented on pull request #31352: [SPARK-34252][SQL] Subquery with view in aggregate's grouping expression fails during the analysis check
cloud-fan commented on pull request #31352: URL: https://github.com/apache/spark/pull/31352#issuecomment-769614946 Hi @imback82 , can you re-open the PR to add the tests? thanks!
[GitHub] [spark] AmplabJenkins commented on pull request #31382: [SPARK-34144][SQL][3.0] Exception thrown when trying to write LocalDate and Instant values to a table
AmplabJenkins commented on pull request #31382: URL: https://github.com/apache/spark/pull/31382#issuecomment-769614927 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134642/
[GitHub] [spark] cloud-fan commented on pull request #31182: [SPARK-34108][SQL] Caching with permanent view doesn't work in certain cases
cloud-fan commented on pull request #31182: URL: https://github.com/apache/spark/pull/31182#issuecomment-769614860 Hi @sunchao , can you re-open the PR to add the tests? thanks!
[GitHub] [spark] HyukjinKwon commented on pull request #31384: [SPARK-31816][SQL] Added high level description about JDBC connection providers for users/developers
HyukjinKwon commented on pull request #31384: URL: https://github.com/apache/spark/pull/31384#issuecomment-769614770 cc @huaxingao, @dilipbiswal FYI
[GitHub] [spark] AmplabJenkins commented on pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
AmplabJenkins commented on pull request #31378: URL: https://github.com/apache/spark/pull/31378#issuecomment-769614748 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134632/
[GitHub] [spark] HyukjinKwon commented on pull request #31391: [SPARK-33212][FOLLOW-UP][BUILD] Fix test "built-in Hadoop version should support shaded client" for hadoop-2.7
HyukjinKwon commented on pull request #31391: URL: https://github.com/apache/spark/pull/31391#issuecomment-769614229 Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #31391: [SPARK-33212][FOLLOW-UP][BUILD] Fix test "built-in Hadoop version should support shaded client" for hadoop-2.7
HyukjinKwon closed pull request #31391: URL: https://github.com/apache/spark/pull/31391
[GitHub] [spark] cloud-fan closed pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
cloud-fan closed pull request #31368: URL: https://github.com/apache/spark/pull/31368
[GitHub] [spark] SparkQA commented on pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
SparkQA commented on pull request #31378: URL: https://github.com/apache/spark/pull/31378#issuecomment-769614050 **[Test build #134632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134632/testReport)** for PR 31378 at commit [`1036b2d`](https://github.com/apache/spark/commit/1036b2d732fdf173f905c117dc7f89d2c2572769). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
cloud-fan commented on pull request #31368: URL: https://github.com/apache/spark/pull/31368#issuecomment-769613707 GA passed, merging to master, thanks for the review!
[GitHub] [spark] cloud-fan commented on a change in pull request #31372: [SPARK-34272][SQL] Pretty SQL should check NonSQLExpression
cloud-fan commented on a change in pull request #31372:
URL: https://github.com/apache/spark/pull/31372#discussion_r566610241

## File path: sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out ##

@@ -109,7 +109,7 @@ struct
 -- !query
 SELECT udf(stddev_pop(CAST(b AS Decimal(38,0 FROM aggtest
 -- !query schema
-struct
+struct

Review comment:
Why do we couple this with `NonSQLExpression`? Can we change `AnsiCast.sql` directly?
[GitHub] [spark] AmplabJenkins commented on pull request #30650: [SPARK-24818][CORE] Support delay scheduling for barrier execution
AmplabJenkins commented on pull request #30650: URL: https://github.com/apache/spark/pull/30650#issuecomment-769610134 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39228/
[GitHub] [spark] AmplabJenkins commented on pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
AmplabJenkins commented on pull request #31368: URL: https://github.com/apache/spark/pull/31368#issuecomment-769610135 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134639/
[GitHub] [spark] AmplabJenkins commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning
AmplabJenkins commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-769610136 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134636/
[GitHub] [spark] AmplabJenkins commented on pull request #31389: [SPARK-34284][CORE][TESTS] Fix deprecated API usage of Apache commons-io
AmplabJenkins commented on pull request #31389: URL: https://github.com/apache/spark/pull/31389#issuecomment-769610133 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39226/
[GitHub] [spark] cloud-fan commented on a change in pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
cloud-fan commented on a change in pull request #31245:
URL: https://github.com/apache/spark/pull/31245#discussion_r566607701

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesSuiteBase.scala ##

@@ -37,13 +36,14 @@ trait ShowTablesSuiteBase extends QueryTest with DDLCommandTestUtils {
   override val command = "SHOW TABLES"
   protected def defaultNamespace: Seq[String]
   case class ShowRow(namespace: String, table: String, isTemporary: Boolean)

Review comment:
They can use `Row` directly.
[GitHub] [spark] c21 commented on pull request #31340: [SPARK-34237][SQL] Add more metrics (fallback, spill) to object hash aggregate
c21 commented on pull request #31340: URL: https://github.com/apache/spark/pull/31340#issuecomment-769606994 Thanks @maropu and @cloud-fan for review!
[GitHub] [spark] c21 commented on a change in pull request #31340: [SPARK-34237][SQL] Add more metrics (fallback, spill) to object hash aggregate
c21 commented on a change in pull request #31340: URL: https://github.com/apache/spark/pull/31340#discussion_r566604963
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala ##
@@ -76,6 +82,11 @@ class ObjectAggregationIterator(
    */
   processInputs()
+  TaskContext.get().addTaskCompletionListener[Unit](_ => {
+    // At the end of the task, update the task's spill size.
+    spillSize.set(TaskContext.get().taskMetrics().memoryBytesSpilled - spillSizeBefore)
Review comment: Thanks @maropu, created a followup JIRA - https://issues.apache.org/jira/browse/SPARK-34286 .
[GitHub] [spark] beliefer commented on a change in pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
beliefer commented on a change in pull request #31245: URL: https://github.com/apache/spark/pull/31245#discussion_r566539660
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala ##
@@ -138,4 +121,21 @@ class ShowTablesSuite extends ShowTablesSuiteBase with CommandSuiteBase {
       }
     }
   }
+
+  test("SPARK-34157 Unify output of SHOW TABLES and pass output attributes properly") {
Review comment: OK
[GitHub] [spark] SparkQA commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning
SparkQA commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-769601744 **[Test build #134636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134636/testReport)** for PR 31258 at commit [`cdd1226`](https://github.com/apache/spark/commit/cdd122674877a3588f9adc480b1c08514f6dc5d8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning
SparkQA removed a comment on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-769555073 **[Test build #134636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134636/testReport)** for PR 31258 at commit [`cdd1226`](https://github.com/apache/spark/commit/cdd122674877a3588f9adc480b1c08514f6dc5d8).
[GitHub] [spark] beliefer commented on a change in pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
beliefer commented on a change in pull request #31245: URL: https://github.com/apache/spark/pull/31245#discussion_r566539660
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala ##
@@ -138,4 +121,21 @@ class ShowTablesSuite extends ShowTablesSuiteBase with CommandSuiteBase {
       }
     }
   }
+
+  test("SPARK-34157 Unify output of SHOW TABLES and pass output attributes properly") {
Review comment: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.
[GitHub] [spark] SparkQA commented on pull request #31389: [SPARK-34284][CORE][TESTS] Fix deprecated API usage of Apache commons-io
SparkQA commented on pull request #31389: URL: https://github.com/apache/spark/pull/31389#issuecomment-769594305 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39226/
[GitHub] [spark] HyukjinKwon commented on pull request #31391: [SPARK-33212][FOLLOW-UP][BUILD] Fix test "built-in Hadoop version should support shaded client" for hadoop-2.7
HyukjinKwon commented on pull request #31391: URL: https://github.com/apache/spark/pull/31391#issuecomment-769591236
[GitHub] [spark] HyukjinKwon commented on pull request #31363: [SPARK-34154][YARN] Extend LocalityPlacementStrategySuite's test with a timeout
HyukjinKwon commented on pull request #31363: URL: https://github.com/apache/spark/pull/31363#issuecomment-769590677 Thanks @attilapiros.
[GitHub] [spark] SparkQA commented on pull request #30650: [SPARK-24818][CORE] Support delay scheduling for barrier execution
SparkQA commented on pull request #30650: URL: https://github.com/apache/spark/pull/30650#issuecomment-769589402 **[Test build #134640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134640/testReport)** for PR 30650 at commit [`64113c0`](https://github.com/apache/spark/commit/64113c0fa3040251442e6d4089c335547bc277d0).
[GitHub] [spark] Ngone51 commented on pull request #30650: [SPARK-24818][CORE] Support delay scheduling for barrier execution
Ngone51 commented on pull request #30650: URL: https://github.com/apache/spark/pull/30650#issuecomment-769587926 retest this please
[GitHub] [spark] Ngone51 commented on pull request #30650: [SPARK-24818][CORE] Support delay scheduling for barrier execution
Ngone51 commented on pull request #30650: URL: https://github.com/apache/spark/pull/30650#issuecomment-769587860 kindly ping @mridulm @tgravescs
[GitHub] [spark] AmplabJenkins commented on pull request #31391: [SPARK-33212][FOLLOW-UP][BUILD] Fix test "built-in Hadoop version should support shaded client" for hadoop-2.7
AmplabJenkins commented on pull request #31391: URL: https://github.com/apache/spark/pull/31391#issuecomment-769587564 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning
AmplabJenkins removed a comment on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-769587200 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39224/
[GitHub] [spark] AmplabJenkins commented on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
AmplabJenkins commented on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-769587193 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134635/
[GitHub] [spark] AmplabJenkins commented on pull request #31388: [SPARK-34260][SQL][2.4] Fix UnresolvedException when creating temp view twice
AmplabJenkins commented on pull request #31388: URL: https://github.com/apache/spark/pull/31388#issuecomment-769587197 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134631/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
AmplabJenkins removed a comment on pull request #31368: URL: https://github.com/apache/spark/pull/31368#issuecomment-769587192 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39227/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim
AmplabJenkins removed a comment on pull request #31390: URL: https://github.com/apache/spark/pull/31390#issuecomment-769587194
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
AmplabJenkins removed a comment on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-769587193 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134635/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31388: [SPARK-34260][SQL][2.4] Fix UnresolvedException when creating temp view twice
AmplabJenkins removed a comment on pull request #31388: URL: https://github.com/apache/spark/pull/31388#issuecomment-769587197 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134631/
[GitHub] [spark] AmplabJenkins commented on pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
AmplabJenkins commented on pull request #31368: URL: https://github.com/apache/spark/pull/31368#issuecomment-769587192 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39227/
[GitHub] [spark] AmplabJenkins commented on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim
AmplabJenkins commented on pull request #31390: URL: https://github.com/apache/spark/pull/31390#issuecomment-769587194
[GitHub] [spark] AmplabJenkins commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning
AmplabJenkins commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-769587200 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39224/
[GitHub] [spark] SparkQA removed a comment on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
SparkQA removed a comment on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-769534158 **[Test build #134635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134635/testReport)** for PR 31245 at commit [`3a4a177`](https://github.com/apache/spark/commit/3a4a1776706ea009b67507a4d0a6b32036368238).
[GitHub] [spark] SparkQA commented on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly
SparkQA commented on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-769585837 **[Test build #134635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134635/testReport)** for PR 31245 at commit [`3a4a177`](https://github.com/apache/spark/commit/3a4a1776706ea009b67507a4d0a6b32036368238). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim
SparkQA commented on pull request #31390: URL: https://github.com/apache/spark/pull/31390#issuecomment-769584623 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39225/
[GitHub] [spark] SparkQA commented on pull request #31389: [SPARK-34284][CORE][TESTS] Fix deprecated API usage of Apache commons-io
SparkQA commented on pull request #31389: URL: https://github.com/apache/spark/pull/31389#issuecomment-769584117 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39226/
[GitHub] [spark] SparkQA commented on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim
SparkQA commented on pull request #31390: URL: https://github.com/apache/spark/pull/31390#issuecomment-769583147 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39225/
[GitHub] [spark] cloud-fan commented on pull request #31391: [SPARK-33212][FOLLOW-UP][BUILD] Fix test "built-in Hadoop version should support shaded client" for hadoop-2.7
cloud-fan commented on pull request #31391: URL: https://github.com/apache/spark/pull/31391#issuecomment-769581661 cc @sunchao @dongjoon-hyun
[GitHub] [spark] attilapiros edited a comment on pull request #31363: [SPARK-34154][YARN] Extend LocalityPlacementStrategySuite's test with a timeout
attilapiros edited a comment on pull request #31363: URL: https://github.com/apache/spark/pull/31363#issuecomment-769579261
This was expected. The goal was to avoid an endless wait for the test to finish and to get more information about the problem via a stack trace. And we have the stack trace:
```
[info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (30 seconds, 4 milliseconds)
[info]   Failed with an exception or a timeout at thread join:
[info]
[info]   java.lang.RuntimeException: Timeout at waiting for thread to stop (its stack trace is added to the exception)
[info]   at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
[info]   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
[info]   at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
[info]   at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
[info]   at java.net.InetAddress.getAllByName(InetAddress.java:1193)
[info]   at java.net.InetAddress.getAllByName(InetAddress.java:1127)
[info]   at java.net.InetAddress.getByName(InetAddress.java:1077)
[info]   at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:568)
[info]   at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:585)
[info]   at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
[info]   at org.apache.spark.deploy.yarn.SparkRackResolver.coreResolve(SparkRackResolver.scala:75)
[info]   at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:66)
[info]   at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.$anonfun$localityOfRequestedContainers$3(LocalityPreferredContainerPlacementStrategy.scala:142)
[info]   at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy$$Lambda$658/1080992036.apply$mcVI$sp(Unknown Source)
[info]   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
[info]   at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.localityOfRequestedContainers(LocalityPreferredContainerPlacementStrategy.scala:138)
[info]   at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.org$apache$spark$deploy$yarn$LocalityPlacementStrategySuite$$runTest(LocalityPlacementStrategySuite.scala:94)
[info]   at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite$$anon$1.run(LocalityPlacementStrategySuite.scala:40)
[info]   at java.lang.Thread.run(Thread.java:748) (LocalityPlacementStrategySuite.scala:61)
```
I already suspected this must be related to DNS resolution; see my [jira comment](https://issues.apache.org/jira/browse/SPARK-34154?focusedCommentId=17272990=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17272990). Now I am checking the possible solutions.
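The approach described in this comment (bound the wait on the test thread and report where it was stuck) can be sketched roughly as follows. This is only an illustration of the idea, not the code from the PR: `TimeoutRunner`, `runWithTimeout`, and the thread name are hypothetical, and only the exception message is taken from the log above.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper, assumed for illustration; not the implementation in the PR.
object TimeoutRunner {
  /** Runs `body` on a separate thread and waits at most `timeoutMs` for it to finish. */
  def runWithTimeout(timeoutMs: Long)(body: => Unit): Unit = {
    // Holds any exception thrown by `body` so it can be rethrown on the caller's thread.
    val failure = new AtomicReference[Throwable](null)
    val worker = new Thread(new Runnable {
      override def run(): Unit =
        try body catch { case t: Throwable => failure.set(t) }
    }, "timeout-runner-worker")
    worker.setDaemon(true)
    worker.start()
    worker.join(timeoutMs) // returns when the thread ends or the timeout elapses

    if (worker.isAlive) {
      // Still running: report where the worker is stuck by copying its stack trace.
      val timeout = new RuntimeException("Timeout at waiting for thread to stop")
      timeout.setStackTrace(worker.getStackTrace)
      worker.interrupt()
      throw timeout
    }
    Option(failure.get()).foreach(t => throw t)
  }
}
```

Calling `TimeoutRunner.runWithTimeout(30000) { ... }` around a blocking call would then fail with an exception whose stack trace points at the blocked frame, instead of hanging indefinitely.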
[GitHub] [spark] SparkQA removed a comment on pull request #31388: [SPARK-34260][SQL][2.4] Fix UnresolvedException when creating temp view twice
SparkQA removed a comment on pull request #31388: URL: https://github.com/apache/spark/pull/31388#issuecomment-769537466 **[Test build #134631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134631/testReport)** for PR 31388 at commit [`fa6cfcd`](https://github.com/apache/spark/commit/fa6cfcddf8a98c039739f91f4fd76ba5ac6510ca).
[GitHub] [spark] SparkQA commented on pull request #31388: [SPARK-34260][SQL][2.4] Fix UnresolvedException when creating temp view twice
SparkQA commented on pull request #31388: URL: https://github.com/apache/spark/pull/31388#issuecomment-769577476 **[Test build #134631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134631/testReport)** for PR 31388 at commit [`fa6cfcd`](https://github.com/apache/spark/commit/fa6cfcddf8a98c039739f91f4fd76ba5ac6510ca). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] bozhang2820 opened a new pull request #31391: [SPARK-33212][FOLLOW-UP][BUILD] Fix test "built-in Hadoop version should support shaded client" for hadoop-2.7
bozhang2820 opened a new pull request #31391: URL: https://github.com/apache/spark/pull/31391
### What changes were proposed in this pull request?
We added the test "built-in Hadoop version should support shaded client" in https://github.com/apache/spark/pull/31203, but it fails when the hadoop-2.7 profile is activated. This change fixes the test by skipping the assertion when the Hadoop version is 2.
### Why are the changes needed?
The test fails in the master branch when the hadoop-2.7 profile is activated.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Ran the test with the hadoop-2.7 profile.
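As a rough illustration of "skipping the assertion when the Hadoop version is 2", a version-gated check could look like the sketch below. How the PR actually detects the Hadoop version is not shown in this message, so `HadoopVersionCheck`, `majorVersion`, and `assertShadedClient` are hypothetical names, and parsing the major version from a version string is an assumption.

```scala
import scala.util.Try

// Hypothetical helper, assumed for illustration; the PR's real check may differ.
object HadoopVersionCheck {
  /** Parses the leading numeric component of a version string such as "2.7.4". */
  def majorVersion(version: String): Option[Int] =
    version.split('.').headOption.flatMap(s => Try(s.toInt).toOption)

  /** Asserts that the shaded client is present, unless the Hadoop major version is 2. */
  def assertShadedClient(hadoopVersion: String, shadedClientPresent: => Boolean): Unit =
    if (!majorVersion(hadoopVersion).contains(2)) {
      assert(shadedClientPresent, s"Hadoop $hadoopVersion should support the shaded client")
    }
}
```

Because `shadedClientPresent` is a by-name parameter, the check itself is not evaluated at all for Hadoop 2 builds, so it cannot fail there.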
[GitHub] [spark] viirya commented on a change in pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
viirya commented on a change in pull request #31368: URL: https://github.com/apache/spark/pull/31368#discussion_r566578589 ## File path: sql/core/src/test/resources/sql-tests/results/group-by-filter.sql.out ## @@ -796,14 +796,16 @@ IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few comman : +- Filter (dept_id#x = outer(dept_id#x)) :+- SubqueryAlias dept : +- View (`DEPT`, [dept_id#x,dept_name#x,state#x]) -: +- Project [dept_id#x, dept_name#x, state#x] -: +- SubqueryAlias DEPT -:+- LocalRelation [dept_id#x, dept_name#x, state#x] +: +- Project [cast(dept_id#x as int) AS dept_id#x, cast(dept_name#x as string) AS dept_name#x, cast(state#x as string) AS state#x] Review comment: sounds okay.
[GitHub] [spark] imback82 commented on a change in pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
imback82 commented on a change in pull request #31368: URL: https://github.com/apache/spark/pull/31368#discussion_r566577976 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ## @@ -230,7 +230,7 @@ object LogicalPlanIntegrity { // NOTE: we still need to filter resolved expressions here because the output of // some resolved logical plans can have unresolved references, // e.g., outer references in `ExistenceJoin`. - p.output.filter(_.resolved).map { a => (a.exprId, a.dataType) } + p.output.filter(_.resolved).map { a => (a.exprId, a.dataType.asNullable) } Review comment: got it. thanks.
[GitHub] [spark] SparkQA commented on pull request #31390: [SPARK-28123][SQL] String Functions: support btrim
SparkQA commented on pull request #31390: URL: https://github.com/apache/spark/pull/31390#issuecomment-769572709 **[Test build #134637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134637/testReport)** for PR 31390 at commit [`1600d4c`](https://github.com/apache/spark/commit/1600d4cb7787d2fa88922e02d4fef8172777ea18).
[GitHub] [spark] SparkQA commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning
SparkQA commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-769572412 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39224/
[GitHub] [spark] maropu commented on a change in pull request #31207: [SPARK-34136][PYTHON][SQL] Add support for complex literals in PySpark
maropu commented on a change in pull request #31207: URL: https://github.com/apache/spark/pull/31207#discussion_r566568600 ## File path: python/pyspark/sql/functions.py ## @@ -91,13 +92,48 @@ def lit(col): Creates a :class:`Column` of literal value. .. versionadded:: 1.3.0 +.. versionchanged:: 3.2.0 +Added support for complex type literals. Examples >>> df.select(lit(5).alias('height')).withColumn('spark_user', lit(True)).take(1) [Row(height=5, spark_user=True)] -""" -return col if isinstance(col, Column) else _invoke_function("lit", col) +>>> df.select( +... lit({"height": 5}).alias("data"), +... lit(["python", "scala"]).alias("languages") +... ).take(1) +[Row(data={'height': 5}, languages=['python', 'scala'])] +""" +if isinstance(col, Column): +return col + +elif isinstance(col, list): +return array(*[lit(x) for x in col]) + +elif isinstance(col, tuple): +fields = ( +# Named tuple +col._fields if hasattr(col, "_fields") +# PySpark Row +else col.__fields__ if hasattr(col, "__fields__") +# Other +else [f"_{i + 1}" for i in range(len(col))] +) + +return struct(*[ Review comment: Looks fine, too. As @zero323 said above, `typedLit` supports the complex types.
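The field-name selection for tuples in the quoted diff can be tried out in isolation. The following is a minimal sketch in plain Python, not the PR's code: the helper name `tuple_field_names` and the `FakeRow` stand-in class are hypothetical, and no PySpark import is needed because only the order of the `hasattr` checks is being illustrated.

```python
from collections import namedtuple


def tuple_field_names(value):
    """Pick struct field names for a tuple, using the same order of checks as the quoted diff.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    if hasattr(value, "_fields"):
        # Named tuple: reuse its declared field names.
        return list(value._fields)
    if hasattr(value, "__fields__"):
        # Row-like object that exposes __fields__.
        return list(value.__fields__)
    # Plain tuple: fall back to positional names _1, _2, ...
    return [f"_{i + 1}" for i in range(len(value))]


class FakeRow(tuple):
    """Stand-in for a Row-like tuple (an assumption, used only in this sketch)."""
    __fields__ = ["height"]


Point = namedtuple("Point", ["x", "y"])

print(tuple_field_names(Point(1, 2)))
print(tuple_field_names(FakeRow((5,))))
print(tuple_field_names(("a", "b", "c")))
```

The named-tuple check comes first, so a type that defines both attributes resolves to `_fields`.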
[GitHub] [spark] cloud-fan commented on a change in pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
cloud-fan commented on a change in pull request #31368: URL: https://github.com/apache/spark/pull/31368#discussion_r566575842 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ## @@ -230,7 +230,7 @@ object LogicalPlanIntegrity { // NOTE: we still need to filter resolved expressions here because the output of // some resolved logical plans can have unresolved references, // e.g., outer references in `ExistenceJoin`. - p.output.filter(_.resolved).map { a => (a.exprId, a.dataType) } + p.output.filter(_.resolved).map { a => (a.exprId, a.dataType.asNullable) } Review comment: The view tests fail without this change. It's a test only thing (the check is skipped in production) that we don't need to backport, so I didn't spend time putting this into a separate PR with tests.
[GitHub] [spark] cloud-fan commented on a change in pull request #31368: [SPARK-34269][SQL] Simplify SQL view resolution
cloud-fan commented on a change in pull request #31368: URL: https://github.com/apache/spark/pull/31368#discussion_r566575428 ## File path: sql/core/src/test/resources/sql-tests/results/group-by-filter.sql.out ## @@ -796,14 +796,16 @@ IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few comman : +- Filter (dept_id#x = outer(dept_id#x)) :+- SubqueryAlias dept : +- View (`DEPT`, [dept_id#x,dept_name#x,state#x]) -: +- Project [dept_id#x, dept_name#x, state#x] -: +- SubqueryAlias DEPT -:+- LocalRelation [dept_id#x, dept_name#x, state#x] +: +- Project [cast(dept_id#x as int) AS dept_id#x, cast(dept_name#x as string) AS dept_name#x, cast(state#x as string) AS state#x] Review comment: It would be better to delay adding the casts until after the parsed view plan is resolved, so that we can skip them for views whose schema has not changed. But I can't find an easy way to do that, and this is really not a big deal (the optimizer will remove redundant casts), so I went with the simple approach for maintainability.
[GitHub] [spark] SparkQA commented on pull request #31389: [SPARK-34284][CORE][TESTS] Fix deprecated API usage of Apache commons-io
SparkQA commented on pull request #31389: URL: https://github.com/apache/spark/pull/31389#issuecomment-769569671 **[Test build #134638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134638/testReport)** for PR 31389 at commit [`1cc27e9`](https://github.com/apache/spark/commit/1cc27e94d6f8884e8b393ad53a06958c5da66dd7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
AmplabJenkins removed a comment on pull request #31378: URL: https://github.com/apache/spark/pull/31378#issuecomment-769569104 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39220/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31388: [SPARK-34260][SQL][2.4] Fix UnresolvedException when creating temp view twice
AmplabJenkins removed a comment on pull request #31388: URL: https://github.com/apache/spark/pull/31388#issuecomment-769569105 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39219/
[GitHub] [spark] AmplabJenkins commented on pull request #31388: [SPARK-34260][SQL][2.4] Fix UnresolvedException when creating temp view twice
AmplabJenkins commented on pull request #31388: URL: https://github.com/apache/spark/pull/31388#issuecomment-769569105 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39219/
[GitHub] [spark] AmplabJenkins commented on pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
AmplabJenkins commented on pull request #31378: URL: https://github.com/apache/spark/pull/31378#issuecomment-769569104 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39220/
[GitHub] [spark] cloud-fan closed pull request #31340: [SPARK-34237][SQL] Add more metrics (fallback, spill) to object hash aggregate
cloud-fan closed pull request #31340: URL: https://github.com/apache/spark/pull/31340
[GitHub] [spark] cloud-fan commented on pull request #31340: [SPARK-34237][SQL] Add more metrics (fallback, spill) to object hash aggregate
cloud-fan commented on pull request #31340: URL: https://github.com/apache/spark/pull/31340#issuecomment-769568965 thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning
SparkQA commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-769564911 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39224/
[GitHub] [spark] maropu commented on a change in pull request #31207: [SPARK-34136][PYTHON][SQL] Add support for complex literals in PySpark
maropu commented on a change in pull request #31207: URL: https://github.com/apache/spark/pull/31207#discussion_r566570144 ## File path: python/pyspark/sql/functions.py ## @@ -91,13 +92,58 @@ def lit(col): Creates a :class:`Column` of literal value. .. versionadded:: 1.3.0 +.. versionchanged:: 3.2.0 +Added support for complex type literals. + +Parameters +-- +col : bool, float, int, str, datetime.date, datetime.datetime, dict, list, tuple +Object to be converted into :class:`Column`. + +If it is a collection, conversion will be applied recursively. In such case, +all stored values should be of compatible types. + +I `col` is already a :class:`Column`, it will be returned unmodified. Review comment: ditto: "The passed in object is returned directly if it is already a :class:`Column`." https://github.com/apache/spark/blob/72b7f8abfb60d0008f1f9bed94ce1c367a7d7cce/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L107
[GitHub] [spark] maropu commented on a change in pull request #31207: [SPARK-34136][PYTHON][SQL] Add support for complex literals in PySpark
maropu commented on a change in pull request #31207: URL: https://github.com/apache/spark/pull/31207#discussion_r566569587 ## File path: python/pyspark/sql/functions.py ## @@ -91,13 +92,58 @@ def lit(col): Creates a :class:`Column` of literal value. .. versionadded:: 1.3.0 +.. versionchanged:: 3.2.0 +Added support for complex type literals. + +Parameters +-- +col : bool, float, int, str, datetime.date, datetime.datetime, dict, list, tuple +Object to be converted into :class:`Column`. Review comment: "Creates a :class:`Column` of literal value."? https://github.com/apache/spark/blob/72b7f8abfb60d0008f1f9bed94ce1c367a7d7cce/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L105
[GitHub] [spark] maropu commented on a change in pull request #31207: [SPARK-34136][PYTHON][SQL] Add support for complex literals in PySpark
maropu commented on a change in pull request #31207: URL: https://github.com/apache/spark/pull/31207#discussion_r566568861 ## File path: python/pyspark/sql/functions.py ## @@ -91,13 +92,58 @@ def lit(col): Creates a :class:`Column` of literal value. .. versionadded:: 1.3.0 +.. versionchanged:: 3.2.0 +Added support for complex type literals. + +Parameters +-- +col : bool, float, int, str, datetime.date, datetime.datetime, dict, list, tuple +Object to be converted into :class:`Column`. + +If it is a collection, conversion will be applied recursively. In such case, +all stored values should be of compatible types. + +I `col` is already a :class:`Column`, it will be returned unmodified. Review comment: nit: how about `I` -> `If` and removing "already".
[GitHub] [spark] beliefer commented on pull request #31128: [SPARK-28123][SQL] String Functions: support btrim
beliefer commented on pull request #31128: URL: https://github.com/apache/spark/pull/31128#issuecomment-769559957 Work on this function has moved to the new PR https://github.com/apache/spark/pull/31390
[GitHub] [spark] beliefer closed pull request #31128: [SPARK-28123][SQL] String Functions: support btrim
beliefer closed pull request #31128: URL: https://github.com/apache/spark/pull/31128
[GitHub] [spark] beliefer opened a new pull request #31390: [SPARK-28123][SQL] String Functions: support btrim
beliefer opened a new pull request #31390: URL: https://github.com/apache/spark/pull/31390

### What changes were proposed in this pull request?
Spark currently supports `trim`/`ltrim`/`rtrim`. The function `btrim` is an alternate form of `TRIM(BOTH FROM )`. `btrim` removes the longest string consisting only of the specified characters from the start and end of a string. The mainstream databases that support this feature are listed below:

**Postgresql** https://www.postgresql.org/docs/11/functions-binarystring.html
**Vertica** https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_5
**Redshift** https://docs.aws.amazon.com/redshift/latest/dg/r_BTRIM.html
**Druid** https://druid.apache.org/docs/latest/querying/sql.html#string-functions
**Greenplum** http://docs.greenplum.org/6-8/ref_guide/function-summary.html

### Why are the changes needed?
btrim is very useful.

### Does this PR introduce _any_ user-facing change?
Yes. btrim is a new function.

### How was this patch tested?
Jenkins test.
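For readers unfamiliar with `btrim`, the trimming rule described in this PR (remove the longest run made up only of the specified characters from both ends) can be sketched in plain Python. This is an illustration of the semantics only and is not Spark's implementation; the Python function name `btrim` and its default of trimming spaces are assumptions made for the sketch.

```python
def btrim(s, chars=" "):
    """Strip any leading and trailing characters of `s` that appear in `chars`.

    Hypothetical illustration of the described semantics; `chars` is treated
    as a set of characters, not as a prefix or suffix string.
    """
    # str.strip(chars) removes characters from both ends until it reaches
    # one that is not in `chars`, which matches the rule described above.
    return s.strip(chars)


print(btrim("xyxtrimyyx", "xyz"))  # prints "trim"
print(btrim("  spark  "))          # prints "spark"
```

Characters in the middle of the string are left alone, so `btrim("a-b-a", "a")` keeps the inner `-b-`.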
[GitHub] [spark] akiyamaneko commented on pull request #30446: [SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted
akiyamaneko commented on pull request #30446: URL: https://github.com/apache/spark/pull/30446#issuecomment-769559581 @tgravescs sorry for the delay. Is there anything I can do?
[GitHub] [spark] LuciferYang opened a new pull request #31389: [SPARK-34284][CORE] Fix deprecated API usage of Apache commons-io
LuciferYang opened a new pull request #31389: URL: https://github.com/apache/spark/pull/31389

### What changes were proposed in this pull request?
There are some compilation warnings about deprecated API usage related to Apache commons-io, as follows:

```
[WARNING] [Warn] /spark/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:1109: [deprecation @ org.apache.spark.deploy.SparkSubmitSuite.checkDownloadedFile.$org_scalatest_assert_macro_expr.$org_scalatest_assert_macro_left | origin=org.apache.commons.io.FileUtils.readFileToString | version=] method readFileToString in class FileUtils is deprecated
[WARNING] [Warn] /spark/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:1110: [deprecation @ org.apache.spark.deploy.SparkSubmitSuite.checkDownloadedFile.$org_scalatest_assert_macro_expr.$org_scalatest_assert_macro_right | origin=org.apache.commons.io.FileUtils.readFileToString | version=] method readFileToString in class FileUtils is deprecated
[WARNING] [Warn] /spark/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:1152: [deprecation @ org.apache.spark.deploy.SparkSubmitSuite | origin=org.apache.commons.io.FileUtils.write | version=] method write in class FileUtils is deprecated
[WARNING] [Warn] /spark/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:1167: [deprecation @ org.apache.spark.deploy.SparkSubmitSuite | origin=org.apache.commons.io.FileUtils.write | version=] method write in class FileUtils is deprecated
[WARNING] [Warn] /spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:201: [deprecation @ org.apache.spark.deploy.history.HistoryServerSuite..$anonfun.exp | origin=org.apache.commons.io.IOUtils.toString | version=] method toString in class IOUtils is deprecated
[WARNING] [Warn] /spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:716: [deprecation @ org.apache.spark.deploy.history.HistoryServerSuite.getContentAndCode.inString.$anonfun | origin=org.apache.commons.io.IOUtils.toString | version=] method toString in class IOUtils is deprecated
[WARNING] [Warn] /spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:732: [deprecation @ org.apache.spark.deploy.history.HistoryServerSuite.connectAndGetInputStream.errString.$anonfun | origin=org.apache.commons.io.IOUtils.toString | version=] method toString in class IOUtils is deprecated
[WARNING] [Warn] /spark/streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala:267: [deprecation @ org.apache.spark.streaming.InputStreamsSuite..$anonfun.$anonfun.write | origin=org.apache.commons.io.IOUtils.write | version=] method write in class IOUtils is deprecated
[WARNING] [Warn] /spark/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala:912: [deprecation @ org.apache.spark.streaming.StreamingContextSuite.createCorruptedCheckpoint | origin=org.apache.commons.io.FileUtils.write | version=] method write in class FileUtils is deprecated
```

The main API change is that a `java.nio.charset.Charset` parameter must be passed when the corresponding method is called, so the main change in this PR is to add a `StandardCharsets.UTF_8` argument to these method calls.

### Why are the changes needed?
Fix deprecated API usage of Apache commons-io.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action
[GitHub] [spark] SparkQA commented on pull request #31388: [SPARK-34260][SQL][2.4] Fix UnresolvedException when creating temp view twice
SparkQA commented on pull request #31388: URL: https://github.com/apache/spark/pull/31388#issuecomment-769558511 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39219/
[GitHub] [spark] SparkQA commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning
SparkQA commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-769555073 **[Test build #134636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134636/testReport)** for PR 31258 at commit [`cdd1226`](https://github.com/apache/spark/commit/cdd122674877a3588f9adc480b1c08514f6dc5d8).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31377: [SPARK-34239][SQL] Unify output of SHOW COLUMNS pass output attributes properly
AmplabJenkins removed a comment on pull request #31377: URL: https://github.com/apache/spark/pull/31377#issuecomment-769553519 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39221/
[GitHub] [spark] AmplabJenkins commented on pull request #31377: [SPARK-34239][SQL] Unify output of SHOW COLUMNS pass output attributes properly
AmplabJenkins commented on pull request #31377: URL: https://github.com/apache/spark/pull/31377#issuecomment-769553519 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39221/