[GitHub] [spark] ulysses-you commented on pull request #32355: [SPARK-35221][SQL] Add join hint build side check
ulysses-you commented on pull request #32355: URL: https://github.com/apache/spark/pull/32355#issuecomment-827333232 thanks for review @maropu @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32358: [SPARK-34837][SQL][FOLLOWUP] Fix division by zero in the avg function over ANSI intervals
SparkQA commented on pull request #32358: URL: https://github.com/apache/spark/pull/32358#issuecomment-827333068
[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors
SparkQA commented on pull request #32340: URL: https://github.com/apache/spark/pull/32340#issuecomment-827330598 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42502/
[GitHub] [spark] attilapiros commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
attilapiros commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827328738 The description would be perfect with a UI screenshot, and it would be good to see how the message is actually rendered in the UI. @Ngone51 Could you please capture before-and-after screenshots, for example by making a temporary code change?
[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors
SparkQA commented on pull request #32340: URL: https://github.com/apache/spark/pull/32340#issuecomment-827328032 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42502/
[GitHub] [spark] cloud-fan closed pull request #32198: [SPARK-26164][SQL] Allow concurrent writers for writing dynamic partitions and bucket table
cloud-fan closed pull request #32198: URL: https://github.com/apache/spark/pull/32198
[GitHub] [spark] cloud-fan commented on pull request #32198: [SPARK-26164][SQL] Allow concurrent writers for writing dynamic partitions and bucket table
cloud-fan commented on pull request #32198: URL: https://github.com/apache/spark/pull/32198#issuecomment-827326207 thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #32357: [SPARK-35235][SQL] Add row-based hash map into aggregate benchmark
SparkQA commented on pull request #32357: URL: https://github.com/apache/spark/pull/32357#issuecomment-827322010
[GitHub] [spark] SparkQA commented on pull request #32359: [SPARK-35236][SQL] Support archive files as resources for CREATE FUNCTION USING syntax
SparkQA commented on pull request #32359: URL: https://github.com/apache/spark/pull/32359#issuecomment-827323316 **[Test build #137984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137984/testReport)** for PR 32359 at commit [`a6b9f1a`](https://github.com/apache/spark/commit/a6b9f1ae9e564e1c8c07c8987de6de0ca5f728cd).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32314: [SPARK-35169][SQL] Fix wrong result of min ANSI interval division by -1
AmplabJenkins removed a comment on pull request #32314: URL: https://github.com/apache/spark/pull/32314#issuecomment-827322372 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42500/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32357: [SPARK-35235][SQL] Add row-based hash map into aggregate benchmark
AmplabJenkins removed a comment on pull request #32357: URL: https://github.com/apache/spark/pull/32357#issuecomment-827322371 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42501/
[GitHub] [spark] AmplabJenkins commented on pull request #32314: [SPARK-35169][SQL] Fix wrong result of min ANSI interval division by -1
AmplabJenkins commented on pull request #32314: URL: https://github.com/apache/spark/pull/32314#issuecomment-827322372 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42500/
[GitHub] [spark] AmplabJenkins commented on pull request #32357: [SPARK-35235][SQL] Add row-based hash map into aggregate benchmark
AmplabJenkins commented on pull request #32357: URL: https://github.com/apache/spark/pull/32357#issuecomment-827322371 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42501/
[GitHub] [spark] sarutak opened a new pull request #32359: [SPARK-35236][SQL] Support archive files as resources for CREATE FUNCTION USING syntax
sarutak opened a new pull request #32359: URL: https://github.com/apache/spark/pull/32359

### What changes were proposed in this pull request?
This PR proposes to let the `CREATE FUNCTION USING` syntax take archives as resources.

### Why are the changes needed?
The `CREATE FUNCTION USING` syntax doesn't support archives as resources because archives were previously not supported in Spark SQL. Now that Spark SQL supports archives, the syntax can support them as well.

### Does this PR introduce _any_ user-facing change?
Yes. Users can specify archives in the `CREATE FUNCTION USING` syntax.

### How was this patch tested?
New test.
[GitHub] [spark] SparkQA commented on pull request #32314: [SPARK-35169][SQL] Fix wrong result of min ANSI interval division by -1
SparkQA commented on pull request #32314: URL: https://github.com/apache/spark/pull/32314#issuecomment-827314021
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
AmplabJenkins removed a comment on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827306943 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137974/
[GitHub] [spark] AmplabJenkins commented on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
AmplabJenkins commented on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827306943 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137974/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
AmplabJenkins removed a comment on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827267133 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42494/
[GitHub] [spark] SparkQA removed a comment on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
SparkQA removed a comment on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827242970 **[Test build #137974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137974/testReport)** for PR 32344 at commit [`5df750d`](https://github.com/apache/spark/commit/5df750dcc364ea866aca7b2b9e3e921bd470b5f0).
[GitHub] [spark] SparkQA commented on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
SparkQA commented on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827306300 **[Test build #137974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137974/testReport)** for PR 32344 at commit [`5df750d`](https://github.com/apache/spark/commit/5df750dcc364ea866aca7b2b9e3e921bd470b5f0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32358: [SPARK-34837][SQL][FOLLOWUP] Support ANSI SQL intervals by the aggregate function avg
SparkQA commented on pull request #32358: URL: https://github.com/apache/spark/pull/32358#issuecomment-827306180 **[Test build #137983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137983/testReport)** for PR 32358 at commit [`02d8282`](https://github.com/apache/spark/commit/02d82823f86117ff6c8399a1d653283a669482ea).
[GitHub] [spark] beliefer opened a new pull request #32358: [SPARK-34837][SQL][FOLLOWUP] Support ANSI SQL intervals by the aggregate function avg
beliefer opened a new pull request #32358: URL: https://github.com/apache/spark/pull/32358

### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/32229 added support for ANSI SQL intervals in the aggregate function `avg`, but did not handle input with zero rows, which leads to:
```
Caused by: java.lang.ArithmeticException: / by zero
  at com.google.common.math.LongMath.divide(LongMath.java:367)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1864)
  at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1253)
  at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1253)
  at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2248)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:131)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1437)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
```

### Why are the changes needed?
Fixes a bug.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New tests.
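The zero-rows guard this PR describes can be sketched as follows. This is a minimal illustration of the idea, not Spark's actual code; the method and parameter names are made up:

```java
// Sketch: an interval `avg` divides the accumulated sum by the row count, so
// an empty input must map to SQL NULL (here: null) before the division runs.
public class IntervalAvg {
    // Returns the average interval in microseconds, or null for zero rows.
    static Long avgMicros(long sumMicros, long count) {
        if (count == 0L) {
            return null; // zero input rows: no division, no ArithmeticException
        }
        return Math.floorDiv(sumMicros, count);
    }
}
```

With the guard in place, `avgMicros(anything, 0)` yields NULL instead of throwing `/ by zero`.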
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
AmplabJenkins removed a comment on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827304500 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42497/
[GitHub] [spark] AmplabJenkins commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
AmplabJenkins commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827304500 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42497/
[GitHub] [spark] SparkQA commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
SparkQA commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827304483
[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors
SparkQA commented on pull request #32340: URL: https://github.com/apache/spark/pull/32340#issuecomment-827304365 **[Test build #137982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137982/testReport)** for PR 32340 at commit [`b6f1dc0`](https://github.com/apache/spark/commit/b6f1dc00c7a8ae9eb5ef9133596142b6c3cd58ac).
[GitHub] [spark] SparkQA commented on pull request #32357: [SPARK-35235][SQL] Add row-based hash map into aggregate benchmark
SparkQA commented on pull request #32357: URL: https://github.com/apache/spark/pull/32357#issuecomment-827304353 **[Test build #137981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137981/testReport)** for PR 32357 at commit [`469ac63`](https://github.com/apache/spark/commit/469ac63f7df8ae1c7cd0457fc8027c6d6bff82e3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
AmplabJenkins removed a comment on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827303783 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42499/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32355: [SPARK-35221][SQL] Add join hint build side check
AmplabJenkins removed a comment on pull request #32355: URL: https://github.com/apache/spark/pull/32355#issuecomment-827303781 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42498/
[GitHub] [spark] AmplabJenkins commented on pull request #32355: [SPARK-35221][SQL] Add join hint build side check
AmplabJenkins commented on pull request #32355: URL: https://github.com/apache/spark/pull/32355#issuecomment-827303781 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42498/
[GitHub] [spark] AmplabJenkins commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
AmplabJenkins commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827303783 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42499/
[GitHub] [spark] SparkQA commented on pull request #32355: [SPARK-35221][SQL] Add join hint build side check
SparkQA commented on pull request #32355: URL: https://github.com/apache/spark/pull/32355#issuecomment-827303225 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42498/
[GitHub] [spark] SparkQA commented on pull request #32355: [SPARK-35221][SQL] Add join hint build side check
SparkQA commented on pull request #32355: URL: https://github.com/apache/spark/pull/32355#issuecomment-827301506 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42498/
[GitHub] [spark] SparkQA commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
SparkQA commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827300152 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42499/
[GitHub] [spark] Ngone51 edited a comment on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
Ngone51 edited a comment on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827294571 @attilapiros I have run the experiment on the UI, and it turns out the format in the UI is always correct. That's because we pass the formatted `failureMessage` to the UI directly, without any change. You can check the related code here: https://github.com/apache/spark/blob/38ef4771d447f6135382ee2767b3f32b96cb1b0e/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1732
[GitHub] [spark] c21 opened a new pull request #32357: [SPARK-35235][SQL] Add row-based hash map into aggregate benchmark
c21 opened a new pull request #32357: URL: https://github.com/apache/spark/pull/32357

### What changes were proposed in this pull request?
`AggregateBenchmark` only tests the performance of the vectorized fast hash map, not the row-based hash map (which is used by default). We should add the row-based hash map to the benchmark.

java 8 benchmark run - https://github.com/c21/spark/actions/runs/787731549
java 11 benchmark run - https://github.com/c21/spark/actions/runs/787742858

### Why are the changes needed?
To establish and track a baseline for how the different fast hash maps used in hash aggregation perform.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests, as this only touches benchmark code.
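The row-based fast hash map the benchmark exercises can be sketched roughly like this. Purely illustrative, it does not mirror Spark's generated hash-map code; the names are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of row-based hash aggregation: probe-and-update one aggregation
// buffer per group key, one input row at a time (as opposed to a
// vectorized/columnar fast hash map that works on column batches).
public class RowBasedAgg {
    static Map<String, Long> sumByKey(String[] keys, long[] values) {
        Map<String, Long> buffer = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // Probe the map for the key's buffer row, then update it in place.
            buffer.merge(keys[i], values[i], Long::sum);
        }
        return buffer;
    }
}
```

The benchmark's point is to compare this per-row probe/update pattern against the vectorized variant under the same aggregate workload.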
[GitHub] [spark] Ngone51 commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
Ngone51 commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827294571 @attilapiros I have done the experiment on UI. And it turns out that the format in UI is always correct. That's because we passed the formatted `failureMessage` to UI directly without any change. See: https://github.com/apache/spark/blob/38ef4771d447f6135382ee2767b3f32b96cb1b0e/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1732 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors
Peng-Lei commented on a change in pull request #32340: URL: https://github.com/apache/spark/pull/32340#discussion_r620832720

## File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java

```diff
@@ -508,4 +516,37 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(intervalDayHolder.days * MICROS_PER_DAY,
```

Review comment: done
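The `IntervalDayAccessor` in the diff combines an Arrow day-time interval's (days, millis) pair into a single microsecond count. A standalone sketch of that conversion follows; the constant matches the usual 86,400,000,000 microseconds per day, and the use of `multiplyExact` for overflow checking is this sketch's assumption, not necessarily what the final patch does:

```java
// Sketch: fold a day-time interval stored as (days, millis) into one long
// microsecond count, with arithmetic-overflow checks.
public class IntervalDayMicros {
    static final long MICROS_PER_DAY = 24L * 60L * 60L * 1000L * 1000L; // 86,400,000,000

    static long toMicros(int days, int millis) {
        // days * MICROS_PER_DAY can overflow a long for extreme inputs, so
        // use the exact variants that throw instead of silently wrapping.
        return Math.addExact(Math.multiplyExact((long) days, MICROS_PER_DAY),
                             millis * 1000L);
    }
}
```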
[GitHub] [spark] yaooqinn commented on a change in pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
yaooqinn commented on a change in pull request #32351: URL: https://github.com/apache/spark/pull/32351#discussion_r620828995

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

```diff
@@ -29,67 +29,105 @@ import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval
 
-abstract class ExtractIntervalPart(
-    child: Expression,
+abstract class ExtractIntervalPart[T](
     val dataType: DataType,
-    func: CalendarInterval => Any,
-    funcName: String)
-  extends UnaryExpression with ExpectsInputTypes with NullIntolerant with Serializable {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(CalendarIntervalType)
-
-  override protected def nullSafeEval(interval: Any): Any = {
-    func(interval.asInstanceOf[CalendarInterval])
-  }
-
+    func: T => Any,
+    funcName: String) extends UnaryExpression with NullIntolerant with Serializable {
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     val iu = IntervalUtils.getClass.getName.stripSuffix("$")
     defineCodeGen(ctx, ev, c => s"$iu.$funcName($c)")
   }
+
+  override protected def nullSafeEval(interval: Any): Any = {
+    func(interval.asInstanceOf[T])
+  }
 }
 
 case class ExtractIntervalYears(child: Expression)
-  extends ExtractIntervalPart(child, IntegerType, getYears, "getYears") {
+  extends ExtractIntervalPart[CalendarInterval](IntegerType, getYears, "getYears") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalYears =
     copy(child = newChild)
 }
 
 case class ExtractIntervalMonths(child: Expression)
-  extends ExtractIntervalPart(child, ByteType, getMonths, "getMonths") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getMonths, "getMonths") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalMonths =
     copy(child = newChild)
 }
 
 case class ExtractIntervalDays(child: Expression)
-  extends ExtractIntervalPart(child, IntegerType, getDays, "getDays") {
+  extends ExtractIntervalPart[CalendarInterval](IntegerType, getDays, "getDays") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalDays =
     copy(child = newChild)
 }
 
 case class ExtractIntervalHours(child: Expression)
-  extends ExtractIntervalPart(child, LongType, getHours, "getHours") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getHours, "getHours") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalHours =
     copy(child = newChild)
 }
 
 case class ExtractIntervalMinutes(child: Expression)
-  extends ExtractIntervalPart(child, ByteType, getMinutes, "getMinutes") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getMinutes, "getMinutes") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalMinutes =
     copy(child = newChild)
 }
 
 case class ExtractIntervalSeconds(child: Expression)
-  extends ExtractIntervalPart(child, DecimalType(8, 6), getSeconds, "getSeconds") {
+  extends ExtractIntervalPart[CalendarInterval](DecimalType(8, 6), getSeconds, "getSeconds") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalSeconds =
     copy(child = newChild)
 }
 
+case class ExtractANSIIntervalYears(child: Expression)
+    extends ExtractIntervalPart[Int](IntegerType, getYears, "getYears") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalYears =
+    copy(child = newChild)
+}
+
+case class ExtractANSIIntervalMonths(child: Expression)
+    extends ExtractIntervalPart[Int](ByteType, getMonths, "getMonths") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalMonths =
+    copy(child = newChild)
+}
+
+case class ExtractANSIIntervalDays(child: Expression)
+    extends ExtractIntervalPart[Long](IntegerType, getDays, "getDays") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalDays = {
+    copy(child = newChild)
+  }
+}
+
+case class ExtractANSIIntervalHours(child: Expression)
+    extends ExtractIntervalPart[Long](ByteType, getHours, "getHours") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalHours =
+    copy(child = newChild)
+}
+
+case class ExtractANSIIntervalMinutes(child: Expression)
+    extends ExtractIntervalPart[Long](ByteType, getMinutes, "getMinutes") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalMinutes =
+    copy(child = newChild)
+}
+
+case class ExtractANSIIntervalSeconds(child: Expression)
+    extends ExtractIntervalPart[Long](DecimalType(8, 6), getSeconds, "getSeconds") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalSeconds =
+    copy(child = newChild)
+}
```
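The new `ExtractANSIInterval*` classes work on the physical representation of ANSI intervals: a year-month interval is just an `Int` number of months, a day-time interval a `Long` number of microseconds. Extracting a field is then plain integer arithmetic. A minimal sketch for the year-month case (helper names are illustrative, not Spark's exact `IntervalUtils` API):

```java
public class IntervalExtract {
    // An ANSI year-month interval is physically an int count of months.
    // Extract the whole-year part.
    static int getYears(int months) {
        return months / 12;
    }

    // Extract the remaining months within the year (fits in a byte,
    // matching the ByteType result in the Spark expressions above).
    static byte getMonths(int months) {
        return (byte) (months % 12);
    }

    public static void main(String[] args) {
        int interval = 27; // i.e. INTERVAL '2-3' YEAR TO MONTH
        System.out.println(getYears(interval) + " years " + getMonths(interval) + " months");
    }
}
```

This also shows why the generic parameter `T` in `ExtractIntervalPart[T]` matters: the same extraction pattern applies whether the input is a `CalendarInterval` object or a bare `Int`/`Long`.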
[GitHub] [spark] Ngone51 commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
Ngone51 commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827288510 Yes, the screenshot is from a test; it's there for convenience. But the stage failure is shown to users directly, right? So I consider it a user-facing change. Attaching the UI changes sounds like a better idea, since those are surely user-facing. I'll try.
[GitHub] [spark] SparkQA commented on pull request #32314: [SPARK-35169][SQL] Fix wrong result of min ANSI interval division by -1
SparkQA commented on pull request #32314: URL: https://github.com/apache/spark/pull/32314#issuecomment-827287522 **[Test build #137980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137980/testReport)** for PR 32314 at commit [`8363249`](https://github.com/apache/spark/commit/83632491d6f9d016186075429bfcdb438e3e5f80).
[GitHub] [spark] SparkQA commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
SparkQA commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827287500 **[Test build #137979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137979/testReport)** for PR 32351 at commit [`e4bc0f6`](https://github.com/apache/spark/commit/e4bc0f625d10410750cd9bed7e01e28a3649023b).
[GitHub] [spark] SparkQA commented on pull request #32355: [SPARK-35221][SQL] Add join hint build side check
SparkQA commented on pull request #32355: URL: https://github.com/apache/spark/pull/32355#issuecomment-827287492 **[Test build #137978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137978/testReport)** for PR 32355 at commit [`97029f2`](https://github.com/apache/spark/commit/97029f2765df9694c0e698c5addf3059c2c926d7).
[GitHub] [spark] SparkQA commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
SparkQA commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827287450 **[Test build #137977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137977/testReport)** for PR 32356 at commit [`0896255`](https://github.com/apache/spark/commit/0896255f43c622e096cab6b0eee5bc4f715abc40).
[GitHub] [spark] attilapiros commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
attilapiros commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827287106 @Ngone51 are the images you attached to the PR description the logs of a running test? If yes, that would be not a user-facing change but a developer-facing one :) But I think these messages land on the UI, and those should be attached to the PR description.
[GitHub] [spark] Ngone51 commented on a change in pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
Ngone51 commented on a change in pull request #32356: URL: https://github.com/apache/spark/pull/32356#discussion_r620826226

## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala

```diff
@@ -1951,11 +1950,8 @@ private[spark] class DAGScheduler(
         "Barrier stage will not retry stage due to testing config. Most recent failure " +
           s"reason: $message"
       } else {
-        s"""$failedStage (${failedStage.name})
-           |has failed the maximum allowable number of
-           |times: $maxConsecutiveStageAttempts.
-           |Most recent failure reason: $message
-        """.stripMargin.replaceAll("\n", " ")
+        s"$failedStage (${failedStage.name}) has failed the maximum allowable number of " +
+          s"times: $maxConsecutiveStageAttempts. Most recent failure reason: $message"
```

Review comment: Note that we don't append `\n` for this "Most recent failure reason" because `message` already contains it:

```scala
val message = s"Stage failed because barrier task $task finished unsuccessfully.\n" +
  failure.toErrorString
```
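The bug the diff above fixes is easy to reproduce: flattening the whole template with `replaceAll("\n", " ")` also flattens the newlines that the interpolated error string itself carries, destroying its stack-trace formatting. A minimal sketch (the strings are illustrative, not Spark's actual messages):

```java
public class FlattenDemo {
    public static void main(String[] args) {
        // An error string that carries its own internal formatting.
        String message = "Stage failed because barrier task finished unsuccessfully.\n"
            + "Caused by: java.lang.RuntimeException: boom";

        // Old approach: flattening the assembled template also eats the
        // newlines inside the interpolated message.
        String flattened = ("Stage X has failed.\nMost recent failure reason: " + message)
            .replaceAll("\n", " ");

        // Fixed approach: build only the prefix as a single line and append
        // the message untouched, preserving its internal formatting.
        String preserved = "Stage X has failed. Most recent failure reason: " + message;

        System.out.println(flattened.contains("\n"));  // false
        System.out.println(preserved.contains("\n"));  // true
    }
}
```

The same reasoning explains the reviewer's follow-up about the other `replaceAll` call site: any template that interpolates a pre-formatted message must not be flattened after interpolation.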
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
AmplabJenkins removed a comment on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-827286546 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137973/
[GitHub] [spark] AmplabJenkins commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
AmplabJenkins commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827286547 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42495/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32354: [SPARK-35232][SQL] Nested column pruning should retain column metadata
AmplabJenkins removed a comment on pull request #32354: URL: https://github.com/apache/spark/pull/32354#issuecomment-827286545 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137972/
[GitHub] [spark] AmplabJenkins commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
AmplabJenkins commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-827286546 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137973/
[GitHub] [spark] AmplabJenkins commented on pull request #32354: [SPARK-35232][SQL] Nested column pruning should retain column metadata
AmplabJenkins commented on pull request #32354: URL: https://github.com/apache/spark/pull/32354#issuecomment-827286545 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137972/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
AmplabJenkins removed a comment on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827269014
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format
AmplabJenkins removed a comment on pull request #31847: URL: https://github.com/apache/spark/pull/31847#issuecomment-827286544 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42496/
[GitHub] [spark] AmplabJenkins commented on pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format
AmplabJenkins commented on pull request #31847: URL: https://github.com/apache/spark/pull/31847#issuecomment-827286544 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42496/
[GitHub] [spark] cloud-fan commented on a change in pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
cloud-fan commented on a change in pull request #32351: URL: https://github.com/apache/spark/pull/32351#discussion_r620824798

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

```diff
@@ -29,67 +29,105 @@ import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval
 
-abstract class ExtractIntervalPart(
-    child: Expression,
+abstract class ExtractIntervalPart[T](
     val dataType: DataType,
-    func: CalendarInterval => Any,
-    funcName: String)
-  extends UnaryExpression with ExpectsInputTypes with NullIntolerant with Serializable {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(CalendarIntervalType)
-
-  override protected def nullSafeEval(interval: Any): Any = {
-    func(interval.asInstanceOf[CalendarInterval])
-  }
-
+    func: T => Any,
+    funcName: String) extends UnaryExpression with NullIntolerant with Serializable {
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     val iu = IntervalUtils.getClass.getName.stripSuffix("$")
     defineCodeGen(ctx, ev, c => s"$iu.$funcName($c)")
   }
+
+  override protected def nullSafeEval(interval: Any): Any = {
+    func(interval.asInstanceOf[T])
+  }
 }
 
 case class ExtractIntervalYears(child: Expression)
-  extends ExtractIntervalPart(child, IntegerType, getYears, "getYears") {
+  extends ExtractIntervalPart[CalendarInterval](IntegerType, getYears, "getYears") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalYears =
     copy(child = newChild)
 }
 
 case class ExtractIntervalMonths(child: Expression)
-  extends ExtractIntervalPart(child, ByteType, getMonths, "getMonths") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getMonths, "getMonths") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalMonths =
     copy(child = newChild)
 }
 
 case class ExtractIntervalDays(child: Expression)
-  extends ExtractIntervalPart(child, IntegerType, getDays, "getDays") {
+  extends ExtractIntervalPart[CalendarInterval](IntegerType, getDays, "getDays") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalDays =
     copy(child = newChild)
 }
 
 case class ExtractIntervalHours(child: Expression)
-  extends ExtractIntervalPart(child, LongType, getHours, "getHours") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getHours, "getHours") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalHours =
     copy(child = newChild)
 }
 
 case class ExtractIntervalMinutes(child: Expression)
-  extends ExtractIntervalPart(child, ByteType, getMinutes, "getMinutes") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getMinutes, "getMinutes") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalMinutes =
     copy(child = newChild)
 }
 
 case class ExtractIntervalSeconds(child: Expression)
-  extends ExtractIntervalPart(child, DecimalType(8, 6), getSeconds, "getSeconds") {
+  extends ExtractIntervalPart[CalendarInterval](DecimalType(8, 6), getSeconds, "getSeconds") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalSeconds =
     copy(child = newChild)
 }
 
+case class ExtractANSIIntervalYears(child: Expression)
+    extends ExtractIntervalPart[Int](IntegerType, getYears, "getYears") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalYears =
+    copy(child = newChild)
+}
+
+case class ExtractANSIIntervalMonths(child: Expression)
+    extends ExtractIntervalPart[Int](ByteType, getMonths, "getMonths") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalMonths =
+    copy(child = newChild)
+}
+
+case class ExtractANSIIntervalDays(child: Expression)
+    extends ExtractIntervalPart[Long](IntegerType, getDays, "getDays") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalDays = {
+    copy(child = newChild)
+  }
+}
+
+case class ExtractANSIIntervalHours(child: Expression)
+    extends ExtractIntervalPart[Long](ByteType, getHours, "getHours") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalHours =
+    copy(child = newChild)
+}
+
+case class ExtractANSIIntervalMinutes(child: Expression)
+    extends ExtractIntervalPart[Long](ByteType, getMinutes, "getMinutes") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalMinutes =
+    copy(child = newChild)
+}
+
+case class ExtractANSIIntervalSeconds(child: Expression)
+    extends ExtractIntervalPart[Long](DecimalType(8, 6), getSeconds, "getSeconds") {
+  override protected def withNewChildInternal(newChild: Expression): ExtractANSIIntervalSeconds =
+    copy(child = newChild)
+}
```
[GitHub] [spark] Ngone51 commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
Ngone51 commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827285607 SGTM
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #32314: [SPARK-35169][SQL] Fix wrong result of min ANSI interval division by -1
AngersZhuuuu commented on a change in pull request #32314: URL: https://github.com/apache/spark/pull/32314#discussion_r620824625

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

```diff
@@ -418,20 +430,31 @@ case class DivideYMInterval(
   }
 
   override def nullSafeEval(interval: Any, num: Any): Any = {
+    checkDivideOverflow(interval.asInstanceOf[Int], Int.MinValue, right, num)
     evalFunc(interval.asInstanceOf[Int], num)
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = right.dataType match {
-    case LongType =>
-      val math = classOf[LongMath].getName
+    case t: IntegralType =>
+      val math = t match {
+        case LongType => classOf[LongMath].getName
+        case _ => classOf[IntMath].getName
+      }
       val javaType = CodeGenerator.javaType(dataType)
-      defineCodeGen(ctx, ev, (m, n) =>
+      val micros = left.genCode(ctx)
```

Review comment: done

The other two review comments quote the same hunk, extended through the generated overflow check:

```diff
+      val num = right.genCode(ctx)
+      val checkIntegralDivideOverflow =
+        s"""
+           |if (${micros.value} == ${Int.MinValue}L && ${num.value} == -1)
```

Review comment: done

Review comment: done
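The explicit `checkDivideOverflow` guard is needed because JVM integer division does not raise on its one overflowing case: `Integer.MIN_VALUE / -1` silently wraps back to `Integer.MIN_VALUE` (JLS §15.17.2). A minimal sketch of the guard (the method name and signature are illustrative, not Spark's exact helper):

```java
public class DivideOverflowCheck {
    // Division that throws on the single int case where a / b overflows,
    // in the spirit of the PR's checkDivideOverflow.
    static int divideExact(int value, int divisor) {
        if (value == Integer.MIN_VALUE && divisor == -1) {
            throw new ArithmeticException("integer overflow");
        }
        return value / divisor;
    }

    public static void main(String[] args) {
        // JVM int division wraps silently on this corner case:
        System.out.println(Integer.MIN_VALUE / -1 == Integer.MIN_VALUE); // true
        try {
            divideExact(Integer.MIN_VALUE, -1);
        } catch (ArithmeticException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Without the guard, dividing the minimum representable year-month interval by -1 would return the same negative interval instead of failing, which is exactly the wrong result this PR fixes.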
[GitHub] [spark] SparkQA commented on pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format
SparkQA commented on pull request #31847: URL: https://github.com/apache/spark/pull/31847#issuecomment-827285490
[GitHub] [spark] SparkQA commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
SparkQA commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827284834 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42495/
[GitHub] [spark] attilapiros edited a comment on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
attilapiros edited a comment on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827284117 There is another case of using `replaceAll` on the message: https://github.com/apache/spark/blob/78caf0a53e60d81e6211faadfc45fede5ef9c941/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1957 What about fixing that one too?
[GitHub] [spark] attilapiros commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
attilapiros commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827284117 There is another case of using `replaceAll` on the message: https://github.com/apache/spark/blob/78caf0a53e60d81e6211faadfc45fede5ef9c941/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1957 What about fixing that one, too?
[GitHub] [spark] SparkQA removed a comment on pull request #32354: [SPARK-35232][SQL] Nested column pruning should retain column metadata
SparkQA removed a comment on pull request #32354: URL: https://github.com/apache/spark/pull/32354#issuecomment-827183325 **[Test build #137972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137972/testReport)** for PR 32354 at commit [`f90d882`](https://github.com/apache/spark/commit/f90d8822ccad024f9b95356736b0e83e4a3a06df).
[GitHub] [spark] SparkQA commented on pull request #32354: [SPARK-35232][SQL] Nested column pruning should retain column metadata
SparkQA commented on pull request #32354: URL: https://github.com/apache/spark/pull/32354#issuecomment-827283692 **[Test build #137972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137972/testReport)** for PR 32354 at commit [`f90d882`](https://github.com/apache/spark/commit/f90d8822ccad024f9b95356736b0e83e4a3a06df).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
SparkQA commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827283421 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42495/
[GitHub] [spark] cloud-fan commented on a change in pull request #32314: [SPARK-35169][SQL] Fix wrong result of min ANSI interval division by -1
cloud-fan commented on a change in pull request #32314: URL: https://github.com/apache/spark/pull/32314#discussion_r620822133

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

@@ -418,20 +430,31 @@ case class DivideYMInterval(
   }

   override def nullSafeEval(interval: Any, num: Any): Any = {
+    checkDivideOverflow(interval.asInstanceOf[Int], Int.MinValue, right, num)
     evalFunc(interval.asInstanceOf[Int], num)
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = right.dataType match {
-    case LongType =>
-      val math = classOf[LongMath].getName
+    case t: IntegralType =>
+      val math = t match {
+        case LongType => classOf[LongMath].getName
+        case _ => classOf[IntMath].getName
+      }
       val javaType = CodeGenerator.javaType(dataType)
-      defineCodeGen(ctx, ev, (m, n) =>
+      val micros = left.genCode(ctx)
+      val num = right.genCode(ctx)
+      val checkIntegralDivideOverflow =
+        s"""
+           |if (${micros.value} == ${Int.MinValue}L && ${num.value} == -1)

Review comment: nit: no `L` after `${Int.MinValue}`
[GitHub] [spark] cloud-fan commented on a change in pull request #32314: [SPARK-35169][SQL] Fix wrong result of min ANSI interval division by -1
cloud-fan commented on a change in pull request #32314: URL: https://github.com/apache/spark/pull/32314#discussion_r620821986

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

@@ -418,20 +430,31 @@ case class DivideYMInterval(
   }

   override def nullSafeEval(interval: Any, num: Any): Any = {
+    checkDivideOverflow(interval.asInstanceOf[Int], Int.MinValue, right, num)
     evalFunc(interval.asInstanceOf[Int], num)
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = right.dataType match {
-    case LongType =>
-      val math = classOf[LongMath].getName
+    case t: IntegralType =>
+      val math = t match {
+        case LongType => classOf[LongMath].getName
+        case _ => classOf[IntMath].getName
+      }
       val javaType = CodeGenerator.javaType(dataType)
-      defineCodeGen(ctx, ev, (m, n) =>
+      val micros = left.genCode(ctx)

Review comment: not `micros`, but `months`
[GitHub] [spark] cloud-fan commented on a change in pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
cloud-fan commented on a change in pull request #32032: URL: https://github.com/apache/spark/pull/32032#discussion_r620819712

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala

@@ -48,29 +48,41 @@ import org.apache.spark.sql.util.SchemaUtils
  * @param properties the properties of this view.
  * @param originalText the original SQL text of this view, can be None if this view is created via
  *                     Dataset API.
- * @param child the logical plan that represents the view; this is used to generate the logical
- *              plan for temporary view and the view schema.
+ * @param plan the logical plan that represents the view; this is used to generate the logical
+ *             plan for temporary view and the view schema.
  * @param allowExisting if true, and if the view already exists, noop; if false, and if the view
  *                      already exists, throws analysis exception.
  * @param replace if true, and if the view already exists, updates it; if false, and if the view
  *                already exists, throws analysis exception.
  * @param viewType the expected view type to be created with this command.
+ * @param isAnalyzed whether this command is analyzed or not.
  */
 case class CreateViewCommand(
     name: TableIdentifier,
     userSpecifiedColumns: Seq[(String, Option[String])],
     comment: Option[String],
     properties: Map[String, String],
     originalText: Option[String],
-    child: LogicalPlan,
+    plan: LogicalPlan,
     allowExisting: Boolean,
     replace: Boolean,
-    viewType: ViewType)
-  extends LeafRunnableCommand {
+    viewType: ViewType,
+    isAnalyzed: Boolean = false) extends RunnableCommand with AnalysisOnlyCommand {

   import ViewHelper._

-  override def innerChildren: Seq[QueryPlan[_]] = Seq(child)
+  override protected def withNewChildrenInternal(
+      newChildren: IndexedSeq[LogicalPlan]): CreateViewCommand = {
+    assert(!isAnalyzed)
+    copy(plan = newChildren.head)
+  }
+
+  override def innerChildren: Seq[QueryPlan[_]] = Seq(plan)

Review comment: but the "Parsed Logical Plan" should be unresolved, and the `plan` is in both `children` and `innerChildren`.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #32314: [SPARK-35169][SQL] Fix wrong result of min ANSI interval division by -1
AngersZh commented on a change in pull request #32314: URL: https://github.com/apache/spark/pull/32314#discussion_r620819191

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

@@ -402,6 +403,17 @@ case class DivideYMInterval(
   override def inputTypes: Seq[AbstractDataType] = Seq(YearMonthIntervalType, NumericType)
   override def dataType: DataType = YearMonthIntervalType

+  def checkDivideOverflow(month: Int, num: Any): Unit = {
+    if (month == Int.MinValue) {

Review comment: Done

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

@@ -418,20 +430,42 @@ case class DivideYMInterval(
   }

   override def nullSafeEval(interval: Any, num: Any): Any = {
+    checkDivideOverflow(interval.asInstanceOf[Int], num)
     evalFunc(interval.asInstanceOf[Int], num)
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = right.dataType match {
     case LongType =>
       val math = classOf[LongMath].getName
       val javaType = CodeGenerator.javaType(dataType)
-      defineCodeGen(ctx, ev, (m, n) =>
+      val micros = left.genCode(ctx)
+      val num = right.genCode(ctx)
+      val checkIntegralDivideOverflow =
+        s"""
+           |if (${micros.value} == ${Int.MinValue}L && ${num.value} == -1L)
+           |  throw QueryExecutionErrors.overflowInIntegralDivideError();
+           |""".stripMargin
+      nullSafeCodeGen(ctx, ev, (m, n) =>
         // Similarly to non-codegen code. The result of `divide(Int, Long, ...)` must fit to `Int`.
         // Casting to `Int` is safe here.
-        s"($javaType)($math.divide($m, $n, java.math.RoundingMode.HALF_UP))")
+        s"""
+           |$checkIntegralDivideOverflow
+           |${ev.value} = ($javaType)($math.divide($m, $n, java.math.RoundingMode.HALF_UP));
+        """.stripMargin)
     case _: IntegralType =>
       val math = classOf[IntMath].getName
-      defineCodeGen(ctx, ev, (m, n) => s"$math.divide($m, $n, java.math.RoundingMode.HALF_UP)")
+      val micros = left.genCode(ctx)
+      val num = right.genCode(ctx)
+      val checkIntegralDivideOverflow =
+        s"""
+           |if (${micros.value} == ${Int.MinValue}L && ${num.value} == -1L)

Review comment: Done

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

@@ -418,20 +430,42 @@ (same hunk as above; this comment is anchored at `val math = classOf[IntMath].getName`)

Review comment: Done
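The review thread above is about guarding `DivideYMInterval` against the single overflowing case of integral division: `Int.MinValue / -1`, whose mathematical result does not fit in an `Int`. On the JVM this division overflows silently instead of throwing, which is why the PR adds an explicit check before dividing. A minimal Java sketch of the idea; the `divideMonths` helper and its message are illustrative, not Spark's actual API:

```java
public class IntervalDivideOverflowDemo {

    // Hypothetical standalone version of the PR's overflow check: dividing the
    // months field by -1 overflows only for Int.MinValue, because -Int.MinValue
    // is not representable as a 32-bit int.
    static int divideMonths(int months, int num) {
        if (months == Integer.MIN_VALUE && num == -1) {
            throw new ArithmeticException("Overflow in integral divide.");
        }
        return months / num; // Spark actually divides via Guava with HALF_UP rounding
    }

    public static void main(String[] args) {
        // Plain JVM division overflows silently: MIN_VALUE / -1 == MIN_VALUE.
        System.out.println(Integer.MIN_VALUE / -1 == Integer.MIN_VALUE); // true

        System.out.println(divideMonths(24, -2)); // -12
        try {
            divideMonths(Integer.MIN_VALUE, -1);
        } catch (ArithmeticException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The same check appears twice in the PR: once in `nullSafeEval` for interpreted evaluation and once as a generated Java snippet in `doGenCode`, so both execution paths fail consistently.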
[GitHub] [spark] SparkQA removed a comment on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
SparkQA removed a comment on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-827183412 **[Test build #137973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137973/testReport)** for PR 32303 at commit [`af0c3b0`](https://github.com/apache/spark/commit/af0c3b0bb4647ce50700151f6c04089c34f915d9).
[GitHub] [spark] SparkQA commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
SparkQA commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-827278662

**[Test build #137973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137973/testReport)** for PR 32303 at commit [`af0c3b0`](https://github.com/apache/spark/commit/af0c3b0bb4647ce50700151f6c04089c34f915d9).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
Ngone51 commented on pull request #32356: URL: https://github.com/apache/spark/pull/32356#issuecomment-827278383 cc @mridulm @tgravescs @attilapiros
[GitHub] [spark] Ngone51 opened a new pull request #32356: [SPARK-35234][CORE] Reserve the format of stage failureMessage
Ngone51 opened a new pull request #32356: URL: https://github.com/apache/spark/pull/32356

### What changes were proposed in this pull request?
`failureMessage` is already formatted, but `replaceAll("\n", " ")` destroyed the format. This PR fixes it.

### Why are the changes needed?
The formatted error message is easier to read and debug.

### Does this PR introduce _any_ user-facing change?
Yes, users now see a properly formatted error message for stage failures.
Before: ![2141619490903_ pic_hd](https://user-images.githubusercontent.com/16397174/116177970-5a092f00-a747-11eb-9a0f-017391e80c8b.jpg)
After: ![2151619490955_ pic_hd](https://user-images.githubusercontent.com/16397174/116177981-5ecde300-a747-11eb-90ef-fd16e906beeb.jpg)

### How was this patch tested?
Manually tested.
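The one-line fix in this PR is removing a `replaceAll("\n", " ")` applied to an already formatted message. A small self-contained Java illustration of why that call destroys the formatting; the message text below is made up:

```java
public class FailureMessageDemo {

    // Flattening as done before the fix: every newline in the already
    // formatted failure message is replaced by a single space.
    static String flatten(String failureMessage) {
        return failureMessage.replaceAll("\n", " ");
    }

    public static void main(String[] args) {
        // A stage failure message carries line breaks and indentation that
        // mirror the structure of the underlying stack trace.
        String failureMessage =
            "Job aborted due to stage failure: task 0 failed\n"
          + "  at org.example.SomeUdf.apply(SomeUdf.java:10)\n"
          + "  Caused by: java.lang.ArithmeticException: / by zero";

        // Before the fix: one long line, the trace structure is gone.
        System.out.println(flatten(failureMessage));

        // After the fix: the message is logged as-is, keeping its line breaks.
        System.out.println(failureMessage);
    }
}
```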
[GitHub] [spark] ulysses-you opened a new pull request #32355: [SPARK-35221][SQL] Add join hint build side check
ulysses-you opened a new pull request #32355: URL: https://github.com/apache/spark/pull/32355

### What changes were proposed in this pull request?
Print a warning message if a join hint is not supported for the specified build side.

### Why are the changes needed?
Currently users can specify the join implementation with a hint, but Spark does not promise to honor it. For example, for broadcast outer join and hash outer join we need to check whether the requested build side is supported. At the least, Spark should print a warning log instead of silently switching to another join implementation.

### Does this PR introduce _any_ user-facing change?
Yes, a warning log might be printed.

### How was this patch tested?
Added a new test.
[GitHub] [spark] SparkQA removed a comment on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
SparkQA removed a comment on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827268063 **[Test build #137975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137975/testReport)** for PR 32351 at commit [`db74496`](https://github.com/apache/spark/commit/db744963bde1ac6ceae6ba9d42f961654d7d5613).
[GitHub] [spark] tiehexue commented on pull request #32341: [SPARK-35212][Spark Core][DStreams] Another way to added PreferRandom for the scenario that topic partitions need to be randomly distributed
tiehexue commented on pull request #32341: URL: https://github.com/apache/spark/pull/32341#issuecomment-827273196

> @tiehexue can you keep the PR description template https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE?

Just updated.
[GitHub] [spark] AmplabJenkins commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
AmplabJenkins commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827269014 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137975/
[GitHub] [spark] SparkQA commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
SparkQA commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827268999

**[Test build #137975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137975/testReport)** for PR 32351 at commit [`db74496`](https://github.com/apache/spark/commit/db744963bde1ac6ceae6ba9d42f961654d7d5613).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ExtractANSIIntervalYears(child: Expression)`
  * `case class ExtractANSIIntervalMonths(child: Expression)`
  * `case class ExtractANSIIntervalDays(child: Expression)`
  * `case class ExtractANSIIntervalHours(child: Expression)`
  * `case class ExtractANSIIntervalMinutes(child: Expression)`
  * `case class ExtractANSIIntervalSeconds(child: Expression)`
[GitHub] [spark] SparkQA commented on pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format
SparkQA commented on pull request #31847: URL: https://github.com/apache/spark/pull/31847#issuecomment-827268284 **[Test build #137976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137976/testReport)** for PR 31847 at commit [`a95dcc7`](https://github.com/apache/spark/commit/a95dcc79da59296596a91a1e72c59aa23f2d).
[GitHub] [spark] SparkQA commented on pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
SparkQA commented on pull request #32351: URL: https://github.com/apache/spark/pull/32351#issuecomment-827268063 **[Test build #137975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137975/testReport)** for PR 32351 at commit [`db74496`](https://github.com/apache/spark/commit/db744963bde1ac6ceae6ba9d42f961654d7d5613).
[GitHub] [spark] AmplabJenkins commented on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
AmplabJenkins commented on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827267133 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42494/
[GitHub] [spark] SparkQA commented on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
SparkQA commented on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827264569 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42494/
[GitHub] [spark] SparkQA commented on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
SparkQA commented on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827263119 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42494/
[GitHub] [spark] HeartSaVioR commented on pull request #32272: [SPARK-35172][SS] The implementation of RocksDBCheckpointMetadata
HeartSaVioR commented on pull request #32272: URL: https://github.com/apache/spark/pull/32272#issuecomment-827262787 Sorry for getting to this so late. I just went through the design doc and left some comments. It would be nice if we could resolve the comments on the design doc and reflect them in the current/following PRs. Thanks!
[GitHub] [spark] HyukjinKwon commented on pull request #32350: [SPARK-35231][SQL] logical.Range override maxRowsPerPartition
HyukjinKwon commented on pull request #32350: URL: https://github.com/apache/spark/pull/32350#issuecomment-827261427 cc @wangyum FYI
[GitHub] [spark] HyukjinKwon commented on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
HyukjinKwon commented on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827259111 cc @gaborgsomogyi FYI
[GitHub] [spark] HyukjinKwon closed pull request #32342: [SPARK-35225][SQL] EXPLAIN command should handle empty output of analyzed plan.
HyukjinKwon closed pull request #32342: URL: https://github.com/apache/spark/pull/32342
[GitHub] [spark] HyukjinKwon commented on pull request #32342: [SPARK-35225][SQL] EXPLAIN command should handle empty output of analyzed plan.
HyukjinKwon commented on pull request #32342: URL: https://github.com/apache/spark/pull/32342#issuecomment-827258686 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #32341: [SPARK-35212][Spark Core][DStreams] Another way to added PreferRandom for the scenario that topic partitions need to be randomly distributed
HyukjinKwon commented on pull request #32341: URL: https://github.com/apache/spark/pull/32341#issuecomment-827258216 @tiehexue can you keep the PR description template https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE?
[GitHub] [spark] yaooqinn commented on a change in pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
yaooqinn commented on a change in pull request #32351: URL: https://github.com/apache/spark/pull/32351#discussion_r620791680

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

@@ -98,6 +130,19 @@ object ExtractIntervalPart {
     case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => ExtractIntervalSeconds(source)
     case _ => errorHandleFunc
   }
+
+  def parseExtractFieldANSI(

Review comment: OK
[GitHub] [spark] yaooqinn commented on a change in pull request #32351: [SPARK-35091][SPARK-35090][SQL] Support extract from ANSI Intervals
yaooqinn commented on a change in pull request #32351: URL: https://github.com/apache/spark/pull/32351#discussion_r620791633

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala

@@ -29,61 +29,93 @@
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval

-abstract class ExtractIntervalPart(
-    child: Expression,
+abstract class ExtractIntervalPart[T](
     val dataType: DataType,
-    func: CalendarInterval => Any,
-    funcName: String)
-  extends UnaryExpression with ExpectsInputTypes with NullIntolerant with Serializable {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(CalendarIntervalType)
-
-  override protected def nullSafeEval(interval: Any): Any = {
-    func(interval.asInstanceOf[CalendarInterval])
-  }
-
+    func: T => Any,
+    funcName: String) extends UnaryExpression with NullIntolerant with Serializable {
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     val iu = IntervalUtils.getClass.getName.stripSuffix("$")
     defineCodeGen(ctx, ev, c => s"$iu.$funcName($c)")
   }
+
+  override protected def nullSafeEval(interval: Any): Any = {
+    func(interval.asInstanceOf[T])
+  }
 }

 case class ExtractIntervalYears(child: Expression)
-  extends ExtractIntervalPart(child, IntegerType, getYears, "getYears") {
+  extends ExtractIntervalPart[CalendarInterval](IntegerType, getYears, "getYears") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalYears =
     copy(child = newChild)
 }

 case class ExtractIntervalMonths(child: Expression)
-  extends ExtractIntervalPart(child, ByteType, getMonths, "getMonths") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getMonths, "getMonths") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalMonths =
     copy(child = newChild)
 }

 case class ExtractIntervalDays(child: Expression)
-  extends ExtractIntervalPart(child, IntegerType, getDays, "getDays") {
+  extends ExtractIntervalPart[CalendarInterval](IntegerType, getDays, "getDays") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalDays =
     copy(child = newChild)
 }

 case class ExtractIntervalHours(child: Expression)
-  extends ExtractIntervalPart(child, LongType, getHours, "getHours") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getHours, "getHours") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalHours =
     copy(child = newChild)
 }

 case class ExtractIntervalMinutes(child: Expression)
-  extends ExtractIntervalPart(child, ByteType, getMinutes, "getMinutes") {
+  extends ExtractIntervalPart[CalendarInterval](ByteType, getMinutes, "getMinutes") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalMinutes =
     copy(child = newChild)
 }

 case class ExtractIntervalSeconds(child: Expression)
-  extends ExtractIntervalPart(child, DecimalType(8, 6), getSeconds, "getSeconds") {
+  extends ExtractIntervalPart[CalendarInterval](DecimalType(8, 6), getSeconds, "getSeconds") {
   override protected def withNewChildInternal(newChild: Expression): ExtractIntervalSeconds =
     copy(child = newChild)
 }

+case class YearsOfYMInterval(child: Expression)

Review comment: make sense, updated
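For context on what these extraction expressions compute: Spark stores a year-month interval as a single `int` counting months and a day-time interval as a single `long` counting microseconds, so extracting a field is plain modular arithmetic. A rough Java sketch of that arithmetic; the helper names are illustrative, not Spark's:

```java
public class IntervalExtractDemo {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND;
    static final long MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE;
    static final long MICROS_PER_DAY = 24 * MICROS_PER_HOUR;

    // Year-month interval: a single int of months.
    static int yearsOf(int months) { return months / 12; }
    static int monthsOf(int months) { return months % 12; }

    // Day-time interval: a single long of microseconds.
    static long daysOf(long micros) { return micros / MICROS_PER_DAY; }
    static long hoursOf(long micros) { return (micros % MICROS_PER_DAY) / MICROS_PER_HOUR; }
    static long minutesOf(long micros) { return (micros % MICROS_PER_HOUR) / MICROS_PER_MINUTE; }

    public static void main(String[] args) {
        int ym = 2 * 12 + 5; // roughly INTERVAL '2-5' YEAR TO MONTH
        System.out.println(yearsOf(ym) + " " + monthsOf(ym)); // 2 5

        long dt = 3 * MICROS_PER_DAY + 7 * MICROS_PER_HOUR + 30 * MICROS_PER_MINUTE;
        System.out.println(daysOf(dt) + " " + hoursOf(dt) + " " + minutesOf(dt)); // 3 7 30
    }
}
```

This also explains the narrow result types in the diff above: a field like hours or minutes is bounded by the modulus, so it fits in a small integral type regardless of the interval's total magnitude.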
[GitHub] [spark] HyukjinKwon commented on pull request #32346: [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
HyukjinKwon commented on pull request #32346: URL: https://github.com/apache/spark/pull/32346#issuecomment-827256231 Merged to master, branch-3.1, branch-3.0 and branch-2.4.
[GitHub] [spark] HyukjinKwon closed pull request #32346: [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
HyukjinKwon closed pull request #32346: URL: https://github.com/apache/spark/pull/32346
[GitHub] [spark] beliefer commented on pull request #27237: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
beliefer commented on pull request #27237: URL: https://github.com/apache/spark/pull/27237#issuecomment-827254220 Currently, this work is suspended.
[GitHub] [spark] beliefer commented on pull request #32244: [SPARK-35060][SQL] Group exception messages in sql/types
beliefer commented on pull request #32244: URL: https://github.com/apache/spark/pull/32244#issuecomment-827253560 @allisonwang-db Thanks for your review. @cloud-fan Thanks for your work.
[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r620785987

## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala

```diff
@@ -613,6 +634,12 @@ final class ShuffleBlockFetcherIterator(
         }
         if (isNetworkReqDone) {
           reqsInFlight -= 1
+          if (!buf.isInstanceOf[NettyManagedBuffer]) {
```

Review comment:

> For me this means we do not necessarily have some extra free off-heap memory by this.

Yes, that's true. (So the original assumption here is that whenever there's a completed request, there would be some freed memory.) I have another idea here: https://github.com/apache/spark/pull/32287#discussion_r620779976 Could you also take a look?
[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r620782919

## File path: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala

```diff
@@ -48,15 +49,22 @@ class ShuffleBlockFetcherIteratorSuite extends SparkFunSuite with PrivateMethodT
   // in the presence of faults.

   /** Creates a mock [[BlockTransferService]] that returns data from the given map. */
-  private def createMockTransfer(data: Map[BlockId, ManagedBuffer]): BlockTransferService = {
+  private def createMockTransfer(
+      data: Map[BlockId, ManagedBuffer],
+      throwNettyOOM: Boolean = false): BlockTransferService = {
     val transfer = mock(classOf[BlockTransferService])
+    var hasThrowOOM = false
     when(transfer.fetchBlocks(any(), any(), any(), any(), any(), any())).thenAnswer(
       (invocation: InvocationOnMock) => {
         val blocks = invocation.getArguments()(3).asInstanceOf[Array[String]]
         val listener = invocation.getArguments()(4).asInstanceOf[BlockFetchingListener]
+        val OOMBlockIndex = new Random(System.currentTimeMillis()).nextInt(blocks.length)
```

Review comment: The randomness is intended to test both failure cases, "the last block of a request" and "a middle block of a request". I'll make it deterministic.
[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r620780932

## File path: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala

```diff
@@ -48,15 +49,22 @@ class ShuffleBlockFetcherIteratorSuite extends SparkFunSuite with PrivateMethodT
   // in the presence of faults.

   /** Creates a mock [[BlockTransferService]] that returns data from the given map. */
-  private def createMockTransfer(data: Map[BlockId, ManagedBuffer]): BlockTransferService = {
+  private def createMockTransfer(
+      data: Map[BlockId, ManagedBuffer],
+      throwNettyOOM: Boolean = false): BlockTransferService = {
     val transfer = mock(classOf[BlockTransferService])
+    var hasThrowOOM = false
     when(transfer.fetchBlocks(any(), any(), any(), any(), any(), any())).thenAnswer(
       (invocation: InvocationOnMock) => {
         val blocks = invocation.getArguments()(3).asInstanceOf[Array[String]]
         val listener = invocation.getArguments()(4).asInstanceOf[BlockFetchingListener]
+        val OOMBlockIndex = new Random(System.currentTimeMillis()).nextInt(blocks.length)

-        for (blockId <- blocks) {
-          if (data.contains(BlockId(blockId))) {
+        for ((blockId, i) <- blocks.zipWithIndex) {
+          if (throwNettyOOM && !hasThrowOOM && i == OOMBlockIndex) {
+            hasThrowOOM = true
+            listener.onBlockFetchFailure(blockId, new TestNettyOutOfMemoryError())
```

Review comment: This sounds good, I'll try it. Thanks!
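A self-contained sketch of the deterministic variant discussed here, with hypothetical types standing in for `BlockFetchingListener` and Netty's OOM error (the real suite drives this through a Mockito mock of `BlockTransferService`):

```scala
// Sketch: a fake fetchBlocks whose failure injection is deterministic. Exactly one
// block, chosen by oomBlockIndex, fails with a simulated Netty OOM, so both the
// "middle block" and "last block" failure cases can be covered by fixed indices.
object MockTransferSketch {
  // Stand-in for Netty's OutOfDirectMemoryError in this sketch.
  class SimulatedNettyOOM extends Error("simulated Netty OOM")

  def fetchBlocks(
      blocks: Seq[String],
      oomBlockIndex: Option[Int],
      onBlockFetchSuccess: String => Unit,
      onBlockFetchFailure: (String, Throwable) => Unit): Unit = {
    for ((blockId, i) <- blocks.zipWithIndex) {
      if (oomBlockIndex.contains(i)) {
        // Fail only the chosen block; every other block succeeds.
        onBlockFetchFailure(blockId, new SimulatedNettyOOM)
      } else {
        onBlockFetchSuccess(blockId)
      }
    }
  }
}
```

Passing `Some(blocks.length - 1)` exercises the "last block of a request" case, and any smaller index exercises the "middle block" case, without relying on a random seed.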
[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r620779976

## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala

```diff
@@ -613,6 +618,12 @@ final class ShuffleBlockFetcherIterator(
         }
         if (isNetworkReqDone) {
           reqsInFlight -= 1
+          if (!buf.isInstanceOf[NettyManagedBuffer]) {
+            // Non-`NettyManagedBuffer` doesn't occupy Netty's memory so we can unset the flag
+            // directly once the request succeeds. But for the `NettyManagedBuffer`, we'll only
+            // unset the flag when the data is fully consumed (see `BufferReleasingInputStream`).
+            NettyUtils.isNettyOOMOnShuffle = false
```

Review comment: How about this:

```scala
if (Netty.freeMemory > minRequestSize) {
  unset the flag
}
```

?

* `minRequestSize` is supposed to be the minimum request size of all the pending requests (or blocks?). I think this would mitigate the situation you mentioned.
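The guard proposed in the pseudocode above can be sketched as a runnable version. All names here are stand-ins: in the real code, the free-memory figure would come from Netty's allocator and the pending sizes from the iterator's bookkeeping:

```scala
// Sketch: only clear the Netty-OOM flag once enough direct memory is free to
// serve the smallest pending request, instead of clearing it on any completion.
object NettyOOMFlagSketch {
  @volatile var isNettyOOMOnShuffle = false

  def maybeUnsetOOMFlag(freeMemory: Long, pendingRequestSizes: Seq[Long]): Unit = {
    // With no pending requests there is nothing to block on, so any free memory clears it.
    val minRequestSize = if (pendingRequestSizes.isEmpty) 0L else pendingRequestSizes.min
    if (freeMemory > minRequestSize) {
      isNettyOOMOnShuffle = false
    }
  }
}
```

The design point is the one raised in the thread: clearing the flag on every completed request assumes that completion freed Netty memory, which does not hold for non-`NettyManagedBuffer` results, whereas this guard checks the actual headroom.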
[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r620777695

## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala

```diff
@@ -683,7 +694,28 @@ final class ShuffleBlockFetcherIterator(
           }
         }

-      case FailureFetchResult(blockId, mapIndex, address, e) =>
+      // Catching OOM and do something based on it is only a workaround for handling the
+      // Netty OOM issue, which is not the best way towards memory management. We can
+      // get rid of it when we find a way to manage Netty's memory precisely.
+      case FailureFetchResult(blockId, mapIndex, address, size, isNetworkReqDone, e)
+          if e.isInstanceOf[OutOfDirectMemoryError] || e.isInstanceOf[NettyOutOfMemoryError] =>
+        assert(address != blockManager.blockManagerId &&
+          !hostLocalBlocks.contains(blockId -> mapIndex),
+          "Netty OOM error should only happen on remote fetch requests")
+        logWarning(s"Failed to fetch block $blockId due to Netty OOM, will retry", e)
+        NettyUtils.isNettyOOMOnShuffle = true
+        numBlocksInFlightPerAddress(address) = numBlocksInFlightPerAddress(address) - 1
+        bytesInFlight -= size
+        if (isNetworkReqDone) {
+          reqsInFlight -= 1
+          logDebug("Number of requests in flight " + reqsInFlight)
+        }
+        val defReqQueue =
+          deferredFetchRequests.getOrElseUpdate(address, new Queue[FetchRequest]())
+        defReqQueue.enqueue(FetchRequest(address, Array(FetchBlockInfo(blockId, size, mapIndex
```

Review comment:

> When it is not a NettyManagedBuffer then can there still be an OutOfDirectMemoryError?

It's possible. In general, a remote request must rely on Netty to receive the data sent from the server, which of course consumes Netty memory. So it doesn't matter what the final stored type is, e.g., `NettyManagedBuffer` or `FileSegmentManagedBuffer`.
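The defer-on-OOM flow in the hunk above can be sketched with simplified bookkeeping. `FetchRequest` and the queue layout here are simplified stand-ins; the real code also decrements `bytesInFlight`, `reqsInFlight`, and the per-address in-flight counts:

```scala
import scala.collection.mutable

// Sketch: on a Netty OOM fetch failure, set a global flag and re-enqueue the
// failed block as a single-block deferred request for its address, instead of
// failing the whole fetch.
case class FetchRequest(address: String, blockIds: Seq[String])

class DeferOnOOMSketch {
  var isNettyOOMOnShuffle = false
  val deferredFetchRequests = mutable.HashMap.empty[String, mutable.Queue[FetchRequest]]

  def onFetchFailure(address: String, blockId: String, e: Throwable): Unit = e match {
    case _: OutOfMemoryError =>
      // Netty's OutOfDirectMemoryError is an OutOfMemoryError subclass; the real
      // guard also matches NettyOutOfMemoryError explicitly.
      isNettyOOMOnShuffle = true
      val queue = deferredFetchRequests.getOrElseUpdate(address, mutable.Queue.empty)
      queue.enqueue(FetchRequest(address, Seq(blockId)))
    case other =>
      throw other // non-OOM failures propagate as before
  }
}
```

The deferred queue is then drained once the OOM flag is cleared, which is what makes the OOM a retriable condition rather than a task failure.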
[GitHub] [spark] SparkQA commented on pull request #32344: [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
SparkQA commented on pull request #32344: URL: https://github.com/apache/spark/pull/32344#issuecomment-827242970 **[Test build #137974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137974/testReport)** for PR 32344 at commit [`5df750d`](https://github.com/apache/spark/commit/5df750dcc364ea866aca7b2b9e3e921bd470b5f0).