[GitHub] [spark] SparkQA removed a comment on pull request #30326: [MINOR][GRAPHX] Correct typos in the sub-modules: graphx, external, and examples
SparkQA removed a comment on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725261615 **[Test build #130916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130916/testReport)** for PR 30326 at commit [`4203606`](https://github.com/apache/spark/commit/4203606bf33ad903441ea2a8be81f9f9fcf997a2). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30326: [MINOR][GRAPHX] Correct typos in the sub-modules: graphx, external, and examples
AmplabJenkins removed a comment on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725268481 Merged build finished. Test FAILed.
[GitHub] [spark] cloud-fan commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
cloud-fan commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521179168 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -1919,9 +1920,14 @@ case class ArrayPosition(left: Expression, right: Expression) b """, since = "2.4.0") -case class ElementAt(left: Expression, right: Expression) +case class ElementAt( +left: Expression, +right: Expression, +failOnError: Boolean = SQLConf.get.ansiEnabled) Review comment: Making it a parameter is a more robust way to retain this info. Otherwise, we may change it when we transform and copy the expression. This also helps if we want to support a SAFE prefix like https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators
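The point in the review comment above can be illustrated with a minimal, standalone Scala sketch. It is not Spark's code: `FakeConf` is a hypothetical stand-in for `SQLConf.get.ansiEnabled`, and `ElementAtParam` / `ElementAtBodyVal` are made-up toy classes that only mimic the two designs under discussion (flag as a constructor parameter with a default versus flag as a `val` in the class body). It shows that `copy` carries a constructor parameter over unchanged, while a body `val` is evaluated again whenever a new instance is built.

```scala
// Hypothetical stand-in for SQLConf.get.ansiEnabled; not Spark's real API.
object FakeConf {
  @volatile var ansiEnabled: Boolean = false
}

// Variant A: the flag is a constructor parameter whose default reads the conf.
case class ElementAtParam(
    left: String,
    right: String,
    failOnError: Boolean = FakeConf.ansiEnabled)

// Variant B: the flag is a body val, evaluated again on every construction.
case class ElementAtBodyVal(left: String, right: String) {
  val failOnError: Boolean = FakeConf.ansiEnabled
}

object CopyDemo {
  // Returns (flag of variant A after copy, flag of variant B after copy).
  def run(): (Boolean, Boolean) = {
    FakeConf.ansiEnabled = true
    val a = ElementAtParam("arr", "idx")
    val b = ElementAtBodyVal("arr", "idx")

    // The conf changes between construction and a later copy.
    FakeConf.ansiEnabled = false
    val a2 = a.copy(right = "idx2") // failOnError is copied from `a`
    val b2 = b.copy(right = "idx2") // failOnError is read from the conf again
    (a2.failOnError, b2.failOnError)
  }

  def main(args: Array[String]): Unit = {
    val (paramFlag, bodyFlag) = run()
    println(s"constructor parameter after copy: $paramFlag") // true
    println(s"body val after copy: $bodyFlag")               // false
  }
}
```

In this toy setup the constructor-parameter variant keeps the value captured at construction time, which is the retention property the comment argues for; an explicit parameter would also leave room for a caller to pass a value other than the conf default.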
[GitHub] [spark] AmplabJenkins commented on pull request #30326: [MINOR][GRAPHX] Correct typos in the sub-modules: graphx, external, and examples
AmplabJenkins commented on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725268481
[GitHub] [spark] SparkQA commented on pull request #30326: [MINOR][GRAPHX] Correct typos in the sub-modules: graphx, external, and examples
SparkQA commented on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725268456 **[Test build #130916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130916/testReport)** for PR 30326 at commit [`4203606`](https://github.com/apache/spark/commit/4203606bf33ad903441ea2a8be81f9f9fcf997a2). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
AmplabJenkins removed a comment on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725265973 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130917/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
SparkQA removed a comment on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725264615 **[Test build #130917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130917/testReport)** for PR 30297 at commit [`a9312a0`](https://github.com/apache/spark/commit/a9312a0546e891266323423e007f65d78ce49ff4).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
AmplabJenkins removed a comment on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725265964 Merged build finished. Test FAILed.
[GitHub] [spark] viirya commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
viirya commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521175748 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -1919,9 +1920,14 @@ case class ArrayPosition(left: Expression, right: Expression) b """, since = "2.4.0") -case class ElementAt(left: Expression, right: Expression) +case class ElementAt( +left: Expression, +right: Expression, +failOnError: Boolean = SQLConf.get.ansiEnabled) Review comment: Why not just have `val failOnError: Boolean = SQLConf.get.ansiEnabled`? Do you need to assign it a value other than `SQLConf.get.ansiEnabled` when constructing?
[GitHub] [spark] SparkQA commented on pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
SparkQA commented on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725265949 **[Test build #130917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130917/testReport)** for PR 30297 at commit [`a9312a0`](https://github.com/apache/spark/commit/a9312a0546e891266323423e007f65d78ce49ff4). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
viirya commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521175748 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -1919,9 +1920,14 @@ case class ArrayPosition(left: Expression, right: Expression) b """, since = "2.4.0") -case class ElementAt(left: Expression, right: Expression) +case class ElementAt( +left: Expression, +right: Expression, +failOnError: Boolean = SQLConf.get.ansiEnabled) Review comment: Why not just have `val failOnError: Boolean = SQLConf.get.ansiEnabled`? Do you need to assign it a value other than `SQLConf.get.ansiEnabled` when constructing?
[GitHub] [spark] AmplabJenkins commented on pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
AmplabJenkins commented on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725265964
[GitHub] [spark] SparkQA commented on pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
SparkQA commented on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725264607
[GitHub] [spark] SparkQA commented on pull request #30309: [WIP][SPARK-33407][PYTHON] Simplify the exception message from Python UDFs
SparkQA commented on pull request #30309: URL: https://github.com/apache/spark/pull/30309#issuecomment-725264247 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35516/
[GitHub] [spark] SparkQA commented on pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
SparkQA commented on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725263945 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35515/
[GitHub] [spark] sarutak edited a comment on pull request #30292: [SPARK-33166][DOC] Provide Search Function in Spark docs site
sarutak edited a comment on pull request #30292: URL: https://github.com/apache/spark/pull/30292#issuecomment-725262955 > @sarutak Because this PR is not merged and not published to the online Spark documentation. Yes. But even if this is published, the search results consistently refer to the latest release at that point in time. Imagine that this feature is merged to 3.1 and then 3.2 is released in the future, and someone still uses Spark 3.1. In that case, even though the user uses this feature for 3.1, the search results refer to the documentation for 3.2, right?
[GitHub] [spark] sarutak edited a comment on pull request #30292: [SPARK-33166][DOC] Provide Search Function in Spark docs site
sarutak edited a comment on pull request #30292: URL: https://github.com/apache/spark/pull/30292#issuecomment-725262955 > @sarutak Because this PR is not merged and not published to the online Spark documentation. Yes. But even if this is published, the search results consistently refer to the latest release at that point in time. Imagine that this feature is merged to 3.1 and then 3.2 is released in the future, and someone still uses Spark 3.1. In that case, even though the user uses this feature for 3.1, one is taken to the documentation for 3.2, right?
[GitHub] [spark] sarutak edited a comment on pull request #30292: [SPARK-33166][DOC] Provide Search Function in Spark docs site
sarutak edited a comment on pull request #30292: URL: https://github.com/apache/spark/pull/30292#issuecomment-725262955 > @sarutak Because this PR is not merged and not published to the online Spark documentation. Yes. But even if this is published, the search results consistently refer to the latest release at that point in time. Imagine that this feature is merged to 3.1 and then 3.2 is released in the future, and someone still uses Spark 3.1. In that case, even though the user uses this feature for 3.1, one is taken to the documentation for 3.2, right?
[GitHub] [spark] sarutak edited a comment on pull request #30292: [SPARK-33166][DOC] Provide Search Function in Spark docs site
sarutak edited a comment on pull request #30292: URL: https://github.com/apache/spark/pull/30292#issuecomment-725262955 > @sarutak Because this PR is not merged and not published to the online Spark documentation. Yes. But even if this is published, the search results consistently refer to the latest release at that point in time. Imagine that this feature is merged to 3.1 and then 3.2 is released in the future, and someone still uses Spark 3.1. In that case, even though the user uses this feature for 3.1, one is taken to the documentation for 3.2, right?
[GitHub] [spark] leanken commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
leanken commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521172555 ## File path: docs/sql-ref-ansi-compliance.md ## @@ -111,6 +111,13 @@ SELECT * FROM t; The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`). - `size`: This function returns null for null input under ANSI mode. Review comment: OK
[GitHub] [spark] sarutak commented on pull request #30292: [SPARK-33166][DOC] Provide Search Function in Spark docs site
sarutak commented on pull request #30292: URL: https://github.com/apache/spark/pull/30292#issuecomment-725262955 > @sarutak Because this PR is not merged and not published to the online Spark documentation. Yes. But even if this is published, the search results consistently refer to the latest release at that point in time. Imagine that this feature is merged to 3.1 and then 3.2 is released in the future, and someone still uses Spark 3.1. In that case, even though the user uses this feature for 3.1, one is taken to the documentation for 3.2, right?
[GitHub] [spark] SparkQA commented on pull request #30326: [MINOR][GRAPHX] Correct typos
SparkQA commented on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725261615 **[Test build #130916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130916/testReport)** for PR 30326 at commit [`4203606`](https://github.com/apache/spark/commit/4203606bf33ad903441ea2a8be81f9f9fcf997a2).
[GitHub] [spark] maropu commented on pull request #30297: [SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound
maropu commented on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725261520 Looks fine otherwise.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30325: [SPARK-33416][SQL] Avoid Hive metastore stack overflow when InSet predicate have many values
AmplabJenkins removed a comment on pull request #30325: URL: https://github.com/apache/spark/pull/30325#issuecomment-725261010
[GitHub] [spark] li36909 commented on pull request #30248: [SPARK-33339][PYTHON] Pyspark application will hang due to non Exception error
li36909 commented on pull request #30248: URL: https://github.com/apache/spark/pull/30248#issuecomment-725260785 > It has a conflict in branch-2.4. It's sort of a corner case so I think we needn't bother porting it back. @li36909, please go ahead and open a PR to backport if you're willing to do so. OK, I will open a PR against branch-2.4, thank you!
[GitHub] [spark] SparkQA commented on pull request #30325: [SPARK-33416][SQL] Avoid Hive metastore stack overflow when InSet predicate have many values
SparkQA commented on pull request #30325: URL: https://github.com/apache/spark/pull/30325#issuecomment-725260995 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35513/
[GitHub] [spark] AmplabJenkins commented on pull request #30325: [SPARK-33416][SQL] Avoid Hive metastore stack overflow when InSet predicate have many values
AmplabJenkins commented on pull request #30325: URL: https://github.com/apache/spark/pull/30325#issuecomment-725261010
[GitHub] [spark] maropu commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
maropu commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521169867 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -231,15 +232,23 @@ case class ConcatWs(children: Seq[Expression]) */ // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(n, input1, input2, ...) - Returns the `n`-th input, e.g., returns `input2` when `n` is 2.", + usage = """ +_FUNC_(n, input1, input2, ...) - Returns the `n`-th input, e.g., returns `input2` when `n` is 2. +If the index exceeds the length of the array, Returns NULL if ANSI mode is off; +Throws ArrayIndexOutOfBoundsException when ANSI mode is on. Review comment: The same comment as in https://github.com/apache/spark/pull/30297#discussion_r521166711.
[GitHub] [spark] maropu commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
maropu commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521166711 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -1906,8 +1906,8 @@ case class ArrayPosition(left: Expression, right: Expression) @ExpressionDescription( usage = """ _FUNC_(array, index) - Returns element of array at given (1-based) index. If index < 0, - accesses elements from the last to the first. Returns NULL if the index exceeds the length - of the array. + accesses elements from the last to the first. If the index exceeds the length of the array, + Returns NULL if ANSI mode is off; Throws ArrayIndexOutOfBoundsException when ANSI mode is on. Review comment: How about rewriting it like this? ``` The function returns NULL if the index exceeds the length of the array and `spark.sql.ansi.enabled` is set to false. If `spark.sql.ansi.enabled` is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices. ``` by referring to the `Size` usage: https://github.com/apache/spark/blob/8760032f4f7e1ef36fee6afc45923d3826ef14fc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L79-L82 ## File path: docs/sql-ref-ansi-compliance.md ## @@ -111,6 +111,13 @@ SELECT * FROM t; The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`). - `size`: This function returns null for null input under ANSI mode. Review comment: (This is not related to this PR though) could you remove `under ANSI mode` in this statement, too?
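The behavior described in the usage text above (1-based index, negative indices counting from the end, NULL versus `ArrayIndexOutOfBoundsException` for an index beyond the array length) can be sketched in plain Scala as follows. This is only an illustration under stated assumptions, not the implementation from the PR: `elementAt` is a hypothetical helper, `None` stands in for SQL NULL, the `failOnError` argument stands in for `spark.sql.ansi.enabled`, and rejecting index 0 in both modes is an assumption of this sketch that the usage text does not spell out.

```scala
object ElementAtSketch {
  // Hypothetical helper; None models SQL NULL.
  def elementAt[T](arr: IndexedSeq[T], index: Int, failOnError: Boolean): Option[T] = {
    // Assumption of this sketch: index 0 is rejected regardless of mode.
    if (index == 0) {
      throw new ArrayIndexOutOfBoundsException("indices are 1-based; 0 is not valid")
    }
    // Widen to Long so that abs(Int.MinValue) does not overflow.
    if (math.abs(index.toLong) > arr.length) {
      if (failOnError) {
        // ANSI mode on: an out-of-range index is an error.
        throw new ArrayIndexOutOfBoundsException(
          s"Invalid index: $index, numElements: ${arr.length}")
      } else {
        // ANSI mode off: an out-of-range index yields NULL.
        None
      }
    } else if (index > 0) {
      Some(arr(index - 1)) // 1-based access from the front
    } else {
      Some(arr(arr.length + index)) // negative index counts from the end
    }
  }
}
```

With `Vector(10, 20, 30)`, index `1` gives `Some(10)` and index `-1` gives `Some(30)`; index `5` gives `None` when `failOnError` is false and throws when it is true.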
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins removed a comment on pull request #2: URL: https://github.com/apache/spark/pull/2#issuecomment-725257940
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins commented on pull request #2: URL: https://github.com/apache/spark/pull/2#issuecomment-725257940
[GitHub] [spark] SparkQA commented on pull request #30309: [WIP][SPARK-33407][PYTHON] Simplify the exception message from Python UDFs
SparkQA commented on pull request #30309: URL: https://github.com/apache/spark/pull/30309#issuecomment-725257686 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35514/
[GitHub] [spark] beliefer edited a comment on pull request #30292: [SPARK-33166][DOC] Provide Search Function in Spark docs site
beliefer edited a comment on pull request #30292: URL: https://github.com/apache/spark/pull/30292#issuecomment-725256974 @sarutak Because this PR is not merged and not published to the online Spark documentation.
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA removed a comment on pull request #2: URL: https://github.com/apache/spark/pull/2#issuecomment-725094540 **[Test build #130900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130900/testReport)** for PR 2 at commit [`15bac5b`](https://github.com/apache/spark/commit/15bac5bfecb209ba7b6963d83423b659fbc5086d).
[GitHub] [spark] beliefer commented on pull request #30292: [SPARK-33166][DOC] Provide Search Function in Spark docs site
beliefer commented on pull request #30292: URL: https://github.com/apache/spark/pull/30292#issuecomment-725256974 @sarutak Because this PR is not merged and not published to the online Spark documentation.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-725256865 **[Test build #130900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130900/testReport)** for PR 29999 at commit [`15bac5b`](https://github.com/apache/spark/commit/15bac5bfecb209ba7b6963d83423b659fbc5086d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-725255595 **[Test build #130915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130915/testReport)** for PR 29999 at commit [`d039c33`](https://github.com/apache/spark/commit/d039c33de33ea4bab4cea3170925c0c4f92ca771).
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
beliefer commented on a change in pull request #29999: URL: https://github.com/apache/spark/pull/29999#discussion_r521164688
## File path: sql/core/src/test/resources/sql-functions/sql-expression-schema.md ##
@@ -346,4 +346,4 @@
 | org.apache.spark.sql.catalyst.expressions.xml.XPathList | xpath | SELECT xpath('b1b2b3c1c2','a/b/text()') | structb1b2b3c1c2, a/b/text()):array> |
 | org.apache.spark.sql.catalyst.expressions.xml.XPathLong | xpath_long | SELECT xpath_long('12', 'sum(a/b)') | struct12, sum(a/b)):bigint> |
 | org.apache.spark.sql.catalyst.expressions.xml.XPathShort | xpath_short | SELECT xpath_short('12', 'sum(a/b)') | struct12, sum(a/b)):smallint> |
-| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('bcc','a/c') | structbcc, a/c):string> |
\ No newline at end of file
+| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('bcc','a/c') | structbcc, a/c):string> |
Review comment: I tried reverting it.
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
beliefer commented on a change in pull request #29999: URL: https://github.com/apache/spark/pull/29999#discussion_r521163422
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ##
@@ -178,6 +180,86 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
   }
 }
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+  protected def patterns: Seq[Any]
+
+  protected def isNotDefined: Boolean
+
+  override def inputTypes: Seq[DataType] = StringType :: Nil
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = true
+
+  private lazy val hasNull: Boolean = patterns.contains(null)
+
+  private lazy val cache = patterns.filterNot(_ == null)
+    .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+  override def eval(input: InternalRow): Any = {
+    if (hasNull) {
+      null
Review comment: Thanks!
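As a rough illustration of the idea in the hunk above (compile each literal pattern once, then evaluate every row against the cached patterns), here is a minimal Python sketch. It is not the PR's code: the names `like_to_regex` and `LikeAll` are made up for this note, the escape handling is simplified compared with `StringUtils.escapeLikeRegex`, and the NULL handling assumes standard SQL three-valued `AND`. The quoted hunk is cut off inside `eval`, so the PR's actual ordering of checks may differ.

```python
import re
from typing import List, Optional


def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Translate a SQL LIKE pattern to a regex (simplified escape handling)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # The character after the escape is matched literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")   # % matches any run of characters
        elif ch == "_":
            out.append(".")    # _ matches exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


class LikeAll:
    """Hypothetical stand-in for an n-ary LIKE ALL over literal patterns."""

    def __init__(self, patterns: List[Optional[str]]):
        self.has_null = any(p is None for p in patterns)
        # Compile each non-null pattern once, like the `cache` field in the diff.
        self.cache = [
            re.compile(like_to_regex(p), re.DOTALL)
            for p in patterns
            if p is not None
        ]

    def eval(self, value: Optional[str]) -> Optional[bool]:
        if value is None:
            return None  # NULL input gives NULL
        if not all(rx.fullmatch(value) for rx in self.cache):
            return False  # FALSE AND NULL is FALSE
        # TRUE AND NULL is NULL (assumed three-valued logic).
        return None if self.has_null else True
```

For example, `LikeAll(["a%", "%c"]).eval("abc")` matches both patterns, while adding a `None` pattern turns a would-be `True` into `None`.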
[GitHub] [spark] SparkQA commented on pull request #30327: [WIP] Test
SparkQA commented on pull request #30327: URL: https://github.com/apache/spark/pull/30327#issuecomment-725252621 **[Test build #130914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130914/testReport)** for PR 30327 at commit [`0e6f09f`](https://github.com/apache/spark/commit/0e6f09f9b7a7984c93e80269aace51f66a3662b3).
[GitHub] [spark] SparkQA commented on pull request #30325: [SPARK-33416][SQL] Avoid Hive metastore stack overflow when InSet predicate have many values
SparkQA commented on pull request #30325: URL: https://github.com/apache/spark/pull/30325#issuecomment-725252148 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35513/
[GitHub] [spark] HyukjinKwon commented on pull request #30318: [SPARK-33412][SQL] OverwriteByExpression should resolve its delete condition based on the table relation not the input query
HyukjinKwon commented on pull request #30318: URL: https://github.com/apache/spark/pull/30318#issuecomment-725250995 Merged to master. It has a conflict with branch-3.0. @cloud-fan mind opening a PR for branch-3.0?
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
beliefer commented on a change in pull request #29999: URL: https://github.com/apache/spark/pull/29999#discussion_r521160412
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ##
@@ -1408,7 +1408,20 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
       case Some(SqlBaseParser.ANY) | Some(SqlBaseParser.SOME) =>
         getLikeQuantifierExprs(ctx.expression).reduceLeft(Or)
       case Some(SqlBaseParser.ALL) =>
-        getLikeQuantifierExprs(ctx.expression).reduceLeft(And)
+        validate(!ctx.expression.isEmpty, "Expected something between '(' and ')'.", ctx)
+        val expressions = ctx.expression.asScala.map(expression)
+        if (expressions.size > 200 && expressions.forall(_.foldable)) {
Review comment: OK
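A note on why the `reduceLeft(And)` line is being replaced: folding N predicates with a binary `And` gives a left-deep tree whose depth grows linearly with N, and recursive traversals of such a tree can exhaust the stack, which is presumably the `StackOverflowError` named in the PR title. The Python sketch below only demonstrates the depth growth. The tuple encoding, the helper names, and the flat `("LikeAll", ...)` node are hypothetical, and the body of the `size > 200` branch is not shown in the quoted hunk.

```python
from functools import reduce


def and_chain(exprs):
    """Left-fold expressions into nested binary ("And", left, right) nodes."""
    return reduce(lambda left, right: ("And", left, right), exprs)


def and_spine_depth(node):
    """Count levels along the left spine of nested "And" nodes.

    Written as a loop so that measuring a very deep tree does not itself
    hit the recursion limit.
    """
    depth = 1
    while isinstance(node, tuple) and node[0] == "And":
        node = node[1]
        depth += 1
    return depth


patterns = [f"p{i}" for i in range(1000)]
nested = and_chain(patterns)         # one extra level per additional pattern
flat = ("LikeAll", tuple(patterns))  # hypothetical single n-ary node

print(and_spine_depth(nested), and_spine_depth(flat))
```

The nested form needs one tree level per pattern, whereas the single n-ary node keeps the depth constant no matter how many patterns are listed.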
[GitHub] [spark] HyukjinKwon closed pull request #30318: [SPARK-33412][SQL] OverwriteByExpression should resolve its delete condition based on the table relation not the input query
HyukjinKwon closed pull request #30318: URL: https://github.com/apache/spark/pull/30318
[GitHub] [spark] SparkQA commented on pull request #30327: [WIP] Test
SparkQA commented on pull request #30327: URL: https://github.com/apache/spark/pull/30327#issuecomment-725249562 **[Test build #130912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130912/testReport)** for PR 30327 at commit [`52602f4`](https://github.com/apache/spark/commit/52602f43ef3b80630375937f970932948527a7ff).
[GitHub] [spark] SparkQA commented on pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
SparkQA commented on pull request #30299: URL: https://github.com/apache/spark/pull/30299#issuecomment-725249564 **[Test build #130913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130913/testReport)** for PR 30299 at commit [`7047924`](https://github.com/apache/spark/commit/70479247efe6a6b4b3e4d653281ce3a4ea8c5224).
[GitHub] [spark] WeichenXu123 opened a new pull request #30327: [WIP] Test
WeichenXu123 opened a new pull request #30327: URL: https://github.com/apache/spark/pull/30327
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
[GitHub] [spark] SparkQA commented on pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
SparkQA commented on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725246332 **[Test build #130911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130911/testReport)** for PR 30297 at commit [`6109838`](https://github.com/apache/spark/commit/610983835626ae5afb7bc7fd6ec4efa0aec9f548).
[GitHub] [spark] SparkQA commented on pull request #30309: [WIP][SPARK-33407][PYTHON] Simplify the exception message from Python UDFs
SparkQA commented on pull request #30309: URL: https://github.com/apache/spark/pull/30309#issuecomment-725246240 **[Test build #130910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130910/testReport)** for PR 30309 at commit [`2c7b4af`](https://github.com/apache/spark/commit/2c7b4af57adbe627fd8d9322a13702dded136daa).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30309: [WIP][SPARK-33407][PYTHON] Simplify the exception message from Python UDFs
HyukjinKwon commented on a change in pull request #30309: URL: https://github.com/apache/spark/pull/30309#discussion_r521155393
## File path: python/pyspark/worker.py ##
@@ -604,17 +604,19 @@ def process():
         # reuse.
         TaskContext._setTaskContext(None)
         BarrierTaskContext._setTaskContext(None)
-    except BaseException:
+    except BaseException as e:
         try:
-            exc_info = traceback.format_exc()
-            if isinstance(exc_info, bytes):
-                # exc_info may contains other encoding bytes, replace the invalid bytes and convert
-                # it back to utf-8 again
-                exc_info = exc_info.decode("utf-8", "replace").encode("utf-8")
Review comment: We dropped Python 2. It doesn't need it anymore.
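For context on the review comment above: in Python 3, `traceback.format_exc()` returns `str`, so the removed `isinstance(exc_info, bytes)` branch could only ever run under Python 2. A minimal standalone check follows; the function name `capture_traceback` is made up for this note and is not part of `worker.py`.

```python
import traceback


def capture_traceback() -> str:
    """Raise and catch an error, returning the formatted traceback text."""
    try:
        raise ValueError("boom")
    except BaseException:
        # In Python 3 this is always text (str), never bytes.
        return traceback.format_exc()


exc_info = capture_traceback()
print(type(exc_info).__name__)
print(isinstance(exc_info, bytes))
```

Since the value is already text, the decode-and-re-encode round trip in the removed lines has nothing left to do.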
[GitHub] [spark] cloud-fan commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
cloud-fan commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521152883
## File path: docs/sql-ref-ansi-compliance.md ##
@@ -111,6 +111,13 @@ SELECT * FROM t;
 The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
   - `size`: This function returns null for null input under ANSI mode.
+  - `element_at`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices.
+  - `elt`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices.
+
+### SQL Operators
+
+The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
+  - `GetArrayItem`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
Review comment: to be more user facing, `GetArrayItem` -> `array_col[index]`
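To make the documented behavior concrete, here is a toy Python model of an `element_at`-style lookup with an ANSI switch. It is a sketch under stated assumptions, not Spark's implementation. The function and its `ansi_enabled` parameter are hypothetical, `IndexError` stands in for `ArrayIndexOutOfBoundsException`, and the 1-based indexing with negative indices counting from the end is how `element_at` is assumed to behave here.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def element_at(arr: Sequence[T], index: int, ansi_enabled: bool) -> Optional[T]:
    """Toy model of a 1-based array lookup; NOT Spark's implementation."""
    if index == 0:
        # Index 0 is treated as invalid in this sketch regardless of mode.
        raise ValueError("SQL array indices start at 1")
    if abs(index) > len(arr):
        if ansi_enabled:
            # Stand-in for the ArrayIndexOutOfBoundsException in the doc diff.
            raise IndexError(f"Invalid index: {index}, numElements: {len(arr)}")
        return None  # non-ANSI mode: out-of-bound access yields NULL
    # Positive indices count from the front, negative ones from the back.
    return arr[index - 1] if index > 0 else arr[index]
```

With `ansi_enabled=False`, `element_at([10, 20, 30], 5, False)` returns `None`; with `ansi_enabled=True` the same call raises instead of silently producing a null.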
[GitHub] [spark] leanken commented on pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
leanken commented on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725240046 updated, @cloud-fan and @maropu
[GitHub] [spark] SparkQA commented on pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
SparkQA commented on pull request #30297: URL: https://github.com/apache/spark/pull/30297#issuecomment-725240297 **[Test build #130909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130909/testReport)** for PR 30297 at commit [`f8cfa5b`](https://github.com/apache/spark/commit/f8cfa5be4573a495e5fd281fd665a1c140be4c0f).
[GitHub] [spark] cloud-fan commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
cloud-fan commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521150251
## File path: docs/sql-ref-ansi-compliance.md ##
@@ -111,6 +111,8 @@ SELECT * FROM t;
 The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
   - `size`: This function returns null for null input under ANSI mode.
+  - `element_at`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices under ANSI mode.
+  - `elt`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices under ANSI mode.
Review comment: +1
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29339: [SPARK-32512][SQL] add alter table add/drop partition command for datasourcev2
AmplabJenkins removed a comment on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-725238424 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35512/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29339: [SPARK-32512][SQL] add alter table add/drop partition command for datasourcev2
AmplabJenkins commented on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-725238415
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29339: [SPARK-32512][SQL] add alter table add/drop partition command for datasourcev2
AmplabJenkins removed a comment on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-725238415 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29339: [SPARK-32512][SQL] add alter table add/drop partition command for datasourcev2
SparkQA commented on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-725238388 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35512/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
AmplabJenkins removed a comment on pull request #27429: URL: https://github.com/apache/spark/pull/27429#issuecomment-725237671
[GitHub] [spark] maropu commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
maropu commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521149081
## File path: docs/sql-ref-ansi-compliance.md ##
@@ -111,6 +111,8 @@ SELECT * FROM t;
 The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
   - `size`: This function returns null for null input under ANSI mode.
+  - `element_at`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices under ANSI mode.
+  - `elt`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices under ANSI mode.
Review comment: nit: how about removing "`under ANSI mode`" in each entry? They look redundant.
[GitHub] [spark] AmplabJenkins commented on pull request #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
AmplabJenkins commented on pull request #27429: URL: https://github.com/apache/spark/pull/27429#issuecomment-725237671
[GitHub] [spark] SparkQA commented on pull request #30309: [WIP][SPARK-33407][PYTHON] Simplify the exception message from Python UDFs
SparkQA commented on pull request #30309: URL: https://github.com/apache/spark/pull/30309#issuecomment-725237233 **[Test build #130908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130908/testReport)** for PR 30309 at commit [`6b74f5f`](https://github.com/apache/spark/commit/6b74f5f04b44c78708ffbd26470316303bb657ef).
[GitHub] [spark] SparkQA removed a comment on pull request #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
SparkQA removed a comment on pull request #27429: URL: https://github.com/apache/spark/pull/27429#issuecomment-725082859 **[Test build #130898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130898/testReport)** for PR 27429 at commit [`bbe10d6`](https://github.com/apache/spark/commit/bbe10d61cd3845ba0d0a031dfacde1a2861df8db).
[GitHub] [spark] maropu commented on a change in pull request #30297: [SPARK-33386][SQL] Accessing array elements should failed if index is out of bound.
maropu commented on a change in pull request #30297: URL: https://github.com/apache/spark/pull/30297#discussion_r521147000
## File path: docs/sql-ref-ansi-compliance.md ##
@@ -111,6 +111,8 @@ SELECT * FROM t;
 The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
   - `size`: This function returns null for null input under ANSI mode.
+  - `element_at`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices under ANSI mode.
+  - `elt`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices under ANSI mode.
Review comment: I think it's better to describe the behaviour change of `GetArrayItem`, too, so how about creating a new subsection for it like `Other SQL Operations`?
[GitHub] [spark] SparkQA commented on pull request #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
SparkQA commented on pull request #27429: URL: https://github.com/apache/spark/pull/27429#issuecomment-725236676 **[Test build #130898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130898/testReport)** for PR 27429 at commit [`bbe10d6`](https://github.com/apache/spark/commit/bbe10d61cd3845ba0d0a031dfacde1a2861df8db).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
luluorta commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r521143561
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala ##
@@ -188,27 +188,29 @@ class BooleanSimplificationSuite extends PlanTest with ExpressionEvalHelper with
     checkCondition(!(('e || 'f) && ('g || 'h)), (!'e && !'f) || (!'g && !'h))
   }
-  private val caseInsensitiveConf = new SQLConf().copy(SQLConf.CASE_SENSITIVE -> false)
-  private val caseInsensitiveAnalyzer = new Analyzer(
-    new SessionCatalog(new InMemoryCatalog, EmptyFunctionRegistry, caseInsensitiveConf),
-    caseInsensitiveConf)
+  private val analyzer = new Analyzer(
+    new SessionCatalog(new InMemoryCatalog, EmptyFunctionRegistry))
   test("(a && b) || (a && c) => a && (b || c) when case insensitive") {
-    val plan = caseInsensitiveAnalyzer.execute(
-      testRelation.where(('a > 2 && 'b > 3) || ('A > 2 && 'b < 5)))
-    val actual = Optimize.execute(plan)
-    val expected = caseInsensitiveAnalyzer.execute(
-      testRelation.where('a > 2 && ('b > 3 || 'b < 5)))
-    comparePlans(actual, expected)
+    withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
+      val plan = analyzer.execute(
+        testRelation.where(('a > 2 && 'b > 3) || ('A > 2 && 'b < 5)))
+      val actual = Optimize.execute(plan)
+      val expected = analyzer.execute(
+        testRelation.where('a > 2 && ('b > 3 || 'b < 5)))
+      comparePlans(actual, expected)
+    }
   }
   test("(a || b) && (a || c) => a || (b && c) when case insensitive") {
-    val plan = caseInsensitiveAnalyzer.execute(
-      testRelation.where(('a > 2 || 'b > 3) && ('A > 2 || 'b < 5)))
-    val actual = Optimize.execute(plan)
-    val expected = caseInsensitiveAnalyzer.execute(
-      testRelation.where('a > 2 || ('b > 3 && 'b < 5)))
-    comparePlans(actual, expected)
+    withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
Review comment: reverted
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala ##
@@ -60,8 +59,6 @@ case class HiveTableScanExec(
   require(partitionPruningPred.isEmpty || relation.isPartitioned,
     "Partition pruning predicates only supported for partitioned tables.")
-  override def conf: SQLConf = sparkSession.sessionState.conf
Review comment: reverted
[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
luluorta commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r521143263
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala ##
@@ -188,27 +188,29 @@ class BooleanSimplificationSuite extends PlanTest with ExpressionEvalHelper with
     checkCondition(!(('e || 'f) && ('g || 'h)), (!'e && !'f) || (!'g && !'h))
   }
-  private val caseInsensitiveConf = new SQLConf().copy(SQLConf.CASE_SENSITIVE -> false)
-  private val caseInsensitiveAnalyzer = new Analyzer(
-    new SessionCatalog(new InMemoryCatalog, EmptyFunctionRegistry, caseInsensitiveConf),
-    caseInsensitiveConf)
+  private val analyzer = new Analyzer(
+    new SessionCatalog(new InMemoryCatalog, EmptyFunctionRegistry))
   test("(a && b) || (a && c) => a && (b || c) when case insensitive") {
-    val plan = caseInsensitiveAnalyzer.execute(
-      testRelation.where(('a > 2 && 'b > 3) || ('A > 2 && 'b < 5)))
-    val actual = Optimize.execute(plan)
-    val expected = caseInsensitiveAnalyzer.execute(
-      testRelation.where('a > 2 && ('b > 3 || 'b < 5)))
-    comparePlans(actual, expected)
+    withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
Review comment: reverted
[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
luluorta commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r521142212 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.trees.TreeNodeRef import org.apache.spark.sql.catalyst.util.toPrettySQL import org.apache.spark.sql.connector.catalog._ import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ -import org.apache.spark.sql.connector.catalog.TableChange.{AddColumn, After, ColumnChange, ColumnPosition, DeleteColumn, RenameColumn, UpdateColumnComment, UpdateColumnNullability, UpdateColumnPosition, UpdateColumnType} +import org.apache.spark.sql.connector.catalog.TableChange.{First => _, _} Review comment: reverted
[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
luluorta commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r521142283 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ## @@ -61,34 +61,38 @@ class SessionCatalog( externalCatalogBuilder: () => ExternalCatalog, globalTempViewManagerBuilder: () => GlobalTempViewManager, functionRegistry: FunctionRegistry, -conf: SQLConf, hadoopConf: Configuration, parser: ParserInterface, -functionResourceLoader: FunctionResourceLoader) extends Logging { +functionResourceLoader: FunctionResourceLoader, +cacheSize: Int = SQLConf.get.tableRelationCacheSize, +cacheTTL: Long = SQLConf.get.metadataCacheTTL) extends Logging with HasConf { import SessionCatalog._ import CatalogTypes.TablePartitionSpec // For testing only. def this( externalCatalog: ExternalCatalog, functionRegistry: FunctionRegistry, - conf: SQLConf) = { + staticConf: SQLConf) = { Review comment: reverted
[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
luluorta commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r521141904 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/HasConf.scala ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst + +import org.apache.spark.sql.internal.SQLConf + +/** + * Trait for shared SQLConf. Review comment: done
[GitHub] [spark] SparkQA commented on pull request #30325: [SPARK-33416][SQL] Avoid Hive metastore stack overflow when InSet predicate have many values
SparkQA commented on pull request #30325: URL: https://github.com/apache/spark/pull/30325#issuecomment-725231145 **[Test build #130907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130907/testReport)** for PR 30325 at commit [`f9d018c`](https://github.com/apache/spark/commit/f9d018c28a860c65c5ba47bd28f2a23a3b5d2be3).
[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
luluorta commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r521134961 ## File path: sql/core/src/test/resources/sql-tests/inputs/datetime.sql ## @@ -149,7 +149,7 @@ select to_timestamp('2019-10-06 A', '-MM-dd G'); select to_timestamp('22 05 2020 Friday', 'dd MM EE'); select to_timestamp('22 05 2020 Friday', 'dd MM E'); select unix_timestamp('22 05 2020 Friday', 'dd MM E'); -select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampFormat', 'dd/M/')); +select from_json('{"timestamp":"26/October/2015"}', 'timestamp Timestamp', map('timestampFormat', 'dd/M/')); Review comment: After this PR, dynamically setting "spark.sql.ansi.enabled" actually takes effect in the parsing phase. This query would fail to parse because `time` is a reserved keyword in the SQL standard.
[GitHub] [spark] SparkQA commented on pull request #29339: [SPARK-32512][SQL] add alter table add/drop partition command for datasourcev2
SparkQA commented on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-725228809 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35512/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30326: [MINOR][GRAPHX] Correct typos
AmplabJenkins removed a comment on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725228634
[GitHub] [spark] SparkQA commented on pull request #30326: [MINOR][GRAPHX] Correct typos
SparkQA commented on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725228620 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35511/
[GitHub] [spark] AmplabJenkins commented on pull request #30326: [MINOR][GRAPHX] Correct typos
AmplabJenkins commented on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725228634
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30315: [SPARK-33388][SQL] Merge In and InSet predicate
AmplabJenkins removed a comment on pull request #30315: URL: https://github.com/apache/spark/pull/30315#issuecomment-725228239
[GitHub] [spark] AmplabJenkins commented on pull request #30315: [SPARK-33388][SQL] Merge In and InSet predicate
AmplabJenkins commented on pull request #30315: URL: https://github.com/apache/spark/pull/30315#issuecomment-725228239
[GitHub] [spark] maropu commented on pull request #30324: [SPARK-33417][SQL][TEST] Correct the behaviour of query filters in TPCDSQueryBenchmark
maropu commented on pull request #30324: URL: https://github.com/apache/spark/pull/30324#issuecomment-725228254 Thanks, @dongjoon-hyun @HyukjinKwon ! Merged to master/branch-3.0/branch-2.4.
[GitHub] [spark] SparkQA removed a comment on pull request #30315: [SPARK-33388][SQL] Merge In and InSet predicate
SparkQA removed a comment on pull request #30315: URL: https://github.com/apache/spark/pull/30315#issuecomment-725078337 **[Test build #130897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130897/testReport)** for PR 30315 at commit [`a3675d9`](https://github.com/apache/spark/commit/a3675d92941b0db08d2b7a36e63d3076f200797e).
[GitHub] [spark] maropu closed pull request #30324: [SPARK-33417][SQL][TEST] Correct the behaviour of query filters in TPCDSQueryBenchmark
maropu closed pull request #30324: URL: https://github.com/apache/spark/pull/30324
[GitHub] [spark] SparkQA commented on pull request #30315: [SPARK-33388][SQL] Merge In and InSet predicate
SparkQA commented on pull request #30315: URL: https://github.com/apache/spark/pull/30315#issuecomment-725227411 **[Test build #130897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130897/testReport)** for PR 30315 at commit [`a3675d9`](https://github.com/apache/spark/commit/a3675d92941b0db08d2b7a36e63d3076f200797e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait BaseIn extends Predicate `
  * `case class In(value: Expression, override val list: Seq[Expression]) extends BaseIn `
  * `case class InSet(value: Expression, override val hset: Set[Any]) extends BaseIn `
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30318: [SPARK-33412][SQL] OverwriteByExpression should resolve its delete condition based on the table relation not the input query
AmplabJenkins removed a comment on pull request #30318: URL: https://github.com/apache/spark/pull/30318#issuecomment-725226485
[GitHub] [spark] SparkQA commented on pull request #30318: [SPARK-33412][SQL] OverwriteByExpression should resolve its delete condition based on the table relation not the input query
SparkQA commented on pull request #30318: URL: https://github.com/apache/spark/pull/30318#issuecomment-725226468 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35510/
[GitHub] [spark] AmplabJenkins commented on pull request #30318: [SPARK-33412][SQL] OverwriteByExpression should resolve its delete condition based on the table relation not the input query
AmplabJenkins commented on pull request #30318: URL: https://github.com/apache/spark/pull/30318#issuecomment-725226485
[GitHub] [spark] wangyum commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
wangyum commented on a change in pull request #2: URL: https://github.com/apache/spark/pull/2#discussion_r521135433 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -178,6 +180,86 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) } } +/** + * Optimized version of LIKE ALL, when all pattern values are literal. + */ +abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant { + + protected def patterns: Seq[Any] + + protected def isNotDefined: Boolean + + override def inputTypes: Seq[DataType] = StringType :: Nil + + override def dataType: DataType = BooleanType + + override def nullable: Boolean = true + + private lazy val hasNull: Boolean = patterns.contains(null) + + private lazy val cache = patterns.filterNot(_ == null) +.map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\'))) + + override def eval(input: InternalRow): Any = { +if (hasNull) { + null
Review comment:
```sql
spark-sql> select 'a' like all ('%a%', null);
NULL
spark-sql> select 'a' not like all ('%a%', null);
false
spark-sql> select 'a' like any ('%a%', null);
true
spark-sql> select 'a' not like any ('%a%', null);
NULL
```
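The four results in the comment above follow from SQL three-valued logic: LIKE ALL is a conjunction over the patterns, LIKE ANY is a disjunction, and a null pattern contributes an unknown value. The following is a minimal, Spark-free sketch of that reasoning. `LikeAllSketch`, `Tri`, and the simplified `like` helper (no escape character handling) are hypothetical names for illustration only and are not the code in the PR.

```scala
object LikeAllSketch {
  // SQL three-valued logic: None stands for NULL (unknown).
  type Tri = Option[Boolean]

  def and(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false) // false dominates AND
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def or(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true) // true dominates OR
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  def not(a: Tri): Tri = a.map(!_)

  // Simplified LIKE: supports only % (any run) and _ (one char).
  // A null pattern (None) yields NULL.
  def like(value: String, pattern: Option[String]): Tri = pattern.map { p =>
    val regex = p.toList.map {
      case '%' => ".*"
      case '_' => "."
      case c   => java.util.regex.Pattern.quote(c.toString)
    }.mkString
    value.matches(regex)
  }

  def likeAll(value: String, patterns: Seq[Option[String]]): Tri =
    patterns.map(like(value, _)).foldLeft[Tri](Some(true))(and)

  def notLikeAll(value: String, patterns: Seq[Option[String]]): Tri =
    patterns.map(p => not(like(value, p))).foldLeft[Tri](Some(true))(and)

  def likeAny(value: String, patterns: Seq[Option[String]]): Tri =
    patterns.map(like(value, _)).foldLeft[Tri](Some(false))(or)

  def notLikeAny(value: String, patterns: Seq[Option[String]]): Tri =
    patterns.map(p => not(like(value, p))).foldLeft[Tri](Some(false))(or)
}
```

With `Seq(Some("%a%"), None)` as the pattern list and `"a"` as the value, the four functions return `None`, `Some(false)`, `Some(true)`, and `None`, matching the spark-sql session quoted in the comment.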
[GitHub] [spark] maropu commented on a change in pull request #30325: [SPARK-33416][SQL] Avoid Hive metastore stack overflow when InSet predicate have many values
maropu commented on a change in pull request #30325: URL: https://github.com/apache/spark/pull/30325#discussion_r521126955 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ## @@ -1267,9 +1267,19 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat val catalogTable = restoreTableMetadata(rawTable) val partColNameMap = buildLowerCasePartColNameMap(catalogTable) +val hivePredicates = predicates.map { + // Avoid Hive metastore stack overflow. + case InSet(child, values) Review comment: Ah, ok. Thanks.
[GitHub] [spark] maropu commented on a change in pull request #30325: [SPARK-33416][SQL] Avoid Hive metastore stack overflow when InSet predicate have many values
maropu commented on a change in pull request #30325: URL: https://github.com/apache/spark/pull/30325#discussion_r521135541 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -54,6 +54,15 @@ object StaticSQLConf { .transform(_.toLowerCase(Locale.ROOT)) .createWithDefault("global_temp") + val HIVE_METASTORE_PARTITION_PRUNING_INSET_THRESHOLD = Review comment: Hm, I see. I think users might lower the value at runtime right after they see the exception, so IMO it would be useful if users could update the value at runtime.
[GitHub] [spark] wangyum commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
wangyum commented on a change in pull request #2: URL: https://github.com/apache/spark/pull/2#discussion_r521135433 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -178,6 +180,86 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) } } +/** + * Optimized version of LIKE ALL, when all pattern values are literal. + */ +abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant { + + protected def patterns: Seq[Any] + + protected def isNotDefined: Boolean + + override def inputTypes: Seq[DataType] = StringType :: Nil + + override def dataType: DataType = BooleanType + + override def nullable: Boolean = true + + private lazy val hasNull: Boolean = patterns.contains(null) + + private lazy val cache = patterns.filterNot(_ == null) +.map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\'))) + + override def eval(input: InternalRow): Any = { +if (hasNull) { + null
Review comment:
```sql
spark-sql> select 'a' like all ('%a%', null);
NULL
spark-sql> select 'a' not like all ('%a%', null);
false
```
[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
luluorta commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r521134961 ## File path: sql/core/src/test/resources/sql-tests/inputs/datetime.sql ## @@ -149,7 +149,7 @@ select to_timestamp('2019-10-06 A', '-MM-dd G'); select to_timestamp('22 05 2020 Friday', 'dd MM EE'); select to_timestamp('22 05 2020 Friday', 'dd MM E'); select unix_timestamp('22 05 2020 Friday', 'dd MM E'); -select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampFormat', 'dd/M/')); +select from_json('{"timestamp":"26/October/2015"}', 'timestamp Timestamp', map('timestampFormat', 'dd/M/')); Review comment: After this PR, dynamically setting "spark.sql.ansi.enabled" actually takes effect in the parsing phase. This query would fail to parse because `time` is a reserved keyword in the SQL standard.
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
beliefer commented on a change in pull request #2: URL: https://github.com/apache/spark/pull/2#discussion_r521133744 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ## @@ -102,6 +102,8 @@ package object dsl { def like(other: Expression, escapeChar: Char = '\\'): Expression = Like(expr, other, escapeChar) def rlike(other: Expression): Expression = RLike(expr, other) +def likeAll(others: Literal*): Expression = LikeAll(expr, others.map(_.eval(EmptyRow))) Review comment: OK
[GitHub] [spark] maropu commented on a change in pull request #30225: [SPARK-33187][SQL] Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
maropu commented on a change in pull request #30225: URL: https://github.com/apache/spark/pull/30225#discussion_r521132554 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -815,6 +815,14 @@ object SQLConf { .booleanConf .createWithDefault(true) + val HIVE_METASTORE_PARTITION_LIMIT = +buildConf("spark.sql.hive.metastorePartitionLimit") + .doc("The maximum number of metastore partitions allowed. Use -1 for unlimited.") Review comment: The current statement looks a bit ambiguous, so how about following the statement of the Hive config you pointed out?
[GitHub] [spark] maropu commented on a change in pull request #30225: [SPARK-33187][SQL] Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
maropu commented on a change in pull request #30225: URL: https://github.com/apache/spark/pull/30225#discussion_r521132100 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ## @@ -773,26 +773,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { filters.flatMap(convert).mkString(" and ") } - private def quoteStringLiteral(str: String): String = { Review comment: Why did you move this func?
[GitHub] [spark] SparkQA commented on pull request #30326: [MINOR][GRAPHX] Correct typos
SparkQA commented on pull request #30326: URL: https://github.com/apache/spark/pull/30326#issuecomment-725221115 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35511/
[GitHub] [spark] maropu commented on a change in pull request #30225: [SPARK-33187][SQL] Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
maropu commented on a change in pull request #30225: URL: https://github.com/apache/spark/pull/30225#discussion_r521131433 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ## @@ -1233,6 +1242,34 @@ private[client] class Shim_v2_1 extends Shim_v2_0 { override def alterPartitions(hive: Hive, tableName: String, newParts: JList[Partition]): Unit = { alterPartitionsMethod.invoke(hive, tableName, newParts, environmentContextInAlterTable) } + + override def getPartitionsByFilter( + hive: Hive, + table: Table, + predicates: Seq[Expression]): Seq[Partition] = { + +// Hive getPartitionsByFilter() takes a string that represents partition +// predicates like "str_key=\"value\" and int_key=1 ..." +val filter = convertFilters(table, predicates) + +val limit = SQLConf.get.metastorePartitionLimit +if (limit > -1) { + val num = try { +getNumPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[Int] + } catch { +case ex: Exception => + logWarning("Caught Hive MetaException attempting to get partition metadata by " + Review comment: Could you make the message clearer? e.g., ...get the number of partitions from , but using 0 for a partition number
[GitHub] [spark] maropu commented on a change in pull request #30225: [SPARK-33187][SQL] Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
maropu commented on a change in pull request #30225: URL: https://github.com/apache/spark/pull/30225#discussion_r521130875 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ## @@ -1233,6 +1242,34 @@ private[client] class Shim_v2_1 extends Shim_v2_0 { override def alterPartitions(hive: Hive, tableName: String, newParts: JList[Partition]): Unit = { alterPartitionsMethod.invoke(hive, tableName, newParts, environmentContextInAlterTable) } + + override def getPartitionsByFilter( + hive: Hive, + table: Table, + predicates: Seq[Expression]): Seq[Partition] = { + +// Hive getPartitionsByFilter() takes a string that represents partition +// predicates like "str_key=\"value\" and int_key=1 ..." +val filter = convertFilters(table, predicates) + +val limit = SQLConf.get.metastorePartitionLimit +if (limit > -1) { + val num = try { +getNumPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[Int] + } catch { +case ex: Exception => Review comment: `case NonFatal(_) `?
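To illustrate the `NonFatal` suggestion above: `scala.util.control.NonFatal` matches ordinary exceptions but deliberately does not match fatal ones such as `InterruptedException`, `VirtualMachineError`, or `LinkageError`, so those keep propagating. The sketch below is a stand-alone example; `countOrZero` is a hypothetical helper, and the fall-back to 0 only mirrors the "using 0 for a partition number" wording from the sibling review comment, not necessarily the final code in the PR.

```scala
import scala.util.control.NonFatal

object NonFatalSketch {
  // Hypothetical helper: evaluates `count` and falls back to 0 when a
  // non-fatal exception is thrown. Fatal throwables are not caught.
  def countOrZero(count: => Int): Int =
    try count
    catch {
      case NonFatal(_) =>
        // A real caller would log a warning here before falling back.
        0
    }
}
```

Compared with `case ex: Exception`, this pattern lets an `InterruptedException` escape, so a cancelled thread is not silently turned into a count of 0.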
[GitHub] [spark] wangyum commented on a change in pull request #30325: [SPARK-33416][SQL] Avoid Hive metastore stack overflow when InSet predicate have many values
wangyum commented on a change in pull request #30325: URL: https://github.com/apache/spark/pull/30325#discussion_r521130075 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -54,6 +54,15 @@ object StaticSQLConf { .transform(_.toLowerCase(Locale.ROOT)) .createWithDefault("global_temp") + val HIVE_METASTORE_PARTITION_PRUNING_INSET_THRESHOLD = Review comment: Yes, for 2 reasons: 1. This parameter should be set according to your own Hive Metastore and does not need to be modified frequently. 2. All SQL configs in `HiveExternalCatalog` are static configs, e.g.: `SCHEMA_STRING_LENGTH_THRESHOLD` and `DEBUG_MODE`. Of course, we can make this parameter a runtime config if needed.
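The static-versus-runtime distinction discussed in this thread can be pictured with a small model: a static config is fixed when the session is created, while a runtime config can be changed afterwards. The sketch below is a toy model under that assumption. `ConfSketch`, `Conf`, and the key names are hypothetical and do not reproduce Spark's `SQLConf` or `StaticSQLConf` API.

```scala
object ConfSketch {
  // Toy model: keys listed in `staticKeys` cannot be changed after construction.
  final class Conf(staticKeys: Set[String], initial: Map[String, String]) {
    private var settings: Map[String, String] = initial

    def get(key: String): Option[String] = settings.get(key)

    // Runtime updates succeed for ordinary keys and are rejected for static ones.
    def set(key: String, value: String): Either[String, Unit] =
      if (staticKeys.contains(key)) Left(s"Cannot modify static config: $key")
      else {
        settings = settings.updated(key, value)
        Right(())
      }
  }
}
```

In this model, the concern raised in the thread is that a user who hits the exception cannot lower a static threshold without recreating the session, whereas a runtime key could be adjusted in place.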
[GitHub] [spark] SparkQA commented on pull request #30318: [SPARK-33412][SQL] OverwriteByExpression should resolve its delete condition based on the table relation not the input query
SparkQA commented on pull request #30318: URL: https://github.com/apache/spark/pull/30318#issuecomment-725218357 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35510/
[GitHub] [spark] maropu commented on a change in pull request #30225: [SPARK-33187][SQL] Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
maropu commented on a change in pull request #30225:
URL: https://github.com/apache/spark/pull/30225#discussion_r521128972

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##
@@ -815,6 +815,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val HIVE_METASTORE_PARTITION_LIMIT =
+    buildConf("spark.sql.hive.metastorePartitionLimit")
+      .doc("The maximum number of metastore partitions allowed. Use -1 for unlimited.")
+      .version("3.0.2")
+      .intConf
+      .checkValue(_ >= -1, "The maximum must be a positive integer, -1 to apply no limit.")
+      .createWithDefault(10)

Review comment:
`-1` to follow the Hive config?
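The semantics of the limit under review can be sketched in isolation. This is a minimal stand-alone sketch, not code from the PR: `PartitionLimitSketch` and its methods are hypothetical names, and treating "within the limit" as `numPartitions <= limit` is an assumption. It does show one detail of the quoted diff: the predicate `_ >= -1` also accepts `0`, although the error message speaks of a positive integer.

```scala
// Hypothetical stand-alone sketch; it does not use Spark's ConfigBuilder API.
object PartitionLimitSketch {
  // Same predicate as the checkValue call in the diff: any integer >= -1 passes.
  def isValidLimit(limit: Int): Boolean = limit >= -1

  // -1 means "no limit"; otherwise the partition count must not exceed the limit.
  // The inclusive comparison is an assumption made for this sketch.
  def withinLimit(numPartitions: Int, limit: Int): Boolean = {
    require(isValidLimit(limit), s"limit must be >= -1, got $limit")
    limit == -1 || numPartitions <= limit
  }
}
```

With a default of `-1`, as the review comment suggests, `withinLimit` returns true for any partition count, so the check would be opt-in rather than on by default.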
[GitHub] [spark] maropu commented on a change in pull request #30225: [SPARK-33187][SQL] Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
maropu commented on a change in pull request #30225:
URL: https://github.com/apache/spark/pull/30225#discussion_r521128775

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##
@@ -815,6 +815,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val HIVE_METASTORE_PARTITION_LIMIT =
+    buildConf("spark.sql.hive.metastorePartitionLimit")
+      .doc("The maximum number of metastore partitions allowed. Use -1 for unlimited.")
+      .version("3.0.2")

Review comment:
`3.0.2` -> `3.1.0`