[PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
uros-db opened a new pull request, #45422: URL: https://github.com/apache/spark/pull/45422 ### What changes were proposed in this pull request? ### Why are the changes needed? Currently, all `StringType` arguments passed to built-in string functions in Spark SQL get treated

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
dbatomic commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1516510742 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517022572 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517236308 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-08 Thread via GitHub
MaxGekk commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1518490602 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationTypeConstraints.scala: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-12 Thread via GitHub
cloud-fan commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-1991557191 Without updating `StringType.acceptsType`, I'm not confident to find out all functions that expect StringType but do not support collation. -- This is an automated message from the Ap

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-12 Thread via GitHub
uros-db commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-1991590743 @cloud-fan yes, that is a problem... should we settle only on `string functions` for now? I think these functions that are meant to work with Strings are more sensitive to this error

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-12 Thread via GitHub
cloud-fan commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-1991702090 I don't think it's safe to only handle expressions in `regexpExpressions.scala`. For example, `Substring` is not there. I don't know how to collect all functions that take `StringType`,

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-13 Thread via GitHub
uros-db commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-1994394143 @cloud-fan that makes a lot of sense, to combat this - now new case classes should handle this. essentially: - `StringType` no longer accepts all collationIds, but only the default col

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1525855043 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -65,9 +64,43 @@ class StringType private(val collationId: Int) extends AtomicType with S

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1525857828 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,9 +702,13 @@ abstract class TypeCoercionBase { }.getOrEl

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1525952969 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -956,9 +956,19 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
dbatomic commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1525990277 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -69,6 +69,7 @@ public static class Collation { * byte for byte equ

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
dbatomic commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1525992996 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -65,9 +64,41 @@ class StringType private(val collationId: Int) extends AtomicType with Se

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526006045 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -65,9 +64,41 @@ class StringType private(val collationId: Int) extends AtomicType with S

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
dbatomic commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526026385 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -65,9 +64,41 @@ class StringType private(val collationId: Int) extends AtomicType with Se

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
uros-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526059294 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -65,9 +64,41 @@ class StringType private(val collationId: Int) extends AtomicType with Ser

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526067004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,10 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526084649 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala: ## @@ -205,6 +205,10 @@ object AnsiTypeCoercion extends TypeCoercionBase

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
uros-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526085933 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,10 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
uros-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526092915 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala: ## @@ -205,6 +205,10 @@ object AnsiTypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
uros-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517236308 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
mihailom-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526146584 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,10 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
dbatomic commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526193615 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -40,6 +40,7 @@ class StringType private(val collationId: Int) extends AtomicType with Ser

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
dbatomic commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-1999648400 LGTM As a follow up we should revisit error messages. IMO it is weird to expose message with "string_any_collation" type to customer. But I think that we can do that as a follow u

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-15 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1526346919 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,11 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
mihailom-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1529966896 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -40,6 +40,7 @@ class StringType private(val collationId: Int) extends AtomicType with

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
mihailom-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530084248 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,11 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530673443 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala: ## @@ -205,6 +205,11 @@ object AnsiTypeCoercion extends TypeCoercionBase

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530674212 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala: ## @@ -205,6 +205,11 @@ object AnsiTypeCoercion extends TypeCoercionBase

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530675927 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala: ## @@ -215,6 +220,10 @@ object AnsiTypeCoercion extends TypeCoercionBase

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530677086 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,12 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530682774 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala: ## @@ -205,6 +205,11 @@ object AnsiTypeCoercion extends TypeCoercionBase

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530684010 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,12 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530687285 ## sql/core/src/test/scala/org/apache/spark/sql/CollationRegexpExpressionsSuite.scala: ## @@ -0,0 +1,438 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530688780 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
mihailom-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530992551 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala: ## @@ -215,6 +220,10 @@ object AnsiTypeCoercion extends TypeCoercionBa

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
mihailom-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1531040306 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,12 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
mihailom-db commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-2008104827 > LGTM > > As a follow up we should revisit error messages. IMO it is weird to expose message with "string_any_collation" type to customer. But I think that we can do that as a

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-20 Thread via GitHub
cloud-fan commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-2009773392 thanks, merging to master!

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-20 Thread via GitHub
cloud-fan closed pull request #45422: [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations URL: https://github.com/apache/spark/pull/45422