cloud-fan closed pull request #45422: [SPARK-47296][SQL][COLLATION] Fail
unsupported functions for non-binary collations
URL: https://github.com/apache/spark/pull/45422
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
cloud-fan commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-2009773392
thanks, merging to master!
mihailom-db commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-2008104827
> LGTM
>
> As a follow-up, we should revisit the error messages. IMO it is weird to expose a
message with the "string_any_collation" type to customers. But I think that we can
do that as
mihailom-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1531040306
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -994,8 +994,12 @@ object TypeCoercion extends TypeCoercionBase {
mihailom-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530992551
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala:
##
@@ -215,6 +220,10 @@ object AnsiTypeCoercion extends
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530688780
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530687285
##
sql/core/src/test/scala/org/apache/spark/sql/CollationRegexpExpressionsSuite.scala:
##
@@ -0,0 +1,438 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530684010
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -994,8 +994,12 @@ object TypeCoercion extends TypeCoercionBase {
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530682774
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala:
##
@@ -205,6 +205,11 @@ object AnsiTypeCoercion extends
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530677086
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -994,8 +994,12 @@ object TypeCoercion extends TypeCoercionBase {
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530675927
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala:
##
@@ -215,6 +220,10 @@ object AnsiTypeCoercion extends
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530674212
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala:
##
@@ -205,6 +205,11 @@ object AnsiTypeCoercion extends
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530673443
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala:
##
@@ -205,6 +205,11 @@ object AnsiTypeCoercion extends
mihailom-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1530084248
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -994,8 +994,11 @@ object TypeCoercion extends TypeCoercionBase {
mihailom-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1529966896
##
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##
@@ -40,6 +40,7 @@ class StringType private(val collationId: Int) extends
AtomicType with
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526346919
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -994,8 +994,11 @@ object TypeCoercion extends TypeCoercionBase {
dbatomic commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-1999648400
LGTM
As a follow-up, we should revisit the error messages. IMO it is weird to expose a
message with the "string_any_collation" type to customers. But I think that we can
do that as a follow
dbatomic commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526193615
##
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##
@@ -40,6 +40,7 @@ class StringType private(val collationId: Int) extends
AtomicType with
uros-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526085933
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -994,8 +994,10 @@ object TypeCoercion extends TypeCoercionBase {
mihailom-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526146584
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -994,8 +994,10 @@ object TypeCoercion extends TypeCoercionBase {
uros-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1517236308
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation
uros-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526092915
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala:
##
@@ -205,6 +205,10 @@ object AnsiTypeCoercion extends TypeCoercionBase
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526084649
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala:
##
@@ -205,6 +205,10 @@ object AnsiTypeCoercion extends
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526067004
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -994,8 +994,10 @@ object TypeCoercion extends TypeCoercionBase {
uros-db commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526059294
##
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##
@@ -65,9 +64,41 @@ class StringType private(val collationId: Int) extends
AtomicType with
dbatomic commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526026385
##
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##
@@ -65,9 +64,41 @@ class StringType private(val collationId: Int) extends
AtomicType with
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1526006045
##
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##
@@ -65,9 +64,41 @@ class StringType private(val collationId: Int) extends
AtomicType with
dbatomic commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1525992996
##
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##
@@ -65,9 +64,41 @@ class StringType private(val collationId: Int) extends
AtomicType with
dbatomic commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1525990277
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##
@@ -69,6 +69,7 @@ public static class Collation {
* byte for byte
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1525952969
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -956,9 +956,19 @@ object TypeCoercion extends TypeCoercionBase {
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1525857828
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -702,9 +702,13 @@ abstract class TypeCoercionBase {
cloud-fan commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1525855043
##
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##
@@ -65,9 +64,43 @@ class StringType private(val collationId: Int) extends
AtomicType with
uros-db commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-1994394143
@cloud-fan that makes a lot of sense. To address this, new case classes
should now handle it. Essentially:
- `StringType` no longer accepts all collationIds, but only the default
cloud-fan commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-1991702090
I don't think it's safe to only handle expressions in
`regexpExpressions.scala`. For example, `Substring` is not there. I don't know
how to collect all functions that take
uros-db commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-1991590743
@cloud-fan yes, that is a problem... Should we settle only on `string
functions` for now? I think the functions that are meant to work with Strings
are more sensitive to this error
cloud-fan commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-1991557191
Without updating `StringType.acceptsType`, I'm not confident that we can find all
functions that expect StringType but do not support collation.
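To illustrate the concern above, here is a minimal sketch (not Spark's actual code) of why tightening the type check is safer than enumerating functions. It assumes a hypothetical `StringType` stand-in that carries only a collation id, a hypothetical `InputType` matcher with an `acceptsType` method, and that id 0 denotes the default binary collation. The name `STRING_ANY_COLLATION` echoes the "string_any_collation" type mentioned elsewhere in this thread; everything else is invented for illustration.

```java
import java.util.List;

public class CollationAcceptSketch {
    // Assumption for this sketch: id 0 is the default (binary) collation.
    static final int DEFAULT_COLLATION_ID = 0;

    // Hypothetical stand-in for a string type tagged with a collation id.
    static final class StringType {
        final int collationId;
        StringType(int collationId) { this.collationId = collationId; }
    }

    // Hypothetical matcher: decides whether an argument type is acceptable.
    interface InputType {
        boolean acceptsType(StringType actual);
    }

    // Strict matcher: only the default collation passes. Functions that were
    // never updated for collations keep using this and fail by default.
    static final InputType STRICT_STRING =
        actual -> actual.collationId == DEFAULT_COLLATION_ID;

    // Permissive matcher: a function must opt in explicitly to use it.
    static final InputType STRING_ANY_COLLATION = actual -> true;

    // Type check for a function whose arguments all share one expected type.
    static boolean typeChecks(InputType expected, List<StringType> args) {
        return args.stream().allMatch(expected::acceptsType);
    }

    public static void main(String[] argv) {
        StringType binary = new StringType(DEFAULT_COLLATION_ID);
        StringType collated = new StringType(1); // any non-default id

        // A function that did not opt in rejects the collated argument.
        System.out.println(typeChecks(STRICT_STRING, List.of(binary, collated)));        // false
        // A function that opted in accepts it.
        System.out.println(typeChecks(STRING_ANY_COLLATION, List.of(binary, collated))); // true
    }
}
```

With the strict matcher as the default, nobody has to collect the full list of string functions: any function that has not opted in rejects non-default collations at type-check time.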
MaxGekk commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1518490602
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationTypeConstraints.scala:
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software
HyukjinKwon commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1517022572
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software
dbatomic commented on code in PR #45422:
URL: https://github.com/apache/spark/pull/45422#discussion_r1516510742
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation
uros-db opened a new pull request, #45422:
URL: https://github.com/apache/spark/pull/45422
### What changes were proposed in this pull request?
### Why are the changes needed?
Currently, all `StringType` arguments passed to built-in string functions in
Spark SQL get