Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
cloud-fan closed pull request #46165: [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions URL: https://github.com/apache/spark/pull/46165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
cloud-fan commented on PR #46165: URL: https://github.com/apache/spark/pull/46165#issuecomment-2079297307 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
uros-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1579263303 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import scala.collection.immutable.Seq + +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.{MapType, StringType} + +// scalastyle:off nonascii +class CollationSQLExpressionsSuite Review Comment: for example, Mihailo is working on some more expressions right now, and they don't fall under stringExpressions nor regexpExpressions, so I suggest we keep `CollationSQLExpressionsSuite.scala` for these purposes and then we can migrate CollationStringExpressionsSuite & CollationRegexpExpressionsSuite into CollationSQLExpressionsSuite later on -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
uros-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1578946140 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import scala.collection.immutable.Seq + +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.{MapType, StringType} + +// scalastyle:off nonascii +class CollationSQLExpressionsSuite Review Comment: We do have separate suites for `stringExpressions` and `regexpExpressions`, which are the two larger "groups" of SQL expressions. So now it's a bit unclear on where to put these tests, and there may be more to come for some expression that don't fall into any of the previous 2 categories So it's either multiple smaller suites or 1 large suite to cover them all. Should we go with this for now and then I'll shuffle these tests around in a separate effort in order to unify them under a single suite? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
cloud-fan commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1578940945 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import scala.collection.immutable.Seq + +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.{MapType, StringType} + +// scalastyle:off nonascii +class CollationSQLExpressionsSuite Review Comment: We already have multiple test suite for string collation, can you justify the need of a new suite? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
cloud-fan commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1577440230 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -79,12 +81,14 @@ import org.apache.spark.unsafe.types.UTF8String object MaskExpressionBuilder extends ExpressionBuilder { override def functionSignature: Option[FunctionSignature] = { val strArg = InputParameter("str") -val upperCharArg = InputParameter("upperChar", Some(Literal(Mask.MASKED_UPPERCASE))) -val lowerCharArg = InputParameter("lowerChar", Some(Literal(Mask.MASKED_LOWERCASE))) -val digitCharArg = InputParameter("digitChar", Some(Literal(Mask.MASKED_DIGIT))) -val otherCharArg = InputParameter( - "otherChar", - Some(Literal(Mask.MASKED_IGNORE, StringType))) +val upperCharArg = InputParameter("upperChar", + Some(Literal(Mask.MASKED_UPPERCASE, SQLConf.get.defaultStringType))) Review Comment: nit: we can use `Literal.create`, so that we don't need to convert String to UTF8String ahead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
cloud-fan commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575775863 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder { val digitCharArg = InputParameter("digitChar", Some(Literal(Mask.MASKED_DIGIT))) val otherCharArg = InputParameter( "otherChar", - Some(Literal(Mask.MASKED_IGNORE, StringType))) + Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType))) Review Comment: We did this before but reverted. `Literal` is a low-level thing and it's better to not access SQLConf there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
uros-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575760634 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder { val digitCharArg = InputParameter("digitChar", Some(Literal(Mask.MASKED_DIGIT))) val otherCharArg = InputParameter( "otherChar", - Some(Literal(Mask.MASKED_IGNORE, StringType))) + Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType))) Review Comment: update: I fixed this in the latest commit, but I don't like the idea of doing this manually instead, I think we should consider updating `literals.scala` to construct string literals with respect to `defaultStringType` instead of `StringType` if that sound good @cloud-fan, then I'll get to that a separate effort -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
uros-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575736465 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder { val digitCharArg = InputParameter("digitChar", Some(Literal(Mask.MASKED_DIGIT))) val otherCharArg = InputParameter( "otherChar", - Some(Literal(Mask.MASKED_IGNORE, StringType))) + Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType))) Review Comment: Agreed, all 4 string literals should be `SQLConf.get.defaultStringType` here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
mihailom-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575723275 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder { val digitCharArg = InputParameter("digitChar", Some(Literal(Mask.MASKED_DIGIT))) val otherCharArg = InputParameter( "otherChar", - Some(Literal(Mask.MASKED_IGNORE, StringType))) + Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType))) Review Comment: Why are we then leaving digitCharArg as StringType? Either they all take StringType or SQLConf.get.defaultStringType. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
uros-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575711699 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder { val digitCharArg = InputParameter("digitChar", Some(Literal(Mask.MASKED_DIGIT))) val otherCharArg = InputParameter( "otherChar", - Some(Literal(Mask.MASKED_IGNORE, StringType))) + Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType))) Review Comment: `MASKED_IGNORE` is null, so we have to specify a type for this literal in func signature so instead of `StringType(0)` it should only be correct to get the default string type we enforce implicit cast for this expression, so in a situation where the first parameter takes the default string type (for example when not using explicit collation) there would be a conflict with this `StringType(0)` if it were to stay here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
uros-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575701941 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule { case otherExpr @ ( _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: Greatest | _: Least | - _: Coalesce | _: BinaryExpression | _: ConcatWs) => + _: Coalesce | _: BinaryExpression | _: ConcatWs | _: Mask) => Review Comment: For `StringToMap`, the second and third parameters are treated as regex: > Both `pairDelim` and `keyValueDelim` are treated as regular expressions." so no casting there, and their collation shouldn't be taken into consideration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
cloud-fan commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575699776 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder { val digitCharArg = InputParameter("digitChar", Some(Literal(Mask.MASKED_DIGIT))) val otherCharArg = InputParameter( "otherChar", - Some(Literal(Mask.MASKED_IGNORE, StringType))) + Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType))) Review Comment: why do we use the default string type as the expected input type here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
uros-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575701941 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule { case otherExpr @ ( _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: Greatest | _: Least | - _: Coalesce | _: BinaryExpression | _: ConcatWs) => + _: Coalesce | _: BinaryExpression | _: ConcatWs | _: Mask) => Review Comment: For StringToMap, the second and third parameters are treated as regex: `Both `pairDelim` and `keyValueDelim` are treated as regular expressions."` so no casting there, and their collation shouldn't be taken into consideration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]
cloud-fan commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575700039 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule { case otherExpr @ ( _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: Greatest | _: Least | - _: Coalesce | _: BinaryExpression | _: ConcatWs) => + _: Coalesce | _: BinaryExpression | _: ConcatWs | _: Mask) => Review Comment: should `StringToMap` have the same requirement? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org