Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]
uros-db commented on code in PR #45405:
URL: https://github.com/apache/spark/pull/45405#discussion_r1515603380

## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ##
@@ -1096,7 +1096,7 @@ colPosition
     ;

 collateClause
-    : COLLATE collationName=stringLit
+    : COLLATE collationName=multipartIdentifier

Review Comment:
@cloud-fan related to your comment, I'm just wondering what would be a better rule for this (if any)?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
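To make the rule question concrete, here is a minimal Python sketch of one possible alternative: accepting a single unquoted identifier after COLLATE rather than a multipart one. It is not Spark's parser. The regex and the name `parse_collation_name` are hypothetical and exist only for illustration.

```python
import re

# Hypothetical single-identifier rule: COLLATE followed by exactly one
# unquoted name. Dotted (multipart) names and string literals do not match.
_COLLATE_RE = re.compile(
    r"^\s*string(?:\s+COLLATE\s+(?P<name>[A-Za-z_][A-Za-z0-9_]*))?\s*$",
    re.IGNORECASE,
)


def parse_collation_name(type_string):
    """Return the collation name, or None when there is no COLLATE clause.

    Raises ValueError when the type string does not fit the sketched rule.
    """
    match = _COLLATE_RE.match(type_string)
    if match is None:
        raise ValueError(f"cannot parse type string: {type_string!r}")
    return match.group("name")
```

Under this sketch, `string COLLATE UCS_BASIC` yields `UCS_BASIC`, while `string COLLATE a.b` and the old quoted form `string COLLATE 'UCS_BASIC'` are rejected.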
Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]
uros-db commented on code in PR #45405:
URL: https://github.com/apache/spark/pull/45405#discussion_r1515587070

## python/pyspark/sql/tests/test_types.py: ##
@@ -862,15 +862,13 @@ def test_parse_datatype_string(self):
             if k != "varchar" and k != "char":
                 self.assertEqual(t(), _parse_datatype_string(k))
         self.assertEqual(IntegerType(), _parse_datatype_string("int"))
-        self.assertEqual(StringType(), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
+        self.assertEqual(StringType(), _parse_datatype_string("string COLLATE UCS_BASIC"))
         self.assertEqual(StringType(0), _parse_datatype_string("string"))
-        self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
-        self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
-        self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE'UCS_BASIC'"))
-        self.assertEqual(StringType(1), _parse_datatype_string("string COLLATE 'UCS_BASIC_LCASE'"))
-        self.assertEqual(StringType(1), _parse_datatype_string("string COLLATE 'UCS_BASIC_LCASE'"))
-        self.assertEqual(StringType(2), _parse_datatype_string("string COLLATE 'UNICODE'"))
-        self.assertEqual(StringType(3), _parse_datatype_string("string COLLATE 'UNICODE_CI'"))
+        self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE UCS_BASIC"))

Review Comment:
Perhaps that would be best as a separate change? This one already seems scattered enough, and I think there are plenty of other places that may require changes with respect to naming.
Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]
srielau commented on code in PR #45405:
URL: https://github.com/apache/spark/pull/45405#discussion_r1514694683

## python/pyspark/sql/tests/test_types.py: ##
@@ -862,15 +862,13 @@ def test_parse_datatype_string(self):
             if k != "varchar" and k != "char":
                 self.assertEqual(t(), _parse_datatype_string(k))
         self.assertEqual(IntegerType(), _parse_datatype_string("int"))
-        self.assertEqual(StringType(), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
+        self.assertEqual(StringType(), _parse_datatype_string("string COLLATE UCS_BASIC"))
         self.assertEqual(StringType(0), _parse_datatype_string("string"))
-        self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
-        self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
-        self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE'UCS_BASIC'"))
-        self.assertEqual(StringType(1), _parse_datatype_string("string COLLATE 'UCS_BASIC_LCASE'"))
-        self.assertEqual(StringType(1), _parse_datatype_string("string COLLATE 'UCS_BASIC_LCASE'"))
-        self.assertEqual(StringType(2), _parse_datatype_string("string COLLATE 'UNICODE'"))
-        self.assertEqual(StringType(3), _parse_datatype_string("string COLLATE 'UNICODE_CI'"))
+        self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE UCS_BASIC"))

Review Comment:
Silly question, isn't this a good time to switch from UCS_BASIC to UTF8_BINARY?
Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]
cloud-fan commented on code in PR #45405:
URL: https://github.com/apache/spark/pull/45405#discussion_r1514563440

## sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: ##
@@ -218,6 +218,6 @@ class DataTypeAstBuilder extends SqlBaseParserBaseVisitor[AnyRef] {
    * Returns a collation name.
    */
   override def visitCollateClause(ctx: CollateClauseContext): String = withOrigin(ctx) {
-    string(visitStringLit(ctx.stringLit))
+    ctx.multipartIdentifier().getText

Review Comment:
This is a bit confusing. How do we turn a multipart identifier into a single string? By joining the parts with dots? Where is the implementation?
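The question about how a multipart identifier becomes a single string can be illustrated with a rough Python sketch. ANTLR's `getText()` on a rule context concatenates the text of the tokens in that subtree exactly as written, so the dots and any backticks survive. The helper names below (`get_text_like`, `unquote_part`, `name_parts`) are hypothetical and only mimic that behavior on a plain list of token strings. The backtick-unescaping rule is an assumption about how quoted identifiers would be normalized, not code taken from this PR.

```python
def get_text_like(tokens):
    """Mimic ANTLR's getText(): concatenate token texts exactly as written."""
    return "".join(tokens)


def unquote_part(part):
    """Strip surrounding backticks and unescape doubled backticks (assumed rule)."""
    if len(part) >= 2 and part.startswith("`") and part.endswith("`"):
        return part[1:-1].replace("``", "`")
    return part


def name_parts(tokens):
    """Return the identifier parts, skipping the '.' separator tokens."""
    return [unquote_part(token) for token in tokens if token != "."]


# Token texts for a hypothetical clause: COLLATE `my_catalog`.UNICODE_CI
tokens = ["`my_catalog`", ".", "UNICODE_CI"]
raw_text = get_text_like(tokens)  # dots and backticks are kept verbatim
parts = name_parts(tokens)        # one normalized entry per identifier part
```

With these inputs `raw_text` is the string `` `my_catalog`.UNICODE_CI `` and `parts` is `["my_catalog", "UNICODE_CI"]`. For a one-part name such as `UCS_BASIC` both approaches agree, which may be why the `getText` shortcut looks harmless in the tests while still being ambiguous for dotted or quoted names.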