Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]

2024-03-06 Thread via GitHub


uros-db commented on code in PR #45405:
URL: https://github.com/apache/spark/pull/45405#discussion_r1515603380


##
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##
@@ -1096,7 +1096,7 @@ colPosition
 ;
 
 collateClause
-: COLLATE collationName=stringLit
+: COLLATE collationName=multipartIdentifier

Review Comment:
   @cloud-fan Related to your comment: what would be a better rule for this, 
if any?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]

2024-03-06 Thread via GitHub


uros-db commented on code in PR #45405:
URL: https://github.com/apache/spark/pull/45405#discussion_r1515587070


##
python/pyspark/sql/tests/test_types.py:
##
@@ -862,15 +862,13 @@ def test_parse_datatype_string(self):
 if k != "varchar" and k != "char":
 self.assertEqual(t(), _parse_datatype_string(k))
 self.assertEqual(IntegerType(), _parse_datatype_string("int"))
-self.assertEqual(StringType(), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
+self.assertEqual(StringType(), _parse_datatype_string("string COLLATE UCS_BASIC"))
 self.assertEqual(StringType(0), _parse_datatype_string("string"))
-self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
-self.assertEqual(StringType(0), _parse_datatype_string("string   COLLATE 'UCS_BASIC'"))
-self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE'UCS_BASIC'"))
-self.assertEqual(StringType(1), _parse_datatype_string("string COLLATE 'UCS_BASIC_LCASE'"))
-self.assertEqual(StringType(1), _parse_datatype_string("string COLLATE 'UCS_BASIC_LCASE'"))
-self.assertEqual(StringType(2), _parse_datatype_string("string COLLATE 'UNICODE'"))
-self.assertEqual(StringType(3), _parse_datatype_string("string COLLATE 'UNICODE_CI'"))
+self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE UCS_BASIC"))

Review Comment:
   Perhaps that would be best as a separate change? This one already seems 
scattered enough, and I think there are plenty of other places that may require 
changes with respect to naming.
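
For readers following the hunk above: the assertions pair each collation name with a numeric id (`UCS_BASIC` with 0, `UCS_BASIC_LCASE` with 1, `UNICODE` with 2, `UNICODE_CI` with 3), and the new form passes the name as a bare identifier rather than a quoted string literal. The sketch below is a toy stand-in only, not PySpark's `_parse_datatype_string`; the function name, the regular expression, and the choice to reject the quoted form are all assumptions made for illustration.

```python
import re

# Name-to-id pairs taken from the assertions in the hunk above.
COLLATION_IDS = {
    "UCS_BASIC": 0,
    "UCS_BASIC_LCASE": 1,
    "UNICODE": 2,
    "UNICODE_CI": 3,
}

# Hypothetical pattern: "string", optionally followed by COLLATE and a
# bare (unquoted) identifier. The real grammar lives in SqlBaseParser.g4.
_STRING_TYPE = re.compile(
    r"^\s*string(?:\s+COLLATE\s+([A-Za-z_][A-Za-z0-9_]*))?\s*$",
    re.IGNORECASE,
)


def parse_string_collation(type_string):
    """Return the collation id for a 'string [COLLATE name]' type string."""
    match = _STRING_TYPE.match(type_string)
    if match is None:
        raise ValueError(f"cannot parse type string: {type_string!r}")
    name = match.group(1)
    if name is None:
        # Plain "string" maps to id 0, as in the hunk's StringType(0) assertion.
        return 0
    if name not in COLLATION_IDS:
        raise ValueError(f"unknown collation: {name}")
    return COLLATION_IDS[name]
```

In this toy version a rename such as the one suggested in the thread would only touch the name table, which is one way to see why it can be handled as a separate change.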






Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]

2024-03-06 Thread via GitHub


srielau commented on code in PR #45405:
URL: https://github.com/apache/spark/pull/45405#discussion_r1514694683


##
python/pyspark/sql/tests/test_types.py:
##
@@ -862,15 +862,13 @@ def test_parse_datatype_string(self):
 if k != "varchar" and k != "char":
 self.assertEqual(t(), _parse_datatype_string(k))
 self.assertEqual(IntegerType(), _parse_datatype_string("int"))
-self.assertEqual(StringType(), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
+self.assertEqual(StringType(), _parse_datatype_string("string COLLATE UCS_BASIC"))
 self.assertEqual(StringType(0), _parse_datatype_string("string"))
-self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE 'UCS_BASIC'"))
-self.assertEqual(StringType(0), _parse_datatype_string("string   COLLATE 'UCS_BASIC'"))
-self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE'UCS_BASIC'"))
-self.assertEqual(StringType(1), _parse_datatype_string("string COLLATE 'UCS_BASIC_LCASE'"))
-self.assertEqual(StringType(1), _parse_datatype_string("string COLLATE 'UCS_BASIC_LCASE'"))
-self.assertEqual(StringType(2), _parse_datatype_string("string COLLATE 'UNICODE'"))
-self.assertEqual(StringType(3), _parse_datatype_string("string COLLATE 'UNICODE_CI'"))
+self.assertEqual(StringType(0), _parse_datatype_string("string COLLATE UCS_BASIC"))

Review Comment:
   Silly question: isn't this a good time to switch from UCS_BASIC to 
UTF8_BINARY?






Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]

2024-03-06 Thread via GitHub


cloud-fan commented on code in PR #45405:
URL: https://github.com/apache/spark/pull/45405#discussion_r1514563440


##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala:
##
@@ -218,6 +218,6 @@ class DataTypeAstBuilder extends SqlBaseParserBaseVisitor[AnyRef] {
* Returns a collation name.
*/
   override def visitCollateClause(ctx: CollateClauseContext): String = withOrigin(ctx) {
-string(visitStringLit(ctx.stringLit))
+ctx.multipartIdentifier().getText

Review Comment:
   This is a bit confusing. How do we turn a multipart identifier into a 
single string? By joining the parts with dots? Where is the implementation?
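
To make the question concrete: as far as I understand ANTLR, `getText` on a rule context concatenates the text of the tokens under that node, so whitespace is dropped while dots and backquotes are kept verbatim. The sketch below contrasts that with an explicit "unquote each part, then join with dots" step. It is a toy tokenizer written for illustration, not Spark's lexer; the regular expression and all function names are hypothetical.

```python
import re

# Hypothetical token pattern: whitespace, backquoted names (with `` as an
# escaped backquote), bare names, and dots. Spark's real lexer differs.
_TOKEN = re.compile(r"\s+|`(?:[^`]|``)*`|[A-Za-z_][A-Za-z0-9_]*|\.")


def identifier_tokens(text):
    """Split text into tokens, dropping whitespace (like hidden-channel tokens)."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {text!r}")
        if not match.group().isspace():
            tokens.append(match.group())
        pos = match.end()
    return tokens


def raw_text(text):
    """getText-style result: token text concatenated as written."""
    return "".join(identifier_tokens(text))


def identifier_parts(text):
    """Name parts with backquotes removed and `` unescaped; dots are skipped.

    This toy version does not validate the shape of the identifier,
    so input such as "a..b" is not rejected.
    """
    parts = []
    for token in identifier_tokens(text):
        if token == ".":
            continue
        if token.startswith("`"):
            token = token[1:-1].replace("``", "`")
        parts.append(token)
    return parts


def dotted_name(text):
    """Explicit alternative: join the unquoted parts with dots."""
    return ".".join(identifier_parts(text))
```

For a single bare name such as `UCS_BASIC` the two results coincide; they diverge once a part is backquoted, which is where the choice of rule starts to matter.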


