Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-28 Thread via GitHub


cloud-fan closed pull request #46618: [SPARK-48159][SQL] Extending support for 
collated strings on datetime expressions
URL: https://github.com/apache/spark/pull/46618





Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-28 Thread via GitHub


cloud-fan commented on PR #46618:
URL: https://github.com/apache/spark/pull/46618#issuecomment-2135727861

   thanks, merging to master!





Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-22 Thread via GitHub


nebojsa-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1610009218


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1587,237 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query = "select current_timezone()"
+  // Data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  }
+})
+  }
+
+  test("DayName expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query = "select dayname(current_date())"
+  // Data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  }
+})
+  }
+
+  test("ToUnixTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_unix_timestamp(collate('2021-01-01 00:00:00', '${collationName}'),
+  |collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = LongType
+  val expectedResult = 1609488000L
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(expectedResult))
+})
+  }
+
+  test("FromUnixTime expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select from_unixtime(1609488000, collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+val expectedResult = "2021-01-01 00:00:00"
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+checkAnswer(testQuery, Row(expectedResult))
+  }
+})
+  }
+
+  test("NextDay expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select next_day('2015-01-14', collate('TU', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = DateType
+val expectedResult = "2015-01-20"
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+checkAnswer(testQuery, Row(Date.valueOf(expectedResult)))
+  }
+})
+  }
+
+  test("FromUTCTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select from_utc_timestamp(collate('2016-08-31', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 09:00:00.0"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(Timestamp.valueOf(expectedResult)))
+})
+  }
+
+  test("ToUTCTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_utc_timestamp(collate('2016-08-31 09:00:00', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 00:00:00.0"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(Timestamp.valueOf(expectedResult)))
+})
+  }
+
+  test("ParseToDate expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_date(collate('2016-12-31', '${collationName}'),
+  |collate('yyyy-MM-dd', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = DateType
+  val expectedResult = "2016-12-31"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+

Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-20 Thread via GitHub


uros-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1606726766


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1587,237 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query = "select current_timezone()"
+  // Data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  }
+})
+  }
+
+  test("DayName expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query = "select dayname(current_date())"
+  // Data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  }
+})
+  }
+
+  test("ToUnixTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_unix_timestamp(collate('2021-01-01 00:00:00', '${collationName}'),
+  |collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = LongType
+  val expectedResult = 1609488000L
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(expectedResult))
+})
+  }
+
+  test("FromUnixTime expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select from_unixtime(1609488000, collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+val expectedResult = "2021-01-01 00:00:00"
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+checkAnswer(testQuery, Row(expectedResult))
+  }
+})
+  }
+
+  test("NextDay expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select next_day('2015-01-14', collate('TU', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = DateType
+val expectedResult = "2015-01-20"
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+checkAnswer(testQuery, Row(Date.valueOf(expectedResult)))
+  }
+})
+  }
+
+  test("FromUTCTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select from_utc_timestamp(collate('2016-08-31', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 09:00:00.0"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(Timestamp.valueOf(expectedResult)))
+})
+  }
+
+  test("ToUTCTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_utc_timestamp(collate('2016-08-31 09:00:00', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 00:00:00.0"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(Timestamp.valueOf(expectedResult)))
+})
+  }
+
+  test("ParseToDate expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_date(collate('2016-12-31', '${collationName}'),
+  |collate('yyyy-MM-dd', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = DateType
+  val expectedResult = "2016-12-31"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+   

Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-20 Thread via GitHub


nebojsa-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1606703312


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1587,237 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query = "select current_timezone()"
+  // Data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  }
+})
+  }
+
+  test("DayName expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query = "select dayname(current_date())"
+  // Data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  }
+})
+  }
+
+  test("ToUnixTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_unix_timestamp(collate('2021-01-01 00:00:00', '${collationName}'),
+  |collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = LongType
+  val expectedResult = 1609488000L
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(expectedResult))
+})
+  }
+
+  test("FromUnixTime expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select from_unixtime(1609488000, collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+val expectedResult = "2021-01-01 00:00:00"
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+checkAnswer(testQuery, Row(expectedResult))
+  }
+})
+  }
+
+  test("NextDay expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select next_day('2015-01-14', collate('TU', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = DateType
+val expectedResult = "2015-01-20"
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+checkAnswer(testQuery, Row(Date.valueOf(expectedResult)))
+  }
+})
+  }
+
+  test("FromUTCTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select from_utc_timestamp(collate('2016-08-31', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 09:00:00.0"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(Timestamp.valueOf(expectedResult)))
+})
+  }
+
+  test("ToUTCTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_utc_timestamp(collate('2016-08-31 09:00:00', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 00:00:00.0"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(Timestamp.valueOf(expectedResult)))
+})
+  }
+
+  test("ParseToDate expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_date(collate('2016-12-31', '${collationName}'),
+  |collate('yyyy-MM-dd', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = DateType
+  val expectedResult = "2016-12-31"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+

Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-20 Thread via GitHub


uros-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1606554130


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1587,237 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query = "select current_timezone()"
+  // Data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  }
+})
+  }
+
+  test("DayName expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query = "select dayname(current_date())"
+  // Data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  }
+})
+  }
+
+  test("ToUnixTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_unix_timestamp(collate('2021-01-01 00:00:00', '${collationName}'),
+  |collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = LongType
+  val expectedResult = 1609488000L
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(expectedResult))
+})
+  }
+
+  test("FromUnixTime expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select from_unixtime(1609488000, collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+val expectedResult = "2021-01-01 00:00:00"
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+checkAnswer(testQuery, Row(expectedResult))
+  }
+})
+  }
+
+  test("NextDay expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select next_day('2015-01-14', collate('TU', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = DateType
+val expectedResult = "2015-01-20"
+assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+checkAnswer(testQuery, Row(Date.valueOf(expectedResult)))
+  }
+})
+  }
+
+  test("FromUTCTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select from_utc_timestamp(collate('2016-08-31', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 09:00:00.0"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(Timestamp.valueOf(expectedResult)))
+})
+  }
+
+  test("ToUTCTimestamp expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_utc_timestamp(collate('2016-08-31 09:00:00', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 00:00:00.0"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+  checkAnswer(testQuery, Row(Timestamp.valueOf(expectedResult)))
+})
+  }
+
+  test("ParseToDate expression with collation") {
+// Supported collations
+testSuppCollations.foreach(collationName => {
+  val query =
+s"""
+  |select to_date(collate('2016-12-31', '${collationName}'),
+  |collate('yyyy-MM-dd', '${collationName}'))
+  |""".stripMargin
+  // Result & data type check
+  val testQuery = sql(query)
+  val dataType = DateType
+  val expectedResult = "2016-12-31"
+  assert(testQuery.schema.fields.head.dataType.sameType(dataType))
+   

Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-17 Thread via GitHub


uros-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1604620097


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,234 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select current_timezone()"
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  }
+})
+  }
+
+  test("DayName expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select dayname(current_date())"
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  }
+})
+  }
+
+  test("ToUnixTimestamp expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select to_unix_timestamp(collate('2021-01-01 00:00:00', '${collationName}'),
+  |collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result
+  val testQuery = sql(query)
+  val dataType = LongType
+  val expectedResult = 1609488000L
+  assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  assertResult(expectedResult)(testQuery.collect().head.getLong(0))
+})
+  }
+
+  test("FromUnixTime expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select from_unixtime(1609488000, collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+val expectedResult = "2021-01-01 00:00:00"
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+assertResult(expectedResult)(testQuery.collect().head.getString(0))
+  }
+})
+  }
+
+  test("NextDay expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select next_day('2015-01-14', collate('TU', '${collationName}'))
+  |""".stripMargin
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = DateType
+val expectedResult = "2015-01-20"
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+assertResult(expectedResult)(testQuery.collect().head.getDate(0).toString)
+  }
+})
+  }
+
+  test("FromUTCTimestamp expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select from_utc_timestamp(collate('2016-08-31', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 09:00:00.0"
+  assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  assertResult(expectedResult)(testQuery.collect().head.getTimestamp(0).toString)
+})
+  }
+
+  test("ToUTCTimestamp expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select to_utc_timestamp(collate('2016-08-31 09:00:00', '${collationName}'),
+  |collate('Asia/Seoul', '${collationName}'))
+  |""".stripMargin
+  // Result
+  val testQuery = sql(query)
+  val dataType = TimestampType
+  val expectedResult = "2016-08-31 00:00:00.0"
+  assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  assertResult(expectedResult)(testQuery.collect().head.getTimestamp(0).toString)
+})
+  }
+
+  test("ParseToDate expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select to_d

Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-17 Thread via GitHub


uros-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1604614789


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,234 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {

Review Comment:
   since we're now using this `Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", 
"UNICODE", "UNICODE_CI")` a lot, let's separate it out and call it something 
like `testCollationsSeq`
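
   A minimal REPL-style sketch of the suggested extraction (the name `testCollationsSeq` is the reviewer's placeholder; later revisions quoted in this thread call it `testSuppCollations`):

   ```scala
   // Shared list of collations exercised by these tests, declared once instead of
   // repeating the literal Seq in every test.
   val testSuppCollations: Seq[String] =
     Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI")

   // Each "... expression with collation" test then iterates over the shared value.
   testSuppCollations.foreach { collationName =>
     println(s"run the per-collation checks for $collationName")
   }
   ```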






Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-17 Thread via GitHub


uros-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1604612518


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,234 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select current_timezone()"
+  // Result

Review Comment:
   (goes for other similar tests too)



##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,234 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select current_timezone()"
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  }
+})
+  }
+
+  test("DayName expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select dayname(current_date())"
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  }
+})
+  }
+
+  test("ToUnixTimestamp expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select to_unix_timestamp(collate('2021-01-01 00:00:00', '${collationName}'),
+  |collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result
+  val testQuery = sql(query)
+  val dataType = LongType
+  val expectedResult = 1609488000L
+  assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  assertResult(expectedResult)(testQuery.collect().head.getLong(0))

Review Comment:
   (goes for other similar tests too)






Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-17 Thread via GitHub


uros-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1604611590


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,234 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select current_timezone()"
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  }
+})
+  }
+
+  test("DayName expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select dayname(current_date())"
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  }
+})
+  }
+
+  test("ToUnixTimestamp expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select to_unix_timestamp(collate('2021-01-01 00:00:00', '${collationName}'),
+  |collate('yyyy-MM-dd HH:mm:ss', '${collationName}'))
+  |""".stripMargin
+  // Result
+  val testQuery = sql(query)
+  val dataType = LongType
+  val expectedResult = 1609488000L
+  assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  assertResult(expectedResult)(testQuery.collect().head.getLong(0))

Review Comment:
   use `checkAnswer` instead
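
   A sketch of the suggested change, using the ToUnixTimestamp test quoted above and assuming the suite's own helpers (`sql`, `checkAnswer`, `Row`) are in scope as in the hunk:

   ```scala
   // Before (as in the quoted hunk): compare one collected value directly.
   assertResult(1609488000L)(sql(query).collect().head.getLong(0))

   // After (reviewer's suggestion): let checkAnswer compare the full result set,
   // which also reports a readable diff on failure.
   checkAnswer(sql(query), Row(1609488000L))
   ```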






Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-17 Thread via GitHub


uros-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1604609258


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,234 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select current_timezone()"
+  // Result

Review Comment:
   ```suggestion
 // Data type
   ```






Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-17 Thread via GitHub


uros-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1604608546


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,234 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query = "select current_timezone()"
+  // Result

Review Comment:
   this is not a result check, but rather a Data type check
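
   In other words (same assumptions about the suite's scope as in the quoted hunk), the two kinds of check look like this:

   ```scala
   // Data type check: the output column carries the collated string type.
   assert(testQuery.schema.fields.head.dataType.sameType(StringType(collationName)))

   // Result check (a separate concern): the returned rows match the expected values.
   checkAnswer(testQuery, Row(expectedResult))
   ```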






Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-16 Thread via GitHub


nebojsa-db commented on PR #46618:
URL: https://github.com/apache/spark/pull/46618#issuecomment-2115535153

   @cloud-fan  Please review :) 





Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-16 Thread via GitHub


mihailom-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1603211716


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,240 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select current_timezone()
+  |""".stripMargin

Review Comment:
   practice is that if a string can go to one line we should do it, and this 
one seems small enough
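
   A minimal before/after sketch (names are illustrative):

   ```scala
   // Before: a short query written with a multi-line string interpolator.
   val verboseQuery =
     s"""
        |select current_timezone()
        |""".stripMargin

   // After: the same query as a plain one-line literal, as in the later revisions.
   val query = "select current_timezone()"
   ```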






Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-16 Thread via GitHub


mihailom-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1603210909


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala:
##
@@ -2938,20 +2940,20 @@ case class Extract(field: Expression, source: Expression, replacement: Expressio
 object Extract {
   def createExpr(funcName: String, field: Expression, source: Expression): 
Expression = {
 // both string and null literals are allowed.
-if ((field.dataType == StringType || field.dataType == NullType) && field.foldable) {
-  val fieldStr = field.eval().asInstanceOf[UTF8String]

Review Comment:
   You do not need to change this here to pattern match, you can use 
`field.dataType.isInstanceOf[StringType]`
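
   A sketch of the suggested condition, in the context of `Extract.createExpr` from the quoted hunk (surrounding code elided):

   ```scala
   // Accept any collated StringType (not only the default one), plus null literals,
   // without rewriting the whole check as a pattern match.
   if ((field.dataType.isInstanceOf[StringType] || field.dataType == NullType) && field.foldable) {
     val fieldStr = field.eval().asInstanceOf[UTF8String]
     // ... build the extract expression from fieldStr, as in the original method ...
   }
   ```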






Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-16 Thread via GitHub


mihailom-db commented on code in PR #46618:
URL: https://github.com/apache/spark/pull/46618#discussion_r1603212139


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -1584,6 +1584,240 @@ class CollationSQLExpressionsSuite
 })
   }
 
+  test("CurrentTimeZone expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select current_timezone()
+  |""".stripMargin
+  // Result
+  withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) {
+val testQuery = sql(query)
+val dataType = StringType(collationName)
+assertResult(dataType)(testQuery.schema.fields.head.dataType)
+  }
+})
+  }
+
+  test("DayName expression with collation") {
+// Supported collations
+Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => {
+  val query =
+s"""
+  |select dayname(current_date())

Review Comment:
   ditto






Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-16 Thread via GitHub


nebojsa-db commented on PR #46618:
URL: https://github.com/apache/spark/pull/46618#issuecomment-2115033658

   Please take a look @nikolamand-db @stefankandic @uros-db @mihailom-db 
@dbatomic 





[PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-16 Thread via GitHub


nebojsa-db opened a new pull request, #46618:
URL: https://github.com/apache/spark/pull/46618

   ### What changes were proposed in this pull request?
   This PR introduces changes that allow collated strings to be passed to various 
datetime expressions, and that allow those expressions to return collated strings.
   Impacted datetime expressions:
   
   - current_timezone
   - to_unix_timestamp
   - from_unixtime
   - next_day
   - from_utc_timestamp
   - to_utc_timestamp
   - to_date
   - to_timestamp
   - trunc
   - date_trunc
   - make_timestamp
   - date_part
   - convert_timezone
   
   
   ### Why are the changes needed?
   This PR is part of the ongoing effort to support collated strings in Spark SQL.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, users will be able to use collated strings for datetime expressions.
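
   For example (a sketch based on the tests added in this PR; `spark` is assumed to be an active SparkSession):

   ```scala
   // Pass explicitly collated string arguments to a datetime expression.
   spark.sql(
     "select to_date(collate('2016-12-31', 'UNICODE_CI'), collate('yyyy-MM-dd', 'UNICODE_CI'))"
   ).show()
   ```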
   
   
   ### How was this patch tested?
   Added corresponding tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   

