mihailom-db commented on code in PR #47364: URL: https://github.com/apache/spark/pull/47364#discussion_r1686116407
########## docs/sql-ref-ansi-compliance.md: ########## @@ -442,6 +442,7 @@ Below is a list of all the keywords in Spark SQL. |CODEGEN|non-reserved|non-reserved|non-reserved| |COLLATE|reserved|non-reserved|reserved| |COLLATION|reserved|non-reserved|reserved| +|COLLATIONS|reserved|non-reserved|reserved| Review Comment: When we look at MySQL, they use `SHOW COLLATION`, PostgreSQL uses `pg_collation` catalog, so I would argue use of `COLLATION` keyword is sufficient, so let's keep this keyword free. ########## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ########## @@ -1465,4 +1465,24 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper { } } } + + test("show collations") { + assert(sql("SHOW COLLATIONS").collect().length == 562) + + checkAnswer(sql("SHOW COLLATIONS LIKE '*UTF8_BINARY*'"), + Row("UTF8_BINARY", "spark", "1.0", true, true, false)) + checkAnswer(sql("SHOW COLLATIONS '*zh_Hant_HKG*'"), + Seq(Row("zh_Hant_HKG", "icu", "153.120.0.0", false, false, false), Review Comment: This version seems weird, I would say it should contain 75.1 as the version of ICU library. @nikolamand-db do you have any more info on is this expected? ########## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ########## @@ -91,7 +91,7 @@ public Optional<String> getVersion() { /** * Entry encapsulating all information about a collation. */ - public static class Collation { + public static class Collation implements Comparable<Collation> { Review Comment: Is this really needed? When do we want to order them? I would say list/table building should be deterministic and should always output collations in the same way. Also I would expect `UTF8_*` family collations to come in first as they represent OSS internal implementations. ########## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ########## @@ -918,4 +967,8 @@ public static String getClosestSuggestionsOnInvalidName( return String.join(", ", suggestions); } + + public static List<Collation> fetchAllCollations() { Review Comment: This operation seems a bit too expensive. We always build the whole table and then do a like on it. We need to do something similar to show tables where we have the pattern, as we do not want to ask for collation information if like operator is not concerned with it. Also it feels pretty weird to have `fetchAllCollations` in many places like this. This is not as expensive as building the whole table, but we are also allocating ArrayList multiple times -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org