nikolamand-db commented on code in PR #46180:
URL: https://github.com/apache/spark/pull/46180#discussion_r1601334716


##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -245,29 +599,26 @@ public static StringSearch getStringSearch(
    * Returns the collation id for the given collation name.
    */
   public static int collationNameToId(String collationName) throws SparkException {
-    String normalizedName = collationName.toUpperCase();
-    if (collationNameToIdMap.containsKey(normalizedName)) {
-      return collationNameToIdMap.get(normalizedName);
-    } else {
-      Collation suggestion = Collections.min(List.of(collationTable), Comparator.comparingInt(
-        c -> UTF8String.fromString(c.collationName).levenshteinDistance(
-          UTF8String.fromString(normalizedName))));
-
-      Map<String, String> params = new HashMap<>();
-      params.put("collationName", collationName);
-      params.put("proposal", suggestion.collationName);
-
-      throw new SparkException(
-        "COLLATION_INVALID_NAME", SparkException.constructMessageParams(params), null);
-    }
+    return Collation.CollationSpec.collationNameToId(collationName);
+  }
+
+  public static Collation fetchCollationUnsafe(int collationId) throws SparkException {
+    return Collation.CollationSpec.fetchCollation(collationId);
   }
 
   public static Collation fetchCollation(int collationId) {
-    return collationTable[collationId];
+    try {
+      return fetchCollationUnsafe(collationId);
+    } catch (SparkException e) {
+      return Collation.CollationSpecUTF8Binary.UTF8_BINARY_COLLATION;
+    }

Review Comment:
   The idea is that this function never throws: we assume internal code always calls it with a valid collation id, obtained earlier by parsing a collation name string. Users cannot pass a collation id to `StringType` explicitly, because we mark that constructor as private.
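   To illustrate why every id reaching `fetchCollation` should be valid, here is a minimal sketch of the "private constructor plus name-based factory" idea. It is not Spark's `StringType`; the class `StringTypeSketch`, its `of` factory, and the name-to-id table are all hypothetical, assumed only for this example.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Spark's StringType: the collation id can only
// come from a parsed collation name, never from the caller directly.
final class StringTypeSketch {
  // Hypothetical name-to-id table; the real registry lives in CollationFactory.
  private static final Map<String, Integer> NAME_TO_ID =
      Map.of("UTF8_BINARY", 0, "UNICODE", 1);

  final int collationId;

  // Private: callers cannot construct an instance with an arbitrary id.
  private StringTypeSketch(int collationId) {
    this.collationId = collationId;
  }

  // The only public entry point resolves a name first, so every instance
  // holds an id that exists in the table.
  static StringTypeSketch of(String collationName) {
    Integer id = NAME_TO_ID.get(collationName.toUpperCase(Locale.ROOT));
    if (id == null) {
      throw new IllegalArgumentException("Invalid collation name: " + collationName);
    }
    return new StringTypeSketch(id);
  }
}
```

With this shape, invalid input is rejected once, at name-parsing time, and later id-based lookups can assume a valid id.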
   
   However, the internal fetch by collation id (`fetchCollationUnsafe`) can still throw. If it does, that indicates a bug in our own code logic (an internal error). By returning `UTF8_BINARY` in that case, we avoid changing this function's signature to throw `SparkException` and propagating that change to the numerous call sites (mainly in `CollationSupport`).
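   The fallback pattern described above can be sketched in isolation as follows. This is a hypothetical example, not the code in this PR: `CollationLookupSketch`, `LookupException` (standing in for `SparkException`), and the two-entry table are assumptions made for illustration.

```java
// Hypothetical sketch of a throwing lookup wrapped by a non-throwing one.
final class CollationLookupSketch {
  // Hypothetical checked exception standing in for SparkException.
  static final class LookupException extends Exception {
    LookupException(String message) {
      super(message);
    }
  }

  // Hypothetical collation table; index 0 plays the role of UTF8_BINARY.
  private static final String[] TABLE = {"UTF8_BINARY", "UNICODE"};
  static final String DEFAULT = TABLE[0];

  // Throwing variant: an out-of-range id is reported as a checked exception.
  static String fetchUnsafe(int id) throws LookupException {
    if (id < 0 || id >= TABLE.length) {
      throw new LookupException("Invalid collation id: " + id);
    }
    return TABLE[id];
  }

  // Non-throwing variant: a bad id can only come from an internal bug, so
  // fall back to the default instead of making every caller handle the
  // checked exception.
  static String fetch(int id) {
    try {
      return fetchUnsafe(id);
    } catch (LookupException e) {
      return DEFAULT;
    }
  }
}
```

One alternative with the same effect on signatures would be to rethrow an unchecked exception such as `IllegalStateException`, which surfaces the internal error instead of masking it behind the default.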



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
