Aleksandar Tomic created SPARK-46830:
----------------------------------------

             Summary: Introducing collation concept into Spark
                 Key: SPARK-46830
                 URL: https://issues.apache.org/jira/browse/SPARK-46830
             Project: Spark
          Issue Type: Epic
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Aleksandar Tomic


This feature will introduce collation support to the Spark engine. This means 
that:

 
 # Every StringType will have an associated collation. The default remains UTF8 
Binary, which follows the same rules as the current UTF8 String comparison.
 # Collation will be respected in all collation-sensitive operations: 
comparisons, hashing, and string operations (contains, startsWith, endsWith, etc.)
 # Collation can be set in the following ways:
 ## With a COLLATE expression, e.g. strExpr COLLATE collation_name
 ## In a CREATE TABLE column definition
 ## By setting the session collation
 # All Spark operators need to respect collation settings (filters, joins, 
shuffles, aggregations, etc.)
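
To illustrate why comparisons, hashing, string operations and grouping all have 
to agree on the collation, here is a minimal Python sketch. It is not Spark code 
and does not reflect the actual design: the collation names (UTF8_BINARY, 
LOWERCASE) and all helper functions are hypothetical, and mapping a string to a 
lowercased byte key is only a stand-in for a real collation.

```python
from collections import defaultdict

# Hypothetical collation names, used only for this illustration.
UTF8_BINARY = "UTF8_BINARY"
LOWERCASE = "LOWERCASE"


def collation_key(s: str, collation: str) -> bytes:
    """Map a string to a key; two strings are equal under the
    collation exactly when their keys are byte-equal."""
    if collation == UTF8_BINARY:
        return s.encode("utf-8")
    if collation == LOWERCASE:
        return s.lower().encode("utf-8")
    raise ValueError(f"unknown collation: {collation}")


def equals(a: str, b: str, collation: str) -> bool:
    # Comparison goes through the key, not the raw bytes.
    return collation_key(a, collation) == collation_key(b, collation)


def collated_hash(s: str, collation: str) -> int:
    # Hashing the key guarantees that equal strings hash equally,
    # which hash joins, shuffles and hash aggregations rely on.
    return hash(collation_key(s, collation))


def starts_with(s: str, prefix: str, collation: str) -> bool:
    # Simplification: prefix matching on keys works for this toy
    # collation but is not valid for collations in general.
    return collation_key(s, collation).startswith(
        collation_key(prefix, collation)
    )


def count_by(values: list[str], collation: str) -> dict[bytes, int]:
    # A toy aggregation: group rows by collation key and count them.
    counts: dict[bytes, int] = defaultdict(int)
    for v in values:
        counts[collation_key(v, collation)] += 1
    return dict(counts)
```

Under the hypothetical LOWERCASE collation, "Spark" and "SPARK" compare equal, 
hash to the same value and land in the same group, while under UTF8_BINARY they 
stay distinct. If only the comparison respected the collation and the hash did 
not, equal strings could be sent to different partitions by a shuffle.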

 

This is a high-level description of the feature. The detailed design is 
available at 
[this|https://docs.google.com/document/d/1G3Xap-0Aj-QC6qoWZDDqO84IulHnogjD1REE3yh1_jk/edit?usp=sharing]
 link.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
