[ https://issues.apache.org/jira/browse/SPARK-46830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810853#comment-17810853 ]
Aleksandar Tomic commented on SPARK-46830:
------------------------------------------

[~kabhwan] Please take a look now.

> Introducing collation concept into Spark
> ----------------------------------------
>
>                 Key: SPARK-46830
>                 URL: https://issues.apache.org/jira/browse/SPARK-46830
>             Project: Spark
>          Issue Type: Epic
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Aleksandar Tomic
>            Priority: Major
>
> This feature will introduce collation support to the Spark engine. This means
> that:
>
> # Every StringType will have an associated collation. The default remains UTF8
> Binary, which follows the same rules as the current UTF8 String comparison.
> # Collation will be respected in all collation-sensitive operations:
> comparisons, hashing, and string operations (contains, startsWith, endsWith, etc.).
> # Collation can be set in the following ways:
> ## With a COLLATE expression, e.g. strExpr COLLATE collation_name
> ## In a CREATE TABLE column definition
> ## By setting the session collation
> # All Spark operators need to respect collation settings (filters,
> joins, shuffles, aggs, etc.).
>
> This is a high-level description of the feature. You can find the detailed design
> under
> [this|https://docs.google.com/document/d/1G3Xap-0Aj-QC6qoWZDDqO84IulHnogjD1REE3yh1_jk/edit?usp=sharing]
> link.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
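
The requirement in the quoted description that collation be respected consistently across comparisons, hashing, and string operations can be illustrated outside Spark with a minimal Python sketch. Everything here is an assumption made for illustration: the registry, the names `UTF8_BINARY` and `CASE_INSENSITIVE`, and the helper functions are hypothetical and are not Spark's API or the design from the linked document. The idea shown is only that each collation maps a string to a key, and that equality, hashing, and prefix checks all go through that same key so they cannot disagree.

```python
from typing import Callable, Dict, Hashable, List

# Hypothetical collation registry (illustrative names, not Spark identifiers).
# Each collation turns a string into a key that every operation then uses.
COLLATIONS: Dict[str, Callable[[str], Hashable]] = {
    # Binary collation: compare the raw UTF-8 bytes.
    "UTF8_BINARY": lambda s: s.encode("utf-8"),
    # Stand-in for a case-insensitive collation, using Unicode case folding.
    "CASE_INSENSITIVE": lambda s: s.casefold(),
}


def collation_key(s: str, collation: str = "UTF8_BINARY") -> Hashable:
    """Return the key that represents `s` under the given collation."""
    return COLLATIONS[collation](s)


def collated_equals(a: str, b: str, collation: str = "UTF8_BINARY") -> bool:
    """Equality is defined on collation keys, not on the raw strings."""
    return collation_key(a, collation) == collation_key(b, collation)


def collated_hash(s: str, collation: str = "UTF8_BINARY") -> int:
    """Hash the collation key so that equal strings always hash equally."""
    return hash(collation_key(s, collation))


def collated_starts_with(s: str, prefix: str, collation: str = "UTF8_BINARY") -> bool:
    """Prefix test on keys; a simplification that holds for these two collations."""
    return collation_key(s, collation).startswith(collation_key(prefix, collation))


def group_by_collation(values: List[str], collation: str) -> Dict[Hashable, List[str]]:
    """Group strings the way a collation-aware aggregation would bucket them."""
    groups: Dict[Hashable, List[str]] = {}
    for value in values:
        groups.setdefault(collation_key(value, collation), []).append(value)
    return groups


if __name__ == "__main__":
    words = ["Spark", "SPARK", "spark", "Flink"]
    print(len(group_by_collation(words, "UTF8_BINARY")))
    print(len(group_by_collation(words, "CASE_INSENSITIVE")))
```

Routing every operation through one key function is the design choice worth noting: if hashing used raw bytes while equality used the folded form, a hash join or shuffle could place equal strings in different partitions. Real collations are more involved than a key-based prefix test suggests, so this should be read as a conceptual sketch only.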