[jira] [Commented] (SPARK-46830) Introducing collation concept into Spark

2024-04-25 Thread Gideon P (Jira)


[ https://issues.apache.org/jira/browse/SPARK-46830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840792#comment-17840792 ]

Gideon P commented on SPARK-46830:
--

[~uros-db] what should I work on next?

> Introducing collation concept into Spark
> 
>
> Key: SPARK-46830
> URL: https://issues.apache.org/jira/browse/SPARK-46830
> Project: Spark
>  Issue Type: Epic
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Priority: Major
> Attachments: Collation Support in Spark.docx
>
>
> This feature will introduce collation support to the Spark engine. This means 
> that:
>  
>  # Every StringType will have an associated collation. The default remains UTF8 
> Binary, which will behave under the same rules as the current UTF8 String 
> comparison.
>  # Collation will be respected in all collation-sensitive operations: 
> comparisons, hashing, and string operations (contains, startsWith, endsWith, etc.).
>  # Collation can be set in the following ways:
>  ## With a COLLATE expression, e.g. strExpr COLLATE collation_name
>  ## In a CREATE TABLE column definition
>  ## By setting the session collation.
>  # All Spark operators need to respect collation settings (filters, 
> joins, shuffles, aggregations, etc.).
>  
> This is a high-level description of the feature. The detailed design is available 
> at 
> [this|https://docs.google.com/document/d/1A9RQiwq-n3R3vuh571yjOLaaIuIYRTyCx7UFr0Qg-eY/edit?usp=sharing]
>  link (the doc is also attached).
>  
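The collation-sensitive behaviour listed in items 2 and 4 can be illustrated with a small, self-contained Python sketch. This is not Spark's implementation or API: the `Collation` class, `group_count`, and the labels `UTF8_BINARY` and `UTF8_LCASE` below are hypothetical, and case-insensitive matching is approximated with `str.lower`. What it shows is that equality, ordering, hashing, and string predicates should all derive from one collation key. Otherwise a hash-based join, shuffle, or aggregation could send strings that compare equal to different groups.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Collation:
    """Hypothetical model of a collation: a name plus a sort/match key.

    Every collation-sensitive operation below is derived from the same key,
    so equality, ordering, hashing and substring checks all agree.
    """
    name: str
    key: Callable[[str], str]

    def equals(self, a: str, b: str) -> bool:
        return self.key(a) == self.key(b)

    def compare(self, a: str, b: str) -> int:
        # Returns -1, 0 or 1. For the identity key this is code-point order,
        # which matches byte-wise comparison of the UTF-8 encodings.
        ka, kb = self.key(a), self.key(b)
        return (ka > kb) - (ka < kb)

    def hash_value(self, s: str) -> int:
        # Hash the key, not the raw string, so strings that are equal under
        # this collation always hash (and therefore partition) alike.
        return hash(self.key(s))

    def contains(self, s: str, sub: str) -> bool:
        return self.key(sub) in self.key(s)

    def starts_with(self, s: str, prefix: str) -> bool:
        return self.key(s).startswith(self.key(prefix))

    def ends_with(self, s: str, suffix: str) -> bool:
        return self.key(s).endswith(self.key(suffix))


# Illustrative collations: binary (the default) and a lowercase-based one.
UTF8_BINARY = Collation("UTF8_BINARY", lambda s: s)
UTF8_LCASE = Collation("UTF8_LCASE", str.lower)


def group_count(values: Iterable[str], collation: Collation) -> Dict[str, int]:
    """Count values the way a hash aggregation would, grouping by collation key.

    Each group is reported under the first value seen for it.
    """
    groups: Dict[str, tuple] = {}
    for v in values:
        k = collation.key(v)
        first, n = groups.get(k, (v, 0))
        groups[k] = (first, n + 1)
    return {first: n for first, n in groups.values()}


if __name__ == "__main__":
    words = ["Spark", "SPARK", "spark", "Flink"]
    print(group_count(words, UTF8_BINARY))  # {'Spark': 1, 'SPARK': 1, 'spark': 1, 'Flink': 1}
    print(group_count(words, UTF8_LCASE))   # {'Spark': 3, 'Flink': 1}
    print(UTF8_BINARY.compare("B", "a"), UTF8_LCASE.compare("B", "a"))  # -1 1
```

Lowercasing is only a stand-in. Locale-aware collations, such as those built on ICU, compare strings using collation keys with several strength levels, not a simple per-character case mapping.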



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46830) Introducing collation concept into Spark

2024-01-25 Thread Aleksandar Tomic (Jira)


[ https://issues.apache.org/jira/browse/SPARK-46830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810853#comment-17810853 ]

Aleksandar Tomic commented on SPARK-46830:
--

[~kabhwan] Please take a look now.

> Introducing collation concept into Spark
> 
>
> Key: SPARK-46830
> URL: https://issues.apache.org/jira/browse/SPARK-46830
> Project: Spark
>  Issue Type: Epic
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Priority: Major
>
> This feature will introduce collation support to the Spark engine. This means 
> that:
>  
>  # Every StringType will have an associated collation. The default remains UTF8 
> Binary, which will behave under the same rules as the current UTF8 String 
> comparison.
>  # Collation will be respected in all collation-sensitive operations: 
> comparisons, hashing, and string operations (contains, startsWith, endsWith, etc.).
>  # Collation can be set in the following ways:
>  ## With a COLLATE expression, e.g. strExpr COLLATE collation_name
>  ## In a CREATE TABLE column definition
>  ## By setting the session collation.
>  # All Spark operators need to respect collation settings (filters, 
> joins, shuffles, aggregations, etc.).
>  
> This is a high-level description of the feature. The detailed design is available 
> at 
> [this|https://docs.google.com/document/d/1G3Xap-0Aj-QC6qoWZDDqO84IulHnogjD1REE3yh1_jk/edit?usp=sharing]
>  link.
>  






[jira] [Commented] (SPARK-46830) Introducing collation concept into Spark

2024-01-24 Thread Jungtaek Lim (Jira)


[ https://issues.apache.org/jira/browse/SPARK-46830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810713#comment-17810713 ]

Jungtaek Lim commented on SPARK-46830:
--

Drive-by comment: you may want to check the ACL of the design doc you linked. 
At least, it doesn't grant me access.



