[ 
https://issues.apache.org/jira/browse/SPARK-47415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47415:
---------------------------------
    Description: 
Enable collation support for the *Levenshtein* built-in string function in 
Spark. First confirm the expected behaviour of this function when given 
collated strings, and then move on to implementation and testing. Implement 
the corresponding unit tests and E2E SQL tests to reflect how this function 
should be used with collation in Spark SQL, and feel free to use your chosen 
Spark SQL editor to experiment with the existing functions to learn more about 
how they work. In addition, look into the possible use cases and 
implementation of similar functions in other open-source DBMSs, such as 
[PostgreSQL|https://www.postgresql.org/docs/].
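
Since the expected behaviour is still to be confirmed, the following is only a 
rough Python sketch of one possible interpretation, not Spark's implementation: 
characters are mapped to a comparison key before the usual dynamic-programming 
edit distance is computed. The `key` parameter is a hypothetical stand-in for 
whatever a collation would treat as equal; `str.lower` is used here purely as an 
illustration of a case-insensitive comparison.

```python
from typing import Callable


def levenshtein(left: str, right: str,
                key: Callable[[str], str] = lambda ch: ch) -> int:
    """Edit distance where two characters match when their keys are equal.

    `key` is a hypothetical hook standing in for collation-aware equality;
    the default compares characters exactly (binary comparison).
    """
    a = [key(ch) for ch in left]
    b = [key(ch) for ch in right]
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


binary_distance = levenshtein("Spark", "SPARK")
case_insensitive_distance = levenshtein("Spark", "SPARK", key=str.lower)
```

With exact comparison the two strings differ in four positions, while the 
case-insensitive key makes them identical. Whether a collated *Levenshtein* in 
Spark should behave this way is exactly what the first step of this ticket is 
meant to settle.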

 

The goal of this Jira ticket is to implement the *Levenshtein* function so 
that it supports all collation types currently supported in Spark. To 
understand what changes were introduced to enable full collation support for 
other existing functions in Spark, take a look at the Spark PRs and Jira 
tickets for completed sub-tasks under this parent ticket (for example: 
Contains, StartsWith, EndsWith).

 

Read more about ICU [Collation Concepts|http://example.com/] and the 
[Collator|http://example.com/] class. Also, refer to the Unicode Technical 
Standards for string [searching|https://www.unicode.org/reports/tr10/#Searching] 
and 
[collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].

> Levenshtein (all collations)
> ----------------------------
>
>                 Key: SPARK-47415
>                 URL: https://issues.apache.org/jira/browse/SPARK-47415
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Uroš Bojanić
>            Priority: Major
>              Labels: pull-request-available


