[ https://issues.apache.org/jira/browse/SPARK-47415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844937#comment-17844937 ]
Uroš Bojanić commented on SPARK-47415:
--------------------------------------

Update: [~nikolamand-db] has implemented this function in [https://github.com/apache/spark/pull/45963], so most of the implementation logic and tests should be there. However, we have recently done some refactoring as part of https://issues.apache.org/jira/browse/SPARK-47410. Now we need to refactor Nikola's changes by following the guidelines outlined in that Jira ticket. Nikola suggested this could be a good onboarding task for Nebojsa, so he can get familiar with this part of the codebase.

> Levenshtein (all collations)
> ----------------------------
>
>                 Key: SPARK-47415
>                 URL: https://issues.apache.org/jira/browse/SPARK-47415
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Uroš Bojanić
>            Priority: Major
>              Labels: pull-request-available
>
> Enable collation support for the *Levenshtein* built-in string function in
> Spark. First confirm what the expected behaviour is for this function when
> given collated strings, and then move on to implementation and testing.
> Implement the corresponding unit tests and E2E SQL tests to reflect how this
> function should be used with collation in Spark SQL, and feel free to use your
> chosen Spark SQL Editor to experiment with the existing functions to learn
> more about how they work. In addition, look into the possible use-cases and
> implementation of similar functions within other open-source DBMS, such
> as [PostgreSQL|https://www.postgresql.org/docs/].
>
> The goal for this Jira ticket is to implement the *Levenshtein* function so
> it supports all collation types currently supported in Spark. To understand
> what changes were introduced in order to enable full collation support for
> other existing functions in Spark, take a look at the Spark PRs and Jira
> tickets for completed tasks in this parent (for example: Contains,
> StartsWith, EndsWith).
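The ticket leaves the expected collation behaviour open. As a purely illustrative sketch (not the logic from PR 45963 and not Spark's actual implementation), the Java code below computes the classic dynamic-programming Levenshtein distance over Unicode code points, plus a hypothetical case-insensitive variant that simply lowercases both inputs with Locale.ROOT before comparing. The class and method names are invented for this example, and lowercasing is only a rough stand-in for a case-insensitive collation; ICU-based collations compare collation elements rather than lowercased code points.

```java
import java.util.Locale;

public final class CollatedLevenshteinSketch {

    // Classic dynamic-programming Levenshtein distance over Unicode code points,
    // keeping only two rows of the DP table.
    static int levenshtein(String left, String right) {
        int[] a = left.codePoints().toArray();
        int[] b = right.codePoints().toArray();
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j; // distance from the empty prefix of `left`
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i; // distance to the empty prefix of `right`
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                // Minimum of insertion, deletion and substitution.
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows; `prev` now holds row i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    // HYPOTHETICAL case-insensitive variant: lowercases both inputs before
    // computing the distance. An approximation only, not Spark's collation logic.
    static int levenshteinLowercase(String left, String right) {
        return levenshtein(left.toLowerCase(Locale.ROOT), right.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));       // 3
        System.out.println(levenshtein("Spark", "spark"));          // 1
        System.out.println(levenshteinLowercase("Spark", "spark")); // 0
    }
}
```

Keeping only two rows of the DP table keeps memory proportional to the length of the second argument rather than to the product of both lengths.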
>
> Read more about ICU [Collation Concepts|http://example.com/] and the
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical
> Standard for string
> [searching|https://www.unicode.org/reports/tr10/#Searching] and
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)