[
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uroš Bojanić updated SPARK-47412:
-
Description:
Enable collation support for the *StringLPad* & *StringRPad* built-in string
functions in Spark. First confirm the expected behaviour of these functions
when given collated strings, then move on to an implementation that handles
strings of all collation types. Implement the corresponding unit tests
(CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect how
these functions should be used with collation in SparkSQL, and feel free to use
your chosen Spark SQL Editor to experiment with the existing functions to learn
more about how they work. In addition, look into the possible use-cases and
implementation of similar functions within other open-source DBMSs, such as
[PostgreSQL|https://www.postgresql.org/docs/].
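As a starting point for confirming the expected behaviour, here is a minimal Python sketch of the padding semantics of Spark's lpad/rpad as commonly documented (pad to a target length in characters, truncate when the input is already longer). It is not Spark's implementation, and the function names are illustrative only. Since padding never compares strings, one plausible expectation, still to be confirmed by the task, is that the padded value is the same under every collation.

```python
def lpad(s: str, length: int, pad: str = " ") -> str:
    """Left-pad s with pad up to `length` characters; truncate if s is longer."""
    if length <= 0:
        return ""
    if len(s) >= length or not pad:
        # Nothing to add: keep at most `length` leading characters.
        return s[:length]
    missing = length - len(s)
    # Repeat the pad string, then cut it to exactly the missing character count.
    fill = (pad * (missing // len(pad) + 1))[:missing]
    return fill + s


def rpad(s: str, length: int, pad: str = " ") -> str:
    """Right-pad s with pad up to `length` characters; truncate if s is longer."""
    if length <= 0:
        return ""
    if len(s) >= length or not pad:
        return s[:length]
    missing = length - len(s)
    fill = (pad * (missing // len(pad) + 1))[:missing]
    return s + fill


if __name__ == "__main__":
    print(lpad("hi", 5, "ab"))
    print(rpad("hi", 5, "ab"))
    print(lpad("hello", 3, "x"))
```

The sketch only counts and copies characters. No equality or ordering check appears anywhere, which is why collation (case or accent sensitivity, for example) would not be expected to change the result.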
The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad*
functions so that they support all collation types currently supported in
Spark. To understand what changes were introduced in order to enable full
collation support for other existing functions in Spark, take a look at the
Spark PRs and Jira tickets for completed tasks in this parent (for example:
Contains, StartsWith, EndsWith).
Read more about ICU [Collation Concepts|http://example.com/] and the
[Collator|http://example.com/] class. Also, refer to the Unicode Technical
Standard for
[collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
was:
Enable collation support for the *Substring* built-in string function in Spark
(including *Right* and *Left* functions). First confirm what is the expected
behaviour for these functions when given collated strings, then move on to the
implementation that would enable handling strings of all collation types.
Implement the corresponding unit tests (CollationStringExpressionsSuite) and
E2E tests (CollationSuite) to reflect how this function should be used with
collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to
experiment with the existing functions to learn more about how they work. In
addition, look into the possible use-cases and implementation of similar
functions within other open-source DBMS, such as
[PostgreSQL|https://www.postgresql.org/docs/].
The goal for this Jira ticket is to implement the *Substring*, *Right*,
and *Left* functions so that they support all collation types currently
supported in Spark. To understand what changes were introduced in order to
enable full collation support for other existing functions in Spark, take a
look at the Spark PRs and Jira tickets for completed tasks in this parent (for
example: Contains, StartsWith, EndsWith).
Read more about ICU [Collation Concepts|http://example.com/] and the
[Collator|http://example.com/] class. Also, refer to the Unicode Technical
Standard for
[collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
> StringLPad, StringRPad (all collations)
> ---
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major