[jira] [Commented] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836100#comment-17836100 ]

Uroš Bojanić commented on SPARK-47412:
--------------------------------------

[~gpgp] Yup, you got it! That's the expected behaviour, very similar to substring/left/right.

> StringLPad, StringRPad (all collations)
> ---------------------------------------
>
>                 Key: SPARK-47412
>                 URL: https://issues.apache.org/jira/browse/SPARK-47412
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Uroš Bojanić
>            Priority: Major
>
> Enable collation support for the *StringLPad* & *StringRPad* built-in string
> functions in Spark. First confirm what the expected behaviour is for these
> functions when given collated strings, then move on to the implementation
> that would enable handling strings of all collation types. Implement the
> corresponding unit tests (CollationStringExpressionsSuite) and E2E tests
> (CollationSuite) to reflect how these functions should be used with collation
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment
> with the existing functions to learn more about how they work. In addition,
> look into the possible use-cases and implementation of similar functions
> within other open-source DBMS, such as
> [PostgreSQL|https://www.postgresql.org/docs/].
>
> The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad*
> functions so that they support all collation types currently supported in
> Spark. To understand what changes were introduced in order to enable full
> collation support for other existing functions in Spark, take a look at the
> Spark PRs and Jira tickets for completed tasks in this parent (for example:
> Contains, StartsWith, EndsWith).
>
> Read more about ICU [Collation Concepts|http://example.com/] and the
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical
> Standard for
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836009#comment-17836009 ]

Gideon P commented on SPARK-47412:
----------------------------------

Confirming expected behavior: According to the "Collation Support in Spark" Word document, padding functions will be "pass through". That is to say, if the string parameters to LPAD/RPAD either share the same collation, or one has a collation that overrides the other (Explicit collation has precedence over Implicit collation, which has precedence over Default collation), the return type will have that collation. In terms of _value_, however, the result will be the same as it was prior to collations, similar to substring/left/right.

And if the parameters have different collations of the same precedence, the function would be expected to throw an exception, right?

[~uros-db] please confirm. Thanks!

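The "pass through" rule described in the comment above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not Spark's implementation: the names Strength, Collated, CollationMismatch, resolve, lpad and rpad are hypothetical, the collation names are just example labels, and the padding logic is a simplified character-based version written for this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum


class Strength(IntEnum):
    """Precedence levels named in the comment; a higher value wins."""
    DEFAULT = 0
    IMPLICIT = 1
    EXPLICIT = 2


@dataclass(frozen=True)
class Collated:
    """Hypothetical stand-in for a string value tagged with a collation."""
    value: str
    collation: str
    strength: Strength


class CollationMismatch(Exception):
    """Different collations with the same precedence cannot be reconciled."""


def resolve(a: Collated, b: Collated) -> str:
    """Pick the collation of the result, following the rule in the comment."""
    if a.collation == b.collation:
        return a.collation
    if a.strength == b.strength:
        raise CollationMismatch(f"{a.collation} vs {b.collation}")
    return a.collation if a.strength > b.strength else b.collation


def _pad(text: str, length: int, pad: str, left: bool) -> str:
    """Collation-agnostic padding (assumed, simplified semantics)."""
    if length <= len(text) or not pad:
        # Assumption: truncate to the target length, never extend without a pad.
        return text[:max(length, 0)]
    fill = length - len(text)
    filler = (pad * fill)[:fill]
    return filler + text if left else text + filler


def lpad(s: Collated, length: int, pad: Collated) -> tuple[str, str]:
    """Return (value, collation); the value does not depend on the collation."""
    return _pad(s.value, length, pad.value, left=True), resolve(s, pad)


def rpad(s: Collated, length: int, pad: Collated) -> tuple[str, str]:
    """Return (value, collation); the value does not depend on the collation."""
    return _pad(s.value, length, pad.value, left=False), resolve(s, pad)
```

In this model the collation only affects the second element of the returned pair: the padded value is computed from the raw strings alone, which is what "pass through" is taken to mean here. Two inputs with different collations of equal precedence raise CollationMismatch, mirroring the exception the comment expects.
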
[jira] [Commented] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835182#comment-17835182 ]

Gideon P commented on SPARK-47412:
----------------------------------

[~uros-db] I agree -- this one can be expected to be simple and familiar. I will move on to this one (while still shepherding https://github.com/apache/spark/pull/45738/ to the finish line). Thanks! I will let you know if I have questions.

[jira] [Commented] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834803#comment-17834803 ]

Uroš Bojanić commented on SPARK-47412:
--------------------------------------

[~gpgp] Thank you for your hard work on [SPARK-47413|https://issues.apache.org/jira/browse/SPARK-47413]! We'll put your [PR|https://github.com/apache/spark/pull/45738/] under final review, so feel free to move on to this ticket. This one should be relatively simple as well, and you've also got some experience under your belt already. Nevertheless, feel free to let me know if you have any questions!