[jira] [Updated] (SPARK-48280) Improve collation testing surface area using expression walking
[ https://issues.apache.org/jira/browse/SPARK-48280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48280: -- Summary: Improve collation testing surface area using expression walking (was: Add Expression Walker for Testing) > Improve collation testing surface area using expression walking > --- > > Key: SPARK-48280 > URL: https://issues.apache.org/jira/browse/SPARK-48280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48574) Add support for StructTypes with collations
[ https://issues.apache.org/jira/browse/SPARK-48574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48574: -- Description: While adding the expression walker, it was noticed that StructType support is broken. One problem is that `CollationTypeCasts` applies a cast in all BinaryExpressions, which includes ExtractValue. Consequently, we are unable to extract the value once that cast is applied, because ExtractValue only supports non-null literals as extraction keys. > Add support for StructTypes with collations > --- > > Key: SPARK-48574 > URL: https://issues.apache.org/jira/browse/SPARK-48574 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > > While adding the expression walker, it was noticed that StructType support is broken. One problem is that `CollationTypeCasts` applies a cast in all BinaryExpressions, which includes ExtractValue. Consequently, we are unable to extract the value once that cast is applied, because ExtractValue only supports non-null literals as extraction keys.
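The failure described in SPARK-48574 can be modelled outside Spark. The sketch below is a toy in plain Python, not Spark code: `Literal`, `Cast`, `extract_value` and `cast_binary_children` are hypothetical stand-ins invented for illustration. It only shows why wrapping an extraction key in a cast defeats a check that accepts nothing but non-null literals.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a literal expression."""
    value: Any


@dataclass(frozen=True)
class Cast:
    """Hypothetical stand-in for a cast wrapped around a child expression."""
    child: Any
    collation: str


def extract_value(struct: dict, key: Any) -> Any:
    """Toy extraction: the key must be a non-null literal."""
    if not isinstance(key, Literal) or key.value is None:
        raise TypeError("extraction key must be a non-null literal")
    return struct[key.value]


def cast_binary_children(left: Any, right: Any, collation: str):
    """Toy blanket rule: wrap both children of a binary expression in a cast."""
    return Cast(left, collation), Cast(right, collation)


row = {"name": "Spark"}
key = Literal("name")

# The bare literal key is accepted.
print(extract_value(row, key))

# After the blanket rule runs, the key is a Cast and is rejected.
_, cast_key = cast_binary_children(row, key, "UNICODE")
try:
    extract_value(row, cast_key)
except TypeError as err:
    print("rejected:", err)
```

In this model the fix would be to leave the extraction key out of the blanket cast; whether the real rule should do the same is for the ticket to decide.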
[jira] [Created] (SPARK-48574) Add support for StructTypes with collations
Mihailo Milosevic created SPARK-48574: - Summary: Add support for StructTypes with collations Key: SPARK-48574 URL: https://issues.apache.org/jira/browse/SPARK-48574 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-48572) Fix DateSub, DateAdd, WindowTime, TimeWindow and SessionWindow expressions
[ https://issues.apache.org/jira/browse/SPARK-48572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48572: -- Description: While adding Expression Walker testing, these expressions were found to be faulty. They need to be fixed to work with collated strings. > Fix DateSub, DateAdd, WindowTime, TimeWindow and SessionWindow expressions > -- > > Key: SPARK-48572 > URL: https://issues.apache.org/jira/browse/SPARK-48572 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > > While adding Expression Walker testing, these expressions were found to be faulty. They need to be fixed to work with collated strings.
[jira] [Updated] (SPARK-48572) Fix DateSub, DateAdd, WindowTime, TimeWindow and SessionWindow expressions
[ https://issues.apache.org/jira/browse/SPARK-48572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48572: -- Summary: Fix DateSub, DateAdd, WindowTime, TimeWindow and SessionWindow expressions (was: Fix DateSub and DateAdd expression implicit casting) > Fix DateSub, DateAdd, WindowTime, TimeWindow and SessionWindow expressions > -- > > Key: SPARK-48572 > URL: https://issues.apache.org/jira/browse/SPARK-48572 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major
[jira] [Created] (SPARK-48572) Fix DateSub and DateAdd expression implicit casting
Mihailo Milosevic created SPARK-48572: - Summary: Fix DateSub and DateAdd expression implicit casting Key: SPARK-48572 URL: https://issues.apache.org/jira/browse/SPARK-48572 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-48472) Enable reflect expressions with collated strings
[ https://issues.apache.org/jira/browse/SPARK-48472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48472: -- Description: As part of the move to a collated world, we need to make sure all expressions support the appropriate collations. Using expression walker testing, `CallMethodViaReflection` was found to be erroneous. This expression is used as a replacement for all reflection methods and needs to be improved. This ticket needs to update the methods of this expression. Relevant code can be found in these files: `CallMethodViaReflection.scala`, `TypeCoercion.scala`, `AnsiTypeCoercion.scala`, `CollationTypeCasts.scala`, and for testing `Collation*Suite.scala` files. > Enable reflect expressions with collated strings > > > Key: SPARK-48472 > URL: https://issues.apache.org/jira/browse/SPARK-48472 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > > As part of the move to a collated world, we need to make sure all expressions support the appropriate collations. Using expression walker testing, `CallMethodViaReflection` was found to be erroneous. This expression is used as a replacement for all reflection methods and needs to be improved. This ticket needs to update the methods of this expression. Relevant code can be found in these files: `CallMethodViaReflection.scala`, `TypeCoercion.scala`, `AnsiTypeCoercion.scala`, `CollationTypeCasts.scala`, and for testing `Collation*Suite.scala` files.
[jira] [Updated] (SPARK-48472) Enable reflect expressions with collated strings
[ https://issues.apache.org/jira/browse/SPARK-48472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48472: -- Summary: Enable reflect expressions with collated strings (was: Expression Walker Test) > Enable reflect expressions with collated strings > > > Key: SPARK-48472 > URL: https://issues.apache.org/jira/browse/SPARK-48472 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major
[jira] [Updated] (SPARK-48472) Expression Walker Test
[ https://issues.apache.org/jira/browse/SPARK-48472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48472: -- Parent: SPARK-46837 Issue Type: Sub-task (was: Improvement) > Expression Walker Test > -- > > Key: SPARK-48472 > URL: https://issues.apache.org/jira/browse/SPARK-48472 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major
[jira] [Updated] (SPARK-48472) Expression Walker Test
[ https://issues.apache.org/jira/browse/SPARK-48472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48472: -- Epic Link: (was: SPARK-46830) > Expression Walker Test > -- > > Key: SPARK-48472 > URL: https://issues.apache.org/jira/browse/SPARK-48472 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major
[jira] [Created] (SPARK-48472) Expression Walker Test
Mihailo Milosevic created SPARK-48472: - Summary: Expression Walker Test Key: SPARK-48472 URL: https://issues.apache.org/jira/browse/SPARK-48472 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Created] (SPARK-48280) Add Expression Walker for Testing
Mihailo Milosevic created SPARK-48280: - Summary: Add Expression Walker for Testing Key: SPARK-48280 URL: https://issues.apache.org/jira/browse/SPARK-48280 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-48263) Collate function support for non UTF8_BINARY strings
[ https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48263: -- Summary: Collate function support for non UTF8_BINARY strings (was: Collate expression not working when default collation set) > Collate function support for non UTF8_BINARY strings > > > Key: SPARK-48263 > URL: https://issues.apache.org/jira/browse/SPARK-48263 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nebojsa Savic >Priority: Major > Labels: pull-request-available > > When the default collation level config is set to some collation other than UTF8_BINARY (e.g. UTF8_BINARY_LCASE) and we try to execute a COLLATE (or collation) expression, it fails because it only accepts StringType(0) as the argument for the collation name.
[jira] [Created] (SPARK-48262) Substitute BinaryExpression for explicit Expressions in CollationTypeCast
Mihailo Milosevic created SPARK-48262: - Summary: Substitute BinaryExpression for explicit Expressions in CollationTypeCast Key: SPARK-48262 URL: https://issues.apache.org/jira/browse/SPARK-48262 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-48172) Fix escaping issues in JDBCDialects
[ https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48172: -- Summary: Fix escaping issues in JDBCDialects (was: Fix escaping issue in JDBCDialects) > Fix escaping issues in JDBCDialects > --- > > Key: SPARK-48172 > URL: https://issues.apache.org/jira/browse/SPARK-48172 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-48172) Fix escaping issue in JDBCDialects
[ https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48172: -- Summary: Fix escaping issue in JDBCDialects (was: Fix escaping issue for mysql) > Fix escaping issue in JDBCDialects > -- > > Key: SPARK-48172 > URL: https://issues.apache.org/jira/browse/SPARK-48172 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-48172) Fix escaping issue for mysql
Mihailo Milosevic created SPARK-48172: - Summary: Fix escaping issue for mysql Key: SPARK-48172 URL: https://issues.apache.org/jira/browse/SPARK-48172 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-47408) Fix mathExpressions that use StringType
[ https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47408: -- Summary: Fix mathExpressions that use StringType (was: TBD) > Fix mathExpressions that use StringType > --- > > Key: SPARK-47408 > URL: https://issues.apache.org/jira/browse/SPARK-47408 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47972) Restrict CAST expression for collations
[ https://issues.apache.org/jira/browse/SPARK-47972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47972: -- Description: The current state of the code allows calls like CAST(1 AS STRING COLLATE UNICODE). We want to restrict the CAST expression so that it can only cast to the default-collation string type, and to allow only the COLLATE expression to produce explicitly collated strings. > Restrict CAST expression for collations > --- > > Key: SPARK-47972 > URL: https://issues.apache.org/jira/browse/SPARK-47972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > > The current state of the code allows calls like CAST(1 AS STRING COLLATE UNICODE). We want to restrict the CAST expression so that it can only cast to the default-collation string type, and to allow only the COLLATE expression to produce explicitly collated strings.
[jira] [Updated] (SPARK-47692) Fix default StringType meaning in implicit casting
[ https://issues.apache.org/jira/browse/SPARK-47692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47692: -- Summary: Fix default StringType meaning in implicit casting (was: Addition of priority flag to StringType) > Fix default StringType meaning in implicit casting > -- > > Key: SPARK-47692 > URL: https://issues.apache.org/jira/browse/SPARK-47692 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47356: -- Summary: Add support for ConcatWs & Elt (all collations) (was: ConcatWs & Elt (all collations)) > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Commented] (SPARK-47356) ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837145#comment-17837145 ] Mihailo Milosevic commented on SPARK-47356: --- Working on this. > ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47357) Add support for Upper, Lower, InitCap (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47357: -- Summary: Add support for Upper, Lower, InitCap (all collations) (was: Upper, Lower, InitCap (all collations)) > Add support for Upper, Lower, InitCap (all collations) > -- > > Key: SPARK-47357 > URL: https://issues.apache.org/jira/browse/SPARK-47357 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
[jira] [Updated] (SPARK-47736) Add support for AbstractArrayType
[ https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47736: -- Summary: Add support for AbstractArrayType (was: Add support for AbstractArrayType(StringTypeCollated)) > Add support for AbstractArrayType > - > > Key: SPARK-47736 > URL: https://issues.apache.org/jira/browse/SPARK-47736 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major
[jira] [Created] (SPARK-47765) Add SET COLLATION to parser rules
Mihailo Milosevic created SPARK-47765: - Summary: Add SET COLLATION to parser rules Key: SPARK-47765 URL: https://issues.apache.org/jira/browse/SPARK-47765 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-47736) Add support for AbstractArrayType(StringTypeCollated)
[ https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47736: -- Summary: Add support for AbstractArrayType(StringTypeCollated) (was: Add support for ArrayType(StringTypeAnyCollation)) > Add support for AbstractArrayType(StringTypeCollated) > - > > Key: SPARK-47736 > URL: https://issues.apache.org/jira/browse/SPARK-47736 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major
[jira] [Created] (SPARK-47736) Add support for ArrayType(StringTypeAnyCollation)
Mihailo Milosevic created SPARK-47736: - Summary: Add support for ArrayType(StringTypeAnyCollation) Key: SPARK-47736 URL: https://issues.apache.org/jira/browse/SPARK-47736 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-47692) Addition of priority flag to StringType
[ https://issues.apache.org/jira/browse/SPARK-47692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47692: -- Summary: Addition of priority flag to StringType (was: Addition of default priority flag) > Addition of priority flag to StringType > --- > > Key: SPARK-47692 > URL: https://issues.apache.org/jira/browse/SPARK-47692 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major
[jira] [Created] (SPARK-47692) Addition of default priority flag
Mihailo Milosevic created SPARK-47692: - Summary: Addition of default priority flag Key: SPARK-47692 URL: https://issues.apache.org/jira/browse/SPARK-47692 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-47626) Addition for Map Implicit Casting of Collated Strings
[ https://issues.apache.org/jira/browse/SPARK-47626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47626: -- Description: The initial ticket for collation implicit casting, SPARK-47210, introduced support for casting of arrays and normal string types. This ticket needs to dive into the problem of casting MapType. (was: Initial PR for addition of collation implicit casting [SPARK-47210] introduced support for casting of arrays and normal string types.) > Addition for Map Implicit Casting of Collated Strings > - > > Key: SPARK-47626 > URL: https://issues.apache.org/jira/browse/SPARK-47626 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > > The initial ticket for collation implicit casting, SPARK-47210, introduced support for casting of arrays and normal string types. This ticket needs to dive into the problem of casting MapType.
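To make the MapType problem concrete, here is a minimal sketch of what an implicit collation cast would have to reach inside nested types. It is plain Python rather than Spark code, and the tuple encoding of types (`("string", collation)`, `("array", element)`, `("map", key, value)`) and the helper `with_collation` are assumptions made up for illustration. The ticket does not say whether map keys, values, or both should be cast; this sketch rewrites both.

```python
def with_collation(dtype, collation):
    """Return a copy of a toy nested type with every string retyped to `collation`.

    Toy encoding (hypothetical, not Spark's type classes):
      ("string", collation_name), ("array", element_type),
      ("map", key_type, value_type); anything else is left untouched.
    """
    kind = dtype[0]
    if kind == "string":
        return ("string", collation)
    if kind == "array":
        return ("array", with_collation(dtype[1], collation))
    if kind == "map":
        return (
            "map",
            with_collation(dtype[1], collation),
            with_collation(dtype[2], collation),
        )
    return dtype


# A map from default-collation strings to arrays of default-collation strings.
nested = ("map", ("string", "UTF8_BINARY"), ("array", ("string", "UTF8_BINARY")))
print(with_collation(nested, "UNICODE"))
```

The recursion is the point: strings and arrays of strings only need one or two levels, while a map adds two independent positions (key and value) that each may contain further nesting.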
[jira] [Updated] (SPARK-47626) Addition for Map Implicit Casting of Collated Strings
[ https://issues.apache.org/jira/browse/SPARK-47626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47626: -- Description: Initial PR for addition of collation implicit casting [SPARK-47210] introduced support for casting of arrays and normal string types. > Addition for Map Implicit Casting of Collated Strings > - > > Key: SPARK-47626 > URL: https://issues.apache.org/jira/browse/SPARK-47626 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > > Initial PR for addition of collation implicit casting [SPARK-47210] > introduced support for casting of arrays and normal string types.
[jira] [Updated] (SPARK-47625) Addition of Indeterminate Collation Support
[ https://issues.apache.org/jira/browse/SPARK-47625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47625: -- Description: {{INDETERMINATE_COLLATION}} should only be thrown on comparison operations and memory storing of data, and we should be able to combine different implicit collations for certain operations like concat, and possibly others in the future. This is why we have to add another predefined collation id named {{INDETERMINATE_COLLATION_ID}}, which means that the result is a combination of conflicting non-default implicit collations. Right now it would have an id of -1, so it fails if it ever reaches the {{CollatorFactory}}. > Addition of Indeterminate Collation Support > --- > > Key: SPARK-47625 > URL: https://issues.apache.org/jira/browse/SPARK-47625 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > > {{INDETERMINATE_COLLATION}} should only be thrown on comparison operations and memory storing of data, and we should be able to combine different implicit collations for certain operations like concat, and possibly others in the future. This is why we have to add another predefined collation id named {{INDETERMINATE_COLLATION_ID}}, which means that the result is a combination of conflicting non-default implicit collations. Right now it would have an id of -1, so it fails if it ever reaches the {{CollatorFactory}}.
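The behaviour proposed in SPARK-47625 can be sketched as follows. This is a plain-Python model, not Spark code: only the name `INDETERMINATE_COLLATION_ID` and its value of -1 come from the ticket, while `DEFAULT_COLLATION_ID = 0`, `IndeterminateCollation`, `concat_collation` and `check_comparable` are assumptions introduced for illustration. Concat-like operations are allowed to produce the indeterminate id; only a comparison rejects it.

```python
INDETERMINATE_COLLATION_ID = -1  # value quoted in the ticket
DEFAULT_COLLATION_ID = 0         # assumption: id of the default collation


class IndeterminateCollation(Exception):
    """Stand-in for the INDETERMINATE_COLLATION error."""


def concat_collation(left_id, right_id):
    """Combine two implicit collation ids for a concat-like operation."""
    non_default = {i for i in (left_id, right_id) if i != DEFAULT_COLLATION_ID}
    if INDETERMINATE_COLLATION_ID in non_default or len(non_default) > 1:
        # Conflicting non-default implicit collations: defer the error.
        return INDETERMINATE_COLLATION_ID
    # Zero or one distinct non-default collation: use it, else the default.
    return next(iter(non_default), DEFAULT_COLLATION_ID)


def check_comparable(collation_id):
    """Comparisons are where an indeterminate collation must be rejected."""
    if collation_id == INDETERMINATE_COLLATION_ID:
        raise IndeterminateCollation("INDETERMINATE_COLLATION")
    return collation_id


mixed = concat_collation(1, 2)  # two different non-default implicit collations
print(mixed)                    # the concat itself succeeds
try:
    check_comparable(mixed)     # comparing the result does not
except IndeterminateCollation as err:
    print("comparison rejected:", err)
```

Deferring the error this way keeps operations that never compare strings usable, at the cost of carrying a sentinel id that must not reach any code that builds a real collator.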
[jira] [Created] (SPARK-47626) Addition for Map Implicit Casting of Collated Strings
Mihailo Milosevic created SPARK-47626: - Summary: Addition for Map Implicit Casting of Collated Strings Key: SPARK-47626 URL: https://issues.apache.org/jira/browse/SPARK-47626 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Created] (SPARK-47625) Addition of Indeterminate Collation Support
Mihailo Milosevic created SPARK-47625: - Summary: Addition of Indeterminate Collation Support Key: SPARK-47625 URL: https://issues.apache.org/jira/browse/SPARK-47625 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Updated] (SPARK-47210) Addition of implicit casting without indeterminate support
[ https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mihailo Milosevic updated SPARK-47210:
--
Description:
*What changes were proposed in this pull request?*
This PR adds automatic casting and collation resolution as per `PGSQL` behaviour:
1. Collations set on the metadata level are implicit
2. Collations set using the `COLLATE` expression are explicit
3. When there is a combination of expressions of multiple collations the output will be:
- if there are explicit collations and all of them are equal then that collation will be the output
- if there are multiple different explicit collations `COLLATION_MISMATCH.EXPLICIT` will be thrown
- if there are no explicit collations and only a single type of non-default collation, that one will be used
- if there are no explicit collations and multiple non-default implicit ones `COLLATION_MISMATCH.IMPLICIT` will be thrown
*Why are the changes needed?*
We need to be able to compare columns and values with different collations and set a way of explicitly changing the collation we want to use.

was:
*What changes were proposed in this pull request?*
This PR adds automatic casting and collation resolution as per `PGSQL` behaviour:
1. Collations set on the metadata level are implicit
2. Collations set using the `COLLATE` expression are explicit
3. When there is a combination of expressions of multiple collations the output will be:
- if there are explicit collations and all of them are equal then that collation will be the output
- if there are multiple different explicit collations `COLLATION_MISMATCH.EXPLICIT` will be thrown
- if there are no explicit collations and only a single type of non-default collation, that one will be used
- if there are no explicit collations and multiple non-default implicit ones `COLLATION_MISMATCH.IMPLICIT` will be thrown
Another thing is that `INDETERMINATE_COLLATION` should only be thrown on comparison operations, and we should be able to combine different implicit collations for certain operations like concat and possibly others in the future.
This is why I had to add another predefined collation id named `INDETERMINATE_COLLATION_ID` which means that the result is a combination of conflicting non-default implicit collations. Right now it has an id of -1 so it fails if it ever goes to the `CollatorFactory`.
*Why are the changes needed?*
We need to be able to compare columns and values with different collations and set a way of explicitly changing the collation we want to use.

> Addition of implicit casting without indeterminate support
> --
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Mihailo Milosevic
> Priority: Major
> Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution as per `PGSQL` behaviour:
> 1. Collations set on the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When there is a combination of expressions of multiple collations the output will be:
> - if there are explicit collations and all of them are equal then that collation will be the output
> - if there are multiple different explicit collations `COLLATION_MISMATCH.EXPLICIT` will be thrown
> - if there are no explicit collations and only a single type of non-default collation, that one will be used
> - if there are no explicit collations and multiple non-default implicit ones `COLLATION_MISMATCH.IMPLICIT` will be thrown
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations and set a way of explicitly changing the collation we want to use.
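The resolution rules listed in SPARK-47210 can be read as a small decision procedure. The sketch below is a plain-Python model under stated assumptions, not the Spark implementation: `Coll`, `resolve` and `CollationMismatch` are hypothetical names, `UTF8_BINARY` is assumed to be the default collation, and returning the default when every input is default is an assumption the ticket does not spell out. Only the two error names are taken from the ticket.

```python
from dataclasses import dataclass

DEFAULT = "UTF8_BINARY"  # assumption: name of the default collation


class CollationMismatch(Exception):
    """Stand-in for the COLLATION_MISMATCH.* error classes."""


@dataclass(frozen=True)
class Coll:
    name: str
    explicit: bool = False  # True when set through a COLLATE expression


def resolve(inputs):
    """Return the output collation name for a list of Coll inputs."""
    explicit = {c.name for c in inputs if c.explicit}
    if len(explicit) == 1:
        # All explicit collations agree: that collation wins.
        return next(iter(explicit))
    if len(explicit) > 1:
        raise CollationMismatch("COLLATION_MISMATCH.EXPLICIT")

    # No explicit collations: look at the non-default implicit ones.
    implicit = {c.name for c in inputs if c.name != DEFAULT}
    if len(implicit) == 1:
        return next(iter(implicit))
    if len(implicit) > 1:
        raise CollationMismatch("COLLATION_MISMATCH.IMPLICIT")
    return DEFAULT  # assumption: everything was default


# An explicit collation overrides a different implicit one.
print(resolve([Coll("UNICODE", explicit=True), Coll("UTF8_BINARY_LCASE")]))
# A single non-default implicit collation beats the default.
print(resolve([Coll(DEFAULT), Coll("UTF8_BINARY_LCASE")]))
```

Checking explicit collations first is what gives `COLLATE` its role as the way to override whatever the column metadata says.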
[jira] [Updated] (SPARK-47210) Addition of implicit casting without indeterminate support
[ https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47210: -- Summary: Addition of implicit casting without indeterminate support (was: Implicit casting on collated expressions) > Addition of implicit casting without indeterminate support > -- > > Key: SPARK-47210 > URL: https://issues.apache.org/jira/browse/SPARK-47210 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > This PR adds automatic casting and collations resolution as per `PGSQL` > behaviour: > 1. Collations set on the metadata level are implicit > 2. Collations set using the `COLLATE` expression are explicit > 3. When there is a combination of expressions of multiple collations the > output will be: > - if there are explicit collations and all of them are equal then that > collation will be the output > - if there are multiple different explicit collations > `COLLATION_MISMATCH.EXPLICIT` will be thrown > - if there are no explicit collations and only a single type of non default > collation, that one will be used > - if there are no explicit collations and multiple non-default implicit ones > `COLLATION_MISMATCH.IMPLICIT` will be thrown > Another thing is that `INDETERMINATE_COLLATION` should only be thrown on > comparison operations, and we should be able to combine different implicit > collations for certain operations like concat and possible others in the > future. > This is why I had to add another predefined collation id named > `INDETERMINATE_COLLATION_ID` which means that the result is a combination of > conflicting non-default implicit collations. Right now it has an id of -1 so > it fails if it ever goes to the `CollatorFactory`. 
> *Why are the changes needed?* > We need to be able to compare columns and values with different collations > and set a way of explicitly changing the collation we want to use. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
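The four resolution rules quoted above can be sketched as follows. This is a minimal illustration, not Spark's implementation: the `Collated` record, the `resolve_collation` function, the `CollationMismatch` exception, and the default collation name `UTF8_BINARY` are assumptions made for the example. Only the error names `COLLATION_MISMATCH.EXPLICIT` and `COLLATION_MISMATCH.IMPLICIT` come from the ticket.

```python
from dataclasses import dataclass

DEFAULT_COLLATION = "UTF8_BINARY"  # assumed name of the default collation


class CollationMismatch(Exception):
    """Hypothetical stand-in for the COLLATION_MISMATCH.* errors."""


@dataclass(frozen=True)
class Collated:
    collation: str
    explicit: bool = False  # True when set through a COLLATE expression


def resolve_collation(inputs):
    """Pick the output collation for a list of Collated inputs."""
    # Explicit collations take priority over implicit ones.
    explicit = {c.collation for c in inputs if c.explicit}
    if len(explicit) == 1:
        return next(iter(explicit))
    if len(explicit) > 1:
        raise CollationMismatch("COLLATION_MISMATCH.EXPLICIT")

    # No explicit collations: look at the non-default implicit ones.
    implicit = {c.collation for c in inputs if c.collation != DEFAULT_COLLATION}
    if len(implicit) == 1:
        return next(iter(implicit))
    if len(implicit) > 1:
        raise CollationMismatch("COLLATION_MISMATCH.IMPLICIT")
    return DEFAULT_COLLATION
```

For example, `resolve_collation([Collated("UNICODE_CI", explicit=True), Collated("UTF8_LCASE")])` yields `UNICODE_CI`, because the single explicit collation wins over the implicit one. The collation names used here are placeholders.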
[jira] [Updated] (SPARK-47210) Implicit casting on collated expressions
[ https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47210: -- Parent: SPARK-47624 Issue Type: Sub-task (was: Improvement) > Implicit casting on collated expressions > > > Key: SPARK-47210 > URL: https://issues.apache.org/jira/browse/SPARK-47210 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > This PR adds automatic casting and collations resolution as per `PGSQL` > behaviour: > 1. Collations set on the metadata level are implicit > 2. Collations set using the `COLLATE` expression are explicit > 3. When there is a combination of expressions of multiple collations the > output will be: > - if there are explicit collations and all of them are equal then that > collation will be the output > - if there are multiple different explicit collations > `COLLATION_MISMATCH.EXPLICIT` will be thrown > - if there are no explicit collations and only a single type of non default > collation, that one will be used > - if there are no explicit collations and multiple non-default implicit ones > `COLLATION_MISMATCH.IMPLICIT` will be thrown > Another thing is that `INDETERMINATE_COLLATION` should only be thrown on > comparison operations, and we should be able to combine different implicit > collations for certain operations like concat and possible others in the > future. > This is why I had to add another predefined collation id named > `INDETERMINATE_COLLATION_ID` which means that the result is a combination of > conflicting non-default implicit collations. Right now it has an id of -1 so > it fails if it ever goes to the `CollatorFactory`. > *Why are the changes needed?* > We need to be able to compare columns and values with different collations > and set a way of explicitly changing the collation we want to use. 
[jira] [Updated] (SPARK-47210) Implicit casting on collated expressions
[ https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47210: -- Epic Link: (was: SPARK-46830) > Implicit casting on collated expressions > > > Key: SPARK-47210 > URL: https://issues.apache.org/jira/browse/SPARK-47210 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > This PR adds automatic casting and collations resolution as per `PGSQL` > behaviour: > 1. Collations set on the metadata level are implicit > 2. Collations set using the `COLLATE` expression are explicit > 3. When there is a combination of expressions of multiple collations the > output will be: > - if there are explicit collations and all of them are equal then that > collation will be the output > - if there are multiple different explicit collations > `COLLATION_MISMATCH.EXPLICIT` will be thrown > - if there are no explicit collations and only a single type of non default > collation, that one will be used > - if there are no explicit collations and multiple non-default implicit ones > `COLLATION_MISMATCH.IMPLICIT` will be thrown > Another thing is that `INDETERMINATE_COLLATION` should only be thrown on > comparison operations, and we should be able to combine different implicit > collations for certain operations like concat and possible others in the > future. > This is why I had to add another predefined collation id named > `INDETERMINATE_COLLATION_ID` which means that the result is a combination of > conflicting non-default implicit collations. Right now it has an id of -1 so > it fails if it ever goes to the `CollatorFactory`. > *Why are the changes needed?* > We need to be able to compare columns and values with different collations > and set a way of explicitly changing the collation we want to use. 
[jira] [Created] (SPARK-47624) Collation Implict Casting Support
Mihailo Milosevic created SPARK-47624: - Summary: Collation Implict Casting Support Key: SPARK-47624 URL: https://issues.apache.org/jira/browse/SPARK-47624 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47477) SubstringIndex, StringLocate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47477: -- Parent: (was: SPARK-46837) Issue Type: New Feature (was: Sub-task) > SubstringIndex, StringLocate (all collations) > - > > Key: SPARK-47477 > URL: https://issues.apache.org/jira/browse/SPARK-47477 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
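To make the expected behaviour concrete, here is a small sketch of a collation-aware `find_in_set`. It is an illustration under stated assumptions, not the Spark implementation: the `key` parameter is a hypothetical stand-in for a collation key, and `str.casefold` only approximates a case-insensitive collation rather than using ICU's _StringSearch_.

```python
def find_in_set(word, csv, key=lambda s: s):
    """Return the 1-based index of `word` in the comma-delimited `csv`, or 0.

    `key` is a hypothetical stand-in for a collation key: two strings are
    treated as equal when their keys are equal. The default is a plain
    code-point (binary) comparison.
    """
    # A word containing the delimiter can never match a single entry.
    if "," in word:
        return 0
    target = key(word)
    for index, item in enumerate(csv.split(","), start=1):
        if key(item) == target:
            return index
    return 0
```

With the default binary comparison, `find_in_set("B", "a,b,c")` finds nothing and returns 0. Passing `key=str.casefold` makes "B" match "b" and returns 2, which is the kind of difference the unit tests for each collation need to pin down.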
[jira] [Updated] (SPARK-47477) SubstringIndex, StringLocate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47477: -- Epic Link: SPARK-46830 > SubstringIndex, StringLocate (all collations) > - > > Key: SPARK-47477 > URL: https://issues.apache.org/jira/browse/SPARK-47477 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47477) SubstringIndex, StringLocate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47477: -- Labels: (was: pull-request-available) > SubstringIndex, StringLocate (all collations) > - > > Key: SPARK-47477 > URL: https://issues.apache.org/jira/browse/SPARK-47477 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47504) Resolve AbstractDataType simpleStrings for StringTypeCollated
Mihailo Milosevic created SPARK-47504: - Summary: Resolve AbstractDataType simpleStrings for StringTypeCollated Key: SPARK-47504 URL: https://issues.apache.org/jira/browse/SPARK-47504 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic *SPARK-47296* introduced a change to fail all unsupported functions. Because of this change expected *inputTypes* in *ExpectsInputTypes* had to be changed. This change introduced a change on user side which will print *"STRING_ANY_COLLATION"* in places where before we printed *"STRING"* when an error occurred. Concretely if we get an input of Int where *StringTypeAnyCollation* was expected, we will throw this faulty message for users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47431) Add session level default Collation
Mihailo Milosevic created SPARK-47431: - Summary: Add session level default Collation Key: SPARK-47431 URL: https://issues.apache.org/jira/browse/SPARK-47431 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47210) Implicit casting on collated expressions
[ https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47210: -- Description: *What changes were proposed in this pull request?* This PR adds automatic casting and collations resolution as per `PGSQL` behaviour: 1. Collations set on the metadata level are implicit 2. Collations set using the `COLLATE` expression are explicit 3. When there is a combination of expressions of multiple collations the output will be: - if there are explicit collations and all of them are equal then that collation will be the output - if there are multiple different explicit collations `COLLATION_MISMATCH.EXPLICIT` will be thrown - if there are no explicit collations and only a single type of non default collation, that one will be used - if there are no explicit collations and multiple non-default implicit ones `COLLATION_MISMATCH.IMPLICIT` will be thrown Another thing is that `INDETERMINATE_COLLATION` should only be thrown on comparison operations, and we should be able to combine different implicit collations for certain operations like concat and possible others in the future. This is why I had to add another predefined collation id named `INDETERMINATE_COLLATION_ID` which means that the result is a combination of conflicting non-default implicit collations. Right now it has an id of -1 so it fails if it ever goes to the `CollatorFactory`. *Why are the changes needed?* We need to be able to compare columns and values with different collations and set a way of explicitly changing the collation we want to use. 
> Implicit casting on collated expressions > > > Key: SPARK-47210 > URL: https://issues.apache.org/jira/browse/SPARK-47210 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > This PR adds automatic casting and collations resolution as per `PGSQL` > behaviour: > 1. Collations set on the metadata level are implicit > 2. Collations set using the `COLLATE` expression are explicit > 3. When there is a combination of expressions of multiple collations the > output will be: > - if there are explicit collations and all of them are equal then that > collation will be the output > - if there are multiple different explicit collations > `COLLATION_MISMATCH.EXPLICIT` will be thrown > - if there are no explicit collations and only a single type of non default > collation, that one will be used > - if there are no explicit collations and multiple non-default implicit ones > `COLLATION_MISMATCH.IMPLICIT` will be thrown > Another thing is that `INDETERMINATE_COLLATION` should only be thrown on > comparison operations, and we should be able to combine different implicit > collations for certain operations like concat and possible others in the > future. > This is why I had to add another predefined collation id named > `INDETERMINATE_COLLATION_ID` which means that the result is a combination of > conflicting non-default implicit collations. Right now it has an id of -1 so > it fails if it ever goes to the `CollatorFactory`. > *Why are the changes needed?* > We need to be able to compare columns and values with different collations > and set a way of explicitly changing the collation we want to use. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47169) Disable bucketing on collated collumns
[ https://issues.apache.org/jira/browse/SPARK-47169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47169: -- Description: *What changes were proposed in this pull request?* Disable bucketing on columns that are non default collated. *Why are the changes needed?* With current implementation bucketIds are generated from a string value where each unique string guarantees unique id, but when collation is turned on, this is not the case. was: What changes were proposed in this pull request? Disable bucketing on columns that are non default collated. Why are the changes needed? With current implementation bucketIds are generated from a string value where each unique string guarantees unique id, but when collation is turned on, this is not the case. > Disable bucketing on collated collumns > -- > > Key: SPARK-47169 > URL: https://issues.apache.org/jira/browse/SPARK-47169 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > Disable bucketing on columns that are non default collated. > *Why are the changes needed?* > With current implementation bucketIds are generated from a string value where > each unique string guarantees unique id, but when collation is turned on, > this is not the case. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
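The problem described above can be shown with a toy example. This is only a sketch under stated assumptions: `bucket_id` uses a deliberately simple byte-sum hash in place of whatever hash the writer really uses, and `equal_under_ci` uses `str.casefold` as a stand-in for a case-insensitive collation. Both function names are hypothetical.

```python
def bucket_id(value: str, num_buckets: int = 5) -> int:
    """Toy bucket function: sum of the UTF-8 bytes modulo the bucket count."""
    return sum(value.encode("utf-8")) % num_buckets


def equal_under_ci(a: str, b: str) -> bool:
    """Stand-in for equality under a case-insensitive collation."""
    return a.casefold() == b.casefold()


# "a" and "A" compare equal under the collation...
same_value = equal_under_ci("a", "A")
# ...but their raw bytes differ, so they land in different buckets.
buckets = (bucket_id("a"), bucket_id("A"))
```

Here `same_value` is `True` while `buckets` is `(2, 0)`: two values that the collation treats as equal end up in different buckets, so a bucket-based lookup or join would miss matches. Disabling bucketing on such columns avoids that inconsistency.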
[jira] [Updated] (SPARK-47169) Disable bucketing on collated collumns
[ https://issues.apache.org/jira/browse/SPARK-47169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47169: -- Description: What changes were proposed in this pull request? Disable bucketing on columns that are non default collated. Why are the changes needed? With current implementation bucketIds are generated from a string value where each unique string guarantees unique id, but when collation is turned on, this is not the case. > Disable bucketing on collated collumns > -- > > Key: SPARK-47169 > URL: https://issues.apache.org/jira/browse/SPARK-47169 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > What changes were proposed in this pull request? > Disable bucketing on columns that are non default collated. > Why are the changes needed? > With current implementation bucketIds are generated from a string value where > each unique string guarantees unique id, but when collation is turned on, > this is not the case. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47326) Moving tests to related Suites
[ https://issues.apache.org/jira/browse/SPARK-47326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47326: -- Description: *What changes were proposed in this pull request?* Tests from `QueryCompilationErrorsSuite` were moved to `DDLSuite` and `JDBCTableCatalogSuite`. *Why are the changes needed?* We should move tests to related test suites in order to improve testing. > Moving tests to related Suites > -- > > Key: SPARK-47326 > URL: https://issues.apache.org/jira/browse/SPARK-47326 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > > *What changes were proposed in this pull request?* > Tests from `QueryCompilationErrorsSuite` were moved to `DDLSuite` and > `JDBCTableCatalogSuite`. > *Why are the changes needed?* > We should move tests to related test suites in order to improve testing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47015) Disable partitioning on collated columns
[ https://issues.apache.org/jira/browse/SPARK-47015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47015: -- Description: (was: *What changes were proposed in this pull request?* Tests from `QueryCompilationErrorsSuite` were moved to `DDLSuite` and `JDBCTableCatalogSuite`. *Why are the changes needed?* We should move tests to related test suites in order to improve testing.) > Disable partitioning on collated columns > > > Key: SPARK-47015 > URL: https://issues.apache.org/jira/browse/SPARK-47015 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47015) Disable partitioning on collated columns
[ https://issues.apache.org/jira/browse/SPARK-47015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47015: -- Description: *What changes were proposed in this pull request?* Tests from `QueryCompilationErrorsSuite` were moved to `DDLSuite` and `JDBCTableCatalogSuite`. *Why are the changes needed?* We should move tests to related test suites in order to improve testing. was: ### What changes were proposed in this pull request? Tests from `QueryCompilationErrorsSuite` were moved to `DDLSuite` and `JDBCTableCatalogSuite`. ### Why are the changes needed? We should move tests to related test suites in order to improve testing. > Disable partitioning on collated columns > > > Key: SPARK-47015 > URL: https://issues.apache.org/jira/browse/SPARK-47015 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *What changes were proposed in this pull request?* > Tests from `QueryCompilationErrorsSuite` were moved to `DDLSuite` and > `JDBCTableCatalogSuite`. > *Why are the changes needed?* > We should move tests to related test suites in order to improve testing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47015) Disable partitioning on collated columns
[ https://issues.apache.org/jira/browse/SPARK-47015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47015: -- Description: ### What changes were proposed in this pull request? Tests from `QueryCompilationErrorsSuite` were moved to `DDLSuite` and `JDBCTableCatalogSuite`. ### Why are the changes needed? We should move tests to related test suites in order to improve testing. > Disable partitioning on collated columns > > > Key: SPARK-47015 > URL: https://issues.apache.org/jira/browse/SPARK-47015 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > ### What changes were proposed in this pull request? > Tests from `QueryCompilationErrorsSuite` were moved to `DDLSuite` and > `JDBCTableCatalogSuite`. > ### Why are the changes needed? > We should move tests to related test suites in order to improve testing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47326) Moving tests to related Suites
[ https://issues.apache.org/jira/browse/SPARK-47326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825203#comment-17825203 ] Mihailo Milosevic commented on SPARK-47326: --- Issue resolved by pull request 86361 [https://github.com/databricks/runtime/pull/86361|http://example.com] > Moving tests to related Suites > -- > > Key: SPARK-47326 > URL: https://issues.apache.org/jira/browse/SPARK-47326 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47326) Moving tests to related Suites
Mihailo Milosevic created SPARK-47326: - Summary: Moving tests to related Suites Key: SPARK-47326 URL: https://issues.apache.org/jira/browse/SPARK-47326 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47210) Implicit casting on collated expressions
Mihailo Milosevic created SPARK-47210: - Summary: Implicit casting on collated expressions Key: SPARK-47210 URL: https://issues.apache.org/jira/browse/SPARK-47210 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47169) Disable bucketint on collated collumns
Mihailo Milosevic created SPARK-47169: - Summary: Disable bucketint on collated collumns Key: SPARK-47169 URL: https://issues.apache.org/jira/browse/SPARK-47169 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47169) Disable bucketing on collated collumns
[ https://issues.apache.org/jira/browse/SPARK-47169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47169: -- Summary: Disable bucketing on collated collumns (was: Disable bucketint on collated collumns) > Disable bucketing on collated collumns > -- > > Key: SPARK-47169 > URL: https://issues.apache.org/jira/browse/SPARK-47169 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag
[ https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47102: -- Description: *What changes were proposed in this pull request?* This PR adds COLLATION_ENABLED config to `SQLConf` and introduces new error class `COLLATION_SUPPORT_NOT_ENABLED` to appropriately report error on usage of feature under development. *Why are the changes needed?* We want to make collations configurable on this flag. These changes disable usage of `collate` and `collation` functions, along with any `COLLATE` syntax when the flag is set to false. By default, the flag is set to false. was: *What changes were proposed in this pull request?* This PR adds COLLATION_ENABLED config to `SQLConf` and introduces new error class `COLLATION_SUPPORT_DISABLED` to appropriately report error on usage of feature under development. *Why are the changes needed?* We want to make collations configurable on this some flag. These changes disable usage of `collate` and `collation` functions, along with any `COLLATE` syntax when the flag is set to false. By default, the flag is set to false. > Add COLLATION_ENABLED config flag > - > > Key: SPARK-47102 > URL: https://issues.apache.org/jira/browse/SPARK-47102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > This PR adds COLLATION_ENABLED config to `SQLConf` and introduces new error > class `COLLATION_SUPPORT_NOT_ENABLED` to appropriately report error on usage > of feature under development. > *Why are the changes needed?* > We want to make collations configurable on this flag. These changes disable > usage of `collate` and `collation` functions, along with any `COLLATE` syntax > when the flag is set to false. By default, the flag is set to false. 
[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag
[ https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47102: -- Description: *What changes were proposed in this pull request?* This PR adds COLLATION_ENABLED config to `SQLConf` and introduces new error class `COLLATION_SUPPORT_DISABLED` to appropriately report error on usage of feature under development. *Why are the changes needed?* We want to make collations configurable on this some flag. These changes disable usage of `collate` and `collation` functions, along with any `COLLATE` syntax when the flag is set to false. By default, the flag is set to false. was: ### What changes were proposed in this pull request? This PR adds COLLATION_ENABLED config to `SQLConf` and introduces new error class `COLLATION_SUPPORT_DISABLED` to appropriately report error on usage of feature under development. ### Why are the changes needed? We want to make collations configurable on this some flag. These changes disable usage of `collate` and `collation` functions, along with any `COLLATE` syntax when the flag is set to false. By default, the flag is set to false. > Add COLLATION_ENABLED config flag > - > > Key: SPARK-47102 > URL: https://issues.apache.org/jira/browse/SPARK-47102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > This PR adds COLLATION_ENABLED config to `SQLConf` and introduces new error > class `COLLATION_SUPPORT_DISABLED` to appropriately report error on usage of > feature under development. > *Why are the changes needed?* > We want to make collations configurable on this some flag. These changes > disable usage of `collate` and `collation` functions, along with any > `COLLATE` syntax when the flag is set to false. By default, the flag is set > to false. 
[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag
[ https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47102: -- Description: ### What changes were proposed in this pull request? This PR adds COLLATION_ENABLED config to `SQLConf` and introduces new error class `COLLATION_SUPPORT_DISABLED` to appropriately report error on usage of feature under development. ### Why are the changes needed? We want to make collations configurable on this some flag. These changes disable usage of `collate` and `collation` functions, along with any `COLLATE` syntax when the flag is set to false. By default, the flag is set to false. > Add COLLATION_ENABLED config flag > - > > Key: SPARK-47102 > URL: https://issues.apache.org/jira/browse/SPARK-47102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > ### What changes were proposed in this pull request? > This PR adds COLLATION_ENABLED config to `SQLConf` and introduces new error > class `COLLATION_SUPPORT_DISABLED` to appropriately report error on usage of > feature under development. > ### Why are the changes needed? > We want to make collations configurable on this some flag. These changes > disable usage of `collate` and `collation` functions, along with any > `COLLATE` syntax when the flag is set to false. By default, the flag is set > to false.
[jira] [Created] (SPARK-47102) Add COLLATION_ENABLED config flag
Mihailo Milosevic created SPARK-47102: - Summary: Add COLLATION_ENABLED config flag Key: SPARK-47102 URL: https://issues.apache.org/jira/browse/SPARK-47102 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic
[jira] [Commented] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818730#comment-17818730 ] Mihailo Milosevic commented on SPARK-43259: --- I want to work on this issue. I have raised a PR for it: https://github.com/apache/spark/pull/45095 > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check exception fields by using {*}checkError(){*}. This function > checks only the valuable error fields and avoids depending on the error > message text. This way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose a solution to users on how to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490]
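The idea in the issue description of asserting on error fields rather than on message text can be sketched as follows. This is a plain-Python model, not Spark's `checkError()`: `ModelError`, `check_error`, and the `EXAMPLE_ERROR` class name are hypothetical and exist only to show why such tests survive rewording of the message template.

```python
# Hypothetical model of checking structured error fields instead of message text.
# This does not reproduce Spark's checkError(); names and fields are assumptions.


class ModelError(Exception):
    """An error that keeps its class name and parameters alongside the text."""

    def __init__(self, error_class, parameters, message_template):
        self.error_class = error_class
        self.parameters = dict(parameters)
        # The rendered text depends on the template, which editors may change.
        super().__init__(message_template.format(**parameters))


def check_error(exc, error_class, parameters):
    """Assert on the error class and parameters only; ignore the rendered text."""
    assert exc.error_class == error_class, exc.error_class
    assert exc.parameters == parameters, exc.parameters


def make_example_error(template):
    """Build the same logical error with whatever wording the template uses."""
    return ModelError("EXAMPLE_ERROR", {"name": "col1"}, template)
```

Two errors built from different templates, for example `"Bad column {name}"` and `"Column {name} is not valid."`, both pass `check_error(err, "EXAMPLE_ERROR", {"name": "col1"})`, whereas a test comparing `str(err)` would break on the rewording.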