[jira] [Assigned] (SPARK-48559) Fetch globalTempDatabase name directly without invoking initialization of GlobalaTempViewManager
[ https://issues.apache.org/jira/browse/SPARK-48559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48559: --- Assignee: Wenchen Fan > Fetch globalTempDatabase name directly without invoking initialization of > GlobalaTempViewManager > > > Key: SPARK-48559 > URL: https://issues.apache.org/jira/browse/SPARK-48559 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48559) Fetch globalTempDatabase name directly without invoking initialization of GlobalaTempViewManager
[ https://issues.apache.org/jira/browse/SPARK-48559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48559. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46907 [https://github.com/apache/spark/pull/46907] > Fetch globalTempDatabase name directly without invoking initialization of > GlobalaTempViewManager > > > Key: SPARK-48559 > URL: https://issues.apache.org/jira/browse/SPARK-48559 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48286) Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion
[ https://issues.apache.org/jira/browse/SPARK-48286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48286: --- Assignee: Uros Stankovic > Analyze 'exists' default expression instead of 'current' default expression > in structField to v2 column conversion > -- > > Key: SPARK-48286 > URL: https://issues.apache.org/jira/browse/SPARK-48286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > > The org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze method > accepts 3 parameters: > 1) Field to analyze > 2) Statement type - String > 3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT > The method > org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column > passes fieldToAnalyze and EXISTS_DEFAULT as the second parameter, so > EXISTS_DEFAULT is treated as the statement type rather than the metadata key, > and the wrong expression is analyzed.
[jira] [Resolved] (SPARK-48286) Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion
[ https://issues.apache.org/jira/browse/SPARK-48286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48286. - Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46594 [https://github.com/apache/spark/pull/46594] > Analyze 'exists' default expression instead of 'current' default expression > in structField to v2 column conversion > -- > > Key: SPARK-48286 > URL: https://issues.apache.org/jira/browse/SPARK-48286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Trivial > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > The org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze method > accepts 3 parameters: > 1) Field to analyze > 2) Statement type - String > 3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT > The method > org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column > passes fieldToAnalyze and EXISTS_DEFAULT as the second parameter, so > EXISTS_DEFAULT is treated as the statement type rather than the metadata key, > and the wrong expression is analyzed.
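The parameter mix-up described in SPARK-48286 is easy to reproduce in miniature. The sketch below is a hypothetical, simplified stand-in for ResolveDefaultColumns#analyze (the names `analyze`, `field_metadata`, `statement_type`, and `metadata_key` are illustrative, not Spark's actual signature): when EXISTS_DEFAULT lands in the statement-type slot, the defaulted metadata key silently falls back to CURRENT_DEFAULT and the wrong expression is analyzed.

```python
# Illustrative sketch (NOT Spark's actual code): a simplified analyze()
# whose second positional parameter is the statement type and whose
# third, defaulted parameter is the metadata key.
CURRENT_DEFAULT = "CURRENT_DEFAULT"
EXISTS_DEFAULT = "EXISTS_DEFAULT"

def analyze(field_metadata, statement_type, metadata_key=CURRENT_DEFAULT):
    """Return the default expression stored under metadata_key."""
    return field_metadata.get(metadata_key, "<none>")

metadata = {
    CURRENT_DEFAULT: "current_timestamp()",
    EXISTS_DEFAULT: "CAST(NULL AS TIMESTAMP)",
}

# Intended call: analyze the 'exists' default expression.
ok = analyze(metadata, "ALTER TABLE", EXISTS_DEFAULT)

# Buggy call shape from the report: EXISTS_DEFAULT is passed as the
# second argument, so it is treated as the statement type; metadata_key
# silently defaults to CURRENT_DEFAULT, and the 'current' default
# expression is analyzed instead of the 'exists' one.
bad = analyze(metadata, EXISTS_DEFAULT)

print(ok)   # CAST(NULL AS TIMESTAMP)
print(bad)  # current_timestamp()
```

The bug is invisible to the type system because both parameters are plain strings, which is exactly why positional argument order matters here.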
[jira] [Resolved] (SPARK-48283) Implement modified Lowercase operation for UTF8_BINARY_LCASE
[ https://issues.apache.org/jira/browse/SPARK-48283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48283. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46700 [https://github.com/apache/spark/pull/46700] > Implement modified Lowercase operation for UTF8_BINARY_LCASE > > > Key: SPARK-48283 > URL: https://issues.apache.org/jira/browse/SPARK-48283 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48435) UNICODE collation should not support binary equality
[ https://issues.apache.org/jira/browse/SPARK-48435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48435. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46772 [https://github.com/apache/spark/pull/46772] > UNICODE collation should not support binary equality > > > Key: SPARK-48435 > URL: https://issues.apache.org/jira/browse/SPARK-48435 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48435) UNICODE collation should not support binary equality
[ https://issues.apache.org/jira/browse/SPARK-48435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48435: --- Assignee: Uroš Bojanić > UNICODE collation should not support binary equality > > > Key: SPARK-48435 > URL: https://issues.apache.org/jira/browse/SPARK-48435 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-48526) Allow passing custom sink to StreamTest::testStream
[ https://issues.apache.org/jira/browse/SPARK-48526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48526: --- Assignee: Johan Lasperas > Allow passing custom sink to StreamTest::testStream > --- > > Key: SPARK-48526 > URL: https://issues.apache.org/jira/browse/SPARK-48526 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Trivial > Labels: pull-request-available > > The testing helpers for streaming don't allow providing a custom sink, which > is limiting in (at least) two ways: > * A sink can't be reused across multiple calls to `testStream`, e.g. when > canceling and resuming streaming > * A custom sink implementation other than `MemorySink` can't be provided. A > use case here is, for example, testing the Delta streaming sink by wrapping it > in a MemorySink interface and passing it to the test framework.
[jira] [Resolved] (SPARK-48526) Allow passing custom sink to StreamTest::testStream
[ https://issues.apache.org/jira/browse/SPARK-48526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48526. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46866 [https://github.com/apache/spark/pull/46866] > Allow passing custom sink to StreamTest::testStream > --- > > Key: SPARK-48526 > URL: https://issues.apache.org/jira/browse/SPARK-48526 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > > The testing helpers for streaming don't allow providing a custom sink, which > is limiting in (at least) two ways: > * A sink can't be reused across multiple calls to `testStream`, e.g. when > canceling and resuming streaming > * A custom sink implementation other than `MemorySink` can't be provided. A > use case here is, for example, testing the Delta streaming sink by wrapping it > in a MemorySink interface and passing it to the test framework.
[jira] [Assigned] (SPARK-48546) Fix ExpressionEncoder after replacing NullPointerExceptions with proper error classes in AssertNotNull expression
[ https://issues.apache.org/jira/browse/SPARK-48546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48546: --- Assignee: Daniel > Fix ExpressionEncoder after replacing NullPointerExceptions with proper error > classes in AssertNotNull expression > - > > Key: SPARK-48546 > URL: https://issues.apache.org/jira/browse/SPARK-48546 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48546) Fix ExpressionEncoder after replacing NullPointerExceptions with proper error classes in AssertNotNull expression
[ https://issues.apache.org/jira/browse/SPARK-48546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48546. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46888 [https://github.com/apache/spark/pull/46888] > Fix ExpressionEncoder after replacing NullPointerExceptions with proper error > classes in AssertNotNull expression > - > > Key: SPARK-48546 > URL: https://issues.apache.org/jira/browse/SPARK-48546 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-48552) multi-line CSV schema inference should also throw FAILED_READ_FILE
Wenchen Fan created SPARK-48552: --- Summary: multi-line CSV schema inference should also throw FAILED_READ_FILE Key: SPARK-48552 URL: https://issues.apache.org/jira/browse/SPARK-48552 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan
[jira] [Resolved] (SPARK-48307) InlineCTE should keep not-inlined relations in the original WithCTE node
[ https://issues.apache.org/jira/browse/SPARK-48307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48307. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46617 [https://github.com/apache/spark/pull/46617] > InlineCTE should keep not-inlined relations in the original WithCTE node > > > Key: SPARK-48307 > URL: https://issues.apache.org/jira/browse/SPARK-48307 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48318) Hash join support for strings with collation (complex types)
[ https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48318: --- Assignee: Uroš Bojanić > Hash join support for strings with collation (complex types) > > > Key: SPARK-48318 > URL: https://issues.apache.org/jira/browse/SPARK-48318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48318) Hash join support for strings with collation (complex types)
[ https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48318. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46722 [https://github.com/apache/spark/pull/46722] > Hash join support for strings with collation (complex types) > > > Key: SPARK-48318 > URL: https://issues.apache.org/jira/browse/SPARK-48318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47972) Restrict CAST expression for collations
[ https://issues.apache.org/jira/browse/SPARK-47972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47972: --- Assignee: Mihailo Milosevic > Restrict CAST expression for collations > --- > > Key: SPARK-47972 > URL: https://issues.apache.org/jira/browse/SPARK-47972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > The current state of the code allows calls like CAST(1 AS STRING COLLATE > UNICODE). We want to restrict the CAST expression so it can only cast to the > default collation string, and allow only the COLLATE expression to produce > explicitly collated strings.
[jira] [Resolved] (SPARK-47972) Restrict CAST expression for collations
[ https://issues.apache.org/jira/browse/SPARK-47972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47972. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46474 [https://github.com/apache/spark/pull/46474] > Restrict CAST expression for collations > --- > > Key: SPARK-47972 > URL: https://issues.apache.org/jira/browse/SPARK-47972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The current state of the code allows calls like CAST(1 AS STRING COLLATE > UNICODE). We want to restrict the CAST expression so it can only cast to the > default collation string, and allow only the COLLATE expression to produce > explicitly collated strings.
[jira] [Resolved] (SPARK-48413) ALTER COLUMN with collation
[ https://issues.apache.org/jira/browse/SPARK-48413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48413. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46734 [https://github.com/apache/spark/pull/46734] > ALTER COLUMN with collation > --- > > Key: SPARK-48413 > URL: https://issues.apache.org/jira/browse/SPARK-48413 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add support for changing collation of a column with ALTER COLUMN command.
[jira] [Assigned] (SPARK-48503) Scalar subquery with group-by and non-equality predicate incorrectly allowed, wrong results
[ https://issues.apache.org/jira/browse/SPARK-48503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48503: --- Assignee: Jack Chen > Scalar subquery with group-by and non-equality predicate incorrectly allowed, > wrong results > --- > > Key: SPARK-48503 > URL: https://issues.apache.org/jira/browse/SPARK-48503 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > > This query is not legal and should give an error, but instead we incorrectly > allow it and it returns wrong results. > {code:java} > create table x(x1 int, x2 int); > insert into x values (1, 1); > create table y(y1 int, y2 int); > insert into y values (2, 2), (3, 3); > select *, (select count(*) from y where y1 > x1 group by y1) from x; {code} > It returns two rows, even though there is only one row in x. > The correct result is an error: more than one row returned by a subquery used > as an expression (as seen in Postgres, for example). > > This is a longstanding bug. The bug is in CheckAnalysis in > {{{}checkAggregateInScalarSubquery{}}}. It allows grouping columns that are > present in correlation predicates, but doesn't check whether those predicates > are equalities, because when that code was written, non-equality > correlation wasn't allowed. Therefore, it looks like this bug has existed > since non-equality correlation was added (~2 years ago). > > Various other expressions that are not equi-joins between the inner and outer > fields hit this too, e.g. `where y1 + y2 = x1 group by y1`. > Another buggy case is when the correlation condition is an equality but it > sits under another operator, such as an OUTER JOIN or UNION.
[jira] [Resolved] (SPARK-48503) Scalar subquery with group-by and non-equality predicate incorrectly allowed, wrong results
[ https://issues.apache.org/jira/browse/SPARK-48503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48503. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46839 [https://github.com/apache/spark/pull/46839] > Scalar subquery with group-by and non-equality predicate incorrectly allowed, > wrong results > --- > > Key: SPARK-48503 > URL: https://issues.apache.org/jira/browse/SPARK-48503 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This query is not legal and should give an error, but instead we incorrectly > allow it and it returns wrong results. > {code:java} > create table x(x1 int, x2 int); > insert into x values (1, 1); > create table y(y1 int, y2 int); > insert into y values (2, 2), (3, 3); > select *, (select count(*) from y where y1 > x1 group by y1) from x; {code} > It returns two rows, even though there is only one row in x. > The correct result is an error: more than one row returned by a subquery used > as an expression (as seen in Postgres, for example). > > This is a longstanding bug. The bug is in CheckAnalysis in > {{{}checkAggregateInScalarSubquery{}}}. It allows grouping columns that are > present in correlation predicates, but doesn't check whether those predicates > are equalities, because when that code was written, non-equality > correlation wasn't allowed. Therefore, it looks like this bug has existed > since non-equality correlation was added (~2 years ago). > > Various other expressions that are not equi-joins between the inner and outer > fields hit this too, e.g. `where y1 + y2 = x1 group by y1`. > Another buggy case is when the correlation condition is an equality but it > sits under another operator, such as an OUTER JOIN or UNION.
[jira] [Assigned] (SPARK-48391) use addAll instead of add function in TaskMetrics to accelerate
[ https://issues.apache.org/jira/browse/SPARK-48391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48391: --- Assignee: jiahong.li > use addAll instead of add function in TaskMetrics to accelerate > - > > Key: SPARK-48391 > URL: https://issues.apache.org/jira/browse/SPARK-48391 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: jiahong.li >Assignee: jiahong.li >Priority: Major > Labels: pull-request-available > > In the fromAccumulators method of TaskMetrics, we should use > `tm._externalAccums.addAll` instead of `tm._externalAccums.add`, as > _externalAccums is an instance of CopyOnWriteArrayList
[jira] [Resolved] (SPARK-48391) use addAll instead of add function in TaskMetrics to accelerate
[ https://issues.apache.org/jira/browse/SPARK-48391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48391. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46705 [https://github.com/apache/spark/pull/46705] > use addAll instead of add function in TaskMetrics to accelerate > - > > Key: SPARK-48391 > URL: https://issues.apache.org/jira/browse/SPARK-48391 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: jiahong.li >Assignee: jiahong.li >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > In the fromAccumulators method of TaskMetrics, we should use > `tm._externalAccums.addAll` instead of `tm._externalAccums.add`, as > _externalAccums is an instance of CopyOnWriteArrayList
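To see why addAll helps in SPARK-48391, recall that java.util.concurrent.CopyOnWriteArrayList copies its entire backing array on every mutation. The sketch below is an illustrative Python analogue (the `CopyOnWriteList` class and its `copies` counter are invented for the demonstration, not Spark or JDK code): N single-element `add` calls perform N full-array copies, roughly O(N^2) copying overall, while one `add_all` performs a single copy.

```python
# Illustrative analogue of java.util.concurrent.CopyOnWriteArrayList:
# the backing store is an immutable snapshot, so every mutation
# allocates a fresh copy of the whole array.
class CopyOnWriteList:
    def __init__(self):
        self._array = ()   # immutable snapshot, as in the Java class
        self.copies = 0    # how many full-array copies were made

    def add(self, item):
        # Copy the entire array plus one slot, as CopyOnWriteArrayList.add does.
        self._array = self._array + (item,)
        self.copies += 1

    def add_all(self, items):
        # One copy covers the whole batch, as CopyOnWriteArrayList.addAll does.
        self._array = self._array + tuple(items)
        self.copies += 1

    def __len__(self):
        return len(self._array)

accums = list(range(1000))

slow = CopyOnWriteList()
for a in accums:           # 1000 separate copy-on-write mutations
    slow.add(a)

fast = CopyOnWriteList()
fast.add_all(accums)       # a single copy-on-write mutation

print(len(slow), slow.copies)  # 1000 1000
print(len(fast), fast.copies)  # 1000 1
```

Both lists end up with the same contents; only the number of array copies differs, which is the entire speedup claimed in the issue.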
[jira] [Resolved] (SPARK-48465) Avoid no-op empty relation propagation in AQE
[ https://issues.apache.org/jira/browse/SPARK-48465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48465. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46814 [https://github.com/apache/spark/pull/46814] > Avoid no-op empty relation propagation in AQE > - > > Key: SPARK-48465 > URL: https://issues.apache.org/jira/browse/SPARK-48465 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should avoid no-op empty relation propagation in AQE: if we convert an > empty QueryStageExec to an empty relation, it will be further wrapped into a new > query stage and executed -> produce an empty result -> empty relation propagation > again. This issue is currently not exposed because AQE will try to reuse the > shuffle.
[jira] [Resolved] (SPARK-48430) Fix map value extraction when map contains collated strings
[ https://issues.apache.org/jira/browse/SPARK-48430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48430. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46758 [https://github.com/apache/spark/pull/46758] > Fix map value extraction when map contains collated strings > --- > > Key: SPARK-48430 > URL: https://issues.apache.org/jira/browse/SPARK-48430 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Following queries return unexpected results: > {code:java} > select collation(map('a', 'b' collate utf8_binary_lcase)['a']); > select collation(element_at(map('a', 'b' collate utf8_binary_lcase), > 'a'));{code} > Both return UTF8_BINARY instead of UTF8_BINARY_LCASE.
[jira] [Resolved] (SPARK-48476) NPE thrown when delimiter set to null in CSV
[ https://issues.apache.org/jira/browse/SPARK-48476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48476. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46810 [https://github.com/apache/spark/pull/46810] > NPE thrown when delimiter set to null in CSV > > > Key: SPARK-48476 > URL: https://issues.apache.org/jira/browse/SPARK-48476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When customers set the delimiter to null, we currently throw an NPE. We should > throw a customer-facing error instead. > Repro: > spark.read.format("csv") > .option("delimiter", null) > .load()
[jira] [Assigned] (SPARK-48476) NPE thrown when delimiter set to null in CSV
[ https://issues.apache.org/jira/browse/SPARK-48476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48476: --- Assignee: Milan Stefanovic > NPE thrown when delimiter set to null in CSV > > > Key: SPARK-48476 > URL: https://issues.apache.org/jira/browse/SPARK-48476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > > When customers set the delimiter to null, we currently throw an NPE. We should > throw a customer-facing error instead. > Repro: > spark.read.format("csv") > .option("delimiter", null) > .load()
[jira] [Assigned] (SPARK-48419) Foldable propagation replace foldable column should use origin column
[ https://issues.apache.org/jira/browse/SPARK-48419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48419: --- Assignee: KnightChess > Foldable propagation replace foldable column should use origin column > - > > Key: SPARK-48419 > URL: https://issues.apache.org/jira/browse/SPARK-48419 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 4.0.0, 3.5.1, 3.3.4 >Reporter: KnightChess >Assignee: KnightChess >Priority: Major > Labels: pull-request-available > > The column name will be changed by `FoldablePropagation` in the optimizer. > Before optimization: > ```shell > 'Project ['x, 'y, 'z] > +- 'Project ['a AS x#112, str AS Y#113, 'b AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > After optimization: > ```shell > Project [x#112, str AS Y#113, z#114] > +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > The column name `y` will be replaced with 'Y'.
[jira] [Resolved] (SPARK-48419) Foldable propagation replace foldable column should use origin column
[ https://issues.apache.org/jira/browse/SPARK-48419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48419. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46742 [https://github.com/apache/spark/pull/46742] > Foldable propagation replace foldable column should use origin column > - > > Key: SPARK-48419 > URL: https://issues.apache.org/jira/browse/SPARK-48419 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 4.0.0, 3.5.1, 3.3.4 >Reporter: KnightChess >Assignee: KnightChess >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The column name will be changed by `FoldablePropagation` in the optimizer. > Before optimization: > ```shell > 'Project ['x, 'y, 'z] > +- 'Project ['a AS x#112, str AS Y#113, 'b AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > After optimization: > ```shell > Project [x#112, str AS Y#113, z#114] > +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > The column name `y` will be replaced with 'Y'.
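The hazard reported in SPARK-48419 can be sketched outside Catalyst. The tiny Python model below is purely illustrative (the `foldable` table and `propagate` helper are invented names, not Spark code): when a foldable inner alias like `str AS Y` is propagated into an outer projection, the replacement must keep the outer column's original name rather than the name attached to the inner foldable expression.

```python
# Illustrative model (NOT Catalyst): map each outer column name to either
# None (not foldable) or (constant_value, inner_alias_name) for a
# foldable inner alias such as `str AS Y#113`.
foldable = {"x": None, "y": ("str", "Y"), "z": None}

def propagate(outer_cols):
    """Fold constants into the outer projection, preserving outer names."""
    out = []
    for name in outer_cols:
        inner = foldable[name]
        if inner is None:
            out.append(name)  # non-foldable: keep the column reference
        else:
            value, _inner_name = inner
            # The fix: re-alias with the ORIGINAL outer name ('y'),
            # not the inner alias name ('Y').
            out.append(f"{value!r} AS {name}")
    return out

print(propagate(["x", "y", "z"]))  # ['x', "'str' AS y", 'z']
```

Using `_inner_name` here instead of `name` would reproduce the reported bug: the outer projection would silently expose the column as `Y` rather than `y`.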
[jira] [Resolved] (SPARK-48468) Add LogicalQueryStage interface in catalyst
[ https://issues.apache.org/jira/browse/SPARK-48468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48468. - Fix Version/s: 4.0.0 Resolution: Fixed > Add LogicalQueryStage interface in catalyst > --- > > Key: SPARK-48468 > URL: https://issues.apache.org/jira/browse/SPARK-48468 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add `LogicalQueryStage` interface in catalyst so that it's visible in logical > rules
[jira] [Assigned] (SPARK-48468) Add LogicalQueryStage interface in catalyst
[ https://issues.apache.org/jira/browse/SPARK-48468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48468: --- Assignee: Ziqi Liu > Add LogicalQueryStage interface in catalyst > --- > > Key: SPARK-48468 > URL: https://issues.apache.org/jira/browse/SPARK-48468 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > > Add `LogicalQueryStage` interface in catalyst so that it's visible in logical > rules -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite
[ https://issues.apache.org/jira/browse/SPARK-48477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48477. - Fix Version/s: 4.0.0 Resolution: Fixed > Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite > -- > > Key: SPARK-48477 > URL: https://issues.apache.org/jira/browse/SPARK-48477 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48292) Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[ https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48292. - Fix Version/s: 4.0.0 Assignee: angerszhu (was: L. C. Hsieh) Resolution: Fixed > Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage > when committed file not consistent with task status > -- > > Key: SPARK-48292 > URL: https://issues.apache.org/jira/browse/SPARK-48292 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Assignee: angerszhu >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > When a task attempt fails but it is authorized to do the task commit, > OutputCommitCoordinator will fail the stage with a reason message > which says that the task commit succeeded, but actually the driver never knows if a > task commit is successful or not. We should update the reason message to make > it less confusing. > See https://github.com/apache/spark/pull/36564#discussion_r1598660630 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48292) Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[ https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48292: --- Assignee: L. C. Hsieh > Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage > when committed file not consistent with task status > -- > > Key: SPARK-48292 > URL: https://issues.apache.org/jira/browse/SPARK-48292 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > Labels: pull-request-available > > When a task attempt fails but it is authorized to do the task commit, > OutputCommitCoordinator will fail the stage with a reason message > which says that the task commit succeeded, but actually the driver never knows if a > task commit is successful or not. We should update the reason message to make > it less confusing. > See https://github.com/apache/spark/pull/36564#discussion_r1598660630 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48431) Do not forward predicates on collated columns to file readers
[ https://issues.apache.org/jira/browse/SPARK-48431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48431. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46760 [https://github.com/apache/spark/pull/46760] > Do not forward predicates on collated columns to file readers > - > > Key: SPARK-48431 > URL: https://issues.apache.org/jira/browse/SPARK-48431 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jan-Ole Sasse >Assignee: Jan-Ole Sasse >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-47657 allows pushing filters on collated columns to file sources that > support it. If such filters are pushed to file sources, those file sources > must not push those filters to the actual file readers (i.e. parquet or csv > readers), because there is no guarantee that those support collations. > With this task, we are widening filters on collations to be AlwaysTrue when > we translate filters for file sources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
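The widening step SPARK-48431 describes can be illustrated with a small Python sketch (the function and filter representation are assumptions for illustration; Spark's real translation operates on catalyst filter objects): any predicate that references a collated column is replaced with an always-true filter before it reaches the file reader, so the reader never evaluates a collation-aware comparison.

```python
# Hypothetical sketch: widen filters on collated columns to "always true"
# so file readers (parquet/csv) never evaluate collation-aware predicates.
ALWAYS_TRUE = ("AlwaysTrue",)

def widen_for_reader(filters, collated_columns):
    widened = []
    for op, column, value in filters:
        if column in collated_columns:
            # The reader skips nothing; Spark still applies the real
            # predicate on the rows it gets back.
            widened.append(ALWAYS_TRUE)
        else:
            widened.append((op, column, value))
    return widened

filters = [("=", "name", "abc"), (">", "age", 18)]
print(widen_for_reader(filters, collated_columns={"name"}))
# -> [('AlwaysTrue',), ('>', 'age', 18)]
```

Widening is safe because an always-true filter can only make the reader return more rows, never fewer; correctness is preserved by re-evaluating the original predicate in Spark.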
[jira] [Resolved] (SPARK-48462) Refactor HiveQuerySuite.scala and HiveTableScanSuite
[ https://issues.apache.org/jira/browse/SPARK-48462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48462. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46792 [https://github.com/apache/spark/pull/46792] > Refactor HiveQuerySuite.scala and HiveTableScanSuite > > > Key: SPARK-48462 > URL: https://issues.apache.org/jira/browse/SPARK-48462 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48281) Alter string search logic for: instr, substring_index (UTF8_BINARY_LCASE)
[ https://issues.apache.org/jira/browse/SPARK-48281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48281: --- Assignee: Uroš Bojanić > Alter string search logic for: instr, substring_index (UTF8_BINARY_LCASE) > - > > Key: SPARK-48281 > URL: https://issues.apache.org/jira/browse/SPARK-48281 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48281) Alter string search logic for: instr, substring_index (UTF8_BINARY_LCASE)
[ https://issues.apache.org/jira/browse/SPARK-48281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48281. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46589 [https://github.com/apache/spark/pull/46589] > Alter string search logic for: instr, substring_index (UTF8_BINARY_LCASE) > - > > Key: SPARK-48281 > URL: https://issues.apache.org/jira/browse/SPARK-48281 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48444) Refactor SQLQuerySuite
[ https://issues.apache.org/jira/browse/SPARK-48444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48444. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46778 [https://github.com/apache/spark/pull/46778] > Refactor SQLQuerySuite > -- > > Key: SPARK-48444 > URL: https://issues.apache.org/jira/browse/SPARK-48444 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48000) Hash join support for strings with collation (StringType only)
[ https://issues.apache.org/jira/browse/SPARK-48000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48000. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46599 [https://github.com/apache/spark/pull/46599] > Hash join support for strings with collation (StringType only) > -- > > Key: SPARK-48000 > URL: https://issues.apache.org/jira/browse/SPARK-48000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48000) Hash join support for strings with collation (StringType only)
[ https://issues.apache.org/jira/browse/SPARK-48000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48000: --- Assignee: Uroš Bojanić > Hash join support for strings with collation (StringType only) > -- > > Key: SPARK-48000 > URL: https://issues.apache.org/jira/browse/SPARK-48000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48221) Alter string search logic for: startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)
[ https://issues.apache.org/jira/browse/SPARK-48221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48221: --- Assignee: Uroš Bojanić > Alter string search logic for: startsWith, endsWith, contains, locate > (UTF8_BINARY_LCASE) > - > > Key: SPARK-48221 > URL: https://issues.apache.org/jira/browse/SPARK-48221 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48221) Alter string search logic for: startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)
[ https://issues.apache.org/jira/browse/SPARK-48221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48221. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46511 [https://github.com/apache/spark/pull/46511] > Alter string search logic for: startsWith, endsWith, contains, locate > (UTF8_BINARY_LCASE) > - > > Key: SPARK-48221 > URL: https://issues.apache.org/jira/browse/SPARK-48221 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48273) Late rewrite of PlanWithUnresolvedIdentifier
[ https://issues.apache.org/jira/browse/SPARK-48273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48273: --- Assignee: Nikola Mandic > Late rewrite of PlanWithUnresolvedIdentifier > > > Key: SPARK-48273 > URL: https://issues.apache.org/jira/browse/SPARK-48273 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > > PlanWithUnresolvedIdentifier is rewritten later in analysis, which causes > rules like > SubstituteUnresolvedOrdinals to miss the new plan. This causes the following > queries to fail: > {code:java} > create temporary view identifier('v1') as (select my_col from (values (1), > (2), (1) as (my_col)) group by 1); > -- > cache table identifier('t1') as (select my_col from (values (1), (2), (1) as > (my_col)) group by 1); > -- > create table identifier('t2') as (select my_col from (values (1), (2), (1) > as (my_col)) group by 1); > insert into identifier('t2') select my_col from (values (3) as (my_col)) > group by 1; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48159) Datetime expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48159. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46618 [https://github.com/apache/spark/pull/46618] > Datetime expressions (all collations) > - > > Key: SPARK-48159 > URL: https://issues.apache.org/jira/browse/SPARK-48159 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48273) Late rewrite of PlanWithUnresolvedIdentifier
[ https://issues.apache.org/jira/browse/SPARK-48273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48273. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46580 [https://github.com/apache/spark/pull/46580] > Late rewrite of PlanWithUnresolvedIdentifier > > > Key: SPARK-48273 > URL: https://issues.apache.org/jira/browse/SPARK-48273 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > PlanWithUnresolvedIdentifier is rewritten later in analysis, which causes > rules like > SubstituteUnresolvedOrdinals to miss the new plan. This causes the following > queries to fail: > {code:java} > create temporary view identifier('v1') as (select my_col from (values (1), > (2), (1) as (my_col)) group by 1); > -- > cache table identifier('t1') as (select my_col from (values (1), (2), (1) as > (my_col)) group by 1); > -- > create table identifier('t2') as (select my_col from (values (1), (2), (1) > as (my_col)) group by 1); > insert into identifier('t2') select my_col from (values (3) as (my_col)) > group by 1; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46841) Language support for collations
[ https://issues.apache.org/jira/browse/SPARK-46841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46841. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46180 [https://github.com/apache/spark/pull/46180] > Language support for collations > --- > > Key: SPARK-46841 > URL: https://issues.apache.org/jira/browse/SPARK-46841 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Languages and localization for collations are supported by the ICU library. > The collation naming format is as follows: > {code:java} > <2-letter language code>[_<4-letter script>][_<3-letter country > code>][_specifier_specifier...]{code} > The locale specifier consists of the first part of the collation name (language + > script + country). Locale specifiers need to be stable across ICU versions; > to keep existing ids and names invariant, we introduce a golden file with the locale > table, which should cause CI failure on any silent changes. > Currently supported optional specifiers: > * CS/CI - case sensitivity, default is case-sensitive; supported by > configuring ICU collation levels > * AS/AI - accent sensitivity; default is accent-sensitive; supported by > configuring ICU collation levels > * /LCASE/UCASE - case conversion performed prior to > comparisons; supported by internal implementation relying on ICU locale-aware > conversions > Users can use collation specifiers in any order, except for the locale, which is > mandatory and must go first. There is a one-to-one mapping between collation > ids and collation names defined in CollationFactory. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
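The naming format in the SPARK-46841 description above can be made concrete with a short Python sketch (an illustration of the documented format only; function name and return shape are assumptions, and Spark's CollationFactory additionally validates names against ICU): the locale part (language, optional 4-letter script, optional 3-letter country) must come first, followed by specifiers in any order.

```python
# Illustrative parser for the documented collation name format:
# <2-letter language>[_<4-letter script>][_<3-letter country>][_specifier...]
def parse_collation_name(name):
    parts = name.split("_")
    locale = {"language": parts[0]}
    i = 1
    # A 4-letter alphabetic token right after the language is a script.
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        locale["script"] = parts[i]
        i += 1
    # A 3-letter alphabetic token next is a country code.
    if i < len(parts) and len(parts[i]) == 3 and parts[i].isalpha():
        locale["country"] = parts[i]
        i += 1
    specifiers = parts[i:]  # e.g. CI, AI, LCASE -- order-insensitive
    return locale, specifiers

print(parse_collation_name("sr_Cyrl_SRB_CI_AI"))
# -> ({'language': 'sr', 'script': 'Cyrl', 'country': 'SRB'}, ['CI', 'AI'])
```

Because the locale tokens are distinguished purely by length here, this sketch also shows why the locale must come first: trailing specifiers like `CI` or `LCASE` can then be consumed order-independently.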
[jira] [Assigned] (SPARK-46841) Language support for collations
[ https://issues.apache.org/jira/browse/SPARK-46841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-46841: --- Assignee: Nikola Mandic > Language support for collations > --- > > Key: SPARK-46841 > URL: https://issues.apache.org/jira/browse/SPARK-46841 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > > Languages and localization for collations are supported by the ICU library. > The collation naming format is as follows: > {code:java} > <2-letter language code>[_<4-letter script>][_<3-letter country > code>][_specifier_specifier...]{code} > The locale specifier consists of the first part of the collation name (language + > script + country). Locale specifiers need to be stable across ICU versions; > to keep existing ids and names invariant, we introduce a golden file with the locale > table, which should cause CI failure on any silent changes. > Currently supported optional specifiers: > * CS/CI - case sensitivity, default is case-sensitive; supported by > configuring ICU collation levels > * AS/AI - accent sensitivity; default is accent-sensitive; supported by > configuring ICU collation levels > * /LCASE/UCASE - case conversion performed prior to > comparisons; supported by internal implementation relying on ICU locale-aware > conversions > Users can use collation specifiers in any order, except for the locale, which is > mandatory and must go first. There is a one-to-one mapping between collation > ids and collation names defined in CollationFactory. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48364) Type casting for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48364: --- Assignee: Uroš Bojanić > Type casting for AbstractMapType > > > Key: SPARK-48364 > URL: https://issues.apache.org/jira/browse/SPARK-48364 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48364) Type casting for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48364. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46661 [https://github.com/apache/spark/pull/46661] > Type casting for AbstractMapType > > > Key: SPARK-48364 > URL: https://issues.apache.org/jira/browse/SPARK-48364 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48215) DateFormatClass (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48215. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46561 [https://github.com/apache/spark/pull/46561] > DateFormatClass (all collations) > > > Key: SPARK-48215 > URL: https://issues.apache.org/jira/browse/SPARK-48215 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *DateFormatClass* built-in function in > Spark. First confirm the expected behaviour for this expression when > given collated strings, and then move on to implementation and testing. You > will find this expression in the *datetimeExpressions.scala* file, and it > should be considered a pass-through function with respect to collation > awareness. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how this function should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *DateFormatClass* > expression so that it supports all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and the > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48215) DateFormatClass (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48215: --- Assignee: Nebojsa Savic > DateFormatClass (all collations) > > > Key: SPARK-48215 > URL: https://issues.apache.org/jira/browse/SPARK-48215 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *DateFormatClass* built-in function in > Spark. First confirm the expected behaviour for this expression when > given collated strings, and then move on to implementation and testing. You > will find this expression in the *datetimeExpressions.scala* file, and it > should be considered a pass-through function with respect to collation > awareness. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how this function should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *DateFormatClass* > expression so that it supports all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and the > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48305) CurrentLike - Database/Schema, Catalog, User (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48305. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46613 [https://github.com/apache/spark/pull/46613] > CurrentLike - Database/Schema, Catalog, User (all collations) > - > > Key: SPARK-48305 > URL: https://issues.apache.org/jira/browse/SPARK-48305 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48175) Store collation information in metadata and not in type for SER/DE
[ https://issues.apache.org/jira/browse/SPARK-48175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48175: --- Assignee: Stefan Kandic > Store collation information in metadata and not in type for SER/DE > -- > > Key: SPARK-48175 > URL: https://issues.apache.org/jira/browse/SPARK-48175 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Changing serialization and deserialization of collated strings so that the > collation information is put in the metadata of the enclosing struct field - > and then read back from there during parsing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48175) Store collation information in metadata and not in type for SER/DE
[ https://issues.apache.org/jira/browse/SPARK-48175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48175. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46280 [https://github.com/apache/spark/pull/46280] > Store collation information in metadata and not in type for SER/DE > -- > > Key: SPARK-48175 > URL: https://issues.apache.org/jira/browse/SPARK-48175 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Changing serialization and deserialization of collated strings so that the > collation information is put in the metadata of the enclosing struct field - > and then read back from there during parsing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48308) Unify getting data schema without partition columns in FileSourceStrategy
[ https://issues.apache.org/jira/browse/SPARK-48308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48308. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46619 [https://github.com/apache/spark/pull/46619] > Unify getting data schema without partition columns in FileSourceStrategy > - > > Key: SPARK-48308 > URL: https://issues.apache.org/jira/browse/SPARK-48308 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > > In > [FileSourceStrategy,|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L191] > the schema of the data excluding partition columns is computed twice, in > slightly different ways: > > {code:java} > val dataColumnsWithoutPartitionCols = > dataColumns.filterNot(partitionSet.contains) {code} > > vs > {code:java} > val readDataColumns = dataColumns > .filterNot(partitionColumns.contains) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
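The duplication SPARK-48308 points out can be sketched in plain Python (illustrative stand-ins for the Scala snippets above; in Spark these are `Seq.filterNot` calls against an `AttributeSet` vs. a `Seq`): computing the partition-free column list once and reusing it removes the two subtly different filters.

```python
# Illustrative: compute the data columns minus partition columns once,
# instead of filtering twice against two different collections.
data_columns = ["a", "b", "part"]
partition_columns = ["part"]
partition_set = set(partition_columns)  # membership tests in O(1)

# One definition, reused wherever the partition-free schema is needed.
data_columns_without_partition_cols = [
    c for c in data_columns if c not in partition_set
]
read_data_columns = data_columns_without_partition_cols

print(read_data_columns)
# -> ['a', 'b']
```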
[jira] [Assigned] (SPARK-48308) Unify getting data schema without partition columns in FileSourceStrategy
[ https://issues.apache.org/jira/browse/SPARK-48308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48308: --- Assignee: Johan Lasperas > Unify getting data schema without partition columns in FileSourceStrategy > - > > Key: SPARK-48308 > URL: https://issues.apache.org/jira/browse/SPARK-48308 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Trivial > Labels: pull-request-available > > In > [FileSourceStrategy,|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L191] > the schema of the data excluding partition columns is computed twice, in > slightly different ways: > > {code:java} > val dataColumnsWithoutPartitionCols = > dataColumns.filterNot(partitionSet.contains) {code} > > vs > {code:java} > val readDataColumns = dataColumns > .filterNot(partitionColumns.contains) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48288) Add source data type to connector.Cast expression
[ https://issues.apache.org/jira/browse/SPARK-48288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48288. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46596 [https://github.com/apache/spark/pull/46596] > Add source data type to connector.Cast expression > - > > Key: SPARK-48288 > URL: https://issues.apache.org/jira/browse/SPARK-48288 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, > V2ExpressionBuilder builds a connector.Cast expression from a catalyst.Cast > expression. > The catalyst Cast carries its expression's data type, but the connector Cast does not. > Since some casts are not allowed on an external engine, we need to know both the source > and target data types, so that we have finer granularity to block > unsupported casts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
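As a rough illustration of why the source type matters, consider a cast representation that carries both types. The names below (`CastExpr`, `dialectSupports`, the `DataType` enum) are hypothetical sketch names, not Spark's connector API:

```java
public class CastExprSketch {
    // Hedged sketch: these are illustrative types, not Spark's actual
    // connector.Cast or dialect interfaces.
    enum DataType { STRING, BINARY, INT, TIMESTAMP }

    static final class CastExpr {
        final DataType sourceType; // the newly proposed piece of information
        final DataType targetType;
        CastExpr(DataType sourceType, DataType targetType) {
            this.sourceType = sourceType;
            this.targetType = targetType;
        }
    }

    // A hypothetical dialect that supports INT -> STRING but not
    // BINARY -> STRING. Without sourceType, both casts look identical
    // (target STRING) and cannot be told apart.
    static boolean dialectSupports(CastExpr cast) {
        return !(cast.sourceType == DataType.BINARY && cast.targetType == DataType.STRING);
    }

    public static void main(String[] args) {
        System.out.println(dialectSupports(new CastExpr(DataType.INT, DataType.STRING)));    // true
        System.out.println(dialectSupports(new CastExpr(DataType.BINARY, DataType.STRING))); // false
    }
}
```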
[jira] [Resolved] (SPARK-48252) Update CommonExpressionRef when necessary
[ https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48252. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46552 [https://github.com/apache/spark/pull/46552] > Update CommonExpressionRef when necessary > - > > Key: SPARK-48252 > URL: https://issues.apache.org/jira/browse/SPARK-48252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48252) Update CommonExpressionRef when necessary
[ https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48252: --- Assignee: Wenchen Fan > Update CommonExpressionRef when necessary > - > > Key: SPARK-48252 > URL: https://issues.apache.org/jira/browse/SPARK-48252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48172) Fix escaping issues in JDBCDialects
[ https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48172. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46588 [https://github.com/apache/spark/pull/46588] > Fix escaping issues in JDBCDialects > --- > > Key: SPARK-48172 > URL: https://issues.apache.org/jira/browse/SPARK-48172 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48277) Improve error message for ErrorClassesJsonReader.getErrorMessage
[ https://issues.apache.org/jira/browse/SPARK-48277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48277. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46584 [https://github.com/apache/spark/pull/46584] > Improve error message for ErrorClassesJsonReader.getErrorMessage > > > Key: SPARK-48277 > URL: https://issues.apache.org/jira/browse/SPARK-48277 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48160) XPath expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48160: --- Assignee: Uroš Bojanić > XPath expressions (all collations) > -- > > Key: SPARK-48160 > URL: https://issues.apache.org/jira/browse/SPARK-48160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48160) XPath expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48160. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46508 [https://github.com/apache/spark/pull/46508] > XPath expressions (all collations) > -- > > Key: SPARK-48160 > URL: https://issues.apache.org/jira/browse/SPARK-48160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48162) Miscellaneous expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48162: --- Assignee: Uroš Bojanić > Miscellaneous expressions (all collations) > -- > > Key: SPARK-48162 > URL: https://issues.apache.org/jira/browse/SPARK-48162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48162) Miscellaneous expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48162. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46461 [https://github.com/apache/spark/pull/46461] > Miscellaneous expressions (all collations) > -- > > Key: SPARK-48162 > URL: https://issues.apache.org/jira/browse/SPARK-48162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48271) Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
[ https://issues.apache.org/jira/browse/SPARK-48271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-48271: Summary: Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER (was: support char/varchar in RowEncoder) > Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER > - > > Key: SPARK-48271 > URL: https://issues.apache.org/jira/browse/SPARK-48271 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48263) Collate function support for non UTF8_BINARY strings
[ https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48263. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46574 [https://github.com/apache/spark/pull/46574] > Collate function support for non UTF8_BINARY strings > > > Key: SPARK-48263 > URL: https://issues.apache.org/jira/browse/SPARK-48263 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nebojsa Savic >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When the default collation level config is set to a collation other than > UTF8_BINARY (e.g. UTF8_BINARY_LCASE) and we try to execute a COLLATE (or > collation) expression, it fails because the expression only accepts > StringType(0) as the argument for the collation name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48263) Collate function support for non UTF8_BINARY strings
[ https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48263: --- Assignee: Nebojsa Savic > Collate function support for non UTF8_BINARY strings > > > Key: SPARK-48263 > URL: https://issues.apache.org/jira/browse/SPARK-48263 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nebojsa Savic >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > > When the default collation level config is set to a collation other than > UTF8_BINARY (e.g. UTF8_BINARY_LCASE) and we try to execute a COLLATE (or > collation) expression, it fails because the expression only accepts > StringType(0) as the argument for the collation name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
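A schematic Java sketch of the bug shape described above. `StringType` here is a stand-in for a collation-aware string type (not Spark's implementation), with collation id 0 playing the role of UTF8_BINARY:

```java
public class CollateArgCheckSketch {
    // Illustrative model of a string type parameterized by a collation id;
    // not Spark's StringType. Id 0 stands for UTF8_BINARY.
    static final class StringType {
        final int collationId;
        StringType(int collationId) { this.collationId = collationId; }
    }

    // Buggy shape of the check: only StringType(0) is accepted as the
    // collation-name argument, so a literal typed under a non-default
    // session collation is rejected.
    static boolean acceptsNameBuggy(StringType argType) {
        return argType.collationId == 0;
    }

    // Fixed shape: any string type may carry the collation name.
    static boolean acceptsNameFixed(Object argType) {
        return argType instanceof StringType;
    }

    public static void main(String[] args) {
        // A literal produced under, e.g., a UTF8_BINARY_LCASE default collation.
        StringType lcaseLiteral = new StringType(1);
        System.out.println(acceptsNameBuggy(lcaseLiteral)); // false: COLLATE fails
        System.out.println(acceptsNameFixed(lcaseLiteral)); // true
    }
}
```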
[jira] [Resolved] (SPARK-48172) Fix escaping issues in JDBCDialects
[ https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48172. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46437 [https://github.com/apache/spark/pull/46437] > Fix escaping issues in JDBCDialects > --- > > Key: SPARK-48172 > URL: https://issues.apache.org/jira/browse/SPARK-48172 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48155) PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed
[ https://issues.apache.org/jira/browse/SPARK-48155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48155. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46523 [https://github.com/apache/spark/pull/46523] > PropagateEmpty relation cause LogicalQueryStage only with broadcast without > join then execute failed > > > Key: SPARK-48155 > URL: https://issues.apache.org/jira/browse/SPARK-48155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.5.1, 3.3.4 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > 24/05/07 09:48:55 ERROR [main] PlanChangeLogger: > === Applying Rule > org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation === > Project [date#124, station_name#0, shipment_id#14] > +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 > AND station_type#1 IN (3,12)) > +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 > more fields] > ! +- Join LeftOuter, ((cast(date#124 as timestamp) >= > cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, > Some(Asia/Singapore)) as timestamp)) AND (cast(date#124 as timestamp) + > INTERVAL '-4' DAY <= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, > Some(Asia/Singapore)) as timestamp))) > ! :- LogicalQueryStage Generate > explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), > false, [date#124], BroadcastQueryStage 0 > ! +- LocalRelation , [shipment_id#14, station_name#5, ... 3 > more fields]24/05/07 09:48:55 ERROR [main] > Project [date#124, station_name#0, shipment_id#14] > +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 > AND station_type#1 IN (3,12)) > +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 > more fields] > ! +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 > more fields] > ! 
+- LogicalQueryStage Generate > explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), > false, [date#124], BroadcastQueryStage 0 {code} > {code:java} > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > java.lang.UnsupportedOperationException: BroadcastExchange does not support > the execute() code path.at > org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526) > at > org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454) > at > org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497) > at > 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50) > at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at
[jira] [Assigned] (SPARK-48155) PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed
[ https://issues.apache.org/jira/browse/SPARK-48155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48155: --- Assignee: angerszhu > PropagateEmpty relation cause LogicalQueryStage only with broadcast without > join then execute failed > > > Key: SPARK-48155 > URL: https://issues.apache.org/jira/browse/SPARK-48155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.5.1, 3.3.4 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > > {code:java} > 24/05/07 09:48:55 ERROR [main] PlanChangeLogger: > === Applying Rule > org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation === > Project [date#124, station_name#0, shipment_id#14] > +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 > AND station_type#1 IN (3,12)) > +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 > more fields] > ! +- Join LeftOuter, ((cast(date#124 as timestamp) >= > cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, > Some(Asia/Singapore)) as timestamp)) AND (cast(date#124 as timestamp) + > INTERVAL '-4' DAY <= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, > Some(Asia/Singapore)) as timestamp))) > ! :- LogicalQueryStage Generate > explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), > false, [date#124], BroadcastQueryStage 0 > ! +- LocalRelation , [shipment_id#14, station_name#5, ... 3 > more fields]24/05/07 09:48:55 ERROR [main] > Project [date#124, station_name#0, shipment_id#14] > +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 > AND station_type#1 IN (3,12)) > +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 > more fields] > ! +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 > more fields] > ! 
+- LogicalQueryStage Generate > explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), > false, [date#124], BroadcastQueryStage 0 {code} > {code:java} > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > java.lang.UnsupportedOperationException: BroadcastExchange does not support > the execute() code path.at > org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526) > at > org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454) > at > org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497) > at > 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50) > at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at >
[jira] [Created] (SPARK-48271) support char/varchar in RowEncoder
Wenchen Fan created SPARK-48271: --- Summary: support char/varchar in RowEncoder Key: SPARK-48271 URL: https://issues.apache.org/jira/browse/SPARK-48271 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48157) CSV expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48157. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46504 [https://github.com/apache/spark/pull/46504] > CSV expressions (all collations) > > > Key: SPARK-48157 > URL: https://issues.apache.org/jira/browse/SPARK-48157 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for *CSV* built-in string functions in Spark > ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm > the expected behaviour of these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *csvExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *CSV* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48157) CSV expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48157: --- Assignee: Uroš Bojanić > CSV expressions (all collations) > > > Key: SPARK-48157 > URL: https://issues.apache.org/jira/browse/SPARK-48157 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for *CSV* built-in string functions in Spark > ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm > the expected behaviour of these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *csvExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *CSV* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48229) inputFile expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48229. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46503 [https://github.com/apache/spark/pull/46503] > inputFile expressions (all collations) > -- > > Key: SPARK-48229 > URL: https://issues.apache.org/jira/browse/SPARK-48229 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48265: --- Assignee: angerszhu > Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L] > +- > Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > ! : +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) > AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > +- Relation db.table[,... 91 more fields] parquet > ! : +- Relation db.table[,... 91 more fields] parquet > ! +- LocalLimit 21 > ! +- Project [item_id#738L] > ! +- LocalRelation , [, ... 91 more fields] > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian > Products has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no > effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > NormalizeFloatingNumbers has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > ReplaceUpdateFieldsExpression has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only > Query has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has > no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter > has no effect. 
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from > PartitionPruning has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that > cannot be pushed down has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits === > GlobalLimit 21 > GlobalLimit 21 > !+- LocalLimit 21 > +- LocalLimit > least(, ... 2 more fields) > ! +- LocalLimit 21 > +- Project > [item_id#647L] > ! +- Project [item_id#647L] > +- Filter > (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = > BR)) AND isnotnull(grass_region#735)) > ! +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- > Relation db.table[,... 91 more fields] parquet > ! +- Relation db.table[,... 91 more fields] parquet > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48265. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46568 [https://github.com/apache/spark/pull/46568] > Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L] > +- > Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > ! : +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) > AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > +- Relation db.table[,... 91 more fields] parquet > ! : +- Relation db.table[,... 91 more fields] parquet > ! +- LocalLimit 21 > ! +- Project [item_id#738L] > ! +- LocalRelation , [, ... 91 more fields] > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian > Products has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no > effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > NormalizeFloatingNumbers has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > ReplaceUpdateFieldsExpression has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only > Query has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has > no effect. 
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from > PartitionPruning has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that > cannot be pushed down has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits === > GlobalLimit 21 > GlobalLimit 21 > !+- LocalLimit 21 > +- LocalLimit > least(, ... 2 more fields) > ! +- LocalLimit 21 > +- Project > [item_id#647L] > ! +- Project [item_id#647L] > +- Filter > (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = > BR)) AND isnotnull(grass_region#735)) > ! +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- > Relation db.table[,... 91 more fields] parquet > ! +- Relation db.table[,... 91 more fields] parquet > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
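The follow-up constant folding that the report above asks for can be sketched with a toy fold pass (illustrative Python only; the `(op, args)` tree and `fold` helper are hypothetical, not Catalyst code):

```python
# Toy sketch of constant folding (illustrative only, not Catalyst code).
# EliminateLimits combines nested limits into a least(...) over the limit
# values; if all values are literals, a folding pass should reduce the
# whole expression to a single literal instead of leaving least(21, 21).
def fold(expr):
    """Fold constant sub-expressions in a tiny (op, args) expression tree."""
    if isinstance(expr, int):          # a literal is already folded
        return expr
    op, args = expr
    args = [fold(a) for a in args]
    if op == "least" and all(isinstance(a, int) for a in args):
        return min(args)               # e.g. least(21, 21) -> 21
    return (op, args)

assert fold(("least", [21, 21])) == 21
```

Without such a pass, the optimized plan keeps an unevaluated `least(...)` where a plain limit literal would do, as the plan log above shows.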
[jira] [Resolved] (SPARK-48241) CSV parsing failure with char/varchar type columns
[ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48241. - Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46565 [https://github.com/apache/spark/pull/46565] > CSV parsing failure with char/varchar type columns > -- > > Key: SPARK-48241 > URL: https://issues.apache.org/jira/browse/SPARK-48241 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jiayi Liu >Assignee: Jiayi Liu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > CSV table containing char and varchar columns will result in the following > error when selecting from the CSV table: > {code:java} > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code} > The reason for the error is that the StringType columns in the dataSchema and > requiredSchema of UnivocityParser are not consistent. It is due to the > metadata contained in the StringType StructField of the dataSchema, which is > missing in the requiredSchema. 
We need to retain the metadata when resolving > schema.
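The failed `require` above can be illustrated with a toy model (hypothetical `Field` class and `charVarcharType` key, not Spark's StructType): field equality includes metadata, so a requiredSchema whose string fields lost the char/varchar metadata no longer looks like a subset of the dataSchema.

```python
# Hypothetical sketch (not Spark's StructType) of the subset check in
# UnivocityParser: a field that dropped its metadata compares unequal to
# the same field in the dataSchema, so the requirement fails.
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    dtype: str
    metadata: tuple = ()  # stands in for the char/varchar type-string entry

def is_subset(required, data):
    return set(required) <= set(data)

data_schema = [Field("c", "string", (("charVarcharType", "char(5)"),))]
required_lost = [Field("c", "string")]   # metadata dropped during resolution
required_kept = list(data_schema)        # metadata retained: check passes

assert not is_subset(required_lost, data_schema)
assert is_subset(required_kept, data_schema)
```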
[jira] [Assigned] (SPARK-48241) CSV parsing failure with char/varchar type columns
[ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48241: --- Assignee: Jiayi Liu > CSV parsing failure with char/varchar type columns > -- > > Key: SPARK-48241 > URL: https://issues.apache.org/jira/browse/SPARK-48241 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jiayi Liu >Assignee: Jiayi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > CSV table containing char and varchar columns will result in the following > error when selecting from the CSV table: > {code:java} > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code} > The reason for the error is that the StringType columns in the dataSchema and > requiredSchema of UnivocityParser are not consistent. It is due to the > metadata contained in the StringType StructField of the dataSchema, which is > missing in the requiredSchema. We need to retain the metadata when resolving > schema. 
[jira] [Resolved] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48206. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46492 [https://github.com/apache/spark/pull/46492] > Add tests for window expression rewrites in RewriteWithExpression > - > > Key: SPARK-48206 > URL: https://issues.apache.org/jira/browse/SPARK-48206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Window expressions can be potentially problematic if we pull out a window > expression outside a `Window` operator. Right now this shouldn't happen but > we should add some tests to make sure it doesn't break.
[jira] [Assigned] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48206: --- Assignee: Kelvin Jiang > Add tests for window expression rewrites in RewriteWithExpression > - > > Key: SPARK-48206 > URL: https://issues.apache.org/jira/browse/SPARK-48206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Window expressions can be potentially problematic if we pull out a window > expression outside a `Window` operator. Right now this shouldn't happen but > we should add some tests to make sure it doesn't break.
[jira] [Assigned] (SPARK-48031) Add schema evolution options to views
[ https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48031: --- Assignee: Serge Rielau > Add schema evolution options to views > -- > > Key: SPARK-48031 > URL: https://issues.apache.org/jira/browse/SPARK-48031 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > > We want to provide the ability for views to react to changes in the query > resolution in ways other than just failing the view. > For example, we want the view to be able to compensate for type changes by > casting the query result to the view column types, > or to adopt column arity changes into the view.
[jira] [Resolved] (SPARK-48031) Add schema evolution options to views
[ https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48031. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46267 [https://github.com/apache/spark/pull/46267] > Add schema evolution options to views > -- > > Key: SPARK-48031 > URL: https://issues.apache.org/jira/browse/SPARK-48031 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We want to provide the ability for views to react to changes in the query > resolution in ways other than just failing the view. > For example, we want the view to be able to compensate for type changes by > casting the query result to the view column types, > or to adopt column arity changes into the view.
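The "compensate for type changes by casting" behaviour described in the ticket above can be sketched as follows (a hypothetical helper for illustration, not Spark's view resolution code):

```python
# Illustrative sketch of compensating for type drift: cast each
# query-result column to the declared view column type instead of
# failing the view when the underlying query's types change.
def compensate(row, view_types):
    casts = {"int": int, "double": float, "string": str}
    return tuple(casts[t](v) for v, t in zip(row, view_types))

# The underlying table changed a column from int to string; the view
# still declares int, so the value is cast back to the view's type.
assert compensate(("42", 1.5), ("int", "double")) == (42, 1.5)
```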
[jira] [Created] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite
Wenchen Fan created SPARK-48260: --- Summary: disable output committer coordination in one test of ParquetIOSuite Key: SPARK-48260 URL: https://issues.apache.org/jira/browse/SPARK-48260 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan
[jira] [Created] (SPARK-48252) Update CommonExpressionRef when necessary
Wenchen Fan created SPARK-48252: --- Summary: Update CommonExpressionRef when necessary Key: SPARK-48252 URL: https://issues.apache.org/jira/browse/SPARK-48252 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan
[jira] [Assigned] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48146: --- Assignee: Kelvin Jiang > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code}
[jira] [Resolved] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48146. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46443 [https://github.com/apache/spark/pull/46443] > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code}
[jira] [Resolved] (SPARK-48158) XML expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48158. - Fix Version/s: 4.0.0 Assignee: Uroš Bojanić Resolution: Fixed > XML expressions (all collations) > > > Key: SPARK-48158 > URL: https://issues.apache.org/jira/browse/SPARK-48158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *XML* built-in string functions in Spark > ({*}XmlToStructs{*}, {*}SchemaOfXml{*}, {*}StructsToXml{*}). First confirm > what the expected behaviour is for these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *xmlExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *XML* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class.
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
[ https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48222: --- Assignee: Nicholas Chammas > Sync Ruby Bundler to 2.4.22 and refresh Gem lock file > - > > Key: SPARK-48222 > URL: https://issues.apache.org/jira/browse/SPARK-48222 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
[ https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48222. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46512 [https://github.com/apache/spark/pull/46512] > Sync Ruby Bundler to 2.4.22 and refresh Gem lock file > - > > Key: SPARK-48222 > URL: https://issues.apache.org/jira/browse/SPARK-48222 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47409: --- Assignee: David Milicevic > StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) > -- > > Key: SPARK-47409 > URL: https://issues.apache.org/jira/browse/SPARK-47409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringTrim* built-in string function in > Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, > {*}StringTrimRight{*}). First confirm what the expected behaviour is for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTrim* function so it > supports the binary & lowercase collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith).
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
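To make the ticket above concrete, here is a toy sketch of what StringTrim under the lowercase collation means (illustrative Python, not Spark's implementation): trim-set membership is case-insensitive, unlike the binary collation where 'x' and 'X' are distinct.

```python
# Toy lowercase-collation trim: compare characters case-insensitively
# when deciding whether they belong to the trim set.
def trim_lcase(s, trim_chars):
    trim = {c.lower() for c in trim_chars}
    start, end = 0, len(s)
    while start < end and s[start].lower() in trim:
        start += 1
    while end > start and s[end - 1].lower() in trim:
        end -= 1
    return s[start:end]

assert trim_lcase("xXhelloXx", "x") == "hello"   # binary trim would keep 'X'
assert trim_lcase("  spark  ", " ") == "spark"
```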
[jira] [Resolved] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47409. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46206 [https://github.com/apache/spark/pull/46206] > StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) > -- > > Key: SPARK-47409 > URL: https://issues.apache.org/jira/browse/SPARK-47409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringTrim* built-in string function in > Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, > {*}StringTrimRight{*}). First confirm what the expected behaviour is for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTrim* function so it > supports the binary & lowercase collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith).
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-47421) URL expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47421. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46460 [https://github.com/apache/spark/pull/46460] > URL expressions (all collations) > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47421) URL expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47421: --- Assignee: Uroš Bojanić > URL expressions (all collations) > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-47354) Variant expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47354. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46424 [https://github.com/apache/spark/pull/46424] > Variant expressions (all collations) > > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47354) Variant expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47354: --- Assignee: Uroš Bojanić > Variant expressions (all collations) > > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-48186) Add support for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48186. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46458 [https://github.com/apache/spark/pull/46458] > Add support for AbstractMapType > --- > > Key: SPARK-48186 > URL: https://issues.apache.org/jira/browse/SPARK-48186 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48186) Add support for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48186: --- Assignee: Uroš Bojanić > Add support for AbstractMapType > --- > > Key: SPARK-48186 > URL: https://issues.apache.org/jira/browse/SPARK-48186 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-48197) avoid assert error for invalid lambda function
[ https://issues.apache.org/jira/browse/SPARK-48197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48197. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46475 [https://github.com/apache/spark/pull/46475] > avoid assert error for invalid lambda function > -- > > Key: SPARK-48197 > URL: https://issues.apache.org/jira/browse/SPARK-48197 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0
[jira] [Created] (SPARK-48204) fix release script for Spark 4.0+
Wenchen Fan created SPARK-48204: --- Summary: fix release script for Spark 4.0+ Key: SPARK-48204 URL: https://issues.apache.org/jira/browse/SPARK-48204 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 4.0.0 Reporter: Wenchen Fan