[jira] [Updated] (SPARK-48578) Add new expressions for UTF8 string validation
[ https://issues.apache.org/jira/browse/SPARK-48578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48578:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add new expressions for UTF8 string validation
>
> Key: SPARK-48578
> URL: https://issues.apache.org/jira/browse/SPARK-48578
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Uroš Bojanić
> Priority: Major
> Labels: pull-request-available

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48577) Replace invalid byte sequences in UTF8Strings
[ https://issues.apache.org/jira/browse/SPARK-48577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48577:
-----------------------------------
    Labels: pull-request-available  (was: )

> Replace invalid byte sequences in UTF8Strings
>
> Key: SPARK-48577
> URL: https://issues.apache.org/jira/browse/SPARK-48577
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Uroš Bojanić
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-48582) Bump `braces` from 3.0.2 to 3.0.3 in /ui-test
[ https://issues.apache.org/jira/browse/SPARK-48582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48582:
-----------------------------------
    Labels: pull-request-available  (was: )

> Bump `braces` from 3.0.2 to 3.0.3 in /ui-test
>
> Key: SPARK-48582
> URL: https://issues.apache.org/jira/browse/SPARK-48582
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-48582) Bump `braces` from 3.0.2 to 3.0.3 in /ui-test
Yang Jie created SPARK-48582:
--------------------------------

             Summary: Bump `braces` from 3.0.2 to 3.0.3 in /ui-test
                 Key: SPARK-48582
                 URL: https://issues.apache.org/jira/browse/SPARK-48582
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 4.0.0
            Reporter: Yang Jie
[jira] [Updated] (SPARK-48581) Upgrade dropwizard metrics to 4.2.26
[ https://issues.apache.org/jira/browse/SPARK-48581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Guo updated SPARK-48581:
----------------------------
    Summary: Upgrade dropwizard metrics to 4.2.26  (was: Upgrade dropwizard metrics 4.2.26)

> Upgrade dropwizard metrics to 4.2.26
>
> Key: SPARK-48581
> URL: https://issues.apache.org/jira/browse/SPARK-48581
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Wei Guo
> Priority: Minor
> Labels: pull-request-available
[jira] [Updated] (SPARK-48581) Upgrade dropwizard metrics 4.2.26
[ https://issues.apache.org/jira/browse/SPARK-48581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48581:
-----------------------------------
    Labels: pull-request-available  (was: )

> Upgrade dropwizard metrics 4.2.26
>
> Key: SPARK-48581
> URL: https://issues.apache.org/jira/browse/SPARK-48581
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Wei Guo
> Priority: Minor
> Labels: pull-request-available
[jira] [Created] (SPARK-48581) Upgrade dropwizard metrics 4.2.26
Wei Guo created SPARK-48581:
-------------------------------

             Summary: Upgrade dropwizard metrics 4.2.26
                 Key: SPARK-48581
                 URL: https://issues.apache.org/jira/browse/SPARK-48581
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 4.0.0
            Reporter: Wei Guo
[jira] [Resolved] (SPARK-48565) Fix thread dump display in UI
[ https://issues.apache.org/jira/browse/SPARK-48565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48565.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46916
[https://github.com/apache/spark/pull/46916]

> Fix thread dump display in UI
>
> Key: SPARK-48565
> URL: https://issues.apache.org/jira/browse/SPARK-48565
> Project: Spark
> Issue Type: Bug
> Components: UI
> Affects Versions: 4.0.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48565) Fix thread dump display in UI
[ https://issues.apache.org/jira/browse/SPARK-48565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-48565:
--------------------------------
    Assignee: Cheng Pan

> Fix thread dump display in UI
>
> Key: SPARK-48565
> URL: https://issues.apache.org/jira/browse/SPARK-48565
> Project: Spark
> Issue Type: Bug
> Components: UI
> Affects Versions: 4.0.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48563) Upgrade pickle to 1.5
[ https://issues.apache.org/jira/browse/SPARK-48563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie resolved SPARK-48563.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46913
[https://github.com/apache/spark/pull/46913]

> Upgrade pickle to 1.5
>
> Key: SPARK-48563
> URL: https://issues.apache.org/jira/browse/SPARK-48563
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyajun02 updated SPARK-48580:
-------------------------------
    Description:

When push-based shuffle is enabled, 0.03% of the Spark applications in our cluster experienced shuffle data loss. The metrics of the Exchange are as follows:

!image-2024-06-11-10-19-57-227.png|width=405,height=170!

We eventually found some WARN logs on the shuffle server:

{code:java}
WARN shuffle-server-8-216 org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to index/meta failed{code}

We then analyzed the cause from the code: the merge metadata obtained by the reduce side from the driver comes from the {{mapTracker}} in the server's memory, while the actual reading of chunk data is based on the records in the shuffle server's {{metaFile}}. There is no consistency check between the two.

    (was: the same description, with "The metrics for the job execution plan's Exchange are as follows" instead of "The metrics of the Exchange are as follows")

> MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
>
> Key: SPARK-48580
> URL: https://issues.apache.org/jira/browse/SPARK-48580
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0
> Reporter: gaoyajun02
> Priority: Major
> Attachments: image-2024-06-11-10-19-57-227.png
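The report above describes two metadata sources with no cross-check: the chunk counts the driver hands the reduce side (from the server's in-memory mapTracker) and the chunks actually recorded in the shuffle server's meta file. A minimal plain-Python sketch of the kind of consistency check the reporter says is missing (hypothetical names, not Spark's actual classes):

```python
# Hypothetical sketch, not Spark's actual API: before serving a merged
# block, compare the chunk count the driver's merge metadata reports
# with what the shuffle server's meta file actually recorded.

def validate_merged_block(driver_chunk_count, metafile_chunk_counts, reduce_id):
    """Return the chunk count to serve, or raise if the two sources disagree."""
    recorded = metafile_chunk_counts.get(reduce_id)
    if recorded is None:
        raise RuntimeError(f"reduceId {reduce_id}: no entry in meta file")
    if recorded != driver_chunk_count:
        # A mismatch means the index/meta update failed after the driver's
        # mapTracker was updated; the reader should fall back to fetching
        # the original (unmerged) blocks rather than read a partial merge.
        raise RuntimeError(
            f"reduceId {reduce_id}: driver reports {driver_chunk_count} "
            f"chunks, meta file has {recorded}"
        )
    return recorded

# Consistent case passes through:
assert validate_merged_block(4, {133: 4}, 133) == 4
```

On a mismatch (e.g. the driver reports 5 chunks but the meta file has 4, as in the WARN log scenario) the sketch raises instead of silently serving fewer chunks.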
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyajun02 updated SPARK-48580:
-------------------------------
    Description:

When push-based shuffle is enabled, 0.03% of the Spark applications in our cluster experienced shuffle data loss. The metrics for the job execution plan's Exchange are as follows:

!image-2024-06-11-10-19-57-227.png|width=405,height=170!

We eventually found some WARN logs on the shuffle server:

{code:java}
WARN shuffle-server-8-216 org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to index/meta failed{code}

We then analyzed the cause from the code: the merge metadata obtained by the reduce side from the driver comes from the {{mapTracker}} in the server's memory, while the actual reading of chunk data is based on the records in the shuffle server's {{metaFile}}. There is no consistency check between the two.

    (was: an empty description)

> MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
>
> Key: SPARK-48580
> URL: https://issues.apache.org/jira/browse/SPARK-48580
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0
> Reporter: gaoyajun02
> Priority: Major
> Attachments: image-2024-06-11-10-19-57-227.png
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyajun02 updated SPARK-48580:
-------------------------------
    Attachment: (was: image-2024-06-11-10-19-22-284.png)

> MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
>
> Key: SPARK-48580
> URL: https://issues.apache.org/jira/browse/SPARK-48580
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0
> Reporter: gaoyajun02
> Priority: Major
> Attachments: image-2024-06-11-10-19-57-227.png
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyajun02 updated SPARK-48580:
-------------------------------
    Attachment: image-2024-06-11-10-19-57-227.png

> MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
>
> Key: SPARK-48580
> URL: https://issues.apache.org/jira/browse/SPARK-48580
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0
> Reporter: gaoyajun02
> Priority: Major
> Attachments: image-2024-06-11-10-19-22-284.png, image-2024-06-11-10-19-57-227.png
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyajun02 updated SPARK-48580:
-------------------------------
    Attachment: image-2024-06-11-10-19-22-284.png

> MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
>
> Key: SPARK-48580
> URL: https://issues.apache.org/jira/browse/SPARK-48580
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0
> Reporter: gaoyajun02
> Priority: Major
> Attachments: image-2024-06-11-10-19-22-284.png
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyajun02 updated SPARK-48580:
-------------------------------
    Summary: MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data  (was: The merge blocks read by reduce have missing chunks, leading to inconsistent shuffle data)

> MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
>
> Key: SPARK-48580
> URL: https://issues.apache.org/jira/browse/SPARK-48580
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0
> Reporter: gaoyajun02
> Priority: Major
[jira] [Created] (SPARK-48580) The merge blocks read by reduce have missing chunks, leading to inconsistent shuffle data
gaoyajun02 created SPARK-48580:
----------------------------------

             Summary: The merge blocks read by reduce have missing chunks, leading to inconsistent shuffle data
                 Key: SPARK-48580
                 URL: https://issues.apache.org/jira/browse/SPARK-48580
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.5.0, 3.4.0, 3.3.0, 3.2.0
            Reporter: gaoyajun02
[jira] [Updated] (SPARK-48480) StreamingQueryListener thread should not be interruptable
[ https://issues.apache.org/jira/browse/SPARK-48480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48480:
-----------------------------------
    Labels: pull-request-available  (was: )

> StreamingQueryListener thread should not be interruptable
>
> Key: SPARK-48480
> URL: https://issues.apache.org/jira/browse/SPARK-48480
> Project: Spark
> Issue Type: New Feature
> Components: Connect, SS
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48569) Connect - StreamingQuery.name should return null when not specified
[ https://issues.apache.org/jira/browse/SPARK-48569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48569.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46920
[https://github.com/apache/spark/pull/46920]

> Connect - StreamingQuery.name should return null when not specified
>
> Key: SPARK-48569
> URL: https://issues.apache.org/jira/browse/SPARK-48569
> Project: Spark
> Issue Type: New Feature
> Components: Connect, SS
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48410) Fix InitCap expression
[ https://issues.apache.org/jira/browse/SPARK-48410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-48410:
-----------------------------------
    Assignee: Uroš Bojanić

> Fix InitCap expression
>
> Key: SPARK-48410
> URL: https://issues.apache.org/jira/browse/SPARK-48410
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Uroš Bojanić
> Assignee: Uroš Bojanić
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48410) Fix InitCap expression
[ https://issues.apache.org/jira/browse/SPARK-48410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-48410.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46732
[https://github.com/apache/spark/pull/46732]

> Fix InitCap expression
>
> Key: SPARK-48410
> URL: https://issues.apache.org/jira/browse/SPARK-48410
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Uroš Bojanić
> Assignee: Uroš Bojanić
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-48403) Fix Upper & Lower expressions for UTF8_BINARY_LCASE
[ https://issues.apache.org/jira/browse/SPARK-48403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-48403.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46720
[https://github.com/apache/spark/pull/46720]

> Fix Upper & Lower expressions for UTF8_BINARY_LCASE
>
> Key: SPARK-48403
> URL: https://issues.apache.org/jira/browse/SPARK-48403
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Uroš Bojanić
> Assignee: Uroš Bojanić
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-48564) Propagate cached schema in set operations
[ https://issues.apache.org/jira/browse/SPARK-48564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48564.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46915
[https://github.com/apache/spark/pull/46915]

> Propagate cached schema in set operations
>
> Key: SPARK-48564
> URL: https://issues.apache.org/jira/browse/SPARK-48564
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48564) Propagate cached schema in set operations
[ https://issues.apache.org/jira/browse/SPARK-48564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48564:
------------------------------------
    Assignee: Ruifeng Zheng

> Propagate cached schema in set operations
>
> Key: SPARK-48564
> URL: https://issues.apache.org/jira/browse/SPARK-48564
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-48342) [M0] Parser support
[ https://issues.apache.org/jira/browse/SPARK-48342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-48342:
--------------------------------------
    Assignee: (was: Apache Spark)

> [M0] Parser support
>
> Key: SPARK-48342
> URL: https://issues.apache.org/jira/browse/SPARK-48342
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: David Milicevic
> Priority: Major
> Labels: pull-request-available
>
> Implement the parser for SQL scripting, with all supporting changes for the upcoming interpreter implementation and future extensions of the parser:
> * Parser - support only compound statements
> * Parser testing
>
> For more details, the design doc can be found in the parent Jira item.
[jira] [Assigned] (SPARK-48342) [M0] Parser support
[ https://issues.apache.org/jira/browse/SPARK-48342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-48342:
--------------------------------------
    Assignee: Apache Spark

> [M0] Parser support
>
> Key: SPARK-48342
> URL: https://issues.apache.org/jira/browse/SPARK-48342
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: David Milicevic
> Assignee: Apache Spark
> Priority: Major
> Labels: pull-request-available
>
> Implement the parser for SQL scripting, with all supporting changes for the upcoming interpreter implementation and future extensions of the parser:
> * Parser - support only compound statements
> * Parser testing
>
> For more details, the design doc can be found in the parent Jira item.
[jira] [Updated] (SPARK-48579) [M1] Merge DatabricksSqlParser with SparkSqlParser
[ https://issues.apache.org/jira/browse/SPARK-48579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Milicevic updated SPARK-48579:
------------------------------------
    Description:

OSS has a single parser for SQL statements - SparkSqlParser. However, in the runtime we have an additional parser - DatabricksSqlParser. This parser is used for edge SQL rules, but there is no clear separation because folks keep adding edge rules/features to SparkSqlParser as well.

More details can be found in this design doc comment: [https://docs.google.com/document/d/1DIsMf2LQJvD4UC5JR1-YBH6Rv3UvGK3dD2QTOf_0SK4/edit?disco=AAABNCpDpdo]

It seems like there is no reason not to merge these two parsers into one, but it needs to be investigated before refactoring.

    (was: the same description, with the design doc comment given as a broken Jira link instead of a bare URL)

> [M1] Merge DatabricksSqlParser with SparkSqlParser
>
> Key: SPARK-48579
> URL: https://issues.apache.org/jira/browse/SPARK-48579
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: David Milicevic
> Priority: Major
[jira] [Updated] (SPARK-48579) [M1] Merge DatabricksSqlParser with SparkSqlParser
[ https://issues.apache.org/jira/browse/SPARK-48579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Milicevic updated SPARK-48579:
------------------------------------
    Description:

OSS has a single parser for SQL statements - SparkSqlParser. However, in the runtime we have an additional parser - DatabricksSqlParser. This parser is used for edge SQL rules, but there is no clear separation because folks keep adding edge rules/features to SparkSqlParser as well.

More details can be found in [this design doc comment|https://docs.google.com/document/d/1DIsMf2LQJvD4UC5JR1-YBH6Rv3UvGK3dD2QTOf_0SK4/edit?disco=AAABNCpDpdo]

It seems like there is no reason not to merge these two parsers into one, but it needs to be investigated before refactoring.

    (was: the same description, with a trailing period inside the link markup)

> [M1] Merge DatabricksSqlParser with SparkSqlParser
>
> Key: SPARK-48579
> URL: https://issues.apache.org/jira/browse/SPARK-48579
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: David Milicevic
> Priority: Major
[jira] [Updated] (SPARK-48579) [M1] Merge DatabricksSqlParser with SparkSqlParser
[ https://issues.apache.org/jira/browse/SPARK-48579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48579: Description: OSS has single parser for SQL statements - SparkSqlParser. However, in runtime we have additional parser - DatabricksSqlParser. This parser is used for edge SQL rules, but there is no clear separation because folks keep adding edge rules/features to the SparkSqlParser as well. More details can be found in [this design doc comment|[https://docs.google.com/document/d/1DIsMf2LQJvD4UC5JR1-YBH6Rv3UvGK3dD2QTOf_0SK4/edit?disco=AAABNCpDpdo]]. It seems like there is no reason not to merge these two parsers into one, but it needs to be investigated first before refactoring. was: OSS has single parser for SQL statements - SparkSqlParser. However, in runtime we have additional parser - DatabricksSqlParser. This parser is used for edge SQL rules, but there is no clear separation because folks keep adding edge rules/features to the SparkSqlParser as well. More details can be found in [this design doc comment|[https://docs.google.com/document/d/1DIsMf2LQJvD4UC5JR1-YBH6Rv3UvGK3dD2QTOf_0SK4/edit?disco=AAABNCpDpdo].] It seems like there is no reason not to merge these two parsers into one, but it needs to be investigated first before refactoring. > [M1] Merge DatabricksSqlParser with SparkSqlParser > -- > > Key: SPARK-48579 > URL: https://issues.apache.org/jira/browse/SPARK-48579 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > OSS has single parser for SQL statements - SparkSqlParser. > However, in runtime we have additional parser - DatabricksSqlParser. This > parser is used for edge SQL rules, but there is no clear separation because > folks keep adding edge rules/features to the SparkSqlParser as well. 
> More details can be found in [this design doc comment|https://docs.google.com/document/d/1DIsMf2LQJvD4UC5JR1-YBH6Rv3UvGK3dD2QTOf_0SK4/edit?disco=AAABNCpDpdo]. > > It seems there is no reason not to merge these two parsers into one, but this needs to be investigated before refactoring. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48579) [M1] Merge DatabricksSqlParser with SparkSqlParser
David Milicevic created SPARK-48579: --- Summary: [M1] Merge DatabricksSqlParser with SparkSqlParser Key: SPARK-48579 URL: https://issues.apache.org/jira/browse/SPARK-48579 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: David Milicevic OSS has a single parser for SQL statements - SparkSqlParser. However, at runtime we have an additional parser - DatabricksSqlParser. This parser is used for edge SQL rules, but there is no clear separation because folks keep adding edge rules/features to SparkSqlParser as well. More details can be found in [this design doc comment|https://docs.google.com/document/d/1DIsMf2LQJvD4UC5JR1-YBH6Rv3UvGK3dD2QTOf_0SK4/edit?disco=AAABNCpDpdo]. It seems there is no reason not to merge these two parsers into one, but this needs to be investigated before refactoring. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
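For readers unfamiliar with the setup described above, the relationship between the two parsers can be sketched as a delegation pattern. This is only an illustration: the real SparkSqlParser and DatabricksSqlParser are Scala classes, and every class, method, and keyword below is invented for the example.

```python
# Illustrative sketch only: class names, method signatures, and the
# "EDGE RULE" keyword are all hypothetical, not Spark's actual API.
class MainParser:
    """Stands in for SparkSqlParser: handles general SQL statements."""
    def parse(self, sql):
        return ("main", sql)

class EdgeParser:
    """Stands in for DatabricksSqlParser: handles a few edge rules and
    delegates everything else to the main parser."""
    def __init__(self, delegate):
        self.delegate = delegate
        self.edge_prefixes = ("EDGE RULE",)  # hypothetical edge-rule syntax

    def parse(self, sql):
        if sql.upper().startswith(self.edge_prefixes):
            return ("edge", sql)
        # Statements not recognized here fall through to the main parser.
        # Merging the two parsers would eliminate this delegation step.
        return self.delegate.parse(sql)
```

With a single merged parser, the fallback branch disappears and edge rules become ordinary grammar rules, which is the refactoring the ticket proposes to investigate.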
[jira] [Commented] (SPARK-48311) Nested pythonUDF in groupBy and aggregate result in Binding Exception
[ https://issues.apache.org/jira/browse/SPARK-48311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853582#comment-17853582 ] Sumit Singh commented on SPARK-48311: - I have created the PR based on the details explained in the design doc.
> Nested pythonUDF in groupBy and aggregate result in Binding Exception
> --
>
> Key: SPARK-48311
> URL: https://issues.apache.org/jira/browse/SPARK-48311
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 3.3.2
> Reporter: Sumit Singh
> Priority: Major
> Labels: pull-request-available
>
> Steps to Reproduce
> 1. Data creation
> {code:python}
> from datetime import datetime
> from pyspark.sql import SparkSession
> from pyspark.sql.types import StructType, StructField, LongType, TimestampType, StringType
>
> spark = SparkSession.builder.getOrCreate()
>
> # Define the schema
> schema = StructType([
>     StructField("col1", LongType(), nullable=True),
>     StructField("col2", TimestampType(), nullable=True),
>     StructField("col3", StringType(), nullable=True)
> ])
>
> # Define the data
> data = [
>     (1, datetime(2023, 5, 15, 12, 30), "Discount"),
>     (2, datetime(2023, 5, 16, 16, 45), "Promotion"),
>     (3, datetime(2023, 5, 17, 9, 15), "Coupon")
> ]
>
> # Create the DataFrame and register it as a temporary view
> df = spark.createDataFrame(data, schema)
> df.createOrReplaceTempView("temp_offers")
>
> # Query the temporary view using SQL.
> # DISTINCT is required to reproduce the issue.
> testDf = spark.sql("""
>     SELECT DISTINCT col1, col2, col3 FROM temp_offers
> """) {code}
> 2. UDF registration
> {code:python}
> import pyspark.sql.functions as F
> import pyspark.sql.types as T
>
> # Create UDF functions
> def udf1(d):
>     return d
>
> def udf2(d):
>     if d.isoweekday() in (1, 2, 3, 4):
>         return 'WEEKDAY'
>     else:
>         return 'WEEKEND'
>
> udf1_name = F.udf(udf1, T.TimestampType())
> udf2_name = F.udf(udf2, T.StringType()) {code}
> 3. Adding UDFs in grouping and aggregation
> {code:python}
> groupBy_cols = ['col1', 'col4', 'col5', 'col3']
> temp = testDf \
>     .select('*', udf1_name(F.col('col2')).alias('col4')) \
>     .select('*', udf2_name('col4').alias('col5'))
> result = temp.groupBy(*groupBy_cols).agg(F.countDistinct('col5').alias('col6')) {code}
> 4. Result
> {code:python}
> result.show(5, False) {code}
> *We get the error below:*
> {code:java}
> An error was encountered:
> An error occurred while calling o1079.showString.
> : java.lang.IllegalStateException: Couldn't find pythonUDF0#1108 in [col1#978L,groupingPythonUDF#1104,groupingPythonUDF#1105,col3#980,count(pythonUDF0#1108)#1080L]
>     at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>     at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
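For reference, the result the failing query should produce can be computed without Spark. The sketch below mirrors the reproduction in plain Python: `classify` reimplements `udf2`, `udf1` is the identity, and the grouping and distinct count are done by hand. The helper name `classify` is invented for this illustration.

```python
from datetime import datetime

def classify(d):
    # Mirrors udf2 from the reproduction: Mon-Thu -> WEEKDAY, Fri-Sun -> WEEKEND
    return 'WEEKDAY' if d.isoweekday() in (1, 2, 3, 4) else 'WEEKEND'

rows = [
    (1, datetime(2023, 5, 15, 12, 30), "Discount"),
    (2, datetime(2023, 5, 16, 16, 45), "Promotion"),
    (3, datetime(2023, 5, 17, 9, 15), "Coupon"),
]

# Apply udf1 (identity) and udf2, then group by (col1, col4, col5, col3)
# and count distinct col5 per group, as the failing query does.
groups = {}
for col1, col2, col3 in rows:
    col4 = col2            # udf1 is the identity function
    col5 = classify(col4)  # udf2
    groups.setdefault((col1, col4, col5, col3), set()).add(col5)

result = {key: len(distinct) for key, distinct in groups.items()}
# Each input row forms its own group, so result has 3 entries, each with count 1.
```

All three sample timestamps fall on Monday through Wednesday, so every row classifies as WEEKDAY; the exception is thrown during physical planning before Spark ever produces this output.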
[jira] [Updated] (SPARK-48311) Nested pythonUDF in groupBy and aggregate result in Binding Exception
[ https://issues.apache.org/jira/browse/SPARK-48311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48311: --- Labels: pull-request-available (was: ) > Nested pythonUDF in groupBy and aggregate result in Binding Exception > -- > > Key: SPARK-48311 > URL: https://issues.apache.org/jira/browse/SPARK-48311 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.3.2 >Reporter: Sumit Singh >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48576) Rename UTF8_BINARY_LCASE to UTF8_LCASE
[ https://issues.apache.org/jira/browse/SPARK-48576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48576: --- Labels: pull-request-available (was: ) > Rename UTF8_BINARY_LCASE to UTF8_LCASE > -- > > Key: SPARK-48576 > URL: https://issues.apache.org/jira/browse/SPARK-48576 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48280) Improve collation testing surface area using expression walking
[ https://issues.apache.org/jira/browse/SPARK-48280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48280: -- Summary: Improve collation testing surface area using expression walking (was: Add Expression Walker for Testing) > Improve collation testing surface area using expression walking > --- > > Key: SPARK-48280 > URL: https://issues.apache.org/jira/browse/SPARK-48280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org