[jira] [Created] (SPARK-41446) Make `createDataFrame` support schema and more input dataset type
Ruifeng Zheng created SPARK-41446:

Summary: Make `createDataFrame` support schema and more input dataset type
Key: SPARK-41446
URL: https://issues.apache.org/jira/browse/SPARK-41446
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-39948) exclude velocity 1.5 jar
[ https://issues.apache.org/jira/browse/SPARK-39948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644645#comment-17644645 ]

Apache Spark commented on SPARK-39948:

User 'zhouyifan279' has created a pull request for this issue:
https://github.com/apache/spark/pull/38978

> exclude velocity 1.5 jar
>
> Key: SPARK-39948
> URL: https://issues.apache.org/jira/browse/SPARK-39948
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0, 3.4.0
> Reporter: melin
> Priority: Major
>
> hive-exec pulls in Velocity as a transitive dependency. The Velocity version
> is old and has many known security issues:
> https://issues.apache.org/jira/browse/HIVE-25726
>
> !image-2022-08-02-14-05-55-756.png!
[jira] [Commented] (SPARK-39948) exclude velocity 1.5 jar
[ https://issues.apache.org/jira/browse/SPARK-39948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644643#comment-17644643 ]

Apache Spark commented on SPARK-39948:

User 'zhouyifan279' has created a pull request for this issue:
https://github.com/apache/spark/pull/38978

> exclude velocity 1.5 jar
>
> Key: SPARK-39948
> URL: https://issues.apache.org/jira/browse/SPARK-39948
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0, 3.4.0
> Reporter: melin
> Priority: Major
>
> hive-exec pulls in Velocity as a transitive dependency. The Velocity version
> is old and has many known security issues:
> https://issues.apache.org/jira/browse/HIVE-25726
>
> !image-2022-08-02-14-05-55-756.png!
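As a rough illustration of the exclusion the ticket asks for (hedged: Spark's actual build is Maven, and the hive-exec version below is illustrative, not Spark's exact pin), the equivalent exclusion in an sbt build looks like this:

{code:scala}
// build.sbt sketch: drop the transitive Velocity artifact that hive-exec
// pulls in, so the old, vulnerable jar never lands on the classpath.
libraryDependencies += ("org.apache.hive" % "hive-exec" % "2.3.9")
  .exclude("org.apache.velocity", "velocity")
{code}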
[jira] [Assigned] (SPARK-41366) DF.groupby.agg() API should be compatible
[ https://issues.apache.org/jira/browse/SPARK-41366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-41366:

Assignee: Martin Grund

> DF.groupby.agg() API should be compatible
>
> Key: SPARK-41366
> URL: https://issues.apache.org/jira/browse/SPARK-41366
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Martin Grund
> Priority: Major
> Fix For: 3.4.0
[jira] [Resolved] (SPARK-38277) Clear write batch after RocksDB state store's commit
[ https://issues.apache.org/jira/browse/SPARK-38277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-38277.

Fix Version/s: 3.3.2
               3.4.0
Resolution: Fixed

Issue resolved by pull request 38880
https://github.com/apache/spark/pull/38880

> Clear write batch after RocksDB state store's commit
>
> Key: SPARK-38277
> URL: https://issues.apache.org/jira/browse/SPARK-38277
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.2.1
> Reporter: Yun Tang
> Assignee: Yun Tang
> Priority: Minor
> Fix For: 3.3.2, 3.4.0
>
> Currently the write batch is only cleared when the next batch is loaded.
> This can be improved by clearing it as soon as the batch is committed, so
> that unused memory is released earlier.
[jira] [Assigned] (SPARK-38277) Clear write batch after RocksDB state store's commit
[ https://issues.apache.org/jira/browse/SPARK-38277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-38277:

Assignee: Yun Tang

> Clear write batch after RocksDB state store's commit
>
> Key: SPARK-38277
> URL: https://issues.apache.org/jira/browse/SPARK-38277
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.2.1
> Reporter: Yun Tang
> Assignee: Yun Tang
> Priority: Minor
>
> Currently the write batch is only cleared when the next batch is loaded.
> This can be improved by clearing it as soon as the batch is committed, so
> that unused memory is released earlier.
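To make the improvement concrete, here is a minimal sketch against the RocksDB Java API (a simplification for illustration, not Spark's actual RocksDB state store code): the WriteBatch is cleared inside commit() instead of waiting for the next load().

{code:scala}
import org.rocksdb.{RocksDB, WriteBatch, WriteOptions}

// Simplified commit path. Previously the batch was cleared lazily on the next
// load(); clearing right after the write releases the batch's memory earlier.
class SimpleStateStore(db: RocksDB, writeBatch: WriteBatch) {
  def put(key: Array[Byte], value: Array[Byte]): Unit = writeBatch.put(key, value)

  def commit(): Unit = {
    val options = new WriteOptions()
    try {
      db.write(options, writeBatch) // atomically apply all buffered updates
    } finally {
      options.close()
      writeBatch.clear()            // the improvement: free memory immediately
    }
  }
}
{code}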
[jira] [Commented] (SPARK-41445) Implement DataFrameReader.parquet
[ https://issues.apache.org/jira/browse/SPARK-41445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644623#comment-17644623 ]

Apache Spark commented on SPARK-41445:

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38977

> Implement DataFrameReader.parquet
>
> Key: SPARK-41445
> URL: https://issues.apache.org/jira/browse/SPARK-41445
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
[jira] [Assigned] (SPARK-41445) Implement DataFrameReader.parquet
[ https://issues.apache.org/jira/browse/SPARK-41445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41445:

Assignee: Apache Spark

> Implement DataFrameReader.parquet
>
> Key: SPARK-41445
> URL: https://issues.apache.org/jira/browse/SPARK-41445
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-41445) Implement DataFrameReader.parquet
[ https://issues.apache.org/jira/browse/SPARK-41445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41445:

Assignee: (was: Apache Spark)

> Implement DataFrameReader.parquet
>
> Key: SPARK-41445
> URL: https://issues.apache.org/jira/browse/SPARK-41445
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
[jira] [Commented] (SPARK-41444) Implement DataFrameReader.json
[ https://issues.apache.org/jira/browse/SPARK-41444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644620#comment-17644620 ]

Apache Spark commented on SPARK-41444:

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38975

> Implement DataFrameReader.json
>
> Key: SPARK-41444
> URL: https://issues.apache.org/jira/browse/SPARK-41444
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Fix For: 3.4.0
[jira] [Updated] (SPARK-41444) Implement DataFrameReader.json
[ https://issues.apache.org/jira/browse/SPARK-41444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41444:

Summary: Implement DataFrameReader.json (was: Support read.json)

> Implement DataFrameReader.json
>
> Key: SPARK-41444
> URL: https://issues.apache.org/jira/browse/SPARK-41444
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Fix For: 3.4.0
[jira] [Created] (SPARK-41445) Implement DataFrameReader.parquet
Hyukjin Kwon created SPARK-41445:

Summary: Implement DataFrameReader.parquet
Key: SPARK-41445
URL: https://issues.apache.org/jira/browse/SPARK-41445
Project: Spark
Issue Type: Sub-task
Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon
[jira] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284 ]

Hyukjin Kwon deleted comment on SPARK-41284:

was (Author: gurwls223):
Issue resolved by pull request 38975
https://github.com/apache/spark/pull/38975

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Rui Wang
> Priority: Critical
> Fix For: 3.4.0
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Commented] (SPARK-41444) Support read.json
[ https://issues.apache.org/jira/browse/SPARK-41444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644619#comment-17644619 ]

Hyukjin Kwon commented on SPARK-41444:

Fixed in https://github.com/apache/spark/pull/38975

> Support read.json
>
> Key: SPARK-41444
> URL: https://issues.apache.org/jira/browse/SPARK-41444
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
[jira] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284 ]

Hyukjin Kwon deleted comment on SPARK-41284:

was (Author: apachespark):
User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38975

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Rui Wang
> Priority: Critical
> Fix For: 3.4.0
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Resolved] (SPARK-41444) Support read.json
[ https://issues.apache.org/jira/browse/SPARK-41444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-41444.

Fix Version/s: 3.4.0
Resolution: Fixed

> Support read.json
>
> Key: SPARK-41444
> URL: https://issues.apache.org/jira/browse/SPARK-41444
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Fix For: 3.4.0
[jira] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284 ]

Hyukjin Kwon deleted comment on SPARK-41284:

was (Author: apachespark):
User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38975

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Rui Wang
> Priority: Critical
> Fix For: 3.4.0
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Reopened] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-41284:

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Rui Wang
> Priority: Critical
> Fix For: 3.4.0
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Resolved] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-41284.

Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38975
https://github.com/apache/spark/pull/38975

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Rui Wang
> Priority: Critical
> Fix For: 3.4.0
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Resolved] (SPARK-41442) Only update SQLMetric value if merging with valid metric
[ https://issues.apache.org/jira/browse/SPARK-41442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-41442.

Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38969
https://github.com/apache/spark/pull/38969

> Only update SQLMetric value if merging with valid metric
>
> Key: SPARK-41442
> URL: https://issues.apache.org/jira/browse/SPARK-41442
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Minor
> Fix For: 3.4.0
>
> We use -1 as the initial value of SQLMetric and change it to 0 when merging
> with other SQLMetric instances. A SQLMetric whose value is still -1 is
> treated as invalid and filtered out later. When developing with Spark, it is
> troublesome that merging two invalid SQLMetric instances produces a valid
> one, because merging sets the value to 0.
[jira] [Assigned] (SPARK-41442) Only update SQLMetric value if merging with valid metric
[ https://issues.apache.org/jira/browse/SPARK-41442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-41442:

Assignee: L. C. Hsieh

> Only update SQLMetric value if merging with valid metric
>
> Key: SPARK-41442
> URL: https://issues.apache.org/jira/browse/SPARK-41442
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Minor
>
> We use -1 as the initial value of SQLMetric and change it to 0 when merging
> with other SQLMetric instances. A SQLMetric whose value is still -1 is
> treated as invalid and filtered out later. When developing with Spark, it is
> troublesome that merging two invalid SQLMetric instances produces a valid
> one, because merging sets the value to 0.
[jira] [Commented] (SPARK-41366) DF.groupby.agg() API should be compatible
[ https://issues.apache.org/jira/browse/SPARK-41366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644604#comment-17644604 ]

Apache Spark commented on SPARK-41366:

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38976

> DF.groupby.agg() API should be compatible
>
> Key: SPARK-41366
> URL: https://issues.apache.org/jira/browse/SPARK-41366
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major
> Fix For: 3.4.0
[jira] [Commented] (SPARK-41366) DF.groupby.agg() API should be compatible
[ https://issues.apache.org/jira/browse/SPARK-41366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644603#comment-17644603 ]

Apache Spark commented on SPARK-41366:

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38976

> DF.groupby.agg() API should be compatible
>
> Key: SPARK-41366
> URL: https://issues.apache.org/jira/browse/SPARK-41366
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major
> Fix For: 3.4.0
[jira] (SPARK-41439) Implement `DataFrame.melt`
[ https://issues.apache.org/jira/browse/SPARK-41439 ]

jiaan.geng deleted comment on SPARK-41439:

was (Author: beliefer):
I'm working on.

> Implement `DataFrame.melt`
>
> Key: SPARK-41439
> URL: https://issues.apache.org/jira/browse/SPARK-41439
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Updated] (SPARK-41350) allow simple name access of using join hidden columns after subquery alias
[ https://issues.apache.org/jira/browse/SPARK-41350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-41350:

Fix Version/s: 3.3.2

> allow simple name access of using join hidden columns after subquery alias
>
> Key: SPARK-41350
> URL: https://issues.apache.org/jira/browse/SPARK-41350
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Fix For: 3.3.2, 3.4.0
[jira] [Commented] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644593#comment-17644593 ]

Apache Spark commented on SPARK-41284:

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38975

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Rui Wang
> Priority: Critical
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Assigned] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41284:

Assignee: Rui Wang (was: Apache Spark)

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Rui Wang
> Priority: Critical
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Commented] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644594#comment-17644594 ]

Apache Spark commented on SPARK-41284:

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38975

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Rui Wang
> Priority: Critical
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Assigned] (SPARK-41284) Feature parity: I/O in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41284:

Assignee: Apache Spark (was: Rui Wang)

> Feature parity: I/O in Spark Connect
>
> Key: SPARK-41284
> URL: https://issues.apache.org/jira/browse/SPARK-41284
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Critical
>
> Implement I/O API such as DataFrameReader/Writer
[jira] [Created] (SPARK-41444) Support read.json
Rui Wang created SPARK-41444:

Summary: Support read.json
Key: SPARK-41444
URL: https://issues.apache.org/jira/browse/SPARK-41444
Project: Spark
Issue Type: Sub-task
Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang
Assignee: Rui Wang
[jira] [Assigned] (SPARK-41439) Implement `DataFrame.melt`
[ https://issues.apache.org/jira/browse/SPARK-41439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41439:

Assignee: Apache Spark

> Implement `DataFrame.melt`
>
> Key: SPARK-41439
> URL: https://issues.apache.org/jira/browse/SPARK-41439
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-41439) Implement `DataFrame.melt`
[ https://issues.apache.org/jira/browse/SPARK-41439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41439:

Assignee: (was: Apache Spark)

> Implement `DataFrame.melt`
>
> Key: SPARK-41439
> URL: https://issues.apache.org/jira/browse/SPARK-41439
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Commented] (SPARK-41439) Implement `DataFrame.melt`
[ https://issues.apache.org/jira/browse/SPARK-41439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644576#comment-17644576 ]

Apache Spark commented on SPARK-41439:

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/38973

> Implement `DataFrame.melt`
>
> Key: SPARK-41439
> URL: https://issues.apache.org/jira/browse/SPARK-41439
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Commented] (SPARK-41443) Assign a name to the error class _LEGACY_ERROR_TEMP_1061
[ https://issues.apache.org/jira/browse/SPARK-41443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644575#comment-17644575 ]

Apache Spark commented on SPARK-41443:

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38972

> Assign a name to the error class _LEGACY_ERROR_TEMP_1061
>
> Key: SPARK-41443
> URL: https://issues.apache.org/jira/browse/SPARK-41443
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Assigned] (SPARK-41443) Assign a name to the error class _LEGACY_ERROR_TEMP_1061
[ https://issues.apache.org/jira/browse/SPARK-41443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41443:

Assignee: Apache Spark

> Assign a name to the error class _LEGACY_ERROR_TEMP_1061
>
> Key: SPARK-41443
> URL: https://issues.apache.org/jira/browse/SPARK-41443
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Assignee: Apache Spark
> Priority: Minor
[jira] [Commented] (SPARK-41443) Assign a name to the error class _LEGACY_ERROR_TEMP_1061
[ https://issues.apache.org/jira/browse/SPARK-41443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644574#comment-17644574 ]

Apache Spark commented on SPARK-41443:

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38972

> Assign a name to the error class _LEGACY_ERROR_TEMP_1061
>
> Key: SPARK-41443
> URL: https://issues.apache.org/jira/browse/SPARK-41443
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Assigned] (SPARK-41443) Assign a name to the error class _LEGACY_ERROR_TEMP_1061
[ https://issues.apache.org/jira/browse/SPARK-41443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41443:

Assignee: (was: Apache Spark)

> Assign a name to the error class _LEGACY_ERROR_TEMP_1061
>
> Key: SPARK-41443
> URL: https://issues.apache.org/jira/browse/SPARK-41443
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Created] (SPARK-41443) Assign a name to the error class _LEGACY_ERROR_TEMP_1061
BingKun Pan created SPARK-41443:

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_1061
Key: SPARK-41443
URL: https://issues.apache.org/jira/browse/SPARK-41443
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.4.0
Reporter: BingKun Pan
[jira] [Commented] (SPARK-41433) Make Max Arrow BatchSize configurable
[ https://issues.apache.org/jira/browse/SPARK-41433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644566#comment-17644566 ]

Apache Spark commented on SPARK-41433:

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38971

> Make Max Arrow BatchSize configurable
>
> Key: SPARK-41433
> URL: https://issues.apache.org/jira/browse/SPARK-41433
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.4.0
[jira] [Updated] (SPARK-41376) Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs
[ https://issues.apache.org/jira/browse/SPARK-41376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen updated SPARK-41376:

Priority: Minor (was: Major)

> Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs
>
> Key: SPARK-41376
> URL: https://issues.apache.org/jira/browse/SPARK-41376
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Minor
> Fix For: 3.4.0
[jira] [Resolved] (SPARK-41376) Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs
[ https://issues.apache.org/jira/browse/SPARK-41376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-41376.

Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38901
https://github.com/apache/spark/pull/38901

> Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs
>
> Key: SPARK-41376
> URL: https://issues.apache.org/jira/browse/SPARK-41376
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-41376) Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs
[ https://issues.apache.org/jira/browse/SPARK-41376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen reassigned SPARK-41376:

Assignee: Cheng Pan

> Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs
>
> Key: SPARK-41376
> URL: https://issues.apache.org/jira/browse/SPARK-41376
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
[jira] [Resolved] (SPARK-41378) Support Column Stats in DS V2
[ https://issues.apache.org/jira/browse/SPARK-41378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-41378.

Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38904
https://github.com/apache/spark/pull/38904

> Support Column Stats in DS V2
>
> Key: SPARK-41378
> URL: https://issues.apache.org/jira/browse/SPARK-41378
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Huaxin Gao
> Assignee: Huaxin Gao
> Priority: Major
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-41378) Support Column Stats in DS V2
[ https://issues.apache.org/jira/browse/SPARK-41378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-41378:

Assignee: Huaxin Gao

> Support Column Stats in DS V2
>
> Key: SPARK-41378
> URL: https://issues.apache.org/jira/browse/SPARK-41378
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Huaxin Gao
> Assignee: Huaxin Gao
> Priority: Major
[jira] [Assigned] (SPARK-41412) Implement `Cast`
[ https://issues.apache.org/jira/browse/SPARK-41412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41412:

Assignee: Rui Wang (was: Apache Spark)

> Implement `Cast`
>
> Key: SPARK-41412
> URL: https://issues.apache.org/jira/browse/SPARK-41412
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
[jira] [Commented] (SPARK-41412) Implement `Cast`
[ https://issues.apache.org/jira/browse/SPARK-41412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644513#comment-17644513 ]

Apache Spark commented on SPARK-41412:

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38970

> Implement `Cast`
>
> Key: SPARK-41412
> URL: https://issues.apache.org/jira/browse/SPARK-41412
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
[jira] [Assigned] (SPARK-41412) Implement `Cast`
[ https://issues.apache.org/jira/browse/SPARK-41412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41412:

Assignee: Apache Spark (was: Rui Wang)

> Implement `Cast`
>
> Key: SPARK-41412
> URL: https://issues.apache.org/jira/browse/SPARK-41412
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-41442) Only update SQLMetric value if merging with valid metric
[ https://issues.apache.org/jira/browse/SPARK-41442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644512#comment-17644512 ]

Apache Spark commented on SPARK-41442:

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/38969

> Only update SQLMetric value if merging with valid metric
>
> Key: SPARK-41442
> URL: https://issues.apache.org/jira/browse/SPARK-41442
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: L. C. Hsieh
> Priority: Minor
>
> We use -1 as the initial value of SQLMetric and change it to 0 when merging
> with other SQLMetric instances. A SQLMetric whose value is still -1 is
> treated as invalid and filtered out later. When developing with Spark, it is
> troublesome that merging two invalid SQLMetric instances produces a valid
> one, because merging sets the value to 0.
[jira] [Assigned] (SPARK-41442) Only update SQLMetric value if merging with valid metric
[ https://issues.apache.org/jira/browse/SPARK-41442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41442:

Assignee: (was: Apache Spark)

> Only update SQLMetric value if merging with valid metric
>
> Key: SPARK-41442
> URL: https://issues.apache.org/jira/browse/SPARK-41442
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: L. C. Hsieh
> Priority: Minor
>
> We use -1 as the initial value of SQLMetric and change it to 0 when merging
> with other SQLMetric instances. A SQLMetric whose value is still -1 is
> treated as invalid and filtered out later. When developing with Spark, it is
> troublesome that merging two invalid SQLMetric instances produces a valid
> one, because merging sets the value to 0.
[jira] [Assigned] (SPARK-41442) Only update SQLMetric value if merging with valid metric
[ https://issues.apache.org/jira/browse/SPARK-41442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41442:

Assignee: Apache Spark

> Only update SQLMetric value if merging with valid metric
>
> Key: SPARK-41442
> URL: https://issues.apache.org/jira/browse/SPARK-41442
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: L. C. Hsieh
> Assignee: Apache Spark
> Priority: Minor
>
> We use -1 as the initial value of SQLMetric and change it to 0 when merging
> with other SQLMetric instances. A SQLMetric whose value is still -1 is
> treated as invalid and filtered out later. When developing with Spark, it is
> troublesome that merging two invalid SQLMetric instances produces a valid
> one, because merging sets the value to 0.
[jira] [Commented] (SPARK-41442) Only update SQLMetric value if merging with valid metric
[ https://issues.apache.org/jira/browse/SPARK-41442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644511#comment-17644511 ]

Apache Spark commented on SPARK-41442:

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/38969

> Only update SQLMetric value if merging with valid metric
>
> Key: SPARK-41442
> URL: https://issues.apache.org/jira/browse/SPARK-41442
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: L. C. Hsieh
> Priority: Minor
>
> We use -1 as the initial value of SQLMetric and change it to 0 when merging
> with other SQLMetric instances. A SQLMetric whose value is still -1 is
> treated as invalid and filtered out later. When developing with Spark, it is
> troublesome that merging two invalid SQLMetric instances produces a valid
> one, because merging sets the value to 0.
[jira] [Created] (SPARK-41442) Only update SQLMetric value if merging with valid metric
L. C. Hsieh created SPARK-41442:

Summary: Only update SQLMetric value if merging with valid metric
Key: SPARK-41442
URL: https://issues.apache.org/jira/browse/SPARK-41442
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: L. C. Hsieh

We use -1 as the initial value of SQLMetric and change it to 0 when merging
with other SQLMetric instances. A SQLMetric whose value is still -1 is treated
as invalid and filtered out later. When developing with Spark, it is
troublesome that merging two invalid SQLMetric instances produces a valid one,
because merging sets the value to 0.
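To make the reported behavior concrete, here is a minimal self-contained sketch of the merge semantics described above (a simplification for illustration, not the actual SQLMetric source):

{code:scala}
// Sketch of the pre-fix semantics: -1 marks an invalid/unset metric, but
// merge() unconditionally resets an invalid value to 0 before accumulating.
class MetricSketch(var value: Long = -1L) {
  def isValid: Boolean = value >= 0
  def merge(other: MetricSketch): Unit = {
    if (value < 0) value = 0            // the problematic reset described above
    if (other.value > 0) value += other.value
  }
}

object MetricSketchDemo extends App {
  val a = new MetricSketch()
  val b = new MetricSketch()
  a.merge(b)                            // merging two *invalid* metrics...
  println(a.isValid)                    // ...prints true: `a` now looks valid (value 0)
}
{code}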
[jira] [Commented] (SPARK-41233) High-order function: array_prepend
[ https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644500#comment-17644500 ]

Apache Spark commented on SPARK-41233:

User 'navinvishy' has created a pull request for this issue:
https://github.com/apache/spark/pull/38947

> High-order function: array_prepend
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> refer to
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
[jira] [Assigned] (SPARK-41233) High-order function: array_prepend
[ https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41233:

Assignee: Apache Spark

> High-order function: array_prepend
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Major
>
> refer to
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
[jira] [Commented] (SPARK-41233) High-order function: array_prepend
[ https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644499#comment-17644499 ]

Apache Spark commented on SPARK-41233:

User 'navinvishy' has created a pull request for this issue:
https://github.com/apache/spark/pull/38947

> High-order function: array_prepend
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> refer to
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
[jira] [Assigned] (SPARK-41233) High-order function: array_prepend
[ https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41233:

Assignee: (was: Apache Spark)

> High-order function: array_prepend
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> refer to
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
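Since the ticket only points at Snowflake's docs, here is a hedged sketch of the expected semantics using existing Spark APIs (array_prepend itself is the function being proposed and is not assumed to exist yet):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ArrayPrependDemo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
  import spark.implicits._

  val df = Seq((Seq(2, 3), 1)).toDF("arr", "elem")

  // array_prepend(arr, elem) should be equivalent to concat(array(elem), arr):
  df.select(concat(array($"elem"), $"arr").as("prepended")).show()
  // +---------+
  // |prepended|
  // +---------+
  // |[1, 2, 3]|
  // +---------+

  spark.stop()
}
{code}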
[jira] [Comment Edited] (SPARK-41344) Reading V2 datasource masks underlying error
[ https://issues.apache.org/jira/browse/SPARK-41344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644498#comment-17644498 ]

Kevin Cheung edited comment on SPARK-41344 at 12/7/22 7:53 PM:

[~wforget] I believe he just means duplicating CatalogV2Util.loadTable as a new
function with signature CatalogV2Util.loadTableThrowsException : Table. The
only difference would be that you just don't catch the exceptions. Then change
this call to your new function
(CatalogV2Util.loadTableThrowsException(catalog, ident, timeTravel),
Some(catalog), Some(ident)). This solves the problem of masking the original
exception.

was (Author: kecheung):
[~wforget] I believe he just means duplicating CatalogV2Util.loadTable as a new
function with signature CatalogV2Util.loadTableThrowsException : Table. Then
change this call to your new function
(CatalogV2Util.loadTableThrowsException(catalog, ident, timeTravel),
Some(catalog), Some(ident)). This solves the problem of masking the original
exception.

> Reading V2 datasource masks underlying error
>
> Key: SPARK-41344
> URL: https://issues.apache.org/jira/browse/SPARK-41344
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.1, 3.4.0
> Reporter: Kevin Cheung
> Priority: Critical
> Attachments: image-2022-12-03-09-24-43-285.png
>
> In Spark 3.3,
> # DataSourceV2Utils.loadV2Source calls
>   (CatalogV2Util.loadTable(catalog, ident, timeTravel).get, Some(catalog), Some(ident)).
> # In CatalogV2Util.scala, when loadTable(x, x, x) fails with any of
>   NoSuchTableException, NoSuchDatabaseException, or NoSuchNamespaceException,
>   it returns None.
> # Back in DataSourceV2Utils, calling .get on that None results in a cryptic
>   error that is technically "correct", but the original
>   NoSuchTableException, NoSuchDatabaseException, or NoSuchNamespaceException
>   is thrown away.
>
> *Ask:*
> Retain the original error and propagate it to the user. Prior to Spark 3.3
> the original error was shown; this seems like a design flaw.
>
> *Sample user facing error:*
> None.get
> java.util.NoSuchElementException: None.get
> at scala.None$.get(Option.scala:529)
> at scala.None$.get(Option.scala:527)
> at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:129)
> at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209)
> at scala.Option.flatMap(Option.scala:271)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
>
> *DataSourceV2Utils.scala - CatalogV2Util.loadTable(x,x,x).get*
> [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala#L137]
> *CatalogV2Util.scala - Option(catalog.asTableCatalog.loadTable(ident))*
> [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L341]
> *CatalogV2Util.scala - catching the exceptions and return None*
> [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L344]
[jira] [Comment Edited] (SPARK-41344) Reading V2 datasource masks underlying error
[ https://issues.apache.org/jira/browse/SPARK-41344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644498#comment-17644498 ]

Kevin Cheung edited comment on SPARK-41344 at 12/7/22 7:52 PM:

[~wforget] I believe he just means duplicating CatalogV2Util.loadTable as a new
function with signature CatalogV2Util.loadTableThrowsException : Table. Then
change this call to your new function
(CatalogV2Util.loadTableThrowsException(catalog, ident, timeTravel),
Some(catalog), Some(ident)). This solves the problem of masking the original
exception.

was (Author: kecheung):
[~wforget] I believe he just means duplicating CatalogV2Util.loadTable as a new
function with signature CatalogV2Util.loadTableThrowsException : Table. Then
change this call to your new function
(CatalogV2Util.loadTableThrowsException(catalog, ident, timeTravel),
Some(catalog), Some(ident))

> Reading V2 datasource masks underlying error
>
> Key: SPARK-41344
> URL: https://issues.apache.org/jira/browse/SPARK-41344
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.1, 3.4.0
> Reporter: Kevin Cheung
> Priority: Critical
> Attachments: image-2022-12-03-09-24-43-285.png
>
> (Issue description as above: CatalogV2Util.loadTable swallows
> NoSuchTableException, NoSuchDatabaseException, and NoSuchNamespaceException
> and returns None, so DataSourceV2Utils.loadV2Source fails with a cryptic
> "None.get" instead of the original error.)
[jira] [Commented] (SPARK-41344) Reading V2 datasource masks underlying error
[ https://issues.apache.org/jira/browse/SPARK-41344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644498#comment-17644498 ]

Kevin Cheung commented on SPARK-41344:

[~wforget] I believe he just means duplicating CatalogV2Util.loadTable as a new
function with signature CatalogV2Util.loadTableThrowsException : Table. Then
change this call to your new function
(CatalogV2Util.loadTableThrowsException(catalog, ident, timeTravel),
Some(catalog), Some(ident))

> Reading V2 datasource masks underlying error
>
> Key: SPARK-41344
> URL: https://issues.apache.org/jira/browse/SPARK-41344
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.1, 3.4.0
> Reporter: Kevin Cheung
> Priority: Critical
> Attachments: image-2022-12-03-09-24-43-285.png
>
> (Issue description as above: CatalogV2Util.loadTable swallows
> NoSuchTableException, NoSuchDatabaseException, and NoSuchNamespaceException
> and returns None, so DataSourceV2Utils.loadV2Source fails with a cryptic
> "None.get" instead of the original error.)
[jira] [Commented] (SPARK-41344) Reading V2 datasource masks underlying error
[ https://issues.apache.org/jira/browse/SPARK-41344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644492#comment-17644492 ]

Kevin Cheung commented on SPARK-41344:

+1 [~planga82]. I like this approach of having another function so the real
exception can be propagated.

> Reading V2 datasource masks underlying error
>
> Key: SPARK-41344
> URL: https://issues.apache.org/jira/browse/SPARK-41344
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.1, 3.4.0
> Reporter: Kevin Cheung
> Priority: Critical
> Attachments: image-2022-12-03-09-24-43-285.png
>
> (Issue description as above: CatalogV2Util.loadTable swallows
> NoSuchTableException, NoSuchDatabaseException, and NoSuchNamespaceException
> and returns None, so DataSourceV2Utils.loadV2Source fails with a cryptic
> "None.get" instead of the original error.)
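A self-contained sketch of the pattern discussed in this thread (loadTableThrowsException is the proposed method, not existing Spark API, and the types below are stand-ins rather than Spark's real catalog classes):

{code:scala}
// Stand-ins for the real Spark classes, just to show the control flow.
class NoSuchTableException(name: String)
  extends Exception(s"Table or view '$name' not found")

trait TableCatalogLike { def loadTable(name: String): AnyRef }

// Current shape of CatalogV2Util.loadTable: the descriptive exception is
// swallowed and replaced by None, so a later `.get` fails with only "None.get".
def loadTable(catalog: TableCatalogLike, name: String): Option[AnyRef] =
  try Some(catalog.loadTable(name))
  catch { case _: NoSuchTableException => None }

// Proposed variant: identical lookup, but the exception is left to propagate,
// so the user sees the original error instead of NoSuchElementException: None.get.
def loadTableThrowsException(catalog: TableCatalogLike, name: String): AnyRef =
  catalog.loadTable(name)
{code}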
[jira] [Commented] (SPARK-41349) Implement `DataFrame.hint`
[ https://issues.apache.org/jira/browse/SPARK-41349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644463#comment-17644463 ] Rui Wang commented on SPARK-41349:
--
Keeping this issue open given that there is Python-side work left.

> Implement `DataFrame.hint`
> --
>
> Key: SPARK-41349
> URL: https://issues.apache.org/jira/browse/SPARK-41349
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Deng Ziming
> Priority: Major
> Fix For: 3.4.0
>
> Implement DataFrame.hint with the proto message added in https://issues.apache.org/jira/browse/SPARK-41345
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
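For context, the Connect/PySpark work tracked here mirrors the hint API that the Scala Dataset already exposes. A minimal usage sketch, assuming a spark-shell session (so {{spark}} and its implicits are in scope) and made-up data:

{code:scala}
import spark.implicits._

val small = Seq((1, "a"), (2, "b")).toDF("id", "v")
val large = Seq((1, "x"), (2, "y"), (3, "z")).toDF("id", "w")

// Ask the planner to broadcast the smaller side of the join.
large.join(small.hint("broadcast"), "id").explain()
{code}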
[jira] [Reopened] (SPARK-41349) Implement `DataFrame.hint`
[ https://issues.apache.org/jira/browse/SPARK-41349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang reopened SPARK-41349:
--

> Implement `DataFrame.hint`
> --
>
> Key: SPARK-41349
> URL: https://issues.apache.org/jira/browse/SPARK-41349
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Deng Ziming
> Priority: Major
> Fix For: 3.4.0
>
> Implement DataFrame.hint with the proto message added in https://issues.apache.org/jira/browse/SPARK-41345
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41441) Allow Generate with no required child output to host outer references
[ https://issues.apache.org/jira/browse/SPARK-41441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41441:
Assignee: (was: Apache Spark)

> Allow Generate with no required child output to host outer references
> -
>
> Key: SPARK-41441
> URL: https://issues.apache.org/jira/browse/SPARK-41441
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Allison Wang
> Priority: Major
>
> Currently, in CheckAnalysis, Spark disallows Generate to host any outer references when its required child output is not empty. But when the child output is empty, it can host outer references, which DecorrelateInnerQuery does not handle.
> For example,
> {code:java}
> select * from t, lateral (select explode(array(c1, c2))){code}
> This throws an internal error:
> {code:java}
> Caused by: java.lang.AssertionError: assertion failed: Correlated column is not allowed in Generate explode(array(outer(c1#219), outer(c2#220))), false, [col#221] +- OneRowRelation{code}
> We should allow Generate to host outer references when its required child output is empty.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41441) Allow Generate with no required child output to host outer references
[ https://issues.apache.org/jira/browse/SPARK-41441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41441:
Assignee: Apache Spark

> Allow Generate with no required child output to host outer references
> -
>
> Key: SPARK-41441
> URL: https://issues.apache.org/jira/browse/SPARK-41441
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Allison Wang
> Assignee: Apache Spark
> Priority: Major
>
> Currently, in CheckAnalysis, Spark disallows Generate to host any outer references when its required child output is not empty. But when the child output is empty, it can host outer references, which DecorrelateInnerQuery does not handle.
> For example,
> {code:java}
> select * from t, lateral (select explode(array(c1, c2))){code}
> This throws an internal error:
> {code:java}
> Caused by: java.lang.AssertionError: assertion failed: Correlated column is not allowed in Generate explode(array(outer(c1#219), outer(c2#220))), false, [col#221] +- OneRowRelation{code}
> We should allow Generate to host outer references when its required child output is empty.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41441) Allow Generate with no required child output to host outer references
[ https://issues.apache.org/jira/browse/SPARK-41441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644459#comment-17644459 ] Apache Spark commented on SPARK-41441:
--
User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/38968

> Allow Generate with no required child output to host outer references
> -
>
> Key: SPARK-41441
> URL: https://issues.apache.org/jira/browse/SPARK-41441
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Allison Wang
> Priority: Major
>
> Currently, in CheckAnalysis, Spark disallows Generate to host any outer references when its required child output is not empty. But when the child output is empty, it can host outer references, which DecorrelateInnerQuery does not handle.
> For example,
> {code:java}
> select * from t, lateral (select explode(array(c1, c2))){code}
> This throws an internal error:
> {code:java}
> Caused by: java.lang.AssertionError: assertion failed: Correlated column is not allowed in Generate explode(array(outer(c1#219), outer(c2#220))), false, [col#221] +- OneRowRelation{code}
> We should allow Generate to host outer references when its required child output is empty.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1765#comment-1765 ] Apache Spark commented on SPARK-41369:
--
User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/38967

> Refactor connect directory structure
> 
>
> Key: SPARK-41369
> URL: https://issues.apache.org/jira/browse/SPARK-41369
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Venkata Sai Akhil Gudesa
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.4.0
>
> Currently, `spark/connector/connect/` is a single module that contains both the "server" service and the protobuf definitions. However, this module can be split into multiple modules - "server" and "common". This brings the advantage of separating the protobuf generation out of the core "server" module for efficient reuse.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1764#comment-1764 ] Apache Spark commented on SPARK-41369:
--
User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/38967

> Refactor connect directory structure
> 
>
> Key: SPARK-41369
> URL: https://issues.apache.org/jira/browse/SPARK-41369
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Venkata Sai Akhil Gudesa
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.4.0
>
> Currently, `spark/connector/connect/` is a single module that contains both the "server" service and the protobuf definitions. However, this module can be split into multiple modules - "server" and "common". This brings the advantage of separating the protobuf generation out of the core "server" module for efficient reuse.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41441) Allow Generate with no required child output to host outer references
Allison Wang created SPARK-41441:
Summary: Allow Generate with no required child output to host outer references
Key: SPARK-41441
URL: https://issues.apache.org/jira/browse/SPARK-41441
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.4.0
Reporter: Allison Wang

Currently, in CheckAnalysis, Spark disallows Generate to host any outer references when its required child output is not empty. But when the child output is empty, it can host outer references, which DecorrelateInnerQuery does not handle.

For example,
{code:java}
select * from t, lateral (select explode(array(c1, c2))){code}
This throws an internal error:
{code:java}
Caused by: java.lang.AssertionError: assertion failed: Correlated column is not allowed in Generate explode(array(outer(c1#219), outer(c2#220))), false, [col#221] +- OneRowRelation{code}
We should allow Generate to host outer references when its required child output is empty.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
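A self-contained reproduction of the failure described above; the query is the one from the report, while the table name and values are illustrative, assuming a spark-shell session:

{code:scala}
import spark.implicits._

Seq((1, 2), (3, 4)).toDF("c1", "c2").createOrReplaceTempView("t")

// The Generate (explode) in the lateral subquery references the outer
// columns c1 and c2 while its child is just OneRowRelation, i.e. it has
// no required child output -- the case that currently hits the assertion
// in DecorrelateInnerQuery.
spark.sql("select * from t, lateral (select explode(array(c1, c2)))").show()
{code}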
[jira] [Commented] (SPARK-40801) Upgrade Apache Commons Text to 1.10
[ https://issues.apache.org/jira/browse/SPARK-40801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644430#comment-17644430 ] Kevin Appel commented on SPARK-40801: - thank you for working on this > Upgrade Apache Commons Text to 1.10 > --- > > Key: SPARK-40801 > URL: https://issues.apache.org/jira/browse/SPARK-40801 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Minor > Fix For: 3.2.3, 3.3.2, 3.4.0 > > > [CVE-2022-42889|https://nvd.nist.gov/vuln/detail/CVE-2022-42889] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41008) Isotonic regression result differs from sklearn implementation
[ https://issues.apache.org/jira/browse/SPARK-41008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644314#comment-17644314 ] Apache Spark commented on SPARK-41008:
--
User 'ahmed-mahran' has created a pull request for this issue: https://github.com/apache/spark/pull/38966

> Isotonic regression result differs from sklearn implementation
> --
>
> Key: SPARK-41008
> URL: https://issues.apache.org/jira/browse/SPARK-41008
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 3.3.1
> Reporter: Arne Koopman
> Priority: Minor
>
> {code:python}
> import pandas as pd
>
> from pyspark.sql import functions as F
> from pyspark.sql.types import DoubleType
> from pyspark.ml.regression import IsotonicRegression as IsotonicRegression_pyspark
> from sklearn.isotonic import IsotonicRegression as IsotonicRegression_sklearn
>
> # The P(positives | model_score):
> # 0.6 -> 0.5 (1 out of the 2 labels is positive)
> # 0.333 -> 0.333 (1 out of the 3 labels is positive)
> # 0.20 -> 0.25 (1 out of the 4 labels is positive)
> tc_pd = pd.DataFrame({
>     "model_score": [0.6, 0.6, 0.333, 0.333, 0.333, 0.20, 0.20, 0.20, 0.20],
>     "label": [1, 0, 0, 1, 0, 1, 0, 0, 0],
>     "weight": 1,
> })
>
> # The fraction of positives for each of the distinct model_scores would be the best fit,
> # resulting in the following expected calibrated model_scores:
> # "calibrated_model_score": [0.5, 0.5, 0.333, 0.333, 0.333, 0.25, 0.25, 0.25, 0.25]
>
> # The sklearn implementation of Isotonic Regression.
> tc_regressor_sklearn = IsotonicRegression_sklearn().fit(X=tc_pd['model_score'], y=tc_pd['label'], sample_weight=tc_pd['weight'])
> print("sklearn:", tc_regressor_sklearn.predict(tc_pd['model_score']))
> # >> sklearn: [0.5 0.5 0. 0. 0. 0.25 0.25 0.25 0.25 ]
>
> # The pyspark implementation of Isotonic Regression (run in a pyspark session where `spark` exists).
> tc_df = spark.createDataFrame(tc_pd)
> tc_df = tc_df.withColumn('model_score', F.col('model_score').cast(DoubleType()))
> isotonic_regressor_pyspark = IsotonicRegression_pyspark(featuresCol='model_score', labelCol='label', weightCol='weight')
> tc_model = isotonic_regressor_pyspark.fit(tc_df)
> tc_pd = tc_model.transform(tc_df).toPandas()
> print("pyspark:", tc_pd['prediction'].values)
> # >> pyspark: [0.5 0.5 0. 0. 0. 0. 0. 0. 0. ]
>
> # The result from the pyspark implementation seems incorrect. Similar small toy
> # examples lead to similarly unexpected results for the pyspark implementation.
> # Strangely enough, for 'large' datasets, the difference between calibrated
> # model_scores generated by both implementations disappears.
> {code}
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41008) Isotonic regression result differs from sklearn implementation
[ https://issues.apache.org/jira/browse/SPARK-41008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644313#comment-17644313 ] Apache Spark commented on SPARK-41008:
--
User 'ahmed-mahran' has created a pull request for this issue: https://github.com/apache/spark/pull/38966

> Isotonic regression result differs from sklearn implementation
> --
>
> Key: SPARK-41008
> URL: https://issues.apache.org/jira/browse/SPARK-41008
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 3.3.1
> Reporter: Arne Koopman
> Priority: Minor
>
> {code:python}
> import pandas as pd
>
> from pyspark.sql import functions as F
> from pyspark.sql.types import DoubleType
> from pyspark.ml.regression import IsotonicRegression as IsotonicRegression_pyspark
> from sklearn.isotonic import IsotonicRegression as IsotonicRegression_sklearn
>
> # The P(positives | model_score):
> # 0.6 -> 0.5 (1 out of the 2 labels is positive)
> # 0.333 -> 0.333 (1 out of the 3 labels is positive)
> # 0.20 -> 0.25 (1 out of the 4 labels is positive)
> tc_pd = pd.DataFrame({
>     "model_score": [0.6, 0.6, 0.333, 0.333, 0.333, 0.20, 0.20, 0.20, 0.20],
>     "label": [1, 0, 0, 1, 0, 1, 0, 0, 0],
>     "weight": 1,
> })
>
> # The fraction of positives for each of the distinct model_scores would be the best fit,
> # resulting in the following expected calibrated model_scores:
> # "calibrated_model_score": [0.5, 0.5, 0.333, 0.333, 0.333, 0.25, 0.25, 0.25, 0.25]
>
> # The sklearn implementation of Isotonic Regression.
> tc_regressor_sklearn = IsotonicRegression_sklearn().fit(X=tc_pd['model_score'], y=tc_pd['label'], sample_weight=tc_pd['weight'])
> print("sklearn:", tc_regressor_sklearn.predict(tc_pd['model_score']))
> # >> sklearn: [0.5 0.5 0. 0. 0. 0.25 0.25 0.25 0.25 ]
>
> # The pyspark implementation of Isotonic Regression (run in a pyspark session where `spark` exists).
> tc_df = spark.createDataFrame(tc_pd)
> tc_df = tc_df.withColumn('model_score', F.col('model_score').cast(DoubleType()))
> isotonic_regressor_pyspark = IsotonicRegression_pyspark(featuresCol='model_score', labelCol='label', weightCol='weight')
> tc_model = isotonic_regressor_pyspark.fit(tc_df)
> tc_pd = tc_model.transform(tc_df).toPandas()
> print("pyspark:", tc_pd['prediction'].values)
> # >> pyspark: [0.5 0.5 0. 0. 0. 0. 0. 0. 0. ]
>
> # The result from the pyspark implementation seems incorrect. Similar small toy
> # examples lead to similarly unexpected results for the pyspark implementation.
> # Strangely enough, for 'large' datasets, the difference between calibrated
> # model_scores generated by both implementations disappears.
> {code}
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41008) Isotonic regression result differs from sklearn implementation
[ https://issues.apache.org/jira/browse/SPARK-41008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41008:
Assignee: Apache Spark

> Isotonic regression result differs from sklearn implementation
> --
>
> Key: SPARK-41008
> URL: https://issues.apache.org/jira/browse/SPARK-41008
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 3.3.1
> Reporter: Arne Koopman
> Assignee: Apache Spark
> Priority: Minor
>
> {code:python}
> import pandas as pd
>
> from pyspark.sql import functions as F
> from pyspark.sql.types import DoubleType
> from pyspark.ml.regression import IsotonicRegression as IsotonicRegression_pyspark
> from sklearn.isotonic import IsotonicRegression as IsotonicRegression_sklearn
>
> # The P(positives | model_score):
> # 0.6 -> 0.5 (1 out of the 2 labels is positive)
> # 0.333 -> 0.333 (1 out of the 3 labels is positive)
> # 0.20 -> 0.25 (1 out of the 4 labels is positive)
> tc_pd = pd.DataFrame({
>     "model_score": [0.6, 0.6, 0.333, 0.333, 0.333, 0.20, 0.20, 0.20, 0.20],
>     "label": [1, 0, 0, 1, 0, 1, 0, 0, 0],
>     "weight": 1,
> })
>
> # The fraction of positives for each of the distinct model_scores would be the best fit,
> # resulting in the following expected calibrated model_scores:
> # "calibrated_model_score": [0.5, 0.5, 0.333, 0.333, 0.333, 0.25, 0.25, 0.25, 0.25]
>
> # The sklearn implementation of Isotonic Regression.
> tc_regressor_sklearn = IsotonicRegression_sklearn().fit(X=tc_pd['model_score'], y=tc_pd['label'], sample_weight=tc_pd['weight'])
> print("sklearn:", tc_regressor_sklearn.predict(tc_pd['model_score']))
> # >> sklearn: [0.5 0.5 0. 0. 0. 0.25 0.25 0.25 0.25 ]
>
> # The pyspark implementation of Isotonic Regression (run in a pyspark session where `spark` exists).
> tc_df = spark.createDataFrame(tc_pd)
> tc_df = tc_df.withColumn('model_score', F.col('model_score').cast(DoubleType()))
> isotonic_regressor_pyspark = IsotonicRegression_pyspark(featuresCol='model_score', labelCol='label', weightCol='weight')
> tc_model = isotonic_regressor_pyspark.fit(tc_df)
> tc_pd = tc_model.transform(tc_df).toPandas()
> print("pyspark:", tc_pd['prediction'].values)
> # >> pyspark: [0.5 0.5 0. 0. 0. 0. 0. 0. 0. ]
>
> # The result from the pyspark implementation seems incorrect. Similar small toy
> # examples lead to similarly unexpected results for the pyspark implementation.
> # Strangely enough, for 'large' datasets, the difference between calibrated
> # model_scores generated by both implementations disappears.
> {code}
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41008) Isotonic regression result differs from sklearn implementation
[ https://issues.apache.org/jira/browse/SPARK-41008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41008:
Assignee: (was: Apache Spark)

> Isotonic regression result differs from sklearn implementation
> --
>
> Key: SPARK-41008
> URL: https://issues.apache.org/jira/browse/SPARK-41008
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 3.3.1
> Reporter: Arne Koopman
> Priority: Minor
>
> {code:python}
> import pandas as pd
>
> from pyspark.sql import functions as F
> from pyspark.sql.types import DoubleType
> from pyspark.ml.regression import IsotonicRegression as IsotonicRegression_pyspark
> from sklearn.isotonic import IsotonicRegression as IsotonicRegression_sklearn
>
> # The P(positives | model_score):
> # 0.6 -> 0.5 (1 out of the 2 labels is positive)
> # 0.333 -> 0.333 (1 out of the 3 labels is positive)
> # 0.20 -> 0.25 (1 out of the 4 labels is positive)
> tc_pd = pd.DataFrame({
>     "model_score": [0.6, 0.6, 0.333, 0.333, 0.333, 0.20, 0.20, 0.20, 0.20],
>     "label": [1, 0, 0, 1, 0, 1, 0, 0, 0],
>     "weight": 1,
> })
>
> # The fraction of positives for each of the distinct model_scores would be the best fit,
> # resulting in the following expected calibrated model_scores:
> # "calibrated_model_score": [0.5, 0.5, 0.333, 0.333, 0.333, 0.25, 0.25, 0.25, 0.25]
>
> # The sklearn implementation of Isotonic Regression.
> tc_regressor_sklearn = IsotonicRegression_sklearn().fit(X=tc_pd['model_score'], y=tc_pd['label'], sample_weight=tc_pd['weight'])
> print("sklearn:", tc_regressor_sklearn.predict(tc_pd['model_score']))
> # >> sklearn: [0.5 0.5 0. 0. 0. 0.25 0.25 0.25 0.25 ]
>
> # The pyspark implementation of Isotonic Regression (run in a pyspark session where `spark` exists).
> tc_df = spark.createDataFrame(tc_pd)
> tc_df = tc_df.withColumn('model_score', F.col('model_score').cast(DoubleType()))
> isotonic_regressor_pyspark = IsotonicRegression_pyspark(featuresCol='model_score', labelCol='label', weightCol='weight')
> tc_model = isotonic_regressor_pyspark.fit(tc_df)
> tc_pd = tc_model.transform(tc_df).toPandas()
> print("pyspark:", tc_pd['prediction'].values)
> # >> pyspark: [0.5 0.5 0. 0. 0. 0. 0. 0. 0. ]
>
> # The result from the pyspark implementation seems incorrect. Similar small toy
> # examples lead to similarly unexpected results for the pyspark implementation.
> # Strangely enough, for 'large' datasets, the difference between calibrated
> # model_scores generated by both implementations disappears.
> {code}
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41437) Do not optimize the input query twice for v1 write fallback
[ https://issues.apache.org/jira/browse/SPARK-41437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41437. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38942 [https://github.com/apache/spark/pull/38942] > Do not optimize the input query twice for v1 write fallback > --- > > Key: SPARK-41437 > URL: https://issues.apache.org/jira/browse/SPARK-41437 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41437) Do not optimize the input query twice for v1 write fallback
[ https://issues.apache.org/jira/browse/SPARK-41437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41437: --- Assignee: Wenchen Fan > Do not optimize the input query twice for v1 write fallback > --- > > Key: SPARK-41437 > URL: https://issues.apache.org/jira/browse/SPARK-41437 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-41418) Upgrade scala-maven-plugin from 4.7.2 to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-41418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-41418. - > Upgrade scala-maven-plugin from 4.7.2 to 4.8.0 > -- > > Key: SPARK-41418 > URL: https://issues.apache.org/jira/browse/SPARK-41418 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41418) Upgrade scala-maven-plugin from 4.7.2 to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-41418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-41418. --- Resolution: Duplicate > Upgrade scala-maven-plugin from 4.7.2 to 4.8.0 > -- > > Key: SPARK-41418 > URL: https://issues.apache.org/jira/browse/SPARK-41418 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41439) Implement `DataFrame.melt`
[ https://issues.apache.org/jira/browse/SPARK-41439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644258#comment-17644258 ] jiaan.geng commented on SPARK-41439:
--
I'm working on it.

> Implement `DataFrame.melt`
> --
>
> Key: SPARK-41439
> URL: https://issues.apache.org/jira/browse/SPARK-41439
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-41438) Implement DataFrame.colRegex
[ https://issues.apache.org/jira/browse/SPARK-41438 ] jiaan.geng deleted comment on SPARK-41438:
was (Author: beliefer): I'm working on it.

> Implement DataFrame.colRegex
> -
>
> Key: SPARK-41438
> URL: https://issues.apache.org/jira/browse/SPARK-41438
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41438) Implement DataFrame.colRegex
[ https://issues.apache.org/jira/browse/SPARK-41438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644242#comment-17644242 ] jiaan.geng commented on SPARK-41438:
--
I'm working on it.

> Implement DataFrame.colRegex
> -
>
> Key: SPARK-41438
> URL: https://issues.apache.org/jira/browse/SPARK-41438
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41440) Implement DataFrame.randomSplit
Ruifeng Zheng created SPARK-41440: - Summary: Implement DataFrame.randomSplit Key: SPARK-41440 URL: https://issues.apache.org/jira/browse/SPARK-41440 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
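For reference, DataFrame.randomSplit mirrors the long-standing Scala Dataset API. A minimal usage sketch, assuming a spark-shell session and made-up data:

{code:scala}
import spark.implicits._

val df = (1 to 100).toDF("id")

// Split into roughly 80%/20% parts; the seed makes the split reproducible.
val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed = 42L)
println(s"train=${train.count()}, test=${test.count()}")
{code}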
[jira] [Created] (SPARK-41439) Implement `DataFrame.melt`
Ruifeng Zheng created SPARK-41439: - Summary: Implement `DataFrame.melt` Key: SPARK-41439 URL: https://issues.apache.org/jira/browse/SPARK-41439 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41438) Implement DataFrame.colRegex
Ruifeng Zheng created SPARK-41438: - Summary: Implement DataFrame.colRegex Key: SPARK-41438 URL: https://issues.apache.org/jira/browse/SPARK-41438 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
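Likewise, DataFrame.colRegex mirrors the existing Scala Dataset.colRegex, which selects columns whose names match a regex. A small sketch, assuming a spark-shell session and made-up column names (the backtick quoting follows the documented colRegex examples):

{code:scala}
import spark.implicits._

val df = Seq((1, 2, 3)).toDF("col_a", "col_b", "other")

// Selects col_a and col_b, but not other.
df.select(df.colRegex("`col_.*`")).show()
{code}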
[jira] [Commented] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644224#comment-17644224 ] Apache Spark commented on SPARK-41386:
--
User 'Juerin-Dong' has created a pull request for this issue: https://github.com/apache/spark/pull/38965

> There are some small files when using rebalance(column)
> ---
>
> Key: SPARK-41386
> URL: https://issues.apache.org/jira/browse/SPARK-41386
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Zhe Dong
> Priority: Minor
>
> *Problem (REBALANCE(column)):*
> SparkSession config:
> {noformat}
> config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true")
> config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m")
> config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat}
> So we expect file sizes to be at least 20m*0.5=10m. But in fact, we got some small files like the following:
> {noformat}
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat}
> 9.1 M and 3.0 M are smaller than 10 M. We have to handle these small files in another way.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
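To reproduce the setup, the settings from the report translate to a SparkSession built roughly as below; the config keys are quoted verbatim from the description, while the builder wiring is the standard API:

{code:scala}
import org.apache.spark.sql.SparkSession

// With a 20m advisory partition size and a 0.5 small-partition factor,
// rebalanced output files are expected to be at least 20m * 0.5 = 10m,
// which is the expectation the report says is violated.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true")
  .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m")
  .config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5")
  .getOrCreate()
{code}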
[jira] [Commented] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644223#comment-17644223 ] Apache Spark commented on SPARK-41386:
--
User 'Juerin-Dong' has created a pull request for this issue: https://github.com/apache/spark/pull/38965

> There are some small files when using rebalance(column)
> ---
>
> Key: SPARK-41386
> URL: https://issues.apache.org/jira/browse/SPARK-41386
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Zhe Dong
> Priority: Minor
>
> *Problem (REBALANCE(column)):*
> SparkSession config:
> {noformat}
> config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true")
> config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m")
> config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat}
> So we expect file sizes to be at least 20m*0.5=10m. But in fact, we got some small files like the following:
> {noformat}
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat}
> 9.1 M and 3.0 M are smaller than 10 M. We have to handle these small files in another way.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41386:
Assignee: (was: Apache Spark)

> There are some small files when using rebalance(column)
> ---
>
> Key: SPARK-41386
> URL: https://issues.apache.org/jira/browse/SPARK-41386
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Zhe Dong
> Priority: Minor
>
> *Problem (REBALANCE(column)):*
> SparkSession config:
> {noformat}
> config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true")
> config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m")
> config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat}
> So we expect file sizes to be at least 20m*0.5=10m. But in fact, we got some small files like the following:
> {noformat}
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat}
> 9.1 M and 3.0 M are smaller than 10 M. We have to handle these small files in another way.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41386:
Assignee: Apache Spark

> There are some small files when using rebalance(column)
> ---
>
> Key: SPARK-41386
> URL: https://issues.apache.org/jira/browse/SPARK-41386
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Zhe Dong
> Assignee: Apache Spark
> Priority: Minor
>
> *Problem (REBALANCE(column)):*
> SparkSession config:
> {noformat}
> config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true")
> config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m")
> config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat}
> So we expect file sizes to be at least 20m*0.5=10m. But in fact, we got some small files like the following:
> {noformat}
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat}
> 9.1 M and 3.0 M are smaller than 10 M. We have to handle these small files in another way.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644141#comment-17644141 ] Zhe Dong edited comment on SPARK-41386 at 12/7/22 8:31 AM:
---
OptimizeSkewInRebalancePartitions.scala
{code:java}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.adaptive

import org.apache.spark.sql.execution.{CoalescedPartitionSpec, ShufflePartitionSpec, SparkPlan}
import org.apache.spark.sql.execution.exchange.{REBALANCE_PARTITIONS_BY_COL, REBALANCE_PARTITIONS_BY_NONE, ShuffleOrigin}
import org.apache.spark.sql.internal.SQLConf

/**
 * A rule to optimize the skewed shuffle partitions in [[RebalancePartitions]] based on the map
 * output statistics, which can avoid data skew that hurt performance.
 *
 * We use ADVISORY_PARTITION_SIZE_IN_BYTES size to decide if a partition should be optimized.
 * Let's say we have 3 maps with 3 shuffle partitions, and assuming r1 has data skew issue.
 * the map side looks like:
 *   m0:[b0, b1, b2], m1:[b0, b1, b2], m2:[b0, b1, b2]
 * and the reduce side looks like:
 *                            (without this rule) r1[m0-b1, m1-b1, m2-b1]
 *                              /                                     \
 *   r0:[m0-b0, m1-b0, m2-b0], r1-0:[m0-b1], r1-1:[m1-b1], r1-2:[m2-b1], r2[m0-b2, m1-b2, m2-b2]
 */
object OptimizeSkewInRebalancePartitions extends AQEShuffleReadRule {

  override val supportedShuffleOrigins: Seq[ShuffleOrigin] =
    Seq(REBALANCE_PARTITIONS_BY_NONE, REBALANCE_PARTITIONS_BY_COL)

  /**
   * Splits the skewed partition based on the map size and the target partition size
   * after split. Create a list of `PartialReducerPartitionSpec` for skewed partition and
   * create `CoalescedPartition` for normal partition.
   */
  private def optimizeSkewedPartitions(
      shuffleId: Int,
      bytesByPartitionId: Array[Long],
      targetSize: Long,
      smallPartitionFactor: Double): Seq[ShufflePartitionSpec] = {
    bytesByPartitionId.indices.flatMap { reduceIndex =>
      val bytes = bytesByPartitionId(reduceIndex)
      if (bytes > targetSize) {
        val newPartitionSpec = ShufflePartitionsUtil.createSkewPartitionSpecs(
          shuffleId, reduceIndex, targetSize, smallPartitionFactor)
        if (newPartitionSpec.isEmpty) {
          CoalescedPartitionSpec(reduceIndex, reduceIndex + 1, bytes) :: Nil
        } else {
          logDebug(s"For shuffle $shuffleId, partition $reduceIndex is skew, " +
            s"split it into ${newPartitionSpec.get.size} parts.")
          newPartitionSpec.get
        }
      } else if (bytes < targetSize * smallPartitionFactor) {
        CoalescedPartitionSpec(reduceIndex, reduceIndex + 1, bytes) :: Nil
      } else {
        CoalescedPartitionSpec(reduceIndex, reduceIndex, bytes) :: Nil
      }
    }
  }

  private def tryOptimizeSkewedPartitions(shuffle: ShuffleQueryStageExec): SparkPlan = {
    val advisorySize = conf.getConf(SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES)
    val smallPartitionFactor =
      conf.getConf(SQLConf.ADAPTIVE_REBALANCE_PARTITIONS_SMALL_PARTITION_FACTOR)
    val mapStats = shuffle.mapStats
    if (mapStats.isEmpty ||
      mapStats.get.bytesByPartitionId.forall(
        r => r <= advisorySize && r >= advisorySize * smallPartitionFactor)) {
      return shuffle
    }

    val newPartitionsSpec = optimizeSkewedPartitions(
      mapStats.get.shuffleId, mapStats.get.bytesByPartitionId, advisorySize, smallPartitionFactor)
    // return origin plan if we can not optimize partitions
    if (newPartitionsSpec.length == mapStats.get.bytesByPartitionId.length) {
      shuffle
    } else {
      AQEShuffleReadExec(shuffle, newPartitionsSpec)
    }
  }

  override def apply(plan: SparkPlan): SparkPlan = {
    if (!conf.getConf(SQLConf.ADAPTIVE_OPTIMIZE_SKEWS_IN_REBALANCE_PARTITIONS_ENABLED)) {
      return plan
    }

    plan transformUp {
      case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
        tryOptimizeSkewedPartitions(stage)
    }
  }
}
{code}

was (Author: JIRAUSER298432): OptimizeSkewInRebalancePartitions.scala {noformat} /* * Licensed to the
[jira] [Assigned] (SPARK-41433) Make Max Arrow BatchSize configurable
[ https://issues.apache.org/jira/browse/SPARK-41433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41433: - Assignee: Ruifeng Zheng > Make Max Arrow BatchSize configurable > - > > Key: SPARK-41433 > URL: https://issues.apache.org/jira/browse/SPARK-41433 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41433) Make Max Arrow BatchSize configurable
[ https://issues.apache.org/jira/browse/SPARK-41433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41433. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38958 [https://github.com/apache/spark/pull/38958] > Make Max Arrow BatchSize configurable > - > > Key: SPARK-41433 > URL: https://issues.apache.org/jira/browse/SPARK-41433 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644141#comment-17644141 ] Zhe Dong edited comment on SPARK-41386 at 12/7/22 8:29 AM:
---
OptimizeSkewInRebalancePartitions.scala
{noformat}
[same OptimizeSkewInRebalancePartitions.scala source as reproduced in the 8:31 AM edit above]
{noformat}

was (Author: JIRAUSER298432): {noformat} if (mapStats.isEmpty || mapStats.get.bytesByPartitionId.forall(_
[jira] [Assigned] (SPARK-41403) Implement DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41403: - Assignee: jiaan.geng > Implement DataFrame.describe > > > Key: SPARK-41403 > URL: https://issues.apache.org/jira/browse/SPARK-41403 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41403) Implement DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41403. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38938 [https://github.com/apache/spark/pull/38938] > Implement DataFrame.describe > > > Key: SPARK-41403 > URL: https://issues.apache.org/jira/browse/SPARK-41403 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
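For reference, the Connect implementation matches the existing Scala Dataset.describe. A minimal usage sketch, assuming a spark-shell session and made-up data:

{code:scala}
import spark.implicits._

val df = Seq((1, 10.0), (2, 20.0), (3, 30.0)).toDF("id", "value")

// describe() returns count, mean, stddev, min and max; pass column names
// to restrict the summary to specific columns.
df.describe("value").show()
{code}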
[jira] [Resolved] (SPARK-41349) Implement `DataFrame.hint`
[ https://issues.apache.org/jira/browse/SPARK-41349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41349.
-
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38899 [https://github.com/apache/spark/pull/38899]

> Implement `DataFrame.hint`
> --
>
> Key: SPARK-41349
> URL: https://issues.apache.org/jira/browse/SPARK-41349
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Deng Ziming
> Priority: Major
> Fix For: 3.4.0
>
> Implement DataFrame.hint with the proto message added in https://issues.apache.org/jira/browse/SPARK-41345
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41349) Implement `DataFrame.hint`
[ https://issues.apache.org/jira/browse/SPARK-41349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41349:
---
Assignee: Deng Ziming

> Implement `DataFrame.hint`
> --
>
> Key: SPARK-41349
> URL: https://issues.apache.org/jira/browse/SPARK-41349
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Deng Ziming
> Priority: Major
>
> Implement DataFrame.hint with the proto message added in https://issues.apache.org/jira/browse/SPARK-41345
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org