[jira] [Resolved] (SPARK-41952) Upgrade Parquet to fix off-heap memory leaks in Zstd codec

2023-02-20 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-41952. -- Fix Version/s: 3.2.4 3.4.0 3.3.3 Resolution: Fixed >

[jira] [Assigned] (SPARK-41952) Upgrade Parquet to fix off-heap memory leaks in Zstd codec

2023-02-20 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-41952: Assignee: Cheng Pan > Upgrade Parquet to fix off-heap memory leaks in Zstd codec >

[jira] [Created] (SPARK-42454) SPJ: encapsulate all SPJ related parameters in BatchScanExec

2023-02-15 Thread Chao Sun (Jira)
Chao Sun created SPARK-42454: Summary: SPJ: encapsulate all SPJ related parameters in BatchScanExec Key: SPARK-42454 URL: https://issues.apache.org/jira/browse/SPARK-42454 Project: Spark Issue

[jira] [Commented] (SPARK-33807) Data Source V2: Remove read specific distributions

2023-02-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685674#comment-17685674 ] Chao Sun commented on SPARK-33807: -- This is actually already resolved as part of SPARK-37377. > Data

[jira] [Assigned] (SPARK-33807) Data Source V2: Remove read specific distributions

2023-02-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-33807: Assignee: (was: Chao Sun) > Data Source V2: Remove read specific distributions >

[jira] [Assigned] (SPARK-33807) Data Source V2: Remove read specific distributions

2023-02-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-33807: Assignee: Chao Sun > Data Source V2: Remove read specific distributions >

[jira] [Updated] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-02-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41470: - Fix Version/s: 3.4.0 (was: 3.5.0) > SPJ: Spark shouldn't assume InternalRow

[jira] [Assigned] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-02-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-41470: Assignee: Mars > SPJ: Spark shouldn't assume InternalRow implements equals and hashCode >

[jira] [Resolved] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-02-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-41470. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 39687
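SPARK-41470 concerns grouping input partitions by their partition values when the row objects carrying those values may not define value-based equality. The Python sketch below is only an analogy, not Spark code. The hypothetical `OpaqueRow` plays a connector-provided row whose equality falls back to object identity. The hypothetical `RowKey` wrapper supplies equality and hashing derived from the row's values.

```python
from collections import Counter


class OpaqueRow:
    """Hypothetical connector row: no __eq__/__hash__, so Python falls back
    to identity, much like a Java object that does not override equals/hashCode."""

    def __init__(self, *values):
        self.values = tuple(values)


class RowKey:
    """Hypothetical wrapper whose equality and hash come from the row's values."""

    __slots__ = ("row",)

    def __init__(self, row):
        self.row = row

    def __eq__(self, other):
        return isinstance(other, RowKey) and self.row.values == other.row.values

    def __hash__(self):
        return hash(self.row.values)


def count_partitions_per_key(partition_keys, wrap=None):
    """Group partitions by key, optionally wrapping each key first."""
    return Counter(wrap(k) if wrap else k for k in partition_keys)


keys = [OpaqueRow(2023, "a"), OpaqueRow(2023, "a"), OpaqueRow(2024, "b")]
print(len(count_partitions_per_key(keys)))          # 3: equal values, separate groups
print(len(count_partitions_per_key(keys, RowKey)))  # 2: grouped by value
```

Wrapping rows at the point where they are used as hash keys, rather than requiring every row implementation to override equality, keeps the grouping correct for any row class.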

[jira] [Created] (SPARK-42040) SPJ: Introduce a new API for V2 input partition to report partition size

2023-01-12 Thread Chao Sun (Jira)
Chao Sun created SPARK-42040: Summary: SPJ: Introduce a new API for V2 input partition to report partition size Key: SPARK-42040 URL: https://issues.apache.org/jira/browse/SPARK-42040 Project: Spark

[jira] [Created] (SPARK-42039) SPJ: Remove Option in KeyGroupedPartitioning#partitionValues

2023-01-12 Thread Chao Sun (Jira)
Chao Sun created SPARK-42039: Summary: SPJ: Remove Option in KeyGroupedPartitioning#partitionValues Key: SPARK-42039 URL: https://issues.apache.org/jira/browse/SPARK-42039 Project: Spark Issue

[jira] [Created] (SPARK-42038) SPJ: Support partially clustered distribution

2023-01-12 Thread Chao Sun (Jira)
Chao Sun created SPARK-42038: Summary: SPJ: Support partially clustered distribution Key: SPARK-42038 URL: https://issues.apache.org/jira/browse/SPARK-42038 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-36529) Decouple CPU with IO work in vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36529: - Parent: (was: SPARK-35743) Issue Type: Bug (was: Sub-task) > Decouple CPU with IO work in

[jira] [Updated] (SPARK-36529) Decouple CPU with IO work in vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36529: - Issue Type: Improvement (was: Bug) > Decouple CPU with IO work in vectorized Parquet reader >

[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36528: - Parent: (was: SPARK-35743) Issue Type: Bug (was: Sub-task) > Implement lazy decoding for

[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36528: - Issue Type: New Feature (was: Bug) > Implement lazy decoding for the vectorized Parquet reader >

[jira] [Resolved] (SPARK-35743) Improve Parquet vectorized reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-35743. -- Fix Version/s: 3.4.0 Resolution: Fixed > Improve Parquet vectorized reader >

[jira] [Updated] (SPARK-36527) Implement lazy materialization for the vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36527: - Parent: (was: SPARK-35743) Issue Type: Improvement (was: Sub-task) > Implement lazy

[jira] [Assigned] (SPARK-41413) SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-22 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-41413: Assignee: Chao Sun > SPJ: Avoid shuffle when partition keys mismatch, but join expressions are

[jira] [Resolved] (SPARK-41413) SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-22 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-41413. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38950
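The idea in the SPARK-41413 summary can be pictured as follows. Both sides are already grouped by compatible partition keys, but their key sets differ. Each side can then be padded with empty partitions for the keys it lacks, after which matching partitions are joined locally instead of shuffling. This is a hypothetical Python illustration with made-up helper names (`align_partitions`, `join_aligned`), assuming in-memory dicts of rows. It is not Spark's planner code.

```python
def align_partitions(left, right):
    """Pad two key-grouped sides so they share the same sorted partition keys.

    left/right: dict of partition key -> list of rows. A key missing on one
    side gets an empty partition on that side.
    """
    keys = sorted(set(left) | set(right))
    return [(k, left.get(k, []), right.get(k, [])) for k in keys]


def join_aligned(aligned, left_key, right_key):
    """Inner-join each aligned pair of partitions locally, with no shuffle."""
    out = []
    for _, left_rows, right_rows in aligned:
        index = {}
        for rrow in right_rows:
            index.setdefault(right_key(rrow), []).append(rrow)
        for lrow in left_rows:
            for rrow in index.get(left_key(lrow), []):
                out.append((lrow, rrow))
    return out


left = {1: [(1, "a")], 2: [(2, "b")]}
right = {2: [(2, "x")], 3: [(3, "y")]}
aligned = align_partitions(left, right)
print([k for k, _, _ in aligned])  # [1, 2, 3]
print(join_aligned(aligned, lambda row: row[0], lambda row: row[0]))
# [((2, 'b'), (2, 'x'))]
```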

[jira] [Updated] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41470: - Summary: SPJ: Spark shouldn't assume InternalRow implements equals and hashCode (was: SPJ shouldn't

[jira] [Updated] (SPARK-41471) SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41471: - Summary: SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning (was: SPJ:

[jira] [Updated] (SPARK-40946) SPJ: Introduce a new DataSource V2 interface SupportsPushDownClusterKeys

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-40946: - Summary: SPJ: Introduce a new DataSource V2 interface SupportsPushDownClusterKeys (was: Introduce a

[jira] [Updated] (SPARK-41398) SPJ: Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41398: - Summary: SPJ: Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering

[jira] [Updated] (SPARK-37375) Umbrella: Storage Partitioned Join (SPJ)

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37375: - Summary: Umbrella: Storage Partitioned Join (SPJ) (was: Umbrella: Storage Partitioned Join) >

[jira] [Updated] (SPARK-41413) SPJ: Spark should avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41413: - Summary: SPJ: Spark should avoid shuffle when partition keys mismatch, but join expressions are

[jira] [Updated] (SPARK-41413) SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41413: - Summary: SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible (was:

[jira] [Updated] (SPARK-37377) SPJ: Initial implementation of Storage-Partitioned Join

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37377: - Summary: SPJ: Initial implementation of Storage-Partitioned Join (was: Initial implementation of

[jira] [Created] (SPARK-41471) SPJ: reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2022-12-09 Thread Chao Sun (Jira)
Chao Sun created SPARK-41471: Summary: SPJ: reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning Key: SPARK-41471 URL: https://issues.apache.org/jira/browse/SPARK-41471 Project:

[jira] [Updated] (SPARK-37378) SPJ: Convert V2 Transform expressions into catalyst expressions and load their associated functions from V2 FunctionCatalog

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37378: - Summary: SPJ: Convert V2 Transform expressions into catalyst expressions and load their associated

[jira] [Updated] (SPARK-37376) SPJ: Introduce a new DataSource V2 interface HasPartitionKey

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37376: - Summary: SPJ: Introduce a new DataSource V2 interface HasPartitionKey (was: Introduce a new

[jira] [Created] (SPARK-41470) SPJ shouldn't assume InternalRow implements equals and hashCode

2022-12-09 Thread Chao Sun (Jira)
Chao Sun created SPARK-41470: Summary: SPJ shouldn't assume InternalRow implements equals and hashCode Key: SPARK-41470 URL: https://issues.apache.org/jira/browse/SPARK-41470 Project: Spark

[jira] [Created] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-06 Thread Chao Sun (Jira)
Chao Sun created SPARK-41413: Summary: Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible Key: SPARK-41413 URL:

[jira] [Created] (SPARK-41398) Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-05 Thread Chao Sun (Jira)
Chao Sun created SPARK-41398: Summary: Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match Key: SPARK-41398 URL: https://issues.apache.org/jira/browse/SPARK-41398

[jira] [Assigned] (SPARK-41096) Support reading parquet FIXED_LEN_BYTE_ARRAY type

2022-11-14 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-41096: Assignee: Kazuyuki Tanimura > Support reading parquet FIXED_LEN_BYTE_ARRAY type >

[jira] [Resolved] (SPARK-41096) Support reading parquet FIXED_LEN_BYTE_ARRAY type

2022-11-14 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-41096. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38628
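For context on SPARK-41096: in Parquet's PLAIN encoding, FIXED_LEN_BYTE_ARRAY values are stored back to back with no length prefix, since every value has the same declared width. The Python sketch below assumes it receives only the value bytes of a page with no nulls (headers and definition levels already removed). The `flba_to_decimal` function illustrates the DECIMAL annotation, whose unscaled value is a big-endian two's-complement integer. Both function names are hypothetical. This notification does not state which FLBA logical types the Spark change covers.

```python
from decimal import Decimal


def decode_plain_flba(values_bytes, type_length):
    """Split PLAIN-encoded FIXED_LEN_BYTE_ARRAY values (no nulls assumed)."""
    if type_length <= 0:
        raise ValueError("type_length must be positive")
    if len(values_bytes) % type_length != 0:
        raise ValueError("byte count is not a multiple of type_length")
    return [values_bytes[i:i + type_length]
            for i in range(0, len(values_bytes), type_length)]


def flba_to_decimal(value, scale):
    """Interpret one DECIMAL-annotated FLBA value: big-endian two's-complement
    unscaled integer, divided by 10**scale."""
    return Decimal(int.from_bytes(value, "big", signed=True)).scaleb(-scale)


raw = bytes.fromhex("0001ff85")  # two 2-byte values
values = decode_plain_flba(raw, 2)
print(values)                                   # [b'\x00\x01', b'\xff\x85']
print([flba_to_decimal(v, 2) for v in values])  # [Decimal('0.01'), Decimal('-1.23')]
```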

[jira] [Created] (SPARK-41091) Fix Docker release tool for branch-3.2

2022-11-09 Thread Chao Sun (Jira)
Chao Sun created SPARK-41091: Summary: Fix Docker release tool for branch-3.2 Key: SPARK-41091 URL: https://issues.apache.org/jira/browse/SPARK-41091 Project: Spark Issue Type: Improvement

[jira] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-31 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER295436): Thank you for sharing such good information. Very informative and effective post. 

[jira] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-31 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER294516): Great job. [Salesforce Marketing Cloud

[jira] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-22 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER295111): Very informative and effective post.  [Vlocity Platform Developer

[jira] [Commented] (SPARK-40876) Spark's Vectorized ParquetReader should support type promotions

2022-10-21 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17622514#comment-17622514 ] Chao Sun commented on SPARK-40876: -- Yes, Spark doesn't support int -> long for Parquet. It's a long

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-10 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615255#comment-17615255 ] Chao Sun commented on SPARK-40703: -- Thanks [~bryanck]. Now I see where the issue is. In your pyspark

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614303#comment-17614303 ] Chao Sun commented on SPARK-40703: -- Hmm somehow in the unit test I was able to see that changing

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614298#comment-17614298 ] Chao Sun commented on SPARK-40703: -- (one idea is that {{SinglePartitionSpec#canCreatePartitioning}}

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614296#comment-17614296 ] Chao Sun commented on SPARK-40703: -- Hmm interesting. Let me try to come up with a unit test and check

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614289#comment-17614289 ] Chao Sun commented on SPARK-40703: -- I see. The reason HashPartitioning is not picked as the best

[jira] [Updated] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-40703: - Component/s: SQL (was: Spark Core) > Performance regression for joins in Spark 3.3

[jira] [Commented] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning

2022-09-21 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607869#comment-17607869 ] Chao Sun commented on SPARK-40508: -- [~dongjoon][~viirya] could you add [~yuzhih...@gmail.com] to the

[jira] [Resolved] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning

2022-09-21 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40508. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37952

[jira] [Updated] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-16 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-40169: - Fix Version/s: 3.2.3 > Fix the issue with Parquet column index and predicate pushdown in Data source >

[jira] [Assigned] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-16 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-40169: Assignee: Chao Sun > Fix the issue with Parquet column index and predicate pushdown in Data

[jira] [Updated] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-16 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-40169: - Fix Version/s: 3.3.1 > Fix the issue with Parquet column index and predicate pushdown in Data source >

[jira] [Resolved] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-16 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40169. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37881

[jira] [Resolved] (SPARK-40295) Allow v2 functions with literal args in write distribution and ordering

2022-09-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40295. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37749

[jira] [Assigned] (SPARK-40295) Allow v2 functions with literal args in write distribution and ordering

2022-09-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-40295: Assignee: Anton Okolnychyi > Allow v2 functions with literal args in write distribution and

[jira] [Commented] (SPARK-40128) Add DELTA_LENGTH_BYTE_ARRAY as a recognized standalone encoding in VectorizedColumnReader

2022-08-17 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581073#comment-17581073 ] Chao Sun commented on SPARK-40128: -- Seems we need to add [~dennishuo] as Spark contributor in order to

[jira] [Resolved] (SPARK-40128) Add DELTA_LENGTH_BYTE_ARRAY as a recognized standalone encoding in VectorizedColumnReader

2022-08-17 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40128. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37557
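SPARK-40128 deals with the DELTA_LENGTH_BYTE_ARRAY encoding, in which a page stores all value lengths first (themselves DELTA_BINARY_PACKED) followed by the concatenated value bytes. The following is a minimal Python sketch of the second half only. It assumes the lengths have already been decoded. The function name is hypothetical, and the DELTA_BINARY_PACKED step is not shown.

```python
def split_delta_length_byte_array(lengths, data):
    """Split the concatenated payload of a DELTA_LENGTH_BYTE_ARRAY page.

    Assumes `lengths` was already decoded from its DELTA_BINARY_PACKED form
    (not implemented here) and `data` holds every value's bytes back to back.
    """
    if sum(lengths) != len(data):
        raise ValueError("lengths do not cover the data exactly")
    values, offset = [], 0
    for n in lengths:
        if n < 0:
            raise ValueError("negative length")
        values.append(data[offset:offset + n])
        offset += n
    return values


print(split_delta_length_byte_array([5, 5, 6, 6], b"HelloWorldFoobarABCDEF"))
# [b'Hello', b'World', b'Foobar', b'ABCDEF']
```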

[jira] [Assigned] (SPARK-40110) Add JDBCWithAQESuite

2022-08-17 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-40110: Assignee: Kazuyuki Tanimura > Add JDBCWithAQESuite >

[jira] [Resolved] (SPARK-40110) Add JDBCWithAQESuite

2022-08-17 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40110. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37544

[jira] [Assigned] (SPARK-40052) Handle direct byte buffers in VectorizedDeltaBinaryPackedReader

2022-08-12 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-40052: Assignee: Ivan Sadikov > Handle direct byte buffers in VectorizedDeltaBinaryPackedReader >

[jira] [Resolved] (SPARK-40052) Handle direct byte buffers in VectorizedDeltaBinaryPackedReader

2022-08-12 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40052. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37485

[jira] [Updated] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-39833: - Affects Version/s: 3.3.0 > Filtered parquet data frame count() and show() produce inconsistent results

[jira] [Commented] (SPARK-39863) Upgrade Hadoop to 3.3.4

2022-08-03 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574770#comment-17574770 ] Chao Sun commented on SPARK-39863: -- Thanks [~ste...@apache.org], noted > Upgrade Hadoop to 3.3.4 >

[jira] [Updated] (SPARK-39951) Support columnar batches with nested fields in Parquet V2

2022-08-02 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-39951: - Fix Version/s: 3.3.1 > Support columnar batches with nested fields in Parquet V2 >

[jira] [Resolved] (SPARK-39951) Support columnar batches with nested fields in Parquet V2

2022-08-02 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-39951. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37379

[jira] [Created] (SPARK-39863) Upgrade Hadoop to 3.3.4

2022-07-25 Thread Chao Sun (Jira)
Chao Sun created SPARK-39863: Summary: Upgrade Hadoop to 3.3.4 Key: SPARK-39863 URL: https://issues.apache.org/jira/browse/SPARK-39863 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-39657) YARN AM client should call the non-static setTokensConf method

2022-07-01 Thread Chao Sun (Jira)
Chao Sun created SPARK-39657: Summary: YARN AM client should call the non-static setTokensConf method Key: SPARK-39657 URL: https://issues.apache.org/jira/browse/SPARK-39657 Project: Spark

[jira] [Resolved] (SPARK-39638) Change to use `ConstantColumnVector` to store partition columns in `OrcColumnarBatchReader`

2022-07-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-39638. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37029

[jira] [Assigned] (SPARK-39638) Change to use `ConstantColumnVector` to store partition columns in `OrcColumnarBatchReader`

2022-07-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-39638: Assignee: Yang Jie > Change to use `ConstantColumnVector` to store partition columns in >
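SPARK-39638 and SPARK-39231 switch partition columns to `ConstantColumnVector`. A partition column holds the same value for every row read from a given file. This means a vector can keep one value instead of one slot per row, which the following Python sketch illustrates. Both classes are hypothetical illustrations, not Spark's Java classes.

```python
class ConstantColumn:
    """Hypothetical column that keeps one value for all rows."""

    def __init__(self, value, num_rows):
        self.value = value
        self.num_rows = num_rows

    def __len__(self):
        return self.num_rows

    def get(self, row_id):
        if not 0 <= row_id < self.num_rows:
            raise IndexError(row_id)
        return self.value


class MaterializedColumn:
    """Hypothetical column that keeps one slot per row."""

    def __init__(self, value, num_rows):
        self.values = [value] * num_rows

    def __len__(self):
        return len(self.values)

    def get(self, row_id):
        return self.values[row_id]


# A partition value such as dt=2022-07-01 is identical for every row in a batch.
const = ConstantColumn("2022-07-01", 4096)
full = MaterializedColumn("2022-07-01", 4096)
print(all(const.get(i) == full.get(i) for i in range(len(full))))  # True
```

Both columns answer every lookup identically, but the constant one does O(1) work per batch instead of filling O(rows) slots.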

[jira] [Commented] (SPARK-39644) Add RangePartitioning to DataSource V2

2022-06-30 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561196#comment-17561196 ] Chao Sun commented on SPARK-39644: -- Thanks. Following this JIRA now. > Add RangePartitioning to

[jira] [Assigned] (SPARK-39231) Change to use `ConstantColumnVector` to store partition columns in `VectorizedParquetRecordReader`

2022-06-29 Thread Chao Sun (Jira)
Chao Sun assigned an

[jira] [Resolved] (SPARK-39231) Change to use `ConstantColumnVector` to store partition columns in `VectorizedParquetRecordReader`

2022-06-29 Thread Chao Sun (Jira)
Chao Sun resolved as

[jira] [Updated] (SPARK-34863) Support nested column in Spark Parquet vectorized readers

2022-06-27 Thread Chao Sun (Jira)
Chao Sun updated an

[jira] [Resolved] (SPARK-38647) Add SupportsReportOrdering mix in interface for Scan

2022-06-21 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-38647. -- Fix Version/s: 3.4.0 Assignee: Enrico Minack Resolution: Fixed > Add

[jira] [Commented] (SPARK-29260) Enable supported Hive metastore versions once it support altering database location

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545268#comment-17545268 ] Chao Sun commented on SPARK-29260: -- Thanks [~yumwang]. Spark currently throws an exception when Hive client

[jira] [Commented] (SPARK-29260) Enable supported Hive metastore versions once it support altering database location

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545166#comment-17545166 ] Chao Sun commented on SPARK-29260: -- [~yumwang] Looks like HIVE-8472 is for the server side changes of

[jira] [Updated] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-39313: - Fix Version/s: 3.3.0 > V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be >

[jira] [Assigned] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-39313: Assignee: Cheng Pan > V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not

[jira] [Resolved] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-39313. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36697

[jira] [Updated] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-05-27 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-39313: - Priority: Blocker (was: Critical) > V2ExpressionUtils.toCatalystOrdering should fail if V2Expression

[jira] [Resolved] (SPARK-39086) Support UDT in Spark Parquet vectorized reader

2022-05-12 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-39086. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36427

[jira] [Assigned] (SPARK-39086) Support UDT in Spark Parquet vectorized reader

2022-05-12 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-39086: Assignee: Ivan Sadikov > Support UDT in Spark Parquet vectorized reader >

[jira] [Created] (SPARK-39119) Upgrade to Hadoop 3.3.3

2022-05-06 Thread Chao Sun (Jira)
Chao Sun created SPARK-39119: Summary: Upgrade to Hadoop 3.3.3 Key: SPARK-39119 URL: https://issues.apache.org/jira/browse/SPARK-39119 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible

2022-05-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-38891: - Fix Version/s: 3.3.0 > Skipping allocating vector for repetition & definition levels when possible >

[jira] [Resolved] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible

2022-05-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-38891. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36202

[jira] [Assigned] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible

2022-05-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-38891: Assignee: Chao Sun > Skipping allocating vector for repetition & definition levels when possible
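Background for SPARK-38891: in the Dremel-style encoding that Parquet uses, a leaf column's maximum repetition level is the number of `repeated` fields on its path. Its maximum definition level is the number of non-`required` fields on that path. When a maximum level is 0, every level value is 0, so there is nothing to store per row. The Python sketch below computes both maximums and reports which level vectors a reader would need. The helper names are hypothetical. These notifications do not state the exact condition Spark uses to skip allocation.

```python
REQUIRED, OPTIONAL, REPEATED = "required", "optional", "repeated"


def max_levels(path):
    """Return (max_repetition_level, max_definition_level) for a leaf column.

    `path` lists the repetition of each field from the top-level field down
    to the leaf, e.g. ["optional", "repeated", "optional"].
    """
    max_rep = sum(1 for r in path if r == REPEATED)
    max_def = sum(1 for r in path if r in (OPTIONAL, REPEATED))
    return max_rep, max_def


def level_vectors_needed(path):
    """Hypothetical helper: a level vector is only useful if its max level > 0."""
    max_rep, max_def = max_levels(path)
    return {"repetition": max_rep > 0, "definition": max_def > 0}


print(level_vectors_needed([REQUIRED]))            # flat, non-null column
print(level_vectors_needed([OPTIONAL]))            # flat, nullable column
print(max_levels([OPTIONAL, REPEATED, OPTIONAL]))  # 3-level LIST: (1, 3)
```

A flat required column needs neither vector. A flat optional column needs only definition levels. Only nested repeated columns need both.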

[jira] [Assigned] (SPARK-38573) Support Auto Partition Statistics Collection

2022-04-15 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-38573: Assignee: Kazuyuki Tanimura > Support Auto Partition Statistics Collection >

[jira] [Resolved] (SPARK-38573) Support Auto Partition Statistics Collection

2022-04-15 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-38573. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36067

[jira] [Created] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible

2022-04-13 Thread Chao Sun (Jira)
Chao Sun created SPARK-38891: Summary: Skipping allocating vector for repetition & definition levels when possible Key: SPARK-38891 URL: https://issues.apache.org/jira/browse/SPARK-38891 Project: Spark

[jira] [Created] (SPARK-38840) Enable spark.sql.parquet.enableNestedColumnVectorizedReader on master branch by default

2022-04-08 Thread Chao Sun (Jira)
Chao Sun created SPARK-38840: Summary: Enable spark.sql.parquet.enableNestedColumnVectorizedReader on master branch by default Key: SPARK-38840 URL: https://issues.apache.org/jira/browse/SPARK-38840

[jira] [Resolved] (SPARK-38786) Test Bug in StatisticsSuite "change stats after add/drop partition command"

2022-04-05 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-38786. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36075

[jira] [Assigned] (SPARK-38786) Test Bug in StatisticsSuite "change stats after add/drop partition command"

2022-04-05 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-38786: Assignee: Kazuyuki Tanimura > Test Bug in StatisticsSuite "change stats after add/drop partition

[jira] [Assigned] (SPARK-34863) Support nested column in Spark Parquet vectorized readers

2022-04-05 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-34863: Assignee: Chao Sun (was: Apache Spark) > Support nested column in Spark Parquet vectorized

[jira] [Updated] (SPARK-37378) Convert V2 Transform expressions into catalyst expressions and load their associated functions from V2 FunctionCatalog

2022-04-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37378: - Fix Version/s: 3.4.0 > Convert V2 Transform expressions into catalyst expressions and load their >

[jira] [Resolved] (SPARK-37378) Convert V2 Transform expressions into catalyst expressions and load their associated functions from V2 FunctionCatalog

2022-04-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-37378. -- Resolution: Duplicate This JIRA is covered as part of SPARK-37377 > Convert V2 Transform expressions

[jira] [Updated] (SPARK-37377) Initial implementation of Storage-Partitioned Join

2022-04-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37377: - Description: This Jira tracks the initial implementation of storage-partitioned join. (was: Currently

[jira] [Updated] (SPARK-37377) Initial implementation of Storage-Partitioned Join

2022-04-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37377: - Summary: Initial implementation of Storage-Partitioned Join (was: Refactor V2 Partitioning interface

[jira] [Updated] (SPARK-37974) Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

2022-03-31 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37974: - Fix Version/s: 3.3.0 (was: 3.4.0) > Implement vectorized DELTA_BYTE_ARRAY and

[jira] [Resolved] (SPARK-37974) Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

2022-03-31 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-37974. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 35262

[jira] [Assigned] (SPARK-37974) Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

2022-03-31 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-37974: Assignee: Parth Chandra > Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY
