[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size
     [ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-42779:
-------------------------------------

    Assignee: Anton Okolnychyi

> Allow V2 writes to indicate advisory partition size
> ---------------------------------------------------
>
>                 Key: SPARK-42779
>                 URL: https://issues.apache.org/jira/browse/SPARK-42779
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Anton Okolnychyi
>            Assignee: Anton Okolnychyi
>            Priority: Major
>
> Data sources may request a particular distribution and ordering of data for
> V2 writes. If AQE is enabled, the default session advisory partition size
> (64MB) will be used as guidance. Unfortunately, this default value can still
> lead to small files because the written data can be compressed nicely using
> columnar file formats. Spark should allow data sources to indicate the
> advisory shuffle partition size, just like it lets data sources request a
> particular number of partitions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
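
The small-file concern in the issue description can be illustrated with rough arithmetic. The sketch below is not Spark code and is not taken from the issue. It assumes that each shuffle partition is written as exactly one file and that the file ends up smaller than the shuffle data by a fixed factor. The factor of 8 and the 128MB target file size are hypothetical values chosen only for illustration, and the function names are made up for this example.

```python
MB = 1024 * 1024

# Default session advisory partition size quoted in the issue description.
DEFAULT_ADVISORY_BYTES = 64 * MB


def expected_file_bytes(shuffle_partition_bytes: int, size_ratio: float) -> float:
    """Estimate the written file size for one shuffle partition.

    Assumption: one partition becomes one file, and the file is
    `size_ratio` times smaller than the shuffle data that produced it.
    """
    return shuffle_partition_bytes / size_ratio


def advisory_bytes_for_target(target_file_bytes: int, size_ratio: float) -> int:
    """Shuffle partition size a data source might request to reach a target file size."""
    return int(target_file_bytes * size_ratio)


if __name__ == "__main__":
    ratio = 8.0  # hypothetical shuffle-size to file-size ratio
    small = expected_file_bytes(DEFAULT_ADVISORY_BYTES, ratio)
    wanted = advisory_bytes_for_target(128 * MB, ratio)
    print(f"64MB shuffle partition -> about {small / MB:.0f}MB file")
    print(f"128MB target file -> about {wanted // MB}MB advisory partition size")
```

Under these assumptions the session default yields files far below the target, which is why a per-write advisory size, chosen by the data source that knows its file format, would be useful.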
[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size
     [ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42779:
------------------------------------

    Assignee: Apache Spark

> Allow V2 writes to indicate advisory partition size
> ---------------------------------------------------
>
>                 Key: SPARK-42779
>                 URL: https://issues.apache.org/jira/browse/SPARK-42779
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Anton Okolnychyi
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size
     [ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42779:
------------------------------------

    Assignee:     (was: Apache Spark)

> Allow V2 writes to indicate advisory partition size
> ---------------------------------------------------
>
>                 Key: SPARK-42779
>                 URL: https://issues.apache.org/jira/browse/SPARK-42779
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Anton Okolnychyi
>            Priority: Major