[ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700316#comment-17700316
 ] 

Apache Spark commented on SPARK-42779:
--------------------------------------

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40421

> Allow V2 writes to indicate advisory partition size
> ---------------------------------------------------
>
>                 Key: SPARK-42779
>                 URL: https://issues.apache.org/jira/browse/SPARK-42779
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Anton Okolnychyi
>            Priority: Major
>
> Data sources may request a particular distribution and ordering of data for
> V2 writes. If AQE is enabled, the default session advisory partition size
> (64MB) is used as guidance. Unfortunately, this default can still lead to
> small files, because columnar file formats often compress the written data
> well. Spark should allow data sources to indicate the advisory shuffle
> partition size, just as it lets them request a particular number of
> partitions.
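
To make the small-file concern concrete, here is a rough back-of-the-envelope sketch of the arithmetic. It is not Spark code: the helper names and the compression ratio of 8 are hypothetical, chosen only for illustration. Real ratios depend on the data and the file format, and shuffle data is itself serialized and possibly compressed, so treat the numbers as an approximation.

```python
MB = 1024 * 1024

def expected_file_size(advisory_bytes: int, compression_ratio: int) -> int:
    """Approximate on-disk file size for one shuffle partition.

    compression_ratio is an assumed factor (shuffle bytes / written bytes);
    it is not something Spark reports directly.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return advisory_bytes // compression_ratio

def advisory_size_for_target_file(target_file_bytes: int, compression_ratio: int) -> int:
    """Advisory partition size a data source might ask for to hit a target file size."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return target_file_bytes * compression_ratio

# With the 64MB session default and an assumed 8x ratio, files come out small.
default_file = expected_file_size(64 * MB, 8)

# To land near 512MB files under the same assumption, a much larger
# advisory size would be needed.
needed_advisory = advisory_size_for_target_file(512 * MB, 8)

print(default_file // MB, "MB per file with the default advisory size")
print(needed_advisory // MB, "MB advisory size for ~512MB files")
```

Under these assumptions, only the data source knows the likely ratio for its format, which is why the issue proposes letting the write itself supply the advisory size rather than relying on the session default.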



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
