Re: [PR] [KYUUBI #6830] Allow indicate advisory shuffle partition size when me… [kyuubi]

via GitHub Fri, 06 Dec 2024 00:32:11 -0800


yabola commented on PR #6831:
URL: https://github.com/apache/kyuubi/pull/6831#issuecomment-2522505452


   @pan3793 Hmm, but when merging small files we only need to consider the 
shuffle data size (this rule only covers writing shuffle data to files, so it 
doesn't matter what the data source is).
   Because shuffle data is stored row by row and its size is only estimated, 
there is still a significant difference between the shuffle size and the actual 
written file size. This is especially true for Parquet, where the written file 
is usually less than 1/3 of the size of the shuffle data.
   Iceberg implementation:
   
https://github.com/apache/iceberg/blob/38c8daa4eae8a75ab46571f1efce1609100f53dd/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkCompressionUtil.java#L60-L69
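
   To make the arithmetic concrete, here is a minimal sketch of how a target 
output file size could be scaled up into an advisory shuffle partition size. 
This is not the Kyuubi or Iceberg code. The function name and the factor of 
3.0 are assumptions for illustration, taken from the "less than 1/3" 
observation above; a real ratio depends on the file format, compression codec, 
and the data itself.

```python
MIB = 1024 * 1024


def advisory_shuffle_partition_size(target_file_size: int,
                                    shuffle_to_file_factor: float = 3.0) -> int:
    """Return the shuffle partition size (bytes) expected to yield one file
    of roughly `target_file_size` bytes.

    `shuffle_to_file_factor` is a hypothetical tuning knob: how many bytes of
    shuffle data end up as one byte of written file. 3.0 mirrors the rough
    Parquet figure quoted in this thread and is only an assumption.
    """
    if target_file_size <= 0:
        raise ValueError("target_file_size must be positive")
    if shuffle_to_file_factor < 1.0:
        raise ValueError("shuffle_to_file_factor must be at least 1.0")
    # Written files are smaller than the shuffle data, so the shuffle
    # partition has to be larger than the file we want to end up with.
    return round(target_file_size * shuffle_to_file_factor)


if __name__ == "__main__":
    size = advisory_shuffle_partition_size(128 * MIB)
    print(f"advisory shuffle partition size: {size // MIB} MiB")
```

   With a 128 MiB target file and a factor of 3.0, the sketch suggests an 
advisory shuffle partition size of 384 MiB. A value like this could then be 
supplied as the advisory partition size (for example through Spark's 
`spark.sql.adaptive.advisoryPartitionSizeInBytes`) for the small-file merge 
stage.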
 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


