marin-ma opened a new issue, #11050:
URL: https://github.com/apache/incubator-gluten/issues/11050

   ### Description
   
   The current method for coalescing input splits into partitions is based on
sorted file sizes: after the input splits are sorted, only adjacent splits are
coalesced into the same partition. If the input includes small files, the
smallest ones are therefore likely to be grouped into a single partition. The
task reading that partition then has to open many small files and can become a
straggler.
   The coalescing method should be improved to distribute small files evenly
across partitions.
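   To make the problem concrete, here is a minimal Python sketch. It is not Gluten's code and not the fix adopted for this issue. `coalesce_adjacent` models the current behavior as next-fit packing of size-sorted splits under a byte cap. `coalesce_spread` is one possible alternative: it keeps the same partition count, places each large split on the least-loaded partition, and sends each small split to the partition that holds the fewest small splits so far. The function names, `OPEN_COST`, `small_threshold`, and the size units are all hypothetical.

```python
from typing import List

# Hypothetical fixed cost charged per file, in the same (arbitrary) units as
# the split sizes; loosely analogous to Spark's spark.sql.files.openCostInBytes.
OPEN_COST = 4


def coalesce_adjacent(sizes: List[int], max_bytes: int) -> List[List[int]]:
    """Simplified model of the current behavior: sort splits by size
    (largest first) and pack adjacent splits into a partition until the
    next split would push it past max_bytes."""
    partitions: List[List[int]] = []
    current: List[int] = []
    current_cost = 0
    for size in sorted(sizes, reverse=True):
        cost = size + OPEN_COST
        if current and current_cost + cost > max_bytes:
            partitions.append(current)
            current, current_cost = [], 0
        current.append(size)
        current_cost += cost
    if current:
        partitions.append(current)
    return partitions


def coalesce_spread(sizes: List[int], num_partitions: int,
                    small_threshold: int) -> List[List[int]]:
    """Hypothetical alternative: keep the partition count, place large splits
    on the least-loaded partition, and send each small split (below
    small_threshold) to the partition holding the fewest small splits."""
    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    loads = [0] * num_partitions          # accumulated cost per partition
    small_counts = [0] * num_partitions   # small splits per partition
    for size in sorted(sizes, reverse=True):
        if size >= small_threshold:
            # Large split: least accumulated cost, ties broken by index.
            idx = min(range(num_partitions), key=lambda i: (loads[i], i))
        else:
            # Small split: fewest small splits first, then least cost.
            idx = min(range(num_partitions),
                      key=lambda i: (small_counts[i], loads[i], i))
            small_counts[idx] += 1
        partitions[idx].append(size)
        loads[idx] += size + OPEN_COST
    return partitions


if __name__ == "__main__":
    sizes = [100, 100, 100] + [1] * 20   # three large splits, twenty tiny ones
    adjacent = coalesce_adjacent(sizes, max_bytes=128)
    print([len(p) for p in adjacent])      # [1, 1, 5, 16]
    spread = coalesce_spread(sizes, len(adjacent), small_threshold=10)
    print([p.count(1) for p in spread])    # [5, 5, 5, 5]
```

   With three large splits and twenty tiny ones, the adjacent model leaves 16 tiny splits in a single partition. The spread variant places five in each partition. The trade-off is that the partitions holding a large split end up slightly over the cap (129 cost units against 128 in this example), so a real fix would also need to decide how much overshoot to allow.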
   
   ### Gluten version
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
