marin-ma commented on issue #11397:
URL: 
https://github.com/apache/incubator-gluten/issues/11397#issuecomment-3791805106

   In Spark's Hive partitioned write, a sort is added in `V1Writes` to order 
records by partition key before writing. When the next record has a different 
partition key, the current writer is closed and a new writer is created. 
Therefore, only one writer is alive at a time during the write.
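
   The sorted write path described above can be modelled with a small Python sketch. It is a toy model only, not Spark code: `write_sorted`, the `(partition_key, row)` record shape, and the in-memory "files" are all hypothetical names introduced for illustration.

```python
def write_sorted(records):
    """Toy model of a sorted partitioned write.

    `records` is an iterable of (partition_key, row) pairs that is
    assumed to be already sorted by partition_key.
    Returns (files, peak_open), where peak_open is the largest number
    of writers that were open at the same time.
    """
    files = {}            # partition_key -> rows written (stands in for a file)
    current_key = None
    current_rows = None   # the single "open writer"
    open_writers = 0
    peak_open = 0

    for key, row in records:
        if current_rows is None or key != current_key:
            if current_rows is not None:
                open_writers -= 1          # close the current writer
            current_key = key
            current_rows = files.setdefault(key, [])
            open_writers += 1              # open a writer for the new key
            peak_open = max(peak_open, open_writers)
        current_rows.append(row)

    return files, peak_open
```

   With sorted input, `peak_open` in this model never exceeds 1, however many distinct partition keys appear.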
   
   In Gluten, this sort is removed by the rule 
`RemoveNativeWriteFilesSortAndProject`, because the Velox writer creates a new 
writer for each partition key and holds all writers in a vector until the 
write task ends. In this case, sorting the data brings no benefit, but holding 
too many `Writer` instances in each task can consume too much memory and may 
result in OOM.
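
   A minimal Python sketch of the "one writer per partition key, held until the task ends" behaviour follows. It is an illustrative model, not Velox or Gluten code: `write_holding_all` and the record shape are hypothetical, and a dict is used here for lookup where the comment above mentions a vector.

```python
def write_holding_all(records):
    """Toy model of a partitioned write that keeps every writer open.

    `records` is an iterable of (partition_key, row) pairs in any order.
    Returns (writers, peak_open), where peak_open is the largest number
    of writers that were open at the same time.
    """
    writers = {}      # partition_key -> rows; each entry is an open writer
    peak_open = 0

    for key, row in records:
        # A writer is created the first time a key is seen and is
        # never closed before the end of the task.
        writers.setdefault(key, []).append(row)
        peak_open = max(peak_open, len(writers))

    # Only here, at the end of the task, would all writers be closed.
    return writers, peak_open


# Sorting the input does not change the peak number of open writers.
rows = [("b", 1), ("a", 2), ("c", 3), ("a", 4)]
_, peak_unsorted = write_holding_all(rows)
_, peak_sorted = write_holding_all(sorted(rows))
```

   In this model the peak equals the number of distinct partition keys in the task for both orderings, which is why the sort gives no benefit and why memory use grows with the number of partitions.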
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

