zbs commented on issue #45638:
URL: https://github.com/apache/arrow/issues/45638#issuecomment-2689678138

   > From the footprint above, it seems that metadata of written row groups 
consumes 16.25% of the total memory. Is something else dominating the memory 
consumption?
   
   AFAICT nothing is dominating it -- I will try a larger input subset and run 
again. It's possible that my subsetting has reduced the severity of the bug.
   
   > Did you write a large number of row groups and columns?
   
   Yes, dozens of files, each with around 10K batches, totaling tens of 
millions of rows. In many cases a batch contains only 1 row. I have disabled 
dictionary encoding and statistics and will report back.
   
   Re: better APIs, our entire environment needs to upgrade at the same time. 
That should happen in a couple of months; if this is still unresolved by then, 
hopefully those APIs will solve it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
