lyang24 commented on PR #9093:
URL: https://github.com/apache/arrow-rs/pull/9093#issuecomment-3734148949

   > So, TLDR is my analysis is that we aren't properly sizing the allocations.
The good news is we can fix this. The bad news is it may be tricky. I will
update the original ticket too
   
   Thanks for the deep dive. I think passing down a capacity hint from
ArrayReaderBuilder is the right way to go. I made an implementation attempt,
and the results look promising.
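
To illustrate why a capacity hint can help, here is a minimal standalone sketch. It is not the code from this PR and does not use arrow-rs types: `ValueBuffer`, `with_capacity_hint`, and `fill` are hypothetical names, and the only assumption is that the reader knows roughly how many rows it will decode before it starts filling the buffer.

```rust
/// Hypothetical stand-in for a column value buffer (not an arrow-rs type).
struct ValueBuffer {
    values: Vec<i64>,
    /// Number of pushes that changed the backing capacity,
    /// i.e. an initial allocation or a later regrowth.
    capacity_changes: usize,
}

impl ValueBuffer {
    /// No hint: the buffer starts empty and grows on demand.
    fn new() -> Self {
        Self { values: Vec::new(), capacity_changes: 0 }
    }

    /// With a hint: allocate once up front for the expected row count.
    fn with_capacity_hint(hint: usize) -> Self {
        Self { values: Vec::with_capacity(hint), capacity_changes: 0 }
    }

    fn push(&mut self, v: i64) {
        let before = self.values.capacity();
        self.values.push(v);
        if self.values.capacity() != before {
            self.capacity_changes += 1;
        }
    }
}

/// Append `rows` values, as a reader would while decoding a batch.
fn fill(mut buf: ValueBuffer, rows: usize) -> ValueBuffer {
    for i in 0..rows {
        buf.push(i as i64);
    }
    buf
}
```

Filling a buffer created with `ValueBuffer::with_capacity_hint(8192)` with 8192 rows never changes its capacity, while one created with `ValueBuffer::new()` has to grow (and copy) several times along the way.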
   ClickBench query 10, no predicate pushdown:
   
   | Run        | Original Arrow-rs (ms) | Patched Arrow-rs (ms) |
   |------------|------------------------|------------------------|
   | 1 (cold)   | 174                    | 181                    |
   | 2          | 115                    | 109                    |
   | 3          | 113                    | 107                    |
   | 4          | 114                    | 103                    |
   | 5          | 115                    | 111                    |
   | 6          | 118                    | 110                    |
   | **Avg (warm)** | **115**                | **108**                |
   
   With predicate pushdown enabled, the query runs so fast (7 ms) that the
difference becomes hard to tell:
   `SET datafusion.execution.parquet.pushdown_filters = true;`
   

