lyang24 commented on PR #9093: URL: https://github.com/apache/arrow-rs/pull/9093#issuecomment-3734148949
> So, TLDR is my analysis is that we aren't properly sizing the allocations. The good news is we can fix this. The bad news is it may be tricky. I will update the original ticket too

Thanks for the deep dive. I think passing down a capacity hint from ArrayReaderBuilder is the right way to go. I made an implementation attempt, and the results look promising.

ClickBench query 10, no predicate pushdown:

| Run | Original arrow-rs (ms) | Patched arrow-rs (ms) |
|------------|------------------------|------------------------|
| 1 (cold) | 174 | 181 |
| 2 | 115 | 109 |
| 3 | 113 | 107 |
| 4 | 114 | 103 |
| 5 | 115 | 111 |
| 6 | 118 | 110 |
| **Avg (warm)** | **115** | **108** |

With predicate pushdown (`SET datafusion.execution.parquet.pushdown_filters = true;`), the query runs so fast (7 ms) that the difference becomes hard to tell.
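To illustrate why a capacity hint can help, here is a minimal sketch using only the Rust standard library. It is not the arrow-rs implementation: `ValuesBuffer`, `fill`, and the `grow_events` counter are hypothetical names introduced for this example, and the batch size of 8192 is an assumed value. The sketch counts how often a plain `Vec` changes capacity while it is filled, with and without an up-front hint.

```rust
/// Hypothetical stand-in for a reader's output buffer (not an arrow-rs type).
struct ValuesBuffer {
    values: Vec<i64>,
    /// Number of times the Vec's capacity changed during a push.
    grow_events: usize,
}

impl ValuesBuffer {
    /// Pre-size the buffer when a hint is available, otherwise start empty.
    fn new(capacity_hint: Option<usize>) -> Self {
        let values = match capacity_hint {
            Some(n) => Vec::with_capacity(n),
            None => Vec::new(),
        };
        Self { values, grow_events: 0 }
    }

    /// Append one value and record whether the push forced a capacity change.
    fn push(&mut self, v: i64) {
        let before = self.values.capacity();
        self.values.push(v);
        if self.values.capacity() != before {
            self.grow_events += 1;
        }
    }
}

/// Fill a buffer with `batch_size` values, optionally pre-sized by a hint.
fn fill(batch_size: usize, capacity_hint: Option<usize>) -> ValuesBuffer {
    let mut buf = ValuesBuffer::new(capacity_hint);
    for i in 0..batch_size {
        buf.push(i as i64);
    }
    buf
}

fn main() {
    let batch_size = 8192; // assumed batch size, for illustration only
    let unhinted = fill(batch_size, None);
    let hinted = fill(batch_size, Some(batch_size));
    println!("without hint: {} capacity changes", unhinted.grow_events);
    println!("with hint:    {} capacity changes", hinted.grow_events);
}
```

With the hint, the single allocation happens up front in `Vec::with_capacity`, so no push has to grow the buffer. Without it, the buffer grows repeatedly and copies its contents each time, which is the kind of overhead a hint passed down from the builder would aim to avoid.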
