nealrichardson commented on PR #13985: URL: https://github.com/apache/arrow/pull/13985#issuecomment-1236401632
Some of the remaining benchmark regressions are spurious (file-write and dataframe-to-table, neither of which is affected by this change). The other TPC-H regressions are legitimate, but they run on tiny scale factors (0.1 and 0.01, i.e. 100 MB and 10 MB of data), so the extra 10-15 ms of type checking that this PR introduces shows up as statistically significant. IMO the tradeoff is worth it. We better preserve the types of the original data (especially integers, and, after ARROW-17601, decimals). Strings can be passed more conveniently for dates/timestamps in expressions. And by avoiding unnecessary casts, we should see performance benefits on some queries. As it turns out, we aren't currently benchmarking any queries where that benefit would show. I'd like to add a benchmark for the case shown on this issue, but it fails for me locally due to ARROW-17556.