nealrichardson commented on PR #13985:
URL: https://github.com/apache/arrow/pull/13985#issuecomment-1236401632

   Some of the remaining benchmark regressions are spurious (file-write and 
dataframe-to-table, neither of which is affected by this change). The others, 
which are TPC-H queries, are legitimate, but they run on tiny scale factors 
(0.1 and 0.01, i.e. roughly 100 MB and 10 MB of data), so the extra 10-15 ms of 
type checking that this PR introduces shows up as statistically significant. 
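
   To illustrate why a fixed 10-15 ms cost stands out only on small inputs, 
here is a rough sketch. The baseline runtimes are hypothetical round numbers 
chosen for illustration, not measurements from these benchmarks, and 
`relative_overhead` is just a helper name made up for this example. 

```python
def relative_overhead(baseline_ms: float, overhead_ms: float) -> float:
    """Return a fixed overhead as a fraction of the baseline query time."""
    return overhead_ms / baseline_ms


# Hypothetical baseline runtimes (assumed, NOT measured values from this PR).
for baseline_ms in (100.0, 1_000.0, 10_000.0):
    # The 10-15 ms range is the type-checking cost quoted in the comment.
    for overhead_ms in (10.0, 15.0):
        pct = 100 * relative_overhead(baseline_ms, overhead_ms)
        print(f"baseline {baseline_ms:8.0f} ms + {overhead_ms:.0f} ms: {pct:.2f}% slower")
```

   The same absolute cost is a large fraction of a query that finishes in 
about a hundred milliseconds and a negligible one for a query that takes 
several seconds. 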
   
   IMO the tradeoff is worth it: we preserve the types of the original data 
better (especially for integers and, after ARROW-17601, decimals), strings can 
be passed more conveniently for dates/timestamps in expressions, and by 
avoiding unnecessary casts, we should get performance benefits on some queries. 
As it turns out, we aren't currently benchmarking any queries where that 
performance benefit would show. I'd like to add a benchmark for the case shown 
on this issue, but it fails for me locally due to ARROW-17556. 


