josiahyan edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588
@jacques-n I haven't done much investigation into other speedups - I just happened to notice performance irregularities compared to our other (legacy) codepaths, and realized that we were spending a significant number of cycles dividing integers. If you're referring to the linked code, I have nothing to do with the Iceberg project; I just happened to chance upon that code. I haven't investigated further patterns, although I'd like to do that in the future, time permitting.

I think it would be possible to measure the upper bound of performance, and address some of the questions (including that one) in this discussion, if we ran the following benchmarks side-by-side over these variants:

* the original IntBenchmarks
* save the value/validity buffer capacities outside Arrow code, then manually resize and call .set
* save the overall value capacity in BaseFixedWidthVector.getValueCapacity, and force recomputation based on valueBuffer.capacity()/validityBuffer.capacity() (my understanding of @liyafan82's suggestion, which avoids invalidation issues around the underlying buffers)
* call the specialized putLong directly (the code that was linked in Iceberg)

The second option should highlight the upper bound of performance reachable by aggressive caching. The last option should highlight the upper bound reachable by de-specialization (and therefore removing the virtual call chain). I'll write and run these later today.
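On the integer-division cost: as a minimal sketch (class and method names are invented for illustration, not Arrow's actual code), when the element width is a power of two, the byte-offset-to-index division can be replaced by an unsigned shift, which is what makes the division worth noticing in a hot path:

```java
// Hypothetical illustration: for power-of-two type widths, an idiv can be
// replaced with a shift that yields the identical element index.
public class OffsetDivision {
    static final int TYPE_WIDTH = 8;  // e.g. 8-byte longs (assumption for the sketch)
    static final int TYPE_WIDTH_SHIFT = Integer.numberOfTrailingZeros(TYPE_WIDTH);

    // Division form: which element does this byte offset fall in?
    static int elementIndexDiv(int byteOffset) {
        return byteOffset / TYPE_WIDTH;
    }

    // Shift form: same result for power-of-two widths, without a divide.
    static int elementIndexShift(int byteOffset) {
        return byteOffset >>> TYPE_WIDTH_SHIFT;
    }

    public static void main(String[] args) {
        // Cross-check the two forms over a small range of offsets.
        for (int off = 0; off < 1024; off++) {
            if (elementIndexDiv(off) != elementIndexShift(off)) {
                throw new AssertionError("mismatch at offset " + off);
            }
        }
        System.out.println("shift matches division for width " + TYPE_WIDTH);
    }
}
```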
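To make the third variant concrete, here is a rough sketch of the caching-vs-recomputation trade-off (all names invented; this is not Arrow's real BaseFixedWidthVector, just the shape of the idea): a cached capacity field is fast but must be invalidated on every buffer change, while deriving the capacity from the buffers' own capacities can never go stale.

```java
// Hypothetical sketch: contrast a cached value-capacity field against
// recomputing it from the buffers, which sidesteps invalidation bugs
// when the underlying buffers are resized or swapped.
public class CapacitySketch {
    static final int TYPE_WIDTH = 4;  // e.g. 4-byte ints (assumption for the sketch)

    int valueBufferCapacity;     // stands in for valueBuffer.capacity(), in bytes
    int validityBufferCapacity;  // stands in for validityBuffer.capacity(), in bytes

    // Variant 1: cached field; must be updated on every realloc or it is stale.
    int cachedValueCapacity;

    // Variant 2: always derive from the buffers, so it cannot go stale.
    int computedValueCapacity() {
        int fromValues = valueBufferCapacity / TYPE_WIDTH;
        int fromValidity = validityBufferCapacity * 8;  // one validity bit per value
        return Math.min(fromValues, fromValidity);
    }

    void reallocBuffers(int newValueBytes, int newValidityBytes) {
        valueBufferCapacity = newValueBytes;
        validityBufferCapacity = newValidityBytes;
        // Forget this line and variant 1 silently reports the old capacity.
        cachedValueCapacity = computedValueCapacity();
    }
}
```

The benchmark question is whether the recomputation (two loads, a divide, a min) is cheap enough that the cached field buys nothing measurable.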