pvary opened a new pull request #4218: URL: https://github.com/apache/iceberg/pull/4218
@rbalamohan identified that:

> the computation of "get" in "Caffeine" cache itself is expensive with operations like lookups, afterreads, stats updates, fork-join for purging etc.

This makes calling `GenericRecord.create()` for every record expensive. Sometimes we cannot reuse the containers in readers, but we still need better performance. If we create a template record in the readers and copy it for each record, we avoid retrieving the `nameToPos` map from the cache.

I created some basic JMH performance tests; here are the results:

```
benchmark-result.txt.orc.base:     mean = 2.571 ±(99.9%) 0.172 s/op
benchmark-result.txt.orc.fix:      mean = 2.297 ±(99.9%) 0.129 s/op
benchmark-result.txt.parquet.base: mean = 2.649 ±(99.9%) 0.136 s/op
benchmark-result.txt.parquet.fix:  mean = 2.209 ±(99.9%) 0.093 s/op
```

The change reduces the mean time per operation of the generic readers by roughly 11% for ORC and 17% for Parquet.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
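The template-record idea in the pull request description can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: `SketchRecord`, `NAME_TO_POS_CACHE`, and `CACHE_LOOKUPS` are hypothetical names, a plain `ConcurrentHashMap` stands in for the Caffeine cache, and none of this is the actual Iceberg `GenericRecord` code. It only shows why copying a template skips the per-record cache lookup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a generic record; NOT Iceberg's GenericRecord.
final class SketchRecord {
  // Stand-in for the Caffeine cache: schema (field names) -> name-to-position map.
  private static final Map<List<String>, Map<String, Integer>> NAME_TO_POS_CACHE =
      new ConcurrentHashMap<>();
  // Counts cache lookups so the difference between the two paths is visible.
  static final AtomicInteger CACHE_LOOKUPS = new AtomicInteger();

  private final Map<String, Integer> nameToPos;
  private final Object[] values;

  private SketchRecord(Map<String, Integer> nameToPos, Object[] values) {
    this.nameToPos = nameToPos;
    this.values = values;
  }

  // Slow path: every call goes through the shared cache.
  static SketchRecord create(List<String> fieldNames) {
    CACHE_LOOKUPS.incrementAndGet();
    Map<String, Integer> positions =
        NAME_TO_POS_CACHE.computeIfAbsent(fieldNames, SketchRecord::buildNameToPos);
    return new SketchRecord(positions, new Object[fieldNames.size()]);
  }

  private static Map<String, Integer> buildNameToPos(List<String> fieldNames) {
    Map<String, Integer> positions = new HashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      positions.put(fieldNames.get(i), i);
    }
    return positions;
  }

  // Fast path: reuse this record's nameToPos reference, copy only the values.
  SketchRecord copy() {
    return new SketchRecord(nameToPos, Arrays.copyOf(values, values.length));
  }

  void setField(String name, Object value) {
    values[nameToPos.get(name)] = value;
  }

  Object getField(String name) {
    return values[nameToPos.get(name)];
  }
}

final class TemplateRecordSketch {
  public static void main(String[] args) {
    List<String> schema = Arrays.asList("id", "data");

    // The reader builds the template once (a single cache lookup) ...
    SketchRecord template = SketchRecord.create(schema);

    // ... and produces each row by copying it, without touching the cache.
    for (int i = 0; i < 3; i++) {
      SketchRecord row = template.copy();
      row.setField("id", i);
      row.setField("data", "row-" + i);
      System.out.println(row.getField("id") + " " + row.getField("data"));
    }
    System.out.println("cache lookups: " + SketchRecord.CACHE_LOOKUPS.get());
  }
}
```

In this sketch each copy owns its own value array, so rows do not alias one another, while the immutable-by-convention `nameToPos` map is shared across all copies of the template.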
