pvary opened a new pull request #4218:
URL: https://github.com/apache/iceberg/pull/4218


   @rbalamohan identified that:
   > the computation of "get" in "Caffeine" cache itself is expensive with 
operations like lookups, afterreads, stats updates, fork-join for purging etc.
   
   This makes calling `GenericRecord.create()` for every record expensive.
   Sometimes we cannot reuse the containers in readers, but we still need 
better performance. If we create a template record in the readers and copy it 
for each row, we avoid retrieving the `nameToPos` map from the cache every time.
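
As a rough illustration of the template-and-copy idea, here is a minimal, self-contained sketch. `SketchRecord`, its `CACHE`, and the `CACHE_LOOKUPS` counter are hypothetical stand-ins invented for this example; they are not Iceberg's `GenericRecord` or its Caffeine cache, and a plain `ConcurrentHashMap` is used in place of the real cache. The sketch only shows that `create()` pays for a cache lookup on each call, while `copy()` reuses the map reference already held by the template.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified record type for illustration; not Iceberg's GenericRecord.
final class SketchRecord {
  // Assumed stand-in for the shared cache of name-to-position maps.
  private static final Map<List<String>, Map<String, Integer>> CACHE = new ConcurrentHashMap<>();
  // Counts cache lookups so the difference between create() and copy() is visible.
  static final AtomicInteger CACHE_LOOKUPS = new AtomicInteger();

  private final Map<String, Integer> nameToPos;
  private final Object[] values;

  private SketchRecord(Map<String, Integer> nameToPos, Object[] values) {
    this.nameToPos = nameToPos;
    this.values = values;
  }

  // Every call goes through the cache to find the name-to-position map.
  static SketchRecord create(List<String> fieldNames) {
    CACHE_LOOKUPS.incrementAndGet();
    Map<String, Integer> index = CACHE.computeIfAbsent(fieldNames, SketchRecord::buildIndex);
    return new SketchRecord(index, new Object[fieldNames.size()]);
  }

  // Shares the template's map reference and clones its values; no cache lookup.
  SketchRecord copy() {
    return new SketchRecord(nameToPos, values.clone());
  }

  void set(String name, Object value) {
    values[position(name)] = value;
  }

  Object get(String name) {
    return values[position(name)];
  }

  private int position(String name) {
    Integer pos = nameToPos.get(name);
    if (pos == null) {
      throw new IllegalArgumentException("Unknown field: " + name);
    }
    return pos;
  }

  private static Map<String, Integer> buildIndex(List<String> fieldNames) {
    Map<String, Integer> index = new HashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      index.put(fieldNames.get(i), i);
    }
    return index;
  }
}

final class TemplateRecordDemo {
  public static void main(String[] args) {
    // One lookup when the reader is set up ...
    SketchRecord template = SketchRecord.create(List.of("id", "data"));
    // ... and none per row, because each row is a copy of the template.
    for (long row = 0; row < 3; row++) {
      SketchRecord rec = template.copy();
      rec.set("id", row);
      rec.set("data", "row-" + row);
    }
    System.out.println("cache lookups: " + SketchRecord.CACHE_LOOKUPS.get());
  }
}
```

In this sketch the per-row cost is an array clone plus an object allocation, and each copy has its own value array, so rows do not overwrite each other the way a reused container would.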
   
   I created some basic JMH performance tests; here are the results:
   ```
   benchmark-result.txt.orc.base:  mean =      2.571 ±(99.9%) 0.172 s/op
   benchmark-result.txt.orc.fix:  mean =      2.297 ±(99.9%) 0.129 s/op
   benchmark-result.txt.parquet.base:  mean =      2.649 ±(99.9%) 0.136 s/op
   benchmark-result.txt.parquet.fix:  mean =      2.209 ±(99.9%) 0.093 s/op
   ```
   
   The change reduces time per operation by roughly 11% for ORC and 17% for 
Parquet with the generic readers.
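
For reference, the relative improvement can be recomputed from the reported means as (base - fix) / base. The small helper below is only a sketch of that arithmetic; `BenchmarkDelta` and `reductionPercent` are names made up for this example, and the inputs are the mean values from the table above, with the error margins ignored.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the pull request.
final class BenchmarkDelta {
  // Percentage reduction in time per operation going from base to fix.
  static double reductionPercent(double baseSecondsPerOp, double fixSecondsPerOp) {
    return (baseSecondsPerOp - fixSecondsPerOp) / baseSecondsPerOp * 100.0;
  }

  public static void main(String[] args) {
    // Means taken from the benchmark results quoted in the description.
    System.out.printf(Locale.ROOT, "orc: %.1f%%%n", reductionPercent(2.571, 2.297));
    System.out.printf(Locale.ROOT, "parquet: %.1f%%%n", reductionPercent(2.649, 2.209));
  }
}
```

With these inputs the reduction works out to about 10.7% for ORC and about 16.6% for Parquet.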

