Hey Daniel,

I am working on reducing the number of GET calls for small files.
Currently we issue 3 GETs per file (2 for the footer + 1 for the actual
data), but 1 GET is enough when the file is small (we expect small files
to be <= 1 MB): fetch the whole file at once, then serve both the footer
and the data reads from that buffer.
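
To make the idea concrete, here is a rough, language-neutral sketch in Python. It is not the code in the PR. It relies only on the Parquet file layout (leading "PAR1" magic, then data, then the footer, then a 4-byte little-endian footer length and a trailing "PAR1"). CountingStore, make_fake_file, read_three_gets and read_one_get are hypothetical names made up for this illustration, the footer is treated as opaque bytes rather than Thrift metadata, and the data read is simplified to a single range.

```python
import struct

MAGIC = b"PAR1"
SMALL_FILE_LIMIT = 1024 * 1024  # assumption: the 1 MB small-file threshold


class CountingStore:
    """Hypothetical in-memory stand-in for an object store; counts GET calls."""

    def __init__(self, blob: bytes):
        self.blob = blob
        self.gets = 0

    def get_range(self, start: int, end: int) -> bytes:
        self.gets += 1
        return self.blob[start:end]


def make_fake_file(data: bytes, footer: bytes) -> bytes:
    """Build a blob with the Parquet framing; the footer is opaque bytes here."""
    return MAGIC + data + footer + struct.pack("<I", len(footer)) + MAGIC


def read_three_gets(store: CountingStore, size: int):
    """Footer length, footer, and data are each fetched with their own GET."""
    tail = store.get_range(size - 8, size)            # GET 1: length + magic
    footer_len = struct.unpack("<I", tail[:4])[0]
    footer_start = size - 8 - footer_len
    footer = store.get_range(footer_start, size - 8)  # GET 2: footer bytes
    data = store.get_range(4, footer_start)           # GET 3: data bytes
    return footer, data


def read_one_get(store: CountingStore, size: int):
    """Fetch the whole small file once, then slice footer and data locally."""
    if size > SMALL_FILE_LIMIT:
        raise ValueError("file too large for the single-GET path")
    buf = store.get_range(0, size)                    # the only GET
    if buf[-4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    footer_len = struct.unpack("<I", buf[-8:-4])[0]
    footer_start = size - 8 - footer_len
    return buf[footer_start:size - 8], buf[4:footer_start]
```

Both functions return the same (footer, data) pair for the same file; only the number of round trips to the store differs, which is where the saving comes from when per-request latency dominates.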

I have implemented an approach that cuts runtime by roughly two-thirds
(537s to 169s) in JMH benchmarks on 1000 files (not partitioned) with
20,000,000 rows in total.
PR: https://github.com/apache/iceberg/pull/16729
I've started a new thread at
https://lists.apache.org/thread/yb8nom3w2zplb703m0p052kcc1wwotrr connecting
this to the parquet-mr discussion (arrow-rs already exposes footer size
hints that parquet-mr doesn't).
I would appreciate your thoughts there.
-- 
Lakhyani Varun
Indian Institute of Technology Roorkee
Contact: +91 96246 46174
