gaborkaszab commented on PR #16319: URL: https://github.com/apache/iceberg/pull/16319#issuecomment-4544021822
Thanks for the explanation, @yadavay-amzn! I came to the same understanding of the two use cases on my first pass through the code. I'm not sure I could argue for keeping ETags in two different places, or explain the use case for maintaining freshness-aware loading both at the table-cache level and at the ops level.

If we take a step back and ignore how this is currently implemented, a user might expect that, once the full metadata has been downloaded from the REST server, the client won't do another full table load unless the table has changed in the meantime. With this design we might load the changed table twice, once for ops and once for the table cache. For example:

1) We load the table to populate the cache.
2) The table changes after this.
3) We call ops `refresh()`, which does a full table load.
4) We then call `catalog.load()`, which does the same full table load and gets the same ETag as in step 3.

Users might rightfully say we shouldn't do a full load in step 4, because we already loaded the latest table to the client in step 3.
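
To make the four steps concrete, here is a rough simulation rather than Iceberg code. `FakeRestServer`, `EtagSlot`, and `run_scenario` are hypothetical names invented for this sketch, and the server is assumed to answer 304 whenever the `If-None-Match` value equals the current ETag. It only counts how many full loads the server has to serve when the ETag is tracked in two places versus one.

```python
class FakeRestServer:
    """Hypothetical stand-in for a REST catalog serving a single table."""

    def __init__(self):
        self.version = 1
        self.full_loads = 0  # number of 200 responses carrying full metadata

    def commit(self):
        # The table changes, so its ETag changes as well.
        self.version += 1

    def load_table(self, if_none_match=None):
        etag = f"v{self.version}"
        if if_none_match == etag:
            return 304, etag, None  # not modified: no metadata sent
        self.full_loads += 1
        return 200, etag, {"metadata-version": self.version}


class EtagSlot:
    """One place on the client that remembers an ETag and its metadata."""

    def __init__(self):
        self.etag = None
        self.metadata = None

    def load(self, server):
        status, etag, metadata = server.load_table(if_none_match=self.etag)
        if status == 200:
            self.etag, self.metadata = etag, metadata
        return self.metadata


def run_scenario(shared):
    """Run steps 1-4 and return how many full loads the server served."""
    server = FakeRestServer()
    cache_slot = EtagSlot()

    cache_slot.load(server)  # step 1: populate the table cache

    if shared:
        ops_slot = cache_slot  # ops and cache see the same ETag
    else:
        # Assumption: ops starts from the step 1 response, then tracks
        # its ETag independently of the table cache.
        ops_slot = EtagSlot()
        ops_slot.etag = cache_slot.etag
        ops_slot.metadata = cache_slot.metadata

    server.commit()          # step 2: the table changes
    ops_slot.load(server)    # step 3: ops refresh, full load
    cache_slot.load(server)  # step 4: catalog load
    return server.full_loads
```

Under these assumptions, `run_scenario(shared=False)` counts a full load in both step 3 and step 4, while `run_scenario(shared=True)` gets a 304 in step 4 because the ETag from step 3 is already known.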
