AbgarSim commented on PR #37145: URL: https://github.com/apache/beam/pull/37145#issuecomment-3684296417
> I think I've reviewed a similar PR before, but I can't recall which one. This seems useful, but note that these changes only introduce a distinct cache per instance of `BigQueryWrapper`. It's worth checking where those instances are created throughout the project and whether they're shared across replicas of a DoFn or not. > > If every replica of a DoFn creates their own cache (e.g., deserialized from its ParDo payload, created during the setup phase of its lifecycle), then the effects of these changes are somewhat limited. If that's the case then a class method with caching behaviour could improve that, but may require some changes to the cache key to avoid clashes between distinct (by value, not by reference) BigQuery clients. Hi @sjvanrossum thanks for the review! Checked, and seems that BigQueryWrapper is indeed instantiated multiple times, as such moved the `TTLCache` instance to class level, to allow for a shared cache between instances, also added `test_get_table_shared_cache_across_wrapper_instances` to test for this particular scenario. I also I've overwritten the TTLCache in a private class `_NonNoneTTLCache` to cover for a corner case: when table is created and the `get_or_create_table` is called, there is sometimes a delay when get_table returns None and then right away will return the new table, as such caching None value should be explicitly disabled here. I'm not as experience with python ecosystem and coding conventions so please advice if this is a good place to define this class. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
