AbgarSim commented on PR #37145:
URL: https://github.com/apache/beam/pull/37145#issuecomment-3684296417

   > I think I've reviewed a similar PR before, but I can't recall which one. 
This seems useful, but note that these changes only introduce a distinct cache 
per instance of `BigQueryWrapper`. It's worth checking where those instances 
are created throughout the project and whether they're shared across replicas 
of a DoFn or not.
   > 
   > If every replica of a DoFn creates their own cache (e.g., deserialized 
from its ParDo payload, created during the setup phase of its lifecycle), then 
the effects of these changes are somewhat limited. If that's the case then a 
class method with caching behaviour could improve that, but may require some 
changes to the cache key to avoid clashes between distinct (by value, not by 
reference) BigQuery clients.
   
   Hi @sjvanrossum thanks for the review!
   Checked, and seems that BigQueryWrapper is indeed instantiated multiple 
times, as such moved the `TTLCache` instance to class level, to allow for a 
shared cache between instances, also added 
`test_get_table_shared_cache_across_wrapper_instances` to test for this 
particular scenario.
   
   I also I've overwritten the TTLCache in a private class `_NonNoneTTLCache` 
to cover for a corner case: when table is created and the `get_or_create_table` 
is called, there is sometimes a delay when get_table returns None and then 
right away will return the new table, as such caching None value should be 
explicitly disabled here. I'm not as experience with python ecosystem and 
coding conventions so please advice if this is a good place to define this 
class.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to