ahmedabu98 commented on PR #38149:
URL: https://github.com/apache/beam/pull/38149#issuecomment-4253764009

   I understand wanting to prevent a situation where closed thread-level 
Catalogs lead to dead Tables in a VM cache. 
   
   I'm a little worried about moving to a per-thread Table cache though. There 
will be a lot of "get table" calls at the beginning of a job's lifetime, which 
can max out quota pretty quickly (we’ve seen that with users already). It gets 
worse on some streaming runners like Dataflow, where (# threads) >> 
(# vCPUs).
   
   What if we move in the opposite direction and have a static cache of 
Catalogs? So all threads in a VM share one Catalog instance and perform all 
Table operations through it? Realistically, only one thread will need to create 
the Catalog. That same thread will likely create/load the Table and also store 
it in a static cache.
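
   A rough sketch of the idea, to make the proposal concrete. It assumes a 
process-wide `ConcurrentHashMap` keyed by catalog name and table identifier. 
`SharedCatalogCache` and `FakeCatalog` are hypothetical stand-ins, not Beam or 
Iceberg classes, and the string "table" is only a placeholder for a real Table 
handle:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCatalogCache {
  /** Hypothetical stand-in for a catalog client; counts remote "get table" calls. */
  public static final class FakeCatalog {
    public final String name;
    public final AtomicInteger loadCalls = new AtomicInteger();

    FakeCatalog(String name) {
      this.name = name;
    }

    String loadTable(String tableId) {
      loadCalls.incrementAndGet(); // each call here would cost quota in a real catalog
      return name + "." + tableId;
    }
  }

  // Static caches: shared by every thread in the JVM.
  private static final Map<String, FakeCatalog> CATALOGS = new ConcurrentHashMap<>();
  private static final Map<String, String> TABLES = new ConcurrentHashMap<>();

  /** Lazily creates the catalog on first use; later callers reuse the same instance. */
  public static FakeCatalog catalog(String catalogName) {
    return CATALOGS.computeIfAbsent(catalogName, FakeCatalog::new);
  }

  /** Loads the table once through the shared catalog, then serves it from the cache. */
  public static String table(String catalogName, String tableId) {
    return TABLES.computeIfAbsent(
        catalogName + "/" + tableId, key -> catalog(catalogName).loadTable(tableId));
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] workers = new Thread[8];
    for (int i = 0; i < workers.length; i++) {
      workers[i] = new Thread(() -> table("demo", "db.events"));
      workers[i].start();
    }
    for (Thread worker : workers) {
      worker.join();
    }
    // All eight threads share one catalog and trigger a single load.
    System.out.println("get table calls: " + catalog("demo").loadCalls.get());
  }
}
```

   `computeIfAbsent` runs the mapping function at most once per key, so only 
the first thread to ask pays for catalog creation and the table load. The sketch 
leaves out closing the shared Catalog and refreshing or evicting stale Tables, 
which a real change would still need to handle.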
   
   Thoughts on this approach? It would also improve our current Catalog 
management (which, I’m noticing, eagerly creates a Catalog per thread 
whether or not it actually gets used).

