ViswaNXplore opened a new issue, #1289:
URL: https://github.com/apache/curator/issues/1289

   Hi Curator team,
   
   We are using:
   
   - Curator version: 5.6.0
   - ZooKeeper version: 3.8.4
   
   In one of our environments we observe roughly 50k znodes, of which about 
45k are empty directories. These numbers are even larger in our higher 
environments.
   
   During CuratorCache initialization we observe that getChildren calls are 
issued across the tree, including for these empty directories. Since 
getChildren is relatively latency-heavy, this significantly increases cache 
initialization time at this scale.
   
   In our use case, strict initial enumeration completeness is not required, 
because we rely on persistent recursive watchers and eventually receive create 
events for newly added nodes. Our internal cache can converge to the correct 
state through those events.
   
   While reviewing the initialization flow, we noticed that Stat is already 
retrieved for nodes. Since Stat contains numChildren, we were wondering whether 
it would be reasonable to optimize the initialization logic as follows:
   
   `if (stat.getNumChildren() == 0) { /* skip getChildren() */ }`
   
   This would avoid issuing getChildren() calls for nodes that are already 
known to have no children, which could significantly reduce initialization 
overhead in environments with a large number of empty directories.
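   
   To make the expected saving concrete, here is a minimal, self-contained 
sketch. It does not use Curator or ZooKeeper: `EmptyNodeSkipSketch`, `walk`, 
`build`, and the in-memory map are hypothetical stand-ins that only model a 
tree walk and count simulated getChildren round trips. The tree shape (5,000 
directories with 9 empty children each) is an assumption chosen to roughly 
match the 50k/45k figures above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a znode tree; this is NOT Curator code.
class EmptyNodeSkipSketch {
    // path -> child names; stands in for the ZooKeeper server state.
    private final Map<String, List<String>> children = new HashMap<>();
    // Number of simulated getChildren round trips in the last walk.
    int getChildrenCalls = 0;

    String addChild(String parent, String name) {
        String path = parent + "/" + name;
        children.put(path, new ArrayList<>());
        children.get(parent).add(name);
        return path;
    }

    // Stands in for Stat.getNumChildren(), which is already known per node.
    int numChildren(String path) {
        return children.get(path).size();
    }

    // Stands in for a getChildren() call; counted so runs can be compared.
    List<String> getChildren(String path) {
        getChildrenCalls++;
        return children.get(path);
    }

    // Walks the tree from root and returns the number of nodes visited.
    // With skipEmpty, nodes whose numChildren is 0 are not listed at all.
    int walk(String root, boolean skipEmpty) {
        getChildrenCalls = 0;
        int visited = 0;
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String path = pending.pop();
            visited++;
            if (skipEmpty && numChildren(path) == 0) {
                continue; // proposed optimization: no getChildren for empty nodes
            }
            for (String child : getChildren(path)) {
                pending.push(path + "/" + child);
            }
        }
        return visited;
    }

    // Builds /root with `dirs` directories, each holding `emptyPerDir` empty nodes.
    static EmptyNodeSkipSketch build(int dirs, int emptyPerDir) {
        EmptyNodeSkipSketch tree = new EmptyNodeSkipSketch();
        tree.children.put("/root", new ArrayList<>());
        for (int d = 0; d < dirs; d++) {
            String dir = tree.addChild("/root", "dir" + d);
            for (int e = 0; e < emptyPerDir; e++) {
                tree.addChild(dir, "empty" + e);
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        EmptyNodeSkipSketch tree = build(5000, 9);
        int visited = tree.walk("/root", false);
        System.out.println("no skip: visited=" + visited + " calls=" + tree.getChildrenCalls);
        visited = tree.walk("/root", true);
        System.out.println("skip:    visited=" + visited + " calls=" + tree.getChildrenCalls);
    }
}
```

   In this model both walks visit the same set of nodes, but the skipping walk 
only lists nodes that report at least one child. Whether the real 
initialization path can rely on numChildren in the same way is exactly the 
question we are asking.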
   
   Our assumption is that correctness would still be preserved because:
   - persistent recursive watchers would capture future child creation events
   - the cache would eventually converge to the correct state
   
   Conceptually the idea is to reduce initialization overhead for environments 
with a large number of empty directories, while still allowing the cache to 
converge to the correct state through watcher events.
   
   Would this optimization be compatible with the intended semantics of 
CuratorCache initialization, or is there a reason getChildren() must always be 
executed even when stat.getNumChildren() == 0?  
   Or is there a recommended way to optimize initialization behavior for this 
type of large namespace with many empty nodes?
   
   Thanks in advance for any guidance.
   
   Best regards,
   Viswanathan


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
