[ https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151001#comment-16151001 ]
ASF GitHub Bot commented on ZOOKEEPER-1416: ------------------------------------------- Github user skamille commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/136#discussion_r136637514 --- Diff: src/java/main/org/apache/zookeeper/server/WatchManager.java --- @@ -40,50 +40,110 @@ class WatchManager { private static final Logger LOG = LoggerFactory.getLogger(WatchManager.class); - private final HashMap<String, HashSet<Watcher>> watchTable = - new HashMap<String, HashSet<Watcher>>(); + enum Type { + STANDARD() { + @Override + boolean isPersistent() { + return false; + } + + @Override + boolean isRecursive() { + return false; + } + }, + PERSISTENT() { + @Override + boolean isPersistent() { + return true; + } + + @Override + boolean isRecursive() { + return false; + } + }, + PERSISTENT_RECURSIVE() { + @Override + boolean isPersistent() { + return true; + } + + @Override + boolean isRecursive() { + return true; + } + } + ; + + abstract boolean isPersistent(); + abstract boolean isRecursive(); + } + + private final Map<String, Map<Watcher, Type>> watchTable = + new HashMap<>(); + + private final Map<Watcher, Set<String>> watch2Paths = + new HashMap<>(); - private final HashMap<Watcher, HashSet<String>> watch2Paths = - new HashMap<Watcher, HashSet<String>>(); + private int recursiveWatchQty = 0; // guarded by sync + + // visible for testing + synchronized int getRecursiveWatchQty() { + return recursiveWatchQty; + } synchronized int size(){ int result = 0; - for(Set<Watcher> watches : watchTable.values()) { + for(Map<Watcher, Type> watches : watchTable.values()) { result += watches.size(); } return result; } - synchronized void addWatch(String path, Watcher watcher) { - HashSet<Watcher> list = watchTable.get(path); + synchronized void addWatch(String path, Watcher watcher, WatchManager.Type type) { + Map<Watcher, Type> list = watchTable.get(path); if (list == null) { // don't waste memory if there are few watches on a node // rehash when the 4th entry is added, doubling size thereafter // seems like a good compromise - list = new HashSet<Watcher>(4); + list = new HashMap<>(4); --- End diff -- Does the assumption still hold re: memory management now that we have a hashmap instead of a hashset? > Persistent Recursive Watch > -------------------------- > > Key: ZOOKEEPER-1416 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416 > Project: ZooKeeper > Issue Type: Improvement > Components: c client, documentation, java client, server > Reporter: Phillip Liu > Assignee: Jordan Zimmerman > Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch > > Original Estimate: 504h > Remaining Estimate: 504h > > h4. The Problem > A ZooKeeper Watch can be placed on a single znode and when the znode changes > a Watch event is sent to the client. If there are thousands of znodes being > watched, when a client (re)connect, it would have to send thousands of watch > requests. At Facebook, we have this problem storing information for thousands > of db shards. Consequently a naming service that consumes the db shard > definition issues thousands of watch requests each time the service starts > and changes client watcher. > h4. Proposed Solution > We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent > means no Watch reset is necessary after a watch-fire. Recursive means the > Watch applies to the node and descendant nodes. A Persistent Recursive Watch > behaves as follows: > # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS. > # CHILDREN and DATA Recursive Watches can be placed on any znode. > # EXISTS Recursive Watches can be placed on any path. > # A Recursive Watch behaves like a auto-watch registrar on the server side. > Setting a Recursive Watch means to set watches on all descendant znodes. > # When a watch on a descendant fires, no subsequent event is fired until a > corresponding getData(..) on the znode is called, then Recursive Watch > automically apply the watch on the znode. This maintains the existing Watch > semantic on an individual znode. > # A Recursive Watch overrides any watches placed on a descendant znode. > Practically this means the Recursive Watch Watcher callback is the one > receiving the event and event is delivered exactly once. > A goal here is to reduce the number of semantic changes. The guarantee of no > intermediate watch event until data is read will be maintained. The only > difference is we will automatically re-add the watch after read. At the same > time we add the convience of reducing the need to add multiple watches for > sibling znodes and in turn reduce the number of watch messages sent from the > client to the server. > There are some implementation details that needs to be hashed out. Initial > thinking is to have the Recursive Watch create per-node watches. This will > cause a lot of watches to be created on the server side. Currently, each > watch is stored as a single bit in a bit set relative to a session - up to 3 > bits per client per znode. If there are 100m znodes with 100k clients, each > watching all nodes, then this strategy will consume approximately 3.75TB of > ram distributed across all Observers. Seems expensive. > Alternatively, a blacklist of paths to not send Watches regardless of Watch > setting can be set each time a watch event from a Recursive Watch is fired. > The memory utilization is relative to the number of outstanding reads and at > worst case it's 1/3 * 3.75TB using the parameters given above. > Otherwise, a relaxation of no intermediate watch event until read guarantee > is required. If the server can send watch events regardless of one has > already been fired without corresponding read, then the server can simply > fire watch events without tracking. -- This message was sent by Atlassian JIRA (v6.4.14#64029)