[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179190#comment-16179190
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1416:
-------------------------------------------

Github user Randgalt commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/136#discussion_r140815418
  
    --- Diff: src/java/main/org/apache/zookeeper/server/WatchManager.java ---
    @@ -40,50 +40,110 @@
     class WatchManager {
         private static final Logger LOG =
             LoggerFactory.getLogger(WatchManager.class);
     
    -    private final HashMap<String, HashSet<Watcher>> watchTable =
    -        new HashMap<String, HashSet<Watcher>>();
    +    enum Type {
    +        STANDARD() {
    +            @Override
    +            boolean isPersistent() {
    +                return false;
    +            }
    +
    +            @Override
    +            boolean isRecursive() {
    +                return false;
    +            }
    +        },
    +        PERSISTENT() {
    +            @Override
    +            boolean isPersistent() {
    +                return true;
    +            }
    +
    +            @Override
    +            boolean isRecursive() {
    +                return false;
    +            }
    +        },
    +        PERSISTENT_RECURSIVE() {
    +            @Override
    +            boolean isPersistent() {
    +                return true;
    +            }
    +
    +            @Override
    +            boolean isRecursive() {
    +                return true;
    +            }
    +        }
    +        ;
    +
    +        abstract boolean isPersistent();
    +        abstract boolean isRecursive();
    +    }
    +
    +    private final Map<String, Map<Watcher, Type>> watchTable =
    +        new HashMap<>();
    +
    +    private final Map<Watcher, Set<String>> watch2Paths =
    +        new HashMap<>();
     
    -    private final HashMap<Watcher, HashSet<String>> watch2Paths =
    -        new HashMap<Watcher, HashSet<String>>();
    +    private int recursiveWatchQty = 0;    // guarded by sync
    +
    +    // visible for testing
    +    synchronized int getRecursiveWatchQty() {
    +        return recursiveWatchQty;
    +    }
     
         synchronized int size(){
             int result = 0;
    -        for(Set<Watcher> watches : watchTable.values()) {
    +        for(Map<Watcher, Type> watches : watchTable.values()) {
                 result += watches.size();
             }
             return result;
         }
     
    -    synchronized void addWatch(String path, Watcher watcher) {
    -        HashSet<Watcher> list = watchTable.get(path);
    +    synchronized void addWatch(String path, Watcher watcher, WatchManager.Type type) {
    +        Map<Watcher, Type> list = watchTable.get(path);
             if (list == null) {
                 // don't waste memory if there are few watches on a node
                 // rehash when the 4th entry is added, doubling size thereafter
                 // seems like a good compromise
    -            list = new HashSet<Watcher>(4);
    +            list = new HashMap<>(4);
    --- End diff --
    
    Good point - this was just cargo-cult coding on my part. However,
    internally `HashSet` uses a `HashMap`, so the implementation is the same.

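    For what it's worth, a minimal sketch of that point (not the actual JDK
    source, just its well-known shape): `HashSet` delegates to a backing
    `HashMap` whose values are a shared sentinel object, so `HashSet<Watcher>`
    and `HashMap<Watcher, Type>` carry essentially the same table overhead.
    
    ```java
    import java.util.HashMap;
    
    // Simplified shape of java.util.HashSet: a HashMap whose values are
    // all the same dummy object.
    class SimplifiedHashSet<E> {
        private static final Object PRESENT = new Object();
        private final HashMap<E, Object> map = new HashMap<>(4);
    
        boolean add(E e)      { return map.put(e, PRESENT) == null; }
        boolean contains(E e) { return map.containsKey(e); }
    }
    ```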

> Persistent Recursive Watch
> --------------------------
>
>                 Key: ZOOKEEPER-1416
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: c client, documentation, java client, server
>            Reporter: Phillip Liu
>            Assignee: Jordan Zimmerman
>         Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode, and when the znode changes 
> a Watch event is sent to the client. If there are thousands of znodes being 
> watched, then when a client (re)connects it has to send thousands of watch 
> requests. At Facebook, we have this problem storing information for thousands 
> of db shards. Consequently, a naming service that consumes the db shard 
> definitions issues thousands of watch requests each time the service starts 
> or changes its client watcher.
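> As a rough sketch of the reconnect cost today (illustrative only; the 
> paths list and watcher here are hypothetical):
> {code:java}
> import java.util.List;
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.Watcher;
> import org.apache.zookeeper.ZooKeeper;
>
> class WatchReRegistration {
>     // Re-arm a one-shot data watch on every path of interest. With
>     // thousands of paths this costs thousands of requests per (re)connect.
>     static void reRegister(ZooKeeper zk, List<String> paths, Watcher watcher)
>             throws KeeperException, InterruptedException {
>         for (String path : paths) {
>             zk.getData(path, watcher, null); // one-shot: must be reset after it fires
>         }
>     }
> }
> {code}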
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent 
> means no Watch reset is necessary after a watch fires. Recursive means the 
> Watch applies to the node and its descendant nodes. A Persistent Recursive 
> Watch behaves as follows (a usage sketch follows the list):
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. 
> Setting a Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a 
> corresponding getData(..) on the znode is called; the Recursive Watch then 
> automatically re-applies the watch on the znode. This maintains the existing 
> Watch semantics on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. 
> Practically this means the Recursive Watch's Watcher callback is the one 
> receiving the event, and the event is delivered exactly once.
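> As a usage sketch, a client call under this proposal might look like the 
> following; the addWatch method and the PERSISTENT_RECURSIVE constant are 
> hypothetical names for illustration, echoing the WatchManager.Type values 
> in the patch under review:
> {code:java}
> import org.apache.zookeeper.ZooKeeper;
>
> class RecursiveWatchExample {
>     // Hypothetical API: one registration covers /shards and every
>     // descendant, and it persists across watch-fires and reconnects.
>     static void watchShards(ZooKeeper zk) throws Exception {
>         zk.addWatch("/shards", event ->
>             System.out.println(event.getType() + " " + event.getPath()),
>             PERSISTENT_RECURSIVE); // hypothetical mode constant
>     }
> }
> {code}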
> A goal here is to reduce the number of semantic changes. The guarantee of no 
> intermediate watch event until data is read will be maintained. The only 
> difference is that we will automatically re-add the watch after a read. At 
> the same time we add the convenience of reducing the need to add multiple 
> watches for sibling znodes, in turn reducing the number of watch messages 
> sent from the client to the server.
> There are some implementation details that need to be hashed out. Initial 
> thinking is to have the Recursive Watch create per-node watches. This will 
> cause a lot of watches to be created on the server side. Currently, each 
> watch is stored as a single bit in a bit set relative to a session - up to 3 
> bits per client per znode. If there are 100m znodes with 100k clients, each 
> watching all nodes, then this strategy will consume approximately 3.75TB of 
> RAM distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to which no Watches are sent, regardless 
> of Watch settings, can be updated each time a watch event from a Recursive 
> Watch is fired. The memory utilization is relative to the number of 
> outstanding reads, and in the worst case it is 1/3 * 3.75TB using the 
> parameters given above.
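> For concreteness, the arithmetic behind both estimates (decimal units):
> {code:java}
> class WatchMemoryEstimate {
>     public static void main(String[] args) {
>         long znodes = 100_000_000L; // 100m znodes
>         long clients = 100_000L;    // 100k clients, each watching all nodes
>         long bitsPerClientPerZnode = 3;
>
>         // Per-node watches: 100m * 100k * 3 bits ~= 3.75TB
>         double perNodeTB = znodes * clients * bitsPerClientPerZnode / 8.0 / 1e12;
>         System.out.printf("per-node watches: %.2f TB%n", perNodeTB);
>
>         // Blacklist alternative: roughly one bit instead of three per
>         // client per znode, so the worst case is a third of the above.
>         System.out.printf("blacklist worst case: %.2f TB%n", perNodeTB / 3.0);
>     }
> }
> {code}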
> Otherwise, a relaxation of the no-intermediate-watch-event-until-read 
> guarantee is required. If the server can send watch events regardless of 
> whether one has already been fired without a corresponding read, then the 
> server can simply fire watch events without tracking.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
