[ https://issues.apache.org/jira/browse/ZOOKEEPER-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872493#comment-13872493 ]
Patrick Hunt commented on ZOOKEEPER-153:
----------------------------------------

See also: ZOOKEEPER-1416

> add api support for "subscribe" method
> --------------------------------------
>
>                 Key: ZOOKEEPER-153
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-153
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: c client, documentation, java client, server, tests
>            Reporter: Patrick Hunt
>            Priority: Minor
>
> Subscribe Method
> (note, this was moved from
> http://zookeeper.wiki.sourceforge.net/SubscribeMethod)
> Outline of the semantics and the requirements of a yet-to-be-implemented
> subscribe() method.
> Background
> ZooKeeper uses a very lightweight one-time notification method for notifying
> interested clients of changes to ZooKeeper data nodes (znodes). Clients can
> set a watch on a node when they request information about a znode. The watch
> is atomically set and the data returned, so that any subsequent changes to
> the znode that affect the data returned will trigger a watch event. The watch
> stays in place until triggered or the client is disconnected from a ZooKeeper
> server. A disconnect watch event implicitly triggers all watches.
> ZooKeeper users have wondered if they can set permanent watches rather than
> one-time watches. In reality such permanent watches do not provide any extra
> benefit over one-time watches. Specifically, no data is included in a watch
> event, so the client still needs to do a query operation to get the data
> corresponding to a change; even then, the znode can change yet again after
> the event is received and before the client sends the query operation. Even
> the number of changes to a znode can be found using one-time watches and
> checking the mzxid in the stat structure of the znode. And the client will
> still miss events that happen when the client switches ZooKeeper servers.
> There are use cases that require clients to see every change to a ZooKeeper
> node.
> The most general case is when a client behaves like a state machine and
> each change to the znode changes the state of the client. In these cases
> ZooKeeper is much more like a publish/subscribe system than a distributed
> register. To support this case we need not only reliable permanent watches
> (we even get the events that happen while switching servers) but also the
> data that caused the change, so that the client doesn't miss data that occurs
> between rapid-fire changes.
> Semantics
> The subscribe(String path) call causes ZooKeeper to register a subscription
> for a znode. The initial value of the znode and any subsequent changes to that
> znode will cause a watch event with the data to be sent to the client. The
> client will see all changes in order. If a client switches servers, any
> missed events with the corresponding data will be sent to the client when the
> client reconnects to a server.
> There are three ways to cancel a subscription:
> 1. Calling unsubscribe(String path)
> 2. Closing the ZooKeeper session or letting it expire
> 3. Falling too far behind. If the server decides that a client is not
> processing the watch events fast enough, it will cancel the subscription and
> send a SUBSCRIPTION_CANCELLED watch event.
> Requirements
> There are a couple of things that make it hard to implement the subscribe()
> method:
> 1. Servers must have complete transaction logs - Currently ZooKeeper
> servers just need to have their data trees and in-flight transaction logs in
> sync. When a follower syncs to a leader, the leader can just blast down a new
> snapshot of its data tree; it does not need to send past transactions that
> the follower might have missed. However, in order to send changes that might
> have been missed by a client, the ZooKeeper server must be able to look into
> the past to send missed changes.
> 2.
> Servers must be able to send clients information about past changes -
> Currently ZooKeeper servers just send clients information about the current
> state of the system. However, to implement subscribe, servers must be able to
> go back into the log and send watches for past changes.
> Implementation Hints
> There are things that work in our favor. ZooKeeper does have a bound on the
> amount of time it needs to look into the past. A ZooKeeper server bounds the
> session expiration time. The server does not need to keep a record of
> transactions older than this bound.
> ZooKeeper also keeps a log of transactions. As long as the log is complete
> enough (has all the transactions back to the longest expiration time) the
> server has the information it needs and it isn't hard to process.
> We do not want to cause the log disk to seek while looking at past
> transactions. There are two complementary approaches to handling this
> problem: keep a few of the transactions from the recent past in memory and
> log to two disks. The first log disk will be synced before letting requests
> proceed and the second disk will not be synced. Recovery uses the first log
> disk and ensures that the second log disk has the same log at recovery time.
> The second log disk is used to look into the past. Using the two disks in
> this way allows synchronous logging to be fast because seeks are avoided on
> the disk with the synchronous log.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
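
For illustration only, the "keep a few of the transactions from the recent past in memory" hint and the replay-on-reconnect semantics could be sketched roughly as below. Nothing here is ZooKeeper API: RecentChangeLog, Change, Replay, append() and replaySince() are hypothetical names. The sketch assumes zxids are a simple consecutive counter. It also assumes that a client whose last seen zxid is older than the in-memory window is cancelled outright, whereas the proposal would let the server consult the second log disk first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch only; none of these names exist in ZooKeeper. */
final class RecentChangeLog {

    /** One committed change: its transaction id, the znode path, and the data written. */
    static final class Change {
        final long zxid;
        final String path;
        final String data;

        Change(long zxid, String path, String data) {
            this.zxid = zxid;
            this.path = path;
            this.data = data;
        }
    }

    /** Outcome of a replay request: the missed changes, or a cancellation. */
    static final class Replay {
        final boolean cancelled;    // stands in for a SUBSCRIPTION_CANCELLED event
        final List<Change> changes; // empty when cancelled

        Replay(boolean cancelled, List<Change> changes) {
            this.cancelled = cancelled;
            this.changes = changes;
        }
    }

    private final int capacity;
    private final Deque<Change> window = new ArrayDeque<>();
    private long lastZxid = 0; // assumption: zxids are consecutive, starting at 1

    RecentChangeLog(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records a change, evicting the oldest retained one when the window is full. */
    long append(String path, String data) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        lastZxid++;
        window.addLast(new Change(lastZxid, path, data));
        return lastZxid;
    }

    /**
     * Returns, in zxid order, the changes to path that are newer than lastSeenZxid.
     * If any transaction after lastSeenZxid has already been evicted, the
     * subscriber is conservatively treated as too far behind and cancelled.
     */
    Replay replaySince(String path, long lastSeenZxid) {
        long oldestRetained = window.isEmpty() ? lastZxid + 1 : window.peekFirst().zxid;
        if (lastSeenZxid + 1 < oldestRetained) {
            return new Replay(true, new ArrayList<>());
        }
        List<Change> missed = new ArrayList<>();
        for (Change c : window) {
            if (c.zxid > lastSeenZxid && c.path.equals(path)) {
                missed.add(c);
            }
        }
        return new Replay(false, missed);
    }
}
```

In this sketch a reconnecting client would pass the zxid of the last event it processed to replaySince(). The window capacity plays the role of the "bound on the amount of time it needs to look into the past", expressed as a transaction count rather than a session timeout.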