[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126110#comment-13126110
 ] 

Marc Celani commented on ZOOKEEPER-1215:
----------------------------------------

Thanks for your comments everyone. It sounds like the feedback is that the 
cache API is too invasive, and the use case too narrow to warrant the big 
changes. Below, I've included two alternative changes that we could make that 
are less invasive, and allow for a broader range of use cases.  If you think 
these are acceptable, I can open a new JIRA and abandon this one.

// Returns last_zxid
long long zoo_get_last_zxid(zhandle_t *zh);

// Adds the watches to the internal hashtables.
// When connected, internal logic will send the watches, as if we were
// handling a reconnect.
// paths_to_watch: list of paths we want to watch
// watch_type: list of watch types
// num_of_paths: length of the previous two arrays
// last_zxid: the last known zxid, which will be used to fire watches that
// would have fired between last_zxid and the true current zxid
zhandle_t *zookeeper_init2(const char *host, watcher_fn watcher,
        int recv_timeout, const clientid_t *clientid, void *context,
        int flags, char **paths_to_watch, int *watch_type, int num_of_paths,
        long long last_zxid);

If we persist our last seen zxid and watch list, we can treat a restart as if 
it were a prolonged disconnected state. Assuming that the client has a large 
set of data that does not change often, the client can persist that data 
locally and reduce traffic to the server. The client can build its cache on 
top of this API, and the changes are less invasive.
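
As a rough sketch of the client-side half of this idea, the code below saves a last seen zxid and a watch list to a file and loads them back. Everything in it is hypothetical: saved_state_t, save_state, load_state, free_state and the text file layout are illustrative names and choices, not part of the ZooKeeper C client. It also assumes that paths contain no whitespace and are shorter than 1024 bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot of the state a client would persist across restarts.
 * The fields mirror the extra arguments of the proposed zookeeper_init2(). */
typedef struct {
    long long last_zxid;
    int num_of_paths;
    char **paths_to_watch;
    int *watch_type;
} saved_state_t;

/* Releases everything owned by the snapshot and resets it to empty. */
void free_state(saved_state_t *s) {
    if (s->paths_to_watch)
        for (int i = 0; i < s->num_of_paths; i++)
            free(s->paths_to_watch[i]);
    free(s->paths_to_watch);
    free(s->watch_type);
    memset(s, 0, sizeof *s);
}

/* Writes a header line "<zxid> <count>", then one "<type> <path>" line per
 * watch. Returns 0 on success, -1 on any I/O error. */
int save_state(const char *file, const saved_state_t *s) {
    FILE *f = fopen(file, "w");
    if (!f) return -1;
    int ok = fprintf(f, "%lld %d\n", s->last_zxid, s->num_of_paths) > 0;
    for (int i = 0; ok && i < s->num_of_paths; i++)
        ok = fprintf(f, "%d %s\n", s->watch_type[i], s->paths_to_watch[i]) > 0;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Reads a snapshot written by save_state(). Returns 0 on success; on failure
 * returns -1 and leaves *s empty. */
int load_state(const char *file, saved_state_t *s) {
    memset(s, 0, sizeof *s);
    FILE *f = fopen(file, "r");
    if (!f) return -1;
    long long zxid;
    int n;
    if (fscanf(f, "%lld %d", &zxid, &n) != 2 || n < 0) {
        fclose(f);
        return -1;
    }
    s->last_zxid = zxid;
    /* Allocate n + 1 slots so that an empty watch list still allocates. */
    s->paths_to_watch = calloc((size_t)n + 1, sizeof *s->paths_to_watch);
    s->watch_type = calloc((size_t)n + 1, sizeof *s->watch_type);
    int ok = s->paths_to_watch && s->watch_type;
    for (int i = 0; ok && i < n; i++) {
        char path[1024];
        ok = fscanf(f, "%d %1023s", &s->watch_type[i], path) == 2;
        if (ok) {
            size_t len = strlen(path) + 1;
            s->paths_to_watch[i] = malloc(len);
            ok = s->paths_to_watch[i] != NULL;
            if (ok) {
                memcpy(s->paths_to_watch[i], path, len);
                s->num_of_paths = i + 1;
            }
        }
    }
    fclose(f);
    if (!ok) {
        free_state(s);
        return -1;
    }
    return 0;
}
```

On restart, a client could call load_state() and hand last_zxid, paths_to_watch, watch_type and num_of_paths to the proposed zookeeper_init2(). If the file is missing or unreadable, it would fall back to a normal init with no saved watches.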
                
> C client persisted cache
> ------------------------
>
>                 Key: ZOOKEEPER-1215
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1215
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: c client
>            Reporter: Marc Celani
>            Assignee: Marc Celani
>
> Motivation:
> 1.  Reduce the impact of client restarts on zookeeper by implementing a 
> persisted cache, and only fetching deltas on restart
> 2.  Reduce unnecessary calls to zookeeper.
> 3.  Improve performance of gets by caching on the client
> 4.  Allow for larger caches than in memory caches.
> Behavior Change:
> Zookeeper clients will now have the option to specify a folder path where 
> they can cache zookeeper gets.  If they do choose to cache results, the zookeeper 
> library will check the persisted cache before actually sending a request to 
> zookeeper.  Watches will automatically be placed on all gets in order to 
> invalidate the cache.  Alternatively, we can add a cache flag to the get API 
> - thoughts?  On reconnect or restart, zookeeper clients will check the 
> version number of each entry in their persisted cache, and will invalidate 
> any old entries.  In checking version numbers, zookeeper clients will also 
> place a watch on those files.  With regard to watches, client watch handlers 
> will not fire until the invalidation step is completed, which may slow down 
> client watch handling. Since setting up watches on all files is necessary on 
> initialization, initialization will likely slow down as well.
> API Change:
> The zookeeper library will expose a new init interface that specifies a 
> folder path to the cache.  A new get API will specify whether or not to use 
> cache, and whether or not stale data is safe to return if the connection is 
> down.
> Design:
> The zookeeper handler structure will now include a cache_root_path (possibly 
> null) string to cache all gets, as well as a bool for whether or not it is 
> okay to serve stale data.  Old API calls will default to a null path (which 
> signifies no cache), and signify that it is not okay to serve stale data.
> The cache will be located at a cache_root_path.  All files will be placed at 
> cache_root_path/file_path.  The cache will be an incomplete copy of 
> everything that is in zookeeper, but everything in the cache will have the 
> same relative path from the cache_root_path that it has as a path in 
> zookeeper.  Each file in the cache will include the Stat structure and the 
> file contents.
> zoo_get will check the zookeeper handler to determine whether or not it has a 
> cache.  If it does, it will first go to the path to the persisted cache and 
> append the get path.  If the file exists and it is not invalidated, the 
> zookeeper client will read it and return its value.  If the file does not 
> exist or is invalidated, the zookeeper library will perform the same get as 
> is currently designed.  After getting the results, the library will place the 
> value in the persisted cache for subsequent reads.  zoo_set will 
> automatically invalidate the path in the cache.
> If caching is requested, then on each zoo_get that goes through to zookeeper, 
> a watch will be placed on the path. A cache watch handler will handle all 
> watch events by invalidating the cache, and placing another watch on it.  
> Client watch handlers will handle the watch event after the cache watch 
> handler.  The cache watch handler will not call zoo_get, because it is 
> assumed that the client watch handlers will call zoo_get if they need the 
> fresh data as soon as it is invalidated (which is why the cache watch handler 
> must be executed first).
> All updates to the cache will be done on a separate thread, but will be 
> queued in order to maintain consistency in the cache.  In addition, no 
> client watch handler will be fired until the cache watch handler 
> completes its invalidation write in order to ensure that client calls to 
> zoo_get in the watch event handler are done after the invalidation step.  
> This means that a client watch handler could be waiting on SEVERAL writes 
> before it can be fired off, since all writes are queued.
> When a new connection is made, if a zookeeper handler has a cache, then that 
> cache will be scanned in order to find all leaf nodes.  Calls will be made to 
> zookeeper to check if all of these nodes still exist, and if they do, what 
> their version number is.  Any inconsistencies in version will result in the 
> cache invalidating the out of date files.  Any files that no longer exist 
> will be deleted from the cache.
>  
> If a connection fails, and a zoo_get call is made on a zookeeper handler that 
> has a cache associated with it, and that cache tolerates stale data, then the 
> stale data will be returned from cache - otherwise, all zoo_gets will error 
> out as they do today.
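
The path mapping in the quoted design (each entry lives at cache_root_path/file_path) could be sketched roughly as follows. The helper name cache_file_path and its error convention are assumptions made for illustration. It does not address how a znode that has both data and children would be stored.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins cache_root_path and an absolute znode path into
 * the location of the cached file. Returns 0 on success, or -1 if the znode
 * path is not absolute or the output buffer is too small. */
int cache_file_path(const char *cache_root_path, const char *znode_path,
                    char *out, size_t out_size) {
    if (!cache_root_path || !znode_path || znode_path[0] != '/')
        return -1;
    size_t root_len = strlen(cache_root_path);
    /* Ignore one trailing slash on the root so the join has a single '/'. */
    if (root_len > 0 && cache_root_path[root_len - 1] == '/')
        root_len--;
    int n = snprintf(out, out_size, "%.*s%s",
                     (int)root_len, cache_root_path, znode_path);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With this mapping, a get for /app/config under a cache root of /var/cache/zk would look for the file /var/cache/zk/app/config before going to the server.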


        
