[ 
https://issues.apache.org/jira/browse/SOLR-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835757#comment-16835757
 ] 

Gus Heck edited comment on SOLR-13439 at 5/8/19 10:23 PM:
----------------------------------------------------------

Need was probably the wrong word... want :) would have been more correct. I 
prefer that since then it can be a single line and we don't have to do the 
cache lookup for an object that we are already handling.

As for consistency/frequency of access let's consider some cases (time in 
minutes):
 # Case 1:
 ## T+0 A watch is set, cache is populated from zk
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed exactly once.This is UNCHANGED vs current
 # Case 2:
 ## T+0 A watch is set, cache is populated from zk
 ## T+15 properties are accessed with getCollectionProperties() 
 ## T+30 the watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed exactly once.This is UNCHANGED vs current
 # Case 3:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+5 a watch is set, cache is already populated, zk is not accessed
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case the data access frequency for zk is once instead of twice. 
This LESS than current.
 # Case 4:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+10 the cache expires
 ## T+20 a watch is set, cache is cache is populated from zk
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed twice. This is UNCHANGED vs current. 
 # Case 5:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+1 a call to getCollectionProperties() is made, cache is already 
populated, zk is not accessed
 ## T+2 a call to getCollectionProperties() is made, cache is already 
populated, zk is not accessed
 ## T+12 the cache expires
 ## T+30 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+40 the cache expires
 ## In this case zookeeper is accessed twice instead of four times This is LESS 
than current.

I will grant you that it's hard to predict that when the load on zookeeper will 
be less, but in no case will it be more, so unless you mean to stress test 
zookeeper this is not much of a problem. If I've missed a case let me know.

The existing code (without this patch) isn't really consistent either. Code 
that calls getCollectionProperties() is either fast or slow depending on 
whether or not someone else has set a watch on that collection. I think the 
current inconsistency is worse because something that was fast  due to the 
action of unrelated code (getCollectionProperties()) can become surprisingly 
slow, whereas after my patch, setting the watch may become surprisingly fast 
due to the effect of unrelated code.

Edit: re-reading that last sentence I realize now that one could flip either 
one around but I guess what I meant to express is that the delta is potentially 
much larger in the present code since it would be almost inconceivable to 
create a watch on a per-doc basis, but one could easily write code checking 
collection properties on a per-doc or per request basis.


was (Author: gus_heck):
Need was probably the wrong word... want :) would have been more correct. I 
prefer that since then it can be a single line and we don't have to do the 
cache lookup for an object that we are already handling.

As for consistency/frequency of access let's consider some cases (time in 
minutes):
 # Case 1:
 ## T+0 A watch is set, cache is populated from zk
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed exactly once.This is UNCHANGED vs current
 # Case 2:
 ## T+0 A watch is set, cache is populated from zk
 ## T+15 properties are accessed with getCollectionProperties() 
 ## T+30 the watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed exactly once.This is UNCHANGED vs current
 # Case 3:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+5 a watch is set, cache is already populated, zk is not accessed
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case the data access frequency for zk is once instead of twice. 
This LESS than current.
 # Case 4:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+10 the cache expires
 ## T+20 a watch is set, cache is cache is populated from zk
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed twice. This is UNCHANGED vs current. 
 # Case 5:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+1 a call to getCollectionProperties() is made, cache is already 
populated, zk is not accessed
 ## T+2 a call to getCollectionProperties() is made, cache is already 
populated, zk is not accessed
 ## T+12 the cache expires
 ## T+30 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+40 the cache expires
 ## In this case zookeeper is accessed twice instead of four times This is LESS 
than current.

I will grant you that it's hard to predict that when the load on zookeeper will 
be less, but in no case will it be more, so unless you mean to stress test 
zookeeper this is not much of a problem. If I've missed a case let me know.

The existing code (without this patch) isn't really consistent either. Code 
that calls getCollectionProperties() is either fast or slow depending on 
whether or not someone else has set a watch on that collection. I think the 
current inconsistency is worse because something that was fast  due to the 
action of unrelated code (getCollectionProperties()) can become surprisingly 
slow, whereas after my patch, setting the watch may become surprisingly fast 
due to the effect of unrelated code.

> Make collection properties easier and safer to use in code
> ----------------------------------------------------------
>
>                 Key: SOLR-13439
>                 URL: https://issues.apache.org/jira/browse/SOLR-13439
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: master (9.0)
>            Reporter: Gus Heck
>            Assignee: Gus Heck
>            Priority: Major
>         Attachments: SOLR-13439.patch, SOLR-13439.patch
>
>
> (breaking this out from SOLR-13420, please read there for further background)
> Before this patch the api is quite confusing (IMHO):
>  # any code that wanted to know what the properties for a collection are 
> could call zkStateReader.getCollectionProperties(collection) but this was a 
> dangerous and trappy API because that was a query to zookeeper every time. If 
> a naive user auto-completed that in their IDE without investigating, heavy 
> use of zookeeper would ensue.
>  # To "do it right" for any code that might get called on a per-doc or per 
> request basis one had to cause caching by registering a watcher. At which 
> point the getCollectionProperties(collection) magically becomes safe to use, 
> but the watcher pattern probably looks famillar induces a user who hasn't 
> read the solr code closely to create their own cache and update it when their 
> watcher is notified. If the caching side effect of watches isn't understood 
> this will lead to many in-memory copies of collection properties maintained 
> in user code.
>  # This also creates a task to be scheduled on a thread (PropsNotification) 
> and induces an extra thread-scheduling lag before the changes can be observed 
> by user code.
>  # The code that cares about collection properties needs to have a lifecycle 
> tied to either a collection or something other object with an even more 
> ephemeral life cycle such as an URP. The user now also has to remember to 
> ensure the watch is unregistered, or there is a leak.
> After this patch
>  # Calls to getCollectionProperties(collection) are always safe to use in any 
> code anywhere. Caching and cleanup are automatic.
>  # Code that really actually wants to know if a collection property changes 
> so it can wake up and do something (autoscaling?) still has the option of 
> registering a watcher that will asynchronously send them a notification.
>  # Updates can be observed sooner via getCollectionProperties with no need to 
> wait for a thread to run. (vs a cache held in user code)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to