[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

Gus Heck (JIRA) Tue, 07 Nov 2017 06:25:26 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242091#comment-16242091
 ]


Gus Heck commented on SOLR-11487:
---------------------------------

*Constructor* - Yeah that can be simplified. Much of the code directly accesses 
the field, so I try to make it impossible to observe invalid state, but it seems 
I haven't covered the EMPTY_MAP case. The null checks may not be necessary at 
all if I have in fact provided this guarantee up front. 
----
*Map Conversion* - This is a result of my not caching/duplicating state. At one 
point I began to have issues (test failures) due to the cached state getting 
out of sync, and rather than continue to try to maintain that duplicated state 
I opted to remove the duplication. My dislike for the possibility of repeated 
splitting of the list was why I originally changed things such that the main 
map contained a list. As you pointed out that complicates serialization if we 
are to maintain the existing comma separated format. So we wind up with one of 
these three things, none of which I like:

- Duplicated state
- Complicated serialization
- Repeated splitting of the comma separated list.

This sort of conundrum is more or less why I had previously suggested we do the 
metadata via zk nodes and don't expand the complexity of aliases.json... Now 
that everything else is working it should be more tractable to push the 
duplication/caching back in than it was to maintain it while things were 
evolving so I can do that if you like, but basically we have to pay for the 
fact that we are clumping this into a single json file somewhere.
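
To make the trade-off concrete, here is a rough sketch (not the code in the patch; the class and method names are hypothetical). It keeps the comma separated strings as the primary state so serialization stays simple, and shows both remaining options side by side: splitting on every read, or holding a duplicated, pre-split copy. The copy is built once in the constructor of an immutable object, which is one way to keep it from drifting out of sync.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the class in SOLR_11487.patch.
final class AliasSketch {
  // Serialized shape: alias name -> comma separated collection names.
  private final Map<String, String> raw;
  // Duplicated state: derived once here, never updated afterwards.
  private final Map<String, List<String>> split;

  AliasSketch(Map<String, String> raw) {
    this.raw = Collections.unmodifiableMap(new LinkedHashMap<>(raw));
    Map<String, List<String>> tmp = new HashMap<>();
    for (Map.Entry<String, String> e : this.raw.entrySet()) {
      tmp.put(e.getKey(), splitList(e.getValue()));
    }
    this.split = Collections.unmodifiableMap(tmp);
  }

  // Option: repeated splitting, no duplicated state.
  List<String> collectionsBySplitting(String alias) {
    String csv = raw.get(alias);
    return csv == null ? Collections.emptyList() : splitList(csv);
  }

  // Option: duplicated (cached) state, no repeated splitting.
  List<String> collectionsFromCache(String alias) {
    return split.getOrDefault(alias, Collections.emptyList());
  }

  // Serialization stays trivial because raw is already in the stored shape.
  Map<String, String> toSerializableMap() {
    return raw;
  }

  private static List<String> splitList(String csv) {
    return Collections.unmodifiableList(Arrays.asList(csv.split(",")));
  }
}
```

The cost of the cached variant is that every "modification" has to build a new object and redo the split, which is the bookkeeping that was hard to keep correct while the design was still moving.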
----
*convertMap*  - ah yes good catch thx.
---- 
*priorChange* - The task of avoiding competition among unrelated nodes of 
aliases.json is complicated by the fact that the API allows several consecutive 
clones to be made before the result is given to zkStateReader.exportAllAliases 
(again, issues arising from the "one big json" strategy). We could fix that 
in documentation, and/or set a package private flag that prevents further 
cloning until ZkStateReader has written the current changes... in that case we 
could possibly have a few fields that retained the previous change data as 
string data rather than a function closure. Not sure how fields containing 
strings and a flag are less hokey though, and the flag would technically break 
immutability.

Think of it this way: The state in aliasMap is "candidate" state, and the chain 
of Function calls is an immutable change history that can be applied to a new 
value read from zk if needed. 
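
A minimal sketch of that idea, assuming a simplified alias map of name to comma separated collections (the class and method names here are hypothetical and do not come from the patch). Each clone carries both the candidate map and a composed Function that can redo the same changes on top of a fresher value:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; names are not taken from the patch.
final class AliasChangeSketch {
  // "Candidate" state: what we would like to write.
  private final Map<String, String> aliasMap;
  // Immutable change history: replays every clone step on another base.
  private final Function<AliasChangeSketch, AliasChangeSketch> history;

  private AliasChangeSketch(Map<String, String> aliasMap,
                            Function<AliasChangeSketch, AliasChangeSketch> history) {
    this.aliasMap = Collections.unmodifiableMap(new HashMap<>(aliasMap));
    this.history = history;
  }

  // A value freshly read from storage has no pending changes.
  static AliasChangeSketch readFrom(Map<String, String> storedAliases) {
    return new AliasChangeSketch(storedAliases, Function.identity());
  }

  // Returns a new instance; the receiver is never modified.
  AliasChangeSketch cloneWithAlias(String alias, String collections) {
    Map<String, String> next = new HashMap<>(aliasMap);
    next.put(alias, collections);
    return new AliasChangeSketch(next,
        history.andThen(base -> base.cloneWithAlias(alias, collections)));
  }

  // Re-apply all recorded changes to a newer value read from storage.
  AliasChangeSketch replayOnto(AliasChangeSketch fresh) {
    return history.apply(fresh);
  }

  Map<String, String> aliasMap() {
    return aliasMap;
  }
}
```

Because the history is a composition of closures, several consecutive clones are handled without any flag or mutable field: replaying onto a fresh read keeps unrelated aliases written by someone else and layers the local changes on top.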
----
*API* - Yeah, I had attempted to raise this issue above, but I confusingly 
conflated it with the possibility of collection metadata. You responded to the 
latter in the negative, and I took that as a negative on the former as well. 
Sorry for the confusing question. This can certainly be added :)
----
*ZkStateReader* - These loops perform different tasks; there are two steps 
here:
 - ensure the data we are sending includes the latest changes (exportAllAliases)
 - ensure (with timeout) that Zookeeper got the data we eventually decided to 
send. 

We do in fact call clone in the first loop via the Function closure, if needed. 
The one you see in exportAliasToZk is just the initial attempt.
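
As a rough illustration of the two steps (this is not the ZkStateReader code; the in-memory node below is a hypothetical stand-in for a versioned ZooKeeper node, and all names are assumptions), the first loop retries a version-checked write, re-applying the change to the latest data each time, and the second polls with a timeout until the written version is visible:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for a versioned ZooKeeper node.
final class VersionedNodeSketch {
  private Map<String, String> data = Collections.emptyMap();
  private int version = 0;

  synchronized int version() { return version; }
  synchronized Map<String, String> data() { return data; }

  // Succeeds only if nobody else wrote since expectedVersion was read.
  synchronized boolean setIfVersion(int expectedVersion, Map<String, String> newData) {
    if (version != expectedVersion) {
      return false;
    }
    data = Collections.unmodifiableMap(new HashMap<>(newData));
    version++;
    return true;
  }
}

final class ExportSketch {
  // Step 1: re-apply the change to the latest data until our write wins.
  static int exportWithRetry(VersionedNodeSketch node,
                             UnaryOperator<Map<String, String>> change) {
    while (true) {
      int seenVersion;
      Map<String, String> latest;
      synchronized (node) { // read version and data together
        seenVersion = node.version();
        latest = node.data();
      }
      if (node.setIfVersion(seenVersion, change.apply(latest))) {
        return seenVersion + 1;
      }
    }
  }

  // Step 2: wait, with a timeout, until the node shows at least that version.
  static boolean awaitVersion(VersionedNodeSketch node, int version, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (node.version() < version) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

In this stand-in the second step returns immediately, since the write and the read hit the same object; against a real ensemble the confirmation can lag behind the write, which is why the two loops are separate.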
----
*Field order* - yup, agree.
----
*Overall*
I am increasingly feeling like there's a lot of complication here that derives 
from our attempts to provide zookeeper like guarantees and prevent competition 
within a single json file. Can you perhaps elaborate on the bookkeeping that 
worries you and [~noble.paul]? Is it really heavier than what we have here?

> Collection Alias metadata for time partitioned collections
> ----------------------------------------------------------
>
>                 Key: SOLR-11487
>                 URL: https://issues.apache.org/jira/browse/SOLR-11487
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>         Attachments: SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch, 
> SOLR_11487.patch
>
>
> SOLR-11299 outlines an approach to using a collection Alias to refer to a 
> series of collections of a time series. We'll need to store some metadata 
> about these time series collections, such as which field of the document 
> contains the timestamp to route on.
> The current {{/aliases.json}} is a Map with a key {{collection}} which is in 
> turn a Map of alias name strings to a comma delimited list of the collections.
> _If we change the comma delimited list to be another Map to hold the existing 
> list and more stuff, older CloudSolrClient (configured to talk to ZooKeeper) 
> will break_.  Although if it's configured with an HTTP Solr URL then it would 
> not break.  There's also some read/write hassle to worry about -- we may need 
> to continue to read an aliases.json in the older format.
> Alternatively, we could add a new map entry to aliases.json, say, 
> {{collection_metadata}} keyed by alias name?
> Perhaps another very different approach is to attach metadata to the 
> configset in use?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
