[
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242091#comment-16242091
]
Gus Heck commented on SOLR-11487:
---------------------------------
*Constructor* - Yeah that can be simplified. Much of the code directly accesses
the field, so I try to make it impossible to observe invalid state, but it
seems I haven't covered the EMPTY_MAP case. The null checks may not be
necessary at all if I have provided this guarantee up front.
----
*Map Conversion* - This is a result of my not caching/duplicating state. At one
point I began to have issues (test failures) due to the cached state getting
out of sync, and rather than continue to try to maintain that duplicated state
I opted to remove the duplication. My dislike for the possibility of repeated
splitting of the list was why I originally changed things such that the main
map contained a list. As you pointed out that complicates serialization if we
are to maintain the existing comma separated format. So we wind up with one of
these three things, none of which I like:
- Duplicated state
- Complicated serialization
- Repeated splitting of the comma separated list.
This sort of conundrum is more or less why I had previously suggested we do the
metadata via zk nodes and not expand the complexity of aliases.json. Now
that everything else is working, it should be more tractable to push the
duplication/caching back in than it was to maintain it while things were
evolving, so I can do that if you like. Basically, though, we have to pay
somewhere for the fact that we are clumping this into a single json file.
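To make the trade-off concrete, here is a minimal sketch of the "repeated splitting" option next to a derived list view (the "duplicated state" option). The class and method names are hypothetical and are not the ones in the patch; it only assumes the existing format of alias name mapped to a comma separated string.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the class from the patch.
final class AliasListSketch {
    // Single source of truth: alias name -> comma separated collection names,
    // the same shape as the existing aliases.json format.
    private final Map<String, String> aliasToCsv;

    AliasListSketch(Map<String, String> aliasToCsv) {
        this.aliasToCsv = Collections.unmodifiableMap(new LinkedHashMap<>(aliasToCsv));
    }

    // "Repeated splitting": nothing is cached, so nothing can drift,
    // but the string is split again on every call.
    List<String> collectionsFor(String alias) {
        String csv = aliasToCsv.get(alias);
        if (csv == null || csv.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    // "Duplicated state": a derived list view. Anything holding on to this
    // must rebuild it whenever the underlying map is replaced.
    Map<String, List<String>> derivedListView() {
        Map<String, List<String>> view = new LinkedHashMap<>();
        for (String alias : aliasToCsv.keySet()) {
            view.put(alias, collectionsFor(alias));
        }
        return Collections.unmodifiableMap(view);
    }
}
```

Serialization stays trivial in both cases because the stored value is still the comma separated string; the cost moves to either CPU (splitting) or bookkeeping (keeping the view fresh).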
----
*convertMap* - ah yes good catch thx.
----
*priorChange* - The task of avoiding competition among unrelated nodes of
aliases.json is complicated by the fact that the API allows several consecutive
clones to be made before the result is given to zkStateReader.exportAllAliases
(again, issues arising from the "one big json" strategy). We could fix that
in documentation, and/or set a package private flag that prevents further
cloning until ZkStateReader has written the current changes... in that case we
could possibly have a few fields that retained the previous change data as
string data rather than a function closure. I'm not sure how fields containing
strings plus a flag are less hokey though, and the flag would technically break
immutability.
Think of it this way: The state in aliasMap is "candidate" state, and the chain
of Function calls is an immutable change history that can be applied to a new
value read from zk if needed.
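A rough sketch of that idea, with hypothetical names (AliasesSketch, cloneWithAlias, replayOn) that are not the ones in the patch: each clone keeps its own candidate map and also composes its change onto the prior chain, so the whole chain can be replayed on a map freshly read from zk.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; not the class from the patch.
final class AliasesSketch {
    // "Candidate" state: what the aliases look like if nobody else has written.
    private final Map<String, String> aliasMap;
    // Immutable change history: every change since the base, composed in order.
    private final UnaryOperator<Map<String, String>> history;

    private AliasesSketch(Map<String, String> aliasMap,
                          UnaryOperator<Map<String, String>> history) {
        this.aliasMap = Collections.unmodifiableMap(new LinkedHashMap<>(aliasMap));
        this.history = history;
    }

    static AliasesSketch of(Map<String, String> base) {
        return new AliasesSketch(base, UnaryOperator.identity());
    }

    // Returns a new instance; this one is never modified.
    AliasesSketch cloneWithAlias(String alias, String csv) {
        UnaryOperator<Map<String, String>> change = m -> {
            Map<String, String> copy = new LinkedHashMap<>(m);
            copy.put(alias, csv);
            return copy;
        };
        UnaryOperator<Map<String, String>> prior = history;
        return new AliasesSketch(change.apply(aliasMap),
                m -> change.apply(prior.apply(m)));
    }

    Map<String, String> candidate() {
        return aliasMap;
    }

    // Re-apply every recorded change to a value freshly read from the store.
    Map<String, String> replayOn(Map<String, String> fresh) {
        return Collections.unmodifiableMap(history.apply(fresh));
    }
}
```

Several consecutive clones simply lengthen the chain, and unrelated entries that appeared in the fresh value survive the replay because each change only touches its own key.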
----
*API* - Yeah, I had attempted to raise this issue above, but I confusingly
conflated it with the possibility of collection metadata earlier. You responded
to the latter in the negative, and I took that as a negative on the former too.
Sorry for the confusing question. This can certainly be added :)
----
*ZkStateReader* - These loops perform different tasks; there are two steps
here:
- ensure the data we are sending includes the latest changes (exportAllAliases)
- ensure (with timeout) that ZooKeeper got the data we eventually decided to
send.
We do in fact call clone in the first loop via the Function closure, if needed.
The one you see in exportAliasToZk is just the initial attempt.
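As a loose analogy only, the two steps can be sketched against an in-memory stand-in for zk. The AtomicReference plays the role of the versioned node, and all names here are hypothetical; none of this is the ZkStateReader code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; an AtomicReference stands in for the zk node.
final class TwoStepWriteSketch {

    // Step 1: keep re-applying the change to the latest stored value until
    // the write lands on top of exactly the value that was read.
    static String writeWithRetry(AtomicReference<String> store,
                                 UnaryOperator<String> change) {
        while (true) {
            String current = store.get();
            String updated = change.apply(current);
            if (store.compareAndSet(current, updated)) {
                return updated;
            }
            // Someone else wrote in between; loop and replay on the new value.
        }
    }

    // Step 2: wait, with a timeout, until the observed value matches what
    // was eventually sent.
    static boolean awaitVisible(Supplier<String> observed, String expected,
                                long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() - deadline < 0) {
            if (expected.equals(observed.get())) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return expected.equals(observed.get());
    }
}
```

The first loop is about correctness of the payload, the second about confirming delivery, which is why they are separate.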
----
*Field order* - yup, agree.
----
*Overall* - I am increasingly feeling like there's a lot of complication here
that derives from our attempts to provide ZooKeeper-like guarantees and prevent
competition within a single json file. Can you perhaps elaborate on the
bookkeeping that worries you and [~noble.paul]? Is it really heavier than what
we have here?
> Collection Alias metadata for time partitioned collections
> ----------------------------------------------------------
>
> Key: SOLR-11487
> URL: https://issues.apache.org/jira/browse/SOLR-11487
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: David Smiley
> Attachments: SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch,
> SOLR_11487.patch
>
>
> SOLR-11299 outlines an approach to using a collection Alias to refer to a
> series of collections of a time series. We'll need to store some metadata
> about these time series collections, such as which field of the document
> contains the timestamp to route on.
> The current {{/aliases.json}} is a Map with a key {{collection}} which is in
> turn a Map of alias name strings to a comma delimited list of the collections.
> _If we change the comma delimited list to be another Map to hold the existing
> list and more stuff, older CloudSolrClient (configured to talk to ZooKeeper)
> will break_. Although if it's configured with an HTTP Solr URL then it would
> not break. There's also some read/write hassle to worry about -- we may need
> to continue to read an aliases.json in the older format.
> Alternatively, we could add a new map entry to aliases.json, say,
> {{collection_metadata}} keyed by alias name?
> Perhaps another very different approach is to attach metadata to the
> configset in use?