[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242091#comment-16242091 ]
Gus Heck commented on SOLR-11487:
---------------------------------

*Constructor* - Yeah, that can be simplified. Much of the code accesses the field directly, so I try to make it impossible to observe invalid state, but it seems I haven't covered the EMPTY_MAP case. The null checks may not actually be necessary if I have provided this guarantee up front.
----
*Map Conversion* - This is a result of my not caching/duplicating state. At one point I began to have issues (test failures) due to the cached state getting out of sync, and rather than continue trying to maintain that duplicated state I opted to remove the duplication. My dislike for the possibility of repeatedly splitting the list was why I originally changed things so that the main map contained a list. As you pointed out, that complicates serialization if we are to maintain the existing comma separated format. So we wind up with one of these three things, none of which I like:
- Duplicated state
- Complicated serialization
- Repeated splitting of the comma separated list.

This sort of conundrum is more or less why I had previously suggested we do the metadata via zk nodes and not expand the complexity of aliases.json... Now that everything else is working, it should be more tractable to push the duplication/caching back in than it was to maintain it while things were evolving, so I can do that if you like, but basically we have to pay somewhere for the fact that we are clumping this into a single json file.
----
*convertMap* - Ah yes, good catch, thx.
----
*priorChange* - The task of avoiding competition among unrelated nodes of aliases.json is complicated by the fact that the API allows several consecutive clones to be made before the result is given to zkStateReader.exportAllAliases (again, issues arising from the "one big json" strategy).
We could fix that in documentation, and/or set a package private flag that prevents further cloning until ZkStateReader has written the current changes... In that case we could possibly have a few fields that retain the previous change data as strings rather than as a function closure. I'm not sure how fields containing strings plus a flag are less hokey though, and the flag would technically break immutability. Think of it this way: the state in aliasMap is "candidate" state, and the chain of Function calls is an immutable change history that can be applied to a new value read from zk if needed.
----
*API* - Yeah, I had attempted to raise this issue above, but confusingly conflated it with the possibility of collection metadata earlier; you responded to the latter in the negative, and I took that to apply to the former as well. Sorry for the confusing question. This can certainly be added :)
----
*ZkStateReader* - These loops perform different tasks; there are two steps here:
- ensure the data we are sending includes the latest changes (exportAllAliases)
- ensure (with timeout) that Zookeeper got the data we eventually decided to send.

We do in fact call clone in the first loop via the Function closure, if needed. The one you see in exportAliasToZk is just the initial attempt.
----
*Field order* - Yup, agree.
----
*Overall* - I am increasingly feeling like there's a lot of complication here that derives from our attempts to provide zookeeper-like guarantees and prevent competition within a single json file. Can you perhaps elaborate on the bookkeeping that worries you and [~noble.paul]? Is it really heavier than what we have here?

> Collection Alias metadata for time partitioned collections
> ----------------------------------------------------------
>
>                 Key: SOLR-11487
>                 URL: https://issues.apache.org/jira/browse/SOLR-11487
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: David Smiley
>         Attachments: SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch,
> SOLR_11487.patch
>
>
> SOLR-11299 outlines an approach to using a collection Alias to refer to a
> series of collections of a time series. We'll need to store some metadata
> about these time series collections, such as which field of the document
> contains the timestamp to route on.
> The current {{/aliases.json}} is a Map with a key {{collection}} which is in
> turn a Map of alias name strings to a comma delimited list of the collections.
> _If we change the comma delimited list to be another Map to hold the existing
> list and more stuff, older CloudSolrClient (configured to talk to ZooKeeper)
> will break_. Although if it's configured with an HTTP Solr URL then it would
> not break. There's also some read/write hassle to worry about -- we may need
> to continue to read an aliases.json in the older format.
> Alternatively, we could add a new map entry to aliases.json, say,
> {{collection_metadata}} keyed by alias name?
> Perhaps another very different approach is to attach metadata to the
> configset in use?
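
For illustration only, the "candidate state plus change history" idea from the *priorChange* item could be sketched roughly as below. This is not the code in the patch: AliasSketch, readFrom, cloneWithAlias, replayOn and asMap are all made-up names, and the sketch assumes the existing format of alias name to comma separated collection list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for an immutable aliases object; not Solr's actual class. */
final class AliasSketch {
  // Candidate state: alias name -> comma separated collection list.
  private final Map<String, String> aliasMap;
  // Change history: every edit applied since the state was originally read.
  private final List<UnaryOperator<Map<String, String>>> changes;

  private AliasSketch(Map<String, String> aliasMap,
                      List<UnaryOperator<Map<String, String>>> changes) {
    this.aliasMap = Collections.unmodifiableMap(aliasMap);
    this.changes = Collections.unmodifiableList(changes);
  }

  /** Wraps a freshly read state with an empty change history. */
  static AliasSketch readFrom(Map<String, String> base) {
    return new AliasSketch(new LinkedHashMap<>(base), new ArrayList<>());
  }

  /** Returns a new instance with the alias set; this instance is left untouched. */
  AliasSketch cloneWithAlias(String alias, String collections) {
    UnaryOperator<Map<String, String>> change = m -> {
      Map<String, String> copy = new LinkedHashMap<>(m);
      copy.put(alias, collections);
      return copy;
    };
    List<UnaryOperator<Map<String, String>>> history = new ArrayList<>(changes);
    history.add(change);
    return new AliasSketch(change.apply(aliasMap), history);
  }

  /** Replays the recorded changes, in order, on a newer base state. */
  AliasSketch replayOn(Map<String, String> newerBase) {
    Map<String, String> state = new LinkedHashMap<>(newerBase);
    for (UnaryOperator<Map<String, String>> change : changes) {
      state = change.apply(state);
    }
    return new AliasSketch(state, new ArrayList<>(changes));
  }

  Map<String, String> asMap() {
    return aliasMap;
  }

  public static void main(String[] args) {
    Map<String, String> read = new LinkedHashMap<>();
    read.put("a", "c1");
    AliasSketch candidate = AliasSketch.readFrom(read).cloneWithAlias("b", "c2,c3");

    // Someone else added alias "x" in the meantime, so replay on the newer state.
    Map<String, String> newer = new LinkedHashMap<>(read);
    newer.put("x", "c9");
    System.out.println(candidate.replayOn(newer).asMap());
  }
}
```

Several consecutive clones simply lengthen the history, and if the write to zk loses a race, the same history can be replayed on the freshly read value without touching any existing instance. The cost is that each instance carries closures instead of plain data, which is the trade-off discussed above.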