Re: SOLR-14341 Migration of a collection's configName (configSet) into state.json

David Smiley Mon, 26 Apr 2021 08:31:42 -0700

Gus: state.json is read on startup.  In my proposal, the in-memory JSON for
it would be augmented with the configSet, but I proposed no new write-back
to ZK.  So the collection would only be upgraded in ZK when there is some
other reason to write its state (e.g. a replica state change).  Conceptually,
writing back immediately makes sense and would allow one to reason that the
collections are updated right away automatically, but I'm not yet sure how
complicated guaranteeing this would be.  Nazerke and I should explore this
further.
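
A minimal sketch of that read-time augmentation, in Python rather than
Solr's Java, with a plain dict standing in for ZK.  The function name, the
"books" collection, and the simplified state.json layout are assumptions
for illustration only, not Solr code:

```python
import json

# Hypothetical stand-in for ZooKeeper: znode path -> bytes stored at that node.
FAKE_ZK = {
    # Old location: data stored on the collection's containing node.
    "/collections/books": b'{"configName":"_default"}',
    # Assumed, simplified state.json that has no "configSet" yet.
    "/collections/books/state.json": b'{"books":{"shards":{}}}',
}


def read_collection_state(zk, collection):
    """Return the collection's state with "configSet" filled in, in memory only."""
    raw = json.loads(zk[f"/collections/{collection}/state.json"])
    state = raw[collection]
    if "configSet" not in state:
        # Fall back to the old location; nothing is written back to zk here.
        legacy = json.loads(zk[f"/collections/{collection}"])
        state["configSet"] = legacy["configName"]
    return state


state = read_collection_state(FAKE_ZK, "books")
print(state["configSet"])
```

The stored state.json is left untouched here; it would only gain the new
key the next time the state is written for some other reason.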


~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Apr 20, 2021 at 6:13 PM Gus Heck <[email protected]> wrote:

> Hmm, does state.json get read (and thus upgraded) when a node hosting the
> collection (re)starts? Would this in effect be an upgrade on startup?
>
> On Tue, Apr 20, 2021 at 5:28 PM David Smiley <[email protected]> wrote:
>
>> In the following issue,
>> https://issues.apache.org/jira/browse/SOLR-14341
>> Nazerke (my colleague) is working on moving a collection's "configName"
>> (configSet) into state.json where it should have been all along.  Better
>> late than never.  This is targeting 9.0.  This email is largely about
>> migration / backwards-compatibility.
>>
>> Currently, a collection's configSet name is read by
>> ZkStateReader.readConfigSetName(collection), which reads JSON stored at the
>> ZK path "/collections/<COLNAME>", the containing node for SolrCloud's
>> information about the collection (i.e. it contains state.json etc.).
>> Example data: {"configName":"_default"}.  In case you didn't know, ZK
>> intermediate nodes can contain data just like leaf nodes, unlike
>> directories in a file system.
>>
>> Instead, we want it retrievable via a new method,
>> DocCollection.getConfigSet, reflecting its storage in state.json, which
>> could have a new name-value pair at the top: "configSet".
>>
>> So how do we do this transition?  How about this: Whenever SolrCloud
>> reads state.json, it detects the absence of configSet and it inserts it on
>> the fly, reading the old location.  This will incur a performance overhead
>> but it's transient during an upgrade to Solr 9.  To ensure that all
>> collections are upgraded (and thus stop incurring a penalty), we can
>> provide a trivial bash script that reads all existing collections and loops
>> over them to call MODIFYCOLLECTION to set the configSet to whatever it is
>> currently.  Creating/modifying a collection will ensure that the configSet
>> name is stored in the old place and new place.
>>
>> Then we remove writing to the old place in Solr 10.  Or maybe Solr 9
>> doesn't write to the old location, provided that during a live upgrade you
>> don't create or modify collections or associations with configSets because
>> that could confuse Solr 8 nodes?  If we go with this, a
>> MODIFYCOLLECTION command could remove the old data if it's present.
>>
>> AFAICT, SolrJ CloudSolrClient doesn't really care about this matter,
>> thankfully.
>>
>> WDYT folks?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
