Hi Bram,

Could you explain the approach in a bit more detail? How does SolrCloud maintain different schemas when an update involves a mixture of old and new documents in the same segment?

Thanks, and happy new year!

- Wei

On Fri, Dec 23, 2022 at 8:21 AM Bram Van Dam wrote:

> Greetings,
>
> We ran into a pretty hairy problem on 7.7. TL;DR; we had to enable
> docValues on the unique key field in a large SolrCloud instance, without
> being able to reindex old data.
>
> This kind of worked, by specifying different config sets in
> core.properties for different shards, where new shards would get the
> schema from ZK and newly indexed data would (correctly) use DocValues,
> while old data in older shards remained unaffected.
>
> This broke when old data was modified: Solr would use the new schema for
> the updates, and the index would get corrupted because documents with
> and without docValues would be mixed in the same segment in the same
> core, which resulted in errors when retrieving the documents (curiously,
> not when merging the segments?).
>
> The linked patch, by my colleague Danny, allows Solr to use the correct
> schema when updating data in these old shards (based on the
> configuration in core.properties).
>
> We realize that this is a pretty ugly hack for a rather specific
> problem. But at the same time, Solr allows for different configSets to
> be specified for different cores, and this patch sort of improves
> support for that.
>
> This applies cleanly on (the admittedly ancient) branch_7_7. All tests
> are green, precommit checks are OK.
>
> If there is any interest in this patch, we might be able to look in to
> making it available on master or branch_9x.
>
> https://foss.intix.eu/solr/2022-12-solr-schema.patch
>
> Any feedback is of course greatly appreciated.
>
> Thanks, and season's greetings!
>
> - Bram
