Rahul Goswami created SOLR-17725:
------------------------------------
Summary: Automatically upgrade Solr indexes without needing to
reindex from source
Key: SOLR-17725
URL: https://issues.apache.org/jira/browse/SOLR-17725
Project: Solr
Issue Type: Improvement
Reporter: Rahul Goswami
Today, upgrading from Solr major version X to X+2 requires complete reingestion
of data from source. This follows from a Lucene constraint: index compatibility
is guaranteed only between the version the index was created in and the
immediate next version.
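The rule above can be written down as a small check. This is only a sketch of the stated constraint, not Lucene's actual code; the function name is hypothetical, and it assumes an index is identified by the major version it was originally created in.

```python
def can_open_index(created_major: int, running_major: int) -> bool:
    """Sketch of the compatibility rule described above.

    An index is readable by the major version it was created in and by
    the immediate next major version, but not by anything later (and not
    by an older version).
    """
    return 0 <= running_major - created_major <= 1
```

Under this rule an index created in major version 7 opens in 8 but not in 9, which is why an X to X+2 upgrade currently forces a reindex from source.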
Reindexing usually brings added downtime, cost, or both. For deployments that
run in customer environments and are not fully under the vendor's control,
having to completely reindex the data can be a hard sell.
I have developed an approach that performs this reindexing in place, on the same
index. The process also keeps "upgrading" the indexes automatically across
subsequent Solr upgrades, with no manual intervention.
It comes with the following limitations:
i) All _source_ fields must be either stored=true or docValues=true. copyField
destination fields can be stored=false; only the source fields (more precisely,
the source fields you care about preserving) need to be stored or have
docValues enabled.
ii) The datatype of an existing field in schema.xml must not change across the
Solr upgrade. Introducing new fields is fine.
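To get a feel for limitation (i), a schema can be scanned for source fields that are neither stored nor docValues. The following is a minimal sketch and not part of the proposed tool. The function name and the sample schema are hypothetical. It only looks at explicit `<field>` entries, resolves `stored`/`docValues` from the field and then its fieldType, and treats a missing attribute as false. That is stricter than Solr, whose implicit defaults depend on the field class and schema version, so it may over-report. Dynamic fields and wildcard copyField destinations are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema used only for illustration.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField" docValues="true"/>
  <fieldType name="text" class="solr.TextField"/>
  <field name="id" type="string" stored="true"/>
  <field name="title" type="text" stored="true"/>
  <field name="body" type="text" stored="false"/>
  <field name="category" type="string" stored="false"/>
  <field name="all_text" type="text" stored="false"/>
  <copyField source="title" dest="all_text"/>
</schema>
"""


def _flag(field, field_type, attr):
    """Resolve a boolean attribute from the field, then from its fieldType.

    A missing attribute counts as False (a conservative assumption).
    """
    value = field.get(attr)
    if value is None and field_type is not None:
        value = field_type.get(attr)
    return value is not None and value.strip().lower() == "true"


def unrecoverable_source_fields(schema_xml):
    """Return names of source fields that are neither stored nor docValues."""
    root = ET.fromstring(schema_xml)
    field_types = {ft.get("name"): ft for ft in root.iter("fieldType")}
    # copyField destinations are derived data, so they are exempt.
    copy_dests = {cf.get("dest") for cf in root.iter("copyField")}

    problems = []
    for field in root.iter("field"):
        name = field.get("name")
        if name in copy_dests:
            continue
        field_type = field_types.get(field.get("type"))
        stored = _flag(field, field_type, "stored")
        doc_values = _flag(field, field_type, "docValues")
        if not (stored or doc_values):
            problems.append(name)
    return problems
```

For the sample schema, `unrecoverable_source_fields(SAMPLE_SCHEMA)` flags only `body`: `category` is covered by the docValues setting on its fieldType, and `all_text` is a copyField destination.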
For indexes where these limitations are not a problem (they weren't for us!),
the tool can reindex in place on the same core with zero downtime and
legitimately "upgrade" the index. This can remove a lot of operational
headaches, especially in environments with hundreds or thousands of very large
indexes.