Hello Ufuk,

As I understand it, data can reach Solr from many different sources, so capturing all of it and storing it somewhere would be tedious and redundant: that data ends up in your Solr index anyway, and you can usually re-index from the original source. That said, some sources have a shorter lifespan than the resulting index, and Solr is sometimes used as a NoSQL database and the only source of truth, so the original source may not always be available.
Personally, my strategy is to use a script that queries all the data from Solr in chunks and stores the JSON on disk. A second script then runs a clean-up, removing Solr-generated fields and fields created at index time by update processors. I've used this strategy to migrate indexes as large as 10 GB without hitting a bottleneck. The script could very easily become a tool, just by replacing the command-line parameters (Solr URL, auth token, query, file name regex for chunks, etc.) with UI-based inputs. I can share my script publicly somewhere; I think I should put it on GitHub.

Does this address your concern?

Best Regards,
Ayan Farooqui
Product Engineer
*HotWax Commerce*
*Real OmniChannel. Real Results.*
email: [email protected]
*www.hotwax.co <http://www.hotwax.co/>*

On Tue, Sep 2, 2025 at 4:52 PM Ufuk YILMAZ <[email protected]> wrote:

> It's interesting that no one has developed a tool to simplify
> reindexing; there must be a good reason for it that I haven't
> considered, and learning about it would be eye-opening. I figured I'd
> just ask.
>
> Could something intercept requests sent to the update request handler,
> capture the raw data before processing, store it somewhere, and replay
> it later for reindexing? Here are a few reasons I could think of why
> this hasn't been developed:
>
> - There are numerous entry points for data into Solr, making it
>   difficult to capture them all.
> - It would be too large of a feature to include in Solr, and developers
>   might not want to maintain it long-term.
> - Many people already build their own infrastructure for this, so nobody
>   actually needs it
>
> ..?
>
> --Ufuk
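
P.S. To make the export-and-clean-up strategy described in this reply more concrete, here is a minimal sketch of how it could look. It is not the script mentioned above, only an illustration under a few assumptions: it pages through results with Solr's cursorMark parameter, it assumes the collection's uniqueKey field is named `id`, it only recovers fields that Solr can actually return (stored or docValues), and the function names, chunk file naming, and the set of fields to drop are all hypothetical.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Fields Solr adds on its own. Extend this set with any fields your
# update processors create at index time (those are installation-specific).
SOLR_GENERATED_FIELDS = {"_version_"}


def make_solr_fetcher(select_url, query="*:*", rows=1000, sort="id asc", headers=None):
    """Return a function that fetches one page of results for a cursorMark.

    Assumption: the uniqueKey is `id`. Cursor paging requires the sort
    to include the uniqueKey field.
    """
    def fetch_page(cursor_mark):
        params = urllib.parse.urlencode({
            "q": query,
            "rows": rows,
            "sort": sort,
            "cursorMark": cursor_mark,
            "wt": "json",
        })
        request = urllib.request.Request(f"{select_url}?{params}", headers=headers or {})
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch_page


def iter_chunks(fetch_page):
    """Yield one list of documents per page until the cursor stops advancing."""
    cursor_mark = "*"
    while True:
        page = fetch_page(cursor_mark)
        docs = page["response"]["docs"]
        if docs:
            yield docs
        next_mark = page.get("nextCursorMark")
        if next_mark is None or next_mark == cursor_mark:
            break
        cursor_mark = next_mark


def clean_doc(doc, drop_fields=SOLR_GENERATED_FIELDS):
    """Return a copy of the document without the generated fields."""
    return {key: value for key, value in doc.items() if key not in drop_fields}


def export_chunks(fetch_page, out_dir, drop_fields=SOLR_GENERATED_FIELDS):
    """Write each cleaned page to its own JSON file; return the document count."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    total = 0
    for index, docs in enumerate(iter_chunks(fetch_page)):
        cleaned = [clean_doc(doc, drop_fields) for doc in docs]
        chunk_file = out_dir / f"chunk_{index:05d}.json"
        chunk_file.write_text(json.dumps(cleaned), encoding="utf-8")
        total += len(cleaned)
    return total
```

With a hypothetical local collection, a run could look like `export_chunks(make_solr_fetcher("http://localhost:8983/solr/mycollection/select"), "backup")`, and an auth token could be passed through the `headers` argument. Keeping the page fetcher separate from the export loop means the loop can be exercised against canned responses without a running Solr instance.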
