I wonder whether you might get better performance in a case like this if you were OK with taking your index offline: disable merges, perform the deletions, and only then re-enable merges. This could be done on a copy of the index if updates can be turned off or held in a queue, so that queries could still be served during the maintenance.
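To make the "held in a queue" part concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: `UpdateGate` and its methods are hypothetical names made up for illustration, and the applier callback stands in for whatever code actually writes to the index. The maintenance itself (disabling merges, deleting, re-enabling merges on the copy) is assumed to happen between `beginMaintenance()` and `endMaintenance()` and is not shown.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Lucene): buffers incoming index updates
 * while offline maintenance runs, then replays them in arrival order.
 */
final class UpdateGate<U> {
    // Callback that applies one update to the live index (assumed to exist).
    private final Consumer<U> applier;
    // Updates received while maintenance is in progress.
    private final Queue<U> pending = new ArrayDeque<>();
    private boolean maintenance = false;

    UpdateGate(Consumer<U> applier) {
        this.applier = applier;
    }

    /** Applies the update immediately, or holds it if maintenance is running. */
    synchronized void submit(U update) {
        if (maintenance) {
            pending.add(update);
        } else {
            applier.accept(update);
        }
    }

    /** Call before starting the purge on the index copy. */
    synchronized void beginMaintenance() {
        maintenance = true;
    }

    /**
     * Call after the purged copy has been swapped in. Replays the held
     * updates in arrival order and returns how many were replayed.
     */
    synchronized int endMaintenance() {
        maintenance = false;
        int replayed = 0;
        U update;
        while ((update = pending.poll()) != null) {
            applier.accept(update);
            replayed++;
        }
        return replayed;
    }

    /** Number of updates currently being held. */
    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Queries would keep hitting the old copy of the index in the meantime; the only cost is that searchers see slightly stale data until the replay finishes.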
However, it's largely a theoretical question, since it seems everything worked OK for you in the end.

On Feb 28, 2018 8:37 PM, "Stuart Goldberg" <sgoldb...@fixflyer.com> wrote:

> Thanks so much. I actually found that my purging routine finished after
> about 35 minutes, which is really acceptable given that this routine is
> supposed to run during the overnight period.
>
> On Feb 28, 2018 8:34 PM, "Adrien Grand" <jpou...@gmail.com> wrote:
>
> > Thanks. Deleting lots of documents can indeed trigger a lot of work on
> > the Lucene side. First, Lucene likely needs to rewrite the live docs of
> > all your segments, and then this might trigger significant merging
> > activity, because Lucene tries to keep the number of deleted docs
> > reasonable so that most disk space is not spent on deleted docs. I
> > can't think of settings that would make it more efficient.
> >
> > If you call deleteDocuments because you are, e.g., deleting data after
> > a given age, it would help to have time-based indices so that you would
> > remove an entire index at once rather than large portions of an index.
> >
> > On Thu, Mar 1, 2018 at 01:20, Stuart Goldberg
> > <sgoldb...@fixflyer.com> wrote:
> >
> > > I call deleteDocuments
> > >
> > > On Feb 28, 2018 8:16 PM, "Adrien Grand" <jpou...@gmail.com> wrote:
> > >
> > > > What do you mean by purging? What methods do you call?
> > > >
> > > > On Wed, Feb 28, 2018 at 19:34, Stuart Goldberg
> > > > <sgoldb...@fixflyer.com> wrote:
> > > >
> > > > > I have a huge Lucene index. On disk it's about 24 GB.
> > > > >
> > > > > I have a purging routine that is supposed to run and purge old
> > > > > docs.
> > > > >
> > > > > There are about 650 million docs in there, and through testing I
> > > > > have determined that about 1/3 of these need to be purged.
> > > > >
> > > > > During the purge, every so often it's apparently doing some
> > > > > flushing and applying deletes. This makes the process appear to
> > > > > hang. I know it's not hanging, but actually doing work, because
> > > > > I have enabled infostream and I am getting messages every so
> > > > > often (every 5 minutes).
> > > > >
> > > > > Is there some trick (index config) I can employ to get this to
> > > > > work faster?
> > > > >
> > > > > Stuart M Goldberg