Stefan, Mike,
Appreciate your responses! I spent some time analyzing your inputs and
going further down the rabbit hole.

I looked at the IndexRearranger code you referenced where it tries to drop
the segment. I see that it eventually gets handled via
IndexFileDeleter.checkpoint() through file refCounts (a file is deleted
once its refCount reaches 0). The same method also gets called as part of
the IndexWriter.commit() flow (inside finishCommit()). So in an ideal
scenario a commit should have taken care of dropping the segment files,
which tells me the refCounts for those files are not reaching 0. I strongly
suspect the reindexing process running on the same index inside the same
JVM has something to do with it.
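
To make the refCount reasoning above concrete, here is a toy sketch in plain
Java. It is not Lucene's IndexFileDeleter: ToyFileDeleter, its methods, and
the file names are all made up for illustration, and whether an open reader
really takes a reference like this is an assumption on my part (it is the
question I ask further down). It only shows why a file would survive a
commit while anything still holds a reference to it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: NOT Lucene's IndexFileDeleter. All names are hypothetical.
class ToyFileDeleter {
    // file name -> number of holders (commit points, readers, ...)
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Add one reference to every file in the list.
    void incRef(List<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    // Drop one reference per file; a file goes away only when its count hits 0.
    void decRef(List<String> files) {
        for (String f : files) {
            Integer count = refCounts.get(f);
            if (count == null) {
                throw new IllegalStateException("no reference held on " + f);
            }
            if (count == 1) {
                refCounts.remove(f); // refCount reached 0: "delete" the file
            } else {
                refCounts.put(f, count - 1);
            }
        }
    }

    // Files that still have at least one holder, in sorted order.
    Set<String> filesOnDisk() {
        return new TreeSet<>(refCounts.keySet());
    }
}

class ToyFileDeleterDemo {
    public static void main(String[] args) {
        ToyFileDeleter deleter = new ToyFileDeleter();
        List<String> oldSegment = List.of("_0.cfs", "_0.si");
        List<String> newSegment = List.of("_1.cfs", "_1.si");

        deleter.incRef(oldSegment); // commit 1 references the old segment
        deleter.incRef(oldSegment); // assumption: an open reader also holds it
        deleter.incRef(newSegment); // commit 2 references only the new segment
        deleter.decRef(oldSegment); // commit 1 is superseded by commit 2

        // The old files are still there because the reader's reference remains.
        System.out.println(deleter.filesOnDisk()); // [_0.cfs, _0.si, _1.cfs, _1.si]

        deleter.decRef(oldSegment); // the reader is closed
        System.out.println(deleter.filesOnDisk()); // [_1.cfs, _1.si]
    }
}
```

In this model the old segment's files disappear only after the last holder
lets go, which matches what I see after a Solr restart.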

Thanks for the caution on Approach 2... good to at least be able to
continue on one train of thought. As mentioned in my response to Stefan,
the reindexing is going on *inside* the Solr JVM as an asynchronous
thread and not as a separate process. So I believe the open reader you are
alluding to might be the one I am opening through
(?). However, looking at the code, I am seeing IndexFileDeleter.incRef()
only on the files in SegmentCommitInfos.

Does an incRef() also happen when an IndexReader is opened?

Note: The index is a mix of 7.x and 8.x segments (on Solr 8.x). By extending
TMP and overriding findMerges() I am preventing 7.x segments from
participating in merges, and the code only reindexes these 7.x segments
into the same index, segment-by-segment.
In the current tests I am performing, there are no parallel search or
indexing threads through an external request. The reindexing is the only
process interacting with the index. The goal is to eventually have this
running alongside any parallel indexing/search requests on the index.
Also, as noted earlier, by inspecting the SegmentInfos, I can see the 7.x
segment progressively reducing, but the files never get cleared.
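
For anyone following along, the merge filtering described in the note above
boils down to something like the following. This is a toy sketch in plain
Java, not my actual TMP subclass and not Lucene's API: ToySegment,
ToyMergeFilter, and the version field are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a segment; not a Lucene class.
class ToySegment {
    final String name;
    final int createdMajorVersion;

    ToySegment(String name, int createdMajorVersion) {
        this.name = name;
        this.createdMajorVersion = createdMajorVersion;
    }
}

class ToyMergeFilter {
    // Keep only segments written by minMajor or later, so that older
    // segments are never offered to the merge selection step.
    static List<ToySegment> mergeCandidates(List<ToySegment> all, int minMajor) {
        List<ToySegment> eligible = new ArrayList<>();
        for (ToySegment s : all) {
            if (s.createdMajorVersion >= minMajor) {
                eligible.add(s);
            }
        }
        return eligible;
    }
}
```

With minMajor set to 8, the 7.x segments are left alone by merges and are
only ever touched by the reindexing pass.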

If it is my reader that is throwing off the refCount for Solr, what could
be another way of reading the index without bloating it up with 0 doc

I will also try floating this on the Solr list to get answers to some of
the questions you pose around Solr's handling of readers.


On Thu, Aug 31, 2023 at 6:48 AM Michael McCandless <> wrote:

> Hi Rahul,
> Please do not pursue Approach 2 :)  ReadersAndUpdates.release is not
> something the application should be calling.  This path can only lead to
> pain.
> It sounds to me like something in Solr is holding an old reader (maybe the
> last commit point, or reader prior to the refresh after you re-indexed all
> docs in a given now 100% deleted segment) open.
> Does Solr keep old readers open, older than the most recent commit?  Do
> you have queries in flight that might be holding the old reader open?
> Given that your small by-hand test case (3 docs) correctly showed the 100%
> deleted segment being reclaimed after the soft commit interval or a manual
> hard commit, something must be different in the larger use case that is
> causing Solr to keep a still old reader open.  Is there any logging you can
> enable to understand Solr's handling of its IndexReaders' lifecycle?
> Mike McCandless
> On Mon, Aug 28, 2023 at 10:20 PM Rahul Goswami <>
> wrote:
>> Hello,
>> I am trying to execute a program to read documents segment-by-segment and
>> reindex to the same index. I am reading using Lucene apis and indexing
>> using solr api (in a core that is currently loaded).
>> What I am observing is that even after a segment has been fully processed
>> and an autoCommit (as well as autoSoftCommit) has kicked in, the segment
>> with 0 live docs gets left behind. *Upon Solr restart, the segment does
>> get cleared successfully.*
>> I tried to replicate the same thing without the code by indexing 3 docs
>> on an empty test core, and then reindexing the same docs. The older
>> segment gets deleted as soon as the softCommit interval hits or an
>> explicit commit=true is called.
>> Here are the two approaches that I have tried. Approach 2 is inspired by
>> the merge logic of accessing segments in case opening a DirectoryReader
>> (Approach 1) externally is causing this issue.
>> But both approaches leave undeleted segments behind until I restart Solr
>> and load the core again. What am I missing? I don't have any more brain
>> cells left to fry on this!
>> Approach 1:
>> =========
>> try (FSDirectory dir =;
>>                     IndexReader reader = {
>>                 for (LeafReaderContext lrc : reader.leaves()) {
>>                        // read live docs from each leaf, create a
>>                        // SolrInputDocument out of Document and index using Solr api
>>                 }
>> }catch(Exception e){
>> }
>> Approach 2:
>> ==========
>> ReadersAndUpdates rld = null;
>> SegmentReader segmentReader = null;
>> RefCounted<IndexWriter> iwRef =
>>     core.getSolrCoreState().getIndexWriter(core);
>> IndexWriter iw = iwRef.get();
>> try {
>>   for (SegmentCommitInfo sci : segmentInfos) {
>>     rld = iw.getPooledInstance(sci, true);
>>     segmentReader = rld.getReader(IOContext.READ);
>>     // process all live docs similar to above using the segmentReader
>>     rld.release(segmentReader);
>>     iw.release(rld);
>>   }
>> } finally {
>>   if (iwRef != null) {
>>     iwRef.decref();
>>   }
>> }
>> Help would be much appreciated!
>> Thanks,
>> Rahul
