viliam-durina opened a new pull request, #15683:
URL: https://github.com/apache/lucene/pull/15683

   If an `IndexWriter` is opened using an `IndexCommit` with an opened reader 
(through `IndexWriterConfig.setIndexCommit()`), the reader's `SegmentReader`s 
are reused and no files are re-read, with two exceptions. The first is the 
`.fnm` file (field infos), which is re-read in 
`IndexWriter.getFieldNumberMap()`. This is unnecessary, as its contents are 
already loaded by the reader. This PR modifies the `getFieldNumberMap()` 
method to reuse that information.
   
   The second exception is the last `segments_N` file and the respective `.si` 
files, which are re-read twice; this PR does not address that issue.
   
   This change is important to our use case because we store the index in 
a high-latency remote location and have a custom directory implementation that 
caches the files locally. The cache works in a simple mode: it caches a file 
when it is opened and releases it when the file is closed, so every 
unnecessary file re-opening is harmful. This is greatly aggravated by 
compound files, which we always use, as the whole compound data file is 
reopened and the `.cfe` file is re-loaded. However, we hope this change is 
beneficial for Lucene in general, as it avoids re-reading 
information that is already loaded.
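
   To illustrate why re-opening hurts with such a cache, here is a minimal 
sketch of the "cache on open, release on close" behaviour described above. It 
is not our directory implementation: the class `OpenScopedCache`, its methods, 
the per-file open counter, and the file name `_0.cfs` are all hypothetical and 
chosen only for this example. A counter stands in for the remote download.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a cache that keeps a file only while it is open. */
class OpenScopedCache {
    // Number of currently open handles per file name (assumed bookkeeping).
    private final Map<String, Integer> openCounts = new HashMap<>();
    // Counts simulated high-latency downloads from the remote location.
    private int remoteFetches = 0;

    /** Opens a file; fetches it remotely if no handle currently holds it. */
    void open(String name) {
        int count = openCounts.getOrDefault(name, 0);
        if (count == 0) {
            remoteFetches++; // stands in for downloading the whole file
        }
        openCounts.put(name, count + 1);
    }

    /** Closes one handle; the local copy is released with the last handle. */
    void close(String name) {
        Integer count = openCounts.get(name);
        if (count == null) {
            throw new IllegalStateException("not open: " + name);
        }
        if (count == 1) {
            openCounts.remove(name);
        } else {
            openCounts.put(name, count - 1);
        }
    }

    int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        OpenScopedCache cache = new OpenScopedCache();

        // First consumer opens the compound file, reads it, and closes it.
        cache.open("_0.cfs");
        cache.close("_0.cfs");

        // A later re-open of the same file triggers a second download.
        cache.open("_0.cfs");
        cache.close("_0.cfs");

        System.out.println(cache.remoteFetches()); // prints 2
    }
}
```

   In this model, two sequential open/close cycles on the same file cost two 
downloads, while overlapping opens would cost only one. Avoiding the second 
open altogether, as this PR does for the field infos, removes the extra 
download regardless of how the cache is tuned.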


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

