viliam-durina opened a new pull request, #15683: URL: https://github.com/apache/lucene/pull/15683
If an `IndexWriter` is opened from an `IndexCommit` backed by an open reader (through `IndexWriterConfig.setIndexCommit()`), the reader's `SegmentReader`s are reused and no files are re-read, with two exceptions.

First, the `.fnm` file (field infos) is re-read in `IndexWriter.getFieldNumberMap()`. This is unnecessary, because its contents are already loaded by the reader and can be reused. This PR modifies the `getFieldNumberMap()` method to reuse that information.

Second, the last `segments_N` file and the corresponding `.si` files are re-read twice; we don't address that issue here.

This change is important for our use case because we store the index in a high-latency remote location and have a custom directory implementation that caches the files locally. The cache works in a simple mode: it caches a file when it is opened and releases it when it is closed, so every unnecessary re-open is harmful. Compound files, which we always use, make this much worse, because the whole compound data file is reopened and the `.cfe` file is reloaded.

However, we hope this change is beneficial for Lucene in general, as it avoids re-reading information that is already loaded.
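To illustrate the idea in isolation, here is a minimal sketch in plain Java. It does not use Lucene's internal API: `Segment`, `FnmReader` and `buildFieldNumberMap` are hypothetical stand-ins. Each segment may carry field infos already loaded by an open reader; only segments without a loaded copy fall back to "reading the `.fnm` file", and a counter makes the avoided reads visible. How conflicting field numbers are resolved here (first one wins) is an assumption of the sketch, not a statement about `IndexWriter`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene's API: builds a global field-name -> field-number
// map, reusing field infos an open reader already loaded and only "reading the
// .fnm file" for segments that have no loaded copy.
public class FieldNumberMapSketch {

    /** Stand-in for a segment. {@code loaded} is null when no open reader holds it. */
    static final class Segment {
        final String name;
        final Map<String, Integer> loaded; // field infos already in memory, or null
        final Map<String, Integer> onDisk; // what the segment's .fnm file contains

        Segment(String name, Map<String, Integer> loaded, Map<String, Integer> onDisk) {
            this.name = name;
            this.loaded = loaded;
            this.onDisk = onDisk;
        }
    }

    /** Simulated .fnm reader that counts how many files it had to open. */
    static final class FnmReader {
        int reads = 0;

        Map<String, Integer> read(Segment segment) {
            reads++; // each call stands for opening and parsing one .fnm file
            return segment.onDisk;
        }
    }

    static Map<String, Integer> buildFieldNumberMap(List<Segment> segments, FnmReader fnm) {
        Map<String, Integer> global = new LinkedHashMap<>();
        for (Segment s : segments) {
            // Reuse the reader's copy when present; fall back to the file otherwise.
            Map<String, Integer> infos = (s.loaded != null) ? s.loaded : fnm.read(s);
            for (Map.Entry<String, Integer> e : infos.entrySet()) {
                // In this sketch the first number seen for a field name is kept.
                global.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return global;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = Map.of("id", 0, "body", 1);
        Map<String, Integer> b = Map.of("id", 0, "title", 2);
        // _0 is held by an open reader; _1 is not, so only _1's .fnm is read.
        List<Segment> segments = List.of(
                new Segment("_0", a, a),
                new Segment("_1", null, b));
        FnmReader fnm = new FnmReader();
        Map<String, Integer> numbers = buildFieldNumberMap(segments, fnm);
        System.out.println(numbers.size() + " fields, " + fnm.reads + " .fnm read(s)");
    }
}
```

Running `main` prints `3 fields, 1 .fnm read(s)`: with a remote, open-on-demand cache like the one described above, each avoided read is one fewer file fetch.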
