Re: Questions about Lucene source
Dawid and Adrien, thanks for your responses. Bringing up an old thread here, revisiting this question:

> (so deleted docs == max docs) and call commit. Will/Can this segment still
> exist after commit?

Since I am using Solr (8.11.1), the default deletion policy is SolrDeletionPolicy, which by default retains only the latest commit and deletes the rest. In that case, would a segment be automatically deleted once all of the docs in it have been marked deleted (e.g. via reindexing)? If yes, at what point (commit or merge)?

Thanks,
Rahul

On Fri, Sep 23, 2022 at 9:25 AM Adrien Grand wrote:

> On the 2nd question, we do not plan on leveraging this information to
> figure out the codec: the codec that should be used to read a segment is
> stored separately (also in segment infos).
>
> It is mostly useful for diagnostic purposes. E.g. if we see an interesting
> corruption case where checksums match, we can guess that there is a bug
> somewhere in Lucene in a version that is between this minimum version and
> the version that was used to write the segment.
>
> On Sat, Sep 17, 2022 at 11:07 AM Dawid Weiss wrote:
>
> > > (so deleted docs == max docs) and call commit. Will/Can this segment still
> > > exist after commit?
> >
> > Depends on your merge policy and index deletion policy. You can configure
> > Lucene to keep older commits (and then you'll preserve all historical
> > segments).
> >
> > I don't know the answer to your second question.
> >
> > D.
>
> --
> Adrien
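To make the question concrete, here is a toy Java model of what happens to the segment list when a commit is written. It is a sketch under stated assumptions, not Lucene's code: `ToyCommit`, `Seg` and `nextCommit` are hypothetical names, and the `keepFullyDeleted` flag stands in for `MergePolicy#keepFullyDeletedSegment` in Lucene 8.x. The assumption, which this thread does not confirm and which is worth checking against the 8.11.1 source, is that with a default merge policy a segment whose deleted count equals its maxDoc is dropped when deletes are applied at commit, without waiting for a merge.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only (hypothetical); not Lucene's SegmentInfos/IndexWriter API. */
final class ToyCommit {
    /** A segment with its total doc count and its number of deleted docs. */
    static final class Seg {
        final String name;
        final int maxDoc;
        final int delCount;

        Seg(String name, int maxDoc, int delCount) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.delCount = delCount;
        }

        boolean fullyDeleted() {
            return delCount == maxDoc;
        }
    }

    /**
     * Segments written into the next commit point once deletes are applied.
     * keepFullyDeleted stands in for a merge policy that asks to retain
     * 100%-deleted segments; when false, such segments are dropped here,
     * before any merge runs (an assumption of this model).
     */
    static List<Seg> nextCommit(List<Seg> current, boolean keepFullyDeleted) {
        List<Seg> next = new ArrayList<>();
        for (Seg s : current) {
            if (keepFullyDeleted || !s.fullyDeleted()) {
                next.add(s);
            }
        }
        return next;
    }
}
```

With the example from the original question (`_0` holds the 4 overwritten docs, all deleted; `_1` holds the 4 re-indexed docs), `nextCommit(segs, false)` returns only `_1`. Under a keep-latest policy such as the SolrDeletionPolicy default described above, the older commit, the only one still referencing `_0`, is then removed, so `_0`'s files would go with it.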
Re: Questions about Lucene source
On the 2nd question, we do not plan on leveraging this information to figure out the codec: the codec that should be used to read a segment is stored separately (also in segment infos).

It is mostly useful for diagnostic purposes. E.g. if we see an interesting corruption case where checksums match, we can guess that there is a bug somewhere in Lucene in a version that is between this minimum version and the version that was used to write the segment.

On Sat, Sep 17, 2022 at 11:07 AM Dawid Weiss wrote:

> > (so deleted docs == max docs) and call commit. Will/Can this segment still
> > exist after commit?
>
> Depends on your merge policy and index deletion policy. You can configure
> Lucene to keep older commits (and then you'll preserve all historical
> segments).
>
> I don't know the answer to your second question.
>
> D.

--
Adrien
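The diagnostic reasoning above amounts to narrowing a suspected bug down to a version range. A minimal sketch, assuming a hypothetical helper class `SuspectReleases` (not part of Lucene) and an illustrative list of release numbers: given a segment's minVersion and the version that wrote it, every release in that closed range is a candidate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical triage helper; not part of Lucene. */
final class SuspectReleases {
    // Versions encoded as major * 10_000 + minor * 100 + bugfix, purely for brevity.
    static int v(int major, int minor, int bugfix) {
        return major * 10_000 + minor * 100 + bugfix;
    }

    /**
     * Candidate releases that may have written or merged data into a segment:
     * every release in the closed range [minVersion, writerVersion].
     */
    static List<Integer> suspects(List<Integer> releases, int minVersion, int writerVersion) {
        List<Integer> out = new ArrayList<>();
        for (int r : releases) {
            if (r >= minVersion && r <= writerVersion) {
                out.add(r);
            }
        }
        return out;
    }
}
```

For a segment with minVersion 7.7.3 written by 8.11.1, releases 7.0.0 and 9.0.0 fall outside the range, so only 7.7.3, 8.0.0 and 8.11.1 (from the illustrative list) remain as suspects.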
Re: Questions about Lucene source
> (so deleted docs == max docs) and call commit. Will/Can this segment still
> exist after commit?

Depends on your merge policy and index deletion policy. You can configure Lucene to keep older commits (and then you'll preserve all historical segments).

I don't know the answer to your second question.

D.
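The point about keeping older commits can be modeled in a few lines of Java. This is a toy model, not Lucene's `IndexFileDeleter` (which reference-counts individual files), and `ToyRetention`/`survivingSegments` are hypothetical names: a segment's files survive as long as at least one retained commit point still references that segment. `keepLast = 1` roughly corresponds to Lucene's default `KeepOnlyLastCommitDeletionPolicy`, while a very large `keepLast` mimics a policy such as `NoDeletionPolicy` that keeps every commit.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of commit retention (hypothetical); not Lucene's IndexFileDeleter. */
final class ToyRetention {
    /**
     * Segments whose files survive when only the newest keepLast commits are kept.
     * Each commit is the list of segment names it references; a segment survives
     * iff at least one retained commit still references it.
     */
    static Set<String> survivingSegments(List<List<String>> commits, int keepLast) {
        int from = Math.max(0, commits.size() - keepLast);
        Set<String> live = new TreeSet<>();
        for (List<String> commit : commits.subList(from, commits.size())) {
            live.addAll(commit);
        }
        return live;
    }
}
```

For example, if commit 1 references `_0` and `_1`, and those are then merged into `_2` for commit 2, `survivingSegments(commits, 1)` yields only `_2`, whereas keeping every commit preserves `_0`, `_1` and `_2`.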
Re: Questions about Lucene source
Following up on my questions since they didn't get much love the first time. Any input is greatly appreciated!

Thanks,
Rahul

On Wed, Sep 14, 2022 at 3:58 PM Rahul Goswami wrote:

> Hello,
>
> I was going through some parts of the Lucene source and had some questions:
>
> 1) Can Lucene have 0-document segments? Or will they always be purged
> (either by TMP (TieredMergePolicy) or otherwise) on a commit?
> E.g.: A segment has 4 docs, and I make a /update call to overwrite all 4
> docs (so deleted docs == max docs) and call commit. Will/Can this segment
> still exist after commit?
>
> 2) Starting with Lucene 7.0, each segment also stores a "minVersion", which
> tracks the minimum Lucene version that contributed docs to this segment.
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfo.java#L83
>
> Reading through LUCENE-7756, I see that one reason to have minVersion was
> to keep the full version of the original index stored somewhere, since a
> change was made to store only the major version at the index level (in
> SegmentInfos):
>
> https://issues.apache.org/jira/browse/LUCENE-7756?focusedCommentId=15945863&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15945863
>
> Checking the code, I found that it is consulted when checking for signs of
> index corruption, but that was pretty much it. Curious if there is any other
> intended/planned use for minVersion? E.g. some choice of codec at read time
> based on this field, or anything else?
>
> Thanks,
> Rahul
Questions about Lucene source
Hello,

I was going through some parts of the Lucene source and had some questions:

1) Can Lucene have 0-document segments? Or will they always be purged (either by TMP (TieredMergePolicy) or otherwise) on a commit?
E.g.: A segment has 4 docs, and I make a /update call to overwrite all 4 docs (so deleted docs == max docs) and call commit. Will/Can this segment still exist after commit?

2) Starting with Lucene 7.0, each segment also stores a "minVersion", which tracks the minimum Lucene version that contributed docs to this segment.
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfo.java#L83

Reading through LUCENE-7756, I see that one reason to have minVersion was to keep the full version of the original index stored somewhere, since a change was made to store only the major version at the index level (in SegmentInfos):
https://issues.apache.org/jira/browse/LUCENE-7756?focusedCommentId=15945863&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15945863

Checking the code, I found that it is consulted when checking for signs of index corruption, but that was pretty much it. Curious if there is any other intended/planned use for minVersion? E.g. some choice of codec at read time based on this field, or anything else?

Thanks,
Rahul
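A toy model of the minVersion bookkeeping described in question 2, as a sketch only: `ToySeg` is hypothetical (not `org.apache.lucene.index.SegmentInfo`), versions are packed into an `int` for brevity, and edge cases such as segments that predate the field are ignored. In this model a freshly flushed segment's minVersion is its writer's version, while a merged segment is written by the current version but inherits the oldest minVersion among its sources. That is how the full version of older data can survive even though SegmentInfos records only the major version.

```java
import java.util.List;

/** Toy segment metadata (hypothetical); not org.apache.lucene.index.SegmentInfo. */
final class ToySeg {
    // Versions encoded as major * 10_000 + minor * 100 + bugfix, purely for brevity.
    final int version;     // version of the writer that produced this segment
    final int minVersion;  // oldest version that contributed docs to this segment

    private ToySeg(int version, int minVersion) {
        this.version = version;
        this.minVersion = minVersion;
    }

    static int v(int major, int minor, int bugfix) {
        return major * 10_000 + minor * 100 + bugfix;
    }

    /** A newly flushed segment: only the current writer contributed docs. */
    static ToySeg flushed(int writerVersion) {
        return new ToySeg(writerVersion, writerVersion);
    }

    /** A merged segment: written by the current writer, minVersion inherited from the oldest source. */
    static ToySeg merged(int writerVersion, List<ToySeg> sources) {
        int min = writerVersion;
        for (ToySeg s : sources) {
            min = Math.min(min, s.minVersion);
        }
        return new ToySeg(writerVersion, min);
    }
}
```

For instance, merging a segment flushed by 7.7.3 with one flushed by 8.11.1 under an 8.11.1 writer gives version 8.11.1 and minVersion 7.7.3.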