Thanks, that is the conclusion I came to as well; it was a little naive of
me to think all the segments always get replaced on each commit, as that of
course would be unnecessary and terribly inefficient. De-duplication using
a Set was indeed the fix for me.
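
A minimal sketch of that fix, using plain collections in place of the Lucene
objects so that it runs standalone. CommitFiles and uniqueFiles are
hypothetical names, and the file names are made up for illustration; in my
real code each inner collection comes from IndexCommit.getFileNames() for the
commits returned by snapshotter.getSnapshots().

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene.
final class CommitFiles {

    // Merges the file names of several commits, keeping each name only once.
    // LinkedHashSet preserves first-seen order, which keeps the output stable.
    static Set<String> uniqueFiles(Collection<? extends Collection<String>> perCommitFiles) {
        Set<String> files = new LinkedHashSet<>();
        for (Collection<String> commitFiles : perCommitFiles) {
            files.addAll(commitFiles); // names already present are ignored
        }
        return files;
    }

    public static void main(String[] args) {
        // Two commits that share the files of segment _0 (illustrative names).
        List<String> first = Arrays.asList("segments_1", "_0.si", "_0.cfs");
        List<String> second = Arrays.asList("segments_2", "_0.si", "_0.cfs", "_1.si", "_1.cfs");

        Set<String> files = uniqueFiles(Arrays.asList(first, second));
        System.out.println(files.size() + " unique files: " + files);
    }
}
```

With a List, the shared files of segment _0 would be packaged twice; the Set
keeps one entry per file name regardless of how many commits reference it.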


On Fri, Mar 21, 2014 at 12:47 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> a commit is actually just the list of segments and their (immutable)
> files. Because the files are not mutable, every commit point can safely
> refer to the same files which are also used by an earlier commit point. In
> your code, you should use a Set<String> instead of a List<String>.
> Depending on how many changes you have between the snapshots/commits, I
> would expect that most of the files overlap. If you only added one document
> to an index and then created a new snapshot, you would see basically the
> same files in both snapshots, while the newer one has one segment more (one
> with only one document). If you delete documents, the same happens, but
> then you get additional files in one of the earlier segments (delete
> generations), but the basic set of files would be identical.
> If (automatic) segment merging was done between two snapshots, of course
> more files can change, because smaller segments may have been combined with
> other ones in newer snapshots.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Vitaly Funstein [mailto:vfunst...@gmail.com]
> > Sent: Friday, March 21, 2014 1:35 AM
> > To: java-user@lucene.apache.org
> > Subject: Segments reusable across commits?
> >
> > I have a usage pattern where I need to package up and store away all files
> > from an index referenced by multiple commit points. To that end, I basically
> > call IndexWriter.commit(), followed by SnapshotDeletionPolicy.snapshot(),
> > followed by something like this:
> >
> >       List<String> files = new ArrayList<String>(dir.listAll().length);
> >       for (IndexCommit commit: snapshotter.getSnapshots()) {
> >          files.addAll(commit.getFileNames());
> >       }
> >
> > As it turns out, this creates duplicates, specifically some .si files appear
> > to be present in multiple commit points. Is this expected, and if so
> > - does this mean that some commits are allowed to reuse segments created
> > by prior commits? I have always thought that each commit creates a new set
> > of segments... I'm using Lucene 4.6.
>
>
