Thanks, that is the conclusion I came to as well. It was a little naive of me to think that every segment gets replaced on each commit; that would of course be unnecessary and terribly inefficient. De-duplicating with a Set was indeed the fix for me.
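
For anyone who finds this thread later, here is a minimal sketch of the de-duplication, not my exact code. To keep it runnable without Lucene on the classpath, each commit point is stood in by a plain collection of file names; the class name CommitFileUnion, the method uniqueFiles, and the file names in the usage note are made up for illustration. In the real loop you would pass commit.getFileNames() for each IndexCommit returned by snapshotter.getSnapshots().

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: plain collections stand in for Lucene's IndexCommit objects.
final class CommitFileUnion {

    // Returns the union of the file names referenced by all commits.
    // A file shared by several commit points appears exactly once,
    // because Set.addAll ignores elements that are already present.
    static Set<String> uniqueFiles(List<? extends Collection<String>> commits) {
        Set<String> files = new TreeSet<String>(); // sorted, only to make the output stable
        for (Collection<String> commitFiles : commits) {
            files.addAll(commitFiles);
        }
        return files;
    }
}
```

With two commits where the second one reuses the first segment and adds one more, e.g. [segments_1, _0.si, _0.cfs, _0.cfe] and [segments_2, _0.si, _0.cfs, _0.cfe, _1.si, _1.cfs, _1.cfe], the List version collects 11 entries while the Set holds the 8 distinct files.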

On Fri, Mar 21, 2014 at 12:47 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> a commit is actually just the list of segments and their (immutable)
> files. Because the files are not mutable, every commit point can safely
> refer to the same files that are also used by an earlier commit point. In
> your code, you should use a Set<String> instead of a List<String>.
> Depending on how many changes you have between the snapshots/commits, I
> would expect that most of the files overlap. If you only added one document
> to an index and then created a new snapshot, you would see basically the
> same files in both commits, while the newer one has one segment more (one
> with only one document). If you delete documents, the same happens, but
> then you get additional files in one of the earlier segments (delete
> generations), while the basic set of files would be identical.
> If (automatic) segment merging was done between two snapshots, of course
> more files can change, because smaller segments may get combined with other
> ones in newer snapshots.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Vitaly Funstein [mailto:vfunst...@gmail.com]
> > Sent: Friday, March 21, 2014 1:35 AM
> > To: java-user@lucene.apache.org
> > Subject: Segments reusable across commits?
> >
> > I have a usage pattern where I need to package up and store away all
> > files from an index referenced by multiple commit points. To that end, I
> > basically call IndexWriter.commit(), followed by
> > SnapshotDeletionPolicy.snapshot(), followed by something like this:
> >
> > List<String> files = new ArrayList<String>(dir.listAll().length);
> > for (IndexCommit commit: snapshotter.getSnapshots()) {
> >   files.addAll(commit.getFileNames());
> > }
> >
> > As it turns out, this creates duplicates, specifically some .si files
> > appear to be present in multiple commit points. Is this expected, and if
> > so - does this mean that some commits are allowed to reuse segments
> > created by prior commits? I have always thought that each commit creates
> > a new set of segments... I'm using Lucene 4.6.