Eran,

The transactional functionality can roll back changes to an index should something happen during a commit. Refer to the methods PrepareCommit and Rollback. You would have to implement your own logic to re-process any changes that were rolled back.

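Below is a minimal sketch of the re-processing logic described above, written in Python for illustration only. `CommitFailed`, `InMemoryIndex` and `BatchingIndexer` are hypothetical names and are not part of the Lucene.Net API. The stand-in index only mimics the all-or-nothing behaviour of a prepare/commit/rollback cycle. The pattern works like this: keep each document's original data until the commit that contains it succeeds, report the document as durable only at that point, and after a rollback re-add the originals you kept. The batching follows the advice, given elsewhere in this thread, to commit every 100 or 1000 documents instead of after each one.

```python
from typing import Callable, List, Optional


class CommitFailed(Exception):
    """Raised by the stand-in index when a commit cannot complete."""


class InMemoryIndex:
    """Hypothetical stand-in for an index writer (NOT the Lucene.Net API)."""

    def __init__(self) -> None:
        self.committed: List[str] = []               # what a reopened reader would see
        self._buffer: List[str] = []                 # added since the last commit
        self._prepared: Optional[List[str]] = None   # staged by prepare_commit()
        self.fail_next_prepare = False               # test hook: simulate a crash/error

    def add_document(self, doc: str) -> None:
        self._buffer.append(doc)

    def prepare_commit(self) -> None:
        if self.fail_next_prepare:
            self.fail_next_prepare = False
            raise CommitFailed("simulated failure while preparing the commit")
        self._prepared, self._buffer = self._buffer, []

    def commit(self) -> None:
        if self._prepared is None:   # commit() without prepare_commit() does both phases
            self.prepare_commit()
        self.committed.extend(self._prepared)
        self._prepared = None

    def rollback(self) -> None:
        # Discard everything added since the last successful commit.
        self._buffer = []
        self._prepared = None


class BatchingIndexer:
    """Commits every `batch_size` documents and keeps originals until committed."""

    def __init__(self, index: InMemoryIndex, batch_size: int,
                 on_durable: Callable[[List[str]], None]) -> None:
        self.index = index
        self.batch_size = batch_size
        self.on_durable = on_durable   # e.g. mark the rows as indexed in your DB
        self.pending: List[str] = []   # originals not yet covered by a commit

    def add(self, doc: str) -> None:
        self.index.add_document(doc)
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        """Try to commit the pending batch; return True on success."""
        if not self.pending:
            return True
        try:
            self.index.prepare_commit()
            self.index.commit()
        except CommitFailed:
            # Roll back the partial batch and re-add the kept originals,
            # so that the next flush() retries them.
            self.index.rollback()
            for doc in self.pending:
                self.index.add_document(doc)
            return False
        self.on_durable(list(self.pending))   # only now is it safe to drop originals
        self.pending.clear()
        return True


if __name__ == "__main__":
    durable: List[str] = []
    index = InMemoryIndex()
    indexer = BatchingIndexer(index, batch_size=3, on_durable=durable.extend)
    index.fail_next_prepare = True
    for i in range(3):
        indexer.add(f"doc{i}")       # the third add triggers a flush, which fails
    print(index.committed, durable)  # nothing committed, nothing marked durable
    indexer.flush()                  # the retry succeeds
    print(index.committed, durable)
```

The sketch does not model a real writer's lifecycle. Check the IndexWriter documentation for your version to see exactly how PrepareCommit, Commit and Rollback behave, including whether the writer can still be used after a rollback.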
Michael

-----Original Message-----
From: Eran Sevi [mailto:erans...@gmail.com]
Sent: Tuesday, November 17, 2009 9:30 AM
To: lucene-net-user@incubator.apache.org
Subject: Re: IndexWriter is slow when reader is open

Thanks, Michael, for the detailed explanation. It's much clearer now.

By "transactional capabilities", do you mean that if something happens in the middle of a commit, it is guaranteed that either all the data added since the last commit is in the index or all of it is discarded?

We have a steady stream of documents coming in for indexing (unfortunately only one at a time, but at a rate of up to 50 per second), and I hoped I could guarantee that when the add method returns, the document is secured on disk. We keep a status for each document in our DB and want to discard the original data.

We'll just have to hang on to the original data until each commit has finished and, in case of a crash or error, reindex it.

Eran.

On Tue, Nov 17, 2009 at 5:59 PM, Michael Garski <mgar...@myspace-inc.com> wrote:

> Eran,
>
> Make no mistake, the poor performance you are experiencing is due to calling commit on every document addition and not due to internal 'coding by exception'. There are transactional capabilities in Lucene that will ensure that your documents are added and persisted to disk. Check out the IndexWriter documentation for more information.
>
> The only 'connection' between the reader and the writer is the set of files on disk. The writer writes them once and they are never updated. The reader holds a reference to each file so that it is not deleted out from underneath it, because the reader still needs to read from it to perform searches.
>
> During a commit, all of your changes are written to disk and any necessary segment merges take place. The older segments that were merged together become 'orphans' that are no longer referenced by the segments file, and they are cleaned up during the final stage of the commit process, after all of the new segments have been written. At that point an attempt is made to delete the older segments that are no longer needed, which will fail while your reader still has them open. It fails gracefully: the file names are persisted internally so that deletion can be attempted again later, hopefully after the reader has been reopened and no longer holds a reference to the orphaned files.
>
> I suggest you step through the commit process in a debugger or use a profiler to demonstrate this issue.
>
> Michael
>
> -----Original Message-----
> From: Eran Sevi [mailto:erans...@gmail.com]
> Sent: Tue 11/17/2009 4:55 AM
> To: lucene-net-user@incubator.apache.org
> Subject: Re: IndexWriter is slow when reader is open
>
> Michael,
> Thanks for the answer.
>
> I thought the reader was less connected to the writer. Basically, what you're saying is that as long as at least one reader is open, exceptions are thrown when trying to commit changes (or, more accurately, when trying to merge segments)?
> Can you point me to the place in the source code where that happens?
>
> What happens to the new documents that were added? Are they still saved in other segments?
>
> It's very important to us to make sure every document is persisted in the index, so working in batches could be a problem. But if there's a way to save each added document to disk without merging its segment with older segments, this could solve our problem. And since the reader can't see the new segments anyway until it's reopened, I don't see a problem with continuing to write documents to new segments without performing a merge.
> I'll try to change the merge policy/scheduler and see what happens.
>
> Anyway, coding by exception is a bad practice. Since we're following the Java versions, I guess it'll take time to be able to change that.
>
> Eran.
>
> On Mon, Nov 16, 2009 at 8:56 PM, Michael Garski <mgar...@myspace-inc.com> wrote:
>
> > Eran,
> >
> > The root cause of the issue is calling commit after every document addition while having a reader open. Calls to commit should be batched up. We frequently use batches of 100 or 1000 documents between commits.
> >
> > This is by design within Lucene. Adding documents will cause segments to merge, and the writer will then delete the older segments that have been merged together to create a new one. With an open reader, however, the writer will not be able to delete the older segment, because of a file lock held by the reader. On the call to delete the file, an exception is thrown and swallowed internally, and the name of the file that the delete was attempted on is added to a list of files that can be deleted on a later call.
> >
> > I suggest you refrain from calling commit so often, as that is why you are experiencing performance issues.
> >
> > Michael
> >
> > -----Original Message-----
> > From: Eran Sevi [mailto:erans...@gmail.com]
> > Sent: Mon 11/16/2009 5:07 AM
> > To: lucene-net-user@incubator.apache.org
> > Subject: Re: IndexWriter is slow when reader is open
> >
> > I've tried opening the reader in read-only mode, and it looks like it's even worse right now.
> >
> > I must admit that we're abusing the indexing a bit by committing after each document addition. Still, when there's no reader open, each document is indexed in about 30-50 ms, and when there's a read-only reader open, each document is indexed in about 150-500 ms. Why should an open reader affect the commit process so deeply?
> >
> > I wonder whether anyone else has encountered this phenomenon before.
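The deferred-deletion behaviour described above can be modelled in a few lines: a failed delete is swallowed, and the file name is kept for a later attempt. This is a conceptual Python sketch, not Lucene's actual code. `DeferredDeleter`, the `open_files` set (a stand-in for files a reader holds open) and the `_0.cfs`-style file names are illustrative assumptions. `PermissionError` stands in for whatever error the platform raises when a file is in use.

```python
from typing import Iterable, List, Set


class DeferredDeleter:
    """Conceptual model of retry-later deletion of orphaned segment files."""

    def __init__(self, open_files: Set[str]) -> None:
        self.open_files = open_files   # names currently held open by a reader
        self.deleted: List[str] = []   # files actually removed
        self.pending: List[str] = []   # failed deletes, to be retried later

    def _try_delete(self, name: str) -> None:
        if name in self.open_files:
            # Stand-in for the error raised when another handle holds the file.
            raise PermissionError(f"{name} is in use")
        self.deleted.append(name)

    def delete_orphans(self, names: Iterable[str]) -> None:
        """Called at the end of each commit with the newly orphaned files."""
        still_pending: List[str] = []
        for name in self.pending + list(names):   # retry earlier failures first
            try:
                self._try_delete(name)
            except PermissionError:
                still_pending.append(name)        # swallowed; remembered for next time
        self.pending = still_pending


if __name__ == "__main__":
    reader_files = {"_0.cfs", "_1.cfs"}           # an open reader holds these
    deleter = DeferredDeleter(reader_files)
    deleter.delete_orphans(["_0.cfs", "_1.cfs"])  # merged away, but still in use
    print(deleter.pending)                        # both will be retried later
    reader_files.clear()                          # the reader is reopened or closed
    deleter.delete_orphans([])                    # the next commit retries them
    print(deleter.deleted, deleter.pending)
```

In this model, every commit retries every file that is still held open, so the retry work keeps growing for as long as the reader stays open. Whether that retry cost or the per-document commit itself dominates the slowdown in Lucene.Net is what the debugger or profiler run suggested in this thread would show.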
> >
> > On Sat, Nov 14, 2009 at 8:27 PM, Matt Honeycutt <mbhoneyc...@gmail.com> wrote:
> >
> > > 2.4 does indeed support read-only mode. I don't know how much it will help, but I would definitely try it.
> > >
> > > On 11/14/09, Eran Sevi <erans...@gmail.com> wrote:
> > > > I'm still using version 2.4, so I think there's still no read-only mode.
> > > > Is there no other way to prevent this slowdown in previous versions?
> > > >
> > > > Eran.
> > > >
> > > > On Thu, Nov 12, 2009 at 8:16 PM, Michael Garski <mgar...@myspace-inc.com> wrote:
> > > >
> > > > > Eran,
> > > > >
> > > > > What version of Lucene are you using? Are you opening the IndexReader in read-only mode?
> > > > >
> > > > > Michael
> > > > >
> > > > > -----Original Message-----
> > > > > From: Eran Sevi [mailto:erans...@gmail.com]
> > > > > Sent: Thursday, November 12, 2009 9:06 AM
> > > > > To: lucene-net-user@incubator.apache.org
> > > > > Subject: IndexWriter is slow when reader is open
> > > > >
> > > > > Hi,
> > > > > I'm using Lucene.Net 2.4 and I just noticed that when I index documents while there's at least one IndexReader open on that index (even without doing anything with it), indexing is slower by a factor of 3 to 5. When I close the reader, the indexing speed goes back to normal.
> > > > > I'm not doing any deletes, only adds.
> > > > >
> > > > > My index is going to be updated regularly, and there's going to be a reader/searcher in use almost all the time, so this might be a big problem for me.
> > > > >
> > > > > Does anyone know whether this is normal behavior? Why does it happen, and how can I avoid such a big loss in performance?
> > > > >
> > > > > Thanks,
> > > > > Eran.
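The figures reported in this thread also show why per-document commits struggle even without a reader. The rough sketch below multiplies the commit rate by the per-document times Eran measured (30-50 ms without a reader, 150-500 ms with one) at his peak rate of 50 documents per second. It relies on two simplifying assumptions: commits run one after another, and a batched commit costs about as much as a single-document one. Larger batches write more data, so treat the batched figure as an optimistic estimate, not a prediction.

```python
def commit_load(docs_per_sec: float, ms_per_commit: float, docs_per_commit: int = 1) -> float:
    """Seconds of commit work needed per wall-clock second.

    Assumes commits run serially and each takes `ms_per_commit` ms.
    A result above 1.0 means the indexer cannot keep up with the input.
    """
    commits_per_sec = docs_per_sec / docs_per_commit
    return commits_per_sec * ms_per_commit / 1000.0


if __name__ == "__main__":
    print(commit_load(50, 30))        # no reader, best case: 1.5 (already above 1)
    print(commit_load(50, 500))       # reader open, worst case: 25.0
    print(commit_load(50, 500, 100))  # batches of 100 at the worst-case cost: 0.25
```

By this rough measure, committing after every document cannot sustain 50 documents per second even with no reader open. That is consistent with the advice in this thread to batch commits.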