Eran,

The transactional functionality can roll back changes to an index should
something go wrong during a commit.  Refer to the methods PrepareCommit &
Rollback.  You would have to implement your own logic to re-process any
changes that were rolled back.
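
For illustration, the re-processing logic might look like the following
minimal Python sketch. FakeWriter and commit_batch are hypothetical names,
not part of Lucene.Net; the three methods only mimic the roles of
PrepareCommit, Commit and Rollback, and the failure is simulated.

```python
class FakeWriter:
    """Hypothetical stand-in for an index writer; NOT the Lucene.Net API."""

    def __init__(self, fail_on_prepare=False):
        self.fail_on_prepare = fail_on_prepare
        self.buffered = []    # documents added since the last commit
        self.committed = []   # documents made durable by a successful commit

    def add_document(self, doc):
        self.buffered.append(doc)

    def prepare_commit(self):
        # Phase one: the step that is assumed to be able to fail.
        if self.fail_on_prepare:
            raise OSError("simulated failure during prepare")

    def commit(self):
        # Phase two: make the prepared changes part of the index.
        self.committed.extend(self.buffered)
        self.buffered = []

    def rollback(self):
        # Discard everything added since the last commit.
        self.buffered = []


def commit_batch(writer, docs):
    """Add docs and commit them; return the docs that must be re-processed."""
    for doc in docs:
        writer.add_document(doc)
    try:
        writer.prepare_commit()
        writer.commit()
        return []
    except OSError:
        writer.rollback()
        return list(docs)
```

With a real writer, the list returned by commit_batch would be fed back
into the indexing queue.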

Michael

-----Original Message-----
From: Eran Sevi [mailto:erans...@gmail.com] 
Sent: Tuesday, November 17, 2009 9:30 AM
To: lucene-net-user@incubator.apache.org
Subject: Re: IndexWriter is slow when reader is open

Thanks, Michael, for the detailed explanation. It's much clearer now.

By "transactional capabilities" do you mean that if something happens in
the middle of a commit, it is guaranteed that either all the data added
since the last commit is in the index or all of it is discarded?

We have a steady stream of documents for indexing coming in (unfortunately
only one at a time, but at a rate of up to 50 per second) and I hoped I
could guarantee that when the add method returns, the document is secured
on disk. We keep a status for each document in our DB and want to discard
the original data.

We'll just have to hang on to the original data until each commit has
finished and, in case of a crash or error, reindex the original data.
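
That plan could be sketched roughly as follows, in Python for brevity.
BatchTracker and its commit callable are hypothetical; the tracker only
remembers which originals are still uncommitted, and the caller is assumed
to add each document to the writer itself. The batch size of 100 is taken
from the batching suggested earlier in the thread.

```python
class BatchTracker:
    """Hypothetical helper: keeps originals until the commit covering them succeeds."""

    def __init__(self, commit, batch_size=100):
        self.commit = commit          # assumed callable wrapping the writer's commit
        self.batch_size = batch_size
        self.pending = {}             # doc_id -> original data, kept until committed

    def add(self, doc_id, data):
        """Record a document as pending; return ids whose originals may be discarded."""
        self.pending[doc_id] = data
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return []

    def flush(self):
        # If commit raises, pending is left untouched so the data can be reindexed.
        self.commit()
        done = list(self.pending)
        self.pending.clear()
        return done
```

At the peak rate of 50 documents per second, a batch of 100 fills in about
two seconds, so originals would be held for roughly that long plus the
commit time.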

Eran.

On Tue, Nov 17, 2009 at 5:59 PM, Michael Garski
<mgar...@myspace-inc.com> wrote:

> Eran,
>
> Make no mistake, the poor performance you are experiencing is due to
> calling commit on every document addition and not due to internal
> 'coding by exception'.  There are transactional capabilities of Lucene
> that will ensure that your documents are added and persisted to disk.
> Check out the IndexWriter documentation for more information.
>
> The only 'connection' between the reader and the writer is the files on
> disk.  The writer writes them once and never updates them, and the
> reader holds a reference to each file so that it is not deleted out from
> underneath it, since the reader still needs to read from it to perform
> searches.
>
> During a commit, all of your changes are written to disk and any
> necessary segment merges take place.  The merges leave the older
> segments that were merged together as 'orphans' that are no longer
> referenced by the segments file; these are cleaned up during the final
> stage of the commit process, after all of the new segments have been
> written.  The attempt to clean up those older segments will fail, as
> your reader still has them open.  It fails gracefully, in that the file
> names are persisted internally so the delete can be attempted again
> later, hopefully after the reader has been reopened and a reference to
> the orphaned files is no longer being held.
>
> I suggest you step through the commit process in a debugger or use a
> profiler to demonstrate this issue.
>
> Michael
>
>
>
> -----Original Message-----
> From: Eran Sevi [mailto:erans...@gmail.com]
> Sent: Tue 11/17/2009 4:55 AM
> To: lucene-net-user@incubator.apache.org
> Subject: Re: IndexWriter is slow when reader is open
>
> Michael,
> Thanks for the answer.
>
> I thought the reader was less connected to the writer. Basically what
> you're saying is that as long as at least one reader is open, exceptions
> are thrown when trying to commit changes (or more accurately, when
> trying to merge segments)?
> Can you point me to the place in the source code where that happens?
>
> What happens to the new documents that were added? Are they still saved
> in other segments?
>
> It's very important to us to make sure every document is persisted in
> the index, so working in batches could be a problem.
> But if there's a way to save each added document to disk without merging
> the segment with older segments, this can solve our problem. And since
> the reader can't see the new segments anyway until it's reopened, I
> don't see a problem continuing to write documents to new segments
> without performing a merge. I'll try to change the merge
> policy/scheduler and see what happens.
>
> Anyway, coding by exception is quite bad practice. Since we're following
> the Java versions, I guess it'll take time to be able to change that.
>
> Eran.
>
> On Mon, Nov 16, 2009 at 8:56 PM, Michael Garski
> <mgar...@myspace-inc.com> wrote:
>
> > Eran,
> >
> > The root cause of the issue is calling commit after every document
> > addition while having a reader open.  Calls to commit should be
> > batched up; we frequently use batches of 100 or 1000 between commits.
> >
> > This is by design within Lucene.  Adding documents will cause segments
> > to merge, and the writer will then delete the older segments that have
> > been merged together to create a new one.  With an open reader,
> > however, the writer will not be able to delete the older segment due
> > to a file lock held by the reader.  On the call to delete the file an
> > exception is thrown and swallowed internally, and the name of the file
> > that the delete was attempted upon is added to a list of files that
> > can be deleted on another call.
> >
> > I suggest you refrain from calling commit so often, as that is why you
> > are experiencing performance issues.
> >
> > Michael
> >
> >
> > -----Original Message-----
> > From: Eran Sevi [mailto:erans...@gmail.com]
> > Sent: Mon 11/16/2009 5:07 AM
> > To: lucene-net-user@incubator.apache.org
> > Subject: Re: IndexWriter is slow when reader is open
> >
> > I've tried to use it with read-only mode and it looks like it's even
> > worse right now.
> >
> > I must admit that we're abusing the indexing a bit by committing after
> > each document addition, but still when there's no reader open, each
> > document is indexed in about 30-50ms and when there's a read-only
> > reader open then each document is indexed in about 150-500ms.
> > Why should an open reader affect the commit process so deeply?
> >
> > I wonder whether anyone has encountered this phenomenon before.
> >
> >
> > On Sat, Nov 14, 2009 at 8:27 PM, Matt Honeycutt
> > <mbhoneyc...@gmail.com> wrote:
> >
> > > 2.4 does indeed support read-only mode. I don't know how much it
> > > will help, but I would definitely try it.
> > >
> > > On 11/14/09, Eran Sevi <erans...@gmail.com> wrote:
> > > > I'm still using version 2.4 so I think there's still no read-only
> > > > mode.
> > > > Is there no other way to prevent this slowdown in previous
> > > > versions?
> > > >
> > > > Eran.
> > > >
> > > > On Thu, Nov 12, 2009 at 8:16 PM, Michael Garski
> > > > <mgar...@myspace-inc.com> wrote:
> > > >
> > > >> Eran,
> > > >>
> > > >> What version of Lucene are you using?  Are you opening the
> > > >> IndexReader in read-only mode?
> > > >>
> > > >> Michael
> > > >>
> > > >> -----Original Message-----
> > > >> From: Eran Sevi [mailto:erans...@gmail.com]
> > > >> Sent: Thursday, November 12, 2009 9:06 AM
> > > >> To: lucene-net-user@incubator.apache.org
> > > >> Subject: IndexWriter is slow when reader is open
> > > >>
> > > >> Hi,
> > > >> I'm using Lucene.Net 2.4 and I just noticed that when I index
> > > >> documents while there's at least one IndexReader open on that
> > > >> index (even without doing anything), the indexing speed is slower
> > > >> by a factor of 3 to 5. When closing the reader, the indexing speed
> > > >> goes back to normal.
> > > >> I'm not doing any deletes, only adds.
> > > >>
> > > >> My index is going to be updated regularly and there's going to be
> > > >> a reader/searcher in use almost all the time so this might be a
> > > >> big problem for me.
> > > >>
> > > >> Does anyone have a clue if this is normal behavior? Why does it
> > > >> happen and how can I avoid such a big loss in performance?
> > > >>
> > > >>
> > > >> Thanks,
> > > >> Eran.
> > > >>
> > > >>
> > > >
> > >
> >
> >
> >
>
>
>
