The need to close the IndexWriter is no different with the patch for
deletes than it already is for adds.  This is a separate issue that can
be managed asynchronously using the existing mechanism in the
application.  The patch ensures the proper order of operations, so the
benefit remains.  Applications can now freely add and delete without
worrying about deletes forcing a close of the IndexWriter.
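
A minimal sketch of what "proper order of operations" could mean for a
writer that buffers both adds and deletes.  This is a toy model under
stated assumptions, not the patch itself: it uses no Lucene classes, and
BufferedWriterSketch, Doc, and the method bodies are hypothetical.  Each
buffered delete records how many adds were buffered before it, so on flush
it removes older documents but never a document added after it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: no Lucene classes are used, and every name here is hypothetical.
class BufferedWriterSketch {
    // A document reduced to a unique key and a body.
    static final class Doc {
        final String key;
        final String body;
        Doc(String key, String body) { this.key = key; this.body = body; }
    }

    private final List<Doc> committed = new ArrayList<>();     // already flushed
    private final List<Doc> bufferedAdds = new ArrayList<>();  // not yet flushed
    private final List<String> deleteKeys = new ArrayList<>();
    // For each buffered delete, how many buffered adds came before it.
    private final List<Integer> deleteLimits = new ArrayList<>();

    void addDocument(String key, String body) {
        bufferedAdds.add(new Doc(key, body));
    }

    void deleteDocuments(String key) {
        deleteKeys.add(key);
        deleteLimits.add(bufferedAdds.size());
    }

    // Apply the buffer so the outcome matches the order of the calls:
    // a delete hits flushed docs and earlier buffered adds, never later ones.
    void flush() {
        boolean[] deleted = new boolean[bufferedAdds.size()];
        for (int d = 0; d < deleteKeys.size(); d++) {
            String key = deleteKeys.get(d);
            committed.removeIf(doc -> doc.key.equals(key));
            for (int i = 0; i < deleteLimits.get(d); i++) {
                if (bufferedAdds.get(i).key.equals(key)) {
                    deleted[i] = true;
                }
            }
        }
        for (int i = 0; i < deleted.length; i++) {
            if (!deleted[i]) {
                committed.add(bufferedAdds.get(i));
            }
        }
        bufferedAdds.clear();
        deleteKeys.clear();
        deleteLimits.clear();
    }

    // Bodies of all flushed documents, in index order.
    List<String> committedBodies() {
        List<String> bodies = new ArrayList<>();
        for (Doc doc : committed) {
            bodies.add(doc.body);
        }
        return bodies;
    }
}
```

In this model an update is just deleteDocuments(key) followed by
addDocument(key, newBody) on the same writer; the writer never has to be
closed between the two calls.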

I think we are all in agreement that delete really belongs in IndexWriter.

I agree with Otis that IndexModifier should be deprecated for several
reasons.  I use an "IndexManager" that coordinates all of search, read,
add, delete, update, etc.  It manages the refreshes, the batches, bulk
updates, etc., and does it all more efficiently than IndexModifier.
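
As a rough illustration of batching by primary key (the approach Robert
describes in the quoted thread below), pending updates can be kept in an
insertion-ordered map so that repeated updates to the same document
collapse into one.  This is only a sketch: UpdateBatch and its methods are
hypothetical names, and it is not the IndexManager mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects updates for one batch, coalesced by primary key.
class UpdateBatch {
    // key -> latest body for that key; a null body means "delete only".
    private final Map<String, String> pending = new LinkedHashMap<>();

    // Record an add-or-replace; a later call for the same key wins.
    void update(String key, String body) {
        pending.put(key, body);
    }

    // Record a delete with no replacement document.
    void delete(String key) {
        pending.put(key, null);
    }

    // Every key in the batch must be deleted from the existing index first.
    List<String> keysToDelete() {
        return new ArrayList<>(pending.keySet());
    }

    // Then only the surviving (non-null) bodies are added, in first-seen key order.
    List<String> bodiesToAdd() {
        List<String> bodies = new ArrayList<>();
        for (String body : pending.values()) {
            if (body != null) {
                bodies.add(body);
            }
        }
        return bodies;
    }
}
```

A caller would apply keysToDelete() in one pass and bodiesToAdd() in a
second pass, which is the delete-all-then-add-all pattern from the quoted
pseudocode.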

I haven't yet heard an answer on whether Java 1.5 code contributions would
be eligible for the core.

Chuck


robert engels wrote on 07/06/2006 08:01 PM:
> I think you still need to close the IndexWriter at some point, in
> order to search the new documents. In effect all of the changes using
> the "buffered" IndexWriter are meaningless until the IndexWriter is
> closed and a new IndexReader opened.
>
> Given that, it doesn't make much difference when you do the buffering...
>
> My statement about getting the lock once was not entirely correct: as
> you point out, it needs to be grabbed in two stages. But a far
> simpler design (as I proposed) could be used - obviously some changes
> for lock management would be needed.
>
> I DO think that the deletion code should be moved to IndexWriter - it
> makes more sense there. The current design IS a bit goofy... I don't
> see why you would delete using an IndexReader.... - why be able to see
> deletions in the current IndexReader but not be able to see additions?
> What is the benefit?
>
> I really like the idea of the BufferedWriter - it is similar to what
> is proposed but I think the implementation would be far simpler and
> more straightforward.  It would be similar to IndexModifier without
> the warning that you should do all the deletions first, and then all
> the additions - the BufferedWriter would manage this for you.
>
> On Jul 6, 2006, at 9:16 PM, Chuck Williams wrote:
>
>> Robert,
>>
>> Either you or I are missing something basic.  I'm not sure which.
>>
>> As I understand things, an IndexWriter and an IndexReader cannot both
>> have the write lock at the same time (they use the same write lock file
>> name).  Only an IndexReader can delete and only an IndexWriter can add.
>> So to update, you need to close the IndexWriter, have the IndexReader
>> delete, and then reopen the IndexWriter.  With the patch, you never need
>> to close the IndexWriter, as I said before.  This provides a benefit in
>> cases where updates cannot be combined into large batches.  In this case
>> without the patch the IndexWriter must be closed and reopened
>> frequently, whereas with the patch it does not.
>>
>> Have I got something wrong?
>>
>> Chuck
>>
>>
>> robert engels wrote on 07/06/2006 03:08 PM:
>>> I think I finally see how this is supposed to optimize - basically
>>> because it remembers the terms, and then does the batch deletions.
>>>
>>> We avoid all of this messiness by just making sure each document has a
>>> primary key and we always remove/update by primary key and we can keep
>>> the operations in an ordered list (actually set since the keys are
>>> unique, and that way multiple updates to the same document in a batch
>>> can be coalesced).
>>>
>>> I guess I still don't see why the change is so involved though...
>>>
>>> I would just maintain an ordered list of operations (deletes and adds)
>>> on the "buffered writer".
>>> When the "buffered" writer is closed:
>>>   1. Create a RAMDirectory.
>>>   2. Perform all deletions in a batch on the main IndexReader.
>>>   3. Perform ordered deletes and adds on the RAMDirectory.
>>>   4. Merge the RAMDirectory with the main index.
>>>
>>> This could all be encapsulated in a BufferedIndexWriter class.
>>>
>>>
>>> On Jul 6, 2006, at 4:34 PM, robert engels wrote:
>>>
>>>> I guess I don't see the difference...
>>>>
>>>> You need the write lock to use the IndexWriter, and you also need the
>>>> write lock to perform a deletion, so if you just get the write lock
>>>> you can perform the deletion and the add, then close the writer.
>>>>
>>>> I have asked how this submission optimizes anything, and I still
>>>> can't seem to get an answer.
>>>>
>>>>
>>>> On Jul 6, 2006, at 4:27 PM, Otis Gospodnetic wrote:
>>>>
>>>>> I think that patch is for a different scenario, the one where you
>>>>> can't wait to batch deletes and adds, and want/need to execute them
>>>>> more frequently and in the order they actually happen, without
>>>>> grouping them.
>>>>>
>>>>> Otis
>>>>>
>>>>> ----- Original Message ----
>>>>> From: robert engels <[EMAIL PROTECTED]>
>>>>> To: java-dev@lucene.apache.org
>>>>> Sent: Thursday, July 6, 2006 3:24:13 PM
>>>>> Subject: Re: [jira] Commented: (LUCENE-565) Supporting
>>>>> deleteDocuments in IndexWriter (Code and Performance Results
>>>>> Provided)
>>>>>
>>>>> I guess we just chose a much simpler way to do this...
>>>>>
>>>>> Even with your code changes, to see the modifications made using the
>>>>> IndexWriter, it must be closed, and a new IndexReader opened.
>>>>>
>>>>> So a far simpler way is to get the collection of updates first, then
>>>>>
>>>>> using opened indexreader,
>>>>> for each doc in collection
>>>>>        delete document using "key"
>>>>> endfor
>>>>>
>>>>> open indexwriter
>>>>> for each doc in collection
>>>>>        add document
>>>>> endfor
>>>>>
>>>>> open indexreader
>>>>>
>>>>>
>>>>> I don't see how your way is any faster. You must always flush to disk
>>>>> and open the indexreader to see the changes.
>>>>>
>>>>>
>>>>>
>>>>> On Jul 6, 2006, at 2:07 PM, Ning Li wrote:
>>>>>
>>>>>> Hi Otis and Robert,
>>>>>>
>>>>>> I added an overview of my changes in JIRA. Hope that helps.
>>>>>>
>>>>>>> Anyway, my test did exercise the small batches, in that in our
>>>>>>> incremental updates we delete the documents with the unique term,
>>>>>>> and then add the new ones (which is what I assumed this was
>>>>>>> improving), and I saw no appreciable difference.
>>>>>>
>>>>>> Robert, could you describe a bit more how your test is set up? Or a
>>>>>> short code snippet will help me explain.
>>>>>>
>>>>>> Without the patch, when inserts and deletes are interleaved in small
>>>>>> batches, the performance can degrade dramatically because the
>>>>>> ramDirectory is flushed to disk whenever an IndexWriter is closed,
>>>>>> causing a lot of small segments to be created on disk, which
>>>>>> eventually need to be merged.
>>>>>>
>>>>>> Is this how your test is set up? And, what are the maxBufferedDocs
>>>>>> and the maxBufferedDeleteTerms in your test? You won't see a
>>>>>> performance improvement if they are about the same as the small
>>>>>> batch size. The patch works by internally buffering inserts and
>>>>>> deletes into larger batches.
>>>>>>
>>>>>> Regards,
>>>>>> Ning
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>>
>>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>>> For additional commands, e-mail: [EMAIL PROTECTED]
