IndexReader.close() semantics and optimize -- Re: problem with locks when updating the data of a previous stored document

2004-09-16 Thread David Spencer
Crump, Michael wrote:
You have to close the IndexReader after doing the delete, before opening the 
IndexWriter for the addition.  See information at this link:
http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
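
To make the ordering advice above concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: ToyIndex, ToyReader and ToyWriter are hypothetical names, and the model assumes a single exclusive write lock that a reader takes on its first delete and a writer takes when it is opened. Under that assumption, deleting while a writer is open fails, while delete, close, then open the writer succeeds.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of one exclusive index write lock (hypothetical; not the Lucene API). */
final class WriteLockDemo {

    /** Stands in for the index and its single write lock. */
    static final class ToyIndex {
        private final AtomicBoolean writeLocked = new AtomicBoolean(false);

        void acquire(String who) {
            if (!writeLocked.compareAndSet(false, true)) {
                throw new IllegalStateException(who + ": index is locked for writing");
            }
        }

        void release() {
            writeLocked.set(false);
        }
    }

    /** Assumption: the reader takes the write lock on its first delete. */
    static final class ToyReader {
        private final ToyIndex index;
        private boolean holdsLock;

        ToyReader(ToyIndex index) {
            this.index = index;
        }

        void delete(String key) {
            if (!holdsLock) {
                index.acquire("reader");
                holdsLock = true;
            }
            // A real reader would mark the matching documents as deleted here.
        }

        void close() {
            if (holdsLock) {
                index.release();
                holdsLock = false;
            }
        }
    }

    /** Assumption: the writer takes the write lock when it is opened. */
    static final class ToyWriter {
        private final ToyIndex index;
        private boolean open;

        ToyWriter(ToyIndex index) {
            index.acquire("writer");
            this.index = index;
            this.open = true;
        }

        void close() {
            if (open) {
                index.release();
                open = false;
            }
        }
    }

    /** Tries to delete while a writer is still open; returns whether it worked. */
    static boolean deleteWhileWriterOpen() {
        ToyIndex index = new ToyIndex();
        ToyWriter writer = new ToyWriter(index);
        ToyReader reader = new ToyReader(index);
        try {
            reader.delete("doc-1");
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            reader.close();
            writer.close();
        }
    }

    /** Deletes, closes the reader, and only then opens the writer. */
    static boolean deleteThenAdd() {
        ToyIndex index = new ToyIndex();
        ToyReader reader = new ToyReader(index);
        reader.delete("doc-1");
        reader.close();
        ToyWriter writer = new ToyWriter(index);
        writer.close();
        return true;
    }

    public static void main(String[] args) {
        System.out.println("delete while writer open: " + deleteWhileWriterOpen());
        System.out.println("delete, close, then write: " + deleteThenAdd());
    }
}
```

In a batch update this suggests doing all deletes through one reader, closing it, and only then opening (or fetching from a cache) the writer for the additions.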
Recently I thought I observed that if I use this batch update idiom (1st 
delete the changed docs, then add them), it seems that 
IndexReader.close() does not flush/commit the deletions - rather 
IndexWriter.optimize() does.

I may have been confused and should retest this, but regardless, the 
javadoc seems unclear. close() says it "*saves* deletions to disk". What 
does it mean to save a deletion? Save a pending one, or commit it 
(commit -> really delete it)?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()
Also optimize doesn't mention deletions.
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#optimize()
Suggestion: could the word "save" in the close() jdoc be elaborated on, 
and possibly could optimize() get another comment wrt its effect on 
deletions?

thx,
 Dave

Regards,
Michael
-----Original Message-----
From: Paul Williams [mailto:[EMAIL PROTECTED]
Sent: Thu 9/16/2004 5:39 AM
To: 'Lucene Users List'
Cc: 
Subject: problem with locks when updating the data of a previous stored document
Hi,
Using lucene-1.4.1.jar on WinXP  

I am having trouble with locking and updating an existing Lucene document. I
delete the old document from the index and then add the new document to the
index writer. I have minMergeDocs set to 100 (much quicker!!) and
close the writer once the batch is done, so that the documents are flushed to
the filesystem.
The problem I am having is that I can't delete the old version of the document
(after the first document has been added) using reader.delete, because there
is a lock on the index due to the IndexWriter being open.
Am I doing this wrong, or is there a simple way round this?
Regards,
Paul
Code snippets of the update code (I have just cut and pasted the relevant
lines from my app to give an idea):
reader = IndexReader.open(location);
// Delete old doc/term if present
if (reader.docFreq(docNumberTerm) > 0) {
    reader.delete(docNumberTerm);
.
.
.
IndexWriter writer = null;
// Get the writer from the hash table, so the last few are cached and don't
// have to be restarted
synchronized (IndexWriterCache) {
    String dbstring = "" + ldb;
    writer = (IndexWriter) IndexWriterCache.get(dbstring);
    if (writer == null) {
        // Not in cache, so create one and add it to the cache for next time
        writer = new IndexWriter(location, new StandardAnalyzer(), new_index);
        writer.setUseCompoundFile(true);
        // Set the maximum number of entries per field. Default is 10,000
        writer.maxFieldLength = MaxFieldCount;
        // Set how many docs will be stored in memory before being saved to disk
        writer.minMergeDocs = (int) DocsInMemory;
        IndexWriterCache.remove(dbstring);
        IndexWriterCache.put(dbstring, writer);
    }
}
.
.
.
  
// Add the documents to the Lucene index
writer.addDocument(doc);

.
. Some time later, after a batch of docs has been added

writer.close();


Re: IndexReader.close() semantics and optimize -- Re: problem with locks when updating the data of a previous stored document

2004-09-16 Thread Morus Walter
David Spencer writes:
> Crump, Michael wrote:
> 
> > You have to close the IndexReader after doing the delete, before opening the 
> > IndexWriter for the addition.  See information at this link:
> > 
> > http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
> 
> Recently I thought I observed that if I use this batch update idiom (1st 
> delete the changed docs, then add them), it seems that 
> IndexReader.close() does not flush/commit the deletions - rather 
> IndexWriter.optimize() does.
> 
> I may have been confused and should retest this, but regardless, the 
> javadoc seems unclear. close() says it "*saves* deletions to disk". What 
> does it mean to save a deletion? Save a pending one, or commit it 
> (commit -> really delete it)?
> 
My understanding is that saving makes sure that an index searcher opened after
the close will take the deletions into account.
Deletion in Lucene is in fact a two-phase process:
a) delete -> the document is marked as deleted but not removed from the index.
   Low-level APIs (such as listing terms) will still see the content
   from deleted documents.
   Search takes the delete flag into account and removes deleted
   documents from the result list.
   (I think this also means that the deleted documents still
   contribute to term frequencies.)
b) remove deleted documents -> this is done during optimize.
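
The two phases described above can be sketched with a small stand-alone model. This is only an illustration under stated assumptions, not how Lucene stores its data: TwoPhaseDeleteDemo and its methods are hypothetical names, documents are plain strings, and a BitSet plays the role of the delete flags. Marking hides documents from searches but leaves them stored, and only the optimize step physically removes them.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of mark-then-purge deletion (hypothetical; not Lucene's storage format). */
final class TwoPhaseDeleteDemo {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    void add(String text) {
        docs.add(text);
    }

    /** Phase a: flag matching documents as deleted; nothing is removed yet. */
    int markDeleted(String text) {
        int marked = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && docs.get(i).equals(text)) {
                deleted.set(i);
                marked++;
            }
        }
        return marked;
    }

    /** Search view: counts matches, skipping documents flagged as deleted. */
    int liveHits(String text) {
        int hits = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && docs.get(i).equals(text)) {
                hits++;
            }
        }
        return hits;
    }

    /** Low-level view: counts every stored document, flagged or not. */
    int storedCount() {
        return docs.size();
    }

    /** Phase b: physically drop the flagged documents and reset the flags. */
    void optimize() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                live.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(live);
        deleted.clear();
    }

    public static void main(String[] args) {
        TwoPhaseDeleteDemo index = new TwoPhaseDeleteDemo();
        index.add("apple");
        index.add("banana");
        index.add("apple");
        index.markDeleted("apple");
        System.out.println("hits for apple after delete: " + index.liveHits("apple"));
        System.out.println("stored before optimize: " + index.storedCount());
        index.optimize();
        System.out.println("stored after optimize: " + index.storedCount());
    }
}
```

In this model the difference between the search view and the low-level view exists only between the delete and the optimize, which matches the distinction drawn in a) and b).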

Morus
