deleting documents

2001-12-27 Thread Uroš Jurglič

I had the following problem:
I filled the index like this:
iw = new IndexWriter(index, new SimpleAnalyzer(), false);
iw.addDocument(assetToDoc(asset)); // assetToDoc returns a Document instance
iw.close();

and deleted one document as follows:
ir = IndexReader.open(index);
Term uidTerm = new Term("uid", assetUid);   // uid is the primary key
int count = ir.delete(uidTerm);
ir.close();

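When I next wanted to print the index's contents:
IndexReader ir = IndexReader.open(indexPath);
for (int i=0; i<ir.maxDoc(); i++) {
    System.out.println(i);
    Document doc = ir.document(i);
    Enumeration fields = doc.fields();

    while (fields.hasMoreElements()) {
        System.out.println(fields.nextElement());
    }
    System.out.println();
}

ir.maxDoc() went beyond the true number of docs in the index and I got stuck
with an Exception telling me that I was trying to access a deleted document.

Now, after deleting a document, I always open an IndexWriter, call
optimize() and close it, and then it works okay; the index doesn't get
corrupted anymore. But I haven't seen it mentioned anywhere that this is a
standard procedure after deleting a document, so am I doing something wrong?
Has anyone experienced something similar? If so, please let me know how
you solved it.

Happy y2k+2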


Deleting documents

2002-03-08 Thread Aruna Raghavan

Hi,
Is there anything wrong with the following code?
  try {
    m_lock.write(); // obtain a write lock on a RWLock
    IndexReader indexReader = IndexReader.open("mypath");
    IndexSearcher indexSearcher = new IndexSearcher("mypath");
    // use the searcher to search for documents to be deleted
    // use the reader to do the deletes.
    indexReader.close();
  }
  catch(Throwable e)
  {
    e.printStackTrace();
  }
  finally
  {
    m_lock.unlock();
  }

Sometimes I am getting the following exception:
java.io.IOException: Index locked for write:
Lock@D:\RevealCS\Search\Data\reports\write.lock
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
at
revsearch.RevSearch$DeleteWatcherThread.checkAction(RevSearch.java:1455)
at revsearch.RevSearch$WatcherThread.run(RevSearch.java:250)

This exception did not happen every time the code was run; it was
intermittent.

I suspect it is because I am using indexSearcher and indexWriter to open the
myPath dir. I changed it so that indexSearcher uses the indexReader in its
constructor.

I am hoping that someone can shed some light on what went wrong. Thanks.
Aruna.



--
To unsubscribe, e-mail:   
For additional commands, e-mail: 




Re: deleting documents

2001-12-27 Thread Ian Lea

According to the javadoc for IndexReader maxDoc "returns one greater than
the largest possible document number" so doesn't necessarily bear much
resemblance to the actual number of undeleted documents in the index.

IndexReader.isDeleted(i) says whether a particular document has
been deleted and can be called before IndexReader.document(i) to
avoid Exceptions being thrown.

I don't think your index is corrupt.  If you read the javadoc for
IndexReader.delete() you will see that deleted documents don't disappear
immediately (unless you call optimize() as you are) but will disappear
as the index gets modified further.
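
The behaviour described above can be illustrated with a small toy model. This
is only a sketch and not Lucene code: the class ToyIndex and its method
liveDocuments() are hypothetical, and the other method names merely mirror the
IndexReader methods mentioned in this reply. It assumes that a deletion only
marks a slot, so maxDoc() keeps counting the slot until the index is rewritten.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: this is NOT Lucene code. It mimics deletions that are
// recorded as marks, so document numbers keep their slots until a rewrite.
class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    // Appends a document and returns its document number.
    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    // Marks the slot as deleted; the slot itself is not reclaimed.
    void delete(int docNum) {
        if (docNum < 0 || docNum >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + docNum);
        }
        deleted.set(docNum);
    }

    // One greater than the largest possible document number.
    int maxDoc() {
        return docs.size();
    }

    // Number of documents that have not been deleted.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }

    // Refuses to return a deleted document, like the exception in the question.
    String document(int docNum) {
        if (isDeleted(docNum)) {
            throw new IllegalArgumentException("attempt to access a deleted document");
        }
        return docs.get(docNum);
    }

    // Iterates up to maxDoc() but checks isDeleted(i) before document(i).
    List<String> liveDocuments() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < maxDoc(); i++) {
            if (isDeleted(i)) {
                continue;
            }
            live.add(document(i));
        }
        return live;
    }
}
```

With three documents added and the middle one deleted, maxDoc() in this model
stays at 3 while numDocs() drops to 2, and the guarded loop returns only the
two live documents.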


Hope that helps.


--
Ian.
[EMAIL PROTECTED]


Uroš Jurglič wrote:
> 
> I had the following problem:
> I filled the index like this:
> iw = new IndexWriter(index, new SimpleAnalyzer(), false);
> iw.addDocument(assetToDoc(asset)); // assetToDoc returns a Document instance
> iw.close();
> 
> and deleted one document as follows:
> ir = IndexReader.open(index);
> Term uidTerm = new Term("uid", assetUid);   // uid is the primary key
> int count = ir.delete(uidTerm);
> ir.close();
> 
> When I next wanted to print the index's contents:
> IndexReader ir = IndexReader.open(indexPath);
> for (int i=0; i<ir.maxDoc(); i++) {
>     System.out.println(i);
>     Document doc = ir.document(i);
>     Enumeration fields = doc.fields();
> 
>     while (fields.hasMoreElements()) {
>         System.out.println(fields.nextElement());
>     }
>     System.out.println();
> }
> 
> ir.maxDoc() went beyond the true number of docs in the index and I got stuck
> with an Exception telling me that I was trying to access a deleted document.
> 
> Now, after deleting a document, I always open an IndexWriter, call
> optimize() and close it, and then it works okay; the index doesn't get
> corrupted anymore. But I haven't seen it mentioned anywhere that this is a
> standard procedure after deleting a document, so am I doing something wrong?
> Has anyone experienced something similar? If so, please let me know how
> you solved it.
> 
> Happy y2k+2





RE: Deleting documents

2002-03-12 Thread Spencer, Dave

I think I've come across the same problem.
If you have an indexer that adds docs and also deletes docs as it goes
(use case: it's updating old docs or adding new ones), it seems that you
always get an exception like this thrown from IndexReader.delete().

java.io.IOException: Index locked for write:
Lock@C:\tmp\luc\locktest\write.lock

I had code similar to the code below, and then modified it
to explicitly use the same Directory, to no avail.
Approx code:

Directory dir = FSDirectory.getDirectory( indexName, create);
IndexWriter writer = new IndexWriter( dir, ..., create);
IndexReader reader = IndexReader.open( dir);
// now calls to writer.addDocument() work
// if you call reader.delete(int) it fails

I've attached the full src below though it's a bit messy w/ trace
statements.
Should work fine as an isolation test case.
Uses windows dir names, sorry to Unix folk.

This fails against rc4 and also the latest build (0312).

I'm positive a few months ago this stuff worked fine.

If this is indeed a bug then I think the IndexReader and IndexWriter
should "know" they're sharing a Directory, whereas now they don't seem to.
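
One possible reading of this symptom, consistent with the write-lock behaviour
discussed elsewhere in this archive (an open IndexWriter holds write.lock, and
a reader that deletes holds it until it is closed), can be sketched with a toy
model. This is not Lucene code: ToyWriteLock, ToyWriter and ToyReader are
hypothetical names, and the assumption that both sides compete for one
exclusive lock per index directory is exactly that, an assumption.

```java
import java.io.IOException;

// Toy model only, not Lucene code: one exclusive lock per index directory.
class ToyWriteLock {
    private Object owner;

    // Succeeds if the lock is free or already held by the same owner.
    synchronized void obtain(Object who) throws IOException {
        if (owner != null && owner != who) {
            throw new IOException("Index locked for write");
        }
        owner = who;
    }

    // Releases the lock only if the caller actually holds it.
    synchronized void release(Object who) {
        if (owner == who) {
            owner = null;
        }
    }
}

// Assumption: a writer takes the lock when opened and keeps it until closed.
class ToyWriter {
    private final ToyWriteLock lock;

    ToyWriter(ToyWriteLock lock) throws IOException {
        this.lock = lock;
        lock.obtain(this);
    }

    void close() {
        lock.release(this);
    }
}

// Assumption: a reader takes the lock on its first delete and keeps it until closed.
class ToyReader {
    private final ToyWriteLock lock;

    ToyReader(ToyWriteLock lock) {
        this.lock = lock;
    }

    void delete(int docNum) throws IOException {
        lock.obtain(this);
        // A real implementation would mark docNum as deleted here.
    }

    void close() {
        lock.release(this);
    }
}
```

In this model, calling delete() on a ToyReader while a ToyWriter is still open
fails with "Index locked for write", and it succeeds once the writer has been
closed, which would match interleaving adds and deletes only by closing one
side before using the other.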

As a side note I've always found it strange that IndexReader was used to
delete entries. "reader" to me means read-only, thus I would have
expected IndexWriter to be the thing that is used to add/delete
documents.




-Original Message-
From: Aruna Raghavan [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 08, 2002 10:40 AM
To: 'Lucene Users List'
Subject: Deleting documents


Hi,
Is there anything wrong with the following code?
  try {
    m_lock.write(); // obtain a write lock on a RWLock
    IndexReader indexReader = IndexReader.open("mypath");
    IndexSearcher indexSearcher = new IndexSearcher("mypath");
    // use the searcher to search for documents to be deleted
    // use the reader to do the deletes.
    indexReader.close();
  }
  catch(Throwable e)
  {
    e.printStackTrace();
  }
  finally
  {
    m_lock.unlock();
  }

Sometimes I am getting the following exception:
java.io.IOException: Index locked for write:
Lock@D:\RevealCS\Search\Data\reports\write.lock
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
at
revsearch.RevSearch$DeleteWatcherThread.checkAction(RevSearch.java:1455)
at revsearch.RevSearch$WatcherThread.run(RevSearch.java:250)

This exception did not happen every time the code was run; it was
intermittent.

I suspect it is because I am using indexSearcher and indexWriter to open
the myPath dir. I changed it so that indexSearcher uses the indexReader in
its constructor.

I am hoping that someone can shed some light on what went wrong. Thanks.
Aruna.







LockTest.java
Description: LockTest.java



docFreq and deleting documents

2004-01-30 Thread Pascal Heraud
Hi all,

Should IndexReader#docFreq be aware of deleted documents if the index has not been optimized?

If I have two documents with same term T:
I call docFreq(T) and it returns 2.
I delete the first document.

I call docFreq(T) again and it returns 2.

In our case, the indexes are very big and optimizing them is costly.

Here is a code snippet that demonstrates the problem:

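---
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class Test {
    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir") + File.separator + "tst";
        try {
            // Build a fresh index with two documents that share field1:value.
            IndexWriter wri = new IndexWriter(tmp, new WhitespaceAnalyzer(), true);
            Document doc = new Document();
            doc.add(Field.Text("field1", "value"));
            doc.add(Field.Text("field2", "value2"));
            wri.addDocument(doc);
            doc = new Document();
            doc.add(Field.Text("field1", "value"));
            doc.add(Field.Text("field2", "value3"));
            wri.addDocument(doc);
            wri.optimize();
            wri.close();

            // First call: returns 2.
            IndexReader reader = IndexReader.open(tmp);
            System.out.println(reader.docFreq(new Term("field1", "value")));

            // Delete the first document and reopen the index.
            reader.delete(0);
            reader.close();

            // Second call: still returns 2.
            reader = IndexReader.open(tmp);
            System.out.println(reader.docFreq(new Term("field1", "value")));
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}
---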
Thanks.
Pascal.
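
A toy model may help to picture why the second call can still return 2. It is
only a sketch and not Lucene code: ToyPostings and liveDocFreq() are
hypothetical, and it assumes that docFreq is a statistic stored with the
term's postings while a deletion is only a mark kept on the side, so the
statistic does not change until the postings are rewritten.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Toy model only, not Lucene code: term -> list of document numbers,
// plus a separate set of deletion marks.
class ToyPostings {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final BitSet deleted = new BitSet();
    private int maxDoc = 0;

    // Adds a document made of the given terms and returns its number.
    int addDocument(String... terms) {
        int docNum = maxDoc++;
        // De-duplicate so each document appears once per term.
        for (String term : new LinkedHashSet<>(Arrays.asList(terms))) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docNum);
        }
        return docNum;
    }

    // Only records a mark; the postings lists are left untouched.
    void delete(int docNum) {
        deleted.set(docNum);
    }

    // Stored statistic: counts postings entries, deleted or not.
    int docFreq(String term) {
        List<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Walks the postings and skips documents that carry a deletion mark.
    int liveDocFreq(String term) {
        List<Integer> docs = postings.get(term);
        if (docs == null) {
            return 0;
        }
        int count = 0;
        for (int docNum : docs) {
            if (!deleted.get(docNum)) {
                count++;
            }
        }
        return count;
    }
}
```

In this model, two documents containing the same term give docFreq 2 both
before and after delete(0), while liveDocFreq drops to 1. Getting the live
count without optimizing would therefore mean iterating the term's documents
instead of relying on docFreq; whether that is affordable on very big indexes
is a separate question.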


problems deleting documents / design question

2004-10-22 Thread Paul
Hi,
I'm creating an index from several database tables. Every item within
every table has a unique id, which is saved in an id field, and the
table name is saved in another one. So together they form a unique
identifier within the index. When deleting / updating an item I need
to retrieve it. My first idea was
indexreader.delete(new Term("id", "id-value"));
but this could delete several entries, as id-value may appear in
several databases.
My second idea was to combine database name and id to form a kind of
unique identifier, but this does not seem to be the right way, as the
problem may occur again with some sub-ids within a certain table.
So my question is: is it possible to determine the item to be deleted
by more than one term?

thx,
Paul




Deleting documents that meet a query

2003-06-09 Thread Bruce Cota
I need to delete all the documents from an index that
satisfy a BooleanQuery.
The only methods I can find (in IndexReader) for deleting
a document are delete(Term) and delete(int).
I tried searching on my Query using IndexSearcher.search(),
iterating over each document in the returned Hits,
and then iterating over Hits deleting each document like this:
for (int i=0; i<hits.length(); i++) {
    ireader.delete(hits.id(i));
}
Hoping here that the value returned by
Hits.id(int) is the "docnum" expected in
IndexReader.delete(int)
But the call to delete throws an IOException.

So, is there any way I can delete all the documents from
an Index that satisfy a general Query?
Thank you for any advice.

Bruce Cota,
Unicon, Inc.




RE: problems deleting documents / design question

2004-10-22 Thread Aad Nales
Paul,

We are doing similar stuff. We actually do create a hash of database
name, table name and id to form a unique id. So far I have not had any
problems with it. 

Cheers,
Aad
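
The combined-key approach described above could look roughly like the sketch
below. The class CompositeUid, the "|" separator, the escaping scheme and the
choice of SHA-1 are all assumptions made for illustration; the thread does not
say which hash or format is actually used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: builds one unique key from database, table and id.
class CompositeUid {

    // Joins the parts with '|'. Separators inside a part are escaped, so
    // ("a|b", "c", "1") and ("a", "b|c", "1") produce different keys.
    static String join(String database, String table, String id) {
        return escape(database) + "|" + escape(table) + "|" + escape(id);
    }

    // Escapes the backslash first, then the separator.
    private static String escape(String part) {
        return part.replace("\\", "\\\\").replace("|", "\\|");
    }

    // Fixed-length variant: hex-encoded SHA-1 digest of the joined key.
    static String hash(String database, String table, String id) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(join(database, table, id).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting string would be stored in a single untokenized field when the
document is indexed, so that one Term on that field identifies exactly one
item when deleting, and sub-ids can be appended as further parts in the same
way.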

Hi,
I'm creating an index from several database tables. Every item within
every table has a unique id, which is saved in an id field, and the
table name is saved in another one. So together they form a unique
identifier within the index. When deleting / updating an item I need to
retrieve it. My first idea was indexreader.delete(new Term("id",
"id-value")); but this could delete several entries, as id-value may
appear in several databases. My second idea was to combine database name
and id to form a kind of unique identifier, but this does not seem to be
the right way, as the problem may occur again with some sub-ids within a
certain table. So my question is: is it possible to determine the item
to be deleted by more than one term?

thx,
Paul




Re: Deleting documents that meet a query

2003-06-09 Thread Marie-Hélène Forget
Hi,

I can confirm that delete( hits.id( i ) ) is OK.

Hits.id( int ) returns the docnum that you need.

MHF :)

On Mon, 2003-06-09 at 12:11, Bruce Cota wrote:
> I need to delete all the documents from an index that
> satisfy a BooleanQuery.
> 
> The only methods I can find (in IndexReader) for deleting
> a document are delete(Term) and delete(int).
> 
> I tried searching on my Query using IndexSearcher.search(),
> iterating over each document in the returned Hits,
> and then iterating over Hits deleting each document like this:
> 
> for (int i=0; i<hits.length(); i++) {
>     ireader.delete(hits.id(i));
> }
> 
> Hoping here that the value returned by
> Hits.id(int) is the "docnum" expected in
> IndexReader.delete(int)
> 
> But the call to delete throws an IOException.
> 
> So, is there any way I can delete all the documents from
> an Index that satisfy a general Query?
> 
> Thank you for any advice.
> 
> Bruce Cota,
> Unicon, Inc.





Re: Deleting documents that meet a query

2003-06-09 Thread Bruce Cota
Thanks.  I will restore that code and try to figure out why
it broke :)  (Because my alternative solution was way uglier.)

Marie-Hélène Forget wrote:

Hi,

I can confirm that delete( hits.id( i ) ) is OK.

Hits.id( int ) returns the docnum that you need.

MHF :)

On Mon, 2003-06-09 at 12:11, Bruce Cota wrote:
 

I need to delete all the documents from an index that
satisfy a BooleanQuery.
The only methods I can find (in IndexReader) for deleting
a document are delete(Term) and delete(int).
I tried searching on my Query using IndexSearcher.search(),
iterating over each document in the returned Hits,
and then iterating over Hits deleting each document like this:
for (int i=0; i<hits.length(); i++) {
    ireader.delete(hits.id(i));
}
Hoping here that the value returned by
Hits.id(int) is the "docnum" expected in
IndexReader.delete(int)
But the call to delete throws an IOException.

So, is there any way I can delete all the documents from
an Index that satisfy a general Query?
Thank you for any advice.

Bruce Cota,
Unicon, Inc.


 




problem with last patch (obtain write.lock while deleting documents)

2002-02-09 Thread Daniel Calvo

Hi,

I've just updated to the latest version (to get the fix for the
NullPointerException with some phrase queries) and now I'm having problems
with document deletion. I'm trying to delete a document using delete(Term)
and I'm getting an IOException:

java.io.IOException: Index locked for write: Lock@E:\temp\index\write.lock
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
at org.apache.lucene.index.SegmentsReader.doDelete(Unknown Source)
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
  (...)

Here's what I'm doing:
  IndexReader reader = IndexReader.open(index);
  reader.delete(new Term("fileid", id));
  reader.close();

I've taken a look at the sources but couldn't find anything wrong. Any ideas?
BTW, when performing this deletion there's no index writer open; I assume the
write lock is being created by the IndexReader when executing delete(numDoc).

TIA

Regards,

--Daniel






problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Daniel Calvo

Hi,

I've just updated my version (via CVS) and now I'm having problems with
document deletion. I'm trying to delete a document using IndexReader's
delete(Term) method and I'm getting an IOException:

java.io.IOException: Index locked for write: Lock@E:\temp\index\write.lock
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
at org.apache.lucene.index.SegmentsReader.doDelete(Unknown Source)
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
at org.apache.lucene.index.IndexReader.delete(Unknown Source)
  (...)

I'm doing:
  IndexReader reader = IndexReader.open("index");
  reader.delete(new Term("fileid", id));
  reader.close();

I've taken a look at the sources but couldn't find anything wrong. Any ideas?

TIA

Regards,

--Daniel






RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Daniel Calvo

Hi,

I forgot to mention that during this deletion there's no index writer open and
no write lock in the index. The lock that's causing the problem is created by
the reader when invoking delete(docNum).

--Daniel

> -Original Message-
> From: Daniel Calvo [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, February 9, 2002 17:18
> To: Lucene Users List
> Subject: problems with last patch (obtain write.lock while deleting
> documents)
>
>
> Hi,
>
> I've just updated my version (via CVS) and now I'm having problems with
> document deletion. I'm trying to delete a document using IndexReader's
> delete(Term) method and I'm getting an IOException:
>
> java.io.IOException: Index locked for write: Lock@E:\temp\index\write.lock
>   at org.apache.lucene.index.IndexReader.delete(Unknown Source)
>   at org.apache.lucene.index.SegmentsReader.doDelete(Unknown Source)
>   at org.apache.lucene.index.IndexReader.delete(Unknown Source)
>   at org.apache.lucene.index.IndexReader.delete(Unknown Source)
>   (...)
>
> I'm doing:
>   IndexReader reader = IndexReader.open("index");
>   reader.delete(new Term("fileid", id));
>   reader.close();
>
> I've taken a look at the sources but couldn't find anything wrong. Any ideas?
>
> TIA
>
> Regards,
>
> --Daniel






RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Doug Cutting

> From: Daniel Calvo [mailto:[EMAIL PROTECTED]]
> 
> I've just updated my version (via CVS) and now I'm having 
> problems with document deletion. I'm trying to delete a document using
> IndexReader's delete(Term) method and I'm getting an IOException:
> 
> java.io.IOException: Index locked for write: 

Oops.  I think I see the problem.  I only tested this on an optimized index!

I just checked in a fix.  Try it and tell me how it goes.

Sorry for the inconvenience,

Doug





RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Daniel Calvo

Hi Doug,

Problem solved, thanks!

BTW, is the way I'm doing the deletion the correct one? I reckon I can't use a
cached reader, since I have to close it after the deletion to release the write
lock. Does that make sense? Regarding writers, is it OK to share a single
IndexWriter among multiple writers? That is, I have one writer adding a
document and then I get another request for a doc upload. I can't open a new
IndexWriter because of the write lock, so I'm using the one already available.
After all writers are done, the IndexWriter is closed.

Again, thanks a lot (for the fix and, most importantly, for Lucene)

--Daniel

> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, February 10, 2002 19:55
> To: 'Lucene Users List'
> Subject: RE: problems with last patch (obtain write.lock while deleting
> documents)
>
>
> > From: Daniel Calvo [mailto:[EMAIL PROTECTED]]
> >
> > I've just updated my version (via CVS) and now I'm having
> > problems with document deletion. I'm trying to delete a document using
> > IndexReader's delete(Term) method and I'm getting an IOException:
> >
> > java.io.IOException: Index locked for write:
>
> Oops.  I think I see the problem.  I only tested this on an optimized index!
>
> I just checked in a fix.  Try it and tell me how it goes.
>
> Sorry for the inconvenience,
>
> Doug
>
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
>






RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Doug Cutting

> From: Daniel Calvo [mailto:[EMAIL PROTECTED]]
> 
> Problem solved, thanks!

Great!

> BTW, is the way I'm doing the deletion the correct one? I 
> reckon I can't use a cached reader, since I have to close it after the
> deletion to release the write lock. Does it make sense?

Yes.  Looks good to me.

It is most efficient to batch deletions and insertions, i.e., perform a
bunch of deletions on a single IndexReader, close it, then perform a bunch
of insertions on a single IndexWriter.  Usually the IndexReader that you do
the deletions on is different from the one other threads are simultaneously
using for searching, since if you close a reader while a search is underway
it will crash the search.
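
The batching pattern described above might be sketched as follows. Nothing in
this block is Lucene API: ToyIndexOps is a hypothetical interface standing in
for whatever applies changes to the real index (deletions through one reader,
insertions through one writer), and BatchedUpdater is an illustrative name. It
assumes every document carries a unique uid.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical interface: a stand-in for the code that touches the real index.
interface ToyIndexOps {
    void deleteByUid(String uid);
    void add(String uid, String content);
}

// Queues changes and applies them as one batch of deletions followed by
// one batch of insertions.
class BatchedUpdater {
    private final Set<String> pendingDeletes = new LinkedHashSet<>();
    private final Map<String, String> pendingAdds = new LinkedHashMap<>();

    // An update is a delete of the old version plus an add of the new one.
    synchronized void update(String uid, String content) {
        pendingDeletes.add(uid);
        pendingAdds.put(uid, content);
    }

    // A later delete cancels any insertion still queued for the same uid.
    synchronized void delete(String uid) {
        pendingDeletes.add(uid);
        pendingAdds.remove(uid);
    }

    // Applies all queued deletions first, then all queued insertions,
    // and returns the number of operations performed.
    synchronized int flush(ToyIndexOps index) {
        int applied = pendingDeletes.size() + pendingAdds.size();
        for (String uid : pendingDeletes) {
            index.deleteByUid(uid);
        }
        for (Map.Entry<String, String> entry : pendingAdds.entrySet()) {
            index.add(entry.getKey(), entry.getValue());
        }
        pendingDeletes.clear();
        pendingAdds.clear();
        return applied;
    }
}
```

How often flush() is called is a trade-off: flushing after every request
gives up the batching benefit, while flushing every few seconds keeps changes
reasonably fresh and still groups the work.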

> Regarding writers, is it ok to share a single IndexWriter 
> with multiple
> writers, i.e., I have one writer adding a document and then I 
> get another request for doc upload. I can't open a new IndexWriter
> because of the write lock, so I'm using the one already 
> available. After all writers are done, the IndexWriter is closed.

That also sounds fine.

Doug





RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Daniel Calvo

> From: Doug Cutting [mailto:[EMAIL PROTECTED]]

> It is most efficient to batch deletions and insertions, i.e., perform a
> bunch of deletions on a single IndexReader, close it, then perform a bunch
> of insertions on a single IndexWriter.  Usually the IndexReader that you do
> the deletions on is different than the one other threads are simultaneously
> using for searching, since if you close a reader while a search is underway
> it will crash the search.

Unfortunately I can't do that in my application. Users are allowed to insert
and delete files at any time and changes should be reflected asap.

Thanks for the answer,

--Daniel






Problems deleting documents from the index (Lock obtain timed out)

2003-12-15 Thread Hohwiller, Joerg
Hi there,

I just subscribed to this list and have a little problem:

I am using Lucene for incremental indexing (yes, I read the FAQ! Don't try to
convince me to rebuild the index periodically from scratch :) ).

Now the problem seems to be that Lucene is not able to perform index
modifications and parallel search requests.
After my simple approaches failed, I finally implemented the recommended way:
have one index that is modified, and create a copy of that index for searches.
I do all this with proper thread synchronization (at least I hope so).
Before I copy the index, I close the index-writer and index-reader working
on that index, then copy it and reopen the index-writer and -reader on the new
copy. Next I close the index-searcher and reopen it on the index that was
copied before.

Now my problem is that when I receive a delete event and want to remove a
document from the index by a special field (in my case the URI), I get an
IOException with the message "Lock obtain timed out".

I tried Lucene 1.3-rc1, 1.3-rc2 and 1.3-rc3, all with the same result.

Any suggestions would be very welcome :)

Thank you so far
  Jörg Hohwiller

BTW: I attached the relevant source code (but removed imports, etc. so that it
does not contain any confidential information). Maybe this answers the first of
your questions.
