Re: can't delete from an index using IndexReader.delete()

2004-02-24 Thread Kevin A. Burton
Dhruba Borthakur wrote:

> Hi folks,
>
> I am using the latest and greatest Lucene jar file and am facing a
> problem with deleting documents from the index. Browsing the mail
> archive, I found that the following email (June 2003) describes the
> exact problem that I am encountering.
>
> In short: I am using Field.Text("id", "value") to mark a document.
> Then I use reader.delete(new Term("id", "value")) to remove the
> document: this call returns 0 and fails to delete the document. The
> attached sample program shows this behaviour.

Agreed... your values are probably being tokenized... try adding them as untokenized Keyword fields instead...

Kevin




RE: can't delete from an index using IndexReader.delete()

2003-06-24 Thread Ian Lea
You should use Field.Keyword rather than Field.Text for the identifier
because you do not want it tokenized:

  doc.add(Field.Keyword("id", whatever));

You need this change in two places in your example code (indexFiles and
testDeleteAndAdd).
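Ian's point can be illustrated with a toy inverted index. This is a minimal sketch under stated assumptions: ToyIndex and its methods are hypothetical, not Lucene classes, and they model only one effect of analysis, lowercasing. A Text-style field is stored under its analyzed term, a Keyword-style field is stored verbatim, and delete-by-term matches the stored term exactly, as IndexReader.delete(Term) does (the Term is never analyzed).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model of an inverted index; not Lucene's API.
class ToyIndex {
    // term key ("field:text") -> numbers of the documents containing that term
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // deleted.get(n) is true once document n has been deleted
    private final List<Boolean> deleted = new ArrayList<>();

    int newDocument() {
        deleted.add(false);
        return deleted.size() - 1;
    }

    // Rough analogue of Field.Text: the value is analyzed before indexing.
    // Only lowercasing is modelled; a real analyzer may also split the value.
    void addText(int doc, String field, String value) {
        addTerm(doc, field, value.toLowerCase(Locale.ROOT));
    }

    // Rough analogue of Field.Keyword: the value is indexed verbatim as one term.
    void addKeyword(int doc, String field, String value) {
        addTerm(doc, field, value);
    }

    private void addTerm(int doc, String field, String text) {
        postings.computeIfAbsent(field + ":" + text, k -> new ArrayList<>()).add(doc);
    }

    // Like IndexReader.delete(Term): the term text is matched exactly and is
    // never analyzed. Returns how many not-yet-deleted documents were deleted.
    int delete(String field, String text) {
        int count = 0;
        for (int doc : postings.getOrDefault(field + ":" + text, new ArrayList<>())) {
            if (!deleted.get(doc)) {
                deleted.set(doc, true);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ToyIndex textIndex = new ToyIndex();
        textIndex.addText(textIndex.newDocument(), "id", "B724547");
        System.out.println("Text id, deleted: " + textIndex.delete("id", "B724547"));

        ToyIndex keywordIndex = new ToyIndex();
        keywordIndex.addKeyword(keywordIndex.newDocument(), "id", "B724547");
        System.out.println("Keyword id, deleted: " + keywordIndex.delete("id", "B724547"));
    }
}
```

In the toy, the Text-style id is stored as "b724547", so deleting "B724547" returns 0; the Keyword-style id is stored unchanged and the same delete returns 1.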



--
Ian.
[EMAIL PROTECTED]


> [EMAIL PROTECTED] (Robert Koberg) wrote 
> [...]
> 

RE: can't delete from an index using IndexReader.delete()

2003-06-23 Thread Robert Koberg
Here is a simple class that reproduces the problem (it also happens with the
last stable release). Let me know if you would prefer this as an attachment.

Call it like this:
java TestReaderDelete existing_id new_label

For example, try:
java TestReaderDelete B724547 ppp

and then try:
java TestReaderDelete a266122794 ppp

If an index has not been created yet, the program creates one. Keep running
one of the above example commands (with and without deleting the index
directory) and watch what the System.out.println calls print.



import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;


class TestReaderDelete {

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.out.println("usage: java TestReaderDelete existing_id new_label");
      return;
    }
    File index = new File("./testindex");
    // Build a small test index on the first run only.
    if (!index.exists()) {
      HashMap test_map = new HashMap();
      test_map.put("preamble_content", "Preamble content bbb");
      test_map.put("art_01_section_01", "Article 1, Section 1");
      test_map.put("toc_tester", "Test TOC XML bbb");
      test_map.put("B724547", "bio example");
      test_map.put("a266122794", "tester");
      indexFiles(index, test_map);
    }
    String identifier = args[0];
    String new_label = args[1];
    testDeleteAndAdd(index, identifier, new_label);
  }

  public static void indexFiles(File index, HashMap test_map) {
    try {
      // true = create a new index, overwriting any existing one
      IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), true);
      for (Iterator i = test_map.entrySet().iterator(); i.hasNext(); ) {
        Map.Entry e = (Map.Entry) i.next();
        System.out.println("Adding: " + e.getKey() + " = " + e.getValue());
        Document doc = new Document();
        // Field.Text values are tokenized by the analyzer
        doc.add(Field.Text("id", (String) e.getKey()));
        doc.add(Field.Text("label", (String) e.getValue()));
        writer.addDocument(doc);
      }
      writer.optimize();
      writer.close();
    } catch (Exception e) {
      System.out.println(" caught a " + e.getClass() +
                         "\n with message: " + e.getMessage());
    }
  }

  public static void testDeleteAndAdd(File index, String identifier,
                                      String new_label) throws IOException {
    Term idTerm = new Term("id", identifier);
    IndexReader reader = IndexReader.open(index);
    System.out.println("!!! reader.numDocs() : " + reader.numDocs());
    // indexExists() is static, so call it on the class
    System.out.println("IndexReader.indexExists(): " + IndexReader.indexExists(index));

    System.out.println("term field: " + idTerm.field());
    System.out.println("term text: " + idTerm.text());
    System.out.println("reader.docFreq: " + reader.docFreq(idTerm));
    System.out.println("deleting target now...");
    // delete() returns the number of documents it marked as deleted
    int deleted_num = reader.delete(idTerm);
    System.out.println("*** deleted_num: " + deleted_num);
    reader.close();  // writes the deletions to the index

    try {
      // false = append to the existing index
      IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), false);
      Document doc = new Document();
      doc.add(Field.Text("id", identifier));
      doc.add(Field.Text("label", new_label));
      writer.addDocument(doc);
      writer.optimize();
      writer.close();
    } catch (Exception e) {
      System.out.println(" caught a " + e.getClass() +
                         "\n with message: " + e.getMessage());
    }

    // The first reader is closed, so open a fresh one to see the new state.
    reader = IndexReader.open(index);
    System.out.println("!!! reader.numDocs() after deleting and adding : " +
                       reader.numDocs());
    reader.close();
  }
}



> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> Sent: Sunday, June 22, 2003 9:42 PM
> To: Lucene Users List
> 
> The code looks fine.  Unfortunately, the provided code is not a full,
> self-sufficient class that I can run on my machine to verify the
> behaviour that you are describing.
> 
> Otis



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: can't delete from an index using IndexReader.delete()

2003-06-23 Thread Robert Koberg
Thanks Jeff and Otis,

After some more testing, I am finding that the bug affects only certain docs.

For example, if a Document in the index has one of the following IDs, it is
not deleted:

'preamble_content'
'toc_tester'
'B724547'

The following IDs work, and the Document is deleted from the index:

'art_01_section_01'
'a266122794'

When I edit the metadata of one of these documents and save it again (which
goes through the delete and then the re-add), another entry with the same id
Term is added to the index, so I end up with duplicates (triplicates, etc.).
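The two lists of IDs above are consistent with how StandardAnalyzer treats Field.Text values: it lowercases tokens and, as far as I understand its grammar, splits on underscores unless the parts contain digits. The sketch below is a hypothetical, much-simplified stand-in for that behaviour (approximateStandardTokens is not a Lucene method, and the real tokenizer grammar is more detailed); it only aims to reproduce the five IDs reported here, and shows that an exact-term delete can succeed only when the raw ID survives analysis unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenizedIdDemo {

    // Hypothetical, simplified stand-in for StandardAnalyzer on a Field.Text
    // value: lowercase it, then keep it whole if it contains a digit,
    // otherwise split it on underscores. Not Lucene's real grammar.
    static List<String> approximateStandardTokens(String value) {
        String lower = value.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        if (lower.chars().anyMatch(Character::isDigit)) {
            tokens.add(lower);
        } else {
            for (String part : lower.split("_")) {
                if (!part.isEmpty()) {
                    tokens.add(part);
                }
            }
        }
        return tokens;
    }

    // delete(new Term("id", id)) can only match if the unanalyzed id is
    // itself one of the terms that were indexed.
    static boolean deletable(String id) {
        return approximateStandardTokens(id).contains(id);
    }

    public static void main(String[] args) {
        String[] ids = {"preamble_content", "toc_tester", "B724547",
                        "art_01_section_01", "a266122794"};
        for (String id : ids) {
            System.out.println(id + " -> " + approximateStandardTokens(id)
                    + (deletable(id) ? "  deletable" : "  not deletable"));
        }
    }
}
```

Under this model "B724547" is indexed as "b724547" and "preamble_content" as "preamble" plus "content", so neither exact term exists, while "art_01_section_01" and "a266122794" pass through unchanged.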

As for providing complete code: I will try to get that out to the list. It
currently reads from config XML files, so I will try to make a simple example.

Thanks again,
-Rob



> -Original Message-
> From: Jeff Linwood [mailto:[EMAIL PROTECTED]
> Sent: Sunday, June 22, 2003 9:54 PM
> To: Lucene Users List
> 
> Hi,
> 
> Can you check the return value of your reader.delete(...) call?
> According to the Javadocs, it should return the number of documents it
> deleted, so you can verify whether it is actually deleting an entry.
> 
> Jeff
> 
> Otis Gospodnetic wrote:
> > [...]





Re: can't delete from an index using IndexReader.delete()

2003-06-22 Thread Jeff Linwood
Hi,

Can you check the return value of your reader.delete(...) call?
According to the Javadocs, it should return the number of documents it
deleted, so you can verify whether it is actually deleting an entry.

Jeff
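Jeff's suggestion can be built into the delete-then-add update itself. A minimal sketch, assuming a hypothetical in-memory IdStore (not a Lucene class) whose delete mirrors IndexReader.delete(Term) by returning a count: replace refuses to re-add when nothing was deleted, because in Rob's case a zero count meant the old document was still in the index and re-adding produced the second hit. The toy matches IDs exactly, so here a zero count only arises for an ID that is absent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for an index of {id, label} documents;
// not Lucene's API.
class IdStore {
    private final List<String[]> docs = new ArrayList<>();

    void add(String id, String label) {
        docs.add(new String[] {id, label});
    }

    // Like IndexReader.delete(Term): returns the number of documents removed.
    int delete(String id) {
        int before = docs.size();
        docs.removeIf(d -> d[0].equals(id));
        return before - docs.size();
    }

    int count(String id) {
        int n = 0;
        for (String[] d : docs) {
            if (d[0].equals(id)) {
                n++;
            }
        }
        return n;
    }

    // Delete-then-add update that checks the delete count first: if nothing
    // was deleted, it throws instead of adding what may be a duplicate.
    void replace(String id, String newLabel) {
        int deleted = delete(id);
        if (deleted == 0) {
            throw new IllegalStateException("delete matched no document for id " + id);
        }
        add(id, newLabel);
    }

    public static void main(String[] args) {
        IdStore store = new IdStore();
        store.add("a266122794", "tester");
        store.replace("a266122794", "ppp");
        System.out.println(store.count("a266122794")); // 1: replaced, not duplicated
        try {
            store.replace("B724547", "ppp");            // nothing matches this id
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

Whether to throw, log, or fall back to a plain add for brand-new IDs is a design choice; the point is only that a zero return from delete should not be silently ignored.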

Otis Gospodnetic wrote:
> The code looks fine.  Unfortunately, the provided code is not a full,
> self-sufficient class that I can run on my machine to verify the
> behaviour that you are describing.
>
> Otis

> --- Robert Koberg <[EMAIL PROTECTED]> wrote:
> > [...]


Re: can't delete from an index using IndexReader.delete()

2003-06-22 Thread Otis Gospodnetic
The code looks fine.  Unfortunately, the provided code is not a full,
self-sufficient class that I can run on my machine to verify the
behaviour that you are describing.

Otis

--- Robert Koberg <[EMAIL PROTECTED]> wrote:
> 
> Hi,
>
> I am using the latest binary distro (lucene-20030620.jar). I am trying
> to delete an entry from an index and then add it back with updated
> information.
>
> The entry is a content XML piece with some metadata added to the
> Document. I try to delete the entry by using a Term built from the
> field 'id' and the value of that field. The value is correct. What
> happens is that two entries exist after executing the code below.
>
> As a result, a Query for field 'id' with an example value 'abc'
> returns two hits. Any ideas what I am doing wrong? Is this a bug?
>
> Also, if you see anything I am doing stupidly or that can be
> improved, please let me know.
>
> Thanks,
> -Rob
> 
> 
> IndexReader reader =
>     IndexReader.open(project.search_index_path.getNativePath());
> reader.delete(new Term("id", member.content_idref));
> reader.close();
>
> ISO8601Converter iso_conv = new ISO8601Converter();
>
> try {
>   IndexWriter writer = new IndexWriter(
>       project.search_index_path.getNativePath(), new StandardAnalyzer(), false);
>
>   File f = new File(project.content_path
>       .lookup(member.content_idref.concat(".xml")).getNativePath());
>
>   XMLSearchHandler hdlr = new XMLSearchHandler(f);
>
>   Document doc = hdlr.getDocument();
>
>   doc.add(Field.Text("id", member.content_idref));
>   doc.add(Field.Text("status", status));
>   doc.add(Field.Text("type", target_elem.getAttributeValue("type")));
>
>   doc.add(Field.Text("creator", target_elem.getAttributeValue("creator")));
>   doc.add(Field.Text("last_mod_by", member.full_name));
>   doc.add(Field.Text("modified", DateField.dateToString(
>       iso_conv.parse(target_elem.getAttributeValue("modified"),
>                      new ParsePosition(0)))));
>   doc.add(Field.Text("created", DateField.dateToString(
>       iso_conv.parse(target_elem.getAttributeValue("created"),
>                      new ParsePosition(0)))));
>   doc.add(Field.Text("label", label));
>   doc.add(Field.Text("keywords", keywords));
>
>   writer.addDocument(doc);
>
>   writer.optimize();
>   writer.close();
>
> } catch (Exception e) {
>   ...
> }


