Re: can't delete from an index using IndexReader.delete()
Dhruba Borthakur wrote:
> Hi folks,
>
> I am using the latest and greatest Lucene jar file and am facing a problem
> with deleting documents from the index. Browsing the mail archive, I found
> that the following email (June 2003) listed the exact problem that I am
> encountering.
>
> In short: I am using Field.Text("id", "value") to mark a document. Then I
> use reader.delete(new Term("id", "value")) to remove the document: this call
> returns 0 and fails to delete the document. The attached sample program
> shows this behaviour.

Agreed... your values might be indexed... try adding them as Tokens...

Kevin
RE: can't delete from an index using IndexReader.delete()
You should use Field.Keyword rather than Field.Text for the identifier, because you do not want it tokenized:

    doc.add(Field.Keyword("id", whatever));

This applies in 2 places in your example code.

-- Ian.

> [EMAIL PROTECTED] (Robert Koberg) wrote
>
> Here is a simple class that can reproduce the problem (happens with the last
> stable release too). Let me know if you would prefer this as an attachment.
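To illustrate the Field.Keyword vs. Field.Text point above, here is a minimal sketch in plain Java. It is not Lucene code: ToyIndex, addText, addKeyword and the whitespace-and-lower-case analyze method are hypothetical stand-ins, assumed only for illustration. It shows why an exact-term delete can return 0 when the id was passed through an analyzer at index time, and 1 when the id was stored verbatim.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy in-memory stand-in for an index; hypothetical, not the Lucene API. */
class ToyIndex {
    /** Each document is modelled as the set of "field:term" strings indexed for it. */
    private final List<Set<String>> docs = new ArrayList<>();

    /** Hypothetical analyzer (an assumption): lower-cases and splits on whitespace. */
    public static List<String> analyze(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Rough analogue of Field.Text: the value goes through the analyzer. */
    public void addText(String field, String value) {
        Set<String> doc = new HashSet<>();
        for (String t : analyze(value)) {
            doc.add(field + ":" + t);
        }
        docs.add(doc);
    }

    /** Rough analogue of Field.Keyword: the value is indexed verbatim as one term. */
    public void addKeyword(String field, String value) {
        Set<String> doc = new HashSet<>();
        doc.add(field + ":" + value);
        docs.add(doc);
    }

    /** Exact-term delete; returns the number of documents removed. */
    public int delete(String field, String text) {
        int deleted = 0;
        for (Iterator<Set<String>> it = docs.iterator(); it.hasNext(); ) {
            if (it.next().contains(field + ":" + text)) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    public int numDocs() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex tokenized = new ToyIndex();
        tokenized.addText("id", "B724547");       // indexed as "b724547"
        System.out.println("tokenized delete: " + tokenized.delete("id", "B724547"));

        ToyIndex keyword = new ToyIndex();
        keyword.addKeyword("id", "B724547");      // indexed as "B724547"
        System.out.println("keyword delete: " + keyword.delete("id", "B724547"));
    }
}
```

In this toy model the first delete removes nothing because the indexed term no longer equals the original string, while the second removes one document. The real analyzer does more than this sketch assumes, but the mismatch between the stored term and the Term passed to delete is the same idea.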
RE: can't delete from an index using IndexReader.delete()
Here is a simple class that can reproduce the problem (happens with the last stable release too). Let me know if you would prefer this as an attachment.

Call like this:

    java TestReaderDelete existing_id new_label

- or -

Try:

    java TestReaderDelete B724547 ppp

and then try:

    java TestReaderDelete a266122794 ppp

If an index has not been created it will create one. Keep running one of the above example commands (with and without deleting the index directory) and see what happens to the System.out.println's.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.DateField;

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import org.xml.sax.Attributes;
    import javax.xml.parsers.*;

    import java.io.*;
    import java.util.*;

    class TestReaderDelete {

      public static void main(String[] args)
        throws IOException
      {
        File index = new File("./testindex");
        if (!index.exists()) {
          HashMap test_map = new HashMap();
          test_map.put("preamble_content", "Preamble content bbb");
          test_map.put("art_01_section_01", "Article 1, Section 1");
          test_map.put("toc_tester", "Test TOC XML bbb");
          test_map.put("B724547", "bio example");
          test_map.put("a266122794", "tester");
          indexFiles(index, test_map);
        }
        String identifier = args[0];
        String new_label = args[1];
        testDeleteAndAdd(index, identifier, new_label);
      }

      public static void indexFiles(File index, HashMap test_map)
      {
        try {
          IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), true);
          for (Iterator i=test_map.entrySet().iterator(); i.hasNext(); ) {
            Map.Entry e = (Map.Entry) i.next();
            System.out.println("Adding: " + e.getKey() + " = " + e.getValue());
            Document doc = new Document();
            doc.add(Field.Text("id", (String)e.getKey()));
            doc.add(Field.Text("label", (String)e.getValue()));
            writer.addDocument(doc);
          }
          writer.optimize();
          writer.close();
        } catch (Exception e) {
          System.out.println(" caught a " + e.getClass() +
              "\n with message: " + e.getMessage());
        }
      }

      public static void testDeleteAndAdd(File index, String identifier, String new_label)
        throws IOException
      {
        IndexReader reader = IndexReader.open(index);
        System.out.println("!!! reader.numDocs() : " + reader.numDocs());
        System.out.println("reader.indexExists(): " + reader.indexExists(index));

        System.out.println("term field: " + new Term("id", identifier).field());
        System.out.println("term text: " + new Term("id", identifier).text());
        System.out.println("reader.docFreq: " + reader.docFreq(new Term("id", identifier)));
        System.out.println("deleting target now...");
        int deleted_num = reader.delete(new Term("id", identifier));
        System.out.println("*** deleted_num: " + deleted_num);
        reader.close();
        try {
          IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), false);
          String ident = identifier;
          Document doc = new Document();
          doc.add(Field.Text("id", identifier));
          doc.add(Field.Text("label", new_label));
          writer.addDocument(doc);
          writer.optimize();
          writer.close();
        } catch (Exception e) {
          System.out.println(" caught a " + e.getClass() +
              "\n with message: " + e.getMessage());
        }

        System.out.println("!!! reader.numDocs() after deleting and adding : " +
            reader.numDocs());
      }

    }

> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> Sent: Sunday, June 22, 2003 9:42 PM
> To: Lucene Users List
>
> The code looks fine. Unfortunately, the provided code is not a full,
> self-sufficient class that I can run on my machine to verify the
> behaviour that you are describing.
>
> Otis

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: can't delete from an index using IndexReader.delete()
Thanks Jeff and Otis,

After some more testing I am finding that the bug affects only certain docs. For example, if I have a Document in the index with one of the following IDs, it will not be deleted:

'preamble_content'
'toc_testor'
'B724547'

The following IDs will work and delete the Document from the index:

'art_01_section_01'
'a266122794'

When I edit the metadata of one of the duplicates (triplicates, etc.) and save it again (which goes through delete and then re-add), another entry with the same id Term is added to the index.

On providing complete code: I will try to get that out to the list. It currently reads from config XML files. I will try to make a simple example.

Thanks again,
-Rob

> -Original Message-
> From: Jeff Linwood [mailto:[EMAIL PROTECTED]
> Sent: Sunday, June 22, 2003 9:54 PM
> To: Lucene Users List
>
> Hi,
>
> Can you check the return value of your reader.delete(...); call?
> According to the Javadocs, it should return the number of documents it
> deleted, maybe you can verify that it is deleting an entry?
>
> Jeff
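The pattern of IDs reported above (mixed case or underscore-joined words fail, all-lower-case IDs containing digits work) is what one would expect if the id field is being analyzed at index time. The following is a rough model only, under a loudly stated assumption: the analyzer lower-cases the value and splits underscore-joined words unless the value contains a digit. That rule was chosen to match the observations in this message; it is not a description of StandardAnalyzer's actual grammar, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Simplified, assumed model of index-time analysis; not Lucene code. */
class IdAnalysisSketch {

    /**
     * Assumption: lower-case the value, keep it as a single token when it
     * contains a digit, otherwise split it at underscores.
     */
    public static List<String> analyze(String value) {
        String lower = value.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        boolean hasDigit = lower.chars().anyMatch(Character::isDigit);
        if (hasDigit) {
            tokens.add(lower);
            return tokens;
        }
        for (String part : lower.split("_")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** An exact-term delete can only match if analysis left the id unchanged. */
    public static boolean deletableByExactTerm(String id) {
        List<String> tokens = analyze(id);
        return tokens.size() == 1 && tokens.get(0).equals(id);
    }

    public static void main(String[] args) {
        String[] ids = {
            "preamble_content", "toc_testor", "B724547",
            "art_01_section_01", "a266122794"
        };
        for (String id : ids) {
            System.out.println(id + " -> " + analyze(id)
                + " deletable=" + deletableByExactTerm(id));
        }
    }
}
```

Under this model the first three IDs are changed by analysis (split or lower-cased), so a Term built from the original string finds nothing, while the last two pass through unchanged and can be deleted.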
Re: can't delete from an index using IndexReader.delete()
Hi,

Can you check the return value of your reader.delete(...) call? According to the Javadocs, it should return the number of documents it deleted, so you can verify whether it is actually deleting an entry.

Jeff

Otis Gospodnetic wrote:
> The code looks fine. Unfortunately, the provided code is not a full,
> self-sufficient class that I can run on my machine to verify the
> behaviour that you are describing.
>
> Otis
Re: can't delete from an index using IndexReader.delete()
The code looks fine. Unfortunately, the provided code is not a full, self-sufficient class that I can run on my machine to verify the behaviour that you are describing.

Otis

--- Robert Koberg <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am using the latest binary distro (lucene-20030620.jar). I am trying to
> delete an entry from an index and then add it back with updated information.
>
> The entry is a content XML piece with some metadata added to the Document. I
> try to delete the entry by using a Term derived by the Field 'id' and the
> value of that field. The value is correct. What happens is that two entries
> exist after executing the code below.
>
> So, creating a Query for field 'id' with an example value 'abc' will return
> two hits. Any ideas what I am doing wrong? Is this a bug?
>
> Also, if you see anything I am doing stupidly or that can be improved,
> please let me know.
>
> Thanks,
> -Rob
>
> IndexReader reader =
>     IndexReader.open(project.search_index_path.getNativePath());
> reader.delete(new Term("id", member.content_idref));
> reader.close();
>
> ISO8601Converter iso_conv = new ISO8601Converter();
>
> try {
>   IndexWriter writer = new IndexWriter(
>       project.search_index_path.getNativePath(), new StandardAnalyzer(), false);
>
>   File f = new File(
>       project.content_path.lookup(member.content_idref.concat(".xml")).getNativePath());
>
>   XMLSearchHandler hdlr = new XMLSearchHandler(f);
>
>   Document doc = hdlr.getDocument();
>
>   doc.add(Field.Text("id", member.content_idref));
>   doc.add(Field.Text("status", status));
>   doc.add(Field.Text("type", target_elem.getAttributeValue("type")));
>
>   doc.add(Field.Text("creator", target_elem.getAttributeValue("creator")));
>   doc.add(Field.Text("last_mod_by", member.full_name));
>   doc.add(Field.Text("modified", DateField.dateToString(
>       iso_conv.parse(target_elem.getAttributeValue("modified"), new ParsePosition(0)))));
>   doc.add(Field.Text("created", DateField.dateToString(
>       iso_conv.parse(target_elem.getAttributeValue("created"), new ParsePosition(0)))));
>   doc.add(Field.Text("label", label));
>   doc.add(Field.Text("keywords", keywords));
>
>   writer.addDocument(doc);
>
>   writer.optimize();
>   writer.close();
>
> } catch (Exception e) {
>   ...
> }