Re: Deleting index for DB indexing
Hi,

You can try the following code to delete documents based on keywords:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class LuceneDelete {

    private void test() throws Exception {
        Analyzer objAnalyzer = new StandardAnalyzer();

        // Build a fresh index (third argument "true" = create) with two documents.
        IndexWriter index = new IndexWriter("index", objAnalyzer, true);

        Document objDocument = new Document();
        objDocument.add(Field.Keyword("name", "Ebrahim Faisal"));
        objDocument.add(Field.Text("address", "Chennai"));
        objDocument.add(Field.Keyword("designation", "Software Engineer"));
        objDocument.add(Field.UnIndexed("xyz", "123 IndexWriter index"));
        index.addDocument(objDocument);

        objDocument = new Document();
        objDocument.add(Field.Keyword("name", "John Smith"));
        objDocument.add(Field.Text("address", "Delhi"));
        objDocument.add(Field.Keyword("designation", "Sr. Software Engineer"));
        objDocument.add(Field.UnIndexed("xyz", "456 StandardAnalyzer true"));
        index.addDocument(objDocument);

        index.optimize();
        index.close();

        // Logic for deleting: walk all documents whose "name" keyword matches,
        // check the second keyword on the stored document, then delete by number.
        IndexReader objIndexReader = IndexReader.open("index");
        TermDocs objTermDocs = objIndexReader.termDocs(new Term("name", "Ebrahim Faisal"));
        while (objTermDocs.next()) {
            int docNum = objTermDocs.doc();
            objDocument = objIndexReader.document(docNum);
            if (objDocument.get("designation").equalsIgnoreCase("Software Engineer")) {
                objIndexReader.delete(docNum);
            }
        }
        objTermDocs.close();
        objIndexReader.close();

        // Search the index again to check what is left after the delete.
        Searcher objIndexSearcher = new IndexSearcher("index");
        Query objQuery = QueryParser.parse("Delhi", "address", objAnalyzer);
        Hits objHits = objIndexSearcher.search(objQuery);
        System.out.println(" objHits " + objHits.length());
        for (int nStart = 0; nStart < objHits.length(); nStart++) {
            objDocument = objHits.doc(nStart);
            System.out.println(" address " + objDocument.get("address"));
        }
        objIndexSearcher.close();
    }

    public static void main(String[] args) throws Exception {
        new LuceneDelete().test();
    }
}

E.FAISAL

From: mahaveer jain <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List"
To: Lucene Users List, Paul <[EMAIL PROTECTED]>
Subject: Re: Deleting index for DB indexing
Date: Thu, 30 Dec 2004 21:17:48 -0800 (PST)

Thanks Paul,

Your idea seems good. I'll try that. I have one more question: does the new key I create have to be a Keyword field, or can it just be a column in the index?

Mahaveer

Paul <[EMAIL PROTECTED]> wrote:

On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain wrote:
> I am indexing more than 5 tables, and each of them has an autoincrement
> column as its primary key. So if I find the DocNum that way, it may
> delete a document I don't want to delete.

You need to create your own global ID. I had the same problem (but I used an MD5 hash value). One solution is to give each of your tables an internal number and, when creating your Lucene documents, add an additional field with something like "dbInternalId*100+dbNumber", so that DB record 5 in table 3 results in 503. When documents are deleted from your DB and you need to update the index, you simply create a term whose value is calculated the same way and delete the document with IndexReader.delete(Term).
Re: Deleting index for DB indexing
Thanks Paul,

Your idea seems good. I'll try that. I have one more question: does the new key I create have to be a Keyword field, or can it just be a column in the index?

Mahaveer

Paul <[EMAIL PROTECTED]> wrote:

On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain wrote:
> I am indexing more than 5 tables, and each of them has an autoincrement
> column as its primary key. So if I find the DocNum that way, it may
> delete a document I don't want to delete.

You need to create your own global ID. I had the same problem (but I used an MD5 hash value). One solution is to give each of your tables an internal number and, when creating your Lucene documents, add an additional field with something like "dbInternalId*100+dbNumber", so that DB record 5 in table 3 results in 503. When documents are deleted from your DB and you need to update the index, you simply create a term whose value is calculated the same way and delete the document with IndexReader.delete(Term).

Instead of calculating, you can use string concatenation as well :)

Paul

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Deleting index for DB indexing
On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain <[EMAIL PROTECTED]> wrote:
> I am indexing more than 5 tables, and each of them has an autoincrement
> column as its primary key. So if I find the DocNum that way, it may
> delete a document I don't want to delete.

You need to create your own global ID. I had the same problem (but I used an MD5 hash value). One solution is to give each of your tables an internal number and, when creating your Lucene documents, add an additional field with something like "dbInternalId*100+dbNumber", so that DB record 5 in table 3 results in 503. When documents are deleted from your DB and you need to update the index, you simply create a term whose value is calculated the same way and delete the document with IndexReader.delete(Term).

Instead of calculating, you can use string concatenation as well :)

Paul
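The two key schemes Paul describes can be sketched in plain Java. This is only an illustration under assumptions: the class name GlobalDocId, the method names, the table name "orders" and the ":" separator are all hypothetical and do not come from the thread. The numeric variant follows the "dbInternalId*100+dbNumber" formula and therefore only stays unique while table numbers are below 100; the concatenated variant does not have that limit.

```java
// Hypothetical helper for building an index-wide unique key from a table
// identifier and that table's autoincrement primary key.
final class GlobalDocId {

    private GlobalDocId() {
    }

    // Numeric scheme from the post: recordId * 100 + tableNumber.
    // Assumption: table numbers are in 0..99, otherwise keys could collide.
    static long numeric(long recordId, int tableNumber) {
        if (recordId < 0) {
            throw new IllegalArgumentException("recordId must not be negative");
        }
        if (tableNumber < 0 || tableNumber > 99) {
            throw new IllegalArgumentException("tableNumber must be in 0..99");
        }
        // The *Exact methods throw instead of silently overflowing.
        return Math.addExact(Math.multiplyExact(recordId, 100L), (long) tableNumber);
    }

    // String-concatenation scheme: table name, a separator, then the record id.
    static String concatenated(String tableName, long recordId) {
        return tableName + ":" + recordId;
    }

    public static void main(String[] args) {
        System.out.println(numeric(5, 3));             // prints 503
        System.out.println(concatenated("orders", 5)); // prints orders:5
    }
}
```

Whichever variant is used, the same calculation has to run at indexing time and at delete time, and the value would need to be indexed as a single untokenized term (Field.Keyword in the API used elsewhere in this thread) so that the Term passed to IndexReader.delete(Term) matches it exactly.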
Re: Deleting index for DB indexing
mahaveer jain writes:
> I am using lucene for my DB indexing. I have 2 columns which are Keyword.
> Now I want to delete my index based on this 2 keyword.
>
> Is it possible ? If no. What is other alternative ?
>

You can delete documents from an index reader by document number, and you can get document numbers from searches. So if you can find the documents to be deleted by searching on your keywords, there should be no problem deleting them...

HTH
Morus
Re: Deleting index for DB indexing
I am indexing more than 5 tables, and each of them has an autoincrement column as its primary key. So if I find the DocNum that way, it may delete a document I don't want to delete.

Paul <[EMAIL PROTECTED]> wrote:

Alternative: create a hashed value which is unique within your DB (e.g. use MD5) and store it in an additional field. Afterwards you can delete documents from the index with IndexReader.delete(Term).

Without that additional field, you can use the IndexSearcher to retrieve your documents from the index and then use IndexReader.delete(DocNum) to delete these documents.

Paul

On Thu, 30 Dec 2004 07:18:39 -0800 (PST), mahaveer jain wrote:
> Hi All,
>
> I am using lucene for my DB indexing. I have 2 columns which are Keyword.
> Now I want to delete my index based on this 2 keyword.
>
> Is it possible ? If no. What is other alternative ?
>
> Thanks
> Mahaveer
Re: Deleting index for DB indexing
Alternative: create a hashed value which is unique within your DB (e.g. use MD5) and store it in an additional field. Afterwards you can delete documents from the index with IndexReader.delete(Term).

Without that additional field, you can use the IndexSearcher to retrieve your documents from the index and then use IndexReader.delete(DocNum) to delete these documents.

Paul

On Thu, 30 Dec 2004 07:18:39 -0800 (PST), mahaveer jain <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I am using lucene for my DB indexing. I have 2 columns which are Keyword.
> Now I want to delete my index based on this 2 keyword.
>
> Is it possible ? If no. What is other alternative ?
>
> Thanks
> Mahaveer
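A hashed key of the kind Paul mentions could be produced with the JDK's MessageDigest, as in the sketch below. The thread does not say what input Paul hashed, so hashing "table name, colon, primary key" is an assumption made here for illustration, and the names Md5DocKey, md5Hex and of are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that derives a fixed-length document key by hashing
// a string that is unique across all tables of the database.
final class Md5DocKey {

    private Md5DocKey() {
    }

    // Returns the MD5 digest of the input as 32 lowercase hex characters.
    static String md5Hex(String input) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide MD5.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Assumption: table name plus primary key is unique within the DB.
    static String of(String tableName, long primaryKey) {
        return md5Hex(tableName + ":" + primaryKey);
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // prints 900150983cd24fb0d6963f7d28e17f72
        System.out.println(of("orders", 5));
    }
}
```

The hash is used here only to get a compact, uniform key, not for security. The key would go into its own field when the document is indexed, and the same function would be called again to build the Term for the delete.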
Deleting index for DB indexing
Hi All,

I am using Lucene for my DB indexing. I have 2 columns which are Keyword fields. Now I want to delete documents from my index based on these 2 keywords.

Is it possible? If not, what is the alternative?

Thanks
Mahaveer