Re: Deleting index for DB indexing

mohamed ebrahim faisal Fri, 31 Dec 2004 06:08:13 -0800

Hi

You can try the following code to delete documents based on Keyword fields:


import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

import org.apache.lucene.search.Searcher;

public class LuceneDelete
{
        private static final String[] strSTOP_WORDS =
       {
                        "and",
                        "are"
                         };
        private void test() throws Exception
        {
                Analyzer objAnalyzer = new StandardAnalyzer();
                IndexWriter index = new IndexWriter("index",objAnalyzer, true );


            Document objDocument = new Document();

                objDocument.add( Field.Keyword("name","Ebrahim Faisal"));
                objDocument.add( Field.Text("address","Chennai"));
                objDocument.add( Field.Keyword("designation","Software Engineer"));
                objDocument.add( Field.UnIndexed("xyz","123 IndexWriter index"));

                index.addDocument( objDocument );

                objDocument = new Document();

                objDocument.add( Field.Keyword("name","John Smith"));
                objDocument.add( Field.Text("address","Delhi"));
                objDocument.add( Field.Keyword("designation","Sr. Software Engineer"));
                objDocument.add( Field.UnIndexed("xyz","456 StandardAnalyzer true"));

                index.addDocument( objDocument );

                index.optimize();
                index.close();

                //Logic for deleting

                IndexReader objIndexReader = IndexReader.open("index");

                TermDocs objTermDocs = objIndexReader.termDocs(new Term("name","Ebrahim Faisal"));

                while( objTermDocs.next() )
                {
                        int docNum = objTermDocs.doc();
                        objDocument = objIndexReader.document( docNum );
                        if( objDocument.get("designation").equalsIgnoreCase("Software Engineer"))
                        {
                                objIndexReader.delete( docNum );
                        }
                }
                // Release the term enumeration before closing the reader
                objTermDocs.close();
                objIndexReader.close();


            Searcher objIndexSearcher = new IndexSearcher("index");

                Query objQuery = null;

                objQuery = QueryParser.parse("Delhi", "address", objAnalyzer);


            Hits objHits = objIndexSearcher.search(objQuery);

                System.out.println(" objHits "+objHits.length());

                for (int nStart = 0; nStart < objHits.length(); nStart++)
                {
                        objDocument = objHits.doc(nStart);
                        System.out.println(" address "+objDocument.get("address"));
                }
                objIndexSearcher.close();
                objIndexSearcher = null;


    }
        public static void main(String[] args) throws Exception
        {
                new LuceneDelete().test();
        }
}


E.FAISAL

From: mahaveer jain <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
To: Lucene Users List <lucene-user@jakarta.apache.org>, Paul <[EMAIL PROTECTED]>
Subject: Re: Deleting index for DB indexing
Date: Thu, 30 Dec 2004 21:17:48 -0800 (PST)

Thanks Paul,
Your idea seems good; I'll try that. I have one more question: does the new key I create have to be a Keyword field, or can it be just a column in the index?
Mahaveer

Paul <[EMAIL PROTECTED]> wrote:
On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain wrote:
> I am indexing more than 5 tables, and each of them has an autoincrement
> column as its primary key. So if I look up the DocNum, I may end up
> deleting a document I don't want to delete.

You need to create your own global ID. I had the same problem (but I
used an MD5 hash value). One solution is to give each of your tables an
internal number and, when creating your Lucene documents, add an
additional field with something like "dbInternalId*100+dbNumber", so
that DB record 5 in table 3 results in 503. When documents are deleted
from your DB and you need to update the index, you simply create a
term whose value is calculated the same way and delete the document
with IndexReader.delete(Term).
Instead of calculating, you can do string concatenation as well :)
Paul
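
A minimal sketch of the key schemes described above, in plain Java with no Lucene dependency. The class name GlobalId, its method names, the "orders" table name, and the ":" separator are illustrative assumptions, not anything from the thread. The numeric variant only stays unique while there are fewer than 100 tables, which is why the concatenated or hashed variants may be the safer choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: builds one index-wide key per database row.
public class GlobalId {

    // Numeric scheme from the post: recordId * 100 + tableNumber.
    // Assumption: tableNumber is in 0..99, otherwise keys can collide.
    static long numericId(long recordId, int tableNumber) {
        if (tableNumber < 0 || tableNumber >= 100) {
            throw new IllegalArgumentException("tableNumber must be in 0..99");
        }
        return recordId * 100 + tableNumber;
    }

    // String concatenation variant; the separator keeps
    // ("t1", 23) and ("t12", 3) from producing the same key.
    static String concatId(String tableName, long recordId) {
        return tableName + ":" + recordId;
    }

    // MD5 variant: hex digest of the concatenated key (always 32 hex chars).
    static String md5Id(String tableName, long recordId) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                    concatId(tableName, recordId).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(numericId(5, 3));            // prints 503
        System.out.println(concatId("orders", 5));      // prints orders:5
        System.out.println(md5Id("orders", 5).length()); // prints 32
    }
}
```

Whichever variant is used, the same value has to be computed at indexing time and at deletion time. Since IndexReader.delete(Term) matches the indexed term exactly, the key would normally go into an untokenized field, such as the Field.Keyword fields used in the code at the top of this message.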
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

