Re: Deleting index for DB indexing

2004-12-31 Thread mohamed ebrahim faisal
Hi
U can try out the following code to delete document based on KeyWords
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Searcher;
public class LuceneDelete
{
private static final String[] strSTOP_WORDS =
   {
"and",
"are"
 };
private void test() throws Exception
{
Analyzer objAnalyzer = new StandardAnalyzer();
IndexWriter index = new IndexWriter("index",objAnalyzer, true );
Document objDocument = new Document();
objDocument.add( Field.Keyword("name","Ebrahim Faisal"));
objDocument.add( Field.Text("address","Chennai"));
objDocument.add( Field.Keyword("designation","Software 
Engineer"));
objDocument.add( Field.UnIndexed("xyz","123 IndexWriter 
index"));
index.addDocument( objDocument );
objDocument = new Document();
objDocument.add( Field.Keyword("name","John Smith"));
objDocument.add( Field.Text("address","Delhi"));
objDocument.add( Field.Keyword("designation","Sr. Software 
Engineer"));
objDocument.add( Field.UnIndexed("xyz","456 StandardAnalyzer 
true"));
index.addDocument( objDocument );
index.optimize();
index.close();
//Logic for deleting
IndexReader objIndexReader = IndexReader.open("index");
		TermDocs objTermDocs = objIndexReader.termDocs(new Term("name","Ebrahim 
Faisal"));

while( objTermDocs.next() )
{
int docNum = objTermDocs.doc();
objDocument = objIndexReader.document( docNum );
if( 
objDocument.get("designation").equalsIgnoreCase("Software Engineer"))
{
objIndexReader.delete( docNum );
}
}
objIndexReader.close();
Searcher objIndexSearcher = new IndexSearcher("index");
Query objQuery = null;
objQuery = QueryParser.parse("Delhi", "address"
 , objAnalyzer);
Hits objHits = objIndexSearcher.search(objQuery);
System.out.println(" objHits "+objHits.length());
for (int nStart = 0; nStart < objHits.length(); nStart++)
{
objDocument = objHits.doc(nStart);
System.out.println(" address 
"+objDocument.get("address"));
}
objIndexSearcher.close();
objIndexSearcher = null;
    }
    public static void main(String[] args) throws Exception
{
new LuceneDelete().test();
}
}

E.FAISAL


From: mahaveer jain <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" 
To: Lucene Users List ,  Paul 
<[EMAIL PROTECTED]>
Subject: Re: Deleting index for DB indexing
Date: Thu, 30 Dec 2004 21:17:48 -0800 (PST)

Thanks Paul,
You idea seems to be good. I ll try that. I have one more question. Should 
the new key what I create have to be keyword ? or Can it be just a column 
in the index ?

Mahaveer
Paul <[EMAIL PROTECTED]> wrote:
On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain
wrote:
> I am indexing more that 5 tables. And each for them have autoincrement 
and
> that is the primary key. So if I do find DocNum, it may so happen that 
it
> may delete document I don't want to delete.

you need to create your own global ID, I had the same problem (but I
used a MD5 hashvalue). One solution ist to give each of your tables an
internal number and when creating your lucene-documents you add an
additional field with something like "dbInternalId*100+dbNumber" so
that db-record 5 in table 3 results in 503. when documents from your
DB are deleted and you need to update the index you simple create a
term which's value is calculated the same way and delete the document
with the IndexReader.delete(Term)
Ins

Re: Deleting index for DB indexing

2004-12-30 Thread mahaveer jain
Thanks Paul,
 
You idea seems to be good. I ll try that. I have one more question. Should the 
new key what I create have to be keyword ? or Can it be just a column in the 
index ?
 
Mahaveer

Paul <[EMAIL PROTECTED]> wrote:
On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain
wrote:
> I am indexing more that 5 tables. And each for them have autoincrement and
> that is the primary key. So if I do find DocNum, it may so happen that it
> may delete document I don't want to delete. 

you need to create your own global ID, I had the same problem (but I
used a MD5 hashvalue). One solution ist to give each of your tables an
internal number and when creating your lucene-documents you add an
additional field with something like "dbInternalId*100+dbNumber" so
that db-record 5 in table 3 results in 503. when documents from your
DB are deleted and you need to update the index you simple create a
term which's value is calculated the same way and delete the document
with the IndexReader.delete(Term)
Instead of calculating you can do string concatenating as well :)

Paul

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

Re: Deleting index for DB indexing

2004-12-30 Thread Paul
On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain
<[EMAIL PROTECTED]> wrote:
> I am indexing more that 5 tables. And each for them have autoincrement and
> that is the primary key. So if I do find DocNum, it may so happen that it
> may delete document I don't want to delete. 

you need to create your own global ID, I had the same problem (but I
used a MD5 hashvalue). One solution ist to give each of your tables an
internal number and when creating your lucene-documents you add an
additional field with something like "dbInternalId*100+dbNumber" so
that db-record 5 in table 3 results in 503. when documents from your
DB are deleted and you need to update the index you simple create a
term which's value is calculated the same way and delete the document
with the IndexReader.delete(Term)
Instead of calculating you can do string concatenating as well :)

Paul

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Deleting index for DB indexing

2004-12-30 Thread Morus Walter
mahaveer jain writes:

> I am using lucene for my DB indexing. I have 2 columns which are Keyword. 
> Now I want to delete my index based on this 2 keyword. 
>  
> Is it possible ? If no. What is other alternative ?
>  
You can delete documents based on document number from an index reader.
You can get document numbers from searches.
So if you can search documents to be deleted based on your keywords, there
should be no problem deleting them...

HTH
Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Deleting index for DB indexing

2004-12-30 Thread mahaveer jain
I am indexing more that 5 tables. And each for them have autoincrement and that 
is the primary key. So if I do find DocNum, it may so happen that it may delete 
document I don't want to delete.
 
Paul <[EMAIL PROTECTED]> wrote:
Alternative: create a hashed value which is unique within your DB
(e.g. use md5). Afterwards you can delete documents from the index
with the IndexReader(Term).
Without that additional field you can use the IndexSearcher to
retrieve your documents from the index and then use
IndexReader(DocNum) to delete these documents

Paul


On Thu, 30 Dec 2004 07:18:39 -0800 (PST), mahaveer jain
wrote:
> Hi All,
> 
> I am using lucene for my DB indexing. I have 2 columns which are Keyword.
> Now I want to delete my index based on this 2 keyword.
> 
> Is it possible ? If no. What is other alternative ?
> 
> Thanks
> Mahaveer
> 
> 
> -
> Do you Yahoo!?
> Yahoo! Mail - 250MB free storage. Do more. Manage less.
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



-
Do you Yahoo!?
 Dress up your holiday email, Hollywood style. Learn more.

Re: Deleting index for DB indexing

2004-12-30 Thread Paul
Alternative: create a hashed value which is unique within your DB
(e.g. use md5). Afterwards you can delete documents from the index
with the IndexReader(Term).
Without that additional field you can use the IndexSearcher to
retrieve your documents from the index and then use
IndexReader(DocNum) to delete these documents

Paul


On Thu, 30 Dec 2004 07:18:39 -0800 (PST), mahaveer jain
<[EMAIL PROTECTED]> wrote:
> Hi All,
> 
> I am using lucene for my DB indexing. I have 2 columns which are Keyword.
> Now I want to delete my index based on this 2 keyword.
> 
> Is it possible ? If no. What is other alternative ?
> 
> Thanks
> Mahaveer
> 
> 
> -
> Do you Yahoo!?
>  Yahoo! Mail - 250MB free storage. Do more. Manage less.
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Deleting index for DB indexing

2004-12-30 Thread mahaveer jain
Hi All,
 
I am using lucene for my DB indexing. I have 2 columns which are Keyword. 
Now I want to delete my index based on this 2 keyword. 
 
Is it possible ? If no. What is other alternative ?
 
Thanks 
Mahaveer
 
 


-
Do you Yahoo!?
 Yahoo! Mail - 250MB free storage. Do more. Manage less.