Wow, very cool, even though I don't use Oracle anywhere at the moment.
You probably don't want that rowid field tokenized, by the way.
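Something like the following (just a sketch against the same Field API as in your snippet below) would keep it as one exact term:

                doc.add(new Field("rowid", rowid, Field.Store.YES,
                                  Field.Index.UN_TOKENIZED));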

Otis

----- Original Message ----
From: Marcelo Ochoa <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, November 22, 2006 8:44:58 AM
Subject: Re: Oracle and Lucene Integration

Hi Mark:
> Very interesting.
>
> So how does this solution manage mapping Oracle primary keys to and from 
> Lucene doc ids?
  I am storing the rowid value as a Document field; here is a code snippet:
                Document doc = new Document();
                // store the rowid so it can be returned from search results
                doc.add(new Field("rowid", rowid, Field.Store.YES,
                                  Field.Index.TOKENIZED));
                Object value = rs.getObject(2);
                String valueStr = null;
                if (value != null) { // Sanity checks
                    if (value instanceof CLOB)
                        valueStr = ((CLOB) value).getSubString(1, (int) ((CLOB) value).length());
                    else if (value instanceof XMLType)
                        valueStr = ((XMLType) value).extract("//text()", "").getStringVal();
                    else
                        valueStr = value.toString();
                    // index (but do not store) the column content
                    doc.add(new Field(col, valueStr, Field.Store.NO, Field.Index.TOKENIZED));
                    writer.addDocument(doc);
                }

  So when I am querying I can get the rowid back using:
                if (iterator.hasNext()) {
                    // append rowid to collection
                    Hit hit = (Hit) iterator.next();
                    try {
                        rid = hit.get("rowid");
                        score = hit.getScore();
                    } catch (IOException e) {
                        e.printStackTrace();
                        throw new SQLException(e.getMessage());
                    }
                    rlist[i] = rid;
                    slist.put(rid, new Float(score));
                    idx++;
                } else { ...
  and then pass the rowids back to the Oracle execution plan, which collects
them in batches of 2000 rowids.
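  Roughly, the fetch side looks like the sketch below (the method and
variable names are only illustrative, not the real Data Cartridge callback
signatures):
                // Illustrative sketch only: drain up to batchSize of the
                // collected rowids on each fetch call
                String[] nextBatch(java.util.ArrayList pendingRowids, int batchSize) {
                    int n = Math.min(batchSize, pendingRowids.size());
                    String[] batch = new String[n];
                    for (int i = 0; i < n; i++)
                        batch[i] = (String) pendingRowids.remove(0);
                    return batch; // batchSize = 2000 in this case
                }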
>
> >> Another benefits of using the Data Cartridge API is that if the
> >>table T1 has insert, update or delete rows operations a corresponding
> >>Java method will be called to automatically update the Lucene Index.
>
> I suspect the tricky bit is optimizing the opening/closing of Lucene 
> IndexReaders/Writers especially in the event of large batches of database 
> updates.
> Does this API pass the transactional info which would help organize the 
> batching of the Lucene reader.delete and writer.add calls?
  Well, I think that Oracle Text uses a queue to store large batches,
because it uses a ctx_sys.sync procedure to update the index ;)
  We could build the same solution using Oracle AQ.
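  For the batching itself it would be something along these lines (only a
rough sketch; directory, analyzer and the two pending collections are assumed
to come from the queue, and it assumes the rowid is indexed as a single term):
                // Rough sketch: apply one queued batch of changes in a single pass
                IndexReader reader = IndexReader.open(directory);
                for (Iterator it = deletedOrUpdatedRowids.iterator(); it.hasNext();)
                    reader.deleteDocuments(new Term("rowid", (String) it.next()));
                reader.close(); // release the write lock before opening the writer

                IndexWriter writer = new IndexWriter(directory, analyzer, false);
                for (Iterator it = insertedOrUpdatedDocs.iterator(); it.hasNext();)
                    writer.addDocument((Document) it.next());
                writer.close();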
>
> Cheers
> Mark
 Best regards, Marcelo.
-- 
Marcelo F. Ochoa
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/
