Hi Mark:
Very interesting.

So how does this solution manage mapping Oracle primary keys to and from Lucene 
doc ids?
 I am storing the rowid value as a Document field; here is a code snippet:
               Document doc = new Document();
               // Store the rowid so it can be returned with each hit
               doc.add(new Field("rowid", rowid, Field.Store.YES,
                                 Field.Index.TOKENIZED));
               Object value = rs.getObject(2);
               String valueStr = null;
               if (value != null) { // Sanity checks
                 if (value instanceof CLOB)
                   valueStr = ((CLOB) value).getSubString(1,
                                  (int) ((CLOB) value).length());
                 else if (value instanceof XMLType)
                   valueStr = ((XMLType) value).extract("//text()", "")
                                               .getStringVal();
                 else
                   valueStr = value.toString();
                 // Index (but do not store) the column contents
                 doc.add(new Field(col, valueStr, Field.Store.NO,
                                   Field.Index.TOKENIZED));
                 writer.addDocument(doc);
               }
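 Just for context, that snippet runs inside a loop over a JDBC ResultSet,
roughly like this (only a sketch: the class and method names, the index
path, the analyzer and the SQL text are my placeholders, not the actual
cartridge code):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class TableIndexer {
      // Rebuilds the index from scratch for one table/column pair.
      public static void rebuildIndex(Connection conn, String tableName,
                                      String col) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/lucene-index",
                                             new StandardAnalyzer(), true);
        PreparedStatement stmt = conn.prepareStatement(
            "SELECT rowid, " + col + " FROM " + tableName);
        ResultSet rs = stmt.executeQuery();
        while (rs.next()) {
          String rowid = rs.getString(1);
          // ... build and add the Document exactly as in the snippet above ...
        }
        rs.close();
        stmt.close();
        writer.optimize();
        writer.close();
      }
    }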

 So when I am querying I can get the rowid back using:
               if (iterator.hasNext()) {
                   // append the rowid and score to the result collections
                   Hit hit = (Hit) iterator.next();
                   try {
                       rid = hit.get("rowid");
                       score = hit.getScore();
                   } catch (IOException e) {
                       e.printStackTrace();
                       throw new SQLException(e.getMessage());
                   }
                   rlist[idx] = rid;
                   slist.put(rid, new Float(score));
                   idx++;
               } else { ...
 and pass the rowids to the Oracle execution plan, which collects them in
batches of 2000 rowids.
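 The batch collection itself is simple; roughly like this (nextBatch,
pendingHits and FetchHelper are just illustrative names, not the real
ODCIIndexFetch signature):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import org.apache.lucene.search.Hit;

    public class FetchHelper {
      static final int BATCH_SIZE = 2000;

      // Returns up to 2000 rowids per call, or null when the hits are
      // exhausted, which tells the execution plan the fetch is finished.
      public static String[] nextBatch(Iterator pendingHits) throws IOException {
        ArrayList batch = new ArrayList(BATCH_SIZE);
        while (pendingHits.hasNext() && batch.size() < BATCH_SIZE) {
          Hit hit = (Hit) pendingHits.next();
          batch.add(hit.get("rowid")); // the stored rowid field from above
        }
        if (batch.isEmpty())
          return null;
        return (String[]) batch.toArray(new String[batch.size()]);
      }
    }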

>> Another benefit of using the Data Cartridge API is that if rows in
>> table T1 are inserted, updated or deleted, a corresponding Java method
>> will be called to automatically update the Lucene index.

I suspect the tricky bit is optimizing the opening/closing of Lucene 
IndexReaders/Writers, especially in the event of large batches of database 
updates.
Does this API pass the transactional info that would help organize the 
batching of the Lucene reader.delete and writer.add calls?
 Well, I think that Oracle Text uses a queue to store large batches,
because it uses a ctx_sys.sync procedure to update the index ;)
 We could build the same solution using Oracle AQ.
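 To give a rough idea of that batching, the queue consumer could apply all
pending changes in one pass, so the IndexReader and the IndexWriter are each
opened only once per batch. This is only a sketch: the Change record, the
INDEX_PATH and the dequeuing from AQ are assumptions, and note that the
rowid field is indexed UN_TOKENIZED here so it can be deleted with an exact
Term, which is slightly different from the snippet above.

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class IndexSynchronizer {
      static final String INDEX_PATH = "/tmp/lucene-index";

      // One queued DML change: newValue == null means the row was deleted.
      public static class Change {
        String rowid;
        String newValue;
      }

      public static void syncIndex(List pendingChanges) throws IOException {
        // Phase 1: delete the old documents for every changed rowid.
        IndexReader reader = IndexReader.open(INDEX_PATH);
        for (Iterator it = pendingChanges.iterator(); it.hasNext();) {
          Change c = (Change) it.next();
          reader.deleteDocuments(new Term("rowid", c.rowid));
        }
        reader.close();

        // Phase 2: re-add documents for inserts/updates in one writer session.
        IndexWriter writer = new IndexWriter(INDEX_PATH,
                                             new StandardAnalyzer(), false);
        for (Iterator it = pendingChanges.iterator(); it.hasNext();) {
          Change c = (Change) it.next();
          if (c.newValue == null)
            continue; // pure delete, nothing to re-add
          Document doc = new Document();
          doc.add(new Field("rowid", c.rowid, Field.Store.YES,
                            Field.Index.UN_TOKENIZED));
          doc.add(new Field("contents", c.newValue, Field.Store.NO,
                            Field.Index.TOKENIZED));
          writer.addDocument(doc);
        }
        writer.optimize();
        writer.close();
      }
    }

 Grouping the deletes and the adds this way avoids reopening the index for
every single row, which is where the cost really is with large batches of
updates.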

Cheers
Mark
Best regards, Marcelo.
--
Marcelo F. Ochoa
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/
