Hi, Marcelo, Yes, putting it in the public space would be great. I personally would be very interested to have a look. Can it be posted on the 'lucene' website?
Vlad -----Original Message----- From: Marcelo Ochoa [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 22, 2006 8:10 AM To: java-user@lucene.apache.org Subject: Oracle and Lucene Integration Hi all: I read on this a list many threads about Lucene indexing framework integration with Oracle. http://www.gossamer-threads.com/lists/lucene/java-user/41104?search_stri ng=oracle%20jvm%20BLOB;#41104 So it push me to work in a Lucene and Oracle JVM (a Java virtual machine running inside the Oracle database). The reason to do this is: - Using traditional File System for storing the inverted index is not a good option for some users. - Using BLOB for storing the inverted index running Lucene outside the Oracle database has a bad performance because there are a lot of network round trips and data marshalling. - Indexing relational data stores such as tables with VARCHAR2, CLOB or XMLType with Lucene running outside the database has the same problem as the previous point. - The JVM included inside the Oracle database can scale up to 10.000+ concurrent threads without memory leaks or deadlock and all the operation on tables are in the same memory space!! With these points in mind, I uploaded the complete Lucene framework inside the Oracle JVM and I runned the complete JUnit test case successful, except for some test such as the RMI test which requires special grants to open ports inside the database. The Lucene's test cases run faster inside the Oracle database (11g) than the Sun JDK 1.5, because the classes are automatically JITed after some executions. I had implemented and OJVMDirectory Lucene Store which replaces the file system storage with a BLOB based storage, compared with a RAMDirectory implementation is a bit slower but we gets all the benefits of the BLOB storage (backup, concurrence control, and so on). The OJVMDirectory is cloned from the source at http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory) but with some changes to run faster inside the Oracle JVM. At this moment, I am working in a full integration with the SQL Engine using the Data Cartridge API, it means using Lucene as a new Oracle Domain Index. With this extension we can create a Lucene Inverted index in a table using: create index it1 on t1(f2) indextype is LuceneIndex parameters('test'); assuming that the table t1 has a column f2 of type VARCHAR2, CLOB or XMLType, after this, the query against the Lucene inverted index can be made using a new Oracle operator: select * from t1 where contains(f2, 'Marcelo') = 1; the important point here is that this query is integrated with the execution plan of the Oracle database, so in this simple example the Oracle optimizer see that the column "f2" is indexed with the Lucene Domain index, then using the Data Cartridge API a Java code running inside the Oracle JVM is executed to open the search, a fetch all the ROWID that match with "Marcelo" and get the rows using the pointer, here the output: SELECT STATEMENT ALL_ROWS 3 1 115 TABLE ACCESS(BY INDEX ROWID) LUCENE.T1 3 1 115 DOMAIN INDEX LUCENE.IT1 Another benefits of using the Data Cartridge API is that if the table T1 has insert, update or delete rows operations a corresponding Java method will be called to automatically update the Lucene Index. Well may be the email is so long, if anybody is interested in this implementation I can put in a public web site. Best regards, Marcelo. PD: For Oracle users the big question is, Why do I use Lucene instead of Oracle Text which is implemented in C? I think that the answer is too simple, Lucene is open source and anybody can extend it and add the functionality needed :) -- Marcelo F. Ochoa http://marcelo.ochoa.googlepages.com/home ______________ Do you Know DBPrism? Look @ DB Prism's Web Site http://www.dbprism.com.ar/index.html More info? Chapter 17 of the book "Programming the Oracle Database using Java & Web Services" http://www.amazon.com/gp/product/1555583296/ Chapter 21 of the book "Professional XML Databases" - Wrox Press http://www.amazon.com/gp/product/1861003587/ Chapter 8 of the book "Oracle & Open Source" - O'Reilly http://www.oreilly.com/catalog/oracleopen/ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]