[jira] Created: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Marcelo F. Ochoa (JIRA) Wed, 22 Nov 2006 15:45:22 -0800

Oracle JVM implementation for Lucene DataStore also a preliminary 
implementation for an Oracle Domain index using Lucene
------------------------------------------------------------------------------------------------------------------------


                 Key: LUCENE-724
                 URL: http://issues.apache.org/jira/browse/LUCENE-724
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Store
    Affects Versions: 2.0.0
         Environment: Oracle 10g R2 with the latest patchset. A txt file in the 
lib directory lists the libraries required to compile this extension, which 
for legal reasons I can't redistribute. All of these libraries are included 
in the Oracle home directory.
            Reporter: Marcelo F. Ochoa
            Priority: Minor


Here is a preliminary implementation of the Oracle JVM Directory data store, 
which replaces the file system with BLOB data storage.
The reasons for doing this are:
  - Using a traditional file system to store the inverted index is not a good 
option for some users.
  - Storing the inverted index in BLOBs while running Lucene outside the Oracle 
database performs badly because of the many network round trips and the data 
marshalling involved.
  - Indexing relational data, such as tables with VARCHAR2, CLOB or XMLType 
columns, with Lucene running outside the database has the same problem as the 
previous point.
  - The JVM included inside the Oracle database can scale up to 10,000+ 
concurrent threads without memory leaks or deadlocks, and all operations on 
tables happen in the same memory space.
  With these points in mind, I uploaded the complete Lucene framework into 
the Oracle JVM and ran the complete JUnit test suite successfully, except for 
some tests, such as the RMI test, which require special grants to open ports 
inside the database.
  The Lucene test cases run faster inside the Oracle database (11g) than on the 
Sun JDK 1.5, because the classes are automatically JIT-compiled after some 
executions.
  I have implemented an OJVMDirectory Lucene store which replaces the file 
system storage with BLOB-based storage. Compared with a RAMDirectory 
implementation it is a bit slower, but we get all the benefits of BLOB storage 
(backup, concurrency control, and so on).
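
  As a rough illustration of the idea only (this is not the attached 
OJVMDirectory code), the sketch below keeps each index "file" as one named 
byte array, the way a BLOB-backed store would keep one BLOB per file name. 
The class name BlobStoreSketch and all of its methods are hypothetical, and 
an in-memory map stands in for the database table; the operations loosely 
follow what a Lucene Directory has to offer (list, exists, length, rename, 
delete, read, write).

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the OJVMDirectory implementation.
// One map entry plays the role of one (file name, BLOB) row.
class BlobStoreSketch {
    private final Map<String, byte[]> blobs = new TreeMap<>();

    // Create or overwrite a file with the given contents.
    void writeFile(String name, byte[] data) {
        blobs.put(name, data.clone());
    }

    // Append bytes to the end of a file, creating it if needed.
    void append(String name, byte[] more) {
        byte[] old = blobs.getOrDefault(name, new byte[0]);
        byte[] merged = Arrays.copyOf(old, old.length + more.length);
        System.arraycopy(more, 0, merged, old.length, more.length);
        blobs.put(name, merged);
    }

    // Random-access read: 'length' bytes starting at 'offset'.
    byte[] read(String name, int offset, int length) {
        byte[] data = require(name);
        if (offset < 0 || length < 0 || offset + length > data.length) {
            throw new IndexOutOfBoundsException("read past end of " + name);
        }
        return Arrays.copyOfRange(data, offset, offset + length);
    }

    long fileLength(String name) {
        return require(name).length;
    }

    boolean fileExists(String name) {
        return blobs.containsKey(name);
    }

    void deleteFile(String name) {
        blobs.remove(name);
    }

    void renameFile(String from, String to) {
        byte[] data = require(from);
        blobs.remove(from);
        blobs.put(to, data);
    }

    // Names of all stored files, in sorted order.
    String[] list() {
        return blobs.keySet().toArray(new String[0]);
    }

    private byte[] require(String name) {
        byte[] data = blobs.get(name);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return data;
    }
}
```

  In a real BLOB-backed store each of these calls would become an SQL 
statement or a BLOB locator operation, which is why running in the same 
memory space as the tables avoids the network round trips mentioned above.
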
 The OJVMDirectory is cloned from the source at
http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory), with some 
changes to make it run faster inside the Oracle JVM.
 At the moment I am working on full integration with the SQL engine using 
the Data Cartridge API, which means using Lucene as a new Oracle Domain Index.
 With this extension we can create a Lucene inverted index on a table using:

create index it1 on t1(f2) indextype is LuceneIndex parameters('test');

 assuming that the table t1 has a column f2 of type VARCHAR2, CLOB or XMLType. 
After this, queries against the Lucene inverted index can be made using a new 
Oracle operator:

select * from t1 where contains(f2, 'Marcelo') = 1;

 The important point here is that this query is integrated with the execution 
plan of the Oracle database. In this simple example the Oracle optimizer sees 
that the column "f2" is indexed with the Lucene Domain index; then, through the 
Data Cartridge API, Java code running inside the Oracle JVM is executed to 
open the search, fetch all the ROWIDs that match "Marcelo" and get the 
rows using those pointers.
Here is the output:

SELECT STATEMENT                                      ALL_ROWS      3       1       115
       TABLE ACCESS(BY INDEX ROWID) LUCENE.T1          3       1       115
            DOMAIN INDEX LUCENE.IT1

 Another benefit of using the Data Cartridge API is that when rows of the table 
T1 are inserted, updated or deleted, a corresponding Java method is called to 
update the Lucene index automatically.
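
 To make the maintenance flow concrete, here is a minimal sketch of an 
inverted index keyed by ROWID that is kept in sync through insert, update 
and delete callbacks. It is an assumption-laden illustration, not the Data 
Cartridge code from the attachment: the class DomainIndexSketch, its method 
names and the trivial tokenizer are all hypothetical, and a plain in-memory 
map stands in for the Lucene index.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of index maintenance driven by row changes.
class DomainIndexSketch {
    // term -> ROWIDs of the rows whose indexed column contains the term
    private final Map<String, Set<String>> postings = new HashMap<>();
    // ROWID -> text that was indexed, needed to undo it on delete/update
    private final Map<String, String> indexedText = new HashMap<>();

    // Called when a row is inserted into the table.
    void onInsert(String rowId, String text) {
        indexedText.put(rowId, text);
        for (String term : tokenize(text)) {
            if (term.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(rowId);
        }
    }

    // Called when a row is deleted from the table.
    void onDelete(String rowId) {
        String old = indexedText.remove(rowId);
        if (old == null) {
            return; // row was never indexed
        }
        for (String term : tokenize(old)) {
            Set<String> rows = postings.get(term);
            if (rows != null) {
                rows.remove(rowId);
                if (rows.isEmpty()) {
                    postings.remove(term);
                }
            }
        }
    }

    // Called when the indexed column changes: delete, then re-add.
    void onUpdate(String rowId, String newText) {
        onDelete(rowId);
        onInsert(rowId, newText);
    }

    // Returns a copy of the ROWIDs matching a single term.
    Set<String> contains(String term) {
        Set<String> rows = postings.get(term.toLowerCase(Locale.ROOT));
        return rows == null ? new TreeSet<String>() : new TreeSet<String>(rows);
    }

    // Naive tokenizer: lower-case and split on non-word characters.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }
}
```

 The update callback is written as a delete followed by an insert, which is 
also how a Lucene document is normally replaced. A query such as 
contains(f2, 'Marcelo') then corresponds to looking up the term and handing 
the matching ROWIDs back to the SQL engine.
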
  There is a simple HTML file with some explanation of the code.
   The install.sql script is not fully tested and must be launched on the 
Oracle database itself, not remotely.
  Best regards, Marcelo.

- For Oracle users the big question is: why use Lucene instead of Oracle 
Text, which is implemented in C?
  I think the answer is simple: Lucene is open source, and anybody can 
extend it and add the functionality they need.
- For Lucene users trying to use Lucene as an enterprise search engine, the 
Oracle JVM provides a highly scalable container which can scale up to 10,000+ 
concurrent sessions, with the ability to query tables in the same memory 
space.

