[jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Marcelo F. Ochoa (JIRA) Tue, 09 Jan 2007 11:42:49 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463377
 ]


Marcelo F. Ochoa commented on LUCENE-724:
-----------------------------------------

Latest code includes:
- The Data Cartridge API is used without column data, which reduces the data 
stored on the queue of changes and speeds up the synchronize method.
- Query hits are cached, associated with the index search and the string 
returned by the QueryParser.toString() method.
- If no ancillary operator is used in the select, do not store the score list.
- The "Stemmer" parameter is recognized and its value is passed as the argument 
for the Snowball analyzer, for example: create index it1 on t1(f2) indextype is 
lucene.LuceneIndex parameters('Stemmer:English');
- Before installing the ojvm extension, it is necessary to run "ant jar-core" 
in the snowball directory.
- IndexWriter.setUseCompoundFile(false) is called to use multi-file storage 
(faster than the compound file), which is possible because there is no file 
descriptor limitation inside the OJVM, where BLOBs are used instead of files.
- Files are marked for deletion and purged when the Sync or Optimize methods 
are called.
- BLOBs are created and populated in one call using the Oracle SQL RETURNING 
clause.
- A test script that uses the OE sample schema, with query comparisons against 
an Oracle Text ctxsys.context index.
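
The hit caching mentioned in the list above can be pictured roughly as follows. 
This is a minimal sketch in plain Java, not the code from the attachment: the 
class name HitCache, the capacity limit, the LRU eviction policy and the 
invalidate method are assumptions made for illustration, and the cached value 
is a plain list of ROWID strings rather than a Lucene Hits object. The key 
combines the index name with the query string, standing in for the string 
form of the parsed query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical LRU cache of search results, keyed by index name plus query string. */
final class HitCache {
    private final Map<String, List<String>> entries;

    HitCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache is over capacity.
        this.entries = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Assumption: a newline never occurs in an index name, so it is a safe separator.
    private static String key(String indexName, String queryString) {
        return indexName + "\n" + queryString;
    }

    /** Returns the cached ROWIDs, or null when this query has not been cached. */
    synchronized List<String> get(String indexName, String queryString) {
        return entries.get(key(indexName, queryString));
    }

    /** Stores a private copy of the ROWIDs returned for this index and query. */
    synchronized void put(String indexName, String queryString, List<String> rowIds) {
        entries.put(key(indexName, queryString), new ArrayList<String>(rowIds));
    }

    /** Drops every cached result of one index, for example after it is synchronized. */
    synchronized void invalidate(String indexName) {
        final String prefix = indexName + "\n";
        entries.keySet().removeIf(k -> k.startsWith(prefix));
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A caller would try get() first and run the search only on a miss, then put() 
the result; cached entries for an index would have to be invalidated whenever 
that index changes.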

TODO:
- An ODCI Stats interface implementation to give the optimizer information 
about the cost of using the Domain Index.
- A binding for using the FIRST_ROWS(n) optimizer hint.
- A Digester class for loading the DBLP database to test very big indexes.
- Support for columns with XDBUriType values.

> Oracle JVM implementation for Lucene DataStore also a preliminary 
> implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: https://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Oracle 10g R2 with latest patchset. There is a txt file 
> in the lib directory listing the libraries required to compile this 
> extension, which for legal reasons I can't redistribute. All these libraries 
> are included in the Oracle home directory.
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-01-09-07.tar.gz, ojvm-11-28-06.tar.gz, 
> ojvm-12-20-06.tar.gz, ojvm.tar.gz
>
>
> Here is a preliminary implementation of the Oracle JVM Directory data store, 
> which replaces the file system with BLOB data storage.
> The reasons to do this are:
>   - Using a traditional file system for storing the inverted index is not a 
> good option for some users.
>   - Using BLOBs for storing the inverted index while running Lucene outside 
> the Oracle database performs badly because there are a lot of network 
> round trips and data marshalling.
>   - Indexing relational data stores such as tables with VARCHAR2, CLOB or 
> XMLType columns with Lucene running outside the database has the same 
> problem as the previous point.
>   - The JVM included inside the Oracle database can scale up to 10,000+ 
> concurrent threads without memory leaks or deadlocks, and all the operations 
> on tables are in the same memory space!!
>   With these points in mind, I uploaded the complete Lucene framework inside 
> the Oracle JVM and ran the complete JUnit test suite successfully, except 
> for some tests such as the RMI test, which requires special grants to open 
> ports inside the database.
>   Lucene's test cases run faster inside the Oracle database (11g) than on 
> the Sun JDK 1.5, because the classes are automatically JITed after some 
> executions.
>   I have implemented an OJVMDirectory Lucene Store which replaces the file 
> system storage with BLOB-based storage. Compared with a RAMDirectory 
> implementation it is a bit slower, but we get all the benefits of BLOB 
> storage (backup, concurrency control, and so on).
>  The OJVMDirectory is cloned from the source at
> http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory) but with some 
> changes to run faster inside the Oracle JVM.
>  At this moment, I am working on a full integration with the SQL engine using 
> the Data Cartridge API, which means using Lucene as a new Oracle Domain Index.
>  With this extension we can create a Lucene inverted index on a table using:
> create index it1 on t1(f2) indextype is LuceneIndex parameters('test');
>  assuming that the table t1 has a column f2 of type VARCHAR2, CLOB or 
> XMLType. After this, queries against the Lucene inverted index can be made 
> using a new Oracle operator:
> select * from t1 where contains(f2, 'Marcelo') = 1;
>  The important point here is that this query is integrated with the execution 
> plan of the Oracle database. In this simple example the Oracle optimizer 
> sees that the column "f2" is indexed with the Lucene Domain index; then, 
> using the Data Cartridge API, Java code running inside the Oracle JVM is 
> executed to open the search, fetch all the ROWIDs that match "Marcelo", and 
> get the rows using those pointers.
> Here is the output:
> SELECT STATEMENT                                      ALL_ROWS      3       1       115
>        TABLE ACCESS(BY INDEX ROWID) LUCENE.T1          3       1       115
>             DOMAIN INDEX LUCENE.IT1
>  Another benefit of using the Data Cartridge API is that if rows of table T1 
> are inserted, updated or deleted, a corresponding Java method will be 
> called to automatically update the Lucene index.
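
The maintenance flow described here, together with the queue of changes 
mentioned in the comment at the top, might look roughly like this. It is a 
sketch in plain Java under stated assumptions, not the attached code: 
ChangeQueue, Op, and the in-memory map standing in for the Lucene index are 
hypothetical, and the rowFetcher function stands in for re-reading the column 
value by ROWID at synchronize time. Only the operation and the ROWID are 
queued, with no column data.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

/** Hypothetical queue of pending index changes, applied in one synchronize pass. */
final class ChangeQueue {
    enum Op { INSERT, UPDATE, DELETE }

    private static final class Change {
        final Op op;
        final String rowId;

        Change(Op op, String rowId) {
            this.op = op;
            this.rowId = rowId;
        }
    }

    private final Queue<Change> pending = new ArrayDeque<Change>();
    // Stand-in for the Lucene index: ROWID -> indexed text.
    private final Map<String, String> index = new HashMap<String, String>();

    /** Called once per row touched by DML; stores the ROWID only, no column data. */
    void enqueue(Op op, String rowId) {
        pending.add(new Change(op, rowId));
    }

    /** Applies pending changes in arrival order and returns how many were applied. */
    int synchronize(Function<String, String> rowFetcher) {
        int applied = 0;
        Change change;
        while ((change = pending.poll()) != null) {
            if (change.op == Op.DELETE) {
                index.remove(change.rowId);
            } else {
                // The column value is read at synchronize time, not at enqueue time.
                String text = rowFetcher.apply(change.rowId);
                if (text == null) {
                    index.remove(change.rowId); // the row was deleted in the meantime
                } else {
                    index.put(change.rowId, text);
                }
            }
            applied++;
        }
        return applied;
    }

    String indexedText(String rowId) {
        return index.get(rowId);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Because the text is fetched when the queue is drained, several queued changes 
to the same row all end up indexing the row's latest value.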
>   There is a simple HTML file with some explanation of the code.
>    The install.sql script is not fully tested and must be launched locally 
> on the Oracle database, not remotely.
>   Best regards, Marcelo.
> - For Oracle users the big question is: why use Lucene instead of Oracle 
> Text, which is implemented in C?
>   I think the answer is simple: Lucene is open source and anybody can 
> extend it and add the functionality needed.
> - For Lucene users who try to use Lucene as an enterprise search engine, the 
> Oracle JVM provides a highly scalable container which can scale up to 
> 10,000+ concurrent sessions, with the facility of querying tables in the 
> same memory space.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

