[jira] Updated: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Marcelo F. Ochoa (JIRA) Thu, 27 Sep 2007 05:41:22 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Marcelo F. Ochoa updated LUCENE-724:
------------------------------------

    Attachment: ojvm-09-27-07.tar.gz

This new release includes:
* Synchronized with the latest Lucene 2.2.0 production release.
* Replaced the in-memory storage based on a Vector implementation with direct
BLOB I/O, reducing memory usage for large indexes.
* Support for user data stores: you are no longer limited to indexing one
column at a time (a limit of the Data Cartridge API on 10g); you can now index
multiple columns of the base table, as well as columns of related tables
joined together.
* User data stores can be customized: by writing a simple Java class, users
can control which columns are indexed, the padding used, or any other
processing performed before the document is added.
* There is a DefaultUserDataStore which gets all columns of the query and
builds a Lucene Document with a Field representing each database column.
These fields are automatically padded if they hold NUMBER data or rounded if
they hold DATE data, for example.
* The lcontains() SQL operator supports the full Lucene QueryParser syntax,
providing access to all indexed columns; see examples below.
* Support for the DOMAIN_INDEX_SORT hint: if you want rows ordered by the
lscore() operator (ascending or descending), the optimizer hint assumes that
the Lucene Domain Index returns rowids in the proper order, avoiding an
inline view to sort them.
* Automatic index synchronization using an AQ callback.
* Lucene Domain Index creates an extra table named IndexName$T and an Oracle
AQ queue named IndexName$Q, with its storage table IndexName$QT, in the user's
schema, so you can alter the storage preferences if you want.
* The ojvm project is in SourceForge.net CVS, so anybody can get it and
collaborate ;)
* Tested against 10gR2 and 11g databases.
* A LuceneDomainIndex.countHits() function to replace the
select count(*) from .. where lcontains(..)>0 syntax.
* Support for inline pagination in the
lcontains(col,'rownum:[n TO m] AND ...') function.
* See Readme.txt for details of usage and installation.
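
The padding of NUMBER columns and rounding of DATE columns mentioned in the
list above matter because Lucene 2.x compares terms as strings, so numeric
values only sort and range-match correctly when they have a fixed width. The
following is a minimal sketch of that idea, not the actual
DefaultUserDataStore code: the class name FieldPadding, the chosen width, the
yyyyMMdd format, and truncation to the day as the "rounding" step are all
assumptions made for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Hypothetical helper, not part of the ojvm attachment.
public class FieldPadding {

    // Assumed text format for DATE values once reduced to day precision.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Left-pads a non-negative number with zeros so that string order
    // matches numeric order for all values that fit in the given width.
    static String padNumber(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Drops the time-of-day part (an assumed reading of "rounded") and
    // formats the result as a sortable string.
    static String roundToDay(LocalDateTime timestamp) {
        return timestamp.truncatedTo(ChronoUnit.DAYS).format(DAY);
    }

    public static void main(String[] args) {
        System.out.println(padNumber(42, 10));   // prints 0000000042
        System.out.println(roundToDay(LocalDateTime.of(2007, 9, 27, 5, 41, 22))); // prints 20070927
    }
}
```

With this encoding, "0009" sorts before "0010", whereas the unpadded strings
"9" and "10" would sort the other way round.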
-------
Thanks to LendingClub.com for supporting this contribution.
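
The inline pagination syntax listed in the release notes above
('rownum:[n TO m] AND ...') can be generated from a page number and page size.
The sketch below only builds the query text that would be passed as the second
argument of lcontains(); the class and method names are hypothetical, and it
assumes that rownum is 1-based and that wrapping the user query in parentheses
is acceptable. In Lucene QueryParser syntax, [n TO m] is an inclusive range.

```java
// Hypothetical helper, not part of the ojvm attachment.
public class PaginationQuery {

    // Builds "rownum:[first TO last] AND (query)" for the requested page.
    // Assumes 1-based row numbers, so page 1 of size 10 covers rows 1..10.
    static String pageQuery(String luceneQuery, int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        long first = (long) (page - 1) * pageSize + 1;
        long last = (long) page * pageSize;
        return "rownum:[" + first + " TO " + last + "] AND (" + luceneQuery + ")";
    }

    public static void main(String[] args) {
        // prints rownum:[11 TO 20] AND (Marcelo)
        System.out.println(pageQuery("Marcelo", 2, 10));
    }
}
```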

> Oracle JVM implementation for Lucene DataStore also a preliminary 
> implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: https://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0, 2.2
>         Environment: Oracle 10g R2 with the latest patchset. There is a txt
> file in the lib directory listing the libraries required to compile this
> extension, which for legal reasons I can't redistribute. All these libraries
> are included in the Oracle home directory.
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-01-09-07.tar.gz, ojvm-09-27-07.tar.gz, 
> ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz
>
>
> Here is a preliminary implementation of the Oracle JVM Directory data store,
> which replaces the file system with BLOB data storage.
> The reasons for doing this are:
>   - Using a traditional file system to store the inverted index is not a
> good option for some users.
>   - Storing the inverted index in BLOBs while running Lucene outside the
> Oracle database performs badly because of the many network round trips and
> the data marshalling involved.
>   - Indexing relational data, such as tables with VARCHAR2, CLOB or XMLType
> columns, with Lucene running outside the database has the same problem as
> the previous point.
>   - The JVM included inside the Oracle database can scale up to 10,000+
> concurrent threads without memory leaks or deadlocks, and all operations on
> tables happen in the same memory space!
>   With these points in mind, I uploaded the complete Lucene framework into
> the Oracle JVM and ran the complete JUnit test suite successfully, except
> for some tests, such as the RMI test, which require special grants to open
> ports inside the database.
>   Lucene's test cases run faster inside the Oracle database (11g) than on
> the Sun JDK 1.5, because the classes are automatically JIT-compiled after
> some executions.
>   I have implemented an OJVMDirectory Lucene store which replaces the file
> system storage with BLOB-based storage. Compared with a RAMDirectory
> implementation it is a bit slower, but we get all the benefits of BLOB
> storage (backup, concurrency control, and so on).
>  The OJVMDirectory is cloned from the source at
> http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory) but with some
> changes to run faster inside the Oracle JVM.
>  At the moment, I am working on a full integration with the SQL engine using
> the Data Cartridge API, that is, using Lucene as a new Oracle Domain Index.
>  With this extension we can create a Lucene inverted index on a table using:
> create index it1 on t1(f2) indextype is LuceneIndex parameters('test');
>  assuming that the table t1 has a column f2 of type VARCHAR2, CLOB or
> XMLType. After this, the Lucene inverted index can be queried using a new
> Oracle operator:
> select * from t1 where contains(f2, 'Marcelo') = 1;
>  The important point here is that this query is integrated with the execution
> plan of the Oracle database. In this simple example the Oracle optimizer
> sees that the column "f2" is indexed with the Lucene Domain Index; then,
> using the Data Cartridge API, Java code running inside the Oracle JVM is
> executed to open the search, fetch all the ROWIDs that match "Marcelo" and
> get the rows using those pointers.
> Here is the output:
> SELECT STATEMENT                                      ALL_ROWS      3       1       115
>        TABLE ACCESS(BY INDEX ROWID) LUCENE.T1          3       1       115
>             DOMAIN INDEX LUCENE.IT1
>  Another benefit of using the Data Cartridge API is that when rows of table
> T1 are inserted, updated or deleted, a corresponding Java method is called to
> automatically update the Lucene index.
>   There is a simple HTML file with some explanation of the code.
>    The install.sql script is not fully tested and must be launched inside
> the Oracle database, not remotely.
>   Best regards, Marcelo.
> - For Oracle users the big question is: why use Lucene instead of Oracle
> Text, which is implemented in C?
>   I think the answer is simple: Lucene is open source, so anybody can
> extend it and add the functionality needed.
> - For Lucene users who try to use Lucene as an enterprise search engine, the
> Oracle JVM provides a highly scalable container which can scale up to
> 10,000+ concurrent sessions, with the facility of querying tables in the
> same memory space.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


