[ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463377 ]
Marcelo F. Ochoa commented on LUCENE-724:
-----------------------------------------

Latest code includes:
- The Data Cartridge API is used without column data, which reduces the data stored in the queue of changes and speeds up the synchronize method.
- Query hits are cached, associated with the index searched and the string returned by the QueryParser.toString() method.
- If no ancillary operator is used in the select, the score list is not stored.
- The "Stemmer" argument is recognized as a parameter that gives the argument for the Snowball analyzer, for example:
  create index it1 on t1(f2) indextype is lucene.LuceneIndex parameters('Stemmer:English');
- Before installing the ojvm extension, it is necessary to run "ant jar-core" in the snowball directory.
- IndexWriter.setUseCompoundFile(false) is called to use multi-file storage (faster than the compound file), because there is no file descriptor limitation inside the OJVM; BLOBs are used instead of File.
- Files are marked for deletion and are purged when the Sync or Optimize methods are called.
- BLOBs are created and populated in one call using Oracle SQL RETURNING information.
- A testing script that uses the OE sample schema, with query comparisons against an Oracle Text ctxsys.context index.

TODO:
- An ODCI Stats interface implementation to give the optimizer information about the cost of using the Domain Index.
- A binding for using the FIRST_ROWS(n) optimizer hint.
- A Digester class for loading the DBLP database, for testing very big indexes.
- Support for columns with XDBUriType values.
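To make the hit-caching item above more concrete, here is a minimal sketch of one way such a cache could be keyed. It is not the code from the attachment: the class name HitsCacheSketch, its methods, the LRU eviction policy and the use of plain strings for ROWIDs are all assumptions made for illustration. The only idea taken from the comment is that cached hits are associated with the index searched and with the query string.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the attached code: an LRU cache of hit lists,
// keyed by index name plus the query string (for example, the string that
// QueryParser.toString() returns in the real extension).
class HitsCacheSketch {
    private final Map<String, List<String>> cache;

    HitsCacheSketch(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // The key combines the index name with the query string.
    private static String key(String indexName, String queryString) {
        return indexName + "\n" + queryString;
    }

    // Returns the cached ROWIDs, or null when this query was not cached.
    List<String> get(String indexName, String queryString) {
        return cache.get(key(indexName, queryString));
    }

    void put(String indexName, String queryString, List<String> rowids) {
        cache.put(key(indexName, queryString), rowids);
    }

    // Drops every cached entry of one index, e.g. after the index changes.
    void invalidate(String indexName) {
        final String prefix = indexName + "\n";
        cache.keySet().removeIf(k -> k.startsWith(prefix));
    }

    int size() {
        return cache.size();
    }
}
```

Keying on the normalized query string rather than the raw user input means that two inputs which parse to the same query would share one cache entry. Whether the real extension invalidates entries when the index is synchronized is not stated in the comment, so invalidate() here is only a guess at what a caller might need.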
> Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: https://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Oracle 10g R2 with latest patchset. There is a txt file in the lib directory listing the libraries required to compile this extension, which for legal reasons I can't redistribute. All these libraries are included in the Oracle home directory.
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-01-09-07.tar.gz, ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz
>
>
> Here is a preliminary implementation of the Oracle JVM Directory data store, which replaces the file system with BLOB data storage.
> The reasons to do this are:
> - Using a traditional file system for storing the inverted index is not a good option for some users.
> - Using BLOBs for storing the inverted index while running Lucene outside the Oracle database performs badly, because there are a lot of network round trips and data marshalling.
> - Indexing relational data stores, such as tables with VARCHAR2, CLOB or XMLType columns, with Lucene running outside the database has the same problem as the previous point.
> - The JVM included inside the Oracle database can scale up to 10,000+ concurrent threads without memory leaks or deadlocks, and all the operations on tables happen in the same memory space.
> With these points in mind, I uploaded the complete Lucene framework inside the Oracle JVM and ran the complete JUnit test suite successfully, except for some tests, such as the RMI test, which require special grants to open ports inside the database.
> The Lucene test cases run faster inside the Oracle database (11g) than on the Sun JDK 1.5, because the classes are automatically JIT-compiled after some executions.
> I have implemented an OJVMDirectory Lucene Store, which replaces the file system storage with BLOB-based storage. Compared with a RAMDirectory implementation it is a bit slower, but we get all the benefits of BLOB storage (backup, concurrency control, and so on).
> The OJVMDirectory is cloned from the source at http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory), but with some changes to run faster inside the Oracle JVM.
> At this moment, I am working on a full integration with the SQL engine using the Data Cartridge API, which means using Lucene as a new Oracle Domain Index.
> With this extension we can create a Lucene inverted index on a table using:
> create index it1 on t1(f2) indextype is LuceneIndex parameters('test');
> assuming that the table t1 has a column f2 of type VARCHAR2, CLOB or XMLType. After this, queries against the Lucene inverted index can be made using a new Oracle operator:
> select * from t1 where contains(f2, 'Marcelo') = 1;
> The important point here is that this query is integrated with the execution plan of the Oracle database. In this simple example, the Oracle optimizer sees that the column "f2" is indexed with the Lucene Domain Index; then, using the Data Cartridge API, Java code running inside the Oracle JVM is executed to open the search, fetch all the ROWIDs that match "Marcelo", and get the rows using those pointers.
> Here is the output:
> SELECT STATEMENT ALL_ROWS 3 1 115
>   TABLE ACCESS(BY INDEX ROWID) LUCENE.T1 3 1 115
>     DOMAIN INDEX LUCENE.IT1
> Another benefit of using the Data Cartridge API is that if the table T1 has insert, update or delete row operations, a corresponding Java method will be called to automatically update the Lucene index.
> There is a simple HTML file with some explanation of the code.
> The install.sql script is not fully tested and must be launched in the Oracle database, not remotely.
> Best regards, Marcelo.
> - For Oracle users the big question is: why use Lucene instead of Oracle Text, which is implemented in C? I think the answer is quite simple: Lucene is open source, and anybody can extend it and add the functionality needed.
> - For Lucene users who want to use Lucene as an enterprise search engine, the Oracle JVM provides a highly scalable container, which can scale up to 10,000+ concurrent sessions, with the facility of querying tables in the same memory space.
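
For readers unfamiliar with the idea of a store that marks files for deletion and purges them only on Sync or Optimize, the following is a rough in-memory sketch of that behavior. It is not the OJVMDirectory code: the class BlobStoreSketch and all of its method names are hypothetical, and a HashMap of byte arrays stands in for the BLOB table that the real extension keeps inside the database.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the attached OJVMDirectory: index files live as
// named byte arrays (standing in for BLOB rows), deletes are deferred.
class BlobStoreSketch {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private final Set<String> markedForDeletion = new HashSet<>();

    // Creates or overwrites a file; rewriting a marked file revives it.
    void writeFile(String name, byte[] content) {
        blobs.put(name, content.clone());
        markedForDeletion.remove(name);
    }

    // Returns a copy of the content, or null if missing or marked as deleted.
    byte[] readFile(String name) {
        if (markedForDeletion.contains(name)) {
            return null;
        }
        byte[] content = blobs.get(name);
        return content == null ? null : content.clone();
    }

    // Only marks the file; the stored data is kept until sync() runs.
    void deleteFile(String name) {
        if (blobs.containsKey(name)) {
            markedForDeletion.add(name);
        }
    }

    // Lists the files that are still visible, in sorted order.
    Set<String> list() {
        Set<String> visible = new TreeSet<>(blobs.keySet());
        visible.removeAll(markedForDeletion);
        return visible;
    }

    // Purges every marked file and returns how many were removed.
    int sync() {
        int purged = markedForDeletion.size();
        blobs.keySet().removeAll(markedForDeletion);
        markedForDeletion.clear();
        return purged;
    }

    // Number of files physically stored, including marked ones.
    int storedCount() {
        return blobs.size();
    }
}
```

Deferring the physical delete keeps deleteFile() cheap and batches the removals into one pass, which is presumably the motivation in the real extension as well, although the message does not say so explicitly.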