[jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index

wolfgang hoschek (JIRA) Tue, 21 Nov 2006 17:36:41 -0800

    [ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451817 ]

wolfgang hoschek commented on LUCENE-550:
-----------------------------------------


> All Lucene unit tests have been adapted to work with my alternate index. 
> Everything but proximity queries pass. 

Sounds like you're almost there :-)

Regarding indexing performance with MemoryIndex: Performance is more than good 
enough. I've observed and measured that the bottleneck is often not the 
MemoryIndex itself, but rather the Analyzer type (e.g. StandardAnalyzer), 
the I/O for the input files, term lowercasing 
(http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265809), or something else 
entirely.
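
To make the "where does the time go" point concrete, here is a minimal sketch 
that times the analysis stage separately from the indexing stage. It is plain 
Java, not the Lucene API: the class StageTimingSketch, the regex tokenizer, and 
the HashMap are hypothetical stand-ins for an Analyzer and a MemoryIndex, 
chosen only so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in, not Lucene code: times analysis and indexing separately. */
final class StageTimingSketch {
    /** Accumulated time spent lowercasing and tokenizing (the "Analyzer" stand-in). */
    long analyzeNanos;
    /** Accumulated time spent updating the term map (the "index" stand-in). */
    long indexNanos;
    /** Toy in-memory index: term -> frequency. */
    final Map<String, Integer> termFreq = new HashMap<>();

    void add(String text) {
        long t0 = System.nanoTime();
        // Stage 1: lowercase, then split on runs of non-word characters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        long t1 = System.nanoTime();
        // Stage 2: record each non-empty token in the toy index.
        for (String token : tokens) {
            if (!token.isEmpty()) {
                termFreq.merge(token, 1, Integer::sum);
            }
        }
        long t2 = System.nanoTime();
        analyzeNanos += t1 - t0;
        indexNanos += t2 - t1;
    }
}
```

After feeding in a representative set of documents, comparing analyzeNanos 
with indexNanos gives a rough idea of which stage dominates. For real 
measurements the same split would have to be applied around the actual 
Analyzer and index calls, with warm-up runs to let the JIT settle.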

Regarding query performance with MemoryIndex: Some queries are more efficient 
than others. For example, fuzzy queries are much less efficient than wildcard 
queries, which in turn are much less efficient than simple term queries. Such 
effects seem partly inherent in the nature of the query type, partly a 
function of the chosen data structure (RAMDirectory, MemoryIndex, II, ...), and 
partly a consequence of the overall Lucene API design.
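
As a rough illustration of the "inherent in the query type" part, here is a 
toy model over a sorted term dictionary. It is not how Lucene's query classes 
are implemented; the class QueryCostSketch and the termsExamined counter are 
hypothetical, and the counter is only a crude proxy for cost. In this model a 
term query touches one dictionary entry, a prefix wildcard scans the entries 
sharing the prefix, and a fuzzy query computes an edit distance against every 
entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy cost model over a sorted term dictionary (term -> doc frequency). Not Lucene code. */
final class QueryCostSketch {
    /** Number of dictionary entries examined by the most recent query. */
    static int termsExamined;

    /** Term query: a single dictionary lookup. */
    static List<String> term(TreeMap<String, Integer> dict, String t) {
        termsExamined = 1;
        List<String> hits = new ArrayList<>();
        if (dict.containsKey(t)) hits.add(t);
        return hits;
    }

    /** Prefix wildcard such as "luc*": seek to the prefix, scan until it stops matching. */
    static List<String> prefixWildcard(TreeMap<String, Integer> dict, String prefix) {
        termsExamined = 0;
        List<String> hits = new ArrayList<>();
        for (String t : dict.tailMap(prefix, true).keySet()) {
            termsExamined++;
            if (!t.startsWith(prefix)) break; // sorted order: no later term can match
            hits.add(t);
        }
        return hits;
    }

    /** Fuzzy query: compute the edit distance against every term in the dictionary. */
    static List<String> fuzzy(TreeMap<String, Integer> dict, String query, int maxEdits) {
        termsExamined = 0;
        List<String> hits = new ArrayList<>();
        for (String t : dict.keySet()) {
            termsExamined++;
            if (editDistance(query, t) <= maxEdits) hits.add(t);
        }
        return hits;
    }

    /** Standard Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(subst, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```

On top of the entry count, each fuzzy comparison is itself quadratic in term 
length, so the gap between query types grows with both dictionary size and 
term length. How much of this shows up in practice still depends on the data 
structure and API, as noted above.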

The query mix found in testqueries.txt is intended more for correctness testing 
than for benchmarking. Certain query types dominate that mix, so conclusions 
about the performance of individual aspects cannot easily be drawn from it.

Wolfgang.


> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: http://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 1.9
>            Reporter: Karl Wettin
>         Attachments: class_diagram.png, class_diagram.png, 
> instanciated_20060527.tar, InstanciatedIndexTermEnum.java, 
> lucene.1.9-karl1.jpg, lucene2-karl_20060722.tar.gz, 
> lucene2-karl_20060723.tar.gz
>
>
> After fixing the bugs, it's now 4.5 -> 5 times the speed. This is true at 
> both index and query time. Sorry if I got your hopes up too much. There 
> are still things to be done though. Might not have time to do anything with 
> this until next month, so here is the code if anyone wants a peek.
> Not good enough for Jira yet, but if someone wants to fool around with it, 
> here it is. The implementation passes a TermEnum -> TermDocs -> Fields -> 
> TermVector comparison against the same data in a Directory.
> When it comes to features, offsets don't exist, and positions are stored in 
> an ugly way and have bugs.
> You might notice that norms are float[] and not byte[]. That was me 
> refactoring it to see if it would do any good. Bit shifting doesn't take many 
> ticks, so I might just revert that.
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> ii.new InstanciatedIndexReader();
> ii.addDocument(s).. replace IndexWriter for now.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
