[jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index

wolfgang hoschek (JIRA) Tue, 21 Nov 2006 19:38:34 -0800

    [ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451730 ]

wolfgang hoschek commented on LUCENE-550:
-----------------------------------------


What's the benchmark configuration? For example, is throughput bounded by 
indexing or by querying? Are you measuring N queries against a single 
preindexed document, or one precompiled query against N documents? See the line

boolean measureIndexing = false; // toggle this to measure query performance

in my driver. If measuring indexing, what kind of analyzer / token filter chain 
is used? If measuring queries, what kind of query types are in the mix, with 
which relative frequencies? 
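
To illustrate why the distinction matters, here is a minimal sketch of a 
benchmark loop with such a toggle. It is not the driver referred to above: 
BenchmarkSketch, ToyIndex, Result and run are hypothetical names, and a toy 
HashMap-based inverted index stands in for Lucene so that the example is 
self-contained. The point is only that the untimed phase stays outside the 
timed region.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BenchmarkSketch {

    /** Toy inverted index (term -> ids of matching documents); a stand-in, not Lucene. */
    static final class ToyIndex {
        private final Map<String, Set<Integer>> postings = new HashMap<>();
        private int nextDocId = 0;

        void addDocument(String text) {
            int docId = nextDocId++;
            // Assumed analysis chain: lowercase, then split on whitespace.
            for (String token : text.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
                }
            }
        }

        int countHits(String term) {
            Set<Integer> docs = postings.get(term.toLowerCase());
            return docs == null ? 0 : docs.size();
        }
    }

    /** Outcome of one run: which phase was timed, how many operations, how long. */
    static final class Result {
        final String phase;
        final int operations;
        final long totalHits;
        final long elapsedNanos;

        Result(String phase, int operations, long totalHits, long elapsedNanos) {
            this.phase = phase;
            this.operations = operations;
            this.totalHits = totalHits;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Times either indexing or querying, never both, depending on the toggle. */
    static Result run(boolean measureIndexing, List<String> docs, List<String> queries) {
        ToyIndex index = new ToyIndex();
        if (measureIndexing) {
            long start = System.nanoTime();
            for (String doc : docs) {
                index.addDocument(doc);
            }
            return new Result("indexing", docs.size(), 0L, System.nanoTime() - start);
        }
        // Build the index outside the timed region so only querying is measured.
        for (String doc : docs) {
            index.addDocument(doc);
        }
        long totalHits = 0;
        long start = System.nanoTime();
        for (String query : queries) {
            totalHits += index.countHits(query);
        }
        return new Result("querying", queries.size(), totalHits, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the quick brown fox", "the lazy dog", "quick dog");
        List<String> queries = Arrays.asList("quick", "dog");
        for (boolean measureIndexing : new boolean[] {true, false}) {
            Result r = run(measureIndexing, docs, queries);
            System.out.println(r.phase + ": " + r.operations + " operations in "
                    + r.elapsedNanos + " ns");
        }
    }
}
```

With a toggle like this, a reported speedup can be attributed to one phase. 
The analyzer chain and the query mix would still need to be stated 
separately, since the toy tokenizer and single-term lookups here are 
assumptions.
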

You may want to experiment with modifying/commenting/uncommenting various parts 
of the driver setup, for any given target scenario. Would it be possible to 
post the benchmark code, test data, queries for analysis?


> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: http://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 1.9
>            Reporter: Karl Wettin
>         Attachments: class_diagram.png, class_diagram.png, 
> instanciated_20060527.tar, InstanciatedIndexTermEnum.java, 
> lucene.1.9-karl1.jpg, lucene2-karl_20060722.tar.gz, 
> lucene2-karl_20060723.tar.gz
>
>
> After fixing the bugs, it's now 4.5 to 5 times the speed. This is true at 
> both index and query time. Sorry if I got your hopes up too much. There 
> are still things to be done, though. I might not have time to do anything 
> with this until next month, so here is the code if anyone wants a peek.
> It's not good enough for Jira yet, but if someone wants to fool around with 
> it, here it is. The implementation passes a TermEnum -> TermDocs -> Fields -> 
> TermVector comparison against the same data in a Directory.
> When it comes to features, offsets don't exist, and positions are stored in 
> an ugly way and have bugs.
> You might notice that norms are float[] and not byte[]. I refactored that 
> to see if it would do any good. Bit shifting doesn't take many ticks, so I 
> might just revert that.
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> ii.new InstanciatedIndexReader();
> ii.addDocument(s).. replace IndexWriter for now.
