[jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index

Karl Wettin (JIRA) Thu, 20 Apr 2006 09:33:09 -0700

    [ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12375394 ]


Karl Wettin commented on LUCENE-550:
------------------------------------

> > You might notice that norms are float[] and not byte[]. I refactored that
> > to see if it would do any good. Bit shifting doesn't take many ticks, so I
> > might just revert that.

> Since there are only 256 byte values, many scorers use a simple lookup table,
> Similarity.getNormDecoder(). After I sped up norm decoding, a lookup table
> was only marginally faster anyway (see comments in the SmallFloat class). So
> I wouldn't expect float[] norms to be measurably faster than byte[] norms in
> the context of a complete search.
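
For readers following along, the two decoding strategies mentioned in the quote can be sketched as below. This is a minimal, self-contained illustration and not Lucene source: the class and method names are hypothetical, and the shift and bias constants are an assumption modeled on the 3-mantissa-bit, zero-exponent-15 byte format that the SmallFloat class is understood to use.

```java
// Hypothetical sketch, not Lucene source. Names and constants are assumptions.
final class NormDecodeSketch {

    // Decode by bit shifting: move the byte into the high bits of an
    // IEEE 754 float bit pattern, then add an exponent bias.
    public static float decode(byte b) {
        if (b == 0) {
            return 0.0f; // zero is special-cased
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Lookup table: there are only 256 possible byte values, so every
    // decoded float can be computed once up front.
    private static final float[] TABLE = new float[256];

    static {
        for (int i = 0; i < 256; i++) {
            TABLE[i] = decode((byte) i);
        }
    }

    // Decode by table lookup; the mask turns the signed byte into an index 0..255.
    public static float decodeWithTable(byte b) {
        return TABLE[b & 0xff];
    }

    public static void main(String[] args) {
        byte[] norms = {0, 124, (byte) 255};
        for (byte n : norms) {
            System.out.println((n & 0xff) + " -> " + decode(n)
                    + " (table: " + decodeWithTable(n) + ")");
        }
    }
}
```

Both paths return the same value for every byte. A float[] of norms simply stores the decoded values ahead of time, at four times the memory of a byte[].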

The hypothesis is that instantiation and unnecessary data parsing are the bad 
guys. Converting bytes to floats fits that profile, so I moved it to the 
IO classes (readFloat -> readByte). I realize that for the norms alone it 
is a marginal win, but if I find enough of these things it might show in the 
end. I don't know if I'll find enough things to work with, though. I've been 
looking at getting rid of things in the IndexReader, as the information it 
returns is in many situations already available in the information passed to 
the IndexReader, but I'm afraid it might be a Pyrrhic victory, as the JIT 
usually "caches" things like that automatically. There are more obvious places 
to save ticks, e.g. replacing collections with arrays.
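
As a rough illustration of the collections-versus-arrays point, here is a minimal sketch. The class and method names are hypothetical and not taken from the attached code, and whether the difference is measurable in practice depends on the JVM and the JIT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not from the attached code.
final class PostingSumSketch {

    // Collection-backed: each element is a boxed Integer that is unboxed on read.
    public static long sumList(List<Integer> freqs) {
        long total = 0;
        for (Integer f : freqs) {
            total += f;
        }
        return total;
    }

    // Array-backed: primitives are stored directly, with no boxing.
    public static long sumArray(int[] freqs) {
        long total = 0;
        for (int f : freqs) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] asArray = {3, 1, 4, 1, 5};
        List<Integer> asList = new ArrayList<>();
        for (int f : asArray) {
            asList.add(f);
        }
        System.out.println(sumList(asList) + " == " + sumArray(asArray));
    }
}
```

Both methods compute the same result. The array version avoids one object per element and the iterator allocation, which is the kind of saving meant above.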

> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>          Key: LUCENE-550
>          URL: http://issues.apache.org/jira/browse/LUCENE-550
>      Project: Lucene - Java
>         Type: New Feature
>   Components: Store
>     Versions: 1.9
>     Reporter: Karl Wettin
>  Attachments: Document.java, InstanciatedIndex.java, Term.java
>
> After fixing the bugs, it's now 4.5 to 5 times the speed. This is true at 
> both index and query time. Sorry if I got your hopes up too much. There 
> are still things to be done, though. I might not have time to do anything 
> with this until next month, so here is the code if anyone wants a peek.
> Not good enough for Jira yet, but if someone wants to fool around with it, 
> here it is. The implementation passes a TermEnum -> TermDocs -> Fields -> 
> TermVector comparison against the same data in a Directory.
> When it comes to features, offsets don't exist, and positions are stored in 
> an ugly way and have bugs.
> You might notice that norms are float[] and not byte[]. I refactored that 
> to see if it would do any good. Bit shifting doesn't take many ticks, so I 
> might just revert that.
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> ii.new InstanciatedIndexReader();
> ii.addDocument(s).. replace IndexWriter for now.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


