[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

Karl Wettin (JIRA) Mon, 10 Dec 2007 14:28:11 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550215
 ]


Karl Wettin commented on LUCENE-550:
------------------------------------

{quote}
Grant Ingersoll - 10/Dec/07 02:11 PM
> courtesy of Olivier Chafik
What does this mean? He contributed the code personally or you got it from him? 
In other words, do you have the authority to assign the ASF copyright for said 
code?
{/quote}

Yes. The exchange below is copied from the comments on this blog post:

http://ochafik.free.fr/blog/?p=106


Karl Wettin said:
20 October 2007 at 7:54 pm
Hi Olivier,

I was just going nuts over the lack of offset and length parameters in 
Collections.binarySearch. I thought that a subList might be OK, 
but it turns out that the overhead of AbstractList.subList (in my case on an 
ArrayList) is huge. Searching the complete owning list of 5000 instances 
takes 1/3 of the time it takes to instantiate and binarySearch a 
subList(2500, 5000).
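
For readers following the thread, the kind of method being asked for here can be sketched as below. This is a minimal illustration, not the code from the blog post or from the LUCENE-550 patch; the class name RangeBinarySearch and the half-open [fromIndex, toIndex) convention are assumptions made for the example. It searches a slice of a sorted List in place, so no subList view is created.

```java
import java.util.List;

/** Hypothetical helper, for illustration only. */
public class RangeBinarySearch {

    /**
     * Binary search restricted to the half-open range [fromIndex, toIndex)
     * of a list sorted in ascending natural order.
     *
     * Returns the index of the key if it is found in the range, otherwise
     * -(insertionPoint) - 1, the same convention Collections.binarySearch uses.
     */
    public static <T extends Comparable<? super T>> int binarySearch(
            List<? extends T> list, int fromIndex, int toIndex, T key) {
        if (fromIndex < 0 || toIndex > list.size() || fromIndex > toIndex) {
            throw new IndexOutOfBoundsException(
                    "fromIndex=" + fromIndex + ", toIndex=" + toIndex
                            + ", size=" + list.size());
        }
        int low = fromIndex;
        int high = toIndex - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            int cmp = list.get(mid).compareTo(key);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid; // key found
            }
        }
        return -(low + 1); // key not found; low is the insertion point
    }
}
```

With a random-access list such as ArrayList, each probe is a plain get(int) call, so the cost of the search does not include allocating a view object per lookup.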

Google suggested your blog post.

I have based some non-released optimization in 
http://issues.apache.org/jira/browse/LUCENE-550 on your code. Would you mind 
donating it to the Apache Software Foundation? Lucene does not state author 
credits in source code, only in CHANGES.TXT.

LUCENE-550 is an alternative RAM index store that is up to 100x faster than the 
standard RAMDirectory and it is built to support my machine learning projects 
such as http://issues.apache.org/jira/browse/LUCENE-626 and 
http://issues.apache.org/jira/browse/LUCENE-1025

zOlive said:
21 October 2007 at 9:02 am
Hi Karl,

Thanks for your message, I'm happy to hear that someone actually made some use 
of this code!
Apart from the offset feature, the only distinctive thing about my code is its 
relative speed for lookups in sorted integer lists, and I'm not sure whether 
that is exactly your use case.
However, I will be more than pleased to contribute this tiny piece of code to 
Apache, and I must say I'm a bit surprised that there isn't such a method in 
any of their projects yet (say, in Jakarta Commons - 
http://commons.apache.org/collections/).
Where shall I post it?

Karl Wettin said:
21 October 2007 at 4:32 pm
Thanks!

You don't need to post it anywhere, I have simply pasted it in this class of 
mine and adapted it to fit my needs.

It is indeed an int[] (actually MyClass[].getInt()) that I'm searching in, and 
the variable pivot is most welcome.
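
As an illustration of the use case described in this exchange, the sketch below searches a slice of an object array by an int key and picks its pivot by interpolation rather than always taking the midpoint. It rests on assumptions: that "variable pivot" refers to an interpolated pivot, and that the elements expose their key through a getInt() method as mentioned above. The names KeyedRangeSearch and IntKeyed are hypothetical, and this is neither the blog post's code nor the code in the attached patch.

```java
/** Hypothetical helper, for illustration only. */
public class KeyedRangeSearch {

    /** Assumed shape of the searched elements: each exposes a sortable int key. */
    public interface IntKeyed {
        int getInt();
    }

    /**
     * Searches a[offset .. offset + length - 1], sorted ascending by getInt(),
     * for the given key. The pivot is interpolated from the key values at the
     * ends of the remaining range.
     *
     * Returns the index of a matching element, otherwise -(insertionPoint) - 1.
     */
    public static int search(IntKeyed[] a, int offset, int length, int key) {
        if (offset < 0 || length < 0 || offset > a.length - length) {
            throw new ArrayIndexOutOfBoundsException(
                    "offset=" + offset + ", length=" + length);
        }
        int low = offset;
        int high = offset + length - 1;
        while (low <= high) {
            int lowKey = a[low].getInt();
            int highKey = a[high].getInt();
            if (key < lowKey) {
                return -(low + 1);  // key belongs just before a[low]
            }
            if (key > highKey) {
                return -(high + 2); // key belongs just after a[high]
            }
            int pivot;
            if (highKey == lowKey) {
                pivot = low; // all remaining keys are equal to the key
            } else {
                // Estimate the position assuming evenly spread keys;
                // long arithmetic avoids int overflow.
                long numerator = ((long) key - lowKey) * (high - low);
                pivot = low + (int) (numerator / ((long) highKey - lowKey));
            }
            int pivotKey = a[pivot].getInt();
            if (pivotKey < key) {
                low = pivot + 1;
            } else if (pivotKey > key) {
                high = pivot - 1;
            } else {
                return pivot; // key found
            }
        }
        return -(low + 1); // only reached when length == 0
    }
}
```

An interpolated pivot tends to need fewer probes than a fixed midpoint when the keys are spread fairly evenly, and more when they are heavily skewed, so whether it helps depends on the data.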

> InstantiatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: https://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>            Reporter: Karl Wettin
>            Assignee: Grant Ingersoll
>         Attachments: HitCollectionBench.jpg, 
> LUCENE-550_20071021_no_core_changes.txt, test-reports.zip
>
>
> Represented as a coupled graph of class instances, this all-in-memory index 
> store implementation delivers search results up to 100 times faster than 
> the file-centric RAMDirectory at the cost of greater RAM consumption.
> Performance seems to be a little bit better than log2(n) (binary search). No 
> real data on that, just my eyes.
> Populated with a single document, InstantiatedIndex is almost, but not quite, 
> as fast as MemoryIndex.
> At 20,000 documents of 10-50 characters length, InstantiatedIndex outperforms 
> RAMDirectory some 30x,
> 15x at 100 documents of 2000 characters length,
> and is linear to RAMDirectory at 10,000 documents of 2000 characters length.
> Mileage may vary depending on term saturation.
