[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550215
]
Karl Wettin commented on LUCENE-550:
------------------------------------
{quote}
Grant Ingersoll - 10/Dec/07 02:11 PM
> courtesy of Olivier Chafik
What does this mean? He contributed the code personally or you got it from him?
In other words, do you have the authority to assign the ASF copyright for said
code?
{/quote}
Yes,
http://ochafik.free.fr/blog/?p=106
Karl Wettin said:
20 October 2007 at 7:54 pm
Hi Olivier,
I was just going nuts over the lack of offset and length parameters in
Collections.binarySearch. I was thinking that perhaps a subList would be OK,
but it turns out that the overhead of AbstractList.subList (in my case an
ArrayList) is huge. Searching the complete owner list of 5000 instances takes
1/3 of the time it takes to instantiate and binarySearch a subListIn(2500,
5000).
Google suggested your blog post.
I have based some unreleased optimizations in
http://issues.apache.org/jira/browse/LUCENE-550 on your code. Would you mind
donating it to the Apache Software Foundation? Lucene does not state author
credits in source code, only in CHANGES.TXT.
LUCENE-550 is an alternative RAM index store that is up to 100x faster than the
standard RAMDirectory and it is built to support my machine learning projects
such as http://issues.apache.org/jira/browse/LUCENE-626 and
http://issues.apache.org/jira/browse/LUCENE-1025
zOlive said:
21 October 2007 at 9:02 am
Hi Karl,
Thanks for your message, I'm happy to hear that someone actually made some use
of this code!
Apart from the offset feature, the only special thing about my code is its
relative speed for lookups in sorted integer lists, though I'm unsure whether
that is exactly your use case.
However, I will be more than pleased to contribute this tiny piece of code to
Apache, and I must say I'm a bit surprised that there isn't such a method in
any of their projects yet (say, in Jakarta Commons -
http://commons.apache.org/collections/).
Where shall I post it?
Karl Wettin said:
21 October 2007 at 4:32 pm
Thanks!
You don't need to post it anywhere; I have simply pasted it into this class of
mine and adapted it to fit my needs.
It is indeed an int[] (actually MyClass[].getInt()) I'm searching in, and the
variable pivot is most welcome.
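
To illustrate the problem discussed in the exchange above: Collections.binarySearch takes no offset or length, so searching part of a list means creating a subList view first. The following is a minimal Java sketch of a binary search restricted to a range of an array whose elements expose an int key. It is not Olivier Chafik's code and not the code in the LUCENE-550 patch. The class name RangedBinarySearch, the method signature, and the keyOf extractor are assumptions made for illustration, and it uses a plain midpoint rather than the variable pivot mentioned above.

```java
import java.util.function.ToIntFunction;

// Hypothetical helper, for illustration only.
final class RangedBinarySearch {

    /**
     * Searches items[offset .. offset + length) for an element whose int key
     * equals the given key. The range must be sorted ascending by that key.
     *
     * Returns the index of a matching element, or -(insertionPoint + 1) if
     * no element matches (the same convention as Collections.binarySearch).
     */
    static <T> int binarySearch(T[] items, int offset, int length, int key,
                                ToIntFunction<? super T> keyOf) {
        int low = offset;
        int high = offset + length - 1;
        while (low <= high) {
            // Unsigned shift avoids overflow when low + high exceeds Integer.MAX_VALUE.
            int mid = (low + high) >>> 1;
            int midKey = keyOf.applyAsInt(items[mid]);
            if (midKey < key) {
                low = mid + 1;
            } else if (midKey > key) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        // Not found: low is the position where the key would be inserted.
        return -(low + 1);
    }
}
```

Under these assumptions, a call such as binarySearch(items, 2500, 2500, key, MyClass::getInt) would search only the second half of a 5000-element array without allocating a subList view. MyClass::getInt here stands in for whatever accessor the real class provides.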
> InstantiatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
> Key: LUCENE-550
> URL: https://issues.apache.org/jira/browse/LUCENE-550
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Store
> Affects Versions: 2.0.0
> Reporter: Karl Wettin
> Assignee: Grant Ingersoll
> Attachments: HitCollectionBench.jpg,
> LUCENE-550_20071021_no_core_changes.txt, test-reports.zip
>
>
> Represented as a coupled graph of class instances, this all-in-memory index
> store implementation delivers search results up to 100 times faster than
> the file-centric RAMDirectory at the cost of greater RAM consumption.
> Performance seems to be a little bit better than log2n (binary search). No
> real data on that, just my eyes.
> Populated with a single document InstantiatedIndex is almost, but not quite,
> as fast as MemoryIndex.
> At 20,000 documents of 10-50 characters each, InstantiatedIndex outperforms
> RAMDirectory some 30x,
> 15x at 100 documents of 2000 characters length,
> and is linear to RAMDirectory at 10,000 documents of 2000 characters length.
> Mileage may vary depending on term saturation.