[
https://issues.apache.org/jira/browse/LUCENE-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507092
]
Mark Miller commented on LUCENE-937:
------------------------------------
The 15,000 calls are each on a separate document. The documents are relatively
small...newspaper articles from Reuters. Anything smaller would have to be
very small.
I have again carefully tested LinkedList get() vs. the LinkedList iterator, and
the performance is identical as far as I can tell.
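For anyone who wants to repeat the get-vs-iterator comparison, a minimal sketch
follows. It is not the test harness used above: the class name
TraversalComparison and its helper methods are hypothetical, and the timing it
prints is a rough, unwarmed measurement rather than a proper benchmark. Note
that LinkedList.get(i) walks the list from the nearer end on every call, so any
gap between the two loops should grow with list length; on lists of a few dozen
entries it may be too small to see.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper class, not part of Lucene.
public class TraversalComparison {

    // Visit every element by index. On a LinkedList each get(i) is a walk
    // from the nearer end of the list, not a direct lookup.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Visit every element through the list's iterator (one step per element).
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext();) {
            sum += it.next();
        }
        return sum;
    }

    // Build a LinkedList holding 0..size-1.
    static List<Integer> build(int size) {
        List<Integer> list = new LinkedList<Integer>();
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = build(30); // assumed size: a few dozen tokens
        int rounds = 15000;             // assumed repeat count

        long checksumIndex = 0;
        long t0 = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            checksumIndex += sumByIndex(list);
        }
        long t1 = System.nanoTime();

        long checksumIter = 0;
        for (int r = 0; r < rounds; r++) {
            checksumIter += sumByIterator(list);
        }
        long t2 = System.nanoTime();

        System.out.println("get(i):   " + (t1 - t0) / rounds + " ns per pass, checksum " + checksumIndex);
        System.out.println("iterator: " + (t2 - t1) / rounds + " ns per pass, checksum " + checksumIter);
    }
}
```

Both loops return the same sum, so the checksums double as a sanity check that
the two traversals visit the same elements.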
I'll do more work to prove my case when I get a free moment, but just to be
clear:
I am using small documents (15,000 different docs of varying sizes), measuring
the total time and dividing by 15,000. The results show a 43% improvement using
ArrayList(30). I will run a test with even smaller docs when I get a chance. In
my work with a new Span based Highlighter, I need this speed or my
implementation is slower than the old Highlighter. With this boost, my Span
based Highlighter is actually (very) slightly faster. If you decide to keep
things as they are I will have to roll an alternate CachingTokenFilter for my
Highlighter (no problem of course <g>).
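The measurement described above (total time over all documents, divided by the
document count) can be sketched roughly as follows. This is only an
illustration under stated assumptions: it uses no Lucene classes, the class
TokenCacheSketch and its methods are hypothetical stand-ins for the filter's
fill-then-replay cache, the documents are synthetic, and the numbers it prints
should not be expected to reproduce the 43% figure.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a token cache; not Lucene's CachingTokenFilter.
public class TokenCacheSketch {

    // Fill the cache with one document's tokens, then replay them through an
    // iterator. Returns the number of tokens replayed.
    static int cacheAndReplay(List<String> cache, String[] tokens) {
        for (String token : tokens) {
            cache.add(token);
        }
        int replayed = 0;
        for (Iterator<String> it = cache.iterator(); it.hasNext();) {
            it.next();
            replayed++;
        }
        return replayed;
    }

    // Total time over all documents divided by the document count,
    // using a fresh cache per document.
    static double averageNanosPerDoc(String[][] docs, boolean useArrayList) {
        long start = System.nanoTime();
        for (String[] doc : docs) {
            List<String> cache = useArrayList
                    ? new ArrayList<String>(30)   // initial capacity from the comment above
                    : new LinkedList<String>();
            cacheAndReplay(cache, doc);
        }
        return (System.nanoTime() - start) / (double) docs.length;
    }

    // Synthetic documents; real tests would use tokenized article text.
    static String[][] makeDocs(int docCount, int tokensPerDoc) {
        String[][] docs = new String[docCount][tokensPerDoc];
        for (int d = 0; d < docCount; d++) {
            for (int t = 0; t < tokensPerDoc; t++) {
                docs[d][t] = "tok" + t;
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        String[][] docs = makeDocs(15000, 200); // 200 tokens per doc is an assumption
        System.out.println("LinkedList:    " + averageNanosPerDoc(docs, false) + " ns/doc");
        System.out.println("ArrayList(30): " + averageNanosPerDoc(docs, true) + " ns/doc");
    }
}
```

A single unwarmed run like this is noisy; repeating each measurement several
times and discarding the first runs gives more trustworthy averages.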
Perhaps it is best to just leave things as they are and if you need more
performance on docs with more than a handful of tokens, make your own Caching
Filter. If the common case is closer to docs the size of newspaper articles or
larger, a 43% gain is hard to ignore.
I will get back about the speed when using very short documents.
>>Only the pointers to the objects are contiguous, right?
One of these days I will actually make that transition from C++ to Java <g>. I
don't know where the speed is coming from then...but it's a heck of a difference.
- Mark
> Make CachingTokenFilter faster
> ------------------------------
>
> Key: LUCENE-937
> URL: https://issues.apache.org/jira/browse/LUCENE-937
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Mark Miller
> Priority: Minor
> Attachments: CachingTokenFilter.patch
>
>
> The wrong data structure was used for the CachingTokenFilter. It should be an
> ArrayList rather than a LinkedList. There is a noticeable difference in speed.