[jira] Commented: (LUCENE-937) Make CachingTokenFilter faster

Mark Miller (JIRA) Thu, 21 Jun 2007 19:02:47 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507092 ]


Mark Miller commented on LUCENE-937:
------------------------------------

The 15,000 calls are each on a separate document. The documents are relatively 
small: newspaper articles from Reuters. Anything smaller would have to be 
very small.

I have again carefully tested LinkedList.get() vs. the LinkedList iterator, and 
the performance is identical as far as I can tell.
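
For reference, the two traversal styles being compared can be sketched roughly 
as below. This is a minimal illustration, not the actual test code: the class 
name TraversalSketch and both method names are hypothetical, and plain Strings 
stand in for Lucene Token objects. Note that java.util.LinkedList.get(i) walks 
from whichever end of the list is closer to i, so the two loops do different 
amounts of work per element even when timings on short lists look the same.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class TraversalSketch {
    // Index-based walk: on a LinkedList, each get(i) traverses nodes
    // from the nearer end of the list to reach position i.
    static int countByIndex(List<String> tokens) {
        int chars = 0;
        for (int i = 0; i < tokens.size(); i++) {
            chars += tokens.get(i).length();
        }
        return chars;
    }

    // Iterator-based walk: each next() advances one node.
    static int countByIterator(List<String> tokens) {
        int chars = 0;
        for (Iterator<String> it = tokens.iterator(); it.hasNext();) {
            chars += it.next().length();
        }
        return chars;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the cached tokens of one small document.
        List<String> tokens = new LinkedList<String>();
        for (int i = 0; i < 200; i++) {
            tokens.add("tok" + i);
        }
        System.out.println("by index:    " + countByIndex(tokens));
        System.out.println("by iterator: " + countByIterator(tokens));
    }
}
```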

I'll do more work to prove my case when I get a free moment, but just to be 
clear:

I am using small documents (15,000 different docs of varying size), measuring 
the total time and dividing by 15,000. The results show a 43% improvement using 
ArrayList(30). I will run a test with even smaller docs when I get a chance. In 
my work with a new Span-based Highlighter, I need this speed or my 
implementation is slower than the old Highlighter. With this boost, my 
Span-based Highlighter is actually (very) slightly faster. If you decide to 
keep things as they are I will have to roll an alternate CachingTokenFilter for 
my Highlighter (no problem of course <g>).
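
The measurement described above (total time over all documents, divided by the 
document count) can be sketched along these lines. This is only an assumed 
outline, not the benchmark actually used: CacheTimingSketch, cacheAndReplay and 
averageNanosPerDoc are hypothetical names, the documents are synthetic String 
arrays rather than Reuters articles run through an analyzer, and there is no 
JIT warm-up, so absolute numbers from it should not be trusted.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;

public class CacheTimingSketch {
    // Fill a fresh cache with one document's tokens, then replay it once,
    // which is roughly the per-document work of a caching filter.
    static int cacheAndReplay(String[] docTokens, Supplier<List<String>> newCache) {
        List<String> cache = newCache.get();
        for (String token : docTokens) {
            cache.add(token);
        }
        int replayed = 0;
        for (String token : cache) {
            replayed++;
        }
        return replayed;
    }

    // Total elapsed time over all documents, divided by the document count.
    static double averageNanosPerDoc(String[][] docs, Supplier<List<String>> newCache) {
        long sink = 0; // keeps the work from being optimized away
        long start = System.nanoTime();
        for (String[] doc : docs) {
            sink += cacheAndReplay(doc, newCache);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unreachable");
        }
        return (double) elapsed / docs.length;
    }

    public static void main(String[] args) {
        // Synthetic stand-ins for 15,000 small documents of varying size.
        String[][] docs = new String[15000][];
        for (int d = 0; d < docs.length; d++) {
            docs[d] = new String[50 + d % 200];
            for (int i = 0; i < docs[d].length; i++) {
                docs[d][i] = "tok" + i;
            }
        }
        double linked = averageNanosPerDoc(docs, () -> new LinkedList<String>());
        double array = averageNanosPerDoc(docs, () -> new ArrayList<String>(30));
        System.out.printf("LinkedList:    %.0f ns/doc%n", linked);
        System.out.printf("ArrayList(30): %.0f ns/doc%n", array);
    }
}
```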

Perhaps it is best to just leave things as they are and if you need more 
performance on docs with more than a handful of tokens, make your own Caching 
Filter. If the common case is closer to docs the size of newspaper articles or 
larger, a 43% gain is hard to ignore. 

I will get back about the speed when using very short documents.

>>Only the pointers to the objects are contiguous, right? 
One of these days I will actually make that transition from C++ to Java <g>. I 
don't know where the speed is coming from then...but it's a heck of a difference.

- Mark


> Make CachingTokenFilter faster
> ------------------------------
>
>                 Key: LUCENE-937
>                 URL: https://issues.apache.org/jira/browse/LUCENE-937
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachingTokenFilter.patch
>
>
> The wrong data structure was used for the CachingTokenFilter. It should be an 
> ArrayList rather than a LinkedList. There is a noticeable difference in speed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


