[ https://issues.apache.org/jira/browse/LUCENE-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507094 ]

Mark Miller commented on LUCENE-937:
------------------------------------

> So it may be safe to say that if you can estimate the list size (avoiding 
> array grow), AL is preferable if there's no add/remove not at the end.

In the CachingTokenFilter case I don't believe it is even necessary to 
estimate the list size. Many of the documents I used had far more than 30 
tokens, yet initializing the ArrayList with a capacity larger than 30 gave no 
further benefit. I believe this is because ArrayList grows geometrically each 
time it fills up (the growth policy is not guaranteed by the API; Sun's 
implementation grows the capacity by roughly 50%). As a result, a modest increase 
in the initial capacity removes several of the early resizes, even when the list 
must grow *much* bigger than that initial capacity. The default capacity of 10 
just doesn't cut it, but 30 works great. A LinkedList, whether read through an 
iterator or via get(), seems to perform no better than an ArrayList(10).
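
To make the resize argument concrete, here is a minimal sketch that counts how 
many times a backing array would be reallocated while appending n elements, 
starting from a capacity of 10 or 30. It assumes the growth rule 
newCapacity = oldCapacity + oldCapacity / 2, which is what OpenJDK 7 and later 
use (JDK 6 used oldCapacity * 3 / 2 + 1). java.util.ArrayList does not expose its 
capacity, so the sketch simulates that assumed rule rather than measuring a real 
list. The class name ResizeCount is hypothetical.

```java
// Hypothetical sketch: simulates an ASSUMED ArrayList growth rule
// (newCapacity = oldCapacity + oldCapacity / 2, OpenJDK 7+ style).
// It does not inspect a real java.util.ArrayList, whose capacity is not exposed.
public class ResizeCount {

    // Counts reallocations needed to append `elements` items starting at `initialCapacity`.
    public static int resizes(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        int count = 0;
        for (int size = 0; size < elements; size++) {
            if (size == capacity) {
                // Grow by ~50%, but always by at least one slot.
                capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1000, 10000}) {
            System.out.printf("n=%d: capacity 10 -> %d resizes, capacity 30 -> %d resizes%n",
                    n, resizes(10, n), resizes(30, n));
        }
    }
}
```

Under this assumed rule, appending 100 elements takes 6 resizes from a capacity 
of 10 and 3 from a capacity of 30; appending 1,000 elements takes 12 versus 9.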

- Mark

> Make CachingTokenFilter faster
> ------------------------------
>
>                 Key: LUCENE-937
>                 URL: https://issues.apache.org/jira/browse/LUCENE-937
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachingTokenFilter.patch
>
>
> The wrong data structure was used for the CachingTokenFilter. It should be an 
> ArrayList rather than a LinkedList. There is a noticeable difference in speed.
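
The caching pattern the issue describes (consume a stream once, append each item 
to a cache, then replay the cache in order) only ever adds at the end and reads 
sequentially. That is the access pattern where an ArrayList is expected to beat a 
LinkedList. The following is a minimal sketch in plain Java. ReplayingCache is a 
hypothetical class, not Lucene's CachingTokenFilter or its TokenStream API. The 
initial capacity of 30 comes from Mark's comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, NOT Lucene's CachingTokenFilter: drains a source iterator
// into an ArrayList on first use, then replays the cached items on every call.
public class ReplayingCache<T> {
    // Initial capacity of 30 per Mark's comment; ArrayList's default is 10.
    private final List<T> cache = new ArrayList<>(30);
    private final Iterator<T> source;
    private boolean filled = false;

    public ReplayingCache(Iterator<T> source) {
        this.source = source;
    }

    // Returns an iterator over all items; the source is consumed only on the first call.
    public Iterator<T> iterator() {
        if (!filled) {
            while (source.hasNext()) {
                cache.add(source.next()); // append at the end: amortized O(1) for ArrayList
            }
            filled = true;
        }
        return cache.iterator(); // sequential reads over a contiguous backing array
    }

    public static void main(String[] args) {
        ReplayingCache<String> tokens =
                new ReplayingCache<>(Arrays.asList("the", "quick", "brown", "fox").iterator());
        // Both passes print the same tokens in the same order.
        for (int pass = 0; pass < 2; pass++) {
            StringBuilder sb = new StringBuilder();
            Iterator<String> it = tokens.iterator();
            while (it.hasNext()) {
                sb.append(it.next()).append(' ');
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

Swapping the ArrayList for a LinkedList would not change the behavior. It would 
only change the cost of each append and of iteration, which is the difference the 
issue is about.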

-- 
This message is automatically generated by JIRA.