[ https://issues.apache.org/jira/browse/LUCENE-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507094 ]
Mark Miller commented on LUCENE-937:
------------------------------------

> So it may be safe to say that if you can estimate the list size (avoiding
> array growth), AL is preferable if there's no add/remove except at the end.

In the CachingTokenFilter case, I don't believe it is even necessary to estimate the list size. Many of the documents I used had far more than 30 tokens, yet initializing the ArrayList with a larger capacity gave no benefit. I believe this is because an ArrayList grows its backing array geometrically. Sun's implementation enlarges it by about 50% on each resize; this is an implementation detail, not a guarantee. So a small increase in the initial capacity can dramatically lower the number of resizes needed, even when the list must grow *much* larger than its initial capacity. The default capacity of 10 just doesn't cut it, but 30 works great. A LinkedList (traversed with an iterator or with get()) seems to perform no better than an ArrayList with the default capacity of 10.

- Mark

> Make CachingTokenFilter faster
> ------------------------------
>
>          Key: LUCENE-937
>          URL: https://issues.apache.org/jira/browse/LUCENE-937
>      Project: Lucene - Java
>   Issue Type: Improvement
>     Reporter: Mark Miller
>     Priority: Minor
>  Attachments: CachingTokenFilter.patch
>
>
> The wrong data structure was used for the CachingTokenFilter. It should be an
> ArrayList rather than a LinkedList. There is a noticeable difference in speed.
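The resize argument in the comment above can be checked with a small simulation. This is a sketch under stated assumptions, not Lucene code. The class `GrowthSim` and its method `resizes` are hypothetical names. The growth rule `capacity + (capacity >> 1)` models OpenJDK's ArrayList since Java 7. The JDK 6 of this thread's era used `(oldCapacity * 3) / 2 + 1`, which behaves almost identically. The program counts how many reallocations are needed to cache a given number of tokens, starting from capacity 10 versus 30.

```java
public class GrowthSim {
    // Hypothetical helper: counts how many times a backing array must be
    // reallocated to hold n elements, ASSUMING it starts at initialCapacity
    // and grows by about 50% per resize (OpenJDK's ArrayList policy since
    // Java 7; the growth policy is not guaranteed by the List/ArrayList spec).
    static int resizes(int initialCapacity, int n) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < n) {
            // Guard so tiny capacities (0 or 1) still grow by at least one slot.
            capacity = Math.max(capacity + 1, capacity + (capacity >> 1));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] tokenCounts = {30, 100, 1000, 10000};
        for (int n : tokenCounts) {
            System.out.printf("%6d tokens: initial 10 -> %2d resizes, initial 30 -> %2d resizes%n",
                    n, resizes(10, n), resizes(30, n));
        }
    }
}
```

Under this model, starting at 30 instead of 10 skips the first few small reallocations. Because capacity grows geometrically, the number of resizes from either starting point grows only logarithmically with the token count.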