[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516403 ]
Michael McCandless commented on LUCENE-871:
-------------------------------------------

OK, for LUCENE-969 I made yet a third option for optimizing ISOLatin1AccentFilter. In that patch I reuse the Token instance, using the char[] API for the Token's text instead of String, and I also reuse a single TokenStream instance (I did this for all core tokenizers).

I just tested the total time to tokenize all Wikipedia content with a WhitespaceTokenizer -> ISOLatin1AccentFilter chain: 1116 sec with current trunk vs. 500 sec with LUCENE-969. I separately timed just creating the documents at 112 sec and subtracted that from both times, so that only the cost of tokenization is measured. This gives a net speedup of 2.97X for this filter (1004 sec -> 388 sec).

> ISOLatin1AccentFilter a bit slow
> --------------------------------
>
>                 Key: LUCENE-871
>                 URL: https://issues.apache.org/jira/browse/LUCENE-871
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2
>            Reporter: Ian Boston
>         Attachments: fasterisoremove1.patch, fasterisoremove2.patch, ISOLatin1AccentFilter.java.patch
>
>
> The ISOLatin1AccentFilter is a bit slow, giving 300+ ms responses when used in a highlighter for output responses.
> Patch to follow
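
To illustrate the reuse idea described in the comment above, here is a minimal, self-contained sketch. It is not the LUCENE-969 patch and does not use the Lucene API: ReusableToken, WhitespaceSplitter, stripAccents and analyze are hypothetical names invented for this example. It assumes only a handful of one-to-one accent mappings, so the token length never changes; a real accent filter would also have to handle characters that expand to two letters.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenReuseSketch {

    /** Hypothetical mutable token: the text lives in a reusable char[] instead of a String. */
    static final class ReusableToken {
        char[] buffer = new char[16];
        int length;

        /** Copies src[start, end) into the buffer, growing it only when it is too small. */
        void setText(CharSequence src, int start, int end) {
            int n = end - start;
            if (buffer.length < n) {
                buffer = new char[Math.max(n, buffer.length * 2)];
            }
            for (int i = 0; i < n; i++) {
                buffer[i] = src.charAt(start + i);
            }
            length = n;
        }
    }

    /** Hypothetical tokenizer: fills the caller's token rather than allocating a new one. */
    static final class WhitespaceSplitter {
        private final CharSequence input;
        private int pos;

        WhitespaceSplitter(CharSequence input) {
            this.input = input;
        }

        /** Returns false when the input is exhausted; otherwise overwrites the token. */
        boolean next(ReusableToken token) {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
            if (pos >= input.length()) return false;
            int start = pos;
            while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) pos++;
            token.setText(input, start, pos);
            return true;
        }
    }

    /** Rewrites a few lowercase accented Latin-1 letters in place (1:1 mappings only). */
    static void stripAccents(ReusableToken token) {
        for (int i = 0; i < token.length; i++) {
            char c = token.buffer[i];
            if (c < '\u00E0' || c > '\u00FC') continue; // fast path for unaccented text
            switch (c) {
                case '\u00E0': case '\u00E1': case '\u00E2': case '\u00E4':
                    token.buffer[i] = 'a'; break;
                case '\u00E7':
                    token.buffer[i] = 'c'; break;
                case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB':
                    token.buffer[i] = 'e'; break;
                case '\u00EC': case '\u00ED': case '\u00EE': case '\u00EF':
                    token.buffer[i] = 'i'; break;
                case '\u00F2': case '\u00F3': case '\u00F4': case '\u00F6':
                    token.buffer[i] = 'o'; break;
                case '\u00F9': case '\u00FA': case '\u00FB': case '\u00FC':
                    token.buffer[i] = 'u'; break;
                default:
                    break; // leave every other character untouched
            }
        }
    }

    /** Runs the splitter -> stripAccents chain with ONE token instance for all terms. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        ReusableToken token = new ReusableToken();
        WhitespaceSplitter splitter = new WhitespaceSplitter(text);
        while (splitter.next(token)) {
            stripAccents(token);
            // A String is created here only to make the result easy to inspect.
            terms.add(new String(token.buffer, 0, token.length));
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [cafe, creme, brulee]
        System.out.println(analyze("caf\u00E9 cr\u00E8me br\u00FBl\u00E9e"));
    }
}
```

The point of the sketch is that the per-token work touches only an existing char[]: no String and no token object is allocated inside the loop, except for the String built in analyze for display. Whether this accounts for all of the speedup reported above is not something the sketch can show.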