[ https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-2939:
--------------------------------

    Description: 
Huge documents can be drastically slower than they need to be because the entire field is added to the MemoryIndex.

This cost can be greatly reduced in many cases if we try to respect maxDocCharsToAnalyze.

Things can be improved even further by respecting this setting with CachingTokenStream as well.

  was:
Huge documents can be drastically slower than they need to be because the entire field is added to the MemoryIndex.

This cost can be greatly reduced in many cases if we try to respect maxDocCharsToAnalyze.

The cost is still not fantastic, but it is at least improved in many situations and can be influenced with this change.


> Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2939
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2939
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the entire field is added to the MemoryIndex.
> This cost can be greatly reduced in many cases if we try to respect maxDocCharsToAnalyze.
> Things can be improved even further by respecting this setting with CachingTokenStream.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
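
[Editor's note] A minimal sketch of the idea described above, assuming a hypothetical OffsetLimitingTokenFilter (the name and wiring are illustrative assumptions, not the attached LUCENE-2939.patch, which changes WeightedSpanTermExtractor itself): the filter stops emitting tokens once a token's start offset passes maxDocCharsToAnalyze, so MemoryIndex only indexes the analyzed prefix of a huge field instead of the whole thing.

{code:java}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Hypothetical sketch; the real fix lives in WeightedSpanTermExtractor
// (see the attached LUCENE-2939.patch).
public final class OffsetLimitingTokenFilter extends TokenFilter {
  private final int maxDocCharsToAnalyze;
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  public OffsetLimitingTokenFilter(TokenStream input, int maxDocCharsToAnalyze) {
    super(input);
    this.maxDocCharsToAnalyze = maxDocCharsToAnalyze;
  }

  @Override
  public boolean incrementToken() throws IOException {
    // End the stream as soon as a token starts past the character limit,
    // so downstream consumers (e.g. MemoryIndex) never see the huge tail.
    return input.incrementToken()
        && offsetAtt.startOffset() < maxDocCharsToAnalyze;
  }
}
{code}

Wiring it in front of MemoryIndex would then look roughly like this (analyzer, fieldName, text, and maxDocCharsToAnalyze are assumed to be in scope):

{code:java}
MemoryIndex index = new MemoryIndex();
TokenStream ts = analyzer.tokenStream(fieldName, new java.io.StringReader(text));
// Only the first maxDocCharsToAnalyze chars of the field get indexed.
index.addField(fieldName, new OffsetLimitingTokenFilter(ts, maxDocCharsToAnalyze));
{code}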