[ https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-2939:
--------------------------------

    Description: 
Huge documents can be drastically slower than they need to be because the entire field is added to the MemoryIndex.

This cost can be greatly reduced in many cases if we try to respect maxDocCharsToAnalyze.

Things can be improved even further by respecting this setting with CachingTokenStream as well.

  was:
Huge documents can be drastically slower than they need to be because the entire field is added to the MemoryIndex.

This cost can be greatly reduced in many cases if we try to respect maxDocCharsToAnalyze.

The cost is still not fantastic, but it is at least improved in many situations and can be influenced with this change.


> Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2939
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2939
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the entire field is added to the MemoryIndex.
> This cost can be greatly reduced in many cases if we try to respect maxDocCharsToAnalyze.
> Things can be improved even further by respecting this setting with CachingTokenStream.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
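
[Editor's note] A minimal sketch of the idea described above, assuming a hypothetical OffsetLimitingTokenFilter (the name and wiring are illustrative assumptions, not the attached LUCENE-2939.patch, which changes WeightedSpanTermExtractor itself): the filter stops emitting tokens once a token's start offset passes maxDocCharsToAnalyze, so MemoryIndex only indexes the analyzed prefix of a huge field instead of the whole thing.

{code:java}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Hypothetical sketch; the real fix lives in WeightedSpanTermExtractor
// (see the attached LUCENE-2939.patch).
public final class OffsetLimitingTokenFilter extends TokenFilter {
  private final int maxDocCharsToAnalyze;
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  public OffsetLimitingTokenFilter(TokenStream input, int maxDocCharsToAnalyze) {
    super(input);
    this.maxDocCharsToAnalyze = maxDocCharsToAnalyze;
  }

  @Override
  public boolean incrementToken() throws IOException {
    // End the stream as soon as a token starts past the character limit,
    // so downstream consumers (e.g. MemoryIndex) never see the huge tail.
    return input.incrementToken()
        && offsetAtt.startOffset() < maxDocCharsToAnalyze;
  }
}
{code}

Wiring it in front of MemoryIndex would then look roughly like this (analyzer, fieldName, text, and maxDocCharsToAnalyze are assumed to be in scope):

{code:java}
MemoryIndex index = new MemoryIndex();
TokenStream ts = analyzer.tokenStream(fieldName, new java.io.StringReader(text));
// Only the first maxDocCharsToAnalyze chars of the field get indexed.
index.addField(fieldName, new OffsetLimitingTokenFilter(ts, maxDocCharsToAnalyze));
{code}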