[
https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated LUCENE-2939:
--------------------------------
Description:
Huge documents can be drastically slower than they need to be because the
entire field is added to the MemoryIndex.
This cost can be greatly reduced in many cases if we try to respect
maxDocCharsToAnalyze.
Things can be improved even further by respecting this setting with
CachingTokenStream.
was:
Huge documents can be drastically slower than they need to be because the
entire field is added to the MemoryIndex.
This cost can be greatly reduced in many cases if we try to respect
maxDocCharsToAnalyze.
The cost is still not fantastic, but it is at least improved in many
situations and can be influenced by this change.
> Highlighter should try to use maxDocCharsToAnalyze in
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as
> when using CachingTokenStream
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-2939
> URL: https://issues.apache.org/jira/browse/LUCENE-2939
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/highlighter
> Reporter: Mark Miller
> Assignee: Mark Miller
> Priority: Minor
> Attachments: LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the
> entire field is added to the MemoryIndex.
> This cost can be greatly reduced in many cases if we try to respect
> maxDocCharsToAnalyze.
> Things can be improved even further by respecting this setting with
> CachingTokenStream.
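
A minimal sketch of the idea (the class and names below are hypothetical,
not necessarily the patch's actual API): wrap the field's TokenStream in a
filter that stops emitting tokens once a token's end offset passes
maxDocCharsToAnalyze, then hand that limited stream to MemoryIndex, or to a
CachingTokenFilter, so only the leading portion of a huge field is ever
analyzed.

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

    // Hypothetical sketch: cut the stream off once the analyzed
    // character count passes maxDocCharsToAnalyze.
    public final class LimitedOffsetTokenFilter extends TokenFilter {
      private final int maxDocCharsToAnalyze;
      private final OffsetAttribute offsetAtt =
          addAttribute(OffsetAttribute.class);

      public LimitedOffsetTokenFilter(TokenStream in, int maxDocCharsToAnalyze) {
        super(in);
        this.maxDocCharsToAnalyze = maxDocCharsToAnalyze;
      }

      @Override
      public boolean incrementToken() throws IOException {
        // Pass tokens through until one ends beyond the char budget;
        // everything after that point is simply never produced.
        return input.incrementToken()
            && offsetAtt.endOffset() <= maxDocCharsToAnalyze;
      }
    }

With a filter like this, the same limit applies in both places the issue
mentions, e.g. (assuming an Analyzer analyzer and a MemoryIndex memoryIndex
are in scope):

    TokenStream limited = new LimitedOffsetTokenFilter(
        analyzer.tokenStream("body", new StringReader(text)),
        maxDocCharsToAnalyze);
    memoryIndex.addField("body", limited);  // MemoryIndex sees only the prefix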