[jira] Commented: (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

Robert Muir (JIRA) Fri, 04 Mar 2011 05:35:03 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002602#comment-13002602 ]


Robert Muir commented on LUCENE-2939:
-------------------------------------

{quote}
I mind your attitude. Changing the issue target 2 seconds after Grant with no 
discussion. Declaring on your own that it won't get in. Not trying to get to a 
real conversation about the issue (which you clearly don't fully understand if 
you think storing term vectors will help). These things are my issue, not any 
so-called push back.
{quote}

It's not an attitude, and it's not personal. It's an attempt to stop last-minute 
changes from being shoved into the release right before the RC, especially when 
they're not fully formed patches ready to be committed.

{quote}
Well man, you need us on your team too. A performance bug is a technically valid 
reason for a -1 on a release. I'm not threatening that - but I'm pointing out 
that everyone needs to be on board - not just the RM. Taking the time for fair 
discussion is not a waste of time.
{quote}

I totally agree with you here. But some people might say that if the bug has 
been around since, say, 2.4 or 2.9, it's not critical that it be fixed in 3.1 at 
the last minute, and still +1 the release.

As I stated earlier on this issue, I'm sympathetic to performance bugs: 
performance bugs are bugs too. But we need to weigh risk against reward here.

Just don't forget that there are other performance problems with large 
documents in Lucene (some have been around a while), and we aren't trying to 
shove in last-minute fixes for those.

So, here are my questions:
# In what version of Lucene was this performance bug introduced? Is it 
something we introduced in 3.1? If so, it's more serious than if it's 
something that's been around since 2.9.
# Why is the fast-vector highlighter with term vectors (TVs) "ok", but the 
highlighter with TVs slow?


> Highlighter should try and use maxDocCharsToAnalyze in 
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
> when using CachingTokenStream
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2939
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2939
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the 
> entire field is added to the MemoryIndex.
> This cost can be greatly reduced in many cases if we try to respect 
> maxDocCharsToAnalyze.
> Things can be improved even further by respecting this setting with 
> CachingTokenStream.
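The description above boils down to not analyzing text beyond maxDocCharsToAnalyze. Below is a minimal, self-contained Java sketch of that idea. It deliberately avoids Lucene classes: `Token`, `whitespaceTokens` and `limit` are hypothetical names, not Lucene APIs, and the cutoff rule (stop at the first token whose start offset reaches the limit) is an assumption, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not Lucene API): pull tokens lazily and stop as soon as
 * a token starts at or beyond maxDocCharsToAnalyze, so text past the limit is
 * never tokenized or handed to a downstream in-memory index.
 */
public class CharLimitedTokens {

    /** Hypothetical minimal token: term text plus character offsets. */
    public static final class Token {
        public final String term;
        public final int startOffset;
        public final int endOffset;

        public Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Lazy whitespace tokenizer: a token is only scanned when requested. */
    public static Iterator<Token> whitespaceTokens(String text) {
        return new Iterator<Token>() {
            private int pos = 0;
            private Token pending; // found by hasNext() but not yet returned

            private Token advance() {
                int n = text.length();
                while (pos < n && Character.isWhitespace(text.charAt(pos))) pos++;
                if (pos >= n) return null;
                int start = pos;
                while (pos < n && !Character.isWhitespace(text.charAt(pos))) pos++;
                return new Token(text.substring(start, pos), start, pos);
            }

            @Override
            public boolean hasNext() {
                if (pending == null) pending = advance();
                return pending != null;
            }

            @Override
            public Token next() {
                if (!hasNext()) throw new NoSuchElementException();
                Token current = pending;
                pending = null;
                return current;
            }
        };
    }

    /** Keeps tokens whose start offset is below the limit, then stops pulling. */
    public static List<Token> limit(Iterator<Token> tokens, int maxDocCharsToAnalyze) {
        List<Token> kept = new ArrayList<>();
        while (tokens.hasNext()) {
            Token t = tokens.next();
            if (t.startOffset >= maxDocCharsToAnalyze) {
                break; // stop pulling: later tokens are never produced
            }
            kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        String field = "alpha beta gamma delta";
        for (Token t : limit(whitespaceTokens(field), 10)) {
            System.out.println(t.term + " [" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

Running `main` prints `alpha [0,5)` and `beta [6,10)`; `gamma` is tokenized but rejected, and `delta` is never tokenized at all. The saving comes from the tokenizer being lazy: the cutoff only helps if whatever consumes the tokens stops pulling, rather than analyzing the whole field first and filtering afterwards.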

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

