[ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470090
 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Looks like a good start, Mark - thanks for contributing this!

I've had a quick play and have identified the following issues:

1) Fieldname "contents" shouldn't be hardcoded into the Highlighter - different 
analyzers can behave differently for different fields (see 
PerFieldAnalyzerWrapper). Either pass a fieldname parameter or do as the 
existing highlighter does and take a TokenStream. The latter approach has the 
advantage of being able to avoid re-analysis and make use of any stored 
TermVectors (see TokenSources.java).
2) Analyzers which produce overlapping tokens (see the synonym analyzer in the 
existing highlighter JUnit test) are problematic in the submitted code. As I 
recall, the "TokenGroup" class in the existing highlighter was introduced to 
help cater for these "overlap" scenarios.
3) Without wishing to resurrect the whole 1.4 vs 1.5 debate, I believe Lucene 
still targets Java 1.4, so the 1.5-only features (generics) would need to be 
removed.
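
To make the "overlap" concern in point 2 concrete, here is a rough sketch of 
the grouping idea: tokens whose character offsets overlap are merged into one 
region so that the region is highlighted once. This is not the TokenGroup class 
from the existing highlighter; the names OverlapGroupingSketch, Tok and group 
are made up for illustration, the offsets are hand-written rather than produced 
by a real analyzer, and the code uses generics for brevity, so it would need 
adjusting for Java 1.4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or of the attached patch.
class OverlapGroupingSketch {

    /** A token reduced to its text and character offsets (end is exclusive). */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Merges tokens whose offsets overlap into [start, end) regions.
     * Assumes tokens arrive ordered by start offset.
     */
    static List<int[]> group(List<Tok> tokens) {
        List<int[]> groups = new ArrayList<int[]>();
        int[] current = null;
        for (Tok t : tokens) {
            if (current != null && t.start < current[1]) {
                // Overlaps the open region: extend it instead of starting a new one.
                current[1] = Math.max(current[1], t.end);
            } else {
                // Distinct token: start a new region.
                current = new int[] { t.start, t.end };
                groups.add(current);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // "football match", with a synonym injected over the first word.
        List<Tok> tokens = new ArrayList<Tok>();
        tokens.add(new Tok("football", 0, 8));
        tokens.add(new Tok("soccer", 0, 8));
        tokens.add(new Tok("match", 9, 14));
        for (int[] g : group(tokens)) {
            System.out.println(g[0] + "-" + g[1]);
        }
    }
}
```

Without some grouping of this kind, a synonym emitted over the same offsets as 
the original term could cause the same stretch of text to be marked up twice.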

To rectify these points, it's not clear to me whether it would be quicker to 
fix your code or to adapt the existing highlighter code to use spans.
Thoughts?

Thanks again,
Mark





 

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>         Environment: There are prob a few Java 1.5 requirements (generics) 
> that could easily be removed.
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, 
> Highlighter.java, HighlighterTest.java, QuerySpansExtractor.java, 
> SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting 
> approach to the existing highlighter in contrib. See 
> http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

