[ 
https://issues.apache.org/jira/browse/LUCENE-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-6445:
---------------------------------
    Attachment: LUCENE-6445_TokenSources_simplification.patch

Attached patch.
The 2nd method name is actually "getTermVectorTokenStreamOrNull", and I decided 
that positions on the term vector needn't be a hard requirement.  

The patch adds a test for the maxStartOffset behavior. The javadocs for these 
two methods are quite complete, including a warning about multi-valued fields.  
Solr calls one of these now with the maxStartOffset, so it will benefit.  
Updating all the test calls was a bit tedious.

Also, this highlighter module now depends on analysis-common for the 
LimitTokenOffsetFilter.
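
To illustrate the maxStartOffset behavior mentioned above, here is a minimal, stdlib-only sketch of what limiting a token stream by start offset amounts to. It is a simplified model and not the code in the patch: the Token class and the limitByStartOffset helper are hypothetical names, and treating a negative limit as "no limit" is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxStartOffsetSketch {

    /** Hypothetical stand-in for a token carrying character offsets. */
    public static final class Token {
        public final String term;
        public final int startOffset;
        public final int endOffset;

        public Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Passes tokens through until one starts past maxStartOffset, then stops.
     * Assumption for this sketch: a negative maxStartOffset means "no limit".
     */
    public static List<Token> limitByStartOffset(List<Token> tokens, int maxStartOffset) {
        if (maxStartOffset < 0) {
            return new ArrayList<>(tokens);
        }
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (t.startOffset > maxStartOffset) {
                break; // the stream ends at the first token past the limit
            }
            kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("quick", 0, 5),
                new Token("brown", 6, 11),
                new Token("fox", 12, 15));
        for (Token t : limitByStartOffset(tokens, 6)) {
            System.out.println(t.term + " @ " + t.startOffset);
        }
    }
}
```

The point of such a limit is that a highlighter which only inspects the first N characters of a field does not need to consume tokens beyond that offset.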

> Highlighter TokenSources simplification; just one getAnyTokenStream()
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-6445
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6445
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-6445_TokenSources_simplification.patch
>
>
> The Highlighter "TokenSources" class has quite a few utility methods 
> pertaining to getting a TokenStream from either term vectors or analyzed 
> text.  I think it's too much:
> * Some go to term vectors, some don't.  But if you don't want to go to term 
> vectors, then it's quite easy for the caller to invoke the Analyzer for the 
> field value, and to get that field value.
> * Some methods can return null, some never do; I forget which at a glance.
> * Some methods read the Document (to get a field value) from the IndexReader, 
> some don't.  Furthermore, it's not an ideal place to get the doc since your 
> app might be using an IndexSearcher with a document cache (e.g. 
> SolrIndexSearcher).
> * None of the methods accept a Fields instance from term vectors as a 
> parameter.  Based on how Lucene's term vector format works, this is a 
> performance trap if you don't re-use an instance across fields on the 
> document that you're highlighting.
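
The last point in the quoted description (re-using one term vector instance across the fields of a document) can be sketched with a toy model. This is stdlib-only Java and does not use Lucene's API: VectorStore, load, highlightPerField and highlightReusing are hypothetical names, and the "loads" counter simply stands in for the cost of decoding a document's term vectors.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermVectorReuseSketch {

    /** Hypothetical store that counts how often a document's vectors are decoded. */
    public static final class VectorStore {
        private final Map<Integer, Map<String, List<String>>> docs = new HashMap<>();
        public int loads = 0;

        public void put(int docId, Map<String, List<String>> fields) {
            docs.put(docId, fields);
        }

        /** Each call models one (expensive) decode of all vectors for the doc. */
        public Map<String, List<String>> load(int docId) {
            loads++;
            return docs.get(docId);
        }
    }

    /** The trap: the document's vectors are loaded again for every field. */
    public static int highlightPerField(VectorStore store, int docId, List<String> fields) {
        int terms = 0;
        for (String field : fields) {
            terms += store.load(docId).getOrDefault(field, List.of()).size();
        }
        return terms;
    }

    /** The alternative: load once per document and pass the instance along. */
    public static int highlightReusing(VectorStore store, int docId, List<String> fields) {
        Map<String, List<String>> vectors = store.load(docId);
        int terms = 0;
        for (String field : fields) {
            terms += vectors.getOrDefault(field, List.of()).size();
        }
        return terms;
    }
}
```

Both methods see the same terms; the difference is that the first performs one load per field while the second performs one load per document, which is why accepting an already-loaded instance as a parameter matters.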


