[ https://issues.apache.org/jira/browse/LUCENE-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated LUCENE-6445:
---------------------------------
    Attachment: LUCENE-6445_TokenSources_simplification.patch

Attached patch. The 2nd method name is actually "getTermVectorTokenStreamOrNull", and I decided that positions on the term vector needn't be a hard requirement. The patch adds a test for the maxStartOffset behavior. The javadocs for these two methods are quite complete, including a warning about multi-valued fields. Solr calls one of these now with the maxStartOffset, so it will benefit. Updating all the test calls was a bit tedious. Also, this highlighter module now depends on analysis-common for the LimitTokenOffsetFilter.

> Highlighter TokenSources simplification; just one getAnyTokenStream()
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-6445
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6445
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-6445_TokenSources_simplification.patch
>
>
> The Highlighter "TokenSources" class has quite a few utility methods
> pertaining to getting a TokenStream from either term vectors or analyzed
> text. I think it's too much:
> * some go to term vectors, some don't. But if you don't want to go to term
> vectors, then it's quite easy for the caller to invoke the Analyzer for the
> field value, and to get that field value.
> * Some methods return null, some never null; I forget which at a glance.
> * Some methods read the Document (to get a field value) from the IndexReader,
> some don't. Furthermore, it's not an ideal place to get the doc since your
> app might be using an IndexSearcher with a document cache (e.g.
> SolrIndexSearcher).
> * None of the methods accept a Fields instance from term vectors as a
> parameter.
> Based on how Lucene's term vector format works, this is a
> performance trap if you don't re-use an instance across fields on the
> document that you're highlighting.
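
To illustrate the last point in the description (re-using one term vector instance across the fields of a document), here is a minimal sketch of the two call patterns. It is a toy model only: FakeReader, loadTermVectors and both helper methods are hypothetical names made up for this example, not Lucene classes or methods, and the map merely stands in for a Fields instance. The assumption is simply that loading a document's term vectors is the costly step, as the description suggests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Lucene classes. */
class TermVectorReuseSketch {

    /** Hypothetical stand-in for a reader; counts how often term vectors are loaded. */
    static final class FakeReader {
        int loads = 0;

        /** Pretends to decode all term vectors of one document (the costly step). */
        Map<String, List<String>> loadTermVectors(int docId) {
            loads++;
            Map<String, List<String>> fields = new LinkedHashMap<>();
            fields.put("title", List.of("lucene", "highlighter"));
            fields.put("body", List.of("token", "sources", "simplification"));
            return fields;
        }
    }

    /** The trap: every field triggers its own load of the same document's term vectors. */
    static List<String> termsLoadingPerField(FakeReader reader, int docId, List<String> fieldNames) {
        List<String> terms = new ArrayList<>();
        for (String field : fieldNames) {
            terms.addAll(reader.loadTermVectors(docId).getOrDefault(field, List.of()));
        }
        return terms;
    }

    /** The alternative: load once, then hand the same instance to each field lookup. */
    static List<String> termsReusingInstance(FakeReader reader, int docId, List<String> fieldNames) {
        Map<String, List<String>> tvFields = reader.loadTermVectors(docId);
        List<String> terms = new ArrayList<>();
        for (String field : fieldNames) {
            terms.addAll(tvFields.getOrDefault(field, List.of()));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> fieldNames = List.of("title", "body");

        FakeReader perField = new FakeReader();
        termsLoadingPerField(perField, 0, fieldNames);

        FakeReader reused = new FakeReader();
        termsReusingInstance(reused, 0, fieldNames);

        // Prints: loads per field: 2, loads with reuse: 1
        System.out.println("loads per field: " + perField.loads
                + ", loads with reuse: " + reused.loads);
    }
}
```

Both helpers return the same terms; only the number of loads differs, and in the per-field pattern it grows with the number of fields being highlighted. A method that accepts the already-loaded instance as a parameter lets the caller pick the second pattern.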