[
https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718773#action_12718773
]
Michael McCandless commented on LUCENE-1685:
--------------------------------------------
Why not deprecate QueryScorer? It's buggy, and leaving it in there, with such
a juicy name, looking like the right choice, just makes Lucene's
(highlighter's) quality look bad. Correctness trumps performance.
And then the javadocs should clearly favor SpanScorer... and I would include a
clear code fragment showing how to use it all, in context. EG this is what
LIA2 currently has, which is fine to copy/modify/etc. to get into the javadocs:
{code}
public void testHits() throws Exception {
  IndexSearcher searcher = new IndexSearcher(TestUtil.getBookIndexDirectory());
  TermQuery query = new TermQuery(new Term("title", "action"));
  TopDocs hits = searcher.search(query, 10);
  Highlighter highlighter = new Highlighter(null);
  Analyzer analyzer = new SimpleAnalyzer();
  for (int i = 0; i < hits.scoreDocs.length; i++) {
    Document doc = searcher.doc(hits.scoreDocs[i].doc);
    String title = doc.get("title");
    TokenStream stream =
        TokenSources.getAnyTokenStream(searcher.getIndexReader(),
                                       hits.scoreDocs[i].doc,
                                       "title",
                                       doc,
                                       analyzer);
    SpanScorer scorer = new SpanScorer(query, "title",
                                       new CachingTokenFilter(stream));
    Fragmenter fragmenter = new SimpleSpanFragmenter(scorer);
    highlighter.setFragmentScorer(scorer);
    highlighter.setTextFragmenter(fragmenter);
    String fragment =
        highlighter.getBestFragment(stream, title);
    System.out.println(fragment);
  }
}
{code}
It would also be nice to simplify that usage, eg, is there some way to not have
to make a SpanScorer (and, by extension, fragmenter) per query, but instead
make it up-front and add a setter for the new TokenStream for each doc?
(Having to create Highlighter(null) is awkward). Or I suppose we could simply
make a new Highlighter, SpanScorer, SimpleSpanFragmenter per-hit, but that
seems wasteful.
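To make the "build it up-front, then set the TokenStream per doc" idea concrete, here is a rough, self-contained sketch. Everything in it is hypothetical: ReusableScorer, setTokens and score are made-up names for illustration, not existing Highlighter APIs, and plain string lists stand in for real TokenStreams.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in Lucene. */
public class ReusableScorerSketch {

    /** Query-dependent state is built once; per-document state is swapped in via a setter. */
    static class ReusableScorer {
        private final Set<String> queryTerms; // derived from the query, built up-front
        private List<String> tokens;          // stand-in for the per-document TokenStream

        ReusableScorer(String... queryTerms) {
            this.queryTerms = new HashSet<>(Arrays.asList(queryTerms));
        }

        /** Called once per hit, instead of constructing a new scorer. */
        void setTokens(List<String> tokens) {
            this.tokens = tokens;
        }

        /** Counts how many tokens of the current document match a query term. */
        int score() {
            if (tokens == null) {
                throw new IllegalStateException("setTokens must be called before score");
            }
            int matches = 0;
            for (String token : tokens) {
                if (queryTerms.contains(token)) {
                    matches++;
                }
            }
            return matches;
        }
    }

    public static void main(String[] args) {
        ReusableScorer scorer = new ReusableScorer("action"); // once per query
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("lucene", "in", "action"),
            Arrays.asList("ant", "in", "action", "action"));
        for (List<String> doc : docs) {
            scorer.setTokens(doc);                            // once per hit
            System.out.println(scorer.score());
        }
    }
}
```

The tradeoff is that the scorer becomes stateful between hits, so a caller that forgets the setter would score against the previous document's tokens.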
> Make the Highlighter use SpanScorer by default
> ----------------------------------------------
>
> Key: LUCENE-1685
> URL: https://issues.apache.org/jira/browse/LUCENE-1685
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Mark Miller
> Assignee: Mark Miller
> Priority: Minor
> Fix For: 2.9
>
>
> I've always thought this made sense, but frankly, it took me a year to get
> the SpanScorer included with Lucene at all, so I was pretty much ready to
> move on after it got in, rather than push for it as a default.
> I think it makes sense as the default in Solr as well, and I mentioned that
> back when it was put in, but alas, it's an option there as well.
> The Highlighter package has no back compat req, but custom has been
> conservative - one reason I haven't pushed for this change before. Might be
> best to actually make the switch in 3? I could go either way - as is, I know
> a bunch of people use it, but I'm betting it's the large minority. It has
> never been listed in a changes entry and it's not in LIA 1, so you pretty much
> have to stumble upon it, and figure out what it's for.
> I'll point out again that it's just as fast as the standard scorer for any
> clause of a query that is not position sensitive. Position sensitive query
> clauses will obviously be somewhat slower to highlight, but that is because
> they will be highlighted correctly rather than ignoring position.
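As a toy illustration of the difference between ignoring position and respecting it, the sketch below highlights a two-word phrase both ways over a whitespace-split string. It is not how QueryScorer or SpanScorer are implemented; the class and method names are invented for this example, and exact adjacent-token matching stands in for real span matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration only; none of these names are Lucene APIs. */
public class PositionHighlightDemo {

    /** Term-only highlighting: marks every token that equals any phrase term. */
    static String highlightIgnoringPosition(String[] tokens, String[] phrase) {
        Set<String> terms = new HashSet<>(Arrays.asList(phrase));
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(terms.contains(token) ? "<B>" + token + "</B>" : token);
        }
        return String.join(" ", out);
    }

    /** Position-aware highlighting: marks tokens only where the whole phrase occurs in order. */
    static String highlightWithPosition(String[] tokens, String[] phrase) {
        boolean[] hit = new boolean[tokens.length];
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[start + j].equals(phrase[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                for (int j = 0; j < phrase.length; j++) {
                    hit[start + j] = true;
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            out.add(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String[] tokens = "lucene in action covers search in depth".split(" ");
        String[] phrase = {"in", "action"};
        // prints: lucene <B>in</B> <B>action</B> covers search <B>in</B> depth
        System.out.println(highlightIgnoringPosition(tokens, phrase));
        // prints: lucene <B>in</B> <B>action</B> covers search in depth
        System.out.println(highlightWithPosition(tokens, phrase));
    }
}
```

The second "in" is the kind of stray hit a position-blind scorer produces for a phrase query; the position-aware pass does extra work per candidate start position, which is where the added cost comes from.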