[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Michael McCandless (JIRA) Fri, 12 Jun 2009 03:02:35 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718773#action_12718773 ]


Michael McCandless commented on LUCENE-1685:
--------------------------------------------

Why not deprecate QueryScorer?  It's buggy, and leaving it in there, with such 
a juicy name, looking like the right choice, just makes the quality of Lucene's 
highlighter look bad.  Correctness trumps performance.

And then the javadocs should clearly favor SpanScorer, and I would include a 
clear code fragment showing how to use it all, in context.  E.g., this is what 
LIA2 currently has, which is fine to copy/modify/etc. to get into the javadocs:

{code}
  public void testHits() throws Exception {
    IndexSearcher searcher =
        new IndexSearcher(TestUtil.getBookIndexDirectory());
    TermQuery query = new TermQuery(new Term("title", "action"));
    TopDocs hits = searcher.search(query, 10);

    Highlighter highlighter = new Highlighter(null);
    Analyzer analyzer = new SimpleAnalyzer();
    
    for (int i = 0; i < hits.scoreDocs.length; i++) {
      Document doc = searcher.doc(hits.scoreDocs[i].doc);
      String title = doc.get("title");

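      TokenStream stream =
          TokenSources.getAnyTokenStream(searcher.getIndexReader(),
                                         hits.scoreDocs[i].doc,
                                         "title",
                                         doc,
                                         analyzer);
      // Cache the tokens: SpanScorer may consume the stream while
      // extracting spans, so the same cached stream is replayed below.
      CachingTokenFilter cached = new CachingTokenFilter(stream);
      SpanScorer scorer = new SpanScorer(query, "title", cached);
      Fragmenter fragmenter = new SimpleSpanFragmenter(scorer);
      highlighter.setFragmentScorer(scorer);
      highlighter.setTextFragmenter(fragmenter);

      cached.reset();  // rewind the cached tokens before highlighting
      String fragment =
          highlighter.getBestFragment(cached, title);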

      System.out.println(fragment);
    }
  }
{code}

It would also be nice to simplify that usage.  E.g., is there some way to avoid 
making a SpanScorer (and, by extension, a fragmenter) per query, and instead 
make it up front and add a setter for the new TokenStream for each doc?  
(Having to create Highlighter(null) is awkward.)  Or I suppose we could simply 
make a new Highlighter, SpanScorer, and SimpleSpanFragmenter per hit, but that 
seems wasteful.

> Make the Highlighter use SpanScorer by default
> ----------------------------------------------
>
>                 Key: LUCENE-1685
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1685
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
>
>
> I've always thought this made sense, but frankly, it took me a year to get 
> the SpanScorer included with Lucene at all, so I was pretty much ready to 
> move on after it got in, rather than push for it as a default.
> I think it makes sense as the default in Solr as well, and I mentioned that 
> back when it was put in, but alas, it's an option there as well.
> The Highlighter package has no back compat req, but custom has been 
> conservative - one reason I haven't pushed for this change before. Might be 
> best to actually make the switch in 3? I could go either way - as is, I know 
> a bunch of people use it, but I'm betting it's the large minority. It has 
> never been listed in a changes entry and it's not in LIA 1, so you pretty much 
> have to stumble upon it, and figure out what it's for.
> I'll point out again that it's just as fast as the standard scorer for any 
> clause of a query that is not position sensitive. Position sensitive query 
> clauses will obviously be somewhat slower to highlight, but that is because 
> they will be highlighted correctly rather than ignoring position.
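
To make the point about position-sensitive clauses concrete, here is a toy sketch in plain Java. It uses no Lucene classes, and the class and method names (PhraseHighlightSketch, highlightTerms, highlightPhrase) are hypothetical; it is not how QueryScorer or SpanScorer is implemented. It only contrasts marking every query term wherever it occurs with marking terms only where the whole phrase occurs in order, assuming whitespace tokenization and case-insensitive matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PhraseHighlightSketch {

  /** Term-only highlighting: marks every token equal to any phrase term. */
  static String highlightTerms(String text, String... phrase) {
    Set<String> terms = new HashSet<>(Arrays.asList(phrase));
    List<String> out = new ArrayList<>();
    for (String token : text.split(" ")) {
      boolean hit = terms.contains(token.toLowerCase());
      out.add(hit ? "<B>" + token + "</B>" : token);
    }
    return String.join(" ", out);
  }

  /** Position-aware highlighting: marks tokens only inside a full, in-order phrase match. */
  static String highlightPhrase(String text, String... phrase) {
    String[] tokens = text.split(" ");
    boolean[] hit = new boolean[tokens.length];
    // Slide a window of phrase.length tokens across the text.
    for (int start = 0; start + phrase.length <= tokens.length; start++) {
      boolean match = true;
      for (int j = 0; j < phrase.length && match; j++) {
        match = tokens[start + j].toLowerCase().equals(phrase[j]);
      }
      if (match) {
        for (int j = 0; j < phrase.length; j++) {
          hit[start + j] = true;
        }
      }
    }
    List<String> out = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      out.add(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
    }
    return String.join(" ", out);
  }

  public static void main(String[] args) {
    String text = "Ant in Action and Lucene in Action";
    // Term-only: "in" and "Action" are also marked in "Ant in Action".
    System.out.println(highlightTerms(text, "lucene", "in", "action"));
    // Position-aware: only the trailing "Lucene in Action" is marked.
    System.out.println(highlightPhrase(text, "lucene", "in", "action"));
  }
}
```

The position-aware version does extra work per phrase clause (the window scan), which mirrors the trade-off described above: somewhat slower for position-sensitive clauses, but the output matches what the query actually matched.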

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


