Re: Finding out which fields matched the query

Alan Woodward Mon, 27 Jun 2022 04:10:21 -0700

Your approach is almost certainly more efficient, but it can give you false 
matches in some cases - for example, if you have a complex query with many 
nested MUST and SHOULD clauses, you can have a leaf TermScorer that is 
positioned on the correct document, but which is part of a clause that doesn’t 
actually match.  It also only works for term queries, so it won’t handle 
phrases or span/interval groups.  Matches, by contrast, works on points and 
docvalues queries as well.  The reason I added Matches in the first place was 
precisely to handle these weird corner cases - I had written highlighters 
which more or less did the same thing you describe with a Collector and the 
Scorable tree, and I would occasionally get bad highlights back.
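
To make the false-match case concrete, here is a small self-contained sketch. 
It is a toy model in plain Java, not Lucene code: the Node, Term and Bool types 
and the matchingFields helper are hypothetical names invented for this 
illustration, and a document is reduced to a map from field name to the set of 
terms it contains. The non-strict mode reports every leaf that hits the 
document, roughly what inspecting leaf scorers does; the strict mode only 
descends into clauses that match as a whole.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FieldMatchSketch {

    /** Toy query node; illustrative only, not a Lucene class. */
    interface Node {
        boolean matches(Map<String, Set<String>> doc);

        /** Adds fields of matching leaves; in strict mode, skips clauses that fail as a whole. */
        void collect(Map<String, Set<String>> doc, boolean strict, Set<String> out);
    }

    static final class Term implements Node {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public boolean matches(Map<String, Set<String>> doc) {
            Set<String> terms = doc.get(field);
            return terms != null && terms.contains(text);
        }

        public void collect(Map<String, Set<String>> doc, boolean strict, Set<String> out) {
            if (matches(doc)) {
                out.add(field);
            }
        }
    }

    static final class Bool implements Node {
        final boolean allRequired; // true: every clause is MUST; false: every clause is SHOULD
        final List<Node> clauses;

        Bool(boolean allRequired, Node... clauses) {
            this.allRequired = allRequired;
            this.clauses = Arrays.asList(clauses);
        }

        public boolean matches(Map<String, Set<String>> doc) {
            for (Node clause : clauses) {
                boolean hit = clause.matches(doc);
                if (allRequired && !hit) return false; // one failed MUST fails the clause
                if (!allRequired && hit) return true;  // one matching SHOULD is enough
            }
            return allRequired;
        }

        public void collect(Map<String, Set<String>> doc, boolean strict, Set<String> out) {
            if (strict && !matches(doc)) {
                return; // the clause fails as a whole, so its leaves are not reported
            }
            for (Node clause : clauses) {
                clause.collect(doc, strict, out);
            }
        }
    }

    static Set<String> matchingFields(Node query, Map<String, Set<String>> doc, boolean strict) {
        Set<String> out = new TreeSet<>();
        query.collect(doc, strict, out);
        return out;
    }

    /** (title:lucene AND body:matches) OR author:alan */
    static Node sampleQuery() {
        return new Bool(false,
            new Bool(true, new Term("title", "lucene"), new Term("body", "matches")),
            new Term("author", "alan"));
    }

    /** Contains the title and author terms, but not the body term. */
    static Map<String, Set<String>> sampleDoc() {
        return Map.of(
            "title", Set.of("lucene"),
            "body", Set.of("collector"),
            "author", Set.of("alan"));
    }

    public static void main(String[] args) {
        System.out.println(matchingFields(sampleQuery(), sampleDoc(), false)); // [author, title]
        System.out.println(matchingFields(sampleQuery(), sampleDoc(), true));  // [author]
    }
}
```

The document matches only through the author clause, yet the non-strict walk 
also reports title, because that leaf hits the document even though its 
enclosing MUST clause fails on body. That is the kind of bad highlight I meant.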


> On 27 Jun 2022, at 10:51, Shai Erera <ser...@gmail.com> wrote:
> 
> Out of curiosity and for education purposes, is the Collector approach I 
> proposed wrong/inefficient? Or less efficient than the matches() API?
> 
> I'm thinking, if you want to both match/rank documents and as a side effect 
> know which fields matched, the Collector will perform better than 
> Weight.matches(), but I could be wrong.
> 
> Shai
> 
> On Mon, Jun 27, 2022 at 11:57 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
> The matches API is awesome. Use it. You can also get a rough glimpse
> into a superset of fields potentially matching the query via:
> 
>     query.visit(
>         new QueryVisitor() {
>           @Override
>           public boolean acceptField(String field) {
>             affectedFields.add(field);
>             return false;
>           }
>         });
> 
> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/Query.html#visit(org.apache.lucene.search.QueryVisitor)
> 
> I'd go with the Matches API though.
> 
> Dawid
> 
> On Mon, Jun 27, 2022 at 10:48 AM Alan Woodward <romseyg...@gmail.com> wrote:
> >
> > The Matches API will give you this information - it’s still likely to be 
> > fairly slow, but it’s a lot easier to use than trying to parse Explain 
> > output.
> >
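> > Query query = ….;
> > Weight w = searcher.createWeight(searcher.rewrite(query), 
> >     ScoreMode.COMPLETE_NO_SCORES, 1.0f);
> >
> > // context is the LeafReaderContext of the segment holding the document;
> > // doc is the document id relative to that segment
> > Matches m = w.matches(context, doc);
> > List<String> matchingFields = new ArrayList<>();
> > if (m != null) {  // matches() returns null if the document does not match
> >   for (String field : m) {
> >     matchingFields.add(field);
> >   }
> > }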
> >
> > Bear in mind that `matches` doesn’t maintain any state between calls, so 
> > calling it for every matching document is likely to be slow; for those 
> > cases Shai’s suggestion of using a Collector and examining low-level 
> > scorers will perform better, but it won’t work for every query type.
> >
> >
> > > On 25 Jun 2022, at 04:14, Yichen Sun <yiche...@bu.edu> wrote:
> > >
> > > Hello!
> > >
> > > I’m an MSCS student at BU and am learning to use Lucene. Recently I have 
> > > been trying to output the fields matched by a query. For example, one 
> > > document has 10 fields and 2 of them match the query. I want to get the 
> > > names of these fields.
> > >
> > > I have tried calling the explain() method and running a regex over the 
> > > description. However, that takes too much time.
> > >
> > > I wonder what an efficient way to get the matched fields would be. Would 
> > > you please offer some help? Thank you so much!
> > >
> > > Best regards,
> > > Yichen Sun
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
> 
>
