Re: Finding out which fields matched the query

Shai Erera Mon, 27 Jun 2022 04:46:39 -0700

Thanks Alan. Yeah, I guess I was thinking about the use case I described,
which (usually) involves simple term queries, but you're definitely right
about complex boolean clauses as well as non-term queries.


I think the case for highlighter is different though? I mean you usually
generate highlights only for the top-K results and therefore are probably
less affected by whether the matches() API is slower than a Collector. And
if you invoke the API for every document in the index, it might be much
slower (depending on the index size) than the Collector.

Maybe a hybrid approach which runs the query and caches the docs in a
DocIdSet (like FacetsCollector does) and then invokes the matches() API
only on those hits, will let you enjoy the best of both worlds? Assuming
though that the number of matching documents is not huge.
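
A rough sketch of the top-K variant, in case it helps. It assumes the Lucene
9.x API; the class and method names (TopKMatchingFields, matchingFields) are
made up for illustration, and it takes the hits from a plain top-K search
rather than from a cached DocIdSet, so treat it as a starting point only:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Matches;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.Weight;

public final class TopKMatchingFields {

  private TopKMatchingFields() {}

  /**
   * Hypothetical helper: runs the query once for ranking, then calls
   * Weight.matches() only for the top-K hits. Returns global doc ID -> matching fields.
   */
  public static Map<Integer, List<String>> matchingFields(
      IndexSearcher searcher, Query query, int topK) throws IOException {
    TopDocs top = searcher.search(query, topK);

    // Scores are not needed to find out which fields matched.
    Weight weight =
        searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1.0f);
    List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();

    Map<Integer, List<String>> result = new LinkedHashMap<>();
    for (ScoreDoc hit : top.scoreDocs) {
      // Weight.matches() expects a segment-relative doc ID, so locate the segment first.
      LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(hit.doc, leaves));
      Matches matches = weight.matches(leaf, hit.doc - leaf.docBase);

      List<String> fields = new ArrayList<>();
      if (matches != null) { // null means the document did not match
        for (String field : matches) {
          fields.add(field);
        }
      }
      result.put(hit.doc, fields);
    }
    return result;
  }
}
```

The cost of matches() is then bounded by K instead of by the total number of
matching documents.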

So it seems there are several options and one should choose based on their
use case. Do you see an advantage for Lucene to offer a Collector for this
use case? Or should we tell users to use the matches API?

Shai

On Mon, Jun 27, 2022 at 2:22 PM Dawid Weiss <dawid.we...@gmail.com> wrote:

> A side note - I've been using a highlighter based on matches API for
> quite some time now and it's been fantastic. Very precise and handles
> non-trivial queries (interval queries) very well.
>
>
> https://lucene.apache.org/core/9_2_0/highlighter/org/apache/lucene/search/matchhighlight/package-summary.html
>
> Dawid
>
> On Mon, Jun 27, 2022 at 1:10 PM Alan Woodward <romseyg...@gmail.com>
> wrote:
> >
> > Your approach is almost certainly more efficient, but it might give you
> > false matches in some cases - for example, if you have a complex query
> > with many nested MUST and SHOULD clauses, you can have a leaf TermScorer
> > that is positioned on the correct document, but which is part of a clause
> > that doesn’t actually match.  It also only works for term queries, so it
> > won’t match phrases or span/interval groups.  And Matches will work on
> > points or docvalues queries as well.  The reason I added Matches in the
> > first place was precisely to handle these weird corner cases - I had
> > written highlighters which more or less did the same thing you describe
> > with a Collector and the Scorable tree, and I would occasionally get bad
> > highlights back.
> >
> > On 27 Jun 2022, at 10:51, Shai Erera <ser...@gmail.com> wrote:
> >
> > Out of curiosity and for education purposes, is the Collector approach I
> > proposed wrong/inefficient? Or less efficient than the matches() API?
> >
> > I'm thinking, if you want to both match/rank documents and as a side
> > effect know which fields matched, the Collector will perform better than
> > Weight.matches(), but I could be wrong.
> >
> > Shai
> >
> > On Mon, Jun 27, 2022 at 11:57 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
> >>
> >> The matches API is awesome. Use it. You can also get a rough glimpse
> >> into a superset of fields potentially matching the query via:
> >>
> >>     query.visit(
> >>         new QueryVisitor() {
> >>           @Override
> >>           public boolean acceptField(String field) {
> >>             affectedFields.add(field);
> >>             return false;
> >>           }
> >>         });
> >>
> >>
> >> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/Query.html#visit(org.apache.lucene.search.QueryVisitor)
> >>
> >> I'd go with the Matches API though.
> >>
> >> Dawid
> >>
> >> On Mon, Jun 27, 2022 at 10:48 AM Alan Woodward <romseyg...@gmail.com> wrote:
> >> >
> >> > The Matches API will give you this information - it’s still likely to
> >> > be fairly slow, but it’s a lot easier to use than trying to parse
> >> > Explain output.
> >> >
> >> > Query q = ….;
> >> > Weight w = searcher.createWeight(searcher.rewrite(q),
> >> >     ScoreMode.COMPLETE_NO_SCORES, 1.0f);
> >> >
> >> > // doc is relative to the segment (context), not a global doc ID
> >> > Matches m = w.matches(context, doc);
> >> > List<String> matchingFields = new ArrayList<>();
> >> > if (m != null) { // null means the document did not match
> >> >   for (String field : m) {
> >> >     matchingFields.add(field);
> >> >   }
> >> > }
> >> >
> >> > Bear in mind that `matches` doesn’t maintain any state between calls,
> >> > so calling it for every matching document is likely to be slow; for
> >> > those cases Shai’s suggestion of using a Collector and examining
> >> > low-level scorers will perform better, but it won’t work for every
> >> > query type.
> >> >
> >> >
> >> > > On 25 Jun 2022, at 04:14, Yichen Sun <yiche...@bu.edu> wrote:
> >> > >
> >> > > Hello!
> >> > >
> >> > > I’m an MSCS student from BU and learning to use Lucene. Recently I
> >> > > have been trying to output the fields matched by a query. For
> >> > > example, for one document, there are 10 fields and 2 of them match
> >> > > the query. I want to get the names of these fields.
> >> > >
> >> > > I have tried using the explain() method and then running a regex
> >> > > over the description. However, it takes too much time.
> >> > >
> >> > > I wonder what is the efficient way to get the matched fields. Would
> >> > > you please offer some help? Thank you so much!
> >> > >
> >> > > Best regards,
> >> > > Yichen Sun
> >> >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >> >
> >>
> >
>
