A side note - I've been using a highlighter based on the Matches API for quite some time now and it's been fantastic. It is very precise and handles non-trivial queries (such as interval queries) very well.
https://lucene.apache.org/core/9_2_0/highlighter/org/apache/lucene/search/matchhighlight/package-summary.html

Dawid

On Mon, Jun 27, 2022 at 1:10 PM Alan Woodward <romseyg...@gmail.com> wrote:
>
> Your approach is almost certainly more efficient, but it might give you false
> matches in some cases - for example, if you have a complex query with many
> nested MUST and SHOULD clauses, you can have a leaf TermScorer that is
> positioned on the correct document, but which is part of a clause that
> doesn’t actually match. It also only works for term queries, so it won’t
> match phrases or span/interval groups. And Matches will work on points or
> docvalues queries as well. The reason I added Matches in the first place was
> precisely to handle these weird corner cases - I had written highlighters
> which more or less did the same thing you describe with a Collector and the
> Scorable tree, and I would occasionally get bad highlights back.
>
> On 27 Jun 2022, at 10:51, Shai Erera <ser...@gmail.com> wrote:
>
> Out of curiosity and for education purposes, is the Collector approach I
> proposed wrong/inefficient? Or less efficient than the matches() API?
>
> I'm thinking, if you want to both match/rank documents and as a side effect
> know which fields matched, the Collector will perform better than
> Weight.matches(), but I could be wrong.
>
> Shai
>
> On Mon, Jun 27, 2022 at 11:57 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
>>
>> The matches API is awesome. Use it. You can also get a rough glimpse
>> into a superset of fields potentially matching the query via:
>>
>> query.visit(
>>     new QueryVisitor() {
>>       @Override
>>       public boolean acceptField(String field) {
>>         affectedFields.add(field);
>>         return false;
>>       }
>>     });
>>
>> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/Query.html#visit(org.apache.lucene.search.QueryVisitor)
>>
>> I'd go with the Matches API though.
>>
>> Dawid
>>
>> On Mon, Jun 27, 2022 at 10:48 AM Alan Woodward <romseyg...@gmail.com> wrote:
>> >
>> > The Matches API will give you this information - it’s still likely to be
>> > fairly slow, but it’s a lot easier to use than trying to parse Explain
>> > output.
>> >
>> > Query q = ….;
>> > Weight w = searcher.createWeight(searcher.rewrite(q),
>> >     ScoreMode.COMPLETE_NO_SCORES, 1.0f);
>> >
>> > // context is the LeafReaderContext holding the document; doc is its
>> > // segment-relative id. matches() returns null if the doc does not match.
>> > Matches m = w.matches(context, doc);
>> > List<String> matchingFields = new ArrayList<>();
>> > if (m != null) {
>> >   for (String field : m) {
>> >     matchingFields.add(field);
>> >   }
>> > }
>> >
>> > Bear in mind that `matches` doesn’t maintain any state between calls, so
>> > calling it for every matching document is likely to be slow; for those
>> > cases Shai’s suggestion of using a Collector and examining low-level
>> > scorers will perform better, but it won’t work for every query type.
>> >
>> >
>> > > On 25 Jun 2022, at 04:14, Yichen Sun <yiche...@bu.edu> wrote:
>> > >
>> > > Hello!
>> > >
>> > > I’m an MSCS student at BU and am learning to use Lucene. Recently I have
>> > > been trying to output the fields matched by a query. For example, a
>> > > document has 10 fields and 2 of them match the query. I want to get the
>> > > names of those fields.
>> > >
>> > > I have tried calling the explain() method and parsing its description
>> > > with a regex. However, that takes too much time.
>> > >
>> > > I wonder what an efficient way to get the matched fields would be. Would
>> > > you please offer some help? Thank you so much!
>> > >
>> > > Best regards,
>> > > Yichen Sun
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
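
Alan's point about false matches with nested MUST and SHOULD clauses can be illustrated with a toy model. The sketch below does not use Lucene at all: `FalseMatchSketch`, `Term`, `Bool`, `leafFields` and `matchedFields` are hypothetical names made up for this illustration, and a document is modelled as a plain map from field name to its set of terms. It only shows why looking at leaf clauses in isolation can report a field that sits inside a clause that does not match as a whole, which is the kind of case a tree-aware approach such as Matches is meant to handle.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these types are hypothetical and are not Lucene classes.
final class FalseMatchSketch {

    interface Node {
        boolean matches(Map<String, Set<String>> doc);
    }

    // Leaf clause: matches when the given field of the document contains the term.
    static final class Term implements Node {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public boolean matches(Map<String, Set<String>> doc) {
            return doc.getOrDefault(field, Set.of()).contains(text);
        }
    }

    // Boolean clause: children are all MUST (required = true) or all SHOULD.
    static final class Bool implements Node {
        final boolean required;
        final List<Node> children;

        Bool(boolean required, Node... children) {
            this.required = required;
            this.children = List.of(children);
        }

        @Override
        public boolean matches(Map<String, Set<String>> doc) {
            return required
                ? children.stream().allMatch(c -> c.matches(doc))
                : children.stream().anyMatch(c -> c.matches(doc));
        }
    }

    // Leaf-only view: every field whose term occurs in the document,
    // whether or not the enclosing clause matches.
    static Set<String> leafFields(Node query, Map<String, Set<String>> doc) {
        Set<String> out = new TreeSet<>();
        collect(query, doc, false, out);
        return out;
    }

    // Tree-aware view: descend only into clauses that match as a whole.
    static Set<String> matchedFields(Node query, Map<String, Set<String>> doc) {
        Set<String> out = new TreeSet<>();
        collect(query, doc, true, out);
        return out;
    }

    private static void collect(
            Node node, Map<String, Set<String>> doc, boolean prune, Set<String> out) {
        if (prune && !node.matches(doc)) {
            return; // skip the whole non-matching clause
        }
        if (node instanceof Term) {
            Term t = (Term) node;
            if (t.matches(doc)) {
                out.add(t.field);
            }
        } else {
            for (Node child : ((Bool) node).children) {
                collect(child, doc, prune, out);
            }
        }
    }

    // title:lucene SHOULD, or (body:lucene MUST and author:alan MUST)
    static Node exampleQuery() {
        return new Bool(false,
            new Term("title", "lucene"),
            new Bool(true, new Term("body", "lucene"), new Term("author", "alan")));
    }

    static Map<String, Set<String>> exampleDoc() {
        return Map.of(
            "title", Set.of("lucene"),
            "body", Set.of("lucene"),
            "author", Set.of("shai"));
    }

    public static void main(String[] args) {
        System.out.println(leafFields(exampleQuery(), exampleDoc()));    // [body, title]
        System.out.println(matchedFields(exampleQuery(), exampleDoc())); // [title]
    }
}
```

Here the `body` term occurs in the document, but its enclosing MUST clause fails because `author:alan` is absent, so only `title` really contributes to the match. The toy model says nothing about performance, which is the other half of the trade-off discussed in the thread.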