For a quick hack, you can use highlighting. That does more than you want, showing which words match, but it does have the info.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jun 27, 2022, at 3:23 AM, Shai Erera <ser...@gmail.com> wrote:
>
> Thanks Uwe, I didn't know about named queries, but it seems useful. Is there
> interest in getting similar functionality in Lucene, or perhaps just the
> FieldMatching collector? I'd be happy to open a PR for it.
>
> As for the use case, I was thinking of using something similar to this
> collector for some kind of (simple) entity recognition task. If you have a
> corpus of documents with many fields which denote product attributes, you
> could match a word like "Red" to the various product attribute fields and
> determine, based on the matching fields + their doc count, whether this word
> likely represents a Color or Brand entity (hint: it matches both, the
> question is which is more probable).
>
> I'm sure there are other ways to achieve this, and probably much smarter NER
> implementations, but this one is at least based on the actual data that you
> index, which guarantees something about the results you will receive if
> applying a certain attribute filtering.
>
> Shai
>
> On Mon, Jun 27, 2022 at 1:01 PM Uwe Schindler <u...@thetaphi.de> wrote:
> I think the collector approach is perfectly fine for mass-processing of
> queries.
>
> By the way: Elasticsearch/OpenSearch have a feature already built in, and it
> works on top of the collector API in a way similar to what you described (as
> far as I remember). It is a bit different as you can tag any clause in a BQ
> (so every query) using a "name" (they call it "named query",
> https://www.elastic.co/guide/en/elasticsearch/reference/8.2/query-dsl-bool-query.html#named-queries).
> When you get the search results, for each hit it tells you which named
> queries were a match on the hit.
> The actual implementation is a wrapper
> query on each of those clauses that contains the name. In hit collection it
> just collects all named query instances found in the query tree. I think that
> in their implementation the wrapper query's scorer somehow adds the name to
> some global state.
>
> Uwe
>
> On 27.06.2022 at 11:51, Shai Erera wrote:
>> Out of curiosity and for education purposes, is the Collector approach I
>> proposed wrong/inefficient? Or less efficient than the matches() API?
>>
>> I'm thinking, if you want to both match/rank documents and as a side effect
>> know which fields matched, the Collector will perform better than
>> Weight.matches(), but I could be wrong.
>>
>> Shai
>>
>> On Mon, Jun 27, 2022 at 11:57 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
>> The matches API is awesome. Use it. You can also get a rough glimpse
>> into a superset of fields potentially matching the query via:
>>
>> // affectedFields: a collection of field names declared by the caller
>> query.visit(
>>     new QueryVisitor() {
>>       @Override
>>       public boolean acceptField(String field) {
>>         affectedFields.add(field);
>>         return false; // record the field name only, skip its terms
>>       }
>>     });
>>
>> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/Query.html#visit(org.apache.lucene.search.QueryVisitor)
>>
>> I'd go with the Matches API though.
>>
>> Dawid
>>
>> On Mon, Jun 27, 2022 at 10:48 AM Alan Woodward <romseyg...@gmail.com> wrote:
>> >
>> > The Matches API will give you this information - it’s still likely to be
>> > fairly slow, but it’s a lot easier to use than trying to parse Explain
>> > output.
>> >
>> > Query q = ….;
>> > Weight w = searcher.createWeight(searcher.rewrite(q),
>> >     ScoreMode.COMPLETE_NO_SCORES, 1.0f);
>> >
>> > // context is the LeafReaderContext of the segment holding the hit,
>> > // doc is the document id relative to that segment
>> > Matches m = w.matches(context, doc);
>> > List<String> matchingFields = new ArrayList<>();
>> > if (m != null) { // matches() returns null if the document does not match
>> >   for (String field : m) {
>> >     matchingFields.add(field);
>> >   }
>> > }
>> >
>> > Bear in mind that `matches` doesn’t maintain any state between calls, so
>> > calling it for every matching document is likely to be slow; for those
>> > cases Shai’s suggestion of using a Collector and examining low-level
>> > scorers will perform better, but it won’t work for every query type.
>> >
>> >
>> > > On 25 Jun 2022, at 04:14, Yichen Sun <yiche...@bu.edu> wrote:
>> > >
>> > > Hello!
>> > >
>> > > I’m an MSCS student at BU and am learning to use Lucene. Recently I have
>> > > been trying to output the fields matched by a query. For example, a
>> > > document has 10 fields and 2 of them match the query; I want to get the
>> > > names of those fields.
>> > >
>> > > I have tried calling the explain() method and running a regex over the
>> > > description, but that takes too much time.
>> > >
>> > > I wonder what an efficient way to get the matched fields would be. Would
>> > > you please offer some help? Thank you so much!
>> > >
>> > > Best regards,
>> > > Yichen Sun
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
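
For readers following the thread, here is a minimal sketch of the entity-guessing idea Shai describes (deciding whether "Red" is more likely a Color or a Brand from the number of documents in which each field matches). It deliberately does not use Lucene: it runs over a toy in-memory corpus where every document is a map of field name to text, and whitespace tokenization stands in for a real analyzer. The names FieldEntityGuess, docCountsByField and mostLikelyField are hypothetical and only serve the illustration; in a real index the per-field counts would come from a collector or from the Matches API discussed in this thread.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: counts, per field, how many
// documents of a toy in-memory corpus contain a given word in that field.
final class FieldEntityGuess {

    // Each document is a map of field name -> field text.
    static Map<String, Integer> docCountsByField(List<Map<String, String>> docs, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            for (Map.Entry<String, String> field : doc.entrySet()) {
                // Assumption: lowercasing + whitespace splitting replaces a real analyzer.
                for (String token : field.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (token.equals(needle)) {
                        counts.merge(field.getKey(), 1, Integer::sum);
                        break; // count each document at most once per field
                    }
                }
            }
        }
        return counts;
    }

    // The field with the highest document count is taken as the most probable entity type.
    static Optional<String> mostLikelyField(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("color", "red", "brand", "Acme"),
                Map.of("color", "dark red", "brand", "Red Bull"),
                Map.of("color", "blue", "brand", "Acme"));
        Map<String, Integer> counts = docCountsByField(docs, "Red");
        System.out.println(counts + " -> " + mostLikelyField(counts).orElse("no match"));
    }
}
```

The raw document count is the simplest possible signal; normalizing by the number of documents that have the field at all would be a natural refinement, but that goes beyond what was proposed in the thread.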