For a quick hack, you can use highlighting. It does more than you want, since it
shows which words matched, but that output does include which fields matched.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 27, 2022, at 3:23 AM, Shai Erera <ser...@gmail.com> wrote:
> 
> Thanks Uwe, I didn't know about named queries, but they seem useful. Is there
> interest in similar functionality in Lucene, or perhaps just the
> FieldMatching collector? I'd be happy to open a PR.
> 
> As for the use case, I was thinking of using something similar to this
> collector for a (simple) entity recognition task. If you have a corpus of
> documents with many fields that denote product attributes, you could match a
> word like "Red" against the various product attribute fields and determine,
> based on the matching fields and their doc counts, whether this word more
> likely represents a Color or a Brand entity (hint: it matches both; the
> question is which is more probable).
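As a rough illustration of the decision Shai describes, here is a minimal, Lucene-free sketch. It assumes you already have, for one query word, the number of documents in which it matched each attribute field (for example from one `IndexSearcher.count` call per field; that wiring is not shown). `FieldEntityGuesser`, `fieldShares` and `mostProbableField` are hypothetical names, and normalizing raw counts is only a naive stand-in for a real probability model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: given, for one word such as "Red", the number of
 * documents in which it matched each attribute field, estimate which field
 * (entity type) the word most likely belongs to. Naive: raw counts only.
 */
final class FieldEntityGuesser {

    /** Normalizes per-field match counts into shares that sum to 1.0 (all 0.0 if nothing matched). */
    static Map<String, Double> fieldShares(Map<String, Long> matchCounts) {
        long total = 0;
        for (long count : matchCounts.values()) {
            total += count;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : matchCounts.entrySet()) {
            shares.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return shares;
    }

    /** Returns the field with the highest match count (first one wins ties), or null if none matched. */
    static String mostProbableField(Map<String, Long> matchCounts) {
        String best = null;
        long bestCount = 0;
        for (Map.Entry<String, Long> e : matchCounts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

With counts of 900 for `color` and 100 for `brand`, `mostProbableField` returns `color` with a share of 0.9. Raw counts favor fields that appear in many documents, so a real system would probably need some per-field normalization.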
> 
> I'm sure there are other ways to achieve this, and probably much smarter NER
> implementations, but this one is at least based on the actual data that you
> index, which gives some guarantee about the results you will get when you
> apply a given attribute filter.
> 
> Shai
> 
> On Mon, Jun 27, 2022 at 1:01 PM Uwe Schindler <u...@thetaphi.de> wrote:
> I think the collector approach is perfectly fine for mass-processing of 
> queries.
> 
> By the way: Elasticsearch/OpenSearch already have this feature built in, and
> (as far as I remember) it works via the Collector API much like you
> described. It is a bit different, as you can tag any clause in a BQ (so any
> query) with a "name" (they call it a "named query",
> https://www.elastic.co/guide/en/elasticsearch/reference/8.2/query-dsl-bool-query.html#named-queries).
> When you get the search results, for each hit it tells you which named
> queries matched that hit. The actual implementation is a wrapper query around
> each of those clauses that holds the name. During hit collection it just
> collects all named query instances found in the query tree. I think in their
> implementation the wrapper query's scorer adds the name to some global state.
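To make the per-hit output concrete, here is a toy model of what named queries report. It is a sketch under stated assumptions, not Elasticsearch's implementation: each named clause is a hypothetical predicate over a document held as a field-to-value map, and it is evaluated per hit, whereas Uwe describes a wrapper scorer recording names during collection. `NamedClauses` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy, Lucene-free model of "named queries": each clause carries a name. */
final class NamedClauses {
    // Clause name -> predicate over a document represented as field -> value.
    private final Map<String, Predicate<Map<String, String>>> clauses = new LinkedHashMap<>();

    /** Registers a named clause; returns this for chaining. */
    NamedClauses add(String name, Predicate<Map<String, String>> clause) {
        clauses.put(name, clause);
        return this;
    }

    /** Names of the clauses that match this document, in registration order. */
    List<String> matchedNames(Map<String, String> doc) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, String>>> e : clauses.entrySet()) {
            if (e.getValue().test(doc)) {
                names.add(e.getKey());
            }
        }
        return names;
    }
}
```

For a document whose `color` is `red` and whose `brand` is `acme`, a query with clauses named `color_red` and `brand_red` reports only `color_red` for that hit.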
> 
> Uwe
> 
> On 27.06.2022 at 11:51, Shai Erera wrote:
>> Out of curiosity and for education purposes, is the Collector approach I 
>> proposed wrong/inefficient? Or less efficient than the matches() API?
>> 
>> My thinking is that if you want to match/rank documents and, as a side
>> effect, know which fields matched, the Collector will perform better than
>> Weight.matches(), but I could be wrong.
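The Collector idea can be modeled without Lucene: a disjunction walks the union of per-field posting lists (sorted doc IDs), and at each collected doc the fields whose iterators are positioned on that doc are the matching ones. This is a minimal sketch under that assumption; `DisjunctionFieldTracker` is a hypothetical name, and a real version would inspect Lucene's sub-scorers inside the collector instead of plain arrays.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Lucene-free model of tracking matching fields during collection: iterate
 * the union of per-field posting lists and, for each doc, report the fields
 * whose cursor sits on that doc. Hypothetical names; not Lucene's API.
 */
final class DisjunctionFieldTracker {

    /** Maps each matching doc ID to the fields that matched it, in field order. */
    static Map<Integer, List<String>> matchingFields(Map<String, int[]> postings) {
        List<String> fields = new ArrayList<>(postings.keySet());
        int[] pos = new int[fields.size()]; // cursor into each field's sorted doc IDs
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        while (true) {
            // The disjunction's next doc is the smallest doc ID any cursor points at.
            // Integer.MAX_VALUE acts as "no more docs", as in Lucene.
            int doc = Integer.MAX_VALUE;
            for (int i = 0; i < fields.size(); i++) {
                int[] p = postings.get(fields.get(i));
                if (pos[i] < p.length) {
                    doc = Math.min(doc, p[pos[i]]);
                }
            }
            if (doc == Integer.MAX_VALUE) {
                return result; // every posting list is exhausted
            }
            // "collect(doc)": record the fields positioned on doc, then advance them.
            List<String> matched = new ArrayList<>();
            for (int i = 0; i < fields.size(); i++) {
                int[] p = postings.get(fields.get(i));
                if (pos[i] < p.length && p[pos[i]] == doc) {
                    matched.add(fields.get(i));
                    pos[i]++;
                }
            }
            result.put(doc, matched);
        }
    }
}
```

Because the per-field cursors advance as part of the main iteration, the matching fields fall out of work the search does anyway, which is the intuition behind expecting this to beat a stateless `Weight.matches()` call per hit. As Alan points out, the scorer-based version does not work for every query type.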
>> 
>> Shai
>> 
>> On Mon, Jun 27, 2022 at 11:57 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
>> The matches API is awesome. Use it. You can also get a rough glimpse
>> into a superset of fields potentially matching the query via:
>> 
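>>     import java.util.HashSet;
>>     import java.util.Set;
>>     import org.apache.lucene.search.QueryVisitor;
>>
>>     // Collects every field the query tree mentions (a superset of the matching fields)
>>     Set<String> affectedFields = new HashSet<>();
>>     query.visit(
>>         new QueryVisitor() {
>>           @Override
>>           public boolean acceptField(String field) {
>>             affectedFields.add(field);
>>             return false; // record the field but don't visit its terms
>>           }
>>         });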
>> 
>> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/Query.html#visit(org.apache.lucene.search.QueryVisitor)
>> 
>> I'd go with the Matches API though.
>> 
>> Dawid
>> 
>> On Mon, Jun 27, 2022 at 10:48 AM Alan Woodward <romseyg...@gmail.com> wrote:
>> >
>> > The Matches API will give you this information - it’s still likely to be 
>> > fairly slow, but it’s a lot easier to use than trying to parse Explain 
>> > output.
>> >
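>> > import java.util.ArrayList;
>> > import java.util.List;
>> > import org.apache.lucene.search.*;
>> >
>> > Query query = …;
>> > // Rewrite once, then build a Weight that can report matches without scoring
>> > Weight w = searcher.createWeight(searcher.rewrite(query),
>> >     ScoreMode.COMPLETE_NO_SCORES, 1.0f);
>> >
>> > // context is a LeafReaderContext (segment); doc is the segment-local doc ID
>> > Matches m = w.matches(context, doc);
>> > List<String> matchingFields = new ArrayList<>();
>> > if (m != null) { // null means the doc does not match the query
>> >   for (String field : m) { // Matches iterates over the matching field names
>> >     matchingFields.add(field);
>> >   }
>> > }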
>> >
>> > Bear in mind that `matches` doesn’t maintain any state between calls, so 
>> > calling it for every matching document is likely to be slow; for those 
>> > cases Shai’s suggestion of using a Collector and examining low-level 
>> > scorers will perform better, but it won’t work for every query type.
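`Weight.matches(context, doc)` in the snippet above expects a segment (`LeafReaderContext`) and a segment-local doc ID, while top-level hits carry global doc IDs. Lucene provides `ReaderUtil.subIndex` for that lookup; the sketch below only models the same idea in plain Java, assuming the leaves' `docBase` values are given in ascending order starting at 0. `LeafLocator` is a hypothetical name.

```java
/** Hypothetical helper: maps a top-level doc ID to a leaf index and a leaf-local doc ID. */
final class LeafLocator {

    /** Index of the last leaf whose docBase is <= globalDoc (docBases ascending, docBases[0] == 0). */
    static int leafIndex(int globalDoc, int[] docBases) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle, so lo always advances
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** The doc ID relative to its leaf, i.e. globalDoc minus that leaf's docBase. */
    static int localDoc(int globalDoc, int[] docBases) {
        return globalDoc - docBases[leafIndex(globalDoc, docBases)];
    }
}
```

In Lucene code this would roughly become `LeafReaderContext ctx = leaves.get(ReaderUtil.subIndex(hit.doc, leaves));` followed by `w.matches(ctx, hit.doc - ctx.docBase)`, with `leaves` taken from `searcher.getIndexReader().leaves()`.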
>> >
>> >
>> > > On 25 Jun 2022, at 04:14, Yichen Sun <yiche...@bu.edu> wrote:
>> > >
>> > > Hello!
>> > >
>> > > I'm an MSCS student at BU and am learning to use Lucene. Recently I have
>> > > been trying to output the fields matched by a query. For example, if a
>> > > document has 10 fields and 2 of them match the query, I want to get the
>> > > names of those 2 fields.
>> > >
>> > > I have tried calling explain(), getting the description, and then
>> > > applying a regex to it. However, this takes a lot of time.
>> > >
>> > > I wonder what an efficient way to get the matched fields would be. Could
>> > > you please offer some help? Thank you so much!
>> > >
>> > > Best regards,
>> > > Yichen Sun
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org 
>> <mailto:dev-unsubscr...@lucene.apache.org>
>> For additional commands, e-mail: dev-h...@lucene.apache.org 
>> <mailto:dev-h...@lucene.apache.org>
>> 
> -- 
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
