Hi Yichen,

I think you can implement a custom Collector that tracks which fields were
matched by each Scorer. Here is an example of such a Collector:

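import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorable;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TermScorer;

public class FieldMatchingCollector implements Collector {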

  /**
   * Holds the number of term matches for each field. A document that matches
   * several terms in the same field is counted once per term.
   */
  public final Map<String, Integer> matchingFieldCounts = new HashMap<>();

  /** Holds which fields were matched for each document. */
  public final Map<Integer, Set<String>> docMatchingFields = new HashMap<>();

  private final Set<Scorer> termScorers = new HashSet<>();

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.COMPLETE_NO_SCORES;
  }

  @Override
  public LeafCollector getLeafCollector(LeafReaderContext context) {
    final int docBase = context.docBase;
    return new LeafCollector() {

      @Override
      public void setScorer(Scorable scorer) throws IOException {
        termScorers.clear();
        getSubTermScorers(scorer, termScorers);
      }

      @Override
      public void collect(int doc) {
        int basedDoc = doc + docBase;
        for (Scorer scorer : termScorers) {
          if (doc == scorer.docID()) {
            // We know that we're dealing w/ TermScorers
            String matchingField =
                ((TermQuery) scorer.getWeight().getQuery()).getTerm().field();
            docMatchingFields
                .computeIfAbsent(basedDoc, d -> new HashSet<>())
                .add(matchingField);
            matchingFieldCounts.merge(matchingField, 1, Integer::sum);
          }
        }
      }
    };
  }

  private void getSubTermScorers(Scorable scorer, Set<Scorer> set)
      throws IOException {
    if (scorer instanceof TermScorer) {
      set.add((Scorer) scorer);
    } else {
      for (Scorable.ChildScorable child : scorer.getChildren()) {
        getSubTermScorers(child.child, set);
      }
    }
  }
}

This is of course only an example implementation, and you can adapt it to
your needs (e.g. if you're only interested in the set of matching fields,
you can change "matchingFieldCounts" to a Set<String>). Note that
"docMatchingFields" is expensive, since it stores an entry for every
matching document. I've only included it as an example (and for debugging
purposes), and I recommend omitting it in a real application.
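
As a rough, Lucene-free sketch of the difference between keeping counts and
keeping only the set of field names: the class name, method names and sample
input below are made up for illustration and are not part of the Collector
above. Each inner list stands for the fields whose term scorers matched one
collected document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FieldBookkeepingSketch {

  /** Variant 1: per-field match counts, in the style of "matchingFieldCounts". */
  static Map<String, Integer> countMatches(List<List<String>> matchedFieldsPerDoc) {
    Map<String, Integer> counts = new HashMap<>();
    for (List<String> fields : matchedFieldsPerDoc) {
      for (String field : fields) {
        counts.merge(field, 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Variant 2: only the distinct field names, with no counts and no per-document map. */
  static Set<String> distinctFields(List<List<String>> matchedFieldsPerDoc) {
    Set<String> fields = new TreeSet<>();
    for (List<String> docFields : matchedFieldsPerDoc) {
      fields.addAll(docFields);
    }
    return fields;
  }

  public static void main(String[] args) {
    // Hypothetical input: one list of matched field names per collected document.
    List<List<String>> matches = List.of(
        List.of("title", "body"),
        List.of("body"),
        List.of("body", "author"));
    System.out.println("counts = " + countMatches(matches));
    System.out.println("fields = " + distinctFields(matches));
  }
}
```

The Set variant holds at most one entry per searchable field, so its memory
use does not grow with the number of matching documents.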

To use it you can do something like:

// Need to use this searcher to guarantee the bulk scorer API isn't used.
IndexSearcher searcher = new ScorerIndexSearcher(reader);

// Parse the query to match against a list of searchable fields
QueryParser qp =
    new MultiFieldQueryParser(FIELDS_TO_SEARCH_ON, new StandardAnalyzer());
Query query = qp.parse(queryText);

// Collect the matching fields
FieldMatchingCollector fieldMatchingCollector = new FieldMatchingCollector();
// If needed, collect the top matching documents too
TopScoreDocCollector topScoreDocCollector =
    TopScoreDocCollector.create(10, Integer.MAX_VALUE);
searcher.search(
    query, MultiCollector.wrap(topScoreDocCollector, fieldMatchingCollector));

System.out.println(
    "matchingFieldCounts = " + fieldMatchingCollector.matchingFieldCounts);
System.out.println(
    "docMatchingFields = " + fieldMatchingCollector.docMatchingFields);
System.out.println("totalHits = " + topScoreDocCollector.getTotalHits());

Hope this helps!

Shai

On Sat, Jun 25, 2022 at 7:58 AM Yichen Sun <yiche...@bu.edu> wrote:

> Hello!
>
> I’m an MSCS student at BU and I'm learning to use Lucene. Recently I have
> been trying to output the fields matched by a query. For example, one
> document has 10 fields and 2 of them match the query. I want to get the
> names of these fields.
>
> I have tried calling the explain() method and parsing its description with
> a regex. However, that takes too much time.
>
> I wonder what an efficient way to get the matched fields would be. Would
> you please offer some help? Thank you so much!
>
> Best regards,
> Yichen Sun
>
