[ 
https://issues.apache.org/jira/browse/LUCENE-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558107#comment-16558107
 ] 

Adrien Grand commented on LUCENE-8430:
--------------------------------------

Here is a proposal which replaces TopDocs.totalHits with a new TotalHits object 
that is implemented like this:

{code}
import java.util.Objects;

/**
 * Description of the total number of hits of a query. The total hit count
 * can't generally be computed accurately without visiting all matches, which
 * is costly for queries that match lots of documents. Given that it is often
 * enough to have a lower bound of the number of hits, such as
 * "there are more than 1000 hits", Lucene has options to stop counting as soon
 * as a threshold has been reached in order to improve query times.
 */
public final class TotalHits {

  /** How the {@link TotalHits#value} should be interpreted. */
  public enum Relation {
    /**
     * The total hit count is equal to {@link TotalHits#value}.
     */
    EQUAL_TO,
    /**
     * The total hit count is greater than or equal to {@link TotalHits#value}.
     */
    GREATER_THAN_OR_EQUAL_TO
  }

  /**
   * The value of the total hit count. Must be interpreted in the context of
   * {@link #relation}.
   */
  public final long value;

  /**
   * Whether {@link #value} is the exact hit count, in which case
   * {@link #relation} is equal to {@link Relation#EQUAL_TO}, or a lower bound
   * of the total hit count, in which case {@link #relation} is equal to
   * {@link Relation#GREATER_THAN_OR_EQUAL_TO}.
   */
  public final Relation relation;

  /** Sole constructor. */
  public TotalHits(long value, Relation relation) {
    if (value < 0) {
      throw new IllegalArgumentException("value must be >= 0, got " + value);
    }
    this.value = value;
    this.relation = Objects.requireNonNull(relation);
  }

  @Override
  public String toString() {
    return value + (relation == Relation.EQUAL_TO ? "" : "+") + " hits";
  }

}
{code}

TopScoreDocCollector and TopFieldCollector have also been changed so that they
no longer extrapolate the hit count from the number of hits that were actually
collected. Instead, they return the number of collected hits as the hit count,
with GREATER_THAN_OR_EQUAL_TO as the relation. TopDocs#merge returns
GREATER_THAN_OR_EQUAL_TO as the relation if any of the merged TopDocs
instances has a hit count that is a lower bound. All other changes are just
about fixing compilation.
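
To illustrate the merge rule, here is a minimal sketch rather than the code
from the patch. The class TotalHitsMergeSketch and the helper mergeTotalHits
are hypothetical names, the nested TotalHits is a trimmed stand-in for the
class proposed above, and summing the per-instance values is an assumption
about how the merged count is computed.

```java
import java.util.List;

// Hypothetical sketch, not the actual TopDocs#merge implementation.
final class TotalHitsMergeSketch {

  enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

  // Trimmed stand-in for the proposed TotalHits class.
  static final class TotalHits {
    final long value;
    final Relation relation;

    TotalHits(long value, Relation relation) {
      this.value = value;
      this.relation = relation;
    }
  }

  // Assumption: the merged value is the sum of the individual values.
  // The merged relation is a lower bound as soon as any input is one.
  static TotalHits mergeTotalHits(List<TotalHits> hitsPerInstance) {
    long sum = 0;
    Relation relation = Relation.EQUAL_TO;
    for (TotalHits hits : hitsPerInstance) {
      sum += hits.value;
      if (hits.relation == Relation.GREATER_THAN_OR_EQUAL_TO) {
        relation = Relation.GREATER_THAN_OR_EQUAL_TO;
      }
    }
    return new TotalHits(sum, relation);
  }
}
```

Under these assumptions, merging an exact count of 10 with a lower bound of
1000 gives a value of 1010 with GREATER_THAN_OR_EQUAL_TO, since a sum that
includes a lower bound is itself only a lower bound.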

This way, whether the hit count is accurate is explicit, and users who upgrade
to Lucene 8 won't fall into the trap of assuming that a hit count is accurate
when it is not.
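
As a rough illustration of what this could look like on the caller side, the
sketch below formats a hit count for display and has to consult the relation
to do so. HitCountDisplaySketch and describe are hypothetical names, the
wording of the messages is arbitrary, and the nested TotalHits is again a
trimmed stand-in for the proposed class.

```java
// Hypothetical caller-side sketch; not part of the patch.
final class HitCountDisplaySketch {

  enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

  // Trimmed stand-in for the proposed TotalHits class.
  static final class TotalHits {
    final long value;
    final Relation relation;

    TotalHits(long value, Relation relation) {
      this.value = value;
      this.relation = relation;
    }
  }

  // The caller cannot use the count without deciding how to treat a lower bound.
  static String describe(TotalHits totalHits) {
    switch (totalHits.relation) {
      case EQUAL_TO:
        return totalHits.value + " results";
      case GREATER_THAN_OR_EQUAL_TO:
        return "at least " + totalHits.value + " results";
      default:
        throw new AssertionError("unknown relation: " + totalHits.relation);
    }
  }
}
```

Code that previously read a plain long from TopDocs.totalHits would stop
compiling, which is the point at which a caller would choose between
something like the two branches above.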

> TopDocs.totalHits is not always the accurate hit count
> ------------------------------------------------------
>
>                 Key: LUCENE-8430
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8430
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8430.patch
>
>
> Sub task of LUCENE-8060. We should change TopDocs.totalHits so that users get 
> a compilation error, and the new field or documentation should make it clear 
> that this number is not always the accurate hit count, which is important if 
> we want to enable index sorting / WAND / impacts -related optimizations by 
> default.


