jpountz commented on code in PR #13036:
URL: https://github.com/apache/lucene/pull/13036#discussion_r1473219731


##########
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##########
@@ -962,6 +962,46 @@ public void testDisjunctionMatchesCount() throws 
IOException {
     dir.close();
   }
 
+  public void testTwoClauseTermDisjunctionCountOptimization() throws Exception 
{
+    List<String[]> docContent =
+        Arrays.asList(
+            new String[] {"A", "B"},
+            new String[] {"A"},
+            new String[] {},
+            new String[] {"A", "B", "C"},
+            new String[] {"B"},
+            new String[] {"B", "C"});
+
+    try (Directory dir = newDirectory()) {
+      try (IndexWriter w =
+          new IndexWriter(dir, 
newIndexWriterConfig().setMergePolicy(newLogMergePolicy()))) {
+
+        for (String[] values : docContent) {
+          Document doc = new Document();
+          for (String value : values) {
+            doc.add(new StringField("foo", value, Field.Store.NO));
+          }
+          w.addDocument(doc);
+        }
+        w.forceMerge(1);
+      }
+
+      try (IndexReader reader = DirectoryReader.open(dir)) {
+        IndexSearcher searcher = newSearcher(reader);
+
+        Query query =
+            new BooleanQuery.Builder()
+                .add(new TermQuery(new Term("foo", "A")), 
BooleanClause.Occur.SHOULD)
+                .add(new TermQuery(new Term("foo", "B")), 
BooleanClause.Occur.SHOULD)
+                .setMinimumNumberShouldMatch(1)
+                .build();
+
+        int count = searcher.count(query);
+        assertEquals(5, count);

Review Comment:
   Can you also test corner cases, e.g. one clause has no matches, or both 
clauses have no matches?
   
   I'm not sure how to do it, but it would also be nice to double check we're 
indeed applying your optimization, and that the heuristic is not disabling the 
opto.



##########
lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java:
##########
@@ -179,6 +180,36 @@ boolean isPureDisjunction() {
     return clauses.size() == getClauses(Occur.SHOULD).size() && 
minimumNumberShouldMatch <= 1;
   }
 
+  /** Whether this query is a two clause disjunction with two term query 
clauses. */
+  boolean isTwoClausePureDisjunctionWithTerms() {
+    return clauses.size() == 2
+        && isPureDisjunction()
+        && clauses.get(0).getQuery() instanceof TermQuery
+        && clauses.get(1).getQuery() instanceof TermQuery;
+  }
+
+  /**
+   * Rewrite a single two clause disjunction query with terms to two term 
queries and a conjunction
+   * query using the inclusion–exclusion principle.
+   */
+  Query[] rewriteTwoClauseDisjunctionWithTermsForCount(IndexSearcher 
indexSearcher)
+      throws IOException {
+    BooleanQuery.Builder newQuery = new BooleanQuery.Builder();
+    Query[] queries = new Query[3];
+    for (int i = 0; i < clauses.size(); i++) {
+      TermQuery termQuery = (TermQuery) clauses.get(i).getQuery();
+      if (termQuery.getTermStates() == null) {
+        termQuery =
+            new TermQuery(
+                termQuery.getTerm(), TermStates.build(indexSearcher, 
termQuery.getTerm(), true));

Review Comment:
   We don't need term statistics here, just a cache for term states.
   
   ```suggestion
                   termQuery.getTerm(), TermStates.build(indexSearcher, 
termQuery.getTerm(), false));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to