jpountz commented on a change in pull request #731:
URL: https://github.com/apache/lucene/pull/731#discussion_r837397382



##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,76 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * Merges the overlapping ranges and returns unconnected ranges by calling 
{@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, 
bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };

Review comment:
       I don't like that the rewritten query would have a reference to the 
un-rewritten query. I like this new rewrite rule but it raises a couple 
challenges, maybe tackle it in a separate JIRA? My current thinking is that the 
easier way to make it work would be to make the query implement `Clonable` and 
make `rewrite` change the ranges wrapped by a clone?

##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,60 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    List<RangeClause> mergedRanges = mergeOverlappingRanges();
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  private List<RangeClause> mergeOverlappingRanges() {
+    if (numDims != 1 || rangeClauses.size() <= 1) {
+      return rangeClauses;
+    }
+    List<RangeClause> originRangeClause = new ArrayList<>(rangeClauses);
+    final ArrayUtil.ByteArrayComparator comparator = 
ArrayUtil.getUnsignedComparator(bytesPerDim);
+    originRangeClause.sort(
+        new Comparator<RangeClause>() {
+          @Override
+          public int compare(RangeClause o1, RangeClause o2) {
+            int result = comparator.compare(o1.lowerValue, 0, o2.lowerValue, 
0);
+            if (result == 0) {
+              return comparator.compare(o1.upperValue, 0, o2.upperValue, 0);
+            } else {
+              return result;
+            }
+          }
+        });
+    List<RangeClause> finalRangeClause = new ArrayList<>();
+    RangeClause current = originRangeClause.get(0);
+    for (int i = 1; i < originRangeClause.size(); i++) {
+      RangeClause nextClause = originRangeClause.get(i);
+      if (comparator.compare(nextClause.lowerValue, 0, current.upperValue, 0) 
> 0) {
+        finalRangeClause.add(current);
+        current = nextClause;
+      } else {
+        if (comparator.compare(nextClause.upperValue, 0, current.upperValue, 
0) > 0) {
+          current = new RangeClause(current.lowerValue, nextClause.upperValue);
+        }
+      }
+    }
+    finalRangeClause.add(current);
+    if (finalRangeClause.size() != rangeClauses.size()) {

Review comment:
       Actually this is not about saving an extra object allocation, this is 
because the contract is that `Query#rewrite` should be called until the query 
returns its own instance. So if we returned a different query instance all the 
time, we'd get an infinite loop.

##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -163,6 +165,69 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  /**
+   * merge its overlapping ranges and return a simpler but slightly different 
form by calling {@link
+   * #mergeOverlappingRanges}
+   */
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    if (numDims != 1) {
+      return this;
+    }
+    List<RangeClause> mergedRanges = mergeOverlappingRanges(rangeClauses, 
bytesPerDim);
+    if (mergedRanges != rangeClauses) {
+      return new MultiRangeQuery(field, numDims, bytesPerDim, mergedRanges) {
+        @Override
+        protected String toString(int dimension, byte[] value) {
+          return MultiRangeQuery.this.toString(dimension, value);
+        }
+      };
+    } else {
+      return this;
+    }
+  }
+
+  /** merge overlapping ranges to some unconnected ranges */
+  public static List<RangeClause> mergeOverlappingRanges(

Review comment:
       Does it need to be public?

##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/MultiRangeQuery.java
##########
@@ -314,6 +386,32 @@ public Scorer scorer(LeafReaderContext context) throws 
IOException {
       public boolean isCacheable(LeafReaderContext ctx) {
         return true;
       }
+
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        if (numDims != 1 || context.reader().hasDeletions() == true) {
+          return super.count(context);
+        }
+        List<RangeClause> mergeRangeClause = 
mergeOverlappingRanges(rangeClauses, bytesPerDim);
+        int total = 0;
+        for (RangeClause rangeClause : mergeRangeClause) {
+          PointRangeQuery pointRangeQuery =
+              new PointRangeQuery(field, rangeClause.lowerValue, 
rangeClause.upperValue, numDims) {
+                @Override
+                protected String toString(int dimension, byte[] value) {
+                  return MultiRangeQuery.this.toString(dimension, value);
+                }
+              };
+          int count = pointRangeQuery.createWeight(searcher, scoreMode, 
boost).count(context);

Review comment:
       Summing up counts across ranges is only correct on single-valued fields. 
We need to check that `pointValues.size() == pointValues.docCount()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to