[GitHub] [lucene-solr] jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-29 Thread GitBox
jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD 
reader by exploiting cardinality information stored on leaves
URL: https://github.com/apache/lucene-solr/pull/746#discussion_r298797885
 
 

 ##
 File path: lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java
 ##
 @@ -844,7 +845,28 @@ public void visit(int docID, byte[] packedValue) {
   hits.set(docID);
 }
 
-@Override
+  @Override
+  public void visit(DocIdSetIterator iterator, byte[] packedValue) 
throws IOException {
+  if (random().nextBoolean()) {
+//check the default method is correct
+IntersectVisitor.super.visit(iterator, packedValue);
+  } else {
+assertTrue(iterator.docID() == -1);
+int cost = (int) iterator.cost();
+int numberOfPoints = 0;
+int docID;
+while ((docID = iterator.nextDoc()) != 
DocIdSetIterator.NO_MORE_DOCS) {
+  visit(docID, packedValue);
 
 Review comment:
   let's assert on the value of docID() here as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-29 Thread GitBox
jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD 
reader by exploiting cardinality information stored on leaves
URL: https://github.com/apache/lucene-solr/pull/746#discussion_r298797828
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java
 ##
 @@ -780,4 +779,51 @@ public int getDocCount() {
   public boolean isLeafNode(int nodeID) {
 return nodeID >= leafNodeOffset;
   }
+
+  protected static class BKDReaderDocIDSetIterator extends DocIdSetIterator {
+
+private int idx;
+private int length;
+private int offset;
+private int docID;
+int[] docIDs;
 
 Review comment:
   make it final?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-29 Thread GitBox
jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD 
reader by exploiting cardinality information stored on leaves
URL: https://github.com/apache/lucene-solr/pull/746#discussion_r298797880
 
 

 ##
 File path: lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java
 ##
 @@ -844,7 +845,28 @@ public void visit(int docID, byte[] packedValue) {
   hits.set(docID);
 }
 
-@Override
+  @Override
+  public void visit(DocIdSetIterator iterator, byte[] packedValue) 
throws IOException {
+  if (random().nextBoolean()) {
+//check the default method is correct
+IntersectVisitor.super.visit(iterator, packedValue);
+  } else {
+assertTrue(iterator.docID() == -1);
+int cost = (int) iterator.cost();
+int numberOfPoints = 0;
+int docID;
+while ((docID = iterator.nextDoc()) != 
DocIdSetIterator.NO_MORE_DOCS) {
+  visit(docID, packedValue);
+  numberOfPoints++;
+}
+assertTrue(cost == numberOfPoints);
+assertTrue(iterator.docID() == DocIdSetIterator.NO_MORE_DOCS);
+assertTrue(iterator.nextDoc() == 
DocIdSetIterator.NO_MORE_DOCS);
+assertTrue(iterator.docID() == DocIdSetIterator.NO_MORE_DOCS);
 
 Review comment:
   can you use assertEquals here as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-29 Thread GitBox
jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD 
reader by exploiting cardinality information stored on leaves
URL: https://github.com/apache/lucene-solr/pull/746#discussion_r298797876
 
 

 ##
 File path: lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java
 ##
 @@ -844,7 +845,28 @@ public void visit(int docID, byte[] packedValue) {
   hits.set(docID);
 }
 
-@Override
+  @Override
+  public void visit(DocIdSetIterator iterator, byte[] packedValue) 
throws IOException {
+  if (random().nextBoolean()) {
+//check the default method is correct
+IntersectVisitor.super.visit(iterator, packedValue);
+  } else {
+assertTrue(iterator.docID() == -1);
+int cost = (int) iterator.cost();
 
 Review comment:
   being paranoid: can you use Math#toIntExact?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-29 Thread GitBox
jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD 
reader by exploiting cardinality information stored on leaves
URL: https://github.com/apache/lucene-solr/pull/746#discussion_r298797868
 
 

 ##
 File path: lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java
 ##
 @@ -844,7 +845,28 @@ public void visit(int docID, byte[] packedValue) {
   hits.set(docID);
 }
 
-@Override
+  @Override
+  public void visit(DocIdSetIterator iterator, byte[] packedValue) 
throws IOException {
+  if (random().nextBoolean()) {
+//check the default method is correct
+IntersectVisitor.super.visit(iterator, packedValue);
+  } else {
+assertTrue(iterator.docID() == -1);
 
 Review comment:
   Can you use assertEquals so that we get the current doc ID according to the 
iterator in case of failure?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-28 Thread GitBox
jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD 
reader by exploiting cardinality information stored on leaves
URL: https://github.com/apache/lucene-solr/pull/746#discussion_r298482772
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java
 ##
 @@ -780,4 +783,43 @@ public int getDocCount() {
   public boolean isLeafNode(int nodeID) {
 return nodeID >= leafNodeOffset;
   }
+
+  protected static class BKDReaderDocIDSetIterator extends DocIdSetIterator {
+
+int idx;
+int length;
+int offset;
+int[] docIds;
+
+public BKDReaderDocIDSetIterator(int maxPointsInLeafNode) {
+  this.docIds = new int[maxPointsInLeafNode];
+}
+
+@Override
+public int docID() {
+  if (idx == -1)  {
+return -1;
+  }
+  return docIds[offset + idx];
+}
+
+@Override
+public int nextDoc() throws IOException {
+  if (idx == length - 1) {
+return DocIdSetIterator.NO_MORE_DOCS;
+  }
+  idx++;
+  return docIds[offset + idx];
+}
+
+@Override
+public int advance(int target) throws IOException {
+  return slowAdvance(target);
+}
+
+@Override
+public long cost() {
+  return length;
+}
+  }
 
 Review comment:
   can you add tests for it? e.g. making sure `docID()` actually returns -1 
when unpositioned and NO_MORE_DOCS when exhausted (I think the latter is not 
correct at the moment)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-27 Thread GitBox
jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD 
reader by exploiting cardinality information stored on leaves
URL: https://github.com/apache/lucene-solr/pull/746#discussion_r298139661
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/PointValues.java
 ##
 @@ -208,6 +208,15 @@ protected PointValues() {
  *  docID order. */
 void visit(int docID, byte[] packedValue) throws IOException;
 
+/** Similar to {@link IntersectVisitor#visit(int, byte[])} but in this 
case the packedValue
+ * can have more than one docID associated to it. */
 
 Review comment:
   Can you document that the iterator should not escape the scope of this 
method so that implementations of PointValues are free to reuse the iterator?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-27 Thread GitBox
jpountz commented on a change in pull request #746: LUCENE-8885: Optimise BKD 
reader by exploiting cardinality information stored on leaves
URL: https://github.com/apache/lucene-solr/pull/746#discussion_r298140894
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java
 ##
 @@ -780,4 +783,43 @@ public int getDocCount() {
   public boolean isLeafNode(int nodeID) {
 return nodeID >= leafNodeOffset;
   }
+
+  protected static class BKDReaderDocIDSetIterator extends DocIdSetIterator {
+
+int idx;
+int length;
+int offset;
+int[] docIds;
+
+public BKDReaderDocIDSetIterator(int maxPointsInLeafNode) {
+  this.docIds = new int[maxPointsInLeafNode];
+}
+
+@Override
+public int docID() {
+  if (idx == -1)  {
+return -1;
+  }
+  return docIds[offset + idx];
+}
+
+@Override
+public int nextDoc() throws IOException {
+  if (idx == length - 1) {
+return DocIdSetIterator.NO_MORE_DOCS;
+  }
+  idx++;
+  return docIds[offset + idx];
+}
+
+@Override
+public int advance(int target) throws IOException {
+  throw new UnsupportedOperationException();
 
 Review comment:
   I think it'd be better to support it, even if slow. Eg. you can return 
`slowAdvance(target)`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org