If I know the docfreq of a term is 1000, I'd like to be able to
allocate an int[1000]
and grab all the ids via TermDocs.read().  But because there is no
offset parameter, MultiTermDocs fills the array from index 0 for each
sub-segment, forcing me to copy each partially filled int[] into my
full int[].

If one could specify start and end offsets into the array, this
copying could be avoided (see the partial and untested patch below).
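
To illustrate the idea outside of Lucene: here is a minimal, self-contained sketch (Segment and readAll are made-up names, not Lucene API) of the pattern the patch enables, where each sub-segment's read() takes start/end offsets and returns the new start, so results land directly in one shared array with no intermediate copy:

```java
import java.util.Arrays;

public class ReadDemo {
  // Mock sub-segment holding some doc ids (stand-in for a per-segment TermDocs).
  static class Segment {
    final int[] ids;
    int pos = 0;
    Segment(int... ids) { this.ids = ids; }
    // Fill docs[start..end) with remaining ids; return the new start index,
    // mirroring the read(docs, freqs, start, end) signature in the patch.
    int read(int[] docs, int start, int end) {
      while (start < end && pos < ids.length) docs[start++] = ids[pos++];
      return start;
    }
  }

  // Drain every segment directly into one array sized from the docfreq:
  // no per-segment temporary buffer, no System.arraycopy.
  static int[] readAll(Segment[] segs, int docFreq) {
    int[] docs = new int[docFreq];
    int start = 0;
    for (Segment s : segs) start = s.read(docs, start, docs.length);
    return Arrays.copyOf(docs, start);
  }

  public static void main(String[] args) {
    Segment[] segs = { new Segment(0, 3, 7), new Segment(1, 4) };
    System.out.println(Arrays.toString(readAll(segs, 5)));
    // prints [0, 3, 7, 1, 4]
  }
}
```

With the current offset-less read(), readAll would instead need a scratch int[] per segment plus a copy into docs.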

My feeling is that this is probably too specialized to warrant adding
an additional method on the interface, so I won't open a JIRA for it.
I brought it up in case anyone cared
to argue otherwise.

-Yonik


Index: src/java/org/apache/lucene/index/MultiReader.java
===================================================================
--- src/java/org/apache/lucene/index/MultiReader.java   (revision 545184)
+++ src/java/org/apache/lucene/index/MultiReader.java   (working copy)
@@ -384,27 +384,33 @@
    }
  }

+  public int read(final int[] docs, final int[] freqs) throws IOException {
+    return read(docs, freqs, 0, docs.length);
+  }
+
  /** Optimized implementation. */
-  public int read(final int[] docs, final int[] freqs) throws IOException {
-    while (true) {
+  public int read(final int[] docs, final int[] freqs, int start, int end) throws IOException {
+    while (start < end) {
      while (current == null) {
        if (pointer < readers.length) {      // try next segment
          base = starts[pointer];
          current = termDocs(pointer++);
        } else {
-          return 0;
+          return start;
        }
      }
-      int end = current.read(docs, freqs);
-      if (end == 0) {          // none left in segment
+      int newStart = current.read(docs, freqs, start, end);
+      if (newStart == start) {
        current = null;
      } else {            // got some
        final int b = base;        // adjust doc numbers
-        for (int i = 0; i < end; i++)
+        for (int i = start; i < newStart; i++) {
         docs[i] += b;
-        return end;
+        }
+        start = newStart;
      }
    }
+    return start;
  }

 /* A Possible future optimization could skip entire segments */
Index: src/java/org/apache/lucene/index/SegmentTermDocs.java
===================================================================
--- src/java/org/apache/lucene/index/SegmentTermDocs.java       (revision 545184)
+++ src/java/org/apache/lucene/index/SegmentTermDocs.java       (working copy)
@@ -122,11 +122,15 @@
    return true;
  }

+  public int read(final int[] docs, final int[] freqs) throws IOException {
+    return read(docs, freqs, 0, docs.length);
+  }
+
  /** Optimized implementation. */
-  public int read(final int[] docs, final int[] freqs)
+  public int read(final int[] docs, final int[] freqs, int start, int end)
          throws IOException {
-    final int length = docs.length;
-    int i = 0;
+    final int length = end;
+    int i = start;
    while (i < length && count < df) {

      // manually inlined call to next() for speed
Index: src/java/org/apache/lucene/index/TermDocs.java
===================================================================
--- src/java/org/apache/lucene/index/TermDocs.java      (revision 545184)
+++ src/java/org/apache/lucene/index/TermDocs.java      (working copy)
@@ -60,6 +60,8 @@
   * stream has been exhausted.  */
  int read(int[] docs, int[] freqs) throws IOException;

+  int read(int[] docs, int[] freqs, int start, int end) throws IOException;
+
  /** Skips entries to the first beyond the current whose document number is
   * greater than or equal to <i>target</i>. <p>Returns true iff there is such
   * an entry.  <p>Behaves as if written: <pre>
