On Wed, Oct 28, 2009 at 10:58 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
> The only change I made to the source code was the patch for PayloadNearQuery
> (LUCENE-1986).

That patch certainly shouldn't lead to this.

> It's possible that our content contains U+FFFF. I will run in debugger and
> see.

OK, may as well check, just so we cover all possibilities.
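
If stepping through in the debugger is slow, a small scan along these lines could flag the character directly. This is only a sketch: the class and method names are made up for illustration, and it assumes you can get at each field's text as a String before it is indexed.

```java
public class FfffScan {
    /**
     * Returns the index of the first U+FFFF char in text, or -1 if none.
     * Hypothetical helper, not part of Lucene.
     */
    public static int firstFfff(String text) {
        return text.indexOf('\uFFFF');
    }

    /** Convenience wrapper: true if text contains U+FFFF anywhere. */
    public static boolean containsFfff(String text) {
        return firstFfff(text) >= 0;
    }

    public static void main(String[] args) {
        // Scan each command-line argument and report where U+FFFF shows up.
        for (String arg : args) {
            int pos = firstFfff(arg);
            if (pos >= 0) {
                System.out.println("found U+FFFF at char offset " + pos);
            }
        }
    }
}
```

Logging the document id alongside the offset should make it easier to trace any hit back to the source content.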

> The data is 'sensitive', so I may not be able to provide a bad segment,
> unfortunately.

OK, maybe we can modify your CheckIndex instead.  Let's start with
this, which prints a warning whenever the docFreq differs but
otherwise continues (vs throwing RuntimeException).  I'm curious how
many terms show this, and whether the TermEnum keeps working after
this term that has different docFreq:

Index: src/java/org/apache/lucene/index/CheckIndex.java
===================================================================
--- src/java/org/apache/lucene/index/CheckIndex.java    (revision 829889)
+++ src/java/org/apache/lucene/index/CheckIndex.java    (working copy)
@@ -672,8 +672,8 @@
         }

         if (freq0 + delCount != docFreq) {
-          throw new RuntimeException("term " + term + " docFreq=" +
-                                     docFreq + " != num docs seen " + freq0 + " + num docs deleted " + delCount);
+          System.out.println("WARNING: term  " + term + " docFreq=" +
+                             docFreq + " != num docs seen " + freq0 + " + num docs deleted " + delCount);
         }
       }
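
To get a count of how many terms are affected rather than eyeballing the output, the same warn-and-continue check could be wrapped in a small tally. This is a standalone sketch, not CheckIndex code: DocFreqTally and its methods are hypothetical names, and the docFreq, freq0 and delCount arguments stand in for the values CheckIndex computes for each term.

```java
import java.util.ArrayList;
import java.util.List;

public class DocFreqTally {
    private final List<String> mismatches = new ArrayList<String>();
    private int termsChecked = 0;

    /**
     * Applies the same test as the patched CheckIndex block: warn and keep
     * going instead of throwing. Returns true if the term is consistent.
     */
    public boolean check(String term, int docFreq, int freq0, int delCount) {
        termsChecked++;
        if (freq0 + delCount != docFreq) {
            String msg = "WARNING: term " + term + " docFreq=" + docFreq
                + " != num docs seen " + freq0
                + " + num docs deleted " + delCount;
            System.out.println(msg);
            mismatches.add(msg);
            return false;
        }
        return true;
    }

    /** Number of terms whose docFreq did not match. */
    public int mismatchCount() {
        return mismatches.size();
    }

    /** Total number of terms passed to check(). */
    public int termsChecked() {
        return termsChecked;
    }
}
```

Printing mismatchCount() against termsChecked() at the end of the run would show whether this is a handful of terms or something widespread.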

Mike

