OK, I opened this JIRA issue to track this: https://issues.apache.org/jira/browse/LUCENE-1069
Mike

"Michael McCandless" <[EMAIL PROTECTED]> wrote:
>
> Woops! You are right, this is a silly bug in the CheckIndex tool. It is not
> properly taking into account deletions. I will open an issue & fix it.
>
> Thanks for testing & reporting this, and sorry about that.
>
> Mike
>
> "Bogdan Ghidireac" <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I tried to use the CheckIndex tool (the latest svn code) and I was
> > surprised to notice that all my indexes from production (around 30) are
> > corrupt. This is highly unlikely because they were running for about one
> > year and I had no exception during search so far.
> >
> > One recurring pattern I observed is that the tool reports the segments
> > with deleted docs as corrupt. The one without deleted docs are fine..
> > Here is a sample output.
> >
> > index 1
> >
> > 6 of 7: name=_wxlp docCount=1001
> >     compound=true
> >     numFiles=1
> >     size (MB)=0.213
> >     no deletions
> >     test: open reader.........OK
> >     test: fields, norms.......OK [12 fields]
> >     test: terms, freq, prox...OK [4142 terms; 8004 terms/docs pairs; 8006 tokens]
> >     test: stored fields.......OK [12012 total field count; avg 12 fields per doc]
> >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
> >
> > 7 of 7: name=_wxqg docCount=178
> >     compound=true
> >     numFiles=1
> >     size (MB)=0.039
> >     no deletions
> >     test: open reader.........OK
> >     test: fields, norms.......OK [12 fields]
> >     test: terms, freq, prox...OK [819 terms; 1417 terms/docs pairs; 1417 tokens]
> >     test: stored fields.......OK [2136 total field count; avg 12 fields per doc]
> >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
> >
> > index 2
> >
> > 6 of 7: name=_10hr docCount=1978
> >     compound=true
> >     numFiles=2
> >     size (MB)=3.601
> >     has deletions [delFileName=_10hr_5.del]
> >     test: open reader.........OK [17 deleted docs]
> >     test: fields, norms.......OK [10 fields]
> >     test: terms, freq, prox...FAILED
> >     WARNING: would remove reference to this segment (-fix was not specified); full exception:
> > java.lang.RuntimeException: term ASIN:342678033X docFreq=5 != num docs seen 4
> >     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:217)
> >
> > 7 of 7: name=_10i0 docCount=196
> >     compound=true
> >     numFiles=1
> >     size (MB)=0.44
> >     no deletions
> >     test: open reader.........OK
> >     test: fields, norms.......OK [10 fields]
> >     test: terms, freq, prox...OK [8611 terms; 24307 terms/docs pairs; 32841 tokens]
> >     test: stored fields.......OK [1960 total field count; avg 10 fields per doc]
> >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
> >
> >
> > Is this a known issue or my indexes are really corrupt ?
> >
> > Regards,
> > Bogdan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
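
For anyone reading this thread later, here is a rough sketch of why the "docFreq=5 != num docs seen 4" failure can show up on a healthy segment. It is a toy model, not Lucene's API and not the actual CheckIndex code or the LUCENE-1069 patch. It assumes, in line with Mike's diagnosis, that the docFreq stored for a term still counts documents that were later marked deleted, while walking the term's postings skips them. The class name, method names, and doc IDs are all made up for illustration.

```java
import java.util.Collections;
import java.util.Set;

// Toy model only: NOT Lucene's API and NOT the real CheckIndex logic.
final class DocFreqCheckSketch {

    // Counts the postings whose document is still live (not marked deleted).
    static int countLiveDocs(int[] postings, Set<Integer> deletedDocs) {
        int seen = 0;
        for (int docId : postings) {
            if (!deletedDocs.contains(docId)) {
                seen++;
            }
        }
        return seen;
    }

    // Naive check: demands an exact match, so it fails as soon as one
    // posting refers to a deleted document.
    static boolean naiveCheck(int docFreq, int[] postings, Set<Integer> deletedDocs) {
        return docFreq == countLiveDocs(postings, deletedDocs);
    }

    // Deletion-aware check (an assumption about what a tolerant check could
    // look like): exact match when nothing is deleted, otherwise allow the
    // shortfall to be at most the number of deleted documents.
    static boolean deletionAwareCheck(int docFreq, int[] postings, Set<Integer> deletedDocs) {
        int seen = countLiveDocs(postings, deletedDocs);
        if (deletedDocs.isEmpty()) {
            return docFreq == seen;
        }
        return seen <= docFreq && docFreq - seen <= deletedDocs.size();
    }

    public static void main(String[] args) {
        int[] postings = {3, 17, 42, 99, 120};            // hypothetical doc IDs, so docFreq = 5
        Set<Integer> deleted = Collections.singleton(42); // one of them was deleted

        System.out.println("live docs seen:       " + countLiveDocs(postings, deleted));
        System.out.println("naive check passes:   " + naiveCheck(5, postings, deleted));
        System.out.println("deletion-aware check: " + deletionAwareCheck(5, postings, deleted));
    }
}
```

With one of the five postings pointing at a deleted document, the naive comparison reports a mismatch even though nothing is corrupt, which matches the pattern Bogdan saw: only segments with deletions fail.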