Whoops! You are right, this is a silly bug in the CheckIndex tool: it does not properly take deletions into account. I will open an issue & fix it.
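Roughly, the mismatch in your output ("docFreq=5 != num docs seen 4") can be reproduced with the standalone sketch below. It is not the actual CheckIndex code: the class name, method names, and sample doc IDs are made up for illustration, and the relaxed comparison is only one plausible way to make the check deletion-aware. It assumes the usual behaviour that a term's docFreq still counts deleted docs until the segment is merged, while iterating the postings skips them.

```java
import java.util.BitSet;

// Hypothetical illustration only; this does not use or mirror the Lucene API.
public class DocFreqCheckSketch {

    /** Counts the postings an iterator would return when it skips deleted docs. */
    public static int countLiveDocs(int[] postings, BitSet deleted) {
        int seen = 0;
        for (int docId : postings) {
            if (!deleted.get(docId)) {
                seen++;
            }
        }
        return seen;
    }

    /** Strict check: requires docFreq to equal the number of docs seen. */
    public static boolean strictCheck(int docFreq, int seen) {
        return docFreq == seen;
    }

    /**
     * Deletion-aware check (an assumed fix): with deletions present, the
     * number of docs seen may be lower than docFreq, but never higher.
     */
    public static boolean deletionAwareCheck(int docFreq, int seen, boolean hasDeletions) {
        return hasDeletions ? seen <= docFreq : seen == docFreq;
    }

    public static void main(String[] args) {
        int[] postings = {3, 17, 42, 99, 120}; // five docs contain the term
        BitSet deleted = new BitSet();
        deleted.set(42);                       // one of them has been deleted

        int docFreq = postings.length;         // docFreq still counts the deleted doc
        int seen = countLiveDocs(postings, deleted);

        System.out.println("docFreq=" + docFreq + " seen=" + seen);
        System.out.println("strict check passes: " + strictCheck(docFreq, seen));
        System.out.println("deletion-aware check passes: "
                + deletionAwareCheck(docFreq, seen, true));
    }
}
```

With one of the five matching docs deleted, the strict comparison fails even though nothing is corrupt, while the relaxed comparison passes.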
Thanks for testing & reporting this, and sorry about that.

Mike

"Bogdan Ghidireac" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I tried to use the CheckIndex tool (the latest svn code) and I was surprised
> to notice that all my indexes from production (around 30) are corrupt. This
> is highly unlikely because they were running for about one year and I had no
> exception during search so far.
>
> One recurring pattern I observed is that the tool reports the segments with
> deleted docs as corrupt. The one without deleted docs are fine.. Here is a
> sample output.
>
> index 1
>
>   6 of 7: name=_wxlp docCount=1001
>     compound=true
>     numFiles=1
>     size (MB)=0.213
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [12 fields]
>     test: terms, freq, prox...OK [4142 terms; 8004 terms/docs pairs; 8006 tokens]
>     test: stored fields.......OK [12012 total field count; avg 12 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   7 of 7: name=_wxqg docCount=178
>     compound=true
>     numFiles=1
>     size (MB)=0.039
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [12 fields]
>     test: terms, freq, prox...OK [819 terms; 1417 terms/docs pairs; 1417 tokens]
>     test: stored fields.......OK [2136 total field count; avg 12 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
> index 2
>
>   6 of 7: name=_10hr docCount=1978
>     compound=true
>     numFiles=2
>     size (MB)=3.601
>     has deletions [delFileName=_10hr_5.del]
>     test: open reader.........OK [17 deleted docs]
>     test: fields, norms.......OK [10 fields]
>     test: terms, freq, prox...FAILED
>     WARNING: would remove reference to this segment (-fix was not specified); full exception:
> java.lang.RuntimeException: term ASIN:342678033X docFreq=5 != num docs seen 4
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:217)
>
>   7 of 7: name=_10i0 docCount=196
>     compound=true
>     numFiles=1
>     size (MB)=0.44
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [10 fields]
>     test: terms, freq, prox...OK [8611 terms; 24307 terms/docs pairs; 32841 tokens]
>     test: stored fields.......OK [1960 total field count; avg 10 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
> Is this a known issue or my indexes are really corrupt ?
>
> Regards,
> Bogdan