Woops!  You are right, this is a silly bug in the CheckIndex tool.  It is not
properly taking into account deletions.  I will open an issue & fix it.

Thanks for testing & reporting this, and sorry about that.

Mike

"Bogdan Ghidireac" <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I tried to use the CheckIndex tool (the latest svn code) and I was
> surprised
> to notice that all my indexes from production (around 30) are corrupt.
> This
> is highly unlikely because they were running for about one year and I had
> no
> exception during search so far.
> 
> One recurring pattern I observed is that the tool reports the segments
> with
> deleted docs as corrupt. The one without deleted docs are fine.. Here is
> a
> sample output.
> 
> index 1
> 
>   6 of 7: name=_wxlp docCount=1001
>     compound=true
>     numFiles=1
>     size (MB)=0.213
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [12 fields]
>     test: terms, freq, prox...OK [4142 terms; 8004 terms/docs pairs; 8006
> tokens]
>     test: stored fields.......OK [12012 total field count; avg 12 fields
>     per
> doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> 
>   7 of 7: name=_wxqg docCount=178
>     compound=true
>     numFiles=1
>     size (MB)=0.039
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [12 fields]
>     test: terms, freq, prox...OK [819 terms; 1417 terms/docs pairs; 1417
> tokens]
>     test: stored fields.......OK [2136 total field count; avg 12 fields
>     per
> doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> 
> index 2
> 
>   6 of 7: name=_10hr docCount=1978
>     compound=true
>     numFiles=2
>     size (MB)=3.601
>     has deletions [delFileName=_10hr_5.del]
>     test: open reader.........OK [17 deleted docs]
>     test: fields, norms.......OK [10 fields]
>     test: terms, freq, prox...FAILED
>     WARNING: would remove reference to this segment (-fix was not
> specified); full exception:
> java.lang.RuntimeException: term ASIN:342678033X docFreq=5 != num docs
> seen
> 4
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:217)
> 
>   7 of 7: name=_10i0 docCount=196
>     compound=true
>     numFiles=1
>     size (MB)=0.44
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [10 fields]
>     test: terms, freq, prox...OK [8611 terms; 24307 terms/docs pairs;
>     32841
> tokens]
>     test: stored fields.......OK [1960 total field count; avg 10 fields
>     per
> doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> 
> 
> Is this a known issue or my indexes are really corrupt ?
> 
> Regards,
> Bogdan

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to