OK, I opened this JIRA issue to track it:

  https://issues.apache.org/jira/browse/LUCENE-1069
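
For anyone following along, here is a rough sketch of the mismatch reported in the quoted message below (docFreq=5, but only 4 docs seen). It is a toy model in plain Java, not the actual CheckIndex code, and it uses no Lucene classes. The class name, method names, and doc IDs are made up for illustration, and the deletion-aware check is only one plausible approach, not necessarily the fix that will go in. The assumption is that a term's stored docFreq still counts documents that were deleted later, while enumerating the term's docs skips them.

```java
import java.util.BitSet;

// Toy model only: NOT Lucene's CheckIndex code. All names here are hypothetical.
class DocFreqCheckSketch {

    // Count the docs an enumeration would see if it skips deleted docs.
    static int countSeen(int[] postings, BitSet deleted) {
        int seen = 0;
        for (int doc : postings) {
            if (!deleted.get(doc)) {
                seen++;
            }
        }
        return seen;
    }

    // Strict check: requires docFreq to equal the number of docs seen.
    // This reports a failure as soon as one posting points at a deleted doc.
    static boolean strictCheck(int docFreq, int[] postings, BitSet deleted) {
        return docFreq == countSeen(postings, deleted);
    }

    // Deletion-aware check (an assumption, one possible approach): allow docFreq
    // to exceed the docs seen by at most the number of deleted docs in the segment.
    static boolean deletionAwareCheck(int docFreq, int[] postings, BitSet deleted) {
        int missing = docFreq - countSeen(postings, deleted);
        return missing >= 0 && missing <= deleted.cardinality();
    }

    public static void main(String[] args) {
        int[] postings = {3, 17, 42, 99, 1500}; // hypothetical doc IDs for one term
        BitSet deleted = new BitSet();
        deleted.set(42);                        // one of the term's docs was deleted
        int docFreq = postings.length;          // stored docFreq still counts it

        System.out.println("strict=" + strictCheck(docFreq, postings, deleted)
                + " aware=" + deletionAwareCheck(docFreq, postings, deleted));
    }
}
```

With no deletions both checks agree, which would fit the report that only segments with deleted docs are flagged.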

Mike

"Michael McCandless" <[EMAIL PROTECTED]> wrote:
> 
> Whoops!  You are right, this is a silly bug in the CheckIndex tool.  It is not
> properly taking into account deletions.  I will open an issue & fix it.
> 
> Thanks for testing & reporting this, and sorry about that.
> 
> Mike
> 
> "Bogdan Ghidireac" <[EMAIL PROTECTED]> wrote:
> > Hi,
> > 
> > I tried the CheckIndex tool (the latest svn code) and was surprised to see
> > it report all of my production indexes (around 30) as corrupt. This is
> > highly unlikely because they have been running for about a year and I have
> > had no exceptions during search so far.
> > 
> > One recurring pattern I observed is that the tool reports the segments
> > with deleted docs as corrupt. The ones without deleted docs are fine.
> > Here is some sample output.
> > 
> > index 1
> > 
> >   6 of 7: name=_wxlp docCount=1001
> >     compound=true
> >     numFiles=1
> >     size (MB)=0.213
> >     no deletions
> >     test: open reader.........OK
> >     test: fields, norms.......OK [12 fields]
> >     test: terms, freq, prox...OK [4142 terms; 8004 terms/docs pairs; 8006 tokens]
> >     test: stored fields.......OK [12012 total field count; avg 12 fields per doc]
> >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
> > 
> >   7 of 7: name=_wxqg docCount=178
> >     compound=true
> >     numFiles=1
> >     size (MB)=0.039
> >     no deletions
> >     test: open reader.........OK
> >     test: fields, norms.......OK [12 fields]
> >     test: terms, freq, prox...OK [819 terms; 1417 terms/docs pairs; 1417 tokens]
> >     test: stored fields.......OK [2136 total field count; avg 12 fields per doc]
> >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
> > 
> > index 2
> > 
> >   6 of 7: name=_10hr docCount=1978
> >     compound=true
> >     numFiles=2
> >     size (MB)=3.601
> >     has deletions [delFileName=_10hr_5.del]
> >     test: open reader.........OK [17 deleted docs]
> >     test: fields, norms.......OK [10 fields]
> >     test: terms, freq, prox...FAILED
> >     WARNING: would remove reference to this segment (-fix was not specified); full exception:
> > java.lang.RuntimeException: term ASIN:342678033X docFreq=5 != num docs seen 4
> >         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:217)
> > 
> >   7 of 7: name=_10i0 docCount=196
> >     compound=true
> >     numFiles=1
> >     size (MB)=0.44
> >     no deletions
> >     test: open reader.........OK
> >     test: fields, norms.......OK [10 fields]
> >     test: terms, freq, prox...OK [8611 terms; 24307 terms/docs pairs; 32841 tokens]
> >     test: stored fields.......OK [1960 total field count; avg 10 fields per doc]
> >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
> > 
> > 
> > Is this a known issue, or are my indexes really corrupt?
> > 
> > Regards,
> > Bogdan
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

