Hi, I sent this 30 minutes ago and it didn't seem to go through, so I'm
trying again. I apologize if two copies eventually arrive.

I am working on the development of a product that uses Lucene. Testers
reported a corrupt index, and it is in an odd state.
The indexes are built in batches (into multiple RAM indexes in parallel)
and then eventually merged into a disk index with
IndexWriter.addIndexes(Directory[]).
Somehow the index got corrupted; there were no indications of a crash and
no errors in the log. The failure is in SegmentMerger.mergeNorms:
 private void mergeNorms() throws IOException {
   for (int i = 0; i < fieldInfos.size(); i++) {
     FieldInfo fi = fieldInfos.fieldInfo(i);
     if (fi.isIndexed && !fi.omitNorms) {
       IndexOutput output = directory.createOutput(segment + ".f" + i);
       try {
         for (int j = 0; j < readers.size(); j++) {
           IndexReader reader = (IndexReader) readers.elementAt(j);
           int maxDoc = reader.maxDoc();
           byte[] input = new byte[maxDoc];
           reader.norms(fi.name, input, 0);  // <==== ERROR HERE
           for (int k = 0; k < maxDoc; k++) {
             if (!reader.isDeleted(k)) {
               output.writeByte(input[k]);
             }
           }
         }
       } finally {
         output.close();
       }
     }
   }
 }

The problem is that the maxDoc() returned by the IndexReader
(FieldsReader in this case) is larger than the size, in bytes, of the
norms file. IndexInput.read(byte[], int, int) then fails because there
is not enough data in the file to read.
Here is part of the directory listing (there are many stored fields of
the same size, so I am omitting all but the first 3):

-rw-r--r--  1 icmadmin db2grp1        811 Sep 27 20:48 _a4.fnm
-rw-r--r--  1 icmadmin db2grp1    1451696 Sep 27 20:49 _a4.fdx
-rw-r--r--  1 icmadmin db2grp1   12736304 Sep 27 20:49 _a4.fdt
-rw-r--r--  1 icmadmin db2grp1 5648544509 Sep 27 21:30 _a4.prx
-rw-r--r--  1 icmadmin db2grp1 1695149231 Sep 27 21:30 _a4.frq
-rw-r--r--  1 icmadmin db2grp1   45688880 Sep 27 21:30 _a4.tis
-rw-r--r--  1 icmadmin db2grp1     673588 Sep 27 21:30 _a4.tii
-rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f2
-rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f1
-rw-r--r--  1 icmadmin db2grp1     181159 Sep 27 21:30 _a4.f0

From looking at the code, sizeof(.fdx)/8 should equal sizeof(.f0),
but it doesn't in this case.
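
The arithmetic can be checked with a small sketch. It assumes, as
described above, one 8-byte entry per document in .fdx and one byte per
document in each norms file; the class and method names are hypothetical
and not part of Lucene. The sizes are copied from the directory listing.

```java
// Sketch only: sizes are copied from the directory listing above.
// The class and method names are hypothetical, not part of Lucene.
class NormsSizeCheck {
    // Assumption: each .fdx entry is one 8-byte pointer per document.
    static final long FDX_ENTRY_BYTES = 8;

    // Number of documents implied by the size of the .fdx file.
    static long docCountFromFdx(long fdxBytes) {
        return fdxBytes / FDX_ENTRY_BYTES;
    }

    // Assumption: a norms file holds one byte per document, so this is
    // how many bytes the norms file is short of the .fdx document count.
    static long missingNormBytes(long fdxBytes, long normsBytes) {
        return docCountFromFdx(fdxBytes) - normsBytes;
    }

    public static void main(String[] args) {
        long fdxBytes = 1451696L;  // _a4.fdx
        long normsBytes = 181159L; // _a4.f0
        System.out.println("docs implied by .fdx: " + docCountFromFdx(fdxBytes));
        System.out.println("bytes in .f0:         " + normsBytes);
        System.out.println("shortfall:            " + missingNormBytes(fdxBytes, normsBytes));
    }
}
```

With these numbers the .fdx file implies 181462 documents, which is 303
more than the 181159 bytes in each norms file.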

Any ideas? Also, I wasn't sure whether this was more appropriate for dev
or user, so I guessed user.

-Nick
(programmer working @ ibm)
