I want to calculate the average document length for a document collection in
which each document has 3 different fields (field1, field2, field3).
This is the program I use to calculate the average length when there is only
one field.
private byte[] normsDocLengthArr = null;
private double avgDocLength;
normsDocLengthArr = indexReader.norms("field1");
// norms() returns the byte-encoded normalization factor for
// the named field of every document.
double sumLength = 0;
for (int i = 0; i < normsDocLengthArr.length; i++) {
    double encodeLength = DefaultSimilarity.decodeNorm(normsDocLengthArr[i]);
    // decodeNorm() decodes a normalization factor stored in an index.
    double length = 1 / (encodeLength * encodeLength);
    sumLength += length;
}
this.avgDocLength = sumLength / normsDocLengthArr.length;
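
To make the single-field arithmetic concrete, here is a minimal standalone sketch with no Lucene dependency. It assumes the norm was written as 1 / sqrt(numTerms) with no index-time boost, which is my understanding of DefaultSimilarity, so a decoded norm n gives a length of roughly 1 / (n * n). The class and method names are hypothetical. The byte encoding of norms is lossy, so lengths recovered from a real index are only approximate.

```java
/** Standalone sketch: recover approximate field lengths from already-decoded norms. */
final class NormLengthSketch {

    // Assumption: norm = 1 / sqrt(numTerms), i.e. no index-time boost was applied.
    static double lengthFromNorm(float decodedNorm) {
        if (decodedNorm <= 0f) {
            return 0.0; // treat a zero or missing norm as an empty field
        }
        return 1.0 / ((double) decodedNorm * decodedNorm);
    }

    // Average of the recovered lengths over all documents.
    static double averageLength(float[] decodedNorms) {
        if (decodedNorms.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (float norm : decodedNorms) {
            sum += lengthFromNorm(norm);
        }
        return sum / decodedNorms.length;
    }

    public static void main(String[] args) {
        // Norms 0.5 and 0.25 correspond to lengths 4 and 16.
        float[] norms = {0.5f, 0.25f};
        System.out.println(averageLength(norms)); // prints 10.0
    }
}
```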
This is how I extended it for all 3 fields.
private byte[] normsDocLengthArrField1 = null;
private byte[] normsDocLengthArrField2 = null;
private byte[] normsDocLengthArrField3 = null;
private double avgDocLength;
normsDocLengthArrField1 = indexReader.norms("field1");
normsDocLengthArrField2 = indexReader.norms("field2");
normsDocLengthArrField3 = indexReader.norms("field3");
// norms() returns the byte-encoded normalization factor for
// the named field of every document.
double sumLength = 0;
for (int i = 0; i < normsDocLengthArrField1.length; i++) {
    double encodeLengthF1 = DefaultSimilarity.decodeNorm(normsDocLengthArrField1[i]);
    double encodeLengthF2 = DefaultSimilarity.decodeNorm(normsDocLengthArrField2[i]);
    double encodeLengthF3 = DefaultSimilarity.decodeNorm(normsDocLengthArrField3[i]);
    // decodeNorm() decodes a normalization factor stored in an index.
    double length = 1 / ((encodeLengthF1 * encodeLengthF1)
            + (encodeLengthF2 * encodeLengthF2)
            + (encodeLengthF3 * encodeLengthF3));
    sumLength += length;
}
this.avgDocLength = sumLength / (normsDocLengthArrField1.length
        + normsDocLengthArrField2.length + normsDocLengthArrField3.length);
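
To make the question concrete, the standalone sketch below (no Lucene dependency, hypothetical class and method names) evaluates the combined expression from my loop next to one alternative, summing the per-field lengths, for a single document. It again assumes norm = 1 / sqrt(numTerms) with no boost. The two expressions give very different numbers, which is the source of my doubt.

```java
/** Standalone sketch: two ways of combining three decoded norms for one document. */
final class MultiFieldLengthSketch {

    // Assumption: norm = 1 / sqrt(numTerms), so length is about 1 / norm^2.
    static double lengthFromNorm(float norm) {
        return norm > 0f ? 1.0 / ((double) norm * norm) : 0.0;
    }

    // The expression used in my loop: 1 / (n1^2 + n2^2 + n3^2).
    static double combinedReciprocal(float n1, float n2, float n3) {
        return 1.0 / ((double) n1 * n1 + (double) n2 * n2 + (double) n3 * n3);
    }

    // Alternative: add up the lengths recovered for each field separately.
    static double summedLengths(float n1, float n2, float n3) {
        return lengthFromNorm(n1) + lengthFromNorm(n2) + lengthFromNorm(n3);
    }

    public static void main(String[] args) {
        // Norms 0.5, 0.5 and 0.25 correspond to fields of 4, 4 and 16 terms.
        System.out.println(combinedReciprocal(0.5f, 0.5f, 0.25f)); // about 1.78
        System.out.println(summedLengths(0.5f, 0.5f, 0.25f));      // prints 24.0
    }
}
```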
I just want to know whether my implementation for calculating the average
document length over 3 fields is correct.
--
Regards
Kasun Perera