I want to calculate the average document length for a document collection in which each document has three different fields (field1, field2, field3).
This is the program to calculate the average length when there is only one field:

    private byte[] normsDocLengthArr = null;
    private double avgDocLength;

    // norms() returns the byte-encoded normalization factor for the named field of every document.
    normsDocLengthArr = indexReader.norms("field1");
    double sumLength = 0;
    for (int i = 0; i < normsDocLengthArr.length; i++) {
        // decodeNorm() decodes a normalization factor stored in an index.
        double encodeLength = DefaultSimilarity.decodeNorm(normsDocLengthArr[i]);
        double length = 1 / (encodeLength * encodeLength);
        sumLength += length;
    }
    this.avgDocLength = sumLength / normsDocLengthArr.length;

This is how I extended it to all three fields:

    private byte[] normsDocLengthArrField1 = null;
    private byte[] normsDocLengthArrField2 = null;
    private byte[] normsDocLengthArrField3 = null;
    private double avgDocLength;

    // norms() returns the byte-encoded normalization factor for the named field of every document.
    normsDocLengthArrField1 = indexReader.norms("field1");
    normsDocLengthArrField2 = indexReader.norms("field2");
    normsDocLengthArrField3 = indexReader.norms("field3");
    double sumLength = 0;
    for (int i = 0; i < normsDocLengthArrField1.length; i++) {
        // decodeNorm() decodes a normalization factor stored in an index.
        double encodeLengthF1 = DefaultSimilarity.decodeNorm(normsDocLengthArrField1[i]);
        double encodeLengthF2 = DefaultSimilarity.decodeNorm(normsDocLengthArrField2[i]);
        double encodeLengthF3 = DefaultSimilarity.decodeNorm(normsDocLengthArrField3[i]);
        double length = 1 / ((encodeLengthF1 * encodeLengthF1)
                           + (encodeLengthF2 * encodeLengthF2)
                           + (encodeLengthF3 * encodeLengthF3));
        sumLength += length;
    }
    this.avgDocLength = sumLength / (normsDocLengthArrField1.length
                                   + normsDocLengthArrField2.length
                                   + normsDocLengthArrField3.length);

I just want to know: is my implementation of the average document length over the three fields correct?

--
Regards,
Kasun Perera
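For comparison, here is a minimal, self-contained sketch of another way to combine the fields. It assumes that each decoded norm equals 1/sqrt(numTerms), which is what DefaultSimilarity's length normalization gives when no index-time boosts are set, and it ignores the precision lost by the one-byte norm encoding, so lengths recovered from a real index would only be approximate. Under that assumption a document's total length is the sum of its per-field lengths, 1/n1² + 1/n2² + 1/n3² (n1..n3 being the decoded norms), which is not the same quantity as 1/(n1² + n2² + n3²), and the average is taken over the number of documents rather than over the combined length of the three norms arrays. The class and method names (AvgDocLengthSketch, lengthFromNorm, averageDocLength) are hypothetical, and the input is an array of already decoded norms instead of a Lucene IndexReader.

```java
/**
 * Hypothetical sketch: average document length over several fields,
 * assuming each decoded norm equals 1/sqrt(numTerms) (no boosts).
 */
public class AvgDocLengthSketch {

    /** Recovers an approximate term count from one decoded norm: n = 1 / norm^2. */
    static double lengthFromNorm(float norm) {
        if (norm <= 0f) {
            return 0.0; // no usable length information (e.g. a zero boost)
        }
        double d = norm;
        return 1.0 / (d * d);
    }

    /**
     * decodedNorms[f][doc] holds the decoded norm of field f for document doc.
     * A document's length is the SUM of its per-field lengths, and the average
     * is taken over the number of documents, not over (field, document) pairs.
     */
    static double averageDocLength(float[][] decodedNorms) {
        int numDocs = decodedNorms[0].length;
        if (numDocs == 0) {
            return 0.0; // empty collection: avoid dividing by zero
        }
        double sumLength = 0.0;
        for (int doc = 0; doc < numDocs; doc++) {
            for (float[] fieldNorms : decodedNorms) {
                sumLength += lengthFromNorm(fieldNorms[doc]);
            }
        }
        return sumLength / numDocs;
    }

    public static void main(String[] args) {
        // Two documents, three fields; norms chosen so that 1/norm^2 is exact.
        float[][] norms = {
            {0.5f, 0.125f},  // field1: 4 and 64 terms
            {0.25f, 0.5f},   // field2: 16 and 4 terms
            {1.0f, 0.25f}    // field3: 1 and 16 terms
        };
        // Document lengths are 4 + 16 + 1 = 21 and 64 + 4 + 16 = 84.
        System.out.println(averageDocLength(norms)); // prints 52.5
    }
}
```

With a real index, decodedNorms[f][i] would presumably be filled with DefaultSimilarity.decodeNorm(indexReader.norms(fieldName)[i]) for each of the three fields; documents that lack a field, and fields indexed with norms omitted, may need separate handling.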