Hi Roy,

The normalization is about storing the document length in a single byte (to use less memory). To use the raw length, edit the source code to avoid the encode/decode step:

    /**
     * Encodes the document length in a lossless way
     */
    @Override
    public long computeNorm(FieldInvertState state) {
      return state.getLength() - state.getNumOverlap();
    }

    @Override
    public float score(int doc, float freq) {
      // We have to supply something in case norms are omitted
      return ModelBase.this.score(stats, freq, norms == null ? 1L : norms.get(doc));
    }

    @Override
    public Explanation explain(int doc, Explanation freq) {
      return ModelBase.this.explain(stats, doc, freq, norms == null ? 1L : norms.get(doc));
    }

On Thursday, July 21, 2016 6:06 PM, Dwaipayan Roy <dwaipayan....@gmail.com> wrote:

Hello,

In *SimilarityBase.java*, I can see that the length of the document is normalized using the function *decodeNormValue()*, but I can't understand how the normalization is done. Can you please help?

Also, is there any way to avoid this doc-length normalization and use the raw doc length instead (as used in LM-JM, Zhai et al., SIGIR 2001)?

Thanks.

P.S. I am using Lucene 4.10.4
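
On the question of how the normalization works: as far as I recall, the 4.x SimilarityBase squeezes 1/sqrt(length) into one byte with a tiny floating-point format (3 mantissa bits) and decodes it back to an approximate length. The sketch below reimplements that idea in standalone Java so the loss is visible. It is not the Lucene source; the names NormSketch, encode and decodeLength are made up for illustration, and it assumes a boost of 1.

```java
final class NormSketch {
    // Assumed layout: 3 mantissa bits, exponent zero point 15.
    private static final int ZERO_POINT = (63 - 15) << 3;

    // Quantize 1/sqrt(length) to a single byte by truncating the float's mantissa.
    static byte encode(int length) {
        float f = (float) (1.0 / Math.sqrt(length));
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: zero or smallest value
        }
        if (small >= ZERO_POINT + 0x100) {
            return -1; // overflow: largest value
        }
        return (byte) (small - ZERO_POINT);
    }

    // Recover an approximate length from the byte: length = 1 / norm^2.
    static float decodeLength(byte b) {
        if (b == 0) {
            return Float.POSITIVE_INFINITY; // norm of 0 decodes to an unbounded length
        }
        int bits = ((b & 0xff) << (24 - 3)) + ((63 - 15) << 24);
        float norm = Float.intBitsToFloat(bits);
        return 1.0f / (norm * norm);
    }

    public static void main(String[] args) {
        // Nearby lengths can share a byte, so the decoded length is only approximate.
        for (int length : new int[] {4, 100, 110}) {
            byte b = encode(length);
            System.out.println(length + " -> byte " + b + " -> " + decodeLength(b));
        }
    }
}
```

In this sketch, lengths 100 and 110 end up in the same byte, which is the kind of loss the lossless computeNorm override above avoids by storing the length as a long.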