Alan,

Thanks for the idea. We don't want to invent a new scoring formula, so we'd rather not write a new Similarity class. We want to keep everything DefaultSimilarity/TFIDFSimilarity already provides and override the computation of just one component (the fieldNorm) of the existing tf-idf scoring. Creating a new class would mean copying and pasting the existing TFIDFSimilarity code, which would make it hard to upgrade and stay in sync with future versions. Making the change in the original class would also let others benefit from it, without posing any risk.

In case you're interested, we want to move the length-norm computation from index time to search time. That will let us change the length-norm function and A/B test it against the default without having to re-create the index, which is an extremely expensive task for us. We'll simply store the raw field length (#terms) as the fieldNorm and change the scorer to compute the length-norm from it at search time.

Thanks,
Hamid

On Fri, Oct 24, 2014 at 2:21 AM, Alan Woodward <a...@flax.co.uk> wrote:

> Hi Hamid,
>
> Can't you just extend Similarity instead?
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 24 Oct 2014, at 08:04, Hafiz Hamid wrote:
>
> Hi - I wanted to check if folks would be okay with removing the "final"
> modifier from 4 methods (i.e. computeNorm,computeWeight, exactSimScorer
> and sloppySimScorer) in Lucene's TFIDFSimilarity class. It doesn't look
> like allowing to override these methods would have any negative
> implications on the function of this class. Yet it'd enable us tune the
> tf-idf scoring provided by this class to better serve our needs.
>
> I've logged a Jira issue for this: LUCENE-6023
> <https://issues.apache.org/jira/browse/LUCENE-6023>. If folks don't have
> any objection, I've a patch ready and can upload it.
>
> Thanks,
> Hamid
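
For what it's worth, the search-time length-norm idea described above can be sketched roughly as follows. This is a standalone illustration, not the patch and not Lucene code: the class name SearchTimeLengthNorm, the score method, and the LOG_NORM alternative are all made up for this example, and the score is a simplified single-term tf * idf^2 * lengthNorm that leaves out boosts, coord, queryNorm and norm encoding. DEFAULT_NORM only mirrors the 1/sqrt(#terms) shape of the default length norm.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only; none of these names are Lucene APIs.
final class SearchTimeLengthNorm {

    // Same shape as the default length norm, ignoring index-time boost: 1 / sqrt(#terms).
    static final IntToDoubleFunction DEFAULT_NORM =
            numTerms -> numTerms <= 0 ? 0.0 : 1.0 / Math.sqrt(numTerms);

    // An arbitrary alternative one might A/B test: a gentler penalty for long fields.
    static final IntToDoubleFunction LOG_NORM =
            numTerms -> numTerms <= 0 ? 0.0 : 1.0 / (1.0 + Math.log(numTerms));

    // Simplified single-term score. The raw field length is what would be read
    // back from the index; the length norm is derived from it at search time,
    // so swapping the function needs no re-indexing.
    static double score(int freq, double idf, int rawFieldLength,
                        IntToDoubleFunction lengthNorm) {
        double tf = Math.sqrt(freq); // tf component, as in classic tf-idf scoring
        return tf * idf * idf * lengthNorm.applyAsDouble(rawFieldLength);
    }

    public static void main(String[] args) {
        int rawFieldLength = 100; // stored per document in place of a precomputed norm
        System.out.println(score(4, 2.0, rawFieldLength, DEFAULT_NORM));
        System.out.println(score(4, 2.0, rawFieldLength, LOG_NORM));
    }
}
```

The point of the sketch is only that the stored value stays the same (the raw term count) while the function applied to it can be swapped per query or per experiment bucket.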