Hi Nicholas,

Aha, I see that you are into field-based scoring, which is an unsolved problem.
Then, you might find BlendedTermQuery and SynonymQuery relevant.

Ahmet

On Friday, November 18, 2016 12:22 AM, Nicolás Lichtmaier <nicol...@wolfram.com> wrote:

That depends on what you want. In this case I want to use a discrimination
power based on all the body text, not just the titles. Otherwise, terms
that are really not that relevant end up scoring very high!

On 17/11/16 at 18:25, Ahmet Arslan wrote:
> Hi Nicholas,
>
> IDF, among others, is a measure of term specificity. If 'or' is not so usual
> in titles, then it has some discrimination power in that domain.
>
> I think it's OK for 'or' to get a high IDF value in this case.
>
> Ahmet
>
>
>
> On Thursday, November 17, 2016 9:09 PM, Nicolás Lichtmaier
> <nicol...@wolfram.com> wrote:
> IDF measures the selectivity of a term. But the calculation is
> per-field. That can be bad for very short fields (like titles). One
> example of this problem: If I don't delete stop words, then "or", "and",
> etc. should get low IDF values; however, "or" is, perhaps, not
> so usual in titles. Then, "or" will have a high IDF value and be treated
> as an important term. That's bad.
>
> One solution I see is to modify the Similarity to have a global, or
> multi-field IDF value. This value would include in its calculation
> longer fields that have more "normal text"-like stats. However this is
> not trivial because I can't just add document-frequencies (I would be
> counting some documents several times if "or" is present in more than
> one field). I would need to OR the bit-vectors that signal the
> presence of the term, right? Not trivial.
>
> Has anyone encountered this issue? Has it been solved? Is my thinking wrong?
>
> Should I also try the developers' list?
>
> Thanks!
>
> Nicolás.-
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
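
To make the double-counting concern from the original question concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class MultiFieldDf, its helper methods, the six-document toy corpus, and the smoothed IDF formula are all assumptions made up for illustration, and the formula is not necessarily the one any Lucene Similarity applies. It only shows why summing per-field document frequencies overcounts, and how OR-ing per-field bit-vectors gives the number of distinct documents instead.

```java
import java.util.BitSet;

public class MultiFieldDf {

    // Build a bit-vector with one bit set per document id that contains the term.
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    // Document frequency across several fields: OR the per-field bit-vectors
    // so that a document containing the term in two fields is counted once.
    static int unionDocFreq(BitSet... perFieldDocs) {
        BitSet union = new BitSet();
        for (BitSet docs : perFieldDocs) {
            union.or(docs);
        }
        return union.cardinality();
    }

    // A generic smoothed IDF (an assumption for this sketch).
    static double idf(int docFreq, int docCount) {
        return Math.log((docCount + 1.0) / (docFreq + 1.0)) + 1.0;
    }

    public static void main(String[] args) {
        int docCount = 6;                    // toy corpus: document ids 0..5
        BitSet title = bits(2);              // "or" appears in one title
        BitSet body = bits(0, 1, 2, 3, 4);   // "or" appears in five bodies

        int titleDf = title.cardinality();               // 1
        int summedDf = titleDf + body.cardinality();     // 6, document 2 counted twice
        int unionDf = unionDocFreq(title, body);         // 5 distinct documents

        System.out.println("title-only df = " + titleDf + ", idf = " + idf(titleDf, docCount));
        System.out.println("summed df     = " + summedDf + " (overcounts)");
        System.out.println("union df      = " + unionDf + ", idf = " + idf(unionDf, docCount));
    }
}
```

In this toy corpus the title-only statistic makes "or" look rare, while the union over title and body shows it occurs in five of the six documents, so its IDF drops. In a real index the per-field postings would have to be merged per segment to obtain such a union, which is the non-trivial part the question points at.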