: Sunday gets ranked highly due to idf. How do I reduce this skewness : due to the date-posted field? I saw a reference earlier to : ConstantScoreRangeQuery on JIRA - is it the solution?
Yes. RangeQuery expands to a BooleanQuery containing all of the terms in the. The number of terms (and the frequency of thsoe terms in the index) will allways affect those scores. This is why i constantly argue that when using dates or numbers a RangeQuery never makes sense -- allways use a RangeFilter, and if you must have a "Query" object, use ConstantScoreRangeQuery. : 2. If I choose to sort the results by date, then recent documents with : very very low relevancy (say the words searched appears only in : content, and not in title/bylines/summary fields that are boosted : higher) are still shown relatively high in the list, and I wish to : omit them in general. What is the best way to implement some sort of a : relevancy filter (include only documents with an normalized score of : 0.2 or more....)? Or is there a better way around it? there is no safe way to filter by score, this is mentioned in the FAQ... http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03 An alternate approach is to sort by score, but use something like a FunctionQuery to inflate the scores of more recent documents... https://issues.apache.org/jira/browse/LUCENE-446 -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]