Hi All,

Lucene scoring can favour shorter fields in documents. In the past we've had to pad out 'unreasonably' short fields to a set minimum length (with what are basically nonsense words). I'm wondering how others have dealt with this issue.
Would another option be a custom Similarity class with an altered lengthNorm method?

Cheers,
Dan

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

From: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html

score(q,d) = coord(q,d) · queryNorm(q) · SUM over t in q ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )

Given a one-term query, with the term found in two documents doc{a} and doc{b} (no boost on field, doc or query term), the factors that are the same for both documents drop out of the comparison:

score(q,d) =~ SUM ( tf(t in d) · norm(t,d) )

and for one term:

score(q,d) =~ tf(t in d) · norm(t,d)

also:

norm(t,d) =~ lengthNorm(field)

lengthNorm(field): computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score.

In DefaultSimilarity.java:

lengthNorm(field) = 1/sqrt(num_terms_in_field)

doc{a} field{a}: num_terms_in_field = 100, term appears 10 times in field{a}, doc{a}
score =~ 10/sqrt(100) = 1

doc{b} field{a}: num_terms_in_field = 300, term appears 10 times in field{a}, doc{b}
score =~ 10/sqrt(300) = 0.577350269

Daniel Rosher
Developer
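To make the custom-lengthNorm idea concrete, here is a minimal standalone sketch of the arithmetic above with a minimum-length floor, which has the same effect as padding short fields but without the nonsense words. It deliberately does not depend on Lucene: the class name LengthNormSketch, the minTerms parameter and the score helper are all hypothetical names made up for illustration. In a real index the floor would presumably go into an override of lengthNorm in a Similarity subclass, and since norms are computed when documents are added, existing documents would need reindexing. Like the example above, it uses the raw term frequency as tf.

```java
// Standalone sketch; no Lucene dependency. All names here are illustrative.
final class LengthNormSketch {

    // Same shape as 1/sqrt(num_terms_in_field), except that any field
    // shorter than minTerms is treated as if it had exactly minTerms tokens.
    static double lengthNorm(int numTerms, int minTerms) {
        int effective = Math.max(numTerms, Math.max(minTerms, 1));
        return 1.0 / Math.sqrt(effective);
    }

    // Simplified single-term score from the message: tf * norm,
    // with the raw term frequency standing in for tf.
    static double score(int termFreq, int numTerms, int minTerms) {
        return termFreq * lengthNorm(numTerms, minTerms);
    }

    public static void main(String[] args) {
        // The two documents from the worked example, with no floor applied.
        System.out.println("doc{a}: " + score(10, 100, 1));
        System.out.println("doc{b}: " + score(10, 300, 1));

        // A two-token field with one hit, without and with a floor of 50 tokens.
        System.out.println("short, no floor: " + score(1, 2, 1));
        System.out.println("short, floor 50: " + score(1, 2, 50));
    }
}
```

With a floor of 50, a two-token field gets the same norm as a 50-token field, so it no longer outranks longer fields purely by being short. Fields longer than the floor are unaffected.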