Hi Jian,

Thanks for the reply. The problem with that approach is that it completely ignores document length. A book that mentions "frog" 5 times in its 2,000 pages should be less relevant than a book that mentions "frog" 4 times in its 4 pages.
I really want to lower the document length weight instead of removing it completely. Any ideas how to do that?

Thanks.

--- java-user@lucene.apache.org wrote:

> Hi,
>
> I would use pure span or cover density based ranking algorithm which
> do not take document length into consideration. (tweaking whatever
> currently in the standard Lucene distribution?)
>
> For example, searching for the keywords "beautiful house", span/cover
> ranking will treat a long document and a short document the same
> ranking as long as they have the same number of spans/covers (for
> example, "beautiful xxxxxx house" is one cover), and with each
> span/cover, the editing distance between the keywords is the same.
>
> Just my 2 cents,
>
> Cheers,
>
> Jian
>
> On 29 Jun 2005 20:30:49 -0000, [EMAIL PROTECTED]
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Short documents bubble to the top of the results because the field
> > length is short. Does anyone have a good strategy for working around this?
> > Will doing something like log(document length) flatten out my results while
> > still making them meaningful? I'm going to try some different approaches
> > but any advice is appreciated.
> >
> > Thanks.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
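
To make "lower the weight instead of removing it" concrete, here is a rough standalone sketch of two ways to flatten a length norm: a smaller exponent, and the log(document length) idea from the original question. It does not touch Lucene at all. The class name, method names, the toy score (square root of term frequency times the norm), and the 300-words-per-page figure are all assumptions made up for illustration. As far as I know, the default Similarity in Lucene of this era computes the norm as 1/sqrt(numTerms), which corresponds to exponent 0.5 below.

```java
// Standalone sketch with no Lucene dependency. All names are hypothetical.
final class LengthNormSketch {

    // Length norm of the form 1 / numTerms^exponent.
    // exponent = 0.5 gives 1/sqrt(numTerms); exponent = 0 ignores length entirely.
    static double powerNorm(int numTerms, double exponent) {
        return 1.0 / Math.pow(numTerms, exponent);
    }

    // Log-damped alternative: 1 / (1 + ln(numTerms)).
    static double logNorm(int numTerms) {
        return 1.0 / (1.0 + Math.log(numTerms));
    }

    // Toy single-term score: sqrt(term frequency) * length norm.
    static double score(int termFreq, double norm) {
        return Math.sqrt(termFreq) * norm;
    }

    public static void main(String[] args) {
        int wordsPerPage = 300;              // assumption, for illustration only
        int longDoc = 2000 * wordsPerPage;   // "frog" appears 5 times
        int shortDoc = 4 * wordsPerPage;     // "frog" appears 4 times

        // Lower exponents shrink the advantage of the short document.
        double[] exponents = {0.5, 0.25, 0.1, 0.0};
        for (double e : exponents) {
            double shortScore = score(4, powerNorm(shortDoc, e));
            double longScore = score(5, powerNorm(longDoc, e));
            System.out.printf("exponent %.2f: short/long ratio = %.3f%n",
                    e, shortScore / longScore);
        }

        // Same comparison with the log-damped norm.
        double shortLog = score(4, logNorm(shortDoc));
        double longLog = score(5, logNorm(longDoc));
        System.out.printf("log norm: short/long ratio = %.3f%n", shortLog / longLog);
    }
}
```

Under these assumptions, exponent 0.5 scores the 4-page book 20 times higher than the 2,000-page one, while exponent 0 lets the long book win on raw frequency alone; values in between, or the log form, sit somewhere in the middle. In Lucene itself the equivalent change would presumably go into a custom Similarity that overrides lengthNorm. My understanding is that norms are computed at index time, so the index would need to be rebuilt after changing it.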