Re: What are norms?

Marvin Humphrey Fri, 14 Jul 2006 16:33:49 -0700


On Jul 14, 2006, at 7:42 AM, Yonik Seeley wrote:

On 7/14/06, Rob Staveley (Tom) <[EMAIL PROTECTED]> wrote:

What would I lose by omitting norms? The ability to boost individual fields
as they are added to the index? Anything else?


Length normalization of the field.  Full-text matches on shorter
fields score higher because the match is seen as more specific.  You
lose that if you omit norms.  That's typically OK for short fields
like "title" anyway, and fields that aren't full-text (like dates,
numbers, etc).

Yonik, I disagree on one point. I recommend against omitting norms for title fields.

Without norms, the titles "Duke Ellington" and "Duke Ellington meets Count Basie" will contribute equally to their respective document scores on a search for "Duke Ellington". For most applications, exact title matches should win, so that's not optimal.
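
To make the difference concrete, here is a minimal sketch of the two titles with and without length normalization. It assumes the classic 1/sqrt(numTerms) form of the norm (the shape Lucene's DefaultSimilarity.lengthNorm uses), ignores the lossy one-byte encoding of stored norms, and holds tf and idf for the matching terms equal across both documents. The names length_norm and title_contribution are hypothetical helpers for illustration, not part of any library.

```python
import math


def length_norm(num_tokens: int) -> float:
    """Assumed norm: 1/sqrt(n), where n is the number of tokens in the field."""
    return 1.0 / math.sqrt(num_tokens)


def title_contribution(title: str, use_norms: bool) -> float:
    """Hypothetical helper: relative weight of a title field for a query
    whose terms all occur once in the title. tf and idf are treated as
    identical across documents, so only the norm can differ."""
    tokens = title.lower().split()  # naive whitespace tokenization
    return length_norm(len(tokens)) if use_norms else 1.0


short = "Duke Ellington"
long_ = "Duke Ellington meets Count Basie"

for use_norms in (False, True):
    print(
        f"norms={use_norms}: "
        f"short={title_contribution(short, use_norms):.4f} "
        f"long={title_contribution(long_, use_norms):.4f}"
    )
```

With norms omitted both titles get the same weight, so the exact match cannot pull ahead. With norms, the two-token title is weighted at roughly 0.71 against roughly 0.45 for the five-token title.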

KinoSearch adopted a default tf() truncation scheme where all fields were normalized as if they contained a minimum of 100 tokens. That achieved the desired outcome of stopping very short documents from scoring inappropriately high, but even with a boost assigned to a title field, I've found that I can't get really good IR precision without going back to a non-truncating tf() for title.
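
A rough sketch of why a 100-token floor hurts title fields. This is an assumption about the shape of the scheme (a 1/sqrt(n) norm with the token count clamped to at least 100), not KinoSearch's actual code, and plain_norm, floored_norm, and MIN_TOKENS are hypothetical names.

```python
import math

MIN_TOKENS = 100  # floor described in the message above


def plain_norm(num_tokens: int) -> float:
    """Non-truncating norm: 1/sqrt(n)."""
    return 1.0 / math.sqrt(num_tokens)


def floored_norm(num_tokens: int, floor: int = MIN_TOKENS) -> float:
    """Assumed truncating norm: treat every field as having at least
    `floor` tokens before applying 1/sqrt(n)."""
    return 1.0 / math.sqrt(max(num_tokens, floor))


# Token counts: the two example titles, plus a longer body field.
for label, n in [("Duke Ellington", 2),
                 ("Duke Ellington meets Count Basie", 5),
                 ("400-token body", 400)]:
    print(f"{label!r}: plain={plain_norm(n):.4f} floored={floored_norm(n):.4f}")
```

Under the floor, every field shorter than 100 tokens gets the same norm, so the two titles are indistinguishable again no matter what boost the title field carries. Only the non-truncating norm separates them.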


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


