On Jul 14, 2006, at 7:42 AM, Yonik Seeley wrote:

On 7/14/06, Rob Staveley (Tom) <[EMAIL PROTECTED]> wrote:
What would I lose by omitting norms? The ability to boost individual fields
as they are added to the index? Anything else?

Length normalization of the field.  Full-text matches on shorter
fields score higher because the match is seen as more specific.  You
lose that if you omit norms.  That's typically OK for short fields
like "title" anyway, and fields that aren't full-text (like dates,
numbers, etc).

Yonik, I disagree on one point. I recommend against omitting norms for title fields.

Without norms, the titles "Duke Ellington" and "Duke Ellington meets Count Basie" will contribute equally to their respective document scores on a search for "Duke Ellington". For most applications, exact title matches should win, so that's not optimal.
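
To make the difference concrete, here is a minimal sketch of the two cases. It assumes the classic 1/sqrt(number of tokens) length normalization and whitespace tokenization, and it ignores the lossy single-byte encoding that an index would actually store. The function name length_norm and its omit_norms flag are made up for illustration and are not part of any library API.

```python
import math


def length_norm(num_tokens: int, omit_norms: bool = False) -> float:
    """Hypothetical helper: 1/sqrt(length) normalization, or 1.0 if norms are omitted."""
    if omit_norms:
        # Without norms every field is treated as if its normalization factor were 1.0.
        return 1.0
    return 1.0 / math.sqrt(num_tokens)


titles = ["Duke Ellington", "Duke Ellington meets Count Basie"]

for title in titles:
    n = len(title.split())  # naive whitespace tokenization, for illustration only
    with_norms = length_norm(n)
    without_norms = length_norm(n, omit_norms=True)
    print(f"{title!r}: tokens={n} with_norms={with_norms:.3f} without_norms={without_norms:.3f}")
```

With norms, the two-token title gets the larger factor, so the exact match scores higher. With norms omitted, both titles get the same factor and nothing separates them on this field.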

KinoSearch adopted a default tf() truncation scheme where all fields were normalized as if they contained a minimum of 100 tokens. That achieved the desired outcome of stopping very short documents from scoring inappropriately high, but even with a boost assigned to a title field, I've found that I can't get really good IR precision without going back to a non-truncating tf() for title.
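
The effect of that 100-token floor on short titles can be sketched as follows. This is not KinoSearch's actual code: it assumes a 1/sqrt(length) normalization with the length clamped to a minimum of 100, and the names MIN_TOKENS, floored_norm and plain_norm are invented for the example.

```python
import math

MIN_TOKENS = 100  # assumed floor, taken from the "minimum of 100 tokens" described above


def floored_norm(num_tokens: int, min_tokens: int = MIN_TOKENS) -> float:
    """Hypothetical: normalize as if the field had at least min_tokens tokens."""
    return 1.0 / math.sqrt(max(num_tokens, min_tokens))


def plain_norm(num_tokens: int) -> float:
    """Hypothetical: ordinary 1/sqrt(length) normalization with no floor."""
    return 1.0 / math.sqrt(num_tokens)


titles = ["Duke Ellington", "Duke Ellington meets Count Basie"]

for title in titles:
    n = len(title.split())  # naive whitespace tokenization, for illustration only
    print(f"{title!r}: floored={floored_norm(n):.3f} plain={plain_norm(n):.3f}")
```

Under the floor, every field shorter than 100 tokens gets the same factor, which stops very short documents from scoring too high but also erases the advantage of the exact title match. A field boost cannot restore it, because the same boost multiplies both titles equally.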

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

