If repeated terms were to be scored, I suspect the repetitions would have to score lower than the first occurrence. Otherwise f2 could be selected as a better fragment than f3 for the query q1 in your example, even though f3 matches both query terms. Each repetition of a term in a fragment could be scored as a very small fraction of the score given to its first occurrence. This would at least rank f2 higher than f1 for query q2. Another potentially useful ranking factor may be to boost fragments found at the beginning of a document, since that's where people tend to write summaries or introductions.
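
To make the idea concrete, here is a rough, standalone sketch of damped repetition scoring, applied to the fragments f1, f2, f3 and queries q1, q2 from the quoted message. It does not use the highlighter classes at all: DampedFragmentScorer, score, bestFragment and the repeatWeight parameter are names made up for illustration, and the weight of 1.0 for a first occurrence is an assumption, not what QueryScorer actually does.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Lucene highlighter.
class DampedFragmentScorer {

    // The first occurrence of a query term scores 1.0 (assumed weight);
    // every further occurrence of the same term scores repeatWeight.
    static double score(List<String> fragmentTokens, Set<String> queryTerms,
                        double repeatWeight) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String token : fragmentTokens) {
            if (!queryTerms.contains(token)) {
                continue; // not a query term, contributes nothing
            }
            // Set.add returns true only the first time the term is seen.
            total += seen.add(token) ? 1.0 : repeatWeight;
        }
        return total;
    }

    // Returns the index of the highest-scoring fragment.
    // On a tie the earliest fragment wins.
    static int bestFragment(List<List<String>> fragments, Set<String> queryTerms,
                            double repeatWeight) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < fragments.size(); i++) {
            double s = score(fragments.get(i), queryTerms, repeatWeight);
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }
}
```

With a small repeatWeight such as 0.01, q1 still selects f3 (2.0 against 1.01 for f2), while q2 now selects f2 (1.01 against 1.0 for f1). Setting repeatWeight to 0 gives the unique-terms-only counting, where f1 and f2 tie for q2 and the earlier fragment wins. Setting it to 1.0 makes f2 and f3 tie for q1, which is the problem described above.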

Doron Cohen wrote:
This question was raised in the user's list -
http://www.nabble.com/highlighting-tf2322109.html

Assume three fragments and two queries:
  f1 = aa  11  bb  33  cc
  f2 = aa  11  bb  11  cc
  f3 = aa  11  bb  22  cc
  q1 = 11 22
  q2 = 11
Now we call highlighter.getBestFragment(q);
For q1, f3 is returned, as expected.
For q2, f1 is returned, although "11" appears twice in f2 but only once in
f1.

This is because QueryScorer.getTokenScore(Token) counts only unique
fragment tokens.

Would it make sense to make this behavior controllable?
(It is easily done but I am not sure about the consequences.)

Or perhaps there is a way to achieve this behavior (preferring f2 over f1 for
q2 above) that I missed?



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






                