[Discussion moved to htdig3-dev as I paste in code...]

>Please do so. Perhaps one could change it so something like (pseudo-code):
>
>new_weight = weight1 + weight2
>
>if( weight1 > 0 && weight2 > 0 ) {
>       new_weight = new_weight * two_words_matched_factor
>}

Hmm. That's an interesting idea...

Here's the relevant code in htsearch/parser.cc (line 376 on):
   if (dm2)
   {
       //
       // Duplicate document.  We just need to add the scores together
       //
       dm2->score += dm->score;
       if (dm->anchor < dm2->anchor)
           dm2->anchor = dm->anchor;
   }
   else
   {
       dm2 = new DocMatch;
       dm2->score = dm->score;
       dm2->id = dm->id;
       dm2->anchor = dm->anchor;
       result->add(dm2);
   }

So you see that it *already* sums the weights of the documents when it does
an *or*...
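
To make the suggestion above concrete, here is a minimal standalone sketch of how the duplicate-document branch might apply such a factor. This is not the actual parser.cc code: DocMatch is cut down to the three fields used above, a std::map stands in for the real result list, and two_words_matched_factor (with its value of 2.0) is a hypothetical knob taken from the pseudo-code, not an existing htsearch attribute.

```cpp
#include <cassert>
#include <map>

// Simplified stand-in for htsearch's DocMatch (assumption: only these fields matter here).
struct DocMatch
{
    int    id;
    double score;
    int    anchor;
};

// Hypothetical factor from the quoted pseudo-code; 2.0 is an arbitrary example value.
const double two_words_matched_factor = 2.0;

// OR-merge one match into the result set, keyed by document id.
void orMerge(std::map<int, DocMatch> &result, const DocMatch &dm,
             double factor = two_words_matched_factor)
{
    // A factor below 1 would penalise documents that match several words.
    assert(factor >= 1.0);

    std::map<int, DocMatch>::iterator it = result.find(dm.id);
    if (it != result.end())
    {
        //
        // Duplicate document.  Add the scores together as before, then
        // boost the sum because both sides of the *or* matched.
        //
        DocMatch &dm2 = it->second;
        bool both_matched = (dm2.score > 0 && dm.score > 0);
        dm2.score += dm.score;
        if (both_matched)
            dm2.score *= factor;
        if (dm.anchor < dm2.anchor)
            dm2.anchor = dm.anchor;
    }
    else
    {
        result[dm.id] = dm;
    }
}
```

One thing to watch with this shape: for a three-word query the factor gets applied once per merge, so it compounds. That may or may not be what you want.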

>I now go back and search for the same words with "Match: Any". I get 141
>matches. Document a is now at position 19 in the ranking order, document
>b is at 28.
>
>To my mind, best behaviour would be: 141 matches but with documents a and b
>at the top.

True, but here's a question: what are the actual scores from the $(SCORE)
variable for the two searches? They *should* be the same, from what I see
in the code. If not, there's a bug in there somewhere...

If *so*, try fiddling around with the line after the duplicate document
comment... Also look at the $(DOCID) of the documents that come in front of
documents a and b. Then grep through the db.wordlist file for the search
words and those DocIDs. How do those weights (last column, e.g. w:1000)
compare to documents a and b?

In other words, why might those documents have higher weights? My guess is
that a and b have all the words you mentioned. But other documents might
have higher weights for "lyx" or "search replace" or whatever...
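
Here is a toy illustration of that guess, with made-up weights (not taken from any real db.wordlist). Scored, orScore and rankByOr are hypothetical names for this sketch only. With plain summing, a document that matches just one word heavily can still land ahead of documents that match both words lightly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Per-word weights for one document (hypothetical numbers, two search words).
struct Scored
{
    std::string name;
    int lyx;     // weight for the first word, 0 if the word is absent
    int search;  // weight for the second word, 0 if the word is absent
};

// Plain *or* score: just the sum of the per-word weights.
int orScore(const Scored &d)
{
    assert(d.lyx >= 0 && d.search >= 0);
    return d.lyx + d.search;
}

// Ordering used for ranking: higher summed score first.
bool higherScore(const Scored &x, const Scored &y)
{
    return orScore(x) > orScore(y);
}

// Return document names in ranking order under plain summing.
std::vector<std::string> rankByOr(std::vector<Scored> docs)
{
    std::sort(docs.begin(), docs.end(), higherScore);
    std::vector<std::string> names;
    for (std::size_t i = 0; i < docs.size(); ++i)
        names.push_back(docs[i].name);
    return names;
}
```

If document a has weights 300 and 200 while some document x has 5000 for one word and nothing for the other, x sorts first even though it matches only one word.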

-Geoff

