Re: text format and scoring

petite_abeille Sat, 03 Aug 2002 14:24:40 -0700

Hi Alex,

On Saturday, August 3, 2002, at 11:13, Alex Murzaku wrote:


> Hi PA! How are things going?

Doing all right :-)

>
> It's an interesting question but I don't think Lucene
> (as it is today) could change weights based on
> semantics (either assigned by formatting tags or maybe
> looked up in some dictionary like WordNet)...

Ummm... I see.

>
> Some time ago, Doug sent to this list the formula for
> the score computation which is:

Thanks.


> The only thing that counts is the frequency of the
> terms in the document and among documents.
>
> A way to influence the final score might be to tweak
> the real frequencies during indexing with some
> parameters configured externally. Let's say if the
> word is underlined then multiply its count by X. This
> modified TF should influence the final score
> accordingly.
>
> Just a thought...
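
The quoted idea of scaling a term's count by a formatting-dependent
factor could be sketched roughly as follows. This is plain Python, not
Lucene code: the STYLE_WEIGHTS table, the (term, style) token shape and
the function name are all assumptions made up for illustration, and the
factor 3.0 simply stands in for the "X" mentioned above.

```python
from collections import Counter

# Hypothetical multipliers per formatting style (illustrative values only).
STYLE_WEIGHTS = {"plain": 1.0, "underlined": 3.0}


def weighted_term_frequencies(tokens):
    """Return a term -> weighted count mapping.

    `tokens` is an iterable of (term, style) pairs. Each occurrence adds
    the weight of its style instead of 1, so an emphasized word counts
    as several plain occurrences. Unknown styles count as 1.
    """
    tf = Counter()
    for term, style in tokens:
        tf[term.lower()] += STYLE_WEIGHTS.get(style, 1.0)
    return dict(tf)


if __name__ == "__main__":
    sample = [("Lucene", "underlined"), ("lucene", "plain"), ("score", "plain")]
    print(weighted_term_frequencies(sample))
```

The weighted counts would then take the place of the raw term
frequencies at indexing time; how to get them into an actual index is
left open here.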

I see. That is basically what I'm doing right now: I index a 
document multiple times (e.g. an email could be indexed by subject, first 
sentence and body content). Then I run multiple searches and use a 
"ranking comparator" to evaluate the results based on how many times I get 
a specific document, plus its Lucene scores and other funky heuristics. 
This seems to work OK, but is kind of cumbersome :-( Same deal for 
finding "related" documents. Lucene is very good at finding "similar" 
documents, but for "related" ones (think "cluster" ;-), I basically end up 
doing some term categorization and assigning a multiplying factor to 
each term category, which I then feed to Lucene to get something more 
akin to a "cluster" of documents...
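
To make the "ranking comparator" part a bit more concrete, here is a
minimal sketch of merging several per-field result lists into one
ranking. It is plain Python rather than Lucene code, and everything in
it is an assumption for illustration: the FIELD_WEIGHTS values, the
hit_bonus parameter and the (doc_id, score) result shape are invented,
and the real heuristics are certainly funkier.

```python
from collections import defaultdict

# Hypothetical per-field multipliers (illustrative values only).
FIELD_WEIGHTS = {"subject": 3.0, "first_sentence": 2.0, "body": 1.0}


def merge_results(results_by_field, hit_bonus=0.5):
    """Combine per-field search results into one ranked list of doc ids.

    `results_by_field` maps a field name to a list of (doc_id, score)
    pairs, one list per search. A document's combined score is the sum
    of its field-weighted scores, plus `hit_bonus` for every additional
    result list it appears in. Ties are broken by doc id.
    """
    totals = defaultdict(float)
    hits = defaultdict(int)
    for field, results in results_by_field.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for doc_id, score in results:
            totals[doc_id] += weight * score
            hits[doc_id] += 1

    def combined(doc_id):
        return totals[doc_id] + hit_bonus * (hits[doc_id] - 1)

    return sorted(totals, key=lambda d: (-combined(d), d))


if __name__ == "__main__":
    results = {
        "subject": [("mail-1", 0.5)],
        "body": [("mail-1", 0.2), ("mail-2", 0.9)],
    }
    print(merge_results(results))
```

Here a document found by both the subject and the body search outranks
one with a higher score from the body search alone, which is the effect
the comparator is after.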

In any case, I was simply wondering if there was a more straightforward 
way of doing things.

Cheers,

PA.

