Hi Alex, On Saturday, August 3, 2002, at 11:13 , Alex Murzaku wrote:
> Hi PA! How are things going? Doing all right :-) > > It's an interesting question but I don't think Lucene > (as it is today) could change weights based on > semantics (either assigned by formatting tags or maybe > looked up in some dictionary like WordNet)... Ummm... I see. > > Some time ago, Doug sent to this list the formula for > the score computation which is: Thanks. > The only thing that counts is the frequency of the > terms in the document and among documents. > > A way to influence the final score might be to tweak > the real frequencies during indexing with some > parameters configured externally. Let's say if the > word is underlined then multiply its count by X. This > modified TF should influence the final score > accordingly. > > Just a thought... I see. That's what I'm basically doing right now somehow: I index a document multiple time (eg an email could be indexed by subject, first sentence and body content). Then I do multiple searches. And use a "ranking comparator" to evaluate the result based on how many time I get a specific document plus its Lucene scores and other funky heuristics. Which seems to work ok, but is kind of cumbersome :-( Same deal for finding "related" document. Lucene is very good for finding "similar" document, but for "related" (think "cluster" ;-), I basically end up doing some term categorization and assign some multiplying factor for each term category. Which then I feed to Lucene to get something more akin to a "cluster" of document... In any case, I was simply wandering if there was a more straightforward way of doing things. Cheers, PA. -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>