Hi PA! How are things going?

It's an interesting question, but I don't think Lucene (as it is today) can change weights based on semantics (whether assigned by formatting tags or looked up in some dictionary like WordNet)...
Some time ago, Doug sent to this list the formula for the score computation, which is:

  score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d

where:

  score_d   : score for document d
  sum_t     : sum for all terms t
  tf_q      : the square root of the frequency of t in the query
  tf_d      : the square root of the frequency of t in d
  idf_t     : log(numDocs/docFreq_t+1) + 1.0
  numDocs   : number of documents in index
  docFreq_t : number of documents containing t
  norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
  norm_d_t  : square root of number of tokens in d in the same field as t
  boost_t   : the user-specified boost for term t
  coord_q_d : number of terms in both query and document / number of terms in query

The only things that count are the frequency of the terms in the document and their frequency among documents. A way to influence the final score might be to tweak the real frequencies during indexing with some parameters configured externally. Let's say that if a word is underlined, you multiply its count by X. This modified TF should then influence the final score accordingly.

Just a thought...

Alex

--- petite_abeille <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I was wondering what would be a good way to incorporate text format
> information in Lucene word/document scoring. For example, when turning
> HTML into plain text for indexing purposes, a lot of potentially useful
> information is lost: e.g. tags like <bold>, <strong> and so on could be
> understood as conveying emphasis information about some words. If
> somebody took the pain to "underline" some words, why throw it away?
> Assuming there is some interesting meaning in a document format/layout,
> and a way to understand it and weight it, how could one incorporate this
> information into document scoring?
>
> Thanks for any insights :-)
>
> PA.
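
P.S. To make the tweaked-TF idea concrete, here is a rough sketch in Python. It is not Lucene code and uses no Lucene API; the function names (idf, weighted_freq, score) and the emphasis factor are made up for illustration. It assumes a single field, reads the idf term as log(numDocs / (docFreq_t + 1)) + 1.0, and leaves the document's token count unchanged when a term's count is inflated.

```python
import math


def idf(num_docs, doc_freq):
    # Assumption: the posted "log(numDocs/docFreq_t+1) + 1.0" is read as
    # log(numDocs / (docFreq_t + 1)) + 1.0.
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def weighted_freq(occurrences, emphasis_factor):
    """Count a term's occurrences, multiplying emphasized ones by X.

    occurrences: one boolean per occurrence, True if it was emphasized
    (e.g. underlined). emphasis_factor is the hypothetical X.
    """
    return sum(emphasis_factor if emphasized else 1.0 for emphasized in occurrences)


def score(query_freqs, doc_freqs, doc_length, num_docs, doc_freq_of, boosts=None):
    """Evaluate the posted formula for one document (single field assumed).

    query_freqs: term -> frequency in the query
    doc_freqs:   term -> (possibly tweaked) frequency in the document
    doc_length:  number of tokens in the document's field
    doc_freq_of: term -> number of documents containing the term
    """
    boosts = boosts or {}
    # Per-term query weight tf_q * idf_t, and norm_q = sqrt(sum_t((tf_q * idf_t)^2)).
    q_weights = {
        t: math.sqrt(f) * idf(num_docs, doc_freq_of[t]) for t, f in query_freqs.items()
    }
    norm_q = math.sqrt(sum(w * w for w in q_weights.values()))
    norm_d = math.sqrt(doc_length)  # norm_d_t with a single field

    total = 0.0
    matched = 0
    for t, w_q in q_weights.items():
        f_d = doc_freqs.get(t, 0.0)
        if f_d <= 0:
            continue  # term absent from the document contributes nothing
        matched += 1
        tf_d = math.sqrt(f_d)
        idf_t = idf(num_docs, doc_freq_of[t])
        total += (w_q / norm_q) * (tf_d * idf_t / norm_d) * boosts.get(t, 1.0)

    coord = matched / len(query_freqs)  # coord_q_d
    return total * coord


# Two otherwise identical documents: "lucene" occurs once in each,
# but in the second one the occurrence is underlined and X = 4.
X = 4.0
common = dict(
    query_freqs={"lucene": 1},
    doc_length=100,
    num_docs=10,
    doc_freq_of={"lucene": 2},
)
plain = score(doc_freqs={"lucene": weighted_freq([False], X)}, **common)
emphasized = score(doc_freqs={"lucene": weighted_freq([True], X)}, **common)
print(plain, emphasized)
```

Since tf_d is the square root of the frequency, multiplying a count by X only scales that term's contribution by sqrt(X), so X would need to be chosen with that damping in mind.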