multiple tokens at the same position

Enis Soztutar Fri, 25 May 2007 07:44:23 -0700

Hi,

In Nutch we have a use case in which we need to store tokens with their original text plus their stemmed form plus their canonical form (through some asciification). From my understanding of Lucene, it makes sense to write a TokenStream which generates several tokens for each "word", but places all the tokens for the "word" at the same position with Token#setPositionIncrement(0). This way we would be able to search over this field using any form (stemmed, canonical, original) of the "word".

Actually I have two questions here. First, is there any way to avoid matching stemmed or canonical forms in a phrase query? Second, it seems that adding multiple forms of the "word"s alters the statistical calculations for scoring, especially for tf and idf, because the frequency of the root form of the word is incremented at each word with that root form. Is there any way that we could avoid this?
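
A rough sketch of the stacking idea, in plain Java rather than the actual Lucene API. The Tok class and the stem and canonical helpers are hypothetical stand-ins (a toy plural stripper and a diacritic remover), assumed here only for illustration. The point is that the original form advances the position by 1, while the extra forms carry a position increment of 0 and so land on the same position.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class MultiFormTokens {
    // Hypothetical stand-in for a token: its text plus a position increment.
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Toy stemmer (assumption): strips one trailing "s" from words longer than 3 chars.
    static String stem(String word) {
        return (word.length() > 3 && word.endsWith("s"))
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // Toy canonicalizer (assumption): removes diacritics and lowercases.
    static String canonical(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Emits the original form with increment 1, then any distinct
    // canonical and stemmed forms with increment 0 (same position).
    static List<Tok> expand(String[] words) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(w, 1));
            String c = canonical(w);
            if (!c.equals(w)) {
                out.add(new Tok(c, 0));
            }
            String s = stem(c);
            if (!s.equals(w) && !s.equals(c)) {
                out.add(new Tok(s, 0));
            }
        }
        return out;
    }

    // Resolves increments to absolute positions; the first token lands on 0.
    static List<Integer> positions(List<Tok> toks) {
        List<Integer> result = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.positionIncrement;
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tok> toks = expand(new String[] {"Caf\u00e9s", "open"});
        List<Integer> pos = positions(toks);
        for (int i = 0; i < toks.size(); i++) {
            System.out.println(pos.get(i) + "\t" + toks.get(i).text);
        }
    }
}
```

Every form emitted here becomes an indexed term at its word's position, which is exactly why both questions arise: a phrase query sees the extra forms at that position, and each one adds to the term counts used for scoring.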