Hi list,

I’m trying to figure out how customizable scoring and weighting are in the Lucene API. I have read about the APIs but still can’t figure out whether the following is possible.
I would like to do normal document text indexing, but I would like to control the weight given to each token myself. I would also like to control the weighting of query tokens and how everything is combined into the score. When indexing a word I would like to attach my own weight to it, and use these weights when querying for documents. For example:

Doc 1
Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99) API(0.3)

Doc 2
Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)

The floats in parentheses are values I would like to supply during the indexing process, not something computed by Lucene such as tf-idf. When querying I would like to do the same: create the weight for each query term myself and control how the final document score is calculated.

I have read that it’s possible to attach your own custom attributes to tokens. Is this the way to go? I.e., should I add my custom weights as attributes on the tokens, and then access these attributes when calculating the document score at search time (described here https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/package-summary.html under “adding a custom attribute”)?

The reason I’m asking is that I can’t find any examples of this being done anywhere. I did, however, find someone stating “With Lucene, it is impossible to increase or decrease the weight of individual terms in a document”.

With regards,
Rune
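
P.S. To make the idea concrete, here is a minimal sketch in plain Java (no Lucene calls at all) of the kind of scoring I have in mind. It assumes, purely for illustration, that the document score is the sum over query terms of my query weight times my index-time weight; the class name `WeightedScoreSketch`, the `score` method and the query weights are hypothetical and not part of any Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

class WeightedScoreSketch {

    // Assumed scoring rule (not Lucene's): for every query term that occurs
    // in the document, add queryWeight * indexTimeWeight to the score.
    static double score(Map<String, Double> docWeights, Map<String, Double> queryWeights) {
        double total = 0.0;
        for (Map.Entry<String, Double> q : queryWeights.entrySet()) {
            Double docWeight = docWeights.get(q.getKey());
            if (docWeight != null) {
                total += q.getValue() * docWeight;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Index-time weights taken from the two example documents above
        // (zero-weight tokens are simply left out).
        Map<String, Double> doc1 = new HashMap<>();
        doc1.put("lucene", 0.7);
        doc1.put("powerful", 0.9);
        doc1.put("indexing", 0.62);
        doc1.put("search", 0.99);
        doc1.put("api", 0.3);

        Map<String, Double> doc2 = new HashMap<>();
        doc2.put("lucene", 0.5);
        doc2.put("people", 0.1);

        // Hypothetical query-time weights, chosen only for this example.
        Map<String, Double> query = new HashMap<>();
        query.put("lucene", 1.0);
        query.put("search", 0.5);

        System.out.println("Doc 1: " + score(doc1, query));
        System.out.println("Doc 2: " + score(doc2, query));
    }
}
```

With these made-up query weights Doc 1 scores about 1.195 (0.7 * 1.0 + 0.99 * 0.5) and Doc 2 scores 0.5, so Doc 1 would rank first. What I can’t work out is where in Lucene the index-time weights would be stored and where a rule like this would plug in.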