Hi Mike,

Thank you for your reply. Yes, I had thought of this, but it does not solve my problem: the term frequency, and therefore the results, will still be wrong, because prepending or appending a string to a term still makes it a different term.
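To illustrate the point, here is a minimal sketch in plain Python. It is not Lucene code, and the sentences, class labels and the underscore prefix scheme are all made up for the example. It only shows that once a class label is attached to each term, the counts for what was one term are split across several distinct terms.

```python
from collections import Counter


def term_frequencies(tokens):
    """Count raw occurrences of each distinct term."""
    return Counter(tokens)


# Hypothetical sentences, each labelled with a hypothetical class name.
labelled_sentences = [
    ("background", "lucene scores documents"),
    ("method", "lucene indexes terms"),
    ("result", "lucene ranks results"),
]

# Unprefixed terms: every occurrence of "lucene" counts toward the same term.
plain_tokens = [
    token
    for _, text in labelled_sentences
    for token in text.split()
]
plain_tf = term_frequencies(plain_tokens)

# Class-prefixed terms (assumed scheme: "<class>_<term>"):
# each prefixed variant is a different term with its own count.
prefixed_tokens = [
    f"{label}_{token}"
    for label, text in labelled_sentences
    for token in text.split()
]
prefixed_tf = term_frequencies(prefixed_tokens)

print(plain_tf["lucene"])            # 3
print(prefixed_tf["lucene"])         # 0
print(prefixed_tf["method_lucene"])  # 1
```

A query for the bare term no longer sees any of the three occurrences, and a query for one prefixed variant sees only its own share, so any score built on term frequency changes.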
Similarly, I could use regex queries, but again that doesn't fix the TF issue. I am not speaking hypothetically here: I have experimental evidence that this doesn't work (i.e. the precision for my task goes down in my experiments).

Also, I agree that when your fields are essentially different, as in /title/, /author/ and /text/, normalizing by field length makes sense. In my case, however, I have many fields, and they are all chunks of a larger text (extracted sentences that have been labelled with a number of different classes). In the experiments I am running, I am trying to establish whether weighting sentences in different classes differently will lead to increased relevance of results.

This also doesn't change the fact that the documentation is wrong! Any ideas on how to fix this?

Daniel