Hi Mike,

Thank you for your reply. Yes, I had thought of this, but it does not solve
my problem: the term frequencies, and therefore the results, will still be
wrong, because prepending or appending a string to a term turns it into a
different term.
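To make the TF point concrete, here is a minimal plain-Java sketch. It has no
Lucene dependency, and the tokens and the `label_` prefixing scheme are
hypothetical. Because term frequency is counted per distinct term string, tagging a word
with its sentence class splits one term's count across several unrelated terms.
Document frequency, and therefore IDF, is also computed per term, so it is split
the same way.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PrefixedTermFrequency {

    // Raw term frequency: how often each distinct token string occurs.
    static Map<String, Integer> termFrequencies(String[] tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Hypothetical tagging scheme: prepend the sentence class to every token.
    static String[] prefix(String label, String[] tokens) {
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = label + "_" + tokens[i];
        }
        return out;
    }

    // Concatenates two token arrays into one "document".
    static String[] concat(String[] a, String[] b) {
        String[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        String[] methods = {"lucene", "scoring", "lucene"};
        String[] results = {"lucene", "precision"};

        // Untagged: both sentences feed the same term.
        Map<String, Integer> plain = termFrequencies(concat(methods, results));
        System.out.println(plain.get("lucene"));        // 3

        // Tagged: the same word becomes two unrelated terms.
        Map<String, Integer> tagged = termFrequencies(
                concat(prefix("methods", methods), prefix("results", results)));
        System.out.println(tagged.get("methods_lucene")); // 2
        System.out.println(tagged.get("results_lucene")); // 1
        System.out.println(tagged.get("lucene"));         // null
    }
}
```

No single term in the tagged index carries the combined count of 3. A query for
the plain word would therefore need every prefixed variant, and each variant would
still be scored with its own TF and IDF.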

Similarly, I could use regex queries, but again that doesn't fix the TF
issue. I am not speaking hypothetically here: I have experimental evidence
that this doesn't work (i.e. the precision for my task goes down in my
experiments).

Also, I agree that when your fields are genuinely different, as with
/title/, /author/ and /text/, normalizing by field length makes sense. In my
case, however, I have many fields, and they are all chunks of a larger text
(extracted sentences that have been labelled with a number of different
classes). In the experiments I am running, I am trying to establish whether
weighting sentences in different classes differently leads to more relevant
results.
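A rough plain-Java sketch of why per-field length normalization distorts things
when the fields are just sentence chunks. It assumes a simplified classic
TF-IDF weight of sqrt(tf) * 1/sqrt(fieldLength) per field, and deliberately leaves
out idf, boosts, coord, queryNorm and Lucene's lossy one-byte norm encoding. The
sentence lengths are hypothetical.

```java
public class FieldLengthNormSketch {

    // Simplified classic-similarity weight of one term in one field:
    // sqrt(tf) * 1/sqrt(fieldLength). Omits idf, boosts, coord, queryNorm
    // and the single-byte encoding Lucene applies to stored norms.
    static double fieldWeight(int tf, int fieldLength) {
        if (tf == 0 || fieldLength == 0) return 0.0;
        return Math.sqrt(tf) / Math.sqrt(fieldLength);
    }

    // Sum of per-field weights when each sentence is indexed as its own field.
    static double splitScore(int[] tfPerField, int[] lengthPerField) {
        double sum = 0.0;
        for (int i = 0; i < tfPerField.length; i++) {
            sum += fieldWeight(tfPerField[i], lengthPerField[i]);
        }
        return sum;
    }

    // Weight when the same sentences are concatenated into a single field.
    static double mergedScore(int[] tfPerField, int[] lengthPerField) {
        int tf = 0;
        int length = 0;
        for (int i = 0; i < tfPerField.length; i++) {
            tf += tfPerField[i];
            length += lengthPerField[i];
        }
        return fieldWeight(tf, length);
    }

    public static void main(String[] args) {
        // Hypothetical document: three sentences of 4, 16 and 25 tokens,
        // with the query term appearing once in each.
        int[] tf = {1, 1, 1};
        int[] len = {4, 16, 25};
        System.out.println("split:  " + splitScore(tf, len));
        System.out.println("merged: " + mergedScore(tf, len));
    }
}
```

Here the split score is 1/2 + 1/4 + 1/5, roughly 0.95, while the merged score is
sqrt(3/45), roughly 0.26. The shortest sentence alone contributes more than half
of the split score, purely because of its length. Any per-class weighting I apply
is then layered on top of that length effect rather than isolated from it.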

This also doesn't change the fact that the documentation is wrong! Any ideas
on how to fix this?
Daniel



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Similarity-formula-documentation-is-misleading-how-to-make-field-agnostic-queries-tp4179307p4179834.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
