Hey Hoss, thanks for your reply. I thought I had solved the issue: according to Uwe, the scores of queries without the coord function are reasonably comparable. But now you have actually reopened it.
So, I need to be sure I'm making them comparable, and I would like to ask the following.

My BooleanQueries all have a similar structure. Important: they contain only TermQueries. There are always 3 fields, but the number of terms can vary. This is an example of a BooleanQuery (sorry for the syntax):

field1:term1, SHOULD
field1:term2, SHOULD
field2:term1, SHOULD
field2:term2, SHOULD
field2:term3, SHOULD
field3:term1, SHOULD
...

If it is not clear what the BooleanQueries look like, I can print some of them for you. They have the same number of fields but different numbers of terms.

1- Do you still think QueryNorm is not an issue? Funny, because in the documentation I can read:

"QueryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable."

From the documentation, it seems I can compare queries.

2- I don't think I'm using query boosts. Are they enabled by default in BooleanQuery?

3- FieldNorm is not mentioned in the Similarity class. How can I disable it? Should I disable it? Is it an issue?

4- If I'm not wrong, Uwe told me I can compute comparable cosine similarities even with documents of different lengths. Tf and idf are unbounded, and my docs have different lengths. Can't I measure the similarity between the query and doc vectors anyway?

5- Again, I've been told I can compare queries, and from the documentation I can see that the queryNorm factor normalizes all queries. But you are saying I should manually normalize them somehow? It is not clear.

thanks
Patrick

> querynorm shouldn't be a problem (since your booleanqueries all have the
> same structure, and don't use query boosts ... i assume) but field norm
> might be; i also don't see anything mentioned so far in this thread that
> describes how you'll work around the tf and idf values being theoretically
> unbounded (unless your docs are all of identical length)
>
> ultimately, attempts at comparing scores across different searches all
> come down to normalizing (either explicitly or implicitly) and normalizing
> requires that you have a "max possible score" you can normalize relative
> to -- not just a "max score for the index", but a max score in the scope
> of all theoretical documents (because otherwise the comparison isn't fair
> given an arbitrary corpus)
>
> with the default similarity, you can't really define a "max possible
> score" for a given query because tf and idf are not bounded functions.
>
> There have been a few nice discussions about this general concept over the
> years, here's the first one i found doing a quick search...
>
> http://www.gossamer-threads.com/lists/lucene/java-user/61075
>
> -Hoss
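
To make the points about queryNorm and unbounded tf/idf concrete, here is a minimal sketch of the classic formulas as I understand them from the default similarity documentation: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sumOfSquaredWeights). It is plain Java with no Lucene classes. The class and method names are hypothetical, all boosts are assumed to be 1.0, and field norms and coord are left out entirely, so treat it as an illustration rather than what Lucene actually executes.

```java
// Plain-Java sketch of the documented classic TF-IDF factors.
// Hypothetical names; no Lucene dependency; all boosts assumed to be 1.0.
final class ClassicScoreSketch {

    /** tf(t in d) = sqrt(freq): grows without bound as freq grows. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** idf(t) = 1 + ln(numDocs / (docFreq + 1)): grows with index size. */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** queryNorm(q) = 1 / sqrt(sum of squared idf weights of the query terms). */
    static double queryNorm(double[] termIdfs) {
        double sumOfSquaredWeights = 0.0;
        for (double w : termIdfs) {
            sumOfSquaredWeights += w * w;
        }
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // Two queries over terms with the same idf but different term counts.
        double sameIdf = idf(10, 1000);
        double[] twoTerms = {sameIdf, sameIdf};
        double[] sixTerms = {sameIdf, sameIdf, sameIdf, sameIdf, sameIdf, sameIdf};
        System.out.printf("queryNorm, 2 terms: %.4f%n", queryNorm(twoTerms));
        System.out.printf("queryNorm, 6 terms: %.4f%n", queryNorm(sixTerms));

        // tf keeps increasing with the term frequency in the document.
        for (int freq : new int[] {1, 100, 10000}) {
            System.out.printf("tf(freq=%d) = %.1f%n", freq, tf(freq));
        }
    }
}
```

Under these assumptions, queryNorm shrinks as the number of terms grows, while tf and idf have no upper limit, which is why no "max possible score" can be fixed in advance for a query.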