> thanks for your reply. I thought I've solved the issue according to Uwe, the
> queries without coord function were reasonably comparable, but now you
> actually reopened it.
>
> So, I need to be sure I'm making them comparable and I would like to ask the
> following.
>
> My BooleanQueries have similar structure. Important: they only contain
> TermQueries. The fields are always 3 but the terms number can vary... this is
> an example of BooleanQuery (sorry for the syntax):
>
> field1:term1, SHOULD
> field1:term2, SHOULD
> field2:term1, SHOULD
> field2:term2, SHOULD
> field2:term3, SHOULD
> field3:term1, SHOULD
> ...
>
> If it is not clear how the BooleanQueries are, I can print some of them for
> you. They have same number of fields but different number of terms.
>
> 1- Do you still think QueryNorm is not an issue ? Funny, because in the
> documentation I can read:
> QueryNorm(q) is a normalizing factor used to make scores between queries
> comparable. This factor does not affect document ranking (since all ranked
> documents are multiplied by the same factor), but rather just attempts to
> make scores from different queries (or even different indexes) comparable.
>
> It seems I can compare queries from the documentation.

But as you are always using the same type of query (TermQuery), the QueryNorm
should not change, so it is no issue at all. It matters if you have a variable
number of Boolean clauses; there the query norm can help you make the queries
comparable. But if you always have the same-looking BQ with exactly the same
number of TQs in it (only different terms), it's not an issue at all. In all
other cases, the query norm helps to compare e.g. a BQ with 5 TQ clauses with
another BQ that has 8 TQ clauses.

> 2- I don't think I'm using queryBoosts, are they enabled by default in the
> BooleanQuery ?

Query boosts are only active if you call TermQuery.setBoost() with a value
other than 1.0f.

> 3- FieldNorm is not mentioned in Similarity class. How can I disable it ?
> SHould I disable it ? Is it a issue ?

FieldNorm should not be a problem, as it's an indexed feature. The same
document always has the same FieldNorm (which is a combination of the length
norm and the index-time document boost). If two queries hit the same document,
the scores for this document should be comparable, as the FieldNorm is the
same in both cases. See point 6) in the Similarity docs: norm(t,d)

> 4- If I'm not wrong Uwe told me I can compute comparable cosine similarities
> even with documents of different length. Tf and Idf are unbounded, and my
> docs have different length. Can't I measure the similarity between query and
> doc vectors anyway ?

The field norm normalizes that. So where is the problem?

> 5 - Again, I've been told I can compare queries and from documentation, I
> can see that queryNorm factor normalizes all queries. But you are saying I
> should manually normalize them somehow ? It is not clear

It only matters when the queries differ in structure (e.g. the number of
Boolean clauses differs, or the types of queries differ).

Uwe
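
To make the point about clause counts concrete, here is a minimal sketch in plain Java (no Lucene dependency). It assumes the classic TF-IDF formula from the Similarity docs, queryNorm(q) = 1 / sqrt(sumOfSquaredWeights), with sumOfSquaredWeights = queryBoost^2 * sum over terms of (idf(t) * termBoost(t))^2. The class QueryNormSketch, its methods, and the idf values are hypothetical and only for illustration; they are not part of the Lucene API.

```java
// Hypothetical helper, not a Lucene class: recomputes queryNorm by hand
// under the assumed classic TF-IDF formula.
final class QueryNormSketch {

    /** Returns 1 / sqrt(queryBoost^2 * sum_i (idf_i * termBoost_i)^2). */
    static double queryNorm(double queryBoost, double[] idfs, double[] termBoosts) {
        if (idfs.length != termBoosts.length) {
            throw new IllegalArgumentException("idfs and termBoosts must have the same length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double weight = idfs[i] * termBoosts[i];
            sumOfSquares += weight * weight;
        }
        return 1.0 / Math.sqrt(queryBoost * queryBoost * sumOfSquares);
    }

    /** Builds an array of n copies of value (made-up idf or boost values). */
    static double[] uniform(int n, double value) {
        double[] out = new double[n];
        java.util.Arrays.fill(out, value);
        return out;
    }

    public static void main(String[] args) {
        // Two BooleanQuery-like term lists: 5 clauses vs. 8 clauses,
        // every term with an invented idf of 2.0 and the default boost 1.0.
        double fiveClauses = queryNorm(1.0, uniform(5, 2.0), uniform(5, 1.0));
        double eightClauses = queryNorm(1.0, uniform(8, 2.0), uniform(8, 1.0));
        System.out.println("queryNorm, 5 clauses: " + fiveClauses);
        System.out.println("queryNorm, 8 clauses: " + eightClauses);
    }
}
```

Under these assumptions the query with more clauses gets the smaller norm, which is how the factor compensates for the extra clauses when scores from the two queries are compared. Leaving all boosts at 1.0 matches the default case where TermQuery.setBoost() is never called.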