RE: comparing lucene scores across queries

Uwe Schindler Tue, 29 Mar 2011 01:49:33 -0700

> thanks for your reply. I thought I've solved the issue according to Uwe,
the
> queries without coord function were reasonably comparable, but now you
> actually reopened it.
> 
> So, I need to be sure I'm making them comparable and I would like to ask
the
> following.
> 
> My BooleanQueries have similar structure. Important: they only contain
> TermQueries. The fields are always 3 but the terms number can vary... this
is
> an example of BooleanQuery (sorry for the syntax):
> 
> field1:term1, SHOULD
> field1:term2, SHOULD
> field2:term1, SHOULD
> field2:term2, SHOULD
> field2:term3, SHOULD
> field3:term1, SHOULD
> ...
> 
> If it is not clear how the BooleanQueries are, I can print some of them
for
> you. They have same number of fields but different number of terms.
> 
> 1- Do you still think QueryNorm is not an issue ? Funny, because in the
> documentation I can read:
> QueryNorm(q) is a normalizing factor used to make scores between queries
> comparable. This factor does not affect document ranking (since all ranked
> documents are multiplied by the same factor), but rather just attempts to
> make scores from different queries (or even different indexes) comparable.
> 
> It seems I can compare queries from the documentation.


But as you are always using the same type of query (TermQuery), the
QueryNorm should not change, so no issue at all. It differs if you have a
variable number of Boolean clauses, the Query norm could help you to make
the queries comparable. But if you only have always the same looking BQ with
exact same number of TQ in it (only different terms) its not an issue at
all. In all other cases, the query norm helps to compare e.g. a BQ with 5 TQ
clauses with another BQ that has 8 TQ clauses.

> 2- I don't think I'm using queryBoosts, are they enabled by default in the
> BooleanQuery ?

Query boost are only active if you do TermQuery.setBoost(anything != 1.0f).

> 3- FieldNorm is not mentioned in Similarity class. How can I disable it ?
> SHould I disable it ? Is it a issue ?

FieldNorm should not be a problem, as it's an indexed feature. So the same
document has always the same FieldNorm (which is a combination of length
norm, indexing document boost). If two queries hit the same document the
scores for this document should be comparable, as the FieldNorm is the same
for both cases.

See point 6) in the Similarity docs: norm(t,d)

> 4-  If I'm not wrong Uwe told me I can compute comparable cosine
similarities
> even with documents of different length. Tf and Idf are unbounded, and my
> docs have different length. Can't I measure the similarity between query
and
> doc vectors anyway ?

The field norm normalizes that. So where is the problem?

> 5 - Again, I've been told I can compare queries and from documentation, I
> can see that queryNorm factor normalizes all queries. But you are saying I
> should manually normalize them somehow ? It is not clear

It only affects different querys (e.g. number of Boolean clauses differ,
type of queries differ).

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

RE: comparing lucene scores across queries

Reply via email to