I have an idea for something I'm calling grouped scoring, and I want to
know if anybody has already done anything like this.
The motivation is that in search results you often want to show only one
or a few items from each group. On google.com, for example, multiple
results from a single site tend to be spread out through the result
list, so that if you search for a (word in a) hostname you don't just
get all the pages from a single domain.
I've used Solr's grouping, and that is OK, but it is an all-or-nothing
kind of thing: all the items in a group end up under a single group
entry in the results.
What I have in mind is to penalize docs when there are higher scoring
docs in the same group, probably by some multiplier on the score. So if
you had results A1, A2, A3 with scores 10, 9, 8 and B1, B2, B3 with
scores 5, 4, 3 (where the groups are A and B), you might prefer to get
A1, B1, A2, B2, A3, B3 instead of A1, A2, A3, B1, B2, B3.
You could do this by multiplying the score of a doc by r^n, where r is a
constant < 1 and n is the number of docs with higher scores in the same
group.
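To make the r^n idea concrete, here is a minimal sketch in plain Java. It is not Lucene or Solr code: the Hit class, the rerank method, and the choice r = 0.4 are all made up for illustration. It sorts by raw score, counts how many higher-scoring hits each group has already produced, applies the multiplier, and re-sorts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupedScoring {

    /** A scored hit. Hypothetical type for this sketch, not a Lucene class. */
    public static final class Hit {
        public final String id;
        public final String group;
        public final double score;

        public Hit(String id, String group, double score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    private static final Comparator<Hit> BY_SCORE_DESC =
        (a, b) -> Double.compare(b.score, a.score);

    /**
     * Returns new hits ordered by score * r^n, where n is the number of
     * higher-scoring hits in the same group.
     */
    public static List<Hit> rerank(List<Hit> hits, double r) {
        List<Hit> byScore = new ArrayList<>(hits);
        byScore.sort(BY_SCORE_DESC);

        Map<String, Integer> seenInGroup = new HashMap<>();
        List<Hit> adjusted = new ArrayList<>();
        for (Hit h : byScore) {
            // merge returns the updated count, so subtract 1 to get n
            int n = seenInGroup.merge(h.group, 1, Integer::sum) - 1;
            adjusted.add(new Hit(h.id, h.group, h.score * Math.pow(r, n)));
        }
        adjusted.sort(BY_SCORE_DESC);
        return adjusted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("A1", "A", 10), new Hit("A2", "A", 9), new Hit("A3", "A", 8),
            new Hit("B1", "B", 5), new Hit("B2", "B", 4), new Hit("B3", "B", 3));
        StringBuilder sb = new StringBuilder();
        for (Hit h : rerank(hits, 0.4)) {
            sb.append(h.id).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "A1 B1 A2 B2 A3 B3"
    }
}
```

With the example scores, the fully interleaved order only appears for r below 0.5; at r = 0.5, A3 and B2 tie at an adjusted score of 2.0.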
An efficient implementation using PriorityQueue should be feasible,
since you don't care about the relative ranking of docs that are not in
the queue.
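One possible reading of the PriorityQueue idea, sketched in plain Java with java.util.PriorityQueue rather than Lucene's own queue: if each group's raw scores are already sorted descending and non-negative, the adjusted scores within a group can only go down, so a merge over the group heads yields the top k without ranking anything past k. The Cursor class, the topK method, and the input layout are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class GroupedTopK {

    /** Position within one group's raw scores (hypothetical helper). */
    static final class Cursor {
        final String group;
        final double[] scores; // raw scores, highest first
        int n = 0;             // hits already emitted from this group

        Cursor(String group, double[] scores) {
            this.group = group;
            this.scores = scores;
        }

        /** Adjusted score of the group's next hit: score * r^n. */
        double adjusted(double r) {
            return scores[n] * Math.pow(r, n);
        }
    }

    /** Returns labels such as "A1" for the top k hits by adjusted score. */
    public static List<String> topK(Map<String, double[]> groups, double r, int k) {
        // Max-heap on the adjusted score of each group's next hit.
        PriorityQueue<Cursor> heads = new PriorityQueue<>(
            (a, b) -> Double.compare(b.adjusted(r), a.adjusted(r)));
        for (Map.Entry<String, double[]> e : groups.entrySet()) {
            if (e.getValue().length > 0) {
                heads.add(new Cursor(e.getKey(), e.getValue()));
            }
        }
        List<String> out = new ArrayList<>();
        while (out.size() < k && !heads.isEmpty()) {
            Cursor c = heads.poll();
            out.add(c.group + (c.n + 1));
            c.n++; // safe to mutate: the cursor is outside the queue here
            if (c.n < c.scores.length) {
                heads.add(c); // re-insert with the next, more penalized hit
            }
        }
        return out;
    }
}
```

This does not cover the streaming case where hits arrive in arbitrary order, as they would in a collector. There, a late high-scoring hit changes n for lower-scoring docs of its group that are already in the queue, so that case needs more bookkeeping than shown here.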
So -- does this already exist somewhere so I can just use it, or do I
get to be first? :)
-Mike