I have an idea for something I'm calling grouped scoring, and I want to know if anybody has already done anything like this.

The idea comes from the problem that in search results you'd often like to show only one, or a small number of, items from each group. For example, on google.com, multiple results from a single site tend to be spread out through the result list, so that if you search for a (word in a) hostname you don't just get all the pages from a single domain.

I've used Solr's grouping, and that is OK, but it is an all-or-nothing kind of thing: you get a single result entry for all the items in a group.

What I have in mind is to penalize docs when there are higher-scoring docs in the same group, probably by some multiplier on the score. So if you had results A1, A2, A3 with scores 10, 9, 8 and B1, B2, B3 with scores 5, 4, 3, you might prefer to get

A1, B1, A2, B2, A3, B3 instead of A1, A2, A3, B1, B2, B3 (where the groups are A and B)

You could do this by multiplying the score of a doc by r^n, where r is a constant < 1 and n is the number of docs with higher scores in the same group. For the example above, any r < 0.5 gives the interleaved order (you need 5 > 9r and 4r > 8r^2).
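
To make the arithmetic concrete, here is a minimal sketch in plain Java of the r^n penalty. It does not use any Lucene or Solr API: the class GroupedScoring, the Hit type, and the rerank and ids methods are names made up for illustration. It assumes non-negative scores, treats ties as "higher" in whatever order the sort leaves them, and simply sorts the full hit list twice rather than using a PriorityQueue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupedScoring {

    /** A scored hit (hypothetical type). 'group' is the grouping key, e.g. a hostname. */
    public static final class Hit {
        final String id;
        final String group;
        final double score;

        public Hit(String id, String group, double score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    /**
     * Returns new hits whose scores are multiplied by r^n, where n is the
     * number of hits in the same group that rank above them by raw score,
     * sorted by the adjusted score. Assumes scores >= 0 and 0 < r <= 1.
     */
    public static List<Hit> rerank(List<Hit> hits, double r) {
        List<Hit> byRawScore = new ArrayList<>(hits);
        byRawScore.sort((a, b) -> Double.compare(b.score, a.score)); // descending

        Map<String, Integer> seenPerGroup = new HashMap<>();
        List<Hit> adjusted = new ArrayList<>();
        for (Hit h : byRawScore) {
            // n = hits of this group already seen, i.e. ranked above this one
            int n = seenPerGroup.merge(h.group, 1, Integer::sum) - 1;
            adjusted.add(new Hit(h.id, h.group, h.score * Math.pow(r, n)));
        }
        adjusted.sort((a, b) -> Double.compare(b.score, a.score)); // descending
        return adjusted;
    }

    /** Convenience helper: the ids of the hits, in order. */
    public static List<String> ids(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("A1", "A", 10), new Hit("A2", "A", 9), new Hit("A3", "A", 8),
            new Hit("B1", "B", 5), new Hit("B2", "B", 4), new Hit("B3", "B", 3));
        // With r = 0.4 the adjusted scores are roughly
        // A: 10, 3.6, 1.28 and B: 5, 1.6, 0.48.
        System.out.println(ids(rerank(hits, 0.4))); // prints [A1, B1, A2, B2, A3, B3]
    }
}
```

With r = 1.0 the same call leaves the original order (A1, A2, A3, B1, B2, B3) unchanged, so r acts as a dial between plain score order and strict interleaving.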

An efficient implementation using a PriorityQueue should be feasible, since you don't care about the relative ranking of docs that are not in the queue.

So -- does this already exist somewhere, so that I can just use it, or do I get to be first? :)

-Mike
