grouped scoring

Michael Sokolov Mon, 07 Apr 2014 10:51:12 -0700

I have an idea for something I'm calling grouped scoring, and I want to know if anybody has already done anything like this.

The idea comes from the problem that in your search results you'd like to show only one or a small number of items from each group: for example on google.com, multiple results from a single site tend to be spread out in the index, so that if you search for a (word in a) hostname, for example, you don't just get all the pages from a single domain.

I've used Solr's grouping, and that is OK, but it is an all-or-nothing kind of thing: you get a single group for all the items in the group.

What I have in mind is to penalize docs when there are higher scoring docs in the same group, probably by some multiplier on the score. So if you had results A1, A2, A3 with scores 10, 9, 8 and B1, B2, B3 with scores 5, 4, 3, you might prefer to get

A1, B1, A2, B2, A3, B3 instead of A1, A2, A3, B1, B2, B3 (where the groups are A and B)

You could do this by multiplying the score of a doc by r^n, where r is a constant < 1 and n is the number of docs with higher scores in the same group.
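To make the arithmetic concrete, here is a minimal sketch of the penalty in plain Java, with no Lucene or Solr APIs involved. The names GroupedScoring, Hit, rerank and ids are hypothetical and exist only for illustration. It assumes positive scores and counts n by rank within the group, so tied scores are penalized in the order they happen to sort.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupedScoring {
    /** A scored hit. Hypothetical holder type, not a Lucene class. */
    static final class Hit {
        final String id;
        final String group;
        final double score;

        Hit(String id, String group, double score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    /** Returns new hits with score * r^n, sorted by the adjusted score, highest first. */
    static List<Hit> rerank(List<Hit> hits, double r) {
        // Sort by raw score, highest first, so each hit's rank within its group is known.
        List<Hit> byScore = new ArrayList<>(hits);
        byScore.sort((a, b) -> Double.compare(b.score, a.score));

        Map<String, Integer> seenPerGroup = new HashMap<>();
        List<Hit> adjusted = new ArrayList<>();
        for (Hit h : byScore) {
            // n = number of hits from the same group that sorted ahead of this one.
            int n = seenPerGroup.merge(h.group, 1, Integer::sum) - 1;
            adjusted.add(new Hit(h.id, h.group, h.score * Math.pow(r, n)));
        }

        // Order by the penalized score.
        adjusted.sort((a, b) -> Double.compare(b.score, a.score));
        return adjusted;
    }

    /** Convenience helper: the ids of the hits, in order. */
    static List<String> ids(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.id);
        }
        return out;
    }
}
```

With r = 0.4 the example scores become 10, 3.6, 1.28 for group A and 5, 1.6, 0.48 for group B, which gives A1, B1, A2, B2, A3, B3. For these particular scores any r below 0.5 produces that interleaving; at r = 0.5, B2 and A3 tie at 2.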

An efficient implementation using PriorityQueue is feasible since you don't care about the relative ranking of docs that are not in the queue.
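One possible reading of the PriorityQueue idea, sketched with java.util.PriorityQueue rather than Lucene's own class, and not necessarily the design I would end up with in a collector. It assumes each group's scores are already available sorted highest first, that scores are positive, and that 0 < r < 1, so the adjusted scores within a group only decrease. The queue then needs just one entry per group to produce the top k. GroupedTopK, Cursor and topK are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class GroupedTopK {
    /** Position within one group's scores (sorted highest first). Hypothetical helper. */
    private static final class Cursor {
        final String group;
        final double[] scores;
        int rank = 0;        // n: hits of this group already emitted
        double factor = 1.0; // r^rank, maintained incrementally

        Cursor(String group, double[] scores) {
            this.group = group;
            this.scores = scores;
        }

        double adjusted() {
            return scores[rank] * factor;
        }
    }

    /** Returns labels such as "A1", "B1" for the k best hits by adjusted score. */
    static List<String> topK(Map<String, double[]> scoresByGroup, double r, int k) {
        // Max-heap on the adjusted score of each group's best remaining hit.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Double.compare(b.adjusted(), a.adjusted()));
        for (Map.Entry<String, double[]> e : scoresByGroup.entrySet()) {
            if (e.getValue().length > 0) {
                heap.add(new Cursor(e.getKey(), e.getValue()));
            }
        }

        List<String> out = new ArrayList<>();
        while (out.size() < k && !heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.group + (c.rank + 1));
            // Advance within the group; the next hit is penalized by one more factor of r.
            c.rank++;
            c.factor *= r;
            if (c.rank < c.scores.length) {
                heap.add(c); // re-insert only after the cursor has been updated
            }
        }
        return out;
    }
}
```

Hits beyond the k that are emitted never have their adjusted scores computed, and the queue holds at most one cursor per group.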

So -- does this already exist somewhere I can just use it, or do I get to be first :)


-Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
