Re: Confidence scores at search time

Grant Ingersoll Mon, 02 Mar 2009 13:22:57 -0800


On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:

Hi Grant,

It's true, I may have an X-Y problem here. =)
My basic need is to sacrifice recall to achieve greater precision. Rather than always presenting the user with the top N documents, I need to return *only* the documents that seem relevant. For some searches this may be 3 documents, for some it may be none.

Therein lies the rub. How are you determining what is relevant? In some sense, you are asking Lucene to determine what is relevant and then turning around and telling it you are not happy with it doing what you told it to do (I'm exaggerating a bit, I know), namely tell you what the relevant documents are for a given query and a set of documents based on its scoring model. As an alternate tack, I usually look at this type of thing and try to figure out a way to make my queries more precise (e.g. replace OR with AND, introduce phrase queries, filter or add NOT clauses or some other qualifiers) or apply some other relevance tricks [1], [2].

That being said, I could see maybe determining a delta value such that if the distance between any two scores is more than the delta, you cut off the rest of the docs. This takes into account the relative state of scores and is not some arbitrary value (although the delta is, of course).
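
A rough sketch of that delta idea, in Python for brevity and working on a plain list of scores rather than any Lucene API. It reads "any two scores" as two neighbouring scores in the ranked list, which is an assumption. The name `cut_at_gap`, the sample scores, and the delta of 1.0 are all made up for illustration:

```python
def cut_at_gap(scores, delta):
    """Return how many top hits to keep.

    `scores` must be sorted from highest to lowest, as a ranked hit list is.
    Hits are kept until the drop between two neighbouring scores exceeds
    `delta`; everything after that drop is discarded.
    """
    if not scores:
        return 0
    for i in range(1, len(scores)):
        if scores[i - 1] - scores[i] > delta:
            return i  # hits from position i onward are cut off
    return len(scores)  # no large gap found, keep everything


# Hypothetical scores for one query, best hit first.
scores = [2.9, 2.7, 2.6, 1.1, 1.0, 0.4]
keep = cut_at_gap(scores, delta=1.0)
print(scores[:keep])
```

Note that this always keeps at least the top hit when there are any hits at all, so on its own it does not cover the "for some it may be none" case.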

Since you are allowing the user to "explore", it may be more reasonable to cut off at some point, too, but I still don't know of a good way to determine what that point is in a generic way. Maybe with some specific knowledge about how you are creating your queries and what query terms matched you could come up with something, but still, I am uncertain.

The other thing that strikes me is that you could add in some type of learning/memory component that tracks your click-through information and gives feedback into the system about relevance.

My user interface in this case isn't the standard "type words in a box and we'll show you the best docs" - I'm using Lucene as a tool in the background to do some exploration about how I could augment a set of traditional results with a few alternative results gleaned from a different path. Not sure if this helps with the X-Y problem, but that's my task at hand.


Yes.

Also, keep in mind there are other techniques for encouraging exploration: clustering, faceting, info extraction (identifying named entities, etc. and presenting them).


Just throwing out some food for thought.

Also, while perusing the threads you refer to below, I saw a reference to the following link, which seems to have gone dead:

 https://issues.apache.org/bugzilla/show_bug.cgi?id=31841

Hmm, Bugzilla has moved to JIRA. I'm not sure where the mapping is anymore. There used to be a Bugzilla Id in JIRA, I think. Sorry.


-Grant

[1] http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/
[2] http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-in-Lucene-and-Solr/

