Hi Grant, It's true, I may have an X-Y problem here. =)
My basic need is to sacrifice recall to achieve greater precision. Rather than always presenting the user with the top N documents, I need to return *only* the documents that seem relevant. For some searches this may be 3 documents, for some it may be none. My user interface in this case isn't the standard "type words in a box and we'll show you the best docs" - I'm using Lucene as a tool in the background to do some exploration about how I could augment a set of traditional results with a few alternative results gleaned from a different path. Not sure if this helps with the X-Y problem, but that's my task at hand. Also, while perusing the threads you refer to below, I saw a reference to the following link, which seems to have gone dead: https://issues.apache.org/bugzilla/show_bug.cgi?id=31841 (in http://www.lucidimagination.com/search/document/1618ce933c8ebd6b ) Has the issue tracker moved somewhere else? Finally, I seem unable to get Searcher.explain() to do much useful - my code looks like: Searcher searcher = new IndexSearcher(reader); QueryParser parser = new QueryParser(LuceneIndex.CONTENT, analyzer); Query query = parser.parse(queryString); TopDocCollector collector = new TopDocCollector(n); searcher.search(query, collector); for ( ScoreDoc d : collector.topDocs().scoreDocs ) { String explanation = searcher.explain(query, d.doc).toString(); Field id = searcher.doc( d.doc ).getField( LuceneIndex.ID ); System.out.println(id + "\t" + d.score + "\t" + explanation); } In the output, I get explanations like "0.88922405 = (MATCH) product of:" with no details. Perhaps I need to do something different in indexing? Thanks, -Ken On 2/26/09 10:36 AM, "Grant Ingersoll" <gsing...@apache.org> wrote: > I don't know of anyone doing work on it in the Lucene community. My > understanding to date is that it is not really worth trying, but that > may in fact be an outdated view. I haven't stayed up on the > literature on this subject, so background info on what you are > interested in would be helpful. > > Digging around in the archives a bit more, I come up with some more > relevant emails: > http://www.lucidimagination.com/search/?q=comparing+scores+across+searches#/ > p:lucene,solr/s:email > > What is the bigger problem that you are trying to solve? That is, you > imply that score comparison is the solution, but you haven't said the > problem you are trying to solve. > > Cheers, > Grant > > > On Feb 25, 2009, at 11:38 AM, Ken Williams wrote: > >> Hi all, >> >> I didn't get a response to this - not sure whether the question was >> ill-posed, or too-frequently-asked, or just not interesting. But if >> anyone >> could take a stab at it or let me know a different place to look, >> I'd really >> appreciate it. >> >> Thanks, >> >> -Ken >> >> >> On 2/20/09 12:00 PM, "Ken Williams" >> <ken.willi...@thomsonreuters.com> wrote: >> >>> Hi, >>> >>> Has there been any work done on getting confidence scores at >>> runtime, so >>> that scores of documents can be compared across queries? I found one >>> reference in the mailing list to some work in 2003, but couldn't >>> find any >>> follow-up: >>> >>> http://osdir.com/ml/jakarta.lucene.user/2003-12/msg00093.html >>> >>> Thanks. >> >> -- >> Ken Williams >> Research Scientist >> The Thomson Reuters Corporation >> Eagan, MN >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> > > -------------------------- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Ken Williams Research Scientist The Thomson Reuters Corporation Eagan, MN --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org