Re: Moving SweetSpotSimilarity out of contrib

Grant Ingersoll Wed, 03 Sep 2008 14:06:00 -0700


On Sep 3, 2008, at 3:00 PM, Michael McCandless wrote:


Obviously we can't default everything perfectly since at some point
there are hard tradeoffs to be made and every app is different, but if
SweetSpotSimilarity really gives better relevance for many/most apps,
and doesn't have any downsides (I haven't looked closely myself), I
think we should get it into core?

Well, we only have 2 data points here: Hoss' original position that
it was helpful, and Doron's Million Query work. Has anyone else
reported benefit? And in that regard, the difference between OOTB and
SweetSpot was 0.154 vs. 0.162 for MAP. Not a huge amount, but still
useful. In that regard, there are other length normalization
functions (namely approaches that don't favor very short documents as
much) that I've seen benefit applications as well, but as Erik is
(in)famous for saying "it depends". In fact, if we go solely based on
the million query work, we'd be better off having the Query Parser
create phrase queries automatically for any query w/ more than 1 term
(0.19 vs 0.154) before we even touch length normalization.
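
To put those numbers side by side, here is a rough Python sketch (not
Lucene code). It assumes the default length norm is 1/sqrt(numTerms)
and that SweetSpotSimilarity uses the plateau formula given in its
javadoc. The function names and the min/max/steepness values (10, 100,
0.5) are made up for illustration and are not recommended settings.

```python
import math


def default_length_norm(num_terms):
    # Assumed default behavior: 1/sqrt(numTerms), so the shortest
    # fields always get the largest norm.
    return 1.0 / math.sqrt(num_terms)


def sweet_spot_length_norm(num_terms, min_len, max_len, steepness):
    # Plateau-style norm: 1.0 anywhere inside [min_len, max_len],
    # falling off on either side at a rate set by steepness.
    distance = (abs(num_terms - min_len) + abs(num_terms - max_len)
                - (max_len - min_len))
    return 1.0 / math.sqrt(steepness * distance + 1.0)


def relative_gain(baseline, candidate):
    # Relative improvement of candidate over baseline, as a fraction.
    return (candidate - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical sweet spot of 10..100 terms with steepness 0.5.
    for n in (5, 10, 50, 100, 400):
        print(n, default_length_norm(n),
              sweet_spot_length_norm(n, 10, 100, 0.5))

    # MAP figures quoted above.
    print(relative_gain(0.154, 0.162))  # OOTB vs. SweetSpot
    print(relative_gain(0.154, 0.19))   # OOTB vs. automatic phrase queries
```

With these made-up settings a 5-term field is no longer scored above a
50-term one, which is the "don't favor very short documents as much"
behavior. On the MAP figures, SweetSpot comes out to roughly a 5%
relative gain over OOTB, and the automatic phrase queries to roughly
23%.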

I've long argued that Lucene needs to take on the relevance question
more head on, and in an open source way. Until then, we are merely
guessing at what's better, w/o empirical evidence that can be easily
reproduced. TREC is just one data point, and is often discounted as
not being all that useful in the real world.

I'm on the fence, though. I agree w/ Hoss that core should be "core"
and I don't think we want to throw more and more into core, but I also
agree w/ Mike in that we want good, intelligent defaults for what we
do have in core.


-Grant

