My thought was to move SSS to core as a step towards making it the default, if and when there is more evidence that it is better than the current default. It just felt right as a cautious step: first move it to core so that it is more exposed and used, and only after a while, maybe, if the evidence is mostly positive, make it the default.

On Thu, Sep 4, 2008 at 12:04 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

>
> On Sep 3, 2008, at 3:00 PM, Michael McCandless wrote:
>
>>
>> Obviously we can't default everything perfectly since at some point
>> there are hard tradeoffs to be made and every app is different, but if
>> SweetSpotSimilarity really gives better relevance for many/most apps,
>> and doesn't have any downsides (I haven't looked closely myself), I
>> think we should get it into core?
>>
>
> Well, we only have 2 data points here: Hoss' original position that it was
> helpful, and Doron's Million Query work. Has anyone else reported benefit?
> And in that regard, the difference between OOTB and SweetSpot was 0.154 vs.
> 0.162 for MAP. Not a huge amount, but still useful. In that regard, there
> are other length normalization functions (namely approaches that don't favor
> very short documents as much) that I've seen benefit applications as well,
> but as Erik is (in)famous for saying "it depends". In fact, if we go solely
> based on the million query work, we'd be better off having the Query Parser
> create phrase queries automatically for any query w/ more than 1 term (0.19
> vs 0.154) before we even touch length normalization.
>
> I've long argued that Lucene needs to take on the relevance question more
> head on, and in an open source way, until then, we are merely guessing at
> what's better, w/o empirical evidence that can be easily reproduced. TREC
> is just one data point, and is often discounted as being all that useful in
> the real world.
>
> I'm on the fence, though. I agree w/ Hoss that core should be "core" and I
> don't think we want to throw more and more into core, but I also agree w/
> Mike in that we want good, intelligent defaults for what we do have in core.
>
> -Grant