Regarding SSS (and the visibility of any other contrib module): perhaps we should get into the habit of referencing contrib goodies from spots that are highly visible to developers (no pun intended), like Javadocs. Concretely, if SSS is so good, or even if it is simply one alternative Similarity that is available and that we (Lucene developers) know about, why are we not mentioning it in the Javadocs for (Default)Similarity?
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/search/Similarity.html
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/search/DefaultSimilarity.html

Javadocs have a lot of visibility, esp. in modern IDEs. We can also have this mentioned on the Wiki, but Wiki is documentation that I think most people don't really like to read.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Michael McCandless <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Friday, September 5, 2008 6:41:48 AM
> Subject: Re: Moving SweetSpotSimilarity out of contrib
>
>
> Chris Hostetter wrote:
>
> > : Another important driver is the "out-of-the-box experience".
> >
> > I honestly have no idea what an OOTB experience for Lucene-Java
> > means ...
> > For Solr i understand, For Nutch i understand ... for a java
> > library????
>
> Well... even though it's a "java library", Lucene still has many
> defaults.
>
> Sure, Solr has even more, so this is important for Solr too.
>
> Most non-Solr apps built on Lucene will simply use Lucene's defaults,
> for lack of knowing any better.
>
> How well such apps then work is what I'm calling the OOTB experience
> for Lucene, and I think it's well-defined and important.
>
> Especially spooky is when a publication does an eval of search
> libraries because typically they will eval only the OOTB experience and
> won't go looking on our wiki to discover all the tricks.
>
> With IndexWriter we default to flushing by RAM usage (16 MB) not by
> buffered doc count, to ConcurrentMergeScheduler, to
> LogByteSizeMergePolicy, to compound file format, mergeFactor is 10,
> etc.
>
> IndexSearcher (and also IndexWriter, for lengthNorm) uses
> Similarity.getDefault().
>
> QueryParser uses a number of defaults when translating the end user's
> search text into all sorts of Query instances.
>
> In 2.3 we made great improvements to OOTB indexing speed, and that's
> important.
>
> I think making improvements to OOTB relevance is also important, but I
> agree this is much harder to do "in general" since there are so many
> differences between the content in apps.
>
> That all being said... I also agree (on closer inspection) it's not
> cut and dry that SSS is a good choice for default (what would be the
> right default for its "curve"?).
>
> If other OOTB relevance improvements surface with time (eg a good way
> to do passage scoring/retrieval or proximity scoring or lexical
> affinity) then we should strongly consider them. Such things always
> come with a performance cost, though, so it'll be an interesting
> discussion...
>
> > But then we get into that back-compat concern issue.
>
> Well...is Lucene's precise scoring formula guaranteed not to change
> between releases? I assume and hope not.
>
> Just like with indexing, where the precise choice of when committing
> and merging and flushing happens was never "promised", that lack of
> API promise gave us the freedom to drastically improve the OOTB
> indexing speed without breaking any promises. We need to keep that
> same freedom on the search side.
>
> From our last discussion on back compat, our most powerful weapon is
> to NOT make promises when they aren't necessary or could limit future
> back compat.
>
> And, if we have a back compat situation that's holding back Lucene's
> OOTB adoption by new users, we should think hard about switching the
> default to favor new users and making an option to quickly get back to
> the old behavior to accommodate existing users. The recent bug fixes
> to StandardTokenizer are such examples.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
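
On the question in the quoted message about the right default for the SSS "curve": the following is a minimal, standalone sketch (it does not use the Lucene classes) comparing the two length-norm shapes under discussion. It assumes the formulas as I recall them from the Javadocs: 1/sqrt(numTerms) for DefaultSimilarity, and 1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1) for SweetSpotSimilarity. The class name, method names, and the min/max/steepness values are hypothetical and chosen only for illustration.

```java
public class LengthNormSketch {

    // Assumed DefaultSimilarity shape: shorter fields always score higher.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Assumed SweetSpotSimilarity shape: a flat plateau of 1.0 for lengths
    // in [min, max], falling off on both sides at a rate set by steepness.
    static float sweetSpotLengthNorm(int numTerms, int min, int max, float steepness) {
        // Distance outside the plateau; zero whenever min <= numTerms <= max.
        float distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0f));
    }

    public static void main(String[] args) {
        // Hypothetical plateau settings, not recommended defaults.
        int min = 3;
        int max = 10;
        float steepness = 0.5f;

        for (int n : new int[] {1, 3, 5, 10, 50, 500}) {
            System.out.printf("%4d terms: default=%.4f sweetSpot=%.4f%n",
                    n, defaultLengthNorm(n), sweetSpotLengthNorm(n, min, max, steepness));
        }
    }
}
```

Under these assumed formulas, setting min = max = 1 and steepness = 0.5 makes the sweet-spot curve collapse to 1/sqrt(numTerms), so the open question is really which plateau (if any) suits content in general.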