From a legal standpoint, whenever we need to use open-source code, somebody has to inspect the code and 'approve' it. This inspection makes sure the code does not use 3rd party libraries for which we'd need to get open-source clearance as well.
In my company, this process was done for Lucene core, but not for contrib. AFAIU, this process should be done by a company if it wants to (it is usually mandatory when you integrate open-source code in your products). Therefore I don't think the Lucene community should be concerned with this.

The only thing the community can do is move as much as possible to the core, so that when a company inspects the code, the inspection covers as much as possible. Of course, this may sound like too broad a statement, and I definitely don't think everything should belong to 'core'. My understanding is that some 'contrib' packages include 3rd party libraries (like Snowball), while others do not require any 3rd party libs (like SweetSpotSimilarity). For those that require 3rd party libs, it makes sense to leave them in contrib. For those that don't, it might make sense to move them to 'core' on request, in order to encourage people to use them. That's why I was asking if it's a problem to move SweetSpot to 'core'.

As for your questions on SweetSpot, from what I understand in the code, an application should configure it with different values, depending on the TF computation method it wants to use (hyperbolic or baseline). The default implementation of tf() in SweetSpot uses the baseline method, while an application can extend SweetSpot and override tf() to use the hyperbolic one. An application can also configure the length norm parameters for different fields.

From what I read, the code is well documented. Perhaps Doron can add some high-level documentation on the benefit of each tf() computation method, or give some references. But the defaults seem to make sense, so an application can definitely start with the defaults (if it wants to).
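To make the difference between the two tf() methods and the length norm more concrete, here is a rough standalone sketch of how I read the formulas. It is not the contrib code and does not depend on Lucene: the class name SweetSpotSketch, the method signatures and the parameter values used in main() are my own, chosen only for illustration, and the formulas themselves should be checked against the real SweetSpotSimilarity source before relying on them.

```java
// Standalone sketch, NOT the Lucene contrib source. The formulas follow my
// reading of SweetSpotSimilarity; names and parameter values are illustrative.
final class SweetSpotSketch {

    /** Baseline tf: constant `base` up to `min` occurrences, then sqrt growth. */
    static float baselineTf(float freq, float base, float min) {
        if (freq == 0.0f) return 0.0f;
        return (freq <= min) ? base : (float) Math.sqrt(freq + base * base - min);
    }

    /** Hyperbolic tf: tanh-shaped curve between `min` and `max`, centered at `xoffset`. */
    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        if (freq == 0.0f) return 0.0f;
        double x = freq - xoffset;
        double up = Math.pow(base, x);
        double down = Math.pow(base, -x);
        float result = min + (float) ((max - min) / 2.0 * ((up - down) / (up + down) + 1.0));
        // Very large frequencies overflow to Infinity/Infinity = NaN; clamp to max.
        return Float.isNaN(result) ? max : result;
    }

    /** Length norm: 1.0 for lengths inside [min, max] (the "sweet spot"), decaying outside. */
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        float distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0));
    }

    public static void main(String[] args) {
        // Compare the two tf curves for a few term frequencies.
        for (int freq : new int[] {1, 5, 10, 20, 50}) {
            System.out.printf("freq=%d baseline=%.3f hyperbolic=%.3f%n", freq,
                    baselineTf(freq, 0.0f, 0.0f),
                    hyperbolicTf(freq, 0.0f, 2.0f, 1.3, 10.0f));
        }
        // Field lengths inside and outside a hypothetical 10..100 term sweet spot.
        for (int len : new int[] {5, 10, 50, 100, 500}) {
            System.out.printf("numTerms=%d lengthNorm=%.3f%n", len,
                    lengthNorm(len, 10, 100, 0.5f));
        }
    }
}
```

The point of the sketch is only the shape of the curves: the baseline method keeps growing with the frequency, the hyperbolic one levels off at its maximum, and the length norm does not penalize documents whose length falls inside the configured range.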

Shai

On Tue, Sep 2, 2008 at 2:34 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

>
> On Sep 2, 2008, at 6:07 AM, Shai Erera wrote:
>
> Hi,
>
> Following Doron's quality work enhancements in TREC 2007 (
> http://wiki.apache.org/lucene-java/TREC_2007_Million_Queries_Track_-_IBM_Haifa_Team),
> I was wondering if it's possible to move the SweetSpotSimilarity to Lucene's
> main code stream (out of "contrib" that is).
> It shows significant improvement over the default similarity.
>
>
> My understanding is it requires a bit of tuning, right? I'd want to make
> sure people have the right information to use it intelligently, but
> otherwise, it seems reasonable.
>
> I'm not suggesting to replace the DefaultSimilarity (as the default) with
> SweetSpot, but just expose SweetSpot as part of Lucene's core. It will help
> me use it, since I cannot use the contrib packages easily in my environment
> (legal issues), but can use Lucene's core more freely.
>
>
> This strikes me as really odd. The contrib modules are released under the
> exact same terms as the core, but heh, I'm not a lawyer... Is there
> anything you think we should be concerned with?
>
> -Grant