On Thu, Jun 11, 2009 at 7:01 AM, Michael McCandless<[email protected]> wrote:
> On Wed, Jun 10, 2009 at 6:07 PM, Yonik Seeley<[email protected]> wrote:
>
>> Really goes into Solr land... my pref for Lucene is to remain a core
>> expert-level full-text search library and keep out things that are
>> easy to do in an application or at another level.
>
> I think this must be the crux of our disagreement.

Indeed. The itch to scratch w.r.t. Solr in Lucene is increased core
functionality, not more magic (that duplicates what Solr already does,
but just in a different way and thus makes the lives of Solr developers
harder). If we asked on java-user about people's priorities/wishes, I
bet column stride fields, near real time indexing, and better
performance would dominate stuff like not having to specify how to sort
a field.

> I feel, instead, that Lucene should stand on its own, as a useful
> search library, with a consumable API, good defaults, etc. Lucene is
> more than "the expert level search API that's embedded in
> Solr". Lucene is consumed directly by apps other than Solr.
>
> In fact, I think there are many things in Solr that naturally belong
> in Lucene (and over time we've been gradually slurping them down).
> The line/criteria has always been rather blurry...

And conversely, Solr isn't just a wrapper around Lucene and an incubator
for Lucene technology. Ask Lucene users if they would like pretty much
any substantial piece of functionality in Solr moved to Lucene as a
module and you'll probably get an affirmative answer.

But moving something from Solr to Lucene can have a lot of negative
effects for Solr, including taking it out of the hands of Solr
committers who aren't Lucene committers, and taking it out of Solr's
release cycle and easy ability to change - if Solr needs to make a
change to one of the moved classes, it's necessary to get it through the
Lucene change process and then upgrade to the latest Lucene trunk - all
or nothing.

It's also the case that the goals of Lucene classes and Solr classes are
often very different. Lucene is more concerned with Java APIs (as should
be the case), while in Solr they are a bit more secondary... the
external APIs are of primary importance, and one doesn't worry as much
(or at all) about the classes implementing that interface or its Java
API back compatibility (as a generalization... it depends on the class).

> In Lucene, we should be able to add a NumericField to a document,
> index it, and then create RangeFilter or Sort on that field and have
> things "just work".

That feels like a false sense of simplicity, and Lucene isn't for
dummies ;-) One needs to understand how things work under the hood to
avoid shooting oneself in the foot. You need to understand the memory
implications of sorting on different fields, and you need to understand
that to sort on a text field, there really needs to be just one token
per field. You need to understand the way Trie is indexed, and that
multiple values per field won't work if you use a precision step less
than the word size.

There have been a lot of bad design decisions (I'm talking software
development in general) due to citing "the user will be confused".
Often, this hypothetical user doesn't exist (or is an extreme minority),
and hence I prefer things of the form "I think this is confusing".
Extra magic isn't always a good thing.

-Yonik
http://www.lucidimagination.com
