On Sat, May 7, 2011 at 12:30 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> I agree: refactoring is TONS of work. Even cases that seem cut and
> dry, from a distance, quickly prove to be hairy (just ask Robert about
> refactoring analyzers).
>
> However, I think "unproven gain" is too strong. EG, just a few days
> ago we had a user thread asking how to use auto-suggest outside of
> Solr. Once we commit the suggest module, this is easy/ier for that
> user, and now we have one more user testing things, finding bugs,
> maybe offering improvements, etc. I think the gains of each
> refactoring are potentially large, but they are not immediate -- they
> accrue over time. It's an investment.
>
> Also: I'm in no way asking/expecting other devs to sign up to do
> refactoring (your response seems to imply this). Nobody can do such a
> thing. We all scratch our own itches and I'm not asking you to
> scratch mine :)
>
> What I am asking is that if someone wants to scratch this itch (factor
> out XXX as a module), they are fully free to do so, as long as it
> doesn't harm Solr's/Lucene's current functions, performance, etc. We
> don't seem to have this freedom today, and this is, I think, the core
> conflict.
>
> Grant if I'm reading your response right, you agree with that freedom
> (others are free to refactor); you're just tempering in a good dose of
> reality ("refactoring is hard"), which I agree with.
Mike, thank you for this email - this is the consensus we need to have!!!
+1 for this... I think this is also what the board report should
contain but I will reply to this separately.

simon

>
> Mike
>
> http://blog.mikemccandless.com
>
> On Thu, May 5, 2011 at 10:25 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>
>> On May 5, 2011, at 4:15 AM, Simon Willnauer wrote:
>>
>>> Hey folks
>>>
>>> On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
>>> <luc...@mikemccandless.com> wrote:
>>>> Isn't our end goal here a bunch of well factored search modules? Ie,
>>>> fast forward a year or two and I think we should have modules like
>>>> these:
>>>
>>> I think we have two camps here (10k feet view):
>>>
>>
>> I'd say 3 camps:
>>
>>> 1. wants to move towards modularization might support all the modules
>>> mike has listed below
>>> 2. wants to stick with Solr's current architecture and remain
>>> "monolithic" (not negative in this case) as much as possible
>>
>> 3. Those who think most should be modularized, but realize it's a ton of
>> work for an unproven gain (although most admit it is a highly likely gain)
>> and should be handled on a case-by-case basis as people do the work. I
>> don't have anything against modularization, I just know, given my schedule,
>> I won't be able to block off weeks of time to do it. I'm happy to review
>> where/when I can.
>>
>>
>>>
>>> I think we can meet somewhere in between and agree on certain module
>>> that should be available to lucene users as well. The ones I have in
>>> mind are
>>> primary search features like:
>>> - Faceting
>>
>> Yeah, for instance, Bobo seems to have some interesting faceting
>> implementations that are ASL, perhaps we can combine into this new faceting
>> module.
>>
>>> - Highlighting
>>> - Suggest
>>> - Function Query (consolidation is needed here!)
>>> - Analyzer factories
>>
>> +1.
>>
>>>
>>> things like distribution and replication should remain in solr IMO but
>>> might be moved to a more extensible API so that people can add their
>>> own implementation.
>>
>> And, of course, all the web tier stuff (response writers, inputs, etc.)
>>
>>> I am thinking about things like the ZooKeeper
>>> support that might not be a good solution for everybody where folks
>>> have already JGroups infrastructure.
>>
>> Or other similar solutions. I wonder about using a ZeroConf implementation
>> that can do self-discovery.
>>
>>> So I think we can work towards 2
>>> distinct goals.
>>> 1. extract common search features into modules
>>> 2. refactor solr to be more "elastic" / "distributed" and extensible
>>> with respect to those goals.
>>
>> 3. Make it easier for Solr to be programmatically configured by decoupling
>> the reading of schema.xml and solrconfig.xml from the code that actually
>> contains the structures for the properties (IndexSchema and SolrConfig)
>>
>>>
>>> maybe we can get agreement on such a basis though.
>>>
>>> let me know what you think
>>
>> I think it's reasonable. At the end of the day, it broadens the appeal of
>> both Lucene and Solr. Solr still exists and is not just a "shell" and at
>> the end of the day, remains the primary choice for people who don't want to
>> stitch everything together themselves. All of it is easier to contribute to
>> b/c people can focus in on the core area they know w/o having to know
>> everything else per se. Stuff should be better tested b/c of it as well
>> since it will receive broader use.
>>
>> That being said, and not to be discouraging, but I see it as a ton of work.
>>
>>
>>
>>
>>>
>>> simon
>>>>
>>>> * Faceting
>>>>
>>>> * Highlighting
>>>>
>>>> * Suggest (good patch is on LUCENE-2995)
>>>>
>>>> * Schema
>>>>
>>>> * Query impls
>>>>
>>>> * Query parsers
>>>>
>>>> * Analyzers (good progress here already, thanks Robert!),
>>>> incl. factories/XML configuration (still need this)
>>>>
>>>> * Database import (DIH)
>>>>
>>>> * Web app
>>>>
>>>> * Distribution/replication
>>>>
>>>> * Doc set representations
>>>>
>>>> * Collapse/grouping
>>>>
>>>> * Caches
>>>>
>>>> * Similarity/scoring impls (BM25, etc.)
>>>>
>>>> * Codecs
>>>>
>>>> * Joins
>>>>
>>>> * Lucene core
>>>>
>>>> In this future, much of this code came from what is now Solr and
>>>> Lucene, but we should freely and aggressively poach from other
>>>> projects when appropriate (and license/provenance is OK).
>>>>
>>>> I keep seeing all these cool "compressed int set" projects popping
>>>> up... surely these are useful for us. Solr poached a doc set impl
>>>> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
>>>> etc.
>>>>
>>>> Katta's doing something sweet with distribution/replication; let's
>>>> poach & merge w/ Solr's approach. There are various facet impls out
>>>> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
>>>> with Solr's.
>>>>
>>>> Elastic Search has lots of cool stuff, too, under ASL2.
>>>>
>>>> All these external open-source projects are fair game for poaching and
>>>> refactoring into shared modules, along with what is now Solr and
>>>> Lucene sources.
>>>>
>>>> In this ideal future, Solr becomes the bundling and default/example
>>>> configuration of the Web App and other modules, much like how the
>>>> various Linux distros bundle different stuff together around the Linux
>>>> kernel. And if you are an advanced app and don't need the webapp
>>>> part, you can cherry pick the huper duper modules you do need and
>>>> directly embedded into your app.
>>>>
>>>> Isn't this the future we are working towards?
>>>>
>>>> Mike
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>
>>
>> --------------------------
>> Grant Ingersoll
>> Lucene Revolution -- Lucene and Solr User Conference
>> May 25-26 in San Francisco
>> www.lucenerevolution.org