Re: modularization discussion

Simon Willnauer Sat, 07 May 2011 03:35:13 -0700

On Sat, May 7, 2011 at 12:30 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> I agree: refactoring is TONS of work.  Even cases that seem cut and
> dry, from a distance, quickly prove to be hairy (just ask Robert about
> refactoring analyzers).
>
> However, I think "unproven gain" is too strong.  EG, just a few days
> ago we had a user thread asking how to use auto-suggest outside of
> Solr.  Once we commit the suggest module, this is easy/ier for that
> user, and now we have one more user testing things, finding bugs,
> maybe offering improvements, etc.  I think the gains of each
> refactoring are potentially large, but they are not immediate -- they
> accrue over time.  It's an investment.
>
> Also: I'm in no way asking/expecting other devs to sign up to do
> refactoring (your response seems to imply this).  Nobody can do such a
> thing.  We all scratch our own itches and I'm not asking you to
> scratch mine :)
>
> What I am asking is that if someone wants to scratch this itch (factor
> out XXX as a module), they are fully free to do so, as long as it
> doesn't harm Solr's/Lucene's current functions, performance, etc.  We
> don't seem to have this freedom today, and this is, I think, the core
> conflict.
>
> Grant, if I'm reading your response right, you agree with that freedom
> (others are free to refactor); you're just tempering in a good dose of
> reality ("refactoring is hard"), which I agree with.


Mike, thank you for this email - this is the consensus we need to have!!!

+1 for this... I think this is also what the board report should
contain but I will reply to this separately.

simon
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Thu, May 5, 2011 at 10:25 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>
>> On May 5, 2011, at 4:15 AM, Simon Willnauer wrote:
>>
>>> Hey folks
>>>
>>> On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
>>> <luc...@mikemccandless.com> wrote:
>>>> Isn't our end goal here a bunch of well factored search modules?  Ie,
>>>> fast forward a year or two and I think we should have modules like
>>>> these:
>>>
>>> I think we have two camps here (10k feet view):
>>>
>>
>> I'd say 3 camps:
>>
>>> 1. wants to move towards modularization and might support all the modules
>>> Mike has listed below
>>> 2. wants to stick with Solr's current architecture and remain
>>> "monolithic" (not negative in this case) as much as possible
>>
>> 3.  Those who think most should be modularized, but realize it's a ton of 
>> work for an unproven gain (although most admit it is a highly likely gain) 
>> and should be handled on a case-by-case basis as people do the work.   I 
>> don't have anything against modularization, I just know, given my schedule, 
>> I won't be able to block off weeks of time to do it.  I'm happy to review 
>> where/when I can.
>>
>>
>>>
>>> I think we can meet somewhere in between and agree on certain modules
>>> that should be available to Lucene users as well. The ones I have in
>>> mind are primary search features like:
>>> - Faceting
>>
>> Yeah, for instance, Bobo seems to have some interesting faceting
>> implementations that are ASL; perhaps we can combine them into this new
>> faceting module.
>>
>>> - Highlighting
>>> - Suggest
>>> - Function Query (consolidation is needed here!)
>>> - Analyzer factories
>>
>> +1.
>>
>>>
>>> Things like distribution and replication should remain in Solr IMO but
>>> might be moved to a more extensible API so that people can add their
>>> own implementation.
>>
>> And, of course, all the web tier stuff (response writers, inputs, etc.)
>>
>>> I am thinking about things like the ZooKeeper
>>> support, which might not be a good solution for everybody, e.g. where
>>> folks already have JGroups infrastructure.
>>
>> Or other similar solutions.  I wonder about using a ZeroConf implementation 
>> that can do self-discovery.
>>
>>> So I think we can work towards 2
>>> distinct goals.
>>> 1. extract common search features into modules
>>> 2. refactor Solr to be more "elastic" / "distributed" and extensible
>>> with respect to those goals.
>>
>> 3. Make it easier for Solr to be programmatically configured by decoupling 
>> the reading of schema.xml and solrconfig.xml from the code that actually 
>> contains the structures for the properties (IndexSchema and SolrConfig)
>>
>>>
>>> maybe we can get agreement on such a basis though.
>>>
>>> let me know what you think
>>
>> I think it's reasonable.  At the end of the day, it broadens the appeal of 
>> both Lucene and Solr.  Solr still exists and is not just a "shell" and at 
>> the end of the day, remains the primary choice for people who don't want to 
>> stitch everything together themselves.  All of it is easier to contribute to 
>> b/c people can focus in on the core area they know w/o having to know 
>> everything else per se.  Stuff should be better tested b/c of it as well 
>> since it will receive broader use.
>>
>> That being said, and not to be discouraging, but I see it as a ton of work.
>>
>>
>>
>>
>>>
>>> simon
>>>>
>>>>  * Faceting
>>>>
>>>>  * Highlighting
>>>>
>>>>  * Suggest (good patch is on LUCENE-2995)
>>>>
>>>>  * Schema
>>>>
>>>>  * Query impls
>>>>
>>>>  * Query parsers
>>>>
>>>>  * Analyzers (good progress here already, thanks Robert!),
>>>>    incl. factories/XML configuration (still need this)
>>>>
>>>>  * Database import (DIH)
>>>>
>>>>  * Web app
>>>>
>>>>  * Distribution/replication
>>>>
>>>>  * Doc set representations
>>>>
>>>>  * Collapse/grouping
>>>>
>>>>  * Caches
>>>>
>>>>  * Similarity/scoring impls (BM25, etc.)
>>>>
>>>>  * Codecs
>>>>
>>>>  * Joins
>>>>
>>>>  * Lucene core
>>>>
>>>> In this future, much of this code came from what is now Solr and
>>>> Lucene, but we should freely and aggressively poach from other
>>>> projects when appropriate (and license/provenance is OK).
>>>>
>>>> I keep seeing all these cool "compressed int set" projects popping
>>>> up... surely these are useful for us.  Solr poached a doc set impl
>>>> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
>>>> etc.
>>>>
>>>> Katta's doing something sweet with distribution/replication; let's
>>>> poach & merge w/ Solr's approach.  There are various facet impls out
>>>> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
>>>> with Solr's.
>>>>
>>>> Elastic Search has lots of cool stuff, too, under ASL2.
>>>>
>>>> All these external open-source projects are fair game for poaching and
>>>> refactoring into shared modules, along with what is now Solr and
>>>> Lucene sources.
>>>>
>>>> In this ideal future, Solr becomes the bundling and default/example
>>>> configuration of the Web App and other modules, much like how the
>>>> various Linux distros bundle different stuff together around the Linux
>>>> kernel.  And if you are an advanced app and don't need the webapp
>>>> part, you can cherry-pick the super-duper modules you do need and
>>>> embed them directly into your app.
>>>>
>>>> Isn't this the future we are working towards?
>>>>
>>>> Mike
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>
>>>>
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> Lucene Revolution -- Lucene and Solr User Conference
>> May 25-26 in San Francisco
>> www.lucenerevolution.org
>>
>>
>

