Re: modularization discussion

Michael McCandless Tue, 03 May 2011 09:50:10 -0700

Isn't our end goal here a bunch of well-factored search modules?  I.e.,
fast forward a year or two and I think we should have modules like
these:


  * Faceting

  * Highlighting

  * Suggest (good patch is on LUCENE-2995)

  * Schema

  * Query impls

  * Query parsers

  * Analyzers (good progress here already, thanks Robert!),
    incl. factories/XML configuration (still need this)

  * Database import (DIH)

  * Web app

  * Distribution/replication

  * Doc set representations

  * Collapse/grouping

  * Caches

  * Similarity/scoring impls (BM25, etc.)

  * Codecs

  * Joins

  * Lucene core

In this future, much of this code came from what is now Solr and
Lucene, but we should freely and aggressively poach from other
projects when appropriate (and license/provenance is OK).

I keep seeing all these cool "compressed int set" projects popping
up... surely these are useful for us.  Solr poached a doc set impl
from Nutch; probably there's other stuff to poach from Nutch, Mahout,
etc.

Katta's doing something sweet with distribution/replication; let's
poach & merge w/ Solr's approach.  There are various facet impls out
there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
with Solr's.

Elastic Search has lots of cool stuff, too, under ASL2.

All these external open-source projects are fair game for poaching and
refactoring into shared modules, along with what is now Solr and
Lucene sources.

In this ideal future, Solr becomes the bundling and default/example
configuration of the Web App and other modules, much like how the
various Linux distros bundle different stuff together around the Linux
kernel.  And if you are an advanced app and don't need the webapp
part, you can cherry pick the super duper modules you do need and
embed them directly into your app.

Isn't this the future we are working towards?

Mike

http://blog.mikemccandless.com

