BTW, quoting Marcelo Ochoa (the developer behind the Oracle/Lucene implementation), the three minimal features a transactional DB should support for Lucene integration are:

1) The ability to define new functions (e.g. lcontains(), lscore()) that bind queries to Lucene and return document/row scores.
2) An API that allows DML intercepts, like Oracle's ODCI.
3) The ability to extend and/or implement new types of "domain" indexes that the engine's query evaluation and execution/optimization planner can use efficiently.

Thanks Marcelo.

-- Joaquin

On Sun, Sep 7, 2008 at 8:16 AM, J. Delgado <[EMAIL PROTECTED]> wrote:

> On Sun, Sep 7, 2008 at 2:41 AM, mark harwood <[EMAIL PROTECTED]> wrote:
>
> >>for example joins are not possible using SOLR).
>>
>> It's largely *because* Lucene doesn't do joins that it can be made to
>> scale out. I've replaced two large-scale database systems this year with
>> distributed Lucene solutions because this scale-out architecture provided
>> significantly better performance. These were "semi-structured" systems too.
>> Lucene's comparatively simplistic data model/query model is both a weakness
>> and a strength in this regard.
>>
>
> Hey, maybe the right way to go for a truly scalable and high performance
> semi-structured database is to marry HBase (Bigtable-like data storage)
> with SOLR/Lucene. I concur with you in the sense that simplistic data models
> coupled with high performance are the killer.
>
> Let me quote this from the original Bigtable paper from Google:
>
> "Bigtable does not support a full relational data model; instead, it
> provides clients with a simple data model that supports dynamic control over
> data layout and format, and allows clients to reason about the locality
> properties of the data represented in the underlying storage. Data is
> indexed using row and column names that can be arbitrary strings. Bigtable
> also treats data as uninterpreted strings, although clients often serialize
> various forms of structured and semi-structured data into these strings.
> Clients can control the locality of their data through careful choices in
> their schemas. Finally, Bigtable schema parameters let clients dynamically
> control whether to serve data out of memory or from disk."
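
As a rough illustration of the first two features in Marcelo's list (user-defined query functions and DML intercepts), here is a minimal Python sketch. It is only an analogy and not how the Oracle/Lucene integration works: SQLite stands in for the transactional database, a plain in-memory dictionary stands in for the Lucene index, and triggers stand in for an ODCI-style intercept API. The helper names index_row and unindex_row are made up for the example; lcontains and lscore are borrowed from the list. The third feature is not covered: the query below still calls lcontains for every row, because the planner knows nothing about the external index.

```python
import sqlite3
from collections import defaultdict

# Toy stand-in for a Lucene index (assumption): term -> set of row ids.
inverted_index = defaultdict(set)

def index_row(row_id, text):
    """DML intercept for INSERT: add the row's terms to the toy index."""
    for term in (text or "").lower().split():
        inverted_index[term].add(row_id)
    return 1

def unindex_row(row_id):
    """DML intercept for DELETE: drop the row from every posting set."""
    for ids in inverted_index.values():
        ids.discard(row_id)
    return 1

def lscore(row_id, query):
    """Crude score: how many query terms the row contains."""
    return sum(row_id in inverted_index.get(t, ()) for t in query.lower().split())

def lcontains(row_id, query):
    """1 if the row contains every query term, else 0."""
    terms = query.lower().split()
    return int(bool(terms) and lscore(row_id, query) == len(terms))

conn = sqlite3.connect(":memory:")

# Feature 1: new SQL functions that bind queries to the external index.
conn.create_function("lcontains", 2, lcontains)
conn.create_function("lscore", 2, lscore)
conn.create_function("index_row", 2, index_row)
conn.create_function("unindex_row", 1, unindex_row)

# Feature 2: intercept DML so the external index follows the table.
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
CREATE TRIGGER docs_after_insert AFTER INSERT ON docs
BEGIN
    SELECT index_row(NEW.id, NEW.body);
END;
CREATE TRIGGER docs_after_delete AFTER DELETE ON docs
BEGIN
    SELECT unindex_row(OLD.id);
END;
""")

conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [(1, "Lucene scales out"), (2, "HBase stores rows"), (3, "Lucene and HBase")],
)
conn.execute("DELETE FROM docs WHERE id = 2")

# Rows matching 'lucene', each with a score against a wider query.
hits = conn.execute(
    "SELECT id, lscore(id, 'lucene hbase') FROM docs "
    "WHERE lcontains(id, 'lucene') ORDER BY id"
).fetchall()
print(hits)
```

Note that the toy index sits outside the database's transaction: a ROLLBACK would undo the table change but not the dictionary update. Keeping the two consistent is exactly what a proper intercept API and domain index support are meant to solve.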