An equivalent Parallelizer for IndexWriter would be a
useful addition to keep the two indexes in sync.

Hiding the details of which Lucene index document data
is retrieved from gives us some added flexibility in
storage options, but I've been thinking of a more
general-purpose layer of abstraction which would allow
me to use other storage options, e.g. relational
databases, just as transparently.

A typical configuration might augment a Lucene index
with an RDBMS storage plug-in, where all text content
is indexed (not stored) in the Lucene index along with
a stored Field holding the RDBMS primary key. The
RDBMS would be used to store the original text plus
any other fields. Retrieving documents would involve
querying the Lucene index, retrieving the RDBMS key
and using that key to fetch the other required fields
from the database.

As well as allowing the prospect of an RDBMS-backed
storage option for document fields, we can also
introduce the option of using the RDBMS to provide
filters at query time, e.g. books with price < $10.
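
To make the two-step lookup concrete, here is a minimal
sketch in plain Java. It is not Lucene or JDBC code: the
two HashMaps are stand-ins I am assuming for the Lucene
index (term -> stored primary keys) and the RDBMS table,
and every name in it (TwoStepLookupSketch, Book, add,
search) is hypothetical. The filter is applied to rows
after the key lookup; a real plug-in could just as well
push the predicate down into SQL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class TwoStepLookupSketch {

    // Stand-in for one RDBMS row; the columns are hypothetical.
    static final class Book {
        final long id;
        final String title;
        final String text;
        final double price;

        Book(long id, String title, String text, double price) {
            this.id = id;
            this.title = title;
            this.text = text;
            this.price = price;
        }
    }

    // Stand-in for the Lucene index: the text is "indexed, not stored",
    // so only the primary key can be read back for each term.
    private final Map<String, List<Long>> index = new HashMap<>();

    // Stand-in for the RDBMS table, keyed by primary key.
    private final Map<Long, Book> table = new HashMap<>();

    // Writes the full row to the "database" and only terms + key to the "index".
    void add(Book b) {
        table.put(b.id, b);
        for (String term : b.text.toLowerCase().split("\\s+")) {
            List<Long> keys = index.computeIfAbsent(term, t -> new ArrayList<>());
            if (!keys.contains(b.id)) {
                keys.add(b.id);
            }
        }
    }

    // Step 1: query the index for keys. Step 2: fetch rows by key.
    // Step 3: keep only rows accepted by the database-side filter.
    List<Book> search(String term, Predicate<Book> filter) {
        List<Book> hits = new ArrayList<>();
        List<Long> keys = index.getOrDefault(term.toLowerCase(),
                                             Collections.<Long>emptyList());
        for (Long key : keys) {
            Book row = table.get(key);
            if (row != null && filter.test(row)) {
                hits.add(row);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TwoStepLookupSketch store = new TwoStepLookupSketch();
        store.add(new Book(1, "Lucene Basics", "indexing text with lucene", 8.50));
        store.add(new Book(2, "Search at Scale", "lucene cluster tuning", 35.00));

        // Text query goes to the index, the price condition to the "RDBMS".
        for (Book b : store.search("lucene", row -> row.price < 10.0)) {
            System.out.println(b.title); // prints "Lucene Basics"
        }
    }
}
```

Both books match the text query, but only the first one
passes the price filter, which is the "books with price
< $10" case described above.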

As a rough outline this would require:

1) A new HybridDocument which can contain Lucene and
non-Lucene fields for reading and writing.
2) A new reader/writer abstraction which routes fields
to the appropriate repository (Lucene or plug-in storage).
3) A plug-in interface for attaching external
storage/filter modules.
4) A new search facility that can pass Lucene queries
to Lucene and filter requests to a filter module.
5) A search facility that allows partial retrieval of
documents (e.g. the equivalent of "select summary, title,
price...").
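
Items 1, 2, 3 and 5 of the outline could look roughly
like the sketch below. Only the name HybridDocument
comes from the outline; StoragePlugin, MapPlugin,
HybridWriter and all method names are hypothetical, and
a HashMap again stands in for the Lucene index so that
the example runs without Lucene or a database. Item 4
(passing filter requests to a filter module) is left
out to keep it short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class HybridStoreSketch {

    // Item 3: hypothetical plug-in contract for an external field store.
    interface StoragePlugin {
        void store(String key, Map<String, String> fields);

        // Item 5: load only the named fields, like "select a, b ...".
        Map<String, String> load(String key, Set<String> wanted);
    }

    // In-memory plug-in standing in for an RDBMS-backed implementation.
    static class MapPlugin implements StoragePlugin {
        private final Map<String, Map<String, String>> rows = new HashMap<>();

        public void store(String key, Map<String, String> fields) {
            rows.put(key, new HashMap<>(fields));
        }

        public Map<String, String> load(String key, Set<String> wanted) {
            Map<String, String> out = new LinkedHashMap<>();
            Map<String, String> row = rows.get(key);
            if (row == null) {
                return out; // unknown key: nothing to return
            }
            for (String name : wanted) {
                if (row.containsKey(name)) {
                    out.put(name, row.get(name));
                }
            }
            return out;
        }
    }

    // Item 1: a document whose fields are tagged by destination.
    static class HybridDocument {
        final String key;
        final Map<String, String> indexedFields = new LinkedHashMap<>();
        final Map<String, String> externalFields = new LinkedHashMap<>();

        HybridDocument(String key) {
            this.key = key;
        }

        HybridDocument indexed(String name, String value) {
            indexedFields.put(name, value);
            return this;
        }

        HybridDocument external(String name, String value) {
            externalFields.put(name, value);
            return this;
        }
    }

    // Item 2: routes each group of fields to its repository.
    static class HybridWriter {
        // Stand-in for the Lucene index: key -> indexed fields.
        final Map<String, Map<String, String>> indexSide = new HashMap<>();
        private final StoragePlugin plugin;

        HybridWriter(StoragePlugin plugin) {
            this.plugin = plugin;
        }

        void add(HybridDocument doc) {
            indexSide.put(doc.key, new HashMap<>(doc.indexedFields));
            plugin.store(doc.key, doc.externalFields);
        }
    }

    public static void main(String[] args) {
        MapPlugin plugin = new MapPlugin();
        HybridWriter writer = new HybridWriter(plugin);
        writer.add(new HybridDocument("42")
                .indexed("body", "full text goes to the index only")
                .external("title", "Lucene Basics")
                .external("summary", "An introduction")
                .external("price", "8.50"));

        // Partial retrieval: ask the plug-in for two of the three fields.
        Set<String> wanted = new LinkedHashSet<>(Arrays.asList("title", "price"));
        System.out.println(plugin.load("42", wanted));
    }
}
```

Keeping the plug-in contract down to store/load by key
means an RDBMS-backed implementation only has to map
those two calls onto an insert and a select by primary
key; whether that is the right granularity is open to
discussion.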
