An Anti-Merging Multi-Directory Indexing Framework
--------------------------------------------------

                 Key: LUCENE-2425
                 URL: https://issues.apache.org/jira/browse/LUCENE-2425
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/*, Index
    Affects Versions: 3.0.1
            Reporter: Karthick Sankarachary


By design, a Lucene index merges documents that span multiple segments into 
fewer segments in order to optimize its directory structure, which in turn 
leads to better search performance. In particular, it relies on a merge policy 
to specify the set of merge operations to perform when the index is optimized. 

Often, there is a need to do the exact opposite, which is to "split" the 
documents. This calls for a mechanism that facilitates sub-division of 
documents based on some (ideally user-defined) algorithm. For example, one 
may wish to sub-divide (or partition) documents based on parameters such as 
time, space, real-timeliness, and so on. Here we describe an indexing 
framework that builds on the Lucene index writer and reader to address use 
cases in which documents need to diverge rather than converge.

In brief, it associates zero or more sub-directories with the index's 
directory, which serve to complement it in some manner. The sub-directories 
(a.k.a. splits) are managed by a split policy, which is notified of all changes 
made to the index directory (a.k.a. super-directory), thus allowing it to 
modify its sub-directories as it sees fit. To make the index reader and writer 
"observable", we extend Lucene's reader and writer with the goal of providing 
hooks into every method that could potentially change the index. This allows 
for propagation of such changes to the split policy, which essentially acts as 
a listener on the index.
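
The listener arrangement described above can be sketched roughly as follows. 
This is only an illustration and not the code attached to this issue: it uses 
no Lucene classes, and the names IndexListener, ObservableWriter and 
RecordingPolicy are hypothetical. Documents are modelled as plain strings, and 
the writer simply notifies each registered listener after every change.

```java
import java.util.ArrayList;
import java.util.List;

final class ObservableIndexSketch {

    /** Hypothetical callback interface that a split policy would implement. */
    interface IndexListener {
        void documentAdded(String doc);
        void documentsDeleted(String term);
    }

    /** Stand-in for an index writer whose mutating methods carry hooks. */
    static class ObservableWriter {
        private final List<String> docs = new ArrayList<>();
        private final List<IndexListener> listeners = new ArrayList<>();

        void addListener(IndexListener listener) {
            listeners.add(listener);
        }

        void addDocument(String doc) {
            docs.add(doc);
            // Hook: propagate the change to every listener.
            for (IndexListener l : listeners) {
                l.documentAdded(doc);
            }
        }

        void deleteDocuments(String term) {
            // Toy matching rule (an assumption): a document matches if it contains the term.
            docs.removeIf(d -> d.contains(term));
            for (IndexListener l : listeners) {
                l.documentsDeleted(term);
            }
        }

        int numDocs() {
            return docs.size();
        }
    }

    /** Toy policy that only records the events it is notified of. */
    static class RecordingPolicy implements IndexListener {
        final List<String> events = new ArrayList<>();

        @Override
        public void documentAdded(String doc) {
            events.add("add:" + doc);
        }

        @Override
        public void documentsDeleted(String term) {
            events.add("delete:" + term);
        }
    }

    public static void main(String[] args) {
        ObservableWriter writer = new ObservableWriter();
        RecordingPolicy policy = new RecordingPolicy();
        writer.addListener(policy);
        writer.addDocument("alpha beta");
        writer.deleteDocuments("alpha");
        System.out.println(policy.events);
    }
}
```

A real policy would react to these callbacks by modifying its sub-directories 
rather than merely recording them.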

We refer to each sub-directory (or split), as well as the super-directory, as 
a sub-index of the containing index (a.k.a. the split index). Note that a 
sub-directory need not be co-located with the super-directory. Furthermore, 
the split policy in turn relies on one or more split rules to determine when 
to add or remove sub-directories. This allows for a clear separation of the 
event that triggers a split from the management of those splits.
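
As a rough illustration of that separation, the sketch below keeps the 
decision of when to split in a rule and the bookkeeping of the splits in the 
policy. It is an assumption-laden toy, not the proposed API: SplitRule, 
SplitPolicy and the size-based rule are hypothetical names, splits are 
modelled as in-memory lists rather than directories, and only the "add a 
split" case is shown.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitPolicySketch {

    /** Hypothetical rule: decides when a new split should be opened. */
    interface SplitRule {
        boolean shouldAddSplit(List<String> currentSplit);
    }

    /** Hypothetical policy: owns the splits and consults the rule on each change. */
    static class SplitPolicy {
        private final SplitRule rule;
        private final List<List<String>> splits = new ArrayList<>();

        SplitPolicy(SplitRule rule) {
            this.rule = rule;
        }

        /** Called whenever a document is added to the index. */
        void documentAdded(String doc) {
            // The rule supplies the trigger; the policy manages the splits.
            if (splits.isEmpty() || rule.shouldAddSplit(splits.get(splits.size() - 1))) {
                splits.add(new ArrayList<>());
            }
            splits.get(splits.size() - 1).add(doc);
        }

        List<List<String>> splits() {
            return splits;
        }
    }

    public static void main(String[] args) {
        // Example rule (an assumption): open a new split once the current one holds two documents.
        SplitPolicy policy = new SplitPolicy(current -> current.size() >= 2);
        for (String doc : new String[] {"d1", "d2", "d3", "d4", "d5"}) {
            policy.documentAdded(doc);
        }
        System.out.println(policy.splits());
    }
}
```

Swapping in a different rule (for instance, one keyed on time) would change 
when splits appear without touching the policy's bookkeeping.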

