[jira] Commented: (LUCENE-2425) An Anti-Merging Multi-Directory Indexing Framework

Michael McCandless (JIRA) Mon, 03 May 2010 03:08:22 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863294#action_12863294
 ]


Michael McCandless commented on LUCENE-2425:
--------------------------------------------

From a distance this looks very interesting!

It looks roughly similar to what ParallelReader (and the ParallelWriter 
proposed and being iterated on in LUCENE-1879) is trying to accomplish, 
except that those split a single document into different slices by field, 
whereas this issue sends different documents to different slices.
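
To make the contrast concrete, here is a minimal sketch in plain Java. It uses no Lucene classes: maps stand in for documents, and the names SliceSketch, splitByField and routeDocument are hypothetical. Routing by a hash of one field is only an assumption for illustration; the issue leaves the routing algorithm user-defined.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; plain maps stand in for documents and slices.
class SliceSketch {

    /** ParallelReader-style: one document is cut by field, so every slice holds a part of it. */
    static List<Map<String, String>> splitByField(Map<String, String> doc,
                                                  List<List<String>> fieldGroups) {
        List<Map<String, String>> slices = new ArrayList<>();
        for (List<String> group : fieldGroups) {
            Map<String, String> part = new LinkedHashMap<>();
            for (String field : group) {
                if (doc.containsKey(field)) {
                    part.put(field, doc.get(field));
                }
            }
            slices.add(part);
        }
        return slices;
    }

    /** This issue's style: each whole document is sent to exactly one slice. */
    static int routeDocument(Map<String, String> doc, String routingField, int numSlices) {
        // Assumption: route on a hash of one field; a real policy could use time, location, etc.
        String key = doc.getOrDefault(routingField, "");
        return Math.floorMod(key.hashCode(), numSlices);
    }
}
```

In the first method the number of slices is fixed by the field groups and each document appears in all of them; in the second, a document lives in one slice only.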

It looks like you split "under" the Directory abstraction?  How do you handle 
the doc store (term vectors, stored fields) files, which IndexWriter normally 
writes as it indexes to single open IndexOutputs?

> An Anti-Merging Multi-Directory Indexing Framework
> --------------------------------------------------
>
>                 Key: LUCENE-2425
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2425
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*, Index
>    Affects Versions: 3.0.1
>            Reporter: Karthick Sankarachary
>         Attachments: LUCENE-2425.patch
>
>
> By design, a Lucene index tends to merge documents that span multiple 
> segments into fewer segments, in order to optimize its directory structure, 
> which in turn leads to better search performance. In particular, it relies on 
> a merge policy to specify the set of merge operations that should be 
> performed when the index is optimized. 
>
> Often, there's a need to do the exact opposite, which is to "split" the 
> documents. This calls for a mechanism that facilitates sub-division of 
> documents based on a certain (ideally, user-defined) algorithm. By way of 
> example, one may wish to sub-divide (or partition) documents based on 
> parameters such as time, space, real-timeliness, and so on. Herein, we 
> describe an indexing framework that builds on the Lucene index writer and 
> reader, to address use cases wherein documents need to diverge rather than 
> converge.
>
> In brief, it associates zero or more sub-directories with the index's 
> directory, which serve to complement it in some manner. The sub-directories 
> (a.k.a. splits) are managed by a split policy, which is notified of all 
> changes made to the index directory (a.k.a. super-directory), thus allowing 
> it to modify its sub-directories as it sees fit. To make the index reader and 
> writer "observable", we extend Lucene's reader and writer with the goal of 
> providing hooks into every method that could potentially change the index. 
> This allows for propagation of such changes to the split policy, which 
> essentially acts as a listener on the index.
>
> We refer to each sub-directory (or split) and the super-directory as a 
> sub-index of the containing index (a.k.a. the split index). Note that the 
> sub-directory may not necessarily be co-located with the super-directory. 
> Furthermore, the split policy in turn relies on one or more split rules to 
> determine when to add or remove sub-directories. This allows for a clear 
> separation of the event that triggers a split from the management of those 
> splits.
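
The listener arrangement described above could be sketched roughly as follows. This is not taken from the attached patch: every type name here (SplitRule, IndexChangeListener, SplitPolicySketch, ObservableWriterSketch) is hypothetical, the writer is a stand-in that only counts documents, and sub-directories are represented by names rather than real Directory instances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these types come from the LUCENE-2425 patch.

/** A rule that decides, from an index change, whether a new split is needed. */
interface SplitRule {
    boolean shouldSplit(int docsInSuperDirectory);
}

/** Callback through which the policy is told about changes to the super-directory. */
interface IndexChangeListener {
    void onDocumentAdded(int docsInSuperDirectory);
}

/** Policy that owns the sub-directories and consults its rules on every change. */
class SplitPolicySketch implements IndexChangeListener {
    private final List<SplitRule> rules;
    private final List<String> subDirectories = new ArrayList<>();

    SplitPolicySketch(List<SplitRule> rules) {
        this.rules = rules;
    }

    @Override
    public void onDocumentAdded(int docsInSuperDirectory) {
        for (SplitRule rule : rules) {
            if (rule.shouldSplit(docsInSuperDirectory)) {
                // The rule only triggers; creating the split is the policy's job.
                subDirectories.add("split-" + subDirectories.size());
                return; // at most one new split per event
            }
        }
    }

    List<String> subDirectories() {
        return subDirectories;
    }
}

/** Stand-in for the "observable" writer: it forwards each change to the listener. */
class ObservableWriterSketch {
    private final IndexChangeListener listener;
    private int docCount = 0;

    ObservableWriterSketch(IndexChangeListener listener) {
        this.listener = listener;
    }

    void addDocument(String doc) {
        docCount++; // a real writer would index the document here
        listener.onDocumentAdded(docCount);
    }
}
```

Keeping the rule (when to split) apart from the policy (what to do with the splits) mirrors the separation the description draws between the triggering event and the management of the sub-directories.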

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

