[jira] [Updated] (LUCENE-2632) FilteringCodec, TeeCodec, TeeDirectory

Andrzej Bialecki (Updated) (JIRA) Mon, 13 Feb 2012 17:41:23 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrzej Bialecki  updated LUCENE-2632:
--------------------------------------

    Description: 
This issue adds two new Codec implementations and a supporting Directory 
implementation:

* TeeCodec: there have been past attempts to write to multiple indexes in 
parallel so that they all stay synchronized. These were complicated by the 
IndexWriter/SegmentMerger logic. The solution presented here offers similar 
functionality but works at a different level: as the name suggests, TeeCodec 
duplicates index data into multiple output Directories.

* TeeDirectory (also used by TeeCodec) is a simple abstraction that performs 
Directory operations on several directories in parallel, effectively mirroring 
their data. Optionally, a set of file suffixes can be specified so that only 
matching files are mirrored and non-matching files are skipped.

* FilteringCodec is loosely related to the index pruning ideas presented in 
LUCENE-1812 and to the concept of tiered search. Since TeeCodec can write to 
multiple output Directories in a synchronized way, we can also filter out or 
modify some of the data as it is written. FilteringCodec provides this 
functionality, so you can use it like this:
{code}
IndexWriter --> TeeCodec
                 |  |
                 |  +--> StandardCodec --> Directory1
                 +--> FilteringCodec --> StandardCodec --> Directory2
{code}

The end result of this chain is two indexes that are kept in sync: one is the 
full regular index, and the other is a filtered index.
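
To make the data flow above concrete, here is a minimal, self-contained sketch 
in plain Java. It does not use the Lucene API or the code in the attached 
patch: TermSink, MemorySink, TeeSink, FilteringSink and shouldMirror are 
hypothetical names made up for illustration, and index data is reduced to 
plain term strings. It also assumes that leaving the suffix set empty means 
"mirror every file", and the file names used are only examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Illustrative stand-ins only: none of these types come from Lucene or from
// the attached patch.
class TeeSketch {

    /** Hypothetical consumer of index data, reduced here to plain terms. */
    interface TermSink {
        void add(String term);
    }

    /** Plays the role of "StandardCodec --> Directory": stores what it receives. */
    static final class MemorySink implements TermSink {
        final List<String> terms = new ArrayList<>();

        @Override
        public void add(String term) {
            terms.add(term);
        }
    }

    /** Plays the role of TeeCodec: forwards every term to all outputs. */
    static final class TeeSink implements TermSink {
        private final List<TermSink> outputs;

        TeeSink(TermSink... outputs) {
            this.outputs = Arrays.asList(outputs);
        }

        @Override
        public void add(String term) {
            for (TermSink out : outputs) {
                out.add(term);
            }
        }
    }

    /** Plays the role of FilteringCodec: passes on only the accepted terms. */
    static final class FilteringSink implements TermSink {
        private final Predicate<String> keep;
        private final TermSink delegate;

        FilteringSink(Predicate<String> keep, TermSink delegate) {
            this.keep = keep;
            this.delegate = delegate;
        }

        @Override
        public void add(String term) {
            if (keep.test(term)) {
                delegate.add(term);
            }
        }
    }

    /** Suffix rule assumed for TeeDirectory: an empty set mirrors every file. */
    static boolean shouldMirror(String fileName, Set<String> suffixes) {
        if (suffixes.isEmpty()) {
            return true;
        }
        for (String suffix : suffixes) {
            if (fileName.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MemorySink full = new MemorySink();      // stands in for Directory1
        MemorySink filtered = new MemorySink();  // stands in for Directory2
        TermSink writer =
            new TeeSink(full, new FilteringSink(t -> t.length() > 3, filtered));
        for (String t : new String[] {"the", "lucene", "of", "codec"}) {
            writer.add(t);
        }
        System.out.println(full.terms);      // [the, lucene, of, codec]
        System.out.println(filtered.terms);  // [lucene, codec]
        System.out.println(shouldMirror("_0.frq", Collections.singleton(".frq"))); // true
    }
}
```

Both sinks receive the same sequence of calls from a single writer, which is 
what keeps the two outputs in step. The filter sits in front of only one 
branch, so the full output is unaffected by it.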

  was:
This issue adds two new Codec implementations:

* TeeSinkCodec: there have been attempts in the past to implement parallel 
writing to multiple indexes so that they are all synchronized. This was however 
complicated due to the complexity of IndexWriter/SegmentMerger logic. The 
solution presented here offers a similar functionality but working on a 
different level - as the name suggests, the TeeSinkCodec duplicates term data 
into multiple output Directories, and provides a multi-directory abstraction to 
perform other operations that are not yet handled by the Codec API (e.g. stored 
fields handling).

* FilteringCodec is related in a remote way to the ideas of index pruning 
presented in LUCENE-1812 and the concept of tiered search. Since we can use 
TeeSinkCodec to write to multiple output Directories in a synchronized way, we 
could also filter out or modify some of the data that is being written. The 
FilteringCodec provides this functionality, so that you can use like this:
{code}
IndexWriter --> TeeSinkCodec
                 |  |
                 |  +--> StandardCodec --> Directory1
                 +--> FilteringCodec --> StandardCodec --> Directory2
{code}

The end result of this chain is two indexes that are kept in sync - one is the 
full regular index, and the other one is a filtered index.

        Summary: FilteringCodec, TeeCodec, TeeDirectory  (was: TeeSinkCodec and 
FilteringCodec)

Patch updated to the latest trunk.

A few notes:
* FilteringCodec tests pass
* TeeDirectory tests pass
* TeeCodec tests pass only partially; failures occur during merge operations.
                
> FilteringCodec, TeeCodec, TeeDirectory
> --------------------------------------
>
>                 Key: LUCENE-2632
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2632
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>         Attachments: LUCENE-2632.patch
>
