[jira] [Commented] (LUCENE-4731) New ReplicatingDirectory mirrors index files to HDFS

Shai Erera (JIRA) Tue, 29 Jan 2013 06:49:13 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565414#comment-13565414
 ]


Shai Erera commented on LUCENE-4731:
------------------------------------

Why do you need such a Directory implementation? HDFS already does replication 
(unless you turn it off), so I wonder what does that replication give you, that 
HDFS replication doesn't?
                
> New ReplicatingDirectory mirrors index files to HDFS
> ----------------------------------------------------
>
>                 Key: LUCENE-4731
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4731
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/store
>            Reporter: David Arthur
>             Fix For: 4.2, 5.0
>
>         Attachments: ReplicatingDirectory.java
>
>
> I've been working on a Directory implementation that mirrors the index files 
> to HDFS (or other Hadoop supported FileSystem).
> A ReplicatingDirectory delegates all calls to an underlying Directory 
> (supplied in the constructor). The only hooks are the deleteFile and sync 
> calls. We submit deletes and replications to a single scheduler thread to 
> keep things serializer. During a sync call, if "segments.gen" is seen in the 
> list of files, we know a commit is finishing. After calling the deletage's 
> sync method, we initialize an asynchronous replication as follows.
> * Read segments.gen (before leaving ReplicatingDirectory#sync), save the 
> values for later
> * Get a list of local files from ReplicatingDirectory#listAll before leaving 
> ReplicatingDirectory#sync
> * Submit replication task (DirectoryReplicator) to scheduler thread
> * Compare local files to remote files, determine which remote files get 
> deleted, and which need to get copied
> * Submit a thread to copy each file (one thead per file)
> * Submit a thread to delete each file (one thead per file)
> * Submit a "finalizer" thread. This thread waits on the previous two batches 
> of threads to finish. Once finished, this thread generates a new 
> "segments.gen" remotely (using the version and generation number previously 
> read in).
> I have no idea where this would belong in the Lucene project, so i'll just 
> attach the standalone class instead of a patch. It introduces dependencies on 
> Hadoop core (and all the deps that brings with it).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4731) New ReplicatingDirectory mirrors index files to HDFS

Reply via email to