[ https://issues.apache.org/jira/browse/LUCENE-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565414#comment-13565414 ]
Shai Erera commented on LUCENE-4731:
------------------------------------

Why do you need such a Directory implementation? HDFS already does replication (unless you turn it off), so I wonder what this replication gives you that HDFS replication doesn't.

> New ReplicatingDirectory mirrors index files to HDFS
> ----------------------------------------------------
>
>                 Key: LUCENE-4731
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4731
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/store
>            Reporter: David Arthur
>             Fix For: 4.2, 5.0
>
>         Attachments: ReplicatingDirectory.java
>
>
> I've been working on a Directory implementation that mirrors the index files
> to HDFS (or other Hadoop supported FileSystem).
> A ReplicatingDirectory delegates all calls to an underlying Directory
> (supplied in the constructor). The only hooks are the deleteFile and sync
> calls. We submit deletes and replications to a single scheduler thread to
> keep things serialized. During a sync call, if "segments.gen" is seen in the
> list of files, we know a commit is finishing. After calling the delegate's
> sync method, we initiate an asynchronous replication as follows.
> * Read segments.gen (before leaving ReplicatingDirectory#sync), save the
> values for later
> * Get a list of local files from ReplicatingDirectory#listAll before leaving
> ReplicatingDirectory#sync
> * Submit replication task (DirectoryReplicator) to scheduler thread
> * Compare local files to remote files, determine which remote files get
> deleted, and which need to get copied
> * Submit a thread to copy each file (one thread per file)
> * Submit a thread to delete each file (one thread per file)
> * Submit a "finalizer" thread. This thread waits on the previous two batches
> of threads to finish. Once finished, this thread generates a new
> "segments.gen" remotely (using the version and generation number previously
> read in).
>
> I have no idea where this would belong in the Lucene project, so I'll just
> attach the standalone class instead of a patch. It introduces dependencies on
> Hadoop core (and all the deps that brings with it).
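
The "compare local files to remote files" step in the description above can be sketched as a plain set difference. This is only an illustration under stated assumptions, not code from the attached ReplicatingDirectory.java: the class name ReplicationPlan and its fields are hypothetical, files are compared by name only (which relies on Lucene index files other than segments.gen being write-once), and segments.gen is left out of both lists on the assumption that the finalizer regenerates it remotely.

```java
import java.util.Collections;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper; not part of the attached ReplicatingDirectory.java. */
final class ReplicationPlan {
    // Assumption: segments.gen is never copied or deleted directly, because
    // the finalizer step writes a fresh copy remotely after the other tasks.
    static final String SEGMENTS_GEN = "segments.gen";

    /** Files present locally but missing remotely. */
    final SortedSet<String> toCopy;
    /** Files present remotely but no longer present locally. */
    final SortedSet<String> toDelete;

    private ReplicationPlan(SortedSet<String> toCopy, SortedSet<String> toDelete) {
        this.toCopy = Collections.unmodifiableSortedSet(toCopy);
        this.toDelete = Collections.unmodifiableSortedSet(toDelete);
    }

    /**
     * Compares file names only. localFiles would come from listAll() on the
     * local directory, remoteFiles from a listing of the remote file system.
     */
    static ReplicationPlan of(Set<String> localFiles, Set<String> remoteFiles) {
        SortedSet<String> toCopy = new TreeSet<>(localFiles);
        toCopy.removeAll(remoteFiles);
        toCopy.remove(SEGMENTS_GEN);

        SortedSet<String> toDelete = new TreeSet<>(remoteFiles);
        toDelete.removeAll(localFiles);
        toDelete.remove(SEGMENTS_GEN);

        return new ReplicationPlan(toCopy, toDelete);
    }
}
```

Each name in toCopy and toDelete would then become one copy or delete task, with the finalizer waiting on both batches before writing the remote segments.gen.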