[jira] [Commented] (LUCENE-4731) New ReplicatingDirectory mirrors index files to HDFS

Shai Erera (JIRA) Thu, 31 Jan 2013 05:31:16 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567621#comment-13567621
 ]


Shai Erera commented on LUCENE-4731:
------------------------------------

bq. Not exactly, just no other replication or delete events will happen

Well, in that case you could run into trouble. Imagine two threads, one doing 
commit() and one doing replication, where the commit() thread is much faster 
than the replication one. The commit thread does commit(#1) and the 
replication thread starts to replicate that index commit. In the middle, the 
commit thread does commit(#2), which deletes some files of the previous commit 
(e.g. due to segment merging), and the replication thread is left with a 
corrupt commit ...
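
To make the interleaving concrete, here is a minimal, single-threaded sketch 
that replays the steps in the order described above. FakeIndexDir and RaceDemo 
are hypothetical stand-ins written for this illustration, not Lucene classes, 
and the file names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for an index directory; not a Lucene class.
class FakeIndexDir {
    private final Set<String> files = new TreeSet<>();

    // Models commit(): adds the new commit's files and drops files that
    // only the previous commit referenced (e.g. merged-away segments).
    void commit(Set<String> newFiles, Set<String> obsolete) {
        files.addAll(newFiles);
        files.removeAll(obsolete);
    }

    List<String> listAll() { return new ArrayList<>(files); }

    boolean exists(String name) { return files.contains(name); }
}

class RaceDemo {
    static Set<String> setOf(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    // Returns the files of commit #1 that the replicator can no longer copy.
    static List<String> run() {
        FakeIndexDir dir = new FakeIndexDir();
        dir.commit(setOf("_0.cfs", "_1.cfs", "segments_1"), setOf());

        // Replication thread: captures the file list of commit #1 ...
        List<String> toCopy = dir.listAll();

        // ... and before it finishes copying, the commit thread runs
        // commit #2, which merges _0 and _1 into _2 and deletes the old files.
        dir.commit(setOf("_2.cfs", "segments_2"),
                   setOf("_0.cfs", "_1.cfs", "segments_1"));

        List<String> missing = new ArrayList<>();
        for (String name : toCopy) {
            if (!dir.exists(name)) {
                missing.add(name); // copying this file would now fail
            }
        }
        return missing;
    }
}
```

In this replay every file the replicator planned to copy is gone by the time 
it gets to it, so the remote copy of commit #1 can never be completed.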

bq. Is that what the SnapshotDeletionPolicy does

Yes. You can see how it's used in the tests. Also, here's a thread from the 
user list with example code: http://markmail.org/message/3novogsi6vcgarur.

I am not sure if Solr uses it, but I think it does. I mean .. it's the "safe" 
way to replicate/backup your index.
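
The idea behind it can be sketched as follows: files that belong to a 
snapshotted commit are pinned, and a delete that arrives while they are pinned 
is deferred until the snapshot is released. SnapshotModel below is a 
hypothetical, simplified model of that behavior and not Lucene's 
implementation. With the real SnapshotDeletionPolicy you would take a snapshot, 
copy the files listed by the returned IndexCommit, and then release it; the 
exact method signatures differ between Lucene versions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of snapshot-protected deletion; not Lucene code.
class SnapshotModel {
    private final Set<String> liveFiles = new HashSet<>();
    private final Map<String, Integer> pinned = new HashMap<>(); // file -> pin count
    private final Set<String> pendingDelete = new HashSet<>();

    synchronized void addFiles(Set<String> names) {
        liveFiles.addAll(names);
    }

    // Pin a commit's files so that later commits cannot remove them.
    synchronized Set<String> snapshot(Set<String> commitFiles) {
        for (String f : commitFiles) {
            pinned.merge(f, 1, Integer::sum);
        }
        return new HashSet<>(commitFiles);
    }

    // Called when a newer commit makes files obsolete.
    synchronized void delete(Set<String> obsolete) {
        for (String f : obsolete) {
            if (pinned.containsKey(f)) {
                pendingDelete.add(f);   // deferred until release()
            } else {
                liveFiles.remove(f);
            }
        }
    }

    // Unpin the files; deletes that were deferred are applied now.
    synchronized void release(Set<String> snapshotFiles) {
        for (String f : snapshotFiles) {
            Integer left = pinned.merge(f, -1, Integer::sum);
            if (left <= 0) {
                pinned.remove(f);
                if (pendingDelete.remove(f)) {
                    liveFiles.remove(f);
                }
            }
        }
    }

    synchronized boolean exists(String name) {
        return liveFiles.contains(name);
    }
}
```

A replicator built on this model would call snapshot() before copying and 
release() in a finally block, so a concurrent commit(#2) cannot pull files out 
from under it.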

Lucene doesn't have an RPC server built-in .. I wrote a simple Servlet that 
responds to some REST API to invoke replication ...
                
> New ReplicatingDirectory mirrors index files to HDFS
> ----------------------------------------------------
>
>                 Key: LUCENE-4731
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4731
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/store
>            Reporter: David Arthur
>             Fix For: 4.2, 5.0
>
>         Attachments: ReplicatingDirectory.java
>
>
> I've been working on a Directory implementation that mirrors the index files 
> to HDFS (or other Hadoop supported FileSystem).
> A ReplicatingDirectory delegates all calls to an underlying Directory 
> (supplied in the constructor). The only hooks are the deleteFile and sync 
> calls. We submit deletes and replications to a single scheduler thread to 
> keep things serialized. During a sync call, if "segments.gen" is seen in the 
> list of files, we know a commit is finishing. After calling the delegate's 
> sync method, we initiate an asynchronous replication as follows.
> * Read segments.gen (before leaving ReplicatingDirectory#sync), save the 
> values for later
> * Get a list of local files from ReplicatingDirectory#listAll before leaving 
> ReplicatingDirectory#sync
> * Submit replication task (DirectoryReplicator) to scheduler thread
> * Compare local files to remote files, determine which remote files get 
> deleted, and which need to get copied
> * Submit a thread to copy each file (one thread per file)
> * Submit a thread to delete each file (one thread per file)
> * Submit a "finalizer" thread. This thread waits on the previous two batches 
> of threads to finish. Once finished, this thread generates a new 
> "segments.gen" remotely (using the version and generation number previously 
> read in).
> I have no idea where this would belong in the Lucene project, so I'll just 
> attach the standalone class instead of a patch. It introduces dependencies on 
> Hadoop core (and all the deps that brings with it).
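
The compare/copy/delete/finalize steps in the description can be sketched 
roughly as below. This is not taken from the attached ReplicatingDirectory.java; 
SyncPlan and ReplicationRun are hypothetical names, the "remote" side is an 
in-memory set rather than HDFS, files are compared by name only, and a small 
fixed pool stands in for the one-thread-per-file approach. Excluding 
"segments.gen" from the plan is also an assumption, made because the 
description says the finalizer generates it remotely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names are not from ReplicatingDirectory.java.
class SyncPlan {
    final Set<String> toCopy = new TreeSet<>();   // local but not remote
    final Set<String> toDelete = new TreeSet<>(); // remote but not local

    SyncPlan(Set<String> local, Set<String> remote) {
        toCopy.addAll(local);
        toCopy.removeAll(remote);
        toDelete.addAll(remote);
        toDelete.removeAll(local);
        // Assumption: segments.gen is written last by the finalizer step.
        toCopy.remove("segments.gen");
        toDelete.remove("segments.gen");
    }
}

class ReplicationRun {
    // Applies the plan to an in-memory "remote" and returns the result.
    static Set<String> apply(SyncPlan plan, Set<String> remote) {
        Set<String> target = Collections.synchronizedSet(new TreeSet<>(remote));
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String f : plan.toCopy) {
                pending.add(workers.submit(() -> { target.add(f); }));
            }
            for (String f : plan.toDelete) {
                pending.add(workers.submit(() -> { target.remove(f); }));
            }
            // Finalizer step: wait for both batches to finish ...
            for (Future<?> p : pending) {
                p.get();
            }
            // ... and only then publish the new generation marker.
            target.add("segments.gen");
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("replication failed", e);
        } finally {
            workers.shutdown();
        }
        return new TreeSet<>(target);
    }
}
```

Writing "segments.gen" only after every copy and delete has completed keeps 
the remote side from advertising a commit whose files are not all there yet.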
