[ 
https://issues.apache.org/jira/browse/CASSANDRA-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180959#comment-13180959
 ] 

Vijay commented on CASSANDRA-3690:
----------------------------------

Hi Jonathan, but there is additional IO which the server has to do to copy the 
archived logs to a different (non-local) location. 
While streaming the commit log back to the server, we have to copy it first and 
then read it back, which is also an overhead in recovery. 

Something like copying the data to S3 in Amazon and copying it right back to 
the node for recovery. (This backup will also be used for test cluster refresh 
with prod data and for BI, which is a completely different system.)
Recovery in most cases follows the loss of an instance or of the whole cluster 
(virtual machines).
                
> Streaming CommitLog backup
> --------------------------
>
>                 Key: CASSANDRA-3690
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3690
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>
> Problems with the current SST backups
> 1) The current backup doesn't allow us to restore to a point in time (within 
> an SST).
> 2) The current SST implementation needs the backup to read from the 
> filesystem, and hence causes additional IO on the disks used for normal 
> operation.
> 3) In 1.0 we removed the flush interval and size settings that determined 
> when a flush is triggered per CF. 
>           For some use cases where there are few writes it becomes 
> increasingly difficult to time it right.
> 4) Use cases which need external (non-Cassandra) BI need the data at regular 
> intervals rather than waiting for longer or unpredictable intervals.
> Disadvantages of the new solution
> 1) Overhead in processing the mutations during the recovery phase.
> 2) More complicated solution than just copying the file to the archive.
> Additional advantages:
> Online and offline restore.
> Close to live incremental backup.
> Note: If the listener agent gets restarted, it is the agent's responsibility 
> to stream the files that were missed or left incomplete.
> There are 3 Options in the initial implementation:
> 1) Backup -> Once a socket is connected we will switch the commit log and 
> send new updates via the socket.
> 2) Stream -> will take the absolute path of the file and will read the file 
> and send the updates via the socket.
> 3) Restore -> this will get the serialized bytes and apply the mutations.
> Side NOTE: (Not related to this patch as such) The agent which will take 
> incremental backup is planned to be open sourced soon (Name: Priam).
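
To make the Stream and Restore options above concrete, here is a minimal sketch of sending serialized mutations over a byte stream and applying them on the other side. Everything in it is an assumption for illustration: the frame layout (length, CRC32, payload), the names write_frame, read_frames and restore, and the apply_mutation callback are hypothetical and are not taken from the patch or from Cassandra's commit log format.

```python
import io
import struct
import zlib

# Hypothetical frame layout (NOT Cassandra's commit log format):
#   4-byte big-endian payload length | 4-byte CRC32 of payload | payload
HEADER = struct.Struct(">II")


def write_frame(out, payload: bytes) -> None:
    """Write one serialized mutation as a length- and checksum-prefixed frame."""
    out.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    out.write(payload)


def read_frames(src):
    """Yield payloads in order; stop at end of stream or at the first
    truncated or corrupt frame (for example, a file that was cut off)."""
    while True:
        header = src.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        length, crc = HEADER.unpack(header)
        payload = src.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            return
        yield payload


def restore(src, apply_mutation) -> int:
    """Apply every intact frame via the caller-supplied callback.
    Returns the number of mutations applied."""
    count = 0
    for payload in read_frames(src):
        apply_mutation(payload)
        count += 1
    return count


if __name__ == "__main__":
    # In-memory stand-in for the socket; a real agent might wrap a socket
    # with socket.makefile("rb") to get the same buffered read() behaviour.
    stream = io.BytesIO()
    for mutation in (b"row-1", b"row-2"):
        write_frame(stream, mutation)
    stream.seek(0)
    applied = []
    restore(stream, applied.append)
```

Stopping at the first incomplete frame is one possible way for an agent to notice a file that was only partly streamed, which it would then have to stream again.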

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
