[ 
https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamara Alexander updated CASSANDRA-1602:
----------------------------------------

    Attachment: 1602-v3.txt

Attached a newer patch.

bin/logreplay still requires a mode, 'normal' or 'forced', as before, and now 
also accepts an optional maxtimestamp; mutations from the commit logs after 
that timestamp are not applied.
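
To illustrate the cutoff, here is a minimal Python sketch, not the code from 
the patch (which is Java). The Mutation type, the replay function and the 
sample values are all hypothetical, and it assumes a mutation stamped exactly 
at the cutoff is still applied. It also simply lets the last write in log 
order win, which is a simplification.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Mutation(NamedTuple):
    """Hypothetical stand-in for one commit log entry."""
    timestamp: int  # write timestamp, in whatever units the client used
    key: str
    value: str


def replay(mutations: Iterable[Mutation],
           max_timestamp: Optional[int] = None) -> Dict[str, str]:
    """Apply mutations in log order, skipping any stamped after the cutoff."""
    state: Dict[str, str] = {}
    for m in mutations:
        if max_timestamp is not None and m.timestamp > max_timestamp:
            continue  # written after the cutoff: do not apply
        state[m.key] = m.value
    return state


# Sample log: the third entry plays the role of a destructive, buggy write.
log = [
    Mutation(100, "user:1", "alice"),
    Mutation(200, "user:2", "bob"),
    Mutation(300, "user:1", "DELETED-BY-BUG"),
]
restored = replay(log, max_timestamp=250)
```

With the cutoff placed before the bad write, the restored state keeps the 
earlier value; without a cutoff the bad write would be replayed as well.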

> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt, 1602-v2.txt, 
> 1602-v3.txt
>
>
> A couple of people on the mailing list suggested that I share this patch (see 
> the discussion at http://comments.gmane.org/gmane.comp.db.cassandra.user/9423 ). 
> It retains (archives) the commit logs generated by Cassandra and restores data 
> by rolling the commit logs forward onto previously backed up or snapshotted 
> data files.
> Here are the usage instructions, which I extracted from our internal wiki:
> We rely on the Cassandra replication factor for disaster recovery.
> But there is another problem: bugs in data manipulation logic can destroy 
> data across the whole cluster, and the freshest backup to restore from is the 
> last snapshot, which can be up to 24h old.
> To counter this, we decided to implement a snapshot + log archive backup 
> strategy, i.e. we collect both commit logs and snapshotted data files. In the 
> event of data loss, whether due to hardware failure or a logical bug, we 
> restore the last snapshot and roll forward onto it all the logs collected 
> since the snapshot was taken.
> Cassandra does not support log archiving out of the box, so I implemented it 
> myself.
> The idea is simple:
> # As soon as a commit log file is no longer needed by Cassandra, a hardlink 
> (unix command "ln $1 $2") is created from the just-closed commit log file 
> into the commit log archive directory. Both the commit log and the commit log 
> archive are on the same volume, of course.
> # Some script (whose authoring I left to the admin) then takes files from the 
> commit log archive dir and copies them over the network to a backup location. 
> As soon as a file has been transferred, it can be safely deleted from the 
> commit log archive dir.
> # Don't forget there must also be a script (also written by the admins) that 
> takes data snapshots from time to time using the "nodetool snapshot" command, 
> available in the standard Cassandra distribution, and copies the snapshot 
> files to the backup location.
> ## Creating a snapshot is a very light operation for Cassandra: under the hood 
> it just hardlinks the currently existing files into the 
> "snapshot/<timestamp-millis>" directory. So the frequency is limited only by 
> our ability to pull snapshotted data files over the network.
> # To restore data, the admin must:
> ## stop the Cassandra instance
> ## remove all corrupted data files from the /data directory. Leave only the 
> commit logs you want to roll forward in the /commitlog dir
> ## copy the last snapshot's data files to /data
> ## copy all archived commit log files to /commitlog (only files not older 
> than the snapshot data files need to be copied; copying older ones does no 
> harm but extends processing time)
> ## *without starting* the Cassandra instance, run the roll forward utility 
> (bin/logreplay) with option -forced or -forcedcompaction and wait for it to 
> complete
> ## then start the Cassandra node instance as usual.
> Log archive logic is activated using CommitLogArchive directive in 
> storage-conf.xml:
> {code:xml}
> <CommitLogDirectory>/commitlog</CommitLogDirectory>
> <CommitLogArchive>true</CommitLogArchive>
> <DataFileDirectories>
> <DataFileDirectory>/data</DataFileDirectory>
> </DataFileDirectories>
> {code}
> Log files will be archived to the <commit-dir>/.archive directory, i.e. in the 
> example above, to the /commitlog/.archive directory.
> The archived log replay process is launched by running 
> org.apache.cassandra.tools.ReplayLogs with option "-forced" locally on the node 
> (use option "-forcedcompact" to run a major compaction right after the log 
> roll forward completes). I also made a script named 
> <cassandra-dir>/bin/logreplay.
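
The archive and shipping steps described in the issue could be sketched in 
Python roughly as follows. This is an illustration under stated assumptions, 
not the patch itself: archive_segment, ship_archived, the backup directory and 
the CommitLog-1.log file name are all hypothetical, and a local copy stands in 
for the network transfer. Only the .archive subdirectory name and the use of a 
hardlink come from the description.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import List


def archive_segment(segment: Path, archive_dir: Path) -> Path:
    """Hardlink a closed commit log segment into the archive directory.

    Like "ln $1 $2", this needs both paths to be on the same volume.
    """
    archive_dir.mkdir(exist_ok=True)
    target = archive_dir / segment.name
    os.link(segment, target)
    return target


def ship_archived(archive_dir: Path, backup_dir: Path) -> List[str]:
    """Copy archived segments to the backup location, then delete them."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    shipped = []
    for f in sorted(archive_dir.iterdir()):
        shutil.copy2(f, backup_dir / f.name)
        f.unlink()  # safe to delete only once the copy has succeeded
        shipped.append(f.name)
    return shipped


# Demo in a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    commitlog = Path(root) / "commitlog"
    commitlog.mkdir()
    segment = commitlog / "CommitLog-1.log"  # hypothetical file name
    segment.write_bytes(b"mutations")

    archive_segment(segment, commitlog / ".archive")
    segment.unlink()  # the original is removed; the hardlink keeps the data

    shipped = ship_archived(commitlog / ".archive", Path(root) / "backup")
    backup_copy = (Path(root) / "backup" / "CommitLog-1.log").read_bytes()
```

Because the archive entry is a hardlink, removing the original segment does 
not lose the data, and the archive copy is deleted only after it has been 
copied to the backup location.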

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
