[ 
https://issues.apache.org/jira/browse/CASSANDRA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920451#action_12920451
 ] 

Oleg Anastasyev commented on CASSANDRA-1602:
--------------------------------------------

Creating the hard link this early is a problem: some people back up commit logs to a 
safe location. If the hard link is created before the commit log segment is actually 
closed, it is hard to tell whether the file is ready to be copied. That is why, in 
my code, the hard link is created only after the commit log segment is closed and will 
never be written to again.
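
The ordering can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the patch; the function name archive_closed_segment and the example paths are assumptions made for the sketch.

```python
import os


def archive_closed_segment(segment_path, archive_dir):
    """Hard-link a commit log segment into archive_dir.

    Assumption: the caller invokes this only after the segment has been
    closed and will never be written to again, so any file that shows up
    in archive_dir is complete and safe to copy.
    """
    os.makedirs(archive_dir, exist_ok=True)
    link_path = os.path.join(archive_dir, os.path.basename(segment_path))
    # os.link creates a hard link (like "ln src dst"); both paths must be
    # on the same volume.
    os.link(segment_path, link_path)
    return link_path
```

With this ordering a backup script does not need to guess: anything it finds in the archive directory is a finished segment.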

> Commit Log archivation and rolling forward utility (AKA Retaining commit logs)
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1602
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1602
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>             Fix For: 0.6.7
>
>         Attachments: 1602-0.6.4.txt, 1602-cassandra0.6.txt, 1602-v2.txt
>
>
> A couple of people on the mailing list suggested that I share this patch (see 
> discussion at http://comments.gmane.org/gmane.comp.db.cassandra.user/9423 ). 
> It retains (archives) the commit logs generated by cassandra and restores data by 
> rolling those commit logs forward onto previously backed up or snapshotted data 
> files.
> Here are instructions for using it, extracted from our internal 
> wiki:
> We rely on the cassandra replication factor for disaster recovery.
> But there is another problem: bugs in data manipulation logic 
> can destroy data on the whole cluster, and the freshest 
> backup to restore from is the last snapshot, which can be up to 24h old.
> To fight this, we decided to implement a snapshot + log archive backup 
> strategy, i.e. we collect commit logs and snapshotted data files. In the event 
> of data loss, whether due to hardware failure or a logical bug, we restore the last 
> snapshot and roll forward onto it all logs collected since that snapshot was taken.
> Cassandra does not support log archiving out of the box, so I implemented it 
> myself.
> The idea is simple:
> # As soon as a commit log file is no longer needed by cassandra, a hardlink 
> (unix command "ln $1 $2") is created from the just-closed commit log file into the 
> commit log archive directory. Both the commit log and the commit log archive are on 
> the same volume, of course.
> # Some script (writing it is left to the admin) then takes files from the commit 
> log archive dir and copies them over the network to a backup location. As soon as a 
> file is transferred, it can be safely deleted from the commit log archive dir.
> # Don't forget that there must also be a script (again written by admins) that takes 
> data snapshots from time to time using the "nodetool snapshot" command, available 
> in the standard cassandra distribution, and copies the snapshot files to the backup 
> location.
> ## Creating a snapshot is a very light operation for cassandra: under the hood 
> it just hardlinks the currently existing files into the 
> "snapshot/<timestamp-millis>" directory. So the frequency is limited only by our ability to 
> pull snapshotted data files over the network.
> # To restore data, the admin must:
> ## stop the cassandra instance
> ## remove all corrupted data files from the /data directory. Leave only the commit 
> logs you want to roll forward in the /commitlog dir
> ## copy the last snapshot's data files to /data
> ## copy all archived commit log files to /commitlog (only files not older 
> than the snapshot data files need to be copied; copying older ones does no harm, 
> but extends processing time)
> ## *without starting* the cassandra instance, run the roll forward utility 
> (bin/logreplay) with option -forced or -forcedcompaction and wait for its 
> completion
> ## then start the cassandra node instance as usual.
> The log archive logic is activated using the CommitLogArchive directive in 
> storage-conf.xml:
> {code:xml}
> <CommitLogDirectory>/commitlog</CommitLogDirectory>
> <CommitLogArchive>true</CommitLogArchive>
> <DataFileDirectories>
> <DataFileDirectory>/data</DataFileDirectory>
> </DataFileDirectories>
> {code}
> Log files will be archived to the <commit-dir>/.archive directory, i.e. in the 
> example above, to the /commitlog/.archive directory.
> The archived log replay process is launched by running 
> org.apache.cassandra.tools.ReplayLogs with option "-forced" locally on the node 
> (use option "-forcedcompact" to do a major compaction right after the log roll 
> forward process completes). I also made a script named 
> <cassandra-dir>/bin/logreplay.
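
The shipping script that the description leaves to the admin could look something like the sketch below. It is only an illustration under stated assumptions: the function name ship_archived_logs is hypothetical, and backup_dir stands in for whatever backup location (for example a network mount) is actually used.

```python
import os
import shutil


def ship_archived_logs(archive_dir, backup_dir):
    """Copy every file in archive_dir to backup_dir, then delete the original.

    Hypothetical helper: it assumes that files appear in archive_dir only
    after their commit log segment has been closed.
    """
    os.makedirs(backup_dir, exist_ok=True)
    shipped = []
    for name in sorted(os.listdir(archive_dir)):
        src = os.path.join(archive_dir, name)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(backup_dir, name)
        tmp = dst + ".part"
        # Copy under a temporary name and rename when done, so a partially
        # transferred file is never mistaken for a complete one.
        shutil.copy2(src, tmp)
        os.replace(tmp, dst)
        # Removing the archive entry does not affect the backup copy.
        os.remove(src)
        shipped.append(name)
    return shipped
```

For the configuration shown above it might be called as ship_archived_logs("/commitlog/.archive", "/mnt/backup/commitlog"), where the second path is purely an example.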

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
