Re: how to take consistent snapshot?
Snapshots trigger a flush first, so data that's currently in the commit log will be covered by the snapshot.

On Thu, Dec 6, 2012 at 11:52 PM, Andrey Ilinykh ailin...@gmail.com wrote:
> I think if I store both flushed sstables and commit logs it should solve my problem. I'm wondering if someone has any experience with this feature?

--
Tyler Hobbs
DataStax
http://datastax.com/
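Since `nodetool snapshot` flushes first, a cluster-wide backup is usually taken by invoking it on every node with one shared tag, so the per-node snapshot directories can be matched up later. A minimal sketch, assuming hypothetical host names `node1`..`node3`; the function only prints the commands rather than executing them, so it is safe to run without a live cluster (drop the `echo` to run for real):

```shell
# Sketch: take a snapshot on every node under one shared tag.
# node1..node3 and the tag are assumptions; swap in your own.
HOSTS="node1 node2 node3"
TAG="backup-20121207"   # one tag for the whole cluster run

# Build the per-node commands; echo instead of exec so this sketch
# stays inert without a real cluster.
snapshot_all() {
    for h in $HOSTS; do
        echo "nodetool -h $h snapshot -t $TAG"
    done
}

OUTPUT=$(snapshot_all)
echo "$OUTPUT"
```

The shared tag matters at restore time: you copy each node's `snapshots/<tag>` directory back into place, so all nodes must agree on which tag belongs to which backup run. The per-CF flushes still happen at slightly different instants on each node, so this is only crash-consistent per node, not a point-in-time cut across CFs.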
Re: how to take consistent snapshot?
That's right. But when I have incremental backup on, each CF gets flushed independently. I have a hot CF which gets flushed every few minutes and a regular CF which gets flushed every hour or so. They have references to each other, and the data in the sstables is definitely inconsistent.

On Fri, Dec 7, 2012 at 9:28 AM, Tyler Hobbs ty...@datastax.com wrote:
> Snapshots trigger a flush first, so data that's currently in the commit log will be covered by the snapshot.
Re: how to take consistent snapshot?
Right. I don't personally think incremental backup is useful beyond restoring individual nodes unless none of your data happens to reference any other rows.

On Fri, Dec 7, 2012 at 11:37 AM, Andrey Ilinykh ailin...@gmail.com wrote:
> That's right. But when I have incremental backup on, each CF gets flushed independently.

--
Tyler Hobbs
DataStax
http://datastax.com/
Re: how to take consistent snapshot?
Agreed.

On Fri, Dec 7, 2012 at 12:38 PM, Tyler Hobbs ty...@datastax.com wrote:
> Right. I don't personally think incremental backup is useful beyond restoring individual nodes unless none of your data happens to reference any other rows.
Re: how to take consistent snapshot?
For background: http://wiki.apache.org/cassandra/Operations?highlight=%28snapshot%29#Consistent_backups

If you do it for a single node then yes, there is a chance of inconsistency across CFs. If you have multiple nodes, the snapshots you take on the later nodes will help. If you use CL QUORUM for reads you *may* be OK (I cannot work it out quickly). If you use CL ALL for reads you will be OK. Or you can use nodetool repair to ensure the data is consistent. I doubt that even using repair would give you a provable guarantee, though. Anyone?

Cheers

-----
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 7:56 AM, Andrey Ilinykh ailin...@gmail.com wrote:
> To be more specific, let's say I have two CFs, one serving as an index for the other. There is a good chance that all replicas flush the index CF first.
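For the repair-before-snapshot approach mentioned above, the usual pattern is a rolling repair (one node at a time, so the cluster isn't saturated) followed by the snapshots. A sketch under the same assumptions as before (hypothetical host names, made-up keyspace name `ks1`, commands printed rather than executed):

```shell
# Sketch: rolling repair of keyspace ks1 before taking a backup.
# Host names and ks1 are hypothetical; echo keeps the sketch inert.
REPAIR_CMDS=$(for h in node1 node2 node3; do
    echo "nodetool -h $h repair ks1"
done)
echo "$REPAIR_CMDS"
```

Repair only converges the replicas on data they have already received; as the rest of the thread points out, it does not help with writes that were still in the commit log when the sstables were copied.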
Re: how to take consistent snapshot?
On Thu, Dec 6, 2012 at 7:34 PM, aaron morton aa...@thelastpickle.com wrote:
> If you use CL QUORUM for reads you *may* be OK. If you use CL ALL for reads you will be OK. Or you can use nodetool repair to ensure the data is consistent.

I'm talking about restoring the whole cluster, so all nodes are restored from backup and all of them are inconsistent because they lost data from commit logs. It doesn't matter what CL I use; some data may be lost. Cassandra 1.1 supports commit log archiving: http://www.datastax.com/docs/1.1/configuration/commitlog_archiving I think if I store both flushed sstables and commit logs it should solve my problem. I'm wondering if someone has any experience with this feature?

Thank you,
Andrey
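For the archiving route, Cassandra 1.1 reads `conf/commitlog_archiving.properties`. A sketch of what such a file might contain, assuming a hypothetical `/backup/commitlog` directory; `%path`/`%name` (archive side) and `%from`/`%to` (restore side) are the substitution tokens described in the linked docs, but verify the exact property and token names against that page before relying on this:

```shell
# Sketch: write a commitlog_archiving.properties that copies each
# closed commit log segment to a backup directory.
# /backup/commitlog is a hypothetical path; token names per the
# Cassandra 1.1 docs (double-check before use).
cat > commitlog_archiving.properties <<'EOF'
archive_command=/bin/cp %path /backup/commitlog/%name
restore_command=/bin/cp -f %from %to
restore_directories=/backup/commitlog
restore_point_in_time=
EOF
```

At restore time the idea would be: put the sstable snapshots back, point `restore_directories` at the archived segments (optionally setting `restore_point_in_time`), and restart the nodes so they replay the archived commit logs on startup.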
how to take consistent snapshot?
Hello, everybody!

I have a production cluster with incremental backup on, and I want to clone it (create a test one). I don't understand one thing: each column family gets flushed (and copied to backup storage) independently, which means the total snapshot is inconsistent. If I restore from such a snapshot I have a totally useless system.

To be more specific, let's say I have two CFs, one serving as an index for the other. Every time I update one CF, I update the index CF. There is a good chance that all replicas flush the index CF first. Then I move it into backup storage, restore, and get a CF which has pointers to nonexistent data in the other CF. What is the way to avoid this situation?

Thank you,
Andrey