Re: best practices on EC2 question
> b) do people skip backups altogether except for huge outages and just let rebooted server instances come up empty to repopulate via C*?

This one. Bootstrapping a new node into the cluster has a small impact on the existing nodes, and the new nodes have all the data they need when they finish the process.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/05/2013, at 3:17 AM, Janne Jalkanen <janne.jalka...@ecyrd.com> wrote:

> On May 16, 2013, at 17:05, Brian Tarbox <tar...@cabotresearch.com> wrote:
>
>> An alternative that we had explored for a while was to do a two-stage backup: 1) copy a C* snapshot from the ephemeral drive to an EBS drive, 2) do an EBS snapshot to S3. The idea being that EBS is quite reliable, S3 is still the emergency backup, and copying back from EBS to ephemeral is likely much faster than the 15 MB/sec we get from S3.
>
> Yup, this is what we do. We use rsync with --bwlimit=4000 to copy the snapshots from the eph drive to EBS; this is intentionally very low so that the backup process does not eat our I/O. This is on m1.xlarge instances; YMMV so measure :)
>
> EBS drives are then snapshotted with ec2-consistent-snapshot, and old snapshots expired using ec2-expire-snapshots (I believe these scripts are from Alestic).
>
> /Janne
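A minimal sketch of the "come up empty and repopulate" path Aaron describes, assuming a Cassandra 1.2-era install; the IP address and data paths are hypothetical placeholders, and the exact replace flag depends on your version:

    # On the replacement instance, start from empty data directories
    # (paths below are typical package defaults; adjust for your layout):
    sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*

    # Start Cassandra taking over the dead node's ring position.
    # -Dcassandra.replace_address is the 1.2-era option (earlier
    # releases used -Dcassandra.replace_token); 10.0.0.12 is a
    # placeholder for the dead node's IP. Check your version's docs.
    sudo -u cassandra cassandra -Dcassandra.replace_address=10.0.0.12

    # Watch the bootstrap streams until they finish:
    nodetool netstats
    nodetool status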
Re: best practices on EC2 question
On Fri, May 17, 2013 at 11:13 AM, aaron morton <aa...@thelastpickle.com> wrote:

> Bootstrapping a new node into the cluster has a small impact on the existing nodes, and the new nodes have all the data they need when they finish the process.

Sorry for the pedantry, but bootstrapping from existing replicas cannot guarantee that the new nodes have all the data they need when they finish the process. There is a non-zero chance that the failed node contained the single under-replicated copy of a given datum.

In practice, if your RF is >= 2, you are unlikely to experience this type of data loss. But restore-a-backup-then-repair protects you against this unlikely case.

=Rob
Re: best practices on EC2 question
I was considering that when bootstrapping starts, the nodes receive writes, so that when the process is complete they have both the data from the streaming process and all writes from the time they started. So a repair is not needed. Compare that to restoring a node from a backup, where a (non -pr) repair is needed on the node to achieve consistency. In that sense the node has all its data when the bootstrap has finished.

If there is data that is replicated to a single node, there is always a risk of data loss. The data could have been written in the time between the last backup and the node failing.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/05/2013, at 6:32 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, May 17, 2013 at 11:13 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Bootstrapping a new node into the cluster has a small impact on the existing nodes, and the new nodes have all the data they need when they finish the process.
>
> Sorry for the pedantry, but bootstrapping from existing replicas cannot guarantee that the new nodes have all the data they need when they finish the process. There is a non-zero chance that the failed node contained the single under-replicated copy of a given datum.
>
> In practice, if your RF is >= 2, you are unlikely to experience this type of data loss. But restore-a-backup-then-repair protects you against this unlikely case.
>
> =Rob
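For contrast, a sketch of the restore-a-backup-then-repair path Rob and Aaron are comparing against; the backup location is a hypothetical placeholder, and the repair deliberately omits -pr so every range the node replicates gets repaired:

    # Copy the most recent backup into place before starting the node
    # (/mnt/ebs/backups/latest is a placeholder path):
    rsync -a /mnt/ebs/backups/latest/ /var/lib/cassandra/data/
    sudo service cassandra start

    # Full (non -pr) repair on this node, per Aaron's note, so it
    # converges on writes made between the backup and the node failure:
    nodetool repair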
best practices on EC2 question
From this list and the NYC* conference, it seems the consensus configuration for C* on EC2 is to put the data on an ephemeral drive and then periodically back the drive up to S3, relying on C*'s inherent fault tolerance to deal with any data loss.

Fine, and we're doing this... but we find that transfer rates from S3 back to a rebooted server instance are *very* slow: about 15 MB/second, or roughly a minute per gigabyte. Calling EC2 support resulted in them saying "sorry, that's how it is."

I'm wondering if anyone
a) has found a faster way to transfer to S3, or
b) do people skip backups altogether except for huge outages and just let rebooted server instances come up empty to repopulate via C*?

An alternative that we had explored for a while was to do a two-stage backup:
1) copy a C* snapshot from the ephemeral drive to an EBS drive
2) do an EBS snapshot to S3.

The idea being that EBS is quite reliable, S3 is still the emergency backup, and copying back from EBS to ephemeral is likely much faster than the 15 MB/sec we get from S3.

Thoughts?

Brian
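For reference, a sketch of the two-stage backup described above; the keyspace name, paths, and volume id are hypothetical placeholders:

    # Stage 1: take a C* snapshot (hard links, so cheap and fast) and
    # copy it from the ephemeral drive to an attached EBS volume:
    nodetool snapshot -t nightly mykeyspace
    rsync -a /mnt/ephemeral/cassandra/data/mykeyspace/*/snapshots/nightly/ \
        /mnt/ebs/backups/nightly/

    # Stage 2: snapshot the EBS volume (EBS snapshots are stored in S3):
    ec2-create-snapshot vol-0123abcd -d "cassandra nightly"

    # Clear the on-disk snapshot once it has been copied:
    nodetool clearsnapshot -t nightly mykeyspace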
Re: best practices on EC2 question
On May 16, 2013, at 17:05, Brian Tarbox <tar...@cabotresearch.com> wrote:

> An alternative that we had explored for a while was to do a two-stage backup: 1) copy a C* snapshot from the ephemeral drive to an EBS drive, 2) do an EBS snapshot to S3. The idea being that EBS is quite reliable, S3 is still the emergency backup, and copying back from EBS to ephemeral is likely much faster than the 15 MB/sec we get from S3.

Yup, this is what we do. We use rsync with --bwlimit=4000 to copy the snapshots from the eph drive to EBS; this is intentionally very low so that the backup process does not eat our I/O. This is on m1.xlarge instances; YMMV so measure :)

EBS drives are then snapshotted with ec2-consistent-snapshot, and old snapshots expired using ec2-expire-snapshots (I believe these scripts are from Alestic).

/Janne
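Spelled out, that pipeline looks roughly like this; the paths and volume id are hypothetical placeholders, and the exact retention options for the Alestic scripts are in their docs:

    # Throttled copy from the ephemeral drive to EBS; rsync's
    # --bwlimit is in KB/s, so 4000 is roughly 4 MB/s:
    rsync -a --bwlimit=4000 /mnt/ephemeral/cassandra/data/ /mnt/ebs/backups/

    # Consistent point-in-time snapshot of the EBS volume:
    ec2-consistent-snapshot vol-0123abcd

    # Then prune old snapshots on a schedule with ec2-expire-snapshots
    # (retention flags vary by version; see the Alestic docs).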