Re: best practices on EC2 question

2013-05-17 Thread aaron morton
  b) do people skip backups altogether except for huge outages and just let 
 rebooted server instances come up empty to repopulate via C*?
This one. 
Bootstrapping a new node into the cluster has a small impact on the existing 
nodes, and the new nodes have all the data they need when they finish the 
process.
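A minimal sketch of the repopulate-via-bootstrap approach Aaron describes, assuming a Cassandra 1.2-era replace flag (the dead node's IP is a hypothetical example, and the `run` wrapper only echoes each command, so this reads as a dry run):

```shell
#!/bin/sh
# Dry-run sketch: replace a dead node by letting Cassandra stream its data
# from the surviving replicas instead of restoring from S3.
run() { echo "+ $*"; }   # echo only; change the body to "$@" to execute

DEAD_NODE_IP=10.0.0.12   # hypothetical IP of the failed instance

# 1. Bring up a fresh instance with an empty data directory and the same
#    cluster_name/seeds in cassandra.yaml, telling it to take over the
#    dead node's token range:
run cassandra -Dcassandra.replace_address=$DEAD_NODE_IP

# 2. Watch streaming progress until the node goes from "Joining" to "Normal":
run nodetool netstats
run nodetool status
```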

Cheers
  
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/05/2013, at 3:17 AM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:

 On May 16, 2013, at 17:05 , Brian Tarbox tar...@cabotresearch.com wrote:
 
 An alternative that we had explored for a while was to do a two stage backup:
 1) copy a C* snapshot from the ephemeral drive to an EBS drive
 2) do an EBS snapshot to S3.
 
 The idea being that EBS is quite reliable, S3 is still the emergency backup 
 and copying back from EBS to ephemeral is likely much faster than the 15 
 MB/sec we get from S3.
 
 Yup, this is what we do.  We use rsync with --bwlimit=4000 to copy the 
 snapshots from the eph drive to EBS; this is intentionally very low so that 
 the backup process does not eat our I/O.  This is on m1.xlarge 
 instances; YMMV so measure :).  EBS drives are then snapshot with 
 ec2-consistent-snapshot and then old snapshots expired using 
 ec2-expire-snapshots (I believe these scripts are from Alestic).
 
 /Janne
 



Re: best practices on EC2 question

2013-05-17 Thread Robert Coli
On Fri, May 17, 2013 at 11:13 AM, aaron morton aa...@thelastpickle.com wrote:
 Bootstrapping a new node into the cluster has a small impact on the existing
 nodes, and the new nodes have all the data they need when they finish the
 process.

Sorry for the pedantry, but bootstrapping from existing replicas
cannot guarantee that the new nodes have all the data they need when
they finish the process. There is a non-zero chance that the failed
node contained the single under-replicated copy of a given datum. In
practice if your RF is >= 2, you are unlikely to experience this type
of data loss. But restore-a-backup-then-repair protects you against
this unlikely case.
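Rob's restore-a-backup-then-repair alternative might look something like this sketch (the data directory, backup location, and keyspace name are all hypothetical examples, and the `run` wrapper echoes commands rather than executing them):

```shell
#!/bin/sh
# Dry-run sketch of restore-a-backup-then-repair. All paths/names hypothetical.
run() { echo "+ $*"; }   # echo only; change the body to "$@" to execute

DATA_DIR=/var/lib/cassandra/data      # hypothetical data directory
BACKUP=s3://my-backups/node1/latest   # hypothetical backup location

# 1. With Cassandra stopped, restore the last snapshot's SSTables:
run s3cmd sync "$BACKUP" "$DATA_DIR"

# 2. Start the node; it now has data up to the snapshot time:
run service cassandra start

# 3. Run a full (non -pr) repair so the node pulls in writes made after
#    the snapshot -- this closes the window Rob describes:
run nodetool repair my_keyspace
```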

=Rob


Re: best practices on EC2 question

2013-05-17 Thread aaron morton
I was considering that when bootstrapping starts the new nodes receive writes, so 
that when the process is complete they have both the data from the streaming 
process and all writes from the time they started; a repair is therefore not 
needed. Compare that to bootstrapping a node from a backup, where a (non -pr) 
repair is needed on the node to achieve consistency. In that sense the node has 
all its data when the bootstrap has finished.

If there is data that is replicated to a single node there is always a risk of 
data loss. The data could have been written in the time between the last backup 
and the node failing. 
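The -pr distinction Aaron mentions, as a sketch (keyspace name is a hypothetical example; commands are echoed, not executed):

```shell
#!/bin/sh
# Dry-run sketch of the two repair variants discussed above.
run() { echo "+ $*"; }   # echo only; change the body to "$@" to execute

# Repairs only the range this node is the primary replica for; fine for
# routine anti-entropy run node-by-node across a cluster, but not enough
# for a node restored from a backup:
run nodetool repair -pr my_keyspace

# Full repair: covers every range this node replicates, which is what a
# restored-from-backup node needs to reach consistency:
run nodetool repair my_keyspace
```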

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/05/2013, at 6:32 AM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, May 17, 2013 at 11:13 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 Bootstrapping a new node into the cluster has a small impact on the existing
 nodes, and the new nodes have all the data they need when they finish the
 process.
 
 Sorry for the pedantry, but bootstrapping from existing replicas
 cannot guarantee that the new nodes have all the data they need when
 they finish the process. There is a non-zero chance that the failed
 node contained the single under-replicated copy of a given datum. In
 practice if your RF is >= 2, you are unlikely to experience this type
 of data loss. But restore-a-backup-then-repair protects you against
 this unlikely case.
 
 =Rob



best practices on EC2 question

2013-05-16 Thread Brian Tarbox
From this list and the NYC* conference it seems that the consensus
configuration of C* on EC2 is to put the data on an ephemeral drive and
then periodically back that drive up to S3, relying on C*'s inherent fault
tolerance to deal with any data loss.

Fine, and we're doing this...but we find that transfer rates from S3 back
to a rebooted server instance are *very* slow: around 15 MB/second, or
roughly a minute per gigabyte. Calling EC2 support resulted in them
saying "sorry, that's how it is."
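The minute-per-gigabyte figure checks out as straightforward arithmetic (integer math, assuming 1 GB = 1024 MB):

```shell
#!/bin/sh
# At 15 MB/s, how long does restoring 1 GB from S3 take?
RATE_MB_S=15
GB_MB=1024
SECONDS_PER_GB=$((GB_MB / RATE_MB_S))   # integer division
echo "$SECONDS_PER_GB seconds per GB"   # ~68 seconds, i.e. roughly a minute
```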

I'm wondering if anyone a) has found a faster way to transfer to S3, or b)
do people skip backups altogether except for huge outages and just let
rebooted server instances come up empty to repopulate via C*?

An alternative that we had explored for a while was to do a two stage
backup:
1) copy a C* snapshot from the ephemeral drive to an EBS drive
2) do an EBS snapshot to S3.

The idea being that EBS is quite reliable, S3 is still the emergency backup
and copying back from EBS to ephemeral is likely much faster than the 15
MB/sec we get from S3.
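The two-stage scheme might be sketched as follows (mount points and the volume ID are hypothetical examples, and the `run` wrapper echoes commands rather than executing them):

```shell
#!/bin/sh
# Dry-run sketch of the two-stage backup: ephemeral -> EBS -> S3 snapshot.
run() { echo "+ $*"; }   # echo only; change the body to "$@" to execute

SNAP_NAME=backup_$(date +%Y%m%d)
EPH_SNAPSHOTS=/mnt/ephemeral/cassandra/data   # hypothetical paths
EBS_MOUNT=/mnt/ebs-backup
EBS_VOLUME=vol-12345678                       # hypothetical volume ID

# Stage 0: take a hard-link snapshot of the live SSTables:
run nodetool snapshot -t "$SNAP_NAME"

# Stage 1: copy the snapshot onto the EBS volume:
run rsync -a "$EPH_SNAPSHOTS/" "$EBS_MOUNT/$SNAP_NAME/"

# Stage 2: snapshot the EBS volume (EBS snapshots are stored in S3):
run ec2-create-snapshot "$EBS_VOLUME" -d "$SNAP_NAME"
```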

Thoughts?

Brian


Re: best practices on EC2 question

2013-05-16 Thread Janne Jalkanen
On May 16, 2013, at 17:05 , Brian Tarbox tar...@cabotresearch.com wrote:

 An alternative that we had explored for a while was to do a two stage backup:
 1) copy a C* snapshot from the ephemeral drive to an EBS drive
 2) do an EBS snapshot to S3.
 
 The idea being that EBS is quite reliable, S3 is still the emergency backup 
 and copying back from EBS to ephemeral is likely much faster than the 15 
 MB/sec we get from S3.

Yup, this is what we do.  We use rsync with --bwlimit=4000 to copy the 
snapshots from the eph drive to EBS; this is intentionally very low so that the 
backup process does not eat our I/O. This is on m1.xlarge instances; YMMV 
so measure :).  EBS drives are then snapshot with ec2-consistent-snapshot and 
then old snapshots expired using ec2-expire-snapshots (I believe these scripts 
are from Alestic).

/Janne
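Janne's pipeline, as a sketch (paths and the volume ID are hypothetical examples; the Alestic scripts' exact flags are not shown in the thread, so retention options are omitted, and every command is echoed rather than executed):

```shell
#!/bin/sh
# Dry-run sketch of the throttled-copy-plus-EBS-snapshot pipeline.
run() { echo "+ $*"; }   # echo only; change the body to "$@" to execute

EPH_SNAPSHOTS=/mnt/ephemeral/cassandra/data   # hypothetical paths
EBS_MOUNT=/mnt/ebs-backup
EBS_VOLUME=vol-12345678                       # hypothetical volume ID

# Throttled copy: --bwlimit is in KB/s, so 4000 is ~4 MB/s -- deliberately
# low so the backup does not compete with Cassandra for disk I/O:
run rsync -a --bwlimit=4000 "$EPH_SNAPSHOTS/" "$EBS_MOUNT/"

# Consistent EBS snapshot via the Alestic script:
run ec2-consistent-snapshot "$EBS_VOLUME"

# Expire old snapshots via the companion Alestic script:
run ec2-expire-snapshots
```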