Following up on my own post, I had a little free time the other day and decided
to investigate whether this was feasible. Setting up the necessary services on
Amazon was trivial, including access control and block storage. I tried s3fs
first, and it worked, but it felt like there was way too much I/O going on for
that kind of data (which is pretty much what I expected). Then I tried putting
my bacula-sd on an EC2 node, writing to files on EBS, and it worked great
(spooling first to the "local" drive on EC2). Throughput, however, was somewhat
less than I was hoping for: approx. 25% of what I get locally when spooling and
then writing to tape. However, I found that there was NO performance penalty for running
two jobs concurrently. I didn't try larger numbers, but my guess is you can
run a large number of concurrent jobs to get a pretty good effective
throughput, assuming you have lots of clients with similar data sizes.
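To put rough numbers on that guess, here is a minimal sketch. It assumes aggregate throughput grows linearly with the number of jobs until some ceiling (the link, the SD, or EBS) is reached. I only verified the two-job case, so the linear model is an assumption, and the function name and all figures passed to it are placeholders rather than measurements.

```python
def effective_throughput(per_job_mb_s, jobs, ceiling_mb_s):
    """Aggregate MB/s, assuming linear scaling with job count up to a ceiling.

    per_job_mb_s: throughput of a single job (placeholder value, measure your own)
    jobs:         number of concurrent jobs
    ceiling_mb_s: whatever saturates first (network link, SD, block storage)
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return min(per_job_mb_s * jobs, ceiling_mb_s)


# Hypothetical figures, for illustration only.
for n in (1, 2, 10, 50):
    print(n, "jobs:", effective_throughput(3.0, n, 100.0), "MB/s")
```

If real measurements flatten out earlier than the model predicts, the ceiling is lower than assumed and adding jobs past that point buys nothing.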
Our problem is that 80% of our data is on one client, and it would take 130
hours to do a full backup, and our backup window simply isn't that long. Then
I thought I could break the FileSets into smaller pieces and run multiple
backup jobs in parallel (and I'm assuming that my client is not the
bottleneck). However, Bacula wouldn't run more than one job on that client
concurrently. Since I can run multiple clients concurrently, I'm pretty sure
my bacula-dir.conf and bacula-sd.conf settings are correct, and my
bacula-fd.conf specifies "Maximum Concurrent Jobs = 20". Is there any other
reason why I couldn't run, say, 5 parallel jobs with different FileSets off the
same client?
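For the splitting idea, here is a rough sketch of how the pieces could be balanced and what the backup window might look like afterwards. It is not anything Bacula provides: the function names, directory paths, and sizes are all hypothetical, and the time estimate assumes the parallel jobs do not slow each other down and that backup time is proportional to data size. Only the 130-hour figure comes from this post.

```python
import heapq


def split_into_jobs(dir_sizes, n_jobs):
    """Greedy split: take the biggest directory first and always put it
    into the currently lightest bucket. Returns a list of path lists."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    # Heap entries are (total_size, bucket_index); the index breaks ties.
    heap = [(0, i) for i in range(n_jobs)]
    buckets = [[] for _ in range(n_jobs)]
    by_size = sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in by_size:
        total, idx = heapq.heappop(heap)
        buckets[idx].append(path)
        heapq.heappush(heap, (total + size, idx))
    return buckets


def estimated_hours(full_hours, dir_sizes, buckets):
    """Wall-clock estimate: the largest bucket's share of the serial time."""
    total = sum(dir_sizes.values())
    largest = max(sum(dir_sizes[p] for p in bucket) for bucket in buckets)
    return full_hours * largest / total


# Hypothetical directory sizes in GB, for illustration only.
sizes = {"/data/a": 500, "/data/b": 300, "/data/c": 200,
         "/data/d": 200, "/data/e": 80}
jobs = split_into_jobs(sizes, 5)
print(jobs)
print(round(estimated_hours(130, sizes, jobs), 1), "hours (estimate)")
```

The estimate is bounded by the largest single piece, so one oversized directory caps the gain no matter how many jobs run. Each resulting path list would become the File entries of its own FileSet.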
From: Peter Zenge [mailto:pze...@ilinc.com]
Sent: Tuesday, March 02, 2010 2:57 PM
To: bacula-users@lists.sourceforge.net
Subject: [Bacula-users] Bacula to the Cloud
Hello, 2-year Bacula user but first-time poster. I'm currently dumping about
1.6TB to LTO2 tapes every week and I'm looking to migrate to a new storage
medium.
The obvious answer, I think, is a direct-attached disk array (which I would be
able to put in a remote gigabit-attached datacenter before too long). However,
I'm wondering if anyone is currently doing large (or what seem to me to be
large) backups to the cloud in some way? Assuming I have a gigabit connection
to the Internet from my datacenter, I'm wondering how feasible it would be to
either use something like Amazon S3 with s3fs (I'm guessing way too much
overhead to be efficient), or a bacula-SD implementation on an EC2 node, using
Elastic Block Store (EBS) as "local" disk, and VPN (Amazon VPC) between my
datacenter and the SD.
Substitute your favorite cloud provider for Amazon above; I don't use any right
now, so I'm not tied to any particular provider. It just seems like Amazon has all
the necessary pieces today.
To do this, and keep customers comfortable with the idea of data in the cloud,
we would need to encrypt, so I'm also wondering if it would be possible for the
SD to encrypt the backup volume, rather than having the FD encrypt the data
before sending it to the SD (which is what we do now)? It would be easier to
manage if we just handled encryption in one place for all clients.
I would love to hear what other people are either doing with Bacula and the
cloud, or why you have decided not to.
Thanks
Peter Zenge
Pzenge .at. ilinc .dot. com