Following up on my own post, I had a little free time the other day and decided 
to investigate whether this was feasible.  Setting up the necessary services on 
Amazon was trivial, including access control and block storage.  I tried s3fs 
first, and it worked, but it felt like there was way too much I/O going on for
that kind of data (which is pretty much what I expected).  Then I tried putting 
my bacula-sd on an EC2 node, writing to files on EBS, and it worked great 
(spooling first to the "local" drive on EC2).  Throughput was somewhat less 
than I was hoping for, though: roughly 25% of what I get locally when spooling 
and then writing to tape.  However, I found that there was NO performance penalty for running
two jobs concurrently.  I didn't try larger numbers, but my guess is you can 
run a large number of concurrent jobs to get a pretty good effective 
throughput, assuming you have lots of clients with similar data sizes.
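
For reference, here is roughly what the SD side of that test looks like; the 
resource names and paths below are illustrative placeholders rather than my 
exact config, with the archive files on the EBS mount and the spool on the 
instance's local disk:

  # bacula-sd.conf on the EC2 node (illustrative)
  Device {
    Name = EBSFileStorage
    Media Type = File
    Archive Device = /mnt/ebs/bacula      # EBS volume mounted here (placeholder path)
    Spool Directory = /mnt/local/spool    # instance's "local" drive, used for spooling
    Maximum Spool Size = 50G
    LabelMedia = yes
    Random Access = yes
    AutomaticMount = yes
    RemovableMedia = no
    AlwaysOpen = no
  }

with "Spool Data = yes" set in the corresponding Job resources on the Director.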

Our problem is that 80% of our data is on one client; a full backup of that 
client would take about 130 hours, and our backup window simply isn't that long.  Then 
I thought I could break the FileSets into smaller pieces and run multiple 
backup jobs in parallel (and I'm assuming that my client is not the 
bottleneck).  However, Bacula wouldn't run more than one job against that client 
concurrently.  Since I can run multiple clients concurrently, I'm pretty sure 
my bacula-dir.conf and bacula-sd.conf settings are correct, and my 
bacula-fd.conf specifies "Maximum Concurrent Jobs = 20"... Is there any other 
reason why I couldn't run, say, 5 parallel jobs with different FileSets off the same client?
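
(For clarity, these are the knobs I'm referring to. As I understand it, 
Maximum Concurrent Jobs can be set in the Director, Job, Client, and Storage 
resources of bacula-dir.conf, as well as in bacula-sd.conf and bacula-fd.conf, 
and some of them default to 1. The names below are placeholders, not my actual 
config:)

  # bacula-dir.conf (illustrative)
  Client {
    Name = bigclient-fd
    Address = bigclient.example.com
    Catalog = MyCatalog
    Password = "xxx"
    Maximum Concurrent Jobs = 5    # defaults to 1 if omitted
  }

  Job {
    Name = "BigClient-Part1"
    Type = Backup
    Level = Full
    Client = bigclient-fd
    FileSet = "BigClient-Set1"
    Storage = ec2-file
    Pool = Default
    Messages = Standard
    Maximum Concurrent Jobs = 5    # also defaults to 1 if omitted
  }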

From: Peter Zenge [mailto:pze...@ilinc.com]
Sent: Tuesday, March 02, 2010 2:57 PM
To: bacula-users@lists.sourceforge.net
Subject: [Bacula-users] Bacula to the Cloud

Hello, two-year Bacula user but first-time poster.  I'm currently dumping about 
1.6TB to LTO2 tapes every week and I'm looking to migrate to a new storage 
medium.

The obvious answer, I think, is a direct-attached disk array (which I would be 
able to put in a remote gigabit-attached datacenter before too long).  However, 
I'm wondering if anyone is currently doing large (or what seem to me to be 
large) backups to the cloud in some way?  Assuming I have a gigabit connection 
to the Internet from my datacenter, I'm wondering how feasible it would be to 
use either something like Amazon S3 with s3fs (I'm guessing there's way too much 
overhead for that to be efficient), or a bacula-sd implementation on an EC2 node, using 
Elastic Block Store (EBS) as "local" disk, and VPN (Amazon VPC) between my 
datacenter and the SD.
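
(Concretely, I imagine the Director would just point at the EC2-hosted SD 
across the VPN with a Storage resource along these lines; the names and 
address are made up for illustration:)

  # bacula-dir.conf (hypothetical)
  Storage {
    Name = ec2-file
    Address = 10.0.0.10         # private address of the EC2 SD, reachable over the VPC VPN (made up)
    SDPort = 9103
    Password = "xxx"
    Device = EBSFileStorage     # must match the Device name in the SD's bacula-sd.conf
    Media Type = File
    Maximum Concurrent Jobs = 10
  }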

Substitute your favorite cloud provider for Amazon above; I don't use any right 
now, so I'm not tied to any particular provider.  It just seems like Amazon has all 
the necessary pieces today.

To do this, and keep customers comfortable with the idea of data in the cloud, 
we would need to encrypt, so I'm also wondering if it would be possible for the 
SD to encrypt the backup volume, rather than having the FD encrypt the data before 
sending it to the SD (which is what we do now)?  It would be easier to manage if we 
just handled encryption in one place for all clients.
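
(For context, the FD-side encryption we use today is the standard per-client 
PKI setup in bacula-fd.conf, roughly like the snippet below; the paths are 
placeholders and only the relevant directives are shown:)

  # bacula-fd.conf on each client (illustrative)
  FileDaemon {
    Name = client1-fd
    PKI Signatures = Yes
    PKI Encryption = Yes
    PKI Keypair = "/etc/bacula/client1.pem"      # this client's certificate and private key (placeholder)
    PKI Master Key = "/etc/bacula/master.cert"   # master public key kept for emergency restores
  }

What I'm after is moving that responsibility to the single SD, so the volumes 
themselves end up encrypted at rest without per-client key management.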

I would love to hear what other people are doing with Bacula and the cloud, or 
why you have decided not to.

Thanks

Peter Zenge
Pzenge .at. ilinc .dot. com

