On Fri, 5 Dec 2008 04:45:56 -0500
David Lee Lambert <[EMAIL PROTECTED]> wrote:

> I'm trying to use Bacula to do daily backups of data stored in iSCSI LUNs on 
> a NetApp filer, using NetApp snapshots to ensure consistency.  The hosts to be 
> backed up have dual Gigabit Ethernet connections to the NetApp.  The backup 
> host consists of:
> 
> - a desktop-class (32-bit, 2.4GHz) machine with a single local SATA drive
> - an Overland Storage autochanger with room for 12 LTO-4 tapes
> - a built-in Fast Ethernet adapter (3com 3c509) and an add-in Gigabit 
> Ethernet adapter (Linksys rev 10)
> - running Ubuntu G server and kernel 2.6.22; Bacula is storing its catalog in 
> a local Postgres database
> 
> One issue we've struggled with is speed.  With the Gigabit adapter, reading 
> files from a snapshot via iSCSI, we were consistently getting less than 
> 2 MByte/sec, sometimes as low as 300 kByte/sec.  Yesterday we switched to the 
> 100Mbit adapter, and were sometimes able to almost max it out during a full 
> backup (network usage of 10 to 11 MByte/sec on the Fast Ethernet adapter), but 
> it also slowed down sometimes: it took 25 minutes to back up a 22GB LUN with 
> 7GB of files, and it took 25 minutes to back up a 6GB LUN with 1.1GB of files 
> (yes, almost exactly the same amount of total time).

This sounds like a problem outside of Bacula.  It helps to test each
of the components separately:
1) How fast the source can read from disk (probably pretty fast since
it's a filer)
2) How fast the source can write to network (probably limited by GbE)
3) Whether the network path is free, or whether there is contention
4) How fast the backup machine can read from GbE (what bus is the GbE
adapter on?)  A 32-bit, 33 MHz PCI bus tops out at
32/8 * 33.33e6 / 1048576 = 127.2 MiB/s, shared by every device on the
bus.  That is roughly what Gigabit Ethernet alone can deliver
(125e6 / 1048576 = 119.2 MiB/s).
5) How fast can you write to the tape?  What bus is the HBA on for the
tape drive?  Is it the same PCI bus?  
6) How fast can you write to the DB?  Is it on the single SATA spindle
that is attached to a controller on that same PCI bus?
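
For the read-side tests (for example a file on the mounted snapshot, or
a local file on the SATA drive), something like the following could be
used in place of dd.  This is only a minimal sketch, assuming Python 3
is available on the machine under test; the function name and defaults
are my own and have nothing to do with Bacula.

```python
import sys
import time


def read_throughput_mb_s(path, block_size=1024 * 1024, max_bytes=None):
    """Read `path` sequentially and return the rate in MB/s (10**6 bytes/s).

    Note: a file that was read recently may be served from the page cache,
    which makes the result look much faster than the underlying device.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffering so block_size is what is requested
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:  # end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / max(elapsed, 1e-9)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"{read_throughput_mb_s(sys.argv[1]):.1f} MB/s")
```

Run it with a path as the only argument, and try a few block sizes to
see whether small reads are what is hurting the iSCSI case.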

> 
> I recently did dd to a raw tape and got a speed of at least 17 MByte/sec.  The 
> local drive seems to have a write speed of about 7 MByte/sec, so spooling to 
> local disk is not an option.  On our faster servers with dual server-class 
> Gigabit Ethernet adapters, I can get burst read speeds of 40 to 70 
> MByte/sec.

Right, you want to write from network directly to tape.

> We'd also like our tape-rotation policy, for at least some of our tapes, to 
> mirror as closely as possible what we do for our existing servers with local 
> tape drives:  daily tape rotation in a two-week cycle,  with tapes written at 
> night and taken off-site for one week starting the day after they're written. 
>  
> That gives us an 18-hour window in which to write the tapes, and we should be 
> able to fill an 800-GB tape in 17 hours 46 minutes ( 800e9 / 1.25e7 / 3600 = 
> 17.78 ) at Fast Ethernet speed.  We probably have less data than that to back 
> up;  in fact, if we keep our other current tape drives and don't back 
> up /usr/portage or similar directories anywhere, we probably have less than 
> 400GB.  Therefore,  I think we should do a full backup each day; perhaps even 
> a full backup of the first snapshot and incremental backups for later 
> snapshots that same day.  Is that reasonable?  
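
As a rough check of the fill-time arithmetic above, here is a small
Python sketch.  It assumes decimal units (800 GB = 800e9 bytes) and
compares the 12.5 MByte/sec wire rate of Fast Ethernet with the 10 to
11 MByte/sec actually observed; the function name is only illustrative.

```python
def hours_to_write(total_bytes, bytes_per_second):
    """Hours needed to move total_bytes at a sustained rate."""
    return total_bytes / bytes_per_second / 3600


FULL_TAPE = 800e9   # one tape, decimal bytes
LIKELY_DATA = 400e9  # the "probably less than 400GB" estimate

for label, rate in [("wire rate 12.5 MB/s", 12.5e6),
                    ("observed 11 MB/s", 11e6),
                    ("observed 10 MB/s", 10e6)]:
    print(f"{label}: full tape {hours_to_write(FULL_TAPE, rate):.1f} h, "
          f"400 GB {hours_to_write(LIKELY_DATA, rate):.1f} h")
```

At the observed rates a full 800 GB tape would not fit in the 18-hour
window, but 400 GB would, as long as the rate is sustained.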
> 
> Is it possible to initiate an incremental backup that would store all changes 
> against the contents of a certain medium?  (Say tape 5 is in the drive today 
> and has a 380GB full backup and 6 20-GB incremental backups going back 3 
> months.  File /foo/bar/xxx changed monday and tuesday, so the newest copy is 
> on the tuesday tape;  but write a copy to the friday tape as well.)
> 
> Has anyone seen similar speed problems with a NetApp filer, or another device 
> that serves up snapshots of iSCSI or FCP LUNs,  and solved them?
> 

I've seen very different performance levels with Bacula in different
setups.  The bottleneck can be any of a number of things.

My current setup is an LTO-4 autochanger connected to the backup server
via SAS and the backup server is connected to the filer via 10GbE and
I'm getting about 20MB/s max.  I believe the bottleneck has something
to do with the block size Bacula uses to write to the tape drive, or
possibly the database write speed (millions of files).  You can search
the bacula-users archives for threads about "LTO-4 performance" or
"block size".

> Supposing that round-trip-time over the network or disk seek latency on the 
> NetApp is the problem,  could we solve it by running multiple parallel backup 
> jobs to the same tape (without spooling)?

> How can we initiate an external script from Bacula that would do all the 
> snapshots and mount them before any backup job runs; or would we have to do 
> that kind of thing from cron? 

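You can do that from Bacula itself: the Job resource has RunBeforeJob
and RunAfterJob directives, which run a script on the Director's host,
and ClientRunBeforeJob and ClientRunAfterJob, which run it on the client.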
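
A script wired in through a RunBeforeJob directive could look roughly
like the sketch below.  It is only an illustration: the two steps are
placeholders that print a message, because the real snapshot and mount
commands depend on your filer and iSCSI setup, and the names run_steps,
STEPS and main are my own.  As far as I know Bacula cancels the job when
the before-job script exits non-zero, so the sketch stops at the first
failing step and passes its exit status on.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order.

    Returns 0 if every step succeeds, otherwise the exit status of the
    first failing step (later steps are not run).
    """
    for cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"step failed ({result.returncode}): {cmd}", file=sys.stderr)
            return result.returncode
    return 0


# Placeholder steps only: replace with the real snapshot and mount commands.
STEPS = [
    [sys.executable, "-c", "print('placeholder: create snapshot')"],
    [sys.executable, "-c", "print('placeholder: mount snapshot')"],
]


def main():
    return run_steps(STEPS)

# When saved as a standalone script, finish with:  sys.exit(main())
```

A matching after-job script would unmount and delete the snapshot in the
same way.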

> 
> It took about 5 minutes to enter the "select files" phase when doing a 
> restore of a backup with 7 GB of data and 128000 files.  Does that mean that 
> if we 
> made one big backup job over all hosts with 700 GB of data, it would take 8 
> hours to enter the "select files" phase?

To build the file selection, Bacula looks at all the file entries in
the database for that backup, so the time depends on the number of
files (not the amount of data) and on how fast your database is.
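
For a back-of-the-envelope figure, here is a sketch that scales the
observed time by file count.  It assumes the time grows linearly with
the number of files and that the 700 GB has the same file density as
the 7 GB sample; neither is guaranteed, and the function name is only
illustrative.

```python
def scaled_minutes(observed_minutes, observed_files, target_files):
    """Linear extrapolation of catalog load time by number of files."""
    return observed_minutes * target_files / observed_files


files_per_gb = 128000 / 7          # density of the 7 GB sample
target_files = files_per_gb * 700  # same density assumed for 700 GB

minutes = scaled_minutes(5, 128000, target_files)
print(f"about {target_files:,.0f} files, {minutes / 60:.1f} hours")
```

Under those assumptions the 8-hour estimate is about right, which is a
good reason to split the backup into several smaller jobs.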

> -- 
> David Lee Lambert ... Software Developer
> Cell phone: +1 586-873-8813 ; alt. email <[EMAIL PROTECTED]> or 
> <[EMAIL PROTECTED]>
> GPG key at http://www.lmert.com/keyring.txt
> 


-- 
Alex Chekholko [EMAIL PROTECTED]

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
