On 03/06/2012 02:03 PM, Todd Hunt wrote:
Hi,
First let me state that I'm not a DBA, but a developer. I know enough
about databases to be dangerous, but not enough to make any money at
it. ;-)
We are using large objects with OIDs as part of our data model. One
of our processes is to backup the database every night. Our DBAs told
me that pg_dump was taking too long. So they decided to shut down
Postgres, TAR up the directory to a back up location, restart
Postgres, then copy the back up TAR to tape.
For our smaller sites, 200GB or less (mostly LOBs), it takes less than
an hour to shutdown, TAR, and restart (writing to tape is not part of
this time frame). Some of our larger sites have 500GB+ worth of data,
which is mostly LOBs. Our DBAs want to move the LOBs out of the
database and store them on the file system and have the record have a
path the binary files. I'd like to come up with a better and faster
back up solution that allows the LOBs to stay in the DB.
A few things to note is that when a LOB gets inserted into the DB, it
is never updated. It may be deleted on rare occasions, but never
updated. Also, the DBAs are against incremental backups and I don't
blame them, sort of.
I'm open to any ideas. The servers are pretty standard. They
initially come with four 1TB hard drives ran in RAID 10, so they have
2TB available. There is another controller card and space for 4 more
drives. We want to keep cost down, but uptime is very important.
Even though I'm not a Sys Admin either, I was wondering if there would
be a way to replicate the DB on the two different RAID sets, "halt"
one to do the backup, then reinitialize it so that it would sync up
with the other RAID set.
Other than adding a secondary server to do data replication, I'm open
to ideas. Even if it means moving the LOBs onto the file system. I
just need a back up solution that scales well when we exceed 1TB.
Thank you,
Todd
Howdy,
With a database that size you probably want to consider performing hot
backups
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html.
That way your system is never down for the reason of backups. Also you
can take full backups on the weekends (or during a known slow time) and
then deltas (your archived logs) daily. This speeds up your backups to a
much more manageable duration.
or, if you're on linux/bsd/etc you could also rsync to another server,
you can configure it to only do deltas so it's not bringing over 500GB
every day.
If you can get a SAN with snapshot capability, you can throw the
database into backup mode, snapshot it, and then take it out, making the
impact of your entire backup take ~5 minutes regardless of size on disk.
All that said, I would advocate for taking the LOBs out of the database.
A database isn't a filesystem, it's never going to match a filesystem +
nginx (or something) for serving up files. The fact that your LOBs are
rarely updated strengthens the case for getting it out of the database.
My philosophy is, let the database handle what it's good at: joins,
transactions, updates, etc. and let filesystems / webservers handle what
they're good at: serving up static content.
Dave