Re: [GENERAL] Help making tablespaces work for my application

2011-06-24 Thread Craig Ringer

On 06/23/2011 09:37 PM, Natusch, Paul wrote:

I have an application for which data is being written to many disks
simultaneously. I would like to use a postgres table space on each disk.
If one of the disks crashes it is tolerable to lose that data, however,
I must continue to write to the other disks.


About the only way you'll be able to do that with PostgreSQL is to run 
one PostgreSQL instance per disk. Give each its own port, datadir, 
shared_buffers, etc.


I wouldn't expect that setup to perform particularly well, and it costs 
you the ability to have ACID rules apply between data on different 
disks. It's also a horribly inefficient use of RAM.


For this kind of task, it is typical to use a simple, dedicated tool to 
capture the writes from the sensors or whatever you are logging. Once 
the data has hit disk, another tool can read it in small batches and add 
it to the database for analysis and reporting.


Perhaps it'd help if you explained what you want - and why - with a 
little more background and detail?


--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help making tablespaces work for my application

2011-06-24 Thread Greg Smith

On 06/23/2011 09:37 AM, Natusch, Paul wrote:


I have an application for which data is being written to many disks 
simultaneously.  I would like to use a postgres table space on each 
disk.   If one of the disks crashes it is tolerable to lose that data, 
however, I must continue to write to the other disks.




Tablespaces are not useful for implementing this idea yet.


1. There is a single WAL log for the entire cluster, located in the 
pg_log subdirectory. If the disk containing the pg_log file crashed, 
does that mean the system would come to a halt.  Is there anyway to 
distribute this data so that WAL is located on the same media as the 
table space? An alternative would be to use raid with the disk that 
stores the pg_log subdirectory but that adds cost to the system.




Loss of the pg_xlog subdirectory and the WAL contained in it normally 
results in catastrophic database failure.  Recommended practice is to 
use a RAID-1 volume to make odds of that failure lower.


2.  If #1 was solved by using the raid approach, what happens if one 
of the disks containing one of my table spaces crashes.  At some point 
postgres will want to write the data from the WAL file to the crashed 
(unavailable) disk.Will postgres will be blocked at this point? 
 Is there some way to notify postgres that a specific disk is no 
longer available and that the entries in the WAL for this disk should 
either be purged or ignored? ( I'm willing to "throw away" the data on 
the crashed disk).




PostgreSQL can't be expected to operate sanely when faced with the loss 
of an individual tablespace.  It may be possible to recover from it, but 
you'll be doing something it's not designed to handle, and that effort 
may not succeed.


Note that any tablespace failure is likely to require taking down the 
database to repair the involved tablespaces, so you're likely to have 
downtime between a component failure and when you notice to take 
action.  The database really does not like having tablespaces just go 
away in the middle of operations.


PostgreSQL 9.1 (not released yet, currently in beta) includes a new 
feature called "unlogged tables" that might make this sort of deployment 
possible.  If you created a tablespace for disposable data and put an 
unlogged table onto it, loss of that tablespace would me much less 
likely to cause a problem.  So long as you recreated a new space for the 
unlogged table after restarting, you could probably recover having only 
lost the data on the crashed disk in this situation.



Clearly using raid on all of the disks would be a solution, but that 
is cost prohibitive.





On a cheap server I can easily RAID-1 mirror a pair of drives on Linux 
using software RAID, and individual drives are $50 to $100 each.  If 
your data isn't worth that much, And even that's not enough to really 
make me feel secure about the data--you really need to keep another copy 
around as a backup, too.  You can treat your data as disposable and 
expect to lose it when any single component fails, or you can include 
some good redundancy practices in the design to reduce odds of a 
failure.  There really isn't really a good solution providing partial 
protection in the middle of those two.


--
Greg Smith   2ndQuadrant usg...@2ndquadrant.comBaltimore, MD
PostgreSQL Training, Services, and 24x7 Supportwww.2ndQuadrant.us
"PostgreSQL 9.0 High Performance":http://www.2ndQuadrant.com/books



Re: [GENERAL] Help making tablespaces work for my application

2011-06-23 Thread John R Pierce
 If #1 was solved by using the raid approach, what happens if one of 
the disks containing one of my table spaces crashes.


if you are using raid, your tablespaces are on raid volumes comprised of 
2 or more drives, any one of those drives may fail, and the full data is 
still available.   if you insist on keeping your tablespaces on seperate 
volumes, you'll need enough disks to have a raid for each tablespace, 
thats a minimum of 2 times the tablespaces in drives. putting all 
the drives in one large raid10 (striped mirrors) and putting all your 
data on that same volume will tend to distribute the IO as well as anything.


now, if you're asking what happens when a complete raid volume fails, 
you could consider that a system failure, and fail the whole mess over 
to a streaming standby server  that would catch more failure cases 
and be much simpler and more robust.



--
john r pierceN 37, W 122
santa cruz ca mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general