Re: [PERFORM] large dataset with write vs read clients

2010-10-12 Thread Chris Browne
cr...@postnewspapers.com.au (Craig Ringer) writes:
> Hey, maybe I should try posting YouTube video answers to a few
> questions for kicks, see how people react ;-)

And make sure it uses the same voice as is used in the "MongoDB is web
scale" video, to ensure that people interpret it correctly :-).
-- 
output = (cbbrowne @ gmail.com)
http://linuxdatabases.info/info/nonrdbms.html
The *Worst* Things  to Say to a  Police Officer: "Hey, is that  a 9 mm?
That's nothing compared to this .44 magnum."



Re: [PERFORM] large dataset with write vs read clients

2010-10-12 Thread Chris Browne
mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
> I have a logical problem with asynchronous commit. The commit
> command should instruct the database to make the outcome of the
> transaction permanent. The application should wait to see whether the
> commit was successful or not. Asynchronous behavior in the commit
> statement breaks the ACID rules and should not be used in a RDBMS
> system. If you don't need ACID, you may not need RDBMS at all. You may
> try with MongoDB. MongoDB is web scale:
> http://www.youtube.com/watch?v=b2F-DItXtZs

The client always has the option of connecting to a set of databases,
and stowing parts of the data hither and thither.  That often leads to
the relaxation called BASE.  (And IBM has been selling that relaxation
as MQ-Series since the early '90s!)

There often *ARE* cases where it is acceptable for some of the data to
not be as durable, because that data is readily reconstructed.  This is
particularly common for calculated/cached/aggregated data.

Many things can get relaxed for a data warehouse data store, where the
database is not authoritative, but rather aggregates data drawn from
other authoritative sources.  In such applications, none of the A, C, I,
or D is particularly crucial in the DW data store.

- We don't put the original foreign key constraints into the DW
  database; they don't need to be enforced a second time.  Ditto for
  constraints of all sorts.

- Batching of the loading of updates is likely to break several of the
  letters.  And I find it *quite* acceptable to lose D if the data may
  be safely reloaded into the DW database.

I don't think this is cavalier, nor that it points to "MongoDB is
web scale."
-- 
cbbrowne,@,gmail.com
Rules  of the  Evil Overlord  #181.  I  will decree  that all  hay be
shipped in tightly-packed bales. Any wagonload of loose hay attempting
to pass through a checkpoint will be set on fire.



Re: [PERFORM] large dataset with write vs read clients

2010-10-10 Thread Craig Ringer

On 10/10/2010 5:35 AM, Mladen Gogala wrote:

> I have a logical problem with asynchronous commit. The commit command
> should instruct the database to make the outcome of the transaction
> permanent. The application should wait to see whether the commit was
> successful or not. Asynchronous behavior in the commit statement breaks
> the ACID rules and should not be used in a RDBMS system. If you don't
> need ACID, you may not need RDBMS at all. You may try with MongoDB.
> MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs


That argument makes little sense to me.

Because you can afford a clearly defined and bounded loosening of the 
durability guarantee provided by the database, such that you know and 
accept the possible loss of x seconds of work if your OS crashes or your 
UPS fails, does that mean you don't really need durability guarantees at 
all - let alone all that atomic commit silliness, transaction isolation, 
or the guarantee of a consistent on-disk state?


Some of the other flavours of non-SQL databases, both those that've been 
around forever (PICK/UniVerse/etc, Berkeley DB, Cache, etc) and those 
that're new and fashionable (Cassandra, CouchDB, etc), provide some ACID 
properties anyway. If you don't need/want an SQL interface to your 
database you don't have to throw out all that other database-y goodness 
if you haven't been drinking too much of the NoSQL kool-aid.


There *are* situations in which it's necessary to switch to relying on 
distributed, eventually-consistent databases with non-traditional 
approaches to data management. It's awfully nice not to have to, though, 
and can force you to do a lot more wheel reinvention when it comes to 
querying, analysing and reporting on your data.


FWIW, a common approach in this sort of situation has historically been 
- accepting that RDBMSs aren't great at continuous fast loading of 
individual records - to log the records in batches to a flat file, 
Berkeley DB, etc as a staging point. You periodically rotate that file 
out and bulk-load its contents into the RDBMS for analysis and 
reporting. This doesn't have to be every hour - every minute is usually 
pretty reasonable, and still gives your database a much easier time 
without forcing you to modify your app to batch inserts into 
transactions or anything like that.
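
A minimal sketch of what each rotation's load could look like, with a
made-up table name and file path:

  -- COPY runs as one transaction, so a whole batch costs a single
  -- commit/WAL flush instead of one per record:
  COPY traffic_stats FROM '/var/spool/collector/batch.csv' WITH CSV;

(Server-side COPY needs superuser; from an unprivileged client, psql's
\copy does the same thing reading the file on the client side.)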


--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/



Re: [PERFORM] large dataset with write vs read clients

2010-10-10 Thread Mladen Gogala

On 10/10/2010 2:43 AM, Craig Ringer wrote:

> Some of the other flavours of non-SQL databases, both those that've been
> around forever (PICK/UniVerse/etc, Berkeley DB, Cache, etc) and those
> that're new and fashionable (Cassandra, CouchDB, etc), provide some ACID
> properties anyway. If you don't need/want an SQL interface to your
> database you don't have to throw out all that other database-y goodness
> if you haven't been drinking too much of the NoSQL kool-aid.

This is a terrible misunderstanding. You haven't taken a look at that 
YouTube clip I sent you, have you? I am an Oracle DBA, first and 
foremost, disturbing the peace since 1989. I haven't been drinking the 
NoSQL kool-aid at all.
I was simply being facetious. ACID rules are business rules, and I am 
bitterly opposed to relaxing them. BTW, my favorite drink is Sam Adams Ale.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com




Re: [PERFORM] large dataset with write vs read clients

2010-10-10 Thread Craig Ringer

On 10/10/2010 2:55 PM, Mladen Gogala wrote:

> On 10/10/2010 2:43 AM, Craig Ringer wrote:
>
>> Some of the other flavours of non-SQL databases, both those that've been
>> around forever (PICK/UniVerse/etc, Berkeley DB, Cache, etc) and those
>> that're new and fashionable (Cassandra, CouchDB, etc), provide some ACID
>> properties anyway. If you don't need/want an SQL interface to your
>> database you don't have to throw out all that other database-y goodness
>> if you haven't been drinking too much of the NoSQL kool-aid.
>
> This is a terrible misunderstanding. You haven't taken a look at that
> YouTube clip I sent you, have you?


I'm not so good with video when I'm seeking information rather than 
entertainment. I really dislike having to sit and watch someone 
slowly get around to the point; give me something to skim-read and 
I'll do that. The trend toward video news etc drives me nuts - IMO it 
just detracts from the guts of the story/argument/explanation in most cases.


One of the wonderful things about the written word is that everybody can 
benefit from it at their own natural pace. Video, like university 
lectures, takes that away and forces the pace to the needs of the 
slowest.


My dislike of video-as-information is a quirk that's clearly not shared 
by many, given how trendy video is becoming on the 'net. OTOH, it's 
probably not a grossly unreasonable choice when dealing with lots of 
mailing list posts/requests. Imagine if the Pg list accepted video link 
questions - ugh.


Hey, maybe I should try posting YouTube video answers to a few questions 
for kicks, see how people react ;-)



> I am an Oracle DBA, first and
> foremost, disturbing the peace since 1989. I haven't been drinking the
> NoSQL kool-aid at all.
> I was simply being facetious. ACID rules are business rules and I am
> bitterly opposed to relaxing them. BTW, my favorite drink is Sam Adams Ale.


Aah, thanks. I completely missed it - which is a little scary, in that 
IMO that message could've been believably written in deadly earnest by a 
NoSQL over-enthusiast. Good work ... I think. Eek.


Sam Adams ale, I'm afraid, does not travel well to Australia.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/



Re: [PERFORM] large dataset with write vs read clients

2010-10-10 Thread Florian Weimer
* Mladen Gogala:

> I have a logical problem with asynchronous commit. The commit
> command should instruct the database to make the outcome of the
> transaction permanent. The application should wait to see whether the
> commit was successful or not. Asynchronous behavior in the commit
> statement breaks the ACID rules and should not be used in a RDBMS
> system.

That's a bit over the top.  It may make sense to use PostgreSQL even
if the file system doesn't guarantee ACID by keeping multiple
checksummed copies of the database files.  Asynchronous commits offer
yet another trade-off here.

Some people use RDBMSs mostly for the *M* part, to get a consistent
administration experience across multiple applications.  And even with
asynchronous commits, PostgreSQL will maintain a consistent state of
the database.
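
For reference, the knob here is PostgreSQL's synchronous_commit (8.3
onwards), and it can be scoped to a session or even a single transaction
rather than the whole server.  A rough sketch, with a made-up table:

  -- Whole session: commits return before the WAL is flushed.  A crash
  -- can lose roughly the last 3 * wal_writer_delay of commits, but the
  -- database itself stays consistent.
  SET synchronous_commit TO off;

  -- Or per transaction, so only data you can afford to lose is async:
  CREATE TABLE sensor_log (ts timestamptz, note text);  -- illustration only
  BEGIN;
  SET LOCAL synchronous_commit TO off;
  INSERT INTO sensor_log VALUES (now(), 'sample');
  COMMIT;  -- returns immediately; the WAL flush happens shortly after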



Re: [PERFORM] large dataset with write vs read clients

2010-10-10 Thread Mladen Gogala

On 10/10/2010 7:45 AM, Florian Weimer wrote:

> Some people use RDBMSs mostly for the *M* part, to get a consistent
> administration experience across multiple applications.  And even with
> asynchronous commits, PostgreSQL will maintain a consistent state of
> the database.


Both Postgres and Oracle have that option, and both databases will 
maintain a consistent state, but both will allow the loss of data in 
case of a system crash.  Strictly speaking, that does break the 
D in ACID.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com



Re: [PERFORM] large dataset with write vs read clients

2010-10-09 Thread Florian Weimer
* Greg Smith:

> Given the size of your database, I'd advise you consider a migration
> to a new version ASAP.  8.4 is a nice stable release at this point,
> that's the one to consider moving to.

It also offers asynchronous commits, which might be a good tradeoff
here (especially if the data gathered is not used for billing purposes
8-).




Re: [PERFORM] large dataset with write vs read clients

2010-10-09 Thread Mladen Gogala
I have a logical problem with asynchronous commit. The commit command 
should instruct the database to make the outcome of the transaction 
permanent. The application should wait to see whether the commit was 
successful or not. Asynchronous behavior in the commit statement breaks 
the ACID rules and should not be used in a RDBMS system. If you don't 
need ACID, you may not need RDBMS at all. You may try with MongoDB. 
MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs


Florian Weimer wrote:
> * Greg Smith:
>
>> Given the size of your database, I'd advise you consider a migration
>> to a new version ASAP.  8.4 is a nice stable release at this point,
>> that's the one to consider moving to.
>
> It also offers asynchronous commits, which might be a good tradeoff
> here (especially if the data gathered is not used for billing purposes
> 8-).


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 





[PERFORM] large dataset with write vs read clients

2010-10-07 Thread Aaron Turner
Currently PG 8.1.3; see attached for my postgresql.conf.  Server is
FreeBSD 6.2 w/ a fast 3TB storage array and only 2GB of RAM.

We're running RTG, which is like mrtg, cricket, etc.: basically it
queries network devices via SNMP and throws the stats into the DB for
making pretty bandwidth graphs.  We've got hundreds of devices, with 10K+
ports and probably 100K's of stats being queried every 5 minutes.  In
order to do all that work, the back-end SNMP querier is multi-threaded
and opens a PG connection per thread.  We're running 30 threads.  This
is basically all INSERTs, but only ends up being about 32,000 per 5
minutes.

The graphing front end CGI is all SELECTs.  There are 12k tables today,
and new tables are created each month.  The number of rows per table
is 100-700K, with most in the 600-700K range.  190GB of data so far.
Good news is that queries have no joins and are limited to only a few
tables at a time.

Basically, each connection is taking about 100MB resident.  As we need
to increase the number of threads to be able to query all the devices
in the 5 minute window, we're running out of memory.  There aren't
that many CGI connections at any one time; query performance obviously
isn't great, but honestly it's surprisingly good, all things
considered.

Honestly, I'm not really looking to improve PG's performance, although I
wouldn't complain; I just want to better manage memory/hardware.  I
assume I can't start up two instances of PG pointing at the same files,
one read-only and one read-write with different memory profiles, so I
assume my only real option is to throw more RAM at it.  I don't have $$$
for another array/server for a master/slave right now.  Or perhaps
tweaking my .conf file would help?  Are newer PG versions more memory
efficient?

Thanks,
Aaron

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
carpe diem quam minimum credula postero


[Attachment: postgresql.conf]



Re: [PERFORM] large dataset with write vs read clients

2010-10-07 Thread Dan Harris

On 10/7/10 11:47 AM, Aaron Turner wrote:

> [snip]
> Basically, each connection is taking about 100MB resident.  As we need
> to increase the number of threads to be able to query all the devices
> in the 5 minute window, we're running out of memory.

I think the first thing to do is look into using a connection pooler 
like pgpool to reduce your connection memory overhead.


-Dan



Re: [PERFORM] large dataset with write vs read clients

2010-10-07 Thread Stephen Frost
* Dan Harris (f...@drivefaster.net) wrote:
> On 10/7/10 11:47 AM, Aaron Turner wrote:
>> Basically, each connection is taking about 100MB resident.  As we need
>> to increase the number of threads to be able to query all the devices
>> in the 5 minute window, we're running out of memory.
> I think the first thing to do is look into using a connection pooler
> like pgpool to reduce your connection memory overhead.

Yeah..  Having the number of database connections be close to the number
of processors is usually recommended.
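
For illustration, a rough pgpool-II fragment of that idea; the parameter
values are made up, not tuned:

  # pgpool.conf fragment: clients connect to pgpool (port 9999), which
  # keeps the set of real backend sessions small and queues the rest.
  listen_addresses = '*'
  port = 9999
  backend_hostname0 = 'localhost'
  backend_port0 = 5432
  num_init_children = 8   # max concurrent backend sessions, ~ CPU count;
                          # extra clients wait for a free slot
  max_pool = 1            # cached backend connections per pgpool child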

Stephen




Re: [PERFORM] large dataset with write vs read clients

2010-10-07 Thread Stephen Frost
* Aaron Turner (synfina...@gmail.com) wrote:
> The graphing front end CGI is all SELECTs.  There are 12k tables today,
> and new tables are created each month.

That's a heck of a lot of tables..  Probably more than you really need.
Not sure if reducing that number would help query times though.

> The number of rows per table
> is 100-700K, with most in the 600-700K range.  190GB of data so far.
> Good news is that queries have no joins and are limited to only a few
> tables at a time.

Have you got indexes and whatnot on these tables?

> Basically, each connection is taking about 100MB resident.  As we need
> to increase the number of threads to be able to query all the devices
> in the 5 minute window, we're running out of memory.  There aren't
> that many CGI connections at any one time; query performance obviously
> isn't great, but honestly it's surprisingly good, all things
> considered.

I'm kind of surprised at each connection taking 100MB, especially ones
which are just doing simple inserts.

Thanks,

Stephen




Re: [PERFORM] large dataset with write vs read clients

2010-10-07 Thread Stephen Frost
* Aaron Turner (synfina...@gmail.com) wrote:
> Basically, each connection is taking about 100MB resident

Errr..  Given that your shared buffers are around 100M, I think you're
confusing what you see in top with reality.  The shared buffers are
visible in every process, but it's all the same actual memory, not 100M
per process.
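
A quick way to see that shared segment's size from the database side:

  -- One copy of this lives in RAM, mapped into every backend; top counts
  -- it in each process's RES, so private memory is closer to RES - SHR:
  SHOW shared_buffers;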

Thanks,

Stephen




Re: [PERFORM] large dataset with write vs read clients

2010-10-07 Thread Aaron Turner
On Thu, Oct 7, 2010 at 12:02 PM, Stephen Frost sfr...@snowman.net wrote:
> * Aaron Turner (synfina...@gmail.com) wrote:
>> Basically, each connection is taking about 100MB resident
>
> Errr..  Given that your shared buffers are around 100M, I think you're
> confusing what you see in top with reality.  The shared buffers are
> visible in every process, but it's all the same actual memory, not 100M
> per process.

Ah, I had missed that.  Thanks for the tip.  Sounds like I should
still investigate pgpool though.  If nothing else, it should improve
insert performance, right?

As for the tables, no indexes.  We're using a constraint on one of the
columns (date) w/ table inheritance to limit which tables are scanned,
since SELECTs are always for a specific date range.  By always
querying the parent table, we're effectively getting a cheap
semi-granular index without any insert overhead.  Unfortunately,
without forking the RTG code significantly, redesigning the schema
really isn't viable.
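
In sketch form (names invented; this mirrors the idea, not RTG's actual
schema):

  -- Parent table that all queries go through:
  CREATE TABLE ifstats (sample_date date NOT NULL, port_id int, octets bigint);

  -- One child per month, with a CHECK constraint bounding its dates:
  CREATE TABLE ifstats_2010_10 (
      CHECK (sample_date >= DATE '2010-10-01'
         AND sample_date <  DATE '2010-11-01')
  ) INHERITS (ifstats);

  -- With constraint_exclusion on (available in 8.1), a date-bounded
  -- query against the parent skips children whose CHECKs can't match:
  SET constraint_exclusion = on;
  SELECT sum(octets) FROM ifstats
   WHERE sample_date BETWEEN DATE '2010-10-05' AND DATE '2010-10-07';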

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
carpe diem quam minimum credula postero



Re: [PERFORM] large dataset with write vs read clients

2010-10-07 Thread Greg Smith

Aaron Turner wrote:

> Are newer PG versions more memory efficient?


Moving from PostgreSQL 8.1 to 8.3 or later should make everything you do 
happen 2X to 3X faster, before even taking into account that you can 
tune the later versions better too.  See 
http://suckit.blog.hu/2009/09/29/postgresql_history for a simple 
comparison of how much performance jumped on both reads and writes in 
versions later than the one you're running.  Memory consumption will on 
average decrease too, simply via the fact that queries start and finish 
more quickly.  Given an even workload, there will be fewer of them 
running at a time on a newer version to keep up.


Given the size of your database, I'd advise you consider a migration to 
a new version ASAP.  8.4 is a nice stable release at this point, that's 
the one to consider moving to.  The biggest single problem people 
upgrading from 8.1 to 8.3 or later see is related to changes in how data 
is cast between text and integer types; "1 doesn't equal '1' anymore" is 
the quick explanation of that.  See 
http://wiki.postgresql.org/wiki/Version_History for links to some notes 
on that, as well as other good resources related to upgrading.  This may 
require small application changes to deal with.
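
A one-table illustration of that change (hypothetical, but
representative):

  CREATE TABLE events (id text);

  SELECT * FROM events WHERE id = 1;
  -- 8.1:  works, via an implicit cast
  -- 8.3+: ERROR: operator does not exist: text = integer

  SELECT * FROM events WHERE id = '1';  -- fine on both versions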


Even not considering the performance increases, PostgreSQL 8.1 is due to 
be dropped from active support potentially as early as next month:  
http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy


Also:  PostgreSQL 8.1.3 has several known bugs that can lead to various 
sorts of nasty data corruption.  You should at least consider an 
immediate upgrade to the latest release of that version, 8.1.22.  Small 
version number increases in PostgreSQL only consist of serious bug 
fixes, not feature changes.  See 
http://www.postgresql.org/support/versioning for notes about the 
project's standard for changes here, and how it feels about the risks of 
running versions with known bugs in them vs. upgrading.


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, PostgreSQL 9.0 High Performance    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book




Re: [PERFORM] large dataset with write vs read clients

2010-10-07 Thread Aaron Turner
On Thu, Oct 7, 2010 at 2:47 PM, Greg Smith g...@2ndquadrant.com wrote:
> [snip: full text of the preceding message]



Thanks for the info, Greg.  Sounds like I've got an upgrade in the near
future! :)

Again, thanks to everyone who's responded; it's been really
informative and helpful.  The PG community has always proven to be
awesome!



-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
carpe diem quam minimum credula postero
