Re: [PERFORM] Postgres vs. DSpam
On Fri, 2004-11-26 at 14:37 +1300, Andrew McMillan wrote:
> In PostgreSQL the UPDATE will result internally in a new record being
> written, with the old record being marked as deleted. That old record
> won't be re-used until after a VACUUM has run, and this means that the
> on-disk tables will have a lot of dead rows in them quite quickly.

Not necessarily: yes, you need a VACUUM to begin reusing the space
consumed by expired tuples, but that does not mean "tables will have a
lot of dead rows in them quite quickly". VACUUM does not block
concurrent database activity, so you can run it as frequently as you'd
like (and as your database workload requires). There is a tradeoff
between the space consumed by expired tuple versions and the I/O
required to do a VACUUM -- it's up to the PG admin to decide what the
right balance for their database is (pg_autovacuum et al. can help make
this decision).

> The reason that PostgreSQL operates this way, is a direct result of
> the way transactional support is implemented, and it may well change
> in a version or two.

I doubt it.

-Neil

---(end of broadcast)---
TIP 8: explain analyze is your friend
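[Editorial note: Neil's suggestion of frequent, non-blocking vacuums can be as simple as a periodic job that vacuums only the heavily updated table. A minimal sketch, assuming the dspam_token_data table discussed elsewhere in this thread; adjust for your own schema:]

```sql
-- Plain VACUUM only marks expired row versions as reusable; it takes
-- no exclusive lock, so it can run alongside normal traffic as often
-- as the update rate requires.  Adding ANALYZE refreshes the planner's
-- row estimates at the same time.
VACUUM ANALYZE dspam_token_data;

-- Run far less often: VACUUM FULL compacts the table files on disk,
-- but it holds an exclusive lock on the table while it works.
-- VACUUM FULL dspam_token_data;
```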
Re: [PERFORM] Postgres vs. DSpam
On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote:
> Hi,
>
> I have installed the dspam filter
> (http://www.nuclearelephant.com/projects/dspam) on our mail server
> (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users
> with a quite low traffic of 4000 messages/day. So it's a quite common
> platform/environment, nothing spectacular.

I am using DSpam with PostgreSQL here. I have a daily job that cleans
the DSpam database up, as follows:

DELETE FROM dspam_token_data
 WHERE (innocent_hits * 2) + spam_hits < 5
   AND CURRENT_DATE - last_hit > 60;

DELETE FROM dspam_token_data
 WHERE innocent_hits = 1
   AND CURRENT_DATE - last_hit > 30;

DELETE FROM dspam_token_data
 WHERE CURRENT_DATE - last_hit > 180;

DELETE FROM dspam_signature_data
 WHERE CURRENT_DATE - created_on > 14;

VACUUM dspam_token_data;

VACUUM dspam_signature_data;

I also occasionally do a "VACUUM FULL ANALYZE;" on the database as
well.

In all honesty, though, I think that MySQL is better suited to DSpam
than PostgreSQL is.

> Please, could anyone explain me this difference?
> Is Postgres that bad?
> Is MySQL that good?
> Am I the only one to have observed this behavior?

I believe that what DSpam does that is not well catered for in the way
PostgreSQL operates is that it does very frequent updates to rows in
(eventually) quite large tables. In PostgreSQL the UPDATE will result
internally in a new record being written, with the old record being
marked as deleted. That old record won't be re-used until after a
VACUUM has run, and this means that the on-disk tables will have a lot
of dead rows in them quite quickly.

The reason that PostgreSQL operates this way is a direct result of the
way transactional support is implemented, and it may well change in a
version or two. It has got better over the last few versions, with
things like pg_autovacuum, but that approach still doesn't suit some
types of database updating.

Cheers,
					Andrew.
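[Editorial note: the dead-row buildup Andrew describes is easy to observe directly. A small sketch using a hypothetical throwaway table, for illustration only:]

```sql
-- Every UPDATE writes a new row version and marks the old one dead;
-- the dead version occupies disk space until a VACUUM reclaims it.
CREATE TABLE update_demo (id integer PRIMARY KEY, hits integer);
INSERT INTO update_demo VALUES (1, 0);
UPDATE update_demo SET hits = hits + 1 WHERE id = 1;  -- one dead version
UPDATE update_demo SET hits = hits + 1 WHERE id = 1;  -- two dead versions

-- VACUUM VERBOSE reports the dead row versions it found and made
-- reusable for each table it processes.
VACUUM VERBOSE update_demo;
```

A token table that is updated on every message processed accumulates dead versions at the message rate, which is why the cleanup job above ends with per-table VACUUMs.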
--
Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/          PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201     MOB: +64(272)DEBIAN     OFFICE: +64(4)499-2267
      These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
[PERFORM] Postgres backend using huge amounts of ram
How much RAM can a single postgres backend use?

I've just loaded a moderately sized dataset into postgres and was
applying RI constraints to the tables (using pgadmin on windows). Part
way through I noticed the (single) postgres backend had shot up to
using 300+ MB of my RAM!

The two tables are:

create table reqt_dates
(
    reqt_date_id    serial,
    reqt_id         integer not null,
    reqt_date       date not null,
    primary key (reqt_date_id)
) without oids;

and

create table booking_plan
(
    booking_plan_id serial,
    reqt_date_id    integer not null,
    booking_id      integer not null,
    booking_date    date not null,
    datetime_from   timestamp not null,
    datetime_to     timestamp not null,
    primary key (booking_plan_id)
) without oids;

and I was trying to do:

alter table booking_plan add
    foreign key ( reqt_date_id ) references reqt_dates ( reqt_date_id )
    on delete cascade;

Since I can't get an explain of what the alter table was doing, I used
this:

select count(*) from booking_plan, reqt_dates
 where booking_plan.reqt_date_id = reqt_dates.reqt_date_id

and sure enough this query caused the backend to use 300M RAM. The
plan for this was:

QUERY PLAN
Aggregate  (cost=37.00..37.00 rows=1 width=0) (actual time=123968.000..123968.000 rows=1 loops=1)
  ->  Hash Join  (cost=15.50..36.50 rows=1000 width=0) (actual time=10205.000..120683.000 rows=1657709 loops=1)
        Hash Cond: ("outer".reqt_date_id = "inner".reqt_date_id)
        ->  Seq Scan on booking_plan  (cost=0.00..15.00 rows=1000 width=4) (actual time=10.000..4264.000 rows=1657709 loops=1)
        ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual time=10195.000..10195.000 rows=0 loops=1)
              ->  Seq Scan on reqt_dates  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.000..6607.000 rows=2142184 loops=1)
Total runtime: 124068.000 ms

I then analysed the database. Note, there are no indexes at this stage
except the primary keys.
The same query then gave:

QUERY PLAN
Aggregate  (cost=107213.17..107213.17 rows=1 width=0) (actual time=57002.000..57002.000 rows=1 loops=1)
  ->  Hash Join  (cost=35887.01..106384.32 rows=1657709 width=0) (actual time=9774.000..54046.000 rows=1657709 loops=1)
        Hash Cond: ("outer".reqt_date_id = "inner".reqt_date_id)
        ->  Seq Scan on booking_plan  (cost=0.00..22103.55 rows=1657709 width=4) (actual time=10.000..19648.000 rows=1657709 loops=1)
        ->  Hash  (cost=24355.92..24355.92 rows=2142184 width=4) (actual time=9674.000..9674.000 rows=0 loops=1)
              ->  Seq Scan on reqt_dates  (cost=0.00..24355.92 rows=2142184 width=4) (actual time=0.000..4699.000 rows=2142184 loops=1)
Total runtime: 57002.000 ms

This is the same set of hash joins, BUT the backend only used 30M of
private RAM.

Platform is Windows XP, Postgres 8.0 beta 5
shared_buffers = 4000
work_mem = 8192

Any explanations?

Thanks, Gary.
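[Editorial note: a plausible reading, not verified against the 8.0 source: before ANALYZE the planner's default estimate of 1000 rows per table let the executor size its hash table for a tiny input, which then grew far past work_mem as the 2.1 million actual rows arrived; with accurate statistics the hash stayed within its memory budget. The practical habit this suggests is a sketch like:]

```sql
-- After a bulk load, refresh planner statistics BEFORE adding RI
-- constraints or running large joins, so row estimates are realistic
-- and memory-hungry plan nodes are sized correctly.
ANALYZE reqt_dates;
ANALYZE booking_plan;

-- Then confirm that estimated and actual row counts roughly agree:
EXPLAIN ANALYZE
SELECT count(*)
  FROM booking_plan b
  JOIN reqt_dates r ON r.reqt_date_id = b.reqt_date_id;
```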
Re: [PERFORM] Postgres vs. MySQL
I did some work on RT wrt Postgres for a company and found that there
was lots of room for improvement, particularly if you are linking
requests. The latest RT code hopefully has fixes as a result of this
work.

Dave

Steinar H. Gunderson wrote:
> On Wed, Nov 24, 2004 at 09:57:52AM -0500, Christian Fowler wrote:
>> As for performance, lots of others will probably volunteer tips and
>> techniques. In my experience, properly written and tuned applications
>> will show only minor speed differences. I have seen several
>> open-source apps that "support postgres" but are not well tested on
>> it. Query optimization can cause orders of magnitude performance
>> differences.
>
> Definitely. My favourite is Request Tracker (we use 2.x, although 3.x
> is the latest version), which used something like 5-600 queries (all
> seqscans, since the database schema only had an ordinary index on the
> varchar fields in question, and the queries were automatically
> searching on LOWER(field) to emulate MySQL's case-insensitivity on
> varchar fields) for _every_ page shown. Needless to say, the web
> interface was dog slow -- some index manipulation and a few bugfixes
> (they had some kind of cache layer which would eliminate 98% of the
> queries, but for some reason was broken for non-MySQL databases)
> later, and we were down to 3-4 index scans, a few orders of magnitude
> faster. :-)
>
> /* Steinar */

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561
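[Editorial note: the "index manipulation" Steinar alludes to is an index on the expression itself. A sketch with made-up table and column names:]

```sql
-- An ordinary index on the column cannot serve a predicate like
--   WHERE LOWER(subject) = LOWER('some ticket title')
-- because the index stores the original-case values.  An index on the
-- LOWER() expression matches the predicate, letting the planner
-- replace the seqscan with an index scan.
CREATE INDEX tickets_subject_lower_idx ON tickets (LOWER(subject));
```

With one such index per case-folded search column, queries emulating MySQL's case-insensitive varchar comparisons can use index scans without any application changes.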