Re: [PERFORM] Postgres vs. DSpam

2004-11-25 Thread Neil Conway
On Fri, 2004-11-26 at 14:37 +1300, Andrew McMillan wrote:
> In PostgreSQL the UPDATE will result
> internally in a new record being written, with the old record being
> marked as deleted.  That old record won't be re-used until after a
> VACUUM has run, and this means that the on-disk tables will have a lot
> of dead rows in them quite quickly.

Not necessarily: yes, you need a VACUUM to begin reusing the space
consumed by expired tuples, but that does not mean "tables will have a
lot of dead rows in them quite quickly". VACUUM does not block
concurrent database activity, so you can run it as frequently as you'd
like (and as your database workload requires). There is a tradeoff
between the space consumed by expired tuple versions and the I/O
required to do a VACUUM -- it's up to the PG admin to decide what the
right balance for their database is (pg_autovacuum et al. can help make
this decision).
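
As a concrete sketch of that tradeoff (the table name is borrowed from the
DSpam discussion elsewhere in this thread, purely for illustration):

-- Plain VACUUM takes no exclusive lock, so it can run while normal reads
-- and writes continue; schedule it as often as the dead-tuple growth vs.
-- VACUUM I/O tradeoff described above suggests.
VACUUM ANALYZE dspam_token_data;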

> The reason that PostgreSQL operates this way is a direct result of the
> way transactional support is implemented, and it may well change in a
> version or two.

I doubt it.

-Neil





Re: [PERFORM] Postgres vs. DSpam

2004-11-25 Thread Andrew McMillan
On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote:
> Hi,
> 
> I have installed the dspam filter
> (http://www.nuclearelephant.com/projects/dspam) on our mail server
> (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users
> with quite low traffic of 4000 messages/day. So it's quite a common
> platform/environment, nothing spectacular.

I am using DSpam with PostgreSQL here.  I have a daily job that cleans
the DSpam database up, as follows:

DELETE FROM dspam_token_data
  WHERE (innocent_hits*2) + spam_hits < 5
  AND CURRENT_DATE - last_hit > 60;

DELETE FROM dspam_token_data
  WHERE innocent_hits = 1
  AND CURRENT_DATE - last_hit > 30;

DELETE FROM dspam_token_data
  WHERE CURRENT_DATE - last_hit > 180;

DELETE FROM dspam_signature_data
  WHERE CURRENT_DATE - created_on > 14;

VACUUM dspam_token_data;

VACUUM dspam_signature_data;



I also occasionally do a "VACUUM FULL ANALYZE;" on the database.
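
For anyone wanting to sanity-check whether a job like this is keeping the
bloat under control, one rough approach (a sketch only; relpages and
reltuples are only as fresh as the last VACUUM or ANALYZE) is:

-- relpages growing much faster than reltuples suggests dead-row bloat:
SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('dspam_token_data', 'dspam_signature_data');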


In all honesty though, I think that MySQL is better suited to DSpam than
PostgreSQL is.


> Please, could anyone explain me this difference?
> Is Postgres that bad?
> Is MySQL that good?
> Am I the only one to have observed this behavior?

I believe the thing DSpam does that is not well catered for by the way
PostgreSQL operates is very frequent updates to rows in (eventually)
quite large tables.  In PostgreSQL the UPDATE will result
internally in a new record being written, with the old record being
marked as deleted.  That old record won't be re-used until after a
VACUUM has run, and this means that the on-disk tables will have a lot
of dead rows in them quite quickly.
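
A rough illustration of that, using only the columns visible in the
cleanup queries above:

-- Each UPDATE leaves behind a dead version of every row it touches:
UPDATE dspam_token_data
   SET innocent_hits = innocent_hits + 1
 WHERE last_hit = CURRENT_DATE;

-- VACUUM VERBOSE then reports how many dead row versions were found
-- and made re-usable:
VACUUM VERBOSE dspam_token_data;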

The reason that PostgreSQL operates this way is a direct result of the
way transactional support is implemented, and it may well change in a
version or two.  It's got better over the last few versions, with things
like pg_autovacuum, but that approach still doesn't suit some types of
database updating.

Cheers,
Andrew.
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/    PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
   These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
-





[PERFORM] Postgres backend using huge amounts of ram

2004-11-25 Thread Gary Doades
How much RAM can a single postgres backend use?
I've just loaded a moderately sized dataset into postgres and was
applying RI constraints to the tables (using pgAdmin on Windows).  Part
way through I noticed the (single) postgres backend had shot up to using
300+ MB of my RAM!
The two tables are:
create table reqt_dates
(
    reqt_date_id    serial,
    reqt_id         integer not null,
    reqt_date       date not null,
    primary key (reqt_date_id)
) without oids;
and
create table booking_plan
(
    booking_plan_id serial,
    reqt_date_id    integer not null,
    booking_id      integer not null,
    booking_date    date not null,
    datetime_from   timestamp not null,
    datetime_to     timestamp not null,
    primary key (booking_plan_id)
) without oids;
and I was trying to do:
alter table booking_plan
    add foreign key (reqt_date_id)
    references reqt_dates (reqt_date_id)
    on delete cascade;
Since I can't get an EXPLAIN of what the ALTER TABLE was doing, I used this:
select count(*) from booking_plan, reqt_dates
 where booking_plan.reqt_date_id = reqt_dates.reqt_date_id;
and sure enough this query caused the backend to use 300M RAM. The plan
for this was:
QUERY PLAN
Aggregate  (cost=37.00..37.00 rows=1 width=0) (actual time=123968.000..123968.000 rows=1 loops=1)
  ->  Hash Join  (cost=15.50..36.50 rows=1000 width=0) (actual time=10205.000..120683.000 rows=1657709 loops=1)
        Hash Cond: ("outer".reqt_date_id = "inner".reqt_date_id)
        ->  Seq Scan on booking_plan  (cost=0.00..15.00 rows=1000 width=4) (actual time=10.000..4264.000 rows=1657709 loops=1)
        ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual time=10195.000..10195.000 rows=0 loops=1)
              ->  Seq Scan on reqt_dates  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.000..6607.000 rows=2142184 loops=1)
Total runtime: 124068.000 ms
I then analysed the database. Note, there are no indexes at this stage
except the primary keys.
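(Presumably the analyse step was simply something along these lines; shown
only as a sketch of the obvious command:)
ANALYZE reqt_dates;
ANALYZE booking_plan;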
The same query then gave:
QUERY PLAN
Aggregate  (cost=107213.17..107213.17 rows=1 width=0) (actual time=57002.000..57002.000 rows=1 loops=1)
  ->  Hash Join  (cost=35887.01..106384.32 rows=1657709 width=0) (actual time=9774.000..54046.000 rows=1657709 loops=1)
        Hash Cond: ("outer".reqt_date_id = "inner".reqt_date_id)
        ->  Seq Scan on booking_plan  (cost=0.00..22103.55 rows=1657709 width=4) (actual time=10.000..19648.000 rows=1657709 loops=1)
        ->  Hash  (cost=24355.92..24355.92 rows=2142184 width=4) (actual time=9674.000..9674.000 rows=0 loops=1)
              ->  Seq Scan on reqt_dates  (cost=0.00..24355.92 rows=2142184 width=4) (actual time=0.000..4699.000 rows=2142184 loops=1)
Total runtime: 57002.000 ms
This is the same set of hash joins, BUT the backend only used 30M of
private RAM.
Platform is Windows XP, Postgres 8.0 beta 5
shared_buffers = 4000
work_mem = 8192   (the setting is in kB, i.e. 8 MB)
Any explanations?
Thanks,
Gary.


Re: [PERFORM] Postgres vs. MySQL

2004-11-25 Thread Dave Cramer
I did some work on RT wrt Postgres for a company and found that there
was lots of room for improvement, particularly if you are linking
requests.  The latest RT code hopefully has fixes as a result of this
work.

Dave
Steinar H. Gunderson wrote:
> On Wed, Nov 24, 2004 at 09:57:52AM -0500, Christian Fowler wrote:
>> As for performance, lots of others will probably volunteer tips and
>> techniques. In my experience, properly written and tuned applications will
>> show only minor speed differences. I have seen several open-source apps
>> that "support postgres" but are not well tested on it. Query optimization
>> can cause orders of magnitude performance differences.
>
> Definitely. My favourite is Request Tracker (we use 2.x, although 3.x is the
> latest version), which used something like 500-600 queries (all seqscans,
> since the database schema only had an ordinary index on the varchar fields
> in question, and the queries were automatically searching on LOWER(field)
> to emulate MySQL's case-insensitivity on varchar fields) for _every_ page
> shown. Needless to say, the web interface was dog slow -- some index
> manipulation and a few bugfixes (they had some kind of cache layer which
> would eliminate 98% of the queries, but for some reason was broken for
> non-MySQL databases) later, and we were down to 3-4 index scans, a few
> orders of magnitude faster. :-)
>
> /* Steinar */

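For reference, the kind of "index manipulation" that helps here is an
expression index matching the LOWER(...) search (a sketch only; the table
and column names below are invented, not RT's actual schema):

-- An ordinary index on a varchar column is useless to a query that
-- filters on LOWER(column); an index on the expression itself is usable:
CREATE INDEX tickets_subject_lower_idx ON tickets (LOWER(subject));

-- A query of the kind RT generates can then use an index scan:
-- SELECT ... FROM tickets WHERE LOWER(subject) = 'foo';
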
--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561