Time to buy new hardware Thomas:

dbmail=# select count(*) from dbmail_physmessage;
 count
-------
 43644
(1 row)


QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..726.83 rows=6245 width=36) (actual
time=73.480..4860.933 rows=43644 loops=1)
   Merge Cond: ("outer".id = "inner".physmessage_id)
   ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b
(cost=0.00..216.71 rows=6246 width=24) (actual time=14.958..1227.097
rows=43644 loops=1)
   ->  Index Scan using dbmail_messages_2 on dbmail_messages a
(cost=0.00..401.17 rows=6245 width=28) (actual time=41.286..2819.806
rows=43644 loops=1)
         Filter: ((message_idnr >= 1::bigint) AND (message_idnr <=
417966::bigint) AND (mailbox_idnr = 1::bigint) AND (status <
2::smallint))
 Total runtime: 4992.861 ms
(6 rows)


But watch this:


dbmail=# alter table dbmail_physmessage add column  message_idnr BIGINT
NULL;
ALTER TABLE
dbmail=# alter table dbmail_physmessage add column  mailbox_idnr BIGINT
NULL;
ALTER TABLE

[EMAIL PROTECTED] dbmail]$ (echo 'BEGIN;';psql -At -c 'select
physmessage_id,message_idnr,mailbox_idnr,status from dbmail_messages;'
dbmail  | awk '-F|' '{ print "UPDATE dbmail_physmessage SET
mailbox_idnr="$3", message_idnr="$2" WHERE id="$1";" }'; echo 'COMMIT;')
| psql dbmail >/dev/null


{{ wait a really fucking long time }}


dbmail=# create index dbmail_foo on dbmail_physmessage
(message_idnr,mailbox_idnr,id);
dbmail=# reindex database dbmail force;

{{ and finally }}


dbmail=# EXPLAIN ANALYZE SELECT dbmail-#  a.seen_flag,
dbmail-#  a.answered_flag,
dbmail-#  a.deleted_flag,
dbmail-#  a.flagged_flag,
dbmail-#  a.draft_flag,
dbmail-#  a.recent_flag,
dbmail-#  TO_CHAR(b.internal_date, 'YYYY-MM-DD HH24:MI:SS'),
dbmail-#  b.rfcsize,
dbmail-#  a.message_idnr
dbmail-#
dbmail-# FROM dbmail_physmessage AS b JOIN dbmail_messages AS a
dbmail-# USING (mailbox_idnr,message_idnr)
dbmail-# WHERE message_idnr BETWEEN '1' AND '417966'
dbmail-#  AND b.id=a.physmessage_id
dbmail-#  AND mailbox_idnr = '1'
dbmail-#  AND status < '2';


                                                                   QUERY
PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..12.45 rows=1 width=36) (actual
time=0.346..1703.698 rows=43644 loops=1)
   Join Filter: ("outer".message_idnr = "inner".message_idnr)
   ->  Index Scan using dbmail_foo on dbmail_physmessage b
(cost=0.00..6.38 rows=2 width=40) (actual time=0.055..197.415 rows=43644
loops=1)
         Index Cond: ((message_idnr >= 1::bigint) AND (message_idnr <=
417966::bigint) AND (mailbox_idnr = 1::bigint))
   ->  Index Scan using dbmail_messages_2 on dbmail_messages a
(cost=0.00..3.02 rows=1 width=36) (actual time=0.010..0.015 rows=1
loops=43644)
         Index Cond: ("outer".id = a.physmessage_id)
         Filter: ((status < 2::smallint) AND (1::bigint = mailbox_idnr))
 Total runtime: 1838.451 ms
(8 rows)






On Tue, 2005-03-15 at 18:18 +0100, Thomas Mueller wrote:
> Geo Carncross wrote:
> > On Tue, 2005-03-15 at 13:49 +0100, Thomas Mueller wrote:
> > 
> >>I have 7.4.7 too. The results are MUCH better than what dbmail does at
> >>the moment so it's a real improvement. I don't know if it can be done
> >>better. The query on the mailbox with ~600 Mails (that's quite common I
> >>think) looses all its time here:
> > 
> > Agreed. Will include EXPLAIN ANALYZE when posting. I'm happy enough
> > looking at the plans :)
> 
> Yes that really helps a lot.
> Concerning your second mail: the output of MySQL won't help at all.
> 
> > QUERY PLAN
> > ----------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Merge Join  (cost=0.00..726.83 rows=6245 width=36) (actual 
> > time=75.749..535.647 rows=6246 loops=1)
> >    Merge Cond: ("outer".id = "inner".physmessage_id)
> >    ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b
> > (cost=0.00..216.71 rows=6246 width=24) (actual time=49.655..360.177
> > rows=6246 loops=1)
> >    ->  Index Scan using dbmail_messages_2 on dbmail_messages a
> > (cost=0.00..401.17 rows=6245 width=28) (actual time=13.445..51.841
> > rows=6246 loops=1)
> >          Filter: ((message_idnr >= 1::bigint) AND (message_idnr <=
> > 417966::bigint) AND (mailbox_idnr = 1::bigint) AND (status <
> > 2::smallint))
> >  Total runtime: 558.631 ms
> > (6 rows)
> > 
> > [This is at 6K messages]
> 
> Why is it that fast but slow for me?? Strange.
> The machine is a 2 GHz Celeron, 512 MB RAM, Raid 1 discs and I think Pg
> is tuned for the system.
> 
> >>Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b
> >>(cost=0.00..7631.57 rows=298853 width=24) (actual time=72.596..12891.334
> >>rows=104412 loops=1)
> >>I don't think we can speed that up. dbmail_physmessage_pkey is the best
> >>index to use.
> > 
> > As far as Pg is concerned- it is. The only thing that could speed it up
> > (without some highly incompatible magic) would be to store the critical
> > parts (mailbox_idnr, message_idnr) in dbmail_physmessage so Pg could do
> > better.
> > 
> > I _know_ you don't like queries like this, and [rant] I know it's
> > shitty, but it _does_ work for Pg.
> 
> I'd accept a lot at the moment to speed the query up :-/
> 
> >>>If it is data related, see if you can give me a postgresql dump that
> >>>causes the problem (that is, if it IS your mailing lists, reload 'em
> >>>into a new db first) - you CAN send it directly to me (just remember to
> >>>remove the -dbmail from my email address)
> >>
> >>Thanks, but I don't think it's data related - it only depends on the
> >>number of mails in a mailbox. For every mail the database has to fetch
> >>the entry from the physmessage table.
> > 
> > Will verify; I'm going to load more messages in. You say 20k is enough?
> 
> I think so, but your numbers are very different from mine: I have about
> 90000 mails in the database, but the last dump was ~8 GB.
> 
> You said:
> > My entire dbmail db is only about 50M per 12k messages; I can easily
> > fit 42k messages into buffer-space.
> 
> Of course if I do the query twice the time drops from 13s to <200ms
> because the indizes and all records to read (the mails don't fill the
> buffer, we don't read the messageblks here) are either in the database
> or the operating system buffers. But that's cheating and has nothing to
> do with real world usage. A folder is opened once and that's it.
> 
> 
> Thomas
-- 
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/

Reply via email to