Hi Geo,

> I've got 6000 messages here, and subsecond queries; here's the EXACT
> query:

[..]
>  Nested Loop  (cost=0.00..46.85 rows=2 width=36)

Could you please always do an EXPLAIN ANALYZE? Your output only shows
what the planer expects without doing the query. With EXPLAIN ANALYZE
you get the expectation and the real result:

 Nested Loop  (cost=0.00..1041.31 rows=147 width=36) (actual
time=34.540..66.288 rows=101 loops=1)

Expectation were 147 rows, real result were 101 rows, that's quite good.
The loop started after 34ms (had to wait for results for 34ms) until
66ms after the query started.

The planer expects only 2 rows for your query - for small mailboxes it's
very very fast for me too (147/101 mails):

 Nested Loop  (cost=0.00..1041.31 rows=147 width=36) (actual
time=34.540..66.288 rows=101 loops=1)
   ->  Index Scan using dbmail_messages_tmm_status_recent_idx on
dbmail_messages a  (cost=0.00..585.27 rows=147 width=28) (actual
time=11.616..36.320 rows=101 loops=1)
         Index Cond: ((mailbox_idnr = 424::bigint) AND (status <
2::smallint))
         Filter: ((message_idnr >= 1::bigint) AND (message_idnr <=
500000::bigint))
   ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b
 (cost=0.00..3.09 rows=1 width=24) (actual time=0.024..0.031 rows=1
loops=101)
         Index Cond: ("outer".physmessage_id = b.id)
 Total runtime: 66.982 ms

For bigger mailboxes the plan changes. Pg uses a merge join instead of a
nested loop (very good idea) and a different index (hm, expected and
real rows differ a lot here):

 Merge Join  (cost=13255.73..21697.27 rows=3637 width=36) (actual
time=9421.114..13351.260 rows=571 loops=1)
   Merge Cond: ("outer".id = "inner".physmessage_id)
   ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b
 (cost=0.00..7631.57 rows=298853 width=24) (actual
time=72.596..12891.334 rows=104412 loops=1)
   ->  Sort  (cost=13255.73..13264.83 rows=3637 width=28) (actual
time=10.073..12.358 rows=571 loops=1)
         Sort Key: a.physmessage_id
         ->  Index Scan using dbmail_messages_mailbox_idx on
dbmail_messages a (cost=0.00..13040.63 rows=3637 width=28) (actual
time=0.075..6.958 rows=571 loops=1)
               Index Cond: (mailbox_idnr = 82::bigint)
               Filter: ((message_idnr >= 1::bigint) AND (message_idnr <=
500000::bigint) AND (status < 2::smallint))
 Total runtime: 13353.907 ms

For nearly 20.000 mails in a mailbox (that's the biggest one I have)
this is the third plan:

 Merge Join  (cost=0.00..34180.39 rows=77533 width=36) (actual
time=82.494..33801.615 rows=19579 loops=1)
   Merge Cond: ("outer".id = "inner".physmessage_id)
   ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b
 (cost=0.00..7631.57 rows=298853 width=24) (actual time=24.161..5934.702
rows=92937 loops=1)
   ->  Index Scan using dbmail_messages_physmessage_idx on
dbmail_messages a  (cost=0.00..24639.51 rows=77533 width=28) (actual
time=52.447..26830.988 rows=19579 loops=1)
         Filter: ((message_idnr >= 1::bigint) AND (message_idnr <=
417966::bigint) AND (mailbox_idnr = 86::bigint) AND (status < 2::smallint))
 Total runtime: 33884.024 ms

> If you're still seeing these problems, it may be the version of Pg (I'm
> using 7.4.7) - or it MAY be data related.

I have 7.4.7 too. The results are MUCH better than what dbmail does at
the moment so it's a real improvement. I don't know if it can be done
better. The query on the mailbox with ~600 Mails (that's quite common I
think) looses all its time here:

Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b
(cost=0.00..7631.57 rows=298853 width=24) (actual time=72.596..12891.334
rows=104412 loops=1)
I don't think we can speed that up. dbmail_physmessage_pkey is the best
index to use.

> If it is data related, see if you can give me a postgresql dump that
> causes the problem (that is, if it IS your mailing lists, reload 'em
> into a new db first) - you CAN send it directly to me (just remember to
> remove the -dbmail from my email address)

Thanks, but I don't think it's data related - it only depends on the
number of mails in a mailbox. For every mail the database has to fetch
the entry from the physmessage table.

I guess you get the same results if you take mailboxes with lot of mails.
Most mails were copied into my database by the customers using their
IMAP Client to import old mails. Thunderbird can be used to read mbox
archives p.e. - I don't know if that's faster than dbmail-smtp ?


I'm sure someone has a bigger system than I have (mine is only for
~10-20 people!) ? Several million mails? I'd really like to see some
numbers from such systems. But I guess it's the same as for me.


Geo thanks a lot for your help!


Thomas
-- 
http://www.tmueller.com for pgp key (95702B3B)

Reply via email to