Hi,

I did some tests with cvs head from today and found out that it is quite
slow when messages are inserted. I have a really aged machine for tests,
but even for that machine it is too slow:

Pentium 166, 128 MB RAM
Debian Sarge
Vanilla kernel 2.6.1
all services stopped that aren't required
IMAP protected using stunnel
Postgres 7.3.4 on a fast 7200 rpm 80 GB HD

Copying about 3000 mails using IMAP took more than an hour (!) in my
LAN.

I've enabled postgres stats collector and did some tests: I have set up
a new database with one user, and two mailboxes, INBOX and INBOX/Test
and renamed dbmail-smtp, so my MTA couldn't insert any messages.

I used the following SQL queries to get the stats:

select xact_commit from pg_stat_database where datname='dbmail'
To get the overall number of transactions.

select * from pg_stat_user_tables
To get detailed seq scans/INSERT/UPDATE/DELETE counts for every row.

select pg_stat_reset()
To reset all stats to zero.

I've reset the stats, started Evolution 1.4.5 and copied 2872 small
mails (sometimes spam can be useful) to my dbmail test machine.
Unfortunately Evolution read about 2% of the mails after I've copied
them, that's why some seq_scan numbers might be somewhat to high.
I've attached the table because of the line breaks too.

--------------------------------------------------
 xact_commit 
-------------
       66906
(1 row)

 relid | schemaname |       relname        | seq_scan | seq_tup_read |
idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del 
-------+------------+----------------------+----------+--------------+----------+---------------+-----------+-----------+-----------
 83072 | public     | aliases              |        0 |            0 |
0 |             0 |         0 |         0 |         0
 83082 | public     | users                |        0 |            0 |
11490 |         11490 |         0 |      5745 |         0
 83095 | public     | mailboxes            |    20295 |        98603 |
0 |             0 |         0 |         0 |         0
 83116 | public     | subscription         |        0 |            0 |
0 |             0 |         0 |         0 |         0
 83128 | public     | acl                  |        0 |            0 |
0 |             0 |         0 |         0 |         0
 83151 | public     | physmessage          |    17490 |     25494744 |
0 |             0 |      2872 |      2958 |         0
 83160 | public     | messages             |    23436 |     34300164 |
0 |             0 |      2872 |      2872 |         0
 83189 | public     | messageblks          |       86 |       493984 |
0 |             0 |      5744 |         0 |         0
 83206 | public     | auto_notifications   |        0 |            0 |
0 |             0 |         0 |         0 |         0
 83219 | public     | auto_replies         |        0 |            0 |
0 |             0 |         0 |         0 |         0
[..]
(15 rows)

 pg_stat_reset 
---------------
 t
(1 row)
--------------------------------------------------

There are several problems:
- 23 transactions for every message copied (!)
- the users table is updated twice for every message
- lot of sequential scans

We had that discussion some months ago yet - we really, really need
transactions. I didn't check the code, but is it possible to encapsulate
the database access for one mail into one transaction without major
rewrites? That would speed up a lot.


Then I checked the performance of dbmail-maintenance (cfpd).

There is one seqScan on mailboxes for every message even if there is
nothing to do - several thousand mailboxes and some million mails and
that will take a really long time.

I've marked all messages as deleted and ran maintenance the first time
(set delete status for deleted messages) - 2872 updates on messages but
only 31 transactions, that looks very good.
The second run of maintenance (delete messages with delete status)
looked worse: 2872 delete on physmessage and messages, 5744 delete on
messageblks but 14389 transactions.


It should be easy to add an index for mailboxes to remove the sequential
scans, but I don't know what query forces postgres to do the seqScan.

I checked the code and found this query for example:
snprintf(query, DEF_QUERYSIZE,
         "SELECT mailbox_idnr FROM mailboxes "
         "WHERE LOWER(name) = LOWER('%s') "
         "AND owner_idnr='%llu'", name, owner_idnr);
We have this index:
CREATE INDEX mailboxes_name_idx ON mailboxes(name);
But we would need:
CREATE INDEX mailboxes_lower_name_idx ON mailboxes(lower(name));

Is IMAP case sensitive? Are folders like INBOX/test and INBOX/Test
possible? If it's insensitive we could lower() all queries on names and
create an index on lower(name).


-- 
MfG Thomas Mueller - http://www.tmueller.com for pgp key (95702B3B)
 xact_commit 
-------------
       66906
(1 row)

 relid | schemaname |       relname        | seq_scan | seq_tup_read | idx_scan 
| idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del 
-------+------------+----------------------+----------+--------------+----------+---------------+-----------+-----------+-----------
 83072 | public     | aliases              |        0 |            0 |        0 
|             0 |         0 |         0 |         0
 83082 | public     | users                |        0 |            0 |    11490 
|         11490 |         0 |      5745 |         0
 83095 | public     | mailboxes            |    20295 |        98603 |        0 
|             0 |         0 |         0 |         0
 83116 | public     | subscription         |        0 |            0 |        0 
|             0 |         0 |         0 |         0
 83128 | public     | acl                  |        0 |            0 |        0 
|             0 |         0 |         0 |         0
 83151 | public     | physmessage          |    17490 |     25494744 |        0 
|             0 |      2872 |      2958 |         0
 83160 | public     | messages             |    23436 |     34300164 |        0 
|             0 |      2872 |      2872 |         0
 83189 | public     | messageblks          |       86 |       493984 |        0 
|             0 |      5744 |         0 |         0
 83206 | public     | auto_notifications   |        0 |            0 |        0 
|             0 |         0 |         0 |         0
 83219 | public     | auto_replies         |        0 |            0 |        0 
|             0 |         0 |         0 |         0

Reply via email to