Do you have any way of narrowing this down to specific queries that are taking
the longest and/or are being executed the most? That would identify which
low-level database functions are being called, then we can just trace our way
up the call chain to see who's misbehaving or acting on a flawed design. Also,
if you could run similar tests against the latest 1.2, it would help to give a
frame of reference, particularly for my delivery chain design.
Aaron
Thomas Mueller <[EMAIL PROTECTED]> said:
> Hi,
>
> I did some tests with cvs head from today and found out that it is quite
> slow when messages are inserted. I have a really aged machine for tests,
> but even for that machine it is too slow:
>
> Pentium 166, 128 MB RAM
> Debian Sarge
> Vanilla kernel 2.6.1
> all services stopped that aren't required
> IMAP protected using stunnel
> Postgres 7.3.4 on a fast 7200 rpm 80 GB HD
>
> Copying about 3000 mails using IMAP took more than an hour (!) in my
> LAN.
>
> I've enabled postgres stats collector and did some tests: I have set up
> a new database with one user, and two mailboxes, INBOX and INBOX/Test
> and renamed dbmail-smtp, so my MTA couldn't insert any messages.
>
> I used the following SQL queries to get the stats:
>
> select xact_commit from pg_stat_database where datname='dbmail'
> To get the overall number of transactions.
>
> select * from pg_stat_user_tables
> To get detailed seq scans/INSERT/UPDATE/DELETE counts for every row.
>
> select pg_stat_reset()
> To reset all stats to zero.
>
> I've reset the stats, started Evolution 1.4.5 and copied 2872 small
> mails (sometimes spam can be useful) to my dbmail test machine.
> Unfortunately Evolution read about 2% of the mails after I've copied
> them, that's why some seq_scan numbers might be somewhat to high.
> I've attached the table because of the line breaks too.
>
> --------------------------------------------------
> xact_commit
> -------------
> 66906
> (1 row)
>
> relid | schemaname | relname | seq_scan | seq_tup_read |
> idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
>
-------+------------+----------------------+----------+--------------+----------+---------------+-----------+-----------+-----------
> 83072 | public | aliases | 0 | 0 |
> 0 | 0 | 0 | 0 | 0
> 83082 | public | users | 0 | 0 |
> 11490 | 11490 | 0 | 5745 | 0
> 83095 | public | mailboxes | 20295 | 98603 |
> 0 | 0 | 0 | 0 | 0
> 83116 | public | subscription | 0 | 0 |
> 0 | 0 | 0 | 0 | 0
> 83128 | public | acl | 0 | 0 |
> 0 | 0 | 0 | 0 | 0
> 83151 | public | physmessage | 17490 | 25494744 |
> 0 | 0 | 2872 | 2958 | 0
> 83160 | public | messages | 23436 | 34300164 |
> 0 | 0 | 2872 | 2872 | 0
> 83189 | public | messageblks | 86 | 493984 |
> 0 | 0 | 5744 | 0 | 0
> 83206 | public | auto_notifications | 0 | 0 |
> 0 | 0 | 0 | 0 | 0
> 83219 | public | auto_replies | 0 | 0 |
> 0 | 0 | 0 | 0 | 0
> [..]
> (15 rows)
>
> pg_stat_reset
> ---------------
> t
> (1 row)
> --------------------------------------------------
>
> There are several problems:
> - 23 transactions for every message copied (!)
> - the users table is updated twice for every message
> - lot of sequential scans
>
> We had that discussion some months ago yet - we really, really need
> transactions. I didn't check the code, but is it possible to encapsulate
> the database access for one mail into one transaction without major
> rewrites? That would speed up a lot.
>
>
> Then I checked the performance of dbmail-maintenance (cfpd).
>
> There is one seqScan on mailboxes for every message even if there is
> nothing to do - several thousand mailboxes and some million mails and
> that will take a really long time.
>
> I've marked all messages as deleted and ran maintenance the first time
> (set delete status for deleted messages) - 2872 updates on messages but
> only 31 transactions, that looks very good.
> The second run of maintenance (delete messages with delete status)
> looked worse: 2872 delete on physmessage and messages, 5744 delete on
> messageblks but 14389 transactions.
>
>
> It should be easy to add an index for mailboxes to remove the sequential
> scans, but I don't know what query forces postgres to do the seqScan.
>
> I checked the code and found this query for example:
> snprintf(query, DEF_QUERYSIZE,
> "SELECT mailbox_idnr FROM mailboxes "
> "WHERE LOWER(name) = LOWER('%s') "
> "AND owner_idnr='%llu'", name, owner_idnr);
> We have this index:
> CREATE INDEX mailboxes_name_idx ON mailboxes(name);
> But we would need:
> CREATE INDEX mailboxes_lower_name_idx ON mailboxes(lower(name));
>
> Is IMAP case sensitive? Are folders like INBOX/test and INBOX/Test
> possible? If it's insensitive we could lower() all queries on names and
> create an index on lower(name).
>
>
> --
> MfG Thomas Mueller - http://www.tmueller.com for pgp key (95702B3B)
>
>
--