On Tue, 2005-03-15 at 18:18 +0100, Thomas Mueller wrote:
> Geo Carncross wrote:
> > On Tue, 2005-03-15 at 13:49 +0100, Thomas Mueller wrote:
> > 
> >>I have 7.4.7 too. The results are MUCH better than what dbmail does at
> >>the moment so it's a real improvement. I don't know if it can be done
> >>better. The query on the mailbox with ~600 mails (that's quite common I
> >>think) loses all its time here:
> > 
> > Agreed. Will include EXPLAIN ANALYZE when posting. I'm happy enough
> > looking at the plans :)
> 
> Yes that really helps a lot.
> Concerning your second mail: the output of MySQL won't help at all.

No, not likely :)


> > QUERY PLAN
> > ----------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Merge Join  (cost=0.00..726.83 rows=6245 width=36) (actual time=75.749..535.647 rows=6246 loops=1)
> >    Merge Cond: ("outer".id = "inner".physmessage_id)
> >    ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b  (cost=0.00..216.71 rows=6246 width=24) (actual time=49.655..360.177 rows=6246 loops=1)
> >    ->  Index Scan using dbmail_messages_2 on dbmail_messages a  (cost=0.00..401.17 rows=6245 width=28) (actual time=13.445..51.841 rows=6246 loops=1)
> >          Filter: ((message_idnr >= 1::bigint) AND (message_idnr <= 417966::bigint) AND (mailbox_idnr = 1::bigint) AND (status < 2::smallint))
> >  Total runtime: 558.631 ms
> > (6 rows)
> > 
> > [This is at 6K messages]
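
For reference, that plan corresponds to a query of roughly this shape
(reconstructed from the merge condition and the filter - the select list
here is only a guess on my part):

  SELECT a.message_idnr, b.messagesize    -- select list is a guess
    FROM dbmail_messages a, dbmail_physmessage b
   WHERE a.physmessage_id = b.id          -- the merge join condition
     AND a.mailbox_idnr = 1::bigint       -- one mailbox
     AND a.message_idnr BETWEEN 1::bigint AND 417966::bigint
     AND a.status < 2::smallint;          -- status filter from the plan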

At 12K messages the query plan is the same and the runtime is nearly constant:
 Total runtime: 419.159 ms

We'll wait and see (still loading messages).

> Why is it that fast for you but so slow for me? Strange.
> The machine is a 2 GHz Celeron, 512 MB RAM, RAID 1 disks, and I think Pg
> is tuned for the system.

While we wait for my mailboxes to fill up, I'd dump the data you've got,
recreate the database the same way I did, reload your data into it (data
only - no table or index definitions), and take another look.

It's entirely possible it's data-related, but in case it isn't, it might
be some index or cached junk that Pg is just choking on.
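
If you want to rule that out without a full dump/reload, the standard Pg
maintenance commands, run against the two tables from the plan above,
should clear out any index bloat and refresh the planner's statistics:

  -- reclaim dead space and update planner statistics
  VACUUM FULL ANALYZE dbmail_messages;
  VACUUM FULL ANALYZE dbmail_physmessage;

  -- rebuild the indexes in case they've bloated
  REINDEX TABLE dbmail_messages;
  REINDEX TABLE dbmail_physmessage;

If the query is fast after that, it was junk in the indexes, not the data.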

At this moment, I believe it's probably data-related. After all, I'm
looking at hash buckets that - at this moment - aren't generating [many]
collisions. As I generate more collisions, I'll probably start seeing
the same results - although with my data-set, it'll probably take a lot
more than 20k messages...

... nevertheless, we will see :)



> >>Index Scan using dbmail_physmessage_pkey on dbmail_physmessage b  (cost=0.00..7631.57 rows=298853 width=24) (actual time=72.596..12891.334 rows=104412 loops=1)
> >>I don't think we can speed that up. dbmail_physmessage_pkey is the best
> >>index to use.
> > 
> > As far as Pg is concerned- it is. The only thing that could speed it up
> > (without some highly incompatible magic) would be to store the critical
> > parts (mailbox_idnr, message_idnr) in dbmail_physmessage so Pg could do
> > better.
> > 
> > I _know_ you don't like queries like this, and [rant] I know it's
> > shitty, but it _does_ work for Pg.
> 
> I'd accept a lot at the moment to speed the query up :-/

Using the combined index means we won't be using a straight hash table.
That'll only be a win as long as we REALLY ARE SEEING a lot of
collisions, _OR_ we don't have enough buckets.

I don't know which case it is yet.
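
For concreteness, the combined index I mean is something like this (the
name is my own; the columns come straight from the filter in the plan):

  -- hypothetical combined index covering the mailbox + range lookup
  CREATE INDEX dbmail_messages_mailbox_message_idx
      ON dbmail_messages (mailbox_idnr, message_idnr);

With mailbox_idnr first, Pg can land on the one mailbox and then walk
the message_idnr range in order, instead of filtering after the fact.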


> >>>If it is data related, see if you can give me a postgresql dump that
> >>>causes the problem (that is, if it IS your mailing lists, reload 'em
> >>>into a new db first) - you CAN send it directly to me (just remember to
> >>>remove the -dbmail from my email address)
> >>
> >>Thanks, but I don't think it's data related - it only depends on the
> >>number of mails in a mailbox. For every mail the database has to fetch
> >>the entry from the physmessage table.
> > 
> > Will verify; I'm going to load more messages in. You say 20k is enough?
> 
> I think so, but your numbers are very different from mine: I have about
> 90000 mails in the database, but the last dump was ~8 GB.

I'll continue to 100k if necessary... it just takes a while :)

> You said:
> > My entire dbmail db is only about 50M per 12k messages; I can easily
> > fit 42k messages into buffer-space.
> 
> Of course if I do the query twice, the time drops from 13s to <200ms,
> because the indexes and all the records to read (the mails don't fill
> the buffer - we don't read the messageblks here) are either in the
> database buffers or the operating system cache. But that's cheating and
> has nothing to do with real-world usage. A folder is opened once and
> that's it.

If that's the case, you're right that it may not be data-related, and I
don't think combined indexes will help. Your disk might simply not be
fast enough.
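
One quick thing worth checking while you're in there - how much Pg
itself is allowed to cache (the OS cache sits on top of this, so it's
only part of the picture):

  -- show the size of Pg's own buffer cache, in 8 kB buffers
  SHOW shared_buffers;

If that's still at the tiny default, repeated queries will be leaning
almost entirely on the OS cache.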

If your disk is less than 50% full, you can probably get your Pg
database laid out sequentially on disk by shutting it down, copying the
data directory somewhere else (be efficient about sparse files and
preserve permissions!), then moving that copy back over your original
database.

See if that speeds you up - your system might simply have been spending
all of its time seeking.




-- 
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/
