Afternoon, folks.
Let me tell you a tale.
We are running dbmail 1.2.8 on debian, postgresql 7.4.7 backend
also on debian.
We were experiencing massive slowdowns at the database level, and
eventually tracked it to a query in db_update_pop:
"UPDATE messages set status=%llu::bigint
 WHERE message_idnr=%llu::bigint AND status<%d::smallint"
Turned up logging, saw the queries, and that one was taking as
much as 50 seconds on some of the entries.
Did an explain of one of the queries, all indexed, should be
OK.
So then I got to thinking "why do we have that comparison against
current status?"
Removed that portion of the query, and query time dropped to
where it should be (sub-second).  I checked the latest 2.1.3
release code, and the query is the same there.
Jesse and I then drew out on the whiteboard all the possible
behavioural effects my removal could have, and the only ones
we came up with were related to two clients checking the same
account at the same time, the first client to disconnect
tagging messages as deleted, and the second client to disconnect
not deleting them (only reading them), thus setting the status
back to 'Read'.  So they would possibly re-appear on the first
client.
Could somebody explain why the query included the status
comparison?
If there's no compelling reason to leave it, I propose removing
it from all version branches.
If there is a reason to keep it, is there a way to optimize the
database even more?  Here is the 'explain':

dbmail=# explain UPDATE messages set status=2::bigint WHERE
message_idnr=48090528::bigint AND status<2::smallint;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using idx_status on messages  (cost=0.00..5.70 rows=1
width=94)
   Index Cond: ((status < 2::smallint) AND (message_idnr =
48090528::bigint))

Our database server load over the past week has been consistently
around 16-24 (possibly longer, but that's when it got slow enough
that people started complaining non stop).  Now, sans that last
part of the original query, it is a comfortable 0.26.  My feeling
is that it hit a "critical mass" point where the query time just
multiplied out.

We vacuum daily, reindex nightly, analyze hourly....

Any and all feedback is much appreciated.
And if I forgot anything, I'll follow up.

And no, we're still not quite ready to upgrade to 2.x! :)

dave


-- 
Dave Logan
http://www.digitalcoven.com/

"No!  Try not!  Do.  Or do not.  There is no try." -- Yoda

Reply via email to