Hello all: I'm seeing some very odd query behavior on postgres 7.1.3. I know that's way out of date, and I do have a plan in place to upgrade, but any immediate help will be, well, very helpful.
I have a server running two instances of postgres: 7.1.3 and 7.4.3. Each supports a different web application; the web apps are served from a separate middleware server. Formerly both web apps ran on a single all-in-one server, on postgres 7.1.3. Since reconfiguring the servers (one web, one database) and deploying on the two-postgres config, performance of the web app running on the older pg has been terrible, despite the new db-only machine being superficially "better" (in processor and RAM) than the original.

I have enabled pretty extensive query logging on the 7.1.3 instance. What I see is that certain queries take hideously long. What is odd is that these are perfectly well-optimized queries: they represent updates to a table of web application sessions. Each user's session is written once per web page view, so the query gets run frequently. Now, this session update is not, I think, the only query that's taking long, but there are so many of these queries that they are heavily represented among the slow ones (which I'm determining somewhat laboriously by running the huge logs through an awk script -- a rough sketch of that pass is below, after the query stats).

Here's an example EXPLAIN:

datapoint=# explain UPDATE nsse_session SET expiration = 1093541296, value = 'sessIdUser|s:5:\"31991\";sessPriv|s:1:\"u\";se\
datapoint'# ssCustId|s:5:\"14688\";sessSchoolId|s:5:\"14781\";sessUserName|s:6:\"gcruse\";sessImgVersion|i:2;sessAccessPrivs|N;sessAccessSchools|a:1:{i:14781;s:\
datapoint'# 5:\"14781\";}sessSort|a:0:{}!sessBack|sessDPVersion|s:4:\"full\";sessDataSource|s:4:\"live\";sessUserMeasures|s:14:\"1948,1913,1703\";sessUserGroups\
datapoint'# |s:84:\"3074,3113,3144,3179,3223,3263,5968,5997,6011,6027,6035,6046,6063,6076,6087,6105,6116\";!sessGroupFilter|' WHERE id_session = '955af0272896ba\
datapoint'# a67956494dcb30a5fe' AND expiration > 1093441296;
NOTICE:  QUERY PLAN:

Index Scan using nsse_session_pkey on nsse_session  (cost=0.00..8.16 rows=3 width=50)

EXPLAIN

So, an index scan, as it should be. When I check the postgres log for the query stats, I find this:

QUERY STATISTICS
! system usage stats:
!       0.012477 elapsed 0.000000 user 0.000000 system sec
!       [0.070000 user 0.000000 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       3/1 [1044/309] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:      0 read,      0 written, buffer hit rate = 100.00%
!       Local  blocks:      0 read,      0 written, buffer hit rate = 0.00%
!       Direct blocks:      0 read,      0 written

QUERY STATISTICS
! system usage stats:
!       1878.708691 elapsed 194.000000 user 1.180000 system sec
!       [194.020000 user 1.190000 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       36441/3233 [37108/3496] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:  16863 read,    125 written, buffer hit rate = 99.99%
!       Local  blocks:      0 read,      0 written, buffer hit rate = 0.00%
!       Direct blocks:      0 read,      0 written

Two sets of stats, of which the first looks OK and the second is terrible. Virtually all of the long queries I've seen so far show this pattern: a double set of stats, with the second one looking like bad news. The second set shows a very elevated level of page faults. (What are these page faults against? They don't seem to be against the pg shared buffers -- are they against the OS disk cache?)
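For what it's worth, the log mining is roughly the awk pass sketched here. This is only a sketch: it assumes statements show up on lines containing "query:" (debug_print_query) shortly before their QUERY STATISTICS block, and that the elapsed line looks like the "!  1878.708691 elapsed ..." lines above; with many backends writing to one log, the query/stats pairing is only approximate.

    # Sketch: print statements whose stats block reports more than
    # <threshold> seconds of elapsed time, paired with the most recently
    # logged statement. Adjust the patterns to the actual log layout.
    awk -v threshold=5 '
        /query:/ {
            # remember the most recent statement, truncated for readability
            q = substr($0, 1, 200)
        }
        $1 == "!" && $3 == "elapsed" && $2 + 0 > threshold {
            printf "%12.6f  %s\n", $2, q
        }
    ' postgres.log | sort -rn | head -20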
I did hear a suggestion that a few very expensive queries might be blocking others (if they're writing), and that I see the effects mostly on these session updates because they're so frequent. I guess that's possible, but I would like to know more about what the above stats might mean. Some of these session writes are fairly heavy -- we carry a lot of data in the session, often multiple tens of K of text going into one column. Is this a stats problem, a symptom of insufficient vacuuming?

Relevant params: postgres 7.1.3 running on RedHat ES 2.1. SHMMAX set to 800000000. (Physical RAM is 3 GB.) Shared buffers for this install = 20000, sort_mem is 8 MB, max_connections = 50. (I don't think it's pure concurrent load; it happened during a user training session with only about 30 users.) The other pg install on the box is similarly configured.

Just to complete the data dump, here's the output of ipcs:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e6a9 3538944    postgres   600        167649280  2
0x0052e2c1 3473409    postgres   600        170762240  2

------ Semaphore Arrays --------
key        semid      owner      perms      nsems      status
0x0052e6a9 11567104   postgres   600        17
0x0052e6aa 11599873   postgres   600        17
0x0052e6ab 11632642   postgres   600        17
0x0052e6ac 11665411   postgres   600        17
0x0052e2c1 10878980   postgres   600        17
0x0052e2c2 10911749   postgres   600        17
0x0052e2c3 10944518   postgres   600        17
0x0052e2c4 10977295   postgres   600        17
0x0052e2c5 11010064   postgres   600        17
0x0052e2c6 11042835   postgres   600        17
0x0052e2c7 11075604   postgres   600        17
0x0052e2c8 11108373   postgres   600        17
0x0052e2c9 11141142   postgres   600        17
0x0052e2ca 11173911   postgres   600        17
0x0052e2cb 11206680   postgres   600        17
0x0052e2cc 11239449   postgres   600        17
0x0052e2cd 11272218   postgres   600        17
0x0052e2ce 11304987   postgres   600        17
0x0052e2cf 11337756   postgres   600        17
0x0052e2d0 11370525   postgres   600        17
0x0052e2d1 11403294   postgres   600        17

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

Sorry, I know that's a ton of data, but the problem is urgent and I'm very perplexed. For comparison, on the original unified server I had max_connections of 256 (as opposed to 50 on the new one -- could this be the problem?), sort_mem of 6000, and shared buffers of 20000. (Overall, shared buffers have doubled on the new machine because I've devoted that much to each instance.)

-- sgl
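P.S. In case the answer really is "insufficient vacuuming": 7.1 has no autovacuum, so what I'm considering is simply cron'ing a frequent vacuum of that one heavily-updated table, along the lines of the sketch below (table and database names are the ones from the query above; how often to run it is just a guess at this point).

    # Sketch: frequent VACUUM ANALYZE of the session table, run from cron.
    # Plain VACUUM is what reclaims the dead rows left behind by each
    # per-pageview UPDATE on 7.1.
    psql -d datapoint -c 'VACUUM ANALYZE nsse_session;'

    # VACUUM VERBOSE reports how many dead tuples it finds, which should
    # help confirm or rule out bloat as the culprit:
    psql -d datapoint -c 'VACUUM VERBOSE ANALYZE nsse_session;'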