[PERFORM] Dramatic change in memory usage with version 9.1

2011-12-19 Thread Rafael Martinez
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello

I am sending this email to ask if anyone has noticed a change in how
a server running postgreSQL 9.1 uses and allocates memory compared to
older versions.

We upgraded all our systems from 8.3 to 9.1 a couple of weeks ago, and
we have experienced a radical change in how our servers make use of
memory. How memory is allocated has become more unstable and the swap
usage has increased dramatically.

The pattern that we have started seeing is:

* Sudden decrease of swap when running backup/vacuum+analyze jobs
* Full use of cached memory when running backup/vacuum+analyze jobs
* Sudden increase of swap and unused memory when backup/vacuum+analyze
jobs are finnished.
* Progressive decrease of swap during the day.


Here is a list of things about this upgrade to version 9.1 that can be
interesting when analyzing this change of behavior:

* The servers are running the samme OS version and linux kernel as
with 8.3.

* We are running the same values for parameters related to memory
allocation as we used in 8.3.

* We are running the same backups and maintenance jobs as with version
8.3. These jobs are running at the exactly same time as with 8.3.

* Backups (PITR, pg_dumps) and maintenances (vacuum, analyze) jobs are
executed between midnight and early morning.

* We run several postgreSQL clusters per server, running in different
IPs and disks.

* We have not seen any significant change in how databases are
used/accessed after the upgrade to 9.1.

* We upgraded in the first time from 8.3.12 to 9.1.2, but because this
bug: http://archives.postgresql.org/pgsql-bugs/2011-12/msg00068.php
we had to downgrade to 9.1.1. We thought in the begynning that our
memory problems were related to this bug, but everything is the same
with 9.1.1.

* A couple of days ago we decreased the values of maintenance_work_mem
and work_mem over a 50% in relation to values used with 8.3. The only
change we have seen is even more unused memory after backup/vacuum
+analyze jobs are finnished.

Here you have some graphs that can help to get a picture about what we
are talking about:

* Overview of how memory use changed in one of our servers after the
upgrade in the begynning og week 49:
http://folk.uio.no/rafael/upgrade_to_9.1/server-1/memory-month.png
http://folk.uio.no/rafael/upgrade_to_9.1/server-1/memory-year.png

* We could think that all this happens because we are running to much
in one server. Here are some graphs from a server with 30GB+ running
only one postgres cluster (shared_memory = 6GB,
maintenance_work_memory = 512MB, work_mem = 32MB) for a couple of days:

http://folk.uio.no/rafael/upgrade_to_9.1/server-2/memory-week.png

The memory pattern is the same even when running only one postgres
cluster in a server with enough memory.

Any ideas about why this dramatic change in memory usage when the only
thing apparently changed from our side is the postgres version?

Thanks in advance for any help.

regards,
- -- 
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7vUpYACgkQBhuKQurGihTvjACff5J08pNJuRDgkegYdtQ5zp52
GeoAnRaaU+F/C/udQ7lMl/TkvRKX2WnP
=VcDk
-END PGP SIGNATURE-

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Dramatic change in memory usage with version 9.1

2011-12-19 Thread k...@rice.edu
Wow, upgrading 3 major releases at a go. :) It would probably be
useful to use the helpful:

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

to get the information that is needed to the right people.

Regards,
Ken

On Mon, Dec 19, 2011 at 04:04:54PM +0100, Rafael Martinez wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Hello
 
 I am sending this email to ask if anyone has noticed a change in how
 a server running postgreSQL 9.1 uses and allocates memory compared to
 older versions.
 
 We upgraded all our systems from 8.3 to 9.1 a couple of weeks ago, and
 we have experienced a radical change in how our servers make use of
 memory. How memory is allocated has become more unstable and the swap
 usage has increased dramatically.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Dramatic change in memory usage with version 9.1

2011-12-19 Thread Marti Raudsepp
On Mon, Dec 19, 2011 at 17:04, Rafael Martinez r.m.guerr...@usit.uio.no wrote:
 * Sudden decrease of swap when running backup/vacuum+analyze jobs

Do you know for certain that this memory use is attributed to
vacuum/analyze/backup, or are you just guessing? You should isolate
whether it's the vacuum or a backup process/backend that takes this
memory.

Do you launch vacuum/analyze manually or are you just relying on autovacuum?
How many parallel vacuum jobs are there?
What's your autovacuum_max_workers set to?
How large is your database?
How did you perform the upgrade -- via pg_upgrade or pg_dump?

 Any ideas about why this dramatic change in memory usage when the only
 thing apparently changed from our side is the postgres version?

Well, for one, there have been many planner changes that make it use
memory more aggressively, these probably being the most significant:
* Materialize for nested loop queries in 9.0:
http://rhaas.blogspot.com/2010/04/materialization-in-postgresql-90.html
* Hash join usage for RIGHT and FULL OUTER JOINs in 9.0

However, none of these would apply to vacuum, analyze or backups.

Regards,
Marti

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] OOM-killer issue with a specific query

2011-12-19 Thread nabble . 30 . miller_2555
Hi - I'm running into an OOM-killer issue when running a specific query (no
virtual machine running) and, based on researching the issue, I can
probably fix by making the following sysctl adjustments:
vm.overcommit_memory = 2
vm.overcommit_ratio = 0
However, I am perplexed as to why I am running into the issue in the first
place. The machine (running Linux 2.6.34.7-61.fc13.x86_64) is dedicated to
Postgres (v9.0.0 [RPM package: postgresql90-9.0.0-1PGDG.fc13.1.x86_64]) and
the following memory usage is pretty typical for the system (via top):
Mem:   8121992k total,  2901960k used,  5220032k free,   237408k buffers
Swap:  1048572k total,   235940k used,   812632k free,  2053768k cached
Under steady-state conditions, the following shows the virtual memory size
for postgres backend processes:
 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
8506 postgres  20   0 2327m 3084 1792 S  0.0  0.0   0:00.33 postgres
8504 postgres  20   0 2326m  14m  13m S  0.0  0.2   0:01.32 postgres
8505 postgres  20   0 2326m  728  452 S  0.0  0.0   0:00.91 postgres
3582 postgres  20   0 2325m  54m  53m S  0.0  0.7   0:02.03 postgres
My current relevant postgresql.conf settings are the following:
shared_buffers = 2100MB
temp_buffers = 8MB
work_mem = 32MB
maintenance_work_mem = 16MB
max_stack_depth = 2MB
constraint_exclusion = partition
When executing the query, I've been watching the top activity, sorted by
resident memory. Upon execution, no other processes appear to take
additional resident memory, except a postgres backend process servicing the
query, which goes to +6Gb (triggering the OOM-killer). Given the settings
in postgresql.conf, and my anecdotal understanding of Postgres memory
management functions, I am uncertain why Postgres exhausts physical memory
instead of swapping to temporary files. Do I need to lower my work_mem
setting since the subquery involves a partitioned table, causing a
multiplier effect to the memory used (I have tried per-connection settings
of 6MB)? Would tweaking  query planning settings help?

Thanks in advance!

If it helps, I have included the query (with column names aliased to their
data type), a brief description of the applicable table's contents, and an
abridged copy of the EXPLAIN ANALYZE output

SELECT bigint, date, text
FROM tableA AS A
WHERE A.boolean = 'true' AND
(A.text = 'abc' OR A.text = 'xyz') AND
A.bigint NOT IN (SELECT bigint FROM tableB)
ORDER BY A.date DESC;

tableA:
- total table contains ~11 million records (total width: 109 bytes)
- partitioned by month (180 partitions)
- each table partition contains ~100k records
tableB:
- total table contains ~400k records (total width: 279 bytes)
- partitioned by month (96 partitions)
- each table partition contains ~30k records


EXPLAIN ANALYZE output:
Note: could not produce output for exact query due to OOM-killer, but
ran query by limiting the subquery to the first 50 results. The planner
iterates over all partitions, but only the first two partitions are noted
for brevity.


QUERY PLAN
---
 Limit  (cost=451279.67..451279.70 rows=10 width=55) (actual
time=18343.085..18343.090 rows=10 loops=1)
   -  Sort  (cost=451279.67..456398.37 rows=2047480 width=55) (actual
time=18343.083..18343.087 rows=10 loops=1)
 Sort Key: A.Date
 Sort Method:  top-N heapsort  Memory: 26kB
 -  Result  (cost=1.21..407034.37 rows=2047480 width=55) (actual
time=0.793..17014.726 rows=4160606 loops=1)
   -  Append  (cost=1.21..407034.37 rows=2047480 width=55)
(actual time=0.792..16119.298 rows=4160606 loops=1)
 -  Seq Scan on tableA A  (cost=1.21..19.08 rows=1
width=44) (actual time=0.002..0.002 rows=0 loops=1)
   Filter: (boolean AND (NOT (hashed SubPlan 1))
AND ((text = 'abc'::text) OR (text = 'xyz'::text)))
   SubPlan 1
 -  Limit  (cost=0.00..1.08 rows=50 width=8)
(actual time=0.010..0.054 rows=50 loops=210)
   -  Result  (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
 -  Append  (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
   -  Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0
loops=210)
   -  Seq Scan on
tableB_201201 tableB  (cost=0.00..15.30 rows=530 width=8) (actual
time=0.000..0.000 rows=0 loops=210)
   -  Seq Scan on
tableB_201112 tableB  (cost=0.00..251.25 rows=12125 width=8) (actual
time=0.006..0.019 rows=50 loops=210)