Re: [PERFORM] queries with subquery constraints on partitioned tables not optimized?
Tom Lane t...@sss.pgh.pa.us writes: Davor J. dav...@live.com writes: Now, if one takes a subquery for 1, the optimizer evaluates it first (let's say to 1), but then searches for it (sequentially) in every partition, which, for large partitions, can be very time-consuming and goes beyond the point of partitioning. No, the optimizer doesn't evaluate it first. Subqueries aren't ever assumed to reduce to constants. (If you actually do have a constant expression, why don't you just leave out the word SELECT?) It's easy to experience the same problem with a JOIN you'd want to happen at the partition level that the planner will apply on the Append node. I have yet to figure out if 8.4 is smarter about this; meanwhile I'm using array tricks to force the push-down. WHERE ... AND service = ANY ((SELECT array_accum(id) FROM services WHERE x=281) || (SELECT array_accum(id) FROM services WHERE y=281)) It happens that I need the array concatenation more than the = ANY operator (as compared to IN), so I also have queries using = ANY ('{}'::int[] || (SELECT array_accum(x) ...)) to really force the planner into doing the join in the partitions rather than after the Append has taken place. Regards, -- dim PS: If you're interested in complete examples, I'll be able to provide them in private. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] System overload / context switching / oom, 8.3
On Tue, 2 Feb 2010, Rob wrote: pg 8.3.9, Debian Etch, 8gb ram, quadcore xeon, megaraid (more details at end) ~240 active databases, 800+ db connections via tcp. Linux 2.6.18-6-686-bigmem #1 SMP Thu Nov 5 17:30:05 UTC 2009 i686 GNU/Linux (Debian Etch) 8 GB RAM 4 Quad Core Intel(R) Xeon(R) CPU E5440 @ 2.83GHz stepping 06 My advice? 1. Switch to a 64-bit operating system and Postgres. Debian provides that, and it works a charm. You have a 64-bit system, so why not use it? 2. Buy more RAM. Think about it - you have 800 individual processes running on your box, and they will all want their own process space. To be honest, I'm impressed that the current machine works at all. You can get an idea of how much RAM you might need by multiplying the number of connections by (work_mem + about 3MB), and adding on shared_buffers. So even when the system is idle you're currently burning 3200MB just sustaining 800 processes - more if they are actually doing something. 3. Try to reduce the number of connections to the database server. 4. Think about your work_mem. Finding the correct value for you is going to be a matter of testing. Smaller values will cause large queries to run slowly, while larger values carry the danger of driving the system to swap and OOM. Matthew -- A good programmer is one who looks both ways before crossing a one-way street. Considering the quality and quantity of one-way streets in Cambridge, it should be no surprise that there are so many good programmers there.
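Matthew's rule of thumb can be sketched in a few lines (a rough illustration only: the 3 MB per-backend overhead is his approximate figure, and 1 MB is the default work_mem in 8.3; real usage varies per workload):

```python
# Rough RAM estimate following the rule of thumb above:
#   connections * (work_mem + ~3 MB per-backend overhead) + shared_buffers
# The 3 MB overhead is an approximation from the post, not a measured value.

def estimate_mb(connections, work_mem_mb, shared_buffers_mb, overhead_mb=3):
    """Return an approximate worst-case RAM footprint in MB."""
    return connections * (work_mem_mb + overhead_mb) + shared_buffers_mb

# 800 backends at the 8.3 default work_mem of 1 MB, ignoring shared_buffers,
# reproduces the ~3200 MB "idle" figure quoted in the post.
print(estimate_mb(800, work_mem_mb=1, shared_buffers_mb=0))  # 3200
```

Each backend that actually runs a sort or hash can use up to work_mem (and complex queries can use several such allocations), so this is a floor, not a ceiling.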
[PERFORM] Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
On Tue, Feb 2, 2010 at 7:45 PM, Robert Haas robertmh...@gmail.com wrote: I think you're probably right, but it's not clear what the new name should be until we have a comment explaining what the function is responsible for. So I wrote some comments but wasn't going to repost the patch with the unchanged name without explanation... But I think you're right, though I was looking at it the other way around. I want to have an API for a two-stage sync, and of course if I do that I'll comment it to explain that clearly. The gist of the comments was that the function is preparing to fsync: to initiate the i/o early and allow the later fsync to be fast -- but also at the same time have the beneficial side-effect of avoiding cache poisoning. It's not clear that the two are necessarily linked though. Perhaps we need two separate APIs, though it'll be hard to keep them separate on all platforms. -- greg
[PERFORM] Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
On 02/03/10 12:53, Greg Stark wrote: On Tue, Feb 2, 2010 at 7:45 PM, Robert Haas robertmh...@gmail.com wrote: I think you're probably right, but it's not clear what the new name should be until we have a comment explaining what the function is responsible for. So I wrote some comments but wasn't going to repost the patch with the unchanged name without explanation... But I think you're right, though I was looking at it the other way around. I want to have an API for a two-stage sync, and of course if I do that I'll comment it to explain that clearly. The gist of the comments was that the function is preparing to fsync: to initiate the i/o early and allow the later fsync to be fast -- but also at the same time have the beneficial side-effect of avoiding cache poisoning. It's not clear that the two are necessarily linked though. Perhaps we need two separate APIs, though it'll be hard to keep them separate on all platforms. I vote for two separate APIs - sure, there will be some unfortunate overlap for most unixoid platforms, but it's surely better to allow adding more platforms later at a centralized place than having to analyze every place where the API is used. Andres
[PERFORM] Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
On Wed, Feb 3, 2010 at 6:53 AM, Greg Stark gsst...@mit.edu wrote: On Tue, Feb 2, 2010 at 7:45 PM, Robert Haas robertmh...@gmail.com wrote: I think you're probably right, but it's not clear what the new name should be until we have a comment explaining what the function is responsible for. So I wrote some comments but wasn't going to repost the patch with the unchanged name without explanation... But I think you're right, though I was looking at it the other way around. I want to have an API for a two-stage sync, and of course if I do that I'll comment it to explain that clearly. The gist of the comments was that the function is preparing to fsync: to initiate the i/o early and allow the later fsync to be fast -- but also at the same time have the beneficial side-effect of avoiding cache poisoning. It's not clear that the two are necessarily linked though. Perhaps we need two separate APIs, though it'll be hard to keep them separate on all platforms. Well, maybe we should start with a discussion of what kernel calls you're aware of on different platforms and then we could try to put an API around it. I mean, right now all you've got is POSIX_FADV_DONTNEED, so given just that I feel like the API could simply be pg_dontneed() or something. It's hard to design a general framework based on one example. ...Robert
[PERFORM] Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
On 02/03/10 14:42, Robert Haas wrote: On Wed, Feb 3, 2010 at 6:53 AM, Greg Stark gsst...@mit.edu wrote: [...] Perhaps we need two separate APIs, though it'll be hard to keep them separate on all platforms. Well, maybe we should start with a discussion of what kernel calls you're aware of on different platforms and then we could try to put an API around it. In Linux there is sync_file_range. On newer POSIX-ish systems one can emulate that with mmap() and msync() (in batches, obviously). No idea about Windows. Andres
Re: [PERFORM] some problems when i use postgresql 8.4.2 in my projects .
2010/2/2 wyx6...@sina.com: the unique constraints actually kill concurrent write transactions: when concurrent inserts violate the unique constraints, they block each other. I tested this in Oracle 10g; it has the same behaviour. I think this may be reasonable because the unique check must be a serializable check. To resolve this problem I do the unique check in the application where possible, but in a highly concurrent environment this is not a good way. You may find that your way isn't actually very reliable, and that making it reliable will be very, very much harder (and likely no faster) than letting PostgreSQL do it. ...Robert
[PERFORM] Re: Optimizing Postgresql server and FreeBSD for heavy read and writes
Forgot to add that I am using Postgres 8.4.2 from the default ports of FreeBSD. With regards Amitabh Kant On Wed, Feb 3, 2010 at 8:40 PM, Amitabh Kant amitabhk...@gmail.com wrote: Hello I have a server dedicated for Postgres with the following specs: RAM 16GB, 146GB SAS (15K) x 4 - RAID 10 with BBU, Dual Xeon E5345 @ 2.33GHz OS: FreeBSD 8.0 It runs multiple (approx 10) databases ranging from 500MB to over 24 GB in size. All of them are of the same structure, and almost all of them have very heavy reads and writes. pgtune (http://pgfoundry.org/projects/pgtune/) suggests the settings to be changed as: maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03 checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03 effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03 work_mem = 160MB # pg_generate_conf wizard 2010-02-03 wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03 checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03 shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03 max_connections = 100 # pg_generate_conf wizard 2010-02-03 While this gives me the changes for postgresql.conf, I am not sure of the changes that I need to make in FreeBSD to support such large memory allocations. The last time I tried, Postgres refused to start and I had to fall back to the default settings. I would appreciate it if somebody could point out the sysctl/loader.conf settings that I need to have in FreeBSD. With regards Amitabh Kant
Re: [PERFORM] some problems when i use postgresql 8.4.2 in my projects .
wyx6...@sina.com wrote: after shaming, I think I should pick out some of my points: the unique constraints actually kill concurrent write transactions: when concurrent inserts violate the unique constraints, they block each other. I tested this in Oracle 10g; it has the same behaviour. I think this may be reasonable because the unique check must be a serializable check. To resolve this problem I do the unique check in the application where possible, but in a highly concurrent environment this is not a good way. How can you enforce uniqueness in the application? If you implement it correctly, it will take considerably longer than letting PostgreSQL do it. Even with some kind of magic I could not imagine how you could implement a unique constraint in the application and guarantee uniqueness while at the same time being faster than the RDBMS. Leo
Re: [PERFORM] queries with subquery constraints on partitioned tables not optimized?
On Tue, Feb 2, 2010 at 7:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Davor J. dav...@live.com writes: Now, if one takes a subquery for 1, the optimizer evaluates it first (let's say to 1), but then searches for it (sequentially) in every partition, which, for large partitions, can be very time-consuming and goes beyond the point of partitioning. No, the optimizer doesn't evaluate it first. Subqueries aren't ever assumed to reduce to constants. (If you actually do have a constant expression, why don't you just leave out the word SELECT?) regards, tom lane If you don't have a constant expression, you can either explicitly loop in the calling code or in a function, or you can index the key in all the subtables. The index isn't really optimal, but it gets the job done. Nik
Re: [PERFORM] Queries within a function
2010/2/2 Mridula Mahadevan mmahade...@stratify.com Hi, I am running a bunch of queries within a function, creating some temp tables and populating them. When the data exceeds say, 100k the queries start getting really slow and timeout (30 min). When these are run outside of a transaction (in auto commit mode), they run in a few seconds. Any ideas on what may be going on and any postgresql.conf parameters etc that might help? Thanks Have you tried to analyze the temp tables after you've populated them? Because AFAIK it won't do it automatically for tables created, filled and then used in the same transaction.
Re: [PERFORM] Re: Optimizing Postgresql server and FreeBSD for heavy read and writes
On Wed, 2010-02-03 at 20:42 +0530, Amitabh Kant wrote: Forgot to add that I am using Postgres 8.4.2 from the default ports of FreeBSD. start with this page http://www.postgresql.org/docs/8.4/static/kernel-resources.html
Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes
On 02/03/10 16:10, Amitabh Kant wrote: Hello I have a server dedicated for Postgres with the following specs: RAM 16GB, 146GB SAS (15K) x 4 - RAID 10 with BBU, Dual Xeon E5345 @ 2.33GHz OS: FreeBSD 8.0 If you really do have heavy read and write load on the server, nothing will save you from the bottleneck of having only 4 drives in the system (or more accurately: adding more memory will help reads, but nothing helps writes except more drives or faster (SSD) drives). If you can, add another 2 drives in RAID 1 and move+symlink the pg_xlog directory to the new array. maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03 checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03 effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03 work_mem = 160MB # pg_generate_conf wizard 2010-02-03 wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03 checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03 shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03 max_connections = 100 # pg_generate_conf wizard 2010-02-03 I would appreciate if somebody could point out the sysctl/loader.conf settings that I need to have in FreeBSD. Firstly, you need to run a 64-bit version (amd64) of FreeBSD. In /boot/loader.conf you will probably need to increase the number of sysv ipc semaphores: kern.ipc.semmni=512 kern.ipc.semmns=1024 This depends mostly on the number of connections allowed to the server. The example values I gave above are more than enough, but since this is a boot-only tunable it is expensive to modify later. In /etc/sysctl.conf you will need to increase the shared memory sizes, e.g. for a 3900 MB shared_buffers: kern.ipc.shmmax=4089446400 This is the maximum shared memory segment size, in bytes. kern.ipc.shmall=105 This is the maximum amount of memory allowed to be used as sysv shared memory, in 4 kB pages. If the database contains many objects (tables, indexes, etc.) 
you may need to increase the maximum number of open files and the amount of memory for the directory list cache: kern.maxfiles=16384 vfs.ufs.dirhash_maxmem=4194304 If you estimate you will have large sequential reads on the database, you should increase read-ahead count: vfs.read_max=32 Be sure that soft-updates is enabled on the file system you are using for data. Ignore all Linux-centric discussions about problems with journaling and write barriers :) All settings in /etc/sysctl.conf can be changed at runtime (individually or by invoking /etc/rc.d/sysctl restart), settings in loader.conf are boot-time only.
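The shared-memory sysctls above follow directly from the shared_buffers target. A small sketch of the arithmetic (the 3900 MB figure and 4 kB page size come from the post above; the derived shmall is the formula's result, since the value in the post appears truncated, and any headroom for other sysv consumers is an assumption):

```python
# Derive FreeBSD sysv shared memory sysctls from a shared_buffers target.
# kern.ipc.shmmax is in bytes; kern.ipc.shmall is in 4 kB pages.

PAGE_SIZE = 4096  # bytes per sysv shm page, as stated above

def shm_sysctls(shared_buffers_mb, headroom_mb=0):
    """Return the two sysctl values for a given shared_buffers size in MB."""
    total_bytes = (shared_buffers_mb + headroom_mb) * 1024 * 1024
    return {
        "kern.ipc.shmmax": total_bytes,
        "kern.ipc.shmall": total_bytes // PAGE_SIZE,
    }

# The 3900 MB example above reproduces the shmmax value quoted in the post.
print(shm_sysctls(3900))
# {'kern.ipc.shmmax': 4089446400, 'kern.ipc.shmall': 998400}
```

In practice shmmax must be at least shared_buffers plus PostgreSQL's other shared structures, so leaving some headroom above the raw shared_buffers size is prudent.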
Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes
On 2/3/2010 9:10 AM, Amitabh Kant wrote: Hello I have a server dedicated for Postgres with the following specs: RAM 16GB, 146GB SAS (15K) x 4 - RAID 10 with BBU, Dual Xeon E5345 @ 2.33GHz OS: FreeBSD 8.0 It runs multiple (approx 10) databases ranging from 500MB to over 24 GB in size. All of them are of the same structure, and almost all of them have very heavy reads and writes. With regards Amitabh Kant What problems are you having? Is it slow? Is there something you are trying to fix, or is this just the first tune up? memory allocations. The last time I tried, Postgres refused to start and I had to fall back to the default settings. It's probably upset about the amount of shared mem. There is probably a way in BSD to set the max amount of shared memory available. A quick Google search turned up: kern.ipc.shmmax Dunno if that's right. When you try to start PG, if it cannot allocate enough shared mem it'll spit out an error message into its log saying how much it tried to allocate. Check: http://archives.postgresql.org/pgsql-admin/2004-06/msg00155.php maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03 checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03 effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03 work_mem = 160MB # pg_generate_conf wizard 2010-02-03 wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03 checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03 shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03 max_connections = 100 # pg_generate_conf wizard 2010-02-03 Some of these seem like too much. I'd recommend starting with one or two and see how it runs. Then increase if you're still slow. Start with effective_cache_size, shared_buffers and checkpoint_segments. Wait until very last to play with work_mem and maintenance_work_mem. -Andy
Re: [PERFORM] Slow-ish Query Needs Some Love
On 2/3/2010 11:17 AM, Matt White wrote: On Feb 2, 1:11 pm, a...@squeakycode.net (Andy Colson) wrote: On 2/2/2010 1:03 PM, Matt White wrote: On Feb 2, 6:06 am, Edgardo Portale gportal2...@yahoo.com wrote: On 2010-02-02, Matt White mattw...@gmail.com wrote: I have a relatively straightforward query that by itself isn't that slow, but we have to run it up to 40 times on one webpage load, so it needs to run much faster than it does. Here it is: SELECT COUNT(*) FROM users, user_groups WHERE users.user_group_id = user_groups.id AND NOT users.deleted AND user_groups.partner_id IN (partner_id_1, partner_id_2); The structure is partners have user groups which have users. In the test data there are over 200,000 user groups and users but only ~3000 partners. Anyone have any bright ideas on how to speed this query up? Can you avoid running it 40 times, maybe by restructuring the query (or making a view) along the lines of the following and adding some logic to your page? SELECT p.partner_id, ug.user_group_id, u.id, count(*) FROM partners p LEFT JOIN user_groups ug ON ug.partner_id=p.partner_id LEFT JOIN users u ON u.user_group_id=ug.id WHERE NOT u.deleted GROUP BY 1,2,3 ; Thanks for the suggestion. The view didn't seem to speed things up. Perhaps we can reduce the number of times it's called, we'll see. Any additional ideas would be helpful. Thanks. I agree with Edgardo, I think the biggest time saver will be reducing trips to the database. But... do you have an index on users.user_group_id? Does rewriting it change the plan any? SELECT COUNT(*) FROM users inner join user_groups on (users.user_group_id = user_groups.id) where NOT users.deleted AND user_groups.partner_id IN (partner_id_1, partner_id_2); And... it looks like the row guestimate is off a little: Index Scan using user_groups_partner_id_idx on user_groups (cost=0.00..133.86 rows=3346 width=8) (actual time=0.049..96.992 rows=11 loops=2) It guessed 3,346 rows, but actually got 100,001. Have you run an analyze on it? 
If so, maybe bumping up the stats might help? -Andy Andy, I have run analyze, see my query plan in my original post. You'll have to forgive me for being a bit of a Postgres noob, but what do you mean by bumping up the stats? That's not what I meant. explain analyze select... is what you did, and correct. What I meant was analyze user_groups. see: http://www.postgresql.org/docs/8.4/interactive/sql-analyze.html an analyze will make PG look at a table and calc stats on it, so it can make better guesses. By default analyze only looks at a few rows (well, a small percent of rows) and makes guesses about the entire table based on those rows. If it guesses wrong, sometimes you need to tell it to analyze more rows (ie. a bigger percentage of the table). By bumping the stats I was referring to this: http://wiki.postgresql.org/wiki/Planner_Statistics I have never had to do it, so don't know much about it. It may or may not help. Just thought it was something you could try. -Andy
Re: [PERFORM] Queries within a function
Thank you all, You were right on the analyze. Insert statement with an aggregated subquery had a problem on an empty table. I had to change the queries to do a simple insert then analyze on the table followed by an update with an aggregated sub query. That goes thru very fast. -mridula From: Віталій Тимчишин [mailto:tiv...@gmail.com] Sent: Wednesday, February 03, 2010 8:11 AM To: Mridula Mahadevan Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Queries within a function 2010/2/2 Mridula Mahadevan mmahade...@stratify.com Hi, I am running a bunch of queries within a function, creating some temp tables and populating them. When the data exceeds say, 100k the queries start getting really slow and timeout (30 min). When these are run outside of a transaction (in auto commit mode), they run in a few seconds. Any ideas on what may be going on and any postgresql.conf parameters etc that might help? Thanks Have you tried to analyze the temp tables after you've populated them? Because AFAIK it won't do it automatically for tables created, filled and then used in the same transaction.
Re: [PERFORM] System overload / context switching / oom, 8.3
On Tue, Feb 2, 2010 at 3:47 PM, Andy Colson a...@squeakycode.net wrote: effective_cache_size = 5000MB I see you're running 32bit, but with bigmem support, but still, one process is limited to 4gig. You'd make better use of all that ram if you switched to 64bit. And this cache, I think, would be limited to 4gig. Just to be clear, effective_cache_size does not allocate any memory of any kind, in any way, ever... ...Robert
Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes
On Wed, Feb 3, 2010 at 10:10 AM, Amitabh Kant amitabhk...@gmail.com wrote: work_mem = 160MB # pg_generate_conf wizard 2010-02-03 Overall these settings look sane, but this one looks like an exception. That is an enormous value for that parameter... ...Robert
Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes
Robert Haas wrote: On Wed, Feb 3, 2010 at 10:10 AM, Amitabh Kant amitabhk...@gmail.com wrote: work_mem = 160MB # pg_generate_conf wizard 2010-02-03 Overall these settings look sane, but this one looks like an exception. That is an enormous value for that parameter... Yeah, I think I need to retune the suggestions for that parameter. The idea behind the tuning profile used in the web and OLTP setups is that you're unlikely to have all the available connections doing something involving sorting at the same time with those workloads, and when it does happen you want it to use the fastest approach possible even if that takes more RAM so the client waiting for a response is more likely to get one on time. That's why the work_mem figure in those situations is set very aggressively: total_mem / connections, so on a 16GB server that comes out to the 160MB seen here. I'm going to adjust that so that it's capped a little below (total_mem - shared_buffers) / connections instead. pgtune just got a major bit of refactoring recently from Matt Harrison to make it more Python-esque, and I'll be pushing toward an official 1.0 with all the major loose ends cleaned up and an adjusted tuning model that will be available before 9.0 ships. I'm seeing enough people interested in it now to justify putting another block of work into improving it. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
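The two heuristics Greg describes can be compared numerically (a sketch only, using the 16 GB / 100-connection / 3840 MB shared_buffers example from this thread; pgtune's actual rounding and capping logic may differ):

```python
# Compare pgtune's aggressive work_mem heuristic (total_mem / connections)
# with the adjusted one ((total_mem - shared_buffers) / connections).
# Integer division here is illustrative, not pgtune's exact rounding.

def work_mem_aggressive(total_mem_mb, connections):
    return total_mem_mb // connections

def work_mem_adjusted(total_mem_mb, shared_buffers_mb, connections):
    return (total_mem_mb - shared_buffers_mb) // connections

# 16 GB server, 100 connections, 3840 MB shared_buffers:
print(work_mem_aggressive(16384, 100))        # 163 (~ the 160MB seen here)
print(work_mem_adjusted(16384, 3840, 100))    # 125
```

The point of the adjustment is that memory already pinned by shared_buffers can't also be handed out as per-sort work_mem, so the cap should come out of what remains.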
Re: [PERFORM] System overload / context switching / oom, 8.3
Andy Colson wrote: work_mem = 32MB maintenance_work_mem = 64MB if you have lots and lots of connections, you might need to cut these down? definitely, work_mem is the main focus. If I understand correctly, the 64MB maintenance_work_mem is per vacuum task, and on this system there are 3 autovacuums. I was wondering whether, with this many databases, it might make sense to decrease maintenance_work_mem significantly and start up more autovacuums. Yes, also moving databases to other servers in order to decrease the number of connections. effective_cache_size = 5000MB I see you're running 32bit, but with bigmem support, but still, one process is limited to 4gig. You'd make better use of all that ram if you switched to 64bit. And this cache, I think, would be limited to 4gig. All of the cache is being used because the operating system kernel is built with the memory extensions to access outside the 32bit range. This is the cache size reported by free(1). However, there may be advantages to switching to 64bit. The oom-killer is kicking in, at some point, so somebody is using too much ram. There should be messages or logs or something, right? (I've never enabled the oom stuff so don't know much about it). But the log messages might be helpful. Also, do you know what the oom max memory usage is set to? You said: oom_adj -17. vm.overcommit_memory set to 2, but at this time vm.overcommit_ratio was still at 50 (has since been changed to 90, should this be 100?) Oh man. I encourage everyone to find out what /proc/pid/oom_adj means. You have to set this to keep the Linux oom-killer from doing a kill -9 on the postgres postmaster. On Debian: echo -17 > /proc/$(cat /var/run/postgresql/8.3-main.pid)/oom_adj This is my experience with oom-killer. After putting -17 into /proc/pid/oom_adj, oom-killer seemed to kill one of the database connection processes. Then the postmaster attempted to shut down all processes because of possible shared memory corruption. The database then went into recovery mode. 
After stopping the database some of the processes were stuck and could not be killed. The operating system was rebooted and the database returned with no data loss. My earlier experience with oom-killer: If you don't have this setting in oom_adj, then it seems likely (certain?) that oom-killer kills the postmaster because of the algorithm oom-killer uses (called badness()), which adds children's process scores to their parents' scores. I don't know if sshd was killed, but I don't think anyone could log in to the OS. After rebooting there was a segmentation violation when trying to start the postmaster. I don't think that running pg_resetxlog with defaults is a good idea. My colleague who has been investigating the crash believes that we could probably have eliminated at least some of the data loss with more judicious use of pg_resetxlog. There was a discussion on the postgres lists about somehow having the postgres distribution include the functionality to set oom_adj on startup. To my knowledge, that's not in 8.3, so I wrote a script and init.d script to do this on Debian systems. As far as vm.overcommit_memory goes, there are three settings and most recommend setting it to 2 for postgres. However, this does not turn off oom-killer! You need to put -17 in /proc/pid/oom_adj whether you do anything about vm.overcommit_memory or not. We had vm.overcommit_memory set to 2 and oom-killer became active and killed the postmaster. Kind of off-topic, but a Linux kernel parameter that's often not set on database servers is elevator=deadline, which sets the io scheduling algorithm. The algorithm can be viewed/set at runtime, for example for the disk /dev/sdc in /sys/block/sdc/queue/scheduler. Rob
Re: [PERFORM] System overload / context switching / oom, 8.3
Rob Lemley wrote: There was a discussion on the postgres lists about somehow having the postgres distribution include the functionality to set oom_adj on startup. To my knowledge, that's not in 8.3, so I wrote a script and init.d script to do this on Debian systems. That's not in anything earlier than the upcoming 9.0 because the support code involved just showed up: http://archives.postgresql.org/pgsql-committers/2010-01/msg00169.php It was always possible to do this in an init script as you describe. The specific new feature added is the ability to remove client child processes from having that protection, so that they can still be killed normally. Basically, limiting the protection to just the process that you really need it on. The updated documentation for the new version has more details about this whole topic, useful to people running older versions too: http://developer.postgresql.org/pgdocs/postgres/kernel-resources.html Kind of off-topic, but a Linux kernel parameter that's often not set on database servers is elevator=deadline, which sets the io scheduling algorithm. The algorithm can be viewed/set at runtime, for example for the disk /dev/sdc in /sys/block/sdc/queue/scheduler. I've never seen a real-world PostgreSQL workload where deadline worked better than CFQ, and I've seen a couple where it was significantly worse. Playing with that parameter needs a heavy disclaimer that you should benchmark *your app* before and after changing it to make sure it was actually useful. Actually, three times: return to CFQ again afterwards, too, just to confirm it's not a "faster on the second run" effect. The important things to get right on Linux are read-ahead and reducing the size of the write cache--the latter being the more direct and effective way to improve the problem that the scheduler change happens to impact too. 
Those have dramatically more importance than sensible changes to the scheduler used (using the anticipatory scheduler on a server system or the no-op one on a desktop would be non-sensible changes). -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us