[PERFORM] Fw: Help me put 2 Gigs of RAM to use
Hello,

PostgreSQL has served us very well powering a busy national pet adoption website. Now I'd like to tune our setup further to get more out of the hardware. What I'm noticing is that while the FreeBSD server has 4 Gigs of memory, there are rarely ever more than 2 in use -- the memory use graphs as being rather constant. My goal is to make good use of those 2 Gigs of memory to improve performance and reduce CPU usage.

The server has 4 2.33 GHz processors in it, and RAIDed 15k RPM SCSI disks.

Here are some current memory-related settings from our postgresql.conf file. (We currently run 8.2, but are planning an upgrade to 8.4 soon.) Do you see any obvious suggestions for improvement? I find the file a bit hard to read because of the lack of units in the examples, but perhaps that's already been addressed in later versions.

  max_connections = 400       # Seems to be enough for us
  shared_buffers = 8192
  effective_cache_size = 1000
  work_mem = 4096
  maintenance_work_mem = 160MB

Thanks for your suggestions!

    Mark

[I tried to post this yesterday but didn't see it come through. This message is a second attempt.]

--
Mark Stosberg, Principal Developer
m...@summersault.com
Summersault, LLC  765-939-9301 ext 202
database driven websites
http://www.summersault.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Fw: Help me put 2 Gigs of RAM to use
On Thu, 10 Dec 2009, Mark Stosberg wrote:
> What I'm noticing is that while the FreeBSD server has 4 Gigs of memory, there are rarely ever more than 2 in use -- the memory use graphs as being rather constant. My goal is to make good use of those 2 Gigs of memory to improve performance and reduce the CPU usage.

I think you'll find that the RAM is already being used quite effectively as disc cache by the OS. It sounds like the server is actually set up pretty well. You may get slightly better performance by tweaking a thing here or there, but the server needs some OS disc cache to perform well.

> (We currently run 8.2, but are planning an upgrade to 8.4 soon).

Highly recommended.

> [I tried to post this yesterday but didn't see it come through. This message is a second attempt.]

The mailing list server will silently chuck any message whose subject starts with the word "help", just in case you're asking for help about managing the mailing list. The default behaviour is not to inform you that it has done so. It is highly annoying -- could a list admin please consider changing this?

Matthew

--
I would like to think that in this day and age people would know better than to open executables in an e-mail. I'd also like to be able to flap my arms and fly to the moon. -- Tim Mullen
Re: [PERFORM] big select is resulting in a large amount of disk writing by kjournald
Joseph S <j...@selectacast.net> wrote:
> I just installed a shiny new database server with pg 8.4.1 running on CentOS 5.4. After using slony to replicate over my database I decided to do some basic performance tests to see how spiffy my shiny new server is. This machine has 32G ram, over 31 of which is used for the system file cache.
>
> So I run "select count(*) from large_table" and I see in xosview a solid block of write activity. Runtime is 28125.644 ms for the first run. The second run does not show a block of write activity and takes 3327.441 ms.

As others have mentioned, this is due to hint bit updates, and doing an explicit VACUUM after the load and before you start using the database will avoid run-time issues. You also need statistics, so be sure to do VACUUM ANALYZE.

There is one other sneaky surprise awaiting you, however. Since this stuff was all loaded with a narrow range of transaction IDs, they will all need to be frozen at about the same time; so somewhere down the road, either during a routine database vacuum or possibly in the middle of normal operations, all of these rows will need to be rewritten *again* to change the transaction IDs used for managing MVCC to the special frozen value. We routinely follow a load with VACUUM FREEZE ANALYZE of the database to combine the update to freeze the tuples with the update to set the hint bits, and avoid this problem.

There has been some talk about possibly writing tuples in a frozen state with the hint bits already set if they are loaded in the same database transaction which creates the table, but I'm not aware of anyone currently working on this.

-Kevin
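The post-load sequence Kevin describes can be sketched as follows (the table name is hypothetical; the statements use standard 8.4 VACUUM syntax):

```sql
-- After the bulk load, before opening the database to normal traffic:
-- a single pass rewrites the tuples to set hint bits and freeze their
-- transaction IDs, and gathers planner statistics at the same time.
VACUUM FREEZE ANALYZE big_table;

-- Or, after a full restore, database-wide:
VACUUM FREEZE ANALYZE;
```

Without the FREEZE, the first sequential read after the load pays the hint-bit write cost, and a later anti-wraparound vacuum rewrites everything again to freeze it.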
Re: [PERFORM] Help me put 2 Gigs of RAM to use
Thanks for the response, Matthew.

On Thu, 10 Dec 2009, Mark Stosberg wrote:
> > What I'm noticing is that while the FreeBSD server has 4 Gigs of memory, there are rarely ever more than 2 in use -- the memory use graphs as being rather constant. My goal is to make good use of those 2 Gigs of memory to improve performance and reduce the CPU usage.
>
> I think you'll find that the RAM is already being used quite effectively as disc cache by the OS. It sounds like the server is actually set up pretty well. You may get slightly better performance by tweaking a thing here or there, but the server needs some OS disc cache to perform well.

As part of reviewing this status, it appears that the OS is only addressing 3 of the 4 Gigs of memory. We'll work on our FreeBSD setup to cure that. Here's how top reports the memory breakdown:

  Mem: 513M Active, 2246M Inact, 249M Wired, 163M Cache, 112M Buf, 7176K Free
  Swap: 9216M Total, 1052K Used, 9215M Free

So perhaps the OS disc cache is represented in the Inactive memory statistic? I suppose once we have the 4th Gig of memory actually available, that would all be going to the disk cache.

> > (We currently run 8.2, but are planning an upgrade to 8.4 soon).
>
> Highly recommended.

For performance improvements in particular?

    Mark
Re: [PERFORM] Fw: Help me put 2 Gigs of RAM to use
Mark Stosberg wrote:
> I find the file a bit hard to read because of the lack of units in the examples, but perhaps that's already been addressed in future versions.
>
>   max_connections = 400       # Seems to be enough for us
>   shared_buffers = 8192
>   effective_cache_size = 1000
>   work_mem = 4096
>   maintenance_work_mem = 160MB

It's already addressed in 8.2, as you can note by the fact that maintenance_work_mem is in there with an easy to read format. Guessing that someone either pulled in settings from an older version, or used some outdated web guide to get starter settings.

To convert the rest of them, you need to know what the unit for each parameter is. You can find that out like this:

  gsmith=# select name,setting,unit from pg_settings
             where name in ('shared_buffers','effective_cache_size','work_mem');

           name          | setting | unit
  ----------------------+---------+------
   effective_cache_size | 16384   | 8kB
   shared_buffers       | 4096    | 8kB
   work_mem             | 1024    | kB

So your shared_buffers setting is 8192 * 8kB = 64MB, effective_cache_size is 8MB, and work_mem is 4MB. The first and last of those are reasonable but on the small side, the last is...not. Increasing it won't actually use more memory on your server though, it will just change query plans -- so you want to be careful about increasing it too much in one shot.

The next set of stuff you need to know about general guidelines for server sizing is at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

You'd probably want to put shared_buffers at a higher level based on the amount of RAM on your server, but I'd suggest you tune the checkpoint parameters along with that -- just increasing the buffer space alone can cause problems rather than solve them if you're having checkpoints all the time.

--
Greg Smith, 2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com
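Greg's unit arithmetic can be checked with a few lines of Python (a sketch only: the helper name is made up, and the values are the ones from Mark's postgresql.conf together with the units pg_settings reports):

```python
# Convert raw pg_settings-style (setting, unit) pairs into megabytes,
# as needed to read the unit-less 8.2-era postgresql.conf values.
UNIT_BYTES = {"8kB": 8 * 1024, "kB": 1024, "MB": 1024 * 1024}

def setting_to_mb(setting, unit):
    """Return a parameter's size in MB given its pg_settings unit."""
    return setting * UNIT_BYTES[unit] / (1024 * 1024)

# Mark's values: shared_buffers = 8192 (unit 8kB),
# effective_cache_size = 1000 (unit 8kB), work_mem = 4096 (unit kB).
print(setting_to_mb(8192, "8kB"))   # shared_buffers       -> 64.0
print(setting_to_mb(1000, "8kB"))   # effective_cache_size -> 7.8125
print(setting_to_mb(4096, "kB"))    # work_mem             -> 4.0
```

This also shows why effective_cache_size = 1000 is so far off: it tells the planner the OS cache is under 8MB on a machine with gigabytes of it.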
Re: [PERFORM] Fw: Help me put 2 Gigs of RAM to use
On Thu, Dec 10, 2009 at 11:45 AM, Greg Smith <g...@2ndquadrant.com> wrote:
> So your shared buffers setting is 8192 * 8K = 64MB, effective_cache_size is 8MB, work_mem is 4MB. The first and last of those are reasonable but on the small side, the last is...not.

I believe that the second instance of the word "last" in that sentence should have been "middle", referring to effective_cache_size. Small values discourage the planner from using indices in certain situations.

...Robert
Re: [PERFORM] Load experimentation
Hi Andy,

Load is chugging along. We've optimized our postgres conf as much as possible but are seeing the inevitable I/O bottleneck. I had the same thought as you (converting inserts into COPYs) a while back, but unfortunately each file has many inserts into many different tables. Potentially I could rip through this with a little MapReduce job on 50-100 nodes, which is still something I might do.

One thought we are playing with was taking advantage of 4 x 414GB EBS devices in a RAID0 configuration. This would spread disk writes across 4 block devices.

Right now I'm wrapping about 1500 inserts in a transaction block. Since it's an I/O bottleneck, COPY statements might not give me much advantage. It's definitely a work in progress :)

Ben

On 09/12/2009 5:31 AM, Andy Colson wrote:
> On 12/07/2009 12:12 PM, Ben Brehmer wrote:
> > Hello All,
> >
> > I'm in the process of loading a massive amount of data (500 GB). After some initial timings, I'm looking at 260 hours to load the entire 500GB. 10 days seems like an awfully long time so I'm searching for ways to speed this up. The load is happening in the Amazon cloud (EC2), on an m1.large instance:
> > -7.5 GB memory
> > -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
> > -64-bit platform
> >
> > So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The modifications I have made are as follows:
> >   shared_buffers = 786432
> >   work_mem = 10240
> >   maintenance_work_mem = 6291456
> >   max_fsm_pages = 300
> >   wal_buffers = 2048
> >   checkpoint_segments = 200
> >   checkpoint_timeout = 300
> >   checkpoint_warning = 30
> >   autovacuum = off
> >
> > There are a variety of instance types available in the Amazon cloud (http://aws.amazon.com/ec2/instance-types/), including high memory and high CPU. High memory instance types come with 34GB or 68GB of memory. High CPU instance types have a lot less memory (7GB max) but up to 8 virtual cores. I am more than willing to change to any of the other instance types. Also, there is nothing else happening on the loading server. It is completely dedicated to the load.
> >
> > Any advice would be greatly appreciated.
> >
> > Thanks,
> > Ben
>
> I'm kind of curious, how goes the load? Is it done yet? Still looking at days'n'days to finish?
>
> I was thinking... If the .sql files are really nicely formatted, it would not be too hard to whip up a perl script to run as a filter to change the statements into COPYs. Each file would have to fill only one table, and only contain inserts, and all the insert statements would have to set the same fields. (And I'm sure there could be other problems.)
>
> Also, just for the load, did you disable fsync?
>
> -Andy
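Andy's filter idea can be sketched in Python instead of perl. This is an illustration only, under the same assumptions he lists: one table per file, one trivially formatted INSERT per line, no quoted strings -- a real dump file would need a proper SQL parser. The sample table and values are made up.

```python
import re

# Match the simplest possible INSERT shape: one statement per line,
# a bare VALUES list with no nested parentheses or string literals.
INSERT_RE = re.compile(r"INSERT INTO \w+ VALUES \((.*)\);")

def insert_lines_to_copy_rows(lines):
    """Convert simple INSERT lines into tab-separated rows for COPY."""
    rows = []
    for line in lines:
        m = INSERT_RE.match(line.strip())
        if m:
            # Split the VALUES list and join with tabs,
            # the default delimiter for COPY ... FROM STDIN.
            rows.append("\t".join(v.strip() for v in m.group(1).split(",")))
    return rows

sample = [
    "INSERT INTO pets VALUES (1, 42, 3);",
    "INSERT INTO pets VALUES (2, 17, 9);",
]
print(insert_lines_to_copy_rows(sample))  # ['1\t42\t3', '2\t17\t9']
```

The resulting rows could then be fed to a single COPY pets FROM STDIN, which avoids per-statement parse and planning overhead.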
[PERFORM] 8.4.1 ubuntu karmic slow createdb
Hey,

I've got a computer which runs both 8.3 and 8.4. To create a db it takes 4s for 8.3 and 9s for 8.4. I have many unit tests which create databases all of the time, and they now run much slower than on 8.3 -- and it seems to be much longer than it used to be, as I remember at one point considering creating a database an instantaneous thing. Does anyone on the list know why this is, and whether I can get it back to normal?

-Michael
Re: [PERFORM] 8.4.1 ubuntu karmic slow createdb
On Thursday 10 December 2009 21:41:08 Michael Clemmons wrote:
> I've got a computer which runs both 8.3 and 8.4. To create a db it takes 4s for 8.3 and 9s for 8.4. I have many unit tests which create databases all of the time, and they now run much slower than on 8.3, but it seems to be much longer than it used to be, as I remember at one point considering creating a database an instantaneous thing. Does anyone on the list know why this is and if I can get it back to normal?

Possibly you had fsync=off at the time?

Andres
Re: [PERFORM] 8.4.1 ubuntu karmic slow createdb
I'm not sure what that means; people in my office with slower HD speeds using 8.4 can create a db in 2s vs my 8-12s. Could using md5 instead of ident do it?

On Thu, Dec 10, 2009 at 4:56 PM, Andres Freund <and...@anarazel.de> wrote:
> Possibly you had fsync=off at the time?
>
> Andres
Re: [PERFORM] 8.4.1 ubuntu karmic slow createdb
Hi,

On Thursday 10 December 2009 23:01:08 Michael Clemmons wrote:
> I'm not sure what that means; people in my office with slower HD speeds using 8.4 can create a db in 2s vs my 8-12s.

- Possibly their config is different -- they could have disabled the fsync parameter, which makes the database no longer crash-safe but much faster in some circumstances.
- Possibly you have much data in your template1 database? You could check whether "CREATE DATABASE speedtest TEMPLATE template1;" takes more time than "CREATE DATABASE speedtest TEMPLATE template0;". You should issue both multiple times to ensure caching of the template database doesn't play a role.

> Could using md5 instead of ident do it?

Seems unlikely. Is starting psql near-instantaneous? Are you using createdb or are you issuing CREATE DATABASE ...?

Andres
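The template comparison Andres suggests could be run from psql roughly like this (the database name is his example; \timing prints each statement's elapsed time):

```sql
\timing
CREATE DATABASE speedtest TEMPLATE template1;
DROP DATABASE speedtest;
CREATE DATABASE speedtest TEMPLATE template0;
DROP DATABASE speedtest;
-- repeat each pair a few times so caching of the template settles
```

If the template1 variant is consistently slower, the extra time is the cost of copying whatever has accumulated in template1.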
Re: [PERFORM] Load experimentation
On 12/7/09 11:12 AM, Ben Brehmer <benbreh...@gmail.com> wrote:
> Thanks for the quick responses. I will respond to all questions in one email:
>
> COPY command: Unfortunately I'm stuck with INSERTs due to the nature of how this data was generated (Hadoop/MapReduce).

If you have control over the MapReduce output, you can have that output result files in a format that COPY likes. If you don't have any control over that, it's more complicated.

I use a final-pass Hadoop Map-only job to go over the output and insert into postgres directly from the job, using the

  INSERT INTO table VALUES (val1, val2, ...), (val1, val2, ...), ...

insert style from Java, with about 80 rows per insert statement and a single transaction for about a thousand of these. This was faster than batch inserts.

> On 07/12/2009 10:39 AM, Thom Brown wrote:
> > 2009/12/7 Kevin Grittner <kevin.gritt...@wicourts.gov>
> > > Ben Brehmer <benbreh...@gmail.com> wrote:
> > > > -7.5 GB memory
> > > > -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
> > > > -64-bit platform
> > > What OS?
> > > > (PostgreSQL 8.1.3)
> > > Why use such an antiquated, buggy version? Newer versions are faster.
> > > -Kevin
> >
> > I'd agree with trying to use the latest version you can. How are you loading this data? I'd make sure you haven't got any indices, primary keys, triggers or constraints on your tables before you begin the initial load, just add them after. Also use either the COPY command for loading, or prepared transactions. Individual insert commands will just take way too long.
> >
> > Regards
> >
> > Thom
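The multi-row INSERT shape Scott describes can be sketched like this (the table name is hypothetical, and proper parameter binding/escaping is omitted for brevity -- real code should not interpolate untrusted values):

```python
# Build multi-row INSERT statements of the shape Scott describes:
# roughly 80 rows per statement, with many statements then wrapped
# in a single transaction by the loading job.
def multi_row_inserts(rows, rows_per_stmt=80):
    """Group rows into INSERT ... VALUES (...), (...), ... statements."""
    stmts = []
    for i in range(0, len(rows), rows_per_stmt):
        chunk = rows[i:i + rows_per_stmt]
        values = ", ".join(
            "(%s)" % ", ".join(str(v) for v in row) for row in chunk
        )
        stmts.append("INSERT INTO big_table VALUES %s;" % values)
    return stmts

rows = [(i, i * 2) for i in range(200)]
stmts = multi_row_inserts(rows)
print(len(stmts))  # 3 statements: 80 + 80 + 40 rows
```

Batching this way amortizes per-statement parse/plan overhead, which is why it beats individual inserts even before COPY enters the picture.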
Re: [PERFORM] 8.4.1 ubuntu karmic slow createdb
In my limited experience, ext4 as presented by Karmic is not db friendly. I had to carve my swap partition into a swap partition and an xfs partition to get better db performance. Try fsync=off first, but if that doesn't work then try a mini xfs.

On Thu, Dec 10, 2009 at 5:09 PM, Andres Freund <and...@anarazel.de> wrote:
> - Possibly their config is different -- they could have disabled the fsync parameter, which makes the database no longer crash-safe but much faster in some circumstances.
> - Possibly you have much data in your template1 database? You could check whether "CREATE DATABASE speedtest TEMPLATE template1;" takes more time than "CREATE DATABASE speedtest TEMPLATE template0;". You should issue both multiple times to ensure caching of the template database doesn't play a role.
>
> Seems unlikely. Is starting psql near-instantaneous? Are you using createdb or are you issuing CREATE DATABASE ...?
>
> Andres
Re: [PERFORM] Load experimentation
On 12/10/09 3:29 PM, Scott Carey <sc...@richrelevance.com> wrote:
> I use a final-pass Hadoop Map-only job to go over the output and insert into postgres directly from the job, using the INSERT INTO table VALUES (val1, val2, ...), (val1, val2, ...), ... insert style from Java, with about 80 rows per insert statement and a single transaction for about a thousand of these. This was faster than batch inserts.

I should mention that the above is a bit off. There is an important caveat: each of these individual tasks might run twice in Hadoop (only one will finish -- speculative execution, and retry on error). To deal with this you can run each job inside a single transaction so that a failure will roll back, and you'll likely want to turn off speculative execution. Another option is to run only one map job, with no reduce, for this sort of work, in order to ensure duplicate data is not inserted.

We are inserting into a temp table named uniquely per chunk first (sometimes in parallel). Then, while holding a postgres advisory lock, we do a "SELECT * FROM temp INTO destination" type operation, which is fast.
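A minimal sketch of the per-chunk staging pattern Scott describes (table names and the advisory lock key are hypothetical):

```sql
-- Each load task stages its rows in its own uniquely named temp table.
CREATE TEMP TABLE chunk_0042 (LIKE destination);
-- ... COPY or INSERT the chunk's rows into chunk_0042 here ...

-- Then serialize the move into the real table on an advisory lock,
-- so a retried or speculative duplicate of the task cannot interleave.
SELECT pg_advisory_lock(42);
INSERT INTO destination SELECT * FROM chunk_0042;
SELECT pg_advisory_unlock(42);
```

The staging write is cheap and parallel; only the final bulk move is serialized, which keeps the lock hold time short.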