Re: [PERFORM] High context switches occurring
Sun hardware is a 4-CPU (8-core) v40z; the Dell is a 6850 quad Xeon (8 cores). Both have 16GB RAM and 2 internal drives: one drive has OS + data, the second has pg_xlog. RedHat AS 4.0 U2 64-bit on both servers, PG 8.1 64-bit RPMs.

Thanks,
Anjan

-Original Message-
From: Juan Casero [mailto:[EMAIL PROTECTED]
Sent: Monday, December 19, 2005 11:17 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High context switches occurring

Guys - help me out here as I try to understand this benchmark. What is the Sun hardware and operating system we are talking about here, and what is the Intel hardware and operating system? What was the Sun version of PostgreSQL compiled with? Gcc on Solaris (assuming SPARC) or Sun Studio? What was PostgreSQL compiled with on Intel? Gcc on Linux?

Thanks,
Juan

On Monday 19 December 2005 21:08, Anjan Dave wrote:
> Re-ran it 3 times on each host -
>
> Sun:
> -bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> number of clients: 10
> number of transactions per client: 3000
> number of transactions actually processed: 30000/30000
> tps = 827.810778 (including connections establishing)
> tps = 828.410801 (excluding connections establishing)
> real  0m36.579s
> user  0m1.222s
> sys   0m3.422s
>
> Intel:
> -bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> number of clients: 10
> number of transactions per client: 3000
> number of transactions actually processed: 30000/30000
> tps = 597.067503 (including connections establishing)
> tps = 597.606169 (excluding connections establishing)
> real  0m50.380s
> user  0m2.621s
> sys   0m7.818s
>
> Thanks,
> Anjan
>
> -Original Message-
> From: Anjan Dave
> Sent: Wed 12/7/2005 10:54 AM
> To: Tom Lane
> Cc: Vivek Khera; Postgresql Performance
> Subject: Re: [PERFORM] High context switches occurring
>
> Thanks for your inputs, Tom. I was going after high concurrent clients,
> but should have read this carefully -
>
>   -s scaling_factor
>       this should be used with -i (initialize) option.
>       number of tuples generated will be multiple of the
>       scaling factor. For example, -s 100 will imply 10M
>       (10,000,000) tuples in the accounts table.
>       default is 1. NOTE: scaling factor should be at least
>       as large as the largest number of clients you intend
>       to test; else you'll mostly be measuring update
>       contention.
>
> I'll rerun the tests.
>
> Thanks,
> Anjan
>
> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 06, 2005 6:45 PM
> To: Anjan Dave
> Cc: Vivek Khera; Postgresql Performance
> Subject: Re: [PERFORM] High context switches occurring
>
> "Anjan Dave" <[EMAIL PROTECTED]> writes:
> > -bash-3.00$ time pgbench -c 1000 -t 30 pgbench
> > starting vacuum...end.
> > transaction type: TPC-B (sort of)
> > scaling factor: 1
> > number of clients: 1000
> > number of transactions per client: 30
> > number of transactions actually processed: 30000/30000
> > tps = 45.871234 (including connections establishing)
> > tps = 46.092629 (excluding connections establishing)
>
> I can hardly think of a worse way to run pgbench :-(. These numbers are
> about meaningless, for two reasons:
>
> 1. You don't want number of clients (-c) much higher than scaling factor
> (-s in the initialization step). The number of rows in the "branches"
> table will equal -s, and since every transaction updates one
> randomly-chosen "branches" row, you will be measuring mostly row-update
> contention overhead if there's more concurrent transactions than there
> are rows. In the case -s 1, which is what you've got here, there is no
> actual concurrency at all --- all the transactions stack up on the
> single branches row.
>
> 2. Running a small number of transactions per client means that
> startup/shutdown transients overwhelm the steady-state data. You should
> probably run at least a thousand transactions per client if you want
> repeatable numbers.
>
> Try something like "-s 10 -c 10 -t 3000" to get numbers reflecting test
> conditions more like what the TPC council had in mind when they designed
> this benchmark.
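Tom's advice maps to a two-step pgbench invocation: initialize with -i at a scaling factor at least as large as the intended client count, then run with matching -c and a large -t. A minimal sketch (database name and values taken from the thread; the guard check is an illustrative addition):

```shell
# Sketch of the pgbench procedure recommended in this thread.
# -s takes effect at initialization (-i); at run time pgbench reports
# whatever scale it finds in the database.
SCALE=10; CLIENTS=10; TXNS=3000

# Per Tom Lane: clients should not exceed the scaling factor, or the run
# mostly measures row-update contention on the single-row-per-scale
# "branches" table.
if [ "$CLIENTS" -gt "$SCALE" ]; then
    echo "warning: -c $CLIENTS exceeds -s $SCALE" >&2
fi

INIT_CMD="pgbench -i -s $SCALE pgbench"
RUN_CMD="pgbench -c $CLIENTS -t $TXNS pgbench"
echo "$INIT_CMD"   # creates 100,000 * SCALE rows in "accounts"
echo "$RUN_CMD"    # repeat ~3 times; quote the middle tps figure
```

Run the printed commands against a scratch database; re-initializing between configurations keeps the runs comparable.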
Re: [PERFORM] High context switches occurring
Re-ran it 3 times on each host -

Sun:
-bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 3000
number of transactions actually processed: 30000/30000
tps = 827.810778 (including connections establishing)
tps = 828.410801 (excluding connections establishing)
real  0m36.579s
user  0m1.222s
sys   0m3.422s

Intel:
-bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 3000
number of transactions actually processed: 30000/30000
tps = 597.067503 (including connections establishing)
tps = 597.606169 (excluding connections establishing)
real  0m50.380s
user  0m2.621s
sys   0m7.818s

Thanks,
Anjan

-Original Message-
From: Anjan Dave
Sent: Wed 12/7/2005 10:54 AM
To: Tom Lane
Cc: Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

Thanks for your inputs, Tom. I was going after high concurrent clients, but should have read this carefully -

  -s scaling_factor
      this should be used with -i (initialize) option.
      number of tuples generated will be multiple of the
      scaling factor. For example, -s 100 will imply 10M
      (10,000,000) tuples in the accounts table.
      default is 1. NOTE: scaling factor should be at least
      as large as the largest number of clients you intend
      to test; else you'll mostly be measuring update
      contention.

I'll rerun the tests.

Thanks,
Anjan

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 06, 2005 6:45 PM
To: Anjan Dave
Cc: Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

"Anjan Dave" <[EMAIL PROTECTED]> writes:
> -bash-3.00$ time pgbench -c 1000 -t 30 pgbench
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> number of clients: 1000
> number of transactions per client: 30
> number of transactions actually processed: 30000/30000
> tps = 45.871234 (including connections establishing)
> tps = 46.092629 (excluding connections establishing)

I can hardly think of a worse way to run pgbench :-(. These numbers are about meaningless, for two reasons:

1. You don't want number of clients (-c) much higher than scaling factor (-s in the initialization step). The number of rows in the "branches" table will equal -s, and since every transaction updates one randomly-chosen "branches" row, you will be measuring mostly row-update contention overhead if there's more concurrent transactions than there are rows. In the case -s 1, which is what you've got here, there is no actual concurrency at all --- all the transactions stack up on the single branches row.

2. Running a small number of transactions per client means that startup/shutdown transients overwhelm the steady-state data. You should probably run at least a thousand transactions per client if you want repeatable numbers.

Try something like "-s 10 -c 10 -t 3000" to get numbers reflecting test conditions more like what the TPC council had in mind when they designed this benchmark. I tend to repeat such a test 3 times to see if the numbers are repeatable, and quote the middle TPS number as long as they're not too far apart.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [PERFORM] SAN/NAS options
Manufacturers' claims are usually tested under 'ideal' conditions and may not translate well to the bandwidth seen on the host side. A 2Gbps Fibre Channel connection would (ideally) give you about 250MB/sec per HBA. Not sure how that translates for GigE considering iSCSI protocol overheads, but you may want to confirm with them how they achieved 370MB/sec (how many iSCSI controllers, what file system, how many drives, what RAID type, block size, stripe size, cache settings, etc.), and whether it was physical or cached I/O. In other words, if someone has any benchmark numbers, that would be helpful.

Regarding diskless iSCSI boots for future servers, remember that it's shared storage: if you have a busy server attached to your Nexsan, you may have to think twice about sharing its performance (throughput and IOPS of the storage controller) without impacting the existing hosts, unless you are sizing for that now. And you want a pretty clean GigE network, more or less dedicated to this block traffic.

Large internal storage with more memory and AMD CPUs is an option, as Luke originally suggested. Check out Appro as well. I'd also be curious to know if someone has been using this (SATA/iSCSI/SAS) solution and what I/O numbers they observed.

Thanks,
Anjan

-Original Message-
From: Matthew Schumacher [mailto:[EMAIL PROTECTED]
Sent: Mon 12/19/2005 7:41 PM
To: pgsql-performance@postgresql.org
Cc:
Subject: Re: [PERFORM] SAN/NAS options

Jim C. Nasby wrote:
> On Wed, Dec 14, 2005 at 01:56:10AM -0500, Charles Sprickman wrote:
>> You'll note that I'm being somewhat driven by my OS of choice, FreeBSD.
>> Unlike Solaris or other commercial offerings, there is no nice volume
>> management available. While I'd love to keep managing a dozen or so
>> FreeBSD boxes, I could be persuaded to go to Solaris x86 if the volume
>> management really shines and Postgres performs well on it.
>
> Have you looked at vinum? It might not qualify as a true volume manager,
> but it's still pretty handy.
I am looking very closely at purchasing a SANRAD Vswitch 2000, a Nexsan SATABoy with SATA disks, and the Qlogic iSCSI controller cards. Nexsan claims up to 370MB/s sustained per controller and 44,500 IOPS, but I'm not sure if that is good or bad. It's certainly faster than the LSI MegaRAID controller I'm using now with a RAID 1 mirror.

The SANRAD box looks like it saves money in that you don't have to buy controller cards for everything, but for I/O-intensive servers such as the database server, I would end up buying an iSCSI controller card anyway. At this point I'm not sure what the best solution is. I like the idea of having logical disks available through iSCSI because of how flexible it is, but I really don't want to spend $20k (10 for the Nexsan and 10 for the SANRAD) and end up with poor performance. One other advantage to iSCSI is that I can go completely diskless on my servers and boot from iSCSI, which means I don't have to have spare disks for each host; I just have spare disks for the Nexsan chassis.

So the question becomes: has anyone put postgres on an iSCSI SAN, and if so, how did it perform?

schu

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org
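The back-of-envelope arithmetic behind the 250MB/sec Fibre Channel figure, and the skepticism about 370MB/sec over GigE, is simple link-rate division; real throughput is lower still once encoding and iSCSI/TCP overhead are subtracted. A sketch of the ideal-case numbers:

```shell
# Ideal payload bandwidth per link, ignoring protocol/encoding overhead.
FC_GBPS=2
GIGE_GBPS=1
FC_MBS=$(( FC_GBPS * 1000 / 8 ))      # 2Gbps FC   -> ~250 MB/s per HBA
GIGE_MBS=$(( GIGE_GBPS * 1000 / 8 ))  # 1Gbps GigE -> ~125 MB/s per port
echo "FC:   ${FC_MBS} MB/s"
echo "GigE: ${GIGE_MBS} MB/s"

# A 370 MB/s sustained claim therefore implies multiple GigE ports
# (or cached rather than physical I/O); round up to whole ports:
PORTS=$(( (370 + GIGE_MBS - 1) / GIGE_MBS ))
echo "GigE ports needed for 370 MB/s: ${PORTS}"
```

This is why the thread asks how many iSCSI controllers were behind the vendor's number: one GigE port cannot carry it.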
[PERFORM] separate drives for WAL or pgdata files
Hi,

I am not sure if there's an obvious answer to this... If there's a choice of an external RAID10 (Fibre Channel, 6 or 8 15Krpm drives), what is more beneficial to store on it: the WAL, or the database files? Whichever is not chosen would go on the local RAID10 (4 drives, 15Krpm) along with the OS. This is a very busy database with high concurrent connections and random reads and writes. Checkpoint segments are 300 and the interval is 6 mins. Database size is less than 50GB.

It has become a bit more confusing because I am trying to allot shared storage across several hosts, and want to be careful not to overload one of the 2 storage processors. What should I check/monitor if more information is needed to determine this?

Appreciate some suggestions.

Thanks,
Anjan
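Whichever array ends up hosting the WAL, the usual way on 7.x/8.x to split pg_xlog from the data files is a symlink out of the data directory. A hedged sketch: the paths are assumptions, the server must be stopped first, and the commands are echoed rather than executed here:

```shell
# Hypothetical layout; adjust PGDATA and the external-array mount point.
PGDATA=/var/lib/pgsql/data        # assumed data directory (local RAID10)
WALDIR=/mnt/ext_raid10/pg_xlog    # assumed mount on the external RAID10

# With the postmaster stopped, as the postgres user:
MOVE_CMD="mv $PGDATA/pg_xlog $WALDIR"
LINK_CMD="ln -s $WALDIR $PGDATA/pg_xlog"
echo "$MOVE_CMD"
echo "$LINK_CMD"
```

The point of the split is that WAL writes are sequential and fsync-bound, so they benefit from spindles that are not also seeking for random data-file I/O.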
Re: [PERFORM] PostgreSQL and Ultrasparc T1
8 HBAs at 200MB/sec would require a pretty significant storage-processor backend, unless cost is not a factor. Once you achieve that, there's the question of sharing/balancing the I/O requirements of various other applications/databases on that same shared backend storage...

Anjan

-Original Message-
From: Jignesh K. Shah [mailto:[EMAIL PROTECTED]
Sent: Monday, December 19, 2005 9:27 AM
To: Luke Lonergan
Cc: Juan Casero; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL and Ultrasparc T1

Sun Fire T2000 has 3 PCI-E and 1 PCI-X slot free when shipped. Using dual fibre channel 2G adapters you can get about 200MB x 8 = 1600MB/sec IO bandwidth. Plus when 4G HBAs are supported, that will double up. Now I think generally that's good enough for 1TB raw data or 2-3TB database size. Of course, typically the database size in PostgreSQL space will be in the 100-500GB range, so a Sun Fire T2000 can be a good fit, with enough room to grow, at a very reasonable price.

Of course, as someone mentioned, if all you have is 1 connection using PostgreSQL, which cannot spawn helper processes/threads, this will be limited by single-thread performance, which is about 1.2GHz, compared on the Sun Fire T2000 to AMD64 (Sun Fire X4200), which has pretty much similar IO bandwidth and the same size chassis, but whose individual AMD64 cores run at about 2.4GHz (I believe); the max you can get is 4 cores, though you also trade off some power consumption in lieu of faster single-thread performance. So choices are available with both architectures. However, if you have a webserver driving a PostgreSQL backend, then UltraSPARC T1 might be a better option if you suddenly want to do 100s of db connections. The Sun Fire T2000 gives you 8 cores with 32 threads in all running on the system.
With the PostgreSQL 8.1 fix for SMP buffer-pool performance, and with ZFS now available in the Solaris Express release, it would be interesting to see how the combination of PostgreSQL 8.1 and ZFS works on Solaris, since ZFS is one of the perfect file systems for PostgreSQL: PostgreSQL wants to leave complexities (like block allocation, fragmentation, etc.) to the underlying file system rather than re-implement its own infrastructure.

If somebody is already conducting their own tests, do let me know. As soon as I get some free cycles, I want to run ZFS with PostgreSQL using Solaris Express. If you have some preferred workloads, do let me know.

Regards,
Jignesh

Luke Lonergan wrote:
>Juan,
>
>On 12/18/05 8:35 AM, "Juan Casero" <[EMAIL PROTECTED]> wrote:
>
>>Can anyone tell me how well PostgreSQL 8.x performs on the new Sun
>>UltraSPARC T1 processor and architecture on Solaris 10? I have a custom-built
>>retail sales reporting application that I developed using PostgreSQL 7.4.8
>>and PHP on a Fedora Core 3 Intel box. I want to scale this application
>>upwards to handle a database that might grow to 100 GB. Our company is
>>green-mission conscious now, so I was hoping I could use that to convince
>>management to consider a Sun UltraSPARC T1 or T2 system, provided I can get
>>the best performance out of it on PostgreSQL. So will newer versions of
>>PostgreSQL (8.1.x) be able to take advantage of the multiple cores on a T1
>>or T2? I cannot change the database, and this will be a hard sell unless I
>>can convince them that the performance advantages are too good to pass up.
>>The company is moving in the Win32 direction, so I have to provide rock-solid
>>reasons for why I want to use Solaris SPARC on a T1 or T2 server for this
>>database application instead of Windows on SQL Server.
>
>The Niagara CPUs are heavily multi-threaded and will require a lot of
>parallelism to be exposed to them in order to be effective.
>Until Sun makes Niagara-based machines with lots of I/O channels, there
>won't be much I/O parallelism available to match the CPU parallelism.
>
>Bizgres MPP will use the process and I/O parallelism of these big SMP
>machines, and the version based on Postgres 8.1 will be out in February.
>
>- Luke
>
>---(end of broadcast)---
>TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] SAN/NAS options
Luke, How did you measure 800MB/sec, is it cached, or physical I/O? -anjan -Original Message- From: Luke Lonergan [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 14, 2005 2:10 AM To: Charles Sprickman; pgsql-performance@postgresql.org Subject: Re: [PERFORM] SAN/NAS options Charles, > Lastly, one thing that I'm not yet finding in trying to > educate myself on SANs is a good overview of what's come out > in the past few years that's more affordable than the old > big-iron stuff. For example I saw some brief info on this > list's archives about the Dell/EMC offerings. Anything else > in that vein to look at? My two cents: SAN is a bad investment, go for big internal storage. The 3Ware or Areca SATA RAID adapters kick butt and if you look in the newest colos (I was just in ours "365main.net" today), you will see rack on rack of machines with from 4 to 16 internal SATA drives. Are they all DB servers? Not necessarily, but that's where things are headed. You can get a 3U server with dual opteron 250s, 16GB RAM and 16x 400GB SATAII drives with the 3Ware 9550SX controller for $10K - we just ordered 4 of them. I don't think you can buy an external disk chassis and a Fibre channel NIC for that. Performance? 800MB/s RAID5 reads, 400MB/s RAID5 writes. Random IOs are also very high for RAID10, but we don't use it so YMMV - look at Areca and 3Ware. Managability? Good web management interfaces with 6+ years of development from 3Ware, e-mail, online rebuild options, all the goodies. No "snapshot" or offline backup features like the high-end SANs, but do you really need it? Need more power or storage over time? Run a parallel DB like Bizgres MPP, you can add more servers with internal storage and increase your I/O, CPU and memory. - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 6: explain analyze is your friend
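Anjan's cached-vs-physical question can be answered empirically: time a sequential read of a file larger than installed RAM, so the page cache cannot satisfy the whole read. A small-scale sketch (64MB here purely for illustration; in practice the file must exceed RAM, e.g. >16GB on these boxes):

```shell
# Write a test file, flush dirty pages, then time reading it back.
dd if=/dev/zero of=ddtest.bin bs=1M count=64 2>/dev/null
sync
time dd if=ddtest.bin of=/dev/null bs=1M 2>/dev/null

BYTES=$(wc -c < ddtest.bin)   # 64 * 1048576 = 67108864
rm -f ddtest.bin
echo "read ${BYTES} bytes"
```

Divide bytes by the elapsed time for MB/s; a second run without dropping the cache shows the (much higher) cached figure, which is the number vendors often quote.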
Re: [PERFORM] High context switches occurring
Thanks for your inputs, Tom. I was going after high concurrent clients, but should have read this carefully -

  -s scaling_factor
      this should be used with -i (initialize) option.
      number of tuples generated will be multiple of the
      scaling factor. For example, -s 100 will imply 10M
      (10,000,000) tuples in the accounts table.
      default is 1. NOTE: scaling factor should be at least
      as large as the largest number of clients you intend
      to test; else you'll mostly be measuring update
      contention.

I'll rerun the tests.

Thanks,
Anjan

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 06, 2005 6:45 PM
To: Anjan Dave
Cc: Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

"Anjan Dave" <[EMAIL PROTECTED]> writes:
> -bash-3.00$ time pgbench -c 1000 -t 30 pgbench
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> number of clients: 1000
> number of transactions per client: 30
> number of transactions actually processed: 30000/30000
> tps = 45.871234 (including connections establishing)
> tps = 46.092629 (excluding connections establishing)

I can hardly think of a worse way to run pgbench :-(. These numbers are about meaningless, for two reasons:

1. You don't want number of clients (-c) much higher than scaling factor (-s in the initialization step). The number of rows in the "branches" table will equal -s, and since every transaction updates one randomly-chosen "branches" row, you will be measuring mostly row-update contention overhead if there's more concurrent transactions than there are rows. In the case -s 1, which is what you've got here, there is no actual concurrency at all --- all the transactions stack up on the single branches row.

2. Running a small number of transactions per client means that startup/shutdown transients overwhelm the steady-state data. You should probably run at least a thousand transactions per client if you want repeatable numbers.
Try something like "-s 10 -c 10 -t 3000" to get numbers reflecting test conditions more like what the TPC council had in mind when they designed this benchmark. I tend to repeat such a test 3 times to see if the numbers are repeatable, and quote the middle TPS number as long as they're not too far apart. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] High context switches occurring
I ran a fairly exhaustive pgbench on the 2 test machines I have (quad dual-core Intel and Opteron). Of course the Opteron was much faster, but interestingly, it was experiencing 3x more context switches than the Intel box (up to 100k, versus ~30k avg on the Dell). Both are RH4.0 64-bit / PG8.1 64-bit.

Sun (v40z):
-bash-3.00$ time pgbench -c 1000 -t 30 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 1000
number of transactions per client: 30
number of transactions actually processed: 30000/30000
tps = 45.871234 (including connections establishing)
tps = 46.092629 (excluding connections establishing)
real  10m54.240s
user  0m34.894s
sys   3m9.470s

Dell (6850):
-bash-3.00$ time pgbench -c 1000 -t 30 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 1000
number of transactions per client: 30
number of transactions actually processed: 30000/30000
tps = 22.088214 (including connections establishing)
tps = 22.162454 (excluding connections establishing)
real  22m38.301s
user  0m43.520s
sys   5m42.108s

Thanks,
Anjan

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 22, 2005 2:42 PM
To: Anjan Dave
Cc: Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

"Anjan Dave" <[EMAIL PROTECTED]> writes:
> Would this problem change its nature in any way on the recent Dual-Core
> Intel XEON MP machines?

Probably not much. There's some evidence that Opterons have less of a problem than Xeons in multi-chip configurations, but we've seen CS thrashing on Opterons too. I think the issue is probably there to some extent in any modern SMP architecture.

regards, tom lane
Re: [PERFORM] High context switches occurring
Simon, I tested it by running two of those simultaneous queries (the 'unoptimized' one), and it doesn't make any difference whether vm.max-readahead is 256 or 2048...the modified query runs in a snap. Thanks, Anjan -Original Message----- From: Anjan Dave Sent: Wednesday, November 23, 2005 1:33 PM To: Simon Riggs Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance Subject: Re: [PERFORM] High context switches occurring The offending SELECT query that invoked the CS storm was optimized by folks here last night, so it's hard to say if the VM setting made a difference. I'll give it a try anyway. Thanks, Anjan -Original Message- From: Simon Riggs [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 23, 2005 1:14 PM To: Anjan Dave Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance Subject: Re: [PERFORM] High context switches occurring On Tue, 2005-11-22 at 18:17 -0500, Anjan Dave wrote: > It's mostly a 'read' application, I increased the vm.max-readahead to > 2048 from the default 256, after which I've not seen the CS storm, > though it could be incidental. Can you verify this, please? Turn it back down again, try the test, then reset and try the test. If that is a repeatable way of recreating one manifestation of the problem then we will be further ahead than we are now. Thanks, Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] High context switches occurring
The offending SELECT query that invoked the CS storm was optimized by folks here last night, so it's hard to say if the VM setting made a difference. I'll give it a try anyway. Thanks, Anjan -Original Message- From: Simon Riggs [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 23, 2005 1:14 PM To: Anjan Dave Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance Subject: Re: [PERFORM] High context switches occurring On Tue, 2005-11-22 at 18:17 -0500, Anjan Dave wrote: > It's mostly a 'read' application, I increased the vm.max-readahead to > 2048 from the default 256, after which I've not seen the CS storm, > though it could be incidental. Can you verify this, please? Turn it back down again, try the test, then reset and try the test. If that is a repeatable way of recreating one manifestation of the problem then we will be further ahead than we are now. Thanks, Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
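Simon's verification procedure amounts to toggling the readahead sysctl between runs of the offending SELECT while watching the context-switch rate. A sketch of the commands (requires root; the values are the ones from this thread, and the commands are echoed rather than executed here):

```shell
# Toggle vm.max-readahead (RHEL3 / 2.4 kernel sysctl) between the default
# and the raised value, re-running the test after each change.
DEFAULT=256
RAISED=2048
SET_DEFAULT="sysctl -w vm.max-readahead=$DEFAULT"
SET_RAISED="sysctl -w vm.max-readahead=$RAISED"
echo "$SET_DEFAULT"   # run the test; watch the 'cs' column of: vmstat 1
echo "$SET_RAISED"    # reset, re-run, and compare context-switch rates
```

If the CS storm reappears only at the lower setting, the readahead change (rather than the rewritten query) explains the improvement.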
Re: [PERFORM] High context switches occurring
Yes, it's turned on, unfortunately it got overlooked during the setup, and until now...! It's mostly a 'read' application, I increased the vm.max-readahead to 2048 from the default 256, after which I've not seen the CS storm, though it could be incidental. Thanks, Anjan -Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 22, 2005 3:38 PM To: Anjan Dave Cc: Tom Lane; Vivek Khera; Postgresql Performance Subject: Re: [PERFORM] High context switches occurring On Tue, 2005-11-22 at 14:33, Anjan Dave wrote: > Is there any way to get a temporary relief from this Context Switching > storm? Does restarting postmaster help? > > It seems that I can recreate the heavy CS with just one SELECT > statement...and then when multiple such SELECT queries are coming in, > things just get hosed up until we cancel a bunch of queries... Is your machine a hyperthreaded one? Some folks have found that turning off hyper threading helps. I knew it made my servers better behaved in the past. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] High context switches occurring
Is there any way to get temporary relief from this context-switching storm? Does restarting the postmaster help?

It seems that I can recreate the heavy CS with just one SELECT statement...and then when multiple such SELECT queries are coming in, things just get hosed up until we cancel a bunch of queries...

Thanks,
Anjan

-Original Message-
From: Anjan Dave
Sent: Tuesday, November 22, 2005 2:24 PM
To: Tom Lane; Vivek Khera
Cc: Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

Thanks, guys, I'll start planning on upgrading to PG 8.1. Would this problem change its nature in any way on the recent Dual-Core Intel XEON MP machines?

Thanks,
Anjan

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 22, 2005 12:36 PM
To: Vivek Khera
Cc: Postgresql Performance; Anjan Dave
Subject: Re: [PERFORM] High context switches occurring

Vivek Khera <[EMAIL PROTECTED]> writes:
> On Nov 22, 2005, at 11:59 AM, Anjan Dave wrote:
>> This is a Dell Quad XEON. Hyperthreading is turned on, and I am
>> planning to turn it off as soon as I get a chance to bring it down.

> You should probably also upgrade to Pg 8.0 or newer since it is a
> known problem with XEON processors and older postgres versions.
> Upgrading Pg may solve your problem or it may not.

PG 8.1 is the first release that has a reasonable probability of avoiding heavy contention for the buffer manager lock when there are multiple CPUs. If you're going to update to try to fix this, you need to go straight to 8.1. I've recently been chasing a report from Rob Creager that seems to indicate contention on SubTransControlLock, so the slru code is likely to be our next bottleneck to fix :-(

regards, tom lane
Re: [PERFORM] High context switches occurring
Thanks, guys, I'll start planning on upgrading to PG 8.1. Would this problem change its nature in any way on the recent Dual-Core Intel XEON MP machines?

Thanks,
Anjan

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 22, 2005 12:36 PM
To: Vivek Khera
Cc: Postgresql Performance; Anjan Dave
Subject: Re: [PERFORM] High context switches occurring

Vivek Khera <[EMAIL PROTECTED]> writes:
> On Nov 22, 2005, at 11:59 AM, Anjan Dave wrote:
>> This is a Dell Quad XEON. Hyperthreading is turned on, and I am
>> planning to turn it off as soon as I get a chance to bring it down.

> You should probably also upgrade to Pg 8.0 or newer since it is a
> known problem with XEON processors and older postgres versions.
> Upgrading Pg may solve your problem or it may not.

PG 8.1 is the first release that has a reasonable probability of avoiding heavy contention for the buffer manager lock when there are multiple CPUs. If you're going to update to try to fix this, you need to go straight to 8.1. I've recently been chasing a report from Rob Creager that seems to indicate contention on SubTransControlLock, so the slru code is likely to be our next bottleneck to fix :-(

regards, tom lane
[PERFORM] High context switches occurring
Hi,

One of our PG servers is experiencing extreme slowness and there are hundreds of SELECTs building up. I am not sure if heavy context switching is the cause of this or something else is causing it. Is this pretty much the final word on this issue?

http://archives.postgresql.org/pgsql-performance/2004-04/msg00249.php

procs                     memory     swap        io    system        cpu
 r  b swpd    free   buff   cache  si so  bi   bo  in     cs us sy id wa
 2  0   20 2860544 124816 8042544   0  0   0    0   0      0  0  0  0  0
 2  0   20 2860376 124816 8042552   0  0   0   24 157 115322 13 10 76  0
 3  0   20 2860364 124840 8042540   0  0   0  228 172 120003 12 10 77  0
 2  0   20 2860364 124840 8042540   0  0   0   20 158 118816 15 10 75  0
 2  0   20 2860080 124840 8042540   0  0   0   10 152 117858 12 11 77  0
 1  0   20 2860080 124848 8042572   0  0   0  210 202 114724 14 10 76  0
 2  0   20 2860080 124848 8042572   0  0   0   20 169 114843 13 10 77  0
 3  0   20 2859908 124860 8042576   0  0   0  188 180 115134 14 11 75  0
 3  0   20 2859848 124860 8042576   0  0   0   20 173 113470 13 10 77  0
 2  0   20 2859836 124860 8042576   0  0   0   10 157 112839 14 11 75  0

The system seems to be fine on the iowait/memory side; the CPUs are mostly busy with the context switches.

Here's the top output:

11:54:57 up 59 days, 14:11, 2 users, load average: 1.13, 1.66, 1.52
282 processes: 281 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   cpu   user  nice  system  irq  softirq  iowait   idle
            total  13.8%  0.0%    9.7%  0.0%    0.0%    0.0%  76.2%
            cpu00  12.3%  0.0%   10.5%  0.0%    0.0%    0.1%  76.8%
            cpu01  12.1%  0.0%    6.1%  0.0%    0.0%    0.1%  81.5%
            cpu02  10.9%  0.0%    9.1%  0.0%    0.0%    0.0%  79.9%
            cpu03  19.4%  0.0%   14.9%  0.0%    0.0%    0.0%  65.6%
            cpu04  13.9%  0.0%   11.1%  0.0%    0.0%    0.0%  74.9%
            cpu05  14.9%  0.0%    9.1%  0.0%    0.0%    0.0%  75.9%
            cpu06  12.9%  0.0%    8.9%  0.0%    0.0%    0.0%  78.1%
            cpu07  14.3%  0.0%    8.1%  0.0%    0.1%    0.0%  77.3%
Mem:  12081720k av, 9273304k used, 2808416k free, 0k shrd, 126048k buff
      4686808k actv, 3211872k in_d, 170240k in_c
Swap: 4096532k av, 20k used, 4096512k free, 8044072k cached

PostgreSQL 7.4.7 on i686-redhat-linux-gnu
Red Hat Enterprise Linux AS release 3 (Taroon Update 5)
Linux vl-pe6650-004 2.4.21-32.0.1.ELsmp

This is a Dell Quad XEON. Hyperthreading is turned on, and I am planning to turn it off as soon as I get a chance to bring it down. WAL is on separate drives from the OS and database.

Appreciate any inputs please....

Thanks,
Anjan
[PERFORM] slow database, queries accumulating
Hi, We are experiencing consistent slowness on the database for one application. It is more a reporting type of application, heavy on bytea usage (the data gets rendered into PDFs in the app server). A lot of queries, mostly SELECTs and a few random UPDATEs, accumulate on the server as the volume of users on the application increases. Below is a snapshot of top, with about 80 SELECTs and 3 or 4 UPDATEs. Things get better eventually if I cancel (SIGINT) some of the oldest queries. I also see a few instances of shared locks not being granted during this time. I don't see high iowait or memory starvation during these times, as indicated by top.

-bash-2.05b$ psql -c "select * from pg_locks;" dbname | grep f
 |          | 77922136 | 16761 | ShareLock | f

We (development) are looking into query optimization (explain analyze, indexes, etc.). My understanding is that the queries execute fast when run through EXPLAIN ANALYZE, but during busy times they become quite slow, taking from a few seconds to a few minutes to execute. I do see in the log that almost all queries have either ORDER BY, GROUP BY, or DISTINCT. Does it hurt to up sort_mem to 3MB or 4MB? Should I up effective_cache_size to 5 or 6GB? The app does not need a lot of connections on the database; I can reduce it down from 600. Based on the description above and the configuration below, does anything appear bad in the config? Is there anything I can try in the configuration to improve performance? The database size is about 4GB.

This is PG 7.4.7, RHAS3.0 (U5), local 4-spindle RAID10 (15K RPM), and logs on a separate set of drives, RAID10. 6650 server, 4 x XEON, 12GB RAM. Vacuum is done every night; full vacuum is done once a week. I had increased shared_buffers and sort_mem recently, which didn't help.
Thanks, Anjan

10:44:51  up 14 days, 13:38,  2 users,  load average: 0.98, 1.14, 1.12
264 processes: 257 sleeping, 7 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   14.4%    0.0%    7.4%   0.0%     0.0%    0.0%   77.9%
           cpu00   15.7%    0.0%    5.7%   0.0%     0.1%    0.0%   78.2%
           cpu01   15.1%    0.0%    7.5%   0.0%     0.0%    0.1%   77.0%
           cpu02   10.5%    0.0%    5.9%   0.0%     0.0%    0.0%   83.4%
           cpu03    9.9%    0.0%    5.9%   0.0%     0.0%    0.0%   84.0%
           cpu04    7.9%    0.0%    3.7%   0.0%     0.0%    0.0%   88.2%
           cpu05   19.3%    0.0%   12.3%   0.0%     0.0%    0.0%   68.3%
           cpu06   20.5%    0.0%    9.5%   0.0%     0.0%    0.1%   69.7%
           cpu07   16.1%    0.0%    8.5%   0.0%     0.1%    0.3%   74.7%
Mem:  12081736k av, 7881972k used, 4199764k free, 0k shrd, 82372k buff
      4823496k actv, 2066260k in_d, 2036k in_c
Swap: 4096532k av, 0k used, 4096532k free                6888900k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
16773 postgres  15   0  245M 245M  240M S     0.0  2.0   1:16   7 postmaster
16880 postgres  15   0  245M 245M  240M S     0.1  2.0   0:49   6 postmaster
16765 postgres  15   0  245M 245M  240M S     0.0  2.0   1:16   0 postmaster
16825 postgres  15   0  245M 245M  240M S     0.0  2.0   1:02   5 postmaster
16774 postgres  15   0  245M 245M  240M S     0.1  2.0   1:16   0 postmaster
16748 postgres  15   0  245M 245M  240M S     0.0  2.0   1:19   5 postmaster
16881 postgres  15   0  245M 245M  240M S     0.1  2.0   0:50   7 postmaster
16762 postgres  15   0  245M 245M  240M S     0.0  2.0   1:14   4 postmaster
...

max_connections = 600
shared_buffers = 30000          # = 234MB, up from 21760 = 170MB; min 16,
                                # at least max_connections*2, 8KB each
sort_mem = 2048                 # min 64, size in KB
vacuum_mem = 32768              # up from 16384; min 1024, size in KB

# - Free Space Map -
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
#wal_buffers = 8                # min 4, 8KB each

# - Checkpoints -
checkpoint_segments = 125       # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Planner Method Enabling -
#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
#enable_seqscan = true
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -
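On the question above of whether raising sort_mem to 3MB or 4MB hurts: sort_mem is per sort, per backend, so the worst case scales with max_connections. A back-of-envelope sketch (the arithmetic is mine; the numbers are from the post):

```python
# Illustrative worst-case sort memory (my arithmetic, not from the thread):
# each active sort in each backend may use up to sort_mem KB, so with many
# connections the theoretical ceiling grows quickly.
def worst_case_sort_kb(max_connections, sort_mem_kb, sorts_per_query=1):
    return max_connections * sort_mem_kb * sorts_per_query

print(worst_case_sort_kb(600, 2048) // 1024, "MB")  # 1200 MB at the current 2MB
print(worst_case_sort_kb(600, 4096) // 1024, "MB")  # 2400 MB at the proposed 4MB
```

In practice most backends are not all sorting at once, but this is why trimming max_connections down from 600 makes raising sort_mem much safer.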
Re: [PERFORM] High load and iowait but no disk access
I have seen references to changing the kernel I/O scheduler at boot time. Not sure if it applies to RHEL3.0, or whether it will help, but try setting 'elevator=deadline' at boot time or via grub.conf. Have you tried running a simple 'dd' on the LUN? The drives are in RAID10 configuration, right? Thanks, Anjan

From: Woody Woodring [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 2:30 PM
To: 'Rémy Beaumont'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High load and iowait but no disk access

Have you tried a different kernel? We run with a NetApp over NFS without any issues, but we have seen high IO-wait on other Dell boxes (running and not running postgres) and RHES 3. We replaced a Dell PowerEdge 350 running RH 7.3 with a PE750 with more memory running RHES3, and it got bogged down with IO waits due to syslog messages writing to the disk; the old, slower server could handle it fine. I don't know if it is a Dell thing or a RH kernel thing, but we try different kernels on our boxes to find one that works better. We have not found one that stands out over another consistently, but we have been moving away from the Update 2 kernel (2.4.21-15.ELsmp) due to server lockup issues. Unfortunately we get the best disk throughput on our few remaining 7.3 boxes.

Woody
IGLASS Networks
www.iglass.net

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rémy Beaumont
Sent: Monday, August 29, 2005 9:43 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] High load and iowait but no disk access

We have been trying to pinpoint what originally seemed to be an I/O bottleneck but now appears to be an issue with either Postgresql or RHES 3.
We have the following test environment on which we can reproduce the problem:

1) Test System A
Dell 6650 Quad Xeon Pentium 4
8 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS270 connected using an FC card using 10 disks

2) Test System B
Dell Dual Xeon Pentium III
2 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS920 connected using an FC card using 28 disks

Our database size is around 30G. The behavior we see is that when running queries that do random reads on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl, at a throughput below 3000 kB/s (we usually average 40 MB/s to 80 MB/s on sequential read operations on the NetApps). The stats on the NetApp confirm that it is sitting idle. Doing an strace on the Postgresql process shows that it is doing seeks and reads.

So my question is: where is this iowait time spent? Is there a way to pinpoint the problem in more detail? We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3. I have included the output of top, vmstat, strace, and sysstat from the NetApp on System B while running a single query that generates this behavior.
Rémy

top output:
06:27:28  up 5 days, 16:59,  6 users,  load average: 1.04, 1.30, 1.01
72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    2.7%    0.0%    1.0%   0.1%     0.2%   46.0%   49.5%
           cpu00    0.2%    0.0%    0.2%   0.0%     0.2%    2.2%   97.2%
           cpu01    5.3%    0.0%    1.9%   0.3%     0.3%   89.8%    1.9%
Mem:  2061696k av, 2043936k used, 17760k free, 0k shrd, 3916k buff
      1566332k actv, 296648k in_d, 30504k in_c
Swap: 16771584k av, 21552k used, 16750032k free            1933772k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
30960 postgres  15   0 13424  10M  9908 D     2.7  0.5   2:00   1 postmaster
30538 root      15   0  1080  764   524 S     0.7  0.0   0:43   0 sshd
    1 root      15   0   496  456   436 S     0.0  0.0   0:08   0 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:01   0 keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
    9 root      15   0     0    0     0 SW    0.0  0.0   0:24   1 bdflush
    7 root      15   0     0    0     0 SW    0.0  0.0   6:53   1 kswapd
    8 root      15   0     0    0     0 SW    0.0  0.0   8:44   1 kscand
   10 root      15   0     0    0     0 SW    0.0  0.0   0:13   0 kupdated
   11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
   17 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 ahc_dv_0

vmstat output:
procs                      memory      swap          io     system       cpu
 r  b   swpd   free  buff   cache   si  so    bi   bo   in   cs us sy id wa
 0  1  21552  17796  4872 1931928    2   3     3    1   27    6  2  1  7  3
 0  1  21552  18044  4880 1931652    0   0  1652    0  397  512  1  2 50 47
 0  1  21552  17976  4896 1931664    0   0  2468    0  407  552  2  2 50 47
 1  0  21552  17984  4896 1931608    0   0  2124    0  418  538  3  3 48 46
 0  1  21552  18028  4900 1931536    0   0  1592    0  385  509  1  3 50 46
 0  1  21552  18040  4916 1931488    0   0  1620  820  419  581  2  2 50 46
 0  1  21552  17968  4916 1931536    0   4  1708    4  402  554  3  1 50 46
 1  1  21552  18052  4916 1931388    0   0  1772    0  409  531  3  1 49 47
 0  1  21552  17912  4924 1931492    0   0  1772    0  408  565  3  1 48 48
 0  1  21552  17932  4932 1931440    0   4  1356    4  391  545  5  0 49 46
 0  1  21552  18320  4944 1931016    0   4  1500  840  414  571  1  1 48 50
 0  1  21552  17872  4944 1931440    0   0  2116    0  392  496  1  5 46 48
 0  1  21552  18060  4944 1931232    0   0  2232    0  423  597  1  2 48 49
 1  1  21552  17684  4944 1931584    0   0  1752    0  395  537  1
Re: [PERFORM] choosing RAID level for xlogs
Thanks, everyone. I got some excellent replies, including some long explanations. I appreciate the time you took for the responses. The gist of it, I take it, is to use RAID10. I have 400MB+ of write cache on the controller(s) that the RAID5 LUN(s) could benefit from, by filling it up and writing out complete stripes, but come to think of it, that cache is shared among the two Storage Processors and all the LUNs, not just the ones holding the pg_xlog directory. The other thing (with the Clariion) is write cache mirroring. A write isn't signalled complete to the host until the cache content is mirrored across to the other SP (and vice versa), which is a good thing, but this operation could potentially become a bottleneck under very high load on the SPs. Also, one would have to fully trust the controller/manufacturer's claim about signalling write completion. And performance is a priority over the drive space lost to RAID10 for me; I can use 4 drives instead of 6.

Thanks, Anjan

-Original Message-
From: Gregory S. Williamson [mailto:[EMAIL PROTECTED]
Sent: Tue 8/16/2005 6:22 PM
To: Anjan Dave; pgsql-performance@postgresql.org
Cc:
Subject: RE: [PERFORM] choosing RAID level for xlogs

I would be very cautious about ever using RAID5, despite manufacturers' claims to the contrary. The link below is authored by a very knowledgeable fellow whose posts I know (and trust) from Informix land.

<http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt>

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From: [EMAIL PROTECTED] on behalf of Anjan Dave
Sent: Mon 8/15/2005 1:35 PM
To: pgsql-performance@postgresql.org
Cc:
Subject: [PERFORM] choosing RAID level for xlogs

Hi, One simple question. For 125 or more checkpoint segments (checkpoint_timeout is 600 seconds, shared_buffers are at 21760 or 170MB) on a very busy database, what is more suitable: a separate 6-disk RAID5 volume, or a RAID10 volume? Databases will be on separate spindles. Disks are 36GB 15K RPM, 2Gb Fibre Channel.
Performance is paramount, but I don't want to use RAID0. PG7.4.7 on RHAS 4.0. I can provide more info if needed. Appreciate some recommendations!

Thanks,
Anjan

---
This email message and any included attachments constitute confidential and privileged information intended exclusively for the listed addressee(s). If you are not the intended recipient, please notify Vantage by immediately telephoning 215-579-8390, extension 1158. In addition, please reply to this message confirming your receipt of the same in error. A copy of your email reply can also be sent to [EMAIL PROTECTED] Please do not disclose, copy, distribute or take any action in reliance on the contents of this information. Kindly destroy all copies of this message and any attachments. Any other use of this email is prohibited. Thank you for your cooperation. For more information about Vantage, please visit our website at http://www.vantage.com <http://www.vantage.com/>.
---

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] choosing RAID level for xlogs
Yes, that's true, though I am a bit confused, because the Clariion array document I am reading talks about how the write cache can eliminate the RAID5 write penalty for sequential and large IOs, resulting in better sequential write performance than RAID10.

anjan

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 16, 2005 2:00 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] choosing RAID level for xlogs

Quoting Anjan Dave <[EMAIL PROTECTED]>:

> Hi,
>
> One simple question. For 125 or more checkpoint segments
> (checkpoint_timeout is 600 seconds, shared_buffers are at 21760 or
> 170MB) on a very busy database, what is more suitable, a separate 6 disk
> RAID5 volume, or a RAID10 volume? Databases will be on separate
> spindles. Disks are 36GB 15KRPM, 2Gb Fiber Channel. Performance is
> paramount, but I don't want to use RAID0.

RAID10 -- no question. xlog activity is overwhelmingly sequential 8KB writes. In order for RAID5 to perform a write, the host (or controller) needs to perform extra calculations for parity. This turns into latency. RAID10 does not perform those extra calculations.

> PG7.4.7 on RHAS 4.0
>
> I can provide more info if needed.
>
> Appreciate some recommendations!
>
> Thanks,
> Anjan
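The latency argument in the reply above is the classic small-write penalty, which can be made concrete with some illustrative arithmetic (mine, not from the thread; the ~170 IOPS per 15K-RPM drive is an assumed rough figure, and a large cache doing full-stripe writes can sidestep the penalty, as the Clariion document claims):

```python
# Illustrative random-write IOPS comparison (my arithmetic, assumed figures):
# a small random write costs ~4 disk I/Os on RAID5 (read data, read parity,
# write data, write parity) but only ~2 on RAID10 (one write per mirror side).
def random_write_iops(n_disks, iops_per_disk, level):
    penalty = {"raid5": 4, "raid10": 2}[level]
    return n_disks * iops_per_disk / penalty

# Six 15K RPM drives at a rough ~170 IOPS each:
print(random_write_iops(6, 170, "raid5"))   # 255.0
print(random_write_iops(6, 170, "raid10"))  # 510.0
```

For pg_xlog the writes are sequential rather than random, which is exactly why the cached, full-stripe RAID5 case can look good on paper; the penalty reappears whenever the cache saturates or the stripe cannot be written whole.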
[PERFORM] choosing RAID level for xlogs
Hi, One simple question. For 125 or more checkpoint segments (checkpoint_timeout is 600 seconds, shared_buffers are at 21760 or 170MB) on a very busy database, what is more suitable, a separate 6 disk RAID5 volume, or a RAID10 volume? Databases will be on separate spindles. Disks are 36GB 15KRPM, 2Gb Fiber Channel. Performance is paramount, but I don’t want to use RAID0. PG7.4.7 on RHAS 4.0 I can provide more info if needed. Appreciate some recommendations! Thanks, Anjan
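For sizing the xlog volume itself, the 7.4-era rule of thumb is that up to roughly 2 * checkpoint_segments + 1 segment files of 16MB each can exist between checkpoints. A quick sketch (my arithmetic, not from the thread) of what 125 segments implies:

```python
# Rough pg_xlog space/burst bound (my arithmetic): PostgreSQL 7.4 normally
# keeps at most about (2 * checkpoint_segments + 1) WAL segment files of
# 16 MB each around between checkpoints.
def max_wal_mb(checkpoint_segments, segment_mb=16):
    return (2 * checkpoint_segments + 1) * segment_mb

print(max_wal_mb(125), "MB")  # 4016 MB
```

So the dedicated volume should comfortably hold ~4GB of WAL and absorb that much largely sequential write traffic, which is the workload the RAID5-vs-RAID10 question is really about.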
Re: [PERFORM] seqential vs random io
I would tell him to go for random, which is what most DBs would be by nature. What you need to understand are the cache parameters (read/write cache amounts) and the stripe size, depending on your controller type and what it defaults to for these things. Thanks, Anjan

-Original Message-
From: David Parker [mailto:[EMAIL PROTECTED]
Sent: Mon 5/23/2005 4:58 PM
To: pgsql-performance@postgresql.org
Cc:
Subject: [PERFORM] seqential vs random io

I just got a question from one of our QA guys who is configuring a RAID 10 disk that is destined to hold a postgresql database. The disk configuration procedure is asking him if he wants to optimize for sequential or random access. My first thought is that random is what we would want, but then I started wondering if it's not that simple, and my knowledge of stuff at the hardware level is, well, limited. If it were your QA guy, what would you tell him?

- DAP
--
David Parker    Tazz Networks    (401) 709-5130
Re: [PERFORM] PostgreSQL strugling during high load
Yes, I am using it on another DB/application. A few more days and I'll have a free hand on this box as well. Thanks, Anjan

-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 19, 2005 3:58 PM
To: Anjan Dave
Cc: Donald Courtney; Tom Lane; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL strugling during high load

Anjan,

> As far as disk I/O is concerned for flushing the buffers out, I am not
> ruling out the combination of Dell PERC4 RAID card, and the RH AS 3.0
> Update3 being a problem.

You know that Update4 is out, yes? Update3 is currently throttling your I/O by about 50%.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] PostgreSQL strugling during high load
What platform is this? We had a similar issue (PG 7.4.7). Raising the number of checkpoint segments to 125 and separating the WAL onto a different LUN helped, but it's still not completely gone. As far as disk I/O for flushing the buffers out is concerned, I am not ruling out the combination of the Dell PERC4 RAID card and RH AS 3.0 Update3 being a problem.

Thanks, Anjan

-Original Message-
From: Donald Courtney [mailto:[EMAIL PROTECTED]
Sent: Thu 5/19/2005 12:54 PM
To: Tom Lane
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL strugling during high load

Tom

Thanks for the post - I think I am getting this problem for a synthetic workload at high connection loads. The whole system seems to stop. Can you give some examples of what to try out in the .conf file? I tried bgwriter_all_percent = 30, 10, and 3, which I understand to mean 30%, 10%, and 3% of the dirty pages should be written out *between* checkpoints. I didn't see any change in effect.

/regards Don C.

Tom Lane wrote:

>"Mindaugas Riauba" <[EMAIL PROTECTED]> writes:
>
>> It looks like that not only vacuum causes our problems. vacuum_cost
>> seems to lower vacuum impact but we are still noticing slow query "storms".
>> We are logging queries that take >2000ms to process. There are quiet
>> periods, and then suddenly 30+ slow queries appear in the log within
>> the same second. What else could cause such behaviour?
>
>Checkpoints? You should ensure that the checkpoint settings are such
>that checkpoints don't happen too often (certainly not oftener than
>every five minutes or so), and make sure the bgwriter is configured
>to dribble out dirty pages at a reasonable rate, so that the next
>checkpoint doesn't have a whole load of stuff to write.
>
>			regards, tom lane
Re: [PERFORM] Whence the Opterons?
The DP+DC isn't available yet from Sun. Only QP+DC is, for which the bid opens at 38k; that is a bit pricey -:)

-Original Message-
From: William Yu [mailto:[EMAIL PROTECTED]
Sent: Monday, May 09, 2005 1:24 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Whence the Opterons?

Unfortunately, Anandtech only used Postgres a single time in his benchmarks. And what it did show back then was a huge performance advantage for the Opteron architecture over the Xeon in this case. Where the fastest Opterons were just 15% faster in MySQL/MSSQL/DB2 than the fastest Xeons, they were 100%+ faster in Postgres. He probably got rid of Postgres from his benchmark suite since it favors the Opteron too much. As a general hardware review site, it makes sense that he needs more neutral apps in order to get free systems to review and (ahem) ad dollars.

That being said, I wouldn't get a quad Opteron system anyway now that dual-core Opterons are available. A DP+DC system would be faster and cheaper than a pure quad system. Unless, of course, I needed QP+DC for 8-way SMP.

Anjan Dave wrote:
> Wasn't the context switching issue occurring in specific cases only?
>
> I haven't seen any benchmarks for a 50% performance difference. Neither
> have I seen any benchmarks of pure disk IO performance of specific
> models of Dell vs HP or Sun Opterons.
>
> Thanks,
> Anjan
>
>> EMC you can file an RPQ via your sales contacts to get it approved,
>> though not sure how lengthy/painful that process might be, or if it's
>> gonna be worth it.
>>
>> Read the article devoted to the v40z on anandtech.com.
>>
>> I am also trying to get a quad-Opteron versus the latest quad-XEON
>> from Dell (6850), but it's hard to justify a difference between a 15K dell
>> versus a 30k v40z for a 5-8% performance gain (read the XEON Vs. Opteron
>> Database comparo on anandtech.com)...
>>
>> Thanks,
>> Anjan
>
> 15k vs 30k is indeed a big difference. But also realize that Postgres
> has a specific benefit to Opterons versus Xeons. The context switching
> storm happens less on an Opteron for some reason.
>
> I would venture a much greater benefit than 5-8%, more like 10-50%.
Re: [PERFORM] Whence the Opterons?
Wasn't the context switching issue occurring in specific cases only? I haven't seen any benchmarks for a 50% performance difference. Neither have I seen any benchmarks of pure disk IO performance of specific models of Dell vs HP or Sun Opterons. Thanks, Anjan

-Original Message-
From: John A Meinel [mailto:[EMAIL PROTECTED]
Sent: Monday, May 09, 2005 11:22 AM
To: Anjan Dave
Cc: Geoffrey; Mischa Sandberg; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Whence the Opterons?

Anjan Dave wrote:
> You also want to consider any whitebox opteron system being on the
> compatibility list of your storage vendor, as well as RedHat, etc. With
> EMC you can file an RPQ via your sales contacts to get it approved,
> though not sure how lengthy/painful that process might be, or if it's
> gonna be worth it.
>
> Read the article devoted to the v40z on anandtech.com.
>
> I am also trying to get a quad-Opteron versus the latest quad-XEON from
> Dell (6850), but it's hard to justify a difference between a 15K dell
> versus a 30k v40z for a 5-8% performance gain (read the XEON Vs. Opteron
> Database comparo on anandtech.com)...
>
> Thanks,
> Anjan

15k vs 30k is indeed a big difference. But also realize that Postgres has a specific benefit to Opterons versus Xeons. The context switching storm happens less on an Opteron for some reason. I would venture a much greater benefit than 5-8%, more like 10-50%.

John
=:->
Re: [PERFORM] Whence the Opterons?
You also want to consider any whitebox Opteron system being on the compatibility list of your storage vendor, as well as RedHat, etc. With EMC you can file an RPQ via your sales contacts to get it approved, though I'm not sure how lengthy/painful that process might be, or if it's gonna be worth it. Read the article devoted to the v40z on anandtech.com. I am also trying to get a quad-Opteron versus the latest quad-XEON from Dell (6850), but it's hard to justify the difference between a 15K Dell and a 30K v40z for a 5-8% performance gain (read the XEON vs. Opteron database comparo on anandtech.com)...

Thanks, Anjan

-Original Message-
From: Geoffrey [mailto:[EMAIL PROTECTED]
Sent: Sunday, May 08, 2005 10:18 PM
To: Mischa Sandberg
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Whence the Opterons?

Mischa Sandberg wrote:
> After reading the comparisons between Opteron and Xeon processors for Linux,
> I'd like to add an Opteron box to our stable of Dells and Sparcs, for comparison.
>
> IBM, Sun and HP have their fairly pricey Opteron systems.
> The IT people are not swell about unsupported purchases off ebay.
> Anyone care to suggest any other vendors/distributors?
> Looking for names with national support, so that we can recommend as much to our
> customers.

Monarch Computer
http://www.monarchcomputer.com/
They have prebuilt and custom built systems.

--
Until later, Geoffrey
Re: [PERFORM] Why is this system swapping?
Using Resin's connection pooling. We are looking into pgpool alongside Slony to separate out some reporting functionality.

-anjan

-Original Message-
From: Jeff [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 27, 2005 3:29 PM
To: Greg Stark
Cc: Anjan Dave; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

On Apr 27, 2005, at 2:29 PM, Greg Stark wrote:

> I would seriously look at tuning those connection pools down. A lot.
> If your server processes are sitting idle over half the time I would
> at least cut it by a factor of 2.

Are you (Anjan) using real or fake connection pooling - ie pgpool versus php's persistent connections? I'd strongly recommend looking at pgpool. It does connection pooling correctly (a set of X connections shared among the entire box rather than 1 per web server).

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] Why is this system swapping?
Yes, HT is turned off (I haven't seen any recommendations to keep it on). This was when we were seeing 30 to 50% less traffic (users) than today. We didn't want the idle connections in the pool to expire too soon (default 30 secs, after which a connection goes back to the pool) and be reopened quickly, or to not have enough available (default 20 conns, we raised it to 50), so we figured a number per app server (50) and set that to expire after a very long time, so as to avoid any overhead and always have a connection available whenever needed without opening a new one. But now, for *some* reason, during some part of the day, we use up almost all the connections in each app's pool. After that, since they are set to expire after a long time, they remain there, taking up DB resources. I will be trimming the idle-timeout down to a few minutes first and see if that helps.

Thanks, Anjan

-Original Message-
From: Greg Stark [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 27, 2005 2:29 PM
To: Anjan Dave
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

"Anjan Dave" <[EMAIL PROTECTED]> writes:

> Some background:
>
> This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4...pretty heavy
> on concurrent usage. With peak traffic (db allows 1000 connections, in
> line with the number of app servers and connection pools for each)
> following is from 'top' (sorted by mem). Shared_buffers is 170MB,
> sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel
> storage, RAID10.
>
> 972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped
>
> CPU states:  cpu   user   nice  system   irq  softirq  iowait    idle
>            total  57.2%   0.0%   23.2%  0.0%     3.6%   82.8%  232.4%

This looks to me like most of your server processes are sitting around idle most of the time.

> 21397 postgres 22 0 181M 180M 175M D 25.9 1.5 85:17 0 postmaster
> 23820 postgres 15 0 178M 177M 175M S 0.0 1.5 1:53 3 postmaster

So each process is taking up 8-11M of ram beyond the shared memory.
1,000 x 10M is 10G. Add in some memory for page tables and kernel data structures, as well as the kernel's need to keep some memory set aside for filesystem buffers (what you really want all that memory being used for anyway), and you've used up all your 12G.

I would seriously look at tuning those connection pools down. A lot. If your server processes are sitting idle over half the time I would at least cut it by a factor of 2.

Working the other direction: you have four processors (I guess you have hyperthreading turned off?) so ideally what you want is four runnable processes at all times and as few others as possible. If your load typically spends about half the time waiting on i/o (which is what that top output says) then you want a total of 8 connections. Realistically you might not be able to predict which app server will be providing the load at any given time, so you might want 8 connections per app server. And you might have some load that's more i/o intensive than the 50% i/o load shown here. Say you think some loads will be 80% i/o, you might want 20 connections for those loads.

If you had 10 app servers with 20 connections each for a total of 200 connections I suspect that would be closer to right than having 1,000 connections. 200 connections would consume 2G of ram, leaving you with 10G of filesystem cache. Which might in turn decrease the percentage of time waiting on i/o, which would decrease the number of processes you need even further...

--
greg
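Greg's sizing rule can be sketched as a one-liner (the formula is my paraphrase of the reasoning above, and `suggested_connections` is a hypothetical name):

```python
# Sketch of the sizing rule above (paraphrased, not Greg's exact words):
# with C processors and a workload that waits on I/O a fraction w of the
# time, roughly C / (1 - w) backends keep the CPUs busy without piling up
# idle processes.
def suggested_connections(cpus, io_wait_fraction):
    return round(cpus / (1.0 - io_wait_fraction))  # nearest whole backend

print(suggested_connections(4, 0.5))  # 8, the ~50% I/O-wait case above
print(suggested_connections(4, 0.8))  # 20, the 80% I/O-wait case
```

Multiply by the number of app servers only if you cannot predict which one carries the load, which is exactly how the 200-connection suggestion above is reached.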
Re: [PERFORM] Why is this system swapping?
Sorry, I didn't attach vmstat output; the system does actively swap pages. Not to the point where it crawls, but for some brief periods the console becomes a bit unresponsive. I am taking this as a warning sign and want to prevent future problems.

anjan

-Original Message-
From: Jeff [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 27, 2005 2:30 PM
To: Anjan Dave
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

On Apr 27, 2005, at 1:48 PM, Anjan Dave wrote:

> As you can see the system starts utilizing swap at some point, with so
> many processes. Some time ago we had decided to keep the connections
> from the pool open for longer

You've shown that the system has used swap, but not that it is swapping. Having swap in use is fine - there is likely plenty of code and whatnot that is not being used, so it got dumped out to swap. However, if you are actively moving data to/from swap, that is bad. Very bad. Especially on Linux. To tell if you are swapping you need to watch the output of, say, `vmstat 1` and look at the si and so columns. Linux is very swap happy and likes to swap things for fun and profit.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
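Jeff's si/so check can be applied programmatically; a hypothetical helper (not from the thread) that assumes the 16-field row layout of the 2.4-era vmstat used elsewhere in these posts:

```python
# Hypothetical helper (mine): decide whether a box is *actively* swapping
# from `vmstat 1` samples, as opposed to merely having pages parked in swap.
# Skip vmstat's first row, since it reports averages since boot.
def is_actively_swapping(vmstat_text):
    for line in vmstat_text.splitlines():
        f = line.split()
        if len(f) == 16 and all(x.isdigit() for x in f):
            si, so = int(f[6]), int(f[7])  # pages swapped in / out
            if si > 0 or so > 0:
                return True
    return False

idle_sample = "0 1 21552 18044 4880 1931652 0 0 1652 0 397 512 1 2 50 47"
swapping_sample = "0 1 21552 17968 4916 1931536 0 4 1708 4 402 554 3 1 50 46"
print(is_actively_swapping(idle_sample))      # False
print(is_actively_swapping(swapping_sample))  # True
```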
[PERFORM] Why is this system swapping?
Hello,

I am trying to understand what I need to do for this system to stop using swap. Maybe it's something simple, or obvious for the situation. I'd appreciate some thoughts/suggestions.

Some background: this is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4, pretty heavy on concurrent usage. With peak traffic (db allows 1000 connections, in line with the number of app servers and connection pools for each), the following is from 'top' (sorted by mem). shared_buffers is 170MB, sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel storage, RAID10.

972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   57.2%    0.0%   23.2%   0.0%     3.6%   82.8%  232.4%
           cpu00   22.0%    0.0%    9.1%   0.1%     0.9%   18.7%   48.8%
           cpu01   17.5%    0.0%    5.8%   0.0%     2.3%   19.7%   54.4%
           cpu02    7.8%    0.0%    3.7%   0.0%     0.0%   20.8%   67.5%
           cpu03    9.7%    0.0%    4.4%   0.0%     0.5%   23.6%   61.5%
Mem:  12081744k av, 12055220k used,   26524k free,       0k shrd,   71828k buff
                     9020480k actv,  1741348k in_d,  237396k in_c
Swap:  4096532k av,   472872k used, 3623660k free                 9911176k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
21397 postgres  22   0  181M 180M  175M D    25.9  1.5  85:17   0 postmaster
23820 postgres  15   0  178M 177M  175M S     0.0  1.5   1:53   3 postmaster
24428 postgres  15   0  178M 177M  175M S     0.0  1.5   1:35   3 postmaster
24392 postgres  15   0  178M 177M  175M S     2.7  1.5   2:07   2 postmaster
23610 postgres  15   0  178M 177M  175M S     0.0  1.5   0:29   2 postmaster
24395 postgres  15   0  178M 177M  175M S     0.0  1.5   1:12   1 postmaster
...

-bash-2.05b$ free
             total       used       free     shared    buffers     cached
Mem:      12081744   12055536      26208          0      66704    9943988
-/+ buffers/cache:    2044844   10036900
Swap:      4096532     512744    3583788

As you can see, the system starts utilizing swap at some point, with so many processes. Some time ago we had decided to keep the connections from the pool open for longer periods of time, possibly to avoid connection maintenance overhead on the db.
At that time the traffic was not as high as it is today, which might be causing this, because for the most part, non-idle postmaster processes are only a few, except when the system becomes busy and suddenly you see a lot of selects piling up, and load averages shooting upwards. I am thinking closing out connections sooner might help the system release some memory to the kernel. Swapping adds up to the IO, although OS is on separate channel than postgres. I can add more memory, but I want to make sure I haven’t missed out something obvious. Thanks! Anjan **This e-mail and any files transmitted with it are intended for the use of the addressee(s) only and may be confidential and covered by the attorney/client and other privileges. If you received this e-mail in error, please notify the sender; do not disclose, copy, distribute, or take any action in reliance on the contents of this information; and delete it from your system. Any other use of this e-mail is prohibited.**
[PERFORM] Updating table, precautions?
Hi there,

We need to update a table of about 1.2GB (and about 900k rows) in size. I was wondering if I should let the regular cron job take care of cleanup (vacuum db Mon-Sat, vacuum full on Sun, followed by a reindex script), or manually do this on the table followed by the update.

This is what I used to find the table size, which probably doesn't include the index size. Is there a way to find out the size of indexes?

select relpages * 8192 as size_in_bytes
from pg_class
where relnamespace = (select oid from pg_namespace where nspname = 'public')
and relname = 'r_itemcategory';

Thanks,
Anjan
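On the index-size question: the catalogs can report it with the same `relpages * 8192` method. A sketch joining pg_class to pg_index (the table name is from the post; as with the table query, relpages is only refreshed by VACUUM/ANALYZE, so run one first for fresh numbers):

```sql
-- Size of each index on r_itemcategory, in bytes (8KB pages).
SELECT idx.relname          AS index_name,
       idx.relpages * 8192  AS size_in_bytes
FROM pg_class tbl
JOIN pg_index i   ON i.indrelid   = tbl.oid
JOIN pg_class idx ON i.indexrelid = idx.oid
WHERE tbl.relname = 'r_itemcategory';
```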
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
He is running RHAS4, which is the latest 2.6.x kernel from RH. I believe it should have done away with the RHAS3.0 Update 3 IO issue. anjan -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 20, 2005 4:23 PM To: Joel Fradkin Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon Joel, > I have MSSQL running on a 2 proc dell which until my load has increased > (over aprx 2 years) it was just fine. I totally agree that there are better > solutions based on this lists comments, but I have all Dell hardware now > and resist trying different vendors just to suit Postgres. I was under the > impression there were still issues with 64bit postgres and Linux (or at > least were when I purchased). I believed I could make my next aquistion a > opteron based hardware. Yeah, sorry, the Dell stuff is a sore point with me. You can't imagine the number of conversations I have that go like this: "We're having a severe performance problem with PostgreSQL" "What hardware/OS are you using?" "Dell *650 with RHAS 3.0 " BTW, which Update version is your RHAS? If you're on Update3, you can grab more performance right there by upgrading to Update4. > Again I am not at all trying to critasize any one, so please except my > apology if I some how came across with that attitude. I am very > disappointed at this point. My views may not be that great (although I am > not saying that either), but they run ok on MSSQL and appear to run ok on > MYSQL. Yeah. I think you'll find a few things that are vice-versa. For that matter, I can point to a number of queries we run better than Oracle, and a number we don't. Your particular query problem seems to stem from some bad estimates. Can you post an EXPLAIN ANALYZE based on all the advice people have given you so far? > I wish I did understand what I am doing wrong because I do not wish to > revisit engineering our application for MYSQL. I can imagine. 
> I would of spent more $ with Command, but he does need my data base to help > me and I am not able to do that. Yes. For that matter, it'll take longer to troubleshoot on this list because of your security concerns. > I agree testing the whole app is the only way to see and unfortunately it > is a time consuming bit. I do not have to spend 4k on MYSQL, that is if I > want to have their premium support. I can spend $250.00 a server for the > commercial license if I find the whole app does run well. I just loaded the > data last night and only had time to convert one view this morning. I am > sure it is something I do not understand and not a problem with postgres. I > also am willing to take time to get more knowledgeable, but my time is > running out and I feel honestly stupid. You're not. You have a real query problem and it will require further troubleshooting to solve. Some of us make a pretty handsome living solving these kinds of problems, it take a lot of expert knowledge. > It was never my intention to make you feel like I was flaming anyone > involved. On the contrary, I feel many have taken time to look at my > questions and given excellent advice. I know I check the archives so > hopefully that time will help others after me. Well, I overreacted too. Sorry! > I may find that revisiting the datasets is a way to make PG work, or as you > mentioned maybe I can get some one with more knowledge to step in locally. > I did ask Tom if he knew of anyone, maybe some one else on the list is > aware of a professional in the Tampa FL area. Well, Robert Treat is in Florida but I'm pretty sure he's busy full-time. > Realistically I don't think a 30k$ Dell is a something that needs to be > junked. I am pretty sure if I got MSSQL running on it, it would outperform > my two proc box. I can agree it may not have been the optimal platform. My > decision is not based solely on the performance on the 4 proc box. 
Oh, certainly it's too late to buy a Sunfire or eServer instead. You just could have gotten far more bang for the buck with some expert advice, that's all. But don't bother with Dell support any further, they don't really have the knowledge to help you. So ... new EXPLAIN ANALYZE ? -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?)
There have been some discussions on this list and others in general about Dell's version of RAID cards, and server support, mainly linux support. Before I venture into having another vendor in the shop I want to know if there are any dos/don't's about 4-way Opteron offerings from Sun and HP. Don't want to put the topic on a different tangent, but I would be interested in the discussion of AMD Vs. XEON in terms of actual products available today. Thanks, Anjan -Original Message- From: Christian Sander Røsnes [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 20, 2005 12:14 PM To: Bruce Momjian Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?) On Wednesday 20 April 2005 17:50, Bruce Momjian wrote: > Anjan Dave wrote: > > In terms of vendor specific models - > > > > Does anyone have any good/bad experiences/recommendations for a 4-way > > Opteron from Sun (v40z, 6 internal drives) or HP (DL585 5 internal > > drives) models? > > > > This is in comparison with the new Dell 6850 (it has PCIexpress, faster > > FSB 667MHz, which doesn't match up with AMD's total IO bandwidth, but > > much better than previous 6650s). > > Dell cuts too many corners to be a good server. Hi Which corners do Dell cut compared to the competition ? Thanks Christian ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?)
In terms of vendor specific models - Does anyone have any good/bad experiences/recommendations for a 4-way Opteron from Sun (v40z, 6 internal drives) or HP (DL585 5 internal drives) models? This is in comparison with the new Dell 6850 (it has PCIexpress, faster FSB 667MHz, which doesn't match up with AMD's total IO bandwidth, but much better than previous 6650s). Thanks, Anjan -Original Message- From: William Yu [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 20, 2005 11:10 AM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?) I posted this link a few months ago and there was some surprise over the difference in postgresql compared to other DBs. (Not much surprise in Opteron stomping on Xeon in pgsql as most people here have had that experience -- the surprise was in how much smaller the difference was in other DBs.) If it was across the board +100% in MS-SQL, MySQL, etc -- you can chalk in up to overall better CPU architecture. Most of the time though, the numbers I've seen show +0-30% for [insert DB here] and a huge whopping + for pgsql. Why the pronounced preference for postgresql, I'm not sure if it was explained fully. BTW, the Anandtech test compares single CPU systems w/ 1GB of RAM. Go to dual/quad and SMP Xeon will suffer even more since it has to share a fixed amount of FSB/memory bandwidth amongst all CPUs. Xeons also seem to suffer more from context-switch storms. Go > 4GB of RAM and the Xeon suffers another hit due to the lack of a 64-bit IOMMU. Devices cannot map to addresses > 4GB which means the OS has to do extra work in copying data from/to > 4GB anytime you have IO. (Although this penalty might exist all the time in 64-bit mode for Xeon if Linux/Windows took the expedient and less-buggy route of using a single method versus checking whether target addresses are > or < 4GB.) Jeff Frost wrote: > On Tue, 19 Apr 2005, J. 
Andrew Rogers wrote: > >> I don't know about 2.5x faster (perhaps on specific types of loads), >> but the reason Opterons rock for database applications is their >> insanely good memory bandwidth and latency that scales much better >> than the Xeon. Opterons also have a ccNUMA-esque I/O fabric and two >> dedicated on-die memory channels *per processor* -- no shared bus >> there, closer to real UNIX server iron than a glorified PC. > > > Thanks J! That's exactly what I was suspecting it might be. Actually, > I found an anandtech benchmark that shows the Opteron coming in at close > to 2.0x performance: > > http://www.anandtech.com/linux/showdoc.aspx?i=2163&p=2 > > It's an Opteron 150 (2.4ghz) vs. Xeon 3.6ghz from August. I wonder if > the differences are more pronounced with the newer Opterons. > > -Jeff > > ---(end of broadcast)--- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Alternatives to Dell?
Not in my experience for IBM, even for an order approaching 100k. The sales guy was rude, jumping on numbers, unable to talk about exactly what differentiates IBM from Dell (equivalent config) - other than the name and their 20K+ difference. We use many Dell servers, no quality issue, but as someone pointed out earlier, linux support is not that great. Only issue so far hardware wise is the PERC card on one of the machines, and i believe one should stay away from the adaptec versions of PERC. -anjan -Original Message- From: Joshua D. Drake [mailto:[EMAIL PROTECTED] Sent: Wed 12/1/2004 9:15 PM To: Matthew Marlowe Cc: Steven Rosenstein; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [PERFORM] Alternatives to Dell? > >Go the IBM website, try to find a comparative x86 system and >spec it out. The list pricing is in the $12-16K range. Yes, I know >I could get a good discount if I developed a relationship with >an IBM reseller here..and perhaps the end pricing would be >in the $10-12K rangebut the Dell way just seems alot more honest >to me, and reasonable. The IBM gear doesn't seem that much better. > > It is my experience that IBM will get within 5% of Dell if you provide IBM with a written quote from Dell. Sincerely, Joshua D. Drake >And while I have concerns about some of the Dell >hardware, none of the issues have really caused any issues for me or my clients >here yet.(crossing fingers..) > >I just don't think IBM makes it easy for new customers to buy their equipment and >if I went with them, I'd always have the lingering suspicion that I was paying too much. > >I really hope they change some day... Until then, I just see Dell winning more of the >server market share. > >Regards, >Matt >--- Original Message--- > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > From: Steven Rosenstein <[EMAIL PROTECTED]> > Sent: 12/01/2004 4:17PM > Subject: Re: [PERFORM] Alternatives to Dell? 
> > > >>> >>> >>>I recommend IBM equipment, but in the spirit of caveat emptor I should let >>>you know I work for IBM... :-) >>> >>>Seriously, I've been using IBM laptops and desktops for about 5 years, even >>>before I started working for them. They tend to be a little more expensive >>>than Dell, but I think they use beefier components and don't cut the specs >>>quite as close as Dell does. IBM gear is designed more for industrial use >>>than home computing, which is reflected in the quality (and the price). >>> >>>IBM just released a new series of PowerPC-based servers that are >>>specifically designed to run Linux. They're at the higher end, but from >>>what I understand, they provide much more bang for the buck than >>>Intel-based servers. >>> >>>I hope this helps, >>>--- Steve >>> >>>___ >>> >>>Steven Rosenstein >>>Senior IT Architect/Specialist | IBM Virtual Server Administration >>>Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001 >>>Text Messaging: 6463456978 @ mobile.mycingular.com >>>Email: srosenst @ us.ibm.com >>> >>>"Learn from the mistakes of others because you can't live long enough to >>>make them all yourself." -- Eleanor Roosevelt >>> >>> >>> >>> Josh Berkus >>> <[EMAIL PROTECTED] >>> m> To >>> Sent by: [EMAIL PROTECTED] >>> pgsql-performance cc >>> [EMAIL PROTECTED] >>> .org Subject >>> [PERFORM] Alternatives to Dell? >>> >>> 12/01/2004 05:24 >>> PM >>> >>> >>> Please respond to >>> josh >>> >>> >>> >>> >>> >>> >>>Folks, >>> >>>A lot of people have been having a devilish time with Dell hardware lately. >>> >>>It seems like the quality control just isn't there on the Dell servers. >>> >>>Thing is, some companies are requ
Re: [PERFORM] Query Performance and IOWait
Check the linux-dell list for more...The PERC3/Di cards are specifically Adaptec, not most. PERC4/DC is LSI Megaraid. Unless you buy the cheaper version, most will come with battery. -anjan -Original Message- From: Andrew Janian [mailto:[EMAIL PROTECTED] Sent: Friday, November 19, 2004 4:22 PM To: [EMAIL PROTECTED] Subject: Re: [PERFORM] Query Performance and IOWait The data that we are accessing is via QLogic cards connected to an EMC Clarion. We have tried it on local SCSI disks with the same (bad) results. When the machine gets stuck in a 100% IOWAIT state it often crashes soon after that. The disks are fine, have been replaced and checked. Here are my results from hdparm -Tt /dev/sda1 (which is the EMC disk array) /dev/sda1: Timing buffer-cache reads: 2976 MB in 2.00 seconds = 1488.00 MB/sec Timing buffered disk reads: 44 MB in 3.13 seconds = 14.06 MB/sec -Original Message- From: Dave Cramer [mailto:[EMAIL PROTECTED] Sent: Thursday, November 18, 2004 11:14 AM To: Andrew Janian Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Query Performance and IOWait Andrew, Dell's aren't well known for their disk performance, apparently most of the perc controllers sold with dell's are actually adaptec controllers. Also apparently they do not come with the battery required to use the battery backed up write cache ( In fact according to some Dell won't even sell the battery to you). Also Dell's monitoring software is quite a memory hog. Have you looked at top ?, and also hdparm -Tt /dev/sd? Dave Andrew Janian wrote: >Hello All, > >I have a setup with a Dell Poweredge 2650 with Red Hat and Postgres 7.4.5 with a database with about 27GB of data. The table in question has about 35 million rows. 
> >I am running the following query: > >SELECT * >FROM mb_fix_message >WHERE msg_client_order_id IN ( > SELECT msg_client_order_id > FROM mb_fix_message > WHERE msg_log_time >= '2004-06-01' > AND msg_log_time < '2004-06-01 13:30:00.000' > AND msg_message_type IN ('D','G') > AND mb_ord_type = '1' > ) > AND msg_log_time > '2004-06-01' > AND msg_log_time < '2004-06-01 23:59:59.999' > AND msg_message_type = '8' > AND (mb_raw_text LIKE '%39=1%' OR mb_raw_text LIKE '%39=2%'); > >with the following plan: > > QUERY PLAN >Nested Loop IN Join (cost=0.00..34047.29 rows=1 width=526) > -> Index Scan using mfi_log_time on mb_fix_message (cost=0.00..22231.31 rows=2539 width=526) > Index Cond: ((msg_log_time > '2004-06-01 00:00:00'::timestamp without time zone) AND (msg_log_time < '2004-06-01 23:59:59.999'::timestamp without time zone)) > Filter: (((msg_message_type)::text = '8'::text) AND (((mb_raw_text)::text ~~ '%39=1%'::text) OR ((mb_raw_text)::text ~~ '%39=2%'::text))) > -> Index Scan using mfi_client_ordid on mb_fix_message (cost=0.00..445.56 rows=1 width=18) > Index Cond: (("outer".msg_client_order_id)::text = (mb_fix_message.msg_client_order_id)::text) > Filter: ((msg_log_time >= '2004-06-01 00:00:00'::timestamp without time zone) AND (msg_log_time < '2004-06-01 13:30:00'::timestamp without time zone) AND ((msg_message_type)::text = 'D'::text) OR ((msg_message_type)::text = 'G'::text)) AND ((mb_ord_type)::text = '1'::text)) > >While running, this query produces 100% iowait usage on its processor and takes a ungodly amount of time (about an hour). > >The postgres settings are as follows: > >shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each >sort_mem = 262144 # min 64, size in KB > >And the /etc/sysctl.conf has: >kernel.shmall = 274235392 >kernel.shmmax = 274235392 > >The system has 4GB of RAM. > >I am pretty sure of these settings, but only from my reading of the docs and others' recommendations online. 
> >Thanks, > >Andrew Janian >OMS Development >Scottrade Financial Services >(314) 965-1555 x 1513 >Cell: (314) 369-2083 > >---(end of broadcast)--- >TIP 7: don't forget to increase your free space map settings > > -- Dave Cramer http://www.postgresintl.com 519 939 0336 ICQ#14675561 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 8: explain analyze is your friend
[PERFORM] shared_buffers and Shared Memory Segments
Hello,

I am trying to understand the output of the 'ipcs' command during peak activity and how I can use it to possibly tune the shared_buffers. Here's what I see right now (ipcs -m; host is RHAS 3.0):

------ Shared Memory Segments --------
key        shmid    owner    perms  bytes     nattch  status
0x0052e2c1 1966080  postgres 600    92078080  322

What is nattch? Is this the number of segments attached? Is it saying that about 92MB is used out of 512MB?

- Shared memory segment size is defined to be 512MB
- Currently, shared_buffers are at 80MB (10240)

Here's the 'top' output:

12:29:42 up 24 days, 15:04, 6 users, load average: 2.28, 1.07, 1.07
421 processes: 414 sleeping, 3 running, 4 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   83.6%    0.0%   40.8%   0.0%     7.6%   76.4%  190.0%
           cpu00   20.9%    0.0%    9.0%   0.3%     0.1%   22.5%   46.8%
           cpu01   19.2%    0.0%   10.6%   0.0%     7.3%   14.4%   48.3%
           cpu02   15.0%    0.0%    7.3%   0.0%     0.0%    8.6%   68.9%
           cpu03   28.6%    0.0%   14.0%   0.0%     0.1%   31.0%   26.0%
Mem:  7973712k av, 7675856k used,  297856k free,      0k shrd,  149220k buff
                   3865444k actv, 2638404k in_d, 160092k in_c
Swap: 4096532k av,      28k used, 4096504k free                6387092k cached

Can I conclude anything from these outputs and the buffer setting? Appreciate any thoughts.

Thanks,
Anjan
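On the nattch question: it is the number of processes currently attached to the segment (322 here, consistent with the ~421 processes top shows, most of them backends), not a count of segments. The 92078080-byte segment is the shared_buffers arena plus PostgreSQL's other shared structures (lock tables, etc.); a quick check with the numbers from the post:

```shell
buffers_bytes=$(( 10240 * 8192 ))   # shared_buffers = 10240 8KB pages
segment_bytes=92078080              # 'bytes' column from ipcs -m
echo "shared_buffers:          $buffers_bytes bytes"   # 83886080, ~80MB
echo "other shared structures: $(( segment_bytes - buffers_bytes )) bytes"
```

So the segment is sized for what the server actually allocates at startup; the 512MB shmmax setting is just an upper limit, not memory in use.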
Re: [PERFORM] Summary: can't handle large number of INSERT/UPDATEs
I would like to thank everyone for their timely suggestions. The problem appears to be resolved now. We verified/modified - locking/indexes/vacuum/checkpoints/IO bottleneck/queries, etc. Couple significant changes were the number of checkpoint segments were increased, and we moved over the database to a new SAN RAID10 volume (which was in plan anyway, just did it sooner). Thanks, Anjan From: Anjan Dave Sent: Monday, October 25, 2004 4:53 PM To: [EMAIL PROTECTED] Subject: [PERFORM] can't handle large number of INSERT/UPDATEs Hi, I am dealing with an app here that uses pg to handle a few thousand concurrent web users. It seems that under heavy load, the INSERT and UPDATE statements to one or two specific tables keep queuing up, to the count of 150+ (one table has about 432K rows, other has about 2.6Million rows), resulting in ‘wait’s for other queries, and then everything piles up, with the load average shooting up to 10+. We (development) have gone through the queries/explain analyzes and made sure the appropriate indexes exist among other efforts put in. I would like to know if there is anything that can be changed for better from the systems perspective. Here’s what I have done and some recent changes from the system side: -Upgraded from 7.4.0 to 7.4.1 sometime ago -Upgraded from RH8 to RHEL 3.0 -The settings from postgresql.conf (carried over, basically) are: shared_buffers = 10240 (80MB) max_connections = 400 sort_memory = 1024 effective_cache_size = 262144 (2GB) checkpoint_segments = 15 stats_start_collector = true stats_command_string = true Rest everything is at default In /etc/sysctl.conf (512MB shared mem) kernel.shmall = 536870912 kernel.shmmax = 536870912 -This is a new Dell 6650 (quad XEON 2.2GHz, 8GB RAM, Internal HW RAID10), RHEL 3.0 (2.4.21-20.ELsmp), PG 7.4.1 -Vaccum Full run everyday -contrib/Reindex run everyday -Disabled HT in BIOS I would greatly appreciate any helpful ideas. Thanks in advance, Anjan
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
Josh,

I have increased them to 30; will see if that helps. Space is not a concern, and a slightly longer recovery time could be fine too. I wonder what values people use (examples) for high-volume databases (except for dump/restore)...?

I don't know what checkpoint_siblings is; I'll read about it if there's some info on it somewhere.

Thanks,
Anjan

-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED]
Sent: Tue 10/26/2004 8:42 PM
To: [EMAIL PROTECTED]
Cc: Anjan Dave; Tom Lane; Rod Taylor
Subject: Re: [PERFORM] can't handle large number of INSERT/UPDATEs

Anjan,

> Oct 26 17:26:25 vl-pe6650-003 postgres[14273]: [4-1] LOG: recycled transaction
> log file "000B0082"
> ...
> Oct 26 17:31:27 vl-pe6650-003 postgres[14508]: [2-1] LOG: recycled transaction
> log file "000B0083"
> Oct 26 17:31:27 vl-pe6650-003 postgres[14508]: [3-1] LOG: recycled transaction
> log file "000B0084"
> Oct 26 17:31:27 vl-pe6650-003 postgres[14508]: [4-1] LOG: recycled transaction
> log file "000B0085"

Looks like you're running out of disk space for pending transactions. Can you afford more checkpoint_segments? Have you considered checkpoint_siblings?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend
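For sizing the setting, the 7.4-era rule of thumb is that pg_xlog normally holds at most 2 * checkpoint_segments + 1 segment files of 16MB each, so the disk cost of a larger value is easy to bound. A sketch with the values mentioned in this thread (default 3, then 15, then 30):

```shell
# Rough upper bound on pg_xlog disk use for a given checkpoint_segments.
for segs in 3 15 30; do
  echo "checkpoint_segments=$segs -> ~$(( (2 * segs + 1) * 16 )) MB of pg_xlog"
done
```

Even at 30 segments that is under 1GB, which squares with "space is not a concern" above.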
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
Ok, i was thinking from the disk perspective. Thanks! -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tue 10/26/2004 6:37 PM To: Anjan Dave Cc: Matt Clark; Rod Taylor; Postgresql Performance Subject: Re: [PERFORM] can't handle large number of INSERT/UPDATEs "Anjan Dave" <[EMAIL PROTECTED]> writes: > One thing I am not sure is why 'bi' (disk writes) stays at 0 mostly, > it's the 'bo' column that shows high numbers (reads from disk). With so > many INSERT/UPDATEs, I would expect it the other way around... Er ... it *is* the other way around. bi is blocks in (to the CPU), bo is blocks out (from the CPU). regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
It just seems that the more activity there is (that is when there's a lot of disk activity) the checkpoints happen quicker too. Here's a snapshot from the /var/log/messages - Oct 26 17:21:22 vl-pe6650-003 postgres[13978]: [2-1] LOG: recycled transaction log file "000B007E" Oct 26 17:21:22 vl-pe6650-003 postgres[13978]: [3-1] LOG: recycled transaction log file "000B007F" ... Oct 26 17:26:25 vl-pe6650-003 postgres[14273]: [2-1] LOG: recycled transaction log file "000B0080" Oct 26 17:26:25 vl-pe6650-003 postgres[14273]: [3-1] LOG: recycled transaction log file "000B0081" Oct 26 17:26:25 vl-pe6650-003 postgres[14273]: [4-1] LOG: recycled transaction log file "000B0082" ... Oct 26 17:31:27 vl-pe6650-003 postgres[14508]: [2-1] LOG: recycled transaction log file "000B0083" Oct 26 17:31:27 vl-pe6650-003 postgres[14508]: [3-1] LOG: recycled transaction log file "000B0084" Oct 26 17:31:27 vl-pe6650-003 postgres[14508]: [4-1] LOG: recycled transaction log file "000B0085" ... I have increased them from default 3 to 15. Haven't altered the frequency though Thanks, Anjan -Original Message----- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 26, 2004 5:53 PM To: Anjan Dave Cc: Rod Taylor; Postgresql Performance Subject: Re: [PERFORM] can't handle large number of INSERT/UPDATEs "Anjan Dave" <[EMAIL PROTECTED]> writes: > None of the locks are in state false actually. In that case you don't have a locking problem. > I don't have iostat on that machine, but vmstat shows a lot of writes to > the drives, and the runnable processes are more than 1: I get the impression that you are just saturating the write bandwidth of your disk :-( It's fairly likely that this happens during checkpoints. Look to see if the postmaster has a child that shows itself as a checkpointer in "ps" when the saturation is occurring. 
You might be able to improve matters by altering the checkpoint frequency parameters (though beware that either too small or too large will likely make matters even worse). regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
That is 1 or maybe 2 second interval. One thing I am not sure is why 'bi' (disk writes) stays at 0 mostly, it's the 'bo' column that shows high numbers (reads from disk). With so many INSERT/UPDATEs, I would expect it the other way around... -anjan -Original Message- From: Matt Clark [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 26, 2004 2:29 PM To: Anjan Dave Cc: Rod Taylor; Postgresql Performance Subject: Re: [PERFORM] can't handle large number of INSERT/UPDATEs >I don't have iostat on that machine, but vmstat shows a lot of writes to >the drives, and the runnable processes are more than 1: > > 6 1 0 3617652 292936 279192800 0 52430 1347 4681 25 >19 20 37 > > Assuming that's the output of 'vmstat 1' and not some other delay, 50MB/second of sustained writes is usually considered 'a lot'. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
Andrew/Josh, Josh also suggested to check for any FK/referential integrity checks, but I am told that we don't have any foreign key constraints. Thanks, anjan -Original Message- From: Andrew McMillan [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 26, 2004 4:51 PM To: Anjan Dave Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] can't handle large number of INSERT/UPDATEs On Mon, 2004-10-25 at 16:53 -0400, Anjan Dave wrote: > Hi, > > > > I am dealing with an app here that uses pg to handle a few thousand > concurrent web users. It seems that under heavy load, the INSERT and > UPDATE statements to one or two specific tables keep queuing up, to > the count of 150+ (one table has about 432K rows, other has about > 2.6Million rows), resulting in 'wait's for other queries, and then > everything piles up, with the load average shooting up to 10+. Hi, We saw a similar problem here that was related to the locking that can happen against referred tables for referential integrity. In our case we had referred tables with very few rows (i.e. < 10) which caused the insert and update on the large tables to be effectively serialised due to the high contention on the referred tables. We changed our app to implement those referential integrity checks differently and performance was hugely boosted. Regards, Andrew. - Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 Chicken Little was right. - ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
None of the locks are in state false actually.

I don't have iostat on that machine, but vmstat shows a lot of writes to the drives, and the runnable processes are more than 1:

 r  b  swpd    free   buff   cache  si  so  bi    bo    in    cs us sy wa id
 1  2     0 3857568 292936 2791876   0   0   0 44460  1264  2997 23 13 22 41
 2  2     0 3824668 292936 2791884   0   0   0 25262  1113  4797 28 12 29 31
 2  3     0 3784772 292936 2791896   0   0   0 38988  1468  6677 28 12 48 12
 2  4     0 3736256 292936 2791904   0   0   0 50970  1530  5217 19 12 49 20
 4  2     0 3698056 292936 2791908   0   0   0 43576  1369  7316 20 15 35 30
 2  1     0 3667124 292936 2791920   0   0   0 39174  1444  4659 25 16 35 24
 6  1     0 3617652 292936 2791928   0   0   0 52430  1347  4681 25 19 20 37
 1  3     0     352 292936 2790868   0   0   0 40156  1439  4394 20 14 29 37
 6  0     0 3797488 292936 2568648   0   0   0 17706  2272 21534 28 23 19 30
 0  0     0 3785396 292936 2568736   0   0   0  1156  1237 14057 33  8  0 59
 0  0     0 3783568 292936 2568736   0   0   0   704   512  1537  5  2  1 92
 1  0     0 3783188 292936 2568752   0   0   0   842   613  1919  6  1  1 92

-anjan

-Original Message-
From: Rod Taylor [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 26, 2004 1:49 PM
To: Anjan Dave
Cc: Postgresql Performance
Subject: RE: [PERFORM] can't handle large number of INSERT/UPDATEs

On Tue, 2004-10-26 at 13:42, Anjan Dave wrote:
> It probably is locking issue. I got a long list of locks held when we ran select * from pg_locks during a peak time.
>
>  relation | database | transaction | pid  |      mode       | granted
> ----------+----------+-------------+------+-----------------+---------
>     17239 |    17142 |             | 3856 | AccessShareLock | t

How many have granted = false?

> Vmstat would show a lot of disk IO at the same time.
>
> Is this pointing towards a disk IO issue?

Not necessarily. Is your IO reaching the limit or is it just heavy?

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
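Rod's "how many have granted = false?" question can be answered by filtering the same view rather than eyeballing the full dump; rows with granted = 't' are just normal concurrent access. A sketch:

```sql
-- Only ungranted locks indicate queries actually blocked waiting.
SELECT relation, pid, mode
FROM pg_locks
WHERE NOT granted;
```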
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
It probably is a locking issue. I got a long list of locks held when we ran select * from pg_locks during a peak time.

 relation | database | transaction | pid  | mode             | granted
----------+----------+-------------+------+------------------+---------
    17239 |    17142 |             | 3856 | AccessShareLock  | t
          |          |    21196323 | 3875 | ExclusiveLock    | t
    16390 |    17142 |             | 3911 | AccessShareLock  | t
    16595 |    17142 |             | 3782 | AccessShareLock  | t
    17227 |    17142 |             | 3840 | AccessShareLock  | t
    17227 |    17142 |             | 3840 | RowExclusiveLock | t
...

Vmstat would show a lot of disk IO at the same time. Is this pointing towards a disk IO issue? (to that end, other than a higher CPU speed, and disabling HT, the only thing changed is that it's a RAID5 volume now, instead of a RAID10) -anjan -Original Message- From: Rod Taylor [mailto:[EMAIL PROTECTED] Sent: Monday, October 25, 2004 5:19 PM To: Anjan Dave Cc: Postgresql Performance Subject: Re: [PERFORM] can't handle large number of INSERT/UPDATEs On Mon, 2004-10-25 at 16:53, Anjan Dave wrote: > Hi, > > > > I am dealing with an app here that uses pg to handle a few thousand > concurrent web users. It seems that under heavy load, the INSERT and > UPDATE statements to one or two specific tables keep queuing up, to > the count of 150+ (one table has about 432K rows, other has about > 2.6Million rows), resulting in 'wait's for other queries, and then This isn't an index issue, it's a locking issue. Sounds like you have a bunch of inserts and updates hitting the same rows over and over again. Eliminate that contention point, and you will have solved your problem. Feel free to describe the processes involved, and we can help you do that. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] can't handle large number of INSERT/UPDATEs
Hi, I am dealing with an app here that uses pg to handle a few thousand concurrent web users. It seems that under heavy load, the INSERT and UPDATE statements to one or two specific tables keep queuing up, to the count of 150+ (one table has about 432K rows, the other has about 2.6 million rows), resulting in 'wait's for other queries, and then everything piles up, with the load average shooting up to 10+. We (development) have gone through the queries/explain analyzes and made sure the appropriate indexes exist, among other efforts put in. I would like to know if there is anything that can be changed for the better from the systems perspective. Here's what I have done and some recent changes from the system side:
-Upgraded from 7.4.0 to 7.4.1 some time ago
-Upgraded from RH8 to RHEL 3.0
-The settings from postgresql.conf (carried over, basically) are:
shared_buffers = 10240 (80MB)
max_connections = 400
sort_memory = 1024
effective_cache_size = 262144 (2GB)
checkpoint_segments = 15
stats_start_collector = true
stats_command_string = true
Everything else is at default.
In /etc/sysctl.conf (512MB shared mem):
kernel.shmall = 536870912
kernel.shmmax = 536870912
-This is a new Dell 6650 (quad XEON 2.2GHz, 8GB RAM, internal HW RAID10), RHEL 3.0 (2.4.21-20.ELsmp), PG 7.4.1
-Vacuum Full run every day
-contrib/Reindex run every day
-Disabled HT in BIOS
I would greatly appreciate any helpful ideas. Thanks in advance, Anjan
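The page-unit settings quoted above are easy to misread, so here is a quick arithmetic check (a sketch; on 7.4 both shared_buffers and effective_cache_size are counted in 8 KB pages, and the shared segment must fit under kernel.shmmax):

```shell
# Sanity-check that the quoted settings are self-consistent:
# values are in 8 KB pages, shmmax is in bytes.
echo $(( 10240 * 8192 ))          # shared_buffers       -> 83886080 bytes (80 MB)
echo $(( 262144 * 8192 ))         # effective_cache_size -> 2147483648 bytes (2 GB)
echo $(( 536870912 / 1048576 ))   # kernel.shmmax        -> 512 MB
```

So the 80 MB of shared buffers fits comfortably under the 512 MB shmmax ceiling; effective_cache_size is only a planner hint and does not consume shared memory.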
Re: [PERFORM] SAN performance
I believe 1/0 or 1+0 is aka RAID-10. CX300 doesn't support 0+1. So far I am aware of two things: the cache page size is 8KB (can be increased or decreased), and the stripe element size is 128 sectors by default. Thanks, Anjan -Original Message- From: Mr Pink [mailto:[EMAIL PROTECTED] Sent: Thu 9/23/2004 11:39 AM To: Anjan Dave; [EMAIL PROTECTED] Cc: Subject: Re: [PERFORM] SAN performance Hi, I expect you mean RAID 1/0 or 1+0 since the CX300 didn't support RAID 10 last time I looked. Whether you are using a SAN or not, you should consider putting the WAL files (pg_xlog folder) on separate disks from the DB. Since the log files are mostly written to, not read from, you could just use RAID 1. It's a pity pg doesn't have a way to use a cluster of servers to get the most out of your expensive SAN. I read a comment earlier about setting block sizes to 8k to match pg's block size. Seems to make sense, you should check it out. Have fun, Mr Pink --- Anjan Dave <[EMAIL PROTECTED]> wrote: > Hello, > > > > I'll be moving a DB from internal RAID-10 SCSI storage to an EMC CX300 > FC RAID-10 LUN, bound to the host. I've setup a test host machine and a > test LUN. The /var/lib/pgsql/data folder is sym-linked to a partition on > the LUN. > > > > Other than the shared_buffers, effective cache size, and sort memory, I > am not sure if I need to change any other parameters in the > postgresql.conf file for getting maximum performance from the EMC box. > > > > Is there a general guideline for setting up postgres database and the > tunable parameters on a SAN, especially for EMC? > > > > Appreciate any help, > > > > Thanks, > Anjan > > __ Do you Yahoo!? New and Improved Yahoo! Mail - 100MB free storage! http://promotions.yahoo.com/new_mail ---(end of broadcast)--- TIP 8: explain analyze is your friend
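Since the stripe element size came up: it is quoted in 512-byte sectors, so the default of 128 sectors works out as follows relative to Postgres's 8 KB page (simple arithmetic sketch):

```shell
# Stripe element size is in 512-byte sectors; relate it to the 8 KB pg page.
echo $(( 128 * 512 ))         # -> 65536 bytes (64 KB) per stripe element
echo $(( 128 * 512 / 8192 ))  # -> 8 Postgres pages per stripe element
```

In other words, eight consecutive Postgres pages land on the same disk before the array moves to the next spindle, which is worth knowing when weighing the "match block sizes" suggestion above.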
[PERFORM] SAN performance
Hello, I’ll be moving a DB from internal RAID-10 SCSI storage to an EMC CX300 FC RAID-10 LUN, bound to the host. I’ve setup a test host machine and a test LUN. The /var/lib/pgsql/data folder is sym-linked to a partition on the LUN. Other than the shared_buffers, effective cache size, and sort memory, I am not sure if I need to change any other parameters in the postgresql.conf file for getting maximum performance from the EMC box. Is there a general guideline for setting up postgres database and the tunable parameters on a SAN, especially for EMC? Appreciate any help, Thanks, Anjan
Re: [PERFORM] Database Server Tuning
Vivek, Was there anything specific that helped you decide on a RAID-5 and not a RAID-10? I have my DBs on RAID10, and would soon be moving them to FC drives, and I am considering RAID-10. Thanks, Anjan -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Tue 3/2/2004 4:27 PM To: Vivek Khera; [EMAIL PROTECTED] Cc: Subject: Re: [PERFORM] Database Server Tuning Vivek, > I did a bunch of testing with different RAID levels on a 14 disk > array. I finally settled on this: RAID5 across 14 disks for the > data, the OS (including syslog directory) and WAL on a RAID1 pair on > the other channel of the same controller (I didn't want to spring for > dual RAID controllers). The biggest bumps in performance came from > increasing the checkpoint_buffers since my DB is heavily written to, > and increasing sort_mem. With large RAID, have you found that having WAL on a separate array actually boosts performance? The empirical tests we've seen so far don't seem to support this. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Scaling further up
Can you describe the vendors/components of a "cheap SAN setup?" Thanks, Anjan -Original Message- From: Rod Taylor [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 02, 2004 5:57 PM To: Scott Marlowe Cc: Anjan Dave; Chris Ruprecht; [EMAIL PROTECTED]; William Yu; Postgresql Performance Subject: Re: [PERFORM] Scaling further up > For speed, the X86 32 and 64 bit architectures seem to be noticeable > faster than Sparc. However, running Linux or BSD on Sparc make them > pretty fast too, but you lose the fault tolerant support for things like > hot swappable CPUs or memory. Agreed.. You can get a Quad Opteron with 16GB memory for around 20K. Grab 3, a cheap SAN and setup a little master/slave replication with failover (how is Slony coming?), and you're all set. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Scaling further up
For the disks part - I am looking at a SAN implementation, and I will be planning a separate RAID group for the WALs. The controller is a PERC, with 128MB cache, and I think it is writeback. Other than the disks, I am curious what other people are using in terms of the horsepower needed. The Quad server has been keeping up, but we are expecting quite high loads in the near future, and I am not sure if just having the disks on high-end storage will do it. Thanks, Anjan -Original Message- From: Magnus Hagander [mailto:[EMAIL PROTECTED] Sent: Monday, March 01, 2004 3:54 PM To: Anjan Dave; [EMAIL PROTECTED] Subject: RE: [PERFORM] Scaling further up > All: > > We have a Quad-Intel XEON 2.0GHz (1MB cache), 12GB memory, running RH9, PG 7.4.0. There's > an internal U320, 10K RPM RAID-10 setup on 4 drives. > > We are expecting a pretty high load, a few thousand 'concurrent' users executing either > select, insert, or update statements. > What is the next step up in terms of handling very heavy loads? Clustering? I'd look at adding more disks first. Depending on what type of query load you get, that box sounds like it will be very much I/O bound. More spindles = more parallel operations = faster under load. Consider adding 15KRPM disks as well, they're not all that much more expensive, and should give you better performance than 10KRPM. Also, make sure you put your WAL disks on a separate RAIDset if possible (not just a separate partition on an existing RAIDset). Finally, if you don't already have it, look for a battery-backed RAID controller that can do writeback caching, and enable that. (Don't even think about enabling it unless it's battery backed!) And add as much RAM as you can to that controller. //Magnus ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Quad processor options
Did you mean to say the trigger-based clustering solution is loading the dual CPUs 60-70% right now? Performance will not be linear with more processors, but it does help with more processes. We haven't benchmarked it, but we haven't had any problems so far in terms of performance either. Price would vary with your relation/yearly purchase, etc, but a 6650 with 2.0GHz/1MB cache/8GB Memory, RAID card, drives, etc, should definitely cost you less than 20K USD. -anjan -Original Message- From: Bjoern Metzdorf [mailto:[EMAIL PROTECTED] Sent: Tue 5/11/2004 4:28 PM To: Anjan Dave Cc: [EMAIL PROTECTED]; Pgsql-Admin (E-mail) Subject: Re: [PERFORM] Quad processor options Anjan Dave wrote: > We use XEON Quads (PowerEdge 6650s) and they work nice, > provided you configure the postgres properly. > Dell is the cheapest quad you can buy i think. > You shouldn't be paying 30K unless you are getting high CPU-cache > on each processor and tons of memory. good to hear, I tried to online configure a quad xeon here at dell germany, but the 6550 is not available for online configuration. at dell usa it works. I will give them a call tomorrow. > I am actually curious, have you researched/attempted any > postgresql clustering solutions? > I agree, you can't just keep buying bigger machines. There are many asynchronous, trigger based solutions out there (eRserver etc..), but what we need is basically a master <-> master setup, which seems not to be available soon for postgresql. Our current dual Xeon runs at 60-70% average cpu load, which is really much. I cannot afford any trigger overhead here. This machine is responsible for over 30M page impressions per month, 50 page impressions per second at peak times. The autovacuum daemon is a god sent gift :) I'm curious how the recently announced mysql cluster will perform, although it is not an option for us. postgresql has far superior functionality. 
> They have 5 internal drives (4 in RAID 10, 1 spare) on U320, > 128MB cache on the PERC controller, 8GB RAM. Could you tell me what you paid approximately for this setup? How does it perform? It certainly won't be twice as fast a as dual xeon, but I remember benchmarking a quad P3 xeon some time ago, and it was disappointingly slow... Regards, Bjoern ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Quad processor options
We use XEON Quads (PowerEdge 6650s) and they work nice, provided you configure the postgres properly. Dell is the cheapest quad you can buy i think. You shouldn't be paying 30K unless you are getting high CPU-cache on each processor and tons of memory. I am actually curious, have you researched/attempted any postgresql clustering solutions? I agree, you can't just keep buying bigger machines. They have 5 internal drives (4 in RAID 10, 1 spare) on U320, 128MB cache on the PERC controller, 8GB RAM. Thanks, Anjan -Original Message- From: Bjoern Metzdorf [mailto:[EMAIL PROTECTED] Sent: Tue 5/11/2004 3:06 PM To: [EMAIL PROTECTED] Cc: Pgsql-Admin (E-mail) Subject: [PERFORM] Quad processor options Hi, I am curious if there are any real life production quad processor setups running postgresql out there. Since postgresql lacks a proper replication/cluster solution, we have to buy a bigger machine. Right now we are running on a dual 2.4 Xeon, 3 GB Ram and U160 SCSI hardware-raid 10. Has anyone experiences with quad Xeon or quad Opteron setups? I am looking at the appropriate boards from Tyan, which would be the only option for us to buy such a beast. The 30k+ setups from Dell etc. don't fit our budget. I am thinking of the following: Quad processor (xeon or opteron) 5 x SCSI 15K RPM for Raid 10 + spare drive 2 x IDE for system ICP-Vortex battery backed U320 Hardware Raid 4-8 GB Ram Would be nice to hear from you. Regards, Bjoern ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Wierd context-switching issue on Xeon
Tested the sql on a Quad 2.0GHz XEON/8GB RAM: During the first run, the CS shot up to more than 100k, and was randomly high/low. A second process made it consistently high, 100k+. A third brought it down to an average 80-90k. A fourth brought it down to an average of 50-60k/s. By cancelling the queries one-by-one, the CS started going up again. 8 logical CPUs in 'top', none of them very busy; load average stood around 2 all the time. Thanks. Anjan -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Tue 4/20/2004 12:59 PM To: Anjan Dave; Dirk Lutzebäck; Tom Lane Cc: [EMAIL PROTECTED]; Neil Conway Subject: Re: [PERFORM] Wierd context-switching issue on Xeon Anjan, > Quad 2.0GHz XEON with highest load we have seen on the applications, DB > performing great - Can you run Tom's test? It takes a particular pattern of data access to reproduce the issue. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

[EMAIL PROTECTED] root]# vmstat 2
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 4 25068 30752 339164 6899660 0 0 1 20 2 0 1 2
1 1 2 25068 21608 339164 6909292 0 0 0 20240 623 31025 12 9 79
2 1 1 25068 24580 339168 6909292 0 0 0 22446 523 824 12 1 87
1 0 0 25068 241244 339168 6691372 0 0 0 498 141 79995 13 6 81
1 0 0 25068 241172 339168 6691372 0 0 0 0 11723 13 2 86
1 0 0 25068 241208 339168 6691372 0 0 068 12432 13 0 88
1 0 1 25068 241208 339168 6691372 0 0 0 0 11923 13 0 88
1 0 0 25068 241208 339168 6691372 0 0 0 0 11423 13 2 86
1 0 0 25068 241208 339168 6691372 0 0 074 132 284 13 0 88
1 0 0 25068 241208 339168 6691372 0 0 0 0 11718 13 2 86
2 0 0 25068 240256 339168 6691376 0 0 082 145 13905 14 2 84
1 0 0 25068 240168 339168 6691380 0 0 0 338 177 4746 13 1 86
1 0 0 25068 240168 339168 6691380 0 0 056 128 221 12 2 86
1 0 0 25068 240180 339168 6691380 0 0 090 131 12633 14 1 85
2 0 1 25068 240140 339168 6691380 0 0 0 104 144 100919 18 6 76
2 0 0 25068 240136 339168 6691380 0 0 0 138 138 106567 18 5 77
2 0 0 25068 240132 339168 6691380 0 0 050 138 108254 16 5 79
2 0 0 25068 240128 339168 6691380 0 0 086 127 102183 16 7 77
1 0 0 25068 240132 339168 6691380 0 0 0 0 119 110382 17 5 78
2 0 0 25068 239980 339168 6691380 0 0 0 0 125 106970 18 4 78
2 0 0 25068 239972 339168 6691380 0 0 0 136 140 103389 17 7 76
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 25068 240008 339168 6691380 0 0 082 134 107627 19 4 77
2 0 0 25068 240012 339168 6691380 0 0 090 128 94183 16 9 75
2 0 0 25068 213520 339168 6715988 0 0 0 114 156 82781 16 7 78
2 0 1 25068 120356 339168 6803692 0 0 0 30790 522 31866 15 10 76
1 1 3 25068 55384 339168 6870940 0 0 0 21904 466 25549 15 11 73
1 1 2 25068 22804 339168 6903996 0 0 0 21786 538 29445 13 7 80
1 1 1 25068 22284 339168 6905036 0 0 0 20678 634 3428 12 1 87
2 0 0 25068 26232 339168 6906028 0 0 0 12054 332 3577 12 3 84

2 Processes running -
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 25068 244412 339192 6691392 0 0 066 150 144059 14 7 79
2 0 1 25068 244368 339196 6691388 0 0 0 134 123 147517 16 7 77
2 0 0 25068 244356 339196 6691388 0 0 0 0 119 134576 16 8 76
2 0 0 25068 244340 339196 6691388 0 0 092 143 103336 17 4 79
2 0 0 25068 244172 339196 6691388 0 0 0 156 158 105336 18 6 75
2 0 0 25068 244104 339196 6691388 0 0 0 0 118 105222 18 5 77
2
Re: [PERFORM] Wierd context-switching issue on Xeon
If this helps - Quad 2.0GHz XEON with highest load we have seen on the applications, DB performing great - procs memory swap io system cpu r b w swpd free buff cache si sobibo incs us sy id 1 0 0 1616 351820 66144 1081370400 2 01 1 0 2 7 3 0 0 1616 349712 66144 1081373600 8 1634 1362 4650 4 2 95 0 0 0 1616 347768 66144 1081412000 188 1218 1158 4203 5 1 93 0 0 1 1616 346596 66164 1081418400 8 1972 1394 4773 4 1 94 2 0 1 1616 345424 66164 108142720020 1392 1184 4197 4 2 94 Around 4k CS/sec Chipset is Intel ServerWorks GC-HE. Linux Kernel 2.4.20-28.9bigmem #1 SMP Thanks, Anjan -Original Message- From: Dirk Lutzebäck [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 20, 2004 10:29 AM To: Tom Lane; Josh Berkus Cc: [EMAIL PROTECTED]; Neil Conway Subject: Re: [PERFORM] Wierd context-switching issue on Xeon Dirk Lutzebaeck wrote: > c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro) > > performs well and I could not observe context switch peaks here (one > user active), almost no extra semop calls Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to 300k CS. Bummer.. Josh, I don't think you can bash the ServerWorks chipset here nor bigmem. Dirk ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Moving postgres to FC disks
I am planning to move the pg databases from the internal RAID to external Fiber Channel over SAN. Question is:
-With the db size being as big as, say, 30+GB, how do I move it onto the new logical drive? (stop postgresql, and simply move it over somehow and make a link?)
-Currently, the internal RAID volume is an ext3 filesystem. Any recommendations for the filesystem on the new FC volume? Reiserfs?
DBs are 7.4.1 (RH9) and 7.2.3 (RH8). Appreciate any pointers. Thanks, Anjan
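For the first question, the "stop postgresql, move it, make a link" approach hinted at above can be rehearsed safely first. A dry run against scratch directories (all paths below are made up; nothing here touches a real cluster, and with the real one the postmaster must be stopped before the mv):

```shell
# Dry run of "stop postgres, move the directory, leave a symlink behind",
# using scratch paths only.
rm -rf /tmp/pgmove_demo
mkdir -p /tmp/pgmove_demo/old_data /tmp/pgmove_demo/new_lun
echo hello > /tmp/pgmove_demo/old_data/somefile
mv /tmp/pgmove_demo/old_data /tmp/pgmove_demo/new_lun/data
ln -s /tmp/pgmove_demo/new_lun/data /tmp/pgmove_demo/old_data
readlink /tmp/pgmove_demo/old_data        # -> /tmp/pgmove_demo/new_lun/data
cat /tmp/pgmove_demo/old_data/somefile    # -> hello (still reachable via the link)
```

The same pattern applies to moving just pg_xlog to its own volume: with the postmaster stopped, mv the folder to the new filesystem and symlink it back into the data directory.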
Re: [PERFORM] Wierd context-switching issue on Xeon
What about quad-XEON setups? Could that be worse? (have dual, and quad setups both) Shall we re-consider XEON-MP CPU machines with high cache (4MB+)? Very generally, what number would be considered high, especially, if it coincides with expected heavy load? Not sure a specific chipset was mentioned... Thanks, Anjan -Original Message- From: Greg Stark [mailto:[EMAIL PROTECTED] Sent: Sun 4/18/2004 8:40 PM To: Tom Lane Cc: [EMAIL PROTECTED]; Josh Berkus; [EMAIL PROTECTED]; Neil Conway Subject: Re: [PERFORM] Wierd context-switching issue on Xeon Tom Lane <[EMAIL PROTECTED]> writes: > So in the short term I think we have to tell people that Xeon MP is not > the most desirable SMP platform to run Postgres on. (Josh thinks that > the specific motherboard chipset being used in these machines might > share some of the blame too. I don't have any evidence for or against > that idea, but it's certainly possible.) > > In the long run, however, CPUs continue to get faster than main memory > and the price of cache contention will continue to rise. So it seems > that we need to give up the assumption that SpinLockAcquire is a cheap > operation. In the presence of heavy contention it won't be. There's nothing about the way Postgres spinlocks are coded that affects this? Is it something the kernel could help with? I've been wondering whether there's any benefits postgres is missing out on by using its own hand-rolled locking instead of using the pthreads infrastructure that the kernel is often involved in. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
What bus speeds? 533MHz on the 32-bit Intel will give you about 4.2Gbps of IO throughput... I think the Sun will be 150MHz, 64bit is 2.4Gbps of IO. Correct me if I am wrong. Thanks, Anjan -Original Message- From: Subbiah, Stalin [mailto:[EMAIL PROTECTED] Sent: Tue 3/23/2004 1:40 PM To: 'Andrew Sullivan'; '[EMAIL PROTECTED]' Cc: Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux We are looking into Sun V210 (2 x 1 GHz cpu, 2 gig ram, 5.8 OS) vs. Dell 1750 (2 x 2.4 GHz xeon, 2 gig ram, RH3.0). The database will mostly be write intensive and disks will be on raid 10. Wondering if 64bit 1 GHz to 32bit 2.4 GHz makes a big difference here. Thanks! -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Andrew Sullivan Sent: Tuesday, March 23, 2004 9:37 AM To: '[EMAIL PROTECTED]' Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux On Mon, Mar 22, 2004 at 04:05:45PM -0800, Subbiah, Stalin wrote: > being the key performance booster for postgres. what is the preferred OS > for postgres deployment if given an option between linux and solaris. As One thing this very much depends on is what you're trying to do. Suns have a reputation for greater reliability. While my own experience with Sun hardware has been rather shy of sterling, I _can_ say that it stands head and shoulders above a lot of the x86 gear you can get. If you're planning to use Solaris on x86, don't bother. Solaris is a slow, bloated pig compared to Linux, at least when it comes to managing the largish number of processes that Postgres requires. If pure speed is what you're after, I have found that 2-way, 32 bit Linux on P-IIIs compares very favourably to 4 way 64 bit Ultra SPARC IIs. A -- Andrew Sullivan | [EMAIL PROTECTED] The fact that technology doesn't work is no bar to success in the marketplace. 
--Philip Greenspun ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
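The bus-throughput figures traded above follow from clock rate times bus width. A rough sketch of the arithmetic (the "4.2" figure is really GB/s rather than Gbps for a 64-bit front-side bus at 533 MHz; a 64-bit bus at 150 MHz only reaches 1.2 GB/s, so the 2.4 figure implies a 128-bit or double-pumped path):

```shell
# Peak bandwidth in MB/s = clock (MHz) * bus width (bytes).
echo $(( 533 * 8 ))    # 64-bit @ 533 MHz  -> 4264 MB/s (~4.2 GB/s)
echo $(( 150 * 8 ))    # 64-bit @ 150 MHz  -> 1200 MB/s
echo $(( 150 * 16 ))   # 128-bit @ 150 MHz -> 2400 MB/s
```

These are theoretical peaks; sustained throughput on a shared front-side bus is lower, especially with multiple CPUs contending for it.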
Re: [PERFORM] Scaling further up
Great response, thanks. Regarding 12GB memory and a 13G db, and almost no I/O, one thing I don't understand is that even though the OS caches most of the memory and PG can use it if it needs it, why would the system swap (not much, only during peak times)? The SHMMAX is set to 512MB, shared_buffers is 150MB, effective cache size is 2GB, sort mem is 2MB, the rest are default values. It also happens that a large query (reporting type) can hold up the other queries, and the load averages shoot up during peak times. Regarding a baseline -
-We have docs and monitor for frequency of sql statements, most expensive ones, etc. (IronEye)
-I am monitoring disk reads/writes using iostat
-How do I measure commit frequency, and system events like checkpoints? (vacuum is done nightly during less or no load)
Thanks, Anjan -Original Message- From: Aaron W [mailto:[EMAIL PROTECTED] Sent: Thursday, March 04, 2004 8:58 AM To: [EMAIL PROTECTED]; Anjan Dave Subject: Re: Scaling further up I'd look at adding more disks first. Depending on what type of query load you get, that box sounds like it will be very much I/O bound. Given a 13G database on a 12G system, with a low growth rate, it is likely that there is almost no I/O for most activities. The exception is checkpointing. The first thing I'd do is try to build a spreadsheet model of:
- select frequency, and # logical and physical reads involved
- insert/delete/update frequency, and # logical and physical reads and writes involved
- commit frequency, etc. (start out with simplistic assumptions, and do it for peak load)
- system events (checkpoints, vacuum)
I assume that the only high I/O you will see will be for logging. The RAID issue there is basically obviated by the sequential write nature of WAL. If that is the case, EMC is not the most cost effective or obvious solution - since the value they provide is mostly manageability for disaster recovery. The goal in this case is to write at the application max speed, and with minimal latency. 
Any responsible battery-backed, write-through (mirrored) cached controller can do that for you. On the other hand, if your requests are not *all* trivial, you are going to test the hardware and scheduling algorithms of OS and pg. Even if 0.1% of 3,000 tps take a second, that ends up generating 3 seconds of load. Any, even slightly, slow transactions will generate enormous queues which slow down everything. In most systems of this volume I've seen, the mix of activities is constantly invalidating cache, making L2 caching less important. Memory to CPU bus speed is a limiting factor, as well as raw CPU speed in processing the requests. Xeon is not a great architecture for this because of FSB contention; I suspect a 4-way will be completely FSB bottlenecked so a more than 4 way would likely not change performance. I would try to get a simple model/benchmark going and test against it. You should be talking to the big iron vendors for their take on your issues and get their capacity benchmarks. __ Do you Yahoo!? Yahoo! Search - Find what you're looking for faster http://search.yahoo.com ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
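Aaron's "3 seconds of load" remark is plain queueing arithmetic: the extra load contributed per second of wall time is tps times the slow fraction times the seconds each slow transaction takes. A one-line sketch:

```shell
# Load added per wall second = tps * slow_fraction * seconds_per_slow_txn.
# 0.1% of 3,000 tps taking one second each:
awk 'BEGIN { printf "%.0f\n", 3000 * 0.001 * 1 }'   # -> 3
```

That is why even a tiny tail of slow transactions matters at this volume: three seconds of work arriving every second builds a queue that never drains.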
Re: [PERFORM] Scaling further up
Here's what I recorded today from iostat (linux, iostat -x -k, sda3 is the pg slice, logs included) during peak time on the RAID-10 array - what I see is mostly writes, and sometimes quite a bit of writing, during which the average wait times shoot up.

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/sda3 18.81 113.21 3.90 36.33 181.54 1207.7590.77 603.88 34.54 0.490.73 0.22 0.87
/dev/sda3 0.00 208.00 0.00 150.000.00 2884.00 0.00 1442.00 19.23 0.750.50 0.33 5.00
/dev/sda3 0.00 239.00 0.00 169.000.00 3264.00 0.00 1632.00 19.31 2.151.27 0.33 5.50
/dev/sda3 0.00 224.50 0.00 158.000.00 3060.00 0.00 1530.00 19.37 1.901.20 0.28 4.50
/dev/sda3 0.00 157.00 0.00 117.000.00 2192.00 0.00 1096.00 18.74 0.400.34 0.30 3.50
/dev/sda3 0.00 249.50 0.00 179.000.00 3596.00 0.00 1798.00 20.0921.40 10.78 0.39 7.00
/dev/sda3 0.00 637.50 0.00 620.500.00 9936.00 0.00 4968.00 16.01 1137.15 183.55 1.85 115.00
/dev/sda3 0.00 690.00 0.00 548.500.00 9924.00 0.00 4962.00 18.0943.107.82 0.46 25.50
/dev/sda3 0.00 485.00 0.00 392.000.00 7028.00 0.00 3514.00 17.9386.90 22.21 1.14 44.50
/dev/sda3 0.00 312.50 0.00 206.500.00 4156.00 0.00 2078.00 20.13 3.501.69 0.53 11.00
/dev/sda3 0.00 386.50 0.00 275.500.00 5336.00 0.00 2668.00 19.3716.806.10 0.60 16.50
/dev/sda3 0.00 259.00 0.00 176.500.00 3492.00 0.00 1746.00 19.78 3.251.84 0.40 7.00
/dev/sda3 0.00 196.00 0.00 99.000.00 2360.00 0.00 1180.00 23.84 0.100.10 0.10 1.00
/dev/sda3 0.00 147.00 0.00 100.000.00 1976.00 0.00 988.00 19.76 0.500.50 0.45 4.50
/dev/sda3 0.00 126.50 0.00 94.500.00 1768.00 0.00 884.00 18.71 0.200.21 0.21 2.00
/dev/sda3 0.00 133.50 0.00 106.500.00 1920.00 0.00 960.00 18.03 0.500.47 0.47 5.00
/dev/sda3 0.00 146.50 0.00 118.000.00 2116.00 0.00 1058.00 17.93 0.200.21 0.17 2.00
/dev/sda3 0.00 156.00 0.00 128.500.00 2276.00 0.00 1138.00 17.71 0.350.27 0.27 3.50
/dev/sda3 0.00 145.00 0.00 105.000.00 2000.00 0.00 1000.00 19.05 0.250.24 0.24 2.50
/dev/sda3 0.00 72.96 0.00 54.510.00 1019.74 0.00 509.87 18.71 0.170.31 0.31 1.72
/dev/sda3 0.00 168.50 0.00 139.500.00 2464.00 0.00 1232.00 17.66 0.650.47 0.39 5.50
/dev/sda3 0.00 130.50 0.00 100.000.00 1844.00 0.00 922.00 18.44 0.000.00 0.00 0.00
/dev/sda3 0.00 122.00 0.00 101.000.00 1784.00 0.00 892.00 17.66 0.250.25 0.25 2.50
/dev/sda3 0.00 143.00 0.00 121.500.00 2116.00 0.00 1058.00 17.42 0.250.21 0.21 2.50
/dev/sda3 0.00 134.50 0.00 96.500.00 1848.00 0.00 924.00 19.15 0.350.36 0.36 3.50
/dev/sda3 0.00 153.50 0.00 115.000.00 2148.00 0.00 1074.00 18.68 0.350.30 0.30 3.50
/dev/sda3 0.00 101.50 0.00 80.000.00 1452.00 0.00 726.00 18.15 0.200.25 0.25 2.00
/dev/sda3 0.00 108.50 0.00 92.500.00 1608.00 0.00 804.00 17.38 0.250.27 0.27 2.50
/dev/sda3 0.00 179.00 0.00 132.500.00 2492.00 0.00 1246.00 18.81 0.550.42 0.42 5.50
/dev/sda3 1.00 113.00 1.00 83.00 16.00 1568.00 8.00 784.00 18.86 0.150.18 0.12 1.00
/dev/sda3 0.00 117.00 0.00 97.500.00 1716.00 0.00 858.00 17.60 0.200.21 0.21 2.00
/dev/sda3 0.00 541.00 0.00 415.500.00 7696.00 0.00 3848.00 18.52 146.50 35.09 1.37 57.00
/dev/sda3 0.00 535.00 0.00 392.500.00 7404.00 0.00 3702.00 18.86 123.70 31.67 1.31 51.50
/dev/sda3 0.00 993.50 0.00 697.500.00 13544.00 0.00 6772.00 19.42 174.25 24.98 1.25 87.00
/dev/sda3 0.00 245.00 0.00 108.500.00 2832.00 0.00 1416.00 26.10 0.550.51 0.51 5.50

-Original Message- From: scott.marlowe [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 02, 2004 4:16 PM To: Anjan Dave Cc: [EMAIL PROTECTED]; William Yu; [EMAIL PROTECTED] Subject: Re: [PERFORM] Scaling further up On Tue, 2 Mar 2004, Anjan Dave wrote: > "By lots I mean dozen(s) in a raid 10 array with a good controller." > > I believe, for RAID-10, I will need even number of drives. Correct. > Currently, > the size of the database is about 13GB, and is not expected to grow > exponentially with thousands of concurrent users, so total space is > not of paramount importance compared to performance. > > Does this sound reasonable setup? 
> 10x36GB FC drives on RAID-10 > 4x36GB FC drives for the logs on RAID-10 (not sure if this is the > correct
Re: [PERFORM] Scaling further up
That was part of my original question - whether it makes sense to go for a mid-range SunFire machine (64bit HW, 64bit OS), which is scalable to high amounts of memory, and shouldn't have any issues addressing it all. I've had that kind of setup once temporarily on a V480 (quad UltraSparc, 16GB RAM) machine, and it did well in production use. Without having the time/resources to do extensive testing, I am not sure if Postgres/Solaris9 is really suggested by the community for high-performance, as opposed to a XEON/Linux setup. Storage being a separate discussion. Thanks, Anjan -Original Message- From: Chris Ruprecht [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 02, 2004 4:17 PM To: Anjan Dave; [EMAIL PROTECTED]; William Yu Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Scaling further up Hi all, If you have a DB of 'only' 13 GB and you do not expect it to grow much, it might be advisable to have enough memory (RAM) to hold the entire DB in shared memory (everything is cached). If you have a server with say 24 GB or memory and can allocate 20 GB for cache, you don't care about the speed of disks any more - all you worry about is the speed of your memory and your network connection. I believe, this not possible using 32-bit technology, you would have to go to some 64-bit platform, but if it's speed you want ... You can also try solid state hard disk drives. These are actually just meory, there are no moving parts, but the look and behave like very very fast disk drives. I have seen them at capacities of 73 GB - but they didn't mention the price (I'd probably have a heart attack when I look at the price tag). Best regards, Chris On Tuesday 02 March 2004 14:41, Anjan Dave wrote: > "By lots I mean dozen(s) in a raid 10 array with a good controller." > > I believe, for RAID-10, I will need even number of drives. 
Currently, > the size of the database is about 13GB, and is not expected to grow > exponentially with thousands of concurrent users, so total space is > not of paramount importance compared to performance. > > Does this sound reasonable setup? > 10x36GB FC drives on RAID-10 > 4x36GB FC drives for the logs on RAID-10 (not sure if this is the > correct ratio)? 1 hotspare > Total=15 Drives per enclosure. > > Tentatively, I am looking at an entry-level EMC CX300 product with 2GB > RAID cache, etc. > > Question - Are 73GB drives supposed to give better performance because > of higher number of platters? > > Thanks, > Anjan > > > -Original Message- > From: Fred Moyer [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 02, 2004 5:57 AM > To: William Yu; Anjan Dave > Cc: [EMAIL PROTECTED] > Subject: Re: [PERFORM] Scaling further up > > On Tue, 2004-03-02 at 17:42, William Yu wrote: > > Anjan Dave wrote: > > > We have a Quad-Intel XEON 2.0GHz (1MB cache), 12GB memory, running > > > RH9, PG 7.4.0. There's an internal U320, 10K RPM RAID-10 setup on > > > 4 > > drives. > > > > We are expecting a pretty high load, a few thousands of > > > 'concurrent' users executing either select, insert, update, > > > statements. > > > > The quick and dirty method would be to upgrade to the recently > > announced 3GHz Xeon MPs with 4MB of L3. My semi-educated guess is > > that you'd get > > > > another +60% there due to the huge L3 hiding the Xeon's shared bus > > penalty. > > If you are going to have thousands of 'concurrent' users you should > seriously consider the 2.6 kernel if you are running Linux or as an > alternative going with FreeBSD. You will need to load test your > system and become an expert on tuning Postgres to get the absolute > maximum performance from each and every query you have. > > And you will need lots of hard drives. By lots I mean dozen(s) in a > raid 10 array with a good controller. 
Thousands of concurrent users > means hundreds or thousands of transactions per second. I've > personally seen it scale that far but in my opinion you will need a > lot more hard drives and ram than cpu. > > > ---(end of > broadcast)--- > TIP 7: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
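A quick arithmetic check of the drive layout proposed in this thread and of the "hundreds or thousands of transactions per second" claim. The per-spindle IOPS figure below is an assumed rule of thumb for 10K RPM disks, not a measured number:

```python
# Sanity-check the proposed enclosure layout and estimate its rough
# random-write capacity. ~125 IOPS per 10K RPM spindle is an assumption.
data_drives = 10   # RAID-10 array for data
log_drives = 4     # RAID-10 array for pg_xlog
hot_spares = 1

total = data_drives + log_drives + hot_spares
assert total == 15                                     # "Total=15 Drives per enclosure"
assert data_drives % 2 == 0 and log_drives % 2 == 0    # RAID-10 needs an even count

iops_per_spindle = 125                    # assumed for a 10K RPM disk
# RAID-10 mirrors every write, halving effective write IOPS
write_iops = data_drives * iops_per_spindle / 2
print(write_iops)  # roughly 625 random writes/sec from the data array
```

Even with optimistic per-spindle numbers, sustaining thousands of write transactions per second would need either many more spindles or a large write-back cache, which matches the advice in the thread.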
Re: [PERFORM] Scaling further up
"By lots I mean dozen(s) in a raid 10 array with a good controller." I believe, for RAID-10, I will need an even number of drives. Currently, the size of the database is about 13GB, and is not expected to grow exponentially with thousands of concurrent users, so total space is not of paramount importance compared to performance. Does this sound like a reasonable setup? 10x36GB FC drives on RAID-10 4x36GB FC drives for the logs on RAID-10 (not sure if this is the correct ratio)? 1 hotspare Total=15 Drives per enclosure. Tentatively, I am looking at an entry-level EMC CX300 product with 2GB RAID cache, etc. Question - Are 73GB drives supposed to give better performance because of a higher number of platters? Thanks, Anjan -Original Message- From: Fred Moyer [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 02, 2004 5:57 AM To: William Yu; Anjan Dave Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Scaling further up On Tue, 2004-03-02 at 17:42, William Yu wrote: > Anjan Dave wrote: > > We have a Quad-Intel XEON 2.0GHz (1MB cache), 12GB memory, running > > RH9, > > PG 7.4.0. There's an internal U320, 10K RPM RAID-10 setup on 4 drives. > > > > We are expecting a pretty high load, a few thousands of 'concurrent' > > users executing either select, insert, update, statements. > > The quick and dirty method would be to upgrade to the recently > announced > 3GHz Xeon MPs with 4MB of L3. My semi-educated guess is that you'd get > another +60% there due to the huge L3 hiding the Xeon's shared bus penalty. If you are going to have thousands of 'concurrent' users you should seriously consider the 2.6 kernel if you are running Linux or as an alternative going with FreeBSD. You will need to load test your system and become an expert on tuning Postgres to get the absolute maximum performance from each and every query you have. And you will need lots of hard drives. By lots I mean dozen(s) in a raid 10 array with a good controller. 
Thousands of concurrent users means hundreds or thousands of transactions per second. I've personally seen it scale that far but in my opinion you will need a lot more hard drives and ram than cpu. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Scaling further up
All: We have a Quad-Intel XEON 2.0GHz (1MB cache), 12GB memory, running RH9, PG 7.4.0. There's an internal U320, 10K RPM RAID-10 setup on 4 drives. We are expecting a pretty high load, a few thousands of 'concurrent' users executing either select, insert, update, statements. What is the next step up in terms of handling very heavy loads? Clustering? Are there any standard, recommended clustering options? How about this? http://c-jdbc.objectweb.org Also, in terms of hardware, overall, what benefits more, a SunFire 880 (6 or 8 CPUs, lots of RAM, internal FC Drives) type of machine, or an IA-64 architecture? Appreciate any inputs, Thanks, Anjan ** This e-mail and any files transmitted with it are intended for the use of the addressee(s) only and may be confidential and covered by the attorney/client and other privileges. If you received this e-mail in error, please notify the sender; do not disclose, copy, distribute, or take any action in reliance on the contents of this information; and delete it from your system. Any other use of this e-mail is prohibited.
[PERFORM] Postgresql on SAN
Hello, Has anyone designed/implemented postgresql server on storage networks? Are there any design considerations? Are there any benchmarks for storage products (HBAs, Switches, Storage Arrays)? Any recommendation on the design, resources, references, keeping PG in mind? Thanks, Anjan
[PERFORM] Compile Vs RPMs
Hello, I would like to know whether there are any significant performance advantages of compiling (say, 7.4) on your platform (being RH7.3, 8, and 9.0, and Fedora especially) versus getting the relevant binaries (rpm) from the postgresql site? Hardware is Intel XEON (various speeds, up to 2.8GHz, single/dual/quad configuration). Thank you, Anjan
Re: [PERFORM] shared_buffer value
Sorry I wasn't clear. We do have nightly vacuum crons defined on all pg servers. Apparently, this one had been taking many hours to finish recently, and we did an additional vacuum during day time when there was low volume, which finished quickly. The platform I mentioned is RedHat 9, PG7.4, on Dell PowerEdge2650. Here's the output of 'top' below, taken just now. I can capture the stats during peak time in the afternoon also:

68 processes: 67 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 3.1% user 4.4% system 0.0% nice 0.0% iowait 92.0% idle
CPU1 states: 0.0% user 3.2% system 0.0% nice 0.0% iowait 96.3% idle
CPU2 states: 0.4% user 0.3% system 0.0% nice 0.0% iowait 98.3% idle
CPU3 states: 0.3% user 1.0% system 0.0% nice 0.0% iowait 98.2% idle
Mem: 3874188k av, 3622296k used, 251892k free, 0k shrd, 322372k buff
     2369836k actv, 454984k in_d, 44568k in_c
Swap: 4096532k av, 24552k used, 4071980k free 2993384k cached

PID   USER     PRI NI SIZE  RSS   SHARE STAT %CPU %MEM TIME  CPU COMMAND
4258  postgres 16  0  88180 86M   85796 S    2.1  2.2  14:55 0   postmaster
5260  postgres 15  0  85844 83M   84704 S    0.0  2.2  2:51  1   postmaster
14068 root     23  0  69240 67M   2164  S    3.9  1.7  59:44 2   wish
3157  postgres 15  0  50364 49M   48484 S    0.0  1.2  0:02  3   postmaster
2174  postgres 15  0  50196 48M   48380 S    0.1  1.2  0:00  0   postmaster
3228  postgres 15  0  49292 48M   47536 S    0.0  1.2  0:00  3   postmaster
3050  postgres 15  0  49184 47M   47364 S    0.5  1.2  0:00  2   postmaster
2725  postgres 15  0  7788  7688  6248  S    0.0  0.1  0:00  3   postmaster
3600  postgres 16  0  5812  5700  4784  S    0.0  0.1  0:00  3   postmaster
1342  gdm      15  0  12988 5560  2056  S    0.0  0.1  19:36 3   gdmgreeter

According to top's man, RSS is: The total amount of physical memory used by the task, in kilo-bytes... SHARE is: The amount of shared memory used by the task is shown... SIZE is: The size of the task's code plus data plus stack space, in kilo-bytes... 
I am not sure how to calculate whether 80MB shared_buffer (in postgresql.conf) should be increased or decreased from the above values, because during higher loads, the number of postmaster instances goes up to 100 (limited by max connections), each at an RSS of about 87MB... Thanks, Anjan -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Thursday, January 15, 2004 7:52 PM To: Anjan Dave Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] shared_buffer value "Anjan Dave" <[EMAIL PROTECTED]> writes: > Question is, does the 80MB buffer allocation correspond to ~87MB per > postmaster instance? (with about 100 instances of postmaster, that > will be about 100 x 80MB = 8GB??) Most likely, top is counting some portion of the shared memory block against each backend process. This behavior is platform-specific, however, and you did not tell us what platform you're on. > Interestingly, at one point, we vacuumed the database, and the size > reported by 'df -k' on the pgsql slice dropped very > significantly...guess, it had been using a lot of temp files? "At one point"? If your setup doesn't include *routine* vacuuming, you are going to have problems with file bloat. This isn't something you can do just when you happen to remember it --- it needs to be driven off a cron job or some such. Or use the contrib autovacuum daemon. You want to vacuum often enough to keep the database size more or less constant. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
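Tom's point about top counting the shared block against every backend can be sketched numerically. The figures below are rounded from the top output quoted earlier in this thread (RSS ~87MB and SHARE ~84MB per postmaster):

```python
# Each backend's RSS includes the whole shared_buffers segment, so
# summing RSS over 100 backends wildly overcounts real memory use.
shared_mb = 80.0     # shared_buffers segment, resident once
rss_mb = 87.0        # per-backend RSS as reported by top
share_mb = 84.0      # per-backend SHARE as reported by top (approx.)

private_mb = rss_mb - share_mb             # backend-private memory, ~3MB
naive_total = 100 * rss_mb                 # the mistaken "100 x 87MB" reading
real_total = shared_mb + 100 * private_mb  # shared counted once + private parts
print(naive_total, real_total)             # 8700.0 vs 380.0 MB
```

So the box is nowhere near needing 8GB for 100 backends; the shared segment is counted once, plus a few MB of private memory per connection.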
[PERFORM] shared_buffer value
Gurus, I have defined the following values on a db:

shared_buffers = 10240          # 10240 = 80MB
max_connections = 100
sort_mem = 1024                 # 1024KB is 1MB per operation
effective_cache_size = 262144   # equals to 2GB for 8k pages

Rest of the values are unchanged from default. The PowerEdge 2650 machine has 4GB RAM, and the size of the database (size of 'data' folder) is about 5GB. PG is 7.4, RH9. The machine has been getting quite busy (when, say, 50 students login at the same time, when others have logged in already) and is maxing out at 100 connections (will increase this tonight probably to 200). We have been getting "too many clients" message upon trying to connect. Once connected, the pgmonitor, and the 'pg_stat_activity' show connections reaching about 100. There's a series of SELECT and UPDATE statements that get called for when a group of users log in simultaneously...and for some reason, many of them stay there for a while... During that time, if I do a 'top', I can see multiple postmaster processes, each about 87MB in size. The Memory utilization drops down to about 30MB free, and I can see a little bit of swap utilization in vmstat then. Question is, does the 80MB buffer allocation correspond to ~87MB per postmaster instance? (with about 100 instances of postmaster, that will be about 100 x 80MB = 8GB??) Should I decrease the buffer value to about 50MB and monitor? Interestingly, at one point, we vacuumed the database, and the size reported by 'df -k' on the pgsql slice dropped very significantly...guess, it had been using a lot of temp files? Further steps will be to add more memory, and possibly drop/recreate a couple of indexes that are used in the UPDATE statements. Thanks in advance for any inputs. -Anjan
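The arithmetic in the comments of the conf fragment above can be verified directly (shared_buffers and effective_cache_size are counted in 8kB pages in this release; sort_mem is in kB):

```python
# Check the comments in the postgresql.conf fragment above.
PAGE_KB = 8  # PostgreSQL buffer page size in kB

assert 10240 * PAGE_KB // 1024 == 80           # shared_buffers -> 80 MB
assert 262144 * PAGE_KB // (1024 * 1024) == 2  # effective_cache_size -> 2 GB
assert 1024 // 1024 == 1                       # sort_mem -> 1 MB per sort operation
print("conf comments check out")
```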
Re: [PERFORM] Server Configs
Dear Gurus, We are planning to add more db server hardware for the apps. The question is, what makes more sense regarding performance/scalability/price of the hardware... There are a couple of apps, currently on a dual-cpu Dell server. The usage of the apps is going to increase quite a lot, and considering the prices, we are looking at the following options:

Option 1: Have each app on a separate db server (looking at 4 of these). The server being a PowerEdge 2650, Dual 2.8GHz/512KB XEONS, 2GB RAM, PERC-3 RAID-5, split back plane (2+3), and 5 x 36GB HDDs (10K RPM). Note: These servers are 1/3 the price of the Quad-cpu 6650 server.

Option 2: Have two to three app databases hosted on a single server. The server being a PowerEdge 6650, 4 x 2GHz/1MB XEONS, 8GB RAM, PERC-3 RAID-5, split back plane (2+3), and 5 x 36GB HDDs (10K RPM). Note: This server is 3 times the price of option 1.

Appreciate your guidance. Thanks, Anjan ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] PostgreSQL data on a NAS device ?
Just an interesting comparison: I don't have the specifics, but a Dell 2 x 2.4GHZ/512KB L3 / 2GB RAM machine timed a query much faster than an older Sun E4000 with 6 x ~300MHZ CPUs / 2GB RAM. One on RH(8 or 9, don't remember) and one on Solaris 9. -anjan -Original Message- From: William Yu [mailto:[EMAIL PROTECTED] Sent: Tue 10/21/2003 12:12 PM To: [EMAIL PROTECTED] Cc: Subject: Re: [PERFORM] PostgreSQL data on a NAS device ? > I have never worked with a XEON CPU before. Does anyone know how it performs > running PostgreSQL 7.3.4 / 7.4 on RedHat 9 ? Is it faster than a Pentium 4? > I believe the main difference is cache memory, right? Aside from cache mem, > it's basically a Pentium 4, or am I wrong? Well, see the problem is of course, there's so many flavors of P4s and Xeons that it's hard to tell which is faster unless you specify the exact model. And even then, it would depend on the workload. Would a Xeon/3GHz/2MB L3/400FSB be faster than a P4C/3GHz/800FSB? No idea as no one has complete number breakdowns on these comparisons. Oh yeah, you could get a big round number that says on SPEC or something one CPU is faster than the other but whether that's faster for Postgres and your PG app is a totally different story. That in mind, I wouldn't worry about it. The CPU is probably plenty fast for what you need to do. I'd look into two things in the server: memory and CPU expandability. I know you already plan on 4GB but you may need even more in the future. Few things can dramatically improve performance more than moving disk access to disk cache. And if there's a 2nd socket where you can pop another CPU in, that would leave you extra room if your server becomes CPU limited. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Tuning for mid-size server
Hopefully, I am not steering this into a different direction, but is there a way to find out how much sort memory each query is taking up, so that we can scale that up with increasing users? Thanks, Anjan -Original Message- From: scott.marlowe [mailto:[EMAIL PROTECTED] Sent: Tue 10/21/2003 1:33 PM To: Josh Berkus Cc: Anjan Dave; Richard Huxton; [EMAIL PROTECTED] Subject: Re: [PERFORM] Tuning for mid-size server On Tue, 21 Oct 2003, Josh Berkus wrote: > Anjan, > > > From what I know, there is a cache-row-set functionality that doesn't > > exist with the newer postgres... > > What? PostgreSQL has always used the kernel cache for queries. > > > Concurrent users will start from 1 to a high of 5000 or more, and could > > ramp up rapidly. So far, with increased users, we have gone up to > > starting the JVM (resin startup) with 1024megs min and max (recommended > > by Sun) - on the app side. > > Well, just keep in mind when tuning that your calculations should be based on > *available* RAM, meaning RAM not used by Apache or the JVM. > > With that many concurrent requests, you'll want to be *very* conservative with > sort_mem; I might stick to the default of 1024 if I were you, or even lower > it to 512k. Exactly. Remember, Anjan, that if you have a single sort that can't fit in RAM, it will use the hard drive for temp space, effectively "swapping" on its own. If the concurrent sorts run the server out of memory, the server will start swapping processes, quite possibly the sorts, in a sort of hideous round robin death spiral that will bring your machine to its knees at the worst possible time, midday under load. sort_mem is one of the small "foot guns" in the postgresql.conf file that people tend to pick up and go "huh, what's this do?" right before cranking it up. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
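The "round robin death spiral" warning above comes down to simple multiplication: sort_mem is allocated per sort operation, so worst-case sort memory scales with the number of concurrent sorts. An illustrative sketch (the client counts come from the thread; the one-sort-per-client assumption is mine):

```python
# Worst-case sort memory if every client runs one sort at once.
sort_mem_kb = 1024  # the conservative default discussed in this thread

for clients in (100, 1000, 5000):
    worst_case_mb = clients * sort_mem_kb / 1024
    print(clients, worst_case_mb)
# At 5000 clients the worst case is ~5 GB of sort memory alone, more
# than the 4 GB in the server -- hence the advice to keep sort_mem low.
```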
Re: [PERFORM] Tuning for mid-size server
Josh, The app servers are separate dual-cpu boxes with 2GB RAM on each. Yes, from all the responses I have seen, I will be reducing the numbers to what has been suggested. Thanks to all, Anjan -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Tue 10/21/2003 1:22 PM To: Anjan Dave; Richard Huxton; [EMAIL PROTECTED] Cc: Subject: Re: [PERFORM] Tuning for mid-size server Anjan, > From what I know, there is a cache-row-set functionality that doesn't > exist with the newer postgres... What? PostgreSQL has always used the kernel cache for queries. > Concurrent users will start from 1 to a high of 5000 or more, and could > ramp up rapidly. So far, with increased users, we have gone up to > starting the JVM (resin startup) with 1024megs min and max (recommended > by Sun) - on the app side. Well, just keep in mind when tuning that your calculations should be based on *available* RAM, meaning RAM not used by Apache or the JVM. With that many concurrent requests, you'll want to be *very* conservative with sort_mem; I might stick to the default of 1024 if I were you, or even lower it to 512k. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Tuning for mid-size server
Josh, The 6650 can have up to 32GB of RAM. There are 5 drives. In future, they will be replaced by a fiber array - hopefully. I read an article that suggests you 'start' with 25% of memory for shared_buffers. Sort memory was suggested to be at 2-4%. Here's the link: http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html Maybe, I misinterpreted it. I read the document on http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html and the suggested values are much lower than what I have mentioned here. It won't hurt to start with lower numbers and increase later on if needed. Thanks, Anjan -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 21, 2003 12:21 PM To: Anjan Dave; [EMAIL PROTECTED] Subject: Re: [PERFORM] Tuning for mid-size server Anjan, > Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, > with internal drives on RAID5 will be delivered. Postgres will be from > RH8.0. How many drives? RAID5 sucks for heavy read-write databases, unless you have 5+ drives. Or a large battery-backed cache. Also, last I checked, you can't address 8GB of RAM without a 64-bit processor. Since when are the Xeons 64-bit? > Shared_buffers (25% of RAM / 8KB)) = 8589934592 * .25 / 8192 = 262144 That's too high. Cut it in half at least. Probably down to 5% of available RAM. > Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - > 167772 Fine if you're running a few-user-large-operation database. If this is a webserver, you want a much, much lower value. > Effective_cache_size = 262144 (same as shared_buffers - 25%) Much too low. Where did you get these calculations, anyway? > In a generic sense, these are recommended values I found in some > documents. Where? We need to contact the author of the "documents" and tell them to correct things. > joins, orderby, groupby clauses. The web application is based on > Apache/Resin and hotspot JVM 1.4.0. 
You'll need to estimate the memory consumed by Java & Apache to have realistic figures to work with. > Are the above settings ok to begin with? Are there any other > parameters that I should configure now, or monitor later on? No, they're not. See: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these parameters. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Tuning for mid-size server
From what I know, there is a cache-row-set functionality that doesn't exist with the newer postgres... Concurrent users will start from 1 to a high of 5000 or more, and could ramp up rapidly. So far, with increased users, we have gone up to starting the JVM (resin startup) with 1024megs min and max (recommended by Sun) - on the app side. Thanks, Anjan -Original Message- From: Richard Huxton [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 21, 2003 11:57 AM To: Anjan Dave; [EMAIL PROTECTED] Subject: Re: [PERFORM] Tuning for mid-size server On Tuesday 21 October 2003 15:28, Anjan Dave wrote: > Hi, > > Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, > with internal drives on RAID5 will be delivered. Postgres will be from > RH8.0. You'll want to upgrade PG to v7.3.4 > I am planning for these values for the postgres configuration - to > begin > with: > > Shared_buffers (25% of RAM / 8KB)) = 8589934592 * .25 / 8192 = 262144 > > Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - > 167772 > > Effective_cache_size = 262144 (same as shared_buffers - 25%) My instincts would be to lower the first two substantially, and increase the effective cache once you know load levels. I'd probably start with something like the values below and work up: shared_buffers = 8,000 - 10,000 (PG is happier letting the OS do the cacheing) sort_mem = 4,000 - 8,000 (don't forget this is for each sort) You'll find the annotated postgresql.conf and performance tuning articles useful: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php > In a generic sense, these are recommended values I found in some > documents. The database will be small in size and will gradually grow > over time from few thousands to a few million records, or more. The > activity will be mostly of select statements from a few tables with > joins, orderby, groupby clauses. The web application is based on > Apache/Resin and hotspot JVM 1.4.0. 
You'll need to figure out how many concurrent users you'll have and how much memory will be required by apache/java. If your database grows radically, you'll probably want to re-tune as it grows. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
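The two shared_buffers recommendations in this thread differ by more than an order of magnitude once converted to bytes (8kB pages):

```python
# Convert shared_buffers page counts to megabytes for comparison.
def pages_to_mb(pages, page_kb=8):
    return pages * page_kb / 1024

assert pages_to_mb(262144) == 2048.0   # the "25% of 8GB RAM" calculation
assert pages_to_mb(8000) == 62.5       # low end of the list's advice
assert pages_to_mb(10000) == 78.125    # high end of the list's advice
```

In other words, the 25% rule yields 2GB of shared buffers, while the advice here is under 80MB, leaving the OS page cache to do the heavy lifting.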
[PERFORM] Tuning for mid-size server
Hi, Pretty soon, a PowerEdge 6650 with 4 x 2GHz XEONs, and 8GB Memory, with internal drives on RAID5 will be delivered. Postgres will be from RH8.0. I am planning for these values for the postgres configuration - to begin with:

Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144
Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 167772
Effective_cache_size = 262144 (same as shared_buffers - 25%)

In the /etc/sysctl file:
kernel.shmall = 536870912 (512MB)  SHMALL Total amount of shared memory available (bytes or pages)
kernel.shmmax = 536870912 (512MB)  SHMMAX Maximum size of shared memory segment (bytes)

In a generic sense, these are recommended values I found in some documents. The database will be small in size and will gradually grow over time from few thousands to a few million records, or more. The activity will be mostly of select statements from a few tables with joins, orderby, groupby clauses. The web application is based on Apache/Resin and hotspot JVM 1.4.0. Are the above settings ok to begin with? Are there any other parameters that I should configure now, or monitor later on? In other words, am I missing anything here to take full advantage of 4 CPUs and 8Gigs of RAM? Appreciate any help. Thanks, Anjan
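One caveat worth checking in the sysctl fragment above: on Linux, kernel.shmmax is in bytes but kernel.shmall is in pages (typically 4kB), so giving both parameters the same number means very different limits. A small check, with the 4kB page size as an assumption:

```python
# kernel.shmmax = 536870912 really is 512 MB in bytes...
assert 536870912 == 512 * 1024 * 1024

# ...but the same number for kernel.shmall, interpreted as 4kB pages,
# allows vastly more shared memory than the 512MB presumably intended.
page_size = 4096                      # assumed kernel page size
shmall_bytes = 536870912 * page_size
print(shmall_bytes // 2**40, "TiB")   # 2 TiB
```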