Re: [PERFORM] Some performance numbers, with thoughts
Brian,

Any idea what your bottleneck is? You can find out at a crude level by attaching an strace to the running backend, assuming it's running long enough to grab it, then looking at what the system call breakdown is. Basically, run one of your long insert streams, do a "top" to find which process id the backend is using (the <pid>), then run this:

    strace -p <pid> -c

And CTRL-C after a few seconds to see a breakdown of system calls. I think what you'll see is that for the small number of inserts per TXN, you'll be bottlenecked on fsync() calls, or fdatasync() if you defaulted it. Things might speed up a whole lot there depending on your choice of one or the other.

- Luke

On 6/19/06 5:09 PM, "Brian Hurt" <[EMAIL PROTECTED]> wrote:

> For long involved reasons I'm hanging out late at work today, and rather
> than doing real, productive work, I thought I'd run some benchmarks
> against our development PostgreSQL database server. My conclusions are
> at the end.
>
> The purpose of the benchmarking was not to find out how fast Postgres is,
> nor to compare Postgres to other databases, but instead to answer the
> question: when does it become worthwhile to switch over to using COPYs
> instead of INSERTs, and by how much? This benchmark should in no way be
> used to gauge absolute performance of PostgreSQL.
>
> The machine in question: a new HP-145 rack-mount server, with a
> single-socket dual-core 1.8GHz Opteron 275, 1M of cache per core, with
> 4G of memory, running Red Hat Linux (I forget which version). The
> database was on the local single SATA hard disk - no RAID. From the
> numbers, I'm assuming the disk honors fsync. Some tuning of the database
> was done; specifically, shared_buffers was upped to 2500 and temp_buffers
> to 1500 (mental note to self: must increase these significantly more.
> Forgot they were so low). fsync is definitely on. The test program was
> written in OCaml, compiled to native code, using the OCaml PostgreSQL
> connection library (OCaml bindings of the libpq library). The test was
> single-threaded - only one insert going on at a time - run over the local
> gigabit ethernet network from a remote machine.
>
> The table design was very simple:
>
>     CREATE TABLE copytest (
>         id SERIAL PRIMARY KEY NOT NULL,
>         name VARCHAR(64),
>         thread INT,
>         block INT,
>         num INT);
>
> The id column was not specified either in the inserts or in the copies;
> instead it just came from the sequence. Other than the id, there are no
> indexes on the table. Numbers are approximate.
>
> Results:
>
>     Inserts, 1 per transaction*          83 inserts/second
>     Inserts, 5 per transaction          419 inserts/second
>     Inserts, 10 per transaction         843 inserts/second
>     Inserts, 50 per transaction      ~3,100 inserts/second
>     Inserts, 100 per transaction     ~4,200 inserts/second
>     Inserts, 1,000 per transaction   ~5,400 inserts/second
>     Copy, 5 element blocks              ~405 inserts/second
>     Copy, 10 element blocks             ~700 inserts/second
>     Copy, 50 element blocks          ~3,400 inserts/second
>     Copy, 100 element blocks         ~6,000 inserts/second
>     Copy, 1,000 element blocks      ~20,000 inserts/second
>     Copy, 10,000 element blocks     ~27,500 inserts/second
>     Copy, 100,000 element blocks    ~27,600 inserts/second
>
>     * The singleton inserts were not done in an explicit begin/end
>       block, but were instead "unadorned" inserts.
>
> Some conclusions:
>
> 1) Transaction time is a huge hit at the small block sizes. Going from
> 1 insert per transaction to 10 inserts per transaction gives a 10x speed
> up.
> Once the block size gets large enough (tens to hundreds of elements
> per block), the cost of a transaction becomes less of a problem.
>
> 2) Both insert methods hit fairly hard walls of diminishing returns,
> where larger block sizes gave little performance advantage, tending to
> no performance advantage.
>
> 3) For small enough block sizes, inserts are actually faster than
> copies - but not by much. There is a broad plateau, spanning at least
> 5 through 100 elements per block (more than an order of magnitude),
> where the performance of the two is roughly identical. For the general
> case, I'd be inclined to switch to copies sooner (at 5 or so elements
> per block) rather than later.
>
> 4) At the high end, copies vastly outperformed inserts. At 1,000
> elements per block, the copy was almost 4x faster than inserts. This
> widened to ~5x before copy started topping out.
>
> 5) The performance of Postgres, at least on inserts, depends critically
> on how you program it. On the same hardware, performance for me varied
> by a factor of over 300 - about 2.5 orders of magnitude. Programs which
> are unaware of transactions and are designed to be highly portable are
> likely to hit the abysmal side of performance, where the transaction
> overhead kills performance.
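(The default Luke alludes to is the wal_sync_method setting, which on Linux defaults to fdatasync. A postgresql.conf fragment for experimenting follows; which method is fastest is platform-dependent, so treat this as something to benchmark, not a recommendation:)

    wal_sync_method = fdatasync    # alternatives: fsync, open_sync,
                                   # open_datasync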
Re: [PERFORM] Some performance numbers, with thoughts
Brian Hurt <[EMAIL PROTECTED]> writes:
> For long involved reasons I'm hanging out late at work today, and rather
> than doing real, productive work, I thought I'd run some benchmarks
> against our development PostgreSQL database server. My conclusions are
> at the end.

Ummm ... you forgot to mention the Postgres version? Also, which client and server encodings did you use (that starts to get to be a noticeable issue for high COPY rates)?

> 1) Transaction time is a huge hit on the small block sizes.

Right. For small transactions with a drive honoring fsync, you should expect to get a max of about one commit per platter revolution. Your numbers work out to a shade under 5000 commits/minute, from which I speculate a 7200 RPM drive ... do you know what it really is?

			regards, tom lane
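(To spell out the arithmetic behind Tom's estimate - assuming, as he does, at most one commit per platter revolution on a drive that honors fsync:)

    83 commits/s * 60 s/min = 4,980 commits/min
    7,200 RPM spindle       = 7,200 revolutions/min,
                              i.e. a ceiling of 120 commits/second

So the observed 83 commits/second fits comfortably under a 7,200 RPM drive's ceiling, whereas a 5,400 RPM spindle would cap out at 90/second, which 83/second approaches implausibly closely.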
[PERFORM] Some performance numbers, with thoughts
For long involved reasons I'm hanging out late at work today, and rather than doing real, productive work, I thought I'd run some benchmarks against our development PostgreSQL database server. My conclusions are at the end.

The purpose of the benchmarking was not to find out how fast Postgres is, nor to compare Postgres to other databases, but instead to answer the question: when does it become worthwhile to switch over to using COPYs instead of INSERTs, and by how much? This benchmark should in no way be used to gauge absolute performance of PostgreSQL.

The machine in question: a new HP-145 rack-mount server, with a single-socket dual-core 1.8GHz Opteron 275, 1M of cache per core, with 4G of memory, running Red Hat Linux (I forget which version). The database was on the local single SATA hard disk - no RAID. From the numbers, I'm assuming the disk honors fsync. Some tuning of the database was done; specifically, shared_buffers was upped to 2500 and temp_buffers to 1500 (mental note to self: must increase these significantly more. Forgot they were so low). fsync is definitely on. The test program was written in OCaml, compiled to native code, using the OCaml PostgreSQL connection library (OCaml bindings of the libpq library). The test was single-threaded - only one insert going on at a time - run over the local gigabit ethernet network from a remote machine.

The table design was very simple:

    CREATE TABLE copytest (
        id SERIAL PRIMARY KEY NOT NULL,
        name VARCHAR(64),
        thread INT,
        block INT,
        num INT);

The id column was not specified either in the inserts or in the copies; instead it just came from the sequence. Other than the id, there are no indexes on the table. Numbers are approximate.

Results:

    Inserts, 1 per transaction*          83 inserts/second
    Inserts, 5 per transaction          419 inserts/second
    Inserts, 10 per transaction         843 inserts/second
    Inserts, 50 per transaction      ~3,100 inserts/second
    Inserts, 100 per transaction     ~4,200 inserts/second
    Inserts, 1,000 per transaction   ~5,400 inserts/second
    Copy, 5 element blocks              ~405 inserts/second
    Copy, 10 element blocks             ~700 inserts/second
    Copy, 50 element blocks          ~3,400 inserts/second
    Copy, 100 element blocks         ~6,000 inserts/second
    Copy, 1,000 element blocks      ~20,000 inserts/second
    Copy, 10,000 element blocks     ~27,500 inserts/second
    Copy, 100,000 element blocks    ~27,600 inserts/second

    * The singleton inserts were not done in an explicit begin/end
      block, but were instead "unadorned" inserts.

Some conclusions:

1) Transaction time is a huge hit at the small block sizes. Going from 1 insert per transaction to 10 inserts per transaction gives a 10x speed up. Once the block size gets large enough (tens to hundreds of elements per block), the cost of a transaction becomes less of a problem.

2) Both insert methods hit fairly hard walls of diminishing returns, where larger block sizes gave little performance advantage, tending to no performance advantage.

3) For small enough block sizes, inserts are actually faster than copies - but not by much. There is a broad plateau, spanning at least 5 through 100 elements per block (more than an order of magnitude), where the performance of the two is roughly identical. For the general case, I'd be inclined to switch to copies sooner (at 5 or so elements per block) rather than later.

4) At the high end, copies vastly outperformed inserts. At 1,000 elements per block, the copy was almost 4x faster than inserts. This widened to ~5x before copy started topping out.

5) The performance of Postgres, at least on inserts, depends critically on how you program it.
On the same hardware, performance for me varied by a factor of over 300 - about 2.5 orders of magnitude. Programs which are unaware of transactions and are designed to be highly portable are likely to hit the abysmal side of performance, where the transaction overhead kills performance. I'm not sure there is a fix for this (let alone an easy fix) - simply dropping transactions is obviously not it. Programs that are transaction-aware and willing to use PostgreSQL-specific features can get surprisingly excellent performance. Simply being transaction-aware and doing multiple inserts per transaction greatly increases performance, giving an easy order of magnitude increase (wrapping 10 inserts in a transaction gives a 10x performance boost).

Brian
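(To make the two approaches concrete, here is a minimal OCaml sketch of the batching Brian describes, written against the postgresql-ocaml bindings he mentions. The conninfo string, row values, and batch size are placeholders, and the COPY path uses the legacy putline/endcopy interface; treat this as an illustration, not Brian's actual benchmark code:)

    (* Sketch: n single-row INSERTs wrapped in one transaction, vs. the
       same rows pushed through COPY ... FROM STDIN. Assumes the
       postgresql-ocaml bindings (Postgresql.connection). *)

    let insert_batch (conn : Postgresql.connection) n =
      (* One BEGIN/COMMIT around n inserts: the WAL is fsync'd once at
         COMMIT instead of once per row. *)
      ignore (conn#exec "BEGIN");
      for i = 1 to n do
        ignore (conn#exec (Printf.sprintf
          "INSERT INTO copytest (name, thread, block, num) \
           VALUES ('row', 1, 1, %d)" i))
      done;
      ignore (conn#exec "COMMIT")

    let copy_batch (conn : Postgresql.connection) n =
      (* COPY sends the rows as one tab-separated stream: a single round
         trip and far less per-row parse/plan overhead. *)
      ignore (conn#exec
        "COPY copytest (name, thread, block, num) FROM STDIN");
      for i = 1 to n do
        conn#putline (Printf.sprintf "row\t1\t1\t%d\n" i)
      done;
      conn#putline "\\.\n";   (* end-of-data marker *)
      conn#endcopy

    let () =
      (* Placeholder conninfo; point it at your own server. *)
      let conn = new Postgresql.connection
                   ~conninfo:"host=localhost dbname=test" () in
      insert_batch conn 100;
      copy_batch conn 100;
      conn#finish

Per Brian's numbers, either shape of batching lands on the same broad plateau up to about 100 rows per block; beyond that, only the COPY path keeps scaling.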
Re: [PERFORM] SAN performance mystery
Michael Stone wrote:
> On Mon, Jun 19, 2006 at 08:09:47PM +1000, Tim Allen wrote:
>> Certainly, the read performance of the SATA disk still beats the SAN,
>> and there is no way to lie about read performance.
>
> Sure there is: you have the data cached in system RAM. I find it real
> hard to believe that you can sustain 161MB/s off a single SATA disk.

Agreed - approx 60-70MB/s seems to be the ballpark for modern SATA drives, so to get 161MB/s you would need about 3 of them striped together (or a partially cached file, as indicated). What is interesting is that (presumably) the same test is getting such uninspiring results on the SAN...

Having said that, I've been there too, about 4 years ago, with a SAN that had several 6-disk RAID5 arrays, and the best sequential *read* performance we ever saw from them was about 50MB/s. I recall trying to get performance data from the vendor - only to be told that if we were doing benchmarks, could they have our results when we were finished!

regards

Mark
Re: [PERFORM] SAN performance mystery
* John Vincent ([EMAIL PROTECTED]) wrote:
>> I'd have to agree with you about the specific SAN/setup you're working
>> with there. I certainly disagree that it's a general property of SANs
>> though. We've got a DS4300 with FC controllers and drives, hosts are
>> generally dual-controller load-balanced and it works quite decently.
>
> How are you guys doing the load balancing? IIRC, the RDAC driver only
> does failover. Or are you using the OS level multipathing instead? While
> we were on the 4300 for our AIX boxes, we just created two big RAID5
> LUNs and assigned one to each controller. With 2 HBAs and LVM striping
> that was about the best we could get in terms of load balancing.

We're using the OS-level multipathing. I tend to prefer using things like multipath over specific-driver options. I haven't spent a huge amount of effort profiling the SAN, honestly, but it's definitely faster than the direct-attached hardware-RAID5 SCSI system we used to use (from nStor), though that could have been because they were smaller, slower, regular SCSI disks (not FC). A simple bonnie++ run on one of the systems on the SAN gave me this:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
vardamir    32200M           40205  15 22399   5           102572  10 288.4   0
                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                16  2802  99 +++++ +++ +++++ +++  2600  99 +++++ +++ 10205 100

So, 40MB/s out, 102MB/s in, or so. This was on an ext3 filesystem. Underneath that array it's a 3-disk RAID5 of 300GB 10k RPM FC disks. We also have a snapshot on that array, but it was disabled at the time.

>> Indeed, the EMC SANs are generally the high-priced ones too, so not
>> really sure what to tell you about the poor performance you're seeing
>> out of it. Your IT folks and/or your EMC rep. should be able to resolve
>> that, really...
>
> The only exception I've heard to this is the CLARiiON AX150. We looked
> at one and we were warned off of it by some EMC gearheads.

Yeah, the CLARiiON is the EMC "cheap" line, and I think the AX150 was the extra-cheap one which Dell rebranded and sold.

Thanks,

Stephen
Re: [PERFORM] SAN performance mystery
> I'd have to agree with you about the specific SAN/setup you're working
> with there. I certainly disagree that it's a general property of SANs
> though. We've got a DS4300 with FC controllers and drives, hosts are
> generally dual-controller load-balanced and it works quite decently.

How are you guys doing the load balancing? IIRC, the RDAC driver only does failover. Or are you using the OS level multipathing instead? While we were on the 4300 for our AIX boxes, we just created two big RAID5 LUNs and assigned one to each controller. With 2 HBAs and LVM striping that was about the best we could get in terms of load balancing.

> Indeed, the EMC SANs are generally the high-priced ones too, so not
> really sure what to tell you about the poor performance you're seeing
> out of it. Your IT folks and/or your EMC rep. should be able to resolve
> that, really...

The only exception I've heard to this is the CLARiiON AX150. We looked at one and we were warned off of it by some EMC gearheads.
Re: [PERFORM] SAN performance mystery
On 6/19/06, Tim Allen <[EMAIL PROTECTED]> wrote:
> As I noted in another thread, the HBA is an Emulex LP1050, and they have
> a rather old driver for it. I've recommended that they update ASAP. This
> hasn't happened yet.

Yeah, I saw that in a later thread. I would suggest also that the BIOS settings on the HBA itself be investigated. As an example, the Qlogic HBAs have a profile of sorts, one for tape and one for disk. Could be something there.

> OK, thanks, I'll ask the customer whether they've used PowerPath at all.
> They do seem to have it installed on the machine, but I suppose that
> doesn't guarantee it's being used correctly. However, it looks like they
> have just the one HBA, so, if I've correctly understood what load
> balancing means in this context, it's not going to help; right?

If they have a single HBA then no, it won't help. I'm not very intimate with PowerPath, but it might even HURT if they have it enabled with one HBA. As an example, we were in the process of migrating an AIX LPAR to our DS6800. We only had one spare HBA to assign it. The default policy with the SDD driver is lb (load balancing). The problem is that with the SDD driver you see multiple hdisks per HBA per controller port on the SAN. Since we had 4 controller ports active on the SAN, our HBA saw 4 hdisks per LUN. The SDD driver abstracts that out as a single vpath and you use the vpaths as your pv on the system. The problem was that it was attempting to load balance across a single HBA, which was NOT what we wanted.

> I've done some dd'ing myself, as described in another thread. The
> results are not at all encouraging - their SAN seems to do about 20MB/s
> or less.

I saw that as well.

> The SAN possibly is over-subscribed. Can you suggest any easy ways for
> me to find out? The customer has an IT department who look after their
> SANs, and they're not keen on outsiders poking their noses in. It's hard
> for me to get any direct access to the SAN itself.

When I say over-subscribed, you have to look at all the active LUNs and all of the systems attached as well. With the DS4300 (standard, not Turbo, option), the SAN can handle 512 outstanding I/Os. If I have 4 LUNs assigned to four systems (1 per system), and each LUN has a queue_depth of 128 from each system, I'll oversubscribe with the next host attach unless I back the queue_depth off on each host. Contrast that with the Turbo controller option, which handles 1,024 outstanding I/Os, and I can duplicate what I have now or add a second LUN per host. I can't even find how much our DS6800 supports.

> Thanks for all the suggestions, John. I'll keep trying to follow some of
> them up.

From what I can tell, it sounds like the SATA problem other people have mentioned is the culprit.
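(Making John's oversubscription arithmetic explicit - reading his 512 as the standard DS4300 controller's limit on outstanding I/Os, which is the reading his queue_depth figures imply:)

    4 hosts * 1 LUN each * queue_depth 128 = 512 outstanding I/Os
    standard DS4300 limit                  = 512   (Turbo: 1,024)

So a fifth host attached at the same queue_depth would push the total to 640 and oversubscribe the controller, unless the queue_depth is backed off on each host.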
Re: [PERFORM] SAN performance mystery
* Tim Allen ([EMAIL PROTECTED]) wrote:
> The conclusion I'm drawing here is that this SAN does not perform at all
> well, and is not a good database platform. It's sounding from replies
> from other people that this might be a general property of SANs, or at
> least the ones that are not stratospherically priced.

I'd have to agree with you about the specific SAN/setup you're working with there. I certainly disagree that it's a general property of SANs though. We've got a DS4300 with FC controllers and drives, hosts are generally dual-controller load-balanced and it works quite decently.

Indeed, the EMC SANs are generally the high-priced ones too, so not really sure what to tell you about the poor performance you're seeing out of it. Your IT folks and/or your EMC rep. should be able to resolve that, really...

Enjoy,

Stephen
Re: [PERFORM] SAN performance mystery
John Vincent wrote:
>> Is that expected performance, anyone? It doesn't sound right to me.
>> Does anyone have any clues about what might be going on? Buggy kernel
>> drivers? Buggy kernel, come to think of it? Does a SAN just not provide
>> adequate performance for a large database?
>
> Tim,
>
> Here are the areas I would look at first if we're considering hardware
> to be the problem:
>
> HBA and driver: Since this is an Intel/Linux system, the HBA is PROBABLY
> a Qlogic. I would need to know the SAN model to see what the backend of
> the SAN is itself. EMC has some FC-attach models that actually have SATA
> disks underneath. You also might want to look at the cache size of the
> controllers on the SAN.

As I noted in another thread, the HBA is an Emulex LP1050, and they have a rather old driver for it. I've recommended that they update ASAP. This hasn't happened yet.

I know very little about the SAN itself - the customer hasn't provided any information other than the brand name, as they selected it and installed it themselves. I shall ask for more information.

> - Something also to note is that EMC provides an add-on called PowerPath
> for load balancing multiple HBAs. If they don't have this, it might be
> worth investigating.

OK, thanks, I'll ask the customer whether they've used PowerPath at all. They do seem to have it installed on the machine, but I suppose that doesn't guarantee it's being used correctly. However, it looks like they have just the one HBA, so, if I've correctly understood what load balancing means in this context, it's not going to help; right?

> - As with anything, disk layout is important. With the lower-end IBM SAN
> (DS4000) you actually have to operate at the physical spindle level. On
> our 4300, when I create a LUN, I select the exact disks I want and which
> of the two controllers is the preferred path. On our DS6800, I just ask
> for storage. I THINK all the EMC models are the "ask for storage" type
> of scenario. However, with the 6800, you select your storage across
> extent pools.
>
> Have they done any benchmarking of the SAN outside of postgres? Before
> we settle on a new LUN configuration, we always do the
> dd,umount,mount,dd routine. It's not a perfect test for databases but it
> will help you catch GROSS performance issues.

I've done some dd'ing myself, as described in another thread. The results are not at all encouraging - their SAN seems to do about 20MB/s or less.

> SAN itself:
> - Could the SAN be oversubscribed? How many hosts and LUNs total do they
> have, and what are the queue_depths for those hosts? With the Qlogic
> card, you can set the queue depth in the BIOS of the adapter when the
> system is booting up. CTRL-Q, I think. If the system has enough local
> DASD to relocate the database internally, it might be a valid test to do
> so and see if you can isolate the problem to the SAN itself.

The SAN possibly is over-subscribed. Can you suggest any easy ways for me to find out? The customer has an IT department who look after their SANs, and they're not keen on outsiders poking their noses in. It's hard for me to get any direct access to the SAN itself.

> PG itself: If you think it's a pgsql configuration, I'm guessing you
> already configured postgresql.conf to match theirs (or at least a
> fraction of theirs, since the memory isn't the same?). What about
> loading a "from-scratch" config file and restarting the tuning process?

The pg configurations are not identical. However, given the differences in raw I/O speed observed, it doesn't seem likely that the difference in configuration is responsible.
Yes, as you guessed, we set more conservative options on the less capable box. Doing proper double-blind tests on the customer box is difficult, as it is in production and the customer has a very low tolerance for downtime.

> Just a dump of my thought process from someone who's been spending too
> much time tuning his SAN and postgres lately.

Thanks for all the suggestions, John. I'll keep trying to follow some of them up.

Tim

--
-----------------------------------------------
Tim Allen          [EMAIL PROTECTED]
Proximity Pty Ltd  http://www.proximity.com.au/
Re: [PERFORM] SAN performance mystery
On Mon, Jun 19, 2006 at 08:09:47PM +1000, Tim Allen wrote:
> Certainly, the read performance of the SATA disk still beats the SAN,
> and there is no way to lie about read performance.

Sure there is: you have the data cached in system RAM. I find it real hard to believe that you can sustain 161MB/s off a single SATA disk.

Mike Stone
Re: [PERFORM] SAN performance mystery
Scott Marlowe wrote:
> On Thu, 2006-06-15 at 16:50, Tim Allen wrote:
>> We have a customer who are having performance problems. They have a
>> large (36G+) postgres 8.1.3 database installed on an 8-way Opteron with
>> 8G RAM, attached to an EMC SAN via fibre-channel (I don't have details
>> of the EMC SAN model, or the type of fibre-channel card, at the
>> moment). They're running RedHat ES3 (which means a 2.4.something Linux
>> kernel). They are unhappy about their query performance. We've been
>> doing various things to try to work out what we can do. One thing that
>> has been apparent is that autovacuum has not been able to keep the
>> database sufficiently tamed. A pg_dump/pg_restore cycle reduced the
>> total database size from 81G to 36G. Performing the restore took about
>> 23 hours.
>
> Do you have the ability to do any simple IO performance testing, like
> with bonnie++ (the old bonnie is not really capable of properly testing
> modern equipment, but bonnie++ will give you some idea of the throughput
> of the SAN)? Or even just timing a dd write to the SAN?

I've done some timed dd's. The timing results vary quite a bit, but it seems you can write to the SAN at about 20MB/s and read from it at about 12MB/s. Not an entirely scientific test, as I wasn't able to stop other activity on the machine, though I don't think much else was happening. Certainly not impressive figures, compared with our machine with the SATA disk (referred to below), which can get 161MB/s copying files on the same disk, and 48MB/s and 138MB/s copying files from the SATA disk respectively to and from a RAID5 array.

The customer is a large organisation, with a large IT department who guard their turf carefully, so there is no way I could get away with installing any heavier-duty testing tools like bonnie++ on their machine.

>> We tried restoring the pg_dump output to one of our machines, a
>> dual-core Pentium D with a single SATA disk, no RAID, I forget how much
>> RAM but definitely much less than 8G. The restore took five hours. So
>> it would seem that our machine, which on paper should be far less
>> impressive than the customer's box, does more than four times the I/O
>> performance. To simplify greatly - single local SATA disk beats EMC SAN
>> by a factor of four.
>>
>> Is that expected performance, anyone? It doesn't sound right to me.
>> Does anyone have any clues about what might be going on? Buggy kernel
>> drivers? Buggy kernel, come to think of it? Does a SAN just not provide
>> adequate performance for a large database?
>
> Yes, this is not uncommon. It is very likely that your SATA disk is
> lying about fsync.

I guess a sustained write will flood the disk's cache and negate the effect of the write-completion dishonesty. But I have no idea how large a copy would have to be to do that - can anyone suggest a figure? Certainly, the read performance of the SATA disk still beats the SAN, and there is no way to lie about read performance.

> What kind of backup are you using? insert statements or copy
> statements? If insert statements, then the difference is quite
> believable. If copy statements, less so.

A binary pg_dump, which amounts to copy statements, if I'm not mistaken.

> Next time, on their big server, see if you can try a restore with fsync
> turned off and see if that makes the restore faster. Note you should
> turn fsync back on after the restore, as running without it is quite
> dangerous should you suffer a power outage.
>
> How are you mounting to the EMC SAN? NFS, iSCSI? Other?

iSCSI, I believe. Some variant of SCSI, anyway, of that I'm certain.
The conclusion I'm drawing here is that this SAN does not perform at all well, and is not a good database platform. It's sounding from replies from other people that this might be a general property of SANs, or at least the ones that are not stratospherically priced.

Tim

--
-----------------------------------------------
Tim Allen          [EMAIL PROTECTED]
Proximity Pty Ltd  http://www.proximity.com.au/