Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
Scott Marlowe schrieb:
> On Tue, Dec 2, 2008 at 2:22 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
>> Has anyone benchmarked this controller (PCIe/4x, 512 MB BBC)? We are trying
>> to use it with 8x SATA 1TB drives in RAID-5 mode under Linux, and we measure
>> strange values. An individual drive is capable of delivering 91 MB/sec
>> sequential read performance, and we get ~102 MB/sec out of an 8-drive
>> RAID-5, which seems ridiculously slow. Write performance seems much better,
>> ~300 MB/sec - that seems OK to me. I guess I must be doing something wrong;
>> I cannot believe that a 500 € controller delivers such poor performance.
>
> A few suggestions... Try to find the latest driver for your card. Try using
> the card as nothing but a caching controller and run your RAID in software
> under Linux (or whatever OS you're on). Test a 2-drive RAID-0 to see what
> kind of performance increase you get. If you can't dd a big file off of a
> RAID-0 at about 2x the rate of a single drive, then something is wrong with
> it. Try RAID 10. Try RAID-1 sets on the controller and RAID 0 over that in
> software.

I've already tried software RAID with individual drives; it performs much
better. However, software RAID is not an option for us, so I'm stuck. The
card has the latest firmware installed, and no drivers are needed - they're
already included in the Linux kernel.

I still think we must be doing something wrong here. I googled the controller
and Linux and did not find anything indicating a problem. The HP SmartArray
series is quite common, so a lot of users would have the same problem.

Thanks!

--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
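The "dd a big file" test Scott suggests can be scripted roughly as below. A hedged sketch: a temporary file stands in for the device so the snippet is self-contained; on real hardware you would point DEV at the raw device or md array (e.g. /dev/cciss/c0d0 or /dev/md0) and read far more than 64 MB so the page cache cannot satisfy the whole read.

```shell
# Sketch of a dd-based sequential-read throughput test. DEV is a stand-in
# temp file here; substitute the real device and a much larger read size
# on actual hardware.
DEV=$(mktemp)
dd if=/dev/zero of="$DEV" bs=1M count=64 2>/dev/null   # 64 MB test area

START=$(date +%s.%N)
dd if="$DEV" of=/dev/null bs=64k 2>/dev/null           # sequential read, 64k blocks
END=$(date +%s.%N)

# Derive MB/sec from the elapsed wall time (64 MB read).
MBSEC=$(awk -v s="$START" -v e="$END" \
    'BEGIN { d = e - s; if (d <= 0) d = 0.001; printf "%.1f", 64 / d }')
echo "sequential read: $MBSEC MB/sec"
rm -f "$DEV"
```

Run once against a single drive and once against the 2-drive RAID-0; per Scott's rule of thumb, the array figure should come out near double the single-drive figure.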
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
Kevin Grittner schrieb:
> Mario Weilguni <[EMAIL PROTECTED]> wrote:
>> Has anyone benchmarked this controller (PCIe/4x, 512 MB BBC)? We are trying
>> to use it with 8x SATA 1TB drives in RAID-5 mode under Linux, and we measure
>> strange values. An individual drive is capable of delivering 91 MB/sec
>> sequential read performance, and we get ~102 MB/sec out of an 8-drive
>> RAID-5, which seems ridiculously slow. Write performance seems much better,
>> ~300 MB/sec - that seems OK to me.
>
> What's your stripe size?
>
> -Kevin

We used the default setting, 64k. Might a bigger value help here?
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
Alan Hodgson schrieb:
> Mario Weilguni <[EMAIL PROTECTED]> wrote:
>> ... strange values. An individual drive is capable of delivering 91 MB/sec
>> sequential read performance, and we get ~102 MB/sec out of an 8-drive
>> RAID-5, which seems ridiculously slow.
>
> What command are you using to test the reads?
>
> Some recommendations to try:
>
> 1) /sbin/blockdev --setra 2048 device (where device is the partition or
> LVM volume)
>
> 2) Use XFS, and make sure your stripe settings match the RAID.
>
> Having said that, 102 MB/sec sounds really low for any modern controller
> with 8 drives, regardless of tuning or filesystem choice.

First, thanks a lot for this and all the other answers.

I measured the raw device performance:
dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null

I get poor performance when all 8 drives are configured as one large RAID-5,
and slightly poorer performance when configured as JBOD. In production we use
XFS as the filesystem, but I doubt this has anything to do with FS tuning.
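Alan's readahead suggestion, spelled out as a small sketch. The device path is taken from Mario's dd command and will differ elsewhere; --setra needs root and a real block device, so the call is guarded.

```shell
# blockdev counts readahead in 512-byte sectors, so 2048 sectors = 1 MB.
DEV=/dev/cciss/c0d0
RA_SECTORS=2048
RA_KB=$(( RA_SECTORS * 512 / 1024 ))
echo "target readahead: ${RA_KB} KB on $DEV"

if [ -b "$DEV" ]; then
    /sbin/blockdev --setra "$RA_SECTORS" "$DEV"   # requires root
    /sbin/blockdev --getra "$DEV"                 # verify the new setting
fi
```

Note the setting is per-device and does not survive a reboot; it would normally go in an init script.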
Re: [PERFORM] Need help with 8.4 Performance Testing
Greg Smith <[EMAIL PROTECTED]> writes:
> On Mon, 8 Dec 2008, Merlin Moncure wrote:
>
>> I wonder if shared_buffers has any effect on how far you can go before
>> you hit the 'tipping point'.
>
> If your operating system has any reasonable caching itself, not so much at
> first. As long as the index on the account table fits in shared_buffers,
> even the basic sort of caching logic an OS uses is perfectly functional for
> swapping the individual pages of the account table in and out, the main
> limiting factor on pgbench performance.

I would expect higher shared_buffers to raise the curve before the first
breakpoint, but after the first breakpoint to make the drop steeper and
deeper. The equilibrium where the curve becomes flatter should be lower.

That is, as long as the database fits entirely in RAM, having more of the
buffers immediately in shared_buffers is better. Once there's contention for
the precious cache, stealing some of it for duplicated buffers will only
hurt.

> There is a further-out tipping point I've theorized about but not really
> explored: the point where even the database indexes stop fitting in memory
> usefully. As you get closer to that, I'd expect that the clock sweep
> algorithm used by shared_buffers should make it a bit more likely that
> those important blocks would hang around usefully if you put them there,
> rather than giving most of the memory to the OS to manage.

Hm, sounds logical. At that point the slow drop-off should become even
shallower and possibly completely flat. Greater shared_buffers might start
helping again at that point.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
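The breakpoints Greg and Gregory describe can be located with back-of-envelope arithmetic before running anything. A sketch, assuming each pgbench scale increment adds roughly 15 MB of pgbench_accounts data - an approximation, not a figure stated in the thread; the exact size varies by version and fillfactor.

```shell
# Rough estimate of the pgbench scale factor at which the database stops
# fitting in RAM (the first big TPS drop-off discussed above).
RAM_MB=$(( 32 * 1024 ))     # e.g. a 32 GB box
MB_PER_SCALE=15             # assumed data added per pgbench scale unit
TIP_SCALE=$(( RAM_MB / MB_PER_SCALE ))
echo "expect the curve to fall off somewhere near: pgbench -i -s $TIP_SCALE"
```

Initializing a few scale factors on either side of that estimate and graphing TPS (e.g. pgbench -c 8 -T 300) would bracket the actual tipping point empirically.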
Re: [PERFORM] Need help with 8.4 Performance Testing
Greg Smith wrote:
| On Mon, 8 Dec 2008, Merlin Moncure wrote:
|
|> I wonder if shared_buffers has any effect on how far you can go before
|> you hit the 'tipping point'.
|
| If your operating system has any reasonable caching itself, not so much at
| first. As long as the index on the account table fits in shared_buffers,
| even the basic sort of caching logic an OS uses is perfectly functional
| for swapping the individual pages of the account table in and out, the
| main limiting factor on pgbench performance.
|
| There is a further-out tipping point I've theorized about but not really
| explored: the point where even the database indexes stop fitting in
| memory usefully. As you get closer to that, I'd expect that the clock
| sweep algorithm used by shared_buffers should make it a bit more likely
| that those important blocks would hang around usefully if you put them
| there, rather than giving most of the memory to the OS to manage.

I am by no means an expert at this. But one thing that can matter is whether
you want to improve just the performance of the dbms, or the performance of
the entire system on which the dbms runs. Because if you want to improve the
whole system, you would want as much of the caching as possible to take
place in the system's buffers, so the use of the memory could be optimized
over the entire workload, not just the load of the dbms itself. I suppose on
a dedicated system with only one dbms running with only one database open
(at a time, anyway) this might be moot, but not otherwise.

Now I agree that it would be good to get the entire index (or at least the
working set of the index) into the memory of the computer. But does it
really matter whether it is in the system's cache or the postgres cache? Is
it any more likely to be in postgres's cache than in the system cache if the
system is hurting for memory? I would think the system would be equally
likely to page out "idle" pages no matter where they are, unless they are
locked to memory. I do not know if all operating systems can do this, and
even if they can, I do not know if postgres uses that ability. I doubt it,
since I believe (at least in Linux) a process can do that only if run as
root, which I imagine few (if any) users do.

| Since the data is about 7.5X as large as the indexes, that point is way
| further out than the basic bottlenecks. And if you graph pgbench results
| on a scale that usefully shows the results for in-memory TPS scores, you
| can barely see that part of the chart as well. One day I may get to
| mapping that out better, and if I do it will be interesting to see if the
| balance of shared_buffers to OS cache works the way I expect. I was
| waiting until I finished the pgtune program for that; that's building some
| of the guts I wanted to make it easier to tweak postgresql.conf settings
| programmatically in between pgbench runs.
|
| --
| * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Jean-David Beyer, Shrewsbury, New Jersey
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tuesday 09 December 2008 13:08:14 Jean-David Beyer wrote:
> and even if they can, I do not know if postgres uses that ability. I doubt
> it, since I believe (at least in Linux) a process can do that only if run
> as root, which I imagine few (if any) users do.

Disclaimer: I'm not a system programmer...

I believe that at Linux kernel 2.6.8 and before, processes need the POSIX
capability CAP_IPC_LOCK to lock pages at all; from 2.6.9 on they need
CAP_IPC_LOCK only to lock more than RLIMIT_MEMLOCK. It is a capability, so a
process can run as any user, assuming it was started with or gained the
capability.

No idea if Postgres uses any of this; other than to protect the security of
certain password operations there is probably not much point. If key parts
of your database are being paged out, get more RAM; if idle parts of your
database are paged out, you could probably apply that RAM more usefully to
something else.

The Varnish cache design is the place to look for enlightenment on relying
on kernel paging (using memory-mapped files) rather than trying to do it
yourself, but then a proxy server is a lot simpler than an RDBMS. That said,
Varnish is fast at what it does (reverse HTTP proxy)!
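The limit in question can be inspected from the shell. A minimal sketch: RLIMIT_MEMLOCK surfaces as ulimit's "max locked memory"; inspecting a live backend's capabilities is shown only as a comment, since the path and tooling are assumptions that vary by system.

```shell
# RLIMIT_MEMLOCK for the current shell, in KB ("unlimited" for root,
# typically a small number like 32 or 64 for ordinary users on 2008-era
# distros).
LIM=$(ulimit -l)
echo "max locked memory: $LIM KB"

# On a live system one could check a postgres backend's effective
# capability set for CAP_IPC_LOCK along these lines (assumed path):
#   grep CapEff /proc/<postgres-pid>/status
```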
Re: [PERFORM] file system and raid performance
Scott Marlowe wrote:
> On Sun, Dec 7, 2008 at 10:59 PM, M. Edward (Ed) Borasky
> <[EMAIL PROTECTED]> wrote:
>> Ah, but shouldn't a PostgreSQL (or any other database, for that matter)
>> have its own set of filesystems tuned to the application's I/O patterns?
>> Sure, there are some people who need to have all of their eggs in one
>> basket because they can't afford multiple baskets. For them, maybe the
>> OS defaults are the right choice. But if you're building a
>> database-specific server, you can optimize the I/O for that.
>
> It's really about a cost / benefits analysis. 20 years ago file systems
> were slow and buggy and a database could, with little work, outperform
> them. Nowadays, not so much. I'm guessing that the extra cost and effort
> of maintaining a file system for pgsql outweighs any real gain you're
> likely to see performance-wise.
>
> But I'm sure that if you implemented one that outran XFS / ZFS / ext3
> et al. people would want to hear about it.

I guess I wasn't clear - I didn't mean a PostgreSQL-specific filesystem
design, although BTRFS does have some things that are "RDBMS-friendly". I
meant that one should hand-tune existing filesystems / hardware for optimum
performance on specific workloads. The tablespaces in PostgreSQL give you
that kind of potential granularity, I think.

--
M. Edward (Ed) Borasky, FBG, AB, PTA, PGS, MS, MNLP, NST, ACMC(P), WOM

"A mathematician is a device for turning coffee into theorems." -- Alfréd
Rényi via Paul Erdős
Re: [PERFORM] Need help with 8.4 Performance Testing
On Sun, Dec 7, 2008 at 7:38 PM, Josh Berkus <[EMAIL PROTECTED]> wrote:
> Also, the following patches currently still have bugs, but when the bugs
> are fixed I'll be looking for performance testers, so please either watch
> the wiki or watch this space:
> ...
> -- posix_fadvise (Gregory Stark)

Eh? Quite possibly, but none that I'm aware of. The only problem is a couple
of trivial bits of bitrot. I'll post an update now if you want.

--
greg
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
On Tue, Dec 9, 2008 at 5:17 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
> Alan Hodgson schrieb:
>> Mario Weilguni <[EMAIL PROTECTED]> wrote:
>>> ... strange values. An individual drive is capable of delivering 91
>>> MB/sec sequential read performance, and we get ~102 MB/sec out of an
>>> 8-drive RAID-5, which seems ridiculously slow.
>>
>> What command are you using to test the reads?
>>
>> Some recommendations to try:
>>
>> 1) /sbin/blockdev --setra 2048 device (where device is the partition or
>> LVM volume)
>>
>> 2) Use XFS, and make sure your stripe settings match the RAID.
>>
>> Having said that, 102 MB/sec sounds really low for any modern controller
>> with 8 drives, regardless of tuning or filesystem choice.
>
> First, thanks a lot for this and all the other answers.
>
> I measured the raw device performance:
> dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null
>
> I get poor performance when all 8 drives are configured as one large
> RAID-5, and slightly poorer performance when configured as JBOD. In
> production we use XFS, but I doubt this has anything to do with FS tuning.

Yeah, having just trawled the pgsql-performance archives, there are plenty
of instances of people getting terrible performance from HP Smart Array
controllers before the P800. Is it possible for you to trade up to a better
RAID controller? Whichever salesman sold you the P400 should take one for
the team and make this right for you.
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
On Tue, 2008-12-09 at 13:10 +0100, Mario Weilguni wrote:
> Scott Marlowe schrieb:
>> On Tue, Dec 2, 2008 at 2:22 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
>
> I still think we must be doing something wrong here. I googled the
> controller and Linux and did not find anything indicating a problem. The
> HP SmartArray series is quite common, so a lot of users would have the
> same problem.

Yes, the SmartArray series is quite common and actually known to perform
reasonably well - in RAID 10. You still appear to be trying RAID 5.

Joshua D. Drake

--
PostgreSQL Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
We reached fairly good performance on a P400 controller (8 SATA 146 GB 2.5"
10k rpm drives) with RAID-5 or RAID-6 Linux software RAID: the write
bandwidth reached about 140 MB/s sustained throughput (the hardware RAID-5
gave a sustained 20 MB/s...). With a second, equal controller (16 disks) we
reached (RAID-6 spanning all 16 disks) about 200 MB/s sustained. The CPU
load is negligible. Read performance is about 20% better.

Best regards, and my apologies for my bad English.

GT

P.S.: on a P800 with 12 SATA 750 GB 3.5" 7200 rpm drives, the hardware
RAID-5 write performance was about 30 MB/s; software RAID-5 is between 60
and 80 MB/s.

Scott Marlowe ha scritto:
> Yeah, having just trawled the pgsql-performance archives, there are plenty
> of instances of people getting terrible performance from HP Smart Array
> controllers before the P800. Is it possible for you to trade up to a
> better RAID controller? Whichever salesman sold you the P400 should take
> one for the team and make this right for you.
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
* Joshua D. Drake <[EMAIL PROTECTED]> [081209 11:01]:
> Yes the SmartArray series is quite common and actually known to perform
> reasonably well, in RAID 10. You still appear to be trying RAID 5.

*boggle*

Are people *still* using raid5?

/me gives up!

--
Aidan Van Dyk                    Create like a god,
[EMAIL PROTECTED]                command like a king,
http://www.highrise.ca/          work like a slave.
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
On Tue, Dec 9, 2008 at 9:03 AM, Gabriele Turchi <[EMAIL PROTECTED]> wrote:
> We reached fairly good performance on a P400 controller (8 SATA 146 GB
> 2.5" 10k rpm drives) with RAID-5 or RAID-6 Linux software RAID: the write
> bandwidth reached about 140 MB/s sustained throughput (the hardware RAID-5
> gave a sustained 20 MB/s...). With a second, equal controller (16 disks)
> we reached (RAID-6 spanning all 16 disks) about 200 MB/s sustained.

That's better than you were getting, but still quite slow. I was bothered
that my 12x 15k.4 SAS RAID-10 array could only sustain about 350 MB/second
sequential read, thinking that each drive should be able to approach 80 or
so MB/second and I was only getting about 60...

This sounds more and more like HP is trying to undercompete along with Dell
in the RAID controller market, or at least the RAID controller driver
market.
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
Aidan Van Dyk wrote:
> * Joshua D. Drake <[EMAIL PROTECTED]> [081209 11:01]:
>> Yes the SmartArray series is quite common and actually known to perform
>> reasonably well, in RAID 10. You still appear to be trying RAID 5.
>
> *boggle*
>
> Are people *still* using raid5?
>
> /me gives up!

What do you suggest when there is not enough room for a RAID 10?
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
* Peter Eisentraut <[EMAIL PROTECTED]> [081209 11:28]:
> What do you suggest when there is not enough room for a RAID 10?

More disks ;-)

But if you've given up on performance and reliability in favour of cheaper
storage, I guess raid5 is OK. But then I'm not sure what the point of asking
about its poor performance is...

a.

--
Aidan Van Dyk                    Create like a god,
[EMAIL PROTECTED]                command like a king,
http://www.highrise.ca/          work like a slave.
Re: [PERFORM] Need help with 8.4 Performance Testing
> From: Jean-David Beyer
> Sent: Tuesday, December 09, 2008 5:08 AM
> To: [email protected]
> Subject: Re: [PERFORM] Need help with 8.4 Performance Testing
>
> But one thing that can matter is whether you want to improve just the
> performance of the dbms, or the performance of the entire system, on which
> the dbms runs. Because if you want to improve the whole system, you would
> want as much of the caching to take place in the system's buffers so the
> use of the memory could be optimized over the entire workload, not just
> the load of the dbms itself. I suppose on a dedicated system with only one
> dbms running with only one database open (at a time, anyway), this might
> be moot, but not otherwise.

Yes, the OS is in a better position to arbitrate between multiple things. Of
course, we aren't talking about the highest-performance databases if we are
talking about mixed-use systems. Additionally, the OS can never really get
it right, with a DB or other apps. Any app can behave badly, grab too much
RAM, and access it regularly enough for it not to be 'idle' much, giving the
OS VM fits trying to figure out whether it is important relative to other
processes.

> Now I agree that it would be good to get the entire index (or at least the
> working set of the index) into the memory of the computer. But does it
> really matter if it is in the system's cache, or the postgres cache? Is it
> any more likely to be in postgres's cache than in the system cache if the
> system is hurting for memory? I would think the system would be equally
> likely to page out "idle" pages no matter where they are unless they are
> locked to memory, and I do not know if all operating systems can do this,
> and even if they can, I do not know if postgres uses that ability. I doubt
> it, since I believe (at least in Linux) a process can do that only if run
> as root, which I imagine few (if any) users do.

The problem is when none of them are really 'idle', when the OS has to
decide which pages, all of which have been accessed recently, to evict. Most
OSes will make bad choices if the load is mixed random and sequential
access, because they treat all pages equally with respect to freshness
versus eviction.

Another problem is that there IS a difference between being in postgres'
cache and the OS cache: one is more expensive to retrieve than the other.
Significantly. Accessing buffers in shared_buffers, in process, uses a good
chunk less CPU (and data copy and shared buffer eviction overhead) than
going over the syscall to the OS. And as far as I can tell, even after the
8.4 fadvise patch, all I/O is in block_size chunks (hopefully I am wrong).

My system is now CPU bound; the I/O can do sequential reads of more than
1.2 GB/sec, but Postgres can't do a seqscan 30% as fast because it eats up
CPU like crazy just reading and identifying tuples. It does seqscans ~25%
faster from shared_buffers than from the OS's page cache, though. Seqscans
are between 250 MB/sec and 400 MB/sec peak, from memory or disk, typically
showing no more than 35% iostat utilization of the array if off disk - so we
run a few concurrently where we can.

In addition to the fadvise patch, postgres needs to merge adjacent I/Os into
larger ones to reduce the overhead. It only really needs to merge up to
sizes of about 128k or 256k, to gain an 8x to 16x drop in syscall overhead,
and additionally potentially save code trips down the shared buffer
management code paths. At least, that's my guess; I haven't looked at any
code and could be wrong.

Additionally, regarding the "If your operating system has any reasonable
caching itself" comment earlier in this conversation - Linux (2.6.18,
CentOS 5.2) does NOT.
I can easily make it spend 100% CPU in system time trying to figure out what
to do with the system cache for an hour. Just do large seqscans with memory
pressure from work_mem or other forces that the OS will not deem 'idle'.
Once the requested memory is ~75% of the system total, it will freak out.
Linux simply will not give up that last 25% or so of the RAM for anything
but page cache, even though the disk subsystem is very fast and most of the
access is sequential, marginalizing the benefit of the cache. Depending on
how you tune it, it will either spin system CPU or swap-storm, but the
system CPU spin times for the same workload are a lot shorter than an
equivalent swap storm. Mount the data drive with O_DIRECT and the problem
vanishes. I've been told that this problem may be gone in some of the latest
kernels.

I have seriously considered bumping shared_buffers up a lot and mounting the
thing direct - but then we lose the useful scheduler and readahead
algorithms. The other way around (small shared_buffers, let the OS do it)
hurts performance.
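Scott's O_DIRECT observation can be approximated without remounting anything, since GNU dd exposes the flag directly. A sketch: a temp file stands in for the data drive, and the direct read is guarded because some filesystems (notably tmpfs) reject O_DIRECT.

```shell
# Compare a page-cache read against an O_DIRECT read that bypasses the
# cache, in 256k blocks (the merge size suggested above).
F=$(mktemp /var/tmp/direct_test.XXXXXX)
dd if=/dev/zero of="$F" bs=1M count=32 2>/dev/null

CACHED=fail
dd if="$F" of=/dev/null bs=256k 2>/dev/null && CACHED=ok   # via page cache

DIRECT=unsupported
dd if="$F" of=/dev/null bs=256k iflag=direct 2>/dev/null && DIRECT=ok

echo "cached read: $CACHED, direct read: $DIRECT"
rm -f "$F"
```

On a box exhibiting the cache-pressure pathology described above, running the direct variant against the data volume while watching vmstat's system-CPU column would show whether bypassing the page cache makes the spin disappear.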
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
On Tue, 2008-12-09 at 18:27 +0200, Peter Eisentraut wrote:
> Aidan Van Dyk wrote:
>> *boggle*
>>
>> Are people *still* using raid5?
>>
>> /me gives up!
>
> What do you suggest when there is not enough room for a RAID 10?

RAID 1.

Joshua D. Drake

--
PostgreSQL Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
On Tue, 2008-12-09 at 09:25 -0700, Scott Marlowe wrote:
> That's better than you were getting, but still quite slow. ...
>
> This sounds more and more like HP is trying to undercompete along with
> Dell in the RAID controller market, or at least the RAID controller
> driver market.

It is certainly possible. The 400 is the higher end of the lower end with
HP... 200, 400, 600, 800 (the 800 is a nice controller).

Joshua D. Drake

--
PostgreSQL Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, Dec 9, 2008 at 9:37 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
> As for tipping points and pgbench - it doesn't seem to reflect the kind of
> workload we use postgres for at all, though my workload does a lot of big
> hashes and seqscans, and I'm curious how much improved those may be due to
> the hash improvements. 32 GB RAM and 3 TB data (about 250 GB scanned
> regularly) here. And yes, we are almost completely CPU bound now except
> for a few tasks. Iostat only reports above 65% disk utilization for about
> 5% of the workload duty-cycle, and is regularly < 20%. COPY doesn't get
> anywhere near platter speeds on indexless bulk transfer. The highest disk
> usage spikes occur when some of our random-access data/indexes get shoved
> out of cache. These aren't too large, but high enough seqscan load will
> cause postgres and the OS to dump them from cache. If we put these on some
> SSDs the disk utilization % would drop a lot further.

It definitely reflects our usage pattern, which is very random and involves
tiny bits of data scattered throughout the database. Our current database is
about 20-25 GB, which means it's quickly reaching the point where it will
not fit in our 32 GB of RAM, and it's likely to grow too big for 64 GB
before a year or two is out.

> I feel confident in saying that in about a year, I could spec out a
> medium-sized budget for hardware ($25k) for almost any postgres setup and
> make it almost purely CPU bound. SSDs and hybrid tech such as ZFS L2ARC
> make this possible with easy access to 10k+ iops, and it will take no more
> than 12 SATA drives in RAID 10 next year (and a good controller or
> software RAID) to get 1 GB/sec sequential reads.

Lucky you, having needs that are fulfilled by sequential reads. :)

I wonder how many hard drives it would take to be CPU bound on random access
patterns? About 40 to 60? And probably 15k SAS drives to boot. Cause that's
what we're looking at in the next few years where I work.
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, 2008-12-09 at 10:21 -0700, Scott Marlowe wrote:
> Lucky you, having needs that are fulfilled by sequential reads. :)
>
> I wonder how many hard drives it would take to be CPU bound on random
> access patterns? About 40 to 60? And probably 15k / SAS drives to boot.
> Cause that's what we're looking at in the next few years where I work.

I was able to achieve only 10-20% I/O wait even after beating the heck out
of the machine with 50 spindles (of course it does have 16 CPUs):

http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

--
PostgreSQL Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Re: [PERFORM] Need help with 8.4 Performance Testing
> Lucky you, having needs that are fulfilled by sequential reads. :)
>
> I wonder how many hard drives it would take to be CPU bound on random
> access patterns? About 40 to 60? And probably 15k / SAS drives to boot.
> Cause that's what we're looking at in the next few years where I work.

About $3000 worth of Intel mainstream SSDs = 240 GB of space (6 in RAID 10)
today, and 2x to 3x that storage area in a year. Random reads are even
easier, provided you don't need more than 500 GB or so. And with something
like ZFS + L2ARC you can back your data with large, slow, low-iops disks and
have cache access to the data without requiring mirrors on the cache ($3k of
SSDs then covers 2x the area).
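The dollar and capacity figures above hang together if one assumes the 80 GB Intel X25-M at roughly $500 apiece; both the model capacity and the per-drive price are inferences from the thread, not stated outright:

```shell
# Six 80 GB drives in RAID 10 = three mirrored pairs of usable space.
DRIVES=6
DRIVE_GB=80          # assumed: Intel X25-M 80 GB
PRICE_USD=500        # assumed street price per drive
USABLE_GB=$(( DRIVES / 2 * DRIVE_GB ))
COST_USD=$(( DRIVES * PRICE_USD ))
echo "${USABLE_GB} GB usable for \$${COST_USD}"
```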
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, 9 Dec 2008, Scott Marlowe wrote:
> I wonder how many hard drives it would take to be CPU bound on random
> access patterns? About 40 to 60? And probably 15k / SAS drives to boot.
> Cause that's what we're looking at in the next few years where I work.

There's a problem with that thinking. That is, in order to exercise many
spindles, you will need to have just as many (if not more) concurrent
requests. And if you have many concurrent requests, then you can spread them
over multiple CPUs. So it's more a case of "how many hard drives PER CPU".
It also becomes a matter of whether Postgres can scale that well.

Matthew

--
Those who do not understand Unix are condemned to reinvent it, poorly.
-- Henry Spencer
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, Dec 9, 2008 at 10:35 AM, Matthew Wakeling <[EMAIL PROTECTED]> wrote: > On Tue, 9 Dec 2008, Scott Marlowe wrote: >> >> I wonder how many hard drives it would take to be CPU bound on random >> access patterns? About 40 to 60? And probably 15k / SAS drives to >> boot. Cause that's what we're looking at in the next few years where >> I work. > > There's a problem with that thinking. That is, in order to exercise many > spindles, you will need to have just as many (if not more) concurrent > requests. And if you have many concurrent requests, then you can spread them > over multiple CPUs. So it's more a case of "How many hard drives PER CPU". > It also becomes a matter of whether Postgres can scale that well. For us, all that is true. We typically have a dozen or more concurrent requests running at once. We'll likely see that increase linearly with our increase in users over the next year or so. We bought the machines with dual quad core opterons knowing the 6, 8 and 12 core opterons were due out on the same socket design in the next year or so and we could upgrade those too if needed. PostgreSQL seems to scale well in most tests I've seen to at least 16 cores, and after that it's anyone's guess. The Sparc Niagara seems capable of scaling to 32 threads on 8 cores with pgsql 8.2 quite well. I worry about the Linux kernel scaling that well, and we might have to look at OpenSolaris or something like the Solaris kernel under an Ubuntu distro to get better scaling.
Re: [PERFORM] Need help with 8.4 Performance Testing
Let me re-phrase this. For today, at 200GB or less of required space, and 500GB or less next year. "Where we're going, we don't NEED spindles." Seriously, go down to the store and get 6 X25-M's, they're as cheap as $550 each and will be sub $500 soon. These are more than sufficient for all but heavy write workloads (each can withstand ~600+ TB of writes in a lifetime, and SMART will tell you before they go). 6 in a RAID 10 will give you 750MB/sec read, and equivalent MB/sec in random reads. I've tested them. Random writes are very very fast too, faster than any SAS drive. Put this in your current system, and you won't need to upgrade the RAM unless you need items in cache to reduce CPU load or need it for the work_mem space. Spindles will soon be only for capacity and sequential access performance requirements. Solid state will be for IOPS, and I would argue that for most Postgres installations, already is (now that the Intel SSD drive, which does random writes and read/write concurrency well, has arrived - more such next gen drives are on the way). On 12/9/08 9:28 AM, "Scott Carey" <[EMAIL PROTECTED]> wrote: > Lucky you, having needs that are fulfilled by sequential reads. :) > I wonder how many hard drives it would take to be CPU bound on random > access patterns? About 40 to 60? And probably 15k / SAS drives to > boot. Cause that's what we're looking at in the next few years where > I work. About $3000 worth of Intel --- mainstream SSD's = 240GB space (6 in raid 10) today, 2x to 3x that storage area in 1 year. Random reads are even easier, provided you don't need more than 500GB or so. And with something like ZFS + L2ARC you can back your data with large slow iops disks and have cache access to data without requiring mirrors on the cache ($3k of ssds for that covers 2x the area, then). -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, Dec 9, 2008 at 11:01 AM, Scott Carey <[EMAIL PROTECTED]> wrote: > Let me re-phrase this. > > For today, at 200GB or less of required space, and 500GB or less next year. > > "Where we're going, we don't NEED spindles." Those Intel SSDs sound compelling. I've been waiting for SSDs to get competitive price- and performance-wise for a while, and when the Intels came out and I read the first benchmarks I immediately began scheming. Sadly, that was right after we'd ordered our new 16-drive servers, and I didn't have time to try something new and hope it would work. Now that the servers are up and running, we'll probably look at adding the SSDs next summer before our high load period begins.
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, 2008-12-09 at 11:08 -0700, Scott Marlowe wrote: > On Tue, Dec 9, 2008 at 11:01 AM, Scott Carey <[EMAIL PROTECTED]> wrote: > > Let me re-phrase this. > > > > For today, at 200GB or less of required space, and 500GB or less next year. > > > > "Where we're going, we don't NEED spindles." > > Those Intel SSDs sound compelling. I've been waiting for SSDs to get > competitive price- and performance-wise for a while, and when the > Intels came out and I read the first benchmarks I immediately began > scheming. Sadly, that was right after we'd ordered our new 16-drive > servers, and I didn't have time to try something new and hope it would > work. Now that the servers are up and running, we'll probably look at > adding the SSDs next summer before our high load period begins. > The idea of SSDs is interesting. However, I think I will wait for all the other early adopters to figure out the problems before I start suggesting them to clients. Hard drives work; they're cheap and fast. I can get 25 spindles, 15k, in a 3U with controller and battery-backed cache for <$10k. Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, Dec 9, 2008 at 1:11 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote: > Hard drives work; they're cheap and fast. I can get 25 spindles, 15k, in a > 3U with controller and battery-backed cache for <$10k. While I agree with your general sentiments about early adoption, etc. (the Intel SSD products are the first flash drives that appear to have real promise in the enterprise), the numbers tell a different story. A *single* X25-E will give similar sustained write IOPS as your tray for far less price and a much better worst-case read latency. All this without the 25 sets of whizzing ball bearings, painful spin-up times, fanning, RAID controller firmware, and various other sundry technologies to make the whole thing work. The main issue that I see with flash SSD is whether the promised wear lifetimes are believable in high-load environments and whether the mechanism of failure (slowly degrade into read-only) is accurate. So, at least in relative terms, 15k SAS drives are not 'fast'. They are terribly, awfully, painfully slow. They are also not cheap in terms of $/IOPS. The end is near. merlin
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, 2008-12-09 at 15:07 -0500, Merlin Moncure wrote: > On Tue, Dec 9, 2008 at 1:11 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote: > > Hard drives work; they're cheap and fast. I can get 25 spindles, 15k, in a > > 3U with controller and battery-backed cache for <$10k. > > While I agree with your general sentiments about early adoption, etc. > (the Intel SSD products are the first flash drives that appear to have > real promise in the enterprise), the numbers tell a different story. Oh, I have read about them and I am excited. I am just saying that there are plenty of people who can take advantage of the unknown without worrying about the pain it can cause. My client can't. > > The main issue that I see with flash SSD is whether the promised wear > lifetimes are believable in high-load environments and whether the mechanism > of failure (slowly degrade into read-only) is accurate. > Right. > So, at least in relative terms, 15k SAS drives are not 'fast'. They > are terribly, awfully, painfully slow. They are also not cheap in > terms of $/IOPS. The end is near. > No doubt about it. I give it 24 months tops. Joshua D. Drake > merlin > -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Re: [PERFORM] Need help with 8.4 Performance Testing
Which brings this back around to the point I care the most about: I/O per second will diminish as the most common database performance limiting factor in Postgres 8.4's lifetime, and become almost irrelevant in 8.5's. Becoming more CPU efficient will become very important, and for some, already is. The community needs to be proactive on this front. This turns a lot of old assumptions on their head, from the database down through the OS and filesystem. We're bound to run into many surprises due to this major shift in something that has had its performance characteristics taken for granted for decades. > On 12/9/08 12:20 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > > So, at least in relative terms, 15k sas drives are not 'fast'. They > > are terribly, awfully, painfully slow. They are also not cheap in > > terms of $/IOPS. The end is near. > > > No doubt about it. I give it 24 months tops. > > Joshua D. Drake
[PERFORM] query plan with index having a btrim is different for strings of different length
Hi,
I've discovered a peculiarity with using btrim in an index and was
wondering if anyone has any input.
My table is like this:
Table "public.m_object_paper"
       Column         |          Type          | Modifiers
----------------------+------------------------+-----------
 id                   | integer                | not null
 title                | character varying(200) | not null
 x_firstname          | character varying(50)  |
 x_lastname           | character varying(50)  |
 <...snip...>
 page_count           | smallint               |
 compare_to_database  | bit varying            | not null
Indexes:
"m_object_paper_pkey" PRIMARY KEY, btree (id)
"last_name_fnc_idx" btree (lower(btrim(x_lastname::text)))
"m_object_paper_assignment_idx" btree (assignment)
"m_object_paper_owner_idx" btree (owner) CLUSTER
<...snip to end...>
My query is like this:
SELECT m_object_paper.id FROM m_object_paper, m_assignment WHERE
m_object_paper.assignment = m_assignment.id AND
m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND
lower(btrim(x_firstname)) = lower(btrim($FIRSTNAME)) and
lower(btrim(x_lastname)) = lower(btrim($LASTNAME));
Strangely, if $LASTNAME is 5 chars, the query plan looks like this:
tii=# explain SELECT m_object_paper.id FROM m_object_paper,
m_assignment WHERE m_object_paper.assignment = m_assignment.id AND
m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND
lower(btrim(x_firstname)) = lower(btrim('Jordan')) and
lower(btrim(x_lastname)) = lower(btrim('Smith'));
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Hash Join  (cost=181704.85..291551.77 rows=1 width=4)
   Hash Cond: (m_object_paper.assignment = m_assignment.id)
   ->  Bitmap Heap Scan on m_object_paper  (cost=181524.86..291369.66 rows=562 width=8)
         Recheck Cond: ((lower(btrim((x_lastname)::text)) = 'smith'::text) AND (owner = (-1)))
         Filter: (lower(btrim((x_firstname)::text)) = 'jordan'::text)
         ->  BitmapAnd  (cost=181524.86..181524.86 rows=112429 width=0)
               ->  Bitmap Index Scan on last_name_fnc_idx  (cost=0.00..5468.29 rows=496740 width=0)
                     Index Cond: (lower(btrim((x_lastname)::text)) = 'smith'::text)
               ->  Bitmap Index Scan on m_object_paper_owner_idx  (cost=0.00..176056.04 rows=16061244 width=0)
                     Index Cond: (owner = (-1))
   ->  Hash  (cost=177.82..177.82 rows=174 width=4)
         ->  Index Scan using m_assignment_class_idx on m_assignment  (cost=0.00..177.82 rows=174 width=4)
               Index Cond: (class = 2450798)
(13 rows)
However, if $LASTNAME is != 5 chars (1 char, 100 chars, doesn't make a
difference), the query plan looks like this:
tii=# explain SELECT m_object_paper.id FROM m_object_paper,
m_assignment WHERE m_object_paper.assignment = m_assignment.id AND
m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND
lower(btrim(x_firstname)) = lower(btrim('Jordan')) and
lower(btrim(x_lastname)) = lower(btrim('Smithers'));
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..10141.06 rows=1 width=4)
   ->  Index Scan using last_name_fnc_idx on m_object_paper  (cost=0.00..10114.24 rows=11 width=8)
         Index Cond: (lower(btrim((x_lastname)::text)) = 'smithers'::text)
         Filter: ((owner = (-1)) AND (lower(btrim((x_firstname)::text)) = 'jordan'::text))
   ->  Index Scan using m_assignment_pkey on m_assignment  (cost=0.00..2.43 rows=1 width=4)
         Index Cond: (m_assignment.id = m_object_paper.assignment)
         Filter: (m_assignment.class = 2450798)
(7 rows)
In practice, the difference is 300+ seconds when $LASTNAME == 5 chars
and <1 second when $LASTNAME != 5 chars.
Would anyone know what's going on here? Is there something about the
way btrim works, or perhaps with the way indexes are created? It's
strange that the query plan would change for just one case ("Jones,"
"Smith," "Brown," etc., all cause the query plan to use that extra
heap scan).
Thanks for any help!
--Richard
Re: [PERFORM] Need help with 8.4 Performance Testing
Scott Carey <[EMAIL PROTECTED]> writes: > Which brings this back around to the point I care the most about: > I/O per second will diminish as the most common database performance limiting > factor in Postgres 8.4's lifetime, and become almost irrelevant in 8.5's. > Becoming more CPU efficient will become very important, and for some, already > is. The community needs to be proactive on this front. > This turns a lot of old assumptions on their head, from the database down > through the OS and filesystem. We're bound to run into many surprises due to > this major shift in something that has had its performance characteristics > taken for granted for decades. Hmm ... I wonder whether this means that the current work on parallelizing I/O (the posix_fadvise patch in particular) is a dead end. Because what that is basically going to do is expend more CPU to improve I/O efficiency. If you believe this thesis then that's not the road we want to go down. regards, tom lane -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
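[Editor's note] For readers following along: the posix_fadvise work being discussed boils down to hinting the kernel about blocks the executor will need soon, so several spindles can be busy before the blocking read() calls arrive. A minimal sketch of that hint, using Python's wrapper around the same syscall (the file and sizes here are arbitrary stand-ins, not from the patch; the hint is a no-op on platforms without posix_fadvise):

```python
import os
import tempfile

# Create a throwaway file to stand in for a table segment.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 65536)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # POSIX_FADV_WILLNEED tells the kernel we will read this byte range
    # soon; the kernel can start the I/O asynchronously, so the later
    # read() finds the pages already in cache instead of blocking on disk.
    if hasattr(os, "posix_fadvise"):  # not available on every platform
        os.posix_fadvise(fd, 0, 65536, os.POSIX_FADV_WILLNEED)
    data = os.read(fd, 65536)
finally:
    os.close(fd)
    os.unlink(path)
```

The CPU-vs-I/O trade Tom describes is visible here: the hint itself costs a syscall per range, and it only pays off if the read would otherwise have stalled on a seek.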
Re: [PERFORM] query plan with index having a btrim is different for strings of different length
On Tue, Dec 9, 2008 at 2:56 PM, Richard Yen <[EMAIL PROTECTED]> wrote:
> In practice, the difference is 300+ seconds when $LASTNAME == 5 chars and <1
> second when $LASTNAME != 5 chars.
>
> Would anyone know what's going on here? Is there something about the way
> btrim works, or perhaps with the way indexes are created? It's strange that
> the query plan would change for just one case ("Jones," "Smith," "Brown,"
> etc., all cause the query plan to use that extra heap scan).
Those are likely common names, and may be showing up in the table
stats as common values, causing the planner to change things around.
Does this hold even for non-existent 5-character lastname strings?
Speaking of table statistics, might be worth upping the statistics
target on that table/column, analyzing, and seeing if you get
different results.
--
- David T. Wilson
[EMAIL PROTECTED]
Re: [PERFORM] Need help with 8.4 Performance Testing
> Hmm ... I wonder whether this means that the current work on > parallelizing I/O (the posix_fadvise patch in particular) is a dead > end. Because what that is basically going to do is expend more CPU > to improve I/O efficiency. If you believe this thesis then that's > not the road we want to go down. I don't believe the thesis. The gap between disk speeds and memory speeds may narrow over time, but I doubt it's likely to disappear altogether any time soon, and certainly not for all users. Besides which, I believe the CPU overhead of that patch is pretty darn small when the feature is not enabled. ...Robert
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, 2008-12-09 at 17:38 -0500, Tom Lane wrote: > Scott Carey <[EMAIL PROTECTED]> writes: > > Which brings this back around to the point I care the most about: > > I/O per second will diminish as the most common database performance > > limiting factor in Postgres 8.4's lifetime, and become almost irrelevant in > > 8.5's. > > Becoming more CPU efficient will become very important, and for some, > > already is. The community needs to be proactive on this front. > > This turns a lot of old assumptions on their head, from the database down > > through the OS and filesystem. We're bound to run into many surprises due > > to this major shift in something that has had its performance > > characteristics taken for granted for decades. > > Hmm ... I wonder whether this means that the current work on > parallelizing I/O (the posix_fadvise patch in particular) is a dead > end. Because what that is basically going to do is expend more CPU > to improve I/O efficiency. If you believe this thesis then that's > not the road we want to go down. The per-CPU performance increase and the per-I/O-system performance increase are going to follow vastly different curves. Anything that reduces overall I/O is going to help (remember, you can never go too fast). The idea that I/O per second will somehow stop being the most common database performance factor is IMO a pipe dream. Even as good as SSDs are getting, they still have to go through the bus -- something CPUs are better at avoiding (especially those CPUs that connect to memory directly, without the bus). In 5 years, maybe; in the next two PostgreSQL releases, not likely. Not to mention all of this is built around a different class of hardware than 99% of our community will be running. Sincerely, Joshua D. Drake > > regards, tom lane > -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Re: [PERFORM] Need help with 8.4 Performance Testing
On Tue, 9 Dec 2008, Robert Haas wrote: I don't believe the thesis. The gap between disk speeds and memory speeds may narrow over time, but I doubt it's likely to disappear altogether any time soon, and certainly not for all users. I think the "not for all users" is the critical part. In 2 years, we may (or may not) start using SSD instead of traditional drives for new installs, but we certainly won't be throwing out our existing servers any time soon just because something (much) better is now available. -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Need help with 8.4 Performance Testing
Tom Lane wrote: Scott Carey <[EMAIL PROTECTED]> writes: Which brings this back around to the point I care the most about: I/O per second will diminish as the most common database performance limiting factor in Postgres 8.4's lifetime, and become almost irrelevant in 8.5's. Becoming more CPU efficient will become very important, and for some, already is. The community needs to be proactive on this front. This turns a lot of old assumptions on their head, from the database down through the OS and filesystem. We're bound to run into many surprises due to this major shift in something that has had its performance characteristics taken for granted for decades. Hmm ... I wonder whether this means that the current work on parallelizing I/O (the posix_fadvise patch in particular) is a dead end. Because what that is basically going to do is expend more CPU to improve I/O efficiency. If you believe this thesis then that's not the road we want to go down. regards, tom lane What does the CPU/memory/bus performance road map look like? Is the I/O performance of storage devices, whatever they may be, going to keep pace with the above and cause this problem? Once I/O performance numbers start jumping up, I think DBAs will be tempted to leave more and more data in the production database instead of moving it out, or to start consolidating databases onto fewer servers -- again pushing more load onto the I/O.
Re: [PERFORM] query plan with index having a btrim is different for strings of different length
Richard Yen <[EMAIL PROTECTED]> writes: > I've discovered a peculiarity with using btrim in an index and was > wondering if anyone has any input. What PG version is this? In particular, I'm wondering if it's one of the early 8.2.x releases, which had some bugs in and around choose_bitmap_and() that caused them to sometimes make weird choices of indexes in a BitmapAnd plan structure ... Like David, I'm pretty dubious that the behavior has anything to do with strings being 5 characters long exactly; but it could very much depend on whether the string you choose is a common last name or not. That would change the estimated number of matching rows and hence affect the apparent relative attractiveness of different indexes. regards, tom lane -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
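[Editor's note] Tom's point about row estimates can be sketched with toy numbers (the frequencies below are invented; only the shape of the logic matters): a surname found in the column's most-common-values (MCV) list gets a large, measured estimate, while an unseen value falls back to a tiny default. That is exactly the difference between the rows=496740 estimate for 'smith' and the rows=11 estimate for 'smithers' in Richard's two plans, and it is what flips the planner between the BitmapAnd and the plain index scan.

```python
# Hypothetical statistics, loosely modeled on the thread's numbers:
# ~16M rows, with common surnames appearing in the MCV list.
TOTAL_ROWS = 16_000_000
MCV_FREQS = {"smith": 0.031, "jones": 0.028, "brown": 0.022}

def estimated_rows(value: str, fallback_freq: float = 7e-7) -> int:
    """Row estimate: measured frequency for MCVs, tiny fallback otherwise."""
    return max(1, int(TOTAL_ROWS * MCV_FREQS.get(value.lower(), fallback_freq)))

print(estimated_rows("Smith"))     # 496000 -> BitmapAnd with the owner index looks worthwhile
print(estimated_rows("Smithers"))  # 11     -> plain index scan wins easily
```

On this toy model, the "5 characters" pattern is coincidence: Jones, Smith, and Brown are simply all common enough to be MCVs, which is why raising the statistics target (as David suggests) and re-analyzing is the right experiment.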
Re: [PERFORM] Need help with 8.4 Performance Testing
Prefetch CPU cost should be rather low in the grand scheme of things, and does help performance even for very fast I/O. I would not expect a very large CPU use increase from that sort of patch in the grand scheme of things - there is a lot that is more expensive to do on a per block basis. There are two ways to look at non-I/O bound performance: * Aggregate performance across many concurrent activities - here you want the least CPU used possible per action, and the least collisions on locks or shared data structures. Using resources for as short of an interval as possible also helps a lot here. * Single query performance, where you want to shorten the query time, perhaps at the cost of more average CPU. Here, something like the fadvise stuff helps - as would any thread parallelism. Perhaps less efficient in aggregate, but more efficient for a single query. Overall CPU cost of accessing and reading data. If this comes from disk, the big gains will be along the whole chain: Driver to file system cache, file system cache to process, process specific tasks (cache eviction, placement, tracking), examining page tuples, locating tuples within pages, etc. Anything that currently occurs on a per-block basis that could be done in a larger batch or set of blocks may be a big gain. Another place that commonly consumes CPU in larger software projects is memory allocation if more advanced allocation techniques are not used. I have no idea what Postgres uses here however. I do know that commercial databases have extensive work in this area for performance, as well as reliability (harder to cause a leak, or easier to detect) and ease of use (don't have to even bother to free in certain contexts). > On 12/9/08 2:58 PM, "Robert Haas" <[EMAIL PROTECTED]> wrote: > I don't believe the thesis. The gap between disk speeds and memory > speeds may narrow over time, but I doubt it's likely to disappear > altogether any time soon, and certainly not for all users. 
Well, when select count(1) reads pages slower than my disk, it's 16x+ slower than my RAM. Until one can demonstrate that the system can even read pages in RAM faster than what disks will do next year, it doesn't matter much that RAM is faster. It does matter that RAM is faster for sorts, hashes, and other operations, but at the current time it does not for the raw pages themselves, from what I can measure. This is, in fact, central to my point. Things will be CPU bound, not I/O bound. It is mentioned that we still have to access things over the bus, and memory is faster, etc. But Postgres is too CPU bound on page access to take advantage of the fact that memory is faster (for reading data pages). The biggest change is not just that disks are getting closer to RAM, but that the random I/O penalty is diminishing significantly. Low latencies make seek-driven queries that used to consume mostly disk time consume CPU time instead. High CPU costs for accessing pages make a fast disk surprisingly close to RAM speed. > Besides which, I believe the CPU overhead of that patch is pretty darn > small when the feature is not enabled. > ...Robert I doubt it is much CPU, on or off. It will help with SSDs when optimizing a single query; it may not help much if a system has enough 'natural' parallelism from other concurrent queries. However, there is a clear CPU benefit to getting individual queries out of the way faster, and occupying precious work_mem or other resources for a shorter time. Occupying resources for a shorter period always translates to some CPU savings on a machine running at its limit with high concurrency.
Re: [PERFORM] Need help with 8.4 Performance Testing
Scott Carey <[EMAIL PROTECTED]> writes: > And as far as I can tell, even after the 8.4 fadvise patch, all I/O is in > block_size chunks. (hopefully I am wrong) >... > In addition to the fadvise patch, postgres needs to merge adjacent I/O's > into larger ones to reduce the overhead. It only really needs to merge up to > sizes of about 128k or 256k, and gain an 8x to 16x drop in syscall overhead, > and additionally potentially save code trips down the shared buffer > management code paths. At least, that's my guess; I haven't looked at any code > and could be wrong. There are a lot of assumptions here that I would be interested in seeing experiments to back up. FWIW when I was doing testing of posix_fadvise I did a *lot* of experiments though only with a couple systems. One had a 3-drive array and one a 15-drive array, both running Linux. I sometimes could speed up the sequential scan by about 10% but not consistently. It was never more than about 15% shy of the highest throughput from dd. And incidentally the throughput from dd didn't seem to depend much at all on the blocksize. On your system does "dd bs=8k" and "dd bs=128k" really have an 8x performance difference? In short, at least from the evidence available, this all seems like it might be holdover beliefs from the olden days of sysadmining where syscalls were much slower and OS filesystem caches much dumber. I'm still interested in looking into it but I'll have to see actual vmstat or iostat output while it's happening, preferably some oprofile results too. And how many drives do you actually need to get into this situation? Also, what is the output of "vacuum verbose" on the table? > Additionally, the "If your operating system has any reasonable caching > itself" comment earlier in this conversation --- Linux (2.6.18, Centos 5.2) > does NOT. I can easily make it spend 100% CPU in system time trying to > figure out what to do with the system cache for an hour.
> Just do large seqscans with memory pressure from work_mem or other forces that the OS will > not deem 'idle'. Once the requested memory is ~75% of the system total, it > will freak out. Linux simply will not give up that last 25% or so of the RAM > for anything but page cache. This seems like just a misconfigured system. Linux and most Unixen definitely expect to have a substantial portion of RAM dedicated to disk cache. Keep in mind all your executable pages count towards this page cache too. You can adjust this to some extent with the "swappiness" variable in Linux -- but I doubt you'll be happy with the results regardless. > The other way around (small shared_buffers, let the OS do it) hurts > performance overall quite a bit -- randomly accessed pages get pushed out to > the OS cache more often, and the OS tosses those out when a big seqscan > occurs, resulting in a lot more random access from disk and more disk bound > periods of time. Great wonder, this operating system caching, eh? How do you observe this? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
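[Editor's note] Greg's dd question is easy to try in miniature. A rough sketch that times the same sequential read at 8k and 128k block sizes (the scratch file here is tiny and will be served from cache; a real comparison needs a file much larger than RAM so the page cache cannot hide the per-syscall overhead):

```python
import os
import tempfile
import time

def read_file(path: str, bs: int) -> int:
    """Sequentially read the whole file in bs-sized chunks; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:   # unbuffered: one syscall per read
        while chunk := f.read(bs):
            total += len(chunk)
    return total

# A 1 MiB scratch file standing in for a table segment.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(1 << 20))
    path = f.name

for bs in (8 * 1024, 128 * 1024):
    t0 = time.perf_counter()
    n = read_file(path, bs)
    print(f"bs={bs // 1024}k: {n} bytes in {time.perf_counter() - t0:.5f}s")

os.unlink(path)
```

The 128k run issues 16x fewer syscalls for the same data; whether that shows up as wall-clock time is exactly the question Greg is asking, since the disk, not the syscall, usually dominates.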
Re: [PERFORM] Need help with 8.4 Performance Testing
Matthew Wakeling <[EMAIL PROTECTED]> writes: > On Tue, 9 Dec 2008, Scott Marlowe wrote: >> I wonder how many hard drives it would take to be CPU bound on random >> access patterns? About 40 to 60? And probably 15k / SAS drives to >> boot. Cause that's what we're looking at in the next few years where >> I work. > > There's a problem with that thinking. That is, in order to exercise many > spindles, you will need to have just as many (if not more) concurrent > requests. > And if you have many concurrent requests, then you can spread them over > multiple CPUs. So it's more a case of "How many hard drives PER CPU". It also > becomes a matter of whether Postgres can scale that well. Well: $ units 2445 units, 71 prefixes, 33 nonlinear units You have: 8192 byte/5ms You want: MB/s * 1.6384 / 0.61035156 At 1.6MB/s per drive, if Postgres is cpu-bound doing sequential scans at 1GB/s, you'll need about 640 drives to keep one cpu satisfied doing random I/O -- assuming you have perfect read-ahead and the read-ahead itself doesn't add cpu overhead. Both of which are false of course, but at least in theory that's what it'll take. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
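[Editor's note] Greg's back-of-the-envelope units(1) figure reproduces directly; the 5 ms random access per 8192-byte block is his assumption, and using 1 GB = 1000 MB lands at ~610 drives, the same ballpark as his "about 640":

```python
# One 8192-byte block per 5 ms of random access = per-drive random throughput.
block_bytes = 8192
access_seconds = 0.005
per_drive_mb_s = block_bytes / access_seconds / 1e6  # 1.6384 MB/s, as units(1) says

# Drives needed for random I/O to feed a CPU that sequential scans
# saturate at 1 GB/s (perfect read-ahead, no extra CPU overhead assumed).
drives = 1000 / per_drive_mb_s
print(f"{per_drive_mb_s:.4f} MB/s per drive -> ~{round(drives)} drives")
# -> 1.6384 MB/s per drive -> ~610 drives
```

As Greg notes, both assumptions flatter the drives, so the real number would be higher still.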
Re: [PERFORM] Need help with 8.4 Performance Testing
Tom Lane wrote: Scott Carey <[EMAIL PROTECTED]> writes: Which brings this back around to the point I care the most about: I/O per second will diminish as the most common database performance limiting factor in Postgres 8.4's lifetime, and become almost irrelevant in 8.5's. Becoming more CPU efficient will become very important, and for some, already is. The community needs to be proactive on this front. This turns a lot of old assumptions on their head, from the database down through the OS and filesystem. We're bound to run into many surprises due to this major shift in something that has had its performance characteristics taken for granted for decades. Hmm ... I wonder whether this means that the current work on parallelizing I/O (the posix_fadvise patch in particular) is a dead end. Because what that is basically going to do is expend more CPU to improve I/O efficiency. If you believe this thesis then that's not the road we want to go down. I imagine the larger postgres installations will still benefit from this patch - because I imagine they will stay on hard disks for quite some time; simply because the cost of 70TB of disks seems like it'll be lower than RAM for at least the intermediate term. I imagine the smaller postgres installations will also still benefit from this patch - because my postgres installations with the most painful I/O bottlenecks are small virtual machines without dedicated drives where I/O (I guess emulated by the virtual machine software) is very painful. Perhaps there's a mid-sized system that won't benefit from fadvise() in the intermediate term -- where the size of the database is about the same size as a cost-effective flash drive -- but I don't have any databases in that range now. -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Need help with 8.4 Performance Testing
justin wrote:
> Tom Lane wrote:
>> Hmm ... I wonder whether this means that the current work on
>> parallelizing I/O (the posix_fadvise patch in particular) is a dead
>> end. Because what that is basically going to do is expend more CPU to
>> improve I/O efficiency. If you believe this thesis then that's not the
>> road we want to go down.
>>
>> regards, tom lane
>
> What does the CPU/memory/bus performance roadmap look like? Is the I/O
> performance of storage devices, whatever they turn out to be, going to
> be on par with the above and cause this problem?

Flash memory will become just a fourth layer in the memory caching system (on-chip CPU cache, high-speed secondary cache, main memory, and persistent memory). The idea of external storage will probably disappear altogether -- computers will just have memory, and won't forget anything when you turn them off. Since most computers are 64 bits these days, all data and programs will just hang out in memory at all times, and be directly addressable by the CPU.

The distinction between disk and memory arose from the fact that disks were large, slow devices relative to "core" memory and had to be connected by long wires, hence the need for I/O subsystems. As flash memory becomes mainstream, I expect this distinction to disappear.

Craig
Re: [PERFORM] Need help with 8.4 Performance Testing
> Well, when select count(1) reads pages slower than my disk, it's 16x+
> slower than my RAM. Until one can demonstrate that the system can even
> read pages in RAM faster than what disks will do next year, it doesn't
> matter much that RAM is faster. It does matter that RAM is faster for
> sorts, hashes, and other operations, but at the current time it does
> not for the raw pages themselves, from what I can measure.
>
> This is, in fact, central to my point. Things will be CPU bound, not
> I/O bound. It is mentioned that we still have to access things over
> the bus, and memory is faster, etc. But Postgres is too CPU bound on
> page access to take advantage of the fact that memory is faster (for
> reading data pages).

As I understand it, a big part of the reason for the posix_fadvise patch is that the current system doesn't do a good job leveraging many spindles in the service of a single query. So the problem is not that the CPU overhead is too large in some general sense, but that the disk and CPU operations get serialized, leading to an overall loss of performance.

On the other hand, there are certainly cases (such as a database which is entirely in RAM, or where all the commonly used parts are in RAM) where there really isn't very much I/O, and in those cases of course the CPU cost will dominate.

...Robert
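Robert's serialization point can be made with a toy timing model; a sketch where the per-page costs are invented numbers, not measurements:

```python
# If each page costs c seconds of CPU and d seconds of disk wait, a scan
# that strictly alternates the two pays c + d per page, while one that
# overlaps them (read-ahead / posix_fadvise) pays only max(c, d).

def scan_time(pages, cpu_per_page, disk_per_page, overlapped):
    if overlapped:
        return pages * max(cpu_per_page, disk_per_page)
    return pages * (cpu_per_page + disk_per_page)

pages = 100_000
cpu, disk = 5e-6, 8e-6   # hypothetical per-page costs, in seconds

serial = scan_time(pages, cpu, disk, overlapped=False)
overlap = scan_time(pages, cpu, disk, overlapped=True)
print(f"serialized: {serial:.2f}s  overlapped: {overlap:.2f}s")
# serialized: 1.30s  overlapped: 0.80s
```

The gain disappears when one cost dwarfs the other, which is why a fully cached database (disk ≈ 0) sees no benefit from the overlap.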
[PERFORM] Degenerate Performance Problem
I have postgresql 8.3.5 installed on MacOS X / Darwin. I remember setting shared memory buffer parameters and that solved the initial performance problem, but after running several tests, the performance goes way, way down. Restarting the server doesn't seem to help. I'm using pqxx to access the database, if that makes any difference. -- Vincent
Re: [PERFORM] Degenerate Performance Problem
On Tue, Dec 9, 2008 at 8:16 PM, Vincent Predoehl <[EMAIL PROTECTED]> wrote:
> I have postgresql 8.3.5 installed on MacOS X / Darwin. I remember
> setting shared memory buffer parameters and that solved the initial
> performance problem, but after running several tests, the performance
> goes way, way down. Restarting the server doesn't seem to help.
> I'm using pqxx to access the database, if that makes any difference.

Could be a vacuuming issue. What does vacuum verbose; on the database say?
Re: [PERFORM] Need help with 8.4 Performance Testing
Just to clarify, I'm not talking about the random-I/O-bound loads of today, on hard drives, targeted by the fadvise stuff -- these aren't CPU bound, and they will be helped by it.

For sequential scans, the situation is different, since the OS has sufficient read-ahead prefetching algorithms of its own for sequential reads, and the CPU work and I/O work end up happening in parallel due to that.

For what it is worth, you can roughly double to triple the iops of an Intel X-25M on pure random reads if you queue up multiple concurrent reads rather than serialize them. But that is not due to spindles; it is due to the latency of the SATA interface and the ability of the controller chip to issue reads to flash devices on different banks concurrently, to some extent.
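Scott's queue-depth observation is easy to reproduce in miniature: issue random reads from several threads so the device (or the OS) can service them concurrently. A hedged sketch against a scratch file -- against a real X-25M you would point it at the device node and use O_DIRECT to defeat the page cache:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 8192
BLOCKS = 256

# Scratch file standing in for the drive.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(BLOCK * BLOCKS))
    path = f.name

fd = os.open(path, os.O_RDONLY)

def read_block(i):
    # os.pread takes an explicit offset, so threads don't race on a
    # shared file position.
    return os.pread(fd, BLOCK, i * BLOCK)

# Queue depth 8: up to eight random reads in flight at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(read_block, range(BLOCKS)))

os.close(fd)
os.unlink(path)
print(f"read {len(results)} blocks at queue depth 8")
```

On rotating media the thread count should instead track the spindle count; on flash it tracks how many banks the controller can address in parallel, which is Scott's point.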
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
Scott Marlowe schrieb:
> On Tue, Dec 9, 2008 at 5:17 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
>> Alan Hodgson schrieb:
>>> Mario Weilguni <[EMAIL PROTECTED]> wrote:
>>>> strange values. An individual drive is capable of delivering 91
>>>> MB/sec sequential read performance, and we get values ~102MB/sec out
>>>> of a 8-drive RAID5, seems to be ridiculous slow.
>>>
>>> What command are you using to test the reads?
>>>
>>> Some recommendations to try:
>>>
>>> 1) /sbin/blockdev --setra 2048 device (where device is the partition
>>> or LVM volume)
>>>
>>> 2) Use XFS, and make sure your stripe settings match the RAID.
>>>
>>> Having said that, 102MB/sec sounds really low for any modern
>>> controller with 8 drives, regardless of tuning or filesystem choice.
>>
>> First, thanks a lot for this and all the other answers.
>>
>> I measured the raw device performance:
>> dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null
>>
>> I get poor performance when all 8 drives are configured as one, large
>> RAID-5, and slightly poorer performance when configured as JBOD. In
>> production, we use XFS as FS, but I doubt this has anything to do with
>> FS tuning.
>
> Yeah, having just trawled the pgsql-performance archives, there are
> plenty of instances of people having terrible performance from HP
> smart array controllers before the P800. Is it possible for you to
> trade up to a better RAID controller? Whichever salesman sold you the
> P400 should take one for the team and make this right for you.

A customer of ours uses the P400 on a different machine, 8 SAS drives (RAID 5 as well), and the performance is very, very good. So we thought it was a good choice. Maybe the SATA drives are the root of this problem?
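One caveat on the quoted dd command: at bs=64k, count=10 reads only 640 KB, which mostly measures startup cost and read-ahead rather than sustained throughput. A sketch of a longer sequential read test (the demo below runs on a scratch file, so it measures cache speed; point it at the real device, e.g. /dev/cciss/c0d0 as root, with a few hundred MB to measure the array):

```python
import os
import tempfile
import time

def read_throughput(path, block=64 * 1024, total=256 * 1024 * 1024):
    """Sequentially read `total` bytes in `block`-sized chunks, return MB/s."""
    fd = os.open(path, os.O_RDONLY)
    nread = 0
    start = time.monotonic()
    try:
        while nread < total:
            chunk = os.read(fd, block)
            if not chunk:          # hit end of file/device
                break
            nread += len(chunk)
    finally:
        os.close(fd)
    return nread / (time.monotonic() - start) / 1e6

# Demo on a scratch file; substitute a real device path and a much larger
# `total` to benchmark the controller itself.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * (8 << 20))
    demo = f.name

mb_per_s = read_throughput(demo, total=8 << 20)
os.unlink(demo)
print(f"{mb_per_s:.1f} MB/s")
```

Together with the /sbin/blockdev --setra suggestion above, this makes it easy to check whether read-ahead settings move the sequential number at all.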
Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?
It could be the drives, or it could be a particular interaction between them and the drivers or firmware. Do you know if NCQ is activated for them? Can you compare a single-drive JBOD through the array against the same drive through something else, perhaps the motherboard's SATA port?

You may also have better luck with software RAID 0 on top of two 4-disk RAID 5s or RAID 10s -- but not if a single-disk JBOD still performs well under par. You may need new drivers for the card, or firmware for the drive and/or card. Or the card may simply be incompatible with those drives; I've seen several hard drive / RAID card incompatibilities before.
