Re: [PERFORM] Running on an NFS Mounted Directory
We have gotten very good performance from NetApp and PostgreSQL 7.4.11. I was able to push about 100MB/s over gigE, but that was limited by our NetApp. DAS will generally always be faster, but if, for example, you have 2 disks vs. 100 NFS-mounted disks, NFS will be faster. NFS is very reliable and I would stay away from iSCSI.

Regards,
Dan Gorman

On Apr 26, 2006, at 7:35 PM, Steve Wampler wrote:

> On Wed, Apr 26, 2006 at 10:06:58PM -0400, Ketema Harris wrote:
> > I was wondering if there were any performance issues with having a data
> > directory that was an nfs mounted drive? Say like a SAN or NAS device?
> > Has anyone done this before?
>
> My understanding is that NFS is pretty poor in performance in general,
> so I would expect it to be particularly bad for a DB. You might run
> some (non-DB) performance tests to get a feel for how bad it might be.
> (Someone once told me that NFS topped out at around 12MB/s, but I don't
> know if that's really true [they were trying to sell a competing
> networked filesystem].)
>
> In any event, you're at least limited by ethernet speeds, if not more.
>
> --
> Steve Wampler -- [EMAIL PROTECTED]
> The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
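A quick back-of-the-envelope check (mine, not from the thread) shows why roughly 100MB/s is about all a single gigE link can deliver, whatever the filer behind it can do. The overhead fraction below is an assumed figure, not a measurement:

```python
def gige_ceiling_mb_s(efficiency=0.92):
    """Approximate payload ceiling of 1 Gb/s Ethernet in MB/s.

    1 Gb/s is 125 MB/s on the wire; `efficiency` is an assumed fraction
    remaining after Ethernet/IP/TCP/NFS framing overhead (a guess).
    """
    raw_mb_s = 1_000_000_000 / 8 / 1_000_000  # 125.0 MB/s raw
    return raw_mb_s * efficiency

print(round(gige_ceiling_mb_s(), 1))      # roughly 115 MB/s
print(round(gige_ceiling_mb_s(0.80), 1))  # roughly 100 MB/s with heavier overhead
```

So the reported 100MB/s sits close to the practical wire ceiling; pushing past it would need more or faster links, not just a faster filer.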
Re: [PERFORM] Running on an NFS Mounted Directory
On Wed, Apr 26, 2006 at 07:35:42PM -0700, Steve Wampler wrote:
> On Wed, Apr 26, 2006 at 10:06:58PM -0400, Ketema Harris wrote:
> > I was wondering if there were any performance issues with having a data
> > directory that was an nfs mounted drive? Say like a SAN or NAS device?
> > Has anyone done this before?
>
> My understanding is that NFS is pretty poor in performance in general,
> so I would expect it to be particularly bad for a DB. You might run
> some (non-DB) performance tests to get a feel for how bad it might be.
> (Someone once told me that NFS topped out at around 12MB/s, but I don't
> know if that's really true [they were trying to sell a competing
> networked filesystem].)
>
> In any event, you're at least limited by ethernet speeds, if not more.

More importantly, the latency involved will kill commit performance. If it doesn't, then it's likely that fsync isn't being obeyed, which means zero data integrity.

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
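Jim's latency point can be made concrete with a little arithmetic: a synchronous commit cannot complete faster than one fsync round trip to the storage. A minimal sketch, using purely hypothetical latency numbers (real NFS fsync latency depends entirely on the server and network):

```python
def max_commits_per_sec(fsync_latency_ms):
    """Upper bound on commit rate for a single backend that must
    fsync the WAL once per commit. The latency values passed in
    below are illustrative examples, not measurements."""
    return 1000.0 / fsync_latency_ms

# Local disk at ~8 ms per fsync vs. an assumed extra network round
# trip pushing an NFS fsync to ~12 ms:
print(max_commits_per_sec(8))   # 125.0 commits/s
print(max_commits_per_sec(12))  # ~83 commits/s
```

Note this bounds only synchronous single-stream commit rate; group commit and concurrent backends change the picture, but the per-commit floor set by latency remains.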
Re: [PERFORM] Running on an NFS Mounted Directory
On Wed, Apr 26, 2006 at 10:06:58PM -0400, Ketema Harris wrote:
> I was wondering if there were any performance issues with having a data
> directory that was an nfs mounted drive? Say like a SAN or NAS device?
> Has anyone done this before?

My understanding is that NFS is pretty poor in performance in general, so I would expect it to be particularly bad for a DB. You might run some (non-DB) performance tests to get a feel for how bad it might be. (Someone once told me that NFS topped out at around 12MB/s, but I don't know if that's really true [they were trying to sell a competing networked filesystem].)

In any event, you're at least limited by ethernet speeds, if not more.

--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Wed, Apr 26, 2006 at 05:37:31PM -0500, Jim C. Nasby wrote:
> On Wed, Apr 26, 2006 at 06:16:46PM -0400, Bruce Momjian wrote:
> > AMD transfers the dirty cache line directly from cpu to cpu. I can
> > imagine that helping our test-and-set shared memory usage quite a bit.
>
> Wasn't the whole point of test-and-set that it's the recommended way to
> do lightweight spinlocks according to AMD/Intel? You'd think they'd have
> a way to make that performant on multiple CPUs (though if it's relying
> on possibly modifying an underlying data page I can't really think of
> how to do that without snaking through the cache...)

It's expensive no matter what. One method might be less expensive than another. :-)

AMD definitely seems to have things right for lowest absolute latency. 2X still sounds like an extreme case - but until I've actually tried a very large, or thread-intensive, PostgreSQL db on both, I probably shouldn't doubt the work of others too much. :-)

Cheers,
mark

--
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...
http://mark.mielke.cc/
Re: [PERFORM] Slow deletes in 8.1 when FKs are involved
Stef: There is already a post explaining the solution. All the proper indexes were there, and it works great on 7.4. The problem lies with leftover 7.4 RI triggers being carried over to an 8.1 database. The solution is to drop the triggers and add the constraint. Hopefully this will not cause as many locking issues with FKs on 8.1 as it did in 7.4 (which is why one of the RI triggers was removed in the first place).

Will Reese -- http://blog.rezra.com

On Apr 26, 2006, at 6:43 PM, Stef T wrote:

> Hey there Will,
> I would assume that, perhaps, just perhaps, the FK doesn't have an
> index on the field on both sides, so you're seeing a potential
> sequential scan happening. Can you fling up an explain analyze for
> everyone please? Anything more will be merely shooting in the dark,
> and, tracer bullets aside, I have heard that -that- can be dangerous ;p
>
> Regards
> Stef
>
> Will Reese wrote:
> > I'm preparing for an upgrade from PostgreSQL 7.4.5 to 8.1.3, and I
> > noticed a potential performance issue.
> >
> > I have two servers: a dual-proc Dell with RAID 5 running PostgreSQL
> > 7.4, and a quad-proc Dell with a storage array running PostgreSQL 8.1.
> > Both servers have identical postgresql.conf settings and were restored
> > from the same 7.4 backup. Almost everything is faster on the 8.1
> > server (mostly due to hardware), except one thing: deletes from tables
> > with foreign keys.
> >
> > I have table A with around 100,000 rows that has foreign keys to
> > around 50 other tables. Some of these other tables (table B, for
> > example) have around 10 million rows.
> >
> > On the 7.4 server, I can delete a single row from table A in well
> > under a second (as expected). On the 8.1 server, it takes over a
> > minute to delete. I tried all the usual stuff: recreating indexes,
> > vacuum analyze, explain analyze. Everything is identical between the
> > systems. If I hit ctrl-c while the delete was running on 8.1, I
> > repeatedly got the following message...
> >
> > db=# delete from "A" where "ID" in ('6');
> > Cancel request sent
> > ERROR: canceling statement due to user request
> > CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."B" x WHERE
> > "A_ID" = $1 FOR SHARE OF x"
> >
> > It looks to me like the "SELECT ... FOR SHARE" functionality in 8.1
> > is the culprit. Has anyone else run into this issue?
> >
> > Will Reese -- http://blog.rezra.com
[PERFORM] Running on an NFS Mounted Directory
I was wondering if there were any performance issues with having a data directory that was an nfs mounted drive? Say like a SAN or NAS device? Has anyone done this before?
Re: [Bizgres-general] [PERFORM] Introducing a new linux readahead framework
On Wed, Apr 26, 2006 at 04:33:40PM -0700, Luke Lonergan wrote:
> I'm thinking about it; we're already using a fixed read-ahead of 16MB
> using blockdev on the stock Redhat 2.6.9 kernel. It would be nice to
> not have to set this, so we may try it.

FWIW, I never saw much performance difference from doing that. Wu's patch, OTOH, gave a big boost.

Mike Stone
Re: [PERFORM] Slow deletes in 8.1 when FKs are involved
Hey there Will,
I would assume that, perhaps, just perhaps, the FK doesn't have an index on the field on both sides, so you're seeing a potential sequential scan happening. Can you fling up an explain analyze for everyone please? Anything more will be merely shooting in the dark, and, tracer bullets aside, I have heard that -that- can be dangerous ;p

Regards
Stef

Will Reese wrote:
> I'm preparing for an upgrade from PostgreSQL 7.4.5 to 8.1.3, and I
> noticed a potential performance issue.
>
> I have two servers: a dual-proc Dell with RAID 5 running PostgreSQL
> 7.4, and a quad-proc Dell with a storage array running PostgreSQL 8.1.
> Both servers have identical postgresql.conf settings and were restored
> from the same 7.4 backup. Almost everything is faster on the 8.1
> server (mostly due to hardware), except one thing: deletes from tables
> with foreign keys.
>
> I have table A with around 100,000 rows that has foreign keys to
> around 50 other tables. Some of these other tables (table B, for
> example) have around 10 million rows.
>
> On the 7.4 server, I can delete a single row from table A in well
> under a second (as expected). On the 8.1 server, it takes over a
> minute to delete. I tried all the usual stuff: recreating indexes,
> vacuum analyze, explain analyze. Everything is identical between the
> systems. If I hit ctrl-c while the delete was running on 8.1, I
> repeatedly got the following message...
>
> db=# delete from "A" where "ID" in ('6');
> Cancel request sent
> ERROR: canceling statement due to user request
> CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."B" x WHERE
> "A_ID" = $1 FOR SHARE OF x"
>
> It looks to me like the "SELECT ... FOR SHARE" functionality in 8.1 is
> the culprit. Has anyone else run into this issue?
>
> Will Reese -- http://blog.rezra.com
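Stef's suggestion amounts to verifying that every referencing (child-side) FK column has an index, since each parent-row delete triggers a lookup against every child table. A toy sketch of that check, with invented table and column names echoing Will's A/B example (in Will's actual case the indexes turned out to be fine):

```python
def missing_fk_indexes(foreign_keys, indexed_columns):
    """Return child-side FK columns that lack a supporting index.

    foreign_keys:    list of (child_table, child_column) pairs.
    indexed_columns: set of (table, leading_index_column) pairs.
    Names are hypothetical; a real check would query the catalogs.
    """
    return [(t, c) for (t, c) in foreign_keys if (t, c) not in indexed_columns]

fks = [("B", "A_ID"), ("C", "A_ID")]   # two children referencing "A"
idx = {("B", "A_ID")}                   # "C"."A_ID" has no index
print(missing_fk_indexes(fks, idx))     # [('C', 'A_ID')]
```

Without the index, deleting one row from the parent means a sequential scan of each unindexed child table, which matches the minutes-long deletes described above.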
Re: [Bizgres-general] [PERFORM] Introducing a new linux readahead framework
Jim,

I'm thinking about it; we're already using a fixed read-ahead of 16MB using blockdev on the stock Redhat 2.6.9 kernel. It would be nice to not have to set this, so we may try it.

- Luke

On 4/26/06 3:28 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> (including bizgres-general)
>
> Has anyone done any testing on bizgres? It's got some patches that
> eliminate a lot of IO bottlenecks, so it might present even larger
> gains.
>
> On Wed, Apr 26, 2006 at 03:08:59PM -0500, Steve Poe wrote:
> > I found an average 14% improvement using Pg 7.4.11 with odbc-bench as
> > my test bed with Wu's kernel patch. I have not tried version 8.x yet.
> >
> > Thanks Wu.
> >
> > Steve Poe
> >
> > Using PostgreSQL 7.4.11, on a dual Opteron with 4GB
> >
> > On Fri, 2006-04-21 at 09:38 +0800, Wu Fengguang wrote:
> > > [snip: Wu Fengguang's readahead announcement, quoted in full
> > > downthread]
>
> --
> Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
> Pervasive Software   http://pervasive.com   work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

___
Bizgres-general mailing list
[EMAIL PROTECTED]
http://pgfoundry.org/mailman/listinfo/bizgres-general
Re: [PERFORM] WAL logging of SELECT ... INTO command
Backpatched to 8.0.X and 8.1.X.

---

Kris Jurka wrote:
> On Fri, 24 Mar 2006, Jim C. Nasby wrote:
> > On Wed, Mar 22, 2006 at 02:37:28PM -0500, Kris Jurka wrote:
> > > On Wed, 22 Mar 2006, Jim C. Nasby wrote:
> > > > Ok, I saw disk activity on the base directory and assumed it was
> > > > pg_xlog stuff. Turns out that both SELECT INTO and CREATE TABLE AS
> > > > ignore default_tablespace and create the new tables in the base
> > > > directory. I'm guessing that's a bug... (this is on 8.1.2, btw).
> > >
> > > This has been fixed in CVS HEAD as part of a patch to allow
> > > additional options to CREATE TABLE AS.
> > >
> > > http://archives.postgresql.org/pgsql-patches/2006-02/msg00211.php
> >
> > I'll argue that the current behavior is still a bug and should be
> > fixed. Would it be difficult to patch 8.1 (and 8.0 if there were
> > tablespaces then...) to honor default_tablespace?
>
> Here are patches that fix this for 8.0 and 8.1.
>
> Kris Jurka

Content-Description: [ Attachment, skipping... ]
Content-Description: [ Attachment, skipping... ]

--
Bruce Momjian   http://candle.pha.pa.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Wed, Apr 26, 2006 at 06:16:46PM -0400, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Wed, Apr 26, 2006 at 10:27:18AM -0500, Scott Marlowe wrote:
> > > If you haven't actually run a heavy benchmark of postgresql on the
> > > two architectures, please don't make your decision based on other
> > > benchmarks. Since you've got both a D920 and an X2 3800, that'd be a
> > > great place to start. Mock up some benchmark with a couple dozen
> > > threads hitting the server at once and see if the Intel can keep up.
> >
> > Or better yet, use dbt* or even pgbench so others can reproduce...
>
> For why Opterons are superior to Intel for PostgreSQL, see:
>
> http://techreport.com/reviews/2005q2/opteron-x75/index.x?pg=2
>
> Section "MESI-MESI-MOESI Banana-fana...". Specifically, this part about
> the Intel implementation:
>
>     The processor with the Invalid data in its cache (CPU 0, let's say)
>     might then wish to modify that chunk of data, but it could not do so
>     while the only valid copy of the data is in the cache of the other
>     processor (CPU 1). Instead, CPU 0 would have to wait until CPU 1
>     wrote the modified data back to main memory before proceeding, and
>     that takes time, bus bandwidth, and memory bandwidth. This is the
>     great drawback of MESI.
>
> AMD transfers the dirty cache line directly from cpu to cpu. I can
> imagine that helping our test-and-set shared memory usage quite a bit.

Wasn't the whole point of test-and-set that it's the recommended way to do lightweight spinlocks according to AMD/Intel? You'd think they'd have a way to make that performant on multiple CPUs (though if it's relying on possibly modifying an underlying data page I can't really think of how to do that without snaking through the cache...)

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [PERFORM] Introducing a new linux readahead framework
(including bizgres-general)

Has anyone done any testing on bizgres? It's got some patches that eliminate a lot of IO bottlenecks, so it might present even larger gains.

On Wed, Apr 26, 2006 at 03:08:59PM -0500, Steve Poe wrote:
> I found an average 14% improvement using Pg 7.4.11 with odbc-bench as
> my test bed with Wu's kernel patch. I have not tried version 8.x yet.
>
> Thanks Wu.
>
> Steve Poe
>
> Using PostgreSQL 7.4.11, on a dual Opteron with 4GB
>
> On Fri, 2006-04-21 at 09:38 +0800, Wu Fengguang wrote:
> > [snip: Wu Fengguang's readahead announcement, quoted in full
> > downthread]

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Wed, Apr 26, 2006 at 02:48:53AM -0400, [EMAIL PROTECTED] wrote:
> You said that DB accesses are random. I'm not so sure. In PostgreSQL,
> are not the individual pages often scanned sequentially, especially
> because all records are variable length? You don't think PostgreSQL
> will regularly read 32 bytes (8 bytes x 4) at a time, in sequence?
> Whether for table pages, or index pages - I'm not seeing why the
> accesses wouldn't be sequential. You believe PostgreSQL will access
> the table pages and index pages randomly on a per-byte basis? What
> is the minimum PostgreSQL record size again? Isn't it 32 bytes or
> over? :-)

Data within a page can absolutely be accessed randomly; it would be horribly inefficient to slog through 8K of data every time you needed to find a single row.

The header size of tuples is ~23 bytes, depending on your version of PostgreSQL, and data fields have to start on the proper alignment (generally 4 bytes). So essentially the smallest row you can get is 28 bytes.

I know that tuple headers are dealt with as a C structure, but I don't know if that means accessing any of the header costs the same as accessing the whole thing. I don't know if PostgreSQL can access fields within tuples without having to scan through at least the first part of preceding fields, though I suspect that it can access fixed-width fields that sit before any varlena fields directly (without scanning through the other fields). If we ever got to the point of divorcing the in-memory tuple layout from the table layout, it'd be interesting to experiment with having all varlena length info stored immediately after all fixed-width fields; that could potentially make accessing varlenas randomly faster. Note that null fields are indicated as such in the null bitmap, so I'm pretty sure that their in-tuple position doesn't matter much.

Of course if you want the definitive answer, Use The Source.

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
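The row-size arithmetic above can be sketched directly; the 23-byte header and 4-byte alignment are the figures quoted in the post (both vary by PostgreSQL version and platform):

```python
def align(n, a=4):
    """Round n up to the next multiple of the alignment a
    (the same rounding MAXALIGN-style padding performs)."""
    return (n + a - 1) // a * a

HEADER = 23                 # ~23-byte tuple header, as stated above
row = align(HEADER) + 4     # header padded to 24, plus one 4-byte field
print(row)                  # 28 -> the "smallest row is 28 bytes" figure
```

This is only the heap-tuple payload; the per-tuple item pointer in the page's line-pointer array adds a few more bytes of per-row overhead on disk.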
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
Jim C. Nasby wrote:
> On Wed, Apr 26, 2006 at 10:27:18AM -0500, Scott Marlowe wrote:
> > If you haven't actually run a heavy benchmark of postgresql on the
> > two architectures, please don't make your decision based on other
> > benchmarks. Since you've got both a D920 and an X2 3800, that'd be a
> > great place to start. Mock up some benchmark with a couple dozen
> > threads hitting the server at once and see if the Intel can keep up.
>
> Or better yet, use dbt* or even pgbench so others can reproduce...

For why Opterons are superior to Intel for PostgreSQL, see:

http://techreport.com/reviews/2005q2/opteron-x75/index.x?pg=2

Section "MESI-MESI-MOESI Banana-fana...". Specifically, this part about the Intel implementation:

    The processor with the Invalid data in its cache (CPU 0, let's say)
    might then wish to modify that chunk of data, but it could not do so
    while the only valid copy of the data is in the cache of the other
    processor (CPU 1). Instead, CPU 0 would have to wait until CPU 1
    wrote the modified data back to main memory before proceeding, and
    that takes time, bus bandwidth, and memory bandwidth. This is the
    great drawback of MESI.

AMD transfers the dirty cache line directly from cpu to cpu. I can imagine that helping our test-and-set shared memory usage quite a bit.

--
Bruce Momjian   http://candle.pha.pa.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 11:07:17PM -0400, Ron Peacetree wrote:
> A minor point to be noted in addition here is that most DB servers
> under load are limited by their physical IO subsystem, their HDs, and
> not the speed of their RAM.

I think if that were the only consideration we wouldn't be seeing such a dramatic difference between AMD and Intel though. Even in a disk-bound server, caching is going to have a tremendous impact, and that's essentially entirely bound by memory bandwidth and latency.

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Wed, Apr 26, 2006 at 10:17:58AM -0500, Scott Marlowe wrote:
> On Tue, 2006-04-25 at 18:55, Jim C. Nasby wrote:
> > On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
> > > On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
> > > > I've been given the task of making some hardware recommendations
> > > > for the next round of server purchases. The machines to be
> > > > purchased will be running FreeBSD & PostgreSQL.
> > > >
> > > > Where I'm stuck is in deciding whether we want to go with
> > > > dual-core pentiums with 2M cache, or with HT pentiums with 8M
> > > > cache.
> > >
> > > Given a choice between those two processors, I'd choose the AMD 64
> > > X2 CPU. It's a significantly better processor than either of the
> > > Intel choices. And if you get the HT processor, you might as well
> > > turn off HT on a PostgreSQL machine. I've yet to see it make
> > > postgresql run faster, but I've certainly seen HT make it run
> > > slower.
> >
> > Actually, believe it or not, a coworker just saw HT double the
> > performance of pgbench on his desktop machine. Granted, not really a
> > representative test case, but it still blew my mind. This was with a
> > database that fit in his 1G of memory, and running Windows XP. Both
> > cases were newly minted pgbench databases with a scale of 40. Testing
> > was 40 connections and 100 transactions. With HT he saw 47.6 TPS;
> > without, it was 21.1.
> >
> > I actually had IT put a w2k3 server on an HT box specifically so I
> > could do more testing.
>
> Just to clarify, this is PostgreSQL on Windows, right?
>
> I wonder if the latest Linux kernel can do that well... I'm guessing
> that the kernel scheduler in Windows has had more work to make it good
> at scheduling on an HT architecture than the linux kernel has.

Yes, this is on Windows XP. Larry might also have an HT box with some other OS on it we can check with (though I suspect that maybe that's been beaten to death...)

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Wed, Apr 26, 2006 at 10:27:18AM -0500, Scott Marlowe wrote:
> If you haven't actually run a heavy benchmark of postgresql on the two
> architectures, please don't make your decision based on other
> benchmarks. Since you've got both a D920 and an X2 3800, that'd be a
> great place to start. Mock up some benchmark with a couple dozen
> threads hitting the server at once and see if the Intel can keep up.

Or better yet, use dbt* or even pgbench so others can reproduce...

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [PERFORM] Introducing a new linux readahead framework
I found an average 14% improvement Using Pg 7.4.11 with odbc-bench as my test bed with Wu's kernel patch. I have not tried version 8.x yet. Thanks Wu. Steve Poe Using Postgresql 7.4.11, on an dual Opteron with 4GB On Fri, 2006-04-21 at 09:38 +0800, Wu Fengguang wrote: > Greetings, > > I'd like to introduce a new readahead framework for the linux kernel: > http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/1021.html > > HOW IT WORKS > > In adaptive readahead, the context based method may be of particular > interest to postgresql users. It works by peeking into the file cache > and check if there are any history pages present or accessed. In this > way it can detect almost all forms of sequential / semi-sequential read > patterns, e.g. > - parallel / interleaved sequential scans on one file > - sequential reads across file open/close > - mixed sequential / random accesses > - sparse / skimming sequential read > > It also have methods to detect some less common cases: > - reading backward > - seeking all over reading N pages > > WAYS TO BENEFIT FROM IT > > As we know, postgresql relies on the kernel to do proper readahead. > The adaptive readahead might help performance in the following cases: > - concurrent sequential scans > - sequential scan on a fragmented table > (some DBs suffer from this problem, not sure for pgsql) > - index scan with clustered matches > - index scan on majority rows (in case the planner goes wrong) > > TUNABLE PARAMETERS > > There are two parameters which are described in this email: > http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/1024.html > > Here are the more oriented guidelines for postgresql users: > > - /proc/sys/vm/readahead_ratio > Since most DB servers are bounty of memory, the danger of readahead > thrashing is near to zero. In this case, you can set readahead_ratio to > 100(or even 200:), which helps the readahead window to scale up rapidly. 
> > - /proc/sys/vm/readahead_hit_rate > Sparse sequential reads are read patterns like {0, 2, 4, 5, 8, 11, ...}. > In this case we might prefer to do readahead to get good I/O performance > with the overhead of some useless pages. But if you prefer not to do so, > setting readahead_hit_rate to 1 will disable this feature. > > - /sys/block/sd/queue/read_ahead_kb > Set it to a large value (e.g. 4096) as you used to do. > RAID users might want to use a bigger number. > > TRYING IT OUT > > The latest patch for stable kernels can be downloaded here: > http://www.vanheusden.com/ara/ > > Before compiling, make sure that the following options are enabled: > Processor type and features -> Adaptive file readahead > Processor type and features -> Readahead debug and accounting > > HELPING AND CONTRIBUTING > > The patch is open to fine-tuning advice :) > Comments and benchmarking results are highly appreciated. > > Thanks, > Wu
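Taken together, Wu's guidelines amount to a few shell commands on a patched kernel. This is only a sketch: the device name (sda) is a stand-in for whatever disk holds your data directory, the values are the ones suggested above, and the /proc entries only exist with the adaptive readahead patch applied. Requires root.

```shell
# Adaptive readahead tunables per the guidelines above (patched kernel only).
echo 100  > /proc/sys/vm/readahead_ratio       # DB server with ample RAM; 200 to scale up even faster
echo 1    > /proc/sys/vm/readahead_hit_rate    # 1 disables readahead on sparse sequential patterns
echo 4096 > /sys/block/sda/queue/read_ahead_kb # stock tunable; RAID users may want a bigger number
```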
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, 2006-04-25 at 20:17, [EMAIL PROTECTED] wrote: > On Tue, Apr 25, 2006 at 08:54:40PM -0400, [EMAIL PROTECTED] wrote: > > I made the choice I describe based on a lot of research. I was going > > to go both Intel, until I noticed that the Intel prices were dropping > > fast. 30% price cut in 2 months. AMD didn't drop at all during the > > same time. > > Errr.. big mistake. That was going to be - I was going to go both AMD. > > > There are plenty of reasons to choose one over the other. Generally > > the AMD comes out on top. It is *not* 2X though. Anybody who claims > > this is being highly selective about which benchmarks they consider. > > I have an Intel Pentium D 920, and an AMD X2 3800+. These are very > close in performance. The retail price difference is: > > Intel Pentium D 920 is selling for $310 CDN > AMD X2 3800+ is selling for $347 CDN Let me be clear. The performance difference between those boxes running the latest first person shooter is not what I was alluding to in my first post. While the price of the Intels may have dropped, there's a huge difference (often 2x or more) in performance when running PostgreSQL on otherwise similar chips from Intel and AMD. Note that my workstation at work, my workstation at home, and my laptop are all Intel-based machines. They work fine for that. But if I needed to build a big fast oracle or postgresql server, I'd almost certainly go with the AMD, especially so if I needed >2 cores, where the performance difference becomes greater and greater. You'd likely find that for PostgreSQL, the slowest dual core AMDs out would still beat the fastest Intel dual cores, because of the issue we've seen on the list with context switching storms. If you haven't actually run a heavy benchmark of postgresql on the two architectures, please don't make your decision based on other benchmarks. Since you've got both a D920 and an X2 3800, that'd be a great place to start. 
Mock up some benchmark with a couple dozen threads hitting the server at once and see if the Intel can keep up. It should do OK, but not great. If you can get your hands on a dual dual-core setup for either, you should really start to see the advantage going to AMD, and by the time you get to a quad dual core setup, it won't even be a contest.
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
David Boreham wrote: It isn't only Postgres. I work on a number of other server applications that also run much faster on Opterons than the published benchmark figures would suggest they should. They're all compiled with gcc4, so possibly there's a compiler issue. I don't run Windows on any of our Opteron boxes so I can't easily compare using the MS compiler. Maybe it's just a fact that the majority of x86 64-bit development for open source software happens on Opteron/A64 machines. 64-bit AMD machines were selling a good year before 64-bit Intel machines were available. And even after Intel EMT64 were available, anybody in their right mind would have picked AMD machines over Intel due to cost/heat/performance. So you end up with 64-bit OSS being developed/optimized for Opterons and the 10% running Intel EMT64 handle compatibility issues. Would be interesting to see a survey of what machines OSS developers use to write/test/optimize their code.
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, 2006-04-25 at 18:55, Jim C. Nasby wrote: > On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote: > > On Tue, 2006-04-25 at 13:14, Bill Moran wrote: > > > I've been given the task of making some hardware recommendations for > > > the next round of server purchases. The machines to be purchased > > > will be running FreeBSD & PostgreSQL. > > > > > > Where I'm stuck is in deciding whether we want to go with dual-core > > > pentiums with 2M cache, or with HT pentiums with 8M cache. > > > > Given a choice between those two processors, I'd choose the AMD 64 x 2 > > CPU. It's a significantly better processor than either of the Intel > > choices. And if you get the HT processor, you might as well turn off HT > > on a PostgreSQL machine. I've yet to see it make postgresql run faster, > > but I've certainly seen HT make it run slower. > > Actually, believe it or not, a coworker just saw HT double the > performance of pgbench on his desktop machine. Granted, not really a > representative test case, but it still blew my mind. This was with a > database that fit in his 1G of memory, and running windows XP. Both > cases were newly minted pgbench databases with a scale of 40. Testing > was 40 connections and 100 transactions. With HT he saw 47.6 TPS, > without it was 21.1. > > I actually had IT put w2k3 server on a HT box specifically so I > could do more testing. Just to clarify, this is PostgreSQL on Windows, right? I wonder if the latest Linux kernel can do that well... I'm guessing that the kernel scheduler in Windows has had a lot more work put into making it good at scheduling on a HT architecture than the linux kernel has.
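For anyone wanting to repeat the comparison, a run like the one described (newly minted scale-40 pgbench database, 40 connections, 100 transactions each) looks roughly like this. The database name "bench" is arbitrary, and this assumes a running server with pgbench (from contrib) in the PATH:

```shell
createdb bench
pgbench -i -s 40 bench      # initialize a fresh pgbench database at scale 40
pgbench -c 40 -t 100 bench  # 40 concurrent clients, 100 transactions each; reports TPS
```

Running it twice, once with HT enabled in the BIOS and once without, reproduces the test in the quoted post.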
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
Have a look at this Wikipedia page which outlines some differences between the AMD and Intel versions of 64-bit: http://en.wikipedia.org/wiki/EM64T It isn't only Postgres. I work on a number of other server applications that also run much faster on Opterons than the published benchmark figures would suggest they should. They're all compiled with gcc4, so possibly there's a compiler issue. I don't run Windows on any of our Opteron boxes so I can't easily compare using the MS compiler.
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
Mea Culpa. There is a mistake in my example for SDR vs DDR vs DDR2. This is what I get for posting before my morning coffee. The base latency for all of the memory types is that of the base clock rate; 200MHz = 5ns in my given examples. I double factored, making DDR and DDR2 worse than they actually are. Again, my apologies. Ron -Original Message- >From: Ron Peacetree <[EMAIL PROTECTED]> >Sent: Apr 26, 2006 8:40 AM >To: [EMAIL PROTECTED], pgsql-performance@postgresql.org >Subject: Re: [PERFORM] Large (8M) cache vs. dual-core CPUs > >I'm posting this to the entire performance list in the hopes that it will be >generally useful. > >Note also what happens when transferring the first datum after a lull period. >For purposes of example, let's pretend that we are talking about a base clock >rate of 200MHz = 5ns. > >The SDR still transfers data every 5ns no matter what. >The DDR transfers the 1st datum in 10ns and then assuming there are at least 2 >sequential datums to be >transferred will transfer the 2nd and subsequent >sequential pieces of data every 2.5ns. >The DDR2 transfers the 1st datum in 20ns and then assuming there are at least >4 sequential datums to be >transferred will transfer the 2nd and subsequent >sequential pieces of data every 1.25ns. > 5 ns to first transfer in all 3 cases. Bad Ron. No Biscuit! > >Thus we can see that randomly accessing RAM degrades performance significantly >for DDR and DDR2. We can >also see that the conditions for optimal RAM >performance become more restrictive as we go from SDR to DDR to >DDR2. >The reason DDR2 with a low base clock rate excelled at tasks like streaming >multimedia and stank at things like >small transaction OLTP DB applications is >now apparent. > >Factors like CPU prefetching and victim buffers can muddy this picture a bit. >Also, if the CPU's off die IO is slower than the RAM it is talking to, how >fast that RAM is becomes unimportant. > These statements, and everything else I posted, are accurate. 
Re: [PERFORM] Introducing a new linux readahead framework
From my initial testing this is very promising for a postgres server. Benchmark-wise, a simple dd with an 8k blocksize gets ~200MB/s as compared to ~140MB/s on the same hardware without the patch. Also, that 200MB/s seems to be unaffected by the dd blocksize, whereas without the patch a 512k blocksize would get ~100MB/s. I'm now watching to see how it does over a couple of days on real-world workloads. Mike Stone
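For anyone wanting to repeat the comparison, the dd test described is roughly the following. The scratch file here is only a stand-in: for meaningful numbers the file should be much larger than RAM (or read the block device directly) so the page cache doesn't satisfy the reads.

```shell
# Create a scratch file standing in for the real data device.
dd if=/dev/zero of=/tmp/ra_test bs=1M count=64 2>/dev/null
# Sequential read throughput with an 8k block size, as in the quoted test...
dd if=/tmp/ra_test of=/dev/null bs=8k 2>&1 | tail -n 1
# ...versus a 512k block size, where the unpatched kernel reportedly drops off.
dd if=/tmp/ra_test of=/dev/null bs=512k 2>&1 | tail -n 1
rm -f /tmp/ra_test
```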
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
>While in general there may not be that much of a % difference between the 2 chips, >there's a huge gap in Postgres. For whatever reason, Postgres likes Opterons. >Way more than Intel P4-architecture chips. It isn't only Postgres. I work on a number of other server applications that also run much faster on Opterons than the published benchmark figures would suggest they should. They're all compiled with gcc4, so possibly there's a compiler issue. I don't run Windows on any of our Opteron boxes so I can't easily compare using the MS compiler.
Re: [PERFORM] slow deletes on pgsql 7.4
It was on my first email. Here it is again:

MONSOON=# explain delete from scenario where id='1099';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using scenario_pkey on scenario  (cost=0.00..3.14 rows=1 width=6)
   Index Cond: (id = 1099::bigint)
(2 rows)

MONSOON=# explain delete from scenario where id='1023';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using scenario_pkey on scenario  (cost=0.00..3.14 rows=1 width=6)
   Index Cond: (id = 1023::bigint)
(2 rows)

Thanks, J On 4/25/06, Tom Lane <[EMAIL PROTECTED]> wrote: "Junaili Lie" <[EMAIL PROTECTED]> writes: > ie. delete from scenario where id='1023' is very fast, but delete from > scenario where id='1099' is running forever. What does EXPLAIN show for each of those cases? regards, tom lane
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
[EMAIL PROTECTED] wrote: I have an Intel Pentium D 920, and an AMD X2 3800+. These are very close in performance. The retail price difference is: Intel Pentium D 920 is selling for $310 CDN AMD X2 3800+ is selling for $347 CDN Anybody who claims that Intel is 2X more expensive for the same performance, isn't considering all factors. No question at all - the Opteron is good, and the Xeon isn't - but the original poster didn't ask about Opteron or Xeon, did he? For the desktop lines - X2 is not double Pentium D. Maybe 10%. Maybe not at all. Especially now that Intel is dropping its prices due to overstock. There's part of the equation you are missing here. This is a PostgreSQL mailing list which means we're usually talking about performance of just this specific server app. While in general there may not be that much of a % difference between the 2 chips, there's a huge gap in Postgres. For whatever reason, Postgres likes Opterons. Way more than Intel P4-architecture chips. (And it appears way more than IBM Power4 chips and a host of other chips also.) Here's one of the many discussions we had about this issue last year: http://qaix.com/postgresql-database-development/337-670-re-opteron-vs-xeon-was-what-to-do-with-6-disks-read.shtml We're not 100% sure of the exact reasons why Opterons run PostgreSQL so much better than P4s. We have guesses -- lower memory latency, lack of shared FSB, better 64-bit, 64-bit IOMMU, context-switch storms on P4, better dualcore implementation and so on. Perhaps it's a combination of all the above factors but somehow, the general experience people have had is that equivalently priced Opteron servers run PostgreSQL 2X faster than P4 servers as the baseline and the gap increases as you add more sockets and more cores.
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
The reasons AMD has held off from supporting DDR2 until now are: 1. DDR is EOL. JEDEC is not ratifying any DDR faster than 200x2 while DDR2 standards as fast as 333x4 are likely to be ratified (note that Intel pretty much avoided DDR, leaving it to AMD, while DDR2 is Intel's main RAM technology. Guess who has more pull with JEDEC?) DDR2 is to RDRAM as C# is to Java ;)
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
I'm posting this to the entire performance list in the hopes that it will be generally useful. -Original Message- >From: [EMAIL PROTECTED] >Sent: Apr 26, 2006 3:25 AM >To: Ron Peacetree <[EMAIL PROTECTED]> >Subject: Re: [PERFORM] Large (8M) cache vs. dual-core CPUs > >Hi Ron: > >As a result of your post on the matter, I've been redoing some of my >online research on this subject, to see whether I do have one or more >things wrong. > I'm always in favor of independent investigation to find the truth. :-) >You say: > >> THROUGHPUT is better with DDR2 if and only if there is enough data >> to be fetched in a serial fashion from memory. >... >> So PC3200, 200MHz x2, is going to actually perform better than >> PC2-5400, 166MHz x4, for almost any memory access pattern except >> those that are highly sequential. >... >> For the mostly random memory access patterns that comprise many DB >> applications, the base latency of the RAM involved is going to >> matter more than the peak throughput AKA the bandwidth of that RAM. > >I'm trying to understand right now - why does DDR2 require data to be >fetched in a serial fashion, in order for it to maximize bandwidth? > SDR transfers data on only one edge (rising or falling) of its clock cycle. DDR transfers data on both the rising and falling edge of the base clock signal, if there is a contiguous chunk of 2+ datums to be transferred. DDR2 basically has a second clock that cycles at 2x the rate of the base clock and thus we get 4 data transfers per base clock cycle, if there is a contiguous chunk of 4+ datums to be transferred. Note also what happens when transferring the first datum after a lull period. For purposes of example, let's pretend that we are talking about a base clock rate of 200MHz = 5ns. The SDR still transfers data every 5ns no matter what. 
The DDR transfers the 1st datum in 10ns and then assuming there are at least 2 sequential datums to be transferred will transfer the 2nd and subsequent sequential pieces of data every 2.5ns. The DDR2 transfers the 1st datum in 20ns and then assuming there are at least 4 sequential datums to be transferred will transfer the 2nd and subsequent sequential pieces of data every 1.25ns. Thus we can see that randomly accessing RAM degrades performance significantly for DDR and DDR2. We can also see that the conditions for optimal RAM performance become more restrictive as we go from SDR to DDR to DDR2. The reason DDR2 with a low base clock rate excelled at tasks like streaming multimedia and stank at things like small transaction OLTP DB applications is now apparent. Factors like CPU prefetching and victim buffers can muddy this picture a bit. Also, if the CPU's off die IO is slower than the RAM it is talking to, how fast that RAM is becomes unimportant. The reasons AMD has held off from supporting DDR2 until now are: 1. DDR is EOL. JEDEC is not ratifying any DDR faster than 200x2 while DDR2 standards as fast as 333x4 are likely to be ratified (note that Intel pretty much avoided DDR, leaving it to AMD, while DDR2 is Intel's main RAM technology. Guess who has more pull with JEDEC?) 2. DDR and DDR2 RAM with equal base clock rates are finally available, removing the biggest performance difference between DDR and DDR2. 3. Due to the larger demand for DDR2, more of it is produced. That in turn has resulted in larger supplies of DDR2 than DDR. Which in turn, especially when combined with the factors above, has resulted in lower prices for DDR2 than for DDR of the same or faster base clock rate by now. Hope this is helpful, Ron
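To make the arithmetic concrete, here is the transfer-time model for an 8-word sequential burst versus a single random word, using the corrected figures from the follow-up mea culpa elsewhere in the thread (first-word latency is 5ns for all three RAM types; the 10ns/20ns figures above were double-factored):

```shell
# Transfer-time model: first word costs 5ns for all three types; subsequent
# sequential words cost 5ns (SDR), 2.5ns (DDR), or 1.25ns (DDR2).
awk 'BEGIN {
  n = 8                         # words in the sequential burst
  sdr  = n * 5.0
  ddr  = 5.0 + (n - 1) * 2.5
  ddr2 = 5.0 + (n - 1) * 1.25
  printf "8-word burst: SDR=%.2fns DDR=%.2fns DDR2=%.2fns\n", sdr, ddr, ddr2
  # A single random word costs the same 5ns base latency everywhere, which is
  # why random-access DB workloads see little benefit from DDR2 bandwidth.
  printf "1-word random access: 5.00ns for SDR, DDR and DDR2 alike\n"
}'
```

The burst numbers (40ns vs 22.5ns vs 13.75ns) show why streaming workloads love DDR2, while the identical single-word latency shows why random-access OLTP workloads don't.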