Re: [PERFORM] Query planner is using wrong index.
--- Tom Lane [EMAIL PROTECTED] wrote: Brian Herlihy [EMAIL PROTECTED] writes: My options seem to be - Fudge the analysis results so that the selectivity estimate changes. I have tested reducing n_distinct, but this doesn't seem to help. - Combine the columns into one column, allowing postgres to calculate the combined selectivity. - Drop the (p2, p3) index. But I need this for other queries. Have you considered reordering the pkey to be (p2,p3,p1) and then dropping the (p2,p3) index? regards, tom lane Hi Tom, I've considered it. Unfortunately I need to do lookups on (p1) and (p1,p2) as well as (p1, p2, p3). The solution I've gone with is to create an index on (p2 || '/' || p3). This is unique for each p2/p3 combination, because p2 cannot contain the '/' character. I'm assuming that this index will be no slower to generate than one on (p2, p3), as concatenation is very cheap. Having the index on an expression hides it from the optimizer, which is then forced to use the primary key instead. It works perfectly now! There were only 2 queries in the system which need this index, so it was no problem to change them. Thankyou very much for all your time and patience! Before I go, I have a question - From discussions on the Postgresql irc channel, and from reading the TODO list on the website, I am under the impression that there are no plans to allow optimizer hints, such as use index table_pkey. Is this really true? Such a feature would make life inestimably easier for your end-users, particularly me :) Thanks, Brian ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Same SQL, 104296ms of difference between 7.4.12 and 8.0.7
Hello I have a sql statement that takes 108489.780 ms with 8.0.7 in a RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz / 8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with 2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM. Some information: - There is no IO when I am running the sql, but it uses 99% of the cpu. - I run VACUUM VERBOSE ANALYZE in both databases before the test. - The databases are identical. - No other jobs running when testing. - Some different parameters between 7.4.12 and 8.0.7 : 7.4.12: --- shared_buffers = 114966 #(15% of ram) sort_mem = 16384 vacuum_mem = 524288 wal_buffers = 64 checkpoint_segments = 16 effective_cache_size = 383220 #(50% ram) random_page_cost = 3 default_statistics_target = 100 8.0.7: -- shared_buffers = 250160 #(25% ram) work_mem = 8192 maintenance_work_mem = 131072 wal_buffers = 128 checkpoint_segments = 64 effective_cache_size = 500321 #(50% ram) random_page_cost = 3 default_statistics_target = 100 Any ideas of what I can test/configurate to find out why this happens? Thanks in advance. *** With 7.4.12 *** rtprod=# explain analyze SELECT DISTINCT main.* FROM Users main , Principals Principals_1, ACL ACL_2, Groups Groups_3, CachedGroupMembers CachedGroupMembers_4 WHERE ((ACL_2.RightName = 'OwnTicket')) AND ((CachedGroupMembers_4.MemberId = Principals_1.id)) AND ((Groups_3.id = CachedGroupMembers_4.GroupId)) AND ((Principals_1.Disabled = '0')OR(Principals_1.Disabled = '0')) AND ((Principals_1.id != '1')) AND ((main.id = Principals_1.id)) AND ( (ACL_2.PrincipalId = Groups_3.id AND ACL_2.PrincipalType = 'Group' AND ( Groups_3.Domain = 'SystemInternal' OR Groups_3.Domain = 'UserDefined' OR Groups_3.Domain = 'ACLEquivalence')) OR ( ( (Groups_3.Domain = 'RT::Queue-Role' ) ) AND Groups_3.Type = ACL_2.PrincipalType) ) AND (ACL_2.ObjectType = 'RT::System' OR (ACL_2.ObjectType = 'RT::Queue') ) ORDER BY main.Name ASC; QUERYPLAN Unique (cost=40250.00..40250.09 rows=1 width=695) (actual time=3974.528..4182.343 rows=264 loops=1) - Sort (cost=40250.00..40250.00 rows=1 width=695) (actual time=3974.522..3992.487 rows=24697 loops=1) Sort Key: main.name, main.id, main.password, main.comments, main.signature, main.emailaddress, main.freeformcontactinfo, main.organization, main.realname, main.nickname, main.lang, main.emailencoding, main.webencoding, main.externalcontactinfoid, main.contactinfosystem, main.externalauthid, main.authsystem, main.gecos, main.homephone, main.workphone, main.mobilephone, main.pagerphone, main.address1, main.address2, main.city, main.state, main.zip, main.country, main.timezone, main.pgpkey, main.creator, main.created, main.lastupdatedby, main.lastupdated - Nested Loop (cost=33.67..40249.99 rows=1 width=695) (actual time=37.793..3240.146 rows=24697 loops=1) - Nested Loop (cost=33.67..40246.95 rows=1 width=699) (actual time=37.754..2635.812 rows=24697 loops=1) - Nested Loop (cost=33.67..40242.47 rows=1 width=4) (actual time=37.689..2091.634 rows=24755 loops=1) - Nested Loop (cost=33.67..40225.72 rows=1 width=4) (actual time=37.663..1967.388 rows=54 loops=1) Join Filter: outer.domain)::text = 'RT::Queue-Role'::text) OR (inner.principalid = outer.id)) AND (((outer.type)::text = (inner.principaltype)::text) OR (inner.principalid = outer.id)) AND (((outer.domain)::text = 'RT::Queue-Role'::text) OR ((inner.principaltype)::text = 'Group'::text)) AND (((outer.type)::text = (inner.principaltype)::text) OR ((inner.principaltype)::text = 'Group'::text)) AND (((outer.type)::text = (inner.principaltype)::text) OR ((outer.domain)::text = 'SystemInternal'::text) OR ((outer.domain)::text = 'UserDefined'::text) OR ((outer.domain)::text = 'ACLEquivalence'::text))) - Index Scan using groups4, groups4, groups4, groups4 on groups groups_3 (cost=0.00..2164.05 rows=15845 width=32) (actual time=0.041..43.636 rows=16160 loops=1) Index Cond: (((domain)::text = 'RT::Queue-Role'::text) OR ((domain)::text = 'SystemInternal'::text) OR
Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and
Rafael Martinez Guerrero wrote: Hello I have a sql statement that takes 108489.780 ms with 8.0.7 in a RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz / 8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with 2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM. Some information: - There is no IO when I am running the sql, but it uses 99% of the cpu. - I run VACUUM VERBOSE ANALYZE in both databases before the test. - The databases are identical. - No other jobs running when testing. - Some different parameters between 7.4.12 and 8.0.7 : 7.4.12: --- shared_buffers = 114966 #(15% of ram) sort_mem = 16384 vacuum_mem = 524288 wal_buffers = 64 checkpoint_segments = 16 effective_cache_size = 383220 #(50% ram) random_page_cost = 3 default_statistics_target = 100 8.0.7: -- shared_buffers = 250160 #(25% ram) work_mem = 8192 maintenance_work_mem = 131072 wal_buffers = 128 checkpoint_segments = 64 effective_cache_size = 500321 #(50% ram) random_page_cost = 3 default_statistics_target = 100 Any ideas of what I can test/configurate to find out why this happens? Thanks in advance. I haven't looked in detail at the plans, but what stands out to me is that you've got a sort with a lot of columns and you've halved sort_mem (work_mem). Try increasing it (perhaps to 32000 even). set work_mem = 32000; Give that a quick go and see what happens. If it doesn't work, we'll look at the plans in more detail. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and
On Fri, 2006-04-07 at 15:31, Richard Huxton wrote: Rafael Martinez Guerrero wrote: Hello I have a sql statement that takes 108489.780 ms with 8.0.7 in a RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz / 8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with 2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM. Some information: - There is no IO when I am running the sql, but it uses 99% of the cpu. - I run VACUUM VERBOSE ANALYZE in both databases before the test. - The databases are identical. - No other jobs running when testing. - Some different parameters between 7.4.12 and 8.0.7 : 7.4.12: --- shared_buffers = 114966 #(15% of ram) sort_mem = 16384 vacuum_mem = 524288 wal_buffers = 64 checkpoint_segments = 16 effective_cache_size = 383220 #(50% ram) random_page_cost = 3 default_statistics_target = 100 8.0.7: -- shared_buffers = 250160 #(25% ram) work_mem = 8192 maintenance_work_mem = 131072 wal_buffers = 128 checkpoint_segments = 64 effective_cache_size = 500321 #(50% ram) random_page_cost = 3 default_statistics_target = 100 Any ideas of what I can test/configurate to find out why this happens? Thanks in advance. I haven't looked in detail at the plans, but what stands out to me is that you've got a sort with a lot of columns and you've halved sort_mem (work_mem). Try increasing it (perhaps to 32000 even). set work_mem = 32000; Give that a quick go and see what happens. If it doesn't work, we'll look at the plans in more detail. I know that this SQL could be done in a much better way, but I can not change it at the moment. work_mem = 16384: - After restarting the database and running the explain two times: 107911.229 ms work_mem = 32768: - After restarting the database and running the explain two times: 103988.337 ms -- Rafael Martinez, [EMAIL PROTECTED] Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Loading the entire DB into RAM
I have a web server with PostgreSQL and RHEL. It hosts a search engine, and each time some one makes a query, it uses the HDD Raid array. The DB is not very big, it is less than a GB. I plan to add more RAM anyway. What I'd like to do is find out how to keep the whole DB in RAM so that each time some one does a query, it doesn't use the HDD. Is it possible, if so, how? Thanks, Charles. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and
Rafael Martinez Guerrero wrote: Any ideas of what I can test/configurate to find out why this happens? Thanks in advance. I haven't looked in detail at the plans, but what stands out to me is that you've got a sort with a lot of columns and you've halved sort_mem (work_mem). Try increasing it (perhaps to 32000 even). set work_mem = 32000; Give that a quick go and see what happens. If it doesn't work, we'll look at the plans in more detail. I know that this SQL could be done in a much better way, but I can not change it at the moment. work_mem = 16384: - After restarting the database and running the explain two times: 107911.229 ms work_mem = 32768: - After restarting the database and running the explain two times: 103988.337 ms Damn! I hate it when I have to actually work at a problem :-) Well, the first thing that strikes me is that the row estimates are terrible for 7.4.12 (which runs quickly) and much better for 8.0.7 (which runs slowly). Which suggests you were lucky before. The second thing I notice is the bit that goes: Materialize ... Seq Scan on acl acl_2. If you compare the two you'll see that the 7.4 version loops 16,160 times but 8.0 loops 513,264 times. This is a bad choice, and I'm guessing it's made because it gets the row estimate wrong: Hash Join (cost=4667.85..51078.88 rows=62852 width=727) (actual time=649.028..13602.451 rows=513264 loops=1) That's the comparison Groups_3.id = CachedGroupMembers_4.GroupId if I'm reading this correctly. Is there anything unusual about those two columns? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and
On Fri, 2006-04-07 at 16:41 +0200, Gábriel Ákos wrote: Any ideas of what I can test/configurate to find out why this happens? Thanks in advance. Increase work_mem to 50% of memory, and don't care about maintenance_work_mem and effective_cache_size, they don't matter in this case. The problem is not the amount of memory. It works much faster with only 16M and 7.4.12 than 8.0.7. -- Rafael Martinez, [EMAIL PROTECTED] Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Loading the entire DB into RAM
On 4/7/06, Matt Davies | Postgresql List [EMAIL PROTECTED] wrote: Out of curiosity, what are you using as the search engine? Thank you. We designed the search engine ourself (we didn't use a ready-to-use solution). -- Charles A. Landemaine. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Loading the entire DB into RAM
If memory serves me correctly I have seen several posts about this in the past. I'll try to recall highlights. 1. Create a md in linux sufficiently large enough to handle the data set you are wanting to store. 2. Create a HD based copy somewhere as your permanent storage mechanism. 3. Start up your PostgreSQL instance with the MD as the data store 4. Load your data to the MD instance. 5. Figure out how you will change indexes _and_ ensure that your disk storage is consistent with your MD instance. I haven't done so, but it would be interesting to have a secondary database somewhere that is your primary storage. It needn't be especially powerful, or even available. It serves as the place to generate your indexing data. You could then use SLONY to propogate the data to the MD production system. Of course, if you are updating your system that resides in ram, you should be thinking the other way. Have SLONY replicate changes to the other, permanent storage, system. Either way you do it, I can't think of an out of the box method to doing it. Somehow one has to transfer data from permanent storage to the md instance, and, likewise, back to permanent storage. Out of curiosity, what are you using as the search engine? Charles A. Landemaine wrote: I have a web server with PostgreSQL and RHEL. It hosts a search engine, and each time some one makes a query, it uses the HDD Raid array. The DB is not very big, it is less than a GB. I plan to add more RAM anyway. What I'd like to do is find out how to keep the whole DB in RAM so that each time some one does a query, it doesn't use the HDD. Is it possible, if so, how? Thanks, Charles. ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Spotting planner errors (was Re: [PERFORM] Query planner is using wrong index.)
Tom Lane wrote: Brian Herlihy [EMAIL PROTECTED] writes: Before I go, I have a question - From discussions on the Postgresql irc channel, and from reading the TODO list on the website, I am under the impression that there are no plans to allow optimizer hints, such as use index table_pkey. Is this really true? I personally don't think it's a good idea: the time spent in designing, implementing, and maintaining a usable hint system would be significant, and IMHO the effort is better spent on *fixing* the optimizer problems than working around them. Tom - does the planner/executor know it's got row estimates wrong? That is, if I'm not running an EXPLAIN ANALYSE is there a point at which we could log planner estimate for X out by factor of Y? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: Spotting planner errors (was Re: [PERFORM] Query planner is using wrong index.)
Richard Huxton dev@archonet.com writes: Tom - does the planner/executor know it's got row estimates wrong? That is, if I'm not running an EXPLAIN ANALYSE is there a point at which we could log planner estimate for X out by factor of Y? Not at the moment, but you could certainly imagine changing the executor to count rows even without EXPLAIN ANALYZE, and then complain during plan shutdown. Not sure how helpful that would be; there would be a lot of noise from common cases such as executing underneath a LIMIT node. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: Spotting planner errors (was Re: [PERFORM] Query planner is using
Tom Lane wrote: Richard Huxton dev@archonet.com writes: Tom - does the planner/executor know it's got row estimates wrong? That is, if I'm not running an EXPLAIN ANALYSE is there a point at which we could log planner estimate for X out by factor of Y? Not at the moment, but you could certainly imagine changing the executor to count rows even without EXPLAIN ANALYZE, and then complain during plan shutdown. Not sure how helpful that would be; there would be a lot of noise from common cases such as executing underneath a LIMIT node. Hmm - thinking about it you'd probably want to record it similarly to stats too. It's the fact that the planner *repeatedly* gets an estimate wrong that's of interest. Would it be prohibitive to total actions taken - to act as raw data for random_page_cost / cpu_xxx_cost? If you could get a ratio of estimated vs actual time vs the various page-fetches/index-fetches etc. we could actually plug some meaningful numbers in. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Loading the entire DB into RAM
Charles A. Landemaine [EMAIL PROTECTED] writes: What I'd like to do is find out how to keep the whole DB in RAM so that each time some one does a query, it doesn't use the HDD. Is it possible, if so, how? That should happen essentially for free, if the kernel doesn't have any better use for the memory --- anything read from disk once will stay in kernel disk cache. Perhaps you need to take a closer look at your kernel VM parameters. Or maybe you don't have enough RAM yet for both the DB contents and the processes you need to run. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and 8.0.7
Rafael Martinez Guerrero [EMAIL PROTECTED] writes: I have a sql statement that takes 108489.780 ms with 8.0.7 in a RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz / 8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with 2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM. I think you've discovered a planner regression. Simplified test case using the regression database: explain select * from tenk1 a, tenk1 b where (a.ten = b.ten and (a.unique1 = 100 or a.unique1 = 101)) or (a.hundred = b.hundred and a.unique1 = 42); 7.4: Nested Loop (cost=0.00..2219.74 rows=4 width=488) Join Filter: (((outer.hundred = inner.hundred) OR (outer.ten = inner.ten)) AND ((outer.unique1 = 42) OR (outer.ten = inner.ten)) AND ((outer.hundred = inner.hundred) OR (outer.unique1 = 100) OR (outer.unique1 = 101))) - Index Scan using tenk1_unique1, tenk1_unique1, tenk1_unique1 on tenk1 a (cost=0.00..18.04 rows=3 width=244) Index Cond: ((unique1 = 42) OR (unique1 = 100) OR (unique1 = 101)) - Seq Scan on tenk1 b (cost=0.00..458.24 rows=10024 width=244) (5 rows) 8.0: Nested Loop (cost=810.00..6671268.00 rows=2103 width=488) Join Filter: (((outer.ten = inner.ten) AND ((outer.unique1 = 100) OR (outer.unique1 = 101))) OR ((outer.hundred = inner.hundred) AND (outer.unique1 = 42))) - Seq Scan on tenk1 a (cost=0.00..458.00 rows=1 width=244) - Materialize (cost=810.00..1252.00 rows=1 width=244) - Seq Scan on tenk1 b (cost=0.00..458.00 rows=1 width=244) (5 rows) Note the failure to pull out the unique1 conditions from the join clause and use them with the index. I didn't bother to do EXPLAIN ANALYZE; this plan obviously sucks compared to the other. 8.1: TRAP: FailedAssertion(!(!restriction_is_or_clause((RestrictInfo *) orarg)), File: indxpath.c, Line: 479) LOG: server process (PID 12201) was terminated by signal 6 server closed the connection unexpectedly Oh dear. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Loading the entire DB into RAM
On 4/7/06, Charles A. Landemaine [EMAIL PROTECTED] wrote: I have a web server with PostgreSQL and RHEL. It hosts a search engine, and each time some one makes a query, it uses the HDD Raid array. The DB is not very big, it is less than a GB. I plan to add more RAM anyway. What I'd like to do is find out how to keep the whole DB in RAM so that each time some one does a query, it doesn't use the HDD. Is it possible, if so, how? don't bother. If your database is smaller than ram on the box, the operating will cache it quite effectively. All you should be worrying about is to set fsync=on (you care about your data) or off (you don't). If your data is truly static you might get better performance out of a in-process data storage, like sqlite for example. Merlin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Loading the entire DB into RAM
** This has not been tested. Create a ramdisk of required size Create a Linux software RAID mirror between the ramdisk, and a partition of the same size. Mark the physical-disk as write-mostly (reads will go to the ramdisk) Format it and load data... On reboot you'll get a RAID1 mirror with 1 failed drive (because the ramdisk is dead of course). Just recreate the ramdisk and resync. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and 8.0.7
I wrote: Rafael Martinez Guerrero [EMAIL PROTECTED] writes: I have a sql statement that takes 108489.780 ms with 8.0.7 in a RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz / 8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with 2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM. I think you've discovered a planner regression. Simplified test case using the regression database: explain select * from tenk1 a, tenk1 b where (a.ten = b.ten and (a.unique1 = 100 or a.unique1 = 101)) or (a.hundred = b.hundred and a.unique1 = 42); I've repaired the assertion crash in 8.1/HEAD, but I don't think it's practical to teach 8.0 to optimize queries like this nicely. The reason 7.4 can do it is that 7.4 forces the WHERE condition into CNF, ie (a.hundred = b.hundred OR a.ten = b.ten) AND (a.unique1 = 42 OR a.ten = b.ten) AND (a.hundred = b.hundred OR a.unique1 = 100 OR a.unique1 = 101) AND (a.unique1 = 42 OR a.unique1 = 100 OR a.unique1 = 101) from which it's easy to extract the index condition for A. We decided that forcing to CNF wasn't such a hot idea, so 8.0 and later don't do it, but 8.0's logic for extracting index conditions from joinquals isn't up to the problem of handling sub-ORs. Fixing that looks like a larger change than I care to back-patch into an old release. My recommendation is to update to 8.1.4 when it comes out. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Bing-bong, passenger announcement.. the panic train is now pulling into platform 8.1.3. Bing-bong. =) OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650 (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot up and our website is next to unusable. The IBM is not swapping (not with 16GB of RAM!), disk i/o is low, but there must be something critically wrong for this monster to be performing so badly.. There is little IO (maybe 500KB/sec), but the CPUs are often at 100% usage. VACUUM VERBOSE ANALYZE shows me 4 page slots are needed to track all free space. I have 16 page slots configured, and this machine is dedicated to pg. The thing that really winds me up about this, is that aside from all the normal 'my postgres is slow l0lz!' troubleshooting is the previous machine (Debian sarge on four 3GHz Xeons) is using 8.1.3 also, with an inferior I/O subsystem, and it churns through the workload very merrily, only reaching a full loadavg of 4 at peak times, and running our main 'hotelsearch' function in ~1000ms.. This IBM on the other hand is often taking 5-10 seconds to do the same thing - although just by watching the logs it's clear to see the workload coming in waves, and then calming down again. (this correlation is matched by watching the load-balancer's logs as it takes unresponsive webservers out of the cluster) Here's the differences (I've removed obvious things like file/socket paths) in select name,setting from pg_catalog.pg_settings between the two: --- cayenne 2006-04-07 18:43:48.0 +0100 # quad xeon +++ jalapeno2006-04-07 18:44:08.0 +0100 # ibm 650 - effective_cache_size| 32 + effective_cache_size| 64 - integer_datetimes | on + integer_datetimes | off - maintenance_work_mem| 262144 + maintenance_work_mem| 1048576 - max_connections | 150 + max_connections | 100 - max_fsm_pages | 66000 + max_fsm_pages | 16 - max_stack_depth | 2048 + max_stack_depth | 16384 - tcp_keepalives_count| 0 - tcp_keepalives_idle | 0 - tcp_keepalives_interval | 0 - temp_buffers| 1000 - TimeZone| GB + tcp_keepalives_count| 8 + tcp_keepalives_idle | 7200 + tcp_keepalives_interval | 75 + temp_buffers| 4000 + TimeZone| GMT0BST,M3.5.0,M10.5.0 - wal_sync_method | fdatasync - work_mem| 4096 + wal_sync_method | open_datasync + work_mem| 16384 So, jalapeno really should have much more room to move. shared_buffers is 6 on both machines. I'm reaching the end of my tether here - our search functions are just so extensive and my pg knowledge is so small that it's overwhelming to try and step through it to find any bottlenecks :( Just to reiterate, it all runs great on cayenne since we trimmed a lot of the fat out of the search, and I can't understand why the IBM box isn't absolutely throwing queries out the door :) Cheers, Gavin. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Gavin Hamill [EMAIL PROTECTED] writes: OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650 (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot up and our website is next to unusable. The IBM is not swapping (not with 16GB of RAM!), disk i/o is low, but there must be something critically wrong for this monster to be performing so badly.. Have you vacuumed/analyzed since reloading your data? Compare some EXPLAIN ANALYZE outputs for identical queries on the two machines, that usually helps figure out what's wrong. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 2006-04-07 at 12:58, Gavin Hamill wrote: Bing-bong, passenger announcement.. the panic train is now pulling into platform 8.1.3. Bing-bong. =) OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650 (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot up and our website is next to unusable. The IBM is not swapping (not with 16GB of RAM!), disk i/o is low, but there must be something critically wrong for this monster to be performing so badly.. There is little IO (maybe 500KB/sec), but the CPUs are often at 100% usage. Can you test your AIX box with linux on it? It may well be that something in AIX is causing this performance problem. I know that on the same SPARC hardware, a postgresql database is 2 or more times faster on top of linux or BSD than it is on solaris, at least it was back a few years ago when I tested it. Are the same queries getting the same basic execution plan on both boxes? Turn on logging for slow queries, and explain analyze them on both machines to see if they are. If they aren't, figure out why. I'd put the old 4 way Xeon back in production and do some serious testing of this pSeries machine. IBM should be willing to help you, I hope. My guess is that this is an OS issue. Maybe there are AIX tweaks that will get it up to the same or higher level of performance as your four way xeon. Maybe there aren't. Myself, I'd throw a spare drive in for the OS, put some flavor of linux on it http://www-1.ibm.com/partnerworld/pwhome.nsf/weblook/pat_linux_learn_why_power.html and do some load testing there. If the machine can't perform up to snuff with the same basic OS and a similar setup to your Xeon, send it back to IBM and buy one of these: http://www.asaservers.com/system_dept.asp?dept_id=SD-002 or something similar. I can't imagine it costing more than a pSeries. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and
On Fri, 2006-04-07 at 13:36 -0400, Tom Lane wrote: I wrote: Rafael Martinez Guerrero [EMAIL PROTECTED] writes: I have a sql statement that takes 108489.780 ms with 8.0.7 in a RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz / 8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with 2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM. I think you've discovered a planner regression. Simplified test case using the regression database: explain select * from tenk1 a, tenk1 b where (a.ten = b.ten and (a.unique1 = 100 or a.unique1 = 101)) or (a.hundred = b.hundred and a.unique1 = 42); I've repaired the assertion crash in 8.1/HEAD, but I don't think it's practical to teach 8.0 to optimize queries like this nicely. The reason 7.4 can do it is that 7.4 forces the WHERE condition into CNF, ie [..] Tom, thank you very much for your help. As I suspected this was a more complicated problem than the configuration of some parameters :( . Good that we have found out this now and not after the upgrade. All our upgrade plans and testing for all our databases have been done for/with 8.0.x (yes, I know 8.1.x is much better, but I am working in a conservative place from the sysadm point of view). We will have to change our plans and go for 8.1 if we want this to work. My recommendation is to update to 8.1.4 when it comes out. Any idea about when 8.1.4 will be released? Thanks again. -- Rafael Martinez, [EMAIL PROTECTED] Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] bad performance on Solaris 10
Ok, so I did a few runs for each of the sync methods, keeping all the rest constant and got this: open_datasync 0.7 fdatasync 4.6 fsync 4.5 fsync_writethrough not supported open_sync 0.6 in arbitrary units - higher is faster. Quite impressive! Chris, Just to make sure the x4100 config is similar to your Linux system, can you verify the default setting for disk write cache and make sure they are both enabled or disabled. Here's how to check in Solaris. As root, run format -e - pick a disk - cache - write_cache - display Not sure how to do it on Linux though! Regards, -Robert I don't have access to the machine for the next few days due to eh... let's call it firewall accident ;), but it might very well be that it was off on the x4100 (I know it's on the smaller Linux box). That together with the bad default sync method can definitely explain the strangely slow out of box performance I got. So thanks again for explaining this to me :) Bye, Chris. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 07 Apr 2006 14:41:39 -0400 Tom Lane [EMAIL PROTECTED] wrote: Gavin Hamill [EMAIL PROTECTED] writes: OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650 (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot up and our website is next to unusable. The IBM is not swapping (not with 16GB of RAM!), disk i/o is low, but there must be something critically wrong for this monster to be performing so badly.. Have you vacuumed/analyzed since reloading your data? Absolutely - a VACUUM FULL was the first thing I did, and have VACUUM ANALYZE VERBOSE'd a couple of times since. I have plenty of overhead to keep the entire free space map in RAM. Compare some EXPLAIN ANALYZE outputs for identical queries on the two machines, that usually helps figure out what's wrong. If only :) Since 90% of the db work is the 'hotelsearch' function (which is 350 lines-worth that I'm not permitted to share :(( ), an EXPLAIN ANALYZE reveals practically nothing: # jalapeno (IBM) laterooms=# EXPLAIN ANALYZE select * from hotelsearch(12.48333::numeric, 41.9::numeric, 5::int4, '2006-04-13'::date, 5::int4, NULL::int4, 1::int4, NULL::int4, NULL::int4, TRUE::bool, FALSE::bool, FALSE::bool, 1::int2, 'GBP'::text, 'Roma'::text, 7::int4, NULL::int4, NULL::int4) limit 500; QUERY PLAN --- Limit (cost=0.00..6.25 rows=500 width=1587) (actual time=2922.282..2922.908 rows=255 loops=1) - Function Scan on hotelsearch (cost=0.00..12.50 rows=1000 width=1587) (actual time=2922.277..2922.494 rows=255 loops=1) Total runtime: 2923.296 ms (3 rows) # cayenne (xeon) laterooms=# EXPLAIN ANALYZE select * from hotelsearch(12.48333::numeric, 41.9::numeric, 5::int4, '2006-04-13'::date, 5::int4, NULL::int4, 1::int4, NULL::int4, NULL::int4, TRUE::bool, FALSE::bool, FALSE::bool, 1::int2, 'GBP'::text, 'Roma'::text, 7::int4, NULL::int4, NULL::int4) limit 500; QUERY PLAN --- Limit (cost=0.00..6.25 rows=500 width=1587) (actual time=1929.483..1930.103 rows=255 loops=1) - Function Scan on hotelsearch (cost=0.00..12.50 rows=1000 width=1587) (actual time=1929.479..1929.693 rows=255 loops=1) Total runtime: 1930.506 ms (3 rows) The 'LIMIT 500' is a red herring since the function body will get all data, so reducing the LIMIT in the call to hotelsearch doesn't reduce the amount of work being done. The killer in it all is tail'ing the postgres log (which I have set only to log queries at 1000ms or up) is things will be returning at 1000-2000ms.. then suddenly shoot up to 8000ms.. and if I try a few of those 8000ms queries on the xeon box, they exec in ~1500ms.. and if I try them again a few moments later on the ibm, they'll also exec in maybe ~2500ms. This is one hell of a moving target and I can't help but think I'm just missing something that's right in front of my nose, too close to see. Cheers, Gavin. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 07 Apr 2006 13:54:21 -0500 Scott Marlowe [EMAIL PROTECTED] wrote: Are the same queries getting the same basic execution plan on both boxes? Turn on logging for slow queries, and explain analyze them on both machines to see if they are. See reply to Tom Lane :) I'd put the old 4 way Xeon back in production and do some serious testing of this pSeries machine. IBM should be willing to help you, I hope. They probably would if this had been bought new - as it is, we have rented the machine for a month from a 2nd-user dealer to see if it's capable of taking the load. I'm now glad we did this. My guess is that this is an OS issue. Maybe there are AIX tweaks that will get it up to the same or higher level of performance as your four way xeon. Maybe there aren't. The pSeries isn't much older than our Xeon machine, and I expected the performance level to be exemplary out of the box.. we've enabled the 64-bit kernel+userspace, and compiled pg for 64-bitness with the gcc flags as reccommended by Senica Cunningham on this very list.. Myself, I'd throw a spare drive in for the OS, put some flavor of linux on it Terrifying given I know nothing about the pSeries boot system, but at this stage I'm game for nearly anything. http://www.asaservers.com/system_dept.asp?dept_id=SD-002 Multi-Opteron was the other thing we considered but decided to give 'Big Iron' UNIX a whirl... Cheers, Gavin. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Gavin Hamill [EMAIL PROTECTED] writes: Scott Marlowe [EMAIL PROTECTED] wrote: My guess is that this is an OS issue. Maybe there are AIX tweaks that will get it up to the same or higher level of performance as your four way xeon. Maybe there aren't. The pSeries isn't much older than our Xeon machine, and I expected the performance level to be exemplary out of the box.. I'm fairly surprised too. One thing I note from your comparison of settings is that the default WAL sync method is different on the two operating systems. If the query load is update-heavy then it would be very worth your while to experiment with the sync method. However, if the bottleneck is pure-SELECT transactions then WAL sync should not be a factor at all. Does AIX have anything comparable to oprofile or dtrace? It'd be interesting to try to monitor things at that level and see what we can learn. Failing a low-level profiler, there should at least be something comparable to strace --- you should try watching some of the backends with strace and see what their behavior is when the performance goes south. Lots of delaying select()s or semop()s would be a red flag. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 7 Apr 2006 20:59:19 +0100 Gavin Hamill [EMAIL PROTECTED] wrote: I'd put the old 4 way Xeon back in production and do some serious testing of this pSeries machine. IBM should be willing to help you, I hope. They probably would if this had been bought new - as it is, we have rented the machine for a month from a 2nd-user dealer to see if it's capable of taking the load. I'm now glad we did this. We also had problems with a high end AIX system and we got no help from IBM. They expected you to put Oracle on and if you used anything else you were on your own. We had exactly the same issue. We expected to get an order of magnitude improvement and instead the app bogged down. It also got worse over time. We had to reboot every night to get anything out of it. Needless to say, they got their system back. My guess is that this is an OS issue. Maybe there are AIX tweaks that will get it up to the same or higher level of performance as your four way xeon. Maybe there aren't. The pSeries isn't much older than our Xeon machine, and I expected the performance level to be exemplary out of the box.. we've enabled the 64-bit kernel+userspace, and compiled pg for 64-bitness with the gcc flags as reccommended by Senica Cunningham on this very list.. That's Seneca. We found that our money was better spent on multiple servers running NetBSD with a home grown multi-master replication system. Need more power? Just add more servers. -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 2006-04-07 at 14:59, Gavin Hamill wrote: On Fri, 07 Apr 2006 13:54:21 -0500 Scott Marlowe [EMAIL PROTECTED] wrote: Are the same queries getting the same basic execution plan on both boxes? Turn on logging for slow queries, and explain analyze them on both machines to see if they are. See reply to Tom Lane :) I didn't see one go by yet... Could be sitting in the queue. They probably would if this had been bought new - as it is, we have rented the machine for a month from a 2nd-user dealer to see if it's capable of taking the load. I'm now glad we did this. Thank god. I had a picture of you sitting on top of a brand new very expensive pSeries Let us know if changing the fsync setting helps. Hopefully that's all the problem is. Off on a tangent. If the aggregate memory bandwidth of the pSeries is no greater than you Xeon you might not see a big improvement if you were memory bound before. If you were CPU bound, you may or may not see an improvement. Can you describe the disc subsystems in the two machines for us? What kind of read / write load you have? It could be the older box was running on IDE drives with fake fsync responses which would lie, be fast, but not reliable in case of a power outage. Do you have hardware RAID for your pSeries? how many discs, how much battery backed cache, etc? Multi-Opteron was the other thing we considered but decided to give 'Big Iron' UNIX a whirl... It still might be a good choice, if it's a simple misconfiguration issue. But man, those new multiple core opterons can make some impressive machines for very little money. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Gavin Hamill [EMAIL PROTECTED] writes: This is one hell of a moving target and I can't help but think I'm just missing something that's right in front of my nose, too close to see. I'm assuming you compiled postgres yourself? Do you have the output from the configure script? I'm wondering if it failed to find a good spinlock match for the architecture. Not sure if that's really likely but it's a possibility. Also, I'm pretty sure IBM has tools that would let you disable some of the processors to see if maybe it's a shared memory bus issue. If I understand you right the machine isn't in production yet? In which case I would get timing information for a single processor, two processors, four processors, and eight processors. If you see it max out and start dropping then that would point towards a hardware/low level postgres issue like spinlocks or shared memory rather than a high level database issue like stats. -- greg ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Gavin Hamill wrote: Bing-bong, passenger announcement.. the panic train is now pulling into platform 8.1.3. Bing-bong. =) OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650 (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot up and our website is next to unusable. The IBM is not swapping (not I would say running _one_ query at a time depends on the power of _one_ cpu. PPCs aren't that fast, I'd say they are slower than Xeons. Moreover I'm sure that AMD Opterons are faster than Xeons. I'd say you should go and test an opteron-based configuration. You'll get much more power for the same (much likely for less) money. -- Üdvözlettel, Gábriel Ákos -=E-Mail :[EMAIL PROTECTED]|Web: http://www.i-logic.hu=- -=Tel/fax:+3612367353|Mobil:+36209278894=- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 7 Apr 2006 16:16:02 -0400 D'Arcy J.M. Cain darcy@druid.net wrote: We also had problems with a high end AIX system and we got no help from IBM. They expected you to put Oracle on and if you used anything else you were on your own. Urk, I thought IBM were supposedly Linux sycophants thesedays... We had exactly the same issue. We expected to get an order of magnitude improvement and instead the app bogged down. That's kind of encouraging, I suppose - that it might not be something mind-bogglingly stupid I'm doing. It also got worse over time. We had to reboot every night to get anything out of it. Needless to say, they got their system back. nod That's Seneca. Oops - meant to check the spelling before I sent that =) We found that our money was better spent on multiple servers running NetBSD with a home grown multi-master replication system. Need more power? Just add more servers. Aye, I originally suggested multiple servers, but was talked round to one giant db so that our devels didn't have to rewrite code to deal with read/write + read-only db handles... Cheers, Gavin. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 07 Apr 2006 15:24:18 -0500 Scott Marlowe [EMAIL PROTECTED] wrote: See reply to Tom Lane :) I didn't see one go by yet... Could be sitting in the queue. If it's not arrived by now - EXPLAIN ANALYZE doesn't tell me anything :) Let us know if changing the fsync setting helps. Hopefully that's all the problem is. fsync's already off - yes a bit scary, but our I/O is only about 500KB/sec writing.. the whole db fits in RAM / kernel disk cache, and I'd rather have performance than security at this exact moment.. Off on a tangent. If the aggregate memory bandwidth of the pSeries is no greater than you Xeon you might not see a big improvement if you were memory bound before. If you were CPU bound, you may or may not see an improvement. I did look into the specs of the system, and the memory bw on the pSeries was /much/ greater than the Xeon - it's one of the things that really pushed me towards it in the end. I forget the figures, but it was 3 or 4 times greater. Can you describe the disc subsystems in the two machines for us? What kind of read / write load you have? It could be the older box was running on IDE drives with fake fsync responses which would lie, be fast, but not reliable in case of a power outage. Again, I'm confident that I/O's not the killer here.. the Xeon is a Dell 6850- hardware RAID1.. SCSI drives. Multi-Opteron was the other thing we considered but decided to give 'Big Iron' UNIX a whirl... It still might be a good choice, if it's a simple misconfiguration issue. But man, those new multiple core opterons can make some impressive machines for very little money. So I see - we could buy two quad-opterons for the cost of renting this pSeries for a month Cheers, Gavin. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 07 Apr 2006 16:06:02 -0400 Tom Lane [EMAIL PROTECTED] wrote: The pSeries isn't much older than our Xeon machine, and I expected the performance level to be exemplary out of the box.. I'm fairly surprised too. One thing I note from your comparison of settings is that the default WAL sync method is different on the two operating systems. We're very read-focussed.. there's update activity, sure, but the IO is only pushing about 500KByte/sec on average, usually much less. I also have fsync switched off - yes dangerous, but I just want to eliminate IO completely as a contributing factor. Does AIX have anything comparable to oprofile or dtrace? I've used neither on Linux, but a quick google showed up a few articles along the lines of 'in theory it shouldn't be hard to port to AIX' but nothing concrete. My guess is IBM sell a tool to do this. Hell, the C++ compiler is £1200... (hence our use of GCC 4.1 to compile pg) Failing a low-level profiler, there should at least be something comparable to strace --- you should try watching some of the backends with strace and see what their behavior is when the performance goes south. Lots of delaying select()s or semop()s would be a red flag. There's truss installed which seems to do the same as strace on Linux... and here's a wildly non-scientific glance.. I watched the 'topas' output (top for AIX) , identified a PID that was doing a lot of work, then attached truss to that pid. In addition to lots of send (), recv() and lseek()s... about once a minute I saw hundreds of calls to __semop() interspersed with _select(), followed by tons of lseek() +kread()+__semop() and then I can see the kwrite() to the pg logfile 246170: kwrite(2, L O G : d u, 8)= 8 etc. Cheers, Gavin. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Title: Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow. Gavin, On 4/7/06 2:24 PM, Gavin Hamill [EMAIL PROTECTED] wrote: I did look into the specs of the system, and the memory bw on the pSeries was /much/ greater than the Xeon - it's one of the things that really pushed me towards it in the end. I forget the figures, but it was 3 or 4 times greater. >From the literature at: http://www-03.ibm.com/servers/eserver/pseries/hardware/midrange/p650_desc.html The pSeries 650 features a peak aggregate memory to L3 cache bandwidth of 25.6GB/second for an 8way configuration. In addition, aggregate I/O bandwidth is up to 16GB/second. The result is a remarkable combination of system architecture, speed and power that delivers efficient and cost-effective data sharing and application throughput. Thats a total of 25.6GB/s for 8 CPUs, or 3.2GB/s per CPU. 3GHz P4 Xeons typically have an 800MHz memory bus with double the speed at 6.4GB/s result (800MHz x 8 bytes per L2 cache line = 6.4GB/s). Furthermore, the speed at which the P4 Xeon can load data into L2 cache from memory is matched to the bus because the L2 cache line width is 8 bytes wide and can stream data to L2 at full bus speed. That said, I find typical memory bandwidth for the P4 in applications is limited at about 2GB/s. See here for more detail: http://www.cs.virginia.edu/stream/standard/Bandwidth.html In fact, looking at the results there, the IBM 650m2 only gets 6GB/s on all 8 CPUs. I wouldnt be surprised if the strange L3 cache architecture of the IBM 650 is holding it back from streaming memory access efficiently. Whether this has anything to do with your problem or not, I have no idea! - Luke
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Gavin Hamill [EMAIL PROTECTED] writes: There's truss installed which seems to do the same as strace on Linux... and here's a wildly non-scientific glance.. I watched the 'topas' output (top for AIX) , identified a PID that was doing a lot of work, then attached truss to that pid. In addition to lots of send (), recv() and lseek()s... Those are good, they represent real work getting done. about once a minute I saw hundreds of calls to __semop() interspersed with _select(), This is not good. Did the semop storms coincide with visible slowdown? (I'd assume so, but you didn't actually say...) regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Luke Lonergan [EMAIL PROTECTED] writes: That said, I find typical memory bandwidth for the P4 in applications is limited at about 2GB/s. See here for more detail: http://www.cs.virginia.edu/stream/standard/Bandwidth.html In fact, looking at the results there, the IBM 650m2 only gets 6GB/s on all 8 CPUs. I wouldn't be surprised if the strange L3 cache architecture of the IBM 650 is holding it back from streaming memory access efficiently. Given Gavin's latest report, I'm wondering how much the IBM slows down when a spinlock operation is involved. If the memory architecture isn't good about supporting serialized access to memory, that gaudy sounding bandwidth number might have little to do with PG's real-world behavior. On the other hand, we already know that Xeons suck about as badly as can be on that same measure; could the pSeries really be worse? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Tom, On 4/7/06 3:02 PM, Tom Lane [EMAIL PROTECTED] wrote: On the other hand, we already know that Xeons suck about as badly as can be on that same measure; could the pSeries really be worse? I wouldn't be too surprised, but it sounds like it needs a test. Do we have a test for this? Is there a contention-prone query stream that we can think up? - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Luke Lonergan [EMAIL PROTECTED] writes: On 4/7/06 3:02 PM, Tom Lane [EMAIL PROTECTED] wrote: On the other hand, we already know that Xeons suck about as badly as can be on that same measure; could the pSeries really be worse? I wouldn't be too surprised, but it sounds like it needs a test. Do we have a test for this? Is there a contention-prone query stream that we can think up? If you want you could install a pre-8.1 PG and then try one of the queries that we were using as test cases a year ago for spinlock investigations. I don't recall details right now but I remember having posted a pretty trivial test case that would send a multiprocessor machine into context-swap storm, which sounds a whole lot like what Gavin is seeing. I think that 8.1 ought to be relatively free of buffer-manager spinlock contention, which is why I doubt that test case would be interesting against 8.1. The interesting question is what else is he seeing contention for, if it's not the BufMgrLock? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 07 Apr 2006 17:56:49 -0400 Tom Lane [EMAIL PROTECTED] wrote: This is not good. Did the semop storms coincide with visible slowdown? (I'd assume so, but you didn't actually say...) If I'd been able to tell, then I'd tell you =) I'll have another go... Yes, there's a definate correlation here.. I attached truss to the main postmaster.. $ truss -Ff -p 340344 21 | grep semop here's a snippet 278774: __semop(15728650, 0x0FFF7E80, 1)= 0 155712: __semop(15728650, 0x0FFF5920, 1)= 0 278774: __semop(15728649, 0x0FFF6F10, 1) 114914: __semop(15728649, 0x0FFF6A40, 1)= 0 = 0 114914: __semop(15728650, 0x0FFF61E0, 1) 155712: __semop(15728650, 0x0FFF6850, 1)= 0 = 0 155712: __semop(15728650, 0x0FFF6890, 1)= 0 1 55712: __semop(15728650, 0x0FFF5920, 1) 278774: __semop(15728650, 0x0FFF6F10, 1) 155712: __semop(15728650, 0x0FFF6850, 1)= 0 = 0 278774: __semop(15728649, 0x0FFF7E40, 1) 114914: __semop(15728649, 0x0FFF6A80, 1)= 0 = 0 278774: __semop(15728650, 0x0FFF7E80, 1) And when I saw a flood of semop's for any particular PID, a second later in the 'topas' process list would show that PID at a 100% CPU ... Most intriguing :) Cheers, Gavin. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Gavin Hamill [EMAIL PROTECTED] writes: On Fri, 07 Apr 2006 17:56:49 -0400 Tom Lane [EMAIL PROTECTED] wrote: This is not good. Did the semop storms coincide with visible slowdown? (I'd assume so, but you didn't actually say...) Yes, there's a definate correlation here.. I attached truss to the main postmaster.. ... And when I saw a flood of semop's for any particular PID, a second later in the 'topas' process list would show that PID at a 100% CPU ... So apparently we've still got a problem with multiprocess contention for an LWLock somewhere. It's not the BufMgrLock because that's gone in 8.1. It could be one of the finer-grain locks that are still there, or it could be someplace else. Are you in a position to try your workload using PG CVS tip? There's a nontrivial possibility that we've already fixed this --- a couple months ago I did some work to reduce contention in the lock manager: 2005-12-11 16:02 tgl * src/: backend/access/transam/twophase.c, backend/storage/ipc/procarray.c, backend/storage/lmgr/README, backend/storage/lmgr/deadlock.c, backend/storage/lmgr/lock.c, backend/storage/lmgr/lwlock.c, backend/storage/lmgr/proc.c, include/storage/lock.h, include/storage/lwlock.h, include/storage/proc.h: Divide the lock manager's shared state into 'partitions', so as to reduce contention for the former single LockMgrLock. Per my recent proposal. I set it up for 16 partitions, but on a pgbench test this gives only a marginal further improvement over 4 partitions --- we need to test more scenarios to choose the number of partitions. This is unfortunately not going to help you as far as getting that machine into production now (unless you're brave enough to run CVS tip as production, which I certainly am not). I'm afraid you're most likely going to have to ship that pSeries back at the end of the month, but while you've got it it'd be awfully nice if we could use it as a testbed ... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Gavin, On 4/7/06 3:27 PM, Gavin Hamill [EMAIL PROTECTED] wrote: 278774: __semop(15728650, 0x0FFF7E80, 1)= 0 155712: __semop(15728650, 0x0FFF5920, 1)= 0 278774: __semop(15728649, 0x0FFF6F10, 1) 114914: __semop(15728649, 0x0FFF6A40, 1)= 0 = 0 114914: __semop(15728650, 0x0FFF61E0, 1) 155712: __semop(15728650, 0x0FFF6850, 1)= 0 = 0 155712: __semop(15728650, 0x0FFF6890, 1)= 0 1 55712: __semop(15728650, 0x0FFF5920, 1) 278774: __semop(15728650, 0x0FFF6F10, 1) 155712: __semop(15728650, 0x0FFF6850, 1)= 0 = 0 278774: __semop(15728649, 0x0FFF7E40, 1) 114914: __semop(15728649, 0x0FFF6A80, 1)= 0 = 0 278774: __semop(15728650, 0x0FFF7E80, 1) Seems like you're hitting a very small target in RAM with these semop calls. I wonder what part of the code is doing this - Tom would know better how to trace it, but the equivalent of oprofile output would be nice. The other thing that I'd like to see is an evaluation of the memory access latency of this machine from Register to RAM. I couldn't find a benchmarking tool that was UNIX friendly out there, maybe I'll write one real quick. I suspect this machine has a heinous latency and a storm of semops to the same spot of RAM might be a far worse performance problem on this machine than on others... - Luke ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Luke Lonergan [EMAIL PROTECTED] writes: On 4/7/06 3:27 PM, Gavin Hamill [EMAIL PROTECTED] wrote: 278774: __semop(15728650, 0x0FFF7E80, 1)= 0 155712: __semop(15728650, 0x0FFF5920, 1)= 0 278774: __semop(15728649, 0x0FFF6F10, 1) Seems like you're hitting a very small target in RAM with these semop calls. IIRC the address passed to semop() in our code is always a local struct on the stack, so that's a bit of a red herring --- there won't be cross-processor contention for that. It's plausible though that we are seeing contention across members of the LWLock array, with the semop storm just being a higher-level symptom of the real hardware-level problem. You might try increasing LWLOCK_PADDED_SIZE to 64 or even 128, see src/backend/storage/lmgr/lwlock.c (this is something that does exist in 8.1, so it'd be easy to try). regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 07 Apr 2006 18:52:20 -0400 Tom Lane [EMAIL PROTECTED] wrote: Are you in a position to try your workload using PG CVS tip? There's a nontrivial possibility that we've already fixed this --- a couple months ago I did some work to reduce contention in the lock manager: Well, there's a question. At the moment it's still live - but I'll need to swap back to the Xeon machine since I can't afford to have a Saturday with the db firing on three cylinders (out of eight :) At that point you're welcome to twiddle, compile, throw anything you want at it. If it helps us as much as the greater pg world, then that's perfect. This is unfortunately not going to help you as far as getting that machine into production now (unless you're brave enough to run CVS tip as production, which I certainly am not). .. if the problem can actually be boiled down to the locking/threading issues, surely it should be straightforward to backport those changes to 8.1.3 mainline? I'm afraid you're most likely going to have to ship that pSeries back at the end of the month, but while you've got it it'd be awfully nice if we could use it as a testbed ... We have it for the next 2 weeks, and whilst I can't guarantee access for all that time, you're welcome to hammer away at it over this weekend if that's any help? Mail me privately and I'll sort out login details if this is interesting. Cheers, Gavin. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 07 Apr 2006 15:56:52 -0700 Luke Lonergan [EMAIL PROTECTED] wrote: Seems like you're hitting a very small target in RAM with these semop calls. I wonder what part of the code is doing this - Tom would know better how to trace it, but the equivalent of oprofile output would be nice. I'm happy to test whatever I can, but I simply don't know enough AIX to be able to tell whether a similar kernel-level profiler is available/possible. The other thing that I'd like to see is an evaluation of the memory access latency of this machine from Register to RAM. I couldn't find a benchmarking tool that was UNIX friendly out there, maybe I'll write one real quick. I suspect this machine has a heinous latency and a storm of semops to the same spot of RAM might be a far worse performance problem on this machine than on others... Well, as I said to Tom, the machine is available for running tests on :) If it helps us, and helps pg become more AIX friendly, then I'm all for whatever needs done... Cheers, Gavin. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster