Re: [PERFORM] Query planner is using wrong index.

2006-04-07 Thread Brian Herlihy
--- Tom Lane [EMAIL PROTECTED] wrote:

 Brian Herlihy [EMAIL PROTECTED] writes:
  My options seem to be
- Fudge the analysis results so that the selectivity estimate changes.  I
  have tested reducing n_distinct, but this doesn't seem to help.
- Combine the columns into one column, allowing postgres to calculate the
  combined selectivity.
- Drop the (p2, p3) index.  But I need this for other queries.
 
 Have you considered reordering the pkey to be (p2,p3,p1) and then
 dropping the (p2,p3) index?
 
   regards, tom lane

Hi Tom,

I've considered it.  Unfortunately I need to do lookups on (p1) and (p1,p2) as
well as (p1, p2, p3).

The solution I've gone with is to create an index on (p2 || '/' || p3).  This
is unique for each p2/p3 combination, because p2 cannot contain the '/'
character.  I'm assuming that this index will be no slower to generate than one
on (p2, p3), as concatenation is very cheap.  Having the index on an expression
hides it from the optimizer, which is then forced to use the primary key
instead.

It works perfectly now!  There were only 2 queries in the system which need
this index, so it was no problem to change them.

Thankyou very much for all your time and patience!

Before I go, I have a question - From discussions on the Postgresql irc
channel, and from reading the TODO list on the website, I am under the
impression that there are no plans to allow optimizer hints, such as use index
table_pkey.  Is this really true?  Such a feature would make life inestimably
easier for your end-users, particularly me :)

Thanks,
Brian

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Same SQL, 104296ms of difference between 7.4.12 and 8.0.7

2006-04-07 Thread Rafael Martinez Guerrero
Hello

I have a sql statement that takes 108489.780 ms with 8.0.7 in a
RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz /
8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with
2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM.

Some information:

- There is no IO when I am running the sql, but it uses 99% of the cpu. 
- I run VACUUM VERBOSE ANALYZE in both databases before the test.
- The databases are identical.
- No other jobs running when testing.
- Some different parameters between 7.4.12 and 8.0.7 :

7.4.12:
---
shared_buffers = 114966   #(15% of ram)  
sort_mem = 16384
vacuum_mem = 524288 
wal_buffers = 64 
checkpoint_segments = 16   
effective_cache_size = 383220   #(50% ram)
random_page_cost = 3
default_statistics_target = 100 

8.0.7:
--
shared_buffers = 250160   #(25% ram)
work_mem = 8192
maintenance_work_mem = 131072  
wal_buffers = 128 
checkpoint_segments = 64   
effective_cache_size = 500321  #(50% ram)
random_page_cost = 3
default_statistics_target = 100
 
Any ideas of what I can test/configurate to find out why this happens?
Thanks in advance.


***
With 7.4.12
***
rtprod=# explain analyze SELECT DISTINCT main.* FROM Users main ,
Principals Principals_1, ACL ACL_2, Groups Groups_3, CachedGroupMembers
CachedGroupMembers_4  WHERE ((ACL_2.RightName = 'OwnTicket')) AND
((CachedGroupMembers_4.MemberId = Principals_1.id)) AND ((Groups_3.id =
CachedGroupMembers_4.GroupId)) AND ((Principals_1.Disabled =
'0')OR(Principals_1.Disabled = '0')) AND ((Principals_1.id != '1')) AND
((main.id = Principals_1.id)) AND  ( (ACL_2.PrincipalId =
Groups_3.id AND ACL_2.PrincipalType = 'Group' AND (   Groups_3.Domain =
'SystemInternal' OR Groups_3.Domain = 'UserDefined' OR Groups_3.Domain =
'ACLEquivalence')) OR ( ( (Groups_3.Domain = 'RT::Queue-Role' )  )  AND
Groups_3.Type = ACL_2.PrincipalType) )  AND (ACL_2.ObjectType =
'RT::System' OR  (ACL_2.ObjectType = 'RT::Queue') )  ORDER BY main.Name
ASC;
 




  QUERYPLAN 



   

 Unique  (cost=40250.00..40250.09 rows=1 width=695) (actual
time=3974.528..4182.343 rows=264 loops=1)
   -  Sort  (cost=40250.00..40250.00 rows=1 width=695) (actual
time=3974.522..3992.487 rows=24697 loops=1)
 Sort Key: main.name, main.id, main.password, main.comments,
main.signature, main.emailaddress, main.freeformcontactinfo,
main.organization, main.realname, main.nickname, main.lang,
main.emailencoding, main.webencoding, main.externalcontactinfoid,
main.contactinfosystem, main.externalauthid, main.authsystem,
main.gecos, main.homephone, main.workphone, main.mobilephone,
main.pagerphone, main.address1, main.address2, main.city, main.state,
main.zip, main.country, main.timezone, main.pgpkey, main.creator,
main.created, main.lastupdatedby, main.lastupdated
 -  Nested Loop  (cost=33.67..40249.99 rows=1 width=695)
(actual time=37.793..3240.146 rows=24697 loops=1)
   -  Nested Loop  (cost=33.67..40246.95 rows=1 width=699)
(actual time=37.754..2635.812 rows=24697 loops=1)
 -  Nested Loop  (cost=33.67..40242.47 rows=1
width=4) (actual time=37.689..2091.634 rows=24755 loops=1)
   -  Nested Loop  (cost=33.67..40225.72 rows=1
width=4) (actual time=37.663..1967.388 rows=54 loops=1)
 Join Filter:
outer.domain)::text = 'RT::Queue-Role'::text) OR
(inner.principalid = outer.id)) AND (((outer.type)::text =
(inner.principaltype)::text) OR (inner.principalid = outer.id))
AND (((outer.domain)::text = 'RT::Queue-Role'::text) OR
((inner.principaltype)::text = 'Group'::text)) AND
(((outer.type)::text = (inner.principaltype)::text) OR
((inner.principaltype)::text = 'Group'::text)) AND
(((outer.type)::text = (inner.principaltype)::text) OR
((outer.domain)::text = 'SystemInternal'::text) OR
((outer.domain)::text = 'UserDefined'::text) OR
((outer.domain)::text = 'ACLEquivalence'::text)))
 -  Index Scan using groups4, groups4,
groups4, groups4 on groups groups_3  (cost=0.00..2164.05 rows=15845
width=32) (actual time=0.041..43.636 rows=16160 loops=1)
   Index Cond: (((domain)::text =
'RT::Queue-Role'::text) OR ((domain)::text = 'SystemInternal'::text)
OR 

Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and

2006-04-07 Thread Richard Huxton

Rafael Martinez Guerrero wrote:

Hello

I have a sql statement that takes 108489.780 ms with 8.0.7 in a
RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz /
8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with
2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM.

Some information:

- There is no IO when I am running the sql, but it uses 99% of the cpu. 
- I run VACUUM VERBOSE ANALYZE in both databases before the test.

- The databases are identical.
- No other jobs running when testing.
- Some different parameters between 7.4.12 and 8.0.7 :

7.4.12:
---
shared_buffers = 114966   #(15% of ram)  
sort_mem = 16384
vacuum_mem = 524288 
wal_buffers = 64 
checkpoint_segments = 16   
effective_cache_size = 383220   #(50% ram)
random_page_cost = 3
default_statistics_target = 100 


8.0.7:
--
shared_buffers = 250160   #(25% ram)
work_mem = 8192
maintenance_work_mem = 131072  
wal_buffers = 128 
checkpoint_segments = 64   
effective_cache_size = 500321  #(50% ram)
random_page_cost = 3
default_statistics_target = 100
 
Any ideas of what I can test/configurate to find out why this happens?

Thanks in advance.


I haven't looked in detail at the plans, but what stands out to me is 
that you've got a sort with a lot of columns and you've halved sort_mem 
(work_mem). Try increasing it (perhaps to 32000 even).

set work_mem = 32000;

Give that a quick go and see what happens. If it doesn't work, we'll 
look at the plans in more detail.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and

2006-04-07 Thread Rafael Martinez Guerrero
On Fri, 2006-04-07 at 15:31, Richard Huxton wrote:
 Rafael Martinez Guerrero wrote:
  Hello
  
  I have a sql statement that takes 108489.780 ms with 8.0.7 in a
  RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz /
  8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with
  2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM.
  
  Some information:
  
  - There is no IO when I am running the sql, but it uses 99% of the cpu. 
  - I run VACUUM VERBOSE ANALYZE in both databases before the test.
  - The databases are identical.
  - No other jobs running when testing.
  - Some different parameters between 7.4.12 and 8.0.7 :
  
  7.4.12:
  ---
  shared_buffers = 114966   #(15% of ram)  
  sort_mem = 16384
  vacuum_mem = 524288 
  wal_buffers = 64 
  checkpoint_segments = 16   
  effective_cache_size = 383220   #(50% ram)
  random_page_cost = 3
  default_statistics_target = 100 
  
  8.0.7:
  --
  shared_buffers = 250160   #(25% ram)
  work_mem = 8192
  maintenance_work_mem = 131072  
  wal_buffers = 128 
  checkpoint_segments = 64   
  effective_cache_size = 500321  #(50% ram)
  random_page_cost = 3
  default_statistics_target = 100
   
  Any ideas of what I can test/configurate to find out why this happens?
  Thanks in advance.
 
 I haven't looked in detail at the plans, but what stands out to me is 
 that you've got a sort with a lot of columns and you've halved sort_mem 
 (work_mem). Try increasing it (perhaps to 32000 even).
   set work_mem = 32000;
 
 Give that a quick go and see what happens. If it doesn't work, we'll 
 look at the plans in more detail.

I know that this SQL could be done in a much better way, but I can not
change it at the moment. 

work_mem = 16384:
-
After restarting the database and running the explain two times:
107911.229 ms

work_mem = 32768:
-
After restarting the database and running the explain two times:
103988.337 ms


-- 
Rafael Martinez, [EMAIL PROTECTED]
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Loading the entire DB into RAM

2006-04-07 Thread Charles A. Landemaine
I have a web server with PostgreSQL and RHEL. It hosts a search
engine, and each time some one makes a query, it uses the HDD Raid
array. The DB is not very big, it is less than a GB. I plan to add
more RAM anyway.

What I'd like to do is find out how to keep the whole DB in RAM so
that each time some one does a query, it doesn't use the HDD. Is it
possible, if so, how?
Thanks,

Charles.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and

2006-04-07 Thread Richard Huxton

Rafael Martinez Guerrero wrote:

Any ideas of what I can test/configurate to find out why this happens?
Thanks in advance.
I haven't looked in detail at the plans, but what stands out to me is 
that you've got a sort with a lot of columns and you've halved sort_mem 
(work_mem). Try increasing it (perhaps to 32000 even).

set work_mem = 32000;

Give that a quick go and see what happens. If it doesn't work, we'll 
look at the plans in more detail.


I know that this SQL could be done in a much better way, but I can not
change it at the moment. 


work_mem = 16384:
-
After restarting the database and running the explain two times:
107911.229 ms

work_mem = 32768:
-
After restarting the database and running the explain two times:
103988.337 ms


Damn! I hate it when I have to actually work at a problem :-)


Well, the first thing that strikes me is that the row estimates are 
terrible for 7.4.12 (which runs quickly) and much better for 8.0.7 
(which runs slowly). Which suggests you were lucky before.


The second thing I notice is the bit that goes: Materialize ... Seq Scan 
on acl acl_2. If you compare the two you'll see that the 7.4 version 
loops 16,160 times but 8.0 loops 513,264 times.


This is a bad choice, and I'm guessing it's made because it gets the row 
estimate wrong:
Hash Join  (cost=4667.85..51078.88 rows=62852 width=727) (actual 
time=649.028..13602.451 rows=513264 loops=1)


That's the comparison Groups_3.id = CachedGroupMembers_4.GroupId if I'm 
reading this correctly. Is there anything unusual about those two columns?

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and

2006-04-07 Thread Rafael Martinez
On Fri, 2006-04-07 at 16:41 +0200, Gábriel Ákos wrote:

   
  Any ideas of what I can test/configurate to find out why this happens?
  Thanks in advance.
 
 Increase work_mem to 50% of memory, and don't care about 
 maintenance_work_mem and effective_cache_size, they don't matter in this 
 case.
 

The problem is not the amount of memory. It works much faster with only
16M and 7.4.12 than 8.0.7.

-- 
Rafael Martinez, [EMAIL PROTECTED]
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Loading the entire DB into RAM

2006-04-07 Thread Charles A. Landemaine
On 4/7/06, Matt Davies | Postgresql List [EMAIL PROTECTED] wrote:
 Out of curiosity, what are you using as the search engine?

Thank you. We designed the search engine ourself (we didn't use a
ready-to-use solution).

--
Charles A. Landemaine.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Loading the entire DB into RAM

2006-04-07 Thread Matt Davies | Postgresql List
If memory serves me correctly I have seen several posts about this in 
the past.


I'll try to recall highlights.

1. Create a md in linux sufficiently large enough to handle the data set 
you are wanting to store.

2. Create a HD based copy somewhere as your permanent storage mechanism.
3. Start up your PostgreSQL instance with the MD as the data store
4. Load your data to the MD instance.
5. Figure out how you will change indexes _and_ ensure that your disk 
storage is consistent with your MD instance.


I haven't done so, but it would be interesting to have a secondary 
database somewhere that is your primary storage. It needn't be 
especially powerful, or even available. It serves as the place to 
generate your indexing data. You could then use SLONY to propogate the 
data to the MD production system.


Of course, if you are updating your system that resides in ram, you 
should be thinking the other way. Have SLONY replicate changes to the 
other, permanent storage, system.


Either way you do it, I can't think of an out of the box method to doing 
it. Somehow one has to transfer data from permanent storage to the md 
instance, and, likewise, back to permanent storage.


Out of curiosity, what are you using as the search engine?


Charles A. Landemaine wrote:

I have a web server with PostgreSQL and RHEL. It hosts a search
engine, and each time some one makes a query, it uses the HDD Raid
array. The DB is not very big, it is less than a GB. I plan to add
more RAM anyway.

What I'd like to do is find out how to keep the whole DB in RAM so
that each time some one does a query, it doesn't use the HDD. Is it
possible, if so, how?
Thanks,

Charles.

---(end of broadcast)---
TIP 6: explain analyze is your friend


  



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Spotting planner errors (was Re: [PERFORM] Query planner is using wrong index.)

2006-04-07 Thread Richard Huxton

Tom Lane wrote:

Brian Herlihy [EMAIL PROTECTED] writes:

Before I go, I have a question - From discussions on the Postgresql irc
channel, and from reading the TODO list on the website, I am under the
impression that there are no plans to allow optimizer hints, such as use index
table_pkey.  Is this really true?


I personally don't think it's a good idea: the time spent in designing,
implementing, and maintaining a usable hint system would be significant,
and IMHO the effort is better spent on *fixing* the optimizer problems
than working around them.


Tom - does the planner/executor know it's got row estimates wrong? That 
is, if I'm not running an EXPLAIN ANALYSE is there a point at which we 
could log planner estimate for X out by factor of Y?


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: Spotting planner errors (was Re: [PERFORM] Query planner is using wrong index.)

2006-04-07 Thread Tom Lane
Richard Huxton dev@archonet.com writes:
 Tom - does the planner/executor know it's got row estimates wrong? That 
 is, if I'm not running an EXPLAIN ANALYSE is there a point at which we 
 could log planner estimate for X out by factor of Y?

Not at the moment, but you could certainly imagine changing the executor
to count rows even without EXPLAIN ANALYZE, and then complain during
plan shutdown.

Not sure how helpful that would be; there would be a lot of noise from
common cases such as executing underneath a LIMIT node.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: Spotting planner errors (was Re: [PERFORM] Query planner is using

2006-04-07 Thread Richard Huxton

Tom Lane wrote:

Richard Huxton dev@archonet.com writes:
Tom - does the planner/executor know it's got row estimates wrong? That 
is, if I'm not running an EXPLAIN ANALYSE is there a point at which we 
could log planner estimate for X out by factor of Y?


Not at the moment, but you could certainly imagine changing the executor
to count rows even without EXPLAIN ANALYZE, and then complain during
plan shutdown.

Not sure how helpful that would be; there would be a lot of noise from
common cases such as executing underneath a LIMIT node.


Hmm - thinking about it you'd probably want to record it similarly to 
stats too. It's the fact that the planner *repeatedly* gets an estimate 
wrong that's of interest.


Would it be prohibitive to total actions taken - to act as raw data for 
random_page_cost / cpu_xxx_cost? If you could get a ratio of estimated 
vs actual time vs the various page-fetches/index-fetches etc. we could 
actually plug some meaningful numbers in.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Loading the entire DB into RAM

2006-04-07 Thread Tom Lane
Charles A. Landemaine [EMAIL PROTECTED] writes:
 What I'd like to do is find out how to keep the whole DB in RAM so
 that each time some one does a query, it doesn't use the HDD. Is it
 possible, if so, how?

That should happen essentially for free, if the kernel doesn't have any
better use for the memory --- anything read from disk once will stay in
kernel disk cache.  Perhaps you need to take a closer look at your
kernel VM parameters.  Or maybe you don't have enough RAM yet for both
the DB contents and the processes you need to run.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and 8.0.7

2006-04-07 Thread Tom Lane
Rafael Martinez Guerrero [EMAIL PROTECTED] writes:
 I have a sql statement that takes 108489.780 ms with 8.0.7 in a
 RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz /
 8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with
 2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM.

I think you've discovered a planner regression.
Simplified test case using the regression database:

explain select * from tenk1 a, tenk1 b
where (a.ten = b.ten and (a.unique1 = 100 or a.unique1 = 101))
   or (a.hundred = b.hundred and a.unique1 = 42);

7.4:
 Nested Loop  (cost=0.00..2219.74 rows=4 width=488)
   Join Filter: (((outer.hundred = inner.hundred) OR (outer.ten = 
inner.ten)) AND ((outer.unique1 = 42) OR (outer.ten = inner.ten)) AND 
((outer.hundred = inner.hundred) OR (outer.unique1 = 100) OR 
(outer.unique1 = 101)))
   -  Index Scan using tenk1_unique1, tenk1_unique1, tenk1_unique1 on tenk1 a 
(cost=0.00..18.04 rows=3 width=244)
 Index Cond: ((unique1 = 42) OR (unique1 = 100) OR (unique1 = 101))
   -  Seq Scan on tenk1 b  (cost=0.00..458.24 rows=10024 width=244)
(5 rows)

8.0:
 Nested Loop  (cost=810.00..6671268.00 rows=2103 width=488)
   Join Filter: (((outer.ten = inner.ten) AND ((outer.unique1 = 100) OR 
(outer.unique1 = 101))) OR ((outer.hundred = inner.hundred) AND 
(outer.unique1 = 42)))
   -  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=1 width=244)
   -  Materialize  (cost=810.00..1252.00 rows=1 width=244)
 -  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=1 width=244)
(5 rows)

Note the failure to pull out the unique1 conditions from the join clause
and use them with the index.  I didn't bother to do EXPLAIN ANALYZE;
this plan obviously sucks compared to the other.

8.1:
TRAP: FailedAssertion(!(!restriction_is_or_clause((RestrictInfo *) orarg)), 
File: indxpath.c, Line: 479)
LOG:  server process (PID 12201) was terminated by signal 6
server closed the connection unexpectedly

Oh dear.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Loading the entire DB into RAM

2006-04-07 Thread Merlin Moncure
On 4/7/06, Charles A. Landemaine [EMAIL PROTECTED] wrote:
 I have a web server with PostgreSQL and RHEL. It hosts a search
 engine, and each time some one makes a query, it uses the HDD Raid
 array. The DB is not very big, it is less than a GB. I plan to add
 more RAM anyway.

 What I'd like to do is find out how to keep the whole DB in RAM so
 that each time some one does a query, it doesn't use the HDD. Is it
 possible, if so, how?

don't bother.

If your database is smaller than ram on the box, the operating will
cache it quite effectively.  All you should be worrying about is to
set fsync=on (you care about your data) or off (you don't).  If your
data is truly static you might get better performance out of a
in-process data storage, like sqlite for example.

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Loading the entire DB into RAM

2006-04-07 Thread PFC

** This has not been tested.

Create a ramdisk of required size
	Create a Linux software RAID mirror between the ramdisk, and a partition  
of the same size.

Mark the physical-disk as write-mostly (reads will go to the ramdisk)
Format it and load data...

	On reboot you'll get a RAID1 mirror with 1 failed drive (because the  
ramdisk is dead of course). Just recreate the ramdisk and resync.




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and 8.0.7

2006-04-07 Thread Tom Lane
I wrote:
 Rafael Martinez Guerrero [EMAIL PROTECTED] writes:
 I have a sql statement that takes 108489.780 ms with 8.0.7 in a
 RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz /
 8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with
 2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM.

 I think you've discovered a planner regression.
 Simplified test case using the regression database:

 explain select * from tenk1 a, tenk1 b
 where (a.ten = b.ten and (a.unique1 = 100 or a.unique1 = 101))
or (a.hundred = b.hundred and a.unique1 = 42);

I've repaired the assertion crash in 8.1/HEAD, but I don't think it's
practical to teach 8.0 to optimize queries like this nicely.  The reason
7.4 can do it is that 7.4 forces the WHERE condition into CNF, ie

  (a.hundred = b.hundred OR a.ten = b.ten) AND
  (a.unique1 = 42 OR a.ten = b.ten) AND
  (a.hundred = b.hundred OR a.unique1 = 100 OR a.unique1 = 101) AND
  (a.unique1 = 42 OR a.unique1 = 100 OR a.unique1 = 101)

from which it's easy to extract the index condition for A.  We decided
that forcing to CNF wasn't such a hot idea, so 8.0 and later don't do
it, but 8.0's logic for extracting index conditions from joinquals isn't
up to the problem of handling sub-ORs.  Fixing that looks like a larger
change than I care to back-patch into an old release.

My recommendation is to update to 8.1.4 when it comes out.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
Bing-bong, passenger announcement.. the panic train is now pulling into
platform 8.1.3. Bing-bong. =)

OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650
(8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot
up and our website is next to unusable. The IBM is not swapping (not
with 16GB of RAM!), disk i/o is low, but there must be something
critically wrong for this monster to be performing so badly..

There is little IO (maybe 500KB/sec), but the CPUs are often at 100%
usage.

VACUUM VERBOSE ANALYZE shows me 4 page slots are needed to track
all free space. I have 16 page slots configured, and this machine is
dedicated to pg.

The thing that really winds me up about this, is that aside from all
the normal 'my postgres is slow l0lz!' troubleshooting is the previous
machine (Debian sarge on four 3GHz Xeons) is using 8.1.3 also, with an
inferior I/O subsystem, and it churns through the workload very
merrily, only reaching a full loadavg of 4 at peak times, and running
our main 'hotelsearch' function in ~1000ms.. 

This IBM on the other hand is often taking 5-10 seconds to do the same
thing - although just by watching the logs it's clear to see the
workload coming in waves, and then calming down again. (this
correlation is matched by watching the load-balancer's logs as it takes
unresponsive webservers out of the cluster)

Here's the differences (I've removed obvious things like file/socket
paths) in select name,setting from pg_catalog.pg_settings between the
two:

--- cayenne 2006-04-07 18:43:48.0 +0100 # quad xeon
+++ jalapeno2006-04-07 18:44:08.0 +0100 # ibm 650
- effective_cache_size| 32
+ effective_cache_size| 64
- integer_datetimes   | on
+ integer_datetimes   | off
- maintenance_work_mem| 262144
+ maintenance_work_mem| 1048576
- max_connections | 150
+ max_connections | 100
- max_fsm_pages   | 66000
+ max_fsm_pages   | 16
- max_stack_depth | 2048
+ max_stack_depth | 16384
- tcp_keepalives_count| 0
- tcp_keepalives_idle | 0
- tcp_keepalives_interval | 0
- temp_buffers| 1000
- TimeZone| GB
+ tcp_keepalives_count| 8
+ tcp_keepalives_idle | 7200
+ tcp_keepalives_interval | 75
+ temp_buffers| 4000
+ TimeZone| GMT0BST,M3.5.0,M10.5.0
- wal_sync_method | fdatasync
- work_mem| 4096
+ wal_sync_method | open_datasync
+ work_mem| 16384

So, jalapeno really should have much more room to move. shared_buffers
is 6 on both machines.

I'm reaching the end of my tether here - our search functions are just
so extensive and my pg knowledge is so small that it's overwhelming to
try and step through it to find any bottlenecks :(

Just to reiterate, it all runs great on cayenne since we trimmed a lot
of the fat out of the search, and I can't understand why the IBM box
isn't absolutely throwing queries out the door :)

Cheers,
Gavin.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Tom Lane
Gavin Hamill [EMAIL PROTECTED] writes:
 OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650
 (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot
 up and our website is next to unusable. The IBM is not swapping (not
 with 16GB of RAM!), disk i/o is low, but there must be something
 critically wrong for this monster to be performing so badly..

Have you vacuumed/analyzed since reloading your data?  Compare some
EXPLAIN ANALYZE outputs for identical queries on the two machines,
that usually helps figure out what's wrong.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Scott Marlowe
On Fri, 2006-04-07 at 12:58, Gavin Hamill wrote:
 Bing-bong, passenger announcement.. the panic train is now pulling into
 platform 8.1.3. Bing-bong. =)
 
 OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650
 (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot
 up and our website is next to unusable. The IBM is not swapping (not
 with 16GB of RAM!), disk i/o is low, but there must be something
 critically wrong for this monster to be performing so badly..
 
 There is little IO (maybe 500KB/sec), but the CPUs are often at 100%
 usage.

Can you test your AIX box with linux on it?  It may well be that
something in AIX is causing this performance problem.  I know that on
the same SPARC hardware, a postgresql database is 2 or more times faster
on top of linux or BSD than it is on solaris, at least it was back a few
years ago when I tested it.

Are the same queries getting the same basic execution plan on both
boxes?  Turn on logging for slow queries, and explain analyze them on
both machines to see if they are.

If they aren't, figure out why.

I'd put the old 4 way Xeon back in production and do some serious
testing of this pSeries machine.  IBM should be willing to help you, I
hope.

My guess is that this is an OS issue.  Maybe there are AIX tweaks that
will get it up to the same or higher level of performance as your four
way xeon.  Maybe there aren't.

Myself, I'd throw a spare drive in for the OS, put some flavor of linux
on it

http://www-1.ibm.com/partnerworld/pwhome.nsf/weblook/pat_linux_learn_why_power.html

and do some load testing there.  If the machine can't perform up to
snuff with the same basic OS and a similar setup to your Xeon, send it
back to IBM and buy one of these:

http://www.asaservers.com/system_dept.asp?dept_id=SD-002

or something similar.  I can't imagine it costing more than a pSeries.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Same SQL, 104296ms of difference between 7.4.12 and

2006-04-07 Thread Rafael Martinez
On Fri, 2006-04-07 at 13:36 -0400, Tom Lane wrote:
 I wrote:
  Rafael Martinez Guerrero [EMAIL PROTECTED] writes:
  I have a sql statement that takes 108489.780 ms with 8.0.7 in a
  RHEL4/amd64linux server with 2xAMD Opteron(tm) Processor 275 2.00GHz /
  8GB RAM and only 4193.588 ms with 7.4.12 in a RHEL3/386linux server with
  2xIntel(R) Xeon(TM) CPU 2.40GHz / 4GB RAM.
 
  I think you've discovered a planner regression.
  Simplified test case using the regression database:
 
  explain select * from tenk1 a, tenk1 b
  where (a.ten = b.ten and (a.unique1 = 100 or a.unique1 = 101))
 or (a.hundred = b.hundred and a.unique1 = 42);
 
 I've repaired the assertion crash in 8.1/HEAD, but I don't think it's
 practical to teach 8.0 to optimize queries like this nicely.  The reason
 7.4 can do it is that 7.4 forces the WHERE condition into CNF, ie
 
[..]

Tom, thank you very much for your help. As I suspected this was a more
complicated problem than the configuration of some parameters :( . Good
that we have found out this now and not after the upgrade.

All our upgrade plans and testing for all our databases have been done
for/with 8.0.x (yes, I know 8.1.x is much better, but I am working in a
conservative place from the sysadm point of view). We will have to
change our plans and go for 8.1 if we want this to work. 

 My recommendation is to update to 8.1.4 when it comes out.

Any idea about when 8.1.4 will be released?
Thanks again.

-- 
Rafael Martinez, [EMAIL PROTECTED]
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] bad performance on Solaris 10

2006-04-07 Thread Chris Mair

 Ok, so I did a few runs for each of the sync methods, keeping all the
 rest constant and got this:
 
 open_datasync  0.7
 fdatasync  4.6
 fsync  4.5
 fsync_writethrough not supported
 open_sync  0.6
 
 in arbitrary units - higher is faster.
 
 Quite impressive!
 
 
   
 
 Chris,
 Just to make sure the x4100 config is similar to your Linux system, can 
 you verify the default setting for disk write cache and make sure they 
 are both enabled or disabled. Here's how to check in Solaris.
 As root, run format -e - pick a disk - cache - write_cache - display
 
 Not sure how to do it on Linux though!
 
 Regards,
 -Robert

I don't have access to the machine for the next few days due to eh...
let's call it firewall accident ;), but it might very well be that it
was off on the x4100 (I know it's on the smaller Linux box).

That together with the bad default sync method can definitely explain
the strangely slow out of box performance I got.

So thanks again for explaining this to me :)

Bye, Chris.




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
On Fri, 07 Apr 2006 14:41:39 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Gavin Hamill [EMAIL PROTECTED] writes:
  OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650
  (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot
  up and our website is next to unusable. The IBM is not swapping (not
  with 16GB of RAM!), disk i/o is low, but there must be something
  critically wrong for this monster to be performing so badly..
 
 Have you vacuumed/analyzed since reloading your data?  

Absolutely - a VACUUM FULL was the first thing I did, and have VACUUM ANALYZE 
VERBOSE'd a couple of times since. I have plenty of overhead to keep the entire 
free space map in RAM.

 Compare some
 EXPLAIN ANALYZE outputs for identical queries on the two machines,
 that usually helps figure out what's wrong.

If only :)

Since 90% of the db work is the 'hotelsearch' function (which is 350 
lines-worth that I'm not permitted to share :(( ), an EXPLAIN ANALYZE reveals 
practically nothing:

# jalapeno (IBM)
laterooms=# EXPLAIN ANALYZE select * from hotelsearch(12.48333::numeric, 
41.9::numeric, 5::int4, '2006-04-13'::date, 5::int4, NULL::int4, 1::int4, 
NULL::int4, NULL::int4, TRUE::bool, FALSE::bool, FALSE::bool, 1::int2, 
'GBP'::text, 'Roma'::text, 7::int4, NULL::int4, NULL::int4) limit 500;
  QUERY PLAN 
---
 Limit  (cost=0.00..6.25 rows=500 width=1587) (actual time=2922.282..2922.908 
rows=255 loops=1)
   -  Function Scan on hotelsearch  (cost=0.00..12.50 rows=1000 width=1587) 
(actual time=2922.277..2922.494 rows=255 loops=1)
 Total runtime: 2923.296 ms
(3 rows)

# cayenne (xeon)
laterooms=# EXPLAIN ANALYZE select * from hotelsearch(12.48333::numeric, 
41.9::numeric, 5::int4, '2006-04-13'::date, 5::int4, NULL::int4, 1::int4, 
NULL::int4, NULL::int4, TRUE::bool, FALSE::bool, FALSE::bool, 1::int2, 
'GBP'::text, 'Roma'::text, 7::int4, NULL::int4, NULL::int4) limit 500;
  QUERY PLAN 
---
 Limit  (cost=0.00..6.25 rows=500 width=1587) (actual time=1929.483..1930.103 
rows=255 loops=1)
   -  Function Scan on hotelsearch  (cost=0.00..12.50 rows=1000 width=1587) 
(actual time=1929.479..1929.693 rows=255 loops=1)
 Total runtime: 1930.506 ms
(3 rows)


The 'LIMIT 500' is a red herring since the function body will get all data, so 
reducing the LIMIT in the call to hotelsearch doesn't reduce the amount of work 
being done.

The killer in it all is tail'ing the postgres log (which I have set only to log 
queries at 1000ms or up) is things will be returning at 1000-2000ms.. then 
suddenly shoot up to 8000ms.. and if I try a few of those 8000ms queries on the 
xeon box, they exec in ~1500ms.. and if I try them again a few moments later on 
the ibm, they'll also exec in maybe ~2500ms.

This is one hell of a moving target and I can't help but think I'm just missing 
something that's right in front of my nose, too close to see. 

Cheers,
Gavin.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
On Fri, 07 Apr 2006 13:54:21 -0500
Scott Marlowe [EMAIL PROTECTED] wrote:

 Are the same queries getting the same basic execution plan on both
 boxes?  Turn on logging for slow queries, and explain analyze them on
 both machines to see if they are.

See reply to Tom Lane :)

 I'd put the old 4 way Xeon back in production and do some serious
 testing of this pSeries machine.  IBM should be willing to help you, I
 hope.

They probably would if this had been bought new - as it is, we have
rented the machine for a month from a 2nd-user dealer to see if it's
capable of taking the load. I'm now glad we did this. 

 My guess is that this is an OS issue.  Maybe there are AIX tweaks that
 will get it up to the same or higher level of performance as your four
 way xeon.  Maybe there aren't.

The pSeries isn't much older than our Xeon machine, and I expected the
performance level to be exemplary out of the box.. we've enabled the
64-bit kernel+userspace, and compiled pg for 64-bitness with the gcc
flags as reccommended by Senica Cunningham on this very list..

 Myself, I'd throw a spare drive in for the OS, put some flavor of
 linux on it

Terrifying given I know nothing about the pSeries boot system, but at
this stage I'm game for nearly anything. 

 http://www.asaservers.com/system_dept.asp?dept_id=SD-002

Multi-Opteron was the other thing we considered but decided to give
'Big Iron' UNIX a whirl...

Cheers,
Gavin.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Tom Lane
Gavin Hamill [EMAIL PROTECTED] writes:
 Scott Marlowe [EMAIL PROTECTED] wrote:
 My guess is that this is an OS issue.  Maybe there are AIX tweaks that
 will get it up to the same or higher level of performance as your four
 way xeon.  Maybe there aren't.

 The pSeries isn't much older than our Xeon machine, and I expected the
 performance level to be exemplary out of the box..

I'm fairly surprised too.  One thing I note from your comparison of
settings is that the default WAL sync method is different on the two
operating systems.  If the query load is update-heavy then it would be
very worth your while to experiment with the sync method.  However,
if the bottleneck is pure-SELECT transactions then WAL sync should not
be a factor at all.

Does AIX have anything comparable to oprofile or dtrace?  It'd be
interesting to try to monitor things at that level and see what we can
learn.  Failing a low-level profiler, there should at least be something
comparable to strace --- you should try watching some of the backends
with strace and see what their behavior is when the performance goes
south.  Lots of delaying select()s or semop()s would be a red flag.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread D'Arcy J.M. Cain
On Fri, 7 Apr 2006 20:59:19 +0100
Gavin Hamill [EMAIL PROTECTED] wrote:
  I'd put the old 4 way Xeon back in production and do some serious
  testing of this pSeries machine.  IBM should be willing to help you, I
  hope.
 
 They probably would if this had been bought new - as it is, we have
 rented the machine for a month from a 2nd-user dealer to see if it's
 capable of taking the load. I'm now glad we did this. 

We also had problems with a high end AIX system and we got no help from
IBM.  They expected you to put Oracle on and if you used anything else
you were on your own.  We had exactly the same issue.  We expected to
get an order of magnitude improvement and instead the app bogged down.
It also got worse over time.  We had to reboot every night to get
anything out of it.  Needless to say, they got their system back.

 
  My guess is that this is an OS issue.  Maybe there are AIX tweaks that
  will get it up to the same or higher level of performance as your four
  way xeon.  Maybe there aren't.
 
 The pSeries isn't much older than our Xeon machine, and I expected the
 performance level to be exemplary out of the box.. we've enabled the
 64-bit kernel+userspace, and compiled pg for 64-bitness with the gcc
 flags as reccommended by Senica Cunningham on this very list..

That's Seneca.

We found that our money was better spent on multiple servers running
NetBSD with a home grown multi-master replication system.  Need more
power?  Just add more servers.

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Scott Marlowe
On Fri, 2006-04-07 at 14:59, Gavin Hamill wrote:
 On Fri, 07 Apr 2006 13:54:21 -0500
 Scott Marlowe [EMAIL PROTECTED] wrote:
 
  Are the same queries getting the same basic execution plan on both
  boxes?  Turn on logging for slow queries, and explain analyze them on
  both machines to see if they are.
 
 See reply to Tom Lane :)

I didn't see one go by yet...  Could be sitting in the queue.

 They probably would if this had been bought new - as it is, we have
 rented the machine for a month from a 2nd-user dealer to see if it's
 capable of taking the load. I'm now glad we did this. 

Thank god.  I had a picture of you sitting on top of a brand new very
expensive pSeries 

Let us know if changing the fsync setting helps.  Hopefully that's all
the problem is.

Off on a tangent.  If the aggregate memory bandwidth of the pSeries is
no greater than you Xeon you might not see a big improvement if you were
memory bound before.  If you were CPU bound, you may or may not see an
improvement.

Can you describe the disc subsystems in the two machines for us?  What
kind of read / write load you have?  It could be the older box was
running on IDE drives with fake fsync responses which would lie, be
fast, but not reliable in case of a power outage.

Do you have hardware RAID for your pSeries?  how many discs, how much
battery backed cache, etc?

 Multi-Opteron was the other thing we considered but decided to give
 'Big Iron' UNIX a whirl...

It still might be a good choice, if it's a simple misconfiguration
issue.

But man, those new multiple core opterons can make some impressive
machines for very little money.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Greg Stark

Gavin Hamill [EMAIL PROTECTED] writes:

 This is one hell of a moving target and I can't help but think I'm just
 missing something that's right in front of my nose, too close to see.

I'm assuming you compiled postgres yourself? Do you have the output from the
configure script? I'm wondering if it failed to find a good spinlock match for
the architecture. Not sure if that's really likely but it's a possibility.

Also, I'm pretty sure IBM has tools that would let you disable some of the
processors to see if maybe it's a shared memory bus issue. If I understand you
right the machine isn't in production yet? In which case I would get timing
information for a single processor, two processors, four processors, and eight
processors. If you see it max out and start dropping then that would point
towards a hardware/low level postgres issue like spinlocks or shared memory
rather than a high level database issue like stats.

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gábriel Ákos

Gavin Hamill wrote:

Bing-bong, passenger announcement.. the panic train is now pulling into
platform 8.1.3. Bing-bong. =)

OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650
(8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot
up and our website is next to unusable. The IBM is not swapping (not


I would say running _one_ query at a time depends on the power of _one_ 
cpu. PPCs aren't that fast, I'd say they are slower than Xeons. Moreover 
I'm sure that AMD Opterons are faster than Xeons. I'd say you should go 
and test an opteron-based configuration. You'll get much more power for 
the same (much likely for less) money.



--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
On Fri, 7 Apr 2006 16:16:02 -0400
D'Arcy J.M. Cain darcy@druid.net wrote:

 We also had problems with a high end AIX system and we got no help
 from IBM.  They expected you to put Oracle on and if you used
 anything else you were on your own.  

Urk, I thought IBM were supposedly Linux sycophants thesedays...

 We had exactly the same issue.
 We expected to get an order of magnitude improvement and instead the
 app bogged down.

That's kind of encouraging, I suppose - that it might not be something
mind-bogglingly stupid I'm doing.

 It also got worse over time.  We had to reboot every
 night to get anything out of it.  Needless to say, they got their
 system back.

nod
 
 That's Seneca.

Oops - meant to check the spelling before I sent that =)
 
 We found that our money was better spent on multiple servers running
 NetBSD with a home grown multi-master replication system.  Need more
 power?  Just add more servers.

Aye, I originally suggested multiple servers, but was talked round to
one giant db so that our devels didn't have to rewrite code to deal
with read/write + read-only db handles...

Cheers,
Gavin.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
On Fri, 07 Apr 2006 15:24:18 -0500
Scott Marlowe [EMAIL PROTECTED] wrote:

  See reply to Tom Lane :)
 
 I didn't see one go by yet...  Could be sitting in the queue.

If it's not arrived by now - EXPLAIN ANALYZE doesn't tell me
anything :)

 Let us know if changing the fsync setting helps.  Hopefully that's all
 the problem is.

fsync's already off - yes a bit scary, but our I/O is only about
500KB/sec writing.. the whole db fits in RAM / kernel disk cache, and
I'd rather have performance than security at this exact moment..
 
 Off on a tangent.  If the aggregate memory bandwidth of the pSeries is
 no greater than you Xeon you might not see a big improvement if you
 were memory bound before.  If you were CPU bound, you may or may not
 see an improvement.

I did look into the specs of the system, and the memory bw on the
pSeries was /much/ greater than the Xeon - it's one of the things that
really pushed me towards it in the end. I forget the figures, but it
was 3 or 4 times greater.

 Can you describe the disc subsystems in the two machines for us?  What
 kind of read / write load you have?  It could be the older box was
 running on IDE drives with fake fsync responses which would lie, be
 fast, but not reliable in case of a power outage.

Again, I'm confident that I/O's not the killer here.. the Xeon is a Dell
6850- hardware RAID1.. SCSI drives.

  Multi-Opteron was the other thing we considered but decided to give
  'Big Iron' UNIX a whirl...
 
 It still might be a good choice, if it's a simple misconfiguration
 issue.
 
 But man, those new multiple core opterons can make some impressive
 machines for very little money.

So I see - we could buy two quad-opterons for the cost of renting this
pSeries for a month

Cheers,
Gavin.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
On Fri, 07 Apr 2006 16:06:02 -0400
Tom Lane [EMAIL PROTECTED] wrote:

  The pSeries isn't much older than our Xeon machine, and I expected
  the performance level to be exemplary out of the box..
 
 I'm fairly surprised too.  One thing I note from your comparison of
 settings is that the default WAL sync method is different on the two
 operating systems.  

We're very read-focussed.. there's update activity, sure, but the IO is
only pushing about 500KByte/sec on average, usually much less. I also
have fsync switched off - yes dangerous, but I just want to eliminate
IO completely as a contributing factor.

 Does AIX have anything comparable to oprofile or dtrace?  

I've used neither on Linux, but a quick google showed up a few articles
along the lines of 'in theory it shouldn't be hard to port to AIX'
but nothing concrete. My guess is IBM sell a tool to do this. Hell, the
C++ compiler is £1200... (hence our use of GCC 4.1 to compile pg)


 Failing a low-level profiler, there should at least be
 something comparable to strace --- you should try watching some of
 the backends with strace and see what their behavior is when the
 performance goes south.  Lots of delaying select()s or semop()s would
 be a red flag.

There's truss installed which seems to do the same as strace on
Linux... and here's a wildly non-scientific  glance..  I watched the
'topas' output (top for AIX) , identified a PID that was doing a lot of
work, then attached truss to that pid. In addition to lots of send
(), recv() and lseek()s... about once a minute I saw hundreds of calls
to __semop() interspersed with _select(), followed by tons of lseek()
+kread()+__semop() and then I can see the kwrite() to the pg logfile

246170: kwrite(2,  L O G : d u, 8)= 8 etc.

Cheers,
Gavin.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Luke Lonergan
Title: Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.



Gavin,

On 4/7/06 2:24 PM, Gavin Hamill [EMAIL PROTECTED] wrote:

 I did look into the specs of the system, and the memory bw on the
 pSeries was /much/ greater than the Xeon - it's one of the things that
 really pushed me towards it in the end. I forget the figures, but it
 was 3 or 4 times greater.

>From the literature at: 
http://www-03.ibm.com/servers/eserver/pseries/hardware/midrange/p650_desc.html

The pSeries 650 features a peak aggregate memory to L3 cache bandwidth of 25.6GB/second for an 8way configuration. In addition, aggregate I/O bandwidth is up to 16GB/second. The result is a remarkable combination of system architecture, speed and power that delivers efficient and cost-effective data sharing and application throughput.

Thats a total of 25.6GB/s for 8 CPUs, or 3.2GB/s per CPU. 3GHz P4 Xeons typically have an 800MHz memory bus with double the speed at 6.4GB/s result (800MHz x 8 bytes per L2 cache line = 6.4GB/s). Furthermore, the speed at which the P4 Xeon can load data into L2 cache from memory is matched to the bus because the L2 cache line width is 8 bytes wide and can stream data to L2 at full bus speed.

That said, I find typical memory bandwidth for the P4 in applications is limited at about 2GB/s. See here for more detail: http://www.cs.virginia.edu/stream/standard/Bandwidth.html

In fact, looking at the results there, the IBM 650m2 only gets 6GB/s on all 8 CPUs. I wouldnt be surprised if the strange L3 cache architecture of the IBM 650 is holding it back from streaming memory access efficiently.

Whether this has anything to do with your problem or not, I have no idea!

- Luke 





Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Tom Lane
Gavin Hamill [EMAIL PROTECTED] writes:
 There's truss installed which seems to do the same as strace on
 Linux... and here's a wildly non-scientific  glance..  I watched the
 'topas' output (top for AIX) , identified a PID that was doing a lot of
 work, then attached truss to that pid. In addition to lots of send
 (), recv() and lseek()s...

Those are good, they represent real work getting done.

 about once a minute I saw hundreds of calls
 to __semop() interspersed with _select(),

This is not good.  Did the semop storms coincide with visible slowdown?
(I'd assume so, but you didn't actually say...)

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 That said, I find typical memory bandwidth for the P4 in applications is
 limited at about 2GB/s.  See here for more detail:
 http://www.cs.virginia.edu/stream/standard/Bandwidth.html

 In fact, looking at the results there, the IBM 650m2 only gets 6GB/s
 on all 8 CPUs.  I wouldn't be surprised if the strange L3 cache
 architecture of the IBM 650 is holding it back from streaming memory
 access efficiently.

Given Gavin's latest report, I'm wondering how much the IBM slows down
when a spinlock operation is involved.  If the memory architecture isn't
good about supporting serialized access to memory, that gaudy sounding
bandwidth number might have little to do with PG's real-world behavior.
On the other hand, we already know that Xeons suck about as badly as
can be on that same measure; could the pSeries really be worse?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Luke Lonergan
Tom,

On 4/7/06 3:02 PM, Tom Lane [EMAIL PROTECTED] wrote:

 On the other hand, we already know that Xeons suck about as badly as
 can be on that same measure; could the pSeries really be worse?

I wouldn't be too surprised, but it sounds like it needs a test.  Do we have
a test for this?  Is there a contention-prone query stream that we can think
up?

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 On 4/7/06 3:02 PM, Tom Lane [EMAIL PROTECTED] wrote:
 On the other hand, we already know that Xeons suck about as badly as
 can be on that same measure; could the pSeries really be worse?

 I wouldn't be too surprised, but it sounds like it needs a test.  Do we have
 a test for this?  Is there a contention-prone query stream that we can think
 up?

If you want you could install a pre-8.1 PG and then try one of the
queries that we were using as test cases a year ago for spinlock
investigations.  I don't recall details right now but I remember
having posted a pretty trivial test case that would send a
multiprocessor machine into context-swap storm, which sounds a whole
lot like what Gavin is seeing.

I think that 8.1 ought to be relatively free of buffer-manager spinlock
contention, which is why I doubt that test case would be interesting
against 8.1.  The interesting question is what else is he seeing
contention for, if it's not the BufMgrLock?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
On Fri, 07 Apr 2006 17:56:49 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 This is not good.  Did the semop storms coincide with visible
 slowdown? (I'd assume so, but you didn't actually say...)

If I'd been able to tell, then I'd tell you =) I'll have another go...

Yes, there's a definate correlation here.. I attached truss to the
main postmaster..

$ truss -Ff -p 340344 21 | grep semop

here's a snippet

278774: __semop(15728650, 0x0FFF7E80, 1)= 0
155712: __semop(15728650, 0x0FFF5920, 1)= 0
278774: __semop(15728649, 0x0FFF6F10, 1)
114914: __semop(15728649, 0x0FFF6A40, 1)= 0 = 0 
114914: __semop(15728650, 0x0FFF61E0, 1)
155712: __semop(15728650, 0x0FFF6850, 1)= 0 = 0 
155712: __semop(15728650, 0x0FFF6890, 1)= 0 1
55712: __semop(15728650, 0x0FFF5920, 1)
278774: __semop(15728650, 0x0FFF6F10, 1) 
155712: __semop(15728650, 0x0FFF6850, 1)= 0 = 0
278774: __semop(15728649, 0x0FFF7E40, 1)
114914: __semop(15728649, 0x0FFF6A80, 1)= 0 = 0
278774: __semop(15728650, 0x0FFF7E80, 1) 

And when I saw a flood of semop's for any particular PID, a second later
in the 'topas' process list would show that PID at a 100% CPU ...

Most intriguing :)

Cheers,
Gavin.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Tom Lane
Gavin Hamill [EMAIL PROTECTED] writes:
 On Fri, 07 Apr 2006 17:56:49 -0400
 Tom Lane [EMAIL PROTECTED] wrote:
 This is not good.  Did the semop storms coincide with visible
 slowdown? (I'd assume so, but you didn't actually say...)

 Yes, there's a definate correlation here.. I attached truss to the
 main postmaster..
 ...
 And when I saw a flood of semop's for any particular PID, a second later
 in the 'topas' process list would show that PID at a 100% CPU ...

So apparently we've still got a problem with multiprocess contention for
an LWLock somewhere.  It's not the BufMgrLock because that's gone in 8.1.
It could be one of the finer-grain locks that are still there, or it
could be someplace else.

Are you in a position to try your workload using PG CVS tip?  There's a
nontrivial possibility that we've already fixed this --- a couple months
ago I did some work to reduce contention in the lock manager:

2005-12-11 16:02  tgl

* src/: backend/access/transam/twophase.c,
backend/storage/ipc/procarray.c, backend/storage/lmgr/README,
backend/storage/lmgr/deadlock.c, backend/storage/lmgr/lock.c,
backend/storage/lmgr/lwlock.c, backend/storage/lmgr/proc.c,
include/storage/lock.h, include/storage/lwlock.h,
include/storage/proc.h: Divide the lock manager's shared state into
'partitions', so as to reduce contention for the former single
LockMgrLock.  Per my recent proposal.  I set it up for 16
partitions, but on a pgbench test this gives only a marginal
further improvement over 4 partitions --- we need to test more
scenarios to choose the number of partitions.

This is unfortunately not going to help you as far as getting that
machine into production now (unless you're brave enough to run CVS tip
as production, which I certainly am not).  I'm afraid you're most likely
going to have to ship that pSeries back at the end of the month, but
while you've got it it'd be awfully nice if we could use it as a testbed
...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Luke Lonergan
Gavin,

On 4/7/06 3:27 PM, Gavin Hamill [EMAIL PROTECTED] wrote:

 278774: __semop(15728650, 0x0FFF7E80, 1)= 0
 155712: __semop(15728650, 0x0FFF5920, 1)= 0
 278774: __semop(15728649, 0x0FFF6F10, 1)
 114914: __semop(15728649, 0x0FFF6A40, 1)= 0 = 0
 114914: __semop(15728650, 0x0FFF61E0, 1)
 155712: __semop(15728650, 0x0FFF6850, 1)= 0 = 0
 155712: __semop(15728650, 0x0FFF6890, 1)= 0 1
 55712: __semop(15728650, 0x0FFF5920, 1)
 278774: __semop(15728650, 0x0FFF6F10, 1)
 155712: __semop(15728650, 0x0FFF6850, 1)= 0 = 0
 278774: __semop(15728649, 0x0FFF7E40, 1)
 114914: __semop(15728649, 0x0FFF6A80, 1)= 0 = 0
 278774: __semop(15728650, 0x0FFF7E80, 1)

Seems like you're hitting a very small target in RAM with these semop calls.
I wonder what part of the code is doing this - Tom would know better how to
trace it, but the equivalent of oprofile output would be nice.

The other thing that I'd like to see is an evaluation of the memory access
latency of this machine from Register to RAM.  I couldn't find a
benchmarking tool that was UNIX friendly out there, maybe I'll write one
real quick.  I suspect this machine has a heinous latency and a storm of
semops to the same spot of RAM might be a far worse performance problem on
this machine than on others...

- Luke  





---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 On 4/7/06 3:27 PM, Gavin Hamill [EMAIL PROTECTED] wrote:

 278774: __semop(15728650, 0x0FFF7E80, 1)= 0
 155712: __semop(15728650, 0x0FFF5920, 1)= 0
 278774: __semop(15728649, 0x0FFF6F10, 1)

 Seems like you're hitting a very small target in RAM with these semop calls.

IIRC the address passed to semop() in our code is always a local struct
on the stack, so that's a bit of a red herring --- there won't be
cross-processor contention for that.

It's plausible though that we are seeing contention across members of
the LWLock array, with the semop storm just being a higher-level symptom
of the real hardware-level problem.  You might try increasing
LWLOCK_PADDED_SIZE to 64 or even 128, see
src/backend/storage/lmgr/lwlock.c (this is something that does exist in
8.1, so it'd be easy to try).

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
On Fri, 07 Apr 2006 18:52:20 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Are you in a position to try your workload using PG CVS tip?  There's
 a nontrivial possibility that we've already fixed this --- a couple
 months ago I did some work to reduce contention in the lock manager:

Well, there's a question. At the moment it's still live - but I'll need
to swap back to the Xeon machine since I can't afford to have a Saturday
with the db firing on three cylinders (out of eight :)

At that point you're welcome to twiddle, compile, throw anything you
want at it. If it helps us as much as the greater pg world, then that's
perfect.

 This is unfortunately not going to help you as far as getting that
 machine into production now (unless you're brave enough to run CVS tip
 as production, which I certainly am not).  

.. if the problem can actually be boiled down to the locking/threading
issues, surely it should be straightforward to backport those changes
to 8.1.3 mainline?

 I'm afraid you're most
 likely going to have to ship that pSeries back at the end of the
 month, but while you've got it it'd be awfully nice if we could use
 it as a testbed ...

We have it for the next 2 weeks, and whilst I can't guarantee access for
all that time, you're welcome to hammer away at it over this weekend if
that's any help? Mail me privately and I'll sort out login details if
this is interesting.

Cheers,
Gavin.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread Gavin Hamill
On Fri, 07 Apr 2006 15:56:52 -0700
Luke Lonergan [EMAIL PROTECTED] wrote:

 Seems like you're hitting a very small target in RAM with these semop
 calls. I wonder what part of the code is doing this - Tom would know
 better how to trace it, but the equivalent of oprofile output would
 be nice.

I'm happy to test whatever I can, but I simply don't know enough AIX to
be able to tell whether a similar kernel-level profiler is
available/possible.
 
 The other thing that I'd like to see is an evaluation of the memory
 access latency of this machine from Register to RAM.  I couldn't find
 a benchmarking tool that was UNIX friendly out there, maybe I'll
 write one real quick.  I suspect this machine has a heinous latency
 and a storm of semops to the same spot of RAM might be a far worse
 performance problem on this machine than on others...

Well, as I said to Tom, the machine is available for running tests
on :) If it helps us, and helps pg become more AIX friendly, then I'm
all for whatever needs done...

Cheers,
Gavin.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster