Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
 So, why don't I use prepare here: let's say I'm testing the worst
 stress case :-)  Imagine you have thousands of such kinds of queries -
 you cannot prepare all of them! :-)

 Thousands?  Surely there'll be a dozen or three of most common queries,
 to which you pass different parameters.  You can prepare those.

Ok, and what if each client just connects to the database, executes each kind
of query just *once* and then disconnects?..  - the cost of prepare will
kill performance here if it's not reused at least 10 times within the
same session.

Well, I know, we can always do better, and even use stored procedures,
etc. etc.



 Now, as you see from your explanation, Part #2 is the most
 dominant - so instead of blaming this query, why not implement a QUERY
 PLANNER CACHE??? - in a way that if any *similar* query is recognized by
 the parser we simply *reuse* the same plan?..

 This has been discussed in the past, but it turns out that a real
 implementation is a lot harder than it seems.

Ok. If I remember well, Oracle has it and it helps a lot, but for
sure it's not easy to implement..

Rgds,
-Dimitri



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Andres Freund

Hi,

On 05/12/2009 12:46 AM, Dimitri wrote:

So, why don't I use prepare here: let's say I'm testing the worst
stress case :-)  Imagine you have thousands of such kinds of queries -
you cannot prepare all of them! :-)  Or maybe you'll prepare it once,
but as I showed previously in this thread the prepare statement itself
takes 16ms, so for a single shot there is no gain! :-)
I have a hard time imagining a high-throughput OLTP workload with that 
many different queries ;-)


Naturally it would still be nice to perform well in this non-optimal workload...

Andres



[PERFORM] Timestamp index not used in some cases

2009-05-12 Thread Евгений Василев
I have the following table:

CREATE TABLE temp.tmp_135528
(
  id integer NOT NULL,
  prid integer,
  group_id integer,
  iinv integer,
  oinv integer,
  isum numeric,
  osum numeric,
  idate timestamp without time zone,
  odate timestamp without time zone,
  CONSTRAINT t_135528_pk PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

With index:

CREATE INDEX t_135528
  ON temp.tmp_135528
  USING btree
  (idate, group_id, osum, oinv);

When the following query is executed the index is not used:

EXPLAIN SELECT id, osum
FROM temp.tmp_135528
WHERE idate >= '2007-05-17 00:00:00'::timestamp
AND group_id =  '13'
AND osum =  '19654.45328'
AND oinv = -1

QUERY PLAN
---
 Seq Scan on tmp_135528  (cost=0.00..7022.36 rows=1166 width=11)
   Filter: ((idate >= '2007-05-17 00:00:00'::timestamp without time zone) AND 
(osum = 19654.45328) AND (group_id = 13) AND (oinv = (-1)))
(2 rows)

When 
idate >= '2007-05-17 00:00:00'::timestamp 
is changed to 
idate = '2007-05-17 00:00:00'::timestamp 
or
idate <= '2007-05-17 00:00:00'::timestamp 
then the index is used:

EXPLAIN SELECT id, osum
FROM temp.tmp_135528
WHERE idate = '2007-05-17 00:00:00'::timestamp
AND group_id =  '13'
AND osum =  '19654.45328'
AND oinv = -1;
  QUERY PLAN
---
 Index Scan using t_135528 on tmp_135528  (cost=0.00..462.61 rows=47 width=11)
   Index Cond: ((idate = '2007-05-17 00:00:00'::timestamp without time zone) 
AND (group_id = 13) AND (osum = 19654.45328) AND (oinv = (-1)))
(2 rows)

Why can't I use the index with a >= comparison on timestamp?

Best regards,
Evgeni Vasilev
JAR Computers
IT Department
jabber id: evasi...@jabber.jarcomputers.com



Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..

2009-05-12 Thread Laurent Laborde
On Mon, May 11, 2009 at 6:31 PM, Dimitri dimitrik...@gmail.com wrote:
 Hi Kevin,

 PostgreSQL: 8.3.7 and 8.4
 Server: Sun M5000 32cores
 OS: Solaris 10

 current postgresql.conf:

 #
 max_connections = 2000                  # (change requires restart)

Are you sure about the 2000 connections?
Why don't you use pgbouncer or pgpool instead?


-- 
F4FQM
Kerunix Flan
Laurent Laborde



Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..

2009-05-12 Thread Dimitri
It's just one of the test conditions - what if there are 2000 users? - I
know I may use pgpool or others, but I also need to know the limits of
the database engine itself.. For the moment I'm limiting it to 256
concurrent sessions, but the config params are kept as if for 2000 :-)

Rgds,
-Dimitri

On 5/12/09, Laurent Laborde kerdez...@gmail.com wrote:
 On Mon, May 11, 2009 at 6:31 PM, Dimitri dimitrik...@gmail.com wrote:
 Hi Kevin,

 PostgreSQL: 8.3.7 and 8.4
 Server: Sun M5000 32cores
 OS: Solaris 10

 current postgresql.conf:

 #
 max_connections = 2000                  # (change requires restart)

 Are you sure about the 2000 connections?
 Why don't you use pgbouncer or pgpool instead?


 --
 F4FQM
 Kerunix Flan
 Laurent Laborde




Re: [PERFORM] Timestamp index not used in some cases

2009-05-12 Thread Scott Marlowe
On Tue, May 12, 2009 at 3:00 AM, Евгений Василев
evasi...@jarcomputers.com wrote:
 I have the following table:

 CREATE TABLE temp.tmp_135528
 (
 id integer NOT NULL,
 prid integer,
 group_id integer,
 iinv integer,
 oinv integer,
 isum numeric,
 osum numeric,
 idate timestamp without time zone,
 odate timestamp without time zone,
 CONSTRAINT t_135528_pk PRIMARY KEY (id)
 )
 WITH (OIDS=FALSE);

 With index:

 CREATE INDEX t_135528
 ON temp.tmp_135528
 USING btree
 (idate, group_id, osum, oinv);

 When the following query is executed the index is not used:

 EXPLAIN SELECT id, osum
 FROM temp.tmp_135528
 WHERE idate >= '2007-05-17 00:00:00'::timestamp
 AND group_id = '13'
 AND osum = '19654.45328'
 AND oinv = -1

 QUERY PLAN
 ---
 Seq Scan on tmp_135528 (cost=0.00..7022.36 rows=1166 width=11)
 Filter: ((idate >= '2007-05-17 00:00:00'::timestamp without time zone) AND
 (osum = 19654.45328) AND (group_id = 13) AND (oinv = (-1)))
 (2 rows)

 When
 idate >= '2007-05-17 00:00:00'::timestamp
 is changed to
 idate = '2007-05-17 00:00:00'::timestamp
 or
 idate <= '2007-05-17 00:00:00'::timestamp
 then the index is used:

 EXPLAIN SELECT id, osum
 FROM temp.tmp_135528
 WHERE idate = '2007-05-17 00:00:00'::timestamp
 AND group_id = '13'
 AND osum = '19654.45328'
 AND oinv = -1;
 QUERY PLAN
 ---
 Index Scan using t_135528 on tmp_135528 (cost=0.00..462.61 rows=47 width=11)
 Index Cond: ((idate = '2007-05-17 00:00:00'::timestamp without time zone)
 AND (group_id = 13) AND (osum = 19654.45328) AND (oinv = (-1)))
 (2 rows)

 Why can't I use the index with a >= comparison on timestamp?

You can.  But in this instance one query is returning 47 rows while
the other is returning 1166 rows (or the planner thinks it is).
There's a switchover point where it's cheaper to seq scan.  You can
adjust this point up and down by adjusting various cost parameters.
random_page_cost is commonly lowered to the 1.5 to 2.0 range, and
effective_cache_size is normally set higher, to match the cache in the
kernel plus the shared_buffers size.
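
For illustration, both of those can be tried per-session before touching
postgresql.conf, and the plan re-checked (the 6GB figure below is only a
placeholder for kernel cache + shared_buffers on your box, not a measured
recommendation):

SET random_page_cost = 2.0;
SET effective_cache_size = '6GB';
EXPLAIN SELECT id, osum
FROM temp.tmp_135528
WHERE idate >= '2007-05-17 00:00:00'::timestamp
AND group_id = '13'
AND osum = '19654.45328'
AND oinv = -1;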



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
Folks, before you start to think what a dumb guy is doing a dumb thing :-))
I'll explain a few details:

for more than 10 years I've been using the db_STRESS kit
(http://dimitrik.free.fr/db_STRESS.html) to check database
performance and scalability. Until now I was very happy with the results
it gave me, as it stresses each database engine's internals very well and
brings to light some things I would probably miss on other workloads.
What do you want - with time, the fast query that executed before in
500ms now runs within 1-2ms - not only was hardware improved but
database engines also increased their performance a lot! :-))

In 2007 I published the first public results with PostgreSQL, and
at that time it was 2 times faster than MySQL
(http://dimitrik.free.fr/db_STRESS_BMK_Part1.html)

Last month, for the launch of MySQL 5.4, I did a long series of
tests, and at the end, out of curiosity, I executed the same load
against PostgreSQL 8.3.7 to see if MySQL is closer now. To my big
surprise, MySQL was faster! Observations of PG's processing also
raised a lot of questions - I supposed something was abnormal on the PG
side, but I did not have much time to understand what it was
exactly 
(http://dimitrik.free.fr/db_STRESS_MySQL_540_and_others_Apr2009.html#note_5443)

What I'm trying to do now is to understand what exactly is the problem.

What I discovered so far with all your help:
  - the impact of a planner
  - the impact of the analyze target
  - the impact of prepare / execute
  - scalability limit on 32 cores

I'll also try to adapt the prepare/execute solution to see how much it
improves performance and/or scalability.

Also, with help from the other thread, I was able to improve the
TPS stability a lot on the read+write workload! :-)

Any other comments are welcome!

Rgds,
-Dimitri

On 5/12/09, Dimitri Fontaine dfonta...@hi-media.com wrote:
 Hi,

 Dimitri dimitrik...@gmail.com writes:

 So, why don't I use prepare here: let's say I'm testing the worst
 stress case :-)  Imagine you have thousands of such kinds of queries -
 you cannot prepare all of them! :-)

 Thousands?  Surely there'll be a dozen or three of most common queries,
 to which you pass different parameters.  You can prepare those.

 Ok, and what if each client just connects to the database, executes each kind
 of query just *once* and then disconnects?..  - the cost of prepare will
 kill performance here if it's not reused at least 10 times within the
 same session.

 In a scenario which looks like this one, what I'm doing is using
 pgbouncer transaction pooling. Now a new connection from client can be
 served by an existing backend, which already has prepared your
 statement.

 So you first SELECT name FROM pg_prepared_statements; to know whether you
 have to PREPARE or just EXECUTE, and you not only maintain far fewer
 running backends and fewer fork() calls, but also benefit fully from
 preparing the statements even when you EXECUTE once per client
 connection.
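
 A minimal sketch of that flow (the statement name and query below are
 made up purely for illustration):

 -- on each pooled connection, check whether the plan already exists
 SELECT 1 FROM pg_prepared_statements WHERE name = 'q1';

 -- only if the check returned no row
 PREPARE q1 (integer) AS SELECT * FROM some_table WHERE id = $1;

 -- then always just execute
 EXECUTE q1 (42);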

 Well, I know, we can always do better, and even use stored procedures,
 etc. etc.

 A plain SQL stored procedure will prevent PostgreSQL from preparing your
 queries; only PL/pgSQL functions will force transparent plan caching. But
 calling this PL will cost about 1ms per call in my tests, so it's not a
 good solution.

 It's possible to go as far as providing your own PostgreSQL C module
 where you PREPARE at _PG_init() time and EXECUTE in a SQL-callable
 function; coupled with pgbouncer it should max out the perfs. But maybe
 you're not willing to go this far.

 Anyway, is hammering the server with always the same query your real
 need, or just a simplified test case? If the former, you'll see there are
 good ways to theoretically obtain better perfs than what you're currently
 reaching; if the latter, I urge you to consider some better benchmarking
 tools, such as playr or tsung.

   https://area51.myyearbook.com/trac.cgi/wiki/Playr
   http://tsung.erlang-projects.org/
   http://pgfouine.projects.postgresql.org/tsung.html
   http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php

 Regards,
 --
 dim




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote:

 To my big surprise, MySQL was faster!

Ours too.

** I bet you $1000 that I can improve the performance of your benchmark
results with PostgreSQL. You give me $1000 up-front and if I can't
improve your high end numbers I'll give you $2000 back. Either way, you
name me and link to me from your blog. Assuming you re-run the tests as
requested and give me reasonable access to info and measurements. **

I note your blog identifies you as a Sun employee. Is that correct? If
you do not give us the opportunity to improve upon the results then
reasonable observers might be persuaded you did not wish to show
PostgreSQL in its best light. You up for it?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
Wow, Simon! :-))

yes, I'm working in Sun Benchmark Center :-))
(I'm not using my Sun email on public lists just to avoid spam)

and the fact that I came here asking questions probably proves my
intention to show PostgreSQL in its best light, no?.. - I never liked
dishonest comparisons :-))

Regarding your bet: from a very young age I learned one thing - take
any 2 people betting for any reason, and you'll find one
idiot and one bastard :-))   idiot - because betting while missing
knowledge, and bastard - because knowing the truth it's not honest to
profit from idiots :-))  That's why I have never bet in my life,
but every time I tell the same story in such a situation... Did you
like it? ;-))

However, no problem to give credit to you as well as to the whole pg-perf
list, as it provides very valuable help! :-))

Rgds,
-Dimitri

On 5/12/09, Simon Riggs si...@2ndquadrant.com wrote:

 On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote:

 To my big surprise, MySQL was faster!

 Ours too.

 ** I bet you $1000 that I can improve the performance of your benchmark
 results with PostgreSQL. You give me $1000 up-front and if I can't
 improve your high end numbers I'll give you $2000 back. Either way, you
 name me and link to me from your blog. Assuming you re-run the tests as
 requested and give me reasonable access to info and measurements. **

 I note your blog identifies you as a Sun employee. Is that correct? If
 you do not give us the opportunity to improve upon the results then
 reasonable observers might be persuaded you did not wish to show
 PostgreSQL in its best light. You up for it?

 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support





Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 13:16 +0200, Dimitri wrote:

 Wow, Simon! :-))
 
 yes, I'm working in Sun Benchmark Center :-))
 (I'm not using my Sun email on public lists just to avoid spam)
 
 and the fact that I came here asking questions probably proves my
 intention to show PostgreSQL in its best light, no?.. - I never liked
 dishonest comparisons :-))
 
 Regarding your bet: from a very young age I learned one thing - take
 any 2 people betting for any reason, and you'll find one
 idiot and one bastard :-))   idiot - because betting while missing
 knowledge, and bastard - because knowing the truth it's not honest to
 profit from idiots :-))  That's why I have never bet in my life,
 but every time I tell the same story in such a situation... Did you
 like it? ;-))

No, but I asked for it, so we're even. ;-)

Let's work on the benchmark.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Heikki Linnakangas

Dimitri wrote:

What I discovered so far with all your help:
  - the impact of a planner
  - the impact of the analyze target
  - the impact of prepare / execute
  - scalability limit on 32 cores


You've received good advice on how to minimize the impact of the first 
three points, and using those techniques should bring a benefit. But I'm 
pretty surprised by the bad scalability you're seeing and no-one seems 
to have a good idea on where that limit is coming from. At a quick 
glance, I don't see any inherent bottlenecks in the schema and workload.


If you could analyze where the bottleneck is with multiple cores, that 
would be great. With something like oprofile, it should be possible to 
figure out where the time is spent.


My first guess would be the WALInsertLock: writing to WAL is protected 
by that and it can become a bottleneck with lots of small 
UPDATE/DELETE/INSERT transactions. But a profile would be required to 
verify that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote:

 What I'm trying to do now is to understand what exactly is the
 problem.

You're running with 1600 users, which is above the scalability limit
uncovered (by Sun...) during earlier benchmarking. The scalability
issues are understood but currently considered above the
reasonable-setting limit and so nobody has been inclined to improve
matters.

You should use a connection concentrator to reduce the number of
sessions down to say 400.

Your wal_buffers setting is also too low and you will be experiencing
contention on the WALWriteLock. Increase wal_buffers to about 8x what
you have now.

You can move pg_xlog to its own set of drives.

Set checkpoint_completion_target to 0.95.
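
As a sketch in postgresql.conf terms (the wal_buffers value below simply
assumes the stock 64kB default as the starting point; scale it from
whatever the current setting actually is):

wal_buffers = 512kB                  # ~8x the 8.3 default of 64kB
checkpoint_completion_target = 0.95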

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Query planner making bad decisions

2009-05-12 Thread Cory Coager

Tom Lane said the following on 05/11/2009 07:02 PM:

where we're off by a factor of 1500+ :-(

I think most likely the ~~ operator is the biggest problem.
Unfortunately 8.1's estimator for ~~ is not terribly bright.  You could
try increasing your statistics target but I don't think it will help
much.  Is there any chance of updating to 8.2 or later?  8.2 can do
significantly better on this type of estimate as long as it has enough
stats.

In any case I'd suggest raising default_statistics_target to 100 or so,
as you seem to be running queries complex enough to need good stats.
But I'm not sure that that will be enough to fix the problem in 8.1.

regards, tom lane
  
I should have mentioned the statistics for every column are already set 
to 1000.  I guess we'll have to add an upgrade to the project list.  
Thanks for the info.










Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
For the moment I'm not even considering any scalability issues on the
Read+Write workload - it may well be related to the storage box, and
storage latency or controller/cache efficiency may play a big role.

The problem I'm considering is a scalability issue on the Read-Only
workload - only selects, no disk access: on the move from 8 to 16 cores we
gain near 100%, but on the move from 16 to 32 cores it's only 10%...

I think I have to replay the Read-Only test with prepare/execute and check
how much it helps (I don't know if there is some internal locking involved
when the planner is used)..

And yes, I'll try to profile on 32 cores, it makes sense.

Rgds,
-Dimitri

On 5/12/09, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 Dimitri wrote:
 What I discovered so far with all your help:
   - the impact of a planner
   - the impact of the analyze target
   - the impact of prepare / execute
   - scalability limit on 32 cores

 You've received good advice on how to minimize the impact of the first
 three points, and using those techniques should bring a benefit. But I'm
 pretty surprised by the bad scalability you're seeing and no-one seems
 to have a good idea on where that limit is coming from. At a quick
 glance, I don't see any inherent bottlenecks in the schema and workload.

 If you could analyze where the bottleneck is with multiple cores, that
 would be great. With something like oprofile, it should be possible to
 figure out where the time is spent.

 My first guess would be the WALInsertLock: writing to WAL is protected
 by that and it can become a bottleneck with lots of small
 UPDATE/DELETE/INSERT transactions. But a profile would be required to
 verify that.

 --
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Stefan Kaltenbrunner

Dimitri wrote:

Folks, before you start to think what a dumb guy is doing a dumb thing :-))
I'll explain a few details:

for more than 10 years I've been using the db_STRESS kit
(http://dimitrik.free.fr/db_STRESS.html) to check database
performance and scalability. Until now I was very happy with the results
it gave me, as it stresses each database engine's internals very well and
brings to light some things I would probably miss on other workloads.
What do you want - with time, the fast query that executed before in
500ms now runs within 1-2ms - not only was hardware improved but
database engines also increased their performance a lot! :-))


I was attempting to look into that benchmark kit a bit but I find the 
information on that page a bit lacking :( A few notes:


* is the source code for the benchmark actually available? The kit 
seems to contain a few precompiled binaries and some source/header files, 
but there are no build instructions, no makefile or even a README, 
which makes it really hard to verify exactly what the benchmark is doing 
or whether the benchmark client might actually be the problem here.


* there is very little information on how the toolkit talks to the 
database - some of the binaries seem to contain a static copy of libpq 
or such?


* how many queries per session is the toolkit actually using - some 
earlier comments seem to imply you are doing a connect/disconnect cycle 
for every query; is that actually true?



Stefan



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
Wait wait, currently I'm playing the stress scenario, so there are
only 256 sessions max, but think time is zero (full stress). The scenario
with 1600 users is to test how solid the database is at just keeping a huge
number of users, but doing only one transaction per second (very low
global TPS compared to what the database is able to do, but it's testing
how well its internals work to manage the user sessions).

I did not plan to do the 1600-user test this time (it all depends on time :-))

So, do I need to increase WAL buffers for 256 users?

My LOG and DATA have been placed on separate storage LUNs and controllers
from the beginning.

I've changed the default 0.5 checkpoint_completion_target to 0.8 now;
should I go up to 0.95?..

Also, to avoid TPS waves and bring stability to the Read+Write workload
I followed the advice from a parallel thread:

bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4.0
shared_buffers = 1024MB

I've also tried shared_buffers=256MB as was advised, but then the
Read-Only workload decreases in performance, as PG's own caching helps
anyway.

Also, checkpoint_timeout is 30s now, and of course a huge difference
came from moving default_statistics_target to 5! - but this one I
found myself :-))

Probably checkpoint_timeout can be bigger now with the current
settings? - the goal here is to keep Read+Write TPS as stable as
possible and also to avoid a long recovery in case of a
system/database/other crash (in theory).
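
For reference, a consolidated sketch of the settings described above as
they would appear in postgresql.conf (the values are simply the ones stated
in this thread, not recommendations):

bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4.0
shared_buffers = 1024MB
checkpoint_timeout = 30s                # current value; the question above is whether to raise it
checkpoint_completion_target = 0.8      # currently 0.8, considering 0.95
default_statistics_target = 5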

Rgds,
-Dimitri


On 5/12/09, Simon Riggs si...@2ndquadrant.com wrote:

 On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote:

 What I'm trying to do now is to understand what exactly is the
 problem.

 You're running with 1600 users, which is above the scalability limit
 uncovered (by Sun...) during earlier benchmarking. The scalability
 issues are understood but currently considered above the
 reasonable-setting limit and so nobody has been inclined to improve
 matters.

 You should use a connection concentrator to reduce the number of
 sessions down to say 400.

 You're WAL buffers setting is also too low and you will be experiencing
 contention on the WALWriteLock. Increase wal_buffers to about x8 where
 you have it now.

 You can move pg_xlog to its own set of drives.

 Set checkpoint_completion_target to 0.95.

 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support





Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
Hi Stefan,

sorry, I did not have time to bring all the details into the toolkit -
but at least I published it instead of just telling a nice story about it :-)

The client process is a binary compiled with libpq. The client
interprets a scenario script and publishes via SHM the time spent on
each SQL request. I did not publish the sources yet as that would also require
explaining how to compile them :-)) So for the moment it's shipped as
freeware, but with time everything will be available (BTW, you're
the first to ask for the sources (well, except the IBM guys who asked to
get it on POWER boxes, but that's another story :-))

What is good is that each client publishes its internal stats *live* and
we're able to get live data and follow any kind of waves in
performance. Each session is a single process, so there is no
contention between clients as you may see with some other tools. The
current scenario script contains 2 selects (representing a Read
transaction) and delete/insert/update (representing a Write
transaction). According to the start parameters, each client executes a
given number of Reads per Write. It connects at the beginning and
disconnects at the end of the test.

It's also possible to extend it to do other queries, or simply give
each client a different scenario script - what's important is to be able
to collect its stats live to understand what's going wrong (if
anything)..

I'm planning to extend it and provide an easy way to run it against any
database schema; it's only a question of time..

Rgds,
-Dimitri

On 5/12/09, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:
 Dimitri wrote:
 Folks, before you start to think what a dumb guy is doing a dumb thing :-))
 I'll explain a few details:

 for more than 10 years I've been using the db_STRESS kit
 (http://dimitrik.free.fr/db_STRESS.html) to check database
 performance and scalability. Until now I was very happy with the results
 it gave me, as it stresses each database engine's internals very well and
 brings to light some things I would probably miss on other workloads.
 What do you want - with time, the fast query that executed before in
 500ms now runs within 1-2ms - not only was hardware improved but
 database engines also increased their performance a lot! :-))

 I was attempting to look into that benchmark kit a bit but I find the
 information on that page a bit lacking :( A few notes:

 * is the source code for the benchmark actually available? The kit
 seems to contain a few precompiled binaries and some source/header files,
 but there are no build instructions, no makefile or even a README,
 which makes it really hard to verify exactly what the benchmark is doing
 or whether the benchmark client might actually be the problem here.

 * there is very little information on how the toolkit talks to the
 database - some of the binaries seem to contain a static copy of libpq
 or such?

 * how many queries per session is the toolkit actually using - some
 earlier comments seem to imply you are doing a connect/disconnect cycle
 for every query; is that actually true?


 Stefan




Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..

2009-05-12 Thread Julian v. Bock
Hi

 D == Dimitri  dimitrik...@gmail.com writes:

D current postgresql.conf:

D #
D max_connections = 2000 # (change requires restart)
D temp_buffers = 200MB

temp_buffers are kept per connection and not freed until the session
ends. If you use some kind of connection pooling this can eat up a lot
of ram that could be used for caching instead.
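
For illustration, with the values quoted above the theoretical worst case
is 2000 connections x 200MB = roughly 400GB of temp buffers alone, so
either a much smaller temp_buffers or far fewer connections is needed.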

Regards,
Julian



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Stefan Kaltenbrunner

Dimitri wrote:

Hi Stefan,

sorry, I did not have time to bring all the details into the toolkit -
but at least I published it instead of just telling a nice story about it :-)


fair point and appreciated. But it seems important that benchmarking 
results can be verified by others as well...




The client process is a binary compiled with libpq. The client
interprets a scenario script and publishes via SHM the time spent on
each SQL request. I did not publish the sources yet as that would also require
explaining how to compile them :-)) So for the moment it's shipped as
freeware, but with time everything will be available (BTW, you're
the first to ask for the sources (well, except the IBM guys who asked to
get it on POWER boxes, but that's another story :-))


well there is no licence tag (or a copyright notice) or anything else 
associated with the download, which makes it a bit harder than it really 
needs to be.
The reason why I was actually looking for the source is that none of my 
available benchmark platforms are ones you are providing 
binaries for, which kinda reduces its usefulness.




What is good is that each client publishes its internal stats *live* and
we're able to get live data and follow any kind of waves in
performance. Each session is a single process, so there is no
contention between clients as you may see with some other tools. The
current scenario script contains 2 selects (representing a Read
transaction) and delete/insert/update (representing a Write
transaction). According to the start parameters, each client executes a
given number of Reads per Write. It connects at the beginning and
disconnects at the end of the test.


well, I have seen clients getting bottlenecked internally (like wasting 
more time absorbing/getting rid of the actual result than it took the 
server to generate the answer...).
How sure are you that your live publishing of data does not affect the 
benchmark results (because it kinda generates an artificial think time), 
for example?
But what I get from your answer is that you are basically doing one 
connect/disconnect per client and the testcase you are talking about has 
256 clients?



Stefan



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Robert Haas
On Tue, May 12, 2009 at 8:59 AM, Dimitri dimitrik...@gmail.com wrote:
 Wait wait, currently I'm playing the stress scenario, so there are
 only 256 sessions max, but think time is zero (full stress). The scenario
 with 1600 users is to test how solid the database is at just keeping a huge
 number of users, but doing only one transaction per second (very low
 global TPS compared to what the database is able to do, but it's testing
 how well its internals work to manage the user sessions).

Didn't we beat this to death in mid-March on this very same list?
Last time I think it was Jignesh Shah.  AIUI, it's a well-known fact
that PostgreSQL doesn't do very well at this kind of workload unless
you use a connection pooler.

*goes and checks the archives*  Sure enough, 116 emails under the
subject line "Proposal of tunable fix for scalability of 8.4".

So, if your goal is to find a scenario under which PostgreSQL performs
as badly as possible, congratulations - you've discovered the same
case that we already knew about.  Obviously it would be nice to
improve it, but IIRC so far no one has had any very good ideas on how
to do that.  If this example mimics a real-world workload that you
care about, and if using a connection pooler is just not a realistic
option in that scenario for whatever reason, then you'd be better off
working on how to fix it than on measuring it, because it seems to me
we already know it's got problems, per previous discussions.

...Robert



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Matthew Wakeling

On Tue, 12 May 2009, Stefan Kaltenbrunner wrote:
But what I get from your answer is that you are basically doing one 
connect/disconnect per client and the testcase you are talking about has 256 
clients?


Correct me if I'm wrong, but won't connect operations be all handled by a 
single thread - the parent postmaster? There's your scalability problem 
right there. Also, spawning a new backend process is an awful lot of 
overhead to run just one query.


As far as I can see, it's quite understandable for MySQL to perform better 
than PostgreSQL in these circumstances, as it has a smaller simpler 
backend to start up each time. If you really want to get a decent 
performance out of Postgres, then use long-lived connections (which most 
real-world use cases will do) and prepare your queries in advance with 
parameters.
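
A minimal illustration of that pattern (the statement name and query are 
made up; a long-lived connection would prepare its hot queries once and 
then execute them with different parameters):

PREPARE get_item (integer) AS
SELECT * FROM some_table WHERE id = $1;

EXECUTE get_item(42);
EXECUTE get_item(43);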


Matthew

--
import oz.wizards.Magic;
  if (Magic.guessRight())...   -- Computer Science Lecturer



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 16:00 +0100, Matthew Wakeling wrote:
 won't connect operations be all handled by a 
 single thread - the parent postmaster?

No, we spawn then authenticate. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Matthew Wakeling

On Tue, 12 May 2009, Simon Riggs wrote:

won't connect operations be all handled by a
single thread - the parent postmaster?


No, we spawn then authenticate.


But you still have a single thread doing the accept() and spawn. At some 
point (maybe not now, but in the future) this could become a bottleneck 
given very short-lived connections.


Matthew

--
-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-.
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Stefan Kaltenbrunner

Matthew Wakeling wrote:

On Tue, 12 May 2009, Simon Riggs wrote:

won't connect operations be all handled by a
single thread - the parent postmaster?


No, we spawn then authenticate.


But you still have a single thread doing the accept() and spawn. At some 
point (maybe not now, but in the future) this could become a bottleneck 
given very short-lived connections.


well, the main cost is backend startup, and that one is extremely 
expensive (compared to the cost of a simple query and also depending on 
the OS). We have more overhead there than other databases (most notably 
MySQL), which is what prompted my question on how the benchmark was operating.
For any kind of workload that contains frequent connection 
establishment one wants to use a connection pooler like pgbouncer (as 
said elsewhere in the thread already).



Stefan



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Tom Lane
Matthew Wakeling matt...@flymine.org writes:
 On Tue, 12 May 2009, Simon Riggs wrote:
 No, we spawn then authenticate.

 But you still have a single thread doing the accept() and spawn. At some 
 point (maybe not now, but in the future) this could become a bottleneck 
 given very short-lived connections.

More to the point, each backend process is a pretty heavyweight object:
it is a process, not a thread, and it's not going to be good for much
until it's built up a reasonable amount of stuff in its private caches.
I don't think the small number of cycles executed in the postmaster
process amount to anything at all compared to the other overhead
involved in getting a backend going.

In short: executing a single query per connection is going to suck,
and there is not anything we are going to do about it except to tell
you to use a connection pooler.

MySQL has a different architecture: thread per connection, and AFAIK
whatever caches it has are shared across threads.  So a connection is a
lighter-weight object for them; but there's no free lunch.  They pay for
it in having to tolerate locking/contention overhead on operations that
for us are backend-local.

regards, tom lane



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
Robert, what I'm testing now is 256 users max. The workload is growing
progressively from 1, 2, 4, 8 ... to 256 users. Of course the Max
throughput is reached on the number of users equal to 2 * number of
cores, but what's important for me here - database should continue to
keep the workload! - response time regressing, but the throughput
should remain near the same.

So, do I really need a pooler to keep 256 users working??  - I don't
think so, but please, correct me.

BTW, I was not looking to put PostgreSQL in bad conditions - the test is
the test, and as I said, 2 years ago PostgreSQL outperformed MySQL on
the same test case, and nothing was done within the MySQL code to
improve it explicitly for db_STRESS.. And I stay pretty honest
when I'm testing something.

Rgds,
-Dimitri


On 5/12/09, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 12, 2009 at 8:59 AM, Dimitri dimitrik...@gmail.com wrote:
 Wait wait, currently I'm playing the stress scenario, so there are
 only 256 sessions max, but think time is zero (full stress). The scenario
 with 1600 users is to test how solid the database is at just keeping a huge
 number of users, but doing only one transaction per second (very low
 global TPS compared to what the database is able to do, but it's testing
 how well its internals work to manage the user sessions).

 Didn't we beat this to death in mid-March on this very same list?
 Last time I think it was Jignesh Shah.  AIUI, it's a well-known fact
 that PostgreSQL doesn't do very well at this kind of workload unless
 you use a connection pooler.

 *goes and checks the archives*  Sure enough, 116 emails under the
 subject line Proposal of tunable fix for scalability of 8.4.

 So, if your goal is to find a scenario under which PostgreSQL performs
 as badly as possible, congratulations - you've discovered the same
 case that we already knew about.  Obviously it would be nice to
 improve it, but IIRC so far no one has had any very good ideas on how
 to do that.  If this example mimics a real-world workload that you
 care about, and if using a connection pooler is just not a realistic
 option in that scenario for whatever reason, then you'd be better off
 working on how to fix it than on measuring it, because it seems to me
 we already know it's got problems, per previous discussions.

 ...Robert




Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..

2009-05-12 Thread Dimitri
Good point!  I missed it.. - will 20MB be enough?

Rgds,
-Dimitri

On 5/12/09, Julian v. Bock b...@openit.de wrote:
 Hi

 D == Dimitri  dimitrik...@gmail.com writes:

 D current postgresql.conf:

 D #
 D max_connections = 2000 # (change requires restart)
 D temp_buffers = 200MB

 temp_buffers are kept per connection and not freed until the session
 ends. If you use some kind of connection pooling this can eat up a lot
 of ram that could be used for caching instead.

 Regards,
 Julian





Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Kevin Grittner
Dimitri dimitrik...@gmail.com wrote:
 
 Of course the Max throughput is reached on the number of users equal
 to 2 * number of cores
 
I'd expect that when disk I/O is not a significant limiting factor,
but I've seen a sweet spot of (2 * cores) + (effective spindle
count) for loads involving a lot of random I/O.
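 
For example (purely illustrative, assuming the 32-core box discussed in
this thread and, say, 16 effective spindles): 2 * 32 + 16 = 80 connections
would be the sweet spot under that rule of thumb.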
 
 So, do I really need a pooler to keep 256 users working??
 
I have seen throughput fall above a certain point when I don't use a
connection pooler.  With a connection pooler which queues requests
when all connections are busy, you will see no throughput degradation
as users of the pool are added.  Our connection pool is in our
framework, so I don't know whether pgbouncer queues requests. 
(Perhaps someone else can comment on that, and make another suggestion
if it doesn't.)
 
-Kevin



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
On 5/12/09, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:
 Dimitri wrote:
 Hi Stefan,

 sorry, I did not have time to bring all the details into the toolkit -
 but at least I published it instead of just telling a nice story about it :-)

 fair point and appreciated. But it seems important that benchmarking
 results can be verified by others as well...

until now there were only people running Solaris or Linux :-))



 The client process is a binary compiled with libpq. The client
 interprets a scenario script and publishes via SHM the time spent on
 each SQL request. I did not publish the sources yet as that would also require
 explaining how to compile them :-)) So for the moment it's shipped as
 freeware, but with time everything will be available (BTW, you're
 the first to ask for the sources (well, except the IBM guys who asked to
 get it on POWER boxes, but that's another story :-))

 well there is no licence tag (or a copyright notice) or anything else
 associated with the download, which makes it a bit harder than it really
 needs to be.
 The reason why I was actually looking for the source is that none of my
 available benchmark platforms are ones you are providing
 binaries for, which kinda reduces its usefulness.


agree, will improve this point


 What is good is that each client publishes its internal stats *live* and
 we're able to get live data and follow any kind of waves in
 performance. Each session is a single process, so there is no
 contention between clients as you may see with some other tools. The
 current scenario script contains 2 selects (representing a Read
 transaction) and delete/insert/update (representing a Write
 transaction). According to the start parameters, each client executes a
 given number of Reads per Write. It connects at the beginning and
 disconnects at the end of the test.

 well, I have seen clients getting bottlenecked internally (like wasting
 more time absorbing/getting rid of the actual result than it took the
 server to generate the answer...).
 How sure are you that your live publishing of data does not affect the
 benchmark results (because it kinda generates an artificial think time),
 for example?

In all my test tools the clients publish their data via a shared
memory segment (ISM); all they do is just *increment* their current
stats values and continue their processing. Another dedicated
program has to be executed to print these stats - it connects to
the same SHM segment and prints the *difference* between values for
the current and the next interval. Let me know if you need more
details.

 But what I get from your answer is that you are basically doing one
 connect/disconnect per client and the testcase you are talking about has
 256 clients?

Exactly, only one connect/disconnect per test, and the number of clients
grows progressively from 1, 2, 4, 8, 16, .. to 256

Rgds,
-Dimitri



 Stefan




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Joshua D. Drake
On Tue, 2009-05-12 at 17:22 +0200, Dimitri wrote:
 Robert, what I'm testing now is 256 users max. The workload is growing
 progressively from 1, 2, 4, 8 ... to 256 users. Of course the Max
 throughput is reached on the number of users equal to 2 * number of
 cores, but what's important for me here - database should continue to
 keep the workload! - response time regressing, but the throughput
 should remain near the same.
 
 So, do I really need a pooler to keep 256 users working??  - I don't
 think so, but please, correct me.

If they disconnect and reconnect yes. If they keep the connections live
then no. 

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri
No, they keep connections till the end of the test.

Rgds,
-Dimitri

On 5/12/09, Joshua D. Drake j...@commandprompt.com wrote:
 On Tue, 2009-05-12 at 17:22 +0200, Dimitri wrote:
 Robert, what I'm testing now is 256 users max. The workload is growing
 progressively from 1, 2, 4, 8 ... to 256 users. Of course the Max
 throughput is reached on the number of users equal to 2 * number of
 cores, but what's important for me here - database should continue to
 keep the workload! - response time regressing, but the throughput
 should remain near the same.

 So, do I really need a pooler to keep 256 users working??  - I don't
 think so, but please, correct me.

 If they disconnect and reconnect yes. If they keep the connections live
 then no.

 Joshua D. Drake

 --
 PostgreSQL - XMPP: jdr...@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997





Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 AIUI, whenever the connection pooler switches to serving a new client,
 it tells the PG backend to DISCARD ALL.  But why couldn't we just
 implement this same logic internally?  IOW, when a client disconnects,
 instead of having the backend exit immediately, have it perform the
 equivalent of DISCARD ALL and then stick around for a minute or two
 and, if a new connection request arrives within that time, have the
 old backend handle the new connection...

See previous discussions.  IIRC, there are two killer points:

1. There is no (portable) way to pass the connection from the postmaster
to another pre-existing process.

2. You'd have to track which database, and probably which user, each
such backend had been launched for; reconnecting a backend to a new
database is probably impractical and would certainly invalidate all
the caching.

Overall it looked like way too much effort for way too little gain.

regards, tom lane



Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Alvaro Herrera
Andres Freund wrote:

 Naturally it would still be nice to perform well in this non-optimal workload...

I find it hard to justify wasting our scarce development resources on
optimizing such a contrived workload.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 11:30 -0700, Scott Carey wrote:
 the fact is there is no evidence that a connection pooler will fix the
 scalability from 16 to 32 cores.

There has been much analysis over a number of years of the effects of
the ProcArrayLock, specifically the O(N^2) effect of increasing numbers
of connections on GetSnapshotData(). Most discussion has been on
-hackers, not -perform.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 15:52 -0400, Robert Haas wrote:
 On Tue, May 12, 2009 at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  1. There is no (portable) way to pass the connection from the postmaster
  to another pre-existing process.
 
 [Googles.]  It's not obvious to me that SCM_RIGHTS is non-portable,
 and Windows has an API call WSADuplicateSocket() specifically for this
 purpose.

Robert, Greg,

Tom's main point is it isn't worth doing. We have connection pooling
software that works well, very well. Why do we want to bring it into
core? (Think of the bugs we'd hit...) If we did, who would care?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Dimitri Fontaine

Hi,

On 12 May 2009, at 18:32, Robert Haas wrote:

implement this same logic internally?  IOW, when a client disconnects,
instead of having the backend exit immediately, have it perform the
equivalent of DISCARD ALL and then stick around for a minute or two
and, if a new connection request arrives within that time, have the
old backend handle the new connection...


A much better idea to solve this, in my opinion, would be to have  
pgbouncer as a postmaster child, integrated into PostgreSQL. It allows  
for choosing whether you want session pooling, transaction pooling or  
statement pooling, which is a more deterministic way to choose when  
your client connection will benefit from a fresh backend or an  
existing one. And it respects backend timeouts etc.
It's proven open-source technology, and I think I've heard about some  
PostgreSQL distribution where it's already a postmaster child.


<handwaving>
And when associated with Hot Standby (and Sync WAL Shipping), having a  
connection pooler in -core could allow for transparent Read-Write  
access to the slave: at the moment you need an XID (and you're connected  
to the slave), the backend could tell the pgbouncer process to  
redirect the connection to the master. With such a feature, you don't  
have to build client side high availability, just connect to either  
the master or the slave and be done with it, whatever the SQL you're  
gonna run.

</handwaving>





Regards,
--
dim


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Scott Carey
Although nobody wants to support it, he should try the patch that Jignesh K.
Shah (from Sun) proposed that makes ProcArrayLock lighter-weight.  If it
makes 32 cores much faster, then we have a smoking gun.

Although everyone here is talking about this as an 'unoptimal' solution, the
fact is there is no evidence that a connection pooler will fix the
scalability from 16 to 32 cores.
Certainly a connection pooler will help most results, but it may not fix the
scalability problem.

A question for Dimitri:
What is the scalability from 16 to 32 cores at the 'peak' load that occurs
near 2x the CPU count?  Is it also poor?  If this is also poor, IMO the
community here should not be complaining about this unoptimal case -- a
connection pooler at that stage does little and prepared statements will
increase throughput but not likely alter scalability.

If that result scales, then the short term answer is a connection pooler.

In the tests that Jignesh ran -- making the ProcArrayLock faster helped the
case where connections = 2x the CPU core count quite a bit.

The thread about the CPU scalability is "Proposal of tunable fix for
scalability of 8.4", originally posted by Jignesh K. Shah
j.k.s...@sun.com, March 11 2009.

It would be very useful to see results of this benchmark with:
1. A Connection Pooler
2. Jignesh's patch
3. Prepared statements

#3 is important, because prepared statements are ideal for queries that
perform well with low statistics_targets, and not ideal for those that
require high statistics targets.  Realistically, an app won't have more than
a couple dozen statement forms to prepare.  Setting the default statistics
target to 5 is just a way to make some other query perform badly.
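
As a rough SQL-level sketch of #3 combined with a targeted statistics
bump (the table, column and statement names here are invented purely for
illustration):

  -- keep the global default low, raise statistics only where a query needs it
  ALTER TABLE history ALTER COLUMN account_id SET STATISTICS 100;
  ANALYZE history;

  -- prepare once per session, execute many times
  PREPARE get_history (int, int) AS
    SELECT h.*
      FROM history h
     WHERE h.account_id = $1
     ORDER BY h.ts DESC
     LIMIT $2;

  EXECUTE get_history(42, 20);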


On 5/12/09 10:53 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:

 Andres Freund wrote:
 
 Naturally it would still be nice to be good in this not optimal workload...
 
 I find it hard to justify wasting our scarce development resources on
 optimizing such a contrived workload.
 
 --
 Alvaro Herrerahttp://www.CommandPrompt.com/
 The PostgreSQL Company - Command Prompt, Inc.
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Robert Haas
On Tue, May 12, 2009 at 1:00 PM, Dimitri dimitrik...@gmail.com wrote:
 On MySQL there is no changes if I set the number of sessions in the
 config file to 400 or to 2000 - for 2000 it'll just allocate more
 memory.

I don't care whether the setting affects the speed of MySQL.  I want
to know if it affects the speed of PostgreSQL.

 After the latest fix with default_statistics_target=5, version 8.3.7 is
 running as fast as 8.4; even 8.4 is a little bit slower.

 I understand your position with a pooler, but I also want you think
 about idea that 128 cores system will become a commodity server very
 soon, and to use these cores on their full power you'll need a
 database engine capable to run 256 users without pooler, because a
 pooler will not help you here anymore..

So what?  People with 128-core systems will not be running trivial
joins that return in 1-2ms and have one second think times between
them.  And if they are, and if they have nothing better to do than
worry about whether MySQL can process those queries in 1/2000th of the
think time rather than 1/1000th of the think time, then they can use
MySQL.  If we're going to worry about performance on 128-core system,
we would be much better advised to put our efforts into parallel query
execution than how many microseconds it takes to execute very simple
queries.

Still, I have no problem with making PostgreSQL faster in the case
you're describing.  I'm just not interested in doing it on my own time
for free.  I am sure there are a number of people who read this list
regularly who would be willing to do it for money, though.  Maybe even
me.  :-)

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query planner making bad decisions

2009-05-12 Thread Rafael Martinez
Cory Coager wrote:
 I'm running version 8.1.11 on SLES 10 SP2.  I'm trying to improve this
 query and unfortunately I cannot change the application.  For some
 reason the planner is making a bad decision sometimes after an analyze
 of table objectcustomfieldvalues.
 
 The query is:
 SELECT DISTINCT main.* FROM Tickets main JOIN CustomFields
 CustomFields_1  ON ( CustomFields_1.Name = 'QA Origin' ) JOIN
 CustomFields CustomFields_3  ON (CustomFields_3.Name = 'QA Group Code' )
 JOIN ObjectCustomFieldValues ObjectCustomFieldValues_4  ON
 (ObjectCustomFieldValues_4.ObjectId = main.id ) AND (
 ObjectCustomFieldValues_4.Disabled = '0' ) AND
 (ObjectCustomFieldValues_4.ObjectType = 'RT::Ticket' ) AND (
 ObjectCustomFieldValues_4.CustomField = CustomFields_3.id ) JOIN
 ObjectCustomFieldValues ObjectCustomFieldValues_2  ON (
 ObjectCustomFieldValues_2.Disabled = '0' ) AND
 (ObjectCustomFieldValues_2.ObjectId = main.id ) AND (
 ObjectCustomFieldValues_2.CustomField = CustomFields_1.id ) AND
 (ObjectCustomFieldValues_2.ObjectType = 'RT::Ticket' )  WHERE
 (main.Status != 'deleted') AND (main.Queue = '60' AND
 ObjectCustomFieldValues_2.Content LIKE '%Patient Sat Survey%' AND
 ObjectCustomFieldValues_4.Content LIKE'%MOT%') AND (main.EffectiveId =
 main.id) AND (main.Type = 'ticket')  ORDER BY main.id ASC;
 
 

Hello

Just in case you want this information. Our RT installation running on
8.3.6 / RHEL4 and with default_statistics_target=100 gives us this query
plan:

Unique  (cost=1360.05..1360.12 rows=1 width=161) (actual
time=2141.834..2141.834 rows=0 loops=1)
   ->  Sort  (cost=1360.05..1360.06 rows=1 width=161) (actual
time=2141.831..2141.831 rows=0 loops=1)
 Sort Key: main.effectiveid, main.issuestatement,
main.resolution, main.owner, main.subject, main.initialpriority,
main.finalpriority, main.priority, main.timeestimated, main.timeworked,
main.status, main.timeleft, main.told, main.starts, main.started,
main.due, main.resolved, main.lastupdatedby, main.lastupdated,
main.creator, main.created, main.disabled
 Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=14.14..1360.04 rows=1 width=161) (actual
time=2141.724..2141.724 rows=0 loops=1)
           ->  Nested Loop  (cost=14.14..1358.09 rows=1 width=165)
(actual time=2141.717..2141.717 rows=0 loops=1)
                 ->  Nested Loop  (cost=14.14..1356.14 rows=1
width=169) (actual time=2141.715..2141.715 rows=0 loops=1)
                   ->  Nested Loop  (cost=14.14..1348.69 rows=1
width=169) (actual time=2141.711..2141.711 rows=0 loops=1)
                         ->  Bitmap Heap Scan on tickets main
(cost=14.14..1333.78 rows=2 width=161) (actual time=0.906..26.413
rows=1046 loops=1)
   Recheck Cond: (queue = 60)
                           Filter: (((status)::text <>
'deleted'::text) AND (effectiveid = id) AND ((type)::text = 'ticket'::text))
                           ->  Bitmap Index Scan on tickets1
 (cost=0.00..14.14 rows=781 width=0) (actual time=0.662..0.662 rows=1188
loops=1)
 Index Cond: (queue = 60)
                         ->  Index Scan using
objectcustomfieldvalues3 on objectcustomfieldvalues
objectcustomfieldvalues_2  (cost=0.00..7.44 rows=1 width=8) (actual
time=2.017..2.017 rows=0 loops=1046)
   Index Cond:
((objectcustomfieldvalues_2.disabled = 0) AND
(objectcustomfieldvalues_2.objectid = main.effectiveid) AND
((objectcustomfieldvalues_2.objecttype)::text = 'RT::Ticket'::text))
   Filter:
((objectcustomfieldvalues_2.content)::text ~~ '%Patient Sat Survey%'::text)
                   ->  Index Scan using objectcustomfieldvalues3
on objectcustomfieldvalues objectcustomfieldvalues_4  (cost=0.00..7.44
rows=1 width=8) (never executed)
 Index Cond:
((objectcustomfieldvalues_4.disabled = 0) AND
(objectcustomfieldvalues_4.objectid = main.effectiveid) AND
((objectcustomfieldvalues_4.objecttype)::text = 'RT::Ticket'::text))
 Filter:
((objectcustomfieldvalues_4.content)::text ~~ '%MOT%'::text)
                 ->  Index Scan using customfields_pkey on
customfields customfields_3  (cost=0.00..1.94 rows=1 width=4) (never
executed)
   Index Cond: (customfields_3.id =
objectcustomfieldvalues_4.customfield)
   Filter: ((customfields_3.name)::text = 'QA
Group Code'::text)
           ->  Index Scan using customfields_pkey on customfields
customfields_1  (cost=0.00..1.94 rows=1 width=4) (never executed)
 Index Cond: (customfields_1.id =
objectcustomfieldvalues_2.customfield)
 Filter: ((customfields_1.name)::text = 'QA
Origin'::text)
 Total runtime: 2142.347 ms
(26 rows)
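
For reference, a sketch of how that statistics target can be raised (the
value is simply the one mentioned above; the setting can also go in
postgresql.conf, and it only takes effect for statistics gathered by a
subsequent ANALYZE):

  -- per session (or ALTER DATABASE ... SET for a whole database)
  SET default_statistics_target = 100;
  ANALYZE objectcustomfieldvalues;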

-- 
 Rafael Martinez, r.m.guerr...@usit.uio.no
 Center for Information Technology Services
 University of Oslo

[PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Scott Marlowe
Anyone on the list had a chance to benchmark the Nehalem's yet?  I'm
primarily wondering if their promise of performance from 3 memory
channels holds up under typical pgsql workloads.  I've been really
happy with the behavior of my AMD shanghai based server under heavy
loads, but if the Nehalems' much-touted performance increase translates
to pgsql, I'd like to know.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Robert Haas
On Tue, May 12, 2009 at 4:24 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Tue, 2009-05-12 at 15:52 -0400, Robert Haas wrote:
 On Tue, May 12, 2009 at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  1. There is no (portable) way to pass the connection from the postmaster
  to another pre-existing process.

 [Googles.]  It's not obvious to me that SCM_RIGHTS is non-portable,
 and Windows has an API call WSADuplicateSocket() specifically for this
 purpose.

 Robert, Greg,

 Tom's main point is it isn't worth doing. We have connection pooling
 software that works well, very well. Why do we want to bring it into
 core? (Think of the bugs we'd hit...) If we did, who would care?

I don't know.  It seems like it would be easier to manage just
PostgreSQL than PostgreSQL + connection pooling software, but mostly I
was just curious whether it had been thought about, so I asked, and
the answer then led to a further question...  I was not intending to
make a big deal about it.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] increase index performance

2009-05-12 Thread Thomas Finneid

Hi

have the following table (theoretical)

create table apartment_location (
    city_id    int,
    street_id  int,
    house_id   int,
    floor_id   int,
    owner      text
    -- ... further columns
);

create index apartment_location_idx   -- index name chosen arbitrarily
    on apartment_location ( city_id, street_id, house_id, floor_id )
    tablespace indexspace;

on a database with 260 GB of data and an index size of 109GB on separate 
raid disks. there are

85 city_ids,
2000 street_ids per city,
20 house_ids per street per city,
5 floor_ids per house per street per city

Then I perform a query to retrieve all street_ids for a specified city,
house and floor (a bit contrived, but the same cardinality applies):


  select street_id, floor_id from apartment_location where
city_id = 67 and
house_id = 6 and
floor_id = 4

This returns about 2000 rows, but the query takes 3-4 seconds. It
performs an index scan, and everything happens inside 6GB of memory.


So the question: any suggestions on how to decrease the query time?
From iostat etc. it seems that most of the work is reading the
index; reading the data takes almost nothing.


Any suggestions?

regards

thomas







--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Alvaro Herrera
Dimitri Fontaine wrote:

 A much better idea to solve this, in my opinion, would be to have
 pgbouncer as a postmaster child, integrated into PostgreSQL. It allows
 you to choose between session pooling, transaction pooling and
 statement pooling, which is a more deterministic way to decide when your
 client connection will benefit from a fresh backend or an existing one.
 And it respects backend timeouts, etc.

Hmm.  Seems like the best idea if we go this route would be one of
Simon's which was to have better support for pluggable postmaster
children.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Greg Stark
On Tue, May 12, 2009 at 5:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 See previous discussions.  IIRC, there are two killer points:

 1. There is no (portable) way to pass the connection from the postmaster
 to another pre-existing process.

The Apache model is to have all the backends call accept. So incoming
connections don't get handled by a single master process, they get
handled by whichever process the kernel picks to receive the
connection.

-- 
greg

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] superlative missuse

2009-05-12 Thread Angel Alvarez
Dear List mates,

more optimal plan... 
more optimal configuration...

we suffer from a 'more optimal' superlative misuse

there is no 'more optimal' thing, just a simple 'better' thing.

I'm not a native English speaker, but I think it still applies.

Well, this is a superlative list, so all of you deserve a better optimal use.

Regards, Angel
-- 
This email has no drawings. The strange shapes on the screen are letters.

Clist UAH a.k.a Angel
-[www.uah.es]---

...being the second biggest search engine in the world is good enough for us. 
Peter @ Pirate Bay.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] superlative missuse

2009-05-12 Thread David Wilson
On Tue, May 12, 2009 at 5:53 PM, Angel Alvarez cl...@uah.es wrote:

 we suffer from a 'more optimal' superlative misuse

 there is no 'more optimal' thing, just a simple 'better' thing.

 I'm not a native English speaker, but I think it still applies.

 Well, this is a superlative list, so all of you deserve a better optimal use.

As a native english speaker:

You are technically correct. However, more optimal has a
well-understood meaning as closer to optimal, and as such is
appropriate and generally acceptable despite being technically
incorrect.

This is a postgres mailing list, not an english grammar mailing list...

-- 
- David T. Wilson
david.t.wil...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Joshua D. Drake
On Tue, 2009-05-12 at 20:34 -0400, Aidan Van Dyk wrote:
 * Joshua D. Drake j...@commandprompt.com [090512 19:27]:
  
  Apache solved this problem back when it was still called NCSA HTTPd. Why
  aren't we preforking again?
 
 Of course, preforking and connection pooling are totally different
 beast...
 

Yes and no. They both solve similar problems, and preforking solves more
problems when you look at the picture in its entirety (namely authentication
integration, etc.).

 But, what really does preforking give us?  A 2 or 3% improvement?

It depends on the problem we are solving. We can test it but I would bet
it is more than that especially in a high velocity environment.

   The
 forking isn't the expensive part,

It is expensive but not as expensive as the below.

  the per-database setup that happens is
 the expensive setup...  All pre-forking would save us is a tiny part of
 the initial setup, and in turn make our robust postmaster controller no
 longer have control.

I don't buy this. Properly coded we aren't going to lose any control.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Aidan Van Dyk
* Joshua D. Drake j...@commandprompt.com [090512 19:27]:
 
  Apache solved this problem back when it was still called NCSA HTTPd. Why
 aren't we preforking again?

Of course, preforking and connection pooling are totally different
beast...

But, what really does preforking give us?  A 2 or 3% improvement?  The
forking isn't the expensive part, the per-database setup that happens is
the expensive setup...  All pre-forking would save us is a tiny part of
the initial setup, and in turn make our robust postmaster controller no
longer have control.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature


Re: [PERFORM] increase index performance

2009-05-12 Thread Greg Smith

On Tue, 12 May 2009, Thomas Finneid wrote:

on a database with 260 GB of data and an index size of 109GB on separate raid 
disks. there are

85 city_ids, 2000
street_ids per city,
20 house_ids per street per city
5 floor_ids per house_ per street per city


You should test what happens if you reduce the index to just being 
(city_id,street_id).  Having all the fields in there makes the index 
larger, and it may end up being faster to just pull all of the ~100 data 
rows for a particular (city_id,street_id) using the smaller index and then 
filter out just the ones you need.  Having a smaller index to traverse 
also means that you'll be more likely to keep all the index blocks in the 
buffer cache moving forward.
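
A minimal sketch of that smaller index, reusing the table and tablespace
from the original post (the index name is arbitrary):

  CREATE INDEX apartment_location_city_street_idx
      ON apartment_location (city_id, street_id)
      TABLESPACE indexspace;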


A second level improvement there is to then CLUSTER on the smaller index, 
which increases the odds you'll get all of the rows you need by fetching 
only a small number of data pages.
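
A sketch of that follow-up step, assuming the index name used above:

  -- CLUSTER rewrites the table and holds an exclusive lock while it runs
  CLUSTER apartment_location USING apartment_location_city_street_idx;
  ANALYZE apartment_location;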


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Greg Smith
Anand did SQL Server and Oracle test results, the Nehalem system looks 
like a substantial improvement over the Shanghai Opteron 2384:


http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=6
http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=7

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Stephen Frost
* Aidan Van Dyk (ai...@highrise.ca) wrote:
 But, what really does preforking give us?  A 2 or 3% improvement?  The
 forking isn't the expensive part, the per-database setup that happens is
 the expensive setup... 

Obviously that begs the question: why not support pre-forking with specific
databases associated with specific backends that do the per-database
setup prior to a connection coming in?  E.g., I want 5 backends ready per
user database (excluding template0, template1, postgres).

Thoughts?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Scott Marlowe
On Tue, May 12, 2009 at 8:05 PM, Greg Smith gsm...@gregsmith.com wrote:
 Anand did SQL Server and Oracle test results, the Nehalem system looks like
 a substantial improvement over the Shanghai Opteron 2384:

 http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=6
 http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=7

That's an interesting article. Thanks for the link.  A couple points
stick out to me.

1: 5520 to 5540 parts only have 1 133MHz step increase in performance
2: 550x parts have no hyperthreading.

Assuming that the parts tested (5570) were using hyperthreading and
two 133MHz steps, at the lower end of the range, the 550x parts are
likely not that much faster than the opterons in their same clock
speed range, but are still quite a bit more expensive.

It'd be nice to see some benchmarks on the more reasonably priced CPUs
in both ranges, the 2.2 to 2.4 GHz opterons and the 2.0 (5504) to
2.26GHz (5520) nehalems. Since I have to buy > 1 server to handle the
load and provide redundancy anyway, single cpu performance isn't
nearly as interesting as aggregate performance / $ spent.

While all the benchmarks on near 3GHz parts is fun to read and
salivate over, it's not as relevant to my interests as the performance
of the more reasonably prices parts.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Robert Haas
On Tue, May 12, 2009 at 11:22 AM, Dimitri dimitrik...@gmail.com wrote:
 Robert, what I'm testing now is 256 users max. The workload is growing
 progressively from 1, 2, 4, 8 ... to 256 users. Of course the Max
 throughput is reached on the number of users equal to 2 * number of
 cores, but what's important for me here - database should continue to
 keep the workload! - response time regressing, but the throughput
 should remain near the same.

 So, do I really need a pooler to keep 256 users working??  - I don't
 think so, but please, correct me.

Not an expert on this, but there has been a lot of discussion of the
importance of connection pooling in this space.  Is MySQL still faster
if you lower max_connections to a value that is closer to the number
of users, like 400 rather than 2000?

 BTW, I did not look to put PostgreSQL in bad conditions - the test is
 the test, and as I said 2 years ago PostgreSQL outperformed MySQL on
 the same test case, and there was nothing done within MySQL code to
 improve it explicitly for db_STRESS.. And I'm staying pretty honest
 when I'm testing something.

Yeah but it's not really clear what that something is.  I believe you
said upthread that PG 8.4 beta 1 is faster than PG 8.3.7, but PG 8.4
beta 1 is slower than MySQL 5.4 whereas PG 8.3.7 was faster than some
older version of MySQL.  So PG got faster and MySQL got faster, but
they sped things up more than we did.  If our performance were getting
WORSE, I'd be worried about that, but the fact that they were able to
make more improvement on this particular case than we were doesn't
excite me very much.  Sure, I'd love it if PG were even faster than it
is, and if you have a suggested patch please send it in...  or if you
want to profile it and send the results that would be great too.  But
I guess my point is that the case of a very large number of
simultaneous users with pauses-for-thought between queries has already
been looked at in the very recent past in a way that's very similar to
what you are doing (and by someone who works at the same company you
do, no less!) so I'm not quite sure why we're rehashing the issue.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Scott Carey
The $ cost of more CPU power on larger machines ends up such a small %
chunk, especially after I/O cost.  Sure, the CPU with HyperThreading and the
turbo might be 40% more expensive than the other CPU, but if the total
system cost is 5% more for 15% more performance . . .

It depends on how CPU limited you are.  If you aren't, there isn't much of a
reason to look past the cheaper Opterons with a good I/O setup.

I've got a 2 x 5520 system with lots of RAM on the way.  The problem with
lots of RAM in the Nehalem systems, is that the memory speed slows as more
is added.  I think mine slows from the 1066Mhz the processor can handle to
800Mhz.  It still has way more bandwidth than the old Xeons though.
Although my use case is about as far from pg_bench as you can get, I might
be able to get a run of it in during stress testing.



On 5/12/09 7:28 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Tue, May 12, 2009 at 8:05 PM, Greg Smith gsm...@gregsmith.com wrote:
 Anand did SQL Server and Oracle test results, the Nehalem system looks like
 a substantial improvement over the Shanghai Opteron 2384:
 
 http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=6
 http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=7
 
 That's an interesting article. Thanks for the link.  A couple points
 stick out to me.
 
 1: 5520 to 5540 parts only have 1 133MHz step increase in performance
 2: 550x parts have no hyperthreading.
 
 Assuming that the parts tested (5570) were using hyperthreading and
 two 133MHz steps, at the lower end of the range, the 550x parts are
 likely not that much faster than the opterons in their same clock
 speed range, but are still quite a bit more expensive.
 
 It'd be nice to see some benchmarks on the more reasonably priced CPUs
 in both ranges, the 2.2 to 2.4 GHz opterons and the 2.0 (5504) to
 2.26GHz (5520) nehalems. Since I have to buy > 1 server to handle the
 load and provide redundancy anyway, single cpu performance isn't
 nearly as interesting as aggregate performance / $ spent.
 
 While all the benchmarks on near 3GHz parts is fun to read and
 salivate over, it's not as relevant to my interests as the performance
 of the more reasonably prices parts.
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Robert Haas
On Tue, May 12, 2009 at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. There is no (portable) way to pass the connection from the postmaster
 to another pre-existing process.

[Googles.]  It's not obvious to me that SCM_RIGHTS is non-portable,
and Windows has an API call WSADuplicateSocket() specifically for this
purpose.

 2. You'd have to track which database, and probably which user, each
 such backend had been launched for; reconnecting a backend to a new
 database is probably impractical and would certainly invalidate all
 the caching.

User doesn't seem like a major problem, but I understand your point
about databases, which would presumably preclude the Apache approach
of having every backend call accept() on the master socket.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Glenn Maynard
I'm sorry, but I'm confused.  Everyone keeps talking about connection
pooling, but Dimitri has said repeatedly that each client makes a
single connection and then keeps it open until the end of the test,
not that it makes a single connection per SQL query.  Connection
startup costs shouldn't be an issue.  Am I missing something here?
test(N) starts N clients, each client creates a single connection and
hammers the server for a while on that connection.  test(N) is run for
N=1,2,4,8...256.  This seems like a very reasonable test scenario.

--
Glenn Maynard

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Scott Marlowe
On Tue, May 12, 2009 at 8:59 PM, Scott Carey sc...@richrelevance.com wrote:
 The $ cost of more CPU power on larger machines ends up such a small %
 chunk, especially after I/O cost.  Sure, the CPU with HyperThreading and the
 turbo might be 40% more expensive than the other CPU, but if the total
 system cost is 5% more for 15% more performance . . .

But every dollar I spend on CPUs is a dollar I can't spend on
RAID controllers, more memory, or more drives.

We're looking at machines with say 32 1TB SATA drives, which run in
the $12k range.  The Nehalem 5570s (2.8GHz) are going for something in
the range of $1500 or more, the 5540 (2.53GHz) at $774.99, 5520
(2.26GHz) at $384.99, and the 5506 (2.13GHz) at $274.99.  The 5520 is
the first one with hyperthreading so it's a reasonable cost increase.
Somewhere around the 5530 the cost for increase in performance stops
making a lot of sense.

The Opterons, like the 2378 Barcelona at 2.4GHz for $279.99 or the
2.5GHz 2380 at $400, are good values.  And I know they mostly scale by
clock speed, so I can decide which to buy based on that.  The 83xx
series cpus are still far too expensive to be cost effective, with
2.2GHz parts running $600 and faster parts climbing VERY quickly after
that.

So what I want to know is how the 2.5GHz barcelonas would compare to
both the 5506 through 5530 nehalems, as those parts are all in the
same cost range (sub $500 cpus).

 It depends on how CPU limited you are.  If you aren't, there isn't much of a
 reason to look past the cheaper Opterons with a good I/O setup.

Exactly.  Which is why I'm looking for best bang for buck on the CPU
front.  Also performance as a data pump so to speak, i.e. minimizing
memory bandwidth limitations.

 I've got a 2 x 5520 system with lots of RAM on the way.  The problem with
 lots of RAM in the Nehalem systems, is that the memory speed slows as more
 is added.

I too wondered about that and its effect on performance.  Another
benchmark I'd like to see: how it runs with more and less memory.

 I think mine slows from the 1066Mhz the processor can handle to
 800Mhz.  It still has way more bandwidth than the old Xeons though.
 Although my use case is about as far from pg_bench as you can get, I might
 be able to get a run of it in during stress testing.

I'd be very interested in hearing how it runs, and not just for pgbench.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance