Re: [PERFORM] bgwriter, checkpoints, curious (seeing delays)

2010-02-25 Thread Devrim GÜNDÜZ
On Thu, 2010-02-25 at 23:01 -0800, Tory M Blue wrote:

> Checkpoint_timeout is the default and that looks like 5 mins (300
> seconds). And is obviously why I have such a discrepancy between time
> reached and requested. 

If you have a high load, you may want to start with 15 minutes and bump it
to 30 minutes if needed. You may also want to decrease the
checkpoint_segments value based on your findings, since increasing only one
of the two won't help you much.

As I wrote before, pg_stat_bgwriter is your friend here.
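
For reference, a minimal query against that view (these are the 8.3 column
names; the reading of the numbers below is only a rough rule of thumb):

  SELECT checkpoints_timed,   -- checkpoints triggered by checkpoint_timeout
         checkpoints_req,     -- checkpoints forced by filling checkpoint_segments
         buffers_checkpoint,  -- buffers written during checkpoints
         buffers_clean,       -- buffers written by the background writer
         buffers_backend      -- buffers the backends had to write themselves
    FROM pg_stat_bgwriter;

If checkpoints_req dominates, the segments are too low for the timeout; if
buffers_backend is large compared to buffers_clean, the bgwriter is not
keeping up.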
-- 
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [PERFORM] bgwriter, checkpoints, curious (seeing delays)

2010-02-25 Thread Tory M Blue
2010/2/25 Tory M Blue :
> 2010/2/25 Devrim GÜNDÜZ :
>> On Thu, 2010-02-25 at 22:12 -0800, Tory M Blue wrote:
>>> shared_buffers = 1500MB
>>
>> Some people tend to increase this to 2.2GB(32-bit) or 4-6 GB (64 bit),
>> if needed. Please note that more shared_buffers will lead to more
>> pressure on bgwriter, but it also has lots of benefits, too.
>>
>>> work_mem = 100MB
>>
>> This is too much. Since you have 300 connections, you will probably swap
>> because of this setting, since each connection may use this much
>> work_mem. The rule of thumb is to set this to a lower general value
>> (say, 1-2 MB), and set it per-query when needed.
>>
>>> checkpoint_segments = 100
>>> checkpoint_warning = 3600s
>>
>> What about checkpoint_timeout? Please note that even if
>> checkpoint_segments = 100, if timeout value is low (say 5 mins),
>> PostgreSQL will probably checkpoint every checkpoint_timeout minutes
>> (unless PostgreSQL creates $checkpoint_segments xlogs before
>> checkpoint_timeout value). Depending on your workload, this may not be
>> intended, and it may cause spikes -- which will lead to the issues you
>> are complaining about.
>>
>> I'll stop here, and suggest you read this:
>>
>> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
>>
>> for details about this subject. As noted there, if you are running 8.3+,
>> pg_stat_bgwriter will help you to tune checkpoint & bgwriter settings.
>>
>> -HTH.
>
> Checkpoint_timeout is the default and that looks like 5 mins (300
> seconds). And is obviously why I have such a discrepancy between time
> reached and requested.
>
> Thank you sir, that's actually the page that I've spent much of my
> time on this eve :)  I'll continue to read and check my configuration
> settings.
>
> Tory
Also since I set the log on today I have some log information
regarding the checkpoints
2010-02-25 22:08:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:10:41 PSTLOG:  checkpoint complete: wrote 44503
buffers (23.2%); 0 transaction log file(s) added, 0 removed, 20
recycled; write=148.539 s, sync=0.000 s, total=148.540 s
2010-02-25 22:13:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:15:37 PSTLOG:  checkpoint complete: wrote 38091
buffers (19.8%); 0 transaction log file(s) added, 0 removed, 20
recycled; write=144.713 s, sync=0.000 s, total=144.714 s
2010-02-25 22:18:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:20:42 PSTLOG:  checkpoint complete: wrote 38613
buffers (20.1%); 0 transaction log file(s) added, 0 removed, 19
recycled; write=149.870 s, sync=0.000 s, total=149.871 s
2010-02-25 22:23:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:25:42 PSTLOG:  checkpoint complete: wrote 39009
buffers (20.3%); 0 transaction log file(s) added, 0 removed, 19
recycled; write=149.876 s, sync=0.000 s, total=149.877 s
2010-02-25 22:28:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:30:43 PSTLOG:  checkpoint complete: wrote 30847
buffers (16.1%); 0 transaction log file(s) added, 0 removed, 19
recycled; write=150.000 s, sync=0.000 s, total=150.001 s
2010-02-25 22:33:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:35:43 PSTLOG:  checkpoint complete: wrote 11917
buffers (6.2%); 0 transaction log file(s) added, 0 removed, 14
recycled; write=150.064 s, sync=0.000 s, total=150.065 s
2010-02-25 22:38:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:40:43 PSTLOG:  checkpoint complete: wrote 10869
buffers (5.7%); 0 transaction log file(s) added, 0 removed, 5
recycled; write=149.995 s, sync=0.000 s, total=149.996 s
2010-02-25 22:43:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:45:41 PSTLOG:  checkpoint complete: wrote 31424
buffers (16.4%); 0 transaction log file(s) added, 0 removed, 4
recycled; write=148.597 s, sync=0.000 s, total=148.598 s
2010-02-25 22:48:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:50:42 PSTLOG:  checkpoint complete: wrote 33895
buffers (17.7%); 0 transaction log file(s) added, 0 removed, 17
recycled; write=149.825 s, sync=0.000 s, total=149.826 s
2010-02-25 22:53:13 PSTLOG:  checkpoint starting: time
2010-02-25 22:53:17 PST postgres postgres [local] LOG:  unexpected EOF
on client connection
2010-02-25 22:55:43 PSTLOG:  checkpoint complete: wrote 34155
buffers (17.8%); 0 transaction log file(s) added, 0 removed, 15
recycled; write=150.045 s, sync=0.000 s, total=150.046 s
2010-02-25 22:58:13 PSTLOG:  checkpoint starting: time
2010-02-25 23:00:41 PSTLOG:  checkpoint complete: wrote 33873
buffers (17.6%); 0 transaction log file(s) added, 0 removed, 15
recycled; write=148.223 s, sync=0.000 s, total=148.224 s
2010-02-25 23:03:13 PSTLOG:  checkpoint starting: time



Re: [PERFORM] bgwriter, checkpoints, curious (seeing delays)

2010-02-25 Thread Tory M Blue
2010/2/25 Devrim GÜNDÜZ :
> On Thu, 2010-02-25 at 22:12 -0800, Tory M Blue wrote:
>> shared_buffers = 1500MB
>
> Some people tend to increase this to 2.2GB(32-bit) or 4-6 GB (64 bit),
> if needed. Please note that more shared_buffers will lead to more
> pressure on bgwriter, but it also has lots of benefits, too.
>
>> work_mem = 100MB
>
> This is too much. Since you have 300 connections, you will probably swap
> because of this setting, since each connection may use this much
> work_mem. The rule of thumb is to set this to a lower general value
> (say, 1-2 MB), and set it per-query when needed.
>
>> checkpoint_segments = 100
>> checkpoint_warning = 3600s
>
> What about checkpoint_timeout? Please note that even if
> checkpoint_segments = 100, if timeout value is low (say 5 mins),
> PostgreSQL will probably checkpoint every checkpoint_timeout minutes
> (unless PostgreSQL creates $checkpoint_segments xlogs before
> checkpoint_timeout value). Depending on your workload, this may not be
> intended, and it may cause spikes -- which will lead to the issues you
> are complaining about.
>
> I'll stop here, and suggest you read this:
>
> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
>
> for details about this subject. As noted there, if you are running 8.3+,
> pg_stat_bgwriter will help you to tune checkpoint & bgwriter settings.
>
> -HTH.

Checkpoint_timeout is the default and that looks like 5 mins (300
seconds). And is obviously why I have such a discrepancy between time
reached and requested.

Thank you sir, that's actually the page that I've spent much of my
time on this eve :)  I'll continue to read and check my configuration
settings.

Tory



Re: [PERFORM] bgwriter, checkpoints, curious (seeing delays)

2010-02-25 Thread Devrim GÜNDÜZ
On Thu, 2010-02-25 at 22:12 -0800, Tory M Blue wrote:
> shared_buffers = 1500MB 

Some people tend to increase this to 2.2GB(32-bit) or 4-6 GB (64 bit),
if needed. Please note that more shared_buffers will lead to more
pressure on bgwriter, but it also has lots of benefits, too.

> work_mem = 100MB

This is too much. Since you have 300 connections, you will probably swap
because of this setting, since each connection may use this much
work_mem. The rule of thumb is to set this to a lower general value
(say, 1-2 MB), and set it per-query when needed.
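
A minimal sketch of that per-query override (the query itself is just a
placeholder):

  BEGIN;
  SET LOCAL work_mem = '100MB';  -- applies only until COMMIT/ROLLBACK
  SELECT 1;                      -- stand-in for the one big sort/hash query
  COMMIT;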

> checkpoint_segments = 100
> checkpoint_warning = 3600s

What about checkpoint_timeout? Please note that even if
checkpoint_segments = 100, if timeout value is low (say 5 mins),
PostgreSQL will probably checkpoint every checkpoint_timeout minutes
(unless PostgreSQL creates $checkpoint_segments xlogs before
checkpoint_timeout value). Depending on your workload, this may not be
intended, and it may cause spikes -- which will lead to the issues you
are complaining about.
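
To put rough numbers on that (assuming the default 16 MB segment size):
checkpoint_segments = 100 only forces a checkpoint after roughly
100 * 16 MB = 1.6 GB of WAL, so unless that much WAL is generated within
checkpoint_timeout, every checkpoint will be time-driven.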

I'll stop here, and suggest you read this:

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

for details about this subject. As noted there, if you are running 8.3+,
pg_stat_bgwriter will help you to tune checkpoint & bgwriter settings.

-HTH.
 
-- 
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [PERFORM] bgwriter, checkpoints, curious (seeing delays)

2010-02-25 Thread Jochen Erwied
Friday, February 26, 2010, 7:20:38 AM you wrote:

> "checkpoint_completion_target (floating point)

> interesting that it's a .5 second default setting and I'm seeing
> exactly that .5 second delay.

It's not an exact time, but a multiplier to 'checkpoint_timeout'. So a
setting of .5 with a timeout of 300 seconds means a checkpoint should be
completed after 300*0.5 = 150 seconds.
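
A quick way to sanity-check both values on a running server (pg_settings is
available on 8.3):

  SELECT name, setting, unit
    FROM pg_settings
   WHERE name IN ('checkpoint_timeout', 'checkpoint_completion_target');

With the defaults of 300 seconds and 0.5, that also matches the write times
of roughly 145-150 s reported in the 'checkpoint complete' lines earlier in
the thread.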


-- 
Jochen Erwied |   home: joc...@erwied.eu +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 |   work: j...@mbs-software.de  +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erw...@vodafone.de   +49-173-5404164




Re: [PERFORM] bgwriter, checkpoints, curious (seeing delays)

2010-02-25 Thread Tory M Blue
On Thu, Feb 25, 2010 at 10:12 PM, Tory M Blue  wrote:
> Okay ladies and gents and the rest of you :)
>
> It's time I dig into another issue, and that's a curious 5 second
> delay on connect, on occasion. Now, I believe the network to be sound
> and there are zero errors on any servers, no retrans, no runts, nada
> nada nada. However, I will continue to run tests with/without dns,
> tcpdumps, tracking communications, handshakes, and so on.
>
> I've been doing some reading, and here is what I found: "Checkpoints
> are very disrupting to your database performance and can cause
> connections to stall for up to a few seconds while they occur."

Quick added note, sorry.

"checkpoint_completion_target (floating point)

Specifies the target length of checkpoints, as a fraction of the
checkpoint interval. The default is 0.5. This parameter can only be
set in the postgresql.conf file or on the server command line. "

interesting that it's a .5 second default setting and I'm seeing
exactly that .5 second delay.

Again, I could be reaching here!

Thanks for putting up with me

Tory



[PERFORM] bgwriter, checkpoints, curious (seeing delays)

2010-02-25 Thread Tory M Blue
Okay ladies and gents and the rest of you :)

It's time I dig into another issue, and that's a curious 5 second
delay on connect, on occasion. Now, I believe the network to be sound
and there are zero errors on any servers, no retrans, no runts, nada
nada nada. However, I will continue to run tests with/without dns,
tcpdumps, tracking communications, handshakes, and so on.

I've been doing some reading, and here is what I found: "Checkpoints
are very disrupting to your database performance and can cause
connections to stall for up to a few seconds while they occur."

Now I'm reaching here, but I wondered.

My issue is what appears to be a random 5 second connection delay.

Connection times over a 24-hour period from my application servers,
which connect to my query servers (postgres 8.3.4, slony 1.2.15), big
8-way boxes with 16-32 GB RAM:

Conn time in seconds and count (50 is kind of a round, even number, eh!):
  '1.0-2.0' => 4,
  '5.0-' => 50,
  '0.5-1.0' => 6,
  '0.0-0.5' => 155,632

So the 5-second delay makes us instantly think network, but I want to hit
this from multiple angles, so I figured I would reach out to the brain
trust here for some ideas.

Here is what may be interesting and may point me in the direction of
further tuning and adjusting.

postgres=# select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
             34820 |             207 |              06118 |     214634102 |          1274873 |           75850 |    3370607455
(1 row)

Now I'll be honest: I have nothing special in my configs for the
bgwriter; in fact, other than enabling log_checkpoints today, I have
nothing but defaults for bgwriter_*. But the fact that checkpoints_timed is
so high seems to maybe point to some required tuning?
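
For reference, a quick way to list the current values (just a sketch):

  SELECT name, setting, unit
    FROM pg_settings
   WHERE name LIKE 'bgwriter%' OR name LIKE 'checkpoint%';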

Things of interest.

Memory 24gb
8 way processor
postgres 8.3.4

shared_buffers = 1500MB  # min 128kB or max_connections*16kB
max_prepared_transactions = 0   # can be 0 or more
work_mem = 100MB# min 64kB
maintenance_work_mem = 128MB# min 1MB
max_fsm_pages = 50  # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 225 # min 100, ~70 bytes each
fsync = off # turns forced synchronization on or off
checkpoint_segments = 100   # in logfile segments, min 1, 16MB each
checkpoint_warning = 3600s  # 0 is off

So nothing exciting in my configs (300 connections).

So my guess is: "IF" -- a really big if -- the delay is actually due to
checkpoints (perfect lil storm, timing), then there should be ways for
me to tune this, ya?

Thanks "again" for your assistance

Tory



Re: [PERFORM] No hash join across partitioned tables?

2010-02-25 Thread Tom Lane
Bruce Momjian  writes:
> Did this get addressed?

Partially.  There are stats now but autovacuum is not bright about
when to update them.

regards, tom lane



Re: [PERFORM] No hash join across partitioned tables?

2010-02-25 Thread Bruce Momjian

Did this get addressed?

---

Tom Lane wrote:
> Kris Jurka  writes:
> > The real problem is getting reasonable stats to pass through the partition 
> > Append step, so it can make a reasonable estimate of the join output size.
> 
> I dug around a bit and concluded that the lack of stats for the Append
> relation is indeed the main problem.  It's not so much the bad join size
> estimate (although that could hurt for cases where you need to join this
> result to another table).  Rather, it's that the planner is deliberately
> biased against picking hash joins in the absence of stats for the inner
> relation.  Per the comments for estimate_hash_bucketsize:
> 
>  * If no statistics are available, use a default estimate of 0.1.  This will
>  * discourage use of a hash rather strongly if the inner relation is large,
>  * which is what we want.  We do not want to hash unless we know that the
>  * inner rel is well-dispersed (or the alternatives seem much worse).
> 
> While we could back off the default a bit here, I think it'd be better
> to fix it by not punting on the stats-for-append-relations problem.
> That doesn't seem like material for 8.4 at this point, though.
> 
>   regards, tom lane
> 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] GiST index performance

2010-02-25 Thread Bruce Momjian

Was this corrected?  I don't see any commits to seg.c.

---

Matthew Wakeling wrote:
> On Thu, 7 May 2009, Oleg Bartunov wrote:
> > Did you try Guttman quadratic split algorithm ? We also found linear
> > split algorithm for Rtree.
> 
> The existing (bugfixed) seg split algorithm is the Guttman quadratic split 
> algorithm. Guttman did all his work on two-dimensional and above data, 
> dismissing one-dimensional data as being handled adequately by B-trees, 
> which is not true for segment overlaps. It turns out that the algorithm 
> has a weakness with certain types of data, and one-dimensional data is 
> almost certain to exercise that weakness. The greater the number of 
> dimensions, the less the weakness is exercised.
> 
> The problem is that the algorithm does not calculate a split pivot. 
> Instead it finds two suitable entries, and adds the remaining entries to 
> those two in turn. This can lead to the majority of the entries being 
> added to just one side. In fact, I saw lots of cases where 367 entries 
> were being split into two pages of 366 and one entry.
> 
> Guttman's linear split algorithm has the same weakness.
> 
> >> One thing I am seeing is a really big difference in performance between 
> >> Postgres/GiST and a Java implementation I have written, using the same 
> >> algorithms. Postgres takes three minutes to perform a set of index lookups 
> >> while java takes six seconds. The old version of bioseg took an hour. I 
> >> can't see anything in the GiST support code that could account for this.
> >
> > is the number of index lookups different, or just index lookup time is very
> > big ?
> 
> Same number of index lookups. Same algorithms. I have a set of 681879 
> segments, and I load them all into the index. I then query the index for 
> overlaps for each one in turn. For some reason, GiST lookups seem to be 
> slow, even if they are using a good algorithm. I have seen that problem 
> with btree_gist on integers too. I can't see any reason for this is the 
> GiST code - it all seems pretty tight to me. We probably need to do some 
> profiling.
> 
> Matthew
> 
> -- 
>  I suppose some of you have done a Continuous Maths course. Yes? Continuous
>  Maths?  Whoah, it was like that, was it!
> -- Computer Science Lecturer
> 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] GiST index performance

2010-02-25 Thread Bruce Momjian

Was there ever any conclusion on this issue?

---

Matthew Wakeling wrote:
> 
> Revisiting the thread a month back or so, I'm still investigating 
> performance problems with GiST indexes in Postgres.
> 
> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd 
> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to 
> pathological behaviour which is fixed by my second patch, which can be 
> viewed as complete. Contrib/cube, being multi-dimensional, is not affected 
> to any significant degree, so should not need alteration.
> 
> A second quite distinct issue is the general performance of GiST indexes 
> which is also mentioned in the old thread linked from Open Items. For 
> that, we have a test case at 
> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for 
> btree_gist indexes. I have a similar example with the bioseg GiST index. I 
> have completely reimplemented the same algorithms in Java for algorithm
> investigation and instrumentation purposes, and it runs about a hundred 
> times faster than in Postgres. I think this is a problem, and I'm willing 
> to do some investigation to try and solve it.
> 
> Do you have a recommendation for how to go about profiling Postgres, what 
> profiler to use, etc? I'm running on Debian Linux x86_64.
> 
> Matthew
> 
> -- 
>  Jadzia: Don't forget the 34th rule of acquisition: Peace is good for 
> business.
>  Quark:  That's the 35th.
>  Jadzia: Oh yes, that's right. What's the 34th again?
>  Quark:  War is good for business. It's easy to get them mixed up.
> 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] dbt2 performance

2010-02-25 Thread Yu-Ju Hong
Thanks for the reply.

On Thu, Feb 25, 2010 at 5:48 PM, Greg Smith  wrote:

> Yu-Ju Hong wrote:
>
>> 2. Moreover, the disk utilization was high and the "await" time from
>> iostat is around 500 ms. Could disk I/O limit the overall throughput? The
>> server has 2 SATA disks, one for system and postgresql and the other is
>> dedicated to logging (pg_xlog). As far as I understand, modern database
>> systems should be CPU-bound rather than I/O-bound, is it because I did not
>> perform adequate performance tuning?
>>
>
> dbt2 is almost exclusively disk I/O bound once the data set gets big
> enough.  There are some applications where most of the data fits in RAM and
> therefore CPU performance is the limiter.  dbt2 is exactly the opposite of
> such an application though, and the idea that "modern database systems
> should be CPU bound" is not really true at all.  That's only the case if the
> data you're operating on fits in RAM.  Otherwise, databases are just as I/O
> bound as they've always been.  Main thing that's changed is there's a lot
> more RAM in systems nowadays.
>

In my test, there were almost no disk reads (mostly disk writes), so I
assumed the size of the database didn't cause the performance bottleneck.
Maybe I was wrong. If so, should I increase shared_buffers?

Assuming that dbt2 was limited by disk I/O in my experiments, do you think
the numbers I got with my server configuration are reasonable?

Also, would you mind giving some examples where the applications are CPU
bound? That could be useful information to me.

>
> By the way:  a large increase in checkpoint_segments is the first thing you
> should do.  If you check the database logs, they're probably filled with
> complaints about it being too low.  32 would be a useful starting value,
> going much higher for a test that's only 10 minutes long is probably
> cheating.
>
>
I increased the checkpoint_segments to 10 when I ran the tests. I'll
certainly increase it to 32 and give it a try.



>  --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us 
>
>
Thanks,
Yu-Ju


Re: [PERFORM] dbt2 performance

2010-02-25 Thread Greg Smith

Yu-Ju Hong wrote:
2. Moreover, the disk utilization was high and the "await" time from 
iostat is around 500 ms. Could disk I/O limit the overall throughput? 
The server has 2 SATA disks, one for system and postgresql and the 
other is dedicated to logging (pg_xlog). As far as I understand, 
modern database systems should be CPU-bound rather than I/O-bound, is 
it because I did not perform adequate performance tuning?


dbt2 is almost exclusively disk I/O bound once the data set gets big 
enough.  There are some applications where most of the data fits in RAM 
and therefore CPU performance is the limiter.  dbt2 is exactly the 
opposite of such an application though, and the idea that "modern 
database systems should be CPU bound" is not really true at all.  That's 
only the case if the data you're operating on fits in RAM.  Otherwise, 
databases are just as I/O bound as they've always been.  Main thing 
that's changed is there's a lot more RAM in systems nowadays.


By the way:  a large increase in checkpoint_segments is the first thing 
you should do.  If you check the database logs, they're probably filled 
with complaints about it being too low.  32 would be a useful starting 
value, going much higher for a test that's only 10 minutes long is 
probably cheating.
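
In postgresql.conf terms that is simply the following (the 8.4 default is 3,
and the change only needs a reload, not a restart):

  checkpoint_segments = 32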


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




[PERFORM] dbt2 performance

2010-02-25 Thread Yu-Ju Hong
Hi,

I have a couple of questions about dbt2 performance.

1. I tested dbt2+postgresql 8.4.2 on my server, but the NOTPM is around only
320~390 with 10 connections and 30 warehouses. Increasing the number of
connections did not improve the throughput. The NOTPM number does not seem
very high to me. Should I try more configurations to see if it can be
improved? Are there any numbers I can compare with (NOTPM and response
time)?

2. Moreover, the disk utilization was high and the "await" time from iostat
is around 500 ms. Could disk I/O limit the overall throughput? The server
has 2 SATA disks, one for system and postgresql and the other is dedicated
to logging (pg_xlog). As far as I understand, modern database systems should
be CPU-bound rather than I/O-bound; is it because I did not perform adequate
performance tuning?

3. From "vmstat", the cpus spent around 72% of time idle, 25% waiting for
I/O, and only 2~3% left doing real work. I was surprised that the cpu
utilization was so low. Is that normal or could it be due to
misconfiguration? In my opinion, even if disk I/O may have been stressed,
70% of idle time was still too high.


Below are some specs/configurations that I used. Any suggestion is welcome.
Thanks!

server spec:
4 cores (2*Dual-Core AMD Opteron, 800MHz), 12GB ram
2 SATA disks, one for system and postgresql and the other is dedicated to
logging (pg_xlog)

postgres configuration:
30 warehouses
256MB shared_buffer
768MB effective_cache_size
checkpoint_timeout 1hr (all my tests are within a 10-minute interval, so
checkpointing should not interfere with performance)
I turned off fsync to see whether the performance could be improved.

Yu-Ju


Re: [PERFORM] Extracting superlatives - SQL design philosophy

2010-02-25 Thread Tom Lane
"Julien Theulier"  writes:
> Teradata provides a “qualify” syntax which works as a filtering condition on
> a windowed function result. This is the only DB allowing this direct
> filtering on windowed functions, from what I know.

Seems like you could easily translate that into SQL-standard syntax by
adding a level of sub-select:

select ... from (select *, window_function wf from ...) ss
where wf=1;

regards, tom lane



Re: [PERFORM] Extracting superlatives - SQL design philosophy

2010-02-25 Thread Julien Theulier
Hi all,

 

Just for your information, and this is not related to PG directly:

Teradata provides a “qualify” syntax which works as a filtering condition on
a windowed function result. This is the only DB allowing this direct
filtering on windowed functions, from what I know.

So, as an example, the query you ask for becomes very easy on this database:

select city, temp, date
from bar
qualify row_number() over (partition by city order by temp desc) = 1

 

This is very practical indeed (you can also mix it with the classical
where/having/group by syntax).

On postgres, you may get the same result using an inner query (sorry, I
can’t test it for now) such as:

select city, temp, date
from (select city, temp, date,
             row_number() over (partition by city order by temp desc) as nr
      from bar) a1
where nr = 1

 

Julien Theulier

 

From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On behalf of Mose
Sent: Wednesday, February 24, 2010 22:50
To: Dave Crooke
Cc: pgsql-performance
Subject: Re: [PERFORM] Extracting superlatives - SQL design philosophy

 

Can you try using window functions?

 

Something like this:

 

select distinct
  city,
  first_value(temp) over w as max_temp,
  first_value(date) over w as max_temp_date
from cities
window w as (partition by city order by temp desc)

 

 
http://www.postgresql.org/docs/current/static/tutorial-window.html

 

- Mose

 

On Wed, Feb 24, 2010 at 1:31 PM, Dave Crooke <dcro...@gmail.com> wrote:

This is a generic SQL issue and not PG specific, but I'd like to get
an opinion from this list.

Consider the following data:

# \d bar
Table "public.bar"
 Column |Type | Modifiers
+-+---
 city   | character varying(255)  |
 temp   | integer |
 date   | timestamp without time zone |

# select * from bar order by city, date;
  city| temp |date
---+--+-
 Austin|   75 | 2010-02-21 15:00:00
 Austin|   35 | 2010-02-23 15:00:00
 Edinburgh |   42 | 2010-02-23 15:00:00
 New York  |   56 | 2010-02-23 15:00:00
 New York  |   78 | 2010-06-23 15:00:00
(5 rows)

If you want the highest recorded temperature for a city, that's easy
to do, since the selection criteria works on the same column that we
are extracing:

# select city, max(temp) from bar group by city order by 1;
  city| max
---+-
 Austin|  75
 Edinburgh |  42
 New York  |  78
(3 rows)


However there is (AFAIK) no simple way in plain SQL to write a query
that performs such an aggregation where the aggregation criteria is on
one column and you want to return another, e.g. adding the *date
of* that highest temperature to the output above, or doing a query to
get the most recent temperature reading for each city.

What I'd like to do is something like the below (and I'm inventing
mock syntax here, the following is not valid SQL):

-- Ugly implicit syntax but no worse than an Oracle outer join ;-)
select city, temp, date from bar where date=max(date) group by city,
temp order by city;

or perhaps

-- More explicit
select aggregate_using(max(date), city, temp, date) from bar group by
city, temp order by city;

Both of the above, if they existed, would be a single data access
followed by a sort-merge.

The only way I know how to do it involves doing two accesses to the data,
e.g.

# select city, temp, date from bar a where date=(select max(b.date)
from bar b where a.city=b.city) order by 1;
  city| temp |date
---+--+-
 Austin|   35 | 2010-02-23 15:00:00
 Edinburgh |   42 | 2010-02-23 15:00:00
 New York  |   78 | 2010-06-23 15:00:00
(3 rows)


# explain select * from bar a where date=(select max(b.date) from bar
b where a.city=b.city) order by 1;
   QUERY PLAN
--
 Sort  (cost=1658.86..1658.87 rows=1 width=528)
  Sort Key: a.city
  ->  Seq Scan on bar a  (cost=0.00..1658.85 rows=1 width=528)
Filter: (date = (subplan))
SubPlan
  ->  Aggregate  (cost=11.76..11.77 rows=1 width=8)
                ->  Seq Scan on bar b  (cost=0.00..11.75 rows=1 width=8)  -- would be an index lookup in a real scenario
  Filter: (($0)::text = (city)::text)
(8 rows)


 



Re: [PERFORM] Extracting superlatives - SQL design philosophy

2010-02-25 Thread Pierre C



-- More explicit
select aggregate_using(max(date), city, temp, date) from bar group by
city, temp order by city;


select city, max(ROW(temp, date)) from bar group by city;

Does not work (alas) for lack of a default comparison for record type.

Another solution, which works wonders if you've got the list of cities in  
a separate table, and an index on (city, temp) is this :


SELECT c.city, (SELECT ROW( t.date, t.temp ) FROM cities_temp t WHERE  
t.city=c.city ORDER BY temp DESC LIMIT 1) FROM cities;


This will do a nested loop index scan and it is the fastest way, except if  
you have very few rows per city.
The syntax is ugly and you have to extract the stuff from the ROW()  
afterwards, though.


Unfortunately, this does not work :

SELECT c.city, (SELECT t.date, t.temp FROM cities_temp t WHERE  
t.city=c.city ORDER BY temp DESC LIMIT 1) AS m FROM cities;


because the subselect isn't allowed to return more than 1 column.

Note that you can also get the usually annoying top-N by category to use  
the index by doing something like :


SELECT c.city, (SELECT array_agg(date) FROM (SELECT t.date FROM
cities_temp t WHERE t.city=c.city ORDER BY temp DESC LIMIT 5) s) AS m FROM
cities;


The results aren't in a very usable form either, but :

CREATE INDEX ti ON annonces( type_id, price ) WHERE price IS NOT NULL;

EXPLAIN ANALYZE SELECT
t.id, (SELECT ROW(a.id, a.price, a.date_annonce)
FROM annonces a
WHERE a.type_id = t.id AND price IS NOT NULL
ORDER BY price DESC LIMIT 1)
FROM types_bien t;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on types_bien t  (cost=0.00..196.09 rows=57 width=4) (actual time=0.025..0.511 rows=57 loops=1)
   SubPlan 1
     ->  Limit  (cost=0.00..3.41 rows=1 width=16) (actual time=0.008..0.008 rows=1 loops=57)
           ->  Index Scan Backward using ti on annonces a  (cost=0.00..8845.65 rows=2592 width=16) (actual time=0.007..0.007 rows=1 loops=57)
                 Index Cond: (type_id = $0)
 Total runtime: 0.551 ms

explain analyze
select distinct type_id, first_value(price) over w as max_price
from annonces where price is not null
window w as (partition by type_id order by price desc);
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=30515.41..30626.87 rows=11146 width=10) (actual time=320.927..320.971 rows=46 loops=1)
   ->  WindowAgg  (cost=27729.14..29958.16 rows=111451 width=10) (actual time=195.289..282.150 rows=111289 loops=1)
         ->  Sort  (cost=27729.14..28007.76 rows=111451 width=10) (actual time=195.278..210.762 rows=111289 loops=1)
               Sort Key: type_id, price
               Sort Method:  quicksort  Memory: 8289kB
               ->  Seq Scan on annonces  (cost=0.00..18386.17 rows=111451 width=10) (actual time=0.009..72.589 rows=111289 loops=1)
                     Filter: (price IS NOT NULL)
 Total runtime: 322.382 ms

Here using the index is 600x faster... worth a bit of ugly SQL, you decide.

By disabling enable_seqscan and enable_bitmapscan, you can coerce this plan:

EXPLAIN ANALYZE SELECT DISTINCT ON (type_id) type_id, date_annonce, price  
FROM annonces WHERE price IS NOT NULL ORDER BY type_id, price LIMIT 40;

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..78757.61 rows=33 width=14) (actual time=0.021..145.509 rows=40 loops=1)
   ->  Unique  (cost=0.00..78757.61 rows=33 width=14) (actual time=0.021..145.498 rows=40 loops=1)
         ->  Index Scan using ti on annonces  (cost=0.00..78478.99 rows=111451 width=14) (actual time=0.018..132.671 rows=110796 loops=1)
 Total runtime: 145.549 ms

This plan would be very bad (unless the whole table is in RAM) because I  
guess the index scan isn't aware of the DISTINCT ON, so it scans all rows  
in the index and in the table.











Re: [PERFORM] Extracting superlatives - SQL design philosophy

2010-02-25 Thread Richard Huxton

On 24/02/10 23:37, Dave Crooke wrote:

1. The city temps table is a toy example, not meant to be realistic :-)


You knew that and I guessed it, but it's worth stating these things for 
people who read the archives a year from now.



2. Yes, my (Java) algorithm is deterministic ... it will return
exactly one row per city, and that will be the row (or strictly, *a*
row) containing the highest temp. Temp value ties will break in favour
of earlier rows in Guinness Book of Records tradition :-) It's
equivalent to a HashAggregate implementation.


But not when you add in other columns (which is what you're trying to do).


The following two query plans (from my real schema) illustrate the
itch I am trying to scratch: I want the functionality of the 2nd
one, but with the execution plan structure of the first:

# explain analyse select a, max(b) from perf_raw_2010_02_23 group by a;
                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=117953.09..117961.07 rows=639 width=8) (actual time=10861.845..10863.008 rows=1023 loops=1)
   ->  Seq Scan on perf_raw_2010_02_23  (cost=0.00..91572.39 rows=5276139 width=8) (actual time=0.038..4459.222 rows=5276139 loops=1)
 Total runtime: 10863.856 ms
(3 rows)

Time: 10864.817 ms
# explain analyse select distinct on (a) * from perf_raw_2010_02_23
order by a, b desc ;


One big bit of the cost difference is going to be the ordering you need 
to get a repeatable result.



                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1059395.04..1085775.73 rows=639 width=28) (actual time=46011.204..58428.210 rows=1023 loops=1)
   ->  Sort  (cost=1059395.04..1072585.39 rows=5276139 width=28) (actual time=46011.200..53561.112 rows=5276139 loops=1)
         Sort Key: a, b
         Sort Method:  external merge  Disk: 247584kB   -- actually OS RAM buffers


Even if the sort never actually reaches a physical disk, you should
still see an improvement by increasing work_mem for the duration of that
one query. It's not going to be the magnitude you want, but probably worth
doing.
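
Something along these lines, sized to keep the ~248 MB on-disk sort in
memory (the exact value needed is a guess; check the plan afterwards for
"Sort Method: quicksort"):

  SET work_mem = '512MB';
  EXPLAIN ANALYSE SELECT DISTINCT ON (a) * FROM perf_raw_2010_02_23
    ORDER BY a, b DESC;
  RESET work_mem;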



         ->  Seq Scan on perf_raw_2010_02_23  (cost=0.00..91572.39 rows=5276139 width=28) (actual time=0.047..6491.036 rows=5276139 loops=1)
 Total runtime: 58516.185 ms
(6 rows)

Time: 58517.233 ms

The only difference between these two is that the second query returns
the whole row. The *ratio* in cost between these two plans increases
in proportion to log(n) of the table size ... at 5.5m rows it's
livable, at 500m it's probably not :-!


If performance on this query is vital to you, and the table doesn't 
change after initial population (which I'm guessing is true) then try an 
index on (a asc, b desc) and CLUSTER that index. Depending on the ratio 
of distinct a:b values that could be what you're after.
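
A sketch of that (the index name is made up; CLUSTER ... USING and DESC
index columns need 8.3 or later):

  CREATE INDEX perf_raw_a_bdesc ON perf_raw_2010_02_23 (a ASC, b DESC);
  CLUSTER perf_raw_2010_02_23 USING perf_raw_a_bdesc;
  ANALYZE perf_raw_2010_02_23;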


--
  Richard Huxton
  Archonet Ltd
