Re: [PERFORM] Hash Anti Join performance degradation

2011-05-23 Thread Craig Ringer
On 24/05/11 12:14, panam wrote:
> Hi,
> 
> In my DB, there is a large table containing messages and one that contains
> message boxes.
> Messages are assigned to boxes via a child-parent link m->b.
> In order to obtain the last message for a specific box, I use the following
> SQL:
> 
> SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id
> AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;
> 
> This worked quite well for a long time. But now, suddenly the performance of
> this query drastically degraded as new messages were added.
> If these new messages are removed again, everything is back to normal. If
> other messages are removed instead, the problem remains, so it does not seem
> to be a memory issue. I'm afraid I have difficulty understanding what is
> going wrong here.

The usual cause is that the statistics for estimated row counts cross a
threshold that makes the query planner think that a different kind of
plan will be faster.

If the query planner is using bad information about the performance of
the storage, then it will be making bad decisions about which approach
is faster. So the usual thing to do is to adjust seq_page_cost and
random_page_cost to more closely reflect the real performance of your
hardware, and to make sure that effective_cache_size matches the real
amount of memory your computer has free for disk cache use.
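
For example (values purely illustrative, not recommendations; measure your
hardware before committing anything to postgresql.conf):

  seq_page_cost = 1.0
  random_page_cost = 2.0        # below the default of 4 if reads are mostly cached
  effective_cache_size = 6GB    # roughly shared_buffers plus the OS disk cache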

Newer versions of PostgreSQL also bring query planner and statistics
improvements, so upgrading is worth considering too.
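
As an aside, if all you need is the newest message id per box, it may be
worth benchmarking your anti-join against a plain aggregate. An untested
sketch, based only on the schema as you describe it:

  -- newest message for one box ($1 = the box id):
  SELECT max(id) FROM message WHERE box_id = $1;

  -- or for all boxes at once:
  SELECT box_id, max(id) AS last_message_id
  FROM message
  GROUP BY box_id;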

BTW, it can be really helpful to paste your query plans into
http://explain.depesz.com/ , which provides an easier-to-read visual
analysis of the plan. This only works with query plans that haven't been
butchered by mail client word wrapping, so I can't do it for you, but if
you paste them there and post the links, that'd be really handy.

Also have a look at http://wiki.postgresql.org/wiki/SlowQueryQuestions

I found the plans you posted a bit hard to read. Not your fault; it's
stupid mail clients.  Maybe depesz.com needs to be taught to de-munge
the damage done to plans by common mail clients.

--
Craig Ringer



Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-23 Thread Terry Schmitt
As near as I can tell from your test configuration description, you have
JMeter --> J2EE --> Postgres.
Have you ruled out the J2EE server as the problem? The bottleneck may not
be in the database at all.
I would take a look at your app server's health and look for any potential
issues there before spending too much time on the database. Perhaps there
are memory issues or excessive garbage collection on the app server?
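
If it is a HotSpot (Sun) JVM, one cheap first check is to start the app
server with the standard GC logging flags (flag names vary by JVM vendor):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

and see whether long collection pauses line up with the drops in
requests/sec.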

Terry


On Tue, May 17, 2011 at 5:45 AM, Andrey Vorobiev <
andrey.o.vorob...@gmail.com> wrote:

> Hi, guys.
>
>
> I have following environment configuration
>
> - Postgres 8.4.7 with the following postgresql.conf settings modified:
>
>   listen_addresses = '*'
>
>   max_connections = 100
>
>
>   shared_buffers = 2048MB
>
>   max_prepared_transactions = 100
>
>   wal_buffers = 1024kB
>
>
>   checkpoint_segments = 64
>
>   checkpoint_completion_target = 0.8
>
>
>   log_checkpoints = on
>
>
> - Two databases. Let's call them db_1 and db_2
>
> - J2EE application server that performs inserts into the databases defined
> above (distributed transactions are used).
>
> - All constraints and indexes are on.
>
> - JMeter that acts as HTTP client and sends requests to server causing it
> to insert rows. (case of new users registration)
>
>
> After running the scenario described above (with 10 concurrent threads) I
> observed the following behavior:
>
>
> At first everything is fine and the J2EE server handles about 700
> requests/sec (about 2500 inserts into several tables per second). But after
> some amount of time I observe performance degradation. In general it looks
> like the following:
>
>
> Total number of requests passed; Requests per second;
>
> 382000;   768;
> 546000;   765;
> 580000;   723;
> 650000;   700;
> 671000;   656;
> 700000;   628;
>
>
> Checkpoint logging gives me the following:
>
> 2011-05-17 18:55:51 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 18:57:20 NOVST LOG:  checkpoint complete: wrote 62861 buffers (24.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=89.196 s, sync=0.029 s, total=89.242 s
> 2011-05-17 18:57:47 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 18:59:02 NOVST LOG:  checkpoint complete: wrote 83747 buffers (31.9%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=75.001 s, sync=0.043 s, total=75.061 s
> 2011-05-17 18:59:29 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:00:30 NOVST LOG:  checkpoint complete: wrote 97341 buffers (37.1%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=60.413 s, sync=0.050 s, total=60.479 s
> 2011-05-17 19:00:55 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:01:48 NOVST LOG:  checkpoint complete: wrote 110149 buffers (42.0%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=52.285 s, sync=0.072 s, total=52.379 s
> 2011-05-17 19:02:11 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:02:58 NOVST LOG:  checkpoint complete: wrote 120003 buffers (45.8%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=46.766 s, sync=0.082 s, total=46.864 s
> 2011-05-17 19:03:20 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:04:18 NOVST LOG:  checkpoint complete: wrote 122296 buffers (46.7%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=57.795 s, sync=0.054 s, total=57.867 s
> 2011-05-17 19:04:38 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:05:34 NOVST LOG:  checkpoint complete: wrote 128165 buffers (48.9%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=55.061 s, sync=0.087 s, total=55.188 s
> 2011-05-17 19:05:53 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:06:51 NOVST LOG:  checkpoint complete: wrote 138508 buffers (52.8%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=57.919 s, sync=0.106 s, total=58.068 s
> 2011-05-17 19:07:08 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:08:21 NOVST LOG:  checkpoint complete: wrote 132485 buffers (50.5%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=72.949 s, sync=0.081 s, total=73.047 s
> 2011-05-17 19:08:40 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:09:48 NOVST LOG:  checkpoint complete: wrote 139542 buffers (53.2%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=68.193 s, sync=0.107 s, total=68.319 s
> 2011-05-17 19:10:06 NOVST LOG:  checkpoint starting: xlog
> 2011-05-17 19:11:31 NOVST LOG:  checkpoint complete: wrote 137657 buffers (52.5%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=84.575 s, sync=0.047 s, total=84.640 s
>
> Also, iostat showed increasingly heavy I/O load.
>
>
> So my questions are:
>
> 1. How does database size affect insert performance?
>
> 2. Why does number of written buffers increase when database size grows?
>
> 3. How can I further analyze this problem?
>
> --
> Best regards.
>


[PERFORM] Hash Anti Join performance degradation

2011-05-23 Thread panam
Hi,

In my DB, there is a large table containing messages and one that contains
message boxes.
Messages are assigned to boxes via a child-parent link m->b.
In order to obtain the last message for a specific box, I use the following
SQL:

SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id
AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;

This worked quite well for a long time. But now, suddenly the performance of
this query drastically degraded as new messages were added.
If these new messages are removed again, everything is back to normal. If
other messages are removed instead, the problem remains, so it does not seem
to be a memory issue. I'm afraid I have difficulty understanding what is
going wrong here.

This is the query plan when everything is fine:

"Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696)
(actual time=0.169..7683.978 rows=128 loops=1)"
"  Output: this_.id, this_.login, (SubPlan 1)"
"  Buffers: shared hit=188413 read=94635 written=135, temp read=22530
written=22374"
"  SubPlan 1"
"->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8)
(actual time=59.571..59.877 rows=1 loops=128)"
"  Output: m1.id"
"  Hash Cond: (m1.box_id = m2.box_id)"
"  Join Filter: (m1.id < m2.id)"
"  Buffers: shared hit=188412 read=94633 written=135, temp
read=22530 written=22374"
"  ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28
rows=30640 width=16) (actual time=3.152..9.514 rows=17982 loops=128)"
"Output: m1.id, m1.box_id"
"Recheck Cond: (m1.box_id = $0)"
"Buffers: shared hit=131993 read=9550 written=23"
"->  Bitmap Index Scan on "message_box_Idx" 
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.840..2.840 rows=18193
loops=128)"
"  Index Cond: (m1.box_id = $0)"
"  Buffers: shared hit=314 read=6433 written=23"
"  ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual
time=26.840..26.840 rows=20014 loops=115)"
"Output: m2.box_id, m2.id"
"Buckets: 4096  Batches: 4 (originally 2)  Memory Usage:
5444kB"
"Buffers: shared hit=56419 read=85083 written=112, temp
written=7767"
"->  Bitmap Heap Scan on public.message m2 
(cost=577.97..40212.28 rows=30640 width=16) (actual time=2.419..20.007
rows=20014 loops=115)"
"  Output: m2.box_id, m2.id"
"  Recheck Cond: (m2.box_id = $0)"
"  Buffers: shared hit=56419 read=85083 written=112"
"  ->  Bitmap Index Scan on "message_box_Idx" 
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.166..2.166 rows=20249
loops=115)"
"Index Cond: (m2.box_id = $0)"
"Buffers: shared hit=6708"
"Total runtime: 7685.202 ms"

This is the plan when the query gets sluggish:

"Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696)
(actual time=0.262..179333.086 rows=128 loops=1)"
"  Output: this_.id, this_.login, (SubPlan 1)"
"  Buffers: shared hit=189065 read=93983 written=10, temp read=22668
written=22512"
"  SubPlan 1"
"->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8)
(actual time=1264.700..1400.886 rows=1 loops=128)"
"  Output: m1.id"
"  Hash Cond: (m1.box_id = m2.box_id)"
"  Join Filter: (m1.id < m2.id)"
"  Buffers: shared hit=189064 read=93981 written=10, temp read=22668
written=22512"
"  ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28
rows=30640 width=16) (actual time=3.109..9.850 rows=18060 loops=128)"
"Output: m1.id, m1.box_id"
"Recheck Cond: (m1.box_id = $0)"
"Buffers: shared hit=132095 read=9448"
"->  Bitmap Index Scan on "message_box_Idx" 
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.867..2.867 rows=18193
loops=128)"
"  Index Cond: (m1.box_id = $0)"
"  Buffers: shared hit=312 read=6435"
"  ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual
time=27.533..27.533 rows=20102 loops=115)"
"Output: m2.box_id, m2.id"
"Buckets: 4096  Batches: 4 (originally 2)  Memory Usage:
5522kB"
"Buffers: shared hit=56969 read=84533 written=10, temp
written=7811"
"->  Bitmap Heap Scan on public.message m2 
(cost=577.97..40212.28 rows=30640 width=16) (actual time=2.406..20.492
rows=20102 loops=115)"
"  Output: m2.box_id, m2.id"
"  Recheck Cond: (m2.box_id = $0)"
"  Buffers: shared hit=56969 read=84533 written=10"
"  ->  Bitmap Index Scan on "message_box_Idx" 
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.170..2.170 rows=20249
loops=115)"
"Index Cond: (m2.box_id = $0)"
"

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-23 Thread Robert Haas
On Mon, May 23, 2011 at 3:08 PM, Josh Berkus  wrote:
>
>> Well, all of that stuff sounds impractically expensive to me... but I
>> just work here.
>
> I'll point out that the simple version, which just checks for hot tables
> and indexes, would improve estimates greatly and be a LOT less
> complicated than these proposals.

I realize I'm sounding like a broken record here, but as far as I can
tell there is absolutely zero evidence that that would be better.  I'm
sure you're in good company thinking so, but the list of things that
could skew (or should I say, screw) the estimates is long and painful;
and if those estimates are wrong, you'll end up with something that is
both worse and less predictable than the status quo.  First, I haven't
seen a shred of hard evidence that the contents of the buffer cache or
OS cache are stable enough to be relied upon, and we've repeatedly
discussed workloads where that might not be true.  Has anyone done a
systematic study of this on a variety of real production systems?  If so,
the results haven't been posted here, at least not that I can recall.
Second, even if we were willing to accept that we could obtain
relatively stable and accurate measurements of this data, who is to
say that basing plans on it would actually result in an improvement in
plan quality?  That may seem obvious, but I don't think it is.  The
proposed method is a bit like trying to determine the altitude of a
hot air balloon by throwing the ballast over the side and timing how
long it takes to hit the ground.  Executing plans that are based on
the contents of the cache will change the contents of the cache, which
will in turn change the plans.  The idea that we can know, without any
experimentation, how that's going to shake out, seems to me to be an
exercise in unjustified optimism of the first order.

Sorry to sound grumpy and pessimistic, but I really think we're
letting our enthusiasm get way, way ahead of the evidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] postgres not use index, IN statement

2011-05-23 Thread Craig Ringer

On 24/05/2011 5:30 AM, Anibal David Acosta wrote:
> So, this is some bug?

Hard to know with the information provided. Please post EXPLAIN ANALYZE
output.
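
For example, using the first query from your message:

EXPLAIN ANALYZE
SELECT count(*) FROM wiz_application_response
WHERE application_item_id IN
    (SELECT id FROM wiz_application_item WHERE application_id = 110);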


http://wiki.postgresql.org/wiki/SlowQueryQuestions

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/



Re: [PERFORM] "error with invalid page header" while vacuuming pgbench data

2011-05-23 Thread Kevin Grittner
John Rouillard  wrote:
 
>  I seem to be able to provoke this error:
>
>    vacuum...ERROR:  invalid page header in block 2128910 of relation base/16385/21476
 
What version of PostgreSQL?
 
-Kevin



[PERFORM] "error with invalid page header" while vacuuming pgbench data

2011-05-23 Thread John Rouillard
Hi all:

Not sure if this is a performance question or a generic admin
question. I have the following script running on a host different from
the database to use pgbench to test the database:

  pgbench -i (initial mode)
  psql -c 'vacuum analyze' (and some other code to dump table sizes)
  pgbench (multiple connections, jobs etc.)

with a loop for setting different scales 

 I seem to be able to provoke this error:

   vacuum...ERROR:  invalid page header in block 2128910 of relation base/16385/21476

on a pgbench database created with a scale factor of 1000 relatively
reliably (2 for 2). I am not seeing any disk errors from the raid
controller or the operating system.

Running pg_dumpall to check for errors reports:

   pg_dump: Error message from server: ERROR:  invalid page header in block 401585 of relation base/16385/21476

which is different from the originally reported block.

Does anybody have any suggestions?

Configuration details.

OS: centos 5.5
Filesystem: data - ext4 (note 4 not 3); 6.6T formatted
wal  - ext4; 1.5T formatted
Raid: data - level 10, 8 disk wd2003; controller LSI MegaRAID SAS 9260-4i
  wal  - level 1,  2 disk wd2003; controller LSI MegaRAID SAS 9260-4i

Could it be an ext4 issue? It seems that ext4 may still be at the
bleeding edge for postgres use.

Thanks for any thoughts even if it's go to the admin list.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



[PERFORM] postgres not use index, IN statement

2011-05-23 Thread Anibal David Acosta
I have a strange situation.
I have a detail table with millions of rows and an items table with
thousands of rows.

When I do..

select count(*) from wiz_application_response where application_item_id in
(select id from wiz_application_item where application_id=110)

This query does NOT use the index on column application_item_id; instead it
does a sequential scan.

BUT, when I add the range of min and max ids of the subquery, Postgres
uses the INDEX.
This is the second query...

select count(*) from wiz_application_response where application_item_id
between 908 and 1030 and application_item_id in(select id from
wiz_application_item where application_id=110)

908 and 1030 are the lower and upper limits of the subquery, which
returns approximately 100 elements.

So, this is some bug?

Thanks!

Anibal





Re: [PERFORM] Pushing LIMIT into sub-queries of a UNION ALL?

2011-05-23 Thread Josh Berkus
On 5/23/11 8:54 AM, Dave Johansen wrote:
> I apologize for the multiple posts. I sent this email right after joining
> the list and after it hadn't shown up a day later I figured that it had been
> lost or something and sent the other one.

List moderation took a holiday while all of us were at pgCon.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PERFORM] Pushing LIMIT into sub-queries of a UNION ALL?

2011-05-23 Thread Robert Klemme
On Mon, May 23, 2011 at 5:54 PM, Dave Johansen  wrote:
> I apologize for the multiple posts. I sent this email right after joining
> the list and after it hadn't shown up a day later I figured that it had been
> lost or something and sent the other one.

Sorry for the nitpicking but I even see _three_ instances of this
posting (first on May 18th).

> Also, the database I posted this about does not have internet access and so
> I'm working on getting it moved over to a machine that does or getting it
> the info onto a machine where I can post the pertinent information about the
> schema and explain outputs.

Great!

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-23 Thread Tomas Vondra
Dne 23.5.2011 21:05, Robert Haas napsal(a):
> On Mon, May 23, 2011 at 2:46 PM, Tomas Vondra  wrote:
>> Really? He already has 64 checkpoint segments, which is about 1GB of
>> xlog data. The real problem is that the amount of buffers to write is
>> constantly growing. At the beginning there's 62861 buffers (500MB) and
>> at the end there's 137657 buffers (1GB).
>>
>> IMHO increasing the number of checkpoint segments would make this
>> disruption even worse.
> 
> Maybe - but it would also make the checkpoints less frequent, which
> might be a good thing.
> 
>> In 9.1 there's a feature that spreads checkpoint writes, but with 8.4
>> that's not possible.
> 
> What feature are you referring to here?  Checkpoint spreading was
> added in 8.3, IIRC.

You're absolutely right, I was talking about

  checkpoint_completion_target

and it was added in 8.3. Your memory is obviously better than mine.

Tomas



Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-23 Thread Robert Haas
On Tue, May 17, 2011 at 8:45 AM, Andrey Vorobiev
 wrote:
> 1. How does database size affect insert performance?

Well, if your database gets bigger, then your indexes will become
deeper, requiring more time to update.  But I'm not sure that's your
problem here.

> 2. Why does number of written buffers increase when database size grows?

It normally doesn't.

> 3. How can I further analyze this problem?

Are you actually inserting more user data into these tables, so that
they have more and more rows as time goes by, or are the data files
getting larger out of proportion to the amount of useful data in them?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-23 Thread Josh Berkus

> Well, all of that stuff sounds impractically expensive to me... but I
> just work here.

I'll point out that the simple version, which just checks for hot tables
and indexes, would improve estimates greatly and be a LOT less
complicated than these proposals.  Certainly having some form of
block-based or range-based stats would be better, but it also sounds
hard enough to maybe never get done.

Having user-accessible "hot" stats would also be useful to DBAs.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-23 Thread Robert Haas
On Mon, May 23, 2011 at 2:46 PM, Tomas Vondra  wrote:
> Really? He already has 64 checkpoint segments, which is about 1GB of
> xlog data. The real problem is that the amount of buffers to write is
> constantly growing. At the beginning there's 62861 buffers (500MB) and
> at the end there's 137657 buffers (1GB).
>
> IMHO increasing the number of checkpoint segments would make this
> disruption even worse.

Maybe - but it would also make the checkpoints less frequent, which
might be a good thing.

> In 9.1 there's a feature that spreads checkpoint writes, but with 8.4
> that's not possible.

What feature are you referring to here?  Checkpoint spreading was
added in 8.3, IIRC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-23 Thread Tomas Vondra
Dne 23.5.2011 15:30, Shaun Thomas napsal(a):
> On 05/17/2011 07:45 AM, Andrey Vorobiev wrote:
> 
>> 2011-05-17 18:55:51 NOVST LOG: checkpoint starting: xlog
>> 2011-05-17 18:57:20 NOVST LOG: checkpoint complete: wrote 62861 buffers
>> (24.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
>> write=89.196 s, sync=0.029 s, total=89.242 s
> 
> Increase your checkpoint_segments. If you see "checkpoint starting:
> xlog" instead of "checkpoint starting: time", you don't have enough
> checkpoint segments to handle your writes. Checkpoints *will* degrade
> your throughput.
> 

Really? He already has 64 checkpoint segments, which is about 1GB of
xlog data. The real problem is that the amount of buffers to write is
constantly growing. At the beginning there's 62861 buffers (500MB) and
at the end there's 137657 buffers (1GB).

IMHO increasing the number of checkpoint segments would make this
disruption even worse.

What I don't understand is that the checkpoint time does not increase
with the amount of data to write. Writing the

    62861 buffers   total=89.242 s   ( 5 MB/s)
    83747 buffers   total=75.061 s   ( 9 MB/s)
    97341 buffers   total=60.479 s   (13 MB/s)
   110149 buffers   total=52.379 s   (17 MB/s)
   120003 buffers   total=46.864 s   (20 MB/s)
   122296 buffers   total=57.867 s   (17 MB/s)
   128165 buffers   total=55.188 s   (18 MB/s)
   138508 buffers   total=58.068 s   (19 MB/s)
   132485 buffers   total=73.047 s   (14 MB/s)
   139542 buffers   total=68.319 s   (16 MB/s)
   137657 buffers   total=84.640 s   (13 MB/s)

Maybe this depends on what sections of the files are modified
(contiguous vs. not contiguous), but I doubt it.

In 9.1 there's a feature that spreads checkpoint writes, but with 8.4
that's not possible. I think this might be tuned using the background
writer; just make it more aggressive:

- bgwriter_delay (decrease)
- bgwriter_lru_maxpages (increase)
- bgwriter_lru_multiplier (increase)
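
For example (illustrative values only, to be verified against this
workload; the 8.4 defaults are 200ms / 100 / 2.0):

  bgwriter_delay = 50ms
  bgwriter_lru_maxpages = 500
  bgwriter_lru_multiplier = 4.0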

regards
Tomas



[PERFORM] Logfile

2011-05-23 Thread Nick Raj
Hi,
I am using the contrib/cube code. When I build a GiST index on a cube
column, the build produces a very large log file (nearly 220 MB for only
12k records), whereas creating a GiST index on a geometry column produces
about 1 KB of log for 1.7 million (17 lakh) records.

Can someone please tell me how to stop Postgres from logging so much data
in the case of cube?

Thanks
Nick


Re: [PERFORM] SORT performance - slow?

2011-05-23 Thread Tomas Vondra
Dne 23.5.2011 19:01, Maciek Sakrejda napsal(a):
>> You're probably reading it wrong. The sort itself takes about 1 ms (just
>> subtract the numbers in "actual=").
> 
> I thought it was cost=startup_cost..total_cost. That is not quite the
> same thing, since startup_cost is effectively "cost to produce first
> row", and Sort can't really operate in a "streaming" fashion (well,
> theoretically, something like selection sort could, but that's beside
> the point) so it needs to do all the work up front. I'm no explain
> expert, so someone please correct me if I'm wrong.

Good point, thanks. In that case the second number (2.3 sec) is correct.

I still think the problem is not the sorting but the inaccurate
estimates - fixing this might yield a much better / faster plan. But the
OP posted just a small part of the plan, so it's hard to guess.

regards
Tomas



Re: [PERFORM] SORT performance - slow?

2011-05-23 Thread Robert Haas
On Mon, May 23, 2011 at 1:01 PM, Maciek Sakrejda  wrote:
>> You're probably reading it wrong. The sort itself takes about 1 ms (just
>> subtract the numbers in "actual=").
>
> I thought it was cost=startup_cost..total_cost. That is not quite the
> same thing, since startup_cost is effectively "cost to produce first
> row", and Sort can't really operate in a "streaming" fashion (well,
> theoretically, something like selection sort could, but that's beside
> the point) so it needs to do all the work up front. I'm no explain
> expert, so someone please correct me if I'm wrong.

You are right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Pushing LIMIT into sub-queries of a UNION ALL

2011-05-23 Thread Robert Haas
On Mon, May 16, 2011 at 3:38 PM, Dave Johansen  wrote:
> I am using Postgres 8.3 and I have an issue very closely related to the one
> described here:
> http://archives.postgresql.org/pgsql-general/2005-06/msg00488.php
>
> Basically, I have a VIEW which is a UNION ALL of two tables but when I do a
> select on the view using a LIMIT, it scans the entire tables and takes
> significantly longer than writing out the query with the LIMITs in the
> sub-queries themselves. Is there a solution to get the view to perform like
> the sub-query version?

I believe this is fixed by MergeAppend in 9.1.  You might want to try
9.1beta1 and see if that works better for you.
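
In the meantime, the rewrite you describe, repeating the ORDER BY / LIMIT
inside each branch of the UNION ALL, is the usual 8.3 workaround. A sketch
with made-up table and column names:

SELECT * FROM (
    (SELECT * FROM events_2010 ORDER BY ts DESC LIMIT 10)
    UNION ALL
    (SELECT * FROM events_2011 ORDER BY ts DESC LIMIT 10)
) AS u
ORDER BY ts DESC
LIMIT 10;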

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)

2011-05-23 Thread Robert Haas
On Tue, May 17, 2011 at 11:10 AM,   wrote:
> For Hstore I'm using a GIST index.

I would have thought that GIN would be a better choice for this workload.
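
Something like the following, assuming the hstore column is itself named
bench_hstore, as the posted plan suggests:

CREATE INDEX hidx_gin ON bench_hstore USING gin (bench_hstore);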

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] SORT performance - slow?

2011-05-23 Thread Maciek Sakrejda
> You're probably reading it wrong. The sort itself takes about 1 ms (just
> subtract the numbers in "actual=").

I thought it was cost=startup_cost..total_cost. That is not quite the
same thing, since startup_cost is effectively "cost to produce first
row", and Sort can't really operate in a "streaming" fashion (well,
theoretically, something like selection sort could, but that's beside
the point) so it needs to do all the work up front. I'm no explain
expert, so someone please correct me if I'm wrong.
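
To illustrate with an invented plan line:

  Sort  (cost=1000.00..1250.00 rows=10000 width=8) (actual time=300.000..320.000 rows=10000 loops=1)

On that reading, nearly all of the sort's work is in the 300 ms startup
figure, not in the 20 ms between the two "actual" numbers.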

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com



Re: [PERFORM] Pushing LIMIT into sub-queries of a UNION ALL?

2011-05-23 Thread Dave Johansen
I apologize for the multiple posts. I sent this email right after joining
the list and after it hadn't shown up a day later I figured that it had been
lost or something and sent the other one.

Also, the database I posted this about does not have internet access and so
I'm working on getting it moved over to a machine that does or getting it
the info onto a machine where I can post the pertinent information about the
schema and explain outputs.

Thanks,
Dave

--
Dave Johansen
phone: (520) 302-4526


On Sun, May 22, 2011 at 10:34 AM, Robert Klemme
wrote:

> Dave,
>
> how often do you want to repeat that posting?  What about instead
> replying to the answers you got so far?
>
> Cheers
>
> robert
>
>
> On Tue, May 17, 2011 at 5:31 PM, Dave Johansen 
> wrote:
> > I am using Postgres 8.3 and I have an issue very closely related to the
> one
> > described here:
> > http://archives.postgresql.org/pgsql-general/2005-06/msg00488.php
> >
> > Basically, I have a VIEW which is a UNION ALL of two tables but when I do
> a
> > select on the view using a LIMIT, it scans the entire tables and takes
> > significantly longer than writing out the query with the LIMITs in the
> > sub-queries themselves. Is there a solution to get the view to perform
> like
> > the sub-query version?
> >
> > Thanks,
> > Dave
>
>
>
> --
> remember.guy do |as, often| as.you_can - without end
> http://blog.rubybestpractices.com/
>


Re: [PERFORM] Postgres refusing to use >1 core

2011-05-23 Thread Aren Cambre
>
> It's always good to hear when these things work out. Thanks for
> reporting back.
>
> Using the set-based nature of relational databases to your advantage,
> writing smarter queries that do more work server-side with fewer
> round-trips, and effective batching can make a huge difference.
>

Glad I could be a good digital citizen! :-)

Correction: it's going to run for significantly more than 2 hours, but far
less than 30 hours!

I'm loving seeing the CPU meter showing all 8 of my (fake) cores being
pounded mercilessly!

Aren


Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-23 Thread Shaun Thomas

On 05/17/2011 07:45 AM, Andrey Vorobiev wrote:


2011-05-17 18:55:51 NOVST LOG: checkpoint starting: xlog
2011-05-17 18:57:20 NOVST LOG: checkpoint complete: wrote 62861 buffers
(24.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=89.196 s, sync=0.029 s, total=89.242 s


Increase your checkpoint_segments. If you see "checkpoint starting: 
xlog" instead of "checkpoint starting: time", you don't have enough 
checkpoint segments to handle your writes. Checkpoints *will* degrade 
your throughput.
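
For example, in postgresql.conf (the value is illustrative; size it to your
write volume):

  checkpoint_segments = 128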


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email



Re: [PERFORM] [OT]: Confidentiality disclosures in list posts (Was: SORT performance - slow?)

2011-05-23 Thread Shaun Thomas

On 05/20/2011 11:47 AM, Lew wrote:


I don't even know why people bother even putting such nonsense into
their emails, let alone Usenet or mailing-list posts.


This may sound like a surprise, but many of us don't. Several companies 
use an auto-append on any outgoing message not sent to an internal 
recipient. You can see this for yourselves in this message, as my 
company's little blurb gets attached after my signature lines. It's just 
standard boilerplate meant as a CYA measure, really.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email



Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)

2011-05-23 Thread Pierre C



> Hi Merlin
>
> The analyze command gave the following result:
>
> On the KVP table:
> Index Scan using kvpidx on bench_kvp (cost=0.00..8.53 rows=1 width=180) (actual time=0.037..0.038 rows=1 loops=1)
>   Index Cond: (bench_id = '20_20'::text)
> Total runtime: 0.057 ms
>
> And on the Hstore table:
> Bitmap Heap Scan on bench_hstore (cost=32.22..3507.54 rows=1000 width=265) (actual time=145.040..256.173 rows=1 loops=1)
>   Recheck Cond: (bench_hstore @> '"bench_id"=>"20_20"'::hstore)
>   ->  Bitmap Index Scan on hidx (cost=0.00..31.97 rows=1000 width=0) (actual time=114.748..114.748 rows=30605 loops=1)
>         Index Cond: (bench_hstore @> '"bench_id"=>"20_20"'::hstore)
> Total runtime: 256.211 ms
>
> For Hstore I'm using a GIST index.



Try to create a btree index on "(bench_hstore->bench_id) WHERE  
(bench_hstore->bench_id) IS NOT NULL".
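
Spelled out, that suggestion is something like the following, assuming the
hstore column is itself named bench_hstore, as in the plan above:

CREATE INDEX bench_kv_btree
    ON bench_hstore ((bench_hstore -> 'bench_id'))
    WHERE (bench_hstore -> 'bench_id') IS NOT NULL;

Queries then need to filter on that same expression, e.g.
WHERE bench_hstore -> 'bench_id' = '20_20', for the index to be usable.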


