Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-24 Thread Vitalii Tymchyshyn

Hello.

As for me, this whole "hot data" business really looks rather uncertain
and dynamic.
Two things that I could use directly right now (and they are needed as a
pair) are:
1) Per-table/index/database buffer pools (split shared buffers into parts,
allow specifying which index/table/database goes where)

2) Per-table/index cost settings

If I had this, I could allocate specific buffer pools for tables/indexes 
that MUST be hot in memory and set low costs for these specific tables.
P.S. A third thing, great to have as a companion to these two, is a "load
on startup" flag to automatically populate the buffer pools with a fast
sequential read, but this can be easily emulated with a statement.
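
For example, a crude emulation, assuming a hypothetical table hot_table
that must stay cached:

-- a sequential read pulls the whole heap into cache
SELECT count(*) FROM hot_table;
-- an index can be touched the same way by forcing one index-scanned query,
-- e.g. SET enable_seqscan = off; SELECT count(*) FROM hot_table WHERE id > 0;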


Best regards, Vitalii Tymchyshyn



Re: [PERFORM] Hash Anti Join performance degradation

2011-05-24 Thread Cédric Villemain
2011/5/24 panam pa...@gmx.net:
 Hi,

 In my DB, there is a large table containing messages and one that contains
 message boxes.
 Messages are assigned to boxes via a child-parent link m->b.
 In order to obtain the last message for a specific box, I use the following
 SQL:

 SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id
 AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;

 This worked quite well for a long time. But now, suddenly the performance of
 this query drastically degraded as new messages were added.
 If these new messages are removed again, everything is back to normal. If
 other messages are removed instead, the problem remains, so it does not seem
 to be a memory issue. I'm afraid I have difficulty understanding what is going
 wrong here.

We need more information here. The case is in fact interesting.
What's the PostgreSQL version, and do you have logs of vacuum and
checkpoint activity? (No VACUUM FULL/CLUSTER or anything like that running?)
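
If it helps, the usual way to capture that activity (both settings exist
in 9.0; adjust in postgresql.conf and reload) is:

log_checkpoints = on
log_autovacuum_min_duration = 0    # logs every autovacuum run with timings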

Obviously, Craig's suggestion to read
http://wiki.postgresql.org/wiki/SlowQueryQuestions is relevant, as it
helps to gather all the common information required to analyze the issue.


 This is the query plan when everything is fine:

 Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696)
 (actual time=0.169..7683.978 rows=128 loops=1)
   Output: this_.id, this_.login, (SubPlan 1)
   Buffers: shared hit=188413 read=94635 written=135, temp read=22530
 written=22374
   SubPlan 1
      ->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8)
 (actual time=59.571..59.877 rows=1 loops=128)
           Output: m1.id
           Hash Cond: (m1.box_id = m2.box_id)
            Join Filter: (m1.id < m2.id)
           Buffers: shared hit=188412 read=94633 written=135, temp
 read=22530 written=22374
            ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28
 rows=30640 width=16) (actual time=3.152..9.514 rows=17982 loops=128)
                 Output: m1.id, m1.box_id
                 Recheck Cond: (m1.box_id = $0)
                 Buffers: shared hit=131993 read=9550 written=23
                  ->  Bitmap Index Scan on message_box_Idx
 (cost=0.00..570.31 rows=30640 width=0) (actual time=2.840..2.840 rows=18193
 loops=128)
                       Index Cond: (m1.box_id = $0)
                       Buffers: shared hit=314 read=6433 written=23
            ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual
 time=26.840..26.840 rows=20014 loops=115)
                 Output: m2.box_id, m2.id
                 Buckets: 4096  Batches: 4 (originally 2)  Memory Usage:
 5444kB
                 Buffers: shared hit=56419 read=85083 written=112, temp
 written=7767
                  ->  Bitmap Heap Scan on public.message m2
 (cost=577.97..40212.28 rows=30640 width=16) (actual time=2.419..20.007
 rows=20014 loops=115)
                       Output: m2.box_id, m2.id
                       Recheck Cond: (m2.box_id = $0)
                       Buffers: shared hit=56419 read=85083 written=112
                        ->  Bitmap Index Scan on message_box_Idx
 (cost=0.00..570.31 rows=30640 width=0) (actual time=2.166..2.166 rows=20249
 loops=115)
                             Index Cond: (m2.box_id = $0)
                             Buffers: shared hit=6708
 Total runtime: 7685.202 ms

 This is the plan when the query gets sluggish:

 Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696)
 (actual time=0.262..179333.086 rows=128 loops=1)
   Output: this_.id, this_.login, (SubPlan 1)
   Buffers: shared hit=189065 read=93983 written=10, temp read=22668
 written=22512
   SubPlan 1
      ->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8)
 (actual time=1264.700..1400.886 rows=1 loops=128)
           Output: m1.id
           Hash Cond: (m1.box_id = m2.box_id)
            Join Filter: (m1.id < m2.id)
           Buffers: shared hit=189064 read=93981 written=10, temp read=22668
 written=22512
            ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28
 rows=30640 width=16) (actual time=3.109..9.850 rows=18060 loops=128)
                 Output: m1.id, m1.box_id
                 Recheck Cond: (m1.box_id = $0)
                 Buffers: shared hit=132095 read=9448
                  ->  Bitmap Index Scan on message_box_Idx
 (cost=0.00..570.31 rows=30640 width=0) (actual time=2.867..2.867 rows=18193
 loops=128)
                       Index Cond: (m1.box_id = $0)
                       Buffers: shared hit=312 read=6435
            ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual
 time=27.533..27.533 rows=20102 loops=115)
                 Output: m2.box_id, m2.id
                 Buckets: 4096  Batches: 4 (originally 2)  Memory Usage:
 5522kB
                 Buffers: shared hit=56969 read=84533 written=10, temp
 written=7811
                  ->  Bitmap Heap Scan on public.message m2
 (cost=577.97..40212.28 rows=30640 width=16) (actual time=2.406..20.492
 rows=20102 loops=115)
                

Re: [PERFORM] Hash Anti Join performance degradation

2011-05-24 Thread panam
Hi Craig and Cédric,

Thanks for the very informative introduction to the netiquette here and
thanks for sharing your time.
I wasn't aware of http://explain.depesz.com/, very useful.
So, here are the query plans:
http://explain.depesz.com/s/6AU (1st from previous post, good)
http://explain.depesz.com/s/YPS (2nd from previous post, bad)

 The usual cause is that the statistics for estimated row counts cross a 
 threshold that makes the query planner think that a different kind of 
 plan will be faster.

Hm, as far as I understand the plans, they are equivalent, aren't they?

 If the query planner is using bad information about the performance of 
 the storage, then it will be making bad decisions about which approach 
 is faster. So the usual thing to do is to adjust seq_page_cost and 
 random_page_cost to more closely reflect the real performance of your 
 hardware, and to make sure that effective_cache_size matches the real 
 amount of memory your computer has free for disk cache use.

Will this make any difference even when the plans are equivalent as assumed
above?

The table creation SQL is as follows:
http://pastebin.com/qFDUP7Aa (Message table); ~ 2328680 rows, is growing
constantly (~ 1 new rows each day), 
http://pastebin.com/vEmh4hb8 (Box table); ~ 128 rows (growing very slowly 1
row every two days, each row updated about 2x a day)

The DB contains the same data, except that for the good query, the last
10976 rows (0.4%) of message are removed by doing a

DELETE FROM message where timestamp > TO_DATE ('05/23/2011','mm/dd/yyyy');


This speeds up the query by a factor of ~27. (207033.081 (bad) vs. 7683.978
(good)).
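
For reference, timings like the above can be captured with something of
this form (a sketch; the box id literal is just a placeholder):

VACUUM ANALYZE message;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT m1.id FROM message m1
LEFT JOIN message m2 ON (m1.box_id = m2.box_id AND m1.id < m2.id)
WHERE m2.id IS NULL AND m1.box_id = 1;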

Each query was run before and after a vacuum analyze, one time to create
appropriate statistics, and the second time to do the actual measurement.
All tests were made on the dev machine, which is an 8GB, Core i7, Windows 7 box.

I experienced the issue at first on the production-environment, which is a
64-bit Ubuntu, running PostgreSQL 9.0.1 on x86_64-unknown-linux-gnu,
compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit,
and later for analysis on the dev-environment, which is a
64-bit Windows 7, running PostgreSQL 9.0.4, compiled by Visual C++ build
1500, 64-bit
For testing, I've increased the buffers that I judge important for the issue
to the following values:
effective_cache_size: 4GB
shared_buffers: 1GB
work_mem: 1GB
temp_buffers: 32MB
After that, configuration was reloaded and the postgresql service was
restarted using pgAdmin.
Interestingly, there was no performance gain as compared to the default
settings, the bad query even took about 30 seconds (15%) longer.
Also, it seems all data fits into memory, so there is not much disk I/O
involved.

@Cédric
 did you have log of vacuum and checkpoint activity ?
 (no vacuum full/cluster or such thing running ?)
There is no clustering involved here, it's a pretty basic setup.
How can I obtain the information you require here? I could send you the
output of the analyse vacuum command from pgAdmin, but is there a way to
make it output the information in English (rather than German)? 

Thanks for your interest in this issue.

Regards,
panam

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4422247.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.



Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move

2011-05-24 Thread Greg Smith

On 05/17/2011 05:47 AM, Craig Ringer wrote:
This makes me wonder if Pg attempts to pre-fetch blocks of interest 
for areas where I/O needs can be known in advance, while there's still 
other work or other I/O to do. For example, pre-fetching for the next 
iteration of a nested loop while still executing the prior one. Is it 
even possible?


Well, remember that a nested loop isn't directly doing any I/O.  It's 
pulling rows from some lower level query node.  So the useful question 
to ask is how can pre-fetch speed up the table access methods?  That 
worked out like this:


Sequential Scan:  logic here was added and measured as useful for one 
system with terrible I/O.  Everywhere else it was tried on Linux, the 
read-ahead logic in the kernel seems to make this redundant.  Punted as 
too much complexity relative to measured average gain.  You can try to 
tweak this on a per-file basis in an application, but the kernel has 
almost as much information to make that decision usefully as the 
database does.


Index Scan:  It's hard to know what you're going to need in advance here 
and pipeline the reads, so this hasn't really been explored yet.


Bitmap heap scan:  Here, the exact list of blocks to fetch is known in 
advance, they're random, and it's quite possible for the kernel to 
schedule them more efficiently than serial access of them can do.  This 
was added as the effective_io_concurrency feature (it's the only thing 
that feature impacts), which so far is only proven to work on Linux.  
Any OS implementing the POSIX API used will also get this however; 
FreeBSD was the next likely candidate that might benefit when I last 
looked around.


I'm guessing not, because (AFAIK) Pg uses only synchronous blocking 
I/O, and with that there isn't really a way to pre-fetch w/o threads 
or helper processes. Linux (at least) supports buffered async I/O, so 
it'd be possible to submit such prefetch requests ... on modern Linux 
kernels. Portably doing so, though - not so much.


Linux supports the POSIX_FADV_WILLNEED advisory call, which is perfect 
for suggesting what blocks will be accessed in the near future in the 
bitmap heap scan case.  That's how effective_io_concurrency works.
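
As a sketch of how that surfaces at the SQL level (table, column and index
names here are made up; the setting only helps plans with a bitmap heap scan):

SET effective_io_concurrency = 8;   -- number of pages hinted ahead via posix_fadvise
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;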


Both Solaris and Linux also have async I/O mechanisms that could be used 
instead.  Greg Stark built a prototype and there's an obvious speed-up 
there to be had.  But the APIs for this aren't very standard, and it's 
really hard to rearchitect the PostgreSQL buffer manager to operate in a 
less synchronous way.  Hoping that more kernels support the "will need" 
API usefully, which meshes very well with how PostgreSQL thinks about 
the problem, is where things are at right now.  With so many bigger 
PostgreSQL sites on Linux, that's worked out well so far.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-24 Thread Tomas Vondra
On 24.5.2011 07:24, Terry Schmitt wrote:
 As near as I can tell from your test configuration description, you have
 JMeter -> J2EE -> Postgres.
 Have you ruled out the J2EE server as the problem? This problem may not
 be the database.
 I would take a look at your app server's health and look for any
 potential issues there before spending too much time on the database.
 Perhaps there are memory issues or excessive garbage collection on the
 app server?

It might be part of the problem, yes, but it's just a guess. We need to
see some iostat / iotop / vmstat output to confirm that.

The probable cause here is that the indexes grow with the table, get
deeper, so when you insert a new row you need to modify more and more
pages. That's why the number of buffers grows over time and the
checkpoint takes more and more time (the average write speed is about 15
MB/s - not sure if that's good or bad performance).

The question is whether this is influenced by other activity (Java GC or
something).

I see three ways to improve the checkpoint performance:

  1) set checkpoint_completion_target = 0.9 or something like that
 (this should spread the checkpoint, but it also increases the
     number of checkpoint segments to keep)

  2) make the background writer more aggressive (tune the bgwriter_*
 variables), this is similar to (1)

  3) improve the write performance (not sure how random the I/O is in
 this case, but a decent controller with a cache might help)

and then two ways to decrease the index overhead / amount of modified
buffers

  1) keep only the really necessary indexes (remove duplicate indexes,
 remove indexes where another index already performs reasonably,
 etc.)

  2) partition the table (so that only indexes on the current partition
     will be modified, and those will be shallower); a rough sketch
     follows below
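
A rough sketch of the partitioning idea, with made-up table and column
names (9.0-style inheritance partitioning; a trigger or the application
routes inserts to the current child):

CREATE TABLE events_2011_05 (
    CHECK (created >= DATE '2011-05-01' AND created < DATE '2011-06-01')
) INHERITS (events);
CREATE INDEX events_2011_05_created_idx ON events_2011_05 (created);
-- with constraint_exclusion = partition, queries on the parent skip
-- children whose CHECK constraint rules them out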

Tomas



Re: [PERFORM] Hash Anti Join performance degradation

2011-05-24 Thread Tom Lane
panam pa...@gmx.net writes:
 In my DB, there is a large table containing messages and one that contains
 message boxes.
 Messages are assigned to boxes via a child-parent link m->b.
 In order to obtain the last message for a specific box, I use the following
 SQL:

 SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id
 AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;

BTW, this query doesn't actually match the EXPLAIN outputs...

 So from my limited experience, the only significant difference I see is that
 the Hash Anti Join takes a lot more time in plan 2, but I do not understand
 why.

Whatever's going on is below the level that EXPLAIN can show.  I can
think of a couple of possibilities:

1. The extra rows in the slower case all manage to come out to the
same hash value, or some very small number of distinct hash values, such
that we spend a lot of time searching a single hash chain.  But it's
hard to credit that adding 0.4% more rows could result in near 100x
slowdown, no matter how bad their distribution.

2. There's some inefficiency in the use of the temp files, though again
it's far from clear why your two cases would be noticeably different
there.  Possibly enabling log_temp_files would tell you something useful.

One other thing I'm not following is how come it's using hash temp files
at all, when you claim in your later message that you've got work_mem
set to 1GB.  It should certainly not take more than a couple meg to hold
20K rows at 16 payload bytes per row.  You might want to check whether
that setting actually took effect.
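
A quick sanity check, as a sketch:

SHOW work_mem;            -- what this session actually uses
SET work_mem = '1GB';     -- per-session override, to rule out a setting that never took
SET log_temp_files = 0;   -- superuser only; logs every temp file and its size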

regards, tom lane



Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-24 Thread Greg Smith

On 05/17/2011 08:45 AM, Andrey Vorobiev wrote:

1. How does database size affect insert performance?


As indexes grow, it becomes slower to insert into them.  It has to 
navigate all of the indexes on the table to figure out where to add the 
new row, and that navigation time goes up as tables get larger.  Try 
using the queries at 
http://wiki.postgresql.org/wiki/Disk_Usage to quantify how big your 
indexes are.  Many people are absolutely shocked to see how large they 
become.  And some database designers throw indexes onto every possible 
column combination as if they were free.
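
Or, as a quick sketch, the largest indexes in the current database:

SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS index_size
  FROM pg_class
 WHERE relkind = 'i'
 ORDER BY pg_relation_size(oid) DESC
 LIMIT 10;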



2. Why does number of written buffers increase when database size grows?



As indexes grow, the changes needed to insert more rows get spread over 
more blocks too.


You can install pg_buffercache and analyze what's actually getting dirty 
in the buffer cache to directly measure what's changing here.  If you 
look at http://projects.2ndquadrant.com/talks and download the "Inside 
the PostgreSQL Buffer Cache" talk and its "Sample Queries" set; those 
will give you some examples of how to summarize everything.
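
For instance, a simplified per-relation summary of dirty buffers (this
join ignores databases and tablespaces, so treat it only as a sketch):

SELECT c.relname,
       count(*) AS buffers,
       sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty
  FROM pg_buffercache b
  JOIN pg_class c ON c.relfilenode = b.relfilenode
 GROUP BY c.relname
 ORDER BY dirty DESC
 LIMIT 10;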



3. How can I further analyze this problem?


This may not actually be a problem, in the sense of something you can 
resolve.  If you assume that you can insert into a giant table at the 
same speed you can insert into a trivial one, you'll have to adjust your 
thinking because that's never going to be true.  Removing some indexes 
may help; reducing the columns in the index is also good; and some 
people end up partitioning their data specifically to help with this 
situation.  It's also possible to regain some of the earlier performance 
using things like REINDEX and CLUSTER.
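
For example (table and index names made up; both commands take exclusive
locks, so schedule them accordingly):

REINDEX TABLE orders;                      -- rebuild bloated indexes
CLUSTER orders USING orders_created_idx;   -- rewrite the heap in index order
ANALYZE orders;                            -- refresh statistics afterwards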


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)

2011-05-24 Thread Stefan Keller
Hi Pierre

You wrote
 Try to create a btree index on (bench_hstore->bench_id) WHERE
 (bench_hstore->bench_id) IS NOT NULL.

What do you mean exactly?
= CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps) WHERE
??? IS NOT NULL;

My table's def is:
 CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps hstore NOT NULL );
So I'm doing something like:
CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps);
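
Presumably, against the table above, the suggested expression index would
be something like this (a sketch; it assumes the hstore key is 'bench_id'):

CREATE INDEX myhstore_bench_id_idx
    ON myhstore ((kvps -> 'bench_id'))
    WHERE (kvps -> 'bench_id') IS NOT NULL;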

Stefan


2011/5/23 Pierre C li...@peufeu.com:

 Hi Merlin

 The analyze command gave the following result:

 On the KVP table:
 Index Scan using kvpidx on bench_kvp (cost=0.00..8.53 rows=1 width=180)
 (actual time=0.037..0.038 rows=1 loops=1)
 Index Cond: (bench_id = '20_20'::text)
 Total runtime: 0.057 ms

 And on the Hstore table:
 Bitmap Heap Scan on bench_hstore (cost=32.22..3507.54 rows=1000 width=265)
 (actual time=145.040..256.173 rows=1 loops=1)
 Recheck Cond: (bench_hstore @> 'bench_id=>20_20'::hstore)
 -> Bitmap Index Scan on hidx (cost=0.00..31.97 rows=1000 width=0) (actual
 time=114.748..114.748 rows=30605 loops=1)
 Index Cond: (bench_hstore @> 'bench_id=>20_20'::hstore)
 Total runtime: 256.211 ms

 For Hstore I'm using a GIST index.


 Try to create a btree index on (bench_hstore->bench_id) WHERE
 (bench_hstore->bench_id) IS NOT NULL.





Re: [PERFORM] Hash Anti Join performance degradation

2011-05-24 Thread Craig Ringer
On 24/05/11 22:34, panam wrote:

 The usual cause is that the statistics for estimated row counts cross a 
 threshold that makes the query planner think that a different kind of 
 plan will be faster.
 
 Hm, as far as i understand the plans, they are equivalent, aren't they?

Yes, they are, and the estimates are too. This isn't the usual case
where the planner trips over a threshold and switches to a totally
different plan type that it thinks is faster, but isn't.

The estimates are actually IDENTICAL for the hash anti-join node of
interest, and so are the actual loop count and row count. Temp file
activity is much the same across both plans too.

You can reproduce this behaviour consistently? It's _seriously_ weird,
and the sort of thing that when I encounter it myself I tend to ask "what
else is going on that I'm missing?".

What happens if you DELETE more rows? Or fewer? What's the threshold?

What happens if you DELETE rows from the start not the end, or a random
selection?

Does the problem persist if you DELETE the rows then CLUSTER the table
before running the query?

Does the problem persist if you DELETE the rows then REINDEX?
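
For the CLUSTER/REINDEX variants, something along these lines (a sketch;
the index name is taken from the plans you posted):

VACUUM ANALYZE message;
CLUSTER message USING "message_box_Idx";   -- rewrite the heap in box_id order
ANALYZE message;
-- or, instead of CLUSTER:
REINDEX TABLE message;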

 If the query planner is using bad information about the performance of 
 the storage, then it will be making bad decisions about which approach 
 is faster. [snip]
 
 Will this make any difference even when the plans are equivalent as assumed
 above?

Nope. It doesn't seem to be a problem with plan selection.

 This speeds up the query by a factor of ~27. (207033.081 (bad) vs. 7683.978
 (good)).

That's a serious WTF.

 @Cédric
 did you have log of vacuum and checkpoint activity ?
 (no vacuum full/cluster or such thing running ?)
 There is no clustering involved here, it's a pretty basic setup.

He means 'CLUSTER', the SQL command that tells PostgreSQL to re-organize
a table.

The answer from the rest of your post would appear to be 'no, it's being
run in an otherwise-idle stand-alone test environment'. Right?

 How can I obtain the information you require here? I could send you the
 output of the analyse vacuum command from pgAdmin, but is there a way to
 make it output the information in English (rather than German)?

It's easy enough to read familiar output like that in German, if needs be.

--
Craig Ringer


