Re: [PERFORM] SQL statement over 500% slower with 9.2 compared with 9.1

2013-08-27 Thread Jeff Janes
On Monday, August 26, 2013, Rafael Martinez wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Hello
>
> We have a SQL statement that with 9.1 takes ca 4000ms to finnish and
> with 9.2 over 22000ms.
>
> The explain analyze information is here:
>

Could you do explain (analyze, buffers) of these?


>
> With 9.1.:
> http://explain.depesz.com/s/5ou
>
> With 9.2
> http://explain.depesz.com/s/d4vU
>
> The SQL statement is:
>
> SELECT  firstname || ' ' || lastname AS Name
> FROMPerson R
> WHERE  R.gender like 'F'
> AND  19 < (SELECT COUNT(DISTINCT filmId)
>   FROM   FilmParticipation F
>   WHERE  F.partType = 'director' AND
>  F.personId = R.personId)
>

What happens if you excise the "19 < (select ...)" clause?

That would greatly simplify the analysis, assuming the problem remains.

How many distinct filmId are there?



>
> We can see that the query plan is very different between versions and
> that 9.2 is really wrong with the number of rows involved. Why is 9.2
> taking so wrong about the number of rows involved in some parts of the
> plan?
>

Most directors are not also actors, so there is a strong negative
correlation that PostgreSQL is not aware of. However, I think if you could
get 9.1 to report the same path, it would be just as wrong on that
estimate.  But since it doesn't report the same path, you don't see how
wrong it is.

Try running:

explain (analyze, buffers)
 SELECT  D.personId
FROMFilmParticipation D
WHERE   D.partType = 'director'
--AND D.personId = R.personId
AND NOT EXISTS (
SELECT  *
FROMFilmParticipation C
WHERE   C.partType = 'cast'
AND C.filmId = D.filmId
AND C.personId = D.personId
   );

On both 9.1 and 9.2.

Cheers,

Jeff


Re: [PERFORM] Poor performance on simple queries compared to sql server express

2013-08-27 Thread Tomas Vondra
Hi,

On 27.8.2013 06:06, Adam Ma'ruf wrote:
> Hi
> 
> Thanks for the response.  I reran the query but first ran the statement
> you provided and set working mem to 2gb.  It ended up taking 133s and
> group aggregate was still used

OK.

> 
> Here are the values you asked for:
> # - Planner Method Configuration -
> # - Planner Cost Constants -

All set to default, so seems fine to me.

> 
> #seq_page_cost = 1.0# measured on an arbitrary scale
> #random_page_cost = 4.0# same scale as above
> #cpu_tuple_cost = 0.01# same scale as above
> #cpu_index_tuple_cost = 0.005# same scale as above
> #cpu_operator_cost = 0.0025# same scale as above
> #effective_cache_size = 6000MB

Well, if effective_cache_size is commented out, then it's still 128MB
(default). But I don't think that matters here.

> The output of select * from pg_statistics is large...should I attach it
> as a separate file (not sure if that's allowed on these mailing lists)

I haven't asked for pg_statistics dump. I asked for pg_settings (but I
already got most of the important pieces above).


> The data is ~2.5gb, I can't think of any place I can upload it.  I can

There's like a zillion of such places. E.g. Dropbox, Box, Wuala, Google
Drive, mega.co.nz or one of the many other alternatives. All of them
give you ~5GB space for free.

Or I could give you access to my FTP server, if that's what you prefer.

> provide the columns and data type.  it's a subset of public data from
> usaspending.gov 

Is there a simple way to download / filter the public data to get the
same dataset as you have?

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SQL statement over 500% slower with 9.2 compared with 9.1

2013-08-27 Thread Tomas Vondra
On 27.8.2013 11:19, Rafael Martinez wrote:
> On 08/26/2013 02:33 PM, Rafael Martinez wrote:
> []
>> The SQL statement is:
> 
>> SELECT  firstname || ' ' || lastname AS Name FROMPerson R WHERE
>> R.gender like 'F' AND  19 < (SELECT COUNT(DISTINCT filmId) FROM
>> FilmParticipation F WHERE  F.partType = 'director' AND F.personId =
>> R.personId) AND NOT EXISTS ( SELECT  * FROM
>> FilmParticipation D WHERE   D.partType = 'director' AND D.personId
>> = R.personId AND NOT EXISTS ( SELECT  * FROMFilmParticipation
>> C WHERE   C.partType = 'cast' AND C.filmId = D.filmId AND
>> C.personId = D.personId ) ) ;
> 
> 
> [.]
> 
>> We can see that the query plan is very different between versions
>> and that 9.2 is really wrong with the number of rows involved. Why
>> is 9.2 taking so wrong about the number of rows involved in some
>> parts of the plan?
> 
> 
> Hei
> 
> More information:
> 
> If we turn off enable_indexscan the runtime gets more similar to the
> one we get with 9.1, we are down to 4200ms.
> 
> The query plan with this configuration is here:
> http://explain.depesz.com/s/jVR
> 
> The question remains the same, why is 9.2 using such a different and
> bad plan compared to 9.1, when the data and the configuration are the
> same?

Hi,

seems the problem is mostly about the inner-most query, i.e. this:

SELECT  *
FROMFilmParticipation C
WHERE   C.partType = 'cast'
AND C.filmId = D.filmId
AND C.personId = D.personId
)

In 9.2 it's estimated to return 1 row, but it returns 595612 of them (or
97780 after materialization). I believe this is the culprit that causes
cost estimates that are way off, and that in turn leads to choice of
"cheaper" plan that actually takes much longer to evaluate.

Because the slow plan is estimated to "cost" 122367017.97 while the fast
one 335084834.95 (i.e. 3x more).

I don't immediately see where's the problem - maybe some other hacker on
this list can. Can you prepare a testcase for this? I.e. a structure of
the tables + data so that we can reproduce it?

regards
Tomas



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-08-27 Thread Merlin Moncure
On Tue, Aug 27, 2013 at 10:55 AM, Andres Freund  wrote:
> On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:
>> + bool
>> + RecoveryMightBeInProgress(void)
>> + {
>> + /*
>> +  * We check shared state each time only until we leave recovery mode. 
>> We
>> +  * can't re-enter recovery, so there's no need to keep checking after 
>> the
>> +  * shared variable has once been seen false.
>> +  */
>> + if (!LocalRecoveryInProgress)
>> + return false;
>> + else
>> + {
>> + /* use volatile pointer to prevent code rearrangement */
>> + volatile XLogCtlData *xlogctl = XLogCtl;
>> +
>> + /* Intentionally query xlogctl without spinlocking! */
>> + LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
>> +
>> + return LocalRecoveryInProgress;
>> + }
>> + }
>
> I don't think it's acceptable to *set* LocalRecoveryInProgress
> here. That should only be done in the normal routine.

quite right -- that was a major error -- you could bypass the
initialization call to the xlog with some bad luck.

merlin


recovery3.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-08-27 Thread Andres Freund
On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:
> + bool
> + RecoveryMightBeInProgress(void)
> + {
> + /*
> +  * We check shared state each time only until we leave recovery mode. We
> +  * can't re-enter recovery, so there's no need to keep checking after 
> the
> +  * shared variable has once been seen false.
> +  */
> + if (!LocalRecoveryInProgress)
> + return false;
> + else
> + {
> + /* use volatile pointer to prevent code rearrangement */
> + volatile XLogCtlData *xlogctl = XLogCtl;
> + 
> + /* Intentionally query xlogctl without spinlocking! */
> + LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
> + 
> + return LocalRecoveryInProgress;
> + }
> + }

I don't think it's acceptable to *set* LocalRecoveryInProgress
here. That should only be done in the normal routine.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-08-27 Thread Merlin Moncure
On Tue, Aug 27, 2013 at 9:12 AM, Merlin Moncure  wrote:
> Something like the attached.  Note, this patch is for research
> purposes only and should *not* be applied to your production
> environment.

Here is a revised version that is missing the spurious whitespace edit.

merlin


recovery2.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-08-27 Thread Merlin Moncure
On Tue, Aug 27, 2013 at 8:38 AM, Merlin Moncure  wrote:
> On Tue, Aug 27, 2013 at 8:23 AM, Merlin Moncure  wrote:
>> It looks like you're hitting spinlock connection inside
>> heap_page_prune_opt().  Which is commented:
>>  * Note: this is called quite often.  It's important that it fall out quickly
>>  * if there's not any use in pruning.
>>
>> This in turn calls RecoveryInProgress() which spinlocks in order to
>> get a guaranteed result.  At that call site, we are told:
>> /*
>> * We can't write WAL in recovery mode, so there's no point trying to
>> * clean the page. The master will likely issue a cleaning WAL record soon
>> * anyway, so this is no particular loss.
>> */
>>
>> So ISTM it's necessary to pedantically check  RecoveryInProgress on
>> each and every call of this routine (or at least, we should be able to
>> reduce the number of spinlocks).
>>
>> Hm, what if we exposed LocalRecoveryInProgress() through a function
>> which would approximately satisfy the condition
>> "MightRecoveryInProgress()" in the basis the condition only moves in
>> one direction?  That could lead to optimization around the spinlock in
>> hot path cases like this where getting 'TRUE' incorrectly is mostly
>> harmless...
>
> More specifically, this hypothetical routine would query
> xlogctl->SharedRecoveryInProgress without taking a lock and would not
> issue InitXLOGAccess(). RecoveryInProgress() seems to be called
> everywhere (In particular: StartTransaction()) so I don't think
> there's a lot of risk in terms of losing access to the xlog.

Something like the attached.  Note, this patch is for research
purposes only and should *not* be applied to your production
environment.

merlin


recovery.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-08-27 Thread Merlin Moncure
On Tue, Aug 27, 2013 at 8:23 AM, Merlin Moncure  wrote:
> It looks like you're hitting spinlock connection inside
> heap_page_prune_opt().  Which is commented:
>  * Note: this is called quite often.  It's important that it fall out quickly
>  * if there's not any use in pruning.
>
> This in turn calls RecoveryInProgress() which spinlocks in order to
> get a guaranteed result.  At that call site, we are told:
> /*
> * We can't write WAL in recovery mode, so there's no point trying to
> * clean the page. The master will likely issue a cleaning WAL record soon
> * anyway, so this is no particular loss.
> */
>
> So ISTM it's necessary to pedantically check  RecoveryInProgress on
> each and every call of this routine (or at least, we should be able to
> reduce the number of spinlocks).
>
> Hm, what if we exposed LocalRecoveryInProgress() through a function
> which would approximately satisfy the condition
> "MightRecoveryInProgress()" in the basis the condition only moves in
> one direction?  That could lead to optimization around the spinlock in
> hot path cases like this where getting 'TRUE' incorrectly is mostly
> harmless...

More specifically, this hypothetical routine would query
xlogctl->SharedRecoveryInProgress without taking a lock and would not
issue InitXLOGAccess(). RecoveryInProgress() seems to be called
everywhere (In particular: StartTransaction()) so I don't think
there's a lot of risk in terms of losing access to the xlog.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-08-27 Thread Merlin Moncure
On Tue, Aug 27, 2013 at 2:57 AM, Дмитрий Дегтярёв  wrote:
> Hello.
>
> Exist 2 identical server DELL PowerEdge™ R720, CPU Dual Intel® Xeon® E5-2620
> Hexa-Core inkl, RAM 256Gb, RAID-10 8 x 600 GB SAS 6 Gb/s 15000 rpm.
>
> $ cat /etc/fedora-release
> Fedora release 19
>
> $ postgres --version
> postgres (PostgreSQL) 9.2.4
>
> Data ~220Gb and Indexes ~140Gb
>
> iowait ~0.2-0.5. Disk usage only write ~0-2 Mb/sec
>
> On each installed pg_bouncer. Pool size 24.
>
> On Master in peak load ~1200 request/sec, ~30 ms/request avg, 24 CPU ~95% -
> this is no problem
> $ perf top
>  21,14%  [kernel] [k] isolate_freepages_block
>  12,27%  [unknown][.] 0x7fc1bb303be0
>   5,93%  postgres [.] hash_search_with_hash_value
>   4,85%  libbz2.so.1.0.6  [.] 0xa6e0
>   2,70%  postgres [.] PinBuffer
>   2,34%  postgres [.] slot_deform_tuple
>   1,92%  libbz2.so.1.0.6  [.] BZ2_compressBlock
>   1,85%  postgres [.] LWLockAcquire
>   1,69%  postgres [.] heap_page_prune_opt
>   1,48%  postgres [.] _bt_checkkeys
>   1,40%  [kernel] [k] page_fault
>   1,36%  postgres [.] _bt_compare
>   1,23%  postgres [.] heap_hot_search_buffer
>   1,19%  [kernel] [k] get_pageblock_flags_group
>   1,18%  postgres [.] AllocSetAlloc
>
> On Slave max ~400-500 request/sec, ~200 and up 24 ms/request avg, 24 CPU
> ~95% - this is problem
> $ perf top
>  30,10%  postgres   [.] s_lock
>  22,90%  [unknown]  [.] 0x00729cfe
>   4,98%  postgres   [.] RecoveryInProgress.part.9
>   4,89%  postgres   [.] LWLockAcquire
>   4,57%  postgres   [.] hash_search_with_hash_value
>   3,50%  postgres   [.] PinBuffer
>   2,31%  postgres   [.] heap_page_prune_opt


It looks like you're hitting spinlock connection inside
heap_page_prune_opt().  Which is commented:
 * Note: this is called quite often.  It's important that it fall out quickly
 * if there's not any use in pruning.

This in turn calls RecoveryInProgress() which spinlocks in order to
get a guaranteed result.  At that call site, we are told:
/*
* We can't write WAL in recovery mode, so there's no point trying to
* clean the page. The master will likely issue a cleaning WAL record soon
* anyway, so this is no particular loss.
*/

So ISTM it's necessary to pedantically check  RecoveryInProgress on
each and every call of this routine (or at least, we should be able to
reduce the number of spinlocks).

Hm, what if we exposed LocalRecoveryInProgress() through a function
which would approximately satisfy the condition
"MightRecoveryInProgress()" in the basis the condition only moves in
one direction?  That could lead to optimization around the spinlock in
hot path cases like this where getting 'TRUE' incorrectly is mostly
harmless...

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SQL statement over 500% slower with 9.2 compared with 9.1

2013-08-27 Thread Rafael Martinez
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/26/2013 02:33 PM, Rafael Martinez wrote:
[]
> The SQL statement is:
> 
> SELECT  firstname || ' ' || lastname AS Name FROMPerson R WHERE
> R.gender like 'F' AND  19 < (SELECT COUNT(DISTINCT filmId) FROM
> FilmParticipation F WHERE  F.partType = 'director' AND F.personId =
> R.personId) AND NOT EXISTS ( SELECT  * FROM
> FilmParticipation D WHERE   D.partType = 'director' AND D.personId
> = R.personId AND NOT EXISTS ( SELECT  * FROMFilmParticipation
> C WHERE   C.partType = 'cast' AND C.filmId = D.filmId AND
> C.personId = D.personId ) ) ;
> 
> 
[.]
> 
> We can see that the query plan is very different between versions
> and that 9.2 is really wrong with the number of rows involved. Why
> is 9.2 taking so wrong about the number of rows involved in some
> parts of the plan?
> 

Hei

More information:

If we turn off enable_indexscan the runtime gets more similar to the
one we get with 9.1, we are down to 4200ms.

The query plan with this configuration is here:
http://explain.depesz.com/s/jVR

The question remains the same, why is 9.2 using such a different and
bad plan compared to 9.1, when the data and the configuration are the
same?

regards,
- -- 
 Rafael Martinez Guerrero
 Center for Information Technology
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlIcbx8ACgkQBhuKQurGihReJgCcCiEfGQ0rZHcazlN3Ihb2PeCn
jOsAnjkh1M0j4r1DQJ4Xb1djZ+y4mji3
=Td8b
-END PGP SIGNATURE-


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Cpu usage 100% on slave. s_lock problem.

2013-08-27 Thread Дмитрий Дегтярёв
Hello.

Exist 2 identical server DELL PowerEdge™ R720, CPU Dual Intel® Xeon®
E5-2620 Hexa-Core inkl, RAM 256Gb, RAID-10 8 x 600 GB SAS 6 Gb/s 15000 rpm.

$ cat /etc/fedora-release
Fedora release 19

$ postgres --version
postgres (PostgreSQL) 9.2.4

Data ~220Gb and Indexes ~140Gb

iowait ~0.2-0.5. Disk usage only write ~0-2 Mb/sec

On each installed pg_bouncer. Pool size 24.

On Master in peak load ~1200 request/sec, ~30 ms/request avg, 24 CPU ~95% -
this is no problem
$ perf top
 21,14%  [kernel] [k] isolate_freepages_block
 12,27%  [unknown][.] 0x7fc1bb303be0
  5,93%  postgres [.] hash_search_with_hash_value
  4,85%  libbz2.so.1.0.6  [.] 0xa6e0
  2,70%  postgres [.] PinBuffer
  2,34%  postgres [.] slot_deform_tuple
  1,92%  libbz2.so.1.0.6  [.] BZ2_compressBlock
  1,85%  postgres [.] LWLockAcquire
  1,69%  postgres [.] heap_page_prune_opt
  1,48%  postgres [.] _bt_checkkeys
  1,40%  [kernel] [k] page_fault
  1,36%  postgres [.] _bt_compare
  1,23%  postgres [.] heap_hot_search_buffer
  1,19%  [kernel] [k] get_pageblock_flags_group
  1,18%  postgres [.] AllocSetAlloc

On Slave max ~400-500 request/sec, ~200 and up 24 ms/request avg, 24 CPU
~95% - this is problem
$ perf top
 30,10%  postgres   [.] s_lock
 22,90%  [unknown]  [.] 0x00729cfe
  4,98%  postgres   [.] RecoveryInProgress.part.9
  4,89%  postgres   [.] LWLockAcquire
  4,57%  postgres   [.] hash_search_with_hash_value
  3,50%  postgres   [.] PinBuffer
  2,31%  postgres   [.] heap_page_prune_opt
  2,27%  postgres   [.] LWLockRelease
  1,18%  postgres   [.] heap_hot_search_buffer
  1,03%  postgres   [.] AllocSetAlloc
...

Slave at a much lower load than the master hangs on the function s_lock.
What can be done about it?

On Slave ~300 request/sec, ~5-8 ms/request avg, cpu usage ~20% - normal
work by small load
$ perf top
 10,74%  postgres   [.] hash_search_with_hash_value
  4,94%  postgres   [.] PinBuffer
  4,61%  postgres   [.] AllocSetAlloc
  3,57%  postgres   [.] heap_page_prune_opt
  3,24%  postgres   [.] LWLockAcquire
  2,47%  postgres   [.] heap_hot_search_buffer
  2,11%  postgres   [.] SearchCatCache
  1,90%  postgres   [.] LWLockRelease
  1,87%  postgres   [.] _bt_compare
  1,68%  postgres   [.] FunctionCall2Coll
  1,46%  postgres   [.] _bt_checkkeys
  1,24%  postgres   [.] copyObject
  1,15%  postgres   [.] RecoveryInProgress.part.9
  1,05%  postgres   [.] slot_deform_tuple
...


Configuration Master postgres.conf
listen_addresses = '*'
max_connections = 100
shared_buffers = 200GB
work_mem = 20MB
maintenance_work_mem = 2GB
effective_io_concurrency = 4
wal_level = hot_standby
fsync = on
synchronous_commit = off
full_page_writes = on
wal_writer_delay = 200ms
checkpoint_segments = 100
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pbzip2 -f -c %p > /opt/pg/wals/wals/%f.bz2'
max_wal_senders = 3
random_page_cost = 0.5
cpu_tuple_cost = 0.02
cpu_index_tuple_cost = 0.01
cpu_operator_cost = 0.005
effective_cache_size = 40GB
default_statistics_target = 300
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_min_duration_statement = 1000
log_checkpoints = on
log_line_prefix = '%t %p %c-%l %x %q(%u, %d, %r, %a) '
log_lock_waits = on
track_io_timing = on
track_activity_query_size = 4096
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 15
datestyle = 'iso, dmy'
timezone = 'Europe/Moscow'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = all
max_locks_per_transaction = 264

Configuration Slave postgres.conf
listen_addresses = '*'
max_connections = 100
shared_buffers = 200GB
work_mem = 20MB
maintenance_work_mem = 2GB
effective_io_concurrency = 4
wal_level = hot_standby
fsync = on
synchronous_commit = off
full_page_writes = on
wal_writer_delay = 200ms
commit_delay = 1000
commit_siblings = 2
checkpoint_segments = 100
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pbzip2 -f -c %p > /opt/pg/wals/wals/%f.bz2'
max_wal_senders = 4
hot_standby = on
max_standby_archive_delay = 30s
max_standby_streaming_delay = 30s
hot_standby_feedback = on
random_page_cost = 0.5
cpu_tuple_cost = 0.02
cpu_index_tuple_co