Re: [PERFORM] Explain Analyze - actual time in loops

2017-09-08 Thread Igor Neyman
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Neto pr
Sent: Thursday, September 07, 2017 11:17 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Explain Analyze - actual time in loops

…
###
   -> Index Scan using idx_l_partkeylineitem000x on lineitem
      (cost=0.57..97.65 rows=26 width=36)
      (actual time=23.615..419.113 rows=30 loops=26469)
      Index Cond: (l_partkey = part.p_partkey)
###
According to the documentation, one should multiply the actual time by the
number of loops.
That is: 419113 ms -> 419113/1000/60 = 6.9 minutes, times 26469 loops = 182.6
minutes.

But how can this step take 182.6 minutes if the entire query ran in 66
minutes?

…….
Thank you and best regards
[]'s Neto

Neto,
The time you see there is in ms, and the point ('.') is a decimal point.
So it is 419.113 ms, or a little less than half a second (0.419 sec).
Igor Neyman


Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Mariel Cherkassky
Sent: Monday, August 21, 2017 10:20 AM
To: MichaelDBA 
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance problem on big tables

I had a system that consisted of many objects (procedures, functions, ...) on an
Oracle database. We decided to migrate that system to PostgreSQL. That system
copied a lot of big tables from a different read-only Oracle database and
performed a lot of queries on them to produce reports. The part that gets the
data belongs to some procedures, so I can't change it freely. I'm searching for
a way to improve the performance of the database, because I'm sure that I
didn't configure something well. Moreover, when I run complicated queries
(joins between 4 big tables plus filtering) they take a lot of time, and I see
that the server is caching all my RAM.


Probably your joins are done on the Postgres side.

Maybe instead of Postgres pulling data from Oracle, you should try pushing data
from Oracle to Postgres using Oracle's Heterogeneous Services and the Postgres
ODBC driver. In this case you do your joins and filtering on the Oracle side
and just push the result set to Postgres.
That's how I did my migration from Oracle to Postgres.
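
A hedged sketch of that push approach (the database link, table, and column
names below are all hypothetical, not from this thread): with Heterogeneous
Services configured against the PostgreSQL ODBC driver and a database link
named PG_LINK, the join and filtering run on the Oracle side and only the
result set crosses over:

INSERT INTO report_results@PG_LINK (customer_id, region, total_amount)
SELECT o.customer_id,
       c.region,
       SUM(oi.amount)
FROM   orders o
JOIN   customers c    ON c.customer_id = o.customer_id
JOIN   order_items oi ON oi.order_id   = o.order_id
WHERE  o.order_date >= SYSDATE - 1
GROUP  BY o.customer_id, c.region;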

Regards,
Igor Neyman


Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Igor Neyman
From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman 
Cc: Jeff Janes ; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

Initially temp_buffers was left at its default value (8MB). Watching the content
of the directory that stores the temporary files, I found that I need at most
21GB of temporary file space. Should I set temp_buffers to 21GB?
Here is the explain you requested, with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

   QUERY PLAN

 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual 
time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual 
time=2227678.194..2227678.217 rows=50 loops=1)
 Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
 ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) 
(actual time=2227678.192..2227678.202 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
   Sort Method: quicksort  Memory: 654395kB
   Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
   ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 
width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
 ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 
width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
   Sort Method: external merge  Disk: 3049216kB
   Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
   ->  Gather  (cost=30060688.07..48003007.07 
rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 
loops=1)
 Workers Planned: 12
 Workers Launched: 12
 Buffers: shared hit=728798037 read=82974833
 ->  Hash Semi Join  
(cost=30059688.07..47951761.31 rows=220376 width=20) (actual 
time=1268845.181..2007864.725 rows=7057357 loops=13)
   Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
   Buffers: shared hit=728795193 
read=82974833
   ->  Nested Loop  (cost=0.03..17891246.86 
rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
 Buffers: shared hit=590692229 
read=14991777
 ->  Parallel Seq Scan on flows  
(cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 
rows=45371630 loops=13)
   Buffers: shared hit=860990 
read=14991777
 ->  Index Only Scan using 
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual 
time=0.002..0.002 rows=1 loops=589831190)
   Index Cond: (ipaddr >>= 
(flows.srcaddr)::ip4r)
   Heap Fetches: 0
   Buffers: shared hit=589831203
   ->  Hash  (cost=30059641.47..30059641.47 
rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)
 Buckets: 4194304 (originally 
16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
 Buffers: shared hit=138102964 
read=67983056
 ->  HashAggregate  
(

Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Charles Nadeau
Sent: Wednesday, July 12, 2017 6:05 AM
To: Jeff Janes 
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent


flows=# explain (analyze, buffers) SELECT DISTINCT
flows-#srcaddr,
flows-#dstaddr,
flows-#dstport,
flows-#COUNT(*) AS conversation,
flows-#SUM(doctets) / 1024 / 1024 AS mbytes
flows-# FROM
flows-#flowscompact,
flows-#mynetworks
flows-# WHERE
flows-#mynetworks.ipaddr >>= flowscompact.srcaddr
flows-#AND dstaddr IN
flows-#(
flows(#   SELECT
flows(#  dstaddr
flows(#   FROM
flows(#  dstexterne
flows(#)
flows-# GROUP BY
flows-#srcaddr,
flows-#dstaddr,
flows-#dstport
flows-# ORDER BY
flows-#mbytes DESC LIMIT 50;
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.6", size 1073741824
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.7", size 1073741824
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.8", size 639696896
LOG:  duration: 2765020.327 ms  statement: explain (analyze, buffers) SELECT 
DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

  QUERY PLAN
--
 Limit  (cost=37762321.83..37762321.98 rows=50 width=52) (actual 
time=2764548.863..2764548.891 rows=50 loops=1)
   Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244
   I/O Timings: read=5323746.860
   ->  Unique  (cost=37762321.83..37769053.57 rows=2243913 width=52) (actual 
time=2764548.861..2764548.882 rows=50 loops=1)
 Buffers: shared hit=1116590560 read=15851133, temp read=340244 
written=340244
 I/O Timings: read=5323746.860
 ->  Sort  (cost=37762321.83..37763443.79 rows=2243913 width=52) 
(actual time=2764548.859..2764548.872 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
   Sort Method: quicksort  Memory: 563150kB
   Buffers: shared hit=1116590560 read=15851133, temp read=340244 
written=340244
   I/O Timings: read=5323746.860
   ->  GroupAggregate  (cost=37698151.34..37714980.68 rows=2243913 
width=52) (actual time=2696721.610..2752109.551 rows=4691734 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=1116590560 read=15851133, temp 
read=340244 written=340244
 I/O Timings: read=5323746.860
 ->  Sort  (cost=37698151.34..37699273.29 rows=2243913 
width=20) (actual time=2696711.428..2732781.705 rows=81896988 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
   Sort Method: external merge  Disk: 2721856kB
   Buffers: shared hit=1116590560 read=15851133, temp 
read=340244 written=340244
   I/O Timings: read=5323746.860
   ->  Gather  (cost=19463936.00..37650810.19 
rows=2243913 width=20) (actual time=1777219.713..2590530.887 rows=81896988 
loops=1)
 Workers Planned: 9
 Workers Launched: 9
 Buffers: shared hit=1116590559 read=15851133
 I/O Timings: read=5323746.860
 ->  Hash Semi Join  
(cost=19462936.00..37622883.23 rows=249324 width=20) (actual 
time=1847579.360..2602039.780 rows=8189699 loops=10)
   Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
   Buffers: shared hit=1116588309 
read=15851133
   I/O Timings: read=5323746.860
   ->  Nested Loop  (cost=0.03..18159012.30 
rows=249324 width=20) (actual time=1.562..736556.583 rows=45499045 loops=10)
 Buffers: shared hit=996551813 
read=15851133
 I/O Timings: read=5323746.860
 ->  Parallel Seq Scan on flo

Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Igor Neyman

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Wednesday, July 12, 2017 3:21 AM
To: Igor Neyman 
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

I set shared_buffers to 24GB and effective_cache_size to 64GB, and I can see
that the queries are faster because the indexes are used more often.
Knowing I have 72GB of RAM and the server is exclusively dedicated to
PostgreSQL, what could be the maximum value for effective_cache_size?
Thanks!

Charles

64GB for effective_cache_size should be good enough; adding a couple more GB
wouldn't change much.

Igor


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Igor Neyman

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Tuesday, July 11, 2017 11:25 AM
To: Igor Neyman 
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent



Igor,

The sum of effective_cache_size and shared_buffers will be higher than the
physical memory I have. Is that OK?
Thanks!

Charles

Yes, that’s normal.

shared_buffers is the maximum amount of memory that Postgres is allowed to
allocate for its shared buffer cache, while effective_cache_size is just a
number the optimizer takes into account when creating an execution plan.

Igor



Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Tuesday, July 11, 2017 10:34 AM
To: Charles Nadeau 
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Tuesday, July 11, 2017 6:43 AM
To: Igor Neyman <iney...@perceptron.com>
Cc: Andreas Kretschmer <andr...@a-kretschmer.de>;
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

I reduced the value of random_page_cost to 4 but the read speed remains low.
Regarding effective_cache_size and shared_buffer, do you mean they should be 
both equal to 64GB?
Thanks for suggestions!

Charles

No, they should not be equal.
From the docs:

effective_cache_size (integer)
Sets the planner's assumption about the effective size of the disk cache that 
is available to a single query. This is factored into estimates of the cost of 
using an index; a higher value makes it more likely index scans will be used, a 
lower value makes it more likely sequential scans will be used. When setting 
this parameter you should consider both PostgreSQL's shared buffers and the 
portion of the kernel's disk cache that will be used for PostgreSQL data files. 
Also, take into account the expected number of concurrent queries on different 
tables, since they will have to share the available space. This parameter has 
no effect on the size of shared memory allocated by PostgreSQL, nor does it 
reserve kernel disk cache; it is used only for estimation purposes. The system 
also does not assume data remains in the disk cache between queries. The 
default is 4 gigabytes (4GB).
So, I’d set shared_buffers at 24GB and effective_cache_size at 64GB.
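
As a concrete sketch for this 72GB dedicated server (a postgresql.conf excerpt
with the values suggested above; starting points, not universal defaults):

shared_buffers = 24GB           # real shared-memory allocation (~1/3 of RAM)
effective_cache_size = 64GB     # planner estimate only; allocates nothing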

Regards,
Igor

Also, maybe it’s time to look at execution plans (explain analyze) of specific 
slow queries, instead of trying to solve the problem “in general”.

Igor



Re: [PERFORM] Very poor read performance, query independent

2017-07-10 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Charles Nadeau
Sent: Monday, July 10, 2017 11:48 AM
To: Andreas Kretschmer 
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Andreas,

Because the ratio between the Sequential IOPS and Random IOPS is about 29. 
Taking into account that part of the data is in RAM, I obtained an "effective" 
ratio of about 22.
Thanks!

Charles

On Mon, Jul 10, 2017 at 5:35 PM, Andreas Kretschmer
<andr...@a-kretschmer.de> wrote:


Am 10.07.2017 um 16:03 schrieb Charles Nadeau:
random_page_cost | 22


why such a high value for random_page_cost?

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com


--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Considering your RAM size of 72GB and your database size of ~225GB, and also
the fact that Postgres is the only app running on the server, probably 1/3 of
your database resides in memory, so random_page_cost = 22 looks extremely high;
it probably precludes index usage in your queries completely.

You should try this setting at least at its default value, random_page_cost = 4,
and probably go even lower.
Also, effective_cache_size should be at least as big as your shared_buffers.
With 72GB of RAM, effective_cache_size should be set to around 64GB (again
considering that Postgres is the only app running on the server).
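
A sketch of how to try it (the value is illustrative; compare plans and
timings of your own queries before and after):

-- Session-level experiment first:
SET random_page_cost = 4;
-- ... re-run EXPLAIN (ANALYZE, BUFFERS) on one of the slow queries ...

-- If it helps, persist it (PostgreSQL 9.4+):
ALTER SYSTEM SET random_page_cost = 4;
SELECT pg_reload_conf();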

Regards,
Igor Neyman






Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-22 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Thursday, September 22, 2016 10:36 AM
To: Sven R. Kunze ; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause


-Original Message-
From: Igor Neyman 
Sent: Thursday, September 22, 2016 10:33 AM
To: 'Sven R. Kunze' ; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sven R. Kunze
Sent: Thursday, September 22, 2016 9:25 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

Hi pgsql-performance list,


what is the recommended way of doing **multiple-table-spanning joins with ORs 
in the WHERE-clause**?


Until now, we've used the LEFT OUTER JOIN to filter big_table like so:


SELECT DISTINCT 
FROM
 "big_table"
 LEFT OUTER JOIN "table_a" ON ("big_table"."id" =
"table_a"."big_table_id")
 LEFT OUTER JOIN "table_b" ON ("big_table"."id" =
"table_b"."big_table_id")
WHERE
 "table_a"."item_id" IN ()
 OR
 "table_b"."item_id" IN ();


However, this results in an awfully slow plan (it requires scanning the
complete big_table, which obviously isn't optimal).
So, we decided (at least for now) to split up the query into two separate ones
and merge/de-duplicate the result with application logic:


SELECT 
FROM
 "big_table" INNER JOIN "table_a" ON ("big_table"."id" =
"table_a"."big_table_id")
WHERE
 "table_a"."item_id" IN ();


SELECT 
FROM
 "big_table" INNER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_b"."item_id" IN ();


As you can imagine we would be very glad to solve this issue with a 
single query and without having to re-code existing logic of PostgreSQL. 
But how?


Best,
Sven


PS: if you require EXPLAIN ANALYZE, I can post them as well.

______

Another option to try:


SELECT DISTINCT 
FROM
"big_table"
LEFT OUTER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id" 
AND  "table_a"."item_id" IN ())
LEFT OUTER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id" 
AND "table_b"."item_id" IN ());

Regards,
Igor Neyman

___

Please disregard this last suggestion, it will not produce the required
results: with the IN conditions moved into the ON clauses, every big_table row
still survives the LEFT OUTER JOINs, so nothing gets filtered.

Solution using UNION should work.

Regards,
Igor Neyman




Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-22 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sven R. Kunze
Sent: Thursday, September 22, 2016 9:25 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

Hi pgsql-performance list,


what is the recommended way of doing **multiple-table-spanning joins with ORs 
in the WHERE-clause**?


Until now, we've used the LEFT OUTER JOIN to filter big_table like so:


SELECT DISTINCT 
FROM
 "big_table"
 LEFT OUTER JOIN "table_a" ON ("big_table"."id" = 
"table_a"."big_table_id")
 LEFT OUTER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_a"."item_id" IN ()
 OR
 "table_b"."item_id" IN ();


However, this results in an awfully slow plan (it requires scanning the
complete big_table, which obviously isn't optimal).
So, we decided (at least for now) to split up the query into two
separate ones and merge/de-duplicate the result with application logic:


SELECT 
FROM
 "big_table" INNER JOIN "table_a" ON ("big_table"."id" = 
"table_a"."big_table_id")
WHERE
 "table_a"."item_id" IN ();


SELECT 
FROM
 "big_table" INNER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_b"."item_id" IN ();


As you can imagine we would be very glad to solve this issue with a 
single query and without having to re-code existing logic of PostgreSQL. 
But how?


Best,
Sven


PS: if you require EXPLAIN ANALYZE, I can post them as well.

______

What about:

SELECT 
FROM
 "big_table" INNER JOIN "table_a" ON ("big_table"."id" = 
"table_a"."big_table_id")
WHERE
 "table_a"."item_id" IN ()
UNION
SELECT 
FROM
 "big_table" INNER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_b"."item_id" IN ();


Regards,
Igor Neyman




Re: [PERFORM] Problem with performance using query with unnest after migrating from V9.1 to V9.2 and higher

2016-09-22 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Knels, Udo
Sent: Thursday, September 22, 2016 8:40 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Problem with performance using query with unnest after 
migrating from V9.1 to V9.2 and higher

Hi,

I tried the following on the upgraded database:
analyze schema_test.table_a;

But the result is the same. 

https://explain.depesz.com/s/hsx5

"Sort  (cost=5.94..6.01 rows=26 width=6) (actual time=0.199..0.200 rows=3 
loops=1)"
"  Sort Key: table_a.col0002"
"  Sort Method: quicksort  Memory: 25kB"
"  Buffers: shared hit=1"
"  ->  HashAggregate  (cost=5.07..5.33 rows=26 width=6) (actual 
time=0.161..0.163 rows=3 loops=1)"
"Group Key: table_a.col0002"
"Buffers: shared hit=1"
"->  Hash Semi Join  (cost=2.76..4.95 rows=50 width=6) (actual 
time=0.070..0.133 rows=26 loops=1)"
"  Hash Cond: ((table_a.col0001)::text = 
(unnest('{501001}'::text[])))"
"  Buffers: shared hit=1"
"  ->  Seq Scan on table_a  (cost=0.00..1.50 rows=50 width=17) 
(actual time=0.015..0.034 rows=50 loops=1)"
"Buffers: shared hit=1"
"  ->  Hash  (cost=1.51..1.51 rows=100 width=32) (actual 
time=0.028..0.028 rows=1 loops=1)"
"Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"->  Result  (cost=0.00..0.51 rows=100 width=0) (actual 
time=0.015..0.017 rows=1 loops=1)"
"Planning time: 0.653 ms"
"Execution time: 0.326 ms"

Greetings

Udo Knels
treubuch IT GmbH
_____

table_a is too small, just 50 records.
The optimizer decided (correctly) that a Seq Scan is cheaper than using an
index.
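
A quick way to confirm that, session-local and for diagnosis only (never a
production setting):

SET enable_seqscan = off;
-- ... re-run EXPLAIN ANALYZE on the query above; with the index forced,
-- the cost and actual time should come out no better than the Seq Scan ...
RESET enable_seqscan;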

Regards,
Igor Neyman





Re: [PERFORM] Random slow queries

2016-06-29 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of 
devel.brai...@xoxy.net
Sent: Tuesday, June 28, 2016 9:24 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Random slow queries

Hi,

I have a weird slow query issue I can't seem to find the cause of, so I'm 
hoping someone here can shed some light on this.





Given the reasonably small dataset (a pg_dump of the full database containing 
about 2500 jobs is less than 1MB) I would think that the whole database fits in 
memory anyway, making this issue all the more puzzling. Have I missed something 
obvious?

Best regards,
Roel

__

Did you try the auto_explain extension
(https://www.postgresql.org/docs/9.3/static/auto-explain.html) for diagnostic
purposes?
With auto_explain.log_analyze = true it will automatically log EXPLAIN ANALYZE
output rather than just EXPLAIN output.  Turning this parameter on permanently
could have a negative impact on overall performance, so use it judiciously.
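
A minimal configuration sketch (the threshold is illustrative):

# postgresql.conf -- load the module and log plans of slow statements only
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '1s'    # log anything slower than this
auto_explain.log_analyze = true         # include actual row counts and times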

Regards,
Igor




Re: [PERFORM] Big number of connections

2016-03-31 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jarek
Sent: Thursday, March 31, 2016 3:08 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Big number of connections

Hello!

We are going to build a system based on a PostgreSQL database for a huge
number of individual users (a few thousand). Each user will have his own
account; for authorization we will use Kerberos (MIT or Windows).
Most users will have low activity, but for various reasons the connection
should stay open all the time.
I'd like to know what potential problems and limitations we can expect with
such a deployment.
During preliminary testing we found that each connection needs ~1MB of RAM. Is
there any way to decrease this? Is there any risk that such a number of users
will degrade performance?
I'll be happy to hear any remarks and suggestions related to the design,
administration and handling of such an installation.

best regards
Jarek

___

Take a look at PgBouncer.
It should solve your problems.
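
A minimal pgbouncer.ini sketch (names and sizes are illustrative):

[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = *
listen_port = 6432
pool_mode = transaction      ; many mostly-idle clients share few backends
max_client_conn = 5000       ; client-facing limit: one per user
default_pool_size = 20       ; actual Postgres backends per database/user pair

With transaction pooling, a few thousand mostly idle client connections map
onto a couple dozen real server backends, which sidesteps the ~1MB-per-backend
overhead on the Postgres side.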

Regards,
Igor Neyman



Re: [PERFORM] Primary key index partially used

2016-01-26 Thread Igor Neyman

From: Igor Neyman
Sent: Tuesday, January 26, 2016 11:01 AM
To: 'Florian Gossin' ; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Primary key index partially used


From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Florian Gossin
Sent: Tuesday, January 26, 2016 10:52 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Primary key index partially used

Hi all,
I'm using PostgreSQL 9.4.5 and I have a weird issue.
I have the following three tables:

visit
( nb bigint NOT NULL,
  CONSTRAINT visit_pkey PRIMARY KEY (nb)
)
with ~ 750'000 rows

invoice
( id bigint NOT NULL,
  CONSTRAINT invoice_pkey PRIMARY KEY (id)
)
with ~ 3'000'000 rows

visit_invoice
( invoice_id bigint NOT NULL,
  visit_nb bigint NOT NULL,
  CONSTRAINT visit_invoice_pkey PRIMARY KEY (visit_nb, invoice_id),
  CONSTRAINT fk_vis_inv_inv FOREIGN KEY (invoice_id)
  REFERENCES invoice (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_vis_inv_vis FOREIGN KEY (visit_nb)
  REFERENCES visit (nb) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
)
with ~ 3'000'000 rows

When I delete a row in visit table, it runs the trigger for constraint 
fk_vis_inv_vis and it seems to use the primary key index on visit_invoice:
explain analyze DELETE FROM visit WHERE nb = 213;

 Delete on visit  (cost=0.42..8.44 rows=1 width=6) (actual time=2.225..2.225 
rows=0 loops=1)
   ->  Index Scan using visit_pkey on visit  (cost=0.42..8.44 rows=1 width=6) 
(actual time=2.084..2.088 rows=1 loops=1)
 Index Cond: (nb = 213)
 Planning time: 0.201 ms
 Trigger for constraint fk_vis_inv_vis: time=0.673 calls=1

But when I delete a record in the table invoice, it runs the trigger for 
constraint fk_vis_inv_vis and it doesn't seem to use the primary key index on 
visit_invoice:

explain analyze DELETE FROM invoice WHERE id = 30140470;

 Delete on invoice  (cost=0.43..8.45 rows=1 width=6) (actual time=0.109..0.109 
rows=0 loops=1)
   ->  Index Scan using invoice_pkey on invoice  (cost=0.43..8.45 rows=1 
width=6) (actual time=0.060..0.060 rows=1 loops=1)
 Index Cond: (id = 30140470)
 Planning time: 0.156 ms
 Trigger for constraint fk_vis_inv_inv: time=219.122 calls=1
So, if I create explicitly an index for the second column (which is already 
part of the primary key), it seems to use it because the trigger execution is 
really faster:

CREATE INDEX fki_vis_inv_inv
  ON visit_invoice
  USING btree
  (invoice_id);

explain analyze DELETE FROM invoice WHERE id = 30140470;

 Delete on invoice  (cost=0.43..8.45 rows=1 width=6) (actual time=0.057..0.057 
rows=0 loops=1)
   ->  Index Scan using invoice_pkey on invoice  (cost=0.43..8.45 rows=1 
width=6) (actual time=0.039..0.040 rows=1 loops=1)
 Index Cond: (id = 120043571)
 Planning time: 0.074 ms
 Trigger for constraint fk_vis_inv_inv: time=0.349 calls=1
So I have tried to create the primary key differently, like PRIMARY KEY 
(invoice_id, visit_nb), and in that case it is the opposite, the deletion of 
the invoice is very fast and the deletion of the visit is really slower, unless 
I create a specific index as above.
So my question is: why is my index on the primary key not used by both triggers 
and why should I always create an explicit index on the second column ?
Thanks.
Florian

First, it's a good practice (for performance) to create indexes on FK columns
in the “child” table.
Second, PG uses a composite (concatenated) index only if its first column is
used in the WHERE clause.  That is exactly what you observe.
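
A minimal illustration of that rule (hypothetical table):

CREATE TABLE t (a bigint NOT NULL, b bigint NOT NULL, PRIMARY KEY (a, b));
-- Served by the (a, b) primary-key index:
EXPLAIN SELECT * FROM t WHERE a = 1;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
-- Generally not served by it (b is not the leading column):
EXPLAIN SELECT * FROM t WHERE b = 2;
-- The fix, as above: a dedicated index on the second column.
CREATE INDEX t_b_idx ON t (b);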

Regards,
Igor Neyman




Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of David Osborne
Sent: Tuesday, November 10, 2015 12:32 PM
To: Tom Lane 
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow 3 Table Join with v bad row estimate

Ok - wow.

Adding that index, I get the same estimate of 1 row, but a runtime of ~450ms.
A 23000ms improvement.

http://explain.depesz.com/s/TzF8h

This is great. So as a general rule of thumb, if I see a Join Filter removing 
an excessive number of rows, I can check if that condition can be added to an 
index from the same table which is already being scanned.

Thanks for this!
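
(A sketch of that rule of thumb with hypothetical names: if the plan shows a
scan on some_table via an index on (customer_id) followed by
Join Filter: (some_table.status = 'A') that discards most rows, extend the
index so the condition is checked inside the scan:

CREATE INDEX some_table_cust_status_idx ON some_table (customer_id, status);

Whether this pays off depends on the selectivity of the added column.)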

David,
I believe the plan you are posting is the old plan.
Could you please post explain analyze with the index that Tom suggested?

Regards,
Igor Neyman


Re: [PERFORM] One long transaction or multiple short transactions?

2015-10-06 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Carlo
Sent: Monday, October 05, 2015 11:11 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] One long transaction or multiple short transactions?

We have a system which is constantly importing flat file data feeds into 
normalized tables in a DB warehouse over 10-20 connections. Each data feed row 
results in a single transaction of multiple single row writes to multiple 
normalized tables.

The more columns in the feed row, the more write operations, longer the 
transaction.

Operators are noticing that splitting a single feed of say - 100 columns - into 
two consecutive feeds of 50 columns improves performance dramatically. I am 
wondering whether the multi-threaded and very busy import environment causes 
non-linear performance degradation for longer transactions. Would the operators 
be advised to rewrite the feeds to result in more smaller transactions rather 
than fewer, longer ones?

Carlo



> over 10-20 connections

How many cores do you have on that machine?
Test whether limiting the number of simultaneous feeds, e.g. bringing it down
to half of your normal connection count, has the same positive effect.

Regards,
Igor Neyman


Re: [PERFORM] shared-buffers set to 24GB but the RAM only use 4-5 GB average

2015-10-05 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Monday, October 05, 2015 2:25 PM
To: FattahRozzaq ; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] shared-buffers set to 24GB but the RAM only use 4-5 GB 
average



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of FattahRozzaq
Sent: Monday, October 05, 2015 10:51 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] shared-buffers set to 24GB but the RAM only use 4-5 GB 
average

I have configured postgresql.conf with parameters as below:

log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_rotation_age = 1d
log_rotation_size = 1024MB
listen_addresses = '*'
checkpoint_segments = 64
wal_keep_segments = 128
max_connections = 
max_prepared_transactions = 
checkpoint_completion_target = 0.9
default_statistics_target = 10
maintenance_work_mem = 1GB
effective_cache_size = 64GB
shared_buffers = 24GB
work_mem = 5MB
wal_buffers = 8MB
port = 40003
pooler_port = 40053
gtm_host = 'node03'
gtm_port = 10053

As you can see, I have set shared_buffers to 24GB, but my server still only
uses 4-5 GB on average.
I have 128GB of RAM in a single server.
My database has 2 tables:
- room (3GB in size if pg_dump'ed)
- message (17GB if pg_dump'ed)

The backend application is a messaging server; on average there will be 40-180
connections to the Postgres server.
The traffic is fairly heavy.

How can I make postgres-xl effectively utilize the RAM for max_connections?


Thanks,
FattahRozzaq


Why are you looking at memory consumption?
Are you experiencing performance problems?
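
(If the goal is to see how much of shared_buffers is actually populated, a
sketch using the contrib pg_buffercache extension; whether it is available in
postgres-xl needs checking:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;
-- Each buffer is 8kB by default; used buffers have a relfilenode.
SELECT count(*) AS total_buffers,
       count(relfilenode) AS used_buffers
FROM pg_buffercache;)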

Regards,
Igor Neyman

___

Also,
Postgres-xl has its own mailing lists:
http://sourceforge.net/p/postgres-xl/mailman/

Regards,
Igor Neyman




Re: [PERFORM] Server slowing down over time

2015-09-03 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jean Cavallo
Sent: Thursday, August 27, 2015 1:21 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Server slowing down over time

Hi,

I am currently working on a data migration for a client.
The general plan is :
  - Read data from a postgresql database
  - Convert them to the new application
  - Insert in another database (same postgresql instance).

The source database is rather big (~40GB, without indexes), and the
conversion process takes some time. It is done by multiple workers
on a separate Linux environment, piece by piece.

When we start the migration, at first it looks good.
Performances are good, and it ran smoothly. After a few hours,
we noticed that things started to slow down. Some queries seemed
to be stuck, so we waited for them to end, and restarted the server.

After that it went well for some time (~10 minutes), then it slowed
down again. We tried again (a few times), and the pattern repeats.

My postgresql specific problem is that it looks like the server gets
stuck. CPU usage is <10%, RAM usage is under 50% max, there is
no noticeable disk usage. But, there are some (<10) active queries,
some of which may take several hours to complete. Those queries
work properly (i.e < 1min) right after the server restarts.

So my question is : What could slow the queries from ~1min to 2hours
which does not involve CPU, Memory, or disk usage, and which would
"reset" when restarting the server ?

For information, the number of processes does not seem to be the
problem, there are ~20 connections with max_connection set to 100.
We noticed at some point that the hard drive holding the target
database was heavily fragmented (100%...), but defrag did not
seem to change anything.

Also, the queries that appear to get stuck are "heavy" queries,
though after a fresh restart they execute in a reasonable time.

Finally, whatever causes the database to wait also causes the
Windows instance to slow down. But restarting Postgresql fixes
this as well.

Configuration :

The Postgresql server runs on a Windows Virtual Machine under
VMWare. The VM has dedicated resources, and the only other
VM on the host is the applicative server (which runs idle while
waiting for the database). There is nothing else running on the
server except postgresql (well, there were other things, but we
stopped everything to no avail).

PostgreSQL 9.3.5, compiled by Visual C++ build 1600, 64-bit
Windows 2008R2 (64 bits)
10 GB RAM
4 vCPU

Host : VMWare ESXi 5.5.0 build-2068190
CPU Intel XEON X5690 3.97GHz
HDD 3x Nearline SAS 15K RAID0

Please let me know if any other information may be useful.

Jean Cavallo


Having 4 vCPUs, I'd try to decrease the number of connections from ~20 to 8,
and see if the “slowing down” still happens.

Regards,
Igor Neyman



Re: [PERFORM] Index creation running now for 14 hours

2015-08-26 Thread Igor Neyman


From: Tory M Blue [mailto:tmb...@gmail.com]
Sent: Wednesday, August 26, 2015 3:26 PM
To: Igor Neyman 
Cc: pgsql-performance 
Subject: Re: [PERFORM] Index creation running now for 14 hours



On Wed, Aug 26, 2015 at 12:18 PM, Igor Neyman <iney...@perceptron.com> wrote:


From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance <pgsql-performance@postgresql.org>
Subject: [PERFORM] Index creation running now for 14 hours

I'm running 9.3.4 with slon 2.2.3. I did a drop/add last night at 9pm; it
started this particular table's index creation at 10:16pm and it's still
running. A single core is at 100% (32-core box) and there is almost zero I/O
activity.

CentOS 6.6


 16398 | clsdb | 25765 |   10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 | |   45712 | 2015-08-25 21:12:01.619819-07 | 2015-08-25 21:22:08.68766-07 | 2015-08-25 22:16:03.10099-07 | 2015-08-25 22:16:03.100992-07 | f   | active | select "_cls".finishTableAfterCopy(143); analyze "torque"."impressions";
I was wondering whether there are underlying tools to see how it's progressing,
or if there is anything I can do to bump the performance mid-creation. There's
nothing I can really do without stopping Postgres or slon, but that would put
me back at square one.

Thanks
Tory



Check pg_locks with regard to the table in question.

Regards,
Igor Neyman

Thanks Igor, I did, but it's not clear what that is telling me. There are 249
rows in there, nothing has a table name, and they are all for the PID in the
"analyze torque.impressions" line that I listed above, pid 25765.

Here is one for an exclusive lock, but what should I be looking for? There are 
no other processes on this box other than slon and this index creation.


 transactionid |  ||  |   ||  93588453 | |   |  | 4/25823460 | 25765 | ExclusiveLock | t   | f

Thanks
Tory

There are object OIDs in pg_locks, not names.
Find the OID of the table that you are creating your index for, and search
pg_locks for the records referencing your table.
It cannot be that all records in pg_locks are for the pid running “analyze”;
there should be records for the pid running your “create index”.
What's the size of the table you are indexing?
Also, take a look at pg_stat_activity for long-running transactions/queries.
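
A sketch of both lookups (the table name is the one from this thread):

-- Locks referencing the table being indexed:
SELECT l.pid, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'impressions';

-- Long-running transactions/queries:
SELECT pid, now() - xact_start AS xact_age, state, query
FROM pg_stat_activity
ORDER BY xact_age DESC NULLS LAST;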

Igor Neyman



Re: [PERFORM] Index creation running now for 14 hours

2015-08-26 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance 
Subject: [PERFORM] Index creation running now for 14 hours

I'm running 9.3.4 with slon 2.2.3. I did a drop/add last night at 9pm; it
started this particular table's index creation at 10:16pm and it's still
running. A single core is at 100% (32-core box) and there is almost zero I/O
activity.

CentOS 6.6


 16398 | clsdb | 25765 |   10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 | |   45712 | 2015-08-25 21:12:01.619819-07 | 2015-08-25 21:22:08.68766-07 | 2015-08-25 22:16:03.10099-07 | 2015-08-25 22:16:03.100992-07 | f   | active | select "_cls".finishTableAfterCopy(143); analyze "torque"."impressions";
I was wondering whether there are underlying tools to see how it's progressing,
or if there is anything I can do to bump the performance mid-creation. There's
nothing I can really do without stopping Postgres or slon, but that would put
me back at square one.

Thanks
Tory


iostat: sdb is the db directory

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.550.000.230.000.00   96.22

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   1.00 0.0012.00  0 24
sdb   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.570.000.060.000.00   96.37

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   0.00 0.00 0.00  0  0
sdb  21.50 0.00 15484.00  0  30968

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.720.000.060.000.00   96.22

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   2.00 0.0020.00  0 40
sdb   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   4.060.000.050.020.00   95.87

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   4.00 0.0064.00  0128
sdb   3.50 0.00   108.00  0216

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.360.000.030.000.00   96.61

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.410.000.060.000.00   96.53

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.450.000.270.000.00   96.28

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   0.00 0.00 0.00  0  0
sdb   1.00 0.0024.00  0 48

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.500.000.300.000.00   96.20

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   1.50 0.00   344.00  0688
sdb   0.00 0.00 0.00  0  0

Check pg_locks in regards to the table in question.

Regards,
Igor Neyman


Re: [PERFORM] Performance bottleneck due to array manipulation

2015-08-21 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Genc, Ömer
Sent: Friday, August 21, 2015 8:49 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance bottleneck due to array manipulation

Hey,

I have a very long-running stored procedure, due to array manipulation inside 
it. The following procedure takes 13 seconds to finish.

BEGIN
point_ids_older_than_one_hour := '{}';
object_ids_to_be_invalidated := '{}';

select ARRAY(SELECT
point_id
from ONLY
public.ims_point as p
where
p.timestamp < m_before_one_hour
)
into point_ids_older_than_one_hour ; -- this array has a size of 20k

select ARRAY(SELECT
object_id
  from
public.ims_object_header h
  WHERE
h.last_point_id= ANY(point_ids_older_than_one_hour)
 )
into object_ids_to_be_invalidated; -- this array has a size of 100

--current_last_point_ids will have a size of 100k
current_last_point_ids := ARRAY( SELECT
last_point_id
  from
public.ims_object_header h
 );
-- START OF PERFORMANCE BOTTLENECK
IF(array_length(current_last_point_ids, 1) > 0)
THEN
FOR i IN 0 .. array_upper(current_last_point_ids, 1)
LOOP
point_ids_older_than_one_hour = 
array_remove(point_ids_older_than_one_hour, current_last_point_ids[i]::bigint);
END LOOP;
END IF;
-- END OF PERFORMANCE BOTTLENECK
END;

The array manipulation part is the performance bottleneck. I am pretty sure 
that there is a better way of doing this, however I couldn't find one.
What I have is two tables, let's call them ims_point and ims_object_header. 
ims_object_header references some entries of ims_point in the column 
last_point_id.
Now I want to delete all entries from ims_point where the timestamp is older 
than one hour. The ids currently referenced by the table ims_object_header 
should be excluded from this deletion. Therefore I stored the ids in arrays 
and iterate over those arrays to exclude the referenced values from being 
deleted.

However, I'm not sure if using an array for an operation like this is the best 
approach.

Can anyone give me some advice how this could be enhanced.

Thanks in advance.


I think in this case (as in many other cases) "pure" SQL does the job much 
better than a procedural language:

DELETE FROM public.ims_point AS P
WHERE P.timestamp < m_before_one_hour
  AND NOT EXISTS (SELECT 1
                  FROM public.ims_object_header OH
                  WHERE OH.last_point_id = P.point_id);

Is that what you are trying to accomplish?
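
If the trimmed array itself is still needed inside the function, the loop can
also be collapsed into a single set operation. A sketch, reusing the variable
names from the original procedure:

point_ids_older_than_one_hour := ARRAY(
    SELECT unnest(point_ids_older_than_one_hour)
    EXCEPT
    SELECT unnest(current_last_point_ids)
);

Note that EXCEPT also removes duplicates and does not preserve order, which is
usually fine for a list of ids scheduled for deletion.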

Regards,
Igor Neyman






Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Friday, June 19, 2015 11:07 AM
To: Ian Pushee; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' 
sort type)



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:54 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' 
sort type)



On 6/19/2015 10:46 AM, Igor Neyman wrote:
>
> Probably events_confidnce index is not very selective, that's why optimizer 
> prefers seq scan.
> I'd try to create an index on (name, eventspy_id, camera_id, type, status).
>
> Also, the recent 9.2 is 9.2.13, you should upgrade.
>
> Regards,
> Igor Neyman

Hi Igor,

I already have an index for (name, eventspy_id, camera_id, type, status)... 
that is the index being used (apparently silently) when I set 
random_page_cost=1.0.

Thanks,
-Ian


--

Well, having 8GB RAM on the machine, you probably should not be using default 
config parameters.
Depending on what else is this machine is being used for, and depending on 
queries you are running, you should definitely modify Postgres config.
If this machine is designated database server, I'd start with the following 
parameters modified from default values:

shared_buffers = 1024MB
temp_buffers = 8MB
work_mem = 64MB 
effective_cache_size = 1024MB
random_page_cost = 2.5
cpu_tuple_cost = 0.03
cpu_index_tuple_cost = 0.05

and see how it goes.
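
If a per-query override is still needed in the meantime, it can at least be
scoped to a single transaction instead of the whole session. A sketch:

BEGIN;
SET LOCAL random_page_cost = 1.0;
-- run the GROUP BY query here; the setting reverts at COMMIT
COMMIT;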

Regards,
Igor Neyman

---

Oops, should be at least:

effective_cache_size = 5120MB

on a dedicated server.

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:54 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' 
sort type)



On 6/19/2015 10:46 AM, Igor Neyman wrote:
>
> Probably events_confidnce index is not very selective, that's why optimizer 
> prefers seq scan.
> I'd try to create an index on (name, eventspy_id, camera_id, type, status).
>
> Also, the recent 9.2 is 9.2.13, you should upgrade.
>
> Regards,
> Igor Neyman

Hi Igor,

I already have an index for (name, eventspy_id, camera_id, type, status)... 
that is the index being used (apparently silently) when I set 
random_page_cost=1.0.

Thanks,
-Ian


--

Well, having 8GB RAM on the machine, you probably should not be using default 
config parameters.
Depending on what else is this machine is being used for, and depending on 
queries you are running, you should definitely modify Postgres config.
If this machine is designated database server, I'd start with the following 
parameters modified from default values:

shared_buffers = 1024MB
temp_buffers = 8MB
work_mem = 64MB 
effective_cache_size = 1024MB
random_page_cost = 2.5
cpu_tuple_cost = 0.03
cpu_index_tuple_cost = 0.05

and see how it goes.

Regards,
Igor Neyman




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:34 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Slow query (planner insisting on using 'external merge' sort 
type)

Hi Folks,

This is my first time posting here, so hopefully I manage to convey all the 
information needed.
We have a simple query that just started giving us problems in production when 
the number of rows gets too large (>100k).
The issue seems to be that the planner wants to sort the rows using a 
sequential scan, rather than the index provided specifically for this query. 
This isn't a problem with low numbers of rows, but eventually the query 
outgrows work_mem and uses the disk, slowing down the query greatly. I know the 
common answer is to increase work_mem... but since this table's growth is 
unpredictable, that isn't a viable strategy.
I've tried increasing shared_buffers and effective_cache_size, but that doesn't 
appear to affect the plan chosen here. Setting
random_page_cost=1.0 works, but I'm hoping for a more general solution that 
doesn't require setting that locally each time I run the query. I guess my real 
question is whether or not there is any way to get the planner to take into 
account the fact that it's going to need to do an 'external merge', and that it 
is going to take a LONG time?

Table and Index Schemas:
CREATE TABLE events
(
   id serial NOT NULL,
   name character varying(64),
   eventspy_id integer NOT NULL,
   camera_id integer NOT NULL,
   start_time timestamp without time zone NOT NULL,
   millisecond smallint NOT NULL,
   uid smallint NOT NULL,
   update_time timestamp without time zone NOT NULL DEFAULT now(),
   length integer NOT NULL,
   objects text NOT NULL,
   priority smallint NOT NULL,
   type character varying(45) NOT NULL DEFAULT 'alarm'::character varying,
   status event_status NOT NULL DEFAULT 'new'::event_status,
   confidence smallint NOT NULL DEFAULT 100::smallint,
   CONSTRAINT events_pkey PRIMARY KEY (id)
)
WITH (
   OIDS=FALSE
);

CREATE INDEX events_confidnce
   ON events
   USING btree
   (confidence);

CREATE INDEX events_summary
   ON events
   USING btree
   (name COLLATE pg_catalog."default", eventspy_id, camera_id, type COLLATE 
pg_catalog."default", status);

Query:
SELECT name, type, eventspy_id, camera_id, status, COUNT(id), MAX(update_time), 
MIN(start_time), MAX(start_time) FROM events WHERE 
confidence>=0 GROUP BY name, eventspy_id, camera_id, type, status;

Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

Thanks for any help you can provide,
-Ian Pushee

---

Probably events_confidnce index is not very selective, that's why optimizer 
prefers seq scan.
I'd try to create an index on (name, eventspy_id, camera_id, type, status).

Also, the recent 9.2 is 9.2.13, you should upgrade.

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PGBOUNCER ISSUE PLEASE HELP(Slowing down the site)

2015-06-18 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sheena, Prabhjot
Sent: Thursday, June 18, 2015 3:19 PM
To: k...@rice.edu; Will Platnick
Cc: pgsql-performance@postgresql.org; pgsql-gene...@postgresql.org
Subject: Re: [PERFORM] PGBOUNCER ISSUE PLEASE HELP(Slowing down the site)

Hi Ken/ Will

  I have checked the ulimit value and we are nowhere near hitting the max of 
4096 that we currently have set. Is there any other explanation why we should 
be thinking of bumping it, like ulimit -n 5 (add ulimit -n 5 to the start of 
whatever you use to start pgbouncer: init script, etc.), even though we are 
not reaching the 4096 max value?

Regards
Prabhjot Singh



-Original Message-
From: k...@rice.edu [mailto:k...@rice.edu] 
Sent: Thursday, June 18, 2015 11:10 AM
To: Sheena, Prabhjot
Cc: pgsql-performance@postgresql.org; pgsql-gene...@postgresql.org
Subject: Re: [PERFORM] PGBOUNCER ISSUE PLEASE HELP(Slowing down the site)

On Thu, Jun 18, 2015 at 05:41:01PM +, Sheena, Prabhjot wrote:
> Here is the output of OS limits
> 
> postgres@symds-pg:~ $ ulimit -a
> 
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 790527
> max locked memory   (kbytes, -l) 32
> max memory size (kbytes, -m) unlimited
> open files  (-n) 4096
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 10240
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 16384
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
> 
> 
> Thanks
> Prabhjot
> 

I would bump your open files as was suggested in your pgbouncer start script.

Regards,
Ken

---

Why are you so sure that it is PgBouncer causing slowness?

You said bouncer pool_size is set to 250.  How many cores do you have on your 
db server?

Also, why are you running bouncer on a separate machine?  It is very 
"light-weight", so running it on the db server wouldn't require much additional 
resources, but would eliminate some network traffic that you have with the 
current configuration.

Regards,
Igor Neyman




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to calculate statistics for one column

2015-06-18 Thread Igor Neyman


From: Irineu Ruiz [mailto:iri...@rassystem.com.br]
Sent: Thursday, June 18, 2015 3:10 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to calculate statistics for one column

I didn't understand.

In this case, my statistics target should be approximately 349?
I already tried this range but it didn't work.

It only works when I put 900 in my statistics.

Is there some kind of formula to calculate a good statistics value for a 
column like this?



2015-06-18 15:24 GMT-03:00 Igor Neyman 
mailto:iney...@perceptron.com>>:


From: Irineu Ruiz 
[mailto:iri...@rassystem.com.br<mailto:iri...@rassystem.com.br>]
Sent: Thursday, June 18, 2015 2:18 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] How to calculate statistics for one column

SELECT COUNT(DISTINCT id_camada) FROM … equals
349

And it doesn't change significantly over time.

[]'s

2015-06-18 15:16 GMT-03:00 Igor Neyman 
mailto:iney...@perceptron.com>>:


From: 
pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>
 
[mailto:pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>]
 On Behalf Of Irineu Ruiz
Sent: Thursday, June 18, 2015 1:53 PM
To: pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: [PERFORM] How to calculate statistics for one column

Hi,

I have a table with an irregular distribution based on a foreign key, as you 
can see at the end of the e-mail.

Sometimes, in simple joins with other tables on the same id_camada (but not 
the table that owns the foreign key), the planner chooses a seq scan instead 
of using the index on id_camada.
If I also include the table that owns the foreign key in the join, then the 
index is used.

In the first case, queries with a seq scan take about 30 seconds, and with the 
index they take about 40 ms.

When I increase the statistics of the column id_camada to 900, then everything 
works using the index in both cases.
My doubt is: is there a way to discover the best statistics number for this 
column, or is it a process of trial and error?

id_camada;count(*)
123;10056782
83;311471
42;11316
367;5564
163;3362
257;2100
89;1725
452;1092
157;904
84;883
233;853
271;638
272;620
269;548
270;485
455;437
255;427
32;371
39;320
31;309
411;291
91;260
240;251
162;250
444;247
165;227
36;215
236;193
54;185
53;175
76;170
412;153
159;140
160;139
105;130
59;117
60;117
267;115
238;112
279;111
465;111
5;107
74;103
243;98
35;96
68;82
400;78
391;75
49;74
124;68
73;66
260;64
66;62
168;60
172;56
4;54
44;54
384;53
237;53
390;52
234;52
387;51
378;51
148;50
64;50
379;47
56;46
52;46
377;46
443;46
253;45
97;45
280;43
77;43
2;40
376;39
45;38
235;36
231;36
413;36
241;36
232;34
388;32
101;32
249;32
99;32
100;32
69;32
125;31
166;30
65;29
433;29
149;28
96;27
71;27
98;26
67;26
386;25
50;24
21;24
122;24
47;24
291;22
287;22
404;22
70;22
48;21
63;21
153;18
13;18
46;18
262;18
43;17
72;17
161;17
344;15
29;15
439;14
104;14
119;13
456;12
434;12
55;10
3;10
345;10
286;10
15;10
141;9
169;9
258;9
18;9
158;9
14;8
94;8
463;8
218;8
92;8
170;8
58;7
17;7
19;7
6;7
414;7
10;7
7;7
22;7
90;6
430;6
27;6
195;6
16;6
223;6
11;6
242;6
9;6
26;5
57;5
82;5
451;5
61;5
8;5
445;5
140;5
431;5
197;5
20;5
362;5
24;5
385;4
23;4
25;4
62;4
134;4
150;4
215;4
217;4
219;4
220;4
222;4
224;4
244;4
284;4
318;4
389;4
415;4
449;4
461;4
93;3
209;3
136;3
299;3
188;3
319;3
264;3
95;3
337;3
1;3
221;3
310;3
143;2
320;2
321;2
322;2
324;2
210;2
302;2
438;2
303;2
239;2
330;2
196;2
447;2
332;2
333;2
334;2
307;2
308;2
309;2
340;2
341;2
171;2
190;2
313;2
193;2
154;2
294;2
295;2
250;2
144;2
311;1
312;1
314;1
315;1
316;1
317;1
51;1
323;1
325;1
326;1
327;1
328;1
329;1
331;1
335;1
336;1
338;1
339;1
342;1
343;1
186;1
185;1
354;1
355;1
356;1
357;1
359;1
360;1
361;1
184;1
363;1
364;1
366;1
183;1
369;1
370;1
182;1
181;1
180;1
179;1
380;1
381;1
382;1
383;1
178;1
177;1
176;1
174;1
30;1
173;1
392;1
393;1
155;1
405;1
407;1
409;1
151;1
145;1
12;1
425;1
138;1
135;1
103;1
435;1
437;1
102;1
440;1
441;1
442;1
80;1
448;1
28;1
226;1
227;1
228;1
230;1
225;1
214;1
216;1
213;1
212;1
211;1
208;1
207;1
206;1
78;1
245;1
205;1
204;1
254;1
203;1
202;1
201;1
200;1
199;1
265;1
198;1
268;1
194;1
192;1
273;1
274;1
275;1
278;1
191;1
282;1
75;1
285;1
189;1
288;1
289;1
290;1
187;1
293;1
296;1
297;1
300;1
304;1
305;1
306;1

--

So what’s the result of:

SELECT COUNT(DISTINCT id_camada) FROM …

Does it change significantly over time?

Regards,
Igor Neyman



Then, I’d think that’s approximately your statistics target.

Regards,
Igor Neyman

Well, check if the information in pg_stats for your table is correct:

http://www.postgresql.org/docs/9.4/static/view-pg-stats.html
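
For example (a sketch; the table name is just an illustration, since it isn't
given in the thread):

SELECT n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'mytable' AND attname = 'id_camada';

If n_distinct is far off from the real 349, raising the per-column target and
re-analyzing is the usual fix:

ALTER TABLE mytable ALTER COLUMN id_camada SET STATISTICS 900;
ANALYZE mytable;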

Regards,
Igor Neyman


Re: [PERFORM] How to calculate statistics for one column

2015-06-18 Thread Igor Neyman


From: Irineu Ruiz [mailto:iri...@rassystem.com.br]
Sent: Thursday, June 18, 2015 2:18 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to calculate statistics for one column

SELECT COUNT(DISTINCT id_camada) FROM … equals
349

And it doesn't change significantly over time.

[]'s

2015-06-18 15:16 GMT-03:00 Igor Neyman 
mailto:iney...@perceptron.com>>:


From: 
pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>
 
[mailto:pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>]
 On Behalf Of Irineu Ruiz
Sent: Thursday, June 18, 2015 1:53 PM
To: pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: [PERFORM] How to calculate statistics for one column

Hi,

I have a table with an irregular distribution based on a foreign key, as you 
can see at the end of the e-mail.

Sometimes, in simple joins with other tables on the same id_camada (but not 
the table that owns the foreign key), the planner chooses a seq scan instead 
of using the index on id_camada.
If I also include the table that owns the foreign key in the join, then the 
index is used.

In the first case, queries with a seq scan take about 30 seconds, and with the 
index they take about 40 ms.

When I increase the statistics of the column id_camada to 900, then everything 
works using the index in both cases.
My doubt is: is there a way to discover the best statistics number for this 
column, or is it a process of trial and error?

id_camada;count(*)
123;10056782
83;311471
42;11316
367;5564
163;3362
257;2100
89;1725
452;1092
157;904
84;883
233;853
271;638
272;620
269;548
270;485
455;437
255;427
32;371
39;320
31;309
411;291
91;260
240;251
162;250
444;247
165;227
36;215
236;193
54;185
53;175
76;170
412;153
159;140
160;139
105;130
59;117
60;117
267;115
238;112
279;111
465;111
5;107
74;103
243;98
35;96
68;82
400;78
391;75
49;74
124;68
73;66
260;64
66;62
168;60
172;56
4;54
44;54
384;53
237;53
390;52
234;52
387;51
378;51
148;50
64;50
379;47
56;46
52;46
377;46
443;46
253;45
97;45
280;43
77;43
2;40
376;39
45;38
235;36
231;36
413;36
241;36
232;34
388;32
101;32
249;32
99;32
100;32
69;32
125;31
166;30
65;29
433;29
149;28
96;27
71;27
98;26
67;26
386;25
50;24
21;24
122;24
47;24
291;22
287;22
404;22
70;22
48;21
63;21
153;18
13;18
46;18
262;18
43;17
72;17
161;17
344;15
29;15
439;14
104;14
119;13
456;12
434;12
55;10
3;10
345;10
286;10
15;10
141;9
169;9
258;9
18;9
158;9
14;8
94;8
463;8
218;8
92;8
170;8
58;7
17;7
19;7
6;7
414;7
10;7
7;7
22;7
90;6
430;6
27;6
195;6
16;6
223;6
11;6
242;6
9;6
26;5
57;5
82;5
451;5
61;5
8;5
445;5
140;5
431;5
197;5
20;5
362;5
24;5
385;4
23;4
25;4
62;4
134;4
150;4
215;4
217;4
219;4
220;4
222;4
224;4
244;4
284;4
318;4
389;4
415;4
449;4
461;4
93;3
209;3
136;3
299;3
188;3
319;3
264;3
95;3
337;3
1;3
221;3
310;3
143;2
320;2
321;2
322;2
324;2
210;2
302;2
438;2
303;2
239;2
330;2
196;2
447;2
332;2
333;2
334;2
307;2
308;2
309;2
340;2
341;2
171;2
190;2
313;2
193;2
154;2
294;2
295;2
250;2
144;2
311;1
312;1
314;1
315;1
316;1
317;1
51;1
323;1
325;1
326;1
327;1
328;1
329;1
331;1
335;1
336;1
338;1
339;1
342;1
343;1
186;1
185;1
354;1
355;1
356;1
357;1
359;1
360;1
361;1
184;1
363;1
364;1
366;1
183;1
369;1
370;1
182;1
181;1
180;1
179;1
380;1
381;1
382;1
383;1
178;1
177;1
176;1
174;1
30;1
173;1
392;1
393;1
155;1
405;1
407;1
409;1
151;1
145;1
12;1
425;1
138;1
135;1
103;1
435;1
437;1
102;1
440;1
441;1
442;1
80;1
448;1
28;1
226;1
227;1
228;1
230;1
225;1
214;1
216;1
213;1
212;1
211;1
208;1
207;1
206;1
78;1
245;1
205;1
204;1
254;1
203;1
202;1
201;1
200;1
199;1
265;1
198;1
268;1
194;1
192;1
273;1
274;1
275;1
278;1
191;1
282;1
75;1
285;1
189;1
288;1
289;1
290;1
187;1
293;1
296;1
297;1
300;1
304;1
305;1
306;1

--

So what’s the result of:

SELECT COUNT(DISTINCT id_camada) FROM …

Does it change significantly over time?

Regards,
Igor Neyman



Then, I’d think that’s approximately your statistics target.

Regards,
Igor Neyman


Re: [PERFORM] How to calculate statistics for one column

2015-06-18 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Irineu Ruiz
Sent: Thursday, June 18, 2015 1:53 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How to calculate statistics for one column

Hi,

I have a table with an irregular distribution based on a foreign key, as you 
can see at the end of the e-mail.

Sometimes, in simple joins with other tables on the same id_camada (but not 
the table that owns the foreign key), the planner chooses a seq scan instead 
of using the index on id_camada.
If I also include the table that owns the foreign key in the join, then the 
index is used.

In the first case, queries with a seq scan take about 30 seconds, and with the 
index they take about 40 ms.

When I increase the statistics of the column id_camada to 900, then everything 
works using the index in both cases.
My doubt is: is there a way to discover the best statistics number for this 
column, or is it a process of trial and error?

id_camada;count(*)
123;10056782
83;311471
42;11316
367;5564
163;3362
257;2100
89;1725
452;1092
157;904
84;883
233;853
271;638
272;620
269;548
270;485
455;437
255;427
32;371
39;320
31;309
411;291
91;260
240;251
162;250
444;247
165;227
36;215
236;193
54;185
53;175
76;170
412;153
159;140
160;139
105;130
59;117
60;117
267;115
238;112
279;111
465;111
5;107
74;103
243;98
35;96
68;82
400;78
391;75
49;74
124;68
73;66
260;64
66;62
168;60
172;56
4;54
44;54
384;53
237;53
390;52
234;52
387;51
378;51
148;50
64;50
379;47
56;46
52;46
377;46
443;46
253;45
97;45
280;43
77;43
2;40
376;39
45;38
235;36
231;36
413;36
241;36
232;34
388;32
101;32
249;32
99;32
100;32
69;32
125;31
166;30
65;29
433;29
149;28
96;27
71;27
98;26
67;26
386;25
50;24
21;24
122;24
47;24
291;22
287;22
404;22
70;22
48;21
63;21
153;18
13;18
46;18
262;18
43;17
72;17
161;17
344;15
29;15
439;14
104;14
119;13
456;12
434;12
55;10
3;10
345;10
286;10
15;10
141;9
169;9
258;9
18;9
158;9
14;8
94;8
463;8
218;8
92;8
170;8
58;7
17;7
19;7
6;7
414;7
10;7
7;7
22;7
90;6
430;6
27;6
195;6
16;6
223;6
11;6
242;6
9;6
26;5
57;5
82;5
451;5
61;5
8;5
445;5
140;5
431;5
197;5
20;5
362;5
24;5
385;4
23;4
25;4
62;4
134;4
150;4
215;4
217;4
219;4
220;4
222;4
224;4
244;4
284;4
318;4
389;4
415;4
449;4
461;4
93;3
209;3
136;3
299;3
188;3
319;3
264;3
95;3
337;3
1;3
221;3
310;3
143;2
320;2
321;2
322;2
324;2
210;2
302;2
438;2
303;2
239;2
330;2
196;2
447;2
332;2
333;2
334;2
307;2
308;2
309;2
340;2
341;2
171;2
190;2
313;2
193;2
154;2
294;2
295;2
250;2
144;2
311;1
312;1
314;1
315;1
316;1
317;1
51;1
323;1
325;1
326;1
327;1
328;1
329;1
331;1
335;1
336;1
338;1
339;1
342;1
343;1
186;1
185;1
354;1
355;1
356;1
357;1
359;1
360;1
361;1
184;1
363;1
364;1
366;1
183;1
369;1
370;1
182;1
181;1
180;1
179;1
380;1
381;1
382;1
383;1
178;1
177;1
176;1
174;1
30;1
173;1
392;1
393;1
155;1
405;1
407;1
409;1
151;1
145;1
12;1
425;1
138;1
135;1
103;1
435;1
437;1
102;1
440;1
441;1
442;1
80;1
448;1
28;1
226;1
227;1
228;1
230;1
225;1
214;1
216;1
213;1
212;1
211;1
208;1
207;1
206;1
78;1
245;1
205;1
204;1
254;1
203;1
202;1
201;1
200;1
199;1
265;1
198;1
268;1
194;1
192;1
273;1
274;1
275;1
278;1
191;1
282;1
75;1
285;1
189;1
288;1
289;1
290;1
187;1
293;1
296;1
297;1
300;1
304;1
305;1
306;1

--

So what’s the result of:

SELECT COUNT(DISTINCT id_camada) FROM …

Does it change significantly over time?

Regards,
Igor Neyman



[PERFORM] Re: Query running slow for only one specific id. (Postgres 9.3) version

2015-06-05 Thread Igor Neyman


From: Sheena, Prabhjot [mailto:prabhjot.si...@classmates.com]
Sent: Friday, June 05, 2015 2:38 PM
To: Igor Neyman; pgsql-gene...@postgresql.org; pgsql-performance@postgresql.org
Subject: RE: Query running slow for only one specific id. (Postgres 9.3) version

When I run vacuum analyze it fixes the problem, but after 1 or 2 days the 
problem comes back.

Here is the table structure

      Column      |            Type             |                               Modifiers                               | Storage | Stats target | Description
------------------+-----------------------------+-----------------------------------------------------------------------+---------+--------------+-------------
 response_id      | integer                     | not null default nextval('btdt_responses_response_id_seq'::regclass) | plain   |              |
 registration_id  | bigint                      | not null                                                              | plain   |              |
 btdt_id          | integer                     | not null                                                              | plain   |              |
 response         | integer                     | not null                                                              | plain   |              |
 creation_date    | timestamp without time zone | not null default now()                                                | plain   |              |
 last_update_date | timestamp without time zone | not null default now()                                                | plain   |              |
Indexes:
"btdt_responses_pkey" PRIMARY KEY, btree (response_id)
"btdt_responses_u2" UNIQUE, btree (registration_id, btdt_id)
"btdt_responses_n1" btree (btdt_id)
"btdt_responses_n2" btree (btdt_id, response)
"btdt_responses_n4" btree (creation_date)
"btdt_responses_n5" btree (last_update_date)
"btdt_responses_n6" btree (btdt_id, last_update_date)
Foreign-key constraints:
"btdt_responses_btdt_id_fkey" FOREIGN KEY (btdt_id) REFERENCES 
btdt_items(btdt_id)
"btdt_responses_fk1" FOREIGN KEY (btdt_id) REFERENCES btdt_items(btdt_id)
Has OIDs: no
Options: autovacuum_enabled=true, autovacuum_vacuum_scale_factor=0.02, 
autovacuum_analyze_scale_factor=0.02

Thanks

From: Igor Neyman [mailto:iney...@perceptron.com]
Sent: Friday, June 5, 2015 11:06 AM
To: Sheena, Prabhjot; 
pgsql-gene...@postgresql.org<mailto:pgsql-gene...@postgresql.org>; 
pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: RE: Query running slow for only one specific id. (Postgres 9.3) version



From: 
pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sheena, Prabhjot
Sent: Friday, June 05, 2015 1:55 PM
To: pgsql-gene...@postgresql.org<mailto:pgsql-gene...@postgresql.org>; 
pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: [PERFORM] Query running slow for only one specific id. (Postgres 9.3) 
version

Postgresql 9.3 Version

Guys,
  Here is the issue that I've been facing for a couple of weeks now. I have a 
table (size 7GB).

If I run this query with this specific registration id, it uses the wrong 
execution plan and takes more than a minute to complete. The total number of 
rows for this registration_id is only 414 in this table.

explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8718704208 AND response != 4;

QUERY PLAN
--
Result  (cost=2902.98..2903.01 rows=1 width=0) (actual 
time=86910.730..86910.731 rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.57..2902.98 rows=1 width=8) (actual 
time=86910.725..86910.725 rows=1 loops=1)
   ->  Index Scan Backward using btdt_responses_n5 on btdt_responses  
(cost=0.57..6425932.41 rows=2214 width=8) (actual time=86910.723..86910.723 
rows=1 loops=1)
 Index Cond: (last_update_date IS NOT NULL)
 Filter: ((response <> 4) AND (registration_id = 
8718704208::bigint))
 Rows Removed by Filter: 52145434
Total runtime: 86910.766 ms


Same query with any other registration id will come back in milli seconds



explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8688546267 AND response != 4;
  QUERY PLAN
-

[PERFORM] Re: Query running slow for only one specific id. (Postgres 9.3) version

2015-06-05 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sheena, Prabhjot
Sent: Friday, June 05, 2015 1:55 PM
To: pgsql-gene...@postgresql.org; pgsql-performance@postgresql.org
Subject: [PERFORM] Query running slow for only one specific id. (Postgres 9.3) 
version

Postgresql 9.3 Version

Guys,
  Here is the issue that I've been facing for a couple of weeks now. I have a 
table (size 7GB).

If I run this query with this specific registration id, it uses the wrong 
execution plan and takes more than a minute to complete. The total number of 
rows for this registration_id is only 414 in this table.

explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8718704208 AND response != 4;

QUERY PLAN
--
Result  (cost=2902.98..2903.01 rows=1 width=0) (actual 
time=86910.730..86910.731 rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.57..2902.98 rows=1 width=8) (actual 
time=86910.725..86910.725 rows=1 loops=1)
   ->  Index Scan Backward using btdt_responses_n5 on btdt_responses  
(cost=0.57..6425932.41 rows=2214 width=8) (actual time=86910.723..86910.723 
rows=1 loops=1)
 Index Cond: (last_update_date IS NOT NULL)
 Filter: ((response <> 4) AND (registration_id = 
8718704208::bigint))
 Rows Removed by Filter: 52145434
Total runtime: 86910.766 ms


The same query with any other registration id will come back in milliseconds.



explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8688546267 AND response != 4;
  QUERY PLAN
--
Aggregate  (cost=529.75..529.78 rows=1 width=8) (actual time=19.723..19.723 
rows=1 loops=1)
   ->  Index Scan using btdt_responses_u2 on btdt_responses  (cost=0.57..529.45 
rows=119 width=8) (actual time=0.097..19.689 rows=72 loops=1)
 Index Cond: (registration_id = 8688546267::bigint)
 Filter: (response <> 4)
 Rows Removed by Filter: 22
Total runtime: 19.769 ms


Please let me know what I can do to fix this issue.


Thanks


Not enough info.
Table structure? Is registration_id the PK?  If not, what is the distribution 
of its values in this table?
When was it last analyzed?  Maybe you need to increase the statistics target 
for this table:

Index Scan Backward using btdt_responses_n5 on btdt_responses  
(cost=0.57..6425932.41 rows=2214 width=8) (actual time=86910.723..86910.723 
rows=1 loops=1)

It expects 2214 records while actually getting only 1.
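
A sketch of that change (the target value here is just an example):

ALTER TABLE btdt_responses ALTER COLUMN registration_id SET STATISTICS 1000;
ANALYZE btdt_responses;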

Regards,
Igor Neyman





Re: [PERFORM] Poor performance when deleting from entity-attribute-value type master-table

2015-02-09 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Andreas Joseph 
Krogh
Sent: Monday, February 09, 2015 4:13 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Poor performance when deleting from entity-attribute-value 
type master-table

Hi all.

Using PG-9.4.0 I'm seeing this trying to delete from an "entity"-master table:

*# explain analyze delete from onp_crm_entity where entity_id IN (select 
tmp.delivery_id from temp_delete_delivery_id tmp);
QUERY PLAN
---
 Delete on onp_crm_entity  (cost=0.43..5673.40 rows=1770 width=12) (actual 
time=7.370..7.370 rows=0 loops=1)
   ->  Nested Loop  (cost=0.43..5673.40 rows=1770 width=12) (actual 
time=0.050..1.374 rows=108 loops=1)
 ->  Seq Scan on temp_delete_delivery_id tmp  (cost=0.00..27.70 
rows=1770 width=14) (actual time=0.014..0.080 rows=108 loops=1)
 ->  Index Scan using onp_crm_entity_pkey on onp_crm_entity  
(cost=0.43..3.18 rows=1 width=14) (actual time=0.010..0.011 rows=1 loops=108)
   Index Cond: (entity_id = tmp.delivery_id)
 Planning time: 0.314 ms
 Trigger for constraint onp_crm_activity_entity_id_fkey: time=4.141 calls=108
 Trigger for constraint ...
 Trigger for constraint ...
 Trigger for constraint ...


I have lots of tables referencing onp_crm_entity(entity_id), so I expect the 
poor performance of deleting from it is caused by all the triggers firing to 
check FK constraints.


Andreas, do you have indexes on FK columns in child tables?
If not – there is your problem.
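
For example (a sketch; the child table and column names here are only inferred
from the trigger name above, so adjust to your schema):

CREATE INDEX ON onp_crm_activity (entity_id);

Without such indexes, every delete from the parent forces a scan of each child
table to verify that no row still references the deleted entity.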

Regards,
Igor Neyman



Re: [PERFORM] Copy command Faster than original select

2015-02-06 Thread Igor Neyman
I think it is the difference between writing 43602 records into the file and 
displaying 43602 records on screen.
If you wrap your select in a select count(a.*), e.g.:

Select count(a.*) from (select ... from mytable join .. join ... order by ) 
as a;

This will exclude the time to display all these rows, so you'll get the same 
(or better) performance as with "copy" into a text file, which will prove this 
theory.

Regards,
Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of belal
Sent: Friday, February 06, 2015 3:31 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Copy command Faster than original select

I made a complex select using the PgAdmin III Query Editor, Postgres server 9.3


select ... from mytable join .. join ... order by 

I get [Total query runtime: 8841 ms. 43602 rows retrieved.]

but when I use 

copy ([same above select]) to '/x.txt' 
I get [Query returned successfully: 43602 rows affected, 683 ms execution time.]

These tests were made on the same machine as the PostgreSQL server.


Can anyone explain the huge difference in execution time?

best regards all 



--
View this message in context: 
http://postgresql.nabble.com/Copy-command-Faster-than-original-select-tp5836886.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] extremly bad select performance on huge table

2014-10-22 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Björn Wittich
Sent: Wednesday, October 22, 2014 1:06 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] extremly bad select performance on huge table

Hi Igor,

that was also my assumption, but unfortunately this isn't true.
I am using explain analyze.

Example which is fast "explain analyze select value from smallertable inner 
join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"

130 - 140 sec

Example which is fast "explain analyze select value,c1 from smallertable inner 
join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"


does not complete after several hours, although the c1 column should only be 
relevant for retrieval.

Comparing the explain comparison of both statements gave me a hint:

adding the c1 column makes the query planner choose a sequential scan on 
myhugetable as well as on smallertable. This is much slower.

When I set enable_seqscan=false the query planner shows the same query plan for 
both statements, but the statement including the c1 column does not complete 
after several hours.

How can this be explained?

I do not want the db server to prepare the whole query result at once, my 
intention is that the asynchronous retrieval starts as fast as possible.

Thanks
Björn




>
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org 
> [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Björn 
> Wittich
> Sent: Tuesday, October 21, 2014 3:32 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] extremly bad select performance on huge table
>
> Hi Tom and Igor,
>
> thanks for your help. With the reindex the select query running time was 
> reduced from 5200 sec to 130 sec. Impressive!
>
> Even a join on this table is now fast.
>
> Unfortunately, there is now another problem: The table in my example 
> has
> 500 columns which I want to retrieve with my join command.
>
> Example which is fast "select value from smallertable inner join myhugetable 
> ON smallertable.mycolumn = myhugetable.mycolumn"
>
> Example which is slow "select value,c1,c2,c3,...,c10 from smallertable inner 
> join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"
>
>
> Why is the number of columns to fetch so bad?  Which action is done 
> in the db system when querying this via pgadmin? I think that there is 
> no real retrieval included, why is the number of additional columns so 
> bad for the join performance?
>
>> =?ISO-8859-15?Q?Bj=F6rn_Wittich?=  writes:
>>> Here is the explain (analyze,buffers) select mycolumn from 
>>> myhugetable "Index Only Scan using myprimkey on myhugetable  
>>> (cost=0.00..822.82
>>> rows=71768080 width=33) (actual time=16.722..2456300.778 
>>> rows=71825999 loops=1)"
>>> "  Heap Fetches: 356861"
>>> "  Buffers: shared hit=71799472 read=613813"
>>> "Total runtime: 2503009.611 ms"
>> So that works out to about 4 msec per page fetched considering only 
>> I/O costs, which is about as good as you're likely to get if the data 
>> is sitting on spinning rust.
>>
>> You could potentially make it faster with a VACUUM (to mark all pages 
>> all-visible and eliminate the "heap fetches" costs), or a REINDEX (so 
>> that the index scan becomes more nearly sequential instead of random 
>> access).  However, unless the data is nearly static those will just 
>> be temporary fixes: the time will degrade again as you update the table.
>>
>>> Note: This select is just for testing. My final statement will be a 
>>> join on this table via the "mycolumn" column.
>> In that case it's probably a waste of time to worry about the 
>> performance of this query as such.  In the first place, a join is not 
>> likely to use the index at all unless it's fetching a relatively 
>> small number of rows, and in the second place it seems unlikely that 
>> the join query can use an IndexOnlyScan on this index --- I imagine 
>> that the purpose of the join will require fetching additional columns.
>>
>>  regards, tom lane
>>
>>
> Björn,
>
> I think, the timing difference you see between 2 queries is caused by 
> delivering to the front-end (PgAdmin) and displaying all additional columns 
> that you include in the second query (much bigger amount of data to pass from 
> the db to the client).
> Pretty sure, if you do explain analyze on both queries, you'll see the same 
> timing, because it'll reflect only db time without what's spent on delivering
> data to the client.

Re: [PERFORM] Query with large number of joins

2014-10-21 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Marco Di Cesare
Sent: Tuesday, October 21, 2014 4:03 PM
To: Andrew Dunstan; Merlin Moncure
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query with large number of joins


On 10/21/2014 12:31 PM, Andrew Dunstan wrote:
> Please don't top-post on the PostgreSQL lists. See 
> <http://idallen.com/topposting.html>

Oops, sorry.

>Have you tried a) either turning off geqo or setting geqo_threshold 
>fairly high b) setting join_collapse_limit fairly high (assuming all the above 
>join targets are tables and not views, setting it to something like 25 should 
>do the trick.

I did try various combinations of these settings but none yielded any 
significant query run time improvements. 

> You also haven't told us what settings you have for things like 
> effective_cache_size, which can dramatically affect query plans.

effective_cache_size = 4096MB

I tried bumping this up as well but again no significant query run time 
improvements.  



Marco,

Didn't you mention that you have something like 48GB of RAM?
In this case (if that's a dedicated db server), you should try setting 
effective_cache_size to around 40GB (not 4GB).
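
A quick way to test it before touching postgresql.conf (a sketch):

SET effective_cache_size = '40GB';
-- then re-run EXPLAIN ANALYZE on the problem query in the same session

If the plan improves, make the change permanent in postgresql.conf.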

Regards,
Igor Neyman

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] extremly bad select performance on huge table

2014-10-21 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Björn Wittich
Sent: Tuesday, October 21, 2014 3:32 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] extremly bad select performance on huge table

Hi Tom and Igor,

thanks for your help. With the reindex the select query running time was 
reduced from 5200 sec to 130 sec. Impressive!

Even a join on this table is now fast.

Unfortunately, there is now another problem: The table in my example has
500 columns which I want to retrieve with my join command.

Example which is fast "select value from smallertable inner join myhugetable ON 
smallertable.mycolumn = myhugetable.mycolumn"

Example which is slow "select value,c1,c2,c3,...,c10 from smallertable inner 
join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"


Why is the number of columns to fetch so bad?  Which action is done in 
the db system when querying this via pgadmin? I think that there is no 
real retrieval included, so why is the number of additional columns so bad 
for the join performance?

> =?ISO-8859-15?Q?Bj=F6rn_Wittich?=  writes:
>> Here is the explain (analyze,buffers) select mycolumn from myhugetable
>> "Index Only Scan using myprimkey on myhugetable  (cost=0.00..822.82
>> rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999
>> loops=1)"
>> "  Heap Fetches: 356861"
>> "  Buffers: shared hit=71799472 read=613813"
>> "Total runtime: 2503009.611 ms"
> So that works out to about 4 msec per page fetched considering only I/O
> costs, which is about as good as you're likely to get if the data is
> sitting on spinning rust.
>
> You could potentially make it faster with a VACUUM (to mark all pages
> all-visible and eliminate the "heap fetches" costs), or a REINDEX
> (so that the index scan becomes more nearly sequential instead of random
> access).  However, unless the data is nearly static those will just be
> temporary fixes: the time will degrade again as you update the table.
>
>> Note: This select is just for testing. My final statement will be a join
>> on this table via the "mycolumn" column.
> In that case it's probably a waste of time to worry about the performance
> of this query as such.  In the first place, a join is not likely to use
> the index at all unless it's fetching a relatively small number of rows,
> and in the second place it seems unlikely that the join query can use
> an IndexOnlyScan on this index --- I imagine that the purpose of the join
> will require fetching additional columns.
>
>   regards, tom lane
>
>

Björn,

I think, the timing difference you see between 2 queries is caused by 
delivering to the front-end (PgAdmin) and displaying all additional columns 
that you include in the second query (much bigger amount of data to pass from 
the db to the client).
Pretty sure, if you do explain analyze on both queries, you'll see the same 
timing, because it'll reflect only db time without what's spent on delivering 
data to the client.

Regards,
Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] extremly bad select performance on huge table

2014-10-21 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Björn Wittich
Sent: Tuesday, October 21, 2014 1:35 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] extremly bad select performance on huge table

Sorry, forgot to copy the buffer information:

"  Heap Fetches: 356861"

"  Buffers: shared hit=71799472 read=613813"




> Hi newsgroup,
>
> I have a very huge table (70 million rows) with a key (text, about
> 30 characters per key). A select on this indexed column "myprimkey" 
> (index on column mycolumn) took more than 30 mins.
>
> Here is the explain (analyze,buffers) select mycolumn from myhugetable
>
> "Index Only Scan using myprimkey on myhugetable (cost=0.00..822.82
> rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999 
> loops=1)"
>
> "  Heap Fetches: 356861"
>
> "Total runtime: 2503009.611 ms"
>
>
> Even repeating the query does not show a performance improvement. I 
> assume that the index itself is too large for my db cache. What can I 
> do to gain performance? Which parameters can I adapt? Having a huge 
> Linux machine with 72 GB RAM.
>
> Note: This select is just for testing. My final statement will be a 
> join on this table via the "mycolumn" column.
>
> Thanks for your help
> Björn
>
>
>
>

Did you check the bloat in your myprimkey index?
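
For example, with the pgstattuple extension (a sketch):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('myprimkey');

A low avg_leaf_density or high leaf_fragmentation would point at bloat; a
REINDEX rebuilds the index densely.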

Regards,
Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions and work_mem?

2014-10-15 Thread Igor Neyman


From: Dave Johansen [mailto:davejohan...@gmail.com]
Sent: Wednesday, October 15, 2014 4:49 PM
To: Igor Neyman
Cc: Josh Berkus; pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 1:36 PM, Igor Neyman 
mailto:iney...@perceptron.com>> wrote:


From: Dave Johansen 
[mailto:davejohan...@gmail.com<mailto:davejohan...@gmail.com>]
Sent: Wednesday, October 15, 2014 4:20 PM
To: Igor Neyman
Cc: Josh Berkus; pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 1:08 PM, Igor Neyman 
mailto:iney...@perceptron.com>> wrote:


From: 
pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>
 
[mailto:pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>]
 On Behalf Of Dave Johansen
Sent: Wednesday, October 15, 2014 4:05 PM
To: Josh Berkus
Cc: pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 10:10 AM, Josh Berkus 
mailto:j...@agliodbs.com>> wrote:
On 10/14/2014 10:08 AM, Dave Johansen wrote:
> I'm running Postgres 8.4 on RHEL 6 64-bit and I had a question about how
> work_mem and partitions interact.
>
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem
> The above wiki states that "if a query involves doing merge sorts of 8
> tables, that requires 8 times work_mem." If I have a table that is
> partitioned does each partition count as a "table" and get its on work_mem?

In theory, this could happen.  In practice, based on tests I did at Sun
with DBT3 and 8.3, no backend ever used more than 3X work_mem.  This is
partly because the level of parallelism in postgres is extremely
limited, so we can't actually sort 8 partitions at the same time.

Thanks for the feedback. That's very helpful.

BTW, 8.4 is EOL.  Maybe time to upgrade?

RHEL 6 isn't EOLed and we're working on moving to RHEL 7 but it's a slow 
process that will probably take quite a bit of time, if it ever happens.


Postgres 8.4 is EOL (RHEL).

Sorry I don't understand what you mean by that. My understanding is that RedHat 
maintains fixes for security and other major issues for packages that have been 
EOLed. Are you implying that that's not the case? Or something else?

I don’t think that RedHat can maintain a Postgres version which was EOLed.
Postgres 8.4 is not supported by the PostgreSQL community.

This conversation has probably become a bit off topic, but my understanding is 
that what you're paying RedHat for is a stable platform for a long period of 
time. That means creating/backporting of fixes for security and other critical 
issues for packages that have been EOLed.
Assuming the above is true (which I believe to be the case 
https://access.redhat.com/support/policy/updates/errata ), I don't see what 
would prevent RedHat from making a patch and applying it to the latest 8.4 
release to resolve any newly discovered issues. Isn't that the whole point of 
open source and RedHat being able to do with the code what it wishes as long as 
it meets the requirements of the license? So are you claiming that RedHat 
doesn't/won't do this? Is incapable of doing this? Or am I missing something?


Tom Lane is probably a better authority on this issue.
Let’s wait and see what he says.




Re: [PERFORM] Partitions and work_mem?

2014-10-15 Thread Igor Neyman


From: Dave Johansen [mailto:davejohan...@gmail.com]
Sent: Wednesday, October 15, 2014 4:20 PM
To: Igor Neyman
Cc: Josh Berkus; pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 1:08 PM, Igor Neyman 
mailto:iney...@perceptron.com>> wrote:


From: 
pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>
 
[mailto:pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>]
 On Behalf Of Dave Johansen
Sent: Wednesday, October 15, 2014 4:05 PM
To: Josh Berkus
Cc: pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 10:10 AM, Josh Berkus 
mailto:j...@agliodbs.com>> wrote:
On 10/14/2014 10:08 AM, Dave Johansen wrote:
> I'm running Postgres 8.4 on RHEL 6 64-bit and I had a question about how
> work_mem and partitions interact.
>
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem
> The above wiki states that "if a query involves doing merge sorts of 8
> tables, that requires 8 times work_mem." If I have a table that is
> partitioned does each partition count as a "table" and get its on work_mem?

In theory, this could happen.  In practice, based on tests I did at Sun
with DBT3 and 8.3, no backend ever used more than 3X work_mem.  This is
partly because the level of parallelism in postgres is extremely
limited, so we can't actually sort 8 partitions at the same time.

Thanks for the feedback. That's very helpful.

BTW, 8.4 is EOL.  Maybe time to upgrade?

RHEL 6 isn't EOLed and we're working on moving to RHEL 7 but it's a slow 
process that will probably take quite a bit of time, if it ever happens.


Postgres 8.4 is EOL (RHEL).

Sorry I don't understand what you mean by that. My understanding is that RedHat 
maintains fixes for security and other major issues for packages that have been 
EOLed. Are you implying that that's not the case? Or something else?

I don’t think that RedHat can maintain a Postgres version which was EOLed.
Postgres 8.4 is not supported by the PostgreSQL community.

Igor Neyman




Re: [PERFORM] Partitions and work_mem?

2014-10-15 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Dave Johansen
Sent: Wednesday, October 15, 2014 4:05 PM
To: Josh Berkus
Cc: pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 10:10 AM, Josh Berkus 
mailto:j...@agliodbs.com>> wrote:
On 10/14/2014 10:08 AM, Dave Johansen wrote:
> I'm running Postgres 8.4 on RHEL 6 64-bit and I had a question about how
> work_mem and partitions interact.
>
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem
> The above wiki states that "if a query involves doing merge sorts of 8
> tables, that requires 8 times work_mem." If I have a table that is
> partitioned does each partition count as a "table" and get its on work_mem?

In theory, this could happen.  In practice, based on tests I did at Sun
with DBT3 and 8.3, no backend ever used more than 3X work_mem.  This is
partly because the level of parallelism in postgres is extremely
limited, so we can't actually sort 8 partitions at the same time.

Thanks for the feedback. That's very helpful.

BTW, 8.4 is EOL.  Maybe time to upgrade?

RHEL 6 isn't EOLed and we're working on moving to RHEL 7 but it's a slow 
process that will probably take quite a bit of time, if it ever happens.


Postgres 8.4 is EOL (RHEL).

Igor Neyman


Re: [PERFORM] Seqscan on big table, when an Index-Usage should be possible

2014-06-05 Thread Igor Neyman
 ("timestamp")
> > Foreign-key constraints:
> > "fk_equipmentid" FOREIGN KEY (equipmentid) REFERENCES
> equipment(equipmentid) ON UPDATE CASCADE ON DELETE RESTRICT
> > "fk_mc_selectionid" FOREIGN KEY (mc_selectionid) REFERENCES
> mc_selection(mc_selectionid) ON UPDATE CASCADE ON DELETE SET NULL
> > "fk_ordermaterialsid" FOREIGN KEY (ordermaterialsid) REFERENCES
> ordermaterials(ordermaterialsid) ON UPDATE CASCADE ON DELETE RESTRICT
> > "fk_testerid" FOREIGN KEY (testerid) REFERENCES tester(testerid)
> > ON UPDATE CASCADE ON DELETE RESTRICT
> >
> > pd=> \d measurementstype
> >                          Table "public.measurementstype"
> >        Column       |          Type          |                           Modifiers
> > --------------------+------------------------+--------------------------------------------------------------------------------
> >  measurementstypeid | integer                | not null default nextval('measurementstype_measurementstypeid_seq'::regclass)
> >  datatype           | character varying(20)  | not null default 'char'::character varying
> >  name               | character varying(255) | not null
> >  description        | character varying(255) | default NULL::character varying
> >  unit               | character varying(20)  | default NULL::character varying
> >  step               | integer                |
> >  stepdescription    | character varying(255) | default NULL::character varying
> >  permissionlevel    | integer                | not null default 0
> > Indexes:
> >     "measurementstype_pkey" PRIMARY KEY, btree (measurementstypeid)
> >     "measurementstype_datatype" btree (datatype)
> >     "measurementstype_name" btree (name)
> >     "measurementstype_step" btree (step)
> >     "measurementstype_stepdescription" btree (stepdescription)
> >
> 

You don't tell us: 
- what kind of hardware (specifically, how much RAM) you are using
- what your config settings are: shared_buffers, work_mem, 
effective_cache_size

All this affects planner decisions when choosing one (or another) execution 
path/plan.
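
They are easy to report (a sketch):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size');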

Regards,
Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible performance regression in PostgreSQL 9.2/9.3?

2014-06-05 Thread Igor Neyman
should
> > be considered a 'last resort' fix and should generally not be changed
> > in postgresql.conf.  If i were in your shoes, I'd be breaking the
> > query down and figuring out where it goes off the rails.   Best case
> > scenario, you have a simplified, test case reproducible reduction of
> > the problem that can help direct changes to the planner.  In lieu of
> > that, I'd look at this as a special case optimization of problem
> > queries.
> >
> > There is something else to try.  Can you (temporarily) raise
> > join_collapse_limit higher (to, say 20), and see if you get a better
> > plan (with and without other planner adjustments)?
> >
> > merlin
> >
> >
> 
> This is the plan with join_collapse_limit=20, enable_nestloop=false,
> enable_material=false:
> http://explain.depesz.com/s/PpL
> 
> The plan with join_collapse_limit=20 but nestloops and enable_material true
> is taking too much time, seems to have the same problem as with
> join_collapse_limit=8.
> 
> I will try to create a simpler reproducible example, thank you.
> 
> Regards,
> Miguel Angel.
> 
> 
> 
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Usually, when I increase join_collapse_limit, I also increase 
from_collapse_limit (to the same value).
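
Both limits are plain session-settable GUCs, so they can be tried before
touching postgresql.conf (a sketch):

SET from_collapse_limit = 20;
SET join_collapse_limit = 20;
-- then re-run EXPLAIN ANALYZE on the problem query in the same session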

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Issue with query scanning through all data even with indexes

2014-01-16 Thread Igor Neyman
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Kai Sellgren
Sent: Thursday, January 09, 2014 4:37 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Issue with query scanning through all data even with indexes

Hi,

I have a query that has each field used in conditions + sort indexed, but it 
scans through all data.

The query in question looks like:

http://pastie.org/8618562

I have each of those condition fields indexed:

NewsArticle.groupId
NewsArticle.sharedToCommunityIds
NewsArticle.sourceFilterIds
CommunityGroupLink.communityId
CommunityGroupLink.groupId
SourceFilter.groupId
SourceFilter.communityId

This is the data output for explain http://d.pr/i/VGT3

And in visual http://d.pr/i/mqiN

Line 7 says rows=99173 which makes it real slow (it can take up to a minute to 
run).

Do you have any ideas? All of them are appreciated!

Cheers,

--
Yours sincerely,
Kai Sellgren


Could you try to move WHERE clause conditions into JOIN conditions, something 
like this:

SELECT "NewsArticle"."id"
FROM "NewsArticle"
LEFT JOIN "CommunityGroupLink" ON "CommunityGroupLink"."communityId" = 1538 AND 
("CommunityGroupLink"."groupId" = "NewsArticle"."groupId")
  AND((1538 = ANY ("NewsArticle"."sharedToCommunityIds") OR 
("CommunityGroupLink"."id" IS NOT NULL)))
LEFT JOIN "SourceFilter" ON "SourceFilter"."communityId" = 1538 AND 
"SourceFilter"."groupId" = "NewsArticle"."groupId"
  AND(("SourceFilter"."id" IS NULL OR "SourceFilter"."id" = 
ANY("NewsArticle"."sourceFilterIds")));


Not sure what you do with "LIMIT 35" - it's not shown in the "explain" plan.

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

2013-11-05 Thread Igor Neyman


From: Caio Casimiro [mailto:casimiro.lis...@gmail.com]
Sent: Monday, November 04, 2013 4:33 PM
To: Igor Neyman
Cc: Jeff Janes; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

These are the parameters I have set in postgresql.conf:

work_mem = 128MB
shared_buffers = 1GB
maintenance_work_mem = 1536MB
fsync = off
synchronous_commit = off
effective_cache_size = 2GB

The hardware is a modest one:
CPU: Intel(R) Atom(TM) CPU  230   @ 1.60GHz
RAM: 2GB
HD: 1TV 7200 RPM (WDC WD10EZEX-00RKKA0)

This machine runs a slackware 14.0 dedicated to the Postgresql.

Thank you,
Caio
With just 2GB RAM, this:

shared_buffers = 1GB

and this:

effective_cache_size = 2GB

is too high.

You should lower those:

shared_buffers = 256MB
effective_cache_size = 1GB

and see how your execution plan changes.

Oh, and this:
maintenance_work_mem = 1536MB

is also too high.
Turning off fsync and synchronous_commit is not a very good idea.
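
Note that effective_cache_size is only a planner hint and can be tried per
session, while shared_buffers requires a postgresql.conf change and a server
restart (a sketch; values as suggested above):

SET effective_cache_size = '1GB';   -- safe to test in the current session
-- in postgresql.conf, followed by a restart:
--     shared_buffers = 256MB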

Regards,
Igor Neyman



Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

2013-11-04 Thread Igor Neyman
From: Caio Casimiro [mailto:casimiro.lis...@gmail.com] 
Sent: Monday, November 04, 2013 4:10 PM
To: Igor Neyman
Cc: Jeff Janes; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

Hi Neyman, thank you for your answer.
Unfortunately this query runs in almost the same time:

Sort  (cost=4877693.98..4877702.60 rows=3449 width=20) (actual 
time=25820.291..25821.845 rows=1640 loops=1)
  Sort Key: tt.tweet_id
  Sort Method: quicksort  Memory: 97kB
  Buffers: shared hit=1849 read=32788
  ->  Nested Loop  (cost=247.58..4877491.32 rows=3449 width=20) (actual 
time=486.839..25814.120 rows=1640 loops=1)
        Buffers: shared hit=1849 read=32788
        ->  Hash Semi Join  (cost=229.62..88553.23 rows=1681 width=8) (actual 
time=431.654..13209.159 rows=597 loops=1)
              Hash Cond: (t.user_id = relationship.followed_id)
              Buffers: shared hit=3 read=31870
              ->  Index Scan using tweet_creation_time_index on tweet t  
(cost=0.57..83308.25 rows=1781234 width=16) (actual time=130.144..10037.764 
rows=1759645 loops=1)
                    Index Cond: ((creation_time >= '2013-05-05 
00:00:00-03'::timestamp with time zone) AND (creation_time <= '2013-05-06 
00:00:00-03'::timestamp with time zone))
                    Buffers: shared hit=1 read=31867
              ->  Hash  (cost=227.12..227.12 rows=154 width=8) (actual 
time=94.365..94.365 rows=106 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 3kB
                    Buffers: shared hit=2 read=3
                    ->  Index Only Scan using relationship_id on relationship  
(cost=0.42..227.12 rows=154 width=8) (actual time=74.540..94.101 rows=106 
loops=1)
                          Index Cond: (follower_id = 335093362)
                          Heap Fetches: 0
                          Buffers: shared hit=2 read=3
        ->  Bitmap Heap Scan on tweet_topic tt  (cost=17.96..2841.63 rows=723 
width=20) (actual time=21.014..21.085 rows=3 loops=597)
              Recheck Cond: (tweet_id = t.id)
              Buffers: shared hit=1846 read=918
              ->  Bitmap Index Scan on tweet_topic_pk  (cost=0.00..17.78 
rows=723 width=0) (actual time=15.012..15.012 rows=3 loops=597)
                    Index Cond: (tweet_id = t.id)
                    Buffers: shared hit=1763 read=632
Total runtime: 25823.386 ms

I have noticed that in both queries the index scan on tweet_creation_time_index 
is very expensive. Is there anything I can do to make the planner choose an 
index only scan?

Thank you,
Caio

Just try the following:

SELECT tt.tweet_id, tt.topic, tt.topic_value
            FROM tweet_topic AS tt  JOIN tweet AS t ON (tt.tweet_id = t.id
                                                  AND t.creation_time BETWEEN 
'D1' AND 'D2' AND t.user_id in
                                         (SELECT followed_id FROM relationship 
WHERE follower_id = N))
 ORDER BY tt.tweet_id;

And see if it helps with performance.

Regards,
Igor Neyman

What is your hardware configuration, and which Postgres config parameters were 
modified from their default values?

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

2013-11-04 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Caio Casimiro
Sent: Monday, November 04, 2013 3:44 PM
To: Jeff Janes
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

Thank you very much for your answers guys!

On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes  wrote:
On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro  wrote:
Hello all,

I have one query running at ~ 7 seconds and I would like to know if it's 
possible to make it run faster, once this query runs lots of time in my 
experiment.


Do you mean you want it to be fast because it runs many times, or that you want 
it to become fast after it runs many times (i.e. once the data is fully 
cached)?  The plan you show takes 24 seconds, not 7 seconds.

I want it to be fast because it runs many times. I have an experiment that 
evaluates recommendation algorithms  for a set of twitter users. This query 
returns recommendation candidates so it is called a lot of times for different 
users and time intervals.
 
 

Basically the query returns the topics of tweets published by users that the 
user N follows and that are published between D1 and D2.

Query:

SELECT tt.tweet_id, tt.topic, tt.topic_value
            FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = t.id
            WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
            (SELECT followed_id FROM relationship WHERE follower_id = N) ORDER 
BY tt.tweet_id;


I don't know if this affects the plan at all, but it is silly to do a left join 
to "tweet" when the WHERE clause has conditions that can't be satisfied with a 
null row.  Also, you could try changing the IN-list to an EXISTS subquery.

I'm sorry for the ignorance, but I don't understand the issue with the left join, 
could you explain more?
...
Thank you very much again!
Caio


Just try the following:

SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt  JOIN tweet AS t ON (tt.tweet_id = t.id
  AND t.creation_time BETWEEN 
'D1' AND 'D2' AND t.user_id in
 (SELECT followed_id FROM relationship 
WHERE follower_id = N))
 ORDER BY tt.tweet_id;

And see if it helps with performance.

Regards,
Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How clustering for scale out works in PostgreSQL

2013-08-29 Thread Igor Neyman
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
> performance-ow...@postgresql.org] On Behalf Of bsreejithin
> Sent: Thursday, August 29, 2013 12:42 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] How clustering for scale out works in PostgreSQL
> 
> The performance test that was conducted was for 1 Hour.
> 
> There are 6 transactions. 2 DB inserts and 4 SELECTs.
> Every 2 minutes there will be 4 SELECTs. And every 3 minutes there will be 2
> DB inserts.
> 
> 
> 
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/How-clustering-for-scale-out-
> works-in-PostgreSQL-tp5768917p5768957.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
> 
> 

With that kind of activity, you don't need clustering for your 1000 users.
What you need is PgBouncer; it should solve your problem.  Please read some docs 
on PgBouncer, it's "light-weight" and very easy to set up.
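
For illustration, a minimal pgbouncer.ini sketch (every path, name, and size
here is illustrative, not a recommendation):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20

Applications then connect to port 6432 instead of 5432, and PgBouncer
multiplexes them over a small number of real server connections.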

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Efficient Correlated Update

2013-08-09 Thread Igor Neyman
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
> performance-ow...@postgresql.org] On Behalf Of Kevin Grittner
> Sent: Friday, August 09, 2013 11:44 AM
> To: Robert DiFalco; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Efficient Correlated Update
> 
> Robert DiFalco  wrote:
> 
> > In my system a user can have external contacts. When I am bringing in
> > external contacts I want to correlate any other existing users in the
> > system with those external contacts. A users external contacts may or
> > may not be users in my system. I have a user_id field in "contacts"
> > that is NULL if that contact is not a user in my system
> >
> > Currently I do something like this after reading in external
> > contacts:
> >
> >   UPDATE contacts SET user_id = u.id
> >   FROM my_users u
> >   JOIN phone_numbers pn ON u.phone_significant = pn.significant
> >   WHERE contacts.owner_id = 7
> >   AND contacts.user_id IS NULL
> >   AND contacts.id = pn.ref_contact_id;
> >
> > If any of the fields are not self explanatory let me know.
> > "Significant" is just the right 7 most digits of a raw phone number.
> >
> > I'm more interested in possible improvements to my relational logic
> > than the details of the "significant" condition. IOW, I'm start enough
> > to optimize the "significant" query but not smart enough to know if
> > this is the best approach for the overall correlated UPDATE query. :)
> >
> > So yeah, is this the best way to update a contact's user_id reference
> > based on a contacts phone number matching the phone number of a user?
> >
> > One detail from the schema -- A contact can have many phone numbers
> > but a user in my system will only ever have just one phone number.
> > Hence the JOIN to "phone_numbers" versus the column in "my_users".
> 
> In looking it over, nothing jumped out at me as a problem.  Are you having
> some problem with it, like poor performance or getting results different from
> what you expected?
> 
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 

There is an illness that sometimes strikes database developers/administrators.
It is called CTD - Compulsive Tuning Disorder :)

Igor Neyman




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Igor Neyman
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
> performance-ow...@postgresql.org] On Behalf Of Claudio Freire
> Sent: Wednesday, August 07, 2013 2:20 PM
> To: Robert DiFalco
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Efficiently query for the most recent record for a
> given user
> 
> On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco 
> wrote:
> > Let's say I have a table something like this:
> >
> >create table call_activity (
> > id int8 not null,
> > called timestamp,
> > user_id int8 not null,
> > primary key (id)
> > foreign key (user_id) references my_users
> > )
> >
> >
> > I want to get the last call_activity record for a single user.  Is
> > there ANY way to efficiently retrieve the last record for a specified
> > user_id, or do I need to de-normalize and update a table with a single
> > row for each user each time a new call_activity record is inserted? I
> > know I how to do the query without the summary table (subquery or
> > GROUP BY with MAX) but that seems like it will never perform well for
> > large data sets. Or am I full of beans and it should perform just fine
> > for a huge data set as long as I have an index on "called"?
> 
> 
> Create an index over (user_id, called desc), and do
> 
> select * from call_activity where user_id = blarg order by called desc limit 1
> 

And the most recent call for every user (one row per user):

SELECT id, user_id, called
FROM (SELECT id, user_id, called,
             ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY called DESC) AS rn
      FROM call_activity) t
WHERE rn = 1;
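
An equivalent and often simpler PostgreSQL idiom is DISTINCT ON, which can use
the same (user_id, called desc) index suggested above:

SELECT DISTINCT ON (user_id) id, user_id, called
FROM call_activity
ORDER BY user_id, called DESC;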

Regards,
Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: sl...@centrum.sk [mailto:sl...@centrum.sk] 
Sent: Wednesday, August 07, 2013 11:34 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

I got:
"Total runtime: 9.313 ms" in pgAdmin
"Total runtime: 9.363 ms" in psql.
But timing after the query finished was 912.842 ms in psql.
 
Cheers,
 
Peter Slapansky
__

That proves what Pavel suggested regarding planning time.
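
For reference: in these releases, EXPLAIN ANALYZE's "Total runtime" covers
execution only (a separate "Planning time" line only appeared in 9.4), while
psql's \timing covers parse + plan + execute + network round trip. A quick way
to see the split:

\timing on
EXPLAIN ANALYZE SELECT ...;
-- compare the reported "Total runtime" with psql's "Time:" line;
-- a large gap points at planning (or network) overhead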

Regards,
Igor Neyman

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: sl...@centrum.sk [mailto:sl...@centrum.sk] 
Sent: Wednesday, August 07, 2013 10:43 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

You're right, it does... but it's quite odd, because I re-ran the 
explain-analyze statement and got the same results.
Still, the query now runs for about a second as mentioned before, so it's 
almost like something's missing from the explain, but I'm certain I copied it 
all.
 
I did this via pgadmin, but that shouldn't matter, should it?
 
Thank you,
 
Peter Slapansky
__
_

At the very end of the explain analyze output there should be a line:

Total runtime: 

What do you get there?

Regards,
Igor Neyman

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
Sent: Wednesday, August 07, 2013 8:43 AM
To: Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a 
view with another view inside of it.

Good day,
 
I have included a link to the result of EXPLAIN ANALYZE. It's this one:
https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h
 
Here's a link to Depesz's explain (if links to the site are okay):
http://explain.depesz.com/s/gCk
 
I have just tried setting geqo_threshold, join_collapse_limit and 
from_collapse_limit to 16, but it yielded no improvement.
Changing those three parameters to 32 did speed up the query from about 3.3 
seconds to about a second (give or take 50 ms), which is a pretty good 
improvement, but not quite there, as I'm looking to bring it down to about 300 
ms if possible. Changing those three settings to 48 yielded no improvements 
over 32.
Is there possibly something something else to tweak there?
 
Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
http://explain.depesz.com/s/cj2
 
Thank you.
 
Peter Slapansky

-

Your last explain analyze (with 3 settings set to 32)  shows query duration 
10ms, not 1sec.
Am I wrong? 

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Igor Neyman
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
> performance-ow...@postgresql.org] On Behalf Of Shaun Thomas
> Sent: Thursday, June 27, 2013 12:16 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Partitions not Working as Expected
> 
> Hey guys,
> 
> I suspect I'll get an answer equivalent to "the planner treats that like a
> variable," but I really hope not because it renders partitions essentially
> useless to us. This is as recent as 9.1.9 and constraint exclusion is enabled.
> 
> What I have is this test case:
> 
> CREATE TABLE part_test (
>fake INT,
>part_col TIMESTAMP WITHOUT TIME ZONE
> );
> 
> CREATE TABLE part_test_1 (
>CHECK (part_col >= '2013-05-01' AND
>   part_col < '2013-06-01')
> ) INHERITS (part_test);
> 
> CREATE TABLE part_test_2 (
>CHECK (part_col >= '2013-04-01' AND
>   part_col < '2013-05-01')
> ) INHERITS (part_test);
> 
> And this query performs a sequence scan across all partitions:
> 
> EXPLAIN ANALYZE
> SELECT * FROM part_test
>   WHERE part_col > CURRENT_DATE;
> 
> The CURRENT_DATE value is clearly more recent than any of the partitions,
> yet it checks them anyway. The only way to get it to properly constrain
> partitions is to use a static value:
> 
> EXPLAIN ANALYZE
> SELECT * FROM part_test
>   WHERE part_col > '2013-06-27';
> 
> But developers never do this. Nor should they. I feel like an idiot even 
> asking
> this, because it seems so wrong, and I can't seem to come up with a
> workaround other than, "Ok devs, hard code dates into all of your queries
> from now on."
> 
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-676-8870
> stho...@optionshouse.com
> 

Doesn't have to be hardcoded.
If executed as dynamic sql, it will be re-planned properly, e.g.:

lQueryString := 'SELECT MAX(cycle_date_time) AS MaxDT
                 FROM gp_cycle_' || partition_extension::varchar ||
                ' WHERE cell_id = ' || i_n_Cell_id::varchar ||
                ' AND part_type_id = ' || i_n_PartType_id::varchar ||
                ' AND cycle_date_time <= TIMESTAMP ' ||
                quote_literal(cast(i_t_EndDate AS VARCHAR));
IF (lQueryString IS NOT NULL) THEN
    EXECUTE lQueryString INTO lEndDate;
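
The same idea as a self-contained sketch, reusing Shaun's part_test table (the
function name is made up). EXECUTE plans its statement at run time, with the
date already folded into a literal, so constraint exclusion can prune
partitions:

CREATE OR REPLACE FUNCTION part_test_after(p_date date)
RETURNS SETOF part_test AS $$
BEGIN
    -- the planner sees a constant here, not CURRENT_DATE
    RETURN QUERY EXECUTE
        'SELECT * FROM part_test WHERE part_col > ' || quote_literal(p_date);
END;
$$ LANGUAGE plpgsql;

SELECT * FROM part_test_after(CURRENT_DATE);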


Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seqscan for 100 out of 3M rows, index present

2013-06-26 Thread Igor Neyman


From: Willy-Bas Loos [mailto:willy...@gmail.com] 
Sent: Wednesday, June 26, 2013 3:19 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] seqscan for 100 out of 3M rows, index present

plan with enable_seqscan off:

Aggregate  (cost=253892.48..253892.49 rows=1 width=0) (actual time=208.681..208.681 rows=1 loops=1)
  ->  Nested Loop  (cost=5.87..253889.49 rows=1198 width=0) (actual time=69.403..208.647 rows=17 loops=1)
        ->  Index Scan using geo_blok_idx on geo g  (cost=0.00..1314.43 rows=500 width=8) (actual time=45.776..46.147 rows=121 loops=1)
              Index Cond: (blok = 1942)
        ->  Bitmap Heap Scan on bmp_data d  (cost=5.87..502.91 rows=179 width=8) (actual time=1.340..1.341 rows=0 loops=121)
              Recheck Cond: (geo_id = g.geo_id)
              ->  Bitmap Index Scan on bmp_data_geo_idx  (cost=0.00..5.82 rows=179 width=0) (actual time=1.206..1.206 rows=0 loops=121)
                    Index Cond: (geo_id = g.geo_id)
Total runtime: 208.850 ms


On Wed, Jun 26, 2013 at 9:08 PM, Igor Neyman  wrote:


From: Willy-Bas Loos [mailto:willy...@gmail.com]
Sent: Wednesday, June 26, 2013 3:04 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] seqscan for 100 out of 3M rows, index present

nope
$ grep ^[^#] /etc/postgresql/9.1/main/postgresql.conf|grep -e ^[^[:space:]]
data_directory = '/var/lib/postgresql/9.1/main'        # use data in another 
directory
hba_file = '/etc/postgresql/9.1/main/pg_hba.conf'    # host-based 
authentication file
ident_file = '/etc/postgresql/9.1/main/pg_ident.conf'    # ident configuration 
file
external_pid_file = '/var/run/postgresql/9.1-main.pid'        # write an extra 
PID file
port = 5432                # (change requires restart)
max_connections = 100            # (change requires restart)
unix_socket_directory = '/var/run/postgresql'        # (change requires restart)
ssl = true                # (change requires restart)
shared_buffers = 2GB            # min 128kB
work_mem = 100MB                # min 64kB
maintenance_work_mem = 256MB        # min 1MB
synchronous_commit = off        # synchronization level; on, off, or local
checkpoint_segments = 10        # in logfile segments, min 1, 16MB each
log_line_prefix = '%t '            # special values:
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'            # locale for system error message
lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'            # locale for number formatting
lc_time = 'en_US.UTF-8'                # locale for time formatting
default_text_search_config = 'pg_catalog.english'
--

How much RAM do you have on this machine?
What else is this machine being used for (besides being a db server)?
And, what is your setting for effective_cache_size?  It looks like you didn't 
change it from the default (128MB).
You need to adjust effective_cache_size to somewhere between 60% and 75% of RAM, 
if the database is the main process running on this machine.

Again, effective_cache_size could be set on session level, so you could try it 
before changing GUC in postgresql.conf.
When trying it, don't forget to change enable_seqscan back to "on" (if it's 
still "off").

Igor Neyman







-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seqscan for 100 out of 3M rows, index present

2013-06-26 Thread Igor Neyman


From: Willy-Bas Loos [mailto:willy...@gmail.com] 
Sent: Wednesday, June 26, 2013 3:04 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] seqscan for 100 out of 3M rows, index present

nope
$ grep ^[^#] /etc/postgresql/9.1/main/postgresql.conf|grep -e ^[^[:space:]]
data_directory = '/var/lib/postgresql/9.1/main'        # use data in another 
directory
hba_file = '/etc/postgresql/9.1/main/pg_hba.conf'    # host-based 
authentication file
ident_file = '/etc/postgresql/9.1/main/pg_ident.conf'    # ident configuration 
file
external_pid_file = '/var/run/postgresql/9.1-main.pid'        # write an extra 
PID file
port = 5432                # (change requires restart)
max_connections = 100            # (change requires restart)
unix_socket_directory = '/var/run/postgresql'        # (change requires restart)
ssl = true                # (change requires restart)
shared_buffers = 2GB            # min 128kB
work_mem = 100MB                # min 64kB
maintenance_work_mem = 256MB        # min 1MB
synchronous_commit = off        # synchronization level; on, off, or local
checkpoint_segments = 10        # in logfile segments, min 1, 16MB each
log_line_prefix = '%t '            # special values:
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'            # locale for system error message
lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'            # locale for number formatting
lc_time = 'en_US.UTF-8'                # locale for time formatting
default_text_search_config = 'pg_catalog.english'

--

You could change this setting at session level, and prove yourself or the query 
optimizer right (or wrong :)

Igor Neyman

...
...
Aggregate  (cost=60836.71..60836.72 rows=1 width=0) (actual time=481.526..481.526 rows=1 loops=1)
  ->  Hash Join  (cost=1296.42..60833.75 rows=1184 width=0) (actual time=317.403..481.513 rows=17 loops=1)
        Hash Cond: (d2.gid = g2.gid)
        ->  Seq Scan on d2  (cost=0.00..47872.54 rows=3107454 width=8) (actual time=0.013..231.707 rows=3107454 loops=1)
        ->  Hash  (cost=1290.24..1290.24 rows=494 width=8) (actual time=0.207..0.207 rows=121 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 5kB
              ->  Index Scan using g_blok on g2  (cost=0.00..1290.24 rows=494 width=8) (actual time=0.102..0.156 rows=121 loops=1)
                    Index Cond: (k = 1942)
Total runtime: 481.600 ms
Here's the DDL:
create table g2 (gid bigint primary key, k integer);
create table d2 (id bigint primary key, gid bigint);
--insert into g2 (...)
--insert into d2 (...)
create index g_blok on g2(k);
create index d_gid on d2(gid);
alter table d2 add constraint d_g_fk foreign key (gid) references g2 (gid);
analyze d2;
analyze g2;

Any advice?

Cheers,
Willy-Bas Loos
--
So, did you try to set:

enable_seqscan = off

and see if a different execution plan is more efficient?

Igor Neyman



-- 
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seqscan for 100 out of 3M rows, index present

2013-06-26 Thread Igor Neyman
Hi,
Postgres does a seqscan, even though there is an index present and it should be 
much more efficient to use it.
I tried to synthetically reproduce it, but it won't make the same choice when I 
do.
I can reproduce it with a simplified set of the data itself, though.
here's the query, and the analyzed plan:
select count(*) 
from d2
join g2 on g2.gid=d2.gid
where g2.k=1942

Aggregate  (cost=60836.71..60836.72 rows=1 width=0) (actual time=481.526..481.526 rows=1 loops=1)
  ->  Hash Join  (cost=1296.42..60833.75 rows=1184 width=0) (actual time=317.403..481.513 rows=17 loops=1)
        Hash Cond: (d2.gid = g2.gid)
        ->  Seq Scan on d2  (cost=0.00..47872.54 rows=3107454 width=8) (actual time=0.013..231.707 rows=3107454 loops=1)
        ->  Hash  (cost=1290.24..1290.24 rows=494 width=8) (actual time=0.207..0.207 rows=121 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 5kB
              ->  Index Scan using g_blok on g2  (cost=0.00..1290.24 rows=494 width=8) (actual time=0.102..0.156 rows=121 loops=1)
                    Index Cond: (k = 1942)
Total runtime: 481.600 ms
Here's the DDL:
create table g2 (gid bigint primary key, k integer);
create table d2 (id bigint primary key, gid bigint);
--insert into g2 (...)
--insert into d2 (...)
create index g_blok on g2(k);
create index d_gid on d2(gid);
alter table d2 add constraint d_g_fk foreign key (gid) references g2 (gid);
analyze d2;
analyze g2;

Any advice?

Cheers,
Willy-Bas Loos
-- 

So, did you try to set:

enable_seqscan = off

and see if a different execution plan is more efficient?

Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-30 Thread Igor Neyman
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] On Behalf Of Merlin Moncure
> Sent: Thursday, May 30, 2013 11:14 AM
> To: John Mudd
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
> 
> On Mon, May 27, 2013 at 9:02 AM, John Mudd  wrote:
> > Postgres 9.1.2 on Ubuntu 12.04
> >
> > Any reason why a select by primary key would be slower than a select
> > that includes an ORDER BY? I was really hoping using the primary key
> > would give me a boost.
> >
> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
> > /proc/sys/vm/drop_caches" between the runs.
> >
> >
> >
> > test=# VACUUM ANALYZE test_select;
> > VACUUM
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> > ORDER BY key1, key2, key3, id LIMIT 1;
> >   QUERY
> > PLAN
> > -
> -
> > 
> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
> > rows=1
> > loops=1)
> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49
> > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
> >  Index Cond: (key1 >= 50)
> >  Total runtime: 12.678 ms
> >
> > (stopped postgres; reset O/S cache; started postgres)
> 
> 
> why are you flushing postgres/os cache?  when you do that, you are
> measuring raw read time from disks.  Typical disk seek time is measured
> in milliseconds so the timings are completely appropriate once you
> remove caching effects. Hard drives (at least, the spinning
> kind) are slow and one of the major challenges of database and hardware
> engineering is working around their limitations.  Fortunately it looks
> like faster storage will soon be commonplace for reasonable prices.
> 
> merlin
> 

True.
But, on the other hand (back to the original question): 
the execution plans that John got before and after the suggested change in 
configuration parameters are exactly the same; the timing differs only due to 
buffer cache effects.

Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best practice when reindexing in production

2013-05-29 Thread Igor Neyman


From: Matheus de Oliveira [mailto:matioli.math...@gmail.com] 
Sent: Wednesday, May 29, 2013 10:19 AM
To: Igor Neyman
Cc: Niels Kristian Schjødt; Magnus Hagander; pgsql-performance@postgresql.org 
list
Subject: Re: [PERFORM] Best practice when reindexing in production



On Wed, May 29, 2013 at 10:55 AM, Igor Neyman  wrote:


You could do something like this (which assumes you use simple names for your 
indexes, where simple ~ [a-z_][a-z0-9_]*):

SELECT 
regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 
INDEX CONCURRENTLY tmp_\2 ON \3;') || E'\n'
|| E'BEGIN;\n'
|| 'DROP INDEX ' || i.indexname || E';\n'
|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'
|| E'COMMIT;\n'
FROM pg_indexes i
WHERE schemaname !~ '^(pg_|information_schema$)';

Although this one is *really simple* and *error-prone*, because it does not 
consider at least two things: indexes that back constraints, and indexes that 
have an FK depending on them. For the first case, you only need to change the 
constraint to use the new index and adjust the DROP command. As for the second 
case, you would need to remove the FKs, drop the old index, and recreate the 
FKs (inside a transaction, of course), but this could be really slow; a plain 
reindex for this case would be simpler and perhaps faster.

=
I must be missing something here.
But, how is it that an FK depends on the index?
I understand FK lookups work much faster with an index supporting the FK than 
without it, but you could have an FK without an index (on the "child" table).
So, what gives?

AFAIK, when you create an FK, PostgreSQL associates it with a UNIQUE INDEX on 
the target table. It creates an entry in pg_depends (I don't know if anywhere 
else), and when you try to drop the index, even if there is an identical one 
that PostgreSQL could use, it will throw an error.

You can easily check this:

postgres=# CREATE TABLE parent(id int);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX parent_idx1 ON parent (id);
CREATE INDEX
postgres=# CREATE TABLE child(idparent int REFERENCES parent (id));
CREATE TABLE
postgres=# CREATE UNIQUE INDEX parent_idx2 ON parent (id);
CREATE INDEX
postgres=# DROP INDEX parent_idx1;
ERROR:  cannot drop index parent_idx1 because other objects depend on it
DETAIL:  constraint child_idparent_fkey on table child depends on index 
parent_idx1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
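
For completeness, a sketch of swapping the index under the FK, using the names
from the example above (on 9.1+ the FK can be re-added NOT VALID and validated
afterwards, which keeps the lock window short):

BEGIN;
ALTER TABLE child DROP CONSTRAINT child_idparent_fkey;
DROP INDEX parent_idx1;
ALTER INDEX parent_idx2 RENAME TO parent_idx1;
ALTER TABLE child ADD CONSTRAINT child_idparent_fkey
    FOREIGN KEY (idparent) REFERENCES parent (id) NOT VALID;
COMMIT;

ALTER TABLE child VALIDATE CONSTRAINT child_idparent_fkey;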

BTW, I do think PostgreSQL could verify if there is another candidate index for 
this FK. Is it on the TODO list? Should it be?

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


So, it's about an index on the parent table that's used for a unique (or PK) 
constraint and referenced by an FK on the child table.
From your previous email I thought that an index on the child table supporting 
an FK (which is mostly created for performance purposes) cannot be dropped 
without disabling the FK. My bad.

Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best practice when reindexing in production

2013-05-29 Thread Igor Neyman


You could do something like this (which assumes you use simple names for your 
indexes, where simple ~ [a-z_][a-z0-9_]*):

SELECT 
regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 
INDEX CONCURRENTLY tmp_\2 ON \3;') || E'\n'
|| E'BEGIN;\n'
|| 'DROP INDEX ' || i.indexname || E';\n'
|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'
|| E'COMMIT;\n'
FROM pg_indexes i
WHERE schemaname !~ '^(pg_|information_schema$)';

Although this one is *really simple* and *error-prone*, because it does not 
consider at least two things: indexes that back constraints, and indexes that 
have an FK depending on them. For the first case, you only need to change the 
constraint to use the new index and adjust the DROP command. As for the second 
case, you would need to remove the FKs, drop the old index, and recreate the 
FKs (inside a transaction, of course), but this could be really slow; a plain 
reindex for this case would be simpler and perhaps faster.


Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


I must be missing something here.
But, how is it that an FK depends on the index?
I understand FK lookups work much faster with an index supporting the FK than 
without it, but you could have an FK without an index (on the "child" table).
So, what gives?

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


From: Anne Rosset [aros...@collab.net]
Sent: Monday, May 06, 2013 5:51 PM
To: Igor Neyman; k...@rice.edu
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Deterioration in performance when query executed in  
multi threads

Hi Igor,
Result with enable_nestloop off:




------------------------------------------------------------------------------------------------
 Hash Join  (cost=49946.49..58830.02 rows=1 width=181) (actual time=2189.474..2664.888 rows=180 loops=1)
   Hash Cond: ((item.created_by_id)::text = (sfuser.id)::text)
   ->  Hash Join  (cost=49470.50..58345.53 rows=1 width=167) (actual time=1931.870..2404.745 rows=180 loops=1)
         Hash Cond: ((relationship.origin_id)::text = (sfuser2.id)::text)
         ->  Hash Join  (cost=48994.51..57869.52 rows=1 width=153) (actual time=1927.603..2400.334 rows=180 loops=1)
               Hash Cond: ((relationship.target_id)::text = (artifact.id)::text)
               ->  Seq Scan on relationship  (cost=0.00..7973.38 rows=240435 width=19) (actual time=0.036..492.442 rows=241285 loops=1)
                     Filter: ((NOT is_deleted) AND ((relationship_type_name)::text = 'ArtifactAssignment'::text))
               ->  Hash  (cost=48994.49..48994.49 rows=1 width=154) (actual time=1858.350..1858.350 rows=180 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 34kB
                     ->  Hash Join  (cost=47260.54..48994.49 rows=1 width=154) (actual time=1836.495..1858.151 rows=180 loops=1)
                           Hash Cond: ((field_value4.id)::text = (artifact.customer_fv)::text)
                           ->  Seq Scan on field_value field_value4  (cost=0.00..1443.78 rows=77378 width=9) (actual time=22.104..30.694 rows=77378 loops=1)
                           ->  Hash  (cost=47260.52..47260.52 rows=1 width=163) (actual time=1814.005..1814.005 rows=180 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 35kB
                                 ->  Hash Join  (cost=45526.57..47260.52 rows=1 width=163) (actual time=1790.908..1813.780 rows=180 loops=1)
                                       Hash Cond: ((field_value3.id)::text = (artifact.category_fv)::text)
                                       ->  Seq Scan on field_value field_value3  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..9.262 rows=77378 loops=1)
                                       ->  Hash  (cost=45526.55..45526.55 rows=1 width=166) (actual time=1790.505..1790.505 rows=180 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 36kB
                                             ->  Hash Join  (cost=43792.60..45526.55 rows=1 width=166) (actual time=1768.362..1790.304 rows=180 loops=1)
                                                   Hash Cond: ((field_value.id)::text = (artifact.group_fv)::text)
                                                   ->  Seq Scan on field_value  (cost=0.00..1443.78 rows=77378 width=9) (actual time=0.002..8.687 rows=77378 loops=1)
                                                   ->  Hash  (cost=43792.58..43792.58 rows=1 width=175) (actual time=1767.928..1767.928 rows=180 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                         ->  Hash Join  (cost=42058.63..43792.58 rows=1 width=175) (actual time=1499.822..1767.734 rows=180 loops=1)
                                                               Hash Cond: ((field_value2.id)::text = (artifact.status_fv)::text)
                                                               ->  Seq Scan on field_value field_value2  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..261.082 rows=77378 loops=1)
                                                               ->  Hash  (cost=42058.61..42058.61 rows=1 width=178) (actual time=1492.707..1492.707 rows=180 loops=1)
                                                                     Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                                     ->  Hash Join  (cost=18039.59..42058.61 rows=1 width=178) (actual time=1175.659..1492.482 rows=180 loops=1)
                                                                           Hash Cond: ((item.id)::text = (artifact.id)::text)
                                                                           ->  Hash Join  (cost=12112.31..36130.95 rows=30 width=128) (actual time=304.035..702.745 rows=1015 loops=1)
                                                                                 Hash Cond: ((item.folder_id)::text = (folder.id)::text)

Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] On Behalf Of Thomas Kellerer
> Sent: Monday, May 06, 2013 1:12 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Deterioration in performance when query executed
> in multi threads
> 
> Anne Rosset, 06.05.2013 19:00:
> > Postgres version: 9.0.13
> >
> >> Work_mem is set to 64MB
> >> Shared_buffer to 240MB
> >> Segment_size is 1GB
> >> Wal_buffer is 10MB
> >
> > Artifact table: 251831 rows
> > Field_value table: 77378 rows
> > Mntr_subscription: 929071 rows
> > Relationship: 270478 row
> > Folder: 280356 rows
> > Item: 716465 rows
> > Sfuser: 5733 rows
> > Project: 1817 rows
> >
> > 8CPUs
> > RAM: 8GB
> >
> 
> With 8GB RAM you should be able to increase shared_buffer to 1GB or
> maybe even higher especially if this is a dedicated server.
> 240MB is pretty conservative for a server with that amount of RAM
> (unless you have many other applications running on that box)
> 
> Also what are the values for
> 
> cpu_tuple_cost
> seq_page_cost
> random_page_cost
> effective_cache_size
> 
> What kind of harddisk is in the server? SSD? Regular ones (spinning
> disks)?
> 
> 
> 


Also, with 8 CPUs, your max connection-pool size shouldn't be much bigger than 20.

Igor Neyman

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


> -Original Message-
> From: Anne Rosset [mailto:aros...@collab.net]
> Sent: Monday, May 06, 2013 1:01 PM
> To: Igor Neyman; k...@rice.edu
> Cc: pgsql-performance@postgresql.org
> Subject: RE: [PERFORM] Deterioration in performance when query executed
> in multi threads
> 
> Hi Igor,
> The explain analyze is from when there was no load.
> 
> Artifact table: 251831 rows
> Field_value table: 77378 rows
> Mntr_subscription: 929071 rows
> Relationship: 270478 row
> Folder: 280356 rows
> Item: 716465 rows
> Sfuser: 5733 rows
> Project: 1817 rows
> 
> 8CPUs
> RAM: 8GB
> 
> Postgres version: 9.0.13
> 
>  And no, we haven't switched or tested yet with PgBouncer. We would
> like to do a bit more analysis before trying this.
> 
> Thanks for your help,
> Anne
> 
> 


Anne,

Just as a quick test, try changing the enable_nestloop setting locally in your 
psql session/connection and run your query:

set enable_nestloop = off;
explain analyze <your query>;

just to see if a different execution plan is better and the optimizer needs to 
be "convinced" to use it.
Please post what you get with the modified setting.

Also, what is the setting for effective_cache_size in postgresql.conf?

Regards,
Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


> -Original Message-
> From: Anne Rosset [mailto:aros...@collab.net]
> Sent: Friday, May 03, 2013 4:52 PM
> To: Igor Neyman; k...@rice.edu
> Cc: pgsql-performance@postgresql.org
> Subject: RE: [PERFORM] Deterioration in performance when query executed
> in multi threads
> 
> We saw a little bit of improvement by increasing the min_pool_size, but
> again I see a big variation in the time the query is executed. Here is
> the query:
> 
> srdb=> explain analyze SELECT
> psrdb->artifact.id AS id,
> psrdb->artifact.priority AS priority,
> psrdb->project.path AS projectPathString,
> psrdb->project.title AS projectTitle,
> psrdb->folder.project_id AS projectId,
> psrdb->folder.title AS folderTitle,
> psrdb->item.folder_id AS folderId,
> psrdb->item.title AS title,
> psrdb->item.name AS name,
> psrdb->field_value2.value AS status,
> psrdb->field_value3.value AS category,
> psrdb->sfuser.username AS submittedByUsername,
> psrdb->sfuser.full_name AS submittedByFullname,
> psrdb->sfuser2.username AS assignedToUsername,
> psrdb->sfuser2.full_name AS assignedToFullname,
> psrdb->item.version AS version,
> psrdb->CASE when ((SELECT
> psrdb(>mntr_subscription.user_id AS userId
> psrdb(> FROM
> psrdb(>mntr_subscription mntr_subscription
> psrdb(> WHERE
> psrdb(>artifact.id=mntr_subscription.object_key
> psrdb(> AND mntr_subscription.user_id='user1439'
> psrdb(> )= 'user1439') THEN 'user1439' ELSE null END AS
> monitoringUserId,
> psrdb->tracker.icon AS trackerIcon,
> psrdb->tracker.remaining_effort_disabled AS
> remainingEffortDisabled,
> psrdb->tracker.actual_effort_disabled AS actualEffortDisabled,
> psrdb->tracker.estimated_effort_disabled AS
> estimatedEffortDisabled
> psrdb-> FROM
> psrdb->field_value field_value2,
> psrdb->field_value field_value,
> psrdb->sfuser sfuser2,
> psrdb->field_value field_value3,
> psrdb->field_value field_value4,
> psrdb->item item,
> psrdb->project project,
> psrdb->relationship relationship,
> psrdb->tracker tracker,
> psrdb->artifact artifact,
> psrdb->sfuser sfuser,
> psrdb->folder folder
> psrdb-> WHERE
> psrdb->artifact.id=item.id
> psrdb-> AND item.folder_id=folder.id
> psrdb-> AND folder.project_id=project.id
> psrdb-> AND artifact.group_fv=field_value.id
> psrdb-> AND artifact.status_fv=field_value2.id
> psrdb-> AND artifact.category_fv=field_value3.id
> psrdb-> AND artifact.customer_fv=field_value4.id
> psrdb-> AND item.created_by_id=sfuser.id
> psrdb-> AND relationship.is_deleted=false
> psrdb-> AND
> relationship.relationship_type_name='ArtifactAssignment'
> psrdb-> AND relationship.origin_id=sfuser2.id
> psrdb-> AND artifact.id=relationship.target_id
> psrdb-> AND item.is_deleted=false
> psrdb-> AND ((artifact.priority=3))
> psrdb-> AND (project.path='projects.psr-pub-13')
> psrdb-> AND item.folder_id=tracker.id
> psrdb-> ;
> 
> QUERY PLAN
> 
> ---
> -
> ---
> 
>  Nested Loop  (cost=0.00..272.62 rows=1 width=181) (actual
> time=805.934..1792.596 rows=177 loops=1)
> 
>->  Nested Loop  (cost=0.00..263.87 rows=1 width=167) (actual
> time=707.739..1553.348 rows=177 loops=1)
> 
>  ->  Nested Loop  (cost=0.00..263.58 rows=1 width=153) (actual
> time=653.053..1496.839 rows=177 loops=1)
> 
>->  Nested Loop  (cost=0.00..262.50 rows=1 width=154)
> (actual time=565.627..1385.667 rows=177 loops=1)
> 
>  ->  Nested Loop  (cost=0.00..262.08 rows=1
> width=163) (actual time=565.605..1383.686 rows=177 loops
> =1)
>->  Nested Loop  (cost=0.00..261.67 rows=1
> width=166) (actual time=530.928..1347.053 rows=177
>  loops=1)
>  ->  Nested Loop  (cost=0.00..261.26
> rows=1 width=175) (actual time=530.866..1345.032
> rows=177 loops=1)
>->  Nested Loop

Re: [PERFORM] [BUGS] BUG #8130: Hashjoin still gives issues

2013-05-01 Thread Igor Neyman


> -Original Message-
> 

> 
> The original query:
> 
> select * from ambit_privateevent_calendars as a, ambit_privateevent as
> b, ambit_calendarsubscription as c, ambit_calendar as d where
> c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270
> and  c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4,
> 12, 20) and not b.main_recurrence = true;
> 
> select b.id from ambit_privateevent_calendars as a, ambit_privateevent
> as b, ambit_calendarsubscription as c, ambit_calendar as d where
> c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270
> and  c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4,
> 12, 20) and not b.main_recurrence = true;
> 
> (select * => select b.id, the star query is *fastest*)
> 
> We compare:
> http://explain.depesz.com/s/jRx
> http://explain.depesz.com/s/eKE
> 
> 
> By setting "set enable_hashjoin = off;" performance in our entire
> application increased 30 fold in throughput, which was a bit unexpected
> but highly appreciated. The result of the last query switch the
> mergejoin:
> 
> http://explain.depesz.com/s/AWB
> 
> It is also visible that after hashjoin is off, the b.id query is faster
> than the * query (as would be expected).
> 
> 
> Our test machine is overbudgeted, with 4x the memory of the entire database
> (~4GB), and uses the stock PostgreSQL settings.
> 
> 
> Stefan
> 

I'd suggest that you adjust the Postgres configuration, specifically the memory 
settings (shared_buffers, work_mem, effective_cache_size), to reflect your 
hardware config, and see how it affects your query.
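
As a rough starting point for a dedicated box with ~4GB RAM, something like
this in postgresql.conf (illustrative values, not a tuned recommendation):

shared_buffers = 1GB            # ~25% of RAM is a common starting point
work_mem = 32MB                 # per sort/hash node, so keep it modest
effective_cache_size = 3GB      # planner hint: RAM available for caching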

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-01 Thread Igor Neyman


> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] On Behalf Of Anne Rosset
> Sent: Wednesday, May 01, 2013 1:10 PM
> To: k...@rice.edu
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Deterioration in performance when query executed
> in multi threads
> 
> Thanks Ken. I am going to test with different pool sizes and see if I
> see any improvements.
> Are there other configuration options I should look at? I was
> thinking of playing with shared_buffers.
> 
> Thanks,
> Anne
> 
> -Original Message-
> From: k...@rice.edu [mailto:k...@rice.edu]
> Sent: Wednesday, May 01, 2013 9:27 AM
> To: Anne Rosset
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Deterioration in performance when query executed
> in multi threads
> 
> On Wed, May 01, 2013 at 04:07:55PM +, Anne Rosset wrote:
> > Hi Ken,
> > Thanks for your answer. My test is actually running with jboss 7/jdbc
> and the connection pool is defined  with min-pool-size =10 and max-
> pool-size=400.
> >
> > Why would you think it is an issue with the connection pool?
> >
> > Thanks,
> > Anne
> >
> 
> Hi Anne,
> 
> You want to be able to run as many jobs productively at once as your
> hardware is capable of supporting. Usually something starting at 2 x the
> number of CPUs is best.
> If you make several runs increasing the size of the pool each time, you
> will see a maximum throughput somewhere near there and then the
> performance will decrease as you add more and more connections. You can
> then use that sweet spot.
> Your test harness should make that pretty easy to find.
> 
> Regards,
> Ken
> 
> 
> --
> Sent via pgsql-performance mailing list (pgsql-
> performa...@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Anne,

Before expecting advice on specific changes to Postgres configuration 
parameters,
You should provide this list with your hardware configuration, Postgres 
version, your current Postgres configuration parameters (at least those that 
changed from defaults).
And, if you do the testing using specific query, would be nice if you provide 
the results of:

Explain analyze <your query>;

along with the definition of database objects (tables, indexes) involved in 
this select.

Also, you mention client-side connection pooler.  In my experience, server-side 
poolers, such as PgBouncer mentioned earlier, are much more effective.

Regards,
Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: Join between 2 tables always executes a sequential scan on the larger table

2013-04-02 Thread Igor Neyman
From: Dieter Rehbein [mailto:dieter.rehb...@skiline.cc] 
Sent: Tuesday, April 02, 2013 4:52 AM
To: pgsql-performance@postgresql.org
Subject: Join between 2 tables always executes a sequential scan on the larger 
table

Hi everybody,

in a project I have a performance problem, which I (and my colleagues) don't 
understand. It's a simple join between 2 of 3 tables:

table-1:   user   (id, user_name, ...).   This table has about 1 million rows 
(999673 rows)
table-2:   competition (57 rows)
table-3:   user_2_competition.  A relation between user and competition. This 
table has about 100,000 rows

The query is a join between table user_2_competition and user and looks like 
this:

select u.id, u.user_name
from user_2_competition uc 
          left join "user" u on u.id = uc.user_id 
where uc.competition_id = '3cc1cb9b3ac132ad013ad01316040001'

The query returns the ID and user_name of all users participating in a 
competition.

What I don't understand: This query executes a sequential scan on user!


The tables have the following indexes:

user_2_competition:  there is an index on user_id and an index on 
competition_id (competition_id is a VARCHAR(32) containing UUIDs)
user:  id is the primary key and has therefore a unique index (the ID is a 
VARCHAR(32), which contains UUIDs).

The database has just been restored from a backup, I've executed ANALYZE for 
both tables.

The output of explain analyze (Postgres 9.2.3):

Hash Left Join  (cost=111357.64..126222.29 rows=41396 width=42) (actual 
time=1982.543..2737.331 rows=41333 loops=1)
  Hash Cond: ((uc.user_id)::text = (u.id)::text)
  ->  Seq Scan on user_2_competition uc  (cost=0.00..4705.21 rows=41396 
width=33) (actual time=0.019..89.691 rows=41333 loops=1)
        Filter: ((competition_id)::text = 
'3cc1cb9b3ac132ad013ad01316040001'::text)
        Rows Removed by Filter: 80684
  ->  Hash  (cost=90074.73..90074.73 rows=999673 width=42) (actual 
time=1977.604..1977.604 rows=999673 loops=1)
        Buckets: 2048  Batches: 128  Memory Usage: 589kB
        ->  Seq Scan on "user" u  (cost=0.00..90074.73 rows=999673 width=42) 
(actual time=0.004..1178.827 rows=999673 loops=1)
Total runtime: 2740.723 ms


I expected to see an index scan on user_2_competition with a hash join to user, 
not a sequential scan on user. I've tried this with Postgres 9.1 and 9.2.3.

Any ideas, what's going on here?

With EXPLAIN ANALYZE I can see which query plan Postgres is using.  Is there 
any way to find out WHY Postgres uses this query plan? 

best regards
Dieter

---

Dieter, 
why do you think an index scan on user_2_competition would be better?

Based on the huge number of rows returned from this table (41333 out of 
~122,000 total), the optimizer decided that a Seq Scan is better than an index 
scan.
You don't show the QUERY TUNING parameters from postgresql.conf; are they at 
defaults?
Playing with optimizer parameters (lowering random_page_cost, lowering 
cpu_index_tuple_cost, increasing effective_cache_size, or just setting 
enable_seqscan = off), you could try to force the optimizer to use the index, 
and see if you get better results.
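
For example, per session (the values are illustrative; enable_seqscan = off is
a diagnostic, not a production setting):

SET random_page_cost = 2.0;
SET effective_cache_size = '2GB';   -- size to your actual RAM
-- or, purely as a diagnostic:
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT u.id, u.user_name
FROM user_2_competition uc
LEFT JOIN "user" u ON u.id = uc.user_id
WHERE uc.competition_id = '3cc1cb9b3ac132ad013ad01316040001';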

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to help the planner

2013-03-29 Thread Igor Neyman
Marty,

When you change from/join_collapse_limit, pay attention to the Genetic Query 
Optimizer settings; I believe by default it's "on" (geqo = on).
Specifically look at the geqo_threshold parameter (default is 12).  
AFAIK, if you don't intend to use the Genetic Query Optimizer, the 
geqo_threshold parameter should be higher than your collapse limits, e.g. if you 
want to set the collapse limits to 50, and you think you may join 50 tables, 
then also increase geqo_threshold to at least 51.
Otherwise GEQO will come into play unexpectedly.

Besides this, try to play with these parameters (according to your original 
message you keep them at default):

#seq_page_cost = 1.0# measured on an arbitrary scale
random_page_cost = 2.0  # same scale as above  (default 4.0)
cpu_tuple_cost = 0.05   # same scale as above (default 0.01)
cpu_index_tuple_cost = 0.05 # same scale as above  (default 0.005)
cpu_operator_cost = 0.0075  # same scale as above  (default 0.0025)

Start with cpu_tuple_cost, increasing it from default 0.01 to 0.03-0.05.

Regards,
Igor Neyman


From: Marty Frasier [mailto:m.fras...@escmatrix.com] 
Sent: Thursday, March 28, 2013 4:45 PM
To: Tom Lane
Cc: pgsql-performance@postgresql.org; James Quinn
Subject: Re: how to help the planner

Tom,
I cranked (join|from)_collapse_limit up to 50, then 500 just to exclude the 
limits completely, and attempted the query both times.  The planner came up with 
an estimate close to the other estimates (1,944,276) and I stopped actual 
execution after some length of time.
The t12 subquery is grouped differently because that particular test can be 
valid at multiple schools per student.

I had set session default_statistics_target to 10000 and analyzed prior to the 
earlier runs to allow it to have the best stats it could.  I've looked at it a 
little more closely, setting default_statistics_target back to the default of 
100 and re-ran ANALYZE on that database.

The value 'cahsee_ela' occurs 75,000 times in column 
analysis.iteration__student__test__year.test which totals 11M rows.  It's 
ranked about 60 of 91 values in frequency.
By setting statistics=1000 on the column 'test' the MCV from pg_stats contains 
all 91 distinct values (there are no nulls) and there is no histogram_bounds 
value for the column.  From MCV: cahsee_ela = 0.00658 which is accurate.
I think that should give the planner good info on the selectivity of the where 
clause.  It appears from the var_eq_const function that it will use that exact 
value when found.  It doesn't seem to help the outcome, though, as it had good 
stats before.  I just understand it a little better now - which is good.
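
(For reference, the steps described above look roughly like this - a sketch 
using the table and column names mentioned:)

ALTER TABLE analysis.iteration__student__test__year
  ALTER COLUMN test SET STATISTICS 1000;
ANALYZE analysis.iteration__student__test__year;

SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'analysis'
  AND tablename = 'iteration__student__test__year'
  AND attname = 'test';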

Do you have any suggestions where to probe next?
I see some statistics hooks mentioned in some of the source codes but don't 
know how to take advantage of them or whether it would be of use.
I suppose the answer could eventually be we have to reorganize our queries?
 
Thanks,
Marty


On Thu, Mar 28, 2013 at 12:18 PM, Tom Lane  wrote:
Marty Frasier  writes:
> We've been using postgreSQL for a few years.  This is my first post here
> and first real dive into query plans.
One quick thought is that it's probably worth cranking up
join_collapse_limit and/or from_collapse_limit, since the number of
relations in the query is considerably more than the default values of
those limits.  This will make planning take longer but possibly find
better plans.  I'm not sure it will help a lot, since most of the
problem is evidently bad rowcount estimates, but it might help.

Also it seems like the major rowcount failing is in the estimate for the
t12 subquery.  I can't tell why that particular combination of WHERE
clauses is giving it such a hard time --- is there something odd about
the distribution of 'cahsee_ela' tests?  Why is that particular subquery
grouped over school/student when all the others are grouped over just
student?

                        regards, tom lane







Re: [PERFORM] High CPU usage after partitioning

2013-01-22 Thread Igor Neyman
In PG 9.2 I’m getting “Index Only Scan Backward” for every partition in the 
first part of the execution plan, when looking for the MAX in a partitioned 
table with a similar query:

"->  Index Only Scan Backward using pk_cycle_200610 on 
gp_cycle_200610 gp_cycle  (cost=0.00..8.34 rows=5 width=8) (actual 
time=0.021..0.021 rows=1 loops=1)"
"  Index Cond: (cycle_date_time IS NOT NULL)"
"  Heap Fetches: 0"

Maybe you should upgrade to 9.2.

Regards,
Igor Neyman


From: rudi [mailto:rudol...@gmail.com]
Sent: Tuesday, January 22, 2013 10:08 AM
To: pgsql-performance@postgresql.org
Subject: Re: High CPU usage after partitioning

On Tue, Jan 22, 2013 at 3:46 PM, Andrew Dunstan 
mailto:and...@dunslane.net>> wrote:
The query is pretty simple and standard, the behaviour (and the plan) is 
totally different when it comes to a partitioned table.

Partioned table query => explain analyze SELECT  "sb_logs".* FROM "sb_logs"  
WHERE (device_id = 901 AND date_taken = (SELECT MAX(date_taken) FROM sb_logs 
WHERE device_id = 901));

And there you have it. Constraint exclusion does not work in cases like this. 
It only works with static expressions (such as a literal date in this case).
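
For illustration, a variant that constraint exclusion can handle (the date 
literal is made up):

EXPLAIN SELECT * FROM sb_logs
WHERE device_id = 901 AND date_taken = DATE '2013-01-01';
-- with a literal, the planner can prove at plan time which partitions to skip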

Ok, but I would have expected the same plan repeated 4 times. When the table is 
not partitioned, the plan is definitely smarter: it knows the index can be read 
in reverse and looks for the max with a backward index scan. When the table is 
partitioned, it scans forward, and I guess it will always do a full index scan.
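
A workaround sometimes used here (a sketch; on 9.1 or later the planner can 
satisfy this with a backward index scan per partition merged through a 
MergeAppend):

SELECT date_taken
FROM sb_logs
WHERE device_id = 901
ORDER BY date_taken DESC
LIMIT 1;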



--
rd

This is the way the world ends.
Not with a bang, but a whimper.


Re: [PERFORM] How can i find out top high load sql queries in PostgreSQL.

2012-12-19 Thread Igor Neyman
> -Original Message-
> From: Vibhor Kumar [mailto:vibhor.ku...@enterprisedb.com]
> Sent: Monday, December 17, 2012 11:34 AM
> To: suhas.basavaraj12
> Cc: pgsql-performance@postgresql.org
> Subject: Re: How can i find out top high load sql queries in
> PostgreSQL.
> 
> 
> On Dec 17, 2012, at 3:21 AM, suhas.basavaraj12 
> wrote:
> 
> > There is a tool called pgFouine.  I am sure this will help you...
> >
> > http://pgfouine.projects.pgfoundry.org/tutorial.html
> 
> +1
> 
> You can also use pgbadger, which seemed more flexible than pgFouine.
> http://dalibo.github.com/pgbadger/
> 
> Thanks & Regards,
> Vibhor Kumar
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
> Blog:http://vibhork.blogspot.com
> 

The pg_stat_statements extension tracks execution statistics of SQL statements.
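
A minimal sketch of using it (the module must first be added to 
shared_preload_libraries; CREATE EXTENSION is the 9.1+ syntax):

CREATE EXTENSION pg_stat_statements;

SELECT calls, total_time, rows, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;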

Regards,
Igor Neyman




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Igor Neyman
From: Anibal David Acosta [mailto:a...@devshock.com] 
Sent: Thursday, October 04, 2012 10:01 AM
To: pgsql-performance@postgresql.org
Subject: how to avoid deadlock on masive update with multiples delete

.
. 
.

The other possibility could be that the update process, while locking rows, 
escalates to page locks and then tries to escalate to a table lock while the 
delete process holds some row locks.

Thanks!


This (lock escalation from row -> to page -> to table) is an MS SQL Server 
"feature"; I'm pretty sure Postgres does not do it.

Regards,
Igor Neyman




Re: [PERFORM] Postgres becoming slow, only full vacuum fixes it

2012-10-01 Thread Igor Neyman
> -Original Message-
> From: Thomas Kellerer [mailto:spam_ea...@gmx.net]
> Sent: Tuesday, September 25, 2012 7:24 AM
> To: pgsql-performance@postgresql.org
> Subject: Re: Postgres becoming slow, only full vacuum fixes it
> 
> Kiriakos Tsourapas, 25.09.2012 13:01:
> > Thank you,
> >
> > I will take this into consideration, since upgrading to 9 will be
> much harder I assume...
> >
> 
> I think an upgrade from 8.3 to 8.4 was "harder" due to the removal of a
> lot of implicit type casts.
> 8.4 to 9.x shouldn't be that problematic after all (but will take
> longer due to the required dump/reload)
> 

Actually, 8.3 to 8.4 required a db dump/restore.
When upgrading from 8.4 to 9.x, pg_upgrade can be used without a dump/restore.
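
A sketch of a pg_upgrade invocation (the paths are illustrative):

pg_upgrade -b /usr/pgsql-8.4/bin -B /usr/pgsql-9.0/bin \
           -d /var/lib/pgsql/8.4/data -D /var/lib/pgsql/9.0/data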

Regards,
Igor Neyman



Re: [PERFORM] Index with all necessary columns - Postgres vs MSSQL

2012-02-07 Thread Igor Neyman
265,293,305,331,348}'::integer[])) AND 
("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND 
("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"

reference different table and index names.
Also, EXPLAIN ANALYZE would provide additional info compared to just EXPLAIN.

One option you could try is to cluster your table based on the "test_all" 
index, and see if it makes a difference.
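
For example (a sketch; the table name is hypothetical, and CLUSTER ... USING 
is the 8.4+ spelling):

CLUSTER mytable USING test_all;
ANALYZE mytable;   -- refresh statistics after the physical reorder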
BTW, in SQL Server, is your "covering" index clustered?

Regards,
Igor Neyman




Re: [PERFORM] Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?

2011-11-03 Thread Igor Neyman
> -Original Message-
> From: Craig Ringer [mailto:ring...@ringerc.id.au]
> Sent: Thursday, November 03, 2011 5:07 AM
> To: Igor Neyman
> Cc: Robert Haas; Tom Lane; Jay Levitt;
pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Guide to PG's capabilities for inlining,
> predicate hoisting, flattening, etc?
> 
> On 11/03/2011 04:22 AM, Igor Neyman wrote:
> 
> That said, I'm not actually against performance hints if done
sensibly.
> 
> --
> Craig Ringer
> 


> ...sensibly
As it is with any other feature...

Igor Neyman



Re: [PERFORM] Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?

2011-11-02 Thread Igor Neyman


> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Wednesday, November 02, 2011 11:13 AM
> To: Tom Lane
> Cc: Jay Levitt; pgsql-performance@postgresql.org
> Subject: Re: Guide to PG's capabilities for inlining, predicate
> hoisting, flattening, etc?
> ...
> ...
> Perhaps we could let people say
> something like WITH x AS FENCE (...) when they want the fencing
> behavior, and otherwise assume they don't (but give it to them anyway
> if there's a data-modifying operation in there).
> 
> 
>  
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


Hints here we come :)
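
For context, the fence being discussed is the one every WITH query already 
imposes: the CTE is evaluated as written, and outer predicates are not pushed 
into it. A sketch (big_table is illustrative):

WITH x AS (SELECT * FROM big_table)
SELECT * FROM x WHERE id = 42;   -- big_table is scanned in full, then filtered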



Re: [PERFORM] Migrated from 8.3 to 9.0 - need to update config (re-post)

2011-09-14 Thread Igor Neyman


From: Carlo Stonebanks [mailto:stonec.regis...@sympatico.ca] 
Sent: Tuesday, September 13, 2011 9:27 PM
To: Performance support Postgresql
Subject: Re: Migrated from 8.3 to 9.0 - need to update config (re-post)


 

Ok, connection pooler it is. As I understand it, even if there are no idle 
connections available we'll get the benefit of putting a turnstile on 
the butcher's door.
I also ordered the book as soon as you mentioned - the title alone was enough 
to sell me on it! The book won't be for the errant sys admin who increased the 
connections, it's for me - I'll use it to whack the sys admin on the head. 
Thanks for the tip, the author owes you a beer - as do I.
 
Will the book recommend any particular connection pooler product, or is it 
inappropriate to ask for a recommendation on the forum?
 
Carlo
 

I'd start with pgbouncer: very simple to set up, reliable, and without the 
"extra" functionality which, judging by your message, you don't need.
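
A minimal pgbouncer.ini sketch (names, paths and pool sizes are illustrative):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 300
default_pool_size = 20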

Igor Neyman



Re: [PERFORM] How to see memory usage using explain analyze ?

2011-08-17 Thread Igor Neyman

> -Original Message-
> From: hyelluas [mailto:helen_yell...@mcafee.com]
> Sent: Monday, August 15, 2011 2:33 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: How to see memory usage using explain analyze ?
> 
> Igor,
> 
> thank you , my tests showed better performance against the larger
> summary
> tables when I splited the index for datasource_id & datex , I use to
> have a
> composed index.
> 
> Regarding that index statistics - should I analyze the tables? I
> thought
> auto vacuum takes care of it.
> 
> helen
> 
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/How-to-see-memory-usage-using-
> explain-analyze-tp4694962p4701919.html
> Sent from the PostgreSQL - performance mailing list archive at
> Nabble.com.


But, having different sets of indexes, you can't compare the execution
plans.
In regards to statistics, you could try to ANALYZE the table manually, maybe
after increasing "default_statistics_target".
From the docs:

"default_statistics_target (integer)

Sets the default statistics target for table columns that have not
had a column-specific target set via ALTER TABLE SET STATISTICS. Larger
values increase the time needed to do ANALYZE, but might improve the
quality of the planner's estimates. The default is 10. For more
information on the use of statistics by the PostgreSQL query planner,
refer to Section 14.2."
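
In practice that could look like this (a sketch; the table name is
hypothetical):

SET default_statistics_target = 500;   -- session-level, affects subsequent ANALYZE
ANALYZE summary_table;

-- or permanently, for just the problem column:
ALTER TABLE summary_table ALTER COLUMN datex SET STATISTICS 500;
ANALYZE summary_table;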

HTH,
Igor



Re: [PERFORM] How to see memory usage using explain analyze ?

2011-08-15 Thread Igor Neyman
>   Recheck Cond: ((datasource_id = 10) AND (datex >= '2011-
> 08-03
> 00:00:00+00'::timestamp with time zone) AND (datex < '2011-08-06
> 00:00:00+00'::timestamp with time zone))
>   ->  Bitmap Index Scan on summ_app_fw_datex_15191
> (cost=0.00..2281.32 rows=72293 width=0) (actual time=10.910..10.910
> rows=75024 loops=1)
> Index Cond: ((datasource_id = 10) AND (datex >=
> '2011-08-03 00:00:00+00'::timestamp with time zone) AND (datex <
'2011-
> 08-06
> 00:00:00+00'::timestamp with time zone))
> 
> 
> Why the difference is so large? How I can tune this query?
> 
> thank you.
> 
> Helen
> 
> 
> 
> 
> 
> 
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/How-to-see-memory-usage-using-
> explain-analyze-tp4694681p4694681.html
> Sent from the PostgreSQL - performance mailing list archive at
> Nabble.com.

Helen,

I'm probably a bit late answering your question.
But, just in case...

It looks like one table has "combined" index summ_app_fw_datex_15191 on
both: datasource_id and datex, which works better than 2 separate
indexes ind_datex_15191(datex) and ind_fw_15191(datasource_id), that you
have on the other table.
Besides, this:

->  Bitmap Index Scan on ind_datex_15191
(cost=0.00..985.83 rows=46855 width=0) (actual time=1020.834..1020.834
rows=9370944 loops=1)

Shows that statistics on ind_datex_15191 are completely "out of whack"
(expected rows=46855, actual rows=9370944).
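
If that table is yours to change, a combined index plus a fresh ANALYZE would
be the first thing to try (a sketch; the table name is hypothetical):

CREATE INDEX summ_ds_datex_15191 ON summ_15191 (datasource_id, datex);
ANALYZE summ_15191;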

HTH,
Igor Neyman




Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Igor Neyman
 

> -Original Message-
> From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
> Sent: Thursday, January 27, 2011 4:25 PM
> To: Igor Neyman
> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; 
> pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> 
> On Thu, Jan 27, 2011 at 2:18 PM, Igor Neyman 
>  wrote:
> >
> >> -Original Message-
> >> From: Scott Marlowe [mailto:scott.marl...@gmail.com]
> >> Sent: Thursday, January 27, 2011 4:16 PM
> >> To: Igor Neyman
> >> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; 
> >> pgsql-performance@postgresql.org
> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> >>
> >> On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman 
>  
> >> wrote:
> >> >
> >> >
> >> >> -Original Message-
> >> >> From: Scott Marlowe [mailto:scott.marl...@gmail.com]
> >> >> Sent: Thursday, January 27, 2011 3:59 PM
> >> >> To: Mladen Gogala
> >> >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; 
> >> >> pgsql-performance@postgresql.org
> >> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> >> >>
> >> >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala 
> >> >>  wrote:
> >> >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
> >> >> >>
> >> >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala 
> >> >> >>   wrote:
> >> >> >>>
> >> >> >>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use 
> an index.
> >> >> >>
> >> >> >> That's because Oracle has covering indexes.
> >> >> >>
> >> >> > I am not sure what you mean by "covering indexes" but I
> >> >> hope that for
> >> >> > the larger table I have in mind,  indexes will be used.
> >> >> For a small
> >> >> > table like
> >> >>
> >> >> In Oracle you can hit JUST the index to get the data 
> you need (and 
> >> >> maybe rollback logs, which are generally pretty small)
> >> >>
> >> >> In Pgsql, once you hit the index you must then hit the 
> actual data 
> >> >> store to get the right version of your tuple.  So, index
> >> access in pg
> >> >> is more expensive than in Oracle.  However, updates are cheaper.
> >> >> Always a trade off
> >> >>
> >> >>
> >> >
> >> > Scott,
> >> > What you describe here isn't about "covering indexes" -
> >> it's about different ways implementing MVCC in Oracle and PG.
> >>
> >> It is about covering indexes AND it's about the difference in how 
> >> MVCC is implemented in both databases.
> >>
> >>
> >
> > Well, Mladen's query doesn't involve covering indexes.
> 
> On Oracle?  Then how can it get the values it needs without 
> having to hit the data store?
> 
> 

It doesn't.
It does "INDEX UNIQUE SCAN" and then "TABLE ACCESS BY INDEX ROWID".



Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Igor Neyman
 

> -Original Message-
> From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
> Sent: Thursday, January 27, 2011 4:16 PM
> To: Igor Neyman
> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; 
> pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> 
> On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman 
>  wrote:
> >
> >
> >> -Original Message-
> >> From: Scott Marlowe [mailto:scott.marl...@gmail.com]
> >> Sent: Thursday, January 27, 2011 3:59 PM
> >> To: Mladen Gogala
> >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; 
> >> pgsql-performance@postgresql.org
> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> >>
> >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala 
> >>  wrote:
> >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
> >> >>
> >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala 
> >> >>   wrote:
> >> >>>
> >> >>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
> >> >>
> >> >> That's because Oracle has covering indexes.
> >> >>
> >> > I am not sure what you mean by "covering indexes" but I
> >> hope that for
> >> > the larger table I have in mind,  indexes will be used.
> >> For a small
> >> > table like
> >>
> >> In Oracle you can hit JUST the index to get the data you need (and 
> >> maybe rollback logs, which are generally pretty small)
> >>
> >> In Pgsql, once you hit the index you must then hit the actual data 
> >> store to get the right version of your tuple.  So, index 
> access in pg 
> >> is more expensive than in Oracle.  However, updates are cheaper.
> >> Always a trade off
> >>
> >>
> >
> > Scott,
> > What you describe here isn't about "covering indexes" - 
> it's about different ways implementing MVCC in Oracle and PG.
> 
> It is about covering indexes AND it's about the difference in 
> how MVCC is implemented in both databases.
> 
> 

Well, Mladen's query doesn't involve covering indexes.



Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Igor Neyman
 

> -Original Message-
> From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
> Sent: Thursday, January 27, 2011 3:59 PM
> To: Mladen Gogala
> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; 
> pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> 
> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala 
>  wrote:
> > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
> >>
> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala 
> >>   wrote:
> >>>
> >>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
> >>
> >> That's because Oracle has covering indexes.
> >>
> > I am not sure what you mean by "covering indexes" but I 
> hope that for 
> > the larger table I have in mind,  indexes will be used.  
> For a small 
> > table like
> 
> In Oracle you can hit JUST the index to get the data you need 
> (and maybe rollback logs, which are generally pretty small)
> 
> In Pgsql, once you hit the index you must then hit the actual 
> data store to get the right version of your tuple.  So, index 
> access in pg is more expensive than in Oracle.  However, 
> updates are cheaper.
> Always a trade off
> 
> 

Scott,
What you describe here isn't about "covering indexes" - it's about the 
different ways MVCC is implemented in Oracle and PG.

Mladen, 
you were right.
For a recursive query like yours, Oracle uses the index even on a small table.
I made an assumption without testing it.
However, some other (non-recursive) queries against the same small table that 
also require reading all 14 rows do a "table scan".

Regards,
Igor Neyman



Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Igor Neyman
 

> -Original Message-
> From: Mladen Gogala [mailto:mladen.gog...@vmsinfo.com] 
> Sent: Thursday, January 27, 2011 12:00 PM
> To: Tom Lane
> Cc: David Wilson; Kenneth Marshall; pgsql-performance@postgresql.org
> Subject: Re: Postgres 9.0 has a bias against indexes
> 
> On 1/27/2011 11:40 AM, Tom Lane wrote:
> > It is worth noting that EXPLAIN results should not be extrapolated
> >  to situations other than the one you are actually testing; for
> >  example, results on a toy-sized table cannot be 
> assumed to apply to
> >  large tables.
> Well, that's precisely what I tried. Bummer, I will have to 
> copy a large table over.
> 
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
> 
> 

Mladen,

I don't think this is exclusively a Postgres behavior.
I'm pretty sure the Oracle optimizer will do a "TABLE ACCESS (FULL)" instead
of using an index on a 14-row table as well.

Regards,
Igor Neyman



Re: [PERFORM] Real vs Int performance

2011-01-27 Thread Igor Neyman
 

> -Original Message-
> From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
> Sent: Wednesday, January 26, 2011 5:12 PM
> To: David Greco
> Cc: pgsql-performance@postgresql.org
> Subject: Re: Real vs Int performance 
> 
> David Greco  writes:
> > Came across a problem I find perplexing. I recreated the 
> dimensional 
> > tables in Oracle and the fields that are integers in Oracle became 
> > integers in Postgres. Was experiencing terrible performance 
> during the 
> > load and narrowed down to a particular dimensional lookup 
> problem. 
> ...
> ...
> ...
> ...
> In real life, this query is actually bound and parameterized,
> 
> In that case, an EXPLAIN using literal constants is next door 
> to useless in terms of telling you what will happen in real 
> life.  You need to pay attention to exactly how the 
> parameterization is done.  Again, I'm suspecting a wrong 
> datatype indication.
> 
>   regards, tom lane
> 

To see what happens with a parametrized query in "real life", you could try
the "auto_explain" contrib module.

Regards,
Igor Neyman



Re: [PERFORM] partitioning question 1

2010-10-29 Thread Igor Neyman
 

> -Original Message-
> From: Ben [mailto:midfi...@gmail.com] 
> Sent: Friday, October 29, 2010 12:16 PM
> To: Igor Neyman
> Cc: pgsql-performance@postgresql.org
> Subject: Re: partitioning question 1
> 
> On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote:
> 
> >> is my intuition completely off on this?
> >> 
> >> best regards, ben
> >> 
> > 
> > If your SELECT retrieves substantial amount of records, table scan 
> > could be more efficient than index access.
> > 
> > Now, if while retrieving large amount of records "WHERE clause" of 
> > this SELECT still satisfies constraints on some partition(s), then 
> > obviously one (or few) partition scans will be more efficient than 
> > full table scan of non-partitioned table.
> > 
> > So, yes partitioning provides performance improvements, not only 
> > maintenance convenience.
> 
> my impression was that a *clustered* index would give a lot 
> of the same I/O benefits, in a more flexible way.  if you're 
> clustered on the column in question, then an index scan for a 
> range is much like a sequential scan over a partition (as far 
> as i understand.)
> 
> b
> 

Even with a clustered index you still read index+table, which is more
expensive than just a table scan (in the situation I described above).
A PG clustered index is not the same as a SQL Server clustered index (which
keeps the actual table pages at the leaf level of the index).

Igor Neyman



Re: [PERFORM] partitioning question 1

2010-10-29 Thread Igor Neyman
> -Original Message-
> From: Ben [mailto:midfi...@gmail.com] 
> Sent: Thursday, October 28, 2010 12:37 PM
> To: pgsql-performance@postgresql.org
> Subject: partitioning question 1
> 
> hello --
> 
> my last email was apparently too long to respond to so i'll 
> split it up into shorter pieces.  my first question :
> 
> my understanding of how range partitioning and constraint 
> exclusion works leads me to believe that it does not buy any 
> query performance that a clustered index doesn't already give 
> you -- the advantages are all in maintainability.  an index 
> is able to eliminate pages just as well as constraint 
> exclusion is able to eliminate table partitions.  the I/O 
> advantages of having queries target small subtables are the 
> same as the I/O advantages of clustering the index : result 
> pages in a small range are very close to each other on disk.
> 
> finally, since constraint exclusion isn't as flexible as 
> indexing (i've seen old mailing list posts that say that 
> constraint exclusion only works with static constants in 
> where clauses, and only works with simple operators like >, < 
> which basically forces btree indexes when i want to use gist) 
> it is indeed likely that partitioning can be slower than one 
> big table with a clustered index.
> 
> is my intuition completely off on this?
> 
> best regards, ben
> 

If your SELECT retrieves a substantial amount of records, a table scan could
be more efficient than index access.

Now, if while retrieving a large amount of records the WHERE clause of this
SELECT still satisfies the constraints on some partition(s), then obviously
one (or a few) partition scans will be more efficient than a full table scan
of a non-partitioned table.

So, yes, partitioning provides performance improvements, not only
maintenance convenience.
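
A sketch of the kind of setup this refers to (table names and dates are
illustrative):

CREATE TABLE measurement (logdate date, value int);
CREATE TABLE measurement_2010_10
  (CHECK (logdate >= DATE '2010-10-01' AND logdate < DATE '2010-11-01'))
  INHERITS (measurement);
SET constraint_exclusion = partition;   -- the default since 8.4

-- the literal dates let the planner skip every partition whose CHECK
-- constraint contradicts the WHERE clause:
EXPLAIN SELECT * FROM measurement
WHERE logdate >= DATE '2010-10-05' AND logdate < DATE '2010-10-09';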

Regards,
Igor Neyman



Re: [PERFORM] Index scan is not working, why??

2010-10-21 Thread Igor Neyman
 

> -Original Message-
> From: AI Rumman [mailto:rumman...@gmail.com] 
> Sent: Thursday, October 21, 2010 1:25 AM
> To: pgsql-performance@postgresql.org
> Subject: Index scan is not working, why??
> 
> I don't know why seq scan is running on the following query 
> where the same query is giving index scan on other servers:
> explain analyze
> 
> select *
> from act
> where act.acttype in ( 'Meeting','Call','Task');
>   QUERY PLAN
> --
> --
> 
>  Seq Scan on act (cost=0.00..13386.78 rows=259671 width=142) 
> (actual time=0.013..484.572 rows=263639 loops=1)
>   Filter: (((acttype)::text = 'Meeting'::text) OR 
> ((acttype)::text = 'Call'::text) OR ((acttype)::text = 
> 'Task'::text))  Total runtime: 732.956 ms
> (3 rows)
> 
> 
> The above query is giving index scan on other servers and 
> even if I rewrite the query as follows I got index scan:
> explain analyze
> 
> select *
> from act
> where act.acttype = 'Meeting'
> or act.acttype = 'Call';
>   QUERY PLAN
> --
> --
> --
>  Bitmap Heap Scan on act (cost=17.98..1083.80 rows=2277 
> width=142) (actual time=1.901..9.722 rows=4808 loops=1)
>   Recheck Cond: (((acttype)::text = 'Meeting'::text) OR 
> ((acttype)::text = 'Call'::text))
>   -> BitmapOr (cost=17.98..17.98 rows=2281 width=0) (actual 
> time=1.262..1.262 rows=0 loops=1)
>   -> Bitmap Index Scan on act_acttype_idx (cost=0.00..8.99 
> rows=1141 width=0) (actual time=0.790..0.790 rows=3181 loops=1)
>   Index Cond: ((acttype)::text = 'Meeting'::text)
>   -> Bitmap Index Scan on act_acttype_idx (cost=0.00..8.99 
> rows=1141 width=0) (actual time=0.469..0.469 rows=1630 loops=1)
>   Index Cond: ((acttype)::text = 'Call'::text)  Total 
> runtime: 14.227 ms
> (8 rows)
> 
> 

"Index Scan" is not alwayes prefarable to "Seq Scan", it depends on
selectivity of your query.
When retrieving substancial portion of big table seq scan is usually
faster, that's why optimizer chooses it.

Your queries (and possibly data sets in the tables on different servers)
are not the same.
Your first query (which uses seq scan) returns 259671 which is probably
substantial part of the whole table.
Your second query (which uses index scan) returns only 4808 rows, which
makes index access less costly in this case.

Regards,
Igor Neyman



Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Igor Neyman

> -Original Message-
> From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] 
> Sent: Friday, October 15, 2010 2:14 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: oracle to psql migration - slow query in postgres
> 
> Thanks for all your responses. What's interesting is that an 
> index is used when this query is executed in Oracle.  It 
> appears to do some parallel processing:
> 
> SQL> set line 200
> delete from plan_table;
> explain plan for
> select websiteid, emailaddress
>   from members
>  where emailok = 1
>and emailbounced = 0;
> 
> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
> SQL> 
> 3 rows deleted.
> 
> SQL>   2345  
> Explained.
> 
> SQL> SQL> 
> PLAN_TABLE_OUTPUT
> --
> --
> --
> --
> Plan hash value: 4247959398
> 
> --
> -
> | Id  | Operation   | Name   | 
> Rows  | Bytes
> | Cost (%CPU)| Time |TQ  |IN-OUT| PQ Distrib |
> --
> -
> |   0 | SELECT STATEMENT||   237M|
> 7248M|   469K  (2)| 01:49:33 ||  ||
> |   1 |  PX COORDINATOR ||   |
> ||  ||  ||
> |   2 |   PX SEND QC (RANDOM)   | :TQ1   |   237M|
> 7248M|   469K  (2)| 01:49:33 |  Q1,00 | P->S | QC (RAND)  |
> |   3 |PX BLOCK ITERATOR||   237M|
> 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWC ||
> |*  4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 |   237M|
> 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWP ||
> --
> -
> 
> PLAN_TABLE_OUTPUT
> --
> --
> --
> --
> 
> Predicate Information (identified by operation id):
> ---
> 
>4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)
> 
> 16 rows selected.
> 
> 

1. Postgres doesn't have "FAST FULL SCAN", because even if all the info is in
the index, it still needs to visit the row in the table (the "visibility"
issue).

2. Postgres doesn't have parallel query execution.

BUT, it's free and has great community support, as you already saw.

Regards,
Igor Neyman



Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Igor Neyman
; tablespace | members_idx
> indexdef   | CREATE INDEX members_memberid_idx ON members USING btree
> (memberid)
> -[ RECORD
> 9 
> ]-
> --
> -
> tablename  | members
> indexname  | email_website_unq
> tablespace | members_idx
> indexdef   | CREATE UNIQUE INDEX email_website_unq ON members USING
> btree (emailaddress, websiteid)
> 
> 
> This table has also been vacuumed analyzed as well:
> 
> select * from pg_stat_all_tables where relname = 'members'; 
> -[ RECORD 1 ]+--
> relid| 3112786
> schemaname   | x
> relname  | members
> seq_scan | 298
> seq_tup_read | 42791828896
> idx_scan | 31396925
> idx_tup_fetch| 1083796963
> n_tup_ins| 291308316
> n_tup_upd    | 0
> n_tup_del| 4188020
> n_tup_hot_upd| 0
> n_live_tup   | 285364632
> n_dead_tup   | 109658
> last_vacuum  | 2010-10-12 20:26:01.227393-04
> last_autovacuum  | 
> last_analyze | 2010-10-12 20:28:01.105656-04
> last_autoanalyze | 2010-09-16 20:50:00.712418-04
> 
> 


Tony,
For your query:

> select 
> emailaddress, websiteid
>   from members
>  where emailok = 1
>and emailbounced = 0;

your table doesn't have any indexes where "emailok" or "emailbounced"
are leading columns.
That's why the existing indexes cannot be used.

If you specified "websiteid" in the "where" clause then (most probably)
the index members_sorted_idx_001 will be used (based on selectivity and
statistics known to optimizer). 

If this query (as is, without "websiteid") is important for your app, create
another index with (emailok, emailbounced) as leading columns, which should
help, provided the selectivity of your WHERE clause is good enough to avoid a
full table scan.
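
A sketch (the index names are made up):

CREATE INDEX members_emailok_bounced_idx ON members (emailok, emailbounced);

-- or, since the query always tests emailok = 1 AND emailbounced = 0,
-- a much smaller partial index:
CREATE INDEX members_mailable_idx ON members (websiteid, emailaddress)
WHERE emailok = 1 AND emailbounced = 0;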

Regards,
Igor Neyman



Re: [PERFORM] Identical query slower on 8.4 vs 8.3

2010-07-16 Thread Igor Neyman
 

> -Original Message-
> From: Patrick Donlin [mailto:pdon...@oaisd.org] 
> Sent: Thursday, July 15, 2010 11:13 AM
> To: Kevin Grittner; pgsql-performance@postgresql.org
> Subject: Re: Identical query slower on 8.4 vs 8.3
> 
> I'll read over that wiki entry, but for now here is the 
> EXPLAIN ANALYZE output assuming I did it correctly. I have 
> run vacuumdb --full --analyze,  it actually runs as a nightly 
> cron job.
> 
> 8.4.4 Sever:
> "Unique  (cost=202950.82..227521.59 rows=702022 width=86) 
> (actual time=21273.371..22429.511 rows=700536 loops=1)"
> "  ->  Sort  (cost=202950.82..204705.87 rows=702022 width=86) 
> (actual time=21273.368..22015.948 rows=700536 loops=1)"
> "Sort Key: test.tid, testresult.trscore, 
> testresult.trpossiblescore, testresult.trstart, 
> testresult.trfinish, testresult.trscorebreakdown, 
> testresult.fk_sid, test.tname, qr.qrscore, qr.qrtotalscore, 
> testresult.trid, qr.qrid"
> "Sort Method:  external merge  Disk: 71768kB"
> "->  Hash Join  (cost=2300.82..34001.42 rows=702022 
> width=86) (actual time=64.388..1177.468 rows=700536 loops=1)"
> "  Hash Cond: (qr.fk_trid = testresult.trid)"
> "  ->  Seq Scan on questionresult qr  
> (cost=0.00..12182.22 rows=702022 width=16) (actual 
> time=0.090..275.518 rows=702022 loops=1)"
> "  ->  Hash  (cost=1552.97..1552.97 rows=29668 
> width=74) (actual time=63.042..63.042 rows=29515 loops=1)"
> "->  Hash Join  (cost=3.35..1552.97 
> rows=29668 width=74) (actual time=0.227..39.111 rows=29515 loops=1)"
> "  Hash Cond: (testresult.fk_tid = test.tid)"
> "  ->  Seq Scan on testresult  
> (cost=0.00..1141.68 rows=29668 width=53) (actual 
> time=0.019..15.622 rows=29668 loops=1)"
> "  ->  Hash  (cost=2.60..2.60 rows=60 
> width=21) (actual time=0.088..0.088 rows=60 loops=1)"
> "->  Seq Scan on test  
> (cost=0.00..2.60 rows=60 width=21) (actual time=0.015..0.044 
> rows=60 loops=1)"
> "Total runtime: 22528.820 ms"
> 
> 8.3.7 Server:
> "Unique  (cost=202950.82..227521.59 rows=702022 width=86) 
> (actual time=22157.714..23343.461 rows=700536 loops=1)"
> "  ->  Sort  (cost=202950.82..204705.87 rows=702022 width=86) 
> (actual time=22157.706..22942.018 rows=700536 loops=1)"
> "Sort Key: test.tid, testresult.trscore, 
> testresult.trpossiblescore, testresult.trstart, 
> testresult.trfinish, testresult.trscorebreakdown, 
> testresult.fk_sid, test.tname, qr.qrscore, qr.qrtotalscore, 
> testresult.trid, qr.qrid"
> "Sort Method:  external merge  Disk: 75864kB"
> "->  Hash Join  (cost=2300.82..34001.42 rows=702022 
> width=86) (actual time=72.842..1276.634 rows=700536 loops=1)"
> "  Hash Cond: (qr.fk_trid = testresult.trid)"
> "  ->  Seq Scan on questionresult qr  
> (cost=0.00..12182.22 rows=702022 width=16) (actual 
> time=0.112..229.987 rows=702022 loops=1)"
> "  ->  Hash  (cost=1552.97..1552.97 rows=29668 
> width=74) (actual time=71.421..71.421 rows=29515 loops=1)"
> "->  Hash Join  (cost=3.35..1552.97 
> rows=29668 width=74) (actual time=0.398..44.524 rows=29515 loops=1)"
> "  Hash Cond: (testresult.fk_tid = test.tid)"
> "  ->  Seq Scan on testresult  
> (cost=0.00..1141.68 rows=29668 width=53) (actual 
> time=0.117..20.890 rows=29668 loops=1)"
> "  ->  Hash  (cost=2.60..2.60 rows=60 
> width=21) (actual time=0.112..0.112 rows=60 loops=1)"
> "->  Seq Scan on test  
> (cost=0.00..2.60 rows=60 width=21) (actual time=0.035..0.069 
> rows=60 loops=1)"
> "Total runtime: 23462.639 ms"
> 
> 
> Thanks for the quick responses and being patient with me not 
> providing enough information.
> -Patrick
> 

Well, now that you've got similar runtimes on both 8.4.4 and 8.3.7, here
is a suggestion to improve the performance of this query, based on the EXPLAIN
ANALYZE you provided (you should have included it in your first e-mail).

EXPLAIN ANALYZE shows that most of the time (22015 ms on 8.4.4) is spent
sorting your result set.
And according to this: "Sort Method:  external merge  Disk: 71768kB" -
the sort is done on disk, meaning your work_mem setting is not
sufficient to do this sort in memory (I didn't go back through this
thread far enough to see whether you provided info on how it is set).

I'd suggest increasing the value to ~80MB - if not for the whole system,
then maybe just for the session running this query.
Then see if performance improves.
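
For example (session-level, sized a bit above the 71768kB spill reported in
the sort node):

SET work_mem = '80MB';
-- then re-run the query under EXPLAIN ANALYZE; the sort should report
-- "Sort Method: quicksort" instead of "external merge  Disk: ..."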

And, with query performance issues always start with EXPLAIN ANALYZE.

Regards,
Igor Neyman 



Re: [PERFORM] Internal operations when the planner makes a hash join.

2010-02-24 Thread Igor Neyman
 

> -Original Message-
> From: negora [mailto:neg...@negora.com] 
> Sent: Tuesday, February 23, 2010 4:33 PM
> To: Scott Carey
> Cc: Alvaro Herrera; pgsql-performance@postgresql.org
> Subject: Re: Internal operations when the planner makes a hash join.
> 
> Thank you for explaining the internal behaviour of the 
> PostgreSQL engine. I'll try to find more information 
> about those hash tables. It sounds really interesting. 
> Your information was very useful.
> 
> The origin of my doubt is that I need to do 
> a join between 3 HUGE tables (millions of rows) and perform 
> certain operations on the retrieved information. I was 
> deciding whether to use one SELECT with 3 JOINs, as I've been 
> doing since the beginning, or to build a PL/pgSQL function based 
> on 3 nested "FOR ... IN SELECT ... LOOP" 
> structures which try to minimize the subsequent table 
> searches by storing intermediate useful data in arrays 
> (curiously, these would act as the hash tables which you 
> mention, but in a very rudimentary way). In a case like 
> this one (possibly unable to fit in RAM), is a JOIN still the 
> best solution?
> 
> Since I have to retrieve such a big number of columns and 
> joined rows, I had started to think that using 1 SELECT 
> with 3 JOINs would increase the number of table searches a 
> LOT and also "duplicate" the information too much. I mean 
> "duplicate" as in this case, where Factor 1 appears 
> millions of times, once for every Element:
> 
> Element 1 | Sub-factor 1 | Factor 1
> Element 2 | Sub-factor 1 | Factor 1
> ...
> Element 12639747465586 | Sub-factor 1 | Factor 1 Element 1 | 
> Sub-factor 2 | Factor 1
> 
> I hope not to take up too much of your time, but... what do you think 
> about it? Is it better to use 1 SELECT with 3 JOINs or to build 
> nested "FOR ... IN SELECT ... LOOP" structures? Could it be 
> one of those cases in which I have to choose between either 
> higher speed but higher memory consumption (3
> JOINs) or lower speed but less memory expense (3 FORs)?
> 
> Thanks again, and apologies for extending this topic too much.
> 
> 
> Scott Carey wrote:
> > On Feb 23, 2010, at 8:53 AM, Alvaro Herrera wrote:
> >
> >   
> >> negora wrote:
> >>
> >> 
> >>> According to how I understood the process, the engine 
> would get the 
> >>> name from the student with ID 1 and would look for the 
> name of the 
> >>> father with ID 1 in the hashed table. It'd do exactly the 
> same with 
> >>> the student #2 and father #2. But my big doubt is about 
> the 3rd one 
> >>> (Anthony). Would the engine "know" that it already had 
> retrieved the 
> >>> father's name for the student 1 and would avoid searching for it 
> >>> into the hashed table (using some kind of internal 
> mechanism which 
> >>> allows to "re-utilize" the name)? Or would it search into 
> the hashed 
> >>> table again?
> >>>   
> >> The hash table is searched again.  But that's fast, because it's a 
> >> hash table.
> >>
> >> 
> >
> > To answer the question another way, "remembering" that it 
> has already seen father A once and tracking that would use a 
> hash table to remember that fact.  
> >
> > The hash table created by the first scan IS the "remember 
> you have seen this father" data structure, optimized for fast 
> lookup.  So before even looking at the first student, the 
> hash table is built so that it is fast to find out if a 
> father has been seen before, and if so where that father's 
> data is located.  Looking this data up is often referred to 
> as a "probe" and not a "scan" because it takes just as long 
> to do if the hash table has 100 entries or 1,000,000 entries.  
> The drawback is that the whole thing has to fit in RAM.
> >
> >
> >   
> >> -- 
> >> Alvaro Herrera    
> http://www.CommandPrompt.com/
> >> The PostgreSQL Company - Command Prompt, Inc.
> >>
> >> --
> >> Sent via pgsql-performance mailing list 
> >> (pgsql-performance@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
> >> 
> >
> >
> >   
> 

So, you are trying to do a "nested loop" in PL/pgSQL.
Why not let the optimizer decide between a "nested loop" and a "hash join",
based on your memory settings and the statistics collected for the objects
involved?
I'm pretty sure it'll be faster than 3 nested loops in PL/pgSQL.
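
A sketch of the single-statement alternative, using the Element/Sub-factor/
Factor example from your message (all names are hypothetical):

SELECT e.*, sf.*, f.*
FROM element e
JOIN sub_factor sf ON sf.id = e.sub_factor_id
JOIN factor f ON f.id = sf.factor_id;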

Igor Neyman
