On Thu, Jun 11, 2015 at 7:18 PM, Sasa Vilic sasavi...@gmail.com wrote:
Hi,
I have a query that takes ridiculously long to complete (over 500ms) but if
I disable nested loop it does it really fast (24.5ms)
Here are links for
* first request (everything enabled): http://explain.depesz.com/s/Q1M
* second request (nested loop disabled): http://explain.depesz.com/s/9ZY
I have also noticed, that setting
set join_collapse_limit = 1;
produces similar results as when nested loops are disabled.
Autovacuumm is running, and I did manually performed both: analyze and
vacuumm analyze. No effect.
I tried increasing statistics for columns (slot, path_id, key) to 5000 for
table data. No effect.
I tried increasing statistics for columns (id, parent, key) to 5000 for
table path. No effect.
I can see, that postgres is doing wrong estimation on request count, but I
can't figure it out why.
Table path is used to represent tree-like structure.
== QUERY ==
SELECT p1.value as request_type, p2.value as app_id, p3.value as app_ip,
p3.id as id, data.*, server.name
FROM data
INNER JOIN path p3 ON data.path_id = p3.id
INNER JOIN server on data.server_id = server.id
INNER JOIN path p2 on p2.id = p3.parent
INNER JOIN path p1 on p1.id = p2.parent
WHERE data.slot between '2015-02-18 00:00:00' and '2015-02-19 00:00:00'
AND p1.key = 'request_type' AND p2.key = 'app_id' AND p3.key = 'app_ip'
;
== TABLES ==
Table public.path
Column | Type | Modifiers | Storage |
Description
+---+---+--+-
id | integer | not null default
nextval('path_id_seq'::regclass) | plain|
parent | integer | |
plain|
key| character varying(25) | not null
| extended |
value | character varying(50) | not null
| extended |
Indexes:
path_pkey PRIMARY KEY, btree (id)
path_unique UNIQUE CONSTRAINT, btree (parent, key, value)
Foreign-key constraints:
path.fg.parent-path(id) FOREIGN KEY (parent) REFERENCES path(id)
Referenced by:
TABLE data CONSTRAINT data_fkey_path FOREIGN KEY (path_id)
REFERENCES path(id)
TABLE path CONSTRAINT path.fg.parent-path(id) FOREIGN KEY (parent)
REFERENCES path(id)
Has OIDs: no
Table public.data
Column | Type | Modifiers | Storage |
Description
---++---+--+-
slot | timestamp(0) without time zone | not null | plain|
server_id | integer| not null | plain|
path_id | integer| not null | plain|
key | character varying(50) | not null | extended |
value | real | not null | plain|
Indexes:
data_pkey PRIMARY KEY, btree (slot, server_id, path_id, key)
Foreign-key constraints:
data_fkey_path FOREIGN KEY (path_id) REFERENCES path(id)
Has OIDs: no
svilic= select count(*) from path;
count
---
603
svilic= select count(*) from path p1 inner join path p2 on p1.id =
p2.parent inner join path p3 on p2.id = p3.parent where p1.parent is null;
count
---
463
svilic= select count(*) from server;
count
---
37
svilic= select count(*) from data;
count
--
23495552
svilic= select version();
version
-
PostgreSQL 9.1.17 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
== SERVER CONFIGURATION ==
shared_buffers = 512MB
work_mem = 8MB (I have tried changing it to 32, 128 and 512, no effect)
maintenance_work_mem = 64MB
checkpoint_segments = 100
random_page_cost = 4.0
effective_cache_size = 3072MB
== HARDWARE CONFIGURATION ==
cpu: Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz (4 cores)
mem: 8GB
system is using regular disks, (no raid and no ssd)
huh. the query looks pretty clean (except for possible overuse of
surrogate keys which tend to exacerbate planning issues in certain
cases).
Let's try cranking statistics on data.path_id, first to 1000 and then
to 1 and see how it affects the plan. The database is clearly
misestimating row counts on that join.
merlin
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance