[PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Ian Pushee

Hi Folks,

This is my first time posting here, so hopefully I manage to convey all 
the information needed.
We have a simple query that just started giving us problems in 
production when the number of rows gets too large (100k).
The issue seems to be that the planner wants to sort the rows using a 
sequential scan, rather than the index provided specifically for this 
query. This isn't a problem with low numbers of rows, but eventually the 
query outgrows work_mem and uses the disk, slowing does the query 
greatly. I know the common answer is to increase work_mem... but since 
this tables growth is unpredictable, that isn't a viable strategy.
I've tried increasing shared_buffers and effective_cache_size, but that 
doesn't appear to effect the plan chosen here. Setting 
random_page_cost=1.0 works, but I'm hoping for a more general solution 
that doesn't require setting that locally each time I run the query. I 
guess my real question is wether or not there is any way to get the 
planner to take into account the fact that it's going to need to do an 
'external merge', and that it is going to take a LONG time?


Table and Index Schemas:
CREATE TABLE events
(
  id serial NOT NULL,
  name character varying(64),
  eventspy_id integer NOT NULL,
  camera_id integer NOT NULL,
  start_time timestamp without time zone NOT NULL,
  millisecond smallint NOT NULL,
  uid smallint NOT NULL,
  update_time timestamp without time zone NOT NULL DEFAULT now(),
  length integer NOT NULL,
  objects text NOT NULL,
  priority smallint NOT NULL,
  type character varying(45) NOT NULL DEFAULT 'alarm'::character varying,
  status event_status NOT NULL DEFAULT 'new'::event_status,
  confidence smallint NOT NULL DEFAULT 100::smallint,
  CONSTRAINT events_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX events_confidnce
  ON events
  USING btree
  (confidence);

CREATE INDEX events_summary
  ON events
  USING btree
  (name COLLATE pg_catalog.default, eventspy_id, camera_id, type 
COLLATE pg_catalog.default, status);


Query:
SELECT name, type, eventspy_id, camera_id, status, COUNT(id), 
MAX(update_time), MIN(start_time), MAX(start_time) FROM events WHERE 
confidence=0 GROUP BY name, eventspy_id, camera_id, type, status;


Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

Thanks for any help you can provide,
-Ian Pushee



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Ian Pushee



On 6/19/2015 10:46 AM, Igor Neyman wrote:


Probably events_confidnce index is not very selective, that's why optimizer 
prefers seq scan.
I'd try to create an index on (name, eventspy_id, camera_id, type, status).

Also, the recent 9.2 is 9.2.13, you should upgrade.

Regards,
Igor Neyman


Hi Igor,

I already have an index for (name, eventspy_id, camera_id, type, 
status)... that is the index being used (apparently silently) when I set 
random_page_cost=1.0.


Thanks,
-Ian


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Ian Pushee



On 6/19/2015 10:47 AM, Andreas Kretschmer wrote:

Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

increase work_mem. per session via set work_mem  = 'xxxMB'; or in
postgresql.conf, reload.




Hi Andreas,

The number of rows in the events table isn't constrained, so 
unfortunately it isn't feasible to set work_mem high enough to allow an 
in-memory sort. Forcing the planner to use the index works to produce a 
fast query, so I'm wondering if there is a more general way to getting 
the planner to take into account that work_mem isn't big enough to fit 
the query which will result in a MUCH more costly external merge.


Thanks,
-Ian


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance