Re: [pgsql-es-ayuda] join super lento

Eduardo Morras Mon, 22 Feb 2016 10:48:07 -0800

On Mon, 22 Feb 2016 09:55:21 -0500
Hellmuth Vargas <[email protected]> wrote:


> Hola Lista
> Les tengo el siguiente desafio pues no he podido dar con el tema,
> tengo dos tablas
> 
> CREATE TABLE ath_tecnicosv2
> (
>   descripcionmovimiento character varying(160),
>   fechamto date,
>   horamto character varying(8),
>   fechaact character varying(8),
>   horaact character varying(8),
>   usuario character varying(50),
>   identificacionusuario bigint,
>   tipomovimientosusuario character varying(25),
>   id bigserial NOT NULL
> )
> WITH (
>   OIDS=FALSE
> );
> 
> -- tamaño: 1639200 registros
> 
> CREATE UNIQUE INDEX idx_u_ath_tecnicosv2_id
>   ON ath_tecnicosv2
>   USING btree
>   (id);
> 
> CREATE INDEX idx_ath_tecnicosv2_comp
>   ON ath_tecnicosv2
>   USING btree
>   (fechamto, identificacionusuario, descripcionmovimiento COLLATE
> pg_catalog."default", horamto COLLATE pg_catalog."default", fechaact
> COLLATE pg_catalog."default", horaact COLLATE pg_catalog."default",
> usuario COLLATE pg_catalog."default", tipomovimientosusuario COLLATE
> pg_catalog."default");
> 
> CREATE INDEX idx_ath_tecnicosv2_fecha2
>   ON public.ath_tecnicosv2
>   USING btree
>   (fechamto asc, identificacionusuario asc,id);
> 
> 
> CREATE TABLE ath_cajerosv2
> (
>   fecha date,
>   nombre character varying(60),
>   regionalath character varying(24),
>   regionaltdv character varying(54),
>   tiposolicitud character varying(10),
>   id_usuario character varying(4),
>   identificacion bigint,
>   usuario character varying(36),
>   cant bigint,
>   id bigserial NOT NULL
> )
> WITH (
>   OIDS=FALSE
> );
> 
> -- tamaño: 132050 registros
> 
> CREATE UNIQUE INDEX idx_u_ath_cajerosv2_id
>   ON ath_cajerosv2
>   USING btree
>   (id);
> 
> CREATE INDEX idx_ath_cajerosv2_comp
>   ON ath_cajerosv2
>   USING btree
>   (fecha, identificacion, nombre COLLATE pg_catalog."default",
> regionalath COLLATE pg_catalog."default", regionaltdv COLLATE
> pg_catalog."default", tiposolicitud COLLATE pg_catalog."default");
> 
> CREATE INDEX idx_ath_cajerosv2_fecha2
>   ON public.ath_cajerosv2
>   USING btree
>   (fecha asc, identificacion asc,id);
> 
> -- consulta Básica Inicial:
> 
> select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact,
> t.horaact, t.usuario, t.identificacionusuario,
> t.tipomovimientosusuario, c.fecha, c.nombre, c.regionalath,
> c.regionaltdv, c.tiposolicitud, 1 as cant from public.ath_tecnicosv2 t
> join public.ath_cajerosv2 c on
> t.identificacionusuario=c.identificacion and t.fechamto=c.fecha
> where fecha >= '20160218' and fechamto >= '20160218'
> group by  t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact,
> t.horaact, t.usuario, t.identificacionusuario,
> t.tipomovimientosusuario, c.fecha, c.nombre, c.regionalath,
> c.regionaltdv, c.tiposolicitud,1

Exactamente en Postgres no se, en otras SGBD el PRIMARY KEY es la primera 
columna por que la implementacion sabe donde comienza cada fila, y por tanto 
calcula rapidamente donde esta el valor del primary key. En tu caso tiene que 
ir al final de cada fila, que es variable, para comparar el id. Pon el id 
bigserial como primera columna, y crea explicitamente un PRIMARY KEY sobre el. 
Ademas, como el join lo vas a hacer sobre 
(t.identificacionusuario=c.identificacion and t.fechamto=c.fecha), yo crearia 
dos indices, uno para tabla con dichas columnas. Un indice adicional sobre 
ath_tecnicosv2 con (t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, 
t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario) hara 
que el group by sea mas rapido. Dado que las columnas tienen un primary key, no 
hace falta añadir id al final de cada indice (lo has puesto en algunos y no en 
otros) si no que Postgres lo hace automaticamente.

> 
> Entonces no se pueda ser!!!
> 
> incluso (a manera de comentario) replantee la consultas asi y tampoco:
> 
> with base1 as (
> select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact,
> t.horaact, t.usuario, t.identificacionusuario,
> t.tipomovimientosusuario from (
> select t.identificacionusuario,t.fechamto from public.ath_tecnicosv2 t
>  where fechamto >= '20160218'
> intersect
> select c.identificacion,c.fecha from public.ath_cajerosv2 c  where
> fecha >= '20160218'
> ) as x
> join public.ath_tecnicosv2 t on
> (t.identificacionusuario,t.fechamto)=
> (x.identificacionusuario,x.fechamto) ), base2 as (
> select x.identificacionusuario,x.fechamto,c.fecha, c.nombre,
> c.regionalath, c.regionaltdv, c.tiposolicitud
> from (
> select t.identificacionusuario,t.fechamto from public.ath_tecnicosv2 t
>  where fechamto >= '20160218'
> intersect
> select c.identificacion,c.fecha from public.ath_cajerosv2 c  where
> fecha >= '20160218'
> ) as x
> join public.ath_cajerosv2 c  on
>  (c.identificacion,c.fecha)=(x.identificacionusuario,x.fechamto)
> )
> select u.descripcionmovimiento, u.fechamto, u.horamto, u.fechaact,
> u.horaact, u.usuario, u.identificacionusuario,
> u.tipomovimientosusuario, u.fecha, u.nombre, u.regionalath,
> u.regionaltdv, u.tiposolicitud, 1 as cant from (
> select
> x.descripcionmovimiento, x.fechamto, x.horamto, x.fechaact, x.horaact,
> x.usuario, x.identificacionusuario, x.tipomovimientosusuario, y.fecha,
> y.nombre, y.regionalath, y.regionaltdv,
> y.tiposolicitud,y.identificacionusuario as
> identificacionusuario2,y.fechamto as fechamto2
> from base1 as x
> cross join  base2 as y
> ) as u where u.identificacionusuario=u.identificacionusuario2 and
> u.fechamto=u.fechamto2
> order by 2,3,7
> 
> 
> ---
> 
> Sort  (cost=8260.00..8260.48 rows=961 width=1010)
>   Sort Key: x.fechamto, x.horamto, x.identificacionusuario
>   CTE base1
>     ->  Nested Loop  (cost=0.19..3929.54 rows=21842 width=123)
>           ->  Subquery Scan on x_1  (cost=0.08..1528.40 rows=533
> width=12) ->  HashSetOp Intersect  (cost=0.08..1526.27 rows=533
> width=12)
>                       ->  Append  (cost=0.08..1422.83 rows=103438
> width=12) ->  Subquery Scan on "*SELECT* 2"
>  (cost=0.08..125.87 rows=9072 width=12)
>                                   ->  Index Only Scan using
> idx_ath_cajerosv2_fecha on ath_cajerosv2 c  (cost=0.08..89.58
> rows=9072 width=12)
>                                         Index Cond: (fecha >=
> '2016-02-18'::date)
>                             ->  Subquery Scan on "*SELECT* 1"
>  (cost=0.09..1296.96 rows=94366 width=12)
>                                   ->  Index Only Scan using
> idx_ath_tecnicosv2_fecha on ath_tecnicosv2 t_1  (cost=0.09..919.50
> rows=94366 width=12)
>                                         Index Cond: (fechamto >=
> '2016-02-18'::date)
>           ->  Index Only Scan using idx_ath_tecnicosv2_comp on
> ath_tecnicosv2 t  (cost=0.11..4.08 rows=107 width=123)
>                 Index Cond: ((fechamto = x_1.fechamto) AND
> (identificacionusuario = x_1.identificacionusuario))
>   CTE base2
>     ->  Nested Loop  (cost=0.17..2175.54 rows=1760 width=68)
>           ->  Subquery Scan on x_2  (cost=0.08..1528.40 rows=533
> width=12) ->  HashSetOp Intersect  (cost=0.08..1526.27 rows=533
> width=12)
>                       ->  Append  (cost=0.08..1422.83 rows=103438
> width=12) ->  Subquery Scan on "*SELECT* 2_1"
>  (cost=0.08..125.87 rows=9072 width=12)
>                                   ->  Index Only Scan using
> idx_ath_cajerosv2_fecha on ath_cajerosv2 c_2  (cost=0.08..89.58
> rows=9072 width=12)
>                                         Index Cond: (fecha >=
> '2016-02-18'::date)
>                             ->  Subquery Scan on "*SELECT* 1_1"
>  (cost=0.09..1296.96 rows=94366 width=12)
>                                   ->  Index Only Scan using
> idx_ath_tecnicosv2_fecha on ath_tecnicosv2 t_2  (cost=0.09..919.50
> rows=94366 width=12)
>                                         Index Cond: (fechamto >=
> '2016-02-18'::date)
>           ->  Index Only Scan using idx_ath_cajerosv2_comp on
> ath_cajerosv2 c_1  (cost=0.08..1.15 rows=17 width=64)
>                 Index Cond: ((fecha = x_2.fechamto) AND
> (identificacion = x_2.identificacionusuario))
>   ->  Hash Join  (cost=22.88..2145.40 rows=961 width=1010)
>         Hash Cond: ((x.identificacionusuario =
> y.identificacionusuario) AND (x.fechamto = y.fechamto))
>         ->  CTE Scan on base1 x  (cost=0.00..174.74 rows=21842
> width=638) ->  Hash  (cost=14.08..14.08 rows=1760 width=384)
>               ->  CTE Scan on base2 y  (cost=0.00..14.08 rows=1760
> width=384)
> 

Creo que estas complicando en exceso o has intentado optimizar las tablas y 
consultas antes de tiempo.


> He cambiado la estructura de las tablas: antes todos los campos eran
> del tipo TEXT y por lo tanto la tablas tenían ademas tablas TOAST y
> supuse que el JOIN con estas era lo que estaba penalizando.
> 
> Las especificaciones, es un  PostgreSQL 9.4.5, corriendo en un CentOS
> 6 con discos de estado solido. 12 GB de RAM, 12

Deberia hacerlo sin problemas. No creo que el problema sea por falta de 
"hierro".

> Lista les agradezco sus comentratios el ideas de que puede ser el
> probelma.
> 
> -- 
> Cordialmente,
> 
> Ing. Hellmuth I. Vargas S.
> Esp. Telemática y Negocios por Internet
> Oracle Database 10g Administrator Certified Associate
> EnterpriseDB Certified PostgreSQL 9.3 Associate


---   ---
Eduardo Morras <[email protected]>

-
Enviado a la lista de correo pgsql-es-ayuda ([email protected])
Para cambiar tu suscripción:
http://www.postgresql.org/mailpref/pgsql-es-ayuda

Re: [pgsql-es-ayuda] join super lento

Responder a