RE: [pgsql-es-ayuda] join super lento

Mario Soto Cordones Mon, 22 Feb 2016 11:41:33 -0800

Para colocar ese valor tan alto, debes considerar tu max_connections


El valor que estás colocando es muy alto 

 

 

De: [email protected] 
[mailto:[email protected]] En nombre de Hellmuth Vargas
Enviado el: lunes, 22 de febrero de 2016 16:35
Para: Eduardo Morras <[email protected]>
CC: Lista Postgres ES <[email protected]>
Asunto: Re: [pgsql-es-ayuda] join super lento

 

 

 

Hola Antony

 

Pues, restaure indices y  lleve el work_mem  hasta 4096 MB (la tercera parte de 
la RAM del servidor) y pasaron 4 minutos y nada.

 

set local work_mem='4096 MB'

 

 

select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, t.horaact, 
t.usuario, t.identificacionusuario, t.tipomovimientosusuario, c.fecha, 
c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud, 1 as cant 

from public.ath_tecnicosv2 t 

left join public.ath_cajerosv2 c on t.identificacionusuario=c.identificacion 
and t.fechamto=c.fecha 

where fecha >= '20160218' and fechamto >= '20160218' 

group by  t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, 
t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario, 
c.fecha, c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud,1

 

 

 

 

----

Por otro lado Eduardo, 

 

Elimine todos los indices y cree los que sugiere (aunque el ExPLAIN me indica  
crear indices sobre las fechas antes que  otros  atributos):

 

CREATE INDEX idx_ath_cajerosv2_comp

  ON ath_cajerosv2

  USING btree

  (identificacion, fecha);

 

 

CREATE INDEX idx_ath_tecnicosv2_comp

  ON ath_tecnicosv2

  USING btree

  (identificacionusuario, fechamto);

 

 

CREATE INDEX idx_ath_tecnicosv2_comp2

  ON ath_tecnicosv2

  USING btree

  (descripcionmovimiento COLLATE pg_catalog."default", fechamto, horamto 
COLLATE pg_catalog."default", fechaact COLLATE pg_catalog."default", horaact 
COLLATE pg_catalog."default", usuario COLLATE pg_catalog."default", 
identificacionusuario, tipomovimientosusuario COLLATE pg_catalog."default");

 

 

 

Eso genero este plan de ejecución:

 

 

Group  (cost=96536.23..101779.31 rows=749012 width=179)

  ->  Sort  (cost=96536.23..96910.74 rows=749012 width=179)

        Sort Key: t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, 
t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario, 
c.fecha, c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud

        ->  Nested Loop  (cost=0.09..42932.64 rows=749012 width=179)

              ->  Seq Scan on ath_cajerosv2 c  (cost=0.00..2952.22 rows=9072 
width=64)

                    Filter: (fecha >= '2016-02-18'::date)

              ->  Index Scan using idx_ath_tecnicosv2_comp on ath_tecnicosv2 t  
(cost=0.09..4.38 rows=6 width=123)

                    Index Cond: ((identificacionusuario = c.identificacion) AND 
(fechamto = c.fecha) AND (fechamto >= '2016-02-18'::date))

 

 

 

Ejecute el EXPLAIN ANALYZE, espere 3 minutos y nada ...

 

En cuanto al PRIMARY KEY tengo entendido que un indice unique sobre el campo ID 
 brinda los mismos  resultados y que no tendría importancia la posición de la 
columna respecto  a las demás al menos en PostgreSQL (si me equivoco me 
corrigen por favor), Incluso las columnas ID yo las agregue 'artificialmente' 
pues originalmente no las tenia las tablas, le agregue estos campos con el 
propósito de probar los JOIN solo con ID.

 

 

El 22 de febrero de 2016, 13:47, Eduardo Morras<[email protected] 
<mailto:[email protected]> > escribió:

On Mon, 22 Feb 2016 09:55:21 -0500
Hellmuth Vargas <[email protected] <mailto:[email protected]> > wrote:

> Hola Lista
> Les tengo el siguiente desafio pues no he podido dar con el tema,
> tengo dos tablas
>
> CREATE TABLE ath_tecnicosv2
> (
>   descripcionmovimiento character varying(160),
>   fechamto date,
>   horamto character varying(8),
>   fechaact character varying(8),
>   horaact character varying(8),
>   usuario character varying(50),
>   identificacionusuario bigint,
>   tipomovimientosusuario character varying(25),
>   id bigserial NOT NULL
> )
> WITH (
>   OIDS=FALSE
> );
>
> -- tamaño: 1639200 registros
>
> CREATE UNIQUE INDEX idx_u_ath_tecnicosv2_id
>   ON ath_tecnicosv2
>   USING btree
>   (id);
>
> CREATE INDEX idx_ath_tecnicosv2_comp
>   ON ath_tecnicosv2
>   USING btree
>   (fechamto, identificacionusuario, descripcionmovimiento COLLATE
> pg_catalog."default", horamto COLLATE pg_catalog."default", fechaact
> COLLATE pg_catalog."default", horaact COLLATE pg_catalog."default",
> usuario COLLATE pg_catalog."default", tipomovimientosusuario COLLATE
> pg_catalog."default");
>
> CREATE INDEX idx_ath_tecnicosv2_fecha2
>   ON public.ath_tecnicosv2
>   USING btree
>   (fechamto asc, identificacionusuario asc,id);
>
>
> CREATE TABLE ath_cajerosv2
> (
>   fecha date,
>   nombre character varying(60),
>   regionalath character varying(24),
>   regionaltdv character varying(54),
>   tiposolicitud character varying(10),
>   id_usuario character varying(4),
>   identificacion bigint,
>   usuario character varying(36),
>   cant bigint,
>   id bigserial NOT NULL
> )
> WITH (
>   OIDS=FALSE
> );
>
> -- tamaño: 132050 registros
>
> CREATE UNIQUE INDEX idx_u_ath_cajerosv2_id
>   ON ath_cajerosv2
>   USING btree
>   (id);
>
> CREATE INDEX idx_ath_cajerosv2_comp
>   ON ath_cajerosv2
>   USING btree
>   (fecha, identificacion, nombre COLLATE pg_catalog."default",
> regionalath COLLATE pg_catalog."default", regionaltdv COLLATE
> pg_catalog."default", tiposolicitud COLLATE pg_catalog."default");
>
> CREATE INDEX idx_ath_cajerosv2_fecha2
>   ON public.ath_cajerosv2
>   USING btree
>   (fecha asc, identificacion asc,id);
>
> -- consulta Básica Inicial:
>
> select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact,
> t.horaact, t.usuario, t.identificacionusuario,
> t.tipomovimientosusuario, c.fecha, c.nombre, c.regionalath,
> c.regionaltdv, c.tiposolicitud, 1 as cant from public.ath_tecnicosv2 t
> join public.ath_cajerosv2 c on
> t.identificacionusuario=c.identificacion and t.fechamto=c.fecha
> where fecha >= '20160218' and fechamto >= '20160218'
> group by  t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact,
> t.horaact, t.usuario, t.identificacionusuario,
> t.tipomovimientosusuario, c.fecha, c.nombre, c.regionalath,
> c.regionaltdv, c.tiposolicitud,1

Exactamente en Postgres no se, en otras SGBD el PRIMARY KEY es la primera 
columna por que la implementacion sabe donde comienza cada fila, y por tanto 
calcula rapidamente donde esta el valor del primary key. En tu caso tiene que 
ir al final de cada fila, que es variable, para comparar el id. Pon el id 
bigserial como primera columna, y crea explicitamente un PRIMARY KEY sobre el. 
Ademas, como el join lo vas a hacer sobre 
(t.identificacionusuario=c.identificacion and t.fechamto=c.fecha), yo crearia 
dos indices, uno para tabla con dichas columnas. Un indice adicional sobre 
ath_tecnicosv2 con (t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, 
t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario) hara 
que el group by sea mas rapido. Dado que las columnas tienen un primary key, no 
hace falta añadir id al final de cada indice (lo has puesto en algunos y no en 
otros) si no que Postgres lo hace automaticamente.

>
> Entonces no se pueda ser!!!
>
> incluso (a manera de comentario) replantee la consultas asi y tampoco:
>
> with base1 as (
> select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact,
> t.horaact, t.usuario, t.identificacionusuario,
> t.tipomovimientosusuario from (
> select t.identificacionusuario,t.fechamto from public.ath_tecnicosv2 t
>  where fechamto >= '20160218'
> intersect
> select c.identificacion,c.fecha from public.ath_cajerosv2 c  where
> fecha >= '20160218'
> ) as x
> join public.ath_tecnicosv2 t on
> (t.identificacionusuario,t.fechamto)=
> (x.identificacionusuario,x.fechamto) ), base2 as (

> select x.identificacionusuario,x.fechamto,c.fecha, c.nombre,
> c.regionalath, c.regionaltdv, c.tiposolicitud
> from (
> select t.identificacionusuario,t.fechamto from public.ath_tecnicosv2 t
>  where fechamto >= '20160218'
> intersect
> select c.identificacion,c.fecha from public.ath_cajerosv2 c  where
> fecha >= '20160218'
> ) as x
> join public.ath_cajerosv2 c  on
>  (c.identificacion,c.fecha)=(x.identificacionusuario,x.fechamto)
> )
> select u.descripcionmovimiento, u.fechamto, u.horamto, u.fechaact,
> u.horaact, u.usuario, u.identificacionusuario,
> u.tipomovimientosusuario, u.fecha, u.nombre, u.regionalath,
> u.regionaltdv, u.tiposolicitud, 1 as cant from (
> select
> x.descripcionmovimiento, x.fechamto, x.horamto, x.fechaact, x.horaact,
> x.usuario, x.identificacionusuario, x.tipomovimientosusuario, y.fecha,
> y.nombre, y.regionalath, y.regionaltdv,
> y.tiposolicitud,y.identificacionusuario as
> identificacionusuario2,y.fechamto as fechamto2
> from base1 as x
> cross join  base2 as y
> ) as u where u.identificacionusuario=u.identificacionusuario2 and
> u.fechamto=u.fechamto2
> order by 2,3,7
>
>
> ---
>
> Sort  (cost=8260.00..8260.48 rows=961 width=1010)
>   Sort Key: x.fechamto, x.horamto, x.identificacionusuario
>   CTE base1
>     ->  Nested Loop  (cost=0.19..3929.54 rows=21842 width=123)
>           ->  Subquery Scan on x_1  (cost=0.08..1528.40 rows=533
> width=12) ->  HashSetOp Intersect  (cost=0.08..1526.27 rows=533
> width=12)
>                       ->  Append  (cost=0.08..1422.83 rows=103438
> width=12) ->  Subquery Scan on "*SELECT* 2"
>  (cost=0.08..125.87 rows=9072 width=12)
>                                   ->  Index Only Scan using
> idx_ath_cajerosv2_fecha on ath_cajerosv2 c  (cost=0.08..89.58
> rows=9072 width=12)
>                                         Index Cond: (fecha >=
> '2016-02-18'::date)
>                             ->  Subquery Scan on "*SELECT* 1"
>  (cost=0.09..1296.96 rows=94366 width=12)
>                                   ->  Index Only Scan using
> idx_ath_tecnicosv2_fecha on ath_tecnicosv2 t_1  (cost=0.09..919.50
> rows=94366 width=12)
>                                         Index Cond: (fechamto >=
> '2016-02-18'::date)
>           ->  Index Only Scan using idx_ath_tecnicosv2_comp on
> ath_tecnicosv2 t  (cost=0.11..4.08 rows=107 width=123)
>                 Index Cond: ((fechamto = x_1.fechamto) AND
> (identificacionusuario = x_1.identificacionusuario))
>   CTE base2
>     ->  Nested Loop  (cost=0.17..2175.54 rows=1760 width=68)
>           ->  Subquery Scan on x_2  (cost=0.08..1528.40 rows=533
> width=12) ->  HashSetOp Intersect  (cost=0.08..1526.27 rows=533
> width=12)
>                       ->  Append  (cost=0.08..1422.83 rows=103438
> width=12) ->  Subquery Scan on "*SELECT* 2_1"
>  (cost=0.08..125.87 rows=9072 width=12)
>                                   ->  Index Only Scan using
> idx_ath_cajerosv2_fecha on ath_cajerosv2 c_2  (cost=0.08..89.58
> rows=9072 width=12)
>                                         Index Cond: (fecha >=
> '2016-02-18'::date)
>                             ->  Subquery Scan on "*SELECT* 1_1"
>  (cost=0.09..1296.96 rows=94366 width=12)
>                                   ->  Index Only Scan using
> idx_ath_tecnicosv2_fecha on ath_tecnicosv2 t_2  (cost=0.09..919.50
> rows=94366 width=12)
>                                         Index Cond: (fechamto >=
> '2016-02-18'::date)
>           ->  Index Only Scan using idx_ath_cajerosv2_comp on
> ath_cajerosv2 c_1  (cost=0.08..1.15 rows=17 width=64)
>                 Index Cond: ((fecha = x_2.fechamto) AND
> (identificacion = x_2.identificacionusuario))
>   ->  Hash Join  (cost=22.88..2145.40 rows=961 width=1010)
>         Hash Cond: ((x.identificacionusuario =
> y.identificacionusuario) AND (x.fechamto = y.fechamto))
>         ->  CTE Scan on base1 x  (cost=0.00..174.74 rows=21842
> width=638) ->  Hash  (cost=14.08..14.08 rows=1760 width=384)
>               ->  CTE Scan on base2 y  (cost=0.00..14.08 rows=1760
> width=384)
>

Creo que estas complicando en exceso o has intentado optimizar las tablas y 
consultas antes de tiempo.


> He cambiado la estructura de las tablas: antes todos los campos eran
> del tipo TEXT y por lo tanto la tablas tenían ademas tablas TOAST y
> supuse que el JOIN con estas era lo que estaba penalizando.
>
> Las especificaciones, es un  PostgreSQL 9.4.5, corriendo en un CentOS
> 6 con discos de estado solido. 12 GB de RAM, 12

Deberia hacerlo sin problemas. No creo que el problema sea por falta de 
"hierro".

> Lista les agradezco sus comentratios el ideas de que puede ser el
> probelma.
>
> --
> Cordialmente,
>
> Ing. Hellmuth I. Vargas S.
> Esp. Telemática y Negocios por Internet
> Oracle Database 10g Administrator Certified Associate
> EnterpriseDB Certified PostgreSQL 9.3 Associate


---   ---
Eduardo Morras <[email protected] <mailto:[email protected]> >

-
Enviado a la lista de correo pgsql-es-ayuda ([email protected] 
<mailto:[email protected]> )
Para cambiar tu suscripción:
http://www.postgresql.org/mailpref/pgsql-es-ayuda





 

-- 

Cordialmente, 

Ing. Hellmuth I. Vargas S. 
Esp. Telemática y Negocios por Internet 

Oracle Database 10g Administrator Certified Associate

EnterpriseDB Certified PostgreSQL 9.3 Associate

RE: [pgsql-es-ayuda] join super lento

Responder a