I can't use a unique index because I only want to check for duplicates where processed = 2; for simplicity I did not include that condition in the example.
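(For what it's worth, if the goal is to enforce uniqueness only among rows with processed = 2, PostgreSQL's partial unique indexes can express exactly that condition. A minimal sketch, assuming the column list from the table definition quoted below and a made-up index name:)

```sql
-- Hypothetical sketch: enforce uniqueness only for rows where processed = 2.
-- Column names are taken from the cdr_ama_stat definition in the quoted mail;
-- the index name is an assumption.
CREATE UNIQUE INDEX cdr_ama_stat_uniq_processed
    ON cdr_ama_stat (abonado_a, abonado_b, fecha_llamada, duracion)
    WHERE processed = 2;
```

One caveat: because NULL values never compare equal, rows with a NULL in any of these columns would not conflict with each other under this index.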
On Fri, Apr 24, 2009 at 5:50 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> On Fri, Apr 24, 2009 at 5:37 PM, Miguel Miranda
> <miguel.miran...@gmail.com> wrote:
> > hi, i have a table:
> >
> > CREATE TABLE public.cdr_ama_stat (
> >   id int4 NOT NULL DEFAULT nextval('cdr_ama_stat_id_seq'::regclass),
> >   abonado_a varchar(30) NULL,
> >   abonado_b varchar(30) NULL,
> >   fecha_llamada timestamp NULL,
> >   duracion int4 NULL,
> >   puerto_a varchar(4) NULL,
> >   puerto_b varchar(4) NULL,
> >   tipo_llamada char(1) NULL,
> >   processed int4 NULL,
> >   PRIMARY KEY(id)
> > )
> > GO
> > CREATE INDEX kpi_fecha_llamada
> >   ON public.cdr_ama_stat(fecha_llamada)
> >
> > there should be unique values for abonado_a, abonado_b, fecha_llamada,
> > duracion in every row. googling around, i found how to delete duplicates
> > on the postgresonline site,
>
> Then why not have a unique index on those rows together?
>
> > so i ran the following query (let's say i want to know how many duplicates
> > exist for 2009-04-18, before deleting them):
> >
> > SELECT * FROM cdr_ama_stat
> > WHERE id NOT IN
> >   (SELECT MAX(dt.id)
> >    FROM cdr_ama_stat AS dt
> >    WHERE dt.fecha_llamada BETWEEN '2009-04-18' AND
> >      '2009-04-18'::timestamp + INTERVAL '1 day'
> >    GROUP BY dt.abonado_a, dt.abonado_b, dt.fecha_llamada, dt.duracion)
> > AND fecha_llamada BETWEEN '2009-04-18' AND
> >   '2009-04-18'::timestamp + INTERVAL '1 day'
> >
> > my problem is that the query takes forever. number of rows:
>
> Have you tried throwing more work_mem at the problem?
>
> The other method to do this uses no GROUP BY but a join clause.
> Depending on the number of dupes it can be faster or slower.
>
> delete from table x where x.id in
>   (select a.id from table a join table b on (a.somefield = b.somefield
>    and a.id < b.id))
>
> Or something like that.
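Applied to the table above, Scott's join-based sketch might look something like this (a sketch only, not tested against the real data; the alias names and the idea of restricting it to the same date window as the original query are assumptions):

```sql
-- Sketch: delete all but the highest-id row in each duplicate group,
-- using a self-join instead of GROUP BY. For every duplicate group,
-- each row a that has a newer twin b (a.id < b.id) is deleted, so only
-- the row with MAX(id) survives -- the same row the NOT IN query keeps.
DELETE FROM cdr_ama_stat
WHERE id IN (
  SELECT a.id
  FROM cdr_ama_stat a
  JOIN cdr_ama_stat b
    ON  a.abonado_a     = b.abonado_a
    AND a.abonado_b     = b.abonado_b
    AND a.fecha_llamada = b.fecha_llamada
    AND a.duracion      = b.duracion
    AND a.id < b.id                    -- a is the older duplicate
  -- assumed: same date restriction as the original query
  WHERE a.fecha_llamada BETWEEN '2009-04-18'
    AND '2009-04-18'::timestamp + INTERVAL '1 day'
);
```

Note that since all four columns are nullable, rows with a NULL in any of them will never join to each other and so are not treated as duplicates here; the GROUP BY variant treats NULLs as grouping together, so the two approaches can disagree on such rows.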