I can't use a unique index because I only want to check for duplicates where
processed = 2; for simplicity I did not include that condition in the
example.
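
For reference, the real query just adds that filter; a minimal sketch of the
inner duplicate check with the omitted condition put back (its exact placement
is assumed here, since the example below leaves it out):

  -- same grouping as in the query below, restricted to processed = 2
  SELECT MAX(dt.id)
  FROM cdr_ama_stat AS dt
  WHERE dt.processed = 2
    AND dt.fecha_llamada BETWEEN '2009-04-18'
        AND '2009-04-18'::timestamp + INTERVAL '1 day'
  GROUP BY dt.abonado_a, dt.abonado_b, dt.fecha_llamada, dt.duracion;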

On Fri, Apr 24, 2009 at 5:50 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:

> On Fri, Apr 24, 2009 at 5:37 PM, Miguel Miranda
> <miguel.miran...@gmail.com> wrote:
> > Hi, I have a table:
> > CREATE TABLE public.cdr_ama_stat (
> >   id int4 NOT NULL DEFAULT nextval('cdr_ama_stat_id_seq'::regclass),
> >   abonado_a varchar(30) NULL,
> >   abonado_b varchar(30) NULL,
> >   fecha_llamada timestamp NULL,
> >   duracion int4 NULL,
> >   puerto_a varchar(4) NULL,
> >   puerto_b varchar(4) NULL,
> >   tipo_llamada char(1) NULL,
> >   processed int4 NULL,
> >   PRIMARY KEY (id)
> > );
> >
> > CREATE INDEX kpi_fecha_llamada
> >   ON public.cdr_ama_stat (fecha_llamada);
> >
> > There should be unique values for abonado_a, abonado_b, fecha_llamada,
> > and duracion in every row. Googling around, I found how to delete
> > duplicates on the postgresonline site,
>
> Then why not have a unique index on those columns together?
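>
> Something like this, say (index name made up for illustration):
>
>   CREATE UNIQUE INDEX cdr_ama_stat_uniq
>     ON public.cdr_ama_stat (abonado_a, abonado_b, fecha_llamada, duracion);
>
> That would reject exact duplicates at insert time (though note that NULLs
> count as distinct in a unique index).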
>
> > So I run the following query (let's say I want to know how many duplicates
> > exist for 2009-04-18, before deleting them):
> >
> > SELECT * FROM cdr_ama_stat
> > WHERE id NOT IN
> >   (SELECT MAX(dt.id)
> >    FROM cdr_ama_stat AS dt
> >    WHERE dt.fecha_llamada BETWEEN '2009-04-18'
> >          AND '2009-04-18'::timestamp + INTERVAL '1 day'
> >    GROUP BY dt.abonado_a, dt.abonado_b, dt.fecha_llamada, dt.duracion)
> > AND fecha_llamada BETWEEN '2009-04-18'
> >     AND '2009-04-18'::timestamp + INTERVAL '1 day'
> >
> > My problem is that the query takes forever. Number of rows:
>
> Have you tried throwing more work_mem at the problem?
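>
> For example (value made up; size it to the RAM you can spare per sort):
>
>   SET work_mem = '256MB';
>
> in the same session before running the query, so the GROUP BY's sort or
> hash can stay in memory.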
>
> The other method to do this uses no GROUP BY but a join clause.
> Depending on the number of dupes it can be faster or slower.
>
> delete from sometable x where x.id in
>    (select a.id from sometable a join sometable b
>     on (a.somefield = b.somefield and a.id < b.id))
>
> Or something like that.
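>
> Concretely, for your table that might look like this (untested sketch;
> assumes a duplicate means the same abonado_a, abonado_b, fecha_llamada,
> and duracion, keeps the highest id in each group, and won't match rows
> where those columns are NULL):
>
>   DELETE FROM cdr_ama_stat x
>   WHERE x.id IN
>     (SELECT a.id
>      FROM cdr_ama_stat a
>      JOIN cdr_ama_stat b
>        ON (a.abonado_a = b.abonado_a
>            AND a.abonado_b = b.abonado_b
>            AND a.fecha_llamada = b.fecha_llamada
>            AND a.duracion = b.duracion
>            AND a.id < b.id));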
>
