I know I have seen this posted
before.......
We have a large range
partitioned table that has duplicates in it. What is the fastest way
to remove the dups.? I have the following scripts which do it but may
be fast or slow. What do you guys use?
DELETE FROM tablename
WHERE ROWID NOT IN
(SELECT MIN(ROWID)
FROM tablename
GROUP
BY fieldnames);
[Ferenc
Mantfeld] This will be your fastest way, provided you have an index on
the columns searched for. Actually the format of the statement would
be
delete from INVOICE_DETAILS A where A.rowid
>
(select
min(rowid) from INVOICE_DETAILS B where
B.INV_NUM=A.INV_NUM and
B.LINE_NUM=A.LINE_NUM )
;
Ensure you have a composite index on
INVOICE_DETAILS (INV_NUM, LINE_NUM).
Or
alter table &table_name
add constraint duplicate_cons
unique key (&column_name)
exceptions into
exception table;
[Ferenc Mantfeld] Problem with this is when you want
to delete the duplicates, you have no way of telling, unless you code the
min function again. If you have triplicates, and want to keep one
of them and blow away the other two, this is a tedious way, and all
this does is to help you identify the
duplicates.
How to find duplicates:
select &column_name, count(&column_name)
from &table_name
group by
&column_name
having
count(&column_name) > 1;
[Ferenc Mantfeld] Same as above. only
identifies the duplicates, does nothing to remove
them.
Tom