I know I have seen this posted
before.......
We have a large range partitioned
table that has duplicates in it. What is the fastest way to remove the
dups.? I have the following scripts which do it but may be fast or
slow. What do you guys use?
DELETE FROM tablename
WHERE ROWID NOT IN
(SELECT MIN(ROWID)
FROM tablename
GROUP
BY fieldnames);
[Ferenc
Mantfeld] This will be your fastest way, provided you have an index on
the columns searched for. Actually the format of the statement would
be
delete from INVOICE_DETAILS A where A.rowid
>
(select
min(rowid) from INVOICE_DETAILS B where
B.INV_NUM=A.INV_NUM and
B.LINE_NUM=A.LINE_NUM )
;
Ensure you have a composite index on INVOICE_DETAILS
(INV_NUM, LINE_NUM).
Or
alter table &table_name
add constraint duplicate_cons
unique key (&column_name)
exceptions into exception
table;
[Ferenc Mantfeld] Problem with this is when you want to delete the
duplicates, you have no way of telling, unless you code the min function
again. If you have triplicates, and want to keep one of them and
blow away the other two, this is a tedious way, and all this does is to
help you identify the duplicates.
How to find duplicates:
select &column_name, count(&column_name)
from &table_name
group by &column_name
having count(&column_name) >
1;
[Ferenc Mantfeld] Same as above. only identifies the duplicates,
does nothing to remove them.
Tom