Re: Load duplicates

2023-08-15 Thread Geoffrey Rommel
So are you reading the records one at a time through DBI and looking for
duplicates? There is probably a better way. 

I work with Teradata, so I'll use Teradata syntax, but the syntax for other
databases will be similar.

To find the duplicates in your table, you can select all columns from the
table (except the auto-increment) along with a count, like so:

create volatile table ##dup_counts
as
(select acct, comp, accr, descrip, all-other-columns...,
min(auto_increment) min_num /* or max() */ , count(*) kount
from table1
group by acct, comp, accr, descrip, all-other-columns... )
with data;

Now there are no duplicates in ##dup_counts, so you can insert them to your
target table, either generating a new auto_increment or using min_num.

The duplicates can now be inserted into your error table:
insert into error_table
select acct, comp, ...etc.
from ##dup_counts
where kount > 1;

Et voilà ... the records are deduped with only three statements.

As you probably know, in relational databases rows do not have an order, so
I'm not sure that you necessarily need a sequence number on the duplicates.
If you want to trace the duplicates back to a flat file source, it might be
better to add a sequence number to the file before loading it to the
database.

I hope this helps.


On Tue, Aug 15, 2023 at 11:50 AM Ian  wrote:

> Hi group,
>
> My perl skills are basic and my SQL skills almost match that.
> Using perl 5.28 and mysql on windows.
>
> I have a couple million records that needs processing to go to their final
> destination.
> Currently I'm catchin duplicates with error 1062 in perl and using that to
> write the duplicate records to a separate table for later processing.
>
> Question: Can DBI give me a field from the record on the main table
> causing the duplicate so I can add that to the record going to
> the duplicate table?
>
> Example:
> MAIN table fields = "auto_increment", acct, comp,accr,desc,etc  :
>  (auto_increment created when records are added, rest is the source record)
> Duplicate table fields = "auto_increment from MAIN table",
> acct,comp,accr,desc,etc.
>
> Thanks
> Ian
>
>
>


Load duplicates

2023-08-15 Thread Ian
Hi group,

My perl skills are basic and my SQL skills almost match that.
Using perl 5.28 and mysql on windows.

I have a couple million records that needs processing to go to their final
destination.
Currently I'm catchin duplicates with error 1062 in perl and using that to
write the duplicate records to a separate table for later processing.

Question: Can DBI give me a field from the record on the main table causing
the duplicate so I can add that to the record going to the duplicate table?

Example:
MAIN table fields = "auto_increment", acct, comp,accr,desc,etc  :
 (auto_increment created when records are added, rest is the source record)
Duplicate table fields = "auto_increment from MAIN table",
acct,comp,accr,desc,etc.

Thanks
Ian