Build a hash where the key is the email address, and the value
is the whole record:
my %email_hash = ();
while (<IN>) {
    chomp;
    # split the tab-delimited line into its fields
    my ($id, $user, $pass, $email, $name) = split /\t/;
    # keep only the first record seen for each email address
    if (! exists($email_hash{$email})) {
        $email_hash{$email} = $_;
    }
}
You would end up with a hash containing only the first line seen
for each email address - you could then iterate through the hash
and insert its rows into the table. Of course, this approach could
use a LOT of memory if the file you're importing is large.
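Here's a self-contained sketch of that hash approach, using the sample
rows from your message as in-memory data (in real use you'd read from
the file handle instead):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows from the original message (tab-separated);
# email2 appears twice, on the id2 and id3 rows.
my @lines = (
    "id1\tuser1\tpass1\temail1\tname1\tna\t1\tna",
    "id2\tuser2\tpass2\temail2\tname2\tna\t0\tna",
    "id3\tuser3\tpass3\temail2\tname3\tna\t1\tna",
);

my %email_hash;
my @unique;
for my $line (@lines) {
    # email is the fourth tab-delimited field
    my $email = (split /\t/, $line)[3];
    next if exists $email_hash{$email};   # skip duplicate email addresses
    $email_hash{$email} = $line;
    push @unique, $line;                  # preserve input (id) order
}

print "$_\n" for @unique;
```

Because the file is sorted by id and we keep the first record seen,
this keeps id1 and id2 and drops id3, which is what you want.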
Another approach might be to read the tab-delimited file into
a temporary MySQL table that has an index on the email address
column. Then with DBI (this is just pseudo-code):
1. SELECT * FROM A
   ORDER BY EMAIL
2. while (fetch) {
       if (email changed) {
           insert row into good table
       }
   }
This would be the long-winded approach. I'm sure there's a
better way, but these are the two that came to mind...
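Fleshed out, that pseudo-code might look something like the sketch
below. It's only a sketch: the connection details, the temp table
name "A", and the destination table "good" are placeholders, and it
assumes the temp table has already been loaded from your file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection details - adjust for your server.
my $dbh = DBI->connect("DBI:mysql:database=test;host=localhost",
                       "user", "password", { RaiseError => 1 });

# Walk the temp table in email order; the first row of each
# email group is the one to keep.
my $sth = $dbh->prepare(
    "SELECT id, user, pass, email, name FROM A ORDER BY email");
my $ins = $dbh->prepare(
    "INSERT INTO good (id, user, pass, email, name)
     VALUES (?, ?, ?, ?, ?)");

$sth->execute;
my $last_email = "";
while (my @row = $sth->fetchrow_array) {
    my $email = $row[3];
    if ($email ne $last_email) {   # email changed => first of its group
        $ins->execute(@row);
        $last_email = $email;
    }
}

$dbh->disconnect;
```

Since the rows come back sorted by email, "email changed" is enough
to spot the first occurrence; every later row with the same address
falls through the if and is skipped.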
HTH.
[EMAIL PROTECTED] [[EMAIL PROTECTED]] wrote:
> Hello all,
>
> Sorry about the off-topic question, but this has been driving me nuts. I am
> trying to import a tab delimited file into a MySQL database and need to remove
> duplicate email addresses, before importing the data. The file is setup like
> this:
>
> id1 user1 pass1 email1 name1 na 1 na
> id2 user2 pass2 email2 name2 na 0 na
> id3 user3 pass3 email2 name3 na 1 na
> id4 user4 pass4 email name4 na 1 na
> ....etc
> Now I need to go through each data record in the file and remove all duplicate
> records where the email address is the same. I.e. above, id2 has the same email
> address as id3, so I want to remove id3... and so on. The file is sorted by the
> id value.
>
> Any help much appreciated.
>
>
> Mike(mickalo)Blezien
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Thunder Rain Internet Publishing
> Providing Internet Solutions that work!
> http://www.thunder-rain.com
> Tel: 1(225)686-2002
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
--
Hardy Merrill
Mission Critical Linux, Inc.
http://www.missioncriticallinux.com