Build a hash where the key is the email address, and the value
is the whole record:
my %email_hash = ();
while (<IN>) {
    chomp;
    # split the tab-delimited line into its fields
    my ($id, $user, $pass, $email, $name) = split /\t/;
    # keep only the first record seen for each email address
    if (! exists($email_hash{$email})) {
        $email_hash{$email} = $_;
    }
}
You would end up with a hash containing only the first line seen
for each email address - you could then iterate through the hash
and insert its rows into the table. Of course, this approach could
use a LOT of memory if the file you're importing is large.
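Here's a self-contained sketch of that hash approach, using the sample
rows from your message as in-memory data (in real use you'd read from
the file handle instead):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows from the original message (tab-separated);
# email2 appears twice, on the id2 and id3 rows.
my @lines = (
    "id1\tuser1\tpass1\temail1\tname1\tna\t1\tna",
    "id2\tuser2\tpass2\temail2\tname2\tna\t0\tna",
    "id3\tuser3\tpass3\temail2\tname3\tna\t1\tna",
);

my %email_hash;
my @unique;
for my $line (@lines) {
    # email is the fourth tab-delimited field
    my $email = (split /\t/, $line)[3];
    next if exists $email_hash{$email};   # skip duplicate email addresses
    $email_hash{$email} = $line;
    push @unique, $line;                  # preserve input (id) order
}

print "$_\n" for @unique;
```

Because the file is sorted by id and we keep the first record seen,
this keeps id1 and id2 and drops id3, which is what you want.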
Another approach might be to read the tab-delimited file into
a temporary MySQL table that has an index on the email address
column. Then with DBI (this is just pseudo-code):
1. SELECT * FROM A
   ORDER BY EMAIL
2. while (fetch) {
       if (email changed) {
           insert row into good table
       }
   }
This would be the long-winded approach. I'm sure there's a
better way, but these are the two that came to mind...
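Fleshed out, that pseudo-code might look something like the sketch
below. It's only a sketch: the connection details, the temp table
name "A", and the destination table "good" are placeholders, and it
assumes the temp table has already been loaded from your file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection details - adjust for your server.
my $dbh = DBI->connect("DBI:mysql:database=test;host=localhost",
                       "user", "password", { RaiseError => 1 });

# Walk the temp table in email order; the first row of each
# email group is the one to keep.
my $sth = $dbh->prepare(
    "SELECT id, user, pass, email, name FROM A ORDER BY email");
my $ins = $dbh->prepare(
    "INSERT INTO good (id, user, pass, email, name)
     VALUES (?, ?, ?, ?, ?)");

$sth->execute;
my $last_email = "";
while (my @row = $sth->fetchrow_array) {
    my $email = $row[3];
    if ($email ne $last_email) {   # email changed => first of its group
        $ins->execute(@row);
        $last_email = $email;
    }
}

$dbh->disconnect;
```

Since the rows come back sorted by email, "email changed" is enough
to spot the first occurrence; every later row with the same address
falls through the if and is skipped.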
HTH.
[EMAIL PROTECTED] [[EMAIL PROTECTED]] wrote:
> Hello all,
>
> Sorry about the off-topic question, but this has been driving me nuts. I am
> trying to import a tab delimited file into a MySQL database and need to remove
> duplicate email addresses, before importing the data. The file is setup like
> this:
>
> id1 user1 pass1 email1 name1 na 1 na
> id2 user2 pass2 email2 name2 na 0 na
> id3 user3 pass3 email2 name3 na 1 na
> id4 user4 pass4 email name4 na 1 na
> ....etc
> Now I need to go through each data record in the file and remove all duplicate
> records where the email address is the same. I.e. above, id2 has the same email
> address as id3, so I want to remove id3... and so on. The file is sorted by the
> id value.
>
> Any help much appreciated.
>
>
> Mike(mickalo)Blezien
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Thunder Rain Internet Publishing
> Providing Internet Solutions that work!
> http://www.thunder-rain.com
> Tel: 1(225)686-2002
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
--
Hardy Merrill
Mission Critical Linux, Inc.
http://www.missioncriticallinux.com