Re: Removing duplicate lines.

Jenda Krynicky Mon, 21 Jul 2003 05:32:16 -0700

From: [EMAIL PROTECTED]
> I have a text file which contains a list of companies:
> 
> NORTH DOWN AND ARDS INSTITUTE
> NOTTINGHAM HEALTH AUTHORITY
> 1ST CONTACT GROUP LTD
> 1ST CONTACT GROUP LTD
> 1ST CONTACT GROUP LTD
> 1ST CONTACT GROUP LTD
> 4D TELECOM & KINGSTON INMEDIA
> A E COOK LTD
> A E COOK LTD
> 
> etc......
> 
> How can I write a simple perl script to remove the duplicates and
> leave just one of each customer? Any help would be great.


Is it safe to assume that all duplicates are together like this? If 
so, all you have to do is
        1) read the file line by line
        2) print the line you just read only if it's different from the
           last one
        3) remember the line

Or, to word it differently:
        1) read the file
        2) skip the line if it's the same as the last one
        3) print it and remember it

        my $last = '';
        while (<>) {
                next if $_ eq $last;    # same as the previous line: skip it
                print $_;
                $last = $_;             # remember it for the next comparison
        }

If the duplicates are scattered all over the place, then the easiest 
solution is to use a hash. That will get rid of the duplicates for 
you:

        my %seen;
        while (<>) {
                chomp;
                $seen{$_}++;            # each distinct line becomes one key
        }

        # keys come back in no particular order
        foreach my $item (keys %seen) {
                print $item,"\n";
        }
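
The hash loop above prints the keys in whatever order Perl returns 
them, so the original order of the file is lost. If the order 
matters, one possible variation is to keep a line only the first time 
it is seen. This is just a sketch; the helper name unique_lines and 
the sample list are made up for illustration:

```perl
use strict;
use warnings;

# Return the given lines with later duplicates dropped,
# keeping the order in which each line first appeared.
sub unique_lines {
    my @lines = @_;
    my %seen;
    # $seen{$_}++ is false (0) the first time a line is met, true afterwards
    return grep { !$seen{$_}++ } @lines;
}

# Hypothetical sample data, with duplicates scattered around
my @companies = (
    'NORTH DOWN AND ARDS INSTITUTE',
    'A E COOK LTD',
    '1ST CONTACT GROUP LTD',
    'A E COOK LTD',
    'NORTH DOWN AND ARDS INSTITUTE',
);

print "$_\n" for unique_lines(@companies);
```

When reading from a file instead of a list, the same test fits in the 
loop body as: print unless $seen{$_}++;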

If the list of companies is huge you may need to store the hash on 
disk to prevent swapping:

        use DB_File;
        tie %seen, 'DB_File', $filename;
        ...
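
A fuller sketch of the tied version might look like the following. 
It assumes DB_File is installed (it needs Berkeley DB), that the 
scratch file does not already exist, and that the order-preserving 
"print on first sight" test is what you want. The subroutine name 
print_unique_on_disk and its arguments are invented for this example:

```perl
use strict;
use warnings;
use Fcntl;      # supplies the O_RDWR and O_CREAT constants
use DB_File;

# Copy lines from $in to $out, printing each distinct line once.
# The "seen" hash lives in the file $dbfile rather than in memory.
sub print_unique_on_disk {
    my ($in, $out, $dbfile) = @_;

    tie my %seen, 'DB_File', $dbfile, O_RDWR|O_CREAT, 0666, $DB_HASH
        or die "Cannot tie $dbfile: $!";

    while (my $line = <$in>) {
        chomp $line;
        # print only the first time this line is met
        print {$out} "$line\n" unless $seen{$line}++;
    }

    untie %seen;
    unlink $dbfile;     # the scratch file is no longer needed
}
```

It could be called as print_unique_on_disk(\*STDIN, \*STDOUT, 'seen.db'), 
where 'seen.db' is any scratch file name you pick. Expect it to be 
noticeably slower than the in-memory hash.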

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

