Hi Jamie,

I tried out your script and it worked very well.  I noticed that most de-duping
scripts use hashes, so there must be some significant benefit to this. Could one
also use an array and a foreach-style loop to remove duplicates?
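
Something like this is what I had in mind -- just a rough, untested sketch using
your input file name (and I realize scanning @kept on every line is probably much
slower than the hash lookup, which I'm guessing is the benefit):

# Rough foreach-based de-dupe: keep a plain array of lines already seen
# and scan it for every new line (quadratic, versus the hash's cheap lookup).
my @kept;
open (IN, '<', 'c:\ipaddresses.txt') or die "can't open input: $!";
LINE: while (my $line = <IN>) {
    chomp $line;
    $line = lc $line;
    foreach my $seen (@kept) {
        next LINE if $seen eq $line;   # already kept, skip this one
    }
    push @kept, $line;                 # first time we've seen it
}
close(IN);
print "$_\n" for @kept;                # or write them out to a file instead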

I was also curious as to
how the following lines worked:

if (! defined($found{lc $mystring}) )
        {
        $found{$mystring} = 1;
        print BRY "$_\n";
        }

in

#!C:\Perl\bin\perl -w
#script by Jamie Bryant

$file1 = 'c:\ipaddresses.txt';
$file2 = 'c:\spaces.txt';
open (JAM, "$file1") or die "can't open $file1: $!";
open (BRY, ">$file2") or die "can't open $file2: $!";
print "Start of process.\n";
$mystring = "";
my %found;

while (<JAM>)
{
        chomp $_;
        $_ = lc $_;
        $mystring = $_;
        
        if (! defined($found{lc $mystring}) )
        {
        $found{$mystring} = 1;
        print BRY "$_\n";
        }
}
print "End of process";
close(BRY);
close(JAM);

Does each iteration of JAM get pushed onto the hash?  And does $_
get everything the %found hash gets?  I especially don't get the = 1 part...
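
My best guess at what's going on is below -- please correct me if I've misread
it (this is only a sketch of my reading, with made-up addresses, not your actual
code):

# My reading: %found is used as a "seen" set.  The 1 is arbitrary; all that
# matters is that the key exists afterwards, so defined() is true the next
# time the same (lowercased) line turns up.
my %found;
for my $line ('10.0.0.1', '10.0.0.2', '10.0.0.1') {
    unless ( defined $found{lc $line} ) {
        $found{lc $line} = 1;      # mark this line as seen
        print "$line\n";           # printed only the first time
    }
}
# prints 10.0.0.1 and 10.0.0.2, once each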

thanks for your help and the great script Jamie!

Russ

-----Original Message-----
From: Bryant, Jamie [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 03, 2000 5:21 PM
To: 'Perl'
Subject: RE: Removing evil people, also leading spaces when copying an
array to file


I had to do the same type of thing running through a list of keywords about
100,000 lines long, and I had good success with the script quoted above.


Hope this helps.

Jamie

-----Original Message-----
From: Perl [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 03, 2000 11:18 AM
To: [EMAIL PROTECTED]
Subject: Removing evil people, also leading spaces when copying an
array to file


Hi All...It's the Worm again,

I regularly ban evil people from my website and log their IPs to a log file,
like so:

216.86.29.12
202.144.64.4
212.189.244.34
212.189.244.34
24.14.15.59
194.23.95.100
152.163.205.36
209.244.229.125
168.191.249.201
152.163.205.37
213.3.226.93
216.201.37.161
24.226.175.199

Evil people don't stop at just one evil deed but keep going; ergo, I have
many duplicate IP addresses in the file.

Far below in this email are some of the methods I was looking at to remove
dupes from the file, but along the way I noticed something unsavory:  it
seems that my rather simple code of copying the contents of a file into an
array and then copying that array to a file inserts leading spaces on each
line, right before the IP address.  Here is the code I used:

#open list of ip addresses
$file1 = 'c:\ipaddresses.txt';

open (handle1, "<$file1") or die "can't open file $!";

@things = <handle1> ;
#put ip addresses into an array


$file2 = 'c:\spaces.txt';
#declare variable for filename

open (handle2, ">$file2") || die "can't open file $!";

print handle2 "@things" or die "no file to print to? $!";
#print array to file spaces.txt


Does anyone know if these leading spaces can be avoided, or do they have to
be removed every time an array is copied into a file?
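
My tentative guess, after poking at it, is that the double quotes are to blame:
interpolating an array inside a double-quoted string joins its elements with $"
(a single space by default), and since every element of @things still ends in a
newline, every line after the first picks up a leading space.  Printing the bare
array seems to avoid it -- a quick sketch reusing @things and $file2 from above,
so correct me if I've misread it:

# Sketch only: same handle/variables as my code above.
open (handle2, ">$file2") or die "can't open file $!";

# print handle2 "@things";   # joins elements with $" => leading spaces
print handle2 @things;       # prints the list as-is  => no leading spaces

close(handle2);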

Thanks!

The Worm

Below are the ways I thought of to remove dupes... know of a better way?  (I've
also sketched how (b) might look against my IP file, after the list.)

a) If @in is sorted, and you want @out to be sorted: (this assumes all true
values in the array) 
    $prev = 'nonesuch';
    @out = grep($_ ne $prev && ($prev = $_), @in);

This is nice in that it doesn't use much extra memory, simulating uniq(1)'s
behavior of removing only adjacent duplicates. It's less nice in that it
won't work with false values like undef, 0, or ""; "0 but true" is OK,
though. 

b) If you don't know whether @in is sorted: 
    undef %saw;
    @out = grep(!$saw{$_}++, @in);


c) Like (b), but @in contains only small integers: 
    @out = grep(!$saw[$_]++, @in);


d) A way to do (b) without any loops or greps: 
    undef %saw;
    @saw{@in} = ();
    @out = sort keys %saw;  # remove sort if undesired


e) Like (d), but @in contains only small positive integers: 
    undef @ary;
    @ary[@in] = @in;
    @out = grep {defined} @ary;
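
For what it's worth, here is how I imagine (b) would look wired up to my IP
file.  The output name c:\nodupes.txt is made up, and the chomp/lc handling is
my own addition, so treat it as an untested sketch:

my %saw;
open (handle1, '<', 'c:\ipaddresses.txt') or die "can't open input: $!";
open (handle2, '>', 'c:\nodupes.txt')     or die "can't open output: $!";
while (my $line = <handle1>) {
    chomp $line;                                       # drop the trailing newline
    print handle2 "$line\n" unless $saw{lc $line}++;   # keep first sighting only
}
close(handle1);
close(handle2);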


_______________________________________________
Perl-Win32-Web mailing list
[EMAIL PROTECTED]
http://listserv.ActiveState.com/mailman/listinfo/perl-win32-web