Re: hashes and loops

Jeff 'japhy' Pinyan Tue, 14 Jun 2005 17:07:51 -0700

On Jun 14, Karyn Williams said:

Below is code that I found on the web that I slightly modified. I am trying
to create a script to remove from a file (tlist) the items in another file
(tnames). This works but I have multiple files (tlist) I need to check
against. I'm not sure how/where to put the loop and would appreciate any
help. I am also wondering why the hash? Does it work better in the script
than an array and what are the keys in this case as the files (both) are
just lists of e-mail addresses.

The logic of the hash is as follows. A hash, unlike an array, uses arbitrary strings as its keys. Therefore, your hash %compare has the email addresses found in the tnames file as its keys. Their corresponding values are numbers (specifically, as per your code, the number of times that email address was found in the tnames file).

To determine if a particular email address was in the tnames file, then, all you need to do is see if it is a key of your %compare hash:


  if (exists $compare{'[EMAIL PROTECTED]'}) {
    # it's in there
  }

This is MONUMENTALLY faster than using an array, where you have to walk the list element by element until you hit a match:

  chomp(my @email_address = <file1>);

  # now we need to search the array for an email address:
  my $seen = 0;
  for (@email_address) {
    if ($_ eq '[EMAIL PROTECTED]') {
      $seen = 1;
      last;
    }
  }

The hash takes care of the searching for us, and very efficiently.
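
To make the keys-and-values point concrete, here is a small sketch. The addresses and the @tnames array are made up for illustration (they stand in for the lines of your tnames file), so treat it as a toy, not as your actual data:

```perl
use strict;
use warnings;

# hypothetical contents of tnames; one address appears twice
my @tnames = ('alice@example.com', 'bob@example.com', 'alice@example.com');

# key = email address, value = number of times we saw it
my %compare;
$compare{$_}++ for @tnames;

# one hash lookup per address, instead of a scan over the whole list
my @found;
for my $addr ('bob@example.com', 'carol@example.com') {
  push @found, $addr if exists $compare{$addr};
}

print "alice seen ", $compare{'alice@example.com'}, " time(s)\n";
print "found: @found\n";
```

Only bob ends up in @found, because carol was never a key of %compare.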

As for doing this to multiple files, I'd use a couple of Perl's tricks to get it done quickly and painlessly:


  # hash of email addresses to exclude from other files
  my %exclude;
  open my $names, '<', $tnames_file or die "can't read $tnames_file: $!";
  while (<$names>) {
    chomp;
    $exclude{$_} = 1;
  }
  close $names;

  # now comes the magic...
  {
    local $^I = ".bak";  # we're going to keep backups of the files
                         # so if we read tlist-1, Perl automatically
                         # backs it up as tlist-1.bak

    local @ARGV = @files_to_search;  # this is an array of the files
                                     # from which we want to remove the
                                     # email addresses found in %exclude

    # this while loop does everything for us:
    # the empty <> operator reads from the files listed in @ARGV, and for
    # each one, it executes the code inside the loop.  whatever we print
    # inside the loop gets printed to the new version of the file we're
    # working on (remember, the file got backed up as xxx.bak before we
    # started reading it).  our code simply says:  "unless the email
    # address is found in the %exclude hash, print it."  so this is doing
    # the job of reading EACH line of EACH file, and ONLY printing those
    # lines which are not found in the %exclude hash.

    while (<>) {
      chomp;
      print "$_\n" unless $exclude{$_};
    }
  }

Presto!
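
If you'd rather try this on throwaway data before pointing it at your real lists, something like the following should do. It's only a sketch: the file names and addresses are invented, and I'm assuming File::Temp (a core module) purely so the experiment happens in a scratch directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# scratch directory so nothing real gets edited (removed at exit)
my $dir = tempdir(CLEANUP => 1);

# hypothetical stand-ins for tnames and two tlist files
my %contents = (
  'tnames'  => "bob\@example.com\n",
  'tlist-1' => "alice\@example.com\nbob\@example.com\n",
  'tlist-2' => "bob\@example.com\ncarol\@example.com\n",
);
for my $name (keys %contents) {
  open my $out, '>', "$dir/$name" or die "can't write $dir/$name: $!";
  print $out $contents{$name};
  close $out;
}

my $tnames_file     = "$dir/tnames";
my @files_to_search = ("$dir/tlist-1", "$dir/tlist-2");

# load the addresses to exclude
my %exclude;
open my $names, '<', $tnames_file or die "can't read $tnames_file: $!";
while (<$names>) {
  chomp;
  $exclude{$_} = 1;
}
close $names;

# edit every tlist file in place, keeping a .bak copy of each
{
  local $^I   = ".bak";
  local @ARGV = @files_to_search;
  while (<>) {
    chomp;
    print "$_\n" unless $exclude{$_};
  }
}

# show what is left in each file
for my $file (@files_to_search) {
  open my $in, '<', $file or die "can't read $file: $!";
  chomp(my @left = <$in>);
  close $in;
  print "$file: @left\n";
}
```

Afterwards each tlist file should hold only the addresses that were not in tnames, with the untouched original sitting next to it as tlist-N.bak.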

  perldoc perlrun (see the -i option)
  perldoc perlvar (for $^I)
  perldoc perlop  (for @ARGV and <> magic)

--
Jeff "japhy" Pinyan         %  How can we ever be the sold short or
RPI Acacia Brother #734     %  the cheated, we who for every service
http://japhy.perlmonk.org/  %  have long ago been overpaid?
http://www.perlmonks.org/   %    -- Meister Eckhart
