Re: Comparing items from multiple files

A. Pagaltzis Fri, 21 Nov 2003 19:16:59 -0800

* Peter Scott <[EMAIL PROTECTED]> [2003-11-20 11:52]:
> my $regex = shift;
> my %data;
> while (<>)
> {
>   next unless /$regex/;
>   $data{$ARGV}{$1}++;
> }
> 
> for my $outer (keys %data)
> {
>   my @others = grep $_ ne $outer => keys %data;
>   for my $inner (keys %{$data{$outer}})
>   {
>     for my $other (@others)
>     {
>       unless (exists $data{$other}{$inner})
>       {
>   print "In $outer but not $other: $inner\n";
>       }
>     }
>   }
> }


Your variable naming sucks. :)

s/data/found_for_file/g;
s/outer/cur_file/g;
s/others/other_file/g;
s/inner/match/g;

And then it's much, much clearer.

  my $regex = shift;
  my %found_for_file;

  while (<>) {
      next unless /$regex/;
      $found_for_file{$ARGV}{$1}++;
  }

  for my $cur_file (keys %found_for_file) {
      my @other_file = grep $_ ne $cur_file => keys %found_for_file;
      for my $match (keys %{ $found_for_file{$cur_file} }) {
          for my $other (@other_file) {
              unless (exists $found_for_file{$other}{$match}) {
                  print "In $cur_file but not $other: $match\n";
              }
          }
      }
  }

The grep is silly. You can just ``next if $other eq $cur_file''
in the ``for my $other'' loop; that would work better if you swap
the two innermost loops and iterate over other files first and
then over matches in the current one next.

But I don't like the nested iteration over the list of files
anyway.  You've written an O(n^2) algorithm.  A better data
structure would go a long way here.  In this case it's as simple
as inverting the hashes.  With an eye towards what you were
trying to achieve, it's more user friendly to summarize matches.
To get rid of nested loops, I take advantage of Storable.

  use Storable;
  $Storable::canonical = 1; # important!! see POD

  my $regex = shift;
  my %file_for_match;

  while (<>) {
      chomp;
      next unless /$regex/;
      $file_for_match{$1}{$ARGV}++;
  }

  my %match_group;

  push @{ $match_group{freeze $file_for_match{$_} } }, $_
      for keys %file_for_match;
      
  while(my ($group, $matches) = each %match_group) {
      my $files = join ', ', keys %{ thaw $group };
      print "Found in $files:\n", map "  $_\n", @$matches;
  }

Basically I'm (ab?)using Storable to make a hash the key of
another hash. A nice touch:

  my $regex = shift;
  my $total_files = @ARGV;

  # ...
  # ...
  # ...

  while(my ($group, $matches) = each %match_group) {
      my $group_hash = thaw $group;
      my $files = keys %$group_hash == $total_files
          ? 'all files'
          : join ', ', keys %$group_hash;
      print "Found in $files:\n", map "  $_\n", @$matches;
  }

If there's a file called 'all files' which has unique matches
this will be misleading, though. ;-)

Could also sort the %match_group keys by number of keys in their
frozen hashes. Could cache unfrozen hashes by their frozen
representation so's you could sort them by number of files for a
certain match.. hmm..

  use Storable;
  $Storable::canonical++;

  my $regex = shift;
  my $total_files = @ARGV;

  my %file_for_match;

  while (<>) {
      chomp;
      next unless /$regex/;
      $file_for_match{$1}{$ARGV}++;
  }

  my (%match_for_group, %file_group):

  for(keys %file_for_match) {
      my $frozen = freeze $file_for_match{$_};
      $file_group{$frozen} ||= [ keys %{$file_for_match{$_}} ];
      push @{$match_for_group{$frozen}}, $_;
  }

  for(
      sort {
          @{$file_group{$b}} <=> @{$file_group{$a}}
      } keys %match_for_group
  ) {
      my $files = @{$file_group{$_}} == $total_files
          ? 'all files'
          : join ', ', @{$file_group{$_}};

      print
          "Found in $files:\n",
          map "  $_\n", @{$match_for_group{$_}};
  }

-- 
Regards,
Aristotle
 
"If you can't laugh at yourself, you don't take life seriously enough."

Re: Comparing items from multiple files

Reply via email to