On Fri, Dec 12, 2003 at 01:48:12AM -0600, Ronn!Blankenship wrote:
> and three, four, . . . is even worse.  Any suggestions?

All emails are supposed to have a unique ID. For example, yours is

Message-Id: <[EMAIL PROTECTED]>

All you have to do is go through the mailbox and discard every message
except the first one you encounter with each ID.

I wrote a quick and dirty perl script a while ago for a similar task
(filtering duplicate poker hand histories). With three slight
modifications (how it recognizes the beginning of an email, the
Message-ID: line, and the end of an email), which you can probably make
in a few minutes, it would work for your task. Here's the script; feel
free to use or modify it:



#!/usr/bin/perl -w
#
# filter duplicate hands from a Party style hand file

use strict;
use warnings;

my %seen = ();  # hash (look-up-table) to record already seen hands
my $inhand = 0; # whether we are currently inside a hand record
my $hand = 0;   # hand number of the hand we are currently inside
my $line;       # line we are currently parsing
my @aline = (); # line array so we can join lines if we find an = on the end

# take line input from command line filename argument or STDIN automagically
while ($line = <>) {
   # look for beginning of another hand; the o option (compile <O>nce) only
   # matters for patterns that interpolate variables, so it is harmless here
   if ( $line =~ m{Hand History for Game\s+(\d+)\s+\*\*}o ) {
     $seen{$hand}++; # mark previous hand seen
     $hand = $1;     # current hand number from pattern match above
     $inhand = 1;    # record that we are currently in a hand
   }
   if ( $inhand ) {
      # print the filtered line unless we've already seen the hand
      unless ( $seen{$hand} ) {
         $line =~ s{=20$}{ }o; # convert =20 to a space
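         if ( $line =~ m{=$}o ) {  # line continuation mark = found
            @aline = ();
            $line =~ s{=$}{}o;
            push @aline, $line;
            while ( $line = <> ) {
               unless ( $line =~ m{=$}o ) {
                  last;
               }
               $line =~ s{=$}{}o;
               push @aline, $line;
            }
            chomp @aline;
            unless ( defined $line ) {
               # input ended in the middle of a continued line:
               # print what we have and stop reading
               print join( '', @aline ), "\n";
               last;
            }
            push @aline, $line;
            $line = join '', @aline;
         }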
         print $line;
      }
      # look for a blank or spaces only line to end hand  
      if ( $line =~ m{^\s*$}o ) {
         $inhand = 0; # found a blank line so record not in hand
      }
   } 
}
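
For the email case, the three modifications might look something like
the sketch below. It is only a sketch, and it assumes things the
original script does not: that the messages sit in a single mbox-style
file, so each email begins with a "From " line, that the headers end at
the first blank line, and that the Message-ID: header is not folded onto
a second line. The sub name dedupe_mbox is made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: keep only the first message with each Message-Id in an mbox file.
# Assumptions (not from the hand-history script): mbox format, so every
# message starts with a "From " line and its headers end at a blank line.
use strict;
use warnings;

# Read messages from $in and write the first copy of each one to $out.
# Returns the number of duplicate messages discarded.
sub dedupe_mbox {
   my ($in, $out) = @_;
   my %seen;          # Message-Ids we have already printed
   my @msg;           # lines of the message currently being collected
   my $id;            # Message-Id of that message, once we have found it
   my $inheader = 0;  # whether we are still inside the header block
   my $dropped = 0;   # count of discarded duplicates

   # print the collected message unless its ID has been seen before
   my $flush = sub {
      return unless @msg;
      if ( defined $id and $seen{$id}++ ) {
         $dropped++;
      }
      else {
         print {$out} @msg;   # messages without an ID are always kept
      }
      @msg = ();
      $id  = undef;
   };

   while ( my $line = <$in> ) {
      if ( $line =~ m{^From } ) {            # beginning of the next email
         $flush->();
         $inheader = 1;
      }
      elsif ( $inheader and $line =~ m{^\s*$} ) {
         $inheader = 0;                      # blank line ends the headers
      }
      elsif ( $inheader and not defined $id
              and $line =~ m{^Message-ID:\s*(<[^>]*>)}i ) {
         $id = $1;                           # the Message-ID: line
      }
      push @msg, $line;
   }
   $flush->();                               # end of file ends the last email
   return $dropped;
}

# With a filename argument, filter that file to STDOUT.
if ( @ARGV ) {
   open my $in, '<', $ARGV[0] or die "can't open $ARGV[0]: $!";
   dedupe_mbox( $in, \*STDOUT );
   close $in;
}
```

Saved under a name such as dedupe.pl (hypothetical), it could be run as
"perl dedupe.pl mailbox > mailbox.new". Unlike the hand filter, it holds
one whole message in memory before deciding whether to print it, since
the Message-ID: line is not the first line of an email.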
_______________________________________________
http://www.mccmedia.com/mailman/listinfo/brin-l
