Attached is a Perl program I use to categorize spam subjects for review.  It
is one part of an analysis batch file that also handles several other problems,
such as moving the spam out of the \IMAIL\SPOOL\SPAM holding area before
analysis, so that spam that pours in during analysis won't be deleted
without review.
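
For anyone who wants the staging step in Perl rather than in the batch file, here is a rough sketch. It is not the batch file I actually use: the helper name stage_spam and the REVIEW working directory are made up for illustration, and it assumes the held messages are the d*.smd files in the holding area.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Spec;

# Hypothetical helper: move every d*.smd file from $holding into $work
# and return how many files were moved. Both directories are assumptions.
sub stage_spam {
    my ($holding, $work) = @_;
    my $moved = 0;
    opendir(my $dh, $holding) or die "Can't read $holding: $!\n";
    for my $name (grep { /^d.*\.smd$/i } readdir $dh) {
        my $from = File::Spec->catfile($holding, $name);
        my $to   = File::Spec->catfile($work, $name);
        move($from, $to) or die "Can't move $from: $!\n";
        $moved++;
    }
    closedir $dh;
    return $moved;
}

# Example call; adjust both paths to the local installation.
# my $count = stage_spam('\IMAIL\SPOOL\SPAM', '\IMAIL\SPOOL\REVIEW');
```

Running the analysis against the working directory means anything that arrives in the holding area afterwards is left alone until the next pass.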

The subject sort order could also be integrated into the other, full-featured
VB spam review program.

 Sort order:

  1.) Alphabetical

  2.) Spam subjects beginning with "wwww, xxx xxx xxx..."

  3.) Spam subjects containing multiple spaces.   I don't even need to look
at these, since I've never seen a legitimate message that had multiple spaces
in the subject and was tagged as possible spam.  I wonder if any message
containing multiple spaces in the subject has ever been non-spam.
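
The three-tier ordering can be illustrated with a small key function. This is only a sketch: the name sort_key and the sample subjects are invented, and "multiple spaces" is taken to mean a run of five spaces, as in the script below. The idea is that a plain string sort puts anything prefixed with byte FE or FF after all ordinary ASCII subjects.

```perl
use strict;
use warnings;

# Hypothetical helper: return the subject with a one-byte prefix that
# controls where it lands in a plain string sort.
sub sort_key {
    my ($subject) = @_;
    return "\xff" . $subject if $subject =~ / {5}/;   # multiple spaces: bottom
    return "\xfe" . $subject if $subject =~ /^\w*,/;  # "name, ..." opener: near bottom
    return $subject;                                  # everything else: alphabetical
}

# Invented sample subjects, one or two per tier.
my @samples = (
    'bob, BUY THIS NOW',
    'Meeting notes',
    'Cheap pills     qzx',
    'Invoice attached',
);

# Sort on the prefixed key but print the original subject text.
my @ordered = sort { sort_key($a) cmp sort_key($b) } @samples;
print "$_\n" for @ordered;
```

The ordinary subjects come out first in alphabetical order, then the "bob," one, then the one with the run of spaces.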


# Summarize held spam by subject lines

my @fileList=glob("d*.smd");

my $spamCount=@fileList;

print "Found $spamCount files\n";

my %subjects;

foreach my $spamFile (@fileList) {
   getSubject($spamFile);
}

print "\n\n----------------result\n\n";


foreach my $spamSubj (sort(keys(%subjects))) {
  print "$spamSubj: ($subjects{$spamSubj})\n"; 
}

exit(0);






sub getSubject {
   my ($spamFilename) = $_[0];

   #print "$spamFilename...";

   my $subject="";
   # Open read-only; the file is never written to
   open(SPAMFILE, "<", $spamFilename) || die "Can't open file $spamFilename: $!\n";
   my($keepReading)=1;
   while ($keepReading) {
      my $line = <SPAMFILE>;
      if (!defined($line) || $line =~ /^\r?\n?$/) {
         # Blank line (end of headers) or EOF, terminate
         $keepReading=0;
      } else {
         $lcLine= lc $line;
         if ($lcLine =~ m/^subject:/) {
            $subject = substr($line, 8);

            for ($subject) {
              # Trim leading and trailing spaces
              s/^\s+//;
              s/\s+$//;

              # Change cr into space
              s/\r/ /g;
             }

             # If typical spam subject (multiple spaces), sort to bottom by
             # prepending an FF character
             if ($subject =~ m/     /) {
                $subject="\xff" . $subject;
             }

             # If typical spam subject (begins with "xxx, BUY THIS SPAM"), sort
             # near bottom by prepending an FE character
             if ($subject =~ m/^(\w*),/) {
                $subject="\xfe" . $subject;
             }


            $keepReading=0;

            if (! defined($subjects{$subject})) {
               $subjects{$subject} = 1;
            } else {
               $subjects{$subject} ++;
            }
         }

      }

   }
   close SPAMFILE;

  # print "Subject: $subject\n";

}
