Attached is a Perl program I use to categorize spam subjects for review. It is part of an analysis batch file that also addresses several other problems, such as moving the spam out of the \IMAIL\SPOOL\SPAM holding area before analysis so that spam pouring in during analysis isn't deleted without review.
The subject sort order might be integrated into the other full-featured VB spam review program.

Sort order:
1.) Alphabetical
2.) Spam subjects beginning with "wwww, xxx xxx xxx..."
3.) Spam subjects containing multiple spaces

I don't even need to look at the third group, since I've never seen a legitimate message that had multiple spaces in the subject and was tagged as possible spam. I wonder whether any message with multiple spaces in its subject has ever been non-spam.
# Summarize held spam by subject lines
use strict;
use warnings;

my @fileList = glob("d*.smd");
my $spamCount = @fileList;
print "Found $spamCount files\n";

my %subjects;
foreach my $spamFile (@fileList) {
    getSubject($spamFile);
}

print "\n\n----------------result\n\n";
foreach my $spamSubj (sort(keys(%subjects))) {
    print "$spamSubj: ($subjects{$spamSubj})\n";
}
exit(0);

# Read one spam file and count its subject line in %subjects
sub getSubject {
    my ($spamFilename) = @_;
    #print "$spamFilename...";
    # Read-only access is enough; the file is never written to
    open(my $spamFh, '<', $spamFilename) || die "Can't open file $spamFilename: $!\n";
    while (defined(my $line = <$spamFh>)) {
        # Blank line (end of headers), terminate; EOF ends the loop by itself
        last if $line =~ m/^\r?\n?$/;
        if ($line =~ m/^subject:/i) {
            my $subject = substr($line, 8);
            for ($subject) {
                # Trim leading and trailing spaces
                s/^\s+//;
                s/\s+$//;
                # Change cr into space
                s/\r/ /g;
            }
            # If typical spam subject (multiple spaces), sort to bottom by prepending FF character
            if ($subject =~ m/ {2,}/) {
                $subject = "\xff" . $subject;
            }
            # If typical spam subject (begins with "xxx, BUY THIS SPAM"), sort near bottom by prepending FE character
            if ($subject =~ m/^(\w*),/) {
                $subject = "\xfe" . $subject;
            }
            # Count this subject; a missing key starts at 0
            $subjects{$subject}++;
            # print "Subject: $subject\n";
            last;
        }
    }
    close($spamFh);
}
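
If the same ordering were reused elsewhere (in the VB review program, say), one possible variation is to sort on a numeric rank instead of prepending FE/FF bytes, so the printed subjects stay free of marker characters. This is only a rough sketch, not part of the attached program: subject_rank and sort_subjects are hypothetical names, and the sample subjects are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the attached script):
# 0 = ordinary subject, 1 = "word, ..." opener, 2 = multiple spaces
sub subject_rank {
    my ($subject) = @_;
    return 2 if $subject =~ / {2,}/;
    return 1 if $subject =~ /^\w*,/;
    return 0;
}

# Sort by rank first, then alphabetically within each rank
sub sort_subjects {
    my @subjects = @_;
    return sort { subject_rank($a) <=> subject_rank($b) or $a cmp $b } @subjects;
}

# Made-up sample subjects, one per category plus a second ordinary one
my @sample = ("Meeting notes", "wwww, BUY THIS SPAM", "Cheap  pills  now", "Agenda");
print "$_\n" for sort_subjects(@sample);
```

As in the script above, the multiple-space test is applied first, so a subject that matches both patterns lands in the last group.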