> From: Chris Santerre
> To: 'Duncan Findlay'; [email protected]
> > Yes, but it's difficult for people to join SARE, or learn what goes
> > into rule development. If all the development takes place in private,
> > then there's no way for newcomers to join and this is a really bad
> > thing.
>
> How do you think all the SARE members got in? Wasn't hard.
> They did good work. Some wrote rules that were never used,
> but showed the will to do it. So they got in.
As a "rule writer wanna-be," let me say that this "code is
different" argument also affects those of us who will write rules.
When I read your Perl code, I usually understand much of
what you were trying to accomplish; a finished rule can be
relatively opaque and devoid of intent.
If we approach the issue from the old mindset of "it
works for SARE" (etc.), then we may get the same old results:
namely, few rule writers.
This is explicitly what you (we) are trying to change.
Example: I am currently writing a very FEW rules, some from
scratch and some adapted from the work or ideas of others on
lists such as this one or on web sites.
But these are probably crappy rules by the older members'
standards; I don't really know the full process for checking
them, nor do I necessarily have the resources/processes set up
to do that; AND...
You have all convinced me that if I post a rule for discussion,
it is then close to worthless.
We're back to: if we want more rule writers, we have to make
it easier to start writing AND sharing rules; we have to
encourage new rule writers, not just wait for them to "show off"
or contribute spontaneously.
A Definition of Insanity: Continuing the same behavior and
expecting different results.
I want to write a rule (or maybe a plugin is necessary for this)
to check the "display name" against the "user part" of the email
address (the part before the @).
Now, I have no idea whether this is going to offer an advantage,
nor precisely how to do it. I expect a noticeable false positive
rate, but I also suspect that overall this might hit spam that is
not currently being caught.
"Shirley Johnson" <[EMAIL PROTECTED]> stands out to a human
as very probably bogus email, but can a rule or plugin
understand this obvious mismatch?
(And spammers must believe it is important to make
that display name look reasonable because the vast
majority of my spam now looks like this.)
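As a first step, any such check has to split the From: header into
the display name and the user part. A minimal sketch in Python,
assuming Python's standard `email.utils.parseaddr` is acceptable for
illustration (a real SpamAssassin plugin would be written in Perl);
`split_from_header` is a hypothetical helper name, and because the
address in the example above is redacted, the test address below is
made up:

```python
from email.utils import parseaddr


def split_from_header(header_value):
    """Return (display_name, local_part) from a From: header value.

    Hypothetical helper for illustration. Missing parts come back as
    empty strings.
    """
    display_name, address = parseaddr(header_value)
    # The "user part" is everything before the last "@".
    local_part = address.rsplit("@", 1)[0] if "@" in address else address
    return display_name, local_part


# Made-up address, used only to show the shape of the result.
print(split_from_header('"Shirley Johnson" <xk7qpz@example.com>'))
```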
My other ideas include more "two rule" or "spam NOT ham rule"
combos.
Rules like Bayes99 and (for me) the Spamhaus RBLs might
be stripped of almost all their false positives if they
are checked in combination with rules that either give
very few false positives themselves OR give a strong
ham sign.
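In SpamAssassin itself such combinations are normally written as
`meta` rules; the boolean logic is sketched here in Python just to
make the two variants concrete: "spam AND confirming spam rule" and
"spam AND NOT ham sign". The function name and every rule name other
than BAYES_99 and RCVD_IN_SBL are hypothetical, and no scores or hit
rates are implied:

```python
def combo_fires(hits, primary, confirmers=(), ham_signs=()):
    """Sketch of a 'two rule' combo over the set of rule names that hit.

    Fires only when the primary rule hit, at least one confirmer also
    hit (if any confirmers are given), and none of the ham-sign rules hit.
    """
    hits = set(hits)
    if primary not in hits:
        return False
    # "spam AND spam": require a second, low-false-positive rule.
    if confirmers and not hits.intersection(confirmers):
        return False
    # "spam NOT ham": veto when any ham-sign rule also hit.
    return not hits.intersection(ham_signs)


# HYPOTHETICAL_HAM_SIGN stands in for whatever rule signals ham.
print(combo_fires({"BAYES_99", "RCVD_IN_SBL"}, "BAYES_99",
                  confirmers={"RCVD_IN_SBL"}))
```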
The above ideas are probably naive, and may even sound stupid by the
standards of the "experts", but you have to start somewhere, and
if the experts are not able to share their ideas freely with
unknown beginners, then "Houston, we have a problem."
One distinction for me: I am willing to sound stupid in order to
learn how to be smarter.
(And my "display" vs "user part" idea is not quite as naive as it
sounds: I am seeing a lot of mail that my human pattern
matcher picks out almost immediately, so finding a way to teach
SA to do that is theoretically possible. Obviously a direct
match is NOT going to work, but counting the percentage of matching
letters, and withholding the score unless BOTH the "display" letters
are largely absent from the "user part" AND vice versa, might make
sense, or using a soundex algorithm or some such.)
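Here is a rough sketch of both ideas in Python: a two-way letter
overlap that flags a mismatch only when neither side's letters are
largely found in the other, plus a classic American Soundex for fuzzy
comparison of name tokens. The function names and the 0.5 threshold
are hypothetical choices, not tuned against any mail corpus:

```python
import re
from collections import Counter

# American Soundex digit codes; vowels, y, h and w get no digit.
SOUNDEX_CODES = {**dict.fromkeys("bfpv", "1"),
                 **dict.fromkeys("cgjkqsxz", "2"),
                 **dict.fromkeys("dt", "3"), "l": "4",
                 **dict.fromkeys("mn", "5"), "r": "6"}


def letters(s):
    """Lowercase s and keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", s.lower())


def coverage(a, b):
    """Fraction of a's letters (counted with repeats) also present in b."""
    common = Counter(a) & Counter(b)
    return sum(common.values()) / len(a)


def name_mismatch(display_name, local_part, threshold=0.5):
    """True only when BOTH directions of letter coverage fall below threshold."""
    d, u = letters(display_name), letters(local_part)
    if not d or not u:
        return False  # nothing to compare, so don't score
    return coverage(d, u) < threshold and coverage(u, d) < threshold


def soundex(word):
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    w = letters(word)
    if not w:
        return ""
    out = w[0].upper()
    prev = SOUNDEX_CODES.get(w[0], "")
    for ch in w[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            out += code
        # h and w do not separate equal codes; vowels do.
        if ch not in "hw":
            prev = code
    return (out + "000")[:4]


print(name_mismatch("Shirley Johnson", "xk7qpz"))    # made-up user part
print(name_mismatch("Shirley Johnson", "sjohnson"))
print(soundex("Robert"), soundex("Rupert"))
```

Requiring the mismatch in both directions is the hedge against false
positives: a user part like "sjohnson" covers only part of the display
name, but nearly all of its own letters appear there, so it is not
flagged.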
Rules rule.
--
Herb Martin