I agree that this isn't going to be the best approach. Detecting ham
is simply more difficult than detecting spam:
1. New types of ham emerge more often than new types of spam. Spammers
generally stick to tried-and-true subjects while ham is all over the
place.
2. Ham is more personalized than spam. Everyone gets very similar
spam, but nobody gets the same mix of ham messages that I get.
3. Ham has a much greater range of potential subjects and patterns
than spam. For all the spam out there, nobody's doing anything creative
like trying to sell fountain pens or beverage dispensers or books of
poetry with it - it's all fake Rolexes and cheap pharmaceuticals. Ham, on
the other hand, has a million potential subjects and you get
one-of-a-kind messages every day.
4. Spammers will have an easier time faking ham characteristics than
removing spam characteristics, which may be endemic to their methods
(spamming software, botnets, etc.).
5. Network effects are very helpful with spam (DNS blacklists, Razor,
etc.) but not very helpful with ham.
Of course, ham rules are helpful - especially personalized ones. I use
a bunch. But they work best inside the existing framework of spam
detection, not as a replacement for it.
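
To make that last point concrete, here's a rough sketch of what I mean by using ham rules inside a spam-scoring framework. This is not any particular filter's code: the rule names, patterns, scores and the 5.0 threshold are all made up for illustration. Spam rules add points, and a personalized ham rule subtracts a few.

```python
import re

# Hypothetical rules: (name, pattern, score). Positive scores are spam
# indicators; the negative score is a personalized ham rule.
RULES = [
    ("FAKE_ROLEX", re.compile(r"\brolex(es)?\b", re.I), 3.0),
    ("CHEAP_PHARMA", re.compile(r"\bcheap\s+(meds|pharmaceuticals)\b", re.I), 3.5),
    ("HAM_FOUNTAIN_PEN", re.compile(r"\bfountain pens?\b", re.I), -1.0),
]

SPAM_THRESHOLD = 5.0  # assumed cutoff, chosen only for this example


def score_message(text):
    """Return (total score, names of rules that matched) for a message body."""
    hits = [(name, score) for name, pattern, score in RULES if pattern.search(text)]
    total = sum(score for _, score in hits)
    return total, [name for name, _ in hits]


def is_spam(text):
    """A message is spam when its combined rule score reaches the threshold."""
    total, _ = score_message(text)
    return total >= SPAM_THRESHOLD
```

The ham rule is deliberately small here. It can pull a borderline message from a friend back under the threshold, but a spammer who stuffs "fountain pens" into a pharma pitch still gets caught, which is the kind of faking I'm worried about in point 4.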