Hi Ken and other EUG-LUG and CPSR members,

I first posted this to EUG-LUG (Eugene Linux Users Group) because I think the technical issues here may be beyond EFN's knowledge base when we start talking about "Bayesian probability" of spam. With this second posting, I'm also reaching out to the Computer Professionals for Social Responsibility (which I've just joined) and hope to continue the discussion there since CPSR's focus is at the intersection of technology and society.

Ken, the interesting thing to note is that all the breaches of good e-mail design you've rightly attributed to the Lane County Dean campaign are scored, but they total only 0.8 points-- nowhere near enough to set off the SpamAssassin spam-o-meter, currently set to a threshold of 5.0 on the EFN servers. The *single factor* that pushed the Dean campaign e-mail over the 5.0 threshold was the "Bayesian spam probability" with a whopping 5.4 points.
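To make the arithmetic concrete, here is a small Python sketch that adds up the rule scores from the SpamAssassin report quoted at the end of this message. The scores and the 5.0 threshold come from that report and from EFN's setting as described above; the dictionary layout and the helper name `total` are only my own illustration, not anything SpamAssassin itself provides.

```python
# Rule scores copied from the SpamAssassin report quoted at the end of this message.
scores = {
    "BAYES_99": 5.4,
    "HTML_FONTCOLOR_BLUE": 0.1,
    "HTML_MESSAGE": 0.1,
    "HTML_FONT_BIG": 0.3,
    "HTML_FONTCOLOR_UNSAFE": 0.1,
    "HTML_TAG_BALANCE_A": 0.2,
    "UPPERCASE_25_50": 0.0,
}

THRESHOLD = 5.0  # threshold described above for the EFN servers


def total(rule_scores, skip=()):
    """Sum the rule scores, leaving out any rule names listed in `skip`."""
    # Round to one decimal place to hide floating-point noise.
    return round(sum(v for k, v in rule_scores.items() if k not in skip), 1)


formatting_only = total(scores, skip=("BAYES_99",))  # HTML and uppercase rules only
full = total(scores)                                 # every rule, including BAYES_99

print("formatting rules only:", formatting_only, "flagged:", formatting_only >= THRESHOLD)
print("with BAYES_99:", full, "flagged:", full >= THRESHOLD)
```

Without the Bayesian rule the message stays far below the threshold; with it, the message is flagged.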

Could anyone on this list explain what "Bayesian spam probability" means in the context of SpamAssassin? Some spam filters use locally defined filtering rules and some are augmented with remote databases of "known" spam messages (such as razor.sf.net). I don't know about SpamAssassin.

As I wrote before, the message below suggests the use of Bayesian networks in a heuristic to detect spam if it is similar to spam in a central repository or "training set" somewhere. The most likely reason I can think of for something to trigger the Bayesian rule is that there is an e-mail very much like the triggering e-mail in the repository, wherever that is. The only reason that there would be a 99% to 100% correlation (as cited in the SpamAssassin report) is if someone put a (nearly identical) e-mail from this opt-in list into the repository-- in other words, in all probability, someone signed up for the Dean campaign mailing list and then reported the received e-mails as spam in the central repository. I'm curious how they did that exactly. I mean, if you don't want to get the e-mail, don't sign up for it, right? Unless of course, you also want *other* people not to get the e-mail.
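For anyone who wants to see the general idea in code, here is a minimal sketch of a word-based naive Bayes filter in Python. I don't know how SpamAssassin actually computes its Bayesian score, so this is NOT its implementation: the function names, the add-one smoothing, and the tiny training messages are all made up for illustration. It only shows why a message that closely resembles something already filed as spam in the training set ends up with a very high probability.

```python
import math
from collections import Counter


def train(messages):
    """Count, for each word, how many training messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Naive Bayes estimate of P(spam | words), computed via log-odds."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from training set sizes
    for word in set(text.lower().split()):
        # Add-one smoothing so unseen words never give a zero probability.
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical training data: two messages reported as spam, two as non-spam.
reported_spam = ["donate now to the campaign", "campaign rally donate today"]
reported_ham = ["linux meeting tonight", "kernel patch for the meeting"]

spam_counts = train(reported_spam)
ham_counts = train(reported_ham)

p_campaign = spam_probability("donate to the campaign", spam_counts, ham_counts,
                              len(reported_spam), len(reported_ham))
p_lug = spam_probability("linux kernel meeting", spam_counts, ham_counts,
                         len(reported_spam), len(reported_ham))

print("campaign-like message:", round(p_campaign, 3))
print("LUG-like message:", round(p_lug, 3))
```

In this toy setup, the message that shares most of its words with the reported spam scores close to 1, and the one that resembles the non-spam messages scores close to 0. Whoever decides what goes into the "spam" pile therefore decides what look-alike messages will score.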

Suppose people signed up for your favorite campaign's e-mailing list with their AOL and/or Yahoo accounts and then clicked the "This is spam" link every time they got an e-mail from the campaign? Would the campaign's e-mails then start going into the "Bulk" (spam) folders for other users besides them? How do AOL/Yahoo and other ISPs use the information gained from users clicking on "This is Spam" links? Where does that information go? BTW, I don't know the legality of such a practice, but I do NOT recommend it as I consider it unethical.

It doesn't matter what your political leanings are-- this is potentially an issue of censorship, and censorship with regard to political speech is anti-democratic (IMHO).

My sense is that current spam filters operate on the assumption that nearly everyone agrees on what is spam, so anything reported by anyone as spam is likely to actually be spam. That probably works fine up until an election year :-). For more background on PMSF, please see my first posting on the Planetwork archives at:

http://planetwork.blueoxen.net/forums/collaboratory/2003-08/msg00001.html

Thank you for your

Marc

Ken Barber wrote:

On Monday 10 November 2003 15:02, Marc Baber wrote:

I've been noticing that all e-mail coming to me from the local
Dean campaign is flagged as [EMAIL PROTECTED] by EFN's servers.


Yeah, no kidding! An HTML-encoded message with large fonts, flashy colors and lots of uppercase characters... plus it isn't even valid HTML... (see spamassassin message below)


Ken

------------------------------------------------------------------------

5.4 BAYES_99              BODY: Bayesian spam probability is 99 to 100%
                          [score: 1.0000]
0.1 HTML_FONTCOLOR_BLUE   BODY: HTML font color is blue
0.1 HTML_MESSAGE          BODY: HTML included in message
0.3 HTML_FONT_BIG         BODY: HTML has a big font
0.1 HTML_FONTCOLOR_UNSAFE BODY: HTML font color not in safe 6x6x6 palette
0.2 HTML_TAG_BALANCE_A    BODY: HTML has excess "a" close tags
0.0 UPPERCASE_25_50       message body is 25-50% uppercase




_______________________________________________
EuG-LUG mailing list
[EMAIL PROTECTED]
http://mailman.efn.org/cgi-bin/listinfo/eug-lug