Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
> Before anyone rushes ahead and puts any time or money into this, I think it's worth establishing whether it makes any significant difference.

It solves several real-world problems that I'm aware of, but I agree it's not going to hold up 3.4.0 or be a top priority for me.

Regards,
KAM
Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
On Thu, 22 Mar 2012 07:59:39 -0400, Kevin A. McGrail wrote:
> Yes and no. What you have missed is that David F Skoll is a key author of MIMEDefang. They also publish a great COTS solution for email filtering called CanIT, so his plugin is part of the commercial product.

AFAIK his Bayes uses word-pair tokenization, and DSPAM supports various multi-word tokenizers, so they are somewhat more susceptible to header rewriting.

> However, his idea on tokens is a very elegant one. To extract them, I planned on using SA's existing Bayesian framework and delivering them to a header. What is done with the header from there is a spam/ham delivery issue, but at best sa-learn could use it. There are lots of security and privacy issues to deal with, but I am just in the idea phase.

Before anyone rushes ahead and puts any time or money into this, I think it's worth establishing whether it makes any significant difference.

AFAIK Bayes tokenizes after any encoding is removed, so unless Exchange does something extreme, such as converting to Unicode or rich-text format, I doubt it makes any difference at all to the body. I don't know how Exchange mangles headers, but I'm sceptical it has much effect, if any; you'd really need to look at the details. Extra headers added after processing shouldn't be a problem, and it's easy enough to strip them if you're paranoid.
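The argument that a change of transfer encoding should not alter the body tokens can be illustrated with Python's standard `email` package: the same text serialized as quoted-printable and as base64 differs on the wire but yields identical tokens once decoded. This is only a sketch; the `\w+` tokenizer and the helper names are hypothetical stand-ins, not SpamAssassin's Bayes tokenizer.

```python
import re
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

def build(body: str, cte: str) -> bytes:
    """Serialize a one-part text/plain message with the given transfer encoding."""
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg.set_content(body, cte=cte)
    return msg.as_bytes()

def tokens_after_decoding(raw: bytes) -> list[str]:
    """Parse raw bytes, undo the transfer encoding, then tokenize (crude stand-in)."""
    msg = message_from_bytes(raw, policy=default)
    text = msg.get_content()  # decoded str: transfer encoding already removed
    return re.findall(r"\w+", text.lower())

body = "Café special: cheap pills, limited offer!"
qp = build(body, "quoted-printable")
b64 = build(body, "base64")

print(qp != b64)                                                # True: wire formats differ
print(tokens_after_decoding(qp) == tokens_after_decoding(b64))  # True: same tokens
```

Header rewriting is a different matter: headers are tokenized as they arrive, so the sketch says nothing about what Exchange does to them.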
Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
Yes and no. What you have missed is that David F Skoll is a key author of MIMEDefang. They also publish a great COTS solution for email filtering called CanIT, so his plugin is part of the commercial product.

However, his idea on tokens is a very elegant one. To extract them, I planned on using SA's existing Bayesian framework and delivering them to a header. What is done with the header from there is a spam/ham delivery issue, but at best sa-learn could use it. There are lots of security and privacy issues to deal with, but I am just in the idea phase.

Regards,
KAM

Per-Erik Persson wrote:
> Since we are on the subject of adding "magic links" to email headers to make it easier for non-technical staff to report spam: I don't understand how to extract the tokenized data needed to represent the specific email. Have I missed some plugin that everyone else knows about?
>
> The rest of the problem seems trivial if you already have an infrastructure deployed with SSO and a decent web interface.
>
> The setup with Postfix facing the world, SpamAssassin sanitizing the mail and Exchange storing it is something that I see quite often nowadays.
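A minimal sketch of how "delivering tokens to a header" might look, assuming the tokens have already been extracted (in practice from SA's Bayes framework; here just a Python list). Each token is hashed to a short digest, the set is packed, and the result is signed with an HMAC so a value pasted back to the helpdesk cannot be forged. The key, the 5-byte hash length and all function names are hypothetical; this is not SpamAssassin's API.

```python
import base64
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical site key; never leaves the mail server
HASH_LEN = 5           # assumed bytes per token hash (keeps the header short)
MAC_LEN = 16           # truncated HMAC-SHA256 tag

def hash_token(token: str) -> bytes:
    """Hash one token so the header does not carry the raw words of the mail."""
    return hashlib.sha1(token.encode("utf-8")).digest()[-HASH_LEN:]

def make_header(tokens: list[str], key: bytes = SECRET) -> str:
    """Pack the de-duplicated token hashes and sign them with an HMAC."""
    blob = b"".join(sorted({hash_token(t) for t in tokens}))
    tag = hmac.new(key, blob, hashlib.sha256).digest()[:MAC_LEN]
    return base64.urlsafe_b64encode(tag + blob).decode("ascii")

def read_header(value: str, key: bytes = SECRET) -> list[bytes]:
    """Verify a pasted header value and unpack its token hashes.

    Every failure (bad base64, non-ASCII input, wrong signature) raises a
    ValueError or one of its subclasses.
    """
    raw = base64.urlsafe_b64decode(value.strip().encode("ascii"))
    tag, blob = raw[:MAC_LEN], raw[MAC_LEN:]
    expected = hmac.new(key, blob, hashlib.sha256).digest()[:MAC_LEN]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("header value was not issued by this site")
    return [blob[i:i + HASH_LEN] for i in range(0, len(blob), HASH_LEN)]
```

The HMAC addresses forged training data. Hashing only hides words from casual view, though: anyone can hash a dictionary and match common words. The header also grows with the number of tokens, so a real implementation would need to cap or fold it.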
Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
On Thu, 22 Mar 2012 07:51:07 +0100, Per-Erik Persson wrote:
> Since we are on the subject of adding "magic links" to email headers to make it easier for non-technical staff to report spam: I don't understand how to extract the tokenized data needed to represent the specific email.

We have an entire infrastructure built to support this. It is proprietary, however, and is not easily implemented as a SpamAssassin plugin, though the basic idea probably could be.

Regards,
David.
Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
On 22.03.2012 09:15, xTrade Assessory wrote:
> Robert Schetterer wrote:
>> However, I have a mail address that users can forward ham/spam to for learning. Almost no users forward anything to it, which is no wonder, since the false positive rate is nearly zero.
>>
>> In fact, there are systems with webmail GUIs for classifying spam, e.g. AOL, and reality shows users don't use them very wisely. Perhaps the "spam" and "delete" buttons are too close together, or the users are simply clueless.
>>
>> My conclusion: don't waste your time implementing complicated mechanisms for ham/spam training; work on the tagging/rejecting side to reduce the false positive rate.
>
> Hi
>
> I cannot agree more with that. In the end, sooner or later, you discover that you have spent time on something with erroneous results or no return at all, not to mention the support overhead these extra mailboxes will create.
>
> Besides the obvious point you already made, it is still highly questionable whether a "user" is able to classify reliably.
>
> Also, IMO, most spam hits obvious account names/combinations, and most users are not affected by the problem unless their addresses are standard_names@.
>
> For years now I have not cared so much any more. I run a pretty standard SpamAssassin setup, but I query the maillog for delivery attempts to nonexistent accounts. First I slow the sender down after 2 invalid destination addresses, but I also record the sender details and block them for three months via the access file (I run sendmail everywhere).
>
> That works very smoothly for me, still with almost zero CPU overhead for spamd, and it is practical, easy and cheap. The result: on certain accounts I used to get 50 spams per day, now 2, maybe 3, and those numbers are for mail servers each handling 50,000+ accounts.
>
> Hans

Something like http://mailfud.org/postpals/ may be helpful too at some sites; I have heard amavis has a similar mechanism.

However, there is a lot a postmaster can do before trusting users' spam/ham classification (e.g. SpamAssassin's blacklist/whitelist feature). If somebody does rely on it, don't trust your users completely. User training should only ever be one signal among others: it may add a lot of Bayes points, for example, but it should not push a message across the spam/ham border in a single step. (This is for ISP-style mail systems; the policy might be different for dedicated company mail, but it's still complicated there too.)

Reality shows how this goes, e.g. at AOL: their users' abuse/spam reporting program is totally broken. I have never had a genuine spam complaint from their users about mail sent from my systems, while on the other side the AOL mail systems themselves try to deliver spam to my servers at a very high rate.

--
Best Regards
MfG Robert Schetterer
Germany/Munich/Bavaria
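Robert's rule that user training should be one signal among others, never enough on its own to cross the spam/ham border, can be expressed as a capped score contribution. A minimal sketch: `REQUIRED_SCORE` mirrors SpamAssassin's default `required_score` of 5.0, while the per-report weight, the margin and both function names are hypothetical.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

def user_report_points(reports: int, per_report: float = 1.0,
                       required: float = REQUIRED_SCORE,
                       margin: float = 1.0) -> float:
    """Points from user spam reports, capped so they alone stay below the threshold."""
    cap = required - margin
    return min(max(reports, 0) * per_report, cap)

def is_spam(other_points: float, reports: int,
            required: float = REQUIRED_SCORE) -> bool:
    """User reports only tip the verdict when other rules already lean towards spam."""
    return other_points + user_report_points(reports, required=required) >= required
```

With these defaults, ten reports on an otherwise clean message add only 4.0 points, so the message stays ham. The same reports on a message that already scores 1.5 push it to 5.5, which is tagged.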
Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
On 03/22/2012 07:59 AM, Robert Schetterer wrote:
> On 22.03.2012 07:51, Per-Erik Persson wrote:
>> Since we are on the subject of adding "magic links" to email headers to make it easier for non-technical staff to report spam: I don't understand how to extract the tokenized data needed to represent the specific email. Have I missed some plugin that everyone else knows about?
>>
>> The rest of the problem seems trivial if you already have an infrastructure deployed with SSO and a decent web interface.
>>
>> The setup with Postfix facing the world, SpamAssassin sanitizing the mail and Exchange storing it is something that I see quite often nowadays.
>
> However, I have a mail address that users can forward ham/spam to for learning. Almost no users forward anything to it, which is no wonder, since the false positive rate is nearly zero.
>
> In fact, there are systems with webmail GUIs for classifying spam, e.g. AOL, and reality shows users don't use them very wisely. Perhaps the "spam" and "delete" buttons are too close together, or the users are simply clueless.
>
> My conclusion: don't waste your time implementing complicated mechanisms for ham/spam training; work on the tagging/rejecting side to reduce the false positive rate.

You are right about how the average user works. ("Oh, I am tired of this mailing list; let's classify it as spam, since I don't know how to unsubscribe.")

However, a helpdesk and similar functions often get complaints about spam getting through, and it is virtually impossible to make most users cut and paste a header. Pasting a single field from the header and sending it to the right helpdesk queue or a web interface, though, is probably just the right amount of work.

I have a personal toolbox to sieve out phishing emails (and false positives), and I would like to build a closed loop for feeding SpamAssassin without having access to the original emails.
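One possible shape for the closed loop Per-Erik describes keeps the tokens on the server and puts only a short signed ID in the header. The helpdesk then turns a pasted ID plus a verdict into a training job without ever seeing the original email. This is a minimal in-memory sketch. The `ReportDesk` class, its methods and the ID format are hypothetical, and feeding the returned tokens to the actual learner is left out.

```python
import hashlib
import hmac
import secrets

class ReportDesk:
    """Hypothetical server-side store: tokens stay on the server, users only
    ever see (and paste) a short signed ID."""

    def __init__(self, key: bytes):
        self._key = key
        self._pending: dict[str, list[str]] = {}

    def _sign(self, msg_id: str) -> str:
        return hmac.new(self._key, msg_id.encode("utf-8"),
                        hashlib.sha256).hexdigest()[:16]

    def register(self, tokens: list[str]) -> str:
        """At scan time: remember the tokens and return the header value."""
        msg_id = secrets.token_hex(8)
        self._pending[msg_id] = list(tokens)
        return f"{msg_id}.{self._sign(msg_id)}"

    def report(self, pasted: str, verdict: str) -> tuple[str, list[str]]:
        """At the helpdesk: verify the pasted field and hand back a training job."""
        if verdict not in ("spam", "ham"):
            raise ValueError("verdict must be 'spam' or 'ham'")
        msg_id, _, sig = pasted.strip().partition(".")
        if not hmac.compare_digest(sig.encode("utf-8"),
                                   self._sign(msg_id).encode("utf-8")):
            raise ValueError("unknown or forged report ID")
        tokens = self._pending.pop(msg_id, None)  # each message is learned once
        if tokens is None:
            raise ValueError("already reported or expired")
        return verdict, tokens
```

Unlike packing hashed tokens into the header itself, this keeps the header tiny and leaks nothing about the content. The cost is server-side storage and some expiry policy for IDs that are never reported.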
Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
Robert Schetterer wrote:
> However, I have a mail address that users can forward ham/spam to for learning. Almost no users forward anything to it, which is no wonder, since the false positive rate is nearly zero.
>
> In fact, there are systems with webmail GUIs for classifying spam, e.g. AOL, and reality shows users don't use them very wisely. Perhaps the "spam" and "delete" buttons are too close together, or the users are simply clueless.
>
> My conclusion: don't waste your time implementing complicated mechanisms for ham/spam training; work on the tagging/rejecting side to reduce the false positive rate.

Hi

I cannot agree more with that. In the end, sooner or later, you discover that you have spent time on something with erroneous results or no return at all, not to mention the support overhead these extra mailboxes will create.

Besides the obvious point you already made, it is still highly questionable whether a "user" is able to classify reliably.

Also, IMO, most spam hits obvious account names/combinations, and most users are not affected by the problem unless their addresses are standard_names@.

For years now I have not cared so much any more. I run a pretty standard SpamAssassin setup, but I query the maillog for delivery attempts to nonexistent accounts. First I slow the sender down after 2 invalid destination addresses, but I also record the sender details and block them for three months via the access file (I run sendmail everywhere).

That works very smoothly for me, still with almost zero CPU overhead for spamd, and it is practical, easy and cheap. The result: on certain accounts I used to get 50 spams per day, now 2, maybe 3, and those numbers are for mail servers each handling 50,000+ accounts.

Hans
--
XTrade Assessory International Facilitator
BR - US - CA - DE - GB - RU - UK
+55 (11) 4249. http://xtrade.matik.com.br
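Hans's maillog approach could be sketched roughly as follows. Everything specific here is an assumption:

- **Log format:** the regex expects sendmail reject lines that name the client relay IP (IPv4 only) and end in "User unknown". Adapt it to your own maillog.
- **Threshold:** a client is blocked once it has hit more than 2 nonexistent recipients.
- **Duration:** "three months" is treated as 90 days.
- **Expiry tracking:** expiry dates live in the script's own state, because the sendmail access map itself has no expiry.

The slow-down step is left to the MTA (sendmail's BadRcptThrottle option covers that). Function names are hypothetical.

```python
import re
from collections import Counter
from datetime import date, timedelta

# Assumed log shape: a reject line naming the client relay and ending in
# "User unknown". Adjust the pattern to what your sendmail actually logs.
UNKNOWN_USER = re.compile(r"relay=\S*\s*\[(?P<ip>[0-9.]+)\].*User unknown")

def find_offenders(lines, limit=2):
    """IPv4 clients that hit more than `limit` nonexistent recipients."""
    hits = Counter()
    for line in lines:
        match = UNKNOWN_USER.search(line)
        if match:
            hits[match.group("ip")] += 1
    return sorted(ip for ip, count in hits.items() if count > limit)

def update_blocklist(blocklist, offenders, today, days=90):
    """Add offenders with an expiry ~3 months out; drop entries that have expired."""
    for ip in offenders:
        blocklist[ip] = today + timedelta(days=days)
    return {ip: until for ip, until in blocklist.items() if until > today}

def render_access(blocklist):
    """Lines for the sendmail access file (tagged form, needs FEATURE(`access_db'))."""
    return [f"Connect:{ip}\tREJECT" for ip in sorted(blocklist)]
```

After writing the rendered lines to the access file, the map still has to be rebuilt, e.g. `makemap hash /etc/mail/access < /etc/mail/access`.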
Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
On 22.03.2012 07:51, Per-Erik Persson wrote:
> Since we are on the subject of adding "magic links" to email headers to make it easier for non-technical staff to report spam: I don't understand how to extract the tokenized data needed to represent the specific email. Have I missed some plugin that everyone else knows about?
>
> The rest of the problem seems trivial if you already have an infrastructure deployed with SSO and a decent web interface.
>
> The setup with Postfix facing the world, SpamAssassin sanitizing the mail and Exchange storing it is something that I see quite often nowadays.

However, I have a mail address that users can forward ham/spam to for learning. Almost no users forward anything to it, which is no wonder, since the false positive rate is nearly zero.

In fact, there are systems with webmail GUIs for classifying spam, e.g. AOL, and reality shows users don't use them very wisely. Perhaps the "spam" and "delete" buttons are too close together, or the users are simply clueless.

My conclusion: don't waste your time implementing complicated mechanisms for ham/spam training; work on the tagging/rejecting side to reduce the false positive rate.

--
Best Regards
MfG Robert Schetterer
Germany/Munich/Bavaria
was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails
Since we are on the subject of adding "magic links" to email headers to make it easier for non-technical staff to report spam: I don't understand how to extract the tokenized data needed to represent the specific email. Have I missed some plugin that everyone else knows about?

The rest of the problem seems trivial if you already have an infrastructure deployed with SSO and a decent web interface.

The setup with Postfix facing the world, SpamAssassin sanitizing the mail and Exchange storing it is something that I see quite often nowadays.