Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-22 Thread Kevin A. McGrail

> Before anyone rushes ahead and puts any time or money into this, I
> think it's worth establishing whether it makes any significant
> difference.

It solves several real-world problems that I'm aware of, but I agree it's
not going to hold up 3.4.0 or be a top priority for me.


regards,
KAM


Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-22 Thread RW
On Thu, 22 Mar 2012 07:59:39 -0400
Kevin A. McGrail wrote:

> Yes and no. What you have missed is that David F Skoll is a key
> author of MIMEDefang. They also publish a great COTS solution for
> email filtering called CanIT. So his plugin is part of the commercial
> product.

AFAIK his Bayes uses word-pair tokenization, and DSPAM supports
various multi-word tokenizers, so they are somewhat more susceptible
to header rewriting.
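
To illustrate why multi-word tokenizers could be more sensitive to header
rewriting, here is a minimal Python sketch. It is not the tokenizer used by
CanIT, DSPAM, or SpamAssassin; the functions unigrams, word_pairs, and
surviving, and the Received-style sample strings, are made up for
illustration. Changing one word removes one single-word token but two
word pairs.

```python
def unigrams(text):
    """Single-word tokens, lowercased."""
    return set(text.lower().split())


def word_pairs(text):
    """Adjacent word pairs, lowercased."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def surviving(original, rewritten, tokenizer):
    """Fraction of the original tokens still present after rewriting."""
    before = tokenizer(original)
    after = tokenizer(rewritten)
    return len(before & after) / len(before)


# Hypothetical header text before and after a gateway rewrote one hostname.
original = "from mail.example.com by mx.example.org with esmtp"
rewritten = "from mail.example.com by exchange.example.org with esmtp"

print(f"unigrams surviving:   {surviving(original, rewritten, unigrams):.2f}")    # 0.83
print(f"word pairs surviving: {surviving(original, rewritten, word_pairs):.2f}")  # 0.60
```

Whether this matters in practice depends on how much of the header text is
actually rewritten, which the sketch does not measure.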

> 
> However, his idea on tokens is an elegant one. To
> extract them, I planned on using SA's existing Bayesian framework and
> deliver them to a header. What is done with the header from there is
> a spam/ham delivery issue but at best sa-learn could use it. Lots of
> security and privacy issues to deal with but I am just in the idea
> phase.

Before anyone rushes ahead and puts any time or money into this, I
think it's worth establishing whether it makes any significant
difference.

AFAIK Bayes tokenizes after any encoding is removed, so unless
Exchange does something extreme like converting to Unicode or rich-text
format, I doubt it makes any difference at all to the body.
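
A minimal Python sketch of that point, using only the standard library
rather than SpamAssassin itself: the same body text sent as base64 and as
quoted-printable decodes to identical text, so a tokenizer that runs after
decoding sees the same words. The helpers build_message, decoded_body, and
tokens are hypothetical names, and the whitespace split stands in for a
real tokenizer.

```python
import base64
import quopri
from email import message_from_string


def build_message(text, encoding):
    """Wrap text in a minimal message using the given transfer encoding."""
    raw = text.encode("utf-8")
    if encoding == "base64":
        body = base64.encodebytes(raw).decode("ascii")
    elif encoding == "quoted-printable":
        body = quopri.encodestring(raw).decode("ascii")
    else:
        raise ValueError(f"unsupported encoding: {encoding}")
    return (
        "Content-Type: text/plain; charset=utf-8\n"
        f"Content-Transfer-Encoding: {encoding}\n"
        "\n"
        f"{body}"
    )


def decoded_body(raw_message):
    """Return the body text with the transfer encoding undone."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload(decode=True)  # bytes after base64/QP decoding
    charset = msg.get_content_charset() or "utf-8"
    return payload.decode(charset)


def tokens(text):
    """Stand-in tokenizer: lowercase, split on whitespace."""
    return sorted(set(text.lower().split()))


text = "Price = 50% off, café special today\n"
as_base64 = build_message(text, "base64")
as_qp = build_message(text, "quoted-printable")

# The raw messages differ, but the decoded token lists do not.
print(as_base64 != as_qp)
print(tokens(decoded_body(as_base64)) == tokens(decoded_body(as_qp)))
```

This says nothing about a gateway that changes the decoded text itself,
for example by converting HTML to rich text.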

I don't know how Exchange mangles headers, but I'm sceptical it has
much effect, if any. You'd really need to look at the details.

Extra headers added after processing shouldn't be a problem, and it's
easy enough to strip them if you're paranoid.
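
Stripping such headers before relearning could look like the following
Python sketch. The prefixes in ADDED_PREFIXES and the function name
strip_added_headers are assumptions for illustration; which headers a
given Exchange setup really adds would have to be checked on real mail.

```python
from email import message_from_string

# Hypothetical prefixes of headers added after SpamAssassin processed the mail.
ADDED_PREFIXES = ("x-ms-exchange-", "x-example-gateway-")


def strip_added_headers(raw_message, prefixes=ADDED_PREFIXES):
    """Return the message text without headers matching the given prefixes."""
    msg = message_from_string(raw_message)
    for name in set(msg.keys()):
        if name.lower().startswith(prefixes):
            del msg[name]  # deletes every occurrence of this header
    return msg.as_string()


raw = (
    "Subject: hello\n"
    "X-MS-Exchange-Organization-SCL: 1\n"
    "X-Example-Gateway-Id: abc\n"
    "\n"
    "body text\n"
)
print(strip_added_headers(raw))
```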





Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-22 Thread Kevin A. McGrail
Yes and no. What you have missed is that David F Skoll is a key author of 
MIMEDefang. They also publish a great COTS solution for email filtering called 
CanIT. So his plugin is part of the commercial product.

However, his idea on tokens is an elegant one. To extract 
them, I planned on using SA's existing Bayesian framework and deliver them to a 
header. What is done with the header from there is a spam/ham delivery issue 
but at best sa-learn could use it. Lots of security and privacy issues to deal 
with but I am just in the idea phase.
Regards,
KAM

Per-Erik Persson wrote:

Since we are on the subject of adding "magic links" to email headers to
make it easier for non-technical staff to report spam,
I don't understand how to extract the tokenized data needed to represent
the specific email.
Have I missed some plugin that everyone else knows about?

The rest of the problem seems trivial if you already have an
infrastructure deployed with SSO and a decent web interface.

The setup with Postfix facing the world, SpamAssassin sanitizing it,
and Exchange storing it is something that I see quite often nowadays.



Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-22 Thread David F. Skoll
On Thu, 22 Mar 2012 07:51:07 +0100
Per-Erik Persson  wrote:

> Since we are on the subject of adding "magic links" to email headers to
> make it easier for non-technical staff to report spam,
> I don't understand how to extract the tokenized data needed to
> represent the specific email.

We have an entire infrastructure built to support this.  It is proprietary,
however, and is not easily implemented as a SpamAssassin plugin, though
the basic idea probably could be.

Regards,

David.


Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-22 Thread Robert Schetterer
On 22.03.2012 09:15, xTrade Assessory wrote:
> Robert Schetterer wrote:
>>>
>>
>> However, I have a ham/spam learning transport address, and
>> almost no users forward anything to it. No wonder
>> the false positive rate is nearly zero.
>>
>> In fact, there are systems with webmail GUIs for classifying
>> spam, e.g. AOL. Reality shows that users don't use them very wisely;
>> perhaps the "spam" and "delete" buttons are too close together, or the
>> users are simply clueless.
>>
>> My conclusion: don't waste your time implementing complicated mechanisms
>> for ham/spam training. Work on the tagging/rejecting side to reduce
>> the false positive rate.
>>
> 
> Hi
> 
> I could not agree more with that ... in the end, sooner or later, you
> discover you have spent time on something with erroneous or no return at
> all ... not to mention the support overhead these extra mailboxes
> will create.
> 
> Besides the obvious points you already made, it is still highly
> questionable whether a "user" is able to classify reliably.
> 
> Also, IMO, most spam hits obvious account names/combinations, and most
> users are not affected by the problem unless their addresses are
> standard_names@
> 
> For years now I have not cared so much and just run a pretty standard
> SpamAssassin, but I query the maillog for delivery attempts to nonexistent
> accounts. First I slow the sender down after 2 invalid destination
> addresses, but I also record the sender details and block them for three
> months from within the access file (I run sendmail everywhere).
> 
> That works very smoothly for me, still with almost zero CPU overhead for
> spamd, and it is practical, easy and cheap. The result: before, I got
> 50 spams per day on certain accounts; now 2, maybe 3. Those numbers
> are for mail servers with more than 50,000 accounts each.
> 
> Hans
> 

Something like
http://mailfud.org/postpals/ may be helpful too at some sites.
I have heard amavis has a similar mechanism.

However, there is a lot a postmaster can do before trusting users'
spam/ham classification (e.g. there is the SpamAssassin blacklist and
whitelist feature). But if somebody does so, don't trust your users
completely. User training should only ever be one tag among others; for
example, it may raise the Bayes points,
but it should not push a message across the spam/ham border in a single
tagging step.

(This is for ISP-style mail systems. The policy might be different for
dedicated company mail etc., but it's still complicated there too.)

But as reality shows, e.g. at AOL, their user abuse spam reporting program
is totally broken. I never had a "true spam alarm" from their users for
mail sent from my systems,
and on the other hand the AOL mail systems themselves rank very high for
trying to deliver spam to my servers.

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-22 Thread Per-Erik Persson
On 03/22/2012 07:59 AM, Robert Schetterer wrote:
> On 22.03.2012 07:51, Per-Erik Persson wrote:
>> Since we are on the subject of adding "magic links" to email headers to
>> make it easier for non-technical staff to report spam,
>> I don't understand how to extract the tokenized data needed to represent
>> the specific email.
>> Have I missed some plugin that everyone else knows about?
>>
>> The rest of the problem seems trivial if you already have an
>> infrastructure deployed with SSO and a decent web interface.
>>
>> The setup with Postfix facing the world, SpamAssassin sanitizing it,
>> and Exchange storing it is something that I see quite often nowadays.
>>
>>
>>
> However, I have a ham/spam learning transport address, and
> almost no users forward anything to it. No wonder
> the false positive rate is nearly zero.
>
> In fact, there are systems with webmail GUIs for classifying
> spam, e.g. AOL. Reality shows that users don't use them very wisely;
> perhaps the "spam" and "delete" buttons are too close together, or the
> users are simply clueless.
>
> My conclusion: don't waste your time implementing complicated mechanisms
> for ham/spam training. Work on the tagging/rejecting side to reduce
> the false positive rate.
>
You are right about how the average user works. (Oh, I am tired of the
mailing list, let's classify it as spam since I don't know how to
unsubscribe.)
However, a helpdesk and similar functions often get complaints about spam
getting through, and it is virtually impossible to make most users cut and
paste a header.
But pasting a single field from the header and sending it to the right
helpdesk queue or a web interface is probably just the right amount of work.
I have a personal toolbox to sieve out the phishing emails (and false
positives) and would like to make a closed loop for feeding
SpamAssassin without having access to the original emails.
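
One way the "single field from the header" idea might be sketched in
Python is below. Everything here is an assumption for illustration: the
header name X-Example-Bayes-Tokens, the whitespace tokenizer, and the
truncated SHA-1 digests are not an existing SpamAssassin feature or
plugin, and the sketch ignores header folding as well as the security and
privacy questions raised in this thread.

```python
import hashlib

HEADER = "X-Example-Bayes-Tokens"  # hypothetical header name


def token_digests(text, limit=50):
    """Hash up to `limit` distinct words into short hex digests."""
    words = sorted(set(text.lower().split()))[:limit]
    return [hashlib.sha1(w.encode("utf-8")).hexdigest()[:10] for w in words]


def build_header(text):
    """Build the header line a filter could add at delivery time."""
    return f"{HEADER}: {','.join(token_digests(text))}"


def parse_header(line):
    """Recover the digests from a header line pasted by a user."""
    name, _, value = line.partition(":")
    if name.strip().lower() != HEADER.lower():
        raise ValueError("not a token header")
    return [d for d in value.strip().split(",") if d]


line = build_header("cheap meds cheap offer")
print(line)
print(parse_header(line) == token_digests("cheap meds cheap offer"))
```

A learner on the receiving side would then work from the digests alone,
which only helps if its own database is keyed by the same digests.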
 


Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-22 Thread xTrade Assessory
Robert Schetterer wrote:
>>
> 
> However, I have a ham/spam learning transport address, and
> almost no users forward anything to it. No wonder
> the false positive rate is nearly zero.
> 
> In fact, there are systems with webmail GUIs for classifying
> spam, e.g. AOL. Reality shows that users don't use them very wisely;
> perhaps the "spam" and "delete" buttons are too close together, or the
> users are simply clueless.
> 
> My conclusion: don't waste your time implementing complicated mechanisms
> for ham/spam training. Work on the tagging/rejecting side to reduce
> the false positive rate.
> 

Hi

I could not agree more with that ... in the end, sooner or later, you
discover you have spent time on something with erroneous or no return at
all ... not to mention the support overhead these extra mailboxes
will create.

Besides the obvious points you already made, it is still highly
questionable whether a "user" is able to classify reliably.

Also, IMO, most spam hits obvious account names/combinations, and most
users are not affected by the problem unless their addresses are
standard_names@

For years now I have not cared so much and just run a pretty standard
SpamAssassin, but I query the maillog for delivery attempts to nonexistent
accounts. First I slow the sender down after 2 invalid destination
addresses, but I also record the sender details and block them for three
months from within the access file (I run sendmail everywhere).

That works very smoothly for me, still with almost zero CPU overhead for
spamd, and it is practical, easy and cheap. The result: before, I got
50 spams per day on certain accounts; now 2, maybe 3. Those numbers
are for mail servers with more than 50,000 accounts each.
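
The log-based blocking described above could be sketched roughly as
follows in Python. The log line pattern, the threshold of 2 used for
blocking, the 90-day expiry, and the function names offenders and
access_entries are assumptions for illustration; real sendmail log lines
and access map handling vary by configuration, and the throttling step is
not covered.

```python
import re
from collections import Counter
from datetime import date, timedelta

# Assumed log shape: "... relay=host [ip], reject=... User unknown"
UNKNOWN_USER = re.compile(
    r"relay=.*?\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\].*User unknown"
)


def offenders(log_lines, threshold=2):
    """Return IPs with at least `threshold` unknown-recipient rejections."""
    counts = Counter()
    for line in log_lines:
        match = UNKNOWN_USER.search(line)
        if match:
            counts[match.group("ip")] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def access_entries(ips, today, days=90):
    """Build access-file lines, each preceded by an expiry comment."""
    expires = (today + timedelta(days=days)).isoformat()
    lines = []
    for ip in ips:
        lines.append(f"# expires {expires}")
        lines.append(f"Connect:{ip}\tREJECT")
    return lines


sample_log = [
    "Mar 22 08:00:01 mx sendmail[1]: ruleset=check_rcpt, relay=a.example.net [192.0.2.10], reject=550 5.1.1 <x@example.com>... User unknown",
    "Mar 22 08:00:05 mx sendmail[2]: ruleset=check_rcpt, relay=a.example.net [192.0.2.10], reject=550 5.1.1 <y@example.com>... User unknown",
    "Mar 22 08:01:00 mx sendmail[3]: ruleset=check_rcpt, relay=b.example.net [198.51.100.7], reject=550 5.1.1 <z@example.com>... User unknown",
    "Mar 22 08:02:00 mx sendmail[4]: to=<hans@example.com>, relay=local, stat=Sent",
]

for entry in access_entries(offenders(sample_log), date(2012, 3, 22)):
    print(entry)
```

Removing entries once their expiry date has passed, and rebuilding the
access map afterwards, would still need a separate step.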

Hans

-- 
XTrade Assessory
International Facilitator
BR - US - CA - DE - GB - RU - UK
+55 (11) 4249.
http://xtrade.matik.com.br


Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-22 Thread Robert Schetterer
On 22.03.2012 07:51, Per-Erik Persson wrote:
> Since we are on the subject of adding "magic links" to email headers to
> make it easier for non-technical staff to report spam,
> I don't understand how to extract the tokenized data needed to represent
> the specific email.
> Have I missed some plugin that everyone else knows about?
> 
> The rest of the problem seems trivial if you already have an
> infrastructure deployed with SSO and a decent web interface.
> 
> The setup with Postfix facing the world, SpamAssassin sanitizing it,
> and Exchange storing it is something that I see quite often nowadays.
> 
> 
> 

However, I have a ham/spam learning transport address, and
almost no users forward anything to it. No wonder
the false positive rate is nearly zero.

In fact, there are systems with webmail GUIs for classifying
spam, e.g. AOL. Reality shows that users don't use them very wisely;
perhaps the "spam" and "delete" buttons are too close together, or the
users are simply clueless.

My conclusion: don't waste your time implementing complicated mechanisms
for ham/spam training. Work on the tagging/rejecting side to reduce
the false positive rate.

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

2012-03-21 Thread Per-Erik Persson
Since we are on the subject of adding "magic links" to email headers to
make it easier for non-technical staff to report spam,
I don't understand how to extract the tokenized data needed to represent
the specific email.
Have I missed some plugin that everyone else knows about?

The rest of the problem seems trivial if you already have an
infrastructure deployed with SSO and a decent web interface.

The setup with Postfix facing the world, SpamAssassin sanitizing it,
and Exchange storing it is something that I see quite often nowadays.