Re: BAYES_00 BODY. Negative score?

2023-02-14 Thread Alex
Hi,

> > *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> > *  [score: 0.0000]
>
> This indicates a mistrained database, which means you have trained too
> many spams or spam-like messages (commercial messages) as ham.
>
> Proper training of spams should help. Just keep your spam (and optionally
> ham) corpora for retraining in case you ever drop the database.
>
> I also recommend abstaining from training commercial mail (notices from
> e-shops, companies you have done business with, etc.) as ham, unless they
> generate a BAYES_999 score and you want it lower.  I often train them as
> spam so those give an uncertain BAYES_50 result.
>

Is there any way to distinguish a legitimate newsletter from a spam
newsletter?

In other words, if I train emails from Forbes or Washington Post as ham,
then train similar newsletter emails from other, more suspect providers
as spam, will Bayes still be able to distinguish Forbes and WP as ham?

The problem is that if I avoid training newsletters or bulk email
altogether, then I'm also left with spam newsletters still only hitting
BAYES_50.

I'm actually in a situation now where Forbes and WP newsletters are being
marked as spam, so I'm considering retraining, but I'm wondering what
approach or best practices I should follow.

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      97002          0  non-token data: nspam
0.000          0      90173          0  non-token data: nham
0.000          0   11581565          0  non-token data: ntokens
0.000          0 1054224948          0  non-token data: oldest atime
0.000          0 1676433889          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1648164856          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
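
A dump like the one above can be summarized with a few lines of Python. This is only a sketch: the helper name `parse_dump_magic` is made up for illustration, the sample text is copied from the dump in this message, and it assumes the value of each "non-token data" line sits in the third column, as it does in that output.

```python
from datetime import datetime, timezone

# Sample copied from the dump in this message (wrapped lines rejoined).
SAMPLE = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0  97002  0  non-token data: nspam
0.000  0  90173  0  non-token data: nham
0.000  0  11581565  0  non-token data: ntokens
0.000  0  1054224948  0  non-token data: oldest atime
0.000  0  1676433889  0  non-token data: newest atime
"""


def parse_dump_magic(text):
    """Return {label: value} for every 'non-token data' line (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        fields, sep, label = line.partition("non-token data:")
        parts = fields.split()
        # Assumption: the value is the third whitespace-separated column.
        if sep and len(parts) >= 3:
            stats[label.strip()] = int(parts[2])
    return stats


stats = parse_dump_magic(SAMPLE)

# Share of trained messages that were learned as spam.
spam_share = stats["nspam"] / (stats["nspam"] + stats["nham"])

# atime values are Unix timestamps; convert the oldest one to a UTC date.
oldest = datetime.fromtimestamp(stats["oldest atime"], tz=timezone.utc)

print(f"spam share of trained messages: {spam_share:.3f}")
print(f"oldest token access time: {oldest:%Y-%m-%d}")
```

The spam share only describes how many messages were trained on each side; it says nothing about whether individual messages were trained into the right class.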


Re: BAYES_00 BODY. Negative score?

2023-02-14 Thread joe a
Please let this sit for a while. I've discovered a fundamental issue 
with my scheme of feeding messages to BAYES.  Unfortunately I was 
remiss, apparently, in setting up logging for some bits, so I have no 
idea how long this has been failing.


Sorry for the clutter.

joe a.



Re: BAYES_00 BODY. Negative score?

2023-02-14 Thread joe a

On 2/14/2023 2:56 AM, Matus UHLAR - fantomas wrote:
> On 13.02.23 17:42, joe a wrote:
> > Have some annoying SPAM that consistently shows a negative score on
> > BAYES.  Is this the default scoring, or is it influenced by BAYES in
> > some way?
> >
> > *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> > *  [score: 0.0000]
>
> This indicates a mistrained database, which means you have trained too
> many spams or spam-like messages (commercial messages) as ham.
>
> Proper training of spams should help. Just keep your spam (and
> optionally ham) corpora for retraining in case you ever drop the
> database.
>
> I also recommend abstaining from training commercial mail (notices from
> e-shops, companies you have done business with, etc.) as ham, unless
> they generate a BAYES_999 score and you want it lower.  I often train
> them as spam so those give an uncertain BAYES_50 result.
>
> Those mails resemble spam too much to be used for training.
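
The effect described in the quoted advice, where training commercial mail on both sides pulls it toward an uncertain result, can be illustrated with a toy per-token calculation. This is a deliberately simplified sketch, not SpamAssassin's implementation: the real classifier applies additional smoothing and combines many tokens, which is not modelled here. The function name and all counts are hypothetical.

```python
def token_spam_probability(spam_hits, ham_hits, nspam, nham):
    """Simplified per-token spam probability from training counts.

    spam_hits / ham_hits: trained spam / ham messages containing the token.
    nspam / nham: total trained spam / ham messages.
    """
    spam_freq = spam_hits / nspam
    ham_freq = ham_hits / nham
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical token typical of newsletters, e.g. "unsubscribe".
# Case 1: newsletters were only ever trained as ham.
only_ham = token_spam_probability(spam_hits=0, ham_hits=300, nspam=1000, nham=1000)

# Case 2: the same kind of mail was also trained as spam, equally often.
both_sides = token_spam_probability(spam_hits=300, ham_hits=300, nspam=1000, nham=1000)

print(f"trained as ham only:     {only_ham:.2f}")
print(f"trained on both classes: {both_sides:.2f}")
```

In this toy model a token seen only in ham pushes a message toward the low end of the scale, while a token seen equally in both classes carries no signal either way.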



All,

The term "proper training" has always seemed a bit problematic to me. 
That aside, I am experiencing an error when attempting:


sa-learn -D --spam /var/mail/spamd/Cabinet.saved-spam

The last line shows:

***
Learned tokens from 0 message(s) (1 message(s) examined)
ERROR: the Bayes learn function returned an error, please re-run with -D for more information at /usr/bin/sa-learn line 500.

***

This may be permissions related.  However, there seem to be some 
errors or warnings at the beginning, starting with:


***
Feb 14 17:26:14.956 [2855] dbg: plugin: loading Mail::SpamAssassin::Plugin::Razor2 from @INC
Feb 14 17:26:14.959 [2855] dbg: razor2: razor2 is not available
Feb 14 17:26:14.959 [2855] dbg: plugin: loading Mail::SpamAssassin::Plugin::SpamCop from @INC
plugin: failed to parse plugin (from @INC): Can't locate Mail/SpamAssassin/Plugin/SpamCop.pm: lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44) line 1.

***

While this also suggests a permissions issue, the only place I find 
SpamCop.pm (even as root) is at: 
"/usr/lib/perl5/vendor_perl/5.26.1/Mail/SpamAssassin/Plugin/SpamCop.pm", 
which is not in the path sa-learn concocted when invoked.
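
One way to narrow this down is to check, for each candidate search directory, whether the module is missing or present but unreadable by the current user. The sketch below does that in Python. The function name `probe_module` is hypothetical, and the directory list has to be supplied by hand (for example from the debug output); the script does not query Perl's @INC itself. The path in the log appears to be relative ("lib/..."), so relative entries are resolved against the working directory here, on the assumption that this matters for the failure.

```python
import os

# Module path as it appears in the debug output.
MODULE = os.path.join("Mail", "SpamAssassin", "Plugin", "SpamCop.pm")


def probe_module(inc_dirs, module=MODULE, cwd=None):
    """Classify each search directory as 'readable', 'missing',
    'unreadable' or 'error' for the given module (hypothetical helper)."""
    cwd = cwd or os.getcwd()
    report = {}
    for entry in inc_dirs:
        # Relative entries such as "lib" depend on the working directory.
        base = entry if os.path.isabs(entry) else os.path.join(cwd, entry)
        path = os.path.join(base, module)
        try:
            with open(path, "rb"):
                report[entry] = "readable"
        except (FileNotFoundError, NotADirectoryError):
            report[entry] = "missing"
        except PermissionError:
            report[entry] = "unreadable"
        except OSError:
            report[entry] = "error"
    return report


if __name__ == "__main__":
    # Example directories: the relative entry from the log and the
    # location reported in this message.
    candidates = ["lib", "/usr/lib/perl5/vendor_perl/5.26.1"]
    for entry, status in probe_module(candidates).items():
        print(f"{entry}: {status}")
```

Running it as the same user and from the same working directory as the failing sa-learn invocation is what makes the result meaningful; run as root, it would normally report everything that exists as readable.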


Sorry if the formatting is weird or if this is useless information.


Re: Seeing big (>1MB) spam

2023-02-14 Thread Loren Wilton

> I started seeing some spam today in the 1-1.5 MB range.


It's been over a year now, but for a while I was getting a huge number of 
spams that were either 1143 KB or 3831 KB.
The 3831 KB variant used the same obfuscation payload as the 1143 KB spams; 
they just put it in twice in a row.


   Loren



Seeing big (>1MB) spam

2023-02-14 Thread Kenneth Porter
I started seeing some spam today in the 1-1.5 MB range. I was surprised to 
see obvious spam in my Inbox, but discovered it had no SA headers. It 
turned out that my procmailrc rule was only scanning messages smaller than 
700k. I boosted it to 2MB:


:0fw
* < 2000000
| /usr/bin/spamc -s 2000000
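
The recipe above has two size gates: the procmail condition, which matches only messages shorter than the given number of bytes, and spamc's -s option, which sets the maximum size passed on for scanning. A small Python sketch of that logic shows which sizes slip through. The function name is hypothetical; the old "700k" limit is taken as 700,000 bytes, 2MB as 2,000,000 bytes, and KB as 1024 bytes, all of which are assumptions, as is the exact behavior at the boundary values.

```python
def is_scanned(size_bytes, procmail_limit, spamc_max_size):
    """Return True if a message of this size passes both size gates."""
    passes_procmail = size_bytes < procmail_limit   # "* < N" condition
    passes_spamc = size_bytes <= spamc_max_size     # "spamc -s N" limit
    return passes_procmail and passes_spamc


OLD_LIMIT = 700_000      # assumed value of the previous "700k" limit
NEW_LIMIT = 2_000_000    # assumed value of the new "2MB" limit

# Message sizes in bytes: one in the 1-1.5 MB range, plus the two sizes
# reported elsewhere in this thread (1143 KB and 3831 KB).
sizes = {
    "1.2 MB": 1_200_000,
    "1143 KB": 1143 * 1024,
    "3831 KB": 3831 * 1024,
}

for label, size in sizes.items():
    old = is_scanned(size, OLD_LIMIT, OLD_LIMIT)
    new = is_scanned(size, NEW_LIMIT, NEW_LIMIT)
    print(f"{label}: scanned with old limit={old}, with new limit={new}")
```

Under these assumptions the 1-1.5 MB messages are covered by the new limit, while anything approaching 4 MB would still bypass scanning.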




[Off-Topic] Blog from KAM on Cybersecurity and Looking for Hecklers for my workshop at InboxExpo

2023-02-14 Thread Kevin A. McGrail
Thanks to Inbox Expo for publishing my 2 Secrets to Streamline
Cybersecurity Projects. You can read it at
https://inboxexpo.com/2-secrets-from-from-kam/ and no registration or
silliness required!

I will also be presenting the keynote and a workshop for InboxExpo.com on
February 27th. While the onsite venue is full, free virtual tickets are
available thanks to Dotdigital. Register today at https://lnkd.in/gATaQaGX.

My workshop will be a facilitated discussion on deliverability, SEO, spam,
marketing, branding, etc.

If you are interested in more content from me and want to learn more about
CRM, Emails, Marketing, Email Security, and using Google Cloud & AI, I will
be working with emailexpert.org to give free classes as part of the 2023
membership drive running now. Join today!

Regards,
KAM
--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171