Re: Bayes auto-learn - not happening

2017-08-10 Thread AM
IMHO you need 100 ham and 100 spam for auto-learning to work. Do manual
learning.

On 08.08.2017 8:20 PM, "Scott Techlist" wrote:

> Centos7
> Postfix 3.2.2
> Amavisd-new 2.11.0
> Spamassassin 3.4.0
> Site-wide configuration
>
> This is a new box and I've configured some conservative values for
> auto-learn.  I've enabled it properly AFAIK, but I can't see any sign of it
> working.
>
> I have these set in local.cf
> use_bayes   1
> bayes_auto_learn 1
> bayes_auto_learn_threshold_nonspam -1.7
> bayes_auto_learn_threshold_spam 10.0
> # this is a filename prefix, not a directory per se
> bayes_path  /etc/mail/bayes/bayes
> bayes_file_mode 0666
>
> -bayes prep 
> Start fresh for troubleshooting:
> su amavis -c 'sa-learn --clear'
>
> Add one spam manually and check tokens:
>
> [root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0  1  0  non-token data: nspam
> 0.000  0  0  0  non-token data: nham
> 0.000  0   2157  0  non-token data: ntokens
>
> -amavisd prep
>
> Restart amavisd/spamassassin just to be sure all configs are read.
>
> --- ready to process -
>
> The next high scoring spam arrives, it was sent to my spam mailbox.  It
> did NOT autolearn.  Nor did several others.
>
> To troubleshoot, I took one that did not autolearn, and learned it
> manually by:
> su amavis -c 'sa-learn -D --spam --showdots --mbox /home/mail/onespam'
>
> even though this message was slightly over the size limit, the log says it
> learned anyway:
> -D log snippet:
> -
> Aug  8 12:37:27.216 [13198] info: archive-iterator: skipping large
> message: 858 lines, 262203 bytes, limit 262144 bytes
>
> Learned tokens from 1 message(s) (1 message(s) examined)
> -
>
> Verified it learned:
>
> [root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0  2  0  non-token data: nspam
>
>
> Partial header from that message:
>
> X-Spam-Flag: YES
> X-Spam-Score: 17.374
> X-Spam-Level: *
> X-Spam-Status: Yes, score=17.374 tag=- tag2=5 kill=6.31
> tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001,
> RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558,
> RCVD_IN_SORBS_WEB=1.5,
> RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497,
> URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5,
> URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no autolearn_force=no
>
> Why aren't my spams getting auto-learned?  If sa-learn "ate" it, shouldn't
> auto-learn have eaten it too?
>
> I know there is a default threshold of 200 messages before Bayes starts
> tagging anything, but my understanding is that it should still learn without issue.
>
> Can't figure out what's wrong...

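For reference, the SpamAssassin documentation for bayes_auto_learn_threshold_spam says that learning a message as spam also requires at least 3 points from header rules and at least 3 points from body rules. A high total is therefore not enough by itself. The Python sketch below checks only those three conditions against the X-Spam-Status header quoted above. It is a simplification and not SpamAssassin's code. Which rules count as "body" is guessed from the URIBL_ name prefix. Real SpamAssassin also recomputes the score with a non-Bayes score set and ignores some rules (for example those flagged noautolearn), so its numbers can differ.

```python
# Simplified, hypothetical model of SpamAssassin's documented spam
# auto-learn conditions. NOT the real implementation: SpamAssassin uses its
# own internal rule types and recomputes the score with a non-Bayes score set.
import re

# RULE=score pairs; rule names are upper-case, so tag2=5, kill=6.31 are skipped.
TEST_RE = re.compile(r"\b([A-Z][A-Z0-9_]*)=(-?\d+(?:\.\d+)?)")


def parse_tests(status: str) -> dict:
    """Extract RULE=score pairs from an amavisd-style X-Spam-Status header."""
    return {name: float(score) for name, score in TEST_RE.findall(status)}


def autolearn_spam_check(tests: dict, threshold: float = 12.0,
                         min_points: float = 3.0,
                         body_prefixes: tuple = ("URIBL_",)) -> dict:
    """Check total >= threshold plus at least min_points from header rules
    and from body rules. 12.0 is SpamAssassin's documented default threshold.

    Assumption: rules whose names start with body_prefixes are body rules;
    every other rule is treated as a header rule.
    """
    body = sum(s for n, s in tests.items() if n.startswith(body_prefixes))
    header = sum(s for n, s in tests.items() if not n.startswith(body_prefixes))
    total = header + body
    return {
        "total": round(total, 4),
        "header": round(header, 4),
        "body": round(body, 4),
        "learn": total >= threshold and header >= min_points and body >= min_points,
    }


# The X-Spam-Status header quoted in the message above, joined onto one line.
STATUS = (
    "Yes, score=17.374 tag=- tag2=5 kill=6.31 "
    "tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001, "
    "RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558, RCVD_IN_SORBS_WEB=1.5, "
    "RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497, "
    "URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5, "
    "URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no autolearn_force=no"
)

if __name__ == "__main__":
    # 10.0 is the bayes_auto_learn_threshold_spam value from the local.cf above.
    print(autolearn_spam_check(parse_tests(STATUS), threshold=10.0))
```

With the quoted scores, the model finds about 10.48 header points and 6.89 body points. It would therefore expect the message to be learned. That points to something the model leaves out. SpamAssassin's own debug output for the learn area (spamassassin -D learn) should show the score it actually used for the auto-learn decision.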

Re: Relay Country Plugin GEOIP issue - solved

2015-10-16 Thread am

On 2015-10-14 18:31, Mark Martinec wrote:


Check your database:

  $ spamassassin --lint -D metadata 2>&1 | fgrep RelayCountry

should yield something like:

Oct 15 01:26:45.584 [78315] dbg: metadata: RelayCountry:
  Using database: Geo::IP GEO-106FREE 20151006 Build 1
  Copyright (c) 2015 MaxMind Inc All Rights Reserved



We see that exact response, but we are still getting the warning.



Fresh files can be downloaded from:

  http://dev.maxmind.com/geoip/legacy/geolite/

unzip them and place them in their expected location,
typically /usr/local/share/GeoIP/ .
You need files GeoIP.dat and GeoIPv6.dat there.

  Mark


We tried this with no luck. Then we discovered we were patching the bug 
in the wrong location.


Wrong Location (System Location): /usr/local/share/perl5/Geo/IP.pm
Right Location (cPanel Location):
/usr/local/cpanel/3rdparty/perl/514/lib64/perl5/cpanel_lib/Geo/IP.pm
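The fix only took effect after the copy of Geo/IP.pm used by cPanel's Perl was patched. To avoid that trap, ask each Perl interpreter which file it actually loads, using Perl's %INC hash. The minimal Python sketch below wraps perl -M<module> -e for that. The cPanel interpreter path in the usage block is an assumption. The reliable source is the #! line of whatever script runs SpamAssassin on your box.

```python
import subprocess


def inc_key(module: str) -> str:
    """Perl's %INC key for a module name, e.g. Geo::IP -> Geo/IP.pm."""
    return module.replace("::", "/") + ".pm"


def loaded_path(module: str, perl: str = "perl") -> str:
    """Ask a given perl which file it loads for `module`.

    Runs: perl -M<module> -e 'print $INC{$ARGV[0]}' <key>
    Raises CalledProcessError if that perl cannot load the module.
    """
    result = subprocess.run(
        [perl, f"-M{module}", "-e", "print $INC{$ARGV[0]}", inc_key(module)],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Second path is an assumed cPanel location; check your own #! lines.
    for perl in ("/usr/bin/perl", "/usr/local/cpanel/3rdparty/bin/perl"):
        try:
            print(perl, "->", loaded_path("Geo::IP", perl))
        except (OSError, subprocess.CalledProcessError) as exc:
            print(perl, "-> not loadable:", exc)
```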


Now it's working as it's supposed to.  Thanks for your help.

Allen
a...@satester.com



Re: Relay Country Plugin GEOIP issue

2015-10-14 Thread am
Hi, I cannot get the fix below to work.  Does the Geo::IP package need 
to be recompiled for the change to go into effect? If so, any tips on 
how to recompile would be greatly appreciated.


Allen
a...@satester.com


On 2015-10-14 12:04, George Ficzeri wrote:

This?

https://github.com/maxmind/geoip-api-perl/pull/22

If you click 'Files changed' you'll see the path, and see the fix.


On 10/14/15 11:49 AM, a...@satester.com wrote:

Hi,

We activated the relay country plugin yesterday. As part of the process
we did a yum install perl-Geo-IP.  Now we get the following warning when
we lint or run sa-learn.

Use of uninitialized value $hasStructureInfo in numeric eq (==) at (eval 31) line 5520

I have no idea which file line 5520 is in, and I am not finding much on
Google, so I'm hoping someone here has a clue. Thanks.


Allen
a...@satester.com





Re: Relay Country Plugin GEOIP issue

2015-10-14 Thread am
Thanks for the reply, George.  We tried that link yesterday and made the 
change as described, with no results. We restarted MailScanner but 
nothing else. Maybe I need to restart our MTA or another daemon.


Allen
a...@bandwise.com



On 2015-10-14 12:04, George Ficzeri wrote:

This?

https://github.com/maxmind/geoip-api-perl/pull/22

If you click 'Files changed' you'll see the path, and see the fix.


On 10/14/15 11:49 AM, a...@satester.com wrote:

Hi,

We activated the relay country plugin yesterday. As part of the process
we did a yum install perl-Geo-IP.  Now we get the following warning when
we lint or run sa-learn.

Use of uninitialized value $hasStructureInfo in numeric eq (==) at (eval 31) line 5520

I have no idea which file line 5520 is in, and I am not finding much on
Google, so I'm hoping someone here has a clue. Thanks.


Allen
a...@satester.com



Relay Country Plugin GEOIP issue

2015-10-14 Thread am

Hi,

We activated the relay country plugin yesterday. As part of the process 
we did a yum install perl-Geo-IP.  Now we get the following warning when 
we lint or run sa-learn.


Use of uninitialized value $hasStructureInfo in numeric eq (==) at (eval 31) line 5520


I have no idea which file line 5520 is in, and I am not finding much on 
Google, so I'm hoping someone here has a clue. Thanks.



Allen
a...@satester.com



satester.com update

2015-08-04 Thread am
I have been working on satester.com in my spare time this past week, and 
I recently put up a new version (v1.02). I think I have finally gotten it 
to the point where it is usable with a large ruleset (it used to throw 
errors when large .cf files and other content were pasted in). The results 
now also show the rule name. I hope you and others find this tool useful 
in certain situations.


satester.com


Allen Marsalis
a...@satester.com



Re: SA Rule Tester/Checker

2015-07-18 Thread am

On 2015-07-18 04:54, Martin Gregorie wrote:


There are lots of possibilities. I test using a big (and growing) spam
collection, which I keep so I can regression test my current rule set.
That's quite crude: if everything in the collection is recognised as
spam, nothing gets flagged up during the test run and that's a pass.



Thanks for the reply. Having a test server seems like a good idea. I 
also like the idea of not storing a spam collection on a more expensive 
production server.



My setup is fairly simple: I run spamc/spamd on a development box and
have a collection of bash scripts that can pass one or more spam
samples (piped into spamc) through spamd for testing. I maintain local
copies of all cf files and a set of bash scripts that can:


I keep local copies on a small production server and run a bash script 
that moves our .cf files (after linting) to several servers and reloads 
the rules. It's very simple but does the trick. On a related note, I 
wish I could pass a message from the quarantine. That is, I'm trying to 
figure out a way to pass a raw message to my little satester for visual 
analysis. We use mailscanner/mailwatch and I would like to be able to 
just click once on a message in mailwatch and be looking at it in 
satester. To me, that would be cool.



- lint check the cf file collection locally by calling spamassassin
- start/stop/status check the local spamd (it's stopped except when
  testing rule changes)
- move cf files to the local /etc/mail/spamassassin and restart spamd
- run selected messages through spamc/spamd showing SA generated
  headers or
- run selected messages through spamc/spamd showing whether the rule
  under test fires
- run a full regression test that displays the messages that AREN'T
  flagged as spam
- load the current cf file collection into my production mail server
  and restart spamd.

I don't pretend this is the best approach, but it works for me and has
also been used to test, develop and control the installation of SA
plugins.
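The regression run described above can be sketched in a few lines. The Python version below is a minimal example in that spirit, not Martin's actual scripts. It assumes spamd is running locally and that each saved sample is one message per file. It pipes every sample through spamc -c, which prints a score/threshold summary and exits with status 1 for spam. It then lists only the samples that were not flagged, so an empty report is a pass. The default directory name is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def parse_check(output: str) -> tuple[float, float]:
    """Parse the 'score/threshold' summary that `spamc -c` prints."""
    score, threshold = output.strip().split("/")
    return float(score), float(threshold)


def missed_spam(sample_dir: str) -> list[str]:
    """Run every file in sample_dir through spamc and report non-spam verdicts."""
    missed = []
    for path in sorted(Path(sample_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            # With -c, spamc exits with status 1 when the message is spam.
            proc = subprocess.run(["spamc", "-c"], stdin=fh, capture_output=True)
        if proc.returncode != 1:
            summary = proc.stdout.decode(errors="replace")
            try:
                score, threshold = parse_check(summary)
                missed.append(f"{path.name}: {score}/{threshold}")
            except ValueError:
                missed.append(f"{path.name}: no usable result from spamc")
    return missed


if __name__ == "__main__":
    report = missed_spam(sys.argv[1] if len(sys.argv) > 1 else "spam-samples")
    print("\n".join(report) if report else "pass: every sample was flagged as spam")
    sys.exit(1 if report else 0)
```

A result of 0.0/0.0 for every sample usually means spamc could not reach spamd, not that the rules missed.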


I like your approach and thanks again for sharing. It definitely gives 
me some ideas on possible directions to proceed.


Allen



Hopefully this shows that you can run spamc/spamd anywhere, that it
doesn't need to be associated with an MTA for rule development and
testing and that the test setup can be quite simple - certainly no more
complex than you'd use to develop any other single-purpose server.


Martin


Re: SA Rule Tester/Checker

2015-07-17 Thread am

On 2015-07-17 16:49, Kevin A. McGrail wrote:


We use maildir most of the time on our servers. Is that a problem or 
are you referring to an mbox file on a client machine? I've never run 
spamassassin on a client before. Sorry, just trying to understand your 
test environment.


I usually am working and researching based on submissions.  Your mail
flow might differ but mutt supports maildir.


Yeah, sorry, I'm totally lost. It happens. Do you test on a production 
server, other (test) server, or local mbox with Mutt as your client? By 
submissions, do you mean customer submissions of FP and FN hits 
submitted by users?


I looked for regression_tests.cf but I couldn't find it in any 
directory on my server.

You likely need an svn checkout of trunk to get it.


Understood, that will probably get me on the right path.



Exactly.  It's in addition to your cf files now and adds that
regression testing layer to see if they do what you expect.  And if
you get a new spam in the same vein, you add the string, modify the
rule and see if it still works on your old patterns and the new, etc.


Wow, I can now obviously see how that can be useful. I will definitely 
give it a try.


There really isn't much out there on the subject that is clear enough 
for someone getting started. I'm not sure I'm Wiki material, but I may 
try to put together a few basic how-tos that interested folks can be 
pointed to on occasions like this.


Anything is better than a vacuum!
All about combatting spam!


Yeppers!


Allen
am -at- sarules.com




Re: SA Rule Tester/Checker

2015-07-17 Thread am

On 2015-07-17 09:27, Kevin A. McGrail wrote:

On 7/16/2015 8:00 PM, Allen Marsalis wrote:


Can you elaborate on the macros any?


Sure.  Mutt is a very powerful little mail client and it's perfect for
me for analysis of mbox files.


We use maildir most of the time on our servers. Is that a problem or are 
you referring to an mbox file on a client machine? I've never run 
spamassassin on a client before. Sorry, just trying to understand your 
test environment.




In your .muttrc file, you can add some macros like ctrl-y (why is
this hitting KAM ;-) ):
macro index \cy "<pipe-message>spamassassin -t -D 2>&1 | grep -e KAM -e Content\\ analysis\n" "Test Message with Spamassassin for KAM Rules"

or prompt for a string to match with ctrl-v

macro index \cv "<pipe-message>spamassassin -t -D 2>&1 | grep -i -e Content\\ analysis -e " "Test Message with Spamassassin for Rules Matching Search"

ctrl-o to look at everything:
macro index \co "<pipe-message>spamassassin -t -D\n" "Test Message with Spamassassin for all Rules"

And when viewing a message I can hit ctrl-v to sha1sum an attachment,
for example:
macro attach \cv "<pipe-entry>sha1sum\n" "sha1sum on an attachment"


Hope this helps and perhaps you can edit our wiki and add any ideas
you find useful for others!
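The same filtering can also be done outside mutt. The Python sketch below runs spamassassin -t -D on a saved message and merges stderr into stdout, as 2>&1 does in the macros. It then keeps only lines mentioning "Content analysis" or a keyword, case-insensitively, like the ctrl-v macro. One difference is that the keyword is matched literally, whereas grep -e treats it as a basic regular expression. The script name in the usage comment is hypothetical.

```python
import re
import subprocess
import sys


def filter_report(text: str, keyword: str) -> list[str]:
    """Keep lines mentioning 'Content analysis' or `keyword`, ignoring case
    (like grep -i -e 'Content analysis' -e KEYWORD, but keyword is literal)."""
    pattern = re.compile("Content analysis|" + re.escape(keyword), re.IGNORECASE)
    return [line for line in text.splitlines() if pattern.search(line)]


def spamassassin_report(message_path: str) -> str:
    """Run `spamassassin -t -D` on one message; stderr is merged like 2>&1."""
    with open(message_path, "rb") as fh:
        proc = subprocess.run(["spamassassin", "-t", "-D"], stdin=fh,
                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return proc.stdout.decode(errors="replace")


if __name__ == "__main__":
    # Usage (hypothetical name): python sa_grep.py /path/to/message KEYWORD
    for line in filter_report(spamassassin_report(sys.argv[1]), sys.argv[2]):
        print(line)
```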


Yes, that helps tremendously. Thanks, Kevin. I was thinking of bash scripts 
with spamassassin -t or something when I saw the word "macro". Knowing 
these were Mutt macros turned on the light bulb. I will definitely play 
around with Mutt and your macros and see what it's all about.




regression_tests.cf is a file you edit with a rule name and strings it
should and should not hit on.

You then run make test and will be told if your rule hits/doesn't hit
as expected.  Off-hand not sure exactly which test does it but once
you figure that out you can do prove -v t/testname.t and run just that
test.
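The Python sketch below illustrates the same idea outside the SpamAssassin test suite. It assumes regression_tests.cf lines look like "test RULE ok <string>" (the rule should hit) and "test RULE fail <string>" (it should not hit). That matches the "test", "ok" and "fail" words described in the reply, but it has not been checked against the trunk file. Patterns come from a plain dict of Python regexes, and the rule name HYPOTHETICAL_FREE_OFFER is made up. The real mechanism is make test and prove, as described above.

```python
import re


def parse_regression_tests(text: str) -> list[tuple[str, bool, str]]:
    """Parse 'test RULE ok|fail sample text' lines (assumed format)."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[0] == "test" and parts[2] in ("ok", "fail"):
            # Assumption: "ok" = the rule should hit, "fail" = it should not.
            cases.append((parts[1], parts[2] == "ok", parts[3]))
    return cases


def run_cases(rules: dict[str, str], cases: list[tuple[str, bool, str]]) -> list[str]:
    """Return one message per case whose outcome differs from the expectation."""
    failures = []
    for name, should_hit, sample in cases:
        if name not in rules:
            failures.append(f"{name}: no pattern supplied")
            continue
        hit = re.search(rules[name], sample) is not None
        if hit != should_hit:
            expected = "a hit" if should_hit else "no hit"
            failures.append(f"{name}: expected {expected} on {sample!r}")
    return failures


# Hypothetical rule and cases, for illustration only.
RULES = {"HYPOTHETICAL_FREE_OFFER": r"(?i)\bfree\s+offer\b"}
TESTS = """
test HYPOTHETICAL_FREE_OFFER ok   Claim your FREE offer today
test HYPOTHETICAL_FREE_OFFER fail A free-range offer
"""

if __name__ == "__main__":
    print(run_cases(RULES, parse_regression_tests(TESTS)) or "all cases passed")
```

When a new spam in the same vein arrives, you add its string as another ok line, adjust the pattern, and rerun to confirm the old cases still pass.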


I looked for regression_tests.cf but I couldn't find it in any directory 
on my server.  Not in /etc/mail/spamassassin/ or anyplace else I looked. 
I did google sample copies of the file and looking at it, I was a little 
confused since it doesn't look like other .cf files I'm familiar with. I 
see the "test", "ok" and "fail" attributes but no regex, just words. I'm 
guessing this is a .cf file that I need to add alongside my other .cf 
files (not part of the installation). I've never run make test before either, 
but it shouldn't be too hard to figure out.




Sorry, not trying to spam my rule tool but just gain insight on where 
and if it is truly useful.


I think it is useful for new rule testers.  I try and automate my
stuff as much as possible and these days I can pick up spam patterns in
my sleep...


LOL. I may not be so good at coding patterns, but I can smell a good 
spam phrase in a heartbeat. I am also very careful to double-check and lint 
rules, etc.; you can't be too careful.


Anyway, a link or two for (basic|convention|intended) rule checking 
might be enough to get me started and more familiar with regular 
methods of checking/debugging.


Sorry, only thing I would be doing is a Google search... I'm not sure
such a document exists though it should.  Perhaps some of the other
people who write rules can share some of their tricks?



There really isn't much out there on the subject that is clear enough 
for someone getting started. I'm not sure I'm Wiki material, but I may 
try to put together a few basic how-tos that interested folks can be 
pointed to on occasions like this.


Thanks again for taking the time to help someone low on the mountain 
get up it. I surely appreciate it.



Allen
am -at- satester.com




Re: SA Rule Tester/Checker

2015-07-16 Thread am

On 2015-07-16 04:53, Kevin A. McGrail wrote:


You might find the regression_tests.cf in the trunk rules/ dir
interesting.  It's a way of giving strings you want to hit/not-hit on
rules and see if it properly hits/doesn't hit as you expect.

I also use mutt and a few macros such as one that runs spamassassin -t
2>&1 with a prompt for a keyword.  Helpful for debugging.



Can you elaborate on the macros any? After searching, I'm still having a 
hard time understanding conventional SA rule checking/debugging methods. 
I've been going my own route so far, but I would like to have a basic 
understanding of how most folks do it. I'm not finding much to get me 
started. (Guides on regression_tests.cf etc.)


Without knowing more at this point, do you think there may be some 
usefulness to a tool that responds to keystrokes/keyphrases in real time 
like satester and Rubular do?  That is why I found the Rubular site so handy 
for checking my regex patterns in the first place and was inspired to 
write satester. For example, as I bang out a new rule, I can vary the 
sample text very quickly to check the pattern. Add/change/delete a 
character here or there and see what happens instantly. Satester just 
does this on a larger scale. Sorry, not trying to spam my rule tool, 
just trying to gain insight into whether and where it is truly useful.


Anyway, a link or two for (basic|convention|intended) rule checking 
might be enough to get me started and more familiar with regular methods 
of checking/debugging.



Allen
am -at- satester.com




Re: SA Rule Tester/Checker

2015-07-16 Thread am

On 2015-07-16 07:32, Axb wrote:


header __KAM_NOTINMYNETWORK1 X-No-Relay =~ /./i
header __KAM_MULTIPLE_FROM From =~ /^./

I think I get the first one (if anything exists in X-No-Relay) but I'll
have to look deeper to understand why you would trigger on any From
address.  Anyway I'm having fun, learning a lot, and doing my customers
a lot of good by developing rules.  Thanks again for your tips and help.


did you miss the next line?

tflags   __KAM_MULTIPLE_FROM multiple,maxhits=2


Understood. I just had a "what the heck is this?" moment. I'm a little 
excited by the new tool and I can't wait to dig into mutt and 
spamassassin -t today to see how they work by comparison. Yea, no more 
Rubular. heh.


Allen Marsalis
am -at- satester.com



Re: SA Rule Tester/Checker

2015-07-16 Thread am

On 2015-07-16 04:53, Kevin A. McGrail wrote:


You might find the regression_tests.cf in the trunk rules/ dir
interesting.  It's a way of giving strings you want to hit/not-hit on
rules and see if it properly hits/doesn't hit as you expect.

I also use mutt and a few macros such as one that runs spamassassin -t
2>&1 with a prompt for a keyword.  Helpful for debugging.




Thanks, Kevin. I really appreciate your sharing. I will check these out 
today. I've used regex for years but I'm relatively new to SA rules.


I did see something interesting this morning.  I pasted KAM.cf into 
satester figuring it might overload my script but it worked. However any 
sample text I type triggers these two rules of yours.


header __KAM_NOTINMYNETWORK1 X-No-Relay =~ /./i
header __KAM_MULTIPLE_FROM From =~ /^./

I think I get the first one (if anything exists in X-No-Relay) but I'll 
have to look deeper to understand why you would trigger on any From 
address.  Anyway I'm having fun, learning a lot, and doing my customers 
a lot of good by developing rules.  Thanks again for your tips and help.


Allen Marsalis
am -at- satest.com



SA Rule Tester/Checker

2015-07-15 Thread am
I started writing SA rules about a year ago. Although I am new to this 
list, I have been lurking for quite a while. I would like to thank Kevin 
McGrail and others for providing rules and tips that inspired me to 
write my own custom rules.


Today I wrote a little tool that helps me test my SA rules.  I was using 
Rubular.com to check one pattern at a time which was very tedious. With 
my new tool, I can paste my entire rule.cf file (or just one rule) and 
check it against any test string to see which rules hit. (It operates like a 
multi-line version of Rubular.)


I hope some of you find this tool useful. I wrote it because I couldn't 
find another one like it in google. If there is something better at 
testing SA rules like this, please let me know so I don't waste any 
further development efforts. If it is useful, ideas and suggestions will 
be heartily appreciated.


www.satester.com

It's a one page site created in one day, so it doesn't look like much 
right now. We might style it better later on. There is no database and 
we save nothing entered into the site. It ignores meta, score, and 
describe at this time (any line without a regex in it). Simply paste in a 
rule and enter some sample text and it automatically highlights the 
hits.
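The actual satester code is not part of this thread. The Python sketch below is a rough, hypothetical re-creation of the idea, a multi-line Rubular for SA rules. It pulls the /regex/flags out of header, mimeheader, body, rawbody, uri and full lines. It skips everything else (meta, score, describe, tflags, eval: rules) and reports which rules match a sample text. Python's re module is close to Perl's regex engine but not identical, so some SpamAssassin patterns may behave differently or fail to compile. Those are collected as errors instead of being dropped silently. HYPOTHETICAL_CHEAP_MEDS is a made-up rule; the two KAM rules are the ones quoted in this thread.

```python
import re

# Rule types whose definitions carry a regex; meta, score, describe,
# tflags and comments do not match and are skipped.
RULE_LINE = re.compile(r"^\s*(header|mimeheader|body|rawbody|uri|full)\s+(\S+)\s+(.*)$")
# /pattern/flags or m/pattern/flags (other Perl delimiters are not handled).
SLASHED = re.compile(r"^m?/(.*)/([a-z]*)$")
FLAGS = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE, "x": re.VERBOSE}


def parse_rules(cf_text: str) -> tuple[list, list]:
    """Return ([(name, compiled_regex)], [(name, error_message)])."""
    rules, errors = [], []
    for line in cf_text.splitlines():
        m = RULE_LINE.match(line)
        if not m:
            continue
        name, rest = m.group(2), m.group(3)
        if "=~" in rest:  # header NAME Field =~ /re/flags
            rest = rest.split("=~", 1)[1]
        pm = SLASHED.match(rest.strip())
        if not pm:
            continue  # e.g. eval: rules have no regex to test
        flags = 0
        for ch in pm.group(2):
            flags |= FLAGS.get(ch, 0)
        try:
            rules.append((name, re.compile(pm.group(1), flags)))
        except re.error as exc:  # syntax Python's re cannot compile
            errors.append((name, str(exc)))
    return rules, errors


def hits(cf_text: str, sample: str) -> list[str]:
    """Names of rules whose pattern matches anywhere in the sample text."""
    rules, _ = parse_rules(cf_text)
    return [name for name, rx in rules if rx.search(sample)]


SAMPLE_CF = r"""
header __KAM_NOTINMYNETWORK1 X-No-Relay =~ /./i
header __KAM_MULTIPLE_FROM From =~ /^./
tflags   __KAM_MULTIPLE_FROM multiple,maxhits=2
body     HYPOTHETICAL_CHEAP_MEDS /\bcheap\s+meds\b/i
score    HYPOTHETICAL_CHEAP_MEDS 1.0
describe HYPOTHETICAL_CHEAP_MEDS Example rule for illustration only
"""

if __name__ == "__main__":
    print(hits(SAMPLE_CF, "Buy CHEAP meds now"))
```

This sketch tests each pattern against the whole sample, not against a specific header or the decoded body. As a result, patterns such as /./ and /^./ match any non-empty text. That is consistent with the two KAM header rules that hit on every sample in the reply above.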


I've noticed a couple of bugs already. I've seen an odd rule hit on one of 
our span tags used for highlighting sample results.  Also I need to add 
mimeheader to the list of lines that contain regex to be checked (along 
with header, body, rawbody, etc.)



Hope you enjoy!


Allen Marsalis
President, Bandwise LLC
am -at- satester dot com