spamassassin (cmd line) connection to Redis

2014-05-22 Thread Matteo Dessalvi

Hi all.

As stated in the subject, I am trying to test my
SpamAssassin 3.4.0 installation (I am using the Debian
Jessie package) with the usual method described here:

http://wiki.apache.org/spamassassin/TestingInstallation

In the output of the command: spamassassin -D < gTube_spam.txt
I got the following error:

(...)
May 22 12:31:39.240 [8390] warn: plugin: eval failed: bayes: Redis
failed: Redis error: ERR operation not permitted at /usr/share/perl5
/Mail/SpamAssassin/BayesStore/Redis.pm line 233, GEN2 line 1. at
/usr/share/perl5/Mail/SpamAssassin/BayesStore/Redis.pm line 265.
(...)

In the end the test worked perfectly, because SA
correctly classified the GTUBE spam sample, but I am worried
about that Redis error.

The SA local.cf contains the following string:

bayes_sql_dsn   server=10.1.1.19:6379;password=mypass;database=2
(...)

which, I thought, should be enough for SA. Note that when
I use redis-cli from the command line, specifying
the same parameters, I do not have any connection/authorization
problem.

Looking for the line 233 stated in the error message, I found
that the error is raised inside the sub on_connect but it looks
like it's not a Redis authentication error.

Any clues about what I am doing wrong?
Thanks in advance!

Best regards,
Matteo
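
For anyone reproducing the redis-cli comparison mentioned above: the
bayes_sql_dsn value is a semicolon-separated list of key=value pairs. The
following is a minimal Python sketch, assuming only the three keys shown in
this message (server, password, database). The helper names are hypothetical
and are not part of SpamAssassin; the sketch only builds a redis-cli command
line (-h, -p, -a, -n) so the same parameters can be tried by hand.

```python
import shlex


def parse_bayes_dsn(dsn):
    """Split 'key=value;key=value' into a dict (hypothetical helper)."""
    fields = {}
    for part in dsn.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def redis_cli_command(fields):
    """Build a redis-cli PING command from the parsed DSN fields."""
    host, _, port = fields.get("server", "127.0.0.1:6379").partition(":")
    args = ["redis-cli", "-h", host, "-p", port or "6379"]
    if fields.get("password"):
        args += ["-a", fields["password"]]
    if fields.get("database"):
        args += ["-n", fields["database"]]
    args.append("PING")
    return " ".join(shlex.quote(a) for a in args)


# DSN string copied from the local.cf line quoted in this message.
dsn = "server=10.1.1.19:6379;password=mypass;database=2"
fields = parse_bayes_dsn(dsn)
print(redis_cli_command(fields))
```

If the printed command answers PONG while SpamAssassin still reports an
error, the credentials themselves are probably not the problem.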


Re: spamassassin (cmd line) connection to Redis

2014-05-22 Thread Axb

have you included this in your local.cf ?

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis



Re: spamassassin (cmd line) connection to Redis

2014-05-22 Thread Axb

what happens if you don't use authentication?



Re: spamassassin (cmd line) connection to Redis

2014-05-22 Thread Matteo Dessalvi

On 22.05.2014 13:10, Axb wrote:

have you included this in your local.cf ?

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis


These are the relevant configuration lines for the
Redis SA module:

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn   server=10.1.1.19:6379;password=mypass;database=2
bayes_token_ttl 21d
bayes_seen_ttl   8d
bayes_auto_expire 1


On 22.05.2014 13:12, Axb wrote:


what happens if you don't use authentication?



It looks like the problem lies in the authentication.
When I tried with an empty 'password=' (after
disabling requirepass in redis.conf) I
got the following messages (I have added empty
lines for the sake of readability):

(...)
dbg: bayes: learner_new 
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x3cc14c0), 
bayes_store_module=Mail::SpamAssassin::BayesStore::Redis


dbg: bayes: learner_new: got 
store=Mail::SpamAssassin::BayesStore::Redis=HASH(0x42161c0)


dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3cc14c0) 
implements 'learner_is_scan_available', priority 0


dbg: bayes: _open_db(not yet connected)

dbg: bayes: Redis on-connect, db_id 2

dbg: bayes: CLIENT SETNAME command failed, don't worry, possibly an old 
redis version: ERR Syntax error, try CLIENT (LIST | KILL ip:port)


dbg: bayes: redis server version 2.4.14, memory used 6.8 MiB, Lua is not 
available


dbg: bayes: initialized empty database, version 3

dbg: bayes: nspam_nham_get nspam=0, nham=0

dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB  200
(...)

Of course this is just the initial test, so I do not have enough
bayes data. The 'CLIENT SETNAME' error is probably due to my old
Redis version, but other than that it looks fine.

I will try again with authentication enabled and see if
I run into the same problem as before.

Best regards,
Matteo



Re: spamassassin (cmd line) connection to Redis

2014-05-22 Thread Axb

On 05/22/2014 02:06 PM, Matteo Dessalvi wrote:


dbg: bayes: redis server version 2.4.14, memory used 6.8 MiB, Lua is not
available


You're using an ancient Redis version with no Lua support.

Redis 2.8.9 is the latest stable version.

I'd suggest you update Redis before you go on chasing windmills.




Rule updates?

2014-05-22 Thread Tom Hendrikx
Hi,

After checking the results of sa-update and doing some manual dns
queries, it seems that last rule updates were done more than a month
ago. This used to be an almost daily process, even when there were only
score changes due to masschecks.

Any specific reason for no new updates? Something we can assist with?

Regards,
Tom





Re: spamassassin (cmd line) connection to Redis

2014-05-22 Thread Matteo Dessalvi

Yes, you are definitely right: with the latest stable
Redis version (2.8.9 indeed) everything works smoothly
with the authentication.

Thanks for pointing me in the right direction!

Best regards,
Matteo





Re: Rule updates?

2014-05-22 Thread Kevin A. McGrail

Hi Tom,

The system running the update processing failed catastrophically and 
backups were insufficient.


I've been rebuilding the box as time allows.

Regards,
KAM


Re: spamassassin (cmd line) connection to Redis

2014-05-22 Thread Axb


Good to hear you got it working.

If your box is high traffic, watch the Redis memory usage.
When Redis dumps to file, memory usage can double, so if you
expect Redis to use 2GB of memory, you'll need 4GB of free memory to do
the dump. Swapping is not a happy option.


My Redis usage looks like this:

bayes_token_ttl 432000
bayes_seen_ttl  2d


0.000          0   22202312          0  non-token data: nspam
0.000          0    9593796          0  non-token data: nham


# Clients
connected_clients:203
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:4085255152
used_memory_human:3.80G
used_memory_rss:6439870464
used_memory_peak:6307356768
used_memory_peak_human:5.87G
used_memory_lua:126976
mem_fragmentation_ratio:1.58
mem_allocator:jemalloc-3.2.0


used_memory_peak is what it used to do the dump to file.

and even with that amount of data, Bayes is extremely fast and goes 
totally unnoticed in overall msg processing time.
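
The headroom rule of thumb above can be checked against the INFO figures
in this message. This is a minimal Python sketch, assuming the memory
section is available as plain text (for example pasted from redis-cli INFO);
the function names are hypothetical. It converts the byte counts to GiB and
computes the "twice used_memory" worst case described above.

```python
# Values copied from the "# Memory" section quoted in this message.
INFO_MEMORY = """\
# Memory
used_memory:4085255152
used_memory_rss:6439870464
used_memory_peak:6307356768
"""


def parse_info(text):
    """Parse 'key:value' lines of Redis INFO output into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        stats[key] = value
    return stats


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


stats = parse_info(INFO_MEMORY)
used = int(stats["used_memory"])
peak = int(stats["used_memory_peak"])
worst_case = 2 * used  # rule of thumb from the message: the dump may double usage

print(f"used       {gib(used):.2f} GiB")
print(f"peak       {gib(peak):.2f} GiB")
print(f"worst case {gib(worst_case):.2f} GiB")
```

On these numbers the observed peak is well below the doubled figure, so the
2x rule reads as a conservative upper bound rather than an exact prediction.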






Re: Rule updates?

2014-05-22 Thread Tom Hendrikx
On 05/22/2014 03:36 PM, Kevin A. McGrail wrote:
> On 5/22/2014 9:04 AM, Tom Hendrikx wrote:
>> After checking the results of sa-update and doing some manual dns
>> queries, it seems that last rule updates were done more than a month
>> ago. This used to be an almost daily process, even when there were only
>> score changes due to masschecks.
>>
>> Any specific reason for no new updates? Something we can assist with?
>
> Hi Tom,
>
> The system running the update processing failed catastrophically and
> backups were insufficient.

Ah, bugger ;

>
> I've been rebuilding the box as time allows.

Fair enough :)
Thanks for the insight.

Kind regards,
Tom





Re: autolearn_force

2014-05-22 Thread RW
On Wed, 21 May 2014 21:34:23 -0700
Ian Zimmerman wrote:

> I don't understand this setting, and reading the documentation doesn't
> help.
>
> It seems it should make bayes learn spam whenever the total score
> surpasses the value of bayes_auto_learn_threshold_spam, and not
> require 3 points from header and body each; that would make it a
> global setting similar in purpose to bayes_auto_learn_threshold_spam.
>
> But in fact this is a per-test setting, a subcategory of tflags.  Do I
> have to specify it separately for every test?  Why?

The point is to set it for a small number of rules that are
sufficiently strong as to guarantee there will be no mislearning in
combination with the autolearn as spam threshold. 


It's probably best to create a single metarule for this - something
that eliminates the possibility of mistraining through a lot
of overlapping rules. I do something similar to get more spam into my
high-scoring folder. I assign a lot of the near-certain spam rules
to different classes: BAYES, RBLs, URIBLs, relaycountry etc and then
count the number of classes.
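
The class-counting idea above can be modelled outside SpamAssassin. This is
a minimal Python sketch of the logic only: the mapping of rules to classes,
the RELAYCOUNTRY_BAD rule name and the threshold of three classes are
illustrative assumptions, not anything stated in this thread. In
SpamAssassin itself the same counting would be expressed as a meta rule.

```python
# Illustrative mapping of near-certain spam rules to independent classes.
RULE_CLASSES = {
    "BAYES_99": "bayes",
    "RCVD_IN_XBL": "rbl",
    "RCVD_IN_PBL": "rbl",
    "URIBL_BLACK": "uribl",
    "RELAYCOUNTRY_BAD": "relaycountry",  # hypothetical local rule
}


def classes_hit(rule_hits):
    """Return the set of distinct classes represented in the rule hits."""
    return {RULE_CLASSES[r] for r in rule_hits if r in RULE_CLASSES}


def strong_spam(rule_hits, min_classes=3):
    """True when enough independent classes agree (threshold is an assumption)."""
    return len(classes_hit(rule_hits)) >= min_classes


# Two RBL hits count as one class, so this message only reaches two classes.
print(strong_spam(["BAYES_99", "RCVD_IN_XBL", "RCVD_IN_PBL"]))
# Three different classes agree here.
print(strong_spam(["BAYES_99", "RCVD_IN_XBL", "URIBL_BLACK"]))
```

Counting classes rather than rules is what keeps several overlapping rules
of the same kind from adding up to a false sense of certainty.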



Re: autolearn_force

2014-05-22 Thread Ian Zimmerman
On Thu, 22 May 2014 15:54:42 +0100
RW rwmailli...@googlemail.com wrote:

Ian I don't understand this setting, and reading the documentation
Ian doesn't help.

Ian It seems it should make Bayes learn spam whenever the total score
Ian surpasses the value of bayes_auto_learn_threshold_spam, and not
Ian require 3 points from header and body each; that would make it a
Ian global setting similar in purpose to
Ian bayes_auto_learn_threshold_spam.

Ian But in fact this is a per-test setting, a subcategory of tflags.
Ian Do I have to specify it separately for every test?  Why?

RW The point is to set it for a small number of rules that are
RW sufficiently strong as to guarantee there will be no mislearning in
RW combination with the autolearn as spam threshold.

RW It's probably best to create a single metarule for this - something
RW that eliminates the possibility of mistraining through a lot of
RW overlapping rules. I do something similar to get more spam into my
RW high-scoring folder. I assign a lot of the near-certain spam rules
RW to different classes: BAYES, RBLs, URIBLs, relaycountry etc and then
RW count the number of classes.

The problem I am trying to solve is that nearly all of my spam is
flagged due to body rules.  The header rules seem to be close to useless
with the latest campaigns - spammers seem to have learned enough to
avoid sending obvious stinking pieces of turd.  (The one exception is
patterns in the Message-ID, but I am afraid that will be short lived
too, and is insufficient by itself even now).

Thus, even if I set bayes_auto_learn_threshold_spam low, very few of my
spams are autolearned because of the 3/3 requirement.  The damn 3/3 is
my problem - how can I work around it?  If I have to spend an hour a day
manually training the classifier the spammers have won :-(

By the way, how are meta rules counted for this purpose?  The
documentation says nothing about that.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Mystery SpamWare

2014-05-22 Thread hospice admin
Hi Team,

All of a sudden I've started noticing a lot of spam coming in with some fairly 
unique headers like this:

x-track-version: 4
x-track-source: notifire_XXX
x-track-spooler-id: 
x-track-spooler-split-id: 
x-track-spooler-segment-id: 
x-render: render-
Precedence: bulk
x-track-contact-id: 

The blank values are each some number, which varies with user to some
degree; XXX varies by spammer.

Does anyone recognise where these headers come from?

Thanks

Jude.

  

Re: Mystery SpamWare

2014-05-22 Thread Axb

can you pastebin a sample?





Re: 20_sought_fraud.cf

2014-05-22 Thread Kevin A. McGrail

On 5/20/2014 3:03 PM, psychobyte wrote:

Hi,

Has there been any progress on this? We are looking to integrate these 
rules but, won't bother if the project is abandoned.


Thanks,
There has been some progress, yes but it's taken a back seat a bit. It's 
not abandoned.  Ping the list in 2 weeks.


regards,
KAM


Blank line rules

2014-05-22 Thread James B. Byrne
I am clearly missing something with these rules but I lack the experience to
see what it is:

score RAW_BLANK_LINES_05 0.5
rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
describe RAW_BLANK_LINES_05 Raw body contains 5 or more consecutive empty lines
score RAW_BLANK_LINES_10 1.0
rawbody RAW_BLANK_LINES_10 /(\r?\n){10,24}/i
describe RAW_BLANK_LINES_10 Raw body contains 10 or more consecutive empty lines
score RAW_BLANK_LINES_15 1.5
rawbody RAW_BLANK_LINES_15 /(\r?\n){25}/
describe RAW_BLANK_LINES_15 Raw body contains 25 or more consecutive empty lines

I created a test file that consisted of nought but newlines (shown as $
characters using vim set list).

I passed it to spamassassin from the command line with the above rules in
/etc/mail/spamassassin/local.cf and nothing was reported.  I used an actual
message body from a spam message received and only the RAW_BLANK_LINES_05 test
is tripped even though the body of that message has 18 consecutive blank
lines, also consisting of nothing but \n characters.

So what is it about the regexp I am using that I evidently do not understand?

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3



Re: 20_sought_fraud.cf

2014-05-22 Thread psychobyte

Great!  Will do and Thx.





Re: Blank line rules

2014-05-22 Thread John Hardin


Regular expressions by default only consider a single line of text. You 
need to provide an option to say treat multiple lines as a single line.

Try this:

  rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m
  rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m
  rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m

The case-insensitive flag is not meaningful for these rules as there's no 
attempt to match text, and I added the ?: to make the groups 
non-capturing, which is a bit more efficient.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Windows and its users got mentioned at home today, after my wife the
 psych major brought up Seligman's theory of learned helplessness.
 -- Dan Birchall in a.s.r
---
 4 days until Memorial Day - honor those who sacrificed for our liberty


Re: Blank line rules

2014-05-22 Thread Ian Zimmerman
On Thu, 22 May 2014 13:47:04 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:

John Regular expressions by default only consider a single line of
John text.  You need to provide an option to say treat multiple lines
John as a single line. Try this:

rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m
rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m
rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m

James, see also the Bayes refinement thread where I posted about doing
the exact same thing.  Somehow John's multiline rules don't work for me,
either.  Kärsten was looking at it last I know.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote:
> In either case, having a sample would speed up this ping-pong style
> debugging. And I am curious. ;)  Mind putting your sample up a pastebin?

Ian sent me the original message off-list. It indeed contains about 16
consecutive newlines, but doesn't trigger the rawbody rules discussed.
The issue is not related to rawbody being split up into chunks.

A stripped down test-case is easy to generate:

  echo -e "\n\n~\n\n\n\nend"

That's an empty mail header and a very short text body, consisting of
consecutive newlines. The tilde and end string are merely there for
anchoring and visualizing the match.

The rule for debugging the issue is the same I posted before, just
slightly modified to better visualize the match.

  rawbody __BLANKS  /.\n{2,}/
  tflags  __BLANKS  multiple

Feeding the test-case to spamassassin -D, the debug output shows the
match like the following:

  dbg: rules: ran rawbody rule __BLANKS == got hit: ~
  dbg: rules: [...] 
  ...
  dbg: rules: [...] 

The number of continuation lines equals the number of newlines in the
test-case.

Well, up until 12, that is. :-/

Any number up to 11 of consecutive newlines can be matched with rawbody
rules. However, 12 or more consecutive newlines will be squeezed and
replaced by exactly two newlines.


I've had a quick look at the code already, but did not yet find where
the supposedly raw (sic) body gets altered.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Blank line rules

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:
> I am clearly missing something with these rules but I lack the experience to
> see what it is:
>
> score RAW_BLANK_LINES_05 0.5
> rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i

Why is everyone trying to match empty lines these days? Must be spam I'm
missing out on. ;)

> I passed it to spamassassin from the command line with the above rules in
> /etc/mail/spamassassin/local.cf and nothing was reported.  I used an actual
> message body from a spam message received and only the RAW_BLANK_LINES_05 test
> is tripped even though the body of that message has 18 consecutive blank
> lines, also consisting of nothing but \n characters.
>
> So what is it about the regexp I am using that I evidently do not understand?

See the post "Consecutive Newlines in Rawbody Rules" from a few minutes
ago, a follow-up to the "Bayes refinement" thread.

In a nutshell: 12 or more consecutive newlines cannot be matched with
rawbody rules. They get replaced by 2 newlines.


There's another issue with your approach of different rules matching up
to n occurrences and more than n. The first will always match in
addition, if the latter matches.

If the desired behavior is mutually exclusive matching, you need meta
rules actually encoding the math / logic.
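
The overlap described above is easy to see with plain regular expressions.
This is a minimal Python sketch, assuming Python's re module behaves like
Perl for these simple patterns; it ignores SpamAssassin's own preprocessing
(including the newline squeezing mentioned earlier), and exclusive_bucket is
a hypothetical helper standing in for what meta rules would encode.

```python
import re

# The three patterns from the thread, with non-capturing groups.
RULES = [
    ("RAW_BLANK_LINES_05", re.compile(r"(?:\r?\n){5,9}")),
    ("RAW_BLANK_LINES_10", re.compile(r"(?:\r?\n){10,24}")),
    ("RAW_BLANK_LINES_15", re.compile(r"(?:\r?\n){25}")),
]

# A body with 18 consecutive newlines, anchored by "~" and "end".
body = "~" + "\n" * 18 + "end"

# An unanchored {5,9} also matches inside a longer run, so both rules hit.
hits = [name for name, rx in RULES if rx.search(body)]
print(hits)


def longest_newline_run(text):
    """Length, in newlines, of the longest run of consecutive line endings."""
    runs = re.finditer(r"(?:\r?\n)+", text)
    return max((len(m.group(0).replace("\r", "")) for m in runs), default=0)


def exclusive_bucket(text):
    """Pick exactly one rule name based on the longest run (hypothetical)."""
    n = longest_newline_run(text)
    if n >= 25:
        return "RAW_BLANK_LINES_15"
    if n >= 10:
        return "RAW_BLANK_LINES_10"
    if n >= 5:
        return "RAW_BLANK_LINES_05"
    return None


print(exclusive_bucket(body))
```

The bucket function shows the "math / logic" that has to live somewhere if
only one of the three rules should score for a given message.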





Re: Blank line rules

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 13:47 -0700, John Hardin wrote:
> On Thu, 22 May 2014, James B. Byrne wrote:
>
> > rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
>
> Regular expressions by default only consider a single line of text. You

Nope. You're thinking about ^ and $ by default only matching the
beginning and end of the string. A \n newline is just an ordinary char.

REs don't know the concept of lines, they operate on a string.


> need to provide an option to say treat multiple lines as a single line.
> Try this:
>
>   rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m

The /m modifier changes ^ and $ to also match at the start and end of
each line within the string, not only at the string's ends.
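
A short demonstration of the distinction drawn above, as a minimal sketch
in Python rather than Perl. It assumes only that the two engines agree on
these basic constructs (re.MULTILINE corresponds to /m): a newline matches
like any other character without any flag, and the flag only changes where
^ and $ may match.

```python
import re

text = "first\n\n\nlast"

# Newlines are ordinary characters: no flag is needed to match three in a row.
plain = re.search(r"\n{3}", text)
print(plain is not None)

# Without the multiline flag, ^ only matches at the very start of the string,
# so "last" on the final line is not found by an anchored pattern.
print(re.findall(r"^last$", text))

# With the multiline flag, ^ and $ also match at line boundaries.
print(re.findall(r"^last$", text, re.MULTILINE))

# The flag makes no difference to a pattern that uses neither ^ nor $.
flagged = re.search(r"\n{3}", text, re.MULTILINE)
print(plain.span() == flagged.span())
```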





Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread David B Funk

On Thu, 22 May 2014, Karsten Bräckelmann wrote:


On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote:

[snip..]

The number of continuation lines equals the number of newlines in the
test-case.

Well, up until 12, that is. :-/

Any number up to 11 of consecutive newlines can be matched with rawbody
rules. However, 12 or more consecutive newlines will be squeezed and
replaced by exactly two newlines.


I've had a quick look at the code already, but did not yet find where
the supposedly raw (sic) body gets altered.


Look at Message.pm, around line 300:

    # if we've got a series of blank lines, get rid of them
    if (defined $start) {
      my $num = $start-$cnt;
      if ($num > 10) {
        splice @message, $cnt+2, $num-1;
      }


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread David B Funk


After doing some experimenting with that code I came up with something that
I'd argue is more semantically correct:

    # if we've got a long series of blank lines, limit them
    if (defined $start) {
      my $max_blank_lines = 20;
      my $num = $start-$cnt;
      if ($num > $max_blank_lines) {
        splice @message, $cnt+2, $num-$max_blank_lines;
      }
      undef $start;
    }

I.e., limit a message to no more than $max_blank_lines in a row, rather than
totally collapsing any run of more than 11 consecutive newlines (adjust
$max_blank_lines as you see fit or make it a configurable parameter).

After making that change, I found that rules like "BLANK_LINES_60_70  BODY:
Message is at least 60% blank lines" started firing on a test message
that I was using. So one could argue that by totally collapsing large blocks
of lines, the creators of that code are torpedoing rules like BLANK_LINES_60_70.
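
The difference between the two behaviours can be simulated outside
SpamAssassin. This is a minimal Python sketch, not the actual Message.pm
code: only the inner "if (defined $start)" block is quoted in this thread,
so the surrounding reverse loop over the message lines is an assumption, and
the function and parameter names are hypothetical. threshold=10, keep=1
models the stock behaviour; threshold=20, keep=20 models the change proposed
above.

```python
def collapse_blank_runs(lines, threshold=10, keep=1):
    """Shorten runs of blank lines longer than `threshold` to `keep` lines."""
    out = list(lines)
    start = None  # index of the last line of the blank run being scanned
    # Walk the message backwards (assumed loop shape, see note above).
    for cnt in range(len(out) - 1, -1, -1):
        if out[cnt].strip() == "":
            if start is None:
                start = cnt
            if cnt != 0:
                continue
        # Line is not blank, or the beginning of the message was reached.
        if start is not None:
            num = start - cnt
            if num > threshold:
                # Mirrors: splice @message, $cnt+2, $num-$keep
                del out[cnt + 2 : cnt + 2 + (num - keep)]
            start = None
    return out


def message(n_blank):
    """A tiny body: an anchor line, n blank lines, and an end marker."""
    return ["~\n"] + ["\n"] * n_blank + ["end\n"]


for n in (10, 11, 30):
    stock = collapse_blank_runs(message(n))
    patched = collapse_blank_runs(message(n), threshold=20, keep=20)
    # Print the number of blank lines left by each variant.
    print(n, len(stock) - 2, len(patched) - 2)
```

With the stock parameters, 11 blank lines (12 consecutive newlines) shrink
to a single blank line, which agrees with the "exactly two newlines"
observation earlier in this thread.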




Re: Consecutive Newlines in Rawbody Rules

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 17:43 -0500, David B Funk wrote:
> On Thu, 22 May 2014, Karsten Bräckelmann wrote:
>
> > Any number up to 11 of consecutive newlines can be matched with rawbody
> > rules. However, 12 or more consecutive newlines will be squeezed and
> > replaced by exactly two newlines.
>
> > I've had a quick look at the code already, but did not yet find where
> > the supposedly raw (sic) body gets altered.
>
> Look at Message.pm, around line 300:

Thanks, good catch!

>     # if we've got a series of blank lines, get rid of them
>     if (defined $start) {
>       my $num = $start-$cnt;
>       if ($num > 10) {
>         splice @message, $cnt+2, $num-1;
>       }

10 empty lines, 11 consecutive newlines.





Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 18:34 -0500, David B Funk wrote:
> After doing some experimenting with that code I came up with something that
> I'd argue is more semantically correct:
>
>     # if we've got a long series of blank lines, limit them
>     if (defined $start) {
>       my $max_blank_lines = 20;
>       my $num = $start-$cnt;
>       if ($num > $max_blank_lines) {
>         splice @message, $cnt+2, $num-$max_blank_lines;
>       }
>       undef $start;
>     }
>
> IE limit a message to no more than $max_blank_lines in a row, not the total
> collapse of more than 11. (adjust $max_blank_lines as you see fit or make it
> a configurable parameter).

+1

Can you file a bug report or raise the topic in dev@ list? The code
change is sufficiently simple, but I want that issue discussed first.
Wonder what's the reason for that collapsing in the first place.





Re: Blank line rules

2014-05-22 Thread John Hardin

On Thu, 22 May 2014, Karsten Bräckelmann wrote:


On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:

I am clearly missing something with these rules but I lack the experience to
see what it is:

score RAW_BLANK_LINES_05 0.5
rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i


Why is everyone trying to match empty lines these days? Must be spam I'm
missing out on. ;)


Heh. Something similar just plopped into my spam quarantine.

You might want to do this:

  rawbody  MANY_BLANK_LINES  /(?:(?:<br>)?\r?\n){9}/mi


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  ...intellectuals have no interest in what _creates_ wealth, and
  what _inhibits_ the creation of wealth. They are very concerned
  about the _distribution_ of it, but they act as if wealth just
  exists somehow. It's like manna from heaven, it's only a
  question of how we split it up.-- Thomas Sowell
---
 4 days until Memorial Day - honor those who sacrificed for our liberty

Re: Blank line rules

2014-05-22 Thread Amir Caspi
On May 22, 2014, at 6:44 PM, John Hardin jhar...@impsec.org wrote:
>
> You might want to do this:
>
>   rawbody  MANY_BLANK_LINES  /(?:(?:<br>)?\r?\n){9}/mi

AC_BR_BONANZA should cover the HTML case. It could be easily extended to match 
standard LF or CR per above. (In my case I am matching something like 20 
newlines for the HTML case, to try to prevent FPs.)

--- Amir
thumbed via iPhone



Re: Blank line rules

2014-05-22 Thread Kevin A. McGrail

On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote:
Why is everyone trying to match empty lines these days? Must be spam 
I'm missing out on. ;) 

Who here has seen Pootietang and is laughing about this?  Just me, likely...


Re: Mystery SpamWare

2014-05-22 Thread jdebert
On Thu, 22 May 2014 18:23:48 +0100
hospice admin hospice...@outlook.com wrote:

> Hi Team,
>
> All of a sudden I've started noticing a lot of spam coming in with
> some fairly unique headers like this:
>
> x-track-version: 4
> x-track-source: notifire_XXX
> x-track-spooler-id:
> x-track-spooler-split-id:
> x-track-spooler-segment-id:
> x-render: render-
> Precedence: bulk
> x-track-contact-id:
>
> The blank values are each some number, which varies with user to some
> degree; XXX varies by spammer.
>
> Does anyone recognise where these headers come from?
>

Those headers seem to be tracking headers for commercial email
marketing campaigns. Possibly from Notifire.co.uk, an email
massmarketing firm, calling itself a white label. Quite uncertain w/o
more data. But those headers are enough to make a filter from or to use
in header checks to reject such trash.
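
As an illustration of filtering on those headers, here is a minimal Python
sketch using the standard library email parser. The sample message, the
function name and the choice to key on header names plus the "notifire_"
prefix are all assumptions for illustration; the real header values were not
shown in this thread, and a production filter would more likely be a
SpamAssassin header rule or an MTA header check.

```python
from email import message_from_string

# Hypothetical sample built from the header names quoted in this thread.
SAMPLE = """\
From: sender@example.com
Subject: test
x-track-version: 4
x-track-source: notifire_XXX
Precedence: bulk

body
"""

PLAIN = """\
From: friend@example.com
Subject: hello

body
"""


def looks_like_tracked_bulk(raw):
    """True if the message carries x-track-* headers with a notifire_ source."""
    msg = message_from_string(raw)
    names = {name.lower() for name in msg.keys()}
    has_track = any(name.startswith("x-track-") for name in names)
    source = msg.get("x-track-source", "")  # header lookup is case-insensitive
    return has_track and source.lower().startswith("notifire_")


print(looks_like_tracked_bulk(SAMPLE))
print(looks_like_tracked_bulk(PLAIN))
```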

jd




Re: Blank line rules

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 20:56 -0400, Kevin A. McGrail wrote:
> On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote:
> > Why is everyone trying to match empty lines these days? Must be spam
> > I'm missing out on. ;)
>
> Who here has seen Pootietang and is laughing about this?  Just me, likely...

The fact I just googled that word should sufficiently answer it as far
as I am concerned. ;)  Good thing it amused you, but that reference was
certainly unintended.





OFF-TOPIC: The Brilliance of PootieTang was Re: Blank line rules

2014-05-22 Thread Kevin A. McGrail

On 5/22/2014 9:17 PM, Karsten Bräckelmann wrote:

On Thu, 2014-05-22 at 20:56 -0400, Kevin A. McGrail wrote:

On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote:

Why is everyone trying to match empty lines these days? Must be spam
I'm missing out on. ;)

Who here has seen Pootietang and is laughing about this?  Just me, likely...

The fact I just googled that word should sufficiently answer it as far
as I am concerned. ;)  Good thing it amused you, but that reference was
certainly unintended.

https://www.youtube.com/watch?v=RtCxvv8Y3Bs

2:54 is classic.

This movie is one of the real hit or miss comedies.  I think it's 
brilliant on a lot of levels.  Others don't get it.


regards,
KAM


I'm doing it wrong.

2014-05-22 Thread Kai Meyer
I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin 
(user prefs via mysql) server that I've been running for a few years 
now. It's just a few of my private domains, not a lot of traffic. In the 
last 6 months, the amount of spam getting through has gone from one or 
two a week to 30 a day. I had sa-learn setup on imap folders called SPAM 
and HAM running as root, so I just started tossing emails in there. It 
seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold 
to dump to my JUNK folder is 3, and I have spamchk sideline things above 
7). I still get legitimate email in the 2-3 range, but I haven't had 
legitimate email above 3 in a long time. After a bit, the 2s became 3s 
and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did 
this habitually for more than a month, and the progress seemed to stop. 
I googled around a bit and realized that I didn't do a very good job 
setting up rules, so I added pyzor and razor2, and they seem functional. 
Spam got better, and it's down to maybe 10 a day, but they still range 
all the way up to 5.


What really gets me is that if I take an email that scores -2, strip 
the X-Spam* headers, and run it through spamc by hand (even as the spamd 
user) just like the spamchk script does, it scores around a 4. I have 
one here that scores a 4.1 if it comes through the mail, and a 6.6 if I 
run it manually. What can I do to reconcile these scores? I would like 
the scores I'm getting from the commandline over the ones I'm getting 
through postfix, but I don't know the system well enough to know what is 
causing the difference.


== Via postfix
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: 
X-Spam-Status: Yes, score=4.1 required=3.0 tests=BAYES_60,HTML_IMAGE_RATIO_08,
   HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS autolearn=no
   version=3.3.1
...
Content analysis details:   (4.1 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6298]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS



== Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u myemail > postsa.mail)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: **
X-Spam-Status: Yes, score=6.6 required=3.0 tests=BAYES_60,HTML_MESSAGE,
   INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM autolearn=no
   version=3.3.1
...
Content analysis details:   (6.6 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: fellage.me]
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6299]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS



== /etc/mail/spamassassin.cf (I added the last 4 lines in a desperate
attempt to see something change, but to no effect)

/etc/mail/spamassassin/local.cf
# These values can be overridden by editing ~/.spamassassin/user_prefs.cf
# (see spamassassin(1) for details)

# These should be safe assumptions and allow for simple visual sifting
# without risking lost emails.

required_hits 5.0
report_safe 1
rewrite_header Subject [***SPAM***]
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_

trusted_networks 69.160.84.222
razor_config /etc/mail/spamassassin/.razor/razor-agent.conf
pyzor_options --homedir /etc/mail/spamassassin
auto_learn 0
use_razor2
use_dcc
use_pyzor




Re: I'm doing it wrong.

2014-05-22 Thread David B Funk

On Thu, 22 May 2014, Kai Meyer wrote:

I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin (user 
prefs via mysql) server that I've been running for a few years now. It's just 
a few of my private domains, not a lot of traffic. In the last 6 months, the 
amount of spam getting through has gone from one or two a week to 30 a day. I 
had sa-learn setup on imap folders called SPAM and HAM running as root, so I 
just started tossing emails in there. It seemed like I had groups of emails 
around 2, 0, -1, and -2 (my threshold to dump to my JUNK folder is 3, and I 
have spamchk sideline things above 7). I still get legitimate email in the 
2-3 range, but I haven't had legitimate email above 3 in a long time. After a 
bit, the 2s became 3s and the 0s became 1s, but the -1 and -2 spam emails 
stayed put. I did this habitually for more than a month, and the progress 
seemed to stop. I googled around a bit and realized that I didn't do a very 
good job setting up rules, so I added pyzor and razor2, and they seem 
functional. Spam got better, and it's down to maybe 10 a day, but they still 
range all the way up to 5.


What really gets me is that if I take an email that scores -2, strip the 
X-Spam* headers, and run it through spamc by hand (even as the spamd user) 
just like the spamchk script does, it scores around a 4. I have one here that 
scores a 4.1 if it comes through the mail, and a 6.6 if I run it manually. 
What can I do to reconcile these scores? I would like the scores I'm getting 
from the commandline over the ones I'm getting through postfix, but I don't 
know the system well enough to know what is causing the difference.


== Via postfix
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: 
X-Spam-Status: Yes, score=4.1 required=3.0 tests=BAYES_60,HTML_IMAGE_RATIO_08,
   HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS autolearn=no
   version=3.3.1
...
Content analysis details:   (4.1 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6298]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS



== Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u myemail > postsa.mail)

X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: **
X-Spam-Status: Yes, score=6.6 required=3.0 tests=BAYES_60,HTML_MESSAGE,
   INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM autolearn=no
   version=3.3.1
...
Content analysis details:   (6.6 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: fellage.me]
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6299]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS

[snip..]

The only major difference between those two score sets is the addition of
the URIBL_DBL_SPAM hit in the second one. This means that by the time you
got around to running that manual check, somebody had reported that URL to
the URIBL and it had been cataloged as a spammer URL.

If you had run that manual check at the same time as the postfix run (or
soon thereafter), it probably wouldn't have had that URIBL_DBL_SPAM hit
and thus would have had the same score.

In that regard, URIBLs are like anti-virus signatures; they don't do much
good on a zero-day attack but catch repeat offenders.
Spammers know that and are registering tens of thousands (or more) of new
domain names each day, using them for a few days and then discarding them.
Good news if you're a registrar (lots of fresh business), bad news if you run
a root DNS server (they're in the multi-million name size) or are in the
anti-spam business.

The one thing that might help is to use greylisting in your MTA:
delaying unknown mail may give it enough time to become listed
in a URIBL and recognized as spam.
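
The greylisting idea can be sketched roughly as follows. This is only an
illustration of the general technique, not the behaviour of any particular
MTA or policy daemon: the Greylist class, the 300 second delay and the
(client IP, sender, recipient) key are all assumptions made for the example.

```python
import time


class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, delay_seconds=300, clock=time.time):
        self.delay_seconds = delay_seconds  # hypothetical minimum retry delay
        self.clock = clock                  # injectable so the demo needs no sleep
        self.first_seen = {}                # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient):
        """Return 'defer' (answer with a temporary failure) or 'accept'."""
        triplet = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember the first time this triplet showed up.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.delay_seconds:
            return "defer"
        return "accept"


if __name__ == "__main__":
    fake_now = [0.0]
    greylist = Greylist(delay_seconds=300, clock=lambda: fake_now[0])
    args = ("192.0.2.10", "sender@example.com", "rcpt@example.org")
    print(greylist.check(*args))  # first attempt
    fake_now[0] = 600.0           # the sending MTA retries ten minutes later
    print(greylist.check(*args))  # retry after the delay
```

A legitimate MTA retries after the temporary failure and gets through on
the later attempt; the hope is that in the meantime the URI has been listed.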

Tough but that's the name of the game these days.


--
Dave Funk  

Re: I'm doing it wrong.

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:
 I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin 
 (user prefs via mysql) server that I've been running for a few years 

The configuration you pasted below does not show any user_* options.
Unless there are more cf files you omitted, you do not use user_prefs
via SQL.

 now. It's just a few of my private domains, not a lot of traffic. In the 
 last 6 months, the amount of spam getting through has gone from one or 
 two a week to 30 a day. I had sa-learn setup on imap folders called SPAM 
 and HAM running as root, so I just started tossing emails in there. It 

Training as root rather than the system user receiving the mail (and
calling SA) is only possible with site-wide Bayes setup. The pasted
configuration doesn't show that, either, so you would need to train as
the mail receiving / scanning user.

 seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold 
 to dump to my JUNK folder is 3, and I have spamchk sideline things above 
 7). I still get legitimate email in the 2-3 range, but I haven't had 
 legitimate email above 3 in a long time. After a bit, the 2s became 3s 
 and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did 
 this habitually for more than a month, and the progress seemed to stop. 
 I googled around a bit and realized that I didn't do a very good job 
 setting up rules, so I added pyzor and razor2, and they seem functional. 
 Spam got better, and it's down to maybe 10 a day, but they still range 
 all the way up to 5.

Mixing in Razor or Pyzor sure can help. But that setting up rules you
just considered your job is a bit weird. Local rules of course also can
help, but are  (a) an advanced topic, and  (b) not the task of a regular
SA instance. You didn't mention any of that in your configuration
either, so it's unclear what you're about here.


 What really gets me is that if I take an email that scores -2, strip 
 the X-Spam* headers, and run it through spamc by hand (even as the spamd 
 user) just like the spamchk script does, it scores around a 4. I have 

It is not necessary to strip X-Spam headers. SA ignores these, if
present.

You just mixed in a third user, spamd -- in addition to root and the
real mail receiving user. Without site-wide Bayes you are comparing
apples to oranges, and now peaches. All yummy, though not the same.

What is that spamchk script you just mentioned, and how does it fit
into your setup? You should review your entire mail-processing chain.
Describing it in detail might help here, too.

 one here that scores a 4.1 if it comes through the mail, and a 6.6 if I 
 run it manually. What can I do to reconcile these scores? I would like 
 the scores I'm getting from the commandline over the ones I'm getting 
 through postfix, but I don't know the system well enough to know what is 
 causing the difference.

Highlighting the differences, removing common rule hits:

 == Via postfix

   0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area

 == Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u myemail > postsa.mail)

   2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist

The Bayesian probability is ~identical, differing only in the fourth
decimal place (0.6298 vs. 0.6299).

Hitting URIBL_DBL_SPAM in the later manual check, but not at receiving
time may be due to timing and the URI actually getting listed later.

What's odd is that the subsequent manual check is *missing* the HTML
image ratio rule hit. Something altered the message.
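
That kind of comparison can be done mechanically. The following is a
minimal sketch, assuming the X-Spam-Status values are available as plain
strings; the helper name rule_hits is made up for this example, and the two
header values are the ones quoted in this thread.

```python
import re


def rule_hits(status_header):
    """Return the set of rule names listed after tests= in X-Spam-Status."""
    # Folded headers break the list after a comma; glue it back together.
    unfolded = re.sub(r",\s+", ",", status_header)
    match = re.search(r"tests=(\S*)", unfolded)
    if not match:
        return set()
    return {name for name in match.group(1).split(",") if name and name != "none"}


via_postfix = (
    "Yes, score=4.1 required=3.0 tests=BAYES_60,HTML_IMAGE_RATIO_08,\n"
    "   HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS autolearn=no\n"
    "   version=3.3.1"
)
via_commandline = (
    "Yes, score=6.6 required=3.0 tests=BAYES_60,HTML_MESSAGE,\n"
    "   INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM autolearn=no\n"
    "   version=3.3.1"
)

only_postfix = sorted(rule_hits(via_postfix) - rule_hits(via_commandline))
only_commandline = sorted(rule_hits(via_commandline) - rule_hits(via_postfix))
print("only via postfix:    ", only_postfix)      # ['HTML_IMAGE_RATIO_08']
print("only via commandline:", only_commandline)  # ['URIBL_DBL_SPAM']
```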


  /etc/mail/spamassassin.cf (I added the last 4 lines in 
 a desperate attempt to see something change, but to no effect)
 /etc/mail/spamassassin/local.cf

Which one? The latter spamassassin/local.cf is default (though packager
dependent), the claimed (typo'ed ?) one is custom, if it exists at all.

Snip, skipping to the last four lines:

 auto_learn 0
 use_razor2
 use_dcc
 use_pyzor

auto_learn is not a valid option. That would be bayes_auto_learn.

The other use_* options require arguments (0 or 1). The lines as pasted
do not enable them, and instead produce lint warnings. See

  spamassassin --lint

That lint check is a good starting point anyway...
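
To make the two problems above concrete, here is a toy check. It is in no
way a substitute for spamassassin --lint: it knows only the two issues
named in this message, and the function name check_lines and its message
texts are invented for the example.

```python
def check_lines(lines):
    """Flag only the two problems discussed above; everything else passes."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        option, args = parts[0], parts[1:]
        if option == "auto_learn":
            problems.append((number, "not a valid option; see bayes_auto_learn"))
        elif option in ("use_razor2", "use_pyzor", "use_dcc"):
            if args not in (["0"], ["1"]):
                problems.append((number, option + " needs an argument of 0 or 1"))
    return problems


pasted = ["auto_learn 0", "use_razor2", "use_dcc", "use_pyzor"]
corrected = ["bayes_auto_learn 0", "use_razor2 1", "use_dcc 1", "use_pyzor 1"]

for number, message in check_lines(pasted):
    print("line", number, "-", message)
print("problems after correction:", len(check_lines(corrected)))
```

The corrected list only shows the syntax; whether each check should be 0 or
1 is a local decision.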


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: I'm doing it wrong.

2014-05-22 Thread Kai Meyer

On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:

On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:

I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin
(user prefs via mysql) server that I've been running for a few years

The configuration you pasted below does not show any user_* options.
Unless there are more cf files you omitted, you do not use user_prefs
via SQL.

now. It's just a few of my private domains, not a lot of traffic. In the
last 6 months, the amount of spam getting through has gone from one or
two a week to 30 a day. I had sa-learn setup on imap folders called SPAM
and HAM running as root, so I just started tossing emails in there. It

Training as root rather than the system user receiving the mail (and
calling SA) is only possible with site-wide Bayes setup. The pasted
configuration doesn't show that, either, so you would need to train as
the mail receiving / scanning user.

Ya, that was what I was worried about. Just to clarify, postfix runs as
the regular postfix user. I'm configured very similar to this:

http://www.akadia.com/services/postfix_spamassassin.html

Notice the spamchk script. My process list has this entry:

postfix  10477 12953  0 22:20 ?        00:00:00 pipe -n spamchk -t unix flags=Rq user=spamd argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}

My spamchk is functionally identical to the one in the link above. (I'm
using the sideline option, rather than just dumping the email, or
sending it to another mailbox). My spamd service runs as the user spamd:

root      6188     1  0 15:56 ?        00:00:08 /usr/bin/spamd -d -m10 -q -x -u spamd -r /var/run/spamd.pid
spamd     6190  6188  0 15:56 ?        00:01:27 spamd child

So when I run spamassassin manually, I'm using sudo to switch to that
user (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u
k...@gnukai.com > test.mail.right)

So if I turn sa-learn back on, I should make sure that I run it as the
spamd user.

seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold
to dump to my JUNK folder is 3, and I have spamchk sideline things above
7). I still get legitimate email in the 2-3 range, but I haven't had
legitimate email above 3 in a long time. After a bit, the 2s became 3s
and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did
this habitually for more than a month, and the progress seemed to stop.
I googled around a bit and realized that I didn't do a very good job
setting up rules, so I added pyzor and razor2, and they seem functional.
Spam got better, and it's down to maybe 10 a day, but they still range
all the way up to 5.

Mixing in Razor or Pyzor sure can help. But that setting up rules you
just considered your job is a bit weird. Local rules of course also can
help, but are  (a) an advanced topic, and  (b) not the task of a regular
SA instance. You didn't mention any of that in your configuration
either, so it's unclear what you're about here.


I think by "setting up rules" I meant adding configurations for pyzor
and razor2 and the like. Are they called plugins?



What really gets me is that if I take an email that scores -2, strip
the X-Spam* headers, and run it through spamc by hand (even as the spamd
user) just like the spamchk script does, it scores around a 4. I have


It is not necessary to strip X-Spam headers. SA ignores these, if
present.

You just mixed in a third user, spamd -- in addition to root and the
real mail receiving user. Without site-wide Bayes you are comparing
apples to oranges, and now peaches. All yummy, though not the same.

What is that spamchk script you just mentioned, and how does it fit
into your setup? You should review your entire mail-processing chain.
Describing it in detail might help here, too.

The link above describes my process pretty closely. I deviate by
having a sql.cf:

# cat /etc/mail/spamassassin/sql.cf
user_scores_dsn  DBI:mysql:spamassassin:localhost:3306
user_scores_sql_password spampass
user_scores_sql_username spamd
user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC


Here's some of the db:
mysql> select * from userpref where username='$GLOBAL';
++--++---+--+-+--+-+
| id | username | preference | value | descript | added 
 | added_by | modified|

++--++---+--+-+--+-+
|  1 | $GLOBAL  | required_score | 4.5   | NULL | 2003-01-01 
00:00:00 |  | 2010-08-23 10:23:26 |
| 28 | $GLOBAL  | auto_learn | 0 | NULL | 2014-05-22 
16:20:01 |  | 2014-05-22 16:20:01 |
| 29 | $GLOBAL  | use_razor2 | 1 | NULL | 2014-05-22 
16:20:52 |  | 2014-05-22 16:20:52