spamassassin (cmd line) connection to Redis
Hi all. As stated in the subject, I am trying to test my SpamAssassin 3.4.0 installation (I am using the Debian Jessie package) with the usual method described here: http://wiki.apache.org/spamassassin/TestingInstallation In the output of the command: spamassassin -D gTube_spam.txt I got the following error: (...) May 22 12:31:39.240 [8390] warn: plugin: eval failed: bayes: Redis failed: Redis error: ERR operation not permitted at /usr/share/perl5/Mail/SpamAssassin/BayesStore/Redis.pm line 233, <GEN2> line 1. at /usr/share/perl5/Mail/SpamAssassin/BayesStore/Redis.pm line 265. (...) In the end the test worked perfectly, because SA correctly classified the GTUBE spam sample, but I am worried about that Redis error. The SA local.cf contains the following line: bayes_sql_dsn server=10.1.1.19:6379;password=mypass;database=2 (...) which, I thought, should be enough for SA. Note that when I use redis-cli from the command line with the same parameters, I do not have any connection/authorization problem. Looking at line 233 mentioned in the error message, I found that the error is raised inside the sub on_connect, but it does not look like a Redis authentication error. Any clues about what I am doing wrong? Thanks in advance! Best regards, Matteo
Re: spamassassin (cmd line) connection to Redis
On 05/22/2014 12:56 PM, Matteo Dessalvi wrote: Hi all. As stated in the subject, I am trying to test my SpamAssassin 3.4.0 installation (I am using the Debian Jessie package) with the usual method described here: http://wiki.apache.org/spamassassin/TestingInstallation In the output of the command: spamassassin -D gTube_spam.txt I got the following error: (...) May 22 12:31:39.240 [8390] warn: plugin: eval failed: bayes: Redis failed: Redis error: ERR operation not permitted at /usr/share/perl5/Mail/SpamAssassin/BayesStore/Redis.pm line 233, <GEN2> line 1. at /usr/share/perl5/Mail/SpamAssassin/BayesStore/Redis.pm line 265. (...) In the end the test worked perfectly, because SA correctly classified the GTUBE spam sample, but I am worried about that Redis error. The SA local.cf contains the following line: bayes_sql_dsn server=10.1.1.19:6379;password=mypass;database=2 (...) which, I thought, should be enough for SA. Note that when I use redis-cli from the command line with the same parameters, I do not have any connection/authorization problem. Looking at line 233 mentioned in the error message, I found that the error is raised inside the sub on_connect, but it does not look like a Redis authentication error. Any clues about what I am doing wrong? Thanks in advance!
Have you included this in your local.cf? bayes_store_module Mail::SpamAssassin::BayesStore::Redis
Re: spamassassin (cmd line) connection to Redis
On 05/22/2014 12:56 PM, Matteo Dessalvi wrote: Hi all. As stated in the subject, I am trying to test my SpamAssassin 3.4.0 installation (I am using the Debian Jessie package) with the usual method described here: http://wiki.apache.org/spamassassin/TestingInstallation In the output of the command: spamassassin -D gTube_spam.txt I got the following error: (...) May 22 12:31:39.240 [8390] warn: plugin: eval failed: bayes: Redis failed: Redis error: ERR operation not permitted at /usr/share/perl5/Mail/SpamAssassin/BayesStore/Redis.pm line 233, <GEN2> line 1. at /usr/share/perl5/Mail/SpamAssassin/BayesStore/Redis.pm line 265. (...) In the end the test worked perfectly, because SA correctly classified the GTUBE spam sample, but I am worried about that Redis error. The SA local.cf contains the following line: bayes_sql_dsn server=10.1.1.19:6379;password=mypass;database=2 (...) which, I thought, should be enough for SA. Note that when I use redis-cli from the command line with the same parameters, I do not have any connection/authorization problem. Looking at line 233 mentioned in the error message, I found that the error is raised inside the sub on_connect, but it does not look like a Redis authentication error. Any clues about what I am doing wrong? Thanks in advance!
What happens if you don't use authentication?
Re: spamassassin (cmd line) connection to Redis
On 22.05.2014 13:10, Axb wrote: have you included this in your local.cf ? bayes_store_module Mail::SpamAssassin::BayesStore::Redis
These are the relevant configuration lines for the Redis SA module:
bayes_store_module Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn server=10.1.1.19:6379;password=mypass;database=2
bayes_token_ttl 21d
bayes_seen_ttl 8d
bayes_auto_expire 1
On 22.05.2014 13:12, Axb wrote: what happens if you don't use authentication?
It looks like the problem lies in the authentication. When I tried with an empty 'password=' (after disabling requirepass in redis.conf) I got the following messages (I have included empty lines for the sake of readability):
(...)
dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x3cc14c0), bayes_store_module=Mail::SpamAssassin::BayesStore::Redis
dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::Redis=HASH(0x42161c0)
dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3cc14c0) implements 'learner_is_scan_available', priority 0
dbg: bayes: _open_db(not yet connected)
dbg: bayes: Redis on-connect, db_id 2
dbg: bayes: CLIENT SETNAME command failed, don't worry, possibly an old redis version: ERR Syntax error, try CLIENT (LIST | KILL ip:port)
dbg: bayes: redis server version 2.4.14, memory used 6.8 MiB, Lua is not available
dbg: bayes: initialized empty database, version 3
dbg: bayes: nspam_nham_get nspam=0, nham=0
dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
(...)
Of course this is just the initial test, so I do not have enough Bayes data. The 'CLIENT SETNAME' error is probably due to my old Redis version, but other than that it looks fine. I will try again with authentication enabled and see if I stumble on the same problem as before. Best regards, Matteo
Re: spamassassin (cmd line) connection to Redis
On 05/22/2014 02:06 PM, Matteo Dessalvi wrote: dbg: bayes: redis server version 2.4.14, memory used 6.8 MiB, Lua is not available You're using an ancient Redis version with no LUA support. Redis 2.8.9 is the latest stable version. I'd suggest you update Redis before you go on chasing windmills.
Rule updates?
Hi, After checking the results of sa-update and doing some manual dns queries, it seems that last rule updates were done more than a month ago. This used to be an almost daily process, even when there were only score changes due to masschecks. Any specific reason for no new updates? Something we can assist with? Regards, Tom
Re: spamassassin (cmd line) connection to Redis
Yes, you are definitely right: with the latest stable Redis version (2.8.9 indeed) everything works smoothly with the authentication. Thanks for pointing me in the right direction! Best regards, Matteo On 22.05.2014 14:10, Axb wrote: You're using an ancient Redis version with no LUA support. Redis 2.8.9 is the latest stable version. I'd suggest you update Redis before you go on chasing windmills.
Re: Rule updates?
On 5/22/2014 9:04 AM, Tom Hendrikx wrote: After checking the results of sa-update and doing some manual dns queries, it seems that last rule updates were done more than a month ago. This used to be an almost daily process, even when there were only score changes due to masschecks. Any specific reason for no new updates? Something we can assist with? Hi Tom, The system running the update processing failed catastrophically and backups were insufficient. I've been rebuilding the box as time allows. Regards, KAM
Re: spamassassin (cmd line) connection to Redis
On 05/22/2014 03:27 PM, Matteo Dessalvi wrote: Yes, you are definitely right: with the latest stable Redis version (2.8.9 indeed) everything works smoothly with the authentication. Thanks for pointing me in the right direction! Best regards, Matteo On 22.05.2014 14:10, Axb wrote: You're using an ancient Redis version with no LUA support. Redis 2.8.9 is the latest stable version. I'd suggest you update Redis before you go on chasing windmills.
Good to hear you got it working. If your box is high traffic, watch the Redis memory usage. When it does the dump to file it duplicates memory usage, so if you expect Redis to use 2GB of memory, you'll need 4GB of free memory to do the dump. Swapping is not a happy option.
My Redis usage looks like this:
bayes_token_ttl 432000
bayes_seen_ttl 2d
0.000 0 22202312 0 non-token data: nspam
0.000 0 9593796 0 non-token data: nham
# Clients
connected_clients:203
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
# Memory
used_memory:4085255152
used_memory_human:3.80G
used_memory_rss:6439870464
used_memory_peak:6307356768
used_memory_peak_human:5.87G
used_memory_lua:126976
mem_fragmentation_ratio:1.58
mem_allocator:jemalloc-3.2.0
used_memory_peak is what it used to do the dump to file. Even with that amount of data, Bayes is extremely fast and goes totally unnoticed in overall msg processing time.
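The rule of thumb above (budget roughly twice the working set while Redis dumps to file) can be checked against INFO output with a few lines of Python. This is only a sketch: the function names `parse_redis_info` and `worst_case_dump_bytes` are made up for illustration, the factor of 2 is the poster's estimate rather than a Redis guarantee, and the sample text is the memory section quoted in the message.

```python
def parse_redis_info(text):
    """Collect key:value pairs from Redis INFO output.

    Section markers such as '# Memory' contain no colon and are skipped.
    """
    fields = {}
    for token in text.split():
        if ":" in token:
            key, _, value = token.partition(":")
            fields[key] = value
    return fields


def worst_case_dump_bytes(fields, factor=2.0):
    """Memory to budget during a dump, assuming usage can double (rule of thumb)."""
    return int(int(fields["used_memory"]) * factor)


def observed_peak_ratio(fields):
    """How far the recorded peak actually rose above current usage."""
    return int(fields["used_memory_peak"]) / int(fields["used_memory"])


SAMPLE_INFO = """
# Memory
used_memory:4085255152
used_memory_human:3.80G
used_memory_rss:6439870464
used_memory_peak:6307356768
used_memory_peak_human:5.87G
mem_fragmentation_ratio:1.58
"""

if __name__ == "__main__":
    info = parse_redis_info(SAMPLE_INFO)
    print("budget for dump:", worst_case_dump_bytes(info), "bytes")
    print("peak / used: %.2f" % observed_peak_ratio(info))
```

For the figures quoted here the recorded peak is about one and a half times the current usage, so doubling is a conservative budget rather than an exact prediction.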
Re: Rule updates?
On 05/22/2014 03:36 PM, Kevin A. McGrail wrote: On 5/22/2014 9:04 AM, Tom Hendrikx wrote: After checking the results of sa-update and doing some manual dns queries, it seems that last rule updates were done more than a month ago. This used to be an almost daily process, even when there were only score changes due to masschecks. Any specific reason for no new updates? Something we can assist with? Hi Tom, The system running the update processing failed catastrophically and backups were insufficient. Ah, bugger ; I've been rebuilding the box as time allows. Fair enough :) Thanks for the insight. Kind regards, Tom
Re: autolearn_force
On Wed, 21 May 2014 21:34:23 -0700 Ian Zimmerman wrote: I don't understand this setting, and reading the documentation doesn't help. It seems it should make bayes learn spam whenever the total score surpasses the value of bayes_auto_learn_threshold_spam, and not require 3 points from header and body each; that would make it a global setting similar in purpose to bayes_auto_learn_threshold_spam. But in fact this is a per-test setting, a subcategory of tflags. Do I have to specify it separately for every test? Why?
The point is to set it for a small number of rules that are sufficiently strong as to guarantee there will be no mislearning in combination with the autolearn as spam threshold. It's probably best to create a single metarule for this - something that eliminates the possibility of mistraining through a lot of overlapping rules. I do something similar to get more spam into my high-scoring folder. I assign a lot of the near-certain spam rules to different classes: BAYES, RBLs, URIBLs, relaycountry etc and then count the number of classes.
Re: autolearn_force
On Thu, 22 May 2014 15:54:42 +0100 RW rwmailli...@googlemail.com wrote: Ian I don't understand this setting, and reading the documentation Ian doesn't help. Ian It seems it should make Bayes learn spam whenever the total score Ian surpasses the value of bayes_auto_learn_threshold_spam, and not Ian require 3 points from header and body each; that would make it a Ian global setting similar in purpose to Ian bayes_auto_learn_threshold_spam. Ian But in fact this is a per-test setting, a subcategory of tflags. Ian Do I have to specify it separately for every test? Why? RW The point is to set it for a small number of rules that are RW sufficiently strong as to guarantee there will be no mislearning in RW combination with the autolearn as spam threshold. RW It's probably best to create a single metarule for this - something RW that eliminates the possibility of mistraining through a lot of RW overlapping rules. I do something similar to get more spam into my RW high-scoring folder. I assign a lot of the near-certain spam rules RW to different classes: BAYES, RBLs, URIBLs, relaycountry etc and then RW count the number of classes. The problem I am trying to solve is that nearly all of my spam is flagged due to body rules. The header rules seem to be close to useless with the latest campaigns - spammers seem to have learned enough to avoid sending obvious stinking pieces of turd. (The one exception is patterns in the Message-ID, but I am afraid that will be short lived too, and is insufficient by itself even now). Thus, even if I set bayes_auto_learn_threshold_spam low, very few of my spams are autolearned because of the 3/3 requirement. The damn 3/3 is my problem - how can I work around it? If I have to spend an hour a day manually training the classifier the spammers have won :-( By the way, how are meta rules counted for this purpose? The documentation says nothing about that. -- Please *no* private copies of mailing list or newsgroup messages.
Mystery SpamWare
Hi Team, All of a sudden I've started noticing a lot of spam coming in with some fairly unique headers like this: x-track-version: 4 x-track-source: notifire_XXX x-track-spooler-id: x-track-spooler-split-id: x-track-spooler-segment-id: x-render: render- Precedence: bulk x-track-contact-id: is some number which varies with user to some degree, XXX varies by spammer. Does anyone recognise where these headers come from? Thanks Jude.
Re: Mystery SpamWare
On 05/22/2014 07:23 PM, hospice admin wrote: Hi Team, All of a sudden I've started noticing a lot of spam coming in with some fairly unique headers like this: x-track-version: 4 x-track-source: notifire_XXX x-track-spooler-id: x-track-spooler-split-id: x-track-spooler-segment-id: x-render: render- Precedence: bulk x-track-contact-id: is some number which varies with user to some degree, XXX varies by spammer. Does anyone recognise where these headers come from? Thanks can you pastebin a sample?
Re: 20_sought_fraud.cf
On 5/20/2014 3:03 PM, psychobyte wrote: Hi, Has there been any progress on this? We are looking to integrate these rules but, won't bother if the project is abandoned. Thanks, There has been some progress, yes but it's taken a back seat a bit. It's not abandoned. Ping the list in 2 weeks. regards, KAM
Blank line rules
I am clearly missing something with these rules but I lack the experience to see what it is:
score RAW_BLANK_LINES_05 0.5
rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
describe RAW_BLANK_LINES_05 Raw body contains 5 or more consecutive empty lines
score RAW_BLANK_LINES_10 1.0
rawbody RAW_BLANK_LINES_10 /(\r?\n){10,24}/i
describe RAW_BLANK_LINES_10 Raw body contains 10 or more consecutive empty lines
score RAW_BLANK_LINES_15 1.5
rawbody RAW_BLANK_LINES_15 /(\r?\n){25}/
describe RAW_BLANK_LINES_15 Raw body contains 25 or more consecutive empty lines
I created a test file that consisted of nought but newlines (shown as $ characters using vim set list). I passed it to spamassassin from the command line with the above rules in /etc/mail/spamassassin/local.cf and nothing was reported. I used an actual message body from a spam message received and only the RAW_BLANK_LINES_05 test is tripped even though the body of that message has 18 consecutive blank lines, also consisting of nothing but \n characters. So what is it about the regexp I am using that I evidently do not understand?
-- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:byrn...@harte-lyne.ca Harte Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
Re: 20_sought_fraud.cf
Great! Will do and Thx. On 05/22/2014 12:13 PM, Kevin A. McGrail wrote: On 5/20/2014 3:03 PM, psychobyte wrote: Hi, Has there been any progress on this? We are looking to integrate these rules but, won't bother if the project is abandoned. Thanks, There has been some progress, yes but it's taken a back seat a bit. It's not abandoned. Ping the list in 2 weeks. regards, KAM
Re: Blank line rules
On Thu, 22 May 2014, James B. Byrne wrote: I am clearly missing something with these rules but I lack the experience to see what it is: score RAW_BLANK_LINES_05 0.5 rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i describe RAW_BLANK_LINES_05 Raw body contains 5 or more consecutive empty lines score RAW_BLANK_LINES_10 1.0 rawbody RAW_BLANK_LINES_10 /(\r?\n){10,24}/i describe RAW_BLANK_LINES_10 Raw body contains 10 or more consecutive empty lines score RAW_BLANK_LINES_15 1.5 rawbody RAW_BLANK_LINES_15 /(\r?\n){25}/ describe RAW_BLANK_LINES_15 Raw body contains 25 or more consecutive empty lines Regular expressions by default only consider a single line of text. You need to provide an option to say treat multiple lines as a single line. Try this: rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m The case-insensitive flag is not meaningful for these rules as there's no attempt to match text, and I added the ?: to make the groups non-capturing, which is a bit more efficient. -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Windows and its users got mentioned at home today, after my wife the psych major brought up Seligman's theory of learned helplessness. -- Dan Birchall in a.s.r --- 4 days until Memorial Day - honor those who sacrificed for our liberty
Re: Blank line rules
On Thu, 22 May 2014 13:47:04 -0700 (PDT) John Hardin jhar...@impsec.org wrote: John Regular expressions by default only consider a single line of John text. You need to provide an option to say treat multiple lines John as a single line. Try this: rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m James, see also the Bayes refinement thread where I posted about doing the exact same thing. Somehow John's multiline rules don't work for me, either. Kärsten was looking at it last I know. -- Please *no* private copies of mailing list or newsgroup messages.
Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)
On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: In either case, having a sample would speed up this ping-pong style debugging. And I am curious. ;) Mind putting your sample up a pastebin?
Ian sent me the original message off-list. It indeed contains about 16 consecutive newlines, but doesn't trigger the rawbody rules discussed. The issue is not related to rawbody being split up into chunks. A stripped down test-case is easy to generate:
echo -e "\n\n~\n\n\n\nend"
That's an empty mail header and a very short text body, consisting of consecutive newlines. The tilde and end string are merely there for anchoring and visualizing the match. The rule for debugging the issue is the same I posted before, just slightly modified to better visualize the match.
rawbody __BLANKS /.\n{2,}/
tflags __BLANKS multiple
Feeding the test-case to spamassassin -D, the debug output shows the match like the following:
dbg: rules: ran rawbody rule __BLANKS == got hit: ~
dbg: rules: [...]
...
dbg: rules: [...]
The number of continuation lines equals the number of newlines in the test-case. Well, up until 12, that is. :-/ Any number up to 11 of consecutive newlines can be matched with rawbody rules. However, 12 or more consecutive newlines will be squeezed and replaced by exactly two newlines. I've had a quick look at the code already, but did not yet find where the supposedly raw (sic) body gets altered.
-- char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Blank line rules
On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote: I am clearly missing something with these rules but I lack the experience to see what it is: score RAW_BLANK_LINES_05 0.5 rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) I passed it to spamassassin from the command line with the above rules in /etc/mail/spamassassin/local.cf and nothing was reported. I used an actual message body from a spam message received and only the RAW_BLANK_LINES_05 test is tripped even though the body of that message has 18 consecutive blank lines, also consisting of nothing but \n characters. So what is it about the regexp I am using that I evidently do not understand? See the post Consecutive Newlines in Rawbody Rules as of a few minutes ago, follow-up to the Bayes refinement thread. In a nutshell: 12 or more consecutive newlines cannot be matched with rawbody rules. They get replaced by 2 newlines. There's another issue with your approach of different rules matching up to n occurrences and more than n. The first will always match in addition, if the latter matches. If the desired behavior is mutually exclusive matching, you need meta rules actually encoding the math / logic.
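The overlap described above is easy to reproduce outside SpamAssassin. The following Python sketch applies the three patterns from the thread to a plain string and then derives a single, mutually exclusive tier the way a set of meta rules would (highest tier wins). The helper names `raw_hits` and `exclusive_tier` are hypothetical, and the sketch deliberately ignores the separate issue that SpamAssassin squeezes 12 or more newlines before rawbody rules see them.

```python
import re

# The three patterns from the thread, as non-capturing groups.
PATTERNS = [
    ("RAW_BLANK_LINES_05", re.compile(r"(?:\r?\n){5,9}")),
    ("RAW_BLANK_LINES_10", re.compile(r"(?:\r?\n){10,24}")),
    ("RAW_BLANK_LINES_15", re.compile(r"(?:\r?\n){25}")),
]


def raw_hits(body):
    """Names of every pattern that matches somewhere in the body (they overlap)."""
    return [name for name, rx in PATTERNS if rx.search(body)]


def exclusive_tier(body):
    """Only the highest matching tier, i.e. 'this one and none of the higher ones'."""
    hits = raw_hits(body)
    return hits[-1] if hits else None


if __name__ == "__main__":
    body = "text" + "\n" * 18 + "more text"
    print(raw_hits(body))        # the 5-9 pattern also matches inside the run of 18
    print(exclusive_tier(body))  # a single tier after applying the exclusion logic
```

Because the patterns are unanchored, a run of 18 newlines contains a run of 9, which is why the lower rule always fires together with the higher one.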
Re: Blank line rules
On Thu, 2014-05-22 at 13:47 -0700, John Hardin wrote: On Thu, 22 May 2014, James B. Byrne wrote: rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
John wrote: Regular expressions by default only consider a single line of text. You need to provide an option to say treat multiple lines as a single line.
Nope. You're thinking about ^ and $ by default only matching the beginning and end of the string. A \n newline is just an ordinary char. REs don't know the concept of lines, they operate on a string.
John wrote: Try this: rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m
The /m modifier changes ^ and $ to match anywhere in the string.
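A small experiment illustrates the point about the /m modifier. The sketch below uses Python's re module, whose re.MULTILINE flag has the same meaning as Perl's /m for these constructs; the function names are made up for the example. It shows that a run of newlines matches with or without the flag, and that the flag only changes where ^ (and likewise $) may match.

```python
import re

TEXT = "first\n\n\n\n\nlast"


def newline_run_span(flags=0):
    """Span of the first run of five newlines; the flag makes no difference."""
    match = re.search(r"\n{5}", TEXT, flags)
    return match.span() if match else None


def words_at_line_start(flags=0):
    """Words matched by an anchored pattern; here the flag does matter."""
    return re.findall(r"^\w+", TEXT, flags)


if __name__ == "__main__":
    print(newline_run_span(), newline_run_span(re.MULTILINE))
    print(words_at_line_start(), words_at_line_start(re.MULTILINE))
```

Without the flag, ^ matches only at the very start of the string; with it, ^ also matches right after each newline, so the anchored pattern finds the word on the last line as well.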
Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)
On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: [snip..] The number of continuation lines equals the number of newlines in the test-case. Well, up until 12, that is. :-/ Any number up to 11 of consecutive newlines can be matched with rawbody rules. However, 12 or more consecutive newlines will be squeezed and replaced by exactly two newlines. I've had a quick look at the code already, but did not yet find where the supposedly raw (sic) body gets altered.
Look at Message.pm, around line 300:
# if we've got a series of blank lines, get rid of them
if (defined $start) {
  my $num = $start-$cnt;
  if ($num > 10) {
    splice @message, $cnt+2, $num-1;
  }
-- Dave Funk University of Iowa dbfunk (at) engineering.uiowa.edu College of Engineering 319/335-5751 FAX: 319/384-0549 1256 Seamans Center Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527 #include <std_disclaimer.h> Better is not better, 'standard' is better. B{
Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)
On Thu, 22 May 2014, David B Funk wrote: On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: [snip..] The number of continuation lines equals the number of newlines in the test-case. Well, up until 12, that is. :-/ Any number up to 11 of consecutive newlines can be matched with rawbody rules. However, 12 or more consecutive newlines will be squeezed and replaced by exactly two newlines. I've had a quick look at the code already, but did not yet find where the supposedly raw (sic) body gets altered. Look at Message.pm, around line 300: # if we've got a series of blank lines, get rid of them if (defined $start) { my $num = $start-$cnt; if ($num > 10) { splice @message, $cnt+2, $num-1; }
After doing some experimenting with that code I came up with something that I'd argue is more semantically correct:
# if we've got a long series of blank lines, limit them
if (defined $start) {
  my $max_blank_lines = 20;
  my $num = $start-$cnt;
  if ($num > $max_blank_lines) {
    splice @message, $cnt+2, $num-$max_blank_lines;
  }
  undef $start;
}
I.e. limit a message to no more than $max_blank_lines in a row, instead of totally collapsing any run of more than 11 (adjust $max_blank_lines as you see fit, or make it a configurable parameter). After making that change, I found that rules like BLANK_LINES_60_70 (BODY: Message is at least 60% blank lines) started firing on a test message that I was using. So one could argue that by totally collapsing large blocks of blank lines, the creators of that code are torpedoing rules like BLANK_LINES_60_70.
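The difference between the two behaviours can be modelled in a few lines. The Python sketch below is not a translation of Message.pm; it is a simplified model of what the thread reports: today a run of more than 10 blank lines is reduced to a single blank line, while the proposed change would cap a run at a maximum length instead. All function names, and the treatment of a run at the very end of the message, are assumptions made for illustration.

```python
import re

BLANK = re.compile(r"^\s*$")


def _rewrite_runs(lines, limit, keep):
    """Shorten every run of more than `limit` blank lines to `keep` lines."""
    out, run = [], []

    def flush():
        out.extend(run[:keep] if len(run) > limit else run)
        run.clear()

    for line in lines:
        if BLANK.match(line):
            run.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return out


def collapse_blank_runs(lines, threshold=10):
    """Reported current behaviour: long runs shrink to one blank line."""
    return _rewrite_runs(lines, threshold, 1)


def cap_blank_runs(lines, max_blank_lines=20):
    """Proposed behaviour: long runs are only trimmed down to the maximum."""
    return _rewrite_runs(lines, max_blank_lines, max_blank_lines)


if __name__ == "__main__":
    message = ["~"] + [""] * 11 + ["end"]  # 12 consecutive newlines
    print(len(collapse_blank_runs(message)), len(cap_blank_runs(message)))
```

Under this model the test case with 12 consecutive newlines shrinks to two newlines with the first function, matching what was observed, and passes through unchanged with the second.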
Re: Consecutive Newlines in Rawbody Rules
On Thu, 2014-05-22 at 17:43 -0500, David B Funk wrote: On Thu, 22 May 2014, Karsten Bräckelmann wrote: Any number up to 11 of consecutive newlines can be matched with rawbody rules. However, 12 or more consecutive newlines will be squeezed and replaced by exactly two newlines. I've had a quick look at the code already, but did not yet find where the supposedly raw (sic) body gets altered. Look at Message.pm, around line 300: Thanks, good catch! # if we've got a series of blank lines, get rid of them if (defined $start) { my $num = $start-$cnt; if ($num > 10) { splice @message, $cnt+2, $num-1; } 10 empty lines, 11 consecutive newlines.
Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)
On Thu, 2014-05-22 at 18:34 -0500, David B Funk wrote: After doing some experimenting with that code I came up with something that I'd argue is more semantically correct: # if we've got a long series of blank lines, limit them if (defined $start) { my $max_blank_lines = 20; my $num = $start-$cnt; if ($num > $max_blank_lines) { splice @message, $cnt+2, $num-$max_blank_lines; } undef $start; } IE limit a message to no more than $max_blank_lines in a row, not the total collapse of more than 11. (adjust $max_blank_lines as you see fit or make it a configurable parameter). +1 Can you file a bug report or raise the topic in dev@ list? The code change is sufficiently simple, but I want that issue discussed first. Wonder what's the reason for that collapsing in the first place.
Re: Blank line rules
On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote: I am clearly missing something with these rules but I lack the experience to see what it is: score RAW_BLANK_LINES_05 0.5 rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) Heh. Something similar just plopped into my spam quarantine. You might want to do this: rawbody MANY_BLANK_LINES /(?:(?:<br>)?\r?\n){9}/mi -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- ...intellectuals have no interest in what _creates_ wealth, and what _inhibits_ the creation of wealth. They are very concerned about the _distribution_ of it, but they act as if wealth just exists somehow. It's like manna from heaven, it's only a question of how we split it up. -- Thomas Sowell --- 4 days until Memorial Day - honor those who sacrificed for our liberty
Re: Blank line rules
On May 22, 2014, at 6:44 PM, John Hardin jhar...@impsec.org wrote: You might want to do this: rawbody MANY_BLANK_LINES /(?:(?:<br>)?\r?\n){9}/mi AC_BR_BONANZA should cover the HTML case. It could be easily extended to match standard LF or CR per above. (In my case I am matching something like 20 newlines for the HTML case, to try to prevent FPs.) --- Amir thumbed via iPhone
Re: Blank line rules
On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote: Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) Who here has seen Pootietang and is laughing about this? Just me, likely...
Re: Mystery SpamWare
On Thu, 22 May 2014 18:23:48 +0100 hospice admin hospice...@outlook.com wrote: Hi Team, All of a sudden I've started noticing a lot of spam coming in with some fairly unique headers like this: x-track-version: 4 x-track-source: notifire_XXX x-track-spooler-id: x-track-spooler-split-id: x-track-spooler-segment-id: x-render: render- Precedence: bulk x-track-contact-id: is some number which varies with user to some degree, XXX varies by spammer. Does anyone recognise where these headers come from? Those headers seem to be tracking headers for commercial email marketing campaigns. Possibly from Notifire.co.uk, an email massmarketing firm, calling itself a white label. Quite uncertain w/o more data. But those headers are enough to make a filter from or to use in header checks to reject such trash. jd
Re: Blank line rules
On Thu, 2014-05-22 at 20:56 -0400, Kevin A. McGrail wrote: On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote: Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) Who here has seen Pootietang and is laughing about this? Just me, likely... The fact I just googled that word should sufficiently answer it as far as I am concerned. ;) Good thing it amused you, but that reference was certainly unintended.
OFF-TOPIC: The Brilliance of PootieTang was Re: Blank line rules
On 5/22/2014 9:17 PM, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 20:56 -0400, Kevin A. McGrail wrote: On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote: Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) Who here has seen Pootietang and is laughing about this? Just me, likely... The fact I just googled that word should sufficiently answer it as far as I am concerned. ;) Good thing it amused you, but that reference was certainly unintended. https://www.youtube.com/watch?v=RtCxvv8Y3Bs 2:54 is classic. This movie is one of the real hit or miss comedies. I think it's brilliant on a lot of levels. Others don't get it. regards, KAM
I'm doing it wrong.
I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin (user prefs via mysql) server that I've been running for a few years now. It's just a few of my private domains, not a lot of traffic. In the last 6 months, the amount of spam getting through has gone from one or two a week to 30 a day. I had sa-learn setup on imap folders called SPAM and HAM running as root, so I just started tossing emails in there. It seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold to dump to my JUNK folder is 3, and I have spamchk sideline things above 7). I still get legitimate email in the 2-3 range, but I haven't had legitimate email above 3 in a long time. After a bit, the 2s became 3s and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did this habitually for more than a month, and the progress seemed to stop. I googled around a bit and realized that I didn't do a very good job setting up rules, so I added pyzor and razor2, and they seem functional. Spam got better, and it's down to maybe 10 a day, but they still range all the way up to 5. What really gets me is that if I take an email that scores -2, strip the X-Spam* headers, and run it through spamc by hand (even as the spamd user) just like the spamchk script does, it scores around a 4. I have one here that scores a 4.1 if it comes through the mail, and a 6.6 if I run it manually. What can I do to reconcile these scores? I would like the scores I'm getting from the commandline over the ones I'm getting through postfix, but I don't know the system well enough to know what is causing the difference. == Via postfix X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com X-Spam-Flag: YES X-Spam-Level: X-Spam-Status: Yes, score=4.1 required=3.0 tests=BAYES_60,HTML_IMAGE_RATIO_08, HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS autolearn=no version=3.3.1 ... 
Content analysis details:   (4.1 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6298]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS

Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u myemail > postsa.mail)

X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: **
X-Spam-Status: Yes, score=6.6 required=3.0 tests=BAYES_60,HTML_MESSAGE,
 INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM autolearn=no
 version=3.3.1
...

Content analysis details:   (6.6 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: fellage.me]
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6299]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS

/etc/mail/spamassassin.cf (I added the last 4 lines in a desperate attempt to see something change, but to no effect)

/etc/mail/spamassassin/local.cf

# These values can be overridden by editing ~/.spamassassin/user_prefs.cf
# (see spamassassin(1) for details)
# These should be safe assumptions and allow for simple visual sifting
# without risking lost emails.
required_hits 5.0
report_safe 1
rewrite_header Subject [***SPAM***]
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
trusted_networks 69.160.84.222
razor_config /etc/mail/spamassassin/.razor/razor-agent.conf
pyzor_options --homedir /etc/mail/spamassassin
auto_learn 0
use_razor2
use_dcc
use_pyzor
Re: I'm doing it wrong.
On Thu, 22 May 2014, Kai Meyer wrote: I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin (user prefs via mysql) server that I've been running for a few years now. It's just a few of my private domains, not a lot of traffic. In the last 6 months, the amount of spam getting through has gone from one or two a week to 30 a day. I had sa-learn setup on imap folders called SPAM and HAM running as root, so I just started tossing emails in there. It seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold to dump to my JUNK folder is 3, and I have spamchk sideline things above 7). I still get legitimate email in the 2-3 range, but I haven't had legitimate email above 3 in a long time. After a bit, the 2s became 3s and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did this habitually for more than a month, and the progress seemed to stop. I googled around a bit and realized that I didn't do a very good job setting up rules, so I added pyzor and razor2, and they seem functional. Spam got better, and it's down to maybe 10 a day, but they still range all the way up to 5. What really gets me is that if I take an email that scores -2, strip the X-Spam* headers, and run it through spamc by hand (even as the spamd user) just like the spamchk script does, it scores around a 4. I have one here that scores a 4.1 if it comes through the mail, and a 6.6 if I run it manually. What can I do to reconcile these scores? I would like the scores I'm getting from the commandline over the ones I'm getting through postfix, but I don't know the system well enough to know what is causing the difference. == Via postfix X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com X-Spam-Flag: YES X-Spam-Level: X-Spam-Status: Yes, score=4.1 required=3.0 tests=BAYES_60,HTML_IMAGE_RATIO_08, HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS autolearn=no version=3.3.1 ... 
Content analysis details:   (4.1 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6298]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS

Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u myemail > postsa.mail)

X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: **
X-Spam-Status: Yes, score=6.6 required=3.0 tests=BAYES_60,HTML_MESSAGE,
 INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM autolearn=no
 version=3.3.1
...

Content analysis details:   (6.6 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: fellage.me]
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6299]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS

[snip..]

The only major difference between those two score sets is the addition of the URIBL_DBL_SPAM hit in the second one. This meant that by the time you got around to running that manual check, somebody had reported that URL to the URIBL list and it had been cataloged as a spammer URL. If you had run that manual check at the same time as the postfix run (or soon thereafter), it probably wouldn't have had that URIBL_DBL_SPAM hit and thus would have had the same score. In that regard, URIBLs are like anti-virus signatures; they don't do much good on a zero-day attack but catch repeat offenders.
Spammers know that and are registering tens of thousands (or more) of new domain names each day, using them for a few days and then discarding them. Good news if you're a registrar (lots of fresh business), bad news if you run a root DNS server (they're in the multi-million name size) or are in the anti-spam business. The one thing that might help is to use greylisting in your MTA; delaying mail from unknown senders may give a URL enough time to become listed in a URIBL and be recognized as spam. Tough, but that's the name of the game these days. -- Dave Funk
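The comparison described above can be checked mechanically. The following is a minimal sketch in Python, not part of SpamAssassin: the rule lists and per-rule points are copied from the two reports quoted in this thread, while the helper names (parse_tests, compare, total) are made up for illustration. The reports show rounded points, so the sums are only approximate reconstructions of the reported scores.

```python
# Rule hits copied from the two X-Spam-Status headers quoted in the thread.
VIA_POSTFIX = ("BAYES_60,HTML_IMAGE_RATIO_08,HTML_MESSAGE,INVALID_DATE,"
               "MIME_HTML_ONLY,RDNS_NONE,SPF_PASS")
VIA_COMMANDLINE = ("BAYES_60,HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,"
                   "RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM")

# Per-rule points as displayed (rounded) in the "Content analysis details".
POINTS = {
    "INVALID_DATE": 1.1,
    "SPF_PASS": -0.0,
    "HTML_IMAGE_RATIO_08": 0.0,
    "BAYES_60": 1.5,
    "HTML_MESSAGE": 0.0,
    "MIME_HTML_ONLY": 0.7,
    "RDNS_NONE": 0.8,
    "URIBL_DBL_SPAM": 2.5,
}


def parse_tests(value):
    """Split a comma-separated tests= value into a set of rule names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def compare(first, second):
    """Return (rules only in first, rules only in second), each sorted."""
    a, b = parse_tests(first), parse_tests(second)
    return sorted(a - b), sorted(b - a)


def total(tests):
    """Sum the displayed points for a set of rule names, to one decimal."""
    return round(sum(POINTS[name] for name in tests), 1)


only_postfix, only_cli = compare(VIA_POSTFIX, VIA_COMMANDLINE)
print("only via postfix:     ", only_postfix)
print("only via command line:", only_cli)
print("postfix total:        ", total(parse_tests(VIA_POSTFIX)))
print("command line total:   ", total(parse_tests(VIA_COMMANDLINE)))
```

With the data from this thread, the only rule unique to the postfix run is HTML_IMAGE_RATIO_08 (0.0 points) and the only rule unique to the manual run is URIBL_DBL_SPAM (2.5 points), which accounts for the whole gap between 4.1 and 6.6.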
Re: I'm doing it wrong.
On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote: I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin (user prefs via mysql) server that I've been running for a few years The configuration you pasted below does not show any user_* options. Unless there are more cf files you omitted, you do not use user_prefs via SQL. now. It's just a few of my private domains, not a lot of traffic. In the last 6 months, the amount of spam getting through has gone from one or two a week to 30 a day. I had sa-learn setup on imap folders called SPAM and HAM running as root, so I just started tossing emails in there. It Training as root rather than the system user receiving the mail (and calling SA) is only possible with site-wide Bayes setup. The pasted configuration doesn't show that, either, so you would need to train as the mail receiving / scanning user. seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold to dump to my JUNK folder is 3, and I have spamchk sideline things above 7). I still get legitimate email in the 2-3 range, but I haven't had legitimate email above 3 in a long time. After a bit, the 2s became 3s and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did this habitually for more than a month, and the progress seemed to stop. I googled around a bit and realized that I didn't do a very good job setting up rules, so I added pyzor and razor2, and they seem functional. Spam got better, and it's down to maybe 10 a day, but they still range all the way up to 5. Mixing in Razor or Pyzor sure can help. But that setting up rules you just considered your job is a bit weird. Local rules of course also can help, but are (a) an advanced topic, and (b) not the task of a regular SA instance. You didn't mention any of that in your configuration either, so it's unclear what you're about here. 
What really gets me is that if I take an email that scores -2, strip the X-Spam* headers, and run it through spamc by hand (even as the spamd user) just like the spamchk script does, it scores around a 4. I have

It is not necessary to strip X-Spam headers. SA ignores these, if present.

You just mixed in a third user, spamd -- in addition to root and the real mail receiving user. Without site-wide Bayes you are comparing apples to oranges, and now peaches. All yummy, though not the same.

What is that spamchk script you just mentioned, and how does it fit into your setup? You should review your entire mail-processing chain. Describing it in detail might help here, too.

one here that scores a 4.1 if it comes through the mail, and a 6.6 if I run it manually. What can I do to reconcile these scores? I would like the scores I'm getting from the commandline over the ones I'm getting through postfix, but I don't know the system well enough to know what is causing the difference.

Highlighting the differences, removing common rule hits:

== Via postfix
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area

Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u myemail > postsa.mail)
 2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist

The Bayesian probability is ~identical, differing by a mere 0.0001. Hitting URIBL_DBL_SPAM in the later manual check, but not at receiving time, may be due to timing and the URI actually getting listed later.

What's odd is that the subsequent manual check is *missing* the HTML image ratio rule triggering. Something altered the message.

/etc/mail/spamassassin.cf (I added the last 4 lines in a desperate attempt to see something change, but to no effect) /etc/mail/spamassassin/local.cf

Which one? The latter, spamassassin/local.cf, is the default (though packager dependent); the claimed (typo'ed?) one is custom, if it exists at all.
Snip, skipping to the last four lines:

auto_learn 0
use_razor2
use_dcc
use_pyzor

auto_learn is not a valid option. That would be bayes_auto_learn.

The other use_* options require arguments (0 or 1). The lines as pasted do not enable them, and instead produce lint warnings. See

  spamassassin --lint

That lint check is a good starting point anyway...

--
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
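For reference, the last four lines rewritten along the lines described above might look like the fragment below. This is only a sketch: the values are assumptions about the intent (auto-learning off, network checks on), not something stated in the thread. Each use_* option is defined by its plugin, so a line for a plugin that is not loaded will itself be reported by the lint check; DCC in particular is usually left disabled in the stock .pre files.

```
# local.cf fragment (sketch; values are assumed, adjust to taste)
bayes_auto_learn 0
use_razor2 1
use_dcc 1
use_pyzor 1
```

Running spamassassin --lint after the change should then report no problems for these lines, provided the corresponding plugins are loaded.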
Re: I'm doing it wrong.
On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:

On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:

I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin (user prefs via mysql) server that I've been running for a few years

The configuration you pasted below does not show any user_* options. Unless there are more cf files you omitted, you do not use user_prefs via SQL.

now. It's just a few of my private domains, not a lot of traffic. In the last 6 months, the amount of spam getting through has gone from one or two a week to 30 a day. I had sa-learn setup on imap folders called SPAM and HAM running as root, so I just started tossing emails in there. It

Training as root rather than the system user receiving the mail (and calling SA) is only possible with site-wide Bayes setup. The pasted configuration doesn't show that, either, so you would need to train as the mail receiving / scanning user.

Ya, that was what I was worried about. Just to clarify, postfix runs as the regular postfix user. My configuration is very similar to this: http://www.akadia.com/services/postfix_spamassassin.html

Notice the spamchk script. My process list has this entry:

postfix 10477 12953 0 22:20 ? 00:00:00 pipe -n spamchk -t unix flags=Rq user=spamd argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}

My spamchk is functionally identical to the one in the link above. (I'm using the sideline option, rather than just dumping the email, or sending it to another mailbox.)

My spamd service runs as the user spamd:

root 6188 1 0 15:56 ? 00:00:08 /usr/bin/spamd -d -m10 -q -x -u spamd -r /var/run/spamd.pid
spamd 6190 6188 0 15:56 ? 00:01:27 spamd child

So when I run spamassassin manually, I'm using sudo to switch to that user (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u k...@gnukai.com > test.mail.right)

So if I turn sa-learn back on, I should make sure that I run it as the spamd user.
seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold to dump to my JUNK folder is 3, and I have spamchk sideline things above 7). I still get legitimate email in the 2-3 range, but I haven't had legitimate email above 3 in a long time. After a bit, the 2s became 3s and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did this habitually for more than a month, and the progress seemed to stop. I googled around a bit and realized that I didn't do a very good job setting up rules, so I added pyzor and razor2, and they seem functional. Spam got better, and it's down to maybe 10 a day, but they still range all the way up to 5. Mixing in Razor or Pyzor sure can help. But that setting up rules you just considered your job is a bit weird. Local rules of course also can help, but are (a) an advanced topic, and (b) not the task of a regular SA instance. You didn't mention any of that in your configuration either, so it's unclear what you're about here. I think by setting up rules I meant adding configurations for pyzor and razor2 and the likes. Are they called plugins? What really gets me is that if I take an email that scores -2, strip the X-Spam* headers, and run it through spamc by hand (even as the spamd user) just like the spamchk script does, it scores around a 4. I have It is not necessary to strip X-Spam headers. SA ignores these, if present. You just mixed in a third user, spamd -- in addition to root and the real mail receiving user. Without site-wide Bayes you are comparing apples to oranges, and now peaches. All yummy, though not the same. What is that spamchk script you just mentioned, and how does it fit into your setup? You should review your entire mail-processing chain. Describing it in detail might help here, too. In the link above, it describes my process pretty closely. 
I deviate by having a sql.cf:

# cat /etc/mail/spamassassin/sql.cf
user_scores_dsn DBI:mysql:spamassassin:localhost:3306
user_scores_sql_password spampass
user_scores_sql_username spamd
user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC

Here's some of the db:

mysql> select * from userpref where username='$GLOBAL';
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
| id | username | preference     | value | descript | added               | added_by | modified            |
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
|  1 | $GLOBAL  | required_score | 4.5   | NULL     | 2003-01-01 00:00:00 |          | 2010-08-23 10:23:26 |
| 28 | $GLOBAL  | auto_learn     | 0     | NULL     | 2014-05-22 16:20:01 |          | 2014-05-22 16:20:01 |
| 29 | $GLOBAL  | use_razor2     | 1     | NULL     | 2014-05-22 16:20:52 |          | 2014-05-22 16:20:52