Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On 2/6/2013 7:41 PM, John Hardin wrote:
> On Wed, 6 Feb 2013, John Hardin wrote:
> > On Wed, 6 Feb 2013, Eliezer Croitoru wrote:
> > > body __HBRW_CHARS /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
> > > body __TOTAL_CHARS /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
>
> Eliezer: Apologies for not noticing this the first time through: lose the question marks in these rules.

Thanks, I had added and removed them while other things were going in the wrong direction.

--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Wed, 6 Feb 2013, David B Funk wrote:
> It's also easier to do an edit s/T_/__/g when you've got things working to your satisfaction to move from testing to production.

s/ T_/ __/ please! :)

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
If Microsoft made hammers, everyone would whine about how poorly screws were designed and about how they are hard to hammer in, and wonder why it takes so long to paint a wall using the hammer.
---
6 days until Abraham Lincoln's and Charles Darwin's 204th Birthdays
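The difference between the two substitutions can be illustrated with a small Ruby sketch. The rule name T_HSCNT_HI is made up for this example, and the word-boundary variant is an extra option that nobody in the thread proposed, so treat it as an assumption rather than a recommendation from the posters.

```ruby
# Hypothetical rule lines; T_HSCNT_HI is an invented name that contains "T_" twice.
lines = [
  'body T_HSCNT_HI /[\xC0-\xCB]/',
  'meta HBRW_SPAM ( (T_HSHCH * 100) / (T_HSTCH + 1) )',
]

# s/T_/__/g : also rewrites the "T_" in the middle of a rule name.
unanchored = lines.map { |l| l.gsub(/T_/, "__") }

# s/ T_/ __/g : safe for names, but misses "T_" that follows "(" instead of a space.
space_only = lines.map { |l| l.gsub(/ T_/, " __") }

# Word-boundary form: only rewrites "T_" at the start of a name.
boundary = lines.map { |l| l.gsub(/\bT_/, "__") }

puts unanchored, space_only, boundary
```

Whichever form is used, re-running the lint check after the rename is still worthwhile, since a half-renamed meta expression is easy to miss by eye.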
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Wed, 6 Feb 2013, Martin Gregorie wrote:
> On Wed, 2013-02-06 at 17:45 +0200, Eliezer Croitoru wrote:
> > Sorry but I didn't had much time to understand all of the rules syntax.
>
> When developing a meta rule that combines subrules there's little point in writing descriptions for the subrules. In addition I find it's helpful to do the initial development without the leading underscores because this way you can see these rules firing. After the combination is working as I want it to I put the underscores in. So, I'd start your main rule like this:
>
> describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
> header     HSFROM    From =~ /spamadmin\@ngtech.co.il/i
> mimeheader HSENC     Content-type =~ /charset=.{0,3}windows-1255/i
> body       HSHCH     /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
> tflags     HSHCH     multiple
> body       HSTCH     /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
> tflags     HSTCH     multiple
> meta       HSPCT     ( (HSHCH * 100) / (HSTCH + 1 ) )
> meta       HBRW_SPAM (HSPCT < 1) && HSFROM && HSENC
> score      HBRW_SPAM 10.3
>
> Then this gets tested on a set of messages that exercise every subrule as well as checking that the metas work correctly. In this case I'd manually create simpler message bodies that exercise every test case (I think you'd need at least 10 test messages to fully test HBRW_SPAM and all its subrules). With this technique you do need to use the lint check but don't need debugging, because the list of rules that fired tells you whether a rule fired or didn't *and* will show the number of times a 'multiple' rule fired.
>
> After all is working correctly I put the underscores back:
>
> #
> # HBRW_SPAM detects messages from spamad...@ngtech.co.il with a message body or
> # part using the Windows-1255 (Hebrew) charset and that contains mostly
> # non-Hebrew text.
> #
> describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
> header     __HSFROM  From =~ /spamadmin\@ngtech.co.il/i
> mimeheader __HSENC   Content-type =~ /charset=.{0,3}windows-1255/i
> body       __HSHCH   /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
> tflags     __HSHCH   multiple
> body       __HSTCH   /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
> tflags     __HSTCH   multiple
> meta       __HSPCT   ( (__HSHCH * 100) / (__HSTCH + 1 ) )
> meta       HBRW_SPAM (__HSPCT < 1) && __HSFROM && __HSENC
> score      HBRW_SPAM 10.3
>
> After that I re-lint and try all test cases again. In this case I'd do the underscore additions in two stages: first add them to HSHCH and HSTCH so I can see that HSPCT still works and, if so, put the rest back and re-test.

[snip..]

One caveat: an "indirect rule" (one that starts with '__') receives no intrinsic score. A regular rule receives a default score of 1.0. So all your "HS*" rules in the above example will have a score of 1.0 and contribute to the final score, whereas once they're changed to "__HS*" they will be scoreless and will not show up in the final score. This may make development more difficult.

An alternate way to handle this is to use "testing rules" (rules that start with 'T_'). These rules are given a default score of 0.01 and thus show up in the rule report but do not materially contribute to the final score. So for your example use:

describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
header     T_HSFROM  From =~ /spamadmin\@ngtech.co.il/i
mimeheader T_HSENC   Content-type =~ /charset=.{0,3}windows-1255/i
body       T_HSHCH   /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
tflags     T_HSHCH   multiple
body       T_HSTCH   /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
tflags     T_HSTCH   multiple
meta       T_HSPCT   ( (T_HSHCH * 100) / (T_HSTCH + 1 ) )
meta       HBRW_SPAM (T_HSPCT < 1) && T_HSFROM && T_HSENC
score      HBRW_SPAM 10.3

It's also easier to do an edit s/T_/__/g when you've got things working to your satisfaction to move from testing to production.

--
Dave Funk
University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549
1256 Seamans Center, Iowa City, IA 52242-1527
Sys_admin/Postmaster/cell_admin
#include
Better is not better, 'standard' is better. B{
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Wed, 6 Feb 2013, John Hardin wrote:
> On Wed, 6 Feb 2013, Eliezer Croitoru wrote:
> > body __HBRW_CHARS /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
> > body __TOTAL_CHARS /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/

Eliezer: Apologies for not noticing this the first time through: lose the question marks in these rules.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Wed, 6 Feb 2013, Martin Gregorie wrote:
> body HSHCH /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
> body HSTCH /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/

Why the question marks? They make the character optional, which in this case makes the *entire RE* optional, so it can match the empty string. That is a bad idea, especially in a multiple-match rule.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
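The effect of the trailing question mark can be seen outside SpamAssassin with a short Ruby sketch. This only illustrates the regex semantics: the sample string is invented, and how SpamAssassin itself counts empty matches for a 'multiple' rule is not something this example shows.

```ruby
# Invented sample: ASCII text around three bytes from the \xDF-\xFB range.
body = "abc \xE0\xE1\xE2 def".b

with_qmark    = body.scan(/[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/n)
without_qmark = body.scan(/[\xC0-\xCB\xCD-\xDB\xDF-\xFB]/n)

# Without "?", only the three high bytes match.
puts without_qmark.size

# With "?", the pattern also "matches" the empty string wherever no such byte is present.
puts with_qmark.size
puts with_qmark.count("")

# A body with no such bytes at all still matches when the "?" is present.
ascii_only = "plain ascii text".b
puts (ascii_only =~ /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/n).inspect
puts (ascii_only =~ /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]/n).inspect
```

So with the question mark in place a per-match count stops being a count of the characters of interest, which is why dropping it matters most for the counting rules.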
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Wed, 2013-02-06 at 17:45 +0200, Eliezer Croitoru wrote:
> Sorry but I didn't had much time to understand all of the rules syntax.
>
When developing a meta rule that combines subrules there's little point in writing descriptions for the subrules. In addition I find it's helpful to do the initial development without the leading underscores because this way you can see these rules firing. After the combination is working as I want it to I put the underscores in. So, I'd start your main rule like this:

describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
header     HSFROM    From =~ /spamadmin\@ngtech.co.il/i
mimeheader HSENC     Content-type =~ /charset=.{0,3}windows-1255/i
body       HSHCH     /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
tflags     HSHCH     multiple
body       HSTCH     /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
tflags     HSTCH     multiple
meta       HSPCT     ( (HSHCH * 100) / (HSTCH + 1 ) )
meta       HBRW_SPAM (HSPCT < 1) && HSFROM && HSENC
score      HBRW_SPAM 10.3

Then this gets tested on a set of messages that exercise every subrule as well as checking that the metas work correctly. In this case I'd manually create simpler message bodies that exercise every test case (I think you'd need at least 10 test messages to fully test HBRW_SPAM and all its subrules). With this technique you do need to use the lint check but don't need debugging, because the list of rules that fired tells you whether a rule fired or didn't *and* will show the number of times a 'multiple' rule fired.

After all is working correctly I put the underscores back:

#
# HBRW_SPAM detects messages from spamad...@ngtech.co.il with a message body or
# part using the Windows-1255 (Hebrew) charset and that contains mostly
# non-Hebrew text.
#
describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
header     __HSFROM  From =~ /spamadmin\@ngtech.co.il/i
mimeheader __HSENC   Content-type =~ /charset=.{0,3}windows-1255/i
body       __HSHCH   /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
tflags     __HSHCH   multiple
body       __HSTCH   /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
tflags     __HSTCH   multiple
meta       __HSPCT   ( (__HSHCH * 100) / (__HSTCH + 1 ) )
meta       HBRW_SPAM (__HSPCT < 1) && __HSFROM && __HSENC
score      HBRW_SPAM 10.3

After that I re-lint and try all test cases again. In this case I'd do the underscore additions in two stages: first add them to HSHCH and HSTCH so I can see that HSPCT still works and, if so, put the rest back and re-test.

In a complex rule like this it's well worth preceding it with a set of comment lines to describe it (as above). I like to use shorter names for subrules (so the subrule name won't be longer than the meta rule name once the underscores have been put in) and to name them so their names emphasize that they are part of the meta-rule.

If you find out later that you want to use a subrule in more than one meta-rule it's easy enough to pull it out as a free-standing rule and give it a description, a meaningful name and a score of 0.01, e.g.

describe   HEBREW_CHARSET MIME part or message body uses charset Windows-1255
mimeheader HEBREW_CHARSET Content-type =~ /charset=.{0,3}windows-1255/i
score      HEBREW_CHARSET 0.01

and, of course, change the name of the subrule in the original metarule. Forgetting this last step won't be picked up by a lint check. The meta rule(s) that use the old name will merely think the subrule didn't fire.

HTH
Martin
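Since a stale or mistyped subrule name in a meta expression reportedly slips past the lint check, a small helper script can flag it before testing. The Ruby sketch below is hypothetical: the function name undefined_meta_refs and the keyword list are my own choices, it only looks at a single block of rule text, and it knows nothing about rules defined in other files or by plugins.

```ruby
# Rule-definition keywords handled by this sketch (not the full SpamAssassin set).
DEFINING = %w[header body rawbody uri full mimeheader meta].freeze

# Returns { meta_rule_name => [names used in its expression but never defined] }.
def undefined_meta_refs(config)
  defined = []
  metas = {}
  config.each_line do |line|
    keyword, name, rest = line.strip.split(/\s+/, 3)
    next if keyword.nil? || keyword.start_with?("#")
    next unless DEFINING.include?(keyword) && name
    defined << name
    metas[name] = rest.to_s if keyword == "meta"
  end
  metas.each_with_object({}) do |(name, expr), missing|
    # Identifiers start with a letter or underscore, so numbers like 100 are skipped.
    unknown = expr.scan(/[A-Za-z_][A-Za-z0-9_]*/).uniq - defined
    missing[name] = unknown unless unknown.empty?
  end
end

sample = <<~'RULES'
  header     FROM_FORM       From =~ /spamadmin\@ngtech.co.il/i
  mimeheader __HBRW_ENCODING Content-Type =~ /charset=.{0,3}windows-1255/i
  meta       HBRW_SPAM       FROM_FROM && __HBRW_ENCODING
RULES

result = undefined_meta_refs(sample)
p result
```

In the sample the meta rule refers to FROM_FROM while the header rule is named FROM_FORM, so the helper reports FROM_FROM as undefined for HBRW_SPAM.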
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
> Subrules (those beginning with __) are not scored. Those score lines have no effect, and should probably be removed to avoid confusion that they actually *do* have an effect.

This might be the reason. I will check later.

On 2/6/2013 5:40 PM, John Hardin wrote:
> Typo. s/b FROM_FORM. Perhaps that's why this version of the rule didn't work.

Sorry, that was a typo in the mail, not in the source.

> Suggestion: when doing rule development, always run a lint test. See the SpamAssassin man page for details.

I have used it, so the rules are valid.

> That would also tell you whether the math and the less-than comparison are syntactically valid.

Sorry, but I didn't have much time to understand all of the rule syntax.

--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Wed, 6 Feb 2013, Eliezer Croitoru wrote:
> Thanks,
> I have checked the suggested rules like this:
>
> header FROM_FORM From =~ /spamadmin\@ngtech.co.il/i
> score FROM_FORM -0.1
> body __HBRW_ENCODING /charset=\"windows-1255\"/

The fact that the charset= isn't a body part has already been mentioned.

> score __HBRW_ENCODING -0.1

Subrules (those beginning with __) are not scored. Those score lines have no effect, and should probably be removed to avoid confusion that they actually *do* have an effect.

> body __HBRW_CHARS /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
> score __HBRW_CHARS -0.1
> tflags __HBRW_CHARS multiple
> body __TOTAL_CHARS /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
> score __TOTAL_CHARS -0.1
> #body __TOTAL_CHARS /\S/
> tflags __TOTAL_CHARS multiple
> #since there is a possibility of dividing by zero I added the + 1, which is supposed to be harmless for emails of this size.
> meta __HBRW_PCT ( (__HBRW_CHARS * 100) / (__TOTAL_CHARS + 1 ) )
> score __HBRW_PCT -0.1
> #tried this to find out which part doesn't work.
> meta HBRW_SPAM FROM_FORM && __HBRW_ENCODING
> # disabled after the basic tests didn't work.
> # meta HBRW_SPAM (__HBRW_PCT < 1) && FROM_FROM && __HBRW_ENCODING
> score HBRW_SPAM 10.3

Typo. s/b FROM_FORM. Perhaps that's why this version of the rule didn't work.

Suggestion: when doing rule development, always run a lint test. See the SpamAssassin man page for details. That would also tell you whether the math and the less-than comparison are syntactically valid.

> The only part which is being logged by SA in headers is the FROM_FORM rule.

Right. The only way to see whether subrules hit is to run SA in debug mode with the "--debug area=rules" option.

> example:
> X-Spam-Status: No, score=2.322 tagged_above=2 required=6.2 tests=[FROM_FORM_IL=-0.1, FROM_ILLEGAL_CHARS=2.059, NORMAL_HTTP_TO_IP=0.001, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=no
>
> so I don't really understand what's wrong. I have tried a couple of ways to find the Hebrew encoding.
> I understood it's a part of the body and not a header so, maybe there is something I don't understand or know about it?

The typo is the most obvious problem. Second is looking for the encoding in the body, as it *is* a header, either in the main message headers or in a MIME body part header.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On 2/6/2013 11:04 AM, Wolfgang Zeikat wrote:
> In an older episode, on 2013-02-06 09:53, Eliezer Croitoru wrote:
> > body __HBRW_ENCODING /charset=\"windows-1255\"/
> > score __HBRW_ENCODING -0.1
>
> I use a rule
> mimeheader LOCAL_1251_CHARSET Content-Type =~ /charset=.{0,3}windows-1251/i
>
> IMHO, charset is a MIME header, not a part of the message body.
>
> Hope this helps,
> wolfgang

Helps a lot.
Thanks,

--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
In an older episode, on 2013-02-06 09:53, Eliezer Croitoru wrote:
> body __HBRW_ENCODING /charset=\"windows-1255\"/
> score __HBRW_ENCODING -0.1

I use a rule

mimeheader LOCAL_1251_CHARSET Content-Type =~ /charset=.{0,3}windows-1251/i

IMHO, charset is a MIME header, not a part of the message body.

Hope this helps,
wolfgang
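What the `.{0,3}` in that pattern buys can be checked with a short Ruby sketch. The header lines below are invented samples, and the test runs against plain strings rather than through SpamAssassin's mimeheader machinery. Note that the rule as posted looks for windows-1251, which is the Cyrillic code page; the Hebrew code page is Windows-1255, so the number needs changing for the Hebrew case.

```ruby
# The pattern from the mimeheader rule above, unchanged.
LOOSE = /charset=.{0,3}windows-1251/i
# The stricter form used in the original body rule, for comparison.
STRICT = /charset="windows-1255"/

# Invented Content-Type header samples.
headers = [
  'Content-Type: text/plain; charset="windows-1251"',
  'Content-Type: text/plain; charset=Windows-1251',
  'Content-Type: text/html; charset="windows-1255"',
]

loose_hits = headers.map { |h| LOOSE.match?(h) }
p loose_hits

# The strict pattern needs the exact quoting and case, so an unquoted value slips past it.
strict_unquoted = STRICT.match?('Content-Type: text/plain; charset=windows-1255')
p strict_unquoted
```

The loose form tolerates an optional quote and differences in case, which is the kind of variation real Content-Type headers show.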
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
Thanks,
I have checked the suggested rules like this:

header FROM_FORM From =~ /spamadmin\@ngtech.co.il/i
score FROM_FORM -0.1
body __HBRW_ENCODING /charset=\"windows-1255\"/
score __HBRW_ENCODING -0.1
body __HBRW_CHARS /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
score __HBRW_CHARS -0.1
tflags __HBRW_CHARS multiple
body __TOTAL_CHARS /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
score __TOTAL_CHARS -0.1
#body __TOTAL_CHARS /\S/
tflags __TOTAL_CHARS multiple
#since there is a possibility of dividing by zero I added the + 1, which is supposed to be harmless for emails of this size.
meta __HBRW_PCT ( (__HBRW_CHARS * 100) / (__TOTAL_CHARS + 1 ) )
score __HBRW_PCT -0.1
#tried this to find out which part doesn't work.
meta HBRW_SPAM FROM_FORM && __HBRW_ENCODING
#disabled after the basic tests didn't work.
#meta HBRW_SPAM (__HBRW_PCT < 1) && FROM_FROM && __HBRW_ENCODING
score HBRW_SPAM 10.3

The only part which is being logged by SA in headers is the FROM_FORM rule.
example:
X-Spam-Status: No, score=2.322 tagged_above=2 required=6.2 tests=[FROM_FORM_IL=-0.1, FROM_ILLEGAL_CHARS=2.059, NORMAL_HTTP_TO_IP=0.001, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=no

so I don't really understand what's wrong. I have tried a couple of ways to find the Hebrew encoding. I understood it's a part of the body and not a header so, maybe there is something I don't understand or know about it?

Thanks,

On 2/3/2013 7:53 PM, John Hardin wrote:
> Followup note: __TOTAL_CHARS includes punctuation, if you do try this you might want to do something like this instead:
>
> body __TOTAL_CHARS /[\x30-\x39\x41-\x5a\x61-\x7a\x80-\xff]/
>
> to exclude punctuation, whitespace and control characters.

--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Sun, 3 Feb 2013, Eliezer Croitoru wrote:
> On 2/3/2013 7:23 AM, John Hardin wrote:
> > body __HBRW_CHARS /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
> > tflags __HBRW_CHARS multiple
> > body __TOTAL_CHARS /\S/
> > tflags __TOTAL_CHARS multiple
> > meta __HBRW_PCT ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
> > meta HBRW_SPAM (__HBRW_PCT < 50) && __HBRW_ENCODING
> >
> > I don't know whether the division in __HBRW_PCT or the less-than comparison in HBRW_SPAM would work, that's totally off the top of my head and untested. I also leave the __HBRW_ENCODING rule as an exercise for the student. :)
>
> Thanks I had the __HBRW_ENCODING ready from before. I think I will use a meta that will check the mail, then the encoding, and then the percentage.
>
> Thanks Again,

Followup note: __TOTAL_CHARS includes punctuation. If you do try this you might want to do something like this instead:

body __TOTAL_CHARS /[\x30-\x39\x41-\x5a\x61-\x7a\x80-\xff]/

to exclude punctuation, whitespace and control characters.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
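The difference between the two __TOTAL_CHARS patterns is easy to check on a sample string. This Ruby sketch uses an invented byte string and plain regex counting; it is not SpamAssassin's body rendering, only an illustration of which bytes each character class picks up.

```ruby
# Invented sample: ASCII words with punctuation, digits, and two high bytes.
text = "Hello, world! 123 \xE0\xE1".b

# /\S/ counts every non-whitespace byte, punctuation included.
non_space = text.scan(/\S/n).size

# The narrower class counts only digits, ASCII letters and bytes 0x80-0xFF.
alnum_high = text.scan(/[\x30-\x39\x41-\x5a\x61-\x7a\x80-\xff]/n).size

puts non_space   # 17: the comma and the exclamation mark are included
puts alnum_high  # 15: punctuation is left out
```

For a percentage calculation the narrower class keeps punctuation-heavy bodies from diluting the ratio.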
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On 2/3/2013 7:23 AM, John Hardin wrote:
> On Sat, 2 Feb 2013, Eliezer Croitoru wrote:
> > I wrote something in ruby which actually works fine as a starter.
> >
> > #code start
> > spam_content = "the long part from the mail".force_encoding("Windows-1255")
> > template_hebrew_chars = 270
> >
> > # True when the single-byte character falls in one of the three byte ranges.
> > def hebrew_char(char)
> >   code = char.unpack("H*")[0].hex
> >   (192..203).member?(code) || (205..219).member?(code) || (223..251).member?(code)
> > end
> >
> > counter = spam_content.each_char.count { |char| hebrew_char(char) }
> >
> > if counter == template_hebrew_chars
> >   puts "this is a spam"
> > else
> >   puts "might not be a spam"
> > end
> > ##code end
>
> Now *that* might be possible in plain SA rules without a plugin: count the number of characters in the message body, and the number of characters that fall in a given range (e.g. those that are hebrew glyphs), and calculate the percentage. I *think* you can do math in meta rules...
>
> However, a plugin would be _much_ more efficient than something like:
>
> body __HBRW_CHARS /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
> tflags __HBRW_CHARS multiple
> body __TOTAL_CHARS /\S/
> tflags __TOTAL_CHARS multiple
> meta __HBRW_PCT ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
> meta HBRW_SPAM (__HBRW_PCT < 50) && __HBRW_ENCODING
>
> I don't know whether the division in __HBRW_PCT or the less-than comparison in HBRW_SPAM would work, that's totally off the top of my head and untested. I also leave the __HBRW_ENCODING rule as an exercise for the student. :)

Thanks, I had the __HBRW_ENCODING ready from before. I think I will use a meta that will check the mail, then the encoding, and then the percentage.

Thanks Again,

--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Sat, 2 Feb 2013, Eliezer Croitoru wrote:
> I wrote something in ruby which actually works fine as a starter.
>
> #code start
> spam_content = "the long part from the mail".force_encoding("Windows-1255")
> template_hebrew_chars = 270
>
> # True when the single-byte character falls in one of the three byte ranges.
> def hebrew_char(char)
>   code = char.unpack("H*")[0].hex
>   (192..203).member?(code) || (205..219).member?(code) || (223..251).member?(code)
> end
>
> counter = spam_content.each_char.count { |char| hebrew_char(char) }
>
> if counter == template_hebrew_chars
>   puts "this is a spam"
> else
>   puts "might not be a spam"
> end
> ##code end

Now *that* might be possible in plain SA rules without a plugin: count the number of characters in the message body, and the number of characters that fall in a given range (e.g. those that are hebrew glyphs), and calculate the percentage. I *think* you can do math in meta rules...

However, a plugin would be _much_ more efficient than something like:

body __HBRW_CHARS /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
tflags __HBRW_CHARS multiple
body __TOTAL_CHARS /\S/
tflags __TOTAL_CHARS multiple
meta __HBRW_PCT ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
meta HBRW_SPAM (__HBRW_PCT < 50) && __HBRW_ENCODING

I don't know whether the division in __HBRW_PCT or the less-than comparison in HBRW_SPAM would work, that's totally off the top of my head and untested. I also leave the __HBRW_ENCODING rule as an exercise for the student. :)

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
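The counting idea behind those rules can be tried out in Ruby before committing to SA syntax. This is a sketch under assumptions: the function name hebrew_percent is made up, the byte ranges are the ones used elsewhere in this thread, the + 1 guard against division by zero and the integer division are choices of this example, and nothing here says how SpamAssassin evaluates meta arithmetic.

```ruby
# Byte classes taken from the rules discussed in this thread.
HEBREW_BYTES  = /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]/n
COUNTED_BYTES = /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]/n

# Percentage of counted bytes that fall in the Hebrew ranges.
# The + 1 keeps an empty body from dividing by zero.
def hebrew_percent(body)
  bytes  = body.b
  hebrew = bytes.scan(HEBREW_BYTES).size
  total  = bytes.scan(COUNTED_BYTES).size
  (hebrew * 100) / (total + 1)
end

mixed   = hebrew_percent("\xE0\xE1\xE2\xE3 ab")     # 4 Hebrew-range bytes out of 6 counted
english = hebrew_percent("buy cheap pills now")     # no Hebrew-range bytes
empty   = hebrew_percent("")                        # guarded by the + 1

puts mixed, english, empty
```

With four Hebrew-range bytes out of six counted bytes the result is 400 / 7, which integer division rounds down to 57; the guard costs a little accuracy on very short bodies.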
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On 2/2/2013 11:01 PM, John Hardin wrote:
> On Sat, 2 Feb 2013, Eliezer Croitoru wrote:
> > Yes I do understand that it's hard. I worked a bit with perl so I might be able to write something that will do that if it doesn't exist already.
>
> That's probably what it will take.
>
> > I will try to explain even more. The problem is that I get the mail with an example of the SPAM content which didn't come from EMAIL, just to categorize it as SPAM. This is not how and for what SA was built for but it gives very good results in general. This is a specific case.
>
> Ah, I think I see; by "this is a form" you meant your need is for scanning content submitted via a web form to see if it is spammy?

Yes..

> > I have an active system which someone wrote in C# that scans the chars etc but the problem is that it's in C# and it's an active check that crawls the site and parses it rather than a RESTful system that triggers the checks when needed. This is an example of the content: http://www.fpaste.org/yFOC/ It can even be some CMS post that someone got and wants to categorize as spam.
>
> So that sample message is largely hacked up just to provide headers so that it looks like an email and SA can scan it? That sure doesn't look like a valid email and there are a lot of obvious spam signs in the headers.

This msg was indeed recognized as spam. I have other msgs which have:

X-Spam-Status: No, score=3.146 tagged_above=2 required=6.2 tests=[FROM_ILLEGAL_CHARS=2.059, LOTS_OF_MONEY=0.001, RCVD_IN_XBL=0.724, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=no

And in this case I have one kind of filter that actually works: language filtering. I wrote something in ruby which actually works fine as a starter.

#code start
spam_content = "the long part from the mail".force_encoding("Windows-1255")
template_hebrew_chars = 270

# True when the single-byte character falls in one of the three byte ranges.
def hebrew_char(char)
  code = char.unpack("H*")[0].hex
  (192..203).member?(code) || (205..219).member?(code) || (223..251).member?(code)
end

counter = spam_content.each_char.count { |char| hebrew_char(char) }

if counter == template_hebrew_chars
  puts "this is a spam"
else
  puts "might not be a spam"
end
##code end

There are a couple of directions in the identification tree, like how many words exist, what to decide if there are mixed Hebrew and English words, identifying URLs, etc.

I have used:
http://msdn.microsoft.com/en-US/goglobal/cc305148
http://en.wikipedia.org/wiki/Windows-1255#Code_page_layout
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt

And maybe later I will try to write something in perl that can help with that. The mixing of two languages makes it a bit of a problem, and I had a nice algorithm in mind to decide on the percentage of Hebrew in this encoding. Another encoding such as UTF-8, or even more complex phonetic languages, makes it a bit more difficult, but since most simple mails consist of plain text it won't be such a big problem.

Thanks,

--
Eliezer Croitoru
http://www1.ngtech.co.il
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Sat, 2 Feb 2013, Eliezer Croitoru wrote:
> Yes I do understand that it's hard. I worked a bit with perl so I might be able to write something that will do that if it doesn't exist already.

That's probably what it will take.

> I will try to explain even more. The problem is that I get the mail with an example of the SPAM content which didn't come from EMAIL, just to categorize it as SPAM. This is not how and for what SA was built for but it gives very good results in general. This is a specific case.

Ah, I think I see; by "this is a form" you meant your need is for scanning content submitted via a web form to see if it is spammy?

> I have an active system which someone wrote in C# that scans the chars etc but the problem is that it's in C# and it's an active check that crawls the site and parses it rather than a RESTful system that triggers the checks when needed. This is an example of the content: http://www.fpaste.org/yFOC/ It can even be some CMS post that someone got and wants to categorize as spam.

So that sample message is largely hacked up just to provide headers so that it looks like an email and SA can scan it? That sure doesn't look like a valid email and there are a lot of obvious spam signs in the headers.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On 2/2/2013 8:58 PM, John Hardin wrote:
> That's the difficult part. It's easy to look for specific strings in the body, or specific things like the ratio of text to whitespace or text to images, but trying to *interpret* the text to do something like detect which language it is in is a *hard* problem. Even more so if you want to detect that the message body is in more than one language, and determine the ratios.
>
> The closest we can come today is to look at the character set of the message and try to guess from that whether the *entire* message is in a "foreign" language. This runs into problems where the character set of the message supports multiple languages, like UTF-8 or some of the character sets used by Windows.
>
> Do you have Bayes enabled? If so, are you training these messages as spam? If you are doing this, then they should eventually hit BAYES_99 and if there are any other spammy characteristics that would probably be enough to detect them.
>
> If you would upload a few of these spams to someplace like pastebin and point us at them then we will be able to do better than just guess and make general suggestions.

Yes, I do understand that it's hard. I worked a bit with perl so I might be able to write something that will do that if it doesn't exist already.

I will try to explain even more. The problem is that I get the mail with an example of the SPAM content which didn't come from EMAIL, just to categorize it as SPAM. This is not how and for what SA was built for but it gives very good results in general. This is a specific case.

I have an active system which someone wrote in C# that scans the chars etc, but the problem is that it's in C# and it's an active check that crawls the site and parses it rather than a RESTful system that triggers the checks when needed.

This is an example of the content: http://www.fpaste.org/yFOC/

It can even be some CMS post that someone got and wants to categorize as spam.

> --
> John Hardin KA7OHZ http://www.impsec.org/~jhardin/

Thanks,

--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Sat, 2013-02-02 at 20:23 +0200, Eliezer Croitoru wrote:
> On 2/2/2013 7:39 PM, Martin Gregorie wrote:
> > In that case something like this would work:
> >
> > describe EC_BANNED_ADDRESS Mail from a spamming address
> > header EC_BANNED_ADDRESS From =~ /sender\@spamming_address/
> > score EC_BANNED_ADDRESS 10.0
> >
> > There's no point in writing rules against the message body when the mail
> > is all from an address that you know.
> >
> > Martin
>
> Thanks Martin.
> I do have..
> The mail is fine.
> I just need to know about a pattern match in the content since it's a form.
> This address spam is pretty specific.
> This is why I wanted to use specific check for this kind of mail.
> The start and end has specific percentage of Hebrew language.
> Most of the mail should be in hebrew and if there is more than 50
> percent of the body in english it's 100% spam.
> less than that I can score it with basic rules.
>
> I was thinking of meta rule like here:
> http://spamassassin.1065346.n5.nabble.com/spamassassin-conditional-rules-td42578.html
>
Use a meta-rule to combine non-scoring rules:

describe EC_SPAMTRAP Mail from a spamming address
header __EC_BANNED_ADDRESS From =~ /sender\@spamming_address/
body __EC_MOSTLY_HEBREW rule to decide if the body is mostly Hebrew
meta EC_SPAMTRAP (__EC_BANNED_ADDRESS && __EC_MOSTLY_HEBREW )
score EC_SPAMTRAP 10

The meta rule will only fire if its subrules both fire. This sort of structure is the way to encode logical relationships, but there's no way to control whether a subrule is run.

However, I can't suggest how to code the 'mostly Hebrew' test. AFAIK there's no easy way to recognise languages or national character sets now that universal character encodings like UTF-8 and UTF-16 are common: formerly you could do it by seeing which Microsoft codepage was used for the body, though that was only useful if the sender was using a Windows PC.

You'd probably have to write a plugin to handle language recognition, especially as your weighting scheme requires every word in the body to be counted as Hebrew or non-Hebrew.

Martin
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Sat, 2 Feb 2013, Eliezer Croitoru wrote: I just need to know about a pattern match in the content since it's a form. There are existing rules to detect fill-in-the-form emails. Are any of the FILL_FORM family of rules hitting those messages? If the form text is in hebrew it likely won't; if you want to send me samples of those messages off-list, ideally as RFC-822 attachments, I'll be happy to see if I can add hebrew variants to the form rules. This address spam is pretty specific. If the spams are from the same address then you can blacklist that address as Martin suggested. This is why I wanted to use specific check for this kind of mail. The start and end has specific percentage of Hebrew language. Most of the mail should be in hebrew and if there is more then 50 percent of the body in english it's 100% spam. That's the difficult part. It's easy to look for specific strings in the body, or specific things like the ratio of text to whitespace or text to images, but trying to *interpret* the text to do something like detect which language it is in is a *hard* problem. Even more so if you want to detect that the message body is in more than one language, and determine the ratios. The closest we can come today is to look at the character set of the message and try to guess from that whether the *entire* message is in a "foreign" language. This runs into problems where the character set of the message supports multiple languages, like UTF-8 or some of the character sets used by Windows. Do you have Bayes enabled? If so, are you training these messages as spam? If you are doing this, then they should eventually hit BAYES_99 and if there are any other spammy characteristics that would probably be enough to detect them. If you would upload a few of these spams to someplace like pastebin and point us at them then we will be able to do better than just guess and make general suggestions. 
-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org
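The charset-based guess described above can be sketched with Python's standard `email` package. This is only an illustration of the idea, not how SpamAssassin does it internally; the helper names and the `AMBIGUOUS` set are assumptions made up for the example.

```python
from email import message_from_string


def declared_charsets(raw_message: str) -> list[str]:
    """Return the charset declared by each text/* part, in order."""
    msg = message_from_string(raw_message)
    charsets = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # get_content_charset() returns None when nothing is declared.
            charsets.append(part.get_content_charset() or "unknown")
    return charsets


# Hypothetical list: charsets that can carry many languages, so the
# declaration alone says little about the language of the text.
AMBIGUOUS = {"utf-8", "unknown"}


def charset_is_informative(charset: str) -> bool:
    """True when the declared charset narrows down the likely language."""
    return charset.lower() not in AMBIGUOUS


if __name__ == "__main__":
    raw = (
        "From: sender@example.com\n"
        'Content-Type: text/plain; charset="utf-8"\n'
        "\n"
        "hello\n"
    )
    for charset in declared_charsets(raw):
        print(charset, charset_is_informative(charset))
```

As the message above points out, this only gives a hint about the whole message; it cannot say what fraction of the body is in which language.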
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On 2/2/2013 7:39 PM, Martin Gregorie wrote:

> In that case something like this would work:
>
> describe EC_BANNED_ADDRESS Mail from a spamming address
> header   EC_BANNED_ADDRESS From =~ /sender\@spamming_address/
> score    EC_BANNED_ADDRESS 10.0
>
> There's no point in writing rules against the message body when the mail is all from an address that you know.
>
> Martin

Thanks Martin. I do have.. The mail is fine. I just need to know about a pattern match in the content since it's a form. This address spam is pretty specific. This is why I wanted to use specific check for this kind of mail. The start and end has specific percentage of Hebrew language. Most of the mail should be in Hebrew and if there is more than 50 percent of the body in English it's 100% spam. Less than that I can score it with basic rules.

I was thinking of a meta rule like here: http://spamassassin.1065346.n5.nabble.com/spamassassin-conditional-rules-td42578.html

-- Eliezer Croitoru
Re: IS there a simple way to add a rule of a body mail test? I have a pattern..
On Sat, 2013-02-02 at 19:26 +0200, Eliezer Croitoru wrote:

> I have a specific mail address from which I get messages a couple of times, with a basic pattern which I want to block.
>
> I started reading: http://wiki.apache.org/spamassassin/WritingRules
>
> And I would be very happy to get some notes and help about it.
>
> - The mail is from a specific mail address.

In that case something like this would work:

describe EC_BANNED_ADDRESS Mail from a spamming address
header   EC_BANNED_ADDRESS From =~ /sender\@spamming_address/
score    EC_BANNED_ADDRESS 10.0

There's no point in writing rules against the message body when the mail is all from an address that you know.

Martin
IS there a simple way to add a rule of a body mail test? I have a pattern..
I have a specific mail address from which I get messages a couple of times, with a basic pattern which I want to block.

I started reading: http://wiki.apache.org/spamassassin/WritingRules

And I would be very happy to get some notes and help about it.

- The mail is from a specific mail address.
- The mail body has a specific pattern, which is:

#quote start of mail body
one language paragraph
couple paragraphs in more than 50% English or 50% non my local language (the spam msg body)
other paragraph info
--
signature
#quote end of mail body

So basically I need to match the sender then "score" the body from "" to "\r\n_msg_end_marker" by language characters percentage. Any direction to do that will be very helpful.

Now the email scores are:

X-Spam-Flag: NO
X-Spam-Score: 3.146
X-Spam-Level: ***
X-Spam-Status: No, score=3.146 tagged_above=2 required=6.2 tests=[FROM_ILLEGAL_CHARS=2.059, LOTS_OF_MONEY=0.001, RCVD_IN_XBL=0.724, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=no

Thanks,
-- Eliezer Croitoru
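For readers unfamiliar with the X-Spam-Status header quoted above: the total is simply the sum of the per-rule scores in `tests=[...]`, and the message is flagged only when that sum reaches `required`. A small Python sketch (the helper names are made up for this example) that parses the header from this post and checks the arithmetic:

```python
import re

# The X-Spam-Status value exactly as posted above.
STATUS = (
    "No, score=3.146 tagged_above=2 required=6.2 "
    "tests=[FROM_ILLEGAL_CHARS=2.059, LOTS_OF_MONEY=0.001, RCVD_IN_XBL=0.724, "
    "RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=no"
)

NUMBER = r"-?\d+(?:\.\d+)?"


def parse_tests(status: str) -> dict[str, float]:
    """Extract the RULE=score pairs from the tests=[...] section."""
    match = re.search(r"tests=\[([^\]]*)\]", status)
    if not match:
        return {}
    pairs = re.findall(rf"([A-Z0-9_]+)=({NUMBER})", match.group(1))
    return {name: float(score) for name, score in pairs}


def required_score(status: str) -> float:
    """Read the spam threshold from the required= field."""
    return float(re.search(rf"required=({NUMBER})", status).group(1))


tests = parse_tests(STATUS)
total = round(sum(tests.values()), 3)
shortfall = round(required_score(STATUS) - total, 3)

if __name__ == "__main__":
    for name, score in sorted(tests.items(), key=lambda item: -item[1]):
        print(f"{name:20s} {score:7.3f}")
    print(f"total={total} shortfall={shortfall}")
```

The five rule scores add up to the reported 3.146, so a new rule (or a combination of rules) would need to contribute a little over 3 points for these messages to cross the 6.2 threshold.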
Re: A simple way to...
On Sat, 9 Oct 2004 15:41:37 -0600 (CST) Ryan Thompson <[EMAIL PROTECTED]> wrote:

> Robin Lynn Frank wrote to users@spamassassin.apache.org:
>
> > We use SA 3.0.0 with MySQL so we can extract certain AWL data and use it at the MTA level. However, since SA doesn't have an auto-blacklist feature,
>
> Hi Robin,
>
> Actually, "AutoWhiteList" (AWL) is a bit of a misnomer. AWL maintains average message scores for sender/class-B tuples, so, in effect, it is also an auto blacklist, because repeat spam senders will have high average scores in the AWL database.
>
> > I'd like to find a relatively simple way to extract IP addresses from emails that contain spam. If it is of any importance, we invoke SA via amavisd-new.
>
> See, for instance, the check_whitelist script in the tools/ directory of the distribution. I get output like this:
>
>     -4.5 (-35.6/8) -- [EMAIL PROTECTED]|ip=64.59
>      9.3 (27.9/3)  -- [EMAIL PROTECTED]|ip=65.39
>
> The first line is for a user that sends ham, so his/her score on future messages would be pushed closer to -4.5.
>
> The second line is for a user that sends spam, so, if they sent a more hammy message later, the AWL would likely *add* points to the message, while decreasing the average slightly.
>
> It works both ways. If you want to use this at the MTA level, I could envision you wanting to grab, say, every entry over a certain average score and potentially greylist based on that or something.
>
> Hope this helps,
> - Ryan

Yes it does. The only thing I see that is a problem is that the IPs appear to be /16s. /24s would be a broad enough brush to paint with. Back to the drawing board.

-- Robin Lynn Frank
Director of Operations
Paradigm-Omega, LLC
http://www.paradigm-omega.com
Sed quis custodiet ipsos custodes?
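To put numbers on the /16 versus /24 concern: the two-octet `ip=64.59` form in the quoted output corresponds to a /16, which groups 65,536 addresses under one entry, while a /24 groups 256. A short Python illustration using the standard `ipaddress` module; the sample address is from a documentation range and the function names are hypothetical.

```python
import ipaddress


def network_key(ip: str, prefix_len: int) -> str:
    """Return the network that contains `ip` at the given prefix length."""
    # strict=False lets us pass a host address and have the host bits masked.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)


def addresses_covered(prefix_len: int) -> int:
    """Number of IPv4 addresses that share one key at this prefix length."""
    return 2 ** (32 - prefix_len)


if __name__ == "__main__":
    sample_ip = "192.0.2.10"  # documentation address, not from the thread
    for prefix_len in (16, 24):
        print(network_key(sample_ip, prefix_len), addresses_covered(prefix_len))
```

Blocking or greylisting at the MTA on a /16 key would therefore affect far more unrelated hosts than a /24 key would.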
Re: A simple way to...
----- Original Message -----
From: "Ryan Thompson" <[EMAIL PROTECTED]>

> Robin Lynn Frank wrote to users@spamassassin.apache.org:
>
> > We use SA 3.0.0 with MySQL so we can extract certain AWL data and use it at the MTA level. However, since SA doesn't have an auto-blacklist feature,
>
> Hi Robin,
>
> Actually, "AutoWhiteList" (AWL) is a bit of a misnomer. AWL maintains average message scores for sender/class-B tuples, so, in effect, it is also an auto blacklist, because repeat spam senders will have high average scores in the AWL database.
>
> > I'd like to find a relatively simple way to extract IP addresses from emails that contain spam. If it is of any importance, we invoke SA via amavisd-new.
>
> See, for instance, the check_whitelist script in the tools/ directory of the distribution. I get output like this:
>
>     -4.5 (-35.6/8) -- [EMAIL PROTECTED]|ip=64.59
>      9.3 (27.9/3)  -- [EMAIL PROTECTED]|ip=65.39
>
> The first line is for a user that sends ham, so his/her score on future messages would be pushed closer to -4.5.
>
> The second line is for a user that sends spam, so, if they sent a more hammy message later, the AWL would likely *add* points to the message, while decreasing the average slightly.
>
> It works both ways. If you want to use this at the MTA level, I could envision you wanting to grab, say, every entry over a certain average score and potentially greylist based on that or something.

I'm wondering if the devs have considered changing the name associated with AWL from auto-whitelisting to something more descriptive of what AWL actually does, maybe something like auto-weight-leveling?

Bill
Re: A simple way to...
Robin Lynn Frank wrote to users@spamassassin.apache.org:

> We use SA 3.0.0 with MySQL so we can extract certain AWL data and use it at the MTA level. However, since SA doesn't have an auto-blacklist feature,

Hi Robin,

Actually, "AutoWhiteList" (AWL) is a bit of a misnomer. AWL maintains average message scores for sender/class-B tuples, so, in effect, it is also an auto blacklist, because repeat spam senders will have high average scores in the AWL database.

> I'd like to find a relatively simple way to extract IP addresses from emails that contain spam. If it is of any importance, we invoke SA via amavisd-new.

See, for instance, the check_whitelist script in the tools/ directory of the distribution. I get output like this:

    -4.5 (-35.6/8) -- [EMAIL PROTECTED]|ip=64.59
     9.3 (27.9/3)  -- [EMAIL PROTECTED]|ip=65.39

The first line is for a user that sends ham, so his/her score on future messages would be pushed closer to -4.5.

The second line is for a user that sends spam, so, if they sent a more hammy message later, the AWL would likely *add* points to the message, while decreasing the average slightly.

It works both ways. If you want to use this at the MTA level, I could envision you wanting to grab, say, every entry over a certain average score and potentially greylist based on that or something.

Hope this helps,
- Ryan

-- Ryan Thompson <[EMAIL PROTECTED]>
SaskNow Technologies - http://www.sasknow.com
901-1st Avenue North - Saskatoon, SK - S7K 1Y4
Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon
Toll-Free: 877-727-5669 (877-SASKNOW) North America
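The "grab every entry over a certain average score" idea can be sketched as a small filter over that output. This is a Python illustration under assumptions: the line layout is taken from the two sample lines above (mean, then total/count in parentheses, then the sender and `ip=` prefix), the addresses are placeholders because the archive redacted the real ones, and the names `AwlEntry`, `parse_line` and `over_threshold` are invented for the example.

```python
import re
from typing import Iterable, NamedTuple, Optional


class AwlEntry(NamedTuple):
    mean: float      # average score, i.e. total / count
    total: float     # accumulated score for this sender/prefix pair
    count: int       # number of messages seen
    address: str
    ip_prefix: str   # first two octets, as shown after "ip="


NUMBER = r"-?\d+(?:\.\d+)?"
LINE = re.compile(
    rf"^\s*({NUMBER})\s*\(\s*({NUMBER})/(\d+)\)\s+--\s+(\S+)\|ip=(\S+)\s*$"
)


def parse_line(line: str) -> Optional[AwlEntry]:
    """Parse one output line; return None if it does not match the layout."""
    match = LINE.match(line)
    if not match:
        return None
    mean, total, count, address, ip_prefix = match.groups()
    return AwlEntry(float(mean), float(total), int(count), address, ip_prefix)


def over_threshold(entries: Iterable[AwlEntry], threshold: float) -> list[AwlEntry]:
    """Keep the entries whose average score exceeds the threshold."""
    return [entry for entry in entries if entry.mean > threshold]


if __name__ == "__main__":
    sample = [
        "    -4.5 (-35.6/8) -- ham@example.com|ip=64.59",
        "     9.3 (27.9/3)  -- spam@example.net|ip=65.39",
    ]
    entries = [e for e in map(parse_line, sample) if e is not None]
    for entry in over_threshold(entries, threshold=5.0):
        print(entry.address, entry.ip_prefix, entry.mean)
```

The threshold of 5.0 is arbitrary here; a site would pick its own cut-off, and might also want a minimum `count` so that a single high-scoring message does not get a sender greylisted.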
A simple way to...
We use SA 3.0.0 with MySQL so we can extract certain AWL data and use it at the MTA level. However, since SA doesn't have an auto-blacklist feature, I'd like to find a relatively simple way to extract IP addresses from emails that contain spam. If it is of any importance, we invoke SA via amavisd-new.

-- Robin Lynn Frank
Director of Operations
Paradigm-Omega, LLC
http://www.paradigm-omega.com
Sed quis custodiet ipsos custodes?