Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-11 Thread Eliezer Croitoru

On 2/6/2013 7:41 PM, John Hardin wrote:

On Wed, 6 Feb 2013, John Hardin wrote:


On Wed, 6 Feb 2013, Eliezer Croitoru wrote:


 body   __HBRW_CHARS   /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
 body   __TOTAL_CHARS  /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/


Eliezer:

Apologies for not noticing this the first time through: lose the
question marks in these rules.


Thanks,
I had been adding and removing them while other parts of the rule were going wrong.


--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer  ngtech.co.il


Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread John Hardin

On Wed, 6 Feb 2013, David B Funk wrote:


It's also easier to do an edit s/T_/__/g when you've got things working
to your satisfaction to move from testing to production.


s/ T_/ __/  please! :)
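To see why the anchoring of that substitution matters, here is a small Ruby sketch. Ruby is used because it already appears in this thread, and Perl treats these patterns the same way. The config line is made up. DIGEST_MULTIPLE only stands in for any rule name that happens to contain "T_". The \b word-boundary variant is this sketch's own suggestion and was not proposed on the list.

```ruby
# Hypothetical config line; DIGEST_MULTIPLE is only an example of a rule
# name that contains "T_" without being a T_ testing rule.
line = "meta   HBRW_SPAM (T_HSPCT < 50) && T_HSFROM && !DIGEST_MULTIPLE"

# s/T_/__/g : renames every "T_", including the one inside DIGEST_MULTIPLE.
naive = line.gsub(/T_/, "__")

# s/ T_/ __/ applied to every match: misses "(T_HSPCT", which follows "(".
space_anchored = line.gsub(/ T_/, " __")

# \b only matches where "T_" starts a word, i.e. where a rule name begins.
word_boundary = line.gsub(/\bT_/, "__")

puts naive
puts space_anchored
puts word_boundary
```

In a meta expression, "(" and "!" can also come directly before a rule name. That is why a plain space anchor can miss some renames.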

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If Microsoft made hammers, everyone would whine about how poorly
  screws were designed and about how they are hard to hammer in, and
  wonder why it takes so long to paint a wall using the hammer.
---
 6 days until Abraham Lincoln's and Charles Darwin's 204th Birthdays


Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread David B Funk

On Wed, 6 Feb 2013, Martin Gregorie wrote:


On Wed, 2013-02-06 at 17:45 +0200, Eliezer Croitoru wrote:


Sorry, but I haven't had much time to learn all of the rule syntax.


When developing a meta rule that combines subrules there's little
point in writing descriptions for the subrules. In addition I find it
helpful to do the initial development without the leading underscores
because this way you can see these rules firing. After the combination
is working as I want it to I put the underscores in. So, I'd start your
main rule like this:

describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
header HSFROM From =~ /spamadmin\@ngtech.co.il/i
mimeheader HSENC Content-type =~ /charset=.{0,3}windows-1255/i
body   HSHCH /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
tflags HSHCH multiple
body   HSTCH /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
tflags HSTCH multiple
meta   HSPCT ( (HSHCH * 100) / (HSTCH + 1 ) )
meta   HBRW_SPAM (HSPCT < 50) && HSFROM && HSENC
score  HBRW_SPAM 10.3

Then I test this on a set of messages that exercise every subrule, as well as
checking that the metas work correctly. In this case I'd manually create simpler
message bodies that exercise every test case (I think you'd need at least 10 test
messages to fully test HBRW_SPAM and all its subrules). With this technique
you do need to use the lint check but don't need debugging, because the
list of rules that fire tells you whether a rule fired or didn't *and* will
show the number of times a 'multiple' rule fired.

After all is working correctly I put the underscores back:

#
# HBRW_SPAM detects messages from spamad...@ngtech.co.il with a message body or
# part using the Windows-1255 (Hebrew) charset and that contains mostly
# non-Hebrew text.
#
describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
header __HSFROM From =~ /spamadmin\@ngtech.co.il/i
mimeheader __HSENC Content-type =~ /charset=.{0,3}windows-1255/i
body   __HSHCH /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
tflags __HSHCH multiple
body   __HSTCH /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
tflags __HSTCH multiple
meta   __HSPCT ( (__HSHCH * 100) / (__HSTCH + 1 ) )
meta   HBRW_SPAM (__HSPCT < 50) && __HSFROM && __HSENC
score  HBRW_SPAM 10.3

After that I re-lint and try all test cases again. In this case I'd add
the underscores in two stages: first add them to HSHCH and
HSTCH so I can see that HSPCT still works and, if so, put the rest back
and re-test.

[snip..]

One caveat: an "indirect rule" (one that starts with '__') receives no
intrinsic score. A regular rule will receive a default score of 1.0.
So all your "HS*" rules in the above example will have a score of 1.0
and contribute to the final score, whereas once they're renamed to "__HS*"
they will be scoreless and not show up in the final score.
This may make development more difficult.
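The default scores David describes fit in a few lines of Ruby. This includes the 0.01 default for T_ rules that he mentions next. It is only an illustration of the behaviour as described in this thread, not SpamAssassin's code, and default_score is a made-up name.

```ruby
# Illustration only (not SpamAssassin's implementation): the score a rule
# ends up with when the config has no explicit "score" line for it.
def default_score(rule_name)
  # __ subrules are never scored and never listed in the report.
  return nil if rule_name.start_with?("__")

  # T_ testing rules get a token 0.01; everything else gets 1.0.
  rule_name.start_with?("T_") ? 0.01 : 1.0
end

%w[HSFROM T_HSFROM __HSFROM].each do |name|
  puts "#{name}: #{default_score(name).inspect}"
end
```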

An alternate way to handle this is to use "testing rules" (rules that
start with 'T_'). These rules are given a default score of 0.01 and thus
show up in the rule report but do not materially contribute to the final
score. So for your example use:

 describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
 header T_HSFROM From =~ /spamadmin\@ngtech.co.il/i
 mimeheader T_HSENC Content-type =~ /charset=.{0,3}windows-1255/i
 body   T_HSHCH /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
 tflags T_HSHCH multiple
 body   T_HSTCH /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
 tflags T_HSTCH multiple
 meta   T_HSPCT ( (T_HSHCH * 100) / (T_HSTCH + 1 ) )
 meta   HBRW_SPAM (T_HSPCT < 50) && T_HSFROM && T_HSENC
 score  HBRW_SPAM 10.3

It's also easier to do an edit s/T_/__/g when you've got things working
to your satisfaction to move from testing to production.


--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread John Hardin

On Wed, 6 Feb 2013, John Hardin wrote:


On Wed, 6 Feb 2013, Eliezer Croitoru wrote:


 body   __HBRW_CHARS   /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
 body   __TOTAL_CHARS  /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/


Eliezer:

Apologies for not noticing this the first time through: lose the question
marks in these rules.




Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread John Hardin

On Wed, 6 Feb 2013, Martin Gregorie wrote:


body   HSHCH /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
body   HSTCH /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/


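Why the question marks? They make the character optional, which in this
case makes the *entire RE* optional: it can match the empty string at
every position in the body, so it "hits" even on text that contains none
of the listed characters. That is a bad idea, especially in a
multiple-match rule.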
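A quick way to see the effect is to scan some text with the pattern as posted, and again with the "?" removed. This is a Ruby sketch, not SpamAssassin itself. It assumes SA's multiple-match counting behaves like a global regex scan. The sample text is made up.

```ruby
# Ruby stands in for Perl here; "?" means the same thing in both.
# The /n flag makes these byte-oriented patterns, so the text is binary too.
OPTIONAL = /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/n # as posted, with "?"
REQUIRED = /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]/n  # with the "?" removed

text = "plain English only".b # made-up sample with no Hebrew bytes at all

optional_hits = text.scan(OPTIONAL).size # one empty match per position
required_hits = text.scan(REQUIRED).size

puts "with ?: #{optional_hits}, without ?: #{required_hits}"
```

The optional pattern reports one hit per position, including the end of the string. The required pattern correctly reports none.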




Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread Martin Gregorie
On Wed, 2013-02-06 at 17:45 +0200, Eliezer Croitoru wrote:

> Sorry, but I haven't had much time to learn all of the rule syntax.
> 
When developing a meta rule that combines subrules there's little
point in writing descriptions for the subrules. In addition I find it
helpful to do the initial development without the leading underscores
because this way you can see these rules firing. After the combination
is working as I want it to I put the underscores in. So, I'd start your
main rule like this:

describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
header HSFROM From =~ /spamadmin\@ngtech.co.il/i
mimeheader HSENC Content-type =~ /charset=.{0,3}windows-1255/i
body   HSHCH /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
tflags HSHCH multiple
body   HSTCH /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
tflags HSTCH multiple
meta   HSPCT ( (HSHCH * 100) / (HSTCH + 1 ) )
meta   HBRW_SPAM (HSPCT < 50) && HSFROM && HSENC
score  HBRW_SPAM 10.3

Then I test this on a set of messages that exercise every subrule, as well as
checking that the metas work correctly. In this case I'd manually create simpler
message bodies that exercise every test case (I think you'd need at least 10 test
messages to fully test HBRW_SPAM and all its subrules). With this technique
you do need to use the lint check but don't need debugging, because the
list of rules that fire tells you whether a rule fired or didn't *and* will
show the number of times a 'multiple' rule fired.

After all is working correctly I put the underscores back:

#
# HBRW_SPAM detects messages from spamad...@ngtech.co.il with a message body or
# part using the Windows-1255 (Hebrew) charset and that contains mostly
# non-Hebrew text.
#
describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
header __HSFROM From =~ /spamadmin\@ngtech.co.il/i
mimeheader __HSENC Content-type =~ /charset=.{0,3}windows-1255/i
body   __HSHCH /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
tflags __HSHCH multiple
body   __HSTCH /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
tflags __HSTCH multiple
meta   __HSPCT ( (__HSHCH * 100) / (__HSTCH + 1 ) )
meta   HBRW_SPAM (__HSPCT < 50) && __HSFROM && __HSENC
score  HBRW_SPAM 10.3

After that I re-lint and try all test cases again. In this case I'd add
the underscores in two stages: first add them to HSHCH and
HSTCH so I can see that HSPCT still works and, if so, put the rest back
and re-test.

In a complex rule like this, it's well worth preceding it with a set of
comment lines to describe it (as above). I like to use shorter names for
subrules (so the subrule name length won't be longer than the meta rule
name when the underscores have been put in) and to name them so their
names emphasize that they are part of the meta-rule.

If you find out later that you want to use a subrule in more than one
meta-rule, it's easy enough to pull it out as a free-standing rule,
give it a description and a meaningful name, and score it as 0.01, e.g.

describe   HEBREW_CHARSET MIME part or message body uses charset windows-1255
mimeheader HEBREW_CHARSET Content-type =~ /charset=.{0,3}windows-1255/i
score  HEBREW_CHARSET 0.01

and, of course, change the name of the subrule in the original metarule.
Forgetting this last step won't be picked up by a lint check. The meta
rule(s) that use the old name will merely think the subrule didn't fire.
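Here is a small Ruby model of that failure mode. It assumes, as Martin describes, that a meta rule treats an unknown subrule name the same as one with zero hits. LOCAL_HEBREW_CHARSET is a hypothetical new name for the pulled-out __HSENC rule, and meta_fires? is a made-up helper, not SpamAssassin internals.

```ruby
# Toy model (not SpamAssassin internals): hit counts per rule name, where a
# name that no rule defines simply has no entry.
hits = { "__HSFROM" => 1, "LOCAL_HEBREW_CHARSET" => 1 } # hypothetical names

# A meta like "A && B" fires only if every named subrule has a nonzero count;
# unknown names default to 0, so a stale name just looks like "didn't fire".
def meta_fires?(hits, *subrules)
  subrules.all? { |name| hits.fetch(name, 0) > 0 }
end

puts meta_fires?(hits, "__HSFROM", "__HSENC")              # stale name: false
puts meta_fires?(hits, "__HSFROM", "LOCAL_HEBREW_CHARSET") # updated: true
```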
 
HTH


Martin




Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread Eliezer Croitoru


>Subrules (those beginning with __) are not scored. Those score lines
>have no effect, and should probably be removed to avoid confusion that
>they actually *do* have an effect.


This might be the reason.
I will check later.

On 2/6/2013 5:40 PM, John Hardin wrote:

Typo. s/b FROM_FORM. Perhaps that's why this version of the rule didn't
work.

Sorry, that was a typo in the mail, not in the source.


Suggestion: when doing rule development, always run a lint test. See the
SpamAssassin man page for details.

I have run it, so it's valid.


That would also tell you whether the math and the less-than comparison
are syntactically valid.


Sorry, but I haven't had much time to learn all of the rule syntax.



Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread John Hardin

On Wed, 6 Feb 2013, Eliezer Croitoru wrote:


Thanks,

I have checked the suggested rules like this:

header FROM_FORM  From =~ /spamadmin\@ngtech.co.il/i
score FROM_FORM -0.1

body __HBRW_ENCODING /charset=\"windows-1255\"/


The fact that charset= isn't part of the message body has already been mentioned.


score __HBRW_ENCODING -0.1


Subrules (those beginning with __) are not scored. Those score lines have 
no effect, and should probably be removed to avoid confusion that they 
actually *do* have an effect.



body   __HBRW_CHARS /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
score __HBRW_CHARS -0.1

tflags __HBRW_CHARS multiple
body   __TOTAL_CHARS  /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
score __TOTAL_CHARS -0.1
#body   __TOTAL_CHARS   /\S/

tflags __TOTAL_CHARS   multiple

# Since the total could be zero, I added the + 1 to avoid dividing by zero;
# it should be harmless for emails of this size.

meta   __HBRW_PCT  ( (__HBRW_CHARS * 100) / (__TOTAL_CHARS + 1 ) )
score __HBRW_PCT -0.1

# Tried this to check whether one part or the other doesn't work.
meta   HBRW_SPAM   FROM_FORM && __HBRW_ENCODING

# Disabled after the basic tests didn't work.
# meta   HBRW_SPAM   (__HBRW_PCT < 1) && FROM_FROM && __HBRW_ENCODING
score HBRW_SPAM 10.3


Typo. s/b FROM_FORM. Perhaps that's why this version of the rule didn't 
work.


Suggestion: when doing rule development, always run a lint test. See the 
SpamAssassin man page for details.


That would also tell you whether the math and the less-than comparison are 
syntactically valid.



The only part which is being logged by SA in headers is the FROM_FORM rule.


Right. The only way to see whether subrules hit is to run SA in debug mode
with the "--debug rules" (or "-D rules") option.



example:
X-Spam-Status: No, score=2.322 tagged_above=2 required=6.2
 tests=[FROM_FORM_IL=-0.1, FROM_ILLEGAL_CHARS=2.059,
 NORMAL_HTTP_TO_IP=0.001, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001]
 autolearn=no

So I don't really understand what's wrong.
I have tried a couple of ways to detect the Hebrew encoding.
I understood it to be part of the body and not a header, so maybe there is
something I don't understand or know about it?


The typo is the most obvious problem. Second is looking for the encoding 
in the body, as it *is* a header, either in the main message headers or in 
a MIME body part header.
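Here is a simplified Ruby model of the header/body distinction. It splits a made-up message at the first blank line and runs the charset pattern against each half. SpamAssassin decodes and renders the body before body rules see it, so this only approximates what a body rule sees.

```ruby
# Made-up message; a real MIME message may also carry charset= in the
# headers of each body part, which is what "mimeheader" rules look at.
raw = <<~MSG
  From: spamadmin@ngtech.co.il
  Content-Type: text/plain; charset="windows-1255"

  Sample body text.
MSG

# Simplification: headers end at the first blank line; the rest is the body.
headers, body = raw.split(/\r?\n\r?\n/, 2)

charset_re = /charset=.{0,3}windows-1255/i

puts "header rule sees it: #{headers.match?(charset_re)}" # true
puts "body rule sees it:   #{body.match?(charset_re)}"    # false
```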




Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread Eliezer Croitoru


On 2/6/2013 11:04 AM, Wolfgang Zeikat wrote:

In an older episode, on 2013-02-06 09:53, Eliezer Croitoru wrote:

body __HBRW_ENCODING /charset=\"windows-1255\"/
score __HBRW_ENCODING -0.1


I use a rule

mimeheader LOCAL_1251_CHARSET Content-Type =~ /charset=.{0,3}windows-1251/i

IMHO, charset is a MIME header, not a part of the message body.

Hope this helps,

wolfgang


Helps a lot.

Thanks,


Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread Wolfgang Zeikat

In an older episode, on 2013-02-06 09:53, Eliezer Croitoru wrote:

body __HBRW_ENCODING /charset=\"windows-1255\"/
score __HBRW_ENCODING -0.1


I use a rule

mimeheader LOCAL_1251_CHARSET Content-Type =~ /charset=.{0,3}windows-1251/i


IMHO, charset is a MIME header, not a part of the message body.
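This thread is about Hebrew. windows-1251 is a Cyrillic code page, and the Hebrew Windows code page is windows-1255. Below is a quick Ruby check of how the .{0,3} gap in this pattern handles different ways of writing the header value, with the code page swapped to windows-1255. The Content-Type values are made up.

```ruby
# Wolfgang's pattern with windows-1251 (Cyrillic) swapped for windows-1255
# (Hebrew). The .{0,3} gap lets an opening quote sit before the name.
CHARSET_1255 = /charset=.{0,3}windows-1255/i

samples = [
  'text/plain; charset=windows-1255',   # bare value
  'text/plain; charset="windows-1255"', # quoted value
  'text/html; CHARSET="Windows-1255"',  # different case, thanks to /i
  'text/plain; charset="windows-1251"'  # Cyrillic, should not match
]

samples.each { |ct| puts "#{ct} => #{ct.match?(CHARSET_1255)}" }
```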

Hope this helps,

wolfgang



Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-06 Thread Eliezer Croitoru

Thanks,

I have checked the suggested rules like this:

header FROM_FORM  From =~ /spamadmin\@ngtech.co.il/i
score FROM_FORM -0.1

body __HBRW_ENCODING /charset=\"windows-1255\"/
score __HBRW_ENCODING -0.1

body   __HBRW_CHARS /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]?/
score __HBRW_CHARS -0.1

tflags __HBRW_CHARS multiple
body   __TOTAL_CHARS  /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]?/
score __TOTAL_CHARS -0.1
#body   __TOTAL_CHARS   /\S/

tflags __TOTAL_CHARS   multiple

# Since the total could be zero, I added the + 1 to avoid dividing by zero;
# it should be harmless for emails of this size.

meta   __HBRW_PCT  ( (__HBRW_CHARS * 100) / (__TOTAL_CHARS + 1 ) )
score __HBRW_PCT -0.1

# Tried this to check whether one part or the other doesn't work.
meta   HBRW_SPAM   FROM_FORM && __HBRW_ENCODING

# Disabled after the basic tests didn't work.
# meta   HBRW_SPAM   (__HBRW_PCT < 1) && FROM_FROM && __HBRW_ENCODING
score HBRW_SPAM 10.3

The only part which is being logged by SA in headers is the FROM_FORM rule.
example:
X-Spam-Status: No, score=2.322 tagged_above=2 required=6.2
tests=[FROM_FORM_IL=-0.1, FROM_ILLEGAL_CHARS=2.059,
NORMAL_HTTP_TO_IP=0.001, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001]
autolearn=no

So I don't really understand what's wrong.
I have tried a couple of ways to detect the Hebrew encoding.
I understood it to be part of the body and not a header, so maybe there is
something I don't understand or know about it?


Thanks,

On 2/3/2013 7:53 PM, John Hardin wrote:

Followup note: __TOTAL_CHARS includes punctuation, if you do try this
you might want to do something like this instead:

   body   __TOTAL_CHARS  /[\x30-\x39\x41-\x5a\x61-\x7a\x80-\xff]/

to exclude punctuation, whitespace and control characters.




Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-03 Thread John Hardin

On Sun, 3 Feb 2013, Eliezer Croitoru wrote:


On 2/3/2013 7:23 AM, John Hardin wrote:


body   __HBRW_CHARS    /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
tflags __HBRW_CHARS    multiple
body   __TOTAL_CHARS   /\S/
tflags __TOTAL_CHARS   multiple
meta   __HBRW_PCT  ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
meta   HBRW_SPAM   (__HBRW_PCT < 50) && __HBRW_ENCODING

 I don't know whether the division in __HBRW_PCT or the less-than
 comparison in HBRW_SPAM would work, that's totally off the top of my
 head and untested. I also leave the __HBRW_ENCODING rule as an exercise
 for the student. :)


Thanks

I had the __HBRW_ENCODING ready from before.
I think I will use a meta that checks the sender address, then the encoding
and then the percentage.


Thanks Again,


Followup note: __TOTAL_CHARS includes punctuation, if you do try this you 
might want to do something like this instead:


  body   __TOTAL_CHARS  /[\x30-\x39\x41-\x5a\x61-\x7a\x80-\xff]/

to exclude punctuation, whitespace and control characters.
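A small Ruby comparison of the two __TOTAL_CHARS patterns on a made-up string shows how much punctuation inflates the total. Bytes 0x80-0xFF still count in the narrower class. Any Windows-1255 punctuation in that range, such as curly quotes, would therefore still be counted.

```ruby
# Made-up sample text: letters, punctuation and a space.
text = "Hi!!! ok?".b

# The original __TOTAL_CHARS pattern: any non-whitespace character.
any_non_space = text.scan(/\S/).size

# The suggested replacement: digits, ASCII letters and bytes 0x80-0xFF only.
alnum_or_high = text.scan(/[\x30-\x39\x41-\x5a\x61-\x7a\x80-\xff]/n).size

puts "/\\S/ counts #{any_non_space}, the narrower class counts #{alnum_or_high}"
```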



Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-03 Thread Eliezer Croitoru

On 2/3/2013 7:23 AM, John Hardin wrote:

On Sat, 2 Feb 2013, Eliezer Croitoru wrote:


I wrote something in ruby which actually works fine as a starter.

#code start
spam_content = "the long part from the mail".force_encoding("Windows-1255")

template_hebrew_chars = 270

def hebrew_char(char)
  if (223..251).member?(char.unpack("H*")[0].hex)
    return true
  elsif (192..203).member?(char.unpack("H*")[0].hex)
    return true
  elsif (205..219).member?(char.unpack("H*")[0].hex)
    return true
  end
  return false
end

counter = 0
spam_content.each_char { |char| counter += 1 if hebrew_char(char) }

if counter == template_hebrew_chars
  puts "this is a spam"
else
  puts "might not be a spam"
end
##code end


Now *that* might be possible in plain SA rules without a plugin: count
the number of characters in the message body, and the number of
characters that fall in a given range (e.g. those that are hebrew
glyphs), and calculate the percentage. I *think* you can do math in meta
rules...

However, a plugin would be _much_ more efficient than something like:

   body   __HBRW_CHARS    /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
   tflags __HBRW_CHARS    multiple
   body   __TOTAL_CHARS   /\S/
   tflags __TOTAL_CHARS   multiple
   meta   __HBRW_PCT  ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
   meta   HBRW_SPAM   (__HBRW_PCT < 50) && __HBRW_ENCODING

I don't know whether the division in __HBRW_PCT or the less-than
comparison in HBRW_SPAM would work, that's totally off the top of my
head and untested. I also leave the __HBRW_ENCODING rule as an exercise
for the student. :)


Thanks

I had the __HBRW_ENCODING ready from before.
I think I will use a meta that checks the sender address, then the encoding
and then the percentage.


Thanks Again,



Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread John Hardin

On Sat, 2 Feb 2013, Eliezer Croitoru wrote:


I wrote something in ruby which actually works fine as a starter.

#code start
spam_content = "the long part from the mail".force_encoding("Windows-1255")

template_hebrew_chars = 270

def hebrew_char(char)
  if (223..251).member?(char.unpack("H*")[0].hex)
    return true
  elsif (192..203).member?(char.unpack("H*")[0].hex)
    return true
  elsif (205..219).member?(char.unpack("H*")[0].hex)
    return true
  end
  return false
end

counter = 0
spam_content.each_char { |char| counter += 1 if hebrew_char(char) }

if counter == template_hebrew_chars
  puts "this is a spam"
else
  puts "might not be a spam"
end
##code end


Now *that* might be possible in plain SA rules without a plugin: count the 
number of characters in the message body, and the number of characters 
that fall in a given range (e.g. those that are hebrew glyphs), and 
calculate the percentage. I *think* you can do math in meta rules...


However, a plugin would be _much_ more efficient than something like:

  body   __HBRW_CHARS    /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
  tflags __HBRW_CHARS    multiple
  body   __TOTAL_CHARS   /\S/
  tflags __TOTAL_CHARS   multiple
  meta   __HBRW_PCT  ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
  meta   HBRW_SPAM   (__HBRW_PCT < 50) && __HBRW_ENCODING

I don't know whether the division in __HBRW_PCT or the less-than 
comparison in HBRW_SPAM would work, that's totally off the top of my head 
and untested. I also leave the __HBRW_ENCODING rule as an exercise for the 
student. :)
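To see what the numbers in these meta rules do, the arithmetic can be modeled in Ruby. This is a sketch, not how SpamAssassin evaluates metas. It assumes meta expressions use ordinary floating-point division, as Perl does. It also includes the "+ 1" guard that was added later in the thread. The method names are made up.

```ruby
# Hypothetical stand-ins: hebrew_hits and total_hits play the role of the
# __HBRW_CHARS and __TOTAL_CHARS hit counts; encoding_hit is __HBRW_ENCODING.
def hbrw_pct(hebrew_hits, total_hits)
  # Float division (assumed to match Perl's "/"); the "+ 1" keeps an
  # empty body from dividing by zero.
  (hebrew_hits * 100.0) / (total_hits + 1)
end

def hbrw_spam?(hebrew_hits, total_hits, encoding_hit)
  hbrw_pct(hebrew_hits, total_hits) < 50 && encoding_hit
end

puts hbrw_pct(0, 0)             # 0.0 instead of a division-by-zero error
puts hbrw_spam?(30, 199, true)  # 15% Hebrew -> true
puts hbrw_spam?(150, 199, true) # 75% Hebrew -> false
```

One side effect is worth noticing. With the + 1, an empty body counts as 0% Hebrew, so it satisfies the < 50 test by itself.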





Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread Eliezer Croitoru

On 2/2/2013 11:01 PM, John Hardin wrote:

On Sat, 2 Feb 2013, Eliezer Croitoru wrote:


Yes, I do understand that it's hard.
I've worked a bit with Perl, so I might be able to write something that
does that if it doesn't exist already.


That's probably what it will take.


Let me try to explain further.
The problem is that I receive mail containing an example of the spam
content, which didn't originally arrive as email, just so it can be
categorized as spam.
This is not what SA was built for, but it gives very good results in
general.
This is a specific case.


Ah, I think I see; by "this is a form" you meant your need is for
scanning content submitted via a web form to see if it is spammy?


Yes..


I have an active system that someone wrote in C# which scans the
characters etc., but the problem is that it's in C# and it's an active
check that crawls and parses the site, rather than a RESTful system that
triggers the checks when needed.

This is an example of the content:
http://www.fpaste.org/yFOC/

It could even be a CMS post that someone received and wants to
categorize as spam.


So that sample message is largely hacked up just to provide headers so
that it looks like an email and SA can scan it? That sure doesn't look
like a valid email and there are a lot of obvious spam signs in the
headers.

This message is indeed recognized as spam.
I have other messages which have:
X-Spam-Status: No, score=3.146 tagged_above=2 required=6.2
tests=[FROM_ILLEGAL_CHARS=2.059, LOTS_OF_MONEY=0.001,
RCVD_IN_XBL=0.724, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=no

And in this case I have a one-way filter that actually works:
language filtering.

I wrote something in ruby which actually works fine as a starter.

#code start
spam_content = "the long part from the mail".force_encoding("Windows-1255")

template_hebrew_chars = 270

def hebrew_char(char)
  if (223..251).member?(char.unpack("H*")[0].hex)
    return true
  elsif (192..203).member?(char.unpack("H*")[0].hex)
    return true
  elsif (205..219).member?(char.unpack("H*")[0].hex)
    return true
  end
  return false
end

counter = 0
spam_content.each_char { |char| counter += 1 if hebrew_char(char) }

if counter == template_hebrew_chars
  puts "this is a spam"
else
  puts "might not be a spam"
end
##code end

There are a couple of directions in the identification tree, such as how
many words there are.

If Hebrew and English words are mixed, what should it decide...
Identifying URLs, etc.

I have used:
http://msdn.microsoft.com/en-US/goglobal/cc305148
http://en.wikipedia.org/wiki/Windows-1255#Code_page_layout
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt

And maybe later I will try to write something in Perl that can help with that.
Mixing two languages makes it a bit of a problem, and I had a nice
algorithm in mind for deciding on the percentage of Hebrew text in this
encoding.


With another encoding such as UTF-8, or with more complex phonetic
languages, it gets a bit more difficult, but since most simple mails
consist of plain text it won't be such a big problem.
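The Ruby script above checks for exactly 270 Hebrew characters. A variant could instead apply the rule stated in this thread, that more than 50 percent English means spam, by comparing Hebrew letters against Latin letters. This is only a sketch. It assumes A-Z/a-z count as "English" and ignores every other byte. english_share and HEBREW_RANGES are made-up names.

```ruby
# Hypothetical variant: judge by the share of English letters instead of an
# exact Hebrew count. The byte ranges are the same ones hebrew_char uses.
HEBREW_RANGES = [192..203, 205..219, 223..251]

def english_share(bytes)
  hebrew = bytes.count { |b| HEBREW_RANGES.any? { |r| r.cover?(b) } }
  latin  = bytes.count { |b| (65..90).cover?(b) || (97..122).cover?(b) }
  letters = hebrew + latin
  letters.zero? ? 0.0 : latin.fdiv(letters)
end

# Made-up sample: six Latin letters followed by three Hebrew-range bytes.
content = "Buy now ".b + [0xE0, 0xE1, 0xE2].pack("C*")

share = english_share(content.bytes)
puts share > 0.5 ? "this is a spam" : "might not be a spam"
```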


Thanks,


Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread John Hardin

On Sat, 2 Feb 2013, Eliezer Croitoru wrote:


Yes, I do understand that it's hard.
I've worked a bit with Perl, so I might be able to write something that
does that if it doesn't exist already.


That's probably what it will take.


Let me try to explain further.
The problem is that I receive mail containing an example of the spam
content, which didn't originally arrive as email, just so it can be
categorized as spam.
This is not what SA was built for, but it gives very good results in
general.

This is a specific case.


Ah, I think I see; by "this is a form" you meant your need is for scanning 
content submitted via a web form to see if it is spammy?


I have an active system that someone wrote in C# which scans the
characters etc., but the problem is that it's in C# and it's an active
check that crawls and parses the site, rather than a RESTful system that
triggers the checks when needed.


This is an example of the content:
http://www.fpaste.org/yFOC/

It could even be a CMS post that someone received and wants to categorize
as spam.


So that sample message is largely hacked up just to provide headers so 
that it looks like an email and SA can scan it? That sure doesn't look 
like a valid email and there are a lot of obvious spam signs in the 
headers.




Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread Eliezer Croitoru

On 2/2/2013 8:58 PM, John Hardin wrote:

That's the difficult part.

It's easy to look for specific strings in the body, or specific things
like the ratio of text to whitespace or text to images, but trying to
*interpret* the text to do something like detect which language it is in
is a *hard* problem. Even more so if you want to detect that the message
body is in more than one language, and determine the ratios.

The closest we can come today is to look at the character set of the
message and try to guess from that whether the *entire* message is in a
"foreign" language. This runs into problems where the character set of
the message supports multiple languages, like UTF-8 or some of the
character sets used by Windows.

Do you have Bayes enabled? If so, are you training these messages as
spam? If you are doing this, then they should eventually hit BAYES_99
and if there are any other spammy characteristics that would probably be
enough to detect them.

If you would upload a few of these spams to someplace like pastebin and
point us at them then we will be able to do better than just guess and
make general suggestions.


Yes, I do understand that it's hard.
I've worked a bit with Perl, so I might be able to write something that
does that if it doesn't exist already.


Let me try to explain further.
The problem is that I receive mail containing an example of the spam
content, which didn't originally arrive as email, just so it can be
categorized as spam.
This is not what SA was built for, but it gives very good results in
general.

This is a specific case.
I have an active system that someone wrote in C# which scans the
characters etc., but the problem is that it's in C# and it's an active
check that crawls and parses the site, rather than a RESTful system that
triggers the checks when needed.


This is an example of the content:
http://www.fpaste.org/yFOC/

It could even be a CMS post that someone received and wants to
categorize as spam.







Thanks,


Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread Martin Gregorie
On Sat, 2013-02-02 at 20:23 +0200, Eliezer Croitoru wrote:
> On 2/2/2013 7:39 PM, Martin Gregorie wrote:
> > In that case something like this would work:
> >
> > describe EC_BANNED_ADDRESS Mail from a spamming address
> > header   EC_BANNED_ADDRESS From =~ /sender\@spamming_address/i
> > score    EC_BANNED_ADDRESS 10.0
> 
> >
> > There's no point in writing rules against the message body when the mail
> > is all from an address that you know.
> >
> >
> > Martin
> 
> Thanks Martin.
> I do have..
> The mail is fine.
> I just need to know about a pattern match in the content since it's a form.
> This address spam is pretty specific.
> This is why I wanted to use a specific check for this kind of mail.
> The start and end have a specific percentage of Hebrew.
> Most of the mail should be in Hebrew, and if more than 50
> percent of the body is in English it's 100% spam.
> Below that I can score it with basic rules.
> 
> I was thinking of meta rule like here:
> http://spamassassin.1065346.n5.nabble.com/spamassassin-conditional-rules-td42578.html
> 
Use a meta-rule to combine non-scoring rules:

 describe EC_SPAMTRAP Mail from a spamming address
 header __EC_BANNED_ADDRESS From =~ sender@spamming_address
 body   __EC_MOSTLY_HEBREW  rule to decide if the body is mostly Hebrew
 meta EC_SPAMTRAP  (__EC_BANNED_ADDRESS && !__EC_MOSTLY_HEBREW)
 score EC_SPAMTRAP  10

The meta rule will only fire if the address subrule fires and the 'mostly
Hebrew' subrule does not. This sort of
structure is the way to encode logical relationships, but there's no way
to control whether a subrule is run. 
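To make that evaluation model concrete, here is a small Python sketch. It is not SpamAssassin code, and the function names are hypothetical. It models what Martin describes: every ordinary rule runs on every message, a meta rule only sees the subrules' results (so it cannot stop a subrule from running), and rules whose names start with `__` contribute no score of their own.

```python
from typing import Callable, Dict

# Hypothetical model: a rule is a predicate over the body text,
# a meta rule is a boolean expression over the other rules' results.
Rule = Callable[[str], bool]
Meta = Callable[[Dict[str, bool]], bool]

def run_rules(body: str, rules: Dict[str, Rule],
              metas: Dict[str, Meta]) -> Dict[str, bool]:
    # Phase 1: every ordinary rule runs, regardless of the others' results.
    hits = {name: rule(body) for name, rule in rules.items()}
    # Phase 2: meta rules combine the results; they cannot skip a subrule.
    for name, expr in metas.items():
        hits[name] = expr(hits)
    return hits

def total_score(hits: Dict[str, bool], scores: Dict[str, float]) -> float:
    # Only rules that hit count, and '__' subrules never add points.
    return sum(
        scores.get(name, 0.0)
        for name, hit in hits.items()
        if hit and not name.startswith("__")
    )
```

For example, a meta expression such as `lambda h: h["__A"] and not h["__B"]` plays the role of `(__A && !__B)`, and only the meta rule's own score is added to the total.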

However, I can't suggest how to code the 'mostly Hebrew' test. AFAIK
there's no easy way to recognise languages or national character sets
now that universal character coding sets like UTF-8 and UTF-16 are
common: formerly you could do it by seeing which Microsoft codepage was
used for the body, though that was only useful if the sender was using a
Windows PC. You'd probably have to write a plugin to handle
language recognition, especially as your weighting scheme requires every
word in the body to be counted as Hebrew or non-Hebrew.
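The word-level weighting described above can be sketched in Python as the kind of logic such a plugin might implement. This is a standalone illustration, not a SpamAssassin plugin. It assumes the body has already been decoded to Unicode text, treats the Unicode Hebrew block (U+0590 to U+05FF) as "Hebrew" and ASCII letters as "English", and uses Eliezer's strict 50% cutoff.

```python
import re

# Assumption: Hebrew script = Unicode block U+0590..U+05FF; "English" = ASCII letters.
HEBREW = re.compile(r"[\u0590-\u05FF]")
LATIN = re.compile(r"[A-Za-z]")

def english_word_share(body: str) -> float:
    """Fraction of classifiable words that are Latin-script rather than Hebrew.

    A word containing any Hebrew letter counts as Hebrew; otherwise a word
    containing ASCII letters counts as English. Tokens made only of digits
    or punctuation are ignored.
    """
    hebrew = english = 0
    for word in body.split():
        if HEBREW.search(word):
            hebrew += 1
        elif LATIN.search(word):
            english += 1
    counted = hebrew + english
    return english / counted if counted else 0.0

def mostly_english(body: str, cutoff: float = 0.5) -> bool:
    """Eliezer's rule of thumb: more than 50% English means spam."""
    return english_word_share(body) > cutoff
```

A body that is exactly half English is not flagged, because the comparison is strictly "more than" 50%.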


Martin




Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread John Hardin

On Sat, 2 Feb 2013, Eliezer Croitoru wrote:


> I just need to know about a pattern match in the content since it's a form.


There are existing rules to detect fill-in-the-form emails. Are any of the 
FILL_FORM family of rules hitting those messages?


If the form text is in Hebrew it likely won't; if you want to send me 
samples of those messages off-list, ideally as RFC-822 attachments, I'll 
be happy to see if I can add Hebrew variants to the form rules.



> This address spam is pretty specific.


If the spams are from the same address then you can blacklist that address 
as Martin suggested.



> This is why I wanted to use a specific check for this kind of mail.
> The start and end have a specific percentage of Hebrew.
> Most of the mail should be in Hebrew, and if more than 50 percent of 
> the body is in English it's 100% spam.


That's the difficult part.

It's easy to look for specific strings in the body, or specific things 
like the ratio of text to whitespace or text to images, but trying to 
*interpret* the text to do something like detect which language it is in 
is a *hard* problem. Even more so if you want to detect that the message 
body is in more than one language, and determine the ratios.


The closest we can come today is to look at the character set of the 
message and try to guess from that whether the *entire* message is in a 
"foreign" language. This runs into problems where the character set of the 
message supports multiple languages, like UTF-8 or some of the character 
sets used by Windows.
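As a rough Python illustration of that charset-based guess (not something SpamAssassin does with this code), the sketch below reads the declared charset of each MIME part with the standard `email` package. The set of "Hebrew-only" charsets is an illustrative assumption, not an exhaustive list. UTF-8, UTF-16 and US-ASCII are treated as telling you nothing, which is exactly the limitation described above.

```python
from email import message_from_string
from typing import Optional

# Illustrative, not exhaustive: charsets used essentially only for Hebrew.
HEBREW_CHARSETS = {"windows-1255", "iso-8859-8", "iso-8859-8-i", "iso-8859-8-e"}
# Charsets that can carry any language, so they say nothing on their own.
UNINFORMATIVE = {"us-ascii", "utf-8", "utf-16"}

def charset_suggests_hebrew(raw: str) -> Optional[bool]:
    """True/False if the declared charsets point one way, None if they can't tell."""
    msg = message_from_string(raw)
    # get_charsets() gives one lower-cased entry per MIME part, None if undeclared.
    declared = {c for c in msg.get_charsets() if c}
    informative = declared - UNINFORMATIVE
    if not informative:
        return None
    return informative <= HEBREW_CHARSETS
```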


Do you have Bayes enabled? If so, are you training these messages as spam? 
If you are doing this, then they should eventually hit BAYES_99 and if 
there are any other spammy characteristics that would probably be enough 
to detect them.


If you would upload a few of these spams to someplace like pastebin and 
point us at them then we will be able to do better than just guess and 
make general suggestions.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
   "A well educated Electorate, being necessary to the liberty of a
free State, the Right of the People to Keep and Read Books,
shall not be infringed."
  ...means only registered voters can read books, and only those books
  obtained with State permission from State-controlled bookstores?
---
 10 days until Abraham Lincoln's and Charles Darwin's 204th Birthdays


Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread Eliezer Croitoru

On 2/2/2013 7:39 PM, Martin Gregorie wrote:

> In that case something like this would work:
>
> describe EC_BANNED_ADDRESS Mail from a spamming address
> header   EC_BANNED_ADDRESS From =~ /sender\@spamming_address/i
> score    EC_BANNED_ADDRESS 10.0
>
> There's no point in writing rules against the message body when the mail
> is all from an address that you know.
>
>
> Martin


Thanks Martin.
I do have..
The mail is fine.
I just need to know about a pattern match in the content since it's a form.
This address spam is pretty specific.
This is why I wanted to use a specific check for this kind of mail.
The start and end have a specific percentage of Hebrew.
Most of the mail should be in Hebrew, and if more than 50 
percent of the body is in English it's 100% spam.

Less than that, I can score with basic rules.

I was thinking of meta rule like here:
http://spamassassin.1065346.n5.nabble.com/spamassassin-conditional-rules-td42578.html

--
Eliezer Croitoru



Re: IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread Martin Gregorie
On Sat, 2013-02-02 at 19:26 +0200, Eliezer Croitoru wrote:
> I have a specific mail address from which I get messages a couple of 
> times with a basic pattern that I want to block.
> 
> I started reading:
> http://wiki.apache.org/spamassassin/WritingRules
> 
> And I would be very happy to get some notes and help about it.
> 
> - The mail is from specific mail address.
>
In that case something like this would work:

describe EC_BANNED_ADDRESS Mail from a spamming address
header   EC_BANNED_ADDRESS From =~ /sender\@spamming_address/i
score    EC_BANNED_ADDRESS 10.0

There's no point in writing rules against the message body when the mail
is all from an address that you know.


Martin





IS there a simple way to add a rule of a body mail test? I have a pattern..

2013-02-02 Thread Eliezer Croitoru
I have a specific mail address from which I get messages a couple of 
times with a basic pattern that I want to block.


I started reading:
http://wiki.apache.org/spamassassin/WritingRules

And I would be very happy to get some notes and help about it.

- The mail is from specific mail address.
- The mail body has a specific pattern, which is:
#quote start of mail body
a paragraph in one language

a couple of paragraphs that are more than 50% English, or more than 50% 
not in my local language (the spam msg body)


other paragraph info
--
signature
#quote end of mail body

So basically I need to match the sender, then "score" the body from 
"" to "\r\n_msg_end_marker" by the percentage of characters in each language.


Any direction to do that will be very helpful.
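The two steps in this request (match the sender, then score only the region up to the end marker by character share) can be sketched in Python. This is an illustration outside SpamAssassin, and every name in it is hypothetical. The start marker in the post is shown as `""`, so the sketch takes both markers as parameters instead of guessing them, and an empty start marker means "from the top of the body". It also assumes a single-part text body and treats the Hebrew letters U+05D0 to U+05EA as "Hebrew".

```python
import re
from email import message_from_string
from email.utils import parseaddr

def region_between(body: str, start_marker: str, end_marker: str) -> str:
    """Text after start_marker up to end_marker (empty start_marker = from the top)."""
    start = 0
    if start_marker:
        found = body.find(start_marker)
        if found < 0:
            return ""
        start = found + len(start_marker)
    end = body.find(end_marker, start)
    return body[start:end] if end >= 0 else body[start:]

def latin_share(text: str) -> float:
    """Share of letters that are ASCII Latin rather than Hebrew (U+05D0..U+05EA)."""
    hebrew = len(re.findall(r"[\u05D0-\u05EA]", text))
    latin = len(re.findall(r"[A-Za-z]", text))
    return latin / (hebrew + latin) if hebrew + latin else 0.0

def is_targeted_spam(raw: str, sender: str, start_marker: str, end_marker: str) -> bool:
    msg = message_from_string(raw)
    # Step 1: only look at mail from the known sender address.
    if parseaddr(msg.get("From", ""))[1].lower() != sender.lower():
        return False
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart handling is left out of this sketch
        return False
    # Step 2: score only the marked region; more than 50% Latin letters = spam.
    return latin_share(region_between(body, start_marker, end_marker)) > 0.5
```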

Now the email scores are:
X-Spam-Flag: NO
X-Spam-Score: 3.146
X-Spam-Level: ***
X-Spam-Status: No, score=3.146 tagged_above=2 required=6.2
tests=[FROM_ILLEGAL_CHARS=2.059, LOTS_OF_MONEY=0.001,
RCVD_IN_XBL=0.724, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=no
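To see how far this message is from the threshold, the tests list can be added up. The small Python helper below parses an `X-Spam-Status` value in the format shown above (`required=...` plus a bracketed `tests=[...]` list). It is a convenience sketch, not a SpamAssassin or amavisd-new tool.

```python
import re
from typing import Dict, Optional, Tuple

def parse_spam_status(status: str) -> Tuple[Optional[float], Dict[str, float]]:
    """Return (required, {rule: score}) from an X-Spam-Status value like the one above."""
    m = re.search(r"required=(-?[\d.]+)", status)
    required = float(m.group(1)) if m else None
    tests: Dict[str, float] = {}
    listing = re.search(r"tests=\[([^\]]*)\]", status)  # may span folded lines
    if listing:
        for item in listing.group(1).split(","):
            name, _, score = item.strip().partition("=")
            tests[name] = float(score)
    return required, tests
```

With the values above, the five tests sum to 3.146, which is 3.054 short of `required=6.2`. A dedicated 10-point rule for this sender would therefore push such a message well over the threshold on its own.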



Thanks,

--
Eliezer Croitoru


Re: A simple way to...

2004-10-10 Thread Robin Lynn Frank
On Sat, 9 Oct 2004 15:41:37 -0600 (CST)
Ryan Thompson <[EMAIL PROTECTED]> wrote:

> Robin Lynn Frank wrote to users@spamassassin.apache.org:
> 
> > We use SA 3.0.0 with MySQL so we can extract certain AWL data and
> > use it at the MTA level.  However, since SA doesn't have an
> > auto-blacklist feature,
> 
> Hi Robin,
> 
> Actually, "AutoWhiteList" (AWL) is a bit of a misnomer. AWL maintains
> average message scores for sender/class-B tuples, so, in effect, it is
> also an auto blacklist, because repeat spam senders will have high
> average scores in the AWL database.
> 
> > I'd like to find a relatively simple way to extract IP addresses
> > from emails that contain spam.  If it is of any importance, we
> > invoke SA via amavisd-new.
> 
> See, for instance, the check_whitelist script in the tools/ directory
> of the distribution. I get output like this:
> 
>  -4.5   (-35.6/8)  --  [EMAIL PROTECTED]|ip=64.59
>   9.3   (27.9/3)  --  [EMAIL PROTECTED]|ip=65.39
> 
> The first line is for a user that sends ham, so his/her score on
> future messages would be pushed closer to -4.5.
> 
> The second line is for a user that sends spam, so, if they sent a more
> hammy message later, the AWL would likely *add* points to the message,
> while decreasing the average slightly.
> 
> It works both ways. If you want to use this at the MTA level, I could
> envision you wanting to grab, say, every entry over a certain average
> score and potentially greylist based on that or something.
> 
> Hope this helps,
> - Ryan
> 
Yes it does.  The only thing I see that is a problem is that the IPs
appear to be /16s.  /24s would be a broad enough brush to paint with. 
Back to the drawing board.
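The AWL output above shows only the first two octets (`ip=64.59`). A /24 view would therefore have to be built from full addresses collected somewhere else, for example from the Received headers of messages already judged as spam. Assuming you have such a list, this Python sketch shows how differently the same addresses group at /16 and at /24. The addresses are made-up examples.

```python
import ipaddress
from collections import Counter
from typing import Iterable

def count_by_network(ips: Iterable[str], prefix: int) -> Counter:
    """Count addresses per enclosing IPv4 network of the given prefix length."""
    # strict=False accepts a host address and masks it down to its network.
    return Counter(ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips)
```

Three addresses that collapse into a single /16 can split across two /24s, which is the narrower brush Robin is after.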

-- 
Robin Lynn Frank
Director of Operations
Paradigm-Omega, LLC
http://www.paradigm-omega.com
==
Sed quis custodiet ipsos custodes?




Re: A simple way to...

2004-10-09 Thread Bill Landry
- Original Message - 
From: "Ryan Thompson" <[EMAIL PROTECTED]>

> Robin Lynn Frank wrote to users@spamassassin.apache.org:
>
> > We use SA 3.0.0 with MySQL so we can extract certain AWL data and use
> > it at the MTA level.  However, since SA doesn't have an auto-blacklist
> > feature,
>
> Hi Robin,
>
> Actually, "AutoWhiteList" (AWL) is a bit of a misnomer. AWL maintains
> average message scores for sender/class-B tuples, so, in effect, it is
> also an auto blacklist, because repeat spam senders will have high
> average scores in the AWL database.
>
> > I'd like to find a relatively simple way to extract IP addresses from
> > emails that contain spam.  If it is of any importance, we invoke SA
> > via amavisd-new.
>
> See, for instance, the check_whitelist script in the tools/ directory of
> the distribution. I get output like this:
>
>  -4.5   (-35.6/8)  --  [EMAIL PROTECTED]|ip=64.59
>   9.3   (27.9/3)  --  [EMAIL PROTECTED]|ip=65.39
>
> The first line is for a user that sends ham, so his/her score on future
> messages would be pushed closer to -4.5.
>
> The second line is for a user that sends spam, so, if they sent a more
> hammy message later, the AWL would likely *add* points to the message,
> while decreasing the average slightly.
>
> It works both ways. If you want to use this at the MTA level, I could
> envision you wanting to grab, say, every entry over a certain average
> score and potentially greylist based on that or something.

I'm wondering if the devs have considered changing the name associated with
AWL from auto-whitelisting to something more descriptive of what AWL
actually does, maybe something like auto-weight-leveling?

Bill
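The "leveling" Bill mentions can be sketched in Python. This is based on my understanding of the 3.x AWL plugin, so check it against the AWL code in your version. The new message's score is moved toward the sender's historical mean (total/count) by a fraction set by `auto_whitelist_factor`, which I believe defaults to 0.5. The function name is hypothetical.

```python
def awl_adjustment(total: float, count: int, score: float, factor: float = 0.5) -> float:
    """Points an AWL-style adjustment adds (positive) or removes (negative).

    Moves the new score toward the historical mean total/count by `factor`
    of the distance; `factor` stands in for auto_whitelist_factor.
    """
    if count <= 0:
        return 0.0  # no history, no adjustment
    mean = total / count
    return (mean - score) * factor
```

Using the figures from Ryan's output, a sender averaging 9.3 (27.9/3) who sends a message scoring 1.0 would get about +4.15. A sender averaging -4.45 (-35.6/8) who sends a 3.0-point message would get about -3.7. In both cases the score is pulled toward the sender's history, up or down.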



Re: A simple way to...

2004-10-09 Thread Ryan Thompson
Robin Lynn Frank wrote to users@spamassassin.apache.org:

> We use SA 3.0.0 with MySQL so we can extract certain AWL data and use
> it at the MTA level.  However, since SA doesn't have an auto-blacklist
> feature,

Hi Robin,

Actually, "AutoWhiteList" (AWL) is a bit of a misnomer. AWL maintains
average message scores for sender/class-B tuples, so, in effect, it is
also an auto blacklist, because repeat spam senders will have high
average scores in the AWL database.

> I'd like to find a relatively simple way to extract IP addresses from
> emails that contain spam.  If it is of any importance, we invoke SA
> via amavisd-new.

See, for instance, the check_whitelist script in the tools/ directory of
the distribution. I get output like this:

 -4.5   (-35.6/8)  --  [EMAIL PROTECTED]|ip=64.59
  9.3   (27.9/3)  --  [EMAIL PROTECTED]|ip=65.39

The first line is for a user that sends ham, so his/her score on future
messages would be pushed closer to -4.5.

The second line is for a user that sends spam, so, if they sent a more
hammy message later, the AWL would likely *add* points to the message,
while decreasing the average slightly.

It works both ways. If you want to use this at the MTA level, I could
envision you wanting to grab, say, every entry over a certain average
score and potentially greylist based on that or something.

Hope this helps,
- Ryan
--
  Ryan Thompson <[EMAIL PROTECTED]>
  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4
Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669 (877-SASKNOW) North America
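Ryan's idea of grabbing every entry over a certain average can be sketched as a small Python filter over check_whitelist output. The line layout is inferred only from the two sample lines above. The thresholds (`min_average`, `min_count`) are hypothetical values to tune, and what the MTA does with the result is up to you.

```python
import re
from typing import List, NamedTuple

# Layout assumed from the sample output:
#   <average>   (<total>/<count>)  --  <address>|ip=<first two octets>
LINE = re.compile(
    r"^\s*(?P<avg>-?\d+(?:\.\d+)?)\s*"
    r"\(\s*(?P<total>-?\d+(?:\.\d+)?)/(?P<count>\d+)\)"
    r"\s*--\s*(?P<addr>[^|\s]+)\|ip=(?P<ip>\S+)"
)

class AwlEntry(NamedTuple):
    average: float
    total: float
    count: int
    address: str
    ip_prefix: str

def parse_check_whitelist(text: str) -> List[AwlEntry]:
    entries = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:  # skip blank lines and anything not in the expected layout
            entries.append(AwlEntry(float(m["avg"]), float(m["total"]),
                                    int(m["count"]), m["addr"], m["ip"]))
    return entries

def greylist_candidates(entries: List[AwlEntry], min_average: float = 5.0,
                        min_count: int = 2) -> List[AwlEntry]:
    """Entries at or above a hypothetical average score, with enough history."""
    return [e for e in entries if e.average >= min_average and e.count >= min_count]
```

Requiring a minimum message count keeps a single high-scoring message from putting a sender on the list.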


A simple way to...

2004-10-09 Thread Robin Lynn Frank
We use SA 3.0.0 with MySQL so we can extract certain AWL data and use it
at the MTA level.  However, since SA doesn't have an auto-blacklist
feature, I'd like to find a relatively simple way to extract IP
addresses from emails that contain spam.  If it is of any importance, we
invoke SA via amavisd-new.
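A simple, SA-independent way to pull relay addresses out of messages already classified as spam is to read their Received headers. The Python sketch below does that. It relies on the common, but not universal, MTA convention of writing the connecting IP in square brackets. It also skips private and loopback addresses. It does not decide which hop is the real external source. SpamAssassin itself uses its `trusted_networks` setting to find the first untrusted relay, and a real script would need equivalent logic.

```python
import ipaddress
import re
from email import message_from_string
from typing import List

# Many MTAs write the connecting IP in brackets, e.g.
# "from host.example (rdns [192.0.2.1])"; this is a convention, not a rule.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def relay_ips(raw: str) -> List[str]:
    """Public IPv4 addresses from the Received headers, newest hop first."""
    msg = message_from_string(raw)
    found = []
    for header in msg.get_all("Received", []):
        for candidate in BRACKETED_IPV4.findall(header):
            try:
                ip = ipaddress.IPv4Address(candidate)
            except ValueError:  # e.g. 999.1.1.1
                continue
            if not (ip.is_private or ip.is_loopback):
                found.append(str(ip))
    return found
```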

-- 
Robin Lynn Frank
Director of Operations
Paradigm-Omega, LLC
http://www.paradigm-omega.com
==
Sed quis custodiet ipsos custodes?

