Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-28 Thread Mark Sapiro
Helmut Schneider wrote:
>
>As far as I can see this patch works great. As a positive side effect, is it 
>possible that this patch also affects uncaught bounces? I receive lots of 
>uncaught bounces now where a SPAM-filter was required before the patch.


No. The patch has absolutely no effect on uncaught bounces. Uncaught
bounces are messages sent to a LIST-bounces address that are not
VERPed and are not recognized as DSNs. If spam is sent to a
LIST-bounces address and makes it to Mailman, it will be an
unrecognized bounce. SpamDetect.py and header_filter_rules are not
involved at all in processing mail received at a LIST-bounces address.

Any change you observed in uncaught bounces is just a coincidence.

-- 
Mark Sapiro <[EMAIL PROTECTED]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

--
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org

Security Policy: http://wiki.list.org/x/QIA9


Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-28 Thread Helmut Schneider

From: "Mark Sapiro" <[EMAIL PROTECTED]>

> Mark Sapiro wrote:
>> Helmut Schneider wrote:
>>> Interesting, with "^subject:.*Declined.*"
>>>
>>> Subject: Declined: [Somelist] Invitation to workshop on 13rd Dec. 2008
>>>
>>> matches while
>>>
>>> Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008
>>>
>>> does not. Huh?!
>>
>> It turns out that RFC 2047 encoded headers are not decoded before
>> matching against the regexps. Is that the issue here? What do the raw
>> headers look like?
>>
>> I think that the headers should be decoded, but I wonder if people are
>> currently working around this with regexps that match encoded headers
>> and wouldn't match decoded headers.
>
> I have developed a patch for SpamDetect.py which will decode RFC 2047
> encoded headers. This is somewhat problematic because the decoded
> headers will presumably contain non-ascii characters, and while the
> character sets of the headers are known (and there can be different
> headers or even different parts of a single header encoded in different
> character sets), the character set of the regexps in header_filter_rules
> is not known.
>
> The patch creates a unicode object containing all the headers unfolded
> and RFC 2047 decoded with one complete header per line and then encodes
> it into the character set of the list's preferred_language, and this
> result is what the regexps will search. As long as the regexps contain
> only ascii and the raw headers contain no non-ascii characters, this
> should give expected results. If the regexps contain non-ascii
> characters or the headers contain non-ascii not RFC 2047 encoded,
> results may be unexpected.
>
> If in fact, the original issue is due to RFC 2047 encoded headers, try
> the patch and let us know how it works.


As far as I can see this patch works great. As a positive side effect, is it 
possible that this patch also affects uncaught bounces? I receive lots of 
uncaught bounces now where a SPAM-filter was required before the patch.


Thanks a lot, Helmut 




Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-27 Thread Mark Sapiro
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mark Sapiro wrote:
> Helmut Schneider wrote:
>> Interesting, with "^subject:.*Declined.*"
>>
>> Subject: Declined: [Somelist] Invitation to workshop on 13rd Dec. 2008
>>
>> matches while
>>
>> Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008
>>
>> does not. Huh?!
> 
> 
> It turns out that RFC 2047 encoded headers are not decoded before
> matching against the regexps. Is that the issue here? What do the raw
> headers look like?
> 
> I think that the headers should be decoded, but I wonder if people are
> currently working around this with regexps that match encoded headers
> and wouldn't match decoded headers.


I have developed a patch for SpamDetect.py which will decode RFC 2047
encoded headers. This is somewhat problematic because the decoded
headers will presumably contain non-ascii characters, and while the
character sets of the headers are known (and there can be different
headers or even different parts of a single header encoded in different
character sets), the character set of the regexps in header_filter_rules
is not known.

The patch creates a unicode object containing all the headers unfolded
and RFC 2047 decoded with one complete header per line and then encodes
it into the character set of the list's preferred_language, and this
result is what the regexps will search. As long as the regexps contain
only ascii and the raw headers contain no non-ascii characters, this
should give expected results. If the regexps contain non-ascii
characters or the headers contain non-ascii not RFC 2047 encoded,
results may be unexpected.
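
The approach described above can be sketched in modern Python 3 roughly as follows. This is an illustration under stated assumptions, not the patch itself: the function name `decoded_headers` and the sample message are made up for the example, and the patch targets Python 2's `unicode` type rather than Python 3 strings.

```python
import base64
import re
from email import message_from_string
from email.header import decode_header


def decoded_headers(msg, cset="utf-8"):
    """Return all headers of msg as text, unfolded and RFC 2047 decoded.

    Hypothetical helper: characters that cset cannot represent are
    replaced, loosely mirroring the encode(cset, 'replace') step.
    """
    lines = []
    for name, value in msg.items():
        # Unfold: a newline followed by whitespace continues the header.
        unfolded = re.sub(r"\r?\n\s", " ", str(value))
        parts = []
        for frag, cs in decode_header(unfolded):
            # Fragments are bytes when an encoded word was found,
            # and plain str when the header had none.
            if isinstance(frag, bytes):
                frag = frag.decode(cs or "us-ascii", "replace")
            parts.append(frag)
        text = "".join(parts)
        # Round-trip through cset so unrepresentable characters become '?'.
        text = text.encode(cset, "replace").decode(cset)
        lines.append("%s: %s\n" % (name, text))
    return "".join(lines)


# Made-up sample: one base64 encoded-word subject and one folded header.
word = "=?utf-8?b?%s?=" % base64.b64encode(
    "[Somelist] Declined: café".encode("utf-8")
).decode("ascii")
msg = message_from_string(
    "Subject: %s\nX-Note: first\n second\n\nbody\n" % word
)
flat = decoded_headers(msg)
ascii_only = decoded_headers(msg, "us-ascii")
```

Here `flat` holds one complete header per line, which is the shape the regexps are meant to search. Passing a narrower character set such as `us-ascii` shows the lossy replacement that can make results unexpected when non-ascii text is involved.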

If in fact, the original issue is due to RFC 2047 encoded headers, try
the patch and let us know how it works.

- --
Mark Sapiro <[EMAIL PROTECTED]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)

iD8DBQFJLwEfVVuXXpU7hpMRArKTAKCiDYtwz3VENF8Qww1tEw3lUMzUnQCgoGNh
K8vySqy57Vn8w0EHpj6LeJM=
=0pk1
-----END PGP SIGNATURE-----
--- f:/test-mailman-2.2/Mailman/Handlers/SpamDetect.py  2007-07-17 11:06:14.0 -0700
+++ f:/test-mailman/Mailman/Handlers/SpamDetect.py  2008-11-27 11:53:59.46875 -0800
@@ -26,9 +26,8 @@
 """
 
 import re
-from cStringIO import StringIO
 
-from email.Generator import Generator
+from email.Header import decode_header
 
 from Mailman import mm_cfg
 from Mailman import Errors
@@ -60,34 +59,21 @@
 
 
 
-class Tee:
-    def __init__(self, outfp_a, outfp_b):
-        self._outfp_a = outfp_a
-        self._outfp_b = outfp_b
-
-    def write(self, s):
-        self._outfp_a.write(s)
-        self._outfp_b.write(s)
-
-
-# Class to capture the headers separate from the message body
-class HeaderGenerator(Generator):
-    def __init__(self, outfp, mangle_from_=True, maxheaderlen=78):
-        Generator.__init__(self, outfp, mangle_from_, maxheaderlen)
-        self._headertxt = ''
-
-    def _write_headers(self, msg):
-        sfp = StringIO()
-        oldfp = self._fp
-        self._fp = Tee(oldfp, sfp)
-        try:
-            Generator._write_headers(self, msg)
-        finally:
-            self._fp = oldfp
-        self._headertxt = sfp.getvalue()
+def getDecodedHeaders(msg, cset='utf-8'):
+    """Returns a string containing all the headers of msg, unfolded and
+    RFC 2047 decoded and encoded in cset.
+    """
 
-    def header_text(self):
-        return self._headertxt
+    headers = ''
+    for h, v in msg.items():
+        uvalue = u''
+        v = decode_header(re.sub('\n\s', ' ', v))
+        for frag, cs in v:
+            if not cs:
+                cs = 'us-ascii'
+            uvalue += unicode(frag, cs, 'replace')
+        headers += '%s: %s\n' % (h, uvalue.encode(cset, 'replace'))
+    return headers
 
 
 
@@ -106,13 +92,10 @@
     # TK: Collect headers in sub-parts because attachment filename
     # extension may be a clue to possible virus/spam.
     headers = ''
+    # Get the character set of the lists preferred language for headers
+    cset = mm_cfg.LC_DESCRIPTIONS[mlist.preferred_language][1]
     for p in msg.walk():
-        g = HeaderGenerator(StringIO())
-        g.flatten(p)
-        headers += g.header_text()
-    # Now reshape headers (remove extra CR and connect multiline).
-    headers = re.sub('\n+', '\n', headers)
-    headers = re.sub('\n\s', ' ', headers)
+        headers += getDecodedHeaders(p, cset)
     for patterns, action, empty in mlist.header_filter_rules:
         if action == mm_cfg.DEFER:
             continue

Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-27 Thread Mark Sapiro
Helmut Schneider wrote:
>
>Interesting, with "^subject:.*Declined.*"
>
>Subject: Declined: [Somelist] Invitation to workshop on 13rd Dec. 2008
>
>matches while
>
>Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008
>
>does not. Huh?!


It turns out that RFC 2047 encoded headers are not decoded before
matching against the regexps. Is that the issue here? What do the raw
headers look like?

I think that the headers should be decoded, but I wonder if people are
currently working around this with regexps that match encoded headers
and wouldn't match decoded headers.
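
To illustrate why an undecoded header can slip past a rule, here is a small Python 3 sketch. The subject text and the choice of base64 encoding are assumptions made for the example; the actual raw headers in the reported case are not known.

```python
import base64
import re
from email.header import decode_header

# Hypothetical subject containing one non-ASCII character.
subject = "[Somelist] Declined: café workshop"

# Build an RFC 2047 base64 encoded word by hand so the result is predictable.
encoded_word = "=?utf-8?b?%s?=" % base64.b64encode(
    subject.encode("utf-8")
).decode("ascii")
raw_header = "Subject: " + encoded_word

pattern = re.compile(r"^subject:.*Declined.*", re.MULTILINE | re.IGNORECASE)

# The raw header only contains the encoded word, so the literal text
# "Declined" is absent and the rule cannot match.
raw_match = pattern.search(raw_header)

# Decode each fragment back to text before matching.
decoded_value = "".join(
    frag.decode(cs or "us-ascii", "replace") if isinstance(frag, bytes) else frag
    for frag, cs in decode_header(encoded_word)
)
decoded_match = pattern.search("Subject: " + decoded_value)
```

With a quoted-printable encoded word the word "Declined" could survive in the raw header, so whether a given rule matches the raw form depends on how the sending client chose to encode it.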

-- 
Mark Sapiro <[EMAIL PROTECTED]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-27 Thread Helmut Schneider

Helmut Schneider wrote:
> Mark Sapiro wrote:
>> Helmut Schneider wrote:
>>> I have lots of problems with out-of-office replies. I tried to set up
>>> a few filter rules using 2.1.10. Unfortunately they don't catch them.
>>> Are the expressions case sensitive? Are the expressions basic or
>>> extended?
>>> What I tried so far:
>>>
>>> ^subject:.*Accepted.*
>>> ^subject:.*Declined.*
>>> ^subject:.*is out of office.*
>>
>> There are two different filters at # Privacy options... ->Spam filters,
>> and they work differently.
>>
>> The more flexible of the two is header_filter_rules. For
>> header_filter_rules the regexps are matched against a multi-line
>> string containing all the unfolded headers in the message, both
>> message headers and sub-part headers. The regexp is a python regexp
>> <http://docs.python.org/library/re.html#regular-expression-syntax> and
>> the headers are searched
>> <http://docs.python.org/library/re.html#re.search> for a match of the
>> regexp in MULTILINE and IGNORECASE mode. This means the '^' matches
>> the beginning of the string or the position immediately
>> following a newline and the match is case insensitive. Thus your above
>> expressions look good.
>
> That's weird. Messages still pass with e.g.
>
> Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008
>
> in the Header. Do I need to escape the colon? Or something else?


Interesting, with "^subject:.*Declined.*"

Subject: Declined: [Somelist] Invitation to workshop on 13rd Dec. 2008

matches while

Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008

does not. Huh?!


Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-26 Thread Mark Sapiro
Michael Welch wrote:
>
>>^subject:.*is out of office.*
>
>I just added this rule as a "Reject" since we have had a few of these come to 
>the list lately.
>^subject:.*out of office.*
>
>Would folks be willing to share the rules they have developed that would apply 
>to general business lists? We have not had any problems with spam, as our list 
>is very tight. This would be for accidental or automated sendings.
>
>Also, what does the sender receive upon rejection? I am hesitant to test lest 
>something accidentally gets through. 


That's what test lists are for ;)

The post is sent back to the poster attached to a message with the
original subject which says "Message rejected by filter rule match".


>Is the list owner notified of these spam filter rejections?


No


>Are the spam rules applied after testing for list membership?


No. In the default pipeline, header_filter_rules are the first thing
done, even before checking for an Approved: header.

OTOH, bounce_matching_headers are not checked until after membership
checks.

-- 
Mark Sapiro <[EMAIL PROTECTED]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-26 Thread Michael Welch
Hi friends.

>^subject:.*is out of office.*

I just added this rule as a "Reject" since we have had a few of these come to 
the list lately.
^subject:.*out of office.*

Would folks be willing to share the rules they have developed that would apply 
to general business lists? We have not had any problems with spam, as our list 
is very tight. This would be for accidental or automated sendings.

Also, what does the sender receive upon rejection? I am hesitant to test lest 
something accidentally gets through. 

Is the list owner notified of these spam filter rejections? Are the spam rules 
applied after testing for list membership?


- - - - - - - - - - - -
Michael Welch, volunteer
Redwood Alliance
PO Box 293
Arcata, CA 95518
707-822-7884
[EMAIL PROTECTED]
www.redwoodalliance.org



Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-26 Thread Mark Sapiro
Helmut Schneider wrote:

> Mark Sapiro wrote:

>> Helmut Schneider wrote:
>>> 
>>> I have lots of problems with out-of-office replies. I tried to set up
>>> a few filter rules using 2.1.10. Unfortunately they don't catch them.
>>> Are the expressions case sensitive? Are the expressions basic or
>>> extended?
>>> What I tried so far:
>>> 
>>> ^subject:.*Accepted.*
>>> ^subject:.*Declined.*
>>> ^subject:.*is out of office.*
>> 
>> 
>> There are two different filters at # Privacy options... ->Spam filters,
>> and they work differently.
>> 
>> The more flexible of the two is header_filter_rules. For
>> header_filter_rules the regexps are matched against a multi-line
>> string containing all the unfolded headers in the message, both
>> message headers and sub-part headers. The regexp is a python regexp
>> <http://docs.python.org/library/re.html#regular-expression-syntax> and
>> the headers are searched
>> <http://docs.python.org/library/re.html#re.search> for a match of the
>> regexp in MULTILINE and IGNORECASE mode. This means the '^' matches
>> the beginning of the string or the position immediately
>> following a newline and the match is case insensitive. Thus your above
>> expressions look good.
>
>That's weird. Messages still pass with e.g.
>
>Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008
>
>in the Header. Do I need to escape the colon? Or something else?


I just tested a rule with the three regexps

^subject:.*Accepted.*
^subject:.*Declined.*
^subject:.*is out of office.*

copied from your post and Action set to Reject, and a message with

Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008

was rejected for matching the rule. Perhaps you didn't set the rule
action. Note that Action = Defer does not mean defer the post; it
means defer the rule - i.e. don't enforce it.
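
As a rough illustration of what "defer the rule" means, the sketch below skips any rule whose action is Defer before trying its regexps. The constant names, the `first_matching_action` helper, and the assumption that a rule stores its regexps one per line are all made up for this example; only the three-item rule tuple and the skip-on-Defer check follow the posted patch context.

```python
import re

# Hypothetical stand-ins for Mailman's action constants.
DEFER, HOLD, REJECT, DISCARD = "defer", "hold", "reject", "discard"


def first_matching_action(rules, headers):
    """Return the action of the first enforced rule that matches, else None."""
    for patterns, action, _empty in rules:
        if action == DEFER:
            # A deferred rule is not enforced at all, so skip it.
            continue
        # Assumption: one regexp per line within a rule.
        for pattern in patterns.splitlines():
            pattern = pattern.strip()
            if pattern and re.search(
                pattern, headers, re.MULTILINE | re.IGNORECASE
            ):
                return action
    return None


regexps = (
    "^subject:.*Accepted.*\n"
    "^subject:.*Declined.*\n"
    "^subject:.*is out of office.*"
)
headers = (
    "Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008\n"
)

deferred = first_matching_action([(regexps, DEFER, False)], headers)
rejected = first_matching_action([(regexps, REJECT, False)], headers)
```

With the action left at Defer the message passes untouched even though the regexp would match, which is consistent with the behaviour described above.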

-- 
Mark Sapiro <[EMAIL PROTECTED]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-26 Thread Helmut Schneider
----- Original Message -----
From: "Mark Sapiro" <[EMAIL PROTECTED]>

To: "Helmut Schneider" <[EMAIL PROTECTED]>; 
Sent: Wednesday, November 26, 2008 5:12 AM
Subject: Re: [Mailman-Users] privacy options, SPAM, regex 

> Helmut Schneider wrote:
>> I have lots of problems with out-of-office replies. I tried to set up
>> a few filter rules using 2.1.10. Unfortunately they don't catch them.
>> Are the expressions case sensitive? Are the expressions basic or
>> extended?
>> What I tried so far:
>>
>> ^subject:.*Accepted.*
>> ^subject:.*Declined.*
>> ^subject:.*is out of office.*
>
> There are two different filters at # Privacy options... ->Spam filters,
> and they work differently.
>
> The more flexible of the two is header_filter_rules. For
> header_filter_rules the regexps are matched against a multi-line
> string containing all the unfolded headers in the message, both
> message headers and sub-part headers. The regexp is a python regexp
> <http://docs.python.org/library/re.html#regular-expression-syntax> and
> the headers are searched
> <http://docs.python.org/library/re.html#re.search> for a match of the
> regexp in MULTILINE and IGNORECASE mode. This means the '^' matches
> the beginning of the string or the position immediately
> following a newline and the match is case insensitive. Thus your above
> expressions look good.


That's weird. Messages still pass with e.g.

Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008

in the Header. Do I need to escape the colon? Or something else?

Thanks, Helmut


Re: [Mailman-Users] privacy options, SPAM, regex

2008-11-25 Thread Mark Sapiro
Helmut Schneider wrote:
>
>I have lots of problems with out-of-office replies. I tried to set up a few 
>filter rules using 2.1.10. Unfortunately they don't catch them. Are the 
>expressions case sensitive? Are the expressions basic or extended?
>
>What I tried so far:
>
>^subject:.*Accepted.*
>^subject:.*Declined.*
>^subject:.*is out of office.*


There are two different filters at # Privacy options... ->Spam filters,
and they work differently.

The more flexible of the two is header_filter_rules. For
header_filter_rules the regexps are matched against a multi-line
string containing all the unfolded headers in the message, both
message headers and sub-part headers. The regexp is a python regexp
<http://docs.python.org/library/re.html#regular-expression-syntax> and
the headers are searched
<http://docs.python.org/library/re.html#re.search> for a match of the
regexp in MULTILINE and IGNORECASE mode. This means the '^' matches
the beginning of the string or the position immediately
following a newline and the match is case insensitive. Thus your above
expressions look good.
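
The effect of the two flags can be checked with a few lines of Python 3. The header block below is a made-up sample, and this only exercises the `re` module; it is not Mailman code.

```python
import re

# Made-up block of unfolded headers, one complete header per line.
headers = (
    "From: someone@example.com\n"
    "Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008\n"
    "Content-Type: text/plain\n"
)
rule = r"^subject:.*Declined.*"

# Both flags: '^' may match after any newline and case is ignored.
with_flags = re.search(rule, headers, re.MULTILINE | re.IGNORECASE)

# Without MULTILINE, '^' only matches at the very start ("From: ...").
no_multiline = re.search(rule, headers, re.IGNORECASE)

# Without IGNORECASE, "subject:" does not match "Subject:".
no_ignorecase = re.search(rule, headers, re.MULTILINE)
```

Only the first search finds the Subject line. Because '.' does not match a newline by default, the match stops at the end of that header.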

The other is bounce_matching_headers which works differently. It
expects a header name followed by a colon followed by a regexp to
match against the contents of that header - e.g.

subject:is out of office

would match any subject: header that contained 'is out of office'. This
match too is case insensitive.

Also, with bounce_matching_headers, you can't specify an action. The
action is always 'Hold'.
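
For comparison, the "header name, colon, regexp" form can be pictured with the following sketch. It is an illustration of the idea only, not Mailman's implementation; the helper name `matches_header_rule` and the sample message are assumptions.

```python
import re
from email import message_from_string


def matches_header_rule(msg, rule):
    """Check a 'name: regexp' rule against the contents of that header.

    Hypothetical helper: splits the rule at the first colon and searches
    every header of that name, ignoring case.
    """
    name, _, pattern = rule.partition(":")
    regex = re.compile(pattern.strip(), re.IGNORECASE)
    # get_all() looks header names up case-insensitively.
    values = msg.get_all(name.strip(), [])
    return any(regex.search(str(value)) for value in values)


msg = message_from_string(
    "Subject: John Doe is OUT OF OFFICE until Monday\n\nbody\n"
)
hit = matches_header_rule(msg, "subject:is out of office")
miss = matches_header_rule(msg, "subject:declined")
```

Unlike the header_filter_rules form, the regexp here sees only the contents of the named header, so no leading '^subject:' is needed.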

-- 
Mark Sapiro <[EMAIL PROTECTED]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



[Mailman-Users] privacy options, SPAM, regex

2008-11-25 Thread Helmut Schneider

Hi,

I have lots of problems with out-of-office replies. I tried to set up a few 
filter rules using 2.1.10. Unfortunately they don't catch them. Are the 
expressions case sensitive? Are the expressions basic or extended?


What I tried so far:

^subject:.*Accepted.*
^subject:.*Declined.*
^subject:.*is out of office.*

Thanks, Helmut 

