Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-16 Thread Mark Sapiro
On 2/16/20 1:10 PM, Lindsay Haisley wrote:
> 
> One question, Mark. Are the spam filter rules ("header_filter_rules")
> applied _before_ or _after_ a message with a uuencoded (base64) From
> address is decoded?


As I said before, header_filter_rules are matched against the decoded
headers.


> If _before_ (matching what we see on the Administrative Requests
> page), then it should be sufficient to discard any message with a From
> header starting with "=?utf-8".


You won't see that in the text the rule is matched against. Further,
there may be legitimate From: headers with a utf-8 encoded fragment if,
e.g., the From: display name is non-ascii.
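
To illustrate why a rule keyed on the literal "=?utf-8" prefix would not fire, here is a minimal sketch that decodes a header first and only then applies the regexp. It models the decode-then-match order described in this message and is not Mailman's own code; it assumes Python 3, and the helper name `decode_for_matching` and the sample header (a legitimate one with only the display name encoded) are made up for illustration.

```python
import re
from email.header import decode_header, make_header

def decode_for_matching(name, raw_value):
    """Return 'Name: value' with RFC 2047 encoded-words decoded to text."""
    return '%s: %s' % (name, make_header(decode_header(raw_value)))

# Hypothetical legitimate header: only the display name is encoded.
raw = '=?utf-8?B?SsO8cmdlbg==?= <juergen@example.com>'
decoded = decode_for_matching('From', raw)

# A rule looking for the raw encoded-word prefix.
rule = r'^From:\s*=\?utf-8'
matches_raw = re.search(rule, 'From: ' + raw, re.IGNORECASE) is not None
matches_decoded = re.search(rule, decoded, re.IGNORECASE) is not None

print(matches_raw, matches_decoded)
```

The rule matches the raw header but not the decoded one, and a rule that did match the raw form would also catch this legitimate sender.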

Also note that your problem messages are non-compliant. RFC 2047,
section 5 (3) is clear:

+ An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.
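
One way to spot this kind of non-compliant header is to check whether the only "@" in the From: value is hidden inside an encoded-word. The following is a rough sketch under that assumption, for Python 3; the function name `address_hidden_in_encoded_word` and the second sample header are hypothetical, and the encoded-word regexp is a simplification of the RFC 2047 grammar.

```python
import re
from email.header import decode_header

# Simplified encoded-word pattern: =?charset?B-or-Q?text?=
ENCODED_WORD = re.compile(r'=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=')

def _to_text(part, charset):
    """decode_header() may return bytes or str parts; normalise to str."""
    if isinstance(part, bytes):
        try:
            return part.decode(charset or 'ascii', 'replace')
        except LookupError:          # unknown charset label
            return part.decode('latin-1')
    return part

def address_hidden_in_encoded_word(raw_from):
    """True if an '@' appears only after decoding the encoded-words."""
    outside = ENCODED_WORD.sub('', raw_from)
    if '@' in outside:
        return False                 # addr-spec is in the clear
    decoded = ''.join(_to_text(p, cs) for p, cs in decode_header(raw_from))
    return '@' in decoded

problem = '=?utf-8?B?IkFiaWEiIDxBYmlhQG11bHRpLm5ldC5waz4=?='
legit = '=?utf-8?B?SsO8cmdlbg==?= <juergen@example.com>'  # hypothetical
print(address_hidden_in_encoded_word(problem))
print(address_hidden_in_encoded_word(legit))
```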



Your real issue here is that some header_filter_rule with a Hold action,
listed above the discard rule, is matching. That is what is preventing
generic_nonmember_action from discarding the message.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
--
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-users/archive%40jab.org


Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-16 Thread Mark Sapiro
On 2/16/20 12:44 PM, Lindsay Haisley wrote:

> 
> Possible. The "Reason" is "The message headers matched a filter rule"


Then there must be a rule with Hold action before (above) your discard
rule, and as I've noted, your discard rule is way too broad, matching a
character class rather than a word.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-16 Thread Lindsay Haisley
On Sun, 2020-02-16 at 12:08 -0800, Mark Sapiro wrote:
> A match on a header_filter_rule with a Hold action.

One question, Mark. Are the spam filter rules ("header_filter_rules")
applied _before_ or _after_ a message with a uuencoded (base64) From
address is decoded?

If _before_ (matching what we see on the Administrative Requests
page), then it should be sufficient to discard any message with a From
header starting with "=?utf-8".

-- 
Lindsay Haisley       | "The arc of history is long, but
FMP Computer Services |  it bends toward Justice"
512-259-1190          |
http://www.fmp.com    |  - Barack Obama




Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-16 Thread Lindsay Haisley
On Sun, 2020-02-16 at 12:08 -0800, Mark Sapiro wrote:
> Munged a few words.
> 
> On 2/15/20 11:20 PM, Lindsay Haisley wrote:
> 
> > The only filter relevant to this issue is "(?i)Subject: .*[f...]".
> 
> The (?i) is irrelevant as the match always ignores case. Also, I don't
> think that's what you want as it will match any Subject that contains
> any of the letters f, u, c, k in either case. What is the action of this
> rule?

Discard.

> On 2/16/20 10:17 AM, Lindsay Haisley wrote:
> > 
> > We want to discard _all_ non-member
> > posts, and the problem is that these base64-addressed posts _are_ being
> > held and not discarded. 
> 
> 
> If generic_nonmember_action is Discard, non-member posts should be
> discarded unless some prior test causes them to be held. Things which
> could cause a hold are in order:
> 
> A match on a header_filter_rule with a Hold action.

Possible. The "Reason" is "The message headers matched a filter rule"


> One of the addresses returned by get_senders() is a moderated member and
> member_moderation_action is Hold.

This is certainly a possibility. I see that you've given me some code to
work with below. I'll explore this. Thanks!

> None of the addresses returned by get_senders() is a member and the
> address returned by get_sender() matches an entry in
> hold_these_nonmembers

hold_these_nonmembers is empty.

> An address returned by get_senders() is an unmoderated member and the
> list's emergency setting is Yes.

It is No.

> What is the reason given for the hold?

"The message headers matched a filter rule"


> In the case of the message headers in your OP, get_senders() will return
> a list like
> 
> ['=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?=',
> 'a...@multi.net.pk',
> '=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?=']
> 
> which are lowercased versions of, respectively, the undecoded From:, the
> unix from (which I deduce from Return-Path:) and the undecoded Reply-To:.
> Both the original From: and Reply-To: decode to
> 
> "Abia" 
> 
> msg.get_sender() returns '=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?='
> From what you've said, that message whose decoded Subject: header is
> 
> Subject: I InstaF... Request is Pending
> 
> would match the header_filter_rule and be handled per that rule's
> action, but then so would a message with
> 
> Subject: It's a fine day

I substituted for "uck" where it showed up to avoid hitting subscriber
filters on _this_ list.

> You can decode these headers like
> 
> python
> ...
> >>> from email.header import decode_header
> >>> decode_header('=?utf-8?B?IkFiaWEiIDxBYmlhQG11bHRpLm5ldC5waz4=?=')
> [('"Abia" ', 'utf-8')]

Thanks!!! This is the tool I need. However, the list administrator
doesn't have access to the Python interactive shell on this system, and
these messages seem to have different encoded From headers (I'll check
this). We need some way going forward so that she can get these
discarded at the get-go without having to run get_senders() on each
one.

This may not be possible. One of the characteristics of spam is that
it's like clouds in the sky. One particular type and form may be all over
you one day, and completely gone the next.

-- 
Lindsay Haisley       | "The arc of history is long, but
FMP Computer Services |  it bends toward Justice"
512-259-1190          |
http://www.fmp.com    |  - Barack Obama




Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-16 Thread Mark Sapiro
Munged a few words.

On 2/15/20 11:20 PM, Lindsay Haisley wrote:

> The only filter relevant to this issue is "(?i)Subject: .*[f...]".

The (?i) is irrelevant as the match always ignores case. Also, I don't
think that's what you want as it will match any Subject that contains
any of the letters f, u, c, k in either case. What is the action of this
rule?


On 2/16/20 10:17 AM, Lindsay Haisley wrote:
> 
> We want to discard _all_ non-member
> posts, and the problem is that these base64-addressed posts _are_ being
> held and not discarded. 


If generic_nonmember_action is Discard, non-member posts should be
discarded unless some prior test causes them to be held. Things which
could cause a hold are in order:

A match on a header_filter_rule with a Hold action.

One of the addresses returned by get_senders() is a moderated member and
member_moderation_action is Hold.

None of the addresses returned by get_senders() is a member and the
address returned by get_sender() matches an entry in hold_these_nonmembers

An address returned by get_senders() is an unmoderated member and the
list's emergency setting is Yes.
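
The order of these four checks can be pictured with a toy model. This is only a sketch of the sequence listed above, not Mailman's actual handler code: `first_hold_cause`, `matches_any` and all parameter names are hypothetical, membership is reduced to a dict of address -> moderated flag, and hold_these_nonmembers entries beginning with "^" are treated as regexps.

```python
import re

def matches_any(address, patterns):
    """True if address equals a literal entry or matches a '^' regexp entry."""
    for pat in patterns:
        if pat.startswith('^'):
            if re.match(pat, address, re.IGNORECASE):
                return True
        elif pat.lower() == address.lower():
            return True
    return False

def first_hold_cause(header_rule_action, senders, sender, members,
                     member_moderation_action, hold_these_nonmembers,
                     emergency):
    """Return the first listed cause of a hold, or None if none applies.

    members maps a lowercased member address to True if that member is
    moderated; senders/sender stand in for get_senders()/get_sender().
    """
    # 1. A header_filter_rule with a Hold action matched.
    if header_rule_action == 'hold':
        return 'header_filter_rule'
    member_senders = [s for s in senders if s in members]
    # 2. A sender is a moderated member and the moderation action is Hold.
    if (any(members[s] for s in member_senders)
            and member_moderation_action == 'hold'):
        return 'moderated member'
    # 3. No sender is a member and the sender is in hold_these_nonmembers.
    if not member_senders and matches_any(sender, hold_these_nonmembers):
        return 'hold_these_nonmembers'
    # 4. A sender is an unmoderated member and emergency is on.
    if any(not members[s] for s in member_senders) and emergency:
        return 'emergency'
    return None   # fall through, e.g. to generic_nonmember_action
```

In this model a matching Hold rule wins before any membership test is reached, which is why the rule order in header_filter_rules matters.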

What is the reason given for the hold?

In the case of the message headers in your OP, get_senders() will return
a list like

['=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?=',
'a...@multi.net.pk',
'=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?=']

which are lowercased versions of, respectively, the undecoded From:, the
unix from (which I deduce from Return-Path:) and the undecoded Reply-To:.
Both the original From: and Reply-To: decode to

"Abia" 

msg.get_sender() returns '=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?='

From what you've said, that message whose decoded Subject: header is

Subject: I InstaF... Request is Pending

would match the header_filter_rule and be handled per that rule's
action, but then so would a message with

Subject: It's a fine day
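
The difference between a character class and a word is easy to check. A small sketch, using the stand-in word "spam" instead of the word in the real rule, and assuming a case-insensitive search as described above; the `\b` word-boundary variant is just one possible tighter rule, not something taken from the list's configuration.

```python
import re

# Stand-in word; the thread's real rule uses a different four-letter word.
char_class_rule = r'Subject: .*[spam]'    # any ONE of the letters s, p, a, m
word_rule = r'Subject: .*\bspam\b'        # the whole word

subjects = ['Subject: Cheap spam offer', "Subject: It's a fine day"]

def hits(rule):
    """Return the subjects the rule matches, ignoring case."""
    return [s for s in subjects if re.search(rule, s, re.IGNORECASE)]

print(hits(char_class_rule))
print(hits(word_rule))
```

The character-class rule matches both subjects, since nearly any subject contains at least one of the listed letters; the word rule matches only the first.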

You can decode these headers like

python
...
>>> from email.header import decode_header
>>> decode_header('=?utf-8?B?IkFiaWEiIDxBYmlhQG11bHRpLm5ldC5waz4=?=')
[('"Abia" ', 'utf-8')]
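
Under Python 3, decode_header() returns bytes for the decoded parts, so getting readable text takes one more step. A minimal sketch, assuming Python 3; `decode_rfc2047` is an illustrative helper name, not a Mailman function.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

def decode_rfc2047(value):
    """Decode any RFC 2047 encoded-words in a header value to text."""
    return str(make_header(decode_header(value)))

raw_from = '=?utf-8?B?IkFiaWEiIDxBYmlhQG11bHRpLm5ldC5waz4=?='
decoded = decode_rfc2047(raw_from)

# Once decoded, the usual address parsing applies.
display_name, address = parseaddr(decoded)
print(display_name)
print(address)
```

The address recovered this way is what could be looked up in the membership list.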


-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-16 Thread Lindsay Haisley
On Sun, 2020-02-16 at 14:09 -0500, Richard Damon wrote:
> One thing to note is that you seem to have two different filters at work 
> here, one being non-member post, which you want to Discard, and messages 
> with 'bad' words in the subject, which you define to Hold. A message 
> which matches both filters will be acted on by the first filter that the 
> message hits.

The only relevant spam filter in effect here is:

(?i)Subject: .*[f---]
[dashes substituted to avoid downline filters on this list]

The disposition (Action) specified for this filter is Discard, not
Hold. I don't believe we have a "bad words" filter as such, nor a
relevant criterion which is set to "Hold" which might override existing
settings. Some of these problem posts don't have "f---" in the Subject
header, (e.g. "I Instacheat") but contain "f---" in the body. Filtering
on the body content for prohibited words, if it's supported (and I
don't think it is), should not trump generic_nonmember_action, which is
why I suspect the problem posts may be from moderated members, but the
list has 859 members, most of whom are legit but unknown to the list
admin.

The primary problem here is what I identified as ascii-armored UTF-8
encoding (Mark says they're wrapped in base64 encoding) so we don't
have any idea what the _real_ From address is, so we can't search the
subscriber list, nor set up specific rules by sender address.

The Administrative Requests list decodes the Subject header, which is
similarly encoded, but not the From address. Without this, we're
shooting in the dark. As I said, if the spamming sender's address is
actually a moderated subscriber we have no way of knowing from the
information provided by MM 2.

Since the code is in MM2 to decode base64 headers, which it does for
the similarly encoded Subject, I should probably hack the MM code to
make it do the same for the From header before showing it in the list
of Administrative Requests (or someone else could do this, probably
faster than I could).
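
The decoding step itself is small. Below is a minimal standalone sketch, assuming Python 3 and not patched into Mailman 2 (which runs on Python 2); `decoded_headers` and `DISPLAY_HEADERS` are illustrative names, and the sample message is cut down to two headers.

```python
import email
from email.errors import HeaderParseError
from email.header import decode_header, make_header

DISPLAY_HEADERS = ('From', 'Reply-To', 'Sender', 'Subject')

def decoded_headers(raw_message):
    """Return {header name: decoded text} for the headers a moderator sees."""
    msg = email.message_from_string(raw_message)
    result = {}
    for name in DISPLAY_HEADERS:
        value = msg.get(name)
        if value is None:
            continue
        try:
            result[name] = str(make_header(decode_header(value)))
        except (UnicodeError, LookupError, HeaderParseError):
            result[name] = value     # show undecodable headers unchanged
    return result

sample = (
    'From: =?utf-8?B?IkFiaWEiIDxBYmlhQG11bHRpLm5ldC5waz4=?=\n'
    'Subject: =?utf-8?B?SGVsbG8=?=\n'
    '\n'
    'body\n'
)
for name, text in decoded_headers(sample).items():
    print('%s: %s' % (name, text))
```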

-- 
Lindsay Haisley       | "The arc of history is long, but
FMP Computer Services |  it bends toward Justice"
512-259-1190          |
http://www.fmp.com    |  - Barack Obama




Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-16 Thread Richard Damon
One thing to note is that you seem to have two different filters at work 
here, one being non-member post, which you want to Discard, and messages 
with 'bad' words in the subject, which you define to Hold. A message 
which matches both filters will be acted on by the first filter that the 
message hits.


I have found that on the list I run, which is on a shared server so I am 
limited to what I can control, the spam filters are checked BEFORE the 
non-member filter, so if I define a spam filter, it can cause me to see 
messages that would otherwise be discarded. This says that I don't want 
to be too aggressive with these filters, as they will create work for me.


If you can adjust the configuration of mailman, you may want to move the 
filter that discards non-member posts before your spam filters.




Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-16 Thread Lindsay Haisley
Here's a more concise summary:

On Sat, 2020-02-15 at 20:00 -0800, Mark Sapiro wrote:
> On 2/15/20 5:58 PM, Lindsay Haisley wrote:
> > We're running Mailman 2.1.18-1 and have a list which is having a porn
> > spam problem. The list is set to discard posts from non-members, and
> > the list moderator has set various filters to try to filter on words
> > which contain "f***", as many do, however the Subject, From and Reply-
> > to addresses are all UTF-8 strings, and are apparently confusing
> > Mailman's decision-making functions, and these posts are ending up in
> > the administrative requests list.  Here's a sample set of headers:
> >
> > From: =?utf-8?B?IkFiaWEiIDxBYmlhQG11bHRpLm5ldC5waz4=?=
[snip..]

* The list admin wants to discard, not hold, _all_ nonmember
submissions. These "problem posts" are getting held, not discarded.

* generic_nonmember_action is set to "Discard" but this isn't working
for these posts.

* From and Reply-to addresses on "problem posts" are base64 (utf-8 ?)
encoded, both in the held post detail and on the held post listing
page, so there's no way of identifying the addresses they represent. They may 
actually be subscribed in their decoded form, or handled in
some other context which prescribes that they be held, not discarded.
(New member posts are moderated by default via
default_member_moderation.)

* From and Reply-to addresses differ from one to another of these
"problem posts", so blocking individual sender addresses is useless, as
is usually the case with spam.

* MM spam filters are apparently irrelevant to this issue.

> If you only want to discard non-member posts with RFC 2047 encoded
> From:, you could put something like
> 
> ^[^@]+@[a-z0-9_.]+$
> 
> in hold_these_nonmembers to hold the ones that at least don't have
> base64 encoded From:

We could use

^[^@]+$

in hold_these_nonmembers and this _might_ discard the base64
addressed "problem posts", but would _hold_ other non-member posts,
which isn't the result we want. We want to discard _all_ non-member
posts, and the problem is that these base64-addressed posts _are_ being
held and not discarded. 
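
For reference, here is what the two regexps quoted above actually match, tried against the lowercased sender string from the sample headers and two made-up ordinary addresses. This only exercises the patterns with Python's re module; it says nothing about where Mailman applies them.

```python
import re

plain_address_rule = r'^[^@]+@[a-z0-9_.]+$'   # Mark's suggestion
no_at_sign_rule = r'^[^@]+$'                  # the variant proposed above

encoded_sender = '=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?='
ordinary_sender = 'someone@example.com'            # hypothetical address
hyphenated_sender = 'someone@my-host.example.com'  # hypothetical address

def matches(rule, sender):
    """True if the rule matches the sender string from its start."""
    return re.match(rule, sender) is not None

for sender in (encoded_sender, ordinary_sender, hyphenated_sender):
    print(sender,
          matches(plain_address_rule, sender),
          matches(no_at_sign_rule, sender))
```

The first pattern matches the ordinary address and not the encoded sender; the second does the reverse. As written, the first pattern also misses domains containing a hyphen, because "-" is not in its character class.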

-- 
Lindsay Haisley       | "The arc of history is long, but
FMP Computer Services |  it bends toward Justice"
512-259-1190          |
http://www.fmp.com    |  - Barack Obama




Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-15 Thread Lindsay Haisley
On Sat, 2020-02-15 at 20:00 -0800, Mark Sapiro wrote:
> On 2/15/20 5:58 PM, Lindsay Haisley wrote:
> > We're running Mailman 2.1.18-1 and have a list which is having a porn
> > spam problem. The list is set to discard posts from non-members, and
> > the list moderator has set various filters to try to filter on words
> > which contain "f***", as many do, however the Subject, From and Reply-
> > to addresses are all UTF-8 strings, and are apparently confusing
> > Mailman's decision-making functions, and these posts are ending up in
> > the administrative requests list.  Here's a sample set of headers:
> 
> 
> Exactly what filters are used?

The only filter relevant to this issue is "(?i)Subject: .*[fuck]". It
apparently isn't working, or the syntax isn't proper (although the re
syntax looks OK to me). I didn't put it there; the list admin/moderator
did.

> header_filter_rules will RFC 2047 decode the headers.

We don't know what to put into the rules for From and Reply-to since
these are encoded in the message detail, as they are in the displayed
headers. And even if the from headers could be put here, it is, as I
said, a game of whack-a-mole.

The held message page section header is no help, i.e.

  Held Messages
  --
 From:=?utf-8?b?ik1hcmlliia8twfyawvay29ty2fzdgj1c2luzxnzlm5ldd4=?=

 etc.

> mm_cfg.KNOWN_SPAMMERS and bounce_matching_headers do not, but since
> bounce_matching_headers only holds the message, I'm guessing you aren't
> using that, and since list owners can't set mm_cfg.KNOWN_SPAMMERS, I'm
> guessing you aren't using that either.

I run the system, and have access to mm_cfg, so I can put what's
necessary there, but I would assume one would have to match decoded
From headers, and all these headers in the held posts are encoded.
They're mostly different from message to message, so in any event
blocking by From header, either encoded or decoded, is as I said an
exercise in whack-a-mole.

> > MM is properly decoding the Subject in the message detail headers, but
> > not the From address.
> > 
> > Is there any way to get Mailman to properly handle these?
> 
> 
> If the only issue is the From: or other sender header, Mailman doesn't
> RFC 2047 decode those in trying to determine if the sender is a member,
> but what's the issue? If you are trying to match a specific address in
> discard_these_nonmembers, I see the problem, but you can discard them by
> setting generic_nonmember_action to discard.

This _is_ how it's set.

> If you only want to discard non-member posts with RFC 2047 encoded
> From:, you could put something like
> 
> ^[^@]+@[a-z0-9_.]+$

> in hold_these_nonmembers to hold the ones that at least don't have
> base64 encoded From:

The list manager has set generic_nonmember_action to Discard, which
should be the last word, _unless_ the _decoded_ from address shows up
in some other place such that the message is held for approval rather
than discarded outright, which is the desired action.
generic_nonmember_action only comes into play if "no explicit action
is defined", so perhaps there's a match somewhere, but again, the From
headers are encoded in the held message detail, so it's hard to tell.
The offending poster may even have joined the list (the list moderator
has default_member_moderation turned on.) 

Is there a function or class method in the Python code which can be used
to decode these headers? As you may recall, I'm somewhat Python
literate - actually a minor contributor to the MM 2 code base :)

I also looked at bounce_matching_headers. The explanation and name on
this setting is ambiguous since the name implies a full bounce, but the
explanation says that posts are "held" [for moderation].
 
-- 
Lindsay Haisley       | "The arc of history is long, but
FMP Computer Services |  it bends toward Justice"
512-259-1190          |
http://www.fmp.com    |  - Barack Obama




Re: [Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

2020-02-15 Thread Mark Sapiro
On 2/15/20 5:58 PM, Lindsay Haisley wrote:
> We're running Mailman 2.1.18-1 and have a list which is having a porn
> spam problem. The list is set to discard posts from non-members, and
> the list moderator has set various filters to try to filter on words
> which contain "f***", as many do, however the Subject, From and Reply-
> to addresses are all UTF-8 strings, and are apparently confusing
> Mailman's decision-making functions, and these posts are ending up in
> the administrative requests list.  Here's a sample set of headers:


Exactly what filters are used?

header_filter_rules will RFC 2047 decode the headers.
mm_cfg.KNOWN_SPAMMERS and bounce_matching_headers do not, but since
bounce_matching_headers only holds the message, I'm guessing you aren't
using that, and since list owners can't set mm_cfg.KNOWN_SPAMMERS, I'm
guessing you aren't using that either.



> MM is properly decoding the Subject in the message detail headers, but
> not the From address.
> 
> Is there any way to get Mailman to properly handle these?


If the only issue is the From: or other sender header, Mailman doesn't
RFC 2047 decode those in trying to determine if the sender is a member,
but what's the issue? If you are trying to match a specific address in
discard_these_nonmembers, I see the problem, but you can discard them by
setting generic_nonmember_action to discard.

If you only want to discard non-member posts with RFC 2047 encoded
From:, you could put something like

^[^@]+@[a-z0-9_.]+$

in hold_these_nonmembers to hold the ones that at least don't have
base64 encoded From:


-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan