Re: [Mailman-Developers] Regexp filtering

2016-03-02 Thread Adam McGreggor
On Tue, Mar 01, 2016 at 11:13:18PM +0900, Stephen J. Turnbull wrote:
> Adam McGreggor writes:
> 
>  > Or could we meet user expectations (real users, not geeks), [and
>  > allow glob syntax].
> 
> Definitely worth discussing, but my initial reaction is negative for
> the reasons discussed below.
> 
>  > Simples:
>  > *@mail.ru
>  > *@*mail.ru
>  > ?@mail.ru
> 
> Are those anchored?  At the beginning of string?  At end?  

'throughout'.


> Is there really a use case for "?"?  I don't see this as an obvious feature.

I'd imagine there could be some use for people wanting, say, to handle
five-character localparts of an address. Although it's an inelegant
approach, it's something a user can understand, without needing to
understand regexp ("all our new subscriptions are five characters
before the @ sign. I want to block them").

> Globs are also too blunt for the use case, especially since bad actors
> do deliberately use fine distinctions between well-known domains and
> their own sinkholes of depravity when phishing. 

True. (I was picking on mail.ru, as it's one of the common ones that I
find quite irresponsible).

> Users are likely to
> be lazy, using "*@*mail.ru" to catch both "badac...@mail.ru" and
> "badac...@spamsource.mail.ru", trashing "nice...@goodmail.ru"'s posts
> in the process.

Are they going to use *@* necessarily, or just *@? (Unless they want
subdomains, in which case "*@*.mail.ru" might be acceptable.)

> 
>  > Off the top of my head, the syntax would define if it's an absolute
>  > address (f...@example.com) vs a regexp.
> 
> "f...@example.com" is unambiguous, but "foo+mail...@example.com" is
> not.  That's a big trap for users, who surely know exactly what they
> mean by that (and it's not fmail...@example.com!)

Agree.

-- 
"applying logic to English slang is never a sound idea"
-- Stephen Fry
___
Mailman-Developers mailing list
Mailman-Developers@python.org
https://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: 
http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-developers/archive%40jab.org

Security Policy: http://wiki.list.org/x/QIA9


Re: [Mailman-Developers] Regexp filtering

2016-03-02 Thread Adam McGreggor
On Tue, Mar 01, 2016 at 09:26:13AM -0500, Barry Warsaw wrote:
> globs make sense for file system operations, and we've been using them for
> decades in shells.  I think globs make less sense for header value pattern
> matching.

Looking at my sieve/procmail recipes, I rarely use globs (except in
blacklisting), it seems.

In the blacklisting case, it's against words in Subject: lines, as
well as Sender:/From: headers. I'd imagine (for those still using such
things), that's a fairly common approach.

-- 
"Ink is handicapped, in a way, because you can blow up a man with gunpowder in
 half a second, while it may take twenty years to blow him up with a book. But
 the gunpowder destroys itself along with its victim, while a book can keep on
 exploding for centuries."
-- Christopher Morley


Re: [Mailman-Developers] Regexp filtering

2016-03-01 Thread Barry Warsaw
On Mar 01, 2016, at 11:13 PM, Stephen J. Turnbull wrote:

>In theory we could use globs as well (some of the modern VCSes permit
>glob or regexp syntax), but it's not a serious data loss issue for a
>VCS if a mistake is made.  You just run the add command again with -f,
>or uncommit, or whatever.  Granted, a perverse enough user could fail
>to add a file, commit, then overwrite the file, but this is much less
>serious than the possibility that a particular user would end up as
>collateral damage to a spam filter.

globs make sense for file system operations, and we've been using them for
decades in shells.  I think globs make less sense for header value pattern
matching.

Cheers,
-Barry


Re: [Mailman-Developers] Regexp filtering

2016-03-01 Thread Stephen J. Turnbull
Adam McGreggor writes:

 > Or could we meet user expectations (real users, not geeks), [and
 > allow glob syntax].

Definitely worth discussing, but my initial reaction is negative for
the reasons discussed below.

 > Simples:
 > *@mail.ru
 > *@*mail.ru
 > ?@mail.ru

Are those anchored?  At the beginning of string?  At end?  Is there
really a use case for "?"?  I don't see this as an obvious feature.
Globs are also too blunt for the use case, especially since bad actors
do deliberately use fine distinctions between well-known domains and
their own sinkholes of depravity when phishing.  Users are likely to
be lazy, using "*@*mail.ru" to catch both "badac...@mail.ru" and
"badac...@spamsource.mail.ru", trashing "nice...@goodmail.ru"'s posts
in the process.
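To make the over-matching concern concrete, here is a minimal sketch using Python's standard `fnmatch` module. The full addresses are made up for illustration (the ones quoted above are truncated), and nothing here is Mailman code; it only shows how a shell-style glob behaves when matched against the whole address.

```python
from fnmatch import fnmatchcase

# Hypothetical addresses standing in for the truncated ones in the thread.
ADDRESSES = [
    "badactor@mail.ru",
    "badactor@spamsource.mail.ru",
    "niceguy@goodmail.ru",
]


def glob_hits(pattern, addresses=ADDRESSES):
    """Return the addresses matched by a shell-style glob (whole-string match)."""
    return [a for a in addresses if fnmatchcase(a, pattern)]


# The "lazy" glob also catches the unrelated goodmail.ru domain.
lazy = glob_hits("*@*mail.ru")

# Narrower globs separate the bare domain from its subdomains.
exact = glob_hits("*@mail.ru")
subdomains = glob_hits("*@*.mail.ru")
```

Under these assumptions, covering both the domain and its subdomains without hitting goodmail.ru takes two glob entries.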

 > Off the top of my head, the syntax would define if it's an absolute
 > address (f...@example.com) vs a regexp.

"f...@example.com" is unambiguous, but "foo+mail...@example.com" is
not.  That's a big trap for users, who surely know exactly what they
mean by that (and it's not fmail...@example.com!)
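A small sketch of that trap, using a made-up plus-addressed entry (the address in the thread is truncated, so `foo+list@example.com` is only a stand-in). It shows what happens when such an entry is read as a Python regexp rather than as a literal, and how `re.escape` restores the literal meaning.

```python
import re

# Hypothetical plus-addressed entry; the thread's own example is truncated.
entry = "foo+list@example.com"

# Read as a regexp, "o+" means "one or more o", so the entry no longer
# matches its own literal text.
as_regexp = re.fullmatch(entry, "foo+list@example.com")

# Instead it matches an address the user never meant.
unintended = re.fullmatch(entry, "foolist@example.com")

# re.escape() quotes the metacharacters so the entry matches only itself.
as_literal = re.fullmatch(re.escape(entry), "foo+list@example.com")
```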

In theory we could use globs as well (some of the modern VCSes permit
glob or regexp syntax), but it's not a serious data loss issue for a
VCS if a mistake is made.  You just run the add command again with -f,
or uncommit, or whatever.  Granted, a perverse enough user could fail
to add a file, commit, then overwrite the file, but this is much less
serious than the possibility that a particular user would end up as
collateral damage to a spam filter.

Steve


Re: [Mailman-Developers] Regexp filtering

2016-03-01 Thread Adam McGreggor
On Tue, Mar 01, 2016 at 04:37:16AM +0900, Stephen J. Turnbull wrote:
> Barry Warsaw writes:
> 
>  > IBan would need to have a flag which indicates whether the `email`
>  > is a literal address or a pattern.  I don't think it's worth having
>  > two separate interfaces/models, but we might want to rename `email`
>  > to something more generic (`pattern` would be fine, with the
>  > understanding that is_regexp=False means the pattern is a literal).
> 
> Are regexps sufficiently slow that *always* using a regexp would hurt
> performance?[1]  The model I really had in mind was to always use
> regexps, and have a flag in the UI (Postorius) to regexp-quote when
> the user wants a literal.

Or could we meet user expectations (real users, not geeks), and just
interpret * and ? (for example) as being regexp values, as well as
letting power users use more complicated regexps?

Essentially the two classes:

Simples:
*@mail.ru
*@*mail.ru
?@mail.ru

Power-user:
^.*\+.*?\d{3,}@
\.*j\.*o\.*e\.*b\.*l\.*o\.*w\.*+.*@gmail\.com

and the sort we saw in the threads around bot subscriptions and
regexps on Mailman-user?
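One way the "simple" class could be mapped onto the same machinery as the power-user class is to translate each glob into a regexp. The sketch below uses Python's standard `fnmatch.translate`; the helper name `glob_to_regexp` and the sample addresses are invented for illustration, and this is not something Mailman is stated to do.

```python
import fnmatch
import re


def glob_to_regexp(glob):
    """Compile a shell-style glob into an equivalent whole-string regexp."""
    return re.compile(fnmatch.translate(glob))


any_mail_ru = glob_to_regexp("*@mail.ru")

# "?" stands for exactly one character, so a five-character localpart
# needs five of them.
one_char = glob_to_regexp("?@mail.ru")
five_chars = glob_to_regexp("?????@mail.ru")
```

Use `.match()` on the compiled patterns: the translated regexp already carries its own end-of-string anchor, so the whole address has to fit the glob.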


Off the top of my head, the syntax would define if it's an absolute
address (f...@example.com) vs a regexp.

-- 
"I never make predictions. I never have, and I never will."
-- Tony Blair


Re: [Mailman-Developers] Regexp filtering

2016-02-29 Thread Stephen J. Turnbull
Barry Warsaw writes:
 > On Mar 01, 2016, at 04:37 AM, Stephen J. Turnbull wrote:

 > >Or we could continue to have the core representation be "leading '^'
 > >iff regexp", and once again have Postorius prepend "^.*" or whatever.
 > 
 > In which case, the core's model wouldn't have to change, right?

That's the point, yes!

 > I really want to avoid regexp-quoted strings for literals in the
 > model.  I'm fine if the core model doesn't change but Postorius
 > makes things nicer for the user.

OK on avoidance, you're the FLUFL after all!  If Terri or Florian
doesn't pipe up with total hate soon (== by the time I getta round
tuit), I'll file a feature request.

Steve


Re: [Mailman-Developers] Regexp filtering

2016-02-29 Thread Barry Warsaw
On Mar 01, 2016, at 04:37 AM, Stephen J. Turnbull wrote:

>Are regexps sufficiently slow that *always* using a regexp would hurt
>performance?[1]  The model I really had in mind was to always use
>regexps, and have a flag in the UI (Postorius) to regexp-quote when
>the user wants a literal.

I think it's less about performance and more about being explicit.  My own
sense is that literals are more common than regexps, and that in general
regexps are more difficult to understand, but I don't have a lot of data
points to back that up.

>Or we could continue to have the core representation be "leading '^'
>iff regexp", and once again have Postorius prepend "^.*" or whatever.

In which case, the core's model wouldn't have to change, right?

I really want to avoid regexp-quoted strings for literals in the model.  I'm
fine if the core model doesn't change but Postorius makes things nicer for the
user.

Cheers,
-Barry



Re: [Mailman-Developers] Regexp filtering

2016-02-29 Thread Stephen J. Turnbull
Barry Warsaw writes:

 > IBan would need to have a flag which indicates whether the `email`
 > is a literal address or a pattern.  I don't think it's worth having
 > two separate interfaces/models, but we might want to rename `email`
 > to something more generic (`pattern` would be fine, with the
 > understanding that is_regexp=False means the pattern is a literal).

Are regexps sufficiently slow that *always* using a regexp would hurt
performance?[1]  The model I really had in mind was to always use
regexps, and have a flag in the UI (Postorius) to regexp-quote when
the user wants a literal.

Or we could continue to have the core representation be "leading '^'
iff regexp", and once again have Postorius prepend "^.*" or whatever.
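A rough sketch of the second option, keeping the "leading '^' iff regexp" representation in the core and letting the UI add the prefix. Both function names are hypothetical, and the case-insensitive comparison is an assumption of this sketch rather than something stated in the thread.

```python
import re


def entry_matches(entry, address):
    """Apply the "leading '^' iff regexp" convention to one stored entry."""
    if entry.startswith("^"):
        # Assumption: regexps are applied with re.match, ignoring case.
        return re.match(entry, address, re.IGNORECASE) is not None
    # Anything else is compared as a literal address.
    return entry.lower() == address.lower()


def ui_to_core(text, is_regexp):
    """Hypothetical UI-side helper: turn a form value into a stored entry."""
    if not is_regexp or text.startswith("^"):
        return text
    # Prepend "^.*" so an unanchored pattern is still recognised as a regexp.
    return "^.*" + text
```

With this split, a user who ticks a "regular expression" box and types `@domain\.com$` ends up with the stored entry `^.*@domain\.com$`, while unticked entries stay as plain addresses.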


Footnotes: 
[1]  XEmacs actually checks whether a regexp contains any regexp
operators and automatically switches to a very fast literal search if
not.
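The same idea can be sketched in Python. This is only an illustration of the fast-path technique described in the footnote, not XEmacs's actual implementation; the operator set and the function name `search` are choices made for this example.

```python
import re

# Characters that carry special meaning in Python regexps.
_REGEXP_OPERATORS = frozenset(r".^$*+?{}[]()|" + "\\")


def search(pattern, text):
    """Return the index of the first match of pattern in text, or -1."""
    if _REGEXP_OPERATORS.isdisjoint(pattern):
        # No operators present: a plain substring search gives the same answer.
        return text.find(pattern)
    found = re.search(pattern, text)
    return found.start() if found else -1
```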




Re: [Mailman-Developers] Regexp filtering

2016-02-29 Thread Barry Warsaw
On Feb 27, 2016, at 02:02 PM, Stephen J. Turnbull wrote:

>I hope we haven't propagated this rather user-unfriendly interface
>(the convention of accepting both regexps and literals, distinguishing
>by "^" in column 0) to Mailman 3.

Sadly, it's true.

Mostly this is historical since we've essentially just ported the data and
code from Mailman 2.  It was implemented this way because of the limitations
for data modeling, and the unsophisticated web ui in MM2.

We could do better in MM3, because we can model the data better, we can
expose the distinction in REST, and Postorius could expose the difference in a
much better web ui.

Here's a rough sketch of what you'd have to do in the core to make this
change.  As always merge requests are welcome!

IBan would need to have a flag which indicates whether the `email` is a literal
address or a pattern.  I don't think it's worth having two separate
interfaces/models, but we might want to rename `email` to something more
generic (`pattern` would be fine, with the understanding that is_regexp=False
means the pattern is a literal).  You'll need to change a bunch of checks and
what-not in the ban management code.
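For concreteness, a minimal sketch of what such a record could look like. The `Ban` class and `is_banned` helper are illustrative stand-ins, not Mailman's actual model or ban manager; only the names `pattern` and `is_regexp` come from the description above, and the case-insensitive matching is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Ban:
    """Illustrative stand-in for a ban record; not Mailman's actual model."""

    pattern: str
    is_regexp: bool = False  # False means `pattern` is a literal address

    def matches(self, address):
        if self.is_regexp:
            # Assumption: patterns are applied with re.match, ignoring case.
            return re.match(self.pattern, address, re.IGNORECASE) is not None
        return self.pattern.lower() == address.lower()


def is_banned(address, bans):
    """Return True if any ban record matches the address."""
    return any(ban.matches(address) for ban in bans)
```

Because the flag is explicit, a literal such as `foo+list@example.com` never needs regexp quoting, and a regexp no longer needs a leading `^` just to be recognised as one.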

This also shows up in AcceptableAliases, so a similar change would have to be
made to IAcceptableAlias, the various model implementation bits of that
interface, and the implicit_dest.py rule.

The REST API for these would probably need some additional work, but that
can easily be done.  The trickiest part would be if IBan.email is renamed,
in which case you'd probably want to continue to accept the old data format
for the 3.0 API (and translate it into the new model layer), but only accept
the new data format in the 3.1 API.  There are examples of how to do
API-version differentiation.
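The translation from the old data format could be as small as the sketch below. The function name and the dictionary shape are hypothetical; it assumes the old `email` value follows the "leading '^' means regexp" convention and that the new layer uses the `pattern` and `is_regexp` fields suggested above.

```python
def from_legacy_email(email):
    """Translate an old-style `email` value into the proposed new fields.

    The leading "^" is kept in the pattern; it is harmless if the pattern
    is later applied with re.match.
    """
    return {"pattern": email, "is_regexp": email.startswith("^")}
```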

It's still used in the *_these_nonmember checks (moderation.py rule), but as
these are legacy facilities from Mailman 2, I'm not sure they need to change.
Eventually, we want to remove these settings anyway, since all the
functionality is implemented differently and better in MM3 already.

Another odd use of this is in the `withlist` subcommand.

(It's also used in the wsgiref/falcon plumbing layer, but since that's all
internal implementation details, nothing here needs to change.)

You'd need to handle database migrations and documentation updates too, along
with a robust test suite, but there's nothing intractable about any of this.

Cheers,
-Barry


Re: [Mailman-Developers] Regexp filtering

2016-02-27 Thread Stephen J. Turnbull
Mark Sapiro writes:

 > I agree it's confusing, and I've been caught in this confusion myself
 > and neglected to put the leading ^ in what I clearly intended to be a
 > regexp, but the convention goes back a long way in MM2.

Oh, of course I'm -1 on changing "regexps start with '^'" convention
in Mailman 2 myself!


Re: [Mailman-Developers] Regexp filtering

2016-02-26 Thread Mark Sapiro
On 02/26/2016 09:02 PM, Stephen J. Turnbull wrote:
> On Mailman-Users, Mark Sapiro writes:
> 
>  > Further, in the ban_list (and many other places in Mailman) if an
>  > address is intended to be a regular expression pattern, it must begin
>  > with '^', so you really want
>  > 
>  > ^.*@domain\.com$
>  > 
>  > to match any_addr...@domain.com.
> 
> I hope we haven't propagated this rather user-unfriendly interface
> (the convention of accepting both regexps and literals, distinguishing
> by "^" in column 0) to Mailman 3.  Even as a Python programmer, I find
> Mark's post somewhat confusing: I would design filters using
> re.search, so that the above would actually be equivalent as a Python
> regular expression to r"@domain\.com$".  OTOH, if the implementation
> uses re.match, the "^" is redundant, so I have a "say what?!" event.


I agree it's confusing, and I've been caught in this confusion myself
and neglected to put the leading ^ in what I clearly intended to be a
regexp, but the convention goes back a long way in MM2.


> If we have, I propose changing it to
> 
> Ban these addresses, one entry per line: []
> [ ] Entries are regular expressions.
> 
> or something like that.  We also ought to have a "Python features for
> Mailman administrators" section of the FAQ, starting with "what is a
> regular expression", and giving examples of how to accomplish common
> tasks like banning a whole domain with regular expressions.  Typical
> regexp FAQs are hard for non-programmers (and even beginning
> programmers) to grasp.
> 
> I don't have time to actually work on these now, but if there's uptake
> on the suggestion ("let's think about it" at +0 or above :-) I'll file
> issues.


I'm not sure what the MM3 story is at this point, but +1 for Steve's idea.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


[Mailman-Developers] Regexp filtering

2016-02-26 Thread Stephen J. Turnbull
On Mailman-Users, Mark Sapiro writes:

 > Further, in the ban_list (and many other places in Mailman) if an
 > address is intended to be a regular expression pattern, it must begin
 > with '^', so you really want
 > 
 > ^.*@domain\.com$
 > 
 > to match any_addr...@domain.com.

I hope we haven't propagated this rather user-unfriendly interface
(the convention of accepting both regexps and literals, distinguishing
by "^" in column 0) to Mailman 3.  Even as a Python programmer, I find
Mark's post somewhat confusing: I would design filters using
re.search, so that the above would actually be equivalent as a Python
regular expression to r"@domain\.com$".  OTOH, if the implementation
uses re.match, the "^" is redundant, so I have a "say what?!" event.
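The difference between the two calls can be checked directly in Python. The address below is made up for illustration (the one quoted above is truncated); the patterns are the ones from this message.

```python
import re

address = "anyone@domain.com"  # hypothetical address

# re.match only tries at position 0, so a leading "^" adds nothing.
with_caret = re.match(r"^.*@domain\.com$", address)
without_caret = re.match(r".*@domain\.com$", address)

# re.search scans the whole string, so the ".*" prefix is unnecessary too.
searched = re.search(r"@domain\.com$", address)

# The short pattern fails under re.match, because the address does not
# start with "@".
short_with_match = re.match(r"@domain\.com$", address)
```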

If we have, I propose changing it to

Ban these addresses, one entry per line: []
[ ] Entries are regular expressions.

or something like that.  We also ought to have a "Python features for
Mailman administrators" section of the FAQ, starting with "what is a
regular expression", and giving examples of how to accomplish common
tasks like banning a whole domain with regular expressions.  Typical
regexp FAQs are hard for non-programmers (and even beginning
programmers) to grasp.

I don't have time to actually work on these now, but if there's uptake
on the suggestion ("let's think about it" at +0 or above :-) I'll file
issues.
