Re: Proposal for 1.2: Dumber email validation

2009-10-15 Thread Ulrich Petri


> Russell raises my biggest concern with this proposal.  There are
> a lot of smart folks in the Django-Developers end of things that
> can cobble together a pretty legit regexp that covers the
> majority of cases with no horrific DOS cases (e.g. last security
> issue).
>
...
> My initial candidate is ticket #12005, though it merely
> re.VERBOSE's the original and tweaks the domain portion to meet
> an internal need.  Some changes on the stuff before the "@" might
> make it more "relaxed" (if not RFC-compliant-ish) while keeping
> out some of the badness.
>

Of course, it would also be possible to use a non-regex approach. There
are libraries by Dominic Sayers [1] and Cal Henderson [2] with a ton
of tests. Unfortunately they are written in PHP, but they shouldn't be
too hard to translate to Python. As a bonus, the authors claim that these
libraries validate addresses in full compliance with the RFC.

Ulrich

[1] http://www.dominicsayers.com/isemail/
[2] http://code.iamcal.com/php/rfc822/
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---



Re: Proposal for 1.2: Dumber email validation

2009-10-15 Thread Tim Chase

> 1) If we encourage people to write their own regex if they want
> tighter email validation, we run the risk that users will
> inadvertently introduce the same bug that we have just fixed. 

Russell raises my biggest concern with this proposal.  There are 
a lot of smart folks in the Django-Developers end of things that 
can cobble together a pretty legit regexp that covers the 
majority of cases with no horrific DOS cases (e.g. last security 
issue).

I've seen the regexps created by people who don't comprehend them 
and it's UGLY.  This proposal basically throws those people to 
the wolves.

I'd much rather Django provided an email field that got most of 
the way and let regexp-understanding users tweak if needed.  But 
I'd hate to see somebody opening themselves to email addresses like

   bad_stuff()@wherever.space space.&

or

   f...@domain.tld\x0a\x0dfrom: s...@spammer.spam\x0a\x0dto: s...@spimmer.spim\x0a\x0d\x0a\x0dspam, spam, spam!

which can (without added caution) inject headers into sent-mail.
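
As a rough illustration of the kind of caution meant here, a value can be
checked for line-break characters before it gets anywhere near a mail
header. This is only a sketch: the function name and the sample address
are made up for the example, and it is not what Django does internally.

```python
def contains_header_break(value):
    """Return True if value contains a CR or LF character.

    Either character could end the current mail header and start a new
    one, which is what makes the injection above possible.
    """
    return "\r" in value or "\n" in value


# Hypothetical hostile input: a plausible address followed by an extra header.
address = "someone@example.com\r\nBcc: victim@example.com"

if contains_header_break(address):
    print("rejected: address contains a line break")
```

A check like this is independent of whatever pattern is used for the
address itself, so it still applies if the validation regex is relaxed.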

My initial candidate is ticket #12005, though it merely 
re.VERBOSE's the original and tweaks the domain portion to meet 
an internal need.  Some changes on the stuff before the "@" might 
make it more "relaxed" (if not RFC-compliant-ish) while keeping 
out some of the badness.

-tim







Re: Proposal for 1.2: Dumber email validation

2009-10-15 Thread Chris Adams

On Oct 10, 9:35 am, James Bennett wrote:
> So what I'd like to propose is that EmailField essentially check that
> the value contains an '@', and a '.' somewhere after it. This will
> cover most addresses that are likely to be in actual use, and various
> confirmation processes can be used to rule out any invalid addresses
> which happen to slip through that.

Good idea - every real project I've had where this became an issue had
to switch to some sort of actual mail-based validation system
(confirmation, live MX connection, etc.). Adding doc links about email
validation tools would be better because it'd establish that this is
something which deserves some thought if you rely on email addresses.
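
For reference, the check James describes (an '@', and a '.' somewhere
after it) could look roughly like the sketch below. The function name is
invented for the example, and this is one possible reading of the
proposal, not code from Django.

```python
def looks_like_email(value):
    """Loose check: value contains an '@' and a '.' somewhere after it."""
    at = value.find("@")
    # find() returns -1 when there is no '@' at all.
    return at != -1 and "." in value[at + 1:]


for candidate in ("user@example.com", "user@localhost", "no.at.sign"):
    print(candidate, looks_like_email(candidate))
```

Note how little this rules out: even "@." passes, which is why a
confirmation step is still needed to tell whether an address works.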

Chris



Re: Proposal for 1.2: Dumber email validation

2009-10-10 Thread Jeremy Dunck

Ned,
   You really ought to show us all how to use that time machine. :)


On Oct 10, 2009, at 8:49 AM, Ned Batchelder wrote:

>
> +1
>
> http://nedbatchelder.com/blog/200908/humane_email_validation.html
>
> I was going to kibitz on the fix (removing a single * would have
> sufficed), and realized we were once again in the quagmire of email
> regex validation.
>
> --Ned.
>
> James Bennett wrote:
>> In light of yesterday's security issue, I'd like to propose that we
>> significantly dumb down the regex Django uses to validate email
>> addresses.
>>
>> Currently, the regex we use covers many common cases, but comes
>> nowhere near covering the entire spectrum of addresses allowed by the
>> RFC; several tickets are open regarding this. Trying to cover more of
>> the RFC is possible, although supporting all valid email addresses is
>> not (various regexes claim to do this, but full coverage is impossible
>> -- the RFC is flexible enough WRT things like nested comments that I'm
>> fairly certain no single regex can handle them all), and -- as we've
>> seen -- attempts to cover a broader chunk of the RFC can introduce
>> issues with performance.
>>
>> So what I'd like to propose is that EmailField essentially check that
>> the value contains an '@', and a '.' somewhere after it. This will
>> cover most addresses that are likely to be in actual use, and various
>> confirmation processes can be used to rule out any invalid addresses
>> which happen to slip through that. Meanwhile, people who want to
>> support comments inside a bang path or other such exotic beasts can
>> simply write their own regex for it and tell a form to use that
>> instead.
>>




Re: Proposal for 1.2: Dumber email validation

2009-10-10 Thread Russell Keith-Magee

On Sat, Oct 10, 2009 at 9:35 PM, James Bennett wrote:
>
> In light of yesterday's security issue, I'd like to propose that we
> significantly dumb down the regex Django uses to validate email
> addresses.
>
> Currently, the regex we use covers many common cases, but comes
> nowhere near covering the entire spectrum of addresses allowed by the
> RFC; several tickets are open regarding this. Trying to cover more of
> the RFC is possible, although supporting all valid email addresses is
> not (various regexes claim to do this, but full coverage is impossible
> -- the RFC is flexible enough WRT things like nested comments that I'm
> fairly certain no single regex can handle them all), and -- as we've
> seen -- attempts to cover a broader chunk of the RFC can introduce
> issues with performance.
>
> So what I'd like to propose is that EmailField essentially check that
> the value contains an '@', and a '.' somewhere after it. This will
> cover most addresses that are likely to be in actual use, and various
> confirmation processes can be used to rule out any invalid addresses
> which happen to slip through that. Meanwhile, people who want to
> support comments inside a bang path or other such exotic beasts can
> simply write their own regex for it and tell a form to use that
> instead.

+1, with two additions:

1) If we encourage people to write their own regex if they want
tighter email validation, we run the risk that users will
inadvertently introduce the same bug that we have just fixed. We
should probably beef up the documentation of RegexField to highlight
the potential problem, give a few examples of how it can be triggered,
and give some links to useful resources.

2) I think we should relax the analogous regex check on URLField.
This one is slightly self-serving - one of the customers has an
internal network in which machines are named mymachine.foowhiz -
which is a violation of the RFC because of the 7-character TLD, but
that doesn't change the fact that it works fine on their internal
network.
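
To illustrate the kind of example point 1 has in mind for the RegexField
documentation: the sketch below shows a generic pattern shape that is
prone to heavy backtracking, next to a simpler pattern that accepts the
same strings. The patterns and names are made up for illustration and
are not the regex that was fixed in Django.

```python
import re

# Nested quantifiers: the inner and outer "+" can divide a run of "a"s in
# many different ways, so an input that almost matches (a long run of "a"
# followed by another character) makes the engine try them all.
RISKY = re.compile(r"^(a+)+$")

# A single quantifier accepts the same strings with only one way to match.
SAFE = re.compile(r"^a+$")


def same_verdict(text):
    """Return True if both patterns agree on whether text matches."""
    return bool(RISKY.match(text)) == bool(SAFE.match(text))


# Deliberately short inputs: the work RISKY does on a near miss grows
# very quickly with the length of the input.
for sample in ("aaaa", "aaab", ""):
    print(repr(sample), same_verdict(sample))
```

The two patterns give the same answers; the difference only shows up in
how long the first one takes on longer near-miss input.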

A quick survey of tickets affected by this:

#9764 - Validation on internationalized domain names
#9202 - URLField validation
#7334 - non-ASCII domains (possible dupe of #9764)
#6092 - Allow custom validator for URL and Email fields

Russ %-)




Re: Proposal for 1.2: Dumber email validation

2009-10-10 Thread Ned Batchelder

+1

http://nedbatchelder.com/blog/200908/humane_email_validation.html

I was going to kibitz on the fix (removing a single * would have 
sufficed), and realized we were once again in the quagmire of email 
regex validation.

--Ned.

James Bennett wrote:
> In light of yesterday's security issue, I'd like to propose that we
> significantly dumb down the regex Django uses to validate email
> addresses.
>
> Currently, the regex we use covers many common cases, but comes
> nowhere near covering the entire spectrum of addresses allowed by the
> RFC; several tickets are open regarding this. Trying to cover more of
> the RFC is possible, although supporting all valid email addresses is
> not (various regexes claim to do this, but full coverage is impossible
> -- the RFC is flexible enough WRT things like nested comments that I'm
> fairly certain no single regex can handle them all), and -- as we've
> seen -- attempts to cover a broader chunk of the RFC can introduce
> issues with performance.
>
> So what I'd like to propose is that EmailField essentially check that
> the value contains an '@', and a '.' somewhere after it. This will
> cover most addresses that are likely to be in actual use, and various
> confirmation processes can be used to rule out any invalid addresses
> which happen to slip through that. Meanwhile, people who want to
> support comments inside a bang path or other such exotic beasts can
> simply write their own regex for it and tell a form to use that
> instead.
