Unicode normalization for username field

2016-04-21 Thread Rick Leir
Hi all,
We have discussed the possibility of username spoofing in the users list. 

https://groups.google.com/d/msg/django-users/Q0WDYqJsBsY/Sq-P0814LwAJ

"It's not important until this happens: 
https://labs.spotify.com/2013/06/18/creative-usernames/"

But my searches did not turn up anything in this list. Would you point me 
at any relevant discussions here please?
Thanks -- Rick

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/b578b3b8-6887-41e6-978d-ffad4096822b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Unicode normalization for username field

2016-04-21 Thread Tim Graham
Here is one: 
https://groups.google.com/d/topic/django-developers/6aAHgP5g0lA/discussion
(all I did was search "unicode username")

Here's a relevant Trac ticket: https://code.djangoproject.com/ticket/21379




Re: Unicode normalization for username field

2016-04-21 Thread Rick Leir
Thanks. To summarize quickly (corrections welcome):

2008 - Usernames in django.contrib.auth are restricted to ASCII  
alphanumerics. Allowing Unicode seems fairly simple: compile the  
validator's regular expression with the re.UNICODE flag.
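
As a rough illustration of the flag-based approach described above, the sketch below compiles one pattern twice on Python 3. The pattern is an assumption chosen for illustration, not a quote from Django's source. On Python 3, str patterns already match Unicode by default, so re.ASCII is the flag that restores the restricted behavior.

```python
import re

# Assumed username pattern, for illustration only.
USERNAME_PATTERN = r"^[\w.@+-]+$"

# \w limited to [a-zA-Z0-9_]
ascii_username_re = re.compile(USERNAME_PATTERN, re.ASCII)
# \w covers Unicode letters and digits (the Python 3 default for str patterns)
unicode_username_re = re.compile(USERNAME_PATTERN, re.UNICODE)

plain = "rene"
accented = "ren\u00e9"  # 'rené' with a composed é

print(bool(ascii_username_re.match(plain)))       # True
print(bool(ascii_username_re.match(accented)))    # False
print(bool(unicode_username_re.match(accented)))  # True
```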

but:
http://en.wikipedia.org/wiki/Internationalized_domain_name#ASCII_Spoofing_and_squatting_concerns

2014 - Trac ticket still open, with no tests and no patches, and complicated 
by differences between py2 and py3 (py2 is supported until 2017).

Normalization could be done with 
https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize
unicodedata.normalize('NFKD', input)





Re: Unicode normalization for username field

2016-04-21 Thread Aymeric Augustin
Hello,

Judging from the (rather confused) discussion on the users list, it looks like 
we’re discussing in the abstract. No one has tested whether the problem can 
happen with Django.

Since the ticket quoted below says Django (unexpectedly) accepts non-ASCII 
usernames on Python 3, it’s just a matter of trying to create a user called 
rené (composed: é as the single code point U+00E9) and one called rené 
(decomposed: e followed by the combining acute accent U+0301).

Here’s how to create these strings if copy-pasting them doesn’t suffice:

>>> composed = 'rené'
>>> composed.encode('utf-8')
b'ren\xc3\xa9'
>>> import unicodedata
>>> decomposed = unicodedata.normalize('NFKD', composed)
>>> decomposed.encode('utf-8')
b'rene\xcc\x81'
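
To make the difference between the two spellings concrete, here is a minimal Python 3 sketch that builds both strings from explicit escapes (so nothing depends on copy-pasting) and compares them. The variable names mirror the session above; nothing here is Django-specific.

```python
import unicodedata

composed = "ren\u00e9"      # é as one code point, U+00E9
decomposed = "rene\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# The strings render the same but are different sequences of code points.
print(composed == decomposed)           # False
print(len(composed), len(decomposed))   # 4 5

# Normalizing to a common form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```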

I suspect the problem may not exist on a full-featured database (i.e. not 
SQLite), depending on the database’s collation settings, which will cause these 
two strings to compare identical on most reasonable setups.

Who wants to try on various databases?

If it turns out that the problem does exist and that Django should normalize 
things, it should normalize to NFKC. Proposing normalization to NFKD suggests a 
lack of familiarity with the topic.
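
A small sketch of what normalizing to NFKC would look like, assuming Python 3. The helper name normalize_username is hypothetical and only used here for illustration. Both NFKC and NFKD make the two spellings equal; the difference is that NFKC stores the composed form, while NFKD stores the longer decomposed one.

```python
import unicodedata

def normalize_username(username):
    """Hypothetical helper: return the NFKC form of a username."""
    return unicodedata.normalize("NFKC", username)

composed = "ren\u00e9"      # é as U+00E9
decomposed = "rene\u0301"   # e + U+0301

# Both spellings collapse to the same, composed, string under NFKC.
same = normalize_username(composed) == normalize_username(decomposed)
print(same)                                             # True
print(len(normalize_username(decomposed)))              # 4

# NFKD also unifies them, but keeps the decomposed form.
print(len(unicodedata.normalize("NFKD", composed)))     # 5

# The "K" (compatibility) part also folds look-alikes such as the
# single-character ligature U+FB01 into plain letters.
print(normalize_username("\ufb01x"))                    # fix
```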

For what it’s worth, I’m in favor of restoring the intended behavior of 
restricting usernames to ASCII on Python 3 and letting developers who want 
something more elaborate implement their own requirements.

One last anecdote: I live in a country where many people have non-ASCII names 
and no one would ask for a non-ASCII username because everyone knows it would 
cause problems at some point, even if IT thinks otherwise! :-)

-- 
Aymeric.



Re: Unicode normalization for username field

2016-04-22 Thread Claude Paroz
On Thursday, 21 April 2016 at 21:23:16 UTC+2, Aymeric Augustin wrote:
>
> For what it’s worth, I’m in favor of restoring the intended behavior of 
> restricting usernames to ASCII on Python 3 and letting developers who want 
> something more elaborate implement their own requirements.
>

I'm sorry to disagree, you know that I'm a Unicode nerd :-) We probably 
should have done that when adding Python 3 support, but it might be a bit 
late now. I'll see if I can find the time to work on something acceptable, 
allowing people to choose either policy without too much hassle and 
backwards incompatibility. Of course, anyone else could try it, too.
 

> One last anecdote: I live in a country where many people have non-ASCII 
> names and no one would ask for a non-ASCII username because everyone knows 
> it would cause problems at some point, even if IT thinks otherwise! :-)
>

As for me, I think that's a behavior inherited from the past, when pure 
ASCII was king. It feels a bit ethnocentric to me (even if I know there 
are/were technical reasons for that).

Claude



Re: Unicode normalization for username field

2016-04-22 Thread Claude Paroz
On Friday, 22 April 2016 at 14:25:59 UTC+2, Claude Paroz wrote:
>
>  I'll see if I can find the time to work on something acceptable, allowing 
> people to choose either policy without too much hassle and backwards 
> incompatibility. Of course, anyone else could try it, too.
>

Here's some code, unpolished, but a base for discussion.
https://github.com/django/django/pull/6494 

Claude



Re: Unicode normalization for username field

2016-04-23 Thread Aymeric Augustin
Hi Claude,

> On 23 Apr 2016, at 00:04, Claude Paroz wrote:
> 
> Here's some code, unpolished, but a base for discussion.
> https://github.com/django/django/pull/6494

This patch looks pretty good. I have a few questions, not necessarily because I 
disagree with your proposal, but to make sure we have considered alternatives. 
Actually I don't think there's exactly one correct solution here; it's more a 
matter of tradeoffs.

You added a username_validator attribute instead of documenting how to override 
the whole username field. Can you elaborate on this decision? It simplifies the 
use case targeted by the patch by introducing a one-off API. As a matter of 
principle I'm a bit skeptical of such special cases. But I understand the 
convenience.

Normalization happens at the form layer. I'm wondering whether it would be 
safer to do it at the model layer. That would extend the security hardening to 
cases where users aren't created with a form — for example if they're created 
through an API or programmatically.

I would keep ASCII usernames as the default because:

- this has always been the intent;
- allowing non ASCII usernames may result in interoperability problems with 
other software e.g. if a Django project is used as SSO server;
- these interoperability issues might escalate into security vulnerabilities — 
there isn't a straightforward connection but (1) non ASCII data can be used for 
breaking out of parsing routines (2) I'm paranoid with anything that 
manipulates authentication credentials;
- I'm afraid this change may result in boilerplate as most custom user models 
will revert to Django's historical (and in my opinion sensible) username 
validation rules.

Finally, I would add a test to check that a username containing a zero-width 
space is rejected, just to make sure we never accidentally make it trivial to 
create usernames that render identically, which this PR aims at preventing. It 
will be rejected because it won't match \w.
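
The zero-width space check can be sketched outside Django as follows, assuming Python 3 and an assumed username pattern built on \w (the pattern is illustrative, not quoted from the PR). U+200B ZERO WIDTH SPACE is a format character rather than a letter or digit, so \w does not match it.

```python
import re

# Assumed username pattern, for illustration only.
username_re = re.compile(r"^[\w.@+-]+$")

visible = "admin"
spoofed = "ad\u200bmin"   # renders like "admin" but contains U+200B

print(bool(username_re.match(visible)))  # True
print(bool(username_re.match(spoofed)))  # False
```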

Best regards,

-- 
Aymeric.



Re: Unicode normalization for username field

2016-04-24 Thread Claude Paroz
Hi Aymeric,

On Saturday, 23 April 2016 at 14:33:56 UTC+2, Aymeric Augustin wrote:
>
> > https://github.com/django/django/pull/6494 
>
> This patch looks pretty good. I have a few questions, not necessarily 
> because I disagree with your proposal, but to make sure we have considered 
> alternatives. Actually I don't think there's exactly one correct solution 
> here; it's more a matter of tradeoffs. 
>
> You added a username_validator attribute instead of documenting how to 
> override the whole username field. Can you elaborate on this decision? It 
> simplifies the use case targeted by the patch by introducing a one-off API. 
> As a matter of principle I'm a bit skeptical of such special cases. But I 
> understand the convenience.
>

My concern here was not to force users to create a custom user model 
just to change the username validation, especially as the migration system 
doesn't yet seem to support upgrading from the standard auth User to a 
custom user. I thought that creating a proxy custom user is easier 
migration-wise, as no new table is required. But I may be wrong.
 

> Normalization happens at the form layer. I'm wondering whether it would be 
> safer to do it at the model layer. That would extend the security hardening 
> to cases where users aren't created with a form — for example if they're 
> created through an API or programmatically. 
>

Normalization happens both at the form layer and at the model layer in 
_create_user. You may have missed the _create_user change.
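
The point of normalizing below the form layer can be shown with a plain-Python stand-in. The class and method names below are hypothetical and are not Django's API; the sketch only assumes that every creation path funnels through one method that normalizes first, so users created programmatically get the same treatment as users created through a form.

```python
import unicodedata

class UserStoreSketch:
    """Hypothetical stand-in for a user manager; not Django's API."""

    def __init__(self):
        self._users = {}

    @staticmethod
    def normalize_username(username):
        # Assumed policy for this sketch: NFKC, as suggested in the thread.
        return unicodedata.normalize("NFKC", username)

    def create_user(self, username):
        # Normalize here so that API and programmatic callers are covered,
        # not only callers that go through a form.
        username = self.normalize_username(username)
        if username in self._users:
            raise ValueError("username already taken")
        self._users[username] = {"username": username}
        return self._users[username]

store = UserStoreSketch()
store.create_user("ren\u00e9")          # composed spelling
try:
    store.create_user("rene\u0301")     # decomposed spelling, same name
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)               # True
```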
 

> I would keep ASCII usernames as the default because: 
>
> - this has always been the intent; 
>

Until now! Things are evolving, we see that for example with 
internationalized domain names. I think that most if not all technical 
reasons requiring pure ASCII usernames have vanished nowadays.
 

> - allowing non ASCII usernames may result in interoperability problems 
> with other software e.g. if a Django project is used as SSO server; 
>

These are still not the majority of Django use cases. And even then, I 
think that LDAPv3 for example should support unicode in attributes. Those 
projects could still configure the ASCIIUsernameValidator if desired.
 

> - these interoperability issues might escalate into security 
> vulnerabilities — there isn't a straightforward connection but (1) non 
> ASCII data can be used for breaking out of parsing routines (2) I'm 
> paranoid with anything that manipulates authentication credentials; 
>

Sure, the more characters, the more attack surface. As you said before, 
it's a tradeoff. My thinking is that sooner or later, we'll have to cope 
with unicode in usernames. So let's do our utmost not to open security holes, 
based on some past issues (BTW I think you forgot your references).
 

> - I'm afraid this change may result in boilerplate as most custom user 
> models will revert to Django's historical (and in my opinion sensible) 
> username validation rules. 
>

That's a tough question to estimate. This might be true for most English 
monolingual web sites, but not necessarily for the majority of Django 
sites. Hopefully we'll get some more user input in this thread.
 

> Finally, I would add a test to check that a username containing a 
> zero-width space is rejected, just to make sure we never accidentally make 
> it trivial to create usernames that render identically, which this PR aims 
> at preventing. It will be rejected because it won't match \w. 
>

Sure, good idea.
 
Overall, I totally understand your opinion, and I agree there is no 
"right" or "wrong" solution. Eventually, this might be a decision to be 
brought to the technical board.

Claude



Re: Unicode normalization for username field

2016-04-25 Thread Erik Cederstrand

> On 24 Apr 2016, at 20:58, Claude Paroz wrote:
> 
> - I'm afraid this change may result in boilerplate as most custom user models 
> will revert to Django's historical (and in my opinion sensible) username 
> validation rules. 
> 
> That's a tough question to estimate. This might be true for most English 
> monolingual web sites, but not necessarily for the majority of Django sites. 
> Hopefully we'll get some more user inputs in this thread.

From a security point of view, I understand the reasoning here. Everyone 
expects ASCII-only usernames. There were the same discussions when IDN was 
introduced. But even with ASCII, people are attempting to spoof (googel.com, 
gogle.com etc). We need a way to normalize unicode - preferably an external 
library or method, since this is not a Django-specific problem.

Being from .dk, we're used to translating "Åge Æbelø" to "aage_aebeloe" when 
creating usernames. But just like 8.3 filenames were the norm in DOS and we got 
around that, we now expect to be able to create long filenames with spaces and 
unicode characters. There was a 15-year transition period where unicode 
filenames would maybe work and often break things in unexpected ways (I *still* 
find issues from time to time). Unicode was for the brave and those with lots 
of free time to waste. But I think that in 2016, software that cannot handle 
unicode is simply broken and must be fixed. Sure, unicode can be a hassle. I 
still need to hexdump strings once in a while to find out what is going on. But 
there is no way we can continue to not support the billions of people in the 
world that use a language that doesn't fit into ASCII.

For usernames, most people may still want the old behavior, and they can do 
that. It could even be the default (remember POLA). But being able to create 
unicode usernames should be possible and supported.

Erik



Re: Unicode normalization for username field

2016-04-25 Thread Aymeric Augustin
Hi Claude,

On 24 Apr 2016, at 20:58, Claude Paroz  wrote:
> 
> On Saturday, 23 April 2016 at 14:33:56 UTC+2, Aymeric Augustin wrote:
> 
> You added a username_validator attribute instead of documenting how to 
> override the whole username field. Can you elaborate on this decision? It 
> simplifies the use case targeted by the patch by introducing a one-off API. 
> As a matter of principle I'm a bit skeptical of such special cases. But I 
> understand the convenience.
> 
> My preoccupation here was not to force users to create a custom user model 
> just to change the username validation, especially as the migration system 
> doesn't seem to support yet upgrading from the standard auth User to a custom 
> user. I thought that creating a proxy custom user is easier migration-wise, 
> as no new table is required. But I may be wrong.

I believe that you can switch to a custom user model that has the same fields 
as auth.User just by declaring db_table = 'auth_user'. You may still have to 
throw away your migration history and recreate a fresh set of migrations. I 
ran these tests some time ago and I’m not sure of the results. Indeed, a proxy 
model is easier.

On a side note, we should recommend to always start with a custom user model… I 
don’t know if we added that to the docs.

> Overall, I totally understand your opinion, and I agree there is no "right" 
> or "wrong" solution. Eventually, this might be a decision to be brought to 
> the technical board.

It’s a -0 from me, not a -1, and it may turn into a +0 as time passes... More 
arguments or opinions, especially backed by data or experience, would certainly 
be useful.

-- 
Aymeric.



Re: Unicode normalization for username field

2016-04-25 Thread Shai Berger
On Monday 25 April 2016 21:11:51 Aymeric Augustin wrote:
> 
> It’s a -0 from me, not a -1, and it may turn into a +0 as time passes...
> More arguments or opinions, especially backed by data or experience, would
> certainly be useful.

As far as I can see, the force of the push to use non-ASCII usernames is 
inversely proportional to the size of your native alphabet's intersection with 
ASCII (for me, this intersection is empty).

Shai.


Re: Unicode normalization for username field

2016-04-25 Thread Aymeric Augustin
On 25 Apr 2016, at 20:31, Shai Berger  wrote:
> 
> On Monday 25 April 2016 21:11:51 Aymeric Augustin wrote:
>> 
>> It’s a -0 from me, not a -1, and it may turn into a +0 as time passes...
>> More arguments or opinions, especially backed by data or experience, would
>> certainly be useful.
> 
> As far as I can see, the force of the push to use non-ASCII usernames is 
> inversely proportional to the size of your native alphabet's intersection 
> with ASCII (for me, this intersection is empty).

Based on further clarifications by Shai on IRC, I’m changing my -0 to +1.

Rather stupidly, I didn’t realize countries with non-latin alphabets are
already using non-ASCII usernames and mostly getting away with it.

-- 
Aymeric.



Re: Unicode normalization for username field

2016-05-03 Thread Rick Leir
Hi all
Could there be a consensus with
- default to ASCII
- optionally, UTF-8 with normalization
- based on Claude's code
- Python 3 required so we are not distracted by compatibility issues
Cheers -- Rick



Re: Unicode normalization for username field

2016-05-09 Thread Tim Graham
Rather than change the behavior on Python 2 so close to the last version of 
Django that supports it, I would make the default validator ASCII on Python 2 
and Unicode on Python 3.
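
A minimal sketch of a version-dependent default, under stated assumptions: the pattern, the VALIDATORS mapping and default_validator are all hypothetical names for illustration. The sketch itself runs on Python 3 (re.ASCII does not exist on Python 2); the python_major parameter only emulates the choice that would be made on each interpreter.

```python
import re
import sys

# Assumed username pattern, for illustration only.
USERNAME_PATTERN = r"^[\w.@+-]+$"

# Hypothetical validators, modelled here as compiled patterns.
VALIDATORS = {
    "ascii": re.compile(USERNAME_PATTERN, re.ASCII),
    "unicode": re.compile(USERNAME_PATTERN),
}

def default_validator(python_major=sys.version_info[0]):
    """Pick the default: ASCII on Python 2, Unicode on Python 3."""
    key = "unicode" if python_major >= 3 else "ascii"
    return VALIDATORS[key]

accented = "ren\u00e9"
print(bool(default_validator(2).match(accented)))  # False
print(bool(default_validator(3).match(accented)))  # True
```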

On Tuesday, May 3, 2016 at 9:29:06 AM UTC-4, Rick Leir wrote:
>
> Hi all
> Could there be a consensus with
> -default to ASCII
> -optionally, UTF8 with normalization
> -based on Claude's code
> -Python 3 required so we are not distracted by compatibility issues
> Cheers -- Rick



Re: Unicode normalization for username field

2016-05-09 Thread Claude Paroz
On Monday, 9 May 2016 at 14:48:06 UTC+2, Tim Graham wrote:
>
> Rather than change the behavior of Python 2 near its last supported 
> version of Django, I would make the default validator ASCII on Python 2 and 
> Unicode on Python 3.
>

I can buy this, provided we don't face migration issues.

Claude 



Re: Unicode normalization for username field

2016-05-12 Thread Tim Graham
Just to be sure, do you mean django.db.migrations (referencing the 
appropriate validator in the migration file, I guess?) or some problem a 
project would face when migrating from Python 2 to 3?

On Monday, May 9, 2016 at 4:00:27 PM UTC-4, Claude Paroz wrote:
>
> On Monday, 9 May 2016 at 14:48:06 UTC+2, Tim Graham wrote:
>>
>> Rather than change the behavior of Python 2 near its last supported 
>> version of Django, I would make the default validator ASCII on Python 2 and 
>> Unicode on Python 3.
>>
>
> I can buy this, providing we don't face migration issues.
>
> Claude 
>



Re: Unicode normalization for username field

2016-05-14 Thread Claude Paroz
On Thursday, 12 May 2016 at 18:45:15 UTC+2, Tim Graham wrote:
>
> Just to be sure, do you mean django.db.migrations (referencing the 
> appropriate validator in the migration file, I guess?) or some problem a 
> project would face when migrating from Python 2 to 3?
>

Both things, hopefully not an issue, but who knows?

I have attached the new PR to the ticket.

Claude



Re: Unicode normalization for username field

2016-05-19 Thread David Tan

>
> - I'm afraid this change may result in boilerplate as most custom user 
> models will revert to Django's historical (and in my opinion sensible) 
> username validation rules. 
>

> That's a tough question to estimate. This might be true for most English 
> monolingual web sites, but not necessarily for the majority of Django 
> sites. Hopefully we'll get some more user inputs in this thread.
 

Hi, I just wanted to give my input on this point. I agree with Aymeric 
Augustin here, and my vote is to keep usernames as ASCII by default.

I created a Django ticket for this; I will copy my reasoning here:

A Django user who is trying to save time and get a product out the door 
isn't going to focus on finer details such as Unicode usernames, and will 
be in for a shock when he finds out a bunch of his users have registered 
themselves with Egyptian hieroglyphics. He may be very frustrated, 
eventually figuring out that he must subclass the User model and set 
username_validator = ASCIIUsernameValidator() to get the functionality he 
expected. And what is he to do with the existing Unicode users, delete all 
their accounts?

Whereas a technologically forward user might be friendlier towards Unicode 
usernames, and would be well-informed on these capabilities within Django. 
Furthermore, the technologically forward user will be more likely to 
already have a custom user model, and won't find it cumbersome to 
explicitly enable Unicode usernames. Enabling Unicode usernames isn't 
destructive like disabling it would be (no need to figure out what to do 
with the existing users offending the validation), so users can simply 
start using it immediately.
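
To make the difference between the two policies concrete, here is a minimal
stdlib-only sketch. The pattern and the function names are assumptions made
for illustration, not Django's actual validator code; the only thing that
changes between the two policies is whether \w is restricted to ASCII.

```python
import re

# Hypothetical pattern: word characters plus . @ + -
# With re.ASCII, \w only matches [a-zA-Z0-9_].
ASCII_USERNAME = re.compile(r"[\w.@+-]+", re.ASCII)
# Without the flag, \w also matches Unicode letters and digits.
UNICODE_USERNAME = re.compile(r"[\w.@+-]+")


def is_valid_ascii_username(name: str) -> bool:
    """Return True if the whole name fits the ASCII-only policy."""
    return ASCII_USERNAME.fullmatch(name) is not None


def is_valid_unicode_username(name: str) -> bool:
    """Return True if the whole name fits the Unicode-friendly policy."""
    return UNICODE_USERNAME.fullmatch(name) is not None


if __name__ == "__main__":
    # An ASCII name, an accented name, and two Egyptian hieroglyphs.
    for candidate in ["rick.leir", "jos\u00e9", "\U00013000\U00013001"]:
        print(
            ascii(candidate),
            is_valid_ascii_username(candidate),
            is_valid_unicode_username(candidate),
        )
```

Under these assumptions the hieroglyphic name passes the Unicode policy and
fails the ASCII one, which is the surprise described above.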

On Sunday, April 24, 2016 at 2:58:55 PM UTC-4, Claude Paroz wrote:
>
> Hi Aymeric,
>
> Le samedi 23 avril 2016 14:33:56 UTC+2, Aymeric Augustin a écrit :
>>
>> > https://github.com/django/django/pull/6494 
>>
>> This patch looks pretty good. I have a few questions, not necessarily 
>> because I disagree with your proposal, but to make sure we have considered 
>> alternatives. Actually I don't think there's exactly one correct solution 
>> here; it's more a matter of tradeoffs. 
>>
>> You added a username_validator attribute instead of documenting how to 
>> override the whole username field. Can you elaborate on this decision? It 
>> simplifies the use case targeted by the patch by introducing a one-off API. 
>> As a matter of principle I'm a bit skeptical of such special cases. But I 
>> understand the convenience.
>>
>
> My concern here was not to force users to create a custom user model 
> just to change the username validation, especially as the migration system 
> doesn't yet seem to support upgrading from the standard auth User to a 
> custom user. I thought that creating a proxy custom user would be easier 
> migration-wise, as no new table is required. But I may be wrong.
>  
>
>> Normalization happens at the form layer. I'm wondering whether it would 
>> be safer to do it at the model layer. That would extend the security 
>> hardening to cases where users aren't created with a form — for example if 
>> they're created through an API or programmatically. 
>>
>
> Normalization happens both at the form layer and at the model layer in 
> _create_user. You may have missed the _create_user change.
>  
>
>> I would keep ASCII usernames as the default because: 
>>
>> - this has always been the intent; 
>>
>
> Until now! Things are evolving, we see that for example with 
> internationalized domain names. I think that most if not all technical 
> reasons requiring pure ASCII usernames have vanished nowadays.
>  
>
>> - allowing non ASCII usernames may result in interoperability problems 
>> with other software e.g. if a Django project is used as SSO server; 
>>
>
> These are still not the majority of Django use cases. And even then, I 
> think that LDAPv3 for example should support unicode in attributes. Those 
> projects could still configure the ASCIIUsernameValidator if desired.
>  
>
>> - these interoperability issues might escalate into security 
>> vulnerabilities — there isn't a straightforward connection but (1) non 
>> ASCII data can be used for breaking out of parsing routines (2) I'm 
>> paranoid with anything that manipulates authentication credentials; 
>>
>
> Sure, the more characters, the more attack surface. As you said before, 
> it's a tradeoff. My thinking is that sooner or later, we'll have to cope 
> with unicode in usernames. So let's do our best not to open security holes, 
> based on some past issues (BTW I think you forgot your references).
>  
>
>> - I'm afraid this change may result in boilerplate as most custom user 
>> models will revert to Django's historical (and in my opinion sensible) 
>> username validation rules. 
>>
>
> That's a tough question to estimate. This might be true for most English 
> monolingual web sites, but not necessarily for the majority of Django 
> sites. Hopefully 

Re: Unicode normalization for username field

2016-05-19 Thread charettes
Hi David,

I agree with your reasoning, but I think you're missing an important detail
about Unicode username support: it has been mistakenly enabled on Python 3
ever since Django added support for Python 3 (1.5-1.6).

If we were to silently disallow non-ASCII characters from Django 1.10,
Python 3 developers would be left with the same problem you mentioned about
existing users with usernames containing Unicode characters.
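
As a rough way to see how many existing accounts would be affected,
something like the following stdlib-only sketch could be run over a list of
usernames. The function name is hypothetical, and NFKC is assumed as the
normalization form; nothing in this message says which form the patch uses.

```python
import unicodedata
from collections import defaultdict


def audit_usernames(usernames):
    """Return (non_ascii, collisions) for an iterable of usernames.

    non_ascii:  names containing at least one non-ASCII character.
    collisions: NFKC-normalized form -> names that normalize to it,
                kept only when more than one name shares that form.
    """
    usernames = list(usernames)
    non_ascii = [
        name for name in usernames
        if any(ord(char) > 127 for char in name)
    ]

    by_normalized = defaultdict(list)
    for name in usernames:
        # Assumption: NFKC is the form used to compare usernames.
        by_normalized[unicodedata.normalize("NFKC", name)].append(name)

    collisions = {
        key: names for key, names in by_normalized.items() if len(names) > 1
    }
    return non_ascii, collisions


if __name__ == "__main__":
    sample = ["alice", "jos\u00e9", "jose\u0301", "\ufb01ona", "fiona"]
    non_ascii, collisions = audit_usernames(sample)
    print([ascii(name) for name in non_ascii])
    print({ascii(k): [ascii(n) for n in v] for k, v in collisions.items()})
```

The first list covers the accounts that an ASCII-only rule would reject; the
second covers names that look identical once normalized, which is the
spoofing concern this thread started with.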

Cheers,
Simon

Le jeudi 19 mai 2016 14:48:39 UTC-4, David Tan a écrit :
>
>>> - I'm afraid this change may result in boilerplate as most custom user
>>> models will revert to Django's historical (and in my opinion sensible)
>>> username validation rules.
>>>
>>
>> That's a tough question to estimate. This might be true for most English
>> monolingual web sites, but not necessarily for the majority of Django
>> sites. Hopefully we'll get some more user inputs in this thread.
>  
>
> Hi, just wanted to give my input on this point. I agree with Aymeric
> Augustin here, and my vote is to keep usernames as ASCII by default.
>
> I created a Django ticket for this; I will copy my reasoning here:
>
> A Django user who is trying to save time and get a product out the door 
> isn't going to focus on finer details such as Unicode usernames, and will 
> be in for a shock when he finds out a bunch of his users have registered 
> themselves with Egyptian hieroglyphics. He may be very frustrated, 
> eventually figuring out that he must subclass the User model and set
> username_validator = ASCIIUsernameValidator() to get the functionality he
> expected. And what is he to do with the existing Unicode users, delete all
> their accounts?
>
> Whereas a technologically forward user might be friendlier towards Unicode 
> usernames, and would be well-informed on these capabilities within Django. 
> Furthermore, the technologically forward user will be more likely to 
> already have a custom user model, and won't find it cumbersome to 
> explicitly enable Unicode usernames. Enabling Unicode usernames isn't 
> destructive like disabling it would be (no need to figure out what to do 
> with the existing users offending the validation), so users can simply 
> start using it immediately.
>
> On Sunday, April 24, 2016 at 2:58:55 PM UTC-4, Claude Paroz wrote:
>>
>> Hi Aymeric,
>>
>> Le samedi 23 avril 2016 14:33:56 UTC+2, Aymeric Augustin a écrit :
>>>
>>> > https://github.com/django/django/pull/6494 
>>>
>>> This patch looks pretty good. I have a few questions, not necessarily 
>>> because I disagree with your proposal, but to make sure we have considered 
>>> alternatives. Actually I don't think there's exactly one correct solution 
>>> here; it's more a matter of tradeoffs. 
>>>
>>> You added a username_validator attribute instead of documenting how to 
>>> override the whole username field. Can you elaborate on this decision? It 
>>> simplifies the use case targeted by the patch by introducing a one-off API. 
>>> As a matter of principle I'm a bit skeptical of such special cases. But I 
>>> understand the convenience.
>>>
>>
>> My concern here was not to force users to create a custom user 
>> model just to change the username validation, especially as the migration 
>> system doesn't yet seem to support upgrading from the standard auth User to 
>> a custom user. I thought that creating a proxy custom user would be easier 
>> migration-wise, as no new table is required. But I may be wrong.
>>  
>>
>>> Normalization happens at the form layer. I'm wondering whether it would 
>>> be safer to do it at the model layer. That would extend the security 
>>> hardening to cases where users aren't created with a form — for example if 
>>> they're created through an API or programmatically. 
>>>
>>
>> Normalization happens both at the form layer and at the model layer in 
>> _create_user. You may have missed the _create_user change.
>>  
>>
>>> I would keep ASCII usernames as the default because: 
>>>
>>> - this has always been the intent; 
>>>
>>
>> Until now! Things are evolving, we see that for example with 
>> internationalized domain names. I think that most if not all technical 
>> reasons requiring pure ASCII usernames have vanished nowadays.
>>  
>>
>>> - allowing non ASCII usernames may result in interoperability problems 
>>> with other software e.g. if a Django project is used as SSO server; 
>>>
>>
>> These are still not the majority of Django use cases. And even then, I 
>> think that LDAPv3 for example should support unicode in attributes. Those 
>> projects could still configure the ASCIIUsernameValidator if desired.
>>  
>>
>>> - these interoperability issues might escalate into security 
>>> vulnerabilities — there isn't a straightforward connection but (1) non 
>>> ASCII data can be used for breaking out of parsing routines (2) I'm 
>>> paranoid with anything that manipulates authentication credentials; 
>>>
>>
>> Sure, the more characters, the more attack s