Hi Russell,

> I've had a look at the patches for (1) and (2), and to me, they look like 
> mirror images of the same patch -- it's just a matter of whether we convert 
> everything to bytes or unicode when we have the opportunity. 
> My immediate reaction is that (2) -- keeping everything in unicode until it 
> doesn't need to be -- looks like the better long term solution, but I'll also 
> admit that this is based purely upon history and an inspection of the patch.

Yes, they're similar, because they touch the same locations. (1) is restoring 
the behavior of Django 1.4 by switching some string literals back to 
bytestrings in modules that use unicode_literals. (2) is modifying the tests to 
adjust for the changes in the code.

I agree that (2) looks like the better solution in the long term. Django is 
already doing a lot of filesystem-related operations in unicode internally. 
Keeping a mix of bytes and unicode will cause more trouble in the future.

Sorry if I sound pessimistic, but no matter what we do, I expect that some apps 
will die with UnicodeDecodeError after upgrading to 1.5, because Django will 
attempt to convert some bytestrings created in user code to unicode. The risk 
is a bit higher with option (2). Option (1) doesn't eliminate it either: see 
https://code.djangoproject.com/ticket/19357#comment:13
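
To make the failure mode concrete, here is a rough sketch rather than Django's 
actual code. It is written for Python 3, where the conversion has to be 
spelled out; on Python 2 the same decode happens implicitly, with the ascii 
codec, whenever unicode and a bytestring are mixed. The helper name 
to_unicode_like_py2 and the sample path are made up for illustration.

```python
# Sketch only: mimics Python 2's implicit bytes -> unicode coercion,
# which uses the 'ascii' codec by default. Not taken from Django.

# A non-ASCII bytestring such as user code might build (UTF-8 for "café").
user_path = b'caf\xc3\xa9'


def to_unicode_like_py2(value, encoding='ascii'):
    """Hypothetical helper: decode bytes the way an implicit coercion would."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


try:
    to_unicode_like_py2(user_path)
    outcome = 'decoded'
except UnicodeDecodeError:
    # This is the error an upgraded app would see.
    outcome = 'UnicodeDecodeError'

# Pure ASCII bytestrings survive the coercion, which is why the problem
# can stay hidden until a non-ASCII path shows up.
ascii_outcome = to_unicode_like_py2(b'media')

# Decoding explicitly with the right encoding avoids the error.
utf8_outcome = to_unicode_like_py2(user_path, 'utf-8')
```

The practical upshot is that user code passing unicode everywhere never hits 
the decode step at all.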

At this point, sticking to our guns and insisting that developers use unicode 
everywhere (even where bytestrings used to work) is a decent plan. At least 
that's easy to explain.

> In particular, I'm not completely up to speed with the Python3 implications.

Python 3 is a non-issue here:
  - It provides a unicode abstraction to the filesystem and handles decoding / 
encoding automatically.
  - We don't have any backwards compatibility to preserve there to begin with.

> In the notes for approach 2, you say that this approach would be "deviating 
> from Python's" behaviour -- can you summarise what the expected Python 
> behaviour here is (especially for Python 3, but summarising Python 2 wouldn't 
> hurt either)?


(explanation stolen from https://code.djangoproject.com/ticket/19398)

By default, filesystem paths are represented with native strings (i.e. str 
objects) in Python 2 and Python 3.

% python2
>>> import os
>>> type(os.listdir('.')[0])
<type 'str'>

% python3
>>> import os
>>> type(os.listdir('.')[0])
<class 'str'>

In other words, they were switched from bytestrings in Python 2 to unicode in 
Python 3.

For the sake of completeness:
- In Python 2, it's possible to use unicode for filesystem paths, when 
os.path.supports_unicode_filenames is True, but that's not the default mode of 
operation.
- In Python 3, it's possible to use bytestrings for filesystem paths, because 
not all supported platforms sport unicode-aware filesystems.

However, the intent of Python's developers is that str objects should be used in 
all cases.
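
As a small illustration of the Python 3 side (a sketch only; the temporary 
directory and the file name example.txt are invented for the example), 
os.listdir returns names of the same type as the path it is given, and 
os.fsencode / os.fsdecode convert between the two representations:

```python
import os
import tempfile

# Python 3 only: create a throwaway directory containing one file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'example.txt'), 'w').close()

    # A str path gives str names (the default, intended mode).
    str_names = os.listdir(tmp)

    # A bytes path gives bytes names (still supported, but not the default).
    bytes_names = os.listdir(os.fsencode(tmp))

str_type = type(str_names[0])
bytes_type = type(bytes_names[0])

# os.fsdecode turns a bytes filename back into str using the
# filesystem encoding.
round_trip = os.fsdecode(bytes_names[0])
```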

Best regards,
-- 
Aymeric.



-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.
