Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-04 Thread Stephen J. Turnbull
 Guido van Rossum writes:

  Given that the claim "Python 2 doesn't support Unicode filenames"
  is factually incorrect (in Python 2.7, most filesystem calls in
  fact do support Unicode, at least on some platforms),

I don't understand what "support Unicode" means.  Just that

with open(u"\u4e00", "w") as f: f.write("works!\n")

does what is expected[1] if the user knows what he is doing (ie, has
set PYTHONIOENCODING to a Unicode UTF or one of the Asian encodings)?

  I think individual functions in the os module that are found
  lacking should be considered bugs, and if someone goes through 
  the effort to supply an otherwise acceptable fix, we shouldn't
  reject it on the basis that we don't want to consider supporting
  Unicode filenames.

As above, "acceptable fix" means take whatever the current value is
for file system name encoding, and use that to encode and decode
unicode objects to/from str, or raise a UnicodeError if it doesn't
work?
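
A minimal sketch of the behaviour described in that question, written so it runs on Python 3 for illustration (the thread itself concerns 2.7, where the text type is `unicode`). The helper name `encode_filename` is hypothetical and is not a standard library function; it only shows "encode with the file system encoding, or raise a UnicodeError".

```python
import sys


def encode_filename(name, encoding=None):
    """Encode a text filename to bytes using the file system encoding.

    Hypothetical helper for illustration only. Bytes pass through
    unchanged; text that cannot be encoded raises UnicodeEncodeError,
    which is a subclass of UnicodeError.
    """
    if isinstance(name, bytes):
        return name
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Strict error handling: fail loudly instead of guessing.
    return name.encode(encoding, "strict")


# With a Unicode-capable encoding the name is encoded.
print(encode_filename(u"\u4e00", "utf-8"))

# With an encoding that cannot represent the name, an error is raised.
try:
    encode_filename(u"\u4e00", "ascii")
except UnicodeError as exc:
    print("cannot encode:", exc)
```

The `encoding` parameter exists only so the two outcomes can be shown without changing the locale; a real fix would presumably always use the current file system encoding.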

I think it's important to define this somewhat carefully, because this
is an area that has a strong tendency to mission creep.  Given that
builtin open works by the above definition, I guess it's reasonable
to accept such patches.

Footnotes: 
[1] It writes the line "works!\n" to a file whose name consists of the
single Chinese character for "one".


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-04 Thread Guido van Rossum
On Tue, Mar 4, 2014 at 5:23 AM, Stephen J. Turnbull step...@xemacs.org wrote:

  Guido van Rossum writes:

   Given that the claim "Python 2 doesn't support Unicode filenames"
   is factually incorrect (in Python 2.7, most filesystem calls in
   fact do support Unicode, at least on some platforms),

 I don't understand what "support Unicode" means.  Just that

 with open(u"\u4e00", "w") as f: f.write("works!\n")

 does what is expected[1] if the user knows what he is doing (ie, has
 set PYTHONIOENCODING to a Unicode UTF or one of the Asian encodings)?


That's all I'm asking for, since that's what most functions in 2.7 already
do.


I think individual functions in the os module that are found
   lacking should be considered bugs, and if someone goes through
   the effort to supply an otherwise acceptable fix, we shouldn't
   reject it on the basis that we don't want to consider supporting
   Unicode filenames.

 As above, "acceptable fix" means take whatever the current value is
 for file system name encoding, and use that to encode and decode
 unicode objects to/from str, or raise a UnicodeError if it doesn't
 work?


The same thing that is done for other functions that take filenames.


 I think it's important to define this somewhat carefully, because this
 is an area that has a strong tendency to mission creep.  Given that
 builtin open works by the above definition, I guess it's reasonable
 to accept such patches.


Right, the interpretation given to Unicode filenames by builtin open()
should be propagated to other functions (I actually suspect that
os.statvfs(), which apparently doesn't, is in the minority here).  AFAIK
that's also roughly what happens in Python 3.
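
For comparison, a short sketch of how Python 3 exposes this conversion through `os.fsencode()` and `os.fsdecode()`. The filename used here is an arbitrary ASCII example chosen so the round trip succeeds under any locale; it is not taken from the thread.

```python
import os
import sys

name = "report.txt"  # ASCII-only, so this works whatever the locale is

encoded = os.fsencode(name)     # text -> bytes using the file system encoding
decoded = os.fsdecode(encoded)  # bytes -> text using the same encoding

print(sys.getfilesystemencoding(), encoded, decoded)
```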


 Footnotes:
 [1] It writes the line "works!\n" to a file whose name consists of the
 single Chinese character for "one".





-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-03 Thread Chris Barker
On Sun, Mar 2, 2014 at 6:44 PM, Guido van Rossum gu...@python.org wrote:

 AFAICT, in that message Victor was only talking about allowing Unicode
 filenames.

...

 Finally, in most places Python 2.7 *does* handle Unicode filenames just
 fine.


I'm a bit confused. In this example:

http://bugs.python.org/issue18695

You are proposing that the issue should be considered a bug and a
well-written patch accepted?

Or is it just too late for 2.7?

Personally I think that having some, but not all, file functions accept
unicode paths is pretty broken, and fixing these kinds of things will ease
the 2 to 3 transition, so a good thing overall.

- Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-03 Thread Guido van Rossum
On Mon, Mar 3, 2014 at 8:37 AM, Chris Barker chris.bar...@noaa.gov wrote:

 On Sun, Mar 2, 2014 at 6:44 PM, Guido van Rossum gu...@python.org wrote:

 AFAICT, in that message Victor was only talking about allowing Unicode
 filenames.

 ...

  Finally, in most places Python 2.7 *does* handle Unicode filenames just
 fine.


 I'm a bit confused. In this example:

 http://bugs.python.org/issue18695

 You are proposing that the issue should be considered a bug and a
 well-written patch accepted?

 Or is it just too late for 2.7?

 Personally I think that having some, but not all, file functions accept
 unicode paths is pretty broken, and fixing these kinds of things will ease
 the 2 to 3 transition, so a good thing overall.


Agreed.

Given that the claim "Python 2 doesn't support Unicode filenames" is
factually incorrect (in Python 2.7, most filesystem calls in fact do
support Unicode, at least on some platforms), I think individual functions
in the os module that are found lacking should be considered bugs, and if
someone goes through the effort to supply an otherwise acceptable fix, we
shouldn't reject it on the basis that we don't want to consider supporting
Unicode filenames.

-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Terry Reedy
Suppose a 2.7 standard library function is documented as taking a 
'string' argument, such as these examples from the turtle module.


pencolor(colorstring)
Set pencolor to colorstring, which is a Tk color specification 
string, such as "red", "yellow", or "#33cc8c".


turtle.shape(name=None)
Parameters: name – a string which is a valid shapename

class turtle.Shape(type_, data)
Parameters: type_ – one of the strings “polygon”, “image”, “compound”

Suppose adding
from __future__ import unicode_literals
to a working program causes an exception, such as with turtle
http://bugs.python.org/issue15618
(Note: unicode_literals is not indexed.)

Is this a programmer error for passing unicode instead of string, or a 
library error for not accepting unicode?
Is changing 'isinstance(x, str)' in the library (with whatever other 
changes are needed) a bugfix to be pushed or a prohibited API expansion?
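
To make the question concrete, here is a minimal sketch of the kind of type check being discussed. It is not the actual turtle source; `check_color` is a hypothetical stand-in, and the `string_types` shim is only there so the example runs on both Python 2 and Python 3.

```python
try:
    string_types = basestring  # Python 2: common base of str and unicode
except NameError:
    string_types = str         # Python 3: str is the only text type


def check_color(colorstring):
    """Hypothetical validator standing in for a library's type check."""
    # With isinstance(colorstring, str) on Python 2, a unicode literal
    # produced by unicode_literals would be rejected here.
    if not isinstance(colorstring, string_types):
        raise TypeError("color must be a string, got %r" % type(colorstring))
    return colorstring


print(check_color(u"red"))
```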


--
Terry Jan Reedy




Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Antoine Pitrou
On Sun, 02 Mar 2014 15:01:01 -0500
Terry Reedy tjre...@udel.edu wrote:
 Suppose a 2.7 standard library function is documented as taking a 
 'string' argument, such as these examples from the turtle module.
 
 pencolor(colorstring)
  Set pencolor to colorstring, which is a Tk color specification 
 string, such as "red", "yellow", or "#33cc8c".
 
 turtle.shape(name=None)
  Parameters:  name – a string which is a valid shapename
 
 class turtle.Shape(type_, data)
  Parameters:  type_ – one of the strings “polygon”, “image”, 
 “compound”
 
 Suppose adding
 from __future__ import unicode_literals
 to a working program causes an exception, such as with turtle
 http://bugs.python.org/issue15618
 (Note: unicode_literals is not indexed.)
 
 Is this a programmer error for passing unicode instead of string, or a 
 library error for not accepting unicode?

In most cases I would say it's a library error.
The only exception is when the argument is clearly meant as a byte
string rather than a text string, such as when writing to a binary file
or a socket.

Regards

Antoine.




Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Guido van Rossum
It looks to me like a defect in the library (*), and you are making a
reasonable argument that we should fix it in 2.7 to help people be more
prepared for the transition to Python 3.

(*) As Antoine points out, pretty much the only time where it's not a good
idea to switch from str to basestring is when the data is meant to be
binary -- but in this case it's clearly text (we can also tell from what
the same code looks like in Python 3 :-).


On Sun, Mar 2, 2014 at 12:01 PM, Terry Reedy tjre...@udel.edu wrote:

 Suppose a 2.7 standard library function is documented as taking a 'string'
 argument, such as these examples from the turtle module.

 pencolor(colorstring)
 Set pencolor to colorstring, which is a Tk color specification string,
 such as "red", "yellow", or "#33cc8c".

 turtle.shape(name=None)
 Parameters: name - a string which is a valid shapename

 class turtle.Shape(type_, data)
 Parameters: type_ - one of the strings "polygon", "image", "compound"

 Suppose adding
 from __future__ import unicode_literals
 to a working program causes an exception, such as with turtle
 http://bugs.python.org/issue15618
 (Note: unicode_literals is not indexed.)

 Is this a programmer error for passing unicode instead of string, or a
 library error for not accepting unicode?
 Is changing 'isinstance(x, str)' in the library (with whatever other
 changes are needed) a bugfix to be pushed or a prohibited API expansion?

 --
 Terry Jan Reedy






-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Terry Reedy

On 3/2/2014 3:12 PM, Guido van Rossum wrote:

It looks to me like a defect in the library (*), and you are making a
reasonable argument that we should fix it in 2.7 to help people be more
prepared for the transition to Python 3.

(*) As Antoine points out, pretty much the only time where it's not a
good idea to switch from str to basestring is when the data is meant to
be binary -- but in this case it's clearly text (we can also tell from
what the same code looks like in Python 3 :-).


Thanks to both of you. 'bugfix' noted on the issue.


On Sun, Mar 2, 2014 at 12:01 PM, Terry Reedy tjre...@udel.edu wrote:

Suppose a 2.7 standard library function is documented as taking a
'string' argument, such as these examples from the turtle module.

pencolor(colorstring)
 Set pencolor to colorstring, which is a Tk color specification
string, such as "red", "yellow", or "#33cc8c".

turtle.shape(name=None)
 Parameters: name – a string which is a valid shapename

class turtle.Shape(type_, data)
 Parameters: type_ – one of the strings “polygon”, “image”,
“compound”

Suppose adding
from __future__ import unicode_literals
to a working program causes an exception, such as with turtle
http://bugs.python.org/issue15618
(Note: unicode_literals is not indexed.)

Is this a programmer error for passing unicode instead of string, or
a library error for not accepting unicode?
Is changing 'isinstance(x, str)' in the library (with whatever other
changes are needed) a bugfix to be pushed or a prohibited API expansion?


--
Terry Jan Reedy




Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Serhiy Storchaka

02.03.14 22:01, Terry Reedy написав(ла):

Is this a programmer error for passing unicode instead of string, or a
library error for not accepting unicode?
Is changing 'isinstance(x, str)' in the library (with whatever other
changes are needed) a bugfix to be pushed or a prohibited API expansion?


Patches which add support for unicode strings were accepted for some 
issues (e.g. http://bugs.python.org/issue19099) and rejected for other 
issues (e.g. http://bugs.python.org/issue20014 and 
http://bugs.python.org/issue20015). Some issues (e.g. 
http://bugs.python.org/issue18695) hang in an undefined state.




Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Berker Peksağ
On Sun, Mar 2, 2014 at 11:23 PM, Serhiy Storchaka storch...@gmail.com wrote:
 Patches which add support for unicode strings were accepted for some issues
 (e.g. http://bugs.python.org/issue19099) and rejected for other issues (e.g.
 http://bugs.python.org/issue20014 and http://bugs.python.org/issue20015).

See also http://bugs.python.org/issue15843.

--Berker


Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Terry Reedy

On 3/2/2014 4:23 PM, Serhiy Storchaka wrote:

02.03.14 22:01, Terry Reedy написав(ла):

Is this a programmer error for passing unicode instead of string, or a
library error for not accepting unicode?
Is changing 'isinstance(x, str)' in the library (with whatever other
changes are needed) a bugfix to be pushed or a prohibited API expansion?


Patches which add support for unicode strings were accepted for some
issues (e.g. http://bugs.python.org/issue19099) and rejected for other
issues (e.g. http://bugs.python.org/issue20014 and
http://bugs.python.org/issue20015). Some issues (e.g.
http://bugs.python.org/issue18695) hang in undefined state.


If Antoine and Guido don't reverse themselves, those could perhaps be 
re-opened. It strikes me as borderline, depending on interpretation of 
'string'. I am not surprised there have been different resolutions.


--
Terry Jan Reedy




Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Nick Coghlan
On 3 March 2014 10:02, Terry Reedy tjre...@udel.edu wrote:
 On 3/2/2014 4:23 PM, Serhiy Storchaka wrote:

 02.03.14 22:01, Terry Reedy написав(ла):

 Is this a programmer error for passing unicode instead of string, or a
 library error for not accepting unicode?
 Is changing 'isinstance(x, str)' in the library (with whatever other
 changes are needed) a bugfix to be pushed or a prohibited API expansion?


 Patches which add support for unicode strings were accepted for some
 issues (e.g. http://bugs.python.org/issue19099) and rejected for other
 issues (e.g. http://bugs.python.org/issue20014 and
 http://bugs.python.org/issue20015). Some issues (e.g.
 http://bugs.python.org/issue18695) hang in undefined state.


 If Antoine and Guido don't reverse themselves, those could perhaps be
 re-opened. It strikes me as borderline, depending on interpretation of
 'string'. I am not surprised there have been different resolutions.

It occurs to me that it would be good to have a "bug fix or feature?"
section in the developer guide to provide a more permanent record of
discussions like this. That would also be the place to document tricks
like defining a private API to fix a bug in a maintenance release, and
then potentially making that new API public for the next feature
release if it's potentially useful to end users.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Stephen J. Turnbull
Terry Reedy writes:
  On 3/2/2014 4:23 PM, Serhiy Storchaka wrote:

   Patches which add support for unicode strings were accepted for some
   issues (e.g. http://bugs.python.org/issue19099) and rejected for other
   issues (e.g. http://bugs.python.org/issue20014 and
   http://bugs.python.org/issue20015). Some issues (e.g.
   http://bugs.python.org/issue18695) hang in undefined state.
  
  If Antoine and Guido don't reverse themselves, those could perhaps be 
  re-opened. It strikes me as borderline, depending on interpretation of 
  'string'. I am not surprised there have been different resolutions.

I agree with Victor in http://bugs.python.org/issue18695#msg208857:
there's no bug.  It is just that in the design of 2.x 'str' is not
Unicode, and the fix is Python 3.  This may be an area where 2to3
could give more help.

As Victor points out in that message, the issue-by-issue approach to
this inconsistency is just whack-a-mole.

I would worry not only about the whack-a-mole aspect where 'unicode'
objects leak into contexts where they're not supported, but also that
this could confuse tools like 2to3.

I agree that usage of the word "string" is all too often ambiguous in
the documentation, but that doesn't justify a wholesale overhaul of
the Python 2.7 API to make everything polymorphic.



Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Guido van Rossum
AFAICT, in that message Victor was only talking about allowing Unicode
filenames.

Making everything polymorphic is clearly pulling on the thread that will
unravel the entire sweater.

But... The start of this thread was about changing a few occurrences of
isinstance(..., str) to use basestring, and that's a different matter. The
Python 2 Unicode design calls for mixing of Unicode and 8-bit strings as
long as the latter contain 7-bit ASCII -- the code in turtle violates that
design by insisting on an 8-bit string. The underlying Tkinter module
handles Unicode strings just fine (and not just 7-bit ASCII).
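
The mixing rule described above can be sketched as follows. On Python 2 the coercion from 8-bit string to Unicode happens implicitly using the default ASCII codec; the hypothetical helper `coerce_to_text` below makes that step explicit so the example also runs on Python 3.

```python
def coerce_to_text(value):
    """Mimic Python 2's implicit str -> unicode coercion, made explicit.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, bytes):
        # Succeeds only for 7-bit ASCII; any byte >= 0x80 raises
        # UnicodeDecodeError, just as mixing would on Python 2.
        return value.decode("ascii")
    return value


# An ASCII-only 8-bit string mixes cleanly with Unicode text.
mixed = coerce_to_text(b"red") + u"\u4e00"
print(len(mixed))

# A non-ASCII 8-bit string does not.
try:
    coerce_to_text(b"caf\xe9")
except UnicodeDecodeError as exc:
    print("cannot mix:", exc)
```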

As far as lib2to3 goes, using basestring instead of str actually
disambiguates things -- with str it can't tell for sure whether text or
binary was meant, but with basestring it's a safe bet that the intention
was text.

Finally, in most places Python 2.7 *does* handle Unicode filenames just
fine.


On Sun, Mar 2, 2014 at 6:26 PM, Stephen J. Turnbull step...@xemacs.org wrote:

 Terry Reedy writes:
   On 3/2/2014 4:23 PM, Serhiy Storchaka wrote:

Patches which add support for unicode strings were accepted for some
issues (e.g. http://bugs.python.org/issue19099) and rejected for
 other
issues (e.g. http://bugs.python.org/issue20014 and
http://bugs.python.org/issue20015). Some issues (e.g.
http://bugs.python.org/issue18695) hang in undefined state.
  
   If Antoine and Guido don't reverse themselves, those could perhaps be
   re-opened. It strikes me as borderline, depending on interpretation of
   'string'. I am not surprised there have been different resolutions.

 I agree with Victor in http://bugs.python.org/issue18695#msg208857:
 there's no bug.  It is just that in the design of 2.x 'str' is not
 Unicode, and the fix is Python 3.  This may be an area where 2to3
 could give more help.

 As Victor points out in that message, the issue-by-issue approach to
 this inconsistency is just whack-a-mole.

 I would worry not only about the whack-a-mole aspect where 'unicode'
 objects leak into contexts where they're not supported, but also that
 this could confuse tools like 2to3.

 I agree that usage of the word "string" is all too often ambiguous in
 the documentation, but that doesn't justify a wholesale overhaul of
 the Python 2.7 API to make everything polymorphic.





-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] unicode_string future, str - basestring, fix or feature

2014-03-02 Thread Serhiy Storchaka

03.03.14 02:02, Terry Reedy написав(ла):

On 3/2/2014 4:23 PM, Serhiy Storchaka wrote:

02.03.14 22:01, Terry Reedy написав(ла):

Is this a programmer error for passing unicode instead of string, or a
library error for not accepting unicode?
Is changing 'isinstance(x, str)' in the library (with whatever other
changes are needed) a bugfix to be pushed or a prohibited API expansion?


Patches which add support for unicode strings were accepted for some
issues (e.g. http://bugs.python.org/issue19099) and rejected for other
issues (e.g. http://bugs.python.org/issue20014 and
http://bugs.python.org/issue20015). Some issues (e.g.
http://bugs.python.org/issue18695) hang in undefined state.


If Antoine and Guido don't reverse themselves, those could perhaps be
re-opened. It strikes me as borderline, depending on interpretation of
'string'. I am not surprised there have been different resolutions.


I believe that in all cases when valid values are ASCII-only strings 
(format specifiers for array, struct, memoryview, etc), we can accept 
both str and unicode. Especially when they are likely literals. But when 
a valid value can be non-ASCII (e.g. file names), it is a different case, 
because it requires additional and possibly entirely different code.
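
A small sketch of the ASCII-only case, assuming Python 3, where `struct.pack()` accepts a format given either as text or as bytes. The helper `normalize_format` is hypothetical and only illustrates why accepting both types is cheap when every valid value is ASCII.

```python
import struct


def normalize_format(fmt):
    """Return an ASCII-only format specifier as text (hypothetical helper)."""
    if isinstance(fmt, bytes):
        return fmt.decode("ascii")
    # Raises UnicodeEncodeError if the text is not pure ASCII.
    fmt.encode("ascii")
    return fmt


# Both spellings of the same little-endian short format give the same bytes.
packed_from_text = struct.pack(normalize_format(u"<h"), 1)
packed_from_bytes = struct.pack(normalize_format(b"<h"), 1)
print(packed_from_text == packed_from_bytes)
```

No such one-line normalization exists for file names, since the right encoding there depends on the platform and locale.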


