[issue20686] Confusing statement about unicode strings in tutorial introduction

2020-05-31 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Python 2.7 is no longer supported.

--
nosy: +serhiy.storchaka
resolution:  -> out of date
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20686] Confusing statement about unicode strings in tutorial introduction

2014-03-20 Thread Georg Brandl

Georg Brandl added the comment:

First, entering a string at the command prompt like this is not considered 
"printing"; it's invoking the repr().
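A minimal sketch of the repr()/print distinction, written for Python 3 (an assumption; the thread itself is about Python 2). The prompt echoes repr(obj), while print hands str(obj) to the output stream for encoding. Python 3's ascii() reproduces the escaped form that a Python 2 prompt shows for u"äöü".

```python
# Python 3 sketch: what the interactive prompt shows vs. what print() writes.
s = "\u00e4\u00f6\u00fc"  # a-umlaut, o-umlaut, u-umlaut

echoed = repr(s)    # what typing `s` at the >>> prompt displays
escaped = ascii(s)  # repr with non-ASCII escaped, like Python 2's u'\xe4\xf6\xfc'
printed = str(s)    # what print(s) hands to sys.stdout, which then encodes it

print(escaped)      # ASCII-only output, safe on any console
```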

Then, when you say flexible, you say it as if it's a good thing.  In this 
context "flexible" means roughly "easy to produce mojibake", which is not 
desirable.

For all these use cases, there are ways to do the right thing with Unicode 
strings in Python 2 (e.g. using io.open instead of builtin open).  But making 
these the builtin case was the big gain of Python 3.
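A small sketch of the io.open route, assuming Python 3 (where io.open is the builtin open; the io module also exists in Python 2.6+). The file name "workufile" is only borrowed for illustration. Because the encoding is given explicitly, no implicit ASCII conversion takes place, and the bytes on disk are UTF-8.

```python
import io
import os
import tempfile

text = u"\u00e4\u00f6\u00fc"  # unicode text: a-, o- and u-umlaut

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical scratch file name, echoing the one used in the thread.
    path = os.path.join(folder, "workufile")

    # io.open takes an explicit encoding, so no implicit ASCII conversion happens.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Read the raw bytes back to see what actually landed on disk.
    with io.open(path, "rb") as f:
        raw = f.read()

print(raw)  # b'\xc3\xa4\xc3\xb6\xc3\xbc'
```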

--




[issue20686] Confusing statement about unicode strings in tutorial introduction

2014-03-20 Thread Daniel U. Thibault

Daniel U. Thibault added the comment:

>>> mystring="äöü"
>>> myustring=u"äöü"

>>> mystring
'\xc3\xa4\xc3\xb6\xc3\xbc'
>>> myustring
u'\xe4\xf6\xfc'

>>> str(mystring)
'\xc3\xa4\xc3\xb6\xc3\xbc'
>>> str(myustring)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

>>> f = open('workfile', 'w')
>>> f.write(mystring)
>>> f.close()
>>> f = open('workufile', 'w')
>>> f.write(myustring)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> f.close()

workfile contains C3 A4 C3 B6 C3 BC

So the Unicode string (myustring) does indeed try to convert to ASCII when 
written to file. But not when just printed.

It seems really strange that non-Unicode strings (mystring) should actually be 
more flexible than Unicode strings...
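A Python 3 sketch (an assumption; the session above is Python 2) of the two values involved. The bytes found in workfile, C3 A4 C3 B6 C3 BC, are exactly the UTF-8 encoding of u"äöü", and an ASCII encode of the unicode string fails on positions 0-2, as in the tracebacks.

```python
ustring = "\u00e4\u00f6\u00fc"         # the unicode string u"äöü"
utf8_bytes = ustring.encode("utf-8")  # the bytes a UTF-8 console puts in "äöü"

span = None
try:
    ustring.encode("ascii")           # the conversion Python 2 attempted implicitly
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    span = (exc.start, exc.end - 1)   # inclusive range, matching "position 0-2"

print(utf8_bytes)          # b'\xc3\xa4\xc3\xb6\xc3\xbc'
print(ascii_failed, span)  # True (0, 2)
```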

--




[issue20686] Confusing statement about unicode strings in tutorial introduction

2014-03-20 Thread R. David Murray

R. David Murray added the comment:

re: file.  You forgot the 'u' in front of the string:

>>> f.write(u'This is a «test»\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xab' in position 10: ordinal not in range(128)

So you were actually writing binary in your console encoding, which must have 
been utf-8.  (This kind of confusion is the main reason python3 exists).
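A minimal Python 3 sketch of the separation this confusion led to, assuming an in-memory io.BytesIO as a stand-in for a file opened in binary mode. Python 3 refuses to put text on a binary stream, so the encoding step has to be written out explicitly.

```python
import io

text = "This is a \u00abtest\u00bb\n"  # unicode text, u'This is a «test»\n'

binary = io.BytesIO()                 # stands in for a file opened with 'wb'
try:
    binary.write(text)                # Python 3 rejects str on a binary stream
    mixed_ok = True
except TypeError:
    mixed_ok = False

binary.write(text.encode("utf-8"))    # explicit encoding is the intended route
print(mixed_ok, binary.getvalue()[10:12])  # False b'\xc2\xab'
```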

--
title: Confusing statement -> Confusing statement about unicode strings in 
tutorial introduction




[issue20686] Confusing statement

2014-03-20 Thread Daniel U. Thibault

Daniel U. Thibault added the comment:

"The default encoding is normally set to ASCII [...]. When a Unicode string is 
printed, written to a file, or converted with str(), conversion takes place 
using this default encoding."

>>> u"äöü"
u'\xe4\xf6\xfc'
   Printing a Unicode string uses ASCII encoding: false (the characters are not 
converted to their ASCII equivalents) (compare with str(), below)

>>> str(u"äöü")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
   Converting a Unicode string with str() uses ASCII encoding: true (if print 
(see above) behaved like str(), you'd get an error too)

>>> f = open('workfile', 'w')
>>> f.write('This is a «test»\n')
>>> f.close()
   Writing a Unicode string to a file uses ASCII encoding: false (examination 
of the file reveals UTF-8 characters (hex dump: 54 68 69 73 20 69 73 20 61 20 
C2 AB 74 65 73 74 C2 BB 0A))
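The hex dump can be reproduced in Python 3 (a sketch, not part of the original report). It matches the UTF-8 encoding of the text, not Latin-1, where each guillemet would be a single byte (AB, BB) with no C2 prefix. This is consistent with bytes typed at a UTF-8 console being written unchanged.

```python
text = "This is a \u00abtest\u00bb\n"  # 'This is a «test»\n'

def hex_dump(data):
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join("%02X" % b for b in data)

utf8_dump = hex_dump(text.encode("utf-8"))
latin1_dump = hex_dump(text.encode("latin-1"))

print(utf8_dump)    # 54 68 69 73 20 69 73 20 61 20 C2 AB 74 65 73 74 C2 BB 0A
print(latin1_dump)  # 54 68 69 73 20 69 73 20 61 20 AB 74 65 73 74 BB 0A
```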

--




[issue20686] Confusing statement

2014-02-20 Thread R. David Murray

R. David Murray added the comment:

Thanks, yes, Georg already pointed out the issue with print.  I suppose that 
this is something that changed at some point in Python2's history but this bit 
of the docs was not updated.

Python can write anything to a file; you just have to tell it what encoding to 
use, either by explicitly encoding the unicode to binary before writing it to 
the file, or by using codecs.open and specifying an encoding for the file.  
(This is all much easier in python3, where the unicode support is part of the 
core of the language.)
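A sketch of both routes described above, written for Python 3 (an assumption) with hypothetical scratch file names. Newer Python 3 releases steer users toward the builtin open(..., encoding=...) rather than codecs.open. Both approaches should leave identical UTF-8 bytes on disk.

```python
import codecs
import os
import tempfile

text = "This is a \u00abtest\u00bb\n"  # 'This is a «test»\n'

with tempfile.TemporaryDirectory() as folder:
    # Option 1: encode explicitly, then write bytes to a file opened in binary mode.
    path1 = os.path.join(folder, "explicit.txt")
    with open(path1, "wb") as f:
        f.write(text.encode("utf-8"))

    # Option 2: let codecs.open wrap the file with a UTF-8 encoder.
    path2 = os.path.join(folder, "codecs.txt")
    with codecs.open(path2, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        data1, data2 = f1.read(), f2.read()

print(data1 == data2)  # True
```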

--
versions: +Python 2.7




[issue20686] Confusing statement

2014-02-20 Thread Daniel U. Thibault

Daniel U. Thibault added the comment:

"It seems to me the statement is correct as written.  What experiments indicate 
otherwise?"

Here's a simple one:

>>> print «1»

The guillemets are certainly not ASCII (Unicode AB and BB, well outside ASCII's 
7F upper limit) but are rendered as guillemets.  (Guillemets are easy for me 
'cause I use a French keyboard)  I haven't actually checked yet what happens 
when writing to a file.  If Python is unable to write anything but ASCII to a 
file, it becomes nearly useless.
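A quick Python 3 check (a sketch; Latin-1 is used here only as an example of a single-byte console encoding) confirms that both guillemets lie above 0x7F. Whether they display correctly therefore depends on the console's encoding: one byte each in Latin-1, two bytes each in UTF-8.

```python
rows = []
for ch in "\u00ab\u00bb":  # « and »
    cp = ord(ch)
    rows.append((
        "U+%04X" % cp,                      # code point
        cp > 0x7F,                          # outside 7-bit ASCII?
        ch.encode("latin-1").hex().upper(), # single-byte encoding
        ch.encode("utf-8").hex().upper(),   # two-byte UTF-8 sequence
    ))

for row in rows:
    print(*row)  # U+00AB True AB C2AB, then U+00BB True BB C2BB
```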

--




[issue20686] Confusing statement

2014-02-19 Thread Georg Brandl

Georg Brandl added the comment:

The only problem I can see is that "print" uses the console encoding.

For files and str(), the comment is correct for Python 2.
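In Python 3 terms (an assumption; the comment concerns Python 2), print writes to a text stream, and that stream's encoding decides which bytes come out. For a real console that is sys.stdout.encoding. The helper print_via below is hypothetical and uses an in-memory stream so that the result does not depend on the local terminal.

```python
import io

def print_via(encoding, text):
    """Print text to an in-memory stream with the given encoding; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    print(text, end="", file=stream)
    stream.flush()
    return raw.getvalue()

text = "\u00ab1\u00bb"  # '«1»'
utf8_out = print_via("utf-8", text)

try:
    print_via("ascii", text)  # a strict ASCII "console" cannot show guillemets
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(utf8_out, ascii_ok)  # b'\xc2\xab1\xc2\xbb' False
```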

--
nosy: +georg.brandl




[issue20686] Confusing statement

2014-02-19 Thread R. David Murray

R. David Murray added the comment:

It seems to me the statement is correct as written.  What experiments indicate 
otherwise?

--
nosy: +r.david.murray




[issue20686] Confusing statement

2014-02-19 Thread Daniel U. Thibault

New submission from Daniel U. Thibault:

Near the end of 3.1.3 
http://docs.python.org/2/tutorial/introduction.html#unicode-strings you can 
read:

"When a Unicode string is printed, written to a file, or converted with str(), 
conversion takes place using this default encoding."

This can be interpreted as stating that printing a Unicode string 
(using the print function or the shell's default print behaviour) results in 
ASCII printout.  It can likewise be interpreted as stating that any write of a 
Unicode string to a file converts the string to ASCII.  Experimentation shows 
this is not true.  Perhaps you meant something like this:

"When a Unicode string is converted with str() in order to be printed or 
written to a file, conversion takes place using this default encoding."

Grammatical comments: In the statement "When a Unicode string is printed, 
written to a file, or converted with str(), conversion takes place using this 
default encoding.", the ", or" puts the three elements of the enumeration on 
the same level (respectively "printed", "written to a file", and "converted 
with str()"). The confusion seems to arise because "with str()" was meant to 
apply to the list as a whole, not just its last element.
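The "default encoding" in the quoted sentence is the value reported by sys.getdefaultencoding(), normally 'ascii' on Python 2. That codec is what str(u"äöü") fell back on. The sketch below, run under Python 3 (an assumption), reports 'utf-8' instead, and str.encode() with no argument likewise produces UTF-8.

```python
import sys

default = sys.getdefaultencoding()          # normally 'ascii' on Python 2, 'utf-8' on Python 3
implicit = "\u00e4\u00f6\u00fc".encode()    # str.encode() defaults to UTF-8 in Python 3

print(default, implicit)  # utf-8 b'\xc3\xa4\xc3\xb6\xc3\xbc'
```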

--
assignee: docs@python
components: Documentation
messages: 211627
nosy: Daniel.U..Thibault, docs@python
priority: normal
severity: normal
status: open
title: Confusing statement
type: enhancement
