Re: Python2.7 unicode conundrum

2018-11-26 Thread Robert Latest via Python-list
Richard Damon wrote:
> Why do you say it has been converted to 'Latin'? The string prints as
> being Unicode. Internally Python doesn't store strings as UTF-8, but as
> plain Unicode (UCS-2 or UCS-4, depending on how the interpreter was
> built), and code point E4 is the character you want.

You're right, this wasn't the minimal example for my problem after all.
Turns out that the actual issue is somewhere between SQLAlchemy and
MySQL. I took a more specific question over to stackoverflow.com.

Thanks
robert
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python2.7 unicode conundrum

2018-11-25 Thread Richard Damon
On 11/25/18 12:51 PM, Robert Latest via Python-list wrote:
> Hi folks,
> what seemingly started out as a weird database character encoding mix-up
> could be boiled down to a few lines of pure Python. The source code
> below is real UTF-8 (as evidenced by the UTF-8 byte sequence 'c3 a4' in
> the third line of the hexdump). When just printed, the string "s" is
> displayed correctly as 'ä' (a-umlaut), but the string representation
> shows that it seems to have been converted to latin-1 'e4' somewhere on
> the way.
> How can this be avoided?
>
> dh@jenna:~/python$ cat unicode.py
> # -*- encoding: utf8 -*-
>
> s = u'ä'
>
> print(s)
> print((s, ))
>
> dh@jenna:~/python$ hd unicode.py 
> 0000  23 20 2d 2a 2d 20 65 6e  63 6f 64 69 6e 67 3a 20  |# -*- encoding: |
> 0010  75 74 66 38 20 2d 2a 2d  0a 0a 73 20 3d 20 75 27  |utf8 -*-..s = u'|
> 0020  c3 a4 27 0a 0a 70 72 69  6e 74 28 73 29 0a 70 72  |..'..print(s).pr|
> 0030  69 6e 74 28 28 73 2c 20  29 29 0a 0a              |int((s, ))..|
> 003c
> dh@jenna:~/python$ python unicode.py
> ä
> (u'\xe4',)
> dh@jenna:~/python$
>
>
>
Why do you say it has been converted to 'Latin'? The string prints as
being Unicode. Internally Python doesn't store strings as UTF-8, but as
plain Unicode (UCS-2 or UCS-4, depending on how the interpreter was
built), and code point E4 is the character you want.
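
To make the distinction concrete, here is a minimal sketch (not part of
the original posts) that looks at the same one-character string three
ways: as a code point, as UTF-8 bytes, and as Latin-1 bytes. The variable
names are made up for illustration, and the code is written so that it
should run on Python 2.7 as well as Python 3.

```python
s = u'\xe4'  # one character: U+00E4, a-umlaut

code_point = ord(s)                 # the abstract Unicode code point
utf8_bytes = s.encode('utf-8')      # two bytes, the 'c3 a4' seen in the hexdump
latin1_bytes = s.encode('latin-1')  # one byte whose value equals the code point

# Decoding the UTF-8 bytes gives back the identical unicode string.
round_trip = utf8_bytes.decode('utf-8')

print(hex(code_point))
print(len(utf8_bytes))
print(len(latin1_bytes))
```

The 'e4' in the representation is the code point, not a Latin-1 byte. The
two happen to share a value because Latin-1 maps its 256 byte values onto
the first 256 Unicode code points.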

The encoding declaration tells Python how your source file is encoded. It
does not change how the string is stored or how its representation is
displayed.
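
As a rough illustration of what the declaration does, the sketch below
compiles the same source bytes twice, changing only the declared
encoding. The helper name run_with_declared_encoding and the in-memory
"file" are hypothetical, chosen just for this example; it is meant to run
on Python 3 and should behave the same on 2.7.

```python
# The literal between the quotes is the two bytes c3 a4, exactly as in the
# hexdump of unicode.py.
SOURCE_BODY = b"s = u'\xc3\xa4'\n"


def run_with_declared_encoding(name):
    """Compile SOURCE_BODY under the given coding declaration and return s."""
    header = b"# -*- coding: " + name.encode('ascii') + b" -*-\n"
    namespace = {}
    exec(compile(header + SOURCE_BODY, '<unicode.py>', 'exec'), namespace)
    return namespace['s']


as_utf8 = run_with_declared_encoding('utf-8')      # bytes read as one character
as_latin1 = run_with_declared_encoding('latin-1')  # bytes read as two characters

print(len(as_utf8))
print(len(as_latin1))
```

With the correct declaration the two bytes become the single character
U+00E4. With the wrong one they become two separate characters, which is
the usual source of mojibake.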

-- 
Richard Damon



Re: Python2.7 unicode conundrum

2018-11-25 Thread Thomas Jollans
On 25/11/2018 18:51, Robert Latest via Python-list wrote:
> Hi folks,
> what seemingly started out as a weird database character encoding mix-up
> could be boiled down to a few lines of pure Python. The source code
> below is real UTF-8 (as evidenced by the UTF-8 byte sequence 'c3 a4' in
> the third line of the hexdump). When just printed, the string "s" is
> displayed correctly as 'ä' (a-umlaut), but the string representation
> shows that it seems to have been converted to latin-1 'e4' somewhere on
> the way.

It's not being converted to latin-1. It's a unicode string, as evidenced
by the 'u'.

u'\xe4' is a unicode string with one character, U+00E4 (ä).
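
The second line of output in the original post comes from printing a
tuple, which shows the repr() of each element. On Python 2.7 that repr
escapes non-ASCII characters, so the one character is spelled with four
ASCII characters. The sketch below is an illustration only, not code from
the thread. It uses the 'unicode_escape' codec to produce that spelling so
that it runs the same way on Python 2.7 and Python 3, where repr() itself
would show the character unescaped.

```python
s = u'\xe4'  # one character: U+00E4, a-umlaut

# ASCII-only spelling of the same text: backslash, 'x', 'e', '4'.
escaped = s.encode('unicode_escape')

# The escape is only a notation; decoding it yields the original string.
restored = escaped.decode('unicode_escape')

print(len(s))
print(len(escaped))
```

The string itself never changes. Only the way it is written out differs
between print(s) and print((s, )).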




Python2.7 unicode conundrum

2018-11-25 Thread Robert Latest via Python-list
Hi folks,
what seemingly started out as a weird database character encoding mix-up
could be boiled down to a few lines of pure Python. The source code
below is real UTF-8 (as evidenced by the UTF-8 byte sequence 'c3 a4' in
the third line of the hexdump). When just printed, the string "s" is
displayed correctly as 'ä' (a-umlaut), but the string representation
shows that it seems to have been converted to latin-1 'e4' somewhere on
the way.
How can this be avoided?

dh@jenna:~/python$ cat unicode.py
# -*- encoding: utf8 -*-

s = u'ä'

print(s)
print((s, ))

dh@jenna:~/python$ hd unicode.py 
0000  23 20 2d 2a 2d 20 65 6e  63 6f 64 69 6e 67 3a 20  |# -*- encoding: |
0010  75 74 66 38 20 2d 2a 2d  0a 0a 73 20 3d 20 75 27  |utf8 -*-..s = u'|
0020  c3 a4 27 0a 0a 70 72 69  6e 74 28 73 29 0a 70 72  |..'..print(s).pr|
0030  69 6e 74 28 28 73 2c 20  29 29 0a 0a              |int((s, ))..|
003c
dh@jenna:~/python$ python unicode.py
ä
(u'\xe4',)
dh@jenna:~/python$


