Re: Python2.7 unicode conundrum
Richard Damon wrote:
> Why do you say it has been converted to 'Latin'? The string prints as
> being Unicode. Internally Python doesn't store strings as UTF-8, but as
> plain Unicode (UCS-2 or UCS-4 as needed), and code-point E4 is the
> character you want.

You're right, this wasn't the minimal example for my problem after all.
Turns out that the actual issue is somewhere between SQLAlchemy and
MySQL. I took a more specific question over to stackoverflow.com.

Thanks
robert
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python2.7 unicode conundrum
On 11/25/18 12:51 PM, Robert Latest via Python-list wrote:
> Hi folks,
> what seemingly started out as a weird database character encoding mix-up
> could be boiled down to a few lines of pure Python. The source-code
> below is real utf8 (as evidenced by the UTF code point 'c3 a4' in the
> third line of the hexdump). When just printed, the string "s" is
> displayed correctly as 'ä' (a umlaut), but the string representation
> shows that it seems to have been converted to latin-1 'e4' somewhere on
> the way.
> How can this be avoided?
>
> dh@jenna:~/python$ cat unicode.py
> # -*- encoding: utf8 -*-
>
> s = u'ä'
>
> print(s)
> print((s, ))
>
> dh@jenna:~/python$ hd unicode.py
> 0000  23 20 2d 2a 2d 20 65 6e 63 6f 64 69 6e 67 3a 20  |# -*- encoding: |
> 0010  75 74 66 38 20 2d 2a 2d 0a 0a 73 20 3d 20 75 27  |utf8 -*-..s = u'|
> 0020  c3 a4 27 0a 0a 70 72 69 6e 74 28 73 29 0a 70 72  |..'..print(s).pr|
> 0030  69 6e 74 28 28 73 2c 20 29 29 0a 0a              |int((s, ))..|
> 003c
> dh@jenna:~/python$ python unicode.py
> ä
> (u'\xe4',)
> dh@jenna:~/python$

Why do you say it has been converted to 'Latin'? The string prints as
being Unicode. Internally Python doesn't store strings as UTF-8, but as
plain Unicode (UCS-2 or UCS-4 as needed), and code-point E4 is the
character you want. The encoding statement tells Python how your source
file is encoded.

--
Richard Damon
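To make the distinction between a code point and its encoded bytes concrete, here is a minimal sketch. The variable names are illustrative and not from the original posts, and the snippet is written so that it should behave the same under Python 2.7 and Python 3.

```python
# Sketch: one Unicode character, one code point, several byte encodings.
# The literal uses an escape so the example does not depend on how this
# file itself is encoded.
s = u'\xe4'  # LATIN SMALL LETTER A WITH DIAERESIS, U+00E4

code_point = ord(s)                  # the code point as an integer
utf8_bytes = s.encode('utf-8')       # two bytes: c3 a4 (what the hexdump shows)
latin1_bytes = s.encode('latin-1')   # one byte: e4

print(len(s))              # number of characters, not bytes
print(hex(code_point))
print(repr(utf8_bytes))
print(repr(latin1_bytes))
```

The Latin-1 byte and the code point share the value E4 because the first 256 Unicode code points coincide with Latin-1. That coincidence is probably why the `\xe4` in the repr looks like a Latin-1 conversion even though no conversion took place.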
Re: Python2.7 unicode conundrum
On 25/11/2018 18:51, Robert Latest via Python-list wrote:
> Hi folks,
> what seemingly started out as a weird database character encoding mix-up
> could be boiled down to a few lines of pure Python. The source-code
> below is real utf8 (as evidenced by the UTF code point 'c3 a4' in the
> third line of the hexdump). When just printed, the string "s" is
> displayed correctly as 'ä' (a umlaut), but the string representation
> shows that it seems to have been converted to latin-1 'e4' somewhere on
> the way.

It's not being converted to latin-1. It's a unicode string, as evidenced
by the 'u'.

u'\xe4' is a unicode string with one character, U+00E4 (ä).

> How can this be avoided?
>
> dh@jenna:~/python$ cat unicode.py
> # -*- encoding: utf8 -*-
>
> s = u'ä'
>
> print(s)
> print((s, ))
>
> dh@jenna:~/python$ hd unicode.py
> 0000  23 20 2d 2a 2d 20 65 6e 63 6f 64 69 6e 67 3a 20  |# -*- encoding: |
> 0010  75 74 66 38 20 2d 2a 2d 0a 0a 73 20 3d 20 75 27  |utf8 -*-..s = u'|
> 0020  c3 a4 27 0a 0a 70 72 69 6e 74 28 73 29 0a 70 72  |..'..print(s).pr|
> 0030  69 6e 74 28 28 73 2c 20 29 29 0a 0a              |int((s, ))..|
> 003c
> dh@jenna:~/python$ python unicode.py
> ä
> (u'\xe4',)
> dh@jenna:~/python$
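The reply above says that `u'\xe4'` is only a way of writing a one-character string. A small sketch of that point follows, with illustrative names that are not from the thread. It uses the `unicode_escape` codec to produce the escaped spelling explicitly, since the exact repr of a string differs between Python 2.7 and Python 3.

```python
# Sketch: the escaped spelling is notation, not a change to the string.
s = u'\xe4'

# A tuple displays the repr() of its elements, and repr() may escape
# non-ASCII characters. The element taken back out is still the same string.
element = (s,)[0]

# The 'unicode_escape' codec yields the ASCII escape sequence as bytes:
# a backslash followed by 'x', 'e', '4'.
escaped = s.encode('unicode_escape')

# Decoding the escape sequence gives back the original one-character string.
round_trip = escaped.decode('unicode_escape')

print(element == s)
print(repr(escaped))
print(round_trip == s)
```

So `print(s)` shows the character itself, while `print((s, ))` shows the tuple's repr, in which Python 2.7 writes the same character as `\xe4`.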
Python2.7 unicode conundrum
Hi folks,
what seemingly started out as a weird database character encoding mix-up
could be boiled down to a few lines of pure Python. The source-code
below is real utf8 (as evidenced by the UTF code point 'c3 a4' in the
third line of the hexdump). When just printed, the string "s" is
displayed correctly as 'ä' (a umlaut), but the string representation
shows that it seems to have been converted to latin-1 'e4' somewhere on
the way.
How can this be avoided?

dh@jenna:~/python$ cat unicode.py
# -*- encoding: utf8 -*-

s = u'ä'

print(s)
print((s, ))

dh@jenna:~/python$ hd unicode.py
0000  23 20 2d 2a 2d 20 65 6e 63 6f 64 69 6e 67 3a 20  |# -*- encoding: |
0010  75 74 66 38 20 2d 2a 2d 0a 0a 73 20 3d 20 75 27  |utf8 -*-..s = u'|
0020  c3 a4 27 0a 0a 70 72 69 6e 74 28 73 29 0a 70 72  |..'..print(s).pr|
0030  69 6e 74 28 28 73 2c 20 29 29 0a 0a              |int((s, ))..|
003c
dh@jenna:~/python$ python unicode.py
ä
(u'\xe4',)
dh@jenna:~/python$