Re: How to decode UTF strings?
In comp.lang.python, DFS wrote:
> On 10/25/2019 10:57 PM, MRAB wrote:
>> Here's a simple example, based on your code:
>>
>> from email.header import decode_header
>>
>> def test(header, default_encoding='utf-8'):
>>     parts = []
>>
>>     for data, encoding in decode_header(header):
>>         if isinstance(data, str):
>>             parts.append(data)
>>         else:
>>             parts.append(data.decode(encoding or default_encoding))
>>
>>     print(''.join(parts))
>>
>> test('=?iso-8859-9?b?T/B1eg==?= ')
>> test('=?utf-8?Q?=EB=AF=B8?= ')
>> test('=?GBK?B?0Pu66A==?= ')
>> test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ')
>
> I don't think it's working:

It's close. Just ''.join should be ' '.join.

> $ python decode_utf.py
> O≡uz
> δ»╕
> ╨√║Φ
> ╬¥╬»╬║╬┐╧é ╬Æ╬¡╧ü╬│╬┐╧é

Is your terminal UTF-8? I think not.

Elijah
--
answered with C code to do this in comp.lang.c
--
https://mail.python.org/mailman/listinfo/python-list
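One way to check the terminal theory, assuming the console uses code page 437 (the box-drawing characters suggest it): take the raw bytes that decode_header returns for each of DFS's headers and decode them as cp437. Each of the four output lines above is reproduced exactly, which suggests the bytes reached the terminal without ever being decoded. One explanation consistent with that, offered only as a guess, is running the script under Python 2, where decode_header returns byte strings that pass the isinstance(data, str) check. The sketch below assumes Python 3; the helper names are hypothetical.

```python
from email.header import decode_header

# DFS's four test headers, copied from the thread.
HEADERS = [
    '=?iso-8859-9?b?T/B1eg==?= ',
    '=?utf-8?Q?=EB=AF=B8?= ',
    '=?GBK?B?0Pu66A==?= ',
    '=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ',
]


def raw_as_cp437(header):
    """Hypothetical helper: render the encoded word's raw bytes the way a
    code page 437 console would draw them, without decoding them first."""
    data, _charset = decode_header(header)[0]  # Python 3: (bytes, charset)
    return data.decode('cp437')


def properly_decoded(header):
    """Hypothetical helper: decode the same bytes with the header's charset."""
    data, charset = decode_header(header)[0]
    return data.decode(charset)


for h in HEADERS:
    # ascii() escapes non-ASCII characters, so this output reads the same
    # whatever encoding the terminal uses.
    print(ascii(raw_as_cp437(h)), '->', ascii(properly_decoded(h)))
```

The first line printed is 'O\u2261uz' -> 'O\u011fuz': the ≡ DFS saw is byte 0xF0, which ISO-8859-9 maps to ğ. Printing ascii() or repr() of a string is also a handy way to see what Python actually holds, independent of the terminal.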
Re: How to decode UTF strings?
On 2019-10-26 03:10, Arne Vajhøj wrote:
> On 10/25/2019 4:52 PM, DFS wrote:
>> =?iso-8859-9?b?T/B1eg==?=
>> =?utf-8?Q?=EB=AF=B8?=
>> =?GBK?B?0Pu66A==?=
>> =?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?=
>
> How does something like:
>
> from email.header import decode_header
>
> def test(s):
>     print(s)
>     s2 = decode_header(s)
>     print(s2[0][0])
>     print(s2[1][0].strip())
>
> test('=?iso-8859-9?b?T/B1eg==?= ')
> test('=?utf-8?Q?=EB=AF=B8?= ')
> test('=?GBK?B?0Pu66A==?= ')
> test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ')
>
> work?

When you decode the header you get a number of parts, each with its own encoding.

Here's a simple example, based on your code:

from email.header import decode_header

def test(header, default_encoding='utf-8'):
    parts = []

    for data, encoding in decode_header(header):
        if isinstance(data, str):
            parts.append(data)
        else:
            parts.append(data.decode(encoding or default_encoding))

    print(''.join(parts))

test('=?iso-8859-9?b?T/B1eg==?= ')
test('=?utf-8?Q?=EB=AF=B8?= ')
test('=?GBK?B?0Pu66A==?= ')
test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ')
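An alternative sketch, assuming Python 3: the standard library's email.header.make_header() accepts the list that decode_header() returns and builds a Header object whose str() is the decoded text. That leaves the joining, including where spaces go between encoded and plain chunks, to the library instead of a hand-written join. It also accepts the plain str that decode_header() hands back when a value contains no encoded words. The helper name decode_mime_header is made up for illustration.

```python
from email.header import decode_header, make_header


def decode_mime_header(header):
    """Illustrative helper: turn an RFC 2047 header value into a str."""
    # decode_header() splits the value into (bytes-or-str, charset) pairs;
    # make_header() reassembles them into a Header, and str() gives the text.
    return str(make_header(decode_header(header)))


for h in ['=?iso-8859-9?b?T/B1eg==?= ',
          '=?utf-8?Q?=EB=AF=B8?= ',
          '=?GBK?B?0Pu66A==?= ',
          '=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ']:
    # strip() drops the trailing space carried over from the test strings;
    # ascii() keeps the output readable on a non-UTF-8 terminal.
    print(ascii(decode_mime_header(h).strip()))
```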
Re: How to decode UTF strings?
On 10/25/2019 4:52 PM, DFS wrote:
> =?iso-8859-9?b?T/B1eg==?=
> =?utf-8?Q?=EB=AF=B8?=
> =?GBK?B?0Pu66A==?=
> =?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?=

How does something like:

from email.header import decode_header

def test(s):
    print(s)
    s2 = decode_header(s)
    print(s2[0][0])
    print(s2[1][0].strip())

test('=?iso-8859-9?b?T/B1eg==?= ')
test('=?utf-8?Q?=EB=AF=B8?= ')
test('=?GBK?B?0Pu66A==?= ')
test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ')

work?

Arne
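To see what s2[0][0] and s2[1][0] in the code above actually hold: under Python 3, decode_header() returns a list of (bytes, charset) pairs whenever the value contains encoded words, with the charset name lower-cased. For these test strings the first pair is the encoded word, still as raw bytes, and the second is the trailing space with charset None, so print(s2[0][0]) shows a bytes literal rather than text. A short sketch, assuming Python 3:

```python
from email.header import decode_header

parts = decode_header('=?iso-8859-9?b?T/B1eg==?= ')
print(parts)  # [(b'O\xf0uz', 'iso-8859-9'), (b' ', None)]

# parts[0] is the encoded word as raw bytes plus its (lower-cased) charset;
# parts[1] is the trailing space, with no charset.
data, charset = parts[0]
print(ascii(data.decode(charset)))  # 'O\u011fuz'

# With no encoded words at all, the value comes back unchanged as a str.
print(decode_header('plain text'))  # [('plain text', None)]
```

That last case is why code that loops over the parts has to accept both str and bytes.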