Re: How to decode UTF strings?

2019-10-26, Eli the Bearded
In comp.lang.python, DFS wrote:
> On 10/25/2019 10:57 PM, MRAB wrote:
>> Here's a simple example, based on your code:
>> 
>> from email.header import decode_header
>> 
>> def test(header, default_encoding='utf-8'):
>>     parts = []
>>
>>     for data, encoding in decode_header(header):
>>         if isinstance(data, str):
>>             parts.append(data)
>>         else:
>>             parts.append(data.decode(encoding or default_encoding))
>>
>>     print(''.join(parts))
>> 
>> test('=?iso-8859-9?b?T/B1eg==?= ')
>> test('=?utf-8?Q?=EB=AF=B8?= ')
>> test('=?GBK?B?0Pu66A==?= ')
>> test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ')
> I don't think it's working:

It's close. Just ''.join should be ' '.join.

> $ python decode_utf.py
> O≡uz
> 미
> ╨√║Φ
> Νίκος Βέργος

Is your terminal UTF-8? I think not.
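One quick way to check Eli's question from inside Python is to look at `sys.stdout.encoding` and test whether the decoded text can be encoded in it at all. A minimal sketch, assuming Python 3; `can_display` is a hypothetical helper, not part of the standard library:

```python
import sys


def can_display(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == '__main__':
    # The encoding Python uses when writing to this terminal. It may be
    # None if stdout has been replaced by an object that has no encoding.
    enc = getattr(sys.stdout, 'encoding', None)
    print('stdout encoding:', enc)
    if enc:
        for name in ['Oğuz', '미', 'Νίκος Βέργος']:
            # ascii() keeps this diagnostic printable on any terminal.
            print(ascii(name), can_display(name, enc))
```

If the reported encoding is not UTF-8, Python 3.7+ can be told to write UTF-8 with `sys.stdout.reconfigure(encoding='utf-8')` or the `PYTHONIOENCODING` environment variable, but the terminal itself must also interpret its input as UTF-8 (on a Windows console, `chcp 65001` switches the code page).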

Elijah
--
answered with C code to do this in comp.lang.c
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to decode UTF strings?

2019-10-25, MRAB

On 2019-10-26 03:10, Arne Vajhøj wrote:

> On 10/25/2019 4:52 PM, DFS wrote:
>>
>> =?iso-8859-9?b?T/B1eg==?=
>> =?utf-8?Q?=EB=AF=B8?=
>> =?GBK?B?0Pu66A==?=
>> =?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?=
>
> How does something like:
>
> from email.header import decode_header
>
> def test(s):
>     print(s)
>     s2 = decode_header(s)
>     print(s2[0][0])
>     print(s2[1][0].strip())
>
> test('=?iso-8859-9?b?T/B1eg==?= ')
> test('=?utf-8?Q?=EB=AF=B8?= ')
> test('=?GBK?B?0Pu66A==?= ')
> test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ')
>
> work?

When you decode the header you get a list of parts, each a (data, 
encoding) pair with its own encoding, so each part has to be decoded 
separately.
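To make those parts concrete, here is a minimal sketch (assuming Python 3's `email.header` module) that prints the raw return value of `decode_header()` for one of the headers in this thread. The expected output shown in the comments is for this exact input string:

```python
from email.header import decode_header

# One of the headers from this thread, including its trailing space.
header = '=?utf-8?Q?=EB=AF=B8?= '

# decode_header() returns a list of (data, charset) pairs. When the header
# contains an encoded word, every data item is bytes; text outside the
# encoded words comes back with charset None.
parts = decode_header(header)
for data, charset in parts:
    print(repr(data), charset)
# b'\xeb\xaf\xb8' utf-8
# b' ' None
```

When the string contains no encoded words at all, `decode_header()` returns it unchanged as a single `(str, None)` pair, so code that handles the parts generally needs to accept both `str` and `bytes`.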


Here's a simple example, based on your code:

from email.header import decode_header

def test(header, default_encoding='utf-8'):
    parts = []

    for data, encoding in decode_header(header):
        if isinstance(data, str):
            parts.append(data)
        else:
            parts.append(data.decode(encoding or default_encoding))

    print(''.join(parts))

test('=?iso-8859-9?b?T/B1eg==?= ')
test('=?utf-8?Q?=EB=AF=B8?= ')
test('=?GBK?B?0Pu66A==?= ')
test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ')
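As an alternative to joining the parts by hand, the standard library's `email.header.make_header()` rebuilds a `Header` object from `decode_header()`'s output, and calling `str()` on that object decodes each chunk with its own charset and joins them. It also adds a single space at a boundary between encoded and unencoded text unless the unencoded side already has whitespace there. A minimal sketch, assuming Python 3; `decode_mime_header` is a hypothetical helper name:

```python
from email.header import decode_header, make_header


def decode_mime_header(header):
    """Decode an RFC 2047 header string into a str using only the stdlib."""
    # make_header() accepts the (data, charset) pairs from decode_header(),
    # whether data is str or bytes; str() then produces the decoded text.
    return str(make_header(decode_header(header)))


if __name__ == '__main__':
    # Printing these needs a terminal that can show the characters.
    for h in ['=?iso-8859-9?b?T/B1eg==?= ',
              '=?utf-8?Q?=EB=AF=B8?= ',
              '=?GBK?B?0Pu66A==?= ',
              '=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ']:
        print(decode_mime_header(h))
```

This keeps the per-part charset handling inside the `email` package instead of in the calling code.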



Re: How to decode UTF strings?

2019-10-25, Arne Vajhøj

On 10/25/2019 4:52 PM, DFS wrote:

> =?iso-8859-9?b?T/B1eg==?=
> =?utf-8?Q?=EB=AF=B8?=
> =?GBK?B?0Pu66A==?=
> =?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?=


How does something like:

from email.header import decode_header

def test(s):
    print(s)
    s2 = decode_header(s)
    print(s2[0][0])
    print(s2[1][0].strip())

test('=?iso-8859-9?b?T/B1eg==?= ')
test('=?utf-8?Q?=EB=AF=B8?= ')
test('=?GBK?B?0Pu66A==?= ')
test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= ')


work?

Arne
