[issue24339] iso6937 encoding missing

Julien Sun, 13 Nov 2016 14:12:06 -0800

Julien added the comment:

Hi John, thanks for your contribution,


Looks like your implementation is missing some codepoints, like "\t":

    >>> print("\t".encode(encoding='iso6937'))
    [...]
    UnicodeError: encoding with 'iso6937' codec failed (UnicodeError: Unacceptable utf-8 character)

This is probably due to the "range(0x20, "…: why start at `0x20`?
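
A minimal sketch of the suspected cause, assuming the encoding table is built 
from a range starting at `0x20` and that control characters are meant to map 
to themselves (as the failing "\t" case suggests). The names 
`table_from_space` and `table_from_nul` are hypothetical and not taken from 
the patch:

```python
# Hypothetical reconstruction: a table whose range starts at 0x20 (space)
# never gets entries for the control characters below it, such as "\t" (0x09).
table_from_space = {chr(c): bytes([c]) for c in range(0x20, 0x7f)}

# Starting the range at 0x00 covers the control characters as well.
table_from_nul = {chr(c): bytes([c]) for c in range(0x00, 0x7f)}

print("\t" in table_from_space)  # False
print("\t" in table_from_nul)    # True
```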

You also have problems decoding multibyte sequences, because you don't have 
the `else: … result += chr(c[0])` branch in this case. Typically, decoding 
`\xc2\x20` raises a `KeyError`, as `\x20` is _not_ in your decoding table.
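
One way to avoid the bare `KeyError` is to look up the whole two-byte sequence 
and raise a proper `UnicodeDecodeError` when it is unknown. This is only a 
sketch, not the patch's code: the tables are tiny illustrative subsets, and 
the lead-byte range and the mapping of `\xc2\x20` to U+00B4 are assumptions 
to be checked against the real charmap.

```python
# Illustrative tables only; a real codec needs the full ISO 6937 repertoire.
SINGLE = {0x09: "\t", 0x20: " ", 0x41: "A", 0x65: "e"}

# (lead byte, second byte) -> character. These entries are assumptions.
DOUBLE = {
    (0xC2, 0x65): "\u00e9",  # acute accent + "e"
    (0xC2, 0x20): "\u00b4",  # assumed: acute accent + space = spacing accent
}

LEAD_BYTES = range(0xC1, 0xD0)  # assumed range of non-spacing accent bytes


def decode_iso6937_sketch(data: bytes) -> str:
    """Decode using the demo tables, never leaking a KeyError."""
    result = ""
    i = 0
    while i < len(data):
        if data[i] in LEAD_BYTES:
            # Multibyte case: look up the lead byte together with its follower.
            pair = tuple(data[i:i + 2])
            if pair not in DOUBLE:
                raise UnicodeDecodeError(
                    "iso6937", data, i, i + len(pair),
                    "unknown multibyte sequence")
            result += DOUBLE[pair]
            i += 2
        elif data[i] in SINGLE:
            # Single-byte case.
            result += SINGLE[data[i]]
            i += 1
        else:
            raise UnicodeDecodeError(
                "iso6937", data, i, i + 1, "unmapped byte")
    return result
```

With these demo tables, `decode_iso6937_sketch(b"\xc2\x20")` returns a string 
instead of failing, and an unknown pair such as `b"\xc2\x7f"` raises 
`UnicodeDecodeError` rather than `KeyError`.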

Also, please make your contribution conform to PEP 8: you're missing spaces 
after commas, and you sometimes indent with 8 spaces instead of 4.

I implemented a simple checker based on glibc localedata. It clearly shows 
your decoding problems step by step, and it should be easy to extend to check 
your encoding function too; see the attachment. It uses the ISO6937 file 
typically found in the Debian locales package or via 'apt-get source glibc'.
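
For readers without the attachment, the general idea can be sketched as 
follows. This is not the attached check_iso6937.py; it assumes the glibc file 
uses lines of the form `<UXXXX>  /xNN[/xNN…]  NAME`, and the function names 
`parse_charmap` and `check_decoding` are made up for this example.

```python
import re

# Assumed line format: "<U00E9>     /xc2/x65     LATIN SMALL LETTER E WITH ACUTE"
CHARMAP_LINE = re.compile(r"^<U([0-9A-Fa-f]{4,8})>\s+((?:/x[0-9A-Fa-f]{2})+)")


def parse_charmap(text):
    """Return {encoded bytes: expected character} from charmap text."""
    mapping = {}
    for line in text.splitlines():
        match = CHARMAP_LINE.match(line)
        if match:
            hex_bytes = match.group(2).split("/x")[1:]
            encoded = bytes(int(h, 16) for h in hex_bytes)
            mapping[encoded] = chr(int(match.group(1), 16))
    return mapping


def check_decoding(mapping, encoding):
    """Decode every entry with `encoding` and collect the mismatches."""
    failures = []
    for encoded, expected in sorted(mapping.items()):
        try:
            got = encoded.decode(encoding)
        except Exception as exc:  # also catches a KeyError leaking from a codec
            failures.append((encoded, expected, repr(exc)))
            continue
        if got != expected:
            failures.append((encoded, expected, got))
    return failures
```

Reading the actual file and passing its contents to `parse_charmap` is left 
to the caller; `check_decoding(mapping, 'iso6937')` would then list every 
sequence the codec under review gets wrong.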

----------
nosy: +sizeof
Added file: http://bugs.python.org/file45478/check_iso6937.py

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue24339>
_______________________________________