Nick Coghlan added the comment:

Following up here after rejecting #15622 as invalid.

The "unicode" codes in PEP 3118 need to be seriously rethought before any 
related changes are made in the struct module.

1. The 'c' and 's' codes are currently used for raw bytes data (represented as 
bytes objects at the Python layer). This means the 'c' code cannot be used as 
described in PEP 3118 in a world with strict binary/text separation.

2. Any format codes for UCS1, UCS2 and UCS4 are more usefully modelled on 's' 
than they are on 'c' (so that repeat counts create longer strings rather than 
lists of strings that each contain a single code point).

3. Given some of the other proposals in PEP 3118, it seems more useful to 
define an embedded text format as "S{<encoding>}".

UCS1 would then be "S{latin-1}", UCS2 would be approximated as "S{utf-16}" and 
UCS4 would be "S{utf-32}". Arbitrary encodings would also be supported. Struct 
packing would implicitly encode from text to bytes, while unpacking would 
implicitly decode bytes to text. As with 's', a length mismatch in the encoded 
form would mean an error.
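
To make the three points concrete, here is a rough sketch. The first part only
shows what the struct module does today with 'c' and 's'. The second part
emulates the proposed "S{<encoding>}" code on top of 's'; pack_text and
unpack_text are hypothetical helper names invented for this illustration, not
an existing or planned struct API, and the explicit byte count argument is an
assumption about how a size would be supplied.

```python
import struct

# Points 1 and 2: today both 'c' and 's' work on bytes, and a repeat count
# means different things for each of them.
chars = struct.unpack("3c", b"abc")   # three separate 1-byte objects
block = struct.unpack("3s", b"abc")   # one 3-byte object


# Point 3: hypothetical emulation of "<size>S{<encoding>}" using 's'.
def pack_text(text, size, encoding):
    """Encode text and pack it into exactly `size` bytes."""
    data = text.encode(encoding)
    if len(data) != size:
        # Stricter than the current 's' packing, which pads with NULs or
        # truncates; this follows the "mismatch is an error" rule above.
        raise struct.error(
            "encoded text is %d bytes, expected %d" % (len(data), size))
    return struct.pack("%ds" % size, data)


def unpack_text(buffer, size, encoding):
    """Unpack exactly `size` bytes and decode them to text."""
    (data,) = struct.unpack("%ds" % size, buffer)
    return data.decode(encoding)
```

With these helpers, pack_text("caf\u00e9", 4, "latin-1") plays the role of
"4S{latin-1}". Note that the plain "utf-16" and "utf-32" codecs prepend a byte
order mark when encoding, so a fixed-size field would more likely use an
explicit variant such as "utf-16-le".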

----------
nosy: +ncoghlan

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue3132>
_______________________________________