Marc-Andre Lemburg <m...@egenix.com> added the comment:

STINNER Victor wrote:
>
> STINNER Victor <victor.stin...@haypocalc.com> added the comment:
>
>> Given that "y#" is not (yet) in wide-spread use, ...
>
> t# is only used once (in codecs.charbuffer_encode()), whereas y# is used by
> ossaudiodev, socket and mmap modules (there are 8 functions using y#). There
> are 46 functions using y* format. y format is not used in Python3.
>
> To me, it looks easier to just drop t# and continue to use y, y* and y#
> formats in Python3.

You are forgetting our main target: to get extension writers to port
their extensions to Python3. Changes to the Python core are a lot
easier to implement than getting thousands of extensions ported.

"t#" is in wide-spread use, since it's the only way a Python2
extension can request access to an object's text data version.
"y#" was introduced with Python3, and there are only very few
extensions written for it.

Given these facts, it's better to drop "y#" and replace it with
"t#". This is easily done for the core modules and by adding
synonyms for "y#" we can also automatically take care of the few
Python3 extensions possibly using it.

>> "y#" and "y*" could then be set up as synonyms for "t#" and "t*"
>
> If we have to keep backward compatibility, yes, t# can be kept as a synonym
> for y#. But I don't think that backward compatibility of the C API is
> important in Python3 because only few 3rd party modules are compatible with
> Python3.

True, and that's why we have to make it easier for extension writers
to port their extensions rather than making it harder.

It is not too difficult to adjust a Python2 extension to work in
Python3 as well, so that's most likely the route that many extension
writers will take, hence the need to reduce the number of differences
between the Python2 and Python3 C API.

> --
>
> I prefer to use y, y* and y# formats because they target the *bytes* type
> (which is the Python3 type to store byte strings), whereas s# is used in
> Python2 to get text, *str* type, which are byte strings, but most Python2
> programmers consider that the str type is the type of character string. I
> see the change of s# to y#, as the change from str to bytes (the strict
> separation between bytes and str).

That's not correct: "s#" is used in Python2 to get at the bytes
representation of an object, not the text version. "t#" was
specifically added to access a text version of the content.
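
As a rough illustration of the bytes/text distinction under discussion,
the following pure-Python sketch models how the Python3 "s#" and "y#"
format units are documented to treat their argument. It is not the C
implementation: parse_s_hash() and parse_y_hash() are hypothetical names
made up for this sketch, and UTF-8 is assumed as the default encoding.

```python
def parse_y_hash(obj):
    """Model of "y#": read-only bytes-like objects only; str is rejected."""
    if isinstance(obj, str):
        raise TypeError("y# model: expected bytes-like object, got str")
    # memoryview() raises TypeError if obj does not export a buffer.
    view = memoryview(obj)
    if not view.readonly:
        raise TypeError("y# model: expected a read-only bytes-like object")
    data = view.tobytes()
    return data, len(data)


def parse_s_hash(obj):
    """Model of "s#": like "y#", but str is accepted and encoded (UTF-8 assumed)."""
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return data, len(data)
    return parse_y_hash(obj)


if __name__ == "__main__":
    print(parse_s_hash("caf\u00e9"))  # (b'caf\xc3\xa9', 5)
    print(parse_s_hash(b"abc"))       # (b'abc', 3)
    print(parse_y_hash(b"abc"))       # (b'abc', 3)
    try:
        parse_y_hash("abc")
    except TypeError as exc:
        print("rejected:", exc)
```

In this model the only difference between the two markers is the
handling of str input, which is the point at issue when choosing a
Python3 replacement for a Python2 format code.
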
In Python3, this distinction is no longer available (for whatever
reason), so only the bytes representation of the object remains.

Looking at the implementation again, I found that "y#" rejects
Unicode, while "s#" returns the default encoded version like "t#"
does in Python2. So I have to correct what I said earlier:
"y#" is not the right replacement for "t#" if we want to stay
compatible with its Python2 counterpart.

The "t#" implementation in Python3 is not compatible with the
Python2 approach - it is, in fact, a totally different parser, since
Unicode no longer provides a buffer interface and thus cannot be
used as input for "t#".

The only compatible counterpart to the Python2 "t#" parser marker
in Python3 appears to be "s#".

I'll have to think about this some more, but seen in that light,
removing "t#" in Python3 may actually be a better strategy after
all - mostly to remove a misguided forward-porting attempt and to
reduce the number of surprises extension writers will see when
porting their apps to Python3.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8839>
_______________________________________