[issue8839] PyArg_ParseTuple(): remove "t# format

Marc-Andre Lemburg Fri, 28 May 2010 04:30:29 -0700

Marc-Andre Lemburg <m...@egenix.com> added the comment:

STINNER Victor wrote:
> 
> STINNER Victor <victor.stin...@haypocalc.com> added the comment:
> 
>> Given that "y#" is not (yet) in wide-spread use, ...
> 
> t# is only used once (in codecs.charbuffer_encode()), whereas y# is used by 
> ossaudiodev, socket and mmap modules (there are 8 functions using y#). There 
> are 46 functions using y* format. y format is not used in Python3.
> 
> To me, it looks easier to just drop t# and continue to use y, y* and y# 
> formats in Python3.


You are forgetting our main target: to get extension writers to
port their extensions to Python3. Changes to the Python core are
a lot easier to implement than getting thousands of extensions
ported.

"t#" is in wide-spread use, since it's the only way a Python2
extension can request access to a text version of an object's data.

"y#" was introduced with Python3, and there are only very few
extensions written for it.

Given these facts, it's better to drop "y#" and replace it with
"t#". This is easily done for the core modules, and by keeping
"y#" as a synonym we can also automatically take care of the
few Python3 extensions possibly using it.

>> "y#" and "y*" could then be setup as synonyms for "t#" and "t*"
> 
> If we have to keep backward compatibility, yes, t# can be kept as a synonym 
> for y#. But I don't think that backward compatibility of the C API is 
> important in Python3 because only a few 3rd party modules are compatible with 
> Python3.

True, and that's why we have to make it easier for extension writers
to port their extensions rather than making it harder.

It is not too difficult to adjust a Python2 extension to work
in Python3 as well, so that's most likely the route that
many extension writers will take, hence the need to reduce the
number of differences between the Python2 and Python3 C API.

> --
> 
> I prefer to use y, y* and y# formats because they target the *bytes* type 
> (which is the Python3 type to store byte strings), whereas s# is used in 
> Python2 to get text, i.e. the *str* type, which holds byte strings, but most Python2 
> programmers consider the str type to be the character string type. I 
> see the change of s# to y#, as the change from str to bytes (the strict 
> separation between bytes and str).

That's not correct: "s#" is used in Python2 to get at the bytes
representation of an object, not the text version. "t#" was
specifically added to access a text version of the content.

In Python3, this distinction is no longer available (for whatever
reason), so only the bytes representation of the object remains.

Looking at the implementation again, I found that "y#" rejects
Unicode, while "s#" returns the default encoded version like
"t#" does in Python2.
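
To make the difference concrete, here is a rough pure-Python model of
how the two markers treat their input in Python3. It is only a sketch:
the helper names (parse_s_hash, parse_y_hash, _readonly_bytes) are
made up for illustration and are not part of any API, the real logic
lives in C, and the model assumes the default encoding is UTF-8.

```python
def _readonly_bytes(obj):
    """Return the bytes of a read-only bytes-like object (hypothetical helper)."""
    view = memoryview(obj)  # raises TypeError if obj has no buffer interface
    if not view.readonly:
        raise TypeError("read-only bytes-like object required")
    return view.tobytes()


def parse_s_hash(obj):
    """Rough model of "s#": str is encoded (UTF-8 assumed), bytes pass through."""
    if isinstance(obj, str):
        return obj.encode("utf-8")
    return _readonly_bytes(obj)


def parse_y_hash(obj):
    """Rough model of "y#": Unicode is rejected, bytes pass through."""
    if isinstance(obj, str):
        raise TypeError('"y#" does not accept str')
    return _readonly_bytes(obj)
```

With this model, parse_s_hash("abc") and parse_s_hash(b"abc") both
yield b"abc", while parse_y_hash("abc") raises TypeError.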

So I have to correct what I said earlier:

"y#" is not the right replacement for "t#" in order to stay compatible
with its Python2 counterpart. The "t#" implementation in Python3 is not
compatible with the Python2 approach - it is, in fact, a totally
different parser, since Unicode no longer provides a buffer interface
and thus cannot be used as input for "t#".
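
The missing buffer interface can be checked from Python3 itself. The
following is a minimal sketch, not the parser's actual code path: it
uses memoryview() as a stand-in for "asks the object for a buffer",
and has_buffer_interface is a hypothetical helper name.

```python
def has_buffer_interface(obj):
    """Return True if obj can be viewed through the buffer protocol."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


print(has_buffer_interface(b"abc"))        # bytes
print(has_buffer_interface(bytearray(3)))  # bytearray
print(has_buffer_interface("abc"))         # str (Unicode)
```

In Python3 the first two calls report True and the last one False,
which is why a buffer-based marker cannot take a str as input.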

The only compatible counterpart to the Python2 "t#" parser marker
in Python3 appears to be "s#".

I'll have to think about this some more, but seen in that light,
removing "t#" in Python3 may actually be a better strategy after
all - mostly to remove a misguided forward-porting attempt
and to reduce the number of surprises extension writers will
see when porting their apps to Python3.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8839>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
