[issue8839] PyArg_ParseTuple(): remove t# format

2010-06-10 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
>
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
>> t# was meant to provide access to text data, so replacing it with a
>> parser code that is meant for binary data is not correct. The
>> closest Python3 gets to t# from Python2 is s# or s*, so please use
>> those in the NEWS entry and s* in charbuffer_encode().
>
> Done. Patch committed as r81854 in 3.2: it also removes
> codecs.charbuffer_encode(). Commit blocked in 3.1 (r81855).

Thanks.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8839
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8839] PyArg_ParseTuple(): remove t# format

2010-06-08 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
resolution:  -> fixed
status: open -> closed




[issue8839] PyArg_ParseTuple(): remove t# format

2010-06-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
>
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
> New version of the patch:
>  - charbuffer_encode() uses y* instead of y# format to accept modifiable
>    buffer objects (e.g. bytearray)
>  - Improve the documentation about the change
>
> @lemburg: So, do you agree with my patch?

No, because y*/y# are not correct replacements for t#. They don't
accept Unicode objects.

t# was meant to provide access to text data, so replacing it with a
parser code that is meant for binary data is not correct. The
closest Python3 gets to t# from Python2 is s# or s*, so please use
those in the NEWS entry and s* in charbuffer_encode().
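
The difference can be observed from the Python side. This is only a rough
sketch: it assumes a current CPython 3 in which codecs.readbuffer_encode()
parses its argument with s* and codecs.utf_8_decode() parses it with y*.
Neither detail is stated in this thread, and the accepts() helper is purely
illustrative.

```python
import codecs


def accepts(func, arg):
    """Return True if func(arg) parses its argument, False on TypeError."""
    try:
        func(arg)
    except TypeError:
        return False
    return True


# s*-style parsing (assumed for readbuffer_encode): str is accepted and
# handed to the C code as its UTF-8 encoded bytes.
text_result = codecs.readbuffer_encode("abc")

# y*-style parsing (assumed for utf_8_decode): only bytes-like objects
# are accepted, str is rejected with TypeError.
str_ok_for_y = accepts(codecs.utf_8_decode, "abc")
bytes_ok_for_y = accepts(codecs.utf_8_decode, bytearray(b"abc"))

print(text_result, str_ok_for_y, bytes_ok_for_y)
```

If those assumptions hold, the s*-style function takes both str and
bytes-like input, while the y*-style function takes bytes-like input only,
which is the behaviour t# users from Python2 would not expect.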

--




[issue8839] PyArg_ParseTuple(): remove t# format

2010-06-06 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

New version of the patch:
 - charbuffer_encode() uses y* instead of y# format to accept modifiable buffer
   objects (e.g. bytearray)
 - Improve the documentation about the change

@lemburg: So, do you agree with my patch?

--
Added file: http://bugs.python.org/file17579/getarg_remove_tdash-2.patch




[issue8839] PyArg_ParseTuple(): remove t# format

2010-05-28 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
>
> New submission from STINNER Victor victor.stin...@haypocalc.com:
>
> t# format was introduced by r11803 (11 years ago): Implement new format
> character 't#'. This is like s#, accepting an object that implements the
> buffer interface, but requires a buffer that contains 8-bit character data.
>
> Python3 now has a strict separation between byte string (bytes and bytearray
> types) and unicode string (str), and has PyBuffer and PyCapsule APIs. t#
> format can be replaced by y# or y*.
>
> Extract of getargs.c:
>
>   /*TEO: This can be eliminated --- here only for backward
>     compatibility */
>   case 't': { /* 8-bit character buffer, read-only access */
>
> In Python, the last function using t# is _codecs.charbuffer_encode() and I
> proposed to remove this function in #8838. We can also patch this function.
>
> I don't know if third party modules use this format or not. I don't know if
> it can be just removed or if it should raise a deprecation warning (but who
> will notice such a warning since they are disabled by default?).

Since Python3 completely removed the getcharbuffer interface
that t# interfaces to in Python2, t# indeed no longer
serves any special purpose.

It's probably wise to just map t# to y# in order to ease
porting extensions from 2.x to 3.x.

--
nosy: +lemburg




[issue8839] PyArg_ParseTuple(): remove t# format

2010-05-28 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
>
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
> Patch to remove t#:
>  - Update c-api/arg.rst documentation
>  - Replace t# format by y# in codecs.charbuffer_encode()
>  - Add a note in Doc/whatsnew/3.2.rst (in Porting to Python 3.2)

Given that y# is not (yet) in wide-spread use, it may actually make
more sense to replace y# with t# and introduce t* to replace
y*.

y# and y* could then be set up as synonyms for t# and t*.

--




[issue8839] PyArg_ParseTuple(): remove t# format

2010-05-28 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> Given that y# is not (yet) in wide-spread use, ...

t# is only used once (in codecs.charbuffer_encode()), whereas y# is used by 
ossaudiodev, socket and mmap modules (there are 8 functions using y#). There 
are 46 functions using y* format. y format is not used in Python3.

To me, it looks easier to just drop t# and continue to use y, y* and y# formats 
in Python3.

> y# and y* could then be set up as synonyms for t# and t*

If we have to keep backward compatibility, yes, t# can be kept as a synonym for
y#. But I don't think that backward compatibility of the C API is important in
Python3 because only a few 3rd party modules are compatible with Python3.

--

I prefer to use the y, y* and y# formats because they target the *bytes* type
(the Python3 type for storing byte strings), whereas s# is used in Python2 to
get text from the *str* type. Those are byte strings, but most Python2
programmers consider str to be the character string type. I see the change
from s# to y# as mirroring the change from str to bytes (the strict separation
between bytes and str).
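
As a small illustration of that strict separation at the Python level (a
sketch only; can_concatenate() is a made-up helper, not part of any API):

```python
def can_concatenate(left, right):
    """Return True if left + right is allowed, False on TypeError."""
    try:
        left + right
    except TypeError:
        return False
    return True


data = b"abc"   # byte string: the kind of object the y formats target
text = "abc"    # character string (str)

# Mixing the two types is refused in Python3.
mixed_ok = can_concatenate(data, text)

# bytes and bytearray are both byte strings and can be combined.
same_ok = can_concatenate(data, bytearray(b"def"))

# Crossing the boundary requires an explicit encode or decode step.
round_trip = text.encode("utf-8") == data and data.decode("utf-8") == text

print(mixed_ok, same_ok, round_trip)
```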

--




[issue8839] PyArg_ParseTuple(): remove t# format

2010-05-28 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
>
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
>> Given that y# is not (yet) in wide-spread use, ...
>
> t# is only used once (in codecs.charbuffer_encode()), whereas y# is used by
> ossaudiodev, socket and mmap modules (there are 8 functions using y#). There
> are 46 functions using y* format. y format is not used in Python3.
>
> To me, it looks easier to just drop t# and continue to use y, y* and y#
> formats in Python3.

You are forgetting our main target: to get extension writers to
port their extensions to Python3. Changes to the Python core are
a lot easier to implement than getting thousands of extensions
ported.

t# is in wide-spread use, since it's the only way a Python2
extension can request access to an object's text data version.

y# was introduced with Python3, and there are only very few
extensions written for it.

Given these facts, it's better to drop y# and replace it with
t#. This is easily done for the core modules and by adding
synonyms for y# we can also automatically take care of the
few Python3 extensions possibly using it.

>> y# and y* could then be set up as synonyms for t# and t*
>
> If we have to keep backward compatibility, yes, t# can be kept as a synonym
> for y#. But I don't think that backward compatibility of the C API is
> important in Python3 because only a few 3rd party modules are compatible with
> Python3.

True, and that's why we have to make it easier for extension writers
to port their extensions rather than making it harder.

It is not too difficult to adjust a Python2 extension to work
in Python3 as well, so that's most likely the route that
many extension writers will take, hence the need to reduce the
number of differences between the Python2 and Python3 C API.

> --
>
> I prefer to use the y, y* and y# formats because they target the *bytes* type
> (the Python3 type for storing byte strings), whereas s# is used in Python2 to
> get text from the *str* type. Those are byte strings, but most Python2
> programmers consider str to be the character string type. I see the change
> from s# to y# as mirroring the change from str to bytes (the strict
> separation between bytes and str).

That's not correct: s# is used in Python2 to get at the bytes
representation of an object, not the text version. t# was
specifically added to access a text version of the content.

In Python3, this distinction is no longer available (for whatever
reason), so only the bytes representation of the object remains.

Looking at the implementation again, I found that y# rejects
Unicode, while s# returns the default encoded version like
t# does in Python2.

So I have to correct what I said earlier:

y# is not the right replacement for t# in order to stay compatible
with its Python2 counterpart. The t# implementation in Python3 is not
compatible with the Python2 approach - it is, in fact, a totally
different parser, since Unicode no longer provides a buffer interface
and thus cannot be used as input for t#.

The only compatible counterpart to the Python2 t# parser marker
in Python3 appears to be s#.
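
The point about Unicode no longer providing a buffer interface can be checked
from Python3 itself. A minimal sketch, using memoryview() as a stand-in for
"asks the object for its buffer" (exports_buffer() is a hypothetical helper
name):

```python
def exports_buffer(obj):
    """Return True if obj supports the buffer protocol, judged via memoryview()."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


# bytes and bytearray export a buffer; str (Unicode) does not.
results = {
    type(obj).__name__: exports_buffer(obj)
    for obj in (b"abc", bytearray(b"abc"), "abc")
}

print(results)
```

A parser code that only goes through the buffer interface therefore has no
way to accept a str argument; it has to encode the string itself, which is
what the s formats do.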

I'll have to think about this some more, but seen in that light,
removing t# in Python3 may actually be a better strategy after
all - mostly to remove a misguided forward-porting attempt
and to reduce the number of surprises extension writers will
see when porting their apps to Python3.

--




[issue8839] PyArg_ParseTuple(): remove t# format

2010-05-28 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

On Friday 28 May 2010 13:30:22, you wrote:
> Looking at the implementation again, I found that y# rejects
> Unicode, while s# returns the default encoded version like
> t# does in Python2.

Oh, I didn't notice that.

> So I have to correct what I said earlier:
>
> y# is not the right replacement for t# in order to stay compatible
> with its Python2 counterpart. The t# implementation in Python3 is not
> compatible with the Python2 approach - it is, in fact, a totally
> different parser, since Unicode no longer provides a buffer interface
> and thus cannot be used as input for t#.
>
> The only compatible counterpart to the Python2 t# parser marker
> in Python3 appears to be s#.
>
> I'll have to think about this some more, but seen in that light,
> removing t# in Python3 may actually be a better strategy after
> all - mostly to remove a misguided forward-porting attempt
> and to reduce the number of surprises extension writers will
> see when porting their apps to Python3.

So t#, s# and y# are all different. I'm waiting for your final decision.

"reduce the number of surprises extension writers ..." is a good argument in
favor of removing t# :-)

--




[issue8839] PyArg_ParseTuple(): remove t# format

2010-05-28 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +loewis




[issue8839] PyArg_ParseTuple(): remove t# format

2010-05-27 Thread STINNER Victor

New submission from STINNER Victor victor.stin...@haypocalc.com:

t# format was introduced by r11803 (11 years ago): Implement new format 
character 't#'. This is like s#, accepting an object that implements the buffer 
interface, but requires a buffer that contains 8-bit character data.

Python3 now has a strict separation between byte string (bytes and bytearray 
types) and unicode string (str), and has PyBuffer and PyCapsule APIs. t# 
format can be replaced by y# or y*.

Extract of getargs.c:

  /*TEO: This can be eliminated --- here only for backward
    compatibility */
  case 't': { /* 8-bit character buffer, read-only access */

In Python, the last function using t# is _codecs.charbuffer_encode() and I 
proposed to remove this function in #8838. We can also patch this function.

I don't know if third party modules use this format or not. I don't know if it
can be just removed or if it should raise a deprecation warning (but who will
notice such a warning since they are disabled by default?).
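
To illustrate the visibility problem: the sketch below uses a hypothetical
Python function, parse_with_old_format(), as a stand-in for code that would
warn about t#. No such warning is known to exist in CPython; the name and the
message are assumptions made for the example. The filters are set explicitly
so the result does not depend on the interpreter's default settings.

```python
import warnings


def parse_with_old_format():
    # Hypothetical stand-in for a call that still relies on a deprecated format.
    warnings.warn("t# format is deprecated", DeprecationWarning, stacklevel=2)
    return "parsed"


# With DeprecationWarning ignored, the call succeeds and nothing is reported.
with warnings.catch_warnings(record=True) as hidden:
    warnings.simplefilter("ignore", DeprecationWarning)
    parse_with_old_format()

# The warning is only seen once somebody opts in, e.g. with an "always" filter.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = parse_with_old_format()

categories = [w.category for w in caught]
print(len(hidden), categories, result)
```

The same opt-in is available without code changes by running the interpreter
with -W always (or -W error to turn the warnings into exceptions).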

--
components: Interpreter Core
messages: 106627
nosy: haypo
priority: normal
severity: normal
status: open
title: PyArg_ParseTuple(): remove t# format
versions: Python 3.2
