Re: [Python-Dev] TextIO seek and tell cookies
On 25 September 2016 at 21:18, Guido van Rossum wrote:
> Be careful though, comparing these to plain integers should probably
> be allowed,

There's a good reason why it's "opaque" ... why would you want to make it less opaque? And I'm curious why Python didn't adopt the fgetpos/fsetpos style that makes the data structure completely opaque (fpos_t). IIRC, this was added to C when the ANSI standard was first written, to allow cross-platform compatibility in cases where ftell/fseek was difficult (or impossible) to fully implement. Maybe those reasons don't matter any more (e.g., dealing with record-oriented or keyed file systems) ...

> and we also should make sure that things like
> serialization via JSON or storing in an SQL database don't break. I
> personally think it's one of those "learn not to touch the stove"
> cases and there's limited value in making this API idiot proof.
>
> On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan wrote:
> > On 26 September 2016 at 10:21, MRAB wrote:
> >> On 2016-09-26 00:21, Ben Leslie wrote:
> >>> Are there any downsides to this? I've made some progress developing a
> >>> patch to change this functionality. Is it worth polishing and
> >>> submitting?
> >>>
> >> An alternative might be a subclass of int.
> >
> > It could make sense to use a subclass of int that emitted deprecation
> > warnings for integer arithmetic, and then eventually disallowed it
> > entirely.
> >
> > Cheers,
> > Nick.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Possibly inconsistent behavior in re groupdict
Hi Gordon,

You pose an interesting question that I don't think anyone has posed before. Having thought about it, I think that the keys in the group dict are similar to the names of variables or attributes, and I think treating them always as strings makes sense. For example, I might write a function that allows passing in a pattern and a search string, both either str or bytes, where the function would expect fixed keys in the group dict:

    def extract_key_value(pattern, target):
        m = re.match(pattern, target)
        return m and (m.groupdict()['key'], m.groupdict()['value'])

There might be a problem with decoding the group name from the pattern, so sticking to ASCII group names would be wise. There's also the backwards compatibility concern: even if we did want to change this, would we want to break existing code (like the above) that might currently work?

--Guido

On Sun, Sep 25, 2016 at 5:25 PM, Gordon R. Burgess wrote:
> I've been lurking for a couple of months, working up the confidence to
> ask the list about this behavior - I've searched through the PEPs but
> couldn't find any specific reference to it.
>
> In a nutshell, in the Python 3.5 library, re patterns and search buffers
> both need to be either unicode or byte strings - but the keys in the
> groupdict are always returned as str in either case.
>
> I don't know whether or not this is by design, but it would make more
> sense to me if when searching a bytes object with a bytes pattern the
> keys returned in the groupdict were bytes as well.
>
> I reworked the example a little just now so it would run on 2.7 as
> well; on 2.7 the keys in the dictionary correspond to the mode of the
> pattern as expected (and bytes and unicode are interconverted silently)
> - code and output are inline below.
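Guido's point about fixed, str-typed keys can be sketched as follows. This is only an illustration under assumptions: the `key=value` patterns and the sample inputs are made up here, and the helper returns None explicitly when nothing matches. The same lookup code serves both a str pattern and a bytes pattern because the group names come back as str either way on Python 3.

```python
import re


def extract_key_value(pattern, target):
    """Return (key, value) from a match, or None if the pattern does not match."""
    m = re.match(pattern, target)
    if m is None:
        return None
    groups = m.groupdict()
    # The keys are str regardless of whether pattern/target are str or bytes.
    return groups["key"], groups["value"]


# Hypothetical inputs, one per string type.
from_str = extract_key_value(r"(?P<key>\w+)=(?P<value>\w+)", "colour=blue")
from_bytes = extract_key_value(rb"(?P<key>\w+)=(?P<value>\w+)", b"colour=blue")
no_match = extract_key_value(r"(?P<key>\w+)=(?P<value>\w+)", "no match here")
```

If the keys followed the pattern type instead, the helper would need a separate branch (or a decode step) for bytes patterns.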
Re: [Python-Dev] TextIO seek and tell cookies
Be careful though, comparing these to plain integers should probably be allowed, and we also should make sure that things like serialization via JSON or storing in an SQL database don't break. I personally think it's one of those "learn not to touch the stove" cases and there's limited value in making this API idiot proof.

On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan wrote:
> On 26 September 2016 at 10:21, MRAB wrote:
>> On 2016-09-26 00:21, Ben Leslie wrote:
>>> Are there any downsides to this? I've made some progress developing a
>>> patch to change this functionality. Is it worth polishing and
>>> submitting?
>>>
>> An alternative might be a subclass of int.
>
> It could make sense to use a subclass of int that emitted deprecation
> warnings for integer arithmetic, and then eventually disallowed it
> entirely.
>
> Cheers,
> Nick.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] TextIO seek and tell cookies
On 26 September 2016 at 10:21, MRAB wrote:
> On 2016-09-26 00:21, Ben Leslie wrote:
>> Are there any downsides to this? I've made some progress developing a
>> patch to change this functionality. Is it worth polishing and
>> submitting?
>>
> An alternative might be a subclass of int.

It could make sense to use a subclass of int that emitted deprecation warnings for integer arithmetic, and then eventually disallowed it entirely.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
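The int-subclass idea might look roughly like the sketch below. `SeekCookie` is a hypothetical name invented for this illustration; nothing like it exists in the io module, and only addition and subtraction are covered. Because the class still is an int, equality with plain integers and JSON serialization keep working, which is the compatibility concern raised earlier in the thread.

```python
import json
import warnings


class SeekCookie(int):
    """Hypothetical tell() cookie: still an int, but arithmetic warns."""

    def _warn(self):
        # stacklevel=3 points the warning at the code doing the arithmetic.
        warnings.warn(
            "arithmetic on a tell() cookie is deprecated",
            DeprecationWarning,
            stacklevel=3,
        )

    def __add__(self, other):
        self._warn()
        return int(self) + other

    __radd__ = __add__  # addition is commutative, so the same body works

    def __sub__(self, other):
        self._warn()
        return int(self) - other


cookie = SeekCookie(42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    shifted = cookie + 1  # still computes a value, but emits a warning

as_json = json.dumps({"pos": cookie})  # serialized like a plain integer
```

A later release could turn the warning into a TypeError, which is the "eventually disallowed" step of the proposal.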
[Python-Dev] Possibly inconsistent behavior in re groupdict
I've been lurking for a couple of months, working up the confidence to ask the list about this behavior - I've searched through the PEPs but couldn't find any specific reference to it.

In a nutshell, in the Python 3.5 library, re patterns and search buffers both need to be either unicode or byte strings - but the keys in the groupdict are always returned as str in either case.

I don't know whether or not this is by design, but it would make more sense to me if when searching a bytes object with a bytes pattern the keys returned in the groupdict were bytes as well.

I reworked the example a little just now so it would run on 2.7 as well; on 2.7 the keys in the dictionary correspond to the mode of the pattern as expected (and bytes and unicode are interconverted silently) - code and output are inline below.

Thanks for your time,

Gordon

[Code]

import sys
import re
from datetime import datetime

# One unicode and one bytes search buffer
data = (u"first string (unicode)",
        b"second string (bytes)")

# The same pattern compiled once as unicode and once as bytes
pattern = [re.compile(u"(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)"),
           re.compile(b"(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)")]

print("*** re consistency check ***\nRun: %s\nVersion: Python %s\n" %
      (datetime.now(), sys.version))
for p in pattern:
    for d in data:
        try:
            result = "groupdict: %s" % (p.match(d) and
                                        p.match(d).groupdict())
        except Exception as e:
            result = "error: %s" % e.args[0]
        print("mode: %s\npattern: %s\ndata: %s\n%s\n" %
              (type(p.pattern).__name__, p.pattern, d, result))

[Output]

gordon@w540:~/workspace/regex_demo$ python3 regex_demo.py
*** re consistency check ***
Run: 2016-09-25 20:06:29.472332
Version: Python 3.5.2+ (default, Sep 10 2016, 10:24:58)
[GCC 6.2.0 20160901]

mode: str
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: first string (unicode)
groupdict: {'ordinal': 'first', 'type': 'unicode'}

mode: str
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: b'second string (bytes)'
error: cannot use a string pattern on a bytes-like object

mode: bytes
pattern: b'(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)'
data: first string (unicode)
error: cannot use a bytes pattern on a string-like object

mode: bytes
pattern: b'(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)'
data: b'second string (bytes)'
groupdict: {'ordinal': b'second', 'type': b'bytes'}

gordon@w540:~/workspace/regex_demo$ python regex_demo.py
*** re consistency check ***
Run: 2016-09-25 20:06:23.375322
Version: Python 2.7.12+ (default, Sep 1 2016, 20:27:38)
[GCC 6.2.0 20160822]

mode: unicode
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: first string (unicode)
groupdict: {u'ordinal': u'first', u'type': u'unicode'}

mode: unicode
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: second string (bytes)
groupdict: {u'ordinal': 'second', u'type': 'bytes'}

mode: str
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: first string (unicode)
groupdict: {'ordinal': u'first', 'type': u'unicode'}

mode: str
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: second string (bytes)
groupdict: {'ordinal': 'second', 'type': 'bytes'}
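The Python 3 behaviour described in this post can be reduced to a short check. This is a minimal sketch that reuses the group names `ordinal` and `type` and the sample strings from the output above; the variable names are chosen here for illustration. On Python 3 only the values follow the type of the pattern, while the keys are str in both cases.

```python
import re

# The same pattern, once as str and once as bytes (raw literals for readability).
str_pattern = re.compile(r"(?P<ordinal>\w+) .*\((?P<type>\w+)\)")
bytes_pattern = re.compile(rb"(?P<ordinal>\w+) .*\((?P<type>\w+)\)")

str_groups = str_pattern.match("first string (unicode)").groupdict()
bytes_groups = bytes_pattern.match(b"second string (bytes)").groupdict()

# Collect the type names of all keys, and of one value from each match.
key_types = {type(k).__name__ for k in list(str_groups) + list(bytes_groups)}
value_types = (type(str_groups["type"]).__name__,
               type(bytes_groups["type"]).__name__)
```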
Re: [Python-Dev] TextIO seek and tell cookies
On 2016-09-26 00:21, Ben Leslie wrote:
> Hi all,
>
> I recently shot myself in the foot by assuming that TextIO.tell returned integers rather than opaque cookies. Specifically I was adding an offset to the value returned by TextIO.tell. In retrospect this doesn't make sense.
>
> Now, I don't want to drive change simply because I failed to read the documentation carefully, but I think the current API is very easy to misuse. Most of the time TextIO.tell returns a cookie that is actually an integer and adding an offset to it and seek-ing works fine. The only indication you get that you are mis-using the API is that sometimes tell returns a cookie that when you add an integer offset to it will cause seek() to fail with an OverflowError.
>
> Would it be possible to change the API to return something more opaque? E.g.: rather than converting the C cookie structure to a long, could it instead be converted to a bytes() object? (I.e.: Change textiowrapper_build_cookie to use PyBytes_FromStringAndSize rather than _PyLong_FromByteArray and equivalent for textiowrapper_parse_cookie).
>
> This would ensure the return value is never mis-used and is probably also faster using bytes objects than converting to/from an integer.

Why would it be faster? It's an integer internally.

> Are there any downsides to this? I've made some progress developing a patch to change this functionality. Is it worth polishing and submitting?

An alternative might be a subclass of int.
[Python-Dev] TextIO seek and tell cookies
Hi all,

I recently shot myself in the foot by assuming that TextIO.tell returned integers rather than opaque cookies. Specifically I was adding an offset to the value returned by TextIO.tell. In retrospect this doesn't make sense.

Now, I don't want to drive change simply because I failed to read the documentation carefully, but I think the current API is very easy to misuse. Most of the time TextIO.tell returns a cookie that is actually an integer and adding an offset to it and seek-ing works fine. The only indication you get that you are mis-using the API is that sometimes tell returns a cookie that when you add an integer offset to it will cause seek() to fail with an OverflowError.

Would it be possible to change the API to return something more opaque? E.g.: rather than converting the C cookie structure to a long, could it instead be converted to a bytes() object? (I.e.: Change textiowrapper_build_cookie to use PyBytes_FromStringAndSize rather than _PyLong_FromByteArray and equivalent for textiowrapper_parse_cookie).

This would ensure the return value is never mis-used and is probably also faster using bytes objects than converting to/from an integer.

Are there any downsides to this? I've made some progress developing a patch to change this functionality. Is it worth polishing and submitting?

Cheers,
Ben
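For contrast with the offset arithmetic described above, here is a minimal sketch of the supported pattern: treat the value from tell() as a token and hand it back to seek() unchanged. The in-memory stream and its contents are assumptions made for this illustration; a file opened in text mode is used the same way.

```python
import io

# A UTF-8 text stream with multi-byte characters, wrapped like open() would do.
raw = io.BytesIO("héllo wörld\nsecond line\n".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

first = text.readline()
cookie = text.tell()      # opaque: only meaningful when passed back to seek()
second = text.readline()

text.seek(cookie)         # supported: a value previously returned by tell()
reread = text.readline()

text.seek(0)              # supported: rewinding to the start of the stream
start_again = text.readline()
```

To reach a position some number of characters past a cookie, one option is to seek to the cookie, read that many characters, and call tell() again, rather than adding to the cookie itself.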
Re: [Python-Dev] cpython (3.6): replace usage of Py_VA_COPY with the (C99) standard va_copy
On 24 September 2016 at 18:07, Benjamin Peterson wrote:
>
>
> On Fri, Sep 23, 2016, at 09:32, Steven D'Aprano wrote:
>> On Thu, Sep 22, 2016 at 11:47:20PM -0700, Benjamin Peterson wrote:
>> >
>> > On Thu, Sep 22, 2016, at 04:44, Victor Stinner wrote:
>> > > 2016-09-22 8:02 GMT+02:00 Benjamin Peterson:
>> > > > Just dump the compat macros in Python 4.0 I think.
>> > >
>> > > Please don't. Python 3 was so painful because we decided to make
>> > > millions of tiny backward incompatible changes. To have a smooth
>> > > Python 4.0 release, we should only remove things which were already
>> > > deprecated since at least 2 cycles, and well documented as deprecated.
>> >
>> > I'm being flippant here because of the triviality of the change. Anyone
>> > using Py_VA_COPY or Py_MEMCPY can fix their code in a backwards and
>> > forwards compatible manner in 7 seconds with a sed command.
>>
>> Sorry, I haven't been following this thread in detail, so perhaps I've
>> misunderstood. Are you assuming that anyone who is building Python from
>> source is automatically able to diagnose C level build failures and
>> know how to fix them using sed?
>
> I am assuming authors of CPython extensions possess those skills.

Not all projects on PyPI have active maintainers though, and on the project user end, there's a significant difference between "Can set up a C build environment well enough to let distutils build simple C extensions for a new Python release" and "Is the maintainer of the C extension".

It's often useful to think of *any* backwards incompatible change we make as a pruning filter on PyPI: projects that don't have active maintainers that are affected by the change won't be updated as a matter of course. The end result is then usually going to be one of:

- the original author returns to active maintenance for long enough to release an update
- an interested user contacts the original author and takes over maintenance
- affected users migrate away to a new actively maintained fork of the project
- affected users migrate away to another existing project addressing the same need

There are some cases where a lack of active maintenance is inherently a problem (e.g. network security), so we're happy to trigger those ripple effects. In other cases, the pay-off might be in ease of maintenance for the core development team, or in ease of future learning for new Python developers.

But it doesn't matter how trivial the specific change needed is if getting it resolved and a new version published turns out to require a transfer of project ownership - the cost is in the ownership change rather than the software change itself.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia