Re: [Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread Peter Ludemann via Python-Dev
On 25 September 2016 at 21:18, Guido van Rossum wrote:

> Be careful though, comparing these to plain integers should probably
> be allowed,

There's a good reason why it's "opaque" ... why would you want to make it
less opaque?

And I'm curious why Python didn't adopt the fgetpos/fsetpos style that
makes the data structure completely opaque (fpos_t). IIRC, this was added
to C when the ANSI standard was first written, to allow cross-platform
compatibility in cases where ftell/fseek was difficult (or impossible) to
fully implement. Maybe those reasons don't matter any more (e.g., dealing
with record-oriented or keyed file systems) ...



> and we also should make sure that things like
> serialization via JSON or storing in an SQL database don't break. I
> personally think it's one of those "learn not to touch the stove"
> cases and there's limited value in making this API idiot proof.
>
> On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan wrote:
> > On 26 September 2016 at 10:21, MRAB wrote:
> >> On 2016-09-26 00:21, Ben Leslie wrote:
> >>> Are there any downsides to this? I've made some progress developing a
> >>> patch to change this functionality. Is it worth polishing and
> >>> submitting?
> >>>
> >> An alternative might be a subclass of int.
> >
> > It could make sense to use a subclass of int that emitted deprecation
> > warnings for integer arithmetic, and then eventually disallowed it
> > entirely.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
> > ___
> > Python-Dev mailing list
> > Python-Dev@python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Possibly inconsistent behavior in re groupdict

2016-09-25 Thread Guido van Rossum
Hi Gordon,

You pose an interesting question that I don't think anyone has posed
before. Having thought about it, I think that the keys in the group
dict are similar to the names of variables or attributes, and I think
treating them always as strings makes sense. For example, I might
write a function that allows passing in a pattern and a search string,
both either str or bytes, where the function would expect fixed keys
in the group dict:

def extract_key_value(pattern, target):
    m = re.match(pattern, target)
    if m is None:
        return None
    groups = m.groupdict()  # keys are str for both str and bytes patterns
    return groups['key'], groups['value']

There might be a problem with decoding the group name from the
pattern, so sticking to ASCII group names would be wise.
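A minimal sketch of that point, assuming Python 3: the same fixed str keys work whether the pattern and target are both str or both bytes. The pattern constants and the `key=value` input format are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical patterns for a "key=value" input; group names are ASCII.
STR_PATTERN = r"(?P<key>\w+)=(?P<value>\w+)"
BYTES_PATTERN = rb"(?P<key>\w+)=(?P<value>\w+)"


def extract_key_value(pattern, target):
    """Return (key, value) from a match, or None when there is no match."""
    m = re.match(pattern, target)
    if m is None:
        return None
    groups = m.groupdict()
    # The dict keys are str in both modes; only the values follow the
    # type of the searched object.
    return groups["key"], groups["value"]


str_result = extract_key_value(STR_PATTERN, "colour=red")
bytes_result = extract_key_value(BYTES_PATTERN, b"colour=red")
```

Here `str_result` holds str values and `bytes_result` holds bytes values, while the lookup code is identical for both.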

There's also the backwards compatibility concern: even if we did want
to change this, would we want to break existing code (like the above)
that might currently work?

--Guido

On Sun, Sep 25, 2016 at 5:25 PM, Gordon R. Burgess wrote:
> I've been lurking for a couple of months, working up the confidence to
> ask the list about this behavior - I've searched through the PEPs but
> couldn't find any specific reference to it.
>
> In a nutshell, in the Python 3.5 re library, patterns and search buffers
> must both be either unicode or byte strings - but the keys in the
> groupdict are always returned as str in either case.
>
> I don't know whether or not this is by design, but it would make more
> sense to me if when searching a bytes object with a bytes pattern the
> keys returned in the groupdict were bytes as well.
>
> I reworked the example a little just now so it would run on 2.7 as
> well; on 2.7 the keys in the dictionary correspond to the mode of the
> pattern as expected (and bytes and unicode are interconverted silently)
> - code and output are inline below.
>
> Thanks for your time,
>
> Gordon
>
> [Code]
>
> import sys
> import re
> from datetime import datetime
>
> data = (u"first string (unicode)",
>         b"second string (bytes)")
>
> pattern = [re.compile(u"(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)"),
>            re.compile(b"(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)")]
>
> print("*** re consistency check ***\nRun: %s\nVersion: Python %s\n" %
>       (datetime.now(), sys.version))
> for p in pattern:
>     for d in data:
>         try:
>             result = "groupdict: %s" % (p.match(d) and
>                                         p.match(d).groupdict())
>         except Exception as e:
>             result = "error: %s" % e.args[0]
>         print("mode: %s\npattern: %s\ndata: %s\n%s\n" %
>               (type(p.pattern).__name__, p.pattern, d, result))
>
> [Output]
>
> gordon@w540:~/workspace/regex_demo$ python3 regex_demo.py
> *** re consistency check ***
> Run: 2016-09-25 20:06:29.472332
> Version: Python 3.5.2+ (default, Sep 10 2016, 10:24:58)
> [GCC 6.2.0 20160901]
>
> mode: str
> pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
> data: first string (unicode)
> groupdict: {'ordinal': 'first', 'type': 'unicode'}
>
> mode: str
> pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
> data: b'second string (bytes)'
> error: cannot use a string pattern on a bytes-like object
>
> mode: bytes
> pattern: b'(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)'
> data: first string (unicode)
> error: cannot use a bytes pattern on a string-like object
>
> mode: bytes
> pattern: b'(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)'
> data: b'second string (bytes)'
> groupdict: {'ordinal': b'second', 'type': b'bytes'}
>
> gordon@w540:~/workspace/regex_demo$ python regex_demo.py
> *** re consistency check ***
> Run: 2016-09-25 20:06:23.375322
> Version: Python 2.7.12+ (default, Sep  1 2016, 20:27:38)
> [GCC 6.2.0 20160822]
>
> mode: unicode
> pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
> data: first string (unicode)
> groupdict: {u'ordinal': u'first', u'type': u'unicode'}
>
> mode: unicode
> pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
> data: second string (bytes)
> groupdict: {u'ordinal': 'second', u'type': 'bytes'}
>
> mode: str
> pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
> data: first string (unicode)
> groupdict: {'ordinal': u'first', 'type': u'unicode'}
>
> mode: str
> pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
> data: second string (bytes)
> groupdict: {'ordinal': 'second', 'type': 'bytes'}
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread Guido van Rossum
Be careful though, comparing these to plain integers should probably
be allowed, and we also should make sure that things like
serialization via JSON or storing in an SQL database don't break. I
personally think it's one of those "learn not to touch the stove"
cases and there's limited value in making this API idiot proof.

On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan wrote:
> On 26 September 2016 at 10:21, MRAB wrote:
>> On 2016-09-26 00:21, Ben Leslie wrote:
>>> Are there any downsides to this? I've made some progress developing a
>>> patch to change this functionality. Is it worth polishing and
>>> submitting?
>>>
>> An alternative might be a subclass of int.
>
> It could make sense to use a subclass of int that emitted deprecation
> warnings for integer arithmetic, and then eventually disallowed it
> entirely.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread Nick Coghlan
On 26 September 2016 at 10:21, MRAB wrote:
> On 2016-09-26 00:21, Ben Leslie wrote:
>> Are there any downsides to this? I've made some progress developing a
>> patch to change this functionality. Is it worth polishing and
>> submitting?
>>
> An alternative might be a subclass of int.

It could make sense to use a subclass of int that emitted deprecation
warnings for integer arithmetic, and then eventually disallowed it
entirely.
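A rough sketch of what such a subclass might look like. The class name `TellCookie` and the warning text are hypothetical, and nothing here reflects an actual patch; it only illustrates that comparison with plain integers and JSON serialization keep working while arithmetic warns.

```python
import json
import warnings


class TellCookie(int):
    """Hypothetical int subclass for tell() cookies (illustration only)."""

    def _warn_arithmetic(self):
        warnings.warn(
            "arithmetic on a tell() cookie is deprecated",
            DeprecationWarning,
            stacklevel=3,
        )

    def __add__(self, other):
        self._warn_arithmetic()
        # Delegate to int so the result is a plain int, not a cookie.
        return int.__add__(self, other)

    __radd__ = __add__

    def __sub__(self, other):
        self._warn_arithmetic()
        return int.__sub__(self, other)


cookie = TellCookie(42)
same_as_int = cookie == 42                # comparison is untouched
as_json = json.dumps({"pos": cookie})     # serializes like an ordinal int
```

Only `+` and `-` are intercepted in this sketch; a fuller version would need to cover the remaining arithmetic operators as well.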

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] Possibly inconsistent behavior in re groupdict

2016-09-25 Thread Gordon R. Burgess
I've been lurking for a couple of months, working up the confidence to
ask the list about this behavior - I've searched through the PEPs but
couldn't find any specific reference to it.

In a nutshell, in the Python 3.5 re library, patterns and search buffers
must both be either unicode or byte strings - but the keys in the
groupdict are always returned as str in either case.

I don't know whether or not this is by design, but it would make more
sense to me if when searching a bytes object with a bytes pattern the
keys returned in the groupdict were bytes as well.
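For code that wants the behavior described above today, a small wrapper can be sketched, assuming Python 3 and ASCII group names. The helper name `groupdict_like_pattern` is hypothetical and is not part of the re module.

```python
import re


def groupdict_like_pattern(match):
    """Return match.groupdict() with keys typed like the pattern.

    Hypothetical helper: for a bytes pattern the str keys are encoded to
    bytes (ASCII group names assumed); for a str pattern the dict is
    returned unchanged.
    """
    groups = match.groupdict()
    if isinstance(match.re.pattern, bytes):
        return {name.encode("ascii"): value for name, value in groups.items()}
    return groups


bytes_match = re.match(rb"(?P<ordinal>\w+) .*\((?P<type>\w+)\)",
                       b"second string (bytes)")
bytes_keys = groupdict_like_pattern(bytes_match)
```

With the bytes pattern above, `bytes_keys` has bytes keys and bytes values, while a str pattern passes through untouched.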

I reworked the example a little just now so it would run on 2.7 as
well; on 2.7 the keys in the dictionary correspond to the mode of the
pattern as expected (and bytes and unicode are interconverted silently)
- code and output are inline below.

Thanks for your time,

Gordon

[Code]

import sys
import re
from datetime import datetime

data = (u"first string (unicode)",
        b"second string (bytes)")

pattern = [re.compile(u"(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)"),
           re.compile(b"(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)")]

print("*** re consistency check ***\nRun: %s\nVersion: Python %s\n" %
      (datetime.now(), sys.version))
for p in pattern:
    for d in data:
        try:
            result = "groupdict: %s" % (p.match(d) and
                                        p.match(d).groupdict())
        except Exception as e:
            result = "error: %s" % e.args[0]
        print("mode: %s\npattern: %s\ndata: %s\n%s\n" %
              (type(p.pattern).__name__, p.pattern, d, result))

[Output]

gordon@w540:~/workspace/regex_demo$ python3 regex_demo.py 
*** re consistency check ***
Run: 2016-09-25 20:06:29.472332
Version: Python 3.5.2+ (default, Sep 10 2016, 10:24:58) 
[GCC 6.2.0 20160901]

mode: str
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: first string (unicode)
groupdict: {'ordinal': 'first', 'type': 'unicode'}

mode: str
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: b'second string (bytes)'
error: cannot use a string pattern on a bytes-like object

mode: bytes
pattern: b'(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)'
data: first string (unicode)
error: cannot use a bytes pattern on a string-like object

mode: bytes
pattern: b'(?P<ordinal>\\w+) .*\\((?P<type>\\w+)\\)'
data: b'second string (bytes)'
groupdict: {'ordinal': b'second', 'type': b'bytes'}

gordon@w540:~/workspace/regex_demo$ python regex_demo.py 
*** re consistency check ***
Run: 2016-09-25 20:06:23.375322
Version: Python 2.7.12+ (default, Sep  1 2016, 20:27:38)
[GCC 6.2.0 20160822]

mode: unicode
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: first string (unicode)
groupdict: {u'ordinal': u'first', u'type': u'unicode'}

mode: unicode
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: second string (bytes)
groupdict: {u'ordinal': 'second', u'type': 'bytes'}

mode: str
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: first string (unicode)
groupdict: {'ordinal': u'first', 'type': u'unicode'}

mode: str
pattern: (?P<ordinal>\w+) .*\((?P<type>\w+)\)
data: second string (bytes)
groupdict: {'ordinal': 'second', 'type': 'bytes'}



Re: [Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread MRAB

On 2016-09-26 00:21, Ben Leslie wrote:
> Hi all,
>
> I recently shot myself in the foot by assuming that TextIO.tell
> returned integers rather than opaque cookies. Specifically I was
> adding an offset to the value returned by TextIO.tell. In retrospect
> this doesn't make sense.
>
> Now, I don't want to drive change simply because I failed to read the
> documentation carefully, but I think the current API is very easy to
> misuse. Most of the time TextIO.tell returns a cookie that is actually
> an integer and adding an offset to it and seek-ing works fine.
>
> The only indication you get that you are mis-using the API is that
> sometimes tell returns a cookie that when you add an integer offset to
> it will cause seek() to fail with an OverflowError.
>
> Would it be possible to change the API to return something more
> opaque? E.g.: rather than converting the C cookie structure to a long,
> could it instead be converted to a bytes() object?
>
> (I.e.: Change textiowrapper_build_cookie to use
> PyBytes_FromStringAndSize rather than _PyLong_FromByteArray and
> equivalent for textiowrapper_parse_cookie).
>
> This would ensure the return value is never mis-used and is probably
> also faster using bytes objects than converting to/from an integer.
>
Why would it be faster? It's an integer internally.

> Are there any downsides to this? I've made some progress developing a
> patch to change this functionality. Is it worth polishing and
> submitting?
>
An alternative might be a subclass of int.


[Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread Ben Leslie
Hi all,

I recently shot myself in the foot by assuming that TextIO.tell
returned integers rather than opaque cookies. Specifically I was
adding an offset to the value returned by TextIO.tell. In retrospect
this doesn't make sense.

Now, I don't want to drive change simply because I failed to read the
documentation carefully, but I think the current API is very easy to
misuse. Most of the time TextIO.tell returns a cookie that is actually
an integer and adding an offset to it and seek-ing works fine.

The only indication you get that you are mis-using the API is that
sometimes tell returns a cookie that when you add an integer offset to
it will cause seek() to fail with an OverflowError.
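For contrast, the supported usage can be sketched as follows: treat the value from tell() as a token and hand it back to seek() unchanged. The helper name `reread_from_cookie` is hypothetical, and an in-memory `io.BytesIO` stands in for a real file.

```python
import io


def reread_from_cookie(text, chars_before, encoding="utf-8"):
    """Read some characters, remember the position, then return to it.

    Returns (cookie, first_read, second_read); the two reads should match
    because the cookie is passed back to seek() without modification.
    """
    raw = io.BytesIO(text.encode(encoding))
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
    stream.read(chars_before)
    cookie = stream.tell()      # opaque: do not add offsets to this
    first_read = stream.read()
    stream.seek(cookie)         # the only safe use of the cookie
    second_read = stream.read()
    return cookie, first_read, second_read


cookie, first_read, second_read = reread_from_cookie("héllo wörld", 3)
```

The cookie happens to be an int here, but its value is not a character count, so arithmetic on it has no defined meaning in terms of characters.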

Would it be possible to change the API to return something more
opaque? E.g.: rather than converting the C cookie structure to a long,
could it instead be converted to a bytes() object?

(I.e.: Change textiowrapper_build_cookie to use
PyBytes_FromStringAndSize rather than _PyLong_FromByteArray and
equivalent for textiowrapper_parse_cookie).
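To make the idea concrete, here is a loose Python model of a multi-field cookie packed into one integer versus a bytes object. The 64-bit slot layout is an assumption for illustration (it follows my understanding of the pure-Python `_pyio` module, not the C code named above), and all function names are hypothetical.

```python
FIELD_BITS = 64  # assumed width of each packed field


def pack_cookie_int(position, dec_flags=0, bytes_to_feed=0,
                    chars_to_skip=0, need_eof=False):
    """Pack the cookie fields into a single (possibly very large) int."""
    return (position
            | (dec_flags << FIELD_BITS)
            | (bytes_to_feed << 2 * FIELD_BITS)
            | (chars_to_skip << 3 * FIELD_BITS)
            | (int(need_eof) << 4 * FIELD_BITS))


def unpack_cookie_int(cookie):
    """Inverse of pack_cookie_int; returns a 5-tuple of fields."""
    mask = (1 << FIELD_BITS) - 1
    fields = []
    for _ in range(4):
        fields.append(cookie & mask)
        cookie >>= FIELD_BITS
    return (*fields, bool(cookie))


def cookie_to_bytes(cookie):
    """Opaque alternative: the same bits as a fixed-size bytes object."""
    return cookie.to_bytes(4 * FIELD_BITS // 8 + 1, "little")


def cookie_from_bytes(data):
    return int.from_bytes(data, "little")


packed = pack_cookie_int(10, dec_flags=1)
opaque = cookie_to_bytes(packed)
```

With the int form, `packed + 5` silently produces another int; with the bytes form, `opaque + 5` raises TypeError, which is the kind of early failure the proposal is after.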

This would ensure the return value is never mis-used and is probably
also faster using bytes objects than converting to/from an integer.

Are there any downsides to this? I've made some progress developing a
patch to change this functionality. Is it worth polishing and
submitting?

Cheers,

Ben


Re: [Python-Dev] cpython (3.6): replace usage of Py_VA_COPY with the (C99) standard va_copy

2016-09-25 Thread Nick Coghlan
On 24 September 2016 at 18:07, Benjamin Peterson wrote:
>
>
> On Fri, Sep 23, 2016, at 09:32, Steven D'Aprano wrote:
>> On Thu, Sep 22, 2016 at 11:47:20PM -0700, Benjamin Peterson wrote:
>> >
>> > On Thu, Sep 22, 2016, at 04:44, Victor Stinner wrote:
>> > > 2016-09-22 8:02 GMT+02:00 Benjamin Peterson:
>> > > > Just dump the compat macros in Python 4.0 I think.
>> > >
>> > > Please don't. Python 3 was so painful because we decided to make
>> > > millions of tiny backward incompatible changes. To have a smooth
>> > > Python 4.0 release, we should only remove things which were already
>> > > deprecated since at least 2 cycles, and well documented as deprecated.
>> >
>> > I'm being flippant here because of the triviality of the change. Anyone
>> > using Py_VA_COPY or Py_MEMCPY can fix their code in a backwards and
>> > forwards compatible manner in 7 seconds with a sed command.
>>
>> Sorry, I haven't been following this thread in detail, so perhaps I've
>> misunderstood. Are you assuming that anyone who is building Python from
>> source is automatically able to diagnose C level build failures and
>> know how to fix them using sed?
>
> I am assuming authors of CPython extensions possess those skills.

Not all projects on PyPI have active maintainers though, and on the
project user end, there's a significant difference between "Can set up
a C build environment well enough to let distutils build simple C
extensions for a new Python release" and "Is the maintainer of the C
extension".

It's often useful to think of *any* backwards incompatible change we
make as a pruning filter on PyPI: projects that don't have active
maintainers that are affected by the change won't be updated as a
matter of course. The end result is then usually going to be one of:

- the original author returns to active maintenance for long enough to
release an update
- an interested user contacts the original author and takes over maintenance
- affected users migrate away to a new actively maintained fork of the project
- affected users migrate away to another existing project addressing
the same need

There are some cases where a lack of active maintenance is inherently
a problem (e.g. network security), so we're happy to trigger those
ripple effects. In other cases, the pay-off might be in ease of
maintenance for the core development team, or in ease of future
learning for new Python developers.

But it doesn't matter how trivial the specific change needed is if
getting it resolved and a new version published turns out to require a
transfer of project ownership - the cost is in the ownership change
rather than the software change itself.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia