Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Greg Ewing

Nick Coghlan wrote:
having checks in both the producer & the consumer merely means that 
you'll be checking for signals twice every 65k iterations, rather than once.


Here's a possible reason for wanting checks in the producers:
If your producer happens to take a long time per iteration,
and the consumer only checks every 65k iterations, it might be
a while before a Ctrl-C takes effect.

If the producer is checking, it is likely to have a better
idea of what an appropriate checking interval might be.
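
To make that concrete, here is a hypothetical pure-Python producer (the producers under discussion are C iterators, where the check would be a PyErr_CheckSignals() call rather than relying on the eval loop):

```python
import itertools
import time

def slow_producer(delay):
    """Hypothetical producer whose items are expensive to compute.

    The producer knows each item takes ~delay seconds, so a per-item
    check (in C: PyErr_CheckSignals()) keeps Ctrl-C latency near one
    iteration, whereas a consumer that only checks every 65k items
    could lag for a very long time.
    """
    for i in itertools.count():
        time.sleep(delay)  # stand-in for expensive per-item work
        yield i

gen = slow_producer(0.001)
print([next(gen) for _ in range(3)])  # -> [0, 1, 2]
```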

--
Greg
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 19 October 2017 at 08:34, Greg Ewing  wrote:

> Nick Coghlan wrote:
>
>> since breaking up the current single level loops as nested loops would be
>> a pre-requisite for allowing these APIs to check for signals while they're
>> running while keeping the per-iteration overhead low
>>
>
> Is there really much overhead? Isn't it just checking a flag?
>

It's checking an atomically updated flag, so it forces CPU cache
synchronisation, which means you don't want to be doing it on every
iteration of a low level loop.

However, reviewing Serhiy's PR reminded me that PyErr_CheckSignals()
already encapsulates the "Should this thread even be checking for signals
in the first place?" logic, which means the code change to make the
itertools iterators inherently interruptible with Ctrl-C is much smaller
than I thought it would be. That approach is also clearly safe from an
exception handling point of view, since all consumer loops already need to
cope with the fact that itr.__next__() may raise arbitrary exceptions
(including KeyboardInterrupt).

So that change alone already offers a notable improvement, and combining it
with a __length_hint__() implementation that keeps container constructors
from even starting to iterate would go even further towards making the
infinite iterators more user friendly.
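
A minimal sketch of the __length_hint__ idea, using a hypothetical class (itertools.count() has no such hint). CPython's PyObject_LengthHint() only swallows TypeError, so any other exception raised by the hint reaches the container constructor before a single item is consumed:

```python
import operator

class InfiniteIter:
    """Hypothetical infinite iterator that advertises its infiniteness."""
    def __iter__(self):
        return self
    def __next__(self):
        return 0
    def __length_hint__(self):
        # list() presizes via operator.length_hint(); raising here stops
        # the constructor before it starts iterating.
        raise OverflowError("cannot build a container from an infinite iterator")

try:
    list(InfiniteIter())
except OverflowError as exc:
    print("refused:", exc)
```

Iterating explicitly (next(), islice(), a for loop) still works as usual; only callers that consult the length hint are affected.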

Similar signal checking changes to the consumer loops would also be
possible, but I don't think that's an either/or decision: changing the
iterators means they'll be interruptible for any consumer, while changing
the consumers would make them interruptible for any iterator, and having
checks in both the producer & the consumer merely means that you'll be
checking for signals twice every 65k iterations, rather than once.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Memory limits [was Re: Membership of infinite iterators]

2017-10-18 Thread Steven D'Aprano
On Wed, Oct 18, 2017 at 02:51:37PM +0200, Stefan Krah wrote:

> $ softlimit -m 10 python3
[...]
> MemoryError
> 
> 
> People who are worried could make a python3 alias or use Ctrl-\.

I just tried that on two different Linux computers I have, and neither 
have softlimit. 

Nor (presumably) would this help Windows users.


-- 
Steve


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Greg Ewing

Nick Coghlan wrote:
since breaking up the current single level loops as nested loops 
would be a pre-requisite for allowing these APIs to check for signals 
while they're running while keeping the per-iteration overhead low


Is there really much overhead? Isn't it just checking a flag?

--
Greg


Re: [Python-ideas] PEP draft: context variables

2017-10-18 Thread Yury Selivanov
On Sun, Oct 15, 2017 at 8:15 AM, Paul Moore  wrote:
> On 13 October 2017 at 23:30, Yury Selivanov  wrote:
>> At this point of time, there's just one place which describes one well
>> defined semantics: PEP 550 latest version.  Paul, if you have
>> time/interest, please take a look at it, and say what's confusing
>> there.
>
> Hi Yury,
> The following are my impressions from a read-through of the initial
> part of the PEP. tl; dr - you say "concurrent" too much and it makes
> my head hurt :-)
[..]
> I hope this is of some use. I appreciate I'm talking about a pretty
> wholesale rewrite, and it's quite far down the line to be suggesting
> such a thing. I'll understand if you don't feel it's worthwhile to
> take that route.


Hi Paul,

Thanks *a lot* for this detailed analysis.  Even though PEP 550 isn't
going to make it to 3.7 and I'm not going to edit/rewrite it anymore,
I'll try to incorporate some of your feedback into the new PEP.

Thanks,
Yury


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 10:30 PM, Serhiy Storchaka 
wrote:

> 18.10.17 22:21, Koos Zevenhoven wrote:
>
>> Nice! Though I'd really like a general solution that other code can
>> easily adopt, even third-party extension libraries.
>>
>
> What is the more general solution? For interrupting C code you need to
> check signals manually, either in every loop, or in every iterator. It
> seems to me that the number of loops is larger than the number of iterators.
>
>
Sorry, I missed this email earlier.

Maybe a macro like Py_MAKE_THIS_LOOP_BREAKABLE_FOR_ME_PLEASE that you could
insert wherever you think the code might be spending some time without
calling any Python code. One could use it rather carelessly, at least more
carelessly than with refcounts. Something like the macro you wrote, except that it would
take care of the whole thing and not just the counting.

-- Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 10:21 PM, Koos Zevenhoven  wrote:

> On Wed, Oct 18, 2017 at 10:13 PM, Serhiy Storchaka 
> wrote:
>
>> 18.10.17 17:48, Nick Coghlan wrote:
>>
>>> 1. It will make those loops slower, due to the extra overhead of
>>> checking for signals (even the opcode eval loop includes all sorts of
>>> tricks to avoid actually checking for new signals, since doing so is
>>> relatively slow)
>>> 2. It will make those loops harder to maintain, since the high cost of
>>> checking for signals means the existing flat loops will need to be replaced
>>> with nested ones to reduce the per-iteration cost of the more expensive
>>> checks
>>> 3. It means making the signal checking even harder to reason about than
>>> it already is, since even C implemented methods that avoid invoking
>>> arbitrary Python code could now still end up checking for signals
>>>
>>
>> I have implemented signals checking for itertools iterators. [1] The
>> overhead is insignificant because signals are checked only for every
>> 0x10000-th item (100-4000 times/sec). The consuming loops are not changed
>> because signals are checked on the producer's side.
>>
>> [1] https://bugs.python.org/issue31815
>>
>>
> Nice! Though I'd really like a general solution that other code can
> easily adopt, even third-party extension libraries.
>
>
By the way, now that I actually read the BPO issue, it looks like the
benchmarks were for 0x1000 (15 bits)? And why is everyone doing powers of
two anyway?

Anyway, I still don't think infinite iterables are the most common case
where this problem occurs. Solving this in the most common consuming loops
would allow breaking out of a lot of long loops regardless of which
iterable type (if any) is being used. So I'm still asking which side,
producer or consumer, should solve the problem.

-- Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Serhiy Storchaka

18.10.17 22:21, Koos Zevenhoven wrote:
Nice! Though I'd really like a general solution that other code can
easily adopt, even third-party extension libraries.


What is the more general solution? For interrupting C code you need to 
check signals manually, either in every loop, or in every iterator. It 
seems to me that the number of loops is larger than the number of iterators.




Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 10:13 PM, Serhiy Storchaka 
wrote:

> 18.10.17 17:48, Nick Coghlan wrote:
>
>> 1. It will make those loops slower, due to the extra overhead of checking
>> for signals (even the opcode eval loop includes all sorts of tricks to
>> avoid actually checking for new signals, since doing so is relatively slow)
>> 2. It will make those loops harder to maintain, since the high cost of
>> checking for signals means the existing flat loops will need to be replaced
>> with nested ones to reduce the per-iteration cost of the more expensive
>> checks
>> 3. It means making the signal checking even harder to reason about than
>> it already is, since even C implemented methods that avoid invoking
>> arbitrary Python code could now still end up checking for signals
>>
>
> I have implemented signals checking for itertools iterators. [1] The
> overhead is insignificant because signals are checked only for every
> 0x10000-th item (100-4000 times/sec). The consuming loops are not changed
> because signals are checked on the producer's side.
>
> [1] https://bugs.python.org/issue31815
>
>
Nice! Though I'd really like a general solution that other code can
easily adopt, even third-party extension libraries.

-- Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Serhiy Storchaka

18.10.17 17:48, Nick Coghlan wrote:
1. It will make those loops slower, due to the extra overhead of 
checking for signals (even the opcode eval loop includes all sorts of 
tricks to avoid actually checking for new signals, since doing so is 
relatively slow)
2. It will make those loops harder to maintain, since the high cost of 
checking for signals means the existing flat loops will need to be 
replaced with nested ones to reduce the per-iteration cost of the more 
expensive checks
3. It means making the signal checking even harder to reason about than 
it already is, since even C implemented methods that avoid invoking 
arbitrary Python code could now still end up checking for signals


I have implemented signals checking for itertools iterators. [1] The 
overhead is insignificant because signals are checked only for every 
0x10000-th item (100-4000 times/sec). The consuming loops are not 
changed because signals are checked on the producer's side.


[1] https://bugs.python.org/issue31815



Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 9:24 PM, MRAB  wrote:

>
> The re module increments a counter on each iteration and checks for
> signals when the bottom 12 bits are 0.
>
> The regex module increments a 16-bit counter on each iteration and checks
> for signals when it wraps around to 0.
>

Then I'd say that's a great solution, except that `regex` probably
overestimates the overhead of checking for signals, and that the `re` module
for some strange reason wants to do an additional bitwise AND.

-- Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread MRAB

On 2017-10-18 15:48, Nick Coghlan wrote:
On 18 October 2017 at 22:36, Koos Zevenhoven  wrote:


On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan  wrote:

That one can only be fixed in count() - list already checks
operator.length_hint(), so implementing
itertools.count.__length_hint__() to always raise an exception
would be enough to handle the container constructor case.


While that may be a convenient hack to solve some of the cases,
maybe it's possible for list(..) etc. to give Ctrl-C a chance every
now and then? (Without a noticeable performance penalty, that is.)
That would also help with *finite* C-implemented iterables that are
just slow to turn into a list.

If I'm not mistaken, we're talking about C-implemented functions
that iterate over C-implemented iterators. It's not at all obvious
to me that it's the iterator that should handle Ctrl-C.


It isn't, it's the loop's responsibility. The problem is that one of the 
core design assumptions in the CPython interpreter implementation is 
that signals from the operating system get handled by the opcode eval 
loop in the main thread, and Ctrl-C is one of those signals.


This is why "for x in itertools.count(): pass" can be interrupted, while 
"sum(itertools.count())" can't: in the latter case, the opcode eval loop 
isn't running, as we're inside a tight loop inside the sum() implementation.


It's easy to say "Well those loops should all be checking for signals 
then", but I expect folks wouldn't actually like the consequences of 
doing something about it, as:


1. It will make those loops slower, due to the extra overhead of 
checking for signals (even the opcode eval loop includes all sorts of 
tricks to avoid actually checking for new signals, since doing so is 
relatively slow)
2. It will make those loops harder to maintain, since the high cost of 
checking for signals means the existing flat loops will need to be 
replaced with nested ones to reduce the per-iteration cost of the more 
expensive checks


The re module increments a counter on each iteration and checks for 
signals when the bottom 12 bits are 0.


The regex module increments a 16-bit counter on each iteration and 
checks for signals when it wraps around to 0.


3. It means making the signal checking even harder to reason about than 
it already is, since even C implemented methods that avoid invoking 
arbitrary Python code could now still end up checking for signals


It's far from being clear to me that making such a change would actually 
be a net improvement, especially when there's an opportunity to mitigate 
the problem by having known-infinite iterators report themselves as such.
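
The counter-and-mask pattern described above can be sketched in Python. This is purely illustrative: pure-Python loops are already interruptible, so the "check" below merely marks where a C loop would call PyErr_CheckSignals():

```python
def consume(iterable, check_mask=0xFFF):
    """Sum an iterable, doing the expensive check only when the low
    12 bits of the iteration counter are all zero (the re-module style)."""
    total = 0
    count = 0
    checks = 0
    for item in iterable:
        total += item
        count += 1
        if count & check_mask == 0:
            checks += 1  # in C: if (PyErr_CheckSignals() < 0) return NULL;
    return total, checks

print(consume(range(10_000)))  # -> (49995000, 2): checks fire at 4096 and 8192
```

The regex variant instead lets a 16-bit counter simply wrap around to zero, trading the explicit AND for a fixed 65536-iteration interval.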





Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 6:56 PM, Paul Moore  wrote:

> OK, looks like I've lost track of what this thread is about then.
> Sorry for the noise.
> Paul
>
>
No worries. I'm not sure I can tell what this thread is about either.
Different people seem to have different ideas about that.

My most recent point was that __contains__ already has to allow Python code
to run on each iteration, so it is not the kind of code that Nick was
referring to; indeed, I'm not convinced such code even exists.

-- Koos



> On 18 October 2017 at 16:40, Koos Zevenhoven  wrote:
> > On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore  wrote:
> >>
> >> On 18 October 2017 at 16:27, Koos Zevenhoven  wrote:
> >> > So you're talking about code that would make a C-implemented Python
> >> > iterable
> >> > of strictly C-implemented Python objects and then pass this to
> something
> >> > C-implemented like list(..) or sum(..), while expecting no Python code
> >> > to be
> >> > run or signals to be checked anywhere while doing it. I'm not really
> >> > convinced that such code exists. But if such code does exist, it
> sounds
> >> > like
> >> > the code is heavily dependent on implementation details.
> >>
> >> Well, the OP specifically noted that he had recently encountered
> >> precisely that situation:
> >>
> >> """
> >> I recently came across a bug where checking negative membership
> >> (__contains__ returns False) of an infinite iterator will freeze the
> >> program.
> >> """
> >>
> >
> > No, __contains__ does not expect no python code to be run, because Python
> > code *can* run, as Serhiy in fact already demonstrated for another
> purpose:
> >
> > On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka 
> > wrote:
> >>
> >> 18.10.17 13:22, Nick Coghlan пише:
> >>>
> >>> 2.. These particular cases can be addressed locally using existing
> >>> protocols, so the chances of negative side effects are low
> >>
> >>
> >> Only the particular case `count() in count()` can be addressed without
> >> breaking the following examples:
> >>
> >> >>> class C:
> >> ... def __init__(self, count):
> >> ... self.count = count
> >> ... def __eq__(self, other):
> >> ... print(self.count, other)
> >> ... if not self.count:
> >> ... return True
> >> ... self.count -= 1
> >> ... return False
> >> ...
> >> >>> import itertools
> >> >>> C(5) in itertools.count()
> >> 5 0
> >> 4 1
> >> 3 2
> >> 2 3
> >> 1 4
> >> 0 5
> >> True
> >
> >
> >
> > Clearly, Python code *does* run from within
> itertools.count.__contains__(..)
> >
> >
> > ––Koos
> >
> >
> > --
> > + Koos Zevenhoven + http://twitter.com/k7hoven +
>



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Paul Moore
OK, looks like I've lost track of what this thread is about then.
Sorry for the noise.
Paul

On 18 October 2017 at 16:40, Koos Zevenhoven  wrote:
> On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore  wrote:
>>
>> On 18 October 2017 at 16:27, Koos Zevenhoven  wrote:
>> > So you're talking about code that would make a C-implemented Python
>> > iterable
>> > of strictly C-implemented Python objects and then pass this to something
>> > C-implemented like list(..) or sum(..), while expecting no Python code
>> > to be
>> > run or signals to be checked anywhere while doing it. I'm not really
>> > convinced that such code exists. But if such code does exist, it sounds
>> > like
>> > the code is heavily dependent on implementation details.
>>
>> Well, the OP specifically noted that he had recently encountered
>> precisely that situation:
>>
>> """
>> I recently came across a bug where checking negative membership
>> (__contains__ returns False) of an infinite iterator will freeze the
>> program.
>> """
>>
>
> No, __contains__ does not expect no python code to be run, because Python
> code *can* run, as Serhiy in fact already demonstrated for another purpose:
>
> On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka 
> wrote:
>>
>> 18.10.17 13:22, Nick Coghlan пише:
>>>
>>> 2.. These particular cases can be addressed locally using existing
>>> protocols, so the chances of negative side effects are low
>>
>>
>> Only the particular case `count() in count()` can be addressed without
>> breaking the following examples:
>>
>> >>> class C:
>> ... def __init__(self, count):
>> ... self.count = count
>> ... def __eq__(self, other):
>> ... print(self.count, other)
>> ... if not self.count:
>> ... return True
>> ... self.count -= 1
>> ... return False
>> ...
>> >>> import itertools
>> >>> C(5) in itertools.count()
>> 5 0
>> 4 1
>> 3 2
>> 2 3
>> 1 4
>> 0 5
>> True
>
>
>
> Clearly, Python code *does* run from within itertools.count.__contains__(..)
>
>
> ––Koos
>
>
> --
> + Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore  wrote:

> On 18 October 2017 at 16:27, Koos Zevenhoven  wrote:
> > So you're talking about code that would make a C-implemented Python
> iterable
> > of strictly C-implemented Python objects and then pass this to something
> > C-implemented like list(..) or sum(..), while expecting no Python code
> to be
> > run or signals to be checked anywhere while doing it. I'm not really
> > convinced that such code exists. But if such code does exist, it sounds
> like
> > the code is heavily dependent on implementation details.
>
> Well, the OP specifically noted that he had recently encountered
> precisely that situation:
>
> """
> I recently came across a bug where checking negative membership
> (__contains__ returns False) of an infinite iterator will freeze the
> program.
> """
>
>
No, __contains__ does not require that no Python code runs: Python
code *can* run, as Serhiy in fact already demonstrated for another purpose:

On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka 
 wrote:

> 18.10.17 13:22, Nick Coghlan wrote:
>
>> 2. These particular cases can be addressed locally using existing
>> protocols, so the chances of negative side effects are low
>>
>
> Only the particular case `count() in count()` can be addressed without
> breaking the following examples:
>
> >>> class C:
> ... def __init__(self, count):
> ... self.count = count
> ... def __eq__(self, other):
> ... print(self.count, other)
> ... if not self.count:
> ... return True
> ... self.count -= 1
> ... return False
> ...
> >>> import itertools
> >>> C(5) in itertools.count()
> 5 0
> 4 1
> 3 2
> 2 3
> 1 4
> 0 5
> True



Clearly, Python code *does* run from within itertools.count.__contains__(..)


-- Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Paul Moore
On 18 October 2017 at 16:27, Koos Zevenhoven  wrote:
> So you're talking about code that would make a C-implemented Python iterable
> of strictly C-implemented Python objects and then pass this to something
> C-implemented like list(..) or sum(..), while expecting no Python code to be
> run or signals to be checked anywhere while doing it. I'm not really
> convinced that such code exists. But if such code does exist, it sounds like
> the code is heavily dependent on implementation details.

Well, the OP specifically noted that he had recently encountered
precisely that situation:

"""
I recently came across a bug where checking negative membership
(__contains__ returns False) of an infinite iterator will freeze the
program.
"""

Paul


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 5:48 PM, Nick Coghlan  wrote:

> On 18 October 2017 at 22:36, Koos Zevenhoven  wrote:
>
>> On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan  wrote:
>>
>>> That one can only be fixed in count() - list already checks
>>> operator.length_hint(), so implementing itertools.count.__length_hint__()
>>> to always raise an exception would be enough to handle the container
>>> constructor case.
>>>
>>
>> While that may be a convenient hack to solve some of the cases, maybe
>> it's possible for list(..) etc. to give Ctrl-C a chance every now and then?
>> (Without a noticeable performance penalty, that is.) That would also help
>> with *finite* C-implemented iterables that are just slow to turn into a
>> list.
>>
>> If I'm not mistaken, we're talking about C-implemented functions that
>> iterate over C-implemented iterators. It's not at all obvious to me that
>> it's the iterator that should handle Ctrl-C.
>>
>
> It isn't, it's the loop's responsibility. The problem is that one of the
> core design assumptions in the CPython interpreter implementation is that
> signals from the operating system get handled by the opcode eval loop in
> the main thread, and Ctrl-C is one of those signals.
>
> This is why "for x in itertools.count(): pass" can be interrupted, while
> "sum(itertools.count())" can't: in the latter case, the opcode eval loop
> isn't running, as we're inside a tight loop inside the sum() implementation.
>
> It's easy to say "Well those loops should all be checking for signals
> then", but I expect folks wouldn't actually like the consequences of doing
> something about it, as:
>
> 1. It will make those loops slower, due to the extra overhead of checking
> for signals (even the opcode eval loop includes all sorts of tricks to
> avoid actually checking for new signals, since doing so is relatively slow)
> 2. It will make those loops harder to maintain, since the high cost of
> checking for signals means the existing flat loops will need to be replaced
> with nested ones to reduce the per-iteration cost of the more expensive
> checks
>

Combining points 1 and 2, I don't believe nesting loops is strictly a
requirement.



> 3. It means making the signal checking even harder to reason about than it
> already is, since even C implemented methods that avoid invoking arbitrary
> Python code could now still end up checking for signals
>

So you're talking about code that would make a C-implemented Python
iterable of strictly C-implemented Python objects and then pass this to
something C-implemented like list(..) or sum(..), while expecting no Python
code to be run or signals to be checked anywhere while doing it. I'm not
really convinced that such code exists. But if such code does exist, it
sounds like the code is heavily dependent on implementation details.

>
> It's far from being clear to me that making such a change would actually
> be a net improvement, especially when there's an opportunity to mitigate
> the problem by having known-infinite iterators report themselves as such.
>
>

I'm not against that, per se. I just don't think that solves the quite
typical case of *finite* but very tedious or memory-consuming loops that
one might want to break out of. And raising an exception from
.__length_hint__() might also break some obscure, but completely valid,
operations on *infinite* iterables.
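
That worry can be made concrete: operator.length_hint() treats only TypeError from __length_hint__ as "no hint available", so any other exception would escape to callers that use the hint as a harmless optimization. A sketch with hypothetical classes:

```python
import operator

class NoHint:
    """Hypothetical iterator whose hint follows today's protocol."""
    def __iter__(self):
        return self
    def __next__(self):
        return 1
    def __length_hint__(self):
        raise TypeError  # the protocol's "no hint": the default is used

class LoudHint(NoHint):
    """Hypothetical iterator whose hint raises, as proposed for count()."""
    def __length_hint__(self):
        raise OverflowError("infinite iterator")

print(operator.length_hint(NoHint(), 7))  # -> 7 (TypeError is swallowed)
try:
    operator.length_hint(LoudHint(), 7)
except OverflowError:
    print("callers that expected a default now need a try/except")
```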


-- Koos



> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 22:36, Koos Zevenhoven  wrote:

> On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan  wrote:
>
>> That one can only be fixed in count() - list already checks
>> operator.length_hint(), so implementing itertools.count.__length_hint__()
>> to always raise an exception would be enough to handle the container
>> constructor case.
>>
>
> While that may be a convenient hack to solve some of the cases, maybe it's
> possible for list(..) etc. to give Ctrl-C a chance every now and then?
> (Without a noticeable performance penalty, that is.) That would also help
> with *finite* C-implemented iterables that are just slow to turn into a
> list.
>
> If I'm not mistaken, we're talking about C-implemented functions that
> iterate over C-implemented iterators. It's not at all obvious to me that
> it's the iterator that should handle Ctrl-C.
>

It isn't, it's the loop's responsibility. The problem is that one of the
core design assumptions in the CPython interpreter implementation is that
signals from the operating system get handled by the opcode eval loop in
the main thread, and Ctrl-C is one of those signals.

This is why "for x in itertools.count(): pass" can be interrupted, while
"sum(itertools.count())" can't: in the latter case, the opcode eval loop
isn't running, as we're inside a tight loop inside the sum() implementation.

It's easy to say "Well those loops should all be checking for signals
then", but I expect folks wouldn't actually like the consequences of doing
something about it, as:

1. It will make those loops slower, due to the extra overhead of checking
for signals (even the opcode eval loop includes all sorts of tricks to
avoid actually checking for new signals, since doing so is relatively slow)
2. It will make those loops harder to maintain, since the high cost of
checking for signals means the existing flat loops will need to be replaced
with nested ones to reduce the per-iteration cost of the more expensive
checks
3. It means making the signal checking even harder to reason about than it
already is, since even C implemented methods that avoid invoking arbitrary
Python code could now still end up checking for signals

It's far from being clear to me that making such a change would actually be
a net improvement, especially when there's an opportunity to mitigate the
problem by having known-infinite iterators report themselves as such.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Stephan Houben
Hi all,

FWIW, I just tried the list(count()) experiment on my phone (Termux Python
interpreter under Android).

Python 3.6.2 (default, Sep 16 2017, 23:55:07)
[GCC 4.2.1 Compatible Android Clang 5.0.300080 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import itertools
>>> list(itertools.count())
Killed

Interestingly even the Termux app stays alive and otherwise the phone
remains responsive and doesn't get hot. I am now sending this mail from
that very phone.

So this issue is not an issue on the world's most popular OS 😁

Stephan

On 18 Oct 2017 at 08:46, "Brendan Barnwell"  wrote:

> On 2017-10-17 07:26, Serhiy Storchaka wrote:
>
>> 17.10.17 17:06, Nick Coghlan wrote:
>>
>>> >Keep in mind we're not talking about a regular loop you can break out of
>>> >with Ctrl-C here - we're talking about a tight loop inside the
>>> >interpreter internals that leads to having to kill the whole host
>>> >process just to get out of it.
>>>
>> And this is the root of the issue. Just let more tight loops be
>> interruptible with Ctrl-C, and this will fix the more general issue.
>>
>
> I was just thinking the same thing.  I think in general it's
> always bad for code to be uninterruptible with Ctrl-C.  If these infinite
> iterators were fixed so they could be interrupted, this containment problem
> would be much less painful.
>
> --
> Brendan Barnwell
> "Do not follow where the path may lead.  Go, instead, where there is no
> path, and leave a trail."
>--author unknown
>


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 22:53, Serhiy Storchaka  wrote:

> 18.10.17 13:22, Nick Coghlan wrote:
>
> >> 2. These particular cases can be addressed locally using existing
>> protocols, so the chances of negative side effects are low
>>
>
> Only the particular case `count() in count()` can be addressed without
> breaking the following examples:
>

You're right, the potential impact on objects with weird __eq__
implementations would mean that even the `__contains__` approach would
require deprecation warnings for the APIs that allow short-circuiting. So
we can discard it in favour of exploring the "Can we make a beneficial
improvement via __length_hint__?" question.


> 3. The total amount of code involved is likely to be small (a dozen or so
>> lines of C, a similar number of lines of Python in the tests) in
>> well-isolated protocol functions, so the chances of introducing future
>> maintainability problems are low
>>
>
> It depends on what you want to achieve. Just prevent an infinite loop in
> `count() in count()`, or optimize `int in count()`, or optimize more
> special cases.
>

My interest lies specifically in reducing the number of innocent-looking
ways we offer to provoke uninterruptible infinite loops or unbounded memory
consumption.


> 4. We have a potential contributor who is presumably offering to do the
>> work (if that's not the case, then the question is moot anyway until a
>> sufficiently interested volunteer turns up)
>>
>
> Maintaining is more than writing an initial code.
>

Aye, that's why the preceding point was to ask how large a change we'd be
offering to maintain indefinitely, and how well isolated that change would
be.


> If we were to do that, then we *could* make the solution to the reported
>> problem more general by having all builtin and standard library operations
>> that expect to be working with finite iterators (the containment testing
>> fallback, min, max, sum, any, all, functools.reduce, etc) check for a
>> length hint, even if they aren't actually pre-allocating any memory.
>>
>
> This will add a significant overhead for relatively short (hundreds of
> items) sequences. I already did benchmarking for similar cases in the past.


I did wonder about that, so I guess the baseline zero-risk enhancement idea
would be to only prevent the infinite loop in cases that already request a
length hint as a memory pre-allocation check. That would reduce the
likelihood of the most painful case (grinding the machine to a halt),
without worrying about the less painful cases (which will break the current
process, but the rest of the machine will be fine).

Given that, adding TypeError-raising __length_hint__ implementations to
itertools.count(), itertools.cycle(), and itertools.repeat() would make
sense as an independent RFE, without worrying about any APIs that don't
already check for a length hint.

A more intrusive option would then be to look at breaking the other tight
iteration loops into two phases, such that checking for potentially
infinite iterators could be delayed until after the first thousand
iterations or so. That option is potentially worth exploring anyway, since
breaking up the current single level loops as nested loops would be a
pre-requisite for allowing these APIs to check for signals while they're
running while keeping the per-iteration overhead low (only one
pre-requisite of many though, and probably one of the easier ones).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Memory limits [was Re: Membership of infinite iterators]

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 22:51, Stefan Krah  wrote:

> On Wed, Oct 18, 2017 at 10:43:57PM +1000, Nick Coghlan wrote:
> > Per-process memory quotas *can* help avoid this, but enforcing them
> > requires that every process run in a resource controlled sandbox. Hence,
> > it's not a coincidence that mobile operating systems and container-based
> > server environments already work that way, and the improved ability to
> cope
> > with misbehaving applications is part of why desktop operating systems
> > would like to follow the lead of their mobile and server counterparts :)
>
> Does this also fall under the sandbox definition?
>
> $ softlimit -m 10 python3
>

Yeah, Linux offers good opt-in tools for this kind of thing, and the
combination of Android and containerised server environments means they're
only getting better. But we're still some time away from it being routine
for your desktop to be well protected from memory management misbehaviour
in arbitrary console or GUI applications.

The resource module (which Steven mentioned in passing) already provides
opt-in access to some of those features from within the program itself:
https://docs.python.org/3/library/resource.html

For example:

>>> import sys, resource
>>> data = bytes(2**32)
>>> resource.setrlimit(resource.RLIMIT_DATA, (2**31, sys.maxsize))
>>> data = bytes(2**32)
Traceback (most recent call last):
  File "", line 1, in 
MemoryError
>>> resource.setrlimit(resource.RLIMIT_DATA, (sys.maxsize, sys.maxsize))
>>> data = bytes(2**32)

(Bulk memory allocations start failing on my machine somewhere between
2**33 and 2**34, which is about what I'd expect, since it has 8 GiB of
physical RAM installed)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Memory limits [was Re: Membership of infinite iterators]

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 2:38 PM, Steven D'Aprano 
wrote:

> On Wed, Oct 18, 2017 at 01:39:28PM +0300, Koos Zevenhoven wrote:
>
> > I'm writing from my phone now, cause I was dumb enough to try
> list(count())
>
> You have my sympathies -- I once, due to typo, accidentally ran
> something like range(10**100) in Python 2.
>
>
Oh, I think I've done something like that too, and there are definitely
still opportunities in Python 3 to ask for the impossible. But what I did
now, I did "on purpose". For a split second, I really wanted to know how
bad it would be. But a few minutes later I had little interest left in that
;). Rebooting a computer definitely takes longer than restarting a Python
process.


>
> > But should it be fixed in list or in count?
>
> Neither. There are too many other places this can break for it to be
> effective to try to fix each one in place.
>
>
To clarify, I was talking about allowing Ctrl-C to break it, which
somebody had suggested. That would also help if the C-implemented iterable
just takes a lot of time to generate the items.

And for the record, I just tried

>>> sum(itertools.count())

And as we could expect, it does not respect Ctrl-C either.


––Koos



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Serhiy Storchaka

18.10.17 13:22, Nick Coghlan wrote:
2. These particular cases can be addressed locally using existing
protocols, so the chances of negative side effects are low


Only the particular case `count() in count()` can be addressed without 
breaking the following examples:


>>> class C:
...     def __init__(self, count):
...         self.count = count
...     def __eq__(self, other):
...         print(self.count, other)
...         if not self.count:
...             return True
...         self.count -= 1
...         return False
...
>>> import itertools
>>> C(5) in itertools.count()
5 0
4 1
3 2
2 3
1 4
0 5
True
>>> it = itertools.cycle([C(5)]); it in it
5 
4 
3 
2 
1 
0 
True
>>> it = itertools.repeat(C(5)); it in it
5 repeat(<__main__.C object at 0x7f65512c5dc0>)
4 repeat(<__main__.C object at 0x7f65512c5dc0>)
3 repeat(<__main__.C object at 0x7f65512c5dc0>)
2 repeat(<__main__.C object at 0x7f65512c5dc0>)
1 repeat(<__main__.C object at 0x7f65512c5dc0>)
0 repeat(<__main__.C object at 0x7f65512c5dc0>)
True

3. The total amount of code involved is likely to be small (a dozen or 
so lines of C, a similar number of lines of Python in the tests) in 
well-isolated protocol functions, so the chances of introducing future 
maintainability problems are low


It depends on what you want to achieve. Just prevent an infinity loop in 
`count() in count()`, or optimize `int in count()`, or optimize more 
special cases.


4. We have a potential contributor who is presumably offering to do the 
work (if that's not the case, then the question is moot anyway until a 
sufficiently interested volunteer turns up)


Maintaining is more than writing an initial code.

If we were to do that, then we *could* make the solution to the reported 
problem more general by having all builtin and standard library 
operations that expect to be working with finite iterators (the 
containment testing fallback, min, max, sum, any, all, functools.reduce, 
etc) check for a length hint, even if they aren't actually 
pre-allocating any memory.


This will add a significant overhead for relatively short (hundreds of 
items) sequences. I already did benchmarking for similar cases in the past.




Re: [Python-ideas] Memory limits [was Re: Membership of infinite iterators]

2017-10-18 Thread Stefan Krah
On Wed, Oct 18, 2017 at 10:43:57PM +1000, Nick Coghlan wrote:
> Per-process memory quotas *can* help avoid this, but enforcing them
> requires that every process run in a resource controlled sandbox. Hence,
> it's not a coincidence that mobile operating systems and container-based
> server environments already work that way, and the improved ability to cope
> with misbehaving applications is part of why desktop operating systems
> would like to follow the lead of their mobile and server counterparts :)

Does this also fall under the sandbox definition?

$ softlimit -m 10 python3
Python 3.7.0a1+ (heads/master:bdaeb7d237, Oct 16 2017, 18:54:55) 
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> [0] * 10
Traceback (most recent call last):
  File "", line 1, in 
MemoryError


People who are worried could make a python3 alias or use Ctrl-\.



Stefan Krah





Re: [Python-ideas] Memory limits [was Re: Membership of infinite iterators]

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 21:38, Steven D'Aprano  wrote:

> > But should it be fixed in list or in count?
>
> Neither. There are too many other places this can break for it to be
> effective to try to fix each one in place.
>

> e.g. set(xrange(2**64)), or tuple(itertools.repeat([1]))
>

A great many of these call operator.length_hint() these days in order to
make a better guess as to how much memory to pre-allocate, so while that
still wouldn't intercept everything, it would catch a lot of them.
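For reference, operator.length_hint() consults len() first, then __length_hint__, and only then falls back to its default, e.g.:

```python
import operator

# Three ways a hint can be produced (values verified on CPython 3.x):
print(operator.length_hint([1, 2, 3]))         # 3, from len()
print(operator.length_hint(iter(range(10))))   # 10, from __length_hint__
print(operator.length_hint(x for x in "abc"))  # 0, the fallback default
```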


> Rather, I think we should set a memory limit that applies to the whole
> > process. Once you try to allocate more memory, you get a MemoryError
> exception rather than have the OS thrash forever trying to allocate a
> terabyte on a 4GB machine.
>
> (I don't actually understand why the OS can't fix this.)
>

Trying to allocate enormous amounts of memory all at once isn't the
problem, as that just fails outright with "Not enough memory":

>>> data = bytes(2**62)
Traceback (most recent call last):
  File "", line 1, in 
MemoryError

The machine-killing case is repeated allocation requests that the operating
system *can* satisfy, but require paging almost everything else out of RAM.
And that's exactly what "list(infinite_iterator)" entails, since the
interpreter will make an initial guess as to the correct size, and then
keep resizing the allocation to 125% of its previous size each time it
fills up (or so - I didn't check the current overallocation factor).
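That stepwise growth is easy to observe from Python; the exact jump points are a CPython implementation detail, so treat the printed numbers as illustrative:

```python
import sys

# Watch list over-allocation at work: the underlying buffer grows in
# occasional jumps rather than on every append.
sizes = []
data = []
for i in range(64):
    data.append(i)
    size = sys.getsizeof(data)
    if not sizes or size != sizes[-1]:
        sizes.append(size)
        print(f"len={len(data):3d}  sizeof={size}")
```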

Per-process memory quotas *can* help avoid this, but enforcing them
requires that every process run in a resource controlled sandbox. Hence,
it's not a coincidence that mobile operating systems and container-based
server environments already work that way, and the improved ability to cope
with misbehaving applications is part of why desktop operating systems
would like to follow the lead of their mobile and server counterparts :)

So here is my suggestion:
>
> 1. Let's add a function in sys to set the "maximum memory" available,
> for some definition of memory that makes the most sense on your
> platform. Ordinary Python programmers shouldn't have to try to decipher
> the ulimit interface.
>

Historically, one key reason we didn't do that was because the `PyMem_*`
APIs bypassed CPython's memory allocator, so such a limit wouldn't have
been particularly effective.

As of 3.6 though, even bulk memory allocations pass through pymalloc,
making a Python level memory allocation limit potentially more viable
(since it would pick up almost all of the interpeter's own allocations,
even if it missed those in extension modules):
https://docs.python.org/dev/whatsnew/3.6.html#optimizations

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan  wrote:

> On 18 October 2017 at 20:39, Koos Zevenhoven  wrote:
>
>> On Oct 18, 2017 13:29, "Nick Coghlan"  wrote:
>>
>> On 18 October 2017 at 19:56, Koos Zevenhoven  wrote:
>>
>>> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
>>> infinite iterators. At least itertools doesn't seem to have it:
>>>
>>> >>> import itertools
>>> >>> for i in itertools.count():
>>> ...     pass
>>> ...
>>>
>>
>> That's interrupting the for loop, not the iterator. This is the test case
>> you want for the problem Jason raised:
>>
>> >>> "a" in itertools.count()
>>
>> Be prepared to suspend and terminate the affected process, because Ctrl-C
>> isn't going to help :)
>>
>>
>> I'm writing from my phone now, cause I was dumb enough to try
>> list(count())
>>
>
> Yeah, that's pretty much the worst case example, since the machine starts
> thrashing memory long before it actually gives up and starts denying the
> allocation requests :(
>
>
>> But should it be fixed in list or in count?
>>
>
> That one can only be fixed in count() - list already checks
> operator.length_hint(), so implementing itertools.count.__length_hint__()
> to always raise an exception would be enough to handle the container
> constructor case.
>
>
While that may be a convenient hack to solve some of the cases, maybe it's
possible for list(..) etc. to give Ctrl-C a chance every now and then?
(Without a noticeable performance penalty, that is.) That would also help
with *finite* C-implemented iterables that are just slow to turn into a
list.

If I'm not mistaken, we're talking about C-implemented functions that
iterate over C-implemented iterators. It's not at all obvious to me that
it's the iterator that should handle Ctrl-C.

––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


[Python-ideas] Memory limits [was Re: Membership of infinite iterators]

2017-10-18 Thread Steven D'Aprano
On Wed, Oct 18, 2017 at 01:39:28PM +0300, Koos Zevenhoven wrote:

> I'm writing from my phone now, cause I was dumb enough to try list(count())

You have my sympathies -- I once, due to typo, accidentally ran 
something like range(10**100) in Python 2.

 
> But should it be fixed in list or in count?

Neither. There are too many other places this can break for it to be 
effective to try to fix each one in place.

e.g. set(xrange(2**64)), or tuple(itertools.repeat([1]))

Rather, I think we should set a memory limit that applies to the whole 
process. Once you try to allocate more memory, you get a MemoryError
exception rather than have the OS thrash forever trying to allocate a 
terabyte on a 4GB machine.

(I don't actually understand why the OS can't fix this.)

Being able to limit memory use is a fairly common request, e.g. on 
Stackoverflow:

https://stackoverflow.com/questions/30269238/limit-memory-usage
https://stackoverflow.com/questions/2308091/how-to-limit-the-heap-size
https://community.webfaction.com/questions/15367/setting-max-memory-for-python-script

And Java apparently has a commandline switch to manage memory:

https://stackoverflow.com/questions/22887400/is-there-an-equivalent-to-java-xmx-for-python

The problems with the resource module are that it's effectively an
interface to ulimit, which makes it confusing and platform-specific; it
is no help to Windows users; it isn't used by default; and not many
people know about it. (I know I didn't until tonight.)
 
So here is my suggestion:

1. Let's add a function in sys to set the "maximum memory" available, 
for some definition of memory that makes the most sense on your 
platform. Ordinary Python programmers shouldn't have to try to decipher 
the ulimit interface.

2. Have a command line switch to set that value, e.g.:

python3 -L 1073741824  # 1 GiB
python3 -L 0  # machine-dependent limit
python3 -L -1  # unlimited

where the machine-dependent limit is set by the interpreter, depending 
on the amount of memory it thinks it has available.

3. For the moment, stick to defaulting to -L -1 "unlimited", but with 
the intention to change to -L 0 "let the interpreter decide" in some 
future release, after an appropriate transition period.

On Linux, we can always run

   ulimit  python3 

but honestly I never know which option to give (maximum stack size? 
maximum virtual memory? why is there no setting for maximum real 
memory?) and that doesn't help Windows users.
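For what it's worth, a rough prototype of point 1 is already possible on POSIX systems via the resource module. RLIMIT_AS is an assumption here (it limits address space, i.e. virtual memory), and this still does nothing for Windows users, which is part of the point:

```python
import resource

def set_max_memory(limit):
    """Prototype of the proposed sys-level knob (POSIX-only sketch).

    limit > 0  : maximum address space, in bytes
    limit == -1: unlimited, mirroring the proposed ``python3 -L -1``
    """
    if limit == -1:
        limit = resource.RLIM_INFINITY
    # Adjust only the soft limit; the hard limit stays as the OS set it.
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
```

A real sys-level API would presumably hook the interpreter's allocator instead, so it could also work on Windows.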


Thoughts?




-- 
Steve


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 20:39, Koos Zevenhoven  wrote:

> On Oct 18, 2017 13:29, "Nick Coghlan"  wrote:
>
> On 18 October 2017 at 19:56, Koos Zevenhoven  wrote:
>
>> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
>> infinite iterators. At least itertools doesn't seem to have it:
>>
>> >>> import itertools
>> >>> for i in itertools.count():
>> ...     pass
>> ...
>>
>
> That's interrupting the for loop, not the iterator. This is the test case
> you want for the problem Jason raised:
>
> >>> "a" in itertools.count()
>
> Be prepared to suspend and terminate the affected process, because Ctrl-C
> isn't going to help :)
>
>
> I'm writing from my phone now, cause I was dumb enough to try list(count())
>

Yeah, that's pretty much the worst case example, since the machine starts
thrashing memory long before it actually gives up and starts denying the
allocation requests :(


> But should it be fixed in list or in count?
>

That one can only be fixed in count() - list already checks
operator.length_hint(), so implementing itertools.count.__length_hint__()
to always raise an exception would be enough to handle the container
constructor case.

The open question would then be the cases that don't pre-allocate memory,
but still always attempt to consume the entire iterator:

min(itr)
max(itr)
sum(itr)
functools.reduce(op, itr)
"".join(itr)

And those which *may* attempt to consume the entire iterator, but won't
necessarily do so:

x in itr
any(itr)
all(itr)

The items in the first category could likely be updated to check
length_hint and propagate any errors immediately, since they don't provide
any short circuiting behaviour - feeding them an infinite iterator is a
guaranteed uninterruptible infinite loop, so checking for a length hint
won't break any currently working code (operator.length_hint defaults to
returning zero if a type doesn't implement __length_hint__).
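One wrinkle worth noting: operator.length_hint() swallows a TypeError raised by __length_hint__ and falls back to its default, so "propagate any errors immediately" would mean those consumers calling the slot directly rather than going through the generic helper. A toy illustration (the class below is hypothetical, not anything itertools provides today):

```python
import operator

class DeclaredInfinite:
    """Toy iterator that advertises its infinity via __length_hint__."""
    def __iter__(self):
        return self
    def __next__(self):
        return 0
    def __length_hint__(self):
        raise TypeError("this iterator never terminates")

# The generic helper hides the error behind its default...
print(operator.length_hint(DeclaredInfinite(), 0))   # -> 0

# ...so a consumer that wants to fail fast has to invoke the slot itself:
try:
    DeclaredInfinite().__length_hint__()
except TypeError as exc:
    print("fails fast:", exc)
```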

I'm tempted to say the same for the APIs in the latter category as well,
but their short-circuiting semantics mean those can technically have
well-defined behaviour, even when given an infinite iterator:

>>> any(itertools.count())
True
>>> all(itertools.count())
False
>>> 1 in itertools.count()
True

It's only the "never short-circuits" branch that is ill-defined for
non-terminating input. So for these, the safer path would be to emit
DeprecationWarning if length_hint fails in 3.7, and then pass the exception
through in 3.8+.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Oct 18, 2017 13:29, "Nick Coghlan"  wrote:

On 18 October 2017 at 19:56, Koos Zevenhoven  wrote:

> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
> infinite iterators. At least itertools doesn't seem to have it:
>
> >>> import itertools
> >>> for i in itertools.count():
> ...     pass
> ...
>

That's interrupting the for loop, not the iterator. This is the test case
you want for the problem Jason raised:

>>> "a" in itertools.count()

Be prepared to suspend and terminate the affected process, because Ctrl-C
isn't going to help :)


I'm writing from my phone now, cause I was dumb enough to try list(count())

But should it be fixed in list or in count?

-- Koos

PS. Nick, sorry about the duplicate email.



Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 19:56, Koos Zevenhoven  wrote:

> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
> infinite iterators. At least itertools doesn't seem to have it:
>
> >>> import itertools
> >>> for i in itertools.count():
> ...     pass
> ...
>

That's interrupting the for loop, not the iterator. This is the test case
you want for the problem Jason raised:

>>> "a" in itertools.count()

Be prepared to suspend and terminate the affected process, because Ctrl-C
isn't going to help :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Paul Moore
On 18 October 2017 at 10:56, Koos Zevenhoven  wrote:
> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
> infinite iterators. At least itertools doesn't seem to have it:
>
> >>> import itertools
> >>> for i in itertools.count():
> ...     pass
> ...
> ^CTraceback (most recent call last):
>   File "", line 1, in 
> KeyboardInterrupt

That's not the issue here, as the CPython interpreter implements this
with multiple opcodes, and checks between opcodes for Ctrl-C. The
demonstration is:

>>> import itertools
>>> 'x' in itertools.count()

... only way to break out is to kill the process.
Paul


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 03:39, Koos Zevenhoven  wrote:

> On Tue, Oct 17, 2017 at 5:26 PM, Serhiy Storchaka 
> wrote:
>
>> 17.10.17 17:06, Nick Coghlan wrote:
>>
>>> Keep in mind we're not talking about a regular loop you can break out of
>>> with Ctrl-C here - we're talking about a tight loop inside the interpreter
>>> internals that leads to having to kill the whole host process just to get
>>> out of it.
>>>
>>
>> And this is the root of the issue. Just let more tight loops be
>> interruptible with Ctrl-C, and this will fix the more general issue.
>>
>>
> Not being able to interrupt something with Ctrl-C in the repl or with the
> interrupt command in Jupyter notebooks is definitely a thing I sometimes
> encounter. A pity I don't remember when it happens, because I usually
> forget it very soon after I've restarted the kernel and continued working.
> But my guess is it's usually not because of an infinite iterator.
>

Fixing the general case is hard, because the assumption that signals are
only checked between interpreter opcodes is a pervasive one throughout the
interpreter internals.  We certainly *could* redefine affected C APIs as
potentially raising KeyboardInterrupt (adjusting the signal management
infrastructure accordingly), and if someone actually follows through and
implements that some day, then the argument could then be made that given
such change, it might be reasonable to drop any a priori guards that we
have put in place for particular *detectable* uninterruptible infinite
loops.

However, that's not the design question being discussed in this thread. The
design question here is "We have 3 known uninterruptible infinite loops
that are easy to detect and prevent. Should we detect and prevent them?".
"We shouldn't allow anyone to do this easy thing, because it would be
preferable for someone to instead do this hard and complicated thing that
nobody is offering to do" isn't a valid design argument in that situation.

And I have a four step check for that which prompts me to say "Yes, we
should detect and prevent them":

1. Uninterruptible loops are bad, so having fewer of them is better
2. These particular cases can be addressed locally using existing
protocols, so the chances of negative side effects are low
3. The total amount of code involved is likely to be small (a dozen or so
lines of C, a similar number of lines of Python in the tests) in
well-isolated protocol functions, so the chances of introducing future
maintainability problems are low
4. We have a potential contributor who is presumably offering to do the
work (if that's not the case, then the question is moot anyway until a
sufficiently interested volunteer turns up)

As an alternative implementation approach, the case could also be made that
these iterators should be raising TypeError in __length_hint__, as that
protocol method is explicitly designed to be used for finite container
pre-allocation. That way things like "list(itertools.count())" would fail
immediately (similar to the way "list(range(10**100))" already does) rather
than attempting to consume all available memory before (hopefully) finally
failing with MemoryError.
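The contrast described here is easy to demonstrate today: range() exposes its length up front, while count() exposes no hint at all, so hint-aware consumers just get the fallback default and start consuming:

```python
import itertools
import operator

# range() defines __len__, so an absurd request fails before any allocation:
try:
    list(range(10**100))
except OverflowError as exc:
    print("range() fails fast:", exc)

# itertools.count() currently offers no hint, so callers get the default:
print(operator.length_hint(itertools.count(), 0))   # -> 0
```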

If we were to do that, then we *could* make the solution to the reported
problem more general by having all builtin and standard library operations
that expect to be working with finite iterators (the containment testing
fallback, min, max, sum, any, all, functools.reduce, etc) check for a
length hint, even if they aren't actually pre-allocating any memory. Then
the general purpose marker for "infinite iterator" would be "Explicitly
defines __length_hint__ to raise TypeError", and it would prevent a priori
all operations that attempted to fully consume the iterator.

That more general approach would cause some currently "working" code (like
"any(itertools.count())" and "all(itertools.count())", both of which
consume at most 2 items from the iterator) to raise an exception instead,
and hence would require the introduction of a DeprecationWarning in 3.7
(where the affected APIs would start calling length hint, but suppress any
exceptions from it), before allowing the exception to propagate in 3.8+.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Tue, Oct 17, 2017 at 9:44 PM, Brendan Barnwell 
wrote:

> On 2017-10-17 07:26, Serhiy Storchaka wrote:
>
>> 17.10.17 17:06, Nick Coghlan wrote:
>>
>>> >Keep in mind we're not talking about a regular loop you can break out of
>>> >with Ctrl-C here - we're talking about a tight loop inside the
>>> >interpreter internals that leads to having to kill the whole host
>>> >process just to get out of it.
>>>
>> And this is the root of the issue. Just let more tight loops be
>> interruptible with Ctrl-C, and this will fix the more general issue.
>>
>
> I was just thinking the same thing.  I think in general it's
> always bad for code to be uninterruptible with Ctrl-C.


Indeed I agree about this.


> If these infinite iterators were fixed so they could be interrupted, this
> containment problem would be much less painful.
>
I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
infinite iterators. At least itertools doesn't seem to have it:

>>> import itertools
>>> for i in itertools.count():
...     pass
...
^CTraceback (most recent call last):
  File "", line 1, in 
KeyboardInterrupt

>>> for i in itertools.repeat(1):
...     pass
...
^CTraceback (most recent call last):
  File "", line 1, in 
KeyboardInterrupt

>>> for i in itertools.cycle((1,)):
...     pass
...
^CTraceback (most recent call last):
  File "", line 1, in 
KeyboardInterrupt
>>>


Same thing on both Windows and Linux, Python 3.6.


––Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +