Re: [Python-ideas] Membership of infinite iterators

2017-11-02 Thread Koos Zevenhoven
On Thu, Nov 2, 2017 at 1:41 PM, Antoine Pitrou  wrote:

> On Thu, 2 Nov 2017 13:27:17 +0200
> Koos Zevenhoven  wrote:
> > ​There's a limit to how cheap the call to PyErr_CheckSignals() can be.
> As I
> > mentioned earlier, even just the fact that it's a C function call​ can be
> > too much.
> >
> > That's why, in the above, I used a new name PyErr_PROBE_SIGNALS() instead
> > of optimizing the existing PyErr_CheckSignals() –– turning
> > PyErr_CheckSignals() into a static inline function would change the ABI.
> I
> > also didn't call it PyErr_CHECK_SIGNALS() because the functionality is
> not
> > strictly equivalent to the existing function.
>
> Please.  If you want to replace PyErr_CheckSignals() with something
> faster, the first thing to do is to prove that PyErr_CheckSignals()
> *is* too expensive.
>

I believe Serhiy proved that by showing that his first patch had
non-negligible overhead in some cases. One might also write a C-implemented
Fibonacci calculator or similar to prove the overhead is significant.

While I agree that there's a hint of premature optimization involved, my
justification for it is that I want people to be confident that when they
add the check, it's not going to slow down the loop significantly. Maybe
the fear of overhead is one of the reasons for people to write uninterruptible
code. IMO, it should be made as easy as possible for the C programmer.


––Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Membership of infinite iterators

2017-11-02 Thread Koos Zevenhoven
On Thu, Nov 2, 2017 at 3:39 AM, Nick Coghlan  wrote:

> On 2 November 2017 at 04:29, Koos Zevenhoven  wrote:
>
>> SPOILER ALERT! At the moment, Nick's statement is in fact **always**
>> true in **all** cases (at least when ignoring untypical cases and some
>> inaccuracies in phrasing). ​Another question is whether the statement
>> **should** be true at all.​
>> ​
>>
>> PyErr_CheckSignals(), the function that checks for pending signals, now
>> **implicitly** uses the strictest possible memory-order requirement
>> (SEQ_CST) for checking the `is_tripped` flag, a value which can be used to
>> peek whether there are any pending signals. This means that two threads
>> that call PyErr_CheckSignals can't even **check** the value of that flag
>> at the same time, and they'll have to wait for each other and whatever the
> >> CPU hardware implementation does for cache synchronisation.
>>
>
> Nice, that's deeper than I went - I just assumed there was an
> interlocked_increment in there somewhere, without checking whether or not
> there were any optimised code paths that managed to avoid that call :)
>
>
>> ​From a correctness point of view, that is absolutely great: if
>> PyErr_CheckSignals() is called, it is guaranteed to notice a new signal
> >> regardless of how small the number of picoseconds after the `is_tripped`
>> flag has been set.  But is that really important? Do we really need things
>> to be slowed down by this? And do we want people to "optimize" code by
>> avoiding signal checking?
>>
>
> It isn't signal handling latency that's the concern here, it's potentially
> missing signals. Consider the case where we have 3 threads: A, B, C. A is
> the main thread that will actually handle signals, B and C both happened to
> receive them. We'll also assume  the two threads receive *different*
> signals (otherwise they'll potentially get collapsed into one notification
> regardless).
>
> The scenario we want to avoid is having one or both of the signals set,
> but is_tripped cleared. With an interlocked check (where both 0->1 and 1->0
> are memory barriers), that clearly can't happen. I suspect you're right
> that this could also be achieved with a less strict memory sync requirement
> on the 0->1 check, but that's much harder to reason about than simply
> limiting the number of iterations we make through an iterator consumption
> loop before checking for signals.
>

​​Missing signals is a concern, and there's another concern that one
received signal might be interpreted as two separate occurrences. But it
doesn't prevent us from *checking* is_tripped in a more relaxed manner, as
long as the 0->1 check is *verified* using something like SEQ_CST and the
associated signal handling is synchronized properly.
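A minimal sketch of that two-step scheme in C11 atomics (purely illustrative: it assumes `is_tripped` were exposed as an atomic int, and the strict-path function here is a stub standing in for PyErr_CheckSignals()):

```c
#include <stdatomic.h>

/* Stub state: in CPython, is_tripped is static in Modules/signalmodule.c. */
static atomic_int is_tripped = 0;

/* Stub for the strict path (PyErr_CheckSignals() in CPython): verify and
 * clear the flag under full sequentially-consistent ordering. */
static int check_signals_strict(void)
{
    if (atomic_exchange(&is_tripped, 0))
        return -1;  /* in CPython: run handlers, possibly set KeyboardInterrupt */
    return 0;
}

/* The cheap probe: a relaxed load on the common no-signal path, falling
 * back to the strict check only when the flag looks set. */
static inline int probe_signals(void)
{
    if (atomic_load_explicit(&is_tripped, memory_order_relaxed))
        return check_signals_strict();
    return 0;
}
```

The relaxed load imposes no ordering on the fast path, so threads can probe concurrently without contending on a barrier; only a thread that actually observes the flag as set pays for the SEQ_CST verification.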

And that's why my PyErr_PROBE_SIGNALS below should work, and wouldn't
prevent multiple threads from handling signals either.

>
>
>> The signals won't be caught until checked anyway, and as we've seen, one
>> solution is to use a counter to determine if enough time has passed that
>> another check for potential signals should happen. That does, however,
>> raise the question of who should own the counter, and if it's OK for
>> multiple threads to use the same counter. If yes, then would we be able to
>> avoid slow atomic decrements (or increments)?
>>
>> But another solution might be to make a less strict but almost equivalent
>> functionality with much less overhead. Something like this in a header file:
>>
> >> static inline int PyErr_PROBE_SIGNALS() {
> >>     static int volatile *flag = (int volatile *) &is_tripped;
> >>     if (*flag) {
> >>         return PyErr_CheckSignals();
> >>     }
> >>     else {
> >>         return 0;
> >>     }
> >> }
>>
>> ​Here, the flag is casted to volatile to force a check to happen each
>> time from memory/cache. However, to make it more standard and to make sure
>> it works with all compilers/platforms, it might be better to, instead of
>> *flag, use an atomic load with "MEMORY_ORDER_RELAXED".​ Another thing is
>> that `is_tripped`, which is currently declared as static inside
>> signalmodule.c [4], should somehow be exposed in the API to make it
>> available for the inline function in headers.
>>
>> ​This solution might be fast enough for a lot of cases, although still
>> slightly slower than the counter approach, at least if the counter approach
>> would completely avoid per-iteration memory access by requiring each
>> function to own the counter that is used for this.
>>
>
> One thing I like about a nested inner loop is that it's easy to understand
> the rationale for it *without* knowing any of the details of how memory
> synchronisation works, as it's just:
>
> - we want to check for signals regularly so the latency in interrupt
> handling isn't too high
> - PyErr_CheckSignals() is too expensive to call on every iteration
> - so we only check every N iterations
>
>
My initial "negative" reaction to that potential solution ("I don't believe
nesting loops 

Re: [Python-ideas] Membership of infinite iterators

2017-11-01 Thread Nick Coghlan
On 2 November 2017 at 04:29, Koos Zevenhoven  wrote:

> SPOILER ALERT! At the moment, Nick's statement is in fact **always** true
> in **all** cases (at least when ignoring untypical cases and some
> inaccuracies in phrasing). ​Another question is whether the statement
> **should** be true at all.​
> ​
>
> PyErr_CheckSignals(), the function that checks for pending signals, now
> **implicitly** uses the strictest possible memory-order requirement
> (SEQ_CST) for checking the `is_tripped` flag, a value which can be used to
> peek whether there are any pending signals. This means that two threads
> that call PyErr_CheckSignals can't even **check** the value of that flag
> at the same time, and they'll have to wait for each other and whatever the
> CPU hardware implementation does for cache synchronisation.
>

Nice, that's deeper than I went - I just assumed there was an
interlocked_increment in there somewhere, without checking whether or not
there were any optimised code paths that managed to avoid that call :)


> ​From a correctness point of view, that is absolutely great: if
> PyErr_CheckSignals() is called, it is guaranteed to notice a new signal
> regardless of how small the number of picoseconds after the `is_tripped`
> flag has been set.  But is that really important? Do we really need things
> to be slowed down by this? And do we want people to "optimize" code by
> avoiding signal checking?
>

It isn't signal handling latency that's the concern here, it's potentially
missing signals. Consider the case where we have 3 threads: A, B, C. A is
the main thread that will actually handle signals, B and C both happened to
receive them. We'll also assume  the two threads receive *different*
signals (otherwise they'll potentially get collapsed into one notification
regardless).

The scenario we want to avoid is having one or both of the signals set, but
is_tripped cleared. With an interlocked check (where both 0->1 and 1->0 are
memory barriers), that clearly can't happen. I suspect you're right that
this could also be achieved with a less strict memory sync requirement on
the 0->1 check, but that's much harder to reason about than simply limiting
the number of iterations we make through an iterator consumption loop
before checking for signals.


> The signals won't be caught until checked anyway, and as we've seen, one
> solution is to use a counter to determine if enough time has passed that
> another check for potential signals should happen. That does, however,
> raise the question of who should own the counter, and if it's OK for
> multiple threads to use the same counter. If yes, then would we be able to
> avoid slow atomic decrements (or increments)?
>
> But another solution might be to make a less strict but almost equivalent
> functionality with much less overhead. Something like this in a header file:
>
> static inline int PyErr_PROBE_SIGNALS() {
>     static int volatile *flag = (int volatile *) &is_tripped;
>     if (*flag) {
>         return PyErr_CheckSignals();
>     }
>     else {
>         return 0;
>     }
> }
>
> Here, the flag is cast to volatile to force a check to happen each time
> from memory/cache. However, to make it more standard and to make sure it
> works with all compilers/platforms, it might be better to, instead of
> *flag, use an atomic load with "MEMORY_ORDER_RELAXED".​ Another thing is
> that `is_tripped`, which is currently declared as static inside
> signalmodule.c [4], should somehow be exposed in the API to make it
> available for the inline function in headers.
>
> ​This solution might be fast enough for a lot of cases, although still
> slightly slower than the counter approach, at least if the counter approach
> would completely avoid per-iteration memory access by requiring each
> function to own the counter that is used for this.
>

One thing I like about a nested inner loop is that it's easy to understand
the rationale for it *without* knowing any of the details of how memory
synchronisation works, as it's just:

- we want to check for signals regularly so the latency in interrupt
handling isn't too high
- PyErr_CheckSignals() is too expensive to call on every iteration
- so we only check every N iterations

Memory synchronisation then only comes up if someone asks why
"PyErr_CheckSignals" is expensive to call. And while it wouldn't surprise
at all if you're right and there are ways to make that call cheaper,
they're still never going to be cheaper than explicitly reducing the
frequency with which it is called.
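That amortisation can be sketched as follows (illustrative names and interval, with a stubbed signal check; not CPython's actual loop):

```c
/* Check for signals only once per CHECK_EVERY cheap iterations. */
#define CHECK_EVERY 0x10000L  /* illustrative interval (~65k iterations) */

static long signal_checks = 0;  /* counts checks, just to observe the amortisation */

static int check_signals_stub(void)  /* stands in for PyErr_CheckSignals() */
{
    signal_checks++;
    return 0;  /* 0 = no pending signal; -1 would mean "abort, exception set" */
}

/* Run n iterations of work, checking signals once per inner chunk. */
static long consume(long n)
{
    long done = 0;
    while (done < n) {
        long chunk = n - done;
        if (chunk > CHECK_EVERY)
            chunk = CHECK_EVERY;
        for (long i = 0; i < chunk; i++)
            done++;  /* ... one cheap iteration of real work ... */
        if (check_signals_stub() < 0)
            return -1;  /* propagate the interrupt to the caller */
    }
    return done;
}
```

The inner loop stays branch-light and touches no shared state, so the expensive check costs roughly 1/CHECK_EVERY of its stand-alone overhead per iteration.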

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-19 Thread Koos Zevenhoven
On Thu, Oct 19, 2017 at 3:42 AM, Nick Coghlan  wrote:

> On 19 October 2017 at 08:34, Greg Ewing 
> wrote:
>
>> Nick Coghlan wrote:
>>
>>> since breaking up the current single level loops as nested loops would
>>> be a pre-requisite for allowing these APIs to check for signals while
>>> they're running while keeping the per-iteration overhead low
>>>
>>
>> Is there really much overhead? Isn't it just checking a flag?
>>
>
> It's checking an atomically updated flag, so it forces CPU cache
> synchronisation, which means you don't want to be doing it on every
> iteration of a low level loop.
>
>
Even just that it's a C function call makes me not want to recommend doing
it in a lot of tight loops. Who knows what the function does anyway, let
alone what it might or might not do in the future.


> However, reviewing Serhiy's PR reminded me that PyErr_CheckSignals()
> already encapsulates the "Should this thread even be checking for signals
> in the first place?" logic, which means the code change to make the
> itertools iterators inherently interruptible with Ctrl-C is much smaller
> than I thought it would be.
>

​And if it didn't encapsulate that, you would probably have written a
wrapper that does.​ Good thing it's the wrapper that's exposed in the API.



> That approach is also clearly safe from an exception handling point of
> view, since all consumer loops already need to cope with the fact that
> itr.__next__() may raise arbitrary exceptions (including KeyboardInterrupt).
>
>
> So that change alone already offers a notable improvement, and combining it
> with a __length_hint__() implementation that keeps container constructors
> from even starting to iterate would go even further towards making the
> infinite iterators more user friendly.
>
> Similar signal checking changes to the consumer loops would also be
> possible, but I don't think that's an either/or decision: changing the
> iterators means they'll be interruptible for any consumer, while changing
> the consumers would make them interruptible for any iterator, and having
> checks in both the producer & the consumer merely means that you'll be
> checking for signals twice every 65k iterations, rather than once.
>
>
​Indeed it's not strictly an either/or decision, but more about where we
might spend time executing C code. But I'm leaning a bit towards doing it
on the consumer side, because there it's more obvious that ​the code might
take some time to run.

If the consumer ends up iterating over pure-Python objects, there are no
concerns about the overhead. But if it *does* call a C-implemented
__next__, then that's the case where we actually need the whole thing.
Adding the check in both places would double the (small) overhead. And
nested (wrapped) iterators are also a thing.

​––Koos



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 19 October 2017 at 08:34, Greg Ewing  wrote:

> Nick Coghlan wrote:
>
>> since breaking up the current single level loops as nested loops would be
>> a pre-requisite for allowing these APIs to check for signals while they're
>> running while keeping the per-iteration overhead low
>>
>
> Is there really much overhead? Isn't it just checking a flag?
>

It's checking an atomically updated flag, so it forces CPU cache
synchronisation, which means you don't want to be doing it on every
iteration of a low level loop.

However, reviewing Serhiy's PR reminded me that PyErr_CheckSignals()
already encapsulates the "Should this thread even be checking for signals
in the first place?" logic, which means the code change to make the
itertools iterators inherently interruptible with Ctrl-C is much smaller
than I thought it would be. That approach is also clearly safe from an
exception handling point of view, since all consumer loops already need to
cope with the fact that itr.__next__() may raise arbitrary exceptions
(including KeyboardInterrupt).

So that change alone already offers a notable improvement, and combining it
with a __length_hint__() implementation that keeps container constructors
from even starting to iterate would go even further towards making the
infinite iterators more user friendly.

Similar signal checking changes to the consumer loops would also be
possible, but I don't think that's an either/or decision: changing the
iterators means they'll be interruptible for any consumer, while changing
the consumers would make them interruptible for any iterator, and having
checks in both the producer & the consumer merely means that you'll be
checking for signals twice every 65k iterations, rather than once.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Greg Ewing

Nick Coghlan wrote:
since breaking up the current single level loops as nested loops 
would be a pre-requisite for allowing these APIs to check for signals 
while they're running while keeping the per-iteration overhead low


Is there really much overhead? Isn't it just checking a flag?

--
Greg


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 10:30 PM, Serhiy Storchaka 
wrote:

> 18.10.17 22:21, Koos Zevenhoven wrote:
>
>> ​Nice! Though I'd really like a general ​solution that other code can
>> easily adopt, even third-party extension libraries.
>>
>
> What is the more general solution? For interrupting C code you need to
> check signals manually, either in every loop, or in every iterator. It
> seems to me that the number of loops is larger than the number of iterators.
>
>
​Sorry, I missed this email earlier.

Maybe a macro like Py_MAKE_THIS_LOOP_BREAKABLE_FOR_ME_PLEASE that you could
insert wherever you think the code might be spending some time without
calling any Python code. One could use it rather carelessly, at least more
so than refcounts. Something like the macro you wrote, except that it would
take care of the whole thing and not just the counting.

-- Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 10:21 PM, Koos Zevenhoven  wrote:

> On Wed, Oct 18, 2017 at 10:13 PM, Serhiy Storchaka 
> wrote:
>
>> 18.10.17 17:48, Nick Coghlan wrote:
>>
>>> 1. It will make those loops slower, due to the extra overhead of
>>> checking for signals (even the opcode eval loop includes all sorts of
>>> tricks to avoid actually checking for new signals, since doing so is
>>> relatively slow)
>>> 2. It will make those loops harder to maintain, since the high cost of
>>> checking for signals means the existing flat loops will need to be replaced
>>> with nested ones to reduce the per-iteration cost of the more expensive
>>> checks
>>> 3. It means making the signal checking even harder to reason about than
>>> it already is, since even C implemented methods that avoid invoking
>>> arbitrary Python code could now still end up checking for signals
>>>
>>
>> I have implemented signals checking for itertools iterators. [1] The
>> overhead is insignificant because signals are checked only for every
>> 0x1-th item (100-4000 times/sec). The consuming loops are not changed
>> because signals are checked on the producer's side.
>>
>> [1] https://bugs.python.org/issue31815
>>
>>
> ​Nice! Though I'd really like a general ​solution that other code can
> easily adopt, even third-party extension libraries.
>
>
​By the way, now that I actually read the BPO issue​, it looks like the
benchmarks were for 0x1000 (15 bits)? And why is everyone doing powers of
two anyway?

Anyway, I still don't think infinite iterables are the most common case
where this problem occurs. Solving this in the most common consuming loops
would allow breaking out of a lot of long loops regardless of which
iterable type (if any) is being used. So I'm still asking which one should
solve the problem.

​-- Koos​


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Serhiy Storchaka

18.10.17 22:21, Koos Zevenhoven wrote:
​Nice! Though I'd really like a general ​solution that other code can 
easily adopt, even third-party extension libraries.


What is the more general solution? For interrupting C code you need to 
check signals manually, either in every loop, or in every iterator. It 
seems to me that the number of loops is larger than the number of iterators.




Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 10:13 PM, Serhiy Storchaka 
wrote:

> 18.10.17 17:48, Nick Coghlan wrote:
>
>> 1. It will make those loops slower, due to the extra overhead of checking
>> for signals (even the opcode eval loop includes all sorts of tricks to
>> avoid actually checking for new signals, since doing so is relatively slow)
>> 2. It will make those loops harder to maintain, since the high cost of
>> checking for signals means the existing flat loops will need to be replaced
>> with nested ones to reduce the per-iteration cost of the more expensive
>> checks
>> 3. It means making the signal checking even harder to reason about than
>> it already is, since even C implemented methods that avoid invoking
>> arbitrary Python code could now still end up checking for signals
>>
>
> I have implemented signals checking for itertools iterators. [1] The
> overhead is insignificant because signals are checked only for every
> 0x1-th item (100-4000 times/sec). The consuming loops are not changed
> because signals are checked on the producer's side.
>
> [1] https://bugs.python.org/issue31815
>
>
​Nice! Though I'd really like a general ​solution that other code can
easily adopt, even third-party extension libraries.

-- Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Serhiy Storchaka

18.10.17 17:48, Nick Coghlan wrote:
1. It will make those loops slower, due to the extra overhead of 
checking for signals (even the opcode eval loop includes all sorts of 
tricks to avoid actually checking for new signals, since doing so is 
relatively slow)
2. It will make those loops harder to maintain, since the high cost of 
checking for signals means the existing flat loops will need to be 
replaced with nested ones to reduce the per-iteration cost of the more 
expensive checks
3. It means making the signal checking even harder to reason about than 
it already is, since even C implemented methods that avoid invoking 
arbitrary Python code could now still end up checking for signals


I have implemented signals checking for itertools iterators. [1] The 
overhead is insignificant because signals are checked only for every 
0x1-th item (100-4000 times/sec). The consuming loops are not 
changed because signals are checked on the producer's side.


[1] https://bugs.python.org/issue31815



Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 9:24 PM, MRAB  wrote:

>
> The re module increments a counter on each iteration and checks for
> signals when the bottom 12 bits are 0.
>
> The regex module increments a 16-bit counter on each iteration and checks
> for signals when it wraps around to 0.
>

Then I'd say that's a great solution, except that `regex` probably
exaggerates the overhead of checking for signals, and the `re` module
for some strange reason wants to make an additional bitwise AND.

-- Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread MRAB

On 2017-10-18 15:48, Nick Coghlan wrote:
On 18 October 2017 at 22:36, Koos Zevenhoven wrote:


On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan wrote:

That one can only be fixed in count() - list already checks
operator.length_hint(), so implementing
itertools.count.__length_hint__() to always raise an exception
would be enough to handle the container constructor case.


While that may be a convenient hack to solve some of the cases,
maybe it's possible for list(..) etc. to give Ctrl-C a chance every
now and then? (Without a noticeable performance penalty, that is.)
That would also help with *finite* C-implemented iterables that are
just slow to turn into a list.

If I'm not mistaken, we're talking about C-implemented functions
that iterate over C-implemented iterators. It's not at all obvious
to me that it's the iterator that should handle Ctrl-C.


It isn't, it's the loop's responsibility. The problem is that one of the 
core design assumptions in the CPython interpreter implementation is 
that signals from the operating system get handled by the opcode eval 
loop in the main thread, and Ctrl-C is one of those signals.


This is why "for x in itertools.count(): pass" can be interrupted, while 
"sum(itertools.count())" can't: in the latter case, the opcode eval loop 
isn't running, as we're inside a tight loop inside the sum() implementation.


It's easy to say "Well those loops should all be checking for signals 
then", but I expect folks wouldn't actually like the consequences of 
doing something about it, as:


1. It will make those loops slower, due to the extra overhead of 
checking for signals (even the opcode eval loop includes all sorts of 
tricks to avoid actually checking for new signals, since doing so is 
relatively slow)
2. It will make those loops harder to maintain, since the high cost of 
checking for signals means the existing flat loops will need to be 
replaced with nested ones to reduce the per-iteration cost of the more 
expensive checks


The re module increments a counter on each iteration and checks for 
signals when the bottom 12 bits are 0.


The regex module increments a 16-bit counter on each iteration and 
checks for signals when it wraps around to 0.


3. It means making the signal checking even harder to reason about than 
it already is, since even C implemented methods that avoid invoking 
arbitrary Python code could now still end up checking for signals


It's far from being clear to me that making such a change would actually 
be a net improvement, especially when there's an opportunity to mitigate 
the problem by having known-infinite iterators report themselves as such.
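The two counter idioms described above can be sketched as follows (stub check function; names are illustrative, not the modules' actual code):

```c
#include <stdint.h>

static long checks_done = 0;           /* stub: counts signal checks performed */

static void maybe_check_signals(void)  /* stands in for the real signal check */
{
    checks_done++;
}

/* re-style: bump a counter and check when the bottom 12 bits are all zero,
 * i.e. every 4096th iteration. */
void re_style_iteration(unsigned long *counter)
{
    if ((++*counter & 0xFFF) == 0)
        maybe_check_signals();
}

/* regex-style: a 16-bit counter that triggers when it wraps back to 0,
 * i.e. every 65536th iteration. */
void regex_style_iteration(uint16_t *counter16)
{
    if (++*counter16 == 0)
        maybe_check_signals();
}
```

Either way the per-iteration cost is an increment and a compare on a local counter, with no atomic operation on the fast path.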





Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 6:56 PM, Paul Moore  wrote:

> OK, looks like I've lost track of what this thread is about then.
> Sorry for the noise.
> Paul
>
>
​No worries. I'm not sure I can tell what this thread is about either.
Different people seem to have different ideas about that.

My most recent point was that __contains__ already has to allow Python code
to run on each iteration, so it is not the kind of code that Nick was
referring to, and which I'm not convinced even exists.

––Koos



> On 18 October 2017 at 16:40, Koos Zevenhoven  wrote:
> > On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore  wrote:
> >>
> >> On 18 October 2017 at 16:27, Koos Zevenhoven  wrote:
> >> > So you're talking about code that would make a C-implemented Python
> >> > iterable
> >> > of strictly C-implemented Python objects and then pass this to
> something
> >> > C-implemented like list(..) or sum(..), while expecting no Python code
> >> > to be
> >> > run or signals to be checked anywhere while doing it. I'm not really
> >> > convinced that such code exists. But if such code does exist, it
> sounds
> >> > like
> >> > the code is heavily dependent on implementation details.
> >>
> >> Well, the OP specifically noted that he had recently encountered
> >> precisely that situation:
> >>
> >> """
> >> I recently came across a bug where checking negative membership
> >> (__contains__ returns False) of an infinite iterator will freeze the
> >> program.
> >> """
> >>
> >
> > No, __contains__ does not require that no Python code runs, because Python
> > code *can* run, as Serhiy in fact already demonstrated for another
> purpose:
> >
> > On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka 
> > wrote:
> >>
> >> 18.10.17 13:22, Nick Coghlan wrote:
> >>>
> >>> 2. These particular cases can be addressed locally using existing
> >>> protocols, so the chances of negative side effects are low
> >>
> >>
> >> Only the particular case `count() in count()` can be addressed without
> >> breaking the following examples:
> >>
> >> >>> class C:
> >> ... def __init__(self, count):
> >> ... self.count = count
> >> ... def __eq__(self, other):
> >> ... print(self.count, other)
> >> ... if not self.count:
> >> ... return True
> >> ... self.count -= 1
> >> ... return False
> >> ...
> >> >>> import itertools
> >> >>> C(5) in itertools.count()
> >> 5 0
> >> 4 1
> >> 3 2
> >> 2 3
> >> 1 4
> >> 0 5
> >> True
> >
> >
> >
> > Clearly, Python code *does* run from within
> itertools.count.__contains__(..)
> >
> >
> > ––Koos
> >
> >
> > --
> > + Koos Zevenhoven + http://twitter.com/k7hoven +
>



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Paul Moore
OK, looks like I've lost track of what this thread is about then.
Sorry for the noise.
Paul

On 18 October 2017 at 16:40, Koos Zevenhoven  wrote:
> On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore  wrote:
>>
>> On 18 October 2017 at 16:27, Koos Zevenhoven  wrote:
>> > So you're talking about code that would make a C-implemented Python
>> > iterable
>> > of strictly C-implemented Python objects and then pass this to something
>> > C-implemented like list(..) or sum(..), while expecting no Python code
>> > to be
>> > run or signals to be checked anywhere while doing it. I'm not really
>> > convinced that such code exists. But if such code does exist, it sounds
>> > like
>> > the code is heavily dependent on implementation details.
>>
>> Well, the OP specifically noted that he had recently encountered
>> precisely that situation:
>>
>> """
>> I recently came across a bug where checking negative membership
>> (__contains__ returns False) of an infinite iterator will freeze the
>> program.
>> """
>>
>
> No, __contains__ does not require that no Python code runs, because Python
> code *can* run, as Serhiy in fact already demonstrated for another purpose:
>
> On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka 
> wrote:
>>
>> 18.10.17 13:22, Nick Coghlan wrote:
>>>
>>> 2. These particular cases can be addressed locally using existing
>>> protocols, so the chances of negative side effects are low
>>
>>
>> Only the particular case `count() in count()` can be addressed without
>> breaking the following examples:
>>
>> >>> class C:
>> ... def __init__(self, count):
>> ... self.count = count
>> ... def __eq__(self, other):
>> ... print(self.count, other)
>> ... if not self.count:
>> ... return True
>> ... self.count -= 1
>> ... return False
>> ...
>> >>> import itertools
>> >>> C(5) in itertools.count()
>> 5 0
>> 4 1
>> 3 2
>> 2 3
>> 1 4
>> 0 5
>> True
>
>
>
> Clearly, Python code *does* run from within itertools.count.__contains__(..)
>
>
> ––Koos
>
>
> --
> + Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore  wrote:

> On 18 October 2017 at 16:27, Koos Zevenhoven  wrote:
> > So you're talking about code that would make a C-implemented Python
> iterable
> > of strictly C-implemented Python objects and then pass this to something
> > C-implemented like list(..) or sum(..), while expecting no Python code
> to be
> > run or signals to be checked anywhere while doing it. I'm not really
> > convinced that such code exists. But if such code does exist, it sounds
> like
> > the code is heavily dependent on implementation details.
>
> Well, the OP specifically noted that he had recently encountered
> precisely that situation:
>
> """
> I recently came across a bug where checking negative membership
> (__contains__ returns False) of an infinite iterator will freeze the
> program.
> """
>
>
No, __contains__ does not expect no Python code to be run, because Python
code *can* run, as Serhiy in fact already demonstrated for another purpose:

On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka 
 wrote:

> 18.10.17 13:22, Nick Coghlan wrote:
>
>> 2. These particular cases can be addressed locally using existing
>> protocols, so the chances of negative side effects are low
>>
>
> Only the particular case `count() in count()` can be addressed without
> breaking the following examples:
>
> >>> class C:
> ... def __init__(self, count):
> ... self.count = count
> ... def __eq__(self, other):
> ... print(self.count, other)
> ... if not self.count:
> ... return True
> ... self.count -= 1
> ... return False
> ...
> >>> import itertools
> >>> C(5) in itertools.count()
> 5 0
> 4 1
> 3 2
> 2 3
> 1 4
> 0 5
> True



Clearly, Python code *does* run from within itertools.count.__contains__(..)
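
For anyone who wants to see why, the fallback membership test can be sketched
in pure Python. This is only a rough model of what CPython does when an
iterable has no __contains__ shortcut (the real logic lives in C), with
Serhiy's C class reused as the probe:

```python
import itertools

def contains_fallback(needle, iterable):
    # Rough pure-Python model of the default membership test:
    # iterate and compare. Each comparison with a user-defined
    # __eq__ re-enters the eval loop, so Python code runs there.
    for item in iterable:
        if item is needle or needle == item:
            return True
    return False

class C:
    def __init__(self, count):
        self.count = count
    def __eq__(self, other):
        if not self.count:
            return True
        self.count -= 1
        return False

print(contains_fallback(C(5), itertools.count()))  # -> True, after 6 comparisons
```

The `item is needle` shortcut mirrors the identity check CPython applies
before falling back to `__eq__`.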


––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Paul Moore
On 18 October 2017 at 16:27, Koos Zevenhoven  wrote:
> So you're talking about code that would make a C-implemented Python iterable
> of strictly C-implemented Python objects and then pass this to something
> C-implemented like list(..) or sum(..), while expecting no Python code to be
> run or signals to be checked anywhere while doing it. I'm not really
> convinced that such code exists. But if such code does exist, it sounds like
> the code is heavily dependent on implementation details.

Well, the OP specifically noted that he had recently encountered
precisely that situation:

"""
I recently came across a bug where checking negative membership
(__contains__ returns False) of an infinite iterator will freeze the
program.
"""

Paul


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 5:48 PM, Nick Coghlan  wrote:

> On 18 October 2017 at 22:36, Koos Zevenhoven  wrote:
>
>> On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan  wrote:
>>
>>> That one can only be fixed in count() - list already checks
>>> operator.length_hint(), so implementing itertools.count.__length_hint__()
>>> to always raise an exception would be enough to handle the container
>>> constructor case.
>>>
>>
>> While that may be a convenient hack to solve some of the cases, maybe
>> it's possible for list(..) etc. to give Ctrl-C a chance every now and then?
>> (Without a noticeable performance penalty, that is.) That would also help
>> with *finite* C-implemented iterables that are just slow to turn into a
>> list.
>>
>> If I'm not mistaken, we're talking about C-implemented functions that
>> iterate over C-implemented iterators. It's not at all obvious to me that
>> it's the iterator that should handle Ctrl-C.
>>
>
> It isn't, it's the loop's responsibility. The problem is that one of the
> core design assumptions in the CPython interpreter implementation is that
> signals from the operating system get handled by the opcode eval loop in
> the main thread, and Ctrl-C is one of those signals.
>
> This is why "for x in itertools.cycle(): pass" can be interrupted, while
> "sum(itertools.cycle())" can't: in the latter case, the opcode eval loop
> isn't running, as we're inside a tight loop inside the sum() implementation.
>
> It's easy to say "Well those loops should all be checking for signals
> then", but I expect folks wouldn't actually like the consequences of doing
> something about it, as:
>
> 1. It will make those loops slower, due to the extra overhead of checking
> for signals (even the opcode eval loop includes all sorts of tricks to
> avoid actually checking for new signals, since doing so is relatively slow)
> 2. It will make those loops harder to maintain, since the high cost of
> checking for signals means the existing flat loops will need to be replaced
> with nested ones to reduce the per-iteration cost of the more expensive
> checks
>

Combining points 1 and 2, I don't believe nesting loops is strictly a
requirement.
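
To illustrate, here is a hedged Python-level sketch of the countdown trick: a
single flat loop where the expensive check only runs once every
CHECK_INTERVAL iterations. All names here (interruptible_sum, check_signals,
the interval value) are invented for illustration; the real change would be C
code calling PyErr_CheckSignals():

```python
CHECK_INTERVAL = 4096  # hypothetical tuning knob

def interruptible_sum(iterable, check_signals=lambda: None):
    # Flat loop: the only extra per-iteration cost is one decrement
    # and one comparison. The expensive check (a stand-in for
    # PyErr_CheckSignals()) runs every CHECK_INTERVAL iterations.
    total = 0
    countdown = CHECK_INTERVAL
    for item in iterable:
        total += item
        countdown -= 1
        if countdown == 0:
            countdown = CHECK_INTERVAL
            check_signals()
    return total
```

If check_signals() raises KeyboardInterrupt when a flag is set, the loop
becomes interruptible at most CHECK_INTERVAL iterations late, without any
nested-loop restructuring.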



> 3. It means making the signal checking even harder to reason about than it
> already is, since even C implemented methods that avoid invoking arbitrary
> Python code could now still end up checking for signals
>

So you're talking about code that would make a C-implemented Python
iterable of strictly C-implemented Python objects and then pass this to
something C-implemented like list(..) or sum(..), while expecting no Python
code to be run or signals to be checked anywhere while doing it. I'm not
really convinced that such code exists. But if such code does exist, it
sounds like the code is heavily dependent on implementation details.

>
> It's far from being clear to me that making such a change would actually
> be a net improvement, especially when there's an opportunity to mitigate
> the problem by having known-infinite iterators report themselves as such.
>
>

I'm not against that, per se. I just don't think that solves the quite
typical case of *finite* but very tedious or memory-consuming loops that
one might want to break out of. And raising an exception from
.__length_hint__() might also break some obscure, but completely valid,
operations on *infinite* iterables.


––Koos



> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Stephan Houben
Hi all,

FWIW, I just tried the list(count()) experiment on my phone (Termux Python
interpreter under Android).

Python 3.6.2 (default, Sep 16 2017, 23:55:07)
[GCC 4.2.1 Compatible Android Clang 5.0.300080 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import itertools
>>> list(itertools.count())
Killed

Interestingly even the Termux app stays alive and otherwise the phone
remains responsive and doesn't get hot. I am now sending this mail from
that very phone.

So this issue is not an issue on the world's most popular OS 

Stephan







On 18 Oct 2017 08:46, "Brendan Barnwell"  wrote:

> On 2017-10-17 07:26, Serhiy Storchaka wrote:
>
>> 17.10.17 17:06, Nick Coghlan пише:
>>
>>> >Keep in mind we're not talking about a regular loop you can break out of
>>> >with Ctrl-C here - we're talking about a tight loop inside the
>>> >interpreter internals that leads to having to kill the whole host
>>> >process just to get out of it.
>>>
>> And this is the root of the issue. Just let more tight loops be
>> interruptible with Ctrl-C, and this will fix the more general issue.
>>
>
> I was just thinking the same thing.  I think in general it's
> always bad for code to be uninterruptible with Ctrl-C.  If these infinite
> iterators were fixed so they could be interrupted, this containment problem
> would be much less painful.
>
> --
> Brendan Barnwell
> "Do not follow where the path may lead.  Go, instead, where there is no
> path, and leave a trail."
>--author unknown


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 22:53, Serhiy Storchaka  wrote:

> 18.10.17 13:22, Nick Coghlan wrote:
>
>> 2. These particular cases can be addressed locally using existing
>> protocols, so the chances of negative side effects are low
>>
>
> Only the particular case `count() in count()` can be addressed without
> breaking the following examples:
>

You're right, the potential impact on objects with weird __eq__
implementations would mean that even the `__contains__` approach would
require deprecation warnings for the APIs that allow short-circuiting. So
we can discard it in favour of exploring the "Can we make a beneficial
improvement via __length_hint__?" question.


> 3. The total amount of code involved is likely to be small (a dozen or so
>> lines of C, a similar number of lines of Python in the tests) in
>> well-isolated protocol functions, so the chances of introducing future
>> maintainability problems are low
>>
>
> It depends on what you want to achieve. Just prevent an infinity loop in
> `count() in count()`, or optimize `int in count()`, or optimize more
> special cases.
>

My interest lies specifically in reducing the number of innocent looking
ways we offer to provoke uninterruptible infinite loops or unbounded memory
consumption.


> 4. We have a potential contributor who is presumably offering to do the
>> work (if that's not the case, then the question is moot anyway until a
>> sufficiently interested volunteer turns up)
>>
>
> Maintaining is more than writing an initial code.
>

Aye, that's why the preceding point was to ask how large a change we'd be
offering to maintain indefinitely, and how well isolated that change would
be.


> If we were to do that, then we *could* make the solution to the reported
>> problem more general by having all builtin and standard library operations
>> that expect to be working with finite iterators (the containment testing
>> fallback, min, max, sum, any, all, functools.reduce, etc) check for a
>> length hint, even if they aren't actually pre-allocating any memory.
>>
>
> This will add a significant overhead for relatively short (hundreds of
> items) sequences. I already did benchmarking for similar cases in the past.


I did wonder about that, so I guess the baseline zero-risk enhancement idea
would be to only prevent the infinite loop in cases that already request a
length hint as a memory pre-allocation check. That would reduce the
likelihood of the most painful case (grinding the machine to a halt),
without worrying about the less painful cases (which will break the current
process, but the rest of the machine will be fine).

Given that, adding TypeError raising __length_hint__ implementations to
itertools.count(), itertools.cycle(), and itertools.repeat() would make
sense as an independent RFE, without worrying about any APIs that don't
already check for a length hint.
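
One wrinkle worth noting for such an RFE: in current CPython,
operator.length_hint() deliberately treats a TypeError raised by
__length_hint__ as "no hint available" and falls back to the default, while
other exception types propagate, including out of the list() constructor. A
sketch with hypothetical classes (Infinite and InfiniteTE are invented for
illustration, not real itertools changes):

```python
import operator

class Infinite:
    # Hypothetical infinite iterator that advertises itself by
    # raising from __length_hint__.
    def __iter__(self):
        return self
    def __next__(self):
        return 0
    def __length_hint__(self):
        raise ValueError("refusing to pre-size an infinite iterator")

class InfiniteTE(Infinite):
    def __length_hint__(self):
        raise TypeError("no length hint")

# A non-TypeError exception propagates out of list() before any
# items are consumed:
try:
    list(Infinite())
except ValueError as exc:
    print("list() refused:", exc)

# ...whereas TypeError is swallowed and the default comes back, so a
# TypeError-raising __length_hint__ would not, by itself, stop the
# container constructors:
print(operator.length_hint(InfiniteTE(), 8))  # -> 8
```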

A more intrusive option would then be to look at breaking the other tight
iteration loops into two phases, such that checking for potentially
infinite iterators could be delayed until after the first thousand
iterations or so. That option is potentially worth exploring anyway, since
breaking up the current single level loops as nested loops would be a
pre-requisite for allowing these APIs to check for signals while they're
running while keeping the per-iteration overhead low (only one
pre-requisite of many though, and probably one of the easier ones).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Serhiy Storchaka

18.10.17 13:22, Nick Coghlan wrote:
2. These particular cases can be addressed locally using existing
protocols, so the chances of negative side effects are low


Only the particular case `count() in count()` can be addressed without 
breaking the following examples:


>>> class C:
... def __init__(self, count):
... self.count = count
... def __eq__(self, other):
... print(self.count, other)
... if not self.count:
... return True
... self.count -= 1
... return False
...
>>> import itertools
>>> C(5) in itertools.count()
5 0
4 1
3 2
2 3
1 4
0 5
True
>>> it = itertools.cycle([C(5)]); it in it
5 
4 
3 
2 
1 
0 
True
>>> it = itertools.repeat(C(5)); it in it
5 repeat(<__main__.C object at 0x7f65512c5dc0>)
4 repeat(<__main__.C object at 0x7f65512c5dc0>)
3 repeat(<__main__.C object at 0x7f65512c5dc0>)
2 repeat(<__main__.C object at 0x7f65512c5dc0>)
1 repeat(<__main__.C object at 0x7f65512c5dc0>)
0 repeat(<__main__.C object at 0x7f65512c5dc0>)
True

3. The total amount of code involved is likely to be small (a dozen or 
so lines of C, a similar number of lines of Python in the tests) in 
well-isolated protocol functions, so the chances of introducing future 
maintainability problems are low


It depends on what you want to achieve. Just prevent an infinity loop in 
`count() in count()`, or optimize `int in count()`, or optimize more 
special cases.
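
For the `int in count()` special case, the arithmetic shortcut itself is
simple. A hypothetical helper (count_contains is an invented name), valid
only for exact integer arithmetic with step != 0, and only when skipping the
item-by-item __eq__ calls is acceptable, which is precisely the caveat
raised above:

```python
def count_contains(x, start=0, step=1):
    # x appears in count(start, step) iff x == start + k*step for
    # some integer k >= 0. Assumes step != 0 and exact (integer)
    # arithmetic; short-circuits the infinite iteration entirely.
    q, r = divmod(x - start, step)
    return r == 0 and q >= 0

print(count_contains(10, start=0, step=2))  # -> True
print(count_contains(7, start=0, step=2))   # -> False
print(count_contains(4, start=0, step=-2))  # -> False (count goes 0, -2, -4, ...)
```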


4. We have a potential contributor who is presumably offering to do the 
work (if that's not the case, then the question is moot anyway until a 
sufficiently interested volunteer turns up)


Maintaining is more than writing an initial code.

If we were to do that, then we *could* make the solution to the reported 
problem more general by having all builtin and standard library 
operations that expect to be working with finite iterators (the 
containment testing fallback, min, max, sum, any, all, functools.reduce, 
etc) check for a length hint, even if they aren't actually 
pre-allocating any memory.


This will add a significant overhead for relatively short (hundreds of 
items) sequences. I already did benchmarking for similar cases in the past.




Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan  wrote:

> On 18 October 2017 at 20:39, Koos Zevenhoven  wrote:
>
>> On Oct 18, 2017 13:29, "Nick Coghlan"  wrote:
>>
>> On 18 October 2017 at 19:56, Koos Zevenhoven  wrote:
>>
>>> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
>>> infinite iterators. At least itertools doesn't seem to have it:
>>>
>>> >>> import itertools
>>> >>> for i in itertools.count():
>>> ... pass
>>> ...
>>>
>>
>> That's interrupting the for loop, not the iterator. This is the test case
>> you want for the problem Jason raised:
>>
>> >>> "a" in itertools.count()
>>
>> Be prepared to suspend and terminate the affected process, because Ctrl-C
>> isn't going to help :)
>>
>>
>> I'm writing from my phone now, cause I was dumb enough to try
>> list(count())
>>
>
> Yeah, that's pretty much the worst case example, since the machine starts
> thrashing memory long before it actually gives up and starts denying the
> allocation requests :(
>
>
>> But should it be fixed in list or in count?
>>
>
> That one can only be fixed in count() - list already checks
> operator.length_hint(), so implementing itertools.count.__length_hint__()
> to always raise an exception would be enough to handle the container
> constructor case.
>
>
While that may be a convenient hack to solve some of the cases, maybe it's
possible for list(..) etc. to give Ctrl-C a chance every now and then?
(Without a noticeable performance penalty, that is.) That would also help
with *finite* C-implemented iterables that are just slow to turn into a
list.

If I'm not mistaken, we're talking about C-implemented functions that
iterate over C-implemented iterators. It's not at all obvious to me that
it's the iterator that should handle Ctrl-C.
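
As a Python-level workaround in the meantime, one can consume an iterator in
chunks, so that control returns to Python bytecode between chunks, where
Ctrl-C is delivered. A hedged sketch, not a proposed CPython change;
careful_list and its max_items guard are invented for illustration:

```python
from itertools import count, islice

def careful_list(iterable, chunk=10000, max_items=None):
    # Build the result chunk by chunk. Each islice() call consumes
    # at most `chunk` items inside C, and control returns to Python
    # bytecode between chunks, where KeyboardInterrupt can land.
    it = iter(iterable)
    out = []
    while True:
        block = list(islice(it, chunk))
        if not block:
            return out
        out.extend(block)
        if max_items is not None and len(out) >= max_items:
            raise OverflowError("more than %d items produced" % max_items)

print(len(careful_list(range(25000))))  # -> 25000
# careful_list(count(), max_items=10**6)  # raises OverflowError instead of thrashing
```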

––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 20:39, Koos Zevenhoven  wrote:

> On Oct 18, 2017 13:29, "Nick Coghlan"  wrote:
>
> On 18 October 2017 at 19:56, Koos Zevenhoven  wrote:
>
>> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
>> infinite iterators. At least itertools doesn't seem to have it:
>>
>> >>> import itertools
>> >>> for i in itertools.count():
>> ... pass
>> ...
>>
>
> That's interrupting the for loop, not the iterator. This is the test case
> you want for the problem Jason raised:
>
> >>> "a" in itertools.count()
>
> Be prepared to suspend and terminate the affected process, because Ctrl-C
> isn't going to help :)
>
>
> I'm writing from my phone now, cause I was dumb enough to try list(count())
>

Yeah, that's pretty much the worst case example, since the machine starts
thrashing memory long before it actually gives up and starts denying the
allocation requests :(


> But should it be fixed in list or in count?
>

That one can only be fixed in count() - list already checks
operator.length_hint(), so implementing itertools.count.__length_hint__()
to always raise an exception would be enough to handle the container
constructor case.

The open question would then be the cases that don't pre-allocate memory,
but still always attempt to consume the entire iterator:

min(itr)
max(itr)
sum(itr)
functools.reduce(op, itr)
"".join(itr)

And those which *may* attempt to consume the entire iterator, but won't
necessarily do so:

x in itr
any(itr)
all(itr)

The items in the first category could likely be updated to check
length_hint and propagate any errors immediately, since they don't provide
any short circuiting behaviour - feeding them an infinite iterator is a
guaranteed uninterruptible infinite loop, so checking for a length hint
won't break any currently working code (operator.length_hint defaults to
returning zero if a type doesn't implement __length_hint__).
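
That default is easy to observe from Python (behaviour as in current CPython,
where count() implements neither __len__ nor __length_hint__):

```python
import itertools
import operator

# A real sequence reports its length...
print(operator.length_hint([1, 2, 3]))             # -> 3
# ...while count() carries no length information at all, so callers
# simply get the supplied default back:
print(operator.length_hint(itertools.count()))     # -> 0
print(operator.length_hint(itertools.count(), 8))  # -> 8
```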

I'm tempted to say the same for the APIs in the latter category as well,
but their short-circuiting semantics mean those can technically have
well-defined behaviour, even when given an infinite iterator:

>>> any(itertools.count())
True
>>> all(itertools.count())
False
>>> 1 in itertools.count()
True

It's only the "never short-circuits" branch that is ill-defined for
non-terminating input. So for these, the safer path would be to emit
DeprecationWarning if length_hint fails in 3.7, and then pass the exception
through in 3.8+.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Koos Zevenhoven
On Oct 18, 2017 13:29, "Nick Coghlan"  wrote:

On 18 October 2017 at 19:56, Koos Zevenhoven  wrote:

> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
> infinite iterators. At least itertools doesn't seem to have it:
>
> >>> import itertools
> >>> for i in itertools.count():
> ... pass
> ...
>

That's interrupting the for loop, not the iterator. This is the test case
you want for the problem Jason raised:

>>> "a" in itertools.count()

Be prepared to suspend and terminate the affected process, because Ctrl-C
isn't going to help :)


I'm writing from my phone now, cause I was dumb enough to try list(count())

But should it be fixed in list or in count?

-- Koos

PS. Nick, sorry about the duplicate email.



Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 19:56, Koos Zevenhoven  wrote:

> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
> infinite iterators. At least itertools doesn't seem to have it:
>
> >>> import itertools
> >>> for i in itertools.count():
> ... pass
> ...
>

That's interrupting the for loop, not the iterator. This is the test case
you want for the problem Jason raised:

>>> "a" in itertools.count()

Be prepared to suspend and terminate the affected process, because Ctrl-C
isn't going to help :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Paul Moore
On 18 October 2017 at 10:56, Koos Zevenhoven  wrote:
> I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with
> infinite iterators. At least itertools doesn't seem to have it:
>
> >>> import itertools
> >>> for i in itertools.count():
> ... pass
> ...
> ^CTraceback (most recent call last):
>   File "", line 1, in 
> KeyboardInterrupt

That's not the issue here, as the CPython interpreter implements this
with multiple opcodes, and checks between opcodes for Ctrl-C. The
demonstration is:

>>> import itertools
>>> 'x' in itertools.count()

... only way to break out is to kill the process.
Paul


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Nick Coghlan
On 18 October 2017 at 03:39, Koos Zevenhoven  wrote:

> On Tue, Oct 17, 2017 at 5:26 PM, Serhiy Storchaka 
> wrote:
>
>> 17.10.17 17:06, Nick Coghlan wrote:
>>
>>> Keep in mind we're not talking about a regular loop you can break out of
>>> with Ctrl-C here - we're talking about a tight loop inside the interpreter
>>> internals that leads to having to kill the whole host process just to get
>>> out of it.
>>>
>>
>> And this is the root of the issue. Just let more tight loops be
>> interruptible with Ctrl-C, and this will fix the more general issue.
>>
>>
> Not being able to interrupt something with Ctrl-C in the repl or with the
> interrupt command in Jupyter notebooks is definitely a thing I sometimes
> encounter. A pity I don't remember when it happens, because I usually
> forget it very soon after I've restarted the kernel and continued working.
> But my guess is it's usually not because of an infinite iterator.
>

Fixing the general case is hard, because the assumption that signals are
only checked between interpreter opcodes is a pervasive one throughout the
interpreter internals.  We certainly *could* redefine affected C APIs as
potentially raising KeyboardInterrupt (adjusting the signal management
infrastructure accordingly), and if someone actually follows through and
implements that some day, then the argument could then be made that given
such change, it might be reasonable to drop any a priori guards that we
have put in place for particular *detectable* uninterruptible infinite
loops.

However, that's not the design question being discussed in this thread. The
design question here is "We have 3 known uninterruptible infinite loops
that are easy to detect and prevent. Should we detect and prevent them?".
"We shouldn't allow anyone to do this easy thing, because it would be
preferable for someone to instead do this hard and complicated thing that
nobody is offering to do" isn't a valid design argument in that situation.

And I have a four step check for that which prompts me to say "Yes, we
should detect and prevent them":

1. Uninterruptible loops are bad, so having fewer of them is better
2. These particular cases can be addressed locally using existing
protocols, so the chances of negative side effects are low
3. The total amount of code involved is likely to be small (a dozen or so
lines of C, a similar number of lines of Python in the tests) in
well-isolated protocol functions, so the chances of introducing future
maintainability problems are low
4. We have a potential contributor who is presumably offering to do the
work (if that's not the case, then the question is moot anyway until a
sufficiently interested volunteer turns up)

As an alternative implementation approach, the case could also be made that
these iterators should be raising TypeError in __length_hint__, as that
protocol method is explicitly designed to be used for finite container
pre-allocation. That way things like "list(itertools.count())" would fail
immediately (similar to the way "list(range(10**100))" already does) rather
than attempting to consume all available memory before (hopefully) finally
failing with MemoryError.
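
The range() precedent is easy to demonstrate: because range objects have a
real __len__, the failure happens before any allocation is attempted
(observed CPython behaviour):

```python
# range() knows its (enormous) length up front, so list() fails
# immediately with OverflowError instead of consuming memory:
try:
    list(range(10**100))
except OverflowError as exc:
    print("rejected up front:", exc)
```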

If we were to do that, then we *could* make the solution to the reported
problem more general by having all builtin and standard library operations
that expect to be working with finite iterators (the containment testing
fallback, min, max, sum, any, all, functools.reduce, etc) check for a
length hint, even if they aren't actually pre-allocating any memory. Then
the general purpose marker for "infinite iterator" would be "Explicitly
defines __length_hint__ to raise TypeError", and it would prevent a priori
all operations that attempted to fully consume the iterator.

That more general approach would cause some currently "working" code (like
"any(itertools.count())" and "all(itertools.count())", both of which
consume at most 2 items from the iterator) to raise an exception instead,
and hence would require the introduction of a DeprecationWarning in 3.7
(where the affected APIs would start calling length hint, but suppress any
exceptions from it), before allowing the exception to propagate in 3.8+.
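
The "consume at most 2 items" behaviour is easy to confirm by instrumenting
the iterator (instrumented is a throwaway helper written for this check):

```python
import itertools

def instrumented(source):
    # Wrap an iterator, counting how many items are pulled from it.
    pulls = [0]
    def gen():
        for item in source:
            pulls[0] += 1
            yield item
    return gen(), pulls

it, pulls = instrumented(itertools.count())
print(any(it), pulls[0])  # -> True 2  (0 is falsey, 1 is truthy)

it, pulls = instrumented(itertools.count())
print(all(it), pulls[0])  # -> False 1  (0 is falsey)
```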

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-18 Thread Brendan Barnwell

On 2017-10-17 07:26, Serhiy Storchaka wrote:

17.10.17 17:06, Nick Coghlan wrote:

>Keep in mind we're not talking about a regular loop you can break out of
>with Ctrl-C here - we're talking about a tight loop inside the
>interpreter internals that leads to having to kill the whole host
>process just to get out of it.

And this is the root of the issue. Just let more tight loops be
interruptible with Ctrl-C, and this will fix the more general issue.


	I was just thinking the same thing.  I think in general it's always bad 
for code to be uninterruptible with Ctrl-C.  If these infinite iterators 
were fixed so they could be interrupted, this containment problem would 
be much less painful.


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Koos Zevenhoven
On Tue, Oct 17, 2017 at 5:26 PM, Serhiy Storchaka 
wrote:

> 17.10.17 17:06, Nick Coghlan wrote:
>
>> Keep in mind we're not talking about a regular loop you can break out of
>> with Ctrl-C here - we're talking about a tight loop inside the interpreter
>> internals that leads to having to kill the whole host process just to get
>> out of it.
>>
>
> And this is the root of the issue. Just let more tight loops be
> interruptible with Ctrl-C, and this will fix the more general issue.
>
>
Not being able to interrupt something with Ctrl-C in the repl or with the
interrupt command in Jupyter notebooks is definitely a thing I sometimes
encounter. A pity I don't remember when it happens, because I usually
forget it very soon after I've restarted the kernel and continued working.
But my guess is it's usually not because of an infinite iterator.

Regarding what the OP might have been after, and just for some wild
brainstorming based on true stories: In some sense, x in y should always
have an answer, even if it may be expensive to compute. Currently, it's
possible to implement "lazy truth values" which compute the bool value
lazily when .__bool__() is called. Until you call bool(..) on it, it would
just be Maybe, and then after the call, you'd actually have True or False.
In many cases it can even be enough to know if something is Maybe true.
Also, if you do something like any(*truth_values), then you could skip the
Maybe ones on the first pass, because if you find one that's plain True,
you already have the answer.

Regarding `x in y`, where y is an infinite iterable without well defined
contents, that would return an instance of MaybeType, but .__bool__() would
raise an exception.
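
As a minimal sketch of that brainstorm (Maybe and lazy_contains are invented
names, and real infinite-iterable detection is hand-waved here):

```python
class Maybe:
    # A lazy truth value: nothing is computed until someone calls
    # bool() on it; the thunk may also refuse by raising.
    def __init__(self, thunk):
        self._thunk = thunk
    def __bool__(self):
        return self._thunk()

def lazy_contains(x, iterable):
    # For well-defined input this is just a deferred membership
    # test; for an ill-defined infinite iterable, the thunk could
    # raise instead of looping forever.
    return Maybe(lambda: any(item == x for item in iterable))

m = lazy_contains(3, range(10))
print(bool(m))  # -> True, computed only at this point
```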

––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Serhiy Storchaka

17.10.17 17:06, Nick Coghlan wrote:
Keep in mind we're not talking about a regular loop you can break out of 
with Ctrl-C here - we're talking about a tight loop inside the 
interpreter internals that leads to having to kill the whole host 
process just to get out of it.


And this is the root of the issue. Just let more tight loops be 
interruptible with Ctrl-C, and this will fix the more general issue.




Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Nick Coghlan
On 17 October 2017 at 23:17, Koos Zevenhoven  wrote:

> On Tue, Oct 17, 2017 at 2:46 PM, Serhiy Storchaka 
> wrote:
>
>> 17.10.17 14:10, Nick Coghlan wrote:
>>
>>> 1. It's pretty easy to write "for x in y in y" when you really meant to
>>> write "for x in y", and if "y" is an infinite iterator, the "y in y" part
>>> will become an unbreakable infinite loop when executed instead of the
>>> breakable one you intended (especially annoying if it means you have to
>>> discard and restart a REPL session due to it, and that's exactly where that
>>> kind of typo is going to be easiest to make)
>>>
>>
>> I think it is better to leave this to linters.
>
>
> ​Just to note that there is currently nothing that would prevent making
> `for x in y in z`​ a syntax error. There is nothing meaningful that it
> could do, really, because y in z can only return True or False (or raise an
> Exception or loop infinitely).
>

That was just an example of one of the ways we can accidentally end up
writing "x in y" at the REPL, where "y" is an infinite iterator, since it's
the kind that's specific to "x in y", whereas other forms (like
accidentally using the wrong variable name) also apply to other iterator
consuming APIs (like the ones Serhiy mentioned).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Nick Coghlan
On 17 October 2017 at 21:46, Serhiy Storchaka  wrote:

> 17.10.17 14:10, Nick Coghlan wrote:
>
>> 1. It's pretty easy to write "for x in y in y" when you really meant to
>> write "for x in y", and if "y" is an infinite iterator, the "y in y" part
>> will become an unbreakable infinite loop when executed instead of the
>> breakable one you intended (especially annoying if it means you have to
>> discard and restart a REPL session due to it, and that's exactly where that
>> kind of typo is going to be easiest to make)
>>
>
> I think it is better to leave this to linters. I never encountered this
> mistake and doubt it is common. In any case the first execution of this
> code will expose the mistake.
>

People don't run linters at the REPL, and it's at the REPL where
accidentally getting an unbreakable infinite loop is most annoying.

Keep in mind we're not talking about a regular loop you can break out of
with Ctrl-C here - we're talking about a tight loop inside the interpreter
internals that leads to having to kill the whole host process just to get
out of it.


> 2. Containment testing already has a dedicated protocol so containers can
>> implement optimised containment tests, which means it's also trivial for an
>> infinite iterator to intercept and explicitly disallow containment checks
>> if it chooses to do so
>>
>
> But this has non-zero maintaining cost. As the one who made many changes
> in itertools.c I don't like the idea of increasing its complexity for
> optimizing a pretty rare case.
>

It's not an optimisation, it's a UX improvement for the interactive prompt.
The maintenance burden should be low, as it's highly unlikely we'd ever
need to change this behaviour again in the future (I do think deprecating
the success case would be more trouble than it would be worth though).


> And note that the comparison can have side effect. You can implement the
> optimization of `x in count()` only for the limited set of builtin types.
> For example `x in range()` is optimized only for exact int and bool. You
> can't guarantee the finite time for cycle() and repeat() either since they
> can emit values of arbitrary types, with arbitrary __eq__.


We're not trying to guarantee finite execution time in general, we're just
making it more likely that either Ctrl-C works, or else you don't get stuck
in an infinite loop in the first place.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Koos Zevenhoven
On Tue, Oct 17, 2017 at 2:46 PM, Serhiy Storchaka 
wrote:

> 17.10.17 14:10, Nick Coghlan wrote:
>
>> 1. It's pretty easy to write "for x in y in y" when you really meant to
>> write "for x in y", and if "y" is an infinite iterator, the "y in y" part
>> will become an unbreakable infinite loop when executed instead of the
>> breakable one you intended (especially annoying if it means you have to
>> discard and restart a REPL session due to it, and that's exactly where that
>> kind of typo is going to be easiest to make)
>>
>
> I think it is better to leave this to linters.


​Just to note that there is currently nothing that would prevent making
`for x in y in z`​ a syntax error. There is nothing meaningful that it
could do, really, because y in z can only return True or False (or raise an
Exception or loop infinitely).

But for an infinite iterable, the right answer may be Maybe ;)

​––Koos​



> I never encountered this mistake and doubt it is common. In any case the
> first execution of this code will expose the mistake.
>
> 2. Containment testing already has a dedicated protocol so containers can
>> implement optimised containment tests, which means it's also trivial for an
>> infinite iterator to intercept and explicitly disallow containment checks
>> if it chooses to do so
>>
>
> But this has non-zero maintaining cost. As the one who made many changes
> in itertools.c I don't like the idea of increasing its complexity for
> optimizing a pretty rare case.
>
> And note that the comparison can have side effect. You can implement the
> optimization of `x in count()` only for the limited set of builtin types.
> For example `x in range()` is optimized only for exact int and bool. You
> can't guarantee the finite time for cycle() and repeat() either since they
> can emit values of arbitrary types, with arbitrary __eq__.
>
>



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Nick Coghlan
On 17 October 2017 at 19:19, Serhiy Storchaka  wrote:

> 17.10.17 09:42, Nick Coghlan wrote:
>
>> On 17 October 2017 at 16:32, Nick Coghlan wrote:
>>
>> So this sounds like a reasonable API UX improvement to me, but you'd
>> need to ensure that you don't inadvertently change the external
>> behaviour of *successful* containment tests.
>>
>>
>> I should also note that there's another option here beyond just returning
>> "False": it would also be reasonable to raise an exception like
>> "RuntimeError('Attempted negative containment check on infinite iterator')".
>>
>
> What about other operations with infinite iterators? min(count()),
> max(count()), all(count(1))? Do you want to implement special cases for all
> of them?


No, as folks expect those to iterate without the opportunity to break out,
and are hence more careful with them when infinite iterators are part of
their application. We also don't have any existing protocols we could use
to intercept them, even if we decided we *did* want to do so.

The distinction I see with "x in y" is:

1. It's pretty easy to write "for x in y in y" when you really meant to
write "for x in y", and if "y" is an infinite iterator, the "y in y" part
will become an unbreakable infinite loop when executed instead of the
breakable one you intended (especially annoying if it means you have to
discard and restart a REPL session due to it, and that's exactly where that
kind of typo is going to be easiest to make)
2. Containment testing already has a dedicated protocol so containers can
implement optimised containment tests, which means it's also trivial for an
infinite iterator to intercept and explicitly disallow containment checks
if it chooses to do so

So the problem is more likely to be encountered due to "x in y" appearing
in both the containment test syntax and as part of the iteration syntax,
*and* it's straightforward to do something about it because the
__contains__ hook already exists. Those two points together are enough for
me to say "Sure, it makes sense to replace the current behaviour with
something more user friendly".

If either of them was false, then I'd say "No, that's not worth the hassle
of changing anything".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Nick Coghlan
On 17 October 2017 at 17:44, Steven D'Aprano  wrote:

> On Tue, Oct 17, 2017 at 04:42:35PM +1000, Nick Coghlan wrote:
>
> > I should also note that there's another option here beyond just returning
> > "False": it would also be reasonable to raise an exception like
> > "RuntimeError('Attempted negative containment check on infinite
> iterator')".
>
> I don't think that works, even if we limit discussion to just
> itertools.count() rather than arbitrary iterators. Obviously we
> cannot wait until the entire infinite iterator is checked (that
> might take longer than is acceptible...) but if you only check a
> *finite* number before giving up, you lead to false-negatives:
>
> # say we only check 100 values before raising
> 0 in itertools.count(1)  # correctly raises
> 101 in itertools.count(1)  # wrongly raises
>

Nobody suggested that, as it's obviously wrong. This discussion is solely
about infinite iterators that have closed form containment tests, either
because they're computed (itertools.count()), or because they're based on
an underlying finite sequence of values (cycle(), repeat()).


> If we do a computed membership test, then why raise at all? We quickly
> know whether or not the value is in the sequence, so there's no error to
> report.
>

Because we should probably always be raising for these particular
containment checks, and it's safe to start doing so in the negative case,
since that's currently a guaranteed infinite loop.

And unlike a "while True" loop (which has many real world applications),
none of these implicit infinite loops allow for any kind of useful work on
each iteration, they just end up in a tight loop deep inside the
interpreter internals, doing absolutely nothing. They won't even check for
signals or release the GIL, so you'll need to ask the operating system to
clobber the entire process to break out of it - Ctrl-C will be ignored.

I'd also have no major objection to deprecating containment tests on these
iterators entirely, but that doesn't offer the same kind of UX benefit that
replacing an infinite loop with an immediate exception does, so I think the
two questions should be considered separately.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Serhiy Storchaka

17.10.17 09:42, Nick Coghlan wrote:
On 17 October 2017 at 16:32, Nick Coghlan wrote:


So this sounds like a reasonable API UX improvement to me, but you'd
need to ensure that you don't inadvertently change the external
behaviour of *successful* containment tests.


I should also note that there's another option here beyond just 
returning "False": it would also be reasonable to raise an exception 
like "RuntimeError('Attempted negative containment check on infinite 
iterator')".


What about other operations with infinite iterators? min(count()), 
max(count()), all(count(1))? Do you want to implement special cases for 
all of them?




Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Steven D'Aprano
On Tue, Oct 17, 2017 at 04:42:35PM +1000, Nick Coghlan wrote:

> I should also note that there's another option here beyond just returning
> "False": it would also be reasonable to raise an exception like
> "RuntimeError('Attempted negative containment check on infinite iterator')".

I don't think that works, even if we limit discussion to just 
itertools.count() rather than arbitrary iterators. Obviously we 
cannot wait until the entire infinite iterator is checked (that 
might take longer than is acceptable...) but if you only check a 
*finite* number before giving up, you lead to false-negatives:

# say we only check 100 values before raising
0 in itertools.count(1)  # correctly raises
101 in itertools.count(1)  # wrongly raises

If we do a computed membership test, then why raise at all? We quickly 
know whether or not the value is in the sequence, so there's no error to 
report.

Personally, I think a better approach is to give the specialist 
itertools iterator types a __contains__ method which unconditionally 
raises a warning regardless of whether the containment test returns 
true, false or doesn't return at all. Perhaps with a flag (module-wide?) 
to disable the warning, or turn it into an error.
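
A minimal sketch of this warning-by-default idea, as a hypothetical `WarningCount` wrapper (integer start and nonzero step assumed; this is not actual itertools behaviour):

```python
import itertools
import warnings


class WarningCount:
    """Hypothetical count-like iterable whose containment test warns,
    as proposed above.  Assumes an integer start and nonzero step."""

    def __init__(self, start=0, step=1):
        self._start = start
        self._step = step
        self._it = itertools.count(start, step)

    def __iter__(self):
        return self._it

    def __contains__(self, x):
        warnings.warn("membership test on an infinite iterator",
                      RuntimeWarning, stacklevel=2)
        # Closed-form answer, so even the warned path terminates.
        q, r = divmod(x - self._start, self._step)
        return r == 0 and q >= 0


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(7 in WarningCount())   # -> True
    print(len(caught))           # -> 1 warning recorded
```

The module-wide flag mentioned above would then just be a `warnings` filter turning `RuntimeWarning` into an error or suppressing it.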

I think a warning (by default) is better than an error because we don't 
really know for sure that it is an error:

n in itertools.count()

is, on the face of it, no more than an error than any other 
potentially infinite loop:

while condition(n):
...

and like any so-called infinite loop, we can never be sure when to give 
up and raise. A thousand loops? A million? A millisecond? An hour? 
Whatever we pick, it will be a case of one-size fits none.

I appreciate that, in practice it is easier to mess up a containment 
test using one of the itertools iterators than to accidentally write an 
infinite loop using while, and my concession to that is to raise a 
warning, and let the programmer decide whether to ignore it or turn it 
into an error.


-- 
Steve


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Nick Coghlan
On 17 October 2017 at 16:32, Nick Coghlan  wrote:

> So this sounds like a reasonable API UX improvement to me, but you'd need
> to ensure that you don't inadvertently change the external behaviour of
> *successful* containment tests.
>

I should also note that there's another option here beyond just returning
"False": it would also be reasonable to raise an exception like
"RuntimeError('Attempted negative containment check on infinite iterator')".

That way currently working code would be semantically unchanged, but code
that otherwise results in an infinite loop would turn into an immediate
exception instead.

The rationale for this approach would be "What you are trying to do doesn't
really make sense, so we're going to complain about it, rather than just
giving you an answer".

The rationale against the error is that "If this item is present, advance
past it, otherwise don't do anything" would be an at least arguably
reasonable operation to request.

Whether "x in itr" is a reasonable *spelling* of that operation would then
be a different question that still favoured the "raise an exception"
approach.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-17 Thread Nick Coghlan
On 16 October 2017 at 11:12, Jason Campbell  wrote:

> I recently came across a bug where checking negative membership
> (__contains__ returns False) of an infinite iterator will freeze the
> program.
>
> It may seem a bit obvious, but you can check membership in range for
> example without iterating over the entire range.
>
As Terry noted, this is due to a semantic difference between range (compact
representation of a tuple of integers) and arbitrary iterators.

However, if we can avoid seemingly innocent code triggering an infinite
loop, that's probably a good thing.

> `int(1e100) in range(int(1e101))` on my machine takes about 1.5us
> `int(1e7) in itertools.count()` on my machine takes about 250ms (and gets
> worse quite quickly).
>
> Any membership test on the infinite iterators that is negative will freeze
> the program as stated earlier, which is odd to me.
>
The other relevant behavioural difference is that even a successful
membership test will consume the iterator up to that point.
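
That consuming behaviour is easy to demonstrate:

```python
import itertools

it = itertools.count()   # yields 0, 1, 2, ...
print(3 in it)           # -> True, after consuming 0, 1, 2, 3
print(next(it))          # -> 4: the iterator has advanced past the match
```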

> itertools.count can use the same math as range
>
+1, with the caveat that a successful containment test should still advance
the iterator to the point immediately after the requested element, just as
it does today (if people don't want that behaviour, they should be defining
a suitable finite range instead).
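
That caveat could be sketched as a hypothetical wrapper that answers containment arithmetically but still advances past a found element (integer start and nonzero integer step assumed; not the real itertools.count):

```python
class AdvancingCount:
    """Hypothetical count-like iterator: O(1) containment math, but a
    successful test still leaves the iterator just past the element.
    Assumes an integer start and a nonzero integer step."""

    def __init__(self, start=0, step=1):
        self._next = start
        self._step = step

    def __iter__(self):
        return self

    def __next__(self):
        value = self._next
        self._next += self._step
        return value

    def __contains__(self, x):
        q, r = divmod(x - self._next, self._step)
        if r == 0 and q >= 0:
            self._next = x + self._step   # advance just past the element
            return True
        return False


c = AdvancingCount()
print(3 in c)    # -> True, answered without looping
print(next(c))   # -> 4, just as with today's consuming behaviour
```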

> itertools.cycle could use membership from the underlying iterable
>
Sort of - you'll still need to iterate through the underlying iterator at
least once in order to see all of the candidate elements. However, the
containment test could still be defined as running through the full cycle
at most once, such that you end up either at the point immediately after
the item or else back where you started (if the item wasn't found).
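
Under stated assumptions (the underlying iterable is materialised up front, which the real cycle() only does lazily during its first pass), the "at most one full cycle" semantics might look like:

```python
class Cycle:
    """Hypothetical cycle over a materialised sequence.  A containment
    test scans at most one full period, ending just past a found item
    or back where it started -- the semantics sketched above."""

    def __init__(self, iterable):
        self._items = list(iterable)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if not self._items:
            raise StopIteration
        item = self._items[self._pos]
        self._pos = (self._pos + 1) % len(self._items)
        return item

    def __contains__(self, x):
        for _ in range(len(self._items)):   # at most one full period
            if next(self) == x:
                return True                 # now just past the item
        return False                        # back where we started


c = Cycle("abc")
print('b' in c)   # -> True
print(next(c))    # -> 'c', i.e. the position moved just past 'b'
```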

> itertools.repeat is even easier, just compare to the repeatable element
>
+1

So this sounds like a reasonable API UX improvement to me, but you'd need
to ensure that you don't inadvertently change the external behaviour of
*successful* containment tests.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Membership of infinite iterators

2017-10-16 Thread Michel Desmoulin
Given that:

- it doesn't break anything;
- the behavior makes sense;
- it can avoid the machine freezing when one is not careful;

It can be a good idea.

Those generators could have iterators that are not themselves and have a
__contains__ method behaving accordingly.

I only made the mistake of using "in" on those once, but it did happen. I
can imagine everybody does the same at least once. It's not a huge help,
but the joy of using Python is a collection of small things after all.

On 17/10/2017 at 05:27, אלעזר wrote:
> 
> 
> On Tue, Oct 17, 2017, 00:13, Terry Reedy wrote:
> 
> On 10/15/2017 9:12 PM, Jason Campbell wrote:
> ...  
> > itertools.cycle could use membership from the underlying iterable
> 
> If the underlying iterable is an iterator, this does not work.  You
> could define a Cycle class that requires that the input have
> .__contains__.
> 
> > itertools.repeat is even easier, just compare to the repeatable
> element
> 
> Again, create a Repeat(ob) class whose .__iter__ returns repeat(ob) and
> that has .__contains__(x) returning 'x == ob'.
> 
> > Does anyone else think this would be useful?
> 
> itertools is not going to switch from iterators to non-iterator
> iterables.
> 
> 
> It doesn't have to switch, and does not have to _require_ the input to
> define __contains__ method for the proposal to be meaningful. It can
> work with the same kind of iterables as its inputs, delegating whenever
> possible. I'm sure there are good reasons not to do it, but that's not
> an either/or decision.  
> 
> Elazar 
> 
> 


Re: [Python-ideas] Membership of infinite iterators

2017-10-16 Thread אלעזר
On Tue, Oct 17, 2017, 00:13, Terry Reedy wrote:

> On 10/15/2017 9:12 PM, Jason Campbell wrote:
> ...
> > itertools.cycle could use membership from the underlying iterable
>
> If the underlying iterable is an iterator, this does not work.  You
> could define a Cycle class that requires that the input have .__contains__.
>
> > itertools.repeat is even easier, just compare to the repeatable element
>
> Again, create a Repeat(ob) class whose .__iter__ returns repeat(ob) and
> that has .__contains__(x) returning 'x == ob'.
>
> > Does anyone else think this would be useful?
>
> itertools is not going to switch from iterators to non-iterator iterables.
>

It doesn't have to switch, and does not have to _require_ the input to
define __contains__ method for the proposal to be meaningful. It can work
with the same kind of iterables as its inputs, delegating whenever
possible. I'm sure there are good reasons not to do it, but that's not an
either/or decision.

Elazar


Re: [Python-ideas] Membership of infinite iterators

2017-10-16 Thread Terry Reedy

On 10/15/2017 9:12 PM, Jason Campbell wrote:
I recently came across a bug where checking negative membership 
(__contains__ returns False) of an infinite iterator will freeze the 
program.



It may seem a bit obvious, but you can check membership in range for 
example without iterating over the entire range.


A range is a finite (non-iterator) *iterable*, not an *iterator*.  In 
particular, a range is a finite sequence.  In Python 2, ranges were 
implemented as lists.  Python 3 now uses a more compact implementation. 
Either way, ranges have a .__contains__ method.  (In Py3 it is O(1) 
instead of O(n)).



`int(1e100) in range(int(1e101))` on my machine takes about 1.5us
`int(1e7) in itertools.count()` on my machine takes about 250ms (and 
gets worse quite quickly).


Non-iterator iterables and iterators are different things. It you 
compared iter(range(n)) with itertools.count(n), you would get similar 
results.


Any membership test on the infinite iterators that is negative will 
freeze the program as stated earlier, which is odd to me.


Membership testing with iterators is generally a bad idea.  In general, 
iterators do not have a .__contains__ method, so at best, one exhausts 
the iterator, making it useless.  In particular, generators do not have 
any non-iterator, non-generator methods.  The itertools module has 
iterator classes rather than generator functions because it is coded in 
C, but they are closely equivalent in behavior to the generator 
functions given in the doc.



itertools.count can use the same math as range


You could use range(start, 1000[, step]) instead of 
count(start[, step]), or wrap count in an infinite sequence Count class 
that has all the methods and arithmetic of range.  The __iter__ method 
would return a count iterator.
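
The suggested Count class might be sketched like this (a hypothetical class; integer start/step assumed, and only range-style containment arithmetic rather than the full "all the methods and arithmetic of range"):

```python
import itertools


class Count:
    """Hypothetical infinite sequence: iterating returns a fresh
    itertools.count, so membership tests never consume anything."""

    def __init__(self, start=0, step=1):
        self.start = start
        self.step = step

    def __iter__(self):
        return itertools.count(self.start, self.step)

    def __contains__(self, x):
        if self.step == 0:
            return x == self.start
        q, r = divmod(x - self.start, self.step)
        return r == 0 and q >= 0


odds = Count(1, 2)
print(7 in odds)           # -> True, in O(1)
print(next(iter(odds)))    # -> 1; the containment test consumed nothing
```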



itertools.cycle could use membership from the underlying iterable


If the underlying iterable is an iterator, this does not work.  You 
could define a Cycle class that requires that the input have .__contains__.



itertools.repeat is even easier, just compare to the repeatable element


Again, create a Repeat(ob) class whose .__iter__ returns repeat(ob) and 
that has .__contains__(x) returning 'x == ob'.
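
That Repeat class, as a short sketch (hypothetical, as described above):

```python
import itertools


class Repeat:
    """Hypothetical Repeat(ob): .__iter__ returns repeat(ob) and
    containment is a single comparison."""

    def __init__(self, ob):
        self.ob = ob

    def __iter__(self):
        return itertools.repeat(self.ob)

    def __contains__(self, x):
        return x == self.ob


r = Repeat(7)
print(7 in r)   # -> True
print(8 in r)   # -> False
```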



Does anyone else think this would be useful?


itertools is not going to switch from iterators to non-iterator iterables.

--
Terry Jan Reedy
