Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-05-14 Thread Franklin? Lee
The topic reminded me of Stephan Houben's streamtee.
https://github.com/stephanh42/streamtee/blob/master/streamtee.py

It was an attempt at a memory-efficient tee, but it turned out tee was
efficient enough. It uses a thunk-like method and recursion to remove
the need for an explicit linked list.

In the spirit of the topic, I tried to strip it down to the bare essentials:

def tee(iterable, n):
stream = _streamiter(iter(iterable))
return tuple(_wrapper(stream) for _ in range(n))

def _streamiter(itr):
result = (next(itr), _streamiter(itr))
del itr # Release unneeded reference.
while True:
yield result

def _wrapper(stream):
while True:
result, stream = next(stream)
yield result

Unfortunately, this code, too, breaks with the new StopIteration
propagation rules.

On Sun, Apr 15, 2018 at 1:05 AM, Tim Peters  wrote:
> Just for fun - no complaint, no suggestion, just sharing a bit of code
> that tickled me.
>
> The docs for `itertools.tee()` contain a Python work-alike, which is
> easy to follow.  It gives each derived generator its own deque, and
> when a new value is obtained from the original iterator it pushes that
> value onto each of those deques.
>
> Of course it's possible for them to share a single deque, but the code
> gets more complicated.  Is it possible to make it simpler instead?
>
> What it "really" needs is a shared singly-linked list of values,
> pointing from oldest value to newest.  Then each derived generator can
> just follow the links, and yield its next result in time independent
> of the number of derived generators.  But how do you know when a new
> value needs to be obtained from the original iterator, and how do you
> know when space for an older value can be recycled (because all of the
> derived generators have yielded it)?
>
> I ended up with almost a full page of code to do that, storing with
> each value and link a count of the number of derived generators that
> had yet to yield the value, effectively coding my own reference-count
> scheme by hand, along with "head" and "tail" pointers to the ends of
> the linked list that proved remarkably tricky to keep correct in all
> cases.
>
> Then I thought "this is stupid!  Python already does reference
> counting."  Voila!  Vast swaths of tedious code vanished, giving this
> remarkably simple implementation:
>
> def mytee(xs, n):
> last = [None, None]
>
> def gen(it, mylast):
> nonlocal last
> while True:
> mylast = mylast[1]
> if not mylast:
> mylast = last[1] = last = [next(it), None]
> yield mylast[0]
>
> it = iter(xs)
> return tuple(gen(it, last) for _ in range(n))
>
> There's no need to keep a pointer to the start of the shared list at
> all - we only need a pointer to the end of the list ("last"), and each
> derived generator only needs a pointer to its own current position in
> the list ("mylast").
>
> What I find kind of hilarious is that it's no help at all as a
> prototype for a C implementation:  Python recycles stale `[next(it),
> None]` pairs all by itself, when their internal refcounts fall to 0.
> That's the hardest part.
>
> BTW, I certainly don't suggest adding this to the itertools docs
> either.  While it's short and elegant, it's too subtle to grasp easily
> - if you think "it's obvious", you haven't yet thought hard enough
> about the problem ;-)
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-05-07 Thread Tim Peters
I posted this several weeks ago, just for fun - an obscure but
surprisingly brief Python implementation of itertools.tee, sharing a
single stream (as a singly linked list) among all the returned
iterables.

Didn't think about it again until today, when recent discussions of
lexical scoping made me wonder "hmm - when's the last time I even used
nonlocal?".  Well, this was.  And shortly thereafter, "but there's no
need to keep track of `last` at all!"  I have no idea where that
thought came from :-)

> def mytee(xs, n):
> last = [None, None]
>
> def gen(it, mylast):
> nonlocal last
> while True:
> mylast = mylast[1]
> if not mylast:
> mylast = last[1] = last = [next(it), None]
> yield mylast[0]
>
> it = iter(xs)
> return tuple(gen(it, last) for _ in range(n))

So, in fact, the inner function there can be replaced by the even briefer:

def gen(it, mylast):
while True:
if mylast[1] is None:
mylast[1] = [next(it), None]
mylast = mylast[1]
yield mylast[0]

To make it more relevant to current discussions, collapsing the last
two lines using a binding expression:

yield (mylast := mylast[1])[0]

isn't even slightly tempting.  Most of the cases where it would be
_possible_ to use binding expressions don't strike me as being
_sensible_ places to use them - slamming together conceptually
different tasks just because it's possible to do so.  But because name
bindings are so very common, that still leaves plenty where the
transformation leaves the code clearer to my eyes (usually just a
little, rarely a lot).  There are no such cases in the code above.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-04-16 Thread Steven D'Aprano
On Sun, Apr 15, 2018 at 08:35:51PM +0300, Serhiy Storchaka wrote:

> I have ideas about implementing zero-overhead try/except, but I have 
> doubts that it is worth. The benefit seems too small.

It is conventional wisdom that catching exceptions is expensive, and 
that in performance critical code it is better to "look before you leap" 
if possible, and avoid try...except.

Are you saying this advice is obsolete?

If not, then perhaps reducing the overhead of catching exceptions may be 
worthwhile.


-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-04-16 Thread Koos Zevenhoven
On Sun, Apr 15, 2018 at 11:55 PM, Chris Angelico  wrote:

> On Mon, Apr 16, 2018 at 6:46 AM, Koos Zevenhoven 
> wrote:
> > Anyway, the whole linked list is unnecessary if the iterable can be
> iterated
> > over multiple times. But "tee" won't know when to do that. *That* is
> what I
> > call overhead (unless of course all the tee branches are consumed in an
> > interleaved manner).
>
> But if you have something you can iterate over multiple times, why
> bother with tee at all? Just take N iterators from the underlying
> iterable. The overhead is intrinsic to the value of the function.
>
> ​
Indeed. But if you have, say, an Iterable[int], you don't know if you need
the additional buffer or not.  It could be a range object or a set or a
generator (or iterator), who knows. Even your type checker doesn't know
what you need.

-- Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-04-16 Thread Koos Zevenhoven
On Mon, Apr 16, 2018 at 12:06 AM, Tim Peters  wrote:

> [Koos Zevenhoven ]
> > It's definitely possible to write the above in a more
> > readable way, and FWIW I don't think it involves "assignments as
> > expressions".
>
> Of course it is.  The point was brevity and speed, not readability.
> It was presented partly as a puzzle :-)
>
> >> What I find kind of hilarious is that it's no help at all as a
> >> prototype for a C implementation:  Python recycles stale `[next(it),
> >> None]` pairs all by itself, when their internal refcounts fall to 0.
> >> That's the hardest part.
>
> > Why can't the C implementation use Python refcounts? Are you talking
> about
> > standalone C code?
>
> Yes, expressing the algorithm in plain old C, not building on top of
> (say) the Python C API.
>
>
​There must have been a reason why pseudo code was "invented".​


> > Or perhaps you are thinking about overhead?
>
> Nope.
>
>
> > (In PEP 555 that was not a concern, though). Surely it would make sense
> > to reuse the refcounting code that's already there. There are no cycles
> > here, so it's not particulaly complicated -- just duplication.
> >
> > Anyway, the whole linked list is unnecessary if the iterable can be
> iterated
> > over multiple times.
>
> If the latter were how iterables always worked, there would be no need
> for tee() at all.  It's tee's _purpose_ to make it possible for
> multiple consumers to traverse an iterable's can't-restart-or-even
> -go-back result sequence each at their own pace.
>

​Yes. (I'm not sure which is easier, going back or starting from the
beginning)

-- Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-04-15 Thread Tim Peters
[Koos Zevenhoven ]
> It's definitely possible to write the above in a more
> readable way, and FWIW I don't think it involves "assignments as
> expressions".

Of course it is.  The point was brevity and speed, not readability.
It was presented partly as a puzzle :-)

>> What I find kind of hilarious is that it's no help at all as a
>> prototype for a C implementation:  Python recycles stale `[next(it),
>> None]` pairs all by itself, when their internal refcounts fall to 0.
>> That's the hardest part.

> Why can't the C implementation use Python refcounts? Are you talking about
> standalone C code?

Yes, expressing the algorithm in plain old C, not building on top of
(say) the Python C API.

> Or perhaps you are thinking about overhead?

Nope.


> (In PEP 555 that was not a concern, though). Surely it would make sense
> to reuse the refcounting code that's already there. There are no cycles
> here, so it's not particulaly complicated -- just duplication.
>
> Anyway, the whole linked list is unnecessary if the iterable can be iterated
> over multiple times.

If the latter were how iterables always worked, there would be no need
for tee() at all.  It's tee's _purpose_ to make it possible for
multiple consumers to traverse an iterable's can't-restart-or-even
-go-back result sequence each at their own pace.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-04-15 Thread Chris Angelico
On Mon, Apr 16, 2018 at 6:46 AM, Koos Zevenhoven  wrote:
> Anyway, the whole linked list is unnecessary if the iterable can be iterated
> over multiple times. But "tee" won't know when to do that. *That* is what I
> call overhead (unless of course all the tee branches are consumed in an
> interleaved manner).

But if you have something you can iterate over multiple times, why
bother with tee at all? Just take N iterators from the underlying
iterable. The overhead is intrinsic to the value of the function.

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-04-15 Thread Koos Zevenhoven
On Sun, Apr 15, 2018 at 8:05 AM, Tim Peters  wrote:
​[...]​


> Then I thought "this is stupid!  Python already does reference
> counting."  Voila!  Vast swaths of tedious code vanished, giving this
> remarkably simple implementation:
>
> def mytee(xs, n):
> last = [None, None]
>
> def gen(it, mylast):
> nonlocal last
> while True:
> mylast = mylast[1]
> if not mylast:
> mylast = last[1] = last = [next(it), None]
> yield mylast[0]
>
> it = iter(xs)
> return tuple(gen(it, last) for _ in range(n))
>
> There's no need to keep a pointer to the start of the shared list at
> all - we only need a pointer to the end of the list ("last"), and each
> derived generator only needs a pointer to its own current position in
> the list ("mylast").
>
>
Things here remind me of my implementation design for PEP 555: the
"contexts" present in the process are represented by a singly-linked tree
of assignment objects. It's definitely possible to write the above in a
more readable way, and FWIW I don't think it involves "assignments as
expressions".


> What I find kind of hilarious is that it's no help at all as a
> prototype for a C implementation:  Python recycles stale `[next(it),
> None]` pairs all by itself, when their internal refcounts fall to 0.
> That's the hardest part.
>
>
​Why can't the C implementation use Python refcounts? Are you talking about
standalone C code? Or perhaps you are thinking about overhead? (In PEP 555
that was not a concern, though). Surely it would make sense to reuse the
refcounting code that's already there. There are no cycles here, so it's
not particulaly complicated -- just duplication.

Anyway, the whole linked list is unnecessary if the iterable can be
iterated over multiple times. But "tee" won't know when to do that. *That*
is what I call overhead (unless of course all the tee branches are consumed
in an interleaved manner).



> BTW, I certainly don't suggest adding this to the itertools docs
> either.  While it's short and elegant, it's too subtle to grasp easily
> - if you think "it's obvious", you haven't yet thought hard enough
> about the problem ;-)
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-04-15 Thread Antoine Pitrou
On Sun, 15 Apr 2018 00:05:58 -0500
Tim Peters  wrote:

> Just for fun - no complaint, no suggestion, just sharing a bit of code
> that tickled me.
> 
> The docs for `itertools.tee()` contain a Python work-alike, which is
> easy to follow.  It gives each derived generator its own deque, and
> when a new value is obtained from the original iterator it pushes that
> value onto each of those deques.
> 
> Of course it's possible for them to share a single deque, but the code
> gets more complicated.  Is it possible to make it simpler instead?
> 
> What it "really" needs is a shared singly-linked list of values,
> pointing from oldest value to newest.  Then each derived generator can
> just follow the links, and yield its next result in time independent
> of the number of derived generators.  But how do you know when a new
> value needs to be obtained from the original iterator, and how do you
> know when space for an older value can be recycled (because all of the
> derived generators have yielded it)?
> 
> I ended up with almost a full page of code to do that, storing with
> each value and link a count of the number of derived generators that
> had yet to yield the value, effectively coding my own reference-count
> scheme by hand, along with "head" and "tail" pointers to the ends of
> the linked list that proved remarkably tricky to keep correct in all
> cases.
> 
> Then I thought "this is stupid!  Python already does reference
> counting."  Voila!  Vast swaths of tedious code vanished, giving this
> remarkably simple implementation:

This implementation doesn't work with Python 3.7 or 3.8.
I've tried it here:
https://gist.github.com/pitrou/b3991f638300edb6d06b5be23a4c66d6

and get:
Traceback (most recent call last):
  File "mytee.py", line 14, in gen
mylast = last[1] = last = [next(it), None]
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mytee.py", line 47, in 
run(mytee1)
  File "mytee.py", line 36, in run
lists[i].append(next(iters[i]))
RuntimeError: generator raised StopIteration

(Yuck!)

In short, you want the following instead:

try:
mylast = last[1] = last = [next(it), None]
except StopIteration:
return

> def mytee(xs, n):
> last = [None, None]
> 
> def gen(it, mylast):
> nonlocal last
> while True:
> mylast = mylast[1]
> if not mylast:
> mylast = last[1] = last = [next(it), None]

That's smart and obscure :-o
The way it works is that the `last` assignment changes the `last` value
seen by all derived generators, while the `last[1]` assignment updates
the bindings made in the other generators' `mylast` lists...  It's
difficult to find the words to explain it.

The chained assignment makes it more difficult to parse as well (when I
read this I don't know if `last[i]` or `last` gets assigned first;
apparently the answer is `last[i]`, otherwise the recipe wouldn't work
correctly).  Perhaps like this:

while True:
mylast = mylast[1]
if not mylast:
try:
# Create new list link
mylast = [next(it), None]
except StopIteration:
return
else:
# Append to other generators `mylast` linked lists
last[1] = mylast
# Update shared list link
last = last[1]
yield mylast[0]


Regards

Antoine.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] A cute Python implementation of itertools.tee

2018-04-14 Thread Tim Peters
Just for fun - no complaint, no suggestion, just sharing a bit of code
that tickled me.

The docs for `itertools.tee()` contain a Python work-alike, which is
easy to follow.  It gives each derived generator its own deque, and
when a new value is obtained from the original iterator it pushes that
value onto each of those deques.

Of course it's possible for them to share a single deque, but the code
gets more complicated.  Is it possible to make it simpler instead?

What it "really" needs is a shared singly-linked list of values,
pointing from oldest value to newest.  Then each derived generator can
just follow the links, and yield its next result in time independent
of the number of derived generators.  But how do you know when a new
value needs to be obtained from the original iterator, and how do you
know when space for an older value can be recycled (because all of the
derived generators have yielded it)?

I ended up with almost a full page of code to do that, storing with
each value and link a count of the number of derived generators that
had yet to yield the value, effectively coding my own reference-count
scheme by hand, along with "head" and "tail" pointers to the ends of
the linked list that proved remarkably tricky to keep correct in all
cases.

Then I thought "this is stupid!  Python already does reference
counting."  Voila!  Vast swaths of tedious code vanished, giving this
remarkably simple implementation:

def mytee(xs, n):
last = [None, None]

def gen(it, mylast):
nonlocal last
while True:
mylast = mylast[1]
if not mylast:
mylast = last[1] = last = [next(it), None]
yield mylast[0]

it = iter(xs)
return tuple(gen(it, last) for _ in range(n))

There's no need to keep a pointer to the start of the shared list at
all - we only need a pointer to the end of the list ("last"), and each
derived generator only needs a pointer to its own current position in
the list ("mylast").

What I find kind of hilarious is that it's no help at all as a
prototype for a C implementation:  Python recycles stale `[next(it),
None]` pairs all by itself, when their internal refcounts fall to 0.
That's the hardest part.

BTW, I certainly don't suggest adding this to the itertools docs
either.  While it's short and elegant, it's too subtle to grasp easily
- if you think "it's obvious", you haven't yet thought hard enough
about the problem ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/