Re: [Python-ideas] A cute Python implementation of itertools.tee
The topic reminded me of Stephan Houben's streamtee. https://github.com/stephanh42/streamtee/blob/master/streamtee.py It was an attempt at a memory-efficient tee, but it turned out tee was efficient enough. It uses a thunk-like method and recursion to remove the need for an explicit linked list. In the spirit of the topic, I tried to strip it down to the bare essentials: def tee(iterable, n): stream = _streamiter(iter(iterable)) return tuple(_wrapper(stream) for _ in range(n)) def _streamiter(itr): result = (next(itr), _streamiter(itr)) del itr # Release unneeded reference. while True: yield result def _wrapper(stream): while True: result, stream = next(stream) yield result Unfortunately, this code, too, breaks with the new StopIteration propagation rules. On Sun, Apr 15, 2018 at 1:05 AM, Tim Peterswrote: > Just for fun - no complaint, no suggestion, just sharing a bit of code > that tickled me. > > The docs for `itertools.tee()` contain a Python work-alike, which is > easy to follow. It gives each derived generator its own deque, and > when a new value is obtained from the original iterator it pushes that > value onto each of those deques. > > Of course it's possible for them to share a single deque, but the code > gets more complicated. Is it possible to make it simpler instead? > > What it "really" needs is a shared singly-linked list of values, > pointing from oldest value to newest. Then each derived generator can > just follow the links, and yield its next result in time independent > of the number of derived generators. But how do you know when a new > value needs to be obtained from the original iterator, and how do you > know when space for an older value can be recycled (because all of the > derived generators have yielded it)? > > I ended up with almost a full page of code to do that, storing with > each value and link a count of the number of derived generators that > had yet to yield the value, effectively coding my own reference-count > scheme by hand, along with "head" and "tail" pointers to the ends of > the linked list that proved remarkably tricky to keep correct in all > cases. > > Then I thought "this is stupid! Python already does reference > counting." Voila! Vast swaths of tedious code vanished, giving this > remarkably simple implementation: > > def mytee(xs, n): > last = [None, None] > > def gen(it, mylast): > nonlocal last > while True: > mylast = mylast[1] > if not mylast: > mylast = last[1] = last = [next(it), None] > yield mylast[0] > > it = iter(xs) > return tuple(gen(it, last) for _ in range(n)) > > There's no need to keep a pointer to the start of the shared list at > all - we only need a pointer to the end of the list ("last"), and each > derived generator only needs a pointer to its own current position in > the list ("mylast"). > > What I find kind of hilarious is that it's no help at all as a > prototype for a C implementation: Python recycles stale `[next(it), > None]` pairs all by itself, when their internal refcounts fall to 0. > That's the hardest part. > > BTW, I certainly don't suggest adding this to the itertools docs > either. While it's short and elegant, it's too subtle to grasp easily > - if you think "it's obvious", you haven't yet thought hard enough > about the problem ;-) > ___ > Python-ideas mailing list > Python-ideas@python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] A cute Python implementation of itertools.tee
I posted this several weeks ago, just for fun - an obscure but surprisingly brief Python implementation of itertools.tee, sharing a single stream (as a singly linked list) among all the returned iterables. Didn't think about it again until today, when recent discussions of lexical scoping made me wonder "hmm - when's the last time I even used nonlocal?". Well, this was. And shortly thereafter, "but there's no need to keep track of `last` at all!" I have no idea where that thought came from :-) > def mytee(xs, n): > last = [None, None] > > def gen(it, mylast): > nonlocal last > while True: > mylast = mylast[1] > if not mylast: > mylast = last[1] = last = [next(it), None] > yield mylast[0] > > it = iter(xs) > return tuple(gen(it, last) for _ in range(n)) So, in fact, the inner function there can be replaced by the even briefer: def gen(it, mylast): while True: if mylast[1] is None: mylast[1] = [next(it), None] mylast = mylast[1] yield mylast[0] To make it more relevant to current discussions, collapsing the last two lines using a binding expression: yield (mylast := mylast[1])[0] isn't even slightly tempting. Most of the cases where it would be _possible_ to use binding expressions don't strike me as being _sensible_ places to use them - slamming together conceptually different tasks just because it's possible to do so. But because name bindings are so very common, that still leaves plenty where the transformation leaves the code clearer to my eyes (usually just a little, rarely a lot). There are no such cases in the code above. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] A cute Python implementation of itertools.tee
On Sun, Apr 15, 2018 at 08:35:51PM +0300, Serhiy Storchaka wrote: > I have ideas about implementing zero-overhead try/except, but I have > doubts that it is worth. The benefit seems too small. It is conventional wisdom that catching exceptions is expensive, and that in performance critical code it is better to "look before you leap" if possible, and avoid try...except. Are you saying this advice is obsolete? If not, then perhaps reducing the overhead of catching exceptions may be worthwhile. -- Steve ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] A cute Python implementation of itertools.tee
On Sun, Apr 15, 2018 at 11:55 PM, Chris Angelicowrote: > On Mon, Apr 16, 2018 at 6:46 AM, Koos Zevenhoven > wrote: > > Anyway, the whole linked list is unnecessary if the iterable can be > iterated > > over multiple times. But "tee" won't know when to do that. *That* is > what I > > call overhead (unless of course all the tee branches are consumed in an > > interleaved manner). > > But if you have something you can iterate over multiple times, why > bother with tee at all? Just take N iterators from the underlying > iterable. The overhead is intrinsic to the value of the function. > > Indeed. But if you have, say, an Iterable[int], you don't know if you need the additional buffer or not. It could be a range object or a set or a generator (or iterator), who knows. Even your type checker doesn't know what you need. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] A cute Python implementation of itertools.tee
On Mon, Apr 16, 2018 at 12:06 AM, Tim Peterswrote: > [Koos Zevenhoven ] > > It's definitely possible to write the above in a more > > readable way, and FWIW I don't think it involves "assignments as > > expressions". > > Of course it is. The point was brevity and speed, not readability. > It was presented partly as a puzzle :-) > > >> What I find kind of hilarious is that it's no help at all as a > >> prototype for a C implementation: Python recycles stale `[next(it), > >> None]` pairs all by itself, when their internal refcounts fall to 0. > >> That's the hardest part. > > > Why can't the C implementation use Python refcounts? Are you talking > about > > standalone C code? > > Yes, expressing the algorithm in plain old C, not building on top of > (say) the Python C API. > > There must have been a reason why pseudo code was "invented". > > Or perhaps you are thinking about overhead? > > Nope. > > > > (In PEP 555 that was not a concern, though). Surely it would make sense > > to reuse the refcounting code that's already there. There are no cycles > > here, so it's not particulaly complicated -- just duplication. > > > > Anyway, the whole linked list is unnecessary if the iterable can be > iterated > > over multiple times. > > If the latter were how iterables always worked, there would be no need > for tee() at all. It's tee's _purpose_ to make it possible for > multiple consumers to traverse an iterable's can't-restart-or-even > -go-back result sequence each at their own pace. > Yes. (I'm not sure which is easier, going back or starting from the beginning) -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] A cute Python implementation of itertools.tee
[Koos Zevenhoven] > It's definitely possible to write the above in a more > readable way, and FWIW I don't think it involves "assignments as > expressions". Of course it is. The point was brevity and speed, not readability. It was presented partly as a puzzle :-) >> What I find kind of hilarious is that it's no help at all as a >> prototype for a C implementation: Python recycles stale `[next(it), >> None]` pairs all by itself, when their internal refcounts fall to 0. >> That's the hardest part. > Why can't the C implementation use Python refcounts? Are you talking about > standalone C code? Yes, expressing the algorithm in plain old C, not building on top of (say) the Python C API. > Or perhaps you are thinking about overhead? Nope. > (In PEP 555 that was not a concern, though). Surely it would make sense > to reuse the refcounting code that's already there. There are no cycles > here, so it's not particulaly complicated -- just duplication. > > Anyway, the whole linked list is unnecessary if the iterable can be iterated > over multiple times. If the latter were how iterables always worked, there would be no need for tee() at all. It's tee's _purpose_ to make it possible for multiple consumers to traverse an iterable's can't-restart-or-even -go-back result sequence each at their own pace. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] A cute Python implementation of itertools.tee
On Mon, Apr 16, 2018 at 6:46 AM, Koos Zevenhovenwrote: > Anyway, the whole linked list is unnecessary if the iterable can be iterated > over multiple times. But "tee" won't know when to do that. *That* is what I > call overhead (unless of course all the tee branches are consumed in an > interleaved manner). But if you have something you can iterate over multiple times, why bother with tee at all? Just take N iterators from the underlying iterable. The overhead is intrinsic to the value of the function. ChrisA ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] A cute Python implementation of itertools.tee
On Sun, Apr 15, 2018 at 8:05 AM, Tim Peterswrote: [...] > Then I thought "this is stupid! Python already does reference > counting." Voila! Vast swaths of tedious code vanished, giving this > remarkably simple implementation: > > def mytee(xs, n): > last = [None, None] > > def gen(it, mylast): > nonlocal last > while True: > mylast = mylast[1] > if not mylast: > mylast = last[1] = last = [next(it), None] > yield mylast[0] > > it = iter(xs) > return tuple(gen(it, last) for _ in range(n)) > > There's no need to keep a pointer to the start of the shared list at > all - we only need a pointer to the end of the list ("last"), and each > derived generator only needs a pointer to its own current position in > the list ("mylast"). > > Things here remind me of my implementation design for PEP 555: the "contexts" present in the process are represented by a singly-linked tree of assignment objects. It's definitely possible to write the above in a more readable way, and FWIW I don't think it involves "assignments as expressions". > What I find kind of hilarious is that it's no help at all as a > prototype for a C implementation: Python recycles stale `[next(it), > None]` pairs all by itself, when their internal refcounts fall to 0. > That's the hardest part. > > Why can't the C implementation use Python refcounts? Are you talking about standalone C code? Or perhaps you are thinking about overhead? (In PEP 555 that was not a concern, though). Surely it would make sense to reuse the refcounting code that's already there. There are no cycles here, so it's not particulaly complicated -- just duplication. Anyway, the whole linked list is unnecessary if the iterable can be iterated over multiple times. But "tee" won't know when to do that. *That* is what I call overhead (unless of course all the tee branches are consumed in an interleaved manner). > BTW, I certainly don't suggest adding this to the itertools docs > either. While it's short and elegant, it's too subtle to grasp easily > - if you think "it's obvious", you haven't yet thought hard enough > about the problem ;-) > ___ > Python-ideas mailing list > Python-ideas@python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- + Koos Zevenhoven + http://twitter.com/k7hoven + ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] A cute Python implementation of itertools.tee
On Sun, 15 Apr 2018 00:05:58 -0500 Tim Peterswrote: > Just for fun - no complaint, no suggestion, just sharing a bit of code > that tickled me. > > The docs for `itertools.tee()` contain a Python work-alike, which is > easy to follow. It gives each derived generator its own deque, and > when a new value is obtained from the original iterator it pushes that > value onto each of those deques. > > Of course it's possible for them to share a single deque, but the code > gets more complicated. Is it possible to make it simpler instead? > > What it "really" needs is a shared singly-linked list of values, > pointing from oldest value to newest. Then each derived generator can > just follow the links, and yield its next result in time independent > of the number of derived generators. But how do you know when a new > value needs to be obtained from the original iterator, and how do you > know when space for an older value can be recycled (because all of the > derived generators have yielded it)? > > I ended up with almost a full page of code to do that, storing with > each value and link a count of the number of derived generators that > had yet to yield the value, effectively coding my own reference-count > scheme by hand, along with "head" and "tail" pointers to the ends of > the linked list that proved remarkably tricky to keep correct in all > cases. > > Then I thought "this is stupid! Python already does reference > counting." Voila! Vast swaths of tedious code vanished, giving this > remarkably simple implementation: This implementation doesn't work with Python 3.7 or 3.8. I've tried it here: https://gist.github.com/pitrou/b3991f638300edb6d06b5be23a4c66d6 and get: Traceback (most recent call last): File "mytee.py", line 14, in gen mylast = last[1] = last = [next(it), None] StopIteration The above exception was the direct cause of the following exception: Traceback (most recent call last): File "mytee.py", line 47, in run(mytee1) File "mytee.py", line 36, in run lists[i].append(next(iters[i])) RuntimeError: generator raised StopIteration (Yuck!) In short, you want the following instead: try: mylast = last[1] = last = [next(it), None] except StopIteration: return > def mytee(xs, n): > last = [None, None] > > def gen(it, mylast): > nonlocal last > while True: > mylast = mylast[1] > if not mylast: > mylast = last[1] = last = [next(it), None] That's smart and obscure :-o The way it works is that the `last` assignment changes the `last` value seen by all derived generators, while the `last[1]` assignment updates the bindings made in the other generators' `mylast` lists... It's difficult to find the words to explain it. The chained assignment makes it more difficult to parse as well (when I read this I don't know if `last[i]` or `last` gets assigned first; apparently the answer is `last[i]`, otherwise the recipe wouldn't work correctly). Perhaps like this: while True: mylast = mylast[1] if not mylast: try: # Create new list link mylast = [next(it), None] except StopIteration: return else: # Append to other generators `mylast` linked lists last[1] = mylast # Update shared list link last = last[1] yield mylast[0] Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] A cute Python implementation of itertools.tee
Just for fun - no complaint, no suggestion, just sharing a bit of code that tickled me. The docs for `itertools.tee()` contain a Python work-alike, which is easy to follow. It gives each derived generator its own deque, and when a new value is obtained from the original iterator it pushes that value onto each of those deques. Of course it's possible for them to share a single deque, but the code gets more complicated. Is it possible to make it simpler instead? What it "really" needs is a shared singly-linked list of values, pointing from oldest value to newest. Then each derived generator can just follow the links, and yield its next result in time independent of the number of derived generators. But how do you know when a new value needs to be obtained from the original iterator, and how do you know when space for an older value can be recycled (because all of the derived generators have yielded it)? I ended up with almost a full page of code to do that, storing with each value and link a count of the number of derived generators that had yet to yield the value, effectively coding my own reference-count scheme by hand, along with "head" and "tail" pointers to the ends of the linked list that proved remarkably tricky to keep correct in all cases. Then I thought "this is stupid! Python already does reference counting." Voila! Vast swaths of tedious code vanished, giving this remarkably simple implementation: def mytee(xs, n): last = [None, None] def gen(it, mylast): nonlocal last while True: mylast = mylast[1] if not mylast: mylast = last[1] = last = [next(it), None] yield mylast[0] it = iter(xs) return tuple(gen(it, last) for _ in range(n)) There's no need to keep a pointer to the start of the shared list at all - we only need a pointer to the end of the list ("last"), and each derived generator only needs a pointer to its own current position in the list ("mylast"). What I find kind of hilarious is that it's no help at all as a prototype for a C implementation: Python recycles stale `[next(it), None]` pairs all by itself, when their internal refcounts fall to 0. That's the hardest part. BTW, I certainly don't suggest adding this to the itertools docs either. While it's short and elegant, it's too subtle to grasp easily - if you think "it's obvious", you haven't yet thought hard enough about the problem ;-) ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/