The topic reminded me of Stephan Houben's streamtee. https://github.com/stephanh42/streamtee/blob/master/streamtee.py
It was an attempt at a memory-efficient tee, but it turned out tee was efficient enough. It uses a thunk-like method and recursion to remove the need for an explicit linked list. In the spirit of the topic, I tried to strip it down to the bare essentials: def tee(iterable, n): stream = _streamiter(iter(iterable)) return tuple(_wrapper(stream) for _ in range(n)) def _streamiter(itr): result = (next(itr), _streamiter(itr)) del itr # Release unneeded reference. while True: yield result def _wrapper(stream): while True: result, stream = next(stream) yield result Unfortunately, this code, too, breaks with the new StopIteration propagation rules. On Sun, Apr 15, 2018 at 1:05 AM, Tim Peters <tim.pet...@gmail.com> wrote: > Just for fun - no complaint, no suggestion, just sharing a bit of code > that tickled me. > > The docs for `itertools.tee()` contain a Python work-alike, which is > easy to follow. It gives each derived generator its own deque, and > when a new value is obtained from the original iterator it pushes that > value onto each of those deques. > > Of course it's possible for them to share a single deque, but the code > gets more complicated. Is it possible to make it simpler instead? > > What it "really" needs is a shared singly-linked list of values, > pointing from oldest value to newest. Then each derived generator can > just follow the links, and yield its next result in time independent > of the number of derived generators. But how do you know when a new > value needs to be obtained from the original iterator, and how do you > know when space for an older value can be recycled (because all of the > derived generators have yielded it)? > > I ended up with almost a full page of code to do that, storing with > each value and link a count of the number of derived generators that > had yet to yield the value, effectively coding my own reference-count > scheme by hand, along with "head" and "tail" pointers to the ends of > the linked list that proved remarkably tricky to keep correct in all > cases. > > Then I thought "this is stupid! Python already does reference > counting." Voila! Vast swaths of tedious code vanished, giving this > remarkably simple implementation: > > def mytee(xs, n): > last = [None, None] > > def gen(it, mylast): > nonlocal last > while True: > mylast = mylast[1] > if not mylast: > mylast = last[1] = last = [next(it), None] > yield mylast[0] > > it = iter(xs) > return tuple(gen(it, last) for _ in range(n)) > > There's no need to keep a pointer to the start of the shared list at > all - we only need a pointer to the end of the list ("last"), and each > derived generator only needs a pointer to its own current position in > the list ("mylast"). > > What I find kind of hilarious is that it's no help at all as a > prototype for a C implementation: Python recycles stale `[next(it), > None]` pairs all by itself, when their internal refcounts fall to 0. > That's the hardest part. > > BTW, I certainly don't suggest adding this to the itertools docs > either. While it's short and elegant, it's too subtle to grasp easily > - if you think "it's obvious", you haven't yet thought hard enough > about the problem ;-) > _______________________________________________ > Python-ideas mailing list > Python-ideas@python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/