Re: [Python-ideas] Add a .chunks() method to sequences

Erik Fri, 05 May 2017 03:31:46 -0700

Hi Nick,

On 05/05/17 08:29, Nick Coghlan wrote:

And then given the proposed str.splitgroups() on the one hand, and the
existing memoryview.cast() on the other, offering
itertools.itergroups() as a corresponding building block specifically
for working with streams of regular data would make sense to me -
that's a standard approach in time-division multiplexing protocols,
and it also shows up in areas like digital audio processing as well
(where you're often doing things like shuffling incoming data chunks
into FFT buffers)

It looks to me like your "itertools.itergroups()" is similar to more_itertools.chunked() - with at least one obvious change, see below (*).

If anyone wants to pursue this (or any itertools) enhancement, then please be aware of the following thread (and in particular the message being linked to - and the bug and discussion that it is replying to):


https://mail.python.org/pipermail/python-dev/2012-July/120885.html

I have been told off for bringing this up already, but I do it again in direct response to your suggestion because it seems there is a bar to getting something included in itertools and something like "chunked()" has already failed to make it. The thing to do is probably to talk directly to Raymond to see if there's an acceptable solution first before too much work is put into something that may be rejected as being too high level.

It may be that a C version of "more_itertools" for things where people would find a speedup useful might be a solution (where the more_itertools package defers to those built-ins if they exist on the version of Python it's executing on, otherwise uses its existing implementation as a fallback). I am not suggesting implementing the _whole_ of more_itertools in C - it's quite large now.
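
The "defer to the built-in if it exists" idea could look roughly like the sketch below. It assumes a hypothetical C-accelerated itertools.chunked(), which is not part of the standard library, so on current Pythons the import fails and the pure-Python fallback is used. The fallback is a simplified stand-in for what more_itertools.chunked() does, not the package's actual source.

```python
from itertools import islice

try:
    # Hypothetical accelerated version; no such name exists in the
    # standard library's itertools, so this import is expected to fail.
    from itertools import chunked
except ImportError:
    def chunked(iterable, n):
        """Pure-Python fallback: yield lists of up to n items each."""
        iterator = iter(iterable)
        while True:
            chunk = list(islice(iterator, n))
            if not chunk:
                return
            yield chunk
```

Callers import chunked from one place and get whichever implementation is available.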

(*) I had implemented itertools.chunked in C before (also for audio processing, as it happens) and one thing that I didn't like is the way strings get unpacked:


>>> tuple(more_itertools.chunked("foo bar baz", 2))
(['f', 'o'], ['o', ' '], ['b', 'a'], ['r', ' '], ['b', 'a'], ['z'])

If the chunked/itergroups method checked for the presence of a __chunks__ or similar dunder method in the source sequence which returns an iterator, then the string class could efficiently yield substrings rather than individual characters which then had to be wrapped in a list or tuple (which I think is what you wanted itergroups() to do):


>>> tuple(itertools.chunked("foo bar baz", 2))
('fo', 'o ', 'ba', 'r ', 'ba', 'z')
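
A minimal pure-Python sketch of that dispatch follows. The __chunks__ name comes from the suggestion above; everything else is an assumption made for illustration. In particular, the built-in str has no such method, so a hypothetical subclass (ChunkableStr) stands in for it here.

```python
from itertools import islice

def chunked(source, n):
    """Yield chunks of up to n items.

    If type(source) defines __chunks__ (a hypothetical protocol, not an
    existing Python dunder), defer to it; otherwise collect items in lists.
    """
    chunks_method = getattr(type(source), "__chunks__", None)
    if chunks_method is not None:
        return iter(chunks_method(source, n))
    return _generic_chunks(iter(source), n)

def _generic_chunks(iterator, n):
    # Fallback: roughly what more_itertools.chunked() does today.
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk

class ChunkableStr(str):
    """Hypothetical str subclass showing what str.__chunks__ might do."""
    def __chunks__(self, n):
        # Slicing yields substrings directly, with no per-character list.
        return (self[i:i + n] for i in range(0, len(self), n))
```

With this, tuple(chunked(ChunkableStr("foo bar baz"), 2)) gives substrings, while a plain str still goes through the list-building fallback.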

Similarly, for objects which _represent_ a lot of data but do not actually hold those data literally (for example, range objects or even memoryviews), the returned chunks can also be representations of the data (subranges or subviews) and not the actual rendered data. For example, the existing:


>>> range(10)
range(0, 10)
>>> tuple(more_itertools.chunked(range(10), 3))
([0, 1, 2], [3, 4, 5], [6, 7, 8], [9])

becomes:

>>> tuple(itertools.chunked(range(10), 3))
(range(0, 3), range(3, 6), range(6, 9), range(9, 10))

Obviously, with those short strings and ranges one could argue that there's no point, but the principle of doing it this way scales better than the version that collects all of the data in lists - for things like chunks of some sort of "view" object, you would still only have the actual data stored once in the original object.
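
For sequences that already support slicing, the subrange/subview behaviour can be sketched with ordinary slices. The helper name slice_chunks is made up for this example; it relies only on the existing behaviour that slicing a range returns a range and slicing a memoryview returns another view of the same buffer.

```python
def slice_chunks(seq, n):
    """Yield successive slices of seq, each covering up to n items.

    For range and memoryview objects each slice is itself a range or a
    memoryview, so the underlying data is not copied into new lists.
    """
    for start in range(0, len(seq), n):
        yield seq[start:start + n]

subranges = tuple(slice_chunks(range(10), 3))
# (range(0, 3), range(3, 6), range(6, 9), range(9, 10))

data = bytearray(b"abcdef")
subviews = tuple(slice_chunks(memoryview(data), 4))
# Two memoryview objects that share data's buffer rather than copying it.
```

Because the subviews share the buffer, a later change to data is visible through them, which is the "stored once" property described above.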

I suppose that one thing to consider is what happens when an iterator is passed to the chunked() function. An iterator could have a __chunks__ method which returned chunks of the source sequence from the existing point in the iteration. However, the difference between such an iterator and one that _doesn't_ have a __chunks__ method is that in the second case the iterator would be consumed by the fall-back code, which just does what more_itertools.chunked() does now, but in the first it would not.
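
The difference can be made concrete with a small sketch. Both fallback_chunked and the SeqCursor class are hypothetical names invented for this illustration; SeqCursor is one possible reading of "an iterator with a __chunks__ method", not an existing type.

```python
from itertools import islice

def fallback_chunked(iterable, n):
    """List-collecting fallback; it pulls items from the iterator it is given."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk

class SeqCursor:
    """Hypothetical iterator over a sequence that also offers __chunks__."""
    def __init__(self, seq):
        self._seq = seq
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._seq):
            raise StopIteration
        item = self._seq[self._pos]
        self._pos += 1
        return item

    def __chunks__(self, n):
        # Slice from the current position without advancing the cursor.
        start = self._pos
        return (self._seq[i:i + n] for i in range(start, len(self._seq), n))

# Case 1: a plain iterator is consumed by the fallback.
plain = iter("abcdef")
next(plain)
plain_chunks = list(fallback_chunked(plain, 2))
plain_left = list(plain)

# Case 2: the cursor's __chunks__ leaves the cursor where it was.
cursor = SeqCursor("abcdef")
next(cursor)
cursor_chunks = list(cursor.__chunks__(2))
cursor_left = list(cursor)
```

After this runs, plain_left is empty while cursor_left still holds the five remaining characters, so the same chunked() call would leave its argument in two different states depending on which path was taken.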

Perhaps there is a precedent for that particular edge case with iterators in a different context.


Hope that helps,
E.
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
