On 21Aug2019 21:26, Sarah Hembree wrote:
> How do you chunk data? We came up with the below snippet. It works (with
> integer list data) for our needs, but it seems so clunky.
>
>     def _chunks(lst: list, size: int) -> list:
>         return [lst[x:x+size] for x in range(0, len(lst), size)]
>
> What do you do? Also, what about doing this lazily so as to keep memory
> drag at a minimum?
This looks pretty good to me. But as you say, it constructs the complete
list of chunks and returns them all. For many chunks that is both slow
and memory hungry.
If you want to conserve memory and return chunks in a lazy manner you
can rewrite this as a generator. A first cut might look like this:
    from typing import Iterator

    def _chunks(lst: list, size: int) -> Iterator[list]:
        for x in range(0, len(lst), size):
            yield lst[x:x+size]
which makes _chunks() a generator function: calling it returns an
iterator which yields the chunks one at a time. The body of the function
is kept "running", but stalled. When you iterate over the return value of
_chunks(), Python runs that stalled function until it yields a value,
then stalls it again and hands you that value.
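If you want to watch the stalling happen, something like the following sketch should do. The sample list and the names `it`, `first` and `rest` are my own, purely for illustration:

```python
from typing import Iterator

def _chunks(lst: list, size: int) -> Iterator[list]:
    # Yield successive slices of at most `size` items.
    for x in range(0, len(lst), size):
        yield lst[x:x+size]

it = _chunks([1, 2, 3, 4, 5, 6, 7], 3)  # no chunk has been built yet
first = next(it)   # runs the body up to the first yield, then stalls
rest = list(it)    # resumes the body and drains the remaining chunks
print(first)  # [1, 2, 3]
print(rest)   # [[4, 5, 6], [7]]
```

Only one chunk needs to exist at a time if you consume them in a for loop rather than collecting them with list().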
Modern Python also has a thing called a "generator expression". Your
original function uses a "list comprehension": it constructs a list of
values and returns that list. As noted above, for very long lists that
can be both slow and memory hungry. You can rewrite such a thing like
this:
    from typing import Iterator

    def _chunks(lst: list, size: int) -> Iterator[list]:
        return (lst[x:x+size] for x in range(0, len(lst), size))
Replacing the square brackets with parentheses turns this into a
generator expression. It returns an iterator instead of a list, which
behaves like the generator function I sketched and generates the chunks
lazily.
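One consequence worth knowing is that such an iterator can be consumed only once. A quick sketch comparing the two forms side by side follows; `data`, `size` and the other variable names are made up for the example:

```python
import types

data = list(range(10))
size = 4

# Square brackets: the whole list of chunks is built immediately.
as_list = [data[x:x+size] for x in range(0, len(data), size)]

# Parentheses: a generator object; chunks are built only on demand.
as_gen = (data[x:x+size] for x in range(0, len(data), size))

print(isinstance(as_list, list))                # True
print(isinstance(as_gen, types.GeneratorType))  # True

first_pass = list(as_gen)   # consumes the generator
second_pass = list(as_gen)  # already exhausted, so nothing is left
print(first_pass)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(second_pass)  # []
```

So if you need to walk the chunks more than once, either call the function again or keep the list version.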
Cheers,
Cameron Simpson