Re: [Python-ideas] Run length encoding

2017-06-10 Thread David Mertz
If you understand what iterators do, the fact that itertools.groupby collects contiguous elements is both obvious and necessary. Iterators might be infinitely long... you cannot ask for every "A" that might eventually occur in an infinite sequence of letters. On Sat, Jun 10, 2017 at 10:08 PM,
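A short illustration of the contiguity point, using only the standard library: `groupby()` starts a new group whenever the value changes, so a value that reappears later becomes a separate run. Because it never needs to look past the current run, it also works on an infinite iterator. The sample strings are made up for illustration.

```python
from itertools import cycle, groupby, islice

# groupby() merges only *adjacent* equal items, so the second run of
# "A" is reported separately from the first.
letters = "AABAAC"
runs = [(key, len(list(group))) for key, group in groupby(letters)]
print(runs)  # [('A', 2), ('B', 1), ('A', 2), ('C', 1)]

# An endless "A, A, B, A, A, B, ..." stream: groupby() still works,
# and islice() stops after the first four runs.
endless = cycle("AAB")
first_runs = [(key, len(list(group))) for key, group in islice(groupby(endless), 4)]
print(first_runs)  # [('A', 2), ('B', 1), ('A', 2), ('B', 1)]
```

Asking for "every A" in the endless stream, by contrast, would never terminate.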

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Neal Fultz
Agreed to a degree about providing it as code, but it may also be worth mentioning that zlib itself implements rle [1], and if there was ever a desire to go "python all the way down" you need an RLE somewhere anyway :) That said, I'll be pretty happy with anything that replaces an hour of
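One concrete form of the zlib point: zlib's deflate accepts a `Z_RLE` strategy, which limits match distances to one (that is, it looks only for runs of the same byte), and Python's `zlib` module exposes it as `zlib.Z_RLE`. A minimal sketch, assuming the interpreter is linked against zlib 1.2.0.1 or newer (where `Z_RLE` was added); the input bytes are invented for the example.

```python
import zlib

data = b"A" * 1000 + b"B" * 500 + b"C" * 250

# Z_RLE restricts deflate to distance-1 matches (run-length encoding)
# before the usual Huffman coding step.
compressor = zlib.compressobj(level=9, strategy=zlib.Z_RLE)
packed = compressor.compress(data) + compressor.flush()

# The result is an ordinary zlib stream, so plain decompress() reverses it.
assert zlib.decompress(packed) == data
print(len(data), "->", len(packed))
```

Note that this produces compressed bytes, not the `(item, count)` pairs discussed in the rest of the thread.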

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Serhiy Storchaka
11.06.17 05:20, Neal Fultz wrote: I am very new to this, but on a different forum and after a couple conversations, I really wished Python came with run-length encoding built-in; after all, it ships with zip, which is much more complicated :) The general idea is to be able to go back and

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Nick Coghlan
On 11 June 2017 at 13:35, David Mertz wrote:
> You are right. I made a thinko.
>
> List construction from an iterator is O(N) just as is `sum(1 for _ in it)`.
> Both of them need to march through every element. But as a constant
> multiplier, just constructing the list should

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Greg Ewing
In my experience, RLE isn't something you often find on its own. Usually it's used as part of some compression scheme that also has ways of encoding verbatim runs of data and maybe other things. So I'm skeptical that it can be usefully provided as a library function. It seems more like a design
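To illustrate the kind of scheme described here, the sketch below mixes run tokens with verbatim ("literal") tokens: long runs are stored as `('run', value, count)`, and everything else is collected into `('lit', items)`. The token format, the `MIN_RUN` threshold, and the function names are all hypothetical, invented for this example. Real formats such as PackBits apply the same idea using header bytes.

```python
from itertools import groupby

MIN_RUN = 3  # hypothetical threshold: shorter runs are stored verbatim


def encode_tokens(data):
    """Split data into ('run', value, count) and ('lit', items) tokens."""
    tokens = []
    literal = []
    for value, group in groupby(data):
        count = len(list(group))
        if count >= MIN_RUN:
            if literal:  # flush any pending verbatim items first
                tokens.append(("lit", literal))
                literal = []
            tokens.append(("run", value, count))
        else:
            literal.extend([value] * count)
    if literal:
        tokens.append(("lit", literal))
    return tokens


def decode_tokens(tokens):
    """Expand run and literal tokens back into a flat list."""
    out = []
    for token in tokens:
        if token[0] == "run":
            _, value, count = token
            out.extend([value] * count)
        else:
            out.extend(token[1])
    return out


tokens = encode_tokens("ABCDDDDDEF")
print(tokens)  # [('lit', ['A', 'B', 'C']), ('run', 'D', 5), ('lit', ['E', 'F'])]
```

The choice of threshold and token layout is exactly the kind of format-specific decision that makes a single general-purpose library function hard to define.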

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Nick Coghlan
On 11 June 2017 at 13:35, Neal Fultz wrote:
> Whoops, scratch that part about encode /decode.

Aye, decode is a relatively straightforward nested comprehension:

    def run_length_decode(iterable):
        return (item for item, item_count in iterable for __ in
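The snippet above is cut off in this archive excerpt. As an alternative formulation (not the code from that message), decoding can also be written with `itertools.repeat` and `itertools.chain.from_iterable`, which stays lazy. The name `rle_decode` is hypothetical.

```python
from itertools import chain, repeat


def rle_decode(pairs):
    """Expand (item, count) pairs back into a flat, lazy iterator of items."""
    # repeat(item, count) yields item `count` times; chain.from_iterable
    # concatenates those iterators without building an intermediate list.
    return chain.from_iterable(repeat(item, count) for item, count in pairs)


print("".join(rle_decode([("A", 3), ("B", 1), ("C", 2)])))  # AAABCC
```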

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Neal Fultz
I would also submit there's some value in the obvious readability of `z = runlength.encode(sequence)` vs `z = [(k, len(list(g))) for k, g in itertools.groupby(sequence)]` but that's my personal opinion. Everyone is welcome to use my code, but I probably won't submit to pypi for a two function
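To make the readability comparison concrete, here is a sketch in which `runlength` is a hypothetical stand-in built with `types.SimpleNamespace`; no such module ships with Python. It shows that the named call and the inline comprehension produce the same pairs.

```python
import itertools
from types import SimpleNamespace


def _encode(sequence):
    """Return (item, run_length) pairs for each run of adjacent equal items."""
    return [(key, len(list(group))) for key, group in itertools.groupby(sequence)]


# Hypothetical stand-in for the `runlength` module named in the email.
runlength = SimpleNamespace(encode=_encode)

sequence = "aaabcc"
z1 = runlength.encode(sequence)                                   # named call
z2 = [(k, len(list(g))) for k, g in itertools.groupby(sequence)]  # inline version
print(z1)  # [('a', 3), ('b', 1), ('c', 2)]
```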

Re: [Python-ideas] Run length encoding

2017-06-10 Thread David Mertz
God no! Not in the Python 2 docs! ... if the recipe belongs somewhere it's in the Python 3 docs. Although, I suppose it could go under 2 also, since it's not actually a behavior change in the feature-frozen interpreter. But as a Python instructor (and someone who remembers the cool new features

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Terry Reedy
On 6/10/2017 11:27 PM, Joshua Morton wrote: Neal: As for why zip (at first I thought you meant the zip function, not the zip compression scheme) is included and rle is not, zip is (or was), I believe, used as part of python's packaging infrastructure, hopefully someone else can correct me if

Re: [Python-ideas] Run length encoding

2017-06-10 Thread David Mertz
You are right. I made a thinko. List construction from an iterator is O(N) just as is `sum(1 for _ in it)`. Both of them need to march through every element. But as a constant multiplier, just constructing the list should be faster than needing an addition (Python append is O(1) because of
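The constant-factor claim can be checked with a small `timeit` harness. Both functions below are O(N) and visit every element once; they differ only in whether each group is materialized as a list or counted by summing 1s. The function names and test data are hypothetical, and no timings are shown because the relative result depends on the machine and Python version.

```python
import timeit
from itertools import groupby

data = "A" * 10_000 + "B" * 10_000  # two long runs


def lengths_via_list(seq):
    # Build each group as a list, then take its length.
    return [(k, len(list(g))) for k, g in groupby(seq)]


def lengths_via_sum(seq):
    # Count each group by adding 1 per element, without building a list.
    return [(k, sum(1 for _ in g)) for k, g in groupby(seq)]


assert lengths_via_list(data) == lengths_via_sum(data) == [("A", 10_000), ("B", 10_000)]

for fn in (lengths_via_list, lengths_via_sum):
    seconds = timeit.timeit(lambda: fn(data), number=100)
    print(f"{fn.__name__}: {seconds:.3f}s for 100 runs")
```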

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Neal Fultz
Whoops, scratch that part about encode /decode.

On Sat, Jun 10, 2017 at 8:33 PM, Neal Fultz wrote:
> Yes, I mean zip compression :)
>
> Also, everyone's been posting decode functions, but encode is a bit harder
> :).
>
> I think it should be equally easy to go one direction as

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Neal Fultz
Yes, I mean zip compression :) Also, everyone's been posting decode functions, but encode is a bit harder :). I think it should be equally easy to go one direction as the other. Hopefully this email chain builds up enough info to update the docs for posterity / future me. On Sat, Jun 10, 2017
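One way to check that the two directions really mirror each other is a round-trip test on random strings. The `encode`/`decode` names here are hypothetical. Decoding an encoding always gives back the input, but encoding a decoding reproduces the pairs only when they are canonical: for example, `[('a', 1), ('a', 1)]` decodes to `'aa'` and re-encodes as `[('a', 2)]`.

```python
import random
from itertools import chain, groupby, repeat


def encode(iterable):
    """Return (item, run_length) pairs for runs of adjacent equal items."""
    return [(k, len(list(g))) for k, g in groupby(iterable)]


def decode(pairs):
    """Expand (item, run_length) pairs into a flat list."""
    return list(chain.from_iterable(repeat(k, n) for k, n in pairs))


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 30)))
    pairs = encode(s)
    assert "".join(decode(pairs)) == s     # decode undoes encode
    assert encode(decode(pairs)) == pairs  # holds because encode's pairs are canonical
    # Canonical form: positive counts, and neighbouring runs never share a key.
    assert all(n > 0 for _, n in pairs)
    assert all(a[0] != b[0] for a, b in zip(pairs, pairs[1:]))
print("round trip OK")
```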

Re: [Python-ideas] Run length encoding

2017-06-10 Thread David Mertz
If what you really want is sparse matrices, you should use those: https://docs.scipy.org/doc/scipy/reference/sparse.html. Or maybe from the experimental Dask offshoot that I contributed a few lines to: https://github.com/mrocklin/sparse. Either of those will be about two orders of magnitude

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Bernardo Sulzbach
On 2017-06-11 00:13, David Mertz wrote: Bernardo Sulzbach posted a much prettier version than mine that is a bit shorter. But his is also somewhat slower (and I believe asymptotically so as the number of equal elements in subsequence goes up). He needs to sum up a bunch of 1's repeatedly

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Neal Fultz
Thanks, that's cool. Maybe the root problem is that the docs aren't using the right words when I google. Run-length encoding is particularly relevant for sparse matrices, but there's probably a library for those as well. On the data science side of things, there are a few hundred R packages that
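For comparison with the R side: base R's `rle()` returns two parallel vectors, `lengths` and `values`, and `inverse.rle()` expands them back. The sketch below mimics that shape in Python. The names `Rle`, `rle`, and `inverse_rle` are hypothetical, and the sample vector is invented to show how long runs of zeros in sparse-looking data collapse to single entries.

```python
from itertools import groupby
from typing import List, NamedTuple


class Rle(NamedTuple):
    """Parallel run lengths and run values, shaped like base R's rle() result."""
    lengths: List[int]
    values: List[object]


def rle(x):
    """Hypothetical Python analogue of R's rle()."""
    lengths, values = [], []
    for value, group in groupby(x):
        values.append(value)
        lengths.append(sum(1 for _ in group))
    return Rle(lengths, values)


def inverse_rle(r):
    """Hypothetical analogue of R's inverse.rle(): expand the runs back out."""
    return [v for v, n in zip(r.values, r.lengths) for _ in range(n)]


r = rle([0, 0, 0, 5, 0, 0, 7, 7])
print(r)  # Rle(lengths=[3, 1, 2, 2], values=[0, 5, 0, 7])
```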

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Joshua Morton
Another is `[(k, len(list(g))) for k, g in groupby(l)]`. It might be worth adding it to the list of recipes either at https://docs.python.org/2/library/itertools.html#itertools.groupby or at https://docs.python.org/2/library/itertools.html#recipes, though. On Sat, Jun 10, 2017 at 8:07 PM David

Re: [Python-ideas] Run length encoding

2017-06-10 Thread David Mertz
Here's a one-line version:

    from itertools import groupby
    rle_encode = lambda it: ((l[0], len(l)) for g in groupby(it) for l in [list(g[1])])

Since "not every one line function needs to be in the standard library" is a guiding principle of Python, and even more so of `itertools`, probably this
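PEP 8 recommends a `def` statement over assigning a lambda to a name, partly so the function gets a real `__name__` in tracebacks. A `def`-based equivalent of the one-liner could look like the sketch below. Like the lambda, it is a lazy generator. It uses the group key instead of `l[0]`, which is the same value because every item in a group compares equal to its key.

```python
from itertools import groupby


def rle_encode(iterable):
    """Lazily yield (item, run_length) pairs for runs of adjacent equal items."""
    for key, group in groupby(iterable):
        run = list(group)  # materialize the run once, as the lambda does
        yield key, len(run)


print(list(rle_encode("aaabccdd")))  # [('a', 3), ('b', 1), ('c', 2), ('d', 2)]
```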

Re: [Python-ideas] Run length encoding

2017-06-10 Thread Bernardo Sulzbach
On 2017-06-10 23:20, Neal Fultz wrote: Hello python-ideas, I am very new to this, but on a different forum and after a couple conversations, I really wished Python came with run-length encoding built-in; after all, it ships with zip, which is much more complicated :) The general idea is to

[Python-ideas] Run length encoding

2017-06-10 Thread Neal Fultz
Hello python-ideas, I am very new to this, but on a different forum and after a couple conversations, I really wished Python came with run-length encoding built-in; after all, it ships with zip, which is much more complicated :) The general idea is to be able to go back and forth between two