If what you really want is sparse matrices, you should use those: https://docs.scipy.org/doc/scipy/reference/sparse.html.
Or maybe from the experimental Dask offshoot that I contributed a few lines to: https://github.com/mrocklin/sparse. Either of those will be about two orders of magnitude faster than working with Python lists for numeric data. The reason, I think, that there's no RLE module in Python (standard library, there's probably something on PyPI) is that it's so easy to roll your own with the building blocks in itertools. The `zipfile` and `gzip` modules are written in hundreds or thousands of lines of C code, and more importantly they are for dealing with *files* mostly, not generic sequences... that's the domain of `itertools`, but `itertools` is kept to a bare minimum collection of building blocks from which 1-10 line functions can be built efficiently. On Sat, Jun 10, 2017 at 8:14 PM, Neal Fultz <nfu...@gmail.com> wrote: > Thanks, that's cool. Maybe the root problem is that the docs aren't > using the right words when I google. Run-length-encoding is particularly > relevant for spare matrices, but there's probably a library for those as > well. On the data science side of things, there's a few hundred R packages > that use it there[1]. > > Can you explicate the guiding principle a bit? I'm perplexed that python > would come with zip and gzip but not rle. > > [1] : https://github.com/search?l=R&q=user%3Acran+rle&type=Code& > utf8=%E2%9C%93 > > On Sat, Jun 10, 2017 at 7:59 PM, David Mertz <me...@gnosis.cx> wrote: > >> Here's a one-line version: >> >> from itertools import groupby >> rle_encode = lambda it: ( >> (l[0],len(l)) for g in groupby(it) for l in [list(g[1])]) >> >> Since "not every one line function needs to be in the standard library" >> is a guiding principle of Python, and even moreso of `itertools`, probably >> this is a recipe in the documentation at most. Or maybe it would have a >> home in `more_itertools`. >> >> >> On Sat, Jun 10, 2017 at 7:20 PM, Neal Fultz <nfu...@gmail.com> wrote: >> >>> Hello python-ideas, >>> >>> I am very new to this, but on a different forum and after a couple >>> conversations, I really wished Python came with run-length encoding >>> built-in; after all, it ships with zip, which is much more complicated :) >>> >>> The general idea is to be able to go back and forth between two >>> representations of a sequence: >>> >>> [1,1,1,1,2,3,4,4,3,3,3] >>> >>> and >>> >>> [(1, 4), (2, 1), (3, 1), (4, 2), (3, 3)] >>> >>> where the first element is the data element, and the second is how many >>> times it is repeated. >>> >>> I wrote an encoder/decoder in about 20 lines ( >>> https://github.com/nfultz/rle.py/blob/master/rle.py ) and would like to >>> offer it for the next version; I think it might fit in nicely in the >>> itertools module, for example. I am curious about your thoughts. >>> >>> Best, >>> >>> -Neal >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas@python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/