If the consensus is "Let's add ten lines to the recipes", I'm all aboard; ignore the rest:
if I could have googled a good answer, I would have stopped there.

I won't argue the necessity or obviousness of itertools.groupby, just its name:

* I myself am a false negative who wanted the RLE behavior *and couldn't find it easily*
  * so we should update the docs
* other people have been false positives who wanted a SQL-type group by, but got burned
  * hence the warnings in the docs.
* If you explicitly say "by run", some extra group of them will then know what that means vs the current wording.

I would definitely also support adding helper functions though. I think this is a very common use case which turns up in math/optimization applied to geology, biology, ... , and also fax machines: https://en.wikipedia.org/wiki/Run-length_encoding

Also, if someone rewrote zip in pure Python, would many people actually notice a slowdown vs network latency, disk IO, etc? RLE is a building block just like bisect. :)

Anyway, I'm not claiming my implementation is some huge gift, but let's at least add a recipe or documentation so people can find y'all's way later without reinventing the wheel.

On Sat, Jun 10, 2017 at 10:19 PM, David Mertz <me...@gnosis.cx> wrote:

> If you understand what iterators do, the fact that itertools.groupby
> collects contiguous elements is both obvious and necessary. Iterators
> might be infinitely long... you cannot ask for every "A" that might
> eventually occur in an infinite sequence of letters.
>
> On Sat, Jun 10, 2017 at 10:08 PM, Neal Fultz <nfu...@gmail.com> wrote:
>
>> Agreed to a degree about providing it as code, but it may also be worth
>> mentioning also that zlib itself implements rle [1], and if there was ever
>> a desire to go "python all the way down" you need an RLE somewhere anyway
>> :)
>>
>> That said, I'll be pretty happy with anything that replaces an hour of
>> google/coding/testing/(hour later find out I'm an idiot from a random
>> listserv) with 1 minute of googling. Again, my issue isn't that it was
>> difficult to code, but it *was* hard to make the research-y jump from
>> googling for "run length encoding python", where I knew *exactly* what
>> algorithm I wanted, to "itertools.groupby" which appears to be more
>> general purpose and needs a little tweaking. Adjusting the docs/recipes
>> would probably solve that problem.
>>
>> -- To me this is roughly on the same level as googling for 'binary
>> search python' and not having bisect show up.
>>
>> However, the fact that `itertools.groupby` doesn't group over elements
>> that are not contiguous is a bit surprising to me coming from SQL/pandas/R
>> land (that is probably a large part of my disconnect here). This is
>> actually explicitly called out in the current docs, but I wonder how many
>> people search for one thing and find the other:
>>
>> I googled for RLE and the solution was actually groupby, but probably a
>> lot of other people want a SQL group-by accidentally got an RLE and have to
>> work around that... Then again, I don't know if you all can easily change
>> names of functions at this point.
>>
>> -Neal
>>
>> [1] https://github.com/madler/zlib/blob/master/deflate.c#L2057
>>
>> On Sat, Jun 10, 2017 at 9:39 PM, Greg Ewing <greg.ew...@canterbury.ac.nz>
>> wrote:
>>
>>> In my experience, RLE isn't something you often find on its own.
>>> Usually it's used as part of some compression scheme that also
>>> has ways of encoding verbatim runs of data and maybe other
>>> things.
>>>
>>> So I'm skeptical that it can be usefully provided as a library
>>> function. It seems more like a design pattern than something
>>> you can capture in a library.
>>>
>>> --
>>> Greg
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons. Intellectual property is
> to the 21st century what the slave trade was to the 16th.
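
For anyone who lands on this thread from a search: below is a minimal sketch of the kind of short recipe discussed above, built on itertools.groupby. It is not the implementation proposed in this thread and not part of itertools; the names run_length_encode, run_length_decode, and group_by_key are hypothetical, chosen only for illustration. The last function shows the SQL-style grouping for contrast, which has to consume the whole (finite) input, whereas groupby only groups contiguous runs.

```python
from collections import defaultdict
from itertools import chain, groupby, repeat


def run_length_encode(iterable):
    """Yield (value, run_length) pairs, one per run of equal consecutive items.

    Hypothetical helper name; relies on groupby grouping contiguous items only.
    """
    for value, run in groupby(iterable):
        # Count the run lazily instead of materializing it as a list.
        yield value, sum(1 for _ in run)


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original item sequence."""
    return chain.from_iterable(repeat(value, count) for value, count in pairs)


def group_by_key(iterable, key=None):
    """SQL-style grouping (hypothetical helper): collect all items sharing a key.

    Unlike groupby, items need not be contiguous, so the input must be finite.
    """
    groups = defaultdict(list)
    for item in iterable:
        groups[item if key is None else key(item)].append(item)
    return dict(groups)


data = "AAABBA"
encoded = list(run_length_encode(data))
print(encoded)                              # [('A', 3), ('B', 2), ('A', 1)]
print("".join(run_length_decode(encoded)))  # AAABBA
print(group_by_key(data))                   # {'A': ['A', 'A', 'A', 'A'], 'B': ['B', 'B']}
```

Note that the trailing "A" forms its own run in the encoded output but is merged with the earlier "A"s by group_by_key, which is the difference between the two behaviors discussed in this thread.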
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/