If the consensus is "Let's add ten lines to the recipes", I'm all aboard
and you can ignore the rest:
if I could have googled a good answer, I would have stopped there. I won't
argue the necessity or obviousness of itertools.groupby, just its name:
* I myself am a false negative: I wanted the RLE behavior
  and couldn't find it easily.
* So we should update the docs.
* Other people have been false positives: they wanted a SQL-type group by,
  but got burned.
* Hence the warnings in the docs.
* If you explicitly say "by run", some extra group of them will
  then know what that means, vs. the current wording.
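
The two readings of "group by" in the list above can be shown side by side. This is only an illustrative sketch; the sample string and the variable names are made up for the example:

```python
from collections import defaultdict
from itertools import groupby

data = "AAABBA"  # hypothetical sample input

# itertools.groupby groups consecutive equal items ("by run"),
# so "A" shows up twice in the result.
runs = [(key, len(list(group))) for key, group in groupby(data)]
# runs == [('A', 3), ('B', 2), ('A', 1)]

# A SQL-style GROUP BY collects every occurrence of a key,
# wherever it appears in the input.
counts = defaultdict(int)
for item in data:
    counts[item] += 1
# dict(counts) == {'A': 4, 'B': 2}

# Sorting first makes groupby give the SQL-style answer, but that
# needs the whole (finite) input up front.
sql_like = [(key, len(list(group))) for key, group in groupby(sorted(data))]
# sql_like == [('A', 4), ('B', 2)]
```

Someone expecting the second result and getting the first is the "false positive" case; someone searching for the first and never landing on groupby is the "false negative" case.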
I would definitely also support adding helper functions, though. I think
this is a very common use case, which turns up in math/optimization applied
to geology, biology, ... , and also fax machines:
https://en.wikipedia.org/wiki/Run-length_encoding
Also, if someone rewrote zip in pure Python, would many people actually
notice a slowdown vs. network latency, disk IO, etc.? RLE is a building
block just like bisect.
:) Anyway, I'm not claiming my implementation is some huge gift, but let's
at least add a recipe or documentation so people can find y'all's way later
without reinventing the wheel.
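
A recipe along these lines could be as short as the sketch below. The names `rle_encode` and `rle_decode` are hypothetical and are not existing itertools functions; the only library calls assumed are `itertools.groupby`, `itertools.chain.from_iterable`, and `itertools.repeat`.

```python
from itertools import chain, groupby, repeat


def rle_encode(iterable):
    """Yield (value, run_length) pairs, one per run of equal consecutive items."""
    for value, run in groupby(iterable):
        # Count the run lazily instead of building a list of its items.
        yield value, sum(1 for _ in run)


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original items."""
    return chain.from_iterable(repeat(value, count) for value, count in pairs)


encoded = list(rle_encode("AAABBA"))
# encoded == [('A', 3), ('B', 2), ('A', 1)]
decoded = "".join(rle_decode(encoded))
# decoded == "AAABBA"
```

Both helpers are lazy, so they also work on iterators that are long or unbounded, which fits the point about contiguous grouping made in the quoted reply below.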
On Sat, Jun 10, 2017 at 10:19 PM, David Mertz wrote:
> If you understand what iterators do, the fact that itertools.groupby
> collects contiguous elements is both obvious and necessary. Iterators
> might be infinitely long... you cannot ask for every "A" that might
> eventually occur in an infinite sequence of letters.
>
> On Sat, Jun 10, 2017 at 10:08 PM, Neal Fultz wrote:
>
>> Agreed to a degree about providing it as code, but it may also be worth
>> mentioning that zlib itself implements RLE [1], and if there was ever
>> a desire to go "python all the way down" you need an RLE somewhere anyway
>> :)
>>
>> That said, I'll be pretty happy with anything that replaces an hour of
>> google/coding/testing/(hour later find out I'm an idiot from a random
>> listserv) with 1 minute of googling. Again, my issue isn't that it was
>> difficult to code, but it *was* hard to make the research-y jump from
>> googling for "run length encoding python", where I knew *exactly* what
>> algorithm I wanted, to "itertools.groupby" which appears to be more
>> general purpose and needs a little tweaking. Adjusting the docs/recipes
>> would probably solve that problem.
>>
>> -- To me this is roughly on the same level as googling for 'binary
>> search python' and not having bisect show up.
>>
>> However, the fact that `itertools.groupby` doesn't group over elements
>> that are not contiguous is a bit surprising to me coming from SQL/pandas/R
>> land (that is probably a large part of my disconnect here). This is
>> actually explicitly called out in the current docs, but I wonder how many
>> people search for one thing and find the other:
>>
>> I googled for RLE and the solution was actually groupby, but probably a
>> lot of other people who wanted a SQL group-by accidentally got an RLE and
>> had to work around that... Then again, I don't know if you all can easily
>> change names of functions at this point.
>>
>> -Neal
>>
>> [1] https://github.com/madler/zlib/blob/master/deflate.c#L2057
>>
>>
>>
>> On Sat, Jun 10, 2017 at 9:39 PM, Greg Ewing wrote:
>>
>>> In my experience, RLE isn't something you often find on its own.
>>> Usually it's used as part of some compression scheme that also
>>> has ways of encoding verbatim runs of data and maybe other
>>> things.
>>>
>>> So I'm skeptical that it can be usefully provided as a library
>>> function. It seems more like a design pattern than something
>>> you can capture in a library.
>>>
>>> --
>>> Greg
>>>
>>>
>>> ___
>>> Python-ideas mailing list
>>> Python-ideas@python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>
>>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons. Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>