Re: [Python-ideas] Run length encoding

2017-06-11 Thread Serhiy Storchaka

11.06.17 09:17, Neal Fultz wrote:
> * other people have been false positive and wanted a SQL-type group by,
>   but got burned
>   * hence the warnings in the docs.


This wouldn't help if people don't read the docs.

> Also, if someone rewrote zip in pure python, would many people actually
> notice a slow down vs network latency, disk IO, etc?


Definitely yes.


> RLE is a building block just like bisect.


This is a very specific building block. And if ZIP compression were rewritten
in pure Python it wouldn't use


FYI, there are multiple compression methods supported in ZIP files, but
the zipfile module does not implement all of them. In particular, simple
RLE-based methods are not implemented (they are almost never used in the
real world now). I suppose that if the zipfile module implemented these
algorithms, it wouldn't use any general RLE implementation.


https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
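
For reference, a small sketch of the compression methods the zipfile module does expose. The constants are the standard library's own; the archive member name "runs.txt" and the payload are made up for illustration, and ZIP_DEFLATED assumes zlib is available in the build.

```python
import io
import zipfile

# Compression method constants exposed by the zipfile module. The numeric
# values are the method IDs defined by the ZIP format.
methods = {
    "ZIP_STORED": zipfile.ZIP_STORED,      # 0: no compression
    "ZIP_DEFLATED": zipfile.ZIP_DEFLATED,  # 8: requires zlib
    "ZIP_BZIP2": zipfile.ZIP_BZIP2,        # 12: requires bz2
    "ZIP_LZMA": zipfile.ZIP_LZMA,          # 14: requires lzma
}
for name, method_id in methods.items():
    print(name, method_id)

# Round-trip a highly repetitive payload through an in-memory archive.
payload = b"A" * 1000
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("runs.txt", payload)  # hypothetical member name

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("runs.txt")
    restored = archive.read("runs.txt")

print(info.file_size, info.compress_size)
```

Long runs like this payload compress well under DEFLATE, with no separate RLE step exposed to the caller.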

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Run length encoding

2017-06-11 Thread Neal Fultz
If the consensus is "Let's add ten lines to the recipes" I'm all aboard,
ignore the rest:

If I could have googled a good answer I would have stopped there. I won't
argue the necessity or obviousness of itertools.groupby, just its name:
  * I myself am a false negative that wanted the RLE behavior
    and couldn't find it easily
     * so we should update the docs
  * other people have been false positive and wanted a SQL-type group by,
    but got burned
     * hence the warnings in the docs.
  * If you explicitly say "by run", some extra group of them will
    then know what that means vs the current wording.
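
The difference between grouping "by run" and a SQL-type group by can be sketched as follows. This is only an illustration with made-up sample data; the sort-first and dict-based variants are common workarounds, not something proposed in this thread.

```python
from collections import defaultdict
from itertools import groupby

data = ["a", "a", "b", "a"]  # made-up sample data

# itertools.groupby starts a new group whenever the key changes,
# so the trailing "a" forms its own group (grouping "by run").
runs = [(key, len(list(group))) for key, group in groupby(data)]

# SQL-style GROUP BY, variant 1: sort first so equal keys are contiguous.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(data))]

# SQL-style GROUP BY, variant 2: accumulate counts in a dict.
counts = defaultdict(int)
for item in data:
    counts[item] += 1

print(runs)           # [('a', 2), ('b', 1), ('a', 1)]
print(sorted_groups)  # [('a', 3), ('b', 1)]
print(dict(counts))   # {'a': 3, 'b': 1}
```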


I would definitely also support adding helper functions, though. I think
this is a very common use case, which turns up in math/optimization applied
to geology, biology, ... , and also fax machines:
https://en.wikipedia.org/wiki/Run-length_encoding

Also, if someone rewrote zip in pure python, would many people actually
notice a slow down vs network latency, disk IO,  etc? RLE is a building
block just like bisect.

:) Anyway, I'm not claiming my implementation is some huge gift, but let's
at least add a recipe or documentation so people can find y'all's way later
without reinventing the wheel.
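
A recipe along these lines might look like the sketch below. The names rle_encode and rle_decode are hypothetical, chosen for illustration; they are not existing itertools functions and not necessarily the implementation discussed in this thread.

```python
from itertools import chain, groupby, repeat


def rle_encode(iterable):
    """Yield (value, run_length) pairs, one per run of equal items."""
    for value, run in groupby(iterable):
        # Count the run lazily instead of building a list of its items.
        yield value, sum(1 for _ in run)


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original items."""
    return chain.from_iterable(repeat(value, count) for value, count in pairs)


encoded = list(rle_encode("AAAABBBCCD"))
print(encoded)                       # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print("".join(rle_decode(encoded)))  # AAAABBBCCD
```

Because rle_encode is a generator built on groupby, it only ever looks at contiguous items and so also works on very long or unbounded iterators.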





On Sat, Jun 10, 2017 at 10:19 PM, David Mertz  wrote:

> If you understand what iterators do, the fact that itertools.groupby
> collects contiguous elements is both obvious and necessary.  Iterators
> might be infinitely long... you cannot ask for every "A" that might
> eventually occur in an infinite sequence of letters.
>
> On Sat, Jun 10, 2017 at 10:08 PM, Neal Fultz  wrote:
>
>> Agreed to a degree about providing it as code, but it may also be worth
>> mentioning that zlib itself implements RLE [1], and if there were ever
>> a desire to go "python all the way down" you need an RLE somewhere anyway
>> :)
>>
>> That said, I'll be pretty happy with anything that replaces an hour of
>> google/coding/testing/(hour later find out I'm an idiot from a random
>> listserv) with 1 minute of googling.  Again, my issue isn't that it was
>> difficult to code, but it *was* hard to make the research-y jump from
>> googling for "run length encoding python", where I knew *exactly* what
>> algorithm I wanted, to  "itertools.groupby" which appears to be more
>> general purpose and needs a little tweaking.  Adjusting the docs/recipes
>> would probably solve that problem.
>>
>>  -- To me this is roughly on the same level as googling for 'binary
>> search python' and not having bisect show up.
>>
>> However, the fact that  `itertools.groupby` doesn't group over elements
>> that are not contiguous is a bit surprising to me coming from SQL/pandas/R
>> land (that is probably a large part of my disconnect here). This is
>> actually explicitly called out in the current docs, but I wonder how many
>> people search for one thing and find the other:
>>
>>  I googled for RLE and the solution was actually groupby, but probably a
>> lot of other people who wanted a SQL group-by accidentally got an RLE and
>> had to work around that... Then again, I don't know if you all can easily
>> change names of functions at this point.
>>
>> -Neal
>>
>> [1] https://github.com/madler/zlib/blob/master/deflate.c#L2057
>>
>>
>>
>> On Sat, Jun 10, 2017 at 9:39 PM, Greg Ewing 
>> wrote:
>>
>>> In my experience, RLE isn't something you often find on its own.
>>> Usually it's used as part of some compression scheme that also
>>> has ways of encoding verbatim runs of data and maybe other
>>> things.
>>>
>>> So I'm skeptical that it can be usefully provided as a library
>>> function. It seems more like a design pattern than something
>>> you can capture in a library.
>>>
>>> --
>>> Greg
>>>
>>>
>>>
>>
>>
>>
>>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>