You can certainly get fancy and apply delta encoding + entropy
compression, as is done in Parquet, a high-performance data storage
format:
https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5

(the linked paper from Lemire and Boytsov gives a lot of ideas)
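
To make the idea concrete, here is a minimal sketch of delta encoding
followed by a variable-length integer encoding. It is NOT Parquet's
DELTA_BINARY_PACKED layout (which bit-packs deltas in blocks), only a
simplified illustration; the function names and the zigzag + varint
combination are assumptions chosen for this example.

```python
def zigzag(n: int) -> int:
    """Map a signed int to an unsigned one so small magnitudes stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def write_varint(out: bytearray, value: int) -> None:
    """Append a non-negative int, 7 bits per byte, high bit = 'more follows'."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_varint(data: bytes, pos: int):
    """Read one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_columns(columns) -> bytes:
    """Store each column as the (zigzag, varint) delta from the previous one."""
    out = bytearray()
    previous = 0
    for column in columns:
        write_varint(out, zigzag(column - previous))
        previous = column
    return bytes(out)


def decode_columns(data: bytes) -> list:
    """Rebuild the columns by accumulating the decoded deltas in order."""
    columns = []
    pos = 0
    previous = 0
    while pos < len(data):
        delta, pos = read_varint(data, pos)
        previous += unzigzag(delta)
        columns.append(previous)
    return columns
```

Note that decoding has to walk the table from the start, so this kind of
scheme also gives up indexing the table by instruction offset.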

But it would be weird to apply that level of engineering when we have
never bothered to compress docstrings.
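
For comparison, the much simpler escape-byte scheme that Larry suggests
in the quoted message below could look roughly like this. It is only a
sketch: the names pack_columns/unpack_columns and the little-endian byte
order are assumptions made for the example, not anything that was
decided in this thread.

```python
ESCAPE = 255  # marker byte: "the real value follows in the next two bytes"


def pack_columns(columns) -> bytes:
    """One byte per column below 255, otherwise ESCAPE plus two bytes."""
    out = bytearray()
    for column in columns:
        if column < ESCAPE:
            out.append(column)
        else:
            out.append(ESCAPE)
            # Raises OverflowError for columns above 65535.
            out += column.to_bytes(2, "little")
    return bytes(out)


def unpack_columns(data: bytes) -> list:
    """Decode sequentially; entries are 1 or 3 bytes long."""
    columns = []
    pos = 0
    while pos < len(data):
        if data[pos] < ESCAPE:
            columns.append(data[pos])
            pos += 1
        else:
            columns.append(int.from_bytes(data[pos + 1:pos + 3], "little"))
            pos += 3
    return columns
```

Since entries are no longer a fixed size, finding the N-th entry means
scanning from the beginning, which is the indexing cost Pablo mentions.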

Regards

Antoine.



On Fri, 7 May 2021 23:30:46 +0100
Pablo Galindo Salgado <pablog...@gmail.com> wrote:
> This is actually a very good point. The only disadvantage is that it
> complicates the parsing a bit and we lose the possibility of indexing
> the table by instruction offset.
> 
> On Fri, 7 May 2021 at 23:01, Larry Hastings <la...@hastings.org> wrote:
> 
> > On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
> >
> > Given that column numbers are not very big compared with line numbers, we
> > plan to store these as unsigned chars
> > or unsigned shorts. We ran some experiments over the standard library and
> > we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28 MB and the
> > extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> > extra size is 0.44 MB).
> >
> > One of the disadvantages of using chars is that we can only report columns
> > from 1 to 255 so if an error happens in a column
> > bigger than that then we would have to exclude it (and not show the
> > highlighting) for that frame. Unsigned short will allow
> > the values to go from 0 to 65535.
> >
> > Are lnotab entries required to be a fixed size?  If not:
> >
> > if column < 255:
> >     lnotab.write_one_byte(column)
> > else:
> >     lnotab.write_one_byte(255)
> >     lnotab.write_two_bytes(column)
> >
> >
> > I might even write four bytes instead of two in the latter case.
> >
> >
> > */arry*
> > _______________________________________________
> > Python-Dev mailing list -- python-dev@python.org
> > To unsubscribe send an email to python-dev-le...@python.org
> > https://mail.python.org/mailman3/lists/python-dev.python.org/
> > Message archived at
> > https://mail.python.org/archives/list/python-dev@python.org/message/B3SFCZPXIKGO3LM6UJVSJXFIRAZH2R26/
> > Code of Conduct: http://python.org/psf/codeofconduct/
> >  
> 



_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UOCHN5ZY3ERPNWOCO2SJRTCDTEYMYVD7/
Code of Conduct: http://python.org/psf/codeofconduct/