Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:

For what it's worth, I've had an idea about string storage, roughly based on 
how *nix stores data on disk.

If a string is small, point to a block of codepoints.

If a string is medium-sized, point to a block of pointers to codepoint blocks.

If a string is large, point to a block of pointers to pointer blocks, which 
in turn point to codepoint blocks.

This means that a large string doesn't need a single large allocation.

The level of indirection can be increased as necessary.
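
To make the indirection concrete, here is a rough Python sketch of building 
and indexing such a structure. BLOCK_SIZE, FANOUT, build() and char_at() are 
names and values made up for illustration (the sizes are deliberately tiny), 
and plain lists and str slices stand in for pointer blocks and codepoint 
blocks.

```python
BLOCK_SIZE = 4  # codepoints per codepoint block (hypothetical, tiny for demo)
FANOUT = 4      # pointers per pointer block (hypothetical)


def build(text):
    """Return (root, depth); depth is the number of pointer levels."""
    # Leaf level: fixed-size codepoint blocks; only the last may be shorter.
    level = [text[i:i + BLOCK_SIZE]
             for i in range(0, len(text), BLOCK_SIZE)] or [""]
    depth = 0
    # Add a level of pointer blocks until a single root block remains.
    while len(level) > 1:
        level = [level[i:i + FANOUT] for i in range(0, len(level), FANOUT)]
        depth += 1
    return level[0], depth


def char_at(root, depth, index):
    """Follow one pointer per level, then index into the codepoint block."""
    node = root
    span = BLOCK_SIZE * FANOUT ** depth  # codepoints a full node would cover
    for _ in range(depth):
        span //= FANOUT                  # codepoints covered by each child
        node = node[index // span]
        index %= span
    return node[index]
```

Because every codepoint block except the last is full, the child to follow 
at each level is found by division alone, so a lookup costs one step per 
level of indirection.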

For simplicity, all codepoint blocks contain the same number of codepoints, 
except the final codepoint block, which may contain fewer.

A codepoint block may use the minimum width necessary (1, 2 or 4 bytes) to 
store all of its codepoints.

This means that there are no surrogates and that different sections of the 
string can be stored in different widths to reduce memory usage.
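
A minimal sketch of the per-block width idea, again in Python and with 
made-up names (BLOCK_SIZE, width_for(), pack() and char_at() are all 
hypothetical). Each block is packed as fixed-width little-endian integers at 
the narrowest of 1, 2 or 4 bytes that fits its largest codepoint. The blocks 
are kept in a flat list here to keep the example short.

```python
BLOCK_SIZE = 4  # codepoints per block (hypothetical, tiny for demo)


def width_for(block):
    """Smallest of 1, 2 or 4 bytes that holds every codepoint in the block."""
    top = max(map(ord, block), default=0)
    return 1 if top < 0x100 else 2 if top < 0x10000 else 4


def pack_block(block):
    """Return (width, data) with each codepoint stored in `width` bytes."""
    width = width_for(block)
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in block)
    return width, data


def pack(text):
    """Split text into blocks and pack each one at its own width."""
    return [pack_block(text[i:i + BLOCK_SIZE])
            for i in range(0, len(text), BLOCK_SIZE)]


def char_at(blocks, index):
    """Fixed-width units mean the offset inside a block is a multiplication."""
    width, data = blocks[index // BLOCK_SIZE]
    offset = (index % BLOCK_SIZE) * width
    return chr(int.from_bytes(data[offset:offset + width], "little"))
```

With this layout a single non-BMP codepoint widens only its own block to 4 
bytes per codepoint; the rest of the string stays at whatever width its own 
contents need.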

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________