On Tue, Jan 25, 2011 at 5:43 PM, M.-A. Lemburg <m...@egenix.com> wrote:
> I also don't see how this could save a lot of memory. As an example
> take a French text with say 10mio code points. This would end up
> appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB),
> one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
> on how many accents are used). That's a saving of -10MB compared to
> today's implementation :-)

If I am reading the pep right, which I may not be as I am no expert on
unicode, the new implementation would actually give a 10MB saving
since the wchar field is optional, so only the str (Latin-1) and utf8
fields would need to be stored. How it decides not to store one field
or another would need to be clarified in the pep is I am right.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to