Marc-Andre Lemburg <m...@egenix.com> added the comment:

All string length calculations in Python 2.4 are done using C ints, which are 32-bit even on 64-bit platforms.
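
To make the 32-bit limit concrete, here is a minimal sketch in Python. It assumes a signed 32-bit C int; the helper names `fits_in_c_int` and `wrap_to_int32` are hypothetical and are not part of CPython. The wraparound helper only shows what a two's-complement machine would typically produce, since C itself leaves signed overflow undefined.

```python
# Largest value a signed 32-bit C int can hold (assumption: 32-bit int).
INT_MAX = 2**31 - 1


def fits_in_c_int(n_items, bytes_per_item=1):
    """Return True if n_items * bytes_per_item fits in a signed 32-bit int."""
    return n_items * bytes_per_item <= INT_MAX


def wrap_to_int32(value):
    """Simulate two's-complement wraparound of a signed 32-bit int."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


if __name__ == "__main__":
    n = 512 * 2**20  # 512M items
    print(fits_in_c_int(n))      # the length itself fits
    print(fits_in_c_int(n, 4))   # a buffer of 4 bytes per item does not
    print(wrap_to_int32(n * 4))  # the size as a wrapped 32-bit value
```

The length alone is well within range; it is the multiplied buffer size that no longer fits, which is why the failure only shows up for very large strings.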
Since UTF-8 can use up to 4 bytes per Unicode code point, the encoder overallocates the output buffer to len*4 bytes. If you try to encode a Unicode string of 512M code points, that request comes to 2GB, which is already more than a signed 32-bit int can hold.

The reason for using ints to represent string lengths is simple: when the string API was designed, no one really expected anyone to work with 2GB strings in memory (large hard drives held around 2GB at that time). Strings of that size are simply not supported by Python 2.4.

BTW: I wouldn't really count on Python 2.4 working properly on 64-bit platforms. A lot of issues related to 32/64-bit differences were fixed in Python 2.5.

----------
nosy: +lemburg
title: SystemError/MemoryError/OverflowErrors on encode() a unicode string -> SystemError/MemoryError/OverflowErrors on encode() a unicode string

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7551>
_______________________________________