[issue7551] SystemError/MemoryError/OverflowErrors on encode() a unicode string
Marc-Andre Lemburg m...@egenix.com added the comment:

All string length calculations in Python 2.4 are done using C ints, which are 32-bit even on 64-bit platforms. Since UTF-8 can use up to 4 bytes per Unicode code point, the encoder overallocates the needed chunk of memory to len*4 bytes. This goes straight over the 2GB limit that a 32-bit int imposes if you try to encode a 512M code point Unicode string.

The reason for using ints to represent string lengths is simple: at the time the string API was designed, no one really expected that someone would work with 2GB strings in memory (large hard drives had around 2GB at that time). Strings of such size are simply not supported by Python 2.4.

BTW: I wouldn't really count on Python 2.4 working properly on 64-bit platforms. A lot of issues related to 32/64-bit differences were fixed in Python 2.5.

--
nosy: +lemburg
title: SystemError/MemoryError/OverflowErrors on encode() a unicode string -> SystemError/MemoryError/OverflowErrors on encode() a unicode string

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7551
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
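A minimal sketch of the arithmetic in the comment above, assuming (as the comment states) that the encoder requests len*4 bytes and that this product is held in a 32-bit signed C int that wraps around in two's complement. Strictly, signed overflow is undefined behaviour in C; wraparound is simply what typical platforms produce. `wrap_int32` and `utf8_buffer_size_int32` are hypothetical helpers written for illustration, not CPython functions. The three sizes are the ones from the original report.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce value to a signed 32-bit integer (two's-complement wraparound)."""
    value &= 0xFFFFFFFF          # keep only the low 32 bits
    if value > INT32_MAX:        # top bit set -> negative in a signed int
        value -= 2**32
    return value


def utf8_buffer_size_int32(num_code_points, bytes_per_code_point=4):
    """Hypothetical: the len*4 overallocation computed in a 32-bit C int."""
    return wrap_int32(num_code_points * bytes_per_code_point)


for n in (601794657, 512 * 1024 * 1024, 256 * 1024 * 1024 - 1):
    print("%d code points -> %d" % (n, utf8_buffer_size_int32(n)))
```

Under these assumptions the first two sizes wrap to negative values (-1887788668 and -2147483648), which matches the "Negative size passed to PyString_FromStringAndSize" errors. For the third size the product (1073741820) still fits in an int, so this arithmetic alone does not account for the OverflowError reported for that case.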
New submission from Andreas Jung aj...@users.sourceforge.net:

We encountered a pretty bizarre behavior of Python 2.4.6 while encoding a 600MB long unicode string 'data':

Python 2.4.6 (8GB RAM, 64 bit)

(Pdb) type(data)
<type 'unicode'>
(Pdb) len(data)
601794657
(Pdb) data2=data.encode('utf-8')
*** SystemError: Negative size passed to PyString_FromStringAndSize

Assuming that this has something to do with a 512MB limit:

(Pdb) data2=data[:512*1024*1024].encode('utf-8')
*** SystemError: Negative size passed to PyString_FromStringAndSize

Same bug... now with 256M - 1 characters:

(Pdb) data2=data[:(256*1024*1024)-1].encode('utf-8')
OverflowError

Cross-check on a different Linux box (4GB RAM, 4GB swap, 64 bit):

aj...@blackmoon:~ python2.4
Python 2.4.5 (#1, Jun 9 2008, 10:35:12)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = u'x'*601794657
>>> data2= data.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
MemoryError

Where is this different behavior coming from?

--
messages: 96695
nosy: ajung
severity: normal
status: open
title: SystemError/MemoryError/OverflowErrors on encode() a unicode string
versions: Python 2.4
Mark Dickinson dicki...@gmail.com added the comment:

Is the first machine also a Linux machine? Perhaps the difference is that the first machine has a wide-unicode build (i.e., it uses UCS4 internally) and the other doesn't?

Unfortunately there's not much that the python-devs can do about this unless the problem is still present in Python 2.6: Python 2.4 is now more than 5 years old and is no longer maintained, while Python 2.5 is only receiving security fixes at this stage. Can you reproduce the problem with Python 2.6?

--
nosy: +mark.dickinson
resolution:  -> out of date
status: open -> pending
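One way to answer the wide-vs-narrow question is to inspect `sys.maxunicode`, which is 65535 (0xFFFF) on a narrow (UCS2) build and 1114111 (0x10FFFF) on a wide (UCS4) build. The helper name below is hypothetical, and the string formatting is chosen so the snippet runs unchanged on old Python 2 interpreters as well as Python 3. On Python 3.3 and later (PEP 393) the narrow/wide distinction no longer exists and `sys.maxunicode` is always 1114111.

```python
import sys


def unicode_build_kind():
    """Hypothetical helper: classify the interpreter's Unicode build."""
    if sys.maxunicode > 0xFFFF:
        return "wide (UCS4)"
    return "narrow (UCS2)"


print("%d %s" % (sys.maxunicode, unicode_build_kind()))
```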
Andreas Jung aj...@users.sourceforge.net added the comment:

Both systems are Linux systems running a narrow Python build. The problem does not occur with Python 2.5 or 2.6. Unfortunately, this error occurs with Zope 2, which (at least in versions prior to Zope 2.12) is tied to Python 2.4.

--
status: pending -> open
Mark Dickinson dicki...@gmail.com added the comment:

Well, the signature of PyUnicode_Encode in Python 2.4 (see Objects/unicodeobject.c) is:

    PyObject *PyUnicode_Encode(const Py_UNICODE *s, int size, const char *encoding, const char *errors)

which looks like it might be relevant to the problems you're seeing. In 2.6, the size has type Py_ssize_t instead, which should be a 64-bit type on 64-bit Linux.

Closing this, since it's out of date for current Python.

--
status: open -> closed
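The difference between an `int size` and a `Py_ssize_t size` parameter can be imitated from Python with `ctypes`, whose integer types do no overflow checking: storing the same value in a `c_int` and in a `c_ssize_t` shows the truncation a 32-bit length suffers. This is a sketch under assumptions: `c_ssize_t` stands in for `Py_ssize_t` (both are the platform's signed size type on common POSIX builds), and the sample value assumes the len*4 UTF-8 overallocation for the 601794657-character string from the report.

```python
import ctypes

# Bytes the 2.4 UTF-8 encoder would request for the reported string,
# assuming a len*4 overallocation (601794657 code points * 4 bytes).
requested = 601794657 * 4

# ctypes integer types do no overflow checking: the value is silently
# truncated to the width of the C type, much like storing it in a C int.
as_int = ctypes.c_int(requested).value          # Python 2.4 style: int size
as_ssize_t = ctypes.c_ssize_t(requested).value  # Python 2.6 style: Py_ssize_t size

print("int:        %d bytes -> %d" % (ctypes.sizeof(ctypes.c_int), as_int))
print("Py_ssize_t: %d bytes -> %d" % (ctypes.sizeof(ctypes.c_ssize_t), as_ssize_t))
```

On a 64-bit Linux build the int version becomes negative while the 8-byte Py_ssize_t version keeps the full value, which is consistent with the problem disappearing in 2.5 and 2.6.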
Martin v. Löwis mar...@v.loewis.de added the comment:

Just to support Mark's decision: Python 2.4 is no longer maintained; you are on your own with any problems you encounter with it. So closing it as "won't fix" would also have been appropriate. The same holds for 2.5, unless you can demonstrate that this causes security issues (e.g. crashing the Python interpreter).

--
nosy: +loewis