[issue7551] SystemError/MemoryError/OverflowErrors on encode() a unicode string

2009-12-21 Thread Marc-Andre Lemburg

Marc-Andre Lemburg <m...@egenix.com> added the comment:

All string length calculations in Python 2.4 are done using C ints,
which are 32 bits wide even on 64-bit platforms.

Since UTF-8 can use up to 4 bytes per Unicode code point, the encoder
over-allocates a buffer of len*4 bytes. That goes straight past
the 2GB limit imposed by the signed 32-bit int if you try to
encode a Unicode string of 512M code points.
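
The arithmetic above can be sketched in modern Python. This is an illustration only, not the Python 2.4 encoder: the helper name wrap_int32 is made up here, and it merely models the wraparound a signed 32-bit C int typically shows when asked to hold len*4.

```python
INT_MAX = 2**31 - 1  # largest value a signed 32-bit C int can hold


def wrap_int32(value):
    """Model two's-complement wraparound of a signed 32-bit int (hypothetical helper)."""
    return (value + 2**31) % 2**32 - 2**31


# Lengths (in code points) taken from the report in this thread.
lengths = [
    256 * 1024 * 1024 - 1,
    512 * 1024 * 1024 - 1,
    512 * 1024 * 1024,
    601794657,
]

for length in lengths:
    wanted = length * 4              # worst-case UTF-8 buffer size in bytes
    seen_by_c = wrap_int32(wanted)   # what a 32-bit int would end up storing
    status = "overflows" if wanted > INT_MAX else "fits"
    print(length, wanted, seen_by_c, status)
```

Under this model, both 512M code points and the 601794657 code points from the report yield a negative size, which is consistent with the "Negative size passed to PyString_FromStringAndSize" message.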

The reason for using ints to represent string length is simple:
no one really expected that someone would work with 2GB strings
in memory at the time the string API was designed (large hard
drives had around 2GB at that time) - strings of such size are
simply not supported by Python 2.4.

BTW: I wouldn't really count on Python 2.4 working properly on
64-bit platforms. A lot of issues were fixed in Python 2.5
related to 32/64-bit differences.

--
nosy: +lemburg
title: SystemError/MemoryError/OverflowErrors on encode() a unicode string -> SystemError/MemoryError/OverflowErrors on encode() a  unicode string

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7551
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7551] SystemError/MemoryError/OverflowErrors on encode() a unicode string

2009-12-20 Thread Andreas Jung

New submission from Andreas Jung <aj...@users.sourceforge.net>:

We encountered some pretty bizarre behavior in Python 2.4.6 while encoding
a Unicode string 'data' of about 600M characters:

Python 2.4.6 (8GB RAM, 64 bit)

(Pdb) type(data)
<type 'unicode'>

(Pdb) len(data)
601794657

(Pdb) data2=data.encode('utf-8')
*** SystemError: Negative size passed to PyString_FromStringAndSize

Assuming that this has something to do with a 512MB limit:

(Pdb) data2=data[:512*1024*1024].encode('utf-8')
*** SystemError: Negative size passed to PyString_FromStringAndSize

Same bug...now with 512MB - 1 byte:

(Pdb) data2=data[:(256*1024*1024)-1].encode('utf-8')
OverflowError

Cross-check on a different Linux box (4GB RAM, 4 GB Swap, 64 bit)

aj...@blackmoon:~> python2.4
Python 2.4.5 (#1, Jun  9 2008, 10:35:12)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = u'x'*601794657
>>> data2= data.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
MemoryError

Where is this different behavior coming from?

--
messages: 96695
nosy: ajung
severity: normal
status: open
title: SystemError/MemoryError/OverflowErrors on encode() a unicode string
versions: Python 2.4




[issue7551] SystemError/MemoryError/OverflowErrors on encode() a unicode string

2009-12-20 Thread Mark Dickinson

Mark Dickinson <dicki...@gmail.com> added the comment:

Is the first machine also a Linux machine?  Perhaps the difference is that 
the first machine has a wide-unicode build (i.e., it uses UCS4 internally) 
and the other doesn't?
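
For reference, whether an interpreter is a narrow or a wide build can be read off sys.maxunicode. A small sketch follows; the function name build_kind is hypothetical, and on Python 3.3 and later the distinction no longer exists, so the value is always 0x10FFFF there.

```python
import sys


def build_kind(maxunicode=sys.maxunicode):
    """Classify a build from its sys.maxunicode value (hypothetical helper)."""
    if maxunicode == 0xFFFF:
        return "narrow (UCS-2, 2 bytes per code unit)"
    return "wide (UCS-4, 4 bytes per code unit)"


print(hex(sys.maxunicode), build_kind())
```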

Unfortunately there's not much that the python-devs can do about this 
unless the problem is still present in Python 2.6:  Python 2.4 is now more 
than 5 years old and is no longer maintained, while Python 2.5 is only 
receiving security fixes at this stage.  Can you reproduce the problem 
with Python 2.6?

--
nosy: +mark.dickinson
resolution:  -> out of date
status: open -> pending




[issue7551] SystemError/MemoryError/OverflowErrors on encode() a unicode string

2009-12-20 Thread Andreas Jung

Andreas Jung <aj...@users.sourceforge.net> added the comment:

Both systems are Linux systems running a narrow Python build.

The problem does not occur with Python 2.5 or 2.6.

Unfortunately this error occurs with Zope 2, which is tied to Python 2.4
(at least for versions prior to Zope 2.12).

--
status: pending -> open




[issue7551] SystemError/MemoryError/OverflowErrors on encode() a unicode string

2009-12-20 Thread Mark Dickinson

Mark Dickinson <dicki...@gmail.com> added the comment:

Well, the signature of PyUnicode_Encode in Python 2.4 (see 
Objects/unicodeobject.c) is:

PyObject *PyUnicode_Encode(const Py_UNICODE *s,
                           int size,
                           const char *encoding,
                           const char *errors)

which looks like it might be relevant to the problems you're seeing.  In 
2.6, the size has type Py_ssize_t instead, which should be a 64-bit type 
on 64-bit Linux.
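
A quick way to compare the two size types on a given machine, sketched with ctypes. Assumptions: CPython 2.7 or later, where ctypes.c_ssize_t and sys.maxsize are available; the helper signed_max and the variable names are illustrative only.

```python
import ctypes
import sys


def signed_max(ctype):
    """Largest value representable by a signed C integer type."""
    bits = 8 * ctypes.sizeof(ctype)
    return 2 ** (bits - 1) - 1


int_max = signed_max(ctypes.c_int)        # limit of the 2.4-style `int size`
ssize_max = signed_max(ctypes.c_ssize_t)  # limit of Py_ssize_t

print("C int max:     ", int_max)
print("Py_ssize_t max:", ssize_max)
print("sys.maxsize:   ", sys.maxsize)     # CPython exposes the Py_ssize_t limit here
```

On a typical 64-bit Linux build the first value is far smaller than the other two, which is the gap between the 2.4 signature and the later one.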

Closing this, since it's out of date for current Python.

--
status: open -> closed




[issue7551] SystemError/MemoryError/OverflowErrors on encode() a unicode string

2009-12-20 Thread Martin v . Löwis

Martin v. Löwis <mar...@v.loewis.de> added the comment:

Just to support Mark's decision: Python 2.4 is no longer maintained; you
are on your own with any problems you encounter with it. So closing it
as "won't fix" would also have been appropriate.

The same holds for 2.5, unless you can demonstrate this to cause
security issues (e.g. crashing the Python interpreter).

--
nosy: +loewis
