Python appears to enter an infinite loop when reading data from a malformed
Big5-encoded file through the codecs library.  This behaviour can be
observed under Linux and FreeBSD, using Python 2.4 and 2.5.  A really simple
example illustrating the bug follows:

Python 2.4.4 (#1, May 15 2007, 13:33:55)
[GCC 4.1.1 (Gentoo 4.1.1-r3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> fname='out'
>>> outfd=open(fname,'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>>
>>> infd=codecs.open(fname, encoding='big5')
>>> infd.read(1024)

And then it hangs forever.  If I instead try to interrupt the read with an
alarm signal, using the following code:

Python 2.5 (r25:51908, Jan  8 2007, 19:09:28)
[GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs, signal
>>> fname='out'
>>> def handler(*args):
...   raise Exception("boo!")
...
>>> signal.signal(signal.SIGALRM, handler)
0
>>> outfd=open(fname, 'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>>
>>> infd=codecs.open(fname, encoding='big5')
>>> signal.alarm(5)
0
>>> infd.read(1024)

The program still hangs forever.  The program can be made to crash if I
don't install a signal handler at all, but that's pretty lame.  It looks
like the entire interpreter is being locked up by this read, so I don't
think there's likely to be a pure-Python workaround, but I thought it would
be a good bug to have out there so a future version of Python can
(hopefully) fix this.
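
The alarm handler never firing fits that picture: Python-level signal
handlers only run once control returns to the bytecode interpreter, so a
loop stuck inside C code never gives the handler a chance.  One way around
that, if spawning a child process is acceptable, is to do the read in a
separate interpreter and kill it from outside when it takes too long.  The
following is only a rough sketch of that idea, not a fix for the codec.  It
assumes a modern Python 3 (subprocess.run with a timeout does not exist in
2.4 or 2.5), and the names run_isolated and READ_SNIPPET are made up for
illustration.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=5.0):
    """Run `snippet` in a fresh interpreter and return (status, output).

    status is "ok", "error" (non-zero exit), or "timeout" (child killed).
    Hypothetical helper, not part of any library.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    if proc.returncode != 0:
        return ("error", proc.stderr)
    return ("ok", proc.stdout)


# The read from the report above, as it would run in the child.
# Assumes the malformed file 'out' already exists in the working directory.
READ_SNIPPET = (
    "import codecs\n"
    "infd = codecs.open('out', encoding='big5')\n"
    "print(repr(infd.read(1024)))\n"
)
```

Calling run_isolated(READ_SNIPPET) on an affected interpreter should come
back with a "timeout" status instead of locking up the caller, since the
kill comes from the operating system and does not depend on the stuck
interpreter running any Python code.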
-- 
http://mail.python.org/mailman/listinfo/python-list