New submission from STINNER Victor <vstin...@redhat.com>:

Using the attached sf.py and sf.xml, I can crash Python. lxml builds a fake 
traceback to inject the XML filename and the XML line number where the parsing 
error occurs. The problem is that the filename is a bytes object, whereas 
print_exception() expects the filename to be a Unicode string.
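
As a rough illustration of the type mismatch (this is neither lxml's code nor 
the attached fix), the sketch below builds a SyntaxError-like exception and 
decodes a bytes filename to str before attaching it. FakeParseError, 
make_parse_error() and the choice of os.fsdecode() are assumptions made only 
for this example.

```python
import os


class FakeParseError(SyntaxError):
    """Hypothetical stand-in for an extension's parse error (not lxml's class)."""


def make_parse_error(message, filename, lineno):
    """Build a SyntaxError-like exception whose filename is always a str."""
    if isinstance(filename, bytes):
        # Assumption: the bytes use the filesystem encoding.
        filename = os.fsdecode(filename)
    err = FakeParseError(message)
    err.filename = filename
    err.lineno = lineno
    return err


err = make_parse_error("parse error", b"sf.xml", 1)
print(type(err.filename).__name__, err.filename)
```

Normalizing on the producer side keeps the exception usable by any consumer 
that assumes a str filename.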

The attached PR fixes the crash.

Fedora bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1665490


Example:

$ python3 sf.py
<lxml.etree._ElementTree object at 0x7f7d0f8abd08>
Traceback (most recent call last):
  File "sf.py", line 6, in <module>
    xml2 = etree.parse("sf.xml")
  File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1840, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1866, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1770, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 651, in lxml.etree._raiseParseError
Segmentation fault (core dumped)


(gdb) frame 6
#6  0x00007ffff7c85898 in print_exception (value=None, f=<_io.TextIOWrapper at remote 0x7fffea910708>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:753
753     /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c: No such file or directory.
(gdb) l
748     in /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c
(gdb) p filename
$1 = b'sf.xml'
(gdb) p *filename
$2 = {
  ob_refcnt = 2, 
  ob_type = 0x7ffff7db5da0 <PyBytes_Type>
}


Extract of print_exception():

        PyObject *message, *filename, *text;
        int lineno, offset;
        if (!parse_syntax_error(value, &message, &filename,
                                &lineno, &offset, &text))
            PyErr_Clear();
        else {
            PyObject *line;

            Py_DECREF(value);
            value = message;

            line = PyUnicode_FromFormat("  File \"%U\", line %d\n",   // <====== HERE
                                          filename, lineno);
            Py_DECREF(filename);


More gdb traceback:

Program received signal SIGSEGV, Segmentation fault.
find_maxchar_surrogates (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, 
    end=0xfffffffffffffffd <error: Cannot access memory at address 0xfffffffffffffffd>, begin=0x1 <error: Cannot access memory at address 0x1>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1660
1660    /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c: No such file or directory.
Missing separate debuginfos, use: dnf debuginfo-install 
abrt-libs-2.12.0-2.fc30.x86_64 augeas-libs-1.12.0-1.fc30.x86_64 
bzip2-libs-1.0.6-29.fc30.x86_64 dbus-libs-1.12.16-1.fc30.x86_64 
elfutils-libelf-0.176-3.fc30.x86_64 elfutils-libs-0.176-3.fc30.x86_64 
expat-2.2.6-2.fc30.x86_64 glib2-2.60.4-1.fc30.x86_64 
libacl-2.2.53-3.fc30.x86_64 libcap-2.26-5.fc30.x86_64 
libdb-5.3.28-37.fc30.x86_64 libffi-3.1-19.fc30.x86_64 
libgcc-9.1.1-1.fc30.x86_64 libgcrypt-1.8.4-3.fc30.x86_64 
libgpg-error-1.33-2.fc30.x86_64 libmount-2.33.2-1.fc30.x86_64 
libreport-2.10.0-3.fc30.x86_64 libselinux-2.9-1.fc30.x86_64 
libstdc++-9.1.1-1.fc30.x86_64 libtar-1.2.20-17.fc30.x86_64 
libtool-ltdl-2.4.6-29.fc30.x86_64 libuuid-2.33.2-1.fc30.x86_64 
libxcrypt-4.4.6-2.fc30.x86_64 libxml2-2.9.9-2.fc30.x86_64 
libxslt-1.1.33-1.fc30.x86_64 libzstd-1.4.0-1.fc30.x86_64 
lz4-libs-1.8.3-2.fc30.x86_64 pcre-8.43-2.fc30.x86_64 popt-1.16-17.fc30.x86_64 
python3-abrt-2.12.0-2.fc30.x86_64 python3-dbus-1.2.8-5.fc30.x86_64 
python3-libreport-2.10.0-3.fc30.x86_64 python3-lxml-4.2.5-2.fc30.x86_64 
python3-systemd-234-8.fc30.x86_64 python3-xmlsec-1.3.3-5.fc30.x86_64 
rpm-libs-4.14.2.1-4.fc30.1.x86_64 systemd-libs-241-8.git9ef65cb.fc30.x86_64 
xmlsec1-1.2.27-2.fc30.x86_64 xmlsec1-openssl-1.2.27-2.fc30.x86_64 
xz-libs-5.2.4-5.fc30.x86_64 zlib-1.2.11-15.fc30.x86_64
(gdb) where
#0  0x00007ffff7c321ad in find_maxchar_surrogates
    (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, end=0xfffffffffffffffd <error: Cannot access memory at address 0xfffffffffffffffd>, begin=0x1 <error: Cannot access memory at address 0x1>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1660
#1  0x00007ffff7c321ad in _PyUnicode_Ready (unicode=b'sf.xml')
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:1699
#2  0x00007ffff7afad8e in unicode_fromformat_write_str (precision=-1, width=-1, str=<optimized out>, writer=0x7fffffffcb20)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2596
#3  0x00007ffff7afad8e in unicode_fromformat_arg (vargs=0x7fffffffcb80, f=<optimized out>, writer=0x7fffffffcb20)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2807
#4  0x00007ffff7afad8e in PyUnicode_FromFormatV (format=<optimized out>, vargs=<optimized out>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2914
#5  0x00007ffff7b82a99 in PyUnicode_FromFormat (format=format@entry=0x7ffff7c9b045 "  File \"%U\", line %d\n")
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Objects/unicodeobject.c:2966
#6  0x00007ffff7c85898 in print_exception (value=None, f=<_io.TextIOWrapper at remote 0x7fffea910708>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:753
#7  0x00007ffff7c85898 in print_exception_recursive
    (f=<_io.TextIOWrapper at remote 0x7fffea910708>, value=<XMLSyntaxError(error_log=<lxml.etree._ListErrorLog at remote 0x7fffe9fe3598>, code=1) at remote 0x7fffea046828>, seen=<optimized out>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:901
#8  0x00007ffff7c8b8bb in PyErr_Display
    (exception=<optimized out>, value=<XMLSyntaxError(error_log=<lxml.etree._ListErrorLog at remote 0x7fffe9fe3598>, code=1) at remote 0x7fffea046828>, tb=<optimized out>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/pythonrun.c:935
#9  0x00007ffff7c8b93c in sys_excepthook (self=<optimized out>, args=<optimized out>)
    at /usr/src/debug/python3-3.7.3-3.fc30.x86_64/Python/sysmodule.c:332
(...)
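
Frame #5 shows the format string using %U, which the PyUnicode_FromFormat() 
documentation describes as taking a Unicode object, whereas %S formats the 
result of PyObject_Str(). The following is only a loose Python model of that 
difference, with a type check standing in for the crash. Both function names 
are hypothetical and nothing here is CPython's implementation.

```python
def format_location_u(filename, lineno):
    """Model of %U: the argument must already be a str."""
    if not isinstance(filename, str):
        # The C code has no such check; it treats the object as a Unicode
        # object, which is where the segfault in the backtrace comes from.
        raise TypeError("expected str, got %s" % type(filename).__name__)
    return '  File "%s", line %d\n' % (filename, lineno)


def format_location_s(filename, lineno):
    """Model of %S: any object is accepted and passed through str()."""
    return '  File "%s", line %d\n' % (str(filename), lineno)


# A bytes filename is rendered via its str() form instead of being rejected.
print(format_location_s(b"sf.xml", 1), end="")
```

With the str()-based variant a bytes filename is displayed in its repr-like 
form rather than being misread as a Unicode object.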

----------
files: sf.py
messages: 346988
nosy: vstinner
priority: normal
severity: normal
status: open
title: print_exception() crash when lxml 4.2.5 raises a parser error
type: crash
versions: Python 3.7
Added file: https://bugs.python.org/file48449/sf.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37467>
_______________________________________