New submission from Dave Malcolm <dmalc...@redhat.com>:

pyexpat configured with "--with-system-expat" segfaults in one selftest when 
built against expat 2.0.1

SVN trunk:
$ ./configure --with-system-expat
(with expat-2.0.1)
$ make
$ ./python Lib/test/test_pyexpat.py
[snip]
test_parse_only_xml_data (__main__.sf1296433Test) ... Segmentation fault (core 
dumped)

The issue occurs when an exception happens within a character handler.

When a Python exception is raised inside pyexpat.c:call_character_handler(), it 
invokes pyexpat.c:flag_error() which clears the pyexpat object's handlers, and 
it then sets the expat structure's handler to a no-op handler:
        XML_SetCharacterDataHandler(self->itself,
                                    noop_character_data_handler);

The reason why appears to be that python's copy of that code in 
xmlparse.c:doContent() looks like this:
            characterDataHandler(handlerArg, dataBuf,
                                 (int)(dataPtr - (ICHAR *)dataBuf));

(see
http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?annotate=77680&pathrev=77680#l2544
)

i.e. using:
  #define characterDataHandler (parser->m_characterDataHandler)
to look up the handler each time, thus picking up on the newly installed noop 
handler, whereas the corresponding code in expat 2.0.1 looks like this:

        XML_CharacterDataHandler charDataHandler = characterDataHandler;
        if (charDataHandler) {
            ...
            for (;;) {
              ...
=>            charDataHandler(handlerArg, dataBuf,
                              (int)(dataPtr - (ICHAR *)dataBuf));
              ...
            }
          }

i.e. keeping the old function pointer cached

(gdb) p self->itself->m_characterDataHandler 
$30 = (XML_CharacterDataHandler) 0xf8b370 <noop_character_data_handler>

(gdb) p charDataHandler
$31 = (XML_CharacterDataHandler) 0xf8daf0 <my_CharacterDataHandler>

So when the exception is raised and the handler is changed, the local in the 
loop doesn't change, and we're calling the old handler which doesn't allow for 
being in the error-recovery state, and self->handlers[3] is NULL, leading to an 
eventual segfault.

This is with expat-2.0.1-10.fc13.i686 on a Fedora 13 box.

expat's xmlparse.c's caching of the character handler appears to be this
change:
http://expat.cvs.sourceforge.net/viewvc/expat/expat/lib/xmlparse.c?r1=1.157&r2=1.158

> Revision 1.158 - (view) (download) (annotate) - [select for diffs] 
> Mon Jul 10 18:59:52 2006 UTC (3 years, 10 months ago) by kwaclaw 
> Branch: MAIN 
> Changes since 1.157: +39 -37 lines 
> Diff to previous 1.157
> Improved fix for issues # 1515266 and # 1515600. Will now preserve the
> "failover to default handler" logic. Note: clearing the character data handler
> does not take effect immediately anymore.

which is almost 4 years old (2006-07-10)

For reference the issues referred to above are:
http://sourceforge.net/tracker/?func=detail&aid=1515266&group_id=10127&atid=110127
http://sourceforge.net/tracker/?func=detail&aid=1515600&group_id=10127&atid=110127

expat CVS R_2_0_0 has xmlparse.c r1.151 from 2005-12-23
expat CVS R_2_0_1 has xmlparse.c r1.161 from 2007-05-08

So it appears that this "clearing the character data handler does not take
effect immediately anymore." is in expat 2.0.1 onwards, but not in 2.0.0

Running:
$ diff -rup ~/coding/python-svn/trunk-clean/Modules/expat
/usr/src/debug/expat-2.0.1/lib/|less
to show the difference between system expat and latest python SVN code
indicates that we're using 2.0.1 as the system copy, and latest Python SVN code
is using a copy of 2.0.0

Looking at http://bugs.python.org/issue1296433 (as per the test name), this was 
fixed in r47191 (see http://svn.python.org/view?view=rev&revision=47191 ) on 
2006-07-01, adding a termination clause to the loop within its copy of 
xmlparse.c but this was followed five days later (2006-07-06) with:
http://svn.python.org/view?view=rev&revision=47253
which had the log:
> - back out Expat change; the final fix to Expat will be different
> - change the pyexpat wrapper to not be so sensitive to this detail of the
>  Expat implementation (the ex-crasher test still passes)

though this change happened before the change in expat on 2006-07-10

So, _if_ I'm reading this right, it appears that Python's pyexpat.c module is 
incompatible with expat 2.0.1: pyexpat expects that when it sets:
    XML_SetCharacterDataHandler(self->itself,
                                noop_character_data_handler);
that my_CharacterDataHandler will no longer be called, but it is, and it 
segfaults during the test.

Working around it appears trivial; I will attach a patch.

----------
components: XML
messages: 108323
nosy: dmalcolm
priority: normal
severity: normal
status: open
title: pyexpat configured with "--with-system-expat" is incompatible with expat 
2.0.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9054>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to