[issue11804] expat parser not xml 1.1 (breaks xmlrpclib)

Martin v. Löwis Thu, 24 May 2012 08:58:55 -0700

Martin v. Löwis <[email protected]> added the comment:

This has nothing to do with XML 1.1 (so closing this report as "won't fix").


The UTF-8 text that you present works very well:

>>> p=xml.parsers.expat.ParserCreate(encoding="utf-8")
>>> p.Parse("<x>\xc3\x87</x>", 1)
1

The character LATIN CAPITAL LETTER C WITH CEDILLA is definitely supported in 
XML 1.0, so there is no need for XML 1.1 here.
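
For anyone repeating the check on Python 3, here is a minimal sketch of the same test. The helper name parse_text is made up for illustration, and the input is passed as bytes so that the parser sees the raw UTF-8 encoding:

```python
import xml.parsers.expat


def parse_text(data):
    """Parse a bytes document and return its concatenated character data.

    parse_text is a hypothetical helper, not part of the expat module.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    # expat may deliver character data in several pieces, so collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# LATIN CAPITAL LETTER C WITH CEDILLA is U+00C7; UTF-8 encodes it as two bytes.
c_cedilla = "\u00c7"
encoded = c_cedilla.encode("utf-8")

# The two-byte sequence parses as a single character in an XML 1.0 document.
text = parse_text(b"<x>" + encoded + b"</x>")
```

Here encoded is the byte pair \xc3\x87 used in the session above, and text comes back as the single character U+00C7.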

If this still fails to parse for you, it may be because the input is actually 
different, e.g.

>>> p=xml.parsers.expat.ParserCreate(encoding="utf-8")
>>> p.Parse("<x>&#195;\x87</x>", 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 9

I.e. the input might contain the characters &, #, 1, 9, 5, ;, followed by the 
byte \x87. A lone \x87 is a UTF-8 continuation byte with no lead byte before 
it, so the input is ill-formed UTF-8, and the parser is right to choke on it. 
Even if the document were declared as XML 1.1, it would still be ill-formed, 
because it would still be invalid UTF-8.
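
The two cases can be compared side by side with a short Python 3 sketch. The helper name try_parse is made up for illustration; the inputs are the two byte strings from the sessions above:

```python
import xml.parsers.expat


def try_parse(data):
    """Return None if data parses, otherwise the ExpatError message.

    try_parse is a hypothetical helper, not part of the expat module.
    """
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


# A lone continuation byte is not valid UTF-8, independent of any XML parser.
try:
    b"\x87".decode("utf-8")
    lone_byte_is_valid = True
except UnicodeDecodeError:
    lone_byte_is_valid = False

# Character reference followed by a stray continuation byte: rejected.
bad = try_parse(b"<x>&#195;\x87</x>")

# The same character as a proper two-byte UTF-8 sequence: accepted.
good = try_parse(b"<x>\xc3\x87</x>")
```

If the failing document comes from an XML-RPC peer, checking whether the raw payload decodes as UTF-8 before handing it to the parser is one way to tell which of the two cases you are looking at.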

----------
resolution:  -> wont fix
status: open -> closed

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue11804>
_______________________________________