Stefan Behnel added the comment:

> The parser is *not* rejecting control chars.

The parser *is* rejecting control characters. It's an XML parser. See the example in the link you posted.

> assume you have a script that simply stores each message it receives (from
> stdin, from a tcp stream, whatever) inside an xml tree like
> '<text>{message1}</text><text>{message2}</text>',
> and prints the tree on SIGINT.

That's not an XML-specific issue. You are printing a byte string here, so repr() would be the right thing to use (and is actually being used automatically in Py3) instead of plain printing. The fact that you are wrapping the content in XML doesn't matter.

>> What part of the create-to-serialise process exactly is a problem here?
> ElementTree.tostring().

What I meant was: at which step of the process, from creating an XML tree in memory to serialising it, is it a problem that the tree contains control characters? Once the data is serialised, any XML parser will simply reject it on input, and handling bytes data is a separate issue (e.g. you could serialise to UTF-16 and the result would contain null bytes - too bad).

It may just be a bad example that you chose here, but I really can't see this being a security problem. You are mishandling arbitrary untrusted binary data, that's all. Control characters are most likely not the only problem that you should guard against.

Unless there is a more dangerous way to exploit this that is actually due to XML being used, I'd suggest changing the type from "security" back to "behaviour".

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18850>
_______________________________________
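
A minimal sketch of the round trip discussed in the comment, using only the standard library's xml.etree.ElementTree (the element name and sample text are illustrative assumptions, not taken from the original report). It shows that tostring() copies a control character through unchanged, that the parser behind fromstring() rejects the result on input, and that a UTF-16 serialisation contains null bytes even for plain ASCII content.

```python
import xml.etree.ElementTree as ET

# Build a tree whose text contains U+0001, a control character that
# XML 1.0 does not allow anywhere in a document.
root = ET.Element("text")
root.text = "hello\x01world"

# Serialising does not check for illegal characters: the byte is copied
# through unchanged (only &, < and > are escaped in text content).
data = ET.tostring(root)
print(repr(data))  # b'<text>hello\x01world</text>'

# A conforming XML parser, such as the expat-based one behind
# ET.fromstring(), rejects the serialised result on input.
try:
    ET.fromstring(data)
    rejected = False
except ET.ParseError as exc:
    rejected = True
    print("rejected:", exc)

# Serialising to UTF-16 yields null bytes even for plain ASCII content,
# so the output is binary data either way.
utf16 = ET.tostring(ET.Element("a"), encoding="utf-16")
print(b"\x00" in utf16)  # True
```

Using repr() on the serialised bytes, as suggested above, makes the control character visible instead of sending it raw to the terminal.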