Stefan Behnel added the comment:

> The parser is *not* rejecting control chars.

The parser *is* rejecting control characters. It's an XML parser. See the 
example in the link you posted.
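
As a minimal sketch of that rejection (assuming Python 3 and the standard library's xml.etree.ElementTree; the helper name `parses` is made up for illustration), both a raw BEL character and a character reference to it should fail to parse:

```python
import xml.etree.ElementTree as ET

def parses(data):
    """Return True if the standard library parser accepts the document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# A plain document is accepted.
plain_ok = parses("<text>hello</text>")
# A raw control character (BEL, U+0007) is not a legal XML 1.0 character.
raw_bel_ok = parses("<text>hello\x07</text>")
# The same character written as a character reference is rejected as well.
charref_bel_ok = parses("<text>hello&#7;</text>")

print(plain_ok, raw_bel_ok, charref_bel_ok)
```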


> assume you have a script that simply stores each message it receives (from 
> stdin, from a tcp stream, whatever) inside an xml tree like 
> '<text>{message1}</text><text>{message2}</text>',
> and prints the tree on SIGINT.

That's not an XML-specific issue. You are printing a byte string here, so 
repr() would be the right thing to use instead of plain printing (and is 
actually what Python 3 uses automatically). The fact that you are wrapping 
the content in XML doesn't matter.
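
To illustrate the repr() point, here is a small sketch assuming Python 3; the `message` value is a hypothetical piece of untrusted input containing a terminal escape sequence, not something taken from the report:

```python
# Hypothetical untrusted input: ESC [ 2 J would clear many terminals if
# the raw bytes were written to them.
message = b"hello\x1b[2Jworld"

# repr() renders non-printable bytes as visible escapes such as \x1b.
safe = repr(message)

# In Python 3, str() of a bytes object gives the same escaped form, which
# is what print() ends up showing.
same_as_str = str(message) == safe

print(safe)
```

The raw escape byte never reaches the terminal this way, whether or not the data is wrapped in XML.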


>> What part of the create-to-serialise process exactly is a problem here?
> ElementTree.tostring().

What I meant was: at what step of the process from creating an XML tree in 
memory to serialisation is it a problem that the tree contains control 
characters? Because once the data is serialised, it will just be rejected on 
input by any XML parser, and handling bytes data is a separate matter (e.g. 
you could serialise to UTF-16 and the result would contain null bytes - too bad).
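
The round trip described above can be sketched roughly as follows, assuming Python 3 and the standard library's ElementTree (the variable names are only illustrative). The serialiser writes the control character out unchanged, the parser then refuses the result, and a UTF-16 serialisation of perfectly ordinary text contains null bytes:

```python
import xml.etree.ElementTree as ET

# Build a tree in memory whose text contains a control character (BEL).
elem = ET.Element("text")
elem.text = "hello\x07"

# tostring() does not validate characters; the byte is emitted as-is.
serialised = ET.tostring(elem)

# Feeding the serialised document back into the parser fails.
try:
    ET.fromstring(serialised)
    reparsed = True
except ET.ParseError:
    reparsed = False

# Separately: UTF-16 output of plain ASCII text is full of null bytes.
utf16 = ET.tostring(ET.fromstring("<text>hello</text>"), encoding="utf-16")
has_nulls = b"\x00" in utf16

print(serialised, reparsed, has_nulls)
```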

It may just be a bad example that you chose here, but I really can't see this 
being a security problem. You are mishandling arbitrary untrusted binary data, 
that's all. Control characters are most likely not the only problem that you 
should guard against.

Unless there is a more dangerous way to exploit this that is actually due to 
XML being used, I'd suggest changing the type from "security" back to 
"behaviour".

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18850>
_______________________________________