New submission from Daniele Varrazzo:

XML defines the following chars as whitespace [1]::

    S ::= (#x20 | #x9 | #xD | #xA)+

However, these chars are not properly escaped in attribute values, so they are converted into spaces as per attribute-value normalization [2]::

    >>> data = '\x09\x0a\x0d\x20'
    >>> data
    '\t\n\r '
    >>> import xml.etree.ElementTree as ET
    >>> e = ET.Element('x', attr=data)
    >>> s = ET.tostring(e)
    >>> s
    '<x attr="\t&#10;\r " />'
    >>> e1 = ET.fromstring(s)
    >>> data1 = e1.attrib['attr']
    >>> data1 == data
    False
    >>> data1
    ' \n  '

cElementTree suffers from the same bug::

    >>> import xml.etree.cElementTree as cET
    >>> cET.fromstring(cET.tostring(cET.Element('a', attr=data))).attrib['attr']
    ' \n  '

but not the external library lxml.etree::

    >>> import lxml.etree as LET
    >>> LET.fromstring(LET.tostring(LET.Element('a', attr=data))).attrib['attr']
    '\t\n\r '

The bug is analogous to #5752, but it affects a different and independent module.

Proper escaping should be added to the _escape_attrib() function in /xml/etree/ElementTree.py (and the equivalent for cElementTree).

[1] http://www.w3.org/TR/REC-xml/#white
[2] http://www.w3.org/TR/REC-xml/#AVNormalize

----------
components: Library (Lib), XML
messages: 185574
nosy: piro
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree does not preserve whitespaces in attributes
versions: Python 2.7, Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17582>
_______________________________________
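
For illustration, here is a minimal sketch of the kind of escaping proposed for _escape_attrib(). It is not the actual patch: the helper name escape_attrib_whitespace is hypothetical, and it handles only str input with no encoding step. It emits tab, newline and carriage return as numeric character references, which the parser does not fold into spaces during attribute-value normalization.

```python
import xml.etree.ElementTree as ET


def escape_attrib_whitespace(text):
    """Hypothetical helper: escape an attribute value, including whitespace.

    Assumption: this only sketches the idea from the report; the real
    _escape_attrib() in ElementTree.py also deals with encodings and errors.
    """
    # Escape markup-significant characters first ("&" must come first).
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    # Write whitespace other than a plain space as character references,
    # so attribute-value normalization leaves them untouched.
    text = text.replace("\t", "&#9;")
    text = text.replace("\n", "&#10;")
    text = text.replace("\r", "&#13;")
    return text


data = "\x09\x0a\x0d\x20"
# Build the element by hand so the sketch does not depend on how the
# installed ElementTree version serializes attributes.
serialized = '<x attr="%s" />' % escape_attrib_whitespace(data)
roundtrip = ET.fromstring(serialized).attrib["attr"]
```

With this escaping, roundtrip compares equal to data, which is the behaviour lxml.etree shows in the session above.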