[issue17582] xml.etree.ElementTree does not preserve whitespaces in attributes

Daniele Varrazzo Sat, 30 Mar 2013 09:26:45 -0700

New submission from Daniele Varrazzo:

XML defines the following chars as whitespace [1]::


    S ::= (#x20 | #x9 | #xD | #xA)+

However, ET.tostring() does not escape all of these characters in attribute
values: tab and carriage return are written literally, so the parser converts
them into spaces as per attribute-value normalization [2]::

    >>> data = '\x09\x0a\x0d\x20'
    >>> data
    '\t\n\r '

    >>> import xml.etree.ElementTree as ET
    >>> e = ET.Element('x', attr=data)
    >>> s = ET.tostring(e)
    >>> s
    '<x attr="\t&#10;\r " />'

    >>> e1 = ET.fromstring(s)
    >>> data1 = e1.attrib['attr']
    >>> data1 == data
    False

    >>> data1
    ' \n  '

cElementTree suffers from the same bug::

    >>> import xml.etree.cElementTree as cET
    >>> cET.fromstring(cET.tostring(cET.Element('a', attr=data))).attrib['attr']
    ' \n  '

but not the external library lxml.etree::

    >>> import lxml.etree as LET
    >>> LET.fromstring(LET.tostring(LET.Element('a', attr=data))).attrib['attr']
    '\t\n\r '

The bug is analogous to #5752, but it affects a different and independent
module. Proper escaping should be added to the _escape_attrib() function in
/xml/etree/ElementTree.py (and the equivalent for cElementTree).
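
As a rough illustration of the kind of escaping meant here, the sketch below
writes tab, newline and carriage return as numeric character references, which
the parser does not normalize. The helper name escape_attrib_ws and the
_WS_REFS table are hypothetical and are not the actual _escape_attrib() code;
the element is built by hand so the example does not depend on how a given
Python version serializes attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: whitespace that attribute-value normalization would
# otherwise turn into a plain space, written as character references.
_WS_REFS = {"\t": "&#9;", "\n": "&#10;", "\r": "&#13;"}


def escape_attrib_ws(value):
    """Escape markup characters and whitespace in an attribute value.

    Illustrative helper only, not the stdlib implementation.
    """
    # Escape "&" first so the references added below are not escaped again.
    value = (value.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;")
                  .replace('"', "&quot;"))
    for char, ref in _WS_REFS.items():
        value = value.replace(char, ref)
    return value


data = "\t\n\r "

# Serialize by hand with the escaped value, then parse it back.
xml_text = '<x attr="%s" />' % escape_attrib_ws(data)
roundtrip = ET.fromstring(xml_text).attrib["attr"]

# For comparison: the same value written literally is normalized on parsing.
literal = ET.fromstring('<x attr="%s" />' % data).attrib["attr"]

print(repr(xml_text))
print(roundtrip == data)
print(literal == data)
```

With the references in place the parsed value should compare equal to the
original string, while the literal form should not.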

[1] http://www.w3.org/TR/REC-xml/#white
[2] http://www.w3.org/TR/REC-xml/#AVNormalize

----------
components: Library (Lib), XML
messages: 185574
nosy: piro
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree does not preserve whitespaces in attributes
versions: Python 2.7, Python 3.2

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue17582>
_______________________________________