New submission from Stefan Behnel <sco...@users.sourceforge.net>:

The xml.etree.ElementTree package in the Python 3.x standard library breaks 
compatibility with existing ET 1.2 code. The serialiser returns a unicode 
string when no encoding is passed. Previously, the serialiser was guaranteed to 
return a byte string. By default, the string was 7-bit ASCII compatible.

This behavioural change breaks all code that relies on the default behaviour of 
ElementTree. Since there is no longer a default encoding in Python 3, unicode 
strings are incompatible with byte strings, which means that the result of the 
serialisation can no longer be written to a file opened in binary mode, for 
example.
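
A minimal sketch of the incompatibility described above, not taken from the 
report itself. It assumes a current Python 3 release, where a text result has 
to be requested with encoding="unicode"; that call stands in for the default 
the report describes for Python 3.1. The in-memory io.BytesIO object is a 
hypothetical stand-in for a file opened with mode "wb".

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "child").text = "caf\u00e9"

# Assumption: encoding="unicode" mimics the str result that the report
# describes as the Python 3.1 default.
text_result = ET.tostring(root, encoding="unicode")

binary_file = io.BytesIO()  # stand-in for open(path, "wb")
try:
    binary_file.write(text_result)  # str into a binary stream
    write_failed = False
except TypeError:
    write_failed = True  # Python 3 does not convert str to bytes implicitly

# Passing a real encoding yields bytes, which the binary stream accepts.
bytes_result = ET.tostring(root, encoding="utf-8")
binary_file.write(bytes_result)
```

Code that passed the serialiser's result straight to a binary file or socket 
under ET 1.2 would hit the TypeError branch when handed a unicode string.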

XML is well defined as a stream of bytes. Redefining it as a unicode string *by 
default* is hard to understand at best.

Finally, it would have been good to look at the other ET implementation before 
introducing such a change. The lxml.etree package has had support for 
serialising XML into a unicode string for years, and does so in a clear, safe 
and explicit way. It requires the user to pass the 'unicode' (Py3 'str') type 
as the encoding parameter, e.g.

    etree.tostring(tree, encoding=str)

which is explicit enough to make it clear that this is different from a normal 
encoding.
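
For comparison, a hedged sketch of the same "explicit opt-in" idea using only 
the standard library, so that it runs without lxml installed. It assumes 
Python 3.2 or later, where xml.etree.ElementTree.tostring() returns bytes by 
default and returns a unicode string only when encoding="unicode" is passed. 
The element name and text are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
root.text = "caf\u00e9"

# Default call: a byte string, with non-ASCII characters written as
# character references so the result stays 7-bit ASCII compatible.
default_result = ET.tostring(root)

# Explicit opt-in: a unicode string is returned only on request.
explicit_text = ET.tostring(root, encoding="unicode")

print(type(default_result).__name__, type(explicit_text).__name__)
```

Requiring a special encoding value keeps the byte-string default intact for 
existing callers while still offering a text result to those who ask for it.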

----------
components: Library (Lib)
messages: 100333
nosy: scoder
severity: normal
status: open
title: Serialiser in ElementTree returns unicode strings in Py3k
type: behavior
versions: Python 3.1, Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8047>
_______________________________________