[issue9521] xml.etree.ElementTree skips processing instructions when parsing

Stefan Behnel Sun, 19 Jan 2014 01:01:55 -0800

Stefan Behnel added the comment:

When you write "XML PI", do you mean the XML declaration? At least that's what 
Mark used in his original example.


ET avoids writing them out when they are not necessary, i.e. for UTF-8 
compatible encodings. IMHO that's perfectly ok and definitely not an incorrect 
behaviour.
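
To illustrate the serialisation side, here is a minimal sketch using only the stdlib module. The element name "root" and the variable names are made up for the example. It only checks whether the output starts with an XML declaration for two encodings.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document, just for illustration.
root = ET.Element("root")

# Serialise with a UTF-8 encoding and with a non-UTF-8 one.
utf8_bytes = ET.tostring(root, encoding="utf-8")
latin1_bytes = ET.tostring(root, encoding="iso-8859-1")

# ET omits the declaration for UTF-8 and writes it for ISO-8859-1.
has_decl_utf8 = utf8_bytes.startswith(b"<?xml")
has_decl_latin1 = latin1_bytes.startswith(b"<?xml")
print(has_decl_utf8, has_decl_latin1)
```

A parser that sees no declaration has to assume UTF-8 (or detect UTF-16 from a BOM), which is why leaving it out is safe in the first case but not in the second.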

As for processing instructions (what you used in your test case patch), making 
them appear in the tree by default would be a behavioural change that might 
break existing ET code.

Note that lxml keeps PIs in the tree by default, unless you configure its 
parser explicitly with "remove_pis=True".

There is also a "remove_comments=True" in lxml. ET simply discards comments 
when parsing IIRC.

http://lxml.de/parsing.html#parser-options
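
For comparison, a small sketch of what plain ET does by default with PIs and comments. The sample document (target name, comment text, child tag) is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input containing a PI and a comment inside the root element.
xml_data = "<root><?target some data?><!-- note --><child/></root>"

root = ET.fromstring(xml_data)

# With the default parser setup, only real elements end up in the tree.
tags = [child.tag for child in root]
print(tags)
```

Here only the "child" element is left under the root. The PI and the comment are silently dropped during parsing.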

IMHO, both behaviours are ok, with lxml having a tendency towards keeping the 
data as it came in rather than trying to find the easiest possible way for the 
user to work with the tree. PIs and comments are a bit 'special' to work with.

A fix could be to add the two keyword arguments to ET's parser as well, but 
make them default to True (as opposed to False in lxml), so that existing code 
keeps working and users can set them to False to keep PIs and comments in the 
tree when they need them.
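
As a rough sketch of what such an opt-in can look like: Python 3.8 and later do not have parser arguments with the lxml names, but they do provide an opt-in on TreeBuilder through the "insert_pis" and "insert_comments" keyword arguments, both off by default. The example below assumes such a Python version. The sample document and the "kinds" list are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input containing a PI and a comment inside the root element.
xml_data = "<root><?target some data?><!-- note --><child/></root>"

# Opt in explicitly; the default for both flags is False (assumes Python 3.8+).
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
parser = ET.XMLParser(target=builder)
root = ET.fromstring(xml_data, parser=parser)

# PIs and comments show up as special nodes whose .tag is a factory
# function rather than a string, so code walking the tree must check for that.
kinds = []
for node in root:
    if node.tag is ET.ProcessingInstruction:
        kinds.append("pi")
    elif node.tag is ET.Comment:
        kinds.append("comment")
    else:
        kinds.append(node.tag)
print(kinds)
```

The non-string tags are the reason for keeping the old behaviour as the default: existing code that assumes every child has a string tag would otherwise break.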

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue9521>
_______________________________________