New submission from once-off <once-...@mailinator.com>:

The attached script (sgml_error.py) was designed to output XML files unchanged, other than expanding <empty/> tags into an opening and closing tag, such as <empty></empty>.
It seems the SGMLParser class recognizes an empty tag, but does not emit the closing tag until the NEXT forward slash it sees. So everything from the forward slash in <empty/> (even the closing angle bracket) until the next forward slash is considered to be textual data. See the following line output.

Have I missed something here (like a conscious design limitation on the class, an error on my part, etc), or is this really a bug with the class?

C:\Python24\Lib>python sgmllib.py H:\input.xml
start tag: <root>
data: '\n '
start tag: <tag1>
end tag: </tag1>
data: '\n '
start tag: <tag2>
data: '>\n <tag3>hello<'
end tag: </tag2>
data: 'tag3>\n'
end tag: </root>

C:\Python24\Lib>python
ActivePython 2.4.3 Build 12 (ActiveState Software Inc.) based on
Python 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sgml_error
Input:
<root>
<tag1></tag1>
<tag2/>
<tag3>hello</tag3>
</root>

Output:
<root>
<tag1></tag1>
<tag2>>
<tag3>hello<</tag2>tag3>
</root>

Expected:
<root>
<tag1></tag1>
<tag2></tag2>
<tag3>hello</tag3>
</root>

----------
components: Extension Modules, Library (Lib), XML
files: sgml_error.py
messages: 83667
nosy: once-off
severity: normal
status: open
title: Can SGMLParser properly handle <empty/> tags?
type: behavior
versions: 3rd party, Python 2.4, Python 2.5

Added file: http://bugs.python.org/file13348/sgml_error.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5498>
_______________________________________
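For comparison, here is a minimal sketch of the transformation the report describes (pass markup through unchanged, except that <empty/> becomes <empty></empty>). It is not the attached sgml_error.py, whose contents are not shown here. It assumes Python 3, where sgmllib no longer exists, and swaps in html.parser.HTMLParser, which reports <tag/> through handle_startendtag. The names EmptyTagExpander and expand_empty_tags are made up for illustration. Note that HTMLParser lowercases tag and attribute names, and that this sketch silently drops comments, processing instructions and declarations, so it is not a faithful XML round-trip.

```python
from html import escape
from html.parser import HTMLParser


class EmptyTagExpander(HTMLParser):
    """Hypothetical helper: re-emit markup, writing <tag/> as <tag></tag>."""

    def __init__(self):
        # convert_charrefs=False keeps &amp; and &#38; style references
        # in text separate from data so they can be written back as-is.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _open_tag(self, tag, attrs):
        # Attribute values arrive unescaped (or None), so re-escape them.
        rendered = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
        )
        return "<%s%s>" % (tag, rendered)

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs))

    def handle_endtag(self, tag):
        self.parts.append("</%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # Called for <tag/>: emit an explicit open/close pair instead.
        self.parts.append(self._open_tag(tag, attrs) + "</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def expand_empty_tags(text):
    """Return text with every <tag/> rewritten as <tag></tag>."""
    parser = EmptyTagExpander()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = "<root>\n <tag1></tag1>\n <tag2/>\n <tag3>hello</tag3>\n</root>\n"
    print(expand_empty_tags(sample), end="")
```

Run on the Input from the report, this should print the Expected text, with <tag2/> expanded and <tag3> left intact. For real XML, where case and comments matter, an XML parser such as xml.sax would be the safer choice.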