Éric Araujo <mer...@netwok.org> added the comment:

Hello

XML-style empty tags of the form <tag/> are an SGML hack, or more precisely 
the combination of two SGML features. The forward slash closes the tag, and 
the following angle bracket is character data, not part of the tag.

The W3C validator uses a real SGML parser for HTML doctypes, and fails on 
XML-like /> constructs: 
http://validator.w3.org/check?uri=data%3Atext%2Fhtml%2C%3C!DOCTYPE+html+PUBLIC+%22-%2F%2FW3C%2F%2FDTD+HTML+4.01%2F%2FEN%22+%22http%3A%2F%2Fwww.w3.org%2FTR%2Fhtml4%2Fstrict.dtd%22%3E+%3Chtml%3E+%3Chead%3E+++%3Ctitle%3ETest%3C%2Ftitle%3E+++%3Cmeta+name%3Dtest+content%3Done%2F%3E+++%3Cmeta+name%3Dbug+content%3Dtwo%3E+%3C%2Fhead%3E+%3Cbody%3E+++%3Cp%3ETest%3C%2Fp%3E+%3C%2Fbody%3E+%3C%2Fhtml%3E&charset=%28detect+automatically%29&doctype=Inline&group=1&verbose=1

The complete explanation can be read at 
http://www.cs.tut.fi/~jkorpela/html/empty.html

In conclusion, sgmllib is right. Use an XML parser for XML or an HTML5 parser 
for HTML.
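
As a rough illustration of that advice, here is a minimal sketch using only 
the standard library. The helper names (xml_empty_tags, SelfClosingCollector, 
html_self_closing_tags) are made up for this example. Note the assumption: 
html.parser is a lenient HTML parser, not a conforming HTML5 parser; it is 
used here only as a stand-in because the standard library does not ship one 
(a third-party library such as html5lib would be closer to the real thing).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def xml_empty_tags(document):
    """Parse well-formed XML and return the names of elements that have
    no children and no non-whitespace text (hypothetical helper)."""
    root = ET.fromstring(document)
    return [
        el.tag
        for el in root.iter()
        if len(el) == 0 and not (el.text or "").strip()
    ]


class SelfClosingCollector(HTMLParser):
    """Record tags written with the XML-like <tag ... /> syntax."""

    def __init__(self):
        super().__init__()
        self.self_closing = []

    def handle_startendtag(self, tag, attrs):
        # html.parser calls this method for <tag ... /> constructs;
        # a plain <tag ...> goes to handle_starttag instead.
        self.self_closing.append((tag, dict(attrs)))


def html_self_closing_tags(document):
    """Feed HTML to the lenient parser and return the (tag, attributes)
    pairs that used the /> syntax (hypothetical helper)."""
    parser = SelfClosingCollector()
    parser.feed(document)
    parser.close()
    return parser.self_closing
```

With an XML parser, <meta .../> is simply an empty element; with the lenient 
HTML parser, the trailing slash is accepted rather than treated as the SGML 
construct described above. The attribute values are quoted in any input you 
try on purpose, since an unquoted value followed directly by /> is handled 
differently by different parsers.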

Kind regards

----------
nosy: +Merwok

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5498>
_______________________________________