Matt Basta <bastaw...@gmail.com> added the comment:

> So I think the example is invalid (should escape the <), and that HTMLParser 
> is not buggy.

On the other hand, the HTML5 spec clearly dictates otherwise:

http://www.w3.org/TR/html5/syntax.html#cdata-rcdata-restrictions
The text in raw text and RCDATA elements must not contain any occurrences of 
the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters 
that case-insensitively match the tag name of the element followed by one of 
U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED 
(FF), U+000D CARRIAGE RETURN (CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), 
or U+002F SOLIDUS (/).
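
As a rough illustration of the rule quoted above, the terminating sequence
can be expressed as a regular expression. This is only a sketch of the spec
text, not code taken from HTMLParser; the helper names rawtext_end_pattern
and find_rawtext_end are made up for the example.

```python
import re


def rawtext_end_pattern(tag_name):
    """Build a pattern for the sequence that ends a raw text/RCDATA element.

    Hypothetical helper: "</", then the tag name (case-insensitive), then
    one of tab, LF, FF, CR, space, ">" or "/", as listed in the spec text.
    """
    return re.compile(
        r"</" + re.escape(tag_name) + r"[\t\n\f\r />]",
        re.IGNORECASE,
    )


def find_rawtext_end(text, tag_name):
    """Return the index where the element's content ends, or -1 if none."""
    match = rawtext_end_pattern(tag_name).search(text)
    return match.start() if match else -1


content = '<span></span>This should not be visible.</script>'
# "</span>" does not end the script content; only "</script>" does.
print(find_rawtext_end(content, "script"))
# A split string such as "</scr" + "ipt>" contains no terminating sequence.
print(find_rawtext_end('var s = "</scr" + "ipt>";', "script"))
```

Under this reading, "</span>" inside a script element is ordinary text, which
is the behavior the HTML4 rule (any "</" ends the content) does not allow.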


Additionally, no browsers (except perhaps in quirks mode) currently obey the 
HTML4 variant of the rule. This is largely due to the need to include strings 
such as "</scr" + "ipt>" within a script tag itself. This behavior can be 
observed firsthand by loading this snippet in a browser:

<script><span></span>This should not be visible.</script>
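
The same snippet can be fed to Python's own parser to see which events it
reports. The sketch below assumes a Python 3 interpreter and its
html.parser module; EventCollector and collect_events are names invented
for the example, and the result depends on the version of the parser in use.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper that records the tags and text the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect them all.
        self.data.append(data)


def collect_events(markup):
    """Parse markup and return (tag events, concatenated text)."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags, "".join(parser.data)


snippet = "<script><span></span>This should not be visible.</script>"
tags, text = collect_events(snippet)
print(tags)
print(repr(text))
```

If the parser follows the HTML5 rule, only the script start and end tags
show up as tag events and the span markup is reported as plain text.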

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue670664>
_______________________________________