Re: [whatwg] Script parsing mode within SVG sections in HTML documents

2015-06-21 Thread Ian Hickson
On Sun, 21 Jun 2015, Niels Keurentjes wrote:

 I ran into a discussion on Stack Overflow on this topic: 
 http://stackoverflow.com/q/30952737/1729885, regarding embedding the 
 following code snippet in an HTML document:
   
 <svg><script>&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;</script></svg>
 
 The character references translate to alert(1). I have confirmed that, 
 in all the latest versions of IE, Chrome and Firefox, this code is 
 executed, whilst it is not if the <svg> container is omitted. I think 
 this is neither intentional nor wanted behavior, as HTML5 explicitly 
 defines a separate script parsing mode which handles character 
 references as plain text.

It's not great, but it is intentional. Within <svg> and <math> blocks, we 
use the "foreign content" parsing mode, wherein parsing is much more 
similar to legacy XML parsing than legacy HTML parsing:

   https://html.spec.whatwg.org/#parsing-main-inforeign

Note in particular that the special behaviour for <script> here doesn't 
include changing the tokeniser mode, like it would in non-foreign content.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


[whatwg] Script parsing mode within SVG sections in HTML documents

2015-06-21 Thread Niels Keurentjes
I ran into a discussion on Stack Overflow on this topic: 
http://stackoverflow.com/q/30952737/1729885, regarding embedding the following 
code snippet in an HTML document:


<svg><script>&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;</script></svg>

The character references translate to alert(1). I have confirmed that, in all 
the latest versions of IE, Chrome and Firefox, this code is executed, whilst it 
is not if the <svg> container is omitted. I think this is neither intentional 
nor wanted behavior, as HTML5 explicitly defines a separate script parsing mode 
which handles character references as plain text.
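
The decoding claim above can be checked independently of any browser. The 
following is a minimal Python sketch, not part of the original message; the 
name ENCODED is only an illustrative label for the payload inside the script 
element.

```python
import html

# The character references from the snippet's script body.
ENCODED = "&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;"

# html.unescape resolves numeric character references to their characters,
# which is what a parser does with this text when it is not in a raw-text mode.
decoded = html.unescape(ENCODED)
print(decoded)
```

A parser that treats the script body as raw text, as HTML does outside 
foreign content, would pass the undecoded string to the script engine 
instead.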
 
However, HTML5 also explicitly defines that the semantics of SVG elements are 
defined by the SVG specification, and SVG also defines the script element, 
without the script parsing mode (as SVG is itself XML, enforcing that mode 
would be neither possible nor necessary). Therefore it seems that all browsers 
are correct in executing this code in this context, according to current 
standards. It does, however, leave a potentially giant loophole for embedding 
malevolent code in HTML documents, which goes undetected by naïve scanners 
that assume script tags cannot work like this in HTML.
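
To illustrate the loophole, here is a rough Python sketch of two hypothetical 
scanners. The function names, the regular expression and the "alert(" 
signature are assumptions made for this example only; they do not describe 
any real product, and a regular expression is not a substitute for an HTML 
parser.

```python
import html
import re

SNIPPET = (
    "<svg><script>&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;"
    "</script></svg>"
)

# Crude extraction of script bodies; enough for this illustration only.
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)


def naive_scan(markup: str) -> bool:
    """Hypothetical scanner: inspects the raw script text, assuming
    character references are never decoded inside script elements."""
    return any("alert(" in body for body in SCRIPT_RE.findall(markup))


def decoding_scan(markup: str) -> bool:
    """Hypothetical scanner: decodes character references before
    inspecting, as happens for script elements in foreign content."""
    return any(
        "alert(" in html.unescape(body) for body in SCRIPT_RE.findall(markup)
    )


print(naive_scan(SNIPPET), decoding_scan(SNIPPET))
```

The first scanner misses the payload because it only ever sees the encoded 
text. The second decodes unconditionally, so it errs on the side of flagging 
plain HTML script elements too, where browsers would not decode the 
references.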

I think the HTML specification on the SVG element 
(http://www.w3.org/TR/html5/embedded-content-0.html#svg), or the more general 
section on embedded content at 
http://www.w3.org/TR/html5/dom.html#embedded-content-2, should be expanded to 
state either that an SVG document embedded in HTML inherits HTML's 
limitations with regard to the parsing mode of elements defined in both 
standards, or, more generically, that active content in any section of the 
document must adhere to the limitations imposed by all of its containing 
documents.