On Saturday, 10 January 2015 at 19:17:22 UTC, Ola Fosheim Grøstad wrote:
> Nice and clean code; does it expand HTML entities ("&amp;")?

Of course. It does it both ways:

<span>a &amp;</span>

span.innerText == "a &"

span.innerText = "a \" b";
assert(span.innerHTML == "a &quot; b");
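To make the two directions concrete outside the library, here is a minimal Python sketch. The `Span` class and its `inner_text`/`inner_html` names are hypothetical stand-ins that imitate the behavior above using the standard library's `html.escape` and `html.unescape`. This is not the library's actual implementation. Note that `html.escape` also turns `'` into `&#x27;`, which a given DOM library may not do.

```python
import html


class Span:
    """Hypothetical stand-in for a DOM element holding only text content.

    It stores the escaped markup and converts on access; it does not model
    child elements, which a real DOM's innerText would also have to handle.
    """

    def __init__(self, inner_html: str = "") -> None:
        self.inner_html = inner_html

    @property
    def inner_text(self) -> str:
        # Reading decodes entities, e.g. "&amp;" becomes "&".
        return html.unescape(self.inner_html)

    @inner_text.setter
    def inner_text(self, value: str) -> None:
        # Writing encodes special characters, e.g. '"' becomes "&quot;".
        self.inner_html = html.escape(value, quote=True)


span = Span("a &amp;")
assert span.inner_text == "a &"

span.inner_text = 'a " b'
assert span.inner_html == "a &quot; b"
```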

parseGarbage also tries to fix broken entities: a bare & standing alone gets translated to &amp; for you. There's also parseStrict, which just throws an exception in cases like that.
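The lenient-versus-strict split can be sketched in Python as follows. `fix_ampersands` and its `strict` flag are hypothetical and only approximate the idea: any `&` not followed by something shaped like an entity reference (`&name;`, `&#123;`, `&#x7B;`) counts as broken. parseGarbage and parseStrict may well draw that line differently.

```python
import re

# An '&' that does NOT begin a well-formed reference such as "&amp;",
# "&#34;" or "&#x22;". This only checks the shape, not whether a named
# entity actually exists.
BARE_AMPERSAND = re.compile(r"&(?!(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)")


def fix_ampersands(text: str, strict: bool = False) -> str:
    """Hypothetical helper: escape stray '&' (lenient) or reject it (strict)."""
    match = BARE_AMPERSAND.search(text)
    if match is None:
        return text  # nothing broken, leave the input alone
    if strict:
        # Strict mode: refuse malformed input instead of guessing.
        raise ValueError(f"bare '&' at offset {match.start()}")
    # Lenient mode: turn every stray '&' into a proper "&amp;".
    return BARE_AMPERSAND.sub("&amp;", text)
```

For example, `fix_ampersands("R&D &lt;")` yields `"R&amp;D &lt;"`, while `fix_ampersands("R&D", strict=True)` raises `ValueError`.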

Decoding entities is something a lot of XML parsers skip in the name of speed, but I do it, since I rarely want them left untranslated. One speedup I did add, though: scan the string for &; if there isn't one, return a slice of the original, and if there is, return a new string with the entities translated. That gave a surprisingly big speed boost without costing anything in convenience.
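Here is a minimal Python sketch of that fast path, using the standard library's `html.unescape` as the decoder. The function name `decode_entities` is hypothetical. Python string slices copy, so the closest analogue to returning a slice of the original buffer is returning the input object unchanged.

```python
import html


def decode_entities(text: str) -> str:
    """Return text with entities decoded, skipping the work when there are none."""
    # Fast path: without an '&' there is nothing to decode, so hand back the
    # input object itself instead of building a new string.
    if "&" not in text:
        return text
    # Slow path: build a new string with the references translated.
    return html.unescape(text)


plain = "no entities here"
assert decode_entities(plain) is plain
assert decode_entities("a &amp; b") == "a & b"
```

The check is a single linear scan, so it pays off whenever most text nodes contain no entities. CPython's own `html.unescape` performs the same test internally; the explicit check here just makes the technique visible.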

> The HTML5 standard has improved on HTML4 by now being explicit, in section 8.2, about how incorrect documents shall be interpreted. That ought to be sufficient, since that is what web browsers are supposed to do.
>
> http://www.w3.org/TR/html5/syntax.html#html-parser

Huh, I never read that. My code just did whatever looked right to me across hundreds of test pages that were broken in all sorts of strange and bizarre ways.
