Eduard Pascual wrote:
(Note: I made a recurring typo in my previous e-mails: XML's CDATA tag
is spelled <![CDATA[ ... ]]> rather than <[CDATA[ ... ]]>. The "<!"
sequence is a legacy of SGML's obscure features. My apologies if
those mistakes caused any confusion; I hope the idea behind my
posts was clear enough.)

On Wed, Apr 7, 2010 at 7:49 AM, T.J. Crowder <[email protected]> wrote:
<[CDATA[ ... ]]>.  This is far easier to
sanitize (you just need to ensure that the input doesn't include the
"]]>" sequence), thus being more usable on user-provided content.
What makes ]]> easier to defend against than </code>?
As I said, with <![CDATA[ ... ]]> you only need to care about the
exact sequence "]]>": if it's found within an input, get rid of it or
somehow fix it (string replacement "]]>" => "]]>]]&gt;<![CDATA[" gets
the job done safely). With </code> (or even with Arthur's <cdata>
suggestion, to some degree), things are considerably more complex:
1) an instance of the "</code>" string may be legitimate within the
content (if it closes a matching <code ...> within the content).
2) due to HTML5's error-handling rules, something other than "</code>"
may end up closing the initial <code ...>, so a sanitizer would have
to implement the error-handling rules and play really smart to handle
those cases. I don't know the rules in detail, but IIRC something
like <div> <code> </div> would have the <code> element implicitly
closed just before the </div>.
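
For illustration, the "]]>" replacement described above could be sketched like this in Python. The function name wrap_in_cdata and the <code> wrapper are made up for the example, and it assumes the input otherwise contains only characters that are legal in XML:

```python
import xml.etree.ElementTree as ET

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def wrap_in_cdata(text: str) -> str:
    """Wrap text in a CDATA section, neutralizing any embedded "]]>".

    Hypothetical helper: each "]]>" closes the current section, is
    emitted as the escaped text "]]&gt;", and a new section is opened.
    """
    safe = text.replace(CDATA_CLOSE, CDATA_CLOSE + "]]&gt;" + CDATA_OPEN)
    return CDATA_OPEN + safe + CDATA_CLOSE


if __name__ == "__main__":
    hostile = "a ]]> <script>alert(1)</script> b"
    xml = "<code>" + wrap_in_cdata(hostile) + "</code>"
    root = ET.fromstring(xml)
    # The parser sees plain character data only: no child elements,
    # and the text is identical to the input.
    assert len(root) == 0
    assert root.text == hostile
```

The point is that a single literal string replacement is enough; no knowledge of the surrounding markup or of any error-handling rules is needed.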

That's why I just use DOMDocument (libxml2) for all dynamically generated code. I don't have to worry about that kind of thing.

User input where markup is allowed is first sent through a filter (HTML Tidy in XML mode, followed by HTML Purifier) that fixes it for XML sanity. The result is then loaded into a DOM of its own, and only after that is the node imported into the DOM that is served to the requesting client.
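
A rough sketch of the import step, using Python's xml.dom.minidom as a stand-in for PHP's DOMDocument. The import_fragment helper and the <div> wrapper are hypothetical, and the Tidy / HTML Purifier filtering is assumed to have run already (it is not shown):

```python
from xml.dom.minidom import Document, parseString


def import_fragment(target: Document, parent, filtered_markup: str):
    """Parse already-filtered markup in its own DOM, then import it.

    Hypothetical helper. If the markup is not well-formed, parseString
    raises and nothing reaches the served document.
    """
    # The <div> wrapper is an assumption so the fragment has one root.
    fragment_doc = parseString("<div>" + filtered_markup + "</div>")
    node = target.importNode(fragment_doc.documentElement, True)
    parent.appendChild(node)
    return node


if __name__ == "__main__":
    page = Document()
    body = page.createElement("body")
    page.appendChild(body)

    import_fragment(page, body, "<code>x &lt; y</code>")

    # Plain text is added as a text node; the serializer escapes it,
    # so it cannot close or open elements in the output.
    body.appendChild(page.createTextNode("</body><script>"))
    print(page.documentElement.toxml())
```

Because the output is serialized from a tree rather than concatenated from strings, a stray closing tag in the input either fails to parse or ends up as escaped text.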

Code injection is a non-issue for me.

It's a little slower, but the result can be cached once it has been built this way, so performance is only an issue the first time the content is assembled or modified.
