On Monday, 12 December 2016 at 11:32:42 UTC, Nicholas Wilson
wrote:
> for strip_tags I would look for an xml library (e.g. arsd.dom)
> and parse it and then reprint it without the tags. There's
> probably a better way to do it though. I'm sure Adam Ruppe will
> be able to help you there.

Well, it depends what you are doing with it. If you are just
outputting user data, I wouldn't allow any HTML at all... but I'd
do it by encoding it all. So if they write "<script>" in the
form, the output will be "&lt;script&gt;", which is harmless.
dom.d's htmlEntitiesEncode will do that:
http://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html
auto safe = htmlEntitiesEncode(user_data);
Compare htmlentities() in PHP.
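To make the idea concrete, here is a minimal sketch of what that kind of encoding does, using only Phobos. The escapeHtml function is a hypothetical stand-in written for illustration; it is not arsd.dom's implementation, and the real htmlEntitiesEncode may handle more characters than the four shown here.

```d
import std.array : appender;
import std.stdio : writeln;

/// Hypothetical helper for illustration only (not arsd.dom's code):
/// replaces the characters that are special in HTML with entities.
string escapeHtml(string input)
{
    auto result = appender!string();
    foreach (dchar c; input)
    {
        switch (c)
        {
            case '&': result.put("&amp;"); break;
            case '<': result.put("&lt;"); break;
            case '>': result.put("&gt;"); break;
            case '"': result.put("&quot;"); break;
            default:  result.put(c); break;
        }
    }
    return result.data;
}

void main()
{
    string user_data = "<script>alert(1)</script>";
    // prints: &lt;script&gt;alert(1)&lt;/script&gt;
    writeln(escapeHtml(user_data));
}
```

The browser then shows the tag as literal text instead of running it. In real code I'd still call the library function rather than rolling your own.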
If you want to allow some HTML but not all, then yeah, you can
use the full DOM parser and rip stuff out that way.
Element.stripOut
<http://dpldocs.info/experimental-docs/arsd.dom.Element.stripOut.html>
can help with that, or innerText
<http://dpldocs.info/experimental-docs/arsd.dom.Element.innerText.1.html>.
Ask me if you need more.