On Wednesday, 28 June 2017 at 18:08:12 UTC, aberba wrote:
I wanted strip_tags() for sanitization in vibe.d and I set out for algorithms on how to do it and came across this JavaScript library at

string stripTags(string input, in string[] allowedTags = [])
{
        import std.regex: Captures, replaceAll, ctRegex;

        auto regex = ctRegex!(`</?(\w*)>`);

Ouch, parsing HTML or XML with regular expressions is problematic.
What people generally don't realize is that > does not have to be encoded as an entity when it appears in data. This means that <thing attr="Hello >"> and <data>></data> are both perfectly legal. Regular expressions may break when they encounter such input.
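
To make that concrete, here is a minimal sketch that runs the two inputs above through the pattern from the quoted code and through a second pattern, `<[^>]*>`, which is not from the quoted post but is a common naive attempt at handling attributes. The helper names stripQuoted and stripNaive are made up for illustration, and the patterns are compiled with regex() at run time rather than ctRegex just to keep the example short.

```d
import std.regex : regex, replaceAll;
import std.stdio : writeln;

// Pattern taken from the quoted stripTags(): only matches tags without attributes.
string stripQuoted(string input)
{
    auto tagPattern = regex(`</?(\w*)>`);
    return replaceAll(input, tagPattern, "");
}

// Hypothetical "improved" pattern (an assumption, not from the quoted post):
// everything from '<' up to the first '>' is treated as a tag.
string stripNaive(string input)
{
    auto tagPattern = regex(`<[^>]*>`);
    return replaceAll(input, tagPattern, "");
}

void main()
{
    auto withAttr = `<thing attr="Hello >">text</thing>`;
    auto bareGt = `<data>></data>`;

    // The opening tag has an attribute, so the quoted pattern never matches it;
    // only the closing tag is removed.
    writeln(stripQuoted(withAttr)); // <thing attr="Hello >">text

    // The naive pattern stops at the '>' inside the attribute value and
    // leaves the tail of the tag behind as if it were text.
    writeln(stripNaive(withAttr));  // ">text

    // A bare '>' in character data is legal and simply survives as text.
    writeln(stripQuoted(bareGt));   // >
}
```

Neither pattern strips the first input cleanly, which is why a real HTML/XML parser is the safer choice for sanitization.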

http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx/
https://blog.codinghorror.com/parsing-html-the-cthulhu-way/

