Alright, I think that <template> is the furthest we can get for now,
and we have to mind the specs!
Thanks Craig.

On 4/7/19 2:01 PM, Craig Francis wrote:
Hi Joris,

I suspect it's just how the web has developed, where the mixing of JavaScript and imperfect HTML is normal.

I quite like this video as a demo:

https://www.youtube.com/watch?v=lG7U3fuNw3A

Where I think your point is raised when comparing the different parsing of:

1) <div><script title="</div>">
2) <script><div title="</script">
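
If you want to compare them yourself, a quick sketch you can paste into the browser console (DOMParser should treat these fragments the same way the page parser does):

  // Parse each fragment and compare what the browser actually builds.
  // Note the escaped "<\/script>" so this snippet stays safe if pasted into an inline <script>.
  var a = new DOMParser().parseFromString('<div><script title="</div>">', 'text/html');
  var b = new DOMParser().parseFromString('<script><div title="<\/script>">', 'text/html');
  console.log(a.body.innerHTML);
  console.log(b.body.innerHTML);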

My favourite exploit is very similar...

<script>
user_name = "Craig</script>Hello";
</script>

Personally I'd like to be able to tell the browser, similar to the old/obsolete <plaintext> element, that it won't find any JavaScript code after this point (maybe it could block all scripts in the <body>?)... but that's only because I load my JS files in the <head> and attach event listeners after DOMContentLoaded, and I know so few developers do this that it wouldn't be useful to add.
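
For what it's worth, a rough sketch of that pattern (the file path and element id are just made up for the example):

  // Loaded from the <head>, e.g. <script src="/js/app.js" defer></script>
  // No inline handlers in the HTML; everything is attached once the DOM is ready.
  document.addEventListener('DOMContentLoaded', function () {
    var button = document.getElementById('save'); // assumed element id
    if (button) {
      button.addEventListener('click', function (e) {
        // ... handle the click here, no onclick="" attribute needed ...
      });
    }
  });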

I think this is the main reason Content Security Policy came into existence, where I can skip "unsafe-inline" to block any inline JavaScript, and limit the JavaScript files that can be included.
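
For example, something along these lines (an illustrative policy only; the allowed sources would need to match your own site):

    Content-Security-Policy: default-src 'self'; script-src 'self' https://static.example.com

With no 'unsafe-inline' in script-src, the browser refuses inline <script> blocks and onclick="" style attributes, even if an attacker manages to inject them.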

You can kind of get an idea of what happens with the browser parsing by using JavaScript to load the HTML into a <template> element... but that does raise the question on how you get the unsafe variables to the JavaScript in the first place.
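
Something like this shows the idea (just a sketch; the variable name is made up):

  // Content assigned to a <template> goes through the normal HTML parser,
  // but the resulting scripts stay inert, so nothing executes.
  var tpl = document.createElement('template');
  tpl.innerHTML = untrusted_html; // assumed variable holding the markup
  console.log(tpl.content.childNodes); // inspect what the parser actually built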

As an aside, I use <meta name="js_data" content="..." /> tags... sometimes with JSON encoded data in the content attribute, where I'd use something like the following to get the content:

  // Find the <meta> tag by its name attribute (null if it isn't on the page).
  var my_data = document.querySelector('meta[name="js_data"]');
  if (my_data) {
    try {
      // The content attribute holds the JSON encoded data.
      my_data = JSON.parse(my_data.getAttribute('content'));
    } catch (e) {
      my_data = null; // malformed JSON, treat it as missing
    }
  }

But going forwards, the HTML5 spec does cover how the browser (and third-party libraries) should be parsing imperfect HTML, so hopefully these differences will reduce (but I don't imagine they will all be perfectly aligned, in the same way different browsers aren't).

Craig





On Sun, 7 Apr 2019 at 09:00, joris <joris.gutj...@gmail.com> wrote:


    I agree, that would be a vulnerability.
    But I think this is not the core of my question.
    I wonder: why do web developers have to guess
    what the browser thinks is JS and will execute,
    and what isn't?
    Why can't they just ask the web browser to do that
    for them?
    That would be more secure, because
    third-party libraries all parse somewhat differently
    from the browsers they are used with.

    On 4/6/19 12:51 PM, Craig Francis wrote:
    I do quite like the simplicity of this idea; it kind of reminds me
    of the @inert attribute.

    My main concern, though, is how easily it can be bypassed; take the code:

    <div noscripts="true"><?= $unsafe_user_name ?></div>

    Where the attacker can set their username to
    `X</div><script>evil_code</script><div>`

    ---

    Unfortunately, I think this is why we need to work with more
    complicated/advanced solutions...

    We need to sanitise all strings that are included in the HTML on
    the server side - e.g. using templating systems; or passing the
    string through something like HTML Purifier:

    http://htmlpurifier.org/

    Or, and you have to be careful here... escaping all HTML output
    through functions like htmlentities() / htmlencode(), where this
    does not fix `<a href=<?= htmlentities($unsafe_url) ?>>` due to the
    URL being able to start with `javascript:`, or being able to take
    advantage of the missing quotation marks on the attribute via `
    onclick=evil_code`.

    And when working with strings in JavaScript - you should use safe
    methods like `element.textContent`, or pass them through something
    to sanitise the HTML (both in removing the many ways JavaScript
    can be included, but also just making sure the HTML is well formed):

    
    https://github.com/google/closure-library/blob/master/closure/goog/html/sanitizer/htmlsanitizer.js

    https://github.com/punkave/sanitize-html
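
    A rough sketch of both approaches (the element id, variable names, and
    the use of the sanitize-html package linked above are all just
    assumptions for the example):

      // Safe for plain strings: textContent is never parsed as HTML.
      var el = document.getElementById('user-name'); // assumed element id
      el.textContent = unsafe_user_name;

      // When you genuinely need to insert HTML, run it through a sanitiser
      // first (sanitize-html here, bundled or via require, default options).
      var sanitizeHtml = require('sanitize-html');
      el.innerHTML = sanitizeHtml(unsafe_html_fragment);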

    Then you would ideally add a Content Security Policy to limit the
    scripts on the page, just in case you miss something.

    https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP

    And as an extra bonus, start playing with the (currently in
    development) Trusted Types, to make sure you aren't using unsafe
    things like element.innerHTML.

    https://developers.google.com/web/updates/2019/02/trusted-types
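
    Roughly, the API looks something like this (it is still experimental,
    so the details may well change; the policy name and sanitiser are just
    placeholders):

      // Create a policy that sanitises strings before they become TrustedHTML.
      if (window.trustedTypes) {
        var policy = trustedTypes.createPolicy('sanitise', {
          createHTML: function (s) { return sanitizeHtml(s); } // assumed sanitiser
        });
        element.innerHTML = policy.createHTML(unsafe_html_fragment);
      }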

    Or for even more fun (pain), on your local development server,
    try setting the header:

        Content-Type: application/xhtml+xml; charset=UTF-8

    Do not do this on live, as any bad formatting of your HTML will
    break the page - but this ensures all of your attributes are
    quoted, and all of your tags are perfectly nested (this includes
    `<br>` needing to be `<br />`, the attribute `selected` needing
    to be `selected="selected"`, etc).

    Craig



    On Fri, 5 Apr 2019 at 23:47, Yog Bii <joris.gutj...@gmail.com> wrote:

        XSS prevention is a very important and costly part of a
        website's security.
        Because XSS is currently prevented by matching for JS in user
        input, which is then either blocked or masked by the web
        developer, each on their own site,
        XSS attacks find differences between the matching of the web
        developer and the browser, such that the web developer's
        matching doesn't recognize JS as JS, but the browser executes it.

        This is a constant fight between the web developer and the
        XSS attacker,
        and it costs many resources that are needed elsewhere.
        And this fight favors larger businesses over small web developers.

        I think this fight can be ended by not making the
        web developer guess what the browser may consider to be JS,
        and instead letting them tell the browser explicitly that a
        given region shouldn't contain any code.
        The browser then behaves in that region as if
        JS were disabled.

        I would do that with a new attribute, called noscripts.
        Inside an HTML element with noscripts="true",
        the browser handles everything inside that element as if
        JS were disabled globally.

        An example HTML would look like this:
        <!doctype html>
        <html>
        ...
        <div noscripts="true">
        <script>
        // No danger from unescaped <script> tags
        </script>
        <button onclick="nor from event listeners">Click me</button>
        ...
        </html>

        If you know a way to do this without any differences between
        what the browser executes and whatever that mechanism lets
        pass, let me know,
        and let me know why it isn't taught in every HTML/JS
        tutorial and every piece of documentation about web development.



_______________________________________________
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security
