I'm just wondering, the idea of this is to display potentially unsafe HTML we got from the server, right? Is there a way to discourage using this to sanitizing HTML on the client before sending it to the server? I can definitely see newer programmers assuming that doing this sanitization on the client side is safe enough, when it's clearly not. I guess some documentation to that effect would be enough.
-- Arthur Kalmenson On Wed, Aug 18, 2010 at 10:11 AM, Christoph Kern <x...@google.com> wrote: > > > On Wed, Aug 18, 2010 at 01:44, <t.bro...@gmail.com> wrote: >> >> On 2010/08/17 23:23:39, xtof wrote: >>> >>> On 2010/08/17 23:05:06, tbroyer wrote: >>> > >>> > Looking at the code more closely it would merely "fail" by overly >>> > rejecting tags that are whitelisted: i.e. "<b foo=<i>should be >>> > bold" would be sanitized to "<b foo=<i>should be bold" and the >>> > end part would be italicized instead of bold. >> >>> And that is exactly correct behavior for this class as document. It >>> only claims to accept HTML with attribute-free tags within the >>> whitelist. It doesn't claim to do anything particularly sensible >>> with input that doesn't fit this constraint; it does however claim >>> that whatever it outputs is safe (will not result in XSS/script >>> execution). >> >> Oops, yes, sorry, I can't tell how it happened but I misread the >> "whitelisting" code (matches the whole thing between '<' and '>', so any >> attribute, or even whitespace, as in "<b >bold</b>", would make it fail >> and thus be escaped). >> It's fine then. Sorry again for the noise. > > No prob, always better to err on the side of caution! > >> >> Still, there's a small issue with the fact that >> SafeHtmlTemplatesGenerator doesn't use the HTML5 serialization algorithm >> (or any similar one): @Template("<br/>") will result in "<br></br>" >> which is interpreted by browsers as "<br><br>" [1], which makes it >> impossible to generate a single "line break" in a SafeHtmlTemplates. >> >> [1] http://www.w3.org/TR/html5/tokenization.html#parsing-main-inbody >> (search for « An end tag whose tag name is "br" », it's there for >> "compat with the Web") > > Agreed. Another potential problem with this parser is that it's too strict > -- it insists on well-formed XHTML, but that's a much stronger constraint > than needed to ensure the SafeHtml type contract. > For example, > �...@template("<a href=\"{0}\">") > SafeHtml openATag(String href); > is perfectly safe, and might be useful in something where one needs to > conditionally assemble various pieces of HTML markup, like, > SafeHtmlBuilder shb; > //... > shb.append(openATag(someUrl)); > if (...) { > shb.append(someThing) > } else { > shb.append(someThingElse); > } > To ensure the SafeHtml type contract, the parser need only record the > HTML-context ("inner text", "url-valued attribute", "style", etc) of the > positional parameters, and require that the template ends in "inner text" > context (to ensure that SafeHtmls are safely appendable to each other). > Note that a bug in code like the above could result in non-well-formed HTML > (like unbalanced tags), but not unsafe HTML that might cause XSS (that is of > course absent bugs in the SafeHtml generators themselves). > As I'd mentioned, I'm proposing to replace the current XML based parser with > the java streamhtmlparser, which has this property. > I'd be fine with removing the templating code from this change and land it > separately along with the new parser. > cheers, > --xtof > > >> >> >> http://gwt-code-reviews.appspot.com/771801/show > > -- > http://groups.google.com/group/Google-Web-Toolkit-Contributors -- http://groups.google.com/group/Google-Web-Toolkit-Contributors