[ https://issues.apache.org/jira/browse/TEXT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223929#comment-16223929 ]
Ilguiz Latypov edited comment on TEXT-42 at 10/29/17 3:25 PM:
--------------------------------------------------------------

I wonder if the use cases of escapeEcmaScript() can be scrutinized.

* Outputting a standalone javascript file containing string literals. The generation of string literals to be surrounded by double or single quotes seems to be covered by the existing code in escapeEcmaScript().
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
* Outputting an HTML attribute containing javascript containing string literals. This needs a new method *escapeHtmlAttr*. Depending on the surrounding quotes or absence of them, all characters of the attribute value will go through either a minimal substitution of [single/double quotes and ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state] with the HTML entity or through a broader replacement of [whitespace, ampersand, single/double quotes, equals, greater/less-than and backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state]. Safety calls for using the broader escaping by default (and allowing the narrow one as an option). I.e.
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq + escapeEcmaScript(input) + dq + ")") + dq);
{code}
* Outputting string literals in the script tag contents. The existing code *lacks protection* against the script's end tag taking precedence over any contents. Because browsers allow readable javascript between the script tags, browsers [stopped applying a straight decoding algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents] similar to one in HTML attributes. The code in escapeEcmaScript() *must escape the less-than character* (with either backslash-x notation or with a simple backslash prefix).
I suggest escaping ampersands (assuming that browsers may keep applying their HTML entity decoding throughout the script tag contents). Escaping the greater-than character does not seem necessary but would look symmetrical to escaping the less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq + ")</script>");
{code}

was (Author: ilatypov):
I wonder if the use cases of escapeEcmaScript() can be scrutinized.

* Outputting a standalone javascript file containing string literals. The generation of string literals to be surrounded by double quotes seems to be covered by the existing code in escapeEcmaScript(). We could *extend it to cover cases of single-quote string literals* and back-quote string templates.
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
* Outputting an HTML attribute containing javascript containing string literals. This needs a new method *escapeHtmlAttr*. Depending on the surrounding quotes or absence of them, all characters of the attribute value will go through either a minimal substitution of [single/double quotes and ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state] with the HTML entity or through a broader replacement of [whitespace, ampersand, single/double quotes, equals, greater/less-than and backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state]. Safety calls for using the broader escaping by default (and allowing the narrow one as an option). I.e.
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq + escapeEcmaScript(input) + dq + ")") + dq);
{code}
* Outputting string literals in the script tag contents. The existing code *lacks protection* against the script's end tag taking precedence over any contents.
Because browsers allow readable javascript between the script tags, browsers [stopped applying a straight decoding algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents] similar to one in HTML attributes. The code in escapeEcmaScript() *must escape the less-than character* (with either backslash-x notation or with a simple backslash prefix). I suggest escaping ampersands (assuming that browsers may keep applying their HTML entity decoding throughout the script tag contents). Escaping the greater-than character does not seem necessary but would look symmetrical to escaping the less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq + ")</script>");
{code}

> [XSS] Possible attacks through StringEscapeUtils.escapeEcmaScript?
> ------------------------------------------------------------------
>
>                 Key: TEXT-42
>                 URL: https://issues.apache.org/jira/browse/TEXT-42
>             Project: Commons Text
>          Issue Type: Bug
>            Reporter: Andy Reek
>              Labels: XSS
>             Fix For: 1.x
>
>
> org.apache.commons.lang3.StringEscapeUtils.escapeEcmaScript does the escape
> via a prefixed '\' on all characters which must be escaped. I am not sure if
> this is really secure when I look at the comments on
> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_JavaScript_Data_Values.
> They say it is possible to do an attack by escaping the escape. I tested this
> with the string '\"' and the output was '\\\"'. Is this really
> ecma-/java-script secure? Or is it better to use the implementation used by
> OWASP?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
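
For illustration only, the third use case in the comment (string literals inside script tag contents) and the "escape the escape" test from the issue description could be sketched roughly as below. This is not the Commons Text implementation: the class name ScriptEscapeSketch and the method escapeForScriptBlock are hypothetical, and the choice of backslash-u notation for the less-than, greater-than and ampersand characters (rather than backslash-x or a plain backslash prefix) is an assumption made for this sketch.

```java
// Hypothetical sketch, not part of Commons Text.
final class ScriptEscapeSketch {
    private ScriptEscapeSketch() {}

    /**
     * Escapes text for use inside a quoted JavaScript string literal that may
     * be embedded between script tags. The backslash is escaped like any other
     * special character, so an input backslash cannot cancel a later escape.
     */
    static String escapeForScriptBlock(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '<':
                case '>':
                case '&':
                    // Assumption: hex-style escapes keep the HTML parser from
                    // ever seeing a literal end tag or entity in the output.
                    sb.append(String.format("\\u%04X", (int) c));
                    break;
                default:
                    // Remaining control characters and the two Unicode line
                    // separators are also written in hex form.
                    if (c < 0x20 || c == 0x2028 || c == 0x2029) {
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dq = Character.toString('"');
        String input = "</script><script>alert(1)</script>";
        System.out.println("<script>alert(" + dq + escapeForScriptBlock(input) + dq + ")</script>");
        // The test string from the issue description: a backslash followed by a double quote.
        System.out.println(escapeForScriptBlock("\\\""));
    }
}
```

With this sketch the escaped output contains no literal less-than character, so the HTML parser cannot find a closing script tag inside the string literal. The backslash-then-quote input yields three backslashes followed by a quote, which a JavaScript parser reads back as the original two characters.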