[ 
https://issues.apache.org/jira/browse/TEXT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223929#comment-16223929
 ] 

Ilguiz Latypov edited comment on TEXT-42 at 10/29/17 3:25 PM:
--------------------------------------------------------------

I wonder whether the use cases of escapeEcmaScript() can be scrutinized.

* Outputting a standalone JavaScript file containing string literals.  The 
generation of string literals to be surrounded by double quotes seems to be 
covered by the existing code in escapeEcmaScript().  We could *extend it to 
cover single-quoted string literals* and backquoted template literals.
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
* Outputting an HTML attribute containing JavaScript that contains string 
literals.  This needs a new method *escapeHtmlAttr*.  Depending on the 
surrounding quotes or their absence, all characters of the attribute value 
will go through either a minimal substitution of [single/double quotes and 
ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state]
 with HTML entities or a broader replacement of [whitespace, 
ampersand, single/double quotes, equals, greater/less-than and 
backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state].
 Safety calls for using the broader escaping by default (and allowing the 
narrow one as an option).  For example:
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq + 
escapeEcmaScript(input) + dq + ")") + dq);
{code}
* Outputting string literals in the script tag contents.  The existing code 
*lacks protection* against the script's end tag taking precedence over any 
contents.  Because browsers allow readable JavaScript between the script tags, 
they [stopped applying a straight decoding 
algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents]
 similar to the one used in HTML attributes.  The code in escapeEcmaScript() 
*must escape the less-than character* (with either backslash-x notation or a 
simple backslash prefix).  I suggest escaping ampersands as well (on the 
assumption that browsers may still apply HTML entity decoding throughout the 
script tag contents).  Escaping the greater-than character does not seem 
necessary but would be symmetrical with escaping the less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq + 
")</script>");
{code}
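
To make the second and third cases more concrete, here is a minimal sketch of 
what the proposed escaping could look like.  Everything in it is an 
assumption for illustration: the class name EscapeSketch, the two-argument 
form of escapeHtmlAttr (the method proposed above does not exist in Commons 
Text) and the helper protectForScriptTag are hypothetical names.  The script 
tag helper is meant to run on text that has already been escaped for a 
JavaScript string literal, and it uses the backslash-x notation mentioned 
above.

```java
public final class EscapeSketch {

    private EscapeSketch() {
    }

    /** Hypothetical escapeHtmlAttr defaulting to the broader (unquoted-safe) escaping. */
    public static String escapeHtmlAttr(String value) {
        return escapeHtmlAttr(value, true);
    }

    /**
     * Hypothetical escapeHtmlAttr. The narrow mode replaces only ampersand and
     * single/double quotes; the broad mode also replaces HTML whitespace,
     * equals, less-than, greater-than and backquote with numeric references.
     */
    public static String escapeHtmlAttr(String value, boolean broad) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '&') {
                sb.append("&amp;");
            } else if (c == '"') {
                sb.append("&quot;");
            } else if (c == '\'') {
                sb.append("&#39;");
            } else if (broad && isBroadlyEscaped(c)) {
                // Numeric character reference, e.g. '=' becomes &#61;
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Characters that matter in an unquoted attribute value besides & and quotes. */
    private static boolean isBroadlyEscaped(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r'
                || c == '=' || c == '<' || c == '>' || c == '`';
    }

    /**
     * Hypothetical post-processing for text that is already escaped for a
     * JavaScript string literal: rewrites less-than, greater-than and
     * ampersand in backslash-x notation so that no end tag can appear
     * inside the script element.
     */
    public static String protectForScriptTag(String jsEscaped) {
        StringBuilder sb = new StringBuilder(jsEscaped.length() + 16);
        for (int i = 0; i < jsEscaped.length(); i++) {
            char c = jsEscaped.charAt(i);
            if (c == '<') {
                sb.append("\\x3C");
            } else if (c == '>') {
                sb.append("\\x3E");
            } else if (c == '&') {
                sb.append("\\x26");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, EscapeSketch.escapeHtmlAttr("a b=c") gives a&#32;b&#61;c in 
the broad mode, while the narrow mode leaves that input unchanged.  The 
backslash-x rewriting is only valid inside a string literal, so it would not 
help with untrusted text placed elsewhere in a script.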




> [XSS] Possible attacks through StringEscapeUtils.escapeEcmaScript?
> ------------------------------------------------------------------
>
>                 Key: TEXT-42
>                 URL: https://issues.apache.org/jira/browse/TEXT-42
>             Project: Commons Text
>          Issue Type: Bug
>            Reporter: Andy Reek
>              Labels: XSS
>             Fix For: 1.x
>
>
> org.apache.commons.lang3.StringEscapeUtils.escapeEcmaScript escapes by 
> prefixing a '\' to every character that must be escaped. I am not sure 
> whether this is really secure, given the comments on 
> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_JavaScript_Data_Values.
>  They say an attack is possible by escaping the escape. I tested this 
> with the string '\"' and the output was '\\\"'. Is this really secure for 
> ECMAScript/JavaScript? Or is it better to use the implementation used by 
> OWASP?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
