[ 
https://issues.apache.org/jira/browse/TEXT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223929#comment-16223929
 ] 

Ilguiz Latypov edited comment on TEXT-42 at 10/29/17 3:25 PM:
--------------------------------------------------------------

I wonder whether the use cases of escapeEcmaScript() can be scrutinized.

* Outputting a standalone JavaScript file containing string literals.  The 
existing code in escapeEcmaScript() seems to cover the generation of string 
literals surrounded by double or single quotes.
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
* Outputting an HTML attribute containing JavaScript that contains string 
literals.  This needs a new method *escapeHtmlAttr*.  Depending on the 
surrounding quotes or their absence, all characters of the attribute value 
will go through either a minimal substitution of [single/double quotes and 
ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state]
 with HTML entities or a broader replacement of [whitespace, 
ampersand, single/double quotes, equals, greater/less-than and 
backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state].
 Safety calls for the broader escaping by default (with the narrow one 
available as an option).  For example:
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq + 
escapeEcmaScript(input) + dq + ")") + dq);
{code}
* Outputting string literals in the script tag contents. The existing code 
*lacks protection* against the script's end tag taking precedence over any 
contents.  Because browsers allow readable JavaScript between the script tags, 
they [stopped applying a straight decoding 
algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents]
 similar to the one used for HTML attributes.  The code in escapeEcmaScript() 
*must escape the less-than character* (with either backslash-x notation or a 
simple backslash prefix).  I suggest escaping ampersands as well (assuming that 
browsers may keep applying their HTML entity decoding throughout the script tag 
contents).  Escaping the greater-than character does not seem necessary but 
would be symmetrical with escaping the less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq + 
")</script>");
{code}
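
The last two cases can be sketched roughly as follows. This is an illustration under stated assumptions, not code from Commons Text: the class EscapeSketch and the methods escapeHtmlAttr and hardenForScriptTag are hypothetical names, and hardenForScriptTag assumes its argument has already been through escapeEcmaScript(). The sketch uses Unicode escapes rather than a plain backslash prefix for the less-than character, because the HTML tokenizer does not interpret JavaScript escapes and would still recognize an end tag after a lone backslash.

```java
// Hypothetical helpers for illustration only; neither method exists in Commons Text.
final class EscapeSketch {

    // Broad attribute escaping: covers the characters that matter in quoted
    // and unquoted attribute values (whitespace, & " ' = < > and backquote).
    static String escapeHtmlAttr(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '=':  sb.append("&#61;");  break;
                case '`':  sb.append("&#96;");  break;
                default:
                    if (Character.isWhitespace(c)) {
                        // Numeric character reference, e.g. a space becomes &#32;
                        sb.append("&#").append((int) c).append(';');
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Extra pass for text placed inside a script element. Assumes the input is
    // already a JavaScript-escaped string literal body, so every backslash in
    // it is part of a complete escape sequence.
    static String hardenForScriptTag(String jsEscaped) {
        return jsEscaped
                .replace("<", "\\u003C")
                .replace(">", "\\u003E")
                .replace("&", "\\u0026");
    }

    public static void main(String[] args) {
        String dq = "\"";
        // Stand-in for escapeEcmaScript(input), hand-written so the sketch
        // has no dependencies.
        String jsEscaped = "<\\/script>";
        System.out.println("<script>alert(" + dq + hardenForScriptTag(jsEscaped)
                + dq + ")</script>");
        System.out.println("onmouseover=" + dq
                + escapeHtmlAttr("alert(" + dq + jsEscaped + dq + ")") + dq);
    }
}
```

The attribute helper always applies the broad replacement set, so a caller does not need to know whether the attribute ends up quoted. A narrower variant for quoted values could be offered separately.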



was (Author: ilatypov):
I wonder if the escapeEcmaScript()'s use cases can be scrutinized.

* Outputting a standalone javascript file containing string literals.  The 
generation of string literals to be surrounded by double quotes seems to be 
covered by the existing code in escapeEcmaScript().  We could *extend it to 
cover cases of single-quote string literals* and back-quote string templates.
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}


> [XSS] Possible attacks through StringEscapeUtils.escapeEcmaScript?
> ------------------------------------------------------------------
>
>                 Key: TEXT-42
>                 URL: https://issues.apache.org/jira/browse/TEXT-42
>             Project: Commons Text
>          Issue Type: Bug
>            Reporter: Andy Reek
>              Labels: XSS
>             Fix For: 1.x
>
>
> org.apache.commons.lang3.StringEscapeUtils.escapeEcmaScript does the escape 
> via a prefixed '\' on all characters which must be escaped. I am not sure if 
> this is really secure when I look at the comments on 
> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_JavaScript_Data_Values.
>  They say it is possible to do an attack by escaping the escape. I tested this 
> with the string '\"' and the output was '\\\"'. Is this really 
> ecma-/java-script secure? Or is it better to use the implementation used by 
> OWASP?



