[jira] [Commented] (TEXT-222) StringEscapeUtils.escapeJava() cannot restore string processed by unescapeJava()

Alex Herbert (Jira) Thu, 10 Nov 2022 03:22:30 -0800


    [ 
https://issues.apache.org/jira/browse/TEXT-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631597#comment-17631597
 ]


Alex Herbert commented on TEXT-222:
-----------------------------------

Note that unescape converts an escaped sequence back into the original 
characters, so the string you pass to unescape should be a valid *escaped* 
string. In this case the input string is not a valid escaped string; it is a 
valid *unescaped* string. The escaped version would be 
{{"wwww\\\\u202euuuu"}}.
{code:java}
@ParameterizedTest
@CsvSource({
    "wwww\\u202euuuu",
    "wwww\\\\u202euuuu",
})
public void testText222(String escaped) {
    String unescaped = StringEscapeUtils.unescapeJava(escaped);
    System.out.printf("%s -> %s -> %s%n", escaped, unescaped,
                      StringEscapeUtils.escapeJava(unescaped));
}
{code}
Prints:
{noformat}
wwww\u202euuuu -> wwww?uuuu -> wwww\u202Euuuu
wwww\\u202euuuu -> wwww\u202euuuu -> wwww\\u202euuuu
{noformat}
By asking StringEscapeUtils to unescape an already unescaped string, you 
have caused it to treat the \\u as the start of a Unicode escape sequence. 
I do not think this is what your original string is intended to represent.

For example, if you try to unescape "\\u2" you will receive an exception:
{noformat}
java.lang.IllegalArgumentException: Less than 4 hex digits in unicode value: '\u2' due to end of CharSequence
{noformat}
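
The behaviour described so far can be reproduced with a small stand-alone sketch. This is not the Commons Text implementation: the class {{UnicodeEscapeSketch}} and its two helper methods are hypothetical and handle only the backslash and backslash-u cases. It assumes, as the printed output above suggests, that escaping writes hex digits in upper case. It shows why unescaping first loses the case of the hex digit, why escaping first round-trips, and why a truncated sequence is rejected.

```java
public class UnicodeEscapeSketch {

    // Hypothetical helper: decodes a doubled backslash and backslash-u
    // sequences with four hex digits; every other character is copied.
    static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\\\", i)) {
                out.append('\\');          // doubled backslash -> single backslash
                i += 2;
            } else if (s.startsWith("\\u", i)) {
                int start = i + 2;
                int end = start + 4;
                if (end > s.length()) {
                    throw new IllegalArgumentException(
                        "Less than 4 hex digits in unicode value: '" + s.substring(i) + "'");
                }
                // parseInt accepts 'e' and 'E' alike, so the digit case is lost here.
                out.append((char) Integer.parseInt(s.substring(start, end), 16));
                i = end;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Hypothetical helper: doubles backslashes and writes control and
    // non-ASCII characters as backslash-u with upper-case hex digits.
    static String escapeUnicode(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\') {
                out.append("\\\\");
            } else if (c < 32 || c > 126) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "wwww\\u202euuuu";    // runtime content has ONE backslash

        // Unescape first: the sequence is decoded, then re-encoded in upper case.
        String decoded = unescapeUnicode(raw);
        System.out.println(decoded.length());           // 9
        System.out.println(escapeUnicode(decoded));     // ends in 202Euuuu

        // Escape first: the backslash is doubled, and unescape restores the input.
        String escaped = escapeUnicode(raw);
        System.out.println(unescapeUnicode(escaped).equals(raw));   // true
    }
}
```

The order of the calls is what matters: escape followed by unescape is the round trip that can be expected to return the input, while unescape followed by escape only does so when the input was already in the exact form the escaper produces.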
Note also that when you use System.out.println(String) to view a string, the 
output will not look the same as the Java string literal. This is because the 
[Java escape 
sequences|https://docs.oracle.com/javase/tutorial/java/data/characters.html] 
in the literal stand for their actual characters, and those are what is printed:
{code:java}
Stream.of("\\", "\\\\", "\t", "\n")
      .map(s -> "+++" + s + "---")
      .forEach(System.out::println);{code}
Prints:
{noformat}
+++\---
+++\\---
+++    ---
+++
---
{noformat}
So be aware that your double backslash in code will print as a single backslash.
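
One way to check what a string really contains, without relying on how the console renders it, is to look at its length and at each character code. The following is only an illustrative sketch; the class {{LiteralVersusContent}} and the helper {{describe}} are hypothetical names.

```java
public class LiteralVersusContent {

    // Hypothetical helper: keeps printable ASCII as-is and shows every
    // other character as <U+XXXX> so it stays visible on any console.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 32 && c < 127) {
                out.append(c);
            } else {
                out.append(String.format("<U+%04X>", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two backslashes in the source literal are one character at runtime.
        System.out.println("\\".length());                 // 1
        System.out.println("\\\\".length());               // 2

        // The tab escape is a single character, made visible by describe().
        System.out.println(describe("+++\t---"));          // +++<U+0009>---

        // The string from the issue: 4 + 1 backslash + 5 + 4 characters.
        System.out.println("wwww\\u202euuuu".length());    // 14
    }
}
```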

> StringEscapeUtils.escapeJava() cannot restore string processed by 
> unescapeJava()
> --------------------------------------------------------------------------------
>
>                 Key: TEXT-222
>                 URL: https://issues.apache.org/jira/browse/TEXT-222
>             Project: Commons Text
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: clover
>            Priority: Minor
>         Attachments: code-1.PNG, code.PNG
>
>
> When we call StringEscapeUtils.unescapeJava(original) and then call 
> StringEscapeUtils.escapeJava(unescaped), sometimes the original string cannot 
> be restored as expected.
> For example: 
>     // Commons Text 1.6
>     String unescapeJava = StringEscapeUtils.unescapeJava("wwww\\u202{color:#ff0000}e{color}uuuu");
>     System.out.println("unescapeJava=" + unescapeJava);   // print 
> unescapeJava=wwww‮uuuu
>     System.out.println("escapeJava=" + 
> StringEscapeUtils.escapeJava(unescapeJava)); // print 
> escapeJava=wwww\u202{color:#ff0000}E{color}uuuu
> The lowercase 'e' in "wwww\\u202euuuu" is converted to uppercase 'E'.
>  
> !code.PNG!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
