[ https://issues.apache.org/jira/browse/TEXT-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631597#comment-17631597 ]
Alex Herbert commented on TEXT-222:
-----------------------------------

Note that unescape converts an escaped sequence back into the original characters, so the string you pass to unescape should be a valid *escaped* string. In this case the input string is not a valid escaped string; it is a valid *unescaped* string. The escaped version would be {{"wwww\\\\u202euuuu"}}.

{code:java}
@ParameterizedTest
@CsvSource({
    "wwww\\u202euuuu",
    "wwww\\\\u202euuuu",
})
public void testText222(String escaped) {
    String unescaped = StringEscapeUtils.unescapeJava(escaped);
    System.out.printf("%s -> %s -> %s%n", escaped, unescaped, StringEscapeUtils.escapeJava(unescaped));
}
{code}

Prints:

{noformat}
wwww\u202euuuu -> wwww?uuuu -> wwww\u202Euuuu
wwww\\u202euuuu -> wwww\u202euuuu -> wwww\\u202euuuu
{noformat}

By asking StringEscapeUtils to unescape an already unescaped string, you have caused it to treat the \\u as the start of a unicode escape. I do not think this is what your original string is intended to represent. For example, if you try to unescape "\\u2" you will receive an exception:

{noformat}
java.lang.IllegalArgumentException: Less than 4 hex digits in unicode value: '\u2' due to end of CharSequence
{noformat}

Note also that when you use System.out.println(String) to view a string, the output will not look the same as the string literal in the Java source. This is because the [java escaped characters|https://docs.oracle.com/javase/tutorial/java/data/characters.html] in a string literal are converted to their actual characters at compile time, and the print writes those actual characters:

{code:java}
Stream.of("\\", "\\\\", "\t", "\n")
      .map(s -> "+++" + s + "---")
      .forEach(System.out::println);
{code}

Prints:

{noformat}
+++\---
+++\\---
+++	---
+++
---
{noformat}

So be aware that a double backslash in your code will print as a single backslash.
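The case change reported in the issue can be reproduced without the library. What follows is only a minimal sketch: the class {{EscapeRoundTrip}} and its two methods are hypothetical stand-ins, not the Commons Text implementation. They handle only a doubled backslash and the four-digit unicode form, and the escaper is assumed to write upper-case hex digits, matching the output shown above.

```java
// Simplified stand-in for illustration only; this is NOT the Commons Text implementation.
final class EscapeRoundTrip {

    /** Replaces a doubled backslash with one backslash and a backslash-u-plus-4-hex sequence with its character. */
    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                char next = escaped.charAt(i + 1);
                if (next == '\\') {            // escaped backslash
                    out.append('\\');
                    i += 2;
                    continue;
                }
                if (next == 'u') {             // unicode escape: needs 4 hex digits
                    if (i + 6 > escaped.length()) {
                        throw new IllegalArgumentException("Less than 4 hex digits in unicode value");
                    }
                    out.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
            }
            out.append(c);                     // anything else is copied unchanged
            i++;
        }
        return out.toString();
    }

    /** Doubles backslashes and writes control and non-ASCII characters in the unicode form with upper-case hex. */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (c == '\\') {
                out.append("\\\\");
            } else if (c < 0x20 || c > 0x7f) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "wwww\\u202euuuu";       // 14 characters, lower-case hex digit 'e'
        String unescaped = unescape(escaped);     // 9 characters, the fifth is U+202E
        String reEscaped = escape(unescaped);     // same character, hex digit now upper-case

        System.out.println(escaped + " -> " + reEscaped);
        System.out.println(escaped.equals(reEscaped));
        System.out.println(unescape(reEscaped).equals(unescaped));
    }
}
```

In this sketch the two escaped forms differ only in the case of a hex digit and both unescape to the same character, so a round-trip check is more robust when it compares the unescaped strings rather than the escaped text.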
> StringEscapeUtils.escapeJava() cannot restore string processed by unescapeJava()
> --------------------------------------------------------------------------------
>
>                 Key: TEXT-222
>                 URL: https://issues.apache.org/jira/browse/TEXT-222
>             Project: Commons Text
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: clover
>            Priority: Minor
>         Attachments: code-1.PNG, code.PNG
>
>
> When we call StringEscapeUtils.unescapeJava(original) and then call StringEscapeUtils.escapeJava(unescaped), sometimes the original string cannot be restored as expected.
> For example:
> // Commons Text 1.6
> String unescapeJava = StringEscapeUtils.unescapeJava("wwww\\u202{color:#ff0000}e{color}uuuu");
> System.out.println("unescapeJava=" + unescapeJava); // print unescapeJava=wwwwuuuu
> System.out.println("escapeJava=" + StringEscapeUtils.escapeJava(unescapeJava)); // print escapeJava=wwww\u202{color:#ff0000}E{color}uuuu
> The lowercase 'e' in "wwww\\u202euuuu" is converted to uppercase 'E'.
>
> !code.PNG!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)