Re: Question/reflexion about html validation

Daniel Watford Tue, 18 Oct 2022 03:05:59 -0700

Hi Gaetan,

Assuming the string was pasted into an input field then I believe it is
being correctly flagged as a security risk since you are entering a literal
'\' followed by a literal 'r' into the field rather than a carriage
return.  The '\r' is the encoded form of a carriage return which would be
parsed by a ECMA or HTML language processor and changed.


The difference is quite subtle, so hopefully some tests will help
demonstrate what I'm getting at. Try pasting the following tests into
UtilCodecTests.java and run:


    @Test
    public void testCheckStringWithCrForHtmlSafe() {
        String value = "MY ITEM DESSCRIPTION\rS:UK 5.5 - EU 39 - CC:126334";
        List<String> errorList = new ArrayList<>();
        String checkedFieldValue =
UtilCodec.checkStringForHtmlSafe("fieldName", value, errorList,
                new Locale("test"), true);
        assertEquals(0, errorList.size());
    }

    @Test
    public void testCheckStringWithEncodedCrForHtmlSafe() {
        String value = "MY ITEM DESSCRIPTION\\rS:UK 5.5 - EU 39 -
CC:126334";
        List<String> errorList = new ArrayList<>();
        String checkedFieldValue =
UtilCodec.checkStringForHtmlSafe("fieldName", value, errorList,
                new Locale("test"), true);
        assertEquals(1, errorList.size());
        assertEquals("In field [fieldName] by our input policy, your input
has not been accepted for security reason. "
                + "Please check and modify accordingly, thanks.",
errorList.get(0));
    }


In the first test, java parses the \r sequence into the carriage return
character which passes security checks.

In the second test, java parses the \\r sequence into the two characters
literal \ followed by literal \r, which fails security checking. If your
input data came from an input field then I think it will align with the
scenario in the second test.

It's a bit confusing and subtle, but I hope the above helps.

Dan.


On Tue, 18 Oct 2022 at 09:58, Gaetan <gaetan.chabous...@nereide.fr> wrote:

> Hello community,
> We are under the process of migrating ofbiz from 18 to 22 for a project,
> and we saw some part of the code that we would like to discuss.
>
> Our case is the following :
> - We have this string as input data :
> "MY ITEM DESSCRIPTION\rS:UK 5.5 - EU 39 - CC:126334"
> - We are calling the service `createShoppingListOrderItemAttribute` with
> `attrValue` set with the previous string.
> - this service has and safe html policy.
> ```xml
> <override name="attrValue" allow-html="safe"/>
> ```
> - the service fails because of security validation process.
> - we got puzzled and analyzed the code, and found this :
> UtilCodec:L538
> ```java
>          String filtered = policy.sanitize(value);
>          String unescapeHtml4 = StringEscapeUtils.unescapeHtml4(filtered);
>          String unescapeEcmaScriptAndHtml4 =
> StringEscapeUtils.unescapeEcmaScript(unescapeHtml4);
>          // Replaces possible quotes entities in value (due to
> HtmlSanitizer above) to avoid issue with
>          // testCreateCustRequestItemNote and allow saving when using
> quotes in fields
>          if (filtered != null && !value.replace("&#39;",
> "'").replace("&#34;", "\"").equals(unescapeEcmaScriptAndHtml4)) {
>              String issueMsg = null;
>              if (locale.equals(new Locale("test"))) { // labels are not
> available in testClasses Gradle task
>                  issueMsg = "In field [" + valueName + "] by our input
> policy, your input has not been accepted "
>                          + "for security reason. Please check and modify
> accordingly, thanks.";
>              } else {
>                  issueMsg =
> UtilProperties.getMessage("SecurityUiLabels", "PolicySafe",
>                          UtilMisc.toMap("valueName", valueName), locale);
>              }
>              errorMessageList.add(issueMsg);
>          }
> ```
>  From what we understood, the input string is parsed as HTML and
> Javascript, and then compared to the initial string.
> If the initial and the parsed string are different, then there is a
> security issue, and an error is added to the service return.
>
> This causes us some questions and issues, with the example string above,
> and more specifically the `\r` string.
> Because the `StringEscapeUtils.unescapeEcmaScript()` parses the `\r`
> string as the **line break character**, and the comparison doesn't match.
> So for us, in this case, the security validation doesn't allow the
> string even though there is no security issue.
>
> Could someone give some guidance or explanation on why this is done like
> this ?
> Thanks a lot in advance
>
>

-- 
Daniel Watford

Re: Question/reflexion about html validation

Reply via email to