[jira] [Commented] (TEXT-216) HTML 5.0 Entities are not supported
[ https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528641#comment-17528641 ]

Richard Bunel commented on TEXT-216:

It is the user who provides the input. I need to check whether that input is safe before it is rendered in a web browser. My website gets millions of views every day, so I need to be sure that the content users share with others doesn't contain XSS.

> HTML 5.0 Entities are not supported
> ---
>
> Key: TEXT-216
> URL: https://issues.apache.org/jira/browse/TEXT-216
> Project: Commons Text
> Issue Type: Improvement
> Affects Versions: 1.0
> Reporter: Richard Bunel
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> As noted in [TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and probably other tickets, HTML 5.0 entities are not supported.
> A nice evolution would be to include them all.
> Tentative PR: https://github.com/apache/commons-text/pull/312

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Updated] (TEXT-215) NumericEntityUnescaper may miss decimal entity
[ https://issues.apache.org/jira/browse/TEXT-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Bunel updated TEXT-215:
---
Description:
*Description:*
A security vulnerability in the NumericEntityUnescaper can be exploited through decimal character entities.
At [line|https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117] 117, a run of hexadecimal characters is searched for, whether or not the entity is a hexadecimal one.
Therefore, if the "semiColonOptional" option is enabled and a decimal entity without a semicolon is immediately followed by one or more letters from A to F, these letters are captured as part of the entity. Parsing the Integer with radix 10 then fails and the whole entity is ignored.
*Example:*
If one uses the following string:
{code:java}
{code}
The sequence identifying the entity will wrongly be "&#106a" instead of "&#106".
As "&#106a" is not a valid decimal entity, its Integer parsing fails and the whole entity remains escaped.
Such code would then trigger the alert in all modern browsers.
*Solution:*
The fix is to restrict hexadecimal characters to hexadecimal entities and decimal characters to decimal entities.

was:
*Description:*
A security vulnerability in the NumericEntityUnescaper can be exploited through decimal character entities.
At [line|https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117] 117, a run of hexadecimal characters is searched for, whether or not the entity is a hexadecimal one.
Therefore, if the "semiColonOptional" option is enabled and a decimal entity without a semicolon is immediately followed by one or more letters from A to E, these letters are captured as part of the entity. Parsing the Integer with radix 10 then fails and the whole entity is ignored.
*Example:*
If one uses the following string:
{code:java}
{code}
The sequence identifying the entity will wrongly be "&#106a" instead of "&#106".
As "&#106a" is not a valid decimal entity, its Integer parsing fails and the whole entity remains escaped.
Such code would then trigger the alert in all modern browsers.
*Solution:*
The fix is to restrict hexadecimal characters to hexadecimal entities and decimal characters to decimal entities.

> NumericEntityUnescaper may miss decimal entity
> --
>
> Key: TEXT-215
> URL: https://issues.apache.org/jira/browse/TEXT-215
> Project: Commons Text
> Issue Type: Bug
> Affects Versions: 1.0
> Reporter: Richard Bunel
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
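The scanning flaw described in TEXT-215 can be illustrated with a small standalone sketch. This is not the Commons Text source: the class `NumericEntitySketch` and its methods are hypothetical names, and the code only mimics the digit-scanning step with the semicolon treated as optional. The "flawed" variant accepts A-F while scanning a decimal entity, and the "fixed" variant accepts them only after `&#x`.

```java
// Hypothetical sketch, not the Commons Text implementation: it reproduces only
// the digit-scanning behaviour described in TEXT-215, with optional semicolons.
final class NumericEntitySketch {

    /** Scans as in the bug report: hex letters are accepted even for decimal entities. */
    static String unescapeFlawed(String input) {
        return unescape(input, true);
    }

    /** Scans as in the proposed fix: A-F are accepted only for "&#x" entities. */
    static String unescapeFixed(String input) {
        return unescape(input, false);
    }

    private static String unescape(String input, boolean hexLettersInDecimal) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (!input.startsWith("&#", i)) {
                out.append(input.charAt(i++));
                continue;
            }
            int start = i + 2;
            boolean hex = start < input.length()
                    && Character.toLowerCase(input.charAt(start)) == 'x';
            if (hex) {
                start++;
            }
            // Collect the run of characters treated as the entity's digits.
            int end = start;
            while (end < input.length()
                    && isDigit(input.charAt(end), hex || hexLettersInDecimal)) {
                end++;
            }
            int codePoint = parse(input.substring(start, end), hex ? 16 : 10);
            if (codePoint < 0) {
                // Parsing failed: the entity is left escaped, character by character.
                out.append(input.charAt(i++));
                continue;
            }
            out.appendCodePoint(codePoint);
            // Skip the semicolon when present; it is optional in this sketch.
            i = (end < input.length() && input.charAt(end) == ';') ? end + 1 : end;
        }
        return out.toString();
    }

    private static boolean isDigit(char c, boolean allowHexLetters) {
        boolean decimal = c >= '0' && c <= '9';
        boolean hexLetter = (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
        return decimal || (allowHexLetters && hexLetter);
    }

    /** Returns the parsed code point, or -1 when the digits are not a valid number. */
    private static int parse(String digits, int radix) {
        try {
            int value = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(value) ? value : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

With the input `&#106avascript`, `unescapeFlawed` leaves the text unchanged because "106a" cannot be parsed with radix 10, while `unescapeFixed` stops at "106" and yields `javascript`.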
[jira] [Commented] (TEXT-216) HTML 5.0 Entities are not supported
[ https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513868#comment-17513868 ]

Richard Bunel commented on TEXT-216:

Well, my target usage (in my web application) is to use the "unescapeHtml5" method to parse HTML content (to detect potential XSS attacks) before it is sent to and rendered in the browser. Leaving character entities escaped creates vulnerabilities. For example, if I try to guard against JavaScript injection in images, a simple string like this will bypass the filter because the &colon; entity remains escaped.
{code:java}
{code}
The usage of "escapeHtml5" is admittedly less evident, but the same goes for the "escapeHtml4" and "escapeHtml3" methods, and they still form part of the library.
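The bypass described in this comment can be sketched as follows. Everything here is hypothetical: `NamedEntityFilterSketch`, its two tiny lookup tables, and the naive `looksLikeScriptUrl` check stand in for a real unescaper and XSS filter, and are not part of Commons Text. The point is only that an unescaper unaware of the HTML 5 `&colon;` entity hands the filter a string in which "javascript:" never appears.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: the tables are small illustrative subsets, not the
// lookup tables shipped with Commons Text.
final class NamedEntityFilterSketch {

    static final Map<String, String> HTML4_SUBSET = new HashMap<>();
    static final Map<String, String> HTML5_SUBSET = new HashMap<>();

    static {
        HTML4_SUBSET.put("&amp;", "&");
        HTML4_SUBSET.put("&quot;", "\"");
        HTML5_SUBSET.putAll(HTML4_SUBSET);
        HTML5_SUBSET.put("&colon;", ":"); // named entity added in HTML 5
    }

    /** Single-pass unescape; entities missing from the table stay escaped. */
    static String unescape(String input, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int semi = input.charAt(i) == '&' ? input.indexOf(';', i) : -1;
            String replacement = semi < 0 ? null : table.get(input.substring(i, semi + 1));
            if (replacement == null) {
                out.append(input.charAt(i++));
            } else {
                out.append(replacement);
                i = semi + 1;
            }
        }
        return out.toString();
    }

    /** Naive check standing in for an XSS filter on URL attributes. */
    static boolean looksLikeScriptUrl(String unescaped) {
        return unescaped.toLowerCase(Locale.ROOT).contains("javascript:");
    }
}
```

For the payload `javascript&colon;alert(1)`, the check passes unnoticed after unescaping with `HTML4_SUBSET` and is caught after unescaping with `HTML5_SUBSET`, whereas a browser resolves `&colon;` either way.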
[jira] [Commented] (TEXT-216) HTML 5.0 Entities are not supported
[ https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513665#comment-17513665 ]

Richard Bunel commented on TEXT-216:

That would depend on how you use the escaper, wouldn't it? I can think of many cases where you would need to escape/unescape HTML 5.0 text.
[jira] [Updated] (TEXT-216) HTML 5.0 Entities are not supported
[ https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Bunel updated TEXT-216:
---
Description:
As noted in [TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and probably other tickets, HTML 5.0 entities are not supported.
A nice evolution would be to include them all.
Tentative PR: https://github.com/apache/commons-text/pull/312

was:
As noted in [TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and probably other tickets, HTML 5.0 entities are not supported.
A nice evolution would be to include them all.
[jira] [Updated] (TEXT-216) HTML 5.0 Entities are not supported
[ https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Bunel updated TEXT-216:
---
External issue URL: https://github.com/apache/commons-text/pull/312
[jira] [Created] (TEXT-216) HTML 5.0 Entities are not supported
Richard Bunel created TEXT-216:
--
Summary: HTML 5.0 Entities are not supported
Key: TEXT-216
URL: https://issues.apache.org/jira/browse/TEXT-216
Project: Commons Text
Issue Type: Improvement
Affects Versions: 1.0
Reporter: Richard Bunel

As noted in [TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and probably other tickets, HTML 5.0 entities are not supported.
A nice evolution would be to include them all.
[jira] [Updated] (TEXT-215) NumericEntityUnescaper may miss decimal entity
[ https://issues.apache.org/jira/browse/TEXT-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Bunel updated TEXT-215:
---
Description:
*Description:*
A security vulnerability in the NumericEntityUnescaper can be exploited through decimal character entities.
At [line|https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117] 117, a run of hexadecimal characters is searched for, whether or not the entity is a hexadecimal one.
Therefore, if the "semiColonOptional" option is enabled and a decimal entity without a semicolon is immediately followed by one or more letters from A to E, these letters are captured as part of the entity. Parsing the Integer with radix 10 then fails and the whole entity is ignored.
*Example:*
If one uses the following string:
{code:java}
{code}
The sequence identifying the entity will wrongly be "&#106a" instead of "&#106".
As "&#106a" is not a valid decimal entity, its Integer parsing fails and the whole entity remains escaped.
Such code would then trigger the alert in all modern browsers.
*Solution:*
The fix is to restrict hexadecimal characters to hexadecimal entities and decimal characters to decimal entities.

was:
*Description:*
A security vulnerability in the NumericEntityUnescaper can be exploited through decimal character entities.
At [line 117|https://github.com/opendigitaleducation/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117], a run of hexadecimal characters is searched for, whether or not the entity is a hexadecimal one.
Therefore, if the "semiColonOptional" option is enabled and a decimal entity without a semicolon is immediately followed by one or more letters from A to E, these letters are captured as part of the entity. Parsing the Integer with radix 10 then fails and the whole entity is ignored.
*Example:*
If one uses the following string:
{code:java}
{code}
The sequence identifying the entity will wrongly be "&#106a" instead of "&#106".
As "&#106a" is not a valid decimal entity, its Integer parsing fails and the whole entity remains escaped.
Such code would then trigger the alert in all modern browsers.
*Solution:*
The fix is to restrict hexadecimal characters to hexadecimal entities and decimal characters to decimal entities.
[jira] [Updated] (TEXT-215) NumericEntityUnescaper may miss decimal entity
[ https://issues.apache.org/jira/browse/TEXT-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Bunel updated TEXT-215:
---
External issue URL: https://github.com/apache/commons-text/pull/310
[jira] [Created] (TEXT-215) NumericEntityUnescaper may miss decimal entity
Richard Bunel created TEXT-215:
--
Summary: NumericEntityUnescaper may miss decimal entity
Key: TEXT-215
URL: https://issues.apache.org/jira/browse/TEXT-215
Project: Commons Text
Issue Type: Bug
Affects Versions: 1.0
Reporter: Richard Bunel

*Description:*
A security vulnerability in the NumericEntityUnescaper can be exploited through decimal character entities.
At [line 117|https://github.com/opendigitaleducation/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117], a run of hexadecimal characters is searched for, whether or not the entity is a hexadecimal one.
Therefore, if the "semiColonOptional" option is enabled and a decimal entity without a semicolon is immediately followed by one or more letters from A to E, these letters are captured as part of the entity. Parsing the Integer with radix 10 then fails and the whole entity is ignored.
*Example:*
If one uses the following string:
{code:java}
{code}
The sequence identifying the entity will wrongly be "&#106a" instead of "&#106".
As "&#106a" is not a valid decimal entity, its Integer parsing fails and the whole entity remains escaped.
Such code would then trigger the alert in all modern browsers.
*Solution:*
The fix is to restrict hexadecimal characters to hexadecimal entities and decimal characters to decimal entities.