[ https://issues.apache.org/jira/browse/TEXT-215?focusedWorklogId=762819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762819 ]
ASF GitHub Bot logged work on TEXT-215:
---------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Apr/22 11:58
    Start Date: 27/Apr/22 11:58
    Worklog Time Spent: 10m
    Work Description: garydgregory commented on PR #310:
URL: https://github.com/apache/commons-text/pull/310#issuecomment-1110914006

   My personal opinion is that we should stick to a specific version of a specification, in this case W3C XML. If we also want to emulate what a browser does or what another language does, then that could be the job for a subclass. So maybe we should be refactoring the code? Needs other opinions...

Issue Time Tracking
-------------------

    Worklog Id:     (was: 762819)
    Time Spent: 1h 10m  (was: 1h)

> NumericEntityUnescaper may miss decimal entity
> ----------------------------------------------
>
>                 Key: TEXT-215
>                 URL: https://issues.apache.org/jira/browse/TEXT-215
>             Project: Commons Text
>          Issue Type: Bug
>    Affects Versions: 1.0
>            Reporter: Richard Bunel
>            Assignee: Bruno P. Kinoshita
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Description:*
> A security vulnerability can be exploited in NumericEntityUnescaper through decimal character entities.
> At [line 117|https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117], the code scans for a run of hexadecimal characters whether or not the entity is a hexadecimal one.
> Therefore, if the "semiColonOptional" option is enabled and a decimal entity without a semicolon is immediately followed by one or more letters from A to F, those letters are captured as part of the entity. Parsing the captured text as an Integer with radix 10 then fails, and the whole entity is ignored.
> *Example:*
> If one uses the following string:
> {code:java}
> <iframe src=\"&#106avascript:alert(1)\">
> {code}
> The sequence identifying the entity will wrongly be "&#106a" instead of "&#106".
> As "&#106a" is not a valid decimal entity, parsing it as an Integer fails and the whole entity is left escaped.
> Since browsers decode "&#106" as "j" even without the trailing semicolon, such code would then trigger the alert in all modern browsers.
> *Solution:*
> The fix is to match only hexadecimal digits (0-9, a-f, A-F) when parsing a hexadecimal entity, and only decimal digits (0-9) when parsing a decimal entity.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
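The proposed Solution can be illustrated with a minimal Java sketch. It is not the NumericEntityUnescaper code from Commons Text: the class `NumericEntitySketch` and its methods `decodeAt` and `unescape` are hypothetical names, and the sketch always treats the trailing semicolon as optional (mimicking the "semiColonOptional" behaviour described in the report). The point it shows is that the set of accepted digit characters depends on the entity's radix, so a decimal entity stops scanning at the first non-decimal character.

```java
// Hypothetical sketch, not the Commons Text implementation: unescapes decimal
// (&#106) and hexadecimal (&#x6A) numeric entities, always treating the
// trailing semicolon as optional.
final class NumericEntitySketch {

    private NumericEntitySketch() {
    }

    // The fix from the report: which characters count as digits depends on the
    // entity's radix, so 'a'-'f' are never consumed by a decimal entity.
    private static boolean isDigit(char c, boolean hex) {
        if (c >= '0' && c <= '9') {
            return true;
        }
        return hex && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'));
    }

    // Returns {codePoint, charsConsumed} for an entity starting at index,
    // or null if there is no valid numeric entity there.
    static int[] decodeAt(String input, int index) {
        int len = input.length();
        if (index + 2 >= len || input.charAt(index) != '&' || input.charAt(index + 1) != '#') {
            return null;
        }
        int start = index + 2;
        boolean hex = input.charAt(start) == 'x' || input.charAt(start) == 'X';
        if (hex) {
            start++;
        }
        int end = start;
        while (end < len && isDigit(input.charAt(end), hex)) {
            end++;
        }
        if (end == start) {
            return null; // "&#" or "&#x" followed by no digits
        }
        int codePoint;
        try {
            codePoint = Integer.parseInt(input.substring(start, end), hex ? 16 : 10);
        } catch (NumberFormatException e) {
            return null; // too many digits to fit in an int
        }
        if (!Character.isValidCodePoint(codePoint)) {
            return null;
        }
        boolean hasSemicolon = end < len && input.charAt(end) == ';';
        return new int[] {codePoint, end - index + (hasSemicolon ? 1 : 0)};
    }

    // Copies input, replacing every valid numeric entity with its character.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int[] entity = decodeAt(input, i);
            if (entity != null) {
                out.appendCodePoint(entity[0]);
                i += entity[1];
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

With the radix-aware scan, the report's example `<iframe src=\"&#106avascript:alert(1)\">` unescapes to `<iframe src=\"javascript:alert(1)\">`, matching what a browser would see, whereas a scan that accepts a-f for decimal entities captures "106a" and leaves the entity escaped.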