[jira] [Commented] (TEXT-216) HTML 5.0 Entities are not supported

2022-04-27 Thread Richard Bunel (Jira)


[ 
https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528641#comment-17528641
 ] 

Richard Bunel commented on TEXT-216:


The input is provided by users.

I need to check that this input is safe before it is rendered in a web 
browser.

My website gets millions of views every day, so I need to be sure that the 
content users share with one another doesn't contain XSS.

> HTML 5.0 Entities are not supported
> ---
>
> Key: TEXT-216
> URL: https://issues.apache.org/jira/browse/TEXT-216
> Project: Commons Text
> Issue Type: Improvement
> Affects Versions: 1.0
> Reporter: Richard Bunel
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> As noted in 
> [TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and 
> probably other tickets, HTML 5.0 entities are not supported.
> A nice evolution would be to include them all.
> Tentative PR: https://github.com/apache/commons-text/pull/312



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (TEXT-215) NumericEntityUnescaper may miss decimal entity

2022-03-31 Thread Richard Bunel (Jira)


 [ 
https://issues.apache.org/jira/browse/TEXT-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Bunel updated TEXT-215:
---
Description: 
*Description:*

A security vulnerability in the NumericEntityUnescaper can be exploited through 
decimal character entities.

At 
[line 117|https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117],
the code scans for a run of hexadecimal characters, whether or not the entity 
is a hexadecimal one.

Therefore, if the "semiColonOptional" option is enabled and a decimal entity 
without a semicolon is immediately followed by one or more letters from A to 
F, these letters are captured as part of the entity. Parsing the captured 
text as an integer with radix 10 then fails, and the whole entity is ignored.
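
The behaviour described above can be reproduced with a small standalone sketch. This is a simplified model written for illustration, not the library's source: the class `NaiveNumericUnescaper` and its method names are hypothetical, and the semicolon is always treated as optional.

```java
/** Simplified, hypothetical model of the scan described above; not the library's source. */
final class NaiveNumericUnescaper {

    /** True for 0-9, a-f and A-F. */
    static boolean isHexChar(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** Unescapes numeric entities; the trailing semicolon is always optional here. */
    static String unescape(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int consumed = translateAt(input, i, out);
            if (consumed == 0) {
                // No entity translated at this position: copy one character as-is.
                out.append(input.charAt(i));
                consumed = 1;
            }
            i += consumed;
        }
        return out.toString();
    }

    /** Returns the number of characters consumed, or 0 if no entity was translated. */
    static int translateAt(String input, int index, StringBuilder out) {
        if (!input.startsWith("&#", index)) {
            return 0;
        }
        int start = index + 2;
        boolean isHex = start < input.length()
                && (input.charAt(start) == 'x' || input.charAt(start) == 'X');
        if (isHex) {
            start++;
        }
        // The flaw: hexadecimal letters are collected even when isHex is false.
        int end = start;
        while (end < input.length() && isHexChar(input.charAt(end))) {
            end++;
        }
        if (end == start) {
            return 0;
        }
        int codePoint;
        try {
            codePoint = Integer.parseInt(input.substring(start, end), isHex ? 16 : 10);
        } catch (NumberFormatException e) {
            return 0; // e.g. "60a" parsed with radix 10: the entity is skipped
        }
        if (!Character.isValidCodePoint(codePoint)) {
            return 0;
        }
        out.appendCodePoint(codePoint);
        boolean hasSemicolon = end < input.length() && input.charAt(end) == ';';
        return end - index + (hasSemicolon ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(unescape("&#60;"));
        System.out.println(unescape("&#60script"));
        System.out.println(unescape("&#60a href"));
    }
}
```

In this model, `&#60script` becomes `<script` because `s` is not a hexadecimal letter, while `&#60a href` is returned unchanged because `60a` is not a valid decimal number.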

*Example:*

If one uses the following string: 
{code:java}
{code}
The sequence identifying the entity will wrongly be 

[jira] [Commented] (TEXT-216) HTML 5.0 Entities are not supported

2022-03-29 Thread Richard Bunel (Jira)


[ 
https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513868#comment-17513868
 ] 

Richard Bunel commented on TEXT-216:


Well, my target usage (in my web application) is to use the "unescapeHtml5" 
method to parse HTML content (to detect potential XSS attacks) before it is 
sent to and rendered in the browser. Leaving character entities escaped 
creates vulnerabilities.

For example, if I try to guard against JavaScript injection in images, a 
simple string like this will bypass the filter as the  entity remains 
escaped.
{code:java}
 {code}
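
As a hedged illustration of this kind of bypass, consider a filter that only knows a few HTML 4 entities and therefore never sees the `javascript:` scheme. The payload, the choice of the `&colon;` entity, and all class and method names below are assumptions made for the sketch; they are not taken from this ticket or from Commons Text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical filter showing how an unrecognised entity can hide a payload. */
final class EntityFilterSketch {

    /** A few HTML 4 entities; stands in for an unescaper without HTML5 support. */
    static Map<String, String> html4Subset() {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("&lt;", "<");
        table.put("&gt;", ">");
        table.put("&quot;", "\"");
        table.put("&amp;", "&"); // last, so "&amp;lt;" is not unescaped twice
        return table;
    }

    /** The same table extended with one HTML5 named reference. */
    static Map<String, String> withHtml5Colon() {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("&colon;", ":"); // HTML5 named reference for U+003A
        table.putAll(html4Subset());
        return table;
    }

    /** Replaces every entity listed in the table; unknown entities are left as-is. */
    static String unescape(String input, Map<String, String> table) {
        String result = input;
        for (Map.Entry<String, String> entry : table.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    /** Naive check: flags input whose unescaped form contains a javascript: URL. */
    static boolean looksDangerous(String input, Map<String, String> table) {
        return unescape(input, table).toLowerCase(Locale.ROOT).contains("javascript:");
    }

    public static void main(String[] args) {
        String payload = "javascript&colon;alert(1)";
        System.out.println(looksDangerous(payload, html4Subset()));
        System.out.println(looksDangerous(payload, withHtml5Colon()));
    }
}
```

With the HTML 4 table the check does not flag the payload, because `&colon;` stays escaped; once the entity is known, the same check flags it.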
 

The use case for "escapeHTML5" is admittedly less obvious, but the same is 
true of the "escapeHtml4" and "escapeHtml3" methods, and they are still part 
of the library.




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (TEXT-216) HTML 5.0 Entities are not supported

2022-03-28 Thread Richard Bunel (Jira)


[ 
https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513665#comment-17513665
 ] 

Richard Bunel commented on TEXT-216:


That would depend on how you use the escaper, wouldn't it?

I could think of many cases where you would need to escape/unescape HTML 5.0 
text.






[jira] [Updated] (TEXT-216) HTML 5.0 Entities are not supported

2022-03-28 Thread Richard Bunel (Jira)


 [ 
https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Bunel updated TEXT-216:
---
Description: 
As noted in 
[TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and 
probably other tickets, HTML 5.0 entities are not supported.

A nice evolution would be to include them all.

Tentative PR: https://github.com/apache/commons-text/pull/312

  was:
As noted in 
[TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and 
probably other tickets, HTML 5.0 entities are not supported.

A nice evolution would be to include them all.







[jira] [Updated] (TEXT-216) HTML 5.0 Entities are not supported

2022-03-28 Thread Richard Bunel (Jira)


 [ 
https://issues.apache.org/jira/browse/TEXT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Bunel updated TEXT-216:
---
External issue URL: https://github.com/apache/commons-text/pull/312

> HTML 5.0 Entities are not supported
> ---
>
> Key: TEXT-216
> URL: https://issues.apache.org/jira/browse/TEXT-216
> Project: Commons Text
>  Issue Type: Improvement
>Affects Versions: 1.0
>Reporter: Richard Bunel
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As noted in 
> [TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and 
> probably other tickets, HTML 5.0 entities are not supported.
> A nice evolution would be to include them all.





[jira] [Created] (TEXT-216) HTML 5.0 Entities are not supported

2022-03-28 Thread Richard Bunel (Jira)
Richard Bunel created TEXT-216:
--

 Summary: HTML 5.0 Entities are not supported
 Key: TEXT-216
 URL: https://issues.apache.org/jira/browse/TEXT-216
 Project: Commons Text
  Issue Type: Improvement
Affects Versions: 1.0
Reporter: Richard Bunel


As noted in 
[TEXT-193|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-193] and 
probably other tickets, HTML 5.0 entities are not supported.

A nice evolution would be to include them all.





[jira] [Updated] (TEXT-215) NumericEntityUnescaper may miss decimal entity

2022-03-25 Thread Richard Bunel (Jira)


 [ 
https://issues.apache.org/jira/browse/TEXT-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Bunel updated TEXT-215:
---
Description: 
*Description:*

A security vulnerability in the NumericEntityUnescaper can be exploited through 
decimal character entities.

At 
[line 117|https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117],
the code scans for a run of hexadecimal characters, whether or not the entity 
is a hexadecimal one.

Therefore, if the "semiColonOptional" option is enabled and a decimal entity 
without a semicolon is immediately followed by one or more letters from A to 
E, these letters are captured as part of the entity. Parsing the captured 
text as an integer with radix 10 then fails, and the whole entity is ignored.

*Example:*

If one uses the following string: 
{code:java}
{code}
The sequence identifying the entity will wrongly be 

[jira] [Updated] (TEXT-215) NumericEntityUnescaper may miss decimal entity

2022-03-25 Thread Richard Bunel (Jira)


 [ 
https://issues.apache.org/jira/browse/TEXT-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Bunel updated TEXT-215:
---
External issue URL: https://github.com/apache/commons-text/pull/310

> NumericEntityUnescaper may miss decimal entity
> --
>
> Key: TEXT-215
> URL: https://issues.apache.org/jira/browse/TEXT-215
> Project: Commons Text
>  Issue Type: Bug
>Affects Versions: 1.0
>Reporter: Richard Bunel
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description:*
> A security vulnerability in the NumericEntityUnescaper can be exploited 
> through decimal character entities.
> At [line 
> 117|https://github.com/opendigitaleducation/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117],
> the code scans for a run of hexadecimal characters, whether or not the 
> entity is a hexadecimal one.
> Therefore, if the "semiColonOptional" option is enabled and a decimal entity 
> without a semicolon is immediately followed by one or more letters from A 
> to E, these letters are captured as part of the entity. Parsing the captured 
> text as an integer with radix 10 then fails, and the whole entity is ignored.
> *Example:*
> If one uses the following string: 
> {code:java}
> {code}
> The sequence identifying the entity will wrongly be 

[jira] [Created] (TEXT-215) NumericEntityUnescaper may miss decimal entity

2022-03-25 Thread Richard Bunel (Jira)
Richard Bunel created TEXT-215:
--

 Summary: NumericEntityUnescaper may miss decimal entity
 Key: TEXT-215
 URL: https://issues.apache.org/jira/browse/TEXT-215
 Project: Commons Text
  Issue Type: Bug
Affects Versions: 1.0
Reporter: Richard Bunel


*Description:*

A security vulnerability in the NumericEntityUnescaper can be exploited through 
decimal character entities.

At [line 
117|https://github.com/opendigitaleducation/commons-text/blob/master/src/main/java/org/apache/commons/text/translate/NumericEntityUnescaper.java#L117],
the code scans for a run of hexadecimal characters, whether or not the entity 
is a hexadecimal one.

Therefore, if the "semiColonOptional" option is enabled and a decimal entity 
without a semicolon is immediately followed by one or more letters from A to 
E, these letters are captured as part of the entity. Parsing the captured 
text as an integer with radix 10 then fails, and the whole entity is ignored.
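
One possible way to avoid the problem, sketched here under assumptions and not necessarily how the library addresses it, is to accept hexadecimal letters only when the entity has been identified as hexadecimal. The class and method names are hypothetical.

```java
/** Hypothetical sketch of a radix-aware digit scan; names are illustrative only. */
final class RadixAwareScan {

    /** Returns the index just past the digits that start at {@code start}. */
    static int scanDigits(String input, int start, boolean isHex) {
        int end = start;
        while (end < input.length() && isDigitFor(input.charAt(end), isHex)) {
            end++;
        }
        return end;
    }

    /** Decimal digits are always accepted; a-f and A-F only for hexadecimal entities. */
    static boolean isDigitFor(char c, boolean isHex) {
        boolean decimal = c >= '0' && c <= '9';
        boolean hexLetter = (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
        return decimal || (isHex && hexLetter);
    }

    public static void main(String[] args) {
        String input = "&#60a";
        // Digits start at index 2, right after "&#"; the entity is decimal.
        int end = scanDigits(input, 2, false);
        int codePoint = Integer.parseInt(input.substring(2, end), 10);
        System.out.println((char) codePoint + input.substring(end));
    }
}
```

Here the scan of `&#60a` stops before the `a`, so `60` parses with radix 10 and the trailing letter is kept as ordinary text.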

*Example:*

If one uses the following string: 
{code:java}
{code}
The sequence identifying the entity will wrongly be