[ 
https://issues.apache.org/jira/browse/VALIDATOR-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152362#comment-17152362
 ] 

Ivan Larionov commented on VALIDATOR-429:
-----------------------------------------

Should be fixed in VALIDATOR-467.

> UrlValidator - path is invalid due to using java.net.URI for validation 
> (regression)
> ------------------------------------------------------------------------------------
>
>                 Key: VALIDATOR-429
>                 URL: https://issues.apache.org/jira/browse/VALIDATOR-429
>             Project: Commons Validator
>          Issue Type: Bug
>          Components: Routines
>    Affects Versions: 1.6
>            Reporter: limpygnome
>            Priority: Major
>              Labels: easyfix
>
> h1. Summary
> We were hit by a bug in a real-world application after upgrading from 1.4.1 
> to 1.6: previously valid URLs are now rejected, apparently because 
> java.net.URI is used to validate the path of a URL.
> h1. Steps to Reproduce
> Our application went to validate URLs similar to the following:
> * http://example.com//_test
> This is no longer valid in 1.6.1, but the following cases are:
> * http://example.com//test
> * http://example.com/_test
> h1. Impact
> It seems paths in UrlValidator are being parsed/validated as host-names, per 
> java.net.URI's validation.
> h1. Technical
> It looks like this may have been introduced by the following change:
> https://github.com/apache/commons-validator/commit/03bf0d33143ebd13e4f389cd4ecac8aec17c2057
> Specifically due to now using java.net.URI to validate a path. The usage is 
> as follows in org.apache.commons.validator.routines.UrlValidator:
> {code}
> URI uri = new URI(null,null,path,null);
> {code}
> It looks like URI is trying to parse the path as a hostname when the scheme 
> and hostname are not specified.
> Example to reproduce:
> {code}
> new URI(null, null, "//_test", null);   // throws URISyntaxException
> {code}
> Same example with other parts, no longer throwing exception:
> {code}
> new URI(null, "test", "//_test", null);
> {code}
> Even though java.net.URI states string components can be null, it seems the 
> URL built internally, which is validated, is slightly different. So when 
> specifying a hostname with URI, internally it constructs:
> * //test//_test
> Using no hostname, in the same way as UrlValidator, the following is 
> constructed and validated internally:
> * //_test
> Therefore it looks like there's either a bug in java.net.URI, or its usage is 
> not correctly documented.
> h1. Fix
> A potential fix is to change 
> org.apache.commons.validator.routines.UrlValidator to pass an empty string in 
> the hostname. Internally, in java.net.URI, this produces:
> * ////_test
> Thus the hostname is empty, which is accepted, and the correct path 
> is validated.
> Would this fix be appropriate, or considered too fragile?
> Alternatively the fix could be to extract similar logic to java.net.URI, to 
> validate the path, which appears to be just checking the characters are valid 
> and between a certain range. This logic can be seen in 
> java.net.URI.parseHierarchical, which calls upon checkChars.
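
For anyone who wants to check the behaviour described in the quoted report locally, here is a minimal sketch. Only the four-argument java.net.URI constructor call comes from the report. The class and method names (UriPathCheckDemo, parsesWithNullHost, pathWithEmptyHost) are made up for illustration and are not part of Commons Validator. The second helper tries the empty-host workaround proposed under "Fix".

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of Commons Validator.
class UriPathCheckDemo {

    // Mirrors the call quoted from UrlValidator: scheme, host and fragment are null.
    static boolean parsesWithNullHost(String path) {
        try {
            new URI(null, null, path, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Workaround proposed in the report: pass an empty host so that a leading
    // "//" in the path is not taken as the start of an authority.
    // Returns the parsed path, or null if URI rejects the input.
    static String pathWithEmptyHost(String path) {
        try {
            return new URI(null, "", path, null).getPath();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The three paths from "Steps to Reproduce".
        for (String path : new String[] {"//_test", "//test", "/_test"}) {
            System.out.println(path
                    + " | null host parses: " + parsesWithNullHost(path)
                    + " | path with empty host: " + pathWithEmptyHost(path));
        }
    }
}
```

Note that the empty-host variant still depends on how java.net.URI assembles and re-parses its internal string, which is the fragility the report asks about.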

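The alternative mentioned at the end of the quoted report, checking the path characters directly instead of going through java.net.URI, could look roughly like the sketch below. This is an assumption-laden approximation: the class PathCharCheck and its method are hypothetical, and the accepted character set is a simplified reading of the RFC 2396 path grammar (ASCII alphanumerics, the unreserved marks, a few path punctuation characters, and %HH escapes). It is not a copy of the logic in java.net.URI, which also accepts some non-ASCII characters.

```java
// Hypothetical helper; a simplified stand-in, not the logic used by java.net.URI.
class PathCharCheck {

    // Unreserved marks plus the extra punctuation allowed in a path (assumed set).
    private static final String ALLOWED_PUNCTUATION = "-_.!~*'():@&=+$,;/";

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Returns true if every character is an allowed path character
    // or part of a well-formed %HH escape.
    static boolean hasOnlyPathChars(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '%') {
                // A percent sign must be followed by exactly two hex digits.
                if (i + 2 >= path.length()
                        || !isHexDigit(path.charAt(i + 1))
                        || !isHexDigit(path.charAt(i + 2))) {
                    return false;
                }
                i += 2; // skip the two hex digits just checked
            } else if (!isAsciiAlphanumeric(c) && ALLOWED_PUNCTUATION.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String path : new String[] {"//_test", "/a b", "/%41", "/%zz"}) {
            System.out.println(path + " -> " + hasOnlyPathChars(path));
        }
    }
}
```

A check like this treats "//_test" purely as a path, so the hostname rules never come into play. It does not handle the parent-directory ("..") normalisation that a URI-based check can provide.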


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
