On Fri, 27 Aug 2021 08:23:17 GMT, Masanori Yano <my...@openjdk.org> wrote:

>> I would normally opt for a generic regexp-based solution such as proposed by 
>> @dfuch, but there is a security aspect to this as well (e.g. script 
>> invocation), so I'd go with the more conservative approach here to just add 
>> `ftp:` protocol to the list.
>
> I based the regex `^[^:/?#]+:.+$` on the description in RFC 2396.
> 
> B. Parsing a URI Reference with a Regular Expression
> 
>    The following line is the regular expression for breaking-down a URI
>    reference into its components.
> 
>       ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
>        12            3  4          5       6  7        8 9
>   ...
>    Therefore, we can determine the value of the four components and fragment as
>   ...
>       scheme    = $2
> 
> I agree that adding `ftp:` is better from a security viewpoint. However, 
> schemes other than ftp, such as javascript and git, may also be specified, so 
> it's difficult to cover all commonly used schemes.

That regexp will correctly break the URI into its different components, but it 
doesn't guarantee that each component is syntactically correct, as further 
syntax restrictions may apply to each of the components.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5198
