On Fri, 27 Aug 2021 08:23:17 GMT, Masanori Yano <my...@openjdk.org> wrote:

>> I would normally opt for a generic regexp-based solution such as proposed by
>> @dfuch, but there is a security aspect to this as well (e.g. script
>> invocation), so I'd go with the more conservative approach here to just add
>> `ftp:` protocol to the list.
>
> I decided the regex `^[^:/?#]+:.+$` from the description in RFC 2396.
>
> B. Parsing a URI Reference with a Regular Expression
>
> The following line is the regular expression for breaking-down a URI
> reference into its components.
>
> ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
>  12            3  4          5       6  7        8 9
> ...
> Therefore, we can determine the value of the four components and fragment
> as
> ...
> scheme = $2
>
> I agree that adding `ftp:` is better for the viewpoint of security. However,
> in addition to ftp, schemes such as javascript and git may be specified, so
> it's difficult to cover all commonly used schemes.

That regexp will correctly break the URI into its different components, but it doesn't guarantee that each component is syntactically correct, since further syntax restrictions may apply to each of the components.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5198
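
To illustrate the point about splitting versus validating, here is a minimal sketch. It is not code from the PR: the class name `SchemeCheck` and its helper methods are hypothetical and exist only for this example. It compares what the RFC 2396 Appendix B regexp captures as the scheme with whether `java.net.URI` accepts the same string.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK or of the PR.
public class SchemeCheck {
    // Regular expression from RFC 2396, Appendix B; group 2 is the scheme.
    static final Pattern RFC2396 = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns whatever the regexp captures as the scheme, or null if none.
    static String schemeByRegex(String s) {
        Matcher m = RFC2396.matcher(s);
        return m.matches() ? m.group(2) : null;
    }

    // Returns true only if java.net.URI accepts the whole string.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = { "ftp://example.com/file", "1http://example.com" };
        for (String s : samples) {
            System.out.println(s + " -> regex scheme=" + schemeByRegex(s)
                + ", URI accepts=" + parsesAsUri(s));
        }
    }
}
```

For the second sample the regexp should still capture `1http` as the scheme, while `java.net.URI` is expected to reject the string, because RFC 2396 requires a scheme to begin with a letter. The regexp only locates the component boundaries; it does not apply the per-component grammar.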