On Tue, 24 Aug 2021 09:59:59 GMT, Hannes Wallnöfer <hann...@openjdk.org> wrote:
>> That said a stricter regexp (unless I'm mistaken) could be:
>> `^[a-zA-Z][a-zA-Z0-9+-.]*:.+$`
>> [ from RFC 2396: scheme = alpha *( alpha | digit | "+" | "-" | "." ) ]
>
> I would normally opt for a generic regexp-based solution such as proposed by
> @dfuch, but there is a security aspect to this as well (e.g. script
> invocation), so I'd go with the more conservative approach here to just add
> `ftp:` protocol to the list.

I derived the regex `^[^:/?#]+:.+$` from the description in Appendix B of RFC 2396:

    B. Parsing a URI Reference with a Regular Expression

    The following line is the regular expression for breaking-down a URI
    reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9
    ...
    Therefore, we can determine the value of the four components and
    fragment as
    ...
      scheme    = $2

I agree that adding `ftp:` is better from a security standpoint. However,
besides ftp, schemes such as javascript and git may also be specified, so
it's difficult to cover all commonly used schemes with a fixed list.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5198
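For illustration, here is a minimal Java sketch contrasting the two regex approaches with a fixed allow-list. The class name, sample inputs and the allow-list contents other than `ftp:` are hypothetical, not taken from the PR. One detail about the stricter pattern: inside a character class, `+-.` is read as a range from `+` to `.`, which also admits `,`. Placing `-` last, as in `[a-zA-Z0-9+.-]`, keeps it literal and matches the RFC 2396 `scheme` production exactly.

```java
import java.util.List;
import java.util.regex.Pattern;

public class SchemeCheck {
    // Pattern derived from RFC 2396 Appendix B: one or more characters other
    // than ':', '/', '?', '#', then ':', then at least one more character.
    static final Pattern GENERIC = Pattern.compile("^[^:/?#]+:.+$");

    // Stricter form following the RFC 2396 production
    // scheme = alpha *( alpha | digit | "+" | "-" | "." )
    // '-' is placed last in the class so it is a literal, not a range.
    static final Pattern STRICT = Pattern.compile("^[a-zA-Z][a-zA-Z0-9+.-]*:.+$");

    // Hypothetical allow-list illustrating the conservative approach:
    // only explicitly listed scheme prefixes are accepted.
    static final List<String> ALLOWED = List.of("http:", "https:", "file:", "ftp:");

    static boolean isAllowListed(String s) {
        return ALLOWED.stream().anyMatch(s::startsWith);
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://openjdk.java.net/",
            "ftp://ftp.example.org/pub/",
            "git://example.org/repo.git",
            "javascript:alert(1)",
            "docs/index.html",
            "1abc:x"
        };
        for (String s : samples) {
            // matches() requires the whole input to match the pattern
            System.out.printf("%-30s generic=%-5b strict=%-5b allowList=%b%n",
                    s,
                    GENERIC.matcher(s).matches(),
                    STRICT.matcher(s).matches(),
                    isAllowListed(s));
        }
    }
}
```

Both patterns accept `javascript:alert(1)`, which is the script-invocation concern raised in the review, while only the allow-list rejects it. Conversely, the allow-list also rejects `git://...` unless that scheme is added explicitly. That is the trade-off discussed in this thread.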