Hi Jeff,

Thank you for your reply.

The colon has nothing to do with the issue. If we remove the colon, the
issue still persists:

*curl* "a;bc@xyz"
curl: (6) Could not resolve host: xyz

*wget* "a;bc@xyz"
wget: unable to resolve host address ‘a;bc@xyz’

*wget* "abc@xyz"
wget: unable to resolve host address ‘xyz’

So, when *userinfo* contains a semicolon, wget treats the whole string,
*userinfo* included, as the hostname. To reproduce this, disconnect from
your network first so that the lookup fails and the error message shows
the host wget tried to resolve.
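
For reference, here is a minimal Python sketch of the split that the RFC 3986
grammar implies for these inputs. The helper names split_authority and
is_valid_userinfo are hypothetical, and this is only an illustration of the
grammar, not wget's or curl's actual parsing code:

```python
import re

# RFC 3986, Section 3.2.1:
#   userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="   # note that ';' is a sub-delim
USERINFO_RE = re.compile(
    rf"(?:[{UNRESERVED}{SUB_DELIMS}:]|%[0-9A-Fa-f]{{2}})*"
)


def split_authority(authority):
    """Split 'userinfo@host' at the '@' delimiter (hypothetical helper).

    Port handling is omitted. A valid userinfo cannot contain a raw '@',
    so splitting at the last '@' is enough for this sketch.
    """
    userinfo, sep, host = authority.rpartition("@")
    if not sep:
        # No '@' present: the whole string is the host.
        return None, authority
    return userinfo, host


def is_valid_userinfo(userinfo):
    """Return True if the string matches the RFC 3986 userinfo rule."""
    return USERINFO_RE.fullmatch(userinfo) is not None


if __name__ == "__main__":
    for url in ("a;bc@xyz", "abc@xyz"):
        userinfo, host = split_authority(url)
        print(url, "->", userinfo, host, is_valid_userinfo(userinfo))
```

Under this reading, both "a;bc@xyz" and "abc@xyz" have the host "xyz", which
matches what curl reports above.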

Thank you,
Bachir

On Mon, Feb 5, 2024 at 10:08 PM Jeffrey Walton <[email protected]> wrote:

> On Mon, Feb 5, 2024 at 4:57 PM Bachir Bendrissou <[email protected]>
> wrote:
> >
> > The attached example url contains a semicolon in the userinfo segment.
> >
> > Wget rejects this url with the following error message:
> >
> > *Bad port number.*
> >
> > It seems that Wget sees "c" as a port number. When "c" is replaced by a
> > digit, Wget accepts the url and attempts to resolve "xyz".
> >
> > It's worth noting that both curl and aria2 accept the example url.
> >
> > Why is the semicolon not allowed in userinfo, despite the fact that other
> > special characters are allowed?
>
> A colon in the userinfo is deprecated but not forbidden. However, an
> application can choose to reject it. From RFC 3986, Uniform Resource
> Identifier (URI): Generic Syntax, Section 3.2,
> <https://datatracker.ietf.org/doc/html/rfc3986#section-3.2>.
>
>    The userinfo subcomponent may consist of a user name and, optionally,
>    scheme-specific information about how to gain authorization to access
>    the resource.  The user information, if present, is followed by a
>    commercial at-sign ("@") that delimits it from the host.
>
>       userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
>
>    Use of the format "user:password" in the userinfo field is
>    deprecated.  Applications should not render as clear text any data
>    after the first colon (":") character found within a userinfo
>    subcomponent unless the data after the colon is the empty string
>    (indicating no password).  Applications may choose to ignore or
>    reject such data when it is received as part of a reference and
>    should reject the storage of such data in unencrypted form.
>
> According to the BNF in Appendix A, the semicolon ';' is allowed as a
> <sub-delims> token. It does not need to be percent encoded.
>
> Jeff
>
