On 15/07/2024 11:20, Máté Kocsis wrote:
Hey Ignace, Nicolas,
Based on your request for adding support for RFC 3986 spec compatible parsing,
I evaluated another library (https://github.com/uriparser/uriparser/) in recent
days in order to add support for the requested functionality. As far as I can
tell, the results were very promising, so I'm ok to include this into my
proposal (I haven't pushed my changes yet and haven't updated the RFC yet).
Regarding the reference resolution
(https://uriparser.github.io/doc/api/latest/#resolution) feature which has
also already been asked for, I'm genuinely wondering what the use-case is?
But in any case, I'm fine with incorporating this as well into the RFC, since
apparently both Lexbor and uriparser support this (naturally).
What I became puzzled about is the correct object structure and naming. Now
that uriparser which can deal with URIs came into the picture, while Lexbor
can parse URLs, I don't know if it's a good idea to have a dedicated URI and
a URL class extending the former one...
If it is, then in my opinion, the logical behavior would be that Lexbor always
instantiates URL classes, while uriparser would have to decide if the passed-in
URI is actually a URL, and choose the instantiated class based on this
factor... But in this case the differences between the RFC 3986 and WHATWG
specifications couldn't be spelled out, since URL objects could hold URLs
parsed based on both specs (and therefore having a unified interface is
required).
Or rather we should have a separate URI and a WhatwgUrl class so that the
former one would always be created by uriparser, while the latter one by
Lexbor? This way we could have a dedicated object interface for both standards
(e.g. the RFC 3986 related one could have a getUserInfo() method, while the
WHATWG related one could have both getUser() and getPassword() methods). But
then the question is how interchangeable these classes should be? I.e. should
we be able to convert them back and forth, or should there be an interface
that is implemented by the two classes?
I'd appreciate any suggestions regarding these questions.
P.S. Due to its bad reception, I got rid of the UrlParser class as well as the
UrlComponent enum from my implementation in the meantime.
Regards,
Máté
Hi Máté,
> As far as I can tell, the results were very promising, so I'm ok to
> include this into my proposal (I haven't pushed my changes yet and
> haven't updated the RFC yet).
This is great news. If it is indeed possible to release both
specifications at the same time, that would be really great.
> Regarding the reference resolution
> (https://uriparser.github.io/doc/api/latest/#resolution)
> feature which has also already been asked for, I'm genuinely wondering
> what the use-case is?
Resolution is common when using an HTTP client: you define a base URI
once and then construct each subsequent URI by resolving a relative
reference against that base URI.
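To make the use case concrete, here is a minimal sketch in Python (not the proposed PHP API) using the standard library's `urljoin`. The base URI and the `resolve` helper are made-up names for illustration only.

```python
from urllib.parse import urljoin

# Hypothetical base URI, configured once for an HTTP client.
BASE_URI = "https://api.example.com/v1/"


def resolve(reference: str, base: str = BASE_URI) -> str:
    """Resolve a (possibly relative) reference against the base URI."""
    return urljoin(base, reference)


# A relative path is appended below the base path.
print(resolve("users?page=2"))   # https://api.example.com/v1/users?page=2
# Dot segments climb out of the base path.
print(resolve("../v2/items"))    # https://api.example.com/v2/items
# An absolute path replaces the base path entirely.
print(resolve("/health"))        # https://api.example.com/health
```

The client only ever stores the base; every request URI is derived from it, which is why resolution tends to be asked for alongside parsing.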
> What I became puzzled about is the correct object structure and
> naming. Now that uriparser which can deal with URIs came into the
> picture, while Lexbor can parse URLs, I don't know if it's a good idea
> to have a dedicated URI and a URL class extending the former one...
Both specifications parse a string into components that can be
represented by a URL value object. The main differences between the two
implementations are around normalization and encoding.
RFC 3986 only allows non-destructive normalization, which is not true in
the case of the WHATWG spec.
Here's a simple example to illustrate the differences:
`HttPs://0300.0250.0000.0001/path?query=foo%20bar`
- with RFC3986 you will end up with
`https://0300.0250.0000.0001/path?query=foo%20bar`
- with WHATWG you will end up with `https://192.168.0.1/path?query=foo+bar`
In the case of WHATWG the host is rewritten and the query string follows
a distinct encoding spec.
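The scheme and host part of that example can be sketched in Python as follows. This is a rough illustration under stated assumptions, not either library's behavior: the function names are invented, `urlsplit` stands in for a real parser, the authority is assumed to carry no userinfo or port, and query encoding is deliberately left untouched.

```python
import string
from urllib.parse import urlsplit, urlunsplit

EXAMPLE = "HttPs://0300.0250.0000.0001/path?query=foo%20bar"


def parse_ipv4_number(part: str) -> int:
    """Parse one dotted part: a 0x prefix is hex, a leading 0 is octal, else decimal."""
    if part[:2].lower() == "0x":
        digits, base, allowed = part[2:] or "0", 16, string.hexdigits
    elif len(part) > 1 and part.startswith("0"):
        digits, base, allowed = part, 8, string.octdigits
    else:
        digits, base, allowed = part, 10, string.digits
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError(f"not an IPv4 number: {part!r}")
    return int(digits, base)


def rewrite_ipv4_host(host: str) -> str:
    """Rewrite a four-part numeric host to dotted decimal; leave anything else alone."""
    parts = host.split(".")
    if len(parts) != 4:
        return host
    try:
        numbers = [parse_ipv4_number(p) for p in parts]
    except ValueError:
        return host
    if any(n > 255 for n in numbers):
        return host
    return ".".join(str(n) for n in numbers)


def normalize_rfc3986_style(url: str) -> str:
    """Non-destructive: only lowercase the scheme and host."""
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path, p.query, p.fragment))


def normalize_whatwg_style(url: str) -> str:
    """Destructive: additionally rewrite the host, so the original spelling is lost."""
    p = urlsplit(url)
    host = rewrite_ipv4_host(p.netloc.lower())
    return urlunsplit((p.scheme.lower(), host, p.path, p.query, p.fragment))


print(normalize_rfc3986_style(EXAMPLE))  # https://0300.0250.0000.0001/path?query=foo%20bar
print(normalize_whatwg_style(EXAMPLE))   # https://192.168.0.1/path?query=foo%20bar
```

After the second call there is no way to recover the octal spelling of the host, which is what makes that normalization destructive.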
From my POV you have two choices: either you use one URL object for both
specifications with distinct named constructors, fromRFC3986 and
fromWhatwg, or you have one interface and two distinct implementations.
I do not think that one can be extended to create the other; at least
that's my POV.
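The second option could look roughly like this, sketched in Python rather than PHP. All class and method names here are hypothetical (loosely following the getUserInfo() / getUser() / getPassword() split you mentioned), and `urlsplit` is only a stand-in for uriparser and Lexbor.

```python
from abc import ABC, abstractmethod
from typing import Optional
from urllib.parse import urlsplit


class Uri(ABC):
    """Hypothetical shared interface: only what both specs agree on."""

    @abstractmethod
    def get_scheme(self) -> str: ...

    @abstractmethod
    def get_host(self) -> Optional[str]: ...


class _SplitBacked(Uri):
    """Shared plumbing for the sketch; urlsplit is a stand-in parser."""

    def __init__(self, raw: str) -> None:
        self._parts = urlsplit(raw)

    def get_scheme(self) -> str:
        return self._parts.scheme

    def get_host(self) -> Optional[str]:
        return self._parts.hostname


class Rfc3986Uri(_SplitBacked):
    """RFC 3986 flavour: userinfo is exposed as one opaque component."""

    def get_user_info(self) -> Optional[str]:
        userinfo, sep, _ = self._parts.netloc.rpartition("@")
        return userinfo if sep else None


class WhatwgUrl(_SplitBacked):
    """WHATWG flavour: username and password are separate components."""

    def get_user(self) -> Optional[str]:
        return self._parts.username

    def get_password(self) -> Optional[str]:
        return self._parts.password


rfc = Rfc3986Uri("https://user:secret@example.com/a")
whatwg = WhatwgUrl("https://user:secret@example.com/a")
print(rfc.get_user_info())                        # user:secret
print(whatwg.get_user(), whatwg.get_password())   # user secret
```

Neither concrete class extends the other; code that only needs the scheme or host can type against the shared interface, while spec-specific accessors stay on the class that owns them.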
Hope this helps you in your implementation.
Best regards,
Ignace