On 15/07/2024 11:20, Máté Kocsis wrote:
Hey Ignace, Nicolas,

Based on your request to add support for RFC 3986 compatible parsing, I evaluated another library (https://github.com/uriparser/uriparser/) over the past few days. As far as I can tell, the results were very promising, so I'm ok to include this into my proposal (I haven't pushed my changes yet and haven't updated the RFC yet).

Regarding the reference resolution (https://uriparser.github.io/doc/api/latest/#resolution) feature which has also already been asked for, I'm genuinely wondering what the use-case is? In any case, I'm fine with incorporating this into the RFC as well, since apparently both Lexbor and uriparser support it (naturally).

What I became puzzled about is the correct object structure and naming. Now that uriparser which can deal with URIs came into the picture, while Lexbor can parse URLs, I don't know if it's a good idea to have a dedicated URI and a URL class extending the former one... If it is, then in my opinion the logical behavior would be that Lexbor always instantiates URL classes, while uriparser would have to decide whether the passed-in URI is actually a URL and choose the instantiated class accordingly... But in this case the differences between the RFC 3986 and WHATWG specifications couldn't be spelled out, since URL objects could hold URLs parsed according to either spec (and therefore a unified interface would be required).

Or should we rather have a separate URI and a WhatwgUrl class, so that the former would always be created by uriparser and the latter by Lexbor? This way we could have a dedicated object interface for each standard (e.g. the RFC 3986 related one could have a getUserInfo() method, while the WHATWG related one could have both getUser() and getPassword() methods). But then the question is how interchangeable these classes should be, i.e. should we be able to convert them back and forth, or should there be an interface that is implemented by the two classes?
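
To make the second option concrete, here is a minimal sketch, in Python for brevity rather than PHP. Every name in it (`UriLike`, `Uri`, `WhatwgUrl`, `origin_label`) is hypothetical and is not part of the RFC draft, uriparser, or Lexbor. It only illustrates a shared interface limited to what both standards agree on, with the user-info accessors kept specific to each class.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class UriLike(Protocol):
    """Hypothetical shared surface: only what both standards agree on."""

    def get_scheme(self) -> Optional[str]: ...

    def get_host(self) -> Optional[str]: ...


@dataclass(frozen=True)
class Uri:
    """RFC 3986 flavour: userinfo is exposed as one opaque component."""

    scheme: Optional[str]
    user_info: Optional[str]
    host: Optional[str]

    def get_scheme(self) -> Optional[str]:
        return self.scheme

    def get_host(self) -> Optional[str]:
        return self.host

    def get_user_info(self) -> Optional[str]:
        return self.user_info


@dataclass(frozen=True)
class WhatwgUrl:
    """WHATWG flavour: username and password are separate components."""

    scheme: str
    user: str
    password: str
    host: Optional[str]

    def get_scheme(self) -> Optional[str]:
        return self.scheme

    def get_host(self) -> Optional[str]:
        return self.host

    def get_user(self) -> str:
        return self.user

    def get_password(self) -> str:
        return self.password


def origin_label(uri: UriLike) -> str:
    """Accepts either class because it only touches the shared surface."""
    return f"{uri.get_scheme()}://{uri.get_host()}"
```

Code that only needs the scheme and host can accept either class, while code that needs `get_user_info()` or `get_password()` has to commit to one standard.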

I'd appreciate any suggestions regarding these questions.

P.S. due to its poor reception, I got rid of the UrlParser class as well as the UrlComponent enum from my implementation in the meantime.

Regards,
Máté


Hi Máté,

> As far as I can tell, the results were very promising, so I'm ok to include this into my proposal (I haven't pushed my changes yet and haven't updated the RFC yet).


This is great news. If it is indeed possible to release support for both specifications at the same time, that would be really great.

> Regarding the reference resolution (https://uriparser.github.io/doc/api/latest/#resolution) feature which has also already been asked for, I'm genuinely wondering what the use-case is?

Resolution is common when using an HTTP client: you define a base URI, and then you construct subsequent URIs relative to that base URI using resolution.
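
As a rough illustration of that use case, here is a small sketch using Python's standard `urllib.parse.urljoin`, which resolves a reference against a base along the lines of RFC 3986. The base URI and the `resolve` helper are made up for the example. This is not the uriparser or Lexbor API.

```python
from urllib.parse import urljoin

# Hypothetical base URI, as an HTTP client might be configured with.
BASE = "https://api.example.com/v1/"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a (possibly relative) reference against the base URI."""
    return urljoin(base, reference)


print(resolve("users/42"))     # https://api.example.com/v1/users/42
print(resolve("../v2/items"))  # https://api.example.com/v2/items
print(resolve("/health"))      # https://api.example.com/health
```

An absolute reference such as `https://other.example.org/x` is returned unchanged, which is why a client can pass every request target through the same resolution step.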

>  What I became puzzled about is the correct object structure and naming. Now that uriparser which can deal with URIs came into the picture, while Lexbor can parse URLs, I don't know if it's a good idea to have a dedicated URI and a URL class extending the former one...

Both specifications parse URLs, and the result of either can be represented by a URL value object. The main differences between the two implementations are around normalization and encoding.

RFC3986 only allows non-destructive normalization, which is not true in the case of the WHATWG spec:

Here's a simple example to illustrate the differences:

`HttPs://0300.0250.0000.0001/path?query=foo%20bar`

- with RFC3986 you will end up with `https://0300.0250.0000.0001/path?query=foo%20bar`
- with WHATWG you will end up with `https://192.168.0.1/path?query=foo+bar`

In the case of WHATWG, the host is changed and the query string follows a distinct encoding spec.
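
The scheme and host part of this example can be reproduced with a small Python sketch. It is a simplification under stated assumptions, not either specification's full algorithm. `rfc3986_case_normalize` and `whatwg_style_ipv4` are hypothetical helpers, the input is assumed to carry no userinfo, and the IPv4 helper only handles four-part hosts. Query encoding is deliberately left out.

```python
from urllib.parse import urlsplit, urlunsplit


def rfc3986_case_normalize(uri: str) -> str:
    """Lowercase scheme and host only; every other component is untouched.

    Assumes the authority has no userinfo, which must not be lowercased.
    """
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(),
         parts.path, parts.query, parts.fragment)
    )


def whatwg_style_ipv4(host: str) -> str:
    """Simplified WHATWG-style IPv4 host handling (four parts only).

    A "0x" prefix means hexadecimal, a leading "0" means octal,
    anything else is decimal.
    """
    numbers = []
    for part in host.split("."):
        if part.lower().startswith("0x"):
            numbers.append(int(part[2:] or "0", 16))
        elif len(part) > 1 and part.startswith("0"):
            numbers.append(int(part, 8))
        else:
            numbers.append(int(part, 10))
    if len(numbers) != 4 or any(n > 255 for n in numbers):
        raise ValueError(f"unsupported IPv4 host in this sketch: {host!r}")
    return ".".join(str(n) for n in numbers)


source = "HttPs://0300.0250.0000.0001/path?query=foo%20bar"
print(rfc3986_case_normalize(source))
# https://0300.0250.0000.0001/path?query=foo%20bar
print(whatwg_style_ipv4(urlsplit(source).hostname))
# 192.168.0.1
```

The first function never changes which characters the host contains, only their case. The second rewrites the host into a different string, which is the destructive step.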

From my POV you have two choices: either you use one URL object for both specifications, with distinct named constructors fromRFC3986 and fromWhatwg, or you have one interface and two distinct implementations. I do not think that one can be extended to create the other; at least that's my POV.
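
The first choice could look roughly like the following Python sketch. The `Url` class, its `spec` tag, and the method names are hypothetical, and the WHATWG branch is a stand-in that only handles the dotted octal/decimal IPv4 case from the example above. A real implementation would delegate to uriparser and Lexbor.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Url:
    """One value object; the named constructor decides which rules apply."""

    spec: str    # "rfc3986" or "whatwg": records how the value was parsed
    scheme: str
    host: str

    @classmethod
    def from_rfc3986(cls, text: str) -> "Url":
        parts = urlsplit(text)
        # Non-destructive: the host is only lowercased, never rewritten.
        return cls("rfc3986", parts.scheme, parts.hostname or "")

    @classmethod
    def from_whatwg(cls, text: str) -> "Url":
        parts = urlsplit(text)
        host = parts.hostname or ""
        labels = host.split(".")
        # Stand-in for WHATWG host parsing: four numeric labels, where a
        # leading zero means octal. int(..., 8) raises ValueError on 8 or 9.
        if len(labels) == 4 and all(label.isdigit() for label in labels):
            host = ".".join(
                str(int(label, 8) if len(label) > 1 and label[0] == "0"
                    else int(label))
                for label in labels
            )
        return cls("whatwg", parts.scheme, host)
```

With this shape, `Url.from_rfc3986(...)` and `Url.from_whatwg(...)` return the same type, so callers need the `spec` tag (or something like it) to know which normalization a given value has been through.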

Hope this helps you in your implementation.

Best regards,
Ignace
