Christopher,

On 4/13/21 4:38 PM, Christopher Faulet wrote:
> In the end it remains your choice. The function is quite good. I just wonder if it could be valuable to also handle single dot-segments here in addition to double dot-segments. In that case, the normalizer should be renamed "dot-segments" or something similar.

I planned to add a separate normalizer for that. This keeps the functions simple and easy to check for correctness. It also allows the administrator to cherry-pick exactly the normalizers they desire and that do not break their backend. In the old discussion Willy said that not everything that might be possible to normalize can actually be normalized when combined with legacy software.

> Another point is about the dot encoding. It may be good to handle encoded dots (%2E), maybe via an option. And IMHO, the way empty segments are handled is a bit counterintuitive. Calling the "merge-slashes" normalizer first is a solution of course, but this means rewriting the URI twice. We must figure out what the main expectation for this normalizer is, especially because ignoring empty segments when dot-segments are merged is not exactly the same as merging all slashes.

The percent encoding of the dot will be handled by a 'percent-decode' normalizer that decodes percent-encoded unreserved characters (RFC 3986, section 2.3). The administrator would then use the percent-decode normalizer first, then the merge-slashes normalizer, then the dotdot normalizer.
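
As a rough illustration of what such a percent-decode step could look like, here is a Python sketch. It is not the HAProxy implementation, and the function name is made up for this example. It decodes %XX only when the result is an unreserved character, so an encoded slash (%2F) or percent sign (%25) is left alone:

```python
import re
import string

# Unreserved characters per RFC 3986, section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def percent_decode_unreserved(path: str) -> str:
    """Decode %XX sequences only when they encode an unreserved character.

    Hypothetical helper for illustration; everything else is kept verbatim.
    """
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0)

    return _PCT.sub(repl, path)
```

With this sketch, "/a/%2E%2E/b" becomes "/a/../b", which a later dotdot step can then act on, while "/a%2Fb" is returned unchanged.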

Yes, it means rewriting the URI several times. But it is nice, explicit and composes well.
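
To illustrate the composition, and why the order matters, here is a small Python sketch. It is not the HAProxy code: the function names are hypothetical, paths are assumed to start with '/', and treating an empty segment as a regular segment in strip_dotdot is an assumption of this sketch, not a statement about the patch.

```python
from functools import reduce


def merge_slashes(path: str) -> str:
    """Collapse runs of '/' into a single '/'."""
    out = []
    for ch in path:
        if ch == "/" and out and out[-1] == "/":
            continue
        out.append(ch)
    return "".join(out)


def strip_dotdot(path: str) -> str:
    """Remove each '..' segment together with the segment before it.

    Assumptions of this sketch: the path starts with '/', an empty
    segment (from '//') counts as a segment, and a leading '..' with
    nothing left to remove is preserved.
    """
    kept = []
    for seg in path.split("/"):
        # kept[0] is the empty string before the leading '/'; never pop it.
        if seg == ".." and len(kept) > 1 and kept[-1] != "..":
            kept.pop()
        else:
            kept.append(seg)
    return "/".join(kept)


def compose(*normalizers):
    """Return a function that applies the normalizers left to right."""
    def run(path: str) -> str:
        return reduce(lambda p, f: f(p), normalizers, path)
    return run


# Each step rewrites the path once; the order is explicit.
pipeline = compose(merge_slashes, strip_dotdot)
```

Under these assumptions, strip_dotdot("/a//../c") yields "/a/c" because the '..' consumes the empty segment, whereas pipeline("/a//../c") yields "/c" because the slashes are merged first. That difference is exactly why the order is left to the administrator.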

We can later figure out whether we want to provide "combined normalizers", such as a 'filesystem' normalizer that would combine the '.', '..' and '//' normalizers in an efficient way. Adding something like that later is easy. Changing the behavior of a normalizer later is hard.

That's why I'd like to keep them simple "Unix style" for now. Make them do one thing, make them do it well.

> Note that, before reading the 6th patch, I was at first surprised that leading dot-segments were preserved, because for me that is the most important part. But I'm fine with an option one way or another.


It's a result of how I approached the development. I wanted to not rebase my branch more than necessary. I will probably merge the two patches and change the default once the general approach is approved :-)

Best regards
Tim Düsterhus
