On 13/04/2021 at 18:03, Tim Düsterhus wrote:
Christopher,
On 4/13/21 4:38 PM, Christopher Faulet wrote:
In the end, it remains your choice. The function is quite good. I just
wonder if it could be valuable to also handle single dot-segments here in
addition to double dot-segments. In that case, the normalizer should be
renamed "dot-segments" or something similar.
I planned to add a separate normalizer for that. This keeps the
functions simple and easy to check for correctness. It also allows the
administrator to cherry-pick exactly the normalizers they desire and
that do not break their backend. In the old discussion Willy said that
not everything that might be possible to normalize can actually be
normalized when combined with legacy software.
OK, that makes sense.
Another point is about the dot encoding. It may be good to handle
encoded dots (%2E), maybe via an option. And IMHO, the way empty
segments are handled is a bit counterintuitive. Calling the "merge-slashes"
normalizer first is a solution of course, but this means rewriting the URI
twice. We must figure out what the main expectation for this
normalizer is, especially because ignoring empty segments when dot-segments
are merged is not exactly the same as merging all slashes.
The percent encoding of the dot will be handled by a 'percent-decode'
normalizer that decodes percent encoded unreserved characters (RFC 3986,
section 2.3). The administrator would then first use the percent-decode
normalizer, then the merge-slashes normalizer, then the dotdot normalizer.
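As an illustration of the percent-decode step described above, here is a
minimal Python sketch that decodes only percent-encoded unreserved characters
(RFC 3986, section 2.3) and leaves every other escape untouched. The function
name `percent_decode_unreserved` is hypothetical, and this is not the HAProxy
implementation.

```python
import re
import string

# Unreserved characters per RFC 3986, section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# One percent-encoded octet: "%" followed by two hex digits.
_PCT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def percent_decode_unreserved(path: str) -> str:
    """Decode %XX only when it encodes an unreserved character.

    Hypothetical helper for illustration. Escapes such as %2F ("/") or
    %25 ("%") are kept as-is, and the input is scanned once, so a decoded
    character is never combined with following text into a new escape.
    """
    def decode(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _PCT_OCTET.sub(decode, path)
```

With this sketch, "/a/%2E%2E/b" becomes "/a/../b", while "%2F" stays encoded
because "/" is a reserved character.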
Well, it is a bit different here, because someone could choose not to decode
unreserved characters but still want the dotdot normalizer to handle encoded dots.
Yes, it means rewriting the URI several times. But it is nice, explicit
and composes well.
On this point, you're right. It is far cleaner this way.
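To make the composition argument concrete, here is a minimal Python sketch of
two single-purpose normalizers applied in an explicit order. All names
(`merge_slashes`, `strip_dotdot`, `normalize`) are hypothetical, and the
behavior is an assumption for illustration rather than the patch under review:
the path is absolute, an empty segment counts as an ordinary segment, and a
".." with nothing left to remove is preserved.

```python
import re


def merge_slashes(path: str) -> str:
    """Collapse every run of adjacent slashes into a single slash."""
    return re.sub(r"/{2,}", "/", path)


def strip_dotdot(path: str) -> str:
    """Remove each ".." together with the segment that precedes it.

    Assumptions of this sketch: the path is absolute, an empty segment
    (as in "//") is treated like any other segment, and a ".." that has
    no preceding segment to remove is kept in place.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    kept = []
    for segment in path.split("/")[1:]:
        if segment == ".." and kept and kept[-1] != "..":
            kept.pop()  # drop the previous segment, even an empty one
        else:
            kept.append(segment)
    return "/" + "/".join(kept)


def normalize(path: str, normalizers) -> str:
    """Apply the chosen normalizers one after another, in the given order."""
    for step in normalizers:
        path = step(path)
    return path
```

Under these assumptions, "/a//../b" yields "/b" when `merge_slashes` runs
before `strip_dotdot`, but "/a/b" when `strip_dotdot` runs alone, which is the
difference in empty-segment handling raised earlier in the thread. Each step
rewrites the whole path, so the cost of composing is one pass per normalizer.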
We can later figure out whether we want to provide "combined
normalizers", such as a 'filesystem' normalizer that would combine the
'.', '..' and '//' normalizers in an efficient way. Adding something
like that later is easy. Changing the behavior of a normalizer later is
hard.
That's true. Depending on feedback, it will be possible to add more normalizers.
I'm fine with that.
That's why I'd like to keep them simple "Unix style" for now. Make them
do one thing, make them do it well.
Note that I was at first surprised that leading dot-segments were preserved,
before reading the 6th patch, because for me it is the most important
part. But I'm fine with an option one way or another.
It's a result of how I approached the development. I wanted to not
rebase my branch more than necessary. I will probably merge the two
patches and change the default once the general approach is approved :-)
Well, it is not a problem. You can keep it in two patches if it is easier for
you.
--
Christopher Faulet