Fri, 26 Jun 2020 at 11:00, Willy Tarreau <w...@1wt.eu>:

> Hi Tim,
>
> On Thu, Jun 25, 2020 at 04:30:37PM +0200, Tim Düsterhus wrote:
> (...)
> > Willy: Please correct me if I misrepresented your arguments or left out
> > something important.
>
> I think it's well summarized. There are other more painful points not
> mentioned here:
>
Tim, can we schedule this for 2.3? It seems to be "too much" for 2.2.
As for normalization, I'd like to have nginx's normalization rules as a
point of comparison. (As I recall, only "merge_slashes off;" was, rarely,
an issue; the rest of the normalization rules seem to be just fine.)

>
> - RFC3986's path normalization algorithm is bogus when it sees
>   multiple slashes such as in "/static//images/12.jpg". This happens
>   very frequently in URLs built by concatenation. The problem is that
>   when it meets a "../" it suggests removing only one level of slash
>   and will end up in a different directory than the one reached by a
>   server that does simplistic normalization (or by a cache which would
>   first merge consecutive slashes to increase cache hit ratio). So:
>     /static//images/../../css/main.css would become:
>     /static/css/main.css according to RFC3986
>     /css/main.css according to most file systems or simplifications
>
> - some operating systems also support the backslash "\" as a directory
>   delimiter. So you try to normalize your path correctly and leave
>   "\..\admin/" and you're screwed again.
>
> - "+" and "%20" are equivalent in the query string, but given that in
>   many simple applications these ones will only appear in a single form,
>   such applications might not even check for the other one. So if you
>   replace "%20" with "+" you will break some of them and if you replace
>   "+" with "%20" you will break others. I've seen quite a number of
>   implementations in the past perform the decoding just like haproxy
>   used to until recently, which is: feed the whole URL string to the
>   percent-decoder and decode the "+" as a space in the path part. And
>   by normalizing that we'd also break some of them.
>
> - some servers support a non-standard UTF-16 encoding (the same ones as
>   those using case-insensitive matching). For this they use "%u" followed
>   by 4 digits. So your "A" could be encoded "%61", "%41", "%u0061",
>   "%u0041", "%U0041" or "%U0061" there and will also match.
>   But this is not standard and must not be decoded as such, at the risk
>   of breaking yet other applications which do not expect that "%u" is
>   transcoded. And it's even possible that in some of these servers'
>   configurations there are rules matching "%UFEDC" but not "%FE%DC".
>
> - and I wouldn't even be surprised if some servers using some internal
>   normalization functions would also resolve unicode homoglyphs to valid
>   characters! Just check if your server accepts "/%ef%bd%81dmin/", that
>   would be fun!
>
> - actually, even browsers DO NOT normalize URLs, in order to preserve
>   them as much as possible and not to fail on broken servers! This should
>   be taken as a strong warning! Try it yourself, just direct your
>   browser to /a%%b%31c%xyz/?brightness=10% and see it send:
>
>     GET /a%%b%31c%xyz/?brightness=10% HTTP/1.1
>
>   you'll note that even "%31" wasn't turned into a "1".
>
> - there are other aspects (some mentioned in RFC3986). The authority
>   part of the URI can have various forms. The best known one is the
>   split of the net and the host in the IPv4 address representation, by
>   which "127.0.0.1" is also "127.0.1", "127.1", "2130706433" or even
>   "0x7f000001" (with X or x and F or f). You can even add leading
>   zeroes. And you can use octal encoding: 017700000001. You can try to
>   ping all of them, they'll likely work on your OS. At least my browser
>   rewrites them in the URL bar before sending the request. This might be
>   normalized... or not, for the same reasons of not breaking the next
>   step in the chain, which possibly expects to have a different behavior
>   when dealing with "16bit.16bit" representation, since host names made
>   of digits only are permitted if there's a domain behind :-/ The port
>   number can accept leading zeroes, so ":80" and ":0000080" are aliases.
>
> - last, those running haproxy in front of a WAF certainly don't want
>   haproxy to wipe this precious information before the WAF has a chance
>   to raise its awareness on this request!
>
> The problem with normalization is that it would work if everyone was doing
> it, but RFC3986 was specified long after 99% of the internet was already
> deployed and used, so at best it can be used as a guideline and a reference
> of traps to avoid. And things are getting worse with IoT. You can just try
> to run an HTTP server on an ESP8266 and you'll see that most of the time,
> percent-decoding is not the web server's problem at all and it will pass
> it unmodified. So you can definitely expect that your light bulb's web
> server will take requests like "GET /dim?brightness=10%" and expect that
> to work out of the box. Just install a normalizing load balancer in front
> of a gateway managing tens of thousands of heating controllers based on
> such devices and suddenly nobody can adjust the temperature in their homes
> anymore (or worse, the '%' gets dropped and becomes degrees C so when you
> ask for 50% heating you get 50 degrees C).
>
> The real problem is that initial implementations of HTTP never said that
> it was illegal to send non-canonical characters and that they ought to be
> rejected. It's just that HTTP was designed in an era where a server would
> run fine in 30kB of RAM. Nowadays with 1000 times more, many wouldn't even
> load. So it was less easy to insist that certain things were strictly
> forbidden by then.
>
> In our discussion you invoked the principle of least surprise, which
> dictates that haproxy ought to do "the right thing" by default. And I
> totally agree with it. It just turns out that with so many different
> behaviors around, when you're in the middle of the chain you have to
> be discreet and not start to rearrange the stuff that's not your
> business and claim it's better once tidied up, otherwise you're certain
> to cause bad surprises.
> I'm pretty sure there are far fewer users of
> a "deny" rule applied to a path than there are users of applications
> that would simply break by normalizing. Mind you that I've found "/"
> in header names and even percent-encoding, so you can easily imagine
> the inventiveness you can have in a path that gets rewritten along a
> long chain using regex... Thus for me the principle of least surprise
> is *NOT* to normalize in the middle of the chain.
>
> > Concluding:
> > - The documentation should be updated to warn administrators that
> >   http-request deny must be used very carefully when combining it with a
> >   path.
>
> That was my initial point and I was even a bit disappointed not to find
> such a mention of percent-decoding in the doc as it used to be well-known
> at the time "reqrep" and friends were the norm. We even still have examples
> of this in examples/acl-content-sw.cfg with the forbidden URIs. But that's
> how projects evolve, people change and assumptions as well, and certain
> design decisions need to be clearly documented.
>
> > - HAProxy should gain the ability to correctly normalize URLs (i.e. not
> >   using the url_dec version). How exactly that would look is not yet clear.
> >   - It could be a `http-request normalize-path percent,dotdot,Y` action.
> >   - It could be a `normalize(percent)` converter
> >   - The `path` fetch could be extended to gain an argument, specifying
> >     the normalization requested.
> >   - An option within the `defaults` section could enable normalization
> >     for everything.
>
> It cannot be an option in a defaults section because it means you'd want
> to disable it for some frontends, and this becomes incompatible with the
> need for filtering before and after. The main problem is to be able to
> do that:
>
>     http-request accept/redirect/set-var/etc ... if blah
>     ...
>     http-request normalize-uri [ possible options ]
>     http-request other-rules ...
>       if blah
>
> In short, apply security processing before normalization, and simple
> decisions after. Those who don't want this security processing but
> still want to apply deny rules could just start with normalize-uri.
> We could also imagine having a global option to state that everything
> that enters the H1/H2 muxes gets normalized before processing, but I
> strongly doubt anyone would use this because it would definitely break
> applications and would still not allow them to write safe rules.
>
> Converters can be useful to only check without modifying.
>
> Another easier option is to have a sample-fetch or converter that
> works the other way around and tells you whether there is something
> non-canonical in your request. It wouldn't trip on "10%" or "%%" or
> such things, it would catch valid encodings of characters that should
> not be encoded in the first place. Because these are the dangerous
> and suspicious ones. And this also allows spotting the "%u0061". It
> could also check for "../". By doing so you would be able to write:
>
>     http-request deny if { req.url,is_suspicious }
>     http-request deny if { req.path /admin/ }
>
> I tend to prefer this one because it doesn't modify the request and
> will not break applications. And that's more or less the way I
> used to proceed in certain environments with regex matching stuff
> like "%(2[D-F]|3[0-9]|[46][0-9A-F]|[57][0-9A])" which is already
> very efficient at blocking most of these patterns.
>
> A normalize action could however also correctly recode non-printable
> characters that some browsers follow in links. But various options
> are usable simultaneously, such as:
>
>     http-request normalize-uri if { req.url,is_suspicious }
>
> > If you have anything to add after reading this mail: Please do!
>
> Sure, I always have something to add :-)
>
> As I mentioned in our exchanges, due to all the variations above, URL-based
> access denial *never works*.
> YOU CANNOT IMPLEMENT SECURITY BY DENYING ONLY
> A SUBSET OF IDENTIFIABLE PATTERNS AMONG A LARGE AND UNCLEAR LIST OF
> ALIASES.
>
> Some application servers might want to see /images/../admin as something
> valid under a symlink while others will instead resolve it to /admin. Some
> will also consider that /admin/login.php.html is the same as
> /admin/login.php, and that this is also the same as /admin/.%5Clogin.php.
> You could also face the case of internal rerouting of URLs like
> "/admin/debug?path=/login.php" where it's the application server itself
> which routes its request inside the application.
>
> So even with normalization, you'd be left with a huge doubt and your
> application would remain totally insecure. This is why some responders
> said that such filtering ought to be made on the server itself. And
> you rightfully pointed out that .htaccess is placed in the proper directory
> for a reason. It's also why we've implemented certificate passing in
> headers. It might not be the easiest thing to deploy as you said, but
> having the application protect itself is the only way to make it secure
> by DESIGN and not by applying pads on top of identified wounds.
>
> However, putting a protection layer in front of the application *IS* a
> good way to protect it against zero-day attacks. Blocking "../", "%00",
> double extensions, invalid arguments and so on will definitely save the
> application from being exposed to a lot of dangerous dirt, starting with
> vulnerability scanners which eat your CPU and fill your logs.
>
> That's why I still have a preference for implementing a nice and
> configurable converter to detect non-canonical forms and probably
> another one which would provide a good level of security in a single
> rule.
>
> Cheers,
> Willy
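
To make the "detect, don't rewrite" idea quoted above a bit more concrete,
here is a rough sketch in Python. It is not haproxy code and "is_suspicious"
does not exist as a converter today; the function names, the reason labels
and the exact pattern list are all assumptions made for illustration. The
patterns are a variation on the regex quoted above: percent-encoded
unreserved characters, percent-encoded "/" and "\", the non-standard "%uXXXX"
form, and literal ".." segments in the path. Things like "10%" or "%%" are
deliberately left alone, and the URL itself is never modified.

```python
import re

# Percent-encoded forms of RFC 3986 "unreserved" characters
# (letters, digits, "-", ".", "_", "~"), which never need encoding.
_ENCODED_UNRESERVED = re.compile(
    r"%(?:2[DdEe]|3[0-9]|[46][1-9A-Fa-f]|[57][0-9Aa]|5[Ff]|7[Ee])"
)
# Percent-encoded "/" and "\", which can hide a directory delimiter.
_ENCODED_DELIMITER = re.compile(r"%(?:2[Ff]|5[Cc])")
# Non-standard "%uXXXX" encoding accepted by some servers.
_PERCENT_U = re.compile(r"%[uU][0-9A-Fa-f]{4}")
# A literal ".." path segment.
_DOT_DOT = re.compile(r"(?:^|/)\.\.(?:/|$)")


def suspicious_reasons(url):
    """Return the (hypothetical) reason labels that apply to `url`.

    The URL is only inspected, never rewritten. An empty list means
    nothing non-canonical was spotted by these particular checks.
    """
    path = url.split("?", 1)[0]  # ".." is only checked in the path part
    checks = [
        ("encoded-unreserved", _ENCODED_UNRESERVED, url),
        ("encoded-delimiter", _ENCODED_DELIMITER, url),
        ("percent-u", _PERCENT_U, url),
        ("dot-dot", _DOT_DOT, path),
    ]
    return [name for name, pattern, text in checks if pattern.search(text)]


def is_suspicious(url):
    """True if any of the checks above fires."""
    return bool(suspicious_reasons(url))


if __name__ == "__main__":
    samples = (
        "/dim?brightness=10%",
        "/%61dmin/",
        "/%u0061dmin/",
        "/admin/.%5Clogin.php",
        "/static//images/../../css/main.css",
    )
    for sample in samples:
        print(sample, suspicious_reasons(sample))
```

Note that this only catches the aliases it knows about. Something like
"/%ef%bd%81dmin/" is a valid encoding of a non-ASCII character and passes
untouched, which is exactly the point made above: such a check filters out
dirt, it does not make a path-based deny rule safe.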