On 11/26/19 4:00 PM, Tomas Hozza wrote:
>
>
> On 20. 11. 2019 18:47, Tim Rühsen wrote:
>> On 20.11.19 12:41, Tomas Hozza wrote:
>>> On 7. 11. 2019 21:30, Tim Rühsen wrote:
>>>> On 07.11.19 15:21, Tomas Hozza wrote:
>>>>> Hi.
>>>>>
>>>>> In RHEL-8, we ship a wget version that suffers from a bug fixed by [1].
>>>>> The fix resolved an issue with matching subdomains when the no_proxy
>>>>> domain definition was prefixed with a dot, e.g. "no_proxy=.mit.edu". As
>>>>> part of backporting the fix to RHEL, I wanted to create an upstream test
>>>>> for no_proxy functionality. However I found that there is still one
>>>>> corner case, which is not handled by the current upstream code and
>>>>> honestly I'm not sure what is the intended domain matching behavior in
>>>>> that case. Man page is also not very specific in this regard.
>>>>>
>>>>> The corner case is as follows:
>>>>> - no_proxy=.mit.edu
>>>>> - download URL is e.g. "http://mit.edu/file1"
>>>>>
>>>>> In this case the proxy settings are used, because domains don't match due
>>>>> to the leftmost dot in no_proxy domain definition. This is either
>>>>> intended or corner case that was not considered. One could argue, that if
>>>>> the no_proxy is set to ".mit.edu", then leftmost dot means that the proxy
>>>>> settings should not apply only to subdomains of "mit.edu", but proxy
>>>>> settings should still apply to "mit.edu" domain itself. From my point of
>>>>> view, after reading wget man page, I don't think that the leftmost dot
>>>>> in no_proxy definition has any special meaning.
>>>>
>>>> Hello Tomas,
>>>>
>>>> hard to decide how to handle this. I personally would like to see a
>>>> match with curl's behavior (see https://github.com/curl/curl/issues/1208).
>>>>
>>>> Given the docs from GNU emacs, you are right. "no_proxy=.mit.edu" means
>>>> "mit.edu and subdomains" are excluded from proxy settings.
>>>> (see https://www.gnu.org/software/emacs/manual/html_node/url/Proxies.html)
>>>>
>>>> The caveat with emacs' behavior is that you cannot exclude just all
>>>> subdomains of mit.edu without mit.edu itself. Effectively, that creates
>>>> a corner case that can't be handled at all. (but if curl also does it
>>>> that way, let's go for it).
>>>>
>>>> Maybe you can find out about the current no_proxy behavior of typical
>>>> and wide-spread tools (regarding leftmost dot) !? Once we have that
>>>> information, we can make a confident decision.
>>>>
>>>> Regards, Tim
>>>
>>> Hi Tim.
>>>
>>> It took me some time to go through the current situation and to be honest,
>>> it is kind of a mess. While each tool handles the no_proxy env a little bit
>>> differently, there are some similarities. Nevertheless I was not able to
>>> find any standard.
>>>
>>> curl's behavior:
>>> - "no_proxy=.mit.edu"
>>>   - will match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>>   - will match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - downside: can not match only the host; can not match only the domain and
>>>   subdomains
>>>
>>> current wget's behavior:
>>> - "no_proxy=.mit.edu"
>>>   - will match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will NOT match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>>   - will match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - downside: can not match only the host
>>>
>>> wget's behavior with proposed patch:
>>> - "no_proxy=.mit.edu"
>>>   - will match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>>   - will match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - downside: can not match only the host; can not match only the domain and
>>>   subdomains
>>> - it would be consistent with curl's behavior
>>>
>>> emacs's behavior:
>>> - "no_proxy=.mit.edu"
>>>   - will match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>>   - will NOT match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - downside: can not match only subdomains
>>>
>>> python httplib2's behavior:
>>> - "no_proxy=.mit.edu"
>>>   - will match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>>   - will NOT match the domain and subdomains e.g. "www.mit.edu" or
>>>     "www.subdomain.mit.edu"
>>>   - will match the host "mit.edu"
>>> - downside: can not match only subdomains
>>>
>>> To sum it up. Each approach has some downsides. Given the change that I
>>> provided, wget's behavior would be consistent with curl's behavior. However
>>> it will have more downsides than it currently has, specifically it will
>>> lose the ability not to match the host, but only domain and subdomains.
>>> Emacs's behavior is similar to Python's httplib2 behavior regarding the
>>> leftmost dot.
>>>
>>> Honestly I have a soft preference for keeping the current wget's behavior.
>>> But I admit that making the behavior consistent with curl's behavior makes
>>> sense. Please let me know how you would like to proceed.
>>>
>>> To make the behavior consistent with curl, the previously attached changes
>>> should be OK. If you find those new conditions too complicated, I can try
>>> to rethink it, but I already tried to make it as little complicated as
>>> possible and at the same time trying to not completely rewrite the function.
>>>
>>> If you'll decide to keep the current behavior, I'll modify the test that I
>>> added to cope with the behavior.
>>
>> Great work, Tomas !
>>
>> Wow, didn't think it's so messed up :-(
>>
>> We should definitely document your results, e.g. in the wget manual.
>>
>> If we keep the current behavior, we could adjust it with a new option or
>> a new env variable 'WGET_NO_PROXY_MODE'. Which could take well-defined
>> values like 'curl', 'emacs', 'wget' (the default), and maybe a new one
>> ('strict') with none of the detected downsides.
>>
>> Looks a bit over-engineered, but it means that wget can easily adapt to
>> existing environments. And the code seems pretty straightforward.
>>
>> Let's see if some more opinions come in.
>>
>> Regards, Tim
>
> Yes, 'WGET_NO_PROXY_MODE' is probably the safest option with regard to
> backward compatibility. And if needed, the default could later change. One
> downside of allowing values like 'curl' or 'emacs' is that these would
> probably require also handling of asterisk "*" in hostnames, as those tools
> do.
>
> Nevertheless for now, I at least modified the new test to cover current wget
> behavior. There were also minor changes to the test framework in order to
> make the test possible. Patches are attached. Please let me know if they need
> any changes.

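For reference, the leftmost-dot behaviors surveyed above can be written down as a small Python sketch. This is only an illustration of the table in this thread, not code taken from wget, curl, emacs or httplib2. The function name `no_proxy_match` and the mode names are hypothetical, and the 'strict' branch is just one possible reading of "none of the detected downsides" (leading dot means subdomains only, no dot means exact host only). Wildcards such as "*" and port numbers are not handled.

```python
def no_proxy_match(host: str, entry: str, mode: str = "wget") -> bool:
    """Return True if `host` bypasses the proxy for a single no_proxy entry.

    Hypothetical helper: it models only the leftmost-dot behavior summarized
    in the thread, not the real matching code of any of the tools.
    """
    host = host.lower().rstrip(".")
    entry = entry.lower()
    dotted = entry.startswith(".")      # entry was written as ".mit.edu"
    domain = entry.lstrip(".")
    if not domain:
        return False

    is_host = host == domain            # e.g. "mit.edu"
    is_sub = host.endswith("." + domain)  # e.g. "www.mit.edu"

    if mode == "curl":
        # Leading dot is ignored: host and subdomains always match.
        return is_host or is_sub
    if mode == "wget":
        # Leading dot excludes the bare host; no dot matches both.
        return is_sub if dotted else (is_host or is_sub)
    if mode == "emacs":
        # Leading dot matches both; no dot matches the exact host only.
        return (is_host or is_sub) if dotted else is_host
    if mode == "strict":
        # ASSUMPTION: dot means subdomains only, no dot means host only.
        return is_sub if dotted else is_host
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Print the comparison table for the corner cases discussed above.
    for mode in ("curl", "wget", "emacs", "strict"):
        for entry in (".mit.edu", "mit.edu"):
            for host in ("mit.edu", "www.mit.edu"):
                result = no_proxy_match(host, entry, mode)
                print(f"{mode:6} no_proxy={entry:9} host={host:12} -> {result}")
```

With something like this, the corner case from the original report is the single call `no_proxy_match("mit.edu", ".mit.edu", "wget")`, which is the only combination where the 'wget' and 'curl' modes disagree.
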
Please add your test to testenv/Makefile.am (best directly before adding
$(METALINK_TESTS)).

The test currently doesn't work for me, since host `working1.localhost`
can't be resolved. Maybe you can add `testenv/certs/wgethosts`, similar to
`tests/certs/wgethosts`, but with working1.localhost and
working2.localhost included. You have to set the env variable HOSTALIASES
to that file name (glibc feature).

In patch 0001 there is a typo 'retuns'.

Please don't make the lines in your commit messages longer than 79 chars.

I'm not sure if and when I will implement WGET_NO_PROXY_MODE. We try to
make no more changes to wget 1.x except bug fixes. All new stuff should
only go into wget2.

Regards, Tim
