Thank you, Tomas, patches applied and pushed.

Regards, Tim
On 10.07.20 15:34, Tomas Hozza wrote:
>
> On 27. 11. 2019 10:50, Tim Rühsen wrote:
>> On 11/26/19 4:00 PM, Tomas Hozza wrote:
>>>
>>> On 20. 11. 2019 18:47, Tim Rühsen wrote:
>>>> On 20.11.19 12:41, Tomas Hozza wrote:
>>>>> On 7. 11. 2019 21:30, Tim Rühsen wrote:
>>>>>> On 07.11.19 15:21, Tomas Hozza wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> In RHEL-8, we ship a wget version that suffers from a bug fixed by [1].
>>>>>>> The fix resolved an issue with matching subdomains when the no_proxy
>>>>>>> domain definition was prefixed with a dot, e.g. "no_proxy=.mit.edu".
>>>>>>> As part of backporting the fix to RHEL, I wanted to create an upstream
>>>>>>> test for the no_proxy functionality. However, I found that there is
>>>>>>> still one corner case which is not handled by the current upstream
>>>>>>> code, and honestly I'm not sure what the intended domain matching
>>>>>>> behavior in that case is. The man page is also not very specific in
>>>>>>> this regard.
>>>>>>>
>>>>>>> The corner case is as follows:
>>>>>>> - no_proxy=.mit.edu
>>>>>>> - the download URL is e.g. "http://mit.edu/file1"
>>>>>>>
>>>>>>> In this case the proxy settings are used, because the domains don't
>>>>>>> match due to the leftmost dot in the no_proxy domain definition. This
>>>>>>> is either intended, or a corner case that was not considered. One
>>>>>>> could argue that if no_proxy is set to ".mit.edu", the leftmost dot
>>>>>>> means that only subdomains of "mit.edu" are excluded from the proxy,
>>>>>>> while the proxy settings should still apply to the "mit.edu" domain
>>>>>>> itself. From my point of view, after reading the wget man page, I
>>>>>>> don't think that the leftmost dot in the no_proxy definition has any
>>>>>>> special meaning.
>>>>>>
>>>>>> Hello Tomas,
>>>>>>
>>>>>> it is hard to decide how to handle this. I personally would like to
>>>>>> see a match with curl's behavior (see
>>>>>> https://github.com/curl/curl/issues/1208).
>>>>>>
>>>>>> Given the docs from GNU emacs, you are right: "no_proxy=.mit.edu"
>>>>>> means "mit.edu and subdomains" are excluded from proxy settings. (see
>>>>>> https://www.gnu.org/software/emacs/manual/html_node/url/Proxies.html)
>>>>>>
>>>>>> The caveat with emacs' behavior is that you cannot exclude just the
>>>>>> subdomains of mit.edu without mit.edu itself. Effectively, that
>>>>>> creates a corner case that can't be handled at all (but if curl also
>>>>>> does it that way, let's go for it).
>>>>>>
>>>>>> Maybe you can find out about the current no_proxy behavior of typical,
>>>>>> widespread tools (regarding the leftmost dot)!? Once we have that
>>>>>> information, we can make a confident decision.
>>>>>>
>>>>>> Regards, Tim
>>>>>
>>>>> Hi Tim.
>>>>>
>>>>> It took me some time to go through the current situation and, to be
>>>>> honest, it is kind of a mess. While each tool handles the no_proxy
>>>>> environment variable a little bit differently, there are some
>>>>> similarities. Nevertheless, I was not able to find any standard.
>>>>>
>>>>> curl's behavior:
>>>>> - "no_proxy=.mit.edu"
>>>>>   - will match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - "no_proxy=mit.edu"
>>>>>   - will match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - downside: cannot match only the host; cannot match only the domain
>>>>>   and subdomains
>>>>>
>>>>> current wget's behavior:
>>>>> - "no_proxy=.mit.edu"
>>>>>   - will match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will NOT match the host "mit.edu"
>>>>> - "no_proxy=mit.edu"
>>>>>   - will match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - downside: cannot match only the host
>>>>>
>>>>> wget's behavior with the proposed patch:
>>>>> - "no_proxy=.mit.edu"
>>>>>   - will match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - "no_proxy=mit.edu"
>>>>>   - will match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - downside: cannot match only the host; cannot match only the domain
>>>>>   and subdomains
>>>>> - it would be consistent with curl's behavior
>>>>>
>>>>> emacs's behavior:
>>>>> - "no_proxy=.mit.edu"
>>>>>   - will match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - "no_proxy=mit.edu"
>>>>>   - will NOT match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - downside: cannot match only subdomains
>>>>>
>>>>> python httplib2's behavior:
>>>>> - "no_proxy=.mit.edu"
>>>>>   - will match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - "no_proxy=mit.edu"
>>>>>   - will NOT match the domain and subdomains, e.g. "www.mit.edu" or
>>>>>     "www.subdomain.mit.edu"
>>>>>   - will match the host "mit.edu"
>>>>> - downside: cannot match only subdomains
>>>>>
>>>>> To sum it up: each approach has some downsides. With the change I
>>>>> provided, wget's behavior would be consistent with curl's behavior.
>>>>> However, it would have more downsides than it currently has;
>>>>> specifically, it would lose the ability to match only the domain and
>>>>> subdomains without also matching the host. Emacs's behavior is similar
>>>>> to Python httplib2's behavior regarding the leftmost dot.
>>>>>
>>>>> Honestly, I have a soft preference for keeping wget's current behavior.
>>>>> But I admit that making the behavior consistent with curl makes sense.
>>>>> Please let me know how you would like to proceed.
>>>>>
>>>>> To make the behavior consistent with curl, the previously attached
>>>>> changes should be OK. If you find the new conditions too complicated, I
>>>>> can try to rethink them, but I already tried to keep them as simple as
>>>>> possible while not completely rewriting the function.
>>>>>
>>>>> If you decide to keep the current behavior, I'll modify the test that I
>>>>> added to match that behavior.
>>>>
>>>> Great work, Tomas!
>>>>
>>>> Wow, I didn't think it was so messed up :-(
>>>>
>>>> We should definitely document your results, e.g. in the wget manual.
>>>>
>>>> If we keep the current behavior, we could adjust it with a new option or
>>>> a new env variable 'WGET_NO_PROXY_MODE', which could take well-defined
>>>> values like 'curl', 'emacs', 'wget' (the default), and maybe a new one
>>>> ('strict') with none of the detected downsides.
>>>>
>>>> It looks a bit over-engineered, but it means that wget can easily adapt
>>>> to existing environments. And the code seems pretty straightforward.
>>>>
>>>> Let's see if some more opinions come in.
>>>>
>>>> Regards, Tim
>>> Yes, 'WGET_NO_PROXY_MODE' is probably the safest option with regard to
>>> backward compatibility, and if needed, the default could later change.
>>> One downside of allowing values like 'curl' or 'emacs' is that they
>>> would probably also require handling of the asterisk "*" in hostnames,
>>> as those tools do.
>>>
>>> Nevertheless, for now I have at least modified the new test to cover the
>>> current wget behavior. There were also minor changes to the test
>>> framework needed to make the test possible. Patches are attached. Please
>>> let me know if they need any changes.
>
> Hi Tim.
>
> I finally got to finishing the requested changes. The modified patches
> are attached.
>
>> Please add your test to testenv/Makefile.am (best directly before adding
>> $(METALINK_TESTS)). The test currently doesn't work for me, since the
>> host `working1.localhost` can't be resolved.
>
> I added the test to testenv/Makefile.am.
>
>> Maybe you can add `testenv/certs/wgethosts`, similar to
>> `tests/certs/wgethosts`, but with working1.localhost and
>> working2.localhost included. You have to set the env variable HOSTALIASES
>> to that file name (a glibc feature).
>
> I was surprised by this, since I was not able to reproduce the issue at
> first. However, I later found out that it magically works on Fedora
> because Fedora by default uses systemd's NSS plugin "myhostname", which
> resolves any localhost subdomain to the localhost address.
>
> I tried your suggestion; however, based on the documentation of
> HOSTALIASES and my experiments, HOSTALIASES only works for hostnames
> without a dot. So in this case, where I need to check even subdomains of
> working1.localhost (like www.working1.localhost), it does not work. It
> does not even work for working1.localhost itself. Honestly, I didn't find
> any easy way to make this work on a system that does not resolve
> arbitrary localhost subdomains to the localhost address.
>
> After some thinking, I came up with at least a workaround: skip this test
> if the system cannot resolve the necessary localhost subdomains. This is
> checked at the beginning of the test, and if the resolution fails, the
> test is skipped. However, on systems that support it (like Fedora and
> RHEL), the test runs and passes.
>
> If you have additional ideas about how to make this work everywhere, I'll
> be more than happy to change the test. Unfortunately I'm out of ideas,
> and skipping the test in some cases at least makes it possible to include
> it in the upstream repository (so that it does not unnecessarily make the
> test suite fail).
>
>> In patch 0001 there is a typo 'retuns'.
>
> Fixed.
>
>> Please don't make your lines in the commit messages longer than 79
>> chars.
>
> Fixed.
>
>> Not sure if and when I implement WGET_NO_PROXY_MODE. We try to make no
>> more changes to wget 1.x except bug fixes. All new stuff should only go
>> into wget2.
>>
>> Regards, Tim
>
> Regards,
> Tomas
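
To make the corner case discussed above concrete, here is a minimal,
illustrative C sketch of a no_proxy suffix match that treats a leading dot
as optional, i.e. the curl-compatible rule the proposed patch converges on.
This is not wget's actual no_proxy code; the function name and structure are
invented for this example, and only the host/entry values come from the
thread.

/* Minimal illustration (not wget's actual code) of a no_proxy suffix match
   that treats a leading dot in the entry as optional. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp */

static bool
no_proxy_matches (const char *host, const char *entry)
{
  /* ".mit.edu" behaves like "mit.edu": the bare host is matched too. */
  if (*entry == '.')
    entry++;

  size_t hlen = strlen (host);
  size_t elen = strlen (entry);

  if (hlen < elen)
    return false;

  /* The entry must be a (case-insensitive) suffix of the host ... */
  if (strcasecmp (host + (hlen - elen), entry) != 0)
    return false;

  /* ... and the match must start at a label boundary, so "mit.edu"
     never matches "summit.edu". */
  return hlen == elen || host[hlen - elen - 1] == '.';
}

int
main (void)
{
  printf ("%d\n", no_proxy_matches ("mit.edu", ".mit.edu"));      /* 1: the corner case */
  printf ("%d\n", no_proxy_matches ("www.mit.edu", ".mit.edu"));  /* 1 */
  printf ("%d\n", no_proxy_matches ("summit.edu", "mit.edu"));    /* 0 */
  return 0;
}

Without the dot-skipping at the top, the first call returns 0, which
corresponds to wget's current behavior of sending "http://mit.edu/file1"
through the proxy even with no_proxy=.mit.edu.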

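The skip-if-unresolvable workaround Tomas describes amounts to a resolver
probe before the test body runs. The real check lives in the Python-based
testenv test attached to the thread; the stand-alone C sketch below only
illustrates the idea, and both the probed host name and the use of exit
code 77 (the automake convention for a skipped test) are assumptions here.

/* Sketch of the "can this system resolve localhost subdomains?" probe that
   decides whether the no_proxy test is skipped.  Illustration only; the
   real check is done in the Python test. */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

int
main (void)
{
  struct addrinfo hints, *res = NULL;

  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;

  /* www.working1.localhost only resolves where something like systemd's
     nss-myhostname maps arbitrary *.localhost names to loopback. */
  int rc = getaddrinfo ("www.working1.localhost", NULL, &hints, &res);
  if (rc != 0)
    {
      fprintf (stderr, "SKIP: cannot resolve localhost subdomains (%s)\n",
               gai_strerror (rc));
      return 77;   /* automake treats exit code 77 as a skipped test */
    }

  freeaddrinfo (res);
  puts ("localhost subdomains resolve; the proxy test can run");
  return 0;
}

On systems like Fedora or RHEL, where nss-myhostname resolves any
*.localhost name, the probe succeeds and the test runs; elsewhere the test
is skipped rather than reported as a failure.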