On 27. 11. 2019 10:50, Tim Rühsen wrote:
> On 11/26/19 4:00 PM, Tomas Hozza wrote:
>>
>>
>> On 20. 11. 2019 18:47, Tim Rühsen wrote:
>>> On 20.11.19 12:41, Tomas Hozza wrote:
>>>> On 7. 11. 2019 21:30, Tim Rühsen wrote:
>>>>> On 07.11.19 15:21, Tomas Hozza wrote:
>>>>>> Hi.
>>>>>>
>>>>>> In RHEL-8, we ship a wget version that suffers from bug fixed by [1]. 
>>>>>> The fix resolved issue with matching subdomains when no_proxy domain 
>>>>>> definition was prefixed with dot, e.q. "no_prefix=.mit.edu". As part of 
>>>>>> backporting the fix to RHEL, I wanted to create an upstream test for 
>>>>>> no_prefix functionality. However I found that there is still one corner 
>>>>>> case, which is not handled by the current upstream code and honestly I'm 
>>>>>> not sure what is the intended domain matching behavior in that case. Man 
>>>>>> page is also not very specific in this regard.
>>>>>>
>>>>>> The corner case is as follows:
>>>>>> - no_proxy=.mit.edu
>>>>>> - download URL is e.g. "http://mit.edu/file1";
>>>>>>
>>>>>> In this case the proxy settings are used, because domains don't match 
>>>>>> due to the leftmost dot in no_proxy domain definition. This is either 
>>>>>> intended or corner case that was not considered. One could argue, that 
>>>>>> if the no_proxy is set to ".mit.edu", then leftmost dot means that the 
>>>>>> proxy settings should not apply only to subdomains of "mit.edu", but 
>>>>>> proxy settings should still apply to "mit.edu" domain itself. From my 
>>>>>> point of view, after reading wget man page, I don't think that the 
>>>>>> leftmost dost in no_proxy definition has any special meaning.
>>>>>
>>>>> Hello Tomas,
>>>>>
>>>>> hard to decide how to handle this. I personally would like to see a
>>>>> match with curl's behavior (see https://github.com/curl/curl/issues/1208).
>>>>>
>>>>> Given the docs from GNU emacs, you are right. "no_proxy=.mit.edu" means
>>>>> "mit.edu and subdomains" are excluded from proxy settings.
>>>>> (see https://www.gnu.org/software/emacs/manual/html_node/url/Proxies.html)
>>>>>
>>>>> The caveat with emacs' behavior is that you cannot exclude just all
>>>>> subdomains of mit.edu without mit.edu itself. Effectively, that creates
>>>>> a corner case that can't be handled at all. (but if curl also does it
>>>>> that way, let's go for it).
>>>>>
>>>>> Maybe you can find out about the current no_proxy behavior of typical
>>>>> and wide-spread tools (regarding leftmost dot) !? Once we have that
>>>>> information, we can make a confident decision.
>>>>>
>>>>> Regards, Tim
>>>>
>>>> Hi Tim.
>>>>
>>>> It took me some time to go through the current situation and to be honest, 
>>>> it is kind of a mess. While each tool handles the no_proxy env a little 
>>>> bit differently, there are some similarities. Nevertheless I was not able 
>>>> to find any standard.
>>>>
>>>> curl's behavior:
>>>> - "no_proxy=.mit.edu"
>>>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - "no_proxy=mit.edu"
>>>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - downside: can not match only the host; can not match only the domain and 
>>>> subdomains
>>>>
>>>> current wget's behavior:
>>>> - "no_proxy=.mit.edu"
>>>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will NOT match the host "mit.edu"
>>>> - "no_proxy=mit.edu"
>>>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - downside: can not match only the host
>>>>
>>>> wget's behavior with proposed patch:
>>>> - "no_proxy=.mit.edu"
>>>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - "no_proxy=mit.edu"
>>>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - downside: can not match only the host; can not match only the domain and 
>>>> subdomains
>>>> - it would be consistent with curl's behavior
>>>>
>>>> emacs's behavior:
>>>> - "no_proxy=.mit.edu"
>>>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - "no_proxy=mit.edu"
>>>>   - will NOT match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - downside: can not match only subdomains
>>>>
>>>> python httplib2's behavior:
>>>> - "no_proxy=.mit.edu"
>>>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - "no_proxy=mit.edu"
>>>>   - will NOT match the domain and subdomains e.g. "www.mit.edu" or 
>>>> "www.subdomain.mit.edu"
>>>>   - will match the host "mit.edu"
>>>> - downside: can not match only subdomains
>>>>
>>>> To sum it up. Each approach has some downsides. Given the change that I 
>>>> provided, wget's behavior would be consistent with curl's behavior. 
>>>> However it will have more downsides that it currently has, specifically it 
>>>> will loose the ability to not to match the host, but only domain and 
>>>> subdomains. Emacs's behavior is similar to Python's httplib2 behavior 
>>>> regarding the leftmost dot.
>>>>
>>>> Honestly I have a soft preference for keeping the current wget's behavior. 
>>>> But I admit that making the behavior consistent with curl's behavior makes 
>>>> sense. Please let me know how you would like to proceed.
>>>>
>>>> To make the behavior consistent with curl, the previously attached changes 
>>>> should be OK. If you find those new conditions too complicated, I can try 
>>>> to rethink it, but I already tried to make it as little complicated as 
>>>> possible and at the same time trying to not completely rewrite the 
>>>> function.
>>>>
>>>> If you'll decide to keep the current behavior, I'll modify the test that I 
>>>> added to cope with the behavior.
>>>
>>> Great work, Tomas !
>>>
>>> Wow, didn't think it's so messed up :-(
>>>
>>> We should definitely document your results, e.g. in the wget manual.
>>>
>>> If we keep the current behavior, we could adjust it with a new option or
>>> a new env variable 'WGET_NO_PROXY_MODE'. Which could take well-defined
>>> values like 'curl', 'emacs', 'wget' (the default), and maybe a new one
>>> ('strict') with none of the detected downsides.
>>>
>>> Looks a bit over-engineered, but it means that wget can easily adopt to
>>> existing environments. And the code seems pretty straight forward.
>>>
>>> Let's see if some more opinions come in.
>>>
>>> Regards, Tim
>>
>> Yes, 'WGET_NO_PROXY_MODE' is probably the safest option with regard to 
>> backward compatibility. And if needed, the default could later change. One 
>> downside of allowing values like 'curl' or 'emacs' is that these would 
>> probably require also handling of asterisk "*" in hostnames, as those tools 
>> do.
>>
>> Nevertheless for now, I at least modified the new test to cover current wget 
>> behavior. There were also minor changes to the test framework in order to 
>> make the test possible. Patches are attached. Please let me know if they 
>> need any changes.

Hi Tim.

I finally got to finishing the requested changes. Modified patches are attached.

> Please add your test to testenv/Makefile.am (best directly before adding
> $(METALINK_TESTS)). The test currently doesn't work for me, since host
> `working1.localhost` can't be resolved.

I added the test to testenv/Makefile.am.

> Maybe you can add `testenv/certs/wgethosts`, similar as
> `tests/certs/wgethosts`, but with working1.localhost and
> working2.localhost included. You have to set env variable HOSTALIASES to
> that file name (glibc feature).

I was surprised by this, since I was not able to reproduce this issue at first. 
However later I found out that since Fedora uses by default systemd's nss 
plugin "myhostname", which translates any localhost subdomain to localhost 
address, it magically works.

I tried your suggestion, however based on the documentation of HOSTALIASES and 
my experiments, the HOSTALIASES works only for hostnames without dot. So in 
this case, when I need to check even subdomains of working1.localhost (like 
www.working1.localhost), it does not work. It does not even work for 
working1.localhost. Honestly I didn't find any easy way how to make this work 
on a system which does not support translating any localhost subdomains to 
localhost address.

After some thinking, I came up with at least a workaround, to skip this test if 
the system can not translate necessary localhost subdomains. This is checked at 
the beginning of the test and if the resolution fails, the test is skipped. 
However on systems that support it (like Fedora, RHEL), the test is run and 
passes.

If you have some additional ideas about how to make this work everywhere, I'll 
be more than happy to change the test. Unfortunately I'm out of ideas and 
skipping the test in some cases makes it at least possible to include in the 
upstream repository (so that it does not make unnecessarily the test suite to 
fail).

> In patch 0001 there is a typo 'retuns'.

Fixed.

> Please don't make your lines in the commit messages longer than 79 chars.

Fixed.

> Not sure if and when I implement WGET_NO_PROXY_MODE. We try to make no
> more changes to wget 1.x except bug fixes. All new stuff should only go
> into wget2.
> 
> Regards, Tim
>
Regards,
Tomas
-- 
Tomas Hozza
Manager, Software Engineering - Linux Engineering, Core Services

PGP: 1D9F3C2D
UTC+2 (CEST / EMEA)
Red Hat Czech                 http://cz.redhat.com
>From 5d1c8547db0afa5f9f1f1b7843e45adbb2aee468 Mon Sep 17 00:00:00 2001
From: Tomas Hozza <tho...@redhat.com>
Date: Thu, 7 Nov 2019 12:46:15 +0100
Subject: [PATCH 1/3] testenv: HTTPTest.begin() should return exit value

* testenv/test/http_test.py: Ensure that HTTPTest.begin() always returns
a value

Previously the HTTPTest.begin() method always returned None. However this
is not consistent with the begin() implementation of the parent class
(BaseTest). This change ensures that HTTPTest.begin() returns a value.

Signed-off-by: Tomas Hozza <tho...@redhat.com>
---
 testenv/test/http_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/testenv/test/http_test.py b/testenv/test/http_test.py
index fef0c2ef..462ac6e7 100644
--- a/testenv/test/http_test.py
+++ b/testenv/test/http_test.py
@@ -42,7 +42,7 @@ class HTTPTest(BaseTest):
                 print_green("Test Passed.")
             else:
                 self.tests_passed = False
-            super(HTTPTest, self).begin()
+            return super(HTTPTest, self).begin()
 
     def instantiate_server_by(self, protocol):
         server = {HTTP: HTTPd,
-- 
2.25.4

>From c1a0b11980c050db4a95616895db25addb241d8c Mon Sep 17 00:00:00 2001
From: Tomas Hozza <tho...@redhat.com>
Date: Thu, 7 Nov 2019 13:01:44 +0100
Subject: [PATCH 2/3] testenv: Allow definition of environment variables for
 wget execuion

* testenv/README: Added description for new EnvironmentVariable hook
* testenv/conf/environment_variable.py: Added implementation of
EnvironmentVariable hook
* testenv/test/base_test.py: Modified exec_wget() to enable use of
EnvironmentVariable hook

Added new test hook called EnvironmentVariables, for defining environment
variables when wget is executed in tests. This is handy for testing
environment variables, which are accepted by wget.

Signed-off-by: Tomas Hozza <tho...@redhat.com>
---
 testenv/README                        |  3 +++
 testenv/conf/environment_variables.py | 14 ++++++++++++++
 testenv/test/base_test.py             |  6 +++++-
 3 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 testenv/conf/environment_variables.py

diff --git a/testenv/README b/testenv/README
index aca8cdda..d4fabddd 100644
--- a/testenv/README
+++ b/testenv/README
@@ -224,6 +224,9 @@ executed. The currently supported options are:
     file. While all Download URL's are passed to Urls, a notable exception is
     when in-url authentication is used. In such a case, the URL is specified in
     the WgetCommands string.
+    * EnvironmentVariables: A dictionary with key-value items, which will be
+    defined as environment variables during the execution of wget command in
+    test.
 
 Post-Test Hooks:
 ================================================================================
diff --git a/testenv/conf/environment_variables.py b/testenv/conf/environment_variables.py
new file mode 100644
index 00000000..323c051c
--- /dev/null
+++ b/testenv/conf/environment_variables.py
@@ -0,0 +1,14 @@
+from conf import hook
+
+""" Test Option: EnvironmentVariables
+This hook is used to define environment variables used for execution of wget
+command in test."""
+
+
+@hook(alias='EnvironmentVariables')
+class URLs:
+    def __init__(self, envs):
+        self.envs = envs
+
+    def __call__(self, test_obj):
+        test_obj.envs.update(**self.envs)
diff --git a/testenv/test/base_test.py b/testenv/test/base_test.py
index 0d2624ad..228f728b 100644
--- a/testenv/test/base_test.py
+++ b/testenv/test/base_test.py
@@ -51,6 +51,7 @@ class BaseTest:
 
         self.wget_options = ''
         self.urls = []
+        self.envs = dict()
 
         self.tests_passed = True
         self.ready = False
@@ -97,12 +98,15 @@ class BaseTest:
         cmd_line = self.gen_cmd_line()
         params = shlex.split(cmd_line)
         print(params)
+        envs = {"HOME": os.getcwd()}
+        envs.update(**self.envs)
+        print(envs)
 
         if os.getenv("SERVER_WAIT"):
             time.sleep(float(os.getenv("SERVER_WAIT")))
 
         try:
-            ret_code = call(params, env={"HOME": os.getcwd()})
+            ret_code = call(params, env=envs)
         except FileNotFoundError:
             raise TestFailed("The Wget Executable does not exist at the "
                              "expected path.")
-- 
2.25.4

>From f771909ec9916da32c26b21d298118877ea43a77 Mon Sep 17 00:00:00 2001
From: Tomas Hozza <tho...@redhat.com>
Date: Fri, 10 Jul 2020 15:21:33 +0200
Subject: [PATCH 3/3] testenv: Add test for handling of no_proxy environment
 variable

* testenv/Test-no_proxy-env.py: Added new test for no_proxy env
* testenv/Makefile.am: Added the new test to Makefile

Added new test with 5 cases, which are testing various combinations
of no_proxy environment variable definition and requested URLs.
The test is skipped if the system does not support resolution of
localhost subdomains to lcalhost address.

Signed-off-by: Tomas Hozza <tho...@redhat.com>
---
 testenv/Makefile.am          |   1 +
 testenv/Test-no_proxy-env.py | 160 +++++++++++++++++++++++++++++++++++
 2 files changed, 161 insertions(+)
 create mode 100755 testenv/Test-no_proxy-env.py

diff --git a/testenv/Makefile.am b/testenv/Makefile.am
index 412f8125..685e5115 100644
--- a/testenv/Makefile.am
+++ b/testenv/Makefile.am
@@ -116,6 +116,7 @@ if HAVE_PYTHON3
     Test--rejected-log.py                           \
     Test-reserved-chars.py                          \
     Test--spider-r.py                               \
+    Test-no_proxy-env.py                            \
     $(METALINK_TESTS)
 
 endif
diff --git a/testenv/Test-no_proxy-env.py b/testenv/Test-no_proxy-env.py
new file mode 100755
index 00000000..a02c738f
--- /dev/null
+++ b/testenv/Test-no_proxy-env.py
@@ -0,0 +1,160 @@
+#!/usr/bin/env python3
+import socket
+from sys import exit
+from test.http_test import HTTPTest
+from test.base_test import HTTP
+from misc.wget_file import WgetFile
+
+"""
+    This test ensures, that domains with and without leftmost dot defined in
+    no_proxy environment variable are accepted by wget. The idea is to use
+    non-existing proxy server address and detect whether files are downloaded
+    when proxy settings are omitted based on no_proxy environment variable
+    value.
+
+    current wget's behavior:
+    - "no_proxy=.mit.edu"
+      - will match the domain and subdomains e.g. "www.mit.edu" or "www.subdomain.mit.edu" (Case #4)
+      - will NOT match the host "mit.edu" (Case #3)
+    - "no_proxy=mit.edu"
+      - will match the domain and subdomains e.g. "www.mit.edu" or "www.subdomain.mit.edu" (Case #2)
+      - will match the host "mit.edu" (Case #1)
+    - downside: can not match only the host
+"""
+
+# Check whether the system supports translating localhost subdomains
+# to localhost address and if not, skip it.
+hostnames_to_check = [
+    "working1.localhost",
+    "working2.localhost",
+    "www.working1.localhost",
+    "www.working2.localhost",
+]
+for hostname in hostnames_to_check:
+    try:
+        ip = socket.gethostbyname(hostname)
+    except socket.gaierror as _:
+        # resolution of the name failes
+        # return value 77 -> SKIP
+        exit(77)
+
+# File Definitions
+File1 = "Would you like some Tea?"
+File2 = "With lemon or cream?"
+
+A_File = WgetFile ("File1", File1)
+B_File = WgetFile ("File2", File2)
+
+WGET_URLS = [["File1", "File2"]]
+WGET_ENVS = {
+    "http_proxy": "nonexisting.localhost:8080",
+    "no_proxy": "working1.localhost,.working2.localhost"
+}
+
+Servers = [HTTP]
+Files = [[A_File, B_File]]
+
+ExpectedReturnCodeWorking = 0
+ExpectedReturnCodeNotWorking = 4  # network error (non-existing proxy address)
+
+ExpectedDownloadedFilesWorking = [A_File, B_File]
+
+# Pre and Post Test Hooks
+test_options = {
+    "Urls"                : WGET_URLS,
+    "EnvironmentVariables": WGET_ENVS
+}
+post_test_working = {
+    "ExpectedFiles"     : ExpectedDownloadedFilesWorking,
+    "ExpectedRetcode"   : ExpectedReturnCodeWorking
+}
+post_test_not_working = {
+    "ExpectedRetcode"   : ExpectedReturnCodeNotWorking
+}
+
+# Case #1:
+# - Requested domain matches exactly the domain definition in no_proxy.
+# - Domain definition in no_proxy is NOT dot-prefixed
+# Expected result: proxy settings don't apply and files are downloaded.
+pre_case_1 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["working1.localhost"]
+}
+
+err_case_1 = HTTPTest (
+    pre_hook=pre_case_1,
+    test_params=test_options,
+    post_hook=post_test_working,
+    protocols=Servers
+).begin ()
+
+# Case #2:
+# - Requested domain is sub-domain of a domain definition in no_proxy.
+# - Domain definition in no_proxy is NOT dot-prefixed
+# Expected result: proxy settings don't apply and files are downloaded.
+pre_case_2 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["www.working1.localhost"]
+}
+
+err_case_2 = HTTPTest (
+    pre_hook=pre_case_2,
+    test_params=test_options,
+    post_hook=post_test_working,
+    protocols=Servers
+).begin ()
+
+# Case #3:
+# - Requested domain matches exactly the domain definition in no_proxy,
+#   except for the leftmost dot (".") in no_proxy domain definition.
+# - Domain definition in no_proxy IS dot-prefixed
+# Expected result: proxy settings apply and files are downloaded. This is
+#                  due to the mismatch in leftmost dot.
+# NOTE: This is inconsistent with curl's behavior, but has less drawbacks.
+pre_case_3 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["working2.localhost"]
+}
+
+err_case_3 = HTTPTest (
+    pre_hook=pre_case_3,
+    test_params=test_options,
+    post_hook=post_test_not_working,
+    protocols=Servers
+).begin ()
+
+# Case #4:
+# - Requested domain is sub-domain of a domain definition in no_proxy.
+# - Domain definition in no_proxy IS dot-prefixed
+# Expected result: proxy settings don't apply and files are downloaded.
+pre_case_4 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["www.working2.localhost"]
+}
+
+err_case_4 = HTTPTest (
+    pre_hook=pre_case_4,
+    test_params=test_options,
+    post_hook=post_test_working,
+    protocols=Servers
+).begin ()
+
+# Case #5
+# - Requested domain does not match a domain definition in no_proxy.
+# - Requested domain is NOT sub-domain of a domain definition in no_proxy.
+# Expected result: proxy settings apply and files are NOT downloaded due to
+#                  network error when using proxy with non-existing URL.
+pre_case_5 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["www.example.localhost"]
+}
+
+err_case_5 = HTTPTest (
+    pre_hook=pre_case_5,
+    test_params=test_options,
+    post_hook=post_test_not_working,
+    protocols=Servers
+).begin ()
+
+# Combine error codes from all test cases
+exit (max(err_case_1, err_case_2, err_case_3, err_case_4, err_case_5))
-- 
2.25.4

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to