Re: [PATCH] no_proxy domain matching

Tomas Hozza Tue, 26 Nov 2019 07:02:56 -0800


On 20. 11. 2019 18:47, Tim Rühsen wrote:
> On 20.11.19 12:41, Tomas Hozza wrote:
>> On 7. 11. 2019 21:30, Tim Rühsen wrote:
>>> On 07.11.19 15:21, Tomas Hozza wrote:
>>>> Hi.
>>>>
>>>> In RHEL-8, we ship a wget version that suffers from bug fixed by [1]. The 
>>>> fix resolved issue with matching subdomains when no_proxy domain 
>>>> definition was prefixed with dot, e.q. "no_prefix=.mit.edu". As part of 
>>>> backporting the fix to RHEL, I wanted to create an upstream test for 
>>>> no_prefix functionality. However I found that there is still one corner 
>>>> case, which is not handled by the current upstream code and honestly I'm 
>>>> not sure what is the intended domain matching behavior in that case. Man 
>>>> page is also not very specific in this regard.
>>>>
>>>> The corner case is as follows:
>>>> - no_proxy=.mit.edu
>>>> - download URL is e.g. "http://mit.edu/file1";
>>>>
>>>> In this case the proxy settings are used, because domains don't match due 
>>>> to the leftmost dot in no_proxy domain definition. This is either intended 
>>>> or corner case that was not considered. One could argue, that if the 
>>>> no_proxy is set to ".mit.edu", then leftmost dot means that the proxy 
>>>> settings should not apply only to subdomains of "mit.edu", but proxy 
>>>> settings should still apply to "mit.edu" domain itself. From my point of 
>>>> view, after reading wget man page, I don't think that the leftmost dost in 
>>>> no_proxy definition has any special meaning.
>>>
>>> Hello Tomas,
>>>
>>> hard to decide how to handle this. I personally would like to see a
>>> match with curl's behavior (see https://github.com/curl/curl/issues/1208).
>>>
>>> Given the docs from GNU emacs, you are right. "no_proxy=.mit.edu" means
>>> "mit.edu and subdomains" are excluded from proxy settings.
>>> (see https://www.gnu.org/software/emacs/manual/html_node/url/Proxies.html)
>>>
>>> The caveat with emacs' behavior is that you cannot exclude just all
>>> subdomains of mit.edu without mit.edu itself. Effectively, that creates
>>> a corner case that can't be handled at all. (but if curl also does it
>>> that way, let's go for it).
>>>
>>> Maybe you can find out about the current no_proxy behavior of typical
>>> and wide-spread tools (regarding leftmost dot) !? Once we have that
>>> information, we can make a confident decision.
>>>
>>> Regards, Tim
>>
>> Hi Tim.
>>
>> It took me some time to go through the current situation and to be honest, 
>> it is kind of a mess. While each tool handles the no_proxy env a little bit 
>> differently, there are some similarities. Nevertheless I was not able to 
>> find any standard.
>>
>> curl's behavior:
>> - "no_proxy=.mit.edu"
>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - "no_proxy=mit.edu"
>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - downside: can not match only the host; can not match only the domain and 
>> subdomains
>>
>> current wget's behavior:
>> - "no_proxy=.mit.edu"
>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will NOT match the host "mit.edu"
>> - "no_proxy=mit.edu"
>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - downside: can not match only the host
>>
>> wget's behavior with proposed patch:
>> - "no_proxy=.mit.edu"
>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - "no_proxy=mit.edu"
>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - downside: can not match only the host; can not match only the domain and 
>> subdomains
>> - it would be consistent with curl's behavior
>>
>> emacs's behavior:
>> - "no_proxy=.mit.edu"
>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - "no_proxy=mit.edu"
>>   - will NOT match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - downside: can not match only subdomains
>>
>> python httplib2's behavior:
>> - "no_proxy=.mit.edu"
>>   - will match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - "no_proxy=mit.edu"
>>   - will NOT match the domain and subdomains e.g. "www.mit.edu" or 
>> "www.subdomain.mit.edu"
>>   - will match the host "mit.edu"
>> - downside: can not match only subdomains
>>
>> To sum it up. Each approach has some downsides. Given the change that I 
>> provided, wget's behavior would be consistent with curl's behavior. However 
>> it will have more downsides that it currently has, specifically it will 
>> loose the ability to not to match the host, but only domain and subdomains. 
>> Emacs's behavior is similar to Python's httplib2 behavior regarding the 
>> leftmost dot.
>>
>> Honestly I have a soft preference for keeping the current wget's behavior. 
>> But I admit that making the behavior consistent with curl's behavior makes 
>> sense. Please let me know how you would like to proceed.
>>
>> To make the behavior consistent with curl, the previously attached changes 
>> should be OK. If you find those new conditions too complicated, I can try to 
>> rethink it, but I already tried to make it as little complicated as possible 
>> and at the same time trying to not completely rewrite the function.
>>
>> If you'll decide to keep the current behavior, I'll modify the test that I 
>> added to cope with the behavior.
> 
> Great work, Tomas !
> 
> Wow, didn't think it's so messed up :-(
> 
> We should definitely document your results, e.g. in the wget manual.
> 
> If we keep the current behavior, we could adjust it with a new option or
> a new env variable 'WGET_NO_PROXY_MODE'. Which could take well-defined
> values like 'curl', 'emacs', 'wget' (the default), and maybe a new one
> ('strict') with none of the detected downsides.
> 
> Looks a bit over-engineered, but it means that wget can easily adopt to
> existing environments. And the code seems pretty straight forward.
> 
> Let's see if some more opinions come in.
> 
> Regards, Tim


Yes, 'WGET_NO_PROXY_MODE' is probably the safest option with regard to backward 
compatibility. And if needed, the default could later change. One downside of 
allowing values like 'curl' or 'emacs' is that these would probably require 
also handling of asterisk "*" in hostnames, as those tools do.

Nevertheless for now, I at least modified the new test to cover current wget 
behavior. There were also minor changes to the test framework in order to make 
the test possible. Patches are attached. Please let me know if they need any 
changes.

Thanks.

Regards,
Tomas
-- 
Tomas Hozza
Associate Manager, Software Engineering - EMEA ENG Core Services

PGP: 1D9F3C2D
UTC+1 (CET)
Red Hat Inc.                 http://cz.redhat.com

>From dea0f6272889adcff846144fff5714c076067b16 Mon Sep 17 00:00:00 2001
From: Tomas Hozza <[email protected]>
Date: Thu, 7 Nov 2019 12:46:15 +0100
Subject: [PATCH 1/3] testenv: HTTPTest.begin() should return exit value

* testenv/test/http_test.py: Ensure that HTTPTest.begin() always retuns a value

Previously the HTTPTest.begin() method always returned None. However this is not consistent with the begin() implementation of the parent class (BaseTest). This change ensures that HTTPTest.begin() returns a value.

Signed-off-by: Tomas Hozza <[email protected]>
---
 testenv/test/http_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/testenv/test/http_test.py b/testenv/test/http_test.py
index fef0c2ef..462ac6e7 100644
--- a/testenv/test/http_test.py
+++ b/testenv/test/http_test.py
@@ -42,7 +42,7 @@ class HTTPTest(BaseTest):
                 print_green("Test Passed.")
             else:
                 self.tests_passed = False
-            super(HTTPTest, self).begin()
+            return super(HTTPTest, self).begin()
 
     def instantiate_server_by(self, protocol):
         server = {HTTP: HTTPd,
-- 
2.21.0

>From 7fba12cf25ff7cc352f0f5df7d91670df7035823 Mon Sep 17 00:00:00 2001
From: Tomas Hozza <[email protected]>
Date: Thu, 7 Nov 2019 13:01:44 +0100
Subject: [PATCH 2/3] testenv: Allow definition of environment variables for
 wget execuion

* testenv/README: Added description for new EnvironmentVariable hook
* testenv/conf/environment_variable.py: Added implementation of EnvironmentVariable hook
* testenv/test/base_test.py: Modified exec_wget() to enable use of EnvironmentVariable hook

Added new test hook called EnvironmentVariables, for defining environment variables when wget is executed in tests. This is handy for testing environment variables, which are accepted by wget.

Signed-off-by: Tomas Hozza <[email protected]>
---
 testenv/README                        |  3 +++
 testenv/conf/environment_variables.py | 14 ++++++++++++++
 testenv/test/base_test.py             |  6 +++++-
 3 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 testenv/conf/environment_variables.py

diff --git a/testenv/README b/testenv/README
index aca8cdda..d4fabddd 100644
--- a/testenv/README
+++ b/testenv/README
@@ -224,6 +224,9 @@ executed. The currently supported options are:
     file. While all Download URL's are passed to Urls, a notable exception is
     when in-url authentication is used. In such a case, the URL is specified in
     the WgetCommands string.
+    * EnvironmentVariables: A dictionary with key-value items, which will be
+    defined as environment variables during the execution of wget command in
+    test.
 
 Post-Test Hooks:
 ================================================================================
diff --git a/testenv/conf/environment_variables.py b/testenv/conf/environment_variables.py
new file mode 100644
index 00000000..323c051c
--- /dev/null
+++ b/testenv/conf/environment_variables.py
@@ -0,0 +1,14 @@
+from conf import hook
+
+""" Test Option: EnvironmentVariables
+This hook is used to define environment variables used for execution of wget
+command in test."""
+
+
+@hook(alias='EnvironmentVariables')
+class URLs:
+    def __init__(self, envs):
+        self.envs = envs
+
+    def __call__(self, test_obj):
+        test_obj.envs.update(**self.envs)
diff --git a/testenv/test/base_test.py b/testenv/test/base_test.py
index dbf4678f..04a6f748 100644
--- a/testenv/test/base_test.py
+++ b/testenv/test/base_test.py
@@ -51,6 +51,7 @@ class BaseTest:
 
         self.wget_options = ''
         self.urls = []
+        self.envs = dict()
 
         self.tests_passed = True
         self.ready = False
@@ -97,12 +98,15 @@ class BaseTest:
         cmd_line = self.gen_cmd_line()
         params = shlex.split(cmd_line)
         print(params)
+        envs = {"HOME": os.getcwd()}
+        envs.update(**self.envs)
+        print(envs)
 
         if os.getenv("SERVER_WAIT"):
             time.sleep(float(os.getenv("SERVER_WAIT")))
 
         try:
-            ret_code = call(params, env={"HOME": os.getcwd()})
+            ret_code = call(params, env=envs)
         except FileNotFoundError:
             raise TestFailed("The Wget Executable does not exist at the "
                              "expected path.")
-- 
2.21.0

>From 0d50becc19ba07f34157b2842ca97675cc95fc1a Mon Sep 17 00:00:00 2001
From: Tomas Hozza <[email protected]>
Date: Thu, 7 Nov 2019 13:11:30 +0100
Subject: [PATCH 3/3] testenv: Add test for handling of no_proxy environment
 variable

* testenv/Test-no_proxy-env.py: Added new test for no_proxy env

Added new test with 5 cases, which are testing various combinations of no_proxy environment variable definition and requested URLs

Signed-off-by: Tomas Hozza <[email protected]>
---
 testenv/Test-no_proxy-env.py | 142 +++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100755 testenv/Test-no_proxy-env.py

diff --git a/testenv/Test-no_proxy-env.py b/testenv/Test-no_proxy-env.py
new file mode 100755
index 00000000..ea7f38c4
--- /dev/null
+++ b/testenv/Test-no_proxy-env.py
@@ -0,0 +1,142 @@
+#!/usr/bin/env python3
+from sys import exit
+from test.http_test import HTTPTest
+from test.base_test import HTTP
+from misc.wget_file import WgetFile
+
+"""
+    This test ensures, that domains with and without leftmost dot defined in
+    no_proxy environment variable are accepted by wget. The idea is to use
+    non-existing proxy server address and detect whether files are downloaded
+    when proxy settings are omitted based on no_proxy environment variable
+    value.
+
+    current wget's behavior:
+    - "no_proxy=.mit.edu"
+      - will match the domain and subdomains e.g. "www.mit.edu" or "www.subdomain.mit.edu" (Case #4)
+      - will NOT match the host "mit.edu" (Case #3)
+    - "no_proxy=mit.edu"
+      - will match the domain and subdomains e.g. "www.mit.edu" or "www.subdomain.mit.edu" (Case #2)
+      - will match the host "mit.edu" (Case #1)
+    - downside: can not match only the host
+"""
+# File Definitions
+File1 = "Would you like some Tea?"
+File2 = "With lemon or cream?"
+
+A_File = WgetFile ("File1", File1)
+B_File = WgetFile ("File2", File2)
+
+WGET_URLS = [["File1", "File2"]]
+WGET_ENVS = {
+    "http_proxy": "nonexisting.localhost:8080",
+    "no_proxy": "working1.localhost,.working2.localhost"
+}
+
+Servers = [HTTP]
+Files = [[A_File, B_File]]
+
+ExpectedReturnCodeWorking = 0
+ExpectedReturnCodeNotWorking = 4  # network error (non-existing proxy address)
+
+ExpectedDownloadedFilesWorking = [A_File, B_File]
+
+# Pre and Post Test Hooks
+test_options = {
+    "Urls"                : WGET_URLS,
+    "EnvironmentVariables": WGET_ENVS
+}
+post_test_working = {
+    "ExpectedFiles"     : ExpectedDownloadedFilesWorking,
+    "ExpectedRetcode"   : ExpectedReturnCodeWorking
+}
+post_test_not_working = {
+    "ExpectedRetcode"   : ExpectedReturnCodeNotWorking
+}
+
+# Case #1:
+# - Requested domain matches exactly the domain definition in no_proxy.
+# - Domain definition in no_proxy is NOT dot-prefixed
+# Expected result: proxy settings don't apply and files are downloaded.
+pre_case_1 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["working1.localhost"]
+}
+
+err_case_1 = HTTPTest (
+    pre_hook=pre_case_1,
+    test_params=test_options,
+    post_hook=post_test_working,
+    protocols=Servers
+).begin ()
+
+# Case #2:
+# - Requested domain is sub-domain of a domain definition in no_proxy.
+# - Domain definition in no_proxy is NOT dot-prefixed
+# Expected result: proxy settings don't apply and files are downloaded.
+pre_case_2 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["www.working1.localhost"]
+}
+
+err_case_2 = HTTPTest (
+    pre_hook=pre_case_2,
+    test_params=test_options,
+    post_hook=post_test_working,
+    protocols=Servers
+).begin ()
+
+# Case #3:
+# - Requested domain matches exactly the domain definition in no_proxy,
+#   except for the leftmost dot (".") in no_proxy domain definition.
+# - Domain definition in no_proxy IS dot-prefixed
+# Expected result: proxy settings apply and files are downloaded. This is
+#                  due to the mismatch in leftmost dot.
+# NOTE: This is inconsistent with curl's behavior, but has less drawbacks.
+pre_case_3 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["working2.localhost"]
+}
+
+err_case_3 = HTTPTest (
+    pre_hook=pre_case_3,
+    test_params=test_options,
+    post_hook=post_test_not_working,
+    protocols=Servers
+).begin ()
+
+# Case #4:
+# - Requested domain is sub-domain of a domain definition in no_proxy.
+# - Domain definition in no_proxy IS dot-prefixed
+# Expected result: proxy settings don't apply and files are downloaded.
+pre_case_4 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["www.working2.localhost"]
+}
+
+err_case_4 = HTTPTest (
+    pre_hook=pre_case_4,
+    test_params=test_options,
+    post_hook=post_test_working,
+    protocols=Servers
+).begin ()
+
+# Case #5
+# - Requested domain does not match a domain definition in no_proxy.
+# - Requested domain is NOT sub-domain of a domain definition in no_proxy.
+# Expected result: proxy settings apply and files are NOT downloaded due to
+#                  network error when using proxy with non-existing URL.
+pre_case_5 = {
+    "ServerFiles"       : Files,
+    "Domains"           : ["www.example.localhost"]
+}
+
+err_case_5 = HTTPTest (
+    pre_hook=pre_case_5,
+    test_params=test_options,
+    post_hook=post_test_not_working,
+    protocols=Servers
+).begin ()
+
+# Combine error codes from all test cases
+exit (max(err_case_1, err_case_2, err_case_3, err_case_4, err_case_5))
-- 
2.21.0

Re: [PATCH] no_proxy domain matching

Reply via email to