Script 'mail_helper' called by obssrc

Hello community,

here is the log from the commit of package python-Scrapy for openSUSE:Factory checked in at 2022-09-12 19:08:23

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-Scrapy (Old)
 and      /work/SRC/openSUSE:Factory/.python-Scrapy.new.2083 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-Scrapy"

Mon Sep 12 19:08:23 2022 rev:15 rq:1002736 version:2.6.2

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-Scrapy/python-Scrapy.changes 2022-03-06 18:48:44.204173661 +0100
+++ /work/SRC/openSUSE:Factory/.python-Scrapy.new.2083/python-Scrapy.changes 2022-09-12 19:08:24.510563363 +0200
@@ -1,0 +2,30 @@
+Fri Sep 9 15:21:20 UTC 2022 - Yogalakshmi Arunachalam <yarunacha...@suse.com>
+
+- Update to v2.6.2
+  Security bug fix:
+  * When HttpProxyMiddleware processes a request with proxy metadata, and that proxy metadata includes proxy credentials,
+    HttpProxyMiddleware sets the Proxy-Authorization header, but only if that header is not already set.
+  * There are third-party proxy-rotation downloader middlewares that set different proxy metadata every time they process a request.
+  * Because of request retries and redirects, the same request can be processed by downloader middlewares more than once,
+    including both HttpProxyMiddleware and any third-party proxy-rotation downloader middleware.
+  * These third-party proxy-rotation downloader middlewares could change the proxy metadata of a request to a new value,
+    but fail to remove the Proxy-Authorization header from the previous value of the proxy metadata, causing the credentials of one
+    proxy to be sent to a different proxy.
+  * To prevent the unintended leaking of proxy credentials, the behavior of HttpProxyMiddleware is now as follows when processing a request:
+    + If the request being processed defines proxy metadata that includes credentials, the Proxy-Authorization header is always updated
+      to feature those credentials.
+    + If the request being processed defines proxy metadata without credentials, the Proxy-Authorization header is removed unless
+      it was originally defined for the same proxy URL.
+    + To remove proxy credentials while keeping the same proxy URL, remove the Proxy-Authorization header.
+    + If the request has no proxy metadata, or that metadata is a falsy value (e.g. None), the Proxy-Authorization header is removed.
+    + It is no longer possible to set a proxy URL through the proxy metadata but set the credentials through the Proxy-Authorization header.
+      Set proxy credentials through the proxy metadata instead.
+  * Also fixes the following regressions introduced in 2.6.0:
+    + CrawlerProcess supports again crawling multiple spiders (issue 5435, issue 5436)
+    + Installing a Twisted reactor before Scrapy does (e.g. importing twisted.internet.reactor somewhere at the module level)
+      no longer prevents Scrapy from starting, as long as a different reactor is not specified in TWISTED_REACTOR (issue 5525, issue 5528)
+    + Fixed an exception that was being logged after the spider finished under certain conditions (issue 5437, issue 5440)
+    + The --output/-o command-line parameter supports again a value starting with a hyphen (issue 5444, issue 5445)
+    + The scrapy parse -h command no longer throws an error (issue 5481, issue 5482)
+
+-------------------------------------------------------------------

Old:
----
  Scrapy-2.6.1.tar.gz

New:
----
  Scrapy-2.6.2.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-Scrapy.spec ++++++
--- /var/tmp/diff_new_pack.1oQoGS/_old 2022-09-12 19:08:24.990564712 +0200
+++ /var/tmp/diff_new_pack.1oQoGS/_new 2022-09-12 19:08:24.998564735 +0200
@@ -19,7 +19,7 @@
 %{?!python_module:%define python_module() python3-%{**}}
 %define skip_python2 1
 Name:           python-Scrapy
-Version:        2.6.1
+Version:        2.6.2
 Release:        0
 Summary:        A high-level Python Screen Scraping framework
 License:        BSD-3-Clause

++++++ Scrapy-2.6.1.tar.gz -> Scrapy-2.6.2.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/PKG-INFO new/Scrapy-2.6.2/PKG-INFO
--- old/Scrapy-2.6.1/PKG-INFO 2022-03-01 14:07:43.042602300 +0100
+++ new/Scrapy-2.6.2/PKG-INFO 2022-07-25 13:54:31.427226500 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: Scrapy
-Version: 2.6.1
+Version: 2.6.2
 Summary: A high-level Web Crawling and Web Scraping framework
 Home-page: https://scrapy.org
 Author: Scrapy developers
@@ -10,7 +10,6 @@
 Project-URL: Documentation, https://docs.scrapy.org/
 Project-URL: Source, https://github.com/scrapy/scrapy
 Project-URL: Tracker, https://github.com/scrapy/scrapy/issues
-Platform: UNKNOWN
 Classifier: Framework :: Scrapy
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Environment :: Console
@@ -144,5 +143,3 @@
 ==================
 
 See https://scrapy.org/support/ for details.
-
-
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/Scrapy.egg-info/PKG-INFO new/Scrapy-2.6.2/Scrapy.egg-info/PKG-INFO
--- old/Scrapy-2.6.1/Scrapy.egg-info/PKG-INFO 2022-03-01 14:07:42.000000000 +0100
+++ new/Scrapy-2.6.2/Scrapy.egg-info/PKG-INFO 2022-07-25 13:54:31.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: Scrapy
-Version: 2.6.1
+Version: 2.6.2
 Summary: A high-level Web Crawling and Web Scraping framework
 Home-page: https://scrapy.org
 Author: Scrapy developers
@@ -10,7 +10,6 @@
 Project-URL: Documentation, https://docs.scrapy.org/
 Project-URL: Source, https://github.com/scrapy/scrapy
 Project-URL: Tracker, https://github.com/scrapy/scrapy/issues
-Platform: UNKNOWN
 Classifier: Framework :: Scrapy
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Environment :: Console
@@ -144,5 +143,3 @@
 ==================
 
 See https://scrapy.org/support/ for details.
-
-
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/Scrapy.egg-info/SOURCES.txt new/Scrapy-2.6.2/Scrapy.egg-info/SOURCES.txt
--- old/Scrapy-2.6.1/Scrapy.egg-info/SOURCES.txt 2022-03-01 14:07:42.000000000 +0100
+++ new/Scrapy-2.6.2/Scrapy.egg-info/SOURCES.txt 2022-07-25 13:54:31.000000000 +0200
@@ -393,6 +393,12 @@
 tests/CrawlerProcess/caching_hostname_resolver.py
 tests/CrawlerProcess/caching_hostname_resolver_ipv6.py
 tests/CrawlerProcess/default_name_resolver.py
+tests/CrawlerProcess/multi.py
+tests/CrawlerProcess/reactor_default.py
+tests/CrawlerProcess/reactor_default_twisted_reactor_select.py
+tests/CrawlerProcess/reactor_select.py
+tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py
+tests/CrawlerProcess/reactor_select_twisted_reactor_select.py
 tests/CrawlerProcess/simple.py
 tests/CrawlerProcess/twisted_reactor_asyncio.py
 tests/CrawlerProcess/twisted_reactor_custom_settings.py
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/docs/conf.py new/Scrapy-2.6.2/docs/conf.py
--- old/Scrapy-2.6.1/docs/conf.py 2022-03-01 14:07:30.000000000 +0100
+++ new/Scrapy-2.6.2/docs/conf.py 2022-07-25 13:54:18.000000000 +0200
@@ -291,10 +291,12 @@
     'pytest': ('https://docs.pytest.org/en/latest', None),
     'python': ('https://docs.python.org/3', None),
     'sphinx': ('https://www.sphinx-doc.org/en/master', None),
-    'tox': ('https://tox.readthedocs.io/en/latest', None),
-    'twisted': ('https://twistedmatrix.com/documents/current', None),
-    'twistedapi': ('https://twistedmatrix.com/documents/current/api', None),
+    'tox': ('https://tox.wiki/en/latest/', None),
+    'twisted': ('https://docs.twisted.org/en/stable/', None),
+    'twistedapi': ('https://docs.twisted.org/en/stable/api/', None),
+    'w3lib': ('https://w3lib.readthedocs.io/en/latest', None),
 }
+intersphinx_disabled_reftypes = []
 
 
 # Options for sphinx-hoverxref options
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/docs/news.rst new/Scrapy-2.6.2/docs/news.rst
--- old/Scrapy-2.6.1/docs/news.rst 2022-03-01 14:07:30.000000000 +0100
+++ new/Scrapy-2.6.2/docs/news.rst 2022-07-25 13:54:18.000000000 +0200
@@ -3,6 +3,78 @@
 Release notes
 =============
 
+.. _release-2.6.2:
+
+Scrapy 2.6.2 (2022-07-25)
+-------------------------
+
+**Security bug fix:**
+
+- When :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware`
+  processes a request with :reqmeta:`proxy` metadata, and that
+  :reqmeta:`proxy` metadata includes proxy credentials,
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` sets
+  the ``Proxy-Authorization`` header, but only if that header is not already
+  set.
+
+  There are third-party proxy-rotation downloader middlewares that set
+  different :reqmeta:`proxy` metadata every time they process a request.
+
+  Because of request retries and redirects, the same request can be processed
+  by downloader middlewares more than once, including both
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` and
+  any third-party proxy-rotation downloader middleware.
+
+  These third-party proxy-rotation downloader middlewares could change the
+  :reqmeta:`proxy` metadata of a request to a new value, but fail to remove
+  the ``Proxy-Authorization`` header from the previous value of the
+  :reqmeta:`proxy` metadata, causing the credentials of one proxy to be sent
+  to a different proxy.
+
+  To prevent the unintended leaking of proxy credentials, the behavior of
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` is now
+  as follows when processing a request:
+
+  - If the request being processed defines :reqmeta:`proxy` metadata that
+    includes credentials, the ``Proxy-Authorization`` header is always
+    updated to feature those credentials.
+
+  - If the request being processed defines :reqmeta:`proxy` metadata
+    without credentials, the ``Proxy-Authorization`` header is removed
+    *unless* it was originally defined for the same proxy URL.
+
+    To remove proxy credentials while keeping the same proxy URL, remove
+    the ``Proxy-Authorization`` header.
+
+  - If the request has no :reqmeta:`proxy` metadata, or that metadata is a
+    falsy value (e.g. ``None``), the ``Proxy-Authorization`` header is
+    removed.
+
+    It is no longer possible to set a proxy URL through the
+    :reqmeta:`proxy` metadata but set the credentials through the
+    ``Proxy-Authorization`` header. Set proxy credentials through the
+    :reqmeta:`proxy` metadata instead.
+
+Also fixes the following regressions introduced in 2.6.0:
+
+- :class:`~scrapy.crawler.CrawlerProcess` supports again crawling multiple
+  spiders (:issue:`5435`, :issue:`5436`)
+
+- Installing a Twisted reactor before Scrapy does (e.g. importing
+  :mod:`twisted.internet.reactor` somewhere at the module level) no longer
+  prevents Scrapy from starting, as long as a different reactor is not
+  specified in :setting:`TWISTED_REACTOR` (:issue:`5525`, :issue:`5528`)
+
+- Fixed an exception that was being logged after the spider finished under
+  certain conditions (:issue:`5437`, :issue:`5440`)
+
+- The ``--output``/``-o`` command-line parameter supports again a value
+  starting with a hyphen (:issue:`5444`, :issue:`5445`)
+
+- The ``scrapy parse -h`` command no longer throws an error (:issue:`5481`,
+  :issue:`5482`)
+
+
 .. _release-2.6.1:
 
 Scrapy 2.6.1 (2022-03-01)
@@ -113,6 +185,9 @@
   meet expectations, :exc:`TypeError` is now raised at startup time. Before,
   other exceptions would be raised at run time. (:issue:`3559`)
 
+- The ``_encoding`` field of serialized :class:`~scrapy.http.Request` objects
+  is now named ``encoding``, in line with all other fields (:issue:`5130`)
+
 Deprecation removals
 ~~~~~~~~~~~~~~~~~~~~
@@ -1897,6 +1972,59 @@
   (:issue:`3884`)
 
 
+.. _release-1.8.3:
+
+Scrapy 1.8.3 (2022-07-25)
+-------------------------
+
+**Security bug fix:**
+
+- When :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware`
+  processes a request with :reqmeta:`proxy` metadata, and that
+  :reqmeta:`proxy` metadata includes proxy credentials,
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` sets
+  the ``Proxy-Authorization`` header, but only if that header is not already
+  set.
+
+  There are third-party proxy-rotation downloader middlewares that set
+  different :reqmeta:`proxy` metadata every time they process a request.
+
+  Because of request retries and redirects, the same request can be processed
+  by downloader middlewares more than once, including both
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` and
+  any third-party proxy-rotation downloader middleware.
+
+  These third-party proxy-rotation downloader middlewares could change the
+  :reqmeta:`proxy` metadata of a request to a new value, but fail to remove
+  the ``Proxy-Authorization`` header from the previous value of the
+  :reqmeta:`proxy` metadata, causing the credentials of one proxy to be sent
+  to a different proxy.
+
+  To prevent the unintended leaking of proxy credentials, the behavior of
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` is now
+  as follows when processing a request:
+
+  - If the request being processed defines :reqmeta:`proxy` metadata that
+    includes credentials, the ``Proxy-Authorization`` header is always
+    updated to feature those credentials.
+
+  - If the request being processed defines :reqmeta:`proxy` metadata
+    without credentials, the ``Proxy-Authorization`` header is removed
+    *unless* it was originally defined for the same proxy URL.
+
+    To remove proxy credentials while keeping the same proxy URL, remove
+    the ``Proxy-Authorization`` header.
+
+  - If the request has no :reqmeta:`proxy` metadata, or that metadata is a
+    falsy value (e.g. ``None``), the ``Proxy-Authorization`` header is
+    removed.
+
+    It is no longer possible to set a proxy URL through the
+    :reqmeta:`proxy` metadata but set the credentials through the
+    ``Proxy-Authorization`` header. Set proxy credentials through the
+    :reqmeta:`proxy` metadata instead.
+
+
 .. _release-1.8.2:
 
 Scrapy 1.8.2 (2022-03-01)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/docs/requirements.txt new/Scrapy-2.6.2/docs/requirements.txt
--- old/Scrapy-2.6.1/docs/requirements.txt 2022-03-01 14:07:30.000000000 +0100
+++ new/Scrapy-2.6.2/docs/requirements.txt 2022-07-25 13:54:18.000000000 +0200
@@ -1,4 +1,4 @@
-Sphinx>=3.0
-sphinx-hoverxref>=0.2b1
-sphinx-notfound-page>=0.4
-sphinx-rtd-theme>=0.5.2
\ No newline at end of file
+sphinx==5.0.2
+sphinx-hoverxref==1.1.1
+sphinx-notfound-page==0.8
+sphinx-rtd-theme==1.0.0
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/docs/topics/settings.rst new/Scrapy-2.6.2/docs/topics/settings.rst
--- old/Scrapy-2.6.1/docs/topics/settings.rst 2022-03-01 14:07:30.000000000 +0100
+++ new/Scrapy-2.6.2/docs/topics/settings.rst 2022-07-25 13:54:18.000000000 +0200
@@ -1638,9 +1638,10 @@
 
 The default value of the :setting:`TWISTED_REACTOR` setting is ``None``, which
-means that Scrapy will install the default reactor defined by Twisted for the
-current platform. This is to maintain backward compatibility and avoid possible
-problems caused by using a non-default reactor.
+means that Scrapy will use the existing reactor if one is already installed, or
+install the default reactor defined by Twisted for the current platform. This
+is to maintain backward compatibility and avoid possible problems caused by
+using a non-default reactor.
 
 For additional information, see :doc:`core/howto/choosing-reactor`.
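The reactor behavior documented above is exercised by the new
tests/CrawlerProcess/reactor_default.py script added later in this diff. As a
minimal sketch of the pattern that works again in 2.6.2 (a no-op spider, with
TWISTED_REACTOR left at its default):

    # A reactor is installed before Scrapy starts, simply by importing it at
    # module level; with TWISTED_REACTOR left at its default of None, Scrapy
    # now reuses that reactor instead of refusing to start.
    import scrapy
    from scrapy.crawler import CrawlerProcess
    from twisted.internet import reactor  # noqa: F401  (installs the default reactor)


    class NoRequestsSpider(scrapy.Spider):
        name = 'no_request'

        def start_requests(self):
            return []  # no requests: the crawl starts and finishes immediately


    process = CrawlerProcess(settings={})  # TWISTED_REACTOR defaults to None
    process.crawl(NoRequestsSpider)
    process.start()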
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/VERSION new/Scrapy-2.6.2/scrapy/VERSION --- old/Scrapy-2.6.1/scrapy/VERSION 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/VERSION 2022-07-25 13:54:18.000000000 +0200 @@ -1 +1 @@ -2.6.1 +2.6.2 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/cmdline.py new/Scrapy-2.6.2/scrapy/cmdline.py --- old/Scrapy-2.6.1/scrapy/cmdline.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/cmdline.py 2022-07-25 13:54:18.000000000 +0200 @@ -14,6 +14,15 @@ from scrapy.utils.python import garbage_collect +class ScrapyArgumentParser(argparse.ArgumentParser): + def _parse_optional(self, arg_string): + # if starts with -: it means that is a parameter not a argument + if arg_string[:2] == '-:': + return None + + return super()._parse_optional(arg_string) + + def _iter_command_classes(module_name): # TODO: add `name` attribute to commands and and merge this function with # scrapy.utils.spider.iter_spider_classes @@ -131,10 +140,10 @@ sys.exit(2) cmd = cmds[cmdname] - parser = argparse.ArgumentParser(formatter_class=ScrapyHelpFormatter, - usage=f"scrapy {cmdname} {cmd.syntax()}", - conflict_handler='resolve', - description=cmd.long_desc()) + parser = ScrapyArgumentParser(formatter_class=ScrapyHelpFormatter, + usage=f"scrapy {cmdname} {cmd.syntax()}", + conflict_handler='resolve', + description=cmd.long_desc()) settings.setdict(cmd.default_settings, priority='command') cmd.settings = settings cmd.add_options(parser) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/commands/parse.py new/Scrapy-2.6.2/scrapy/commands/parse.py --- old/Scrapy-2.6.1/scrapy/commands/parse.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/commands/parse.py 2022-07-25 13:54:18.000000000 +0200 @@ -51,7 +51,7 @@ parser.add_argument("--cbkwargs", dest="cbkwargs", help="inject extra callback kwargs into the Request, it must be a valid raw json string") parser.add_argument("-d", "--depth", dest="depth", type=int, default=1, - help="maximum depth for parsing requests [default: %default]") + help="maximum depth for parsing requests [default: %(default)s]") parser.add_argument("-v", "--verbose", dest="verbose", action="store_true", help="print each depth level one by one") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/core/engine.py new/Scrapy-2.6.2/scrapy/core/engine.py --- old/Scrapy-2.6.1/scrapy/core/engine.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/core/engine.py 2022-07-25 13:54:18.000000000 +0200 @@ -136,7 +136,9 @@ self.paused = False def _next_request(self) -> None: - assert self.slot is not None # typing + if self.slot is None: + return + assert self.spider is not None # typing if self.paused: @@ -184,7 +186,8 @@ d.addErrback(lambda f: logger.info('Error while removing request from slot', exc_info=failure_to_exc_info(f), extra={'spider': self.spider})) - d.addBoth(lambda _: self.slot.nextcall.schedule()) + slot = self.slot + d.addBoth(lambda _: slot.nextcall.schedule()) d.addErrback(lambda f: logger.info('Error while scheduling new request', exc_info=failure_to_exc_info(f), extra={'spider': self.spider})) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/crawler.py new/Scrapy-2.6.2/scrapy/crawler.py --- 
old/Scrapy-2.6.1/scrapy/crawler.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/crawler.py 2022-07-25 13:54:18.000000000 +0200 @@ -78,8 +78,7 @@ if reactor_class: install_reactor(reactor_class, self.settings["ASYNCIO_EVENT_LOOP"]) else: - from twisted.internet import default - default.install() + from twisted.internet import reactor # noqa: F401 log_reactor_info() if reactor_class: verify_installed_reactor(reactor_class) @@ -290,6 +289,7 @@ super().__init__(settings) configure_logging(self.settings, install_root_handler) log_scrapy_info(self.settings) + self._initialized_reactor = False def _signal_shutdown(self, signum, _): from twisted.internet import reactor @@ -310,7 +310,9 @@ def _create_crawler(self, spidercls): if isinstance(spidercls, str): spidercls = self.spider_loader.load(spidercls) - return Crawler(spidercls, self.settings, init_reactor=True) + init_reactor = not self._initialized_reactor + self._initialized_reactor = True + return Crawler(spidercls, self.settings, init_reactor=init_reactor) def start(self, stop_after_crawl=True, install_signal_handlers=True): """ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/downloadermiddlewares/httpproxy.py new/Scrapy-2.6.2/scrapy/downloadermiddlewares/httpproxy.py --- old/Scrapy-2.6.1/scrapy/downloadermiddlewares/httpproxy.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/downloadermiddlewares/httpproxy.py 2022-07-25 13:54:18.000000000 +0200 @@ -45,31 +45,37 @@ return creds, proxy_url def process_request(self, request, spider): - # ignore if proxy is already set + creds, proxy_url = None, None if 'proxy' in request.meta: - if request.meta['proxy'] is None: - return - # extract credentials if present - creds, proxy_url = self._get_proxy(request.meta['proxy'], '') - request.meta['proxy'] = proxy_url - if creds and not request.headers.get('Proxy-Authorization'): - request.headers['Proxy-Authorization'] = b'Basic ' + creds - return - elif not self.proxies: - return - - parsed = urlparse_cached(request) - scheme = parsed.scheme + if request.meta['proxy'] is not None: + creds, proxy_url = self._get_proxy(request.meta['proxy'], '') + elif self.proxies: + parsed = urlparse_cached(request) + scheme = parsed.scheme + if ( + ( + # 'no_proxy' is only supported by http schemes + scheme not in ('http', 'https') + or not proxy_bypass(parsed.hostname) + ) + and scheme in self.proxies + ): + creds, proxy_url = self.proxies[scheme] - # 'no_proxy' is only supported by http schemes - if scheme in ('http', 'https') and proxy_bypass(parsed.hostname): - return + self._set_proxy_and_creds(request, proxy_url, creds) - if scheme in self.proxies: - self._set_proxy(request, scheme) - - def _set_proxy(self, request, scheme): - creds, proxy = self.proxies[scheme] - request.meta['proxy'] = proxy + def _set_proxy_and_creds(self, request, proxy_url, creds): + if proxy_url: + request.meta['proxy'] = proxy_url + elif request.meta.get('proxy') is not None: + request.meta['proxy'] = None if creds: - request.headers['Proxy-Authorization'] = b'Basic ' + creds + request.headers[b'Proxy-Authorization'] = b'Basic ' + creds + request.meta['_auth_proxy'] = proxy_url + elif '_auth_proxy' in request.meta: + if proxy_url != request.meta['_auth_proxy']: + if b'Proxy-Authorization' in request.headers: + del request.headers[b'Proxy-Authorization'] + del request.meta['_auth_proxy'] + elif b'Proxy-Authorization' in request.headers: + del request.headers[b'Proxy-Authorization'] diff -urN 
'--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/utils/reactor.py new/Scrapy-2.6.2/scrapy/utils/reactor.py --- old/Scrapy-2.6.1/scrapy/utils/reactor.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/utils/reactor.py 2022-07-25 13:54:18.000000000 +0200 @@ -83,7 +83,7 @@ path.""" from twisted.internet import reactor reactor_class = load_object(reactor_path) - if not isinstance(reactor, reactor_class): + if not reactor.__class__ == reactor_class: msg = ("The installed reactor " f"({reactor.__module__}.{reactor.__class__.__name__}) does not " f"match the requested one ({reactor_path})") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/multi.py new/Scrapy-2.6.2/tests/CrawlerProcess/multi.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/multi.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/multi.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,16 @@ +import scrapy +from scrapy.crawler import CrawlerProcess + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={}) + +process.crawl(NoRequestsSpider) +process.crawl(NoRequestsSpider) +process.start() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_default.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_default.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_default.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_default.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,17 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet import reactor + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={}) + +process.crawl(NoRequestsSpider) +process.start() + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_default_twisted_reactor_select.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_default_twisted_reactor_select.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_default_twisted_reactor_select.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_default_twisted_reactor_select.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,20 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet import reactor + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={ + "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor", +}) + +process.crawl(NoRequestsSpider) +process.start() + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,19 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet import selectreactor +selectreactor.install() + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def 
start_requests(self): + return [] + + +process = CrawlerProcess(settings={}) + +process.crawl(NoRequestsSpider) +process.start() + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,31 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet.main import installReactor +from twisted.internet.selectreactor import SelectReactor + + +class SelectReactorSubclass(SelectReactor): + pass + + +reactor = SelectReactorSubclass() +installReactor(reactor) + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={ + "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor", +}) + +process.crawl(NoRequestsSpider) +process.start() + + + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select_twisted_reactor_select.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select_twisted_reactor_select.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select_twisted_reactor_select.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select_twisted_reactor_select.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,22 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet import selectreactor +selectreactor.install() + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={ + "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor", +}) + +process.crawl(NoRequestsSpider) +process.start() + + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/test_commands.py new/Scrapy-2.6.2/tests/test_commands.py --- old/Scrapy-2.6.1/tests/test_commands.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tests/test_commands.py 2022-07-25 13:54:18.000000000 +0200 @@ -770,6 +770,21 @@ log = self.get_log(spider_code, args=args) self.assertIn("error: Please use only one of -o/--output and -O/--overwrite-output", log) + def test_output_stdout(self): + spider_code = """ +import scrapy + +class MySpider(scrapy.Spider): + name = 'myspider' + + def start_requests(self): + self.logger.debug('FEEDS: {}'.format(self.settings.getdict('FEEDS'))) + return [] +""" + args = ['-o', '-:json'] + log = self.get_log(spider_code, args=args) + self.assertIn("[myspider] DEBUG: FEEDS: {'stdout:': {'format': 'json'}}", log) + @skipIf(platform.system() != 'Windows', "Windows required for .pyw files") class WindowsRunSpiderCommandTest(RunSpiderCommandTest): @@ -915,3 +930,17 @@ args = ['-o', 'example1.json', '-O', 'example2.json'] log = self.get_log(spider_code, args=args) self.assertIn("error: Please use only one of -o/--output and -O/--overwrite-output", log) + + +class HelpMessageTest(CommandTest): + + def setUp(self): + super().setUp() + self.commands = ["parse", "startproject", "view", "crawl", "edit", + "list", "fetch", "settings", "shell", 
"runspider", + "version", "genspider", "check", "bench"] + + def test_help_messages(self): + for command in self.commands: + _, out, _ = self.proc(command, "-h") + self.assertIn("Usage", out) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/test_crawler.py new/Scrapy-2.6.2/tests/test_crawler.py --- old/Scrapy-2.6.1/tests/test_crawler.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tests/test_crawler.py 2022-07-25 13:54:18.000000000 +0200 @@ -302,6 +302,63 @@ self.assertIn('Spider closed (finished)', log) self.assertNotIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) + def test_multi(self): + log = self.run_script('multi.py') + self.assertIn('Spider closed (finished)', log) + self.assertNotIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) + self.assertNotIn("ReactorAlreadyInstalledError", log) + + def test_reactor_default(self): + log = self.run_script('reactor_default.py') + self.assertIn('Spider closed (finished)', log) + self.assertNotIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) + self.assertNotIn("ReactorAlreadyInstalledError", log) + + def test_reactor_default_twisted_reactor_select(self): + log = self.run_script('reactor_default_twisted_reactor_select.py') + if platform.system() == 'Windows': + # The goal of this test function is to test that, when a reactor is + # installed (the default one here) and a different reactor is + # configured (select here), an error raises. + # + # In Windows the default reactor is the select reactor, so that + # error does not raise. + # + # If that ever becomes the case on more platforms (i.e. if Linux + # also starts using the select reactor by default in a future + # version of Twisted), then we will need to rethink this test. 
+ self.assertIn('Spider closed (finished)', log) + else: + self.assertNotIn('Spider closed (finished)', log) + self.assertIn( + ( + "does not match the requested one " + "(twisted.internet.selectreactor.SelectReactor)" + ), + log, + ) + + def test_reactor_select(self): + log = self.run_script('reactor_select.py') + self.assertIn('Spider closed (finished)', log) + self.assertNotIn("ReactorAlreadyInstalledError", log) + + def test_reactor_select_twisted_reactor_select(self): + log = self.run_script('reactor_select_twisted_reactor_select.py') + self.assertIn('Spider closed (finished)', log) + self.assertNotIn("ReactorAlreadyInstalledError", log) + + def test_reactor_select_subclass_twisted_reactor_select(self): + log = self.run_script('reactor_select_subclass_twisted_reactor_select.py') + self.assertNotIn('Spider closed (finished)', log) + self.assertIn( + ( + "does not match the requested one " + "(twisted.internet.selectreactor.SelectReactor)" + ), + log, + ) + def test_asyncio_enabled_no_reactor(self): log = self.run_script('asyncio_enabled_no_reactor.py') self.assertIn('Spider closed (finished)', log) @@ -334,33 +391,33 @@ self.assertNotIn("TimeoutError", log) self.assertNotIn("twisted.internet.error.DNSLookupError", log) - def test_reactor_select(self): + def test_twisted_reactor_select(self): log = self.run_script("twisted_reactor_select.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.selectreactor.SelectReactor", log) @mark.skipif(platform.system() == 'Windows', reason="PollReactor is not supported on Windows") - def test_reactor_poll(self): + def test_twisted_reactor_poll(self): log = self.run_script("twisted_reactor_poll.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.pollreactor.PollReactor", log) - def test_reactor_asyncio(self): + def test_twisted_reactor_asyncio(self): log = self.run_script("twisted_reactor_asyncio.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) - def test_reactor_asyncio_custom_settings(self): + def test_twisted_reactor_asyncio_custom_settings(self): log = self.run_script("twisted_reactor_custom_settings.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) - def test_reactor_asyncio_custom_settings_same(self): + def test_twisted_reactor_asyncio_custom_settings_same(self): log = self.run_script("twisted_reactor_custom_settings_same.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) - def test_reactor_asyncio_custom_settings_conflict(self): + def test_twisted_reactor_asyncio_custom_settings_conflict(self): log = self.run_script("twisted_reactor_custom_settings_conflict.py") self.assertIn("Using reactor: twisted.internet.selectreactor.SelectReactor", log) self.assertIn("(twisted.internet.selectreactor.SelectReactor) does not match the requested one", log) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/test_downloadermiddleware_httpproxy.py new/Scrapy-2.6.2/tests/test_downloadermiddleware_httpproxy.py --- old/Scrapy-2.6.1/tests/test_downloadermiddleware_httpproxy.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tests/test_downloadermiddleware_httpproxy.py 2022-07-25 13:54:18.000000000 +0200 @@ -65,12 +65,12 @@ mw = 
HttpProxyMiddleware() req = Request('http://scrapytest.org') assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic dXNlcjpwYXNz') # proxy from request.meta req = Request('http://scrapytest.org', meta={'proxy': 'https://username:password@proxy:3128'}) assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic dXNlcm5hbWU6cGFzc3dvcmQ=') def test_proxy_auth_empty_passwd(self): @@ -78,12 +78,12 @@ mw = HttpProxyMiddleware() req = Request('http://scrapytest.org') assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic dXNlcjo=') # proxy from request.meta req = Request('http://scrapytest.org', meta={'proxy': 'https://username:@proxy:3128'}) assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic dXNlcm5hbWU6') def test_proxy_auth_encoding(self): @@ -92,26 +92,26 @@ mw = HttpProxyMiddleware(auth_encoding='utf-8') req = Request('http://scrapytest.org') assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic bcOhbjpwYXNz') # proxy from request.meta req = Request('http://scrapytest.org', meta={'proxy': 'https://\u00FCser:pass@proxy:3128'}) assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic w7xzZXI6cGFzcw==') # default latin-1 encoding mw = HttpProxyMiddleware(auth_encoding='latin-1') req = Request('http://scrapytest.org') assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic beFuOnBhc3M=') # proxy from request.meta, latin-1 encoding req = Request('http://scrapytest.org', meta={'proxy': 'https://\u00FCser:pass@proxy:3128'}) assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic /HNlcjpwYXNz') def test_proxy_already_seted(self): @@ -152,3 +152,300 @@ # '/var/run/docker.sock' may be used by the user for # no_proxy value but is not parseable and should be skipped assert 'no' not in mw.proxies + + def test_add_proxy_without_credentials(self): + middleware = HttpProxyMiddleware() + request = Request('https://example.com') + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertNotIn(b'Proxy-Authorization', 
request.headers) + + def test_add_proxy_with_credentials(self): + middleware = HttpProxyMiddleware() + request = Request('https://example.com') + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://user1:passwo...@example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_remove_proxy_without_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = None + assert middleware.process_request(request, spider) is None + self.assertIsNone(request.meta['proxy']) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_remove_proxy_with_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = None + assert middleware.process_request(request, spider) is None + self.assertIsNone(request.meta['proxy']) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_add_credentials(self): + """If the proxy request meta switches to a proxy URL with the same + proxy and adds credentials (there were no credentials before), the new + credentials must be used.""" + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://user1:passwo...@example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_change_credentials(self): + """If the proxy request meta switches to a proxy URL with different + credentials, those new credentials must be used.""" + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://user2:passwo...@example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials = middleware._basic_auth_header( + 'user2', + 'password2', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_remove_credentials(self): + """If the proxy request meta switches to a proxy URL with the same + proxy but no credentials, the original credentials must be still + used. + + To remove credentials while keeping the same proxy URL, users must + delete the Proxy-Authorization header. 
+ """ + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + request.meta['proxy'] = 'https://example.com' + del request.headers[b'Proxy-Authorization'] + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_change_proxy_add_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://user1:passwo...@example.org' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.org') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_change_proxy_keep_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://user1:passwo...@example.org' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.org') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + # Make sure, indirectly, that _auth_proxy is updated. 
+ request.meta['proxy'] = 'https://example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_change_proxy_change_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://user2:passwo...@example.org' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.org') + encoded_credentials = middleware._basic_auth_header( + 'user2', + 'password2', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_change_proxy_remove_credentials(self): + """If the proxy request meta switches to a proxy URL with a different + proxy and no credentials, no credentials must be used.""" + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://example.org' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta, {'proxy': 'https://example.org'}) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_change_proxy_remove_credentials_preremoved_header(self): + """Corner case of proxy switch with credentials removal where the + credentials have been removed beforehand. + + It ensures that our implementation does not assume that the credentials + header exists when trying to remove it. 
+ """ + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://example.org' + del request.headers[b'Proxy-Authorization'] + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta, {'proxy': 'https://example.org'}) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_proxy_authentication_header_undefined_proxy(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': 'Basic foo'}, + ) + assert middleware.process_request(request, spider) is None + self.assertNotIn('proxy', request.meta) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_proxy_authentication_header_disabled_proxy(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': 'Basic foo'}, + meta={'proxy': None}, + ) + assert middleware.process_request(request, spider) is None + self.assertIsNone(request.meta['proxy']) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_proxy_authentication_header_proxy_without_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': 'Basic foo'}, + meta={'proxy': 'https://example.com'}, + ) + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_proxy_authentication_header_proxy_with_same_credentials(self): + middleware = HttpProxyMiddleware() + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': b'Basic ' + encoded_credentials}, + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_proxy_authentication_header_proxy_with_different_credentials(self): + middleware = HttpProxyMiddleware() + encoded_credentials1 = middleware._basic_auth_header( + 'user1', + 'password1', + ) + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': b'Basic ' + encoded_credentials1}, + meta={'proxy': 'https://user2:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials2 = middleware._basic_auth_header( + 'user2', + 'password2', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials2, + ) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/test_engine.py new/Scrapy-2.6.2/tests/test_engine.py --- old/Scrapy-2.6.1/tests/test_engine.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tests/test_engine.py 2022-07-25 13:54:18.000000000 +0200 @@ -12,9 +12,11 @@ import os import re +import subprocess import sys import warnings from collections import defaultdict +from threading import Timer from urllib.parse import urlparse import attr @@ -502,6 +504,37 @@ self.assertEqual(warning_list[0].category, 
ScrapyDeprecationWarning) self.assertEqual(str(warning_list[0].message), "ExecutionEngine.has_capacity is deprecated") + def test_short_timeout(self): + args = ( + sys.executable, + '-m', + 'scrapy.cmdline', + 'fetch', + '-s', + 'CLOSESPIDER_TIMEOUT=0.001', + '-s', + 'LOG_LEVEL=DEBUG', + 'http://toscrape.com', + ) + p = subprocess.Popen( + args, + stderr=subprocess.PIPE, + ) + + def kill_proc(): + p.kill() + p.communicate() + assert False, 'Command took too much time to complete' + + timer = Timer(15, kill_proc) + try: + timer.start() + _, stderr = p.communicate() + finally: + timer.cancel() + + self.assertNotIn(b'Traceback', stderr) + if __name__ == "__main__": if len(sys.argv) > 1 and sys.argv[1] == 'runserver': diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tox.ini new/Scrapy-2.6.2/tox.ini --- old/Scrapy-2.6.1/tox.ini 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tox.ini 2022-07-25 13:54:18.000000000 +0200 @@ -12,10 +12,11 @@ -rtests/requirements.txt # mitmproxy does not support PyPy # mitmproxy does not support Windows when running Python < 3.7 - # Python 3.9+ requires https://github.com/mitmproxy/mitmproxy/commit/8e5e43de24c9bc93092b63efc67fbec029a9e7fe + # Python 3.9+ requires mitmproxy >= 5.3.0 # mitmproxy >= 5.3.0 requires h2 >= 4.0, Twisted 21.2 requires h2 < 4.0 #mitmproxy >= 5.3.0; python_version >= '3.9' and implementation_name != 'pypy' - mitmproxy >= 4.0.4; python_version >= '3.7' and python_version < '3.9' and implementation_name != 'pypy' + # The tests hang with mitmproxy 8.0.0: https://github.com/scrapy/scrapy/issues/5454 + mitmproxy >= 4.0.4, < 8; python_version >= '3.7' and python_version < '3.9' and implementation_name != 'pypy' mitmproxy >= 4.0.4, < 5; python_version >= '3.6' and python_version < '3.7' and platform_system != 'Windows' and implementation_name != 'pypy' # newer markupsafe is incompatible with deps of old mitmproxy (which we get on Python 3.7 and lower) markupsafe < 2.1.0; python_version >= '3.6' and python_version < '3.8' and implementation_name != 'pypy'
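To illustrate the HttpProxyMiddleware semantics introduced by the security fix
described in the release notes above, a minimal sketch distilled from the new
tests in tests/test_downloadermiddleware_httpproxy.py; the proxy hosts and
credentials are illustrative placeholders:

    from scrapy import Request, Spider
    from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware

    spider = Spider(name='example')
    middleware = HttpProxyMiddleware()

    # Credentials given in the proxy request meta are moved into the
    # Proxy-Authorization header, and the meta is rewritten without them.
    request = Request('https://example.com',
                      meta={'proxy': 'https://user1:password1@proxy1.example:3128'})
    middleware.process_request(request, spider)
    assert request.meta['proxy'] == 'https://proxy1.example:3128'
    assert request.headers['Proxy-Authorization'].startswith(b'Basic ')

    # When a proxy-rotation middleware switches to a different proxy without
    # credentials, the stale Proxy-Authorization header is now removed instead
    # of being leaked to the new proxy.
    request.meta['proxy'] = 'https://proxy2.example:3128'
    middleware.process_request(request, spider)
    assert b'Proxy-Authorization' not in request.headers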