Script 'mail_helper' called by obssrc

Hello community,

here is the log from the commit of package python-Scrapy for openSUSE:Factory checked in at 2022-09-12 19:08:23

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-Scrapy (Old)
 and      /work/SRC/openSUSE:Factory/.python-Scrapy.new.2083 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-Scrapy"

Mon Sep 12 19:08:23 2022 rev:15 rq:1002736 version:2.6.2

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-Scrapy/python-Scrapy.changes 2022-03-06 18:48:44.204173661 +0100
+++ /work/SRC/openSUSE:Factory/.python-Scrapy.new.2083/python-Scrapy.changes 2022-09-12 19:08:24.510563363 +0200
@@ -1,0 +2,30 @@
+Fri Sep 9 15:21:20 UTC 2022 - Yogalakshmi Arunachalam <yarunacha...@suse.com>
+
+- Update to v2.6.2
+  Security bug fix:
+  * When HttpProxyMiddleware processes a request with proxy metadata, and that proxy metadata includes proxy credentials,
+    HttpProxyMiddleware sets the Proxy-Authorization header, but only if that header is not already set.
+  * There are third-party proxy-rotation downloader middlewares that set different proxy metadata every time they process a request.
+  * Because of request retries and redirects, the same request can be processed by downloader middlewares more than once,
+    including both HttpProxyMiddleware and any third-party proxy-rotation downloader middleware.
+  * These third-party proxy-rotation downloader middlewares could change the proxy metadata of a request to a new value,
+    but fail to remove the Proxy-Authorization header from the previous value of the proxy metadata, causing the credentials of one
+    proxy to be sent to a different proxy.
+  * To prevent the unintended leaking of proxy credentials, the behavior of HttpProxyMiddleware is now as follows when processing a request:
+    + If the request being processed defines proxy metadata that includes credentials, the Proxy-Authorization header is always updated
+      to feature those credentials.
+    + If the request being processed defines proxy metadata without credentials, the Proxy-Authorization header is removed unless
+      it was originally defined for the same proxy URL.
+    + To remove proxy credentials while keeping the same proxy URL, remove the Proxy-Authorization header.
+    + If the request has no proxy metadata, or that metadata is a falsy value (e.g. None), the Proxy-Authorization header is removed.
+    + It is no longer possible to set a proxy URL through the proxy metadata but set the credentials through the Proxy-Authorization header.
+      Set proxy credentials through the proxy metadata instead.
+  * Also fixes the following regressions introduced in 2.6.0:
+    + CrawlerProcess supports again crawling multiple spiders (issue 5435, issue 5436)
+    + Installing a Twisted reactor before Scrapy does (e.g. importing twisted.internet.reactor somewhere at the module level)
+      no longer prevents Scrapy from starting, as long as a different reactor is not specified in TWISTED_REACTOR (issue 5525, issue 5528)
+    + Fixed an exception that was being logged after the spider finished under certain conditions (issue 5437, issue 5440)
+    + The --output/-o command-line parameter supports again a value starting with a hyphen (issue 5444, issue 5445)
+    + The scrapy parse -h command no longer throws an error (issue 5481, issue 5482)
+
+-------------------------------------------------------------------

Old:
----
  Scrapy-2.6.1.tar.gz

New:
----
  Scrapy-2.6.2.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-Scrapy.spec ++++++
--- /var/tmp/diff_new_pack.1oQoGS/_old 2022-09-12 19:08:24.990564712 +0200
+++ /var/tmp/diff_new_pack.1oQoGS/_new 2022-09-12 19:08:24.998564735 +0200
@@ -19,7 +19,7 @@
 %{?!python_module:%define python_module() python3-%{**}}
 %define skip_python2 1
 Name:           python-Scrapy
-Version:        2.6.1
+Version:        2.6.2
 Release:        0
 Summary:        A high-level Python Screen Scraping framework
 License:        BSD-3-Clause

++++++ Scrapy-2.6.1.tar.gz -> Scrapy-2.6.2.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/PKG-INFO new/Scrapy-2.6.2/PKG-INFO
--- old/Scrapy-2.6.1/PKG-INFO 2022-03-01 14:07:43.042602300 +0100
+++ new/Scrapy-2.6.2/PKG-INFO 2022-07-25 13:54:31.427226500 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: Scrapy
-Version: 2.6.1
+Version: 2.6.2
 Summary: A high-level Web Crawling and Web Scraping framework
 Home-page: https://scrapy.org
 Author: Scrapy developers
@@ -10,7 +10,6 @@
 Project-URL: Documentation, https://docs.scrapy.org/
 Project-URL: Source, https://github.com/scrapy/scrapy
 Project-URL: Tracker, https://github.com/scrapy/scrapy/issues
-Platform: UNKNOWN
 Classifier: Framework :: Scrapy
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Environment :: Console
@@ -144,5 +143,3 @@
 ==================
 
 See https://scrapy.org/support/ for details.
-
-
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/Scrapy.egg-info/PKG-INFO new/Scrapy-2.6.2/Scrapy.egg-info/PKG-INFO
--- old/Scrapy-2.6.1/Scrapy.egg-info/PKG-INFO 2022-03-01 14:07:42.000000000 +0100
+++ new/Scrapy-2.6.2/Scrapy.egg-info/PKG-INFO 2022-07-25 13:54:31.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: Scrapy
-Version: 2.6.1
+Version: 2.6.2
 Summary: A high-level Web Crawling and Web Scraping framework
 Home-page: https://scrapy.org
 Author: Scrapy developers
@@ -10,7 +10,6 @@
 Project-URL: Documentation, https://docs.scrapy.org/
 Project-URL: Source, https://github.com/scrapy/scrapy
 Project-URL: Tracker, https://github.com/scrapy/scrapy/issues
-Platform: UNKNOWN
 Classifier: Framework :: Scrapy
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Environment :: Console
@@ -144,5 +143,3 @@
 ==================
 
 See https://scrapy.org/support/ for details.
-
-
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/Scrapy.egg-info/SOURCES.txt new/Scrapy-2.6.2/Scrapy.egg-info/SOURCES.txt
--- old/Scrapy-2.6.1/Scrapy.egg-info/SOURCES.txt 2022-03-01 14:07:42.000000000 +0100
+++ new/Scrapy-2.6.2/Scrapy.egg-info/SOURCES.txt 2022-07-25 13:54:31.000000000 +0200
@@ -393,6 +393,12 @@
 tests/CrawlerProcess/caching_hostname_resolver.py
 tests/CrawlerProcess/caching_hostname_resolver_ipv6.py
 tests/CrawlerProcess/default_name_resolver.py
+tests/CrawlerProcess/multi.py
+tests/CrawlerProcess/reactor_default.py
+tests/CrawlerProcess/reactor_default_twisted_reactor_select.py
+tests/CrawlerProcess/reactor_select.py
+tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py
+tests/CrawlerProcess/reactor_select_twisted_reactor_select.py
 tests/CrawlerProcess/simple.py
 tests/CrawlerProcess/twisted_reactor_asyncio.py
 tests/CrawlerProcess/twisted_reactor_custom_settings.py
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/docs/conf.py new/Scrapy-2.6.2/docs/conf.py
--- old/Scrapy-2.6.1/docs/conf.py 2022-03-01 14:07:30.000000000 +0100
+++ new/Scrapy-2.6.2/docs/conf.py 2022-07-25 13:54:18.000000000 +0200
@@ -291,10 +291,12 @@
     'pytest': ('https://docs.pytest.org/en/latest', None),
     'python': ('https://docs.python.org/3', None),
     'sphinx': ('https://www.sphinx-doc.org/en/master', None),
-    'tox': ('https://tox.readthedocs.io/en/latest', None),
-    'twisted': ('https://twistedmatrix.com/documents/current', None),
-    'twistedapi': ('https://twistedmatrix.com/documents/current/api', None),
+    'tox': ('https://tox.wiki/en/latest/', None),
+    'twisted': ('https://docs.twisted.org/en/stable/', None),
+    'twistedapi': ('https://docs.twisted.org/en/stable/api/', None),
+    'w3lib': ('https://w3lib.readthedocs.io/en/latest', None),
 }
+intersphinx_disabled_reftypes = []
 
 
 # Options for sphinx-hoverxref options
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/docs/news.rst new/Scrapy-2.6.2/docs/news.rst
--- old/Scrapy-2.6.1/docs/news.rst 2022-03-01 14:07:30.000000000 +0100
+++ new/Scrapy-2.6.2/docs/news.rst 2022-07-25 13:54:18.000000000 +0200
@@ -3,6 +3,78 @@
 Release notes
 =============
 
+.. _release-2.6.2:
+
+Scrapy 2.6.2 (2022-07-25)
+-------------------------
+
+**Security bug fix:**
+
+- When :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware`
+  processes a request with :reqmeta:`proxy` metadata, and that
+  :reqmeta:`proxy` metadata includes proxy credentials,
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` sets
+  the ``Proxy-Authorization`` header, but only if that header is not already
+  set.
+
+  There are third-party proxy-rotation downloader middlewares that set
+  different :reqmeta:`proxy` metadata every time they process a request.
+
+  Because of request retries and redirects, the same request can be processed
+  by downloader middlewares more than once, including both
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` and
+  any third-party proxy-rotation downloader middleware.
+
+  These third-party proxy-rotation downloader middlewares could change the
+  :reqmeta:`proxy` metadata of a request to a new value, but fail to remove
+  the ``Proxy-Authorization`` header from the previous value of the
+  :reqmeta:`proxy` metadata, causing the credentials of one proxy to be sent
+  to a different proxy.
+
+  To prevent the unintended leaking of proxy credentials, the behavior of
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` is now
+  as follows when processing a request:
+
+  - If the request being processed defines :reqmeta:`proxy` metadata that
+    includes credentials, the ``Proxy-Authorization`` header is always
+    updated to feature those credentials.
+
+  - If the request being processed defines :reqmeta:`proxy` metadata
+    without credentials, the ``Proxy-Authorization`` header is removed
+    *unless* it was originally defined for the same proxy URL.
+
+    To remove proxy credentials while keeping the same proxy URL, remove
+    the ``Proxy-Authorization`` header.
+
+  - If the request has no :reqmeta:`proxy` metadata, or that metadata is a
+    falsy value (e.g. ``None``), the ``Proxy-Authorization`` header is
+    removed.
+
+    It is no longer possible to set a proxy URL through the
+    :reqmeta:`proxy` metadata but set the credentials through the
+    ``Proxy-Authorization`` header. Set proxy credentials through the
+    :reqmeta:`proxy` metadata instead.
+
+Also fixes the following regressions introduced in 2.6.0:
+
+- :class:`~scrapy.crawler.CrawlerProcess` supports again crawling multiple
+  spiders (:issue:`5435`, :issue:`5436`)
+
+- Installing a Twisted reactor before Scrapy does (e.g. importing
+  :mod:`twisted.internet.reactor` somewhere at the module level) no longer
+  prevents Scrapy from starting, as long as a different reactor is not
+  specified in :setting:`TWISTED_REACTOR` (:issue:`5525`, :issue:`5528`)
+
+- Fixed an exception that was being logged after the spider finished under
+  certain conditions (:issue:`5437`, :issue:`5440`)
+
+- The ``--output``/``-o`` command-line parameter supports again a value
+  starting with a hyphen (:issue:`5444`, :issue:`5445`)
+
+- The ``scrapy parse -h`` command no longer throws an error (:issue:`5481`,
+  :issue:`5482`)
+
+
 .. _release-2.6.1:
 
 Scrapy 2.6.1 (2022-03-01)
@@ -113,6 +185,9 @@
   meet expectations, :exc:`TypeError` is now raised at startup time. Before,
   other exceptions would be raised at run time. (:issue:`3559`)
 
+- The ``_encoding`` field of serialized :class:`~scrapy.http.Request` objects
+  is now named ``encoding``, in line with all other fields (:issue:`5130`)
+
 Deprecation removals
 ~~~~~~~~~~~~~~~~~~~~
@@ -1897,6 +1972,59 @@
   (:issue:`3884`)
 
 
+.. _release-1.8.3:
+
+Scrapy 1.8.3 (2022-07-25)
+-------------------------
+
+**Security bug fix:**
+
+- When :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware`
+  processes a request with :reqmeta:`proxy` metadata, and that
+  :reqmeta:`proxy` metadata includes proxy credentials,
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` sets
+  the ``Proxy-Authorization`` header, but only if that header is not already
+  set.
+
+  There are third-party proxy-rotation downloader middlewares that set
+  different :reqmeta:`proxy` metadata every time they process a request.
+
+  Because of request retries and redirects, the same request can be processed
+  by downloader middlewares more than once, including both
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` and
+  any third-party proxy-rotation downloader middleware.
+
+  These third-party proxy-rotation downloader middlewares could change the
+  :reqmeta:`proxy` metadata of a request to a new value, but fail to remove
+  the ``Proxy-Authorization`` header from the previous value of the
+  :reqmeta:`proxy` metadata, causing the credentials of one proxy to be sent
+  to a different proxy.
+
+  To prevent the unintended leaking of proxy credentials, the behavior of
+  :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` is now
+  as follows when processing a request:
+
+  - If the request being processed defines :reqmeta:`proxy` metadata that
+    includes credentials, the ``Proxy-Authorization`` header is always
+    updated to feature those credentials.
+
+  - If the request being processed defines :reqmeta:`proxy` metadata
+    without credentials, the ``Proxy-Authorization`` header is removed
+    *unless* it was originally defined for the same proxy URL.
+
+    To remove proxy credentials while keeping the same proxy URL, remove
+    the ``Proxy-Authorization`` header.
+
+  - If the request has no :reqmeta:`proxy` metadata, or that metadata is a
+    falsy value (e.g. ``None``), the ``Proxy-Authorization`` header is
+    removed.
+
+    It is no longer possible to set a proxy URL through the
+    :reqmeta:`proxy` metadata but set the credentials through the
+    ``Proxy-Authorization`` header. Set proxy credentials through the
+    :reqmeta:`proxy` metadata instead.
+
+
 .. _release-1.8.2:
 
 Scrapy 1.8.2 (2022-03-01)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/docs/requirements.txt new/Scrapy-2.6.2/docs/requirements.txt
--- old/Scrapy-2.6.1/docs/requirements.txt 2022-03-01 14:07:30.000000000 +0100
+++ new/Scrapy-2.6.2/docs/requirements.txt 2022-07-25 13:54:18.000000000 +0200
@@ -1,4 +1,4 @@
-Sphinx>=3.0
-sphinx-hoverxref>=0.2b1
-sphinx-notfound-page>=0.4
-sphinx-rtd-theme>=0.5.2
\ No newline at end of file
+sphinx==5.0.2
+sphinx-hoverxref==1.1.1
+sphinx-notfound-page==0.8
+sphinx-rtd-theme==1.0.0
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/docs/topics/settings.rst new/Scrapy-2.6.2/docs/topics/settings.rst
--- old/Scrapy-2.6.1/docs/topics/settings.rst 2022-03-01 14:07:30.000000000 +0100
+++ new/Scrapy-2.6.2/docs/topics/settings.rst 2022-07-25 13:54:18.000000000 +0200
@@ -1638,9 +1638,10 @@
 
 The default value of the :setting:`TWISTED_REACTOR` setting is ``None``, which
-means that Scrapy will install the default reactor defined by Twisted for the
-current platform. This is to maintain backward compatibility and avoid possible
-problems caused by using a non-default reactor.
+means that Scrapy will use the existing reactor if one is already installed, or
+install the default reactor defined by Twisted for the current platform. This
+is to maintain backward compatibility and avoid possible problems caused by
+using a non-default reactor.
 
 For additional information, see :doc:`core/howto/choosing-reactor`.
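The reactor behavior documented above is exercised by the new
tests/CrawlerProcess/reactor_default.py script added later in this diff. As a
minimal sketch of the pattern that works again in 2.6.2 (a no-op spider, with
TWISTED_REACTOR left at its default):

    # A reactor is installed before Scrapy starts, simply by importing it at
    # module level; with TWISTED_REACTOR left at its default of None, Scrapy
    # now reuses that reactor instead of refusing to start.
    import scrapy
    from scrapy.crawler import CrawlerProcess
    from twisted.internet import reactor  # noqa: F401  (installs the default reactor)


    class NoRequestsSpider(scrapy.Spider):
        name = 'no_request'

        def start_requests(self):
            return []  # no requests: the crawl starts and finishes immediately


    process = CrawlerProcess(settings={})  # TWISTED_REACTOR defaults to None
    process.crawl(NoRequestsSpider)
    process.start()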
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/VERSION new/Scrapy-2.6.2/scrapy/VERSION --- old/Scrapy-2.6.1/scrapy/VERSION 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/VERSION 2022-07-25 13:54:18.000000000 +0200 @@ -1 +1 @@ -2.6.1 +2.6.2 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/cmdline.py new/Scrapy-2.6.2/scrapy/cmdline.py --- old/Scrapy-2.6.1/scrapy/cmdline.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/cmdline.py 2022-07-25 13:54:18.000000000 +0200 @@ -14,6 +14,15 @@ from scrapy.utils.python import garbage_collect +class ScrapyArgumentParser(argparse.ArgumentParser): + def _parse_optional(self, arg_string): + # if starts with -: it means that is a parameter not a argument + if arg_string[:2] == '-:': + return None + + return super()._parse_optional(arg_string) + + def _iter_command_classes(module_name): # TODO: add `name` attribute to commands and and merge this function with # scrapy.utils.spider.iter_spider_classes @@ -131,10 +140,10 @@ sys.exit(2) cmd = cmds[cmdname] - parser = argparse.ArgumentParser(formatter_class=ScrapyHelpFormatter, - usage=f"scrapy {cmdname} {cmd.syntax()}", - conflict_handler='resolve', - description=cmd.long_desc()) + parser = ScrapyArgumentParser(formatter_class=ScrapyHelpFormatter, + usage=f"scrapy {cmdname} {cmd.syntax()}", + conflict_handler='resolve', + description=cmd.long_desc()) settings.setdict(cmd.default_settings, priority='command') cmd.settings = settings cmd.add_options(parser) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/commands/parse.py new/Scrapy-2.6.2/scrapy/commands/parse.py --- old/Scrapy-2.6.1/scrapy/commands/parse.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/commands/parse.py 2022-07-25 13:54:18.000000000 +0200 @@ -51,7 +51,7 @@ parser.add_argument("--cbkwargs", dest="cbkwargs", help="inject extra callback kwargs into the Request, it must be a valid raw json string") parser.add_argument("-d", "--depth", dest="depth", type=int, default=1, - help="maximum depth for parsing requests [default: %default]") + help="maximum depth for parsing requests [default: %(default)s]") parser.add_argument("-v", "--verbose", dest="verbose", action="store_true", help="print each depth level one by one") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/core/engine.py new/Scrapy-2.6.2/scrapy/core/engine.py --- old/Scrapy-2.6.1/scrapy/core/engine.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/core/engine.py 2022-07-25 13:54:18.000000000 +0200 @@ -136,7 +136,9 @@ self.paused = False def _next_request(self) -> None: - assert self.slot is not None # typing + if self.slot is None: + return + assert self.spider is not None # typing if self.paused: @@ -184,7 +186,8 @@ d.addErrback(lambda f: logger.info('Error while removing request from slot', exc_info=failure_to_exc_info(f), extra={'spider': self.spider})) - d.addBoth(lambda _: self.slot.nextcall.schedule()) + slot = self.slot + d.addBoth(lambda _: slot.nextcall.schedule()) d.addErrback(lambda f: logger.info('Error while scheduling new request', exc_info=failure_to_exc_info(f), extra={'spider': self.spider})) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/crawler.py new/Scrapy-2.6.2/scrapy/crawler.py --- 
old/Scrapy-2.6.1/scrapy/crawler.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/crawler.py 2022-07-25 13:54:18.000000000 +0200 @@ -78,8 +78,7 @@ if reactor_class: install_reactor(reactor_class, self.settings["ASYNCIO_EVENT_LOOP"]) else: - from twisted.internet import default - default.install() + from twisted.internet import reactor # noqa: F401 log_reactor_info() if reactor_class: verify_installed_reactor(reactor_class) @@ -290,6 +289,7 @@ super().__init__(settings) configure_logging(self.settings, install_root_handler) log_scrapy_info(self.settings) + self._initialized_reactor = False def _signal_shutdown(self, signum, _): from twisted.internet import reactor @@ -310,7 +310,9 @@ def _create_crawler(self, spidercls): if isinstance(spidercls, str): spidercls = self.spider_loader.load(spidercls) - return Crawler(spidercls, self.settings, init_reactor=True) + init_reactor = not self._initialized_reactor + self._initialized_reactor = True + return Crawler(spidercls, self.settings, init_reactor=init_reactor) def start(self, stop_after_crawl=True, install_signal_handlers=True): """ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/downloadermiddlewares/httpproxy.py new/Scrapy-2.6.2/scrapy/downloadermiddlewares/httpproxy.py --- old/Scrapy-2.6.1/scrapy/downloadermiddlewares/httpproxy.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/downloadermiddlewares/httpproxy.py 2022-07-25 13:54:18.000000000 +0200 @@ -45,31 +45,37 @@ return creds, proxy_url def process_request(self, request, spider): - # ignore if proxy is already set + creds, proxy_url = None, None if 'proxy' in request.meta: - if request.meta['proxy'] is None: - return - # extract credentials if present - creds, proxy_url = self._get_proxy(request.meta['proxy'], '') - request.meta['proxy'] = proxy_url - if creds and not request.headers.get('Proxy-Authorization'): - request.headers['Proxy-Authorization'] = b'Basic ' + creds - return - elif not self.proxies: - return - - parsed = urlparse_cached(request) - scheme = parsed.scheme + if request.meta['proxy'] is not None: + creds, proxy_url = self._get_proxy(request.meta['proxy'], '') + elif self.proxies: + parsed = urlparse_cached(request) + scheme = parsed.scheme + if ( + ( + # 'no_proxy' is only supported by http schemes + scheme not in ('http', 'https') + or not proxy_bypass(parsed.hostname) + ) + and scheme in self.proxies + ): + creds, proxy_url = self.proxies[scheme] - # 'no_proxy' is only supported by http schemes - if scheme in ('http', 'https') and proxy_bypass(parsed.hostname): - return + self._set_proxy_and_creds(request, proxy_url, creds) - if scheme in self.proxies: - self._set_proxy(request, scheme) - - def _set_proxy(self, request, scheme): - creds, proxy = self.proxies[scheme] - request.meta['proxy'] = proxy + def _set_proxy_and_creds(self, request, proxy_url, creds): + if proxy_url: + request.meta['proxy'] = proxy_url + elif request.meta.get('proxy') is not None: + request.meta['proxy'] = None if creds: - request.headers['Proxy-Authorization'] = b'Basic ' + creds + request.headers[b'Proxy-Authorization'] = b'Basic ' + creds + request.meta['_auth_proxy'] = proxy_url + elif '_auth_proxy' in request.meta: + if proxy_url != request.meta['_auth_proxy']: + if b'Proxy-Authorization' in request.headers: + del request.headers[b'Proxy-Authorization'] + del request.meta['_auth_proxy'] + elif b'Proxy-Authorization' in request.headers: + del request.headers[b'Proxy-Authorization'] diff -urN 
'--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/scrapy/utils/reactor.py new/Scrapy-2.6.2/scrapy/utils/reactor.py --- old/Scrapy-2.6.1/scrapy/utils/reactor.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/scrapy/utils/reactor.py 2022-07-25 13:54:18.000000000 +0200 @@ -83,7 +83,7 @@ path.""" from twisted.internet import reactor reactor_class = load_object(reactor_path) - if not isinstance(reactor, reactor_class): + if not reactor.__class__ == reactor_class: msg = ("The installed reactor " f"({reactor.__module__}.{reactor.__class__.__name__}) does not " f"match the requested one ({reactor_path})") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/multi.py new/Scrapy-2.6.2/tests/CrawlerProcess/multi.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/multi.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/multi.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,16 @@ +import scrapy +from scrapy.crawler import CrawlerProcess + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={}) + +process.crawl(NoRequestsSpider) +process.crawl(NoRequestsSpider) +process.start() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_default.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_default.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_default.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_default.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,17 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet import reactor + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={}) + +process.crawl(NoRequestsSpider) +process.start() + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_default_twisted_reactor_select.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_default_twisted_reactor_select.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_default_twisted_reactor_select.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_default_twisted_reactor_select.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,20 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet import reactor + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={ + "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor", +}) + +process.crawl(NoRequestsSpider) +process.start() + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,19 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet import selectreactor +selectreactor.install() + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def 
start_requests(self): + return [] + + +process = CrawlerProcess(settings={}) + +process.crawl(NoRequestsSpider) +process.start() + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select_subclass_twisted_reactor_select.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,31 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet.main import installReactor +from twisted.internet.selectreactor import SelectReactor + + +class SelectReactorSubclass(SelectReactor): + pass + + +reactor = SelectReactorSubclass() +installReactor(reactor) + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={ + "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor", +}) + +process.crawl(NoRequestsSpider) +process.start() + + + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select_twisted_reactor_select.py new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select_twisted_reactor_select.py --- old/Scrapy-2.6.1/tests/CrawlerProcess/reactor_select_twisted_reactor_select.py 1970-01-01 01:00:00.000000000 +0100 +++ new/Scrapy-2.6.2/tests/CrawlerProcess/reactor_select_twisted_reactor_select.py 2022-07-25 13:54:18.000000000 +0200 @@ -0,0 +1,22 @@ +import scrapy +from scrapy.crawler import CrawlerProcess +from twisted.internet import selectreactor +selectreactor.install() + + +class NoRequestsSpider(scrapy.Spider): + name = 'no_request' + + def start_requests(self): + return [] + + +process = CrawlerProcess(settings={ + "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor", +}) + +process.crawl(NoRequestsSpider) +process.start() + + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/test_commands.py new/Scrapy-2.6.2/tests/test_commands.py --- old/Scrapy-2.6.1/tests/test_commands.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tests/test_commands.py 2022-07-25 13:54:18.000000000 +0200 @@ -770,6 +770,21 @@ log = self.get_log(spider_code, args=args) self.assertIn("error: Please use only one of -o/--output and -O/--overwrite-output", log) + def test_output_stdout(self): + spider_code = """ +import scrapy + +class MySpider(scrapy.Spider): + name = 'myspider' + + def start_requests(self): + self.logger.debug('FEEDS: {}'.format(self.settings.getdict('FEEDS'))) + return [] +""" + args = ['-o', '-:json'] + log = self.get_log(spider_code, args=args) + self.assertIn("[myspider] DEBUG: FEEDS: {'stdout:': {'format': 'json'}}", log) + @skipIf(platform.system() != 'Windows', "Windows required for .pyw files") class WindowsRunSpiderCommandTest(RunSpiderCommandTest): @@ -915,3 +930,17 @@ args = ['-o', 'example1.json', '-O', 'example2.json'] log = self.get_log(spider_code, args=args) self.assertIn("error: Please use only one of -o/--output and -O/--overwrite-output", log) + + +class HelpMessageTest(CommandTest): + + def setUp(self): + super().setUp() + self.commands = ["parse", "startproject", "view", "crawl", "edit", + "list", "fetch", "settings", "shell", 
"runspider", + "version", "genspider", "check", "bench"] + + def test_help_messages(self): + for command in self.commands: + _, out, _ = self.proc(command, "-h") + self.assertIn("Usage", out) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/test_crawler.py new/Scrapy-2.6.2/tests/test_crawler.py --- old/Scrapy-2.6.1/tests/test_crawler.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tests/test_crawler.py 2022-07-25 13:54:18.000000000 +0200 @@ -302,6 +302,63 @@ self.assertIn('Spider closed (finished)', log) self.assertNotIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) + def test_multi(self): + log = self.run_script('multi.py') + self.assertIn('Spider closed (finished)', log) + self.assertNotIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) + self.assertNotIn("ReactorAlreadyInstalledError", log) + + def test_reactor_default(self): + log = self.run_script('reactor_default.py') + self.assertIn('Spider closed (finished)', log) + self.assertNotIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) + self.assertNotIn("ReactorAlreadyInstalledError", log) + + def test_reactor_default_twisted_reactor_select(self): + log = self.run_script('reactor_default_twisted_reactor_select.py') + if platform.system() == 'Windows': + # The goal of this test function is to test that, when a reactor is + # installed (the default one here) and a different reactor is + # configured (select here), an error raises. + # + # In Windows the default reactor is the select reactor, so that + # error does not raise. + # + # If that ever becomes the case on more platforms (i.e. if Linux + # also starts using the select reactor by default in a future + # version of Twisted), then we will need to rethink this test. 
+ self.assertIn('Spider closed (finished)', log) + else: + self.assertNotIn('Spider closed (finished)', log) + self.assertIn( + ( + "does not match the requested one " + "(twisted.internet.selectreactor.SelectReactor)" + ), + log, + ) + + def test_reactor_select(self): + log = self.run_script('reactor_select.py') + self.assertIn('Spider closed (finished)', log) + self.assertNotIn("ReactorAlreadyInstalledError", log) + + def test_reactor_select_twisted_reactor_select(self): + log = self.run_script('reactor_select_twisted_reactor_select.py') + self.assertIn('Spider closed (finished)', log) + self.assertNotIn("ReactorAlreadyInstalledError", log) + + def test_reactor_select_subclass_twisted_reactor_select(self): + log = self.run_script('reactor_select_subclass_twisted_reactor_select.py') + self.assertNotIn('Spider closed (finished)', log) + self.assertIn( + ( + "does not match the requested one " + "(twisted.internet.selectreactor.SelectReactor)" + ), + log, + ) + def test_asyncio_enabled_no_reactor(self): log = self.run_script('asyncio_enabled_no_reactor.py') self.assertIn('Spider closed (finished)', log) @@ -334,33 +391,33 @@ self.assertNotIn("TimeoutError", log) self.assertNotIn("twisted.internet.error.DNSLookupError", log) - def test_reactor_select(self): + def test_twisted_reactor_select(self): log = self.run_script("twisted_reactor_select.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.selectreactor.SelectReactor", log) @mark.skipif(platform.system() == 'Windows', reason="PollReactor is not supported on Windows") - def test_reactor_poll(self): + def test_twisted_reactor_poll(self): log = self.run_script("twisted_reactor_poll.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.pollreactor.PollReactor", log) - def test_reactor_asyncio(self): + def test_twisted_reactor_asyncio(self): log = self.run_script("twisted_reactor_asyncio.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) - def test_reactor_asyncio_custom_settings(self): + def test_twisted_reactor_asyncio_custom_settings(self): log = self.run_script("twisted_reactor_custom_settings.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) - def test_reactor_asyncio_custom_settings_same(self): + def test_twisted_reactor_asyncio_custom_settings_same(self): log = self.run_script("twisted_reactor_custom_settings_same.py") self.assertIn("Spider closed (finished)", log) self.assertIn("Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor", log) - def test_reactor_asyncio_custom_settings_conflict(self): + def test_twisted_reactor_asyncio_custom_settings_conflict(self): log = self.run_script("twisted_reactor_custom_settings_conflict.py") self.assertIn("Using reactor: twisted.internet.selectreactor.SelectReactor", log) self.assertIn("(twisted.internet.selectreactor.SelectReactor) does not match the requested one", log) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/test_downloadermiddleware_httpproxy.py new/Scrapy-2.6.2/tests/test_downloadermiddleware_httpproxy.py --- old/Scrapy-2.6.1/tests/test_downloadermiddleware_httpproxy.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tests/test_downloadermiddleware_httpproxy.py 2022-07-25 13:54:18.000000000 +0200 @@ -65,12 +65,12 @@ mw = 
HttpProxyMiddleware() req = Request('http://scrapytest.org') assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic dXNlcjpwYXNz') # proxy from request.meta req = Request('http://scrapytest.org', meta={'proxy': 'https://username:password@proxy:3128'}) assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic dXNlcm5hbWU6cGFzc3dvcmQ=') def test_proxy_auth_empty_passwd(self): @@ -78,12 +78,12 @@ mw = HttpProxyMiddleware() req = Request('http://scrapytest.org') assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic dXNlcjo=') # proxy from request.meta req = Request('http://scrapytest.org', meta={'proxy': 'https://username:@proxy:3128'}) assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic dXNlcm5hbWU6') def test_proxy_auth_encoding(self): @@ -92,26 +92,26 @@ mw = HttpProxyMiddleware(auth_encoding='utf-8') req = Request('http://scrapytest.org') assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic bcOhbjpwYXNz') # proxy from request.meta req = Request('http://scrapytest.org', meta={'proxy': 'https://\u00FCser:pass@proxy:3128'}) assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic w7xzZXI6cGFzcw==') # default latin-1 encoding mw = HttpProxyMiddleware(auth_encoding='latin-1') req = Request('http://scrapytest.org') assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic beFuOnBhc3M=') # proxy from request.meta, latin-1 encoding req = Request('http://scrapytest.org', meta={'proxy': 'https://\u00FCser:pass@proxy:3128'}) assert mw.process_request(req, spider) is None - self.assertEqual(req.meta, {'proxy': 'https://proxy:3128'}) + self.assertEqual(req.meta['proxy'], 'https://proxy:3128') self.assertEqual(req.headers.get('Proxy-Authorization'), b'Basic /HNlcjpwYXNz') def test_proxy_already_seted(self): @@ -152,3 +152,300 @@ # '/var/run/docker.sock' may be used by the user for # no_proxy value but is not parseable and should be skipped assert 'no' not in mw.proxies + + def test_add_proxy_without_credentials(self): + middleware = HttpProxyMiddleware() + request = Request('https://example.com') + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertNotIn(b'Proxy-Authorization', 
request.headers) + + def test_add_proxy_with_credentials(self): + middleware = HttpProxyMiddleware() + request = Request('https://example.com') + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://user1:passwo...@example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_remove_proxy_without_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = None + assert middleware.process_request(request, spider) is None + self.assertIsNone(request.meta['proxy']) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_remove_proxy_with_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = None + assert middleware.process_request(request, spider) is None + self.assertIsNone(request.meta['proxy']) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_add_credentials(self): + """If the proxy request meta switches to a proxy URL with the same + proxy and adds credentials (there were no credentials before), the new + credentials must be used.""" + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://user1:passwo...@example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_change_credentials(self): + """If the proxy request meta switches to a proxy URL with different + credentials, those new credentials must be used.""" + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://user2:passwo...@example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials = middleware._basic_auth_header( + 'user2', + 'password2', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_remove_credentials(self): + """If the proxy request meta switches to a proxy URL with the same + proxy but no credentials, the original credentials must be still + used. + + To remove credentials while keeping the same proxy URL, users must + delete the Proxy-Authorization header. 
+ """ + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + request.meta['proxy'] = 'https://example.com' + del request.headers[b'Proxy-Authorization'] + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_change_proxy_add_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://user1:passwo...@example.org' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.org') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_change_proxy_keep_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://user1:passwo...@example.org' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.org') + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + # Make sure, indirectly, that _auth_proxy is updated. 
+ request.meta['proxy'] = 'https://example.com' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_change_proxy_change_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + + request.meta['proxy'] = 'https://user2:passwo...@example.org' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.org') + encoded_credentials = middleware._basic_auth_header( + 'user2', + 'password2', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_change_proxy_remove_credentials(self): + """If the proxy request meta switches to a proxy URL with a different + proxy and no credentials, no credentials must be used.""" + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://example.org' + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta, {'proxy': 'https://example.org'}) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_change_proxy_remove_credentials_preremoved_header(self): + """Corner case of proxy switch with credentials removal where the + credentials have been removed beforehand. + + It ensures that our implementation does not assume that the credentials + header exists when trying to remove it. 
+ """ + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + request.meta['proxy'] = 'https://example.org' + del request.headers[b'Proxy-Authorization'] + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta, {'proxy': 'https://example.org'}) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_proxy_authentication_header_undefined_proxy(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': 'Basic foo'}, + ) + assert middleware.process_request(request, spider) is None + self.assertNotIn('proxy', request.meta) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_proxy_authentication_header_disabled_proxy(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': 'Basic foo'}, + meta={'proxy': None}, + ) + assert middleware.process_request(request, spider) is None + self.assertIsNone(request.meta['proxy']) + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_proxy_authentication_header_proxy_without_credentials(self): + middleware = HttpProxyMiddleware() + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': 'Basic foo'}, + meta={'proxy': 'https://example.com'}, + ) + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertNotIn(b'Proxy-Authorization', request.headers) + + def test_proxy_authentication_header_proxy_with_same_credentials(self): + middleware = HttpProxyMiddleware() + encoded_credentials = middleware._basic_auth_header( + 'user1', + 'password1', + ) + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': b'Basic ' + encoded_credentials}, + meta={'proxy': 'https://user1:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials, + ) + + def test_proxy_authentication_header_proxy_with_different_credentials(self): + middleware = HttpProxyMiddleware() + encoded_credentials1 = middleware._basic_auth_header( + 'user1', + 'password1', + ) + request = Request( + 'https://example.com', + headers={'Proxy-Authorization': b'Basic ' + encoded_credentials1}, + meta={'proxy': 'https://user2:passwo...@example.com'}, + ) + assert middleware.process_request(request, spider) is None + self.assertEqual(request.meta['proxy'], 'https://example.com') + encoded_credentials2 = middleware._basic_auth_header( + 'user2', + 'password2', + ) + self.assertEqual( + request.headers['Proxy-Authorization'], + b'Basic ' + encoded_credentials2, + ) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tests/test_engine.py new/Scrapy-2.6.2/tests/test_engine.py --- old/Scrapy-2.6.1/tests/test_engine.py 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tests/test_engine.py 2022-07-25 13:54:18.000000000 +0200 @@ -12,9 +12,11 @@ import os import re +import subprocess import sys import warnings from collections import defaultdict +from threading import Timer from urllib.parse import urlparse import attr @@ -502,6 +504,37 @@ self.assertEqual(warning_list[0].category, 
ScrapyDeprecationWarning) self.assertEqual(str(warning_list[0].message), "ExecutionEngine.has_capacity is deprecated") + def test_short_timeout(self): + args = ( + sys.executable, + '-m', + 'scrapy.cmdline', + 'fetch', + '-s', + 'CLOSESPIDER_TIMEOUT=0.001', + '-s', + 'LOG_LEVEL=DEBUG', + 'http://toscrape.com', + ) + p = subprocess.Popen( + args, + stderr=subprocess.PIPE, + ) + + def kill_proc(): + p.kill() + p.communicate() + assert False, 'Command took too much time to complete' + + timer = Timer(15, kill_proc) + try: + timer.start() + _, stderr = p.communicate() + finally: + timer.cancel() + + self.assertNotIn(b'Traceback', stderr) + if __name__ == "__main__": if len(sys.argv) > 1 and sys.argv[1] == 'runserver': diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.6.1/tox.ini new/Scrapy-2.6.2/tox.ini --- old/Scrapy-2.6.1/tox.ini 2022-03-01 14:07:30.000000000 +0100 +++ new/Scrapy-2.6.2/tox.ini 2022-07-25 13:54:18.000000000 +0200 @@ -12,10 +12,11 @@ -rtests/requirements.txt # mitmproxy does not support PyPy # mitmproxy does not support Windows when running Python < 3.7 - # Python 3.9+ requires https://github.com/mitmproxy/mitmproxy/commit/8e5e43de24c9bc93092b63efc67fbec029a9e7fe + # Python 3.9+ requires mitmproxy >= 5.3.0 # mitmproxy >= 5.3.0 requires h2 >= 4.0, Twisted 21.2 requires h2 < 4.0 #mitmproxy >= 5.3.0; python_version >= '3.9' and implementation_name != 'pypy' - mitmproxy >= 4.0.4; python_version >= '3.7' and python_version < '3.9' and implementation_name != 'pypy' + # The tests hang with mitmproxy 8.0.0: https://github.com/scrapy/scrapy/issues/5454 + mitmproxy >= 4.0.4, < 8; python_version >= '3.7' and python_version < '3.9' and implementation_name != 'pypy' mitmproxy >= 4.0.4, < 5; python_version >= '3.6' and python_version < '3.7' and platform_system != 'Windows' and implementation_name != 'pypy' # newer markupsafe is incompatible with deps of old mitmproxy (which we get on Python 3.7 and lower) markupsafe < 2.1.0; python_version >= '3.6' and python_version < '3.8' and implementation_name != 'pypy'
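To illustrate the HttpProxyMiddleware semantics introduced by the security fix
described in the release notes above, a minimal sketch distilled from the new
tests in tests/test_downloadermiddleware_httpproxy.py; the proxy hosts and
credentials are illustrative placeholders:

    from scrapy import Request, Spider
    from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware

    spider = Spider(name='example')
    middleware = HttpProxyMiddleware()

    # Credentials given in the proxy request meta are moved into the
    # Proxy-Authorization header, and the meta is rewritten without them.
    request = Request('https://example.com',
                      meta={'proxy': 'https://user1:password1@proxy1.example:3128'})
    middleware.process_request(request, spider)
    assert request.meta['proxy'] == 'https://proxy1.example:3128'
    assert request.headers['Proxy-Authorization'].startswith(b'Basic ')

    # When a proxy-rotation middleware switches to a different proxy without
    # credentials, the stale Proxy-Authorization header is now removed instead
    # of being leaked to the new proxy.
    request.meta['proxy'] = 'https://proxy2.example:3128'
    middleware.process_request(request, spider)
    assert b'Proxy-Authorization' not in request.headers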