Script 'mail_helper' called by obssrc

Hello community,

here is the log from the commit of package python-Scrapy for openSUSE:Factory checked in at 2021-10-08 00:06:30

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-Scrapy (Old)
 and      /work/SRC/openSUSE:Factory/.python-Scrapy.new.2443 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-Scrapy" Fri Oct 8 00:06:30 2021 rev:11 rq:924057 version:2.5.1 Changes: -------- --- /work/SRC/openSUSE:Factory/python-Scrapy/python-Scrapy.changes 2021-09-09 23:08:09.300873495 +0200 +++ /work/SRC/openSUSE:Factory/.python-Scrapy.new.2443/python-Scrapy.changes 2021-10-08 00:07:29.293899563 +0200 @@ -1,0 +2,31 @@ +Thu Oct 7 14:35:57 UTC 2021 - Ben Greiner <c...@bnavigator.de> + +- Update to 2.5.1, Security bug fix + * boo#1191446, CVE-2021-41125 + * If you use HttpAuthMiddleware (i.e. the http_user and + http_pass spider attributes) for HTTP authentication, + any request exposes your credentials to the request + target. + * To prevent unintended exposure of authentication + credentials to unintended domains, you must now + additionally set a new, additional spider attribute, + http_auth_domain, and point it to the specific domain to + which the authentication credentials must be sent. + * If the http_auth_domain spider attribute is not set, the + domain of the first request will be considered the HTTP + authentication target, and authentication credentials + will only be sent in requests targeting that domain. + * If you need to send the same HTTP authentication + credentials to multiple domains, you can use + w3lib.http.basic_auth_header instead to set the value of + the Authorization header of your requests. + * If you really want your spider to send the same HTTP + authentication credentials to any domain, set the + http_auth_domain spider attribute to None. + * Finally, if you are a user of scrapy-splash, know that + this version of Scrapy breaks compatibility with + scrapy-splash 0.7.2 and earlier. You will need to upgrade + scrapy-splash to a greater version for it to continue to + work. + +------------------------------------------------------------------- Old: ---- Scrapy-2.5.0.tar.gz New: ---- Scrapy-2.5.1.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-Scrapy.spec ++++++ --- /var/tmp/diff_new_pack.ueQEjY/_old 2021-10-08 00:07:29.773900377 +0200 +++ /var/tmp/diff_new_pack.ueQEjY/_new 2021-10-08 00:07:29.777900383 +0200 @@ -21,7 +21,7 @@ # python-uvloop does not support python3.6 %define skip_python36 1 Name: python-Scrapy -Version: 2.5.0 +Version: 2.5.1 Release: 0 Summary: A high-level Python Screen Scraping framework License: BSD-3-Clause ++++++ Scrapy-2.5.0.tar.gz -> Scrapy-2.5.1.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/PKG-INFO new/Scrapy-2.5.1/PKG-INFO --- old/Scrapy-2.5.0/PKG-INFO 2021-04-06 16:48:12.813729500 +0200 +++ new/Scrapy-2.5.1/PKG-INFO 2021-10-05 15:48:14.482129600 +0200 @@ -1,6 +1,6 @@ -Metadata-Version: 1.2 +Metadata-Version: 2.1 Name: Scrapy -Version: 2.5.0 +Version: 2.5.1 Summary: A high-level Web Crawling and Web Scraping framework Home-page: https://scrapy.org Author: Scrapy developers @@ -10,116 +10,6 @@ Project-URL: Documentation, https://docs.scrapy.org/ Project-URL: Source, https://github.com/scrapy/scrapy Project-URL: Tracker, https://github.com/scrapy/scrapy/issues -Description: ====== - Scrapy - ====== - - .. image:: https://img.shields.io/pypi/v/Scrapy.svg - :target: https://pypi.python.org/pypi/Scrapy - :alt: PyPI Version - - .. image:: https://img.shields.io/pypi/pyversions/Scrapy.svg - :target: https://pypi.python.org/pypi/Scrapy - :alt: Supported Python Versions - - .. 
image:: https://github.com/scrapy/scrapy/workflows/Ubuntu/badge.svg - :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AUbuntu - :alt: Ubuntu - - .. image:: https://github.com/scrapy/scrapy/workflows/macOS/badge.svg - :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AmacOS - :alt: macOS - - .. image:: https://github.com/scrapy/scrapy/workflows/Windows/badge.svg - :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AWindows - :alt: Windows - - .. image:: https://img.shields.io/badge/wheel-yes-brightgreen.svg - :target: https://pypi.python.org/pypi/Scrapy - :alt: Wheel Status - - .. image:: https://img.shields.io/codecov/c/github/scrapy/scrapy/master.svg - :target: https://codecov.io/github/scrapy/scrapy?branch=master - :alt: Coverage report - - .. image:: https://anaconda.org/conda-forge/scrapy/badges/version.svg - :target: https://anaconda.org/conda-forge/scrapy - :alt: Conda Version - - - Overview - ======== - - Scrapy is a fast high-level web crawling and web scraping framework, used to - crawl websites and extract structured data from their pages. It can be used for - a wide range of purposes, from data mining to monitoring and automated testing. - - Scrapy is maintained by Zyte_ (formerly Scrapinghub) and `many other - contributors`_. - - .. _many other contributors: https://github.com/scrapy/scrapy/graphs/contributors - .. _Zyte: https://www.zyte.com/ - - Check the Scrapy homepage at https://scrapy.org for more information, - including a list of features. - - - Requirements - ============ - - * Python 3.6+ - * Works on Linux, Windows, macOS, BSD - - Install - ======= - - The quick way:: - - pip install scrapy - - See the install section in the documentation at - https://docs.scrapy.org/en/latest/intro/install.html for more details. - - Documentation - ============= - - Documentation is available online at https://docs.scrapy.org/ and in the ``docs`` - directory. - - Releases - ======== - - You can check https://docs.scrapy.org/en/latest/news.html for the release notes. - - Community (blog, twitter, mail list, IRC) - ========================================= - - See https://scrapy.org/community/ for details. - - Contributing - ============ - - See https://docs.scrapy.org/en/master/contributing.html for details. - - Code of Conduct - --------------- - - Please note that this project is released with a Contributor Code of Conduct - (see https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md). - - By participating in this project you agree to abide by its terms. - Please report unacceptable behavior to opensou...@zyte.com. - - Companies using Scrapy - ====================== - - See https://scrapy.org/companies/ for a list. - - Commercial Support - ================== - - See https://scrapy.org/support/ for details. - Platform: UNKNOWN Classifier: Framework :: Scrapy Classifier: Development Status :: 5 - Production/Stable @@ -139,3 +29,117 @@ Classifier: Topic :: Software Development :: Libraries :: Application Frameworks Classifier: Topic :: Software Development :: Libraries :: Python Modules Requires-Python: >=3.6 +License-File: LICENSE +License-File: AUTHORS + +====== +Scrapy +====== + +.. image:: https://img.shields.io/pypi/v/Scrapy.svg + :target: https://pypi.python.org/pypi/Scrapy + :alt: PyPI Version + +.. image:: https://img.shields.io/pypi/pyversions/Scrapy.svg + :target: https://pypi.python.org/pypi/Scrapy + :alt: Supported Python Versions + +.. 
image:: https://github.com/scrapy/scrapy/workflows/Ubuntu/badge.svg + :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AUbuntu + :alt: Ubuntu + +.. image:: https://github.com/scrapy/scrapy/workflows/macOS/badge.svg + :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AmacOS + :alt: macOS + +.. image:: https://github.com/scrapy/scrapy/workflows/Windows/badge.svg + :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AWindows + :alt: Windows + +.. image:: https://img.shields.io/badge/wheel-yes-brightgreen.svg + :target: https://pypi.python.org/pypi/Scrapy + :alt: Wheel Status + +.. image:: https://img.shields.io/codecov/c/github/scrapy/scrapy/master.svg + :target: https://codecov.io/github/scrapy/scrapy?branch=master + :alt: Coverage report + +.. image:: https://anaconda.org/conda-forge/scrapy/badges/version.svg + :target: https://anaconda.org/conda-forge/scrapy + :alt: Conda Version + + +Overview +======== + +Scrapy is a fast high-level web crawling and web scraping framework, used to +crawl websites and extract structured data from their pages. It can be used for +a wide range of purposes, from data mining to monitoring and automated testing. + +Scrapy is maintained by Zyte_ (formerly Scrapinghub) and `many other +contributors`_. + +.. _many other contributors: https://github.com/scrapy/scrapy/graphs/contributors +.. _Zyte: https://www.zyte.com/ + +Check the Scrapy homepage at https://scrapy.org for more information, +including a list of features. + + +Requirements +============ + +* Python 3.6+ +* Works on Linux, Windows, macOS, BSD + +Install +======= + +The quick way:: + + pip install scrapy + +See the install section in the documentation at +https://docs.scrapy.org/en/latest/intro/install.html for more details. + +Documentation +============= + +Documentation is available online at https://docs.scrapy.org/ and in the ``docs`` +directory. + +Releases +======== + +You can check https://docs.scrapy.org/en/latest/news.html for the release notes. + +Community (blog, twitter, mail list, IRC) +========================================= + +See https://scrapy.org/community/ for details. + +Contributing +============ + +See https://docs.scrapy.org/en/master/contributing.html for details. + +Code of Conduct +--------------- + +Please note that this project is released with a Contributor Code of Conduct +(see https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md). + +By participating in this project you agree to abide by its terms. +Please report unacceptable behavior to opensou...@zyte.com. + +Companies using Scrapy +====================== + +See https://scrapy.org/companies/ for a list. + +Commercial Support +================== + +See https://scrapy.org/support/ for details. 
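The changelog above points to w3lib.http.basic_auth_header as the way to send the same HTTP authentication credentials to more than one domain. A minimal sketch of that approach (the spider name, URLs and credentials below are made up for illustration)::

    import scrapy
    from w3lib.http import basic_auth_header


    class MultiDomainSpider(scrapy.Spider):
        # Hypothetical spider: sends one set of credentials to two
        # explicitly chosen domains instead of using http_user/http_pass.
        name = 'multidomain'
        start_urls = [
            'https://one.example.com/',
            'https://two.example.org/',
        ]

        def start_requests(self):
            auth = basic_auth_header('someuser', 'somepass')
            for url in self.start_urls:
                # HttpAuthMiddleware leaves requests alone when an
                # Authorization header is already present, so the
                # credentials go only where they are set explicitly.
                yield scrapy.Request(url, headers={'Authorization': auth})

        def parse(self, response):
            self.logger.info('fetched %s', response.url)
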
+ + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/Scrapy.egg-info/PKG-INFO new/Scrapy-2.5.1/Scrapy.egg-info/PKG-INFO --- old/Scrapy-2.5.0/Scrapy.egg-info/PKG-INFO 2021-04-06 16:48:12.000000000 +0200 +++ new/Scrapy-2.5.1/Scrapy.egg-info/PKG-INFO 2021-10-05 15:48:14.000000000 +0200 @@ -1,6 +1,6 @@ -Metadata-Version: 1.2 +Metadata-Version: 2.1 Name: Scrapy -Version: 2.5.0 +Version: 2.5.1 Summary: A high-level Web Crawling and Web Scraping framework Home-page: https://scrapy.org Author: Scrapy developers @@ -10,116 +10,6 @@ Project-URL: Documentation, https://docs.scrapy.org/ Project-URL: Source, https://github.com/scrapy/scrapy Project-URL: Tracker, https://github.com/scrapy/scrapy/issues -Description: ====== - Scrapy - ====== - - .. image:: https://img.shields.io/pypi/v/Scrapy.svg - :target: https://pypi.python.org/pypi/Scrapy - :alt: PyPI Version - - .. image:: https://img.shields.io/pypi/pyversions/Scrapy.svg - :target: https://pypi.python.org/pypi/Scrapy - :alt: Supported Python Versions - - .. image:: https://github.com/scrapy/scrapy/workflows/Ubuntu/badge.svg - :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AUbuntu - :alt: Ubuntu - - .. image:: https://github.com/scrapy/scrapy/workflows/macOS/badge.svg - :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AmacOS - :alt: macOS - - .. image:: https://github.com/scrapy/scrapy/workflows/Windows/badge.svg - :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AWindows - :alt: Windows - - .. image:: https://img.shields.io/badge/wheel-yes-brightgreen.svg - :target: https://pypi.python.org/pypi/Scrapy - :alt: Wheel Status - - .. image:: https://img.shields.io/codecov/c/github/scrapy/scrapy/master.svg - :target: https://codecov.io/github/scrapy/scrapy?branch=master - :alt: Coverage report - - .. image:: https://anaconda.org/conda-forge/scrapy/badges/version.svg - :target: https://anaconda.org/conda-forge/scrapy - :alt: Conda Version - - - Overview - ======== - - Scrapy is a fast high-level web crawling and web scraping framework, used to - crawl websites and extract structured data from their pages. It can be used for - a wide range of purposes, from data mining to monitoring and automated testing. - - Scrapy is maintained by Zyte_ (formerly Scrapinghub) and `many other - contributors`_. - - .. _many other contributors: https://github.com/scrapy/scrapy/graphs/contributors - .. _Zyte: https://www.zyte.com/ - - Check the Scrapy homepage at https://scrapy.org for more information, - including a list of features. - - - Requirements - ============ - - * Python 3.6+ - * Works on Linux, Windows, macOS, BSD - - Install - ======= - - The quick way:: - - pip install scrapy - - See the install section in the documentation at - https://docs.scrapy.org/en/latest/intro/install.html for more details. - - Documentation - ============= - - Documentation is available online at https://docs.scrapy.org/ and in the ``docs`` - directory. - - Releases - ======== - - You can check https://docs.scrapy.org/en/latest/news.html for the release notes. - - Community (blog, twitter, mail list, IRC) - ========================================= - - See https://scrapy.org/community/ for details. - - Contributing - ============ - - See https://docs.scrapy.org/en/master/contributing.html for details. 
- - Code of Conduct - --------------- - - Please note that this project is released with a Contributor Code of Conduct - (see https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md). - - By participating in this project you agree to abide by its terms. - Please report unacceptable behavior to opensou...@zyte.com. - - Companies using Scrapy - ====================== - - See https://scrapy.org/companies/ for a list. - - Commercial Support - ================== - - See https://scrapy.org/support/ for details. - Platform: UNKNOWN Classifier: Framework :: Scrapy Classifier: Development Status :: 5 - Production/Stable @@ -139,3 +29,117 @@ Classifier: Topic :: Software Development :: Libraries :: Application Frameworks Classifier: Topic :: Software Development :: Libraries :: Python Modules Requires-Python: >=3.6 +License-File: LICENSE +License-File: AUTHORS + +====== +Scrapy +====== + +.. image:: https://img.shields.io/pypi/v/Scrapy.svg + :target: https://pypi.python.org/pypi/Scrapy + :alt: PyPI Version + +.. image:: https://img.shields.io/pypi/pyversions/Scrapy.svg + :target: https://pypi.python.org/pypi/Scrapy + :alt: Supported Python Versions + +.. image:: https://github.com/scrapy/scrapy/workflows/Ubuntu/badge.svg + :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AUbuntu + :alt: Ubuntu + +.. image:: https://github.com/scrapy/scrapy/workflows/macOS/badge.svg + :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AmacOS + :alt: macOS + +.. image:: https://github.com/scrapy/scrapy/workflows/Windows/badge.svg + :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AWindows + :alt: Windows + +.. image:: https://img.shields.io/badge/wheel-yes-brightgreen.svg + :target: https://pypi.python.org/pypi/Scrapy + :alt: Wheel Status + +.. image:: https://img.shields.io/codecov/c/github/scrapy/scrapy/master.svg + :target: https://codecov.io/github/scrapy/scrapy?branch=master + :alt: Coverage report + +.. image:: https://anaconda.org/conda-forge/scrapy/badges/version.svg + :target: https://anaconda.org/conda-forge/scrapy + :alt: Conda Version + + +Overview +======== + +Scrapy is a fast high-level web crawling and web scraping framework, used to +crawl websites and extract structured data from their pages. It can be used for +a wide range of purposes, from data mining to monitoring and automated testing. + +Scrapy is maintained by Zyte_ (formerly Scrapinghub) and `many other +contributors`_. + +.. _many other contributors: https://github.com/scrapy/scrapy/graphs/contributors +.. _Zyte: https://www.zyte.com/ + +Check the Scrapy homepage at https://scrapy.org for more information, +including a list of features. + + +Requirements +============ + +* Python 3.6+ +* Works on Linux, Windows, macOS, BSD + +Install +======= + +The quick way:: + + pip install scrapy + +See the install section in the documentation at +https://docs.scrapy.org/en/latest/intro/install.html for more details. + +Documentation +============= + +Documentation is available online at https://docs.scrapy.org/ and in the ``docs`` +directory. + +Releases +======== + +You can check https://docs.scrapy.org/en/latest/news.html for the release notes. + +Community (blog, twitter, mail list, IRC) +========================================= + +See https://scrapy.org/community/ for details. + +Contributing +============ + +See https://docs.scrapy.org/en/master/contributing.html for details. 
+ +Code of Conduct +--------------- + +Please note that this project is released with a Contributor Code of Conduct +(see https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md). + +By participating in this project you agree to abide by its terms. +Please report unacceptable behavior to opensou...@zyte.com. + +Companies using Scrapy +====================== + +See https://scrapy.org/companies/ for a list. + +Commercial Support +================== + +See https://scrapy.org/support/ for details. + + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/Scrapy.egg-info/SOURCES.txt new/Scrapy-2.5.1/Scrapy.egg-info/SOURCES.txt --- old/Scrapy-2.5.0/Scrapy.egg-info/SOURCES.txt 2021-04-06 16:48:12.000000000 +0200 +++ new/Scrapy-2.5.1/Scrapy.egg-info/SOURCES.txt 2021-10-05 15:48:14.000000000 +0200 @@ -25,7 +25,6 @@ docs/faq.rst docs/index.rst docs/news.rst -docs/pip.txt docs/requirements.txt docs/versioning.rst docs/_ext/scrapydocs.py diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/docs/news.rst new/Scrapy-2.5.1/docs/news.rst --- old/Scrapy-2.5.0/docs/news.rst 2021-04-06 16:48:02.000000000 +0200 +++ new/Scrapy-2.5.1/docs/news.rst 2021-10-05 15:48:05.000000000 +0200 @@ -3,6 +3,44 @@ Release notes ============= +.. _release-2.5.1: + +Scrapy 2.5.1 (2021-10-05) +------------------------- + +* **Security bug fix:** + + If you use + :class:`~scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware` + (i.e. the ``http_user`` and ``http_pass`` spider attributes) for HTTP + authentication, any request exposes your credentials to the request target. + + To prevent unintended exposure of authentication credentials to unintended + domains, you must now additionally set a new, additional spider attribute, + ``http_auth_domain``, and point it to the specific domain to which the + authentication credentials must be sent. + + If the ``http_auth_domain`` spider attribute is not set, the domain of the + first request will be considered the HTTP authentication target, and + authentication credentials will only be sent in requests targeting that + domain. + + If you need to send the same HTTP authentication credentials to multiple + domains, you can use :func:`w3lib.http.basic_auth_header` instead to + set the value of the ``Authorization`` header of your requests. + + If you *really* want your spider to send the same HTTP authentication + credentials to any domain, set the ``http_auth_domain`` spider attribute + to ``None``. + + Finally, if you are a user of `scrapy-splash`_, know that this version of + Scrapy breaks compatibility with scrapy-splash 0.7.2 and earlier. You will + need to upgrade scrapy-splash to a greater version for it to continue to + work. + +.. _scrapy-splash: https://github.com/scrapy-plugins/scrapy-splash + + .. _release-2.5.0: Scrapy 2.5.0 (2021-04-06) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/docs/pip.txt new/Scrapy-2.5.1/docs/pip.txt --- old/Scrapy-2.5.0/docs/pip.txt 2021-04-06 16:48:02.000000000 +0200 +++ new/Scrapy-2.5.1/docs/pip.txt 1970-01-01 01:00:00.000000000 +0100 @@ -1,3 +0,0 @@ -# In pip 20.3-21.0, the default dependency resolver causes the build in -# ReadTheDocs to fail due to memory exhaustion or timeout. 
-pip<20.3 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/docs/requirements.txt new/Scrapy-2.5.1/docs/requirements.txt --- old/Scrapy-2.5.0/docs/requirements.txt 2021-04-06 16:48:02.000000000 +0200 +++ new/Scrapy-2.5.1/docs/requirements.txt 2021-10-05 15:48:05.000000000 +0200 @@ -1,4 +1,4 @@ Sphinx>=3.0 sphinx-hoverxref>=0.2b1 sphinx-notfound-page>=0.4 -sphinx_rtd_theme>=0.4 +sphinx-rtd-theme>=0.5.2 \ No newline at end of file diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/docs/topics/downloader-middleware.rst new/Scrapy-2.5.1/docs/topics/downloader-middleware.rst --- old/Scrapy-2.5.0/docs/topics/downloader-middleware.rst 2021-04-06 16:48:02.000000000 +0200 +++ new/Scrapy-2.5.1/docs/topics/downloader-middleware.rst 2021-10-05 15:48:05.000000000 +0200 @@ -323,8 +323,21 @@ This middleware authenticates all requests generated from certain spiders using `Basic access authentication`_ (aka. HTTP auth). - To enable HTTP authentication from certain spiders, set the ``http_user`` - and ``http_pass`` attributes of those spiders. + To enable HTTP authentication for a spider, set the ``http_user`` and + ``http_pass`` spider attributes to the authentication data and the + ``http_auth_domain`` spider attribute to the domain which requires this + authentication (its subdomains will be also handled in the same way). + You can set ``http_auth_domain`` to ``None`` to enable the + authentication for all requests but you risk leaking your authentication + credentials to unrelated domains. + + .. warning:: + In previous Scrapy versions HttpAuthMiddleware sent the authentication + data with all requests, which is a security problem if the spider + makes requests to several different domains. Currently if the + ``http_auth_domain`` attribute is not set, the middleware will use the + domain of the first request, which will work for some spiders but not + for others. In the future the middleware will produce an error instead. Example:: @@ -334,6 +347,7 @@ http_user = 'someuser' http_pass = 'somepass' + http_auth_domain = 'intranet.example.com' name = 'intranet.example.com' # .. rest of the spider code omitted ... 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/scrapy/VERSION new/Scrapy-2.5.1/scrapy/VERSION --- old/Scrapy-2.5.0/scrapy/VERSION 2021-04-06 16:48:02.000000000 +0200 +++ new/Scrapy-2.5.1/scrapy/VERSION 2021-10-05 15:48:05.000000000 +0200 @@ -1 +1 @@ -2.5.0 +2.5.1 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/scrapy/downloadermiddlewares/httpauth.py new/Scrapy-2.5.1/scrapy/downloadermiddlewares/httpauth.py --- old/Scrapy-2.5.0/scrapy/downloadermiddlewares/httpauth.py 2021-04-06 16:48:02.000000000 +0200 +++ new/Scrapy-2.5.1/scrapy/downloadermiddlewares/httpauth.py 2021-10-05 15:48:05.000000000 +0200 @@ -3,10 +3,14 @@ See documentation in docs/topics/downloader-middleware.rst """ +import warnings from w3lib.http import basic_auth_header from scrapy import signals +from scrapy.exceptions import ScrapyDeprecationWarning +from scrapy.utils.httpobj import urlparse_cached +from scrapy.utils.url import url_is_from_any_domain class HttpAuthMiddleware: @@ -24,8 +28,23 @@ pwd = getattr(spider, 'http_pass', '') if usr or pwd: self.auth = basic_auth_header(usr, pwd) + if not hasattr(spider, 'http_auth_domain'): + warnings.warn('Using HttpAuthMiddleware without http_auth_domain is deprecated and can cause security ' + 'problems if the spider makes requests to several different domains. http_auth_domain ' + 'will be set to the domain of the first request, please set it to the correct value ' + 'explicitly.', + category=ScrapyDeprecationWarning) + self.domain_unset = True + else: + self.domain = spider.http_auth_domain + self.domain_unset = False def process_request(self, request, spider): auth = getattr(self, 'auth', None) if auth and b'Authorization' not in request.headers: - request.headers[b'Authorization'] = auth + domain = urlparse_cached(request).hostname + if self.domain_unset: + self.domain = domain + self.domain_unset = False + if not self.domain or url_is_from_any_domain(request.url, [self.domain]): + request.headers[b'Authorization'] = auth diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/tests/test_downloadermiddleware_httpauth.py new/Scrapy-2.5.1/tests/test_downloadermiddleware_httpauth.py --- old/Scrapy-2.5.0/tests/test_downloadermiddleware_httpauth.py 2021-04-06 16:48:02.000000000 +0200 +++ new/Scrapy-2.5.1/tests/test_downloadermiddleware_httpauth.py 2021-10-05 15:48:05.000000000 +0200 @@ -1,13 +1,60 @@ import unittest +from w3lib.http import basic_auth_header + from scrapy.http import Request from scrapy.downloadermiddlewares.httpauth import HttpAuthMiddleware from scrapy.spiders import Spider +class TestSpiderLegacy(Spider): + http_user = 'foo' + http_pass = 'bar' + + class TestSpider(Spider): http_user = 'foo' http_pass = 'bar' + http_auth_domain = 'example.com' + + +class TestSpiderAny(Spider): + http_user = 'foo' + http_pass = 'bar' + http_auth_domain = None + + +class HttpAuthMiddlewareLegacyTest(unittest.TestCase): + + def setUp(self): + self.spider = TestSpiderLegacy('foo') + + def test_auth(self): + mw = HttpAuthMiddleware() + mw.spider_opened(self.spider) + + # initial request, sets the domain and sends the header + req = Request('http://example.com/') + assert mw.process_request(req, self.spider) is None + self.assertEqual(req.headers['Authorization'], basic_auth_header('foo', 'bar')) + + # subsequent request to the same domain, should send the header + req = Request('http://example.com/') + assert 
mw.process_request(req, self.spider) is None + self.assertEqual(req.headers['Authorization'], basic_auth_header('foo', 'bar')) + + # subsequent request to a different domain, shouldn't send the header + req = Request('http://example-noauth.com/') + assert mw.process_request(req, self.spider) is None + self.assertNotIn('Authorization', req.headers) + + def test_auth_already_set(self): + mw = HttpAuthMiddleware() + mw.spider_opened(self.spider) + req = Request('http://example.com/', + headers=dict(Authorization='Digest 123')) + assert mw.process_request(req, self.spider) is None + self.assertEqual(req.headers['Authorization'], b'Digest 123') class HttpAuthMiddlewareTest(unittest.TestCase): @@ -20,13 +67,45 @@ def tearDown(self): del self.mw + def test_no_auth(self): + req = Request('http://example-noauth.com/') + assert self.mw.process_request(req, self.spider) is None + self.assertNotIn('Authorization', req.headers) + + def test_auth_domain(self): + req = Request('http://example.com/') + assert self.mw.process_request(req, self.spider) is None + self.assertEqual(req.headers['Authorization'], basic_auth_header('foo', 'bar')) + + def test_auth_subdomain(self): + req = Request('http://foo.example.com/') + assert self.mw.process_request(req, self.spider) is None + self.assertEqual(req.headers['Authorization'], basic_auth_header('foo', 'bar')) + + def test_auth_already_set(self): + req = Request('http://example.com/', + headers=dict(Authorization='Digest 123')) + assert self.mw.process_request(req, self.spider) is None + self.assertEqual(req.headers['Authorization'], b'Digest 123') + + +class HttpAuthAnyMiddlewareTest(unittest.TestCase): + + def setUp(self): + self.mw = HttpAuthMiddleware() + self.spider = TestSpiderAny('foo') + self.mw.spider_opened(self.spider) + + def tearDown(self): + del self.mw + def test_auth(self): - req = Request('http://scrapytest.org/') + req = Request('http://example.com/') assert self.mw.process_request(req, self.spider) is None - self.assertEqual(req.headers['Authorization'], b'Basic Zm9vOmJhcg==') + self.assertEqual(req.headers['Authorization'], basic_auth_header('foo', 'bar')) def test_auth_already_set(self): - req = Request('http://scrapytest.org/', + req = Request('http://example.com/', headers=dict(Authorization='Digest 123')) assert self.mw.process_request(req, self.spider) is None self.assertEqual(req.headers['Authorization'], b'Digest 123') diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/Scrapy-2.5.0/tox.ini new/Scrapy-2.5.1/tox.ini --- old/Scrapy-2.5.0/tox.ini 2021-04-06 16:48:02.000000000 +0200 +++ new/Scrapy-2.5.1/tox.ini 2021-10-05 15:48:05.000000000 +0200 @@ -19,6 +19,8 @@ mitmproxy >= 4.0.4, < 5; python_version >= '3.6' and python_version < '3.7' and platform_system != 'Windows' and implementation_name != 'pypy' # Extras botocore>=1.4.87 + # Peek support breaks tests. + queuelib < 1.6.0 passenv = S3_TEST_FILE_URI AWS_ACCESS_KEY_ID @@ -63,7 +65,8 @@ reppy robotexclusionrulesparser # Test dependencies - pylint + # Force the pylint version used in CI for the 2.5.0 tag + pylint==2.7.4 commands = pylint conftest.py docs extras scrapy setup.py tests
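
The legacy tests above pin down the fallback behaviour for spiders that still lack http_auth_domain: the middleware warns once, then locks the credentials to the domain of the first request. A standalone sketch of that behaviour, with made-up domain names and the same direct middleware calls the tests use::

    import warnings

    from scrapy.downloadermiddlewares.httpauth import HttpAuthMiddleware
    from scrapy.http import Request
    from scrapy.spiders import Spider


    class LegacySpider(Spider):
        # http_auth_domain deliberately not set, as in TestSpiderLegacy
        http_user = 'someuser'
        http_pass = 'somepass'


    mw = HttpAuthMiddleware()
    spider = LegacySpider('legacy')

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        mw.spider_opened(spider)  # emits ScrapyDeprecationWarning
    print(caught[0].message)

    first = Request('http://first.example.com/')
    mw.process_request(first, spider)
    print('Authorization' in first.headers)  # True: first domain locked in

    other = Request('http://other.example.org/')
    mw.process_request(other, spider)
    print('Authorization' in other.headers)  # False: different domain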