Script 'mail_helper' called by obssrc
Hello community,
here is the log from the commit of package python-beautifulsoup4 for
openSUSE:Factory checked in at 2025-10-15 12:44:34
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-beautifulsoup4 (Old)
and /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new.18484 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-beautifulsoup4"
Wed Oct 15 12:44:34 2025 rev:45 rq:1311186 version:4.14.2
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-beautifulsoup4/python-beautifulsoup4.changes 2025-09-11 14:39:11.611423595 +0200
+++ /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new.18484/python-beautifulsoup4.changes 2025-10-15 12:44:37.334994871 +0200
@@ -1,0 +2,26 @@
+Mon Oct 13 09:11:52 UTC 2025 - Dirk Müller <[email protected]>
+
+- update to 4.14.2:
+ * Making ResultSet inherit from MutableSequence still resulted
+ in too many breaking changes in users of the library,
+ so I reverted the ResultSet code back to where it was in 4.13.5
+ and added tests of all known breaking behavior. [bug=2125906]
+ * Made ResultSet inherit from MutableSequence instead of
+ Sequence, since lots of existing code treats ResultSet as a
+ mutable list.
+ * This version adds function overloading to the find_* methods
+ to make it easier to write type-safe Python.
+ * The typing for find_parent() and find_parents() was improved
+ without any overloading. Casts should never be necessary,
+ since those methods only ever return Tag and ResultSet[Tag],
+ respectively.
+ * ResultSet now inherits from Sequence. This should make it
+ easier to incorporate ResultSet objects into your type system
+ without needing to handle ResultSet specially.
+ * Fixed an unhandled exception when creating the string
+ representation of a decomposed element.
+ * The default value for the 'attrs' attribute in find* methods
+ is now None, not the empty dictionary. This should have no visible
+ effect on anything.
+
+-------------------------------------------------------------------
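The rationale for moving the 'attrs' default from the empty dictionary to None is the standard Python guidance against mutable default arguments; the sketch below (hypothetical helper functions, not Beautiful Soup code) shows the hazard the None sentinel avoids:

```python
def find_bad(name, attrs={}):
    # The same dict object is shared across every call, so state leaks
    # from one call into the next.
    attrs["calls"] = attrs.get("calls", 0) + 1
    return attrs["calls"]

def find_good(name, attrs=None):
    # None is the conventional sentinel: each call gets a fresh dict.
    if attrs is None:
        attrs = {}
    attrs["calls"] = attrs.get("calls", 0) + 1
    return attrs["calls"]

print(find_bad("a"), find_bad("a"))    # 1 2  (shared state persists)
print(find_good("a"), find_good("a"))  # 1 1  (fresh dict each call)
```

Beautiful Soup never mutated the default dict, which is why the change has no visible effect; it also lets the new overloads distinguish "attrs given" from "attrs omitted".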
Old:
----
beautifulsoup4-4.13.5.tar.gz
New:
----
beautifulsoup4-4.14.2.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-beautifulsoup4.spec ++++++
--- /var/tmp/diff_new_pack.al2S3k/_old 2025-10-15 12:44:38.075025820 +0200
+++ /var/tmp/diff_new_pack.al2S3k/_new 2025-10-15 12:44:38.079025988 +0200
@@ -1,7 +1,7 @@
#
# spec file for package python-beautifulsoup4
#
-# Copyright (c) 2025 SUSE LLC
+# Copyright (c) 2025 SUSE LLC and contributors
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -18,7 +18,7 @@
%{?sle15_python_module_pythons}
Name: python-beautifulsoup4
-Version: 4.13.5
+Version: 4.14.2
Release: 0
Summary: HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping
License: MIT
++++++ beautifulsoup4-4.13.5.tar.gz -> beautifulsoup4-4.14.2.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/CHANGELOG new/beautifulsoup4-4.14.2/CHANGELOG
--- old/beautifulsoup4-4.13.5/CHANGELOG 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/CHANGELOG 2020-02-02 01:00:00.000000000 +0100
@@ -1,3 +1,115 @@
+= 4.14.2 (20250929)
+
+* Making ResultSet inherit from MutableSequence still resulted in too many
+ breaking changes in users of the library, so I reverted the
+ ResultSet code back to where it was in 4.13.5 and added tests of all known
+ breaking behavior. [bug=2125906]
+
+= 4.14.1 (20250929)
+
+* Made ResultSet inherit from MutableSequence instead of Sequence,
+ since lots of existing code treats ResultSet as a mutable list.
+ [bug=2125906,2125903]
+
+= 4.14.0 (20250927)
+
+* This version adds function overloading to the find_* methods to make
+ it easier to write type-safe Python.
+
+ In most cases you can just assign the result of a find() or
+ find_all() call to the type of object you're expecting to get back:
+ a Tag, a NavigableString, a Sequence[Tag], or a
+ Sequence[NavigableString]. It's very rare that you'll have to do a
+ cast or suppress type-checker warnings like you did in previous
+ versions of Beautiful Soup.
+
+ (In fact, the only time you should still have to do this is if you
+ pass both 'string' and one of the other arguments into one of the
+  find* methods, e.g. tag.find("a", string="tag contents").)
+
+ The following code has been verified to pass type checking using
+ mypy, pyright, and the Visual Studio Code IDE. It's available in
+ the source repository as scripts/type_checking_smoke_test.py.
+
+---
+from typing import Optional, Sequence
+from bs4 import BeautifulSoup, Tag, NavigableString
+soup = BeautifulSoup("<p>", 'html.parser')
+
+tag:Optional[Tag]
+string:Optional[NavigableString]
+tags:Sequence[Tag]
+strings:Sequence[NavigableString]
+
+tag = soup.find()
+tag = soup.find(id="a")
+string = soup.find(string="b")
+
+tags = soup()
+tags = soup(id="a")
+strings = soup(string="b")
+
+tags = soup.find_all()
+tags = soup.find_all(id="a")
+strings = soup.find_all(string="b")
+
+tag = soup.find_next()
+tag = soup.find_next(id="a")
+string = soup.find_next(string="b")
+
+tags = soup.find_all_next()
+tags = soup.find_all_next(id="a")
+strings = soup.find_all_next(string="b")
+
+tag = soup.find_next_sibling()
+tag = soup.find_next_sibling(id="a")
+string = soup.find_next_sibling(string="b")
+
+tags = soup.find_next_siblings()
+tags = soup.find_next_siblings(id="a")
+strings = soup.find_next_siblings(string="b")
+
+tag = soup.find_previous()
+tag = soup.find_previous(id="a")
+string = soup.find_previous(string="b")
+
+tags = soup.find_all_previous()
+tags = soup.find_all_previous(id="a")
+strings = soup.find_all_previous(string="b")
+
+tag = soup.find_previous_sibling()
+tag = soup.find_previous_sibling(id="a")
+string = soup.find_previous_sibling(string="bold")
+
+tags = soup.find_previous_siblings()
+tags = soup.find_previous_siblings(id="a")
+strings = soup.find_previous_siblings(string="b")
+
+tag = soup.find_parent()
+tag = soup.find_parent(id="a")
+tags = soup.find_parents()
+tags = soup.find_parents(id="a")
+
+# This code will work, but mypy and pyright will both flag it.
+tags = soup.find_all("a", string="b")
+---
+
+* The typing for find_parent() and find_parents() was improved without
+ any overloading. Casts should never be necessary, since those
+ methods only ever return Tag and ResultSet[Tag], respectively.
+
+* ResultSet now inherits from Sequence. This should make it easier to
+ incorporate ResultSet objects into your type system without needing to
+ handle ResultSet specially.
+
+* Fixed an unhandled exception when creating the string representation of
+ a decomposed element. (The output is not *useful* and you still
+ shouldn't do this, but it won't raise an exception anymore.) [bug=2120300]
+
+* The default value for the 'attrs' attribute in find* methods is now
+ None, not the empty dictionary. This should have no visible effect
+ on anything.
+
= 4.13.5 (20250824)
* Fixed an unhandled exception when parsing invalid markup that contains the { character
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/PKG-INFO new/beautifulsoup4-4.14.2/PKG-INFO
--- old/beautifulsoup4-4.13.5/PKG-INFO 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/PKG-INFO 2020-02-02 01:00:00.000000000 +0100
@@ -1,6 +1,6 @@
Metadata-Version: 2.4
Name: beautifulsoup4
-Version: 4.13.5
+Version: 4.14.2
Summary: Screen-scraping library
Project-URL: Download, https://www.crummy.com/software/BeautifulSoup/bs4/download/
Project-URL: Homepage, https://www.crummy.com/software/BeautifulSoup/bs4/
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/bs4/__init__.py new/beautifulsoup4-4.14.2/bs4/__init__.py
--- old/beautifulsoup4-4.13.5/bs4/__init__.py 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/bs4/__init__.py 2020-02-02 01:00:00.000000000 +0100
@@ -15,7 +15,7 @@
"""
__author__ = "Leonard Richardson ([email protected])"
-__version__ = "4.13.5"
+__version__ = "4.14.2"
__copyright__ = "Copyright (c) 2004-2025 Leonard Richardson"
# Use of this source code is governed by the MIT license.
__license__ = "MIT"
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/bs4/_typing.py new/beautifulsoup4-4.14.2/bs4/_typing.py
--- old/beautifulsoup4-4.13.5/bs4/_typing.py 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/bs4/_typing.py 2020-02-02 01:00:00.000000000 +0100
@@ -198,4 +198,8 @@
#: are available on the objects they're dealing with.
_OneElement: TypeAlias = Union["PageElement", "Tag", "NavigableString"]
_AtMostOneElement: TypeAlias = Optional[_OneElement]
+_AtMostOneTag: TypeAlias = Optional["Tag"]
+_AtMostOneNavigableString: TypeAlias = Optional["NavigableString"]
_QueryResults: TypeAlias = "ResultSet[_OneElement]"
+_SomeTags: TypeAlias = "ResultSet[Tag]"
+_SomeNavigableStrings: TypeAlias = "ResultSet[NavigableString]"
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/bs4/css.py new/beautifulsoup4-4.14.2/bs4/css.py
--- old/beautifulsoup4-4.13.5/bs4/css.py 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/bs4/css.py 2020-02-02 01:00:00.000000000 +0100
@@ -20,6 +20,7 @@
cast,
Iterable,
Iterator,
+ MutableSequence,
Optional,
TYPE_CHECKING,
)
@@ -88,7 +89,7 @@
ns = self.tag._namespaces
return ns
- def _rs(self, results: Iterable[Tag]) -> ResultSet[Tag]:
+ def _rs(self, results: MutableSequence[Tag]) -> ResultSet[Tag]:
"""Normalize a list of results to a py:class:`ResultSet`.
A py:class:`ResultSet` is more consistent with the rest of
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/bs4/element.py new/beautifulsoup4-4.14.2/bs4/element.py
--- old/beautifulsoup4-4.13.5/bs4/element.py 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/bs4/element.py 2020-02-02 01:00:00.000000000 +0100
@@ -28,6 +28,7 @@
Iterator,
List,
Mapping,
+ MutableSequence,
Optional,
Pattern,
Set,
@@ -54,6 +55,8 @@
)
from bs4._typing import (
_AtMostOneElement,
+ _AtMostOneTag,
+ _AtMostOneNavigableString,
_AttributeValue,
_AttributeValues,
_Encoding,
@@ -65,6 +68,8 @@
_StrainableAttribute,
_StrainableAttributes,
_StrainableString,
+ _SomeNavigableStrings,
+ _SomeTags,
)
_OneOrMoreStringTypes: TypeAlias = Union[
@@ -651,6 +656,7 @@
next_up = e.next_element
e.__dict__.clear()
if isinstance(e, Tag):
+ e.name = ""
e.contents = []
e._decomposed = True
e = next_up
@@ -745,13 +751,35 @@
return results
+ # For the suppression of this pyright warning, see discussion here:
+ # https://github.com/microsoft/pyright/issues/10929
+ @overload
+ def find_next( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ string: None=None,
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneTag:
+ ...
+
+ @overload
+ def find_next(
+ self,
+ name: None=None,
+ attrs: None=None,
+ string: _StrainableString="",
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneNavigableString:
+ ...
+
def find_next(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
string: Optional[_StrainableString] = None,
**kwargs: _StrainableAttribute,
- ) -> _AtMostOneElement:
+ ) -> Union[_AtMostOneTag,_AtMostOneNavigableString,_AtMostOneElement]:
"""Find the first PageElement that matches the given criteria and
appears later in the document than this PageElement.
@@ -767,15 +795,39 @@
findNext = _deprecated_function_alias("findNext", "find_next", "4.0.0")
+ @overload
+ def find_all_next( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ string: None = None,
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeTags:
+ ...
+
+ @overload
+ def find_all_next(
+ self,
+ name: None = None,
+ attrs: None = None,
+ string: _StrainableString = "",
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeNavigableStrings:
+ ...
+
def find_all_next(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
string: Optional[_StrainableString] = None,
limit: Optional[int] = None,
_stacklevel: int = 2,
**kwargs: _StrainableAttribute,
- ) -> _QueryResults:
+ ) -> Union[_SomeTags,_SomeNavigableStrings,_QueryResults]:
"""Find all `PageElement` objects that match the given criteria and
appear later in the document than this `PageElement`.
@@ -801,13 +853,33 @@
findAllNext = _deprecated_function_alias("findAllNext", "find_all_next", "4.0.0")
+ @overload
+ def find_next_sibling( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ string: None=None,
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneTag:
+ ...
+
+ @overload
+ def find_next_sibling(
+ self,
+ name: None=None,
+ attrs: None=None,
+ string: _StrainableString="",
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneNavigableString:
+ ...
+
def find_next_sibling(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
string: Optional[_StrainableString] = None,
**kwargs: _StrainableAttribute,
- ) -> _AtMostOneElement:
+ ) -> Union[_AtMostOneTag,_AtMostOneNavigableString,_AtMostOneElement]:
"""Find the closest sibling to this PageElement that matches the
given criteria and appears later in the document.
@@ -825,15 +897,39 @@
"findNextSibling", "find_next_sibling", "4.0.0"
)
+ @overload
+ def find_next_siblings( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ string: None = None,
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeTags:
+ ...
+
+ @overload
+ def find_next_siblings(
+ self,
+ name: None = None,
+ attrs: None = None,
+ string: _StrainableString = "",
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeNavigableStrings:
+ ...
+
def find_next_siblings(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
string: Optional[_StrainableString] = None,
limit: Optional[int] = None,
_stacklevel: int = 2,
**kwargs: _StrainableAttribute,
- ) -> _QueryResults:
+ ) -> Union[_SomeTags,_SomeNavigableStrings,_QueryResults]:
"""Find all siblings of this `PageElement` that match the given criteria
and appear later in the document.
@@ -864,13 +960,33 @@
"fetchNextSiblings", "find_next_siblings", "3.0.0"
)
+ @overload
+ def find_previous( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ string: None=None,
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneTag:
+ ...
+
+ @overload
+ def find_previous(
+ self,
+ name: None=None,
+ attrs: None=None,
+ string: _StrainableString="",
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneNavigableString:
+ ...
+
def find_previous(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
string: Optional[_StrainableString] = None,
**kwargs: _StrainableAttribute,
- ) -> _AtMostOneElement:
+ ) -> Union[_AtMostOneTag,_AtMostOneNavigableString,_AtMostOneElement]:
"""Look backwards in the document from this `PageElement` and find the
first `PageElement` that matches the given criteria.
@@ -886,15 +1002,39 @@
findPrevious = _deprecated_function_alias("findPrevious", "find_previous", "3.0.0")
+ @overload
+ def find_all_previous( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ string: None = None,
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeTags:
+ ...
+
+ @overload
+ def find_all_previous(
+ self,
+ name: None = None,
+ attrs: None = None,
+ string: _StrainableString = "",
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeNavigableStrings:
+ ...
+
def find_all_previous(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
string: Optional[_StrainableString] = None,
limit: Optional[int] = None,
_stacklevel: int = 2,
**kwargs: _StrainableAttribute,
- ) -> _QueryResults:
+ ) -> Union[_SomeTags,_SomeNavigableStrings,_QueryResults]:
"""Look backwards in the document from this `PageElement` and find all
`PageElement` that match the given criteria.
@@ -925,13 +1065,33 @@
"fetchAllPrevious", "find_all_previous", "3.0.0"
)
+ @overload
+ def find_previous_sibling( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ string: None=None,
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneTag:
+ ...
+
+ @overload
+ def find_previous_sibling(
+ self,
+ name: None=None,
+ attrs: None=None,
+ string: _StrainableString="",
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneNavigableString:
+ ...
+
def find_previous_sibling(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
string: Optional[_StrainableString] = None,
**kwargs: _StrainableAttribute,
- ) -> _AtMostOneElement:
+ ) -> Union[_AtMostOneTag,_AtMostOneNavigableString,_AtMostOneElement]:
"""Returns the closest sibling to this `PageElement` that matches the
given criteria and appears earlier in the document.
@@ -951,15 +1111,39 @@
"findPreviousSibling", "find_previous_sibling", "4.0.0"
)
+ @overload
+ def find_previous_siblings( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ string: None = None,
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeTags:
+ ...
+
+ @overload
+ def find_previous_siblings(
+ self,
+ name: None = None,
+ attrs: None = None,
+ string: _StrainableString = "",
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeNavigableStrings:
+ ...
+
def find_previous_siblings(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
string: Optional[_StrainableString] = None,
limit: Optional[int] = None,
_stacklevel: int = 2,
**kwargs: _StrainableAttribute,
- ) -> _QueryResults:
+ ) -> Union[_SomeTags,_SomeNavigableStrings,_QueryResults]:
"""Returns all siblings to this PageElement that match the
given criteria and appear earlier in the document.
@@ -993,9 +1177,9 @@
def find_parent(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
**kwargs: _StrainableAttribute,
- ) -> _AtMostOneElement:
+ ) -> _AtMostOneTag:
"""Find the closest parent of this PageElement that matches the given criteria.
@@ -1023,11 +1207,11 @@
def find_parents(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
limit: Optional[int] = None,
_stacklevel: int = 2,
**kwargs: _StrainableAttribute,
- ) -> _QueryResults:
+ ) -> _SomeTags:
"""Find all parents of this `PageElement` that match the given criteria.
All find_* methods take a common set of arguments. See the online
@@ -1040,9 +1224,11 @@
:kwargs: Additional filters on attribute values.
"""
iterator = self.parents
- return self._find_all(
+ # Only Tags can have children, so this ResultSet will contain
+ # nothing but Tags.
+ return cast(ResultSet[Tag], self._find_all(
name, attrs, None, limit, iterator, _stacklevel=_stacklevel + 1,
**kwargs
- )
+ ))
findParents = _deprecated_function_alias("findParents", "find_parents", "4.0.0")
fetchParents = _deprecated_function_alias("fetchParents", "find_parents", "3.0.0")
@@ -1067,7 +1253,7 @@
# specific here.
method: Callable,
name: _FindMethodName,
- attrs: _StrainableAttributes,
+ attrs: Optional[_StrainableAttributes],
string: Optional[_StrainableString],
**kwargs: _StrainableAttribute,
) -> _AtMostOneElement:
@@ -1080,7 +1266,7 @@
def _find_all(
self,
name: _FindMethodName,
- attrs: _StrainableAttributes,
+ attrs: Optional[_StrainableAttributes],
string: Optional[_StrainableString],
limit: Optional[int],
generator: Iterator[PageElement],
@@ -1115,11 +1301,11 @@
else:
matcher = SoupStrainer(name, attrs, string, **kwargs)
- result: Iterable[_OneElement]
+ result: MutableSequence[_OneElement]
if string is None and not limit and not attrs and not kwargs:
if name is True or name is None:
# Optimization to find all tags.
-            result = (element for element in generator if isinstance(element, Tag))
+            result = [element for element in generator if isinstance(element, Tag)]
return ResultSet(matcher, result)
elif isinstance(name, str):
# Optimization to find all tags with a given name.
@@ -2238,22 +2424,63 @@
"Deleting tag[key] deletes all 'key' attributes for the tag."
self.attrs.pop(key, None)
+ @overload
+ def __call__( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ recursive: bool = True,
+ string: None = None,
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeTags:
+ ...
+
+ @overload
def __call__(
self,
- name: Optional[_StrainableElement] = None,
- attrs: _StrainableAttributes = {},
+ name: None = None,
+ attrs: None = None,
+ recursive: bool = True,
+ string: _StrainableString = "",
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeNavigableStrings:
+ ...
+
+ def __call__(
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
recursive: bool = True,
string: Optional[_StrainableString] = None,
limit: Optional[int] = None,
_stacklevel: int = 2,
**kwargs: _StrainableAttribute,
- ) -> _QueryResults:
+ ) -> Union[_SomeTags,_SomeNavigableStrings,_QueryResults]:
"""Calling a Tag like a function is the same as calling its
find_all() method. Eg. tag('a') returns a list of all the A tags
found within this tag."""
- return self.find_all(
- name, attrs, recursive, string, limit, _stacklevel, **kwargs
+        if string is not None and (name is not None or attrs is not None or kwargs):
+            # TODO: Using the @overload decorator to express the three ways you
+            # could get into this path is way too much code for a rarely(?) used
+            # feature.
+            return cast(ResultSet[Tag], self.find_all(name, attrs, recursive, string, limit, _stacklevel, **kwargs)) #type: ignore
+
+ if string is None:
+ # If string is None, we're searching for tags.
+ tags:ResultSet[Tag] = self.find_all(
+ name, attrs, recursive, None, limit, _stacklevel, **kwargs
+ )
+ return tags
+
+ # Otherwise, we're searching for strings.
+ strings:ResultSet[NavigableString] = self.find_all(
+ None, None, recursive, string, limit, _stacklevel, **kwargs
)
+ return strings
def __getattr__(self, subtag: str) -> Optional[Tag]:
"""Calling tag.subtag is the same as calling tag.find(name="subtag")"""
@@ -2276,7 +2503,7 @@
raise AttributeError(
"'%s' object has no attribute '%s'" % (self.__class__, subtag)
)
- return cast(Optional[Tag], result)
+ return result
def __eq__(self, other: Any) -> bool:
"""Returns true iff this Tag has the same name, the same attributes,
@@ -2706,14 +2933,35 @@
# Soup methods
+ @overload
+ def find(
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ recursive: bool = True,
+ string: None=None,
+ **kwargs: _StrainableAttribute,
+ ) -> _AtMostOneTag:
+ ...
+
+ @overload
+ def find(
+ self,
+ name: None=None,
+ attrs: None=None,
+ recursive: bool = True,
+ string: _StrainableString="",
+ ) -> _AtMostOneNavigableString:
+ ...
+
def find(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
recursive: bool = True,
string: Optional[_StrainableString] = None,
**kwargs: _StrainableAttribute,
- ) -> _AtMostOneElement:
+ ) -> Union[_AtMostOneTag,_AtMostOneNavigableString,_AtMostOneElement]:
"""Look in the children of this PageElement and find the first
PageElement that matches the given criteria.
@@ -2726,27 +2974,63 @@
recursive search of this Tag's children. Otherwise,
only the direct children will be considered.
:param string: A filter on the `Tag.string` attribute.
- :param limit: Stop looking after finding this many results.
:kwargs: Additional filters on attribute values.
"""
- r = None
-        results = self.find_all(name, attrs, recursive, string, 1, _stacklevel=3, **kwargs)
- if results:
- r = results[0]
- return r
+        if string is not None and (name is not None or attrs is not None or kwargs):
+            # TODO: Using the @overload decorator to express the three ways you
+            # could get into this path is way too much code for a rarely(?) used
+            # feature.
+            elements = self.find_all(name, attrs, recursive, string, 1, _stacklevel=3, **kwargs) # type:ignore
+ if elements:
+ return cast(Tag, elements[0])
+ elif string is None:
+            tags = self.find_all(name, attrs, recursive, None, 1, _stacklevel=3, **kwargs)
+ if tags:
+ return cast(Tag, tags[0])
+ else:
+            strings = self.find_all(None, None, recursive, string, 1, _stacklevel=3, **kwargs)
+ if strings:
+ return cast(NavigableString, strings[0])
+ return None
findChild = _deprecated_function_alias("findChild", "find", "3.0.0")
+ @overload
+ def find_all( # pyright: ignore [reportOverlappingOverload]
+ self,
+ name: _FindMethodName = None,
+ attrs: Optional[_StrainableAttributes] = None,
+ recursive: bool = True,
+ string: None = None,
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeTags:
+ ...
+
+ @overload
+ def find_all(
+ self,
+ name: None = None,
+ attrs: None = None,
+ recursive: bool = True,
+ string: _StrainableString = "",
+ limit: Optional[int] = None,
+ _stacklevel: int = 2,
+ **kwargs: _StrainableAttribute,
+ ) -> _SomeNavigableStrings:
+ ...
+
def find_all(
self,
name: _FindMethodName = None,
- attrs: _StrainableAttributes = {},
+ attrs: Optional[_StrainableAttributes] = None,
recursive: bool = True,
string: Optional[_StrainableString] = None,
limit: Optional[int] = None,
_stacklevel: int = 2,
**kwargs: _StrainableAttribute,
- ) -> _QueryResults:
+ ) -> Union[_SomeTags,_SomeNavigableStrings,_QueryResults]:
"""Look in the children of this `PageElement` and find all
`PageElement` objects that match the given criteria.
@@ -2765,9 +3049,27 @@
generator = self.descendants
if not recursive:
generator = self.children
- return self._find_all(
-            name, attrs, string, limit, generator, _stacklevel=_stacklevel + 1, **kwargs
- )
+ _stacklevel += 1
+
+        if string is not None and (name is not None or attrs is not None or kwargs):
+            # TODO: Using the @overload decorator to express the three ways you
+            # could get into this path is way too much code for a rarely(?) used
+            # feature.
+ return cast(ResultSet[Tag],
+ self._find_all(name, attrs, string, limit, generator,
+ _stacklevel=_stacklevel, **kwargs)
+ )
+
+ if string is None:
+ # If string is None, we're searching for tags.
+ return cast(ResultSet[Tag], self._find_all(
+            name, attrs, None, limit, generator, _stacklevel=_stacklevel, **kwargs
+ ))
+
+ # Otherwise, we're searching for strings.
+ return cast(ResultSet[NavigableString], self._find_all(
+            None, None, string, limit, generator, _stacklevel=_stacklevel, **kwargs
+ ))
findAll = _deprecated_function_alias("findAll", "find_all", "4.0.0")
findChildren = _deprecated_function_alias("findChildren", "find_all", "3.0.0")
@@ -2882,7 +3184,6 @@
_PageElementT = TypeVar("_PageElementT", bound=PageElement)
-
class ResultSet(List[_PageElementT], Generic[_PageElementT]):
"""A ResultSet is a list of `PageElement` objects, gathered as the result
of matching an :py:class:`ElementFilter` against a parse tree. Basically,
a list of
@@ -2903,7 +3204,6 @@
f"""ResultSet object has no attribute "{key}". You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?"""
)
-
# Now that all the classes used by SoupStrainer have been defined,
# import SoupStrainer itself into this module to preserve the
# backwards compatibility of anyone who imports
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/bs4/filter.py new/beautifulsoup4-4.14.2/bs4/filter.py
--- old/beautifulsoup4-4.13.5/bs4/filter.py 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/bs4/filter.py 2020-02-02 01:00:00.000000000 +0100
@@ -136,8 +136,7 @@
# If there are no rules at all, don't bother filtering. Let
# anything through.
if self.includes_everything:
- for i in generator:
- yield i
+ yield from generator
while True:
try:
i = next(generator)
@@ -175,12 +174,12 @@
:param limit: Stop looking after finding this many results.
"""
- results: _QueryResults = ResultSet(self)
+ results = []
for match in self.filter(generator):
results.append(match)
if limit is not None and len(results) >= limit:
break
- return results
+ return ResultSet(self, results)
def allow_tag_creation(
self, nsprefix: Optional[str], name: str, attrs: Optional[_RawAttributeValues]
@@ -379,7 +378,7 @@
def __init__(
self,
name: Optional[_StrainableElement] = None,
- attrs: Dict[str, _StrainableAttribute] = {},
+ attrs: Optional[Dict[str, _StrainableAttribute]] = None,
string: Optional[_StrainableString] = None,
**kwargs: _StrainableAttribute,
):
@@ -397,11 +396,13 @@
# that matches all Tags, and only Tags.
self.name_rules = [TagNameMatchRule(present=True)]
else:
-        self.name_rules = cast(
-            List[TagNameMatchRule], list(self._make_match_rules(name, TagNameMatchRule))
-        )
+        self.name_rules = cast(
+            List[TagNameMatchRule], list(self._make_match_rules(name, TagNameMatchRule))
+        )
self.attribute_rules = defaultdict(list)
+ if attrs is None:
+ attrs = {}
if not isinstance(attrs, dict):
# Passing something other than a dictionary as attrs is
# sugar for matching that thing against the 'class'
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/bs4/tests/test_element.py new/beautifulsoup4-4.14.2/bs4/tests/test_element.py
--- old/beautifulsoup4-4.13.5/bs4/tests/test_element.py 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/bs4/tests/test_element.py 2020-02-02 01:00:00.000000000 +0100
@@ -13,6 +13,7 @@
NamespacedAttribute,
ResultSet,
)
+from bs4.filter import ElementFilter
class TestNamedspacedAttribute:
def test_name_may_be_none_or_missing(self):
@@ -136,3 +137,42 @@
"""ResultSet object has no attribute "name". You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?"""
== str(e.value)
)
+
+ def test_len(self):
+ # The length of a ResultSet is the length of its result sequence.
+ rs = ResultSet(None, [1,2,3])
+ assert len(rs) == 3
+
+ def test_getitem(self):
+ # __getitem__ is delegated to the result sequence.
+ rs = ResultSet(None, [1,2,3])
+ assert rs[1] == 2
+
+ def test_equality(self):
+        # A ResultSet is equal to a list if its result sequence is equal to that list.
+ l = [1, 2, 3]
+ rs1 = ResultSet(None, [1,2,3])
+ assert l == rs1
+ assert l != (1,2,3)
+
+ rs2 = ResultSet(None, [1,2])
+ assert l != rs2
+
+ # A ResultSet is equal to another ResultSet if their results are equal
+ assert rs1 == rs1
+ assert rs1 != rs2
+
+        # Even if the results come from two different sources, the ResultSets are equal.
+ assert ResultSet(ElementFilter(), [1,2,3]) == rs1
+
+ def test_mutability(self):
+ # A ResultSet is mutable. (Lots of external code depends on this.)
+ rs = ResultSet(None, [1,2,3])
+ rs[1] = 4
+ assert rs == [1,4,3]
+
+ def test_add_resultsets_together(self):
+        # ResultSets can be added together like lists. (pandas depends on this.)
+ rs1 = ResultSet(None, [1,2,3])
+ rs2 = ResultSet(None, [4,5,6])
+ assert rs1 + rs2 == [1,2,3,4,5,6]
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/bs4/tests/test_tree.py new/beautifulsoup4-4.14.2/bs4/tests/test_tree.py
--- old/beautifulsoup4-4.13.5/bs4/tests/test_tree.py 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/bs4/tests/test_tree.py 2020-02-02 01:00:00.000000000 +0100
@@ -1378,6 +1378,8 @@
# p2 is unaffected.
assert False is p2.decomposed
+ assert "<></>" == str(p1)
+
def test_decompose_string(self):
soup = self.soup("<div><p>String 1</p><p>String 2</p></p>")
div = soup.div
@@ -1386,6 +1388,7 @@
text.decompose()
assert True is text.decomposed
assert "<div><p></p><p>String 2</p></div>" == div.decode()
+ assert "String 1" == str(text)
def test_string_set(self):
"""Tag.string = 'string'"""
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/doc/index.rst new/beautifulsoup4-4.14.2/doc/index.rst
--- old/beautifulsoup4-4.13.5/doc/index.rst 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/doc/index.rst 2020-02-02 01:00:00.000000000 +0100
@@ -16,7 +16,7 @@
how to use it, how to make it do what you want, and what to do when it
violates your expectations.
-This document covers Beautiful Soup version 4.13.5. The examples in
+This document covers Beautiful Soup version 4.14.2. The examples in
this documentation were written for Python 3.8.
You might be looking for the documentation for `Beautiful Soup 3
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.13.5/pyproject.toml new/beautifulsoup4-4.14.2/pyproject.toml
--- old/beautifulsoup4-4.13.5/pyproject.toml 2020-02-02 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.14.2/pyproject.toml 2020-02-02 01:00:00.000000000 +0100
@@ -81,7 +81,7 @@
# Scripts.
"/test-all-versions",
- "/scripts/*.py",
+ "/scripts/demonstrate_parser_differences.py",
# Documentation source in various languages.
"/doc*/Makefile",
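The overload pattern repeated throughout the element.py diff above reduces to a small standalone sketch. The classes and the find() function below are hypothetical stand-ins, not the Beautiful Soup implementation: the idea is that whether 'string' is passed selects which overload, and therefore which return type, a type checker infers.

```python
from typing import Optional, overload

class Tag:
    """Stand-in for bs4.element.Tag."""

class NavigableString(str):
    """Stand-in for bs4.element.NavigableString."""

@overload
def find(name: str, string: None = None) -> Optional[Tag]: ...
@overload
def find(name: None = None, *, string: str) -> Optional[NavigableString]: ...

def find(name=None, string=None):
    # Runtime dispatch mirroring the overloaded bs4 find_* methods:
    # a 'string' argument yields a string result, a name yields a Tag result.
    if string is not None:
        return NavigableString(string)
    return Tag() if name is not None else None

tag = find("a")        # a type checker infers Optional[Tag]
s = find(string="b")   # a type checker infers Optional[NavigableString]
```

This is why the mixed call find_all("a", string="b") still needs a cast or an ignore comment in 4.14.x: it matches neither overload cleanly, as the TODO comments in the diff note.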