Script 'mail_helper' called by obssrc
Hello community,
here is the log from the commit of package python-selection for
openSUSE:Factory checked in at 2023-06-06 19:57:24
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-selection (Old)
and /work/SRC/openSUSE:Factory/.python-selection.new.15902 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-selection"
Tue Jun 6 19:57:24 2023 rev:3 rq:1091106 version:0.0.21
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-selection/python-selection.changes 2019-03-01 16:50:07.993737781 +0100
+++ /work/SRC/openSUSE:Factory/.python-selection.new.15902/python-selection.changes 2023-06-06 19:58:00.963098185 +0200
@@ -1,0 +2,6 @@
+Tue Jun 6 12:24:18 UTC 2023 - [email protected]
+
+- version update to 0.0.21
+ * no upstream changelog found
+
+-------------------------------------------------------------------
Old:
----
selection-0.0.14.tar.gz
New:
----
selection-0.0.21.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-selection.spec ++++++
--- /var/tmp/diff_new_pack.CX7ncW/_old 2023-06-06 19:58:01.463101149 +0200
+++ /var/tmp/diff_new_pack.CX7ncW/_new 2023-06-06 19:58:01.467101173 +0200
@@ -1,7 +1,7 @@
#
# spec file for package python-selection
#
-# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
+# Copyright (c) 2023 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -16,9 +16,8 @@
#
-%{?!python_module:%define python_module() python-%{**} python3-%{**}}
Name: python-selection
-Version: 0.0.14
+Version: 0.0.21
Release: 0
Summary: API to extract content from HTML & XML documents
License: MIT
@@ -26,14 +25,12 @@
URL: https://github.com/lorien/selection
Source: https://files.pythonhosted.org/packages/source/s/selection/selection-%{version}.tar.gz
BuildRequires: %{python_module lxml}
+BuildRequires: %{python_module pip}
BuildRequires: %{python_module setuptools}
-BuildRequires: %{python_module six}
-BuildRequires: %{python_module weblib}
+BuildRequires: %{python_module wheel}
BuildRequires: fdupes
BuildRequires: python-rpm-macros
Requires: python-lxml
-Requires: python-six
-Requires: python-weblib
BuildArch: noarch
%python_subpackages
@@ -44,15 +41,16 @@
%setup -q -n selection-%{version}
%build
-%python_build
+%pyproject_wheel
%install
-%python_install
+%pyproject_install
%python_expand %fdupes %{buildroot}%{$python_sitelib}
%files %{python_files}
%license LICENSE
-%doc README.rst
-%{python_sitelib}/*
+%doc README.md
+%{python_sitelib}/selection
+%{python_sitelib}/selection-%{version}*-info
%changelog
++++++ selection-0.0.14.tar.gz -> selection-0.0.21.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/LICENSE new/selection-0.0.21/LICENSE
--- old/selection-0.0.14/LICENSE 2018-04-22 19:18:11.000000000 +0200
+++ new/selection-0.0.21/LICENSE 2022-12-07 16:44:48.000000000 +0100
@@ -1,6 +1,6 @@
The MIT License (MIT)
-Copyright (c) 2015-2018, Gregory Petukhov
+Copyright (c) 2015-2022, Gregory Petukhov
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/MANIFEST.in
new/selection-0.0.21/MANIFEST.in
--- old/selection-0.0.14/MANIFEST.in 2018-04-22 19:17:14.000000000 +0200
+++ new/selection-0.0.21/MANIFEST.in 1970-01-01 01:00:00.000000000 +0100
@@ -1 +0,0 @@
-include LICENSE
\ No newline at end of file
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/PKG-INFO
new/selection-0.0.21/PKG-INFO
--- old/selection-0.0.14/PKG-INFO 2018-08-07 17:09:16.000000000 +0200
+++ new/selection-0.0.21/PKG-INFO 2022-12-28 00:36:24.305795200 +0100
@@ -1,17 +1,83 @@
-Metadata-Version: 1.1
+Metadata-Version: 2.1
Name: selection
-Version: 0.0.14
+Version: 0.0.21
Summary: API to extract content from HTML & XML documents
-Home-page: UNKNOWN
-Author: Gregory Petukhov
-Author-email: [email protected]
-License: MIT
-Description: UNKNOWN
-Platform: UNKNOWN
+Author-email: Gregory Petukhov <[email protected]>
+License: The MIT License (MIT)
+
+ Copyright (c) 2015-2022, Gregory Petukhov
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+Project-URL: homepage, http://github.com/lorien/selection
+Keywords: lxml,dom,html
Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 2.7
-Classifier: Programming Language :: Python :: 3.4
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.7
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: License :: OSI Approved :: MIT License
-Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Topic :: Utilities
+Classifier: Typing :: Typed
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Provides-Extra: pyquery
+License-File: LICENSE
+
+# Selection Documenation
+
+[](https://github.com/lorien/selection/actions/workflows/test.yml)
+[](https://github.com/lorien/selection/actions/workflows/code_quality.yml)
+[](https://github.com/lorien/selection/actions/workflows/mypy.yml)
+[](https://coveralls.io/r/lorien/selection?branch=master)
+
+API to query DOM tree of HTML/XML document.
+
+
+## Usage Example
+
+```
+from selection import XpathSelector
+from lxml.html import fromstring
+
+html = '<div><h1>test</h1><ul id="items"><li>1</li><li>2</li></ul></div>'
+sel = XpathSelector(fromstring(html))
+print(sel.select('//h1')).text()
+print(sel.select('//li').text_list()
+print(sel.select('//ul').attr('id')
+```
+
+
+## Installation
+
+Run: `pip install -U selection`
+
+
+## Community
+
+Telegram English chat: [https://t.me/grablab](https://t.me/grablab)
+
+Telegram Russian chat: [https://t.me/grablab\_ru](https://t.me/grablab_ru)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/README.md
new/selection-0.0.21/README.md
--- old/selection-0.0.14/README.md 1970-01-01 01:00:00.000000000 +0100
+++ new/selection-0.0.21/README.md 2022-12-09 19:22:53.000000000 +0100
@@ -0,0 +1,34 @@
+# Selection Documenation
+
+[](https://github.com/lorien/selection/actions/workflows/test.yml)
+[](https://github.com/lorien/selection/actions/workflows/code_quality.yml)
+[](https://github.com/lorien/selection/actions/workflows/mypy.yml)
+[](https://coveralls.io/r/lorien/selection?branch=master)
+
+API to query DOM tree of HTML/XML document.
+
+
+## Usage Example
+
+```
+from selection import XpathSelector
+from lxml.html import fromstring
+
+html = '<div><h1>test</h1><ul id="items"><li>1</li><li>2</li></ul></div>'
+sel = XpathSelector(fromstring(html))
+print(sel.select('//h1')).text()
+print(sel.select('//li').text_list()
+print(sel.select('//ul').attr('id')
+```
+
+
+## Installation
+
+Run: `pip install -U selection`
+
+
+## Community
+
+Telegram English chat: [https://t.me/grablab](https://t.me/grablab)
+
+Telegram Russian chat: [https://t.me/grablab\_ru](https://t.me/grablab_ru)
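Note: the usage example shipped in the new README.md (and repeated in PKG-INFO) has unbalanced parentheses and does not run as written. A corrected sketch of the same calls, assuming selection 0.0.21 and lxml are installed:

```
# Corrected form of the upstream example (parentheses balanced).
from lxml.html import fromstring

from selection import XpathSelector

html = '<div><h1>test</h1><ul id="items"><li>1</li><li>2</li></ul></div>'
sel = XpathSelector(fromstring(html))
print(sel.select('//h1').text())       # test
print(sel.select('//li').text_list())  # ['1', '2']
print(sel.select('//ul').attr('id'))   # items
```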
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/README.rst
new/selection-0.0.21/README.rst
--- old/selection-0.0.14/README.rst 2017-02-06 04:19:43.000000000 +0100
+++ new/selection-0.0.21/README.rst 1970-01-01 01:00:00.000000000 +0100
@@ -1,49 +0,0 @@
-=========
-Selection
-=========
-
-.. image:: https://travis-ci.org/lorien/selection.png?branch=master
- :target: https://travis-ci.org/lorien/selection
- :alt: Travis CI
-
-.. image:: https://coveralls.io/repos/lorien/selection/badge.svg?branch=master
- :target: https://coveralls.io/r/lorien/selection?branch=master
- :alt: coveralls.io
-
-API to query DOM tree of HTML/XML document.
-
-
-Usage Example
-=============
-
-Example:
-
-.. code:: python
-
- from selection import XpathSelector
- from lxml.html import fromstring
-
- html = '<div><h1>test</h1><ul id="items"><li>1</li><li>2</li></ul></div>'
- sel = XpathSelector(fromstring(html))
- print(sel.select('//h1')).text()
- print(sel.select('//li').text_list()
- print(sel.select('//ul').attr('id')
-
-
-Installation
-============
-
-Run:
-
-.. code:: bash
-
- pip install -U pip setuptools
- pip install -U selection
-
-
-Dependencies
-============
-
-* lxml
-* tools
-* six
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/pyproject.toml
new/selection-0.0.21/pyproject.toml
--- old/selection-0.0.14/pyproject.toml 1970-01-01 01:00:00.000000000 +0100
+++ new/selection-0.0.21/pyproject.toml 2022-12-28 00:36:14.000000000 +0100
@@ -0,0 +1,90 @@
+[project]
+
+name = "selection"
+version = "0.0.21"
+description = "API to extract content from HTML & XML documents"
+readme = "README.md"
+requires-python = ">=3.8"
+license = {"file" = "LICENSE"}
+keywords = ["lxml", "dom", "html"]
+authors = [
+ {name = "Gregory Petukhov", email = "[email protected]"}
+]
+# https://pypi.org/pypi?%3Aaction=list_classifiers
+classifiers = [
+ "Programming Language :: Python",
+ "Programming Language :: Python :: 3",
+ "Programming Language :: Python :: 3.7",
+ "Programming Language :: Python :: 3.8",
+ "Programming Language :: Python :: 3.9",
+ "Programming Language :: Python :: 3.10",
+ "Programming Language :: Python :: Implementation :: CPython",
+ "License :: OSI Approved :: MIT License",
+ "Development Status :: 4 - Beta",
+ "Environment :: Console",
+ "Intended Audience :: Developers",
+ "Operating System :: OS Independent",
+ "Topic :: Software Development :: Libraries :: Python Modules",
+ "Topic :: Utilities",
+ "Typing :: Typed",
+]
+dependencies = [
+ 'lxml;platform_system!="Windows"',
+]
+
+[project.optional-dependencies]
+pyquery = ["pyquery"]
+
+[build-system]
+requires = ["setuptools"]
+build-backend = "setuptools.build_meta"
+
+[project.urls]
+homepage = "http://github.com/lorien/selection"
+
+[tool.setuptools]
+packages = ["selection"]
+
+[tool.setuptools.package-data]
+"*" = ["py.typed"]
+
+[tool.isort]
+profile = "black"
+line_length = 88
+# skip_gitignore = true # throws errors in stderr when ".git" dir does not exist
+
+[tool.bandit]
+# B101 assert_used
+# B410 Using HtmlElement to parse untrusted XML data
+skips = ["B101", "B410"]
+
+[[tool.mypy.overrides]]
+module = "pyquery"
+ignore_missing_imports = true
+
+[tool.pylint.main]
+jobs=4
+extension-pkg-whitelist="lxml"
+disable="missing-docstring,broad-except,too-few-public-methods,consider-using-f-string,fixme"
+variable-rgx="[a-z_][a-z0-9_]{1,30}$"
+attr-rgx="[a-z_][a-z0-9_]{1,30}$"
+argument-rgx="[a-z_][a-z0-9_]{1,30}$"
+max-line-length=88
+max-args=9
+load-plugins=[
+ "pylint.extensions.check_elif",
+ "pylint.extensions.comparetozero",
+ "pylint.extensions.comparison_placement",
+ "pylint.extensions.consider_ternary_expression",
+ "pylint.extensions.docstyle",
+ "pylint.extensions.emptystring",
+ "pylint.extensions.for_any_all",
+ "pylint.extensions.overlapping_exceptions",
+ "pylint.extensions.redefined_loop_name",
+ "pylint.extensions.redefined_variable_type",
+ "pylint.extensions.set_membership",
+ "pylint.extensions.typing",
+]
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection/__init__.py
new/selection-0.0.21/selection/__init__.py
--- old/selection-0.0.14/selection/__init__.py 2018-08-07 17:08:51.000000000 +0200
+++ new/selection-0.0.21/selection/__init__.py 2022-12-28 00:36:14.000000000 +0100
@@ -1,6 +1,6 @@
-from weblib.const import NULL # noqa
+from selection.backend_lxml import XpathSelector
+from selection.base import RexResultList, Selector, SelectorList
-from selection.base import Selector, SelectorList, RexResultList # noqa
-from selection.backend import XpathSelector # noqa
+__all__ = ["XpathSelector", "RexResultList", "Selector", "SelectorList"]
-version = '0.0.14'
+__version__ = "0.0.21"
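The new package root re-exports only the lxml backend and renames the version attribute from `version` to `__version__`; a quick check, assuming the updated package is installed:

```
# Assumes selection 0.0.21 is installed.
import selection

print(selection.__version__)    # 0.0.21
print(selection.XpathSelector)  # re-exported from selection.backend_lxml
```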
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection/backend.py
new/selection-0.0.21/selection/backend.py
--- old/selection-0.0.14/selection/backend.py 2017-02-06 04:18:31.000000000 +0100
+++ new/selection-0.0.21/selection/backend.py 1970-01-01 01:00:00.000000000 +0100
@@ -1,101 +0,0 @@
-import six
-from weblib.etree import get_node_text, render_html
-from weblib.text import normalize_space as normalize_space_func
-from weblib.const import NULL
-from weblib.error import DataNotFound
-
-from selection.base import Selector
-from selection.error import SelectionRuntimeError
-
-__all__ = ['XpathSelector', 'PyquerySelector']
-XPATH_CACHE = {}
-REGEXP_NS = 'http://exslt.org/regular-expressions'
-
-
-class LxmlNodeSelector(Selector):
- __slots__ = ()
-
- def is_text_node(self):
- return isinstance(self.node(), six.string_types)
-
- def select(self, query=None):
- if self.is_text_node():
- raise SelectionRuntimeError('Text node selectors do not '
- 'allow select method')
- return super(LxmlNodeSelector, self).select(query)
-
- def html(self, encoding='unicode'):
- if self.is_text_node():
- return self.node()
- else:
- return render_html(self.node(), encoding=encoding)
-
- def attr(self, key, default=NULL):
- if self.is_text_node():
- raise SelectionRuntimeError('Text node selectors do not '
- 'allow attr method')
- if default is NULL:
- if key in self.node().attrib:
- return self.node().get(key)
- else:
- raise DataNotFound(u'No such attribute: %s' % key)
- else:
- return self.node().get(key, default)
-
- def text(self, smart=False, normalize_space=True):
- if self.is_text_node():
- if normalize_space:
- return normalize_space_func(self.node())
- else:
- return self.node()
- else:
- return get_node_text(self.node(), smart=smart,
- normalize_space=normalize_space)
-
-
-class XpathSelector(LxmlNodeSelector):
- __slots__ = ()
-
- def process_query(self, query):
- from lxml.etree import XPath
-
- if query not in XPATH_CACHE:
- obj = XPath(query, namespaces={'re': REGEXP_NS})
- XPATH_CACHE[query] = obj
- xpath_obj = XPATH_CACHE[query]
-
- result = xpath_obj(self.node())
-
- # If you query XPATH like //some/crap/@foo="bar" then xpath function
- # returns boolean value instead of list of something.
- # To work around this problem I just returns empty list.
- # This is not great solutions but it produces less confusing error.
- if isinstance(result, bool):
- result = []
-
- if isinstance(result, six.string_types):
- result = [result]
-
- return result
-
-
-class PyquerySelector(LxmlNodeSelector):
- __slots__ = ()
-
- def pyquery_node(self):
- from pyquery import PyQuery
-
- return PyQuery(self.node())
-
- def process_query(self, query):
- return self.pyquery_node().find(query)
-
-
-#class CssSelector(XpathSelector):
-# __slots__ = ()
-#
-# def process_query(self, query):
-# from cssselect import HTMLTranslator
-#
-# xpath_query = HTMLTranslator().css_to_xpath(query)
-# return super(CssSelector, self).process_query(xpath_query)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection/backend_lxml.py
new/selection-0.0.21/selection/backend_lxml.py
--- old/selection-0.0.14/selection/backend_lxml.py 1970-01-01 01:00:00.000000000 +0100
+++ new/selection-0.0.21/selection/backend_lxml.py 2022-12-12 16:53:00.000000000 +0100
@@ -0,0 +1,84 @@
+from __future__ import annotations
+
+from abc import abstractmethod
+from collections.abc import Iterable
+from typing import Any, List, TypeVar, cast
+
+from lxml.etree import XPath, _Element
+
+from . import util
+from .base import Selector, SelectorList
+from .const import UNDEFINED
+
+__all__ = ["XpathSelector"]
+XPATH_CACHE = {}
+REGEXP_NS = "http://exslt.org/regular-expressions"
+LxmlNodeT = TypeVar("LxmlNodeT", bound=_Element) # LxmlNodeProtocol)
+
+
+class LxmlNodeSelector(Selector[LxmlNodeT]):
+ __slots__ = ()
+
+ @abstractmethod
+ def process_query(self, query: str) -> Iterable[LxmlNodeT]:
+ raise NotImplementedError
+
+ def is_text_node(self) -> bool:
+ return isinstance(self.node(), str)
+
+ def select(self, query: str) -> SelectorList[LxmlNodeT]:
+ if self.is_text_node():
+ raise TypeError("Text node selectors do not allow select method")
+ return super().select(query)
+
+ def html(self) -> str:
+ if self.is_text_node():
+ return str(self.node())
+ return util.render_html(cast(_Element, self.node()))
+
+ def attr(self, key: str, default: Any = UNDEFINED) -> Any:
+ if self.is_text_node():
+ raise TypeError("Text node selectors do not allow attr method")
+ if default is UNDEFINED:
+ if key in self.node().attrib:
+ return self.node().get(key)
+ raise IndexError("No such attribute: %s" % key)
+ return self.node().get(key, default)
+
+ def text(self, smart: bool = False, normalize_space: bool = True) -> str:
+ if self.is_text_node():
+ if normalize_space:
+ return util.normalize_spaces(cast(str, self.node()))
+ return str(self.node())
+ return str(
+ util.get_node_text(
+ cast(_Element, self.node()),
+ smart=smart,
+ normalize_space=normalize_space,
+ )
+ )
+
+
+class XpathSelector(LxmlNodeSelector[LxmlNodeT]):
+ __slots__ = ()
+
+ def process_query(self, query: str) -> Iterable[LxmlNodeT]:
+ if query not in XPATH_CACHE:
+ obj = XPath(query, namespaces={"re": REGEXP_NS})
+ XPATH_CACHE[query] = obj
+ xpath_obj = XPATH_CACHE[query]
+
+ result = xpath_obj(cast(_Element, self.node()))
+
+ # If you query XPATH like //some/crap/@foo="bar" then xpath function
+ # returns boolean value instead of list of something.
+ # To work around this problem I just returns empty list.
+ # This is not great solutions but it produces less confusing error.
+ if isinstance(result, bool):
+ result = []
+
+ if isinstance(result, str):
+ result = [result]
+
+ # pylint: disable=deprecated-typing-alias
+ return cast(List[LxmlNodeT], result)
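As the rewritten lxml backend above shows, attribute lookups now raise IndexError instead of weblib's DataNotFound when no default is given; a small illustrative sketch (not from upstream docs):

```
# Attribute access on the lxml backend: a default suppresses the IndexError.
from lxml.html import fromstring

from selection import XpathSelector

sel = XpathSelector(fromstring('<ul id="items"><li>1</li></ul>'))
ul = sel.select('//ul').one()
print(ul.attr('id'))                   # items
print(ul.attr('class', default=None))  # None
try:
    ul.attr('class')
except IndexError as exc:
    print(exc)                         # No such attribute: class
```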
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection/backend_pyquery.py
new/selection-0.0.21/selection/backend_pyquery.py
--- old/selection-0.0.14/selection/backend_pyquery.py 1970-01-01 01:00:00.000000000 +0100
+++ new/selection-0.0.21/selection/backend_pyquery.py 2022-12-12 16:57:36.000000000 +0100
@@ -0,0 +1,22 @@
+from __future__ import annotations
+
+import typing
+from collections.abc import Iterable
+from typing import Any, cast
+
+from pyquery import PyQuery
+
+from .backend_lxml import LxmlNodeSelector, LxmlNodeT
+
+__all__ = ["PyquerySelector"]
+
+
+class PyquerySelector(LxmlNodeSelector[LxmlNodeT]):
+ __slots__ = ()
+
+ def pyquery_node(self) -> Any:
+ return PyQuery(self.node())
+
+ def process_query(self, query: str) -> Iterable[LxmlNodeT]:
+ # pylint: disable=deprecated-typing-alias
+ return cast(typing.Iterable[LxmlNodeT], self.pyquery_node().find(query))
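The pyquery backend now lives in its own module and pulls in pyquery only through the optional extra; a minimal sketch, assuming `pip install selection[pyquery]`:

```
# PyquerySelector hands the query string to PyQuery.find(), so CSS selectors work.
from lxml.html import fromstring

from selection.backend_pyquery import PyquerySelector

sel = PyquerySelector(fromstring('<ul id="items"><li>1</li><li>2</li></ul>'))
print(sel.select('li').text_list())  # ['1', '2']
```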
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection/base.py
new/selection-0.0.21/selection/base.py
--- old/selection-0.0.14/selection/base.py 2018-08-07 17:01:27.000000000 +0200
+++ new/selection-0.0.21/selection/base.py 2022-12-28 00:33:04.000000000 +0100
@@ -1,244 +1,277 @@
+from __future__ import annotations
+
import logging
-from weblib.const import NULL
-from weblib.error import DataNotFound, RequiredDataNotFound
-from weblib.text import normalize_space as normalize_space_func
-from weblib.html import decode_entities
-from weblib.text import find_number
-from weblib import rex as rex_tools
-
-__all__ = ['Selector', 'SelectorList', 'RexResultList']
-logger = logging.getLogger('selection.base')
-XPATH_CACHE = {}
+import re
+from abc import abstractmethod
+from collections.abc import Iterable, Iterator
+from re import Match, Pattern
+from types import TracebackType
+from typing import Any, Generic, TypeVar
+
+from . import util
+from .const import UNDEFINED
+
+__all__ = ["Selector", "SelectorList", "RexResultList"]
+LOG = logging.getLogger("selection.base")
+T = TypeVar("T")
-class Selector(object):
- __slots__ = ('_node',)
+class Selector(Generic[T]):
+ __slots__ = ("_node",)
- def __init__(self, node):
+ def __init__(self, node: T):
self._node = node
- def node(self):
+ def node(self) -> T:
return self._node
- def select(self, query):
+ @abstractmethod
+ def process_query(self, query: str) -> Iterable[T]:
+ raise NotImplementedError
+
+ def select(self, query: str) -> "SelectorList[T]":
return self._wrap_node_list(self.process_query(query), query)
- def _wrap_node_list(self, nodes, query):
- selector_list = []
- for node in nodes:
- selector_list.append(self.__class__(node))
- return SelectorList(selector_list, self.__class__, query)
+ def _wrap_node_list(self, nodes: Iterable[T], query: str) -> "SelectorList[T]":
+ selectors = [self.__class__(x) for x in nodes]
+ return SelectorList(selectors, self.__class__, query)
- def is_text_node(self):
+ def is_text_node(self) -> bool:
raise NotImplementedError
- def html(self, encoding='unicode'):
+ @abstractmethod
+ def html(self) -> str:
raise NotImplementedError
- def attr(self, key, default=NULL):
+ def attr(self, key: str, default: Any = UNDEFINED) -> Any:
raise NotImplementedError
- def text(self, smart=False, normalize_space=True):
+ def text(self, smart: bool = False, normalize_space: bool = True) -> str:
raise NotImplementedError
- def number(self, default=NULL, ignore_spaces=False,
- smart=False, make_int=True):
+ def number(
+ self,
+ default: Any = UNDEFINED,
+ ignore_spaces: bool = False,
+ smart: bool = False,
+ make_int: bool = True,
+ ) -> Any:
try:
- return find_number(self.text(smart=smart),
- ignore_spaces=ignore_spaces,
- make_int=make_int)
+ return util.find_number(
+ self.text(smart=smart), ignore_spaces=ignore_spaces, make_int=make_int
+ )
except IndexError:
- if default is NULL:
+ if default is UNDEFINED:
raise
- else:
- return default
-
- def rex(self, regexp, flags=0):
- norm_regexp = rex_tools.normalize_regexp(regexp, flags)
- matches = list(norm_regexp.finditer(self.html()))
- return RexResultList(matches, source_rex=norm_regexp)
-
-
-class SelectorList(object):
- __slots__ = ('selector_list', 'origin_selector_class', 'origin_query')
+ return default
- def __init__(self, selector_list, origin_selector_class, origin_query):
+ def rex(
+ self, regexp: str | Pattern[str], flags: int = 0
+ ) -> "RexResultList": # pylint: disable=used-before-assignment
+
+ if isinstance(regexp, str):
+ regexp = re.compile(regexp, flags)
+ matches = list(regexp.finditer(self.html()))
+ return RexResultList(matches, source_rex=regexp)
+
+
+class SelectorList(Generic[T]):
+ __slots__ = ("selector_list", "origin_selector_class", "origin_query")
+
+ def __init__(
+ self,
+ selector_list: list[Selector[T]],
+ origin_selector_class: type[Selector[T]],
+ origin_query: str,
+ ) -> None:
self.selector_list = selector_list
self.origin_selector_class = origin_selector_class
self.origin_query = origin_query
- def __enter__(self):
+ def __enter__(self) -> "SelectorList[T]":
return self
- def __exit__(self, exc_type, exc_value, traceback):
+ def __exit__(
+ self, exc_type: type[Exception], exc_value: Exception, traceback: TracebackType
+ ) -> None:
pass
- def __getitem__(self, x):
- return self.selector_list[x]
+ def __getitem__(self, index: int) -> Selector[T]:
+ return self.selector_list[index]
- def __len__(self):
+ def __len__(self) -> int:
return self.count()
- def count(self):
+ def __iter__(self) -> Iterator[Selector[T]]:
+ return iter(self.selector_list)
+
+ def count(self) -> int:
return len(self.selector_list)
- def one(self, default=NULL):
+ def one(self, default: Any = UNDEFINED) -> Any:
try:
return self.selector_list[0]
- except IndexError:
- if default is NULL:
- m = 'Could not get first item for %s query of class %s'\
- % (self.origin_query, self.origin_selector_class.__name__)
- raise DataNotFound(m)
- else:
- return default
+ except IndexError as ex:
+ if default is UNDEFINED:
+ raise IndexError(
+ "Could not get first item for %s query of class %s"
+ % (
+ self.origin_query,
+ self.origin_selector_class.__name__,
+ )
+ ) from ex
+ return default
- def node(self, default=NULL):
+ def node(self, default: Any = UNDEFINED) -> Any:
try:
return self.one().node()
- except IndexError:
- if default is NULL:
- m = 'Could not get first item for %s query of class %s'\
- % (self.origin_query, self.origin_selector_class.__name__)
- raise DataNotFound(m)
- else:
- return default
-
- def text(self, default=NULL, smart=False, normalize_space=True):
+ except IndexError as ex:
+ if default is UNDEFINED:
+ raise IndexError(
+ "Could not get first item for %s query of class %s"
+ % (
+ self.origin_query,
+ self.origin_selector_class.__name__,
+ )
+ ) from ex
+ return default
+
+ def text(
+ self,
+ default: Any = UNDEFINED,
+ smart: bool = False,
+ normalize_space: bool = True,
+ ) -> Any:
try:
sel = self.one()
except IndexError:
- if default is NULL:
+ if default is UNDEFINED:
raise
- else:
- return default
+ return default
else:
return sel.text(smart=smart, normalize_space=normalize_space)
- def text_list(self, smart=False, normalize_space=True):
+ def text_list(self, smart: bool = False, normalize_space: bool = True) -> list[str]:
result_list = []
for item in self.selector_list:
- result_list.append(item.text(normalize_space=normalize_space,
- smart=smart))
+ result_list.append(item.text(normalize_space=normalize_space, smart=smart))
return result_list
- def html(self, default=NULL, encoding='unicode'):
+ def html(self, default: Any = UNDEFINED) -> Any:
try:
sel = self.one()
except IndexError:
- if default is NULL:
+ if default is UNDEFINED:
raise
- else:
- return default
+ return default
else:
- return sel.html(encoding=encoding)
+ return sel.html()
- def inner_html(self, default=NULL, encoding='unicode'):
+ def inner_html(self, default: Any = UNDEFINED) -> Any:
try:
sel = self.one()
except IndexError:
- if default is NULL:
+ if default is UNDEFINED:
raise
- else:
- return default
+ return default
else:
- result_list = [item.html(encoding=encoding) for item in sel.select('./*')]
- return ''.join(result_list).strip()
-
- def number(self, default=NULL, ignore_spaces=False,
- smart=False, make_int=True):
- """
- Find number in normalized text of node which matches the given xpath.
- """
+ result_list = [item.html() for item in sel.select("./*")]
+ return "".join(result_list).strip()
+ def number(
+ self,
+ default: Any = UNDEFINED,
+ ignore_spaces: bool = False,
+ smart: bool = False,
+ make_int: bool = True,
+ ) -> Any:
+ """Find number in normalized text of node which matches the given
xpath."""
try:
sel = self.one()
except IndexError:
- if default is NULL:
+ if default is UNDEFINED:
raise
- else:
- return default
+ return default
else:
- return sel.number(ignore_spaces=ignore_spaces, smart=smart,
- default=default, make_int=make_int)
-
- def exists(self):
- """
- Return True if selector list is not empty.
- """
+ return sel.number(
+ ignore_spaces=ignore_spaces,
+ smart=smart,
+ default=default,
+ make_int=make_int,
+ )
+ def exists(self) -> bool:
+ """Return True if selector list is not empty."""
return len(self.selector_list) > 0
- def require(self):
- """
- Raise RequiredDataNotFound if selector data does not exist.
- """
-
+ def require(self) -> None:
+ """Raise IndexError if selector data does not exist."""
if not self.exists():
- raise RequiredDataNotFound(
- u'Node does not exists, query: %s, query type: %s' % (
+ raise IndexError(
+ "Node does not exists, query: %s, query type: %s"
+ % (
self.origin_query,
self.origin_selector_class.__name__,
)
)
- def attr(self, key, default=NULL):
+ def attr(self, key: str, default: Any = UNDEFINED) -> Any:
try:
sel = self.one()
except IndexError:
- if default is NULL:
+ if default is UNDEFINED:
raise
- else:
- return default
+ return default
else:
return sel.attr(key, default=default)
- def attr_list(self, key, default=NULL):
+ def attr_list(self, key: str, default: Any = UNDEFINED) -> Any:
result_list = []
for item in self.selector_list:
result_list.append(item.attr(key, default=default))
return result_list
- def rex(self, regexp, flags=0, default=NULL):
+ def rex(
+ self, regexp: Pattern[str], flags: int = 0, default: Any = UNDEFINED
+ ) -> Any:
try:
sel = self.one()
except IndexError:
- if default is NULL:
+ if default is UNDEFINED:
raise
- else:
- return default
+ return default
else:
return sel.rex(regexp, flags=flags)
- def node_list(self):
+ def node_list(self) -> list[Any]:
return [x.node() for x in self.selector_list]
- def select(self, query):
- result = SelectorList([], self.origin_selector_class,
- self.origin_query + ' + ' + query)
+ def select(self, query: str) -> "SelectorList[T]":
+ result: SelectorList[T] = SelectorList(
+ [], self.origin_selector_class, self.origin_query + " + " + query
+ )
for selector in self.selector_list:
- result.selector_list.extend(selector.select(query))
+ result.selector_list.extend(iter(selector.select(query)))
return result
-class RexResultList(object):
- __slots__ = ('items', 'source_rex')
+class RexResultList:
+ __slots__ = ("items", "source_rex")
- def __init__(self, items, source_rex):
+ def __init__(self, items: list[Match[str]], source_rex: Pattern[str]) -> None:
self.items = items
self.source_rex = source_rex
- def one(self):
+ def one(self) -> Match[str]:
return self.items[0]
- def text(self, default=NULL):
+ def text(self, default: Any = UNDEFINED) -> Any:
try:
- return normalize_space_func(decode_entities(self.one().group(1)))
- except (AttributeError, IndexError):
- if default is NULL:
- raise DataNotFound
- else:
- return default
+ return util.normalize_spaces(util.decode_entities(self.one().group(1)))
+ except (AttributeError, IndexError) as ex:
+ if default is UNDEFINED:
+ raise IndexError from ex
+ return default
- def number(self):
+ def number(self) -> int:
return int(self.text())
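The SelectorList helpers keep their old semantics but now raise a plain IndexError when a lookup fails and no default is passed; a short illustration based on the code above:

```
# SelectorList convenience methods after the weblib removal.
from lxml.html import fromstring

from selection import XpathSelector

sel = XpathSelector(fromstring('<div><b>Price: 150 usd</b></div>'))
print(sel.select('//b').exists())                  # True
print(sel.select('//b').number())                  # 150
print(sel.select('//i').text(default='n/a'))       # n/a (no IndexError)
print(sel.select('//b').rex(r'(\d+) usd').text())  # 150
```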
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection/const.py
new/selection-0.0.21/selection/const.py
--- old/selection-0.0.14/selection/const.py 1970-01-01 01:00:00.000000000 +0100
+++ new/selection-0.0.21/selection/const.py 2022-12-07 21:36:13.000000000 +0100
@@ -0,0 +1 @@
+UNDEFINED = object()
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection/error.py
new/selection-0.0.21/selection/error.py
--- old/selection-0.0.14/selection/error.py 2017-02-06 04:18:31.000000000 +0100
+++ new/selection-0.0.21/selection/error.py 1970-01-01 01:00:00.000000000 +0100
@@ -1,2 +0,0 @@
-class SelectionRuntimeError(Exception):
- pass
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection/util.py
new/selection-0.0.21/selection/util.py
--- old/selection-0.0.14/selection/util.py 1970-01-01 01:00:00.000000000 +0100
+++ new/selection-0.0.21/selection/util.py 2022-12-12 16:56:12.000000000 +0100
@@ -0,0 +1,144 @@
+"""Helpful things used in multiple modules of selection package.
+
+Most of this module contents is a copy-paste from weblib package. It is done to
+drop outdated weblib dependency.
+"""
+from __future__ import annotations
+
+import re
+from html.entities import name2codepoint
+from re import Match
+from typing import List, cast
+
+import lxml.html
+from lxml.etree import _Element
+
+RE_NUMBER = re.compile(r"\d+")
+RE_NUMBER_WITH_SPACES = re.compile(r"\d[\s\d]*")
+RE_SPACE = re.compile(r"\s+")
+RE_NAMED_ENTITY = re.compile(r"(&[a-z]+;)")
+RE_NUM_ENTITY = re.compile(r"(&#[0-9]+;)")
+RE_HEX_ENTITY = re.compile(r"(&#x[a-f0-9]+;)", re.I)
+
+
+def normalize_spaces(val: str) -> str:
+ return re.sub(r"\s+", " ", val).strip()
+
+
+def drop_spaces(val: str) -> str:
+ """Drop all space-chars in the `text`."""
+ return RE_SPACE.sub("", val)
+
+
+def find_number(
+ text: str,
+ ignore_spaces: bool = False,
+ make_int: bool = True,
+ ignore_chars: None | str | list[str] = None,
+) -> str | int:
+ """Find the number in the `text`.
+
+ :param text: str
+ :param ignore_spaces: if True then consider groups of digits delimited
+ by spaces as a single number
+
+ Raises IndexError if number was not found.
+ """
+ if ignore_chars:
+ for char in ignore_chars:
+ text = text.replace(char, "")
+ rex = RE_NUMBER_WITH_SPACES if ignore_spaces else RE_NUMBER
+ match = rex.search(text)
+ if match:
+ val = match.group(0)
+ if ignore_spaces:
+ val = drop_spaces(val)
+ if make_int:
+ return int(val)
+ return val
+ raise IndexError("Could not find a number in given text")
+
+
+def process_named_entity(match: Match[str]) -> str:
+ entity = match.group(1)
+ name = entity[1:-1]
+ if name in name2codepoint:
+ return chr(name2codepoint[name])
+ return entity
+
+
+def process_num_entity(match: Match[str]) -> str:
+ entity = match.group(1)
+ num = entity[2:-1]
+ try:
+ return chr(int(num))
+ except ValueError:
+ return entity
+
+
+def process_hex_entity(match: Match[str]) -> str:
+ entity = match.group(1)
+ code = entity[3:-1]
+ try:
+ return chr(int(code, 16))
+ except ValueError:
+ return entity
+
+
+def decode_entities(html: str) -> str:
+ """Convert all HTML entities into their unicode representations.
+
+ This functions processes following entities:
+ * &XXX;
+ * &#XXX;
+
+ Example::
+
+ >>> print html.decode_entities('&rarr;ABC R&copy;')
+ →ABC R©
+ """
+ html = RE_NUM_ENTITY.sub(process_num_entity, html)
+ html = RE_HEX_ENTITY.sub(process_hex_entity, html)
+ return RE_NAMED_ENTITY.sub(process_named_entity, html)
+
+
+def render_html(node: _Element) -> str:
+ """Render Element node."""
+ return lxml.html.tostring(
+ cast(lxml.html.HtmlElement, node), encoding="utf-8"
+ ).decode("utf-8")
+
+
+def get_node_text(
+ node: _Element, smart: bool = False, normalize_space: bool = True
+) -> str:
+ """Extract text content of the `node` and all its descendants.
+
+ In smart mode `get_node_text` insert spaces between <tag><another tag>
+ and also ignores content of the script and style tags.
+
+ In non-smart mode this func just return text_content() of node
+ with normalized spaces
+ """
+ if isinstance(node, str):
+ value = str(node)
+ elif smart:
+ # pylint: disable=deprecated-typing-alias
+ value = " ".join(
+ cast(
+ List[str],
+ node.xpath(
+ './descendant-or-self::*[name() != "script" and '
+ 'name() != "style"]/text()[normalize-space()]'
+ ),
+ )
+ )
+ elif isinstance(node, lxml.html.HtmlElement):
+ value = node.text_content()
+ else:
+ # If DOM tree was built with lxml.etree.fromstring
+ # then tree nodes do not have text_content() method
+ value = "".join(map(str, node.xpath(".//text()")))
+ if normalize_space:
+ return normalize_spaces(value)
+ return value
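selection.util bundles the handful of weblib helpers the package still needs; their behaviour, per the definitions above:

```
# Helpers copied from weblib into selection.util (see module docstring above).
from selection import util

print(util.normalize_spaces('  a \n b  '))                         # a b
print(util.find_number('around 1 200 pages', ignore_spaces=True))  # 1200
print(util.decode_entities('&rarr;ABC R&copy;'))                   # →ABC R©
```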
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection.egg-info/PKG-INFO
new/selection-0.0.21/selection.egg-info/PKG-INFO
--- old/selection-0.0.14/selection.egg-info/PKG-INFO 2018-08-07 17:09:16.000000000 +0200
+++ new/selection-0.0.21/selection.egg-info/PKG-INFO 2022-12-28 00:36:24.000000000 +0100
@@ -1,17 +1,83 @@
-Metadata-Version: 1.1
+Metadata-Version: 2.1
Name: selection
-Version: 0.0.14
+Version: 0.0.21
Summary: API to extract content from HTML & XML documents
-Home-page: UNKNOWN
-Author: Gregory Petukhov
-Author-email: [email protected]
-License: MIT
-Description: UNKNOWN
-Platform: UNKNOWN
+Author-email: Gregory Petukhov <[email protected]>
+License: The MIT License (MIT)
+
+ Copyright (c) 2015-2022, Gregory Petukhov
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+Project-URL: homepage, http://github.com/lorien/selection
+Keywords: lxml,dom,html
Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 2.7
-Classifier: Programming Language :: Python :: 3.4
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.7
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: License :: OSI Approved :: MIT License
-Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Topic :: Utilities
+Classifier: Typing :: Typed
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Provides-Extra: pyquery
+License-File: LICENSE
+
+# Selection Documenation
+
+[](https://github.com/lorien/selection/actions/workflows/test.yml)
+[](https://github.com/lorien/selection/actions/workflows/code_quality.yml)
+[](https://github.com/lorien/selection/actions/workflows/mypy.yml)
+[](https://coveralls.io/r/lorien/selection?branch=master)
+
+API to query DOM tree of HTML/XML document.
+
+
+## Usage Example
+
+```
+from selection import XpathSelector
+from lxml.html import fromstring
+
+html = '<div><h1>test</h1><ul id="items"><li>1</li><li>2</li></ul></div>'
+sel = XpathSelector(fromstring(html))
+print(sel.select('//h1')).text()
+print(sel.select('//li').text_list()
+print(sel.select('//ul').attr('id')
+```
+
+
+## Installation
+
+Run: `pip install -U selection`
+
+
+## Community
+
+Telegram English chat: [https://t.me/grablab](https://t.me/grablab)
+
+Telegram Russian chat: [https://t.me/grablab\_ru](https://t.me/grablab_ru)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection.egg-info/SOURCES.txt
new/selection-0.0.21/selection.egg-info/SOURCES.txt
--- old/selection-0.0.14/selection.egg-info/SOURCES.txt 2018-08-07 17:09:16.000000000 +0200
+++ new/selection-0.0.21/selection.egg-info/SOURCES.txt 2022-12-28 00:36:24.000000000 +0100
@@ -1,11 +1,13 @@
LICENSE
-MANIFEST.in
-README.rst
-setup.py
+README.md
+pyproject.toml
selection/__init__.py
-selection/backend.py
+selection/backend_lxml.py
+selection/backend_pyquery.py
selection/base.py
-selection/error.py
+selection/const.py
+selection/py.typed
+selection/util.py
selection.egg-info/PKG-INFO
selection.egg-info/SOURCES.txt
selection.egg-info/dependency_links.txt
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/selection.egg-info/requires.txt
new/selection-0.0.21/selection.egg-info/requires.txt
--- old/selection-0.0.14/selection.egg-info/requires.txt 2018-08-07 17:09:16.000000000 +0200
+++ new/selection-0.0.21/selection.egg-info/requires.txt 2022-12-28 00:36:24.000000000 +0100
@@ -1,5 +1,6 @@
-weblib
-six
[:platform_system != "Windows"]
lxml
+
+[pyquery]
+pyquery
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/selection-0.0.14/setup.py
new/selection-0.0.21/setup.py
--- old/selection-0.0.14/setup.py 2018-08-07 17:08:51.000000000 +0200
+++ new/selection-0.0.21/setup.py 1970-01-01 01:00:00.000000000 +0100
@@ -1,25 +0,0 @@
-from setuptools import setup, find_packages
-
-setup(
- name = 'selection',
- version = '0.0.14',
- description = 'API to extract content from HTML & XML documents',
- author = 'Gregory Petukhov',
- author_email = '[email protected]',
- install_requires = [
- 'lxml;platform_system!="Windows"',
- 'weblib',
- 'six',
- ],
- packages = find_packages(exclude=['test']),
- license = "MIT",
- classifiers = [
- 'Programming Language :: Python',
- 'Programming Language :: Python :: 2.7',
- 'Programming Language :: Python :: 3.4',
- 'Programming Language :: Python :: Implementation :: CPython',
- 'License :: OSI Approved :: MIT License',
- 'Topic :: Software Development :: Libraries :: Application Frameworks',
- 'Topic :: Software Development :: Libraries :: Python Modules',
- ],
-)