Script 'mail_helper' called by obssrc

Hello community,

here is the log from the commit of package python-tldextract for openSUSE:Factory checked in at 2023-05-21 19:09:08
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-tldextract (Old)
 and      /work/SRC/openSUSE:Factory/.python-tldextract.new.1533 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-tldextract" Sun May 21 19:09:08 2023 rev:18 rq:1088132 version:3.4.4 Changes: -------- --- /work/SRC/openSUSE:Factory/python-tldextract/python-tldextract.changes 2023-05-12 20:40:02.534775795 +0200 +++ /work/SRC/openSUSE:Factory/.python-tldextract.new.1533/python-tldextract.changes 2023-05-21 19:09:52.174832852 +0200 @@ -1,0 +2,24 @@ +Sun May 21 13:02:41 UTC 2023 - Mia Herkt <m...@0x0.st> + +- Update to 3.4.4: +Bugfixes + * Honor private domains flag on self, not only when passed to + __call__ + #gh/john-kurkowski/tldextract#289 +- Changes in 3.4.3: +Bugfixes + * Speed up 10-15% over all inputs + * Refactor suffix_index() to use a trie + #gh/john-kurkowski/tldextract#285 +Docs + * Adopt PEP257 doc style +- Changes in 3.4.2: +Bugfixes + * Speed up 10-40% on "average" inputs, and even more on + pathological inputs, like long subdomains + * Optimize suffix_index(): search from right to left + #gh/john-kurkowski/tldextract#283 + * Optimize netloc extraction: switch from regex to if/else + #gh/john-kurkowski/tldextract#284 + +------------------------------------------------------------------- Old: ---- tldextract-3.4.1.tar.gz New: ---- tldextract-3.4.4.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-tldextract.spec ++++++ --- /var/tmp/diff_new_pack.tkAnGX/_old 2023-05-21 19:09:52.594835250 +0200 +++ /var/tmp/diff_new_pack.tkAnGX/_new 2023-05-21 19:09:52.598835272 +0200 @@ -18,7 +18,7 @@ %define oldpython python Name: python-tldextract -Version: 3.4.1 +Version: 3.4.4 Release: 0 Summary: Python module to separate the TLD of a URL License: BSD-3-Clause ++++++ tldextract-3.4.1.tar.gz -> tldextract-3.4.4.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/CHANGELOG.md new/tldextract-3.4.4/CHANGELOG.md --- old/tldextract-3.4.1/CHANGELOG.md 2023-04-27 01:26:27.000000000 +0200 +++ new/tldextract-3.4.4/CHANGELOG.md 2023-05-20 02:30:51.000000000 +0200 @@ -3,6 +3,26 @@ After upgrading, update your cache file by deleting it or via `tldextract --update`. +## 3.4.4 (2023-05-19) + +* Bugfixes + * Honor private domains flag on `self`, not only when passed to `__call__` ([#289](https://github.com/john-kurkowski/tldextract/issues/289)) + +## 3.4.3 (2023-05-18) + +* Bugfixes + * Speed up 10-15% over all inputs + * Refactor `suffix_index()` to use a trie ([#285](https://github.com/john-kurkowski/tldextract/issues/285)) +* Docs + * Adopt PEP257 doc style + +## 3.4.2 (2023-05-16) + +* Bugfixes + * Speed up 10-40% on "average" inputs, and even more on pathological inputs, like long subdomains + * Optimize `suffix_index()`: search from right to left ([#283](https://github.com/john-kurkowski/tldextract/issues/283)) + * Optimize netloc extraction: switch from regex to if/else ([#284](https://github.com/john-kurkowski/tldextract/issues/284)) + ## 3.4.1 (2023-04-26) * Bugfixes diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/PKG-INFO new/tldextract-3.4.4/PKG-INFO --- old/tldextract-3.4.1/PKG-INFO 2023-04-27 01:31:34.373349000 +0200 +++ new/tldextract-3.4.4/PKG-INFO 2023-05-20 02:33:31.880953800 +0200 @@ -1,6 +1,6 @@ Metadata-Version: 2.1 Name: tldextract -Version: 3.4.1 +Version: 3.4.4 Summary: Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. 
You can optionally support the Public Suffix List's private domains as well. Home-page: https://github.com/john-kurkowski/tldextract Author: John Kurkowski @@ -20,8 +20,9 @@ Description-Content-Type: text/markdown License-File: LICENSE - `tldextract` accurately separates a URL's subdomain, domain, and public suffix, -using the Public Suffix List (PSL). +`tldextract` accurately separates a URL's subdomain, domain, and public suffix. + +It does this via the Public Suffix List (PSL). >>> import tldextract >>> tldextract.extract('http://forums.news.cnn.com/') diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/setup.py new/tldextract-3.4.4/setup.py --- old/tldextract-3.4.1/setup.py 2023-04-27 01:28:16.000000000 +0200 +++ new/tldextract-3.4.4/setup.py 2023-05-20 02:25:26.000000000 +0200 @@ -1,5 +1,6 @@ -""" `tldextract` accurately separates a URL's subdomain, domain, and public suffix, -using the Public Suffix List (PSL). +"""`tldextract` accurately separates a URL's subdomain, domain, and public suffix. + +It does this via the Public Suffix List (PSL). >>> import tldextract >>> tldextract.extract('http://forums.news.cnn.com/') diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tests/__init__.py new/tldextract-3.4.4/tests/__init__.py --- old/tldextract-3.4.1/tests/__init__.py 2023-01-12 02:07:59.000000000 +0100 +++ new/tldextract-3.4.4/tests/__init__.py 2023-05-20 02:25:26.000000000 +0200 @@ -0,0 +1 @@ +"""Package tests.""" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tests/conftest.py new/tldextract-3.4.4/tests/conftest.py --- old/tldextract-3.4.1/tests/conftest.py 2023-01-12 02:07:59.000000000 +0100 +++ new/tldextract-3.4.4/tests/conftest.py 2023-05-20 02:25:26.000000000 +0200 @@ -3,13 +3,16 @@ import logging import pytest + import tldextract.cache @pytest.fixture(autouse=True) def reset_log_level(): - """Automatically reset log level verbosity between tests. Generally want - test output the Unix way: silence is golden.""" + """Automatically reset log level verbosity between tests. + + Generally want test output the Unix way: silence is golden. + """ tldextract.cache._DID_LOG_UNABLE_TO_CACHE = ( # pylint: disable=protected-access False ) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tests/main_test.py new/tldextract-3.4.4/tests/main_test.py --- old/tldextract-3.4.1/tests/main_test.py 2023-04-10 03:08:44.000000000 +0200 +++ new/tldextract-3.4.4/tests/main_test.py 2023-05-20 02:30:02.000000000 +0200 @@ -7,6 +7,7 @@ import pytest import responses + import tldextract import tldextract.suffix_list from tldextract.cache import DiskCache @@ -36,9 +37,11 @@ extract_using_fallback_to_snapshot_no_cache, ), ) -> None: - """Test helper to compare all the expected and actual attributes and - properties of an extraction. Runs the same comparison across several - permutations of tldextract instance configurations.""" + """Test helper to compare all expected and actual attributes of an extraction. + + Runs the same comparison across several permutations of tldextract instance + configurations. 
+ """ ( expected_fqdn, expected_subdomain, @@ -84,6 +87,14 @@ def test_suffix(): assert_extract("com", ("", "", "", "com")) assert_extract("co.uk", ("", "", "", "co.uk")) + assert_extract("example.ck", ("", "", "", "example.ck")) + assert_extract("www.example.ck", ("www.example.ck", "", "www", "example.ck")) + assert_extract( + "sub.www.example.ck", ("sub.www.example.ck", "sub", "www", "example.ck") + ) + assert_extract("www.ck", ("www.ck", "", "www", "ck")) + assert_extract("nes.buskerud.no", ("", "", "", "nes.buskerud.no")) + assert_extract("buskerud.no", ("buskerud.no", "", "buskerud", "no")) def test_local_host(): @@ -187,9 +198,7 @@ def test_idna_2008(): - """Python supports IDNA 2003. - The IDNA library adds 2008 support for characters like Ã. - """ + """Python supports IDNA 2003. The IDNA library adds 2008 support for characters like Ã.""" assert_extract( "xn--gieen46ers-73a.de", ("xn--gieen46ers-73a.de", "", "xn--gieen46ers-73a", "de"), @@ -205,6 +214,13 @@ def test_scheme(): + assert_extract("//", ("", "", "", "")) + assert_extract("://", ("", "", "", "")) + assert_extract("://example.com", ("", "", "", "")) + assert_extract("a+-.://example.com", ("example.com", "", "example", "com")) + assert_extract("a#//example.com", ("", "", "a", "")) + assert_extract("a@://example.com", ("", "", "", "")) + assert_extract("#//example.com", ("", "", "", "")) assert_extract( "https://mail.google.com/mail", ("mail.google.com", "mail", "google", "com") ) @@ -272,10 +288,29 @@ # ('www.net.cn', 'www', 'net', 'cn')) +def test_no_1st_level_tld(): + assert_extract("za", ("", "", "za", "")) + assert_extract("example.za", ("", "example", "za", "")) + assert_extract("co.za", ("", "", "", "co.za")) + assert_extract("example.co.za", ("example.co.za", "", "example", "co.za")) + assert_extract( + "sub.example.co.za", ("sub.example.co.za", "sub", "example", "co.za") + ) + + def test_dns_root_label(): assert_extract( "http://www.example.com./", ("www.example.com", "www", "example", "com") ) + assert_extract( + "http://www.example.com\u3002/", ("www.example.com", "www", "example", "com") + ) + assert_extract( + "http://www.example.com\uff0e/", ("www.example.com", "www", "example", "com") + ) + assert_extract( + "http://www.example.com\uff61/", ("www.example.com", "www", "example", "com") + ) def test_private_domains(): @@ -317,7 +352,6 @@ def test_cache_permission(mocker, monkeypatch, tmpdir): """Emit a warning once that this can't cache the latest PSL.""" - warning = mocker.patch.object(logging.getLogger("tldextract.cache"), "warning") def no_permission_makedirs(*args, **kwargs): @@ -350,6 +384,17 @@ tldextract.suffix_list.find_first_response(cache, [server], 5) +def test_include_psl_private_domain_attr(): + extract_private = tldextract.TLDExtract(include_psl_private_domains=True) + extract_public = tldextract.TLDExtract(include_psl_private_domains=False) + assert extract_private("foo.uk.com") == ExtractResult( + subdomain="", domain="foo", suffix="uk.com" + ) + assert extract_public("foo.uk.com") == ExtractResult( + subdomain="foo", domain="uk", suffix="com" + ) + + def test_tlds_property(): extract_private = tldextract.TLDExtract( cache_dir=None, suffix_list_urls=(), include_psl_private_domains=True @@ -367,3 +412,32 @@ assert tldextract.extract( "foo.blogspot.com", include_psl_private_domains=True ) == ExtractResult(subdomain="", domain="foo", suffix="blogspot.com") + assert tldextract.extract( + "s3.ap-south-1.amazonaws.com", include_psl_private_domains=True + ) == ExtractResult(subdomain="", domain="", 
suffix="s3.ap-south-1.amazonaws.com") + assert tldextract.extract( + "the-quick-brown-fox.ap-south-1.amazonaws.com", include_psl_private_domains=True + ) == ExtractResult( + subdomain="the-quick-brown-fox.ap-south-1", domain="amazonaws", suffix="com" + ) + assert tldextract.extract( + "ap-south-1.amazonaws.com", include_psl_private_domains=True + ) == ExtractResult(subdomain="ap-south-1", domain="amazonaws", suffix="com") + assert tldextract.extract( + "amazonaws.com", include_psl_private_domains=True + ) == ExtractResult(subdomain="", domain="amazonaws", suffix="com") + assert tldextract.extract( + "s3.cn-north-1.amazonaws.com.cn", include_psl_private_domains=True + ) == ExtractResult(subdomain="", domain="", suffix="s3.cn-north-1.amazonaws.com.cn") + assert tldextract.extract( + "the-quick-brown-fox.cn-north-1.amazonaws.com.cn", + include_psl_private_domains=True, + ) == ExtractResult( + subdomain="the-quick-brown-fox.cn-north-1", domain="amazonaws", suffix="com.cn" + ) + assert tldextract.extract( + "cn-north-1.amazonaws.com.cn", include_psl_private_domains=True + ) == ExtractResult(subdomain="cn-north-1", domain="amazonaws", suffix="com.cn") + assert tldextract.extract( + "amazonaws.com.cn", include_psl_private_domains=True + ) == ExtractResult(subdomain="", domain="amazonaws", suffix="com.cn") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tests/test_cache.py new/tldextract-3.4.4/tests/test_cache.py --- old/tldextract-3.4.1/tests/test_cache.py 2023-01-12 02:07:59.000000000 +0100 +++ new/tldextract-3.4.4/tests/test_cache.py 2023-05-20 02:25:26.000000000 +0200 @@ -1,4 +1,4 @@ -"""Test the caching functionality""" +"""Test the caching functionality.""" import os.path import sys import types diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tests/test_parallel.py new/tldextract-3.4.4/tests/test_parallel.py --- old/tldextract-3.4.1/tests/test_parallel.py 2023-01-12 02:07:59.000000000 +0100 +++ new/tldextract-3.4.4/tests/test_parallel.py 2023-05-20 02:25:26.000000000 +0200 @@ -1,15 +1,16 @@ -"""Test ability to run in parallel with shared cache""" +"""Test ability to run in parallel with shared cache.""" import os import os.path from multiprocessing import Pool import responses + from tldextract import TLDExtract from tldextract.tldextract import PUBLIC_SUFFIX_LIST_URLS def test_multiprocessing_makes_one_request(tmpdir): - """Ensure there aren't duplicate download requests""" + """Ensure there aren't duplicate download requests.""" process_count = 3 with Pool(processes=process_count) as pool: http_request_counts = pool.map(_run_extractor, [str(tmpdir)] * process_count) @@ -18,7 +19,7 @@ @responses.activate def _run_extractor(cache_dir): - """run the extractor""" + """Run the extractor.""" responses.add(responses.GET, PUBLIC_SUFFIX_LIST_URLS[0], status=208, body="uk.co") extract = TLDExtract(cache_dir=cache_dir) @@ -28,7 +29,7 @@ @responses.activate def test_cache_cleared_by_other_process(tmpdir, monkeypatch): - """Simulate a file being deleted after we check for existence but before we try to delete it""" + """Simulate a file being deleted after we check for existence but before we try to delete it.""" responses.add(responses.GET, PUBLIC_SUFFIX_LIST_URLS[0], status=208, body="uk.com") cache_dir = str(tmpdir) @@ -37,7 +38,7 @@ orig_unlink = os.unlink def evil_unlink(filename): - """Simulates someone delete the file right before we try to""" + """Simulate someone deletes 
the file right before we try to.""" if filename.startswith(cache_dir): orig_unlink(filename) orig_unlink(filename) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tests/test_trie.py new/tldextract-3.4.4/tests/test_trie.py --- old/tldextract-3.4.1/tests/test_trie.py 1970-01-01 01:00:00.000000000 +0100 +++ new/tldextract-3.4.4/tests/test_trie.py 2023-05-20 02:25:26.000000000 +0200 @@ -0,0 +1,53 @@ +"""Trie tests.""" +from itertools import permutations + +from tldextract.tldextract import Trie + + +def test_nested_dict() -> None: + original_keys_sequence = [ + ["a"], + ["a", "d"], + ["a", "b"], + ["a", "b", "c"], + ["c"], + ["c", "b"], + ["d", "f"], + ] + for keys_sequence in permutations(original_keys_sequence): + trie = Trie() + for keys in keys_sequence: + trie.add_suffix(keys) + # check each nested value + # Top level c + assert "c" in trie.matches + top_c = trie.matches["c"] + assert len(top_c.matches) == 1 + assert "b" in top_c.matches + assert top_c.end + # Top level a + assert "a" in trie.matches + top_a = trie.matches["a"] + assert len(top_a.matches) == 2 + # a -> d + assert "d" in top_a.matches + a_to_d = top_a.matches["d"] + assert not a_to_d.matches + # a -> b + assert "b" in top_a.matches + a_to_b = top_a.matches["b"] + assert a_to_b.end + assert len(a_to_b.matches) == 1 + # a -> b -> c + assert "c" in a_to_b.matches + a_to_b_to_c = a_to_b.matches["c"] + assert not a_to_b_to_c.matches + assert top_a.end + # d -> f + assert "d" in trie.matches + top_d = trie.matches["d"] + assert not top_d.end + assert "f" in top_d.matches + d_to_f = top_d.matches["f"] + assert d_to_f.end + assert not d_to_f.matches diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tldextract/_version.py new/tldextract-3.4.4/tldextract/_version.py --- old/tldextract-3.4.1/tldextract/_version.py 2023-04-27 01:31:34.000000000 +0200 +++ new/tldextract-3.4.4/tldextract/_version.py 2023-05-20 02:33:31.000000000 +0200 @@ -1,4 +1,4 @@ # file generated by setuptools_scm # don't change, don't track in version control -__version__ = version = '3.4.1' -__version_tuple__ = version_tuple = (3, 4, 1) +__version__ = version = '3.4.4' +__version_tuple__ = version_tuple = (3, 4, 4) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tldextract/cache.py new/tldextract-3.4.4/tldextract/cache.py --- old/tldextract-3.4.1/tldextract/cache.py 2023-04-10 03:08:44.000000000 +0200 +++ new/tldextract-3.4.4/tldextract/cache.py 2023-05-20 02:25:26.000000000 +0200 @@ -1,4 +1,4 @@ -"""Helpers """ +"""Helpers.""" import errno import hashlib import json @@ -30,7 +30,7 @@ def get_pkg_unique_identifier() -> str: """ - Generate an identifier unique to the python version, tldextract version, and python instance + Generate an identifier unique to the python version, tldextract version, and python instance. This will prevent interference between virtualenvs and issues that might arise when installing a new version of tldextract @@ -61,7 +61,7 @@ def get_cache_dir() -> str: """ - Get a cache dir that we have permission to write to + Get a cache dir that we have permission to write to. 
Try to follow the XDG standard, but if that doesn't work fallback to the package directory http://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html @@ -86,7 +86,7 @@ class DiskCache: - """Disk _cache that only works for jsonable values""" + """Disk _cache that only works for jsonable values.""" def __init__(self, cache_dir: Optional[str], lock_timeout: int = 20): self.enabled = bool(cache_dir) @@ -115,7 +115,7 @@ def set( self, namespace: str, key: Union[str, Dict[str, Hashable]], value: object ) -> None: - """Set a value in the disk cache""" + """Set a value in the disk cache.""" if not self.enabled: return @@ -142,7 +142,7 @@ _DID_LOG_UNABLE_TO_CACHE = True def clear(self) -> None: - """Clear the disk cache""" + """Clear the disk cache.""" for root, _, files in os.walk(self.cache_dir): for filename in files: if filename.endswith(self.file_ext) or filename.endswith( @@ -175,7 +175,7 @@ kwargs: Dict[str, Hashable], hashed_argnames: Iterable[str], ) -> T: - """Get a url but cache the response""" + """Get a url but cache the response.""" if not self.enabled: return func(**kwargs) @@ -215,7 +215,7 @@ def cached_fetch_url( self, session: requests.Session, url: str, timeout: Union[float, int, None] ) -> str: - """Get a url but cache the response""" + """Get a url but cache the response.""" return self.run_and_cache( func=_fetch_url, namespace="urls", @@ -241,7 +241,7 @@ def _make_dir(filename: str) -> None: - """Make a directory if it doesn't already exist""" + """Make a directory if it doesn't already exist.""" if not os.path.exists(os.path.dirname(filename)): try: os.makedirs(os.path.dirname(filename)) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tldextract/cli.py new/tldextract-3.4.4/tldextract/cli.py --- old/tldextract-3.4.1/tldextract/cli.py 2023-04-10 03:15:19.000000000 +0200 +++ new/tldextract-3.4.4/tldextract/cli.py 2023-05-20 02:25:26.000000000 +0200 @@ -1,4 +1,4 @@ -"""tldextract CLI""" +"""tldextract CLI.""" import argparse @@ -12,7 +12,7 @@ def main() -> None: - """tldextract CLI main command.""" + """Tldextract CLI main command.""" logging.basicConfig() parser = argparse.ArgumentParser( diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tldextract/remote.py new/tldextract-3.4.4/tldextract/remote.py --- old/tldextract-3.4.1/tldextract/remote.py 2023-01-12 02:07:59.000000000 +0100 +++ new/tldextract-3.4.4/tldextract/remote.py 2023-05-20 02:25:26.000000000 +0200 @@ -1,36 +1,51 @@ -"tldextract helpers for testing and fetching remote resources." +"""tldextract helpers for testing and fetching remote resources.""" import re import socket from urllib.parse import scheme_chars IP_RE = re.compile( - # pylint: disable-next=line-too-long - r"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$" + r"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.)" + r"{3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$" ) -SCHEME_RE = re.compile(r"^([" + scheme_chars + "]+:)?//") +scheme_chars_set = set(scheme_chars) def lenient_netloc(url: str) -> str: - """Extract the netloc of a URL-like string, similar to the netloc attribute - returned by urllib.parse.{urlparse,urlsplit}, but extract more leniently, - without raising errors.""" + """Extract the netloc of a URL-like string. + Similar to the netloc attribute returned by + urllib.parse.{urlparse,urlsplit}, but extract more leniently, without + raising errors. 
+ """ return ( - SCHEME_RE.sub("", url) + _schemeless_url(url) .partition("/")[0] .partition("?")[0] .partition("#")[0] - .split("@")[-1] + .rpartition("@")[-1] .partition(":")[0] .strip() - .rstrip(".") + .rstrip(".\u3002\uff0e\uff61") ) +def _schemeless_url(url: str) -> str: + double_slashes_start = url.find("//") + if double_slashes_start == 0: + return url[2:] + if ( + double_slashes_start < 2 + or not url[double_slashes_start - 1] == ":" + or set(url[: double_slashes_start - 1]) - scheme_chars_set + ): + return url + return url[double_slashes_start + 2 :] + + def looks_like_ip(maybe_ip: str) -> bool: - """Does the given str look like an IP address?""" + """Check whether the given str looks like an IP address.""" if not maybe_ip[0].isdigit(): return False diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tldextract/suffix_list.py new/tldextract-3.4.4/tldextract/suffix_list.py --- old/tldextract-3.4.1/tldextract/suffix_list.py 2023-01-12 02:48:06.000000000 +0100 +++ new/tldextract-3.4.4/tldextract/suffix_list.py 2023-05-20 02:25:26.000000000 +0200 @@ -1,4 +1,4 @@ -"tldextract helpers for testing and fetching remote resources." +"""tldextract helpers for testing and fetching remote resources.""" import logging import pkgutil @@ -17,8 +17,11 @@ class SuffixListNotFound(LookupError): - """A recoverable error while looking up a suffix list. Recoverable because - you can specify backups, or use this library's bundled snapshot.""" + """A recoverable error while looking up a suffix list. + + Recoverable because you can specify backups, or use this library's bundled + snapshot. + """ def find_first_response( @@ -26,9 +29,7 @@ urls: Sequence[str], cache_fetch_timeout: Union[float, int, None] = None, ) -> str: - """Decode the first successfully fetched URL, from UTF-8 encoding to - Python unicode. - """ + """Decode the first successfully fetched URL, from UTF-8 encoding to Python unicode.""" with requests.Session() as session: session.mount("file://", FileAdapter()) @@ -46,8 +47,7 @@ def extract_tlds_from_suffix_list(suffix_list_text: str) -> Tuple[List[str], List[str]]: - """Parse the raw suffix list text for its different designations of - suffixes.""" + """Parse the raw suffix list text for its different designations of suffixes.""" public_text, _, private_text = suffix_list_text.partition( PUBLIC_PRIVATE_SUFFIX_SEPARATOR ) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tldextract/tldextract.py new/tldextract-3.4.4/tldextract/tldextract.py --- old/tldextract-3.4.1/tldextract/tldextract.py 2023-01-12 02:48:06.000000000 +0100 +++ new/tldextract-3.4.4/tldextract/tldextract.py 2023-05-20 02:30:02.000000000 +0200 @@ -1,5 +1,6 @@ -""" `tldextract` accurately separates a URL's subdomain, domain, and public suffix, -using the Public Suffix List (PSL). +"""`tldextract` accurately separates a URL's subdomain, domain, and public suffix. + +It does this via the Public Suffix List (PSL). 
>>> import tldextract @@ -48,12 +49,22 @@ '127.0.0.1' """ +from __future__ import annotations + import logging import os -import re -from functools import wraps -from typing import FrozenSet, List, NamedTuple, Optional, Sequence, Union import urllib.parse +from functools import wraps +from typing import ( + Collection, + Dict, + FrozenSet, + List, + NamedTuple, + Optional, + Sequence, + Union, +) import idna @@ -71,8 +82,6 @@ "https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat", ) -_UNICODE_DOTS_RE = re.compile("[\u002e\u3002\uff0e\uff61]") - class ExtractResult(NamedTuple): """namedtuple of a URL's subdomain, domain, and suffix.""" @@ -91,8 +100,8 @@ >>> extract('http://localhost:8080').registered_domain '' """ - if self.domain and self.suffix: - return self.domain + "." + self.suffix + if self.suffix and self.domain: + return f"{self.domain}.{self.suffix}" return "" @property @@ -105,7 +114,7 @@ >>> extract('http://localhost:8080').fqdn '' """ - if self.domain and self.suffix: + if self.suffix and self.domain: # Disable bogus lint error (https://github.com/PyCQA/pylint/issues/2568) # pylint: disable-next=not-an-iterable return ".".join(i for i in self if i) @@ -114,7 +123,7 @@ @property def ipv4(self) -> str: """ - Returns the ipv4 if that is what the presented domain/url is + Returns the ipv4 if that is what the presented domain/url is. >>> extract('http://127.0.0.1/path/to/file').ipv4 '127.0.0.1' @@ -129,8 +138,7 @@ class TLDExtract: - """A callable for extracting, subdomain, domain, and suffix components from - a URL.""" + """A callable for extracting, subdomain, domain, and suffix components from a URL.""" # TODO: Agreed with Pylint: too-many-arguments def __init__( # pylint: disable=too-many-arguments @@ -142,9 +150,7 @@ extra_suffixes: Sequence[str] = (), cache_fetch_timeout: Union[str, float, None] = CACHE_TIMEOUT, ) -> None: - """ - Constructs a callable for extracting subdomain, domain, and suffix - components from a URL. + """Construct a callable for extracting subdomain, domain, and suffix components from a URL. Upon calling it, it first checks for a JSON in `cache_dir`. By default, the `cache_dir` will live in the tldextract directory. You can disable @@ -207,17 +213,17 @@ self._cache = DiskCache(cache_dir) def __call__( - self, url: str, include_psl_private_domains: Optional[bool] = None + self, url: str, include_psl_private_domains: bool | None = None ) -> ExtractResult: """Alias for `extract_str`.""" return self.extract_str(url, include_psl_private_domains) def extract_str( - self, url: str, include_psl_private_domains: Optional[bool] = None + self, url: str, include_psl_private_domains: bool | None = None ) -> ExtractResult: - """ - Takes a string URL and splits it into its subdomain, domain, and - suffix (effective TLD, gTLD, ccTLD, etc.) components. + """Take a string URL and splits it into its subdomain, domain, and suffix components. + + I.e. its effective TLD, gTLD, ccTLD, etc. components. >>> extractor = TLDExtract() >>> extractor.extract_str('http://forums.news.cnn.com/') @@ -232,10 +238,10 @@ url: Union[urllib.parse.ParseResult, urllib.parse.SplitResult], include_psl_private_domains: Optional[bool] = None, ) -> ExtractResult: - """ - Takes the output of urllib.parse URL parsing methods and further splits - the parsed URL into its subdomain, domain, and suffix (effective TLD, - gTLD, ccTLD, etc.) components. + """Take the output of urllib.parse URL parsing methods and further splits the parsed URL. 
+ + Splits the parsed URL into its subdomain, domain, and suffix + components, i.e. its effective TLD, gTLD, ccTLD, etc. components. This method is like `extract_str` but faster, as the string's domain name has already been parsed. @@ -251,18 +257,22 @@ def _extract_netloc( self, netloc: str, include_psl_private_domains: Optional[bool] ) -> ExtractResult: - labels = _UNICODE_DOTS_RE.split(netloc) + labels = ( + netloc.replace("\u3002", "\u002e") + .replace("\uff0e", "\u002e") + .replace("\uff61", "\u002e") + .split(".") + ) - translations = [_decode_punycode(label) for label in labels] suffix_index = self._get_tld_extractor().suffix_index( - translations, include_psl_private_domains=include_psl_private_domains + labels, include_psl_private_domains=include_psl_private_domains ) - suffix = ".".join(labels[suffix_index:]) - if not suffix and netloc and looks_like_ip(netloc): + if suffix_index == len(labels) and netloc and looks_like_ip(netloc): return ExtractResult("", netloc, "") - subdomain = ".".join(labels[: suffix_index - 1]) if suffix_index else "" + suffix = ".".join(labels[suffix_index:]) if suffix_index != len(labels) else "" + subdomain = ".".join(labels[: suffix_index - 1]) if suffix_index >= 2 else "" domain = labels[suffix_index - 1] if suffix_index else "" return ExtractResult(subdomain, domain, suffix) @@ -276,22 +286,23 @@ @property def tlds(self) -> List[str]: """ - Returns the list of tld's used by default + Returns the list of tld's used by default. This will vary based on `include_psl_private_domains` and `extra_suffixes` """ return list(self._get_tld_extractor().tlds()) - def _get_tld_extractor(self) -> "_PublicSuffixListTLDExtractor": - """Get or compute this object's TLDExtractor. Looks up the TLDExtractor - in roughly the following order, based on the settings passed to - __init__: + def _get_tld_extractor(self) -> _PublicSuffixListTLDExtractor: + """Get or compute this object's TLDExtractor. + + Looks up the TLDExtractor in roughly the following order, based on the + settings passed to __init__: 1. Memoized on `self` 2. Local system _cache file 3. Remote PSL, over HTTP - 4. Bundled PSL snapshot file""" - + 4. Bundled PSL snapshot file + """ if self._extractor: return self._extractor @@ -317,6 +328,37 @@ TLD_EXTRACTOR = TLDExtract() +class Trie: + """Trie for storing eTLDs with their labels in reverse-order.""" + + def __init__(self, matches: Optional[Dict] = None, end: bool = False) -> None: + self.matches = matches if matches else {} + self.end = end + + @staticmethod + def create(suffixes: Collection[str]) -> Trie: + """Create a Trie from a list of suffixes and return its root node.""" + root_node = Trie() + + for suffix in suffixes: + suffix_labels = suffix.split(".") + suffix_labels.reverse() + root_node.add_suffix(suffix_labels) + + return root_node + + def add_suffix(self, labels: List[str]) -> None: + """Append a suffix's labels to this Trie node.""" + node = self + + for label in labels: + if label not in node.matches: + node.matches[label] = Trie() + node = node.matches[label] + + node.end = True + + @wraps(TLD_EXTRACTOR.__call__) def extract( # pylint: disable=missing-function-docstring url: str, include_psl_private_domains: Optional[bool] = False @@ -331,9 +373,7 @@ class _PublicSuffixListTLDExtractor: - """Wrapper around this project's main algo for PSL - lookups. 
- """ + """Wrapper around this project's main algo for PSL lookups.""" def __init__( self, @@ -348,6 +388,8 @@ self.private_tlds = private_tlds self.tlds_incl_private = frozenset(public_tlds + private_tlds + extra_tlds) self.tlds_excl_private = frozenset(public_tlds + extra_tlds) + self.tlds_incl_private_trie = Trie.create(self.tlds_incl_private) + self.tlds_excl_private_trie = Trie.create(self.tlds_excl_private) def tlds( self, include_psl_private_domains: Optional[bool] = None @@ -363,27 +405,41 @@ ) def suffix_index( - self, lower_spl: List[str], include_psl_private_domains: Optional[bool] = None + self, spl: List[str], include_psl_private_domains: Optional[bool] = None ) -> int: - """Returns the index of the first suffix label. - Returns len(spl) if no suffix is found + """Return the index of the first suffix label. + + Returns len(spl) if no suffix is found. """ - tlds = self.tlds(include_psl_private_domains) - length = len(lower_spl) - for i in range(length): - maybe_tld = ".".join(lower_spl[i:]) - exception_tld = "!" + maybe_tld - if exception_tld in tlds: - return i + 1 - - if maybe_tld in tlds: - return i - - wildcard_tld = "*." + ".".join(lower_spl[i + 1 :]) - if wildcard_tld in tlds: - return i + if include_psl_private_domains is None: + include_psl_private_domains = self.include_psl_private_domains + + node = ( + self.tlds_incl_private_trie + if include_psl_private_domains + else self.tlds_excl_private_trie + ) + i = len(spl) + j = i + for label in reversed(spl): + decoded_label = _decode_punycode(label) + if decoded_label in node.matches: + j -= 1 + if node.matches[decoded_label].end: + i = j + node = node.matches[decoded_label] + continue + + is_wildcard = "*" in node.matches + if is_wildcard: + is_wildcard_exception = "!" + decoded_label in node.matches + if is_wildcard_exception: + return j + return j - 1 + + break - return length + return i def _decode_punycode(label: str) -> str: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tldextract.egg-info/PKG-INFO new/tldextract-3.4.4/tldextract.egg-info/PKG-INFO --- old/tldextract-3.4.1/tldextract.egg-info/PKG-INFO 2023-04-27 01:31:34.000000000 +0200 +++ new/tldextract-3.4.4/tldextract.egg-info/PKG-INFO 2023-05-20 02:33:31.000000000 +0200 @@ -1,6 +1,6 @@ Metadata-Version: 2.1 Name: tldextract -Version: 3.4.1 +Version: 3.4.4 Summary: Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well. Home-page: https://github.com/john-kurkowski/tldextract Author: John Kurkowski @@ -20,8 +20,9 @@ Description-Content-Type: text/markdown License-File: LICENSE - `tldextract` accurately separates a URL's subdomain, domain, and public suffix, -using the Public Suffix List (PSL). +`tldextract` accurately separates a URL's subdomain, domain, and public suffix. + +It does this via the Public Suffix List (PSL). 
>>> import tldextract >>> tldextract.extract('http://forums.news.cnn.com/') diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.4.1/tldextract.egg-info/SOURCES.txt new/tldextract-3.4.4/tldextract.egg-info/SOURCES.txt --- old/tldextract-3.4.1/tldextract.egg-info/SOURCES.txt 2023-04-27 01:31:34.000000000 +0200 +++ new/tldextract-3.4.4/tldextract.egg-info/SOURCES.txt 2023-05-20 02:33:31.000000000 +0200 @@ -18,6 +18,7 @@ tests/main_test.py tests/test_cache.py tests/test_parallel.py +tests/test_trie.py tests/fixtures/fake_suffix_list_fixture.dat tldextract/.tld_set_snapshot tldextract/__init__.py
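
Some context on the upstream changes above, with short runnable Python sketches. These are illustrations against the diff, not part of the package.

On the 3.4.2 netloc change: tldextract/remote.py now detects a scheme by scanning for "//" and checking that it is immediately preceded by ":" with only characters from urllib.parse's scheme_chars before it (see _schemeless_url in the diff), instead of applying SCHEME_RE. The new test_scheme cases pin the behavior; a few of them restated, assuming tldextract 3.4.2 or later is installed (the first call may download and cache the PSL):

import tldextract
from tldextract.tldextract import ExtractResult

# "a", "+", "-", "." are all valid scheme characters, so "a+-.://" parses
# as a scheme and is stripped before extraction.
assert tldextract.extract("a+-.://example.com").domain == "example"

# "@" is not a scheme character, so "a@://" is not treated as a scheme
# and no hostname is recovered.
assert tldextract.extract("a@://example.com") == ExtractResult("", "", "")

# A scheme-relative URL (leading "//") is still handled; _schemeless_url
# strips the leading "//" when it starts the string.
assert tldextract.extract("//example.com/path").domain == "example"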
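On the root-label change: lenient_netloc() now rstrips U+3002, U+FF0E, and U+FF61 alongside the ASCII dot, so a trailing Unicode dot variant is treated as the DNS root label, matching the new test_dns_root_label cases. For example, assuming tldextract 3.4.2 or later:

import tldextract

# All four dot variants now work as a trailing root label.
for dot in ("\u002e", "\u3002", "\uff0e", "\uff61"):
    result = tldextract.extract(f"http://www.example.com{dot}/")
    assert result.fqdn == "www.example.com"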
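On the 3.4.3 trie refactor: the new Trie class in the diff stores each suffix with its labels reversed, so suffix_index() can walk a hostname's labels right to left and remember the deepest node marked as a suffix end, instead of joining and hashing candidate suffix strings at every label. A minimal self-contained sketch of that lookup, for illustration only (tldextract's real version also decodes punycode and handles wildcard and exception rules):

class SuffixTrie:
    """Suffixes keyed by reversed labels, e.g. "co.uk" -> uk -> co."""

    def __init__(self):
        self.matches = {}  # label -> SuffixTrie
        self.end = False   # path from the root spells a complete suffix

    def add_suffix(self, suffix):
        node = self
        for label in reversed(suffix.split(".")):
            node = node.matches.setdefault(label, SuffixTrie())
        node.end = True


def suffix_index(labels, trie):
    """Index of the first suffix label, or len(labels) if none match."""
    node = trie
    best = len(labels)
    for i in range(len(labels) - 1, -1, -1):
        node = node.matches.get(labels[i])
        if node is None:
            break
        if node.end:
            best = i
    return best


trie = SuffixTrie()
for suffix in ("com", "uk", "co.uk"):
    trie.add_suffix(suffix)

labels = "forums.news.cnn.com".split(".")
i = suffix_index(labels, trie)
assert (".".join(labels[:i]), ".".join(labels[i:])) == ("forums.news.cnn", "com")
assert suffix_index("www.example.co.uk".split("."), trie) == 2  # "co.uk" beats "uk"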
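The wildcard and exception rules ("*" and "!" entries in the PSL) that the trie lookup folds in are what the new .ck cases in test_suffix exercise: "*.ck" makes every second-level .ck name a suffix, except for the "!www.ck" exception. Assuming tldextract 3.4.3 or later with a PSL that still carries these .ck rules:

import tldextract

# Wildcard "*.ck": example.ck is itself a public suffix, so "www" becomes
# the registrable domain under it.
assert tldextract.extract("www.example.ck").suffix == "example.ck"
assert tldextract.extract("www.example.ck").domain == "www"

# Exception "!www.ck": www.ck itself is registrable, with suffix "ck".
assert tldextract.extract("www.ck").domain == "www"
assert tldextract.extract("www.ck").suffix == "ck"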
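On the 3.4.4 fix: include_psl_private_domains passed to the TLDExtract constructor is now honored when the extractor is called without the per-call argument, exactly as the new test_include_psl_private_domain_attr asserts. Assuming tldextract 3.4.4:

import tldextract

extract_private = tldextract.TLDExtract(include_psl_private_domains=True)
extract_public = tldextract.TLDExtract(include_psl_private_domains=False)

# uk.com is a PSL *private* suffix; only the private-domains extractor
# treats it as one. Before 3.4.4 the constructor flag was ignored here.
assert extract_private("foo.uk.com").suffix == "uk.com"
assert extract_public("foo.uk.com").suffix == "com"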