Script 'mail_helper' called by obssrc
Hello community,
here is the log from the commit of package python-charset-normalizer for
openSUSE:Factory checked in at 2022-09-18 17:31:58
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-charset-normalizer (Old)
and /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.2083 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-charset-normalizer"
Sun Sep 18 17:31:58 2022 rev:15 rq:1004361 version:2.1.1
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-charset-normalizer/python-charset-normalizer.changes	2022-08-20 20:27:51.741219415 +0200
+++ /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.2083/python-charset-normalizer.changes	2022-09-18 17:32:00.929734658 +0200
@@ -1,0 +2,7 @@
+Sat Sep 17 15:46:10 UTC 2022 - Dirk Müller <[email protected]>
+
+- update to 2.1.1:
+ * Function `normalize` scheduled for removal in 3.0
+ * Removed useless call to decode in fn is_unprintable (#206)
+
+-------------------------------------------------------------------
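
As context for the first entry above, a self-contained sketch of the deprecation pattern 2.1.1 introduces. The `normalize` below is a stand-in that mirrors the warning the real charset_normalizer.normalize now emits; it is not the library code itself:

    import warnings

    def normalize(*args, **kwargs):  # stand-in, not the library function
        warnings.warn(
            "normalize is deprecated and will be removed in 3.0",
            DeprecationWarning,
        )

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        normalize()
    assert caught[0].category is DeprecationWarning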
Old:
----
charset_normalizer-2.1.0.tar.gz
New:
----
charset_normalizer-2.1.1.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-charset-normalizer.spec ++++++
--- /var/tmp/diff_new_pack.1YRidd/_old 2022-09-18 17:32:01.469736234 +0200
+++ /var/tmp/diff_new_pack.1YRidd/_new 2022-09-18 17:32:01.477736257 +0200
@@ -19,7 +19,7 @@
%{?!python_module:%define python_module() python3-%{**}}
%define skip_python2 1
Name: python-charset-normalizer
-Version: 2.1.0
+Version: 2.1.1
Release: 0
Summary: Python Universal Charset detector
License: MIT
++++++ charset_normalizer-2.1.0.tar.gz -> charset_normalizer-2.1.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/.github/workflows/lint.yml new/charset_normalizer-2.1.1/.github/workflows/lint.yml
--- old/charset_normalizer-2.1.0/.github/workflows/lint.yml	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/.github/workflows/lint.yml	2022-08-20 00:06:12.000000000 +0200
@@ -28,7 +28,7 @@
python setup.py install
- name: Type checking (Mypy)
run: |
- mypy charset_normalizer
+ mypy --strict charset_normalizer
- name: Import sorting check (isort)
run: |
isort --check charset_normalizer
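
The `--strict` switch drives most of the typing churn further down in this diff: among other checks it enables `--no-implicit-optional`, which rejects `None` defaults on parameters not declared Optional. A minimal illustration (not from the package):

    from typing import List, Optional

    # Rejected under `mypy --strict`: the None default contradicts List[str].
    def bad(cp_isolation: List[str] = None) -> None:
        ...

    # Accepted: the None default is made explicit, as in the 2.1.1 changes below.
    def good(cp_isolation: Optional[List[str]] = None) -> None:
        ...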
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/CHANGELOG.md new/charset_normalizer-2.1.1/CHANGELOG.md
--- old/charset_normalizer-2.1.0/CHANGELOG.md	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/CHANGELOG.md	2022-08-20 00:06:12.000000000 +0200
@@ -2,6 +2,17 @@
All notable changes to charset-normalizer will be documented in this file.
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
+## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)
+
+### Deprecated
+- Function `normalize` scheduled for removal in 3.0
+
+### Changed
+- Removed useless call to decode in fn is_unprintable (#206)
+
+### Fixed
+- Third-party library (i18n xgettext) crashing not recognizing utf_8 (PEP 263) with underscore from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)
+
## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)
### Added
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/README.md new/charset_normalizer-2.1.1/README.md
--- old/charset_normalizer-2.1.0/README.md	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/README.md	2022-08-20 00:06:12.000000000 +0200
@@ -29,11 +29,12 @@
| `Universal**` | ❌ | :heavy_check_mark: | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | :heavy_check_mark: | :heavy_check_mark: |
| `Reliable` **with** distinguishable standards | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| `Free & Open` | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| `License` | LGPL-2.1 | MIT | MPL-1.1
+| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
| `Native Python` | :heavy_check_mark: | :heavy_check_mark: | ❌ |
| `Detect spoken language` | ❌ | :heavy_check_mark: | N/A |
-| `Supported Encoding` | 30 | :tada: [93](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40
+| `UnicodeDecodeError Safety` | ❌ | :heavy_check_mark: | ❌ |
+| `Whl Size` | 193.6 kB | 39.5 kB | ~200 kB |
+| `Supported Encoding` | 33 | :tada: [93](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40
<p align="center">
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
@@ -51,7 +52,7 @@
| Package | Accuracy | Mean per file (ms) | File per sec (est) |
| ------------- | :-------------: | :------------------: | :------------------: |
-| [chardet](https://github.com/chardet/chardet) | 92 % | 200 ms | 5 file/sec |
+| [chardet](https://github.com/chardet/chardet) | 86 % | 200 ms | 5 file/sec |
| charset-normalizer | **98 %** | **39 ms** | 26 file/sec |
| Package | 99th percentile | 95th percentile | 50th percentile |
@@ -64,6 +65,8 @@
> Stats are generated using 400+ files using default parameters. More details on used files, see GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays heavily depends on your CPU capabilities. The factors should remain the same.
+> Keep in mind that the stats are generous and that Chardet accuracy vs our is measured using Chardet initial capability
+> (eg. Supported Encoding) Challenge-them if you want.
[cchardet](https://github.com/PyYoshi/cChardet) is a non-native (cpp binding) and unmaintained faster alternative with
a better accuracy than chardet but lower than this package. If speed is the most important factor, you should try it.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/charset_normalizer/__init__.py new/charset_normalizer-2.1.1/charset_normalizer/__init__.py
--- old/charset_normalizer-2.1.0/charset_normalizer/__init__.py	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/charset_normalizer/__init__.py	2022-08-20 00:06:12.000000000 +0200
@@ -1,4 +1,4 @@
-# -*- coding: utf_8 -*-
+# -*- coding: utf-8 -*-
"""
Charset-Normalizer
~~~~~~~~~~~~~~
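
Both cookie spellings name the same codec as far as Python is concerned, since codec aliases are normalized on lookup; the hyphenated form merely keeps non-Python tooling such as xgettext happy, per the PEP 263 fix noted in the changelog. A quick standalone check:

    import codecs

    # Python's codec registry normalizes aliases, so both spellings resolve
    # to the canonical name "utf-8".
    assert codecs.lookup("utf_8").name == "utf-8"
    assert codecs.lookup("utf-8").name == "utf-8"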
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/charset_normalizer/api.py new/charset_normalizer-2.1.1/charset_normalizer/api.py
--- old/charset_normalizer-2.1.0/charset_normalizer/api.py	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/charset_normalizer/api.py	2022-08-20 00:06:12.000000000 +0200
@@ -1,7 +1,8 @@
import logging
+import warnings
from os import PathLike
from os.path import basename, splitext
-from typing import BinaryIO, List, Optional, Set
+from typing import Any, BinaryIO, List, Optional, Set
from .cd import (
coherence_ratio,
@@ -36,8 +37,8 @@
steps: int = 5,
chunk_size: int = 512,
threshold: float = 0.2,
- cp_isolation: List[str] = None,
- cp_exclusion: List[str] = None,
+ cp_isolation: Optional[List[str]] = None,
+ cp_exclusion: Optional[List[str]] = None,
preemptive_behaviour: bool = True,
explain: bool = False,
) -> CharsetMatches:
@@ -486,8 +487,8 @@
steps: int = 5,
chunk_size: int = 512,
threshold: float = 0.20,
- cp_isolation: List[str] = None,
- cp_exclusion: List[str] = None,
+ cp_isolation: Optional[List[str]] = None,
+ cp_exclusion: Optional[List[str]] = None,
preemptive_behaviour: bool = True,
explain: bool = False,
) -> CharsetMatches:
@@ -508,12 +509,12 @@
def from_path(
- path: PathLike,
+ path: "PathLike[Any]",
steps: int = 5,
chunk_size: int = 512,
threshold: float = 0.20,
- cp_isolation: List[str] = None,
- cp_exclusion: List[str] = None,
+ cp_isolation: Optional[List[str]] = None,
+ cp_exclusion: Optional[List[str]] = None,
preemptive_behaviour: bool = True,
explain: bool = False,
) -> CharsetMatches:
@@ -535,17 +536,22 @@
def normalize(
- path: PathLike,
+ path: "PathLike[Any]",
steps: int = 5,
chunk_size: int = 512,
threshold: float = 0.20,
- cp_isolation: List[str] = None,
- cp_exclusion: List[str] = None,
+ cp_isolation: Optional[List[str]] = None,
+ cp_exclusion: Optional[List[str]] = None,
preemptive_behaviour: bool = True,
) -> CharsetMatch:
"""
Take a (text-based) file path and try to create another file next to it,
this time using UTF-8.
"""
+ warnings.warn(
+ "normalize is deprecated and will be removed in 3.0",
+ DeprecationWarning,
+ )
+
results = from_path(
path,
steps,
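
With `normalize` on its way out, the supported route is `from_path` plus `best()`, both of which this module keeps. A hedged usage sketch; `./sample.txt` is a hypothetical readable text file:

    from charset_normalizer import from_path

    results = from_path("./sample.txt")  # hypothetical path
    best_guess = results.best()  # a CharsetMatch, or None if nothing matched
    if best_guess is not None:
        print(best_guess.encoding)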
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/charset_normalizer/assets/__init__.py new/charset_normalizer-2.1.1/charset_normalizer/assets/__init__.py
--- old/charset_normalizer-2.1.0/charset_normalizer/assets/__init__.py	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/charset_normalizer/assets/__init__.py	2022-08-20 00:06:12.000000000 +0200
@@ -1,4 +1,4 @@
-# -*- coding: utf_8 -*-
+# -*- coding: utf-8 -*-
from typing import Dict, List
FREQUENCIES: Dict[str, List[str]] = {
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/charset_normalizer/cd.py new/charset_normalizer-2.1.1/charset_normalizer/cd.py
--- old/charset_normalizer-2.1.0/charset_normalizer/cd.py	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/charset_normalizer/cd.py	2022-08-20 00:06:12.000000000 +0200
@@ -2,7 +2,7 @@
from codecs import IncrementalDecoder
from collections import Counter
from functools import lru_cache
-from typing import Dict, List, Optional, Tuple
+from typing import Counter as TypeCounter, Dict, List, Optional, Tuple
from .assets import FREQUENCIES
from .constant import KO_NAMES, LANGUAGE_SUPPORTED_COUNT, TOO_SMALL_SEQUENCE, ZH_NAMES
@@ -24,7 +24,9 @@
if is_multi_byte_encoding(iana_name):
raise IOError("Function not supported on multi-byte code page")
- decoder = importlib.import_module("encodings.{}".format(iana_name)).IncrementalDecoder # type: ignore
+ decoder = importlib.import_module(
+ "encodings.{}".format(iana_name)
+ ).IncrementalDecoder
p: IncrementalDecoder = decoder(errors="ignore")
seen_ranges: Dict[str, int] = {}
@@ -307,7 +309,7 @@
lg_inclusion_list.remove("Latin Based")
for layer in alpha_unicode_split(decoded_sequence):
- sequence_frequencies: Counter = Counter(layer)
+ sequence_frequencies: TypeCounter[str] = Counter(layer)
most_common = sequence_frequencies.most_common()
character_count: int = sum(o for c, o in most_common)
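
The `Counter as TypeCounter` import exists because `typing.Counter` is subscriptable on every Python this release supports, whereas subscripting `collections.Counter` directly at runtime only works on 3.9+. A standalone check:

    from collections import Counter
    from typing import Counter as TypeCounter

    # The annotation uses typing.Counter; the value is a plain Counter.
    sequence_frequencies: TypeCounter[str] = Counter("abracadabra")
    print(sequence_frequencies.most_common(2))  # [('a', 5), ('b', 2)]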
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/charset_normalizer/cli/normalizer.py new/charset_normalizer-2.1.1/charset_normalizer/cli/normalizer.py
--- old/charset_normalizer-2.1.0/charset_normalizer/cli/normalizer.py	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/charset_normalizer/cli/normalizer.py	2022-08-20 00:06:12.000000000 +0200
@@ -3,7 +3,7 @@
from json import dumps
from os.path import abspath
from platform import python_version
-from typing import List
+from typing import List, Optional
try:
from unicodedata2 import unidata_version
@@ -48,7 +48,7 @@
sys.stdout.write("Please respond with 'yes' or 'no' " "(or 'y' or 'n').\n")
-def cli_detect(argv: List[str] = None) -> int:
+def cli_detect(argv: Optional[List[str]] = None) -> int:
"""
CLI assistant using ARGV and ArgumentParser
:param argv:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/charset_normalizer/models.py new/charset_normalizer-2.1.1/charset_normalizer/models.py
--- old/charset_normalizer-2.1.0/charset_normalizer/models.py	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/charset_normalizer/models.py	2022-08-20 00:06:12.000000000 +0200
@@ -4,7 +4,16 @@
from hashlib import sha256
from json import dumps
from re import sub
-from typing import Any, Dict, Iterator, List, Optional, Tuple, Union
+from typing import (
+ Any,
+ Counter as TypeCounter,
+ Dict,
+ Iterator,
+ List,
+ Optional,
+ Tuple,
+ Union,
+)
from .constant import NOT_PRINTABLE_PATTERN, TOO_BIG_SEQUENCE
from .md import mess_ratio
@@ -95,7 +104,7 @@
return 0.0
@property
- def w_counter(self) -> Counter:
+ def w_counter(self) -> TypeCounter[str]:
"""
Word counter instance on decoded text.
Notice: Will be removed in 3.0
@@ -280,7 +289,7 @@
Act like a list(iterable) but does not implements all related methods.
"""
- def __init__(self, results: List[CharsetMatch] = None):
+ def __init__(self, results: Optional[List[CharsetMatch]] = None):
self._results: List[CharsetMatch] = sorted(results) if results else []
def __iter__(self) -> Iterator[CharsetMatch]:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/charset_normalizer/utils.py new/charset_normalizer-2.1.1/charset_normalizer/utils.py
--- old/charset_normalizer-2.1.0/charset_normalizer/utils.py	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/charset_normalizer/utils.py	2022-08-20 00:06:12.000000000 +0200
@@ -13,7 +13,7 @@
from re import findall
from typing import Generator, List, Optional, Set, Tuple, Union
-from _multibytecodec import MultibyteIncrementalDecoder # type: ignore
+from _multibytecodec import MultibyteIncrementalDecoder
from .constant import (
ENCODING_MARKS,
@@ -206,7 +206,7 @@
character.isspace() is False # includes \n \t \r \v
and character.isprintable() is False
and character != "\x1A" # Why? Its the ASCII substitute character.
- and character != b"\xEF\xBB\xBF".decode("utf_8") # bug discovered in Python,
+ and character != "\ufeff" # bug discovered in Python,
# Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space.
)
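
The replacement literal is exactly the character the removed expression produced, so behaviour is unchanged and one decode per character check disappears. A quick verification:

    # The UTF-8 BOM decodes to U+FEFF (ZERO WIDTH NO-BREAK SPACE),
    # which is a format character and therefore not printable.
    assert b"\xEF\xBB\xBF".decode("utf_8") == "\ufeff"
    assert "\ufeff".isprintable() is False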
@@ -231,6 +231,9 @@
for specified_encoding in results:
specified_encoding = specified_encoding.lower().replace("-", "_")
+ encoding_alias: str
+ encoding_iana: str
+
for encoding_alias, encoding_iana in aliases.items():
if encoding_alias == specified_encoding:
return encoding_iana
@@ -256,7 +259,7 @@
"utf_32_be",
"utf_7",
} or issubclass(
- importlib.import_module("encodings.{}".format(name)).IncrementalDecoder, # type: ignore
+ importlib.import_module("encodings.{}".format(name)).IncrementalDecoder,
MultibyteIncrementalDecoder,
)
@@ -286,6 +289,9 @@
def iana_name(cp_name: str, strict: bool = True) -> str:
cp_name = cp_name.lower().replace("-", "_")
+ encoding_alias: str
+ encoding_iana: str
+
for encoding_alias, encoding_iana in aliases.items():
if cp_name in [encoding_alias, encoding_iana]:
return encoding_iana
@@ -315,8 +321,12 @@
if is_multi_byte_encoding(iana_name_a) or is_multi_byte_encoding(iana_name_b):
return 0.0
- decoder_a = importlib.import_module("encodings.{}".format(iana_name_a)).IncrementalDecoder # type: ignore
- decoder_b = importlib.import_module("encodings.{}".format(iana_name_b)).IncrementalDecoder # type: ignore
+ decoder_a = importlib.import_module(
+ "encodings.{}".format(iana_name_a)
+ ).IncrementalDecoder
+ decoder_b = importlib.import_module(
+ "encodings.{}".format(iana_name_b)
+ ).IncrementalDecoder
id_a: IncrementalDecoder = decoder_a(errors="ignore")
id_b: IncrementalDecoder = decoder_b(errors="ignore")
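
For readers unfamiliar with the pattern being reformatted here: each single-byte codec module in the stdlib `encodings` package exposes an `IncrementalDecoder` class, which the library instantiates with `errors="ignore"`. An illustrative standalone use (cp1252 chosen arbitrarily):

    import importlib

    # Fetch the stdlib cp1252 codec module and instantiate its decoder.
    decoder_cls = importlib.import_module("encodings.cp1252").IncrementalDecoder
    decoder = decoder_cls(errors="ignore")
    print(decoder.decode(b"\xe9"))  # prints "é" (0xE9 in cp1252)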
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/charset_normalizer/version.py new/charset_normalizer-2.1.1/charset_normalizer/version.py
--- old/charset_normalizer-2.1.0/charset_normalizer/version.py	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/charset_normalizer/version.py	2022-08-20 00:06:12.000000000 +0200
@@ -2,5 +2,5 @@
Expose version
"""
-__version__ = "2.1.0"
+__version__ = "2.1.1"
VERSION = __version__.split(".")
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/dev-requirements.txt new/charset_normalizer-2.1.1/dev-requirements.txt
--- old/charset_normalizer-2.1.0/dev-requirements.txt	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/dev-requirements.txt	2022-08-20 00:06:12.000000000 +0200
@@ -1,10 +1,10 @@
pytest
pytest-cov
codecov
-chardet==4.0.*
-Flask>=2.0,<3.0; python_version >= '3.6'
-requests>=2.26,<3.0; python_version >= '3.6'
-black==22.3.0; python_version >= '3.6'
-flake8==4.0.1; python_version >= '3.6'
-mypy==0.961; python_version >= '3.6'
-isort; python_version >= '3.6'
+chardet>=5.0,<5.1
+Flask>=2.0,<3.0
+requests>=2.26,<3.0
+black==22.6.0
+flake8==5.0.4
+mypy==0.971
+isort
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/docs/community/faq.rst new/charset_normalizer-2.1.1/docs/community/faq.rst
--- old/charset_normalizer-2.1.0/docs/community/faq.rst	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/docs/community/faq.rst	2022-08-20 00:06:12.000000000 +0200
@@ -23,6 +23,10 @@
The real debate is to state if the detection is an HTTP client matter or not. That is more complicated and not my field.
+Some individuals keep insisting that the *whole* Internet is UTF-8 ready. Those are absolutely wrong and very Europe and North America-centered,
+In my humble experience, the countries in the world are very disparate in this evolution. And the Internet is not just about HTML content.
+Having a thorough analysis of this is very scary.
+
Should I bother using detection?
--------------------------------
@@ -36,11 +40,10 @@
Then this change is mostly backward-compatible, exception of a thing:
- This new library support way more code pages (x3) than its counterpart Chardet.
- - Based on the 30-ich charsets that Chardet support, expect roughly 90% BC results https://github.com/Ousret/charset_normalizer/pull/77/checks?check_run_id=3244585065
+ - Based on the 30-ich charsets that Chardet support, expect roughly 85% BC results https://github.com/Ousret/charset_normalizer/pull/77/checks?check_run_id=3244585065
We do not guarantee this BC exact percentage through time. May vary but not by much.
-
Isn't it the same as Chardet?
-----------------------------
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.1.0/docs/community/why_migrate.rst new/charset_normalizer-2.1.1/docs/community/why_migrate.rst
--- old/charset_normalizer-2.1.0/docs/community/why_migrate.rst	2022-06-19 23:55:20.000000000 +0200
+++ new/charset_normalizer-2.1.1/docs/community/why_migrate.rst	2022-08-20 00:06:12.000000000 +0200
@@ -4,13 +4,13 @@
There is so many reason to migrate your current project. Here are some of them:
- Remove ANY license ambiguity/restriction for projects bundling Chardet (even indirectly).
-- X5 faster than Chardet in average and X2 faster in 99% of the cases AND support 3 times more encoding.
+- X5 faster than Chardet in average and X3 faster in 99% of the cases AND support 3 times more encoding.
- Never return a encoding if not suited for the given decoder. Eg. Never get UnicodeDecodeError!
- Actively maintained, open to contributors.
- Have the backward compatible function ``detect`` that come from Chardet.
- Truly detect the language used in the text.
- It is, for the first time, really universal! As there is no specific probe per charset.
-- The package size is X4 lower than Chardet's (4.0)!
+- The package size is X4 lower than Chardet's (5.0)!
- Propose much more options/public kwargs to tweak the detection as you sees fit!
- Using static typing to ease your development.
- Detect Unicode content better than Chardet or cChardet does.