Script 'mail_helper' called by obssrc Hello community, here is the log from the commit of package python-charset-normalizer for openSUSE:Factory checked in at 2023-03-29 23:26:15 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/python-charset-normalizer (Old) and /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.31432 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-charset-normalizer"

Wed Mar 29 23:26:15 2023 rev:18 rq:1074517 version:3.1.0

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-charset-normalizer/python-charset-normalizer.changes	2022-12-04 14:57:55.260120466 +0100
+++ /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.31432/python-charset-normalizer.changes	2023-03-29 23:26:22.471223639 +0200
@@ -1,0 +2,9 @@
+Sun Mar 26 20:04:17 UTC 2023 - Dirk Müller <dmuel...@suse.com>
+
+- update to 3.1.0:
+  * Argument `should_rename_legacy` for legacy function `detect`
+    and disregard any new arguments without errors (PR #262)
+  * Removed Support for Python 3.6 (PR #260)
+  * Optional speedup provided by mypy/c 1.0.1
+
+-------------------------------------------------------------------

Old:
----
  charset_normalizer-3.0.1.tar.gz

New:
----
  charset_normalizer-3.1.0.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-charset-normalizer.spec ++++++
--- /var/tmp/diff_new_pack.OmyiLS/_old	2023-03-29 23:26:23.671229276 +0200
+++ /var/tmp/diff_new_pack.OmyiLS/_new	2023-03-29 23:26:23.675229295 +0200
@@ -1,7 +1,7 @@
 #
 # spec file for package python-charset-normalizer
 #
-# Copyright (c) 2022 SUSE LLC
+# Copyright (c) 2023 SUSE LLC
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -19,12 +19,13 @@
 %{?!python_module:%define python_module() python3-%{**}}
 %define skip_python2 1
 Name: python-charset-normalizer
-Version: 3.0.1
+Version: 3.1.0
 Release: 0
 Summary: Python Universal Charset detector
 License: MIT
 URL: https://github.com/ousret/charset_normalizer
 Source: https://github.com/Ousret/charset_normalizer/archive/refs/tags/%{version}.tar.gz#/charset_normalizer-%{version}.tar.gz
+BuildRequires: %{python_module base >= 3.7}
 BuildRequires: %{python_module setuptools}
 BuildRequires: fdupes
 BuildRequires: python-rpm-macros
++++++
charset_normalizer-3.0.1.tar.gz -> charset_normalizer-3.1.0.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/.github/workflows/mypyc-verify.yml new/charset_normalizer-3.1.0/.github/workflows/mypyc-verify.yml --- old/charset_normalizer-3.0.1/.github/workflows/mypyc-verify.yml 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/.github/workflows/mypyc-verify.yml 2023-03-06 07:46:55.000000000 +0100 @@ -9,7 +9,7 @@ strategy: fail-fast: false matrix: - python-version: [3.6, 3.7, 3.8, 3.9, "3.10"] + python-version: [3.7, 3.8, 3.9, "3.10", "3.11"] os: [ubuntu-latest] steps: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/.github/workflows/python-publish.yml new/charset_normalizer-3.1.0/.github/workflows/python-publish.yml --- old/charset_normalizer-3.0.1/.github/workflows/python-publish.yml 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/.github/workflows/python-publish.yml 2023-03-06 07:46:55.000000000 +0100 @@ -52,7 +52,7 @@ strategy: fail-fast: false matrix: - python-version: [ 3.6, 3.7, 3.8, 3.9, "3.10", "3.11" ] + python-version: [ 3.7, 3.8, 3.9, "3.10", "3.11" ] os: [ ubuntu-latest ] steps: @@ -215,7 +215,7 @@ run: | python -m pip install -U pip wheel setuptools build twine - name: Build wheels - uses: pypa/cibuildwheel@v2.11.2 + uses: pypa/cibuildwheel@v2.12.0 env: #CIBW_BUILD_FRONTEND: "build" CIBW_ARCHS_MACOS: x86_64 arm64 universal2 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/.github/workflows/run-tests.yml new/charset_normalizer-3.1.0/.github/workflows/run-tests.yml --- old/charset_normalizer-3.0.1/.github/workflows/run-tests.yml 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/.github/workflows/run-tests.yml 2023-03-06 07:46:55.000000000 +0100 @@ -9,7 +9,7 @@ strategy: fail-fast: 
false matrix: - python-version: [3.6, 3.7, 3.8, 3.9, "3.10", "3.11", "3.12-dev"] + python-version: [3.7, 3.8, 3.9, "3.10", "3.11", "3.12-dev"] os: [ubuntu-latest] steps: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/CHANGELOG.md new/charset_normalizer-3.1.0/CHANGELOG.md --- old/charset_normalizer-3.0.1/CHANGELOG.md 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/CHANGELOG.md 2023-03-06 07:46:55.000000000 +0100 @@ -2,6 +2,17 @@ All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). +## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06) + +### Added +- Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors (PR #262) + +### Removed +- Support for Python 3.6 (PR #260) + +### Changed +- Optional speedup provided by mypy/c 1.0.1 + ## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18) ### Fixed diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/README.md new/charset_normalizer-3.1.0/README.md --- old/charset_normalizer-3.0.1/README.md 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/README.md 2023-03-06 07:46:55.000000000 +0100 @@ -23,18 +23,18 @@ This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**. 
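The changelog earlier in this log says the chardet-compatible legacy `detect` function gained a `should_rename_legacy` flag and now disregards unknown keyword arguments with a warning instead of raising. A minimal sketch of that call surface, mirroring the `legacy.py` diff further down — the correspondence table here is a toy subset and the hard-coded `cp1252` result is a placeholder, not the real detector:

```python
from typing import Any, Dict, Optional, Union
from warnings import warn

# Toy stand-in for charset_normalizer.constant.CHARDET_CORRESPONDENCE:
# maps modern codec names to the names chardet historically reported.
CHARDET_CORRESPONDENCE = {"cp1252": "Windows-1252"}


def detect(
    byte_str: bytes, should_rename_legacy: bool = False, **kwargs: Any
) -> Dict[str, Optional[Union[str, float]]]:
    """Illustrative sketch of the 3.1.0 legacy API surface (not the real detector)."""
    if kwargs:
        # Unknown arguments are disregarded with a warning rather than a TypeError.
        warn(f"disregarded arguments '{','.join(kwargs)}' in legacy detect()")
    if not isinstance(byte_str, (bytearray, bytes)):
        raise TypeError("Expected object of type bytes or bytearray")
    encoding = "cp1252"  # placeholder: pretend the detector settled on this
    # Default (False) keeps chardet-style names for backward compatibility;
    # True keeps the modern name, per the legacy.py hunk below.
    if should_rename_legacy is False and encoding in CHARDET_CORRESPONDENCE:
        encoding = CHARDET_CORRESPONDENCE[encoding]
    return {"encoding": encoding, "language": "", "confidence": 0.99}
```

With the default flag the chardet-style name comes back; passing `should_rename_legacy=True` keeps the modern codec name.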
-| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
-| ------------- | :-------------: | :------------------: | :------------------: |
-| `Fast` | ❌<br> | ✅ <br> | ✅ <br> |
-| `Universal**` | ❌ | ✅ | ❌ |
-| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
-| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
-| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
-| `Native Python` | ✅ | ✅ | ❌ |
-| `Detect spoken language` | ❌ | ✅ | N/A |
-| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
-| `Whl Size` | 193.6 kB | 39.5 kB | ~200 kB |
-| `Supported Encoding` | 33 | :tada: [90](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40
+| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
+|--------------------------------------------------|:---------------------------------------------:|:------------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
+| `Fast` | ❌<br> | ✅ <br> | ✅ <br> |
+| `Universal**` | ❌ | ✅ | ❌ |
+| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
+| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
+| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
+| `Native Python` | ✅ | ✅ | ❌ |
+| `Detect spoken language` | ❌ | ✅ | N/A |
+| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
+| `Whl Size` | 193.6 kB | 39.5 kB | ~200 kB |
+| `Supported Encoding` | 33 | :tada: [90](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |

<p align="center">
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
@@ -50,15 +50,15 @@
 This package offer better performance than its counterpart Chardet. Here are some numbers.
 
-| Package | Accuracy | Mean per file (ms) | File per sec (est) |
-| ------------- | :-------------: | :------------------: | :------------------: |
-| [chardet](https://github.com/chardet/chardet) | 86 % | 200 ms | 5 file/sec |
-| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
-
-| Package | 99th percentile | 95th percentile | 50th percentile |
-| ------------- | :-------------: | :------------------: | :------------------: |
-| [chardet](https://github.com/chardet/chardet) | 1200 ms | 287 ms | 23 ms |
-| charset-normalizer | 100 ms | 50 ms | 5 ms |
+| Package | Accuracy | Mean per file (ms) | File per sec (est) |
+|-----------------------------------------------|:--------:|:------------------:|:------------------:|
+| [chardet](https://github.com/chardet/chardet) | 86 % | 200 ms | 5 file/sec |
+| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
+
+| Package | 99th percentile | 95th percentile | 50th percentile |
+|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
+| [chardet](https://github.com/chardet/chardet) | 1200 ms | 287 ms | 23 ms |
+| charset-normalizer | 100 ms | 50 ms | 5 ms |
 
 Chardet's performance on larger file (1MB+) are very poor. Expect huge difference on large payload.
 
@@ -185,15 +185,15 @@
 ## 🍰 How
 
 - Discard all charset encoding table that could not fit the binary content.
- - Measure chaos, or the mess once opened (by chunks) with a corresponding charset encoding.
+ - Measure noise, or the mess once opened (by chunks) with a corresponding charset encoding.
 - Extract matches with the lowest mess detected.
 - Additionally, we measure coherence / probe for a language.
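The "How" steps in the README hunk above (discard non-fitting tables, measure noise per decode, keep the lowest-mess match) can be sketched as plain control flow. This toy uses a made-up printability ratio as its "noise" score; the real mess and coherence measures live in `charset_normalizer.md` and `charset_normalizer.cd` and are far more elaborate:

```python
def toy_best_guess(payload: bytes, candidates=("ascii", "utf_8", "cp1252")) -> str:
    """Illustrative control flow only, not the real charset-normalizer scoring."""
    scored = []
    for encoding in candidates:
        try:
            # Step 1: discard every encoding table that cannot fit the bytes.
            text = payload.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
        # Step 2 (toy): "noise" = fraction of unprintable characters produced.
        noise = sum(
            1 for ch in text if not ch.isprintable() and ch not in "\r\n\t"
        ) / max(len(text), 1)
        scored.append((noise, encoding))
    # Step 3: keep the match with the lowest mess detected.
    return min(scored)[1] if scored else "utf_8"
```

The real package additionally probes language coherence to break ties between low-noise candidates, which this sketch omits.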
-**Wait a minute**, what is chaos/mess and coherence according to **YOU ?**
+**Wait a minute**, what is noise/mess and coherence according to **YOU ?**
 
-*Chaos :* I opened hundred of text files, **written by humans**, with the wrong encoding table. **I observed**, then
+*Noise :* I opened hundred of text files, **written by humans**, with the wrong encoding table. **I observed**, then
 **I established** some ground rules about **what is obvious** when **it seems like** a mess.
- I know that my interpretation of what is chaotic is very subjective, feel free to contribute in order to
+ I know that my interpretation of what is noise is probably incomplete, feel free to contribute in order to
 improve or rewrite it.
 
 *Coherence :* For each language there is on earth, we have computed ranked letter appearance occurrences (the best we can). So I thought
@@ -204,6 +204,16 @@
 - Language detection is unreliable when text contains two or more languages sharing identical letters. (eg. HTML (english tags) + Turkish content (Sharing Latin characters))
 - Every charset detector heavily depends on sufficient content. In common cases, do not bother run detection on very tiny content.
 
+## ⚠️ About Python EOLs
+
+**If you are running:**
+
+- Python >=2.7,<3.5: Unsupported
+- Python 3.5: charset-normalizer < 2.1
+- Python 3.6: charset-normalizer < 3.1
+
+Upgrade your Python interpreter as soon as possible.
+
 ## 🤝 Contributing
 
 Contributions, issues and feature requests are very much welcome.<br />
@@ -211,7 +221,17 @@
 ## 📝 License
 
-Copyright © 2019 [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
+Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
 This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.
 Characters frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)
+
+## 💼 For Enterprise
+
+Professional support for charset-normalizer is available as part of the [Tidelift
+Subscription][1]. Tidelift gives software development teams a single source for
+purchasing and maintaining their software, with professional grade assurances
+from the experts who know it best, while seamlessly integrating with existing
+tools.
+
+[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/bin/run_autofix.sh new/charset_normalizer-3.1.0/bin/run_autofix.sh
--- old/charset_normalizer-3.0.1/bin/run_autofix.sh	2022-11-18 06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/bin/run_autofix.sh	2023-03-06 07:46:55.000000000 +0100
@@ -7,5 +7,5 @@
 set -x
 
-${PREFIX}black --target-version=py36 charset_normalizer
+${PREFIX}black --target-version=py37 charset_normalizer
 ${PREFIX}isort charset_normalizer
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/bin/run_checks.sh new/charset_normalizer-3.1.0/bin/run_checks.sh
--- old/charset_normalizer-3.0.1/bin/run_checks.sh	2022-11-18 06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/bin/run_checks.sh	2023-03-06 07:46:55.000000000 +0100
@@ -8,7 +8,7 @@
 set -x
 
 ${PREFIX}pytest
-${PREFIX}black --check --diff --target-version=py36 charset_normalizer
+${PREFIX}black --check --diff --target-version=py37 charset_normalizer
 ${PREFIX}flake8 charset_normalizer
 ${PREFIX}mypy charset_normalizer
 ${PREFIX}isort --check --diff charset_normalizer
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/build-requirements.txt new/charset_normalizer-3.1.0/build-requirements.txt
---
old/charset_normalizer-3.0.1/build-requirements.txt 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/build-requirements.txt 2023-03-06 07:46:55.000000000 +0100 @@ -1,7 +1,5 @@ # in the meantime we migrate to pyproject.toml # this represent the minimum requirement to build (for the optional speedup) -mypy==0.990; python_version >= "3.7" -mypy==0.971; python_version < "3.7" -build==0.9.0 -wheel==0.38.4; python_version >= "3.7" -wheel==0.37.1; python_version < "3.7" +mypy==1.0.1 +build==0.10.0 +wheel==0.38.4 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/api.py new/charset_normalizer-3.1.0/charset_normalizer/api.py --- old/charset_normalizer-3.0.1/charset_normalizer/api.py 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/charset_normalizer/api.py 2023-03-06 07:46:55.000000000 +0100 @@ -175,7 +175,6 @@ prioritized_encodings.append("utf_8") for encoding_iana in prioritized_encodings + IANA_SUPPORTED: - if cp_isolation and encoding_iana not in cp_isolation: continue @@ -318,7 +317,9 @@ bom_or_sig_available and strip_sig_or_bom is False ): break - except UnicodeDecodeError as e: # Lazy str loading may have missed something there + except ( + UnicodeDecodeError + ) as e: # Lazy str loading may have missed something there logger.log( TRACE, "LazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. 
%s", diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/cd.py new/charset_normalizer-3.1.0/charset_normalizer/cd.py --- old/charset_normalizer-3.0.1/charset_normalizer/cd.py 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/charset_normalizer/cd.py 2023-03-06 07:46:55.000000000 +0100 @@ -140,7 +140,6 @@ source_have_accents = any(is_accentuated(character) for character in characters) for language, language_characters in FREQUENCIES.items(): - target_have_accents, target_pure_latin = get_target_features(language) if ignore_non_latin and target_pure_latin is False: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/cli/normalizer.py new/charset_normalizer-3.1.0/charset_normalizer/cli/normalizer.py --- old/charset_normalizer-3.0.1/charset_normalizer/cli/normalizer.py 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/charset_normalizer/cli/normalizer.py 2023-03-06 07:46:55.000000000 +0100 @@ -147,7 +147,6 @@ x_ = [] for my_file in args.files: - matches = from_fp(my_file, threshold=args.threshold, explain=args.verbose) best_guess = matches.best() @@ -222,7 +221,6 @@ ) if args.normalize is True: - if best_guess.encoding.startswith("utf") is True: print( '"{}" file does not need to be normalized, as it already came from unicode.'.format( diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/legacy.py new/charset_normalizer-3.1.0/charset_normalizer/legacy.py --- old/charset_normalizer-3.0.1/charset_normalizer/legacy.py 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/charset_normalizer/legacy.py 2023-03-06 07:46:55.000000000 +0100 @@ -1,10 +1,13 @@ -from typing import Dict, Optional, Union +from typing import Any, Dict, Optional, Union +from warnings import warn from 
.api import from_bytes from .constant import CHARDET_CORRESPONDENCE -def detect(byte_str: bytes) -> Dict[str, Optional[Union[str, float]]]: +def detect( + byte_str: bytes, should_rename_legacy: bool = False, **kwargs: Any +) -> Dict[str, Optional[Union[str, float]]]: """ chardet legacy method Detect the encoding of the given byte string. It should be mostly backward-compatible. @@ -13,7 +16,14 @@ further information. Not planned for removal. :param byte_str: The byte sequence to examine. + :param should_rename_legacy: Should we rename legacy encodings + to their more modern equivalents? """ + if len(kwargs): + warn( + f"charset-normalizer disregard arguments '{','.join(list(kwargs.keys()))}' in legacy function detect()" + ) + if not isinstance(byte_str, (bytearray, bytes)): raise TypeError( # pragma: nocover "Expected object of type bytes or bytearray, got: " @@ -34,10 +44,11 @@ if r is not None and encoding == "utf_8" and r.bom: encoding += "_sig" + if should_rename_legacy is False and encoding in CHARDET_CORRESPONDENCE: + encoding = CHARDET_CORRESPONDENCE[encoding] + return { - "encoding": encoding - if encoding not in CHARDET_CORRESPONDENCE - else CHARDET_CORRESPONDENCE[encoding], + "encoding": encoding, "language": language, "confidence": confidence, } diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/utils.py new/charset_normalizer-3.1.0/charset_normalizer/utils.py --- old/charset_normalizer-3.0.1/charset_normalizer/utils.py 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/charset_normalizer/utils.py 2023-03-06 07:46:55.000000000 +0100 @@ -311,7 +311,6 @@ def cp_similarity(iana_name_a: str, iana_name_b: str) -> float: - if is_multi_byte_encoding(iana_name_a) or is_multi_byte_encoding(iana_name_b): return 0.0 @@ -351,7 +350,6 @@ level: int = logging.INFO, format_string: str = "%(asctime)s | %(levelname)s | %(message)s", ) -> None: - logger = 
logging.getLogger(name) logger.setLevel(level) @@ -371,7 +369,6 @@ is_multi_byte_decoder: bool, decoded_payload: Optional[str] = None, ) -> Generator[str, None, None]: - if decoded_payload and is_multi_byte_decoder is False: for i in offsets: chunk = decoded_payload[i : i + chunk_size] @@ -397,7 +394,6 @@ # multi-byte bad cutting detector and adjustment # not the cleanest way to perform that fix but clever enough for now. if is_multi_byte_decoder and i > 0: - chunk_partial_size_chk: int = min(chunk_size, 16) if ( diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/version.py new/charset_normalizer-3.1.0/charset_normalizer/version.py --- old/charset_normalizer-3.0.1/charset_normalizer/version.py 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/charset_normalizer/version.py 2023-03-06 07:46:55.000000000 +0100 @@ -2,5 +2,5 @@ Expose version """ -__version__ = "3.0.1" +__version__ = "3.1.0" VERSION = __version__.split(".") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/dev-requirements.txt new/charset_normalizer-3.1.0/dev-requirements.txt --- old/charset_normalizer-3.0.1/dev-requirements.txt 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/dev-requirements.txt 2023-03-06 07:46:55.000000000 +0100 @@ -1,26 +1,13 @@ flake8==5.0.4 -chardet==5.0.0 -isort==5.10.1 +chardet==5.1.0 +isort==5.11.4 codecov==2.1.12 pytest-cov==4.0.0 -build==0.9.0 +build==0.10.0 +wheel==0.38.4 -# The vast majority of project dropped Python 3.6 -# This is to ensure build are reproducible >=3.6 -black==22.8.0; python_version < "3.7" -black==22.10.0; python_version >= "3.7" - -mypy==0.990; python_version >= "3.7" -mypy==0.971; python_version < "3.7" - -Flask==2.2.2; python_version >= "3.7" -Flask==2.0.3; python_version < "3.7" - -pytest==7.0.0; python_version < "3.7" -pytest==7.2.0; python_version >= "3.7" - 
-requests==2.27.1; python_version < "3.7" -requests==2.28.1; python_version >= "3.7" - -wheel==0.38.4; python_version >= "3.7" -wheel==0.37.1; python_version < "3.7" +black==23.1.0 +mypy==1.0.1 +Flask==2.2.3 +pytest==7.2.1 +requests==2.28.2 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/docs/community/faq.rst new/charset_normalizer-3.1.0/docs/community/faq.rst --- old/charset_normalizer-3.0.1/docs/community/faq.rst 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/docs/community/faq.rst 2023-03-06 07:46:55.000000000 +0100 @@ -40,7 +40,7 @@ Then this change is mostly backward-compatible, exception of a thing: - This new library support way more code pages (x3) than its counterpart Chardet. - - Based on the 30-ich charsets that Chardet support, expect roughly 85% BC results https://github.com/Ousret/charset_normalizer/pull/77/checks?check_run_id=3244585065 +- Based on the 30-ich charsets that Chardet support, expect roughly 80% BC results We do not guarantee this BC exact percentage through time. May vary but not by much. @@ -56,3 +56,20 @@ Any code page supported by your cPython is supported by charset-normalizer! It is that simple, no need to update the library. It is as generic as we could do. + +I can't build standalone executable +----------------------------------- + +If you are using ``pyinstaller``, ``py2exe`` or alike, you may be encountering this or close to: + + ModuleNotFoundError: No module named 'charset_normalizer.md__mypyc' + +Why? + +- Your package manager picked up a optimized (for speed purposes) wheel that match your architecture and operating system. +- Finally, the module ``charset_normalizer.md__mypyc`` is imported via binaries and can't be seen using your tool. + +How to remedy? + +If your bundler program support it, set up a hook that implicitly import the hidden module. +Otherwise, follow the guide on how to install the vanilla version of this package. 
(Section: *Optional speedup extension*) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/docs/user/cli.rst new/charset_normalizer-3.1.0/docs/user/cli.rst --- old/charset_normalizer-3.0.1/docs/user/cli.rst 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/docs/user/cli.rst 2023-03-06 07:46:55.000000000 +0100 @@ -5,6 +5,7 @@ This is a great tool to fully exploit the detector capabilities without having to write Python code. Possible use cases: + #. Quickly discover probable originating charset from a file. #. I want to quickly convert a non Unicode file to Unicode. #. Debug the charset-detector. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/docs/user/support.rst new/charset_normalizer-3.1.0/docs/user/support.rst --- old/charset_normalizer-3.0.1/docs/user/support.rst 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/docs/user/support.rst 2023-03-06 07:46:55.000000000 +0100 @@ -2,13 +2,21 @@ Support ================= -Here are a list of supported encoding and supported language with latest update. Also this list -may change depending of your python version. +**If you are running:** + +- Python >=2.7,<3.5: Unsupported +- Python 3.5: charset-normalizer < 2.1 +- Python 3.6: charset-normalizer < 3.1 + +Upgrade your Python interpreter as soon as possible. ------------------- Supported Encodings ------------------- +Here are a list of supported encoding and supported language with latest update. Also this list +may change depending of your python version. + Charset Normalizer is able to detect any of those encoding. This list is NOT static and depends heavily on what your current cPython version is shipped with. See https://docs.python.org/3/library/codecs.html#standard-encodings @@ -116,41 +124,51 @@ Those language can be detected inside your content. 
All of these are specified in ./charset_normalizer/assets/__init__.py . -English, -German, -French, -Dutch, -Italian, -Polish, -Spanish, -Russian, -Japanese, -Portuguese, -Swedish, -Chinese, -Ukrainian, -Norwegian, -Finnish, -Vietnamese, -Czech, -Hungarian, -Korean, -Indonesian, -Turkish, -Romanian, -Farsi, -Arabic, -Danish, -Serbian, -Lithuanian, -Slovene, -Slovak, -Malay, -Hebrew, -Bulgarian, -Croatian, -Hindi, -Estonian, -Thai, -Greek, -Tamil. +| English, +| German, +| French, +| Dutch, +| Italian, +| Polish, +| Spanish, +| Russian, +| Japanese, +| Portuguese, +| Swedish, +| Chinese, +| Ukrainian, +| Norwegian, +| Finnish, +| Vietnamese, +| Czech, +| Hungarian, +| Korean, +| Indonesian, +| Turkish, +| Romanian, +| Farsi, +| Arabic, +| Danish, +| Serbian, +| Lithuanian, +| Slovene, +| Slovak, +| Malay, +| Hebrew, +| Bulgarian, +| Croatian, +| Hindi, +| Estonian, +| Thai, +| Greek, +| Tamil. + +---------------------------- +Incomplete Sequence / Stream +---------------------------- + +It is not (yet) officially supported. If you feed an incomplete byte sequence (eg. truncated multi-byte sequence) the detector will +most likely fail to return a proper result. +If you are purposely feeding part of your payload for performance concerns, you may stop doing it as this package is fairly optimized. + +We are working on a dedicated way to handle streams. 
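The new "Incomplete Sequence / Stream" note above can be demonstrated with nothing but the standard library: cutting a UTF-8 payload inside a multi-byte character makes a plain decode fail outright, while an incremental decoder (the kind of machinery a stream-aware detector would need) buffers the partial character instead. A small sketch:

```python
import codecs

payload = "café".encode("utf-8")  # b'caf\xc3\xa9' — 'é' is a 2-byte sequence
truncated = payload[:-1]          # cut inside the final multi-byte character

# A plain strict decode of the truncated stream fails outright:
try:
    truncated.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# An incremental decoder buffers the partial sequence instead of raising,
# and completes it once the remaining byte arrives:
decoder = codecs.getincrementaldecoder("utf-8")()
head = decoder.decode(truncated)                  # the lone \xc3 byte is held back
tail = decoder.decode(payload[-1:], final=True)   # completes the character
```

This is why feeding a truncated payload to a detector gives unreliable results: the cut bytes look like noise under most candidate encodings.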
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/setup.cfg new/charset_normalizer-3.1.0/setup.cfg --- old/charset_normalizer-3.0.1/setup.cfg 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/setup.cfg 2023-03-06 07:46:55.000000000 +0100 @@ -8,7 +8,7 @@ license = MIT author_email = ahmed.ta...@cloudnursery.dev author = Ahmed TAHRI -python_requires = >=3.6.0 +python_requires = >=3.7.0 project_urls = Bug Reports = https://github.com/Ousret/charset_normalizer/issues Documentation = https://charset-normalizer.readthedocs.io/en/latest @@ -20,7 +20,6 @@ Operating System :: OS Independent Programming Language :: Python Programming Language :: Python :: 3 - Programming Language :: Python :: 3.6 Programming Language :: Python :: 3.7 Programming Language :: Python :: 3.8 Programming Language :: Python :: 3.9 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-3.0.1/tests/test_logging.py new/charset_normalizer-3.1.0/tests/test_logging.py --- old/charset_normalizer-3.0.1/tests/test_logging.py 2022-11-18 06:44:30.000000000 +0100 +++ new/charset_normalizer-3.1.0/tests/test_logging.py 2023-03-06 07:46:55.000000000 +0100 @@ -7,7 +7,7 @@ class TestLogBehaviorClass: - def setup(self): + def setup_method(self): self.logger = logging.getLogger("charset_normalizer") self.logger.handlers.clear() self.logger.addHandler(logging.NullHandler())
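For the "standalone executable" FAQ entry added in this release (the `charset_normalizer.md__mypyc` module imported from the compiled wheel is invisible to bundlers), one common remedy with PyInstaller is a custom hook file that names the hidden module explicitly. The hook filename and the `hiddenimports` variable follow PyInstaller's documented hook convention; this is a sketch of that remedy, not something shipped by the package:

```python
# hook-charset_normalizer.py — place this in a directory passed to
# PyInstaller via --additional-hooks-dir so the binary-only helper
# module gets bundled even though no source file imports it by name.
hiddenimports = ["charset_normalizer.md__mypyc"]
```

Alternatively, as the FAQ suggests, installing the pure-Python ("vanilla") build of the package avoids the compiled module entirely.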