Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package python-charset-normalizer for openSUSE:Factory checked in at 2022-02-17 00:29:57
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-charset-normalizer (Old)
 and      /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.1956 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-charset-normalizer" Thu Feb 17 00:29:57 2022 rev:12 rq:954654 version:2.0.12 Changes: -------- --- /work/SRC/openSUSE:Factory/python-charset-normalizer/python-charset-normalizer.changes 2022-01-11 21:20:37.289015879 +0100 +++ /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.1956/python-charset-normalizer.changes 2022-02-17 00:30:05.709437803 +0100 @@ -1,0 +2,9 @@ +Tue Feb 15 08:42:30 UTC 2022 - Dirk M??ller <dmuel...@suse.com> + +- update to 2.0.12: + * ASCII miss-detection on rare cases (PR #170) + * Explicit support for Python 3.11 (PR #164) + * The logging behavior have been completely reviewed, now using only TRACE + and DEBUG levels + +------------------------------------------------------------------- Old: ---- charset_normalizer-2.0.10.tar.gz New: ---- charset_normalizer-2.0.12.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-charset-normalizer.spec ++++++ --- /var/tmp/diff_new_pack.UHOZtE/_old 2022-02-17 00:30:06.421437680 +0100 +++ /var/tmp/diff_new_pack.UHOZtE/_new 2022-02-17 00:30:06.425437679 +0100 @@ -19,7 +19,7 @@ %{?!python_module:%define python_module() python-%{**} python3-%{**}} %define skip_python2 1 Name: python-charset-normalizer -Version: 2.0.10 +Version: 2.0.12 Release: 0 Summary: Python Universal Charset detector License: MIT ++++++ charset_normalizer-2.0.10.tar.gz -> charset_normalizer-2.0.12.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/.github/workflows/detector-coverage.yml new/charset_normalizer-2.0.12/.github/workflows/detector-coverage.yml --- old/charset_normalizer-2.0.10/.github/workflows/detector-coverage.yml 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/.github/workflows/detector-coverage.yml 2022-02-12 15:24:47.000000000 +0100 @@ -31,7 +31,7 @@ git clone https://github.com/Ousret/char-dataset.git - name: Coverage WITH preemptive run: | - python ./bin/coverage.py --coverage 98 --with-preemptive + python ./bin/coverage.py --coverage 97 --with-preemptive - name: Coverage WITHOUT preemptive run: | python ./bin/coverage.py --coverage 95 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/.github/workflows/python-publish.yml new/charset_normalizer-2.0.12/.github/workflows/python-publish.yml --- old/charset_normalizer-2.0.10/.github/workflows/python-publish.yml 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/.github/workflows/python-publish.yml 2022-02-12 15:24:47.000000000 +0100 @@ -101,7 +101,7 @@ git clone https://github.com/Ousret/char-dataset.git - name: Coverage WITH preemptive run: | - python ./bin/coverage.py --coverage 98 --with-preemptive + python ./bin/coverage.py --coverage 97 --with-preemptive - name: Coverage WITHOUT preemptive run: | python ./bin/coverage.py --coverage 95 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/.github/workflows/run-tests.yml new/charset_normalizer-2.0.12/.github/workflows/run-tests.yml --- old/charset_normalizer-2.0.10/.github/workflows/run-tests.yml 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/.github/workflows/run-tests.yml 2022-02-12 15:24:47.000000000 +0100 @@ -9,7 +9,7 @@ strategy: fail-fast: false matrix: - python-version: [3.5, 3.6, 3.7, 3.8, 3.9, "3.10"] + python-version: [3.5, 3.6, 3.7, 3.8, 3.9, "3.10", 
"3.11.0-alpha.4"] os: [ubuntu-latest] steps: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/CHANGELOG.md new/charset_normalizer-2.0.12/CHANGELOG.md --- old/charset_normalizer-2.0.10/CHANGELOG.md 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/CHANGELOG.md 2022-02-12 15:24:47.000000000 +0100 @@ -2,6 +2,19 @@ All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). +## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12) + +### Fixed +- ASCII miss-detection on rare cases (PR #170) + +## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30) + +### Added +- Explicit support for Python 3.11 (PR #164) + +### Changed +- The logging behavior have been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165) + ## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04) ### Fixed diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/README.md new/charset_normalizer-2.0.12/README.md --- old/charset_normalizer-2.0.10/README.md 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/README.md 2022-02-12 15:24:47.000000000 +0100 @@ -33,12 +33,13 @@ | `License` | LGPL-2.1 | MIT | MPL-1.1 | `Native Python` | :heavy_check_mark: | :heavy_check_mark: | ??? | | `Detect spoken language` | ??? | :heavy_check_mark: | N/A | -| `Supported Encoding` | 30 | :tada: [93](https://charset-normalizer.readthedocs.io/en/latest/support.html) | 40 +| `Supported Encoding` | 30 | :tada: [93](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 <p align="center"> <img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/> *\*\* : They are clearly using specific code for a specific encoding even if covering most of used one*<br> +Did you got there because of the logs? See [https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html](https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html) ## ??? 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/bin/bc.py new/charset_normalizer-2.0.12/bin/bc.py
--- old/charset_normalizer-2.0.10/bin/bc.py	2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/bin/bc.py	2022-02-12 15:24:47.000000000 +0100
@@ -43,7 +43,7 @@
     success_count = 0
     total_count = 0
 
-    for tbt_path in glob("./char-dataset/**/*.*"):
+    for tbt_path in sorted(glob("./char-dataset/**/*.*")):
         total_count += 1
 
         with open(tbt_path, "rb") as fp:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/bin/coverage.py new/charset_normalizer-2.0.12/bin/coverage.py
--- old/charset_normalizer-2.0.10/bin/coverage.py	2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/bin/coverage.py	2022-02-12 15:24:47.000000000 +0100
@@ -43,7 +43,7 @@
     success_count = 0
     total_count = 0
 
-    for tbt_path in glob("./char-dataset/**/*.*"):
+    for tbt_path in sorted(glob("./char-dataset/**/*.*")):
         expected_encoding = tbt_path.split(sep)[-2]
         total_count += 1
 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/bin/performance.py new/charset_normalizer-2.0.12/bin/performance.py
--- old/charset_normalizer-2.0.10/bin/performance.py	2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/bin/performance.py	2022-02-12 15:24:47.000000000 +0100
@@ -37,7 +37,7 @@
     chardet_results = []
     charset_normalizer_results = []
 
-    for tbt_path in glob("./char-dataset/**/*.*"):
+    for tbt_path in sorted(glob("./char-dataset/**/*.*")):
         print(tbt_path)
 
         # Read Bin file
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/bin/serve.py new/charset_normalizer-2.0.12/bin/serve.py
--- old/charset_normalizer-2.0.10/bin/serve.py	2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/bin/serve.py	2022-02-12 15:24:47.000000000 +0100
@@ -13,7 +13,7 @@
 def read_targets():
     return jsonify(
         [
-            el.replace("./char-dataset", "/raw").replace("\\", "/") for el in glob("./char-dataset/**/*")
+            el.replace("./char-dataset", "/raw").replace("\\", "/") for el in sorted(glob("./char-dataset/**/*"))
         ]
    )
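
[All four bin/ scripts receive the same one-line change: wrapping glob() in sorted(). glob() yields paths in filesystem order, which varies between machines, so sorting makes the coverage and benchmark runs deterministic. A minimal sketch of the pattern, assuming a char-dataset checkout like the scripts above use:

    from glob import glob

    # Filesystem enumeration order is not stable across OSes/filesystems;
    # sorted() pins the iteration order so successive runs visit files identically.
    for tbt_path in sorted(glob("./char-dataset/**/*.*")):
        print(tbt_path)
]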
intention.") + logger.debug("Encoding detection on empty bytes, assuming utf_8 intention.") if explain: logger.removeHandler(explain_handler) logger.setLevel(previous_logger_level or logging.WARNING) return CharsetMatches([CharsetMatch(sequences, "utf_8", 0.0, False, [], "")]) if cp_isolation is not None: - logger.debug( + logger.log( + TRACE, "cp_isolation is set. use this flag for debugging purpose. " "limited list of encoding allowed : %s.", ", ".join(cp_isolation), @@ -92,7 +95,8 @@ cp_isolation = [] if cp_exclusion is not None: - logger.debug( + logger.log( + TRACE, "cp_exclusion is set. use this flag for debugging purpose. " "limited list of encoding excluded : %s.", ", ".join(cp_exclusion), @@ -102,7 +106,8 @@ cp_exclusion = [] if length <= (chunk_size * steps): - logger.debug( + logger.log( + TRACE, "override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.", steps, chunk_size, @@ -118,16 +123,18 @@ is_too_large_sequence = len(sequences) >= TOO_BIG_SEQUENCE # type: bool if is_too_small_sequence: - logger.warning( + logger.log( + TRACE, "Trying to detect encoding from a tiny portion of ({}) byte(s).".format( length - ) + ), ) elif is_too_large_sequence: - logger.info( + logger.log( + TRACE, "Using lazy str decoding because the payload is quite large, ({}) byte(s).".format( length - ) + ), ) prioritized_encodings = [] # type: List[str] @@ -138,7 +145,8 @@ if specified_encoding is not None: prioritized_encodings.append(specified_encoding) - logger.info( + logger.log( + TRACE, "Detected declarative mark in sequence. Priority +1 given for %s.", specified_encoding, ) @@ -157,7 +165,8 @@ if sig_encoding is not None: prioritized_encodings.append(sig_encoding) - logger.info( + logger.log( + TRACE, "Detected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.", len(sig_payload), sig_encoding, @@ -188,7 +197,8 @@ ) # type: bool if encoding_iana in {"utf_16", "utf_32"} and not bom_or_sig_available: - logger.debug( + logger.log( + TRACE, "Encoding %s wont be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.", encoding_iana, ) @@ -197,8 +207,10 @@ try: is_multi_byte_decoder = is_multi_byte_encoding(encoding_iana) # type: bool except (ModuleNotFoundError, ImportError): - logger.debug( - "Encoding %s does not provide an IncrementalDecoder", encoding_iana + logger.log( + TRACE, + "Encoding %s does not provide an IncrementalDecoder", + encoding_iana, ) continue @@ -219,7 +231,8 @@ ) except (UnicodeDecodeError, LookupError) as e: if not isinstance(e, LookupError): - logger.debug( + logger.log( + TRACE, "Code page %s does not fit given bytes sequence at ALL. %s", encoding_iana, str(e), @@ -235,7 +248,8 @@ break if similar_soft_failure_test: - logger.debug( + logger.log( + TRACE, "%s is deemed too similar to code page %s and was consider unsuited already. Continuing!", encoding_iana, encoding_soft_failed, @@ -255,7 +269,8 @@ ) # type: bool if multi_byte_bonus: - logger.debug( + logger.log( + TRACE, "Code page %s is a multi byte encoding table and it appear that at least one character " "was encoded using n-bytes.", encoding_iana, @@ -285,7 +300,8 @@ errors="ignore" if is_multi_byte_decoder else "strict", ) # type: str except UnicodeDecodeError as e: # Lazy str loading may have missed something there - logger.debug( + logger.log( + TRACE, "LazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. 
%s", encoding_iana, str(e), @@ -337,7 +353,8 @@ try: sequences[int(50e3) :].decode(encoding_iana, errors="strict") except UnicodeDecodeError as e: - logger.debug( + logger.log( + TRACE, "LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %s", encoding_iana, str(e), @@ -350,7 +367,8 @@ ) # type: float if mean_mess_ratio >= threshold or early_stop_count >= max_chunk_gave_up: tested_but_soft_failure.append(encoding_iana) - logger.info( + logger.log( + TRACE, "%s was excluded because of initial chaos probing. Gave up %i time(s). " "Computed mean chaos is %f %%.", encoding_iana, @@ -373,7 +391,8 @@ fallback_u8 = fallback_entry continue - logger.info( + logger.log( + TRACE, "%s passed initial chaos probing. Mean measured chaos is %f %%", encoding_iana, round(mean_mess_ratio * 100, ndigits=3), @@ -385,10 +404,11 @@ target_languages = mb_encoding_languages(encoding_iana) if target_languages: - logger.debug( + logger.log( + TRACE, "{} should target any language(s) of {}".format( encoding_iana, str(target_languages) - ) + ), ) cd_ratios = [] @@ -406,10 +426,11 @@ cd_ratios_merged = merge_coherence_ratios(cd_ratios) if cd_ratios_merged: - logger.info( + logger.log( + TRACE, "We detected language {} using {}".format( cd_ratios_merged, encoding_iana - ) + ), ) results.append( @@ -427,8 +448,8 @@ encoding_iana in [specified_encoding, "ascii", "utf_8"] and mean_mess_ratio < 0.1 ): - logger.info( - "%s is most likely the one. Stopping the process.", encoding_iana + logger.debug( + "Encoding detection: %s is most likely the one.", encoding_iana ) if explain: logger.removeHandler(explain_handler) @@ -436,8 +457,9 @@ return CharsetMatches([results[encoding_iana]]) if encoding_iana == sig_encoding: - logger.info( - "%s is most likely the one as we detected a BOM or SIG within the beginning of the sequence.", + logger.debug( + "Encoding detection: %s is most likely the one as we detected a BOM or SIG within " + "the beginning of the sequence.", encoding_iana, ) if explain: @@ -447,13 +469,15 @@ if len(results) == 0: if fallback_u8 or fallback_ascii or fallback_specified: - logger.debug( - "Nothing got out of the detection process. Using ASCII/UTF-8/Specified fallback." + logger.log( + TRACE, + "Nothing got out of the detection process. Using ASCII/UTF-8/Specified fallback.", ) if fallback_specified: logger.debug( - "%s will be used as a fallback match", fallback_specified.encoding + "Encoding detection: %s will be used as a fallback match", + fallback_specified.encoding, ) results.append(fallback_specified) elif ( @@ -465,12 +489,21 @@ ) or (fallback_u8 is not None) ): - logger.warning("utf_8 will be used as a fallback match") + logger.debug("Encoding detection: utf_8 will be used as a fallback match") results.append(fallback_u8) elif fallback_ascii: - logger.warning("ascii will be used as a fallback match") + logger.debug("Encoding detection: ascii will be used as a fallback match") results.append(fallback_ascii) + if results: + logger.debug( + "Encoding detection: Found %s as plausible (best-candidate) for content. 
With %i alternatives.", + results.best().encoding, # type: ignore + len(results) - 1, + ) + else: + logger.debug("Encoding detection: Unable to determine any suitable charset.") + if explain: logger.removeHandler(explain_handler) logger.setLevel(previous_logger_level) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/charset_normalizer/constant.py new/charset_normalizer-2.0.12/charset_normalizer/constant.py --- old/charset_normalizer-2.0.10/charset_normalizer/constant.py 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/charset_normalizer/constant.py 2022-02-12 15:24:47.000000000 +0100 @@ -498,3 +498,6 @@ NOT_PRINTABLE_PATTERN = re_compile(r"[0-9\W\n\r\t]+") LANGUAGE_SUPPORTED_COUNT = len(FREQUENCIES) # type: int + +# Logging LEVEL bellow DEBUG +TRACE = 5 # type: int diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/charset_normalizer/md.py new/charset_normalizer-2.0.12/charset_normalizer/md.py --- old/charset_normalizer-2.0.10/charset_normalizer/md.py 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/charset_normalizer/md.py 2022-02-12 15:24:47.000000000 +0100 @@ -314,7 +314,7 @@ self._buffer = "" self._buffer_accent_count = 0 elif ( - character not in {"<", ">", "-", "="} + character not in {"<", ">", "-", "=", "~", "|", "_"} and character.isdigit() is False and is_symbol(character) ): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/charset_normalizer/version.py new/charset_normalizer-2.0.12/charset_normalizer/version.py --- old/charset_normalizer-2.0.10/charset_normalizer/version.py 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/charset_normalizer/version.py 2022-02-12 15:24:47.000000000 +0100 @@ -2,5 +2,5 @@ Expose version """ -__version__ = "2.0.10" +__version__ = "2.0.12" VERSION = __version__.split(".") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/dev-requirements.txt new/charset_normalizer-2.0.12/dev-requirements.txt --- old/charset_normalizer-2.0.10/dev-requirements.txt 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/dev-requirements.txt 2022-02-12 15:24:47.000000000 +0100 @@ -4,7 +4,7 @@ chardet==4.0.* Flask>=2.0,<3.0; python_version >= '3.6' requests>=2.26,<3.0; python_version >= '3.6' -black==21.12b0; python_version >= '3.6' +black==22.1.0; python_version >= '3.6' flake8==4.0.1; python_version >= '3.6' -mypy==0.930; python_version >= '3.6' +mypy==0.931; python_version >= '3.6' isort; python_version >= '3.6' diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/docs/user/miscellaneous.rst new/charset_normalizer-2.0.12/docs/user/miscellaneous.rst --- old/charset_normalizer-2.0.10/docs/user/miscellaneous.rst 2022-01-04 21:14:06.000000000 +0100 +++ new/charset_normalizer-2.0.12/docs/user/miscellaneous.rst 2022-02-12 15:24:47.000000000 +0100 @@ -18,3 +18,29 @@ # This should print '????????????????????????????????????????????????' print(str(result)) + + +Logging +------- + +Prior to the version 2.0.10 you may encounter some unexpected logs in your streams. +Something along the line of: + + :: + + ... | WARNING | override steps (5) and chunk_size (512) as content does not fit (465 byte(s) given) parameters. + ... | INFO | ascii passed initial chaos probing. 
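
[The constant.py hunk above defines TRACE = 5, one notch below DEBUG (10), and the api.py hunk deliberately leaves `logging.addLevelName(TRACE, "TRACE")` commented out so that importing the library never mutates global logging state. An application that wants those records labeled "TRACE" instead of the stdlib's default "Level 5" can register the name itself; a minimal stdlib-only sketch, with the level value taken from the diff:

    import logging

    TRACE = 5  # matches charset_normalizer.constant.TRACE in 2.0.11+

    # Opt-in, application-side registration of the level name.
    logging.addLevelName(TRACE, "TRACE")

    logger = logging.getLogger("charset_normalizer")
    logger.setLevel(TRACE)
    logger.addHandler(logging.StreamHandler())

    # Records emitted at level 5 now render as "TRACE" rather than "Level 5".
    logger.log(TRACE, "trace-level record, named on the application side")
]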
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/docs/user/miscellaneous.rst new/charset_normalizer-2.0.12/docs/user/miscellaneous.rst
--- old/charset_normalizer-2.0.10/docs/user/miscellaneous.rst	2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/docs/user/miscellaneous.rst	2022-02-12 15:24:47.000000000 +0100
@@ -18,3 +18,29 @@
 
     # This should print '????????????????????????????????????????????????'
     print(str(result))
+
+
+Logging
+-------
+
+Prior to the version 2.0.10 you may encounter some unexpected logs in your streams.
+Something along the line of:
+
+ ::
+
+    ... | WARNING | override steps (5) and chunk_size (512) as content does not fit (465 byte(s) given) parameters.
+    ... | INFO | ascii passed initial chaos probing. Mean measured chaos is 0.000000 %
+    ... | INFO | ascii should target any language(s) of ['Latin Based']
+
+
+It is most likely because you altered the root getLogger instance. The package has its own logic behind logging and why
+it is useful. See https://docs.python.org/3/howto/logging.html to learn the basics.
+
+If you are looking to silence and/or reduce drastically the amount of logs, please upgrade to the latest version
+available for `charset-normalizer` using your package manager or by `pip install charset-normalizer -U`.
+
+The latest version will no longer produce any entry greater than `DEBUG`.
+On `DEBUG` only one entry will be observed and that is about the detection result.
+
+Then regarding the others log entries, they will be pushed as `Level 5`. Commonly known as TRACE level, but we do
+not register it globally.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/setup.py new/charset_normalizer-2.0.12/setup.py
--- old/charset_normalizer-2.0.10/setup.py	2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/setup.py	2022-02-12 15:24:47.000000000 +0100
@@ -73,6 +73,7 @@
         'Programming Language :: Python :: 3.8',
         'Programming Language :: Python :: 3.9',
         'Programming Language :: Python :: 3.10',
+        'Programming Language :: Python :: 3.11',
         'Topic :: Text Processing :: Linguistic',
         'Topic :: Utilities',
         'Programming Language :: Python :: Implementation :: PyPy',
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/tests/test_logging.py new/charset_normalizer-2.0.12/tests/test_logging.py
--- old/charset_normalizer-2.0.10/tests/test_logging.py	2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/tests/test_logging.py	2022-02-12 15:24:47.000000000 +0100
@@ -3,6 +3,7 @@
 
 from charset_normalizer.utils import set_logging_handler
 from charset_normalizer.api import from_bytes, explain_handler
+from charset_normalizer.constant import TRACE
 
 
 class TestLogBehaviorClass:
@@ -17,16 +18,16 @@
         from_bytes(test_sequence, steps=1, chunk_size=50, explain=True)
         assert explain_handler not in self.logger.handlers
         for record in caplog.records:
-            assert record.levelname in ["INFO", "DEBUG"]
+            assert record.levelname in ["Level 5", "DEBUG"]
 
     def test_explain_false_handler_set_behavior(self, caplog):
         test_sequence = b'This is a test sequence of bytes that should be sufficient'
-        set_logging_handler(level=logging.INFO, format_string="%(message)s")
+        set_logging_handler(level=TRACE, format_string="%(message)s")
         from_bytes(test_sequence, steps=1, chunk_size=50, explain=False)
         assert any(isinstance(hdl, logging.StreamHandler) for hdl in self.logger.handlers)
         for record in caplog.records:
-            assert record.levelname in ["INFO", "DEBUG"]
-        assert "ascii is most likely the one. Stopping the process." in caplog.text
+            assert record.levelname in ["Level 5", "DEBUG"]
+        assert "Encoding detection: ascii is most likely the one." in caplog.text
 
     def test_set_stream_handler(self, caplog):
         set_logging_handler(
@@ -34,7 +35,7 @@
         )
         self.logger.debug("log content should log with default format")
         for record in caplog.records:
-            assert record.levelname in ["INFO", "DEBUG"]
+            assert record.levelname in ["Level 5", "DEBUG"]
         assert "log content should log with default format" in caplog.text
 
     def test_set_stream_handler_format(self, caplog):
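
[For readers who landed here because of noisy logs: with 2.0.11+ the practical options follow from the docs and tests above. A short sketch of both directions — silencing the library entirely, or opting into full verbosity via the set_logging_handler helper that test_logging.py exercises (the level/format_string arguments are the ones shown in that diff):

    import logging
    from charset_normalizer.utils import set_logging_handler

    # Silence: keep only WARNING and above from the library's own logger.
    logging.getLogger("charset_normalizer").setLevel(logging.WARNING)

    # Verbose: attach a stream handler down to level 5 (TRACE), as the tests do.
    set_logging_handler(level=5, format_string="%(message)s")
]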