New submission from Greg Price <gnpr...@gmail.com>:

The test suite for the unicodedata module has two test cases which run through 
the database and make a hash of its visible outputs for each codepoint, 
comparing the hash against a known checksum.  These are helpful regression 
tests for making sure the behavior isn't changed by patches that weren't 
intended to change it.

But Unicode has grown since Python first gained support for it, when Unicode 
itself was still rather new.  These test cases were added in commit 6a20ee7de 
back in 2000, and they haven't needed to change much since then.  They should 
now be changed to look beyond the Basic Multilingual Plane (`range(0x10000)`) 
and cover all 17 planes of Unicode's final form (`range(0x110000)`).
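
To illustrate the gap, here is a rough sketch of a checksum over the database's 
visible outputs, computed once over the BMP and once over all 17 planes.  This 
is not the code in the actual test suite: the `checksum` helper, the choice of 
properties, and the use of SHA-1 are assumptions made for the example.

```python
import hashlib
import sys
import unicodedata


def checksum(codepoints):
    """Hash a few unicodedata properties for every codepoint given.

    Hypothetical helper for illustration; the real tests may hash a
    different set of properties.
    """
    h = hashlib.sha1()
    for cp in codepoints:
        ch = chr(cp)
        data = [
            unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            unicodedata.decomposition(ch),
            str(unicodedata.mirrored(ch)),
            str(unicodedata.combining(ch)),
        ]
        h.update("".join(data).encode("ascii"))
    return h.hexdigest()


# Plane 0 only: what a loop over range(0x10000) covers.
BMP_ONLY = range(0x10000)
# All 17 planes: sys.maxunicode is 0x10FFFF, so this is range(0x110000).
ALL_PLANES = range(sys.maxunicode + 1)

bmp_digest = checksum(BMP_ONLY)
full_digest = checksum(ALL_PLANES)
```

A checksum taken over `BMP_ONLY` cannot notice a change that only affects 
codepoints at U+10000 and above; one taken over `ALL_PLANES` can.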

Spotted in discussion on GH-15019 
(https://github.com/python/cpython/pull/15019#discussion_r308947884 ).  I have 
a patch for this which I'll send shortly.

----------
components: Tests
messages: 349014
nosy: Greg Price
priority: normal
severity: normal
status: open
title: unicodedata checksum-tests only test 1/17th of Unicode's codepoints
type: enhancement
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37758>
_______________________________________
