[issue32285] In `unicodedata`, it should be possible to check a unistr's normal form without necessarily copying it

2018-11-04 Thread Benjamin Peterson

Benjamin Peterson  added the comment:


New changeset 2810dd7be9876236f74ac80716d113572c9098dd by Benjamin Peterson 
(Max Bélanger) in branch 'master':
closes bpo-32285: Add unicodedata.is_normalized. (GH-4806)
https://github.com/python/cpython/commit/2810dd7be9876236f74ac80716d113572c9098dd
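A short usage sketch of the function this changeset added, `unicodedata.is_normalized(form, unistr)`, which is available from Python 3.8. The helper name `normal_forms` and the sample strings are illustrative choices, not part of the change:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normal_forms(s):
    """Return the normalization forms that `s` is already in.

    Hypothetical helper; relies on unicodedata.is_normalized (Python 3.8+).
    """
    return [form for form in FORMS if unicodedata.is_normalized(form, s)]

# "é" as one precomposed code point vs. "e" + U+0301 COMBINING ACUTE ACCENT
print(normal_forms("\u00e9"))   # ['NFC', 'NFKC']
print(normal_forms("e\u0301"))  # ['NFD', 'NFKD']
# U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition
print(normal_forms("\ufb01"))   # ['NFC', 'NFD']
```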


--
nosy: +benjamin.peterson
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



2018-10-24 Thread Maxime Belanger


Change by Maxime Belanger :


--
versions: +Python 3.8 -Python 3.7

2017-12-12 Thread STINNER Victor

STINNER Victor  added the comment:

> However, I'm concerned by your comment that you fall back on creating a 
> normalized copy and comparing.

The purpose of the function is to be faster than str == 
unicodedata.normalize(form, str). So yeah, any optimization is welcome.

But I'm not bothered by the suboptimal MAYBE case, which is implemented as: str == 
unicodedata.normalize(form, str). It can be optimized later, if needed.

If someone cares about performance, I will require a benchmark, since I only trust 
numbers :-)
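One possible shape for the requested benchmark, sketched with the standard `timeit` module. It times the copy-and-compare idiom against `unicodedata.is_normalized` (Python 3.8+). The sample strings and iteration count are arbitrary assumptions, and no numbers are claimed here:

```python
import timeit
import unicodedata

def bench(form, s, number=1_000):
    """Time `s == normalize(form, s)` against `is_normalized(form, s)`."""
    def copy_and_compare():
        return s == unicodedata.normalize(form, s)

    def check():
        return unicodedata.is_normalized(form, s)

    # Timings are only worth comparing if both approaches agree.
    if copy_and_compare() != check():
        raise AssertionError("approaches disagree")
    return {
        "copy_and_compare": timeit.timeit(copy_and_compare, number=number),
        "is_normalized": timeit.timeit(check, number=number),
    }

if __name__ == "__main__":
    samples = {
        "ascii": "hello world " * 100,
        "nfc_latin": "caf\u00e9 " * 100,   # precomposed é
        "nfd_latin": "cafe\u0301 " * 100,  # e + combining acute
    }
    for name, text in samples.items():
        for form in ("NFC", "NFD"):
            t = bench(form, text)
            print(f"{name:10} {form}: copy+compare {t['copy_and_compare']:.4f}s, "
                  f"is_normalized {t['is_normalized']:.4f}s")
```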

--

2017-12-12 Thread Steven D'Aprano

Steven D'Aprano  added the comment:

Python 2.7 is in feature freeze, so this can only go into 3.7.

I would find this useful, and would like this feature. However, I'm concerned 
by your comment that you fall back on creating a normalized copy and comparing. 
That could be expensive, and shouldn't be needed. According to Unicode Standard Annex #15:

http://unicode.org/reports/tr15/#Detecting_Normalization_Forms

in the worst case, you can incrementally check only the code points in doubt 
(around the "MAYBE" code points).
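A minimal sketch of the UAX #15 quick check, specialised to NFD because `unicodedata` does not expose the NF*_QC properties directly. For NFD the property has only YES/NO values, and NFD_QC=No exactly for characters with a canonical decomposition, so it can be derived from `unicodedata.decomposition()`. The helper names are hypothetical, and the MAYBE handling that NFC/NFKC would need (the incremental check described above) is not shown:

```python
import unicodedata

YES, NO = "YES", "NO"

# Hangul syllables decompose algorithmically, so their mappings may not be
# listed by unicodedata.decomposition(); treat the whole block as decomposable.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3

def has_canonical_decomposition(ch):
    if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
        return True
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings start with a "<tag>"; canonical ones are bare hex.
    return bool(decomp) and not decomp.startswith("<")

def quick_check_nfd(s):
    """UAX #15 quick check for NFD (NFD_QC never yields MAYBE)."""
    last_ccc = 0
    for ch in s:
        ccc = unicodedata.combining(ch)
        # Combining marks out of canonical order: not normalized.
        if last_ccc > ccc and ccc != 0:
            return NO
        # NFD_QC=No for every character with a canonical decomposition.
        if has_canonical_decomposition(ch):
            return NO
        last_ccc = ccc
    return YES
```

Because this scan never builds a normalized copy, it answers in a single pass; only forms with MAYBE values would need extra work around the doubtful code points.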

--
nosy: +steven.daprano
type:  -> enhancement
versions:  -Python 2.7

2017-12-11 Thread Max Bélanger

Change by Max Bélanger :


--
keywords: +patch
pull_requests: +4703
stage:  -> patch review

2017-12-11 Thread Maxime Belanger

New submission from Maxime Belanger :

In our deployment of Python 2.7, we've patched `unicodedata` to introduce a new 
function: `is_normalized` can check whether a unistr is in a given normal form. 
This currently has to be done by creating a normalized copy, then checking 
whether it is equal to the source string.

This function uses the internal helper (also called `is_normalized`) that can 
"quick check" normalization, but falls back on creating a normalized copy and 
comparing (when necessary).
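The control flow described above (quick check first, normalized copy only when the answer is MAYBE) can be sketched as follows. The quick check here is a hypothetical, deliberately simplified stand-in, not CPython's internal helper: it relies only on the fact that ASCII text is unchanged by NFC, NFD, NFKC and NFKD, and reports everything else as MAYBE:

```python
import unicodedata
from enum import Enum

class QuickCheck(Enum):
    YES = "yes"
    NO = "no"      # unused by this stand-in; real QC tables would produce it
    MAYBE = "maybe"

def quick_check(form, s):
    """Hypothetical stand-in for a table-driven quick check."""
    if s.isascii():  # ASCII is invariant under all four normal forms
        return QuickCheck.YES
    return QuickCheck.MAYBE

def is_normalized_sketch(form, s):
    result = quick_check(form, s)
    if result is QuickCheck.MAYBE:
        # Fall back on creating a normalized copy and comparing.
        return s == unicodedata.normalize(form, s)
    return result is QuickCheck.YES
```

A real quick check would consult per-character NF*_QC data so that far fewer inputs reach the copying fallback.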

We're contributing this change in case it can be helpful to others. Feedback is 
welcome!

--
components: Unicode
messages: 308085
nosy: Maxime Belanger, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: In `unicodedata`, it should be possible to check a unistr's normal form 
without necessarily copying it
versions: Python 2.7, Python 3.7
