New submission from Guillaume Sanchez:

"a⃑".center(5, ".")
produces
'..a⃑.' instead of '..a⃑..'

The reason is that "a⃑" consists of two code points: the letter 'a' followed by 
a combining "arrow above" mark. str.center() measures the string's length in 
code points and pads both sides with `fillchar` until that length reaches 
`width`. However, `width` is presumably meant to count user-perceived 
characters, not code points.
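
To make the mismatch concrete, here is a small sketch of the behaviour described 
above. It assumes U+20D7 (COMBINING RIGHT ARROW ABOVE) as a stand-in for the 
combining mark in the report; the exact code point is not stated here, and any 
combining mark should behave the same way.

```python
import unicodedata

# Assumption: U+20D7 stands in for the combining mark used in the report.
s = "a\u20d7"

# len() counts code points, so the combining mark counts as a second unit.
code_points = len(s)

# unicodedata.combining() returns a nonzero class for combining marks.
is_mark = [unicodedata.combining(ch) != 0 for ch in s]

# str.center() pads based on len(s), so only three fill characters are added.
centered = s.center(5, ".")
```

With these inputs the string has two code points but is displayed as one 
character, so the padded result looks four columns wide rather than five.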

The correct way to count characters as users perceive them is the grapheme 
cluster segmentation algorithm from UAX #29 (Unicode Text Segmentation).
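
As a rough illustration of the idea, the sketch below centers a string by an 
approximate cluster count. It is not the UAX #29 algorithm and not the 
implementation mentioned in this report: it only skips code points with a 
nonzero canonical combining class, so it will miscount emoji ZWJ sequences, 
Hangul jamo, regional indicators, and spacing marks. The names 
approx_cluster_count and center_clusters are hypothetical.

```python
import unicodedata


def approx_cluster_count(text: str) -> int:
    """Count code points that are not combining marks.

    Assumption: a crude stand-in for UAX #29 grapheme cluster counting.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def center_clusters(text: str, width: int, fillchar: str = " ") -> str:
    """Center text by approximate cluster count instead of len(text)."""
    if len(fillchar) != 1:
        raise TypeError("fillchar must be exactly one code point")
    pad = width - approx_cluster_count(text)
    if pad <= 0:
        return text
    # Split the padding the way CPython's str.center() does, so that
    # results match str.center() when no combining marks are present.
    left = pad // 2 + (pad & width & 1)
    return fillchar * left + text + fillchar * (pad - left)
```

Under the same U+20D7 assumption as before, center_clusters("a\u20d7", 5, ".") 
adds two fill characters on each side, which is the result the report expects.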

As it happens, I have already implemented this myself and could submit a PR if 
asked, with a little help on the C <-> Python glue.

Thanks for your time.

----------
components: Library (Lib)
messages: 296478
nosy: Guillaume Sanchez
priority: normal
severity: normal
status: open
title: str.center() is not unicode aware
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30717>
_______________________________________