On Mon, 16 Jul 2018 15:28:51 -0400, Terry Reedy wrote:

> On 7/16/2018 1:11 PM, Richard Damon wrote:
>
>> Many consider that UTF-32 is a variable-width encoding because of the
>> combining characters. It can take multiple ‘codepoints’ to define what
>> should be a single ‘character’ for display.
>
> I hope you realize that this is not the standard meaning of
> 'variable-width encoding', which is 'variable number of bytes for a
> codepoint'.
A minor correction, Terry: it is the number of code units, not bytes.
UTF-8 uses 1-byte code units, and from 1 to 4 code units per code point;
UTF-16 uses 2-byte code units (a 16-bit word), and 1 or 2 words per code
point; UTF-32 uses 4-byte code units (a 32-bit word), and only ever a
single code unit for every code point.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it
everywhere." -- Jon Ronson
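
The code unit counts described above can be checked with a minimal Python sketch. The helper name code_units is hypothetical (it is not from the thread), and the little-endian codec names are chosen only so that Python does not prepend a byte order mark, which would inflate the counts.

```python
def code_units(ch):
    """Return how many code units each encoding form needs for one code point.

    Hypothetical helper for illustration; not part of the original thread.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # Size of a single code unit, in bytes, for each encoding form.
    # The "-le" codecs are used so that no byte order mark is emitted.
    unit_size = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}
    return {enc: len(ch.encode(enc)) // size for enc, size in unit_size.items()}


# One sample code point for each UTF-8 length (1 to 4 code units).
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", code_units(ch))
```

Only the code point outside the Basic Multilingual Plane (U+1F600) needs two UTF-16 code units, and every code point needs exactly one UTF-32 code unit, which is why UTF-32 counts as fixed-width in the standard sense.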