Inspired by this thread:
https://www.postgresql.org/message-id/011f01d8757e%24f5d69700%24e183c500%24%40ndensan.co.jp

I am trying to display some special Chinese characters in PostgreSQL. I am using PostgreSQL 15 beta1 on Ubuntu 20.

localhost:5433 admin@test=# show LC_COLLATE;
+------------+
| lc_collate |
+------------+
| C.UTF-8    |
+------------+

localhost:5433 admin@test=# select icu_unicode_version();
+---------------------+
| icu_unicode_version |
+---------------------+
| 13.0                |
+---------------------+

icu_unicode_version() is a function provided by an extension.

Wikipedia on the character Biang: https://en.wikipedia.org/wiki/Biangbiang_noodles
Quote:

> The character's traditional and simplified forms were added to Unicode
> version 13.0 in March 2020 in the CJK Unified Ideographs Extension G block
> of the newly allocated Tertiary Ideographic Plane. The corresponding
> Unicode characters are:

Unicode character info: https://www.compart.com/en/unicode/U+30EDD

Query:

with strings(s) as (
  values (U&'\+0030EDD')
)
select s,
  octet_length(s),
  char_length(s),
  (select count(*) from icu_character_boundaries(s,'en')) as graphemes
from strings;

Result:

+-----+--------------+-------------+-----------+
| s   | octet_length | char_length | graphemes |
+-----+--------------+-------------+-----------+
| ロD |            4 |           2 |         2 |
+-----+--------------+-------------+-----------+

This does not look right: shouldn't graphemes be 1? I am also not sure that values (U&'\+0030EDD') is the same as 𰻝.

-- 
I recommend David Deutsch's <<The Beginning of Infinity>>

Jian
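
A possible explanation for the ロD result, offered as a sketch rather than a confirmed diagnosis: in a U&'...' string the \+ escape takes exactly six hexadecimal digits, so \+0030EDD would be read as \+0030ED (U+30ED, katakana ロ) followed by a literal D. That would account for 2 characters and 4 bytes (3 + 1). Assuming a UTF8 database and standard_conforming_strings = on, the two spellings can be compared side by side:

```sql
-- Sketch, assuming a UTF8-encoded database and standard_conforming_strings = on.
-- The \+ escape must be followed by exactly six hex digits,
-- so U+30EDD is written \+030EDD.
with strings(s) as (
  values (U&'\+030EDD'),   -- six digits: the single code point U+30EDD
         (U&'\+0030EDD')   -- seven digits: U+30ED followed by the letter D
)
select s,
       octet_length(s),
       char_length(s)
from strings;
-- Expected: the first row has char_length 1 and octet_length 4;
-- the second row has char_length 2 and octet_length 4.
```

If the first row behaves as expected, the grapheme count from icu_character_boundaries can be rechecked against the six-digit form.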