Re: Unicode grapheme clusters

2023-01-24 Thread Bruce Momjian
On Tue, Jan 24, 2023 at 11:40:01AM -0500, Greg Stark wrote: > On Sat, 21 Jan 2023 at 13:17, Tom Lane wrote: > > > > Probably our long-term answer is to avoid depending on wcwidth > > and use wcswidth instead. But it's hard to get excited about > > doing the legwork for that until popular libc

Re: Unicode grapheme clusters

2023-01-24 Thread Isaac Morland
On Tue, 24 Jan 2023 at 11:40, Greg Stark wrote: > > At the end of the day Unicode kind of assumes a variable-width display > where the rendering is handled by something that has access to the > actual font metrics. So anything trying to line things up in columns > in a way that works with any

Re: Unicode grapheme clusters

2023-01-24 Thread Greg Stark
On Sat, 21 Jan 2023 at 13:17, Tom Lane wrote: > > Probably our long-term answer is to avoid depending on wcwidth > and use wcswidth instead. But it's hard to get excited about > doing the legwork for that until popular libc implementations > get it right. Here's an interesting blog post about

Re: Unicode grapheme clusters

2023-01-21 Thread Bruce Momjian
On Sat, Jan 21, 2023 at 01:17:27PM -0500, Tom Lane wrote: > Bruce Momjian writes: > > I just checked if wcswidth() would honor grapheme clusters, though > > wcwidth() does not, but it seems wcswidth() treats characters just like > > wcwidth(): > > Well, that's at least potentially fixable within

Re: Unicode grapheme clusters

2023-01-21 Thread Tom Lane
Bruce Momjian writes: > I just checked if wcswidth() would honor grapheme clusters, though > wcwidth() does not, but it seems wcswidth() treats characters just like > wcwidth(): Well, that's at least potentially fixable within libc, while wcwidth clearly can never do this right. Probably our
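
The point about wcwidth() seeing only one code point at a time can be illustrated with a rough sketch. This is not libc's wcwidth(); the helper names naive_cell_width and naive_string_width are hypothetical, and the width rule (0 for combining marks and format characters, 2 for East Asian wide or fullwidth, 1 otherwise) is an assumed approximation built on Python's unicodedata tables. The five-code-point emoji sequence is assumed to be the one discussed in this thread.

```python
import unicodedata


def naive_cell_width(ch: str) -> int:
    """Approximate the display width of ONE code point (hypothetical helper)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters
    return 1


def naive_string_width(s: str) -> int:
    """Sum per-code-point widths with no knowledge of grapheme clusters."""
    return sum(naive_cell_width(ch) for ch in s)


# woman + skin-tone modifier + zero width joiner + staff + variation selector
cluster = "\U0001F469\U0001F3FC\u200D\u2695\uFE0F"
per_code_point = [naive_cell_width(ch) for ch in cluster]
total = naive_string_width(cluster)
```

Because each code point is measured in isolation, the summed width is larger than the single glyph a cluster-aware terminal would draw. A function that receives the whole string at least has the information needed to treat the sequence as one unit.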

Re: Unicode grapheme clusters

2023-01-21 Thread Bruce Momjian
On Sat, Jan 21, 2023 at 12:37:30PM -0500, Bruce Momjian wrote: > Well, as one of the URLs I quoted said: > > This is by design. wcwidth() is utterly broken. Any terminal or > terminal application that uses it is also utterly broken. Forget > about emoji wcwidth() doesn't even

Re: Unicode grapheme clusters

2023-01-21 Thread Bruce Momjian
On Sat, Jan 21, 2023 at 11:20:39AM -0500, Greg Stark wrote: > On Fri, 20 Jan 2023 at 00:07, Pavel Stehule wrote: > > > > I partially watch the progress in VTE - one of the widely used terminal libs, > > and I am very sceptical that there will be support in the next two years. > > > > Maybe the new

Re: Unicode grapheme clusters

2023-01-21 Thread Tom Lane
Greg Stark writes: > (If we were really crazy about this we could use terminal escape codes > to query the current cursor position after emitting multicharacter > graphemes. But as I said, I don't even think that would be useful, > even if there weren't other reasons it would be a bad idea)

Re: Unicode grapheme clusters

2023-01-21 Thread Pavel Stehule
On Sat, 21 Jan 2023 at 17:21, Greg Stark wrote: > On Fri, 20 Jan 2023 at 00:07, Pavel Stehule > wrote: > > > > I partially watch the progress in VTE - one of the widely used terminal > libs, and I am very sceptical that there will be support in the next two > years. > > > > Maybe the new

Re: Unicode grapheme clusters

2023-01-21 Thread Greg Stark
On Fri, 20 Jan 2023 at 00:07, Pavel Stehule wrote: > > I partially watch the progress in VTE - one of the widely used terminal libs, > and I am very sceptical that there will be support in the next two years. > > Maybe the new microsoft terminal will give this area a new dynamic, but > currently

Re: Unicode grapheme clusters

2023-01-19 Thread Pavel Stehule
lution. > > We have a few options: > > * TODO item > * document psql works that way > * do nothing > > I think the big question is how common such cases will be in the future. > The report from 2022, and one from 2019 didn't seem to clearly outline > the issue s

Re: Unicode grapheme clusters

2023-01-19 Thread Bruce Momjian
On Thu, Jan 19, 2023 at 07:53:43PM -0500, Tom Lane wrote: > Bruce Momjian writes: > > I am not sure what you are referring to above? character_length? I was > > talking about display length, and psql uses that --- at some point, our > > lack of support for graphemes will cause psql to not align

Re: Unicode grapheme clusters

2023-01-19 Thread Tom Lane
Bruce Momjian writes: > I am not sure what you are referring to above? character_length? I was > talking about display length, and psql uses that --- at some point, our > lack of support for graphemes will cause psql to not align columns. That's going to happen regardless, as long as we can't

Re: Unicode grapheme clusters

2023-01-19 Thread Bruce Momjian
On Thu, Jan 19, 2023 at 07:37:48PM -0500, Greg Stark wrote: > This is how we've always documented it. Postgres treats code points as > "characters" not graphemes. > > You don't need to go to anything as esoteric as emojis to see this either. > Accented characters like é have canonical forms

Re: Unicode grapheme clusters

2023-01-19 Thread Greg Stark
This is how we've always documented it. Postgres treats code points as "characters", not graphemes. You don't need to go to anything as esoteric as emojis to see this either. Accented characters like é have canonical forms that are multiple code points and in some character sets some accented
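
The code point versus grapheme distinction for an accented character can be sketched with Python's standard unicodedata module. This is only an illustration of Unicode normalization, not of how Postgres itself counts characters; the variable names are made up for the example.

```python
import unicodedata

# Precomposed form: a single code point, U+00E9.
composed = unicodedata.normalize("NFC", "e\u0301")

# Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", "\u00e9")

# Both render as the same grapheme, but a code-point count differs.
composed_len = len(composed)      # number of code points in NFC
decomposed_len = len(decomposed)  # number of code points in NFD
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

A length function that counts code points reports different values for the two strings even though a reader sees one character in both cases.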

Re: Unicode grapheme clusters

2023-01-19 Thread Bruce Momjian
On Thu, Jan 19, 2023 at 02:44:57PM +0100, Pavel Stehule wrote: > Surely it should be fixed. Unfortunately - all the terminals that I can use > don't support it. So at this moment it may be premature to fix it, because the > visual form will still be broken. Yes, none of my terminal emulators

Re: Unicode grapheme clusters

2023-01-19 Thread Pavel Stehule
On Thu, 19 Jan 2023 at 1:20, Bruce Momjian wrote: > Just my luck, I had to dig into a two-"character" emoji that came to me > as part of a Google Calendar entry --- here it is: > > 👩🏼‍⚕️ > > libc > Unicode UTF8 len > U+1F469

Unicode grapheme clusters

2023-01-18 Thread Bruce Momjian
Just my luck, I had to dig into a two-"character" emoji that came to me as part of a Google Calendar entry --- here it is:

        👩🏼‍⚕️

                      libc
Unicode  UTF8         len
U+1F469  f0 9f 91 a9  2    woman
U+1F3FC  f0 9f 8f bc  2    emoji
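
The Unicode and UTF8 columns of a breakdown like the one above can be reproduced with a short sketch. The helper name describe is hypothetical, and only the first two code points (U+1F469, U+1F3FC) are confirmed by the table as quoted here; the trailing three (U+200D, U+2695, U+FE0F) are an assumption inferred from the characters that survive in the quoted emoji. The libc len column is not computed, since that depends on the platform's wcwidth().

```python
def describe(s: str) -> list[tuple[str, str]]:
    """Return (code point, UTF-8 bytes in hex) for each code point in s."""
    return [(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ")) for ch in s]


# Assumed full sequence: woman, skin-tone modifier, ZWJ, staff, VS16.
cluster = "\U0001F469\U0001F3FC\u200D\u2695\uFE0F"

rows = describe(cluster)
for code_point, utf8_bytes in rows:
    print(f"{code_point:<8} {utf8_bytes}")
```

The string holds five code points and 17 bytes of UTF-8, yet it is meant to display as a single emoji.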