Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars
On Sat, Oct 25, 2014 at 11:15 PM, dequis d...@dxzone.com.ar wrote: ... My patch: Just wcwidth(...) - abs(wcwidth(...)) In other words: if wcwidth returns -1, interpret that as a column width of 1. It's a bit dirty and lazy, but it works wonderfully for most characters. ... It's better than nothing, but I guess Xft could offer much better advice with XftTextExtents*.
Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for
On 26 October 2014 19:36, Dmitrij D. Czarkoff czark...@gmail.com wrote: Apparently no workaround is needed - just use sane libc or avoid using new codepoints until glibc fixes that. Or, if you won't feel dirty after that, send a patch to glibc instead. 'Just using musl' doesn't seem feasible given a glibc-based system - it's easier for smaller programs, but in the case of st i'd have to rebuild xlib, xcb, xft, freetype, fontconfig, etc (and their dependencies) against musl. Not to mention that musl doesn't implement unicode 7.0 either, only up to 6.2. I tend to think that it is better fix the wrong software than add workarounds in the correct software. I don't have problems now, and I think is the same for almost all the users of st. Maybe you can add this patch to the wiki, because it is sure it will be useful for more people. If you really want to fix the problem you should send a patch to the glibc mailing list. Regards,
Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for
On Mon, Oct 27, 2014 at 08:21:52AM +0100, k...@shike2.com wrote: On 26 October 2014 19:36, Dmitrij D. Czarkoff czark...@gmail.com wrote: Apparently no workaround is needed - just use sane libc or avoid using new codepoints until glibc fixes that. Or, if you won't feel dirty after that, send a patch to glibc instead. 'Just using musl' doesn't seem feasible given a glibc-based system - it's easier for smaller programs, but in the case of st i'd have to rebuild xlib, xcb, xft, freetype, fontconfig, etc (and their dependencies) against musl. Not to mention that musl doesn't implement unicode 7.0 either, only up to 6.2. I tend to think that it is better fix the wrong software than add workarounds in the correct software. I don't have problems now, and I think is the same for almost all the users of st. Maybe you can add this patch to the wiki, because it is sure it will be useful for more people. If you really want to fix the problem you should send a patch to the glibc mailing list. POSIX states for int wcwidth(wchar_t wc): ... or return -1 (if wc does not correspond to a printable wide-character code). Therefore I think a return value of -1 should be handled gracefully. Having said that in dvtm it seems completely broken at the moment, no char shows up at all. Patches for that are certainly welcome ... -- Marc André Tanner http://www.brain-dump.org/ GPG key: CF7D56C0
Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for
On Mon, 27 Oct 2014 13:03:49 +0100 Marc André Tanner m...@brain-dump.org wrote: POSIX states for int wcwidth(wchar_t wc): ... or return -1 (if wc does not correspond to a printable wide-character code). Therefore I think a return value of -1 should be handled gracefully. I have to agree here, even though I do not favor changing the code for broken libraries, I have to quote wcwidth(3): DESCRIPTION The wcwidth() function returns the number of columns needed to represent the wide character c. If c is a printable wide character, the value is at least 0. If c is null wide character (L'\0'), the value is 0. Otherwise -1 is returned. That's how POSIX-2001 defines it. So yeah, using abs() to catch the invalid case is fine, but could be refined even more. Cheers FRIGN -- FRIGN d...@frign.de
Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for
That's how POSIX-2001 defines it. So yeah, using abs() to catch the invalid case is fine, but could be refined even more. Ok, it is true. We should catch this case, but I don't like the idea of the abs. I think the correct solution should be print the invalid character. Regards,
Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for
FRIGN said: DESCRIPTION The wcwidth() function returns the number of columns needed to represent the wide character c. If c is a printable wide character, the value is at least 0. If c is null wide character (L'\0'), the value is 0. Otherwise -1 is returned. That's how POSIX-2001 defines it. So yeah, using abs() to catch the invalid case is fine, but could be refined even more. I don't follow your logic here. Glibc currently yelds incorrect result for a set of valid Unicode 7.0 codepoints that were invalid previously. This time they are 1 column width (I didn't verify that, so don't take my word for it), but in future wide characters may be added as well, and wcwidth() will return -1 for them as well, so the abs() hack won't work. IMO ideally st should either not print characters whose codepoints are unknown to libc, or should silently replace them with U+FFFD. At any rate, assumptions about codepoints' properties are not st's business. -- Dmitrij D. Czarkoff
Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars
On Sun, Oct 26, 2014 at 3:15 AM, dequis d...@dxzone.com.ar wrote: My patch: Just wcwidth(...) - abs(wcwidth(...)) In other words: if wcwidth returns -1, interpret that as a column width of 1. It's a bit dirty and lazy, but it works wonderfully for most characters. I'm not sure what the correct solution would be, but it's definitely not something as simple as this - would mean fixing the libc to support unicode up to 7.0, or implementing our own version of it. Opinions? I think your fall-back workaround makes alot of sense until glibc mans up and implements it (the correct solution). I have tested it locally with glibc and musl libc, interestingly musl knows the correct character width. Kind regards, Hiltjo
Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars
On 26 October 2014 19:36, Dmitrij D. Czarkoff czark...@gmail.com wrote: Apparently no workaround is needed - just use sane libc or avoid using new codepoints until glibc fixes that. Or, if you won't feel dirty after that, send a patch to glibc instead. 'Just using musl' doesn't seem feasible given a glibc-based system - it's easier for smaller programs, but in the case of st i'd have to rebuild xlib, xcb, xft, freetype, fontconfig, etc (and their dependencies) against musl. Not to mention that musl doesn't implement unicode 7.0 either, only up to 6.2. Avoiding using new codepoints is not really an option either, they just appear in messages written by other people and result in unreadable/invisible lines. They offset columns by -1, so with enough characters, they can make a whole line appear as empty. My workaround until now was to open a different terminal emulator just to read the 'missing' lines. I'd say that knowingly allowing that kind of rendering glitch is a bad idea.
[dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars
Hi suckless! First, thanks for st. Been using it for a long while, still impressed at how it gets a lot of stuff right - stuff that urxvt failed miserably at. There's only one issue that has been bothering me particularly. The issue itself: Unicode characters added since unicode 5.2 (released in 2009, the latest revision[1] is 7.0) are not supported by the wcwidth() implementation of glibc, and as a result, they behave weirdly in st. The man page of wcwidth() specifies that -1 is returned for invalid unicode characters. I found a stack overflow question[2] about this same issue. How st handles it: I made a gif[3] showing its behavior. It just offsets the columns by the value returned by wcwidth, expecting either 1 or 2, not -1. So each unsupported unicode character behaves like a printable backspace. Picked U+0524[4] for the tests. The st on the top shows the current behavior, the st on the bottom is my patched version. The first two lines typed in the gif are spaces followed by that character. Third line is the letter 'a' just to show how it overlaps. Then I used a tmux keybinding that is supposed to scan for URLs, but the main effect here is refreshing the terminal contents, which makes those characters vanish. That z^H is a typo, ignore that. My patch: Just wcwidth(...) - abs(wcwidth(...)) In other words: if wcwidth returns -1, interpret that as a column width of 1. It's a bit dirty and lazy, but it works wonderfully for most characters. I'm not sure what the correct solution would be, but it's definitely not something as simple as this - would mean fixing the libc to support unicode up to 7.0, or implementing our own version of it. Opinions? [1]: http://www.fileformat.info/info/unicode/version/index.htm [2]: http://stackoverflow.com/questions/16371418/why-does-wcwidth-return-1-with-a-sign-that-i-can-print-on-the-terminal [3]: http://i.imgur.com/MDzMJJH.gif [4]: http://www.fileformat.info/info/unicode/char/0524/index.htm From 810937df2c8a693deb26ac278c73f0147353079b Mon Sep 17 00:00:00 2001 From: dequis d...@dxzone.com.ar Date: Sat, 25 Oct 2014 23:07:38 -0300 Subject: [PATCH] Fix issues with wcwidth() returning -1 for unsupported unicode chars Unicode characters added since unicode 5.2 are not supported by the wcwidth() implementation of glibc, and as a result, they behave weirdly in st. The man page of wcwidth() specifies that -1 is returned for invalid unicode characters. This patch wraps the wcwidth() calls with abs() to ensure that a column size of 1 is used for unknown unicode characters. --- st.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/st.c b/st.c index 23dd7f1..12af1ab 100644 --- a/st.c +++ b/st.c @@ -2576,7 +2576,7 @@ tputc(char *c, int len) { unicodep = ascii = *c; } else { utf8decode(c, unicodep, UTF_SIZ); - width = wcwidth(unicodep); + width = abs(wcwidth(unicodep)); control = ISCONTROLC1(unicodep); ascii = unicodep; } @@ -3440,7 +3440,7 @@ xdraws(char *s, Glyph base, int x, int y, int charlen, int bytelen) { xp, winy + frc[i].font-ascent, (FcChar8 *)u8c, u8cblen); - xp += xw.cw * wcwidth(unicodep); + xp += xw.cw * abs(wcwidth(unicodep)); } /* -- 2.1.2