Hi suckless! First, thanks for st. Been using it for a long while, still impressed at how it gets a lot of stuff right - stuff that urxvt failed miserably at. There's only one issue that has been bothering me particularly.
The issue itself: Unicode characters added since Unicode 5.2 (released in 2009; the latest revision[1] is 7.0) are not supported by the wcwidth() implementation of glibc, and as a result, they behave weirdly in st. The man page of wcwidth() specifies that -1 is returned for invalid Unicode characters. I found a Stack Overflow question[2] about this same issue.

How st handles it: I made a gif[3] showing its behavior. It just offsets the columns by the value returned by wcwidth(), expecting either 1 or 2, not -1. So each unsupported Unicode character behaves like a printable backspace.

I picked U+0524[4] for the tests. The st on the top shows the current behavior; the st on the bottom is my patched version. The first two lines typed in the gif are spaces followed by that character. The third line is the letter 'a', just to show how it overlaps. Then I used a tmux keybinding that is supposed to scan for URLs, but the main effect here is refreshing the terminal contents, which makes those characters vanish. That z^H is a typo, ignore that.

My patch: Just wcwidth(...) -> abs(wcwidth(...))

In other words: if wcwidth() returns -1, interpret that as a column width of 1. It's a bit dirty and lazy, but it works wonderfully for most characters. I'm not sure what the "correct" solution would be, but it's definitely not something as simple as this - it would mean fixing the libc to support Unicode up to 7.0, or implementing our own version of it.

Opinions?

[1]: http://www.fileformat.info/info/unicode/version/index.htm
[2]: http://stackoverflow.com/questions/16371418/why-does-wcwidth-return-1-with-a-sign-that-i-can-print-on-the-terminal
[3]: http://i.imgur.com/MDzMJJH.gif
[4]: http://www.fileformat.info/info/unicode/char/0524/index.htm
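To make the mapping explicit, here is a minimal sketch of what wrapping the call in abs() does to the value. The helper name colwidth() is hypothetical and is not part of st; the patch itself just calls abs() inline. It takes the raw wcwidth() result as a plain int so the snippet does not depend on the locale or on which Unicode version the libc knows about.

```c
#include <stdlib.h> /* abs() */

/*
 * Hypothetical helper (not in st): turn a raw wcwidth() result into
 * the number of columns to advance, the same way the patch does.
 *   -1 (character unknown to the libc) -> 1
 *    0, 1, 2                           -> unchanged
 */
static int
colwidth(int raw)
{
	return abs(raw);
}

/*
 * Assumed use at the two call sites the patch touches:
 *     width = colwidth(wcwidth(unicodep));
 *     xp += xw.cw * colwidth(wcwidth(unicodep));
 */
```

Only the -1 case changes; non-negative widths pass through as they are, so characters the libc already knows about keep their current behavior.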
From 810937df2c8a693deb26ac278c73f0147353079b Mon Sep 17 00:00:00 2001
From: dequis <d...@dxzone.com.ar>
Date: Sat, 25 Oct 2014 23:07:38 -0300
Subject: [PATCH] Fix issues with wcwidth() returning -1 for unsupported unicode chars

Unicode characters added since unicode 5.2 are not supported by the
wcwidth() implementation of glibc, and as a result, they behave weirdly
in st. The man page of wcwidth() specifies that -1 is returned for
invalid unicode characters.

This patch wraps the wcwidth() calls with abs() to ensure that a column
size of 1 is used for unknown unicode characters.
---
 st.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/st.c b/st.c
index 23dd7f1..12af1ab 100644
--- a/st.c
+++ b/st.c
@@ -2576,7 +2576,7 @@ tputc(char *c, int len) {
 		unicodep = ascii = *c;
 	} else {
 		utf8decode(c, &unicodep, UTF_SIZ);
-		width = wcwidth(unicodep);
+		width = abs(wcwidth(unicodep));
 		control = ISCONTROLC1(unicodep);
 		ascii = unicodep;
 	}
@@ -3440,7 +3440,7 @@ xdraws(char *s, Glyph base, int x, int y, int charlen, int bytelen) {
 				xp, winy + frc[i].font->ascent,
 				(FcChar8 *)u8c,
 				u8cblen);
-		xp += xw.cw * wcwidth(unicodep);
+		xp += xw.cw * abs(wcwidth(unicodep));
 	}
 
 	/*
-- 
2.1.2