[dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars

dequis Sat, 25 Oct 2014 19:16:54 -0700

Hi suckless! First, thanks for st. Been using it for a long while,
still impressed at how it gets a lot of stuff right - stuff that urxvt
failed miserably at. There's only one issue that has been bothering me
particularly.



The issue itself:

Unicode characters added since Unicode 5.2 (released in 2009, the
latest revision[1] is 7.0) are not supported by the wcwidth()
implementation of glibc, and as a result, they behave weirdly in st.
The man page of wcwidth() specifies that -1 is returned for wide
characters that are not printable, and that is the value glibc gives
for these characters. I found a Stack Overflow question[2] about this
same issue.


How st handles it:

I made a gif[3] showing its behavior.

It just offsets the columns by the value returned by wcwidth,
expecting either 1 or 2, not -1. So each unsupported unicode character
behaves like a printable backspace.
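
To make the "printable backspace" effect concrete, here is a minimal
sketch of the column arithmetic. It is not code from st.c: the helper
names advance_naive() and final_column() are made up for illustration,
and the width values are passed in by hand instead of coming from
wcwidth(), so the result does not depend on the libc in use.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from st.c: advance a cursor column by
 * the width reported for one character. `width` stands in for the
 * return value of wcwidth().
 */
static int
advance_naive(int col, int width) {
	return col + width;
}

/*
 * Hypothetical helper: feed a sequence of reported widths through
 * advance_naive(), starting at column 0, and return the final column.
 */
static int
final_column(const int *widths, int n) {
	int col = 0, i;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		col = advance_naive(col, widths[i]);
	return col;
}
```

With two spaces followed by one unsupported character, the reported
widths are 1, 1 and -1, so the cursor ends up on column 1 instead of
column 3 and the next character typed lands on top of earlier output.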

Picked U+0524[4] for the tests. The st on the top shows the current
behavior, the st on the bottom is my patched version. The first two
lines typed in the gif are spaces followed by that character. Third
line is the letter 'a' just to show how it overlaps.

Then I used a tmux keybinding that is supposed to scan for URLs, but
the main effect here is refreshing the terminal contents, which makes
those characters vanish. That z^H is a typo, ignore that.


My patch:

Just wcwidth(...) -> abs(wcwidth(...))

In other words: if wcwidth returns -1, interpret that as a column
width of 1. It's a bit dirty and lazy, but it works wonderfully for
most characters.
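
As a rough sketch of what the abs() wrapping does to each possible
return value (patched_width() is a hypothetical name used only here,
the patch itself calls abs() inline):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper, not part of the patch: maps a wcwidth()-style
 * return value to the column count used after the change.
 * abs() turns -1 into 1 and leaves 0, 1 and 2 unchanged.
 */
static int
patched_width(int reported) {
	/* wcwidth() reports either -1 or a non-negative width */
	assert(reported >= -1);
	return abs(reported);
}
```

So only the -1 case changes. A character that really is two columns
wide but unknown to the libc would still be treated as one column,
which is the "most characters" caveat above.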

I'm not sure what the "correct" solution would be, but it's definitely
not something as simple as this - it would mean fixing libc to support
Unicode up to 7.0, or implementing our own version of wcwidth().

Opinions?


[1]: http://www.fileformat.info/info/unicode/version/index.htm
[2]: http://stackoverflow.com/questions/16371418/why-does-wcwidth-return-1-with-a-sign-that-i-can-print-on-the-terminal
[3]: http://i.imgur.com/MDzMJJH.gif
[4]: http://www.fileformat.info/info/unicode/char/0524/index.htm

From 810937df2c8a693deb26ac278c73f0147353079b Mon Sep 17 00:00:00 2001
From: dequis <d...@dxzone.com.ar>
Date: Sat, 25 Oct 2014 23:07:38 -0300
Subject: [PATCH] Fix issues with wcwidth() returning -1 for unsupported
 unicode chars

Unicode characters added since unicode 5.2 are not supported by the
wcwidth() implementation of glibc, and as a result, they behave weirdly
in st. The man page of wcwidth() specifies that -1 is returned for
invalid unicode characters.

This patch wraps the wcwidth() calls with abs() to ensure that a column
size of 1 is used for unknown unicode characters.
---
 st.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/st.c b/st.c
index 23dd7f1..12af1ab 100644
--- a/st.c
+++ b/st.c
@@ -2576,7 +2576,7 @@ tputc(char *c, int len) {
 		unicodep = ascii = *c;
 	} else {
 		utf8decode(c, &unicodep, UTF_SIZ);
-		width = wcwidth(unicodep);
+		width = abs(wcwidth(unicodep));
 		control = ISCONTROLC1(unicodep);
 		ascii = unicodep;
 	}
@@ -3440,7 +3440,7 @@ xdraws(char *s, Glyph base, int x, int y, int charlen, int bytelen) {
 				xp, winy + frc[i].font->ascent,
 				(FcChar8 *)u8c, u8cblen);
 
-		xp += xw.cw * wcwidth(unicodep);
+		xp += xw.cw * abs(wcwidth(unicodep));
 	}
 
 	/*
-- 
2.1.2
