Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars

2014-10-28 Thread Ismael Luceno
On Sat, Oct 25, 2014 at 11:15 PM, dequis d...@dxzone.com.ar wrote:
...
 My patch:

 Just wcwidth(...) - abs(wcwidth(...))

 In other words: if wcwidth returns -1, interpret that as a column
 width of 1. It's a bit dirty and lazy, but it works wonderfully for
 most characters.
...

It's better than nothing, but I guess Xft could offer much better
advice with XftTextExtents*.



Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for

2014-10-27 Thread k0ga
 On 26 October 2014 19:36, Dmitrij D. Czarkoff czark...@gmail.com wrote:
 Apparently no workaround is needed - just use sane libc or avoid using
 new codepoints until glibc fixes that.  Or, if you won't feel dirty
 after that, send a patch to glibc instead.
 
 'Just using musl' doesn't seem feasible given a glibc-based system -
 it's easier for smaller programs, but in the case of st i'd have to
 rebuild xlib, xcb, xft, freetype, fontconfig, etc (and their
 dependencies) against musl. Not to mention that musl doesn't implement
 unicode 7.0 either, only up to 6.2.

I tend to think that it is better fix the wrong software than add
workarounds in the correct software. I don't have problems now,
and I think is the same for almost all the users of st. Maybe you
can add this patch to the wiki, because it is sure it will be useful
for more people.

If you really want to fix the problem you should send a patch
to the glibc mailing list.

Regards,




Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for

2014-10-27 Thread Marc André Tanner
On Mon, Oct 27, 2014 at 08:21:52AM +0100, k...@shike2.com wrote:
  On 26 October 2014 19:36, Dmitrij D. Czarkoff czark...@gmail.com wrote:
  Apparently no workaround is needed - just use sane libc or avoid using
  new codepoints until glibc fixes that.  Or, if you won't feel dirty
  after that, send a patch to glibc instead.
  
  'Just using musl' doesn't seem feasible given a glibc-based system -
  it's easier for smaller programs, but in the case of st i'd have to
  rebuild xlib, xcb, xft, freetype, fontconfig, etc (and their
  dependencies) against musl. Not to mention that musl doesn't implement
  unicode 7.0 either, only up to 6.2.
 
 I tend to think that it is better fix the wrong software than add
 workarounds in the correct software. I don't have problems now,
 and I think is the same for almost all the users of st. Maybe you
 can add this patch to the wiki, because it is sure it will be useful
 for more people.
 
 If you really want to fix the problem you should send a patch
 to the glibc mailing list.

POSIX states for int wcwidth(wchar_t wc): ... or return -1 (if wc 
does not correspond to a printable wide-character code). Therefore
I think a return value of -1 should be handled gracefully.

Having said that in dvtm it seems completely broken at the moment, 
no char shows up at all. Patches for that are certainly welcome ...

-- 
 Marc André Tanner  http://www.brain-dump.org/  GPG key: CF7D56C0



Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for

2014-10-27 Thread FRIGN
On Mon, 27 Oct 2014 13:03:49 +0100
Marc André Tanner m...@brain-dump.org wrote:

 POSIX states for int wcwidth(wchar_t wc): ... or return -1 (if wc 
 does not correspond to a printable wide-character code). Therefore
 I think a return value of -1 should be handled gracefully.

I have to agree here, even though I do not favor changing the code for
broken libraries, I have to quote wcwidth(3):

 DESCRIPTION
   The  wcwidth() function returns the number of columns needed to
   represent the wide character c.  If c is a printable wide 
   character, the value is at least 0.  If c
   is null wide character (L'\0'), the value is 0.  Otherwise -1 is
   returned.

That's how POSIX-2001 defines it.

So yeah, using abs() to catch the invalid case is fine, but could be
refined even more.

Cheers

FRIGN

-- 
FRIGN d...@frign.de



Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for

2014-10-27 Thread k0ga

 That's how POSIX-2001 defines it.
 
 So yeah, using abs() to catch the invalid case is fine, but could be
 refined even more.
 

Ok, it is true. We should catch this case, but I don't like the idea
of the abs. I think the correct solution should be print the invalid
character.

Regards,




Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for

2014-10-27 Thread Dmitrij D. Czarkoff
FRIGN said:
  DESCRIPTION
The  wcwidth() function returns the number of columns needed to
represent the wide character c.  If c is a printable wide 
character, the value is at least 0.  If c
is null wide character (L'\0'), the value is 0.  Otherwise -1 is
returned.
 
 That's how POSIX-2001 defines it.
 
 So yeah, using abs() to catch the invalid case is fine, but could be
 refined even more.

I don't follow your logic here.  Glibc currently yelds incorrect result
for a set of valid Unicode 7.0 codepoints that were invalid previously.
This time they are 1 column width (I didn't verify that, so don't take
my word for it), but in future wide characters may be added as well, and
wcwidth() will return -1 for them as well, so the abs() hack won't
work.

IMO ideally st should either not print characters whose codepoints are
unknown to libc, or should silently replace them with U+FFFD.  At any
rate, assumptions about codepoints' properties are not st's business.

-- 
Dmitrij D. Czarkoff



Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars

2014-10-26 Thread Hiltjo Posthuma
On Sun, Oct 26, 2014 at 3:15 AM, dequis d...@dxzone.com.ar wrote:

 My patch:

 Just wcwidth(...) - abs(wcwidth(...))

 In other words: if wcwidth returns -1, interpret that as a column
 width of 1. It's a bit dirty and lazy, but it works wonderfully for
 most characters.

 I'm not sure what the correct solution would be, but it's definitely
 not something as simple as this - would mean fixing the libc to
 support unicode up to 7.0, or implementing our own version of it.

 Opinions?


I think your fall-back workaround makes alot of sense until glibc mans
up and implements it (the correct solution). I have tested it locally
with glibc and musl libc, interestingly musl knows the correct
character width.

Kind regards,
Hiltjo



Re: [dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars

2014-10-26 Thread dequis
On 26 October 2014 19:36, Dmitrij D. Czarkoff czark...@gmail.com wrote:
 Apparently no workaround is needed - just use sane libc or avoid using
 new codepoints until glibc fixes that.  Or, if you won't feel dirty
 after that, send a patch to glibc instead.

'Just using musl' doesn't seem feasible given a glibc-based system -
it's easier for smaller programs, but in the case of st i'd have to
rebuild xlib, xcb, xft, freetype, fontconfig, etc (and their
dependencies) against musl. Not to mention that musl doesn't implement
unicode 7.0 either, only up to 6.2.

Avoiding using new codepoints is not really an option either, they
just appear in messages written by other people and result in
unreadable/invisible lines.

They offset columns by -1, so with enough characters, they can make a
whole line appear as empty. My workaround until now was to open a
different terminal emulator just to read the 'missing' lines.

I'd say that knowingly allowing that kind of rendering glitch is a bad idea.



[dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars

2014-10-25 Thread dequis
Hi suckless! First, thanks for st. Been using it for a long while,
still impressed at how it gets a lot of stuff right - stuff that urxvt
failed miserably at. There's only one issue that has been bothering me
particularly.


The issue itself:

Unicode characters added since unicode 5.2 (released in 2009, the
latest revision[1] is 7.0) are not supported by the wcwidth()
implementation of glibc, and as a result, they behave weirdly in st.
The man page of wcwidth() specifies that -1 is returned for invalid
unicode characters. I found a stack overflow question[2] about this
same issue.


How st handles it:

I made a gif[3] showing its behavior.

It just offsets the columns by the value returned by wcwidth,
expecting either 1 or 2, not -1. So each unsupported unicode character
behaves like a printable backspace.

Picked U+0524[4] for the tests. The st on the top shows the current
behavior, the st on the bottom is my patched version. The first two
lines typed in the gif are spaces followed by that character. Third
line is the letter 'a' just to show how it overlaps.

Then I used a tmux keybinding that is supposed to scan for URLs, but
the main effect here is refreshing the terminal contents, which makes
those characters vanish. That z^H is a typo, ignore that.


My patch:

Just wcwidth(...) - abs(wcwidth(...))

In other words: if wcwidth returns -1, interpret that as a column
width of 1. It's a bit dirty and lazy, but it works wonderfully for
most characters.

I'm not sure what the correct solution would be, but it's definitely
not something as simple as this - would mean fixing the libc to
support unicode up to 7.0, or implementing our own version of it.

Opinions?


[1]: http://www.fileformat.info/info/unicode/version/index.htm
[2]: 
http://stackoverflow.com/questions/16371418/why-does-wcwidth-return-1-with-a-sign-that-i-can-print-on-the-terminal
[3]: http://i.imgur.com/MDzMJJH.gif
[4]: http://www.fileformat.info/info/unicode/char/0524/index.htm
From 810937df2c8a693deb26ac278c73f0147353079b Mon Sep 17 00:00:00 2001
From: dequis d...@dxzone.com.ar
Date: Sat, 25 Oct 2014 23:07:38 -0300
Subject: [PATCH] Fix issues with wcwidth() returning -1 for unsupported
 unicode chars

Unicode characters added since unicode 5.2 are not supported by the
wcwidth() implementation of glibc, and as a result, they behave weirdly
in st. The man page of wcwidth() specifies that -1 is returned for
invalid unicode characters.

This patch wraps the wcwidth() calls with abs() to ensure that a column
size of 1 is used for unknown unicode characters.
---
 st.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/st.c b/st.c
index 23dd7f1..12af1ab 100644
--- a/st.c
+++ b/st.c
@@ -2576,7 +2576,7 @@ tputc(char *c, int len) {
 		unicodep = ascii = *c;
 	} else {
 		utf8decode(c, unicodep, UTF_SIZ);
-		width = wcwidth(unicodep);
+		width = abs(wcwidth(unicodep));
 		control = ISCONTROLC1(unicodep);
 		ascii = unicodep;
 	}
@@ -3440,7 +3440,7 @@ xdraws(char *s, Glyph base, int x, int y, int charlen, int bytelen) {
 xp, winy + frc[i].font-ascent,
 (FcChar8 *)u8c, u8cblen);
 
-		xp += xw.cw * wcwidth(unicodep);
+		xp += xw.cw * abs(wcwidth(unicodep));
 	}
 
 	/*
-- 
2.1.2