Bug#466321: output of dselect broken with german UTF-8

Thomas Dickey Tue, 19 Feb 2008 17:06:18 -0800

On Tue, Feb 19, 2008 at 04:34:09PM +0100, Frank Lichtenheld wrote:
> On Mon, Feb 18, 2008 at 08:33:06PM -0500, Thomas Dickey wrote:
> > On Mon, Feb 18, 2008 at 03:30:31PM +0100, Frank Lichtenheld wrote:
> > > reassign 466321 libncurses5 5.6+20080203-1
> > > severity 466321 important
> > > thanks
> > > 
> > > Downgrading to libncurses5 5.6+20080119-1 fixes the problem.
> > 
> > Assign it back to dselect.  As noted in the dependencies, it's linked
> > with libncurses rather than libncursesw.
> > 
> > > On Sun, Feb 17, 2008 at 11:53:19PM +0100, Berthold Cogel wrote:
> > > > Output of dselect is broken with german UTF-8:
> > 
> > ...it's behaving as expected.
> 
> Any idea why it works with the older libncurses?


hmm - today I have more time...

Here are the related changes (to ncurses):

20080203
        + modify _nc_setupscreen() to set the legacy-coding value the same
          for both narrow/wide models.  It had been set only for wide model,
          but is needed to make unctrl() work with locale in the narrow model.
        + improve waddch() and winsch() handling of EILSEQ from mbrtowc() by
          using unctrl() to display illegal bytes rather than trying to append
          further bytes to make up a valid sequence (reported by Andrey A
          Chernov).
        + modify unctrl() to check codes in 128-255 range versus isprint().
          If they are not printable, and locale was set, use a "M-" or "~"
          sequence.

Looking at those codes with ncurses' test program: before the change I saw
nothing useful for 128-255, since they were nonprintable.  Now unctrl()
returns a 3-byte string.
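A minimal C model of the rule in that changelog entry may make this concrete.  It is a sketch under stated assumptions, not ncurses code: `sketch_unctrl()`, `sketch_render()` and the `locale_set` flag are hypothetical names, and the exact "M-"/"~" spellings only approximate ncurses' own tables.  It shows how a single nonprintable high byte becomes a 3-byte string such as "M-<" once the locale is set, while legacy (no-locale) mode passes the byte through unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the unctrl() rule described in the 20080203
 * changelog; NOT the ncurses implementation or its exact tables.
 * `locale_set` stands in for "setlocale() was called before initscr()".
 * Writes a NUL-terminated result (at most 3 chars) into out[4]. */
static void sketch_unctrl(unsigned char c, int locale_set, char out[4])
{
    if (c < 32 || c == 127) {            /* ASCII control: ^@ .. ^_, ^? */
        out[0] = '^';
        out[1] = (char)(c ^ 0x40);
        out[2] = '\0';
    } else if (c < 128 || !locale_set || isprint(c)) {
        out[0] = (char)c;                /* printable, or legacy mode: pass the byte through */
        out[1] = '\0';
    } else {
        unsigned char low = c & 0x7f;    /* strip the high ("meta") bit */
        if (low >= 32 && low < 127) {    /* e.g. 0xBC -> "M-<" */
            out[0] = 'M';
            out[1] = '-';
            out[2] = (char)low;
            out[3] = '\0';
        } else {                         /* e.g. 0x80 -> "~@", 0xFF -> "~?" */
            out[0] = '~';
            out[1] = (char)(low ^ 0x40);
            out[2] = '\0';
        }
    }
}

/* Render a byte string one byte at a time, as a narrow (8-bit) curses
 * library would.  `out` must hold at least 3 * strlen(s) + 1 bytes. */
static void sketch_render(const char *s, int locale_set, char *out)
{
    char piece[4];
    size_t n = 0;

    assert(s != NULL && out != NULL);
    for (; *s != '\0'; s++) {
        sketch_unctrl((unsigned char)*s, locale_set, piece);
        size_t len = strlen(piece);
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
}
```

In this model the UTF-8 encoding of "ü" (0xC3 0xBC) renders as "M-CM-<" with `locale_set` = 1, but passes through byte-for-byte with `locale_set` = 0.  The model relies on isprint() rejecting single bytes in 128-255, which is the case in the default "C" locale.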

The thought occurs to me that dselect is doing some type of
workaround for UTF-8 that's being thwarted by the change to unctrl().
(It has to be doing _something_; otherwise it would write interesting
garbage on the screen in a UTF-8 locale.)

To explain a little, some programs (such as lynx) may try to use UTF-8
with a non-UTF-8 curses library by pretending that the line size is
wide enough to store the "extra" bytes.  It doesn't repaint very well,
but sort of works.
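The arithmetic behind that trick can be sketched briefly: a narrow library counts one cell per byte, so a program writing UTF-8 through it has to budget the difference between bytes and characters.  The helpers below are hypothetical, and they assume well-formed UTF-8 with one column per character (no combining or double-width characters).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters by skipping continuation bytes (10xxxxxx).
 * Assumes well-formed UTF-8 and one display column per character. */
static size_t utf8_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Extra cells a byte-counting (narrow) library must be told the line
 * has, so that every byte of the UTF-8 text fits. */
static size_t extra_cells(const char *s)
{
    return strlen(s) - utf8_chars(s);
}
```

For "Größe" (7 bytes, 5 characters) `extra_cells()` returns 2.  Those are cells the library believes are occupied but the terminal never draws, which is plausibly why repainting suffers.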

However, if dselect converts those bytes to UTF-8 by itself, ncurses now
sees that they're really nonprintable and displays them as "M-" sequences.
That's because it sees the locale setting, and so applies isprint/iswprint
to each character.

ncurses is "supposed" to suppress the isprint check if the setlocale
function wasn't called (that case is for legacy code).  I see that dselect
does call setlocale before initscr, so ncurses is aware that the
locale has been set.  If it were the other way around (initscr first),
then dselect would get as much as it's able to fit into that legacy mode.

(And if _that_ didn't work, then I'd have to revisit my change -
though I do recall trying to ensure that legacy/non-locale mode worked).
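The ordering point can be illustrated with a small model in which the legacy decision is recorded once, at initialization.  This is not ncurses' actual code: `struct screen_model`, `init_screen_model()` and `locale_was_set()` are hypothetical, and detecting "locale was set" by comparing the LC_CTYPE name against "C"/"POSIX" is an assumption made for the sketch.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a curses SCREEN: records, at init time,
 * whether the program had already selected a locale. */
struct screen_model {
    int legacy;   /* 1: no locale seen, so skip the isprint() check */
};

/* Assumption: "locale was set" is detected by querying LC_CTYPE and
 * treating the default "C"/"POSIX" names as "not set". */
static int locale_was_set(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);

    return name != NULL && strcmp(name, "C") != 0 && strcmp(name, "POSIX") != 0;
}

/* Mirrors the ordering point: the decision is made once, at init,
 * so a setlocale() call made afterwards does not change it. */
static void init_screen_model(struct screen_model *sp)
{
    assert(sp != NULL);
    sp->legacy = !locale_was_set();
}
```

A C program starts in the "C" locale, so calling `init_screen_model()` before any `setlocale(LC_ALL, "")` yields legacy = 1, and a later setlocale() call does not change the recorded value.  Calling `setlocale(LC_ALL, "")` first in, say, a de_DE.UTF-8 environment would yield legacy = 0 under this model.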

If dselect were linked with ncursesw, it _should_ call setlocale first,
since ncursesw is designed to handle UTF-8.

-- 
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net
