Re: xterm(1) changing UTF-8 characters when copy-pasting?

Ingo Schwarze Wed, 29 Nov 2017 09:28:02 -0800

Hi Philippe,

Philippe Meunier wrote on Wed, Nov 29, 2017 at 11:35:59AM -0500:
> Ingo Schwarze wrote:
>> Philippe Meunier wrote:


>>> $ ls
>>> Thérèse

>> That's a bad idea.  Do not use non-ASCII bytes in file names.

> That's a nice thought but in practice I have some files on that machine
> with names written in French, Thai, Chinese, Korean, and Japanese, and for
> some of these files renaming is not an option for work reasons.  I somehow
> doubt that I'm the only one in such a situation.

Sure.  In some situations, there is no viable alternative to dealing
with file systems containing broken filenames.  That's why we try
to make tools like ls(1) as useful as possible in such a bad
situation.  But you can never expect a smooth user experience.
It is not an OpenBSD-specific problem; in fact, it's worse almost
everywhere else, although not everybody is likely to admit that.

>> It's certainly not ksh(1) because our ksh is not fully multibyte-
>> character aware on purpose, but deliberately has only limited
>> multibyte-character support.

> Actually, since you brought this up, I wish ksh had fuller multibyte
> character support.  As you say above the problem is mostly hidden and most
> of the time it happens to just work, but, for example, trying to delete
> double-wide Korean characters (well, syllables, really, which are *all*
> double-wide) messes up the command line:

That is indeed expected, and it is one of the things that are very
unlikely to change even in the long term.  Adding support for
correctly handling character display widths in shell command line
editing would require calling functions like mbtowc(3) and wcwidth(3)
on the fly in the command line editing modules.  Such changes would
be fairly intrusive and carry a substantial risk of introducing
nasty, perhaps even security-relevant bugs into the shell, so even
if somebody cooked up patches, i'm not convinced that they could
go in.
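
To give a rough idea of what that would involve, here is a minimal
sketch of computing the display width of a multibyte string with
mbtowc(3) and wcwidth(3).  It is not code from ksh(1); the function
name display_width() and its error convention are made up for
illustration, and a real line editor would have to do this
incrementally around the cursor rather than for a whole string.

```c
/* Ask for the X/Open interfaces; some libcs only declare wcwidth(3) then. */
#define _XOPEN_SOURCE 700

#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not taken from ksh(1): return the number of
 * terminal columns needed to display the multibyte string s in the
 * current LC_CTYPE locale, or -1 if s contains an invalid byte
 * sequence or a non-printable character.
 */
int
display_width(const char *s)
{
	size_t left = strlen(s);
	int total = 0;

	/* Reset the internal conversion state of mbtowc(3). */
	(void)mbtowc(NULL, NULL, 0);

	while (left > 0) {
		wchar_t wc;
		int len, w;

		/* Decode one character from at most "left" bytes. */
		len = mbtowc(&wc, s, left);
		if (len <= 0)
			return -1;	/* invalid or incomplete sequence */

		/* Look up how many columns that character occupies. */
		w = wcwidth(wc);
		if (w < 0)
			return -1;	/* non-printable character */

		total += w;
		s += len;
		left -= (size_t)len;
	}
	return total;
}
```

A caller would first select the user's locale with
setlocale(LC_CTYPE, ""); in a UTF-8 locale, wide characters such as
Hangul syllables are then typically reported as two columns each,
while in the default "C" locale only ASCII input is accepted.  Even
this small loop has two error paths, which hints at why sprinkling
such logic throughout the editing modules is risky.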

That said, i see that you are actually torturing our shell in these
respects quite a bit.  As long as you don't expect that everything
can be fixed, you are quite welcome to report issues that you see.
I don't doubt that there are still outright bugs, and it also seems
likely that there are missing features which can be implemented
without making a mess of the shell.  So reports based on real everyday
use are definitely helpful.  While several developers understand
the basics of how multibyte character support works in the shell
and in some of our other POSIX utilities, very few use it heavily,
as far as i know.

Yours,
  Ingo
