I saw several emails discussing the topic you raised in this message, and I
agree with you that a backspace should erase at least one character. (I
explain below why I say "at least".)

To my understanding, all commercial CJK terminals and terminal emulators
work that way: a backspace erases one character, and depending on which
character is being erased, that can be two screen columns or one. If I
remember correctly, some public domain terminal emulators such as hanterm
behave the same way.

(Here is the reason.) There is an exception to this general rule, though.
For instance, to my understanding some Thai and Korean terminals and
terminal emulators have two different erase modes that the user can select
(or they support only the so-called "display cell erase mode"):

- Display cell erase mode:
  A display cell is a group of characters (or character elements) that
  together form a composed character. For instance, in Thai, a display cell
  one screen column wide can contain several characters such as a
  tone/diacritical mark, an upper vowel, a consonant, and/or a lower vowel.
  (There are other examples of this, especially combining character
  sequences in Latin scripts.) In Korean, a Hangul syllable is a display
  cell two screen columns wide that can contain an initial consonant, a
  vowel, and an optional final consonant.
  
  In this erase mode, no matter how many characters a display cell
  contains, the whole display cell is erased, and so all of its characters
  are also removed from the line buffer. For instance, if we have A, a
  combining acute accent, and a combining breve in a single-column display
  cell (and in the line buffer), a backspace erases all three characters
  from both the screen and the line buffer.
  
- Character/character element erase mode:
  In this erase mode, only the last character in the current display cell
  is erased. Depending on the context, either the whole display cell
  disappears (if the erased character was the last one remaining in it) or
  only that character is removed while the other characters remain in the
  display cell. For instance, if we have A, a combining acute accent, and a
  combining breve in a single-column display cell (and in the line buffer),
  a backspace erases only the combining breve, and the display cell is
  redrawn as A with the combining acute accent.

Talking with various developers and customers in different regions, we found
that the "display cell erase mode" is a must and the "character/character
element erase mode" is nice to have (and occasionally a must in certain
regions of the world).

And, in arguing that we don't need wcwidth, you wrote:

] ........................................... How many single-width cells
] this corresponds to (1 or 2) depends on the character left of the active
] position before the backspace is processed.

Then how do you know "how many single-width cells this corresponds to (1 or
2)" for the character that is about to be erased, without a wcwidth-like
function in the kernel? There is no way to know whether a Unicode character
that is about to be erased occupies zero, one, or two screen columns unless
you have some kind of wcwidth.

I also think that a simple wcwidth-like function cannot cover every possible
case of context-sensitive shaping, but it will cover almost all cases,
including combining diacritical marks. The few remaining cases need some
script-specific knowledge of the context surrounding the character to be
erased, and thus additional processing, as with Thai's SaraAm character.

So I think it would be a good investment for us to have this feature in
the kernel if we are going to keep using terminal emulators and the various
shells.

With regards,

Ienup


] X-URL: http://www.cl.cam.ac.uk/~mgk25/
] Date: Fri, 26 Jan 2001 15:51:23 +0000
] From: Markus Kuhn <[EMAIL PROTECTED]>
] Subject: Re: kernel tty patches
] To: [EMAIL PROTECTED]
] MIME-version: 1.0
] 
] Ienup Sung wrote on 2001-01-25 19:32 UTC:
] > For your argument on A, well, most of commercial Unix variants I believe
] > at least support EUC by downloading width information of the current
] > locale/codeset which is usually about 6 bytes in size to ldterm kernel module
] > through ioctl(2) so that for canonical input mode (and shells that are relying
] > on the canonical input mode) can do erase/kill operation correctly.
] 
] I don't have a very strong opinion on the subject (and probably neither
] has ECMA-48), but I'd rather prefer if backspace moved the cursor one
] (non-combining) character width to the left. How many single-width cells
] this corresponds to (1 or 2) depends on the character left of the active
] position before the backspace is processed. I understand that this is
] not how CJK terminal emulators implement backspace at the moment, but I
] still think it is the cleaner approach, and it would keep the kernel
] (and many many similar trivial line editors) free of having to worry
] about wcwidth. Line/word erasure can similarly be implemented without
] wcwidth awareness if backspace moved over character positions, not cell
] widths.
] 
] Admittedly, the semantics get rather messy in the context of combining
] characters, which is why I'd really prefer to use only UCS Level 1 in
] such simple terminal applications (meaning, the keyboard driver should
] really not generate combining characters).
] 
] Markus
] 
] -- 
] Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
] Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
] 
] -
] Linux-UTF8:   i18n of Linux on all levels
] Archive:      http://mail.nl.linux.org/lists/