Francois Pinard wrote on 2000-07-14 01:23 UTC:
> This sends the wrong message to users that C locale should prefer "quote"
> over `quote', which is untrue if C locale is basing itself on ASCII.

Users of the C locale should not abuse the grave accent as a quotation
mark, because it looks silly in a large number of fonts.

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

Most implementations of the C locale are not based on ANSI X3.4-1968
(ASCII) today, but on ISO 646 IRV, ISO 8859, ISO 10646, etc.

The C standard definitely does *not* imply that the C locale is based on
ASCII or even a particular flavour of ASCII. In fact, the C standard
does not require the C locale to support, for instance, the characters
U+0024 ($), U+0040 (@), or U+0060 (`) at all! Let me quote ISO/IEC
9899:1999 (E) section 5.2.1, paragraph 3:

       [#3]  Both  the  basic  source and basic execution character
       sets shall have the  following  members:  the  26  uppercase
       letters of the Latin alphabet

               A  B  C  D  E  F  G  H  I  J  K  L  M
               N  O  P  Q  R  S  T  U  V  W  X  Y  Z

       the 26 lowercase letters of the Latin alphabet

               a  b  c  d  e  f  g  h  i  j  k  l  m
               n  o  p  q  r  s  t  u  v  w  x  y  z

       the 10 decimal digits

               0  1  2  3  4  5  6  7  8  9

       the following 29 graphic characters

               !  "  #  %  &  '  (  )  *  +  ,  -  .  /  :
               ;  <  =  >  ?  [  \  ]  ^  _  {  |  }  ~

       the  space  character,  and  control characters representing
       horizontal  tab,  vertical  tab,   and   form   feed.    The
       representation  of  each  member of the source and execution
       basic character sets shall fit  in  a  byte.   In  both  the
       source and execution basic character sets, the value of each
       character after 0 in the above list of decimal digits  shall
       be  one  greater  than the value of the previous.  In source
       files, there shall be some way of indicating the end of each
       line  of  text;  this  International Standard treats such an
       end-of-line indicator  as  if  it  were  a  single  new-line
       character.   In  the  basic  execution  character set, there
       shall be control characters representing  alert,  backspace,
       carriage  return, and new line.  If any other characters are
       encountered in a source file (except  in  an  identifier,  a
       character  constant,  a  string  literal,  a  header name, a
       comment, or a preprocessing token that is never converted to
       a token), the behavior is undefined.

That is all that C requires of the character set, and UTF-8 and CP1252
fit that requirement just as well as ASCII.
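
To make the list above concrete, here is a minimal sketch in C that
checks whether a string stays within the characters named in paragraph 3.
The names in_basic_charset and first_non_basic are my own invention for
illustration, not part of any library, and the null character (required
by a different paragraph) is treated here only as the string terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Letters, digits, the 29 graphic characters and the space character,
   as listed in C99 5.2.1 paragraph 3. */
static const char basic_graphic[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ ";

/* Control characters named in the same paragraph: horizontal tab,
   vertical tab, form feed, alert, backspace, carriage return, new line. */
static const char basic_control[] = "\t\v\f\a\b\r\n";

/* Hypothetical helper: true if c is one of the characters listed above.
   The null character is excluded because strchr would otherwise match
   the terminator of the lookup strings. */
bool in_basic_charset(char c)
{
    if (c == '\0')
        return false;
    return strchr(basic_graphic, c) != NULL
        || strchr(basic_control, c) != NULL;
}

/* Hypothetical helper: pointer to the first character of s that is not
   in the list above, or NULL if every character is. */
const char *first_non_basic(const char *s)
{
    for (; *s != '\0'; s++) {
        if (!in_basic_charset(*s))
            return s;
    }
    return NULL;
}
```

With this, first_non_basic("'quote' or \"quote\"") returns NULL, while
first_non_basic("`quote'") points at the leading grave accent, and
in_basic_charset rejects '$' and '@' as well.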

Again, please do not claim that the C locale is in any way tied to ASCII
or even a particular incarnation or font style of it, because obviously
it really isn't.

I very much advocate that once glibc 2.2 has been rolled out and widely
deployed, we start thinking about making UTF-8 the character encoding of
the C locale. Plan 9 has been doing that successfully for almost a decade
now, and in this way the fathers of C and Unix at Bell Labs have already
sent us a very clear message with regard to their preference in this
matter.

> We should not send message to users that under C locale, [quotation mark]
> directionality is better lost.  Reverting to the previous behaviour is
> the correct thing to do.

Sorry, I strongly disagree. The most portable way of using quotation
marks in C's portable basic execution set is 'quote' or "quote", but
certainly not `quote'.
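
As an illustration only, a program that wants to quote a name in a
message using nothing outside the basic execution character set could do
something like the following. The function name quote_portably is
hypothetical, and this is a sketch of the idea, not what any existing
tool does.

```c
#include <stdio.h>

/* Hypothetical helper: writes name into buf with U+0027 (apostrophe) on
   both sides, so no grave accent is needed. Returns what snprintf
   returns: the length the full result needs, excluding the terminator. */
int quote_portably(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "'%s'", name);
}
```

Calling quote_portably(buf, sizeof buf, "file.txt") on a large enough
buffer leaves 'file.txt' in buf.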

ANSI X3.4-1968 ASCII is horribly annoying, especially with proportional
fonts. The most annoying bit for me is the unification of the hyphen and
the minus. This might not have been a problem for monospaced typewriter
fonts, but I really can't stand seeing hyphen glyphs any more, which
are fatter and half the width of a plus sign in proportional fonts,
being used as minus signs (which should look just like the horizontal
bar of a plus) or en-dashes (e.g., as dashes in unnumbered lists). The
quotation mark problem is in my opinion trivial compared to the minus
mess.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
