Francois Pinard wrote on 2000-07-14 01:23 UTC:
> This sends the wrong message to users that C locale should prefer "quote"
> over `quote', which is untrue if C locale is basing itself on ASCII.

Users of the C locale should not abuse the grave accent as a quotation
mark, because it looks silly with a very significant number of fonts:

  http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

Most implementations of the C locale are not based on ANSI X3.4-1968
(ASCII) today, but on ISO 646 IRV, ISO 8859, ISO 10646, etc. The C
standard does definitely *not* imply that the C locale is based on
ASCII, or even on a particular flavour of ASCII. In fact, the C standard
does not require the C locale to support, for instance, the characters
U+0024 ($), U+0040 (@), or U+0060 (`) at all! Let me quote ISO/IEC
9899:1999 (E), section 5.2.1, paragraph 3:

  [#3] Both the basic source and basic execution character sets
  shall have the following members: the 26 uppercase letters of
  the Latin alphabet

      A B C D E F G H I J K L M
      N O P Q R S T U V W X Y Z

  the 26 lowercase letters of the Latin alphabet

      a b c d e f g h i j k l m
      n o p q r s t u v w x y z

  the 10 decimal digits

      0 1 2 3 4 5 6 7 8 9

  the following 29 graphic characters

      ! " # % & ' ( ) * + , - . / :
      ; < = > ? [ \ ] ^ _ { | } ~

  the space character, and control characters representing
  horizontal tab, vertical tab, and form feed. The representation
  of each member of the source and execution basic character sets
  shall fit in a byte. In both the source and execution basic
  character sets, the value of each character after 0 in the above
  list of decimal digits shall be one greater than the value of the
  previous. In source files, there shall be some way of indicating
  the end of each line of text; this International Standard treats
  such an end-of-line indicator as if it were a single new-line
  character. In the basic execution character set, there shall be
  control characters representing alert, backspace, carriage
  return, and new line.
  If any other characters are encountered in a source file
  (except in an identifier, a character constant, a string
  literal, a header name, a comment, or a preprocessing token that
  is never converted to a token), the behavior is undefined.

That is all that C requires of the character set, and UTF-8 or CP1252
fit that requirement just as well as ASCII. Again, please do not claim
that the C locale is in any way tied to ASCII, or even to a particular
incarnation or font style of it, because obviously it really isn't.

I very much advocate that, once glibc 2.2 has been rolled out and widely
deployed, we start thinking about making UTF-8 the character encoding of
the C locale. Plan 9 has been doing that successfully for almost a
decade now, and in this way the fathers of C and Unix at Bell Labs have
already sent us a very clear message about their preference in this
matter.

> We should not send message to users that under C locale, [quotation mark]
> directionality is better lost. Reverting to the previous behaviour is
> the correct thing to do.

Sorry, I strongly disagree. The most portable way of using quotation
marks in C's portable basic execution character set is 'quote' or
"quote", but certainly not `quote'.

ANSI X3.4-1968 ASCII is horribly annoying, especially with proportional
fonts. The most annoying bit for me is the unification of the hyphen and
the minus. This might not have been a problem for monospaced typewriter
fonts, but I really can't stand any more seeing hyphen glyphs, which in
proportional fonts are fatter than a plus sign and only half its width,
being used as minus signs (which should look just like the horizontal
bar of a plus) or as en dashes (e.g., as dashes in unnumbered lists).
The quotation mark problem is, in my opinion, trivial compared to the
minus mess.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
