Re: [bug-gnulib] quote characters in stds
    to educated people it recommends Unicode, without mentioning it
    explicitly.

True. I do not know how else to write it. (I'm also not sure rms will go for it at all.)

    That depends on your mailer. Is it a package in Emacs, or is it
    'pine' without Bernhard Kaindl's patches?

My personal configuration is not the point (it's vm inside emacs). My point is that it didn't come through correctly. I am sure I am not unique in this.

    Maybe you can reformulate the last two paragraphs in a way that is
    less incorrect?

Sorry, since I do not see what is incorrect about them, I do not know how to reformulate them. If you can suggest wording that makes you happier, please do.

    It's at the IANA:
    http://www.iana.org/assignments/character-sets

Thanks.

___
bug-gnulib mailing list
bug-gnulib@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnulib
Re: quote characters in stds
Hi Karl,

Here's a nit:

[EMAIL PROTECTED] (Karl Berry) wrote:
...
> The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
> and @code{quotearg} modules provide a reasonably straightforward way
> support locale-specific quote characters, as well as taking care of

s/support/to support/
Re: [bug-gnulib] quote characters in stds
Karl Berry wrote:
> Yes, but rms has explicitly rejected (in previous email with me) the
> idea of recommending the use of UTF-8 in any context whatsoever. Sigh.

Sigh. What you wrote there:

    If you need to use non-ASCII characters, for example to represent
    names of contributors, you should normally stick with one encoding,
    as one cannot in general mix encodings reliably.

is a Solomonic solution: to educated people it recommends Unicode, without mentioning it explicitly.

> My personal experience is that it is true that Unicode is still
> considerably less widely usable than Latin1. Sure, Unicode is available
> in many contexts and systems. But the names in your message, just for
> example, came through as garbage to me.

That depends on your mailer. Is it a package in Emacs, or is it 'pine' without Bernhard Kaindl's patches?

> No doubt I personally could
> eventually configure everything involved to display it properly, but the
> point is that it doesn't "just work".

True: there are some distributions where things don't "just work", but these non-Unicode-enabled corners are diminishing.

Maybe you can reformulate the last two paragraphs in a way that is less incorrect?

> PS: The right spelling of the encodings is "Latin1" (no dash, no space)
>
> I'm glad to know that, it's easier to type than @tie{} :). I had mostly
> seen it with a space. Do you happen to know where the definitive
> spelling is given?

It's at the IANA: http://www.iana.org/assignments/character-sets

Bruno
Re: gcc -Wall warning for minmax.h
Hi,

On Tue, Jun 07, 2005 at 10:19:47AM -0400, Derek Price wrote:
> >- we have to document also the fact that AS_TR_SH & AS_TR_CPP expand
> >  to literal variable (symbol) name, if their argument is a literal
>
> I didn't think this was important from the user's perspective.

In the patch I proposed, I used gl_CACHE_VAR as the second argument of AC_CACHE_CHECK. That argument cannot contain backticks, see the Appendix below. That explains why I relied on the feature, and why I wanted to document it.

But yes, it's better to fix AC_CACHE_CHECK to remove this limitation. After that fix, we could also remove the AS_LITERAL_IF with m4_fatal.

I'm happy that AS_LITERAL_IF will stay undocumented. m4_fatal and m4_warning should be documented, though.

Have a nice day,
    Stepan

Appendix:

If gl_CACHE_VAR expanded to

    `echo "xyz-xyz" | sed ...`

then you get something like:

    eval "test \"\${gl_CACHE_VAR+set}\" = set"
    eval "test \"\${`echo "xyz-xyz" | sed ...`+set}\" = set"

And this construct is not portable, see the first paragraph of node "Shell Substitutions".

This can be fixed: AS_VAR_TEST_SET could in this case expand to

    as_var=`echo "xyz-xyz" | sed ...`
    eval "test \"\${$as_var+set}\" = set"
Re: [bug-gnulib] quote characters in stds
    This is misleading.

I know, but I'm not sure what to say. Just delete the sentence about Latin1, maybe? I guess it's not really necessary.

    To represent them, you need Unicode, i.e. the UTF-8 encoding.

Yes, but rms has explicitly rejected (in previous email with me) the idea of recommending the use of UTF-8 in any context whatsoever. Sigh.

    This is not true for several years now.

Well, whether or not it is true, rms will not accept it, so there's no sense arguing it here.

My personal experience is that it is true that Unicode is still considerably less widely usable than Latin1. Sure, Unicode is available in many contexts and systems. But the names in your message, just for example, came through as garbage to me. No doubt I personally could eventually configure everything involved to display it properly, but the point is that it doesn't "just work". And I suspect I am using far newer versions of everything than an "average" user.

    PS: The right spelling of the encodings is "Latin1" (no dash, no space)

I'm glad to know that, it's easier to type than @tie{} :). I had mostly seen it with a space. Do you happen to know where the definitive spelling is given? I've poked around the ISO site without success.

Another draft below. I'm not quite sure why ` would ever be "unacceptable", and I'm a bit skeptical that it will pass muster with rms, but I'm trying to avoid an argument with standards-mavens. And gcc 4 already does '...'. Any improved wording and/or backup facts welcome :).

Thanks,
k

@node Quote characters
@section Quote characters

@cindex quote characters
In the C locale, GNU programs should stick to plain ASCII for quotation characters in messages to users: preferably 0x60 (`) for left quotes and 0x27 (') for right quotes. If using ` is unacceptable in your application, other possibilities are using ' for both opening and closing, or " (0x22) for both opening and closing. It is ok, but not required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and @code{quotearg} modules provide a reasonably straightforward way support locale-specific quote characters, as well as taking care of other issues, such as quoting a filename that itself contains a quote character. See the Gnulib documentation for usage details. In any case, the documentation for your program should clearly specify how it does quoting, if different than the preferred method of ` and '. This is especially important if the output of your program is ever likely to be parsed by another program. ASCII should also be preferred in source code comments, text documents, and other contexts, unless there is good reason to do something else because of the domain at hand. If you need to use non-ASCII characters, for example to represent names of contributors, you should normally stick with one encoding, as one cannot in general mix encodings reliably. Quotation characters are a difficult area in the computing world at this time: there are no true left or right quote characters in ASCII, or even Latin1 (the ` character we use is standardized as a grave accent). Latin1 does have paired standalone accents, but it seems wrong in principle to abuse them as quotes. And even Latin1 is not universally usable. Unicode contains the unambiguous quote characters required, and its common encoding UTF-8 is upward compatible with [EMAIL PROTECTED] But Unicode and UTF-8 are deployed even less widely than Latin1; it would be premature to require Unicode support for running essentially every GNU program. Perhaps the prevailing situation will change in a few years, and then we will revisit this. ___ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib
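As a rough illustration of the preferred C-locale convention in the draft above (0x60 to open, 0x27 to close), here is a minimal C sketch. The helper name `quote_ascii` is made up for this example and is not a Gnulib function; the real `quote` and `quotearg` modules also deal with locale-specific quotes and with names that themselves contain a quote character, which this sketch does not attempt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Gnulib): write NAME into OUT wrapped
   in the preferred ASCII quotes, 0x60 on the left and 0x27 on the right.
   Return false if OUT (of OUTSIZE bytes) is too small for the result.  */
bool
quote_ascii (const char *name, char *out, size_t outsize)
{
  int n = snprintf (out, outsize, "`%s'", name);
  return n >= 0 && (size_t) n < outsize;
}
```

A caller would typically use the result inside a diagnostic, for example by passing the buffer to `fprintf (stderr, "cannot open %s\n", buf)`.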
Re: [bug-gnulib] quote characters in stds
Karl Berry wrote: > @node Quote characters > @section Quote characters > @cindex quote characters > > In the C locale, GNU programs should stick to plain ASCII for > quotation characters in messages to users: either 0x60 (`) for left > quotes and 0x27 (') for right quotes, or ' for both opening and > closing, or " (0x22) for both opening and closing. It is ok, but not > required, to use locale-specific quotes in other locales. > > The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} > and @code{quotearg} modules provide a reasonably straightforward way > support locale-specific quote characters, as well as taking care of > other issues, such as quoting a filename that itself contains a quote > character. See the Gnulib documentation for usage details. > > ASCII should also be preferred in source code comments, text > documents, and other contexts, unless there is good reason to do > something else because of the domain at hand. Agreed. > If you need to use non-ASCII characters, for example to represent > names of contributors, you should normally stick with one encoding, as > one cannot in general mix encodings reliably. [EMAIL PROTECTED] is the > most widely usable encoding today, after plain [EMAIL PROTECTED] This is misleading. In a list of contributors, I often find names like Rafał Maszkowski, Primož Peterlin, Martin Mokrejš, and Владимир Слепнев (Vladimir Slepnev). To represent them, you need Unicode, i.e. the UTF-8 encoding. > Quotation characters are a difficult area in the computing world at > this time: there are no true left or right quote characters in ASCII, > or even [EMAIL PROTECTED] [EMAIL PROTECTED] does have paired standalone > accents, but it seems wrong in principle to abuse them as quotes. And > even [EMAIL PROTECTED] is not universally usable. > > Unicode contains the unambiguous quote characters required, and its > common encoding [EMAIL PROTECTED] is upward compatible with [EMAIL PROTECTED] Agreed. 
> But Unicode and UTF-8 are deployed less widely than [EMAIL PROTECTED]; it > would > be premature to require Unicode support for running essentially every > GNU program. This is not true for several years now. The major GUI toolkits, KDE/Qt and GNOME/Gtk, support Unicode for several years, and are now featuring good support not only of Western and CJK languages, but also of Bidi scripts and Indic languages. 'vi' is UTF-8 enabled since 2001. For more than one year, major Linux distributions like Fedora Core 3 put users into UTF-8 locales by default. See http://www.cl.cam.ac.uk/~mgk25/unicode.html for more info. Bruno PS: The right spelling of the encodings is "Latin1" (no dash, no space) and "UTF-8" (with a HYPHEN-MINUS in between). ___ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib
Re: quote characters in stds
Simon Josefsson wrote:
> Is it possible to discuss how that relate to the
> en@quot stuff recommended by gettext to solve the similar problem?

I wouldn't expand on this, at least not in the GNU standards: very few people use these en@quot or en@boldquot catalogs. In 5 years of existence, I got 1 single bug-report/support question about them. Therefore if you want to push this, I think gettext's ABOUT-NLS or some i18n web sites are more appropriate than the GNU standards.

Bruno
Re: quote characters in stds
Hi Simon,

Thanks for the note.

    Are they mutual exclusive, or should they be used in combination?
    Those things aren't clear to me.

They aren't clear to me either, but I *think* they can be used in combination. That is, if you use the gnulib quote module or equivalent, then you could set your locale to en@quot (somehow ...), and get the UTF-8 quotes, transliterated to ' or " in some cases. (I'm getting this from the po file, appended; I've never actually used en@quot.)

I think the appropriate place to discuss the details would be in the Gnulib and/or Gettext documentation, not the coding standards, although perhaps they should be mentioned. (Something like: "Independent of gnulib, you can use the en@quot catalog provided by gettext to achieve a similar result. See .")

I trust Bruno will fill us in.

karl

# English translations for GNU gettext-runtime package.
# Copyright (C) 2005 Free Software Foundation, Inc.
# This file is distributed under the same license as the GNU gettext-runtime package.
# Automatically generated, 2005.
#
# All this catalog "translates" are quotation characters.
# The msgids must be ASCII and therefore cannot contain real quotation
# characters, only substitutes like grave accent (0x60), apostrophe (0x27)
# and double quote (0x22). These substitutes look strange; see
# http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
#
# This catalog translates grave accent (0x60) and apostrophe (0x27) to
# left single quotation mark (U+2018) and right single quotation mark (U+2019).
# It also translates pairs of apostrophe (0x27) to
# left single quotation mark (U+2018) and right single quotation mark (U+2019)
# and pairs of quotation mark (0x22) to
# left double quotation mark (U+201C) and right double quotation mark (U+201D).
#
# When output to an UTF-8 terminal, the quotation characters appear perfectly.
# When output to an ISO-8859-1 terminal, the single quotation marks are
# transliterated to apostrophes (by iconv in glibc 2.2 or newer) or to
# grave/acute accent (by libiconv), and the double quotation marks are
# transliterated to 0x22.
# When output to an ASCII terminal, the single quotation marks are
# transliterated to apostrophes, and the double quotation marks are
# transliterated to 0x22.
#
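The mapping described in the catalog header above can be sketched in C roughly as follows. This is only an illustration of the rules (0x60 and 0x27 to U+2018/U+2019, paired 0x22 to U+201C/U+201D), not how gettext does it. The function name `curl_quotes` and the crude "apostrophe at the start of a word opens a quotation" heuristic are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LSQ "\xE2\x80\x98"   /* U+2018 LEFT SINGLE QUOTATION MARK, UTF-8 */
#define RSQ "\xE2\x80\x99"   /* U+2019 RIGHT SINGLE QUOTATION MARK, UTF-8 */
#define LDQ "\xE2\x80\x9C"   /* U+201C LEFT DOUBLE QUOTATION MARK, UTF-8 */
#define RDQ "\xE2\x80\x9D"   /* U+201D RIGHT DOUBLE QUOTATION MARK, UTF-8 */

/* Append N bytes of SRC at OUT[*POS], always leaving room for a NUL.  */
static bool
put (char *out, size_t outsize, size_t *pos, const char *src, size_t n)
{
  if (n >= outsize - *pos)
    return false;
  memcpy (out + *pos, src, n);
  *pos += n;
  return true;
}

/* Hypothetical sketch: copy IN to OUT, replacing ASCII quote substitutes
   by UTF-8 quotation marks.  Apostrophes inside words (don't) are left
   alone unless a single quotation is currently open.  */
bool
curl_quotes (const char *in, char *out, size_t outsize)
{
  bool single_open = false, double_open = false;
  size_t pos = 0;

  if (outsize == 0)
    return false;
  for (size_t i = 0; in[i] != '\0'; i++)
    {
      const char *piece = &in[i];
      size_t n = 1;
      bool word_start = (i == 0 || in[i - 1] == ' ' || in[i - 1] == '(');

      if (in[i] == '`')
        { piece = LSQ; n = 3; single_open = true; }
      else if (in[i] == '\'' && single_open)
        { piece = RSQ; n = 3; single_open = false; }
      else if (in[i] == '\'' && word_start)
        { piece = LSQ; n = 3; single_open = true; }
      else if (in[i] == '"')
        { piece = double_open ? RDQ : LDQ; n = 3; double_open = !double_open; }

      if (!put (out, outsize, &pos, piece, n))
        return false;
    }
  out[pos] = '\0';
  return true;
}
```

The reverse direction mentioned in the header (falling back to 0x27 and 0x22 on ASCII or ISO-8859-1 terminals) is left to iconv there and is not shown here.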
FYI: Minor patch to glob_.h
I've installed the attached patch: 2005-06-07 Derek Price <[EMAIL PROTECTED]> Sync from CVS. * lib/glob_.h: Indent nested #ifdef. Do you want me to keep sending FYI's to this list for this sort of minor change? Regards, Derek Index: lib/glob_.h === RCS file: /cvsroot/gnulib/gnulib/lib/glob_.h,v retrieving revision 1.2 diff -u -p -r1.2 glob_.h --- lib/glob_.h 31 May 2005 21:01:17 - 1.2 +++ lib/glob_.h 7 Jun 2005 14:55:51 - @@ -135,11 +135,11 @@ typedef struct are used instead of the normal file access functions. */ void (*gl_closedir) (void *); #ifdef __USE_GNU -#if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__ +# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__ struct dirent *(*gl_readdir) (void *); -#else +# else struct direct *(*gl_readdir) (void *); -#endif +# endif #else void *(*gl_readdir) (void *); #endif ___ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib
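For readers unfamiliar with the convention the patch above applies, nested preprocessor directives get one space after the `#` per nesting level. A small standalone sketch follows; all macro names in it are invented for the example and have nothing to do with glob_.h.

```c
/* Hypothetical feature macros, defined here only so the example is
   self-contained.  */
#define OUTER_FEATURE 1
#define INNER_FEATURE 1

#ifdef OUTER_FEATURE
# if defined INNER_FEATURE || defined OTHER_FEATURE
#  define READDIR_KIND 1        /* nested two levels deep */
# else
#  define READDIR_KIND 2
# endif
#else
# define READDIR_KIND 0
#endif

/* Report which branch the preprocessor selected.  */
int
readdir_kind (void)
{
  return READDIR_KIND;
}
```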
Re: quote characters in stds
[EMAIL PROTECTED] (Karl Berry) writes:

> @node Quote characters

I like that section. Is it possible to discuss how that relate to the en@quot stuff recommended by gettext to solve the similar problem? Which method is preferable? Are they mutual exclusive, or should they be used in combination? Those things aren't clear to me.

From the gettext manual:

    It is recommended that you add the "languages" `en@quot' and
    `en@boldquot' to the `LINGUAS' file. `en@quot' is a variant of
    English message catalogs (`en') which uses real quotation marks
    instead of the ugly looking asymmetric ASCII substitutes ``' and
    `''. `en@boldquot' is a variant of `en@quot' that additionally
    outputs quoted pieces of text in a bold font, when used in a
    terminal emulator which supports the VT100 escape sequences (such
    as `xterm' or the Linux console, but not Emacs in `M-x shell' mode).

    These extra message catalogs `en@quot' and `en@boldquot' are
    constructed automatically, not by translators; to support them, you
    need the files `Rules-quot', `quot.sed', `boldquot.sed',
    `en@quot.header', `en@boldquot.header', `insert-header.sin' in the
    `po/' directory. You can copy them from GNU gettext's `po/'
    directory; they are also installed by running `gettextize'.
Re: stat and lstat should define their replacements
Paul Eggert wrote: >Derek Price <[EMAIL PROTECTED]> writes: > > > >>What does this mean for Bruno's recent patch for stat & lstat which >>removed the SunOS 4.1.4 support in addition to some other fixes? >> >> > >His patch made sense to me, but as far as I know nobody has taken the >time to integrate the comments on it and try it out. Here's what I >see so far: > > I've attached a revised patch. In addition to integrating the suggestions Bruno's patch received, it also includes changes to modules/lstat, config/srclist.txt, and MODULES.html.sh. In addition to this patch, lib/stat.c, m4/stat.m4, and modules/stat need to be removed. It appears to work installed in CVS on Linux, but that only tests the modules file and macro, not the C source or header files. Cheers, Derek Index: MODULES.html.sh === RCS file: /cvsroot/gnulib/gnulib/MODULES.html.sh,v retrieving revision 1.88 diff -u -p -r1.88 MODULES.html.sh --- MODULES.html.sh 2 Jun 2005 20:41:04 - 1.88 +++ MODULES.html.sh 7 Jun 2005 14:46:02 - @@ -1063,7 +1063,6 @@ srand srand48 srandom sscanf -stat statvfs stdin strcasecmp @@ -1756,7 +1755,6 @@ func_all_modules () func_module mkdtemp func_module poll func_module readlink - func_module stat func_module lstat func_module time_r func_module timespec Index: config/srclist.txt === RCS file: /cvsroot/gnulib/gnulib/config/srclist.txt,v retrieving revision 1.63 diff -u -p -r1.63 srclist.txt --- config/srclist.txt 29 May 2005 16:56:02 - 1.63 +++ config/srclist.txt 7 Jun 2005 14:46:02 - @@ -143,7 +143,6 @@ $LIBCSRC/sysdeps/generic/memmem.c lib gp # # These implementations are quite different. 
#$LIBCSRC/io/lstat.c lib gpl -#$LIBCSRC/io/stat.clib gpl #$LIBCSRC/libio/__fpending.c lib gpl #$LIBCSRC/malloc/malloc.c lib gpl #$LIBCSRC/misc/dirname.c lib gpl Index: lib/lstat.c === RCS file: /cvsroot/gnulib/gnulib/lib/lstat.c,v retrieving revision 1.7 diff -u -p -r1.7 lstat.c --- lib/lstat.c 14 May 2005 06:03:58 - 1.7 +++ lib/lstat.c 7 Jun 2005 14:46:02 - @@ -1,8 +1,6 @@ -/* Work around the bug in some systems whereby lstat succeeds when - given the zero-length file name argument. The lstat from SunOS 4.1.4 - has this bug. +/* Work around a bug of lstat on some systems - Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003 Free + Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify @@ -19,5 +17,59 @@ along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ -#define LSTAT -#include "stat.c" +/* written by Jim Meyering */ + +#include + +/* The specification of these functions is in sys_stat.h. But we cannot + include this include file here, because on some systems, a + "#define lstat lstat64" is being used, and sys_stat.h deletes this + definition. */ + +#include +#include +#include +#include + +#include "stat-macros.h" +#include "xalloc.h" + +/* lstat works differently on Linux and Solaris systems. POSIX (see + `pathname resolution' in the glossary) requires that programs like `ls' + take into consideration the fact that FILE has a trailing slash when + FILE is a symbolic link. On Linux systems, the lstat function already + has the desired semantics (in treating `lstat("symlink/",sbuf)' just like + `lstat("symlink/.",sbuf)', but on Solaris it does not. + + If FILE has a trailing slash and specifies a symbolic link, + then append a `.' to FILE and call lstat a second time. 
*/ + +int +rpl_lstat (const char *file, struct stat *sbuf) +{ + size_t len; + char *new_file; + + int lstat_result = lstat (file, sbuf); + + if (lstat_result != 0 || !S_ISLNK (sbuf->st_mode)) +return lstat_result; + + len = strlen (file); + if (len == 0 || file[len - 1] != '/') +return lstat_result; + + /* FILE refers to a symbolic link and the name ends with a slash. + Append a `.' to FILE and repeat the lstat call. */ + + /* Add one for the `.' we'll append, and one more for the trailing NUL. */ + new_file = xmalloc (len + 1 + 1); + memcpy (new_file, file, len); + new_file[len] = '.'; + new_file[len + 1] = 0; + + lstat_result = lstat (new_file, sbuf); + free (new_file); + + return lstat_result; +} Index: lib/lstat.h === RCS file: lib/lstat.h diff -N lib/lstat.h --- /dev/null 1 Jan 1970 00:00:00 - +++ lib/lstat.h 7 Jun 2005 14:46:02 - @@ -0,0 +1,24 @@ +/* Retrieving information about files. + Copyrigh
Re: gcc -Wall warning for minmax.h
Stepan Kasal wrote:

>- We need to document also AS_LITERAL_IF and m4_fatal
>  (And you could also document m4_warning, when you are at it.)
>

I'll see about it after I get comments on the first round back.

>- we have to document also the fact that AS_TR_SH & AS_TR_CPP expand
>  to literal variable (symbol) name, if their argument is a literal
>

I didn't think this was important from the user's perspective. I showed in my examples that variables were expanded properly before conversion, but I didn't think it important to mention here that expansion of input without shell meta-characters would be optimized.

Cheers,

Derek
Re: quote characters in stds
    I believe that the standard should probably suggest a preferred
    alternative.

Yeah, you're probably right. I was trying to avoid dissension (I suspect there are some GNU'ers who will hate the idea of using `), but it's likely unavoidable :). Guess I'll try changing the first "either" to "preferably", etc.

    process the output of your program with another

Yes, good point. I'll try to dream something up for that.
Re: quote characters in stds
Hi James,

    It might be worth pointing out that all valid ASCII files are valid
    UTF-8 files, but not all valid Latin-1 files are valid UTF-8 files.

Thanks for the suggestion. I'm glad to know this myself (I thought it was the case, but didn't know the specifics), but since rms does not want to support/recommend UTF-8 at all, I think the less said about it here the better.

Cheers,
Karl
Re: quote characters in stds
On Tue, Jun 07, 2005 at 09:15:04AM -0400, Karl Berry wrote:

> In the C locale, GNU programs should stick to plain ASCII for
> quotation characters in messages to users: either 0x60 (`) for left
> quotes and 0x27 (') for right quotes, or ' for both opening and
> closing, or " (0x22) for both opening and closing. It is ok, but not
> required, to use locale-specific quotes in other locales.

I forgot to actually provide feedback. The main thrust of the text is agreeable to me though I think it might be better to be a little more prescriptive - that is, I believe that the standard should probably suggest a preferred alternative.

Secondly, the standard should state that if it is ever likely that someone will need to process the output of your program with another program, then the documentation for your program should clearly indicate how it does quoting and how the various 'corner cases' are handled.

Regards,
James.
Re: quote characters in stds
Karl writes: > Unicode contains the unambiguous quote characters required, and its > common encoding [EMAIL PROTECTED] is upward compatible with [EMAIL PROTECTED] > It might be worth pointing out that all valid ASCII files are valid UTF-8 files, but not all valid Latin-1 files are valid UTF-8 files. Specifically, there are characters in Latin-1 that are used in Unicode as leading bytes of multibyte characters (for example 0xE8, which is an e with a grave accent). Unicode is a superset of Latin-1, but that doesn't mean that you can load a Latin-1 file as if it was UTF-8. It might be worth considering this wording change... > Unicode contains the unambiguous quote characters required, and its > common encoding [EMAIL PROTECTED] is upward compatible with [EMAIL PROTECTED] > However, you can't process a Latin-1 encoded file as if it were > [EMAIL PROTECTED], because some Latin-1 character codes are used to begin > multibyte character sequences in [EMAIL PROTECTED] ... though this is sort of drifting away from the main point of a section on quote characters and into guidance on handling character encoding systems. James. ___ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib
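James's point about ASCII, Latin-1 and UTF-8 can be checked with a small validator. The sketch below is an illustration written for this thread, not code from Gnulib; the name `is_valid_utf8` is made up. It accepts any pure ASCII input, but rejects a Latin-1 byte such as 0xE8 unless it is followed by the continuation bytes UTF-8 requires.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the LEN bytes at S form
   well-formed UTF-8.  */
bool
is_valid_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;              /* number of continuation bytes expected */

      if (c < 0x80)
        need = 0;               /* ASCII is valid as-is */
      else if (c >= 0xC2 && c <= 0xDF)
        need = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        need = 2;               /* 0xE8 (Latin-1 e-grave) lands here */
      else if (c >= 0xF0 && c <= 0xF4)
        need = 3;
      else
        return false;           /* stray continuation or invalid lead byte */

      if (need > len - i - 1)
        return false;           /* sequence cut off by end of input */

      for (size_t k = 1; k <= need; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;         /* not a 10xxxxxx continuation byte */

      /* Reject overlong forms, surrogates and values above U+10FFFF.  */
      if ((c == 0xE0 && s[i + 1] < 0xA0) || (c == 0xED && s[i + 1] > 0x9F)
          || (c == 0xF0 && s[i + 1] < 0x90) || (c == 0xF4 && s[i + 1] > 0x8F))
        return false;

      i += need + 1;
    }
  return true;
}
```

With this, the Latin-1 bytes for "cafè" (ending in 0xE8) are rejected, while the UTF-8 encoding of the same word (ending in 0xC3 0xA8) is accepted.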
Re: gcc -Wall warning for minmax.h
Hello Derek, thank you very much for taking care about this. On Mon, Jun 06, 2005 at 02:31:24PM -0400, Derek Price wrote: > Yes, AS_TR_SH & AS_TR_CPP appear to be undocumented. I've submitted a > patch to autoconf-patches to remedy this and will commit it within a few > days unless there are objections there. I haven't seen the patch; and won't see before you commit, as I'll be offline until Jun 21. That's why I add some comments now, though you probably know most of them: - We need to document also AS_LITERAL_IF and m4_fatal (And you could also document m4_warning, when you are at it.) - we have to document also the fact that AS_TR_SH & AS_TR_CPP expand to literal variable (symbol) name, if their argument is a literal A cheeky closing note: Bruno, your code also uses undocumented macros: `define', `undefine' and `translit'. Please note that the manual states that they are _moved_ into the m4_ pseudo-namespace. And IMHO these will never be documented, they are only for backward compatibility. ;-) Have a nice day, Stepan ___ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib
quote characters in stds
rms wants to address the issue of quote characters in the GNU coding standards. Among the people I've talked to, there's a general consensus that it would be best to stick to ASCII at least for the C locale, and rms agreed with that. Paul Eggert (thanks Paul) and I drafted some text following that. I thought before I sent it back to rms, I would see if anyone else had comments ... see below if you care. I think the entry is quite good and clear. -- José E. Marchesi <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> GNU España http://es.gnu.org GNU No es Unix! http://www.gnu.org ___ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib
quote characters in stds
rms wants to address the issue of quote characters in the GNU coding standards. Among the people I've talked to, there's a general consensus that it would be best to stick to ASCII at least for the C locale, and rms agreed with that. Paul Eggert (thanks Paul) and I drafted some text following that. I thought before I sent it back to rms, I would see if anyone else had comments ... see below if you care. I am not sure if rms himself will accept everything in here, but we should at least try to submit something that minimizes unhappiness among the programmers. BTW, the Gnulib doc today does not talk about quotes, but I will add something before any coding standards change gets distributed. (Since I do the actual coding standards updates, I can be sure of this. :) Thanks, karl @node Quote characters @section Quote characters @cindex quote characters In the C locale, GNU programs should stick to plain ASCII for quotation characters in messages to users: either 0x60 (`) for left quotes and 0x27 (') for right quotes, or ' for both opening and closing, or " (0x22) for both opening and closing. It is ok, but not required, to use locale-specific quotes in other locales. The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and @code{quotearg} modules provide a reasonably straightforward way support locale-specific quote characters, as well as taking care of other issues, such as quoting a filename that itself contains a quote character. See the Gnulib documentation for usage details. ASCII should also be preferred in source code comments, text documents, and other contexts, unless there is good reason to do something else because of the domain at hand. If you need to use non-ASCII characters, for example to represent names of contributors, you should normally stick with one encoding, as one cannot in general mix encodings reliably. 
[EMAIL PROTECTED] is the most widely usable encoding today, after plain [EMAIL PROTECTED] Quotation characters are a difficult area in the computing world at this time: there are no true left or right quote characters in ASCII, or even [EMAIL PROTECTED] [EMAIL PROTECTED] does have paired standalone accents, but it seems wrong in principle to abuse them as quotes. And even [EMAIL PROTECTED] is not universally usable. Unicode contains the unambiguous quote characters required, and its common encoding [EMAIL PROTECTED] is upward compatible with [EMAIL PROTECTED] But Unicode and UTF-8 are deployed less widely than [EMAIL PROTECTED]; it would be premature to require Unicode support for running essentially every GNU program. Perhaps the prevailing situation will change in a few years, and then we will revisit this. ___ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib
Re: [bug-gnulib] Handling of invalid multibyte character sequences in fnmatch()
On Tue, Jun 07, 2005 at 12:03:27AM -0700, Paul Eggert wrote:
> James Youngman <[EMAIL PROTECTED]> writes:
>
> > Any ideas/suggestions?
>
> Does the following untested patch fix things? It attempts to mimic
> what Bash does.
>
> *** fnmatch.c Fri May 13 23:03:58 2005
> --- /tmp/fnmatch.c Tue Jun 7 00:02:03 2005
> *** fnmatch (const char *pattern, const char
[...]

It appears not to affect this behaviour, but I don't have time right now to run it under a debugger to find out why. I'll have time in about 9 hours when I get back from work.

Regards,
James.
Re: stat and lstat should define their replacements
Derek Price <[EMAIL PROTECTED]> writes:

> What does this mean for Bruno's recent patch for stat & lstat which
> removed the SunOS 4.1.4 support in addition to some other fixes?

His patch made sense to me, but as far as I know nobody has taken the time to integrate the comments on it and try it out. Here's what I see so far:

http://lists.gnu.org/archive/html/bug-gnulib/2005-05/msg00243.html
http://lists.gnu.org/archive/html/bug-gnulib/2005-05/msg00244.html
http://lists.gnu.org/archive/html/bug-gnulib/2005-05/msg00246.html
http://lists.gnu.org/archive/html/bug-gnulib/2005-05/msg00264.html
Re: [bug-gnulib] Handling of invalid multibyte character sequences in fnmatch()
James Youngman <[EMAIL PROTECTED]> writes: > Any ideas/suggestions? Does the following untested patch fix things? It attempts to mimic what Bash does. *** fnmatch.c Fri May 13 23:03:58 2005 --- /tmp/fnmatch.c Tue Jun 7 00:02:03 2005 *** fnmatch (const char *pattern, const char *** 319,372 wide characters. */ memset (&ps, '\0', sizeof (ps)); patsize = mbsrtowcs (NULL, &pattern, 0, &ps) + 1; ! if (__builtin_expect (patsize == 0, 0)) ! /* Something wrong. ! XXX Do we have to set `errno' to something which mbsrtows hasn't ! already done? */ ! return -1; ! assert (mbsinit (&ps)); ! strsize = mbsrtowcs (NULL, &string, 0, &ps) + 1; ! if (__builtin_expect (strsize == 0, 0)) ! /* Something wrong. ! XXX Do we have to set `errno' to something which mbsrtows hasn't ! already done? */ ! return -1; ! assert (mbsinit (&ps)); ! totsize = patsize + strsize; ! if (__builtin_expect (! (patsize <= totsize ! && totsize <= SIZE_MAX / sizeof (wchar_t)), ! 0)) { ! errno = ENOMEM; ! return -1; ! } ! ! /* Allocate room for the wide characters. */ ! if (__builtin_expect (totsize < ALLOCA_LIMIT, 1)) ! wpattern = (wchar_t *) alloca (totsize * sizeof (wchar_t)); ! else ! { ! wpattern = malloc (totsize * sizeof (wchar_t)); ! if (__builtin_expect (! wpattern, 0)) { ! errno = ENOMEM; ! return -1; } } - wstring = wpattern + patsize; - - /* Convert the strings into wide characters. */ - mbsrtowcs (wpattern, &pattern, patsize, &ps); - assert (mbsinit (&ps)); - mbsrtowcs (wstring, &string, strsize, &ps); - - res = internal_fnwmatch (wpattern, wstring, wstring + strsize - 1, - flags & FNM_PERIOD, flags); - - if (__builtin_expect (! (totsize < ALLOCA_LIMIT), 0)) - free (wpattern); - return res; } # endif /* HANDLE_MULTIBYTE */ return internal_fnmatch (pattern, string, string + strlen (string), --- 319,369 wide characters. */ memset (&ps, '\0', sizeof (ps)); patsize = mbsrtowcs (NULL, &pattern, 0, &ps) + 1; ! if (__builtin_expect (patsize != 0, 1)) { ! assert (mbsinit (&ps)); ! 
strsize = mbsrtowcs (NULL, &string, 0, &ps) + 1; ! if (__builtin_expect (strsize != 0, 1)) { ! assert (mbsinit (&ps)); ! totsize = patsize + strsize; ! if (__builtin_expect (! (patsize <= totsize ! && totsize <= SIZE_MAX / sizeof (wchar_t)), ! 0)) ! { ! errno = ENOMEM; ! return -1; ! } ! ! /* Allocate room for the wide characters. */ ! if (__builtin_expect (totsize < ALLOCA_LIMIT, 1)) ! wpattern = (wchar_t *) alloca (totsize * sizeof (wchar_t)); ! else ! { ! wpattern = malloc (totsize * sizeof (wchar_t)); ! if (__builtin_expect (! wpattern, 0)) ! { ! errno = ENOMEM; ! return -1; ! } ! } ! wstring = wpattern + patsize; ! ! /* Convert the strings into wide characters. */ ! mbsrtowcs (wpattern, &pattern, patsize, &ps); ! assert (mbsinit (&ps)); ! mbsrtowcs (wstring, &string, strsize, &ps); ! ! res = internal_fnwmatch (wpattern, wstring, wstring + strsize - 1, ! flags & FNM_PERIOD, flags); ! ! if (__builtin_expect (! (totsize < ALLOCA_LIMIT), 0)) ! free (wpattern); ! return res; } } } + # endif /* HANDLE_MULTIBYTE */ return internal_fnmatch (pattern, string, string + strlen (string), ___ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib
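Two details of the patch above are easy to miss in diff form, so here is a standalone sketch of them. First, `mbsrtowcs` returns `(size_t) -1` on an invalid multibyte sequence, and adding 1 wraps that to 0, which is why the code tests the size against 0. Second, the combined buffer size is checked for wraparound before allocating. As far as I can tell, the patch turns the "size is 0" case from an error return into a fall-through to the byte-wise `internal_fnmatch` call. The function names `wide_size` and `total_wide_size` are invented for this illustration and do not appear in fnmatch.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: store in *SIZE the number of wide characters
   needed for S including the terminator.  Return false if S is not a
   valid multibyte string in the current locale; in that case a caller
   modelled on the patch would fall back to byte-wise matching.  */
bool
wide_size (const char *s, size_t *size)
{
  mbstate_t ps;

  memset (&ps, '\0', sizeof ps);
  /* On failure mbsrtowcs yields (size_t) -1, and -1 + 1 wraps to 0.  */
  *size = mbsrtowcs (NULL, &s, 0, &ps) + 1;
  return *size != 0;
}

/* Hypothetical helper: compute PATSIZE + STRSIZE into *TOTSIZE and
   return false if the sum wrapped around or if TOTSIZE wide characters
   would not fit in a size_t byte count.  */
bool
total_wide_size (size_t patsize, size_t strsize, size_t *totsize)
{
  *totsize = patsize + strsize;
  return patsize <= *totsize && *totsize <= SIZE_MAX / sizeof (wchar_t);
}
```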