date:20050607

Re: [bug-gnulib] quote characters in stds

2005-06-07 Thread Karl Berry

    to educated people it recommends Unicode, without mentioning it explicitly.

True.  I do not know how else to write it.  (I'm also not sure rms will
go for it at all.)

    That depends on your mailer. Is it a package in Emacs, or is it 'pine'
    without Bernhard Kaindl's patches?

My personal configuration is not the point (it's vm inside emacs).  My
point is that it didn't come through correctly.  I am sure I am not
unique in this.

    Maybe you can reformulate the last two paragraphs in a way that is
    less incorrect?

Sorry, since I do not see what is incorrect about them, I do not know
how to reformulate them.  If you can suggest wording that makes you
happier, please do.

    It's at the IANA: http://www.iana.org/assignments/character-sets

Thanks.


___
bug-gnulib mailing list
bug-gnulib@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnulib

Re: quote characters in stds

2005-06-07 Thread Jim Meyering

Hi Karl,

Here's a nit:

[EMAIL PROTECTED] (Karl Berry) wrote:
...
> The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
> and @code{quotearg} modules provide a reasonably straightforward way
> support locale-specific quote characters, as well as taking care of

s/support/to support/



Re: [bug-gnulib] quote characters in stds

2005-06-07 Thread Bruno Haible

Karl Berry wrote:
> Yes, but rms has explicitly rejected (in previous email with me) the
> idea of recommending the use of UTF-8 in any context whatsoever.  Sigh.

Sigh. What you wrote there:

   If you need to use non-ASCII characters, for example to represent
   names of contributors, you should normally stick with one encoding, as
   one cannot in general mix encodings reliably.  

is a Solomonic solution: to educated people it recommends Unicode, without
mentioning it explicitly.

> My personal experience is that it is true that Unicode is still
> considerably less widely usable than Latin1.  Sure, Unicode is available
> in many contexts and systems.  But the names in your message, just for
> example, came through as garbage to me.

That depends on your mailer. Is it a package in Emacs, or is it 'pine'
without Bernhard Kaindl's patches?

> No doubt I personally could
> eventually configure everything involved to display it properly, but the
> point is that it doesn't "just work".

True: there are some distributions where things don't "just work", but these
non-Unicode-enabled corners are diminishing.

Maybe you can reformulate the last two paragraphs in a way that is less
incorrect?

> PS: The right spelling of the encodings is "Latin1" (no dash, no space)
>
> I'm glad to know that, it's easier to type than @tie{} :).  I had mostly
> seen it with a space.  Do you happen to know where the definitive
> spelling is given?

It's at the IANA: http://www.iana.org/assignments/character-sets

Bruno


Re: gcc -Wall warning for minmax.h

2005-06-07 Thread Stepan Kasal

Hi,

On Tue, Jun 07, 2005 at 10:19:47AM -0400, Derek Price wrote:
> >- we have to document also the fact that AS_TR_SH & AS_TR_CPP expand
> >  to literal variable (symbol) name, if their argument is a literal
> 
> I didn't think this was important from the user's perspective.

In the patch I proposed, I used gl_CACHE_VAR as the second argument
of AC_CACHE_CHECK.
That argument cannot contain backticks, see the Appendix below.

That explains why I relied on the feature, and why I wanted to document it.
But yes, it's better to fix AC_CACHE_CHECK to remove this limitation.

After that fix, we could also remove the AS_LITERAL_IF with m4_fatal.

I'm happy that AS_LITERAL_IF will stay undocumented.
m4_fatal and m4_warning should be documented, though.

Have a nice day,
Stepan

Appendix:
If gl_CACHE_VAR expanded to

`echo "xyz-xyz" | sed ...`

then you get something like:

eval "test \"\${gl_CACHE_VAR+set}\" = set"

eval "test \"\${`echo "xyz-xyz" | sed ...`+set}\" = set"

And this construct is not portable, see the first paragraph of node
"Shell Substitutions".

This can be fixed: AS_VAR_TEST_SET could in this case expand to
as_var=`echo "xyz-xyz" | sed ...`
eval "test \"\${$as_var+set}\" = set"


Re: [bug-gnulib] quote characters in stds

2005-06-07 Thread Karl Berry

    This is misleading.

I know, but I'm not sure what to say.  Just delete the sentence about
Latin1, maybe?  I guess it's not really necessary.

    To represent them, you need Unicode, i.e. the UTF-8 encoding.

Yes, but rms has explicitly rejected (in previous email with me) the
idea of recommending the use of UTF-8 in any context whatsoever.  Sigh.

    This has not been true for several years now.

Well, whether or not it is true, rms will not accept it, so there's no
sense arguing it here.

My personal experience is that it is true that Unicode is still
considerably less widely usable than Latin1.  Sure, Unicode is available
in many contexts and systems.  But the names in your message, just for
example, came through as garbage to me.  No doubt I personally could
eventually configure everything involved to display it properly, but the
point is that it doesn't "just work".  And I suspect I am using far
newer versions of everything than an "average" user.

    PS: The right spelling of the encodings is "Latin1" (no dash, no space)

I'm glad to know that, it's easier to type than @tie{} :).  I had mostly
seen it with a space.  Do you happen to know where the definitive
spelling is given?  I've poked around the ISO site without success.


Another draft below.  I'm not quite sure why ` would ever be
"unacceptable", and I'm a bit skeptical that it will pass muster with
rms, but I'm trying to avoid an argument with standards-mavens.  And gcc
4 already does '...'.  Any improved wording and/or backup facts welcome :).

Thanks,
k


@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for
quotation characters in messages to users: preferably 0x60 (`) for
left quotes and 0x27 (') for right quotes.  If using ` is unacceptable
in your application, other possibilities are using ' for both opening
and closing, or " (0x22) for both opening and closing.  It is ok, but
not required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
and @code{quotearg} modules provide a reasonably straightforward way
support locale-specific quote characters, as well as taking care of
other issues, such as quoting a filename that itself contains a quote
character.  See the Gnulib documentation for usage details.

In any case, the documentation for your program should clearly specify
how it does quoting, if different than the preferred method of ` and
'.  This is especially important if the output of your program is ever
likely to be parsed by another program.

ASCII should also be preferred in source code comments, text
documents, and other contexts, unless there is good reason to do
something else because of the domain at hand.

If you need to use non-ASCII characters, for example to represent
names of contributors, you should normally stick with one encoding, as
one cannot in general mix encodings reliably.  

Quotation characters are a difficult area in the computing world at this
time: there are no true left or right quote characters in ASCII, or even
Latin1 (the ` character we use is standardized as a grave accent).
Latin1 does have paired standalone accents, but it seems wrong in
principle to abuse them as quotes.  And even Latin1 is not universally
usable.

Unicode contains the unambiguous quote characters required, and its
common encoding UTF-8 is upward compatible with [EMAIL PROTECTED]  But Unicode
and UTF-8 are deployed even less widely than Latin1; it would be
premature to require Unicode support for running essentially every GNU
program.

Perhaps the prevailing situation will change in a few years, and then
we will revisit this.
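
The preferred C-locale convention in the draft above (0x60 to open, 0x27 to close) can be sketched in C as follows.  This is only an illustration, not the Gnulib quote or quotearg API: the function name quote_c_locale and its backslash escaping of an embedded ' or \ are assumptions made for the example, chosen so that a name that itself contains a quote character stays unambiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Gnulib.
   Write `ARG' into DST, using 0x60 to open and 0x27 to close, and
   putting a backslash before any ' or \ found inside ARG.
   Return the number of bytes written, not counting the trailing NUL,
   or (size_t) -1 if DST (of DSTSIZE bytes) is too small.  */
size_t
quote_c_locale (const char *arg, char *dst, size_t dstsize)
{
  size_t n = 0;

  assert (arg != NULL && dst != NULL);
  if (dstsize < 3)              /* room for ` and ' and the NUL */
    return (size_t) -1;

  dst[n++] = '`';
  for (; *arg != '\0'; arg++)
    {
      size_t need = (*arg == '\'' || *arg == '\\') ? 2 : 1;

      /* Keep two bytes in reserve for the closing quote and the NUL.  */
      if (n + need + 2 > dstsize)
        return (size_t) -1;
      if (need == 2)
        dst[n++] = '\\';
      dst[n++] = *arg;
    }
  dst[n++] = '\'';
  dst[n] = '\0';
  return n;
}
```

With a large enough buffer, the argument file yields `file', and the argument it's yields `it\'s'.  Whatever escaping a program settles on is exactly the kind of detail the draft asks its documentation to spell out.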



Re: [bug-gnulib] quote characters in stds

2005-06-07 Thread Bruno Haible

Karl Berry wrote:

> @node Quote characters
> @section Quote characters
> @cindex quote characters
>
> In the C locale, GNU programs should stick to plain ASCII for
> quotation characters in messages to users: either 0x60 (`) for left
> quotes and 0x27 (') for right quotes, or ' for both opening and
> closing, or " (0x22) for both opening and closing.  It is ok, but not
> required, to use locale-specific quotes in other locales.
>
> The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
> and @code{quotearg} modules provide a reasonably straightforward way
> support locale-specific quote characters, as well as taking care of
> other issues, such as quoting a filename that itself contains a quote
> character.  See the Gnulib documentation for usage details.
>
> ASCII should also be preferred in source code comments, text
> documents, and other contexts, unless there is good reason to do
> something else because of the domain at hand.

Agreed.

> If you need to use non-ASCII characters, for example to represent
> names of contributors, you should normally stick with one encoding, as
> one cannot in general mix encodings reliably.  [EMAIL PROTECTED] is the
> most widely usable encoding today, after plain [EMAIL PROTECTED]

This is misleading. In a list of contributors, I often find names like
Rafał Maszkowski, Primož Peterlin, Martin Mokrejš, and Владимир Слепнев
(Vladimir Slepnev). To represent them, you need Unicode, i.e. the
UTF-8 encoding.

> Quotation characters are a difficult area in the computing world at
> this time: there are no true left or right quote characters in ASCII,
> or even [EMAIL PROTECTED]  [EMAIL PROTECTED] does have paired standalone
> accents, but it seems wrong in principle to abuse them as quotes.  And
> even [EMAIL PROTECTED] is not universally usable.
>
> Unicode contains the unambiguous quote characters required, and its
> common encoding [EMAIL PROTECTED] is upward compatible with [EMAIL PROTECTED]

Agreed.

> But Unicode and UTF-8 are deployed less widely than [EMAIL PROTECTED]; it
> would be premature to require Unicode support for running essentially
> every GNU program.

This has not been true for several years now. The major GUI toolkits, KDE/Qt
and GNOME/Gtk, have supported Unicode for several years, and now feature
good support not only of Western and CJK languages, but also of Bidi
scripts and Indic languages. 'vi' has been UTF-8 enabled since 2001. For more
than one year, major Linux distributions like Fedora Core 3 have put users
into UTF-8 locales by default.
See http://www.cl.cam.ac.uk/~mgk25/unicode.html for more info.

Bruno

PS: The right spelling of the encodings is "Latin1" (no dash, no space)
and "UTF-8" (with a HYPHEN-MINUS in between).




Re: quote characters in stds

2005-06-07 Thread Bruno Haible

Simon Josefsson wrote:
> Is it possible to discuss how that relate to the
> [EMAIL PROTECTED] stuff recommended by gettext to solve the similar problem?

I wouldn't expand on this, at least not in the GNU standards: very
few people use these en@quot or en@boldquot catalogs. In 5 years of
existence, I got one single bug report/support question about them.
Therefore if you want to push this, I think gettext's ABOUT-NLS or some
i18n web sites are more appropriate than the GNU standards.

Bruno


Re: quote characters in stds

2005-06-07 Thread Karl Berry

Hi Simon,

Thanks for the note.

    Are they mutually exclusive, or should they
    be used in combination?  Those things aren't clear to me.

They aren't clear to me either, but I *think* they can be used in
combination.  That is, if you use the gnulib quote module or equivalent,
then you could set your locale to en@quot (somehow ...), and get the
UTF-8 quotes, transliterated to ' or " in some cases.  (I'm getting this
from the po file, appended; I've never actually used en@quot.)

I think the appropriate place to discuss the details would be in the
Gnulib and/or Gettext documentation, not the coding standards, although
perhaps they should be mentioned.  (Something like: "Independent of
gnulib, you can use the en@quot catalog provided by gettext to achieve a
similar result.  See .")

I trust Bruno will fill us in.

karl


# English translations for GNU gettext-runtime package.
# Copyright (C) 2005 Free Software Foundation, Inc.
# This file is distributed under the same license as the GNU gettext-runtime package.
# Automatically generated, 2005.
#
# All this catalog "translates" are quotation characters.
# The msgids must be ASCII and therefore cannot contain real quotation
# characters, only substitutes like grave accent (0x60), apostrophe (0x27)
# and double quote (0x22). These substitutes look strange; see
# http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
#
# This catalog translates grave accent (0x60) and apostrophe (0x27) to
# left single quotation mark (U+2018) and right single quotation mark (U+2019).
# It also translates pairs of apostrophe (0x27) to
# left single quotation mark (U+2018) and right single quotation mark (U+2019)
# and pairs of quotation mark (0x22) to
# left double quotation mark (U+201C) and right double quotation mark (U+201D).
#
# When output to an UTF-8 terminal, the quotation characters appear perfectly.
# When output to an ISO-8859-1 terminal, the single quotation marks are
# transliterated to apostrophes (by iconv in glibc 2.2 or newer) or to
# grave/acute accent (by libiconv), and the double quotation marks are
# transliterated to 0x22.
# When output to an ASCII terminal, the single quotation marks are
# transliterated to apostrophes, and the double quotation marks are
# transliterated to 0x22.
#
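
As a rough illustration of what the catalog header above describes, the following C sketch replaces grave accent (0x60) with U+2018 and apostrophe (0x27) with U+2019, both encoded as UTF-8.  It is an assumption-laden simplification: the name requote_utf8 is hypothetical, it does not handle the paired-apostrophe or double-quote cases the header mentions, it would also convert apostrophes inside words, and it does no transliteration for non-UTF-8 terminals (the header leaves that to iconv).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gettext.
   Copy SRC to DST, replacing each ` with U+2018 and each ' with
   U+2019, encoded as UTF-8.  Return the number of bytes written, not
   counting the trailing NUL, or (size_t) -1 if DST (of DSTSIZE bytes)
   is too small.  */
size_t
requote_utf8 (const char *src, char *dst, size_t dstsize)
{
  static const char lsquo[] = "\xE2\x80\x98";   /* U+2018 */
  static const char rsquo[] = "\xE2\x80\x99";   /* U+2019 */
  size_t n = 0;

  assert (src != NULL && dst != NULL);
  for (; *src != '\0'; src++)
    {
      const char *piece = src;  /* by default, copy the byte unchanged */
      size_t piece_len = 1;

      if (*src == '`')
        {
          piece = lsquo;
          piece_len = 3;
        }
      else if (*src == '\'')
        {
          piece = rsquo;
          piece_len = 3;
        }

      /* Always leave room for the trailing NUL.  */
      if (n + piece_len + 1 > dstsize)
        return (size_t) -1;
      memcpy (dst + n, piece, piece_len);
      n += piece_len;
    }
  if (n + 1 > dstsize)
    return (size_t) -1;
  dst[n] = '\0';
  return n;
}
```

Given the five-byte ASCII input `foo', the result is nine bytes long: three for each quotation mark plus the three letters.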



FYI: Minor patch to glob_.h

2005-06-07 Thread Derek Price

I've installed the attached patch:

2005-06-07  Derek Price  <[EMAIL PROTECTED]>

Sync from CVS.
* lib/glob_.h: Indent nested #ifdef.

Do you want me to keep sending FYI's to this list for this sort of minor
change?

Regards,

Derek
Index: lib/glob_.h
===
RCS file: /cvsroot/gnulib/gnulib/lib/glob_.h,v
retrieving revision 1.2
diff -u -p -r1.2 glob_.h
--- lib/glob_.h 31 May 2005 21:01:17 -  1.2
+++ lib/glob_.h 7 Jun 2005 14:55:51 -
@@ -135,11 +135,11 @@ typedef struct
are used instead of the normal file access functions.  */
 void (*gl_closedir) (void *);
 #ifdef __USE_GNU
-#if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
+# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
 struct dirent *(*gl_readdir) (void *);
-#else
+# else
 struct direct *(*gl_readdir) (void *);
-#endif
+# endif
 #else
 void *(*gl_readdir) (void *);
 #endif

Re: quote characters in stds

2005-06-07 Thread Simon Josefsson

[EMAIL PROTECTED] (Karl Berry) writes:

> @node Quote characters

I like that section.  Is it possible to discuss how that relates to the
[EMAIL PROTECTED] stuff recommended by gettext to solve the similar problem?
Which method is preferable?  Are they mutually exclusive, or should they
be used in combination?  Those things aren't clear to me.  From the
gettext manual:

   It is recommended that you add the "languages" `en@quot' and
   `en@boldquot' to the `LINGUAS' file.  `en@quot' is a variant of
   English message catalogs (`en') which uses real quotation marks
   instead of the ugly looking asymmetric ASCII substitutes ``' and
   `''.  `en@boldquot' is a variant of `en@quot' that additionally
   outputs quoted pieces of text in a bold font, when used in a
   terminal emulator which supports the VT100 escape sequences (such
   as `xterm' or the Linux console, but not Emacs in `M-x shell'
   mode).

   These extra message catalogs `en@quot' and `en@boldquot' are
   constructed automatically, not by translators; to support them, you
   need the files `Rules-quot', `quot.sed', `boldquot.sed',
   `en@quot.header', `en@boldquot.header', `insert-header.sin' in the
   `po/' directory.  You can copy them from GNU gettext's `po/'
   directory; they are also installed by running `gettextize'.



Re: stat and lstat should define their replacements

2005-06-07 Thread Derek Price

Paul Eggert wrote:

>Derek Price <[EMAIL PROTECTED]> writes:
>
>>What does this mean for Bruno's recent patch for stat & lstat which
>>removed the SunOS 4.1.4 support in addition to some other fixes?
>
>His patch made sense to me, but as far as I know nobody has taken the
>time to integrate the comments on it and try it out.  Here's what I
>see so far:

I've attached a revised patch.  In addition to integrating the
suggestions Bruno's patch received, it also includes changes to
modules/lstat, config/srclist.txt, and MODULES.html.sh.  In addition to
this patch, lib/stat.c, m4/stat.m4, and modules/stat need to be removed.

It appears to work installed in CVS on Linux, but that only tests the
modules file and macro, not the C source or header files.

Cheers,

Derek
Index: MODULES.html.sh
===
RCS file: /cvsroot/gnulib/gnulib/MODULES.html.sh,v
retrieving revision 1.88
diff -u -p -r1.88 MODULES.html.sh
--- MODULES.html.sh 2 Jun 2005 20:41:04 -   1.88
+++ MODULES.html.sh 7 Jun 2005 14:46:02 -
@@ -1063,7 +1063,6 @@ srand
 srand48
 srandom
 sscanf
-stat
 statvfs
 stdin
 strcasecmp
@@ -1756,7 +1755,6 @@ func_all_modules ()
   func_module mkdtemp
   func_module poll
   func_module readlink
-  func_module stat
   func_module lstat
   func_module time_r
   func_module timespec
Index: config/srclist.txt
===
RCS file: /cvsroot/gnulib/gnulib/config/srclist.txt,v
retrieving revision 1.63
diff -u -p -r1.63 srclist.txt
--- config/srclist.txt  29 May 2005 16:56:02 -  1.63
+++ config/srclist.txt  7 Jun 2005 14:46:02 -
@@ -143,7 +143,6 @@ $LIBCSRC/sysdeps/generic/memmem.c   lib gp
 #
 # These implementations are quite different.
 #$LIBCSRC/io/lstat.c   lib gpl
-#$LIBCSRC/io/stat.clib gpl
 #$LIBCSRC/libio/__fpending.c   lib gpl
 #$LIBCSRC/malloc/malloc.c  lib gpl
 #$LIBCSRC/misc/dirname.c   lib gpl
Index: lib/lstat.c
===
RCS file: /cvsroot/gnulib/gnulib/lib/lstat.c,v
retrieving revision 1.7
diff -u -p -r1.7 lstat.c
--- lib/lstat.c 14 May 2005 06:03:58 -  1.7
+++ lib/lstat.c 7 Jun 2005 14:46:02 -
@@ -1,8 +1,6 @@
-/* Work around the bug in some systems whereby lstat succeeds when
-   given the zero-length file name argument.  The lstat from SunOS 4.1.4
-   has this bug.
+/* Work around a bug of lstat on some systems
 
-   Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003 Free
+   Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005 Free
Software Foundation, Inc.
 
This program is free software; you can redistribute it and/or modify
@@ -19,5 +17,59 @@
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
 
-#define LSTAT
-#include "stat.c"
+/* written by Jim Meyering */
+
+#include 
+
+/* The specification of these functions is in sys_stat.h.  But we cannot
+   include this include file here, because on some systems, a
+   "#define lstat lstat64" is being used, and sys_stat.h deletes this
+   definition.  */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "stat-macros.h"
+#include "xalloc.h"
+
+/* lstat works differently on Linux and Solaris systems.  POSIX (see
+   `pathname resolution' in the glossary) requires that programs like `ls'
+   take into consideration the fact that FILE has a trailing slash when
+   FILE is a symbolic link.  On Linux systems, the lstat function already
+   has the desired semantics (in treating `lstat("symlink/",sbuf)' just like
+   `lstat("symlink/.",sbuf)', but on Solaris it does not.
+
+   If FILE has a trailing slash and specifies a symbolic link,
+   then append a `.' to FILE and call lstat a second time.  */
+
+int
+rpl_lstat (const char *file, struct stat *sbuf)
+{
+  size_t len;
+  char *new_file;
+
+  int lstat_result = lstat (file, sbuf);
+
+  if (lstat_result != 0 || !S_ISLNK (sbuf->st_mode))
+    return lstat_result;
+
+  len = strlen (file);
+  if (len == 0 || file[len - 1] != '/')
+    return lstat_result;
+
+  /* FILE refers to a symbolic link and the name ends with a slash.
+     Append a `.' to FILE and repeat the lstat call.  */
+
+  /* Add one for the `.' we'll append, and one more for the trailing NUL.  */
+  new_file = xmalloc (len + 1 + 1);
+  memcpy (new_file, file, len);
+  new_file[len] = '.';
+  new_file[len + 1] = 0;
+
+  lstat_result = lstat (new_file, sbuf);
+  free (new_file);
+
+  return lstat_result;
+}
Index: lib/lstat.h
===
RCS file: lib/lstat.h
diff -N lib/lstat.h
--- /dev/null   1 Jan 1970 00:00:00 -
+++ lib/lstat.h 7 Jun 2005 14:46:02 -
@@ -0,0 +1,24 @@
+/* Retrieving information about files.
+   Copyrigh

Re: gcc -Wall warning for minmax.h

2005-06-07 Thread Derek Price

Stepan Kasal wrote:

>- We need to document also AS_LITERAL_IF and m4_fatal
>  (And you could also document m4_warning, when you are at it.)
>  
>

I'll see about it after I get comments on the first round back.

>- we have to document also the fact that AS_TR_SH & AS_TR_CPP expand
>  to literal variable (symbol) name, if their argument is a literal
>  
>

I didn't think this was important from the user's perspective.  I showed
in my examples that variables were expanded properly before conversion,
but I didn't think it important to mention here that expansion of input
without shell meta-characters would be optimized.

Cheers,

Derek


Re: quote characters in stds

2005-06-07 Thread Karl Berry

    I believe that the standard should probably
    suggest a preferred alternative.

Yeah, you're probably right.  I was trying to avoid dissension (I suspect
there are some GNU'ers who will hate the idea of using `), but it's
likely unavoidable :).  Guess I'll try changing the first "either" to
"preferably", etc.

    process the output of your program with another

Yes, good point.  I'll try to dream something up for that.



Re: quote characters in stds

2005-06-07 Thread Karl Berry

Hi James,

    It might be worth pointing out that all valid ASCII files are valid
    UTF-8 files, but not all valid Latin-1 files are valid UTF-8 files.

Thanks for the suggestion.  I'm glad to know this myself (I thought it
was the case, but didn't know the specifics), but since rms does not
want to support/recommend UTF-8 at all, I think the less said about it
here the better.

Cheers,
Karl



Re: quote characters in stds

2005-06-07 Thread James Youngman

On Tue, Jun 07, 2005 at 09:15:04AM -0400, Karl Berry wrote:

> In the C locale, GNU programs should stick to plain ASCII for
> quotation characters in messages to users: either 0x60 (`) for left
> quotes and 0x27 (') for right quotes, or ' for both opening and
> closing, or " (0x22) for both opening and closing.  It is ok, but not
> required, to use locale-specific quotes in other locales.

I forgot to actually provide feedback.  The main thrust of the text is
agreeable to me though I think it might be better to be a little more
prescriptive - that is, I believe that the standard should probably
suggest a preferred alternative.

Secondly, the standard should state that if it is ever likely that
someone will need to process the output of your program with another
program, then the documentation for your program should clearly
indicate how it does quoting and how the various 'corner cases' are
handled.

Regards,
James.


Re: quote characters in stds

2005-06-07 Thread James Youngman

Karl writes:

> Unicode contains the unambiguous quote characters required, and its
> common encoding [EMAIL PROTECTED] is upward compatible with [EMAIL PROTECTED] 
>  

It might be worth pointing out that all valid ASCII files are valid
UTF-8 files, but not all valid Latin-1 files are valid UTF-8 files.

Specifically, there are characters in Latin-1 that are used in Unicode
as leading bytes of multibyte characters (for example 0xE8, which is
an e with a grave accent).  Unicode is a superset of Latin-1, but that
doesn't mean that you can load a Latin-1 file as if it was UTF-8.
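
A minimal sketch of that point in C, assuming the usual UTF-8 well-formedness rules (no overlong forms, no surrogates, nothing above U+10FFFF); the function name is_valid_utf8 is made up for the example.  Pure ASCII always passes, while a Latin-1 byte such as 0xE8 is read as the lead byte of a three-byte sequence and fails unless two continuation bytes happen to follow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper.  Return 1 if the LEN bytes at BUF are
   well-formed UTF-8, 0 otherwise.  */
int
is_valid_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  assert (buf != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t need;              /* continuation bytes expected */
      size_t k;
      unsigned long cp;         /* code point being assembled */
      unsigned long min;        /* smallest value allowed at this length */

      if (c < 0x80)
        {
          i++;                  /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        {
          need = 1; cp = c & 0x1F; min = 0x80;
        }
      else if ((c & 0xF0) == 0xE0)
        {
          need = 2; cp = c & 0x0F; min = 0x800;
        }
      else if ((c & 0xF8) == 0xF0)
        {
          need = 3; cp = c & 0x07; min = 0x10000;
        }
      else
        return 0;               /* stray continuation or invalid lead byte */

      if (len - i - 1 < need)
        return 0;               /* sequence cut off at end of input */
      for (k = 1; k <= need; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
          cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */
      i += need + 1;
    }
  return 1;
}
```

For instance, the Latin-1 bytes 0x63 0x61 0x66 0xE8 are rejected, whereas the UTF-8 spelling of the same word, ending in 0xC3 0xA8, is accepted.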

It might be worth considering this wording change...

> Unicode contains the unambiguous quote characters required, and its
> common encoding [EMAIL PROTECTED] is upward compatible with [EMAIL PROTECTED]
> However, you can't process a Latin-1 encoded file as if it were
> [EMAIL PROTECTED], because some Latin-1 character codes are used to begin
> multibyte character sequences in [EMAIL PROTECTED]

... though this is sort of drifting away from the main point of a
section on quote characters and into guidance on handling character
encoding systems.

James.


Re: gcc -Wall warning for minmax.h

2005-06-07 Thread Stepan Kasal

Hello Derek,

thank you very much for taking care of this.

On Mon, Jun 06, 2005 at 02:31:24PM -0400, Derek Price wrote:
> Yes, AS_TR_SH & AS_TR_CPP appear to be undocumented.  I've submitted a
> patch to autoconf-patches to remedy this and will commit it within a few
> days unless there are objections there.

I haven't seen the patch, and won't see it before you commit, as I'll be
offline until Jun 21.

That's why I add some comments now, though you probably know most of
them:
- We need to document also AS_LITERAL_IF and m4_fatal
  (And you could also document m4_warning, when you are at it.)
- we have to document also the fact that AS_TR_SH & AS_TR_CPP expand
  to literal variable (symbol) name, if their argument is a literal

A cheeky closing note:
Bruno, your code also uses undocumented macros: `define', `undefine'
and `translit'.  Please note that the manual states that they are
_moved_ into the m4_ pseudo-namespace.
And IMHO these will never be documented, they are only for backward
compatibility.  ;-)

Have a nice day,
Stepan


quote characters in stds

2005-06-07 Thread Jose E. Marchesi


    rms wants to address the issue of quote characters in the GNU coding
    standards.  Among the people I've talked to, there's a general consensus
    that it would be best to stick to ASCII at least for the C locale, and
    rms agreed with that.  Paul Eggert (thanks Paul) and I drafted some text
    following that.  I thought before I sent it back to rms, I would see if
    anyone else had comments ... see below if you care.

I think the entry is quite good and clear.

-- 
José E. Marchesi <[EMAIL PROTECTED]>
 <[EMAIL PROTECTED]>

GNU España   http://es.gnu.org
GNU No es Unix!  http://www.gnu.org




quote characters in stds

2005-06-07 Thread Karl Berry

rms wants to address the issue of quote characters in the GNU coding
standards.  Among the people I've talked to, there's a general consensus
that it would be best to stick to ASCII at least for the C locale, and
rms agreed with that.  Paul Eggert (thanks Paul) and I drafted some text
following that.  I thought before I sent it back to rms, I would see if
anyone else had comments ... see below if you care.

I am not sure if rms himself will accept everything in here, but we
should at least try to submit something that minimizes unhappiness among
the programmers.

BTW, the Gnulib doc today does not talk about quotes, but I will add
something before any coding standards change gets distributed.  (Since I
do the actual coding standards updates, I can be sure of this. :)

Thanks,
karl


@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for
quotation characters in messages to users: either 0x60 (`) for left
quotes and 0x27 (') for right quotes, or ' for both opening and
closing, or " (0x22) for both opening and closing.  It is ok, but not
required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
and @code{quotearg} modules provide a reasonably straightforward way
support locale-specific quote characters, as well as taking care of
other issues, such as quoting a filename that itself contains a quote
character.  See the Gnulib documentation for usage details.

ASCII should also be preferred in source code comments, text
documents, and other contexts, unless there is good reason to do
something else because of the domain at hand.

If you need to use non-ASCII characters, for example to represent
names of contributors, you should normally stick with one encoding, as
one cannot in general mix encodings reliably.  [EMAIL PROTECTED] is the
most widely usable encoding today, after plain [EMAIL PROTECTED]

Quotation characters are a difficult area in the computing world at
this time: there are no true left or right quote characters in ASCII,
or even [EMAIL PROTECTED]  [EMAIL PROTECTED] does have paired standalone
accents, but it seems wrong in principle to abuse them as quotes.  And
even [EMAIL PROTECTED] is not universally usable.

Unicode contains the unambiguous quote characters required, and its
common encoding [EMAIL PROTECTED] is upward compatible with [EMAIL PROTECTED]
But Unicode and UTF-8 are deployed less widely than [EMAIL PROTECTED]; it
would be premature to require Unicode support for running essentially
every GNU program.

Perhaps the prevailing situation will change in a few years, and then
we will revisit this.





Re: [bug-gnulib] Handling of invalid multibyte character sequences in fnmatch()

2005-06-07 Thread James Youngman

On Tue, Jun 07, 2005 at 12:03:27AM -0700, Paul Eggert wrote:
> James Youngman <[EMAIL PROTECTED]> writes:
> 
> > Any ideas/suggestions?
> 
> Does the following untested patch fix things?  It attempts to mimic
> what Bash does.
> 
> *** fnmatch.c Fri May 13 23:03:58 2005
> --- /tmp/fnmatch.cTue Jun  7 00:02:03 2005
> *** fnmatch (const char *pattern, const char

[...]

It appears not to affect this behaviour, but I don't have time right
now to run it under a debugger to find out why.  I'll have time in
about 9 hours when I get back from work.

Regards,
James.




Re: stat and lstat should define their replacements

2005-06-07 Thread Paul Eggert

Derek Price <[EMAIL PROTECTED]> writes:

> What does this mean for Bruno's recent patch for stat & lstat which
> removed the SunOS 4.1.4 support in addition to some other fixes?

His patch made sense to me, but as far as I know nobody has taken the
time to integrate the comments on it and try it out.  Here's what I
see so far:

http://lists.gnu.org/archive/html/bug-gnulib/2005-05/msg00243.html
http://lists.gnu.org/archive/html/bug-gnulib/2005-05/msg00244.html
http://lists.gnu.org/archive/html/bug-gnulib/2005-05/msg00246.html
http://lists.gnu.org/archive/html/bug-gnulib/2005-05/msg00264.html



Re: [bug-gnulib] Handling of invalid multibyte character sequences in fnmatch()

2005-06-07 Thread Paul Eggert

James Youngman <[EMAIL PROTECTED]> writes:

> Any ideas/suggestions?

Does the following untested patch fix things?  It attempts to mimic
what Bash does.

*** fnmatch.c   Fri May 13 23:03:58 2005
--- /tmp/fnmatch.c  Tue Jun  7 00:02:03 2005
*************** fnmatch (const char *pattern, const char
*** 319,372 ****
 wide characters.  */
memset (&ps, '\0', sizeof (ps));
patsize = mbsrtowcs (NULL, &pattern, 0, &ps) + 1;
!   if (__builtin_expect (patsize == 0, 0))
!   /* Something wrong.
!  XXX Do we have to set `errno' to something which mbsrtows hasn't
!  already done?  */
!   return -1;
!   assert (mbsinit (&ps));
!   strsize = mbsrtowcs (NULL, &string, 0, &ps) + 1;
!   if (__builtin_expect (strsize == 0, 0))
!   /* Something wrong.
!  XXX Do we have to set `errno' to something which mbsrtows hasn't
!  already done?  */
!   return -1;
!   assert (mbsinit (&ps));
!   totsize = patsize + strsize;
!   if (__builtin_expect (! (patsize <= totsize
!  && totsize <= SIZE_MAX / sizeof (wchar_t)),
!   0))
{
! errno = ENOMEM;
! return -1;
!   }
! 
!   /* Allocate room for the wide characters.  */
!   if (__builtin_expect (totsize < ALLOCA_LIMIT, 1))
!   wpattern = (wchar_t *) alloca (totsize * sizeof (wchar_t));
!   else
!   {
! wpattern = malloc (totsize * sizeof (wchar_t));
! if (__builtin_expect (! wpattern, 0))
{
! errno = ENOMEM;
! return -1;
}
}
-   wstring = wpattern + patsize;
- 
-   /* Convert the strings into wide characters.  */
-   mbsrtowcs (wpattern, &pattern, patsize, &ps);
-   assert (mbsinit (&ps));
-   mbsrtowcs (wstring, &string, strsize, &ps);
- 
-   res = internal_fnwmatch (wpattern, wstring, wstring + strsize - 1,
-  flags & FNM_PERIOD, flags);
- 
-   if (__builtin_expect (! (totsize < ALLOCA_LIMIT), 0))
-   free (wpattern);
-   return res;
  }
  # endif /* HANDLE_MULTIBYTE */
  
return internal_fnmatch (pattern, string, string + strlen (string),
--- 319,369 ----
 wide characters.  */
memset (&ps, '\0', sizeof (ps));
patsize = mbsrtowcs (NULL, &pattern, 0, &ps) + 1;
!   if (__builtin_expect (patsize != 0, 1))
{
! assert (mbsinit (&ps));
! strsize = mbsrtowcs (NULL, &string, 0, &ps) + 1;
! if (__builtin_expect (strsize != 0, 1))
{
! assert (mbsinit (&ps));
! totsize = patsize + strsize;
! if (__builtin_expect (! (patsize <= totsize
!  && totsize <= SIZE_MAX / sizeof (wchar_t)),
!   0))
!   {
! errno = ENOMEM;
! return -1;
!   }
! 
! /* Allocate room for the wide characters.  */
! if (__builtin_expect (totsize < ALLOCA_LIMIT, 1))
!   wpattern = (wchar_t *) alloca (totsize * sizeof (wchar_t));
! else
!   {
! wpattern = malloc (totsize * sizeof (wchar_t));
! if (__builtin_expect (! wpattern, 0))
!   {
! errno = ENOMEM;
! return -1;
!   }
!   }
! wstring = wpattern + patsize;
! 
! /* Convert the strings into wide characters.  */
! mbsrtowcs (wpattern, &pattern, patsize, &ps);
! assert (mbsinit (&ps));
! mbsrtowcs (wstring, &string, strsize, &ps);
! 
! res = internal_fnwmatch (wpattern, wstring, wstring + strsize - 1,
!  flags & FNM_PERIOD, flags);
! 
! if (__builtin_expect (! (totsize < ALLOCA_LIMIT), 0))
!   free (wpattern);
! return res;
}
}
  }
+ 
  # endif /* HANDLE_MULTIBYTE */
  
return internal_fnmatch (pattern, string, string + strlen (string),
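
In outline, the patch seems to stop treating a failed mbsrtowcs call
as an error and lets control fall through to the byte-wise
internal_fnmatch call instead.  A minimal sketch of that decision,
separate from the patch itself; converts_cleanly, choose_matcher and
the matcher enum are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Return 1 if S converts to wide characters in the current locale,
   0 if it contains an invalid multibyte sequence.  */
int
converts_cleanly (const char *s)
{
  mbstate_t ps;
  const char *p = s;
  size_t n;

  assert (s != NULL);
  memset (&ps, '\0', sizeof ps);
  /* With a null destination mbsrtowcs only counts characters; it
     returns (size_t) -1 when it meets an invalid sequence.  */
  n = mbsrtowcs (NULL, &p, 0, &ps);
  return n != (size_t) -1;
}

/* Hypothetical stand-ins for the two matching strategies.  */
enum matcher { MATCH_WIDE, MATCH_BYTES };

/* Use the wide-character matcher only when both arguments convert;
   otherwise fall back to byte-wise matching rather than failing.  */
enum matcher
choose_matcher (const char *pattern, const char *string)
{
  if (converts_cleanly (pattern) && converts_cleanly (string))
    return MATCH_WIDE;
  return MATCH_BYTES;
}
```

The test n != (size_t) -1 is equivalent to the patch's own check,
which adds 1 to the return value and compares the sum against 0.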



