Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)
On Tue, Dec 03, 2002 at 11:09:20AM +0300, Andrey A. Chernov wrote:
> On Wed, Nov 20, 2002 at 14:54:12 +0200, Ruslan Ermilov wrote:
>
> > Index: b.c
> > ===
> > RCS file: /home/ncvs/src/contrib/one-true-awk/b.c,v
> > retrieving revision 1.1.1.2
> > diff -u -p -r1.1.1.2 b.c
>
> David, this variant is nice enough. Please, commit.
>
One needs to catch up to his email. :-)

A new version of one-true-awk was released which includes these fixes.

Cheers,
-- 
Ruslan Ermilov          Sysadmin and DBA,
[EMAIL PROTECTED]       Sunbay Software AG,
[EMAIL PROTECTED]       FreeBSD committer,
+380.652.512.251        Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age
Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)
On Wed, Nov 20, 2002 at 14:54:12 +0200, Ruslan Ermilov wrote:
> Index: b.c
> ===
> RCS file: /home/ncvs/src/contrib/one-true-awk/b.c,v
> retrieving revision 1.1.1.2
> diff -u -p -r1.1.1.2 b.c

David, this variant is nice enough. Please, commit.

-- 
Andrey A. Chernov
http://ache.pp.ru/
Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)
On Wed, Nov 20, 2002 at 02:27:53PM +1100, Tim Robbins wrote:
> On Wed, Nov 20, 2002 at 04:38:38AM +0300, Andrey A. Chernov wrote:
>
> > On Tue, Nov 19, 2002 at 14:52:02 +0200, Ruslan Ermilov wrote:
> > > It seems that this patch has never been committed. This is a critical
> > > bug that should be fixed before 5.0-RELEASE is out.
> >
> > I agree. There is no locale yet and I never see that patch.
>
> This patch seems to work, I used the logic from regcomp.c in libc.
> Long lines make it ugly, but it was like that when I got here ;)
>
> Index: src/usr.bin/awk/Makefile
> ===
> RCS file: /x/freebsd/src/usr.bin/awk/Makefile,v
> retrieving revision 1.9
> diff -u -r1.9 Makefile
> --- src/usr.bin/awk/Makefile 10 May 2002 20:36:21 - 1.9
> +++ src/usr.bin/awk/Makefile 20 Nov 2002 03:13:50 -
> @@ -6,7 +6,7 @@
>  PROG=nawk
>  SRCS=awkgram.y b.c lex.c lib.c main.c parse.c proctab.c run.c tran.c ytab.h
>
> -CFLAGS+= -I. -I${AWKSRC}
> +CFLAGS+= -I. -I${AWKSRC} -I${.CURDIR}/../../lib/libc/locale
>
Ouch.

>  DPADD= ${LIBM}
>  LDADD= -lm
> Index: src/contrib/one-true-awk/b.c
> ===
> RCS file: /x/freebsd/src/contrib/one-true-awk/b.c,v
> retrieving revision 1.1.1.2
> diff -u -r1.1.1.2 b.c
> --- src/contrib/one-true-awk/b.c 19 Feb 2002 09:35:24 - 1.1.1.2
> +++ src/contrib/one-true-awk/b.c 20 Nov 2002 03:16:10 -
> @@ -32,6 +32,7 @@
>  #include
>  #include "awk.h"
>  #include "ytab.h"
> +#include "collate.h"
>
>  #define HAT (NCHARS-2) /* matches ^ in regular expr */
>  /* NCHARS is 2**n */
> @@ -284,7 +285,7 @@
>
>  char *cclenter(char *argp) /* add a character class */
>  {
> -    int i, c, c2;
> +    int i, j, c, c2;
>      uschar *p = (uschar *) argp;
>      uschar *op, *bp;
>      static uschar *buf = 0;
> @@ -308,12 +309,24 @@
>          i--;
>          continue;
>      }
> -    while (c < c2) {
> -        if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
> -            FATAL("out of space for character class [%.10s...] 2", p);
> -        *bp++ = ++c;
> -        i++;
> -    }
> +    if (__collate_load_error) {
> +        while (c < c2) {
> +            if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
> +                FATAL("out of space for character class [%.10s...] 2", p);
> +            *bp++ = ++c;
> +            i++;
> +        }
> +    } else {
> +        for (j = CHAR_MIN; j <= CHAR_MAX; j++) {
> +            if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
> +                FATAL("out of space for character class [%.10s...] 2", p);
> +            if (__collate_range_cmp(c, j) <= 0
> +                && __collate_range_cmp(j, c2) <= 0) {
> +                *bp++ = j;
> +                i++;
> +            }
> +        }
> +    }
>      continue;
>      }
>  }

There are a number of problems here:

1. The "empty range" check preceding this block should be made
   locale-aware too.

2. CHAR_MAX evaluates to 127 here.

Here's my version of the above fix plus [[:class:]] fixes Andrey
mentioned. I gave it only light testing. The collate_range_cmp()
was stolen from the old awk(1).

Cheers,
-- 
Ruslan Ermilov

Index: b.c
===
RCS file: /home/ncvs/src/contrib/one-true-awk/b.c,v
retrieving revision 1.1.1.2
diff -u -p -r1.1.1.2 b.c
--- b.c 19 Feb 2002 09:35:24 - 1.1.1.2
+++ b.c 20 Nov 2002 12:51:10 -
@@ -282,9 +282,25 @@ int quoted(char **pp) /* pick up next th
     return c;
 }

+static int collate_range_cmp (a, b)
+    int a, b;
+{
+    int r;
+    static char s[2][2];
+
+
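Both of Ruslan's points can be illustrated with a small sketch. It is not the patch from this message (which is cut off above) and not the code from the old awk(1); the names sketch_range_cmp and sketch_expand_range are hypothetical. It assumes a single-byte locale, compares characters with the standard strcoll(3), breaks ties by byte value (my assumption, so that the ordering is total), and walks 0 to UCHAR_MAX instead of CHAR_MIN to CHAR_MAX, because CHAR_MAX is 127 where plain char is signed and such a loop never reaches the values 128 to 255 that a uschar can hold.

```c
#include <limits.h>
#include <string.h>

/*
 * Hypothetical helper: order two single-byte characters by the current
 * locale's collation (strcoll), using the byte value only to break ties.
 */
int sketch_range_cmp(int a, int b)
{
    char s[2][2] = { { 0, 0 }, { 0, 0 } };
    int r;

    s[0][0] = (char)a;
    s[1][0] = (char)b;
    r = strcoll(s[0], s[1]);
    if (r == 0)
        r = (unsigned char)a - (unsigned char)b;
    return r;
}

/*
 * Hypothetical helper: expand the bracket range lo-hi into out[], which
 * must hold at least UCHAR_MAX + 1 bytes. Returns the number of members;
 * an empty range (lo collates after hi) yields 0.
 */
size_t sketch_expand_range(int lo, int hi, unsigned char *out)
{
    size_t n = 0;
    int j;

    /* Locale-aware "empty range" check (Ruslan's point 1). */
    if (sketch_range_cmp(lo, hi) > 0)
        return 0;

    /* Visit every unsigned byte value, not CHAR_MIN..CHAR_MAX (point 2). */
    for (j = 0; j <= UCHAR_MAX; j++)
        if (sketch_range_cmp(lo, j) <= 0 && sketch_range_cmp(j, hi) <= 0)
            out[n++] = (unsigned char)j;
    return n;
}
```

Without a prior setlocale(LC_ALL, "") the program stays in the "C" locale and the expansion is plain byte order; which characters a range picks up in any other locale depends on that locale's collation data.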
Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)
On Wed, Nov 20, 2002 at 14:27:53 +1100, Tim Robbins wrote:
> On Wed, Nov 20, 2002 at 04:38:38AM +0300, Andrey A. Chernov wrote:
>
> > On Tue, Nov 19, 2002 at 14:52:02 +0200, Ruslan Ermilov wrote:
> > > It seems that this patch has never been committed. This is a critical
> > > bug that should be fixed before 5.0-RELEASE is out.
> >
> > I agree. There is no locale yet and I never see that patch.
>
> This patch seems to work, I used the logic from regcomp.c in libc.
> Long lines make it ugly, but it was like that when I got here ;)

Looks good, but it is not enough. Please look in b.c to see how weird
character classes, e.g. [:alpha:], are implemented there; this stuff
must be rewritten too.

-- 
Andrey A. Chernov
http://ache.pp.ru/
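One way to make a class such as [:alpha:] follow the locale is to derive its members from the <ctype.h> predicates at run time. The sketch below is only an illustration of that idea under the assumption of a single-byte locale; sketch_fill_class is a hypothetical name, and nothing here is taken from b.c.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: collect every byte value for which pred() is true
 * in the current locale. out[] must hold at least UCHAR_MAX bytes.
 * NUL is skipped on the assumption that the caller stores the class as a
 * NUL-terminated string.
 */
size_t sketch_fill_class(int (*pred)(int), unsigned char *out)
{
    size_t n = 0;
    int c;

    for (c = 1; c <= UCHAR_MAX; c++)
        if (pred(c))
            out[n++] = (unsigned char)c;
    return n;
}
```

Calling sketch_fill_class(isalpha, buf) after setlocale(LC_ALL, "") would pick up whatever the active locale classifies as alphabetic, including bytes above 127.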
Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)
On Wed, Nov 20, 2002 at 04:38:38AM +0300, Andrey A. Chernov wrote:
> On Tue, Nov 19, 2002 at 14:52:02 +0200, Ruslan Ermilov wrote:
> > It seems that this patch has never been committed. This is a critical
> > bug that should be fixed before 5.0-RELEASE is out.
>
> I agree. There is no locale yet and I never see that patch.

This patch seems to work, I used the logic from regcomp.c in libc.
Long lines make it ugly, but it was like that when I got here ;)

Tim

Index: src/usr.bin/awk/Makefile
===
RCS file: /x/freebsd/src/usr.bin/awk/Makefile,v
retrieving revision 1.9
diff -u -r1.9 Makefile
--- src/usr.bin/awk/Makefile 10 May 2002 20:36:21 - 1.9
+++ src/usr.bin/awk/Makefile 20 Nov 2002 03:13:50 -
@@ -6,7 +6,7 @@
 PROG= nawk
 SRCS= awkgram.y b.c lex.c lib.c main.c parse.c proctab.c run.c tran.c ytab.h

-CFLAGS+= -I. -I${AWKSRC}
+CFLAGS+= -I. -I${AWKSRC} -I${.CURDIR}/../../lib/libc/locale
 DPADD= ${LIBM}
 LDADD= -lm
Index: src/contrib/one-true-awk/b.c
===
RCS file: /x/freebsd/src/contrib/one-true-awk/b.c,v
retrieving revision 1.1.1.2
diff -u -r1.1.1.2 b.c
--- src/contrib/one-true-awk/b.c 19 Feb 2002 09:35:24 - 1.1.1.2
+++ src/contrib/one-true-awk/b.c 20 Nov 2002 03:16:10 -
@@ -32,6 +32,7 @@
 #include
 #include "awk.h"
 #include "ytab.h"
+#include "collate.h"

 #define HAT (NCHARS-2) /* matches ^ in regular expr */
 /* NCHARS is 2**n */
@@ -284,7 +285,7 @@

 char *cclenter(char *argp) /* add a character class */
 {
-    int i, c, c2;
+    int i, j, c, c2;
     uschar *p = (uschar *) argp;
     uschar *op, *bp;
     static uschar *buf = 0;
@@ -308,12 +309,24 @@
         i--;
         continue;
     }
-    while (c < c2) {
-        if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
-            FATAL("out of space for character class [%.10s...] 2", p);
-        *bp++ = ++c;
-        i++;
-    }
+    if (__collate_load_error) {
+        while (c < c2) {
+            if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
+                FATAL("out of space for character class [%.10s...] 2", p);
+            *bp++ = ++c;
+            i++;
+        }
+    } else {
+        for (j = CHAR_MIN; j <= CHAR_MAX; j++) {
+            if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
+                FATAL("out of space for character class [%.10s...] 2", p);
+            if (__collate_range_cmp(c, j) <= 0
+                && __collate_range_cmp(j, c2) <= 0) {
+                *bp++ = j;
+                i++;
+            }
+        }
+    }
     continue;
     }
 }
Index: src/contrib/one-true-awk/main.c
===
RCS file: /x/freebsd/src/contrib/one-true-awk/main.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 main.c
--- src/contrib/one-true-awk/main.c 16 Mar 2002 16:50:56 - 1.1.1.3
+++ src/contrib/one-true-awk/main.c 20 Nov 2002 03:03:38 -
@@ -27,6 +27,7 @@
 #define DEBUG
 #include
 #include
+#include
 #include
 #include
 #include
@@ -55,6 +56,7 @@
     char *fs = NULL;

     cmdname = argv[0];
+    setlocale(LC_ALL, "");
     if (argc == 1) {
         fprintf(stderr, "Usage: %s [-f programfile | 'program'] [-Ffieldsep] [-v var=value] [files]\n", cmdname);
         exit(1);
Index: src/contrib/one-true-awk/run.c
===
RCS file: /x/freebsd/src/contrib/one-true-awk/run.c,v
retrieving revision 1.1.1.2
diff -u -r1.1.1.2 run.c
--- src/contrib/one-true-awk/run.c 19 Feb 2002 09:35:25 - 1.1.1.2
+++ src/contrib/one-true-awk/run.c 20 Nov 2002 03:02:29 -
@@ -1504,11 +1504,11 @@
     if (t == FTOUPPER) {
         for (p = buf; *p; p++)
             if (islower((us
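The run.c hunk is cut off in the middle of the FTOUPPER loop, so the following is not a reconstruction of it. It only sketches a general point that matters once setlocale(LC_ALL, "") is in effect: the <ctype.h> functions expect an argument representable as unsigned char (or EOF), so bytes above 127 held in a plain char should be cast before classification. The name sketch_upcase is hypothetical.

```c
#include <ctype.h>

/*
 * Hypothetical helper: uppercase a NUL-terminated string in place using
 * the current locale's single-byte case mapping. The casts keep bytes
 * above 127 from being passed to islower()/toupper() as negative ints.
 */
char *sketch_upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        if (islower((unsigned char)*p))
            *p = (char)toupper((unsigned char)*p);
    return s;
}
```

In the default "C" locale only a-z are affected; under a single-byte locale selected through setlocale(), accented lowercase letters would be mapped as that locale defines.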
Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)
On Tue, Nov 19, 2002 at 14:52:02 +0200, Ruslan Ermilov wrote:
> It seems that this patch has never been committed. This is a critical
> bug that should be fixed before 5.0-RELEASE is out.

I agree. There is no locale yet and I never see that patch.

-- 
Andrey A. Chernov
http://ache.pp.ru/
awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)
On Thu, Nov 01, 2001 at 05:58:08PM -0800, David O'Brien wrote:
> On Fri, Nov 02, 2001 at 04:44:12AM +0300, Andrey A. Chernov wrote:
> > Next bad thing discovered about new awk just looking at source code: it
> > not support locale (collating in regexp ranges too, of course). We just
> > make great backward step switching to it.
>
> I have a patch for that.
>
It seems that this patch has never been committed. This is a critical
bug that should be fixed before 5.0-RELEASE is out.

/usr/bin/env LC_ALL=cs_CZ.ISO8859-2 sh -c "echo a | grep '[A-Z]'"
/usr/bin/env LC_ALL=cs_CZ.ISO8859-2 sh -c "echo a | awk '/[A-Z]/ {print}'"

On a related note, fixing this bug would extrapolate PR misc/45460
to 5.0-CURRENT as well.

Cheers,
-- 
Ruslan Ermilov
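The two commands above compare how grep and awk treat the range [A-Z] under a non-default locale. The same question can be asked of the C library directly through POSIX regcomp(3) and regexec(3). The helper below is a hypothetical sketch for experimenting with that; it makes no claim about what either tool does internally.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if text matches the POSIX extended
 * regular expression pattern under the current locale, 0 if it does
 * not, and -1 if the pattern fails to compile.
 */
int sketch_regex_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

To mirror the commands, a caller would run setlocale(LC_ALL, "") first with LC_ALL=cs_CZ.ISO8859-2 in the environment and then test sketch_regex_matches("[A-Z]", "a"). The answer in that locale depends on the platform's collation data and regex implementation, so none is asserted here.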