Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)

2002-12-03 Thread Ruslan Ermilov
On Tue, Dec 03, 2002 at 11:09:20AM +0300, Andrey A. Chernov wrote:
> On Wed, Nov 20, 2002 at 14:54:12 +0200, Ruslan Ermilov wrote:
> 
> > Index: b.c
> > ===
> > RCS file: /home/ncvs/src/contrib/one-true-awk/b.c,v
> > retrieving revision 1.1.1.2
> > diff -u -p -r1.1.1.2 b.c
> 
> David, this variant is nice enough. Please commit.
> 
One needs to catch up to his email.  :-)

A new version of one-true-awk was released which includes
these fixes.


Cheers,
-- 
Ruslan Ermilov  Sysadmin and DBA,
[EMAIL PROTECTED]   Sunbay Software AG,
[EMAIL PROTECTED]  FreeBSD committer,
+380.652.512.251  Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age





Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)

2002-12-03 Thread Andrey A. Chernov
On Wed, Nov 20, 2002 at 14:54:12 +0200, Ruslan Ermilov wrote:

> Index: b.c
> ===
> RCS file: /home/ncvs/src/contrib/one-true-awk/b.c,v
> retrieving revision 1.1.1.2
> diff -u -p -r1.1.1.2 b.c

David, this variant is nice enough. Please commit.

-- 
Andrey A. Chernov
http://ache.pp.ru/





Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)

2002-11-20 Thread Ruslan Ermilov
On Wed, Nov 20, 2002 at 02:27:53PM +1100, Tim Robbins wrote:
> On Wed, Nov 20, 2002 at 04:38:38AM +0300, Andrey A. Chernov wrote:
> 
> > On Tue, Nov 19, 2002 at 14:52:02 +0200, Ruslan Ermilov wrote:
> > > It seems that this patch has never been committed.  This is a critical
> > > bug that should be fixed before 5.0-RELEASE is out.
> > 
> > > I agree. There is still no locale support, and I never saw that patch.
> 
> This patch seems to work, I used the logic from regcomp.c in libc.
> Long lines make it ugly, but it was like that when I got here ;)

> Index: src/usr.bin/awk/Makefile
> ===
> RCS file: /x/freebsd/src/usr.bin/awk/Makefile,v
> retrieving revision 1.9
> diff -u -r1.9 Makefile
> --- src/usr.bin/awk/Makefile  10 May 2002 20:36:21 -  1.9
> +++ src/usr.bin/awk/Makefile  20 Nov 2002 03:13:50 -
> @@ -6,7 +6,7 @@
>  PROG=nawk
>  SRCS=awkgram.y b.c lex.c lib.c main.c parse.c proctab.c run.c tran.c ytab.h
>  
> -CFLAGS+= -I. -I${AWKSRC}
> +CFLAGS+= -I. -I${AWKSRC} -I${.CURDIR}/../../lib/libc/locale
>  
Ouch.

>  DPADD=   ${LIBM}
>  LDADD=   -lm
> Index: src/contrib/one-true-awk/b.c
> ===
> RCS file: /x/freebsd/src/contrib/one-true-awk/b.c,v
> retrieving revision 1.1.1.2
> diff -u -r1.1.1.2 b.c
> --- src/contrib/one-true-awk/b.c  19 Feb 2002 09:35:24 -  1.1.1.2
> +++ src/contrib/one-true-awk/b.c  20 Nov 2002 03:16:10 -
> @@ -32,6 +32,7 @@
>  #include 
>  #include "awk.h"
>  #include "ytab.h"
> +#include "collate.h"
>  
>  #define  HAT (NCHARS-2)  /* matches ^ in regular expr */
>   /* NCHARS is 2**n */
> @@ -284,7 +285,7 @@
>  
>  char *cclenter(char *argp)   /* add a character class */
>  {
> - int i, c, c2;
> + int i, j, c, c2;
>   uschar *p = (uschar *) argp;
>   uschar *op, *bp;
>   static uschar *buf = 0;
> @@ -308,12 +309,24 @@
>   i--;
>   continue;
>   }
> - while (c < c2) {
> - if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
> - FATAL("out of space for character class [%.10s...] 2", p);
> - *bp++ = ++c;
> - i++;
> - }
> + if (__collate_load_error) {
> + while (c < c2) {
> + if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
> + FATAL("out of space for character class [%.10s...] 2", p);
> + *bp++ = ++c;
> + i++;
> + }
> + } else {
> + for (j = CHAR_MIN; j <= CHAR_MAX; j++) {
> + if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
> + FATAL("out of space for character class [%.10s...] 2", p);
> + if (__collate_range_cmp(c, j) <= 0
> + && __collate_range_cmp(j, c2) <= 0) {
> + *bp++ = j;
> + i++;
> + }
> + }
> +}
>   continue;
>   }
>   }

There are a number of problems here:

1.  The "empty range" check preceding this block should be made
locale-aware too.

2.  CHAR_MAX evaluates to 127 here.

Here's my version of the above fix, plus the [[:class:]] fixes Andrey mentioned.
I gave it only light testing.

collate_range_cmp() was stolen from the old awk(1).
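
To make the two points above concrete, here is a rough sketch of a
collation-aware range expansion. It is not the body of the attached patch:
range_cmp() and expand_range() are hypothetical names, and the approach
(strcoll() on one-character strings, with ties broken by byte value) is an
assumption about how such a comparison can be written. The loop walks
unsigned byte values up to UCHAR_MAX instead of CHAR_MIN..CHAR_MAX, and the
empty-range check uses the same comparison as the membership test.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare two single-byte characters by the current LC_COLLATE order.
 * Ties under strcoll() are broken by byte value so the order is total.
 * (Hypothetical helper, not the function from the patch.)
 */
static int range_cmp(int a, int b)
{
	char sa[2] = { (char)a, '\0' };
	char sb[2] = { (char)b, '\0' };
	int r;

	if ((unsigned char)a == (unsigned char)b)
		return 0;
	r = strcoll(sa, sb);
	if (r == 0)
		r = (unsigned char)a - (unsigned char)b;
	return r;
}

/*
 * Store every byte that collates between lo and hi (inclusive) into
 * out, which must have room for UCHAR_MAX + 1 bytes, and NUL-terminate
 * it.  Returns the number of members; an empty range yields 0.
 */
static size_t expand_range(int lo, int hi, unsigned char *out)
{
	size_t n = 0;
	int j;

	assert(out != NULL);
	if (range_cmp(lo, hi) > 0) {	/* locale-aware empty-range check */
		out[0] = '\0';
		return 0;
	}
	/* Start at 1: a NUL byte cannot be stored in a C string. */
	for (j = 1; j <= UCHAR_MAX; j++)
		if (range_cmp(lo, j) <= 0 && range_cmp(j, hi) <= 0)
			out[n++] = (unsigned char)j;
	out[n] = '\0';
	return n;
}
```

A caller would run setlocale(LC_ALL, "") once at startup and then call
expand_range('A', 'Z', buf); which bytes end up in buf depends on the
collation data of the active locale.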


Cheers,
-- 
Ruslan Ermilov  Sysadmin and DBA,
[EMAIL PROTECTED]   Sunbay Software AG,
[EMAIL PROTECTED]  FreeBSD committer,
+380.652.512.251  Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age

Index: b.c
===
RCS file: /home/ncvs/src/contrib/one-true-awk/b.c,v
retrieving revision 1.1.1.2
diff -u -p -r1.1.1.2 b.c
--- b.c 19 Feb 2002 09:35:24 -  1.1.1.2
+++ b.c 20 Nov 2002 12:51:10 -
@@ -282,9 +282,25 @@ int quoted(char **pp)  /* pick up next th
return c;
 }
 
+static int collate_range_cmp (a, b)
+   int a, b;
+{
+   int r;
+   static char s[2][2];
+
+

Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)

2002-11-19 Thread Andrey A. Chernov
On Wed, Nov 20, 2002 at 14:27:53 +1100, Tim Robbins wrote:
> On Wed, Nov 20, 2002 at 04:38:38AM +0300, Andrey A. Chernov wrote:
> 
> > On Tue, Nov 19, 2002 at 14:52:02 +0200, Ruslan Ermilov wrote:
> > > It seems that this patch has never been committed.  This is a critical
> > > bug that should be fixed before 5.0-RELEASE is out.
> > 
> > > I agree. There is still no locale support, and I never saw that patch.
> 
> This patch seems to work, I used the logic from regcomp.c in libc.
> Long lines make it ugly, but it was like that when I got here ;)

Looks good, but it is not enough. Please look in b.c to see how oddly
character classes, e.g. [:alpha:], are implemented there; this stuff must
be rewritten too.
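
One possible direction, sketched here only as an illustration and not taken
from b.c or from any posted patch: build each class from the <ctype.h>
predicates, so that membership follows the current LC_CTYPE. The table
cclasses[] and the function expand_class() are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: class name -> <ctype.h> predicate. */
static const struct cclass {
	const char *name;
	int (*pred)(int);
} cclasses[] = {
	{ "alnum", isalnum }, { "alpha", isalpha }, { "cntrl", iscntrl },
	{ "digit", isdigit }, { "graph", isgraph }, { "lower", islower },
	{ "print", isprint }, { "punct", ispunct }, { "space", isspace },
	{ "upper", isupper }, { "xdigit", isxdigit },
};

/*
 * Store every non-NUL byte belonging to the named class into out
 * (room for UCHAR_MAX + 1 bytes), NUL-terminate it, and return the
 * count, or -1 if the class name is unknown.
 */
static int expand_class(const char *name, unsigned char *out)
{
	size_t k;
	int j, n = 0;

	assert(name != NULL && out != NULL);
	for (k = 0; k < sizeof(cclasses) / sizeof(cclasses[0]); k++) {
		if (strcmp(cclasses[k].name, name) != 0)
			continue;
		/* Predicates take unsigned char values, so iterate 1..UCHAR_MAX. */
		for (j = 1; j <= UCHAR_MAX; j++)
			if (cclasses[k].pred(j))
				out[n++] = (unsigned char)j;
		out[n] = '\0';
		return n;
	}
	return -1;
}
```

After setlocale(LC_ALL, ""), expand_class("alpha", buf) would pick up
whatever 8-bit letters the locale defines, without a hard-coded list.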

-- 
Andrey A. Chernov
http://ache.pp.ru/




Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)

2002-11-19 Thread Tim Robbins
On Wed, Nov 20, 2002 at 04:38:38AM +0300, Andrey A. Chernov wrote:

> On Tue, Nov 19, 2002 at 14:52:02 +0200, Ruslan Ermilov wrote:
> > It seems that this patch has never been committed.  This is a critical
> > bug that should be fixed before 5.0-RELEASE is out.
> 
> I agree. There is still no locale support, and I never saw that patch.

This patch seems to work, I used the logic from regcomp.c in libc.
Long lines make it ugly, but it was like that when I got here ;)


Tim


Index: src/usr.bin/awk/Makefile
===
RCS file: /x/freebsd/src/usr.bin/awk/Makefile,v
retrieving revision 1.9
diff -u -r1.9 Makefile
--- src/usr.bin/awk/Makefile  10 May 2002 20:36:21 -  1.9
+++ src/usr.bin/awk/Makefile  20 Nov 2002 03:13:50 -
@@ -6,7 +6,7 @@
 PROG=  nawk
 SRCS=  awkgram.y b.c lex.c lib.c main.c parse.c proctab.c run.c tran.c ytab.h
 
-CFLAGS+= -I. -I${AWKSRC}
+CFLAGS+= -I. -I${AWKSRC} -I${.CURDIR}/../../lib/libc/locale
 
 DPADD= ${LIBM}
 LDADD= -lm
Index: src/contrib/one-true-awk/b.c
===
RCS file: /x/freebsd/src/contrib/one-true-awk/b.c,v
retrieving revision 1.1.1.2
diff -u -r1.1.1.2 b.c
--- src/contrib/one-true-awk/b.c  19 Feb 2002 09:35:24 -  1.1.1.2
+++ src/contrib/one-true-awk/b.c  20 Nov 2002 03:16:10 -
@@ -32,6 +32,7 @@
 #include 
 #include "awk.h"
 #include "ytab.h"
+#include "collate.h"
 
 #define HAT (NCHARS-2)  /* matches ^ in regular expr */
  /* NCHARS is 2**n */
@@ -284,7 +285,7 @@
 
 char *cclenter(char *argp) /* add a character class */
 {
-   int i, c, c2;
+   int i, j, c, c2;
uschar *p = (uschar *) argp;
uschar *op, *bp;
static uschar *buf = 0;
@@ -308,12 +309,24 @@
i--;
continue;
}
-   while (c < c2) {
-   if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
-   FATAL("out of space for character class [%.10s...] 2", p);
-   *bp++ = ++c;
-   i++;
-   }
+   if (__collate_load_error) {
+   while (c < c2) {
+   if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
+   FATAL("out of space for character class [%.10s...] 2", p);
+   *bp++ = ++c;
+   i++;
+   }
+   } else {
+   for (j = CHAR_MIN; j <= CHAR_MAX; j++) {
+   if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, 0))
+   FATAL("out of space for character class [%.10s...] 2", p);
+   if (__collate_range_cmp(c, j) <= 0
+   && __collate_range_cmp(j, c2) <= 0) {
+   *bp++ = j;
+   i++;
+   }
+   }
+}
continue;
}
}
Index: src/contrib/one-true-awk/main.c
===
RCS file: /x/freebsd/src/contrib/one-true-awk/main.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 main.c
--- src/contrib/one-true-awk/main.c 16 Mar 2002 16:50:56 -  1.1.1.3
+++ src/contrib/one-true-awk/main.c 20 Nov 2002 03:03:38 -
@@ -27,6 +27,7 @@
 #define DEBUG
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -55,6 +56,7 @@
char *fs = NULL;
 
cmdname = argv[0];
+   setlocale(LC_ALL, "");
if (argc == 1) {
fprintf(stderr, "Usage: %s [-f programfile | 'program'] [-Ffieldsep] [-v var=value] [files]\n", cmdname);
exit(1);
Index: src/contrib/one-true-awk/run.c
===
RCS file: /x/freebsd/src/contrib/one-true-awk/run.c,v
retrieving revision 1.1.1.2
diff -u -r1.1.1.2 run.c
--- src/contrib/one-true-awk/run.c  19 Feb 2002 09:35:25 -  1.1.1.2
+++ src/contrib/one-true-awk/run.c  20 Nov 2002 03:02:29 -
@@ -1504,11 +1504,11 @@
if (t == FTOUPPER) {
for (p = buf; *p; p++)
if (islower((us

Re: awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)

2002-11-19 Thread Andrey A. Chernov
On Tue, Nov 19, 2002 at 14:52:02 +0200, Ruslan Ermilov wrote:
> It seems that this patch has never been committed.  This is a critical
> bug that should be fixed before 5.0-RELEASE is out.

I agree. There is still no locale support, and I never saw that patch.

-- 
Andrey A. Chernov
http://ache.pp.ru/





awk(1) is locale unaware (was: Re: buildworld breakage during "make depend" at usr.bin/kdump)

2002-11-19 Thread Ruslan Ermilov
On Thu, Nov 01, 2001 at 05:58:08PM -0800, David O'Brien wrote:
> On Fri, Nov 02, 2001 at 04:44:12AM +0300, Andrey A. Chernov wrote:
> > The next bad thing discovered about the new awk, just from looking at the
> > source code: it does not support locales (including collation in regexp
> > ranges, of course). We took a great step backward by switching to it.
> 
> I have a patch for that.
> 
It seems that this patch has never been committed.  This is a critical
bug that should be fixed before 5.0-RELEASE is out.

/usr/bin/env LC_ALL=cs_CZ.ISO8859-2 sh -c "echo a | grep '[A-Z]'"
/usr/bin/env LC_ALL=cs_CZ.ISO8859-2 sh -c "echo a | awk '/[A-Z]/ {print}'"

On a related note, fixing this bug would make PR misc/45460 apply to
5.0-CURRENT as well.
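
For anyone who wants to see what libc's own regex code does with the same
range, the following is a small sketch using POSIX regcomp()/regexec(). The
helper name re_matches() is made up for this example, and no result is
claimed for cs_CZ.ISO8859-2, since that depends on the installed locale data.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if text matches the extended regular expression pattern,
 * 0 if it does not, and -1 if the pattern fails to compile.
 * The result depends on whatever locale the caller has set.
 */
static int re_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

Calling setlocale(LC_ALL, "") and then re_matches("[A-Z]", "a") with
LC_ALL=cs_CZ.ISO8859-2 in the environment gives a reference point to compare
against the grep and awk results above.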


Cheers,
-- 
Ruslan Ermilov  Sysadmin and DBA,
[EMAIL PROTECTED]   Sunbay Software AG,
[EMAIL PROTECTED]  FreeBSD committer,
+380.652.512.251  Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age


