bug in awk implementation?

2002-07-15 Thread Gordon Tetlow

I was parsing ldif format with awk (formerly gawk) and found a buglet in 
awk with the following script:

BEGIN {
  RS="\n\n";
  FS="(: |\n)";
}

{ print $2; }

Fed the following output:

dn: Some Such DN
gidNumber: 1000
uidNumber: 1080

dn: Some Other DN
gidNumber: 1000
uidNumber: 1405

This is what I get:

one-true-awk:

Some Such DN
1000
1080

Some Other DN
1000
1405

gawk:

Some Such DN
Some Other DN


So, this seems to be a bug in the one-true-awk implementation. Any ideas 
on how to fix this?

-gordon


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: bug in awk implementation?

2002-07-15 Thread Robert Drehmel

On Mon, Jul 15, 2002 at 08:20:58AM -0700, Gordon Tetlow wrote:
> I was parsing ldif format with awk (formerly gawk) and found a buglet in 
> awk with the following script:
> 
> BEGIN {
>   RS="\n\n";
>   FS="(: |\n)";
> }
> 
> { print $2; }
> 
> Fed the following output:
> 
> dn: Some Such DN
> gidNumber: 1000
> uidNumber: 1080
> 
> dn: Some Other DN
> gidNumber: 1000
> uidNumber: 1405
> 
> This is what I get:
> 
> one-true-awk:
> 
> Some Such DN
> 1000
> 1080
> 
> Some Other DN
> 1000
> 1405

Ok.

> 
> gawk:
> 
> Some Such DN
> Some Other DN
> 

Oh.

> So, this seems to be a bug in the one-true-awk implementation. Any ideas 
> on how to fix this?

To me, this seems like a bug in 'gawk'.  The AWK language uses
only the first character in RS as the record separator, to my
knowledge.

ciao,
-robert




Re: bug in awk implementation?

2002-07-15 Thread Gordon Tetlow

On Mon, 15 Jul 2002, Robert Drehmel wrote:

> > So, this seems to be a bug in the one-true-awk implementation. Any ideas 
> > on how to fix this?
> 
> To me, this seems like a bug in 'gawk'.  The AWK language uses
> only the first character in RS as the record separator, to my
> knowledge.

Ah, okay, there is a distinct lack of documentation to that effect. I have 
figured out that I can just set RS="" and that does the same thing. I 
suppose it would be helpful to have an awk book around. =)
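For reference, here is a minimal sketch of the original script with RS="" (paragraph mode) in place of RS="\n\n", assuming a POSIX awk. It is wrapped in a shell function whose name, ldif_second_field, is hypothetical. The explicit \n in FS is kept on purpose, so the field split does not depend on how a particular awk handles newlines in paragraph mode when FS is a regular expression.

```shell
#!/bin/sh
# The original script with RS="" instead of RS="\n\n".
# Records are separated by one or more blank lines; FS still lists "\n"
# explicitly, so each line of a record also splits into separate fields.
ldif_second_field() {
    awk 'BEGIN { RS = ""; FS = "(: |\n)" } { print $2 }'
}

# Demo on the sample input from the first message.
ldif_second_field <<'EOF'
dn: Some Such DN
gidNumber: 1000
uidNumber: 1080

dn: Some Other DN
gidNumber: 1000
uidNumber: 1405
EOF
```

For this input it prints the same two lines that gawk printed with RS="\n\n".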

-gordon





Re: bug in awk implementation?

2002-07-15 Thread Garrett Wollman

Gordon Tetlow said:

> Ah, okay, there is a distinct lack of documentation to that effect. I have 
> figured out that I can just set RS="" and that does the same thing. I 
> suppose it would be helpful to have an awk book around. =)

The Standard is clear:

# The first character of the string value of RS shall be the input
# record separator; a <newline> by default. If RS contains more than
# one character, the results are unspecified. If RS is null, then
# records are separated by sequences consisting of a <newline> plus
# one or more blank lines, leading or trailing blank lines shall not
# result in empty records at the beginning or end of the input, and a
# <newline> shall always be a field separator, no matter what the
# value of FS is.
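A minimal sketch, assuming a POSIX awk, that exercises the record-separation clauses of the passage above: two leading blank lines, a run of three blank lines between records, and one trailing blank line. The function name paragraph_demo is hypothetical. It deliberately uses the default FS, under which newlines already separate fields, so it does not test the final clause about newlines always separating fields.

```shell
#!/bin/sh
# Record separation with RS = "" (paragraph mode):
#  - a run of one or more blank lines ends a record,
#  - leading and trailing blank lines do not create empty records.
# The default FS is used, so each word on each line becomes a field.
paragraph_demo() {
    awk 'BEGIN { RS = "" }
         { printf "record %d: NF=%d, first field=%s\n", NR, NF, $1 }
         END { printf "total records: %d\n", NR }'
}

# Two leading blank lines, three blank lines between records,
# and one trailing blank line.
printf '\n\nalpha one\nalpha two\n\n\n\nbeta one\n\n' | paragraph_demo
# Expected output:
#   record 1: NF=4, first field=alpha
#   record 2: NF=2, first field=beta
#   total records: 2
```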

-GAWollman





Re: bug in awk implementation?

2002-07-15 Thread Brandon S. Allbery

On Mon, 2002-07-15 at 11:37, Robert Drehmel wrote:
> To me, this seems like a bug in 'gawk'.  The AWK language uses
> only the first character in RS as the record separator, to my
> knowledge.

Hm, I thought that was a gawk extension.

-- 
brandon s. allbery     [linux][solaris][japh][freebsd]    [EMAIL PROTECTED]
system administrator   [openafs][heimdal][too many hats]  [EMAIL PROTECTED]
electrical and computer engineering                                  KF8NH
carnegie mellon university         ["better check the oblivious first" -ke6sls]





Re: bug in awk implementation?

2002-07-15 Thread Robert Drehmel

Hello Brandon,

On Mon, Jul 15, 2002 at 02:53:59PM -0400, Brandon S. Allbery  KF8NH wrote:
> On Mon, 2002-07-15 at 11:37, Robert Drehmel wrote:
> > To me, this seems like a bug in 'gawk'.  The AWK language uses
> > only the first character in RS as the record separator, to my
> > knowledge.
> 
> Hm, I thought that was a gawk extension.

From the GNU AWK 3.0.6 source:
"""
    else if (RS->stlen > 1) {
        static int warned = FALSE;

        RS_regexp = make_regexp(RS->stptr, RS->stlen, IGNORECASE, TRUE);

        if (do_lint && ! warned) {
            warning("multicharacter value of `RS' is not portable");
            warned = TRUE;
        }
    }
"""

You are right.  However, I still consider it a bug.  :-)

ciao,
-robert




Re: bug in awk implementation?

2002-07-15 Thread Garrett Wollman

Robert Drehmel said:

> You are right.  However, I still consider it a bug.  :-)

The standard says that the behavior is ``undefined''.  That means that
your computer is allowed to turn into a frog.  Actually doing something
useful is also permitted.
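Since a multi-character RS is left unspecified, one way to get the same result from any awk is not to depend on RS beyond its default. A minimal sketch, assuming records are separated by empty lines and that the wanted value is $2 of each record's first line (as with the LDIF sample); first_value_per_record is a hypothetical name. It keeps the default newline RS and tracks record boundaries by hand.

```shell
#!/bin/sh
# Keeps the default RS (newline) and detects record boundaries by hand,
# so nothing depends on how an awk treats a multi-character RS.
# An empty line ends a record; for the first line of each record, $2
# (with FS = ": ") is printed, matching the original script's output
# for input shaped like the LDIF sample.
first_value_per_record() {
    awk 'BEGIN { FS = ": "; in_record = 0 }
         /^$/ { in_record = 0; next }
         !in_record { print $2; in_record = 1 }'
}

# Demo on the sample input from the first message.
first_value_per_record <<'EOF'
dn: Some Such DN
gidNumber: 1000
uidNumber: 1080

dn: Some Other DN
gidNumber: 1000
uidNumber: 1405
EOF
```

Runs of several empty lines are handled too, since each one just resets the same flag.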

-GAWollman





Re: bug in awk implementation?

2002-07-16 Thread Crist J. Clark

On Mon, Jul 15, 2002 at 04:00:29PM -0400, Garrett Wollman wrote:
> Robert Drehmel said:
> 
> > You are right.  However, I still consider it a bug.  :-)
> 
> The standard says that the behavior is ``undefined''.  That means that
> your computer is allowed to turn into a frog.  Actually doing something
> useful is also permitted.

And since it is clearly documented, awk(1) says,

   Records
   Normally,  records  are  separated  by newline characters.
   You can control how records  are  separated  by  assigning
   values  to  the built-in variable RS.  If RS is any single
   character, that character separates  records.   Otherwise,
   RS  is  a  regular  expression.   Text  in  the input that
   matches this regular expression will separate the  record.
   However,  in  compatibility mode, only the first character
   of its string value is used for separating records.  If RS
   is  set  to the null string, then records are separated by
   blank lines.  When RS is set to the null string, the  new-
   line  character always acts as a field separator, in addi-
   tion to whatever value FS may have.

It is not a bug.
-- 
Crist J. Clark                     |     [EMAIL PROTECTED]
                                   |     [EMAIL PROTECTED]
http://people.freebsd.org/~cjc/    |     [EMAIL PROTECTED]




Re: bug in awk implementation?

2002-07-16 Thread Gordon Tetlow

On Tue, 16 Jul 2002, Crist J. Clark wrote:

> And since it is clearly documented, awk(1) says,
> 
>Records
>Normally,  records  are  separated  by newline characters.
>You can control how records  are  separated  by  assigning
>values  to  the built-in variable RS.  If RS is any single
>character, that character separates  records.   Otherwise,
>RS  is  a  regular  expression.   Text  in  the input that
>matches this regular expression will separate the  record.
>However,  in  compatibility mode, only the first character
>of its string value is used for separating records.  If RS
>is  set  to the null string, then records are separated by
>blank lines.  When RS is set to the null string, the  new-
>line  character always acts as a field separator, in addi-
>tion to whatever value FS may have.
> 
> It is not a bug.

No, you are quoting from the gawk(1) man page. The awk(1) man page makes 
no such statement.

-gordon





Re: bug in awk implementation?

2002-07-16 Thread Garrett Wollman

[Since you insisted on CC'ing me...]

Gordon Tetlow said:

> No, you are quoting from the gawk(1) man page. The awk(1) man page makes 
> no such statement.

The awk(1) manual page does not define the correct behavior of
gawk(1).

IEEE Std. 1003.1-2001 defines the correct behavior of both awk(1) and
gawk(1), and as I have already demonstrated, it leaves the behavior in
question clearly unspecified.

-GAWollman





Re: bug in awk implementation?

2002-07-16 Thread Crist J. Clark

On Tue, Jul 16, 2002 at 04:57:42PM -0700, Gordon Tetlow wrote:
> On Tue, 16 Jul 2002, Crist J. Clark wrote:
> 
> > And since it is clearly documented, awk(1) says,
> > 
> >Records
> >Normally,  records  are  separated  by newline characters.
> >You can control how records  are  separated  by  assigning
> >values  to  the built-in variable RS.  If RS is any single
> >character, that character separates  records.   Otherwise,
> >RS  is  a  regular  expression.   Text  in  the input that
> >matches this regular expression will separate the  record.
> >However,  in  compatibility mode, only the first character
> >of its string value is used for separating records.  If RS
> >is  set  to the null string, then records are separated by
> >blank lines.  When RS is set to the null string, the  new-
> >line  character always acts as a field separator, in addi-
> >tion to whatever value FS may have.
> > 
> > It is not a bug.
> 
> No, you are quoting from the gawk(1) man page. The awk(1) man page makes 
> no such statement.

I pulled it from a RELENG_4_6_0_RELEASE box where awk == gawk. I said
'awk(1)' since I typed 'man awk' to get it, but of course you're
right, I did mean gawk(1).

But the point is that there's still no bug in gawk or one-true-awk
with respect to how they deal with RS.
-- 
Crist J. Clark                     |     [EMAIL PROTECTED]
                                   |     [EMAIL PROTECTED]
http://people.freebsd.org/~cjc/    |     [EMAIL PROTECTED]
