bug in awk implementation?
I was parsing ldif format with awk (formerly gawk) and found a buglet in
awk with the following script:

    BEGIN {
        RS="\n\n";
        FS="(: |\n)";
    }

    { print $2; }

Fed the following output:

    dn: Some Such DN
    gidNumber: 1000
    uidNumber: 1080

    dn: Some Other DN
    gidNumber: 1000
    uidNumber: 1405

This is what I get:

one-true-awk:

    Some Such DN
    1000
    1080

    Some Other DN
    1000
    1405

gawk:

    Some Such DN
    Some Other DN

So, this seems to be a bug in the one-true-awk implementation. Any ideas
on how to fix this?

-gordon

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: bug in awk implementation?
On Mon, Jul 15, 2002 at 08:20:58AM -0700, Gordon Tetlow wrote:
> I was parsing ldif format with awk (formerly gawk) and found a buglet in
> awk with the following script:
>
> BEGIN {
>     RS="\n\n";
>     FS="(: |\n)";
> }
>
> { print $2; }
>
> Fed the following output:
>
> dn: Some Such DN
> gidNumber: 1000
> uidNumber: 1080
>
> dn: Some Other DN
> gidNumber: 1000
> uidNumber: 1405
>
> This is what I get:
>
> one-true-awk:
>
> Some Such DN
> 1000
> 1080
>
> Some Other DN
> 1000
> 1405

Ok.

> gawk:
>
> Some Such DN
> Some Other DN

Oh.

> So, this seems to be a bug in the one-true-awk implementation. Any ideas
> on how to fix this?

To me, this seems like a bug in 'gawk'. The AWK language uses
only the first character in RS as the record separator, to my
knowledge.

ciao,
-robert
Re: bug in awk implementation?
On Mon, 15 Jul 2002, Robert Drehmel wrote:
> > So, this seems to be a bug in the one-true-awk implementation. Any ideas
> > on how to fix this?
>
> To me, this seems like a bug in 'gawk'. The AWK language uses
> only the first character in RS as the record separator, to my
> knowledge.

Ah, okay, there is a distinct lack of documentation to that fact. I have
figured out that I can just set RS="" and that does the same thing. I
suppose it would be helpful to have an awk book around. =)

-gordon
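The RS="" workaround mentioned above can be sketched roughly as follows.
This is only an illustration, assuming a POSIX shell and some POSIX awk
on the PATH. The function name ldif_dns is made up for the example and
does not come from the thread. The FS value keeps the explicit newline
alternative from the original script, so the field split does not rely
on how a given awk combines a regular-expression FS with a null RS.

```sh
#!/bin/sh
# Sketch only: ldif_dns is a hypothetical helper name, not from the thread.
# Prints the value of the first attribute (the dn) of each LDIF entry.
ldif_dns() {
    printf '%s\n' \
        'dn: Some Such DN' \
        'gidNumber: 1000' \
        'uidNumber: 1080' \
        '' \
        'dn: Some Other DN' \
        'gidNumber: 1000' \
        'uidNumber: 1405' |
    awk '
        BEGIN {
            RS = ""          # null RS: blank lines separate records
            FS = "(: |\n)"   # split on ": " or on a newline, as in the original script
        }
        { print $2 }         # second field of each record is the dn value
    '
}

ldif_dns
```

Because a null RS is the case the standard does specify, this version
should print one dn value per entry under both gawk and one-true-awk,
which is the result the original multi-character RS produced only under
gawk.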
Re: bug in awk implementation?
Gordon Tetlow said:

> Ah, okay, there is a distinct lack of documentation to that fact. I have
> figured out that I can just set RS="" and that does the same thing. I
> suppose it would be helpful to have an awk book around. =)

The Standard is clear:

# The first character of the string value of RS shall be the input
# record separator; a <newline> by default. If RS contains more than
# one character, the results are unspecified. If RS is null, then
# records are separated by sequences consisting of a <newline> plus
# one or more blank lines, leading or trailing blank lines shall not
# result in empty records at the beginning or end of the input, and a
# <newline> shall always be a field separator, no matter what the
# value of FS is.

-GAWollman
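The null-RS rules quoted above can be checked with a small sketch along
these lines. It assumes a POSIX shell and a POSIX awk. The helper name
para_stats and the sample input are invented for illustration. The input
has two leading blank lines, three blank lines between the two records,
and one trailing blank line.

```sh
#!/bin/sh
# Sketch only: para_stats is a hypothetical helper name.
# Input layout: 2 leading blank lines, record "a b" / "c",
# 3 blank lines, record "d", 1 trailing blank line.
para_stats() {
    printf '\n\na b\nc\n\n\n\nd\n\n' |
    awk '
        BEGIN { RS = "" }    # null RS, default FS
        { print "record " NR ": " NF " fields" }
        END { print "total: " NR }
    '
}

para_stats
```

On a conforming awk the leading and trailing blank lines should not show
up as empty records, so only two records are reported, and the newline
inside the first record acts as a field separator, giving it three
fields.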
Re: bug in awk implementation?
On Mon, 2002-07-15 at 11:37, Robert Drehmel wrote:
> To me, this seems like a bug in 'gawk'. The AWK language uses
> only the first character in RS as the record separator, to my
> knowledge.

Hm, I thought that was a gawk extension.

--
brandon s. allbery   [linux][solaris][japh][freebsd]   [EMAIL PROTECTED]
system administrator   [openafs][heimdal][too many hats]   [EMAIL PROTECTED]
electrical and computer engineering   KF8NH
carnegie mellon university   ["better check the oblivious first" -ke6sls]
Re: bug in awk implementation?
Hello Brandon,

On Mon, Jul 15, 2002 at 02:53:59PM -0400, Brandon S. Allbery KF8NH wrote:
> On Mon, 2002-07-15 at 11:37, Robert Drehmel wrote:
> > To me, this seems like a bug in 'gawk'. The AWK language uses
> > only the first character in RS as the record separator, to my
> > knowledge.
>
> Hm, I thought that was a gawk extension.

From the GNU AWK 3.0.6 source:

"""
    else if (RS->stlen > 1) {
        static int warned = FALSE;

        RS_regexp = make_regexp(RS->stptr, RS->stlen, IGNORECASE, TRUE);
        if (do_lint && ! warned) {
            warning("multicharacter value of `RS' is not portable");
            warned = TRUE;
        }
    }
"""

You are right. However, I still consider it a bug. :-)

ciao,
-robert
Re: bug in awk implementation?
Robert Drehmel said:

> You are right. However, I still consider it a bug. :-)

The standard says that the behavior is ``undefined''. That means that
your computer is allowed to turn into a frog. Actually doing something
useful is also permitted.

-GAWollman
Re: bug in awk implementation?
On Mon, Jul 15, 2002 at 04:00:29PM -0400, Garrett Wollman wrote:
> Robert Drehmel said:
>
> > You are right. However, I still consider it a bug. :-)
>
> The standard says that the behavior is ``undefined''. That means that
> your computer is allowed to turn into a frog. Actually doing something
> useful is also permitted.

And since it is clearly documented, awk(1) says,

    Records
        Normally, records are separated by newline characters.
        You can control how records are separated by assigning
        values to the built-in variable RS. If RS is any single
        character, that character separates records. Otherwise,
        RS is a regular expression. Text in the input that
        matches this regular expression will separate the record.
        However, in compatibility mode, only the first character
        of its string value is used for separating records. If RS
        is set to the null string, then records are separated by
        blank lines. When RS is set to the null string, the
        newline character always acts as a field separator, in
        addition to whatever value FS may have.

It is not a bug.

--
Crist J. Clark | [EMAIL PROTECTED] | [EMAIL PROTECTED]
http://people.freebsd.org/~cjc/ | [EMAIL PROTECTED]
Re: bug in awk implementation?
On Tue, 16 Jul 2002, Crist J. Clark wrote:
> And since it is clearly documented, awk(1) says,
>
>     Records
>         Normally, records are separated by newline characters.
>         You can control how records are separated by assigning
>         values to the built-in variable RS. If RS is any single
>         character, that character separates records. Otherwise,
>         RS is a regular expression. Text in the input that
>         matches this regular expression will separate the record.
>         However, in compatibility mode, only the first character
>         of its string value is used for separating records. If RS
>         is set to the null string, then records are separated by
>         blank lines. When RS is set to the null string, the
>         newline character always acts as a field separator, in
>         addition to whatever value FS may have.
>
> It is not a bug.

No, you are quoting from the gawk(1) man page. The awk(1) man page makes
no such statement.

-gordon
Re: bug in awk implementation?
[Since you insisted on CC'ing me...]

Gordon Tetlow said:

> No, you are quoting from the gawk(1) man page. The awk(1) man page makes
> no such statement.

The awk(1) manual page does not define the correct behavior of
gawk(1). IEEE Std. 1003.1-2001 defines the correct behavior of both
awk(1) and gawk(1), and as I have already demonstrated, it leaves the
behavior in question clearly unspecified.

-GAWollman
Re: bug in awk implementation?
On Tue, Jul 16, 2002 at 04:57:42PM -0700, Gordon Tetlow wrote:
> On Tue, 16 Jul 2002, Crist J. Clark wrote:
>
> > And since it is clearly documented, awk(1) says,
> >
> >     Records
> >         Normally, records are separated by newline characters.
> >         You can control how records are separated by assigning
> >         values to the built-in variable RS. If RS is any single
> >         character, that character separates records. Otherwise,
> >         RS is a regular expression. Text in the input that
> >         matches this regular expression will separate the record.
> >         However, in compatibility mode, only the first character
> >         of its string value is used for separating records. If RS
> >         is set to the null string, then records are separated by
> >         blank lines. When RS is set to the null string, the
> >         newline character always acts as a field separator, in
> >         addition to whatever value FS may have.
> >
> > It is not a bug.
>
> No, you are quoting from the gawk(1) man page. The awk(1) man page makes
> no such statement.

I pulled it from a RELENG_4_6_0_RELEASE box where awk == gawk. I said
'awk(1)' since I typed 'man awk' to get it, but of course you're right,
I did mean gawk(1).

But the point is that there's still no bug in gawk or one-true-awk
with respect to how they deal with RS.

--
Crist J. Clark | [EMAIL PROTECTED] | [EMAIL PROTECTED]
http://people.freebsd.org/~cjc/ | [EMAIL PROTECTED]