Re: [PLUG] gawk switch statement syntax errors
On Mon, 23 Jul 2018, Tomas Kuchta wrote:

> Maybe you can speed things up by pdf2txt and identify the lines of interest
> in awk.

Tomas,

  Almost every page is different. All have headers, data for a variable
number of hours (some with flags in the left margin, most without), and some
have summaries at the bottom. Then there are the days with missing data. And
some days have data in a specific column (but not on all data rows) while
other days are blank in that column.

  And, this is a one-time process. It's to get the data from the source
documents into a format suitable for import into a database and statistical
analyses.

Thanks,

Rich
___
PLUG mailing list
PLUG@pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug
Re: [PLUG] gawk switch statement syntax errors
Maybe you can speed things up by pdf2txt and identify the lines of interest
in awk.

On Mon, Jul 23, 2018, 4:43 PM Rich Shepard wrote:

> On Mon, 23 Jul 2018, Tomas Kuchta wrote:
>
> > Depending on your awk script and/or your data - this can have significant
> > runtime impact, beside nicer coding style.
>
> Tomas,
>
>    It takes me 5-10 minutes to highlight data in the PDF file and paste it
> into a text file. When done the shell script, calling two sed and six awk
> scripts runs in less than a second. The prompt returns almost immediately.
>
> Rich
Re: [PLUG] gawk switch statement syntax errors
Making the code more complex than necessary leads to long latencies as you
query the PLUG list.

On Mon, Jul 23, 2018, 16:45 Rich Shepard wrote:

> On Mon, 23 Jul 2018, Tomas Kuchta wrote:
>
> > Depending on your awk script and/or your data - this can have significant
> > runtime impact, beside nicer coding style.
>
> Tomas,
>
>    It takes me 5-10 minutes to highlight data in the PDF file and paste it
> into a text file. When done the shell script, calling two sed and six awk
> scripts runs in less than a second. The prompt returns almost immediately.
>
> Rich
Re: [PLUG] gawk switch statement syntax errors
On Mon, 23 Jul 2018, Tomas Kuchta wrote:

> Depending on your awk script and/or your data - this can have significant
> runtime impact, beside nicer coding style.

Tomas,

  It takes me 5-10 minutes to highlight data in the PDF file and paste it
into a text file. When done, the shell script, calling two sed and six awk
scripts, runs in less than a second. The prompt returns almost immediately.

Rich
Re: [PLUG] gawk switch statement syntax errors
I hope that I am not beating a dead horse with this.

There is also a performance problem with using a case/switch statement like
this: the whole code block gets evaluated for every record/line. If you use
pattern rules the way suggested, the code block is only run for the matching
records/lines.

You can optimize it further if you put a simple comparison, such as the
NF==35 condition, before any regexp comparison/search.

Depending on your awk script and/or your data, this can have a significant
runtime impact, besides giving a nicer coding style.

Tomas

On Mon, Jul 23, 2018, 3:21 PM Rich Shepard wrote:

> On Mon, 23 Jul 2018, Tomas Kuchta wrote:
>
> > Do not use switch/case - just use
> > NF==35 {print "I see 35 columns on this line"}
> > ... type of a code.
> >
> > If you need more than that you can do something like this:
> > NF==35 && $2<5 {print "I see 35 columns on this line and column 2 is less than 5"}
> >
> > I guess that is what Russell was saying too.
>
> Tomas,
>
>    It turns out that the switch/case statement works when the whole thing is
> enclosed in curly braces because it's all part of the action response. So it
> would look like this:
>
> { switch (NF) {
>     case 1:
>       ...
>     case 2:
>       ...
>   }
> }
>
>    But, using the number of fields as the pattern does make it easier to
> read:
>
> NF == 36 { print }
>
> Thanks,
>
> Rich
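A minimal sketch of the ordering Tomas describes, not a benchmark. awk's `&&` short-circuits, so the regexp is only tried on records that have already passed the cheap NF test. The three-field layout, the date regexp, the sample records, and the function name `select_dated` are all assumptions made for illustration; none of them come from Rich's scripts.

```sh
# Hypothetical helper: keep records that have exactly 3 fields AND whose
# first field looks like a mm/dd/yy date, then print the other two fields.
# The NF test is written first so the regexp match is skipped whenever the
# field count is already wrong.
select_dated() {
    awk 'NF == 3 && $1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { print $2, $3 }'
}

# Made-up sample input: only the first record satisfies both conditions.
printf '%s\n' '11/24/07 0400 12.12' \
              'header line with many words' \
              '11/25/07 2400' |
    select_dated
# prints: 0400 12.12
```

Whether the reordering is measurable depends on the data; on input that is processed in under a second it is unlikely to matter much.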
Re: [PLUG] gawk switch statement syntax errors
On Mon, 23 Jul 2018, Tomas Kuchta wrote:

> Do not use switch/case - just use
> NF==35 {print "I see 35 columns on this line"}
> ... type of a code.
>
> If you need more than that you can do something like this:
> NF==35 && $2<5 {print "I see 35 columns on this line and column 2 is less than 5"}
>
> I guess that is what Russell was saying too.

Tomas,

  It turns out that the switch/case statement works when the whole thing is
enclosed in curly braces because it's all part of the action response. So it
would look like this:

{ switch (NF) {
    case 1:
      ...
    case 2:
      ...
  }
}

  But, using the number of fields as the pattern does make it easier to
read:

NF == 36 { print }

Thanks,

Rich
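The brace placement Rich describes can be tried with a small self-contained sketch. It assumes gawk is installed and not running in compatibility mode, since switch is a gawk extension. The messages, the sample input, and the wrapper name `describe_width` are hypothetical. Note that gawk spells the catch-all label `default:`.

```sh
# Hypothetical wrapper around a gawk program whose only rule has no pattern.
# The switch statement sits INSIDE the action braces; placed at the top
# level of the program it is a syntax error.
describe_width() {
    gawk '{
        switch (NF) {
        case 1:
            print "one field"
            break
        case 2:
            print "two fields"
            break
        default:
            print "other: " NF
        }
    }'
}

# Only run the demonstration when gawk is actually available.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' 'a' 'a b' 'a b c' | describe_width
else
    echo "gawk not found; skipping demonstration" >&2
fi
```

Each `break` keeps control from falling through into the next case, as in C.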
Re: [PLUG] gawk switch statement syntax errors
Do not use switch/case - just use

  NF==35 {print "I see 35 columns on this line"}

... type of code.

If you need more than that you can do something like this:

  NF==35 && $2<5 {print "I see 35 columns on this line and column 2 is less than 5"}

I guess that is what Russell was saying too.

Tomas

On Mon, Jul 23, 2018, 12:30 PM Russell Senior wrote:

> Ah, gawk does have switch(), but not in compatibility mode. Maybe you are
> in compatibility mode. But in either case, I don't see the need here (see
> my "thirdly" suggestion, and ignore my NR == 37 typo).
>
> On Mon, Jul 23, 2018 at 12:21 PM, Russell Senior wrote:
>
> > First off, I don't have your book and have no idea what you are trying to
> > do.
> >
> > Second, I think you want NF, not NR.
> >
> > Thirdly, I think you want to just write matching rules (mawk manpage
> > didn't mention switch), e.g.:
> >
> > NF == 38 { print stuff }
> > NR == 37 { print other stuff }
> >
> > Lastly, if the vertical bars are significant, you should maybe parse on
> > that character to harmonize the input to a subsequent stage ... but that's
> > just a guess, since I don't know wtf you are doing.
> >
> > On Mon, Jul 23, 2018 at 11:02 AM, Rich Shepard wrote:
> >
> >> gawk-4.1.3 is installed here. According to Arnold Robbins' 'Effective
> >> awk Programming, 4th Ed', page 154, the syntax for the switch statement
> >> is used in this code:
> >> [...]
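Russell's quoted suggestion about the vertical bars might look something like the sketch below. It is a guess at what he had in mind, not something anyone in the thread tested. The two sample records are abbreviated from the posted test data, and the function name `count_segments` is hypothetical. With `-F'|'` each record splits on the bar character, so the number of segments stays the same even when a time field is missing and the whitespace-separated count changes.

```sh
# Hypothetical helper: for each record print the number of bar-delimited
# segments (NF, because FS is "|") followed by the number of
# whitespace-delimited words, obtained with split() on the whole record.
count_segments() {
    awk -F'|' '{ n = split($0, words, " "); print NF, n }'
}

# Abbreviated records: the second one lacks the leading time field.
printf '%s\n' '11/24/07 0400 12.12 |0400 2090 0.01| 12.10 12.10' \
              '11/25/07 12.01 | 1950 0.01| 12.01 12.01' |
    count_segments
# prints:
# 3 8
# 3 7
```

A later stage could then split each segment on whitespace separately, so a missing value only shifts fields within its own segment.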
Re: [PLUG] gawk switch statement syntax errors [RESOLVED]
On Mon, 23 Jul 2018, Russell Senior wrote:

Russell,

> Second, I think you want NF, not NR.

  Yes. That is correct.

> Thirdly, I think you want to just write matching rules (mawk manpage
> didn't mention switch), e.g.:
>
> NF == 38 { print stuff }
> NR == 37 { print other stuff }

  Sigh. Yes, specifying the pattern followed by the action is the solution.
I moved this processing from a bash script using IF - ELIF - ELSE so the
switch statement seemed to be the right choice.

  Thanks for getting me back to the (g)awk solution.

Best regards,

Rich
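For anyone else coming from a shell IF - ELIF - ELSE chain, here is one possible way to express the same branching as pattern-action rules. It is only a sketch: the field counts, the labels, and the function name `label_by_width` are made up for illustration. Each rule ends with `next` so that later rules are skipped once one has matched, and the last rule has no pattern, so it plays the part of the ELSE branch.

```sh
# Hypothetical helper showing an if / elif / else chain as awk rules.
label_by_width() {
    awk '
        NF == 2 { print "two:", $1; next }      # IF
        NF == 3 { print "three:", $1; next }    # ELIF
                { print "skipped:", NF }        # ELSE: anything not handled above
    '
}

printf '%s\n' 'a b' 'a b c' 'a' | label_by_width
# prints:
# two: a
# three: a
# skipped: 1
```

Without the `next` statements every rule is still tested against every record, which is harmless here because the NF tests are mutually exclusive, but then the pattern-less rule would fire for all records.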
Re: [PLUG] gawk switch statement syntax errors
Ah, gawk does have switch(), but not in compatibility mode. Maybe you are
in compatibility mode. But in either case, I don't see the need here (see
my "thirdly" suggestion, and ignore my NR == 37 typo).

On Mon, Jul 23, 2018 at 12:21 PM, Russell Senior wrote:

> First off, I don't have your book and have no idea what you are trying to
> do.
>
> Second, I think you want NF, not NR.
>
> Thirdly, I think you want to just write matching rules (mawk manpage
> didn't mention switch), e.g.:
>
> NF == 38 { print stuff }
> NR == 37 { print other stuff }
>
> Lastly, if the vertical bars are significant, you should maybe parse on
> that character to harmonize the input to a subsequent stage ... but that's
> just a guess, since I don't know wtf you are doing.
>
> On Mon, Jul 23, 2018 at 11:02 AM, Rich Shepard wrote:
>
>> gawk-4.1.3 is installed here. According to Arnold Robbins' 'Effective
>> awk Programming, 4th Ed', page 154, the syntax for the switch statement
>> is used in this code:
>> [...]
Re: [PLUG] gawk switch statement syntax errors
First off, I don't have your book and have no idea what you are trying to
do.

Second, I think you want NF, not NR.

Thirdly, I think you want to just write matching rules (mawk manpage didn't
mention switch), e.g.:

NF == 38 { print stuff }
NR == 37 { print other stuff }

Lastly, if the vertical bars are significant, you should maybe parse on that
character to harmonize the input to a subsequent stage ... but that's just a
guess, since I don't know wtf you are doing.

On Mon, Jul 23, 2018 at 11:02 AM, Rich Shepard wrote:

> gawk-4.1.3 is installed here. According to Arnold Robbins' 'Effective awk
> Programming, 4th Ed', page 154, the syntax for the switch statement is
> used in this code:
> [...]
[PLUG] gawk switch statement syntax errors
  gawk-4.1.3 is installed here. According to Arnold Robbins' 'Effective awk
Programming, 4th Ed', page 154, the syntax for the switch statement is used
in this code:

# Get line length (number of fields)
switch (NR) {
case 36: # No shifts present.
    { print $1, $6, $7, $8, $9, $10, $11, $12, $13, $18, $19, $20, $21,
      $22, $23, $24, $25, $29, $30, $31, $32, $33, $34, $35, $36 }
    break
case 37: # 1 shift present.
    { print $1, $6, $7, $8, $9, $10, $11, $12, $13, $19, $20, $21, $22,
      $23, $24, $25, $26, $30, $31, $32, $33, $34, $35, $36, $37 }
    break
case 38: # 2 shifts present.
    { print $1, $7, $8, $9, $10, $11, $12, $13, $14, $20, $21, $22, $23,
      $24, $25, $26, $27, $31, $32, $33, $34, $35, $36, $37, $38 }
    break
case ?:
    break
}

  Running this code on data results in syntax errors:

$ gawk -f trim-fields.awk test.dat > out
gawk: trim-fields.awk:13: switch (NR) {
gawk: trim-fields.awk:13: ^ syntax error
gawk: trim-fields.awk:14: case 36: # No shifts present.
gawk: trim-fields.awk:14: ^ syntax error
gawk: trim-fields.awk:17: case 37: # 1 shift present.
gawk: trim-fields.awk:17: ^ syntax error
gawk: trim-fields.awk:20: case 38: # 2 shifts present.
gawk: trim-fields.awk:20: ^ syntax error
gawk: trim-fields.awk:23: case ?:
gawk: trim-fields.awk:23: ^ syntax error

  I'm sure it's a simple error on my part but I'm just not seeing the
problem.

  Test data set (test.dat) has lines with each length:

11/24/07 0400 12.12 |0400 2090 0.01| 12.10 12.10 12.04 12.08 12.12 12.12 12.10 12.06 1200 12.00 |1200 1930 0.01| 12.08 12.06 12.07 12.04 12.00 12.04 12.03 12.03 12.05 | 2000 2000 | 12.03 12.06 12.04 12.01 12.00 12.02 12.00 12.01
11/25/07 12.01 | 1950 0.01| 12.01 12.01 11.99 11.97 11.97 11.98 11.96 11.96 2400 11.87 |2400 1770 0.00| 11.97 11.95 11.95 11.95 11.93 11.91 11.93 11.93 11.95 | 1860 1860 | 11.96 11.97 11.93 11.93 11.91 11.89 11.89 11.90
11/26/07 1830 11.97 |1830 1890 | 11.87 11.87 11.90 11.90 11.89 11.86 11.87 11.81 0800 11.78 |0800 1680 0.00| 11.78 11.88 11.86 11.79 11.81 11.89 11.81 11.82 11.87 | 1770 1770 | 11.80 11.79 11.92 11.92 11.94 11.92 11.95 11.93
11/27/07 0230 12.05 |0230 1990 | 11.94 11.99 12.04 12.04 12.04 12.04 12.04 12.03 2230 11.93 |2230 1840 | 12.03 12.02 12.02 11.98 11.95 11.97 11.96 11.95 11.98 | 1900 1900 | 11.94 11.94 11.94 11.96 11.97 11.97 11.94 11.93
11/28/07 2000 12.02 |2000 1950 | 11.94 11.92 11.91 11.92 11.90 11.88 11.88 11.86 1430 11.81 |1430 1710 | 11.85 11.85 11.86 11.86 11.85 11.82 11.82 11.83 11.89 | 1790 1790 | 11.86 11.86 11.87 11.90 12.02 12.00 11.90 11.91

  I'm stuck (again) and I don't think this is a white space issue or an
improper newline placement.

Rich