[ On April 26, 2000 at 18:41:32 (+0200), Akim Demaille wrote: ]
> Subject: Performances of awk
>
> What's so expensive is the finalizing loop which uses a small AWK
> script. If you concentrate the measure on this very script, the
> performance penalty is frightening:
>
> | ~/src/fileutils % cat > /tmp/finalize.awk
> | {
> |   sub(/[ ]*$/, "")
> |   if ($0 == "")
> |     {
> |       if (!duplicate)
> |         print
> |       duplicate = 1
> |       next
> |     }
> |   duplicate = 0
> |   oline++
> |   while (sub(/__oline__/, oline))
> |     continue
> |   while (sub(/@<:@/, "["))
> |     continue
> |   while (sub(/@:>@/, "]"))
> |     continue
> |   while (sub(/@S\|@/, "$"))
> |     continue
> |   while (sub(/@%:@/, "#"))
> |     continue
> |   print
> | }
> | ~/src/fileutils % time mawk -f /tmp/finalize.awk < configure >/dev/null
> | 9,51s user 0,02s system 100% cpu 9,521 total
> | ~/src/fileutils % time gawk -f /tmp/finalize.awk < configure >/dev/null
> | 0,89s user 0,01s system 101% cpu 0,890 total
>
> So, should we change AC_PROG_AWK? Should the package Autoconf use a
> different macro?
I don't know exactly what's going on here, but I am sure that mawk is
usually a good choice and sometimes the best choice still:
17:12 [104] $ time nawk -f /home/most/woods/src/finalize.awk < gmp/configure >>
0.28s real 0.15s user 0.11s system
17:12 [105] $ time mawk -f /home/most/woods/src/finalize.awk < gmp/configure >>
0.17s real 0.13s user 0.03s system
17:12 [106] $ time gawk -f /home/most/woods/src/finalize.awk < gmp/configure >>
0.33s real 0.27s user 0.05s system
17:13 [107] $ time nawk -f /home/most/woods/src/finalize.awk < ggrep/configure>
0.22s real 0.09s user 0.08s system
17:14 [108] $ time mawk -f /home/most/woods/src/finalize.awk < ggrep/configure>
0.08s real 0.05s user 0.01s system
17:14 [109] $ time gawk -f /home/most/woods/src/finalize.awk < ggrep/configure>
0.13s real 0.10s user 0.02s system
17:14 [110] $ time nawk -f /home/most/woods/src/finalize.awk < newsyslog/confi>
0.44s real 0.20s user 0.20s system
17:14 [111] $ time mawk -f /home/most/woods/src/finalize.awk < newsyslog/confi>
0.37s real 0.35s user 0.01s system
17:14 [112] $ time gawk -f /home/most/woods/src/finalize.awk < newsyslog/confi>
0.32s real 0.30s user 0.01s system
17:19 [132] $ time nawk -f /home/most/woods/src/finalize.awk < fingerd/configu>
0.53s real 0.22s user 0.24s system
17:19 [133] $ time mawk -f /home/most/woods/src/finalize.awk < fingerd/configu>
0.55s real 0.40s user 0.02s system
17:19 [134] $ time gawk -f /home/most/woods/src/finalize.awk < fingerd/configu>
0.40s real 0.32s user 0.03s system
Only this last one (which is the largest configure script I had handy)
showed mawk wasting time beyond all reason:
17:21 [136] $ time nawk -f /home/most/woods/src/finalize.awk < amanda/configur>
3.57s real 1.94s user 1.53s system
17:21 [137] $ time mawk -f /home/most/woods/src/finalize.awk < amanda/configur>
54.65s real 52.90s user 0.28s system
17:22 [138] $ time gawk -f /home/most/woods/src/finalize.awk < amanda/configur>
2.46s real 2.29s user 0.12s system
So, there seem to be some odd-ball cases where mawk is dramatically
slower than either nawk or gawk. Note that the above timings were done
with mawk-1.2.2. The current version is 1.3.3, so perhaps even the
odd-ball cases will run faster with it.
(What's interesting in my timings is the excessive system time that nawk
always seems to take! ;-)
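For anyone who wants to repeat the comparison, a rough sketch of a loop over whichever of the three implementations are installed might look like the following. The function name time_awks and its two arguments are only illustrative, and the POSIX "times" built-in stands in for the "time" keyword used in the transcripts above, so the output format differs from what is shown there.

```shell
# time_awks SCRIPT INPUT
# Hypothetical helper: run the awk SCRIPT over INPUT once with each awk
# implementation found on PATH and report the CPU time each run used.
time_awks() {
  script=$1
  input=$2
  for impl in nawk mawk gawk; do
    # Skip implementations that are not installed.
    command -v "$impl" >/dev/null 2>&1 || continue
    echo "== $impl"
    # Run inside a subshell so that "times" only accounts for this one run.
    # The second line that "times" prints is the user and system time of
    # the child process, i.e. of the awk run itself.
    ( "$impl" -f "$script" < "$input" > /dev/null; times )
  done
}
```

It would be called as, for example, "time_awks /tmp/finalize.awk configure".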
Now I do have to ask what the purpose of that awk script could possibly
be, and why it has to do things the way it seems to want to do them?
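As a point of comparison, here is a minimal sketch of the same pass with each "while (sub(...)) continue" loop replaced by a single gsub() call. This is my own assumption about an equivalent form, not anything taken from Autoconf, and the wrapper name finalize_gsub is made up for the example. For these particular patterns the output should be the same, since none of the replacement strings contains a character that could form a new match. Whether it avoids the mawk slowdown is untested.

```shell
# finalize_gsub: hypothetical gsub()-based variant of the quoted
# finalize.awk; reads stdin, writes stdout.
finalize_gsub() {
  awk '
    {
      sub(/[ ]*$/, "")             # strip trailing blanks
      if ($0 == "") {
        if (!duplicate)            # print only the first of a run of empty lines
          print
        duplicate = 1
        next
      }
      duplicate = 0
      oline++                      # incremented for non-empty lines only, as in the quoted script
      gsub(/__oline__/, oline)     # one global substitution per pattern
      gsub(/@<:@/, "[")
      gsub(/@:>@/, "]")
      gsub(/@S\|@/, "$")
      gsub(/@%:@/, "#")
      print
    }
  '
}
```

Usage would be along the lines of "finalize_gsub < configure > /dev/null", timed the same way as the runs above.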
--
Greg A. Woods
+1 416 218-0098 VE3TCP <[EMAIL PROTECTED]> <robohack!woods>
Planix, Inc. <[EMAIL PROTECTED]>; Secrets of the Weird <[EMAIL PROTECTED]>