Re: OT Perl Optimizations Was: undef & NULL

James D. White Fri, 14 Sep 2001 14:15:26 -0700
I modified Peter J. Holzer's code to test both the case where the
$i is set to 1 (Holzer's case) and where $i is 0 (which forces
more assignments).  I also added a straight assignment
statement to each set.  The program and results are below.
This test was run three times on a Sun Ultra 5, with Solaris 8
and Perl 5.6.1.  The first run was when I left for lunch -- I do
not know what else might have been running during that test.
The other tests were run while I was at my desk.  Top showed
that the test program was getting 98+% of the processor during
those times.  Each of the second and third tests took about
22 minutes according to time.

Questions:
-Why does Benchmark show negative times?
-Why is there a several-fold timing difference between runs of
  the same code?
-How can any test in the second set ($i set to 0) be faster than
  a straight assignment without a preceding test?

I have not looked at the code for Benchmark, but the only way
I can reconcile the output is:

1) The timer used by Benchmark uses wall clock time, not CPU
    time, and one or more other processes were competing with
    the first test, inflating the times.
2) Benchmark computes times according to the following
    simplified pseudo-code:
        $total_time = 0;
        while($loop_count-- > 0) {
            $start_time = get_the_time_somehow();
            do_a_test();
            $end_time = get_the_time_somehow();
            $total_time += $end_time - $start_time;
        }
        print "total_time=$total_time\n";
3) The "get_the_time_somehow()" routine used by Benchmark
    is not getting the time in an atomic way; i.e., on rare occasions,
    part of the time value is getting updated during retrieval of the
    time.  If the high-order portion is retrieved first, then the
    low-order portion may roll over to zero, thus sometimes
    understating the true elapsed time.  If the low-order portion
    were retrieved first, then the high-order part could be
    incremented before retrieval, thus overstating the time.  (My
    system is running in 32-bit mode; maybe 64-bit mode would
    not have this problem, but I have not tested it.)
4) Benchmark is thus unreliable for measuring trivial pieces of
    code, such as that used in these tests.  Increasing the loop
    counter for the tests does not necessarily increase accuracy,
    because the chance of getting an inaccurate time value is
    increased as well.
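
The non-atomic read described in point 3 can be illustrated with a small simulation. This is only a sketch of the hypothesis, not a description of how Benchmark or Solaris actually reads the clock. The two-word clock, the 16-bit word size, and the function names are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical clock stored as two words: value = high * $WORD + low.
# A 16-bit low word is used only to keep the numbers small.
my $WORD = 2**16;

# Split a clock value into its (high, low) words.
sub split_time {
    my ($t) = @_;
    return (int($t / $WORD), $t % $WORD);
}

# The high word is read at $t_before, the clock ticks, then the low word
# is read at $t_after.
sub torn_read_high_first {
    my ($t_before, $t_after) = @_;
    my ($high)       = split_time($t_before);
    my (undef, $low) = split_time($t_after);
    return $high * $WORD + $low;
}

# The low word is read at $t_before, the clock ticks, then the high word
# is read at $t_after.
sub torn_read_low_first {
    my ($t_before, $t_after) = @_;
    my (undef, $low) = split_time($t_before);
    my ($high)       = split_time($t_after);
    return $high * $WORD + $low;
}

my $before = $WORD - 1;   # low word is about to roll over
my $after  = $WORD;       # one tick later: high word 1, low word 0

printf "true time:  %d\n", $after;                                        # 65536
printf "high first: %d\n", torn_read_high_first($before, $after);         # 0
printf "low first:  %d\n", torn_read_low_first($before, $after);          # 131071
```

Reading the high word first yields a value far below the true time, and reading the low word first yields one far above it, which matches the understating and overstating cases in point 3.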

Suggestions:

1) The "get_the_time_somehow()" routine needs to be examined
    and changed to get an atomic time value.  (I do not mean from
    an atomic clock, just that the value be sampled as a whole at
    some instant in time, not extracted in pieces.)
2) Benchmark could be changed to compute the time in a manner
    similar to the following:
        $counter = $loop_count;
        $start_time1 = get_the_time_somehow();
        while($counter-- > 0) {
            do_a_test();
        }
        $end_time1 = get_the_time_somehow();
        $counter = $loop_count;   # reset: the first loop left $counter at -1
        $start_time2 = get_the_time_somehow();
        while($counter-- > 0) {
            # measure loop overhead without the code to be tested
        }
        $end_time2 = get_the_time_somehow();
        $total_time = ($end_time1 - $start_time1) - ($end_time2 - $start_time2);
        print "total_time=$total_time\n";
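
A runnable sketch of suggestion 2 follows. It assumes that user CPU time from Perl's built-in times() is an acceptable clock. The names measure and $clock are hypothetical and are not part of Benchmark; the clock is passed in as a code reference so that a different time source can be substituted.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Benchmark: time $count calls of $code
# as one block, then subtract the time of an equally long loop that calls
# an empty sub.  $clock is any code reference returning the current time
# in seconds; the default reads user CPU time from the built-in times().
sub measure {
    my ($count, $code, $clock) = @_;
    $clock ||= sub { (times)[0] };
    my $empty = sub { };

    # Timed block 1: the code under test.
    my $start1 = $clock->();
    $code->() for 1 .. $count;
    my $end1 = $clock->();

    # Timed block 2: loop and call overhead only.
    my $start2 = $clock->();
    $empty->() for 1 .. $count;
    my $end2 = $clock->();

    return ($end1 - $start1) - ($end2 - $start2);
}

my $i = 0;
my $net = measure(1_000_000, sub { $i ||= 1 });
printf "net CPU seconds: %.2f\n", $net;
```

Unlike the pseudo-code, the overhead loop here calls an empty sub so that the cost of the sub call is subtracted as well. The clock is read only four times in total, but the difference of two noisy measurements can still come out slightly negative when the tested code is very cheap.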

Results:

To get back to the original question about the speed of the various
alternatives, I threw out the first set of tests, because I am assuming
that some other process was running in the background and inflating
the time estimates dramatically.  Next, to account for the
apparent negative bias of the timing errors, I used the larger of the
two times from the second and third runs.  The results are as follows:

$i = 1 (minimal assignments)

1) ||= do
2) post unless
3) ||=
3) pre unless
5) assign
6) ?:

The first four are in the range 4-6 seconds, probably equal within the
precision of the timing.  The last two are 10 and 12 seconds.  The first
four test, but do not assign.  Given optimal code optimization, they
should all do exactly the same thing in this test.  The fifth always
assigns, but does not test.  The last and slowest always both tests
and assigns.

$i = 0 (maximal assignments)

1) assign
2) post unless
3) ||= do
3) ||=
5) ?:
6) pre unless

The fastest is "assign" (but never test) at 12 seconds.  All of the
rest always both test and assign.  "Post unless" is 14 seconds.  The
rest all took 16-18 seconds.  It is difficult to prove a difference in the
last four or maybe five, given the wide variance of timings for these tests.

Conclusions:
1) These timings are not inconsistent with the expected behaviour of
    the code generated by these varying code constructs.
2) Do not believe the timings from Benchmark for trivial code examples
    such as these without carefully analyzing the results from different
    runs (each run under identical conditions) to try to account for the
    negative bias introduced by timing errors.  Benchmark should be more
    useful for testing longer code fragments, where the time to run each
    test is much longer than the timing errors.
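
One way to compare runs, as conclusion 2 recommends, is to repeat the same measurement several times and look at the spread. The sketch below uses Benchmark's timeit and relies on the documented layout of a Benchmark object (real, user, system, child user, child system, iterations). The helper name spread, the iteration count, and the number of runs are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(timeit);

# Return (min, max) of a list of numbers.
sub spread {
    my @sorted = sort { $a <=> $b } @_;
    return @sorted[0, -1];
}

my $i = 1;
my @cpu;
for my $run (1 .. 3) {
    my $t = timeit(200_000, sub { $i ||= 1 });
    # A Benchmark object is an array reference:
    # (real, user, system, child user, child system, iterations).
    push @cpu, $t->[1] + $t->[2];
}

my ($min, $max) = spread(@cpu);
printf "CPU seconds over %d runs: min %.2f, max %.2f\n",
    scalar @cpu, $min, $max;
```

A large gap between the minimum and maximum, or a negative minimum, suggests the timing noise is comparable to the quantity being measured.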

Jim White

--------------------Program--------------------
#!/usr/local/bin/perl -w
   use strict;
   use Benchmark qw/cmpthese/;

   my $i = 1;
   print "test with \$i set to 1\n";
   cmpthese(10_000_000, {
       'assign'            => sub { $i = 1 },
       '||='               => sub { $i ||= 1 },
       '||= do'            => sub { $i ||= do { 1 } },
       'post unless'       => sub { $i = 1 unless $i },
       'pre unless'        => sub { unless ($i) { $i = 1 } },
       '?:'                => sub { $i = $i ? $i : 1 },
   }
   );

   print "\ntest with \$i set to 0\n";
   $i = 0;
   cmpthese(10_000_000, {
       'assign'            => sub { $i = 0 },
       '||='               => sub { $i ||= 0 },
       '||= do'            => sub { $i ||= do { 0 } },
       'post unless'       => sub { $i = 0 unless $i },
       'pre unless'        => sub { unless ($i) { $i = 0 } },
       '?:'                => sub { $i = $i ? $i : 0 },
   }
   );
--------------------Output1--------------------
test with $i set to 1
Benchmark: timing 10000000 iterations of ?:, assign, post unless, pre unless, ||=,
||= do...
        ?:: 30 wallclock secs (30.77 usr +  0.00 sys = 30.77 CPU) @ 324991.88/s
(n=10000000)
    assign: 51 wallclock secs (51.14 usr +  0.00 sys = 51.14 CPU) @ 195541.65/s
(n=10000000)
post unless: -4 wallclock secs (-4.62 usr +  0.00 sys = -4.62 CPU) @ -2164502.16/s
(n=10000000)
            (warning: too few iterations for a reliable count)
pre unless: 22 wallclock secs (22.76 usr +  0.00 sys = 22.76 CPU) @ 439367.31/s
(n=10000000)
       ||=:  7 wallclock secs ( 7.44 usr +  0.00 sys =  7.44 CPU) @ 1344086.02/s
(n=10000000)
    ||= do: 11 wallclock secs (10.52 usr +  0.00 sys = 10.52 CPU) @ 950570.34/s
(n=10000000)
                   Rate post unless  assign       ?: pre unless  ||= do      ||=
post unless -2.16e+06/s          --  -1207%    -766%      -593%   -328%    -261%
assign         195542/s       -109%      --     -40%       -55%    -79%     -85%
?:             324992/s       -115%     66%       --       -26%    -66%     -76%
pre unless     439367/s       -120%    125%      35%         --    -54%     -67%
||= do         950570/s       -144%    386%     192%       116%      --     -29%
||=           1344086/s       -162%    587%     314%       206%     41%       --

test with $i set to 0
Benchmark: timing 10000000 iterations of ?:, assign, post unless, pre unless, ||=,
||= do...
        ?:: 67 wallclock secs (67.43 usr +  0.00 sys = 67.43 CPU) @ 148301.94/s
(n=10000000)
    assign: 51 wallclock secs (50.63 usr +  0.00 sys = 50.63 CPU) @ 197511.36/s
(n=10000000)
post unless: 26 wallclock secs (26.02 usr +  0.00 sys = 26.02 CPU) @ 384319.75/s
(n=10000000)
pre unless: 46 wallclock secs (44.48 usr +  0.00 sys = 44.48 CPU) @ 224820.14/s
(n=10000000)
       ||=: 48 wallclock secs (47.47 usr +  0.00 sys = 47.47 CPU) @ 210659.36/s
(n=10000000)
    ||= do: 68 wallclock secs (66.96 usr +  0.00 sys = 66.96 CPU) @ 149342.89/s
(n=10000000)
                Rate        ?:   ||= do   assign      ||= pre unless post unless
?:          148302/s        --      -1%     -25%     -30%       -34%        -61%
||= do      149343/s        1%       --     -24%     -29%       -34%        -61%
assign      197511/s       33%      32%       --      -6%       -12%        -49%
||=         210659/s       42%      41%       7%       --        -6%        -45%
pre unless  224820/s       52%      51%      14%       7%         --        -42%
post unless 384320/s      159%     157%      95%      82%        71%          --
--------------------Output2--------------------
test with $i set to 1
Benchmark: timing 10000000 iterations of ?:, assign, post unless, pre unless, ||=,
||= do...
        ?:: 20 wallclock secs (17.57 usr +  0.01 sys = 17.58 CPU) @ 568828.21/s
(n=10000000)
    assign: 15 wallclock secs (14.64 usr +  0.00 sys = 14.64 CPU) @ 683060.11/s
(n=10000000)
dna2.chem.ou.edu%
dna2.chem.ou.edu%
dna2.chem.ou.edu%
dna2.chem.ou.edu% time !!
time OTPerl_benchmark.pl
test with $i set to 1
Benchmark: timing 10000000 iterations of ?:, assign, post unless, pre unless, ||=,
||= do...
        ?:: 12 wallclock secs (11.78 usr +  0.00 sys = 11.78 CPU) @ 848896.43/s
(n=10000000)
    assign:  8 wallclock secs ( 8.78 usr +  0.00 sys =  8.78 CPU) @ 1138952.16/s
(n=10000000)
post unless:  5 wallclock secs ( 4.20 usr +  0.00 sys =  4.20 CPU) @ 2380952.38/s
(n=10000000)
pre unless:  6 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 1872659.18/s
(n=10000000)
       ||=: -1 wallclock secs (-1.45 usr +  0.00 sys = -1.45 CPU) @ -6896551.72/s
(n=10000000)
            (warning: too few iterations for a reliable count)
    ||= do:  2 wallclock secs ( 2.37 usr +  0.00 sys =  2.37 CPU) @ 4219409.28/s
(n=10000000)
                   Rate      ||=       ?:  assign pre unless post unless  ||= do
||=         -6.90e+06/s       --    -912%   -706%      -468%       -390%   -263%
?:             848896/s    -112%       --    -25%       -55%        -64%    -80%
assign        1138952/s    -117%      34%      --       -39%        -52%    -73%
pre unless    1872659/s    -127%     121%     64%         --        -21%    -56%
post unless   2380952/s    -135%     180%    109%        27%          --    -44%
||= do        4219409/s    -161%     397%    270%       125%         77%      --

test with $i set to 0
Benchmark: timing 10000000 iterations of ?:, assign, post unless, pre unless, ||=,
||= do...
        ?:: 11 wallclock secs (10.98 usr +  0.00 sys = 10.98 CPU) @ 910746.81/s
(n=10000000)
    assign:  9 wallclock secs ( 8.33 usr +  0.00 sys =  8.33 CPU) @ 1200480.19/s
(n=10000000)
post unless: 14 wallclock secs (13.37 usr +  0.00 sys = 13.37 CPU) @ 747943.16/s
(n=10000000)
pre unless: 18 wallclock secs (18.28 usr +  0.00 sys = 18.28 CPU) @ 547045.95/s
(n=10000000)
       ||=: 10 wallclock secs (10.24 usr +  0.00 sys = 10.24 CPU) @ 976562.50/s
(n=10000000)
    ||= do:  9 wallclock secs ( 9.86 usr +  0.00 sys =  9.86 CPU) @ 1014198.78/s
(n=10000000)
                 Rate pre unless post unless       ?:      ||=   ||= do   assign
pre unless   547046/s         --        -27%     -40%     -44%     -46%     -54%
post unless  747943/s        37%          --     -18%     -23%     -26%     -38%
?:           910747/s        66%         22%       --      -7%     -10%     -24%
||=          976562/s        79%         31%       7%       --      -4%     -19%
||= do      1014199/s        85%         36%      11%       4%       --     -16%
assign      1200480/s       119%         61%      32%      23%      18%       --
--------------------Output3--------------------
test with $i set to 1
Benchmark: timing 10000000 iterations of ?:, assign, post unless, pre unless, ||=,
||= do...
        ?:: 12 wallclock secs (11.84 usr +  0.00 sys = 11.84 CPU) @ 844594.59/s
(n=10000000)
    assign: 10 wallclock secs (10.47 usr +  0.00 sys = 10.47 CPU) @ 955109.84/s
(n=10000000)
post unless:  0 wallclock secs (-0.12 usr +  0.00 sys = -0.12 CPU) @ -83333333.33/s
(n=10000000)
            (warning: too few iterations for a reliable count)
pre unless:  1 wallclock secs ( 1.65 usr +  0.00 sys =  1.65 CPU) @ 6060606.06/s
(n=10000000)
       ||=:  6 wallclock secs ( 5.09 usr +  0.00 sys =  5.09 CPU) @ 1964636.54/s
(n=10000000)
    ||= do:  4 wallclock secs ( 3.95 usr +  0.00 sys =  3.95 CPU) @ 2531645.57/s
(n=10000000)
                   Rate post unless       ?:   assign     ||=  ||= do pre unless
post unless -8.33e+07/s          --   -9967%   -8825%  -4342%  -3392%     -1475%
?:             844595/s       -101%       --     -12%    -57%    -67%       -86%
assign         955110/s       -101%      13%       --    -51%    -62%       -84%
||=           1964637/s       -102%     133%     106%      --    -22%       -68%
||= do        2531646/s       -103%     200%     165%     29%      --       -58%
pre unless    6060606/s       -107%     618%     535%    208%    139%         --

test with $i set to 0
Benchmark: timing 10000000 iterations of ?:, assign, post unless, pre unless, ||=,
||= do...
        ?:: 17 wallclock secs (15.22 usr +  0.00 sys = 15.22 CPU) @ 657030.22/s
(n=10000000)
    assign: 12 wallclock secs (11.28 usr +  0.00 sys = 11.28 CPU) @ 886524.82/s
(n=10000000)
post unless: 10 wallclock secs (10.62 usr +  0.00 sys = 10.62 CPU) @ 941619.59/s
(n=10000000)
pre unless: 13 wallclock secs (12.83 usr +  0.00 sys = 12.83 CPU) @ 779423.23/s
(n=10000000)
       ||=: 16 wallclock secs (15.34 usr +  0.00 sys = 15.34 CPU) @ 651890.48/s
(n=10000000)
    ||= do: 16 wallclock secs (16.21 usr +  0.00 sys = 16.21 CPU) @ 616903.15/s
(n=10000000)
                Rate   ||= do       ||=       ?: pre unless   assign post unless
||= do      616903/s       --       -5%      -6%       -21%     -30%        -34%
||=         651890/s       6%        --      -1%       -16%     -26%        -31%
?:          657030/s       7%        1%       --       -16%     -26%        -30%
pre unless  779423/s      26%       20%      19%         --     -12%        -17%
assign      886525/s      44%       36%      35%        14%       --         -6%
post unless 941620/s      53%       44%      43%        21%       6%          --
--------------------Perl -V Output--------------------
% perl -V
Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
  Platform:
    osname=solaris, osvers=2.8, archname=sun4-solaris
    uname='sunos dna2.chem.ou.edu 5.8 generic_108528-08 sun4u sparc sunw,ultra-5_10
'
    config_args='-Dcc=gcc'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='gcc', ccflags ='-fno-strict-aliasing -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O',
    cppflags='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='2.95.2 19991024 (release)', gccosandvers='solaris2.7'
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib '
    libpth=/usr/local/lib /usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldl -lm -lc
    perllibs=-lsocket -lnsl -ldl -lm -lc
    libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-fPIC', lddlflags='-G -L/usr/local/lib'


Characteristics of this binary (from libperl):
  Compile-time options: USE_LARGE_FILES
  Built under solaris
  Compiled at Jul 26 2001 10:53:59
  @INC:
    /usr/local/lib/perl5/5.6.1/sun4-solaris
    /usr/local/lib/perl5/5.6.1
    /usr/local/lib/perl5/site_perl/5.6.1/sun4-solaris
    /usr/local/lib/perl5/site_perl/5.6.1
    /usr/local/lib/perl5/site_perl
    .
--------------------End of Output--------------------


"Peter J . Holzer" wrote:

> On 2001-09-13 17:01:20 -0400, Rob Ransbottom wrote:
> > On Thu, 13 Sep 2001 [EMAIL PROTECTED] wrote:
> >
> > > I have a minor optimization suggestion.  Instead of this:
> >
> > > unless ($sth_routine_name) {
> > >     #setup statement handle
> > > }
> >
> > > Do this:
> > >
> > > $sth_routine_name ||= $dbh->prepare(....);
> >
> > > pretty sure this is more efficient than setting up the unless block.
>
> First, I should say that I think that the speed differences between
> these methods are IMHO negligible in almost all situations and that one
> should use the most readable version (I like ||=, btw. It looks neat).
>
> > True, these are about the same, but in order of speed:
> >
> >      $i ||= 1;
> >      $i = 1 unless $i;
> >      $i = $i ? 1 : 0;
> >
> > All faster than:
> >
> >      unless ( $i) { $i = 0;}
>
> Just for fun I tried that with perl 5.6.0 under Linux on a Pentium
> II/233, and unless ( $i) { $i = 1; } came out fastest (yes, I changed 0
> to 1 here. Otherwise it would not be equivalent to $i ||= 1):
>
>                  Rate          ?: post unless         ||=      ||= do pre unless
> ?:          1440922/s          --        -53%        -57%        -76%       -80%
> post unless 3086420/s        114%          --         -7%        -49%       -57%
> ||=         3333333/s        131%          8%          --        -45%       -53%
> ||= do      6024096/s        318%         95%         81%          --       -16%
> pre unless  7142857/s        396%        131%        114%         19%         --
>
> here is the script:
>
> #!/usr/local/bin/perl -w
> use strict;
> use Benchmark qw/cmpthese/;
>
> my $i = 1;
> cmpthese(5000_000, {
>     '||='               => sub { $i ||= 1 },
>     '||= do'            => sub { $i ||= do { 1 } },
>     'post unless'       => sub { $i = 1 unless $i },
>     'pre unless'        => sub { unless ($i) { $i = 1 } },
>     '?:'                => sub { $i = $i ? $i : 1 },
> }
> );
>
> I don't trust the Benchmark module, though. It reported one negative
> "wallclock secs" value at every run, and the wallclock and usr times
> differ too much for a test which should essentially take 100% user time.
>
>         hp
>
> --
>    _  | Peter J. Holzer      | My definition of a stupid question is
> |_|_) | Sysadmin WSR / LUGA  | "a question that if you're embarassed to
> | |   | [EMAIL PROTECTED]        | ask it, you stay stupid."
> __/   | http://www.hjp.at/   |    -- Tim Helck on dbi-users, 2001-07-30
>

--

James D. White   ([EMAIL PROTECTED])
Department of Chemistry and Biochemistry
University of Oklahoma
620 Parrington Oval, Room 313
Norman, OK 73019-3051
Phone: (405) 325-4912, FAX: (405) 325-7762