a new utility: failure tests sequence finder

Stas Bekman Fri, 07 Dec 2001 08:43:40 -0800

[crossposting httpd/modperl test/dev lists, since it's relevant to both]

Intro: When we test a stateless machine (i.e. all tests are
independent), running all the tests once is enough to ensure that
everything tested works properly. However, when a state machine is
tested (i.e. where a run of one test may influence another test), it's
not enough to run all the tests once to know that the tested features
actually work. It's quite possible that if the same tests are run in
a different order and/or repeated a few times, some tests will fail.
This usually happens when some tests don't restore the system under
test to its pristine state at the end of the run, which may
influence other tests that rely on starting from a pristine state
when that is no longer true. It's even possible that a single test
will fail when run two or three times in a row.
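
To illustrate, here is a minimal sketch of such a dependency. It is
not part of the utility: the two tests and the %config hash (standing
in for the state of the server under test) are made up for the
example.

```perl
use strict;
use warnings;

# hypothetical shared state, standing in for the server under test
my %config = (timeout => 30);

# each "test" returns true on success
my %tests = (
    # passes only if it starts from the pristine state
    check_default => sub { $config{timeout} == 30 },
    # changes the state and never restores it
    set_timeout   => sub { $config{timeout} = 5; $config{timeout} == 5 },
);

sub run_sequence {
    my @order = @_;
    %config = (timeout => 30);    # pristine state before each sequence
    return map { $tests{$_}->() ? "$_: ok" : "$_: FAILED" } @order;
}

print join(", ", run_sequence(qw(check_default set_timeout))), "\n";
print join(", ", run_sequence(qw(set_timeout check_default))), "\n";
```

Both tests pass in the first order, while in the second order
check_default fails, although neither test has changed.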


Solution: To reduce the possibility of such dependency errors, it's
important to run the tests in random order, repeated many times with
many different srand seeds. Of course, if no failures are spotted,
that doesn't mean that there are no inter-test dependencies that may
cause a failure in production. But random testing definitely helps
to spot many problems and gives better test coverage.
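
The role of the seed can be sketched as follows. This is only an
illustration, not the way t/TEST implements -order=random: the
shuffled() helper and the test names are assumptions made for the
example.

```perl
use strict;
use warnings;

# Fisher-Yates shuffle driven by Perl's built-in rand(), so that
# srand($seed) fully determines the resulting order
sub shuffled {
    my($seed, @tests) = @_;
    srand($seed);
    for (my $i = $#tests; $i > 0; $i--) {
        my $j = int rand($i + 1);
        @tests[$i, $j] = @tests[$j, $i];
    }
    return @tests;
}

# hypothetical test names
my @tests = qw(apache/read apache/write modules/cgi hooks/access);

for my $seed (1 .. 3) {
    print "seed $seed: @{[ shuffled($seed, @tests) ]}\n";
}
```

Running the same seed again on the same perl gives the same order,
which is what makes a randomly found failure repeatable.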

Resolving sequence problems: When this kind of testing is used and a
failure is detected, there are two problems:
* The first is to be able to reproduce the problem, so that when we
   think we have fixed it, we can verify the fix. This one is easy:
   just remember the sequence of tests run up to the failed test and
   rerun the same sequence after the problem has been fixed.
* The second is to understand the cause of the problem. If during
   the random test the failure happened after running 400 tests, how
   can we possibly know which of the previously run tests caused the
   failure of test 401? Chances are that most of the tests were
   clean and have no inter-dependency problems. Therefore it'd be
   very helpful if we could reduce the long sequence to a minimum,
   preferably 1 or 2 tests. That's when we can try to understand the
   cause of the detected problem.

   This program attempts to solve both problems, and at the end of
   each iteration prints a minimal sequence of tests that causes a
   failure. This doesn't always succeed, but works in many cases.
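
Remembering the sequence boils down to reading the harness output up
to the first failure. A small sketch, using the same pattern as the
attached script; the sample output lines and the split_sequence()
helper are made up, and the real output format may differ:

```perl
use strict;
use warnings;

# made-up harness output; the real format may differ
my @output = (
    "apache/read.........ok\n",
    "modules/cgi.........ok\n",
    "hooks/access........FAILED test 2\n",
    "apache/write........ok\n",
);

# returns the tests that passed before the first failure, and the
# name of the failed test (empty string if nothing failed)
sub split_sequence {
    my @lines = @_;
    my @passed;
    my $failed = '';
    for my $line (@lines) {
        next unless $line =~ /^(\S+?)\.+(ok|FAILED)/;
        if ($2 eq 'ok') { push @passed, $1 }
        else            { $failed = $1; last }
    }
    return (\@passed, $failed);
}

my($passed, $failed) = split_sequence(@output);
print "rerun with: @$passed $failed\n";
```

This prints "rerun with: apache/read modules/cgi hooks/access", i.e.
the list of tests to pass back to the harness in that order.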

This utility:
1. runs the tests randomly until the first failure is detected.
2. then tries to reduce that sequence of tests to a minimum that
    still causes the same failure.

X. (XXX: not yet) then reruns the minimal sequence in verbose
    mode and saves the output
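
The reduction in step 2 can be sketched with a simple greedy
strategy. Note that this is not the algorithm of the attached script,
which tries random windows of the sequence; it only shows the idea.
The still_fails() stub stands in for really rerunning the tests, and
all the test names are hypothetical.

```perl
use strict;
use warnings;

# stand-in for really running the tests: here the last test fails
# whenever the made-up test "modules/cgi" has run before it
sub still_fails {
    my @sequence = @_;
    return scalar grep { $_ eq 'modules/cgi' } @sequence;
}

# drop one test at a time, keeping each removal that still fails
sub reduce_greedy {
    my @good = @_;
    my $i = 0;
    while ($i < @good) {
        my @try = @good[grep { $_ != $i } 0 .. $#good];
        if (still_fails(@try)) { @good = @try }   # removal was harmless
        else                   { $i++ }           # this test is needed
    }
    return @good;
}

my @passed = qw(apache/read modules/cgi apache/write hooks/push);
print "minimal sequence: @{[ reduce_greedy(@passed) ]} hooks/access\n";
```

With this stub the four preceding tests are reduced to the single one
that matters, and the sketch prints
"minimal sequence: modules/cgi hooks/access".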

If the program's first argument is NONSTOP, it will run in a loop
MAX_ITER times or until aborted. Each iteration uses a different
random seed, which should potentially detect different failures.
After each iteration a report is written to a file with a name of
the form: smoke-report-Fri_Dec__7_19:28:57_2001.txt
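
The file name is simply the start time with the whitespace replaced
by underscores, as the following sketch shows. It mirrors what
report_start() in the attached script does; the report_name() helper
itself exists only in this example.

```perl
use strict;
use warnings;

# turn a localtime-style timestamp into a report file name by
# replacing every whitespace character with an underscore
sub report_name {
    my $time = shift;
    (my $str_time = $time) =~ s/\s/_/g;
    return "smoke-report-$str_time.txt";
}

print report_name("Fri Dec  7 19:28:57 2001"), "\n";
print report_name(scalar localtime), "\n";
```

The first line printed is smoke-report-Fri_Dec__7_19:28:57_2001.txt;
the double underscore comes from the two spaces that localtime puts
before a single-digit day.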

So if, before going to sleep, you used to run:
     % t/TEST
now you should run:
     % util/smokerandom.pl NONSTOP
from the same directory.

The utility is not in CVS yet, so I've attached it here.

_____________________________________________________________________
Stas Bekman             JAm_pH      --   Just Another mod_perl Hacker
http://stason.org/      mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/

#!/usr/bin/perl

use warnings FATAL => 'all';
use strict;

#
# Intro: When we test a stateless machine (i.e. all tests are
# independent), running all the tests once is enough to ensure that
# everything tested works properly. However, when a state machine is
# tested (i.e. where a run of one test may influence another test),
# it's not enough to run all the tests once to know that the tested
# features actually work. It's quite possible that if the same tests
# are run in a different order and/or repeated a few times, some tests
# will fail. This usually happens when some tests don't restore the
# system under test to its pristine state at the end of the run, which
# may influence other tests that rely on starting from a pristine
# state when that is no longer true. It's even possible that a single
# test will fail when run two or three times in a row.
#
# Solution: To reduce the possibility of such dependency errors, it's
# important to run the tests in random order, repeated many times with
# many different srand seeds. Of course, if no failures are spotted,
# that doesn't mean that there are no inter-test dependencies that may
# cause a failure in production. But random testing definitely helps
# to spot many problems and gives better test coverage.
#
# Resolving sequence problems: When this kind of testing is used and a
# failure is detected, there are two problems:
# * The first is to be able to reproduce the problem, so that when we
#   think we have fixed it, we can verify the fix. This one is easy:
#   just remember the sequence of tests run up to the failed test and
#   rerun the same sequence after the problem has been fixed.
# * The second is to understand the cause of the problem. If during
#   the random test the failure happened after running 400 tests, how
#   can we possibly know which of the previously run tests caused the
#   failure of test 401? Chances are that most of the tests were
#   clean and have no inter-dependency problems. Therefore it'd be
#   very helpful if we could reduce the long sequence to a minimum,
#   preferably 1 or 2 tests. That's when we can try to understand the
#   cause of the detected problem.
#
#   This program attempts to solve both problems, and at the end of
#   each iteration prints a minimal sequence of tests that causes a
#   failure. This doesn't always succeed, but works in many cases.
#
# This utility:
# 1. runs the tests randomly until the first failure is detected.
# 2. then tries to reduce that sequence of tests to a minimum that
#    still causes the same failure.
#
# X. (XXX: not yet) then reruns the minimal sequence in verbose
#    mode and saves the output
#
# If the program's first argument is NONSTOP, it will run in a loop
# MAX_ITER times or until aborted. Each iteration uses a different
# random seed, which should potentially detect different failures.
# After each iteration a report is written to a file with a name of
# the form: smoke-report-Fri_Dec__7_19:28:57_2001.txt
#
# So if, before going to sleep, you used to run:
#   % t/TEST
# now you should run:
#   % util/smokerandom.pl NONSTOP
# from the same directory.

use POSIX qw(SIGINT SIGHUP);

use constant DEBUG     => 1;

# how many various seeds to try in NONSTOP mode
use constant MAX_ITER  => 50;

# if after this number of tries to reduce the number of tests fails we
# give up on more tries
use constant MAX_REDUCE_TRIES => 30;

use subs qw(debug notice);

use FindBin;
chdir "$FindBin::Bin/.." or die "cannot chdir to $FindBin::Bin/..: $!";

################################

# avoid "my $x = ... if ...", whose behavior is undefined when the
# condition is false
my $run_non_stop = (@ARGV && $ARGV[0] eq 'NONSTOP') ? shift @ARGV : 0;

#die "must pass the command line to run!" unless @ARGV;
#my $command = "@ARGV";
my $command = 't/TEST -times=20 -order=random';

my %seen;

my $report_fh = report_start();
my $iter = 0;
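do {
    $iter++;
    %seen = (); # reset the sequences already tried by reduce_stream()
    run_iter($iter, $command);
    # without NONSTOP only one iteration is run; the loop is left
    # normally (not via exit) so that the report still gets finished
} while ($run_non_stop && $iter < MAX_ITER);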
report_finish($report_fh);

sub run_iter {
    my($iter, $command) = @_;

    my $reduce_iter = 0;
    my @good = ();
    notice sprintf "[%03d-%02d-%02d] trying all tests",
        $iter, $reduce_iter, 0;

    # first time run all tests (so we don't specify them)
    my $bad = run_test($iter, $reduce_iter, $command, \@good);
    return unless $bad;

    # XXX: need to handle this generically 
    $command =~ s/-order=\S+//;
    $command =~ s/-times=\d+//;

    # positive failure: try to shrink the list of tests that ran
    # before the failed one
    while (@good > 1) {
        my $tries = 0;
        my $reduce_sub = reduce_stream(\@good);
        $reduce_iter++;
        while ($tries++ < MAX_REDUCE_TRIES) {
            my @try = @{ $reduce_sub->() };
            my $try_command = "$command @try $bad";
            notice sprintf "[%03d-%02d-%02d] trying %d tests",
                $iter, $reduce_iter, $tries, scalar(@try);
            my @ok = ();
            my $new_bad = run_test($iter, $reduce_iter, $try_command, \@ok);
            # a reduction counts only if the same test still fails
            # and the sequence leading to it actually got shorter
            if ($new_bad eq $bad && @ok < @good) {
                # successful reduction
                @good = @ok;
                $tries = 0;
                print "!!!!!!!!!!!!!!!!!!!! reduced !!!!!!!!!!!!!!!!\n";
                last;
            }
        }
        # last round of reducing has failed, so we give up
        last if $tries >= MAX_REDUCE_TRIES;
    }

    my $sequence = "$command @good $bad";
    my $tests = @good + 1;
    report($iter, $reduce_iter, $sequence, $tests);
}

sub run_test {
    my($iter, $count, $command, $ra_ok) = @_;
    my $bad = '';

    notice $command;

    #$SIG{PIPE} = 'IGNORE';
    $SIG{PIPE} = sub { die "pipe broke" };
    open my $pipe, "$command |" or die "cannot fork: $!";
#    open my $pipe, "$command 2>&1|" or die "cannot fork: $!";
    my $oldfh = select($pipe); $| = 1; select($oldfh);

    while (my $t = <$pipe>) {
        next unless $t =~ /^(\S+?)\.+(ok|FAILED)/;
        push(@$ra_ok, $1), next if $2 eq 'ok';

        # failure
        $bad = $1;
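        debug "$1: $2";
        last;
    }
    # close() on a pipe also returns false when the command exits with
    # a non-zero status, which is expected after a failed test, so die
    # only on a real error ($! is 0 when only the exit status was bad)
    unless (close $pipe) {
        die "cannot close pipe to '$command': $!" if $!;
    }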
    kill_proc();

    return $bad;

}

# my $sub = reduce_stream(\@items);
sub reduce_stream {
    my @items = @{+shift};
    my $c = 0;
    my $items = @items;
    my $odd = $items % 2 ? 1 : 0;
#    my $left  = int($items/2);
#    my $right = $items - $left;
    return sub {
        $c++; # will be used in the future as a real stream's counter
        # cannot reduce 1 item anymore
        return \@items if $items == 1;

        my $repeat = 0;
        my $max_repeat_tries = 50; # avoid seen sequences
        my @try = ();
        do {
            # try to use a random window size alg
            my $left = int rand($items);
            $left = $items - 1 if $left == $items - 1;
            my $right = $left + int rand($items - $left);
            $right = $items - 1 if $right >= $items;
            @try = @items[$left..$right];
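        # join with a space so that different sequences cannot
        # produce the same key (test names contain no whitespace)
        } while ($seen{join ' ', @try}++ && $repeat++ < $max_repeat_tries);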
        return \@try;
    }
}

sub report_start {
    my $time = scalar localtime;
    (my $str_time = $time) =~ s/\s/_/g;
    my $file = "smoke-report-$str_time.txt";
    open my $fh, ">$file" or die "cannot open $file for writing: $!";
    print $fh "Special Tests Sequence Failure Finder Report from $time\n";
    print $fh "Started at: $time\n";
    print $fh "-" x 74, "\n";
    $SIG{INT} = sub {print $fh "!!! Aborted by user\n";
                     report_finish($fh);
                     exit;};
    return $fh;
}

sub report {
    my ($iter, $reduce_iter, $sequence, $tests) = @_;
    my @report = ("iteration $iter ($tests tests):\n",
        "\t$sequence\n",
        "(made $reduce_iter reduction attempts)\n\n");
    print @report;
    print $report_fh @report;
}

sub report_finish {
    my($fh) = shift;
    my $time = scalar localtime;
    print $fh "-" x 74, "\n";
    print $fh "Ended at  : $time\n";
    close $fh;
}

sub kill_proc {
    # a hack
    my $file = "t/logs/httpd.pid";
    return unless -f $file;
    my $pid = `cat $file`;
    chomp $pid;
    return unless $pid;
    kill SIGINT, $pid;
}

sub debug {
    print "***: @_\n" if DEBUG;
}

sub notice {
    print "+++: @_\n";
}
