Turned out to be a rather large refactoring and change.

This obviously will require quite a few more tests.

In short, what this does:
------------------------
Instead on deciding whether a port built by the apparition of packages, it
tracks the job builder status better.

If the job ended correctly, then the packages should be there.

If not, some of the packages may be there, but we should always register
an error (or, in case a machine hung up, retry).

This gives "another errorlist": stuff where packages should be there, but
are not, because of NFS.

So, we now have three lists corresponding (more or less) to live locks:
- outside locks.
- errors
- nfs hung packages.

The code gets refactored so that those are handled in a similar way, apart
from small strategy decision (in particular, nfs hangs only unlock early
when *every* package is accounted for), and makes some of the code a bit
more streamlined.

Why keep locks for nfs held jobs ?
Well, it makes things more visible, for starters.
The recent addition of "clean" to locks (for always clean) also means that
those extra locks won't prevent junk from running correctly.

More importantly, it gives bulk-builders a chance to rectify some specific
issue: sometimes, you update a port which is known to be failing before it
gets built. Then the port builds, but doesn't create the right packages because
of revision bumps.  dpb doesn't know better, and will keep these nfs locked
"forever". By handling this like other locks, this makes it possible for
the bulk-builder to rm the lock when he sees that stuff in the H report line.


There are several reasons behind this change.
--------------------------------------------
For starters, the old "waiting for nfs" hack turns out to be VERY expensive
when the nfs server takes too long to answer (yes, on a crowded server,
it managed to outlive its ten minutes). Also, tracking the end status is
more correct for reporting real errors, so that actual packages that show
up late only due to nfs do not add noise in the form of extra errors in
the engine.log.

----------------

However, this does require a bit more tests!  First, it is not a very 
small change. More importantly, this is a diff that DEALS ALMOST EXCLUSIVELY
in error conditions.  So testing every bizarre case is going to take a while.

I'd especially love ppl running dpb on busy nfs servers to try this out.
Watch out for the H=   line in the display (and "H: " lines in the engine)


Index: Engine.pm
===================================================================
RCS file: /home/openbsd/cvs/ports/infrastructure/lib/DPB/Engine.pm,v
retrieving revision 1.83
diff -u -p -r1.83 Engine.pm
--- Engine.pm   30 Jun 2013 16:45:16 -0000      1.83
+++ Engine.pm   15 Jul 2013 12:58:53 -0000
@@ -19,6 +19,131 @@ use strict;
 use warnings;
 
 use DPB::Limiter;
+package DPB::ErrorList::Base;
+
+sub new
+{
+       my $class = shift;
+       bless [], $class;
+}
+
+sub recheck
+{
+       my ($list, $engine) = @_;
+       return if @$list == 0;
+       my $locker = $engine->{locker};
+
+       my @keep = ();
+       while (my $v = shift @$list) {
+               if ($list->unlock_early($v, $engine)) {
+                       $locker->unlock($v);
+                       next;
+               }
+               if ($locker->locked($v)) {
+                       push(@keep, $v);
+               } else {
+                       $list->reprepare($v, $engine);
+               }
+       }
+       push(@$list, @keep) if @keep != 0;
+}
+
+sub stringize
+{
+       my $list = shift;
+       my @l = ();
+       for my $e (@$list) {
+               my $s = $e->logname;
+               if (defined $e->{host} && !$e->{host}->is_localhost) {
+                       $s .= "(".$e->{host}->name.")";
+               }
+               if (defined $e->{info} && $e->{info}->has_property('nojunk')) {
+                       $s .= '!';
+               }
+               push(@l, $s);
+       }
+       return join(' ', @l);
+}
+
+sub reprepare
+{
+       my ($class, $v, $engine) = @_;
+       $v->requeue($engine);
+}
+
+package DPB::ErrorList;
+our @ISA = (qw(DPB::ErrorList::Base));
+
+sub unlock_early
+{
+       my ($list, $v, $engine) = @_;
+       if ($v->unlock_conditions($engine)) {
+               $v->requeue($engine);
+               return 1;
+       } else {
+               return 0;
+       }
+}
+
+sub reprepare
+{
+       my ($list, $v, $engine) = @_;
+       $engine->rescan($v);
+}
+
+package DPB::LockList;
+our @ISA = (qw(DPB::ErrorList::Base));
+sub unlock_early
+{
+       &DPB::ErrorList::unlock_early;
+}
+
+sub stringize
+{
+       my $list = shift;
+       my @l = ();
+       my $done = {};
+       for my $e (@$list) {
+               my $s = $e->lockname;
+               if (!defined $done->{$s}) {
+                       push(@l, $s);
+                       $done->{$s} = 1;
+               }
+       }
+       return join(' ', @l);
+}
+
+package DPB::NFSList;
+our @ISA = (qw(DPB::ErrorList::Base));
+
+sub reprepare
+{
+       &DPB::ErrorList::reprepare;
+}
+
+sub unlock_early
+{
+       my ($list, $v, $engine) = @_;
+       my $okay = 1;
+       my $h = $engine->{nfs}{$v};
+       while (my ($k, $w) = each %$h) {
+               if ($engine->{builder}->check($w)) {
+                       $engine->mark_as_done($w);
+                       delete $h->{$w};
+               } else {
+                       $okay = 0;
+                       # infamous
+                       $engine->log('H', $v);
+               }
+       }
+       if ($okay) {
+               delete $engine->{nfs}{$v};
+       }
+       return $okay;
+}
+
+
+
 package DPB::SubEngine;
 sub new
 {
@@ -51,6 +176,12 @@ sub is_done_quick
        return $self->is_done(@_);
 }
 
+sub is_done_or_enqueue
+{
+       my $self =shift;
+       return $self->is_done(@_);
+}
+
 sub sorted
 {
        my ($self, $core) = @_;
@@ -111,6 +242,7 @@ sub start
 {
        my $self = shift;
        my $core = $self->get_core;
+
        if (@{$self->{engine}{requeued}} > 0) {
                $self->{engine}->rebuild_info($core);
                return;
@@ -200,19 +332,17 @@ sub done
 
 sub end
 {
-       my ($self, $core, $v) = @_;
+       my ($self, $core, $v, $fail) = @_;
        my $e = $core->mark_ready;
-       if ($self->is_done($v)) {
-               $self->{engine}{locker}->unlock($v);
-               $self->end_build($v);
-               $core->success;
-       } else {
+       if ($fail) {
                $core->failure;
                if (!$e || $core->{status} == 65280) {
                        $self->add($v);
                        $self->{engine}{locker}->unlock($v);
                        $self->log('N', $v);
                } else {
+                       # XXX in case some packages got built
+                       $self->is_done($v);
                        unshift(@{$self->{engine}{errors}}, $v);
                        $v->{host} = $core->host;
                        $self->log('E', $v);
@@ -220,9 +350,17 @@ sub end
                                $self->end_build($v);
                        }
                }
+       } else {
+               if ($self->is_done_or_enqueue($v)) {
+                       $self->{engine}{locker}->unlock($v);
+               } else {
+                       push(@{$self->{nfslist}}, $v);
+               }
+               $self->end_build($v);
+               $core->success;
+               $self->done($v);
+               $self->{engine}->flush;
        }
-       $self->done($v);
-       $self->{engine}->flush;
 }
 
 sub dump
@@ -244,6 +382,7 @@ sub new
        my $o = $class->SUPER::new($engine);
        $o->{builder} = $builder;
        $o->{toinstall} = [];
+       $o->{nfs} = {};
        return $o;
 }
 
@@ -295,6 +434,21 @@ sub mark_as_done
        $self->remove($v);
 }
 
+sub is_done_or_enqueue
+{
+       my ($self, $v) = @_;
+       my $okay = 1;
+       for my $w ($v->build_path_list) {
+               if ($self->{builder}->check($w)) {
+                       $self->mark_as_done($w);
+               } else {
+                       $self->{nfs}{$v}{$w} = $w;
+                       $okay = 0;
+               }
+       }
+       return $okay;
+}
+
 sub is_done
 {
        my ($self, $v) = @_;
@@ -342,7 +496,11 @@ sub start_build
        my ($self, $v, $core, $lock) = @_;
        $self->log('J', $v, " ".$core->hostname);
        $self->{engine}{affinity}->start($v, $core);
-       $self->{builder}->build($v, $core, $lock, sub {$self->end($core, $v)});
+       $self->{builder}->build($v, $core, $lock, 
+           sub {
+               my $fail = shift;
+               $self->end($core, $v, $fail);
+           });
 }
 
 sub end_build
@@ -405,7 +563,9 @@ sub start_build
        my ($self, $v, $core, $lock) = @_;
        $self->log('J', $v);
        DPB::Fetch->fetch($self->{engine}{logger}, $v, $core,
-           sub { $self->end($core, $v)});
+           sub { 
+               $self->end($core, $v, $core->{status});
+           });
 }
 
 sub end_build
@@ -429,8 +589,9 @@ sub new
            locker => $state->locker,
            logger => $state->logger,
            affinity => $state->{affinity},
-           errors => [],
-           locks => [],
+           errors => DPB::ErrorList->new,
+           locks => DPB::LockList->new,
+           nfslist => DPB::NFSList->new,
            ts => time(),
            requeued => [],
            ignored => []}, $class;
@@ -447,11 +608,9 @@ sub new
 sub recheck_errors
 {
        my $self = shift;
-       if (@{$self->{errors}} != 0 || @{$self->{locks}} != 0) {
-               $self->{locker}->recheck_errors($self);
-               return 1;
-       }
-       return 0;
+       $self->{errors}->recheck($self);
+       $self->{locks}->recheck($self);
+       $self->{nfslist}->recheck($self);
 }
 
 sub log_no_ts
@@ -488,38 +647,6 @@ sub count
        }
 }
 
-sub errors_string
-{
-       my ($self, $name) = @_;
-       my @l = ();
-       for my $e (@{$self->{$name}}) {
-               my $s = $e->logname;
-               if (defined $e->{host} && !$e->{host}->is_localhost) {
-                       $s .= "(".$e->{host}->name.")";
-               }
-               if (defined $e->{info} && $e->{info}->has_property('nojunk')) {
-                       $s .= '!';
-               }
-               push(@l, $s);
-       }
-       return join(' ', @l);
-}
-
-sub lock_errors_string
-{
-       my ($self, $name) = @_;
-       my @l = ();
-       my $done = {};
-       for my $e (@{$self->{$name}}) {
-               my $s = $e->lockname;
-               if (!defined $done->{$s}) {
-                       push(@l, $s);
-                       $done->{$s} = 1;
-               }
-       }
-       return join(' ', @l);
-}
-
 sub fetchcount
 {
        my ($self, $q, $t)= @_;
@@ -565,8 +692,9 @@ sub report
        return join(" ",
            $self->statline,
            "!=".$self->count("ignored"))."\n".
-           $self->may_add("L=", $self->lock_errors_string("locks")).
-           $self->may_add("E=", $self->errors_string("errors"));
+           $self->may_add("L=", $self->{locks}->stringize).
+           $self->may_add("E=", $self->{errors}->stringize). 
+           $self->may_add("H=", $self->{nfslist}->stringize);
 }
 
 sub stats
Index: Locks.pm
===================================================================
RCS file: /home/openbsd/cvs/ports/infrastructure/lib/DPB/Locks.pm,v
retrieving revision 1.21
diff -u -p -r1.21 Locks.pm
--- Locks.pm    12 Jul 2013 08:07:19 -0000      1.21
+++ Locks.pm    15 Jul 2013 10:52:12 -0000
@@ -156,32 +156,6 @@ sub locked
        return -e $self->lockname($v);
 }
 
-sub recheck_errors
-{
-       my ($self, $engine) = (@_);
-
-       for my $name (qw(errors locks)) {
-               my $e = $engine->{$name};
-               $engine->{$name} = [];
-               while (my $v = shift @$e) {
-                       if ($v->unlock_conditions($engine)) {
-                               $self->unlock($v);
-                               $v->requeue($engine);
-                               next;
-                       }
-                       if ($self->locked($v)) {
-                               push(@{$engine->{$name}}, $v);
-                       } else {
-                               if ($name eq 'errors') {
-                                       $engine->rescan($v);
-                               } else {
-                                       $v->requeue($engine);
-                               }
-                       }
-               }
-       }
-}
-
 sub find_dependencies
 {
        my ($self, $hostname) = @_;
Index: PortBuilder.pm
===================================================================
RCS file: /home/openbsd/cvs/ports/infrastructure/lib/DPB/PortBuilder.pm,v
retrieving revision 1.46
diff -u -p -r1.46 PortBuilder.pm
--- PortBuilder.pm      21 Jun 2013 23:13:37 -0000      1.46
+++ PortBuilder.pm      13 Jul 2013 07:44:45 -0000
@@ -206,7 +206,7 @@ sub build
                close($fh); 
                $self->end_lock($lock, $core, $job); 
                $self->report($v, $job, $core); 
-               &$final_sub;
+               &$final_sub($job->{failed});
            });
        $core->start_job($job, $v);
        if ($job->{parallel}) {
Index: Job/Port.pm
===================================================================
RCS file: /home/openbsd/cvs/ports/infrastructure/lib/DPB/Job/Port.pm,v
retrieving revision 1.109
diff -u -p -r1.109 Port.pm
--- Job/Port.pm 12 Jul 2013 08:07:19 -0000      1.109
+++ Job/Port.pm 13 Jul 2013 08:01:33 -0000
@@ -115,56 +115,6 @@ sub run
 
 sub notime { 0 }
 
-# this code is only necessary thanks to NFS's brain-damage...
-sub make_sure_we_have_packages
-{
-       my ($self, $core) = @_;
-       my $job = $core->job;
-       my $check = 1;
-       # check ALL BUILD_PACKAGES
-       for my $w ($job->{v}->build_path_list) {
-               if (!defined $w->{info}) {
-                       print {$job->{logfh}} ">>> ", $w->fullpkgpath,
-                        " may be missing\n", 
-                        ">>> but it has no associated info so we don't care\n";
-                       next;
-               }
-               if ($w->{info}->is_stub) {
-                       print {$job->{logfh}} ">>> ", $w->fullpkgpath,
-                        " may be missing\n", 
-                        ">>> but it can't be installed, so we don't care\n";
-                       next;
-               }
-               if (!$w->has_fullpkgname) {
-                       print {$job->{logfh}} ">>> ", $w->fullpkgpath,
-                        " may be missing\n", 
-                        ">>> but it has no fullpkgname, so we don't care\n";
-                       next;
-               }
-               my $f = $job->{builder}->pkgfile($w);
-               if (-f $f) {
-                       $job->{builder}->register_package($w);
-               } else {
-                       $check = 0;
-                       print {$job->{logfh}} ">>> Missing $f\n";
-               }
-       }
-       return if $check;
-       if (!defined $job->{waiting}) {
-               $job->{waiting} = 0;
-       }
-       if ($core->prop->{wait_timeout}) {
-               if ($job->{waiting}*10 > $core->prop->{wait_timeout}) {
-                       print {$job->{logfh}} ">>> giving up\n";
-               } else {
-                       print {$job->{logfh}} ">>> waiting 10 seconds\n";
-                       $job->insert_tasks(
-                           DPB::Task::Port::VerifyPackages->new(
-                               'waiting-for-nfs '.$job->{waiting}++));
-               }
-       }
-}
-
 package DPB::Task::Port;
 our @ISA = qw(DPB::Task::BasePort);
 
@@ -675,9 +625,7 @@ our @ISA = qw(DPB::Task::Port::BaseClean
 sub finalize
 {
        my ($self, $core) = @_;
-       if (!$self->requeue($core)) {
-               $self->make_sure_we_have_packages($core);
-       }
+       $self->requeue($core);
        $self->SUPER::finalize($core);
 }
 
@@ -689,7 +637,6 @@ sub finalize
        if ($core->{status} != 0) {
                return 0;
        }
-       $self->make_sure_we_have_packages($core);
 }
 
 sub run

Reply via email to