On Tue, Oct/06/2009 10:23:48AM, Ashley Pittman wrote:
> 
> Further to the mail linked below, padb is able to perform diagnostics,
> including backtraces on hung jobs, and integrates well into automated
> testing environments.

Can padb get a backtrace from a non-debuggable MPI (e.g., not compiled
with -g)?

-Ethan

> 
> The attached patch is a minimal change which should enable the
> functionality.  However, I don't have access to a working MTT
> installation to test it.
> 
> http://www.open-mpi.org/community/lists/mtt-devel/2009/06/0415.php
> 
> This will require a HEAD version of padb (at least r273), which allows it
> to accept the pid of mpirun rather than a jobid assigned by the underlying
> resource manager.
> 
> Yours,
> 
> Ashley,
> 
> -- 
> 
> Ashley Pittman, Bath, UK.
> 
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk

> Index: lib/MTT/DoCommand.pm
> ===================================================================
> --- lib/MTT/DoCommand.pm      (revision 1322)
> +++ lib/MTT/DoCommand.pm      (working copy)
> @@ -359,6 +359,7 @@
>      }
>      my $killed_status = undef;
>      my $last_over = 0;
> +    my $padb_output;
>      while ($done > 0) {
>          my $nfound = select($rout = $rin, undef, undef, $t);
>          if (vec($rout, fileno(OUTread), 1) == 1) {
> @@ -410,6 +411,8 @@
>                  my $timeout_email_recipient = $MTT::Globals::Values->{docommand_timeout_notify_email};
>                  my $timeout_notify_timeout  = $MTT::Globals::Values->{docommand_timeout_notify_timeout};
>  
> +             $padb_output = `padb --config-option rmgr=mpirun --full-report=$pid`;
> +
>                  if (defined($timeout_sentinel_file)) {
>  
>                      # Email someone, if an email address has been specified
> @@ -493,6 +496,9 @@
>      # Return an anonymous hash containing the relevant data
>  
>      $ret->{result_stdout} = join('', @out);
> +    if ( defined $padb_output ) {
> +      $ret->{result_stdout} .= "\n$padb_output";
> +    }
>      $ret->{result_stderr} = join('', @err),
>          if (!$merge_output);
>      return $ret;
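
For illustration only, here is a minimal sketch of the same capture-and-append
idea written so that it can be run outside MTT. The helper names
capture_command and append_report are hypothetical and are not part of MTT or
padb. The sketch assumes it is acceptable to run the diagnostic command
without a shell (list-form open) instead of backticks, and it uses the running
perl binary as a stand-in for padb so that it works on a machine without padb
installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MTT): run a command without going
# through a shell and return everything it wrote to stdout.
# Returns an explicit undef if the command could not be started, so the
# result stays a single value even when called in list context.
sub capture_command {
    my @argv = @_;
    my $child = open(my $fh, '-|', @argv);
    return undef unless $child;
    local $/;                 # slurp mode: read all output at once
    my $out = <$fh>;
    close($fh);
    return $out;
}

# Hypothetical helper mirroring the last hunk of the patch: append the
# diagnostic report to the captured stdout only when a report exists.
sub append_report {
    my ($stdout, $report) = @_;
    return $stdout unless defined $report && length $report;
    return "$stdout\n$report";
}

# Stand-in demonstration: $^X is the perl interpreter running this script.
my $report = capture_command($^X, '-e', 'print "report"');
print append_report("job output", $report), "\n";
```

In the patch's setting the call would presumably take the form
capture_command('padb', '--config-option', 'rmgr=mpirun', "--full-report=$pid"),
using only the options that already appear in the diff. Passing the arguments
as a list keeps $pid from being interpreted by a shell. Note that this sketch
does not guard against the diagnostic command itself hanging.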

> _______________________________________________
> mtt-devel mailing list
> mtt-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
