Re: Checking if a streaming job failed

2009-04-02 Thread Miles Osborne
Here is how I do it (in Perl). Hadoop Streaming is actually called by
a shell script, which in this case expects compressed input and
produces compressed output, but you get the idea:

(the mailer had messed up the formatting somewhat)
sub runStreamingCompInCompOut {
    my $mapper      = shift @_;
    my $reducer     = shift @_;
    my $inDir       = shift @_;
    my $outDir      = shift @_;
    my $numMappers  = shift @_;
    my $numReducers = shift @_;
    my $jobName     = $runName . ":" . shift @_;  # $runName is a global set elsewhere

    # run the wrapper script, capturing stdout and stderr in a trace file
    my $cmd = "sh runStreamingCompInCompOut.sh $mapper $reducer $inDir "
            . "$outDir $jobName $numMappers $numReducers > /tmp/.trace 2>&1";
    print STDERR "Running: $cmd\n";
    system $cmd;

    # hadoop log lines look like "<date> <time> <level> ...", so the
    # third whitespace-separated field is the log level
    open IN, "/tmp/.trace" or die "can't open streaming trace";
    while (!eof(IN)) {
        my $line = <IN>;
        (my $date, my $time, my $status) = split(/\s+/, $line);
        if ($status eq "ERROR") {
            print STDERR "command: $cmd failed\n";
            exit(-1);
        }
    }
    close IN;
}
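
By the way, you don't strictly need the trace file: Hadoop Streaming exits
with a non-zero status when the job fails, so as long as the wrapper script
passes that exit status through, the return value of system answers the
question directly. An untested sketch (the argument values are made up):

# untested sketch: relies on hadoop streaming exiting non-zero on job
# failure, and on runStreamingCompInCompOut.sh propagating that status;
# the mapper/reducer names, paths and counts are examples only
my $cmd = "sh runStreamingCompInCompOut.sh map.pl reduce.pl "
        . "/user/miles/in /user/miles/out myJob 100 10";
my $rc = system $cmd;
if ($rc != 0) {
    die "command: $cmd failed (exit status " . ($rc >> 8) . ")\n";
}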


2009/4/3 Mayuran Yogarajah:
> Hello, does anyone know how I can check if a streaming job (in Perl) has
> failed or succeeded? The only way I can see at the moment is to check
> the web interface for that jobID and parse out the 'Status:' value.
>
> Is it not possible to do this using 'hadoop job -status'? I see there is a
> count for failed map/reduce tasks, but map/reduce tasks failing is normal
> (or so I thought). I am under the impression that if a task fails it will
> simply be reassigned to a different node. Is this not the case? If this is
> normal then I can't reliably use this count to check if the job as a whole
> failed or succeeded.
>
> Any feedback is greatly appreciated.
>
> thanks,
> M
>
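
To answer the 'hadoop job -status' part: yes, task failures on their own are
normal. A failed task attempt is simply retried, up to
mapred.map.max.attempts / mapred.reduce.max.attempts times (four by default),
and the job as a whole only fails once some task exhausts its attempts, so
the failed-task counters won't tell you whether the job died. The exit status
of the streaming command is the reliable signal. If you want to check an
already-submitted job from Perl, something along these lines may work, but
the exact text 'hadoop job -status' prints varies between Hadoop releases,
so treat the pattern as an assumption and verify it against your version:

# untested sketch: shell out to `hadoop job -status <jobid>` and grep
# its output for a failure marker.  the pattern below is an assumption;
# run the command by hand and adjust it to whatever your release prints.
sub streamingJobFailed {
    my $jobId = shift @_;
    my $out = `hadoop job -status $jobId 2>&1`;
    die "couldn't get status for $jobId" if $? != 0;
    return ($out =~ /FAILED/i) ? 1 : 0;
}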





Checking if a streaming job failed

2009-04-02 Thread Mayuran Yogarajah

Hello, does anyone know how I can check if a streaming job (in Perl) has
failed or succeeded? The only way I can see at the moment is to check
the web interface for that jobID and parse out the 'Status:' value.

Is it not possible to do this using 'hadoop job -status'? I see there is a
count for failed map/reduce tasks, but map/reduce tasks failing is normal
(or so I thought). I am under the impression that if a task fails it will
simply be reassigned to a different node. Is this not the case? If this is
normal then I can't reliably use this count to check if the job as a whole
failed or succeeded.


Any feedback is greatly appreciated.

thanks,
M