Re: Issue with Hadoop Streaming

Subir S Fri, 03 Aug 2012 20:17:35 -0700

In streaming, the contents of the input file are streamed to the mapper
through STDIN, not the file names.

Fix the Perl script accordingly.
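One possible fix, as a minimal sketch under these assumptions: each STDIN record names one HDFS file; with a non-default input format such as NLineInputFormat, Hadoop 0.20 streaming writes the byte-offset key and a tab before the line, so the script keeps only the text after the first tab; and the `hadoop` command is on the PATH of every task node. Because the paths live in HDFS, a plain local `open()` cannot read them, so the sketch streams each file through `hadoop fs -cat` instead. The helper names `path_from_record` and `process_file` are hypothetical.

```perl
#!/usr/bin/perl
# Sketch of a streaming mapper that reads HDFS file names from STDIN
# and prints each named file's lines with a running line number.
use strict;
use warnings;

# Turn one STDIN record into an HDFS path. Assumption: with a
# non-default input format (e.g. NLineInputFormat), streaming sends
# "<offset>\t<line>", so keep only the text after the first tab.
sub path_from_record {
    my ($record) = @_;
    chomp $record;
    my $tab  = index($record, "\t");
    my $path = $tab >= 0 ? substr($record, $tab + 1) : $record;
    $path =~ s/^\s+|\s+$//g;    # trim stray whitespace
    return $path;
}

# Stream one HDFS file through "hadoop fs -cat" (a local open() cannot
# see HDFS paths) and print each line prefixed by its line number.
sub process_file {
    my ($path) = @_;
    open(my $in, '-|', 'hadoop', 'fs', '-cat', $path)
        or die "could not run hadoop fs -cat $path: $!\n";
    my $i = 0;
    while (my $line = <$in>) {
        # Tab-separated, so streaming treats the counter as the output key.
        print "$i\t$line";
        $i++;
    }
    close($in) or die "hadoop fs -cat $path failed (status $?)\n";
}

# Handle every record on STDIN, in case one task receives several names.
while (my $record = <STDIN>) {
    my $path = path_from_record($record);
    process_file($path) if length $path;
}
```

With `-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat`, each map task should receive a single line of file.txt, since NLineInputFormat defaults to one line per split (`mapred.line.input.format.linespermap` in 0.20). Looping over STDIN rather than reading one line also keeps the script correct when a single task gets several file names, as happened with the default TextInputFormat.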

Thanks, Subir

On 8/3/12, Devi Kumarappan <kpala...@att.net> wrote:
>
>
> After specifying the NLineInputFormat option, the streaming job fails with:
>
> Error from attempt_201205171448_0092_m_000000_0: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
>
> It spawns two mappers, but I am not sure whether each mapper runs with the
> file names specified in the input option. I was expecting one mapper to run
> with /user/devi/s_input/a.txt and one mapper to run with
> /user/devi/s_input/b.txt. I dug into the task files, but could not find
> anything.
>
> Here is the simple mapper Perl script. All it does is read the file and
> print it. (It needs to do much more, but I could not get even the basic job
> to run.)
>
>    $i = 0;
>    $userinput = <STDIN>;
>    open(INFILE,"$userinput") || die "could not open the file $userinput \n";
>    while (<INFILE>) {
>      my $line = $_;
>      print "$i".$line ;
>      $i++;
>    }
>    close(INFILE);
>    exit;
>
> My command is:
>
> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
>   -input /user/devi/file.txt \
>   -output /user/devi/s_output \
>   -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl" \
>   -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat
>
>
> Really appreciate your help.
>
> Devi
>
> ________________________________
> From: Robert Evans <ev...@yahoo-inc.com>
> To: "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>;
> "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Sent: Thu, August 2, 2012 1:16:54 PM
> Subject: Re: Issue with Hadoop Streaming
>
>
> http://www.mail-archive.com/core-user@hadoop.apache.org/msg07382.html
>
>
>
> From: Devi Kumarappan <kpala...@att.net>
> Reply-To: "mapreduce-u...@hadoop.apache.org"
> <mapreduce-u...@hadoop.apache.org>
> Date: Thursday, August 2, 2012 3:03 PM
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>,
> "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>
> Subject: Re: Issue with Hadoop Streaming
>
>
> My mapper is a Perl script, not Java. So how do I specify
> NLineInputFormat?
>
>
>
>
> ________________________________
> From: Robert Evans <ev...@yahoo-inc.com>
> To: "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>;
> "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Sent: Thu, August 2, 2012 12:59:50 PM
> Subject: Re: Issue with Hadoop Streaming
>
> It depends on the input format you use. You probably want to look at using
> NLineInputFormat.
>
> From: Devi Kumarappan <kpala...@att.net>
> Reply-To: "mapreduce-u...@hadoop.apache.org"
> <mapreduce-u...@hadoop.apache.org>
> Date: Wednesday, August 1, 2012 8:09 PM
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>,
> "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>
> Subject: Issue with Hadoop Streaming
>
> I am trying to run Hadoop streaming with a Perl script as the mapper and no
> reducer. My requirement is for the mapper to process one file at a time,
> since I have to do pattern processing on the entire contents of each file,
> and the files are small.
>
> The Hadoop streaming manual suggests the following solution:
>
> *  Generate a file containing the full HDFS path of the input files. Each
>    map task would get one file name as input.
> *  Create a mapper script which, given a filename, will get the file to
>    local disk, gzip the file and put it back in the desired output directory.
>
> I am running the following command:
>
> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
>   -input /user/devi/file.txt \
>   -output /user/devi/s_output \
>   -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"
>
>
>
> /user/devi/file.txt contains the following two lines.
>
> /user/devi/s_input/a.txt
> /user/devi/s_input/b.txt
>
> When this runs, instead of spawning two mappers for a.txt and b.txt as the
> documentation describes, only one mapper is spawned, and the Perl script gets
> both /user/devi/s_input/a.txt and /user/devi/s_input/b.txt as its input.
>
>
>
> How can I make the mapper Perl script process only one file at a time?
>
>
>
> Appreciate your help, Thanks, Devi
