In streaming, the contents of the input file are streamed to the mapper
through STDIN, not the file names.

Fix the Perl script accordingly.
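
A minimal sketch of one way to do that, not tested against your cluster. It
assumes each record on STDIN carries one HDFS path, possibly preceded by a key
and a tab (which can happen with input formats other than the default text
one), and that the hadoop command is on the task's PATH. The helper name
extract_path is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the HDFS path carried by one input record.
# Assumption: the record is either "path" or "key<TAB>path".
sub extract_path {
    my ($record) = @_;
    chomp $record;                 # drop the trailing newline
    $record =~ s/^[^\t]*\t//;      # drop a leading key field, if any
    $record =~ s/^\s+|\s+$//g;     # trim stray whitespace
    return $record;
}

# Each line on STDIN names one HDFS file to process.
while (my $record = <STDIN>) {
    my $path = extract_path($record);
    next if $path eq '';

    # The file lives in HDFS, so read it through the hadoop CLI
    # rather than with a local open().
    open(my $in, '-|', 'hadoop', 'fs', '-cat', $path)
        or die "could not start hadoop fs -cat for $path: $!\n";

    my $i = 0;
    while (my $line = <$in>) {
        print $i, $line;           # same output format as the original script
        $i++;
    }
    close($in) or die "hadoop fs -cat failed for $path\n";
}
```

The "code 2" in the error is consistent with open() failing with "No such file
or directory": the name read from STDIN still has its newline attached, and
the path is in HDFS rather than on the local disk, so either one could be the
cause.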

Thanks, Subir

On 8/3/12, Devi Kumarappan wrote:
> After specifying the NLineInputFormat option, the streaming job fails with:
> Error from attempt_201205171448_0092_m_000000_0: java.lang.RuntimeException:
> PipeMapRed.waitOutputThreads(): subprocess failed with code 2
> It spawns two mappers, but I am not sure whether each mapper runs with the
> file names specified in the input option. I was expecting one mapper to run
> with /user/devi/s_input/a.txt and one mapper to run with
> /user/devi/s_input/b.txt. I dug into the task files, but could not find
> anything.
> Here is the simple mapper Perl script. All it does is read the file and
> print it. (It needs to do much more, but I could not get the basic job
> itself to run.)
>  $i = 0;
>    $userinput = <STDIN>;
>    open(INFILE,"$userinput") || die "could not open the file $userinput \n";
>    while (<INFILE>) {
>      my $line = $_;
>      print "$i".$line ;
>      $i++;
>    }
>    close(INFILE);
> exit;
> My command is:
> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
>   -input /user/devi/file.txt -output /user/devi/s_output \
>   -mapper "/usr/bin/perl /home/devi/Perl/" \
>   -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat
> Really appreciate your help.
> Devi
> ________________________________
> From: Robert Evans
> Sent: Thu, August 2, 2012 1:16:54 PM
> Subject: Re: Issue with Hadoop Streaming
> From: Devi Kumarappan
> Date: Thursday, August 2, 2012 3:03 PM
> Subject: Re: Issue with Hadoop Streaming
> My mapper is a Perl script, not Java. So how do I specify the
> NLineInputFormat?
> ________________________________
> From: Robert Evans
> Sent: Thu, August 2, 2012 12:59:50 PM
> Subject: Re: Issue with Hadoop Streaming
> It depends on the input format you use. You probably want to look at using
> NLineInputFormat.
> From: Devi Kumarappan
> Date: Wednesday, August 1, 2012 8:09 PM
> Subject: Issue with Hadoop Streaming
> I am trying to run Hadoop streaming using a Perl script as the mapper and
> with no reducer. My requirement is for the mapper to run on one file at a
> time, since I have to do pattern processing on the entire contents of one
> file at a time and the file size is small.
> The Hadoop streaming manual suggests the following solution:
> *  Generate a file containing the full HDFS path of the input files. Each
>    map task would get one file name as input.
> *  Create a mapper script which, given a filename, will get the file to
>    local disk, gzip the file and put it back in the desired output directory.
> I am running the following command:
> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
>   -input /user/devi/file.txt -output /user/devi/s_output \
>   -mapper "/usr/bin/perl /home/devi/Perl/"
> /user/devi/file.txt contains the following two lines.
> /user/devi/s_input/a.txt
> /user/devi/s_input/b.txt
> When this runs, instead of spawning two mappers for a.txt and b.txt as per
> the document, only one mapper is spawned and the Perl script gets both
> /user/devi/s_input/a.txt and /user/devi/s_input/b.txt as its input.
> How can I make the mapper Perl script run on only one file at a time?
> Appreciate your help. Thanks, Devi
