In streaming, the contents of the input file are streamed to the mapper through STDIN, not the file names.
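A minimal sketch of what the adjusted mapper might look like, not a tested drop-in. In this thread the -input file is /user/devi/file.txt, so what arrives on STDIN is its lines, each naming one HDFS path. Those paths most likely do not exist on the task node's local disk, so a plain open() on them would fail, which would be consistent with the die and exit code 2 reported below. The sketch assumes that a record may arrive as "key<TAB>path" when an input format other than the default is used, and that the `hadoop` command is on the task's PATH. The helper names path_from_record and process_file are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one STDIN record into an HDFS path.
# Assumption: the record is either "path" or "key<TAB>path".
sub path_from_record {
    my ($record) = @_;
    $record =~ s/\r?\n\z//;          # drop the trailing newline
    $record =~ s/\A[^\t]*\t//;       # drop an optional leading key and tab
    $record =~ s/\A\s+|\s+\z//g;     # trim stray whitespace
    return $record;
}

# Stream one HDFS file to STDOUT, prefixing each line with its index,
# as the original script does.
sub process_file {
    my ($path) = @_;
    # List-form pipe open avoids shell quoting problems with the path.
    # Assumption: "hadoop" is on the PATH of the task.
    open(my $in, '-|', 'hadoop', 'fs', '-cat', $path)
        or die "could not run hadoop fs -cat $path: $!\n";
    my $i = 0;
    while (my $line = <$in>) {
        print $i, $line;
        $i++;
    }
    close($in) or die "hadoop fs -cat $path failed (status $?)\n";
}

# Each map task sees only the records in its own split.
while (my $record = <STDIN>) {
    my $path = path_from_record($record);
    next if $path eq '';             # skip blank records
    process_file($path);
}
```

The pattern processing on a whole file would go inside process_file, in place of the print loop.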
Fix the perl script accordingly.

Thanks,
Subir

On 8/3/12, Devi Kumarappan <kpala...@att.net> wrote:
>
> After specifying the NLineInputFormat option, the streaming job fails with
>
> Error from attempt_201205171448_0092_m_000000_0: java.lang.RuntimeException:
> PipeMapRed.waitOutputThreads(): subprocess failed with code 2
>
> It spawns two mappers, but I am not sure whether each mapper runs with the
> file names specified in the input option. I was expecting one mapper to run
> with /user/devi/s_input/a.txt and one mapper to run with
> /user/devi/s_input/b.txt. I dug into the task files, but could not find
> anything.
>
> Here is the simple mapper perl script. All it does is read the file and
> print it. (It needs to do much more, but I could not get the basic job
> itself to run.)
>
> $i = 0;
> $userinput = <STDIN>;
> open(INFILE,"$userinput") || die "could not open the file $userinput \n";
> while (<INFILE>) {
>     my $line = $_;
>     print "$i".$line ;
>     $i++;
> }
> close(INFILE);
> exit;
>
> My command is
>
> hadoop jar
> /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar
> -input /user/devi/file.txt -output /user/devi/s_output
> -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"
> -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat
>
> Really appreciate your help.
>
> Devi
>
> ________________________________
> From: Robert Evans <ev...@yahoo-inc.com>
> To: "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>;
> "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Sent: Thu, August 2, 2012 1:16:54 PM
> Subject: Re: Issue with Hadoop Streaming
>
> http://www.mail-archive.com/core-user@hadoop.apache.org/msg07382.html
>
> From: Devi Kumarappan <kpala...@att.net>
> Reply-To: "mapreduce-u...@hadoop.apache.org"
> <mapreduce-u...@hadoop.apache.org>
> Date: Thursday, August 2, 2012 3:03 PM
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>,
> "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>
> Subject: Re: Issue with Hadoop Streaming
>
> My mapper is a perl script and it is not in Java. So how do I specify the
> NLineInputFormat?
>
> ________________________________
> From: Robert Evans <ev...@yahoo-inc.com>
> To: "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>;
> "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Sent: Thu, August 2, 2012 12:59:50 PM
> Subject: Re: Issue with Hadoop Streaming
>
> It depends on the input format you use. You probably want to look at using
> NLineInputFormat.
>
> From: Devi Kumarappan <kpala...@att.net>
> Reply-To: "mapreduce-u...@hadoop.apache.org"
> <mapreduce-u...@hadoop.apache.org>
> Date: Wednesday, August 1, 2012 8:09 PM
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>,
> "mapreduce-u...@hadoop.apache.org" <mapreduce-u...@hadoop.apache.org>
> Subject: Issue with Hadoop Streaming
>
> I am trying to run hadoop streaming using a perl script as the mapper and
> with no reducer.
> My requirement is for the mapper to run on one file at a time, since I have
> to do pattern processing on the entire contents of one file at a time and
> the file size is small.
>
> The Hadoop streaming manual suggests the following solution:
>
> * Generate a file containing the full HDFS path of the input files. Each
>   map task would get one file name as input.
> * Create a mapper script which, given a filename, will get the file to
>   local disk, gzip the file and put it back in the desired output directory.
>
> I am running the following command.
>
> hadoop jar
> /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar
> -input /user/devi/file.txt -output /user/devi/s_output
> -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"
>
> /user/devi/file.txt contains the following two lines.
>
> /user/devi/s_input/a.txt
> /user/devi/s_input/b.txt
>
> When this runs, instead of spawning two mappers for a.txt and b.txt as per
> the document, only one mapper is spawned and the perl script gets both
> /user/devi/s_input/a.txt and /user/devi/s_input/b.txt as its input.
>
> How could I make the mapper perl script run on only one file at a time?
>
> Appreciate your help,
> Thanks,
> Devi