Thanks, Amogh, but my case is slightly different. The job takes two input
files on the command line, file1 and file2, and in the mapper I need to tell
which file each line came from:
#In mapper
while (<STDIN>){
  # how to tell whether the current line is from file1 or file2?
}

The jobconf map.input.file parameter does not help in this case,
because file1 and file2 are both inputs.

-Steve

--- On Thu, 10/23/08, Amogh Vasekar <[EMAIL PROTECTED]> wrote:
From: Amogh Vasekar <[EMAIL PROTECTED]>
Subject: RE: Is there a way to know the input filename at Hadoop Streaming?
To: [EMAIL PROTECTED]
Date: Thursday, October 23, 2008, 12:11 AM

I haven't worked with Streaming personally, but I'd guess the jobconf
map.input.file parameter should do it for you.
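
A minimal sketch of how that suggestion might look in a Perl mapper. It assumes that Streaming exports job configuration values to the mapper's environment with dots replaced by underscores (so map.input.file becomes map_input_file; later Hadoop releases rename it to mapreduce_map_input_file), and that the default file-based input format is in use, so each map task reads a split of a single file. The helper names current_input_path and source_tag are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Assumption: Streaming exposes jobconf values as environment variables
# with dots turned into underscores. Which name is set depends on the
# Hadoop version, so both are checked here.
sub current_input_path {
    my $path = $ENV{mapreduce_map_input_file};
    $path = $ENV{map_input_file} unless defined $path;
    $path = 'unknown' unless defined $path;
    return $path;
}

# Reduce a full path such as hdfs://host/user/steve/file1 to "file1".
sub source_tag {
    my ($path) = @_;
    return basename($path);
}

# The input file is fixed for the lifetime of a map task, so look it up once.
my $tag = source_tag(current_input_path());

while (my $line = <STDIN>) {
    chomp $line;
    # Emit the source file name as the key so later stages can tell
    # the two inputs apart.
    print "$tag\t$line\n";
}
```

With two -input arguments, every map task would still see only one of the two file names, so each output line would carry either file1 or file2 as its key. Run outside Hadoop, the variable is not set and every line is tagged "unknown".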
-----Original Message-----
From: Steve Gao [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 23, 2008 7:26 AM
To: core-user@hadoop.apache.org
Cc: [EMAIL PROTECTED]
Subject: Is there a way to know the input filename at Hadoop Streaming?

I am using Hadoop Streaming. The input consists of multiple files.
Is there a way to get the current filename in the mapper?

For example:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input file1 \
    -input file2 \
    -output myOutputDir \
    -mapper mapper \
    -reducer reducer

In mapper:
while (<STDIN>){
  # how to tell whether the current line is from file1 or file2?
}
