I guess one trick you can do without the help of Hadoop is to encode the file
identifier inside the file itself. For example, each line of file1 could start
with 1, then a space, then the content of the original line.
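
A rough sketch of that idea in Python, in case it helps. The function names
(tag_lines, split_tag, map_lines) are made up for illustration and are not part
of Hadoop. It assumes you can rewrite the input files before loading them, and
that the file identifier itself contains no space.

```python
def tag_lines(lines, file_id):
    """Preprocessing step: prefix each line with the file identifier and a space."""
    for line in lines:
        yield f"{file_id} {line}"


def split_tag(line):
    """Mapper side: split a tagged line into (file_id, original content)."""
    # partition splits on the first space only, so spaces inside the
    # original line are preserved.
    file_id, _, content = line.rstrip("\n").partition(" ")
    return file_id, content


def map_lines(lines):
    """Emit one tab-separated record per input line, keyed by file identifier."""
    for line in lines:
        file_id, content = split_tag(line)
        yield f"{file_id}\t{content}\n"


# In a real streaming mapper script, one would finish with something like:
#   import sys
#   sys.stdout.writelines(map_lines(sys.stdin))
```

The tagging has to be done once per input file before the job runs. The mapper
then only needs split_tag to know which file a line came from.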
----- Original Message -----
From: Steve Gao [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Cc: [EMAIL PROTECTED]
Sent: Thursday, October 23, 2008 1:48:11 PM
Subject: [Help needed] Is there a way to know the input filename at Hadoop
Streaming?
Sorry for the email. Thanks for any help or hint.
I am using Hadoop Streaming. The input consists of multiple files.
Is there a way to get the current filename in the mapper?
For example:
$HADOOP_HOME/bin/hadoop \
jar $HADOOP_HOME/hadoop-streaming.jar \
-input file1 \
-input file2 \
-output myOutputDir \
-mapper mapper \
-reducer reducer
In mapper:
while (<STDIN>) {
    # How can I tell whether the current line is from file1 or file2?
}