On Wed, Oct 22, 2008 at 18:55, Steve Gao <[EMAIL PROTECTED]> wrote:
> I am using Hadoop Streaming. The input are multiple files.
> Is there a way to get the current filename in mapper?
>

Streaming map tasks should have a "map_input_file" environment
variable like the following:

map_input_file=hdfs://HOST/path/to/file
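Here is a minimal sketch of a streaming mapper that reads that variable and tags every record with the name of the file it came from. It is written in Python for illustration; in the Perl mapper from the question, the same lookup is `$ENV{map_input_file}`. It assumes that newer Hadoop releases export the variable as `mapreduce_map_input_file` instead, so it checks both names. The helper names `current_input_file` and `tag_lines` are hypothetical.

```python
#!/usr/bin/env python
import os
import sys


def current_input_file(env=os.environ):
    """Return the path of the file this map task is reading, or None.

    Older streaming jobs export map_input_file; the second name is an
    assumption for newer Hadoop releases (mapreduce.map.input.file with
    dots turned into underscores).
    """
    return env.get("map_input_file") or env.get("mapreduce_map_input_file")


def tag_lines(lines, input_file):
    """Prefix each input line with the base name of its source file."""
    source = os.path.basename(input_file)
    for line in lines:
        # Streaming treats the text before the first tab as the key,
        # so the file name becomes the key seen by the reducer.
        yield source + "\t" + line.rstrip("\n")


def main():
    input_file = current_input_file()
    for record in tag_lines(sys.stdin, input_file):
        print(record)


# Only act as a mapper when launched by Hadoop Streaming, which sets
# the input-file variable in the task's environment.
if __name__ == "__main__" and current_input_file() is not None:
    main()
```

With this mapper, a line from `file1` reaches the reducer as `file1<TAB>original line`, so the reducer can tell the two inputs apart by key.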

rick

> For example:
> $HADOOP_HOME/bin/hadoop  \
> jar $HADOOP_HOME/hadoop-streaming.jar \
>    -input file1 \
>    -input file2 \
>    -output myOutputDir \
>    -mapper mapper \
>    -reducer reducer
>
> In mapper:
> while (<STDIN>){
>  # how to tell whether the current line is from file1 or file2?
> }
