On Wed, Oct 22, 2008 at 18:55, Steve Gao <[EMAIL PROTECTED]> wrote:
> I am using Hadoop Streaming. The input are multiple files.
> Is there a way to get the current filename in mapper?
>

Streaming map tasks should have a "map_input_file" environment variable
set to the path of the file the task is reading, like the following:

map_input_file=hdfs://HOST/path/to/file

rick

> For example:
> $HADOOP_HOME/bin/hadoop \
>   jar $HADOOP_HOME/hadoop-streaming.jar \
>   -input file1 \
>   -input file2 \
>   -output myOutputDir \
>   -mapper mapper \
>   -reducer reducer
>
> In mapper:
> while (<STDIN>){
>   # how to tell the current line is from file1 or file2?
> }
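A minimal sketch of a mapper that uses this variable. It assumes Streaming exports job configuration properties to the task's environment with dots turned into underscores (map.input.file becomes map_input_file); newer Hadoop releases are believed to use mapreduce_map_input_file instead, so the sketch checks that name first. It is written in bash rather than Perl for brevity, and the function name tag_lines is hypothetical. Each record is prefixed with the base name of its source file, so lines from file1 and file2 can be told apart downstream. In Perl the same value is available as $ENV{map_input_file}.

```shell
#!/usr/bin/env bash
# Hypothetical streaming mapper (sketch): prefixes every input record with
# the name of the file it came from, read from the environment variable
# that Hadoop Streaming is assumed to set for each map task.

# tag_lines: read records on stdin, write "<file>\t<record>" on stdout.
tag_lines() {
  # Assumption: newer releases use mapreduce_map_input_file, older ones
  # map_input_file; fall back to "unknown" when run outside Hadoop.
  local src="${mapreduce_map_input_file:-${map_input_file:-unknown}}"
  # Keep only the last path component: hdfs://HOST/path/to/file1 -> file1
  src="${src##*/}"
  local line
  # IFS= and -r keep leading whitespace and backslashes intact;
  # the -n test also handles a final line with no trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    printf '%s\t%s\n' "$src" "$line"
  done
}

tag_lines
```

Saved as an executable script, it could be shipped with something like `-file tag_mapper.sh -mapper tag_mapper.sh` (the script name is hypothetical).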