It is also fairly easy to override parts of TextInputFormat so that the key is the input file name instead of the byte offset.
On 10/12/07 10:19 AM, "Benjamin Reed" <[EMAIL PROTECTED]> wrote:

> We do this in Pig by using our own InputSplits.
>
> ben
>
> On Friday 12 October 2007, Owen O'Malley wrote:
>> On Oct 12, 2007, at 5:51 AM, Shailendra Mudgal wrote:
>>> I am adding two input dir in a job. Both the input dirs have same
>>> <Key.class, Value.class>. Inside the map method i want to know that
>>> which pair<key, value> has come from which input dir. How can i do
>>> this ? Any help will be appreciated..
>>
>> *sigh* We've _almost_ had that feature for a long time now, see
>> HADOOP-372.
>>
>> The work around is to use the information on:
>> http://wiki.apache.org/lucene-hadoop/TaskExecutionEnvironment
>> and get the "map.input.file" from the map's JobConf and match against
>> the prefix.
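
For what it's worth, the prefix matching in Owen's workaround might look roughly like the sketch below. It is plain Java with no Hadoop dependency; the class and method names (InputDirMatcher, matchInputDir) are made up for illustration, and it assumes the input dirs are written in the same fully qualified form as the value of "map.input.file". In a real mapper the inputFile string would be read from the map's JobConf under that property name.

```java
import java.util.List;

// Hypothetical helper, not part of Hadoop: decides which input dir a file belongs to.
class InputDirMatcher {

    // Returns the first entry of inputDirs that is a directory prefix of inputFile,
    // or null if none matches. Assumes both use the same path form.
    static String matchInputDir(String inputFile, List<String> inputDirs) {
        for (String dir : inputDirs) {
            // Compare against "dir/" so that /data/a does not match /data/ab/part-00000.
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            if (inputFile.startsWith(prefix)) {
                return dir;
            }
        }
        return null;
    }
}
```

The result can be looked up once when the map task is configured and reused for every record, since each map task reads from a single input file. If one input dir is nested inside another, list the more specific dir first, because the first match wins.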