Hello,

I have a set of input files, part-r-*, which I will pass through another map (no reduce). The part-r-* files consist of key/value pairs; the keys are small and the values are fairly large (MBs).
I would like to index these, i.e. run a map whose output is the key and the /filename/, meaning the part-r-* file to which that particular key belongs, so that if I need a key again I can just access that file.

Q: In the map stage, how do I retrieve the name of the file being processed? I'd rather not use MapFileOutputFormat.

Hadoop 0.21

Regards,
Saptarshi
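
P.S. To make the goal concrete, here is a rough sketch of the index I have in mind, written in plain Java with no Hadoop classes. The class name KeyFileIndex, the method buildIndex, and the sample keys and file names are made up for illustration. In the real job the file name would have to come from the map task itself, which is exactly the part I am asking about.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyFileIndex {

    // Hypothetical helper: given the keys found in each part file,
    // build the index "key -> name of the part file that holds it".
    public static Map<String, String> buildIndex(Map<String, List<String>> keysByFile) {
        Map<String, String> index = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : keysByFile.entrySet()) {
            String fileName = entry.getKey();
            for (String key : entry.getValue()) {
                // If a key appears in more than one file, the last file wins.
                index.put(key, fileName);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the contents of the part files.
        Map<String, List<String>> keysByFile = new LinkedHashMap<>();
        keysByFile.put("part-r-00000", Arrays.asList("k1", "k2"));
        keysByFile.put("part-r-00001", Arrays.asList("k3"));

        // Print one "key<TAB>filename" line per key, the shape of output
        // the indexing map would ideally emit.
        for (Map.Entry<String, String> entry : buildIndex(keysByFile).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

With such an index, a later lookup for a key only needs to open the one part file it points to, rather than scanning all of them.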