Pig tries to do this with some of its optimizations. You ultimately have to combine them into a single map/reduce job with two separate execution paths. It gets complicated, especially in the shuffle phase. It would probably look something like:
MapCollectorWrapper implements Collector {
    Collector wrapped;
    LongWritable taskKey;

    MapCollectorWrapper(Collector c, LongWritable taskKey) {
        wrapped = c;
        this.taskKey = taskKey;
    }

    emit(key, value) {
        wrapped.emit(new SpecialCompoundKey(key, taskKey), value);
    }
}

Map(key, value, collector) {
    // TODO: clone key and value so one mapper cannot see the other's modifications.
    MapCollectorWrapper m1 = new MapCollectorWrapper(collector, 1);
    Map1(key, value, m1);
    MapCollectorWrapper m2 = new MapCollectorWrapper(collector, 2);
    Map2(key, value, m2);
}

Reduce(SpecialCompoundKey key, Iterable values, collector) {
    // TODO: need a multi-file output format, and wrap the collector here
    // so that each job's output files go to the proper place.
    if (key.getTaskKey() == 1) {
        Reduce1(key.getRealKey(), values, collector);
    } else {
        Reduce2(key.getRealKey(), values, collector);
    }
}

On 2/25/12 7:34 AM, "Bruce Wang" <bruc...@hotmail.com> wrote:

Hi,

There are two map-reduce jobs which have the same input file, so they must read the input file twice. I want the jobs to read the file only once and share the data in memory. How can I do this?

Thanks

________________________________
Best Regards
Bruce Wang
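To make the idea above concrete, here is a self-contained sketch of the tagged-key pattern in plain Java, with no Hadoop dependencies. The class name `TaggedKeyDemo`, the `CompoundKey` record (a stand-in for `SpecialCompoundKey`), and the two toy map/reduce functions are all hypothetical names chosen for illustration; the point is just that one map pass feeds two logical jobs through a shared shuffle, and the reduce phase dispatches on the task id embedded in the key:

```java
import java.util.*;
import java.util.function.BiConsumer;

// Sketch of the tagged-key pattern (plain Java, not the Hadoop API):
// each map output key is paired with the id of the logical job that
// produced it, so one shuffle can feed two different reduce functions.
public class TaggedKeyDemo {

    // Stand-in for SpecialCompoundKey: (taskId, realKey).
    record CompoundKey(int taskId, String realKey) {}

    // Runs both logical jobs over ONE pass of the shared input and
    // returns results keyed as "job<taskId>:<realKey>".
    static Map<String, Integer> run(List<String> input) {
        // Shuffle buffer shared by both jobs, sorted by task id then key.
        Map<CompoundKey, List<Integer>> shuffle = new TreeMap<>(
                Comparator.comparingInt(CompoundKey::taskId)
                          .thenComparing(CompoundKey::realKey));

        // Wrapped collectors: tag every emitted key with a task id,
        // mirroring MapCollectorWrapper above.
        BiConsumer<String, Integer> emit1 = (k, v) ->
                shuffle.computeIfAbsent(new CompoundKey(1, k), x -> new ArrayList<>()).add(v);
        BiConsumer<String, Integer> emit2 = (k, v) ->
                shuffle.computeIfAbsent(new CompoundKey(2, k), x -> new ArrayList<>()).add(v);

        // Single map pass over the shared input runs both map functions.
        for (String line : input) {
            map1(line, emit1);   // job 1: word count
            map2(line, emit2);   // job 2: line length keyed by first word
        }

        // Reduce phase dispatches on the task id embedded in the key.
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<CompoundKey, List<Integer>> e : shuffle.entrySet()) {
            int sum = e.getValue().stream().mapToInt(Integer::intValue).sum();
            CompoundKey k = e.getKey();
            out.put("job" + k.taskId() + ":" + k.realKey(), sum);
        }
        return out;
    }

    static void map1(String line, BiConsumer<String, Integer> emit) {
        for (String w : line.split(" ")) emit.accept(w, 1);
    }

    static void map2(String line, BiConsumer<String, Integer> emit) {
        emit.accept(line.split(" ")[0], line.length());
    }

    public static void main(String[] args) {
        run(List.of("a b", "b c")).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real Hadoop job the sorting and grouping would be done by the framework via the compound key's `compareTo`, and the per-job output split would use something like MultipleOutputs; the TreeMap here only simulates the shuffle ordering.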