Pig tries to do this with some of its optimizations.  You ultimately have to 
combine the two jobs into a single map/reduce job with two separate execution 
paths.  It is complicated, especially in the shuffle phase.  It would probably 
look something like:

MapCollectorWrapper implements Collector {
    Collector wrapped;
    LongWritable taskkey;

    MapCollectorWrapper(Collector c, taskkey) {
        wrapped = c;
        this.taskkey = taskkey;
    }

    emit(key, value) {
        // tag every record with the task key so the reduce side knows
        // which virtual job it belongs to
        wrapped.emit(new SpecialCompoundKey(key, taskkey), value);
    }
}

Map(key, value, collector) {
  //TODO clone key and value so neither mapper can modify them.
  MapCollectorWrapper m1 = new MapCollectorWrapper(collector, 1);
  Map1(key, value, m1);
  MapCollectorWrapper m2 = new MapCollectorWrapper(collector, 2);
  Map2(key, value, m2);
}

Reduce(SpecialCompoundKey key, Iterable values, collector) {
    //TODO need a multi-file output format and a wrapped collector here
    //so that each job's output files go to the proper place.
    if (key.getTaskKey() == 1) {
        Reduce1(key.getRealKey(), values, collector);
    } else {
        Reduce2(key.getRealKey(), values, collector);
    }
}
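
One piece the sketch above glosses over is SpecialCompoundKey itself; it has to 
be a WritableComparable so Hadoop can serialize, sort, and partition it.  A 
rough sketch, assuming the newer org.apache.hadoop.mapreduce API and a Text 
"real" key (both of those are my assumptions, nothing above pins them down):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class SpecialCompoundKey implements WritableComparable<SpecialCompoundKey> {
    private Text realKey = new Text();  // the key the original mapper emitted
    private long taskKey;               // 1 or 2, tags which virtual job produced the record

    public SpecialCompoundKey() {}

    public SpecialCompoundKey(Text realKey, long taskKey) {
        this.realKey.set(realKey);
        this.taskKey = taskKey;
    }

    public Text getRealKey() { return realKey; }
    public long getTaskKey() { return taskKey; }

    public void write(DataOutput out) throws IOException {
        realKey.write(out);
        out.writeLong(taskKey);
    }

    public void readFields(DataInput in) throws IOException {
        realKey.readFields(in);
        taskKey = in.readLong();
    }

    // Sort first by task key so the two virtual jobs never interleave,
    // then by the real key so each virtual job sees its keys in the usual order.
    public int compareTo(SpecialCompoundKey other) {
        int cmp = Long.compare(taskKey, other.taskKey);
        return cmp != 0 ? cmp : realKey.compareTo(other.realKey);
    }

    // The default HashPartitioner uses hashCode(), so include both parts.
    public int hashCode() {
        return realKey.hashCode() * 31 + (int) taskKey;
    }

    public boolean equals(Object o) {
        if (!(o instanceof SpecialCompoundKey)) return false;
        SpecialCompoundKey k = (SpecialCompoundKey) o;
        return taskKey == k.taskKey && realKey.equals(k.realKey);
    }
}

For the reducer-output TODO, something like MultipleOutputs 
(org.apache.hadoop.mapreduce.lib.output.MultipleOutputs) should let each virtual 
job write to its own named output instead of wrapping the collector by hand, and 
if the default hash partitioning is not good enough you would also need a custom 
Partitioner and grouping comparator.  That is where most of the shuffle-phase 
complication lives.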

On 2/25/12 7:34 AM, "Bruce Wang" <bruc...@hotmail.com> wrote:

Hi,
There are two map-reduce jobs which have the same input file.
They must read the input file twice.
I want the jobs to read the file only once and share the same data in memory.
How can I do this?
Thanks
________________________________
Best Regards
Bruce Wang
