Thanks for your reply. Could you do me a favor and check something for me?
I modified mapred-default.xml as follows:
    <property>
      <name>mapred.job.reuse.jvm.num.tasks</name>
      <value>-1</value>
      <description>How many tasks to run per jvm. If set to -1, there is
      no limit.
      </description>
    </property>
Then I ran bin/stop-all.sh followed by bin/start-all.sh to restart Hadoop.
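(In case it helps: to double-check that the value actually reaches the tasks, I'm thinking of printing it from the mapper's setup(). This is only a sketch I haven't tried yet; if my override is not picked up, I'd expect the default of 1 to show up instead.)

    // Sketch: print the reuse setting as seen by the task itself, to
    // confirm that my configuration change is really picked up.
    @Override
    protected void setup(Context context)
        throws IOException, InterruptedException {
      String reuse = context.getConfiguration()
          .get("mapred.job.reuse.jvm.num.tasks", "1");
      System.out.println("mapred.job.reuse.jvm.num.tasks = " + reuse);
    }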

This is my program:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCount {

      public static class TokenizerMapper
           extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        // 16M ints (about 64 MB) that I would like to share across map tasks
        public static int[] ToBeSharedData = new int[1024 * 1024 * 16];

        @Override
        protected void setup(Context context
                ) throws IOException, InterruptedException {
          // Init shared data
          ToBeSharedData[0] = 12345;
          System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
        }

        @Override
        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
          System.out.println("read shared data[0] = " + ToBeSharedData[0]);
        }
      }
      // (rest of WordCount omitted)

First, how can I verify that "jvm reuse" is actually taking effect? I don't see
any difference from before: the "top" command on Linux shows the same number of
java processes and the same memory usage.
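One idea I had for checking (please tell me if there is a better way): log the JVM's runtime name from setup() and compare it across task attempts in the task logs. If several attempts on the same node print the same pid@hostname, the JVM is being reused. A rough sketch:

    import java.lang.management.ManagementFactory;

    // Sketch: log which JVM each task attempt runs in. With JVM reuse,
    // consecutive attempts on the same node should print the same name.
    @Override
    protected void setup(Context context)
        throws IOException, InterruptedException {
      String jvmName = ManagementFactory.getRuntimeMXBean().getName();
      System.out.println("task " + context.getTaskAttemptID()
          + " runs in JVM " + jvmName);   // e.g. "12345@myhost"
    }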

Second, how can I make "ToBeSharedData" be initialized only once and then be
readable by the other MapTasks on the same node? Or is this not a suitable
programming style for map-reduce?
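For the init-once part, this is the pattern I had in mind, following your suggestion of caching in a static variable: a static flag guarded by synchronization, so only the first task in a reused JVM pays for the initialization (initSharedData is just my placeholder name). If I understand correctly, this only shares the data among tasks that land in the same reused JVM, not across separate JVMs on the node.

    // Sketch: initialize the static array only once per JVM. With JVM
    // reuse enabled, later tasks in the same JVM skip the expensive init.
    private static boolean initialized = false;

    private static synchronized void initSharedData() {
      if (!initialized) {
        ToBeSharedData[0] = 12345;    // expensive initialization goes here
        initialized = true;
        System.out.println("shared data initialized in this JVM");
      }
    }

    @Override
    protected void setup(Context context)
        throws IOException, InterruptedException {
      initSharedData();               // cheap no-op after the first task
    }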

By the way, I'm using hadoop-0.20.0 in pseudo-distributed mode on a single node.
Thanks in advance.

On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:

>
> snowloong wrote:
> > Hi,
> > I want to share some data structures among the map tasks on the same node
> > (not through files). I mean, if one map task has already initialized some
> > data structures (e.g. an array or a list), can other map tasks share that
> > memory and access it directly? I don't want to reinitialize the data, and
> > I want to save some memory. Can Hadoop help me do this?
>
> You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks
> in mapred-default.xml for usage. Then you can cache the data in a static
> variable in your mapper.
>
> - Sharad
>
