Re: Can I share datas for several map tasks?

Iman E Tue, 16 Jun 2009 21:07:31 -0700

Thank you, Jason. I found the example. So, is there a way to share the same JVM 
between different jobs?





________________________________
From: jason hadoop <[email protected]>
To: [email protected]
Sent: Tuesday, June 16, 2009 7:22:16 PM
Subject: Re: Can I share datas for several map tasks?

In the example code download bundle, the package
com.apress.hadoopbook.examples.advancedtechniques contains the class
JVMReuseAndStaticInitializers.java, which demonstrates sharing data
between task instances using JVM reuse.

I built it to prove to myself that this was possible.
It never got an actual write-up in the book itself.
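
For readers without the bundle, here is a minimal plain-Java sketch of the
general idea, not the class from the book. It assumes only that statics live
as long as the JVM does, so a task that runs in a reused JVM finds the data
already built. SharedData, FakeMapperTask and JvmReuseSketch are hypothetical
names, and no Hadoop classes are used; in a real job the lookup would
presumably sit in the mapper's setup(). The JVM name that gets printed is
implementation-specific (commonly pid@hostname) and is only there to help
tell from the task logs whether two tasks ran in the same process.

```java
import java.lang.management.ManagementFactory;

// Hypothetical holder for data that is expensive to build (not the book's class).
final class SharedData {
    // Number of times the table was actually built in this JVM.
    static int initCount = 0;
    private static int[] table;

    private SharedData() {}

    // Builds the table on first use; later callers in the same JVM reuse it.
    static synchronized int[] get() {
        if (table == null) {
            table = new int[1024];
            table[0] = 12345;
            initCount++;
        }
        return table;
    }
}

// Stand-in for a mapper: a fresh instance per task, but the static data persists.
final class FakeMapperTask {
    private final int[] data;

    FakeMapperTask() {
        // Roughly what a mapper's setup() would do.
        data = SharedData.get();
    }

    int read() {
        return data[0];
    }
}

class JvmReuseSketch {
    public static void main(String[] args) {
        // Implementation-specific identifier of the running JVM.
        String jvm = ManagementFactory.getRuntimeMXBean().getName();
        for (int task = 0; task < 3; task++) {
            FakeMapperTask t = new FakeMapperTask();
            System.out.println("jvm=" + jvm + " task=" + task
                + " data[0]=" + t.read()
                + " initCount=" + SharedData.initCount);
        }
    }
}
```

The null check is what keeps a second task from rebuilding the data;
assigning the values unconditionally in setup() would redo the work for
every task even when the JVM is reused.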

On Tue, Jun 16, 2009 at 6:55 PM, Hello World <[email protected]> wrote:

> I can't get your book, so could you describe the solution in a few more
> words? I would very much appreciate it.
>
> -snowloong
>
> On Tue, Jun 16, 2009 at 9:51 PM, jason hadoop <[email protected]> wrote:
>
> > The examples for my book include one that uses jvm reuse, with static
> > data shared between the tasks that run in the same jvm.
> >
> > On Tue, Jun 16, 2009 at 1:08 AM, Hello World <[email protected]> wrote:
> >
> > > Thanks for your reply. Could you do me a favor and check this?
> > > I modified mapred-default.xml as follows:
> > > <property>
> > >   <name>mapred.job.reuse.jvm.num.tasks</name>
> > >   <value>-1</value>
> > >   <description>How many tasks to run per jvm. If set to -1, there is
> > >   no limit.
> > >   </description>
> > > </property>
> > > Then I ran bin/stop-all.sh; bin/start-all.sh to restart hadoop.
> > >
> > > This is my program:
> > >
> > > public class WordCount {
> > >
> > >   public static class TokenizerMapper
> > >       extends Mapper<Object, Text, Text, IntWritable> {
> > >
> > >     private final static IntWritable one = new IntWritable(1);
> > >     private Text word = new Text();
> > >     public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
> > >
> > >     protected void setup(Context context)
> > >         throws IOException, InterruptedException {
> > >       // Init shared data
> > >       ToBeSharedData[0] = 12345;
> > >       System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
> > >     }
> > >
> > >     public void map(Object key, Text value, Context context)
> > >         throws IOException, InterruptedException {
> > >       StringTokenizer itr = new StringTokenizer(value.toString());
> > >       while (itr.hasMoreTokens()) {
> > >         word.set(itr.nextToken());
> > >         context.write(word, one);
> > >       }
> > >       System.out.println("read shared data[0] = " + ToBeSharedData[0]);
> > >     }
> > >   }
> > >
> > > First, can you tell me how to check that "jvm reuse" is taking effect? I
> > > didn't see anything different from before. I used the "top" command under
> > > Linux and saw the same number of java processes and the same memory usage.
> > >
> > > Second, can you tell me how to make "ToBeSharedData" be initialized only
> > > once and readable from other MapTasks on the same node? Or is this not a
> > > suitable programming style for map-reduce?
> > >
> > > By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a
> > > single-node.
> > > thanks in advance
> > >
> > > On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <[email protected]> wrote:
> > >
> > > >
> > > > snowloong wrote:
> > > > > Hi,
> > > > > I want to share some data structures among the map tasks on the same
> > > > > node (not through files). I mean, if one map task has already
> > > > > initialized some data structures (e.g. an array or a list), can other
> > > > > map tasks share that memory and access it directly? I don't want to
> > > > > reinitialize the data, and I want to save some memory. Can hadoop
> > > > > help me do this?
> > > >
> > > > You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks
> > > > in mapred-default.xml for usage. Then you can cache the data in a static
> > > > variable in your mapper.
> > > >
> > > > - Sharad
> > > >
> > >
> >
> >
> >
> > --
> > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > http://www.apress.com/book/view/9781430219422
> > www.prohadoopbook.com a community for Hadoop Professionals
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
