Re: Can I share data among several map tasks?

Hello World Tue, 16 Jun 2009 18:56:26 -0700

I can't get your book, so could you describe the solution in a few more
words? I would really appreciate it.


-snowloong

On Tue, Jun 16, 2009 at 9:51 PM, jason hadoop <jason.had...@gmail.com> wrote:

> The examples for my book include one that shows JVM reuse, with static
> data shared between the tasks that run in the same JVM.
>
> On Tue, Jun 16, 2009 at 1:08 AM, Hello World <snowlo...@gmail.com> wrote:
>
> > Thanks for your reply. Could you do me a favor and check this?
> > I modified mapred-default.xml as follows:
> >     <property>
> >       <name>mapred.job.reuse.jvm.num.tasks</name>
> >       <value>-1</value>
> >       <description>How many tasks to run per jvm. If set to -1, there is
> >       no limit.
> >       </description>
> >     </property>
> > Then I ran bin/stop-all.sh and bin/start-all.sh to restart Hadoop.
> >
> > This is my program:
> >
> >     import java.io.IOException;
> >     import java.util.StringTokenizer;
> >
> >     import org.apache.hadoop.io.IntWritable;
> >     import org.apache.hadoop.io.Text;
> >     import org.apache.hadoop.mapreduce.Mapper;
> >
> >     public class WordCount {
> >
> >       public static class TokenizerMapper
> >           extends Mapper<Object, Text, Text, IntWritable> {
> >
> >         private final static IntWritable one = new IntWritable(1);
> >         private Text word = new Text();
> >         // Allocated when the class is initialized (once per child JVM).
> >         public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
> >
> >         @Override
> >         protected void setup(Context context)
> >             throws IOException, InterruptedException {
> >           // Init shared data (runs at the start of every task)
> >           ToBeSharedData[0] = 12345;
> >           System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
> >         }
> >
> >         @Override
> >         public void map(Object key, Text value, Context context)
> >             throws IOException, InterruptedException {
> >           StringTokenizer itr = new StringTokenizer(value.toString());
> >           while (itr.hasMoreTokens()) {
> >             word.set(itr.nextToken());
> >             context.write(word, one);
> >           }
> >           System.out.println("read shared data[0] = " + ToBeSharedData[0]);
> >         }
> >       }
> >
> >       // (the reducer and main() were not included in the post)
> >     }
> >
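For scale, a quick size check on the `ToBeSharedData` array in the program above. Because it is a static field initializer, the array is allocated when the class is initialized, that is, once per JVM; `setup()` only rewrites element 0 on every task. Each child JVM that loads the mapper therefore holds its own copy of:

```latex
\begin{aligned}
1024 \times 1024 \times 16 &= 16\,777\,216\ \text{ints} \\
16\,777\,216 \times 4\ \text{bytes} &= 67\,108\,864\ \text{bytes} = 64\ \text{MiB}
\end{aligned}
```

Assuming `mapred.child.java.opts` is still at the 0.20 default of `-Xmx200m`, one copy takes roughly a third of each task's heap.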
> > First, how can I make sure JVM reuse is actually taking effect? I don't
> > see any difference from before: with the "top" command on Linux I see the
> > same number of java processes and the same memory usage.
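One way to check, sketched here under stated assumptions: have each task log its JVM's identity plus a static per-JVM task counter, then compare the task stdout logs (typically under `logs/userlogs/<attempt-id>/stdout`). If reuse works, a later task shows the same JVM name and a counter above 1. Two caveats, as far as I know: reuse does not reduce the number of concurrent child JVMs (each slot still has its own), it only lets a JVM that finished a task take another task of the same job, so a small WordCount run with one or two map tasks may look identical in `top`; and overrides normally go in `conf/mapred-site.xml` or the job's own configuration (for example `job.getConfiguration().setInt("mapred.job.reuse.jvm.num.tasks", -1)`), since `mapred-default.xml` holds the shipped defaults. `JvmReuseProbe` and `recordTaskStart` are hypothetical names, not Hadoop API.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: call recordTaskStart() from Mapper.setup() and print the result.
public final class JvmReuseProbe {
    // Static state lives as long as the JVM. With reuse enabled, a later task of the
    // same job that lands in this JVM sees the counter already above 1.
    private static final AtomicInteger TASKS_IN_THIS_JVM = new AtomicInteger();

    private JvmReuseProbe() {}

    public static String recordTaskStart() {
        int n = TASKS_IN_THIS_JVM.incrementAndGet();
        // On HotSpot this name is usually "pid@hostname", which identifies the process.
        String jvm = ManagementFactory.getRuntimeMXBean().getName();
        return "jvm=" + jvm + " taskNumberInThisJvm=" + n;
    }

    public static void main(String[] args) {
        // Stand-in for two tasks running back to back in one reused JVM.
        System.out.println(recordTaskStart());
        System.out.println(recordTaskStart());
    }
}
```

In the mapper, `setup()` would do `System.out.println(JvmReuseProbe.recordTaskStart());`. Without reuse, every task log should show `taskNumberInThisJvm=1`.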
> >
> > Second, how can I make "ToBeSharedData" be initialized only once, so that
> > other map tasks on the same node can read it? Or is this not a suitable
> > programming style for MapReduce?
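As far as I understand it, static data is only visible to tasks that run in the same JVM: with reuse enabled, tasks of the same job that run one after another in a JVM can see it, but tasks running concurrently in other slots are separate JVMs with their own copy, and JVMs are not reused across jobs. Within one JVM, the initialization-on-demand holder idiom makes the expensive setup run exactly once, and the JLS guarantees class initialization is thread-safe. A minimal sketch; `SharedTable`, `build()`, `INIT_COUNT` and the reduced array size are illustrative assumptions, not Hadoop API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-JVM cache for data that should be built only once.
public final class SharedTable {
    // Counts how many times the expensive build ran in this JVM (for demonstration only).
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private SharedTable() {}

    // Holder.DATA is built the first time get() touches Holder, and only once per JVM.
    private static final class Holder {
        static final int[] DATA = build();
    }

    private static int[] build() {
        INIT_COUNT.incrementAndGet();
        // The post uses 1024 * 1024 * 16; smaller here so the demo runs with a small heap.
        int[] data = new int[1024 * 1024];
        data[0] = 12345; // placeholder for the real, expensive initialization
        return data;
    }

    public static int[] get() {
        return Holder.DATA;
    }

    public static void main(String[] args) {
        // Two "tasks" in the same JVM: the second reuses the array built by the first.
        int[] first = get();
        int[] second = get();
        System.out.println("same array: " + (first == second) + ", inits: " + INIT_COUNT.get());
    }
}
```

Running `main` prints `same array: true, inits: 1`. In the mapper, `setup()` would call `SharedTable.get()` instead of rewriting `ToBeSharedData` on every task.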
> >
> > By the way, I'm using hadoop-0.20.0 in pseudo-distributed mode on a
> > single node.
> > Thanks in advance.
> >
> > On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:
> >
> > >
> > > snowloong wrote:
> > > > Hi,
> > > > I want to share some data structures among the map tasks on the same
> > > > node (not through files). That is, if one map task has already
> > > > initialized some data structures (e.g. an array or a list), can other
> > > > map tasks share that memory and access it directly? I don't want to
> > > > reinitialize the data, and I want to save memory. Can Hadoop help me
> > > > do this?
> > >
> > > You can enable JVM reuse across tasks. See mapred.job.reuse.jvm.num.tasks
> > > in mapred-default.xml for usage. Then you can cache the data in a static
> > > variable in your mapper.
> > >
> > > - Sharad
> > >
> >
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.apress.com/book/view/9781430219422
> www.prohadoopbook.com a community for Hadoop Professionals
>
