Hi Jason,

Could you please tell us which chapter this example is in?

Thanks,
Iman
________________________________
From: jason hadoop <jason.had...@gmail.com>
To: core-user@hadoop.apache.org
Sent: Tuesday, June 16, 2009 6:51:48 AM
Subject: Re: Can I share datas for several map tasks?

The examples for my book include a JVM reuse example, with static data shared between tasks in a reused JVM.

On Tue, Jun 16, 2009 at 1:08 AM, Hello World <snowlo...@gmail.com> wrote:

> Thanks for your reply. Could you do me a favor and check this?
> I modified mapred-default.xml as follows:
>
> <property>
>   <name>mapred.job.reuse.jvm.num.tasks</name>
>   <value>-1</value>
>   <description>How many tasks to run per jvm. If set to -1, there is
>   no limit.
>   </description>
> </property>
>
> Then I ran bin/stop-all.sh; bin/start-all.sh to restart Hadoop.
>
> This is my program:
>
> public class WordCount {
>
>   public static class TokenizerMapper
>       extends Mapper<Object, Text, Text, IntWritable> {
>
>     private final static IntWritable one = new IntWritable(1);
>     private Text word = new Text();
>     public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
>
>     protected void setup(Context context
>                          ) throws IOException, InterruptedException {
>       // Init shared data
>       ToBeSharedData[0] = 12345;
>       System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
>     }
>
>     public void map(Object key, Text value, Context context
>                     ) throws IOException, InterruptedException {
>       StringTokenizer itr = new StringTokenizer(value.toString());
>       while (itr.hasMoreTokens()) {
>         word.set(itr.nextToken());
>         context.write(word, one);
>       }
>       System.out.println("read shared data[0] = " + ToBeSharedData[0]);
>     }
>   }
>
> First, can you tell me how to make sure that JVM reuse is taking effect?
> I didn't see anything different from before. I used the "top" command
> under Linux and saw the same number of java processes and the same
> memory usage.
>
> Second, can you tell me how to make "ToBeSharedData" be initialized only
> once and be readable from other map tasks on the same node? Or is this
> not a suitable programming style for MapReduce?
>
> By the way, I'm using hadoop-0.20.0 in pseudo-distributed mode on a
> single node.
>
> Thanks in advance.
>
> On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:
>
> > snowloong wrote:
> > > Hi,
> > > I want to share some data structures among the map tasks on the same
> > > node (not through files). I mean, if one map task has already
> > > initialized some data structures (e.g. an array or a list), can other
> > > map tasks share that memory and access it directly? I don't want to
> > > reinitialize the data, and I want to save some memory. Can Hadoop
> > > help me do this?
> >
> > You can enable JVM reuse across tasks. See mapred.job.reuse.jvm.num.tasks
> > in mapred-default.xml for usage. Then you can cache the data in a static
> > variable in your mapper.
> >
> > - Sharad

--
Pro Hadoop, a book to guide you from beginner to Hadoop mastery,
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
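
Sharad's suggestion (enable JVM reuse, then cache the data in a static variable) could be sketched roughly as below. This is only an illustration under stated assumptions, not the example from the book: the class SharedData and its methods are hypothetical names, and the code is plain Java so that it runs without Hadoop on the classpath. The idea is that the mapper's setup() would call SharedData.get() instead of filling the array itself, so the array is built only by the first task that runs in a given JVM. Static data is visible only to tasks that run one after another in the same reused JVM; tasks running in separate JVMs on the same node each hold their own copy. Printing the JVM name (typically of the form pid@hostname) is one way to check whether reuse is happening, since successive tasks in a reused JVM would report the same name.

```java
import java.lang.management.ManagementFactory;

/**
 * Hypothetical holder for data that should be built once per JVM.
 * With JVM reuse enabled, later tasks in the same JVM find the
 * array already initialized and skip the expensive setup.
 */
final class SharedData {
    // Same size as the array in the original post (16M ints, about 64 MB).
    private static final int SIZE = 1024 * 1024 * 16;

    private static int[] data;          // null until the first call to get()
    private static int initCount = 0;   // how many times this JVM initialized it

    private SharedData() {}

    /** Returns the shared array, building it on first use only. */
    static synchronized int[] get() {
        if (data == null) {
            int[] fresh = new int[SIZE];
            fresh[0] = 12345;           // stand-in for the real initialization
            data = fresh;
            initCount++;
            System.out.println("initialized shared data in JVM " + jvmName());
        }
        return data;
    }

    static synchronized int getInitCount() {
        return initCount;
    }

    /** JVM name as reported by the runtime; the exact format is implementation-dependent. */
    static String jvmName() {
        return ManagementFactory.getRuntimeMXBean().getName();
    }

    // Simulates two tasks running in the same JVM: only the first one initializes.
    public static void main(String[] args) {
        int[] first = get();
        int[] second = get();
        System.out.println("same array: " + (first == second));
        System.out.println("init count: " + getInitCount());
        System.out.println("read shared data[0] = " + second[0]);
    }
}
```

In a mapper, setup() would then contain something like `int[] shared = SharedData.get();`. The reuse setting is a per-job property, so it can also be set on the job's configuration, for example with conf.setInt("mapred.job.reuse.jvm.num.tasks", -1), rather than by editing mapred-default.xml.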