I can't get your book, so could you say a little more about the solution? I would very much appreciate it.
-snowloong

On Tue, Jun 16, 2009 at 9:51 PM, jason hadoop <jason.had...@gmail.com> wrote:

> The examples for my book include a jvm reuse example with static data
> shared between jvm's.
>
> On Tue, Jun 16, 2009 at 1:08 AM, Hello World <snowlo...@gmail.com> wrote:
>
> > Thanks for your reply. Could you do me a favor and check this?
> > I modified mapred-default.xml as follows:
> >
> >   <property>
> >     <name>mapred.job.reuse.jvm.num.tasks</name>
> >     <value>-1</value>
> >     <description>How many tasks to run per jvm. If set to -1, there is
> >     no limit.
> >     </description>
> >   </property>
> >
> > Then I ran bin/stop-all.sh; bin/start-all.sh to restart hadoop.
> >
> > This is my program:
> >
> > public class WordCount {
> >
> >   public static class TokenizerMapper
> >        extends Mapper<Object, Text, Text, IntWritable>{
> >
> >     private final static IntWritable one = new IntWritable(1);
> >     private Text word = new Text();
> >     public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
> >
> >     protected void setup(Context context
> >                          ) throws IOException, InterruptedException {
> >       //Init shared data
> >       ToBeSharedData[0] = 12345;
> >       System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
> >     }
> >
> >     public void map(Object key, Text value, Context context
> >                     ) throws IOException, InterruptedException {
> >       StringTokenizer itr = new StringTokenizer(value.toString());
> >       while (itr.hasMoreTokens()) {
> >         word.set(itr.nextToken());
> >         context.write(word, one);
> >       }
> >       System.out.println("read shared data[0] = " + ToBeSharedData[0]);
> >     }
> >   }
> >
> > First, can you tell me how to make sure "jvm reuse" is taking effect? I
> > didn't see anything different from before. I used the "top" command under
> > linux and saw the same number of java processes and the same memory usage.
> >
> > Second, can you tell me how to make "ToBeSharedData" be initialized only
> > once and readable from other MapTasks on the same node? Or is this not a
> > suitable programming style for map-reduce?
> >
> > By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a
> > single node.
> > Thanks in advance
> >
> > On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:
> >
> > > snowloong wrote:
> > > > Hi,
> > > > I want to share some data structures among the map tasks on the same
> > > > node (not through files). I mean, if one map task has already
> > > > initialized some data structures (e.g. an array or a list), can other
> > > > map tasks share this memory and access it directly? I don't want to
> > > > reinitialize the data and I want to save some memory. Can hadoop help
> > > > me do this?
> > >
> > > You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks
> > > in mapred-default.xml for usage. Then you can cache the data in a static
> > > variable in your mapper.
> > >
> > > - Sharad
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.apress.com/book/view/9781430219422
> www.prohadoopbook.com a community for Hadoop Professionals
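
For reference, the static-variable approach Sharad describes can be sketched in plain Java, without any Hadoop classes. This is only a minimal sketch under assumptions: the class SharedData and its methods get(), initCount() and jvmName() are made-up names for illustration, not part of Hadoop or of the Pro Hadoop examples, and the array is smaller than the one in the program above. A mapper would call SharedData.get() from setup(); with jvm reuse enabled, tasks of the same job that run one after another in the same JVM should then find the array already built. Logging the JVM name from each task is one possible way to check whether reuse is happening: the same name appearing in several task logs would mean those tasks ran in one JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for data shared by all tasks that run in one JVM. */
final class SharedData {
    // Counts how many times the expensive initialization ran in this JVM.
    private static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Built lazily on first use; access is guarded by the class lock.
    private static int[] data;

    private SharedData() {}

    /** Returns the shared array, building it on the first call only. */
    static synchronized int[] get() {
        if (data == null) {
            // Assumption: a smaller array than the 1024 * 1024 * 16 in the post.
            int[] fresh = new int[1024 * 1024];
            fresh[0] = 12345;
            INIT_COUNT.incrementAndGet();
            data = fresh;
        }
        return data;
    }

    /** Number of times the array has been initialized in this JVM. */
    static int initCount() {
        return INIT_COUNT.get();
    }

    /** Name of the running JVM; often "pid@hostname", but the format is not guaranteed. */
    static String jvmName() {
        return ManagementFactory.getRuntimeMXBean().getName();
    }

    public static void main(String[] args) {
        // Simulates three tasks running one after another in the same JVM.
        for (int task = 0; task < 3; task++) {
            int[] shared = SharedData.get();
            System.out.println("task " + task
                + " jvm=" + jvmName()
                + " data[0]=" + shared[0]
                + " inits=" + initCount());
        }
    }
}
```

Note that a static variable is shared only inside one JVM. Tasks that run at the same time in separate task JVMs each hold their own copy, so this avoids repeated initialization for tasks that reuse a JVM, but it does not share memory between concurrently running task processes.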