Re: Can I share datas for several map tasks?
Thank you, Jason. I found the example. So, is there a way to share the same JVM between different jobs?

From: jason hadoop
To: core-user@hadoop.apache.org
Sent: Tuesday, June 16, 2009 7:22:16 PM
Subject: Re: Can I share datas for several map tasks?

In the example code download bundle, in the package
com.apress.hadoopbook.examples.advancedtechniques, is the class
JVMReuseAndStaticInitializers.java, which demonstrates sharing data between
instances using JVM reuse. I built this to prove to myself that it was
possible. It never got an actual write-up in the book itself.

On Tue, Jun 16, 2009 at 6:55 PM, Hello World wrote:
> I can't get your book, so can you give me a few more words describing the
> solution? Much appreciated.
>
> -snowloong
>
> On Tue, Jun 16, 2009 at 9:51 PM, jason hadoop wrote:
> > In the examples for my book there is a JVM reuse example with static data
> > shared between JVMs.
> >
> > On Tue, Jun 16, 2009 at 1:08 AM, Hello World wrote:
> > > Thanks for your reply. Can you do me a favor and check this?
> > > I modified mapred-default.xml as follows:
> > >
> > >   <property>
> > >     <name>mapred.job.reuse.jvm.num.tasks</name>
> > >     <value>-1</value>
> > >     <description>How many tasks to run per jvm. If set to -1, there is
> > >     no limit.
> > >     </description>
> > >   </property>
> > >
> > > and then executed bin/stop-all.sh; bin/start-all.sh to restart Hadoop.
> > >
> > > This is my program:
> > >
> > > public class WordCount {
> > >
> > >   public static class TokenizerMapper
> > >       extends Mapper<Object, Text, Text, IntWritable> {
> > >
> > >     private final static IntWritable one = new IntWritable(1);
> > >     private Text word = new Text();
> > >     public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
> > >
> > >     protected void setup(Context context
> > >                          ) throws IOException, InterruptedException {
> > >       // Init shared data
> > >       ToBeSharedData[0] = 12345;
> > >       System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
> > >     }
> > >
> > >     public void map(Object key, Text value, Context context
> > >                     ) throws IOException, InterruptedException {
> > >       StringTokenizer itr = new StringTokenizer(value.toString());
> > >       while (itr.hasMoreTokens()) {
> > >         word.set(itr.nextToken());
> > >         context.write(word, one);
> > >       }
> > >       System.out.println("read shared data[0] = " + ToBeSharedData[0]);
> > >     }
> > >   }
> > >
> > > First, can you tell me how to make sure JVM reuse is taking effect? I
> > > didn't see anything different from before. I used the "top" command under
> > > Linux and saw the same number of Java processes and the same memory usage.
> > >
> > > Second, can you tell me how to make ToBeSharedData be initialized only
> > > once and be readable from other map tasks on the same node? Or is this not
> > > a suitable programming style for map-reduce?
> > >
> > > By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a
> > > single node.
> > > Thanks in advance.
> > >
> > > On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:
> > > >
> > > > snowloong wrote:
> > > > > Hi,
> > > > > I want to share some data structures among the map tasks on the same
> > > > > node (not through files). I mean, if one map task has already
> > > > > initialized some data structures (e.g. an array or a list), can other
> > > > > map tasks share that memory and access it directly? I don't want to
> > > > > reinitialize the data, and I want to save some memory. Can Hadoop help
> > > > > me do this?
> > > >
> > > > You can enable JVM reuse across tasks. See mapred.job.reuse.jvm.num.tasks
> > > > in mapred-default.xml for usage. Then you can cache the data in a static
> > > > variable in your mapper.
> > > >
> > > > - Sharad

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
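For reference, a minimal sketch of the static-variable approach Sharad describes, written against the 0.20 mapreduce API; the class and field names are illustrative, and this is not the JVMReuseAndStaticInitializers example from the book. Note that mapred.job.reuse.jvm.num.tasks is a per-job setting: the JVM (and therefore the static data) is only reused between tasks of the same job, not between different jobs, and it can be set on the job configuration without restarting the cluster.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SharedDataMapper extends Mapper<Object, Text, Text, IntWritable> {

        // A static field lives as long as the task JVM. With JVM reuse enabled,
        // later map tasks of the same job on the same node see the same array.
        private static int[] sharedData;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            synchronized (SharedDataMapper.class) {
                if (sharedData == null) {
                    // Only the first task to run in this JVM pays the initialization cost.
                    sharedData = new int[1024 * 1024 * 16];
                    sharedData[0] = 12345;
                    System.out.println("initialized shared data in " + context.getTaskAttemptID());
                } else {
                    System.out.println("reusing shared data[0] = " + sharedData[0]);
                }
            }
        }
    }

In the job driver, enable unlimited reuse for that job with conf.setInt("mapred.job.reuse.jvm.num.tasks", -1) before submitting it; watching for the "reusing" message in the task logs is a simpler way to confirm reuse than looking at "top" output, since the number of task JVMs on the node can look the same either way.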
Re: Can I share datas for several map tasks?
Hi Jason,
Would you please tell us which chapter this example is in? Thanks
Iman

From: jason hadoop
To: core-user@hadoop.apache.org
Sent: Tuesday, June 16, 2009 6:51:48 AM
Subject: Re: Can I share datas for several map tasks?

In the examples for my book there is a JVM reuse example with static data shared
between JVMs.

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
Re: Can I run the testcase in local
Zhang,

You will need Cygwin. There is also a Hadoop virtual machine that you can use.
Check this tutorial for more details:
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html

zjffdu wrote:
I found it can only work on Linux, not Windows. So is there any way I can run it
on Windows?

From: zhang jianfeng [mailto:zjf...@gmail.com]
Sent: May 10, 2009, 16:39
To: core-user@hadoop.apache.org
Subject: Re: Can I run the testcase in local

PS, I ran it on a Windows machine.

On Sun, May 10, 2009 at 4:11 PM, zjffdu wrote:
Hi all,
I'd like to know more about Hadoop, so I want to debug the test cases locally.
But I got the errors below. Can anyone help solve this problem? Thank you very
much.

###
2009-05-10 16:00:51,483 ERROR namenode.FSNamesystem (FSNamesystem.java:<init>(291)) - FSNamesystem initialization failed.
java.io.IOException: Problem starting http server
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:369)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:372)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:289)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:162)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:209)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:197)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:822)
    at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:275)
    at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:119)
    at org.apache.hadoop.mapred.ClusterMapReduceTestCase.startCluster(ClusterMapReduceTestCase.java:81)
    at org.apache.hadoop.mapred.ClusterMapReduceTestCase.setUp(ClusterMapReduceTestCase.java:56)
    at junit.framework.TestCase.runBare(TestCase.java:125)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
Caused by: org.mortbay.util.MultiException[java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.server.namenode.dfshealth_jsp, java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.server.namenode.nn_005fbrowsedfscontent_jsp]
    at org.mortbay.http.HttpServer.doStart(HttpServer.java:731)
    at org.mortbay.util.Container.start(Container.java:72)
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:347)
    ... 23 more
2009-05-10 16:00:51,483 INFO namenode.FSNamesystem (FSEditLog.java:printStatistics(940)) - Number of transactions: 0 Total time for transactions(ms): 0 Number of syncs: 0 SyncTimes(ms): 0 0
2009-05-10 16:00:51,483 WARN namenode.FSNamesystem (FSNamesystem.java:run(2217)) - ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted
2009-05-10 16:00:51,655 INFO ipc.Server (Server.java:stop(1033)) - Stopping server on 4233
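As a smaller starting point than the full ClusterMapReduceTestCase, an in-process HDFS test along these lines can be run from Eclipse once Cygwin is on the PATH (Hadoop 0.20 shells out to Unix utilities such as whoami and chmod, which is why it fails on plain Windows). This is only a sketch: the class name is illustrative, and it assumes the hadoop-0.20.0 core and test jars plus their Jetty/JSP dependencies are on the test classpath.

    import junit.framework.TestCase;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    public class MiniDfsSmokeTest extends TestCase {

        public void testWriteAndRead() throws Exception {
            Configuration conf = new Configuration();
            // Start a single-datanode HDFS cluster inside the test JVM.
            MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
            try {
                FileSystem fs = cluster.getFileSystem();
                Path p = new Path("/smoke/hello.txt");
                fs.create(p).close();
                assertTrue(fs.exists(p));
            } finally {
                cluster.shutdown();
            }
        }
    }

The ClassNotFoundException for dfshealth_jsp in the trace above usually means the pre-compiled web UI classes and the webapps directory produced by the Ant build are not on the test classpath, so running the Ant build first (or adding build output and the webapps directory to the Eclipse classpath) is worth checking as well.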
Re: OT: How to search mailing list archives?
You might also want to try the Mail Archive:
http://www.mail-archive.com/core-user@hadoop.apache.org/

Jimmy Lin wrote:
I've found Nabble to be helpful: http://www.nabble.com/Hadoop-core-user-f30590.html
-Jimmy

Miles Osborne wrote:
Posts tend to get indexed by Google, so try that.
Miles

2009/3/8 Stuart White:
This is slightly off-topic, and I realize this question is not specific to Hadoop,
but what is the best way to search the mailing list archives? Here's where I'm
looking: http://mail-archives.apache.org/mod_mbox/hadoop-core-user/
I don't see any way to search the archives. Am I missing something? Is there
another archive site I should be looking at? Thanks!
Re: Eclipse plugin
Hi John,
When I created the Hadoop location, hadoop.job.ugi did not appear in the advanced
parameters, but when I later edited the location it was there. I don't know how
that got fixed :)
Also, to get it to work I had to edit fs.default.name and mapred.job.tracker in
hadoop/conf/hadoop-site.xml. I added these lines:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

Finally, I decided to install Hadoop locally on my machine instead of using the
Hadoop virtual machine.
Iman

John Livingstone wrote:
Iman-4, I have encountered the same problem that you have: not being able to
access HDFS on my Hadoop VMware Linux server (using the Hadoop Yahoo tutorial) and
not seeing "hadoop.job.ugi" in my Eclipse Europa 3.3.2 list of parameters. What did
you have to do or change to get it to work? Thanks, John L.

Iman-4 wrote:
Thank you so much, Norbert. It worked. Iman
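If the plugin still cannot connect, one way to check the fs.default.name and mapred.job.tracker values described above independently of Eclipse is a tiny HDFS client like the sketch below. It is only an illustration: "namenode-host" is a placeholder for whatever host the VM or local installation uses, and the ports must match both hadoop-site.xml and the Map/Reduce location settings.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfsRoot {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Same two values the Eclipse plugin needs: HDFS on 9000, JobTracker on 9001.
            conf.set("fs.default.name", "hdfs://namenode-host:9000");
            conf.set("mapred.job.tracker", "namenode-host:9001");

            // List the HDFS root to prove the namenode address and port are reachable.
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }

If this lists the HDFS root but the plugin still shows "Error: null", the problem is more likely on the Eclipse/Cygwin side than in the cluster configuration.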
Re: Problems getting Eclipse Hadoop plugin to work.
This thread helped me fix a similar problem:
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3cc001e847c1fd4248a7d6537643690e2101c83...@mse16be2.mse16.exchange.ms%3e
In my case, I had the ports specified in hadoop-site.xml for the name node and the
job tracker switched in the Map/Reduce location's configuration.
Iman
P.S. I sent this reply to the wrong thread before.

Erik Holstad wrote:
Thanks guys! I am running Linux and the remote cluster is also Linux. I have the
properties set up like that already on my remote cluster, but I am not sure where
to enter this info in Eclipse. And when changing the ports to 9000 and 9001 I get:

  Error: java.io.IOException: Unknown protocol to job tracker:
  org.apache.hadoop.dfs.ClientProtocol

Regards Erik
Re: Map/Reduce Job done locally?
This thread helped me fix a similar problem:
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3cc001e847c1fd4248a7d6537643690e2101c83...@mse16be2.mse16.exchange.ms%3e
In my case, I had the ports specified in hadoop-site.xml for the name node and the
job tracker switched in the Map/Reduce location's configuration.
Iman

Erik Holstad wrote:
Hey Philipp! I'm not sure about your time-tracking approach; it probably works. I've
just used a bash script to start the jar, and then you can do the timing in the
script. As for how to compile the jars, you need to include the dependencies too,
but you will see what you are missing when you run the job.
Regards Erik
Re: Eclipse plugin
Thank you so much, Norbert. It worked.
Iman

Norbert Burger wrote:
Are you running Eclipse on Windows? If so, be aware that you need to spawn Eclipse
from within Cygwin in order to access HDFS. It seems that the plugin uses "whoami"
to get info about the active user. This thread has some more info:
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3c487cd747.8050...@signal7.de%3e
Norbert

On 2/12/09, Iman wrote:
Hi, I am using the VM image hadoop-appliance-0.18.0.vmx and the Eclipse plug-in for
Hadoop. I have followed all the steps in this tutorial:
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html. My problem is
that I am not able to browse HDFS. It only shows an entry "Error: null", and
"Upload files to DFS" and "Create new directory" fail. Any suggestions? I have
tried changing all the directories in the Hadoop location's advanced parameters to
"/tmp/hadoop-user", but it did not work. Also, the tutorial mentioned a parameter
"hadoop.job.ugi" that needs to be changed, but I could not find it in the list of
parameters.
Thanks
Iman
Eclipse plugin
Hi, I am using the VM image hadoop-appliance-0.18.0.vmx and the Eclipse plug-in for
Hadoop. I have followed all the steps in this tutorial:
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html. My problem is
that I am not able to browse HDFS. It only shows an entry "Error: null", and
"Upload files to DFS" and "Create new directory" fail. Any suggestions? I have
tried changing all the directories in the Hadoop location's advanced parameters to
"/tmp/hadoop-user", but it did not work. Also, the tutorial mentioned a parameter
"hadoop.job.ugi" that needs to be changed, but I could not find it in the list of
parameters.
Thanks
Iman