Re: Can I share data for several map tasks?

2009-06-16 Thread Iman E
Thank you, Jason. I found the example. So, is there a way to share the same JVM 
between different jobs?





From: jason hadoop 
To: core-user@hadoop.apache.org
Sent: Tuesday, June 16, 2009 7:22:16 PM
Subject: Re: Can I share data for several map tasks?

In the example code download bundle, in the package
com.apress.hadoopbook.examples.advancedtechniques, is the class
JVMReuseAndStaticInitializers.java, which demonstrates sharing data between
instances using JVM reuse.

I built this to prove to myself that it was possible.
It never got an actual write-up in the book itself.
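For readers without the download bundle, a minimal sketch of the general pattern
follows. The class and field names below are invented for illustration (the book's
actual example class will differ): a static field is filled in once, under a lock,
and any later task that the framework schedules into the same reused child JVM sees
the already-initialized data.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hedged sketch only, not the book's JVMReuseAndStaticInitializers class.
// Useful only when mapred.job.reuse.jvm.num.tasks is set to something other
// than 1, so that several tasks run in the same child JVM.
public class SharedStaticDataMapper extends Mapper<Object, Text, Text, IntWritable> {

  // Survives across tasks only for as long as the child JVM is reused.
  private static int[] sharedData;

  private static synchronized void initOnce() {
    if (sharedData == null) {
      sharedData = new int[1024 * 1024];   // stand-in for expensive initialization
      sharedData[0] = 12345;
    }
  }

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    initOnce();   // the first task in this JVM pays the cost; later tasks skip it
  }

  @Override
  protected void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // sharedData is readable here; real map logic omitted for brevity.
    context.write(value, new IntWritable(sharedData[0]));
  }
}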

On Tue, Jun 16, 2009 at 6:55 PM, Hello World  wrote:

> I can't get your book, so can you give me a few more words to describe the
> solution? Much appreciated.
>
> -snowloong
>
> On Tue, Jun 16, 2009 at 9:51 PM, jason hadoop wrote:
>
> > In the examples for my book there is a JVM-reuse example with static data
> > shared between JVMs.
> >
> > On Tue, Jun 16, 2009 at 1:08 AM, Hello World 
> wrote:
> >
> > > Thanks for your reply. Can you do me a favor to make a check?
> > > I modified mapred-default.xml as follows:
> > >    <property>
> > >      <name>mapred.job.reuse.jvm.num.tasks</name>
> > >      <value>-1</value>
> > >      <description>How many tasks to run per jvm. If set to -1, there is
> > >      no limit.</description>
> > >    </property>
> > > And execute bin/stop-all.sh; bin/start-all.sh to restart hadoop;
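The restart applies the setting cluster-wide. As a hedged aside, the same knob can
also be requested per job from the driver with the old org.apache.hadoop.mapred API,
so the shared configuration files do not have to change; the class below is only an
illustration and its name is made up:

import org.apache.hadoop.mapred.JobConf;

public class JvmReusePerJob {
  public static void main(String[] args) {
    JobConf conf = new JobConf(JvmReusePerJob.class);
    // -1 means a child JVM may run an unlimited number of tasks for this job.
    // setNumTasksToExecutePerJvm should exist on JobConf in 0.19+; if not,
    // the property form below works as well.
    conf.setNumTasksToExecutePerJvm(-1);
    // conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
    // ... set mapper, input/output paths, and submit the job as usual ...
  }
}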
> > >
> > > This is my program:
> > >
> > > public class WordCount {
> > >
> > >   public static class TokenizerMapper
> > >        extends Mapper<Object, Text, Text, IntWritable> {
> > >
> > >     private final static IntWritable one = new IntWritable(1);
> > >     private Text word = new Text();
> > >     public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
> > >
> > >     protected void setup(Context context
> > >             ) throws IOException, InterruptedException {
> > >         // Init shared data
> > >         ToBeSharedData[0] = 12345;
> > >         System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
> > >     }
> > >
> > >     public void map(Object key, Text value, Context context
> > >                     ) throws IOException, InterruptedException {
> > >       StringTokenizer itr = new StringTokenizer(value.toString());
> > >       while (itr.hasMoreTokens()) {
> > >         word.set(itr.nextToken());
> > >         context.write(word, one);
> > >       }
> > >       System.out.println("read shared data[0] = " + ToBeSharedData[0]);
> > >     }
> > >   }
> > >
> > > First, can you tell me how to make sure "jvm reuse" is taking effect? I
> > > didn't see anything different from before: I use the "top" command under
> > > Linux and see the same number of java processes and the same memory usage.
> > >
> > > Second, can you tell me how to make "ToBeSharedData" be initialized only
> > > once and readable from other map tasks on the same node? Or is this not a
> > > suitable programming style for map-reduce?
> > >
> > > By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a
> > > single node.
> > > Thanks in advance.
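On the first question, one low-tech check (a sketch, not from the original thread)
is to bump a static counter inside setup() of a mapper such as the TokenizerMapper
quoted above and print it together with the JVM's name. With reuse enabled, the
stdout logs of later task attempts on the same node should show counts greater
than 1 and a repeated JVM name; without reuse, every task reports 1 and a fresh name.

    // Added inside the mapper class (sketch only); ManagementFactory is part of the JDK.
    private static int tasksRunInThisJvm = 0;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
      tasksRunInThisJvm++;
      // pid@hostname of the current JVM, via the standard JMX runtime bean
      String jvmName = java.lang.management.ManagementFactory.getRuntimeMXBean().getName();
      System.out.println("task " + context.getTaskAttemptID()
          + " is number " + tasksRunInThisJvm + " in JVM " + jvmName);
    }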
> > >
> > > On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:
> > >
> > > >
> > > > snowloong wrote:
> > > > > Hi,
> > > > > I want to share some data structures among the map tasks on the same
> > > > > node (not through files). I mean, if one map task has already
> > > > > initialized some data structures (e.g. an array or a list), can other
> > > > > map tasks share that memory and access it directly? I don't want to
> > > > > reinitialize this data, and I want to save some memory. Can hadoop
> > > > > help me do this?
> > > >
> > > > You can enable jvm reuse across tasks. See
> > > > mapred.job.reuse.jvm.num.tasks in mapred-default.xml for usage. Then
> > > > you can cache the data in a static variable in your mapper.
> > > >
> > > > - Sharad
> > > >
> > >
> >
> >
> >
> > --
> > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > http://www.apress.com/book/view/9781430219422
> > www.prohadoopbook.com a community for Hadoop Professionals
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals



  

Re: Can I share data for several map tasks?

2009-06-16 Thread Iman E
Hi Jason,
Would you please tell us which chapter this example is in?
Thanks
Iman





From: jason hadoop 
To: core-user@hadoop.apache.org
Sent: Tuesday, June 16, 2009 6:51:48 AM
Subject: Re: Can I share data for several map tasks?

In the examples for my book there is a JVM-reuse example with static data
shared between JVMs.

On Tue, Jun 16, 2009 at 1:08 AM, Hello World  wrote:

> Thanks for your reply. Can you do me a favor to make a check?
> I modified mapred-default.xml as follows:
> <property>
>   <name>mapred.job.reuse.jvm.num.tasks</name>
>   <value>-1</value>
>   <description>How many tasks to run per jvm. If set to -1, there is
>   no limit.</description>
> </property>
> And execute bin/stop-all.sh; bin/start-all.sh to restart hadoop;
>
> This is my program:
>
> public class WordCount {
>
>   public static class TokenizerMapper
>        extends Mapper<Object, Text, Text, IntWritable> {
>
>     private final static IntWritable one = new IntWritable(1);
>     private Text word = new Text();
>     public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
>
>     protected void setup(Context context
>             ) throws IOException, InterruptedException {
>         // Init shared data
>         ToBeSharedData[0] = 12345;
>         System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
>     }
>
>     public void map(Object key, Text value, Context context
>                     ) throws IOException, InterruptedException {
>       StringTokenizer itr = new StringTokenizer(value.toString());
>       while (itr.hasMoreTokens()) {
>         word.set(itr.nextToken());
>         context.write(word, one);
>       }
>       System.out.println("read shared data[0] = " + ToBeSharedData[0]);
>     }
>   }
>
> First, can you tell me how to make sure "jvm reuse" is taking effect? I
> didn't see anything different from before: I use the "top" command under
> Linux and see the same number of java processes and the same memory usage.
>
> Second, can you tell me how to make "ToBeSharedData" be initialized only
> once and readable from other map tasks on the same node? Or is this not a
> suitable programming style for map-reduce?
>
> By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a
> single node.
> Thanks in advance.
>
> On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal wrote:
>
> >
> > snowloong wrote:
> > > Hi,
> > > I want to share some data structures among the map tasks on the same
> > > node (not through files). I mean, if one map task has already initialized
> > > some data structures (e.g. an array or a list), can other map tasks share
> > > that memory and access it directly? I don't want to reinitialize this
> > > data, and I want to save some memory. Can hadoop help me do this?
> >
> > You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks
> > in mapred-default.xml for usage. Then you can cache the data in a static
> > variable in your mapper.
> >
> > - Sharad
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals



  

Re: Can I run the testcase locally

2009-05-10 Thread Iman

Zhang,
You will need Cygwin. There is also a Hadoop virtual machine that you
can use.
Check this tutorial for more details:
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html


zjffdu wrote:

I found it can only work on Linux, not Windows.

So is there any way I can run it on Windows?

From: zhang jianfeng [mailto:zjf...@gmail.com] 
Sent: May 10, 2009 16:39

To: core-user@hadoop.apache.org
Subject: Re: Can I run the testcase locally

 


PS: I run it on a Windows machine.

On Sun, May 10, 2009 at 4:11 PM, zjffdu  wrote:

Hi all,

 


I'd like to know more about Hadoop, so I want to debug the test case
locally.

But I got the errors below. Can anyone help to solve this problem? Thank you
very much.

 

 


###

 


2009-05-10 16:00:51,483 ERROR namenode.FSNamesystem (FSNamesystem.java:<init>(291)) - FSNamesystem initialization failed.
java.io.IOException: Problem starting http server
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:369)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:372)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:289)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:162)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:209)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:197)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:822)
    at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:275)
    at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:119)
    at org.apache.hadoop.mapred.ClusterMapReduceTestCase.startCluster(ClusterMapReduceTestCase.java:81)
    at org.apache.hadoop.mapred.ClusterMapReduceTestCase.setUp(ClusterMapReduceTestCase.java:56)
    at junit.framework.TestCase.runBare(TestCase.java:125)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
Caused by: org.mortbay.util.MultiException[java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.server.namenode.dfshealth_jsp, java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.server.namenode.nn_005fbrowsedfscontent_jsp]
    at org.mortbay.http.HttpServer.doStart(HttpServer.java:731)
    at org.mortbay.util.Container.start(Container.java:72)
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:347)
    ... 23 more

2009-05-10 16:00:51,483 INFO  namenode.FSNamesystem (FSEditLog.java:printStatistics(940)) - Number of transactions: 0 Total time for transactions(ms): 0 Number of syncs: 0 SyncTimes(ms): 0 0

2009-05-10 16:00:51,483 WARN  namenode.FSNamesystem (FSNamesystem.java:run(2217)) - ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted

2009-05-10 16:00:51,655 INFO  ipc.Server (Server.java:stop(1033)) - Stopping server on 4233

 

 



  




Re: OT: How to search mailing list archives?

2009-03-08 Thread Iman
You might also want to try the mail archive: 
http://www.mail-archive.com/core-user@hadoop.apache.org/

Jimmy Lin wrote:

I've found nabble to be helpful:
http://www.nabble.com/Hadoop-core-user-f30590.html

-Jimmy

Miles Osborne wrote:

posts tend to get indexed by Google, so try that

Miles

2009/3/8 Stuart White :

This is slightly off-topic, and I realize this question is not
specific to Hadoop, but what is the best way to search the mailing
list archives?  Here's where I'm looking:

http://mail-archives.apache.org/mod_mbox/hadoop-core-user/

I don't see any way to search the archives.  Am I missing something?
Is there another archive site I should be looking at?

Thanks!









Re: Eclipse plugin

2009-02-26 Thread Iman

Hi John,
When I created the hadoop location, hadoop.job.ugi did not appear in
the advanced parameters. But when I later edited the location, it was there;
I don't know how that got fixed. :)
Also, to get it to work, I had to edit fs.default.name and
mapred.job.tracker in hadoop/conf/hadoop-site.xml.

I added these lines:

 <property>
   <name>fs.default.name</name>
   <value>hdfs://:9000</value>
 </property>
 <property>
   <name>mapred.job.tracker</name>
   <value>:9001</value>
 </property>
 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
Finally, I decided to install hadoop locally on my machine instead of 
using the hadoop virtual machine.

Iman.
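With those properties in hadoop/conf/hadoop-site.xml, a small standalone client can
confirm that the settings are picked up outside of Eclipse. The class below is only a
sketch (the name ListHdfsRoot is made up), assuming the conf directory is on the
classpath so that fs.default.name is read from hadoop-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsRoot {
  public static void main(String[] args) throws Exception {
    // Reads fs.default.name from hadoop-site.xml on the classpath;
    // it can also be set on conf explicitly if the file is not found.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());   // one line per entry in the HDFS root
    }
    fs.close();
  }
}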

John Livingstone wrote:

Iman-4,
I have encountered the same problem that you did: not being
able to access HDFS on my Hadoop VMware Linux server (using the Hadoop Yahoo
tutorial) and not seeing "hadoop.job.ugi" in my Eclipse Europa 3.3.2 list of
parameters.  What did you have to do or change to get it to work?
Thanks,
John L.




Iman-4 wrote:
  

Thank you so much, Norbert. It worked.
Iman
Norbert Burger wrote:


Are you running Eclipse on Windows?  If so, be aware that you need to spawn
Eclipse from within Cygwin in order to access HDFS.  It seems that the
plugin uses "whoami" to get info about the active user.  This thread has
some more info:

http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3c487cd747.8050...@signal7.de%3e

Norbert

On 2/12/09, Iman  wrote:
  
  

Hi,
I am using VM image hadoop-appliance-0.18.0.vmx and an eclipse plug-in
of
hadoop. I have followed all the steps in this tutorial:
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html. My
problem is that I am not able to browse the HDFS. It only shows an entry
"Error:null". Upload files to DFS, and Create new directory fail. Any
suggestions? I have tried to change all the directories in the hadoop
location advanced parameters to "/tmp/hadoop-user", but it did not work.
Also, the tutorials mentioned a parameter "hadoop.job.ugi" that needs to
be
changed, but I could not find it in the list of parameters.
Thanks
Iman



  
  





  




Re: Problems getting Eclipse Hadoop plugin to work.

2009-02-20 Thread Iman
This thread helped me fix a similar problem: 
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3cc001e847c1fd4248a7d6537643690e2101c83...@mse16be2.mse16.exchange.ms%3e 



In my case, I had the ports specified in the hadoop-site.xml for the 
name node and job tracker switched in the Map/Reduce location's 
configuration.


Iman.
P.S. I sent this reply to the wrong thread before.
Erik Holstad wrote:

Thanks guys!
I'm running Linux, and the remote cluster is also Linux.
I have the properties set up like that already on my remote cluster, but I'm
not sure where to input this info into Eclipse.
And when changing the ports to 9000 and 9001 I get:

Error: java.io.IOException: Unknown protocol to job tracker:
org.apache.hadoop.dfs.ClientProtocol

Regards Erik

  




Re: Map/Reduce Job done locally?

2009-02-20 Thread Iman
This thread helped me fix a similar problem: 
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3cc001e847c1fd4248a7d6537643690e2101c83...@mse16be2.mse16.exchange.ms%3e


In my case, I had the ports specified in the hadoop-site.xml for the 
name node and job tracker switched in the Map/Reduce location's 
configuration.


Iman.

Erik Holstad wrote:

Hey Philipp!
Not sure about your time-tracking thing; it probably works. I've just used a
bash script to start the jar, and then you can do the timing in the script.
About how to compile the jars: you need to include the dependencies too, but
you will see what you are missing when you run the job.

Regards Erik

  




Re: Eclipse plugin

2009-02-12 Thread Iman

Thank you so much, Norbert. It worked.
Iman
Norbert Burger wrote:

Are you running Eclipse on Windows?  If so, be aware that you need to spawn
Eclipse from within Cygwin in order to access HDFS.  It seems that the
plugin uses "whoami" to get info about the active user.  This thread has
some more info:

http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3c487cd747.8050...@signal7.de%3e

Norbert

On 2/12/09, Iman  wrote:
  

Hi,
I am using VM image hadoop-appliance-0.18.0.vmx and an eclipse plug-in of
hadoop. I have followed all the steps in this tutorial:
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html. My
problem is that I am not able to browse the HDFS. It only shows an entry
"Error:null". Upload files to DFS, and Create new directory fail. Any
suggestions? I have tried to change all the directories in the hadoop
location advanced parameters to "/tmp/hadoop-user", but it did not work.
Also, the tutorials mentioned a parameter "hadoop.job.ugi" that needs to be
changed, but I could not find it in the list of parameters.
Thanks
Iman




  




Eclipse plugin

2009-02-12 Thread Iman

Hi,
I am using VM image hadoop-appliance-0.18.0.vmx and an eclipse plug-in 
of hadoop. I have followed all the steps in this tutorial: 
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html. My 
problem is that I am not able to browse the HDFS. It only shows an entry 
"Error:null". Upload files to DFS, and Create new directory fail. Any 
suggestions? I have tried to change all the directories in the hadoop
location advanced parameters to "/tmp/hadoop-user", but it did not work. 
Also, the tutorials mentioned a parameter "hadoop.job.ugi" that needs to 
be changed, but I could not find it in the list of parameters.

Thanks
Iman