Re: Could not reserve enough space for heap in JVM
On a 32-bit machine you are limited to a 4 GB heap per JVM.

- Original Message -
From: Arijit Mukherjee
To: core-user@hadoop.apache.org
Sent: Wed Feb 25 21:27:25 2009
Subject: Re: Could not reserve enough space for heap in JVM

Mine is 32-bit. As of now it has only 2 GB of RAM, but I'm planning to acquire more hardware resources, so a clarification in this regard would help me in deciding the specs of the new cluster.

Arijit

2009/2/26 souravm

> Is your machine 32-bit or 64-bit?
>
> - Original Message -
> From: Nick Cen
> To: core-user@hadoop.apache.org
> Sent: Wed Feb 25 21:10:00 2009
> Subject: Re: Could not reserve enough space for heap in JVM
>
> I have a question related to the HADOOP_HEAPSIZE variable. My machine has
> 16 GB of memory, but when I set HADOOP_HEAPSIZE to 4 GB it threw the
> exception referred to in this thread. How can I make full use of my
> memory? Thanks.
>
> 2009/2/26 Arijit Mukherjee
>
> > I was getting similar errors too while running the MapReduce samples. I
> > fiddled with hadoop-env.sh (where the HEAPSIZE is specified) and
> > hadoop-site.xml, and rectified it after some trial and error. But I
> > would like to know if there is a rule of thumb for this. Right now I
> > have a Core Duo machine with 2 GB of RAM running Ubuntu 8.10, and I've
> > found that a HEAPSIZE of 256 MB works without any problems. Anything
> > more than that gives the same error (even when nothing else is going on
> > in the machine).
> >
> > Arijit
> >
> > 2009/2/26 Anum Ali
> >
> > > If the solution given by Matei Zaharia doesn't work - which I guess
> > > it won't if you are using Eclipse 3.3.0, because this is a bug that
> > > was resolved in a later version, Eclipse 3.4 (Ganymede) - better to
> > > upgrade your Eclipse version.
> > >
> > > On 2/26/09, Matei Zaharia wrote:
> > > > These variables have to be set at runtime through a config file,
> > > > not at compile time. You can set them in hadoop-env.sh: uncomment
> > > > the line with export HADOOP_HEAPSIZE= to set the heap size for all
> > > > Hadoop processes, or change the options for specific commands. Now
> > > > these settings are for the Hadoop processes themselves; if you are
> > > > getting the error in tasks you're running, you can set the heap in
> > > > your hadoop-site.xml through the mapred.child.java.opts variable,
> > > > as follows:
> > > >
> > > > mapred.child.java.opts
> > > > -Xmx512m
> > > >
> > > > By the way, I'm not sure if -J-Xmx is the right syntax; I've always
> > > > seen -Xmx and -Xms.
> > > >
> > > > On Wed, Feb 25, 2009 at 5:05 PM, madhuri72 wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I'm trying to run Hadoop version 19 on Ubuntu with Java build
> > > >> 1.6.0_11-b03. I'm getting the following error:
> > > >>
> > > >> Error occurred during initialization of VM
> > > >> Could not reserve enough space for object heap
> > > >> Could not create the Java virtual machine.
> > > >> make: *** [run] Error 1
> > > >>
> > > >> I searched the forums and found some advice on setting the VM's
> > > >> memory via the javac options
> > > >>
> > > >> -J-Xmx512m or -J-Xms256m
> > > >>
> > > >> I have tried this with various sizes between 128 and 1024 MB,
> > > >> adding the flag when I compile the source. This isn't working for
> > > >> me, and allocating 1 GB of memory is a lot for the machine I'm
> > > >> using. Is there some way to make this work with Hadoop? Is there
> > > >> somewhere else I can set the heap memory?
> > > >>
> > > >> Thanks.
> > > >>
> > > >> --
> > > >> View this message in context:
> > > >> http://www.nabble.com/Could-not-reserve-enough-space-for-heap-in-JVM-tp22215608p22215608.html
> > > >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
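[Editor's note] Matei's mapred.child.java.opts snippet above lost its XML tags in the archive; in hadoop-site.xml the property would look roughly like this (the 512 MB value is only an example):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

This sets the heap of the spawned map/reduce task JVMs, independently of HADOOP_HEAPSIZE, which governs the Hadoop daemons themselves.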
Re: Could not reserve enough space for heap in JVM
Is your machine 32-bit or 64-bit?

- Original Message -
From: Nick Cen
To: core-user@hadoop.apache.org
Sent: Wed Feb 25 21:10:00 2009
Subject: Re: Could not reserve enough space for heap in JVM

I have a question related to the HADOOP_HEAPSIZE variable. My machine has 16 GB of memory, but when I set HADOOP_HEAPSIZE to 4 GB it threw the exception referred to in this thread. How can I make full use of my memory? Thanks.

2009/2/26 Arijit Mukherjee

> I was getting similar errors too while running the MapReduce samples. I
> fiddled with hadoop-env.sh (where the HEAPSIZE is specified) and
> hadoop-site.xml, and rectified it after some trial and error. But I would
> like to know if there is a rule of thumb for this. Right now I have a
> Core Duo machine with 2 GB of RAM running Ubuntu 8.10, and I've found
> that a HEAPSIZE of 256 MB works without any problems. Anything more than
> that gives the same error (even when nothing else is going on in the
> machine).
>
> Arijit
>
> 2009/2/26 Anum Ali
>
> > If the solution given by Matei Zaharia doesn't work - which I guess it
> > won't if you are using Eclipse 3.3.0, because this is a bug that was
> > resolved in a later version, Eclipse 3.4 (Ganymede) - better to
> > upgrade your Eclipse version.
> >
> > On 2/26/09, Matei Zaharia wrote:
> > > These variables have to be set at runtime through a config file, not
> > > at compile time. You can set them in hadoop-env.sh: uncomment the
> > > line with export HADOOP_HEAPSIZE= to set the heap size for all
> > > Hadoop processes, or change the options for specific commands. Now
> > > these settings are for the Hadoop processes themselves; if you are
> > > getting the error in tasks you're running, you can set the heap in
> > > your hadoop-site.xml through the mapred.child.java.opts variable, as
> > > follows:
> > >
> > > mapred.child.java.opts
> > > -Xmx512m
> > >
> > > By the way, I'm not sure if -J-Xmx is the right syntax; I've always
> > > seen -Xmx and -Xms.
> > >
> > > On Wed, Feb 25, 2009 at 5:05 PM, madhuri72 wrote:
> > >
> > >> Hi,
> > >>
> > >> I'm trying to run Hadoop version 19 on Ubuntu with Java build
> > >> 1.6.0_11-b03. I'm getting the following error:
> > >>
> > >> Error occurred during initialization of VM
> > >> Could not reserve enough space for object heap
> > >> Could not create the Java virtual machine.
> > >> make: *** [run] Error 1
> > >>
> > >> I searched the forums and found some advice on setting the VM's
> > >> memory via the javac options
> > >>
> > >> -J-Xmx512m or -J-Xms256m
> > >>
> > >> I have tried this with various sizes between 128 and 1024 MB, adding
> > >> the flag when I compile the source. This isn't working for me, and
> > >> allocating 1 GB of memory is a lot for the machine I'm using. Is
> > >> there some way to make this work with Hadoop? Is there somewhere
> > >> else I can set the heap memory?
> > >>
> > >> Thanks.
> > >>
> > >> --
> > >> View this message in context:
> > >> http://www.nabble.com/Could-not-reserve-enough-space-for-heap-in-JVM-tp22215608p22215608.html
> > >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
> --
> "And when the night is cloudy,
> There is still a light that shines on me,
> Shine on until tomorrow, let it be."
> -- http://daily.appspot.com/food/

CAUTION - Disclaimer *
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail.
You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS***
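[Editor's note] On Nick Cen's question (16 GB of RAM but HADOOP_HEAPSIZE set to "4GB" fails): HADOOP_HEAPSIZE is interpreted as megabytes, and the launcher script appends an "m" when building the -Xmx flag, so a literal "4GB" yields a malformed JVM option; note also that a 32-bit JVM cannot reserve anywhere near 4 GB of contiguous heap regardless of physical RAM. A minimal sketch of the relevant lines (the 4096 figure is illustrative):

```shell
# hadoop-env.sh: HADOOP_HEAPSIZE is a number of megabytes, not a "4GB" string.
export HADOOP_HEAPSIZE=4096   # 4 GB, expressed in MB

# Roughly what bin/hadoop does with it when launching a process:
JAVA_HEAP_MAX="-Xmx${HADOOP_HEAPSIZE}m"
echo "$JAVA_HEAP_MAX"         # the flag eventually passed to the JVM
```

With a value like "4GB" the constructed flag would be "-Xmx4GBm", which the JVM rejects at startup.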
RE: Any suggestion on performance improvement ?
Hi Alex,

I get 30-40 seconds of response time for around 60 MB of data. The number of map and reduce tasks is 1 each. This is because the default HDFS block size is 64 MB and Pig assigns one map task per HDFS block, which I believe is optimal. Since this is the unit of parallelism, even if I increase the number of nodes I don't think the performance would improve.

Regards,
Sourav

-Original Message-
From: Alex Loddengaard [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2008 9:44 AM
To: core-user@hadoop.apache.org
Subject: Re: Any suggestion on performance improvement ?

How big is the data that you're loading and filtering? Your cluster is pretty small, so if you have data on the magnitude of tens or hundreds of GBs, then the performance you're describing is probably to be expected.

How many map and reduce tasks are you running on each node?

Alex

On Thu, Nov 13, 2008 at 4:55 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm testing with a 4-node Hadoop HDFS setup.
>
> Each node has 2 GB of memory, a dual-core CPU, and around 30-60 GB of
> disk space.
>
> I've kept files of different sizes in HDFS, ranging from 10 MB to 5 GB.
>
> I'm querying those files using Pig. What I'm seeing is that even a simple
> select query (LOAD and FILTER) takes at least 30-40 seconds. The map
> process on one node takes at least 25 seconds.
>
> I've kept the JVM max heap size at 1024m.
>
> Any suggestion on how to improve the performance with different
> configuration at the Hadoop level (by changing HDFS and MapReduce
> parameters)?
>
> Regards,
> Sourav
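[Editor's note] A follow-up on the one-map-task bottleneck above (an assumption about what would help, not something tested on this cluster): with a 64 MB block size, a ~60 MB input can never yield more than one split, so extra nodes sit idle for that query. Files written with a smaller block size produce more splits and therefore more map tasks, e.g. in hadoop-site.xml on the client that writes the files:

```xml
<property>
  <name>dfs.block.size</name>
  <value>16777216</value> <!-- 16 MB, in bytes; affects only files written after the change -->
</property>
```

A ~60 MB file stored this way would span four blocks, giving Pig four map tasks to spread across the nodes.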
Any suggestion on performance improvement ?
Hi,

I'm testing with a 4-node Hadoop HDFS setup.

Each node has 2 GB of memory, a dual-core CPU, and around 30-60 GB of disk space.

I've kept files of different sizes in HDFS, ranging from 10 MB to 5 GB.

I'm querying those files using Pig. What I'm seeing is that even a simple select query (LOAD and FILTER) takes at least 30-40 seconds. The map process on one node takes at least 25 seconds.

I've kept the JVM max heap size at 1024m.

Any suggestion on how to improve the performance with different configuration at the Hadoop level (by changing HDFS and MapReduce parameters)?

Regards,
Sourav
RE: Need help in hdfs configuration fully distributed way in Mac OSX...
Hi Mafish,

Thanks for your suggestions. I could finally resolve the issue. The hadoop-site.xml on the namenode had fs.default.name set to localhost, whereas on the datanodes it was the actual IP. I changed localhost to the actual IP on the namenode and it started working.

Regards,
Sourav

-Original Message-
From: Mafish Liu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 16, 2008 7:37 PM
To: core-user@hadoop.apache.org
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi, souravm:

I don't know exactly what's wrong with your configuration from your post; my guesses at the possible causes are:

1. Make sure the firewall on the namenode is off, or that port 9000 is open in your firewall configuration.
2. The namenode itself. Check the namenode startup log to see if the namenode starts up correctly, or try running 'jps' on your namenode to see if there is a process called "NameNode".

Hope this helps.

On Tue, Sep 16, 2008 at 10:41 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi,
>
> The namenode in machine 1 has started. I can see the following log. Is
> there a specific way to provide the master name in the masters file (in
> hadoop/conf) on the datanode?
I've currently specified > > 2008-09-16 07:23:46,321 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: > Initializing RPC Metrics with hostName=NameNode, port=9000 > 2008-09-16 07:23:46,325 INFO org.apache.hadoop.dfs.NameNode: Namenode up > at: localhost/127.0.0.1:9000 > 2008-09-16 07:23:46,327 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: > Initializing JVM Metrics with processName=NameNode, sessionId=null > 2008-09-16 07:23:46,329 INFO org.apache.hadoop.dfs.NameNodeMetrics: > Initializing NameNodeMeterics using context > object:org.apache.hadoop.metrics.spi.NullContext > 2008-09-16 07:23:46,404 INFO org.apache.hadoop.fs.FSNamesystem: > fsOwner=souravm,souravm,_lpadmin,_appserveradm,_appserverusr,admin > 2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: > supergroup=supergroup > 2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: > isPermissionEnabled=true > 2008-09-16 07:23:46,473 INFO org.apache.hadoop.fs.FSNamesystem: Finished > loading FSImage in 112 msecs > 2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* > Leaving safe mode after 0 secs. 
> 2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* > Network topology has 0 racks and 0 datanodes > 2008-09-16 07:23:46,480 INFO org.apache.hadoop.dfs.StateChange: STATE* > UnderReplicatedBlocks has 0 blocks > 2008-09-16 07:23:46,486 INFO org.apache.hadoop.fs.FSNamesystem: Registered > FSNamesystemStatusMBean > 2008-09-16 07:23:46,561 INFO org.mortbay.util.Credential: Checking Resource > aliases > 2008-09-16 07:23:46,627 INFO org.mortbay.http.HttpServer: Version > Jetty/5.1.4 > 2008-09-16 07:23:46,907 INFO org.mortbay.util.Container: Started > [EMAIL PROTECTED] > 2008-09-16 07:23:46,937 INFO org.mortbay.util.Container: Started > WebApplicationContext[/,/] > 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started > HttpContext[/logs,/logs] > 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started > HttpContext[/static,/static] > 2008-09-16 07:23:46,939 INFO org.mortbay.http.SocketListener: Started > SocketListener on 0.0.0.0:50070 > 2008-09-16 07:23:46,939 INFO org.mortbay.util.Container: Started > [EMAIL PROTECTED] > 2008-09-16 07:23:46,940 INFO org.apache.hadoop.fs.FSNamesystem: Web-server > up at: 0.0.0.0:50070 > 2008-09-16 07:23:46,940 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2008-09-16 07:23:46,942 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 9000: starting > 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 0 on 9000: starting > 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 1 on 9000: starting > 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 2 on 9000: starting > 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 3 on 9000: starting > 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 4 on 9000: starting > 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 5 on 9000: starting > 2008-09-16 
07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 7 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 9000: starting
> 2008-09-16 07:23:46,944 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 9000: starting
>
> Is there a specific way to provide the master name in the masters file
> (in hadoop/conf) on the datanode?
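[Editor's note] The fix Sourav describes (fs.default.name pointing at localhost on the namenode but at the real IP on the datanodes) amounts to making the value identical everywhere. With 192.168.1.102 standing in for the namenode's actual address, as taken from the datanode log elsewhere in this thread:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.102:9000</value> <!-- same value on the namenode and every datanode -->
</property>
```

With localhost here, the namenode binds its RPC port to 127.0.0.1 only (the log above shows "Namenode up at: localhost/127.0.0.1:9000"), which is why remote datanodes can never reach it.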
Remote datanode (fully distributed way) is not starting in Mac OSX...
Hi. Any pointer on what could be the problem?

Regards,
Sourav

From: souravm
Sent: Tuesday, September 16, 2008 1:07 AM
To: 'core-user@hadoop.apache.org'
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi,

I tried the way you suggested. I set up ssh without a password, so now the namenode can connect to the datanode without a password; the start-dfs.sh script does not ask for one. However, even with this fix I still face the same problem.

Regards,
Sourav

- Original Message -
From: Mafish Liu <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Mon Sep 15 23:26:10 2008
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi:

You need to configure your nodes to ensure that node 1 can connect to node 2 without a password.

On Tue, Sep 16, 2008 at 2:04 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm facing a problem configuring HDFS in a fully distributed way on Mac
> OS X.
>
> Here is the topology -
>
> 1. The namenode is on machine 1
> 2. There is one datanode on machine 2
>
> Now when I execute start-dfs.sh from machine 1, it connects to machine 2
> (after it asks for the password for connecting to machine 2) and starts
> the datanode on machine 2 (as the console message says).
>
> However -
> 1. When I go to http://machine1:50070 it does not show the datanode at
> all. It says 0 datanodes configured.
> 2. In the log file on machine 2 what I see is -
> /
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG: host = rc0902b-dhcp169.apple.com/17.229.22.169
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.17.2.1
> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
> /
> 2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 1 time(s).
> 2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000.
Already tried 2 time(s).
> 2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 3 time(s).
> 2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 4 time(s).
> 2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 5 time(s).
> 2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 6 time(s).
> 2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 7 time(s).
> 2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 8 time(s).
> 2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 9 time(s).
> 2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 10 time(s).
> 2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /17.229.23.77:9000 not available yet, Z...
>
> ... and this retrying keeps repeating
>
> The hadoop-site.xmls are like this -
>
> 1. In machine 1 -
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://localhost:9000</value>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/Users/souravm/hdpn</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>localhost:9001</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
>
> 2. In machine 2 -
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://<machine 1 ip>:9000</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/Users/nirdosh/hdfsd1</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
>
> The slaves file in machine 1 has a single entry - <user>@<machine2>
>
> The exact steps I did -
>
> 1. Reformat the namenode in machine 1
> 2. Execute start-dfs.sh in machine 1
> 3. Then I try to see whether the datanode is created through http://<machine 1 ip>:50070
>
> Any pointer to resolve this issue would be appreciated.
>
> Regards,
> Sourav
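[Editor's note] When a datanode loops on "Retrying connect to server", the first thing worth checking is whether the namenode's RPC port is reachable from the datanode at all (a firewall, or a namenode bound only to 127.0.0.1). A small bash sketch using the /dev/tcp pseudo-device; the host and port below are placeholders to substitute with the values from your own log:

```shell
#!/usr/bin/env bash
# Prints "open" if a TCP connection to host:port succeeds, "closed" otherwise.
check_port() {
  local host=$1 port=$2
  # bash-only feature: redirecting to /dev/tcp/host/port opens a TCP socket
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# On the datanode, probe the namenode's RPC port (placeholder address):
check_port 127.0.0.1 9000
```

If this prints "closed" from the datanode but "open" on the namenode itself, the namenode is listening on the loopback interface only, which matches the fs.default.name-set-to-localhost problem discussed in this thread.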
RE: Need help in hdfs configuration fully distributed way in Mac OSX...
Hi,

The namenode in machine 1 has started. I can see the following log. Is there a specific way to provide the master name in the masters file (in hadoop/conf) on the datanode? I've currently specified

2008-09-16 07:23:46,321 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2008-09-16 07:23:46,325 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: localhost/127.0.0.1:9000
2008-09-16 07:23:46,327 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2008-09-16 07:23:46,329 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2008-09-16 07:23:46,404 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=souravm,souravm,_lpadmin,_appserveradm,_appserverusr,admin
2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-09-16 07:23:46,473 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in 112 msecs
2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* Leaving safe mode after 0 secs.
2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes 2008-09-16 07:23:46,480 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks 2008-09-16 07:23:46,486 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean 2008-09-16 07:23:46,561 INFO org.mortbay.util.Credential: Checking Resource aliases 2008-09-16 07:23:46,627 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4 2008-09-16 07:23:46,907 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED] 2008-09-16 07:23:46,937 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/] 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs] 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started HttpContext[/static,/static] 2008-09-16 07:23:46,939 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070 2008-09-16 07:23:46,939 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED] 2008-09-16 07:23:46,940 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070 2008-09-16 07:23:46,940 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2008-09-16 07:23:46,942 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000: starting 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000: starting 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000: starting 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000: starting 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000: starting 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000: starting 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000: starting 
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000: starting
2008-09-16 07:23:46,944 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000: starting

Is there a specific way to provide the master name in the masters file (in hadoop/conf) on the datanode? I've currently specified @. I'm thinking there might be a problem, as in the datanode's log file I can see the message '2008-09-16 14:38:51,501 INFO org.apache.hadoop.ipc.RPC: Server at /192.168.1.102:9000 not available yet, Z...'

Any help?

Regards,
Sourav

From: Samuel Guo [EMAIL PROTECTED]
Sent: Tuesday, September 16, 2008 5:49 AM
To: core-user@hadoop.apache.org
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Check the namenode's log on machine1 to see if your namenode started successfully :)

On Tue, Sep 16, 2008 at 2:04 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm facing a problem configuring HDFS in a fully distributed way on Mac
> OS X.
>
> Here is the topology -
>
> 1. The namenode is on machine 1
> 2. There is one datanode on machine 2
>
> Now when I execute start-dfs.sh from machine 1, it connects to machine 2
> (after it asks for the password for connecting to machine 2) and starts
> the datanode on machine 2 (as the console message says).
>
> However -
> 1. When I go to http://machine1:50070 it does not show the datanode at
> all. It says 0 datanodes configured.
> 2. In the log file on machine 2 what I see is -
Re: Need help in hdfs configuration fully distributed way in Mac OSX...
Hi,

I tried the way you suggested. I set up ssh without a password, so now the namenode can connect to the datanode without a password; the start-dfs.sh script does not ask for one. However, even with this fix I still face the same problem.

Regards,
Sourav

- Original Message -
From: Mafish Liu <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Mon Sep 15 23:26:10 2008
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi:

You need to configure your nodes to ensure that node 1 can connect to node 2 without a password.

On Tue, Sep 16, 2008 at 2:04 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm facing a problem configuring HDFS in a fully distributed way on Mac
> OS X.
>
> Here is the topology -
>
> 1. The namenode is on machine 1
> 2. There is one datanode on machine 2
>
> Now when I execute start-dfs.sh from machine 1, it connects to machine 2
> (after it asks for the password for connecting to machine 2) and starts
> the datanode on machine 2 (as the console message says).
>
> However -
> 1. When I go to http://machine1:50070 it does not show the datanode at
> all. It says 0 datanodes configured.
> 2. In the log file on machine 2 what I see is -
> /
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG: host = rc0902b-dhcp169.apple.com/17.229.22.169
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.17.2.1
> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
> /
> 2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 1 time(s).
> 2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 2 time(s).
> 2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 3 time(s).
> 2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 4 time(s).
> 2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 5 time(s).
> 2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 6 time(s).
> 2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 7 time(s).
> 2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 8 time(s).
> 2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 9 time(s).
> 2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 10 time(s).
> 2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /17.229.23.77:9000 not available yet, Z...
>
> ... and this retrying keeps repeating
>
> The hadoop-site.xmls are like this -
>
> 1. In machine 1 -
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://localhost:9000</value>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/Users/souravm/hdpn</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>localhost:9001</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
>
> 2. In machine 2 -
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://<machine 1 ip>:9000</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/Users/nirdosh/hdfsd1</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
>
> The slaves file in machine 1 has a single entry - <user>@<machine2>
>
> The exact steps I did -
>
> 1. Reformat the namenode in machine 1
> 2. Execute start-dfs.sh in machine 1
> 3. Then I try to see whether the datanode is created through http://<machine 1 ip>:50070
>
> Any pointer to resolve this issue would be appreciated.
>
> Regards,
> Sourav
Need help in hdfs configuration fully distributed way in Mac OSX...
Hi All,

I'm facing a problem configuring HDFS in a fully distributed way on Mac OS X. Here is the topology -

1. The namenode is on machine 1
2. There is 1 datanode on machine 2

Now when I execute start-dfs.sh from machine 1, it connects to machine 2 (after asking for the password for machine 2) and starts the datanode there (as the console message says). However -

1. When I go to http://machine1:50070 it does not show the datanode at all - it says 0 datanodes configured.
2. In the log file on machine 2 what I see is -

STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = rc0902b-dhcp169.apple.com/17.229.22.169
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.17.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008

2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 1 time(s).
2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 2 time(s).
...
2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 10 time(s).
2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /17.229.23.77:9000 not available yet ...

... and this retrying keeps repeating.

The hadoop-site.xml files are like this -

1. In machine 1 -
   fs.default.name = hdfs://localhost:9000
   dfs.name.dir = /Users/souravm/hdpn
   mapred.job.tracker = localhost:9001
   dfs.replication = 1

2. In machine 2 -
   fs.default.name = hdfs://:9000
   dfs.data.dir = /Users/nirdosh/hdfsd1
   dfs.replication = 1

The slaves file in machine 1 has a single entry - @

The exact steps I did -
1. Reformat the namenode on machine 1
2. Execute start-dfs.sh on machine 1
3. Try to see whether the datanode appears at http://:50070

Any pointer to resolve this issue would be appreciated.

Regards,
Sourav

CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS***
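[Editor's note: the datanode's retries against /17.229.23.77:9000 while machine 1's config names "localhost" suggest the usual fix - fs.default.name must carry the namenode's real hostname, reachable from every datanode, on both machines. A minimal sketch of the relevant hadoop-site.xml properties (0.17-era format); "namenode.example.com" is a placeholder, not taken from the thread:]

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Use the namenode's actual hostname or IP, not "localhost":
       "localhost" in fs.default.name makes the namenode bind to the
       loopback interface, so remote datanodes can never reach it.
       "namenode.example.com" below is a placeholder. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <!-- With a single datanode, replication above 1 cannot be satisfied -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

The same fs.default.name value should appear in the hadoop-site.xml on every machine in the cluster.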
Accessing input files from different servers
Hi,

I would like to process a set of log files (say web server access logs) from a number of different machines, so I need to get those log files from the respective machines into my central HDFS. To achieve this -

a) Do I need to install Hadoop and start running HDFS (using start-dfs.sh) on all those machines where the log files are getting created, and then do a file get from the central HDFS server?
b) Is there any other way to achieve this?

Regards,
Sourav
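[Editor's note: a common answer to (b) - the log-producing machines do not need to run HDFS daemons; they only need a Hadoop client install whose config points at the central namenode, and then they push files in with the DFS shell. A sketch under those assumptions; hostnames and paths below are placeholders, not taken from the thread:]

```shell
# Run on each web server. Assumes a Hadoop 0.17-era client install
# whose hadoop-site.xml sets fs.default.name to the central namenode.
# No datanode/namenode daemons need to run on this machine.
bin/hadoop dfs -put /var/log/apache2/access.log /logs/$(hostname)-access.log

# Alternative if installing Hadoop on the web servers is not an option:
# copy each log to a gateway machine that has Hadoop, then load from there.
scp /var/log/apache2/access.log gateway:/tmp/access.log
ssh gateway 'bin/hadoop dfs -put /tmp/access.log /logs/'
```

Prefixing the target filename with $(hostname) keeps logs from different machines from overwriting each other in HDFS.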
RE: Why can't Hadoop be used for online applications ?
Thanks Ryan for your inputs.

Regards,
Sourav

From: Ryan LeCompte [EMAIL PROTECTED]
Sent: Friday, September 12, 2008 11:55 AM
To: core-user@hadoop.apache.org
Subject: Re: Why can't Hadoop be used for online applications ?

Hadoop is best suited for distributed processing of large data sets across many machines, and most people use it to plow through large data sets in an offline fashion. One approach you can take is to use Hadoop to process your data, then store the results in an optimized form in HBase (similar to Google's Bigtable). You can then query HBase in an online-access fashion. Refer to http://hadoop.apache.org/hbase/ for more information about HBase.

Ryan

On Fri, Sep 12, 2008 at 2:46 PM, souravm <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Here is a basic doubt.
>
> I found it mentioned in various documentation that Hadoop is not
> recommended for online applications. Can anyone please elaborate on the same?
>
> Regards,
> Sourav
Why can't Hadoop be used for online applications ?
Hi,

Here is a basic doubt.

I found it mentioned in various documentation that Hadoop is not recommended for online applications. Can anyone please elaborate on the same?

Regards,
Sourav