Re: Error in Cluster Startup: NameNode is not formatted
The property "dfs.name.dir" allows you to control where Hadoop writes NameNode metadata. You should have a property like dfs.name.dir /data/zhang/hadoop/name/data to make sure the NameNode data isn't being deleted when you delete the files in /tmp. -Matt On Jun 26, 2009, at 2:33 PM, Boyu Zhang wrote: Matt, Thanks a lot for your reply! I did formatted the namenode. But I got the same error again. And actually I successfully run the example jar file once, but after that one time, I couldn't get it run again. I clean the / tmp dir every time before I format namenode again(I am just testing it, so I don't worry about losing data:). Still, I got the same error when I execute the bin/start-dfs.sh . I checked my conf, and I can't figure out why. Here is my conf file: I really appreciate if you could take a look at it. Thanks a lot. fs.default.name hdfs://hostname1:9000 mapred.job.tracker hostname2:9001 dfs.data.dir /data/zhang/hadoop/dfs/data Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. mapred.local.dir /data/zhang/hadoop/mapred/local The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored. -----Original Message- From: Matt Massie [mailto:m...@cloudera.com] Sent: Friday, June 26, 2009 4:31 PM To: core-user@hadoop.apache.org Subject: Re: Error in Cluster Startup: NameNode is not formatted Boyu- You didn't do anything stupid. I've forgotten to format a NameNode too myself. If you check the QuickStart guide at http://hadoop.apache.org/core/docs/current/quickstart.html you'll see that formatting the NameNode is the first of the Execution section (near the bottom of the page). The command to format the NameNode is: hadoop namenode -format A warning though, you should only format your NameNode once. Just like formatting any filesystem, you can loss data if you (re)format. Good luck. -Matt On Jun 26, 2009, at 1:25 PM, Boyu Zhang wrote: Hi all, I am a student and I am trying to install the Hadoop on a cluster, I have one machine running namenode, one running jobtracker, two slaves. When I run the /bin/start-dfs.sh , there is something wrong with my namenode, it won't start. Here is the error message in the log file: ERROR org.apache.hadoop.fs.FSNamesystem: FSNamesystem initialization failed. java.io.IOException: NameNode is not formatted. at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:243) at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80) at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294) at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:273) at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148) at org.apache.hadoop.dfs.NameNode.(NameNode.java:193) at org.apache.hadoop.dfs.NameNode.(NameNode.java:179) at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830) at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839) I think it is something stupid i did, could somebody help me out? Thanks a lot! Sincerely, Boyu Zhang
Re: Error in Cluster Startup: NameNode is not formatted
Boyu-

You didn't do anything stupid. I've forgotten to format a NameNode myself, too. If you check the QuickStart guide at http://hadoop.apache.org/core/docs/current/quickstart.html you'll see that formatting the NameNode is the first step of the Execution section (near the bottom of the page). The command to format the NameNode is:

  hadoop namenode -format

A warning though: you should only format your NameNode once. Just like formatting any filesystem, you can lose data if you (re)format.

Good luck.

-Matt

On Jun 26, 2009, at 1:25 PM, Boyu Zhang wrote:

Hi all,

I am a student trying to install Hadoop on a cluster. I have one machine running the NameNode, one running the JobTracker, and two slaves. When I run bin/start-dfs.sh, something is wrong with my NameNode: it won't start. Here is the error message in the log file:

  ERROR org.apache.hadoop.fs.FSNamesystem: FSNamesystem initialization failed.
  java.io.IOException: NameNode is not formatted.
          at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:243)
          at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
          at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
          at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
          at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
          at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
          at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
          at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
          at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)

I think it is something stupid I did; could somebody help me out? Thanks a lot!

Sincerely,
Boyu Zhang
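A minimal first-run sequence along the lines of the QuickStart, assuming the standard bin/ layout of that era (a sketch, not taken verbatim from the guide):

  # run once, on the NameNode host only; reformatting wipes existing HDFS metadata
  bin/hadoop namenode -format

  # then bring up HDFS and check the NameNode log if it still fails to start
  bin/start-dfs.sh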
Re: UnknownHostException
fs.default.name in your hadoop-site.xml needs to be set to a fully-qualified domain name (instead of an IP address).

-Matt

On Jun 23, 2009, at 6:42 AM, bharath vissapragada wrote:

When I try to execute the command bin/start-dfs.sh, I get the following error. I have checked the hadoop-site.xml file on all the nodes, and they are fine. Can someone help me out?

  10.2.24.21: Exception in thread "main" java.net.UnknownHostException: unknown host: 10.2.24.21.
  10.2.24.21:         at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
  10.2.24.21:         at org.apache.hadoop.ipc.Client.getConnection(Client.java:779)
  10.2.24.21:         at org.apache.hadoop.ipc.Client.call(Client.java:704)
  10.2.24.21:         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
  10.2.24.21:         at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
  10.2.24.21:         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
  10.2.24.21:         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
  10.2.24.21:         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
  10.2.24.21:         at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
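Concretely, the fix is to put a resolvable hostname rather than the raw IP into hadoop-site.xml, along these lines (the hostname below is a placeholder):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>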
Re: HDFS out of space
Pankil-

I'd be interested to know the size of the /mnt and /mnt2 partitions. Are they the same? Can you run the following and report the output?

  % df -h /mnt /mnt2

Thanks.

-Matt

On Jun 22, 2009, at 1:32 PM, Pankil Doshi wrote:

Hey Alex,

Will the Hadoop balancer utility work in this case?

Pankil

On Mon, Jun 22, 2009 at 4:30 PM, Alex Loddengaard wrote:

Are you seeing any exceptions because of the disk being at 99% capacity? Hadoop should do something sane here and write new data to the disk with more capacity. That said, it is ideal to be balanced. As far as I know, there is no way to balance an individual DataNode's hard drives (Hadoop does round-robin scheduling when writing data).

Alex

On Mon, Jun 22, 2009 at 10:12 AM, Kris Jirapinyo wrote:

Hi all,

How does one handle a mount running out of space for HDFS? We have two disks mounted on /mnt and /mnt2 respectively on one of the machines used for HDFS, and /mnt is at 99% while /mnt2 is at 30%. Is there a way to tell the machine to balance itself out? I know you can balance the cluster using start-balancer.sh, but I don't think that will tell the individual machine to balance itself out. Our "hack" right now would be to just delete the data on /mnt; since we have 3x replication, we should be OK. But I'd prefer not to do that. Any thoughts?
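For completeness, the cluster-wide balancer mentioned above (which, as Alex notes, balances across DataNodes rather than across the disks inside one node) is normally started roughly like this (sketch; the threshold is an example value):

  # rebalance block placement until every DataNode is within 5% of the
  # cluster-wide average utilization
  bin/start-balancer.sh -threshold 5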
Re: Need help
Hadoop can be run on a hardware-heterogeneous cluster. Currently, Hadoop clusters really only run well on Linux, although you can run a Hadoop client on non-Linux machines.

You will need a specific configuration for each of the machines in your cluster based on its hardware profile. Ideally, you'll be able to group the machines in your cluster into "classes" of machines (e.g. machines with 1GB of RAM and 2 cores versus 4GB of RAM and 4 cores) to reduce the burden of managing multiple configurations. If you are talking about a Hadoop cluster that is completely heterogeneous (each machine is completely different), the management overhead could be high.

Configuration variables like "mapred.tasktracker.map.tasks.maximum" and "mapred.tasktracker.reduce.tasks.maximum" should be set based on the number of cores and the memory in each machine. Variables like "mapred.child.java.opts" need to be set differently based on the amount of memory the machine has (e.g. "-Xmx250m"). You should have at least 250MB of memory dedicated to each task, although more is better. It's also wise to make sure that each task has the same amount of memory regardless of the machine it's scheduled on; otherwise, tasks might succeed or fail based on which machine gets the task. This asymmetry will make debugging harder.

You can use our online configurator (http://www.cloudera.com/configurator/) to generate optimized configurations for each class of machines in your cluster. It will ask simple questions about your configuration and then produce a hadoop-site.xml file.

Good luck!

-Matt

On Jun 18, 2009, at 8:33 AM, ashish pareek wrote:

Can you tell me a few of the challenges in configuring a heterogeneous cluster, or pass on some link where I could get information regarding the challenges of running Hadoop on heterogeneous hardware? One more thing: how about running different applications on the same Hadoop cluster, and what challenges are involved in that?

Thanks,
Regards,
Ashish

On Thu, Jun 18, 2009 at 8:53 PM, jason hadoop wrote:

I don't know anyone who has a completely homogeneous cluster, so Hadoop is scalable across heterogeneous environments. I stated that configuration is simpler if the machines are similar (there are optimizations in configuration for near-homogeneous machines).

On Thu, Jun 18, 2009 at 8:10 AM, ashish pareek wrote:

Does that mean Hadoop is not scalable with respect to heterogeneous environments? And one more question: can we run different applications on the same Hadoop cluster?

Thanks.
Regards,
Ashish

On Thu, Jun 18, 2009 at 8:30 PM, jason hadoop wrote:

Hadoop has always been reasonably agnostic with respect to hardware and homogeneity. There are optimizations in configuration for near-homogeneous machines.

On Thu, Jun 18, 2009 at 7:46 AM, ashish pareek wrote:

Hello,

I am doing my master's, and my final-year project is on Hadoop, so I would like to know something about Hadoop clusters, i.e. whether new versions of Hadoop are able to handle heterogeneous hardware. If you have any information regarding this, please mail me, as my project is in a heterogeneous environment.

Thanks!
Regards,
Ashish Pareek

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
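As an illustration of the per-class settings discussed at the top of this thread, a hadoop-site.xml for a hypothetical 4-core/4GB class of TaskTracker machines might contain entries like these (the values are assumptions chosen for that class, not recommendations taken from the thread):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx250m</value>
  </property>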
Re: Small Issues..!
On Jun 14, 2009, at 11:01 PM, Sugandha Naolekar wrote:

> Hello!
>
> I have a 4-node cluster of Hadoop running. Now there is a 5th machine which is acting as a client of Hadoop. It's not a part of the Hadoop cluster (master/slave config files). Now I have to write Java code that gets executed on this client, which will simply put the client system's data into HDFS (and get it replicated over 2 datanodes), and, as per my requirement, I can simply fetch it back onto the client machine itself. For this, I have done the following things as of now:
>
> -> Among the 4 nodes, 2 are datanodes and the other 2 are the namenode and jobtracker respectively.
>
> -> Now, to make that code work on the client machine, I have designed a UI. Here on the client m/c, do I need to install Hadoop?

You will need to have the same version of Hadoop installed on any client that needs to communicate with the Hadoop cluster.

> -> I have installed Hadoop on it, and in its config file I have specified only 2 tags: 1) fs.default.name -> value = namenode's address. 2) dfs.http.address (namenode's address).

I'm assuming you mean that you have Hadoop installed on the client with a hadoop-site.xml (or core-site.xml) with the correct fs.default.name. Correct?

> Thus, if there is a file /home/hadoop/test.java on the client machine, I will have to get an instance of the HDFS filesystem via FileSystem.get, right?

Before you begin writing special FileSystem Java code, I would do a quick sanity check of the client configuration. Can you run the command...

  % bin/hadoop fs -ls

...without error? Can you -put files onto HDFS from the client...

  % bin/hadoop fs -put

...without error?

* You should also check your firewall rules between the client and NameNode.
* Make sure that the TCP port you specified in fs.default.name is open for connections from the client.
* Run "netstat -t -l" to make sure that the NameNode is running and listening on the TCP port you specified.

Only when you've ensured that the hadoop command line works would I begin writing custom client code based on the FileSystem class.

> Then, by using the FileSystem utilities, I will have to simply specify the local fs as the source and HDFS as the destination, with the source path /home/hadoop/test.java and the destination /user/hadoop/, right? So it should work...!
>
> -> But it gives me an error: "not able to find src path /home/hadoop/test.java". Will I have to use the RPC classes and methods under the Hadoop API to do this?

You should be able to just use the FileSystem class, without needing to use any RPC classes.

FileSystem documentation:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html

> Things don't seem to be working in any of the ways. Please help me out.

Thanks!
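A minimal sketch of the client-side copy being discussed, using the FileSystem API (untested; the source and destination paths are the ones from the thread, and the class name is made up for the example):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class PutToHdfs {
    public static void main(String[] args) throws Exception {
      // Picks up fs.default.name from the hadoop-site.xml on the client's classpath
      Configuration conf = new Configuration();
      FileSystem hdfs = FileSystem.get(conf);

      Path src = new Path("/home/hadoop/test.java"); // local file on the client
      Path dst = new Path("/user/hadoop/test.java"); // destination in HDFS

      // Reads from the local filesystem and writes the file into HDFS
      hdfs.copyFromLocalFile(src, dst);
    }
  }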
Re: Multiple NIC Cards
If you look at the documentation for the getCanonicalHostName() function (thanks, Steve)...

  http://java.sun.com/javase/6/docs/api/java/net/InetAddress.html#getCanonicalHostName()

...you'll see two Java security properties (networkaddress.cache.ttl and networkaddress.cache.negative.ttl). You might take a look at your /etc/nsswitch.conf configuration as well to learn how hosts are resolved on your machine, e.g...

  $ grep hosts /etc/nsswitch.conf
  hosts: files dns

...and lastly, you may want to check whether you are running nscd (the name service cache daemon). If you are, take a look at /etc/nscd.conf for the caching policy it's using.

Good luck.

-Matt

On Jun 10, 2009, at 1:09 PM, John Martyniak wrote:

That is what I thought also: it needs to keep that information somewhere, because it needs to be able to communicate with all of the servers. So I deleted the /tmp/had* and /tmp/hs* directories, removed the log files, and grepped for the duey name in all files in config. And the problem still exists. Originally I thought that it might have had something to do with multiple entries in the .ssh/authorized_keys file, but I removed everything there and the problem still existed. So I think that I am going to grab a new install of hadoop 0.19.1, delete the existing one and start out fresh to see if that changes anything. Wish me luck :)

-John

On Jun 10, 2009, at 12:30 PM, Steve Loughran wrote:

John Martyniak wrote:

Does hadoop "cache" the server names anywhere? Because I changed to using DNS for name resolution, but when I go to the nodes view, it is trying to view with the old name. And I changed the hadoop-site.xml file so that it no longer has any of those values.

In SVN head, we try and get Java to tell us what is going on:

  http://svn.apache.org/viewvc/hadoop/core/trunk/src/core/org/apache/hadoop/net/DNS.java

This uses InetAddress.getLocalHost().getCanonicalHostName() to get the value, which is cached for the life of the process. I don't know of anything else, but wouldn't be surprised; the NameNode has to remember the machines where stuff was stored.

John Martyniak
President/CEO
Before Dawn Solutions, Inc.
9457 S. University Blvd #266
Highlands Ranch, CO 80126
o: 877-499-1562
c: 303-522-1756
e: j...@beforedawnsoutions.com
w: http://www.beforedawnsolutions.com
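A quick way to see which canonical name Java itself resolves (the same call Hadoop's DNS helper relies on, per Steve's note) is a tiny probe like this (sketch; the class name is just for illustration):

  import java.net.InetAddress;

  public class WhoAmI {
    public static void main(String[] args) throws Exception {
      // The result reflects /etc/nsswitch.conf, DNS and any nscd caching in effect
      System.out.println(InetAddress.getLocalHost().getCanonicalHostName());
    }
  }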
Re: Monitoring hadoop?
Anthony-

The Ganglia web site is at http://ganglia.info/ with documentation in a wiki at http://ganglia.wiki.sourceforge.net/. There is also a good wiki page at IBM: http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia. Ganglia packages are available for most distributions to help with installation, so make sure to grep for ganglia with your favorite package manager (e.g. aptitude, yum, etc).

Ganglia will give you more information about your cluster than just Hadoop metrics. You'll also get CPU, load, memory, disk and network monitoring for free. You can see live demos of Ganglia at http://ganglia.info/?page_id=69.

Good luck.

-Matt

On Jun 5, 2009, at 7:10 AM, Brian Bockelman wrote:

Hey Anthony,

Look into hooking your Hadoop system into Ganglia; this produces about 20 real-time statistics per node. Hadoop also does JMX, which hooks into more "enterprise"-y monitoring systems.

Brian

On Jun 5, 2009, at 8:55 AM, Anthony McCulley wrote:

Hey all,

I'm currently tasked with coming up with a web/Flex-based visualization/monitoring system for a cloud system using Hadoop, as part of a university research project. I was wondering if I could elicit some feedback from all of you with regards to:

- If you were an engineer of a cloud system running Hadoop, what information would you be interested in capturing, viewing, monitoring, etc.?
- Is there any sort of real-time stats or monitoring currently available for Hadoop? If so, is it in a web-friendly format?

Thanks in advance,

- Anthony
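For anyone wiring this up, the Hadoop side of the Ganglia hookup Brian mentions was typically configured in conf/hadoop-metrics.properties along these lines (sketch for the 0.18/0.19-era metrics framework; the gmond host and port are placeholders):

  # send DFS and MapReduce metrics to a Ganglia gmond every 10 seconds
  dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
  dfs.period=10
  dfs.servers=gmond-host.example.com:8649

  mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
  mapred.period=10
  mapred.servers=gmond-host.example.com:8649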
Re: Fastlz coming?
Kris-

You might take a look at some of the previous lzo threads on this list for help. See:

  http://www.mail-archive.com/search?q=lzo&l=core-user%40hadoop.apache.org

-Matt

On Jun 4, 2009, at 10:29 AM, Kris Jirapinyo wrote:

Is there any documentation on that site on how we can use lzo? I don't see any entries on the wiki page of the project. I see an entry on the Hadoop wiki (http://wiki.apache.org/hadoop/UsingLzoCompression), but it seems like that's more oriented towards HBase. I am on hadoop 0.19.1.

Thanks,
Kris J.

On Thu, Jun 4, 2009 at 3:02 AM, Johan Oskarsson wrote:

We're still using lzo; it works great for those big log files:

  http://code.google.com/p/hadoop-gpl-compression/

/Johan

Kris Jirapinyo wrote:

Hi all,

In the remove-lzo JIRA ticket, https://issues.apache.org/jira/browse/HADOOP-4874, Tatu mentioned he was going to port FastLZ from C to Java and provide a patch. Have there been any updates on that? Or is anyone working on any additional custom compression codecs?

Thanks,
Kris J.
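For readers following the hadoop-gpl-compression link above, registering the LZO codecs usually came down to a hadoop-site.xml entry roughly like this, in addition to installing the project's jar and native libraries (sketch; check the class names against the version you install):

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
  </property>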
Re: No route to host prevents from storing files to HDFS
Just for clarity: are you using any type of virtualization (e.g. VMware, Xen), or just running the DataNode Java process on the same machine? What is "fs.default.name" set to in your hadoop-site.xml?

-Matt

On Wed, Apr 22, 2009 at 5:22 PM, Stas Oskin wrote:

> Hi.
>
> > Is it possible to paste the output from the following command on both your DataNode and NameNode?
> >
> >   % route -v -n
>
> Sure, here it is:
>
>   Kernel IP routing table
>   Destination      Gateway        Genmask         Flags  Metric  Ref  Use  Iface
>   192.168.253.0    0.0.0.0        255.255.255.0   U      0       0    0    eth0
>   169.254.0.0      0.0.0.0        255.255.0.0     U      0       0    0    eth0
>   0.0.0.0          192.168.253.1  0.0.0.0         UG     0       0    0    eth0
>
> As you might recall, the problematic DataNode runs on the same server as the NameNode.
>
> Regards.
Re: No route to host prevents from storing files to HDFS
Stas-

Is it possible to paste the output from the following command on both your DataNode and NameNode?

  % route -v -n

-Matt

On Wed, Apr 22, 2009 at 4:36 PM, Stas Oskin wrote:

> Hi.
>
> > The way to diagnose this explicitly is:
> >
> > 1) On the server machine that should be accepting connections on the port, telnet localhost PORT and telnet IP PORT. You should get a connection; if not, then the server is not binding the port.
> > 2) On the remote machine, verify that you can communicate with the server machine via normal tools such as ssh and/or ping and/or traceroute, using the IP address from the error message in your log file.
> > 3) On the remote machine, run telnet IP PORT. If (1) and (2) succeeded and (3) does not, then there is something blocking packets for the port range in question. If (3) does succeed, then there is some probably interesting problem.
>
> In step 3, I tried to telnet to both port 50010 and port 8010 of the problematic datanode - both worked.
>
> I agree there is indeed an interesting problem :). Question is how it can be solved.
>
> Thanks.
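Spelled out for the DataNode port mentioned in this thread, the three checks look roughly like this (sketch; substitute your own IPs and ports):

  # 1) on the server itself: is anything listening on the port?
  telnet localhost 50010

  # 2) basic reachability from the remote machine
  ping <server-ip>

  # 3) from the remote machine: can the service port be reached?
  telnet <server-ip> 50010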