RE: How to set Rack Id of DataNodes?
Hi Mohammad,

Yes, that's correct: your rack awareness script takes the IP address of a node and returns the rack name/id. You then just have to ensure the script is executable and referenced (using an absolute path) in the topology.script.file.name parameter in core-site.xml. (A sketch of such a script follows below the quoted message.)

Regards,
Vijay

From: Mohammad Mustaqeem [mailto:3m.mustaq...@gmail.com]
Sent: 15 April 2013 13:05
To: user@hadoop.apache.org
Subject: How to set Rack Id of DataNodes?

Hello everyone,

I want to set the Rack Id of each DataNode. I have read somewhere that we have to write a script that gives the Rack Id of a node. I want to clarify: the input of that script will be the IP address of the DataNode and the output will be the Rack Id. Is that right?

--
With regards ---
Mohammad Mustaqeem,
M.Tech (CSE)
MNNIT Allahabad
9026604270
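A minimal sketch of such a topology script, assuming a bash environment; the subnets, rack names and the /etc/hadoop/conf path below are made-up examples, not values from Mohammad's cluster:

  #!/bin/bash
  # topology.sh - maps DataNode addresses to rack ids.
  # Hadoop may pass several addresses in a single invocation,
  # so emit one rack per argument, whitespace-separated.
  for node in "$@"; do
    case "$node" in
      10.1.1.*) printf '/rack1 ' ;;
      10.1.2.*) printf '/rack2 ' ;;
      *)        printf '/default-rack ' ;;
    esac
  done
  printf '\n'

Make it executable (chmod +x /etc/hadoop/conf/topology.sh) and reference it in core-site.xml:

  <property>
    <name>topology.script.file.name</name>
    <value>/etc/hadoop/conf/topology.sh</value>
  </property>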
RE: NameNode failure and recovery!
Hi Rahul,

The SNN does not act as a backup / standby NameNode in the event of failure. The sole purpose of the Secondary NameNode (or, as it's otherwise / more correctly known, the Checkpoint Node) is to perform checkpointing of the current state of HDFS:

1) The SNN retrieves the fsimage and edits files from the NN
2) The NN rolls the edits file
3) The SNN loads the fsimage into memory
4) The SNN then replays the edits log file to merge the two
5) The SNN transfers the merged checkpoint back to the NN
6) The NN uses the checkpoint as the new fsimage file

It's true that technically you could use the fsimage from the SNN if you completely lost the NN - and yes, as you said, you would "lose" any changes to HDFS that occurred between the last checkpoint and the NN dying. But, as mentioned, the SNN is not a backup for the NN.

Regards,
Vijay

From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com]
Sent: 03 April 2013 15:40
To: user@hadoop.apache.org
Subject: NameNode failure and recovery!

Hi all,

I was reading about Hadoop and got to know that there are two ways to protect against NameNode failures:

1) Write to an NFS mount along with the usual local disk, or
2) use a Secondary NameNode; in case of failure of the NN, the SNN can take charge.

My questions:

1) The SNN is always lagging, so when the SNN becomes primary after an NN failure, the edits which have not been merged into the image file would be lost, and so the state of the SNN would not be consistent with that of the NN before its failure.
2) I have also read that the other purpose of the SNN is to periodically merge the edit logs with the image file. If a setup goes with option #1 (writing to NFS, no SNN), then who does this merging?

Thanks,
Rahul
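A side note for readers comparing the two approaches: option #1 from Rahul's mail just means the NN writes its metadata to more than one directory. A minimal hdfs-site.xml sketch, with hypothetical paths:

  <property>
    <name>dfs.name.dir</name>
    <!-- Local disk plus an NFS mount; the NN writes fsimage/edits to both. -->
    <value>/data/hadoop/name,/mnt/nfs/hadoop/name</value>
  </property>

This only duplicates the metadata; it does not perform the checkpoint merge, which is exactly Rahul's question #2 - without an SNN (or Checkpoint Node), the edits are only merged into the fsimage when the NN restarts.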
RE: In Compatible clusterIDs
Hi Nagarjuna,

What's in your /etc/hosts file? I think the line in the logs where it says DatanodeRegistration(0.0.0.0 [..]) should show the hostname or IP of the datanode (124.123.215.187, since you said it's a pseudo-distributed setup) and not 0.0.0.0. By the way, are you using the dfs.hosts parameter for specifying the datanodes that can connect to the namenode? (See the sketch after the quoted thread below.)

Vijay

From: nagarjuna kanamarlapudi [mailto:nagarjuna.kanamarlap...@gmail.com]
Sent: 20 February 2013 15:52
To: user@hadoop.apache.org
Subject: Re: In Compatible clusterIDs

Hi Jean-Marc,

Yes, this is the cluster I am trying to create and will then scale up. As per your suggestion I deleted the folder /Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20 and formatted the cluster. Now I get the following error:

2013-02-20 21:17:25,668 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-811644675-124.123.215.187-1361375214801 (storage id DS-1515823288-124.123.215.187-50010-1361375245435) service to nagarjuna/124.123.215.187:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-1515823288-124.123.215.187-50010-1361375245435, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-723b02a7-3441-41b5-8045-2a45a9cf96b0;nsid=1805451571;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
    at org.apache.hadoop.ipc.Protob

On Wed, Feb 20, 2013 at 9:10 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

Hi Nagarjuna,

Is it a test cluster? Do you have another cluster running close by? Also, is it your first try? It seems there is some previous data in the dfs directory which is not in sync with the last installation. Maybe you can remove the content of /Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20 if it's not useful for you, reformat your node and restart it?

JM

2013/2/20, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com:

Hi,

I am trying to set up a single-node cluster of Hadoop 2.0.*. When trying to start the datanode I got the following error. Could anyone help me out?

Block pool BP-1894309265-124.123.215.187-1361374377471 (storage id DS-1175433225-124.123.215.187-50010-1361374235895) service to nagarjuna/124.123.215.187:9000
java.io.IOException: Incompatible clusterIDs in /Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20/dfs/data: namenode clusterID = CID-800b7eb1-7a83-4649-86b7-617913e82ad8; datanode clusterID = CID-1740b490-8413-451c-926f-2f0676b217ec
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:850)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:821)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
    at java.lang.Thread.run(Thread.java:680)
2013-02-20 21:03:39,856 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1894309265-124.123.215.187-1361374377471 (storage id DS-1175433225-124.123.215.187-50010-1361374235895) service to nagarjuna/124.123.215.187:9000
2013-02-20 21:03:39,958 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1894309265-124.123.215.187-1361374377471 (storage id DS-1175433225-124.123.215.187-50010-1361374235895)
2013-02-20 21:03:41,959 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-02-20 21:03:41,961 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2013-02-20 21:03:41,963 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: /
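For readers hitting the same DisallowedDatanodeException: the two things being asked about above would look roughly like this. This is a sketch under assumptions - the hostname/IP pairing is inferred from "nagarjuna/124.123.215.187" in the log, and the include-file path is invented:

  # /etc/hosts on the (pseudo-distributed) node
  127.0.0.1        localhost
  124.123.215.187  nagarjuna

  # If hdfs-site.xml sets dfs.hosts, e.g. to /etc/hadoop/conf/dfs.include,
  # then that file must list the datanode's hostname (or IP):
  nagarjuna

A datanode whose registration address does not resolve to an entry in the dfs.hosts include file is denied communication exactly as in the log above.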
RE: getimage failed in Name Node Log
Hi Janesh,

I think your SNN may be starting up with the wrong IP - surely the machine parameter in the posted URL should say 192.168.0.101 rather than 0.0.0.0?

http://namenode:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845

Are you able to retrieve the fsimage from the NN when running this on the SNN machine, using curl or wget?

  wget 'http://192.168.0.105:50070/getimage?getimage=1' -O fsimage.dmp

If this actually retrieves an error page, then the NN is at least reachable from the SNN and the port is definitely open. Otherwise, double check that this is not due to the OS firewall blocking the connection, assuming it is on. (A config sketch follows the quoted message below.) That said, the PrivilegedActionException in the error may actually mean it's ...

Vijay

From: janesh mishra [mailto:janeshmis...@gmail.com]
Sent: 15 February 2013 12:27
To: user@hadoop.apache.org
Subject: getimage failed in Name Node Log

Hi,

I am new to Hadoop and I set up the Hadoop cluster with the help of Michael Noll's multi-node setup (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/). When I set up single-node Hadoop, everything works fine. But in the multi-node setup I found that my fsimage and edit log files are not updated on the SNN; the roll of the edits file is done and I have an edits.new on the NN.

Logs from NN:

2013-02-14 19:13:52,468 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.net.ConnectException: Connection refused
2013-02-14 19:13:52,468 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.net.ConnectException: Connection refused
2013-02-14 19:13:52,477 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.net.ConnectException: Connection refused

Logs from SNN:

2013-02-14 19:13:52,350 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL namenode:50070putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845
2013-02-14 19:13:52,374 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2013-02-14 19:13:52,375 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://namenode:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1613)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:160)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:377)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:418)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:275)
    at java.lang.Thread.run(Thread.java:722)

My setup (version: hadoop-1.0.4) includes:

1. Name Node (192.168.0.105)
2. Secondary Name Node (192.168.0.101)
3. Data Node (192.168.0.100)

The Name Node also works as a Data Node.

Conf files for the Name Node:

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
    <description>The number of seconds between two periodic checkpoints.</description>
  </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
  <property>
    <name>dfs.hosts</name>
    <value>/usr/local/hadoop/includehosts</value>
    <description>IPs that work as datanodes</description>
  </property>
  <property>
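For readers hitting the same machine=0.0.0.0 symptom: one possible fix - an assumption on my part, not something confirmed in this thread - is to pin the SNN's HTTP address in hdfs-site.xml on the secondary, so that it stops advertising 0.0.0.0 to the NN:

  <!-- hdfs-site.xml on the Secondary NameNode (192.168.0.101).
       Binding to the real IP instead of the 0.0.0.0 default is a guess
       at the fix, based on the machine=0.0.0.0 value in the posted URL. -->
  <property>
    <name>dfs.secondary.http.address</name>
    <value>192.168.0.101:50090</value>
  </property>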
RE: Error for Pseudo-distributed Mode
Hi,

Could you first try running the example:

  $ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep input output 'dfs[a-z.]+'

Do you receive the same error? I'm not sure it's related to a lack of RAM, as the stack trace shows a network timeout (I realise that you're running in pseudo-distributed mode):

Caused by: com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call From localhost.localdomain/127.0.0.1 to localhost.localdomain:54113 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:60976 remote=localhost.localdomain/127.0.0.1:54113]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout

Your best bet is probably to start by checking the items mentioned in the wiki page linked above. While the default firewall rules (on CentOS) usually allow pretty much all traffic on the lo interface, it might be worth temporarily turning off iptables (assuming it is on). (A sketch of the commands follows the quoted message below.)

Vijay

From: yeyu1899 [mailto:yeyu1...@163.com]
Sent: 12 February 2013 12:58
To: user@hadoop.apache.org
Subject: Error for Pseudo-distributed Mode

Hi all,

I installed a Red Hat Enterprise Linux x86 in VMware Workstation and gave the virtual machine 1 GB of memory. Then I followed the steps in "Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode" -- https://ccp.cloudera.com/display/CDH4DOC/Installing+CDH4+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode.

At the end, I ran an example Hadoop job with the command:

  $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'

The screen then showed the output below, ending with "AttemptID:attempt_1360528029309_0001_r_00_0 Timed out after 600 secs", and I wonder: is that because my virtual machine's memory is too little?

[hadoop@localhost hadoop-mapreduce]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z]+'
13/02/11 04:30:44 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
13/02/11 04:30:44 INFO input.FileInputFormat: Total input paths to process : 4
13/02/11 04:30:45 INFO mapreduce.JobSubmitter: number of splits:4
13/02/11 04:30:45 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/02/11 04:30:45 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/02/11 04:30:45 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/02/11 04:30:45 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/02/11 04:30:45 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/02/11 04:30:45 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/02/11 04:30:45 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/02/11 04:30:45 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
13/02/11 04:30:45 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/02/11 04:30:45 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/02/11 04:30:45 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/02/11 04:30:46 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
13/02/11 04:30:46 INFO mapred.ResourceMgrDelegate: Submitted application application_1360528029309_0001 to ResourceManager at /0.0.0.0:8032
13/02/11 04:30:46 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1360528029309_0001/
13/02/11 04:30:46 INFO mapreduce.Job: Running job: job_1360528029309_0001
13/02/11 04:31:01 INFO mapreduce.Job: Job job_1360528029309_0001 running in uber mode : false
13/02/11 04:31:01 INFO mapreduce.Job: map 0% reduce 0%
13/02/11 04:47:22 INFO mapreduce.Job: Task Id : attempt_1360528029309_0001_r_00_0, Status : FAILED
AttemptID:attempt_1360528029309_0001_r_00_0 Timed out after 600 secs
cleanup failed for container container_1360528029309_0001_01_06 : java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopC
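To test the firewall theory above, something like the following; the service commands assume a CentOS/RHEL 6 style init system, which is a guess based on the "Red Hat Enterprise Linux" mention:

  # Show the current rules - look for REJECT entries that could hit the lo interface
  sudo iptables -L -n -v
  # Temporarily stop the firewall, re-run the example job ...
  sudo service iptables stop
  # ... and restore it afterwards
  sudo service iptables start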
RE: Reg Too many fetch-failures Error
Hi Manoj,

As you may be aware, this means the reducers are unable to fetch intermediate data from the TaskTrackers that ran the map tasks. You can try:

* increasing tasktracker.http.threads, so there are more threads to handle fetch requests from reducers;
* decreasing mapreduce.reduce.parallel.copies, so fewer copies / fetches are performed in parallel.

(A config sketch of both follows the quoted message below.) It could also be due to a temporary DNS issue. See slide 26 of this presentation for potential causes of this message: http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera

I'm not sure why you did not hit the problem before - was it the same data or different data? Did you have other jobs running on your cluster?

Hope that helps.

Regards,
Vijay

From: Manoj Babu [mailto:manoj...@gmail.com]
Sent: 01 February 2013 15:09
To: user@hadoop.apache.org
Subject: Reg Too many fetch-failures Error

Hi All,

I am getting a "Too many fetch-failures" exception. What might be the reason for this exception? For the same size of data I didn't face this error earlier, and there is a change in the code. How can I avoid this?

Thanks in advance.

Cheers!
Manoj.
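For the archives, a mapred-site.xml sketch of the two knobs above. The values are illustrative starting points, not tuned recommendations; note that on MR1 clusters the copy-count property is named mapred.reduce.parallel.copies:

  <property>
    <!-- More TaskTracker threads serving map output to reducers (default 40). -->
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <!-- Fewer parallel fetches per reducer; lower this if you have raised it
         above the default of 5. -->
    <name>mapred.reduce.parallel.copies</name>
    <value>5</value>
  </property>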
RE: Maximum Storage size in a Single datanode
Hi Jeba,

There are other considerations too. For example, if a single node holds 1 PB of data and it were to die, this would cause a significant amount of traffic as the NameNode arranges for new replicas to be created.

Vijay

From: Bertrand Dechoux [mailto:decho...@gmail.com]
Sent: 30 January 2013 09:14
To: user@hadoop.apache.org; jeba earnest
Subject: Re: Maximum Storage size in a Single datanode

I would say the hard limit is due to the OS local file system (and your budget). So the short answer for ext3: it doesn't seem so. http://en.wikipedia.org/wiki/Ext3

And I am not sure that answer is the most interesting one. Even if you could put 1 petabyte on one node, what is usually interesting is the storage/compute ratio.

Bertrand

On Wed, Jan 30, 2013 at 9:08 AM, jeba earnest jebaearn...@yahoo.com wrote:

Hi,

Is it possible to keep 1 petabyte in a single data node? If not, how much is the maximum storage for a particular data node?

Regards,
M. Jeba
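For completeness: a DataNode's capacity is simply the sum of the volumes listed in its data-directory setting, so large nodes are built by listing one path per physical disk. An hdfs-site.xml sketch with hypothetical mount points (the property is dfs.data.dir in Hadoop 1.x, dfs.datanode.data.dir in 2.x):

  <property>
    <name>dfs.data.dir</name>
    <!-- One entry per physical disk; these mount points are made up. -->
    <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data</value>
  </property>

Each volume is still bounded by the local file system's limits, which is Bertrand's point about ext3.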