RE: Intermittent DataStreamer Exception while appending to file inside HDFS
Hi Arinto,

You can check the third DN's logs to see whether there were any space issues (or similar) that prevented that node from being selected for the write.

"Does it mean that one of the datanodes was unreachable when we try to append into the files?"
It was not selected for the write at all. If it had failed after being selected for the write, you would have got this error during the recovery itself.

Regards,
Uma

From: Arinto Murdopo [mailto:ari...@gmail.com]
Sent: 11 October 2013 08:48
To: user@hadoop.apache.org
Subject: Re: Intermittent DataStreamer Exception while appending to file inside HDFS

Thank you for the comprehensive answer. When I inspect our NameNode UI, I see that 3 datanodes are up. However, as you mentioned, the log only showed 2 datanodes. Does it mean that one of the datanodes was unreachable when we try to append into the files?

Best regards,
Arinto
www.otnira.com

On Thu, Oct 10, 2013 at 4:57 PM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:

Hi Arinto,

Please disable this feature with smaller clusters: dfs.client.block.write.replace-datanode-on-failure.policy

The reason for this exception is that you have replication set to 3, but from the logs it looks like you have only 2 nodes in the cluster. When the pipeline is first created, we do not do any verification, i.e. whether the pipeline DNs meet the replication factor or not. The above property only controls replacing a DN on failure, but we additionally take advantage of it to verify this condition when we reopen the pipeline for append. So, unfortunately, the existing DNs here do not meet the replication factor and the client tries to add another node. Since you do not have any extra nodes in the cluster beyond the already selected ones, it fails. With the current configuration you cannot append.

Also please take a look at the default configuration description:

<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
  <description>
    If there is a datanode/network failure in the write pipeline, DFSClient will try to remove
    the failed datanode from the pipeline and then continue writing with the remaining datanodes.
    As a result, the number of datanodes in the pipeline is decreased. The feature is to add new
    datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When
    the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to
    set the policy to NEVER in the default configuration file or disable this feature. Otherwise,
    users may experience an unusually high rate of pipeline failures since it is impossible to
    find new datanodes for replacement. See also
    dfs.client.block.write.replace-datanode-on-failure.policy
  </description>
</property>

Make this configuration false on your client side.

Regards,
Uma

From: Arinto Murdopo [mailto:ari...@gmail.com]
Sent: 10 October 2013 13:02
To: user@hadoop.apache.org
Subject: Intermittent DataStreamer Exception while appending to file inside HDFS

Hi there,

I get the following exception while appending to an existing file in my HDFS. The error appears intermittently: when it does not show up, the append succeeds; when it appears, the append fails. Here is the error: https://gist.github.com/arinto/d37a56f449c61c9d1d9c

For your convenience, here it is:

13/10/10 14:17:30 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[10.0.106.82:50010, 10.0.106.81:50010], original=[10.0.106.82:50010, 10.0.106.81:50010])
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)

Some configuration files:
1. hdfs-site.xml: https://gist.github.com/arinto/f5f1522a6f6994ddfc17#file-hdfs-append-datastream-exception-hdfs-site-xml
2. core-site.xml: https://gist.github.com/arinto/0c6f40872181fe26f8b1#file-hdfs-append-datastream-exception-core-site-xml

So, any idea how to solve this issue? Some links that I've found (but unfortunately they do not help):
1. StackOverflow (http://stackoverflow.com/questions/15347799/java-io-ioexception-failed-to-add-a-datanode-hdfs-hadoop): our replication factor is 3 and we've never changed the replication factor since we set up the cluster.
2. Impala-user mailing list (https://groups.google.com/a/cloudera.org/forum/#!searchin/impala-user/DataStreamer$20exception/impala-user/u2CN163Cyfc/_OcRqBYL2B4J): the error there is due to a replication factor set to 1; in our case, we're using replication factor = 3.

Best regards,
Arinto
www.otnira.com
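For reference, a minimal sketch of the client-side change Uma recommends above. Either property can go into the hdfs-site.xml used by the appending client (the values shown are the suggested overrides, not the defaults):

<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>false</value>
</property>

<!-- or keep the feature enabled but never ask for a replacement datanode -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>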
RE: Intermittent DataStreamer Exception while appending to file inside HDFS
Hi Arinto,

Please disable this feature with smaller clusters: dfs.client.block.write.replace-datanode-on-failure.policy

The reason for this exception is that you have replication set to 3, but from the logs it looks like you have only 2 nodes in the cluster. When the pipeline is first created, we do not do any verification, i.e. whether the pipeline DNs meet the replication factor or not. The above property only controls replacing a DN on failure, but we additionally take advantage of it to verify this condition when we reopen the pipeline for append. So, unfortunately, the existing DNs here do not meet the replication factor and the client tries to add another node. Since you do not have any extra nodes in the cluster beyond the already selected ones, it fails. With the current configuration you cannot append.

Also please take a look at the default configuration description:

<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
  <description>
    If there is a datanode/network failure in the write pipeline, DFSClient will try to remove
    the failed datanode from the pipeline and then continue writing with the remaining datanodes.
    As a result, the number of datanodes in the pipeline is decreased. The feature is to add new
    datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When
    the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to
    set the policy to NEVER in the default configuration file or disable this feature. Otherwise,
    users may experience an unusually high rate of pipeline failures since it is impossible to
    find new datanodes for replacement. See also
    dfs.client.block.write.replace-datanode-on-failure.policy
  </description>
</property>

Make this configuration false on your client side.

Regards,
Uma

From: Arinto Murdopo [mailto:ari...@gmail.com]
Sent: 10 October 2013 13:02
To: user@hadoop.apache.org
Subject: Intermittent DataStreamer Exception while appending to file inside HDFS

Hi there,

I get the following exception while appending to an existing file in my HDFS. The error appears intermittently: when it does not show up, the append succeeds; when it appears, the append fails. Here is the error: https://gist.github.com/arinto/d37a56f449c61c9d1d9c

For your convenience, here it is:

13/10/10 14:17:30 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[10.0.106.82:50010, 10.0.106.81:50010], original=[10.0.106.82:50010, 10.0.106.81:50010])
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)

Some configuration files:
1. hdfs-site.xml: https://gist.github.com/arinto/f5f1522a6f6994ddfc17#file-hdfs-append-datastream-exception-hdfs-site-xml
2. core-site.xml: https://gist.github.com/arinto/0c6f40872181fe26f8b1#file-hdfs-append-datastream-exception-core-site-xml

So, any idea how to solve this issue? Some links that I've found (but unfortunately they do not help):
1. StackOverflow (http://stackoverflow.com/questions/15347799/java-io-ioexception-failed-to-add-a-datanode-hdfs-hadoop): our replication factor is 3 and we've never changed the replication factor since we set up the cluster.
2. Impala-user mailing list (https://groups.google.com/a/cloudera.org/forum/#!searchin/impala-user/DataStreamer$20exception/impala-user/u2CN163Cyfc/_OcRqBYL2B4J): the error there is due to a replication factor set to 1; in our case, we're using replication factor = 3.

Best regards,
Arinto
www.otnira.com
RE: When to use DFSInputStream and HdfsDataInputStream
Hi Rob,

DFSInputStream: the InterfaceAudience for this class is private, so you should not use it directly. It implements the actual core read functionality, and it is a DFS-specific implementation only.

HdfsDataInputStream: the InterfaceAudience for this class is public, so you can use it. In fact, you get an HdfsDataInputStream object when you open a file for read. This wrapper provides some additional DFS-specific API implementations, like getVisibleLength etc., which may not be intended APIs for a generic FS. The same applies on the write side (DFSOutputStream vs HdfsDataOutputStream).

I hope this helps clarify your doubts.

Regards,
Uma

From: Rob Blah [mailto:tmp5...@gmail.com]
Sent: 01 October 2013 03:39
To: user@hadoop.apache.org
Subject: When to use DFSInputStream and HdfsDataInputStream

Hi

What is the use case difference between:
- DFSInputStream and HdfsDataInputStream
- DFSOutputStream and HdfsDataOutputStream
When should one be preferred over the other? From the sources I see they have similar functionality, only HdfsData*Stream extends Data*Stream instead of *Stream. Also, is DFS*Stream more general than HdfsData*Stream, in the sense that it works at a higher abstraction layer and can work with other distributed FSes (even though it contacts HDFS-specific components), or is it just a naming convention? Which one should I choose to read/write data from/to HDFS, and why (sounds like an academic question ;) )?

* - means both Input and Output

regards
tmp
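A minimal sketch of the public-API route described above, assuming a Hadoop 2.x client; the file path is just a placeholder, and the cast only succeeds when the underlying FileSystem is HDFS (DistributedFileSystem):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class VisibleLengthExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // open() is declared to return FSDataInputStream; on HDFS the concrete
    // object handed back is an HdfsDataInputStream
    FSDataInputStream in = fs.open(new Path("/tmp/example.txt"));
    if (in instanceof HdfsDataInputStream) {
      HdfsDataInputStream hdfsIn = (HdfsDataInputStream) in;
      System.out.println("visible length: " + hdfsIn.getVisibleLength());
    }
    in.close();
  }
}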
RE: HADOOP UPGRADE ISSUE
start-all.sh will not carry any arguments through to the nodes. Start with start-dfs.sh, or start the namenode directly with the upgrade option:
./hadoop namenode -upgrade

Regards,
Uma

From: yogesh dhari [yogeshdh...@live.com]
Sent: Thursday, November 22, 2012 12:23 PM
To: hadoop helpforoum
Subject: HADOOP UPGRADE ISSUE

Hi All,

I am trying to upgrade Apache hadoop-0.20.2 to hadoop-1.0.4. I have given the same dfs.name.dir, etc. in hadoop-1.0.4's conf files as were in hadoop-0.20.2. Now I am starting dfs and mapred using start-all.sh -upgrade, but the namenode and datanode fail to run.

1) Namenode's logs show:
ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.IOException: File system image contains an old layout version -18. An upgrade to version -32 is required. Please restart NameNode with -upgrade option.
.
.
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: File system image contains an old layout version -18. An upgrade to version -32 is required. Please restart NameNode with -upgrade option.

2) Datanode's logs show:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /opt/hadoop_newdata_dirr, expected: rwxr-xr-x, while actual: rwxrwxrwx
(why are these file permissions producing warnings?)
2012-11-22 12:05:21,157 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.

Please suggest.

Thanks & Regards
Yogesh Kumar
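A sketch of the sequence implied above, using the data-dir path from the datanode log (adjust to your own layout); start-all.sh drops the -upgrade argument, so it has to go to start-dfs.sh or to the namenode directly:

# datanode refuses the data dir unless it is rwxr-xr-x
chmod 755 /opt/hadoop_newdata_dirr

# start-dfs.sh passes -upgrade through to the namenode
bin/start-dfs.sh -upgrade
bin/start-mapred.sh

# alternatively, run the namenode upgrade in the foreground first
bin/hadoop namenode -upgrade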
RE: High Availability - second namenode (master2) issue: Incompatible namespaceIDs
If you format the namenode, you need to clean up the DataNode's storage directories as well if they already hold data. The DN also saves the namespace ID and compares it with the NN's namespaceID; if you format the NN, the namespaceID changes while the DN may still have the older namespaceID. So just cleaning the data in the DN would be fine.

Regards,
Uma

From: hadoop hive [hadooph...@gmail.com]
Sent: Friday, November 16, 2012 1:15 PM
To: user@hadoop.apache.org
Subject: Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

Seems like you haven't formatted your cluster (if it was made for the first time).

On Fri, Nov 16, 2012 at 9:58 AM, a...@hsk.hk <a...@hsk.hk> wrote:

Hi, please help!

I have installed a Hadoop cluster with a single master (master1) and have HBase running on the HDFS. Now I am setting up the second master (master2) in order to form HA. When I used JPS to check the cluster, I found:
2782 Jps
2126 NameNode
2720 SecondaryNameNode
i.e. the datanode on this server could not be started. In the log file, I found:
2012-11-16 10:28:44,851 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 1356148070; datanode namespaceID = 1151604993

One of the possible solutions to fix this issue is to: stop the cluster, reformat the NameNode, restart the cluster.

QUESTION: As I already have HBASE running on the cluster, if I reformat the NameNode, do I need to reinstall the entire HBASE? I don't mind having all data lost, as I don't have much data in HBASE and HDFS, but I don't want to re-install HBASE again.

On the other hand, I have tried another solution: stop the DataNode, edit the namespaceID in current/VERSION (i.e. set namespaceID=1151604993), restart the datanode. It doesn't work:

Warning: $HADOOP_HOME is deprecated.
starting master2, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out
Exception in thread "main" java.lang.NoClassDefFoundError: master2
Caused by: java.lang.ClassNotFoundException: master2
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: master2. Program will exit.

QUESTION: Any other solutions?

Thanks
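A minimal sketch of the cleanup Uma describes, using the dfs.data.dir path from the error above; this wipes the blocks on that datanode, which the poster says is acceptable:

# on the affected datanode
bin/hadoop-daemon.sh stop datanode
rm -rf /app/hadoop/tmp/dfs/data/*
bin/hadoop-daemon.sh start datanode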
RE: Active-Active setup for the namenode
Adding to Andy's points, to clarify: I think 0.23 does not claim the HA feature. Also, Hadoop 2 HA is an Active-Standby model.

Regards,
Uma

From: Andy Isaacson [a...@cloudera.com]
Sent: Thursday, November 15, 2012 8:19 AM
To: user@hadoop.apache.org
Subject: Re: Active-Active setup for the namenode

On Wed, Nov 14, 2012 at 4:35 AM, mailinglist <mailingl...@datenvandalismus.org> wrote:
does anyone know if it is possible to set up an active-active NameNode in hadoop 1.0? Or how can I provide an HA NameNode?

HA is not present in hadoop 1.0. You'll have to upgrade to a release on branch 2.0 or 0.23.
-andy
RE: How to do HADOOP RECOVERY ???
Which version of Hadoop are you using? Do you have all DNs running? Can you check the UI report, whether all DNs are alive? Can you check whether the DN disks are good or not? Can you grep the NN and DN logs for one of the corrupt block IDs from below?

Regards,
Uma

From: yogesh.kuma...@wipro.com [yogesh.kuma...@wipro.com]
Sent: Monday, October 29, 2012 2:03 PM
To: user@hadoop.apache.org
Subject: How to do HADOOP RECOVERY ???

Hi All,

I ran this command:
hadoop fsck -Ddfs.http.address=localhost:50070 /
and found that some blocks are missing and corrupted. The results look like:

/user/hive/warehouse/tt_report_htcount/00_0: MISSING 2 blocks of total size 71826120 B..
/user/hive/warehouse/tt_report_perhour_hit/00_0: CORRUPT block blk_75438572351073797
/user/hive/warehouse/tt_report_perhour_hit/00_0: MISSING 1 blocks of total size 1531 B..
/user/hive/warehouse/vw_cc/00_0: CORRUPT block blk_-1280621588594166706
/user/hive/warehouse/vw_cc/00_0: MISSING 1 blocks of total size 1774 B..
/user/hive/warehouse/vw_report2/00_0: CORRUPT block blk_8637186139854977656
/user/hive/warehouse/vw_report2/00_0: CORRUPT block blk_4019541597438638886
/user/hive/warehouse/vw_report2/00_0: MISSING 2 blocks of total size 71826120 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
.
.
Total size: 7600625746 B
Total dirs: 205
Total files: 173
Total blocks (validated): 270 (avg. block size 28150465 B)
CORRUPT FILES: 171
MISSING BLOCKS: 269
MISSING SIZE: 7600625742 B
CORRUPT BLOCKS: 269
Minimally replicated blocks: 1 (0.37037036 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 0.0037037036
Corrupt blocks: 269
Missing replicas: 0 (0.0 %)
Number of data-nodes: 1
Number of racks: 1

Is there any way to recover them?

Please help and suggest.

Thanks & Regards
yogesh kumar
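A sketch of the checks Uma asks for; the block ID is one of the corrupt ones from the fsck output above, and the log paths assume the default $HADOOP_HOME/logs layout:

# are all datanodes alive and reporting capacity?
bin/hadoop dfsadmin -report

# history of one corrupt block in the NN and DN logs
grep blk_75438572351073797 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
grep blk_75438572351073797 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

# full per-file corruption listing
bin/hadoop fsck / -files -blocks -locations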
RE: How to do HADOOP RECOVERY ???
If you backed up both the data directory and the namespace dirs correctly, configuring them back should work fine. Please check once whether your configuration is actually getting applied. For example, I can see "dfs.data,dir" below, with ',' instead of '.'; this might just be a typo, I am only asking you to re-look carefully at your configs. If you just backed up the dirs and configure the same dirs back into a cluster, starting it would be equal to restarting the cluster.

Regards,
Uma

From: yogesh.kuma...@wipro.com [yogesh.kuma...@wipro.com]
Sent: Monday, October 29, 2012 5:43 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Hi Uma,

You are correct: when I start the cluster it goes into safemode, and if I wait it doesn't come out. I use the -safemode leave option.

Safe mode is ON. The ratio of reported blocks 0.0037 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)

When I say I started a fresh cluster, I mean: I saved the fs.name.dir and fs.data.dir separately as a back-up of the old cluster (single node), and used the old machine and a new machine to start a new cluster (the old machine acted as DN and the newly added machine acted as NN+DN). At the same time I gave different directory locations for dfs.name.dir and dfs.data.dir on the old machine.

Say, when it was a single node:
dfs.name.dir -- /HADOOP/SINGLENODE/Name_Dir
dfs.data,dir -- /HADOOP/SINGLENODE/Data_Dir

When I used it with another machine as DN:
dfs.name.dir -- /HADOOP/MULTINODE/Name_Dir
dfs.data.dir -- /HADOOP/MULTINODE/Data_Dir

Now I have gone back to the previous stage, the old machine as a single node cluster (NN + DN), and gave the paths:
dfs.name.dir -- /HADOOP/SINGLENODE/Name_Dir
dfs.data,dir -- /HADOOP/SINGLENODE/Data_Dir

I had saved the namespace and data before configuring the multi-node cluster with the new machine. It should work after giving the namespace and data directory paths in the conf files of the single node machine, and should show the previous content, or am I wrong?

Why is it happening, and why is it not coming out of safe mode by itself?

Please suggest.

Regards
Yogesh Kumar

From: Uma Maheswara Rao G [mahesw...@huawei.com]
Sent: Monday, October 29, 2012 5:10 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

I am not sure I understood your scenario correctly here. Here is one possibility for this situation with your explained case.

"I have saved the dfs.name.dir separately, and started with fresh cluster..."

When you start a fresh cluster, have you used the same DNs? If so, blocks will be invalidated as your namespace is fresh now (in fact, a DN cannot even register until you clean its data dirs, as the namespace ID differs). Now you are putting the older image back and starting again, so your older image will expect enough blocks to be reported from the DNs to start; otherwise it will stay in safe mode. How is it coming out of safemode?

Or did you continue with the same cluster, additionally saving the namespace separately as a backup of the current state, and then add an extra DN to the cluster, referring to that as a fresh cluster? In that case, if you delete any existing files, the data blocks will be invalidated on the DNs. After this, if you go back to the older cluster with the backed-up namespace, the deleted files' information will not be known to the older image; it will expect the blocks to be reported, and if no blocks are available for a file, that file will be treated as corrupt.

"I did a -ls / operation and got this:
mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
Found 1 items"

ls will show it because the namespace has the info for this file, but the DNs do not have any block related to it.

From: yogesh.kuma...@wipro.com [yogesh.kuma...@wipro.com]
Sent: Monday, October 29, 2012 4:13 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Thanks Uma,

I am using the hadoop-0.20.2 version. The UI shows:

Cluster Summary
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)
WARNING: There are about 270 missing blocks. Please check the log or run fsck.
Configured Capacity: 465.44 GB
DFS Used: 20 KB
Non DFS Used: 439.37 GB
DFS Remaining: 26.07 GB
DFS Used%: 0 %
DFS Remaining%: 5.6 %
Live Nodes (http://localhost:50070/dfsnodelist.jsp?whatNodes=LIVE): 1
Dead Nodes (http://localhost:50070/dfsnodelist.jsp?whatNodes=DEAD): 0

Firstly I configured a single node cluster and worked over it; after that I added another machine and made that one a master + worker and the first machine a worker only. I have saved the dfs.name.dir separately, and started with a fresh cluster... Now
Re: Replication Factor Modification
Replication factor is a per-file option, so you may have to write a small program which iterates over all files and sets the replication factor to the desired value. API: FileSystem#setReplication

Regards,
Uma

On Wed, Sep 5, 2012 at 11:39 PM, Uddipan Mukherjee <uddipan_mukher...@infosys.com> wrote:

Hi,

We have a requirement where we have to change our Hadoop cluster's replication factor without restarting the cluster. We are running our cluster on Amazon EMR. Can you please suggest a way to achieve this? Any pointer to this will be very helpful.

Thanks And Regards
Uddipan Mukherjee
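A minimal sketch of the small program Uma describes, assuming the client's configuration points at the cluster; the root path and the target replication of 2 are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplicationRecursively {
  public static void main(String[] args) throws Exception {
    short newReplication = 2;                        // desired replication factor
    FileSystem fs = FileSystem.get(new Configuration());
    setRecursively(fs, new Path("/"), newReplication);
    fs.close();
  }

  private static void setRecursively(FileSystem fs, Path p, short rep) throws Exception {
    for (FileStatus status : fs.listStatus(p)) {
      if (status.isDir()) {
        setRecursively(fs, status.getPath(), rep);   // descend into directories
      } else {
        fs.setReplication(status.getPath(), rep);    // per-file replication change
      }
    }
  }
}

On a running cluster the shell gives the same effect: hadoop fs -setrep -R 2 /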
RE: HADOOP-7178 patch is not present in Hadoop-1.0.3
Hi Stuti,

Yes, I remember we committed this to trunk only at that time.

Regards,
Uma

From: Stuti Awasthi [stutiawas...@hcl.com]
Sent: Friday, August 31, 2012 5:54 PM
To: user@hadoop.apache.org
Subject: HADOOP-7178 patch is not present in Hadoop-1.0.3

Hello,

I wanted to avoid the .crc file, and I found that a patch for this is available under HADOOP-7178. This was fixed and committed in the Hadoop-0.23.0 version. Currently I am using the latest stable release, Hadoop-1.0.3, but I am not able to find the API
public void copyToLocalFile(boolean delSrc, Path src, Path dst, boolean useRawLocalFileSystem)
Is this patch not included in this version?

Thanks,
Stuti Awasthi
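For reference, a minimal sketch of the overload Stuti quotes, usable only on a release that contains HADOOP-7178 (0.23.0 or later, so not on 1.0.3); passing useRawLocalFileSystem=true writes the local copy without the .crc sidecar file. The paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyWithoutCrc {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path src = new Path("/user/stuti/input.txt");   // HDFS source (placeholder)
    Path dst = new Path("/tmp/input.txt");          // local destination (placeholder)
    // delSrc=false, useRawLocalFileSystem=true -> no .crc file is created locally
    fs.copyToLocalFile(false, src, dst, true);
    fs.close();
  }
}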
RE: checkpointnode backupnode hdfs HA
Hi Jan,

Don't confuse the backupnode/checkpoint nodes with this. The new HA architecture is mainly targeted at building HA with Namenode states:
1) Active Namenode
2) Standby Namenode

When you start the NNs, they both start in standby mode by default. Then you can switch one NN to the active state by issuing HA admin commands, or by configuring the ZKFC (auto failover) process (not released officially yet). The NN state will start the required services accordingly. This is almost like a new implementation of the standby-node checkpointing process. The active NN writes edits to its local dirs and to the shared NN dirs. The standby node keeps tailing the edits from the shared NN dirs.

Coming to the shared storage part, currently there are 3 options:
1) NFS filers (may need to buy external devices).
2) BookKeeper (a subproject of open-source ZooKeeper). This was mainly inspired by the NN. It is a high-performance write-ahead logging system, and it can scale to more nodes dynamically depending on usage. The integration with BookKeeper is already available and we are running some clusters with it. HDFS-3399
3) The other option is a quorum-based approach, which is under development. It is mainly aimed at developing shared storage nodes inside HDFS itself, so it can make use of proven RPC protocols for unified security mechanisms and the proven edits storage layers. HDFS-3077

I hope this gives more of an idea of the current HA work in the community.

Regards,
Uma

From: Jan Van Besien [ja...@ngdata.com]
Sent: Thursday, August 16, 2012 1:41 PM
To: user@hadoop.apache.org
Subject: checkpointnode backupnode hdfs HA

I am a bit confused about the different options for namenode high availability (or something along those lines) in CDH4 (hadoop-2.0.0). I understand that the secondary namenode is deprecated, and that there are two options to replace it: checkpoint or backup namenodes. Both are well explained in the documentation, but the confusion begins when reading about HDFS High Availability, for example here:
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html

Is the topic HDFS High Availability as described there (using shared storage) related to checkpoint/backup nodes? If so, in what way? If I read about backup nodes, they also seem to be aimed at high availability. From what I understood, the current implementation doesn't provide (warm) fail-over yet, but this is planned. So starting to replace secondary namenodes now with backup namenodes sounds like a future-proof idea?

thanks,
Jan
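A heavily trimmed sketch of what the shared-storage (NFS filer) HA configuration looks like, with property names taken from the Hadoop 2 HA documentation linked above; the nameservice name, hostnames and mount point are placeholders:

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2:8020</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/filer/shared-edits</value>
</property>

One namenode is then switched to active with the HA admin command, e.g. hdfs haadmin -transitionToActive nn1.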
RE: upload hang at DFSClient$DFSOutputStream.close(3488)
Hi Mingxi,

In your thread dump, did you check the DataStreamer thread? Is it running? If the DataStreamer thread is not running, then this issue would be mostly the same as HDFS-2850. Did you find any OOME in your clients?

Regards,
Uma

From: Mingxi Wu [mingxi...@turn.com]
Sent: Monday, April 16, 2012 7:25 AM
To: common-user@hadoop.apache.org
Subject: upload hang at DFSClient$DFSOutputStream.close(3488)

Hi,

I use hadoop cloudera 0.20.2-cdh3u0.

I have a program which uploads local files to HDFS every hour. Basically, I open a gzip input stream with in = new GZIPInputStream(fin); and write to an HDFS file. After less than two days, it hangs. It hangs at FSDataOutputStream.close(86). Here is the stack:

State: WAITING Running 16660 ms (user 13770 ms) blocked 11276 times for ms waiting 11209 times for ms LockName: java.util.LinkedList@f1ca0de LockOwnerId: -1
java.lang.Object.wait(-2)
java.lang.Object.wait(485)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(3468)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(3457)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(3549)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(3488)
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(61)
org.apache.hadoop.fs.FSDataOutputStream.close(86)
org.apache.hadoop.io.IOUtils.copyBytes(59)
org.apache.hadoop.io.IOUtils.copyBytes(74)

Any suggestion to avoid this issue? It seems this is a bug in hadoop. I found this issue is less severe when my upload server does one upload at a time, instead of using multiple concurrent uploads.

Thanks,
Mingxi
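A sketch of the check Uma suggests, assuming you can take a thread dump of the uploading JVM (the pid is a placeholder):

# dump all threads of the uploader process
jstack <uploader-pid> > client.tdump

# a healthy writer has a live "DataStreamer" thread; if it is absent while
# close() waits in waitForAckedSeqno, the symptom matches HDFS-2850
grep -A 5 "DataStreamer" client.tdump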
RE: Namenode no lease exception ... what does it mean?
Mark,

Did any other client delete the file while the write was in progress? Could you please grep the namenode log for this file, /user/mark/output33/_temporary/_attempt_201202090811_0005_m_000247_0/part-00247, to see if there were any delete requests?

From: Mark question [markq2...@gmail.com]
Sent: Friday, February 10, 2012 12:02 AM
To: common-user
Subject: Namenode no lease exception ... what does it mean?

Hi guys,

Even though there is enough space on HDFS as shown by -report, I get the following 2 errors, the first shown in the log of a datanode and the second in the Namenode log:

1) 2012-02-09 10:18:37,519 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_8448117986822173955 is added to invalidSet of 10.0.40.33:50010

2) 2012-02-09 10:18:41,788 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_132544693472320409_2778 on 10.0.40.12:50010 size 67108864 But it does not belong to any file.
2012-02-09 10:18:41,789 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 12123, call addBlock(/user/mark/output33/_temporary/_attempt_201202090811_0005_m_000247_0/part-00247, DFSClient_attempt_201202090811_0005_m_000247_0) from 10.0.40.12:34103: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/mark/output33/_temporary/_attempt_201202090811_0005_m_000247_0/part-00247 File does not exist. Holder DFSClient_attempt_201202090811_0005_m_000247_0 does not have any open files.
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/mark/output33/_temporary/_attempt_201202090811_0005_m_000247_0/part-00247 File does not exist. Holder DFSClient_attempt_201202090811_0005_m_000247_0 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1332)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1323)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1251)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)

Any other ways to debug this?

Thanks,
Mark
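A sketch of the grep Uma asks for, run on the namenode host with the default log layout; the audit log file name depends on your log4j settings and is only present if audit logging is enabled:

# any requests (create/delete/rename) touching the temporary attempt file?
grep "_attempt_201202090811_0005_m_000247_0/part-00247" $HADOOP_HOME/logs/hadoop-*-namenode-*.log

# if HDFS audit logging is enabled, delete operations show up as cmd=delete
grep "cmd=delete" $HADOOP_HOME/logs/hdfs-audit.log | grep output33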
RE: Unable to Load Native-Hadoop Library for Your Platform
It looks like you are not using any compression in your code. Hadoop has some native libraries to load, mainly for the compression codecs. When you want to use those compression techniques, you need to compile with the compile.native option enabled and also set the Java library path. If you are not using any such stuff, then you need not worry about that warning. Please look at the link below for more information.

http://hadoop.apache.org/common/docs/current/native_libraries.html

Regards,
Uma

From: Bing Li [lbl...@gmail.com]
Sent: Tuesday, February 07, 2012 3:08 PM
To: common-user@hadoop.apache.org
Subject: Unable to Load Native-Hadoop Library for Your Platform

Dear all,

I got an error when running a simple Java program on Hadoop. The program just merges some local files into one and puts it on Hadoop. The code is as follows.

Configuration conf = new Configuration();
try {
    FileSystem hdfs = FileSystem.get(conf);
    FileSystem local = FileSystem.getLocal(conf);
    Path inputDir = new Path("/home/libing/Temp/");
    Path hdfsFile = new Path("/tmp/user/libing/example.txt");
    try {
        FileStatus[] inputFiles = local.listStatus(inputDir);
        FSDataOutputStream out = hdfs.create(hdfsFile);
        for (int i = 0; i < inputFiles.length; i++) {
            System.out.println(inputFiles[i].getPath().getName());
            FSDataInputStream in = local.open(inputFiles[i].getPath());
            byte buffer[] = new byte[256];
            int bytesRead = 0;
            while ((bytesRead = in.read(buffer)) > 0) {
                out.write(buffer, 0, bytesRead);
            }
            in.close();
        }
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
} catch (IOException e) {
    e.printStackTrace();
}

I run it with ant and got the following warning. BTW, all the relevant jar packages from Hadoop are specified in the build.xml.

[java] 2012-2-7 17:16:18 org.apache.hadoop.util.NativeCodeLoader <clinit>
[java] Warning: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The program gives a correct result, but I cannot figure out what the above problem is.

Thanks so much!

Bing
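If the native codecs are ever actually needed, one way to make the warning go away for a standalone client like the one above, assuming the stock tarball layout where prebuilt native libraries sit under lib/native, is to put them on java.library.path (the class and jar names are placeholders); launching through bin/hadoop sets this path automatically:

# adjust the platform directory to your build (e.g. Linux-i386-32)
java -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64 -cp <your-classpath> MergeFiles

# or simply run through the hadoop launcher
bin/hadoop jar merge.jar MergeFiles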
RE: out of memory running examples
What Java heap space have you configured for the property mapred.child.java.opts?

From: Tim Broberg [tim.brob...@exar.com]
Sent: Tuesday, February 07, 2012 3:20 PM
To: common-user@hadoop.apache.org
Subject: out of memory running examples

I'm trying to run the basic example from hadoop/hadoop-1.0.0/docs/single_node_setup.html. I'm getting java.lang.OutOfMemoryError's when I run the grep example from that page. Stackoverflow suggests various tweaks to the command line, mapred-site.xml, or hadoop-env.sh, none of which seem to be helping in my case. When I tweak hadoop-env.sh to echo text to a file, that file doesn't show up, which suggests that hadoop-env.sh isn't even getting executed. Any hints on debugging this?

- Tim.

[tbroberg@san-mothra hadoop-1.0.0]$ bin/hadoop jar hadoop-examples-1.0.0.jar grep input output 'dfs[a-z.]+'
Warning: $HADOOP_HOME is deprecated.
12/02/07 01:39:35 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/02/07 01:39:35 INFO mapred.FileInputFormat: Total input paths to process : 7
12/02/07 01:39:35 INFO mapred.JobClient: Running job: job_local_0001
12/02/07 01:39:35 INFO util.ProcessTree: setsid exited with exit code 0
12/02/07 01:39:35 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4c349471
12/02/07 01:39:35 INFO mapred.MapTask: numReduceTasks: 1
12/02/07 01:39:35 INFO mapred.MapTask: io.sort.mb = 100
12/02/07 01:39:35 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/02/07 01:39:36 INFO mapred.JobClient: map 0% reduce 0%
12/02/07 01:39:36 INFO mapred.JobClient: Job complete: job_local_0001
12/02/07 01:39:36 INFO mapred.JobClient: Counters: 0
12/02/07 01:39:36 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.hadoop.examples.Grep.run(Grep.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Grep.main(Grep.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[tbroberg@san-mothra hadoop-1.0.0]$
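For reference, the property Uma asks about would be set in mapred-site.xml like this (512 MB is only an illustrative value). Note that the run above uses the LocalJobRunner, where map tasks execute inside the client JVM, so the client heap (e.g. HADOOP_HEAPSIZE in conf/hadoop-env.sh) matters as well:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>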
RE: reducers output
This looks to be an HDFS-specific question; please send it to the correct mailing list. CC'ing mapreduce-user in case you have not registered for the hdfs list. Please look at the previous discussion in the mailing list about your question:
http://lucene.472066.n3.nabble.com/How-HDFS-decides-where-to-put-the-block-td2834463.html

Regards,
Uma

From: Alieh Saeedi [aliehsae...@yahoo.com]
Sent: Saturday, February 04, 2012 1:16 PM
To: mapreduce-user@hadoop.apache.org
Subject: reducers output

Hi
1- How does Hadoop decide where to save file blocks (I mean all files, including files written by reducers)? Could you please give me a reference link?
RE: reducers output
Sorry, ignore my previous answer; I was just thinking about normal HDFS files :-). In your case, the output locations will be decided by your job itself. Sorry for the confusion.

From: Uma Maheswara Rao G [mahesw...@huawei.com]
Sent: Saturday, February 04, 2012 5:46 PM
To: hdfs-u...@hadoop.apache.org
Cc: mapreduce-user@hadoop.apache.org
Subject: RE: reducers output

This looks to be an HDFS-specific question; please send it to the correct mailing list. CC'ing mapreduce-user in case you have not registered for the hdfs list. Please look at the previous discussion in the mailing list about your question:
http://lucene.472066.n3.nabble.com/How-HDFS-decides-where-to-put-the-block-td2834463.html

Regards,
Uma

From: Alieh Saeedi [aliehsae...@yahoo.com]
Sent: Saturday, February 04, 2012 1:16 PM
To: mapreduce-user@hadoop.apache.org
Subject: reducers output

Hi
1- How does Hadoop decide where to save file blocks (I mean all files, including files written by reducers)? Could you please give me a reference link?
RE: ERROR namenode.NameNode: java.io.IOException: Cannot remove current directory: /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current
Can you try deleting this directory manually? Also please check whether another process is already running with this directory configured.

Regards,
Uma

From: Vijayakumar Ramdoss [nellaivi...@gmail.com]
Sent: Thursday, February 02, 2012 1:27 AM
To: common-user@hadoop.apache.org
Subject: ERROR namenode.NameNode: java.io.IOException: Cannot remove current directory: /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current

Hi All,

I am trying to start the Namenode on my machine and it is throwing the error message:
ERROR namenode.NameNode: java.io.IOException: Cannot remove current directory: /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current

Please refer to the log information here:

vijayram@ubuntu:/etc$ hadoop namenode -format
12/02/01 14:07:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2-cdh3u3
STARTUP_MSG: build = file:///data/1/tmp/nightly_2012-01-26_09-40-25_3/hadoop-0.20-0.20.2+923.194-1~squeeze -r 03b655719d13929bd68bb2c2f9cee615b389cea9; compiled by 'root' on Thu Jan 26 11:54:44 PST 2012
************************************************************/
Re-format filesystem in /var/lib/hadoop-0.20/cache/hadoop/dfs/name ? (Y or N) Y
12/02/01 14:08:10 INFO util.GSet: VM type = 64-bit
12/02/01 14:08:10 INFO util.GSet: 2% max memory = 17.77875 MB
12/02/01 14:08:10 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/02/01 14:08:10 INFO util.GSet: recommended=2097152, actual=2097152
12/02/01 14:08:10 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
12/02/01 14:08:10 INFO namenode.FSNamesystem: fsOwner=vijayram (auth:SIMPLE)
12/02/01 14:08:10 INFO namenode.FSNamesystem: supergroup=supergroup
12/02/01 14:08:10 INFO namenode.FSNamesystem: isPermissionEnabled=false
12/02/01 14:08:10 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=1000
12/02/01 14:08:10 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/02/01 14:08:10 ERROR namenode.NameNode: java.io.IOException: Cannot remove current directory: /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1246)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1265)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1127)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1244)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1260)

Thanks and Regards
Vijay
nellaivi...@gmail.com
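A sketch of the manual check and cleanup Uma suggests, using the path from the error; the format usually fails like this when the directory is owned by a different user (for example a packaged hadoop/hdfs user) or is still held by a running process:

# is another NameNode (or other hadoop process) still using the dir?
ps -ef | grep -i namenode

# who owns the name dir?
ls -l /var/lib/hadoop-0.20/cache/hadoop/dfs/name

# remove it manually (or chown it to the user running the format), then re-format
sudo rm -rf /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current
hadoop namenode -format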
RE: namenode grows overtime
Can you please check in the UI what the heap usage is? Then we can confirm whether the Java heap is growing or not. top will count native memory usage as well, and NIO uses direct ByteBuffers internally. This is a good write-up from Jonathan:
https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357

Regards,
Uma

From: felix gao [gre1...@gmail.com]
Sent: Friday, January 20, 2012 6:42 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: namenode grows overtime

Koji,

There are no Java options specified other than -Xmx24g. What are some of the recommended options for the namenode?

Thanks,
Felix

On Wed, Jan 18, 2012 at 3:39 PM, Koji Noguchi <knogu...@yahoo-inc.com> wrote:

Hi Felix,

Taking jmap -histo:live <pid> would tell you what's occupying the heap. Are you using UseConcMarkSweepGC? If yes, and if you see a bunch of java.net.SocksSocketImpl / sun.nio.ch.SocketChannelImpl in the jmap histo outputs, try passing -XX:-CMSConcurrentMTEnabled.

Background: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118

Koji

On 1/18/12 3:23 PM, felix gao <gre1...@gmail.com> wrote:

Hi guys,

We are running Hadoop 0.20.2+228, and the namenode process's memory grows over time to occupy over 18GB. However, if I restart the namenode, it only occupies about 10GB once it is stable. I am wondering if there is any way to figure out what is going on with the namenode that causes it to grow very rapidly, and if there are any tools to make the namenode print out some useful information on what is holding onto that memory.

Thanks,
Felix
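A sketch of the diagnostics discussed in this thread; the pid is the namenode process id, and the hadoop-env.sh line only illustrates where the existing -Xmx24g and Koji's CMS workaround flag would typically be combined (treat the exact option set as an assumption for your environment):

# what is actually occupying the Java heap?
jmap -histo:live <namenode-pid> | head -30

# conf/hadoop-env.sh (sketch)
export HADOOP_NAMENODE_OPTS="-Xmx24g -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled $HADOOP_NAMENODE_OPTS"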
RE: Data processing in DFSClient
Hi Sesha,

Take a look at org.apache.hadoop.hdfs.server.datanode.BlockSender.java.

Regards,
Uma

From: Sesha Kumar [sesha...@gmail.com]
Sent: Monday, January 16, 2012 7:50 PM
To: hdfs-user@hadoop.apache.org
Subject: Data processing in DFSClient

Hey guys,

Sorry for the typo in my last message; I have corrected it. I would like to perform some additional processing on the data which is streamed to the DFSClient. To my knowledge the class DFSInputStream manages the stream operations on the client side whenever a file is being read, but I don't know which class should be modified to add this additional processing capability to the data node. Please clarify.

Thanks in advance
RE: HDFS Problems After Programmatic Access
It looks like the namenode is not returning any nodes. Can you please check whether the Datanode is running properly?

From: Apurv Verma [dapu...@gmail.com]
Sent: Wednesday, January 11, 2012 8:15 AM
To: hdfs-user@hadoop.apache.org
Subject: HDFS Problems After Programmatic Access

When I run this command, I get an error message. This problem first started when I accessed HDFS programmatically. I am on a single node cluster in ubuntu.

hadoop dfs -copyFromLocal /home/apurv/bigdataLocal/macbeth.txt /home/apurv/bigdataRemote/

12/01/11 08:07:49 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/apurv/bigdataRemote could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1030)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
12/01/11 08:07:49 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/01/11 08:07:49 WARN hdfs.DFSClient: Could not get block locations. Source file /home/apurv/bigdataRemote - Aborting...
copyFromLocal: java.io.IOException: File /home/apurv/bigdataRemote could only be replicated to 0 nodes, instead of 1 12/01/11 08:07:49 ERROR hdfs.DFSClient: Exception closing file /home/apurv/bigdataRemote : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/apurv/bigdataRemote could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377) org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/apurv/bigdataRemote could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379) at
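A quick sketch of the check Uma suggests; "could only be replicated to 0 nodes" generally means the namenode currently sees no usable live datanodes:

jps                          # is a DataNode process running at all?
hadoop dfsadmin -report      # does the NameNode list any live datanodes?
tail -100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log   # if not, the DN log usually says why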
RE: Timeouts in Datanodes while block scanning
Hi Aaron,

Presently I am on the 0.20.2 version. I debugged the problem for some time but could not find any clue. I wanted to know whether any of the devs/users have faced this situation in their clusters.

Regards,
Uma

From: Aaron T. Myers [a...@cloudera.com]
Sent: Thursday, January 05, 2012 11:36 PM
To: hdfs-...@hadoop.apache.org
Subject: Re: Timeouts in Datanodes while block scanning

What version of HDFS? This question might be more appropriate for hdfs-user@.

--
Aaron T. Myers
Software Engineer, Cloudera

On Thu, Jan 5, 2012 at 8:59 AM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:

Hi,

I have a 10-node cluster that has been running for the last 25 days (running with an HBase cluster). Recently I observed that for every continuous run of block scans, there are many timeouts in the DataNode. After the block scan verifications, reads succeed again. This situation now keeps occurring many times, for every continuous block scan. Here HBase is continuously performing many random reads. Has anyone faced this situation in your clusters?

Below are the logs with the timeouts:

2011-12-28 11:30:42,618 INFO DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52764, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251633953_187190
2011-12-28 11:30:42,621 INFO DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52772, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635735_188342
2011-12-28 11:30:42,641 INFO DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52796, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251634096_187277
2011-12-28 11:30:42,889 INFO DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52732, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635763_188363
2011-12-28 11:30:42,889 INFO DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52637, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251634921_187798
2011-12-28 11:30:42,976 INFO DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52755, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635359_188075
2011-12-28 11:30:57,757 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251602823_167208
2011-12-28 11:32:15,757 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251599175_166755
2011-12-28 11:32:54,561 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251673745_194676
2011-12-28 11:33:33,561 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251640709_189383
2011-12-28 11:34:12,557 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251649630_190779
2011-12-28 11:34:51,557 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251463964_91885
2011-12-28 11:35:23,958 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251636310_188845
2011-12-28 11:36:01,155 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1322486683238_54999
2011-12-28 11:36:04,157 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251678959_195786
2011-12-28 11:36:43,157 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251641803_189561
2011-12-28 11:37:20,357 INFO datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1322486706170_66445
2011-12-28 11:37:44,759 INFO datanode.DataBlockScanner
RE: Hadoop configuration
Hey Humayun,

It looks like your hostname is still not resolving properly. Even though you configured the hostnames as master, slave, etc., it is picking up "humayun" as the hostname. Just edit the /etc/HOSTNAME file with the correct hostname you are expecting here. To confirm whether it resolves properly or not, you can do the steps below:

#hostname
(should print the hostname correctly, e.g. master)
#hostname -i
(should resolve to the correct IP, e.g. the master IP)

And make sure slave and slave1 are pingable from each other.

Regards,
Uma

From: Humayun kabir [humayun0...@gmail.com]
Sent: Saturday, December 24, 2011 9:51 PM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop configuration

I've checked my log files, but I don't understand why this error occurs. Here are my log files. Please give me some suggestions.

jobtracker.log http://paste.ubuntu.com/781181/
namenode.log http://paste.ubuntu.com/781183/
datanode.log (1st machine) http://paste.ubuntu.com/781176/
datanode.log (2nd machine) http://paste.ubuntu.com/781195/
tasktracker.log (1st machine) http://paste.ubuntu.com/781192/
tasktracker.log (2nd machine) http://paste.ubuntu.com/781197/

On 24 December 2011 15:26, Joey Krabacher <jkrabac...@gmail.com> wrote:

have you checked your log files for any clues?

--Joey

On Sat, Dec 24, 2011 at 3:15 AM, Humayun kabir <humayun0...@gmail.com> wrote:

Hi Uma,

Thank you very much for your tips. We tried it on 3 nodes in VirtualBox as you suggested, but we are still facing the problem. Here are all our configuration files for all nodes. Please take a look and show us some way to solve it. It would be great if you could help us in this regard.

core-site.xml http://pastebin.com/Twn5edrp
hdfs-site.xml http://pastebin.com/k4hR4GE9
mapred-site.xml http://pastebin.com/gZuyHswS
/etc/hosts http://pastebin.com/5s0yhgnj
output http://paste.ubuntu.com/780807/

Hope you will understand and extend your helping hand towards us. Have a nice day.

Regards
Humayun

On 23 December 2011 17:31, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:

Hi Humayun,

Let's assume you have JT, TT1, TT2, TT3. Now you should configure /etc/hosts like the example below:
10.18.xx.1 JT
10.18.xx.2 TT1
10.18.xx.3 TT2
10.18.xx.4 TT3

Configure the same set in all the machines, so that all tasktrackers can talk to each other with hostnames correctly. Also please remove these entries from your files:
127.0.0.1 localhost.localdomain localhost
127.0.1.1 humayun

I have seen that others have already suggested many links for the regular configuration items; I hope you are clear about them.

Hope it will help...

Regards,
Uma

From: Humayun kabir [humayun0...@gmail.com]
Sent: Thursday, December 22, 2011 10:34 PM
To: common-user@hadoop.apache.org; Uma Maheswara Rao G
Subject: Re: Hadoop configuration

Hello Uma,

Thanks for your cordial and quick reply. It would be great if you explained what you suggested to do. Right now we are running with the following configuration. We are using hadoop on VirtualBox. When it is a single node it works fine for a big dataset larger than the default block size, but in the case of a multinode cluster (2 nodes) we are facing some problems. We are able to ping both Master-Slave and Slave-Master. When the input dataset is smaller than the default block size (64 MB) it works fine, but when the input dataset is larger than the default block size it shows 'too much fetch failure' in the reduce state.

here is the output link: http://paste.ubuntu.com/707517/

this is our /etc/hosts file:
192.168.60.147 humayun # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 humayun localhost6.localdomain6 localhost6
127.0.1.1 humayun
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
192.168.60.1 master
192.168.60.2 slave

Regards,
-Humayun.

On 22 December 2011 15:47, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:

Hey Humayun,

To solve the too many fetch failures problem, you should configure the host mapping correctly. Each tasktracker should be able to ping the others.

Regards,
Uma

From: Humayun kabir [humayun0...@gmail.com]
Sent: Thursday, December 22, 2011 2:54 PM
To: common-user@hadoop.apache.org
Subject
RE: Secondary Namenode on hadoop 0.20.205 ?
Hey Praveenesh, You can start the secondary namenode by itself with the command ./hadoop secondarynamenode. A DN cannot act as a secondary namenode: the basic job of the secondary namenode is to do checkpointing and keep the edits in sync with the Namenode up to the last checkpoint period, while the DN's job is to store the actual data blocks physically. You also need to configure the correct namenode HTTP address for the secondary NN, so that it can connect to the NN for checkpointing operations. http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#Secondary+NameNode You can configure the secondary node's IP in the masters file; start-dfs.sh itself will then start the SNN automatically, just as it starts the DN and NN. Also see http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/ Regards, Uma From: praveenesh kumar [praveen...@gmail.com] Sent: Monday, December 26, 2011 5:05 PM To: common-user@hadoop.apache.org Subject: Secondary Namenode on hadoop 0.20.205 ? Hey people, How can we set up another machine in the cluster as the Secondary Namenode in hadoop 0.20.205? Can a DN also act as the SNN, and what are the pros and cons of that configuration? Thanks, Praveenesh
RE: Hadoop configuration
Hi Humayun , Lets assume you have JT, TT1, TT2, TT3 Now you should configure the \etc\hosts like below examle 10.18.xx.1 JT 10.18.xx.2 TT1 10.18.xx.3 TT2 10.18.xx.4 TT3 Configure the same set in all the machines, so that all task trackers can talk each other with hostnames correctly. Also pls remove some entries from your files 127.0.0.1 localhost.localdomain localhost 127.0.1.1 humayun I have seen others already suggested many links for the regular configuration items. Hope you might clear about them. hope it will help... Regards, Uma From: Humayun kabir [humayun0...@gmail.com] Sent: Thursday, December 22, 2011 10:34 PM To: common-user@hadoop.apache.org; Uma Maheswara Rao G Subject: Re: Hadoop configuration Hello Uma, Thanks for your cordial and quick reply. It would be great if you explain what you suggested to do. Right now we are running on following configuration. We are using hadoop on virtual box. when it is a single node then it works fine for big dataset larger than the default block size. but in case of multinode cluster (2 nodes) we are facing some problems. We are able to ping both Master-Slave and Slave-Master. Like when the input dataset is smaller than the default block size(64 MB) then it works fine. but when the input dataset is larger than the default block size then it shows ‘too much fetch failure’ in reduce state. here is the output link http://paste.ubuntu.com/707517/ this is our /etc/hosts file 192.168.60.147 humayun # Added by NetworkManager 127.0.0.1 localhost.localdomain localhost ::1 humayun localhost6.localdomain6 localhost6 127.0.1.1 humayun # The following lines are desirable for IPv6 capable hosts ::1 localhost ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts 192.168.60.1 master 192.168.60.2 slave Regards, -Humayun. On 22 December 2011 15:47, Uma Maheswara Rao G mahesw...@huawei.commailto:mahesw...@huawei.com wrote: Hey Humayun, To solve the too many fetch failures problem, you should configure host mapping correctly. Each tasktracker should be able to ping from each other. Regards, Uma From: Humayun kabir [humayun0...@gmail.commailto:humayun0...@gmail.com] Sent: Thursday, December 22, 2011 2:54 PM To: common-user@hadoop.apache.orgmailto:common-user@hadoop.apache.org Subject: Hadoop configuration someone please help me to configure hadoop such as core-site.xml, hdfs-site.xml, mapred-site.xml etc. please provide some example. it is badly needed. because i run in a 2 node cluster. when i run the wordcount example then it gives the result too mutch fetch failure.
RE: could not sink exception
At what load are you running the cluster? It looks like the NN was not able to choose targets for the block. When choosing targets, the NN checks many conditions, such as the transfer thread count in the DNs and the space available on each node, etc. If the NN finds a node in such a state, it will not choose it as a target. Those decisions are logged at debug level, so if you are getting this error frequently, you can enable the debug log and check the reason. Regards, Uma From: Jayaseelan E [jayaseela...@ericsson.com] Sent: Thursday, December 22, 2011 11:58 AM To: Harsh J; hdfs-user@hadoop.apache.org Subject: REG:could not sink exception Hi When we tried to run the hadoop flows we got the following exception for all the tasks and they failed. The error is a little bit confusing: I have two data nodes, both are healthy and the nodes are up and running. com.ericsson.cim.base.core.exception.CimProcessingException: Could not sink. Got an IOException with message: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /data/processing/events/usage/Processed_at_20111215100837/_temporary/_attempt_201112150454_0036_m_00_0/ServiceUsageChargedAccBalCS4_20111001-m-0 could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1282) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469) at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962) Thanks indeed jayaseelan
RE: how read a file in HDFS?
Yes, you can use the utility methods from IOUtils, e.g.: FileOutputStream fo = new FileOutputStream(file); IOUtils.copyBytes(fs.open(fileName), fo, 1024, true); here fs is the DFS FileSystem instance. The other option is to make use of the FileSystem APIs, e.g.: FileSystem fs = new DistributedFileSystem(); fs.initialize(new URI(namenode_uri), conf); fs.copyToLocalFile(new Path(SRC_PATH), new Path(DST_PATH)); here the source path is a DFS path and the destination path is a local filesystem path. Hope it helps. Regards, Uma From: Pedro Costa [psdc1...@gmail.com] Sent: Friday, December 16, 2011 9:27 PM To: mapreduce-user Subject: Fwd: how read a file in HDFS? Hi, I want to read a file that is 100MB in size and is in HDFS. How should I do it? Is it with IOUtils.readFully? Can anyone give me an example? -- Thanks,
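For reference, a self-contained sketch of the two approaches described above; the namenode URI and the file paths are placeholders, not values taken from the original thread.

import java.io.FileOutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // placeholder namenode URI; use the fs.default.name value from your core-site.xml
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:9000"), conf);

        // Option 1: stream the HDFS file into a local file with IOUtils
        FileOutputStream out = new FileOutputStream("/tmp/localcopy");
        IOUtils.copyBytes(fs.open(new Path("/user/pedro/bigfile")), out, 4096, true); // true closes both streams

        // Option 2: let the FileSystem API do the whole copy in one call
        fs.copyToLocalFile(new Path("/user/pedro/bigfile"), new Path("/tmp/localcopy2"));
        fs.close();
    }
}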
RE: Hadoop startup error - Mac OS - JDK
Some workaround available in https://issues.apache.org/jira/browse/HADOOP-7489 try adding that options in hadoop-env.sh. Regards, Uma From: Idris Ali [psychid...@gmail.com] Sent: Friday, December 16, 2011 8:16 PM To: common-user@hadoop.apache.org; oozie-us...@incubator.apache.org Subject: Hadoop startup error - Mac OS - JDK Hi, I am getting the below error since my last java update, I am using Mac OS 10.7.2 and CDH3u0. I tried with open JDK 1.7 and sun JDK 1.6._29. I use to run oozie with Hadoop till the last update. Any help is appreciated. I can send the details of core-site.xml as well. Thanks, -Idris 2011-12-16 20:07:50.009 java[92609:1d03] Unable to load realm mapping info from SCDynamicStore 2011-12-16 20:08:09.245 java[92609:1107] Unable to load realm mapping info from SCDynamicStore 2011-12-16 20:08:09.246 java[92609:1107] *** Terminating app due to uncaught exception 'JavaNativeException', reason: 'KrbException: Could not load configuration from SCDynamicStore' *** First throw call stack: ( 0 CoreFoundation 0x7fff98147286 __exceptionPreprocess + 198 1 libobjc.A.dylib 0x7fff98519d5e objc_exception_throw + 43 2 CoreFoundation 0x7fff981d14c9 -[NSException raise] + 9 3 JavaNativeFoundation0x000106dacc47 JNFCallStaticVoidMethod + 213 4 libosx.dylib0x000107b0c184 ___SCDynamicStoreCallBack_block_invoke_1 + 24 5 JavaNativeFoundation0x000106daf18a JNFPerformEnvBlock + 86 6 SystemConfiguration 0x7fff8e4d1d50 rlsPerform + 119 7 CoreFoundation 0x7fff980b5b51 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17 8 CoreFoundation 0x7fff980b53bd __CFRunLoopDoSources0 + 253 9 CoreFoundation 0x7fff980dc1a9 __CFRunLoopRun + 905 10 CoreFoundation 0x7fff980dbae6 CFRunLoopRunSpecific + 230 11 java0x000101d83eb1 CreateExecutionEnvironment + 841 12 java0x000101d7fecd JLI_Launch + 1933 13 java0x000101d85c2d main + 108 14 java0x000101d7f738 start + 52 15 ??? 0x0014 0x0 + 20 ) terminate called throwing an exception
RE: Problem Hadoop
The property should be set on the Namenode side. Please also check your classpath: verify that the conf directory where you updated the property is really the one being picked up. From: Andrea Valentini Albanelli [andrea.valent...@pg.infn.it] Sent: Wednesday, December 14, 2011 8:57 PM To: hdfs-user@hadoop.apache.org Subject: Problem Hadoop Hello, when I try to do bin/hadoop fs -put conf input I obtain: put: org.apache.hadoop.security.AccessControlException: Permission denied: user=santoni, access=WRITE, inode=user:root:supergroup:rwxr-xr-x What's that? I've tried to add the following entry to conf/hdfs-site.xml: <property> <name>dfs.permissions</name> <value>false</value> </property> But nothing. I've tried to format the namenode too with $ bin/hadoop namenode -format but nothing. Thank you for help, Andrea
RE: HDFS Backup nodes
AFAIK, the backup node was introduced from the 0.21 release onwards, so 0.20.205 does not have it. From: praveenesh kumar [praveen...@gmail.com] Sent: Wednesday, December 07, 2011 12:40 PM To: common-user@hadoop.apache.org Subject: HDFS Backup nodes Does hadoop 0.20.205 support configuring HDFS backup nodes ? Thanks, Praveenesh
RE: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second
It looks like you are hitting HDFS-2553. The cause might be that you cleared the data directories directly without restarting the DN. The workaround would be to restart the DNs. Regards, Uma From: Stephen Boesch [java...@gmail.com] Sent: Tuesday, November 29, 2011 8:53 PM To: mapreduce-user@hadoop.apache.org Subject: Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second Update on this: I've shut down all the servers multiple times. Also cleared the data directories and reformatted the namenode. Restarted it and the same results: 100% cpu and millions of these calls to isBPServiceAlive. 2011/11/29 Stephen Boesch java...@gmail.com I am just trying to get off the ground with MRv2. The first node (in pseudo-distributed mode) is working fine - ran a couple of TeraSorts on it. The second node has a serious issue with its single DataNode: it consumes 100% of one of the CPUs. Looking at it through JVisualVM, there are over 8 million invocations of isBPServiceAlive in a matter of a minute or so, continually incrementing at a steady clip. A screenshot of the JVisualVM cpu profile, showing just shy of 8M invocations, is attached. What kind of configuration error could lead to this? The conf/masters and conf/slaves simply say localhost. If need be I'll copy the *-site.xml's. They are boilerplate from the Cloudera page by Ahmed Radwan.
RE: blockID generation
You can find the code directly in FSNamesystem#allocateBlock. The block ID is just a random long, and the NN ensures that the generated ID has not already been created. Regards, Uma From: kartheek muthyala [kartheek0...@gmail.com] Sent: Tuesday, November 29, 2011 6:07 PM To: hdfs-...@hadoop.apache.org; hdfs-user Subject: blockID generation Hi all, I am interested in exploring how the blockID is generated in the hadoop world by the namenode. Any pointers to the class/method which takes care of this generation? Thanks in advance, ~Kartheek.
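As an illustration of the idea only (this is not the Hadoop source), allocation boils down to drawing random longs until one is found that the NN does not already know about; the set below is a stand-in for the NN's blocks map.

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class BlockIdSketch {
    private static final Random RAND = new Random();
    private final Set<Long> knownBlockIds = new HashSet<Long>(); // stand-in for the NN's blocks map

    long allocateBlockId() {
        long id = RAND.nextLong();
        while (knownBlockIds.contains(id)) {   // retry until the id is not already in use
            id = RAND.nextLong();
        }
        knownBlockIds.add(id);
        return id;
    }
}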
RE: Problem running Hadoop 0.23.0
Please recheck your cluster once to confirm that all the places have the same version of the jars. It looks like the RPC client and server are different versions. From: Nitin Khandelwal [nitin.khandel...@germinait.com] Sent: Monday, November 28, 2011 5:32 PM To: common-user@hadoop.apache.org Subject: Problem running Hadoop 0.23.0 Hi, I was trying to set up Hadoop 0.23.0 with the help of http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/SingleCluster.html. After starting the resourcemanager and nodemanager, I get the following error when I try to hit the Hadoop UI: org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version 5 cannot communicate with client version 47. There is no significant error in the Hadoop logs (it shows everything started successfully). Do you have any idea about this error? Thanks, -- Nitin Khandelwal
RE: hdfs behavior
I think you might not have completed even a single block write yet. The length is updated in the NN only after a block is completed; currently, partial block lengths are not included in the length calculation, which is why -ls still shows 0. Regards, Uma From: Inder Pall [inder.p...@gmail.com] Sent: Monday, November 28, 2011 5:13 PM To: hdfs-user@hadoop.apache.org Subject: hdfs behavior People, I am seeing the following - 1. writing to a large file on HDFS 2. tail -f on the same file shows data is streaming. 3. hadoop dfs -ls on the same file shows size as 0. Has anyone experienced this? -- Inder
RE: how to find data nodes on which a file is distributed to?
From the Java API, FileSystem#getFileBlockLocations should give you the block locations. Regards, Uma From: Praveen Sripati [praveensrip...@gmail.com] Sent: Monday, November 28, 2011 10:01 PM To: hdfs-user@hadoop.apache.org Subject: Re: how to find data nodes on which a file is distributed to? Go to the NameNode web UI (default port is 50070), select 'Browse the filesystem' and drill down to the file. At the bottom of the page the block report is shown. Or else 'hadoop fsck / -files -blocks -locations' from the CLI will also give the block report for all the files in HDFS. Thanks, Praveen On Mon, Nov 28, 2011 at 9:29 PM, CB cbalw...@gmail.com wrote: Hi, I am new to HDFS. I read HDFS documents on the internet but I couldn't figure out the following. Is there a way to find the list of data nodes a file is distributed to when I execute a command such as hadoop dfs -copyFromLocal /tmp/testdata /user/chansup/test Thanks, - Chansup
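A small sketch of the API route mentioned above; the file path is only an example, and BlockLocation#getNames() can be used instead of getHosts() if host:port pairs are wanted.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WhereAreMyBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();              // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/chansup/test/testdata");   // example path from the question
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] locs = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation loc : locs) {
            System.out.println("offset=" + loc.getOffset()
                    + " length=" + loc.getLength()
                    + " hosts=" + Arrays.toString(loc.getHosts()));
        }
        fs.close();
    }
}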
RE: Clarification on federated HDFS
Hey Sesha, In federated HDFS, the same DataNodes can serve multiple NameNodes, whereas in your setup each cluster is completely separate. I would suggest you take a look at HDFS-2471; Suresh has explained it very neatly and briefly there. Regards, Uma From: Sesha Kumar [sesha...@gmail.com] Sent: Monday, November 28, 2011 9:05 AM To: hdfs-user@hadoop.apache.org Subject: Clarification on federated HDFS Hi guys, Is Federated HDFS the same as having a set of individual Hadoop clusters, each managing its own namespace and knowing nothing about the existence of the other clusters (but having a separate set of data nodes instead of a common pool)? Assume we have something similar to ViewFS which provides a single global namespace combining the namespaces of each namenode, and it also provides some facility to add new namenodes to the setup. What are the differences between Federated HDFS and the setup given above?
RE: How to delete files older than X days in HDFS/Hadoop
AFAIK, there is no facility like this in the HDFS command line. One option is to write a small client program that collects the files under the root directory matching your condition and invokes delete on them; a sketch of such a program follows this message. Regards, Uma From: Raimon Bosch [raimon.bo...@gmail.com] Sent: Saturday, November 26, 2011 8:31 PM To: common-user@hadoop.apache.org Subject: How to delete files older than X days in HDFS/Hadoop Hi, I'm wondering how to delete files older than X days with HDFS/Hadoop. On Linux we can do it with the following command: find ~/datafolder/* -mtime +7 -exec rm {} \; Any ideas?
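A minimal sketch of such a client program, assuming a flat directory (no recursion) and an age given in days; the class name and argument layout are made up for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOlderThan {
    // usage: hadoop DeleteOlderThan <hdfs-dir> <days>
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        long cutoff = System.currentTimeMillis() - Long.parseLong(args[1]) * 24L * 60 * 60 * 1000;
        for (FileStatus st : fs.listStatus(new Path(args[0]))) {
            if (!st.isDir() && st.getModificationTime() < cutoff) {
                System.out.println("deleting " + st.getPath());
                fs.delete(st.getPath(), false);   // pass true instead to delete directories recursively
            }
        }
        fs.close();
    }
}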
RE: What does hdfs balancer do after adding more disks to existing datanode.
Hi, The current volume-choosing policy is round robin. Since the DN got new disks, the balancer will move some blocks onto this node, but the volume choosing will still be round robin when placing each block. AFAIK, it won't do any special balancing between disks within the same node. Please correct me if I understood your question wrongly. Regards, Uma From: Ajit Ratnaparkhi [ajit.ratnapar...@gmail.com] Sent: Tuesday, November 22, 2011 5:13 PM To: hdfs-user@hadoop.apache.org; hdfs-...@hadoop.apache.org Subject: What does hdfs balancer do after adding more disks to existing datanode. Hi, If I add additional disks to an existing datanode (assume the existing datanode has 7 1TB disks which are already 80% full and then I add two new 2TB disks 0% full) and then run the balancer, does the balancer balance data within a datanode? i.e. will it move data from the existing disks to the newly added disks such that all disks are approximately equally full? thanks, Ajit.
RE: Regarding loading a big XML file to HDFS
Also, I am curious how you are writing the mapreduce application here; map and reduce work with key-value pairs. From: Uma Maheswara Rao G Sent: Tuesday, November 22, 2011 8:33 AM To: common-user@hadoop.apache.org; core-u...@hadoop.apache.org Subject: RE: Regarding loading a big XML file to HDFS From: hari708 [hari...@gmail.com] Sent: Tuesday, November 22, 2011 6:50 AM To: core-u...@hadoop.apache.org Subject: Regarding loading a big XML file to HDFS Hi, I have a big file consisting of XML data. The XML is not represented as a single line in the file. If we stream this file using the ./hadoop dfs -put command to a hadoop directory, how does the distribution happen? HDFS will divide the file into blocks based on the block size configured for the file. Basically, in my mapreduce program I am expecting a complete XML document as my input. I have a CustomReader (for XML) in my mapreduce job configuration. My main confusion is: if the namenode distributes data to DataNodes, there is a chance that one part of the XML goes to one data node and the other half goes to another datanode. If that is the case, will my custom XMLReader in the mapreduce be able to combine it (as mapreduce reads data locally only)? Please help me on this. If you cannot do anything in parallel here, make your input split size cover the complete file size, and also configure the block size to cover the complete file size. In this case only one mapper will be spawned for the file, but you won't get any parallel-processing advantage. (See the sketch after this message for an alternative way to keep each file in a single split.)
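One hedged way to get the one-mapper-per-file behaviour with the old API, instead of tuning split and block sizes by hand, is a non-splittable input format; the class name below is made up, and the record reading for XML would still come from your own CustomReader.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical example: every input file becomes exactly one split,
// so a single mapper sees the whole XML document.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}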
RE: source code of hadoop 0.20.2
http://svn.apache.org/repos/asf/hadoop/common/branches/ - the code for all branches is under this location; you can choose the one you need. Regards, Uma From: mohmmadanis moulavi [anis_moul...@yahoo.co.in] Sent: Tuesday, November 15, 2011 6:00 PM To: common-user@hadoop.apache.org Subject: source code of hadoop 0.20.2 Friends, where can I find the source code of the hadoop 0.20.2 version? I specifically want the source code of the jobtracker. I am using the hadoop which comes along with nutch-1.2. Regards, Mohmmadanis Moulavi
RE: setting up eclipse env for hadoop
Yes, you can follow that. mvn eclipse:eclipse will generate the Eclipse-related files; after that, import the projects directly into your Eclipse. Note: the repository links on that page need updating, since hdfs and mapreduce have been moved inside the common folder. Regards, Uma From: Amir Sanjar [v1san...@us.ibm.com] Sent: Monday, November 14, 2011 9:07 PM To: common-user@hadoop.apache.org Subject: setting up eclipse env for hadoop I am trying to build hadoop-trunk using eclipse, is this http://wiki.apache.org/hadoop/EclipseEnvironment the latest document? Best Regards Amir Sanjar Linux System Management Architect and Lead IBM Senior Software Engineer Phone# 512-286-8393 Fax# 512-838-8858
RE: setting up eclipse env for hadoop
You are right. From: Tim Broberg [tim.brob...@exar.com] Sent: Tuesday, November 15, 2011 1:02 AM To: common-user@hadoop.apache.org Subject: RE: setting up eclipse env for hadoop The ant steps for building the eclipse plugin are replaced by mvn eclipse:eclipse, for versions 0.23+, correct? From: Uma Maheswara Rao G [mahesw...@huawei.com] Sent: Monday, November 14, 2011 10:11 AM To: common-user@hadoop.apache.org Subject: RE: setting up eclipse env for hadoop Yes, you can follow that. mvn eclipse:eclipse will generate eclipse related files. After that directly import in your eclipse. note: Repository links need to update. hdfs and mapreduce are moved inside to common folder. Regatrds, Uma From: Amir Sanjar [v1san...@us.ibm.com] Sent: Monday, November 14, 2011 9:07 PM To: common-user@hadoop.apache.org Subject: setting up eclipse env for hadoop I am trying to build hadoop-trunk using eclipse, is this http://wiki.apache.org/hadoop/EclipseEnvironment the latest document? Best Regards Amir Sanjar Linux System Management Architect and Lead IBM Senior Software Engineer Phone# 512-286-8393 Fax# 512-838-8858 The information and any attached documents contained in this message may be confidential and/or legally privileged. The message is intended solely for the addressee(s). If you are not the intended recipient, you are hereby notified that any use, dissemination, or reproduction is strictly prohibited and may be unlawful. If you are not the intended recipient, please contact the sender immediately by return e-mail and destroy all copies of the original message.
Re: Issues with Distributed Caching
- Original Message - From: Arko Provo Mukherjee arkoprovomukher...@gmail.com Date: Tuesday, November 8, 2011 1:26 pm Subject: Issues with Distributed Caching To: mapreduce-user@hadoop.apache.org Hello, I am having the following problem with Distributed Caching. *In the driver class, I am doing the following: (/home/arko/MyProgram/datais a directory created as an output of another map-reduce)* *FileSystem fs = FileSystem.get(jobconf_seed); String init_path = /home/arko/MyProgram/data; System.out.println(Caching files in + init_path); FileStatus[] init_files = fs.listStatus(new Path(init_path)); for ( int i = 0; i init_files.length; i++ ) { Path p = init_files[i].getPath(); DistributedCache.addCacheFile ( p.toUri(), jobconf ); }* I am not clearly sure about this. But looking at this, if you do addCacheFile, it will set the files to mapred.cache.files. I think you are getting localCacheFiles ( it will try to get the value with ,apred.cache.localFiles) . Looks that value is coming as null. Please check whether you are setting that values correctly or not. This is executing fine. *I have the following code in the configure method of the Map class:* *public void configure(JobConf job) { try { fs = FileSystem.getLocal(new Configuration()); Path [] localFiles = DistributedCache.getLocalCacheFiles(job); for ( Path p:localFiles ) { BufferedReader file_reader = new BufferedReader(new InputStreamReader(fs.open(p))); String line = file_reader.readLine(); while ( line != null ) { // Do something with the data line = C0_file.readLine(); } } } catch (java.io.IOException e) { System.err.println(ERROR!! Cannot open filesystem from Map for reading!!);e.printStackTrace(); } }* This is giving me a java.lang.NullPointerException: 11/11/08 01:36:17 INFO mapred.JobClient: Task Id : attempt_201106271322_12775_m_03_1, Status : FAILED java.lang.NullPointerException at Map.configure(Map.java:57) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328) at org.apache.hadoop.mapred.Child.main(Child.java:155) I am doing it in a wrong way? I followed a lot of links and this seems to be the way to go about it. Please help! Thanks a lot in advance! Warm regards Arko
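A hedged sketch of the mapper-side half with a null guard added, under the assumption that the NPE comes from DistributedCache.getLocalCacheFiles(job) returning null (i.e. mapred.cache.localFiles is not set for this job); the class name is made up, and the driver side stays as in the original message.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class CacheAwareMap extends MapReduceBase {
    @Override
    public void configure(JobConf job) {
        try {
            Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
            if (localFiles == null) {
                // Nothing was localized: the files may have been added to a different JobConf
                // than the one the job was submitted with, or the job may be running in local mode.
                System.err.println("No entries found under mapred.cache.localFiles");
                return;
            }
            FileSystem localFs = FileSystem.getLocal(job);
            for (Path p : localFiles) {
                BufferedReader reader = new BufferedReader(new InputStreamReader(localFs.open(p)));
                String line;
                while ((line = reader.readLine()) != null) {
                    // process each cached line here
                }
                reader.close();
            }
        } catch (IOException e) {
            System.err.println("ERROR!! Cannot open cached file from the map task");
            e.printStackTrace();
        }
    }
}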
Re: dfs.write.packet.size set to 2G
- Original Message - From: donal0412 donal0...@gmail.com Date: Tuesday, November 8, 2011 1:04 pm Subject: dfs.write.packet.size set to 2G To: hdfs-user@hadoop.apache.org Hi, I want to store lots of files in HDFS; the file size is = 2G. I don't want the file to be split into blocks, because I need the whole file while processing it, and I don't want to transfer blocks to one node when processing it. An easy way to do this would be to set dfs.write.packet.size to 2G. I wonder if someone has similar experience or knows whether this is practicable. Will there be performance problems when setting the packet size to a big number? Thanks! donal It looks like you are looking at the wrong configuration for your case. If you don't want the file split, you need to increase dfs.blocksize instead. In DFS, data transfer happens packet by packet, and dfs.write.packet.size represents the size of each such packet: the block is split into packets on the client side and kept in the data queue, and the DataStreamer thread picks up the packets and transfers them to the DN until the block size is reached. Once it reaches the block boundary, it closes the block streams. BTW, how are you going to process the data here? Are you not going to use mapreduce for processing your data? Regards, Uma
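To make the whole file fit into a single block, the block size can be passed per file at create time (or set cluster-wide via dfs.block.size in 0.20.x / dfs.blocksize in later releases); a rough sketch, with the namenode URI and path as placeholders:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BigBlockWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:9000"), conf); // placeholder URI
        long blockSize = 2L * 1024 * 1024 * 1024;   // 2 GB, so a 2 GB file stays in one block
        FSDataOutputStream out = fs.create(new Path("/data/whole-file"), true,
                conf.getInt("io.file.buffer.size", 4096), (short) 3, blockSize);
        // ... write the file contents ...
        out.close();
        fs.close();
    }
}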
Re: Any daemon?
You can look at the BlockPoolSliceScanner#scan method; that is in the trunk code. You can find the same logic in DataBlockScanner#run in earlier versions. Regards, Uma - Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Monday, November 7, 2011 7:31 pm Subject: Any daemon? To: common-user@hadoop.apache.org Hi all, I am interested in knowing if there is any background daemon in hadoop which runs at regular periods checking that all the data copies (blocks as listed in the block map) exist and are not corrupted. Can you please point me to that piece of code in hadoop? Thanks, Kartheek.
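For a rough mental model only (this is not the Hadoop source), the scanner boils down to a long-running loop that re-verifies each stored block's checksums and then sleeps until the next cycle; everything below is hypothetical.

import java.util.List;

public class BlockScannerSketch implements Runnable {
    private final List<String> blockFiles;   // hypothetical list of locally stored block files
    private final long scanPeriodMs;

    public BlockScannerSketch(List<String> blockFiles, long scanPeriodMs) {
        this.blockFiles = blockFiles;
        this.scanPeriodMs = scanPeriodMs;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            for (String block : blockFiles) {
                verifyChecksum(block);        // a real scanner reports the block as corrupt on mismatch
            }
            try {
                Thread.sleep(scanPeriodMs);   // wait for the next verification cycle
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void verifyChecksum(String block) {
        // hypothetical: re-read the block file and compare against its stored .meta checksums
    }
}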
Fwd: Re: Do failed task attempts stick around the jobcache on local disk?
forwarding to mapreduce ---BeginMessage--- Am I being completely silly asking about this? Does anyone know? On Wed, Nov 2, 2011 at 6:27 PM, Meng Mao meng...@gmail.com wrote: Is there any mechanism in place to remove failed task attempt directories from the TaskTracker's jobcache? It seems like for us, the only way to get rid of them is manually. ---End Message---
Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
in 205, code is different than trace Which version are you using? I just verified the code in older versions, http://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201109.mbox/%3c20110902221116.d0b192388...@eris.apache.org%3E below is the code snippet. +boolean rv = true; + +// read perms +rv = f.setReadable(group.implies(FsAction.READ), false); +checkReturnValue(rv, f, permission); if rv is false then it throws the below error. Can you please create a simple program with the below path and try call setReadable with the user where task tracker starts. Then we can get to know what error it is giving. look at the javadoc http://download.oracle.com/javase/6/docs/api/java/io/File.html#setReadable(boolean,%20boolean) setReadable public boolean setReadable(boolean readable, boolean ownerOnly)Sets the owner's or everybody's read permission for this abstract pathname. Parameters: readable - If true, sets the access permission to allow read operations; if false to disallow read operations ownerOnly - If true, the read permission applies only to the owner's read permission; otherwise, it applies to everybody. If the underlying file system can not distinguish the owner's read permission from that of others, then the permission will apply to everybody, regardless of this value. Returns: true if and only if the operation succeeded. The operation will fail if the user does not have permission to change the access permissions of this abstract pathname. If readable is false and the underlying file system does not implement a read permission, then the operation will fail. I am not sure how to provide the athentications in Cygwin. Please make sure you should have rights to change the permissions with the user. If i get some more info, i will update you. i sent it to mapreduce user and cced to common Regards, Uma *- Original Message - From: Masoud mas...@agape.hanyang.ac.kr Date: Friday, November 4, 2011 7:01 am Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-u...@hadoop.apache.org Dear Uma, as you know when we use start-all.sh command, all the outputs saved in log files, when i check the tasktracker log file, i see the below error message and its shutdown. im really confused, its more than 4 days im working in this issue and tried different ways but no result.^^ BS. Masoud On 11/03/2011 08:34 PM, Uma Maheswara Rao G 72686 wrote: it wont disply any thing on console. If you get any error while exceuting the command, then only it will disply on console. In your case it might executed successfully. Still you are facing same problem with TT startup? Regards, Uma - Original Message - From: Masoudmas...@agape.hanyang.ac.kr Date: Thursday, November 3, 2011 7:02 am Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-u...@hadoop.apache.org Hi, thanks for info, i checked that report, seems same with mine but no specific solution mentioned. Yes, i changed this folder permission via cygwin,NO RESULT. Im really confused. ... any idea please ...? Thanks, B.S On 11/01/2011 05:38 PM, Uma Maheswara Rao G 72686 wrote: Looks, that is permissions related issue on local dirs There is an issue filed in mapred, related to this problem https://issues.apache.org/jira/browse/MAPREDUCE-2921 Can you please provide permissions explicitely and try? 
Regards, Uma - Original Message - From: Masoudmas...@agape.hanyang.ac.kr Date: Tuesday, November 1, 2011 1:19 pm Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-u...@hadoop.apache.org Sure, ^^ when I run {namenode -fromat} it makes dfs in c:/tmp/ administrator_hadoop/ after that by running start-all.sh every thing is OK, all daemons run except tasktracker. My current user in administrator, but tacktracker runs by cyg_server user that made by cygwin in installation time;This is a part of log file: 2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as cyg_server 2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-cyg_server/mapred/local 2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop- cyg_server\mapred\local\ttprivate to 0700 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:680) at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:483) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java
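A minimal version of the test program suggested above: it tries the same family of java.io.File permission calls on the path from the failing log, so you can see which one returns false when run as the user the tasktracker starts with. The path is taken from the earlier log message; adjust it as needed.

import java.io.File;

public class PermCheck {
    public static void main(String[] args) {
        // path from the failing TaskTracker log; adjust for your installation
        File f = new File("/tmp/hadoop-cyg_server/mapred/local/ttprivate");
        f.mkdirs();
        boolean r = f.setReadable(true, true);    // owner-only read
        boolean w = f.setWritable(true, true);    // owner-only write
        boolean x = f.setExecutable(true, true);  // owner-only execute
        System.out.println("setReadable=" + r + " setWritable=" + w + " setExecutable=" + x);
    }
}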
Re: Never ending reduce jobs, error Error reading task outputConnection refused
This problem may come if you dont configure the hostmappings properly. Can you check whether your tasktrackers are pingable from each other with the configured hostsnames? Regards, Uma - Original Message - From: Russell Brown misterr...@gmail.com Date: Friday, November 4, 2011 9:00 pm Subject: Never ending reduce jobs, error Error reading task outputConnection refused To: mapreduce-user@hadoop.apache.org Hi, I have a cluster of 4 tasktracker/datanodes and 1 JobTracker/Namenode. I can run small jobs on this cluster fine (like up to a few thousand keys) but more than that and I start seeing errors like this: 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_05_0, Status : FAILED Too many fetch-failures 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:16:13 INFO mapred.JobClient: map 97% reduce 1% 11/11/04 08:16:25 INFO mapred.JobClient: map 100% reduce 1% 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_10_0, Status : FAILED Too many fetch-failures 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:17:24 INFO mapred.JobClient: map 97% reduce 1% 11/11/04 08:17:36 INFO mapred.JobClient: map 100% reduce 1% 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_11_0, Status : FAILED Too many fetch-failures I have no IDEA what this means. All my nodes can ssh to each other, pass wordlessly, all the time. On the individual data/task nodes the logs have errors like this: 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_20040342_0006_m_15_0,2) failed : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_20040342_0006/attempt_20040342_0006_m_15_0/output/file.out.index in any of the configured local directories at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_20040342_0006_m_15_0. Ignored. Are they related? What d any of the mean? If I use a much smaller amount of data I don't see any of these errors and everything works fine, so I guess they are to do with some resource (though what I don't know?) Looking at MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE I see that datanodes have ample disk space, that isn't it… Any help at all really appreciated. Searching for the errors on Google has me nothing, reading the Hadoop definitive guide as me nothing. Many thanks in advance Russell
Re: Never ending reduce jobs, error Error reading task outputConnection refused
- Original Message - From: Russell Brown misterr...@gmail.com Date: Friday, November 4, 2011 9:11 pm Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused To: mapreduce-user@hadoop.apache.org On 4 Nov 2011, at 15:35, Uma Maheswara Rao G 72686 wrote: This problem may come if you dont configure the hostmappings properly. Can you check whether your tasktrackers are pingable from each other with the configured hosts names? Hi, Thanks for replying so fast! Hostnames? I use IP addresses in the slaves config file, and via IP addresses everyone can ping everyone else, do I need to set up hostnames too? Yes, can you configure hostname mappings and check.. Cheers Russell Regards, Uma - Original Message - From: Russell Brown misterr...@gmail.com Date: Friday, November 4, 2011 9:00 pm Subject: Never ending reduce jobs, error Error reading task outputConnection refused To: mapreduce-user@hadoop.apache.org Hi, I have a cluster of 4 tasktracker/datanodes and 1 JobTracker/Namenode. I can run small jobs on this cluster fine (like up to a few thousand keys) but more than that and I start seeing errors like this: 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_05_0, Status : FAILED Too many fetch-failures 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:16:13 INFO mapred.JobClient: map 97% reduce 1% 11/11/04 08:16:25 INFO mapred.JobClient: map 100% reduce 1% 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_10_0, Status : FAILED Too many fetch-failures 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:17:24 INFO mapred.JobClient: map 97% reduce 1% 11/11/04 08:17:36 INFO mapred.JobClient: map 100% reduce 1% 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_11_0, Status : FAILED Too many fetch-failures I have no IDEA what this means. All my nodes can ssh to each other, pass wordlessly, all the time. 
On the individual data/task nodes the logs have errors like this: 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_20040342_0006_m_15_0,2) failed : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_20040342_0006/attempt_20040342_0006_m_15_0/output/file.out.index in any of the configured local directories at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map
Re: Never ending reduce jobs, error Error reading task outputConnection refused
- Original Message - From: Russell Brown misterr...@gmail.com Date: Friday, November 4, 2011 9:18 pm Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused To: mapreduce-user@hadoop.apache.org On 4 Nov 2011, at 15:44, Uma Maheswara Rao G 72686 wrote: - Original Message - From: Russell Brown misterr...@gmail.com Date: Friday, November 4, 2011 9:11 pm Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused To: mapreduce-user@hadoop.apache.org On 4 Nov 2011, at 15:35, Uma Maheswara Rao G 72686 wrote: This problem may come if you dont configure the hostmappings properly. Can you check whether your tasktrackers are pingable from each other with the configured hosts names? Hi, Thanks for replying so fast! Hostnames? I use IP addresses in the slaves config file, and via IP addresses everyone can ping everyone else, do I need to set up hostnames too? Yes, can you configure hostname mappings and check.. Like full blown DNS? I mean there is no reference to any machine by hostname in any of my config anywhere, so I'm not sure where to start. These machines are just on my local network. you need to configure them in /etc/hosts file. ex: xx.xx.xx.xx1 TT_HOSTNAME1 xx.xx.xx.xx2 TT_HOSTNAME2 xx.xx.xx.xx3 TT_HOSTNAME3 xx.xx.xx.xx4 TT_HOSTNAME4 configure them in all the machines and check. Cheers Russell Regards, Uma - Original Message - From: Russell Brown misterr...@gmail.com Date: Friday, November 4, 2011 9:00 pm Subject: Never ending reduce jobs, error Error reading task outputConnection refused To: mapreduce-user@hadoop.apache.org Hi, I have a cluster of 4 tasktracker/datanodes and 1 JobTracker/Namenode. I can run small jobs on this cluster fine (like up to a few thousand keys) but more than that and I start seeing errors like this: 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_05_0, Status : FAILED Too many fetch-failures 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:16:13 INFO mapred.JobClient: map 97% reduce 1% 11/11/04 08:16:25 INFO mapred.JobClient: map 100% reduce 1% 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_10_0, Status : FAILED Too many fetch-failures 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused 11/11/04 08:17:24 INFO mapred.JobClient: map 97% reduce 1% 11/11/04 08:17:36 INFO mapred.JobClient: map 100% reduce 1% 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_20040342_0006_m_11_0, Status : FAILED Too many fetch-failures I have no IDEA what this means. All my nodes can ssh to each other, pass wordlessly, all the time. 
On the individual data/task nodes the logs have errors like this: 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_20040342_0006_m_15_0,2) failed : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_20040342_0006/attempt_20040342_0006_m_15_0/output/file.out.index in any of the configured local directories at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle
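If editing /etc/hosts as above still leaves doubt, a small check like the sketch below (the hostnames are placeholders matching the example mapping) prints what each configured name resolves to on the machine where it is run.

import java.net.InetAddress;

public class HostCheck {
    public static void main(String[] args) throws Exception {
        // placeholder hostnames; use the names from your /etc/hosts and slaves files
        String[] hosts = {"TT_HOSTNAME1", "TT_HOSTNAME2", "TT_HOSTNAME3", "TT_HOSTNAME4"};
        for (String host : hosts) {
            InetAddress addr = InetAddress.getByName(host);
            System.out.println(host + " -> " + addr.getHostAddress()
                    + " reachable=" + addr.isReachable(3000));   // 3 second timeout
        }
    }
}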
Re: HDFS error : Could not Complete file
Looks before comlpeting the file, folder has been deleted. In HDFS, we will be able to delete the files any time. Application need to take care about the file comleteness depending on his usage. Do you have any dfsclient side logs in mapreduce, when exactly delete command issued? - Original Message - From: Sudharsan Sampath sudha...@gmail.com Date: Friday, November 4, 2011 2:48 pm Subject: HDFS error : Could not Complete file To: hdfs-user@hadoop.apache.org Hi, I have a simple map-reduce program [map only :) ]that reads the input and emits the same to n outputs on a single node cluster with max map tasks set to 10 on a 16 core processor machine. After a while the tasks begin to fail with the following exception log. 2011-01-01 03:17:52,149 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=temp,tempip=/x.x.x.x cmd=delete src=/TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_00_2 dst=nullperm=null 2011-01-01 03:17:52,156 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*NameSystem.addStoredBlock: addStoredBlock request received for blk_7046642930904717718_23143 on x.x.x.x:port size 66148 But it does not belong to any file. 2011-01-01 03:17:52,156 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: failed to complete /TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_00_2/Output0-m-0 because dir.getFileBlocks() is null and pendingFile is null 2011-01-01 03:17:52,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 12 on 9000, call complete(/TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_00_2/Output0-m-0, DFSClient_attempt_201101010256_0006_m_00_2) from x.x.x.x:port error: java.io.IOException: Could not complete write to file /TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_00_2/Output0-m-0 by DFSClient_attempt_201101010256_0006_m_00_2 java.io.IOException: Could not complete write to file /TestMultipleOuputs1320394241986/_temporary/_attempt_201101010256_0006_m_00_2/Output0-m-0 by DFSClient_attempt_201101010256_0006_m_00_2 at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:497) at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962) Looks like there's a delete command issued by FsNameSystem.audit before the it errors out stating it could not complete write to the file inside that.. Any clue on what could have gone wrong? Thanks Sudharsan S
Re: Packets-Block
- Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Thursday, November 3, 2011 11:23 am Subject: Packets-Block To: common-user@hadoop.apache.org Hi all, I need some info related to the code section which handles the followingoperations. Basically DataXceiver.c on the client side transmits the block in packetsand Actually DataXceiver will run only in DN. Whenever you create a file DataStreamer thread will start in DFSClient. Whenever application writing the bytes, they will be enqueued into dataQueue. Streamer thread will pick the packets from dataqueue and write on to the pipeline sockets. Also it will write the opcodes to tell the DN about the kind of operation. on the data node side we have DataXceiver.c and BlockReceiver.c files which take care of writing these packets in order to a block file until the last packet for the block is received. I want some info around this area DataXceiverServer will run and listen for the requests. For every request it receives, it will create DataXceiver thread and pass the info to it. Based on the opcode it will create BlockReceiver or BlockSender objects and give the control to it. where in BlockReceiver.c , i have seen a PacketResponder class and a BlockReceiver class where in two places you are finalizing the block (What i understood by finalizing is that when the last packet for the block is received, you are closing the block file). In PacketResponder class in two places you are using finalizeBlock() function, one in lastDataNodeRun()function and the other in run() method and in BlockReceiver.c you are using finalizeBlock() in receiveBlock() function. I understood from the commentsthat the finalizeBlock() call from run() method is done for the datanode with which client directly interacts and finalizeBlock() call from receiveBlock() function is done for all the datanodes where the block is sent for replication. As part replication, if one block has received by DN and also block length will be know before itself. So, receivePacket() invocation in while loop itself can read the complete block. So, after reading, it need to finalize the block to add into volumesMap. But i didn't understand why there is a finalizeBlock() call from lastDataNodeRun() function. This call will be for current writes from client/DN, it will not know the actual size untill client says that is last packet in current block. finalizeBlock will be called if the packet is lastPacket for that block. finalizeBlock will add the replica into volumesMap. Also if the packet is last one, then it needs to close all the blocks files in DN which were opened for writes. Can someone explain me about this? I may be wrong at most of the places of my understanding of the workflow. Correct me if i am wrong. Thanks, Kartheek Regards, Uma
Re: Packets-Block
Hello Karthik, see inline - Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Thursday, November 3, 2011 4:02 pm Subject: Re: Packets-Block To: common-user@hadoop.apache.org Thanks Uma for the prompt reply. I have one more doubt, as i can see block class contains only metadata information like Timestamp, length. But the actual data is in the streams.What I cannot understand is that where is the data getting written from streams to blockfile.(which function is taking care of this? ). Yes, block will contains all the information like blockID, generation timestamp, number of bytes... Block is writable, so that we can transfer them through network. ( ex: DN will send block reports,...etc ). Actual data will in disk with the name of blk_block id So, using this block id, we can identify the block name directly. When the block is created at the DN side, volumes map will maintans replicaBeingWritten objs with this block ID information . You can see the code in BlockReceiver constructor, i.e, once it gets the replicaInfo, it will call creatStreams on that replicainfo. So, that will create the FileOutPutStreams. Regards, Uma ~Kartheek. On Thu, Nov 3, 2011 at 12:55 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: - Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Thursday, November 3, 2011 11:23 am Subject: Packets-Block To: common-user@hadoop.apache.org Hi all, I need some info related to the code section which handles the followingoperations. Basically DataXceiver.c on the client side transmits the block in packetsand Actually DataXceiver will run only in DN. Whenever you create a file DataStreamer thread will start in DFSClient. Whenever application writing the bytes, they will be enqueued into dataQueue. Streamer thread will pick the packets from dataqueue and write on to the pipeline sockets. Also it will write the opcodes to tell the DN about the kind of operation. on the data node side we have DataXceiver.c and BlockReceiver.c files which take care of writing these packets in order to a block file until the last packet for the block is received. I want some info around this area DataXceiverServer will run and listen for the requests. For every request it receives, it will create DataXceiver thread and pass the info to it. Based on the opcode it will create BlockReceiver or BlockSender objects and give the control to it. where in BlockReceiver.c , i have seen a PacketResponder class and a BlockReceiver class where in two places you are finalizing the block (What i understood by finalizing is that when the last packet for the block is received, you are closing the block file). In PacketResponder class in two places you are using finalizeBlock() function, one in lastDataNodeRun()function and the other in run() method and in BlockReceiver.c you are using finalizeBlock() in receiveBlock() function. I understood from the commentsthat the finalizeBlock() call from run() method is done for the datanode with which client directly interacts and finalizeBlock() call from receiveBlock() function is done for all the datanodes where the block is sent for replication. As part replication, if one block has received by DN and also block length will be know before itself. So, receivePacket() invocation in while loop itself can read the complete block. So, after reading, it need to finalize the block to add into volumesMap. But i didn't understand why there is a finalizeBlock() call from lastDataNodeRun() function. 
That call is for in-progress writes from a client/DN: the DN does not know the actual size until the client indicates that a packet is the last one in the current block. finalizeBlock() is called when the packet is the last packet for that block; it adds the replica into the volumes map, and since the packet is the last one, all the block files on the DN that were opened for writes also need to be closed. Can someone explain me about this? I may be wrong at most of the places of my understanding of the workflow. Correct me if i am wrong. Thanks, Kartheek Regards, Uma
Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
it wont disply any thing on console. If you get any error while exceuting the command, then only it will disply on console. In your case it might executed successfully. Still you are facing same problem with TT startup? Regards, Uma - Original Message - From: Masoud mas...@agape.hanyang.ac.kr Date: Thursday, November 3, 2011 7:02 am Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-user@hadoop.apache.org Hi, thanks for info, i checked that report, seems same with mine but no specific solution mentioned. Yes, i changed this folder permission via cygwin,NO RESULT. Im really confused. ... any idea please ...? Thanks, B.S On 11/01/2011 05:38 PM, Uma Maheswara Rao G 72686 wrote: Looks, that is permissions related issue on local dirs There is an issue filed in mapred, related to this problem https://issues.apache.org/jira/browse/MAPREDUCE-2921 Can you please provide permissions explicitely and try? Regards, Uma - Original Message - From: Masoudmas...@agape.hanyang.ac.kr Date: Tuesday, November 1, 2011 1:19 pm Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-user@hadoop.apache.org Sure, ^^ when I run {namenode -fromat} it makes dfs in c:/tmp/ administrator_hadoop/ after that by running start-all.sh every thing is OK, all daemons run except tasktracker. My current user in administrator, but tacktracker runs by cyg_server user that made by cygwin in installation time;This is a part of log file: 2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as cyg_server 2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-cyg_server/mapred/local 2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:680) at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:483) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:741) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1463) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3611) 2011-11-01 14:26:54,479 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: / Thanks, BR. On 11/01/2011 04:33 PM, Uma Maheswara Rao G 72686 wrote: Can you please give some trace? - Original Message - From: Masoudmas...@agape.hanyang.ac.kr Date: Tuesday, November 1, 2011 11:08 am Subject: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-user@hadoop.apache.org Hi I have problem in running hadoop under cygwin 1.7 only tasktracker ran by cyg_server user and so make some problems, so any idea please??? BS. Masoud.
Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
Can you please give some trace? - Original Message - From: Masoud mas...@agape.hanyang.ac.kr Date: Tuesday, November 1, 2011 11:08 am Subject: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-user@hadoop.apache.org Hi I have problem in running hadoop under cygwin 1.7 only tasktracker ran by cyg_server user and so make some problems, so any idea please??? BS. Masoud.
Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
Looks, that is permissions related issue on local dirs There is an issue filed in mapred, related to this problem https://issues.apache.org/jira/browse/MAPREDUCE-2921 Can you please provide permissions explicitely and try? Regards, Uma - Original Message - From: Masoud mas...@agape.hanyang.ac.kr Date: Tuesday, November 1, 2011 1:19 pm Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-user@hadoop.apache.org Sure, ^^ when I run {namenode -fromat} it makes dfs in c:/tmp/ administrator_hadoop/ after that by running start-all.sh every thing is OK, all daemons run except tasktracker. My current user in administrator, but tacktracker runs by cyg_server user that made by cygwin in installation time;This is a part of log file: 2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as cyg_server 2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-cyg_server/mapred/local 2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:680) at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:483) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:741) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1463) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3611) 2011-11-01 14:26:54,479 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: / Thanks, BR. On 11/01/2011 04:33 PM, Uma Maheswara Rao G 72686 wrote: Can you please give some trace? - Original Message - From: Masoudmas...@agape.hanyang.ac.kr Date: Tuesday, November 1, 2011 11:08 am Subject: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-user@hadoop.apache.org Hi I have problem in running hadoop under cygwin 1.7 only tasktracker ran by cyg_server user and so make some problems, so any idea please??? BS. Masoud.
Re: Server log files, order of importance ?
If you want to trace one particular block associated with a file, you can first take the file name and find the corresponding NameSystem.allocateBlock: entry in your NN logs; there you can find the allocated block ID. After this, just grep for this block ID across your logs. Take the timestamps for each operation from the grep output, and you can easily trace what happened to that block. Regards, Uma - Original Message - From: Jay Vyas jayunit...@gmail.com Date: Tuesday, November 1, 2011 3:37 am Subject: Server log files, order of importance ? To: common-user@hadoop.apache.org Hi guys : I wanted to go through each of the server logs on my hadoop (single pseudo node) vm. In particular, I want to know where to look when things go wrong (i.e. so I can more effectively debug hadoop namenode issues in the future). Can someone suggest what the most important ones to start looking at are ? -- Jay Vyas MMSB/UCHC
Re: can't format namenode....
- Original Message - From: Jay Vyas jayunit...@gmail.com Date: Saturday, October 29, 2011 8:27 pm Subject: can't format namenode To: common-user@hadoop.apache.org Hi guys : In order to fix some issues im having (recently posted), I'vedecided to try to make sure my name node is formatted But the formatting fails (see 1 below) . So... To trace the failure, I figured I would grep through all log filesfor exceptions. I've curated the results here ... does this look familiar to anyone ? Clearly, something is very wrong with my CDH hadoop installation. 1) To attempt to solve this, I figured I would format my namenode. Oddly,when I run hadoop -namenode format I get the following stack trace : 11/10/29 14:39:37 INFO namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = localhost.localdomain/127.0.0.1 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 0.20.2-cdh3u1 STARTUP_MSG: build = file:///tmp/topdir/BUILD/hadoop-0.20.2- cdh3u1 -r bdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638; compiled by 'root' on Mon Jul 18 09:40:22 PDT 2011 / Re-format filesystem in /var/lib/hadoop-0.20/cache/hadoop/dfs/name ? (Y or N) Y 11/10/29 14:39:40 INFO util.GSet: VM type = 64-bit 11/10/29 14:39:40 INFO util.GSet: 2% max memory = 19.33375 MB 11/10/29 14:39:40 INFO util.GSet: capacity = 2^21 = 2097152 entries11/10/29 14:39:40 INFO util.GSet: recommended=2097152, actual=209715211/10/29 14:39:40 INFO namenode.FSNamesystem: fsOwner=cloudera11/10/29 14:39:40 INFO namenode.FSNamesystem: supergroup=supergroup11/10/29 14:39:40 INFO namenode.FSNamesystem: isPermissionEnabled=false11/10/29 14:39:40 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=1000 11/10/29 14:39:40 INFO namenode.FSNamesystem: isAccessTokenEnabled=falseaccessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 11/10/29 14:39:41 ERROR namenode.NameNode: java.io.IOException: Cannot remove current directory: /var/lib/hadoop- 0.20/cache/hadoop/dfs/name/currentat org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:303) at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1244) at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1263) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1100) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1217) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233) 11/10/29 14:39:41 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1/ Are you able to remove this directory explicitely? /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current 2) Here are the exceptions (abridged , i removed repetetive parts regardingreplicated to 0 nodes instead of 1 This is because, the file is not replicated to minimum replication (1). 
2011-10-28 22:36:52,669 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call addBlock(/var/lib/hadoop- 0.20/cache/mapred/mapred/system/jobtracker.info,DFSClient_- 134960056, null) from 127.0.0.1:35163: error: java.io.IOException: File /var lib/hadoop-0.20/cache/mapred/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 java.io.IOException: File /var/lib/hadoop- 0.20/cache/mapred/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 STARTUP_MSG: host = java.net.UnknownHostException: gcrc15.uchc.net: gcrc15.uchc.net java.net.UnknownHostException: gcrc15.uchc.net: gcrc15.uchc.net java.net.UnknownHostException: gcrc15.uchc.net: gcrc15.uchc.net java.net.UnknownHostException: gcrc15.uchc.net: gcrc15.uchc.net java.net.UnknownHostException: gcrc15.uchc.net: gcrc15.uchc.net java.net.UnknownHostException: gcrc15.uchc.net: gcrc15.uchc.net 2011-10-28 22:30:03,413 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamerException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /var/lib/hadoop-0.20/cache/mapred/mapred/system/jobtracker.info could only be replicated to 0 nodes, in tead of 1 .. REPEATED SEVERAL TIMES ..2011-10-28 22:36:52,716 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /var/lib/hadoop-0.20/cache/mapred/mapred/system/jobtracker.info could only be replicated to 0 nodes, in tead of 1 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /var/lib/hadoop-0.20/cache/mapred/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of
Re: Permission denied for normal users
- Original Message - From: Josu Lazkano josu.lazk...@barcelonamedia.org Date: Thursday, October 27, 2011 9:38 pm Subject: Permission denied for normal users To: hdfs-user@hadoop.apache.org Hello list, I am new on Hadoop, I configura a 3 slaves and 1 master Hadoop cluster. The problem is that with normal users I can not execute nothing: $ hadoop dfs -ls /user/josu.lazkano/gutenberg Found 7 items -rw-r--r-- 2 josu.lazkano supergroup 343695 2011-10-11 09:47 /user/josu.lazkano/gutenberg/132.txt.utf8 -rw-r--r-- 2 josu.lazkano supergroup 594933 2011-10-11 09:47 /user/josu.lazkano/gutenberg/1661.txt.utf8 -rw-r--r-- 2 josu.lazkano supergroup1945886 2011-10-11 09:47 /user/josu.lazkano/gutenberg/19699.txt.utf8 -rw-r--r-- 2 josu.lazkano supergroup 674566 2011-10-11 09:47 /user/josu.lazkano/gutenberg/20417.txt.utf8 -rw-r--r-- 2 josu.lazkano supergroup1573112 2011-10-11 09:47 /user/josu.lazkano/gutenberg/4300.txt.utf8 -rw-r--r-- 2 josu.lazkano supergroup1423801 2011-10-11 09:47 /user/josu.lazkano/gutenberg/5000.txt.utf8 -rw-r--r-- 2 josu.lazkano supergroup 393963 2011-10-11 09:47 /user/josu.lazkano/gutenberg/972.txt.utf8 And this happens when I run a jar: $ hadoop jar /usr/local/hadoop/hadoop-0.20.2-examples.jar wordcount /user/josu.lazkano/gutenberg/ /user/josu.lazkano/gutenberg_outException in thread main java.io.IOException: Permission denied at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.checkAndCreate(File.java:1704) at java.io.File.createTempFile(File.java:1792) at org.apache.hadoop.util.RunJar.main(RunJar.java:115) This error throws from OS itself. Error is not related to Hadoop permissions. First of all, are you able to create file with that user? Provide the permissions for /user and try. With hduser user there is no problem with same jar execution (this user start the hdfs). How could I solve this? Thanks and best regards. -- Josu Lazkano Barcelona Media – Centre d’Innovació Regards, Uma
Re: Need help understanding Hadoop Architecture
Hi, First of all, welcome to Hadoop. - Original Message - From: panamamike panamam...@hotmail.com Date: Sunday, October 23, 2011 8:29 pm Subject: Need help understanding Hadoop Architecture To: core-u...@hadoop.apache.org I'm new to Hadoop. I've read a few articles and presentations which are directed at explaining what Hadoop is, and how it works. Currently my understanding is Hadoop is an MPP system which leverages the use of large block size to quickly find data. In theory, I understand how a large block size along with an MPP architecture as well as using what I'm understanding to be a massive index scheme via mapreduce can be used to find data. What I don't understand is how, after you identify the appropriate 64MB block size, do you find the data you're specifically after? Does this mean the CPU has to search the entire 64MB block for the data of interest? If so, how does Hadoop know what data from that block to retrieve? I'm assuming the block is probably composed of one or more files. If not, I'm assuming the user isn't looking for the entire 64MB block rather a portion of it. I am just giving a brief overview of the file system here. The distributed file system consists of the NameNode, DataNodes, checkpointing nodes and the DFSClient. The NameNode maintains the metadata about the files and blocks. The DataNodes hold the actual data and send heartbeats to the NN, so the NameNode knows the status of the DNs. The DFSClient is the client-side logic: it first asks the NameNode for a set of DNs to write the file to. The NN adds the corresponding entries to its metadata and gives the DN list to the client, and the client then writes the data to the DataNodes directly. While reading a file, the client likewise asks the NN for the block locations, then connects directly to the DNs and reads the data. There are many other concepts: replication, lease monitoring, etc. I hope this gives you an initial understanding of HDFS. Please go through the document below, which explains it very clearly with architecture diagrams. Any help indicating documentation, books, articles on the subject would be much appreciated. Here is a doc for HADOOP http://db.trimtabs.com:2080/mindterm/ebooks/Hadoop_The_Definitive_Guide_Cr.pdf Regards, Mike -- View this message in context: http://old.nabble.com/Need-help-understanding-Hadoop-Architecture-tp32705405p32705405.html Sent from the Hadoop core-user mailing list archive at Nabble.com. Regards, Uma
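To make the client-side flow above concrete, here is a minimal, hedged sketch (not part of the original mail; the path and text are hypothetical) using the public FileSystem API. The NameNode/DataNode interaction described above happens inside DFSClient when these calls run against an HDFS configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);          // DistributedFileSystem when fs.default.name points to HDFS
    Path p = new Path("/tmp/example.txt");         // hypothetical path

    FSDataOutputStream out = fs.create(p);         // NN allocates blocks; client streams packets to the DN pipeline
    out.writeUTF("hello hdfs");
    out.close();                                   // the file is complete only after close

    FSDataInputStream in = fs.open(p);             // NN returns block locations; client reads from DNs directly
    System.out.println(in.readUTF());
    in.close();
  }
}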
Re: lost data with 1 failed datanode and replication factor 3 in 6 node cluster
- Original Message - From: Ossi los...@gmail.com Date: Friday, October 21, 2011 2:57 pm Subject: lost data with 1 failed datanode and replication factor 3 in 6 node cluster To: common-user@hadoop.apache.org hi, We managed to lose data when 1 datanode broke down in a cluster of 6 datanodes with replication factor 3. As far as I know, that shouldn't happen, since each block should have 1 copy in 3 different hosts. So, losing even 2 nodes should be fine. Earlier we did some tests with replication factor 2, but reverted from that: 88 2011-10-12 06:46:49 hadoop dfs -setrep -w 2 -R / 148 2011-10-12 10:22:09 hadoop dfs -setrep -w 3 -R / The lost data was generated after the replication factor was set back to 3. First of all, the question is how you are measuring the data loss. Are there any read failures with missing-block exceptions? My guess is that you are measuring the data loss by the DFS used space. If so, the DFS used space is calculated from the DNs that are currently available, so when one datanode goes down, DFS used and remaining will also drop accordingly. That cannot be taken as data loss. Please correct me if my understanding of the question is wrong. And even if the replication factor would have been 2, data shouldn't have been lost, right? We wonder how that is possible and in what situations that could happen? br, Ossi Regards, Uma
Re: Remote Blocked Transfer count
- Original Message - From: Mark question markq2...@gmail.com Date: Saturday, October 22, 2011 5:57 am Subject: Remote Blocked Transfer count To: common-user common-user@hadoop.apache.org Hello, I wonder if there is a way to measure how many of the data blocks have transferred over the network? Or more generally, how many times was there a connection/contact between different machines? There are metrics available in Hadoop. Did you check them? The simplest way to configure Hadoop metrics is to funnel them into a user-configurable file on the machine running the daemon. Metrics are organized into “contexts” (Hadoop currently uses “jvm”, “dfs”, “mapred”, and “rpc”), and each context is independently configured: http://www.cloudera.com/blog/2009/03/hadoop-metrics/ You can also view them via JMX. I thought of checking the Namenode log file which usually shows blk_from src= to dst ... but I'm not sure if it's correct to count those lines. I would not recommend depending on the logs: if someone changes the log format, it will affect your application. Any ideas are helpful. Mark Regards, Uma
Re: Does hadoop support append option?
- Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Tuesday, October 18, 2011 11:54 am Subject: Re: Does hadoop support append option? To: common-user@hadoop.apache.org I am just concerned about the use case of appends in Hadoop. I know that they have provided support for appends in hadoop. But how frequently are the files getting appended? In the normal case, a file's block details are not persisted into the edit log before the file is closed; that happens only as part of close. If an NN restart happens before the file is closed, we lose that data. Consider a case where we have a very big file and the data is also very important: we should have an option to persist the block details frequently into the edit log, in order to avoid data loss in case of NN restarts. For this, DFS exposed an API called sync, which basically persists the edit log entries to disk. To reopen the stream again we use the append API. In trunk, this support has been refactored cleanly and many corner cases have been handled; the API is also provided as hflush. There is this version concept too that is maintained in the block report, according to my guess this version number is maintained to make sure that if a datanode gets disconnected once and comes back and it has an old copy of the data, then read requests to this data node are discarded. But if the files are not getting appended frequently, does the version number remain the same? Any typical use case can you guys point to? I am not sure what your exact question is here. Can you please clarify this part a bit more? ~Kartheek On Mon, Oct 17, 2011 at 12:53 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: AFAIK, the append option is there in the 20-append branch; it mainly supports sync, but there are some issues with it. The same has been merged to the 20.205 branch, which will be released soon (rc2 available), and many bugs have been fixed in that branch. As per our basic testing it is pretty good as of now; we need to wait for the official release. Regards, Uma - Original Message - From: bourne1900 bourne1...@yahoo.cn Date: Monday, October 17, 2011 12:37 pm Subject: Does hadoop support append option? To: common-user common-user@hadoop.apache.org I know that hadoop 0.19.0 supports the append option, but it is not stable. Does the latest version support the append option? Is it stable? Thanks for help. bourne Regards, Uma
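A hedged sketch of the sync/append usage described above (not from the original thread): on the 0.20.x append branches the client calls sync() to persist what has been written so far and append() to reopen an existing file, while newer releases expose the same idea as hflush()/hsync(). The config key, path and record text below are assumptions for illustration only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.support.append", true);   // assumed: append enabled on the cluster
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/important.log");       // hypothetical file

    FSDataOutputStream out = fs.create(p);
    out.writeBytes("first record\n");
    out.sync();                                    // persist progress so an NN/client restart does not lose it
    out.close();

    out = fs.append(p);                            // reopen the stream on the existing file
    out.writeBytes("appended record\n");
    out.close();
  }
}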
Re: could not complete file...
- Original Message - From: bourne1900 bourne1...@yahoo.cn Date: Tuesday, October 18, 2011 3:21 pm Subject: could not complete file... To: common-user common-user@hadoop.apache.org Hi, There are 20 threads which put files into HDFS ceaselessly; every file is 2k. When 1 million files have finished, the client begins to throw "could not complete file" exceptions ceaselessly. "Could not complete file" is actually an info-level log message. It is logged by the client when closing the file; the client retries for some time (100 times, if I remember correctly) to ensure the write succeeds. Did you observe any write failures here? At that time, the datanode is hung up. I think maybe the heartbeat is lost, so the namenode does not know the state of the datanode. But I do not know why the heartbeat was lost. Is there any info that can be found in the logs when a datanode cannot send heartbeats? Can you check the NN UI to verify the number of live nodes? From that we can decide whether the DN stopped sending heartbeats or not. Thanks and regards! bourne Regards, Uma
Re: Does hadoop support append option?
- Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Tuesday, October 18, 2011 1:31 pm Subject: Re: Does hadoop support append option? To: common-user@hadoop.apache.org Thanks Uma for the clarification of the append functionality. My second question is about the version number concept used in the blockmap. Why does it maintain this version number? sorry Karthik, As i know, there is no version number in blocks map. Are you talking about generationTimeStamp or something? can you paste the snippet where you have seen that version number, so, that i can get your question clearly. ~Kartheek On Tue, Oct 18, 2011 at 12:14 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: - Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Tuesday, October 18, 2011 11:54 am Subject: Re: Does hadoop support append option? To: common-user@hadoop.apache.org I am just concerned about the use case of appends in Hadoop. I know that they have provided support for appends in hadoop. But how frequently are the files getting appended? . In normal case file block details will not be persisted in edit log before closing the file. As part of close only, this will happen. If NN restart happens before closing the file, we loose this data. Consider a case, we have a very big file and data also very important, in this case, we should have an option to persist the block details frequently into editlog file rite, inorder to avoid the dataloss in case of NN restarts. To do this, DFS exposed the API called sync. This will basically persist the editlog entries to disk. To reopen the stream back again we will use append api. In trunk, this support has been refactored cleanly and handled many corner cases. APIs also provided as hflush. There is this version concept too that is maintained in the block report, according to my guess this version number is maintained to make sure that if a datanode gets disconnected once and comes back if it has a old copy of the data , then discard read requests to this data node. But if the files are not getting appended frequently does the version number remain the same?. Any typical use case can you guys point to? I am not sure, what is your exact question here. Can you please clarify more on this? ~Kartheek On Mon, Oct 17, 2011 at 12:53 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: AFAIK, append option is there in 20Append branch. Mainly supports sync. But there are some issues with that. Same has been merged to 20.205 branch and will be released soon (rc2 available). And also fixed many bugs in this branch. As per our basic testing it is pretty good as of now.Need to wait for official release. Regards, Uma - Original Message - From: bourne1900 bourne1...@yahoo.cn Date: Monday, October 17, 2011 12:37 pm Subject: Does hadoop support append option? To: common-user common-user@hadoop.apache.org I know that hadoop0.19.0 supports append option, but not stable.Does the latest version support append option? Is it stable? Thanks for help. bourne Regards, Uma
Re: execute hadoop job from remote web application
- Original Message - From: Oleg Ruchovets oruchov...@gmail.com Date: Tuesday, October 18, 2011 4:11 pm Subject: execute hadoop job from remote web application To: common-user@hadoop.apache.org Hi, what is the way to execute a hadoop job on a remote cluster? I want to execute my hadoop job from a remote web application, but I didn't find any hadoop client (remote API) to do it. Please advise. Oleg You can put the Hadoop jars on your web application's classpath and use the JobClient class to submit the jobs. Regards, Uma
Re: execute hadoop job from remote web application
- Original Message - From: Bejoy KS bejoy.had...@gmail.com Date: Tuesday, October 18, 2011 5:25 pm Subject: Re: execute hadoop job from remote web application To: common-user@hadoop.apache.org Oleg If you are looking at how to submit your jobs using JobClient then the below sample can give you a start. //get the configuration parameters and assign a job name JobConf conf = new JobConf(getConf(), MyClass.class); conf.setJobName("SMS Reports"); //set key value types for mapper and reducer outputs conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(Text.class); //specify the custom reducer class conf.setReducerClass(SmsReducer.class); //Specify the input directories (@ runtime) and Mappers independently for inputs from multiple sources FileInputFormat.addInputPath(conf, new Path(args[0])); //Specify the output directory @ runtime FileOutputFormat.setOutputPath(conf, new Path(args[1])); JobClient.runJob(conf); Along with the hadoop jars you may need to have the config files as well on your client. The sample is from the old map reduce API. You can use the new one as well; in that case we use Job instead of JobClient. Hope it helps!.. Regards Bejoy.K.S On Tue, Oct 18, 2011 at 5:00 PM, Oleg Ruchovets oruchov...@gmail.com wrote: Excellent. Can you give a small example of code. Good sample by Bejoy; hope you have access to this site. Also please go through this doc, http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Example%3A+WordCount+v2.0 Here is the wordcount example. On Tue, Oct 18, 2011 at 1:13 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: - Original Message - From: Oleg Ruchovets oruchov...@gmail.com Date: Tuesday, October 18, 2011 4:11 pm Subject: execute hadoop job from remote web application To: common-user@hadoop.apache.org Hi, what is the way to execute a hadoop job on a remote cluster? I want to execute my hadoop job from a remote web application, but I didn't find any hadoop client (remote API) to do it. Please advise. Oleg You can put the Hadoop jars on your web application's classpath and use the JobClient class to submit the jobs. Regards, Uma Regards Uma
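Building on Bejoy's snippet and Uma's note about the config files, here is a hedged, self-contained sketch (not from the original thread) of what a remote submission from a web application could look like with the old mapred API; the host names, ports, paths and class names are assumptions, not values from the thread.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class RemoteJobSubmitter {
  public static void submit() throws Exception {
    JobConf conf = new JobConf(RemoteJobSubmitter.class);
    conf.set("fs.default.name", "hdfs://namenode-host:54310");   // assumed remote NameNode
    conf.set("mapred.job.tracker", "jobtracker-host:54311");     // assumed remote JobTracker
    conf.setJobName("remote wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // conf.setMapperClass(...); conf.setReducerClass(...);      // your own classes, packaged in a jar
    // conf.setJar("/path/to/job.jar");                          // so the tasktrackers can load them

    FileInputFormat.addInputPath(conf, new Path("/user/web/input"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/web/output"));

    JobClient.runJob(conf);                                      // blocks until the job finishes
  }
}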
Re: Does hadoop support append option?
AFAIK, the append option is there in the 20-append branch; it mainly supports sync, but there are some issues with it. The same has been merged to the 20.205 branch, which will be released soon (rc2 is available), and many bugs have been fixed in that branch. As per our basic testing it is pretty good as of now; we need to wait for the official release. Regards, Uma - Original Message - From: bourne1900 bourne1...@yahoo.cn Date: Monday, October 17, 2011 12:37 pm Subject: Does hadoop support append option? To: common-user common-user@hadoop.apache.org I know that hadoop 0.19.0 supports the append option, but it is not stable. Does the latest version support the append option? Is it stable? Thanks for help. bourne
Re: Is there a good way to see how full hdfs is
You can write a simple program and call this API; just make sure the Hadoop jars are present in your classpath. Just for more clarification: the DNs send their stats as part of their heartbeats, so the NN maintains all the statistics about the disk space usage for the complete filesystem and so on. This API gives you those stats. Regards, Uma - Original Message - From: ivan.nov...@emc.com Date: Monday, October 17, 2011 9:07 pm Subject: Re: Is there a good way to see how full hdfs is To: common-user@hadoop.apache.org, mapreduce-u...@hadoop.apache.org Cc: common-...@hadoop.apache.org So is there a client program to call this? Can one write their own simple client to call this method from all disks on the cluster? How about a map reduce job to collect from all disks on the cluster? On 10/15/11 4:51 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: /** Return the disk usage of the filesystem, including total capacity, * used space, and remaining space */ public DiskStatus getDiskStatus() throws IOException { return dfs.getDiskStatus(); } DistributedFileSystem has the above API from the Java API side. Regards, Uma - Original Message - From: wd w...@wdicc.com Date: Saturday, October 15, 2011 4:16 pm Subject: Re: Is there a good way to see how full hdfs is To: mapreduce-u...@hadoop.apache.org hadoop dfsadmin -report On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis lordjoe2...@gmail.com wrote: We have a small cluster with HDFS running on only 8 nodes - I believe that the partition assigned to hdfs might be getting full and wonder if the web tools or java api have a way to look at free space on hdfs -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
Re: Is there a good way to see how full hdfs is
Yes, that was deprecated in trunk. If you want to use it programmatically, this will be the better option as well. /** {@inheritDoc} */ @Override public FsStatus getStatus(Path p) throws IOException { statistics.incrementReadOps(1); return dfs.getDiskStatus(); } This should work for you. It gives you an FsStatus object, which has the APIs getCapacity, getUsed and getRemaining. I would suggest you look at the FileSystem APIs available; I think that will give you a clear understanding of how to use them. Regards, Uma - Original Message - From: ivan.nov...@emc.com Date: Monday, October 17, 2011 9:48 pm Subject: Re: Is there a good way to see how full hdfs is To: common-user@hadoop.apache.org Hi Harsh, I need access to the data programmatically for system automation, and hence I do not want a monitoring tool but access to the raw data. I am more than happy to use an exposed function or client program and not an internal API. So i am still a bit confused... What is the simplest way to get at this raw disk usage data programmatically? Is there a HDFS equivalent of du and df, or are you suggesting to just run that on the linux OS (which is perfectly doable). Cheers, Ivan On 10/17/11 9:05 AM, Harsh J ha...@cloudera.com wrote: Uma/Ivan, The DistributedFileSystem class explicitly is _not_ meant for public consumption, it is an internal one. Additionally, that method has been deprecated. What you need is FileSystem#getStatus() if you want the summarized report via code. A job, that possibly runs du or df, is a good idea if you guarantee perfect homogeneity of path names in your cluster. But I wonder, why won't using a general monitoring tool (such as nagios) for this purpose cut it? What's the end goal here? P.s. I'd moved this conversation to hdfs-user@ earlier on, but now I see it being cross posted into mr-user, common-user, and common-dev -- Why? On Mon, Oct 17, 2011 at 9:25 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: We can write a simple program and you can call this API. Make sure the Hadoop jars are present in your classpath. Just for more clarification, the DNs send their stats as part of their heartbeats, so the NN maintains all the statistics about the disk space usage for the complete filesystem and so on. This API gives you those stats. Regards, Uma - Original Message - From: ivan.nov...@emc.com Date: Monday, October 17, 2011 9:07 pm Subject: Re: Is there a good way to see how full hdfs is To: common-user@hadoop.apache.org, mapreduce-u...@hadoop.apache.org Cc: common-...@hadoop.apache.org So is there a client program to call this? Can one write their own simple client to call this method from all disks on the cluster? How about a map reduce job to collect from all disks on the cluster? On 10/15/11 4:51 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: /** Return the disk usage of the filesystem, including total capacity, * used space, and remaining space */ public DiskStatus getDiskStatus() throws IOException { return dfs.getDiskStatus(); } DistributedFileSystem has the above API from the Java API side.
Regards, Uma - Original Message - From: wd w...@wdicc.com Date: Saturday, October 15, 2011 4:16 pm Subject: Re: Is there a good way to see how full hdfs is To: mapreduce-u...@hadoop.apache.org hadoop dfsadmin -report On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis lordjoe2...@gmail.com wrote: We have a small cluster with HDFS running on only 8 nodes - I believe that the partition assigned to hdfs might be getting full and wonder if the web tools or java api havew a way to look at free space on hdfs -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com -- Harsh J
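As a small, hedged illustration of the FileSystem#getStatus() suggestion above (not code from the original thread), a client could read the summarized capacity figures like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class HdfsUsage {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FsStatus status = fs.getStatus(new Path("/"));   // summarized report for the whole filesystem
    System.out.println("capacity  = " + status.getCapacity());
    System.out.println("used      = " + status.getUsed());
    System.out.println("remaining = " + status.getRemaining());
  }
}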
Re: Hadoop node disk failure - reinstall question
- Original Message - From: Mayuran Yogarajah mayuran.yogara...@casalemedia.com Date: Tuesday, October 18, 2011 4:24 am Subject: Hadoop node disk failure - reinstall question To: common-user@hadoop.apache.org common-user@hadoop.apache.org One of our nodes died today, it looks like the disk containing the OS expired. I will need to reinstall the machine. Are there any known issues with using the same hostname / IP again, or is it better to give it a new IP / host name ? The second disk on the machine is still operational and contains HDFS data so I plan on mounting it. Is this ill-advised? Should I just wipe that disk too ? Copying that data to the new machine would be a good option. It again depends on the replication: if you have enough replicas in your cluster, re-replication to new nodes will happen automatically, and in that case you need not even worry about the old data. thanks, M
Re: Unrecognized option: -jvm
You are using Which version of Hadoop ? Please check the recent discussion, which will help you related to this problem. http://search-hadoop.com/m/PPgvNPUoL2subj=Re+Starting+Datanode Regards, Uma - Original Message - From: Majid Azimi majid.merk...@gmail.com Date: Sunday, October 16, 2011 2:22 am Subject: Unrecognized option: -jvm To: common-user@hadoop.apache.org Hi guys, I'm realy new to hadoop. I have configured a single node hadoop cluster. but seems that my data node is not working. job tracker log file shows thismessage(alot of them per 10 second): 2011-10-16 00:01:15,558 WARN org.apache.hadoop.mapred.JobTracker: Retrying... 2011-10-16 00:01:15,589 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamerException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377) at org.apache.hadoop.ipc.Client.call(Client.java:1030) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224) at $Proxy5.addBlock(Unknown Source) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy5.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446) 2011-10-16 00:01:15,589 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null 2011-10-16 00:01:15,589 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file /tmp/hadoop- root/mapred/system/jobtracker.info- Aborting... 2011-10-16 00:01:15,590 WARN org.apache.hadoop.mapred.JobTracker: Writing to file hdfs://localhost/tmp/hadoop-root/mapred/system/jobtracker.info failed!2011-10-16 00:01:15,593 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet! 2011-10-16 00:01:15,603 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager. 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377) at org.apache.hadoop.ipc.Client.call(Client.java:1030) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224) at $Proxy5.addBlock(Unknown Source) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at
Re: Too much fetch failure
Are you able to ping the other node with the configured hostnames? Make sure that you can ping the other machine using the hostname configured in the /etc/hosts files. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Sunday, October 16, 2011 6:46 pm Subject: Re: Too much fetch failure To: common-user@hadoop.apache.org try commenting the 127.0.0.1 localhost line in your /etc/hosts and then restart the cluster and then try again. Thanks, Praveenesh On Sun, Oct 16, 2011 at 2:00 PM, Humayun gmail humayun0...@gmail.com wrote: we are using hadoop on virtual box. when it is a single node then it works fine for big dataset larger than the default block size. but in case of multinode cluster (2 nodes) we are facing some problems. Like when the input dataset is smaller than the default block size (64 MB) then it works fine. but when the input dataset is larger than the default block size then it shows ‘too much fetch failure’ in reduce state. here is the output link http://paste.ubuntu.com/707517/ From the above comments, there are many users who faced this problem. different users suggested to modify the /etc/hosts file in different manner to fix the problem. but there is no ultimate solution. we need the actual solution thats why we are writing here. this is our /etc/hosts file 192.168.60.147 humayun # Added by NetworkManager 127.0.0.1 localhost.localdomain localhost ::1 humayun localhost6.localdomain6 localhost6 127.0.1.1 humayun # The following lines are desirable for IPv6 capable hosts ::1 localhost ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts 192.168.60.1 master 192.168.60.2 slave
Re: Too much fetch failure
I mean, two nodes here is tasktrackers. - Original Message - From: Humayun gmail humayun0...@gmail.com Date: Sunday, October 16, 2011 7:38 pm Subject: Re: Too much fetch failure To: common-user@hadoop.apache.org yes we can ping every node (both master and slave). On 16 October 2011 19:52, Uma Maheswara Rao G 72686 mahesw...@huawei.comwrote: Are you able to ping the other node with the configured hostnames? Make sure that you should be able to ping to the other machine with the configured hostname in ect/hosts files. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Sunday, October 16, 2011 6:46 pm Subject: Re: Too much fetch failure To: common-user@hadoop.apache.org try commenting 127.0.0.1 localhost line in your /etc/hosts and then restartthe cluster and then try again. Thanks, Praveenesh On Sun, Oct 16, 2011 at 2:00 PM, Humayun gmail humayun0...@gmail.comwrote: we are using hadoop on virtual box. when it is a single node then it works fine for big dataset larger than the default block size. but in case of multinode cluster (2 nodes) we are facing some problems. Like when the input dataset is smaller than the default block size(64 MB) then it works fine. but when the input dataset is larger than the default block size then it shows ‘too much fetch failure’ in reduce state. here is the output link http://paste.ubuntu.com/707517/ From the above comments , there are many users who faced this problem. different users suggested to modify the /etc/hosts file in different manner to fix the problem. but there is no ultimate solution.we need the actual solution thats why we are writing here. this is our /etc/hosts file 192.168.60.147 humayun # Added by NetworkManager 127.0.0.1 localhost.localdomain localhost ::1 humayun localhost6.localdomain6 localhost6 127.0.1.1 humayun # The following lines are desirable for IPv6 capable hosts ::1 localhost ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts 192.168.60.1 master 192.168.60.2 slave
Re: Is there a good way to see how full hdfs is
/** Return the disk usage of the filesystem, including total capacity, * used space, and remaining space */ public DiskStatus getDiskStatus() throws IOException { return dfs.getDiskStatus(); } DistributedFileSystem has the above API from java API side. Regards, Uma - Original Message - From: wd w...@wdicc.com Date: Saturday, October 15, 2011 4:16 pm Subject: Re: Is there a good way to see how full hdfs is To: mapreduce-user@hadoop.apache.org hadoop dfsadmin -report On Sat, Oct 15, 2011 at 8:16 AM, Steve Lewis lordjoe2...@gmail.com wrote: We have a small cluster with HDFS running on only 8 nodes - I believe that the partition assigned to hdfs might be getting full and wonder if the web tools or java api havew a way to look at free space on hdfs -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
Re: hadoop input buffer size
I think the post below can give you more info about it. http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/ There is a nice explanation by Owen here. Regards, Uma - Original Message - From: Yang Xiaoliang yangxiaoliang2...@gmail.com Date: Wednesday, October 5, 2011 4:27 pm Subject: Re: hadoop input buffer size To: common-user@hadoop.apache.org Hi, Hadoop neither reads one line each time, nor fetches dfs.block.size worth of lines into a buffer. Actually, for TextInputFormat, it reads io.file.buffer.size bytes of text into a buffer each time; this can be seen in the hadoop source file LineReader.java 2011/10/5 Mark question markq2...@gmail.com Hello, Correct me if I'm wrong, but when a program opens n files at the same time to read from, and starts reading from each file 1 line at a time, isn't hadoop actually fetching dfs.block.size worth of lines into a buffer? and not actually one line. If this is correct, I set up my dfs.block.size = 3MB and each line takes about 650 bytes only, then I would assume the performance for reading 1-4000 lines would be the same, but it isn't ! Do you know a way to find the number of lines to be read at once? Thank you, Mark
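A hedged illustration of the knob mentioned above (not from the original thread): io.file.buffer.size is an ordinary configuration property, so a client or job can change the read buffer used by LineReader/TextInputFormat like this; the 128 KB value is just an example.

import org.apache.hadoop.conf.Configuration;

public class BufferSizeExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt("io.file.buffer.size", 128 * 1024);   // default is 4096 bytes
    System.out.println(conf.getInt("io.file.buffer.size", 4096));
  }
}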
Re: How to iterate over a hdfs folder with hadoop
Yes, the FileStatus class would be the equivalent of list(). FileStatus has the APIs isDir and getPath; these two APIs should satisfy your further usage. :-) I think the one small difference is that the FileStatus listing will be in sorted order. Regards, Uma - Original Message - From: John Conwell j...@iamjohn.me Date: Monday, October 10, 2011 8:40 pm Subject: Re: How to iterate over a hdfs folder with hadoop To: common-user@hadoop.apache.org FileStatus[] files = fs.listStatus(new Path(path)); for (FileStatus fileStatus : files) { //...do stuff here } On Mon, Oct 10, 2011 at 8:03 AM, Raimon Bosch raimon.bo...@gmail.com wrote: Hi, I'm wondering how can I browse an hdfs folder using the classes in the org.apache.hadoop.fs package. The operation that I'm looking for is 'hadoop dfs -ls' The standard file system equivalent would be: File f = new File(outputPath); if(f.isDirectory()){ String files[] = f.list(); for(String file : files){ //Do your logic } } Thanks in advance, Raimon Bosch. -- Thanks, John C
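Putting the two replies above together, here is a hedged sketch (not from the original mail) that lists a directory with listStatus() and uses isDir()/getPath() to recurse into sub-directories; the argument handling is an assumption for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLs {
  // Print every file under dir, descending into sub-directories.
  public static void ls(FileSystem fs, Path dir) throws Exception {
    FileStatus[] entries = fs.listStatus(dir);
    if (entries == null) {
      return;                                    // path does not exist
    }
    for (FileStatus status : entries) {
      if (status.isDir()) {                      // isDirectory() in newer releases
        ls(fs, status.getPath());
      } else {
        System.out.println(status.getPath() + "\t" + status.getLen());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    ls(fs, new Path(args[0]));
  }
}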
Re: Secondary namenode fsimage concept
Hi, It looks to me that the problem is with your NFS: it is not supporting locks. Which version of NFS are you using? Please check your NFS locking support by writing a simple program that locks a file. I think NFSv4 supports locking (I have not tried it). http://nfs.sourceforge.net/ A6. What are the main new features in version 4 of the NFS protocol? *NFS Versions 2 and 3 are stateless protocols, but NFS Version 4 introduces state. An NFS Version 4 client uses state to notify an NFS Version 4 server of its intentions on a file: locking, reading, writing, and so on. An NFS Version 4 server can return information to a client about what other clients have intentions on a file to allow a client to cache file data more aggressively via delegation. To help keep state consistent, more sophisticated client and server reboot recovery mechanisms are built in to the NFS Version 4 protocol. *NFS Version 4 introduces support for byte-range locking and share reservation. Locking in NFS Version 4 is lease-based, so an NFS Version 4 client must maintain contact with an NFS Version 4 server to continue extending its open and lock leases. Regards, Uma - Original Message - From: Shouguo Li the1plum...@gmail.com Date: Tuesday, October 11, 2011 2:31 am Subject: Re: Secondary namenode fsimage concept To: common-user@hadoop.apache.org hey patrick i wanted to configure my cluster to write namenode metadata to multiple directories as well: <property> <name>dfs.name.dir</name> <value>/hadoop/var/name,/mnt/hadoop/var/name</value> </property> in my case, /hadoop/var/name is a local directory, /mnt/hadoop/var/name is an NFS volume. i took down the cluster first, then copied over files from /hadoop/var/name to /mnt/hadoop/var/name, and then tried to start up the cluster. but the cluster won't start up properly... here's the namenode log: http://pastebin.com/gmu0B7yd any ideas why it wouldn't start up? thx On Thu, Oct 6, 2011 at 6:58 PM, patrick sang silvianhad...@gmail.com wrote: I would say your namenode writes metadata to the local fs (where your secondary namenode will pull files from), and the NFS mount. <property> <name>dfs.name.dir</name> <value>/hadoop/name,/hadoop/nfs_server_name</value> </property> my 0.02$ P On Thu, Oct 6, 2011 at 12:04 AM, shanmuganathan.r shanmuganatha...@zohocorp.com wrote: Hi Kai, There is no data stored in the secondary namenode related to the Hadoop cluster. Am I correct? If that is correct, then if we run the secondary namenode on a separate machine, the fetching, merging and transferring time increases when the cluster has a large namenode fsimage file. If a failover occurs at that time, how can we recover the nearly one hour of changes in the HDFS files? (the default checkpoint time is one hour) Thanks R.Shanmuganathan On Thu, 06 Oct 2011 12:20:28 +0530 Kai Voigt <k...@123.org> wrote: Hi, the secondary namenode only fetches the two files when a checkpointing is needed. Kai On 06.10.2011 at 08:45, shanmuganathan.r wrote: > Hi Kai, > > In the second part I meant: > > > Does the secondary namenode also contain the FSImage file, or are the two files (FSImage and EditLog) transferred from the namenode at checkpoint time? > > > Thanks > Shanmuganathan > > > > > > On Thu, 06 Oct 2011 11:37:50 +0530 Kai Voigt <k...@123.org> wrote: > > > Hi, > > you're correct when saying the namenode hosts the fsimage file and the edits log file. > > The fsimage file contains a snapshot of the HDFS metadata (a filename to blocks list mapping).
Whenever there is a change to HDFS, it will be appended to the edits file. Think of it as a database transaction log, where changes will not be applied to the datafile, but appended to a log. > > To prevent the edits file growing infinitely, the secondary namenode periodically pulls these two files, and the namenode starts writing changes to a new edits file. Then, the secondary namenode merges the changes from the edits file with the old snapshot from the fsimage file and creates an updated fsimage file. This updated fsimage file is then copied to the namenode. > > Then, the entire cycle starts again. To answer your question: The namenode has both files, even if the secondary namenode is running on a different machine. > > Kai > > On 06.10.2011 at 07:57, shanmuganathan.r wrote: > > > > > Hi All, > > > I have a doubt in the hadoop secondary namenode concept. Please correct me if the following statements are wrong. > > > > > The namenode hosts the fsimage and edit log files.
Re: Error using hadoop distcp
Distcp will run as mapreduce job. Here tasktrackers required the hostname mappings to contact to other nodes. Please configure the mapping correctly in both the machines and try. egards, Uma - Original Message - From: trang van anh anh...@vtc.vn Date: Wednesday, October 5, 2011 1:41 pm Subject: Re: Error using hadoop distcp To: common-user@hadoop.apache.org which host run the task that throws the exception ? ensure that each data node know another data nodes in hadoop cluster- add ub16 entry in /etc/hosts on where the task running. On 10/5/2011 12:15 PM, praveenesh kumar wrote: I am trying to use distcp to copy a file from one HDFS to another. But while copying I am getting the following exception : hadoop distcp hdfs://ub13:54310/user/hadoop/weblog hdfs://ub16:54310/user/hadoop/weblog 11/10/05 10:41:01 INFO mapred.JobClient: Task Id : attempt_201110031447_0005_m_07_0, Status : FAILED java.net.UnknownHostException: unknown host: ub16 at org.apache.hadoop.ipc.Client$Connection.init(Client.java:195) at org.apache.hadoop.ipc.Client.getConnection(Client.java:850) at org.apache.hadoop.ipc.Client.call(Client.java:720) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at $Proxy1.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359) at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113) at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:215) at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:177) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48) at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124) at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296) at org.apache.hadoop.mapred.Child.main(Child.java:170) Its saying its not finding ub16. But the entry is there in /etc/hosts files. I am able to ssh both the machines. Do I need password less ssh between these two NNs ? What can be the issue ? Any thing I am missing before using distcp ? Thanks, Praveenesh
Re: ERROR 1066: Unable to open iterator for alias A. Backend error : Could not obtain block:
Hello Kiran, Can you check whether that block is present in the DN, and check the generation timestamp in the metafile (if you are aware of it)? Can you grep for blk_-8354424441116992221 in your logs and paste the result here? We have seen this when recovery is in progress and the file is read in parallel (in 0.20x versions). If this problem is because of recovery, you should be able to read the file on the next attempt. We can make out the scenario based on your grep result from the logs. Thanks & Regards, Uma - Original Message - From: kiranprasad kiranprasa...@imimobile.com Date: Friday, October 7, 2011 2:18 pm Subject: ERROR 1066: Unable to open iterator for alias A. Backend error : Could not obtain block: To: hdfs-user@hadoop.apache.org Hi I've checked with the below mentioned command and I am getting [kiranprasad.g@pig4 hadoop-0.20.2]$ bin/hadoop fs -text /data/arpumsisdn.txt | tail 11/10/07 16:17:18 INFO hdfs.DFSClient: No node available for block: blk_-8354424441116992221_1060 file=/data/arpumsisdn.txt 11/10/07 16:17:18 INFO hdfs.DFSClient: Could not obtain block blk_-8354424441116992221_1060 from any node: java.io.IOException: No live nodes contain current block 11/10/07 16:17:21 INFO hdfs.DFSClient: No node available for block: blk_-8354424441116992221_1060 file=/data/arpumsisdn.txt 11/10/07 16:17:21 INFO hdfs.DFSClient: Could not obtain block blk_-8354424441116992221_1060 from any node: java.io.IOException: No live nodes contain current block 11/10/07 16:17:25 INFO hdfs.DFSClient: No node available for block: blk_-8354424441116992221_1060 file=/data/arpumsisdn.txt 11/10/07 16:17:25 INFO hdfs.DFSClient: Could not obtain block blk_-8354424441116992221_1060 from any node: java.io.IOException: No live nodes contain current block 11/10/07 16:17:29 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-8354424441116992221_1060 file=/data/arpumsisdn.txt at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1695) at java.io.DataInputStream.readShort(DataInputStream.java:295) at org.apache.hadoop.fs.FsShell.forMagic(FsShell.java:397) at org.apache.hadoop.fs.FsShell.access$200(FsShell.java:49) at org.apache.hadoop.fs.FsShell$2.process(FsShell.java:420) at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1898) at org.apache.hadoop.fs.FsShell.text(FsShell.java:414) at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1563) at org.apache.hadoop.fs.FsShell.run(FsShell.java:1763) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880) text: Could not obtain block: blk_-8354424441116992221_1060 file=/data/arpumsisdn.txt The block is not available. How do I recover the data block? -Original Message- From: Alex Rovner Sent: Wednesday, October 05, 2011 5:55 PM To: u...@pig.apache.org Subject: Re: ERROR 1066: Unable to open iterator for alias A. Backend error : Could not obtain block: You can also test quickly if that's the issue by running the following command: hadoop fs -text /data/arpumsisdn.txt | tail On Wed, Oct 5, 2011 at 8:24 AM, Alex Rovner alexrov...@gmail.com wrote: Kiran, This looks like your HDFS is missing some blocks. Can you run fsck and see if you have missing blocks and if so for what files?
http://hadoop.apache.org/common/docs/r0.17.2/hdfs_user_guide.html#Fsck Alex On Tue, Oct 4, 2011 at 7:53 AM, kiranprasad kiranprasa...@imimobile.com wrote: I am getting the below exception when trying to execute a Pig Latin script. Failed! Failed Jobs: JobId Alias Feature Message Outputs job_201110042009_0005 A MAP_ONLY Message: Job failed! hdfs://10.0.0.61/tmp/temp1751671187/tmp-592386019, Input(s): Failed to read data from /data/arpumsisdn.txt Output(s): Failed to produce result in hdfs://10.0.0.61/tmp/temp1751671187/tmp-592386019 Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_201110042009_0005 2011-10-04 22:13:53,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2011-10-04 22:13:53,745 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A. Backend error : Could not obtain block: blk_-8354424441116992221_1060 file=/data/arpumsisdn.txt Details at logfile:
Re: FileSystem closed
FileSystem objects are cached in the JVM. When it tries to get the FS object by using FileSystem.get(..) (SequenceFile will use it internally), it will return the same fs object if the scheme and authority of the URI are the same. The fs cache Key's equals implementation is below: static boolean isEqual(Object a, Object b) { return a == b || (a != null && a.equals(b)); } /** {@inheritDoc} */ public boolean equals(Object obj) { if (obj == this) { return true; } if (obj != null && obj instanceof Key) { Key that = (Key)obj; return isEqual(this.scheme, that.scheme) && isEqual(this.authority, that.authority) && isEqual(this.ugi, that.ugi) && (this.unique == that.unique); } return false; } I think here some of your files' URIs and schemes are the same, so they got the same fs object. When the first one is closed, the other will definitely get this exception. Regards, Uma - Original Message - From: Joey Echeverria j...@cloudera.com Date: Thursday, September 29, 2011 10:34 pm Subject: Re: FileSystem closed To: common-user@hadoop.apache.org Do you close your FileSystem instances at all? IIRC, the FileSystem instance you use is a singleton and if you close it once, it's closed for everybody. My guess is you close it in your cleanup method and you have JVM reuse turned on. -Joey On Thu, Sep 29, 2011 at 12:49 PM, Mark question markq2...@gmail.com wrote: Hello, I'm running 100 mappers sequentially on a single machine, where each mapper opens 100 files at the beginning, then reads them one by one sequentially and closes each after it is done. After executing 6 mappers, the 7th gives this error: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:297) at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:426) at java.io.FilterInputStream.close(FilterInputStream.java:155) at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1653) at Mapper_Reader20HM4.CleanUp(Mapper_Reader20HM4.java:124) at BFMapper20HM9.close(BFMapper20HM9.java:264) at BFMapRunner20HM9.run(BFMapRunner20HM9.java:95) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:397) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) at org.apache.hadoop.mapred.Child$4.run(Child.java:217) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742) at org.apache.hadoop.mapred.Child.main(Child.java:211) java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:297) at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:426) at java.io.FilterInputStream.close(FilterInputStream.java:155) at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1653) at Mapper_Reader20HM4.CleanUp(Mapper_Reader20HM4.java:124) at BFMapper20HM9.close(BFMapper20HM9.java:264) at BFMapRunner20HM9.run(BFMapRunner20HM9.java:95) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:397) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) at org.apache.hadoop.mapred.Child$4.run(Child.java:217) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742) at org.apache.hadoop.mapred.Child.main(Child.java:211) java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:297) at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:426) at
java.io.FilterInputStream.close(FilterInputStream.java:155) at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1653) at Mapper_Reader20HM4.CleanUp(Mapper_Reader20HM4.java:124) at BFMapper20HM9.close(BFMapper20HM9.java:264) at BFMapRunner20HM9.run(BFMapRunner20HM9.java:95) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:397) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) at org.apache.hadoop.mapred.Child$4.run(Child.java:217) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742) at org.apache.hadoop.mapred.Child.main(Child.java:211) java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:297) at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:426) at
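The cache behaviour described above is easy to reproduce outside MapReduce. The following is a minimal, hypothetical sketch (not code from this thread); the NameNode URI and path are made up:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI uri = URI.create("hdfs://namenode-host:9000/"); // hypothetical NameNode address

    // Same scheme, authority and user => FileSystem.get() returns the cached instance.
    FileSystem fs1 = FileSystem.get(uri, conf);
    FileSystem fs2 = FileSystem.get(uri, conf);
    System.out.println(fs1 == fs2); // prints true: both references point to one object

    // Closing either reference closes the shared instance, so any later use of it
    // (for example by a SequenceFile.Reader that fetched it internally) fails with
    // "java.io.IOException: Filesystem closed".
    fs1.close();
    fs2.exists(new Path("/tmp")); // throws IOException: Filesystem closed
  }
}

Along the lines of Joey's remark, one workaround is simply not to close the shared instance in each mapper's cleanup and let it be closed once when the JVM exits.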
Re: Block Size
Hi, here is some useful info: A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you’re storing small files, then you probably have lots of them (otherwise you wouldn’t turn to Hadoop), and the problem is that HDFS can’t handle lots of files. Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible. Furthermore, HDFS is not geared up to efficiently accessing small files: it is primarily designed for streaming access of large files. Reading through small files normally causes lots of seeks and lots of hopping from datanode to datanode to retrieve each small file, all of which is an inefficient data access pattern. Problems with small files and MapReduce: Map tasks usually process a block of input at a time (using the default FileInputFormat). If the file is very small and there are a lot of them, then each map task processes very little input, and there are a lot more map tasks, each of which imposes extra bookkeeping overhead. Compare a 1GB file broken into 16 64MB blocks, and 10,000 or so 100KB files. The 10,000 files use one map each, and the job time can be tens or hundreds of times slower than the equivalent one with a single input file. There are a couple of features to help alleviate the bookkeeping overhead: task JVM reuse for running multiple map tasks in one JVM, thereby avoiding some JVM startup overhead (see the mapred.job.reuse.jvm.num.tasks property), and MultiFileInputSplit which can run more than one split per map. Just copied from Cloudera's blog: http://www.cloudera.com/blog/2009/02/the-small-files-problem/#comments Regards, Uma - Original Message - From: lessonz less...@q.com Date: Thursday, September 29, 2011 11:10 pm Subject: Block Size To: common-user common-user@hadoop.apache.org I'm new to Hadoop, and I'm trying to understand the implications of a 64M block size in the HDFS. Is there a good reference that enumerates the implications of this decision and its effects on files stored in the system as well as map-reduce jobs? Thanks.
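As a small illustration of the JVM-reuse mitigation mentioned above (this is not from the original mail; the class name is hypothetical), the old mapred API exposes the mapred.job.reuse.jvm.num.tasks property directly on JobConf:

import org.apache.hadoop.mapred.JobConf;

public class JvmReuseExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf(JvmReuseExample.class);
    // Reuse each task JVM for an unlimited number of this job's map tasks (-1 = no limit).
    // This only trims JVM startup overhead; it does not reduce the number of map tasks
    // that many small files generate.
    conf.setNumTasksToExecutePerJvm(-1);
    // Equivalent to setting the property mentioned above:
    // conf.set("mapred.job.reuse.jvm.num.tasks", "-1");
  }
}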
Re: How to run Hadoop in standalone mode in Windows
Java 6 and Cygwin (Maven + TortoiseSVN are only needed for building Hadoop) should be enough for running standalone mode on Windows. Regards, Uma - Original Message - From: Mark Kerzner markkerz...@gmail.com Date: Saturday, September 24, 2011 4:58 am Subject: How to run Hadoop in standalone mode in Windows To: common-user@hadoop.apache.org Hi, I have Cygwin, and I have NetBeans, and I have a Maven Hadoop project that works on Linux. How do I combine them to work in Windows? Thank you, Mark
Re: HDFS file into Blocks
@Kartheek, Great :-) - Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Monday, September 26, 2011 12:06 pm Subject: Re: HDFS file into Blocks To: common-user@hadoop.apache.org @Uma, Thanks a lot! I have found the flow... Thanks, Kartheek. On Mon, Sep 26, 2011 at 10:03 AM, He Chen airb...@gmail.com wrote: Hi It is interesting that a guy from Huawei is also working on the Hadoop project. :) Chen On Sun, Sep 25, 2011 at 11:29 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Hi, You can find the code in DFSOutputStream.java. There is one thread, the DataStreamer thread. This thread picks the packets from the dataQueue and writes them onto the sockets. Before this, when actually writing the chunks, based on the block size parameter passed from the client, it sets the last-packet parameter in the Packet. If the streamer thread finds that this is the last packet of the block, it ends the block. That means it will close the sockets which were used for writing the block. The streamer thread repeats the loop; when it finds there are no sockets open, it will again create the pipeline for the next block. Go through the flow from writeChunk in DFSOutputStream.java, which is where the packets are enqueued into the dataQueue. Regards, Uma - Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Sunday, September 25, 2011 11:06 am Subject: HDFS file into Blocks To: common-user@hadoop.apache.org Hi all, I am working around the code to understand where HDFS divides a file into blocks. Can anyone point me to this section of the code? Thanks, Kartheek
Re: Too many fetch failures. Help!
Hello Abdelrahman, Are you able to ping from one machine to the other with the configured hostname? Configure both hostnames properly in the /etc/hosts file and try. Regards, Uma - Original Message - From: Abdelrahman Kamel abdouka...@gmail.com Date: Monday, September 26, 2011 8:47 pm Subject: Too many fetch failures. Help! To: common-user@hadoop.apache.org Hi, This is my first post here. I'm new to Hadoop. I've already installed Hadoop on 2 Ubuntu boxes (one is both master and slave and the other is only a slave). When I run a Wordcount example on 5 small txt files, the process never completes and I get a Too many fetch failures error on my terminal. If you can help me, I can post my terminal's output and any log files needed. Great thanks. -- Abdelrahman Kamel
Re: Hadoop java mapper -copyFromLocal heap size error
Hello Joris, It looks like you have configured mapred.map.child.java.opts to -Xmx512M; to spawn a child process, that much memory is required. Can you check which other processes occupy memory on your machine? Because your current task is not getting enough memory to initialize. Or try to reduce mapred.map.child.java.opts to 256M, if your map task can execute with that much memory. Regards, Uma - Original Message - From: Joris Poort gpo...@gmail.com Date: Saturday, September 24, 2011 5:50 am Subject: Hadoop java mapper -copyFromLocal heap size error To: mapreduce-user mapreduce-user@hadoop.apache.org As part of my Java mapper I have a command that executes some code on the local node and copies a local output file to the hadoop fs. Unfortunately I'm getting the following output: Error occurred during initialization of VM Could not reserve enough space for object heap I've tried adjusting mapred.map.child.java.opts to -Xmx512M, but unfortunately no luck. When I ssh into the node, I can run the -copyFromLocal command without any issues. The output files are also quite small, around 100kb. Any help would be greatly appreciated! Cheers, Joris
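A hypothetical client-side sketch of adjusting the heap discussed above; the property name is the one used in this thread, but exact names differ between Hadoop versions (older releases only have mapred.child.java.opts), so verify against your release:

import org.apache.hadoop.mapred.JobConf;

public class MapHeapExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf(MapHeapExample.class);
    // Give each spawned map-task JVM a smaller heap so the node has room for the task,
    // the -copyFromLocal child process, and everything else already running there.
    conf.set("mapred.map.child.java.opts", "-Xmx256m");
  }
}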
Re: HDFS file into Blocks
Hi, You can find the code in DFSOutputStream.java. There is one thread, the DataStreamer thread. This thread picks the packets from the dataQueue and writes them onto the sockets. Before this, when actually writing the chunks, based on the block size parameter passed from the client, it sets the last-packet parameter in the Packet. If the streamer thread finds that this is the last packet of the block, it ends the block. That means it will close the sockets which were used for writing the block. The streamer thread repeats the loop; when it finds there are no sockets open, it will again create the pipeline for the next block. Go through the flow from writeChunk in DFSOutputStream.java, which is where the packets are enqueued into the dataQueue. Regards, Uma - Original Message - From: kartheek muthyala kartheek0...@gmail.com Date: Sunday, September 25, 2011 11:06 am Subject: HDFS file into Blocks To: common-user@hadoop.apache.org Hi all, I am working around the code to understand where HDFS divides a file into blocks. Can anyone point me to this section of the code? Thanks, Kartheek
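For readers tracing the code, a hedged client-side sketch (not from this thread) of where the per-file block size used by DFSOutputStream's DataStreamer comes from; the path and sizes are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // The block size passed here (32 MB in this hypothetical call) is what the client
    // write path uses to decide when a packet is the last one of a block; at that point
    // the DataStreamer ends the block, closes the pipeline sockets, and a new pipeline
    // is set up for the next block.
    long blockSize = 32L * 1024 * 1024;
    FSDataOutputStream out = fs.create(new Path("/tmp/blocksize-demo.txt"),
        true,       // overwrite
        4096,       // buffer size
        (short) 3,  // replication
        blockSize);
    out.writeBytes("hello hdfs\n");
    out.close();
    fs.close();
  }
}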
Re: RE: Making Mumak work with capacity scheduler
Yes Devaraj, from the logs it looks like it failed to create /jobtracker/jobsInfo. Code snippet: if (!fs.exists(path)) { if (!fs.mkdirs(path, new FsPermission(JOB_STATUS_STORE_DIR_PERMISSION))) { throw new IOException("CompletedJobStatusStore mkdirs failed to create " + path.toString()); } @Arun, can you check that you have the correct permissions, as Devaraj said? 2011-09-22 15:53:57.598::INFO: Started SelectChannelConnector@0.0.0.0:50030 11/09/22 15:53:57 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 11/09/22 15:53:57 WARN conf.Configuration: mapred.task.cache.levels is deprecated. Instead, use mapreduce.jobtracker.taskcache.levels 11/09/22 15:53:57 WARN mapred.SimulatorJobTracker: Error starting tracker: java.io.IOException: CompletedJobStatusStore mkdirs failed to create /jobtracker/jobsInfo at org.apache.hadoop.mapred.CompletedJobStatusStore.init(CompletedJobStatusStore.java:83) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:4684) at org.apache.hadoop.mapred.SimulatorJobTracker.init(SimulatorJobTracker.java:81) at org.apache.hadoop.mapred.SimulatorJobTracker.startTracker(SimulatorJobTracker.java:100) at org.apache.hadoop.mapred.SimulatorEngine.init(SimulatorEngine.java:210) at org.apache.hadoop.mapred.SimulatorEngine.init(SimulatorEngine.java:184) at org.apache.hadoop.mapred.SimulatorEngine.run(SimulatorEngine.java:292) at org.apache.hadoop.mapred.SimulatorEngine.run(SimulatorEngine.java:323) I cc'ed the Mapreduce user mailing list as well. Regards, Uma - Original Message - From: Devaraj K devara...@huawei.com Date: Thursday, September 22, 2011 6:01 pm Subject: RE: Making Mumak work with capacity scheduler To: common-u...@hadoop.apache.org Hi Arun, I have gone through the logs. The Mumak simulator is trying to start the job tracker, and the job tracker is failing to start because it is not able to create the /jobtracker/jobsinfo directory. I think the directory doesn't have enough permissions. Please check the permissions or any other reason why it is failing to create the dir. Devaraj K -Original Message- From: arun k [mailto:arunk...@gmail.com] Sent: Thursday, September 22, 2011 3:57 PM To: common-u...@hadoop.apache.org Subject: Re: Making Mumak work with capacity scheduler Hi Uma ! You got me right! Actually, without any patch, when I modified the appropriate mapred-site.xml and capacity-scheduler.xml and copied the capacity jar accordingly, I am able to see queues in the JobTracker GUI, but both queues show the same set of jobs executing. I ran with the trace and topology files from test/data: $bin/mumak.sh trace_file topology_file Is it because I am not submitting jobs to a particular queue? If so, how can I do it? Got hadoop-0.22 from http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.22/ and built all three components, but when I give arun@arun-Presario-C500-RU914PA-ACJ:~/hadoop22/branch-0.22/mapreduce/src/contrib/mumak$ bin/mumak.sh src/test/data/19-jobs.trace.json.gz src/test/data/19-jobs.topology.json.gz it gets stuck at some point. Log is here http://pastebin.com/9SNUHLFy Thanks, Arun On Wed, Sep 21, 2011 at 2:03 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Hello Arun, If you want to apply MAPREDUCE-1253 on the 21 version, applying the patch directly using commands may not work because of codebase changes. So, take the patch and apply the lines in your code base manually. I am not sure of any other way to do this. Did I understand your intention wrongly?
Regards, Uma - Original Message - From: ArunKumar arunk...@gmail.com Date: Wednesday, September 21, 2011 1:52 pm Subject: Re: Making Mumak work with capacity scheduler To: hadoop-u...@lucene.apache.org Hi Uma ! Mumak is not part of the stable versions yet. It comes in from Hadoop-0.21 onwards. Can you describe in detail what "You may need to merge them logically (back port them)" means? I don't get it. Arun On Wed, Sep 21, 2011 at 12:07 PM, Uma Maheswara Rao G [via Lucene] ml-node+s472066n3354668...@n3.nabble.com wrote: Looks like those patches are based on the 0.22 version, so you cannot apply them directly. You may need to merge them logically (back port them). One more point to note here: the 0.21 version of hadoop is not a stable version. Presently the 0.20xx versions are stable. Regards, Uma - Original Message - From: ArunKumar [hidden email] Date: Wednesday, September 21, 2011 12:01 pm Subject: Re: Making Mumak work with capacity scheduler To: [hidden email] Hi Uma ! I am applying the patch to mumak in hadoop-0.21
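To double-check the permission theory, here is a hypothetical sketch (not from the thread) that performs the same mkdirs call as the CompletedJobStatusStore snippet quoted above, run as the user who starts mumak; the permission value is only illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class JobsInfoDirCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/jobtracker/jobsInfo"); // path from the error above

    // Same style of check the status store performs: if mkdirs returns false for the
    // user running mumak, the parent directory's permissions are the likely culprit.
    if (!fs.exists(path) && !fs.mkdirs(path, new FsPermission((short) 0700))) {
      System.err.println("mkdirs failed for " + path + " - check parent permissions");
    }
  }
}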
Re: Can we replace namenode machine with some other machine ?
In the NN many daemons will run: replicating blocks from one DN to another DN when there are not enough replicas, SafeMode monitoring, the LeaseManager, heartbeat monitoring, IPC handlers, etc., and it will also maintain the blocks-to-machine-list mappings in memory. In the JT there are also many daemons like this. If you are not dealing with a very large number of files, then a normal configuration is enough. But you should configure enough memory for running the NN and JT. This always comes down to your usage. For a better understanding, I would suggest you go through Hadoop: The Definitive Guide. All these details have been documented very well. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 11:45 am Subject: Re: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org But apart from storing metadata info, is there anything more the NN/JT machines are doing? So can I say I can survive with a poor NN if I am not dealing with lots of files in HDFS? On Thu, Sep 22, 2011 at 11:08 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Just changing the configs will not affect your data. You need to restart your DNs to connect to the new NN. For the second question: it again depends on your usage. If you have more files in DFS, then the NN will consume more memory, as it needs to store all the metadata info of the files in the NameSpace. If your files grow more and more, then it is recommended that you don't put the NN and JT on the same machine. Coming to the DN case: the configured space will be used for storing the block files. Once that space is filled, the NN will not select this DN for further writes. So one DN having less space is fine, whereas less space for the NN is a problem in big clusters. If you configure DNs with a very good amount of space but the NN has too little space to store your files' metadata info, then it is of no use to have more space in the DNs, right :-) Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 10:42 am Subject: Re: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org If I just change the configuration settings in the slave machines, will it affect any of the data that is currently residing in the cluster? And my second question was... does the master node (the NN/JT hosting machine) need a better configuration than our slave machines (the DN/TT hosting machines)? Actually my master node is a weaker machine than my slave machines, because I am assuming that the master machine does not do much additional work, and it's okay to have a weak machine as the master. Now I have a new big server machine just added to my cluster. So I am thinking: shall I make this new machine my new master (NN/JT) or just add it as a slave? Thanks, Praveenesh On Thu, Sep 22, 2011 at 10:20 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: You copy the same installation to the new machine and change the IP address. After that, configure the new NN address for your clients and DNs. Also Does Namenode/JobTracker machine's configuration needs to be better than datanodes/tasktracker's ?? I did not get this question. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 10:13 am Subject: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org Hi all, Can we replace our namenode machine later with some other machine?
Actually I got a new server machine for my cluster and now I want to make this machine my new namenode and jobtracker node. Also, does the Namenode/JobTracker machine's configuration need to be better than the datanodes'/tasktrackers'? How can I achieve this target with the least overhead? Thanks, Praveenesh
Re: Making Mumak work with capacity scheduler
Hello Arun, On which code base are you trying to apply the patch? The code should match for the patch to apply. Regards, Uma - Original Message - From: ArunKumar arunk...@gmail.com Date: Wednesday, September 21, 2011 11:33 am Subject: Making Mumak work with capacity scheduler To: hadoop-u...@lucene.apache.org Hi ! I have set up mumak and am able to run it in the terminal and in eclipse. I have modified the mapred-site.xml and capacity-scheduler.xml as necessary. I tried to apply the patch MAPREDUCE-1253-20100804.patch in https://issues.apache.org/jira/browse/MAPREDUCE-1253 as follows: {HADOOP_HOME}contrib/mumak$ patch -p0 patch_file_location but I get the error 3 out of 3 HUNK failed. Thanks, Arun
Re: RE: RE: java.io.IOException: Incorrect data format
I would suggest you clean up some space and try. Regards, Uma - Original Message - From: Peng, Wei wei.p...@xerox.com Date: Wednesday, September 21, 2011 10:03 am Subject: RE: RE: java.io.IOException: Incorrect data format To: common-user@hadoop.apache.org Yes, I can. The datanode is not able to start after crashing without enough HD space. Wei -Original Message- From: Uma Maheswara Rao G 72686 [mailto:mahesw...@huawei.com] Sent: Tuesday, September 20, 2011 9:30 PM To: common-user@hadoop.apache.org Subject: Re: RE: java.io.IOException: Incorrect data format Are you able to create the directory manually on the DataNode machine? #mkdirs /state/partition2/hadoop/dfs/tmp Regards, Uma - Original Message - From: Peng, Wei wei.p...@xerox.com Date: Wednesday, September 21, 2011 9:44 am Subject: RE: java.io.IOException: Incorrect data format To: common-user@hadoop.apache.org I modified edits so that the hadoop namenode restarted; however, I could not start my datanode. The datanode log shows 2011-09-20 21:07:10,068 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Mkdirs failed to create /state/partition2/hadoop/dfs/tmp at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.init(FSDataset.java:394) at org.apache.hadoop.hdfs.server.datanode.FSDataset.init(FSDataset.java:894) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:318) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:232) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1363) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1318) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1326) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1448) Wei -Original Message- From: Uma Maheswara Rao G 72686 [mailto:mahesw...@huawei.com] Sent: Tuesday, September 20, 2011 9:10 PM To: common-user@hadoop.apache.org Subject: Re: java.io.IOException: Incorrect data format Can you check what the value of the command 'df -h' is on the NN machine? I think one more possibility could be that the image was corrupted while it was being saved. To avoid such cases this has already been handled in trunk. For more details see https://issues.apache.org/jira/browse/HDFS-1594 Regards, Uma - Original Message - From: Peng, Wei wei.p...@xerox.com Date: Wednesday, September 21, 2011 9:01 am Subject: java.io.IOException: Incorrect data format To: common-user@hadoop.apache.org I was not able to restart my name server because the name server ran out of space. Then I adjusted dfs.datanode.du.reserved to 0, and used tune2fs -m to get some space, but I still could not restart the name node. I got the following error: java.io.IOException: Incorrect data format. logVersion is -18 but writables.length is 0. Does anyone know how to resolve this issue? Best, Wei
Re: Making Mumak work with capacity scheduler
Looks like those patches are based on the 0.22 version, so you cannot apply them directly. You may need to merge them logically (back port them). One more point to note here: the 0.21 version of hadoop is not a stable version. Presently the 0.20xx versions are stable. Regards, Uma - Original Message - From: ArunKumar arunk...@gmail.com Date: Wednesday, September 21, 2011 12:01 pm Subject: Re: Making Mumak work with capacity scheduler To: hadoop-u...@lucene.apache.org Hi Uma ! I am applying the patch to mumak in the hadoop-0.21 version. Arun On Wed, Sep 21, 2011 at 11:55 AM, Uma Maheswara Rao G [via Lucene] ml-node+s472066n3354652...@n3.nabble.com wrote: Hello Arun, On which code base are you trying to apply the patch? The code should match for the patch to apply. Regards, Uma - Original Message - From: ArunKumar [hidden email] Date: Wednesday, September 21, 2011 11:33 am Subject: Making Mumak work with capacity scheduler To: [hidden email] Hi ! I have set up mumak and am able to run it in the terminal and in eclipse. I have modified the mapred-site.xml and capacity-scheduler.xml as necessary. I tried to apply the patch MAPREDUCE-1253-20100804.patch in https://issues.apache.org/jira/browse/MAPREDUCE-1253 as follows: {HADOOP_HOME}contrib/mumak$ patch -p0 patch_file_location but I get the error 3 out of 3 HUNK failed. Thanks, Arun
Re: Making Mumak work with capacity scheduler
Hello Arun, If you want to apply MAPREDUCE-1253 on the 21 version, applying the patch directly using commands may not work because of codebase changes. So, take the patch and apply the lines in your code base manually. I am not sure of any other way to do this. Did I understand your intention wrongly? Regards, Uma - Original Message - From: ArunKumar arunk...@gmail.com Date: Wednesday, September 21, 2011 1:52 pm Subject: Re: Making Mumak work with capacity scheduler To: hadoop-u...@lucene.apache.org Hi Uma ! Mumak is not part of the stable versions yet. It comes in from Hadoop-0.21 onwards. Can you describe in detail what "You may need to merge them logically (back port them)" means? I don't get it. Arun On Wed, Sep 21, 2011 at 12:07 PM, Uma Maheswara Rao G [via Lucene] ml-node+s472066n3354668...@n3.nabble.com wrote: Looks like those patches are based on the 0.22 version, so you cannot apply them directly. You may need to merge them logically (back port them). One more point to note here: the 0.21 version of hadoop is not a stable version. Presently the 0.20xx versions are stable. Regards, Uma - Original Message - From: ArunKumar [hidden email] Date: Wednesday, September 21, 2011 12:01 pm Subject: Re: Making Mumak work with capacity scheduler To: [hidden email] Hi Uma ! I am applying the patch to mumak in the hadoop-0.21 version. Arun On Wed, Sep 21, 2011 at 11:55 AM, Uma Maheswara Rao G [via Lucene] [hidden email] wrote: Hello Arun, On which code base are you trying to apply the patch? The code should match for the patch to apply. Regards, Uma - Original Message - From: ArunKumar [hidden email] Date: Wednesday, September 21, 2011 11:33 am Subject: Making Mumak work with capacity scheduler To: [hidden email] Hi ! I have set up mumak and am able to run it in the terminal and in eclipse. I have modified the mapred-site.xml and capacity-scheduler.xml as necessary. I tried to apply the patch MAPREDUCE-1253-20100804.patch in https://issues.apache.org/jira/browse/MAPREDUCE-1253 as follows: {HADOOP_HOME}contrib/mumak$ patch -p0 patch_file_location but I get the error 3 out of 3 HUNK failed. Thanks, Arun
Re: Any other way to copy to HDFS ?
Hi, You need not copy the files to the NameNode. Hadoop provides client code as well to copy the files. To copy the files from another node (non-dfs), you need to put the hadoop**.jar's on the classpath and use the below code snippet: FileSystem fs = new DistributedFileSystem(); fs.initialize(NAMENODE_URI, configuration); fs.copyFromLocalFile(srcPath, dstPath); Using this API, you can copy the files from any machine. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Wednesday, September 21, 2011 2:14 pm Subject: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Guys, As far as I know hadoop, I think, to copy files to HDFS, they first need to be copied to the NameNode's local filesystem. Is that right? So does it mean that even if I have a hadoop cluster of 10 nodes with an overall capacity of 6TB, but my NameNode's hard disk capacity is 500 GB, I cannot copy any file to HDFS greater than 500 GB? Is there any other way to copy directly to HDFS without copying the file to the namenode's local filesystem? What can be other ways to copy large files greater than the namenode's disk capacity? Thanks, Praveenesh.
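A slightly fuller, self-contained version of the snippet above, as a hedged sketch: it obtains the client through FileSystem.get() rather than constructing DistributedFileSystem directly, and the NameNode URI and paths are made up:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
  public static void main(String[] args) throws Exception {
    // Hypothetical NameNode address; use your cluster's fs.default.name value.
    URI nameNodeUri = URI.create("hdfs://namenode-host:9000/");
    Configuration conf = new Configuration();

    // Obtain an HDFS client; this can run on any machine that has the hadoop jars
    // and network access to the NameNode and DataNodes.
    FileSystem fs = FileSystem.get(nameNodeUri, conf);

    Path src = new Path("/local/path/bigfile.dat");       // file on this machine's disk
    Path dst = new Path("/user/praveenesh/bigfile.dat");  // destination in HDFS
    fs.copyFromLocalFile(src, dst);
    fs.close();
  }
}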
Re: Any other way to copy to HDFS ?
For more understanding of the flows, I would recommend you go through the below docs once: http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace Regards, Uma - Original Message - From: Uma Maheswara Rao G 72686 mahesw...@huawei.com Date: Wednesday, September 21, 2011 2:36 pm Subject: Re: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Hi, You need not copy the files to the NameNode. Hadoop provides client code as well to copy the files. To copy the files from another node (non-dfs), you need to put the hadoop**.jar's on the classpath and use the below code snippet: FileSystem fs = new DistributedFileSystem(); fs.initialize(NAMENODE_URI, configuration); fs.copyFromLocalFile(srcPath, dstPath); Using this API, you can copy the files from any machine. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Wednesday, September 21, 2011 2:14 pm Subject: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Guys, As far as I know hadoop, I think, to copy files to HDFS, they first need to be copied to the NameNode's local filesystem. Is that right? So does it mean that even if I have a hadoop cluster of 10 nodes with an overall capacity of 6TB, but my NameNode's hard disk capacity is 500 GB, I cannot copy any file to HDFS greater than 500 GB? Is there any other way to copy directly to HDFS without copying the file to the namenode's local filesystem? What can be other ways to copy large files greater than the namenode's disk capacity? Thanks, Praveenesh.
Re: Any other way to copy to HDFS ?
When you start the NameNode on a Linux machine, it will listen on one address. You can configure that address in the NameNode by using fs.default.name. From the clients, you can give this address to connect to your NameNode. The initialize API takes a URI and a configuration. Assume your NameNode is running on hdfs://10.18.52.63:9000. Then you can connect to your NameNode like below: FileSystem fs = new DistributedFileSystem(); fs.initialize(new URI("hdfs://10.18.52.63:9000/"), new Configuration()); Please go through the docs mentioned below; you will get more understanding. Regarding "if I want to copy data from windows machine to namenode machine?": in DFS the namenode is responsible only for the namespace. In simple words, to understand the flow quickly: clients ask the NameNode to give some DNs to copy the data to. Then the NN creates the file entry in the namespace and also returns the block entries based on the client request. Then the clients connect directly to the DNs and copy the data. Reading the data back works the same way. I hope you will understand better now :-) Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Wednesday, September 21, 2011 3:11 pm Subject: Re: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org So I want to copy the file from a windows machine to the linux namenode. How can I define NAMENODE_URI in the code you mention, if I want to copy data from the windows machine to the namenode machine? Thanks, Praveenesh On Wed, Sep 21, 2011 at 2:37 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: For more understanding of the flows, I would recommend you go through the below docs once: http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace Regards, Uma - Original Message - From: Uma Maheswara Rao G 72686 mahesw...@huawei.com Date: Wednesday, September 21, 2011 2:36 pm Subject: Re: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Hi, You need not copy the files to the NameNode. Hadoop provides client code as well to copy the files. To copy the files from another node (non-dfs), you need to put the hadoop**.jar's on the classpath and use the below code snippet: FileSystem fs = new DistributedFileSystem(); fs.initialize(NAMENODE_URI, configuration); fs.copyFromLocalFile(srcPath, dstPath); Using this API, you can copy the files from any machine. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Wednesday, September 21, 2011 2:14 pm Subject: Any other way to copy to HDFS ? To: common-user@hadoop.apache.org Guys, As far as I know hadoop, I think, to copy files to HDFS, they first need to be copied to the NameNode's local filesystem. Is that right? So does it mean that even if I have a hadoop cluster of 10 nodes with an overall capacity of 6TB, but my NameNode's hard disk capacity is 500 GB, I cannot copy any file to HDFS greater than 500 GB? Is there any other way to copy directly to HDFS without copying the file to the namenode's local filesystem? What can be other ways to copy large files greater than the namenode's disk capacity? Thanks, Praveenesh.
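And a hedged variant for the Windows-client case discussed above, pointing the client at the NameNode through fs.default.name; the address is the one used in this thread, while the file paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WindowsClientCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // NameNode address from the explanation above.
    conf.set("fs.default.name", "hdfs://10.18.52.63:9000");

    FileSystem fs = FileSystem.get(conf);
    // The client contacts the NameNode only for namespace operations; the bytes
    // themselves flow straight from this machine to the DataNodes.
    fs.copyFromLocalFile(new Path("C:/data/input.txt"),           // hypothetical local file
                         new Path("/user/praveenesh/input.txt")); // destination in HDFS
    fs.close();
  }
}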