Re: rack awareness unexpected behaviour
I've checked it out and it works like that. The problem is, if the two racks don't have the same capacity, one will have its disk space filled up much faster than the other (which is what I'm seeing). If one rack (rack A) has 2 servers with 8 cores and 4 reduce slots each, and the other rack (rack B) has 2 servers with 16 cores and 8 reduce slots each, rack A will get filled up faster because rack B is writing more (it has more reduce slots). Could a solution be to modify the bash script used to decide where each replica of a block is written? It would use probability, giving rack B double the chance to receive the write.
Re: rack awareness unexpected behaviour
Marc, The rack-aware script is an artificial concept, meaning you can tell Hadoop which machine is in which rack, and that may or may not reflect where the machine is actually located. The idea is to balance the number of nodes in the racks, at least on paper. So you can have 14 machines in rack 1 and 16 machines in rack 2, even though there may physically be 20 machines in rack 1 and 10 machines in rack 2. HTH -Mike On Oct 3, 2013, at 2:52 AM, Marc Sturlese marc.sturl...@gmail.com wrote: [quoted above]
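A minimal sketch of the kind of on-paper mapping Mike describes, assuming the classic script-based resolver (topology.script.file.name in Hadoop 1.x, net.topology.script.file.name in 2.x). The hostnames and the rack split below are made up; the point is that Hadoop believes whatever the script prints, regardless of physical placement:

#!/bin/bash
# Hypothetical topology script. Hadoop passes one or more hostnames/IPs
# as arguments and expects one rack path printed per argument.
for host in "$@"; do
  case "$host" in
    # node04 might physically sit in rack 1, but we report it in /rack2
    # to balance the node counts on paper.
    node0[1-3]) echo "/rack1" ;;
    node0[4-6]) echo "/rack2" ;;
    *)          echo "/default-rack" ;;
  esac
done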
Re: rack awareness unexpected behaviour
Doing that will balance the block writing, but I think you then lose the concept of physical rack awareness. Let's say you have 2 physical racks, one with 2 servers and one with 4. If you artificially tell Hadoop that each rack has 3 servers, you are losing the concept of rack awareness: you're no longer guaranteeing that each physical rack contains at least one replica of each block. So if you have 2 racks with different numbers of servers, it's not possible to do proper rack awareness without first filling the disks of the rack with fewer servers. Am I right, or am I missing something?
Re: rack awareness unexpected behaviour
And that's the rub. Rack awareness is an artificial construct. If you want to fix it and match the real world, you need to balance the racks physically. Otherwise you need to rewrite load balancing to take into consideration the number and power of the nodes in each rack. The short answer: it's easier to fudge the values in the script. Mike Segel On Oct 3, 2013, at 8:58 AM, Marc Sturlese marc.sturl...@gmail.com wrote: [quoted above]
Re: rack awareness unexpected behaviour
HDFS's current default replica placement policy doesn't fit the case of two unbalanced racks very well: assume the local rack has more nodes, which means more reduce slots and more disk capacity, so more reduce tasks will execute within the local rack. Per the replica placement policy, each write puts 1 replica on the local rack and 2 replicas on the remote rack, which means the data load is doubled on the remote rack even though there is less capacity there. The workaround of cheating in the rack-aware script (as described in the quoted messages) may help resolve the unbalanced-data problem, but it introduces two issues: 1. data reliability - all 3 replicas of some blocks may land in the same real rack; 2. rack-level data locality - both task scheduling and replica selection on HDFS reads will misunderstand the real rack topology. Consider whether that is a tradeoff you want to make in your case. Another workaround, although not designed for this case, may be helpful: enable the NodeGroup level of locality, which sits between node and rack and is supported since 1.2.0. Nodes under the same NodeGroup can hold at most one replica of a block, a feature originally designed to avoid duplicate replicas on VMs sharing the same host. Specifically in your case, assume you have 20 machines in rack A and 10 machines in rack B: you can put the rack A nodes into two NodeGroups (so each NodeGroup has 10 nodes) and the rack B nodes into one NodeGroup. Then replicas will be distributed in a 2:1 ratio, no matter where the writer is. Hope it helps. Thanks, Junping ----- Original Message ----- From: Michael Segel michael_se...@hotmail.com To: common-user@hadoop.apache.org Sent: Thursday, October 3, 2013 8:23:58 PM Subject: Re: rack awareness unexpected behaviour [quoted above]
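A sketch of the layout Junping suggests, expressed as a four-layer topology script. The hostnames are made up, and the enabling properties are recalled from the 1.2.x virtualization extensions (something like net.topology.nodegroup.aware plus a dfs.block.replicator.classname pointing at BlockPlacementPolicyWithNodeGroup); verify the exact names against your release:

#!/bin/bash
# Hypothetical topology script printing /rack/nodegroup paths so that
# rack A's 20 nodes form two node groups and rack B's 10 form one.
for host in "$@"; do
  case "$host" in
    nodeA0[0-9]) echo "/rackA/nodegroup1" ;;  # rack A, first 10 nodes
    nodeA1[0-9]) echo "/rackA/nodegroup2" ;;  # rack A, second 10 nodes
    nodeB*)      echo "/rackB/nodegroup3" ;;  # all of rack B
    *)           echo "/default-rack/default-nodegroup" ;;
  esac
done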
RE: Will different files in HDFS trigger different mappers
Hi, If you have a lot of small files, by default Hive will group several of them into a single mapper. Check this property: hive.input.format. The default, org.apache.hadoop.hive.ql.io.CombineHiveInputFormat (used when the property is empty), combines small files; if you set it to org.apache.hadoop.hive.ql.io.HiveInputFormat, you'll get 100 maps (and a much slower MapReduce job). Other properties let you tune this behavior, for instance: mapred.max.split.size, hive.merge.mapfiles=true, hive.merge.mapredfiles=true. Regards, Sourygna From: java8964 java8964 [mailto:java8...@hotmail.com] Sent: Wednesday, October 2, 2013 22:22 To: user@hadoop.apache.org Subject: Will different files in HDFS trigger different mappers Hi, I have a question about how mappers are generated for input files from HDFS. I understand the split and block concepts in HDFS, but my original understanding was that one mapper will only process data from one file in HDFS, no matter how small that file is. Is that correct? I ask because in some ETL jobs I have seen logic that understands the data set based on the file-name convention. So in the mapper, before processing the first KV pair, we can build logic in the map() method to get the file name of the current input and initialize some state. After that, we don't need to worry about data coming from another file later, as one mapper task will only handle data from one file, even when the file is very small. So small files not only cause trouble for NN memory, they also waste map tasks, as each map task may consume too little data. But today, when I ran the following Hive query (Hadoop 1.0.4 and Hive 0.9.1): select partition_column, count(*) from test_table group by partition_column - it only generated 2 mappers in the MR job. This is an external Hive table, and the input bytes for this MR job are only 338M, but there are more than 100 data files in HDFS for this table, even though a lot of them are very small. This is a one-node cluster, but it is configured in full cluster mode, not local mode. Shouldn't the MR job trigger at least 100 mappers? Is it because, in Hive, my original assumption no longer holds? Thanks Yong
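A minimal session showing the switch Sourygna describes; the table comes from Yong's query, and the split-size value is an arbitrary example:

# Hypothetical Hive invocation. With CombineHiveInputFormat (the default),
# mapred.max.split.size caps how much data gets packed into one map;
# switching to HiveInputFormat would instead yield roughly one map per file.
hive -e "
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.max.split.size=268435456;
select partition_column, count(*) from test_table group by partition_column;
"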
Secondary NameNode doCheckpoint Exception
Hi Everyone, I am starting my Hadoop cluster manually from Java, which works fine for a while. When the secondary NameNode tries to do a merge, it gives me the following exception:
---
13/10/03 10:29:38 ERROR namenode.SecondaryNameNode: Throwable Exception in doCheckpoint:
13/10/03 10:29:38 ERROR namenode.SecondaryNameNode: java.lang.AssertionError: Negative layout version is expected.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:857)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:780)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$600(SecondaryNameNode.java:680)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:557)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:521)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:396)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:360)
    at java.lang.Thread.run(Thread.java:724)
[the same stack trace is printed a second time]
---
What causes this exception to occur? Any help would be appreciated! Kind Regards, Furkan Bicak.
Re: Secondary NameNode doCheckpoint Exception
On 03-10-2013 10:44, Furkan Bıçak wrote: [quoted above] I forgot to mention the Hadoop version: I am using Hadoop 1.2.1. Thanks.
Re: Secondary NameNode doCheckpoint Exception
Hi, There seems to be a layoutVersion value issue in your VERSION file. Can you please share the VERSION file content from the NN and the SNN? ${dfs.name.dir}/current/VERSION Regards Jitendra On Thu, Oct 3, 2013 at 1:22 PM, Furkan Bıçak bicak...@gmail.com wrote: [quoted above]
Re: Secondary NameNode doCheckpoint Exception
Did you upgrade your cluster? Regards Jitendra On Thu, Oct 3, 2013 at 1:22 PM, Furkan Bıçak bicak...@gmail.com wrote: [quoted above]
Re: Secondary NameNode doCheckpoint Exception
Hi, the content of the NameNode's VERSION file is as follows:
---
#Thu Oct 03 11:20:57 EEST 2013
namespaceID=1005981884
cTime=0
storageType=NAME_NODE
layoutVersion=-41
---
However, I could not find any VERSION file for the Secondary NameNode. Just to mention, I am using pseudo-distributed mode. Thanks. On 03-10-2013 11:23, Jitendra Yadav wrote: [quoted above]
Re: Secondary NameNode doCheckpoint Exception
No, I started from scratch. Thanks, Frkn. On 03-10-2013 11:32, Jitendra Yadav wrote: [quoted above]
Re: Secondary NameNode doCheckpoint Exception
Can you restart your cluster using the scripts below and check the logs? # stop-all.sh # start-all.sh Regards Jitendra On Thu, Oct 3, 2013 at 2:09 PM, Furkan Bıçak bicak...@gmail.com wrote: [quoted above]
Re: Hadoop Solaris OS compatibility
Thanks Roman, I will keep in touch with you. However, we have faced lots of issues on Solaris SPARC, though mostly with the Hadoop 2.x.x versions. Regards Jitendra On Thu, Oct 3, 2013 at 5:04 AM, Roman Shaposhnik r...@apache.org wrote: On Fri, Sep 27, 2013 at 2:42 AM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Hi All, For a few years I have been working as a Hadoop admin on the Linux platform, though the majority of our servers are Solaris (Sun SPARC hardware). Many times I have seen that Hadoop is compatible with Linux. Is that right? If yes, then what do I need so that I can run Hadoop on Solaris in production? Do I need to build the Hadoop source on Solaris? [Roman's reply:] Hadoop has a number of extension points that require compiling native C/C++ code. Most of that code should be reasonably portable, but you'll have to try for yourself. SPARC may present an additional challenge with regard to some recent issues of unaligned memory access -- watch out for those as well. If you're interested in getting Hadoop to run on Solaris -- you can also ping me off-list with particular questions. Thanks, Roman.
Log file size limiting and log file rotation configurations in hadoop
Hi, Is there any configuration parameter in Hadoop to limit the log file size? And which configuration parameter allows automatic rotation of log files in Hadoop? Thanks, OC
Re: Secondary NameNode doCheckpoint Exception
I don't have any problem when starting and stopping the cluster with the scripts. I am having this problem when starting the cluster from Java. Also, when I start the cluster from Java, I can run MapReduce jobs and they finish successfully. However, after about 5 minutes, once the secondary namenode has started, it tries the merge and then gives that error. Thanks, Frkn. On 03-10-2013 12:10, Jitendra Yadav wrote: [quoted above]
Re: Secondary NameNode doCheckpoint Exception
When you say started from Java, does that mean you are doing some development work using Eclipse and Hadoop? Maybe I'm just not familiar with this kind of environment. Regards Jitendra On Thu, Oct 3, 2013 at 3:24 PM, Furkan Bıçak bicak...@gmail.com wrote: [quoted above]
Re: Secondary NameNode doCheckpoint Exception
Yes, I am doing some development and starting my hadoop cluster using my IDE. Thanks, Frkn. On 03-10-2013 13:06, Jitendra Yadav wrote: [quoted above]
Re: Secondary NameNode doCheckpoint Exception
What I simply do is start the NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker, respectively. Why would the SecondaryNameNode throw an exception when doing the merge? Am I missing some configuration? Also, how can I set the checkpoint period? Thanks, Frkn. On 03-10-2013 13:18, Furkan Bıçak wrote: [quoted above]
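On the last question: in Hadoop 1.x the checkpoint interval is controlled by fs.checkpoint.period (seconds, default 3600), with fs.checkpoint.size forcing an earlier checkpoint once the edit log grows large enough. A minimal sketch, assuming the stock 1.2.1 conf layout; the 300-second value is just an example:

# Hypothetical: add this property inside <configuration> in
# conf/core-site.xml (fs.checkpoint.* live in core-site.xml in 1.x);
# emitted here as a heredoc for easy copy-paste.
cat <<'EOF'
<property>
  <name>fs.checkpoint.period</name>
  <value>300</value>  <!-- checkpoint every 5 minutes; default is 3600 -->
</property>
EOF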
Single Yarn Client -- multiple applications?
Hi, Can we submit multiple applications from the same Client class? It seems to be allowed now; I just tried it with the Distributed Shell example... Is it OK to do so, or does it have any unwanted implications? Thanks, Kishore
Re: Log file size limiting and log file rotation configurations in hadoop
I am assuming that you are talking about user logs? See the following links for some pointers: http://grepalex.com/2012/11/12/hadoop-logging/ http://blog.cloudera.com/blog/2010/11/hadoop-log-location-and-retention/ http://hadoop.apache.org/docs/r1.0.4/mapred-default.html (*userlog* properties) Regards, Shahab On Thu, Oct 3, 2013 at 5:42 AM, oc tsdb oc.t...@gmail.com wrote: Hi, Is there any configuration parameter in hadoop to limit the log file size?? and what parameter in configuration allows automatic rotation of log files in hadoop? Thanks, OC
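Building on those links, a rough sketch of size-based rotation for the daemon logs, assuming a Hadoop 1.x layout; the values are examples, and an RFA appender may already exist (commented out) in the stock log4j.properties. Task-level user logs are capped separately via mapred.userlog.limit.kb and retained per mapred.userlog.retain.hours in mapred-site.xml:

# Hypothetical setup: define a size-based RollingFileAppender and point
# the daemons at it (the daemon scripts honor HADOOP_ROOT_LOGGER).
cat >> $HADOOP_HOME/conf/log4j.properties <<'EOF'
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
EOF
echo 'export HADOOP_ROOT_LOGGER=INFO,RFA' >> $HADOOP_HOME/conf/hadoop-env.sh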
Re: Extending DFSInputStream class
I am adding a follow-up question here: is it safe to extend (override) the DFSInputStream class, in terms of future YARN/HDFS implementation design changes? Unfortunately I have to extend DFSInputStream, because if I were to extend HdfsDataInputStream (as I should), I couldn't override all the APIs I am supposed to (in the inheritance tree, DataInputStream declares methods as final). Is my solution safe for the future? Will it keep working correctly with HDFS? This question is especially directed at YARN developers. 2013/9/26 Rob Blah tmp5...@gmail.com: I have a specific, complex stream scheme which I want to hide from the user (short answer), plus some security reasons (limiting the possible read buffer size). 2013/9/26 java8964 java8964 java8...@hotmail.com: Just curious, any reason you don't want to use DFSDataInputStream? Yong -- Date: Thu, 26 Sep 2013 16:46:00 +0200 Subject: Extending DFSInputStream From: tmp5...@gmail.com To: user@hadoop.apache.org Hi, I would like to wrap DFSInputStream by extension. However, it seems that the DFSInputStream constructor is package-private. Is there any way to achieve my goal? Also, just out of curiosity, why have you made this class inaccessible to developers - or am I missing something? regards tmp
Re: modify HDFS
Hi all, I checked out hadoop 2.1.0-beta with the command svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta/hadoop-common-2.1.0-beta and I built all the subprojects using Maven and imported them into Eclipse. Now if I modify the HDFS code, how shall I test the hdfs subproject? (The project is missing the configuration files needed to actually deploy it on a cluster.) -- Best Regards, Karim Ahmed Awara On Wed, Oct 2, 2013 at 10:31 PM, Ravi Prakash ravi...@ymail.com wrote: Karim! Hadoop 3.0 corresponds to trunk currently. I would recommend you use branch-2; it's fairly stable. hadoop-1.x is rather old and is in maintenance mode now. You can get all the branches as described at https://wiki.apache.org/hadoop/GitAndHadoop : git clone git://git.apache.org/hadoop-common.git Please peruse https://hadoop.apache.org/docs/stable/cluster_setup.html to configure a distributed cluster. The block files should be in dfs.data.dir on the data nodes. You may also be able to use the Offline Image Viewer to find all blocks and their locations from the fsimage file. HTH Ravi From: Pradeep Gollakota pradeep...@gmail.com Sent: Wednesday, October 2, 2013 9:29 AM Subject: Re: modify HDFS Since hadoop 3.0 is 2 major versions higher, it will be significantly different from working with hadoop 1.1.2. The hadoop-1.1 branch is available on SVN at http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1/ On Tue, Oct 1, 2013 at 11:30 PM, Karim Awara karim.aw...@kaust.edu.sa wrote: Hi all, My previous web surfing led me to steps that I executed successfully. However, my issue is: which version of Hadoop is this? (I believe it is hadoop 3.0, since it supports the Maven build.) I actually want to modify a stable release (hadoop 1.1.2, which does not have Maven build support). Will working with hadoop 3.0 make a lot of difference to me? Secondly, my goal is to modify the block placement strategy in HDFS in a distributed environment and test those changes myself. Now, assuming I am successful in modifying the HDFS code, how do I test the modifications (since this source tree is missing the configuration files and so on that make it work in a distributed environment)? -- Best Regards, Karim Ahmed Awara On Wed, Oct 2, 2013 at 1:13 AM, Ravi Prakash ravi...@ymail.com wrote: Karim! You should read BUILDING.txt. I usually generate the Eclipse files using mvn eclipse:eclipse. Then I can import all the projects into Eclipse as Eclipse projects. This is useful for code navigation and completion, etc.; however, I still build from the command line: mvn -Pdist -Dmaven.javadoc.skip -DskipTests install HTH Ravi From: Jagat Singh jagatsi...@gmail.com Sent: Tuesday, October 1, 2013 3:44 PM Subject: Re: modify HDFS Hi, what issue are you having? I wrote about it here; it might help you. Import it into Eclipse as a Maven project: http://jugnu-life.blogspot.com.au/2013/09/build-and-compile-hadoop-from-source.html?m=1 Thanks On 01/10/2013 11:56 PM, Karim Awara karim.aw...@kaust.edu.sa wrote: Hi, I want to modify the source code of HDFS for my usage, but I can't find any handy sources on developing HDFS in Eclipse. (I tried the hdfs developer mailing list, but they are unresponsive.) Can you guide me? -- Best Regards, Karim Ahmed Awara
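On Karim's question about testing without deployable config files: the HDFS unit tests spin up an in-process MiniDFSCluster, so a changed block placement policy can be exercised without a real cluster, and BUILDING.txt documents how to produce a deployable tarball. A rough sketch, assuming the release-2.1.0-beta tree and the prerequisites from BUILDING.txt; the test-name pattern is only illustrative:

mvn install -DskipTests                    # build and install all modules once
cd hadoop-hdfs-project/hadoop-hdfs
mvn test -Dtest='TestReplicationPolicy*'   # run targeted HDFS unit tests
# To try the change on a real (or pseudo-distributed) cluster, build a
# distribution tarball and deploy it with your own *-site.xml files:
cd ../.. && mvn package -Pdist -DskipTests -Dtar
ls hadoop-dist/target/                     # tarball lands here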
Re: Non data-local scheduling
Hi Andre, Try setting yarn.scheduler.capacity.node-locality-delay to a number between 0 and 1. This will turn on delay scheduling - here's the doc on how this works: For applications that request containers on particular nodes, the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another node. Expressed as a float between 0 and 1, which, as a fraction of the cluster size, is the number of scheduling opportunities to pass up. The default value of -1.0 means don't pass up any scheduling opportunities. -Sandy On Thu, Oct 3, 2013 at 9:57 AM, André Hacker andrephac...@gmail.com wrote: Hi, I have a 25-node cluster running hadoop 2.1.0-beta with the capacity scheduler (default scheduler settings) and replication factor 3. I have exclusive access to the cluster to run a benchmark job, and I wonder why there are so few data-local and so many rack-local maps. The input format calculates 44 input splits and 44 map tasks; however, it seems to be random how many of them are processed data-locally. Here are the counters from my last tries (data-local / rack-local): Test 1: data-local: 15, rack-local: 29. Test 2: data-local: 18, rack-local: 26. I don't understand why it is not always 100% data-local. This should not be a problem, since the blocks of my input file are distributed over all nodes. Maybe someone can give me a hint. Thanks, André Hacker, TU Berlin
Re: Non data-local scheduling
Thanks, but I can't set this to a fraction; it wants to see an integer. My documentation is slightly different: Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically this should be set to the number of racks in the cluster; this feature is disabled by default, set to -1. I set it to 1, and now I had 33 data-local and 11 rack-local tasks, which is better, but still not optimal. I couldn't find a good description of what this feature means (what is a scheduling opportunity, and how many are there?). It does not seem to be in the current documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html 2013/10/3 Sandy Ryza sandy.r...@cloudera.com: [quoted above]
Re: Non data-local scheduling
Try playing with the block size vs. the split size. If the blocks are very large and the splits small, then multiple splits correspond to the same block, and if there are more splits than replicas, you get rack-local processing. On 10/3/2013 12:57 PM, André Hacker wrote: [quoted above]
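For example, forcing the split size up to the block size makes each split cover exactly one block, so every split has three candidate data-local nodes. A hedged sketch: the jar, class, paths, and the 128MB figure are made up, and it assumes the job is driven through ToolRunner so the -D options take effect:

# Hypothetical run on 2.1.0-beta pinning the split size to a 128MB block size.
hadoop jar my-benchmark.jar com.example.MyJob \
  -D mapreduce.input.fileinputformat.split.minsize=134217728 \
  -D mapreduce.input.fileinputformat.split.maxsize=134217728 \
  /benchmark/input /benchmark/output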
Re: Non data-local scheduling
Ah, I was going off the Fair Scheduler equivalent; I didn't realize they were different. In that case you might try setting it to something like half the nodes in the cluster. Nodes are constantly heartbeating to the ResourceManager. When a node heartbeats, the scheduler checks whether the node has any free space and, if it does, offers it to an application. From the application's perspective, this offer is a scheduling opportunity. Each application will pass up yarn.scheduler.capacity.node-locality-delay scheduling opportunities before placing a container on a non-local node. -Sandy On Thu, Oct 3, 2013 at 10:36 AM, André Hacker andrephac...@gmail.com wrote: [quoted above]
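Putting Sandy's suggestion into config form, a minimal sketch; the value of 12 (roughly half of the 25 nodes) is an example, and whether a queue refresh alone picks the change up is an assumption worth verifying:

# Hypothetical: add inside <configuration> in capacity-scheduler.xml
# (emitted as a heredoc for copy-paste), then refresh the scheduler.
cat <<'EOF'
<property>
  <name>yarn.scheduler.capacity.node-locality-delay</name>
  <value>12</value>  <!-- ~half the 25-node cluster, per the advice above -->
</property>
EOF
yarn rmadmin -refreshQueues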
Re: Problem: org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: connect timed out
What was the problem - an Oozie problem, or was the machine where the map task had run getting rebooted? Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com On Thu, Apr 18, 2013 at 12:47 PM, Som Satpathy somsatpa...@gmail.com wrote: Never mind, got it fixed. Thanks, Som On Tue, Apr 16, 2013 at 6:18 PM, Som Satpathy somsatpa...@gmail.com wrote: Hi All, I have just set up a CDH cluster on EC2 using Cloudera Manager 4.5. I have been trying to run a couple of MapReduce jobs as part of an Oozie workflow, but have been blocked by the following exception (my reducer always hangs because of this):
-
2013-04-17 00:32:02,268 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201304170021_0003_r_00_0 copy failed: attempt_201304170021_0003_m_00_0 from ip-10-174-49-51.us-west-1.compute.internal
2013-04-17 00:32:02,269 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
    at sun.net.www.http.HttpClient.init(HttpClient.java:234)
    at sun.net.www.http.HttpClient.New(HttpClient.java:307)
    at sun.net.www.http.HttpClient.New(HttpClient.java:324)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1573)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1530)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1466)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1360)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1292)
2013-04-17 00:32:02,269 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201304170021_0003_r_00_0: Failed fetch #1 from attempt_201304170021_0003_m_00_0
2013-04-17 00:32:02,269 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201304170021_0003_r_00_0 adding host ip-10-174-49-51.us-west-1.compute.internal to penalty box, next contact in 12 seconds
-
Any suggestions that can help me get around this? I'd really appreciate any help here. Thanks, Som