Re: HDFS upgrade problem of fsImage
Thanks Joshi, maybe I pasted the wrong log messages. Please look here for the real story: https://issues.apache.org/jira/browse/HDFS-5550

On Fri, Nov 22, 2013 at 6:25 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Yes, realized that, and I see your point :-) However it seems some fs inconsistency is present; did you attempt rollback/finalizeUpgrade and check? For that error, FSImage.java has found a previous fs state:

    // Upgrade is allowed only if there are
    // no previous fs states in any of the directories
    for (Iterator<StorageDirectory> it = storage.dirIterator(); it.hasNext();) {
      StorageDirectory sd = it.next();
      if (sd.getPreviousDir().exists())
        throw new InconsistentFSStateException(sd.getRoot(),
            "previous fs state should not exist during upgrade. "
            + "Finalize or rollback first.");
    }

Thanks, Rekha

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 5:19 PM
To: user@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org
Subject: Re: HDFS upgrade problem of fsImage

I insist on a hot upgrade on the test cluster because I want a hot upgrade on the prod cluster.

On 2013-11-21 7:23 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Hi Azuryy, this error occurs when FSImage finds a previous fs state, and as the log states you would need to either finalizeUpgrade or rollback to proceed. Below -

    bin/hadoop dfsadmin -finalizeUpgrade
    hadoop dfsadmin -rollback

On a side note, for a small test cluster on which one might suspect you are the only user, why would you insist on a hot upgrade? :-)

Thanks, Rekha

Some helpful guidelines for an upgrade are here -
http://wiki.apache.org/hadoop/Hadoop_Upgrade
https://twiki.grid.iu.edu/bin/view/Storage/HadoopUpgrade
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#Upgrading_from_older_release_to_0.23_and_configuring_federation

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 9:48 AM
To: hdfs-...@hadoop.apache.org, user@hadoop.apache.org
Subject: HDFS upgrade problem of fsImage

Hi Dear, I have a small test cluster with hadoop-2.0.x and HA configured, but I want to upgrade to hadoop-2.2. I don't want to stop the cluster during the upgrade, so my steps are:
1) on the standby NN: hadoop-daemon.sh stop namenode
2) remove the HA configuration from the conf
3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster
but I get an exception in the NN log, so how can I upgrade without stopping the whole cluster? Thanks.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:692)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:677)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
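For readers hitting the same InconsistentFSStateException, a minimal shell sketch of the choice the check above forces (the directory path is taken from the error in this thread and is otherwise illustrative; run this against the old version before retrying -upgrade):

    # A 'previous' checkpoint left over from an earlier -upgrade blocks a new one.
    ls /hdfs/name/previous                 # if this directory exists, -upgrade refuses to run

    # Option 1: keep the current state and discard the old one
    hadoop dfsadmin -finalizeUpgrade

    # Option 2: go back to the pre-upgrade state instead
    hadoop-daemon.sh start namenode -rollback

Which of the two is appropriate depends on whether the earlier upgrade should be kept; finalizing is irreversible.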
Re: Missing records from HDFS
I do think this is because of your RecordReader. Can you paste your code here and give a small sample of the data? Please use pastebin if you want.

On Fri, Nov 22, 2013 at 7:16 PM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

One more thing: if we split the files, then all the records are processed. The files are 70.5 MB. Thanks, Zoraida.

From: zoraida zora...@tid.es
Date: Friday, 22 November 2013 08:59
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

Thanks for your response, Azuryy. My Hadoop version: 2.0.0-cdh4.3.0. InputFormat: a custom class that extends FileInputFormat (a CSV input format). These files are under the same directory, as separate files. My input path is configured through Oozie via the property mapred.input.dir. The same code and input running on Hadoop 2.0.0-cdh4.2.1 works fine and does not discard any record. Thanks.

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday, 21 November 2013 07:31
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

What's your Hadoop version, and which InputFormat are you using? Are these files under one directory, or are there lots of subdirectories? How did you configure the input path in your main?

On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

Hi all, my job is not reading all the input records. In the input directory I have a set of files containing a total of 600 records, but only 5997000 are processed. The Map Input Records counter says 5997000. I have tried downloading the files with a getmerge to check how many records would be returned, and the correct number is returned (600). Do you have any suggestions? Thanks.
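Since splitting the files by hand makes the missing records reappear, the usual suspect is how the custom reader handles records that straddle an input-split boundary. A hypothetical sketch, with class names made up for illustration (this is not the poster's code), of how to take splitting out of the equation while debugging:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

    // If the whole file is read by a single mapper, no split-boundary handling is
    // needed, so any record loss that remains has a different cause.
    public class CsvDebugInputFormat extends FileInputFormat<LongWritable, Text> {
      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false;
      }

      @Override
      public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
          TaskAttemptContext context) {
        return new LineRecordReader(); // stand-in for the custom CSV reader
      }
    }

If the counts come out right with splitting disabled, the fix belongs in the reader's handling of its split start and end rather than in the data.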
Re: Missing records from HDFS
    );
        } else {
          return new LineReader(fileIn, job, this.recordDelimiterBytes);
        }
      }

      private int readData() throws IOException {
        if (data == null) {
          data = new Text();
        }
        int newSize = 0;
        while (pos < end) {
          newSize = in.readLine(data, maxLineLength,
              Math.max((int) Math.min(Integer.MAX_VALUE, end - pos), maxLineLength));
          if (newSize == 0) {
            break;
          }
          pos += newSize;
          if (newSize < maxLineLength) {
            break;
          }
          // line too long. try again
          LOG.info("Skipped line of size " + newSize + " at pos " + (pos - newSize));
        }
        return newSize;
      }

      @Override
      public FileValidatorDescriptor getCurrentKey() {
        return key;
      }

      @Override
      public Text getCurrentValue() {
        return value;
      }

      @Override
      public float getProgress() {
        if (start == end) {
          return 0.0f;
        } else {
          return Math.min(1.0f, (pos - start) / (float) (end - start));
        }
      }

      @Override
      public synchronized void close() throws IOException {
        if (in != null) {
          in.close();
        }
      }
    }

Thanks.

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Friday, 22 November 2013 12:19
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

I do think this is because of your RecordReader. Can you paste your code here and give a small sample of the data? Please use pastebin if you want.

On Fri, Nov 22, 2013 at 7:16 PM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

One more thing: if we split the files, then all the records are processed. The files are 70.5 MB. Thanks, Zoraida.

From: zoraida zora...@tid.es
Date: Friday, 22 November 2013 08:59
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

Thanks for your response, Azuryy. My Hadoop version: 2.0.0-cdh4.3.0. InputFormat: a custom class that extends FileInputFormat (a CSV input format). These files are under the same directory, as separate files. My input path is configured through Oozie via the property mapred.input.dir. The same code and input running on Hadoop 2.0.0-cdh4.2.1 works fine and does not discard any record. Thanks.

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday, 21 November 2013 07:31
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

What's your Hadoop version, and which InputFormat are you using? Are these files under one directory, or are there lots of subdirectories? How did you configure the input path in your main?

On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

Hi all, my job is not reading all the input records. In the input directory I have a set of files containing a total of 600 records, but only 5997000 are processed. The Map Input Records counter says 5997000. I have tried downloading the files with a getmerge to check how many records would be returned, and the correct number is returned (600). Do you have any suggestions? Thanks.
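For context, the part of the reader that most often explains records vanishing only when files are split is the initialization convention used by Hadoop's own LineRecordReader: every reader skips its first partial line unless it starts at offset 0, and keeps reading past its nominal end until it completes the line it started. A simplified sketch (field names follow the snippet above; this is not the poster's actual initialize method):

    public void initialize(InputSplit genericSplit, TaskAttemptContext context) throws IOException {
      FileSplit split = (FileSplit) genericSplit;
      start = split.getStart();
      end = start + split.getLength();
      // ... open the file and seek to 'start' ...
      if (start != 0) {
        // Not the first split: throw away the first (usually partial) line.
        // The reader of the previous split owns it, because each reader keeps
        // reading until it finishes the line that crosses its own 'end'.
        start += in.readLine(new Text(), 0,
            (int) Math.min(Integer.MAX_VALUE, end - start));
      }
      pos = start;
    }

A reader that skips the first line unconditionally, or stops exactly at end without finishing the straddling record, loses records exactly at split boundaries, which matches a count that is only slightly short.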
Re: HDFS upgrade problem of fsImage
Thanks Joshi, I haven't done an upgrade before; the test cluster is a new cluster installed with hadoop-2.0.3, so I shouldn't need to run 'bin/hadoop dfsadmin -finalizeUpgrade'.

On Thu, Nov 21, 2013 at 7:22 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Hi Azuryy, this error occurs when FSImage finds a previous fs state, and as the log states you would need to either finalizeUpgrade or rollback to proceed. Below -

    bin/hadoop dfsadmin -finalizeUpgrade
    hadoop dfsadmin -rollback

On a side note, for a small test cluster on which one might suspect you are the only user, why would you insist on a hot upgrade? :-)

Thanks, Rekha

Some helpful guidelines for an upgrade are here -
http://wiki.apache.org/hadoop/Hadoop_Upgrade
https://twiki.grid.iu.edu/bin/view/Storage/HadoopUpgrade
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#Upgrading_from_older_release_to_0.23_and_configuring_federation

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 9:48 AM
To: hdfs-...@hadoop.apache.org, user@hadoop.apache.org
Subject: HDFS upgrade problem of fsImage

Hi Dear, I have a small test cluster with hadoop-2.0.x and HA configured, but I want to upgrade to hadoop-2.2. I don't want to stop the cluster during the upgrade, so my steps are:
1) on the standby NN: hadoop-daemon.sh stop namenode
2) remove the HA configuration from the conf
3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster
but I get an exception in the NN log, so how can I upgrade without stopping the whole cluster? Thanks.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:692)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:677)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
Re: HDFS upgrade problem of fsImage
I insist hot upgrade on the test cluster because I want hot upgrade on the prod cluster. On 2013-11-21 7:23 PM, Joshi, Rekha rekha_jo...@intuit.com wrote: Hi Azurry, This error occurs when FSImage finds previous fs state, and as log states you would need to either finalizeUpgrade or rollback to proceed.Below - bin/hadoop dfsadmin –finalizeUpgrade hadoop dfsadmin –rollback On side note for a small test cluster on which one might suspect you are the only user, why wouldn't you insist on hot upgrade? :-) Thanks Rekha Some helpful guidelines for upgrade here - http://wiki.apache.org/hadoop/Hadoop_Upgrade https://twiki.grid.iu.edu/bin/view/Storage/HadoopUpgrade http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#Upgrading_from_older_release_to_0.23_and_configuring_federation From: Azuryy Yu azury...@gmail.com Reply-To: user@hadoop.apache.org user@hadoop.apache.org Date: Thursday 21 November 2013 9:48 AM To: hdfs-...@hadoop.apache.org hdfs-...@hadoop.apache.org, user@hadoop.apache.org user@hadoop.apache.org Subject: HDFS upgrade problem of fsImage Hi Dear, I have a small test cluster with hadoop-2.0x, and HA configuraded, but I want to upgrade to hadoop-2.2. I dont't want to stop cluster during upgrade, so my steps are: 1) on standby NN: hadoop-dameon.sh stop namenode 2) remove HA configuration in the conf 3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster but Exception in the NN log, so how to upgrade and don't stop the whole cluster. Thanks. org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first. at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:692) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:677) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
Re: Limit on total jobs running using fair scheduler
For MRv1, it's not possible.

On 2013-11-22 5:46 AM, Ivan Tretyakov itretya...@griddynamics.com wrote:

Thank you for your replies! We are using MR version 1 and my question is regarding this version. Omkar, are you talking about MR1 or MR2? I didn't find a property to limit the number of running jobs per queue for the capacity scheduler using MR1. Did I go wrong somewhere? Which option exactly do you mean? Sandy, thanks, I got it. But unfortunately we are using MR1 for now.

On Wed, Nov 20, 2013 at 2:12 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

Unfortunately, this is not possible in the MR1 fair scheduler without setting the limits for individual pools. In MR2, fair scheduler hierarchical queues will allow setting maxRunningApps at the top of the hierarchy, which would have the effect you're looking for. -Sandy

On Tue, Nov 19, 2013 at 2:01 PM, Omkar Joshi ojo...@hortonworks.com wrote:

Not sure about the fair scheduler, but in the capacity scheduler you can achieve this by controlling the number of jobs/applications per queue. Thanks, Omkar Joshi, Hortonworks Inc. http://www.hortonworks.com

On Tue, Nov 19, 2013 at 3:26 AM, Ivan Tretyakov itretya...@griddynamics.com wrote:

Hello! We are using CDH 4.1.1 (Version: 2.0.0-mr1-cdh4.1.1) and the fair scheduler. We need to limit the total number of jobs which can run at the same time on the cluster. I can see the maxRunningJobs option, but it sets a limit per pool or per user. We wouldn't like to limit each pool or user; we just need to set a limit on the total number of jobs running. Is it possible to do this with the fair scheduler? Can the capacity scheduler help here? Maybe there are other options to achieve the goal? Thanks in advance! -- Best Regards, Ivan Tretyakov

-- Best Regards, Ivan Tretyakov, Deployment Engineer, Grid Dynamics, +7 812 640 38 76, Skype: ivan.v.tretyakov, www.griddynamics.com, itretya...@griddynamics.com
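A hedged illustration of the MR2 fair-scheduler approach Sandy describes (queue names and the cap value are made up): putting maxRunningApps on the root of the queue hierarchy caps running applications cluster-wide without limiting any individual pool.

    <?xml version="1.0"?>
    <!-- fair-scheduler allocations file (illustrative) -->
    <allocations>
      <queue name="root">
        <maxRunningApps>20</maxRunningApps>  <!-- cluster-wide cap on concurrently running apps -->
        <queue name="etl"/>
        <queue name="adhoc"/>
      </queue>
    </allocations>

For the MR1 fair scheduler, as stated above, the equivalent top-level knob does not exist; maxRunningJobs only applies per pool or per user.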
HDFS upgrade problem of fsImage
Hi Dear, I have a small test cluster with hadoop-2.0.x and HA configured, but I want to upgrade to hadoop-2.2. I don't want to stop the cluster during the upgrade, so my steps are:
1) on the standby NN: hadoop-daemon.sh stop namenode
2) remove the HA configuration from the conf
3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster
but I get an exception in the NN log, so how can I upgrade without stopping the whole cluster? Thanks.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:692)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:677)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
Re: Missing records from HDFS
What's your Hadoop version, and which InputFormat are you using? Are these files under one directory, or are there lots of subdirectories? How did you configure the input path in your main?

On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

Hi all, my job is not reading all the input records. In the input directory I have a set of files containing a total of 600 records, but only 5997000 are processed. The Map Input Records counter says 5997000. I have tried downloading the files with a getmerge to check how many records would be returned, and the correct number is returned (600). Do you have any suggestions? Thanks.
Re: help with hadoop 1.2.1 aggregate framework (non-streaming)
Hi Nicholas, this is not Hadoop-related. edu.harvard.seas.scifs.ScifsStandard is your customized class, so you need to include this class in your ScifsStandard.jar.

On Thu, Nov 21, 2013 at 4:15 AM, Nicholas Murphy halcyo...@gmail.com wrote:

I'm trying to use the aggregate framework, but in a non-streaming fashion. I detailed what I'm doing pretty well here: http://stackoverflow.com/questions/20085532/trying-to-make-hadoop-1-2-1-aggregate-framework-work-non-streaming I have a feeling there's a simple solution (involving setting the appropriate classpath somewhere) but I don't know what it is. Any help appreciated. There's remarkably little information available on non-streaming use of aggregates that I can find. Thanks, Nick
Re: help with hadoop 1.2.1 aggregate framework (non-streaming)
Where is your ScifsStandard.jar? Is it located in the same directory you run the command from? If not, please specify the full path on the command line, for example: hadoop jar ../test/test.jar X

On Thu, Nov 21, 2013 at 2:42 PM, Nicholas Murphy halcyo...@gmail.com wrote:

Thanks, but as per the StackOverflow post, it is there (see the second code block where I list the contents of ScifsStandard.jar). Nick

On Nov 21, 2013, at 1:37 AM, Azuryy Yu azury...@gmail.com wrote:

Hi Nicholas, this is not Hadoop-related. edu.harvard.seas.scifs.ScifsStandard is your customized class, so you need to include this class in your ScifsStandard.jar.

On Thu, Nov 21, 2013 at 4:15 AM, Nicholas Murphy halcyo...@gmail.com wrote:

I'm trying to use the aggregate framework, but in a non-streaming fashion. I detailed what I'm doing pretty well here: http://stackoverflow.com/questions/20085532/trying-to-make-hadoop-1-2-1-aggregate-framework-work-non-streaming I have a feeling there's a simple solution (involving setting the appropriate classpath somewhere) but I don't know what it is. Any help appreciated. There's remarkably little information available on non-streaming use of aggregates that I can find. Thanks, Nick
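A hypothetical invocation along the lines being suggested (the input/output arguments and the extra-dependency jar are made up for illustration; the driver class and jar name are the ones from the thread). Giving the full jar path removes any doubt about which jar is picked up, and -libjars ships additional dependencies onto the task classpath, which is a common fix when a class is only missing at task time:

    # run from anywhere by giving the jar's absolute path
    hadoop jar /home/nick/ScifsStandard.jar edu.harvard.seas.scifs.ScifsStandard \
        -libjars /path/to/extra-deps.jar \
        input_dir output_dir

Note that -libjars is only honored if the driver parses generic options (for example via ToolRunner), so it may not apply as-is to this particular driver.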
hadoop maven repo
hi, please recommend a good Maven repo for compiling the Hadoop source code. It complains that it cannot find jdbm:bundle:2.0.0:m15 while compiling trunk. Thanks.
Re: hadoop maven repo
Ted, I am on Linux. On 2013-11-19 1:30 AM, Ted Yu yuzhih...@gmail.com wrote: Which platform did you perform the build on ? I was able to build trunk on Mac. I found the following dependency in dependency tree output: [INFO] +- org.apache.directory.server:apacheds-jdbm-partition:jar:2.0.0-M15:compile [INFO] | \- org.apache.directory.jdbm:apacheds-jdbm1:bundle:2.0.0-M2:compile On Mon, Nov 18, 2013 at 3:27 AM, Azuryy Yu azury...@gmail.com wrote: hi, please recommend a good maven repo to compile hadoop source code. It complain cannot find jdbm:bundle:2.0.0:m15 during compile trunk. thanks.
Re: hadoop maven repo
Thanks Ted. I missed new pom.xml, fixed it now. On 2013-11-19 3:09 AM, Ted Yu yuzhih...@gmail.com wrote: Compilation on Linux passed for me: [hortonzy@kiyo hadoop]$ uname -a Linux core.net 2.6.32-220.23.1.el6.20120713.x86_64 #1 SMP Fri Jul 13 11:40:51 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux [hortonzy@kiyo hadoop]$ mvn -version Apache Maven 3.0.3 (r1075438; 2011-02-28 17:31:09+) Cheers On Mon, Nov 18, 2013 at 10:51 AM, Azuryy Yu azury...@gmail.com wrote: Ted, I am on Linux. On 2013-11-19 1:30 AM, Ted Yu yuzhih...@gmail.com wrote: Which platform did you perform the build on ? I was able to build trunk on Mac. I found the following dependency in dependency tree output: [INFO] +- org.apache.directory.server:apacheds-jdbm-partition:jar:2.0.0-M15:compile [INFO] | \- org.apache.directory.jdbm:apacheds-jdbm1:bundle:2.0.0-M2:compile On Mon, Nov 18, 2013 at 3:27 AM, Azuryy Yu azury...@gmail.com wrote: hi, please recommend a good maven repo to compile hadoop source code. It complain cannot find jdbm:bundle:2.0.0:m15 during compile trunk. thanks.
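For anyone reproducing this, a minimal sketch of how the missing-artifact question above is usually narrowed down (the flags are standard Maven ones, not taken verbatim from the thread):

    # show the full dependency tree and locate the artifact that failed to resolve
    mvn dependency:tree | grep -B2 -i jdbm

    # rebuild trunk after refreshing/fixing the pom, skipping tests for speed
    mvn clean install -DskipTests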
RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory
dfs.ha.namenodes.mycluster = nn.domain,snn.domain

it should be:

dfs.ha.namenodes.mycluster = nn1,nn2

On Aug 27, 2013 11:22 PM, Smith, Joshua D. joshua.sm...@gd-ais.com wrote:

Harsh-

Here are all of the other values that I have configured.

hdfs-site.xml
-------------
dfs.webhdfs.enabled = true
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.automatic-falover.enabled = true
ha.zookeeper.quorum = nn.domain:2181,snn.domain:2181,jt.domain:2181
dfs.journalnode.edits.dir = /opt/hdfs/data1/dfs/jn
dfs.namenode.shared.edits.dir = qjournal://nn.domain:8485;snn.domain:8485;jt.domain:8485/mycluster
dfs.nameservices = mycluster
dfs.ha.namenodes.mycluster = nn.domain,snn.domain
dfs.namenode.rpc-address.mycluster.nn1 = nn.domain:8020
dfs.namenode.rpc-address.mycluster.nn2 = snn.domain:8020
dfs.namenode.http-address.mycluster.nn1 = nn.domain:50070
dfs.namenode.http-address.mycluster.nn2 = snn.domain:50070
dfs.name.dir = /var/lib/hadoop-hdfs/cache/hdfs/dfs/name

core-site.xml
-------------
fs.trash.interval = 1440
fs.trash.checkpoint.interval = 1440
fs.defaultFS = hdfs://mycluster
dfs.datanode.data.dir = /hdfs/data1,/hdfs/data2,/hdfs/data3,/hdfs/data4,/hdfs/data5,/hdfs/data6,/hdfs/data7

mapred-site.xml
---------------
mapreduce.framework.name = yarn
mapreduce.jobhistory.address = jt.domain:10020
mapreduce.jobhistory.webapp.address = jt.domain:19888

yarn-site.xml
-------------
yarn.nodemanager.aux-service = mapreduce.shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
yarn.log-aggregation-enable = true
yarn.nodemanager.remote-app-log-dir = /var/log/hadoop-yarn/apps
yarn.application.classpath = $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$YARN_HOME/*,$YARN_HOME/lib/*
yarn.resourcemanager.resource-tracker.address = jt.domain:8031
yarn.resourcemanager.address = jt.domain:8032
yarn.resourcemanager.scheduler.address = jt.domain:8030
yarn.resourcemanager.admin.address = jt.domain:8033
yarn.reesourcemanager.webapp.address = jt.domain:8088

These are the only interesting entries in my HDFS log file when I try to start the NameNode with service hadoop-hdfs-namenode start.

WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: false
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Configured NNs: ((there's a blank line here implying no configured NameNodes!))
ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. Java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join Java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.

I don't like the blank line for Configured NNs. Not sure why it's not finding them. If I try the command hdfs zkfc -formatZK I get the following: Exception in thread main org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode.

-Original Message- From: Smith, Joshua D.
[mailto:joshua.sm...@gd-ais.com] Sent: Tuesday, August 27, 2013 7:17 AM To: user@hadoop.apache.org Subject: RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory Harsh- Yes, I intend to use HA. That's what I'm trying to configure right now. Unfortunately I cannot share my complete configuration files. They're on a disconnected network. Are there any configuration items that you'd like me to post my settings for? The deployment is CDH 4.3 on a brand new cluster. There are 3 master nodes (NameNode, StandbyNameNode, JobTracker/ResourceManager) and 7 slave nodes. Each of the master nodes is configured to be a Zookeeper node as well as a Journal node. The HA configuration that I'm striving toward is the automatic fail-over with Zookeeper. Does that help? Josh -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Monday, August 26, 2013 6:11 PM To: user@hadoop.apache.org Subject: Re: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory It is not quite from your post, so a Q: Do you intend to use HA or not? Can you share your complete core-site.xml and hdfs-site.xml along with a brief note on the deployment? On Tue, Aug 27, 2013 at 12:48 AM, Smith, Joshua D.
RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory
not yet. please correct it.

On Aug 27, 2013 11:39 PM, Smith, Joshua D. joshua.sm...@gd-ais.com wrote:

nn.domain is a placeholder for the actual fully qualified hostname of my NameNode, and snn.domain is a placeholder for the actual fully qualified hostname of my StandbyNameNode.

Of course both the NameNode and the StandbyNameNode are running exactly the same software with the same configuration, since this is YARN. I'm not running a SecondaryNameNode.

The actual fully qualified hostnames are on another network and my customer is sensitive about privacy, so that's why I didn't post the actual values.

So, I think I have the equivalent of nn1,nn2, do I not?

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, August 27, 2013 11:32 AM
To: user@hadoop.apache.org
Subject: RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory

dfs.ha.namenodes.mycluster = nn.domain,snn.domain

it should be:

dfs.ha.namenodes.mycluster = nn1,nn2

On Aug 27, 2013 11:22 PM, Smith, Joshua D. joshua.sm...@gd-ais.com wrote:

Harsh-

Here are all of the other values that I have configured.

hdfs-site.xml
-------------
dfs.webhdfs.enabled = true
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.automatic-falover.enabled = true
ha.zookeeper.quorum = nn.domain:2181,snn.domain:2181,jt.domain:2181
dfs.journalnode.edits.dir = /opt/hdfs/data1/dfs/jn
dfs.namenode.shared.edits.dir = qjournal://nn.domain:8485;snn.domain:8485;jt.domain:8485/mycluster
dfs.nameservices = mycluster
dfs.ha.namenodes.mycluster = nn.domain,snn.domain
dfs.namenode.rpc-address.mycluster.nn1 = nn.domain:8020
dfs.namenode.rpc-address.mycluster.nn2 = snn.domain:8020
dfs.namenode.http-address.mycluster.nn1 = nn.domain:50070
dfs.namenode.http-address.mycluster.nn2 = snn.domain:50070
dfs.name.dir = /var/lib/hadoop-hdfs/cache/hdfs/dfs/name

core-site.xml
-------------
fs.trash.interval = 1440
fs.trash.checkpoint.interval = 1440
fs.defaultFS = hdfs://mycluster
dfs.datanode.data.dir = /hdfs/data1,/hdfs/data2,/hdfs/data3,/hdfs/data4,/hdfs/data5,/hdfs/data6,/hdfs/data7

mapred-site.xml
---------------
mapreduce.framework.name = yarn
mapreduce.jobhistory.address = jt.domain:10020
mapreduce.jobhistory.webapp.address = jt.domain:19888

yarn-site.xml
-------------
yarn.nodemanager.aux-service = mapreduce.shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
yarn.log-aggregation-enable = true
yarn.nodemanager.remote-app-log-dir = /var/log/hadoop-yarn/apps
yarn.application.classpath = $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$YARN_HOME/*,$YARN_HOME/lib/*
yarn.resourcemanager.resource-tracker.address = jt.domain:8031
yarn.resourcemanager.address = jt.domain:8032
yarn.resourcemanager.scheduler.address = jt.domain:8030
yarn.resourcemanager.admin.address = jt.domain:8033
yarn.reesourcemanager.webapp.address = jt.domain:8088

These are the only interesting entries in my HDFS log file when I try to start the NameNode with service hadoop-hdfs-namenode start.

WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: false WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Configured NNs: ((there's a blank line here implying no configured NameNodes!)) ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. Java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled. FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join Java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled. I don't like the blank line for Configured NNs. Not sure why it's not finding them. If I try the command hdfs zkfc -formatZK I get the following: Exception in thread main org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode. -Original Message- From: Smith, Joshua D. [mailto:joshua.sm...@gd-ais.com] Sent: Tuesday, August 27, 2013 7:17 AM To: user@hadoop.apache.org Subject: RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory Harsh- Yes, I intend to use HA. That's what I'm trying to configure right now. Unfortunately I cannot
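A hedged sketch of the correction being discussed (hostnames remain the thread's placeholders): the values in dfs.ha.namenodes.<nameservice> are logical NameNode IDs and must match the suffixes used on the per-NameNode keys, not the hostnames themselves. As an aside, the posted config also spells the automatic-failover property as dfs.ha.automatic-falover.enabled; the property HDFS reads is dfs.ha.automatic-failover.enabled, so that line is worth re-checking too.

    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>nn.domain:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>snn.domain:8020</value>
    </property>

With the IDs and key suffixes agreeing, the NameNode should report HA Enabled: true and list both configured NNs instead of the blank line seen in the log.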
Re: Why LineRecordWriter.write(..) is synchronized
Because we may use multiple threads to write to a single file.

On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote:

Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations that define write as synchronized. Is there any specific reason for this? regards, sathwik
Re: Why LineRecordWriter.write(..) is synchronized
It's not that Hadoop forks threads; we may create a LineRecordWriter and then call that writer concurrently ourselves.

On Aug 8, 2013 4:00 PM, Sathwik B P sathwik...@gmail.com wrote:

Hi, thanks for your reply. May I know where Hadoop forks multiple threads to use a single RecordWriter? regards, sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu azury...@gmail.com wrote:

Because we may use multiple threads to write to a single file.

On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote:

Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations that define write as synchronized. Is there any specific reason for this? regards, sathwik
Re: issue about hadoop hardware choose
If you want HA, then do you want to deploy the journal nodes on the DNs?

On Aug 8, 2013 5:09 PM, ch huang justlo...@gmail.com wrote:

hi, all:

My company needs to build a 10-node Hadoop cluster (2 namenodes and 8 datanodes / node managers, for both data storage and data analysis). We have HBase and Hive on the Hadoop cluster, with 10G of data increment per day. We use CDH4.3 (for dual-namenode HA). My plan is:

namenode / resource manager: dual Quad Core, 24G RAM, 2 * 500GB SATA disks (JBOD)
datanode / node manager: dual Quad Core, 24G RAM, 2 * 1TB SATA disks (JBOD)

My questions are:
1. Does the resource manager need a dedicated server? (I plan to put the RM with one of the NNs.)
2. Is the RAM enough for the RM + NN machine?
3. Is RAID needed for the NN machine?
4. Is it OK if I place the JNs on other nodes (DN or NN)?
5. How many ZooKeeper server nodes do I need?
6. I want to place the YARN proxy server and the MapReduce history server with the other NN; is that OK?
Re: Why LineRecordWriter.write(..) is synchronized
The SequenceFile writer is also synchronized; I don't think this is bad. If you call the HDFS API to write concurrently, then it's necessary.

On Aug 8, 2013 7:53 PM, Jay Vyas jayunit...@gmail.com wrote:

Then is this a bug? Synchronization in the absence of any race condition is normally considered bad. In any case I'd like to know why this writer is synchronized whereas the others are not. That is, I think, the point at issue: either the other writers should be synchronized or else this one shouldn't be; consistency across the write implementations is probably desirable so that changes to output formats or record writers don't lead to bugs in multithreaded environments.

On Aug 8, 2013, at 6:50 AM, Harsh J ha...@cloudera.com wrote:

While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking whether it is necessary, then perhaps the answer is no.

On Aug 8, 2013 3:43 PM, Azuryy Yu azury...@gmail.com wrote:

It's not that Hadoop forks threads; we may create a LineRecordWriter and then call that writer concurrently ourselves.

On Aug 8, 2013 4:00 PM, Sathwik B P sathwik...@gmail.com wrote:

Hi, thanks for your reply. May I know where Hadoop forks multiple threads to use a single RecordWriter? regards, sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu azury...@gmail.com wrote:

Because we may use multiple threads to write to a single file.

On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote:

Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations that define write as synchronized. Is there any specific reason for this? regards, sathwik
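A short sketch of the MultithreadedMapper case Harsh mentions, which is the one place the framework itself shares a single RecordWriter across threads (the mapper class and thread count are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    // Driver fragment; MyMapper is a hypothetical Mapper subclass holding the real map() logic.
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "multithreaded mapper example");
    job.setMapperClass(MultithreadedMapper.class);
    // Four threads run MyMapper concurrently and all of them funnel
    // context.write() into one RecordWriter, which is where a synchronized
    // write() matters.
    MultithreadedMapper.setMapperClass(job, MyMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 4);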
Re: Namenode is failing with expception to join
Manish, you stopped HDFS and then started HDFS from the standby namenode, right? Please look at https://issues.apache.org/jira/browse/HDFS-5058

There are two solutions:
1) start HDFS from the active namenode, not the SBN
2) copy {namenode.name.dir}/* to the SBN

I advise #1.

On Wed, Aug 7, 2013 at 3:00 PM, Manish Bhoge manishbh...@rocketmail.com wrote:

I have all the configuration fine, but whenever I start the namenode it fails with the exception below. No clue where to fix this.

2013-08-07 02:56:22,754 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
2013-08-07 02:56:22,751 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 1
2013-08-07 02:56:22,751 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 115 loaded in 0 seconds.
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 0 from /data/1/dfs/nn/current/fsimage_000
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@5f18223d expecting start txid #1
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream '/data/1/dfs/nn/current/edits_0515247-0515255' to transaction ID 1
2013-08-07 02:56:22,753 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2013-08-07 02:56:22,754 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2013-08-07 02:56:22,754 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2013-08-07 02:56:22,754 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 515247.
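A minimal sketch of option 2 from the reply, for the case where starting from the active NN isn't possible (the hostname is illustrative; the metadata path matches the one in the log above). Copying the active NameNode's current fsimage and edits to the standby gives it a starting point whose txids line up with the shared edit log:

    # on the active NameNode, with HDFS stopped
    scp -r /data/1/dfs/nn/current standby-host:/data/1/dfs/nn/

    # then start the standby NameNode again
    hadoop-daemon.sh start namenode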
specify Mapred tasks and slots
Hi Dears, can I specify how many slots to use for a reduce? I know we can specify the number of reduce tasks, but does one task occupy exactly one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
Re: specify Mapred tasks and slots
My question is: can I specify how many slots are used by each M/R task?

On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma shekhar2...@gmail.com wrote:

Slots are decided by the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810

On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote:

Hi Dears, can I specify how many slots to use for a reduce? I know we can specify the number of reduce tasks, but does one task occupy exactly one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
Re: specify Mapred tasks and slots
Thanks Harsh, and thanks to everyone who responded. That's helpful.

On Thu, Aug 8, 2013 at 11:55 AM, Harsh J ha...@cloudera.com wrote:

What Devaraj said, except that if you use the CapacityScheduler, then you can bind together memory requests and slot concepts, and a task can grab more than one slot for itself when needed. We've discussed this aspect previously at http://search-hadoop.com/m/gnFs91yIg1e

On Thu, Aug 8, 2013 at 8:34 AM, Devaraj k devara...@huawei.com wrote:

One task can use only one slot; it cannot use more than one slot. If the task is a map task then it will use one map slot, and if the task is a reduce task then it will use one reduce slot from the configured ones. Thanks, Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: 08 August 2013 08:27
To: user@hadoop.apache.org
Subject: Re: specify Mapred tasks and slots

My question is: can I specify how many slots are used by each M/R task?

On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma shekhar2...@gmail.com wrote:

Slots are decided by the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810

On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote:

Hi Dears, can I specify how many slots to use for a reduce? I know we can specify the number of reduce tasks, but does one task occupy exactly one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.

-- Harsh J
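A hedged illustration of the CapacityScheduler mechanism Harsh refers to in MRv1 (values are examples only, and the cluster-level setting normally lives in mapred-site.xml): when a job requests more memory per task than one slot represents, the scheduler accounts for that task as occupying multiple slots.

    <!-- memory represented by a single map slot on the cluster -->
    <property>
      <name>mapred.cluster.map.memory.mb</name>
      <value>1024</value>
    </property>
    <!-- this job's map tasks each request two slots' worth of memory -->
    <property>
      <name>mapred.job.map.memory.mb</name>
      <value>2048</value>
    </property>

Equivalent reduce-side properties (mapred.cluster.reduce.memory.mb, mapred.job.reduce.memory.mb) exist as well; as discussed above, the default MRv1 scheduler does not do this kind of multi-slot accounting.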
Re: version 1.1.2 document error:
All the differences are listed in the last URL you provided: https://github.com/twitter/hadoop-lzo Did you read it carefully?

On Thu, Jul 25, 2013 at 11:28 AM, 周梦想 abloz...@gmail.com wrote:

Hello, on the page http://hadoop.apache.org/docs/r1.1.2/deployment_layout.html the entry "LZO - LZ0 codec from github.com/omally/hadoop-gpl-compression" should be https://github.com/omalley/hadoop-gpl-compression, I think. There is another lzo: https://github.com/twitter/hadoop-lzo. What is the difference between the two? Thanks. Andy Zhou
Re: ./hdfs namenode -bootstrapStandby error
hi, can you run 'hdfs namenode -initializeSharedEdits' on the active NN? Remember to start all journal nodes before trying this.

On Jul 19, 2013 5:17 PM, lei liu liulei...@gmail.com wrote:

I use the hadoop-2.0.5 version and use QJM for HA. I run ./hdfs namenode -bootstrapStandby for the StandbyNameNode, but it reports the error below:

=====================================================
About to bootstrap Standby ID nn2 from:
  Nameservice ID: mycluster
  Other Namenode ID: nn1
  Other NN's HTTP address: 10.232.98.77:20021
  Other NN's IPC address: dw77.kgb.sqa.cm4/10.232.98.77:20020
  Namespace ID: 1499625118
  Block pool ID: BP-2012507965-10.232.98.77-1372993302021
  Cluster ID: CID-921af0aa-b831-4828-965c-3b71a5149600
  Layout version: -40
=====================================================
Re-format filesystem in Storage Directory /home/musa.ll/hadoop2/cluster-data/name ? (Y or N) Y
13/07/19 17:04:28 INFO common.Storage: Storage directory /home/musa.ll/hadoop2/cluster-data/name has been successfully formatted.
13/07/19 17:04:29 FATAL ha.BootstrapStandby: Unable to read transaction ids 16317-16337 from the configured shared edits storage qjournal://10.232.98.61:20022;10.232.98.62:20022;10.232.98.63:20022/mycluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node. Error: Gap in transactions. Expected to be able to read up until at least txid 16337 but unable to find any edit logs containing txid 16331
13/07/19 17:04:29 INFO util.ExitUtil: Exiting with status 6

The edit logs on the JournalNode contain the following:

-rw-r--r-- 1 musa.ll users      30 Jul 19 15:51 edits_0016327-0016328
-rw-r--r-- 1 musa.ll users      30 Jul 19 15:53 edits_0016329-0016330
-rw-r--r-- 1 musa.ll users 1048576 Jul 19 17:03 edits_inprogress_0016331

The edits_inprogress_0016331 file should contain transactions 16331-16337, so why does the ./hdfs namenode -bootstrapStandby command report an error? How can I initialize the StandbyNameNode? Thanks, LiuLei
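A minimal sketch of the other workaround the BootstrapStandby error itself suggests, run on the active NameNode so that the transactions in the in-progress segment are persisted where the standby can read them (the commands are standard hdfs ones, not quoted from the thread):

    # on the active NameNode: persist a checkpoint covering the recent txids
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    hdfs dfsadmin -safemode leave

    # then, on the standby, retry:
    hdfs namenode -bootstrapStandby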
Re: Namenode automatically going to safemode with 2.1.0-beta
this is not a bug. it has been documented. On Jul 19, 2013 10:13 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, I have made my dfs.namenode.name.dir point to a subdirectory of my home, and I don't see this issue again. So, is this a bug that we need to log into JIRA? Thanks, Kishore On Tue, Jul 16, 2013 at 6:39 AM, Harsh J ha...@cloudera.com wrote: 2013-07-12 11:04:26,002 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 0, which is below the configured reserved amount 104857600 This is interesting. Its calling your volume null, which may be more of a superficial bug. What is your dfs.namenode.name.dir set to? From /tmp/hadoop-dsadm/dfs/name I'd expect you haven't set it up and /tmp is being used off of the out-of-box defaults. Could you try to set it to a specific directory thats not on /tmp? On Mon, Jul 15, 2013 at 2:43 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I don't have it in my hdfs-site.xml, in which case probably the default value is taken.. On Mon, Jul 15, 2013 at 2:29 PM, Azuryy Yu azury...@gmail.com wrote: please check dfs.datanode.du.reserved in the hdfs-site.xml On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com wrote: Hi Krishna, Can you please send screenshots of namenode web UI. Thanks Aditya. On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I have had enough space on the disk that is used, like around 30 Gigs Thanks, Kishore On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla venkatarami.ne...@cloudwick.com wrote: Hi, pls see the available space for NN storage directory. Thanks Regards Venkat On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single node cluster which is using 2.1.0-beta, and still observed that it has gone to safe mode by itself after a while. I was looking at the name node log and see many of these kinds of entries.. Can anything be interpreted from these? 
2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561 2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14 2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15 2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562 2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563 2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563 2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11 2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565 2013-07-12
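Tying together Harsh's analysis in this thread: the resource checker saw 0 bytes available on volume 'null' because dfs.namenode.name.dir had not been set and fell back to the out-of-box default under /tmp, and 104857600 bytes (100 MB) is the default reserve it compares against. A hedged sketch of the hdfs-site.xml change the fix amounts to (the path is illustrative):

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/dsadm/hadoop/dfs/name</value>
    </property>

The reserve itself can be tuned with dfs.namenode.resource.du.reserved, but moving the metadata off /tmp is the real fix.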
Re: Running a single cluster in multiple datacenters
Hi Bertrand, I guess you configured two racks in total: one IDC is one rack, and the other IDC is the other rack. If you want to avoid re-replication while one IDC is down, you would have to change the replica placement policy: if there is at least a minimum number of replicas on one rack, then do nothing (here the minimum should be 2, which guarantees you have at least two replicas in one IDC). So you would have to configure the replication factor to 4 if you adopt my advice.

On Jul 16, 2013, at 6:37 AM, Bertrand Dechoux decho...@gmail.com wrote:

According to your own analysis, you wouldn't be more available, but that was your aim. Did you consider having two separate clusters, one per datacenter, with an automatic copy of the data? I understand that load balancing of work and data would not be easy, but it seems to me a simple strategy (one that I have seen working). However, you are stating that the two datacenters are close and linked by a big network connection. What is the impact on latency and bandwidth (between two nodes in the same datacenter versus two nodes in different datacenters)? The main question is what happens when a job uses TaskTrackers from datacenter A but DataNodes from datacenter B. It will happen. Simply consider reducer tasks, which don't have any locality strategy because it doesn't really make sense in a general context. Regards, Bertrand

On Mon, Jul 15, 2013 at 11:56 PM, j...@nanthrax.net wrote:

Hi Niels, it depends on the number of replicas and the Hadoop rack configuration (level). It's possible to have replicas in the two datacenters. What's the rack configuration that you plan? You can implement your own one and define it using the topology.node.switch.mapping.impl property. Regards, JB

On 2013-07-15 23:49, Niels Basjes wrote:

Hi, last week we had a discussion at work about setting up our new Hadoop cluster(s). One of the things that has changed is that the importance of the Hadoop stack is growing, so we want to be more available. One of the points we talked about was setting up the cluster in such a way that the nodes are physically located in two separate datacenters (on opposite sides of the same city) with a big network connection in between. We're currently talking about a cluster in the 50-node range, but that will grow over time.

The advantages I see:
- More CPU power available for jobs.
- The data is automatically copied between the datacenters as long as we configure them to be different racks.

The disadvantages I see:
- If the network goes out then one half is dead and the other half will most likely go into safemode, because recovering the missing replicas will fill up the disks fast.

What things should we also consider? Has anyone any experience with such a setup? Is it a good idea to do this? What are better options for us to consider? Thanks for any input.

-- Bertrand Dechoux
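For readers wanting to try the rack mapping JB mentions: as an alternative to implementing topology.node.switch.mapping.impl, the default ScriptBasedMapping reads a script named by topology.script.file.name (net.topology.script.file.name on Hadoop 2). A hypothetical topology script (the IP-to-datacenter mapping is made up) that exposes each datacenter as its own rack:

    #!/bin/bash
    # Map each host/IP argument to a rack path that encodes its datacenter,
    # so HDFS treats the two IDCs as different racks.
    for arg in "$@"; do
      case "$arg" in
        10.1.*) echo -n "/dcA/rack1 " ;;   # addresses in datacenter A
        10.2.*) echo -n "/dcB/rack1 " ;;   # addresses in datacenter B
        *)      echo -n "/default-rack " ;;
      esac
    done
    echo

Whether replicas then survive a whole-IDC outage still depends on the placement-policy discussion above, since the default policy only guarantees replicas on two distinct racks.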
Re: Namenode automatically going to safemode with 2.1.0-beta
hi, from the log: NameNode low on available disk space. Entering safe mode. this is the root cause. On Jul 15, 2013 2:45 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single node cluster which is using 2.1.0-beta, and still observed that it has gone to safe mode by itself after a while. I was looking at the name node log and see many of these kinds of entries.. Can anything be interpreted from these? 2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561 2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14 2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15 2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562 2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563 2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563 2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11 2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 13 2013-07-12 09:09:11,441 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 13 And after sometime it said: 2013-07-12 11:03:19,799 INFO 
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 795 2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 795 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 12 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 11:04:19,829 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_795 - /tmp/hadoop-dsadm/dfs/name/current/edits_795-796 2013-07-12 11:04:19,829 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 797 2013-07-12 11:04:26,002 WARN
Re: Namenode automatically going to safemode with 2.1.0-beta
please check dfs.datanode.du.reserved in the hdfs-site.xml On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com wrote: Hi Krishna, Can you please send screenshots of namenode web UI. Thanks Aditya. On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I have had enough space on the disk that is used, like around 30 Gigs Thanks, Kishore On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla venkatarami.ne...@cloudwick.com wrote: Hi, pls see the available space for NN storage directory. Thanks Regards Venkat On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single node cluster which is using 2.1.0-beta, and still observed that it has gone to safe mode by itself after a while. I was looking at the name node log and see many of these kinds of entries.. Can anything be interpreted from these? 2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561 2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14 2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15 2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562 2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563 2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563 2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11 2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565 2013-07-12 09:09:11,440 INFO 
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 13 2013-07-12 09:09:11,441 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 13 And after sometime it said: 2013-07-12 11:03:19,799 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 795 2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 795 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 12 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in
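The replies in this thread point at reserved-space settings, so a hedged hdfs-site.xml sketch follows; the values are illustrative assumptions rather than recommendations, and on 2.x the NameNode also drops into safemode on its own when the volume holding its storage directory (here under /tmp) falls below a reserved amount:
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>   <!-- bytes per volume kept free for non-DFS use; 10 GB assumed here -->
</property>
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <value>104857600</value>     <!-- free space the NN requires on its storage dirs; 100 MB is the default -->
</property>
Once the underlying space problem is fixed, hdfs dfsadmin -safemode leave takes the NameNode back out of safemode.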
Re: Use a URL for the HADOOP_CONF_DIR?
Yes, it would be useful. On Jul 16, 2013 5:40 AM, Niels Basjes ni...@basjes.nl wrote: Hi, When giving users access to a Hadoop cluster they need a few XML config files (like hadoop-site.xml). They put these somewhere on their PC and start running their jobs on the cluster. Now when you change the settings you want those users to use (for example, you changed some TCP port), you need them all to update their config files. My question is: can you set HADOOP_CONF_DIR to be a URL on a webserver? A while ago I tried this and (back then) it didn't work. Would this be a useful enhancement? -- Best regards, Niels Basjes
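Since HADOOP_CONF_DIR apparently did not accept a URL at the time, a minimal workaround sketch (my addition, not from the thread) is to publish the config as a tarball and have clients fetch it before running jobs; the URL below is a made-up placeholder:
CONF_URL="http://config-host.example/hadoop-conf.tar.gz"   # hypothetical location of the published config
export HADOOP_CONF_DIR="$HOME/.hadoop-conf"
mkdir -p "$HADOOP_CONF_DIR"
curl -sf "$CONF_URL" | tar xz -C "$HADOOP_CONF_DIR"
hadoop fs -ls /    # now picks up the freshly fetched *-site.xml files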
Re: Hadoop property precedence
The conf on the machine the client is running on is what takes effect. On Jul 13, 2013 4:42 PM, Kiran Dangeti kirandkumar2...@gmail.com wrote: Shalish, The default block size is 64MB, which is fine at the client end. Make sure the same value is also set in the conf at your end. You can increase the size of each block to 128MB or greater; the only thing you will see is that processing is faster, but in the end there may be a chance of losing data. Thanks, Kiran On Fri, Jul 12, 2013 at 10:20 PM, Shalish VJ shalis...@yahoo.com wrote: Hi, Suppose the block size set in the configuration file at the client side is 64MB, the block size set in the configuration file at the namenode side is 128MB, and the block size set in the configuration file at the datanode side is something else. Please advise: if the client is writing a file to HDFS, which property takes effect? Thanks, Shalish.
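A small sketch of that client-side behaviour (my addition, not from the thread); dfs.block.size is the 1.x property name and the paths are placeholders:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// whatever the client sets here wins over the values in the server-side XML files
Configuration conf = new Configuration();
conf.setLong("dfs.block.size", 128L * 1024 * 1024);   // 128 MB, illustrative
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("local.txt"), new Path("/user/demo/local.txt"));
Equivalently from the shell: hadoop fs -D dfs.block.size=134217728 -put local.txt /user/demo/local.txt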
Re: Task failure in slave node
sorry for typo, mahout, not mahou. sent from mobile On Jul 11, 2013 9:40 PM, Azuryy Yu azury...@gmail.com wrote: hi, put all mahou jars under hadoop_home/lib, then restart cluster. On Jul 11, 2013 8:45 PM, Margusja mar...@roo.ee wrote: Hi I have tow nodes: n1 (master, salve) and n2 (slave) after set up I ran wordcount example and it worked fine: [hduser@n1 ~]$ hadoop jar /usr/local/hadoop/hadoop-**examples-1.0.4.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output 13/07/11 15:30:44 INFO input.FileInputFormat: Total input paths to process : 7 13/07/11 15:30:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library 13/07/11 15:30:44 WARN snappy.LoadSnappy: Snappy native library not loaded 13/07/11 15:30:44 INFO mapred.JobClient: Running job: job_201307111355_0015 13/07/11 15:30:45 INFO mapred.JobClient: map 0% reduce 0% 13/07/11 15:31:03 INFO mapred.JobClient: map 42% reduce 0% 13/07/11 15:31:06 INFO mapred.JobClient: map 57% reduce 0% 13/07/11 15:31:09 INFO mapred.JobClient: map 71% reduce 0% 13/07/11 15:31:15 INFO mapred.JobClient: map 100% reduce 0% 13/07/11 15:31:18 INFO mapred.JobClient: map 100% reduce 23% 13/07/11 15:31:27 INFO mapred.JobClient: map 100% reduce 100% 13/07/11 15:31:32 INFO mapred.JobClient: Job complete: job_201307111355_0015 13/07/11 15:31:32 INFO mapred.JobClient: Counters: 30 13/07/11 15:31:32 INFO mapred.JobClient: Job Counters 13/07/11 15:31:32 INFO mapred.JobClient: Launched reduce tasks=1 13/07/11 15:31:32 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=67576 13/07/11 15:31:32 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/07/11 15:31:32 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/07/11 15:31:32 INFO mapred.JobClient: Rack-local map tasks=3 13/07/11 15:31:32 INFO mapred.JobClient: Launched map tasks=7 13/07/11 15:31:32 INFO mapred.JobClient: Data-local map tasks=4 13/07/11 15:31:32 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21992 13/07/11 15:31:32 INFO mapred.JobClient: File Output Format Counters 13/07/11 15:31:32 INFO mapred.JobClient: Bytes Written=1412505 13/07/11 15:31:32 INFO mapred.JobClient: FileSystemCounters 13/07/11 15:31:32 INFO mapred.JobClient: FILE_BYTES_READ=5414195 13/07/11 15:31:32 INFO mapred.JobClient: HDFS_BYTES_READ=6950820 13/07/11 15:31:32 INFO mapred.JobClient: FILE_BYTES_WRITTEN=8744993 13/07/11 15:31:32 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1412505 13/07/11 15:31:32 INFO mapred.JobClient: File Input Format Counters 13/07/11 15:31:32 INFO mapred.JobClient: Bytes Read=6950001 13/07/11 15:31:32 INFO mapred.JobClient: Map-Reduce Framework 13/07/11 15:31:32 INFO mapred.JobClient: Map output materialized bytes=3157469 13/07/11 15:31:32 INFO mapred.JobClient: Map input records=137146 13/07/11 15:31:32 INFO mapred.JobClient: Reduce shuffle bytes=2904836 13/07/11 15:31:32 INFO mapred.JobClient: Spilled Records=594764 13/07/11 15:31:32 INFO mapred.JobClient: Map output bytes=11435849 13/07/11 15:31:32 INFO mapred.JobClient: Total committed heap usage (bytes)=1128136704 13/07/11 15:31:32 INFO mapred.JobClient: CPU time spent (ms)=18230 13/07/11 15:31:32 INFO mapred.JobClient: Combine input records=1174991 13/07/11 15:31:32 INFO mapred.JobClient: SPLIT_RAW_BYTES=819 13/07/11 15:31:32 INFO mapred.JobClient: Reduce input records=218990 13/07/11 15:31:32 INFO mapred.JobClient: Reduce input groups=128513 13/07/11 15:31:32 INFO mapred.JobClient: Combine output records=218990 13/07/11 15:31:32 INFO mapred.JobClient: Physical memory (bytes) 
snapshot=1179656192 13/07/11 15:31:32 INFO mapred.JobClient: Reduce output records=128513 13/07/11 15:31:32 INFO mapred.JobClient: Virtual memory (bytes) snapshot=22992117760 13/07/11 15:31:32 INFO mapred.JobClient: Map output records=1174991 From the web interface (http://n1:50030/) I saw that both (n1 and n2) were used without any errors. Problems appear if I try to use the following command on the master (n1): [hduser@n1 ~]$ hadoop jar mahout-distribution-0.7/mahout-examples-0.7-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -p -d testdata/bal_ee_2009.csv -ds testdata/bal_ee_2009.csv.info -sl 10 -o bal_ee_2009_out -t 1 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [file:/usr/local/hadoop-1.0.4/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 13/07/11 15:36:50 INFO mapreduce.BuildForest: Partial Mapred implementation 13/07/11 15:36:50 INFO mapreduce.BuildForest
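A quick sketch of the suggestion at the top of this thread (put the Mahout jars on the cluster classpath); the paths are taken from the messages above and not re-verified:
# copy the Mahout job/core jars into Hadoop's lib directory on every node
for h in n1 n2; do
  scp mahout-distribution-0.7/mahout-*.jar $h:/usr/local/hadoop/lib/
done
# then restart the cluster from the master so the TaskTrackers pick them up
stop-all.sh && start-all.sh
An alternative, if restarting is not an option, is shipping the jars with the job via -libjars, but the reply above recommends the lib-directory route.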
Re: cannot submit a job via java client in hadoop- 2.0.5-alpha
you didn't set yarn.nodemanager.address in your yarn-site.xml On Wed, Jul 10, 2013 at 4:33 PM, Francis.Hu francis...@reachjunction.com wrote: Hi, All I have a hadoop-2.0.5-alpha cluster with 3 data nodes. I have the Resource Manager and all data nodes started and can access the web UI of the Resource Manager. I wrote a Java client to submit a job, as in the TestJob class below, but the job is never submitted successfully. It throws the exception below all the time. My configurations are attached. Can anyone help me? Thanks.
-- my java client --
public class TestJob {
  public void execute() {
    Configuration conf1 = new Configuration();
    conf1.addResource("resources/core-site.xml");
    conf1.addResource("resources/hdfs-site.xml");
    conf1.addResource("resources/yarn-site.xml");
    conf1.addResource("resources/mapred-site.xml");
    JobConf conf = new JobConf(conf1);
    conf.setJar("/home/francis/hadoop-jobs/MapReduceJob.jar");
    conf.setJobName("Test");
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(DisplayRequestMapper.class);
    conf.setReducerClass(DisplayRequestReducer.class);
    FileInputFormat.setInputPaths(conf, new Path("/home/francis/hadoop-jobs/2013070907.FNODE.2.txt"));
    FileOutputFormat.setOutputPath(conf, new Path("/home/francis/hadoop-jobs/result/"));
    try {
      JobClient client = new JobClient(conf);
      RunningJob job = client.submitJob(conf);
      job.waitForCompletion();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
-- Exception --
jvm 1| java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
jvm 1| at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:119)
jvm 1| at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:81)
jvm 1| at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:74)
jvm 1| at org.apache.hadoop.mapred.JobClient.init(JobClient.java:482)
jvm 1| at org.apache.hadoop.mapred.JobClient.init(JobClient.java:461)
jvm 1| at com.rh.elastic.hadoop.job.TestJob.execute(TestJob.java:59)
Thanks, Francis.Hu
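The exception points at mapreduce.framework.name and the corresponding server addresses, so a hedged client-side config sketch follows (the hostname is a placeholder, not from the thread):
<!-- mapred-site.xml seen by the client -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- yarn-site.xml seen by the client -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm-host.example:8032</value>
</property>
Without mapreduce.framework.name set to yarn (and the ResourceManager address resolvable), JobClient typically cannot pick a client protocol provider and fails with exactly this IOException.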
Re: Distributed Cache
It should be like this: Configuration conf = new Configuration(); Job job = new Job(conf, "test"); job.setJarByClass(Test.class); DistributedCache.addCacheFile(new Path("your hdfs path").toUri(), job.getConfiguration()); but the best example is the test cases: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/filecache/TestClientDistributedCacheManager.java?view=markup On Wed, Jul 10, 2013 at 6:07 AM, Ted Yu yuzhih...@gmail.com wrote: You should use Job#addCacheFile() Cheers On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (version 2.0.5). In my driver class, I use this code to try to add a file to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.filecache.DistributedCache; import org.apache.hadoop.fs.*; import org.apache.hadoop.io.*; import org.apache.hadoop.mapreduce.*; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; Configuration conf = new Configuration(); DistributedCache.addCacheFile(new URI("file path in HDFS"), conf); Job job = Job.getInstance(); … However, I keep getting warnings that the method addCacheFile() is deprecated. Is there a more current way to add files to the distributed cache? Thanks in advance, Andrew
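A compact sketch of the non-deprecated route Ted mentions (Job#addCacheFile on Hadoop 2.x); the HDFS path is a placeholder:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "test");
job.setJarByClass(Test.class);                       // driver class from the thread
job.addCacheFile(new URI("/user/demo/lookup.txt"));  // placeholder HDFS path
// tasks can then list the cached files with context.getCacheFiles()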
Re: Can I move block data directly?
Thanks Harsh, always detailed answers each time. Yes, this is an unsupported scenario. The balancer is very slow even after I set bandwidthPerSec to a large value, so I want to go this way to solve the problem quickly. On Mon, Jul 8, 2013 at 1:46 PM, Viral Bajaria viral.baja...@gmail.com wrote: Out of curiosity, besides the bandwidthPerSec and threshold, what other parameters are tuneable? Thanks, Viral On Sun, Jul 7, 2013 at 10:39 PM, Harsh J ha...@cloudera.com wrote: If the balancer isn't cutting it for you with stock defaults, you should consider tuning that rather than doing these unsupported scenarios.
Re: Can I move block data directly?
bq. I'd also ensure the ownership of the block files are intact. Hi Harsh, what does 'ensure the ownership of the block files is intact' mean? And I want to ask more: to my understanding, after I restart the datanode daemon, the block report should tell the NN all the blocks owned by this DN, and the block scanner can remember all the blocks' structure locally, so the block file ownership would be confirmed during the startup period, and even if some pieces of the blk_ files are lost, the NN can find that they are under-replicated. Am I right? Thanks. On Mon, Jul 8, 2013 at 2:07 PM, Azuryy Yu azury...@gmail.com wrote: Thanks Harsh, always detailed answers each time. Yes, this is an unsupported scenario. The balancer is very slow even after I set bandwidthPerSec to a large value, so I want to go this way to solve the problem quickly. On Mon, Jul 8, 2013 at 1:46 PM, Viral Bajaria viral.baja...@gmail.com wrote: Out of curiosity, besides the bandwidthPerSec and threshold, what other parameters are tuneable? Thanks, Viral On Sun, Jul 7, 2013 at 10:39 PM, Harsh J ha...@cloudera.com wrote: If the balancer isn't cutting it for you with stock defaults, you should consider tuning that rather than doing these unsupported scenarios.
Re: Can I move block data directly?
Yeah. I got it. Thanks Harsh. On Mon, Jul 8, 2013 at 3:10 PM, Harsh J ha...@cloudera.com wrote: Yeah you're right. I only meant the ownership of the blk_* files to be owned by the same user as the DN daemon, for consistency more than anything else. On Mon, Jul 8, 2013 at 11:46 AM, Azuryy Yu azury...@gmail.com wrote: bq. I'd also ensure the ownership of the block files are intact. Hi Harsh, what does 'ensure the ownership of the block files is intact' mean? And I want to ask more: to my understanding, after I restart the datanode daemon, the block report should tell the NN all the blocks owned by this DN, and the block scanner can remember all the blocks' structure locally, so the block file ownership would be confirmed during the startup period, and even if some pieces of the blk_ files are lost, the NN can find that they are under-replicated. Am I right? Thanks. On Mon, Jul 8, 2013 at 2:07 PM, Azuryy Yu azury...@gmail.com wrote: Thanks Harsh, always detailed answers each time. Yes, this is an unsupported scenario. The balancer is very slow even after I set bandwidthPerSec to a large value, so I want to go this way to solve the problem quickly. On Mon, Jul 8, 2013 at 1:46 PM, Viral Bajaria viral.baja...@gmail.com wrote: Out of curiosity, besides the bandwidthPerSec and threshold, what other parameters are tuneable? Thanks, Viral On Sun, Jul 7, 2013 at 10:39 PM, Harsh J ha...@cloudera.com wrote: If the balancer isn't cutting it for you with stock defaults, you should consider tuning that rather than doing these unsupported scenarios. -- Harsh J
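For reference, a hedged sketch of the balancer-tuning route recommended in this thread (the numbers are arbitrary examples, not recommendations):
# raise the per-datanode balancing bandwidth, in bytes/sec (here ~100 MB/s)
hdfs dfsadmin -setBalancerBandwidth 104857600
# run the balancer with a tighter utilization threshold (percent)
hdfs balancer -threshold 5
On a 1.x cluster the same commands are spelled hadoop dfsadmin / hadoop balancer; -setBalancerBandwidth changes the live value, while dfs.datanode.balance.bandwidthPerSec in hdfs-site.xml sets it permanently.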
Re: New to hadoop - which version to start with ?
Oh, this is a fair scheduler issue; you need a patch for that. I will send it to you later. Or can you use another scheduler instead of FS? On Jul 6, 2013 4:42 PM, sudhir543-...@yahoo.com sudhir543-...@yahoo.com wrote: That did not work, the same error comes up. So I tried bin/hadoop fs -chmod -R 755 /tmp/hadoop-Sudhir/mapred/staging/sudhir-1664368101/.staging as well. However the next run would give 13/07/06 14:09:46 ERROR security.UserGroupInformation: PriviledgedActionException as:Sudhir cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sudhir\mapred\staging\Sudhir1731506911\.staging to 0700 java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sudhir\mapred\staging\Sudhir1731506911\.staging to 0700 Thanks Sudhir -- From: Azuryy Yu azury...@gmail.com To: user@hadoop.apache.org; sudhir543-...@yahoo.com sudhir543-...@yahoo.com Sent: Saturday, 6 July 2013 11:01 AM Subject: Re: New to hadoop - which version to start with ? hadoop fs -chmod -R 755 \tmp\hadoop-Sudhir\mapred\staging Then it should work. On Sat, Jul 6, 2013 at 1:27 PM, sudhir543-...@yahoo.com sudhir543-...@yahoo.com wrote: I am new to Hadoop, I just started reading 'Hadoop: The Definitive Guide'. I downloaded Hadoop 1.1.2 and tried to run a sample MapReduce job using Cygwin, but I got the following error: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sudhir\mapred\staging\Sudhir-1267269654\.staging to 0700 I read that some specific versions have this error. Can someone suggest which version I should start with (one that supports the MapReduce 1 runtime (Classic))? Thanks Sudhir
Re: Which InputFormat to use?
Use the InputFormat under the mapreduce package; the mapred package is the old one. Generally you can extend FileInputFormat under the o.a.h.mapreduce package. On Fri, Jul 5, 2013 at 1:23 PM, Devaraj k devara...@huawei.com wrote: Hi Ahmed, Hadoop 0.20.0 included the new MapReduce API, sometimes referred to as the context-objects API. It is designed to make the API easier to evolve in the future. There are some differences between the new and old APIs: the new API favours abstract classes rather than interfaces, since abstract classes are easier to evolve; the new API uses context objects like MapContext and ReduceContext to connect to the user code; the old API has a special JobConf object for job configuration, while in the new API job configuration is done using Configuration. You can find the new API in the org.apache.hadoop.mapreduce.lib.input.* package and its sub-packages, and the old API in the org.apache.hadoop.mapred.* package and its sub-packages. The new API is type-incompatible with the old, so we need to rewrite jobs to make use of these advantages. Based on these things you can select which API to use. Thanks Devaraj k From: Ahmed Eldawy [mailto:aseld...@gmail.com] Sent: 05 July 2013 00:00 To: user@hadoop.apache.org Subject: Which InputFormat to use? Hi I'm developing a new set of InputFormats that are used for a project I'm doing. I found that there are two ways to create a new InputFormat: 1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat 2- Implement the interface org.apache.hadoop.mapred.InputFormat I don't know why there are two versions which are incompatible. I found out that for each one, there is a whole set of interfaces for different classes such as InputSplit, RecordReader and the MapReduce job. Unfortunately, each set of classes is not compatible with the other one. This means that I have to choose one of the interfaces and go with it till the end. I have two questions, basically: 1- Which of these two interfaces should I go with? I didn't find any deprecation in either of them, so they both seem legitimate. Is there any plan to retire one of them? 2- I already have some classes implemented in one of the formats; is it worth refactoring these classes to use the other interface, in case I used the old format? Thanks in advance for your help. Best regards, Ahmed Eldawy
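A bare-bones sketch of the new-API route suggested above; the class name is made up for illustration and it simply reuses the stock line reader:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// extends the new (org.apache.hadoop.mapreduce) FileInputFormat
public class MyInputFormat extends FileInputFormat<LongWritable, Text> {
  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) throws IOException {
    return new LineRecordReader();   // placeholder reader; replace with your own
  }
}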
Re: Decommission datanode - no response
I filed this issue at: https://issues.apache.org/jira/browse/HDFS-4959 On Fri, Jul 5, 2013 at 1:06 PM, Azuryy Yu azury...@gmail.com wrote: The client has no connection problem. On Fri, Jul 5, 2013 at 12:46 PM, Devaraj k devara...@huawei.com wrote: And also could you check whether the client is connecting to the NameNode, or whether there is any failure in connecting to the NN. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 09:15 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response I added dfs.hosts.exclude before the NN started, and I updated /usr/local/hadoop/conf/dfs_exclude with new hosts, but it doesn't decommission. On Fri, Jul 5, 2013 at 11:39 AM, Devaraj k devara...@huawei.com wrote: When did you add this configuration in the NN conf? <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> If you have added this configuration after starting the NN, it won't take effect and you need to restart the NN. If you have added this config with the exclude file before the NN start, you can update the file with new hosts and issue the refreshNodes command, and then the newly added DNs will be decommissioned. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:48 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response Thanks Devaraj, There are no related logs in the NN log or the DN log. On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k devara...@huawei.com wrote: Do you see any log related to this in the Name Node logs when you issue the refreshNodes dfsadmin command? Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:12 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: New to hadoop - which version to start with ?
hadoop fs -chmod -R 755 \tmp\hadoop-Sudhir\mapred\staging Then it should work. On Sat, Jul 6, 2013 at 1:27 PM, sudhir543-...@yahoo.com sudhir543-...@yahoo.com wrote: I am new to Hadoop, I just started reading 'Hadoop: The Definitive Guide'. I downloaded Hadoop 1.1.2 and tried to run a sample MapReduce job using Cygwin, but I got the following error: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sudhir\mapred\staging\Sudhir-1267269654\.staging to 0700 I read that some specific versions have this error. Can someone suggest which version I should start with (one that supports the MapReduce 1 runtime (Classic))? Thanks Sudhir
Re: 【dfs.namenode.shared.edits.dir can support different NameServices】
Hi Bing, HA does not conflict with HDFS federation. For example, if you have two name services, cluster1 and cluster2, then: <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://n1.com:8485;n2.com:8485/cluster1</value> </property> <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://n1.com:8485;n2.com:8485/cluster2</value> </property> On Thu, Jul 4, 2013 at 2:46 PM, Bing Jiang jiangbinglo...@gmail.com wrote: hi, Chris. I have traced the source code, and I find this issue comes from sbin/start-dfs.sh: SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-) If I set the suffix on dfs.namenode.shared.edits.dir.[namespace id].[nn id], it will get null. So please take into consideration HA on multiple nameservices, and please change the way JournalNodes are launched in the shell scripts (sbin/start-dfs.sh). Thanks. 2013/7/4 Chris Nauroth cnaur...@hortonworks.com Hello Bing, I have not tested this configuration myself, but from reading the code it does appear that dfs.namenode.shared.edits.dir supports appending the nameservice ID to the key, so that you can specify different directories for different federated namenodes in a single hdfs-site.xml. https://github.com/apache/hadoop-common/blob/branch-2.0.5-alpha/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L147 Hope this helps, Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, Jul 3, 2013 at 6:21 AM, Bing Jiang jiangbinglo...@gmail.com wrote: hi, all. Using hadoop-2.0.5-alpha, I meet a problem: I need to configure a different dfs.namenode.shared.edits.dir for each namespace. Could the dfs.namenode.shared.edits.dir item support multiple nameservices, to avoid maintaining multiple conf/hdfs-site.xml files on the different Namenodes? Thanks~ -- Bing Jiang Tel:(86)134-2619-1361 weibo: http://weibo.com/jiangbinglover BLOG: http://blog.sina.com.cn/jiangbinglover National Research Center for Intelligent Computing Systems Institute of Computing technology Graduate University of Chinese Academy of Science -- Bing Jiang Tel:(86)134-2619-1361 weibo: http://weibo.com/jiangbinglover BLOG: http://blog.sina.com.cn/jiangbinglover National Research Center for Intelligent Computing Systems Institute of Computing technology Graduate University of Chinese Academy of Science
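A hedged sketch of the per-nameservice form that Chris's reply describes (suffixing the nameservice ID onto the key), using the hosts and service names from the example above; note Bing's follow-up that sbin/start-dfs.sh did not resolve this suffixed form at the time:
<property>
  <name>dfs.namenode.shared.edits.dir.cluster1</name>
  <value>qjournal://n1.com:8485;n2.com:8485/cluster1</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir.cluster2</name>
  <value>qjournal://n1.com:8485;n2.com:8485/cluster2</value>
</property>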
Re: Datanode support different Namespace
This is because you don't use the same clusterID; all datanodes and namenodes should use the same clusterID. On Thu, Jul 4, 2013 at 3:12 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Hi, all We try to use hadoop-2.0.5-alpha with two namespaces, one for the HBase cluster and the other for common use. At the same time, we use the Quorum Journal policy for HA. GS-CIX-SEV0001, GS-CIX-SEV0002: namenodes in the hbasecluster namespace. GS-CIX-SEV0003, GS-CIX-SEV0004: namenodes in the commoncluster namespace. GS-CIX-SEV0001~GS-CIX-SEV0008: 8 machines used as datanodes. After launching the whole HDFS cluster, something confuses me: each namespace has only half of the datanodes.
NameNode 'GS-CIX-SEV0004:9100' - Started: Thu Jul 04 10:28:00 CST 2013, Version: 2.0.5-alpha, 1488459, Compiled: 2013-06-01T04:05Z by jenkins from branch-2.0.5-alpha, Cluster ID: CID-15c48d78-2137-4c6e-aacf-0edbf2bb3db7, Block Pool ID: BP-1792015895-10.100.2.3-1372904504940. Live Datanodes: 4
  GS-CIX-SEV0001  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 116.04 GB, remaining 772.03 GB (86.93%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0002  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 135.50 GB, remaining 752.57 GB (84.74%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0005  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 97.61 GB, remaining 790.46 GB (89.01%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0006  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 122.30 GB, remaining 765.77 GB (86.23%), 0 blocks, 0 failed volumes
The other namespace's NameNode:
NameNode 'GS-CIX-SEV0001:9100' - Started: Thu Jul 04 10:19:03 CST 2013, Version: 2.0.5-alpha, 1488459, Compiled: 2013-06-01T04:05Z by jenkins from branch-2.0.5-alpha, Cluster ID: CID-1a53483d-000e-4726-aef1-f500bedb1df6, Block Pool ID: BP-1142418822-10.100.2.1-1372904314309. Live Datanodes: 4
  GS-CIX-SEV0003  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 150.54 GB, remaining 737.53 GB (83.05%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0004  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 177.22 GB, remaining 710.85 GB (80.04%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0007  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 62.91 GB, remaining 825.16 GB (92.92%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0008  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 125.25 GB, remaining 762.82 GB (85.90%), 0 blocks, 0 failed volumes
And checking the DN (GS-CIX-SEV0001)'s log, it prints: 2013-07-04 10:34:51,699 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1142418822-10.100.2.1-1372904314309 (storage id DS-1677272131-10.100.2.1-50010-1372905291690) service to GS-CIX-SEV0001/10.100.2.1:9100 java.io.IOException: Inconsistent storage IDs. Name-node returned DS811369792. Expecting DS-1677272131-10.100.2.1-50010-1372905291690 at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:731) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:308) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:632) at
Re: Datanode support different Namespace
In addition: if these are two new clusters, then on each namenode run hdfs namenode -format -clusterID yourID. But if you want to upgrade these two clusters from non-HA to HA, then use bin/start-dfs.sh -upgrade -clusterID yourID. On Thu, Jul 4, 2013 at 3:14 PM, Azuryy Yu azury...@gmail.com wrote: This is because you don't use the same clusterID; all datanodes and namenodes should use the same clusterID. On Thu, Jul 4, 2013 at 3:12 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Hi, all We try to use hadoop-2.0.5-alpha with two namespaces, one for the HBase cluster and the other for common use. At the same time, we use the Quorum Journal policy for HA. GS-CIX-SEV0001, GS-CIX-SEV0002: namenodes in the hbasecluster namespace. GS-CIX-SEV0003, GS-CIX-SEV0004: namenodes in the commoncluster namespace. GS-CIX-SEV0001~GS-CIX-SEV0008: 8 machines used as datanodes. After launching the whole HDFS cluster, something confuses me: each namespace has only half of the datanodes. [NameNode web UI status for both namespaces, as quoted in the original message above.] And checking the DN (GS-CIX-SEV0001)'s log, it prints: 2013-07-04 10:34:51,699 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1142418822-10.100.2.1-1372904314309 (storage id DS-1677272131-10.100.2.1-50010-1372905291690) service to GS-CIX-SEV0001/10.100.2.1:9100 java.io.IOException: Inconsistent storage IDs. Name-node returned DS811369792. Expecting DS-1677272131-10.100.2.1-50010-1372905291690 at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:731
Re: Datanode support different Namespace
It's random. On Jul 4, 2013 3:33 PM, Bing Jiang jiangbinglo...@gmail.com wrote: If the cluster ID is not set when formatting the Namenode, is there a policy in HDFS to guarantee an even distribution of datanodes across the different namespaces, or is it just random? 2013/7/4 Azuryy Yu azury...@gmail.com In addition: if these are two new clusters, then on each namenode run hdfs namenode -format -clusterID yourID. But if you want to upgrade these two clusters from non-HA to HA, then use bin/start-dfs.sh -upgrade -clusterID yourID. On Thu, Jul 4, 2013 at 3:14 PM, Azuryy Yu azury...@gmail.com wrote: This is because you don't use the same clusterID; all datanodes and namenodes should use the same clusterID. On Thu, Jul 4, 2013 at 3:12 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Hi, all We try to use hadoop-2.0.5-alpha with two namespaces, one for the HBase cluster and the other for common use. At the same time, we use the Quorum Journal policy for HA. GS-CIX-SEV0001, GS-CIX-SEV0002: namenodes in the hbasecluster namespace. GS-CIX-SEV0003, GS-CIX-SEV0004: namenodes in the commoncluster namespace. GS-CIX-SEV0001~GS-CIX-SEV0008: 8 machines used as datanodes. After launching the whole HDFS cluster, something confuses me: each namespace has only half of the datanodes. [NameNode web UI status for both namespaces, as quoted in the original message above.] And checking the DN (GS-CIX-SEV0001)'s log, it prints: 2013-07-04 10:34:51,699 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1142418822-10.100.2.1-1372904314309 (storage id DS-1677272131-10.100.2.1-50010-1372905291690
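A hedged sketch of the formatting step described above, so that all namenodes (and therefore all datanodes) end up under one cluster ID; the ID string is an arbitrary example:
# on the namenodes of the first nameservice (e.g. GS-CIX-SEV0001/0002):
hdfs namenode -format -clusterID my-federated-cluster
# on the namenodes of the second nameservice (e.g. GS-CIX-SEV0003/0004), reuse the exact same ID:
hdfs namenode -format -clusterID my-federated-cluster
# datanodes that then register with either nameservice join the same cluster instead of splitting in half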
Decommission datanode - no response
Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: Decommission datanode - no response
Thanks Devaraj, There are no related logs in the NN log or the DN log. On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k devara...@huawei.com wrote: Do you see any log related to this in the Name Node logs when you issue the refreshNodes dfsadmin command? Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:12 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: reply: Decommission datanode - no response
It's been 20 minutes since I ran -refreshNodes, but there are no decommissioning nodes shown on the UI, and I cannot find any hints in the NN and DN logs. On Fri, Jul 5, 2013 at 11:16 AM, Francis.Hu francis...@reachjunction.com wrote: I know the default value is 10 minutes and 30 seconds for switching datanodes from live to dead. From: Azuryy Yu [mailto:azury...@gmail.com] Sent: Friday, July 05, 2013 10:42 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: Decommission datanode - no response
I added dfs.hosts.exclude before the NN started, and I updated /usr/local/hadoop/conf/dfs_exclude with new hosts, but it doesn't decommission. On Fri, Jul 5, 2013 at 11:39 AM, Devaraj k devara...@huawei.com wrote: When did you add this configuration in the NN conf? <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> If you have added this configuration after starting the NN, it won't take effect and you need to restart the NN. If you have added this config with the exclude file before the NN start, you can update the file with new hosts and issue the refreshNodes command, and then the newly added DNs will be decommissioned. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:48 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response Thanks Devaraj, There are no related logs in the NN log or the DN log. On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k devara...@huawei.com wrote: Do you see any log related to this in the Name Node logs when you issue the refreshNodes dfsadmin command? Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:12 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: Decommission datanode - no response
The client has no connection problem. On Fri, Jul 5, 2013 at 12:46 PM, Devaraj k devara...@huawei.com wrote: And also could you check whether the client is connecting to the NameNode, or whether there is any failure in connecting to the NN. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 09:15 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response I added dfs.hosts.exclude before the NN started, and I updated /usr/local/hadoop/conf/dfs_exclude with new hosts, but it doesn't decommission. On Fri, Jul 5, 2013 at 11:39 AM, Devaraj k devara...@huawei.com wrote: When did you add this configuration in the NN conf? <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> If you have added this configuration after starting the NN, it won't take effect and you need to restart the NN. If you have added this config with the exclude file before the NN start, you can update the file with new hosts and issue the refreshNodes command, and then the newly added DNs will be decommissioned. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:48 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response Thanks Devaraj, There are no related logs in the NN log or the DN log. On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k devara...@huawei.com wrote: Do you see any log related to this in the Name Node logs when you issue the refreshNodes dfsadmin command? Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:12 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
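Pulling the advice in this thread together, a hedged sketch of the usual decommission flow (the exclude-file path matches the thread; the hostname is a placeholder):
# dfs.hosts.exclude must already be in hdfs-site.xml when the NN starts (see Devaraj's note above)
echo "dn-host-01" >> /usr/local/hadoop/conf/dfs_exclude
hdfs dfsadmin -refreshNodes
# the affected nodes should then report "Decommission In Progress" on the NN web UI, or in:
hdfs dfsadmin -report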
Re: Could not get additional block while writing hundreds of files
Hi Manuel, 2013-07-03 15:03:16,427 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 3 2013-07-03 15:03:16,427 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 This indicates you haven't enough space on the HDFS. can you check the cluster capacity used? On Thu, Jul 4, 2013 at 12:14 AM, Manuel de Ferran manuel.defer...@gmail.com wrote: Greetings all, we try to import data to an HDFS cluster, but we face random Exception. We try to figure out what is the root cause: misconfiguration, too much load, ... and how to solve that. The client writes hundred of files with a replication factor of 3. It crashes sometimes at the beginning, sometimes close to the end, and in rare case it succeeds. On failure, we have on client side: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) which seems to be well known. We have followed the hints from the Troubleshooting page, but we're still stuck: lots of disk available on datanodes, free inodes, far below the open files limit , all datanodes are up and running. Note that we have other HDFS clients that are still able to write files while import is running. Here is the corresponding extract of the namenode log file: 2013-07-03 15:03:15,951 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 46009 Total time for transactions(ms): 153Number of transactions batched in Syncs: 5428 Number of syncs: 32889 SyncTimes(ms): 139555 2013-07-03 15:03:16,427 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 3 2013-07-03 15:03:16,427 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 2013-07-03 15:03:16,427 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9002, call addBlock(/log/1372863795616, DFSClient_1875494617, null) from 192.168.1.141:41376: error: java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) During the process, fsck reports about 300 of open files. The cluster is running hadoop-1.0.3. Any advice about the configuration ? 
We tried lowering dfs.heartbeat.interval and we raised dfs.datanode.max.xcievers to 4k; maybe raise dfs.datanode.handler.count? Thanks for your help
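A quick, hedged way to check what the first reply asks about (overall capacity and per-datanode free space) on a 1.0.x cluster like this one:
# cluster-wide and per-node capacity, DFS used and non-DFS used
hadoop dfsadmin -report
# the files still open for write that fsck reported (~300 during the import)
hadoop fsck / -openforwrite
If the datanodes really do have free space, the remaining suspects raised in the thread are replica placement and the xciever/handler limits rather than raw capacity.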
Re: data loss after cluster wide power loss
Hi Uma, I think there is minimum performance degration if set dfs.datanode.synconclose to true. On Tue, Jul 2, 2013 at 3:31 PM, Uma Maheswara Rao G mahesw...@huawei.comwrote: Hi Dave, Looks like your analysis is correct. I have faced similar issue some time back. See the discussion link: http://markmail.org/message/ruev3aa4x5zh2l4w#query:+page:1+mid:33gcdcu3coodkks3+state:results On sudden restarts, it can lost the OS filesystem edits. Similar thing happened in our case, i.e, after restart blocks were moved back to BeingWritten directory even though they were finalized. After restart they were marked as corrupt. You could set dfs.datanode.synconclose to true to avoid this sort of things, but that will degrade performance. Regards, Uma -Original Message- From: ddlat...@gmail.com [mailto:ddlat...@gmail.com] On Behalf Of Dave Latham Sent: 01 July 2013 16:08 To: hdfs-u...@hadoop.apache.org Cc: hdfs-...@hadoop.apache.org Subject: Re: data loss after cluster wide power loss Much appreciated, Suresh. Let me know if I can provide any more information or if you'd like me to open a JIRA. Dave On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com wrote: Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are right; the rename operation itself might not have hit the disk. I think we should either ensure metadata operation is synced on the datanode or handle it being reported as blockBeingWritten. Let me spend sometime to debug this issue. One surprising thing is, all the replicas were reported as blockBeingWritten. Regards, Suresh On Mon, Jul 1, 2013 at 6:03 PM, Dave Latham lat...@davelink.net wrote: (Removing hbase list and adding hdfs-dev list as this is pretty internal stuff). Reading through the code a bit: FSDataOutputStream.close calls DFSOutputStream.close calls DFSOutputStream.closeInternal - sets currentPacket.lastPacketInBlock = true - then calls DFSOutputStream.flushInternal - enqueues current packet - waits for ack BlockReceiver.run - if (lastPacketInBlock !receiver.finalized) calls FSDataset.finalizeBlock calls FSDataset.finalizeBlockInternal calls FSVolume.addBlock calls FSDir.addBlock calls FSDir.addBlock - renames block from blocksBeingWritten tmp dir to current dest dir This looks to me as I would expect a synchronous chain from a DFS client to moving the file from blocksBeingWritten to the current dir so that once the file is closed that it the block files would be in the proper directory - even if the contents of the file are still in the OS buffer rather than synced to disk. It's only after this moving of blocks that NameNode.complete file is called. There are several conditions and loops in there that I'm not certain this chain is fully reliable in all cases without a greater understanding of the code. Could it be the case that the rename operation itself is not synced and that ext3 lost the fact that the block files were moved? Or is there a bug in the close file logic that for some reason the block files are not always moved into place when a file is closed? Thanks for your patience, Dave On Mon, Jul 1, 2013 at 3:35 PM, Dave Latham lat...@davelink.net wrote: Thanks for the response, Suresh. I'm not sure that I understand the details properly. From my reading of HDFS-744 the hsync API would allow a client to make sure that at any point in time it's writes so far hit the disk. 
For example, for HBase it could apply a fsync after adding some edits to its WAL to ensure those edits are fully durable for a file which is still open. However, in this case the dfs file was closed and even renamed. Is it the case that even after a dfs file is closed and renamed that the data blocks would still not be synced and would still be stored by the datanode in blocksBeingWritten rather than in current? If that is case, would it be better for the NameNode not to reject replicas that are in blocksBeingWritten, especially if it doesn't have any other replicas available? Dave On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas sur...@hortonworks.comwrote: Yes this is a known issue. The HDFS part of this was addressed in https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not available in 1.x release. I think HBase does not use this API yet. On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham lat...@davelink.net wrote: We're running HBase over HDFS 1.0.2 on about 1000 nodes. On Saturday the data center we were in had a total power failure and the cluster went down hard. When we brought it back up, HDFS reported 4 files as CORRUPT. We recovered the data in question
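A hedged config sketch of the setting Uma mentions earlier in this thread (sync block files to disk when they are closed, trading some write performance for durability on power loss):
<!-- hdfs-site.xml on the datanodes -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>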
What's Yarn?
Hi all, I just found this by chance; maybe you all know it already, but I'll share it here again. Yet Another Resource Negotiator (YARN) from: http://adtmag.com/blogs/watersworks/2012/08/apache-yarn-promotion.aspx
Re: reply: a question about dfs.replication
It's not an HDFS issue. dfs.replication is a client-side configuration, not server-side, so you need to set it to '2' on your client side (where your application runs), and THEN execute a command such as hdfs dfs -put, or call the HDFS API from your Java application. On Tue, Jul 2, 2013 at 12:25 PM, Francis.Hu francis...@reachjunction.com wrote: Thanks all of you, I just got the problem fixed through the command: hdfs dfs -setrep -R -w 2 / Is that an issue of HDFS? Why do I need to manually execute a command to tell Hadoop the replication factor even though it is set in hdfs-site.xml? Thanks, Francis.Hu From: Francis.Hu [mailto:francis...@reachjunction.com] Sent: Tuesday, July 02, 2013 11:30 To: user@hadoop.apache.org Subject: Re: a question about dfs.replication Yes, it returns 2 correctly after hdfs getconf -confkey dfs.replication, but in the web page it is 3, as below. From: yypvsxf19870706 [mailto:yypvsxf19870...@gmail.com] Sent: Monday, July 01, 2013 23:24 To: user@hadoop.apache.org Subject: Re: a question about dfs.replication Hi, Could you please get the property value by using: hdfs getconf -confkey dfs.replication. iPhone On 2013-7-1, 15:51, Francis.Hu francis...@reachjunction.com wrote: Actually, my Java client is running with the same configuration as Hadoop's. dfs.replication is already set to 2 in my Hadoop configuration, so I think dfs.replication is already overridden by my configuration in hdfs-site.xml, but it seems it doesn't work even though I evidently overrode the parameter. From: emelya...@post.km.ru [mailto:emelya...@post.km.ru] Sent: Monday, July 01, 2013 15:18 To: user@hadoop.apache.org Subject: Re: a question about dfs.replication On 01.07.2013 10:19, Francis.Hu wrote: Hi, All I am installing a cluster with Hadoop 2.0.5-alpha. I have one namenode and two datanodes. dfs.replication is set to 2 in hdfs-site.xml. After all configuration work was done, I started all nodes. Then I saved a file into HDFS through the Java client. Now I can access the HDFS web page x.x.x.x:50070 and also see the file already listed in the HDFS listing. My question is: the replication column in the HDFS web page shows 3, not 2. Does anyone know what the problem is? ---Actual setting of hdfs-site.xml: <property> <name>dfs.replication</name> <value>2</value> </property> After that, I typed a dfsadmin command to check the file: hdfs fsck /test3/ The result of the above command: /test3/hello005.txt: Under replicated BP-609310498-192.168.219.129-1372323727200:blk_-1069303317294683372_1006. Target Replicas is 3 but found 2 replica(s). Status: HEALTHY Total size: 35 B Total dirs: 1 Total files: 1 Total blocks (validated): 1 (avg. block size 35 B) Minimally replicated blocks: 1 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 1 (100.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 2.0 Corrupt blocks: 0 Missing replicas: 1 (33.32 %) Number of data-nodes: 3 Number of racks: 1 FSCK ended at Sat Jun 29 16:51:37 CST 2013 in 6 milliseconds Thanks, Francis Hu If I'm not mistaken, the dfs.replication parameter in the config sets only the default replication factor, which can be overridden when putting a file to HDFS.
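A small sketch of the client-side override described in the first reply (it applies to files written after the change; files already created at 3 still need setrep, as Francis found); the file paths are placeholders:
# set replication at write time from the shell client
hadoop fs -D dfs.replication=2 -put local.txt /test3/local.txt
# or fix up files that were already created with the default of 3
hdfs dfs -setrep -w 2 /test3/hello005.txt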
Re: java.lang.UnsatisfiedLinkError - Unable to load libGfarmFSNative library
From the log: libGfarmFSNative.so: libgfarm.so.1: cannot open shared object file: No such file or directory I don't think you put libgfarm.* under $HADOOP_HOME/lib/native/Linux-amd64-64 (Linux-i386-32 if running on 32 bits OS) on all nodes. On Thu, Jun 27, 2013 at 10:44 AM, Harsh J ha...@cloudera.com wrote: Is libgfarm.so.1 installed and available on all systems? You're facing a link error though hadoop did try to load the library it had ( libGfarmFSNative.so). If the gfarm guys have a mailing list, thats probably the best place to ask. On Thu, Jun 27, 2013 at 1:06 AM, Marília Melo mariliam...@gmail.comwrote: Hi all, I'm trying to install a plugin called gfarm_hadoop that allows me to use a filesystem called gfarm instead of HDFS ( https://sourceforge.net/projects/gfarm/files/gfarm_hadoop/). I have used it before, but now I'm trying to install it in a new cluster and for some reason it isn't working... After installing gfarm 2.5.8 at /data/local3/marilia/gfarm, hadoop 1.1.2 at /data/local3/marilia/hadoop-1.1.2 and the plugin, when I try to list the new filesystem it works fine: $ bin/hadoop fs -ls gfarm:/// Found 26 items -rwxrwxrwx 1101 2013-06-26 02:36 /foo drwxrwxrwx - 0 2013-06-26 02:43 /home But then when I try to run an example, the task eventually completes, but I get Unable to load libGfarmFSNative library errors. Looking at the logs message it seems to be a path problem, but I have tried almost everything and it doesn't work. The way I'm setting the path now is writing on conf/hadoop-env.sh the following line: export LD_LIBRARY_PATH=/data/local3/marilia/gfarm/lib I have even moved all the .so files to the hadoop directory, but I still get the same message... Any ideas? Thanks in advance. Log: $ bin/hadoop jar hadoop-examples-*.jar teragen 1000 gfarm:///inoa11 Generating 1000 using 2 maps with step of 500 13/06/27 03:57:32 INFO mapred.JobClient: Running job: job_201306270356_0001 13/06/27 03:57:33 INFO mapred.JobClient: map 0% reduce 0% 13/06/27 03:57:38 INFO mapred.JobClient: map 50% reduce 0% 13/06/27 03:57:43 INFO mapred.JobClient: Task Id : attempt_201306270356_0001_m_01_0, Status : FAILED java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1. 
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) attempt_201306270356_0001_m_01_0: java.lang.UnsatisfiedLinkError: /data/local3/marilia/hadoop-1.1.2/lib/native/Linux-amd64-64/libGfarmFSNative.so: libgfarm.so.1: cannot open shared object file: No such file or directory attempt_201306270356_0001_m_01_0: at java.lang.ClassLoader$NativeLibrary.load(Native Method) attempt_201306270356_0001_m_01_0: at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1803) attempt_201306270356_0001_m_01_0: at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1728) attempt_201306270356_0001_m_01_0: at java.lang.Runtime.loadLibrary0(Runtime.java:823) attempt_201306270356_0001_m_01_0: at java.lang.System.loadLibrary(System.java:1028) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.gfarmfs.GfarmFSNative.clinit(GfarmFSNative.java:9) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.gfarmfs.GfarmFileSystem.initialize(GfarmFileSystem.java:34) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1411) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1429) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.mapred.FileOutputCommitter.getTempTaskOutputPath(FileOutputCommitter.java:234) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.mapred.Task.initialize(Task.java:522) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:255) attempt_201306270356_0001_m_01_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201306270356_0001_m_01_0: at javax.security.auth.Subject.doAs(Subject.java:396) attempt_201306270356_0001_m_01_0: at
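A hedged sketch of the fix suggested in the first reply, with the paths taken from this thread (adjust per node and architecture):
# libGfarmFSNative.so links against libgfarm.so.1, so that library must be
# loadable on every node, e.g. next to the plugin under the native lib dir:
cp /data/local3/marilia/gfarm/lib/libgfarm.so* \
   /data/local3/marilia/hadoop-1.1.2/lib/native/Linux-amd64-64/
# repeat (or rsync) on all nodes, then restart the TaskTrackers so child tasks see it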
Re: reply: Help about build cluster on boxes which already has one?
There is no MN; NM is the NodeManager. --Sent from my Sony mobile. On Jun 26, 2013 6:31 AM, yuhe justlo...@gmail.com wrote: I plan to use CDH3u4, and what is MN? -- Sent via 语盒 @ 2013-06-25 22:36 http://www.yuchs.com -- Original message -- user@hadoop.apache.org @ 2013-06-25 15:12 What version of Hadoop are you planning on using? You will probably have to partition the resources too, e.g. if you are using 0.23 / 2.0, the NM's available resource memory will have to be split across all the nodes. From: Sandeep L sandeepvre...@outlook.com To: user@hadoop.apache.org Sent: Tuesday, June 25, 2013 3:53 AM Subject: RE: reply: Help about build cluster on boxes which already has one? Just try changing the ports; then if you get any errors, reply and I can help you. Date: Tue, 25 Jun 2013 10:57:06 + From: justlo...@gmail.com To: user@hadoop.apache.org Subject: reply: Help about build cluster on boxes which already has one? Thanks, anything else? -- Sent via 语盒 @ 2013-06-25 10:57 http://www.yuchs.com -- Original message -- user@hadoop.apache.org @ 2013-06-25 10:50 Port numbers should not conflict. You need to change the port numbers in all configuration files (hbase-*.xml, hdfs-*.xml, core-*.xml, mapred-*.xml). Date: Tue, 25 Jun 2013 10:26:47 + From: justlo...@gmail.com To: user@hadoop.apache.org Subject: Help about build cluster on boxes which already has one? Today I got a task to build a Hadoop and HBase cluster on boxes which already have one Hadoop cluster on them (using the default ports). What things do I need to change so I can deploy the new one successfully? I am a newbie to Hadoop and HBase, and have successfully built Hadoop and HBase on VMware. Thanks all -- Sent via 语盒 @ 2013-06-25 10:26 http://www.yuchs.com
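A hedged, partial list of the port-bearing properties the advice above refers to for a CDH3/1.x-era stack; the values shown are only the usual defaults that the second cluster would have to move away from:
fs.default.name                      hdfs://host:8020     (core-site.xml)
dfs.http.address                     0.0.0.0:50070        (hdfs-site.xml)
dfs.datanode.address                 0.0.0.0:50010
dfs.datanode.http.address            0.0.0.0:50075
dfs.datanode.ipc.address             0.0.0.0:50020
mapred.job.tracker                   host:8021            (mapred-site.xml)
mapred.job.tracker.http.address      0.0.0.0:50030
mapred.task.tracker.http.address     0.0.0.0:50060
hbase.master.info.port               60010                (hbase-site.xml)
hbase.regionserver.port              60020
hbase.zookeeper.property.clientPort  2181
Each of these (plus the data/log directories and hadoop.tmp.dir) needs a distinct value in the second cluster's config.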
Re: MapReduce job not running - I think I have all the correct configuration.
Can you paste some error logs here? You can find them on the JT or TT. And tell us the Hadoop version. On Sun, Jun 23, 2013 at 9:20 PM, Pavan Kumar Polineni smartsunny...@gmail.com wrote: Hi all, first I had a machine with all the daemons running on it. After that I added two data nodes. In this case the MR job worked fine. Now I changed the first machine to just a namenode by stopping all the daemons except the NN daemon, and changed one data node to (SNN, JT, DN, TT), and all are working. I kept the other data node as it was. I changed the configurations to link up the NN and JT. From here, when I tried to run an MR job it is not running. Please help me. Thanks -- Pavan Kumar Polineni
Re: which Hadoop version can I choose in a production env?
I advise the community version of Hadoop 1.1.2, which is a stable release; Hadoop 2 has no stable release yet, even though the alpha releases have been extensively tested. That said, I think HDFS2 is stable now (no?), and MR1 is also stable, but YARN still needs extensive testing (at least I think so), so our prod cluster is running HDFS2 with MR1 now. On Tue, Jun 25, 2013 at 10:11 AM, ch huang justlo...@gmail.com wrote: hadoop 1 vs hadoop 2, and apache community version vs Cloudera version, anyone can help?
Re: Inputformat
You would have to write a JSONInputFormat, or google first to find one. --Send from my Sony mobile. On Jun 23, 2013 7:06 AM, jamal sasha jamalsha...@gmail.com wrote: Then how should I approach this issue? On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes ni...@basjes.nl wrote: If you try to hammer in a nail (json file) with a screwdriver (XMLInputReader) then perhaps the reason it won't work may be that you are using the wrong tool? On Jun 21, 2013 11:38 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am using one of the libraries which rely on InputFormat. Right now, it is reading xml files spanning across multiple lines. So currently the input format is like: public class XMLInputReader extends FileInputFormat<LongWritable, Text> { public static final String START_TAG = "<page>"; public static final String END_TAG = "</page>"; @Override public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf conf, Reporter reporter) throws IOException { conf.set(XMLInputFormat.START_TAG_KEY, START_TAG); conf.set(XMLInputFormat.END_TAG_KEY, END_TAG); return new XMLRecordReader((FileSplit) split, conf); } } So, in the above, if the data is like: <page> something \n something \n </page> it processes this sort of data. Now, I want to use the same framework but for json files spanning just a single line. So I guess my START_TAG can be "{". Will my END_TAG be "}\n"? It can't be "}" as there can be nested json in this data. Any clues? Thanks
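If every JSON record fits on a single line, one workable approach is to skip the XML-style record reader entirely: keep TextInputFormat and parse each line in the mapper. A minimal sketch, assuming a JSON parser such as Jackson is on the job's classpath (the "id" field name is made up for the example):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonLineMapper extends Mapper<LongWritable, Text, Text, Text> {
      private final ObjectMapper jsonParser = new ObjectMapper();

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // each input line is assumed to hold one complete JSON object, nested or not
        JsonNode record = jsonParser.readTree(line.toString());
        if (record.has("id")) {
          context.write(new Text(record.get("id").asText()), line);
        }
      }
    }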
Re: How to fail the Name Node or how to crash the Name Node for testing purposes.
$HADOOP_HOME/bin/hadoop-daemon.sh stop namenode On Wed, Jun 19, 2013 at 2:38 PM, Pavan Kumar Polineni smartsunny...@gmail.com wrote: For testing Name Node crashes and failures, for the single point of failure. -- Pavan Kumar Polineni
Re: how to close hadoop when tmp files were cleared
ps aux|grep java, you can find the pid, then just 'kill -9' to stop the Hadoop process. On Mon, Jun 17, 2013 at 4:34 PM, Harsh J ha...@cloudera.com wrote: Just send the processes a SIGTERM signal (regular kill). It's what the script does anyway. Ensure to change the PID directory before the next restart though. On Mon, Jun 17, 2013 at 1:09 PM, zhang.hen...@zte.com.cn wrote: Hi, My hadoop cluster has been running for a period of time. Now I want to close it for some system changes. But the command bin/stop-all.sh shows no jobtracker to stop, no tasktracker to stop, no namenode to stop and no datanode to stop. I use jps and get nothing but jps itself. However, hadoop is indeed running. I think some tmp files for hadoop may have been cleared by the operating system. Could someone tell me how to stop hadoop without breaking any data files? Any guidance would be greatly appreciated. Thanks! Jeff -- Harsh J
Re: Error in command: bin/hadoop fs -put conf input
from the log, there is no room on the HDFS. --Send from my Sony mobile. On Jun 16, 2013 5:12 AM, sumit piparsania sumitpiparsa...@yahoo.com wrote: Hi, I am getting the below error while executing the command. Kindly assist me in resolving this issue. $ bin/hadoop fs -put conf input bin/hadoop: line 320: C:\Program: command not found 13/06/16 02:29:13 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 13/06/16 02:29:18 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/Sumit/input/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) at org.apache.hadoop.ipc.Client.call(Client.java:1107) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at $Proxy1.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) at $Proxy1.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989) 13/06/16 02:29:18 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null 13/06/16 02:29:18 WARN hdfs.DFSClient: Could not get block locations. Source file /user/Sumit/input/capacity-scheduler.xml - Aborting... 
put: java.io.IOException: File /user/Sumit/input/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1 13/06/16 02:29:18 ERROR hdfs.DFSClient: Failed to close file /user/Sumit/input/capacity-scheduler.xml org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/Sumit/input/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) at org.apache.hadoop.ipc.Client.call(Client.java:1107) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at $Proxy1.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) at $Proxy1.addBlock(Unknown Source) at
Re: read lucene index in mapper
You need to add the Lucene index tar.gz to the distributed cache as an archive, then create the index reader in the mapper's setup. --Send from my Sony mobile. On Jun 12, 2013 12:50 AM, parnab kumar parnab.2...@gmail.com wrote: Hi, I need to read an existing Lucene index in a map. Can someone point me in the right direction? Thanks, Parnab
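A rough sketch of that pattern, assuming Lucene 3.x-style calls and a hypothetical archive path; the archive is registered in the driver and then opened from local disk in the mapper's setup:

    // driver (before submitting the job): the index tarball must already be in HDFS
    //   DistributedCache.addCacheArchive(new URI("/user/me/lucene-index.tar.gz"), job.getConfiguration());

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;

    public class IndexLookupMapper extends Mapper<LongWritable, Text, Text, Text> {
      private IndexReader reader;

      @Override
      protected void setup(Context context) throws IOException {
        // the cached archive is unpacked onto each node's local disk by the framework
        Path[] archives = DistributedCache.getLocalCacheArchives(context.getConfiguration());
        reader = IndexReader.open(FSDirectory.open(new File(archives[0].toString())));
      }

      @Override
      protected void cleanup(Context context) throws IOException {
        if (reader != null) {
          reader.close();
        }
      }
    }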
Re: hadoop 2.0 client configuration
If you want to work with HA, yes, all these configurations are needed. --Send from my Sony mobile. On Jun 11, 2013 8:05 AM, Praveen M lefthandma...@gmail.com wrote: Hello, I'm a hadoop n00b, and I recently upgraded from hadoop 0.20.2 to hadoop 2 (cdh-4.2.1). For a client configuration to connect to the hadoop cluster, in the earlier 0.20.2 case I had to specify only the fs.default.name and mapred.job.tracker config parameters. But now, with the HA configuration of hadoop 2, I have a lot of parameters in my client configuration: fs.defaultFS dfs.nameservices dfs.ha.namenodes.nameservice-id dfs.namenode.rpc-address.nameservice-id.namenode-1 dfs.namenode.rpc-address.nameservice-id.namenode-2 dfs.namenode.shared.edits.dir ha.zookeeper.quorum My question is, do I really need all these configurations in the client? I'm looking for the minimal client configuration that would let me work with the hadoop cluster in HA mode. Thanks in advance. Thank you, Praveen
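For a plain HDFS client, the nameservice mapping plus a failover proxy provider is usually what matters; dfs.namenode.shared.edits.dir and ha.zookeeper.quorum are used by the NameNodes and failover controllers rather than by clients. A minimal sketch with placeholder names (in practice these would sit in the client's core-site.xml/hdfs-site.xml rather than be set in code):

    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://mycluster");
    conf.set("dfs.nameservices", "mycluster");
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    FileSystem fs = FileSystem.get(conf);   // the proxy provider finds whichever NameNode is active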
Re: HadoopV2 and HDFS-fuse
Hi Harsh, yes, I've built the native profile with -Pnative successfully. I also used -Drequire.fuse=true, but I just found the contrib/fuse directory is empty, so I asked this question. Thanks Harsh. --Send from my Sony mobile. On Jun 9, 2013 9:09 PM, Harsh J ha...@cloudera.com wrote: Hi Azuryy, Are you not finding it compiled with the global native compile option? Do you face a specific error? Per the pom.xml of hadoop-hdfs, it will build fuse-dfs if the native profile is turned on, and you can assert the fuse requirement with -Drequire.fuse=true. On Sun, Jun 9, 2013 at 11:03 AM, Azuryy Yu azury...@gmail.com wrote: hi, Can anybody tell me how to compile hdfs-fuse based on Hadoop-2.0-*? Thanks. -- Harsh J
HadoopV2 and HDFS-fuse
hi, Can anybody tell me how to compile hdfs-fuse based on Hadoop-2.0-*? Thanks.
Re: how to locate the replicas of a file in HDFS?
ClientProtocol namenode = DFSClient.createNamenode(conf); HdfsFileStatus hfs = namenode.getFileInfo(your_hdfs_file_name); LocatedBlocks lbs = namenode.getBlockLocations(your_hdfs_file_name, 0, hfs.getLen()); for (LocatedBlock lb : lbs.getLocatedBlocks()) { DatanodeInfo[] info = lb.getLocations() ; //you can get data node name or address here. } On Tue, Jun 4, 2013 at 2:02 PM, Mahmood Naderan nt_mahm...@yahoo.comwrote: hadoop fsck mytext.txt -files -locations -blocks I expect something like a tag which is attached to each block (say block X) that shows the position of the replicated block of X. The method you mentioned is a user level task. Am I right? Regards, Mahmood* * -- *From:* Rahul Bhattacharjee rahul.rec@gmail.com *To:* user@hadoop.apache.org user@hadoop.apache.org; 一凡 李 zhuazhua_...@yahoo.com.cn *Sent:* Tuesday, June 4, 2013 9:34 AM *Subject:* Re: how to locate the replicas of a file in HDFS? hadoop fsck mytext.txt -files -locations -blocks Thanks, Rahul On Tue, Jun 4, 2013 at 10:19 AM, 一凡 李 zhuazhua_...@yahoo.com.cn wrote: Hi, Could you tell me how to locate where store each replica of a file in HDFS? Correctly speaking, if I create a file in HDFS(replicate factor:3),how to find the DataNodes which store its each block and replicas? Best Wishes, Yifan
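The same information is also reachable through the public FileSystem API, which avoids the internal DFSClient/ClientProtocol classes; a minimal sketch (hypothetical file name, run from any method that can throw IOException):

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(new Path("/user/me/mytext.txt"));
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      // one entry per block; getHosts() lists the datanodes holding that block's replicas
      System.out.println(block.getOffset() + " " + java.util.Arrays.toString(block.getHosts()));
    }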
Re:
can you upgrade to 1.1.2, which is also a stable release, and fixed the bug you facing now. --Send from my Sony mobile. On Jun 2, 2013 3:23 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Thanks Harsh for the reply. I was confused too that why security is causing this. Regards, Shahab On Sat, Jun 1, 2013 at 12:43 PM, Harsh J ha...@cloudera.com wrote: Shahab - I see he has mentioned generally that security is enabled (but not that it happens iff security is enabled), and the issue here doesn't have anything to do with security really. Azurry - Lets discuss the code issues on the JIRA (instead of here) or on the mapreduce-dev lists. On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com wrote: HI Harsh, Quick question though: why do you think it only happens if the OP 'uses security' as he mentioned? Regards, Shahab On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2-r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. 
Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,930 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_02 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,931 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_03 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,933 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_04 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,934 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_05 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,939 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_06 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,950 INFO org.apache.hadoop.mapred.JobInProgress: job_201303231139_0001 LOCALITY_WAIT_FACTOR=0.5 2013-03-23 12:06:48,978 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201303231139_0001 initialized successfully with 7 map tasks and 1 reduce tasks. 2013-03-23 12:06:50,855 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201303231139_0001_m_08_0' to tip task_201303231139_0001_m_08, for tracker
Re:
yes. hadoop-1.1.2 was released on Jan. 31st. just download it. On Tue, Jun 4, 2013 at 6:33 AM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi Azuryy, thanks for the update. Sorry for the silly question, but where can I download the patched version? If I look into the closest mirror (i.e. http://mirror.netcologne.de/apache.org/hadoop/common/), I can see that the Hadoop 1.1.2 version was last updated on Jan. 31st. Thanks in advance, Matteo PS: just to confirm that I tried a minimal Hadoop 1.2.0 setup, so without any security, and the problem is there. On Jun 3, 2013, at 3:02 PM, Azuryy Yu azury...@gmail.com wrote: can you upgrade to 1.1.2, which is also a stable release, and fixed the bug you facing now. --Send from my Sony mobile. On Jun 2, 2013 3:23 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Thanks Harsh for the reply. I was confused too that why security is causing this. Regards, Shahab On Sat, Jun 1, 2013 at 12:43 PM, Harsh J ha...@cloudera.com wrote: Shahab - I see he has mentioned generally that security is enabled (but not that it happens iff security is enabled), and the issue here doesn't have anything to do with security really. Azurry - Lets discuss the code issues on the JIRA (instead of here) or on the mapreduce-dev lists. On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com wrote: HI Harsh, Quick question though: why do you think it only happens if the OP 'uses security' as he mentioned? Regards, Shahab On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2-r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. 
On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,930 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_02 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,931 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_03 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,933 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_04 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,934 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_05 has
Re:
Hi Harsh, I need to take care my eyes recently, I mis-read 1.2.0 to 1.0.2, so I said upgrade. Sorry. On Tue, Jun 4, 2013 at 9:46 AM, Harsh J ha...@cloudera.com wrote: Azuryy, 1.1.2 1.2.0. Its not an upgrade you're suggesting there. If you feel there's been a regression, can you comment that on the JIRA? On Tue, Jun 4, 2013 at 6:57 AM, Azuryy Yu azury...@gmail.com wrote: yes. hadoop-1.1.2 was released on Jan. 31st. just download it. On Tue, Jun 4, 2013 at 6:33 AM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi Azuryy, thanks for the update. Sorry for the silly question, but where can I download the patched version? If I look into the closest mirror (i.e. http://mirror.netcologne.de/apache.org/hadoop/common/), I can see that the Hadoop 1.1.2 version was last updated on Jan. 31st. Thanks in advance, Matteo PS: just to confirm that I tried a minimal Hadoop 1.2.0 setup, so without any security, and the problem is there. On Jun 3, 2013, at 3:02 PM, Azuryy Yu azury...@gmail.com wrote: can you upgrade to 1.1.2, which is also a stable release, and fixed the bug you facing now. --Send from my Sony mobile. On Jun 2, 2013 3:23 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Thanks Harsh for the reply. I was confused too that why security is causing this. Regards, Shahab On Sat, Jun 1, 2013 at 12:43 PM, Harsh J ha...@cloudera.com wrote: Shahab - I see he has mentioned generally that security is enabled (but not that it happens iff security is enabled), and the issue here doesn't have anything to do with security really. Azurry - Lets discuss the code issues on the JIRA (instead of here) or on the mapreduce-dev lists. On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com wrote: HI Harsh, Quick question though: why do you think it only happens if the OP 'uses security' as he mentioned? Regards, Shahab On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? 
How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12
Re:
This should be fixed in hadoop-1.1.2 stable release. if we determine completedMapsInputSize is zero, then job's map tasks MUST be zero, so the estimated output size is zero. below is the code: long getEstimatedMapOutputSize() { long estimate = 0L; if (job.desiredMaps() 0) { estimate = getEstimatedTotalMapOutputSize() / job.desiredMaps(); } return estimate; } On Sat, Jun 1, 2013 at 11:49 PM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. 
Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,930 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_02 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,931 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_03 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,933 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_04 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,934 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_05 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,939 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_06 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,950 INFO org.apache.hadoop.mapred.JobInProgress: job_201303231139_0001 LOCALITY_WAIT_FACTOR=0.5 2013-03-23 12:06:48,978 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201303231139_0001 initialized successfully with 7 map tasks and 1 reduce tasks. 2013-03-23 12:06:50,855 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201303231139_0001_m_08_0' to tip task_201303231139_0001_m_08, for tracker 'tracker_hadoop0.novalocal:hadoop0.novalocal/127.0.0.1:44879' 2013-03-23 12:08:00,340 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201303231139_0001_m_08_0' has completed task_201303231139_0001_m_08 successfully. 2013-03-23 12:08:00,538 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node hadoop0.novalocal has 8791543808 bytes free; but we expect map to take 1317624576693539401 2013-03-23 12:08:00,543 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node hadoop0.novalocal has 8791543808 bytes free; but we expect map to take 1317624576693539401 2013-03-23 12:08:00,544 WARN
Re:
just add more, continue the above thread: protected synchronized long getEstimatedTotalMapOutputSize() { if(completedMapsUpdates threshholdToUse) { return 0; } else { long inputSize = job.getInputLength() + job.desiredMaps(); //add desiredMaps() so that randomwriter case doesn't blow up //the multiplication might lead to overflow, casting it with //double prevents it long estimate = Math.round(((double)inputSize * completedMapsOutputSize * 2.0)/completedMapsInputSize); if (LOG.isDebugEnabled()) { LOG.debug(estimate total map output will be + estimate); } return estimate; } } On Sun, Jun 2, 2013 at 12:34 AM, Azuryy Yu azury...@gmail.com wrote: This should be fixed in hadoop-1.1.2 stable release. if we determine completedMapsInputSize is zero, then job's map tasks MUST be zero, so the estimated output size is zero. below is the code: long getEstimatedMapOutputSize() { long estimate = 0L; if (job.desiredMaps() 0) { estimate = getEstimatedTotalMapOutputSize() / job.desiredMaps(); } return estimate; } On Sat, Jun 1, 2013 at 11:49 PM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. 
Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,930 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_02 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,931 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_03 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,933 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_04 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,934 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_05 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,939 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_06 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,950 INFO org.apache.hadoop.mapred.JobInProgress: job_201303231139_0001 LOCALITY_WAIT_FACTOR=0.5 2013-03-23 12:06:48,978 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201303231139_0001 initialized successfully with 7 map tasks and 1 reduce tasks. 2013-03-23 12:06:50,855 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP
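Spelled out with the comparison operators and log-message quoting made explicit, the two estimator methods quoted above read roughly as follows (a reconstruction for readability, not a verbatim copy of the branch-1 source):

    long getEstimatedMapOutputSize() {
      long estimate = 0L;
      if (job.desiredMaps() > 0) {                        // "> 0" assumed; the operator was lost above
        estimate = getEstimatedTotalMapOutputSize() / job.desiredMaps();
      }
      return estimate;
    }

    protected synchronized long getEstimatedTotalMapOutputSize() {
      if (completedMapsUpdates < threshholdToUse) {       // "<" assumed; too few completed maps, no estimate yet
        return 0;
      } else {
        // add desiredMaps() so that the randomwriter case doesn't blow up;
        // the multiplication might overflow, casting to double prevents it
        long inputSize = job.getInputLength() + job.desiredMaps();
        long estimate = Math.round(((double) inputSize * completedMapsOutputSize * 2.0)
            / completedMapsInputSize);
        if (LOG.isDebugEnabled()) {
          LOG.debug("estimate total map output will be " + estimate);
        }
        return estimate;
      }
    }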
Re: Hint on EOFException's on datanodes
maybe network issue, datanode received an incomplete packet. --Send from my Sony mobile. On May 24, 2013 1:39 PM, Stephen Boesch java...@gmail.com wrote: On a smallish (10 node) cluster with only 2 mappers per node after a few minutes EOFExceptions are cropping up on the datanodes: an example is shown below. Any hint on what to tweak/change in hadoop / cluster settings to make this more happy? 2013-05-24 05:03:57,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@1b1accfc): writeBlock blk_7760450154173670997_48372 received exception java.io.EOFException: while trying to read 65557 bytes 2013-05-24 05:03:57,262 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder 0 for Block blk_-3990749197748165818_48331): PacketResponder 0 for block blk_-3990749197748165818_48331 terminating 2013-05-24 05:03:57,460 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@1b1accfc): DatanodeRegistration(10.254.40.79:9200, storageID=DS-1106090267-10.254.40.79-9200-1369343833886, infoPort=9102, ipcPort=9201):DataXceiver java.io.EOFException: while trying to read 65557 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:406) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112) at java.lang.Thread.run(Thread.java:662) 2013-05-24 05:03:57,261 INFO org.apache.hadoop.hdfs.server.datanode.Dat
Re: Keep Kerberos credentials valid after logging out
nohup ./your_bash 1temp.log 21 --Send from my Sony mobile. On May 21, 2013 6:32 PM, zheyi rong zheyi.r...@gmail.com wrote: Hi all, I would like to run my hadoop job in a bash file for several times, e.g. #!/usr/bin/env bash for i in {1..10} do my-hadoop-job done Since I don't want to keep my laptop on for hours, I run this bash script on a server via a SSH session. However, the bash script always terminated after my logging out of that server by 'ctrl-z, bg, disown, exit'. Using GNU 'screen' detaching and reattaching, I can see the following exceptions: Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:554) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278) at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:499) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:601) at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:212) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1292) at org.apache.hadoop.ipc.Client.call(Client.java:1121) ... 30 more Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194) at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:134) at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:415) at org.apache.hadoop.ipc.Client$Connection.access$1100(Client.java:212) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:594) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:591) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:590) ... 33 more Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:130) at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106) at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172) at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209) at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195) at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162) at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175) ... 42 more The cluster is deployed with cdh3. so how can I keep my script running after logging out ? Thank you in advance. Regards, Zheyi Rong
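The redirection in the nohup line above is presumably meant to be 1>temp.log 2>&1, i.e. detach the script and capture both stdout and stderr. Another way around the expiring credentials, if a keytab can be provisioned for the account and the jobs are submitted from Java code, is to log in from the keytab instead of relying on the interactive ticket cache; a rough sketch with a hypothetical principal and path, using org.apache.hadoop.security.UserGroupInformation:

    Configuration conf = new Configuration();   // hadoop.security.authentication=kerberos is assumed to be set
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab("zheyi@EXAMPLE.COM", "/home/zheyi/zheyi.keytab");
    // jobs submitted from this JVM now authenticate from the keytab, not the login session's TGT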
Re: Writing intermediate key,value pairs to file and read it again
You could look at the ChainReducer javadoc, which meets your requirement. --Send from my Sony mobile. On Apr 20, 2013 11:43 PM, Vikas Jadhav vikascjadha...@gmail.com wrote: Hello, Can anyone help me with the following issue: writing intermediate key,value pairs to a file and reading them again. Let us say I have to write each intermediate pair received at the reducer to a file, then read it back as key,value pairs and use it for further processing. I found IFile.java, which has a reader and a writer, but I am not able to understand how to use it; for example, I don't understand the Counter value as the last parameter (spilledRecordsCounter). Thanks. -- Regards, Vikas
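For completeness, a rough sketch of the chaining pattern with the old mapred API (the mapper/reducer class names are placeholders, and input/output paths and formats are omitted); the point is that the reducer's output is handed straight to the chained mapper inside the same reduce task, so nothing has to be written to a file and re-read in between:

    JobConf job = new JobConf(MyDriver.class);            // MyDriver is a placeholder driver class
    job.setJobName("chained-example");

    ChainMapper.addMapper(job, ParseMapper.class,
        LongWritable.class, Text.class, Text.class, IntWritable.class, true, new JobConf(false));

    ChainReducer.setReducer(job, SumReducer.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class, true, new JobConf(false));

    // this mapper consumes the reducer's output records, still inside the reduce task
    ChainReducer.addMapper(job, PostProcessMapper.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class, true, new JobConf(false));

    JobClient.runJob(job);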
when will hadoop-2.0 have a stable release?
I don't think this is easy to answer; maybe it's not decided yet. If so, can you tell me what important features are still being developed, or any other reasons? Appreciate it.
Re: Reading and Writing Sequencefile using Hadoop 2.0 Apis
You can use it even if it's deprecated. You can find this in org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.java: @Override public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException { FileSplit fileSplit = (FileSplit) split; conf = context.getConfiguration(); Path path = fileSplit.getPath(); FileSystem fs = path.getFileSystem(conf); this.in = new SequenceFile.Reader(fs, path, conf); this.end = fileSplit.getStart() + fileSplit.getLength(); if (fileSplit.getStart() > in.getPosition()) { in.sync(fileSplit.getStart()); // sync to start } this.start = in.getPosition(); more = start < end; } On Thu, Apr 18, 2013 at 6:44 AM, sumit ghosh sumi...@yahoo.com wrote: I am looking for an example which is using the new Hadoop 2.0 API to read and write Sequence Files. Effectively I need to know how to use these functions: createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts) The old definition is not working for me: SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass()); Similarly I need to know what the code for reading the Sequence file will be, as the following is deprecated: SequenceFile.Reader(fs, path, conf); Thanks, Sumit
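With the option-based calls the email asks about, reading and writing look roughly like this (hypothetical path and key/value types):

    Configuration conf = new Configuration();
    Path path = new Path("/tmp/example.seq");

    // writing with the option-based API
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(IntWritable.class),
        SequenceFile.Writer.valueClass(Text.class));
    try {
      writer.append(new IntWritable(1), new Text("one"));
    } finally {
      writer.close();
    }

    // reading with the option-based API
    SequenceFile.Reader reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
    try {
      IntWritable key = new IntWritable();
      Text value = new Text();
      while (reader.next(key, value)) {     // next() fills the reusable key/value objects
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }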
Re: Physically moving HDFS cluster to new
Changing the data nodes' names or IPs cannot cause data loss. As long as you keep the fsimage (under the NameNode's name directory) and all the block data on the data nodes, everything can be recovered when you start the cluster. On Thu, Apr 18, 2013 at 1:20 AM, Tom Brown tombrow...@gmail.com wrote: We have a situation where we want to physically move our small (4 node) cluster from one data center to another. As part of this move, each node will receive both a new FQDN and a new IP address. As I understand it, HDFS is somehow tied to the FQDN or IP address, and changing them causes data loss. Is there any supported method of moving a cluster this way? Thanks in advance! --Tom
Re: jobtracker not starting - access control exception - folder not owned by me (it claims)
I supposed you start-mapred by user mapred. then hadoop fs -chown -R mpared:mapred /home/jbu/hadoop_local_install/ hadoop-1.0.4/tmp/mapred/system this is caused by fairscheduler, please reach MAPREDUCE-4398https://issues.apache.org/jira/browse/MAPREDUCE-4398 On Mon, Apr 15, 2013 at 6:43 PM, Julian Bui julian...@gmail.com wrote: Hello hadoop users, I can't start my jobtracker and am getting an org.apache.hadoop.security.AccessControlException saying that my hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system is not owned by jbu (me, my user). However, I check the folder and it is indeed owned by me. Details follow. $ cd /home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/ $ ls -al drwxrwxr-x 6 jbu jbu 4096 Apr 15 03:30 local drwxrwxr-x 2 jbu jbu 4096 Apr 15 03:33 system Looking inside ./hadoop-jbu-jobtracker-jbu-laptop.log: org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system is not owned by jbu at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2379) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2192) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2186) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978) 2013-04-15 03:34:13,697 FATAL org.apache.hadoop.mapred.JobTracker: org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system is not owned by jbu at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2379) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2192) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2186) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978) So it's still having problems thinking that directory is not owned by me. The log also said: 2013-04-15 03:34:13,695 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system) because of permissions. 2013-04-15 03:34:13,695 WARN org.apache.hadoop.mapred.JobTracker: Manually delete the mapred.system.dir (hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system) and then start the JobTracker. So I deleted the system directory and restarted and the same problem appeared, that I didn't have ownership of the directory. Still won't start. I am using hadoop 1.0.4 on linux mint. Any ideas? Thanks, -Julian
Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
This is zookeeper issue. please paste zookeeper log here. thanks. On Tue, Apr 16, 2013 at 9:58 AM, dylan dwld0...@gmail.com wrote: It is hbase-0.94.2-cdh4.2.0. ** ** *发件人:* Ted Yu [mailto:yuzhih...@gmail.com] *发送时间:* 2013年4月16日 9:55 *收件人:* u...@hbase.apache.org *主题:* Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again ** ** I think this question would be more appropriate for HBase user mailing list. ** ** Moving hadoop user to bcc. ** ** Please tell us the HBase version you are using. ** ** Thanks On Mon, Apr 15, 2013 at 6:51 PM, dylan dwld0...@gmail.com wrote: Hi I am a newer for hadoop, and set up hadoop with tarball . I have 5 nodes for cluster, 2 NN nodes with QJM (3 Journal Nodes, one of them on DN node. ), 3 DN nodes with zookeepers, It works fine. When I reboot one data node machine which includes zookeeper, after that , restart all processes. The hadoop works fine, but hbase not. I cannot disable tables and drop tables. The logs an follows: The Hbase HMaster log: DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,6,1366001238313 ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining) The Hbase HRegionServer log: DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN The Hbase Web show: Region State 70236052-ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,6,1366001238313 How fix it? Thanks. ** **
Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
it located under hbase-home/logs/ if your zookeeper is managed by hbase. but I noticed you configured QJM, then did your QJM and Hbase share the same ZK cluster? if so, then just paste your QJM zk configuration in the hdfs-site.xml and hbase zk configuration in the hbase-site.xml. On Tue, Apr 16, 2013 at 10:37 AM, dylan dwld0...@gmail.com wrote: How to check zookeeper log?? It is the binary files, how to transform it to normal log? ** **I find the “ org.apache.zookeeper.server.LogFormatter”, how to run?** ** ** ** *发件人:* Azuryy Yu [mailto:azury...@gmail.com] *发送时间:* 2013年4月16日 10:01 *收件人:* user@hadoop.apache.org *主题:* Re: 答复: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again ** ** This is zookeeper issue. ** ** please paste zookeeper log here. thanks. ** ** On Tue, Apr 16, 2013 at 9:58 AM, dylan dwld0...@gmail.com wrote: It is hbase-0.94.2-cdh4.2.0. *发件人:* Ted Yu [mailto:yuzhih...@gmail.com] *发送时间:* 2013年4月16日 9:55 *收件人:* u...@hbase.apache.org *主题:* Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again I think this question would be more appropriate for HBase user mailing list. Moving hadoop user to bcc. Please tell us the HBase version you are using. Thanks On Mon, Apr 15, 2013 at 6:51 PM, dylan dwld0...@gmail.com wrote: Hi I am a newer for hadoop, and set up hadoop with tarball . I have 5 nodes for cluster, 2 NN nodes with QJM (3 Journal Nodes, one of them on DN node. ), 3 DN nodes with zookeepers, It works fine. When I reboot one data node machine which includes zookeeper, after that , restart all processes. The hadoop works fine, but hbase not. I cannot disable tables and drop tables. The logs an follows: The Hbase HMaster log: DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,6,1366001238313 ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining) The Hbase HRegionServer log: DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN The Hbase Web show: Region State 70236052-ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,6,1366001238313 How fix it? Thanks. ** **
Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
and paste ZK configuration in the zookeerp_home/conf/zoo.cfg On Tue, Apr 16, 2013 at 10:42 AM, Azuryy Yu azury...@gmail.com wrote: it located under hbase-home/logs/ if your zookeeper is managed by hbase. but I noticed you configured QJM, then did your QJM and Hbase share the same ZK cluster? if so, then just paste your QJM zk configuration in the hdfs-site.xml and hbase zk configuration in the hbase-site.xml. On Tue, Apr 16, 2013 at 10:37 AM, dylan dwld0...@gmail.com wrote: How to check zookeeper log?? It is the binary files, how to transform it to normal log? ** **I find the “ org.apache.zookeeper.server.LogFormatter”, how to run? ** ** ** ** *发件人:* Azuryy Yu [mailto:azury...@gmail.com] *发送时间:* 2013年4月16日 10:01 *收件人:* user@hadoop.apache.org *主题:* Re: 答复: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again ** ** This is zookeeper issue. ** ** please paste zookeeper log here. thanks. ** ** On Tue, Apr 16, 2013 at 9:58 AM, dylan dwld0...@gmail.com wrote: It is hbase-0.94.2-cdh4.2.0. *发件人:* Ted Yu [mailto:yuzhih...@gmail.com] *发送时间:* 2013年4月16日 9:55 *收件人:* u...@hbase.apache.org *主题:* Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again I think this question would be more appropriate for HBase user mailing list. Moving hadoop user to bcc. Please tell us the HBase version you are using. Thanks On Mon, Apr 15, 2013 at 6:51 PM, dylan dwld0...@gmail.com wrote: Hi I am a newer for hadoop, and set up hadoop with tarball . I have 5 nodes for cluster, 2 NN nodes with QJM (3 Journal Nodes, one of them on DN node. ), 3 DN nodes with zookeepers, It works fine. When I reboot one data node machine which includes zookeeper, after that , restart all processes. The hadoop works fine, but hbase not. I cannot disable tables and drop tables. The logs an follows: The Hbase HMaster log: DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,6,1366001238313 ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining) The Hbase HRegionServer log: DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN The Hbase Web show: Region State 70236052-ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,6,1366001238313 How fix it? Thanks. ** **
Re: Re: Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
then, can you find the zookeeper log under zookeeper_home/zookeeper.out?

On Tue, Apr 16, 2013 at 11:04 AM, dylan dwld0...@gmail.com wrote: I use the hbase shell. It always shows: ERROR: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 10:59 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

Is your ZooKeeper managed by HBase, or did you set export HBASE_MANAGES_ZK=false in hbase-env.sh? If not, then that is a ZooKeeper port conflict.

On Tue, Apr 16, 2013 at 10:55 AM, dylan dwld0...@gmail.com wrote: # The number of milliseconds of each tick tickTime=2000 # The number of ticks that the initial # synchronization phase can take initLimit=10 # The number of ticks that can pass between # sending a request and getting an acknowledgement syncLimit=5 # the directory where the snapshot is stored. # do not use /tmp for storage, /tmp here is just # example sakes. dataDir=/usr/cdh4/zookeeper/data # the port at which the clients will connect clientPort=2181 server.1=Slave01:2888:3888 server.2=Slave02:2888:3888 server.3=Slave03:2888:3888

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 10:45 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

and paste your ZK configuration from zookeeper_home/conf/zoo.cfg

On Tue, Apr 16, 2013 at 10:42 AM, Azuryy Yu azury...@gmail.com wrote: it is located under hbase-home/logs/ if your ZooKeeper is managed by HBase. But I noticed you configured QJM, so do your QJM and HBase share the same ZK cluster? If so, then just paste your QJM ZK configuration from hdfs-site.xml and the HBase ZK configuration from hbase-site.xml.

On Tue, Apr 16, 2013 at 10:37 AM, dylan dwld0...@gmail.com wrote: How to check the zookeeper log? It is a binary file; how to transform it into a normal log? I found "org.apache.zookeeper.server.LogFormatter" - how do I run it?

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 10:01 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

This is a ZooKeeper issue. Please paste the ZooKeeper log here. Thanks.

On Tue, Apr 16, 2013 at 9:58 AM, dylan dwld0...@gmail.com wrote: It is hbase-0.94.2-cdh4.2.0.

From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: 16 April 2013 9:55 To: u...@hbase.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

I think this question would be more appropriate for the HBase user mailing list. Moving hadoop user to bcc. Please tell us the HBase version you are using. Thanks

On Mon, Apr 15, 2013 at 6:51 PM, dylan dwld0...@gmail.com wrote: Hi, I am new to Hadoop and set up the cluster from a tarball. I have 5 nodes in the cluster: 2 NN nodes with QJM (3 JournalNodes, one of them on a DN node) and 3 DN nodes running ZooKeeper. It works fine. When I rebooted one DataNode machine which also runs ZooKeeper and then restarted all processes, Hadoop works fine but HBase does not: I cannot disable or drop tables. The logs are as follows: The HBase HMaster log: DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,6,1366001238313 ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining) The HBase HRegionServer log: DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN The HBase web UI shows: Region State 70236052 -ROOT-,,0.70236052 state=CLOSING
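For context on the HBASE_MANAGES_ZK point discussed above: when HBase should use an existing external ZooKeeper ensemble instead of managing its own, the usual settings look roughly like the sketch below. The quorum hosts and client port match the zoo.cfg quoted in this thread; everything else is an assumption about the poster's layout, not taken from the thread:

    # hbase-env.sh
    export HBASE_MANAGES_ZK=false

    <!-- hbase-site.xml -->
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>Slave01,Slave02,Slave03</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
    </property>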
Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
I cannot find any useful information in the pasted logs.

On Tue, Apr 16, 2013 at 11:22 AM, dylan dwld0...@gmail.com wrote: yes, I have just discovered that. I found the Slave01 and Slave03 zookeeper.out under zookeeper_home/bin/, but on Slave02 (the node that was rebooted earlier), zookeeper_home is under the / directory after the reboot.

Slave02 zookeeper.out shows: WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker 2013-04-15 16:38:31,987 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370) at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831) at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667) [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@688] - Send worker leaving thread [myid:2] - INFO [Slave02/192.168.75.243:3888:QuorumCnxManager$Listener@493] - Received connection request /192.168.75.242:51136 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x5037d (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state) [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x5037d (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state)

Slave01 zookeeper.out shows: [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890005 type:create cxid:0x1e zxid:0xb003c txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2013-04-16 10:58:26,415 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890006 type:create cxid:0x7 zxid:0xb003d txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2013-04-16 10:58:26,431 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890007 type:create cxid:0x7 zxid:0xb003e txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2013-04-16 10:58:26,489 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x23e0dc5a333000a type:create cxid:0x7 zxid:0xb003f txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2013-04-16 10:58:36,001 [myid:1] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x33e0dc5b4de0003, timeout of 4ms exceeded 2013-04-16 10:58:36,001 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x33e0dc5b4de0003 2013-04-16 11:03:44,000 [myid:1] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x23e0dc5a333000b, timeout of 4ms exceeded 2013-04-16 11:03:44,001 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x23e0dc5a333000b

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 11:13 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

then, can you find the zookeeper log under zookeeper_home/zookeeper.out?

On Tue, Apr 16, 2013 at 11:04 AM, dylan dwld0...@gmail.com wrote: I use the hbase shell. It always shows: ERROR: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 10:59 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
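A quick way to confirm each ZooKeeper server came back healthy after a reboot like the one described above is the four-letter-word interface. A sketch, assuming nc is installed and using the host names from the zoo.cfg earlier in the thread:

    # prints "imok" if the server is running and serving requests
    echo ruok | nc Slave02 2181
    # shows the server's mode (leader/follower), connected clients and zxid
    echo stat | nc Slave02 2181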
Re: A question of QJM with HDFS federation
Hi Harsh, If they are two separate clusters instead of a federated one, having the same cluster ID but different nameservice IDs, can they use the same journal nodes and ZK nodes? And if they are separate clusters with different cluster IDs but the same nameservice ID, can they use the same journal nodes and ZK nodes? Thanks.

On Mon, Apr 8, 2013 at 2:57 PM, Azuryy Yu azury...@gmail.com wrote: Thank you very much, Harsh. No further questions for now. --Send from my Sony mobile.

On Apr 8, 2013 2:51 PM, Harsh J ha...@cloudera.com wrote: Hi Azuryy, QJM: Yes, multiple nameservices can share a single QJM set. The QJM configuration allows for a journal ID prefix path which you should configure to be the nameservice ID. You do not need to change disk paths/etc. at all. For example, NS1 NNs can have dfs.namenode.shared.edits.dir configured as: qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/NS1 NS2 NNs can have dfs.namenode.shared.edits.dir configured as: qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/NS2 which will separate the two logically and still make them use the same QJM set of nodes. ZKFC: Each NN needs its own HDFS ZKFC daemon, but all ZKFCs across multiple NSes can share a single ZK cluster. All ZKFCs' core-site.xml can have the same ha.zookeeper.quorum value since the ZKFC automatically reuses the nameservice ID as its parent znode name on the ZK instance, and won't collide with another NS's ZKFCs. Do post back if there are still some more doubts.

On Mon, Apr 8, 2013 at 10:53 AM, Azuryy Yu azury...@gmail.com wrote: Hi dears, I deployed Hadoop v2 with HA enabled using QJM, so my question is: 1) if we also configure HDFS federation, such as: NN1 is active, NN2 is standby; NN3 is active, NN4 is standby; and they are configured as HDFS federation, then can these four NNs use the same journal nodes and ZKs? If your answer is yes, is it enough to just use a different dfs.journalnode.edits.dir, such as NN1 and NN2 configuring dfs.journalnode.edits.dir as /data1 and NN3 and NN4 configuring it as /data2? Thanks. -- Harsh J
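Harsh's answer above maps onto hdfs-site.xml entries roughly like the following sketch; the nameservice IDs and journal node hosts are the example names from his reply, not from a real cluster:

    <!-- on NS1's NameNodes -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/NS1</value>
    </property>

    <!-- on NS2's NameNodes: the same value, but ending in /NS2 -->

The JournalNodes themselves keep a single dfs.journalnode.edits.dir and create one subdirectory per journal ID (NS1, NS2) beneath it, so the two nameservices do not need separate disk paths.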
Re: Mapper always hangs at the same spot
agree. just check your app. or paste your map code here. --Send from my Sony mobile.

On Apr 14, 2013 4:08 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Your application logic is likely stuck in a loop.

On Sat, Apr 13, 2013 at 12:47 PM, Chris Hokamp chris.hok...@gmail.com wrote: "When you say never progresses, do you see the MR framework kill it automatically after 10 minutes of inactivity or does it never ever exit?" The latter -- it never exits. Killing it manually seems like a good option for now. We already have mapred.max.map.failures.percent set to a non-zero value, but because the task never fails, this never comes into effect. Thanks for the help, Chris

On Sat, Apr 13, 2013 at 5:00 PM, Harsh J ha...@cloudera.com wrote: When you say never progresses, do you see the MR framework kill it automatically after 10 minutes of inactivity or does it never ever exit? You can lower the timeout period on tasks via mapred.task.timeout, set in msec. You could also set mapred.max.map.failures.percent to a non-zero value to allow that much percentage of tasks to fail without also marking the whole job as a failure. If the task itself does not get killed by the framework due to inactiveness, try doing a hadoop job -fail-task on its attempt ID manually.

On Sat, Apr 13, 2013 at 8:45 PM, Chris Hokamp chris.hok...@gmail.com wrote: Hello, We have a job where all mappers finish except for one, which always hangs at the same spot (i.e. it reaches 49%, then never progresses). This is likely due to a bug in the wiki parser in our Pig UDF. We can afford to lose the data this mapper is working on if it would allow the job to finish. Question: is there a hadoop configuration parameter similar to mapred.skip.map.max.skip.records that would let us skip a map that doesn't progress after X amount of time? Any other possible workarounds for this case would also be useful. We are currently using hadoop 1.1.0 and Pig 0.10.1. Thanks, Chris -- Harsh J
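For reference, the knobs Harsh mentions above would be set roughly like this; the property names are the MR1 names used by Hadoop 1.1.0 (as in the thread), while the values and the attempt ID are illustrative only:

    <!-- mapred-site.xml, or per-job configuration -->
    <property>
      <name>mapred.task.timeout</name>
      <value>600000</value>   <!-- milliseconds; 10 minutes of inactivity -->
    </property>
    <property>
      <name>mapred.max.map.failures.percent</name>
      <value>5</value>        <!-- tolerate up to 5% failed map tasks -->
    </property>

and the manual kill of a stuck attempt:

    hadoop job -fail-task attempt_201304131200_0001_m_000042_0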
Re: Copy Vs DistCP
yes, you are right.

On Thu, Apr 11, 2013 at 3:40 PM, Hemanth Yamijala yhema...@thoughtworks.com wrote: AFAIK, the cp command works fully from the DFS client. It reads bytes from the InputStream created when the file is opened and writes the same to the OutputStream of the file. It does not work at the level of data blocks. A configuration io.file.buffer.size is used as the size of the buffer used in the copy - set to 4096 by default. Thanks Hemanth

On Thu, Apr 11, 2013 at 9:42 AM, KayVajj vajjalak...@gmail.com wrote: If the cp command is not parallel, how does it work for a file partitioned across various data nodes?

On Wed, Apr 10, 2013 at 6:30 PM, Azuryy Yu azury...@gmail.com wrote: The cp command is not parallel; it just calls the FileSystem, even if DFSClient has multiple threads. DistCp can work well on the same cluster.

On Thu, Apr 11, 2013 at 8:17 AM, KayVajj vajjalak...@gmail.com wrote: The File System Copy utility copies files byte by byte if I'm not wrong. Could it be possible that the cp command works with blocks and moves them, which could be significantly more efficient? Also how does the cp command work if the file is distributed on different data nodes? Thanks Kay

On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas jayunit...@gmail.com wrote: DistCP is a full blown mapreduce job (mapper only, where the mappers do a fully parallel copy to the destination). CP appears (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file. I have an additional question: how is CP, which is internal to a cluster, optimized (if at all)?

On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 shurong@qunar.com wrote: Hi, I think it's better to use Copy in the same cluster and distCP between clusters; the cp command is a hadoop internal parallel process and will not copy files locally. -- 麦树荣

From: KayVajj vajjalak...@gmail.com Date: 2013-04-11 06:20 To: user@hadoop.apache.org Subject: Copy Vs DistCP I have a few questions regarding the usage of DistCP for copying files in the same cluster. 1) Which one is better within the same cluster, and what factors (like file size etc.) would influence the usage of one over the other? 2) When we run a cp command like the one below from a client node of the cluster (not a data node), how does the cp command work: i) like an MR job, or ii) by copying files locally and then copying them back to the new location? Example of the copy command: hdfs dfs -cp /some_location/file /new_location/ Thanks, your responses are appreciated. -- Kay -- Jay Vyas http://jayunit100.blogspot.com
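For concreteness, the two forms being compared in this thread look roughly like this; the paths are illustrative:

    # serial copy through a single DFS client process, as described above
    hdfs dfs -cp /some_location/file /new_location/

    # map-only MapReduce job; also usable within a single cluster
    hadoop distcp /some_location /new_location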
Re: Copy Vs DistCP
DistCP is preferred for your requirements.

On Fri, Apr 12, 2013 at 12:52 AM, KayVajj vajjalak...@gmail.com wrote: Summing up, what would be the recommendations for copy: 1) DistCP 2) shell cp command 3) Using the File System API (FileUtils, to be precise) inside of a Java program 4) An MR job with an identity mapper and no reducer (maybe this is what DistCP does). I did not run any comparisons as my dev cluster is just a two node cluster and I am not sure how this would perform on a production cluster. Kay

On Thu, Apr 11, 2013 at 5:44 AM, Jay Vyas jayunit...@gmail.com wrote: Yes, makes sense... cp is serialized and simpler, and does not rely on the JobTracker, whereas distcp actually submits a job and waits for completion. So it can fail if tasks start to fail or time out. I have seen distcp fail and hang before, albeit not often. Sent from my iPhone

On Apr 10, 2013, at 10:37 PM, Alexander Pivovarov apivova...@gmail.com wrote: If the cluster is busy with other jobs, distcp will wait for free map slots. Regular cp is more reliable and predictable, especially if you need to copy just several GB.

On Apr 10, 2013 6:31 PM, Azuryy Yu azury...@gmail.com wrote: The cp command is not parallel; it just calls the FileSystem, even if DFSClient has multiple threads. DistCp can work well on the same cluster.

On Thu, Apr 11, 2013 at 8:17 AM, KayVajj vajjalak...@gmail.com wrote: The File System Copy utility copies files byte by byte if I'm not wrong. Could it be possible that the cp command works with blocks and moves them, which could be significantly more efficient? Also how does the cp command work if the file is distributed on different data nodes? Thanks Kay

On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas jayunit...@gmail.com wrote: DistCP is a full blown mapreduce job (mapper only, where the mappers do a fully parallel copy to the destination). CP appears (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file. I have an additional question: how is CP, which is internal to a cluster, optimized (if at all)?

On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 shurong@qunar.com wrote: Hi, I think it's better to use Copy in the same cluster and distCP between clusters; the cp command is a hadoop internal parallel process and will not copy files locally. -- 麦树荣

From: KayVajj vajjalak...@gmail.com Date: 2013-04-11 06:20 To: user@hadoop.apache.org Subject: Copy Vs DistCP I have a few questions regarding the usage of DistCP for copying files in the same cluster. 1) Which one is better within the same cluster, and what factors (like file size etc.) would influence the usage of one over the other? 2) When we run a cp command like the one below from a client node of the cluster (not a data node), how does the cp command work: i) like an MR job, or ii) by copying files locally and then copying them back to the new location? Example of the copy command: hdfs dfs -cp /some_location/file /new_location/ Thanks, your responses are appreciated. -- Kay -- Jay Vyas http://jayunit100.blogspot.com
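Option 3 in Kay's summary (copying through the FileSystem API from a Java program) would look roughly like the sketch below. The class is org.apache.hadoop.fs.FileUtil; the paths are illustrative. Like the shell cp, this streams the bytes through the client process rather than running a parallel job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Client-side copy: reads the source stream and writes the
        // destination stream, one file at a time.
        FileUtil.copy(fs, new Path("/some_location/file"),
                      fs, new Path("/new_location/file"),
                      false /* do not delete the source */, conf);
      }
    }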
Re: The Job.xml file
Yes, you can start a job directly from a job.xml try hadoop job -submit JOB_FILE, replace JOB_FILE with your job.xml. On Wed, Apr 10, 2013 at 12:25 AM, Jay Vyas jayunit...@gmail.com wrote: Hi guys: I cant find much info about the life cycle for the job.xml file in hadoop. My thoughts are : 1) It is created by the job client 2) It is only read by the JobTracker 3) Task trackers (indirectly) are configured by information in job.xml because the JobTracker decomposes its contents into individual tasks So, my (related) questions are: Is there a way to start a job directly from a job.xml file? What components depend on and read the job.xml file? Where is the job.xml defined/documented (if anywhere)? -- Jay Vyas http://jayunit100.blogspot.com
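As a rough illustration of what such a file contains: job.xml is just a serialized Hadoop Configuration, so a heavily trimmed example might look like the fragment below. The property names are old-API (MR1) keys; the job name, mapper class, and paths are hypothetical:

    <configuration>
      <property><name>mapred.job.name</name><value>wordcount</value></property>
      <property><name>mapred.mapper.class</name><value>org.example.WordCountMapper</value></property>
      <property><name>mapred.input.dir</name><value>/data/in</value></property>
      <property><name>mapred.output.dir</name><value>/data/out</value></property>
    </configuration>

which could then be submitted with: hadoop job -submit /path/to/job.xml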
Re: backup node question
Hi Harsh, Do you mean BackupNameNode is Secondary NameNode in Hadoop1.x? On Sun, Apr 7, 2013 at 4:05 PM, Harsh J ha...@cloudera.com wrote: Yes, it need not keep an edits (transactions) stream locally cause those are passed synchronously to the BackupNameNode, which persists it on its behalf. On Sun, Apr 7, 2013 at 1:21 PM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, For your comments, What it means is that the NameNode need not store anything locally, you mean Primary Name Node do not need to store checkpoint/journal locally, and only need to keep memory image up-to-date for edits? regards, Lin On Sun, Apr 7, 2013 at 3:31 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, My reply inline. On Sun, Apr 7, 2013 at 12:36 PM, Lin Ma lin...@gmail.com wrote: Hi guys, I am reading from this paper to learn about backup nodes (http://www.storageconference.org/2010/Papers/MSST/Shvachko.pdf), It is mentioned, It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. , what kinds of operations do not need knowledge of block locations? Operations that do not involve data reads or writes would not require knowledge of block locations. Applying also the restriction of no namespace mutation, an example would be listing directories and looking up file information via FileStatus objects (perhaps the only examples - its like a safemode but no reads either). It is also mentioned, Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode., what means running the NameNode without persistent storage and delegating responsibility for the namespace state persisting? What it means is that the NameNode need not store anything locally, but can rely on the edits being stored at the BackupNameNode which would continuously be receiving it. When restarted, it can grab a current checkpoint from the BNN and boot up anywhere, since there's no local storage requirement. -- Harsh J -- Harsh J
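An example of the "no block locations needed" operations Harsh describes above: a directory listing is answered entirely from namespace metadata. A minimal sketch using the standard FileSystem API (the path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListOnly {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Metadata only: names, sizes, permissions and times come from the
        // namespace; no block locations are ever requested.
        for (FileStatus st : fs.listStatus(new Path("/"))) {
          System.out.println(st.getPath() + " " + st.getLen() + " " + st.getPermission());
        }
      }
    }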
Re: backup node question
I am confused. Hadoopv2 has NN SNN DN JN(journal node), so whats Standby Namenode? --Send from my Sony mobile. On Apr 7, 2013 9:03 PM, Harsh J ha...@cloudera.com wrote: BackupNameNode is not present in the maintenance 1.x releases, it is a feature added to a higher version; you can try it out in 2.x today if you wish to. On Sun, Apr 7, 2013 at 3:12 PM, Azuryy Yu azury...@gmail.com wrote: Hi Harsh, Do you mean BackupNameNode is Secondary NameNode in Hadoop1.x? On Sun, Apr 7, 2013 at 4:05 PM, Harsh J ha...@cloudera.com wrote: Yes, it need not keep an edits (transactions) stream locally cause those are passed synchronously to the BackupNameNode, which persists it on its behalf. On Sun, Apr 7, 2013 at 1:21 PM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, For your comments, What it means is that the NameNode need not store anything locally, you mean Primary Name Node do not need to store checkpoint/journal locally, and only need to keep memory image up-to-date for edits? regards, Lin On Sun, Apr 7, 2013 at 3:31 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, My reply inline. On Sun, Apr 7, 2013 at 12:36 PM, Lin Ma lin...@gmail.com wrote: Hi guys, I am reading from this paper to learn about backup nodes (http://www.storageconference.org/2010/Papers/MSST/Shvachko.pdf), It is mentioned, It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. , what kinds of operations do not need knowledge of block locations? Operations that do not involve data reads or writes would not require knowledge of block locations. Applying also the restriction of no namespace mutation, an example would be listing directories and looking up file information via FileStatus objects (perhaps the only examples - its like a safemode but no reads either). It is also mentioned, Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode., what means running the NameNode without persistent storage and delegating responsibility for the namespace state persisting? What it means is that the NameNode need not store anything locally, but can rely on the edits being stored at the BackupNameNode which would continuously be receiving it. When restarted, it can grab a current checkpoint from the BNN and boot up anywhere, since there's no local storage requirement. -- Harsh J -- Harsh J -- Harsh J
Re: backup node question
SNN=secondary name node in my last mail. --Send from my Sony mobile. On Apr 7, 2013 10:01 PM, Azuryy Yu azury...@gmail.com wrote: I am confused. Hadoopv2 has NN SNN DN JN(journal node), so whats Standby Namenode? --Send from my Sony mobile. On Apr 7, 2013 9:03 PM, Harsh J ha...@cloudera.com wrote: BackupNameNode is not present in the maintenance 1.x releases, it is a feature added to a higher version; you can try it out in 2.x today if you wish to. On Sun, Apr 7, 2013 at 3:12 PM, Azuryy Yu azury...@gmail.com wrote: Hi Harsh, Do you mean BackupNameNode is Secondary NameNode in Hadoop1.x? On Sun, Apr 7, 2013 at 4:05 PM, Harsh J ha...@cloudera.com wrote: Yes, it need not keep an edits (transactions) stream locally cause those are passed synchronously to the BackupNameNode, which persists it on its behalf. On Sun, Apr 7, 2013 at 1:21 PM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, For your comments, What it means is that the NameNode need not store anything locally, you mean Primary Name Node do not need to store checkpoint/journal locally, and only need to keep memory image up-to-date for edits? regards, Lin On Sun, Apr 7, 2013 at 3:31 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, My reply inline. On Sun, Apr 7, 2013 at 12:36 PM, Lin Ma lin...@gmail.com wrote: Hi guys, I am reading from this paper to learn about backup nodes (http://www.storageconference.org/2010/Papers/MSST/Shvachko.pdf), It is mentioned, It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. , what kinds of operations do not need knowledge of block locations? Operations that do not involve data reads or writes would not require knowledge of block locations. Applying also the restriction of no namespace mutation, an example would be listing directories and looking up file information via FileStatus objects (perhaps the only examples - its like a safemode but no reads either). It is also mentioned, Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode., what means running the NameNode without persistent storage and delegating responsibility for the namespace state persisting? What it means is that the NameNode need not store anything locally, but can rely on the edits being stored at the BackupNameNode which would continuously be receiving it. When restarted, it can grab a current checkpoint from the BNN and boot up anywhere, since there's no local storage requirement. -- Harsh J -- Harsh J -- Harsh J
Re: backup node question
oh, got it. you are a good guy. --Send from my Sony mobile. On Apr 7, 2013 10:11 PM, Harsh J ha...@cloudera.com wrote: StandbyNameNode is the term we use to refer to a NameNode in HA that is currently not the active one (i.e. its state is 'Standby'). Its not a special type of daemon (i.e. it just runs the NameNode service), just a naming convention. On Sun, Apr 7, 2013 at 7:31 PM, Azuryy Yu azury...@gmail.com wrote: I am confused. Hadoopv2 has NN SNN DN JN(journal node), so whats Standby Namenode? --Send from my Sony mobile. On Apr 7, 2013 9:03 PM, Harsh J ha...@cloudera.com wrote: BackupNameNode is not present in the maintenance 1.x releases, it is a feature added to a higher version; you can try it out in 2.x today if you wish to. On Sun, Apr 7, 2013 at 3:12 PM, Azuryy Yu azury...@gmail.com wrote: Hi Harsh, Do you mean BackupNameNode is Secondary NameNode in Hadoop1.x? On Sun, Apr 7, 2013 at 4:05 PM, Harsh J ha...@cloudera.com wrote: Yes, it need not keep an edits (transactions) stream locally cause those are passed synchronously to the BackupNameNode, which persists it on its behalf. On Sun, Apr 7, 2013 at 1:21 PM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, For your comments, What it means is that the NameNode need not store anything locally, you mean Primary Name Node do not need to store checkpoint/journal locally, and only need to keep memory image up-to-date for edits? regards, Lin On Sun, Apr 7, 2013 at 3:31 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, My reply inline. On Sun, Apr 7, 2013 at 12:36 PM, Lin Ma lin...@gmail.com wrote: Hi guys, I am reading from this paper to learn about backup nodes (http://www.storageconference.org/2010/Papers/MSST/Shvachko.pdf ), It is mentioned, It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. , what kinds of operations do not need knowledge of block locations? Operations that do not involve data reads or writes would not require knowledge of block locations. Applying also the restriction of no namespace mutation, an example would be listing directories and looking up file information via FileStatus objects (perhaps the only examples - its like a safemode but no reads either). It is also mentioned, Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode., what means running the NameNode without persistent storage and delegating responsibility for the namespace state persisting? What it means is that the NameNode need not store anything locally, but can rely on the edits being stored at the BackupNameNode which would continuously be receiving it. When restarted, it can grab a current checkpoint from the BNN and boot up anywhere, since there's no local storage requirement. -- Harsh J -- Harsh J -- Harsh J -- Harsh J
A question of QJM with HDFS federation
Hi dears, I deployed Hadoop v2 with HA enabled using QJM, so my question is: 1) if we also configure HDFS federation, such as: NN1 is active, NN2 is standby; NN3 is active, NN4 is standby; and they are configured as HDFS federation, then can these four NNs use the same journal nodes and ZKs? If your answer is yes, is it enough to just use a different dfs.journalnode.edits.dir, such as NN1 and NN2 configuring dfs.journalnode.edits.dir as /data1 and NN3 and NN4 configuring it as /data2? Thanks.
Re: Hadoop 1.0.4 with Eclipse
download hadoop-0.20.203; it ships a hadoop-eclipse plugin, which also supports hadoop-1.0.4. Send from my Sony mobile.

On Apr 5, 2013 11:14 PM, sahil soni whitepaper2...@gmail.com wrote: Hi All, I have installed Hadoop 1.0.4 on Red Hat Linux 5. I want to install Eclipse (any version) on Windows 7 and use that Windows Eclipse for Hadoop development. I tried to google and download Hadoop plugins for this, but I am not able to configure them on Windows, though with the same plugins I am able to configure everything on the Linux system itself where Hadoop is running. 1.) So back to the original question: how do I configure the Hadoop Eclipse plugin on Windows while Hadoop 1.0.4 is running on Linux? 2.) Please send me clear steps for how to build the plugin and how to add those 5 jars which we have to add; I tried to add those jars to the plugin and it did not work. There is no document online which clearly explains it step by step. Thanks in advance. Please send me clear instructions on how to follow these steps: The general approach is to 1. checkout the code from Apache's SVN 2. modify build.properties in /src/contrib/eclipse-plugin and add eclipse.home=path to eclipse 3. download apache forrest 0.8 and sun jdk 5 4. run the ant command as ant clean package -Djava5.home=/opt/java/jdk1.5.0_22 -Dforrest.home=/opt/apache-forrest-0.8 (replace the paths as per your config) 5. you should be online for this 6. after that the eclipse plugin should be in /build/contrib/eclipse-plugin 7. Now the plugin thus made is not correct 8. open the jar and add the following jars in /lib of the jar: commons-configuration, commons-lang, jackson-core-asl, jackson-mapper-asl 9. modify MANIFEST.MF in /META-INF of the jar to include these paths, such as Bundle-ClassPath: classes/,lib/hadoop-core.jar,lib/jackson-mapper-asl-1.8.8.jar,lib/jackson-core-asl-1.8.8.jar,lib/commons-configuration-1.6.jar,lib/commons-lang-2.4.jar 10. copy this jar to the plugins folder of eclipse 11. run eclipse -clean 12. switch to the map reduce perspective I tried to follow it but did not succeed. Regards
Re: MVN repository for hadoop trunk
hi, do you think trunk is as stable as the released stable version? --Send from my Sony mobile.

On Apr 7, 2013 5:01 AM, Harsh J ha...@cloudera.com wrote: I don't think we publish nightly or rolling jars anywhere on Maven Central from trunk builds.

On Sun, Apr 7, 2013 at 2:17 AM, Jay Vyas jayunit...@gmail.com wrote: Hi guys: Is there a mvn repo for hadoop's 3.0.0 trunk build? Clearly the hadoop pom.xml allows us to build hadoop from scratch and installs it as 3.0.0-SNAPSHOT -- but it's not clear whether there is a published version of this snapshot jar somewhere. -- Jay Vyas http://jayunit100.blogspot.com -- Harsh J
Re: What do ns_quota and ds_quota mean in a namenode entry
Namespace and disk space. The ns quota limits the number of names (files and directories) in the tree rooted at the directory; the ds quota limits the total disk space consumed by its files.

On Apr 4, 2013 3:12 PM, Bert Yuan bert.y...@gmail.com wrote: Below is the JSON form of a namenode entry: { inode:{ inodepath:'/anotherDir/biggerfile', replication:3, modificationtime:'2009-03-16 14:15', accesstime:'2009-03-16 14:15', blocksize:134217728, blocks:{ numblocks:3, block:[ { blockid:-3825289017228345300, numbytes:134217728, generationstamp:1002 }, { blockid:-561951562131659300, numbytes:134217728, generationstamp:1002 }, { blockid:524543674153269000, numbytes:18196208, generationstamp:1002 } ] }, nsquota:-1, dsquota:-1, permissions:{ username:'jhoman', groupname:'supergroup', permstring:'rw-r--r--' } } } I don't know what 'nsquota' and 'dsquota' mean; could anyone explain this to me?
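For reference, these are the quotas managed through dfsadmin, and a value of -1 in a dump like the one above simply means no quota has been set on that directory. A sketch of the usual commands, with the directory and values illustrative:

    hadoop dfsadmin -setQuota 10000 /anotherDir       # nsquota: max number of file/directory names
    hadoop dfsadmin -setSpaceQuota 1t /anotherDir     # dsquota: max bytes, counting all replicas
    hadoop fs -count -q /anotherDir                   # show both quotas and the remaining headroom
    hadoop dfsadmin -clrQuota /anotherDir             # back to nsquota=-1
    hadoop dfsadmin -clrSpaceQuota /anotherDir        # back to dsquota=-1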
Re: are we able to decommission multi nodes at one time?
not at all. so don't worry about that.

On Wed, Apr 3, 2013 at 2:04 PM, Yanbo Liang yanboha...@gmail.com wrote: It means that maybe some replicas will stay in an under-replicated state?

2013/4/3 Azuryy Yu azury...@gmail.com bq. "then namenode start to copy block replicates on DN-2 to another DN, supposed DN-2." Sorry for the typo; the correction is: then the namenode starts to copy the block replicas on DN-1 to another DN, say DN-2.

On Wed, Apr 3, 2013 at 9:51 AM, Azuryy Yu azury...@gmail.com wrote: It's different. If you just want to stop DN-1 for a short time, just kill the DataNode process on DN-1, then do what you want. During this time the Namenode cannot receive the heartbeat from DN-1, so the namenode starts to copy block replicates on DN-2 to another DN, supposed DN-2. But when you start DN-1 again, the Namenode receives DN-1's registration and stops copying DN-1's block replicas even if it has not finished copying. Did I explain it clearly?

On Wed, Apr 3, 2013 at 9:43 AM, Henry Junyoung Kim henry.jy...@gmail.com wrote: @Harsh What are the reasons for the big gap, when removing nodes, between decommissioning them and just taking them down? In my understanding, both need to copy under-replicated blocks to the remaining live nodes. If that is the main cost of both, the total elapsed time shouldn't be very different. Could you share some articles or documents that explain the decommissioning procedure? - explanations are always appreciated ;)

On Apr 2, 2013, at 5:37 PM, Harsh J ha...@cloudera.com wrote: Yes, you can do the downtime work in steps of 2 DNs at a time, especially since you mentioned the total work would be only ~30 mins at most.

On Tue, Apr 2, 2013 at 1:46 PM, Henry Junyoung Kim henry.jy...@gmail.com wrote: the rest of the nodes that stay alive have enough space to store it. for the point you mentioned, "it's easier to do so in a rolling manner without need of a decommission": to check my understanding, just shutting down 2 of them, then 2 more, and then 2 more, without decommissions - is this correct?

On Apr 2, 2013, at 4:54 PM, Harsh J ha...@cloudera.com wrote: Note though that it's only possible to decommission 7 nodes at the same time and expect it to finish iff the remaining 8 nodes have adequate free space for the excess replicas. If you're just going to take them down for a short while (a few mins each), it's easier to do so in a rolling manner without need of a decommission. You can take up to two down at a time on a replication average of 3 or 3+, and put them back in later without too much data movement impact.

On Tue, Apr 2, 2013 at 1:06 PM, Yanbo Liang yanboha...@gmail.com wrote: It's reasonable to decommission 7 nodes at the same time, but it may also take a long time to finish, because all the replicas on these 7 nodes need to be copied to the remaining 8 nodes. The size of the transfer from these nodes to the remaining nodes is equal.

2013/4/2 Henry Junyoung Kim henry.jy...@gmail.com :) currently, I have 15 data nodes. for some tests, I am trying to decommission down to 8 nodes. Now, the total dfs used size is 52 TB, which includes all replicated blocks. From 15 to 8, the total time spent is almost 4 days. ;( someone mentioned that I don't need to decommission node by node. for this case, are there no problems if I decommission 7 nodes at the same time?

On Apr 2, 2013, at 12:14 PM, Azuryy Yu azury...@gmail.com wrote: I can translate it to native English: how many nodes do you want to decommission?

On Tue, Apr 2, 2013 at 11:01 AM, Yanbo Liang yanboha...@gmail.com wrote: You want to decommission how many nodes?

2013/4/2 Henry JunYoung KIM henry.jy...@gmail.com 15 for datanodes and 3 for replication factor.

On Apr 1, 2013, at 3:23 PM, varun kumar varun@gmail.com wrote: How many nodes do you have, and what is the replication factor? -- Harsh J -- Harsh J
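For reference, the decommissioning discussed in this thread is driven by the NameNode's excludes file; a minimal sketch, with the file path illustrative:

    <!-- hdfs-site.xml on the NameNode -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>

List the hostnames of the DataNodes to retire in that file, one per line (all 7 at once is fine), then run:

    hadoop dfsadmin -refreshNodes

The nodes show as "Decommission In Progress" on the NameNode web UI while their blocks are re-replicated, and "Decommissioned" once it is safe to shut them down.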
Re: are we able to decommission multi nodes at one time?
It's different. If you just want to stop DN-1 for a short time, just kill the DataNode process on DN-1, then do what you want. During this time the Namenode cannot receive the heartbeat from DN-1, so the namenode starts to copy block replicates on DN-2 to another DN, supposed DN-2. But when you start DN-1 again, the Namenode receives DN-1's registration and stops copying DN-1's block replicas even if it has not finished copying. Did I explain it clearly?

On Wed, Apr 3, 2013 at 9:43 AM, Henry Junyoung Kim henry.jy...@gmail.com wrote: @Harsh What are the reasons for the big gap, when removing nodes, between decommissioning them and just taking them down? In my understanding, both need to copy under-replicated blocks to the remaining live nodes. If that is the main cost of both, the total elapsed time shouldn't be very different. Could you share some articles or documents that explain the decommissioning procedure? - explanations are always appreciated ;)

On Apr 2, 2013, at 5:37 PM, Harsh J ha...@cloudera.com wrote: Yes, you can do the downtime work in steps of 2 DNs at a time, especially since you mentioned the total work would be only ~30 mins at most.

On Tue, Apr 2, 2013 at 1:46 PM, Henry Junyoung Kim henry.jy...@gmail.com wrote: the rest of the nodes that stay alive have enough space to store it. for the point you mentioned, "it's easier to do so in a rolling manner without need of a decommission": to check my understanding, just shutting down 2 of them, then 2 more, and then 2 more, without decommissions - is this correct?

On Apr 2, 2013, at 4:54 PM, Harsh J ha...@cloudera.com wrote: Note though that it's only possible to decommission 7 nodes at the same time and expect it to finish iff the remaining 8 nodes have adequate free space for the excess replicas. If you're just going to take them down for a short while (a few mins each), it's easier to do so in a rolling manner without need of a decommission. You can take up to two down at a time on a replication average of 3 or 3+, and put them back in later without too much data movement impact.

On Tue, Apr 2, 2013 at 1:06 PM, Yanbo Liang yanboha...@gmail.com wrote: It's reasonable to decommission 7 nodes at the same time, but it may also take a long time to finish, because all the replicas on these 7 nodes need to be copied to the remaining 8 nodes. The size of the transfer from these nodes to the remaining nodes is equal.

2013/4/2 Henry Junyoung Kim henry.jy...@gmail.com :) currently, I have 15 data nodes. for some tests, I am trying to decommission down to 8 nodes. Now, the total dfs used size is 52 TB, which includes all replicated blocks. From 15 to 8, the total time spent is almost 4 days. ;( someone mentioned that I don't need to decommission node by node. for this case, are there no problems if I decommission 7 nodes at the same time?

On Apr 2, 2013, at 12:14 PM, Azuryy Yu azury...@gmail.com wrote: I can translate it to native English: how many nodes do you want to decommission?

On Tue, Apr 2, 2013 at 11:01 AM, Yanbo Liang yanboha...@gmail.com wrote: You want to decommission how many nodes?

2013/4/2 Henry JunYoung KIM henry.jy...@gmail.com 15 for datanodes and 3 for replication factor.

On Apr 1, 2013, at 3:23 PM, varun kumar varun@gmail.com wrote: How many nodes do you have, and what is the replication factor? -- Harsh J -- Harsh J
Re: are we able to decommission multi nodes at one time?
bq. "then namenode start to copy block replicates on DN-2 to another DN, supposed DN-2." Sorry for the typo; the correction is: then the namenode starts to copy the block replicas on DN-1 to another DN, say DN-2.

On Wed, Apr 3, 2013 at 9:51 AM, Azuryy Yu azury...@gmail.com wrote: It's different. If you just want to stop DN-1 for a short time, just kill the DataNode process on DN-1, then do what you want. During this time the Namenode cannot receive the heartbeat from DN-1, so the namenode starts to copy block replicates on DN-2 to another DN, supposed DN-2. But when you start DN-1 again, the Namenode receives DN-1's registration and stops copying DN-1's block replicas even if it has not finished copying. Did I explain it clearly?

On Wed, Apr 3, 2013 at 9:43 AM, Henry Junyoung Kim henry.jy...@gmail.com wrote: @Harsh What are the reasons for the big gap, when removing nodes, between decommissioning them and just taking them down? In my understanding, both need to copy under-replicated blocks to the remaining live nodes. If that is the main cost of both, the total elapsed time shouldn't be very different. Could you share some articles or documents that explain the decommissioning procedure? - explanations are always appreciated ;)

On Apr 2, 2013, at 5:37 PM, Harsh J ha...@cloudera.com wrote: Yes, you can do the downtime work in steps of 2 DNs at a time, especially since you mentioned the total work would be only ~30 mins at most.

On Tue, Apr 2, 2013 at 1:46 PM, Henry Junyoung Kim henry.jy...@gmail.com wrote: the rest of the nodes that stay alive have enough space to store it. for the point you mentioned, "it's easier to do so in a rolling manner without need of a decommission": to check my understanding, just shutting down 2 of them, then 2 more, and then 2 more, without decommissions - is this correct?

On Apr 2, 2013, at 4:54 PM, Harsh J ha...@cloudera.com wrote: Note though that it's only possible to decommission 7 nodes at the same time and expect it to finish iff the remaining 8 nodes have adequate free space for the excess replicas. If you're just going to take them down for a short while (a few mins each), it's easier to do so in a rolling manner without need of a decommission. You can take up to two down at a time on a replication average of 3 or 3+, and put them back in later without too much data movement impact.

On Tue, Apr 2, 2013 at 1:06 PM, Yanbo Liang yanboha...@gmail.com wrote: It's reasonable to decommission 7 nodes at the same time, but it may also take a long time to finish, because all the replicas on these 7 nodes need to be copied to the remaining 8 nodes. The size of the transfer from these nodes to the remaining nodes is equal.

2013/4/2 Henry Junyoung Kim henry.jy...@gmail.com :) currently, I have 15 data nodes. for some tests, I am trying to decommission down to 8 nodes. Now, the total dfs used size is 52 TB, which includes all replicated blocks. From 15 to 8, the total time spent is almost 4 days. ;( someone mentioned that I don't need to decommission node by node. for this case, are there no problems if I decommission 7 nodes at the same time?

On Apr 2, 2013, at 12:14 PM, Azuryy Yu azury...@gmail.com wrote: I can translate it to native English: how many nodes do you want to decommission?

On Tue, Apr 2, 2013 at 11:01 AM, Yanbo Liang yanboha...@gmail.com wrote: You want to decommission how many nodes?

2013/4/2 Henry JunYoung KIM henry.jy...@gmail.com 15 for datanodes and 3 for replication factor.

On Apr 1, 2013, at 3:23 PM, varun kumar varun@gmail.com wrote: How many nodes do you have, and what is the replication factor? -- Harsh J -- Harsh J