Re: Hadoop Test libraries: Where did they go?
Yup, we figured it out eventually. The artifacts now use the test-jar directive, which creates a jar file that you can reference in Maven using the type tag in your dependencies. However, FYI, I haven't been able to successfully google for the quintessential classes in the Hadoop test libs, like the fs BaseContractTest, by name, so they are now harder to find than before. So I think it's unfortunate that they are not a top-level Maven artifact. It's misleading, as it's now very easy to assume from looking at Hadoop in Maven Central that hadoop-test is just an old library that nobody updates anymore. Just a thought, but maybe hadoop-test could be rejuvenated to point to hadoop-common somehow?

On Nov 25, 2013, at 4:52 AM, Steve Loughran ste...@hortonworks.com wrote: I see a hadoop-common-2.2.0-tests.jar in org.apache.hadoop/hadoop-common; SHA1 a9994d261d00295040a402cd2f611a2bac23972a, which resolves in a search engine to http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.2.0/ It looks like it is now part of the hadoop-common artifacts; you just say you want the test bits: http://maven.apache.org/guides/mini/guide-attached-tests.html

On 21 November 2013 23:28, Jay Vyas jayunit...@gmail.com wrote: It appears to me that http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-test is no longer updated. Where does Hadoop now package the test libraries? Looking at the hadoop-common-project/hadoop-common/pom.xml file in the Hadoop 2.x branches, I'm not sure whether or not src/test is packaged into a jar anymore... but I fear it is not. -- Jay Vyas http://jayunit100.blogspot.com

-- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law.
If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
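For anyone landing on this thread later: a minimal sketch of the dependency block Steve's link describes, assuming Hadoop 2.2.0 (the test-jar mechanism comes from the attached-tests guide linked above):

```xml
<!-- Pull in the test classes that used to live in the hadoop-test artifact.
     The test-jar type resolves to hadoop-common-2.2.0-tests.jar,
     which contains classes like the fs contract tests. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.2.0</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
```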
Relationship between heap sizes and mapred.child.java.opts configuration
I'm learning about Hadoop configuration. What is the connection between the datanode/tasktracker heap sizes and mapred.child.java.opts? Does one have to exceed the other?
Re: Relationship between heap sizes and mapred.child.java.opts configuration
mapred.child.java.opts refers to the settings for the JVMs spawned by the TaskTracker. These JVMs will actually run the tasks (mappers and reducers). The heap sizes for TaskTrackers and DataNodes are unrelated to those; they each run in their own JVM. Kai

Am 25.11.2013 um 15:52 schrieb Chih-Hsien Wu chjaso...@gmail.com: I'm learning about Hadoop configuration. What is the connection between the datanode/tasktracker heap sizes and mapred.child.java.opts? Does one have to exceed the other?

Kai Voigt k...@123.org Am Germaniahafen 1, 24143 Kiel, Germany +49 160 96683050 @KaiVoigt
Re: Relationship between heap sizes and mapred.child.java.opts configuration
Thanks for the reply. So what is the purpose of the heap sizes for TaskTrackers and DataNodes then? In other words, if I want to speed up the map/reduce cycle, can I just minimize those heap sizes and maximize mapred.child.java.opts? Or would minimizing the heap sizes cause an out-of-memory exception?

On Mon, Nov 25, 2013 at 10:02 AM, Kai Voigt k...@123.org wrote: mapred.child.java.opts refers to the settings for the JVMs spawned by the TaskTracker. These JVMs will actually run the tasks (mappers and reducers). The heap sizes for TaskTrackers and DataNodes are unrelated to those; they each run in their own JVM. Kai
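A sketch of the two unrelated knobs being discussed, assuming an MRv1-style setup (the -Xmx value here is an illustrative choice, not a recommendation). The per-task JVM heap goes in mapred-site.xml:

```xml
<!-- mapred-site.xml: heap for each task JVM the TaskTracker spawns.
     This does not affect the TaskTracker or DataNode daemons themselves. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

The daemon heaps live elsewhere, e.g. HADOOP_HEAPSIZE in hadoop-env.sh. Shrinking the daemon heaps does not speed up tasks; setting them too low risks failures in the daemons themselves rather than in your job.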
Map/Reduce/Driver jar(s) organization
I'm curious: what are some best practices for structuring jars for a business framework that uses Map/Reduce? Note: this assumes you aren't invoking MR manually via the command line, but have Hadoop integrated into a larger business framework that invokes MR jobs programmatically. By business framework I mean an architecture that includes a services component (REST, app server, whatever), business domain logic, Hadoop MR jobs, etc. Here are some common code artifacts in such an architecture:
* Map/Reduce classes
* Hadoop driver classes that configure the MR jobs and invoke them
* Business domain classes that invoke the Hadoop driver classes, within the context of some business process
* Services classes that interface between user calls/system events and business domain logic
Are most people creating monolithic jars that have all classes for all layers? Separating all Hadoop-related classes from domain-level classes? Are you putting the MR classes in the same jar as the Hadoop driver classes, or in separate jars? Thanks, Turbo -- Thanks, John C
Errors running Hadoop 2.2.0 on Cygwin
I have the following error while running 2.2.0 using Cygwin. Can anyone help with the problem?

/cygdrive/c/hadoop-2.2.0/bin $ ./hdfs namenode -format
java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/namenode/NameNode
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.server.namenode.NameNode
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode. Program will exit.
Exception in thread main
Re: Errors running Hadoop 2.2.0 on Cygwin
Can you show us the classpath? Cheers

On Tue, Nov 26, 2013 at 2:40 AM, Srinivas Chamarthi srinivas.chamar...@gmail.com wrote: I have the following error while running 2.2.0 using cygwin. anyone can help with the problem?
Re: Errors running Hadoop 2.2.0 on Cygwin
I added echo $CLASSPATH in libexec/hadoop-config.sh, and here is what it contains:

C:\hadoop-2.2.0\etc\hadoop;C:\hadoop-2.2.0\share\hadoop\common\lib\*;C:\hadoop-2.2.0\share\hadoop\common\*:/cygdrive/c/hadoop-2.2.0/share/hadoop/hdfs:/cygdrive/c/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/hdfs/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/yarn/lib/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/yarn/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/mapreduce/*:/cygdrive/c/hadoop-2.2.0//contrib/capacity-scheduler/*.jar

I can clearly see Windows paths mixed into the classpath; I think this is the reason for the issue. But I haven't mentioned anything explicitly with Windows-based paths. This is what is in my ~/.bashrc file:

export HADOOP_HOME=/cygdrive/c/hadoop-2.2.0/
export HADOOP_MAPRED_HOME=/cygdrive/c/hadoop-2.2.0
export HADOOP_COMMON_HOME=/cygdrive/c/hadoop-2.2.0
export HADOOP_HDFS_HOME=/cygdrive/c/hadoop-2.2.0
export YARN_HOME=/cygdrive/c/hadoop-2.2.0
export HADOOP_CONFIG_DIRECTORY=/cygdrive/c/hadoop-2.2.0/etc/hadoop

On Mon, Nov 25, 2013 at 10:43 AM, Ted Yu yuzhih...@gmail.com wrote: Can you show us the classpath? Cheers

On Tue, Nov 26, 2013 at 2:40 AM, Srinivas Chamarthi srinivas.chamar...@gmail.com wrote: I have the following error while running 2.2.0 using cygwin. anyone can help with the problem?
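One possible fix, offered only as an untested sketch for this setup: Cygwin ships the cygpath utility, which can convert a POSIX-style path list into Windows form, so the mixed classpath could be normalized near the end of libexec/hadoop-config.sh before the JVM is launched:

```shell
# Sketch of an edit to libexec/hadoop-config.sh (assumption: Cygwin's
# cygpath is on PATH). -w converts to Windows paths, -p treats the
# argument as a whole ':'-separated path list, emitting ';' separators.
if [ "$(uname -o 2>/dev/null)" = "Cygwin" ]; then
  CLASSPATH=$(cygpath -wp "$CLASSPATH")
fi
```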
About Hadoop
Can MapReduce run on HDFS or any other file system? Is HDFS mandatory?
Re: About Hadoop
You don't necessarily need HDFS to run MapReduce, but it's recommended :) On Mon, Nov 25, 2013 at 3:25 PM, RajBasha S rajbash...@ermslive.com wrote: Can MapReduce run on HDFS or any other file system? Is HDFS mandatory? -- Nitin Pawar
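To make the answer concrete, a hedged sketch: pointing the default filesystem at local disk in core-site.xml (as local/standalone mode does) lets MapReduce read and write ordinary local files instead of HDFS:

```xml
<!-- core-site.xml: use the local filesystem instead of HDFS.
     (fs.default.name is the property name of this era; later
     versions call it fs.defaultFS.) -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
```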
Re: Decision Tree Implementation in Hadoop MapReduce
As far as I know, there is no ID3 implementation in Mahout currently, but you can use the decision forest instead: https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example

2013/11/25 unmesha sreeveni unmeshab...@gmail.com: Is that ID3 classification? Does it include prediction also?

On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang yexiji...@gmail.com wrote: You can directly find it at https://github.com/apache/mahout, or you can check it out from svn by following https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control

2013/11/23 unmesha sreeveni unmeshab...@gmail.com: I want to go through the decision tree implementation in Mahout. I referred to Apache Mahout http://mahout.apache.org/ (6 Feb 2012 - Apache Mahout 0.6 released. Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include: improved Decision Tree performance and added support for regression problems.) Where can I find its source code and documentation? Should I download Mahout?

-- *Thanks Regards* Unmesha Sreeveni U.B *Junior Developer*

-- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: Time taken for starting AMRMClientAsync
Hi Krishna, Are you starting all AMs from the same JVM? Mind sharing the code you are using for your time testing? Thx

On Thu, Nov 21, 2013 at 6:11 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Alejandro, I have modified the code in hadoop-2.2.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java to submit multiple application masters one after another, and I am still seeing 800 to 900 ms being taken by the start() call on AMRMClientAsync in all of those applications. Please suggest if you think I am missing something else. Thanks, Kishore

On Tue, Nov 19, 2013 at 6:07 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Alejandro, I don't know what managed and unmanaged AMs are; can you please explain to me what the difference is and how each of them is launched? I tried to google these terms and came across hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar; is it related to that? Thanks, Kishore

On Tue, Nov 19, 2013 at 12:15 AM, Alejandro Abdelnur t...@cloudera.com wrote: Kishore, Also, please specify if you are using managed or unmanaged AMs (the numbers I've mentioned before are using unmanaged AMs). thx

On Sun, Nov 17, 2013 at 11:16 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: It is just creating a connection to the RM and shouldn't take that long. Can you please file a ticket so that we can look at it? JVM class loading overhead is one possibility, but 1 sec is a bit too much. Thanks, +Vinod

On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote: Hi, I am seeing the following call to start() on AMRMClientAsync taking from 0.9 to 1 second. Why does it take that long? Is there a way to reduce it, I mean does it depend on any of the interval parameters or so in the configuration files? I have tried reducing the value of the first argument below (an interval in milliseconds) from 1000 to 100 as well, but that doesn't help.

AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
amRMClient.init(conf);
amRMClient.start();

Thanks, Kishore

-- Alejandro
RE: Heterogeneous Cluster
Yes, I set one up as a test. I had a Windows cluster of 3 machines and added a 4th Linux node. The DataNode was able to connect and replicate, but MR jobs failed: the JobTracker/TaskTracker wasn't translating the path to the data block. They were telling the Linux node to look in C:\ for the data block, and that obviously didn't work. I feel like it's only a small change away from working, but at that point I had to shift my focus and couldn't work on it any more. Have you had any success? Andrew

Andrew Machtolff / Senior Consultant www.askcts.com amachto...@askcts.com

From: Ian Jackson [mailto:ian_jack...@trilliumsoftware.com] Sent: Friday, November 22, 2013 4:05 PM To: user@hadoop.apache.org Subject: Heterogeneous Cluster

Has anyone set up a heterogeneous cluster, with some Windows nodes and some Linux nodes?
Only one reducer running on canopy generator
Hi all, I have been experiencing memory issues while working with the Mahout canopy algorithm on a big data set on Hadoop. I noticed that only one reducer was running while the other nodes were idle. I was wondering if increasing the number of reduce tasks would ease the memory usage and speed up the procedure. However, I realized that configuring mapred.reduce.tasks on Hadoop has no effect on the canopy reduce tasks; it still runs with only one reducer. Now I'm questioning whether canopy is set up that way, or whether I am not configuring Hadoop correctly.
How can I remote debug application master
Hi, I built a customized application master but have some issues. Is it possible for me to remote debug the application master? Thanks
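One common approach (a sketch, assuming a standard HotSpot JVM and that you control the java command your client builds for the AM container): add JDWP agent options to the AM's launch command, then attach a remote debugger from your IDE to that node and port:

```shell
# Sketch: JDWP flags appended to the AM's java command line.
# suspend=y makes the AM wait until a debugger attaches; port 8000 is
# an arbitrary choice, not a Hadoop default.
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000
```

For a custom AM you would append these flags to the command you build in the ContainerLaunchContext; for the MapReduce AM the same flags can go into yarn.app.mapreduce.am.command-opts. Note the debugger must be able to reach the node the AM container actually lands on.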
How can I see the history log of non-mapreduce job in yarn
I have configured the YARN history server, but it looks like it can only show me the history logs of MapReduce jobs; I still cannot see the logs of non-MapReduce jobs. How can I see the history log of a non-MapReduce job?
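A hedged sketch of what usually makes non-MapReduce application logs retrievable: the history server UI in this era is MapReduce-specific, but with log aggregation enabled in yarn-site.xml the container logs of any finished YARN application can be fetched by application id:

```xml
<!-- yarn-site.xml: aggregate container logs to the distributed
     filesystem when an application finishes. -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```

After that, `yarn logs -applicationId <application id>` prints the aggregated logs for the application, MapReduce or not.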
Re: Time taken for starting AMRMClientAsync
Hi Alejandro, I don't start all the AMs from the same JVM; how can I do that? Also, if doing that would save the time taken to get an AM started, that would be a welcome improvement too. And would it also save the time taken for connecting from the AM to the ResourceManager? Thanks, Kishore

On Tue, Nov 26, 2013 at 3:45 AM, Alejandro Abdelnur t...@cloudera.com wrote: Hi Krishna, Are you starting all AMs from the same JVM? Mind sharing the code you are using for your time testing? Thx
Re: Decision Tree Implementation in Hadoop MapReduce
Ok, thanks Yexi. On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang yexiji...@gmail.com wrote: As far as I know, there is no ID3 implementation in Mahout currently, but you can use the decision forest instead: https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example -- *Thanks Regards* Unmesha Sreeveni U.B *Junior Developer*
Re: Decision Tree Implementation in Hadoop MapReduce
You are welcome :) 2013/11/25 unmesha sreeveni unmeshab...@gmail.com: ok . Thx Yexi -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: Heterogeneous Cluster
I don't think this is a normal setup, and it's not recommended. You can deploy a cluster across IDCs and across different networks, but not across operating systems, at least currently.

On Tue, Nov 26, 2013 at 6:56 AM, Andrew Machtolff amachto...@askcts.com wrote: Yes, I set one up as a test. I had a Windows cluster of 3 machines and added a 4th Linux node. The DataNode was able to connect and replicate, but MR jobs failed; the JobTracker/TaskTracker wasn't translating the path to the data block. They were telling the Linux node to look in C:\ for the data block, and that obviously didn't work. I feel like it's only a small change away from working, but at that point I had to shift my focus and couldn't work on it any more. Have you had any success? Andrew
Re: Time taken for starting AMRMClientAsync
Krishna, Well, it all depends on your use case. In the case of Llama, Llama is a server that hosts multiple unmanaged AMs, thus all AMs run in the same process. Thanks.

On Mon, Nov 25, 2013 at 6:40 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Alejandro, I don't start all the AMs from the same JVM. How can I do that? Also, when I do that, that will save me the time taken to get an AM started, which is also good to see an improvement in. And would this also save me the time taken for connecting from the AM to the Resource Manager? Thanks, Kishore

-- Alejandro
why my terasort job become a local job?
hi, maillist: I ran terasort in my Hadoop cluster and it ran as a local job; I do not know why. Can anyone help? The Hadoop version I use is CDH4.4.

# sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.4.0.jar teragen 1000 /alex/terasort/10G-input
..
13/11/26 11:57:39 INFO mapred.Task: Task:attempt_local321416814_0001_m_00_0 is done. And is in the process of commiting
13/11/26 11:57:39 INFO mapred.LocalJobRunner:
13/11/26 11:57:39 INFO mapred.Task: Task attempt_local321416814_0001_m_00_0 is allowed to commit now
13/11/26 11:57:39 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local321416814_0001_m_00_0' to hdfs://product/alex/terasort/10G-input
13/11/26 11:57:39 INFO mapred.LocalJobRunner:
13/11/26 11:57:39 INFO mapred.Task: Task 'attempt_local321416814_0001_m_00_0' done.
13/11/26 11:57:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local321416814_0001_m_00_0
13/11/26 11:57:39 INFO mapred.LocalJobRunner: Map task executor complete.
13/11/26 11:57:40 INFO mapred.JobClient: map 100% reduce 0%
13/11/26 11:57:40 INFO mapred.JobClient: Job complete: job_local321416814_0001
13/11/26 11:57:40 INFO mapred.JobClient: Counters: 19
Is there any design document for YARN
Hi, I am reading the YARN code, so I am wondering whether there is any design document for YARN. I found the blog posts on Hortonworks very useful, but a more detailed document would be helpful. Thanks
Re: why my terasort job become a local job?
Did you set mapred-site.xml to use the yarn framework, as follows?

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

On Tue, Nov 26, 2013 at 1:27 PM, ch huang justlo...@gmail.com wrote: hi,maillist: i run terasort in my hadoop cluster,and it run as a local job,i do not know why ,anyone can help? i use hadoop version is CDH4.4
Working with Capacity Scheduler
Hi, I am working with the Capacity Scheduler on YARN and I have configured different queues. I can see all the queues in the RM UI. But when I start to run MR jobs with the configured user names (yarn, mapred), I am unable to run the jobs and the jobs are suspended. When I set the default FIFO scheduler again, everything works fine. Can you please help me sort out this issue? The configured properties are given below.

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>production,exploration</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.exploration.queues</name>
  <value>a,b,c</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.production.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.exploration.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.exploration.a.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.exploration.b.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.exploration.c.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.production.acl_submit_applications</name>
  <value>yarn,mapred</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.exploration.a.acl_submit_applications</name>
  <value>userb</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.exploration.b.acl_submit_applications</name>
  <value>userb</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.exploration.c.acl_submit_applications</name>
  <value>userc</value>
</property>
</configuration>

-- *Regards* *Munna*
issue about yarn framework
hi, maillist:
I have a 5-node Hadoop cluster. Today I found a problem: one of the jobs running in the cluster takes up all the containers and all the vcores, so other jobs have to stay in pending status. My questions are:
1. How do I find the total number of containers in the cluster, and what determines that number?
2. How do I find how many vcores one container gets?
3. How do I calculate the number of vcores in a cluster?
4. How can I limit the number of containers allocated to a job?
5. How can I limit the number of vcores allocated to a job (or to a container)?
Thank you!!
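For questions 1-3, a rough sketch of the knobs involved (Hadoop 2.2-era property names; whether `cpu-vcores` is honored in CDH4.4 may differ, so treat this as an assumption): each NodeManager advertises a fixed amount of memory and vcores, and per node the scheduler can grant roughly min(node memory / container memory, node vcores / container vcores) containers; sum over all nodes for the cluster total.

```xml
<!-- Illustrative values for yarn-site.xml, not recommendations. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>   <!-- RAM this NM offers to containers -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>      <!-- vcores this NM offers to containers -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>   <!-- smallest container the RM will grant -->
</property>
```

For questions 4-5: plain YARN of that era has no direct per-job container cap; the usual approach is to put jobs in scheduler queues and bound the queue (e.g. a Capacity Scheduler queue's `maximum-capacity`), while per-task vcore *requests* are set with `mapreduce.map.cpu.vcores` / `mapreduce.reduce.cpu.vcores`.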
Re: why my terasort job become a local job?
yes, I did:

# grep -C 3 framework /etc/hadoop/conf/mapred-site.xml
<configuration>
  <!-- YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

On Tue, Nov 26, 2013 at 1:36 PM, Jeff Zhang jezh...@gopivotal.com wrote:
Did you set the YARN framework in mapred-site.xml, as follows?

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

On Tue, Nov 26, 2013 at 1:27 PM, ch huang justlo...@gmail.com wrote:
hi, maillist:
I run terasort in my Hadoop cluster and it runs as a local job; I do not know why. Can anyone help? The Hadoop version I use is CDH4.4.

# sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.4.0.jar teragen 1000 /alex/terasort/10G-input
..
13/11/26 11:57:39 INFO mapred.Task: Task:attempt_local321416814_0001_m_00_0 is done. And is in the process of commiting
13/11/26 11:57:39 INFO mapred.LocalJobRunner:
13/11/26 11:57:39 INFO mapred.Task: Task attempt_local321416814_0001_m_00_0 is allowed to commit now
13/11/26 11:57:39 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local321416814_0001_m_00_0' to hdfs://product/alex/terasort/10G-input
13/11/26 11:57:39 INFO mapred.LocalJobRunner:
13/11/26 11:57:39 INFO mapred.Task: Task 'attempt_local321416814_0001_m_00_0' done.
13/11/26 11:57:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local321416814_0001_m_00_0
13/11/26 11:57:39 INFO mapred.LocalJobRunner: Map task executor complete.
13/11/26 11:57:40 INFO mapred.JobClient: map 100% reduce 0%
13/11/26 11:57:40 INFO mapred.JobClient: Job complete: job_local321416814_0001
13/11/26 11:57:40 INFO mapred.JobClient: Counters: 19
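One possible explanation (an assumption, not confirmed in this thread): the MR2 client falls back to LocalJobRunner whenever `mapreduce.framework.name` resolves to its default value `local` at submission time, so grepping `/etc/hadoop/conf/mapred-site.xml` proves the file is right but not that the `hadoop` command actually loads it — `HADOOP_CONF_DIR`, or CDH's alternatives pointing at an MR1 client config, can select a different directory. The property that must be visible on the submitting client's classpath is:

```xml
<!-- mapred-site.xml as seen by the *client* that submits the job; if this
     is missing from the effective config, the framework defaults to
     "local" and you get job_local* IDs as in the log above. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```

As a quick test you could also pass the property explicitly, e.g. appending `-Dmapreduce.framework.name=yarn` after `teragen` (assuming the example honors generic options, which the stock TeraGen does as a `Tool`); if that makes the job run on YARN, the client-side config lookup is the culprit.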
issue about yarn scheduler
hi, maillist:
I see in the Apache docs about the YARN scheduler that the Capacity Scheduler became the default scheduler, but what I see in CDH4.4 is that the FIFO scheduler is still the default. Why?
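The default is simply whatever the distribution ships in `yarn.resourcemanager.scheduler.class`, so the Apache default and a vendor default can differ. If my reading of your setup is right, you can select the Capacity Scheduler explicitly in yarn-site.xml:

```xml
<!-- Sketch: overrides the RM's scheduler implementation. -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```

The RM web UI's scheduler page shows which implementation is actually in effect after a restart.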
issue about set the memory for each container
hi, maillist:
I find that each of my containers uses only a 200M heap; how can I resize it?

# ps -ef | grep -i yarnchild
yarn 24333 8210 99 14:09 ? 00:00:05 /usr/java/jdk1.7.0_25/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m -Djava.io.tmpdir=/data/mrlocal/1/yarn/local/usercache/hdfs/appcache/application_1385445543402_0006/container_1385445543402_0006_01_17/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/data/mrlocal/2/yarn/logs/application_1385445543402_0006/container_1385445543402_0006_01_17 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.10.224 59237 attempt_1385445543402_0006_m_15_0 17
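The `-Xmx200m` in that command line comes from the default child JVM options (`mapred.child.java.opts` defaults to `-Xmx200m`). A sketch of the MR2 properties that control it — the values here are only illustrative, and the heap should stay comfortably below the container size:

```xml
<!-- mapred-site.xml; reduce-side equivalents shown alongside. -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx800m</value>   <!-- heap for map task JVMs -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx800m</value>   <!-- heap for reduce task JVMs -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>       <!-- container size granted by YARN -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
```

These can also be set per job on the command line with `-D`, which is handy for testing before changing the cluster-wide files.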