Re: Hadoop Test libraries: Where did they go?

2013-11-25 Thread Jay Vyas
Yup, we figured it out eventually.
The artifacts now use the test-jar directive, which creates a jar file that you
can reference in Maven using the type tag in your dependencies.

However, FYI, I haven't been able to Google the quintessential classes in the
Hadoop test libs (like the fs BaseContractTest) by name, so they are now harder
to find than before.

So I think it's unfortunate that they are not a top-level Maven artifact.

It's misleading, as it's now very easy to assume from looking at Hadoop in Maven
Central that hadoop-test is just an old library that nobody updates anymore.

Just a thought, but maybe hadoop-test could be rejuvenated to point to
hadoop-common somehow?


 On Nov 25, 2013, at 4:52 AM, Steve Loughran ste...@hortonworks.com wrote:
 
 I see a hadoop-common-2.2.0-tests.jar in org.apache.hadoop/hadoop-common;
 SHA1 a9994d261d00295040a402cd2f611a2bac23972a, which resolves in a search
 engine to
 http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.2.0/
 
 It looks like it is now part of the hadoop-common artifacts; you just say
 you want the test bits.
 
 http://maven.apache.org/guides/mini/guide-attached-tests.html
 
 
 
 On 21 November 2013 23:28, Jay Vyas jayunit...@gmail.com wrote:
 
 It appears to me that
 
 http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-test
 
 is no longer updated.
 
 Where does Hadoop now package the test libraries?
 
 Looking in the ./hadoop-common-project/hadoop-common/pom.xml file in
 the Hadoop 2.x branches, I'm not sure whether or not src/test is packaged into
 a jar anymore... but I fear it is not.
 
 --
 Jay Vyas
 http://jayunit100.blogspot.com
 


Relationship between heap sizes and mapred.child.java.opts configuration

2013-11-25 Thread Chih-Hsien Wu
I'm learning about Hadoop configuration. What is the connection between the
DataNode/TaskTracker heap sizes and mapred.child.java.opts? Does one
have to exceed the other?


Re: Relationship between heap sizes and mapred.child.java.opts configuration

2013-11-25 Thread Kai Voigt
mapred.child.java.opts refers to the settings for the JVMs spawned by
the TaskTracker. These JVMs actually run the tasks (mappers and reducers).

The heap sizes for TaskTrackers and DataNodes are unrelated to those. Each of
them runs in its own JVM.

Kai

Kai Voigt   Am Germaniahafen 1  
k...@123.org
24143 Kiel  
+49 160 96683050
Germany 
@KaiVoigt
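Because each daemon and every task runs in its own JVM, their heaps add up on a worker node. A rough, back-of-the-envelope sketch, assuming an MRv1 TaskTracker whose slot counts come from mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum; all numbers are hypothetical, and the sum ignores JVM overhead beyond -Xmx. It also shows why shrinking the daemon heaps does not by itself give tasks more memory: the task heap is set separately by mapred.child.java.opts, while a daemon heap that is too small risks an out-of-memory error in that daemon.

```java
public class NodeMemorySketch {
    /**
     * Peak heap on one MRv1 worker if every slot is busy: the DataNode and
     * TaskTracker heaps plus one task JVM heap (the -Xmx in
     * mapred.child.java.opts) per map or reduce slot. JVM overhead beyond -Xmx
     * and the OS itself are not counted.
     */
    static int totalHeapMb(int dataNodeMb, int taskTrackerMb,
                           int mapSlots, int reduceSlots, int childHeapMb) {
        int taskHeaps = (mapSlots + reduceSlots) * childHeapMb;
        return dataNodeMb + taskTrackerMb + taskHeaps;
    }

    public static void main(String[] args) {
        // Hypothetical node: 1 GB per daemon, 6 map + 2 reduce slots, -Xmx512m per task.
        int total = totalHeapMb(1024, 1024, 6, 2, 512);
        System.out.println("Peak heap with all slots busy: " + total + " MB"); // 6144 MB
    }
}
```

Raising the child heap or the slot counts grows the last term, so the total has to stay below the node's physical RAM.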



Re: Relationship between heap sizes and mapred.child.java.opts configuration

2013-11-25 Thread Chih-Hsien Wu
Thanks for the reply. So what is the purpose of the heap sizes for TaskTrackers
and DataNodes, then? In other words, if I want to speed up the map/reduce
cycle, can I just minimize the heap size and maximize
mapred.child.java.opts? Or will minimizing the heap sizes cause out-of-memory
exceptions?






Map/Reduce/Driver jar(s) organization

2013-11-25 Thread John Conwell
I'm curious what some best practices are for structuring jars for a
business framework that uses Map/Reduce. Note: this assumes you aren't
invoking MR manually via the command line, but have Hadoop integrated into a
larger business framework that invokes MR jobs programmatically.

By business framework I mean an architecture that includes a services
component (REST, app server, whatever), business domain logic, and Hadoop
MR jobs, etc.

Here are some common code artifacts in such an architecture:
* Map/Reduce classes
* Hadoop Driver classes that configure the MR job and invoke them
* Biz Domain classes that invoke the Hadoop driver classes, within the
context of some business process
* Services classes that interface between user-calls/system-events and biz
domain logic

Are most people creating monolithic jars that have all classes for all
layers? Or separating all Hadoop-related classes from domain-level classes?
Are you putting the MR classes in the same jar as the Hadoop driver
classes, or in separate jars?

Thanks,
Turbo

-- 

Thanks,
John C


Errors running Hadoop 2.2.0 on Cygwin

2013-11-25 Thread Srinivas Chamarthi
I get the following error while running 2.2.0 under Cygwin. Can anyone
help with the problem?

/cygdrive/c/hadoop-2.2.0/bin
$ ./hdfs namenode -format
java.lang.NoClassDefFoundError:
org/apache/hadoop/hdfs/server/namenode/NameNode
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hdfs.server.namenode.NameNode
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class:
org.apache.hadoop.hdfs.server.namenode.NameNode.  Program will exit.
Exception in thread main


Re: Errors running Hadoop 2.2.0 on Cygwin

2013-11-25 Thread Ted Yu
Can you show us the classpath?

Cheers



Re: Errors running Hadoop 2.2.0 on Cygwin

2013-11-25 Thread Srinivas Chamarthi
I added echo $CLASSPATH in libexec/hadoop-config.sh, and here is what it
contains:


C:\hadoop-2.2.0\etc\hadoop;C:\hadoop-2.2.0\share\hadoop\common\lib\*;C:\hadoop-2.2.0\share\hadoop\common\*:/cygdrive/c/hadoop-2.2.0/share/hadoop/hdfs:/cygdrive/c/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/hdfs/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/yarn/lib/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/yarn/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/cygdrive/c/hadoop-2.2.0/share/hadoop/mapreduce/*:/cygdrive/c/hadoop-2.2.0//contrib/capacity-scheduler/*.jar

I can clearly see Windows paths in the classpath. I think that is the reason
for the issue, but I haven't explicitly specified any Windows-based
paths.

This is what is in my ~/.bashrc file:

export HADOOP_HOME=/cygdrive/c/hadoop-2.2.0/
export HADOOP_MAPRED_HOME=/cygdrive/c/hadoop-2.2.0
export HADOOP_COMMON_HOME=/cygdrive/c/hadoop-2.2.0
export HADOOP_HDFS_HOME=/cygdrive/c/hadoop-2.2.0
export YARN_HOME=/cygdrive/c/hadoop-2.2.0
export HADOOP_CONFIG_DIRECTORY=/cygdrive/c/hadoop-2.2.0/etc/hadoop
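One plausible reading of the classpath above: a native Windows JVM splits the classpath only on `;`, so the `common\*` entry and every /cygdrive path joined to it with `:` collapse into a single bogus entry, and the hdfs jars that contain NameNode are never seen. A minimal sketch of normalizing such a string, assuming a simplified mapping of /cygdrive/<drive>/... to <DRIVE>:\... (Cygwin's own `cygpath -w` handles many more cases); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CygwinClasspathSketch {
    // A Cygwin-style absolute path such as /cygdrive/c/hadoop-2.2.0/share/...
    private static final Pattern CYGDRIVE = Pattern.compile("^/cygdrive/([a-zA-Z])(/.*)?$");

    /** Converts /cygdrive/c/foo/bar to C:\foo\bar; other entries are returned unchanged. */
    static String toWindowsPath(String entry) {
        Matcher m = CYGDRIVE.matcher(entry);
        if (!m.matches()) {
            return entry;
        }
        String drive = m.group(1).toUpperCase();
        String rest = m.group(2) == null ? "" : m.group(2);
        return drive + ":" + rest.replace('/', '\\');
    }

    /** Splits on ';' and also on any ':' that starts a POSIX path (the "C:\" colon is kept). */
    static List<String> splitMixed(String classpath) {
        List<String> entries = new ArrayList<>();
        for (String part : classpath.split(";")) {
            for (String entry : part.split(":(?=/)")) {
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    /** Rebuilds the classpath as ';'-separated Windows paths. */
    static String normalize(String classpath) {
        List<String> converted = new ArrayList<>();
        for (String entry : splitMixed(classpath)) {
            converted.add(toWindowsPath(entry));
        }
        return String.join(";", converted);
    }

    public static void main(String[] args) {
        String mixed = "C:\\hadoop-2.2.0\\etc\\hadoop;"
                + "C:\\hadoop-2.2.0\\share\\hadoop\\common\\*"
                + ":/cygdrive/c/hadoop-2.2.0/share/hadoop/hdfs/*";
        // Split the way a Windows JVM would: the hdfs path stays glued to the common entry.
        System.out.println(mixed.split(";").length); // 2
        System.out.println(normalize(mixed));
    }
}
```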




About Hadoop

2013-11-25 Thread RajBasha S
Can MapReduce run on HDFS or on any other file system? Is HDFS mandatory?


Re: About Hadoop

2013-11-25 Thread Nitin Pawar
You don't necessarily have to have HDFS to run MapReduce.

But it's recommended :)





-- 
Nitin Pawar
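To make the point above concrete: MapReduce reads and writes through Hadoop's FileSystem abstraction, so the URI scheme of a path, or of the default file system (fs.defaultFS, which is file:/// out of the box in Hadoop 2.x), decides which storage is used. A simplified sketch of that resolution rule using only java.net.URI; it is not Hadoop's Path code, and the host nn:8020 and the bucket name are made up.

```java
import java.net.URI;

public class FsSchemeSketch {
    /**
     * Returns the scheme that decides which file system a path lives on:
     * the path's own scheme if it has one, otherwise the default file system's.
     */
    static String resolveScheme(String path, String defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri.getScheme();
        }
        return URI.create(defaultFs).getScheme();
    }

    public static void main(String[] args) {
        System.out.println(resolveScheme("/user/data/input", "file:///"));         // file
        System.out.println(resolveScheme("/user/data/input", "hdfs://nn:8020"));   // hdfs
        System.out.println(resolveScheme("s3n://bucket/input", "hdfs://nn:8020")); // s3n
    }
}
```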


Re: Decision Tree Implementation in Hadoop MapReduce

2013-11-25 Thread Yexi Jiang
As far as I know, there is currently no ID3 implementation in Mahout, but
you can use the decision forest instead:
https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example


2013/11/25 unmesha sreeveni unmeshab...@gmail.com

 Is that ID3 classification?
 Does it also include prediction?


 On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang yexiji...@gmail.com wrote:

 You can directly find it at https://github.com/apache/mahout, or you can
 check out from svn by following
 https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.


 2013/11/23 unmesha sreeveni unmeshab...@gmail.com

  I want to go through the decision tree implementation in Mahout. I referred to
 Apache Mahout http://mahout.apache.org/

 6 Feb 2012 - Apache Mahout 0.6 released
 Apache Mahout has reached version 0.6. All developers are encouraged to 
 begin using version 0.6. Highlights include:
 Improved Decision Tree performance and added support for regression problems

 Where can I find its source code and documentation?

 Should I download Mahout?

 --
 *Thanks & Regards*

 Unmesha Sreeveni U.B

 *Junior Developer*





 --
 --
 Yexi Jiang,
 ECS 251,  yjian...@cs.fiu.edu
 School of Computer and Information Science,
 Florida International University
 Homepage: http://users.cis.fiu.edu/~yjian004/




 --
 *Thanks & Regards*

 Unmesha Sreeveni U.B

 *Junior Developer*





-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/
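For reference, the split criterion at the heart of ID3 is information gain: the entropy of the parent node minus the size-weighted entropy of its child partitions. A minimal Java sketch of that calculation, not Mahout code; the class counts in `main` are taken from the textbook play-tennis example (9 positive and 5 negative rows, split by outlook into sunny, overcast, and rain).

```java
public class Id3GainSketch {
    /** Shannon entropy (base 2) of a class-count histogram. */
    static double entropy(int... classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue; // 0 * log(0) is taken as 0
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Information gain of splitting `parent` into the given child partitions. */
    static double informationGain(int[] parent, int[][] children) {
        int total = 0;
        for (int c : parent) total += c;
        double weighted = 0.0;
        for (int[] child : children) {
            int n = 0;
            for (int c : child) n += c;
            weighted += ((double) n / total) * entropy(child);
        }
        return entropy(parent) - weighted;
    }

    public static void main(String[] args) {
        int[] parent = {9, 5};
        int[][] byOutlook = {{2, 3}, {4, 0}, {3, 2}};
        System.out.printf("H(parent) = %.3f%n", entropy(parent));                       // 0.940
        System.out.printf("Gain(outlook) = %.3f%n", informationGain(parent, byOutlook)); // 0.247
    }
}
```

ID3 computes this gain for every candidate attribute and splits on the largest; a MapReduce version would mainly distribute the counting of the class histograms.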


Re: Time taken for starting AMRMClientAsync

2013-11-25 Thread Alejandro Abdelnur
Hi Krishna,

Are you starting all AMs from the same JVM? Mind sharing the code you are
using for your time testing?

Thx


On Thu, Nov 21, 2013 at 6:11 AM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 Hi Alejandro,

  I have modified the code in


 hadoop-2.2.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java

 to submit multiple application masters one after another, and I'm still seeing
 800 to 900 ms being taken by the start() call on AMRMClientAsync in all
 of those applications.

 Please let me know if you think I am missing something else.

 Thanks,
 Kishore


 On Tue, Nov 19, 2013 at 6:07 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Alejandro,

   I don't know what managed and unmanaged AMs are. Can you please explain
 the difference and how each of them is launched?

  I tried to Google these terms and came
 across hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar. Is it
 related to that?

 Thanks,
 Kishore


 On Tue, Nov 19, 2013 at 12:15 AM, Alejandro Abdelnur 
 t...@cloudera.com wrote:

 Kishore,

 Also, please specify if you are using managed or unmanaged AMs (the
 numbers I've mentioned before are using unmanaged AMs).

 thx


 On Sun, Nov 17, 2013 at 11:16 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 It is just creating a connection to RM and shouldn't take that long.
 Can you please file a ticket so that we can look at it?

 JVM class loading overhead is one possibility but 1 sec is a bit too
 much.

  Thanks,
 +Vinod

 On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote:

 Hi,
   I am seeing the following call to start() on AMRMClientAsync take
 from 0.9 to 1 second. Why does it take that long? Is there a way to reduce
 it? For example, does it depend on any of the interval parameters in the
 configuration files? I have also tried reducing the value of the first argument
 below from 1000 to 100 milliseconds, but that doesn't help.

 AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
 // First argument: interval (in milliseconds) between heartbeats to the RM
 amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
 amRMClient.init(conf);
 amRMClient.start();


 Thanks,
 Kishore







 --
 Alejandro






-- 
Alejandro
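A minimal timing sketch in the spirit of the question above: it runs the same action several times in one JVM and records each run. If only the first run is slow, JVM class loading and warm-up (Vinod's suggestion) are the likely cost; if every run is slow, the time is going somewhere else, such as the RM connection. The `startClient` action is a hypothetical stand-in for the createAMRMClientAsync/init/start sequence quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupTimingSketch {
    /** Runs `action` `runs` times in this JVM and returns each run's wall-clock time in ms. */
    static List<Long> timeRuns(Runnable action, int runs) {
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            action.run();
            millis.add((System.nanoTime() - t0) / 1_000_000L);
        }
        return millis;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: in a real test this would create, init() and start()
        // a fresh AMRMClientAsync; here it just does some throwaway work.
        Runnable startClient = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) sb.append(i);
        };
        List<Long> times = timeRuns(startClient, 5);
        System.out.println("first run (ms): " + times.get(0));
        System.out.println("later runs (ms): " + times.subList(1, times.size()));
    }
}
```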


RE: Heterogeneous Cluster

2013-11-25 Thread Andrew Machtolff
Yes, I set one up as a test. I had a Windows cluster of 3 machines, and added a
4th, Linux node. The DataNode was able to connect and replicate, but MR jobs
failed: the JobTracker/TaskTracker wasn't translating the path to the data block.
They were telling the Linux node to look in C:\ for the data block, and
that obviously didn't work.

I feel like it's only a small change away from working, but at that point I had
to shift my focus and couldn't work on it any more.

Have you had any success?

Andrew


Andrew Machtolff / Senior Consultant


From: Ian Jackson [mailto:ian_jack...@trilliumsoftware.com]
Sent: Friday, November 22, 2013 4:05 PM
To: user@hadoop.apache.org
Subject: Heterogeneous Cluster

Has anyone set up a Heterogeneous cluster, some Windows nodes and Linux nodes?

Only one reducer running on canopy generator

2013-11-25 Thread Chih-Hsien Wu
Hi all, I have been experiencing memory issues while running the Mahout
canopy algorithm on a big data set on Hadoop. I noticed that only one
reducer was running while the other nodes were idle. I was wondering whether
increasing the number of reduce tasks would reduce the memory usage and
speed up the procedure. However, I found that setting
mapred.reduce.tasks on Hadoop has no effect on the canopy reduce tasks; it's
still running with only one reducer. Now I'm wondering whether canopy is designed
that way, or whether I am not configuring Hadoop correctly.


How can I remote debug application master

2013-11-25 Thread Jeff Zhang
Hi,

I built a customized application master but have some issues. Is it
possible for me to remote-debug the application master? Thanks
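One common approach, sketched here under assumptions: if your client assembles the AM's java command line itself (as the distributed-shell example's client does), add the standard JDWP agent option to it and attach a remote debugger to that port on the NodeManager host where the AM container runs. The class name, heap size, and port below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class AmDebugCommandSketch {
    /**
     * Builds the java command for an AM container. When debugPort is non-null,
     * a JDWP agent is added so a debugger can attach to that port.
     */
    static List<String> buildAmCommand(String mainClass, int heapMb, Integer debugPort) {
        List<String> cmd = new ArrayList<>();
        cmd.add("$JAVA_HOME/bin/java");
        cmd.add("-Xmx" + heapMb + "m");
        if (debugPort != null) {
            // suspend=y makes the AM JVM wait until a debugger attaches.
            cmd.add("-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=" + debugPort);
        }
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildAmCommand("com.example.MyApplicationMaster", 512, 8000);
        System.out.println(String.join(" ", cmd));
    }
}
```

With suspend=y the AM does nothing until a debugger attaches, so the attempt may eventually time out if nobody connects; suspend=n lets it start immediately and you attach later.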


How can I see the history log of non-mapreduce job in yarn

2013-11-25 Thread Jeff Zhang
I have configured the YARN history server, but it looks like it can only
show me the history logs of MapReduce jobs. I still cannot see the
logs of non-MapReduce jobs. How can I see the history logs of a non-MapReduce
job?


Re: Time taken for starting AMRMClientAsync

2013-11-25 Thread Krishna Kishore Bonagiri
Hi Alejandro,

  I don't start all the AMs from the same JVM. How can I do that? Doing
so would also save the time taken to get each AM started, which would be a
welcome improvement too. Please let me know how I can do that.
And would this also save the time taken to connect from the AM to the
Resource Manager?

Thanks,
Kishore







Re: Decision Tree Implementation in Hadoop MapReduce

2013-11-25 Thread unmesha sreeveni
OK. Thanks, Yexi.






-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*


Re: Decision Tree Implementation in Hadoop MapReduce

2013-11-25 Thread Yexi Jiang
You are welcome :)







-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


Re: Heterogeneous Cluster

2013-11-25 Thread Azuryy Yu
I don't think this is a normal setup, and it's not recommended. We can deploy
a cluster across IDCs and across different networks, but not across operating
systems, at least not currently.



Re: Time taken for starting AMRMClientAsync

2013-11-25 Thread Alejandro Abdelnur
Krishna,

Well, it all depends on your use case. Llama, for example, is a
server that hosts multiple unmanaged AMs, so all the AMs run in the same
process.

Thanks.







-- 
Alejandro
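A toy illustration of why hosting several AMs in one process helps, as with Llama above: per-JVM costs such as class loading are paid once, and every further AM in the same process reuses them. This is not Llama's code; `ClientLibrary` and `AmHandle` are hypothetical stand-ins, and a static initializer plays the role of the one-time setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedProcessAmSketch {
    /** Counts how many times the one-time, per-JVM setup ran. */
    static final AtomicInteger ONE_TIME_SETUPS = new AtomicInteger();

    /** Hypothetical stand-in for per-JVM costs such as loading client classes. */
    static final class ClientLibrary {
        static {
            ONE_TIME_SETUPS.incrementAndGet(); // runs once, on first use in this JVM
        }

        static void touch() {
        }
    }

    /** Hypothetical handle for one unmanaged AM hosted by this process. */
    static final class AmHandle {
        final String appName;

        AmHandle(String appName) {
            ClientLibrary.touch(); // only the first AM triggers class initialization
            this.appName = appName;
        }
    }

    /** Creates `count` AM handles inside this single JVM. */
    static List<AmHandle> hostAms(int count) {
        List<AmHandle> ams = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ams.add(new AmHandle("app-" + i));
        }
        return ams;
    }

    public static void main(String[] args) {
        List<AmHandle> ams = hostAms(3);
        System.out.println(ams.size() + " AMs, one-time setups: " + ONE_TIME_SETUPS.get()); // 3 AMs, one-time setups: 1
    }
}
```

Per-AM work, such as each AM's own connection to the RM, is still paid once per AM; only the shared JVM costs are amortized.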


Why did my terasort job become a local job?

2013-11-25 Thread ch huang
Hi, list:
  I ran terasort on my Hadoop cluster and it ran as a local job. I
do not know why; can anyone help?

The Hadoop version I use is CDH4.4.

# sudo -u hdfs hadoop jar
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.4.0.jar
teragen 1000 /alex/terasort/10G-input
..

13/11/26 11:57:39 INFO mapred.Task:
Task:attempt_local321416814_0001_m_00_0 is done. And is in the process
of commiting
13/11/26 11:57:39 INFO mapred.LocalJobRunner:
13/11/26 11:57:39 INFO mapred.Task: Task
attempt_local321416814_0001_m_00_0 is allowed to commit now
13/11/26 11:57:39 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_local321416814_0001_m_00_0' to
hdfs://product/alex/terasort/10G-input
13/11/26 11:57:39 INFO mapred.LocalJobRunner:
13/11/26 11:57:39 INFO mapred.Task: Task
'attempt_local321416814_0001_m_00_0' done.
13/11/26 11:57:39 INFO mapred.LocalJobRunner: Finishing task:
attempt_local321416814_0001_m_00_0
13/11/26 11:57:39 INFO mapred.LocalJobRunner: Map task executor complete.
13/11/26 11:57:40 INFO mapred.JobClient:  map 100% reduce 0%
13/11/26 11:57:40 INFO mapred.JobClient: Job complete:
job_local321416814_0001
13/11/26 11:57:40 INFO mapred.JobClient: Counters: 19


Is there any design document for YARN

2013-11-25 Thread Jeff Zhang
Hi ,

I am reading the YARN code, so I'm wondering whether there's any design
document for YARN. I found the blog post on Hortonworks very useful,
but a more detailed document would be helpful. Thanks


Re: why my terasort job become a local job?

2013-11-25 Thread Jeff Zhang
Have you configured mapred-site.xml to use the YARN framework, as follows?

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
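The job IDs in the log (job_local..., attempt_local...) show it ran in the LocalJobRunner. A minimal Java sketch for checking what a mapred-site.xml actually sets for this key, using only the JDK's DOM parser; it assumes, as in stock Hadoop 2.x, that an unset key falls back to `local`, and it ignores any other configuration files Hadoop would merge in.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FrameworkNameCheck {
    static final String KEY = "mapreduce.framework.name";

    /** Returns the framework name set in a *-site.xml string, or "local" if the key is absent. */
    static String frameworkName(String siteXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable *-site.xml document", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && KEY.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return "local"; // assumed default when the key is not set
    }

    public static void main(String[] args) {
        String withYarn = "<configuration><property><name>mapreduce.framework.name</name>"
                + "<value>yarn</value></property></configuration>";
        System.out.println(frameworkName(withYarn));                           // yarn
        System.out.println(frameworkName("<configuration></configuration>")); // local
    }
}
```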


On Tue, Nov 26, 2013 at 1:27 PM, ch huang justlo...@gmail.com wrote:

 hi,maillist:
   i run terasort in my hadoop cluster,and it run as a local job,i
 do not know why ,anyone can help?

 i use hadoop version is CDH4.4

 # sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.4.0.jar teragen 1000 /alex/terasort/10G-input
 ..

 13/11/26 11:57:39 INFO mapred.Task: Task:attempt_local321416814_0001_m_00_0 is done. And is in the process of commiting
 13/11/26 11:57:39 INFO mapred.LocalJobRunner:
 13/11/26 11:57:39 INFO mapred.Task: Task attempt_local321416814_0001_m_00_0 is allowed to commit now
 13/11/26 11:57:39 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local321416814_0001_m_00_0' to hdfs://product/alex/terasort/10G-input
 13/11/26 11:57:39 INFO mapred.LocalJobRunner:
 13/11/26 11:57:39 INFO mapred.Task: Task 'attempt_local321416814_0001_m_00_0' done.
 13/11/26 11:57:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local321416814_0001_m_00_0
 13/11/26 11:57:39 INFO mapred.LocalJobRunner: Map task executor complete.
 13/11/26 11:57:40 INFO mapred.JobClient:  map 100% reduce 0%
 13/11/26 11:57:40 INFO mapred.JobClient: Job complete: job_local321416814_0001
 13/11/26 11:57:40 INFO mapred.JobClient: Counters: 19



Working with Capacity Scheduler

2013-11-25 Thread Munna
Hi,

I am working with the Capacity Scheduler on YARN and I have configured different
queues. I can see all the queues in the RM UI. But when I run MR jobs as the
configured users (yarn, mapred), the jobs do not run and stay suspended. When I
set the scheduler back to the default FIFO scheduler, everything works fine.

Can you please help me sort out this issue? My configuration is given below.

<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>production,exploration</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.exploration.queues</name>
    <value>a,b,c</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.production.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.exploration.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.exploration.a.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.exploration.b.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.exploration.c.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.production.acl_submit_applications</name>
    <value>yarn,mapred</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.exploration.a.acl_submit_applications</name>
    <value>userb</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.exploration.b.acl_submit_applications</name>
    <value>userb</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.exploration.c.acl_submit_applications</name>
    <value>userc</value>
  </property>
</configuration>
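Two rules, as we understand the Capacity Scheduler documentation, can be checked mechanically against a configuration like this one: the capacities of sibling queues at each level must add up to 100, and a user may submit to a queue if named in the acl_submit_applications of that queue or of any parent, with root defaulting to * (anyone) when unset. A minimal Python sketch under those assumptions, taking the properties as a plain dict; the helper names are hypothetical, group names (the part of an ACL after the space) are ignored, and this is not the scheduler's own code. Separately, MapReduce jobs go to the queue named by mapreduce.job.queuename, which defaults to default, a queue this configuration does not define.

```python
from typing import Dict, List

PREFIX = "yarn.scheduler.capacity."


def child_queues(conf: Dict[str, str], queue: str) -> List[str]:
    """Child queue names of a queue path such as 'root' or 'root.exploration'."""
    raw = conf.get(f"{PREFIX}{queue}.queues", "")
    return [q.strip() for q in raw.split(",") if q.strip()]


def capacity_errors(conf: Dict[str, str], queue: str = "root") -> List[str]:
    """List parent queues whose children's capacities do not add up to 100."""
    errors: List[str] = []
    kids = child_queues(conf, queue)
    if kids:
        total = sum(float(conf.get(f"{PREFIX}{queue}.{k}.capacity", "0")) for k in kids)
        if abs(total - 100.0) > 1e-6:
            errors.append(f"{queue}: children sum to {total:g}, expected 100")
        for k in kids:
            errors.extend(capacity_errors(conf, f"{queue}.{k}"))
    return errors


def can_submit(conf: Dict[str, str], queue_path: str, user: str) -> bool:
    """True if user is named in the submit ACL of queue_path or any ancestor queue.

    Simplification: only the user list (text before the first space) is checked.
    """
    parts = queue_path.split(".")
    for depth in range(len(parts), 0, -1):
        path = ".".join(parts[:depth])
        # Assumed defaults: unset root ACL means "*" (anyone); other levels add nobody.
        default = "*" if path == "root" else " "
        acl = conf.get(f"{PREFIX}{path}.acl_submit_applications", default)
        if acl.strip() == "*":
            return True
        users_part = acl.split(" ", 1)[0]
        if user in {u.strip() for u in users_part.split(",") if u.strip()}:
            return True
    return False


if __name__ == "__main__":
    # A subset of the properties posted above.
    conf = {
        PREFIX + "root.queues": "production,exploration",
        PREFIX + "root.exploration.queues": "a,b,c",
        PREFIX + "root.production.capacity": "70",
        PREFIX + "root.exploration.capacity": "30",
        PREFIX + "root.exploration.a.capacity": "30",
        PREFIX + "root.exploration.b.capacity": "30",
        PREFIX + "root.exploration.c.capacity": "40",
        PREFIX + "root.production.acl_submit_applications": "yarn,mapred",
    }
    print(capacity_errors(conf))  # []
    print(can_submit(conf, "root.production", "userc"))  # True, via root's default "*"
```

With root's ACL left unset, this sketch lets any user submit to any queue, so the per-queue ACLs above would not restrict anyone; the documentation describes a single space as the ACL value meaning "no one", which is how a root-level lockdown would be expressed.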


-- 
Regards,
Munna


issue about yarn framework

2013-11-25 Thread ch huang
hi,maillist:
  I have a 5-node Hadoop cluster. Today I found a problem: one of the jobs
running in the cluster took up all the containers and all the vcores, so other
jobs had to stay in pending status. My questions are:
1. How do I find the total number of containers in the cluster, and what
determines that number?
2. How do I find how many vcores one container can get?
3. How do I calculate the number of vcores in a cluster?
4. How can I limit the number of containers allocated to a job?
5. How can I limit the number of vcores allocated to a job (or a
container)?

Thank you!
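For questions 1 to 3, a rough estimate can be derived from the NodeManager settings yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, with each memory request rounded up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 by default). A minimal Python sketch, assuming identical nodes and a scheduler that accounts for both memory and vcores; with the default memory-only resource calculator, pass count_vcores=False. The example numbers are hypothetical, not taken from this cluster:

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodeResources:
    memory_mb: int  # yarn.nodemanager.resource.memory-mb
    vcores: int     # yarn.nodemanager.resource.cpu-vcores


def normalize_memory(request_mb: int, min_alloc_mb: int = 1024) -> int:
    """Round a memory request up to a multiple of yarn.scheduler.minimum-allocation-mb."""
    return max(min_alloc_mb, math.ceil(request_mb / min_alloc_mb) * min_alloc_mb)


def containers_per_node(node: NodeResources, request_mb: int, request_vcores: int = 1,
                        min_alloc_mb: int = 1024, count_vcores: bool = True) -> int:
    """How many identical containers fit on one node."""
    by_memory = node.memory_mb // normalize_memory(request_mb, min_alloc_mb)
    if not count_vcores:
        return by_memory  # memory-only accounting
    return min(by_memory, node.vcores // request_vcores)


def cluster_totals(num_nodes: int, node: NodeResources, request_mb: int,
                   request_vcores: int = 1, min_alloc_mb: int = 1024,
                   count_vcores: bool = True) -> Tuple[int, int]:
    """Return (max concurrent containers, total vcores) for identical nodes."""
    per_node = containers_per_node(node, request_mb, request_vcores, min_alloc_mb, count_vcores)
    return num_nodes * per_node, num_nodes * node.vcores


if __name__ == "__main__":
    # Hypothetical: 5 nodes with 8192 MB and 8 vcores each, 1500 MB / 1 vcore tasks.
    node = NodeResources(memory_mb=8192, vcores=8)
    print(normalize_memory(1500))         # 2048
    print(cluster_totals(5, node, 1500))  # (20, 40)
```

For questions 4 and 5, the per-task container size is set with mapreduce.map.memory.mb / mapreduce.reduce.memory.mb and mapreduce.map.cpu.vcores / mapreduce.reduce.cpu.vcores. As far as we know there is no single per-job container cap in this Hadoop version, so bounding a job's total share is usually done by submitting it to a Capacity Scheduler queue whose maximum-capacity is limited.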


Re: why my terasort job become a local job?

2013-11-25 Thread ch huang
Yes, I did.

# grep -C 3 framework /etc/hadoop/conf/mapred-site.xml
<configuration>
<!-- YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>



On Tue, Nov 26, 2013 at 1:36 PM, Jeff Zhang jezh...@gopivotal.com wrote:

 Did you configure MapReduce to use the YARN framework in mapred-site.xml, as follows?

   <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
   </property>



issue about yarn scheduler

2013-11-25 Thread ch huang
hi,maillist:
 I read in the Apache YARN documentation that the Capacity Scheduler became the
default scheduler, but in CDH4.4 the FIFO scheduler is still the default. Why?
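The scheduler in use is whatever yarn.resourcemanager.scheduler.class resolves to on the ResourceManager: the value in yarn-site.xml if it is set, otherwise the default shipped in that distribution's yarn-default.xml, so an Apache release and a CDH release can legitimately differ. A small Python sketch, with a hypothetical helper name, that reports which scheduler a given yarn-site.xml selects explicitly:

```python
import xml.etree.ElementTree as ET
from typing import Optional

SCHEDULERS = {
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler": "capacity",
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler": "fifo",
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler": "fair",
}


def configured_scheduler(yarn_site_xml: str) -> Optional[str]:
    """Return 'capacity', 'fifo', 'fair', the raw class name, or None if the property is unset."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "yarn.resourcemanager.scheduler.class":
            cls = prop.findtext("value", default="").strip()
            return SCHEDULERS.get(cls, cls)
    return None  # unset: the distribution's yarn-default.xml decides
```

If it returns None, the answer depends on the packaged defaults; setting the property to the CapacityScheduler class in yarn-site.xml and restarting the ResourceManager should switch it, provided capacity-scheduler.xml defines the queues.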


issue about set the memory for each container

2013-11-25 Thread ch huang
hi,maillist:
   I found that each of my containers uses only 200 MB of heap space. How can I
resize it?

# ps -ef|grep -i yarnchild
yarn 24333  8210 99 14:09 ?        00:00:05
/usr/java/jdk1.7.0_25/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx200m
-Djava.io.tmpdir=/data/mrlocal/1/yarn/local/usercache/hdfs/appcache/application_1385445543402_0006/container_1385445543402_0006_01_17/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.mapreduce.container.log.dir=/data/mrlocal/2/yarn/logs/application_1385445543402_0006/container_1385445543402_0006_01_17
-Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 192.168.10.224 59237
attempt_1385445543402_0006_m_15_0 17
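The -Xmx200m here matches the default of mapred.child.java.opts in Hadoop 2.x. In MRv2 the container size and the JVM heap are configured separately: mapreduce.map.memory.mb / mapreduce.reduce.memory.mb request the container, and mapreduce.map.java.opts / mapreduce.reduce.java.opts set the task JVM options, so both usually need raising together. A minimal Python sketch that reads the heap from a ps line and derives matching -D options; the helper names are hypothetical, and the 0.8 heap-to-container ratio is an assumed rule of thumb, not a Hadoop default:

```python
import re
from typing import Dict, Optional

XMX_RE = re.compile(r"-Xmx(\d+)([kKmMgG]?)")


def heap_mb(command_line: str) -> Optional[int]:
    """Return the -Xmx heap size in MB from a java command line (last flag wins), or None."""
    matches = XMX_RE.findall(command_line)
    if not matches:
        return None
    size_str, unit = matches[-1]
    size, unit = int(size_str), unit.lower()
    if unit == "g":
        return size * 1024
    if unit == "m":
        return size
    if unit == "k":
        return size // 1024
    return size // (1024 * 1024)  # no suffix: bytes


def task_memory_options(container_mb: int, heap_ratio: float = 0.8) -> Dict[str, str]:
    """Container size plus a heap that leaves headroom for non-heap JVM memory (assumed ratio)."""
    heap = int(container_mb * heap_ratio)
    return {
        "mapreduce.map.memory.mb": str(container_mb),
        "mapreduce.reduce.memory.mb": str(container_mb),
        "mapreduce.map.java.opts": f"-Xmx{heap}m",
        "mapreduce.reduce.java.opts": f"-Xmx{heap}m",
    }


def as_cli_flags(options: Dict[str, str]) -> str:
    """Render options as generic -D flags for programs run through ToolRunner."""
    return " ".join(f"-D{key}={value}" for key, value in options.items())


if __name__ == "__main__":
    print(heap_mb("/usr/java/jdk1.7.0_25/bin/java -Xmx200m org.apache.hadoop.mapred.YarnChild"))  # 200
    print(as_cli_flags(task_memory_options(2048)))
```

Placed right after the program name in a hadoop jar command (for example after teragen), these flags should give each task a 2048 MB container with roughly a 1.6 GB heap, assuming 2048 MB fits under yarn.scheduler.maximum-allocation-mb.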