Re: Adding extra commands to MR AM
You mean to the JVM of the MRAppMaster?

yarn.app.mapreduce.am.command-opts (default -Xmx1024m): Java opts for the MR App Master processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by the current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose GC logging to a file named for the taskid in /tmp and to set the heap maximum to one gigabyte, pass a value of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc

Usage of -Djava.library.path can cause programs to no longer function if Hadoop native libraries are used. These values should instead be set as part of LD_LIBRARY_PATH in the map/reduce JVM env using the mapreduce.map.env and mapreduce.reduce.env config settings.

yarn.app.mapreduce.am.env: User-added environment variables for the MR App Master processes. Examples: 1) A=foo will set the env variable A to foo. 2) B=$B:c will inherit the tasktracker's B env variable.

Arun

On Wed, Nov 12, 2014 at 6:06 PM, Yehia Elshater wrote:
> Hi All,
>
> Is there a way to add extra commands to be run while launching an instance
> of MRApplicationMaster ?
>
> Thanks
> Yehia

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

CONFIDENTIALITY NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
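As a sketch, the two settings discussed above would go into mapred-site.xml; the values here are illustrative (taken from the description above), not required defaults:

```xml
<!-- mapred-site.xml (illustrative values) -->
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <!-- @taskid@ is interpolated at launch time -->
  <value>-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <!-- comma-separated NAME=VALUE pairs; B=$B:c appends to the inherited B -->
  <value>A=foo,B=$B:c</value>
</property>
```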
Re: In Yarn how to increase the number of concurrent applications for a queue
> Controller;Lorg/apache/hadoop/mapreduce/v2/proto/MRServiceProtos$GetJobReportRequestProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRServiceProtos$GetJobReportResponseProto;(Unknown Source)
> at org/apache/hadoop/mapreduce/v2/api/impl/pb/client/MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)[optimized]
> at sun/reflect/GeneratedMethodAccessor79.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)[optimized]
> at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)[optimized]
> at java/lang/reflect/Method.invoke(Method.java:597)[inlined]
> at org/apache/hadoop/mapred/ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)[inlined]
> at org/apache/hadoop/mapred/ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:416)[optimized]
> ^-- Holding lock: org/apache/hadoop/mapred/ClientServiceDelegate@0x1012c34f8[biased lock]
> at org/apache/hadoop/mapred/YarnRunner.getJobStatus(TIEYarnRunner.java:522)[optimized]
> at org/apache/hadoop/mapreduce/Job$1.run(Job.java:314)[inlined]
> at org/apache/hadoop/mapreduce/Job$1.run(Job.java:311)[inlined]
> at jrockit/vm/AccessController.doPrivileged(AccessController.java:254)[inlined]
> at javax/security/auth/Subject.doAs(Subject.java:396)[inlined]
> at org/apache/hadoop/security/UserGroupInformation.doAs(UserGroupInformation.java:1491)[inlined]
> at org/apache/hadoop/mapreduce/Job.updateStatus(Job.java:311)[optimized]
> ^-- Holding lock: org/apache/hadoop/mapreduce/Job@0x1016e05a8[biased lock]
> at org/apache/hadoop/mapreduce/Job.isComplete(Job.java:599)
> at org/apache/hadoop/mapreduce/Job.waitForCompletion(Job.java:1294)

You can see the thread holding the lock is in a sleep state, and the calling method is Connection.handleConnectionFailure(), so I checked our log file and realized the connection failure is because the historyserver is not available. In my case, I did not start the historyserver at all, because it is not needed (I disabled log aggregation), so my question is why the job client was still trying to talk to the historyserver even though log aggregation is disabled.

Thanks

On Mon, Sep 8, 2014 at 3:57 AM, Arun Murthy wrote:
>> How many nodes do you have in your cluster?
>>
>> Also, could you share the CapacityScheduler initialization logs for each
>> queue, such as:
>>
>> 2014-08-14 15:14:23,835 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>> Initialized queue: unfunded: capacity=0.5, absoluteCapacity=0.5, usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
>> 2014-08-14 15:14:23,840 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Initializing default
>> capacity = 0.5 [= (float) configuredCapacity / 100 ]
>> asboluteCapacity = 0.5 [= parentAbsoluteCapacity * capacity ]
>> maxCapacity = 1.0 [= configuredMaxCapacity ]
>> absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
>> userLimit = 100 [= configuredUserLimit ]
>> userLimitFactor = 1.0 [= configuredUserLimitFactor ]
>> maxApplications = 5000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
>> maxApplicationsPerUser = 5000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ]
>> maxActiveApplications = 1 [= max((int)ceil((clusterResourceMemory / minimumAllocation) * maxAMResourcePerQueuePercent * absoluteMaxCapacity),1) ]
>> maxActiveAppsUsingAbsCap = 1 [= max((int)ceil((clusterResourceMemory / minimumAllocation) * maxAMResourcePercent * absoluteCapacity),1) ]
>> maxActiveApplicationsPerUser = 1 [= max((int)(maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),1) ]
>> usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)]
>> absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
>> maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent ]
>> minimumAllocationFactor = 0.87506104 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / maximumAllocationMemory ]
>> numContainers = 0 [= currentNumContainers ]
>> state = RUNNING [= configuredState ]
>> acls = SUBMIT_APPLICATIONS: ADMINISTER_QUEUE: [= configuredAcls ]
>> nodeLocalityDelay = 0
>>
>> Then, look at values for maxActiveAppsUsingAbsCap & maxActiveApplicationsPerUser.
Re: In Yarn how to increase the number of concurrent applications for a queue
How many nodes do you have in your cluster?

Also, could you share the CapacityScheduler initialization logs for each queue, such as:

2014-08-14 15:14:23,835 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: unfunded: capacity=0.5, absoluteCapacity=0.5, usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2014-08-14 15:14:23,840 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Initializing default
capacity = 0.5 [= (float) configuredCapacity / 100 ]
asboluteCapacity = 0.5 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 5000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 5000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ]
maxActiveApplications = 1 [= max((int)ceil((clusterResourceMemory / minimumAllocation) * maxAMResourcePerQueuePercent * absoluteMaxCapacity),1) ]
maxActiveAppsUsingAbsCap = 1 [= max((int)ceil((clusterResourceMemory / minimumAllocation) * maxAMResourcePercent * absoluteCapacity),1) ]
maxActiveApplicationsPerUser = 1 [= max((int)(maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),1) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 0.87506104 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / maximumAllocationMemory ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = SUBMIT_APPLICATIONS: ADMINISTER_QUEUE: [= configuredAcls ]
nodeLocalityDelay = 0

Then, look at the values for maxActiveAppsUsingAbsCap & maxActiveApplicationsPerUser. That should help debugging.

thanks,
Arun

On Sun, Sep 7, 2014 at 9:37 AM, Anfernee Xu wrote:
> Hi,
>
> I'm running my cluster on Hadoop 2.2.0 and use the CapacityScheduler. All
> my jobs are uberized and run in 2 queues; one queue takes the majority of
> the capacity (90%), the other takes 10%. What I found is that for the small
> queue, only one job runs at a given time. I tried tweaking the properties
> below, but no luck so far. Could you guys shed some light on this?
>
> yarn.scheduler.capacity.maximum-am-resource-percent = 1.0
>   (Maximum percent of resources in the cluster which can be used to run
>   application masters, i.e. controls the number of concurrently running
>   applications.)
> yarn.scheduler.capacity.root.queues = default,small
>   (The queues at this level; root is the root queue.)
> yarn.scheduler.capacity.root.small.maximum-am-resource-percent = 1.0
> yarn.scheduler.capacity.root.small.user-limit = 1
> yarn.scheduler.capacity.root.default.capacity = 88
>   (Default queue target capacity.)
> yarn.scheduler.capacity.root.small.capacity = 12
>   (Small queue target capacity.)
> yarn.scheduler.capacity.root.default.maximum-capacity = 88
>   (The maximum capacity of the default queue.)
> yarn.scheduler.capacity.root.small.maximum-capacity = 12
>   (Maximum queue capacity.)
>
> Thanks
>
> --
> --Anfernee

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
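The maxActiveApplications line in the log above explains the single-job behavior: for a small queue the formula floors at 1. A quick sketch of that computation, using hypothetical cluster numbers (not taken from this thread):

```python
import math

def max_active_applications(cluster_memory_mb, min_allocation_mb,
                            max_am_resource_percent, abs_max_capacity):
    """maxActiveApplications = max(ceil((clusterResourceMemory /
    minimumAllocation) * maxAMResourcePerQueuePercent *
    absoluteMaxCapacity), 1) -- per the LeafQueue log above."""
    slots = cluster_memory_mb / min_allocation_mb
    return max(math.ceil(slots * max_am_resource_percent * abs_max_capacity), 1)

# Hypothetical small cluster: 8 GB total, 1 GB minimum allocation,
# 10% AM resource cap, queue absolute max capacity 0.12:
# 8 * 0.1 * 0.12 = 0.096, which ceils to 1 -> only one active app.
print(max_active_applications(8192, 1024, 0.1, 0.12))  # -> 1
```

Raising maximum-am-resource-percent or growing the cluster raises the product above 1 and allows more concurrent applications in the queue.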
Re: persistent services in Hadoop
John,

We are excited to see ISVs like you get value from YARN, and appreciate the patience you've already shown in working through the teething issues of YARN & hadoop-2.x.

W.r.t. long-running services, the most straightforward option is to go through Apache Slider (http://slider.incubator.apache.org/). Slider has already made good progress in supporting various long-running services such as Apache HBase, Apache Accumulo & Apache Storm. I'm sure the Slider community would be very welcoming of your use-cases, suggestions etc., particularly as they are gearing up to support various applications atop, and would love your feedback.

Furthermore, there is work going on in YARN itself to better support your use case: https://issues.apache.org/jira/browse/YARN-896. Again, your feedback there is very welcome. Also, you might be interested in https://issues.apache.org/jira/browse/YARN-1530, which provides a generic framework for collecting application metrics for YARN applications.

Hope that helps.

thanks,
Arun

On Wed, Jun 25, 2014 at 1:48 PM, John Lilley wrote:
> We are an ISV that currently ships a data-quality/integration suite
> running as a native YARN application. We are finding several use cases
> that would benefit from being able to manage a per-node persistent
> service. MapReduce has its “shuffle auxiliary service”, but it isn’t
> straightforward to add auxiliary services because they cannot be loaded
> from HDFS, so we’d have to manage the distribution of JARs across nodes
> (please tell me if I’m wrong here…). Given that, is there a preferred
> method for managing persistent services on a Hadoop cluster? We could
> have an AM that creates a set of YARN tasks, just waits until YARN gives
> it a task on each node, and restarts any failed tasks, but that doesn’t
> really fit the AM/container structure very well. I’ve also read about
> Slider, which looks interesting. Other ideas?
>
> --john

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: Hadoop cluster monitoring
Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and monitor their Hadoop clusters. Ambari uses Ganglia/Nagios as the underlying technology and has a much better UI etc.

hth,
Arun

On Mon, Apr 14, 2014 at 9:08 PM, Shashidhar Rao wrote:
> Hi,
>
> Can somebody please help me clarify how a Hadoop cluster is monitored
> and profiled in a real production environment?
>
> What are the tools and links, if any? I have heard of Ganglia and HPROF.
>
> For HPROF, can somebody share some experience of how to configure
> HPROF for use with Hadoop?
>
> Regards
> Shashi

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
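On the HPROF part of the question: MapReduce in hadoop-2.x has built-in task profiling that uses HPROF by default. A minimal sketch, assuming the standard mapreduce.task.profile.* properties (the task ranges here are illustrative):

```xml
<!-- mapred-site.xml (or per-job -D flags): profile a few tasks with HPROF -->
<property>
  <name>mapreduce.task.profile</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.task.profile.maps</name>
  <!-- profile only the first three map tasks -->
  <value>0-2</value>
</property>
<property>
  <name>mapreduce.task.profile.reduces</name>
  <value>0-2</value>
</property>
```

The HPROF output for each profiled task is collected alongside the task logs; mapreduce.task.profile.params can be set to adjust the profiler arguments.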