Re: Adding extra commands to MR AM

2014-11-13 Thread Arun Murthy
You mean to the JVM of the MRAppMaster?


<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1024m</value>
  <description>Java opts for the MR App Master processes.
  The following symbol, if present, will be interpolated: @taskid@ is replaced
  by current TaskID. Any other occurrences of '@' will go unchanged.
  For example, to enable verbose gc logging to a file named for the taskid in
  /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
  -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc

  Usage of -Djava.library.path can cause programs to no longer function if
  hadoop native libraries are used. These values should instead be set as part
  of LD_LIBRARY_PATH in the map / reduce JVM env using the mapreduce.map.env
  and mapreduce.reduce.env config settings.
  </description>
</property>



<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value></value>
  <description>User added environment variables for the MR App Master
  processes. Example :
  1) A=foo  This will set the env variable A to foo
  2) B=$B:c This will inherit tasktracker's B env variable.
  </description>
</property>


Arun
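
Both keys can also be set per-job on the driver's Configuration instead of
in mapred-site.xml. A minimal sketch, assuming a standard MapReduce driver
(the class name and option values below are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AmOptsDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // JVM options for the MRAppMaster process; @taskid@ is interpolated
    // as described in the property documentation above.
    conf.set("yarn.app.mapreduce.am.command-opts",
        "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc");

    // Extra environment variables for the MRAppMaster process,
    // as comma-separated k=v pairs.
    conf.set("yarn.app.mapreduce.am.env", "A=foo,B=$B:c");

    Job job = Job.getInstance(conf, "am-opts-demo");
    // ... set mapper, reducer and input/output paths as usual, then:
    // System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note these keys affect only the AM container; task JVMs are configured
separately via mapreduce.map.java.opts / mapreduce.reduce.java.opts.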


On Wed, Nov 12, 2014 at 6:06 PM, Yehia Elshater wrote:

> Hi All,
>
> Is there a way to add extra commands to be run while launching an instance
> of MRApplicationMaster?
>
> Thanks
> Yehia
>



-- 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: In Yarn how to increase the number of concurrent applications for a queue

2014-09-09 Thread Arun Murthy
Controller;Lorg/apache/hadoop/mapreduce/v2/proto/MRServiceProtos$GetJobReportRequestProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRServiceProtos$GetJobReportResponseProto;(Unknown Source)
> at org/apache/hadoop/mapreduce/v2/api/impl/pb/client/MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)[optimized]
> at sun/reflect/GeneratedMethodAccessor79.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)[optimized]
> at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)[optimized]
> at java/lang/reflect/Method.invoke(Method.java:597)[inlined]
> at org/apache/hadoop/mapred/ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)[inlined]
> at org/apache/hadoop/mapred/ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:416)[optimized]
> ^-- Holding lock: org/apache/hadoop/mapred/ClientServiceDelegate@0x1012c34f8[biased lock]
> at org/apache/hadoop/mapred/YarnRunner.getJobStatus(TIEYarnRunner.java:522)[optimized]
> at org/apache/hadoop/mapreduce/Job$1.run(Job.java:314)[inlined]
> at org/apache/hadoop/mapreduce/Job$1.run(Job.java:311)[inlined]
> at jrockit/vm/AccessController.doPrivileged(AccessController.java:254)[inlined]
> at javax/security/auth/Subject.doAs(Subject.java:396)[inlined]
> at org/apache/hadoop/security/UserGroupInformation.doAs(UserGroupInformation.java:1491)[inlined]
> at org/apache/hadoop/mapreduce/Job.updateStatus(Job.java:311)[optimized]
> ^-- Holding lock: org/apache/hadoop/mapreduce/Job@0x1016e05a8[biased lock]
> at org/apache/hadoop/mapreduce/Job.isComplete(Job.java:599)
> at org/apache/hadoop/mapreduce/Job.waitForCompletion(Job.java:1294)
>
> You can see the thread holding the lock is in the sleep state and the
> calling method is Connection.handleConnectionFailure(), so I checked our
> log file and realized the connection failure is that the historyserver is
> not available. In my case I did not start the historyserver at all, because
> it's not needed (I disabled log-aggregation), so my question is why the job
> client was still trying to talk to the historyserver even though log
> aggregation is disabled.
>
> Thanks
>

Re: In Yarn how to increase the number of concurrent applications for a queue

2014-09-08 Thread Arun Murthy
How many nodes do you have in your cluster?

Also, could you share the CapacityScheduler initialization logs for each
queue, such as:

2014-08-14 15:14:23,835 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Initialized queue: unfunded: capacity=0.5, absoluteCapacity=0.5,
usedResources=, usedCapacity=0.0,
absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2014-08-14 15:14:23,840 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Initializing default
capacity = 0.5 [= (float) configuredCapacity / 100 ]
asboluteCapacity = 0.5 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined,
(parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 5000 [= configuredMaximumSystemApplicationsPerQueue or
(int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 5000 [= (int)(maxApplications * (userLimit /
100.0f) * userLimitFactor) ]
maxActiveApplications = 1 [= max((int)ceil((clusterResourceMemory /
minimumAllocation) * maxAMResourcePerQueuePercent * absoluteMaxCapacity),1)
]
maxActiveAppsUsingAbsCap = 1 [= max((int)ceil((clusterResourceMemory /
minimumAllocation) *maxAMResourcePercent * absoluteCapacity),1) ]
maxActiveApplicationsPerUser = 1 [= max((int)(maxActiveApplications *
(userLimit / 100.0f) * userLimitFactor),1) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory *
absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 0.87506104 [= (float)(maximumAllocationMemory -
minimumAllocationMemory) / maximumAllocationMemory ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = SUBMIT_APPLICATIONS: ADMINISTER_QUEUE:  [= configuredAcls ]
nodeLocalityDelay = 0


Then, look at values for maxActiveAppsUsingAbsCap &
maxActiveApplicationsPerUser. That should help debugging.

thanks,
Arun


On Sun, Sep 7, 2014 at 9:37 AM, Anfernee Xu  wrote:

> Hi,
>
> I'm running my cluster on Hadoop 2.2.0 and use the CapacityScheduler. All
> my jobs are uberized and run across 2 queues; one queue takes the majority
> of the capacity (90%) and the other takes 10%. What I found is that in the
> small queue only one job runs at any given time. I tried tweaking the
> properties below, but no luck so far. Could you guys shed some light on
> this?
>
> <property>
>   <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>   <value>1.0</value>
>   <description>
>     Maximum percent of resources in the cluster which can be used to run
>     application masters, i.e. controls the number of concurrent running
>     applications.
>   </description>
> </property>
>
> <property>
>   <name>yarn.scheduler.capacity.root.queues</name>
>   <value>default,small</value>
>   <description>
>     The queues at this level (root is the root queue).
>   </description>
> </property>
>
> <property>
>   <name>yarn.scheduler.capacity.root.small.maximum-am-resource-percent</name>
>   <value>1.0</value>
> </property>
>
> <property>
>   <name>yarn.scheduler.capacity.root.small.user-limit</name>
>   <value>1</value>
> </property>
>
> <property>
>   <name>yarn.scheduler.capacity.root.default.capacity</name>
>   <value>88</value>
>   <description>Default queue target capacity.</description>
> </property>
>
> <property>
>   <name>yarn.scheduler.capacity.root.small.capacity</name>
>   <value>12</value>
>   <description>Small queue target capacity.</description>
> </property>
>
> <property>
>   <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
>   <value>88</value>
>   <description>
>     The maximum capacity of the default queue.
>   </description>
> </property>
>
> <property>
>   <name>yarn.scheduler.capacity.root.small.maximum-capacity</name>
>   <value>12</value>
>   <description>Maximum queue capacity.</description>
> </property>
>
>
> Thanks
>
> --
> --Anfernee
>
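
Plugging these settings into the maxActiveApplications formula from the
initialization log above shows where the one-app limit comes from. A worked
sketch, assuming an illustrative cluster of 8192 MB with a 1024 MB minimum
allocation (both numbers made up; the property values are from the thread):

public class MaxActiveAppsEstimate {
  public static void main(String[] args) {
    // Assumed cluster size, for illustration only.
    double clusterResourceMemory = 8192; // MB available in the cluster
    double minimumAllocation = 1024;     // MB per container

    // From the capacity-scheduler.xml in the question.
    double maxAMResourcePercent = 1.0;   // maximum-am-resource-percent
    double absoluteCapacity = 0.12;      // small queue's absolute capacity

    // maxActiveAppsUsingAbsCap =
    //   max(ceil((clusterMem / minAlloc) * amPercent * absCapacity), 1)
    int maxActiveApps = Math.max(
        (int) Math.ceil((clusterResourceMemory / minimumAllocation)
            * maxAMResourcePercent * absoluteCapacity),
        1);

    // ceil(8 * 1.0 * 0.12) = ceil(0.96) = 1: one active app at a time.
    System.out.println("maxActiveAppsUsingAbsCap = " + maxActiveApps);
  }
}

With these assumptions even maximum-am-resource-percent = 1.0 still yields 1,
which is why the first question is cluster size: the limit scales with
clusterResourceMemory / minimumAllocation.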



-- 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: persistent services in Hadoop

2014-06-25 Thread Arun Murthy
John,

 We are excited to see ISVs like you get value from YARN, and appreciate
the patience you've shown in working through the teething issues of YARN &
hadoop-2.x.

 W.r.t. long-running services, the most straightforward option is to go
through Apache Slider (http://slider.incubator.apache.org/). Slider has
already made good progress in supporting long-running services such as
Apache HBase, Apache Accumulo & Apache Storm. I'm sure the Slider community
would welcome your use-cases and suggestions, particularly as they gear up
to support more applications on top, and would love your feedback.

 Furthermore, there is work going on in YARN itself to better support your
use case: https://issues.apache.org/jira/browse/YARN-896. Again, your
feedback there is very welcome.

 Also, you might be interested in
https://issues.apache.org/jira/browse/YARN-1530, which provides a generic
framework for collecting metrics from YARN applications.

 Hope that helps.

thanks,
Arun


On Wed, Jun 25, 2014 at 1:48 PM, John Lilley wrote:

>  We are an ISV that currently ships a data-quality/integration suite
> running as a native YARN application.  We are finding several use cases
> that would benefit from being able to manage a per-node persistent
> service.  MapReduce has its “shuffle auxiliary service”, but it isn’t
> straightforward to add auxiliary services because they cannot be loaded
> from HDFS, so we’d have to manage the distribution of JARs across nodes
> (please tell me if I’m wrong here…).  Given that, is there a preferred
> method for managing persistent services on a Hadoop cluster?  We could have
> an AM that creates a set of YARN tasks and just waits until YARN gives it a
> task on each node, restarting any failed tasks, but that doesn't really fit
> the AM/container structure very well.  I’ve also read about Slider, which
> looks interesting.  Other ideas?
>
> --john
>
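
On the auxiliary-service route mentioned above: an auxiliary service is a
class the NodeManager loads at startup (MapReduce's shuffle handler uses the
same hook), registered under yarn.nodemanager.aux-services. A minimal sketch
against the Hadoop 2.x API; the "myservice" id and class are hypothetical,
and, as noted, the JAR must be placed on every NodeManager's classpath by
hand:

import java.nio.ByteBuffer;

import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
import org.apache.hadoop.yarn.server.api.AuxiliaryService;

public class MyPerNodeService extends AuxiliaryService {

  public MyPerNodeService() {
    super("myservice"); // must match the id registered in yarn-site.xml
  }

  @Override
  public void initializeApplication(ApplicationInitializationContext ctx) {
    // Invoked when an application's first container starts on this node.
  }

  @Override
  public void stopApplication(ApplicationTerminationContext ctx) {
    // Invoked when the application finishes on this node.
  }

  @Override
  public ByteBuffer getMetaData() {
    // Opaque payload handed to containers at startup; the shuffle service
    // uses this to return its port.
    return ByteBuffer.allocate(0);
  }
}

// In yarn-site.xml on every NodeManager (ids/classes hypothetical):
//   yarn.nodemanager.aux-services = mapreduce_shuffle,myservice
//   yarn.nodemanager.aux-services.myservice.class = MyPerNodeService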



-- 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop cluster monitoring

2014-04-14 Thread Arun Murthy
Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and
monitor their Hadoop clusters. Ambari uses Ganglia/Nagios as the underlying
technology and adds a much better UI, etc.

hth,
Arun


On Mon, Apr 14, 2014 at 9:08 PM, Shashidhar Rao wrote:

> Hi,
>
> Can somebody please help me clarify how a Hadoop cluster is monitored and
> profiled in a real production environment?
>
> What are the tools, and links if any? I have heard of Ganglia and HPROF.
>
> For HPROF, can somebody share some experience of how to configure HPROF
> for use with Hadoop?
>
> Regards
> Shashi
>
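
On the HPROF half of the question: MapReduce ships task-profiling hooks that
inject an HPROF agent into selected task JVMs and copy the output back to the
client. A minimal sketch, assuming a normal job driver (the attempt ranges
and agent options below are illustrative; the mapreduce.task.profile.* keys
are the stock ones):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class HprofProfilingDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Turn on task profiling and pick which task attempts to profile.
    conf.setBoolean("mapreduce.task.profile", true);
    conf.set("mapreduce.task.profile.maps", "0-2");
    conf.set("mapreduce.task.profile.reduces", "0-2");

    // HPROF agent options; %s is replaced with the output file path.
    conf.set("mapreduce.task.profile.params",
        "-agentlib:hprof=cpu=samples,heap=sites,force=n,"
            + "thread=y,verbose=n,file=%s");

    Job job = Job.getInstance(conf, "hprof-demo");
    // ... configure mapper/reducer/paths and submit as usual; profile
    // output for the chosen attempts lands in the client's working dir.
  }
}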



-- 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
