Re: YARN - level/depth of monitoring info - newbie question

2017-07-25 Thread Sunil Govind
Hi Rajila,

From the YARN side, you will be able to get detailed information about the
application, and that application could be MapReduce or anything else. But
what kind of operation is done inside that MapReduce app is specific to the
application itself (here, MapReduce).

YARN can only give you time/memory/CPU usage per application, or at most at
node level.
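
For example, the app-level aggregates YARN does track can be read with an
illustrative command like (<Application ID> is a placeholder):

  yarn application -status <Application ID>

On recent releases this includes the app's aggregate memory/vcore usage, but
says nothing about what the app did internally, such as INSERT counts.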

Thanks
Sunil

On Tue, Jul 25, 2017 at 3:46 AM rajila2008 .  wrote:

> Hi all,
>
> Does YARN provide application level info ?!
>
> For example : there is a map-reduce job persisting its outcome in a NoSql
> datastore by executing an INSERT command.
> Can YARN provide the execution time for the INSERT, without the
> application itself logging the info anywhere?
>
> There's some argument at the workplace: the dev team is asking prod-support
> to find such info through YARN logs.
>
> I believe "YARN's resource reporting" is similar to the unix "top" command,
> but at cluster level. "top" gives system-level info, not how many INSERTs a
> job executed. Similarly, YARN will not give application-specific info like
> the number of INSERT ops, record count, or array size within a job; the
> application needs to log such info as needed.
>
> Could anyone please clarify if my understanding is correct?
>
> Regards,
> Rajila
>


Re: unsubscribe and subscribe to another email

2017-05-16 Thread Sunil Govind
Please follow steps given at
https://www.apache.org/foundation/mailinglists.html to
subscribe/unsubscribe.

Thanks
Sunil

On Tue, May 16, 2017 at 7:06 PM Venkatrama, Krishna <
krishna.venkatr...@bcbsfl.com> wrote:

> I am unsubscribing to this
>
>
>
> Pl add kkvsh...@yahoo.com to the user group
>
>
>
> *Krishna Venkatrama(KK)*
>
> 9049052189 <(904)%20905-2189>
>
> IT Shared Services
>
> Sr. Information Architect
>
> Hadoop Data Platform Architect
>
>
>
>


Re: Modifying container log granularity at job submission time

2017-05-03 Thread Sunil Govind
Hi Benson

I think you are trying to enable debug for an MR app.
-Dmapreduce.map.log.level=DEBUG -Dmapreduce.reduce.log.level=DEBUG
-Dyarn.app.mapreduce.am.log.level=DEBUG

These options can be set while submitting the job, as in the sketch below.
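
For example (a hedged sketch: jar, class, and paths are placeholders, and the
-D options are only picked up this way when the driver parses generic options
via ToolRunner):

  hadoop jar my-app.jar MyDriver \
    -Dmapreduce.map.log.level=DEBUG \
    -Dmapreduce.reduce.log.level=DEBUG \
    -Dyarn.app.mapreduce.am.log.level=DEBUG \
    <input> <output>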

Thanks
Sunil

On Tue, May 2, 2017 at 7:34 AM Benson Qiu  wrote:

> When I view container logs, I only see "INFO:" log lines. How do I make
> the log lines more fine grained?
>
> I've tried the following, without success:
>
> Configuration.setStrings(MRJobConfig.MR_AM_LOG_LEVEL, "DEBUG");
>
>
> Thanks,
> Benson
>


Re: Max Application Master Resources with Queue Elasticity

2017-02-09 Thread Sunil Govind
Hello Benson

If I view "Maximum Application Master Resources" on the ResourceManager Web
UI for QueueA, I should see 4096MB, correct?
> Yes. Are you seeing any different behavior? If so, please share your
> cap-sched.xml.

shouldn't we be able to run 4 uber-mode jobs on QueueA without waiting or
using preemption?
> Yes, it should be.

are you saying that 20% of 5GB is 1GB, so we can only run 1 uber-mode job
even though 5GB is available?
> Ideally no. We take max(queue capacity, available limit) * am-res-pcnt.
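
Worked through with this thread's numbers: cluster = 20GB, QueueA capacity =
50% (10GB), max-capacity = 100% (20GB), am-res-pcnt = 0.2. The limit is taken
against the larger queue figure, so Max Application Master Resources = 0.2 *
20GB = 4GB (4096MB), and it stays 4096MB even while QueueB holds 15GB; what
shrinks is the free headroom available to actually satisfy it.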

- Sunil


On Tue, Feb 7, 2017 at 11:51 PM Benson Qiu <benson@salesforce.com>
wrote:

> Hi Sunil,
>
> Thanks for your reply!
>
> I have some follow up questions to make sure I fully understand the
> scenario you mentioned (QueueA has 50% capacity, 100% max-capacity, 20%
> maximum-am-resource-percent, cluster resource is 20GB, AM container size is
> 1GB, QueueB has taken over 15GB).
>
> Adding on, lets assume the following:
> - All jobs run in uber mode so we don't need to worry about additional
> resources for map and reduce containers.
> - root.QueueA and root.QueueB are the only two queues on the cluster.
> - user-limit-factor is high enough that a single user can use all of
> QueueA and QueueB's elasticity.
>
> Some questions:
> 1. If I view "Maximum Application Master Resources" on the ResourceManager
> Web UI for QueueA, I should see 4096MB, correct? (QueueA elastically can
> use 100% of the 20GB cluster. 20% of 20GB = 4096MB).
> 2. At the current point in time when QueueB is using 15GB, QueueA has 5GB
> available. Since "Maximum Application Master Resources" is 4096MB, and 5GB
> is available, shouldn't we be able to run 4 uber-mode jobs on QueueA
> without waiting or using preemption? Or are you saying that 20% of 5GB is
> 1GB, so we can only run 1 uber-mode job even though 5GB is available?
>
> Thanks,
> Benson
>
> On Mon, Feb 6, 2017 at 9:25 PM, Sunil Govind <sunil.gov...@gmail.com>
> wrote:
>
> Hello Benson
>
> I could help to explain a little bit here.
>
> maximum-am-resource-percent could be configured per-queue level (from
> next release, it could be configure per node-label level as well). By
> default 10% is default, and hence 10% of queue's capacity could be used for
> running AM resources. However due to elasticity, a queue could have
> resources above its configured capacity. In that case, "Max Application
> Master Resources" will be considering queue's max limit.
>
> To answer your question, Yes. Ideally this resources is available for
> running AM. However there could many other reasons by which this resource
> may not be available for AM. To list a few, assume QueueA has 50% capacity
> and 100% as its max-capacity. AM resource percentage is 20%. Cluster
> resource is 20GB.
> - Assume QueueB has taken over 15GB. And one app is running in QueueA with
> 1GB as AM resource. As per calculation 4GB could go to AM resource.
> However, we need to wait till some resource are freed from QueueB or use
> preemption.
> - User limit. If user-limit-factor is <=1, then you may not be able to get
> more resources for elasticity.
>
> If you tune all params as per your scenario, and if there are enough
> resources in cluster, you could avail this resource for AM.
>
> Thanks
> Sunil
>
> On Tue, Feb 7, 2017 at 9:20 AM Benson Qiu <benson@salesforce.com>
> wrote:
>
> Hi,
>
> I noticed that "Max Application Master Resources" on the ResourceManager
> UI (/cluster/scheduler) takes into account queue elasticity.
>
> AMResourceLimit and userAMResourceLimit on the ResourceManager API
> (/ws/v1/cluster/scheduler) also takes into account queue elasticity.
>
> Are these AM resources always guaranteed? If a queue cannot grow because
> all of the other queues in the cluster are fully utilized, does the queue
> still have "Max Application Master Resources" available for AM containers?
>
> Thanks,
> Benson
>
>
>


Re: YarnClient vs ResourceManager REST APIs

2017-02-08 Thread Sunil Govind
Hi Benson

As mentioned earlier, QueueInfo carries a lot of information related to a
queue. maximum-am-resource-percent can be configured per-queue or at the
scheduler level so that it applies to all queues.

https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/yarn/api/records/QueueInfo.html
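
A minimal sketch of reading QueueInfo through YarnClient (2.7-era API; note
that maximum-am-resource-percent itself is not among these fields, which is
what the YARN-6164 mentioned below is about):

  import org.apache.hadoop.yarn.api.records.QueueInfo;
  import org.apache.hadoop.yarn.client.api.YarnClient;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class QueueInfoProbe {
    public static void main(String[] args) throws Exception {
      YarnClient client = YarnClient.createYarnClient();
      client.init(new YarnConfiguration());
      client.start();
      QueueInfo q = client.getQueueInfo("default"); // queue name is an example
      System.out.println(q.getQueueName()
          + " capacity=" + q.getCapacity()
          + " maxCapacity=" + q.getMaximumCapacity()
          + " currentCapacity=" + q.getCurrentCapacity());
      client.stop();
    }
  }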

I think you have raised YARN-6164, and we could discuss further there.

Thanks
Sunil



On Wed, Feb 8, 2017 at 6:32 AM Benson Qiu <benson@salesforce.com> wrote:

> Hi Sunil,
>
> Wanted to follow up on this. I'm having trouble finding a way to access
> `yarn.scheduler.capacity.maximum-am-resource-percent`.
>
> For Hadoop 2.7.2, it does not seem to be possible to get
> maximum-am-resource-percent through any of the following methods:
> YarnClient
> <https://hadoop.apache.org/docs/r2.7.2/api/index.html?org/apache/hadoop/yarn/client/api/YarnClient.html>,
> ResourceManager HTTP APIs
> <https://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html>,
> or yarn rmadmin command
> <https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html>
> .
>
> I'm planning on raising a ticket on JIRA. Just wanted to make sure I'm not
> overlooking anything?
>
> Thanks,
> Benson
>
> On Tue, Jan 31, 2017 at 1:24 AM, Sunil Govind <sunil.gov...@gmail.com>
> wrote:
>
> Hi Benson
>
> QueueInfo  is used to get some basic information related to a queue. If
> there is a use case to have some more extended information such as
> resourceUsedForAM etc, then please raise a ticket under YARN project and we
> could discuss more there.
>
>
> Thanks
>
> Sunil
>
>
> On Thu, Jan 19, 2017 at 6:40 AM Benson Qiu <benson@salesforce.com>
> wrote:
>
> The ResourceManager REST APIs provide some information that we can't
> obtain from YarnClient
> <https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html>.
> For example, the http:///ws/v1/cluster/scheduler
> endpoint provides "AMResourceUsed" which is not provided by YarnClient.
>
> Are there any plans to add this additional information to YarnClient in
> the future?
>
> Thanks,
> Benson
>
>


Re: Max Application Master Resources with Queue Elasticity

2017-02-06 Thread Sunil Govind
Hello Benson

I could help to explain a little bit here.

maximum-am-resource-percent can be configured at the per-queue level (from
the next release, it can be configured per node-label as well). The default
is 10%, hence 10% of a queue's capacity can be used for running AM
containers. However, due to elasticity, a queue can hold resources above its
configured capacity. In that case, "Max Application Master Resources" is
computed against the queue's max limit.

To answer your question: yes, ideally this resource is available for running
AMs. However, there are many other reasons why it may not be. To list a few,
assume QueueA has 50% capacity and 100% as its max-capacity, the AM resource
percentage is 20%, and the cluster resource is 20GB.
- Assume QueueB has taken 15GB, and one app is running in QueueA with 1GB as
its AM resource. As per the calculation, 4GB could go to AM resources;
however, we need to wait until some resources are freed from QueueB, or use
preemption.
- User limit: if user-limit-factor is <=1, then you may not be able to get
more resources through elasticity.

If you tune all params as per your scenario, and there are enough resources
in the cluster, you can avail this resource for AMs.

Thanks
Sunil

On Tue, Feb 7, 2017 at 9:20 AM Benson Qiu  wrote:

> Hi,
>
> I noticed that "Max Application Master Resources" on the ResourceManager
> UI (/cluster/scheduler) takes into account queue elasticity.
>
> AMResourceLimit and userAMResourceLimit on the ResourceManager API
> (/ws/v1/cluster/scheduler) also takes into account queue elasticity.
>
> Are these AM resources always guaranteed? If a queue cannot grow because
> all of the other queues in the cluster are fully utilized, does the queue
> still have "Max Application Master Resources" available for AM containers?
>
> Thanks,
> Benson
>


Re: YarnClient vs ResourceManager REST APIs

2017-01-31 Thread Sunil Govind
Hi Benson

QueueInfo is used to get basic information related to a queue. If there is a
use case for more extended information, such as resourceUsedForAM etc., then
please raise a ticket under the YARN project and we can discuss more there.


Thanks

Sunil


On Thu, Jan 19, 2017 at 6:40 AM Benson Qiu 
wrote:

> The ResourceManager REST APIs provide some information that we can't
> obtain from YarnClient
> .
> For example, the http:///ws/v1/cluster/scheduler
> endpoint provides "AMResourceUsed" which is not provided by YarnClient.
>
> Are there any plans to add this additional information to YarnClient in
> the future?
>
> Thanks,
> Benson
>


Re: Heartbeat between RM and AM

2017-01-02 Thread Sunil Govind
Hi

If you are thinking about the allocate (heartbeat) calls from the AM to the
RM, the interval is mostly driven at the application level (it is not a
YARN-wide config). For example, in MapReduce the config below is used for
this purpose:
yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms
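
For example, a sketch of tuning it on the client side before job submission
(500 is illustrative; the MR default is 1000 ms):

  import org.apache.hadoop.conf.Configuration;

  Configuration conf = new Configuration();
  // MR AM <-> RM allocate-heartbeat interval, in milliseconds.
  conf.setInt("yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms", 500);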

Thanks
Sunil


On Sat, Dec 31, 2016 at 8:20 AM Sultan Alamro 
wrote:

> Hi all,
>
> Can anyone tell me how I can modify the heartbeat between the RM and AM?
> I need to add new requests to the AM from the RM.
>
> These requests basically are values calculated by the RM to be used by the
> AM online.
>
> Thanks,
> Sultan
>


Re: Host not returned with getApplicationAttempts API

2016-12-20 Thread Sunil Govind
Hi Ajay

The 'host' printed in the above response block comes from
ApplicationAttemptReport. It is supplied by the AM (Application Master)
during its registration, so an empty value usually means the AM had not
registered (or did not report a host) at the time of the query.
I am not sure which application you are querying here; I suggest you check
the AM logs or configs.
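
For reference, a minimal sketch of the query in question (IDs taken from the
response you pasted; 2.7-era client API):

  import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
  import org.apache.hadoop.yarn.api.records.ApplicationId;
  import org.apache.hadoop.yarn.client.api.YarnClient;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class AttemptHosts {
    public static void main(String[] args) throws Exception {
      YarnClient client = YarnClient.createYarnClient();
      client.init(new YarnConfiguration());
      client.start();
      ApplicationId appId = ApplicationId.newInstance(1481851584221L, 2);
      for (ApplicationAttemptReport r : client.getApplicationAttempts(appId)) {
        // host stays "" until the AM registers it with the RM
        System.out.println(r.getApplicationAttemptId() + " host=" + r.getHost());
      }
      client.stop();
    }
  }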

Thanks
Sunil

On Tue, Dec 20, 2016 at 1:23 AM AJAY GUPTA  wrote:

> Hi,
>
> I want to print the Host received in applicationAttemptReport of all
> attempts when yarnClient.getApplicationAttempts() API is called. I see that
> the host received is NULL whereas other fields have information. Is this a
> bug, or am I the only one seeing this behaviour?
>
>
> application_attempts {
>
>   application_attempt_id {
>
> application_id {
>
>   id: 2
>
>   cluster_timestamp: 1481851584221
>
> }
>
> attemptId: 1
>
>   }
>
>   host: ""
>
>   rpc_port: 0
>
>   tracking_url: "http://localhost:8088/proxy/application_1481851584221_0002/"
>
>   diagnostics: ""
>
>   yarn_application_attempt_state: APP_ATTEMPT_RUNNING
>
>   am_container_id {
>
> app_attempt_id {
>
>   application_id {
>
> id: 2
>
> cluster_timestamp: 1481851584221
>
>   }
>
>   attemptId: 1
>
> }
>
> id: 1
>
>   }
>
>   original_tracking_url: "localhost:59636"
>
> }
>


Re: Fetch container list for failed application attempt

2016-12-15 Thread Sunil Govind
Hello

IIUC, the failed attempt is not removed from the attempts list, but while
querying for containers of a failed attempt, the ResourceManager internally
gives you containers from the running attempt. This is somewhat by design, as
a few containers are transferred from the old attempt, if available.
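
For reference, a sketch of the same query through the Java client (2.7-era
API; the IDs are placeholders):

  import java.util.List;
  import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
  import org.apache.hadoop.yarn.api.records.ApplicationId;
  import org.apache.hadoop.yarn.api.records.ContainerReport;
  import org.apache.hadoop.yarn.client.api.YarnClient;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  YarnClient client = YarnClient.createYarnClient();
  client.init(new YarnConfiguration());
  client.start();
  ApplicationAttemptId attempt1 = ApplicationAttemptId.newInstance(
      ApplicationId.newInstance(1481851584221L, 2), 1);
  // For a failed attempt this currently returns the containers
  // transferred to the running attempt, as described above.
  List<ContainerReport> reports = client.getContainers(attempt1);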

If this behavior is not correct for your use case, I suggest you raise a
JIRA ticket explaining the problem, and folks will help in discussing it.

Thanks
Sunil

On Wed, Dec 14, 2016 at 5:02 PM priyanka gugale 
wrote:

> Hi,
>
> I am launching a yarn application. If I kill app master, it tries to
> restart application with new attempt id. Now I use yarn command,
>
> yarn container -list 
>
> When I provide the Application Attempt ID of failed attempt, it lists the
> container from next attempt which is in "RUNNING" state right now.
>
> Shouldn't this return either the list of killed containers from attempt 1
> or an empty list? Is this an issue, or is it expected behavior?
>
> -Priyanka
>


Re: how to add a shareable node label?

2016-10-12 Thread Sunil Govind
Hi Frank

Thanks for sharing more details. Let me try this combination; I might be
wrong, so please correct me. I think a sharable node-label could help here.

Labels:
- Node1-4 = default label
- Node8-9 = "special" label

Queues:
- "ProdQ" accessible-node-labels is "" (only the default label)
- "TestQ" accessible-node-labels is "" (only the default label)
- "LabeledQ" accessible-node-labels is "special"

Capacity per queue:
- "ProdQ": capacity=50%, max-capacity=100%
- "TestQ": capacity=50%, max-capacity=50%
- "LabeledQ": special.capacity=100%, special.max-capacity=100%


Various choices:

* Jobs in ProdQ are assured 50% of the default-label resources, and can go up
to 100% if nothing is running in TestQ.

* Jobs in TestQ can only get 50% of the default-label resources.

* If jobs in ProdQ or TestQ need to make use of the "special" label machines,
that is only possible when there are resources available in the "special"
label: "special" is a non-exclusive label, which can share its resources with
the "default" label.

* Any job submitted to "LabeledQ" is assured 100% of the "special" resources
and can use all of them if nothing else is there. I think preemption could be
made optional here.

If inter-queue preemption is enabled, we can enforce normalization faster for
the default label; otherwise apps might need to wait. We could also try
another approach, as I shared in an earlier mail, but it reserves some % of
resources for ProdQ and TestQ inside LabeledQ, which may not be suitable. A
capacity-scheduler.xml sketch of the layout above follows.
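
A hedged sketch (property names per the CapacityScheduler and node-labels
docs; queue/label names from this thread, values illustrative and untested):

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>ProdQ,TestQ,LabeledQ</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.ProdQ.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.ProdQ.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.TestQ.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.TestQ.maximum-capacity</name>
    <value>50</value>
  </property>
  <property>
    <!-- No share of the default partition; LabeledQ lives on "special". -->
    <name>yarn.scheduler.capacity.root.LabeledQ.capacity</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.LabeledQ.accessible-node-labels</name>
    <value>special</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.LabeledQ.accessible-node-labels.special.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.LabeledQ.accessible-node-labels.special.maximum-capacity</name>
    <value>100</value>
  </property>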


Thanks

Sunil

On Tue, Oct 11, 2016 at 10:38 PM Frank Luo <j...@merkleinc.com> wrote:

> Hah, how so? I am confused as I was under impression that I needed sharing
> but not preemption.
>
>
>
> Let’s model this out.
>
>
>
> Assuming I got 4 “normal” machines node1-4, and two special node8 and
> node9 where JobA can be executed on.
>
>
>
> And I need two queues, ProdQ and TestQ equally sharing Node1-4, and a
> “LabeledQ” with node8/9.
>
>
>
> When ProdQ is full, it can overflow to TestQ and further to LabeledQ. If
> TestQ is full, the tasks stay in TestQ, or optionally overflow to LabeledQ
> (either way
> is fine as long as it doesn’t go to ProdQ). And when JobA is running, it
> can only go to LabelledQ. If something else is on LabelledQ, JobA waits.
>
>
>
> Do you mind to illustrate how to config the queues to achieve what I am
> looking for?
>
>
>
> Thank you Sunil.
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
>
>
> *Sent:* Tuesday, October 11, 2016 11:44 AM
>
>
>
> *To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org
>
> *Subject:* Re: how to add a shareable node label?
>
>
>
>
> Hi Frank
>
>
>
>
>
>
> Extremely sorry for the delay..
>
>
>
>
>
>
>
> Yes, you are correct. Sharing feature of node label is not needed in your
> case.
>
>
>
> Existing node labels and a queue model could solve the problem.
>
>
>
>
>
>
>
> Thanks
>
>
>
> Sunil
>
>
>
>
>
>
>
> On Fri, Oct 7, 2016 at 11:59 PM Frank Luo <j...@merkleinc.com> wrote:
>
>
>
>
>
> That is correct, Sunil.
>
>
>
> Just to confirm,  the Node Labeling feature on 2.8 or 3.0 alpha won’t
> satisfy
> my need, right?
>
>
>
> *From:*
> Sunil Govind [mailto:sunil.gov...@gmail.com]
>
>
> *Sent:* Friday, October 07, 2016 12:09 PM
>
>
>
>
>
>
>
> *To:* Frank Luo <j...@merkleinc.com>;
> user@hadoop.apache.org
>
> *Subject:* Re: how to add a shareable node label?
>
>
>
>
>
>
>
>
> HI Frank
>
>
>
>
>
>
> In that case, preemption may not be needed. So over-utilizing resources of
> queueB will be running till it completes. Since queueA is under served,
> then any next free container could
> go to queueA which is for Job_A.
>
>
>
>
>
>
>
> Thanks
>
>
>
> Sunil
>
>
>
>
>
>
>
> On Fri, Oct 7, 2016 at 9:58 PM Frank Luo <j...@merkleinc.com> wrote:
>
>
>
>
>
> Sunil,
>
>
>
> Your description pretty much matches my understanding. Except for “Job_A
> will have to run as per its schedule w/o any delay”. My situation is that
> Job_A can be delayed. As long as it runs in queueA, I am happy.
>
>
>
> Just as you said, processes normally running in queueB might not be
> preemptable.
> So if they overflow to queueA then got preempted, then that is not good.
>
>
>
> *From:*
> Sunil Govind [mailto:sunil.gov...@gmail.com]
>
>
> *Sent:* Friday, October 07, 2016 10:50 AM

Re: how to add a shareable node label?

2016-10-11 Thread Sunil Govind
Hi Frank

Extremely sorry for the delay..

Yes, you are correct. Sharing feature of node label is not needed in your
case.
Existing node labels and a queue model could solve the problem.

Thanks
Sunil

On Fri, Oct 7, 2016 at 11:59 PM Frank Luo <j...@merkleinc.com> wrote:

> That is correct, Sunil.
>
>
>
> Just to confirm,  the Node Labeling feature on 2.8 or 3.0 alpha won’t
> satisfy my need, right?
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Friday, October 07, 2016 12:09 PM
>
>
> *To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org
> *Subject:* Re: how to add a shareable node label?
>
>
>
> HI Frank
>
>
>
> In that case, preemption may not be needed. So over-utilizing resources of
> queueB will be running till it completes. Since queueA is under served,
> then any next free container could go to queueA which is for Job_A.
>
>
>
> Thanks
>
> Sunil
>
>
>
> On Fri, Oct 7, 2016 at 9:58 PM Frank Luo <j...@merkleinc.com> wrote:
>
> Sunil,
>
>
>
> Your description pretty much matches my understanding. Except for “Job_A
> will have to run as per its schedule w/o any delay”. My situation is that
> Job_A can be delayed. As long as it runs in queueA, I am happy.
>
>
>
> Just as you said, processes normally running in queueB might not be
> preemptable. So if they overflow to queueA then got preempted, then that is
> not good.
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Friday, October 07, 2016 10:50 AM
>
>
> *To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org
>
> *Subject:* Re: how to add a shareable node label?
>
>
>
> HI Frank
>
>
>
> Thanks for the details.
>
>
>
> I am not quite sure if I understood you problem correctly. I think you are
> looking for a solution to ensure that Job_A will have to run as per its
> schedule w/o any delay. Meantime you also do not want to waste resources on
> those high end machine where Job_A is running.
>
>
>
> I think you still need node label exclusivity here since there is h/w
> dependency. But if you have 2 queues' which are shared to use "labelA"
> here, then always "Job_A" can be planned to run in that queue, say
> "queueA". Other jobs could be run in "queueB" here. So if you tune
> capacities and if preemption is enabled per queue level, overutilized
> resources used by "queueB" could be preempted for "Job_A".
>
>
>
> But if your sharable jobs are like some linux jobs which should not be
> preempted, then this may be only a half solution.
>
>
>
> Thanks
>
> Sunil
>
>
>
> On Fri, Oct 7, 2016 at 7:36 AM Frank Luo <j...@merkleinc.com> wrote:
>
> Sunil,
>
>
>
> You confirmed my understanding. I got the understanding by reading the
> docs and haven’t really tried 2.8 or 3.0-alphal.
>
>
>
> My situation is that I am in a multi-tenant env, and  got several very
> powerful machines with expensive licenses to run a particular linux job,
> let’s say Job_A. But the job is executed infrequently, so I want to let
> other jobs to use the machines when Job_A is not running. In the meaning
> time, I am not powerful enough to force all other jobs to be preemptable.
> As matter of fact, I know they have Hadoop jobs inserting into sql-server,
> or just pure linux jobs that are not preemptable in nature. So preempt jobs
> is not an option for me.
>
>
>
> I hope it makes sense.
>
>
>
> Frank
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Thursday, October 06, 2016 2:15 PM
>
>
> *To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org
> *Subject:* Re: how to add a shareable node label?
>
>
>
> HI Frank
>
>
>
> Ideally those containers will be preempted if there are unsatisfied demand
> for "configured label".
>
>
>
> I could explain this:
>
> "labelA" has few empty resources.  All nodes under "default" label is
> used. Hence a new application which is submitted to "default" label has to
> wait. But if "labelA" is non-exclusive and there are some free resources,
> this new application can run on "labelA".
>
> However if there are some more new apps submitted to "labelA", and if
> there are no more resources available in "labelA", then it may preempt
> containers from the app which was sharing containers earlier.
>
>
>
> May be you could share some more information so tht it may become more
> clear. Also I suppose you are running this in hadoop 3 alph

Re: how to add a shareable node label?

2016-10-07 Thread Sunil Govind
Hi Frank

In that case, preemption may not be needed. Over-utilizing resources of
queueB will keep running until they complete. Since queueA is underserved,
any next free container could go to queueA, which is for Job_A.

Thanks
Sunil

On Fri, Oct 7, 2016 at 9:58 PM Frank Luo <j...@merkleinc.com> wrote:

> Sunil,
>
>
>
> Your description pretty much matches my understanding. Except for “Job_A
> will have to run as per its schedule w/o any delay”. My situation is that
> Job_A can be delayed. As long as it runs in queueA, I am happy.
>
>
>
> Just as you said, processes normally running in queueB might not be
> preemptable. So if they overflow to queueA then got preempted, then that is
> not good.
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Friday, October 07, 2016 10:50 AM
>
>
> *To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org
>
> *Subject:* Re: how to add a shareable node label?
>
>
>
> HI Frank
>
>
>
> Thanks for the details.
>
>
>
> I am not quite sure if I understood you problem correctly. I think you are
> looking for a solution to ensure that Job_A will have to run as per its
> schedule w/o any delay. Meantime you also do not want to waste resources on
> those high end machine where Job_A is running.
>
>
>
> I think you still need node label exclusivity here since there is h/w
> dependency. But if you have 2 queues' which are shared to use "labelA"
> here, then always "Job_A" can be planned to run in that queue, say
> "queueA". Other jobs could be run in "queueB" here. So if you tune
> capacities and if preemption is enabled per queue level, overutilized
> resources used by "queueB" could be preempted for "Job_A".
>
>
>
> But if your sharable jobs are like some linux jobs which should not be
> preempted, then this may be only a half solution.
>
>
>
> Thanks
>
> Sunil
>
>
>
> On Fri, Oct 7, 2016 at 7:36 AM Frank Luo <j...@merkleinc.com> wrote:
>
> Sunil,
>
>
>
> You confirmed my understanding. I got the understanding by reading the
> docs and haven’t really tried 2.8 or 3.0-alphal.
>
>
>
> My situation is that I am in a multi-tenant env, and  got several very
> powerful machines with expensive licenses to run a particular linux job,
> let’s say Job_A. But the job is executed infrequently, so I want to let
> other jobs to use the machines when Job_A is not running. In the meaning
> time, I am not powerful enough to force all other jobs to be preemptable.
> As matter of fact, I know they have Hadoop jobs inserting into sql-server,
> or just pure linux jobs that are not preemptable in nature. So preempt jobs
> is not an option for me.
>
>
>
> I hope it makes sense.
>
>
>
> Frank
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Thursday, October 06, 2016 2:15 PM
>
>
> *To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org
> *Subject:* Re: how to add a shareable node label?
>
>
>
> HI Frank
>
>
>
> Ideally those containers will be preempted if there are unsatisfied demand
> for "configured label".
>
>
>
> I could explain this:
>
> "labelA" has few empty resources.  All nodes under "default" label is
> used. Hence a new application which is submitted to "default" label has to
> wait. But if "labelA" is non-exclusive and there are some free resources,
> this new application can run on "labelA".
>
> However if there are some more new apps submitted to "labelA", and if
> there are no more resources available in "labelA", then it may preempt
> containers from the app which was sharing containers earlier.
>
>
>
> May be you could share some more information so tht it may become more
> clear. Also I suppose you are running this in hadoop 3 alpha1 release.
> please correct me if I m wrong.
>
>
>
> Thanks
>
> Sunil
>
>
>
> On Thu, Oct 6, 2016 at 9:44 PM Frank Luo <j...@merkleinc.com> wrote:
>
> Thanks Sunil.
>
>
>
> Ø  3. If there is any future ask for those resources , we will preempt
> the non labeled apps and give them back to labeled apps.
>
>
>
> Unfortunately, I am still not able to use it, because of the preemptive
> behavior. The jobs that steals labelled resources are not preemptable, and
> I’d rather waiting instead of killing.
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Thursday, October 06, 2016 1:59 AM
>
>
> *To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org

Re: how to add a shareable node label?

2016-10-07 Thread Sunil Govind
Hi Frank

Thanks for the details.

I am not quite sure if I understood your problem correctly. I think you are
looking for a solution to ensure that Job_A runs as per its schedule without
any delay, while at the same time not wasting resources on those high-end
machines where Job_A runs.

I think you still need node-label exclusivity here, since there is a h/w
dependency. But if you have two queues which share "labelA", then "Job_A"
can always be planned to run in one of them, say "queueA", with other jobs
run in "queueB". If you tune capacities, and preemption is enabled at the
queue level, over-utilized resources used by "queueB" can be preempted for
"Job_A".

But if your sharable jobs are like some linux jobs which should not be
preempted, then this may be only a half solution.

Thanks
Sunil

On Fri, Oct 7, 2016 at 7:36 AM Frank Luo <j...@merkleinc.com> wrote:

Sunil,



You confirmed my understanding. I got the understanding by reading the docs
and haven't really tried 2.8 or 3.0-alpha1.



My situation is that I am in a multi-tenant env, and got several very
powerful machines with expensive licenses to run a particular linux job,
let's say Job_A. But the job is executed infrequently, so I want to let
other jobs use the machines when Job_A is not running. In the meantime, I am
not powerful enough to force all other jobs to be preemptable. As a matter
of fact, I know they have Hadoop jobs inserting into sql-server, or just
pure linux jobs that are not preemptable in nature. So preempting jobs is
not an option for me.



I hope it makes sense.



Frank



*From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
*Sent:* Thursday, October 06, 2016 2:15 PM


*To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org
*Subject:* Re: how to add a shareable node label?



HI Frank



Ideally those containers will be preempted if there are unsatisfied demand
for "configured label".



I could explain this:

"labelA" has few empty resources.  All nodes under "default" label is used.
Hence a new application which is submitted to "default" label has to wait.
But if "labelA" is non-exclusive and there are some free resources, this
new application can run on "labelA".

However if there are some more new apps submitted to "labelA", and if there
are no more resources available in "labelA", then it may preempt containers
from the app which was sharing containers earlier.



Maybe you could share some more information so that it may become clearer.
Also, I suppose you are running this on the Hadoop 3 alpha1 release; please
correct me if I'm wrong.



Thanks

Sunil



On Thu, Oct 6, 2016 at 9:44 PM Frank Luo <j...@merkleinc.com> wrote:

Thanks Sunil.



Ø  3. If there is any future ask for those resources , we will preempt the
non labeled apps and give them back to labeled apps.



Unfortunately, I am still not able to use it, because of the preemptive
behavior. The jobs that steals labelled resources are not preemptable, and
I’d rather waiting instead of killing.



*From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
*Sent:* Thursday, October 06, 2016 1:59 AM


*To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org

*Subject:* Re: how to add a shareable node label?



Hi Frank

I think as of today this is not possible. You could try and experience the
"non-exlusive" feature of node-label which will officially come in 2.8
soon. Or you can try it in "Hadoop 3 alpha1" release too if its fine to
check. YARN-3214 <https://issues.apache.org/jira/browse/YARN-3214> has the
details for the nodelabel sharing concept.



Thanks

Sunil



On Wed, Oct 5, 2016 at 8:14 PM Frank Luo <j...@merkleinc.com> wrote:

Sunil, thanks for responding.



So is there any way to dedicate one kind of jobs to certain machines, then
having those machines be shared if no dedicated job running?



*From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
*Sent:* Wednesday, October 05, 2016 12:50 AM
*To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org;
u...@yarn.apache.org


*Subject:* Re: how to add a shareable node label?



Hi Frank,



As far as I checked, all labels are "exclusive" in 2.7. In upcoming 2.8
release, we can get "non-exclusive" or sharable node labels.



Thanks

Sunil



On Wed, Oct 5, 2016 at 8:40 AM Frank Luo <j...@merkleinc.com> wrote:

I am using Hadoop 2.7.3, when I run:

$ yarn rmadmin -addToClusterNodeLabels "Label1(exclusive=false)"



I got an error as:

… addToClusterNodeLabels: java.io.IOException: label name should only
contains {0-9, a-z, A-Z, -, _} and should not started with {-,_}



If I just use “Label1”, it will work fine, but I want a shareable one.



Anyone knows a better way to do it?


Re: how to add a shareable node label?

2016-10-06 Thread Sunil Govind
Hi Frank
I think as of today this is not possible. You could try out the
"non-exclusive" feature of node labels, which will officially come in 2.8
soon, or try it on the "Hadoop 3 alpha1" release if that is fine to check.
YARN-3214 <https://issues.apache.org/jira/browse/YARN-3214> has the details
of the node-label sharing concept.
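
For reference, once on a release with YARN-3214, the exclusivity flag is
declared when adding the label (syntax per the 2.8 node-labels documentation;
it is not accepted by 2.7.3, as you saw):

  yarn rmadmin -addToClusterNodeLabels "Label1(exclusive=false)"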

Thanks
Sunil

On Wed, Oct 5, 2016 at 8:14 PM Frank Luo <j...@merkleinc.com> wrote:

> Sunil, thanks for responding.
>
>
>
> So is there any way to dedicate one kind of jobs to certain machines, then
> having those machines be shared if no dedicated job running?
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Wednesday, October 05, 2016 12:50 AM
> *To:* Frank Luo <j...@merkleinc.com>; user@hadoop.apache.org;
> u...@yarn.apache.org
>
>
> *Subject:* Re: how to add a shareable node label?
>
>
>
> Hi Frank,
>
>
>
> As far as I checked, all labels are "exclusive" in 2.7. In upcoming 2.8
> release, we can get "non-exclusive" or sharable node labels.
>
>
>
> Thanks
>
> Sunil
>
>
>
> On Wed, Oct 5, 2016 at 8:40 AM Frank Luo <j...@merkleinc.com> wrote:
>
> I am using Hadoop 2.7.3, when I run:
>
> $ yarn rmadmin -addToClusterNodeLabels "Label1(exclusive=false)"
>
>
>
> I got an error as:
>
> … addToClusterNodeLabels: java.io.IOException: label name should only
> contains {0-9, a-z, A-Z, -, _} and should not started with {-,_}
>
>
>
> If I just use “Label1”, it will work fine, but I want a shareable one.
>
>
>
> Anyone knows a better way to do it?
>
>


Re: how to add a shareable node label?

2016-10-04 Thread Sunil Govind
Hi Frank,

As far as I checked, all labels are "exclusive" in 2.7. In the upcoming 2.8
release, we can get "non-exclusive" or sharable node labels.

Thanks
Sunil

On Wed, Oct 5, 2016 at 8:40 AM Frank Luo  wrote:

> I am using Hadoop 2.7.3, when I run:
>
> $ yarn rmadmin -addToClusterNodeLabels "Label1(exclusive=false)"
>
>
>
> I got an error as:
>
> … addToClusterNodeLabels: java.io.IOException: label name should only
> contains {0-9, a-z, A-Z, -, _} and should not started with {-,_}
>
>
>
> If I just use “Label1”, it will work fine, but I want a shareable one.
>
>
>
> Anyone knows a better way to do it?
>
>


Re: ACCEPTED: waiting for AM container to be allocated, launched and register with RM

2016-08-22 Thread Sunil Govind
HI Ram

The RM log looks fine, and as per the config, the RM scheduler is listening
on 8030 itself. I am not very sure about the oozie-side config you
mentioned; I suggest you check and debug further on that end. I will also
let other community folks pitch in if they have another opinion.

Thanks
Sunil

On Mon, Aug 22, 2016 at 8:57 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> any thoughts from the logs and config I have shared?
>
> On Aug 21, 2016 8:32 AM, "rammohan ganapavarapu" <rammohanga...@gmail.com>
> wrote:
>
>> so in job.properties what is the jobtracker property, is it RM ip: port
>> or scheduler port which is 8030, if I use 8030 I am getting unknown
>> protocol proto buffer error.
>>
>> On Aug 21, 2016 7:37 AM, "Sunil Govind" <sunil.gov...@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> It seems its an oozie issue. From conf, RM scheduler is running at port
>>> 8030.
>>> But your job.properties is taking 8032. I suggest you could double
>>> confirm your oozie configuration and see the configurations are intact to
>>> contact RM. Sharing a link also
>>>
>>> https://discuss.zendesk.com/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow
>>>
>>> Thanks
>>> Sunil
>>>
>>>
>>> On Sun, Aug 21, 2016 at 8:41 AM rammohan ganapavarapu <
>>> rammohanga...@gmail.com> wrote:
>>>
>>>> Please find the attached config that i got from yarn ui and  AM,RM
>>>> logs. I only see that connecting to 0.0.0.0:8030 when i submit job
>>>> using oozie, but if i submit as yarn jar its working fine as i posted in my
>>>> previous posts.
>>>>
>>>> Here is my oozie job.properties file, i have a java class that just
>>>> prints
>>>>
>>>> nameNode=hdfs://master01:8020
>>>> jobTracker=master01:8032
>>>> workflowName=EchoJavaJob
>>>> oozie.use.system.libpath=true
>>>>
>>>> queueName=default
>>>> hdfsWorkflowHome=/user/uap/oozieWorkflows
>>>>
>>>> workflowPath=${nameNode}${hdfsWorkflowHome}/${workflowName}
>>>> oozie.wf.application.path=${workflowPath}
>>>>
>>>> Please let me know if you guys find any clue why its trying to connect
>>>> to 0.0.0.:8030.
>>>>
>>>> Thanks,
>>>> Ram
>>>>
>>>>
>>>> On Fri, Aug 19, 2016 at 11:54 PM, Sunil Govind <sunil.gov...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ram
>>>>>
>>>>> From the console log, as Rohith said, AM is looking for AM at 8030. So
>>>>> pls confirm the RM port once.
>>>>> Could you please share AM and RM logs.
>>>>>
>>>>> Thanks
>>>>> Sunil
>>>>>
>>>>> On Sat, Aug 20, 2016 at 10:36 AM rammohan ganapavarapu <
>>>>> rammohanga...@gmail.com> wrote:
>>>>>
>>>>>> yes, I did configured.
>>>>>>
>>>>>> On Aug 19, 2016 7:22 PM, "Rohith Sharma K S" <
>>>>>> ksrohithsha...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> From below discussion and AM logs, I see that AM container has
>>>>>>> launched but not able to connect to RM.
>>>>>>>
>>>>>>> This looks like your configuration issue. Would you check your
>>>>>>> job.xml jar that does *yarn.resourcemanager.scheduler.address *has
>>>>>>> been configured?
>>>>>>>
>>>>>>> Essentially, this address required by MRAppMaster for connecting to
>>>>>>> RM for heartbeats. If you don’t not configure, default value will be 
>>>>>>> taken
>>>>>>> i.e 8030.
>>>>>>>
>>>>>>>
>>>>>>> Thanks & Regards
>>>>>>> Rohith Sharma K S
>>>>>>>
>>>>>>> On Aug 20, 2016, at 7:02 AM, rammohan ganapavarapu <
>>>>>>> rammohanga...@gmail.com> wrote:
>>>>>>>
>>>>>>> Even if  the cluster dont have enough resources it should connect to
>>>>>>> "
>>>>>>>
>>>>>>> /0.0.0.0:8030" right? it should connect to 

Re: ACCEPTED: waiting for AM container to be allocated, launched and register with RM

2016-08-21 Thread Sunil Govind
Hi.

It seems it's an oozie issue. From the conf, the RM scheduler is running at
port 8030, but your job.properties is using 8032. I suggest you double-check
your oozie configuration and make sure the settings used to contact the RM
are intact. Sharing a link also:
https://discuss.zendesk.com/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow
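
If the AM keeps dialing 0.0.0.0:8030, it usually means the job configuration
never picked up the scheduler address; a sketch of the yarn-site.xml entry to
double-check (host taken from the job.properties below):

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master01:8030</value>
  </property>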

Thanks
Sunil


On Sun, Aug 21, 2016 at 8:41 AM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Please find the attached config that i got from yarn ui and  AM,RM logs. I
> only see that connecting to 0.0.0.0:8030 when i submit job using oozie,
> but if i submit as yarn jar its working fine as i posted in my previous
> posts.
>
> Here is my oozie job.properties file, i have a java class that just prints
>
> nameNode=hdfs://master01:8020
> jobTracker=master01:8032
> workflowName=EchoJavaJob
> oozie.use.system.libpath=true
>
> queueName=default
> hdfsWorkflowHome=/user/uap/oozieWorkflows
>
> workflowPath=${nameNode}${hdfsWorkflowHome}/${workflowName}
> oozie.wf.application.path=${workflowPath}
>
> Please let me know if you guys find any clue why its trying to connect to
> 0.0.0.:8030.
>
> Thanks,
> Ram
>
>
> On Fri, Aug 19, 2016 at 11:54 PM, Sunil Govind <sunil.gov...@gmail.com>
> wrote:
>
>> Hi Ram
>>
>> From the console log, as Rohith said, AM is looking for AM at 8030. So
>> pls confirm the RM port once.
>> Could you please share AM and RM logs.
>>
>> Thanks
>> Sunil
>>
>> On Sat, Aug 20, 2016 at 10:36 AM rammohan ganapavarapu <
>> rammohanga...@gmail.com> wrote:
>>
>>> yes, I did configured.
>>>
>>> On Aug 19, 2016 7:22 PM, "Rohith Sharma K S" <ksrohithsha...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> From below discussion and AM logs, I see that AM container has launched
>>>> but not able to connect to RM.
>>>>
>>>> This looks like your configuration issue. Would you check your job.xml
>>>> jar that does *yarn.resourcemanager.scheduler.address *has been
>>>> configured?
>>>>
>>>> Essentially, this address required by MRAppMaster for connecting to RM
>>>> for heartbeats. If you don’t not configure, default value will be taken i.e
>>>> 8030.
>>>>
>>>>
>>>> Thanks & Regards
>>>> Rohith Sharma K S
>>>>
>>>> On Aug 20, 2016, at 7:02 AM, rammohan ganapavarapu <
>>>> rammohanga...@gmail.com> wrote:
>>>>
>>>> Even if  the cluster dont have enough resources it should connect to "
>>>>
>>>> /0.0.0.0:8030" right? it should connect to my , not sure why 
>>>> its trying to connect to 0.0.0.0:8030.
>>>>
>>>> I have verified the config and i removed traces of 0.0.0.0 still no luck.
>>>>
>>>> org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at 
>>>> /0.0.0.0:8030
>>>>
>>>> If an one has any clue please share.
>>>>
>>>> Thanks,
>>>>
>>>> Ram
>>>>
>>>>
>>>>
>>>> On Fri, Aug 19, 2016 at 2:32 PM, rammohan ganapavarapu <
>>>> rammohanga...@gmail.com> wrote:
>>>>
>>>>> When i submit a job using yarn its seems working only with oozie its
>>>>> failing i guess, not sure what is missing.
>>>>>
>>>>> yarn jar
>>>>> /uap/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi
>>>>> 20 1000
>>>>> Number of Maps  = 20
>>>>> Samples per Map = 1000
>>>>> .
>>>>> .
>>>>> .
>>>>> Job Finished in 19.622 seconds
>>>>> Estimated value of Pi is 3.1428
>>>>>
>>>>> Ram
>>>>>
>>>>> On Fri, Aug 19, 2016 at 11:46 AM, rammohan ganapavarapu <
>>>>> rammohanga...@gmail.com> wrote:
>>>>>
>>>>>> Ok, i have used yarn-utils.py to get the correct values for my
>>>>>> cluster and update those properties and restarted RM and NM but still no
>>>>>> luck not sure what i am missing, any other insights will help me.
>>>>>>
>>>>>> Below are my properties from yarn-site.xml and map-site.xml.
>>>>>>
>>>>>> python yarn-utils.py -c 24 -m 63 -d 3 -k False
>>>>>>  Using cor

Re: ACCEPTED: waiting for AM container to be allocated, launched and register with RM

2016-08-20 Thread Sunil Govind
>>>> <property>
>>>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>>>   <value>61440</value>
>>>> </property>
>>>>
>>>>
>>>> Ram
>>>>
>>>> On Thu, Aug 18, 2016 at 11:14 PM, tkg_cangkul <yuza.ras...@gmail.com>
>>>> wrote:
>>>>
>>>>> maybe this link can be some reference to tune up the cluster:
>>>>>
>>>>>
>>>>> http://jason4zhu.blogspot.co.id/2014/10/memory-configuration-in-hadoop.html
>>>>>
>>>>>
>>>>> On 19/08/16 11:13, rammohan ganapavarapu wrote:
>>>>>
>>>>> Do you know what properties to tune?
>>>>>
>>>>> Thanks,
>>>>> Ram
>>>>>
>>>>> On Thu, Aug 18, 2016 at 9:11 PM, tkg_cangkul <yuza.ras...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> i think that's because you don't have enough resource.  u can tune
>>>>>> your cluster config to maximize your resource.
>>>>>>
>>>>>>
>>>>>> On 19/08/16 11:03, rammohan ganapavarapu wrote:
>>>>>>
>>>>>> I dont see any thing odd except this not sure if i have to worry
>>>>>> about it or not.
>>>>>>
>>>>>> 2016-08-19 03:29:26,621 INFO [main]
>>>>>> org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /
>>>>>> 0.0.0.0:8030
>>>>>> 2016-08-19 03:29:27,646 INFO [main] org.apache.hadoop.ipc.Client:
>>>>>> Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0
>>>>>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>>> 2016-08-19 03:29:28,647 INFO [main] org.apache.hadoop.ipc.Client:
>>>>>> Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1
>>>>>> time(s); retry policy is 
>>>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>>>>>> sleepTime=1000 MILLISECONDS)
>>>>>>
>>>>>>
>>>>>> its keep printing this log ..in app container logs.
>>>>>>
>>>>>> On Thu, Aug 18, 2016 at 8:20 PM, tkg_cangkul <yuza.ras...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> maybe u can check the logs from port 8088 on your browser. that was
>>>>>>> RM UI. just choose your job id and then check the logs.
>>>>>>>
>>>>>>> On 19/08/16 10:14, rammohan ganapavarapu wrote:
>>>>>>>
>>>>>>> Sunil,
>>>>>>>
>>>>>>> Thanks you for your input, below are my server metrics for RM. Also
>>>>>>> attached RM UI for capacity scheduler resources. How else i can find?
>>>>>>>
>>>>>>> {
>>>>>>>   "name":
>>>>>>> "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root",
>>>>>>>   "modelerType": "QueueMetrics,q0=root",
>>>>>>>   "tag.Queue": "root",
>>>>>>>   "tag.Context": "yarn",
>>>>>>>   "tag.Hostname": "hadoop001",
>>>>>>>   "running_0": 0,
>>>>>>>   "running_60": 0,
>>>>>>>   "running_300": 0,
>>>>>>>   "running_1440": 0,
>>>>>>>   "AppsSubmitted": 1,
>>>>>>>   "AppsRunning": 0,
>>>>>>>   "AppsPending": 0,
>>>>>>>   "AppsCompleted": 0,
>>>>>>>   "AppsKilled": 0,
>>>>>>>   "AppsFailed": 1,
>>>>>>>   "AllocatedMB": 0,
>>>>>>>   "AllocatedVCores": 0,
>>>>>>>   "AllocatedContainers": 0,
>>>>>>>   "AggregateContainersAllocated": 2,
>>>>>>>   "AggregateContainersReleased": 2,
>>>>>>>   "AvailableMB": 64512,
>>>>>>>   "AvailableVCores": 24,
>>>>>>>   "PendingMB": 0,
>>>>>>>   "PendingVCores": 0,
>>>>>>>   "PendingContainers": 0,
>>>>>>>   "ReservedMB": 0,
>>>>>>>   "ReservedVCores": 0,
>>>>>>>   "ReservedContainers": 0,
>>>>>>>   "ActiveUsers": 0,
>>>>>>>   "ActiveApplications": 0
>>>>>>> },
>>>>>>>
>>>>>>> On Thu, Aug 18, 2016 at 6:49 PM, Sunil Govind <
>>>>>>> sunil.gov...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> It could be because of many of reasons. Also I am not sure about
>>>>>>>> which scheduler your are using, pls share more details such as RM log 
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> I could point out few reasons
>>>>>>>>  - Such as "Not enough resource is cluster" can cause this
>>>>>>>>  - If using Capacity Scheduler, if queue capacity is maxed out,
>>>>>>>> such case can happen.
>>>>>>>>  - Similarly if max-am-resource-percent is crossed per queue level,
>>>>>>>> then also AM container may not be launched.
>>>>>>>>
>>>>>>>> you could check RM log to get more information if AM container is
>>>>>>>> laucnhed.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Sunil
>>>>>>>>
>>>>>>>> On Fri, Aug 19, 2016 at 5:37 AM rammohan ganapavarapu <
>>>>>>>> rammohanga...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> When i submit a MR job, i am getting this from AM UI but it never
>>>>>>>>> get finished, what am i missing ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ram
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -
>>>>>>> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
>>>>>>> For additional commands, e-mail: user-h...@hadoop.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>


Re: ACCEPTED: waiting for AM container to be allocated, launched and register with RM

2016-08-18 Thread Sunil Govind
Hi

It could be because of many reasons. Also, I am not sure which scheduler you
are using; please share more details such as the RM log etc.

I could point out a few reasons:
 - "Not enough resources in cluster" can cause this.
 - If using the Capacity Scheduler, this can happen when the queue capacity
is maxed out.
 - Similarly, if max-am-resource-percent is crossed at the queue level, the
AM container may not be launched (a sketch of that knob follows the list).
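
A sketch of that knob in capacity-scheduler.xml (the value is illustrative;
it can also be overridden per queue):

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <!-- 0.2 = up to 20% of resources may be used by ApplicationMasters -->
    <value>0.2</value>
  </property>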

You could check the RM log to get more information on whether the AM
container is launched.

Thanks
Sunil

On Fri, Aug 19, 2016 at 5:37 AM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Hi,
>
> When i submit a MR job, i am getting this from AM UI but it never get
> finished, what am i missing ?
>
> Thanks,
> Ram
>


Re: How to distcp data between two clusters which are not in the same local network?

2016-08-15 Thread Sunil Govind
Hi

I think you can also refer to the link below:
http://aajisaka.github.io/hadoop-project/hadoop-distcp/DistCp.html

Thanks
Sunil

On Mon, Aug 15, 2016 at 7:26 PM Wei-Chiu Chuang  wrote:

> Hello,
> if I understand your question correctly, you are actually building a
> multi-home Hadoop, correct?
> Multi-homed Hadoop clusters can be tricky to set up, to the extent that
> Cloudera does not recommend them. I've not set up a multihomed Hadoop cluster
> before, but I think you have to make sure the reverse resolution works for
> the IP addresses.
>
>
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
>
>
> On Mon, Aug 15, 2016 at 1:06 AM, Shady Xu  wrote:
>
>> Hi all,
>>
>> Recently I tried to use distcp to copy data across two clusters which are
>> not in the same local network. Fortunately, the nodes of the source cluster
>> each has an extra interface and ip which can be accessed from the
>> destination cluster. But during the process of distcp, the map tasks always
>> used the local ip of the source cluster nodes which they cannot reach.
>>
>> I tried changing the property 'dfs.datanode.dns.interface' to the one I
>> want, and I tried changing the property '
>> dfs.datanode.use.datanode.hostname' to true too. Nothing works.
>>
>> Does hadoop now support this or do I miss something?
>>
>
>


Re: Yarn web UI shows more memory used than actual

2016-08-15 Thread Sunil Govind
Hi Suresh

"This 'memory used' would be the memory used by all containers running on
that node"
>> "Memory Used" in Nodes page indicates how memory is used in all the node
managers with respect to the corresponding demand made to RM. For eg, if
application has asked for 4GB resource and if its really using only 2GB,
then this kind of difference can be shown (one possibility). Which means
4GB will be displayed in Node page.

As Ray has mentioned, if the demand for resources from the AM itself is
high, or the JVM size for containers is configured high (through java opts),
there is a chance that containers take more than you intended, and the UI
will display the higher value.

Thanks
Sunil

On Sun, Aug 14, 2016 at 6:35 AM Suresh V  wrote:

> Hello Ray,
>
> I'm referring to the nodes of the cluster page, which shows the individual
> nodes and the total memory available in each node and the memory used in
> each node.
>
> This 'memory used' would be the memory used by all containers running on
> that node; however, if I check free command in the node, there is
> significant difference. I'm unable to understand this...
>
> Appreciate any light into this. I agree the main RM page shows the total
> containers memory utilization across nodes., which is matching the sum of
> memory used in each nodes as displayed in the 'nodes of the cluster' page...
>
> Thank you
> Suresh.
>
>
> Suresh V
> http://www.justbirds.in
>
>
> On Sat, Aug 13, 2016 at 12:44 PM, Ray Chiang  wrote:
>
>> The RM page will show the combined container memory usage.  If you have a
>> significant difference between any or all of
>>
>> 1) actual process memory usage
>> 2) JVM heap size
>> 3) container maximum
>>
>> then you will have significant memory underutilization.
>>
>> -Ray
>>
>>
>> On 20160813 6:31 AM, Suresh V wrote:
>>
>> Hello,
>>
>> In our cluster when a MR job is running, in the 'Nodes of the cluster'
>> page, it shows the memory used as 84GB out of 87GB allocated to yarn
>> nodemanagers.
>> However when I actually do a top or free command while logged in to the
>> node, it shows as only 23GB used and about 95GB or more free.
>>
>> I would imagine the memory used displayed in the Yarn web UI should match
>> the memory used shown by top or free command on the node.
>>
>> Please advise if this is right thinking or am I missing something?
>>
>> Thank you
>> Suresh.
>>
>>
>>
>>
>


Re: All nodes are not used

2016-08-09 Thread Sunil Govind
HI Madhav

Could you share some more information here? When you say a few nodes are not
utilized, is it always the same nodes?

Also, how long does each of these containers run on average? Please make sure
you have provided a large enough split size to ensure the containers are not
short-running.
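
For example, a hedged sketch of enlarging splits in the driver (the 256MB
value is illustrative; FileInputFormat is the new-API
org.apache.hadoop.mapreduce.lib.input one):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

  Configuration conf = new Configuration();
  Job job = Job.getInstance(conf, "MeanChiSquareDistanceCalculation");
  // Larger minimum split size => fewer, longer-running map tasks.
  FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);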

Thanks
Sunil

On Tue, Aug 9, 2016 at 4:49 AM Madhav Sharan  wrote:

> Hi Hadoop users,
>
> I am running a m/r job with an input file of 23 million records. I can see
> that not all of our nodes are getting used.
>
> What can I change to utilize all nodes?
>
>
> Containers Mem Used Mem Avail Vcores used Vcores avail
> 8 11.25 GB 0 B 8 0
> 0 0 B 11.25 GB 0 8
> 0 0 B 11.25 GB 0 8
> 8 11.25 GB 0 B 8 0
> 8 11.25 GB 0 B 8 0
> 7 11.25 GB 0 B 7 1
> 5 7.03 GB 4.22 GB 5 3
> 0 0 B 11.25 GB 0 8
> 0 0 B 11.25 GB 0 8
>
>
> My command looks like -
>
> hadoop jar
> target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar
> gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation 
> /user/pts/output/MeanChiSquareAndSimilarityInput
> /user/pts/output/MeanChiSquaredCalcOutput
>
> Directory - */user/pts/output/MeanChiSquareAndSimilarityInput* have a
> input file of 23 m records. File size is ~3 GB
>
> Code -
> https://github.com/smadha/pooled_time_series/blob/master/src/main/java/gov/nasa/jpl/memex/pooledtimeseries/MeanChiSquareDistanceCalculation.java#L135
>
>
> --
> Madhav Sharan
>
>


Re: AM Container exits with code 2

2016-07-29 Thread Sunil Govind
Hi Rahul,
From the given log, I do not think YARN is killing the container due to a
memory issue; usage is under the limits. However, the full log is not
shared, so please verify whether memory was still under the limit at the
point the AM launch failed.
Which application are you trying to run?
Also, it would help to have the "application master container" log; the
*stdout* or *stderr* of that launch will have more information.
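If log aggregation is enabled, something like the following should fetch
it (the container id and node address are placeholders for the failed
attempt):

  # all logs for the application
  yarn logs -applicationId application_1469709900068_0002

  # or only the AM container
  yarn logs -applicationId application_1469709900068_0002 \
    -containerId <am-container-id> -nodeAddress <nm-host>:<port>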

Thanks
Sunil

On Fri, Jul 29, 2016 at 12:49 PM Rahul Chhiber <
rahul.chhi...@cumulus-systems.com> wrote:

> Hi all,
>
>
>
> I have launched an application on yarn cluster which has following config.
>
> Master (Resource Manager) - 16GB RAM + 8 vCPU
>
> Slave 1 (Node manager 1) - 8GB RAM + 4 vCPU
>
>
>
> Intermittently AM(2GB, 1 core) is exiting with code - 2 with the following
> trace. I am not able to find anything about exit code 2.
>
>
>
> Last log is
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 22504 for container-id
> container_1469709900068_0002_01_01: 203.8 MB of 2 GB physical memory
> used; 2.8 GB of 4.2 GB virtual memory used
>
>
>
> Does this have anything to do with my application logic or Is it possible
> that it is killed because of exceeding the memory limits?
>
>
>
> 2016-07-28 17:08:50,672 WARN
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Exception from container-launch with container ID:
> container_1469709900068_0002_01_01 and exit code: 2
>
> ExitCodeException exitCode=2:
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
>
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
> 2016-07-28 17:08:50,674 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from
> container-launch.
>
> 2016-07-28 17:08:50,674 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id:
> container_1469709900068_0002_01_01
>
> 2016-07-28 17:08:50,674 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 2
>
> 2016-07-28 17:08:50,674 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace:
> ExitCodeException exitCode=2:
>
>
>
> Thanks,
>
> Rahul Chhiber
>
>
>


Re: WebUI's Server don't work and JobHistoryServer missing

2016-07-21 Thread Sunil Govind
Hi Mike

yarn.resourcemanager.webapp.address configures the address of the Resource
Manager web UI, which you can access at "http://<rm-host>:<port>". It runs
on port 8088 by default if you do not configure this property explicitly.
Please refer to [
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml]
for more detailed information.
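
If you do want to set it explicitly, a minimal yarn-site.xml sketch (using
your node1 as the RM host and the default port) would be:

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:8088</value>
  </property>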

I think you are not running the JobHistoryServer. Please refer to the
"Hadoop Startup" section of [
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html].
You can start the history server with the "[mapred]$
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start
historyserver" command.
If you are interested, you can also run the "timelineserver", which is the
YARN-side equivalent.
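
The history server addresses can also be pinned in mapred-site.xml if the
defaults do not suit you; a sketch (node1 is your master, ports are the
usual defaults):

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
  </property>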

Also, you might need to run the WebAppProxy server to see the MapReduce job
application master UI. The cluster setup link above has detailed
information on starting the various services.

Thanks
Sunil


On Tue, Jul 19, 2016 at 1:59 PM Mike Wenzel  wrote:

> My cluster looks like:
>
>
>
> Node1 - NameNode + ResourceManager
>
> Node2 - SecondaryNameNode
>
> Node3 - DataNode (+NodeManager)
>
> Node4 - DataNode (+NodeManager)
>
> Node5 - DataNode (+NodeManager)
>
>
>
> http://node1:8088/cluster works.
>
>
>
> My problems:
>
> > SecondaryNamenode WebUI: http://node2:50090 doesn’t work (
> ERR_EMPTY_RESPONSE).
>
> > Datanode WebUI: http://node3:50075 looks like there is something wrong:
> https://i.imgur.com/dbTxtkS.png
>
> > Also I’m missing the JobHistory Server. Isn’t the jobhistory server
> included in yarn? I tried accessing some URLs I found on the web declared
> as “default configuration” and always (ERR_EMPTY_RESPONSE).
>
> > When I run a mapreduce job the output still says: “uri to track the job
> is httpd://localhost:8080” and when I try to access any node on this port I
> get no data (ERR_EMPTY_RESPONSE).
>
>
>
> Unfortunately I couldn’t get this problem solved. I only found a few
> guides setting yarn up. I tried them all and my situation only got worse
> when adding more properties on yarn-site.xml. E.g. the cluster
> overview-WebUI http://node1:8088/cluster didn’t worked anymore when
> adding “yarn.resourcemanager.webapp.address” to yarn-site.xml.
>
> Can anyone help me out configuring yarn correctly please?
>
>
>
> My configuration:
>
>
>
> core-site.xml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>file:///hdfs/tmp</value>
>   </property>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://node1:54310</value>
>   </property>
> </configuration>
>
>
>
> hdfs-site.xml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:///hdfs/name</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address</name>
>     <value>node1:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.checkpoint.dir</name>
>     <value>file:///hdfs/checkpoint</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.secondary.http-address</name>
>     <value>node2:50090</value>
>   </property>
>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:///hdfs/data</value>
>   </property>
> </configuration>
>
>
>
> yarn-site.xml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>yarn.resourcemanager.hostname</name>
>     <value>node1</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.resource-tracker.address</name>
>     <value>node1:8031</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.scheduler.address</name>
>     <value>node1:8030</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.address</name>
>     <value>node1:8032</value>
>   </property>
> </configuration>
>
>
>
> masters
>
> node2
>
>
>
> slaves
>
> node3
>
> node4
>
> node5
>


Re: Restart number of vcores in YARN

2016-07-17 Thread Sunil Govind
Hi Alvaro

yarn.nodemanager.resource.cpu-vcores is the property that configures the
vcores, and one of the options to refresh this value is to restart the
NodeManager. However, you also need to ensure that the changed
"yarn-site.xml" is on the NodeManager's classpath: please check that
$HADOOP_CONF_DIR is correctly set and contains the updated yarn-site.xml.
See the sketch below.
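
A sketch of the yarn-site.xml entry plus the restart (the vcore count is a
placeholder for your hardware):

  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>

  # on each node, after updating the file under $HADOOP_CONF_DIR
  $HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
  $HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager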


Thanks
Sunil

On Fri, Jul 15, 2016 at 7:42 PM Alvaro Brandon 
wrote:

> Hello everyone:
>
> I've changed yarn.nodemanager.resource.cpu-vcores in my yarn-site.xml
> configuration file and restarted all the yarn and hdfs services. However
> the nodes doesn't reflect this change in the number of available virtual
> cores, at least when I query the resource manager API. How can you refresh
> this yarn.nodemanager.resource.cpu-vcores in the cluster?
>
> Thanks in advance
>


Re: YARN application start event

2016-07-10 Thread Sunil Govind
Hi Alvaro
As far as I know, YARN does not support registering custom listeners from
the user end (for any events).
Through the REST APIs, you could poll for the STATE of the application
(http://{ip:port}/ws/v1/cluster/apps/{appid}) given the app id. If the app
state is RUNNING, you can consider the application started; see the sketch
below.
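
A minimal polling sketch using the Java client API (equivalent to the REST
check; the application id string is whatever id your submission returned,
and yarn-site.xml is assumed to be on the classpath):

  import org.apache.hadoop.yarn.api.records.ApplicationId;
  import org.apache.hadoop.yarn.api.records.ApplicationReport;
  import org.apache.hadoop.yarn.api.records.YarnApplicationState;
  import org.apache.hadoop.yarn.client.api.YarnClient;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;
  import org.apache.hadoop.yarn.util.ConverterUtils;

  public class AppStartWatcher {
    public static void main(String[] args) throws Exception {
      YarnClient client = YarnClient.createYarnClient();
      client.init(new YarnConfiguration());
      client.start();
      ApplicationId appId = ConverterUtils.toApplicationId(args[0]);
      // Poll until the application reaches RUNNING; YARN offers no push event here.
      while (true) {
        ApplicationReport report = client.getApplicationReport(appId);
        if (report.getYarnApplicationState() == YarnApplicationState.RUNNING) {
          System.out.println("Application started: " + appId);
          break;
        }
        Thread.sleep(1000);
      }
      client.stop();
    }
  }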

Thanks
Sunil

On Thu, Jul 7, 2016 at 3:12 PM Alvaro Brandon 
wrote:

> Hello everyone:
>
> I was wondering if there is any way to capture the event of an application
> starting in YARN. The idea is to implement a Listener that every time a
> YARN application starts, will query the REST API to get the current memory
> and cores availables in the cluster. Any ideas on this?
>
> Thanks in advance,
>
> Alvaro
>


Re: YARN cluster underutilization

2016-06-22 Thread Sunil Govind
Hi

The input split size was increased to make the containers run longer and
process more data each, so the slow container allocation rate is no longer
a problem (since all container requests come without data locality). It is
also better to keep more memory for the AM container when it handles 600k+
requests. And each mapper emits data directly to disk, as mentioned by
Jeff, after applying some filters.

Thanks
Sunil

On Tue, Jun 21, 2016 at 12:07 PM Deepak Goel <deic...@gmail.com> wrote:

> Pretty nice. However why would swapping to disk happen when there is
> enough physical memory available..
>
> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
>
>--
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>
> On Sat, May 28, 2016 at 1:31 AM, Guttadauro, Jeff <
> jeff.guttada...@here.com> wrote:
>
>> Hi, all.
>>
>>
>>
>> Just wanted to provide an update, which is that I’m finally getting good
>> YARN cluster utilization (consistently within the 90-100% range!).  I
>> believe the biggest change was to increase the min split size.  Since our
>> input is all in S3 and data locality is not really an issue, I bumped it up
>> to 2G to minimize the impact of allocation/deallocation of container
>> resources, since each container will be up working for longer, so that now
>> occurs less frequently.
>>
>>
>>
>>   <property><name>mapreduce.input.fileinputformat.split.minsize</name>
>>   <value>2147483648</value></property>
>>
>>
>>
>> Not sure how much impact the following changes had, since they were made
>> at the same time.  Everything’s humming along now though, so I’m going to
>> leave them.
>>
>>
>>
>> I also reduced the node heartbeat interval from 1000ms down to 500ms 
>> ("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms":
>> "500" in cluster configuration JSON), since I’m told that NodeManager
>> will only allocate 1 container per node per heartbeat when dealing with
>> non-localized data, like we are since it’s in S3.  I also doubled the
>> memory given to the YARN Resource Manager from the default for the
>> m3.xlarge node type I’m using ("YARN_RESOURCEMANAGER_HEAPSIZE": "5120"
>> in cluster configuration JSON).
>>
>>
>>
>> Thanks again to Sunil and Shubh (and my colleague, York) for the helpful
>> guidance!
>>
>>
>>
>> Take care,
>>
>> -Jeff
>>
>>
>>
>> *From:* Shubh hadoopExp [mailto:shubhhadoop...@gmail.com]
>> *Sent:* Wednesday, May 25, 2016 11:08 PM
>> *To:* Guttadauro, Jeff <jeff.guttada...@here.com>
>> *Cc:* Sunil Govind <sunil.gov...@gmail.com>; user@hadoop.apache.org
>>
>> *Subject:* Re: YARN cluster underutilization
>>
>>
>>
>> Hey,
>>
>>
>>
>> OFFSWITCH allocation means if the data locality is maintained or not. It
>> has no relation with heartbeat! Heartbeat is just used to clear the
>> pipelining of Container request.
>>
>>
>>
>> -Shubh
>>
>>
>>
>>
>>
>> On May 25, 2016, at 3:30 PM, Guttadauro, Jeff <jeff.guttada...@here.com>
>> wrote:
>>
>>
>>
>> Interesting stuff!  I did not know about this handling of OFFSWITCH
>> requests.
>>
>>
>>
>> To get around this, would you recommend reducing the heartbeat interval,
>> perhaps to 250ms to get a 4x improvement in container allocation rate (or
>> is it not quite as simple as that)?  Maybe doing this in combination with
>> using a greater number of smaller nodes would help?  Would overloading the
>> ResourceManager be a concern if doing that?  Should I bump up the
>> “YARN_RESOURCEMANAGER_HEAPSIZE” configuration property (current default for
>> m3.xlarge is 2396M), or would you suggest any other knobs to turn to help
>> RM handle it?
>>
>>
>>
>> Thanks again for all your help, Sunil!
>>
>>
>>
>> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com
>> <sunil.gov...@gmail.com>]
>> *Sent:* Wednesday, May 25, 2016 1:07 PM
>> *To:* Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
>> *Subject:* Re: YARN cluster underutilization
>>
>>
>>
>> Hi Jeff,
>>
>>
>>
>>  I do see the yar

Re: maximum-am-resource-percent is insufficient to start a single application

2016-06-15 Thread Sunil Govind
Adding to what Varun has said, the Resource Manager log will be of help
here to confirm the same.

The code snippet you have quoted is correct, but note that when the number
of active applications is less than 1, the check is skipped (only a warning
is logged), so at least one application can always start. And it seems you
have only one application.

- Sunil



On Wed, Jun 15, 2016 at 12:27 PM Varun saxena <varun.sax...@huawei.com>
wrote:

> Can you open the Resource Manager(RM) UI and share screenshot of main RM
> page. We can check cluster resources there. Most probably cluster does not
> have enough resources.
>
> How much memory and VCores does your AM need ?
>
> RM UI can be accessed at http://localhost:8088/
>
>
>
> - Varun Saxena.
>
>
>
> *From:* Phillip Wu [mailto:phillip...@unsw.edu.au]
> *Sent:* 15 June 2016 14:42
> *To:* user@hadoop.apache.org
> *Cc:* Sunil Govind
> *Subject:* RE: maximum-am-resource-percent is insufficient to start a
> single application
>
>
>
> Sunil,
>
>
>
> Thanks for your email.
>
>
>
> 1.   I don’t think anything on the cluster is being used – see below
>
> I’m not sure how to get my “total cluster resource size” – please advise
> how to get this?
>
> After doing the hive insert I get this:
>
> hduser@ip-10-118-112-182:/$ hadoop queue -info default -showJobs
>
> 16/06/10 02:24:49 INFO client.RMProxy: Connecting to ResourceManager at /
> 127.0.0.1:8050
>
> ==
>
> Queue Name : default
>
> Queue State : running
>
> Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0,
> CurrentCapacity: 0.0
>
> Total jobs:1
>
>   JobId  State   StartTime
> UserName   Queue  Priority   UsedContainers
> RsvdContainers  UsedMem RsvdMem NeededMem AM info
>
> job_1465523894946_0001   PREP   1465524072194
>  hduser defaultNORMAL0
> 0   0M  0M0M
> http://localhost:8088/proxy/application_1465523894946_0001/
>
>
>
> hduser@ip-10-118-112-182:/$ mapred job -status  job_1465523894946_0001
>
> Job: job_1465523894946_0001
>
> Job File:
> /tmp/hadoop-yarn/staging/hduser/.staging/job_1465523894946_0001/job.xml
>
> Job Tracking URL :
> http://localhost:8088/proxy/application_1465523894946_0001/
>
> Uber job : false
>
> Number of maps: 0
>
> Number of reduces: 0
>
> map() completion: 0.0
>
> reduce() completion: 0.0
>
> Job state: PREP
>
> retired: false
>
> reason for failure:
>
> Counters: 0
>
> 2.   There are no other applications except I’m running zookeeper
>
> 3.   There is only one user
>
>
>
> For your assistance this seems to be the code generating the error
> message[…yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java]:
>
> if (!Resources.lessThanOrEqual(
>     resourceCalculator, lastClusterResource, userAmIfStarted,
>     userAMLimit)) {
>   if (getNumActiveApplications() < 1) {
>     LOG.warn("maximum-am-resource-percent is insufficient to start a" +
>         " single application in queue for user, it is likely set too low." +
>         " skipping enforcement to allow at least one application to start");
>   } else {
>     LOG.info("not starting application as amIfStarted exceeds " +
>         "userAmLimit");
>     continue;
>   }
> }
>
>
>
> Any ideas?
>
>
>
> Phillip
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com
> <sunil.gov...@gmail.com>]
> *Sent:* Wednesday, 15 June 2016 4:24 PM
> *To:* Phillip Wu; user@hadoop.apache.org
> *Subject:* Re: maximum-am-resource-percent is insufficient to start a
> single application
>
>
>
> Hi Philip
>
>
>
> Higher maximum-am-resource-percent value (0~1) will help to allocate more
> resource for your ApplicationMaster container of a yarn application (MR
> Jobs here), but also depend on the capacity configured for the queue. You
> have mentioned that there is only default queue here, so that wont be a
> problem. Few questions:
>
> - How much is your total cluster resource size and how much of cluster
> resource is used now ?
>
> - Is there any other application were running in cluster and whether
> it was taking full cluster resource.? This is a possibility since you now
> gave whole queue's capacity for AM containers.
>
> - Do you have multiple users in your cluster who runs appli

Re: maximum-am-resource-percent is insufficient to start a single application

2016-06-15 Thread Sunil Govind
Hi Philip

A higher maximum-am-resource-percent value (0~1) allows more resource to be
allocated to the ApplicationMaster containers of YARN applications (MR jobs
here), but the effect also depends on the capacity configured for the
queue. You have mentioned that there is only the default queue here, so
that won't be a problem. A few questions:
- What is your total cluster resource size, and how much of the cluster
resource is in use now?
- Is any other application running in the cluster, and is it taking the
full cluster resource? This is a possibility, since you have now given the
whole queue's capacity to AM containers.
- Do you have multiple users in your cluster who run applications other
than this Hive job? If so,
yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent will have
an impact on the AM resource usage limit. I think you can double-check
this.


- Sunil

On Wed, Jun 15, 2016 at 8:47 AM Phillip Wu  wrote:

> Hi,
>
>
>
> I'm new to Hadoop and Hive.
>
>
>
> I'm using Hadoop 2.6.4 (binary I got from internet) & Hive 2.0.1 (binary I
> got from internet).
>
> I can create a database and table in hive.
>
>
>
> However when I try to insert a record into a previously created table I
> get:
>
> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> maximum-am-resource-percent is insufficient to start a single application
> in queue"
>
>
>
> yarn-site.xml
>
> <property>
>   <name>yarn.resourcemanager.scheduler.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
> </property>
>
> capacity-scheduler.xml
>
> <property>
>   <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>   <value>1.0</value>
>   <description>
>     Maximum percent of resources in the cluster which can be used to run
>     application masters i.e. controls number of concurrent running
>     applications.
>   </description>
> </property>
>
>
>
> According to the documentation this means I have allocated 100% to my one
> and only default scheduler queue.
>
> [
> https://hadoop.apache.org/docs/r2.6.4/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> ]
>
> "yarn.scheduler.capacity.maximum-am-resource-percent /
> yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent
>
> Maximum percent of resources in the cluster which can be used to run
> application masters - controls number of concurrent active applications.
>
> Limits on each queue are directly proportional to their queue capacities
> and user limits.
>
> Specified as a float - ie 0.5 = 50%. Default is 10%. This can be set for
> all queues with yarn.scheduler.capacity.maximum-am-resource-percent and can
> also be overridden on a per queue basis by setting
>
> yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent"
>
>
>
> Can someone tell me how to fix this?
>


Re: Verifying the authenticity of submitted AM

2016-06-10 Thread Sunil Govind
Hi Mingyu,

Maybe you can take a look at the link below:
https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/yarn.html

It will give a fair idea of the security you can get for an application.

- Sunil

On Fri, Jun 10, 2016 at 3:54 AM Mingyu Kim  wrote:

> // forking for clarify
>
>
>
> Related to the question I had below, I’m wondering how I can verify the
> authenticity of the submitted AM. (For example, when I’m making a call to
> AM, I’d like to verify that I’m talking to the AM that I submitted, not
> someone else who hijacked my network traffic. Also, when AM makes a
> callback to a server outside YARN, I’d like to verify that it’s the AM I
> submitted, not someone else who’s spoofing) This can generally be achieved
> by sending a secret (whether that’s a one-time secret that the server
> outside YARN can verity or a SSL keystore) to AM. Do you know how one can
> securely send the secret to AM? Or, is there an existing YARN mechanism I
> can rely on to verify the authenticity? (I saw
> ApplicationReport.getClientToAMToken(), but that seems to be for AM to
> verify the authenticity of client) Again, any pointer will be appreciated.
>
>
>
> Thanks,
>
> Mingyu
>
>
>
> *From: *Rohith Sharma K S 
> *Date: *Wednesday, June 8, 2016 at 11:15 PM
> *To: *Mingyu Kim , "user@hadoop.apache.org" <
> user@hadoop.apache.org>
> *Cc: *Matt Cheah 
> *Subject: *RE: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi
>
>
>
> Do you know how I can extend the client interface of the RPC port?
>
> >>> YARN provides YARNClIent library that uses ApplicationClientProtocol.
> For your more understanding refer
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client
> 
>
>
>
> I know AM has some endpoints exposed through the RPC port for internal
> YARN communications, but was not sure how I can extend it to expose a
> custom endpoint.
>
> >>> I am not sure what you mean here internal YARN communication? AM can
> connect to RM only via AM-RM interface for register/unregister and
> heartbeat and details sent to RM are limited.  It is up to the AM’s to
> expose client interface for providing metadata.
>
> Thanks & Regards
>
> Rohith Sharma K S
>
> *From:* Mingyu Kim [mailto:m...@palantir.com]
> *Sent:* 09 June 2016 11:21
> *To:* Rohith Sharma K S; user@hadoop.apache.org
> *Cc:* Matt Cheah
> *Subject:* Re: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi Rohith,
>
>
>
> Thanks for the quick response. That sounds promising. Do you know how I
> can extend the client interface of the RPC port? I know AM has some
> endpoints exposed through the RPC port for internal YARN communications,
> but was not sure how I can extend it to expose a custom endpoint. Any
> pointer would be appreciated!
>
>
>
> Mingyu
>
>
>
> *From: *Rohith Sharma K S 
> *Date: *Wednesday, June 8, 2016 at 10:39 PM
> *To: *Mingyu Kim , "user@hadoop.apache.org" <
> user@hadoop.apache.org>
> *Cc: *Matt Cheah 
> *Subject: *RE: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi
>
>
>
> Apart from AM address and tracking URL, no other meta data of
> applicationMaster are stored in YARN. May be AM can expose client interface
> so that AM clients can interact with Running AM to retrieve specific AM
> details.
>
>
>
> RPC port of AM can be get from YARN client interface such as
> ApplicationClientProtocol# getApplicationReport() OR
> ApplicationClientProtocol #getApplicationAttemptReport().
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* Mingyu Kim [mailto:m...@palantir.com ]
> *Sent:* 09 June 2016 10:36
> *To:* user@hadoop.apache.org
> *Cc:* Matt Cheah
> *Subject:* Securely discovering Application Master's metadata or sending
> a secret to Application Master at submission
>
>
>
> Hi all,
>
>
>
> To provide a bit of background, I’m trying to deploy a REST server on
> Application Master and discover the randomly assigned port number securely.
> I can easily discover the host name of AM through YARN REST API, but the
> port number needs to be discovered separately. (Port number is assigned
> within a specified range with retries to avoid port conflicts) An easy
> solution would be to have Application Master make a callback with the port
> number, but 

Re: ResourceManager API

2016-06-10 Thread Sunil Govind
Hi Kishore

The command below may help you get some of the basic information you are
looking for:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs
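
For example (the application id is a placeholder; note that, depending on
the release, aggregated logs may only be fully available after the
application finishes):

  yarn logs -applicationId <application-id>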

Further to this, some more enhancements are happening as part of YARN-4904,
but they are not part of any release as of now.

- Sunil


On Fri, Jun 10, 2016 at 9:32 AM kishore alajangi 
wrote:

> Hi Experts,
>
> Is there a way to get the logs from resourcemanager api for running job ?
> please help me.
>
>
> --
> Sincere Regards,
> A.Kishore Kumar,
>
>


Re: Securely discovering Application Master's metadata or sending a secret to Application Master at submission

2016-06-10 Thread Sunil Govind
Hi Mingyu

Adding to what Rohith has mentioned, you can refer to the interface below
to see all the information you can get from YARN w.r.t. one application:
https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/yarn/api/records/ApplicationReport.html

This includes the RPC port of the ApplicationMaster, and you can try to
interact with the AM through it; see the sketch below. That said, it is up
to the ApplicationMaster to expose the interfaces you are looking for, and
YARN has no control over that, as mentioned by Rohith.
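
A minimal lookup sketch (the application id comes in as an argument; what
is actually served on the reported port is entirely up to the AM
implementation, YARN only transports these values):

  import org.apache.hadoop.yarn.api.records.ApplicationId;
  import org.apache.hadoop.yarn.api.records.ApplicationReport;
  import org.apache.hadoop.yarn.client.api.YarnClient;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;
  import org.apache.hadoop.yarn.util.ConverterUtils;

  public class AmEndpointLookup {
    public static void main(String[] args) throws Exception {
      YarnClient client = YarnClient.createYarnClient();
      client.init(new YarnConfiguration());
      client.start();
      ApplicationId appId = ConverterUtils.toApplicationId(args[0]);
      ApplicationReport report = client.getApplicationReport(appId);
      // Host and RPC port that the AM registered with the RM.
      System.out.println(report.getHost() + ":" + report.getRpcPort());
      client.stop();
    }
  }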

- Sunil


On Fri, Jun 10, 2016 at 11:26 AM Rohith Sharma K S <
rohithsharm...@huawei.com> wrote:

> Hi
>
>
>
> Basically I see you have multiple questions
>
> 1.   How to get AM RPC port ?
>
> >>> This you can get it via YarnClient# getApplicationReport(). This
> gives common/generic application specific details. Note that RM does not
> maintain any custom details for applications.
>
> 2.   How can you get metadata of AM?
>
> >>> Basically AM design should be such that bind an interface to AM RPC.
> And AM-RPC host and port can be obtained from ResourceManager. Using
> host:port of AM from application submitter,  connect to AM and get required
> details from AM only. To achieve this , YARN does not provide any interface
> since AM are written users. Essentially, user can design AM to expose
> client interface to their clients. For your better understanding , see
> MapReduce framework MRAppMaster.
>
> 3.   About the authenticity of job-submitter to AM
>
> >>> Use secured hadoop cluster with Kerberos enabled. Note that AM also
> should be implemented for handling Kerberos.
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* Mingyu Kim [mailto:m...@palantir.com]
> *Sent:* 10 June 2016 03:47
>
>
> *To:* Rohith Sharma K S; user@hadoop.apache.org
> *Cc:* Matt Cheah
> *Subject:* Re: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi Rohith,
>
>
>
> Thanks for the pointers. I checked the Hadoop documentation you linked,
> but it’s not clear how I can expose client interface for providing
> metadata. By “YARN internal communications”, I was referring to the
> endpoints that are exposed by AM on the RPC port as reported in
> ApplicationReport. I assume either RM or containers will communicate with
> AM through these endpoints.
>
>
>
> I believe your suggestion is to expose additional endpoints to the AM RPC
> port. Can you clarify how I can do that? Is there an interface/class I need
> to extend? How can I register the extra endpoints for providing metadata on
> the existing AM RPC port?
>
>
>
> Mingyu
>
>
>
> *From: *Rohith Sharma K S 
> *Date: *Wednesday, June 8, 2016 at 11:15 PM
> *To: *Mingyu Kim , "user@hadoop.apache.org" <
> user@hadoop.apache.org>
> *Cc: *Matt Cheah 
> *Subject: *RE: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi
>
>
>
> Do you know how I can extend the client interface of the RPC port?
>
> >>> YARN provides YARNClIent library that uses ApplicationClientProtocol.
> For your more understanding refer
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client
> 
>
>
>
> I know AM has some endpoints exposed through the RPC port for internal
> YARN communications, but was not sure how I can extend it to expose a
> custom endpoint.
>
> >>> I am not sure what you mean here internal YARN communication? AM can
> connect to RM only via AM-RM interface for register/unregister and
> heartbeat and details sent to RM are limited.  It is up to the AM’s to
> expose client interface for providing metadata.
>
> Thanks & Regards
>
> Rohith Sharma K S
>
> *From:* Mingyu Kim [mailto:m...@palantir.com ]
> *Sent:* 09 June 2016 11:21
> *To:* Rohith Sharma K S; user@hadoop.apache.org
> *Cc:* Matt Cheah
> *Subject:* Re: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi Rohith,
>
>
>
> Thanks for the quick response. That sounds promising. Do you know how I
> can extend the client interface of the RPC port? I know AM has some
> endpoints exposed through the RPC port for internal YARN communications,
> but was not sure how I can extend it to expose a custom endpoint. Any
> pointer would be appreciated!
>
>
>
> Mingyu
>
>
>
> *From: *Rohith Sharma K S 
> *Date: *Wednesday, June 8, 2016 at 10:39 PM
> *To: *Mingyu Kim , "user@hadoop.apache.org" <
> 

Re: YARN cluster underutilization

2016-05-25 Thread Sunil Govind
Hi Jeff,

I am not very sure about reducing the heartbeat interval; it may put more
pressure on the RM. Let's wait for opinions from others on this point.

I think a better option is to add node locality to the resource requests
rather than keeping them OFF_SWITCH. This could help; see the sketch below.
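
For example, inside an AMRMClient-based ApplicationMaster, a node-local
request would look roughly like this (host name, priority and sizes are
placeholders; note this applies to custom AMs, since the MapReduce AM
derives locality from the input split locations):

  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient;

  Priority priority = Priority.newInstance(0);
  Resource capability = Resource.newInstance(1440, 1);  // memory MB, vcores
  String[] nodes = { "<nm-host>" };                     // preferred node(s)
  AMRMClient.ContainerRequest request =
      new AMRMClient.ContainerRequest(capability, nodes, null /* racks */, priority);
  // amRmClient is assumed to be an initialized AMRMClient<AMRMClient.ContainerRequest>
  amRmClient.addContainerRequest(request);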

Thanks
Sunil

On Thu, May 26, 2016 at 1:00 AM Guttadauro, Jeff <jeff.guttada...@here.com>
wrote:

> Interesting stuff!  I did not know about this handling of OFFSWITCH
> requests.
>
>
>
> To get around this, would you recommend reducing the heartbeat interval,
> perhaps to 250ms to get a 4x improvement in container allocation rate (or
> is it not quite as simple as that)?  Maybe doing this in combination with
> using a greater number of smaller nodes would help?  Would overloading the
> ResourceManager be a concern if doing that?  Should I bump up the
> “YARN_RESOURCEMANAGER_HEAPSIZE” configuration property (current default for
> m3.xlarge is 2396M), or would you suggest any other knobs to turn to help
> RM handle it?
>
>
>
> Thanks again for all your help, Sunil!
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Wednesday, May 25, 2016 1:07 PM
>
>
> *To:* Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> *Subject:* Re: YARN cluster underutilization
>
>
>
> Hi Jeff,
>
>
>
>  I do see the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
> property set to 1000 in the job configuration
>
> >> Ok, This make sense.. node heartbeat seems default.
>
>
>
> If there are no locality specified in resource requests (using
> ResourceRequest.ANY) , then YARN will allocate only one container per node
> heartbeat. So your container allocation rate is slower considering 600k
> requests and only 20 nodes. And if more number of containers are also
> getting released fast (I could see that some containers lifetime is 80 to
> 90 secs), then this will become more complex and container allocation rate
> will be slower.
>
>
>
> YARN-4963 <https://issues.apache.org/jira/browse/YARN-4963> is trying to
> make more allocation per heartbeat for NODE_OFFSWITCH (ANY) requests. But
> its not yet available in any release.
>
>
>
> I guess you can investigate more in this line to confirm this points.
>
>
>
> Thanks
>
> Sunil
>
>
>
>
>
> On Wed, May 25, 2016 at 11:00 PM Guttadauro, Jeff <
> jeff.guttada...@here.com> wrote:
>
> Thanks for digging into the log, Sunil, and making some interesting
> observations!
>
>
>
> The heartbeat interval hasn’t been changed from its default, and I do see
> the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property set to
> 1000 in the job configuration.  I was searching in the log for heartbeat
> interval information, but I didn’t find anything.  Where do you look in the
> log for the heartbeats?
>
>
>
> Also, you are correct about there being no data locality, as all the input
> data is in S3.  The utilization has been fluctuating, but I can’t really
> see a pattern or tell why.  It actually started out pretty low in the
> 20-30% range and then managed to get up into the 50-70% range after a
> while, but that was short-lived, as it went back down into the 20-30% range
> for quite a while.  While writing this, I saw it surprisingly hit 80%!!
> First time I’ve seen it that high in the 20 hours it’s been running…
>  Although looks like it may be headed back down.  I’m perplexed.  Wouldn’t
> you generally expect fairly stable utilization over the course of the job?
> (This is the only job running.)
>
>
>
> Thanks,
>
> -Jeff
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Wednesday, May 25, 2016 11:55 AM
>
>
> *To:* Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> *Subject:* Re: YARN cluster underutilization
>
>
>
> Hi Jeff.
>
>
>
> Thanks for sharing this information. I have some observations from this
> logs.
>
>
>
> - I think the node heartbeat is around 2/3 seconds here. Is it changed due
> to some other reasons?
>
> - And all mappers Resource Request seems to be asking for type ANY (there
> is no data locality). pls correct me if I am wrong.
>
>
>
> If the resource request type is ANY, only one container will be allocated
> per heartbeat for a node. Here node heartbeat delay is also more. And I can
> see that containers are released very fast too. So when u started you
> application, are you seeing more better resource utilization? And once
> containers started to get released/completed, you are seeing under
> utilization.
>
>
>
> Pls look into this line. It may be a reason.
>
>
>
&

Re: YARN cluster underutilization

2016-05-25 Thread Sunil Govind
Hi Jeff,

 I do see the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
property set to 1000 in the job configuration
>> OK, that makes sense; the node heartbeat seems to be the default.

If no locality is specified in the resource requests (i.e. they use
ResourceRequest.ANY), then YARN will allocate only one container per node
heartbeat. So your container allocation rate is slow, considering 600k
requests and only 20 nodes. And if a large number of containers are also
being released quickly (I could see that some containers' lifetime is 80 to
90 secs), the situation becomes more complex and the effective allocation
rate will be slower still.

YARN-4963 <https://issues.apache.org/jira/browse/YARN-4963> is trying to
allow more than one allocation per heartbeat for NODE_OFFSWITCH (ANY)
requests, but it is not yet available in any release.

I suggest you investigate along these lines to confirm these points.

Thanks
Sunil


On Wed, May 25, 2016 at 11:00 PM Guttadauro, Jeff <jeff.guttada...@here.com>
wrote:

> Thanks for digging into the log, Sunil, and making some interesting
> observations!
>
>
>
> The heartbeat interval hasn’t been changed from its default, and I do see
> the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property set to
> 1000 in the job configuration.  I was searching in the log for heartbeat
> interval information, but I didn’t find anything.  Where do you look in the
> log for the heartbeats?
>
>
>
> Also, you are correct about there being no data locality, as all the input
> data is in S3.  The utilization has been fluctuating, but I can’t really
> see a pattern or tell why.  It actually started out pretty low in the
> 20-30% range and then managed to get up into the 50-70% range after a
> while, but that was short-lived, as it went back down into the 20-30% range
> for quite a while.  While writing this, I saw it surprisingly hit 80%!!
> First time I’ve seen it that high in the 20 hours it’s been running…
>  Although looks like it may be headed back down.  I’m perplexed.  Wouldn’t
> you generally expect fairly stable utilization over the course of the job?
> (This is the only job running.)
>
>
>
> Thanks,
>
> -Jeff
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Wednesday, May 25, 2016 11:55 AM
>
>
> *To:* Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> *Subject:* Re: YARN cluster underutilization
>
>
>
> Hi Jeff.
>
>
>
> Thanks for sharing this information. I have some observations from this
> logs.
>
>
>
> - I think the node heartbeat is around 2/3 seconds here. Is it changed due
> to some other reasons?
>
> - And all mappers Resource Request seems to be asking for type ANY (there
> is no data locality). pls correct me if I am wrong.
>
>
>
> If the resource request type is ANY, only one container will be allocated
> per heartbeat for a node. Here node heartbeat delay is also more. And I can
> see that containers are released very fast too. So when u started you
> application, are you seeing more better resource utilization? And once
> containers started to get released/completed, you are seeing under
> utilization.
>
>
>
> Pls look into this line. It may be a reason.
>
>
>
> Thanks
>
> Sunil
>
>
>
> On Wed, May 25, 2016 at 9:59 PM Guttadauro, Jeff <jeff.guttada...@here.com>
> wrote:
>
> Thanks for your thoughts thus far, Sunil.  Most grateful for any
> additional help you or others can offer.  To answer your questions,
>
>
>
> 1.   This is a custom M/R job, which uses mappers only (no reduce
> phase) to process GPS probe data and filter based on inclusion within a
> provided polygon.  There is actually a lot of upfront work done in the
> driver to make that task as simple as can be (identifies a list of tiles
> that are completely inside the polygon and those that fall across an edge,
> for which more processing would be needed), but the job would still be more
> compute-intensive than wordcount, for example.
>
>
>
> 2.   I’m running almost 84k mappers for this job.  This is actually
> down from ~600k mappers, since one other thing I’ve done is increased the
> mapreduce.input.fileinputformat.split.minsize to 536870912 (512M) for the
> job.  Data is in S3, so loss of locality isn’t really a concern.
>
>
>
> 3.   For NodeManager configuration, I’m using EMR’s default
> configuration for the m3.xlarge instance type, which is
> yarn.scheduler.minimum-allocation-mb=32,
> yarn.scheduler.maximum-allocation-mb=11520, and
> yarn.nodemanager.resource.memory-mb=11520.  YARN dashboard shows min/max
> allocations of <memory:32, vCores:1>/<memory:11520, vCores:8>.
>
>
>
> 4.   Capacity S

Re: YARN cluster underutilization

2016-05-25 Thread Sunil Govind
Hi Jeff,

Thanks for sharing this information. I have some observations from these
logs.

- I think the node heartbeat is around 2-3 seconds here. Was it changed for
some other reason?
- All the mappers' resource requests seem to be asking for type ANY (there
is no data locality). Please correct me if I am wrong.

If the resource request type is ANY, only one container will be allocated
per node per heartbeat, and here the node heartbeat delay is also high. I
can also see that containers are being released very quickly. So when you
first started your application, were you seeing better resource
utilization, with the underutilization appearing once containers started to
get released/completed?

Please look into this line of investigation. It may be the reason.

Thanks
Sunil

On Wed, May 25, 2016 at 9:59 PM Guttadauro, Jeff <jeff.guttada...@here.com>
wrote:

> Thanks for your thoughts thus far, Sunil.  Most grateful for any
> additional help you or others can offer.  To answer your questions,
>
>
>
> 1.   This is a custom M/R job, which uses mappers only (no reduce
> phase) to process GPS probe data and filter based on inclusion within a
> provided polygon.  There is actually a lot of upfront work done in the
> driver to make that task as simple as can be (identifies a list of tiles
> that are completely inside the polygon and those that fall across an edge,
> for which more processing would be needed), but the job would still be more
> compute-intensive than wordcount, for example.
>
>
>
> 2.   I’m running almost 84k mappers for this job.  This is actually
> down from ~600k mappers, since one other thing I’ve done is increased the
> mapreduce.input.fileinputformat.split.minsize to 536870912 (512M) for the
> job.  Data is in S3, so loss of locality isn’t really a concern.
>
>
>
> 3.   For NodeManager configuration, I’m using EMR’s default
> configuration for the m3.xlarge instance type, which is
> yarn.scheduler.minimum-allocation-mb=32,
> yarn.scheduler.maximum-allocation-mb=11520, and
> yarn.nodemanager.resource.memory-mb=11520.  YARN dashboard shows min/max
> allocations of <memory:32, vCores:1>/<memory:11520, vCores:8>.
>
>
>
> 4.   Capacity Scheduler [MEMORY]
>
>
>
> 5.   I’ve attached 2500 lines from the RM log.  Happy to grab more,
> but they are pretty big, and I thought that might be sufficient.
>
>
>
> Any guidance is much appreciated!
>
> -Jeff
>
>
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com]
> *Sent:* Wednesday, May 25, 2016 10:55 AM
> *To:* Guttadauro, Jeff <jeff.guttada...@here.com>; user@hadoop.apache.org
> *Subject:* Re: YARN cluster underutilization
>
>
>
> Hi Jeff,
>
>
>
> It looks like to you are allocating more memory for AM container. Mostly
> you might not need 6Gb (as per the log). Could you please help  to provide
> some more information.
>
>
>
> 1. What type of mapreduce application (wordcount etc) are you running?
> Some AMs may be CPU intensive and some may not be. So based on the type
> application, memory/cpu can be tuned for better utilization.
>
> 2. How many mappers (reducers) are you trying to run here?
>
> 3. You have mentioned that each node has 8 cores and 15GB, but how much is
> actually configured for NM?
>
> 4. Which scheduler are you using?
>
> 5. Its better to attach RM log if possible.
>
>
>
> Thanks
>
> Sunil
>
>
>
> On Wed, May 25, 2016 at 8:58 PM Guttadauro, Jeff <jeff.guttada...@here.com>
> wrote:
>
> Hi, all.
>
>
>
> I have an M/R (map-only) job that I’m running on a Hadoop 2.7.1 YARN
> cluster that is being quite underutilized (utilization of around 25-30%).
> The EMR cluster is 1 master + 20 core m3.xlarge nodes, which have 8 cores
> each and 15G total memory (with 11.25G of that available to YARN).  I’ve
> configured mapper memory with the following properties, which should allow
> for 8 containers running map tasks per node:
>
>
>
> <property><name>mapreduce.map.memory.mb</name><value>1440</value>
> </property>
>
> <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value>
> </property>
>
>
>
> It was suggested that perhaps my AppMaster was having trouble keeping up
> with creating all the mapper containers and that I bulk up its resource
> allocation.  So I did, as shown below, providing it 6G container memory (5G
> task memory), 3 cores, and 60 task listener threads.
>
>
>
> <property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>60</value>
> </property>
>
> <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>3</value>
> </property>
>
> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>6400</value>
> </property>
>
> <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx5120m</value>
> </property>
>
>
>
> Taking a look at the node on which the AppMaster is running, I'm seeing
> plenty of CPU idle time and free memory, yet there are still nodes with no

Re: YARN cluster underutilization

2016-05-25 Thread Sunil Govind
Hi Jeff,

It looks like you are allocating more memory than needed for the AM
container; most likely you do not need 6GB (as per the log). Could you
please provide some more information?

1. What type of mapreduce application (wordcount etc.) are you running?
Some AMs are CPU intensive and some are not, so memory/cpu can be tuned for
better utilization based on the application type.
2. How many mappers (and reducers) are you trying to run here?
3. You have mentioned that each node has 8 cores and 15GB, but how much is
actually configured for the NM?
4. Which scheduler are you using?
5. It is better to attach the RM log if possible.

Thanks
Sunil

On Wed, May 25, 2016 at 8:58 PM Guttadauro, Jeff 
wrote:

> Hi, all.
>
>
>
> I have an M/R (map-only) job that I’m running on a Hadoop 2.7.1 YARN
> cluster that is being quite underutilized (utilization of around 25-30%).
> The EMR cluster is 1 master + 20 core m3.xlarge nodes, which have 8 cores
> each and 15G total memory (with 11.25G of that available to YARN).  I’ve
> configured mapper memory with the following properties, which should allow
> for 8 containers running map tasks per node:
>
>
>
> <property><name>mapreduce.map.memory.mb</name><value>1440</value>
> </property>
>
> <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value>
> </property>
>
>
>
> It was suggested that perhaps my AppMaster was having trouble keeping up
> with creating all the mapper containers and that I bulk up its resource
> allocation.  So I did, as shown below, providing it 6G container memory (5G
> task memory), 3 cores, and 60 task listener threads.
>
>
>
> <property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>60</value>
> </property>
>
> <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>3</value>
> </property>
>
> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>6400</value>
> </property>
>
> <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx5120m</value>
> </property>
>
>
>
> Taking a look at the node on which the AppMaster is running, I'm seeing
> plenty of CPU idle time and free memory, yet there are still nodes with no
> utilization (0 running containers).  The log indicates that the AppMaster
> has way more memory (physical/virtual) than it appears to need with
> repeated log messages like this:
>
>
>
> 2016-05-25 13:59:04,615 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> (Container Monitor): Memory usage of ProcessTree 11265 for container-id
> container_1464122327865_0002_01_01: 1.6 GB of 6.3 GB physical memory
> used; 6.1 GB of 31.3 GB virtual memory used
>
>
>
> Can you please help me figure out where to go from here to troubleshoot,
> or any other things to try?
>
>
>
> Thanks!
>
> -Jeff
>
>
>