Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Klaus Ma
8:10, Guangya Liu <gyliu...@gmail.com> wrote: >>>> >>>> Hi Tom, >>>> >>>> I traced the agent of "20160112-165226-67375276-5050-22401-S199" and >>>> found that it keeps being declined by many frameworks: once a framework got >>>>

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Guangya Liu
e weight for each >>> role? >>> 2) Do you start all agents without any reservation? >>> >>> Thanks, >>> >>> Guangya >>> >>> On Sun, Feb 21, 2016 at 9:23 AM, Klaus Ma <klaus1982...@gmail.com> >>> wrote: >>>

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Tom Arnfeld
>>> Thanks, >>> >>> Guangya >>> >>> On Sun, Feb 21, 2016 at 9:23 AM, Klaus Ma <klaus1982...@gmail.com> >>> wrote: >>> Hi Tom, >>> >>> What's the allocation interval,

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Guangya Liu
y to reduce filter's timeout >>> of framework? >>> >>> According to the log, there are ~12 frameworks on a cluster with ~42 agents; the >>> filter duration is 5 sec, and resources are filtered ~60 times each second >>> (e.g. 65 at 18:08:34). For example, framework >>

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Tom Arnfeld
filter >> duration is 5 sec, and resources are filtered ~60 times each second (e.g. 65 >> at 18:08:34). For example, framework >> (20160219-164457-67375276-5050-28802-0015) just got resources from 6 agents >> and filtered the other 36 agents at 18:08:35 (egrep "Allo

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Guangya Liu
filtered each second >> (e.g. 65 at 18:08:34). For example, framework >> (20160219-164457-67375276-5050-28802-0015) >> just got resources from 6 agents and filtered the other 36 agents at >> 18:08:35 (egrep "Alloca|Filtered" mesos-master.log | grep >> "

Re: Mesos sometimes not allocating the entire cluster

2016-02-21 Thread Guangya Liu
050-28802-0015) > just got resources from 6 agents and filtered the other 36 agents at > 18:08:35 (egrep "Alloca|Filtered" mesos-master.log | grep > "20160219-164457-67375276-5050-28802-0015" | grep "18:08:35") > > Thanks > Klaus > > ------

RE: Mesos sometimes not allocating the entire cluster

2016-02-20 Thread Klaus Ma
-164457-67375276-5050-28802-0015) just got resources from 6 agents and filtered the other 36 agents at 18:08:35 (egrep "Alloca|Filtered" mesos-master.log | grep "20160219-164457-67375276-5050-28802-0015" | grep "18:08:35") Thanks, Klaus From: t...@duedil.com Subject: Re:

Re: Mesos sometimes not allocating the entire cluster

2016-02-18 Thread Tom Arnfeld
Hi Ben, I've only just seen your email! Really appreciate the reply, that's certainly an interesting bug and we'll try that patch and see how we get on. Cheers, Tom. On 29 January 2016 at 19:54, Benjamin Mahler wrote: > Hi Tom, > > I suspect you may be tripping the

Re: Mesos sometimes not allocating the entire cluster

2016-01-29 Thread Benjamin Mahler
Hi Tom, I suspect you may be tripping the following issue: https://issues.apache.org/jira/browse/MESOS-4302 Please have a read through this and see if it applies here. You may also be able to apply the fix to your cluster to see if that helps things. Ben On Wed, Jan 20, 2016 at 10:19 AM, Tom

Re: Mesos sometimes not allocating the entire cluster

2016-01-22 Thread Klaus Ma
Can you share the whole log of the master? It'll be helpful :). Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer Platform OpenSource Technology, STG, IBM GCG +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me On Thu, Jan 21, 2016 at 11:57 PM, Tom Arnfeld wrote: >

Re: Mesos sometimes not allocating the entire cluster

2016-01-22 Thread Tom Arnfeld
I can’t send the entire log as there’s a lot of activity on the cluster all the time. Is there anything in particular you’re looking for? > On 22 Jan 2016, at 12:46, Klaus Ma wrote: > > Can you share the whole log of the master? It'll be helpful :). > > > Da (Klaus), Ma

Re: Mesos sometimes not allocating the entire cluster

2016-01-21 Thread Klaus Ma
Yes, it seems the Hadoop framework did not consume all of the offered resources: if a framework launches a task (1 CPU) on an offer (10 CPUs), the other 9 CPUs are returned to the master (recoverResources). Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer Platform OpenSource Technology, STG, IBM GCG
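
The arithmetic in Klaus's example can be sketched as follows. This is only an illustration in Python, not Mesos code: `unused_resources` is a hypothetical helper name, and it models nothing beyond the subtraction that leaves the unused part of an offer to be returned to the master.

```python
def unused_resources(offered, used):
    """Return the part of an offer that the launched task did not use.

    Hypothetical helper for illustration only; resources are modelled
    as plain {name: amount} dictionaries.
    """
    remaining = {}
    for name, amount in offered.items():
        left = amount - used.get(name, 0)
        if left < 0:
            raise ValueError(f"task uses more {name} than was offered")
        if left > 0:
            remaining[name] = left
    return remaining


# The example from the message: a 1-CPU task launched on a 10-CPU offer.
offer = {"cpus": 10}
task = {"cpus": 1}
print(unused_resources(offer, task))  # {'cpus': 9}
```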

Re: Mesos sometimes not allocating the entire cluster

2016-01-21 Thread Tom Arnfeld
Thanks everyone! Stephan - There's a couple of useful points there, will definitely give it a read. Klaus - Thanks, we're running a bunch of different frameworks, in that list there's Hadoop MRv1, Apache Spark, Marathon and a couple of home grown frameworks we have. In this particular case the

Re: Mesos sometimes not allocating the entire cluster

2016-01-21 Thread Tom Arnfeld
Guangya - Nope, there's no outstanding offers for any frameworks, the ones that are getting offers are responding properly. Klaus - This was just a sample of logs for a single agent, the cluster has at least ~40 agents at any one time. On 21 January 2016 at 15:20, Guangya Liu

Re: Mesos sometimes not allocating the entire cluster

2016-01-21 Thread Guangya Liu
Can you please check whether there are outstanding offers in the cluster that have not been accepted by any framework? You can check this via the /master/state.json endpoint. If there are outstanding offers, you can start the master with the offer_timeout flag to let the master rescind some offers if
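
A minimal sketch of the check Guangya describes, in Python. It assumes the JSON returned by /master/state.json has a top-level "frameworks" list whose entries carry an "id" and an "offers" list; that layout is an assumption here and should be verified against your Mesos version. The function name `outstanding_offers` and the sample data are hypothetical.

```python
import json


def outstanding_offers(state):
    """Count outstanding offers per framework in a parsed state document.

    Assumes (not verified here) that each entry of state["frameworks"]
    has an "id" and an "offers" list.
    """
    counts = {}
    for framework in state.get("frameworks", []):
        offers = framework.get("offers", [])
        if offers:
            counts[framework.get("id", "<unknown>")] = len(offers)
    return counts


# Hypothetical sample standing in for the endpoint's response.
sample = json.loads("""
{
  "frameworks": [
    {"id": "fw-1", "offers": [{"id": "o-1"}, {"id": "o-2"}]},
    {"id": "fw-2", "offers": []}
  ]
}
""")
print(outstanding_offers(sample))
```

An empty result would match Tom's observation that no framework is holding offers; a non-empty one would point at the frameworks to look at before reaching for offer_timeout.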

Re: Mesos sometimes not allocating the entire cluster

2016-01-21 Thread Klaus Ma
Do you mean that only one slave is being offered to some framework while the others are starving? The Mesos allocator (DRF) offers resources host by host, so if there's only one host, the other frameworks cannot get resources. We have several JIRAs on how to balance resources between frameworks. Da

Re: Mesos sometimes not allocating the entire cluster

2016-01-20 Thread Erb, Stephan
<t...@duedil.com> Sent: Wednesday, January 20, 2016 7:19 PM To: user@mesos.apache.org Subject: Mesos sometimes not allocating the entire cluster Hey, I've noticed some interesting behaviour recently when we have lots of different frameworks connected to our Mesos cluster at once, all using a v

Mesos sometimes not allocating the entire cluster

2016-01-20 Thread Tom Arnfeld
Hey, I've noticed some interesting behaviour recently when we have lots of different frameworks connected to our Mesos cluster at once, all using a variety of different shares. Some of the frameworks don't get offered more resources (for long periods of time, hours even) leaving the cluster under

Re: Mesos sometimes not allocating the entire cluster

2016-01-20 Thread Klaus Ma
Hi Tom, Which framework are you using, e.g. Swarm, Marathon or something else? And which language package are you using? DRF sorts roles/frameworks by allocation ratio and offers all "available" resources slave by slave; but if the resources are too small (< 0.1 CPU) or the resources were
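
The sorting step Klaus mentions can be illustrated with a simplified sketch of Dominant Resource Fairness in Python. This is not the Mesos allocator: it ignores roles, weights, reservations and filters, and the names `dominant_share` and `drf_order` as well as the numbers are hypothetical. It only shows the idea that the framework with the smallest dominant share is considered first.

```python
def dominant_share(allocated, total):
    """Largest fraction of any single cluster resource held by a framework."""
    return max(
        (allocated.get(name, 0) / amount for name, amount in total.items() if amount > 0),
        default=0.0,
    )


def drf_order(allocations, total):
    """Framework ids sorted by ascending dominant share (ties broken by id)."""
    return sorted(
        allocations,
        key=lambda fw: (dominant_share(allocations[fw], total), fw),
    )


# Hypothetical cluster totals and current allocations.
total = {"cpus": 100, "mem": 1000}
allocations = {
    "A": {"cpus": 10, "mem": 500},  # dominant share 0.5 (mem)
    "B": {"cpus": 30, "mem": 100},  # dominant share 0.3 (cpus)
    "C": {},                        # nothing allocated yet
}
print(drf_order(allocations, total))  # ['C', 'B', 'A']
```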