Re: Resource Isolation in Mesos

2016-10-04 Thread haosdent
Hi, @Srikant

Usually, your task should be killed when it goes over the cgroup limit. Could
you go to the `/sys/fs/cgroup/memory/mesos` folder on the agent, check the
values in `${YOUR_CONTAINER_ID}/memory.limit_in_bytes`,
`${YOUR_CONTAINER_ID}/memory.soft_limit_in_bytes` and
`${YOUR_CONTAINER_ID}/memory.memsw.limit_in_bytes`, and reply with them in
this email thread?

${YOUR_CONTAINER_ID} is the container ID of your task; you can find it in
the agent log. Or, since you said you have only this one task, there should
be only one directory under `/sys/fs/cgroup/memory/mesos`.
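
If it is easier, the three values could be collected with a short script
along these lines. This is only a sketch: it assumes the cgroup v1 memory
hierarchy is mounted at `/sys/fs/cgroup/memory` with `mesos` as the root, as
in the paths above, and `read_memory_limits` is a name made up for
illustration. A file that is missing or unreadable is reported as `None`.

```python
from pathlib import Path

# The three cgroup v1 memory files named above.
LIMIT_FILES = (
    "memory.limit_in_bytes",
    "memory.soft_limit_in_bytes",
    "memory.memsw.limit_in_bytes",
)


def read_memory_limits(root="/sys/fs/cgroup/memory/mesos"):
    """Return {container_id: {file_name: int or None}} for each container directory."""
    limits = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # Hierarchy not mounted here (or a different cgroups root is in use).
        return limits
    for container_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        values = {}
        for name in LIMIT_FILES:
            try:
                values[name] = int((container_dir / name).read_text().strip())
            except (OSError, ValueError):
                values[name] = None  # file missing, unreadable, or not a number
        limits[container_dir.name] = values
    return limits
```

Calling `read_memory_limits()` on the agent and printing the result should
give one entry per container directory.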

Furthermore, could you show the output of
http://${YOUR_AGENT_IP}:5051/containers?
It contains some task statistics as well.
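
A rough sketch of pulling the memory figures out of that endpoint follows.
It assumes the response is a JSON list in which each entry carries a
`container_id` and a `statistics` object with `mem_rss_bytes` and
`mem_limit_bytes` fields; please check those names against your actual
output. `fetch_containers` and `summarize_memory` are illustrative names.

```python
import json
from urllib.request import urlopen


def fetch_containers(agent_ip, port=5051):
    """GET the agent's /containers endpoint and return the parsed JSON."""
    with urlopen(f"http://{agent_ip}:{port}/containers", timeout=10) as resp:
        return json.load(resp)


def summarize_memory(containers):
    """Return (container_id, rss_bytes, limit_bytes) per entry.

    Fields that are absent from an entry come back as None.
    """
    rows = []
    for entry in containers:
        stats = entry.get("statistics", {})
        rows.append((
            entry.get("container_id"),
            stats.get("mem_rss_bytes"),   # assumed field name
            stats.get("mem_limit_bytes"),  # assumed field name
        ))
    return rows
```

If the RSS figure is missing or zero for a container that `top` shows using
gigabytes, that would be worth including in the reply.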

On Tue, Oct 4, 2016 at 9:00 PM, Srikant Kalani 
wrote:

> We have upgraded Linux from RHEL6 to RHEL7 and Mesos from 0.27 to 1.0.1.
> After the upgrade we are not able to see the memory used by the task, which
> was fine in the previous version. Due to this, cgroups are not effective.
>
> Answers to your questions below :
>
> There is only 1 task running as an appserver, which is consuming approx 20G
> of memory, but this info is not showing up in the Mesos UI.
> Swap is enabled in the agent start command.
> These flags are used for the agent: cgroups_limits_swap=true
> --isolation=cgroups/cpu,cgroups/mem --cgroups_hierarchy=/sys/fs/cgroup
> In the agent logs I can see the memory limit updated to 33MB for the
> container.
>
> The web UI shows the total memory allocated to the framework, but it is not
> showing the memory used by the task. It always shows 0B/33MB.
>
> Not sure if this is rhel7 issue or mesos 1.0.1.
>
> Any suggestions ?
> On 26 Sep 2016 21:55, "haosdent"  wrote:
>
>> Hi, @Srikant. Could you elaborate on this?
>>
>> >We have verified using top command that framework was using 2gB memory
>> while allocated was just 50 mb.
>>
>> * How many running tasks are there in your framework?
>> * Is swap enabled or disabled on the agents?
>> * What flags do you launch the agents with?
>> * Have you seen something like `Updated 'memory.limit_in_bytes' to ` in
>> the agent log?
>>
>> On Tue, Sep 27, 2016 at 12:14 AM, Srikant Kalani <
>> srikant.blackr...@gmail.com> wrote:
>>
>>> Hi Greg ,
>>>
>>> Previously we were running Mesos 0.27 on RHEL6, and since we already have
>>> one cgroup hierarchy for cpu and memory for our production processes,
>>> we were not able to merge the two cgroup hierarchies on RHEL6. The slave
>>> process was not coming up.
>>> Now we have moved to RHEL7, and both the Mesos master and slave are
>>> running on RHEL7 with cgroups implemented. But we are seeing that the
>>> Mesos UI is not showing the actual memory used by the framework.
>>>
>>> Any idea why the framework's cpu and memory usage is not showing in the
>>> UI? Due to this, the OS is still not killing tasks that consume more
>>> memory than they were allocated.
>>> We have verified using top command that framework was using 2gB memory
>>> while allocated was just 50 mb.
>>>
>>> Please suggest.
>>> On 8 Sep 2016 01:53, "Greg Mann"  wrote:
>>>
>>>> Hi Srikant,
>>>> Without using cgroups, it won't be possible to enforce isolation of
>>>> cpu/memory on a Linux agent. Could you elaborate a bit on why you aren't
>>>> able to use cgroups currently? Have you tested the existing Mesos cgroup
>>>> isolators in your system?
>>>>
>>>> Cheers,
>>>> Greg
>>>>
>>>> On Tue, Sep 6, 2016 at 9:24 PM, Srikant Kalani <
>>>> srikant.blackr...@gmail.com> wrote:
>>>>
>>>> > Hi Guys,
>>>> >
>>>> > We are running a Mesos cluster in our development environment. We are
>>>> > seeing cases where a framework uses more resources, like cpu and
>>>> > memory, than it initially requested. When any new framework is
>>>> > registered, Mesos calculates the resources on the basis of the
>>>> > resources already offered to the first framework, and it doesn't
>>>> > consider the actual resources utilised by the previous framework.
>>>> > This results in an incorrect calculation of resources.
>>>> > The Mesos website says that we should implement cgroups, but that is
>>>> > not possible in our case as we have already implemented cgroups in
>>>> > other projects and, due to Linux restrictions, we can't merge two
>>>> > cgroup hierarchies.
>>>> >
>>>> > Any idea how we can implement resource isolation in Mesos?
>>>> >
>>>> > We are using Mesos 0.27.1
>>>> >
>>>> > Thanks
>>>> > Srikant Kalani
>>>> >


>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>


-- 
Best Regards,
Haosdent Huang


Re: Troubleshooting tasks that are stuck in the 'Staging' state

2016-10-04 Thread Frank Scholten
Thanks Haosdent for your quick response.

I added GLOG_v=1 to the master and agents.

1. The framework is registered. Marathon in this case.
2. I see messages 'Telling agent (...) to kill task (...)'. Why does
this happen? I also see 'Sending explicit reconciliation state
TASK_LOST for task fake-marathon-pacemaker-task-(...)'.
3. I searched for RunTaskMessage in the agent log but could not find
it. Is this the exact text to search for or is this the name of the
protobuf message? Are these logged on a higher log level?

On Tue, Oct 4, 2016 at 11:22 AM, haosdent  wrote:
> Staging is the initial state of a task. I think you can check your logs via
> these steps:
>
> 1. Has your framework registered successfully with the master?
> 2. Does the master send resource offers to your framework, and does your
> framework accept them?
> 3. Do your agents receive the RunTaskMessage from the master to launch your
> task?
>
> In addition, running `export GLOG_v=1` before starting the masters and
> agents may be helpful for your troubleshooting.
>
> On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten 
> wrote:
>>
>> Hi all,
>>
>> I am looking for some ways to troubleshoot or debug tasks that are
>> stuck in the 'staging' state. Typically they have no logs in the
>> sandbox.
>>
>> Are there any endpoints or things to look for in logs to identify
>> a root cause?
>>
>> Is there a troubleshooting guide for Mesos to solve problems like this?
>>
>> Cheers,
>>
>> Frank
>
>
>
>
> --
> Best Regards,
> Haosdent Huang


Re: determine slave capabilities

2016-10-04 Thread Hendrik Haddorp

thanks!

On 04.10.2016 11:14, haosdent wrote:
hi, @Hendrik You could specify the --attributes flag when starting
the mesos agent. For example, use --attributes=docker:false. Then you
could get it from the `Offer` in your framework. Another way is to query
the /flags endpoint of the agent from your framework. You could get the
URL of the agent from the `Offer` as well.
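
As a minimal sketch of the first approach, a framework could filter offers
on that attribute roughly as below. It assumes offers arrive as JSON-style
dicts whose `attributes` list holds entries shaped like
`{"name": ..., "type": "TEXT", "text": {"value": ...}}`; the `docker`
attribute name is just the convention from the example above, and the helper
names are made up for illustration.

```python
def text_attributes(offer):
    """Collect an offer's TEXT attributes into a {name: value} dict."""
    result = {}
    for attr in offer.get("attributes", []):
        if attr.get("type") == "TEXT":
            result[attr["name"]] = attr.get("text", {}).get("value")
    return result


def supports_docker(offer):
    """Treat an agent as Docker-capable unless it advertises docker:false."""
    return text_attributes(offer).get("docker") != "false"
```

Offers for which `supports_docker` returns False could then be declined for
tasks that need the Docker containerizer. Treating a missing attribute as
"capable" is a design choice; the opposite default is just as reasonable if
every agent is tagged.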


On Tue, Oct 4, 2016 at 5:06 PM, Hendrik Haddorp wrote:


Hi,

is there a way for a framework to determine what containerizers
are available on a slave? I have a setup where one slave has no
docker engine so that I get an error when I try to start a
container on that slave. Thus it would be nice if I could somehow
check in advance what capabilities a slave has.

regards,
Hendrik




--
Best Regards,
Haosdent Huang




Re: Troubleshooting tasks that are stuck in the 'Staging' state

2016-10-04 Thread haosdent
Staging is the initial state of a task. I think you can check your logs via
these steps:

1. Has your framework registered successfully with the master?
2. Does the master send resource offers to your framework, and does your
framework accept them?
3. Do your agents receive the RunTaskMessage from the master to launch your
task?

In addition, running `export GLOG_v=1` before starting the masters and
agents may be helpful for your troubleshooting.
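
For working through the steps above, a small helper like the following could
scan a master or agent log for whatever strings you expect at each step.
This is only a sketch: `find_log_lines` is an illustrative name, it does
plain substring matching, and the exact wording of the relevant Mesos log
messages is not something I have confirmed, so the search strings are up to
you.

```python
def find_log_lines(lines, needles):
    """Return {needle: [(line_number, line), ...]} for plain substring matches."""
    hits = {needle: [] for needle in needles}
    for number, line in enumerate(lines, start=1):
        for needle in needles:
            if needle in line:
                hits[needle].append((number, line.rstrip("\n")))
    return hits
```

For example, opening the agent log and passing the file object together with
`["RunTaskMessage"]` would show whether that string appears at all at the
current log level.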

On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten 
wrote:

> Hi all,
>
> I am looking for some ways to troubleshoot or debug tasks that are
> stuck in the 'staging' state. Typically they have no logs in the
> sandbox.
>
> Are there any endpoints or things to look for in logs to identify
> a root cause?
>
> Is there a troubleshooting guide for Mesos to solve problems like this?
>
> Cheers,
>
> Frank
>



-- 
Best Regards,
Haosdent Huang


Troubleshooting tasks that are stuck in the 'Staging' state

2016-10-04 Thread Frank Scholten
Hi all,

I am looking for some ways to troubleshoot or debug tasks that are
stuck in the 'staging' state. Typically they have no logs in the
sandbox.

Are there any endpoints or things to look for in logs to identify
a root cause?

Is there a troubleshooting guide for Mesos to solve problems like this?

Cheers,

Frank