Pitfalls when writing custom Frameworks

2014-08-31 Thread Stephan Erb
Hi everybody,

I would like to assess the effort required to write a custom framework.

Background: We have an application where we can start a flexible number
of long-running worker processes performing number-crunching. The more
processes the better. However, we have multiple users, each running an
instance of the application and therefore competing for resources (as
each tries to run as many worker processes as possible). 

For various reasons, we would like to run our application instances on
top of mesos. There seem to be two ways to achieve this:

 A. Write a custom framework for our application that spawns the
worker processes on demand. Each user gets to run one framework
instance. We also need preemption of workers to achieve equality
among frameworks. We could achieve this using an external entity
monitoring all frameworks and telling the worst offenders to
scale down a little.
 B. Instead of writing a framework, use a Service-Scheduler like
Marathon, Aurora or Singularity to spawn the worker processes.
Instead of just performing the scale-down, the external entity
would dictate the number of worker processes for each
application depending on its demand.


The first choice seems to be the natural fit for Mesos. However, existing
frameworks like Aurora are already battle-tested with regard to high
availability, race conditions, and issues like state reconciliation, where
the world views of the scheduler and the slaves drift apart.

So the question boils down to: Which pitfalls do I have to be aware of when
writing a custom framework? Can I get away with simply implementing the
scheduler API? Or do I always have to implement things like custom state
reconciliation in order to prevent orphaned tasks on slaves (for example,
when my framework scheduler crashes or is temporarily unavailable)?

Thanks for your input!

Best Regards,
Stephan






Re: Frontend loadbalancer configuration for long running tasks

2014-09-11 Thread Stephan Erb
Hi everybody,

Ankur's post is general enough that I can reiterate the question here:

Does anyone know about similar HAProxy solutions for Aurora? 

Thanks,
Stephan


On Tue, 2014-09-09 at 01:52 -0700, Ankur Chauhan wrote:
> Hi all,
> 
> 
> (Please let me know if this is not the correct place for such a
> question).
> I have been looking at mesos + marathon + haproxy as a way of
> deploying long running web applications. Mesos coupled with
> marathon's /tasks api gives me all the information needed to get a
> haproxy configured and load balancing all the tasks but it seems a
> little too simplistic. 
> 
> 
> I was wondering if there are other projects or if others could share
> how they configure/reconfigure their loadbalancers when new tasks come
> alive. 
> 
> 
> Just to make things a little more concrete consider the following use
> case:
> 
> 
> There are two web applications that are running as tasks on mesos: 
> 1. webapp1 (http + https) on app1.domain.com
> 2. webapp2 (http + https) on app2.domain.com
> 
> 
> We want to configure a HAProxy server that routes traffic from users
> (:80 and :443) and loadbalances it correctly onto the correct set of
> tasks. Obviously there is some haproxy configuration happening here
> but I am interested in finding out what others have been doing in
> similar cases before I go around building yet another haproxy
> reconfigure and reload script.
> 
> 
> -- Ankur
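
For reference, the kind of regeneration script Ankur describes boils down to
something like the following (a rough Python sketch; the Marathon endpoint
shape, the config template, the vhost mapping, and the reload command are
all assumptions, not a tested implementation):

    import collections
    import json
    import subprocess
    import urllib2

    MARATHON = "http://marathon.example.com:8080"
    # Marathon app id -> virtual host served by that app.
    DOMAINS = {"/webapp1": "app1.domain.com",
               "/webapp2": "app2.domain.com"}


    def fetch_tasks():
        # Marathon's /v2/tasks endpoint lists running tasks with host/ports.
        req = urllib2.Request(MARATHON + "/v2/tasks",
                              headers={"Accept": "application/json"})
        return json.load(urllib2.urlopen(req))["tasks"]


    def render_backends(tasks):
        backends = collections.defaultdict(list)
        for task in tasks:
            if task["appId"] in DOMAINS and task.get("ports"):
                backends[task["appId"]].append((task["host"], task["ports"][0]))
        lines = []
        for app_id, servers in sorted(backends.items()):
            name = app_id.strip("/")
            lines.append("backend %s" % name)
            for i, (host, port) in enumerate(servers):
                lines.append("  server %s-%d %s:%d check" % (name, i, host, port))
        return "\n".join(lines) + "\n"


    def main():
        config = open("haproxy.cfg.template").read()
        config += render_backends(fetch_tasks())
        with open("/etc/haproxy/haproxy.cfg", "w") as out:
            out.write(config)
        # Soft reload; the exact command depends on how HAProxy is managed.
        subprocess.check_call(["service", "haproxy", "reload"])


    if __name__ == "__main__":
        main()

The Host-header routing itself (acl/use_backend rules for app1.domain.com
and app2.domain.com) would live in the static template part; only the
backend and server lines need to be regenerated when tasks move.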




Re: Pitfalls when writing custom Frameworks

2014-09-11 Thread Stephan Erb
Thanks Sharma and Bill! This is exactly the input I was looking for.

We will start by using an existing service scheduler and see where this
leads us. 
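
For the record, the leader-election part that Sharma mentions below looks
manageable with an off-the-shelf recipe. A rough sketch using the kazoo
client (the ZooKeeper ensemble, election path, and run_scheduler() are
placeholders):

    import socket

    from kazoo.client import KazooClient


    def run_scheduler():
        # Start the framework's scheduler driver here; only the elected
        # leader should register with the Mesos master.
        pass


    zk = KazooClient(hosts="zk1.example.com:2181,zk2.example.com:2181")
    zk.start()

    election = zk.Election("/my-framework/leader", socket.gethostname())
    # Blocks until this instance wins the election, then runs the callable.
    # If the leader dies, its ephemeral znode vanishes and a standby takes
    # over.
    election.run(run_scheduler)

The hard parts (persisting task state and reconciling it after a failover)
obviously remain.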

Best Regards,
Stephan

On Tue, 2014-09-02 at 10:14 -0700, Bill Farner wrote:
> I'll echo Sharma's points.  While it seems simple enough to see which
> moving parts you need to implement here, the long-term effort is
> large.  I've been working on Aurora for 4.5 years, and still know of a
> lot of work we need to do.  If your use case can fit into an existing
> framework (perhaps mod a feature request/contribution here and there),
> you'll free up a lot of time to focus on the problem you're actually
> trying to solve.
> 
> -=Bill
> 
> 
> On Mon, Sep 1, 2014 at 10:45 AM, Sharma Podila 
> wrote:
> I am tempted to say that the short answer is, if your option B
> works, why bother writing your own scheduler/framework?
> 
> 
> Writing a Mesos framework can be easy. However, writing a
> fault tolerant Mesos framework that has good scalability, is
> performant, and is highly available can be relatively hard.
> Here are a few things, off the top of my head, that helped us
> make the decision to write our own:
>   * There must be a good long term reason to write your
> own framework. The scheduling/preemption/allocation
> model you spoke of may be a good reason. For us, it
> was specific scheduling optimizations that are not
> generic and are absent in other frameworks.
>   * Fault tolerance is a combination of a few things.
> Here are a few to consider:
>   * Task reconciliation with Mesos master
> currently will involve more than just using
> the reconcile feature. We augment it with
> heartbeats from tasks, Aurora runs a GC task,
> etc. I believe it will take another Mesos
> release (or two?) before we can rely solely on
> Mesos task reconciliation.
>   * Framework itself must be highly available, for
> example, using ZooKeeper leader election among
> multiple framework instances. 
>   * Fault tolerant persistence of task states. For
> example, when Mesos calls your framework with
> a status update of a task, that state must be
> reliably persisted.
>   * It sounds like achieving fair share allocation via
> preemptions is important to you. That "external
> entity" you refer to may be non-trivial in the long
> run. If you were to embark on writing your own
> framework, another model to consider is to just have
> one framework scheduler instance for all users. Then,
> put the preemptions and fair share logic inside it.
> There could be complexities: with a
> heterogeneous mix of task and slave resource
> sizes, scaling down an arbitrary number of tasks from
> user A doesn't necessarily benefit user B. The
> scheduler can handle this better than an external
> entity, by preempting only the right tasks, etc.
>   * That said, for simpler use cases, it may work
> just fine to have an external entity.
>   * Scheduling itself is a hard problem, and it can slow down
> quickly once you go beyond a first-fit style and add a
> few constraints and SLAs. Preemptions, for example,
> can slow down the scheduler while it figures out the
> right tasks to preempt to honor the fair-share SLAs.
> That is, assuming you have more than a few hundred tasks.
>   * There were a few talks at MesosCon, ten days ago, on
> this topic including one from us. The video/slides
> from the conference should be available from MesosCon
> sometime soon. 
> 
> 
> 
> 
> 
> 
> On Sun, Aug 31, 2014 at 7:51 AM, Stephan Erb
>  wrote:
> Hi everybody,
> 
> I would like to assess the effort required to write a
> custom framework.
> 
> Back

Re: Mesos 12.04 Python2.7 Egg

2014-09-16 Thread Stephan Erb

Did you find a solution for your question?

I am currently having similar issues when trying to run the thermos
executor on Debian 7, which doesn't ship GLIBC 2.16 either. It seems like
we have to patch the Aurora build process (probably in
3rdparty/python/BUILD) to download the correct eggs from mesosphere.io
instead of using the default ones on PyPI.


Does anyone have experience in how to do this?

Thanks,
Stephan


On Sat 30 Aug 2014 08:08:24 CEST, Joe Smith wrote:

Howdy all,

I'm migrating Apache Aurora to Mesos 0.20.0 [1][2], but am
having an issue using the published dist on PyPI. I also tried the
mesosphere-provided (thank you!) egg for Ubuntu 12.04, and am
getting the same stack trace:

vagrant@192:~$
PYTHONPATH=/home/vagrant/.pex/install/mesos.native-0.20.0-py2.7-linux-x86_64.egg.be6632b790cd03172f858e7f875cdab4ef415ca5/mesos.native-0.20.0-py2.7-linux-x86_64.egg/mesos/
python2.7
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mesos
Traceback (most recent call last):
  File "", line 1, in 
ImportError: No module named mesos
>>> import native
Traceback (most recent call last):
  File "", line 1, in 
  File
"/home/vagrant/.pex/install/mesos.native-0.20.0-py2.7-linux-x86_64.egg.be6632b790cd03172f858e7f875cdab4ef415ca5/mesos.native-0.20.0-py2.7-linux-x86_64.egg/mesos/native/__init__.py",
line 17, in <module>
from ._mesos import MesosExecutorDriverImpl
ImportError: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.16' not
found (required by
/home/vagrant/.pex/install/mesos.native-0.20.0-py2.7-linux-x86_64.egg.be6632b790cd03172f858e7f875cdab4ef415ca5/mesos.native-0.20.0-py2.7-linux-x86_64.egg/mesos/native/_mesos.so)
>>>

It looks like the issue is that it was built against a newer, non-standard
glibc (if I'm following right):

vagrant@192:~/mesos-0.20.0$ /lib/x86_64-linux-gnu/libc.so.6 | grep
release\ version
GNU C Library (Ubuntu EGLIBC 2.15-0ubuntu10) stable release version
2.15, by Roland McGrath et al.
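
The same check from inside the interpreter (a quick ctypes sketch;
gnu_get_libc_version is a glibc-specific call):

    import ctypes

    # Report the glibc version the running process is linked against.
    libc = ctypes.CDLL("libc.so.6")
    libc.gnu_get_libc_version.restype = ctypes.c_char_p
    print libc.gnu_get_libc_version()  # e.g. 2.15 on Ubuntu 12.04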

Any feedback or suggestions would be greatly appreciated!

Thanks,
Joe

[1] https://reviews.apache.org/r/25208/
[2] https://issues.apache.org/jira/browse/AURORA-674







Problems with OOM

2014-09-26 Thread Stephan Erb

Hi everyone,

I am having issues with the cgroups isolation of Mesos. It seems like 
tasks are prevented from allocating more memory than their limit. 
However, they are never killed.


 * My scheduled task allocates memory in a tight loop (see the sketch
   below). According to 'ps', once it exceeds its memory limit it is not
   killed, but ends up in the state D ("uninterruptible sleep (usually IO)").
 * The task is still considered running by Mesos.
 * There is no indication of an OOM in dmesg.
 * There is neither an OOM notice nor any other output related to the
   task in the slave log.
 * According to htop, the system load is increased, with a significant
   portion of CPU time spent in the kernel. Commonly the load is so
   high that all ZooKeeper connections time out.
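
The allocation loop in the task is essentially the following (a simplified
Python stand-in for the real workload):

    # Simplified stand-in for the misbehaving task: allocate memory in a
    # tight loop until something (ideally the cgroup OOM killer) stops it.
    chunks = []
    while True:
        chunks.append("x" * (10 * 1024 * 1024))  # grab ~10 MB per iteration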

I am running Aurora and Mesos 0.20.1 using the cgroups isolation on 
Debian 7 (kernel 3.2.60-1+deb7u3).


Sorry for the somewhat unspecific error description. Still, does anyone
have an idea what might be wrong here?


Thanks and Best Regards,
Stephan


Mesos.interface python package

2014-09-26 Thread Stephan Erb

Hello,

could the owner of https://pypi.python.org/pypi/mesos.interface please 
be so kind as to upload the latest version 0.20.1 to PyPI?


Otherwise the (awesome) egg files from Mesosphere cannot be installed.

Thanks very much!
Stephan



Re: Problems with OOM

2014-09-26 Thread Stephan Erb
@Tomas: I am currently only running a single slave in a VM. It uses the 
cgroups isolator and its logs are clean.

@Tom: Thanks for the interesting hint! I will look into it.

Best Regards,
Stephan

On Fri 26 Sep 2014 16:53:22 CEST, Tom Arnfeld wrote:

I'm not sure if this at all related to the issue you're seeing, but we
ran into this fun issue (or at least this seems to be the cause)
helpfully documented on this blog article:
http://blog.nitrous.io/2014/03/10/stability-and-a-linux-oom-killer-bug.html.

TLDR: OOM killer getting into an infinite loop, causing the CPU to
spin out of control on our VMs.

More details in this commit message to the OOM killer earlier this
year;
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0c740d0afc3bff0a097ad03a1c8df92757516f5c

Hope this helps somewhat...

On 26 September 2014 14:15, Tomas Barton <barton.to...@gmail.com> wrote:

Just to make sure, all slaves are running with:

--isolation='cgroups/cpu,cgroups/mem'

Is there something suspicious in mesos slave logs?

On 26 September 2014 13:20, Stephan Erb
<stephan@blue-yonder.com>
wrote:

Hi everyone,

I am having issues with the cgroups isolation of Mesos. It
seems like tasks are prevented from allocating more memory
than their limit. However, they are never killed.

  * My scheduled task allocates memory in a tight loop.
According to 'ps', once its memory requirements are
exceeded it is not killed, but ends up in the state D
("uninterruptible sleep (usually IO)").
  * The task is still considered running by Mesos.
  * There is no indication of an OOM in dmesg.
  * There is neither an OOM notice nor any other output
related to the task in the slave log.
  * According to htop, the system load is increased with a
significant portion of CPU time spend within the kernel.
Commonly the load is so high that all zookeeper
connections time out.

I am running Aurora and Mesos 0.20.1 using the cgroups
isolation on Debian 7 (kernel 3.2.60-1+deb7u3). .

Sorry for the somewhat unspecific error description. Still,
anyone an idea what might be wrong here?

Thanks and Best Regards,
Stephan







Re: Problems with OOM

2014-10-06 Thread Stephan Erb

Hello,

I am still facing the same issue:

 * My process keeps allocating memory until all available system memory
   is used, but it is never killed. Its sandbox is limited to x00 MB
   but it ends up using several GB.
 * There is no OOM- or cgroup-related entry in dmesg (besides the
   initialization, i.e., "Initializing cgroup subsys memory"...)
 * The slave log contains nothing suspicious (see the attached logfile)

Updating my Debian kernel from 3.2 to a backported 3.16 kernel did not 
help. The system is more responsive under load, but the OOM killer is 
still not triggered. I haven't tried running kernelshark on any of these 
kernels, yet.


The slave command line I am using: /usr/local/sbin/mesos-slave 
--master=zk://test-host:2181/mesos --log_dir=/var/log/mesos 
--cgroups_limit_swap --isolation=cgroups/cpu,cgroups/mem 
--work_dir=/var/lib/mesos --attributes=host:test-host;rack:unspecified
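
A quick way to rule out a host-level misconfiguration is to check that the
memory cgroup and swap accounting are really enabled (a small Python sketch;
the paths assume the usual cgroup mount point):

    import os

    # Swap accounting is only active if the memsw control files exist in
    # the root of the memory hierarchy.
    print "kernel cmdline:", open("/proc/cmdline").read().strip()
    for name in ("memory.limit_in_bytes", "memory.memsw.limit_in_bytes"):
        path = os.path.join("/sys/fs/cgroup/memory", name)
        print "%s present: %s" % (name, os.path.exists(path))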


Any more ideas?

Thanks,
Stephan


On 27.09.2014 19:34, CCAAT wrote:

On 09/26/14 06:20, Stephan Erb wrote:

Hi everyone,

I am having issues with the cgroups isolation of Mesos. It seems like
tasks are prevented from allocating more memory than their limit.
However, they are never killed.



I am running Aurora and Mesos 0.20.1 using the cgroups isolation on
Debian 7 (kernel 3.2.60-1+deb7u3). .



Maybe a newer kernel might help? I've poked around for some 
suggestions on the kernel configuration for servers running 
Mesos, but nobody is talking about how they "tweak" their kernel 
settings yet.


Here's a good article on default shared memory limits:
http://lwn.net/Articles/595638/


Also, I'm not sure if OOM-Killer works on kernel space problems
where memory is grabbed up continuously by the kernel. That may
not even be your problem. I know OOM-killer works on userspace
memory problems.

Kernelshark is your friend

hth,
James








Log file created at: 2014/10/06 16:58:15
Running on machine: test-host
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
I1006 16:58:15.520334  2266 logging.cpp:142] INFO level logging started!
I1006 16:58:15.522333  2266 main.cpp:126] Build: 2014-09-23 05:35:41 by root
I1006 16:58:15.522378  2266 main.cpp:128] Version: 0.20.1
I1006 16:58:15.522400  2266 main.cpp:131] Git tag: 0.20.1
I1006 16:58:15.522420  2266 main.cpp:135] Git SHA: fe0a39112f3304283f970f1b08b322b1e970829d
I1006 16:58:15.524052  2266 containerizer.cpp:89] Using isolation: cgroups/cpu,cgroups/mem
I1006 16:58:15.927139  2266 linux_launcher.cpp:78] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1006 16:58:15.929747  2266 main.cpp:149] Starting Mesos slave
I1006 16:58:15.933691  2818 slave.cpp:167] Slave started on 1)@127.0.1.1:5051
I1006 16:58:15.988718  2818 slave.cpp:278] Slave resources: cpus(*):8; mem(*):15061; disk(*):919916; ports(*):[31000-32000]
I1006 16:58:15.992478  2818 slave.cpp:306] Slave hostname: test-host.local
I1006 16:58:15.992552  2818 slave.cpp:307] Slave checkpoint: true
I1006 16:58:16.002214  2815 state.cpp:33] Recovering state from '/var/lib/mesos/meta'
I1006 16:58:16.003589  2815 state.cpp:50] Slave host rebooted
I1006 16:58:16.004365  2816 status_update_manager.cpp:193] Recovering status update manager
I1006 16:58:16.076061  2816 containerizer.cpp:252] Recovering containerizer
I1006 16:58:16.088528  2815 slave.cpp:3198] Finished recovery

... 

I1006 17:28:04.565655  2814 slave.cpp:1002] Got assigned task 1412609276176-www-data-test-ipython-1-b69cccbf-677b-47a7-83f9-74e713b7678c for framework 20140919-174559-16842879-5050-27194-
I1006 17:28:04.568666  2814 slave.cpp:1112] Launching task 1412609276176-www-data-test-ipython-1-b69cccbf-677b-47a7-83f9-74e713b7678c for framework 20140919-174559-16842879-5050-27194-
I1006 17:28:05.814142  2814 slave.cpp:3857] Checkpointing ExecutorInfo to '/var/lib/mesos/meta/slaves/20141006-165817-16842879-5050-2264-0/frameworks/20140919-174559-16842879-5050-27194-/executors/thermos-1412609276176-www-data-test-ipython-1-b69cccbf-677b-47a7-83f9-74e713b7678c/executor.info'
I1006 17:28:06.006503  2814 slave.cpp:3972] Checkpointing TaskInfo to '/var/lib/mesos/meta/slaves/20141006-165817-16842879-5050-2264-0/frameworks/20140919-174559-16842879-5050-27194-/executors/thermos-1412609276176-www-data-test-ipython-1-b69cccbf-677b-47a7-83f9-74e713b7678c/runs/899fb038-cb6c-429b-8132-630ac582c846/tasks/1412609276176-www-data-test-ipython-1-b69cccbf-677b-47a7-83f9-74e713b7678c/task.info'
I1006 17:28:06.006503  2817 containerizer.cpp:394] Starting container '899fb038-cb6c-429b-8132-630ac582c846' for executor 'thermos-1412609276176-www-data-test-ipython-1-b69cccbf-677b-47a7-83f9-74e713b7678c' of framework '20140919-174559-16842879-5050-27194-'
I1006 17:28:06.008249  2814 slave.cpp:1222] Queuing task '1412609276176-www-data-test-ipython-1-b69cccbf-677b-47a7-83f9-74e713b7678c' for executor th

Re: Problems with OOM

2014-10-07 Thread Stephan Erb
Ok, here is something odd. My kernel is booted using 
"cgroup_enable=memory swapaccount=1" in order to enable cgroup accounting.


The log for starting a new container:

I1007 11:38:25.881882  3698 slave.cpp:1222] Queuing task 
'1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1' 
for executor 
thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1
 of framework '20140919-174559-16842879-5050-27194-

I1007 11:38:25.891448  3696 cpushare.cpp:338] Updated 'cpu.shares' to 1280 
(cpus 1.25) for container 866af1d4-14df-4e55-be5d-a54e2a573cd7

I1007 11:38:25.892354  3695 mem.cpp:479] Started listening for OOM events for 
container 866af1d4-14df-4e55-be5d-a54e2a573cd7

I1007 11:38:25.894224  3695 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' 
to 628MB for container 866af1d4-14df-4e55-be5d-a54e2a573cd7

I1007 11:38:25.897894  3695 mem.cpp:347] Updated 'memory.memsw.limit_in_bytes' 
to 628MB for container 866af1d4-14df-4e55-be5d-a54e2a573cd7

I1007 11:38:25.901499  3693 linux_launcher.cpp:191] Cloning child process with 
flags = 0

I1007 11:38:25.982059  3693 containerizer.cpp:678] Checkpointing executor's 
forked pid 3985 to 
'/var/lib/mesos/meta/slaves/20141007-113221-16842879-5050-2279-0/frameworks/20140919-174559-16842879-5050-27194-/executors/thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1/runs/866af1d4-14df-4e55-be5d-a54e2a573cd7/pids/forked.pid'

I1007 11:38:26.170440  3696 containerizer.cpp:510] Fetching URIs for container 
'866af1d4-14df-4e55-be5d-a54e2a573cd7' using command 
'/usr/local/libexec/mesos/mesos-fetcher'

I1007 11:38:26.796327  3692 slave.cpp:2538] Monitoring executor 
'thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1'
 of framework '20140919-174559-16842879-5050-27194-' in container 
'866af1d4-14df-4e55-be5d-a54e2a573cd7'

I1007 11:38:27.611901  3691 slave.cpp:1733] Got registration for executor 
'thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1'
 of framework 20140919-174559-16842879-5050-27194- from 
executor(1)@127.0.1.1:39709

I1007 11:38:27.612476  3691 slave.cpp:1819] Checkpointing executor pid 
'executor(1)@127.0.1.1:39709' to 
'/var/lib/mesos/meta/slaves/20141007-113221-16842879-5050-2279-0/frameworks/20140919-174559-16842879-5050-27194-/executors/thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1/runs/866af1d4-14df-4e55-be5d-a54e2a573cd7/pids/libprocess.pid'

I1007 11:38:27.614302  3691 slave.cpp:1853] Flushing queued task 
1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1 for 
executor 
'thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1'
 of framework 20140919-174559-16842879-5050-27194-

I1007 11:38:27.615567  3697 cpushare.cpp:338] Updated 'cpu.shares' to 1280 
(cpus 1.25) for container 866af1d4-14df-4e55-be5d-a54e2a573cd7

I1007 11:38:27.615622  3694 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' 
to 628MB for container 866af1d4-14df-4e55-be5d-a54e2a573cd7

I1007 11:38:27.630520  3694 slave.cpp:2088] Handling status update 
TASK_STARTING (UUID: 177f83dd-6669-4ead-8e42-95030e5723e4) for task 
1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1 of 
framework 20140919-174559-16842879-5050-27194- from 
executor(1)@127.0.1.1:39709


But when I inspect the limits of my container, they are not set as 
expected:


# cat 866af1d4-14df-4e55-be5d-a54e2a573cd7/memory.soft_limit_in_bytes

658505728

# cat 866af1d4-14df-4e55-be5d-a54e2a573cd7/memory.limit_in_bytes

9223372036854775807

# cat 866af1d4-14df-4e55-be5d-a54e2a573cd7/memory.memsw.limit_in_bytes

9223372036854775807


Shouldn't the memsw.limit_in_bytes be set as well?
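
For completeness, a small sketch that dumps the same limits for every
container cgroup (assuming the memory hierarchy is mounted at
/sys/fs/cgroup/memory and the slave uses the default 'mesos' cgroup root):

    import os

    ROOT = "/sys/fs/cgroup/memory/mesos"

    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except IOError:
            return "n/a"

    # Print the soft limit, hard limit, and mem+swap limit per container.
    for container in sorted(os.listdir(ROOT)):
        cgroup = os.path.join(ROOT, container)
        if not os.path.isdir(cgroup):
            continue
        print container
        for name in ("memory.soft_limit_in_bytes",
                     "memory.limit_in_bytes",
                     "memory.memsw.limit_in_bytes"):
            print "  %s = %s" % (name, read(os.path.join(cgroup, name)))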

Best Regards,
Stephan


On 06.10.2014 18:56, Stephan Erb wrote:

Hello,

I am still facing the same issue:

  * My process keeps allocating memory until all available system
memory is used, but it is never killed. Its sandbox is limited to
x00 MB but it ends up using several GB.
  * There is no OOM or cgroup related entry in dmesg (beside the
initialization, i.e., "Initializing cgroup subsys memory"...)
  * The slave log contains nothing suspicious (see the attached logfile)

Updating my Debian kernel from 3.2 to a backported 3.16 kernel did not 
help. The system is more responsive under load, but the OOM killer is 
still not triggered. I haven't tried running kernelshark on any of 
these kernels, yet.


My used slave command line: /usr/local/sbin/mesos-slave 
--master=zk://test-host:2181/mesos --log_dir=/var/log/mesos 
--cgroups_limit_swap --isolation=cgroups/cpu,cgroups/mem 
--work_dir=/var/lib/mesos --attributes=host:test-host;rack:unspecified


An

Re: Problems with OOM

2014-10-07 Thread Stephan Erb
It seems like there is a workaround: I can emulate my desired configuration 
(no swap usage) by disabling swap on the host and starting the slave without 
"--cgroups_limit_swap". Then everything works as expected, i.e., a 
misbehaving task is killed immediately.


However, I still don't know why 'cgroups_limit_swap' is not working as 
advertised.


Best Regards,
Stephan

On 07.10.2014 12:29, Stephan Erb wrote:
Ok, here is something odd. My kernel is booted using 
"cgroup_enable=memory swapaccount=1" in order to enable cgroup accounting.


The log for starting a new container:
I1007 11:38:25.881882  3698 slave.cpp:1222] Queuing task 
'1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1' 
for executor 
thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1
 of framework '20140919-174559-16842879-5050-27194-
I1007 11:38:25.891448  3696 cpushare.cpp:338] Updated 'cpu.shares' to 1280 
(cpus 1.25) for container 866af1d4-14df-4e55-be5d-a54e2a573cd7
I1007 11:38:25.892354  3695 mem.cpp:479] Started listening for OOM events for 
container 866af1d4-14df-4e55-be5d-a54e2a573cd7
I1007 11:38:25.894224  3695 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' 
to 628MB for container 866af1d4-14df-4e55-be5d-a54e2a573cd7
I1007 11:38:25.897894  3695 mem.cpp:347] Updated 'memory.memsw.limit_in_bytes' 
to 628MB for container 866af1d4-14df-4e55-be5d-a54e2a573cd7
I1007 11:38:25.901499  3693 linux_launcher.cpp:191] Cloning child process with 
flags = 0
I1007 11:38:25.982059  3693 containerizer.cpp:678] Checkpointing executor's 
forked pid 3985 to 
'/var/lib/mesos/meta/slaves/20141007-113221-16842879-5050-2279-0/frameworks/20140919-174559-16842879-5050-27194-/executors/thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1/runs/866af1d4-14df-4e55-be5d-a54e2a573cd7/pids/forked.pid'
I1007 11:38:26.170440  3696 containerizer.cpp:510] Fetching URIs for container 
'866af1d4-14df-4e55-be5d-a54e2a573cd7' using command 
'/usr/local/libexec/mesos/mesos-fetcher'
I1007 11:38:26.796327  3692 slave.cpp:2538] Monitoring executor 
'thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1'
 of framework '20140919-174559-16842879-5050-27194-' in container 
'866af1d4-14df-4e55-be5d-a54e2a573cd7'
I1007 11:38:27.611901  3691 slave.cpp:1733] Got registration for executor 
'thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1'
 of framework 20140919-174559-16842879-5050-27194- from 
executor(1)@127.0.1.1:39709
I1007 11:38:27.612476  3691 slave.cpp:1819] Checkpointing executor pid 
'executor(1)@127.0.1.1:39709' to 
'/var/lib/mesos/meta/slaves/20141007-113221-16842879-5050-2279-0/frameworks/20140919-174559-16842879-5050-27194-/executors/thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1/runs/866af1d4-14df-4e55-be5d-a54e2a573cd7/pids/libprocess.pid'
I1007 11:38:27.614302  3691 slave.cpp:1853] Flushing queued task 
1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1 for 
executor 
'thermos-1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1'
 of framework 20140919-174559-16842879-5050-27194-
I1007 11:38:27.615567  3697 cpushare.cpp:338] Updated 'cpu.shares' to 1280 
(cpus 1.25) for container 866af1d4-14df-4e55-be5d-a54e2a573cd7
I1007 11:38:27.615622  3694 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' 
to 628MB for container 866af1d4-14df-4e55-be5d-a54e2a573cd7
I1007 11:38:27.630520  3694 slave.cpp:2088] Handling status update 
TASK_STARTING (UUID: 177f83dd-6669-4ead-8e42-95030e5723e4) for task 
1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b57dd1 of 
framework 20140919-174559-16842879-5050-27194- from 
executor(1)@127.0.1.1:39709

But when inspecting the limits of my container, they are not enforced 
as expected:


# cat 866af1d4-14df-4e55-be5d-a54e2a573cd7/memory.soft_limit_in_bytes
658505728
# cat 866af1d4-14df-4e55-be5d-a54e2a573cd7/memory.limit_in_bytes
9223372036854775807
# cat 866af1d4-14df-4e55-be5d-a54e2a573cd7/memory.memsw.limit_in_bytes
9223372036854775807

Shouldn't the memsw.limit_in_bytes be set as well?

Best Regards,
Stephan


On 06.10.2014 18:56, Stephan Erb wrote:

Hello,

I am still facing the same issue:

  * My process keeps allocating memory until all available system
memory is used, but it is never killed. Its sandbox is limited to
x00 MB but it ends up using several GB.
  * There is no OOM or cgroup related entry in dmesg (beside the
initialization, i.e., "Initializing cgroup subsys memory"...)
  * The slave log contains nothing suspicious (see the attached logfile)

Updating my Debian kernel from 3.2 to a backported 3.16 kernel did 

Mesos replicated log fills disk with logging output

2018-01-08 Thread Stephan Erb
Hi everyone,

A few days ago we bumped into an interesting issue that we had not seen 
before. Essentially, one of our toy clusters dissolved itself:


  *   3 masters, each running Mesos (1.2.1), Aurora (0.19.0), and ZooKeeper 
(3.4.5) for leader election
  *   Master 1 and master 2 had 100% disk usage, because 
/var/lib/mesos/replicated_log/LOG had grown to about 170 GB
  *   The replicated log of both Master 1 and 2 was corrupted. A process 
restart did not fix it.
  *   The ZooKeeper on Master 2 was corrupted as well. Logs indicated this was 
caused by the full disk.
  *   Master 3 was the leading Mesos master and healthy. Its disk usage was 
normal.


The content of /var/lib/mesos/replicated_log/LOG was an endless stream of:

2018/01/04-12:30:56.776466 7f65aae877c0 Recovering log #1753
2018/01/04-12:30:56.776577 7f65aae877c0 Level-0 table #1756: started
2018/01/04-12:30:56.778885 7f65aae877c0 Level-0 table #1756: 7526 bytes OK
2018/01/04-12:30:56.782433 7f65aae877c0 Delete type=0 #1753
2018/01/04-12:30:56.782484 7f65aae877c0 Delete type=3 #1751
2018/01/04-12:30:56.782642 7f6597fff700 Level-0 table #1759: started
2018/01/04-12:30:56.782686 7f6597fff700 Level-0 table #1759: 0 bytes OK
2018/01/04-12:30:56.783242 7f6597fff700 Delete type=0 #1757
2018/01/04-12:30:56.783312 7f6597fff700 Compacting 4@0 + 1@1 files
2018/01/04-12:30:56.783499 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0 ]
2018/01/04-12:30:56.783538 7f6597fff700 Delete type=2 #1760
2018/01/04-12:30:56.783563 7f6597fff700 Compaction error: IO error: 
/var/lib/mesos/replicated_log/001735.sst: No such file or directory
2018/01/04-12:30:56.783598 7f6597fff700 Manual compaction at level-0 from 
(begin) .. (end); will stop at '003060' @ 9423 : 1
2018/01/04-12:30:56.783607 7f6597fff700 Compacting 4@0 + 1@1 files
2018/01/04-12:30:56.783698 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0 ]
2018/01/04-12:30:56.783728 7f6597fff700 Delete type=2 #1761
2018/01/04-12:30:56.783749 7f6597fff700 Compaction error: IO error: 
/var/lib/mesos/replicated_log/001735.sst: No such file or directory
2018/01/04-12:30:56.783770 7f6597fff700 Compacting 4@0 + 1@1 files
2018/01/04-12:30:56.783900 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0 ]
2018/01/04-12:30:56.783929 7f6597fff700 Delete type=2 #1762
2018/01/04-12:30:56.783950 7f6597fff700 Compaction error: IO error: 
/var/lib/mesos/replicated_log/001735.sst: No such file or directory
2018/01/04-12:30:56.783970 7f6597fff700 Compacting 4@0 + 1@1 files
2018/01/04-12:30:56.784312 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0 ]
2018/01/04-12:30:56.785547 7f6597fff700 Delete type=2 #1763

Content of the associated folder:

/var/lib/mesos/replicated_log.corrupted# ls -la
total 964480
drwxr-xr-x 2 mesos mesos  4096 Jan  5 10:12 .
drwxr-xr-x 4 mesos mesos  4096 Jan  5 10:27 ..
-rw-r--r-- 1 mesos mesos   724 Dec 14 16:22 001735.ldb
-rw-r--r-- 1 mesos mesos  7393 Dec 14 16:45 001737.sst
-rw-r--r-- 1 mesos mesos 22129 Jan  3 12:53 001742.sst
-rw-r--r-- 1 mesos mesos 14967 Jan  3 13:00 001747.sst
-rw-r--r-- 1 mesos mesos  7526 Jan  4 12:30 001756.sst
-rw-r--r-- 1 mesos mesos 15113 Jan  5 10:08 001765.sst
-rw-r--r-- 1 mesos mesos 65536 Jan  5 10:09 001767.log
-rw-r--r-- 1 mesos mesos    16 Jan  5 10:08 CURRENT
-rw-r--r-- 1 mesos mesos 0 Aug 25  2015 LOCK
-rw-r--r-- 1 mesos mesos 178303865220 Jan  5 10:12 LOG
-rw-r--r-- 1 mesos mesos 463093282 Jan  5 10:08 LOG.old
-rw-r--r-- 1 mesos mesos 65536 Jan  5 10:08 MANIFEST-001764

Monitoring indicates that the disk usage started to grow shortly after a badly 
coordinated configuration deployment change:


  *   Master 1 was leading and restarted after a few hours of uptime
  *   Master 2 was now leading. After a few seconds (30s-60s or so) it got 
restarted as well
  *   Master 3 was now leading (and continued to do so)

I have to admit I am a bit surprised that the restart scenario could lead to 
the issues described above. Has anyone seen similar issues as well?

Thanks and best regards,
Stephan


Re: Mesos replicated log fills disk with logging output

2018-01-10 Thread Stephan Erb
Thanks for the hint! The cluster is using ext4, and judging from the linked 
thread this could indeed have been caused by a stalling hypervisor.

From: Jie Yu 
Reply-To: "user@mesos.apache.org" 
Date: Monday, 8. January 2018 at 23:36
To: user 
Subject: Re: Mesos replicated log fills disk with logging output

Stephan,

I haven't seen that before. A quick Google search suggests that it might be 
related to leveldb. The following thread might be related.
https://groups.google.com/d/msg/leveldb/lRrbv4Y0YgU/AtfRTfQXNoYJ

What is the filesystem you're using?

- Jie

On Mon, Jan 8, 2018 at 2:28 PM, Stephan Erb
<stephan@blue-yonder.com> wrote:
Hi everyone,

a few days ago, we have bumped into an interesting issue that we had not seen 
before. Essentially, one of our toy clusters dissolved itself:

·  3 masters, each running Mesos (1.2.1), Aurora (0.19.0), and ZooKeeper 
(3.4.5) for leader election
·  Master 1 and master 2 had 100% disk usage, because 
/var/lib/mesos/replicated_log/LOG had grown to about 170 GB
·  The replicated log of both Master 1 and 2 was corrupted. A process restart 
did not fix it.
·  The ZooKeeper on Master 2 was corrupted as well. Logs indicated this was 
caused by the full disk.
·  Master 3 was the leading Mesos master and healthy. Its disk usage was normal.


The content of /var/lib/mesos/replicated_log/LOG was an endless stream of:

2018/01/04-12:30:56.776466 7f65aae877c0 Recovering log #1753
2018/01/04-12:30:56.776577 7f65aae877c0 Level-0 table #1756: started
2018/01/04-12:30:56.778885 7f65aae877c0 Level-0 table #1756: 7526 bytes OK
2018/01/04-12:30:56.782433 7f65aae877c0 Delete type=0 #1753
2018/01/04-12:30:56.782484 7f65aae877c0 Delete type=3 #1751
2018/01/04-12:30:56.782642 7f6597fff700 Level-0 table #1759: started
2018/01/04-12:30:56.782686 7f6597fff700 Level-0 table #1759: 0 bytes OK
2018/01/04-12:30:56.783242 7f6597fff700 Delete type=0 #1757
2018/01/04-12:30:56.783312 7f6597fff700 Compacting 4@0 + 1@1 files
2018/01/04-12:30:56.783499 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0 ]
2018/01/04-12:30:56.783538 7f6597fff700 Delete type=2 #1760
2018/01/04-12:30:56.783563 7f6597fff700 Compaction error: IO error: 
/var/lib/mesos/replicated_log/001735.sst: No such file or directory
2018/01/04-12:30:56.783598 7f6597fff700 Manual compaction at level-0 from 
(begin) .. (end); will stop at '003060' @ 9423 : 1
2018/01/04-12:30:56.783607 7f6597fff700 Compacting 4@0 + 1@1 files
2018/01/04-12:30:56.783698 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0 ]
2018/01/04-12:30:56.783728 7f6597fff700 Delete type=2 #1761
2018/01/04-12:30:56.783749 7f6597fff700 Compaction error: IO error: 
/var/lib/mesos/replicated_log/001735.sst: No such file or directory
2018/01/04-12:30:56.783770 7f6597fff700 Compacting 4@0 + 1@1 files
2018/01/04-12:30:56.783900 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0 ]
2018/01/04-12:30:56.783929 7f6597fff700 Delete type=2 #1762
2018/01/04-12:30:56.783950 7f6597fff700 Compaction error: IO error: 
/var/lib/mesos/replicated_log/001735.sst: No such file or directory
2018/01/04-12:30:56.783970 7f6597fff700 Compacting 4@0 + 1@1 files
2018/01/04-12:30:56.784312 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0 ]
2018/01/04-12:30:56.785547 7f6597fff700 Delete type=2 #1763

Content of the associated folder:

/var/lib/mesos/replicated_log.corrupted# ls -la
total 964480
drwxr-xr-x 2 mesos mesos  4096 Jan  5 10:12 .
drwxr-xr-x 4 mesos mesos  4096 Jan  5 10:27 ..
-rw-r--r-- 1 mesos mesos   724 Dec 14 16:22 001735.ldb
-rw-r--r-- 1 mesos mesos  7393 Dec 14 16:45 001737.sst
-rw-r--r-- 1 mesos mesos 22129 Jan  3 12:53 001742.sst
-rw-r--r-- 1 mesos mesos 14967 Jan  3 13:00 001747.sst
-rw-r--r-- 1 mesos mesos  7526 Jan  4 12:30 001756.sst
-rw-r--r-- 1 mesos mesos 15113 Jan  5 10:08 001765.sst
-rw-r--r-- 1 mesos mesos 65536 Jan  5 10:09 001767.log
-rw-r--r-- 1 mesos mesos    16 Jan  5 10:08 CURRENT
-rw-r--r-- 1 mesos mesos 0 Aug 25  2015 LOCK
-rw-r--r-- 1 mesos mesos 178303865220 Jan  5 10:12 LOG
-rw-r--r-- 1 mesos mesos 463093282 Jan  5 10:08 LOG.old
-rw-r--r-- 1 mesos mesos 65536 Jan  5 10:08 MANIFEST-001764

Monitoring indicates that the disk usage started to grow shortly after a badly 
coordinated configuration deployment change:

·  Master 1 was leading and restarted after a few hours of uptime
·  Master 2 was now leading. After a few seconds (30s-60s or so) it got 
restarted as well
·  Master 3 was now leading (and continued to do so)

I have to admit I am a bit surprised that the restart scenario could lead to 
the issues described above. Has anyone seen similar issues as well?

Thanks and best regards,
Stephan



Re: [VOTE] Release Apache Mesos 1.6.1 (rc2)

2018-07-25 Thread Stephan Erb
The vote for 1.6.1 appears to have passed. Any chance we can get this released 
soon?

Thanks!


On 19.07.18, 01:11, "Gastón Kleiman"  wrote:

+1 (binding)

Tested on our internal CI. All green!
Tested on CentOS 7 and the following tests failed:

[  FAILED  ] DockerContainerizerTest.ROOT_DOCKER_Launch_Executor
[  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_CFS_EnableCfs
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen
[  FAILED  ]
NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage
[  FAILED  ]

bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0,
where GetParam() = true

They are all known to be flaky.

On Wed, Jul 11, 2018 at 6:15 PM Greg Mann  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.6.1.
>
>
> 1.6.1 includes the following:
>
> 

> *Announce major features here*
> *Announce major bug fixes here*
>
> The CHANGELOG for the release is available at:
>
> 
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.1-rc2
>
> 

>
> The candidate for Mesos 1.6.1 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc2/mesos-1.6.1.tar.gz
>
> The tag to be voted on is 1.6.1-rc2:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.1-rc2
>
> The SHA512 checksum of the tarball can be found at:
>
> 
https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc2/mesos-1.6.1.tar.gz.sha512
>
> The signature of the tarball can be found at:
>
> 
https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc2/mesos-1.6.1.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1230
>
> Please vote on releasing this package as Apache Mesos 1.6.1!
>
> The vote is open until Mon Jul 16 18:15:00 PDT 2018 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.6.1
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Greg
>