[jira] [Comment Edited] (MESOS-4353) Limit the number of processes created by libprocess

2016-02-13 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146321#comment-15146321
 ] 

Maged Michael edited comment on MESOS-4353 at 2/14/16 2:20 AM:
---

[Klaus]: "If the value is invalid, keep current behaviour, right?"

Yes. The number of worker threads will remain unchanged.

[Klaus]: "Are we also updating the document to support 
LIBPROCESS_WORKER_THREADS for Master/Agent?"

I don't think there is a need for a separate update of the documentation for 
master and agent. The only section that will need to be updated is "Libprocess 
Options" (http://mesos.apache.org/documentation/latest/configuration/). If we 
add the LIBPROCESS_WORKER_THREADS environment variable, then it will apply to 
all types of libprocess processes, including masters and agents.



> Limit the number of processes created by libprocess
> ---
>
> Key: MESOS-4353
> URL: https://issues.apache.org/jira/browse/MESOS-4353
> Project: Mesos
> Issue Type: Improvement
> Components: libprocess
> Reporter: Qian Zhang
> Assignee: Qian Zhang
>
> Currently libprocess will create {{max(8, number of CPU cores)}} processes 
> during initialization; see 
> https://github.com/apache/mesos/blob/0.26.0/3rdparty/libprocess/src/process.cpp#L2146
> for details. This should be OK for a normal machine that does not have many 
> cores (e.g., 16, 32), but on a powerful machine with a large number of cores 
> (e.g., an IBM Power machine may have 192 cores), this creates more worker 
> threads than necessary.
> And since libprocess is widely used in Mesos (master, agent, scheduler, 
> executor), it may also cause performance issues. For example, when a user 
> creates a Docker container via Mesos on a Mesos agent running on a powerful 
> machine with 192 cores, the DockerContainerizer in the Mesos agent will 
> create a dedicated executor for the container, and there will be 192 worker 
> threads in that executor. If the user creates 1000 Docker containers on that 
> machine, there will be 1000 executors, i.e., 1000 * 192 worker threads, 
> which is a large number and may thrash the OS.
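
To make the arithmetic in the description concrete, here is a minimal Python 
sketch of the default described above. It is illustrative only: libprocess 
itself is C++, and the function names are invented for this example.

```python
def default_worker_threads(cpu_cores: int) -> int:
    """Default from the issue description: max(8, number of CPU cores)."""
    return max(8, cpu_cores)


def total_worker_threads(cpu_cores: int, executors: int) -> int:
    """Worker threads across all executors, one libprocess instance each."""
    return default_worker_threads(cpu_cores) * executors


print(default_worker_threads(4))         # small machine: the floor of 8 applies
print(default_worker_threads(192))       # 192-core machine: one thread per core
print(total_worker_threads(192, 1000))   # 1000 executors on that machine
```

With 192 cores and 1000 executors this comes to 192000 worker threads, which 
is the figure the description warns about.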



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4353) Limit the number of processes created by libprocess

2016-02-13 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146321#comment-15146321
 ] 

Maged Michael commented on MESOS-4353:
--

[Klaus]: "If the value is invalid, keep current behaviour, right?"

Yes. The number of worker threads will remain unchanged.

[Klaus]: "Are we also updating the document to support 
LIBPROCESS_WORKER_THREADS for Master/Agent?"

I don't think there is a need for a separate update of the documentation for 
master and agent. The only section that will need to be updated is "Libprocess 
Options" (http://mesos.apache.org/documentation/latest/configuration/). If we 
add the LIBPROCESS_WORKER_THREADS environment variable, then it will apply to 
all types of libprocess processes, including masters and agents.



[jira] [Commented] (MESOS-4353) Limit the number of processes created by libprocess

2016-02-12 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145367#comment-15145367
 ] 

Maged Michael commented on MESOS-4353:
--

If this is OK, then I propose the following design:

* Introduce a new environment variable to allow the operator to set the number 
of libprocess worker threads.
* The environment variable is named LIBPROCESS_WORKER_THREADS.
* Valid values of the environment variable are integers in the range 1 to 1024.
* All other values are invalid and generate a warning.
* The proposed environment variable can be set directly for the Mesos master, 
agents (slaves), and tests.
* For executors, the proposed environment variable can be set indirectly by 
including it in the agent's (slave's) --executor_environment_variables option 
(see the Mesos configuration documentation: 
http://mesos.apache.org/documentation/latest/configuration/).
* Update documentation of Mesos configuration to reflect the addition of this 
libprocess environment variable.
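
The validation rules above can be sketched as follows. This is a Python 
illustration under stated assumptions, not the libprocess (C++) patch: the 
function names are hypothetical, and falling back to the existing 
max(8, cores) default on an invalid value is taken from the "keep current 
behaviour" answer elsewhere in this thread.

```python
import os
import warnings

ENV_VAR = "LIBPROCESS_WORKER_THREADS"   # name from the proposal
MIN_THREADS, MAX_THREADS = 1, 1024      # valid range from the proposal


def default_worker_threads(cpu_cores):
    # Current behaviour described in the issue: max(8, number of CPU cores).
    return max(8, cpu_cores)


def worker_threads(environ, cpu_cores):
    """Return the worker thread count for the given environment mapping."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return default_worker_threads(cpu_cores)

    # Accept only plain ASCII decimal integers within the valid range.
    if raw.isascii() and raw.isdigit():
        value = int(raw)
        if MIN_THREADS <= value <= MAX_THREADS:
            return value

    # Invalid value: warn and leave the number of worker threads unchanged.
    warnings.warn(f"Invalid {ENV_VAR}={raw!r}; using the default")
    return default_worker_threads(cpu_cores)


print(worker_threads(os.environ, os.cpu_count() or 1))
```

For example, `worker_threads({"LIBPROCESS_WORKER_THREADS": "16"}, 192)` 
yields 16, while a value such as "0", "2048", or "abc" triggers the warning 
and yields the default of 192.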




[jira] [Commented] (MESOS-4353) Limit the number of processes created by libprocess

2016-02-11 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142699#comment-15142699
 ] 

Maged Michael commented on MESOS-4353:
--

Replying to Joris:
> I don't think it makes sense to make this a maximum. Rather, it is just the 
> number of libprocess_worker_threads.

My concern is that the number may be set to a very large value. How about we 
set a hardwired maximum that caps the given value if it is too large?



[jira] [Commented] (MESOS-4353) Limit the number of processes created by libprocess

2016-02-04 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132967#comment-15132967
 ] 

Maged Michael commented on MESOS-4353:
--

Patch https://reviews.apache.org/r/43144/



[jira] [Commented] (MESOS-4353) Limit the number of processes created by libprocess

2016-02-03 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131116#comment-15131116
 ] 

Maged Michael commented on MESOS-4353:
--

Draft design document 
https://docs.google.com/document/d/1D34S0HOZOonu510305THPECioCZ4bH5Zqq-AeCq5gmQ



[jira] [Comment Edited] (MESOS-4353) Limit the number of processes created by libprocess

2016-01-25 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115256#comment-15115256
 ] 

Maged Michael edited comment on MESOS-4353 at 1/25/16 2:21 PM:
---

Currently, there is a lower bound of 8 on the number of worker threads. I 
suggest that if the operator provides an upper bound that is less than 8 (e.g., 
sets an environment variable LIBPROCESS_MAX_WORKER_THREADS=4), then the 
provided upper bound overrides the lower bound (in this example, the number of 
worker threads would be set to 4). The rationale is that not all types of Mesos 
processes require 8 worker threads.
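
The suggested interaction between the two bounds can be sketched in Python as 
follows. This is an illustration only: the function name is hypothetical, and 
the real change would live in libprocess (C++).

```python
LOWER_BOUND = 8  # existing floor on the number of worker threads


def bounded_worker_threads(cpu_cores, max_worker_threads=None):
    """Apply the floor of 8 first, then let the operator's cap override it."""
    threads = max(LOWER_BOUND, cpu_cores)
    if max_worker_threads is not None:
        # The cap is applied last, so a cap below 8 wins over the floor.
        threads = min(threads, max_worker_threads)
    return threads


print(bounded_worker_threads(192))       # no cap set
print(bounded_worker_threads(192, 4))    # LIBPROCESS_MAX_WORKER_THREADS=4
print(bounded_worker_threads(2, 4))      # the cap also overrides the floor
```

Applying the cap after the floor is what makes a value such as 4 take effect 
even though it is below the current lower bound.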





[jira] [Commented] (MESOS-4353) Limit the number of processes created by libprocess

2016-01-25 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115256#comment-15115256
 ] 

Maged Michael commented on MESOS-4353:
--

Currently, there is a lower bound of 8 on the number of worker threads. I 
suggest that if the user provides an upper bound that is less than 8 (e.g., 
sets an environment variable LIBPROCESS_MAX_WORKER_THREADS=4), then the 
provided upper bound overrides the lower bound (in this example, the number of 
worker threads would be set to 4). The rationale is that not all types of Mesos 
processes require 8 worker threads.



[jira] [Commented] (MESOS-4353) Limit the number of processes created by libprocess

2016-01-20 Thread Maged Michael (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109695#comment-15109695
 ] 

Maged Michael commented on MESOS-4353:
--

Some options:
(1) Use an environment variable to set the maximum number of libprocess worker 
threads per process. Such a value can be set according to the role of the Mesos 
process. For example, a Mesos Master process may have a higher maximum number 
of libprocess worker threads than a Docker Containerizer.
(2) Automatically detect the type of the Mesos process (master, executor, ...) 
and set the maximum number of libprocess worker threads accordingly.
(3) Set a hardwired maximum number of worker threads regardless of the type of 
the Mesos process.

I think option (1) is the most flexible.



[jira] [Created] (MESOS-3855) Add deterministic simulation tools for Mesos testing and debugging

2015-11-09 Thread Maged Michael (JIRA)

Maged Michael created MESOS-3855:


Summary: Add deterministic simulation tools for Mesos testing and debugging
Key: MESOS-3855
URL: https://issues.apache.org/jira/browse/MESOS-3855
Project: Mesos
Issue Type: Improvement
Reporter: Maged Michael


Test-case-driven testing of the Mesos master and allocator under 
non-deterministic, system-dependent conditions makes problems hard to 
reproduce and can miss problems that occur only on systems with different 
characteristics. Furthermore, test-case-driven testing requires identifying 
the test cases beforehand.

The proposed simulation tools aim to run unmodified Mesos master and allocator 
code deterministically, driven by pseudo-random events occurring within the 
constraints of configurable cluster models. Deterministic simulation guarantees 
repeatability of results. The configurable pseudo-random model drives the 
exploration of the Mesos master and allocator state space without the need to 
identify specific test cases beforehand.

Basic Requirements:
- Simulation results are deterministic. All runs with the same parameters 
generate identical results regardless of the host system.
- Automatic integration of Mesos master and allocator code into the simulator 
without manual modification, by adding capabilities in the libprocess and stout 
libraries to control timing and communication among threads and among nodes.
- Support for configurable cluster models to generate pseudo-random events to 
drive the execution of operations in Mesos master and allocator.
- Support for invariants and statistics in the cluster model in order to detect 
errors and suboptimal behavior in the tested Mesos master and allocator 
implementation.

Examples of problems to be detected by the simulator:
- Liveness problems such as deadlock, livelock, starvation.
- Safety problems such as unintentional overallocation of resources, lost 
tasks, failure to recover resources.
- Fairness problems such as allowing one or more frameworks to dominate 
resource usage at the expense of other frameworks.
- Violations of invariants in the Mesos master and allocator code.
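
As an illustration of the kind of safety invariant listed above, the following 
Python sketch flags overallocation of CPUs on an agent. The `Agent` class and 
`overallocated_agents` function are hypothetical and much simpler than the 
actual Mesos master and allocator data structures; a real cluster model would 
check such invariants after every simulated event.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    total_cpus: float
    # CPUs currently allocated on this agent, keyed by framework name.
    allocated: dict = field(default_factory=dict)


def overallocated_agents(agents):
    """Return the names of agents whose allocations exceed their capacity."""
    return [
        name
        for name, agent in agents.items()
        if sum(agent.allocated.values()) > agent.total_cpus
    ]


cluster = {
    "agent-1": Agent(total_cpus=4, allocated={"fw-a": 2, "fw-b": 1}),
    "agent-2": Agent(total_cpus=2, allocated={"fw-a": 3}),
}
print(overallocated_agents(cluster))  # agents violating the invariant
```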

Possible extensions that leverage common infrastructure:
- Performance testing, e.g., detecting high response time, low resource 
utilization, or low throughput.
- Framework plug-in interface for testing framework task scheduling policies 
with Mesos allocators and against other framework policies.
- Cluster performance modeling to establish performance bounds for Mesos 
configurations of interest and what-if scenarios without the need to run on a 
real cloud.

Subitems (initial list):
- Add deterministic simulation capabilities to libprocess and stout.
  -- Replace real time with simulated time.
  -- Intercept inter-thread and inter-node communication.
  -- Schedule deterministic simulated communication events.
- Add libprocess-based test cases for deterministic simulation tools.
  -- Programs with inter-thread and inter-node communication using libprocess.
- Add Mesos cluster simulated event scheduler.
  -- To manage and order events (inter-node and inter-thread communication).
- Add configurable Mesos cluster model for driving deterministic simulation.
  -- Minimal (extensible) models of frameworks, roles, jobs, tasks, agents, and 
resources.
  -- Cluster invariants and statistics.
- Add mock Zookeeper model for deterministic Mesos cluster simulation.
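
The core idea behind the subitems above (simulated time, intercepted 
communication, and a deterministic event schedule) can be sketched in Python 
as follows. This is a toy model under stated assumptions, not the proposed 
libprocess/stout design: the `Simulator` class, its methods, and the latency 
range are all invented for illustration.

```python
import heapq
import random


class Simulator:
    def __init__(self, seed):
        self.now = 0.0                    # simulated time, never the wall clock
        self._rng = random.Random(seed)   # the only source of randomness
        self._queue = []                  # heap of (time, sequence, event)
        self._seq = 0                     # tie-breaker for a stable order
        self.trace = []

    def send(self, src, dst, message):
        # A "message" is delivered after a pseudo-random simulated latency.
        delay = self._rng.uniform(0.001, 0.1)
        event = f"{src}->{dst}:{message}"
        heapq.heappush(self._queue, (self.now + delay, self._seq, event))
        self._seq += 1

    def run(self):
        # Deliver events in simulated-time order, advancing the clock.
        while self._queue:
            self.now, _, event = heapq.heappop(self._queue)
            self.trace.append((self.now, event))
        return self.trace


def simulate(seed):
    sim = Simulator(seed)
    for i in range(5):
        sim.send("framework", "master", f"request-{i}")
        sim.send("agent", "master", f"status-update-{i}")
    return sim.run()


# The same seed reproduces the same event order and timestamps.
print(simulate(42) == simulate(42))
```

Because every delay comes from one seeded generator and ties are broken by a 
sequence number, rerunning with the same seed on the same Python version 
replays the identical trace, while a different seed explores a different 
interleaving.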

Assumptions:
- The tested code is free of data races.
- Third-party packages (ZooKeeper, protobuf, ...) are correct.

Link to high-level design document (in progress) https://goo.gl/9wfPef



