[jira] [Updated] (MESOS-1980) Benchmark RPC/s of linked Libprocess

2015-05-15 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-1980:
---
Summary: Benchmark RPC/s of linked Libprocess  (was: Introduce concept of 
clusterwise ressources)

> Benchmark RPC/s of linked Libprocess
> 
>
> Key: MESOS-1980
> URL: https://issues.apache.org/jira/browse/MESOS-1980
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>
> Libprocess has some performance bottlenecks. Implement a benchmark with which 
> we can track regressions / improvements in RPCs performed per second.
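The shape of such a benchmark can be sketched outside of libprocess. The following is a minimal, hypothetical Python analogue that measures round trips per second between two in-process "actors" connected by queues; the real benchmark would exercise linked libprocess processes in C++, so every name here is illustrative:

```python
import threading
import time
import queue

def benchmark_rpcs(num_messages=100_000):
    """Measure request/reply round trips per second between a client
    and an echo server running in another thread. A stand-in for two
    linked libprocess processes exchanging messages."""
    requests, replies = queue.Queue(), queue.Queue()

    def echo_server():
        for _ in range(num_messages):
            replies.put(requests.get())  # echo each request back

    server = threading.Thread(target=echo_server)
    server.start()
    start = time.perf_counter()
    for i in range(num_messages):
        requests.put(i)
        replies.get()  # wait for the round trip before sending the next
    elapsed = time.perf_counter() - start
    server.join()
    return num_messages / elapsed  # round trips per second
```

Running the same harness before and after a libprocess change is what lets regressions show up as a drop in the returned rate.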



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches

2015-05-15 Thread chenzongzhi (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545115#comment-14545115
 ] 

chenzongzhi commented on MESOS-2588:


Yes, this is an excellent feature.
I think the preHook should be a script, because people may use the preHook to 
create directories or change cgroup settings.
So a script may be better.
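To make the suggestion concrete, here is a small hypothetical pre-create hook sketched in Python. It is not the Mesos hook API; the paths and the container-id argument are illustrative assumptions about what such a script might receive and touch:

```python
import os

def pre_create_hook(container_id, base_dir="/var/run/myhooks",
                    cgroup_root="/sys/fs/cgroup/memory/mesos"):
    """Hypothetical pre-create hook: prepare a per-container directory
    and optionally tune a cgroup knob before the container launches.
    base_dir and cgroup_root are illustrative, not Mesos defaults."""
    workdir = os.path.join(base_dir, container_id)
    os.makedirs(workdir, exist_ok=True)  # e.g. a scratch/log directory

    # Only attempt the cgroup tweak if the container's cgroup exists.
    cgroup_dir = os.path.join(cgroup_root, container_id)
    if os.path.isdir(cgroup_dir):
        with open(os.path.join(cgroup_dir, "memory.swappiness"), "w") as f:
            f.write("0")  # example cgroup setting
    return workdir
```

A script-based hook like this keeps the extension point language-agnostic: the containerizer only needs to exec it with the container id before launch.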

> Create pre-create hook before a Docker container launches
> -
>
> Key: MESOS-2588
> URL: https://issues.apache.org/jira/browse/MESOS-2588
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Timothy Chen
>Assignee: haosdent
>
> To support custom actions before launching a docker container, we should 
> create an extensible hook that allows modules/hooks to run before the docker 
> container is launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2699) Unable to build on debian jessie

2015-05-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545122#comment-14545122
 ] 

Adam B commented on MESOS-2699:
---

[~tpitluga] Try `./configure --disable-bundled-pip`, maybe with an optional 
`--with-pip=DIR` if pip is not already in PYTHONPATH.
That said, we should also consider upgrading our bundled pip version.

> Unable to build on debian jessie
> 
>
> Key: MESOS-2699
> URL: https://issues.apache.org/jira/browse/MESOS-2699
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.22.0
> Environment: here is a dockerfile to reproduce:
> FROM debian:jessie   
>  
> RUN DEBIAN_FRONTEND=noninteractive apt-get -y update 
> RUN DEBIAN_FRONTEND=noninteractive apt-get -y install \  
> apt-utils \  
> build-essential \
> autoconf \   
> libtool \
> libcurl4-nss-dev \   
> libsasl2-dev \   
> libapr1-dev \
> libsvn-dev \ 
> zlib1g-dev \ 
> git  
>  
> RUN DEBIAN_FRONTEND=noninteractive apt-get -y install \  
> openjdk-7-jdk \  
> python-dev \ 
> python-boto \
> maven \  
> ruby2.1 \
> ruby2.1-dev  
>  
> RUN update-alternatives --install /usr/bin/gem gem /usr/bin/gem2.1 1 && \
>   update-alternatives --install /usr/bin/ruby ruby /usr/bin/ruby2.1 1
>  
> RUN mkdir /build 
>  
> WORKDIR /build   
>  
> RUN gem install fpm  
>  
> ENV MAINTAINER="devs+cos...@getbraintree.com"
>  
> RUN git clone https://github.com/mesosphere/mesos-deb-packaging.git  
> RUN cd mesos-deb-packaging && \  
>   ./build_mesos  
>Reporter: Tony Pitluga
>Priority: Minor
>
> Debian Jessie deprecated SSLV3 and has removed support for it in python's 
> urllib. The version of requests (2.3) vendored into mesos inside of the 
> 3rdparty/pip-1.5.6.tar.gz bombs out with this error:
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/build/mesos-deb-packaging/mesos-repo/build/3rdparty/pip-1.5.6/pip/__init__.py",
>  line 11, in <module>
>     from pip.vcs import git, mercurial, subversion, bazaar  # noqa
>   File 
> "/build/mesos-deb-packaging/mesos-repo/build/3rdparty/pip-1.5.6/pip/vcs/mercurial.py",
>  line 9, in <module>
>     from pip.download import path_to_url
>   File 
> "/build/mesos-deb-packaging/mesos-repo/build/3rdparty/pip-1.5.6/pip/download.py",
>  line 22, in

[jira] [Comment Edited] (MESOS-2699) Unable to build on debian jessie

2015-05-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545122#comment-14545122
 ] 

Adam B edited comment on MESOS-2699 at 5/15/15 8:07 AM:


[~tpitluga] Try {{./configure \-\-disable-bundled-pip}}, maybe with an optional 
{{--with-pip=DIR}} if pip is not already in PYTHONPATH.
That said, we should also consider upgrading our bundled pip version.


was (Author: adam-mesos):
[~tpitluga] Try `./configure --disable-bundled-pip`, maybe with an optional 
`--with-pip=DIR` if pip is not already in PYTHONPATH.
That said, we should also consider upgrading our bundled pip version.

> Unable to build on debian jessie
> 
>
> Key: MESOS-2699
> URL: https://issues.apache.org/jira/browse/MESOS-2699
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.22.0
> Environment: here is a dockerfile to reproduce:
> FROM debian:jessie   
>  
> RUN DEBIAN_FRONTEND=noninteractive apt-get -y update 
> RUN DEBIAN_FRONTEND=noninteractive apt-get -y install \  
> apt-utils \  
> build-essential \
> autoconf \   
> libtool \
> libcurl4-nss-dev \   
> libsasl2-dev \   
> libapr1-dev \
> libsvn-dev \ 
> zlib1g-dev \ 
> git  
>  
> RUN DEBIAN_FRONTEND=noninteractive apt-get -y install \  
> openjdk-7-jdk \  
> python-dev \ 
> python-boto \
> maven \  
> ruby2.1 \
> ruby2.1-dev  
>  
> RUN update-alternatives --install /usr/bin/gem gem /usr/bin/gem2.1 1 && \
>   update-alternatives --install /usr/bin/ruby ruby /usr/bin/ruby2.1 1
>  
> RUN mkdir /build 
>  
> WORKDIR /build   
>  
> RUN gem install fpm  
>  
> ENV MAINTAINER="devs+cos...@getbraintree.com"
>  
> RUN git clone https://github.com/mesosphere/mesos-deb-packaging.git  
> RUN cd mesos-deb-packaging && \  
>   ./build_mesos  
>Reporter: Tony Pitluga
>Priority: Minor
>
> Debian Jessie deprecated SSLV3 and has removed support for it in python's 
> urllib. The version of requests (2.3) vendored into mesos inside of the 
> 3rdparty/pip-1.5.6.tar.gz bombs out with this error:
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/build/mesos-deb-packaging/mesos-repo/build/3rdparty/pip-1.5.6/pip/__init__.py",
>  line 11, in <module>
>     from pip.vcs import git, mercurial, subversion, bazaar  # noqa
>   File 
> "/build/mesos-deb-packaging/mesos-repo/build/3rdparty/pip-1.5.6/pip/vcs/mercurial.py",
>  line 9, in <module>
>     from pip.download import

[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-15 Thread chenqiuhao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545124#comment-14545124
 ] 

chenqiuhao commented on MESOS-2706:
---

PS:
1. In my environment, when the number of tasks reached 22, CPU usage reached 
100%.
2. From the top -Hp command, we know that in the lt-mesos-slave process only 
one thread is running and the rest are sleeping, so the process can use at most 
100% of one CPU.
==
Threads: 12 total, 1 running, 11 sleeping, 0 stopped, 0 zombie
==


> When the docker-tasks grow, the time spare between Queuing task and Starting 
> container grows
> 
>
> Key: MESOS-2706
> URL: https://issues.apache.org/jira/browse/MESOS-2706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.22.0
> Environment: My Environment info:
> Mesos 0.22.0 & Marathon 0.82-RC1 both running in one host-server.
> Every docker-task require 0.02 CPU and 128MB ,and the server has 8 cpus and 
> 24G mems.
> So Mesos can launch thousands of task in theory.
> And the docker-task is very light-weight to launch a sshd service .
>Reporter: chenqiuhao
>
> At the beginning, Marathon launched docker-tasks very fast, but when the 
> number of tasks on the only mesos-slave host reached 50, Marathon seemed to 
> launch docker-tasks slowly.
> So I checked the mesos-slave log, and I found that the time span between 
> Queuing task and Starting container grew.
> For example, 
> launch the 1st docker task, it takes about 0.008s
> [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing 
> task|Starting container'
> I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
> dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
> '20150202-112355-2684495626-5050-26153-
> I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
> 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
> '20150202-112355-2684495626-5050-26153-'
> launch the 50th docker task, it takes about 4.9s
> I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
> dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
> '20150202-112355-2684495626-5050-26153-
> I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
> '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
> '20150202-112355-2684495626-5050-26153-'
> And when I launched the 100th docker task, it took about 13s!
> I did the same test on a server-host with 24 CPUs and 256G mem, and it got the 
> same result.
> Has anybody had the same experience, or can someone help run the same stress 
> test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2724) Mesos support command from host machine

2015-05-15 Thread chenzongzhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenzongzhi updated MESOS-2724:
---
Description: 
We use Mesos + Marathon to build our PaaS platform. We have met a problem:
we want to execute some commands after the docker container starts, such as 
changing cgroup settings.
We know we can execute commands inside the docker container, but we want to 
execute commands on the host machine.
Does anyone know how to implement this, or have a good idea?
Thanks

  was:
We use mesos + marathon to build our Paas platform. We meet a problem
We want to execute some command after the docker container started. such as we 
want change the cgroup setting.
We know We can execute some command in the  docker container, but we want 
execute command in the host machine.
Anyone know how to implement it or any good idea?
Thanks


> Mesos support command from host machine
> ---
>
> Key: MESOS-2724
> URL: https://issues.apache.org/jira/browse/MESOS-2724
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework
>Reporter: chenzongzhi
>Priority: Critical
>
> We use Mesos + Marathon to build our PaaS platform. We have met a problem:
> we want to execute some commands after the docker container starts, such as 
> changing cgroup settings.
> We know we can execute commands inside the docker container, but we want to 
> execute commands on the host machine.
> Does anyone know how to implement this, or have a good idea?
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-15 Thread chenqiuhao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545342#comment-14545342
 ] 

chenqiuhao commented on MESOS-2706:
---

I used the strace command and found that the slave process reads every 
/proc/$pid/stat and /proc/$pid/cmdline to compute per-docker-task usage 
statistics, round after round.
For example, when I launch one docker-task on an OS that has already launched 
500 other processes (counted with ps -ef | wc -l), the mesos-slave process 
keeps reading /proc/$pid/stat and /proc/$pid/cmdline 500+500 times per round. 
When the number of docker-tasks reached 50, the massive number of 
/proc/$pid/stat and cmdline reads exhausted a whole CPU.
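The cost described above grows multiplicatively with tasks and host processes. A rough Python model of the per-round read count (illustrative only, not slave code):

```python
import os

def reads_per_round(num_tasks, proc="/proc"):
    """Estimate how many /proc file reads one monitoring round costs
    if the slave scans every process's stat and cmdline once per
    monitored task, as described in the comment above."""
    pids = [d for d in os.listdir(proc) if d.isdigit()]
    # Two files (stat + cmdline) per process, repeated for every task.
    return num_tasks * 2 * len(pids)
```

With 50 tasks and 1000 host processes this model gives 100,000 reads per round, which matches the reported behavior of one CPU being saturated by /proc scanning alone.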


> When the docker-tasks grow, the time spare between Queuing task and Starting 
> container grows
> 
>
> Key: MESOS-2706
> URL: https://issues.apache.org/jira/browse/MESOS-2706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.22.0
> Environment: My Environment info:
> Mesos 0.22.0 & Marathon 0.82-RC1 both running in one host-server.
> Every docker-task require 0.02 CPU and 128MB ,and the server has 8 cpus and 
> 24G mems.
> So Mesos can launch thousands of task in theory.
> And the docker-task is very light-weight to launch a sshd service .
>Reporter: chenqiuhao
>
> At the beginning, Marathon launched docker-tasks very fast, but when the 
> number of tasks on the only mesos-slave host reached 50, Marathon seemed to 
> launch docker-tasks slowly.
> So I checked the mesos-slave log, and I found that the time span between 
> Queuing task and Starting container grew.
> For example, 
> launch the 1st docker task, it takes about 0.008s
> [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing 
> task|Starting container'
> I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
> dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
> '20150202-112355-2684495626-5050-26153-
> I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
> 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
> '20150202-112355-2684495626-5050-26153-'
> launch the 50th docker task, it takes about 4.9s
> I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
> dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
> '20150202-112355-2684495626-5050-26153-
> I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
> '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
> '20150202-112355-2684495626-5050-26153-'
> And when I launched the 100th docker task, it took about 13s!
> I did the same test on a server-host with 24 CPUs and 256G mem, and it got the 
> same result.
> Has anybody had the same experience, or can someone help run the same stress 
> test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2119) Add Socket tests

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2119:
---
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 
1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere 
Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15)

> Add Socket tests
> 
>
> Key: MESOS-2119
> URL: https://issues.apache.org/jira/browse/MESOS-2119
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joris Van Remoortere
>
> Add more Socket-specific tests to get coverage while doing the libev-to-libevent 
> move (with and without SSL)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2645) Design doc for resource oversubscription

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2645:
---
Sprint: Mesosphere Q1 Sprint 10 - 5/30

> Design doc for resource oversubscription
> 
>
> Key: MESOS-2645
> URL: https://issues.apache.org/jira/browse/MESOS-2645
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Niklas Quarfot Nielsen
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2645) Design doc for resource oversubscription

2015-05-15 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545638#comment-14545638
 ] 

Marco Massenzio commented on MESOS-2645:


Can you please split this into smaller tasks/user stories, and then move the 
ones that won't be done out of the Sprint?

> Design doc for resource oversubscription
> 
>
> Key: MESOS-2645
> URL: https://issues.apache.org/jira/browse/MESOS-2645
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Niklas Quarfot Nielsen
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2650) Modularize the Resource Estimator

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2650:
---
Sprint: Mesosphere Q1 Sprint 10 - 5/30

> Modularize the Resource Estimator
> -
>
> Key: MESOS-2650
> URL: https://issues.apache.org/jira/browse/MESOS-2650
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Niklas Quarfot Nielsen
>  Labels: mesosphere
>
> Modularizing the resource estimator opens up the door for org specific 
> implementations.
> Test the estimator module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2651) Implement QoS controller

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2651:
---
Sprint: Mesosphere Q1 Sprint 10 - 5/30

> Implement QoS controller
> 
>
> Key: MESOS-2651
> URL: https://issues.apache.org/jira/browse/MESOS-2651
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Niklas Quarfot Nielsen
>  Labels: mesosphere
>
> This is a component of the slave that informs the slave about the possible 
> "corrections" that need to be performed (e.g., shutdown container using 
> recoverable resources).
> This needs to be integrated with the resource monitor.
> Need to figure out the metrics used for sending corrections (e.g., scheduling 
> latency, usage, informed by executor/scheduler)
> We also need to figure out the feedback loop between the QoS controller and 
> the Resource Estimator.
> {code}
> class QoSController {
> public:
>   QoSController(ResourceMonitor* monitor);
>   process::Queue correction();
> };
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2248) 0.22.0 release

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2248:
--
Epic Status: Done

> 0.22.0 release
> --
>
> Key: MESOS-2248
> URL: https://issues.apache.org/jira/browse/MESOS-2248
> Project: Mesos
>  Issue Type: Epic
>Reporter: Niklas Quarfot Nielsen
>Assignee: Niklas Quarfot Nielsen
>
> Mesos release 0.22.0 will include the following major feature(s):
>  - Module Hooks (MESOS-2060)
>  - Disk quota isolation in Mesos containerizer (MESOS-1587 and MESOS-1588)
> Minor features and fixes:
>  - Task labels (MESOS-2120)
>  - Service discovery info for tasks and executors (MESOS-2208)
>  - Docker containerizer able to recover when running in a container 
> (MESOS-2115)
>  - Containerizer fixes (...)
>  - Various bug fixes (...)
> Possible major features:
>  - Container level network isolation (MESOS-1585)
>  - Dynamic Reservations (MESOS-2018)
> This ticket will be used to track blockers to this release.
> For reference (per Jan 22nd) this has gone into Mesos since 0.21.1: 
> https://gist.github.com/nqn/76aeb41a555625659ed8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2595) Create docker executor

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2595:
---
Story Points: 8

> Create docker executor
> --
>
> Key: MESOS-2595
> URL: https://issues.apache.org/jira/browse/MESOS-2595
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>
> Currently we're reusing the command executor to wait on the progress of the 
> docker container, but this has the following drawbacks:
> - We need to launch a separate docker log process just to forward logs, whereas 
> we could simply reattach stdout/stderr if we created a specific executor 
> for docker.
> - In general, the Mesos slave assumes that the executor is the one starting 
> the actual task. But with the current docker containerizer, the containerizer 
> actually starts the docker container first and then launches the command 
> executor to wait on it. This can cause problems if the container fails 
> before the command executor is able to launch, as the slave will try to update 
> the limits of the container on executor registration, but the docker 
> containerizer will fail to do so since the container has failed. 
> Overall it's much simpler to tie the container lifecycle to the executor, 
> which simplifies logic and log management.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2595) Create docker executor

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2595:
---
Target Version/s: 0.23.0

Please confirm that this is in "Reviewable" state.
[~tnachen]

> Create docker executor
> --
>
> Key: MESOS-2595
> URL: https://issues.apache.org/jira/browse/MESOS-2595
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>
> Currently we're reusing the command executor to wait on the progress of the 
> docker container, but this has the following drawbacks:
> - We need to launch a separate docker log process just to forward logs, whereas 
> we could simply reattach stdout/stderr if we created a specific executor 
> for docker.
> - In general, the Mesos slave assumes that the executor is the one starting 
> the actual task. But with the current docker containerizer, the containerizer 
> actually starts the docker container first and then launches the command 
> executor to wait on it. This can cause problems if the container fails 
> before the command executor is able to launch, as the slave will try to update 
> the limits of the container on executor registration, but the docker 
> containerizer will fail to do so since the container has failed. 
> Overall it's much simpler to tie the container lifecycle to the executor, 
> which simplifies logic and log management.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2073) Fetcher cache file verification, updating and invalidation

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2073:
---
Sprint: Mesosphere Q1 Sprint 10 - 5/30

> Fetcher cache file verification, updating and invalidation
> --
>
> Key: MESOS-2073
> URL: https://issues.apache.org/jira/browse/MESOS-2073
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Bernd Mathiske
>Assignee: Bernd Mathiske
>Priority: Minor
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The other tickets in the fetcher cache epic do not necessitate a checksum 
> (e.g. MD5, SHA*) for files cached by the fetcher. While such a checksum 
> could be used to verify that a file arrived without unintended alterations, 
> it can first and foremost be employed to detect and trigger updates. 
> Scenario: if a URI is requested for fetching and the indicated download has 
> the same checksum as the cached file, then the cached file is used and the 
> download forgone. If the checksum differs, then fetching proceeds and the 
> cached file is replaced. 
> This capability will be indicated by an additional field in the URI protobuf. 
> Details TBD, i.e. to be discussed in comments below.
> In addition to the above, even if the checksum is the same, we can support 
> voluntary cache-file invalidation: a fresh download can be requested, or the 
> caching behavior can be revoked entirely.
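The scenario above amounts to checksum-driven cache validation. A minimal Python sketch of that logic, under stated assumptions (the `download` callable, the cache layout, and the function names are all hypothetical, not the Mesos fetcher's API):

```python
import hashlib
import os

def fetch_with_cache(uri, download, cache_dir, checksum):
    """Reuse the cached copy when its checksum matches the one
    advertised for the URI; otherwise re-download and replace it.
    `download` is a hypothetical callable returning the file's bytes."""
    # Key the cache entry by the URI so re-fetches find the same slot.
    cached = os.path.join(cache_dir, hashlib.sha256(uri.encode()).hexdigest())
    if os.path.exists(cached):
        with open(cached, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() == checksum:
                return cached  # cache hit: checksum matches, skip download
    data = download(uri)  # cache miss or stale entry: fetch and replace
    with open(cached, "wb") as f:
        f.write(data)
    return cached
```

Voluntary invalidation, as described in the last paragraph, would simply bypass the checksum comparison and always take the download branch.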



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2600) Introduce reservation HTTP endpoints on the master

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2600:
---
Story Points: 5

> Introduce reservation HTTP endpoints on the master
> --
>
> Key: MESOS-2600
> URL: https://issues.apache.org/jira/browse/MESOS-2600
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>  Labels: mesosphere
>
> Enable operators to manage dynamic reservations by introducing the 
> {{/reserve}} and {{/unreserve}} HTTP endpoints on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2600) Introduce reservation HTTP endpoints on the master

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio reassigned MESOS-2600:
--

Assignee: Michael Park

> Introduce reservation HTTP endpoints on the master
> --
>
> Key: MESOS-2600
> URL: https://issues.apache.org/jira/browse/MESOS-2600
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
>
> Enable operators to manage dynamic reservations by introducing the 
> {{/reserve}} and {{/unreserve}} HTTP endpoints on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2600) Introduce reservation HTTP endpoints on the master

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2600:
---
Sprint: Mesosphere Q1 Sprint 10 - 5/30

> Introduce reservation HTTP endpoints on the master
> --
>
> Key: MESOS-2600
> URL: https://issues.apache.org/jira/browse/MESOS-2600
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
>
> Enable operators to manage dynamic reservations by introducing the 
> {{/reserve}} and {{/unreserve}} HTTP endpoints on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2600) Introduce reservation HTTP endpoints on the master

2015-05-15 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545662#comment-14545662
 ] 

Marco Massenzio commented on MESOS-2600:


Please find a Shepherd and update the ticket

> Introduce reservation HTTP endpoints on the master
> --
>
> Key: MESOS-2600
> URL: https://issues.apache.org/jira/browse/MESOS-2600
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
>
> Enable operators to manage dynamic reservations by introducing the 
> {{/reserve}} and {{/unreserve}} HTTP endpoints on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2119) Add Socket tests

2015-05-15 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545666#comment-14545666
 ] 

Marco Massenzio commented on MESOS-2119:


Please confirm that the estimated effort is in the right ballpark.

> Add Socket tests
> 
>
> Key: MESOS-2119
> URL: https://issues.apache.org/jira/browse/MESOS-2119
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joris Van Remoortere
>
> Add more Socket-specific tests to get coverage while doing the libev-to-libevent 
> move (with and without SSL)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2073) Fetcher cache file verification, updating and invalidation

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2073:
---
Story Points: 2  (was: 1.5)

> Fetcher cache file verification, updating and invalidation
> --
>
> Key: MESOS-2073
> URL: https://issues.apache.org/jira/browse/MESOS-2073
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Bernd Mathiske
>Assignee: Bernd Mathiske
>Priority: Minor
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The other tickets in the fetcher cache epic do not necessitate a checksum 
> (e.g. MD5, SHA*) for files cached by the fetcher. While such a checksum 
> could be used to verify that a file arrived without unintended alterations, 
> it can first and foremost be employed to detect and trigger updates. 
> Scenario: if a URI is requested for fetching and the indicated download has 
> the same checksum as the cached file, then the cached file is used and the 
> download forgone. If the checksum differs, then fetching proceeds and the 
> cached file is replaced. 
> This capability will be indicated by an additional field in the URI protobuf. 
> Details TBD, i.e. to be discussed in comments below.
> In addition to the above, even if the checksum is the same, we can support 
> voluntary cache-file invalidation: a fresh download can be requested, or the 
> caching behavior can be revoked entirely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2119) Add Socket tests

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2119:
---
Story Points: 5

> Add Socket tests
> 
>
> Key: MESOS-2119
> URL: https://issues.apache.org/jira/browse/MESOS-2119
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joris Van Remoortere
>
> Add more Socket-specific tests to get coverage while doing the libev-to-libevent 
> move (with and without SSL)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2727) 0.23.0 Release

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2727:
--
Epic Name: 0.23.0 Release Management  (was: 0.23.0 Release)

> 0.23.0 Release
> --
>
> Key: MESOS-2727
> URL: https://issues.apache.org/jira/browse/MESOS-2727
> Project: Mesos
>  Issue Type: Epic
>  Components: release
>Reporter: Adam B
>Assignee: Adam B
>
> Please add links to Epics and Major features this release is "blocked by".
> We can also track release management tasks as subtasks of this 0.23 Epic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2215) The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks.

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2215:
---
Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20  (was: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere 
Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 10 - 
5/30)

> The Docker containerizer attempts to recover any task when checkpointing is 
> enabled, not just docker tasks.
> ---
>
> Key: MESOS-2215
> URL: https://issues.apache.org/jira/browse/MESOS-2215
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.21.0
>Reporter: Steve Niemitz
>Assignee: Timothy Chen
>
> Once the slave restarts and recovers the task, I see this error in the log 
> for all tasks that were recovered every second or so.  Note, these were NOT 
> docker tasks:
> W0113 16:01:00.790323 773142 monitor.cpp:213] Failed to get resource usage 
> for  container 7b729b89-dc7e-4d08-af97-8cd1af560a21 for executor 
> thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd
>  of framework 20150109-161713-715350282-5050-290797-: Failed to 'docker 
> inspect mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21': exit status = exited 
> with status 1 stderr = Error: No such image or container: 
> mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21
> However the tasks themselves are still healthy and running.
> The slave was launched with --containerizers=mesos,docker
> -
> More info: it looks like the docker containerizer is a little too ambitious 
> about recovering containers, again this was not a docker task:
> I0113 15:59:59.476145 773142 docker.cpp:814] Recovering container 
> '7b729b89-dc7e-4d08-af97-8cd1af560a21' for executor 
> 'thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd'
>  of framework 20150109-161713-715350282-5050-290797-
> Looking into the source, it looks like the problem is that the 
> ComposingContainerizer runs recover in parallel, but neither the docker 
> containerizer nor mesos containerizer check if they should recover the task 
> or not (because they were the ones that launched it).  Perhaps this needs to 
> be written into the checkpoint somewhere?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2215) The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks.

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2215:
---
Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20)

> The Docker containerizer attempts to recover any task when checkpointing is 
> enabled, not just docker tasks.
> ---
>
> Key: MESOS-2215
> URL: https://issues.apache.org/jira/browse/MESOS-2215
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.21.0
>Reporter: Steve Niemitz
>Assignee: Timothy Chen
>
> Once the slave restarts and recovers the task, I see this error in the log 
> for all tasks that were recovered every second or so.  Note, these were NOT 
> docker tasks:
> W0113 16:01:00.790323 773142 monitor.cpp:213] Failed to get resource usage 
> for  container 7b729b89-dc7e-4d08-af97-8cd1af560a21 for executor 
> thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd
>  of framework 20150109-161713-715350282-5050-290797-: Failed to 'docker 
> inspect mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21': exit status = exited 
> with status 1 stderr = Error: No such image or container: 
> mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21
> However the tasks themselves are still healthy and running.
> The slave was launched with --containerizers=mesos,docker
> -
> More info: it looks like the docker containerizer is a little too ambitious 
> about recovering containers, again this was not a docker task:
> I0113 15:59:59.476145 773142 docker.cpp:814] Recovering container 
> '7b729b89-dc7e-4d08-af97-8cd1af560a21' for executor 
> 'thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd'
>  of framework 20150109-161713-715350282-5050-290797-
> Looking into the source, it looks like the problem is that the 
> ComposingContainerizer runs recover in parallel, but neither the docker 
> containerizer nor mesos containerizer check if they should recover the task 
> or not (because they were the ones that launched it).  Perhaps this needs to 
> be written into the checkpoint somewhere?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2215) The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks.

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2215:
---
Story Points: 8

> The Docker containerizer attempts to recover any task when checkpointing is 
> enabled, not just docker tasks.
> ---
>
> Key: MESOS-2215
> URL: https://issues.apache.org/jira/browse/MESOS-2215
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.21.0
>Reporter: Steve Niemitz
>Assignee: Timothy Chen
>
> Once the slave restarts and recovers the task, I see this error in the log 
> for all tasks that were recovered every second or so.  Note, these were NOT 
> docker tasks:
> W0113 16:01:00.790323 773142 monitor.cpp:213] Failed to get resource usage 
> for  container 7b729b89-dc7e-4d08-af97-8cd1af560a21 for executor 
> thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd
>  of framework 20150109-161713-715350282-5050-290797-: Failed to 'docker 
> inspect mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21': exit status = exited 
> with status 1 stderr = Error: No such image or container: 
> mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21
> However the tasks themselves are still healthy and running.
> The slave was launched with --containerizers=mesos,docker
> -
> More info: it looks like the docker containerizer is a little too ambitious 
> about recovering containers, again this was not a docker task:
> I0113 15:59:59.476145 773142 docker.cpp:814] Recovering container 
> '7b729b89-dc7e-4d08-af97-8cd1af560a21' for executor 
> 'thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd'
>  of framework 20150109-161713-715350282-5050-290797-
> Looking into the source, it looks like the problem is that the 
> ComposingContainerizer runs recover in parallel, but neither the docker 
> containerizer nor mesos containerizer check if they should recover the task 
> or not (because they were the ones that launched it).  Perhaps this needs to 
> be written into the checkpoint somewhere?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2500) Doxygen setup for libprocess

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2500:
---
Sprint: Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17)

> Doxygen setup for libprocess
> 
>
> Key: MESOS-2500
> URL: https://issues.apache.org/jira/browse/MESOS-2500
> Project: Mesos
>  Issue Type: Documentation
>  Components: libprocess
>Reporter: Bernd Mathiske
>Assignee: Joerg Schad
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Goals: 
> - Initial doxygen setup. 
> - Enable interested developers to generate already available doxygen content 
> locally in their workspace and view it.
> - Form the basis for future contributions of more doxygen content.
> 1. Devise a way to use Doxygen with Mesos source code. (For example, solve 
> this by adding optional brew/apt-get installation to the "Getting Started" 
> doc.)
> 2. Create a make target for libprocess documentation that can be manually 
> triggered.
> 3. Create initial library top level documentation.
> 4. Enhance one header file with Doxygen. Make sure the generated output has 
> all necessary links to navigate from the lib to the file and back, etc.
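A starting point for goal 1 could be a small Doxygen configuration checked in next to the libprocess sources. The values below are illustrative only; the actual paths and settings would be decided during the work on this ticket:

```
# Minimal Doxyfile fragment for libprocess (illustrative values only).
PROJECT_NAME     = "libprocess"
INPUT            = 3rdparty/libprocess/include 3rdparty/libprocess/src
RECURSIVE        = YES
EXTRACT_ALL      = YES
GENERATE_HTML    = YES
OUTPUT_DIRECTORY = doxygen
```

The make target from goal 2 would then just invoke `doxygen` on this file.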



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2215) The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks.

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2215:
---
Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20)

> The Docker containerizer attempts to recover any task when checkpointing is 
> enabled, not just docker tasks.
> ---
>
> Key: MESOS-2215
> URL: https://issues.apache.org/jira/browse/MESOS-2215
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.21.0
>Reporter: Steve Niemitz
>Assignee: Timothy Chen
>
> Once the slave restarts and recovers the task, I see this error in the log 
> for all tasks that were recovered every second or so.  Note, these were NOT 
> docker tasks:
> W0113 16:01:00.790323 773142 monitor.cpp:213] Failed to get resource usage 
> for  container 7b729b89-dc7e-4d08-af97-8cd1af560a21 for executor 
> thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd
>  of framework 20150109-161713-715350282-5050-290797-: Failed to 'docker 
> inspect mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21': exit status = exited 
> with status 1 stderr = Error: No such image or container: 
> mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21
> However the tasks themselves are still healthy and running.
> The slave was launched with --containerizers=mesos,docker
> -
> More info: it looks like the docker containerizer is a little too ambitious 
> about recovering containers, again this was not a docker task:
> I0113 15:59:59.476145 773142 docker.cpp:814] Recovering container 
> '7b729b89-dc7e-4d08-af97-8cd1af560a21' for executor 
> 'thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd'
>  of framework 20150109-161713-715350282-5050-290797-
> Looking into the source, it looks like the problem is that the 
> ComposingContainerizer runs recover in parallel, but neither the docker 
> containerizer nor mesos containerizer check if they should recover the task 
> or not (because they were the ones that launched it).  Perhaps this needs to 
> be written into the checkpoint somewhere?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2500) Doxygen setup for libprocess

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2500:
---
Sprint: Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17)

> Doxygen setup for libprocess
> 
>
> Key: MESOS-2500
> URL: https://issues.apache.org/jira/browse/MESOS-2500
> Project: Mesos
>  Issue Type: Documentation
>  Components: libprocess
>Reporter: Bernd Mathiske
>Assignee: Joerg Schad
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Goals: 
> - Initial doxygen setup. 
> - Enable interested developers to generate already available doxygen content 
> locally in their workspace and view it.
> - Form the basis for future contributions of more doxygen content.
> 1. Devise a way to use Doxygen with Mesos source code. (For example, solve 
> this by adding optional brew/apt-get installation to the "Getting Started" 
> doc.)
> 2. Create a make target for libprocess documentation that can be manually 
> triggered.
> 3. Create initial library top level documentation.
> 4. Enhance one header file with Doxygen. Make sure the generated output has 
> all necessary links to navigate from the lib to the file and back, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2500) Doxygen setup for libprocess

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2500:
---
Story Points: 2

> Doxygen setup for libprocess
> 
>
> Key: MESOS-2500
> URL: https://issues.apache.org/jira/browse/MESOS-2500
> Project: Mesos
>  Issue Type: Documentation
>  Components: libprocess
>Reporter: Bernd Mathiske
>Assignee: Joerg Schad
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Goals: 
> - Initial doxygen setup. 
> - Enable interested developers to generate already available doxygen content 
> locally in their workspace and view it.
> - Form the basis for future contributions of more doxygen content.
> 1. Devise a way to use Doxygen with Mesos source code. (For example, solve 
> this by adding optional brew/apt-get installation to the "Getting Started" 
> doc.)
> 2. Create a make target for libprocess documentation that can be manually 
> triggered.
> 3. Create initial library top level documentation.
> 4. Enhance one header file with Doxygen. Make sure the generated output has 
> all necessary links to navigate from the lib to the file and back, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2500) Doxygen setup for libprocess

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2500:
---
Sprint: Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17  (was: Mesosphere Q1 Sprint 5 - 3/20, Mesosphere 
Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q1 Sprint 10 - 
5/30)

> Doxygen setup for libprocess
> 
>
> Key: MESOS-2500
> URL: https://issues.apache.org/jira/browse/MESOS-2500
> Project: Mesos
>  Issue Type: Documentation
>  Components: libprocess
>Reporter: Bernd Mathiske
>Assignee: Joerg Schad
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Goals: 
> - Initial doxygen setup. 
> - Enable interested developers to generate already available doxygen content 
> locally in their workspace and view it.
> - Form the basis for future contributions of more doxygen content.
> 1. Devise a way to use Doxygen with Mesos source code. (For example, solve 
> this by adding optional brew/apt-get installation to the "Getting Started" 
> doc.)
> 2. Create a make target for libprocess documentation that can be manually 
> triggered.
> 3. Create initial library top level documentation.
> 4. Enhance one header file with Doxygen. Make sure the generated output has 
> all necessary links to navigate from the lib to the file and back, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2582) Create optional release step: update PyPi repositories

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio reassigned MESOS-2582:
--

Assignee: Adam B

> Create optional release step: update PyPi repositories
> --
>
> Key: MESOS-2582
> URL: https://issues.apache.org/jira/browse/MESOS-2582
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Niklas Quarfot Nielsen
>Assignee: Adam B
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2582) Create optional release step: update PyPi repositories

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2582:
---
Story Points: 2

> Create optional release step: update PyPi repositories
> --
>
> Key: MESOS-2582
> URL: https://issues.apache.org/jira/browse/MESOS-2582
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Niklas Quarfot Nielsen
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2016) docker_name_prefix is too generic

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2016:
---
Sprint: Mesosphere Q4 Sprint 2 - 11/14, Mesosphere Q4 Sprint 3 - 12/7, 
Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 
Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, 
Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 2 - 11/14, 
Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 
Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20)

> docker_name_prefix is too generic
> -
>
> Key: MESOS-2016
> URL: https://issues.apache.org/jira/browse/MESOS-2016
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Jay Buffington
>Assignee: Timothy Chen
>
> From docker.hpp and docker.cpp:
> {code}
> // Prefix used to name Docker containers in order to distinguish those
> // created by Mesos from those created manually.
> extern std::string DOCKER_NAME_PREFIX;
> // TODO(benh): At some point to run multiple slaves we'll need to make
> // the Docker container name creation include the slave ID.
> string DOCKER_NAME_PREFIX = "mesos-";
> {code}
> This name is too generic.  A common pattern in docker land is to run 
> everything in a container and use volume mounts to share sockets and do RPC 
> between containers.  CoreOS has popularized this technique. 
> Inevitably, what people do is start a container named "mesos-slave", which 
> runs the docker containerizer recovery code that removes all containers 
> starting with "mesos-", including itself. They then ask "huh, why did my 
> mesos-slave docker container die? I don't see any error messages..."
> Ideally, we should do what Ben suggested and add the slave id to the name 
> prefix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2582) Create optional release step: update PyPi repositories

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2582:
---
Sprint: Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q1 Sprint 10 - 5/30  
(was: Mesosphere Q1 Sprint 7 - 4/17)

> Create optional release step: update PyPi repositories
> --
>
> Key: MESOS-2582
> URL: https://issues.apache.org/jira/browse/MESOS-2582
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Niklas Quarfot Nielsen
>Assignee: Adam B
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2155) Make docker containerizer killing orphan containers optional

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2155:
---
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 10 - 5/30  
(was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere 
Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20)

> Make docker containerizer killing orphan containers optional
> 
>
> Key: MESOS-2155
> URL: https://issues.apache.org/jira/browse/MESOS-2155
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>
> Currently the docker containerizer on recover will kill containers that are 
> not recognized by the containerizer.
> We want to make this behavior optional as there are certain situations we 
> want to let the docker containers still continue to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1929) Create a "Getting Started with Mesos using JIRA" document

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio reassigned MESOS-1929:
--

Assignee: Marco Massenzio  (was: Benjamin Hindman)

> Create a "Getting Started with Mesos using JIRA" document 
> --
>
> Key: MESOS-1929
> URL: https://issues.apache.org/jira/browse/MESOS-1929
> Project: Mesos
>  Issue Type: Documentation
>  Components: general
>Reporter: John Pampuch
>Assignee: Marco Massenzio
>Priority: Minor
>  Labels: documentation
>
> Create a quick start guide for contributors to understand the process used to 
> move issues through to commits, explaining the states of issues, the Agile 
> boards, sprints, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-708) Static files missing "Last-Modified" HTTP headers

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-708:
--
Sprint: Mesosphere Q1 Sprint 10 - 5/30

> Static files missing "Last-Modified" HTTP headers
> -
>
> Key: MESOS-708
> URL: https://issues.apache.org/jira/browse/MESOS-708
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess, webui
>Affects Versions: 0.13.0
>Reporter: Ross Allen
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> Static assets served by the Mesos master don't return "Last-Modified" HTTP 
> headers. That means clients receive a 200 status code and re-download assets 
> on every page request even if the assets haven't changed. Because Angular JS 
> does most of the work, the downloading happens only when you navigate to 
> Mesos master in your browser or use the browser's refresh.
> Example header for "mesos.css":
> HTTP/1.1 200 OK
> Date: Thu, 26 Sep 2013 17:18:52 GMT
> Content-Length: 1670
> Content-Type: text/css
> Clients sometimes use the "Date" header for the same effect as 
> "Last-Modified", but the date is always the time of the response from the 
> server, i.e. it changes on every request and makes the assets look new every 
> time.
> The "Last-Modified" header should be added and should be the last modified 
> time of the file. On subsequent requests for the same files, the master 
> should return 304 responses with no content rather than 200 with the full 
> files. It could save clients a lot of download time since Mesos assets are 
> rather heavyweight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1131) Support HTTP auth in libprocess

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-1131:
---
Sprint: Mesosphere Q1 Sprint 10 - 5/30

> Support HTTP auth in libprocess
> ---
>
> Key: MESOS-1131
> URL: https://issues.apache.org/jira/browse/MESOS-1131
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Hindman
>Assignee: Isabel Jimenez
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1120) HTTP auth for CLI

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-1120:
---
Sprint: Mesosphere Q1 Sprint 10 - 5/30

> HTTP auth for CLI
> -
>
> Key: MESOS-1120
> URL: https://issues.apache.org/jira/browse/MESOS-1120
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Reporter: Isabel Jimenez
>Assignee: Isabel Jimenez
>Priority: Minor
>  Labels: cli
>
> Integrate HTTP auth into the CLI programs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2595) Create docker executor

2015-05-15 Thread Jay Buffington (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545685#comment-14545685
 ] 

Jay Buffington commented on MESOS-2595:
---

[~tnachen] Is the new docker executor always pid 1 inside the container when 
the scheduler doesn't use ExecutorInfo?  Does it do proper pid 1 things like 
reap orphans?

> Create docker executor
> --
>
> Key: MESOS-2595
> URL: https://issues.apache.org/jira/browse/MESOS-2595
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>
> Currently we're reusing the command executor to wait on the progress of the 
> docker container, but this has the following drawbacks:
> - We need to launch a separate docker log process just to forward logs, 
> whereas we could simply reattach stdout/stderr if we created a dedicated 
> executor for docker.
> - In general, the Mesos slave assumes that the executor is the one starting 
> the actual task. But with the current docker containerizer, the containerizer 
> actually starts the docker container first and then launches the command 
> executor to wait on it. This can cause problems if the container fails 
> before the command executor is able to launch: the slave will try to update 
> the limits of the container on executor registration, but the docker 
> containerizer will fail to do so since the container has already failed. 
> Overall it's much simpler to tie the container lifecycle to the executor, 
> which simplifies logic and log management.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2630) Remove capture by reference of temporaries in Stout

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2630:
--
Sprint: Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, 
Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q2 Sprint 8 - 5/1, Mesosphere 
Q1 Sprint 9 - 5/15)

> Remove capture by reference of temporaries in Stout
> ---
>
> Key: MESOS-2630
> URL: https://issues.apache.org/jira/browse/MESOS-2630
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2108) Add configure flag or environment variable to enable SSL/libevent Socket

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2108:
--
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 3 - 
12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Add configure flag or environment variable to enable SSL/libevent Socket
> 
>
> Key: MESOS-2108
> URL: https://issues.apache.org/jira/browse/MESOS-2108
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joris Van Remoortere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2074) Fetcher cache test fixture

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2074:
--
Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 
Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, 
Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere 
Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15)

> Fetcher cache test fixture
> --
>
> Key: MESOS-2074
> URL: https://issues.apache.org/jira/browse/MESOS-2074
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Bernd Mathiske
>Assignee: Bernd Mathiske
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> To accelerate providing good test coverage for the fetcher cache (MESOS-336), 
> we can provide a framework that canonicalizes creating and running a number 
> of tasks and allows easy parametrization with combinations of the following:
> - whether to cache or not
> - whether to make what has been downloaded executable or not
> - whether to extract from an archive or not
> - whether to download from a file system, http, or...
> We can create a simple HTTP server in the test fixture to support the latter.
> Furthermore, the tests need to be robust wrt. varying numbers of StatusUpdate 
> messages. An accumulating update message sink that reports the final state is 
> needed.
> All this has already been programmed in this patch, just needs to be rebased:
> https://reviews.apache.org/r/21316/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2160) Add support for allocator modules

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2160:
--
Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, 
Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 
Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 
Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Add support for allocator modules
> -
>
> Key: MESOS-2160
> URL: https://issues.apache.org/jira/browse/MESOS-2160
> Project: Mesos
>  Issue Type: Epic
>Reporter: Niklas Quarfot Nielsen
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Currently Mesos supports only the DRF allocator; changing it requires hacking 
> the Mesos source code, which in turn sets a high entry barrier. Allocator 
> modules provide an easy way to tweak the resource allocation policy, enabling 
> allocation policies to be swapped without editing the Mesos source. Custom 
> allocators can be written by anyone and do not need to be distributed together 
> with Mesos.





[jira] [Updated] (MESOS-2615) Pipe 'updateFramework' path from master to Allocator to support framework re-registration

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2615:
--
Sprint: Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Pipe 'updateFramework' path from master to Allocator to support framework 
> re-registration
> -
>
> Key: MESOS-2615
> URL: https://issues.apache.org/jira/browse/MESOS-2615
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Affects Versions: 0.22.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: framework, master
>
> Pipe the 'updateFramework' call from the master through the allocator, as 
> described in the design doc in the epic: MESOS-703





[jira] [Updated] (MESOS-2293) Implement the Call endpoint on master

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2293:
--
Sprint: Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  
(was: Mesosphere Q1 Sprint 9 - 5/15)

> Implement the Call endpoint on master
> -
>
> Key: MESOS-2293
> URL: https://issues.apache.org/jira/browse/MESOS-2293
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Isabel Jimenez
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-2069) Basic fetcher cache functionality

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2069:
--
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 3 - 
12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Basic fetcher cache functionality
> -
>
> Key: MESOS-2069
> URL: https://issues.apache.org/jira/browse/MESOS-2069
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Bernd Mathiske
>Assignee: Bernd Mathiske
>  Labels: fetcher, slave
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Add a flag to CommandInfo URI protobufs that indicates that files downloaded 
> by the fetcher shall be cached in a repository. To be followed by MESOS-2057 
> for concurrency control.
> Also see MESOS-336 for the overall goals for the fetcher cache.





[jira] [Updated] (MESOS-2205) Add user documentation for reservations

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2205:
--
Sprint: Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Add user documentation for reservations
> ---
>
> Key: MESOS-2205
> URL: https://issues.apache.org/jira/browse/MESOS-2205
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, framework
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
>
> Add a user guide for reservations which describes basic usage of them, how 
> ACLs are used to specify who can unreserve whose resources, and few advanced 
> usage cases.





[jira] [Updated] (MESOS-2608) test-framework should support principal only credential

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2608:
--
Sprint: Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> test-framework should support principal only credential 
> 
>
> Key: MESOS-2608
> URL: https://issues.apache.org/jira/browse/MESOS-2608
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>Priority: Minor
>  Labels: mesosphere
>
> Currently the test-framework is enforcing a secret to be present within the 
> supplied credential (via environment variable {{DEFAULT_SECRET}}).
> This is not an ideal example of how framework developers should approach 
> authentication.
> The presence check for the password has to be done within the authenticatee 
> (-module) implementation itself, if needed. 
> {{secret}} is typed {{optional bytes}}  within the {{Credential}} proto 
> message and should be handled accordingly by the framework to allow for 
> password free (e.g. credential cache based) authentication.





[jira] [Updated] (MESOS-2115) Improve recovering Docker containers when slave is contained

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2115:
--
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 3 - 
12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Improve recovering Docker containers when slave is contained
> 
>
> Key: MESOS-2115
> URL: https://issues.apache.org/jira/browse/MESOS-2115
> Project: Mesos
>  Issue Type: Epic
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: docker
>
> Currently, when the docker containerizer recovers, it checks the checkpointed 
> executor pids to determine which containers are still running, and removes any 
> container in {{docker ps}} that it does not recognize.
> This is problematic when the slave itself runs in a docker container: when the 
> slave container dies, all of its forked processes are removed as well, so the 
> checkpointed executor pids are no longer valid.
> We therefore have to assume the docker containers might still be running even 
> though the checkpointed executor pids are not.





[jira] [Updated] (MESOS-2072) Fetcher cache eviction

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2072:
--
Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, 
Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 
Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 
Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Fetcher cache eviction
> --
>
> Key: MESOS-2072
> URL: https://issues.apache.org/jira/browse/MESOS-2072
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Bernd Mathiske
>Assignee: Bernd Mathiske
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Delete files from the fetcher cache so that a given cache size is never 
> exceeded. Succeed in doing so while concurrent downloads are on their way and 
> new requests are pouring in.
> Idea: measure the size of each download before it begins and make enough room 
> before the download. This means that only download mechanisms that divulge the 
> size before the main download will be supported. As far as we know, those in 
> use so far have this property. 
> The calculation of how much space to free needs to be under concurrency 
> control, accumulating all space needed for competing, incomplete download 
> requests. (The Python script that performs fetcher caching for Aurora does 
> not seem to implement this. See 
> https://gist.github.com/zmanji/f41df77510ef9d00265a, imagine several of these 
> programs running concurrently, each one's _cache_eviction() call succeeding, 
> each perceiving the SAME free space being available.)
> Ultimately, a conflict resolution strategy is needed if just the downloads 
> underway already exceed the cache capacity. Then, as a fallback, direct 
> download into the work directory will be used for some tasks. TBD how to pick 
> which task gets treated how. 
> At first, only support copying of any downloaded files to the work directory 
> for task execution. This isolates the task life cycle after starting a task 
> from cache eviction considerations. 
> (Later, we can add symbolic links that avoid copying. But then eviction of 
> fetched files used by ongoing tasks must be blocked, which adds complexity. 
> another future extension is MESOS-1667 "Extract from URI while downloading 
> into work dir").





[jira] [Updated] (MESOS-1913) Create libevent/SSL-backed Socket implementation

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-1913:
--
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 3 - 
12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Create libevent/SSL-backed Socket implementation
> 
>
> Key: MESOS-1913
> URL: https://issues.apache.org/jira/browse/MESOS-1913
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joris Van Remoortere
>






[jira] [Updated] (MESOS-2629) Update style guide to disallow capture by reference of temporaries

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2629:
--
Sprint: Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, 
Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q2 Sprint 8 - 5/1, Mesosphere 
Q1 Sprint 9 - 5/15)

> Update style guide to disallow capture by reference of temporaries
> --
>
> Key: MESOS-2629
> URL: https://issues.apache.org/jira/browse/MESOS-2629
> Project: Mesos
>  Issue Type: Task
>  Components: documentation, technical debt
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>
> We modify the style guide to disallow constant references to temporaries as a 
> whole. This means disallowing both (1) and (2) below.
> h3. Background
> 1. Constant references to simple expression temporaries extend the lifetime 
> of the temporary to the end of the reference's scope:
> * Temporary returned by function:
>   {code}
>   // See full example below.
>   T f(const char* s) { return T(s); }
>   {
> const T& good = f("Ok");
> // use of good is ok.
>   }
>   {code}
> * Temporary constructed as simple expression:
>   {code}
>   // See full example below.
>   {
> const T& good = T("Ok");
> // use of good is ok.
>   }
>   {code}
> 2. Constant references to expressions that result in a reference to a 
> temporary do not extend the lifetime of the temporary:
>   * Temporary returned by function:
>   {code}
>   // See full example below.
>   T f(const char* s) { return T(s); }
>   {
> const T& bad = f("Bad!").Member();
> // use of bad is invalid.
>   }
>   {code}
>   * Temporary constructed as simple expression:
>   {code}
>   // See full example below.
>   {
> const T& bad = T("Bad!").Member();
> // use of bad is invalid.
>   }
>   {code}
> h3. Mesos Case
>   - In Mesos we use Future a lot. Many of our functions return Futures by 
> value:
>   {code}
>   class Socket {
> Future<Socket> accept();
> Future<size_t> recv(char* data, size_t size);
> ...
>   }
>   {code}
>   - Sometimes we capture these Futures:
>   {code}
>   {
> const Future<Socket>& accepted = socket.accept(); // Valid C++; propose 
> we disallow.
>   }
>   {code}
>   - Sometimes we chain these Futures:
>   {code}
>   {
> socket.accept().then(lambda::bind(_accepted)); // Temporary will be valid 
> during 'then' expression evaluation.
>   }
>   {code}
>   - Sometimes we do both:
>   {code}
>   {
> const Future<Socket>& accepted = 
> socket.accept().then(lambda::bind(_accepted)); // Dangerous! 'accepted' 
> will not remain valid until the end of the scope. Disallow!
>   }
>   {code}
> h3. Reasoning
> - Although (1) is ok, and considered a 
> [feature|http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/],
>  (2) is extremely dangerous and leads to hard to track bugs.
> - If we explicitly allow (1), but disallow (2), then my worry is that someone 
> coming along to maintain the code later on may accidentally turn (1) into 
> (2), without recognizing the severity of this mistake. For example:
> {code}
> // Original code:
> const T& val = T();
> std::cout << val << std::endl;
> // New code:
> const T& val = T().removeWhiteSpace();
> std::cout << val << std::endl; // val could be corrupted since the destructor 
> has been invoked and T's memory freed.
> {code}
> - If we disallow both cases: it will be easier to catch these mistakes early 
> on in code reviews (and avoid these painful bugs), at the same cost of 
> introducing a new style guide rule.
> h3. Performance Implications
> - BenH suggests C++ developers are commonly taught to capture by constant 
> reference to hint to the compiler that the copy can be elided.
> - Modern compilers use a Data Flow Graph to make optimizations such as
>   - *In-place-construction*: leveraged by RVO and NRVO to construct the 
> object in place on the stack. Similar to "*Placement new*": 
> http://en.wikipedia.org/wiki/Placement_syntax
>   - *RVO* (Return Value Optimization): 
> http://en.wikipedia.org/wiki/Return_value_optimization
>   - *NRVO* (Named Return Value Optimization): 
> https://msdn.microsoft.com/en-us/library/ms364057%28v=vs.80%29.aspx
> - Since modern compilers perform these optimizations, we no longer need to 
> 'hint' to the compiler that the copies can be elided.
> h3. Example program
> {code}
> #include <cstdio>
> class T {
> public:
>   T(const char* str) : Str(str) {
> printf("+ T(%s)\n", Str);
>   }
>   ~T() {
> printf("- T(%s)\n", Str);
>   }
>   const T& Member() const
>   {
> return *this;
>   }
> private:
>   const char* Str;
> };
> T f(const char* s) { return T(s); }
> int main() {
>   const T& good = T("Ok");
>   const T& good_f = f("Ok function");
>   const T& bad = T("Bad!").Member();
>   const T& bad_f = T("Bad function!").Member();
>   print
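The example program above is cut off in this digest. As a hedged reconstruction (not the original, which is truncated), a self-contained variant of the same demonstration can log construction, use, and destruction order to show that a const reference bound to a full temporary lives to the end of its scope, while binding to a member of a temporary would not:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reconstructed sketch of the demonstration above: 'events' records
// construction ("+"), use, and destruction ("-") order.
static std::vector<std::string> events;

struct T {
  explicit T(const char* s) : str(s) { events.push_back(std::string("+") + s); }
  ~T() { events.push_back(std::string("-") + str); }
  const T& Member() const { return *this; }
  const char* str;
};

std::vector<std::string> run() {
  events.clear();
  {
    const T& good = T("Ok");  // case (1): lifetime extended to end of block
    events.push_back("use good");
    (void)good;
  }                           // "-Ok" is logged here, after the use
  // const T& bad = T("Bad!").Member();  // case (2): would dangle at the ';'
  return events;
}
```

run() yields {"+Ok", "use good", "-Ok"}: the destructor fires only after the use. Uncommenting the case (2) line would log the destruction before any use of 'bad', which is the dangling case the proposed rule disallows.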

[jira] [Updated] (MESOS-2157) Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2157:
--
Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, 
Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 
Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 
Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints
> 
>
> Key: MESOS-2157
> URL: https://issues.apache.org/jira/browse/MESOS-2157
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Niklas Quarfot Nielsen
>Assignee: Alexander Rojas
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> master/state.json exports the entire state of the cluster and can, for large 
> clusters, become massive (tens of megabytes of JSON).
> Often, a client only needs information about subsets of the entire state, for 
> example all connected slaves, or information (registration info, tasks, etc.) 
> belonging to a particular framework.
> We can partition state.json into many smaller endpoints, but for starters, 
> being able to get slave information and tasks information per framework would 
> be useful.





[jira] [Updated] (MESOS-2085) Add support encrypted and non-encrypted communication in parallel for cluster upgrade

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2085:
--
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 3 - 
12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Add support encrypted and non-encrypted communication in parallel for cluster 
> upgrade
> -
>
> Key: MESOS-2085
> URL: https://issues.apache.org/jira/browse/MESOS-2085
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joris Van Remoortere
>
> During cluster upgrade from non-encrypted to encrypted communication, we need 
> to support an interim period in which:
> 1) A master can have connections to both encrypted and non-encrypted slaves
> 2) A slave that supports encrypted communication connects to a master that 
> has not yet been upgraded.
> 3) Frameworks are encrypted but the master has not been upgraded yet.
> 4) Master has been upgraded but frameworks haven't.
> 5) A slave process has upgraded but running executor processes haven't.





[jira] [Updated] (MESOS-2317) Remove deprecated checkpoint=false code

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2317:
--
Sprint: Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, 
Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 
Sprint 10 - 5/30  (was: Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 
4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Remove deprecated checkpoint=false code
> ---
>
> Key: MESOS-2317
> URL: https://issues.apache.org/jira/browse/MESOS-2317
> Project: Mesos
>  Issue Type: Epic
>Affects Versions: 0.22.0
>Reporter: Adam B
>Assignee: Joerg Schad
>  Labels: checkpoint, mesosphere
>
> Cody's plan from MESOS-444 was:
> 1) -Make it so the flag can't be changed at the command line-
> 2) -Remove the checkpoint variable entirely from slave/flags.hpp. This is a 
> fairly involved change since a number of unit tests depend on manually 
> setting the flag, as well as the default being non-checkpointing.-
> 3) Remove logic around checkpointing in the slave
> 4) Drop the flag from the SlaveInfo struct, remove logic inside the master 
> (Will require a deprecation cycle).





[jira] [Created] (MESOS-2739) Remove dynamic allocation from Stout Try

2015-05-15 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-2739:
---

 Summary: Remove dynamic allocation from Stout Try
 Key: MESOS-2739
 URL: https://issues.apache.org/jira/browse/MESOS-2739
 Project: Mesos
  Issue Type: Task
  Components: stout
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere


Follow the changes we made to Option and remove the dynamic allocations from 
Try.
One possible way to do this is to aggregate an Option and leverage its 
implementation of an unrestricted union.
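A hedged sketch of the idea (not the actual stout implementation; all member and method names are assumptions): store the value in place through an unrestricted union, as Option does, instead of heap-allocating it.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical sketch: Try keeps T inside an unrestricted union, constructed
// with placement new only in the success case, so no dynamic allocation.
template <typename T>
class Try {
public:
  static Try some(T value) { return Try(std::move(value)); }
  static Try error(std::string message) { return Try(std::move(message), 0); }

  Try(const Try& that) : isSome_(that.isSome_), message_(that.message_) {
    if (isSome_) {
      new (&value_) T(that.value_);  // copy-construct into the union storage
    }
  }

  ~Try() {
    if (isSome_) {
      value_.~T();  // variant members are never destroyed implicitly
    }
  }

  bool isSome() const { return isSome_; }
  bool isError() const { return !isSome_; }
  const T& get() const { return value_; }
  const std::string& message() const { return message_; }

private:
  explicit Try(T value) : isSome_(true) { new (&value_) T(std::move(value)); }
  Try(std::string message, int) : isSome_(false), message_(std::move(message)) {}

  union { T value_; };  // unrestricted union: no allocation, manual lifetime
  bool isSome_;
  std::string message_;
};
```

The error case never constructs a T at all; the union member is simply left uninitialized, which is the property that removes the need for a heap pointer.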





[jira] [Updated] (MESOS-2070) Implement simple slave recovery behavior for fetcher cache

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2070:
--
Sprint: Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, 
Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 
Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, 
Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 
Sprint 10 - 5/30  (was: Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 
2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 
Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, 
Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Implement simple slave recovery behavior for fetcher cache
> --
>
> Key: MESOS-2070
> URL: https://issues.apache.org/jira/browse/MESOS-2070
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Bernd Mathiske
>Assignee: Bernd Mathiske
>  Labels: newbie
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Clean the fetcher cache completely upon slave restart/recovery. This 
> implements correct, albeit not ideal behavior. More efficient schemes that 
> restore knowledge about cached files or even resume downloads can be added 
> later. 





[jira] [Updated] (MESOS-2110) Configurable Ping Timeouts

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2110:
--
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 3 - 
12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Configurable Ping Timeouts
> --
>
> Key: MESOS-2110
> URL: https://issues.apache.org/jira/browse/MESOS-2110
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Adam B
>Assignee: Adam B
>  Labels: master, network, slave, timeout
>
> After a series of ping failures, the master considers the slave lost and 
> calls shutdownSlave, requiring a slave that later reconnects to kill its 
> tasks and re-register with a new slaveId. On the other side, after a similar 
> timeout, the slave will consider the master lost and try to detect a new 
> master. These timeouts are currently hardcoded constants (5 * 15s), which may 
> not be well-suited for all scenarios.
> - Some clusters may tolerate a longer slave process restart period, and 
> wouldn't want tasks to be killed upon reconnect.
> - Some clusters may have higher-latency networks (e.g. cross-datacenter, or 
> for volunteer computing efforts), and would like to tolerate longer periods 
> without communication.
> We should provide flags/mechanisms on the master to control its tolerance for 
> non-communicative slaves, and (less importantly?) on the slave to tolerate 
> missing masters.
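As a sketch of what such flags could look like (the flag names here are assumptions, mirroring the hardcoded 5 * 15s the description mentions):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical flags: the master's tolerance for a non-communicative slave
// becomes the product of a per-ping timeout and a maximum number of misses.
struct MasterFlags {
  std::chrono::seconds slave_ping_timeout{15};  // today's hardcoded 15s
  int max_slave_ping_timeouts = 5;              // today's hardcoded 5
};

// Total window after which the master considers the slave lost.
std::chrono::seconds healthCheckWindow(const MasterFlags& flags) {
  return flags.slave_ping_timeout * flags.max_slave_ping_timeouts;
}
```

With the defaults this reproduces the current 75-second window; a high-latency deployment could raise either flag without patching Mesos.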



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2057) Concurrency control for fetcher cache

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2057:
--
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q4 Sprint 3 - 
12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> Concurrency control for fetcher cache
> -
>
> Key: MESOS-2057
> URL: https://issues.apache.org/jira/browse/MESOS-2057
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Bernd Mathiske
>Assignee: Bernd Mathiske
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Having added a URI flag to CommandInfo messages (in MESOS-2069) that 
> indicates caching of files downloaded by the fetcher in a repository, now 
> ensure that a "cached" URI is only ever downloaded once for the same user on 
> the same slave as long as the slave keeps running. 
> This holds even if multiple tasks request the same URI concurrently. If 
> multiple requests for the same URI occur, perform only one of them and reuse 
> the result; make concurrent requests for the same URI wait for the one 
> download. 
> Different URIs from different CommandInfos can be downloaded concurrently.
> No cache eviction, cleanup or failover will be handled for now. Additional 
> tickets will be filed for these enhancements. (So don't use this feature in 
> production until the whole epic is complete.)
> Note that implementing this does not suffice for production use. This ticket 
> contains the main part of the fetcher logic, though. See the epic MESOS-336 
> for the rest of the features that lead to a fully functional fetcher cache.
> The proposed general approach is to keep all bookkeeping about what is in 
> which stage of being fetched and where it resides in the slave's 
> MesosContainerizerProcess, so that all concurrent access is disambiguated and 
> controlled by an "actor" (aka libprocess "process").
> Depends on MESOS-2056 and MESOS-2069.
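The proposed actor-side bookkeeping could look like this sketch (names hypothetical; single-threaded because a libprocess actor serializes access, so no locks appear):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical dedup table: the first request for a URI starts the download;
// later requests for the same URI queue their callbacks and reuse the one
// result when it lands.
class FetchDeduper {
public:
  using Callback = std::function<void(const std::string& path)>;

  int downloads = 0;  // number of real downloads started

  void fetch(const std::string& uri, Callback done) {
    auto it = waiting_.find(uri);
    if (it != waiting_.end()) {
      it->second.push_back(std::move(done));  // piggyback on in-flight fetch
      return;
    }
    waiting_[uri].push_back(std::move(done));
    ++downloads;  // here the real download would be kicked off
  }

  // Invoked when the single download for 'uri' finishes at 'path'.
  void completed(const std::string& uri, const std::string& path) {
    for (Callback& done : waiting_[uri]) {
      done(path);
    }
    waiting_.erase(uri);
  }

private:
  std::map<std::string, std::vector<Callback>> waiting_;
};
```

Two concurrent fetches of the same URI produce one download and two callback completions, while different URIs still download independently.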





[jira] [Updated] (MESOS-2584) AuthenticationTest.RetryFrameworkAuthentication breaks with clang-3.4.2

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2584:
--
Sprint: Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> AuthenticationTest.RetryFrameworkAuthentication breaks with clang-3.4.2
> ---
>
> Key: MESOS-2584
> URL: https://issues.apache.org/jira/browse/MESOS-2584
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
> Environment: OS X 10.10.2 with clang-3.4.2
>Reporter: Michael Park
>Assignee: Till Toenshoff
>
> When built with {{clang-3.4.2}}, {{make check}} dies with the following error 
> message.
> {code}
> [ RUN  ] AuthenticationTest.RetryFrameworkAuthentication
> Assertion failed: (t != NULL), function operator(), file 
> ../../../3rdparty/libprocess/include/process/c++11/dispatch.hpp, line 77.
> *** Aborted at 1427879902 (unix time) try "date -d @1427879902" if you are 
> using GNU date ***
> PC: @ 0x7fff92600286 __pthread_kill
> *** SIGABRT (@0x7fff92600286) received by PID 5475 (TID 0x10d329000) stack 
> trace: ***
> @ 0x7fff8d56ff1a _sigtramp
> @0x10d328020 (unknown)
> @ 0x7fff8e7acb53 abort
> @ 0x7fff8e774c39 __assert_rtn
> @0x103d941c8 
> _ZZN7process8dispatchIN5mesos8internal8cram_md527CRAMMD5AuthenticateeProcessEEEvRKNS_3PIDIT_EEMS6_FvvEENKUlPNS_11ProcessBaseEE_clESD_
> @0x103d9401f 
> _ZNSt3__110__function6__funcIZN7process8dispatchIN5mesos8internal8cram_md527CRAMMD5AuthenticateeProcessEEEvRKNS2_3PIDIT_EEMS9_FvvEEUlPNS2_11ProcessBaseEE_NS_9allocatorISH_EEFvSG_EEclEOSG_
> @0x10817856b std::__1::function<>::operator()()
> @0x10815fd7f process::ProcessBase::visit()
> @0x1081ea0ae process::DispatchEvent::visit()
> @0x1067f7051 process::ProcessBase::serve()
> @0x1081495be process::ProcessManager::resume()
> @0x108148cee process::schedule()
> @ 0x7fff9951b268 _pthread_body
> @ 0x7fff9951b1e5 _pthread_start
> @ 0x7fff9951941d thread_start
> make[3]: *** [check-local] Abort trap: 6
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {code}





[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2226:
--
Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, 
Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 
Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere 
Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 
Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15)

> HookTest.VerifySlaveLaunchExecutorHook is flaky
> ---
>
> Key: MESOS-2226
> URL: https://issues.apache.org/jira/browse/MESOS-2226
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Vinod Kone
>Assignee: Kapil Arya
>  Labels: flaky, flaky-test
>
> Observed this on internal CI
> {code}
> [ RUN  ] HookTest.VerifySlaveLaunchExecutorHook
> Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
> I0114 18:51:34.659353  4720 leveldb.cpp:176] Opened db in 1.255951ms
> I0114 18:51:34.662112  4720 leveldb.cpp:183] Compacted db in 596090ns
> I0114 18:51:34.662364  4720 leveldb.cpp:198] Created db iterator in 177877ns
> I0114 18:51:34.662719  4720 leveldb.cpp:204] Seeked to beginning of db in 
> 19709ns
> I0114 18:51:34.663010  4720 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 18208ns
> I0114 18:51:34.663312  4720 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0114 18:51:34.664266  4735 recover.cpp:449] Starting replica recovery
> I0114 18:51:34.664908  4735 recover.cpp:475] Replica is in EMPTY status
> I0114 18:51:34.667842  4734 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0114 18:51:34.669117  4735 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0114 18:51:34.677913  4735 recover.cpp:566] Updating replica status to 
> STARTING
> I0114 18:51:34.683157  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 137939ns
> I0114 18:51:34.683507  4735 replica.cpp:323] Persisted replica status to 
> STARTING
> I0114 18:51:34.684013  4735 recover.cpp:475] Replica is in STARTING status
> I0114 18:51:34.685554  4738 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0114 18:51:34.696512  4736 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0114 18:51:34.700552  4735 recover.cpp:566] Updating replica status to VOTING
> I0114 18:51:34.701128  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 115624ns
> I0114 18:51:34.701478  4735 replica.cpp:323] Persisted replica status to 
> VOTING
> I0114 18:51:34.701817  4735 recover.cpp:580] Successfully joined the Paxos 
> group
> I0114 18:51:34.702569  4735 recover.cpp:464] Recover process terminated
> I0114 18:51:34.716439  4736 master.cpp:262] Master 
> 20150114-185134-2272962752-57018-4720 (fedora-19) started on 
> 192.168.122.135:57018
> I0114 18:51:34.716913  4736 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0114 18:51:34.717136  4736 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0114 18:51:34.717488  4736 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
> I0114 18:51:34.718077  4736 master.cpp:357] Authorization enabled
> I0114 18:51:34.719238  4738 whitelist_watcher.cpp:65] No whitelist given
> I0114 18:51:34.719755  4737 hierarchical_allocator_process.hpp:285] 
> Initialized hierarchical allocator process
> I0114 18:51:34.722584  4736 master.cpp:1219] The newly elected leader is 
> master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
> I0114 18:51:34.722865  4736 master.cpp:1232] Elected as the leading master!
> I0114 18:51:34.723310  4736 master.cpp:1050] Recovering from registrar
> I0114 18:51:34.723760  4734 registrar.cpp:313] Recovering registrar
> I0114 18:51:34.725229  4740 log.cpp:660] Attempting to start the writer
> I0114 18:51:34.727893  4739 replica.cpp:477] Replica received implicit 
> promise request with proposal 1
> I0114 18:51:34.728425  4739 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 114781ns
> I0114 18:51:34.728662  4739 replica.cpp:345] Persisted promised to 1
> I0114 18:51:34.731271  4741 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0114 18:51:34.733223  4734 replica.cpp:378] Replica received explicit 
> promise request for position 0 with proposal 2
> I0114 18:51:34.73407

[jira] [Updated] (MESOS-2631) Remove capture by reference of temporaries in libprocess

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2631:
--
Sprint: Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, 
Mesosphere Q1 Sprint 10 - 5/30  (was: Mesosphere Q2 Sprint 8 - 5/1, Mesosphere 
Q1 Sprint 9 - 5/15)

> Remove capture by reference of temporaries in libprocess
> 
>
> Key: MESOS-2631
> URL: https://issues.apache.org/jira/browse/MESOS-2631
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2740) Remove dynamic allocation from Stout Result

2015-05-15 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-2740:
---

 Summary: Remove dynamic allocation from Stout Result
 Key: MESOS-2740
 URL: https://issues.apache.org/jira/browse/MESOS-2740
 Project: Mesos
  Issue Type: Task
  Components: stout
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere


Follow the changes we made to Option and remove the dynamic allocations from 
Result.
One possible way to do this is to leverage the implementation of 
unrestricted union in Option and/or Try.
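A minimal sketch of the unrestricted-union technique the ticket refers to (the names and layout here are illustrative, not the actual stout code): storing the payload inline in a C++11 unrestricted union removes the heap allocation that a `T*` member would require. Result would add an error string as a further union alternative in the same way.

```cpp
#include <cassert>
#include <new>
#include <string>

// Illustrative Option<T> without dynamic allocation. T lives directly inside
// an anonymous unrestricted union, so no new/delete happens for the payload.
template <typename T>
class Option
{
public:
  Option() : some(false) {}
  Option(const T& t) : some(true) { new (&value) T(t); }
  Option(const Option& that) : some(that.some)
  {
    if (some) {
      new (&value) T(that.value);  // Placement-construct into the union.
    }
  }
  ~Option()
  {
    if (some) {
      value.~T();  // The union will not destroy the member for us.
    }
  }
  Option& operator=(const Option& that)
  {
    if (this != &that) {
      this->~Option();          // Destroy the current state...
      new (this) Option(that);  // ...and copy-construct in place.
    }
    return *this;
  }
  bool isSome() const { return some; }
  const T& get() const { assert(some); return value; }

private:
  bool some;
  union { T value; };  // Unrestricted: T may have nontrivial ctor/dtor.
};
```

Because T has a nontrivial constructor, the class must manually placement-construct and destroy the union member, which is the extra bookkeeping the Option/Try/Result changes trade for avoiding allocation.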



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2110) Configurable Ping Timeouts

2015-05-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2110:
--
Story Points: 0

> Configurable Ping Timeouts
> --
>
> Key: MESOS-2110
> URL: https://issues.apache.org/jira/browse/MESOS-2110
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Adam B
>Assignee: Adam B
>  Labels: master, network, slave, timeout
>
> After a series of ping-failures, the master considers the slave lost and 
> calls shutdownSlave, requiring such a slave that reconnects to kill its tasks 
> and re-register as a new slaveId. On the other side, after a similar timeout, 
> the slave will consider the master lost and try to detect a new master. These 
> timeouts are currently hardcoded constants (5 * 15s), which may not be 
> well-suited for all scenarios.
> - Some clusters may tolerate a longer slave process restart period, and 
> wouldn't want tasks to be killed upon reconnect.
> - Some clusters may have higher-latency networks (e.g. cross-datacenter, or 
> for volunteer computing efforts), and would like to tolerate longer periods 
> without communication.
> We should provide flags/mechanisms on the master to control its tolerance for 
> non-communicative slaves, and (less importantly?) on the slave to tolerate 
> missing masters.
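The hardcoded tolerance described above could be expressed as two tunables whose product replaces the 5 * 15s constant. A hedged sketch (the flag names are hypothetical, not actual Mesos flags):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical flags replacing the hardcoded 5 * 15s health-check constant.
struct PingFlags
{
  std::chrono::seconds slave_ping_timeout{15};  // Interval between pings.
  int max_slave_ping_timeouts{5};               // Missed pings tolerated.
};

// Total silence the master tolerates before declaring a slave lost.
std::chrono::seconds slaveLostTimeout(const PingFlags& flags)
{
  return flags.slave_ping_timeout * flags.max_slave_ping_timeouts;
}
```

A high-latency (e.g. cross-datacenter) cluster could then raise either knob to tolerate longer communication gaps without the slave being shut down.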



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2740) Remove dynamic allocation from Stout Result

2015-05-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-2740:

Component/s: technical debt

> Remove dynamic allocation from Stout Result
> --
>
> Key: MESOS-2740
> URL: https://issues.apache.org/jira/browse/MESOS-2740
> Project: Mesos
>  Issue Type: Task
>  Components: stout, technical debt
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: c++11, stout
>
> Follow the changes we made to Option and remove the dynamic allocations 
> from Result.
> One possible way to do this is to to leverage the implementation of 
> unrestricted union in Option and/or Try.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2739) Remove dynamic allocation from Stout Try

2015-05-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-2739:

Component/s: technical debt

> Remove dynamic allocation from Stout Try
> ---
>
> Key: MESOS-2739
> URL: https://issues.apache.org/jira/browse/MESOS-2739
> Project: Mesos
>  Issue Type: Task
>  Components: stout, technical debt
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: c++11, stout
>
> Follow the changes we made to Option and remove the dynamic allocations 
> from Try.
> One possible way to do this is to aggregate an Option and leverage its 
> implementation of unrestricted union.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-15 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545901#comment-14545901
 ] 

Benjamin Mahler commented on MESOS-2706:


Linking in another ticket related to high cpu usage caused by the 1 second 
usage polling in the slave.

> When the docker-tasks grow, the time spare between Queuing task and Starting 
> container grows
> 
>
> Key: MESOS-2706
> URL: https://issues.apache.org/jira/browse/MESOS-2706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.22.0
> Environment: My Environment info:
> Mesos 0.22.0 & Marathon 0.82-RC1 both running in one host-server.
> Every docker-task requires 0.02 CPU and 128MB, and the server has 8 CPUs and 
> 24G of memory.
> So Mesos can launch thousands of tasks in theory.
> And each docker-task is very lightweight: it just launches an sshd service.
>Reporter: chenqiuhao
>
> At the beginning, Marathon launched docker-tasks very fast, but when the 
> number of tasks on the single mesos-slave host reached 50, Marathon seemed 
> to launch docker-tasks slowly.
> So I checked the mesos-slave log and found that the gap between "Queuing 
> task" and "Starting container" grew.
> For example, 
> Launching the 1st docker task takes about 0.008s:
> [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing 
> task|Starting container'
> I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
> dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
> '20150202-112355-2684495626-5050-26153-
> I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
> 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
> '20150202-112355-2684495626-5050-26153-'
> Launching the 50th docker task takes about 4.9s:
> I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
> dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
> '20150202-112355-2684495626-5050-26153-
> I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
> '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
> '20150202-112355-2684495626-5050-26153-'
> And when I launched the 100th docker task, it took about 13s!
> I did the same test on a host with 24 CPUs and 256G of memory and got the 
> same result.
> Has anybody had the same experience, or can anyone help run the same 
> pressure test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1375) Log rotation capable

2015-05-15 Thread Steven Schlansker (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545915#comment-14545915
 ] 

Steven Schlansker commented on MESOS-1375:
--

Sorry to sound frustrated.  After digging a little bit, I realize that my issue 
is as much with Mesosphere's packaging of Mesos as it is the Mesos 
configuration itself.  There are a couple of issues here that all come 
together to make it very hard to create a "production ready" logrotate setup.

* GLOG's log rotation is wacky.  It seems to rotate logs in part based on 
service restarts, so the interval between rotations is extremely irregular.  We 
will have 10 log files created in quick succession if a slave has issues 
starting up (right now I have 20 files for a single day we had a lot of issues 
in).  Other times, during periods of great stability but high task load, we 
will end up with a single log file covering most of a month that grows to 
10GB.
* Mesosphere's init scripts do not allow easy customization of GLOG 
configuration (not that it is very configurable to start with)
* Mesosphere's init scripts hardwire stdout / stderr from mesos-master and 
mesos-slave to go to syslog's user facility, which is overloaded by just about 
every project that uses syslog

My ideal setup honestly would be to pipe process stdout / stderr through 
something like Apache's 'rotatelogs' command, or to improve the Mesos 
integration with 'logrotate' so it can signal properly and not need 
'copytruncate' which has known race conditions.  I tried the logrotate 'hack' 
linked above and we did not find much success over three or four iterations.

It may be possible to get it working nicely, in which case maybe the only 
change needed is a documentation fix of "This is the official way to get Mesos 
logs rotation to work" along with some user education.  Happy to expand on any 
of these points if that would be helpful.  Thanks!


> Log rotation capable
> 
>
> Key: MESOS-1375
> URL: https://issues.apache.org/jira/browse/MESOS-1375
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Affects Versions: 0.18.0
>Reporter: Damien Hardy
>  Labels: ops, twitter
>
> Please provide a way to let ops manage logs.
> A log4j like configuration would be hard but make rotation capable without 
> restarting the service at least. 
> Based on external logrotate tool would be great :
>  * write to a constant log file name
>  * check for file change (recreated by logrotate) before write
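The two bullets above (write to a constant path, detect when logrotate has recreated the file) can be sketched as a small POSIX writer. This is an illustration of the requested behavior, not Mesos code; the class name and inode-comparison approach are assumptions:

```cpp
#include <sys/stat.h>
#include <cstdio>
#include <string>

// Logrotate-friendly writer: always append to a fixed path, and before each
// write stat() the path; if the inode no longer matches the open file,
// logrotate has moved/recreated it, so reopen. POSIX assumed.
class RotatableLog
{
public:
  explicit RotatableLog(const std::string& path)
    : path_(path), file_(nullptr), inode_(0) { reopen(); }

  ~RotatableLog() { if (file_) std::fclose(file_); }

  void write(const std::string& line)
  {
    struct stat s;
    // Path missing or pointing at a different inode => rotation happened.
    if (stat(path_.c_str(), &s) != 0 || s.st_ino != inode_) {
      reopen();
    }
    if (file_) {
      std::fputs(line.c_str(), file_);
      std::fflush(file_);
    }
  }

private:
  void reopen()
  {
    if (file_) std::fclose(file_);
    file_ = std::fopen(path_.c_str(), "a");  // Recreates the constant path.
    struct stat s;
    inode_ = (file_ && stat(path_.c_str(), &s) == 0) ? s.st_ino : 0;
  }

  std::string path_;
  FILE* file_;
  ino_t inode_;
};
```

With this scheme logrotate can simply rename the file; no `copytruncate` (and its known race) and no process signal are needed.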



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2649) Implement Resource Estimator

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545914#comment-14545914
 ] 

Niklas Quarfot Nielsen commented on MESOS-2649:
---

What else is needed for the estimator (apart from modularizing it)?

We need to pass the monitor pointer at least :)

> Implement Resource Estimator
> 
>
> Key: MESOS-2649
> URL: https://issues.apache.org/jira/browse/MESOS-2649
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Jie Yu
>  Labels: twitter
>
> Resource estimator is the component in the slave that estimates the amount of 
> oversubscribable resources.
> This needs to be integrated with the slave and resource monitor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2646) Update Master to send revocable offers

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545923#comment-14545923
 ] 

Niklas Quarfot Nielsen commented on MESOS-2646:
---

Should we link the ticket you recently created to this ticket?

> Update Master to send revocable offers
> --
>
> Key: MESOS-2646
> URL: https://issues.apache.org/jira/browse/MESOS-2646
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>  Labels: twitter
>
> Master will send separate offers for revocable and non-revocable/regular 
> resources. This allows master to rescind revocable offers (e.g, when a new 
> oversubscribed resources estimate comes from the slave) without impacting 
> regular offers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2653) Slave should act on correction events from QoS controller

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen reassigned MESOS-2653:
-

Assignee: Niklas Quarfot Nielsen

> Slave should act on correction events from QoS controller
> -
>
> Key: MESOS-2653
> URL: https://issues.apache.org/jira/browse/MESOS-2653
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Niklas Quarfot Nielsen
>  Labels: mesosphere
>
> Slave might want to kill revocable tasks based on correction events from the 
> QoS controller.
> The QoS controller communicates to the slave, through a stream (or 
> process::Queue), which corrections it needs to carry out in order to 
> mitigate interference with production tasks.
> The correction is communicated through a message:
> {code}
> message QoSCorrection {
>   enum CorrectionType {
>     KillExecutor = 1;
>     // KillTask = 2;
>     // Resize, throttle task
>   }
>   optional string reason = X;
>   optional ExecutorID executor_id = X;
>   // optional TaskID task_id = X;
> }
> {code}
> And the slave will set up a handler to process these events. Initially, only 
> executor termination is supported, and it causes the slave to issue 
> 'containerizer->destroy()'.
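The handler described in the issue could look roughly like the following. This is a hedged sketch, not the actual Mesos implementation; the type names and the `destroy` callback (standing in for `containerizer->destroy()`) are illustrative:

```cpp
#include <functional>
#include <string>
#include <utility>

// Correction types; initially only executor termination is supported.
enum class CorrectionType { KILL_EXECUTOR };

struct QoSCorrection
{
  CorrectionType type;
  std::string executorId;
  std::string reason;
};

class Slave
{
public:
  // `destroy` stands in for containerizer->destroy(containerId).
  explicit Slave(std::function<void(const std::string&)> destroy)
    : destroy_(std::move(destroy)) {}

  // Handler invoked for each correction drained from the QoS controller's
  // stream/queue.
  void handle(const QoSCorrection& correction)
  {
    switch (correction.type) {
      case CorrectionType::KILL_EXECUTOR:
        destroy_(correction.executorId);  // Kill the revocable executor.
        break;
    }
  }

private:
  std::function<void(const std::string&)> destroy_;
};
```

Later correction types (kill task, resize/throttle) would become additional cases in the same handler.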



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2653) Slave should act on correction events from QoS controller

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545927#comment-14545927
 ] 

Niklas Quarfot Nielsen commented on MESOS-2653:
---

Assigning myself for now. If we can find another contributor, I can shepherd 
this.

> Slave should act on correction events from QoS controller
> -
>
> Key: MESOS-2653
> URL: https://issues.apache.org/jira/browse/MESOS-2653
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Niklas Quarfot Nielsen
>  Labels: mesosphere
>
> Slave might want to kill revocable tasks based on correction events from the 
> QoS controller.
> The QoS controller communicates to the slave, through a stream (or 
> process::Queue), which corrections it needs to carry out in order to 
> mitigate interference with production tasks.
> The correction is communicated through a message:
> {code}
> message QoSCorrection {
>   enum CorrectionType {
>     KillExecutor = 1;
>     // KillTask = 2;
>     // Resize, throttle task
>   }
>   optional string reason = X;
>   optional ExecutorID executor_id = X;
>   // optional TaskID task_id = X;
> }
> {code}
> And the slave will set up a handler to process these events. Initially, only 
> executor termination is supported, and it causes the slave to issue 
> 'containerizer->destroy()'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2687) Add a slave flag to enable oversubscription

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545932#comment-14545932
 ] 

Niklas Quarfot Nielsen commented on MESOS-2687:
---

[~jieyu] Did we decide to skip this in favor of enabling oversubscription by 
using a non-noop estimator?

> Add a slave flag to enable oversubscription
> ---
>
> Key: MESOS-2687
> URL: https://issues.apache.org/jira/browse/MESOS-2687
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>  Labels: twitter
>
> Slave sends oversubscribable resources to master only when the flag is 
> enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2688) Slave should kill revocable tasks if oversubscription is disabled

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545935#comment-14545935
 ] 

Niklas Quarfot Nielsen commented on MESOS-2688:
---

[~bmahler] Did you have some insights into how we would do this? I remember you 
had some concerns about BE task killing, or was that mostly the mass-killing 
from the master?

> Slave should kill revocable tasks if oversubscription is disabled
> -
>
> Key: MESOS-2688
> URL: https://issues.apache.org/jira/browse/MESOS-2688
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>  Labels: twitter
>
> If oversubscription is disabled on a restarted slave (that had it previously 
> enabled), it should kill revocable tasks.
> Slave knows this information from the Resources of a container that it 
> checkpoints and recovers.
> Add a new reason OVERSUBSCRIPTION_DISABLED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2157) Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints

2015-05-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545938#comment-14545938
 ] 

Adam B commented on MESOS-2157:
---

How about a /master/frameworks/{framework}/executors endpoint too? Just ran 
across the fact that master/state.json doesn't seem to report executorInfos 
(and their resources).

> Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints
> 
>
> Key: MESOS-2157
> URL: https://issues.apache.org/jira/browse/MESOS-2157
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Niklas Quarfot Nielsen
>Assignee: Alexander Rojas
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> master/state.json exports the entire state of the cluster and can, for large 
> clusters, become massive (tens of megabytes of JSON).
> Often, a client only needs information about subsets of the entire state, for 
> example all connected slaves, or information (registration info, tasks, etc) 
> belonging to a particular framework.
> We can partition state.json into many smaller endpoints, but for starters, 
> being able to get slave information and tasks information per framework would 
> be useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2157) Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints

2015-05-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545938#comment-14545938
 ] 

Adam B edited comment on MESOS-2157 at 5/15/15 6:28 PM:


How about a `/master/frameworks/\{framework\}/executors` endpoint too? Just ran 
across the fact that master/state.json doesn't seem to report executorInfos 
(and their resources).


was (Author: adam-mesos):
How about a /master/frameworks/{framework}/executors endpoint too? Just ran 
across the fact that master/state.json doesn't seem to report executorInfos 
(and their resources).

> Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints
> 
>
> Key: MESOS-2157
> URL: https://issues.apache.org/jira/browse/MESOS-2157
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Niklas Quarfot Nielsen
>Assignee: Alexander Rojas
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> master/state.json exports the entire state of the cluster and can, for large 
> clusters, become massive (tens of megabytes of JSON).
> Often, a client only needs information about subsets of the entire state, for 
> example all connected slaves, or information (registration info, tasks, etc) 
> belonging to a particular framework.
> We can partition state.json into many smaller endpoints, but for starters, 
> being able to get slave information and tasks information per framework would 
> be useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1375) Log rotation capable

2015-05-15 Thread Cody Maloney (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546016#comment-14546016
 ] 

Cody Maloney commented on MESOS-1375:
-

Even when using the Mesosphere init scripts, the current init wrappers let you 
add arbitrary flags as well as set environment variables, which will be 
sourced. That said, we've definitely felt the pain of those old init scripts 
(the newer Mesos packaging we use in DCOS foregoes them completely), and we 
may actually look at removing them in a new generation of the packaging.

> Log rotation capable
> 
>
> Key: MESOS-1375
> URL: https://issues.apache.org/jira/browse/MESOS-1375
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Affects Versions: 0.18.0
>Reporter: Damien Hardy
>  Labels: ops, twitter
>
> Please provide a way to let ops manage logs.
> A log4j like configuration would be hard but make rotation capable without 
> restarting the service at least. 
> Based on external logrotate tool would be great :
>  * write to a constant log file name
>  * check for file change (recreated by logrotate) before write



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2741) Exposing Resources along with ResourceStatistics from resource monitor

2015-05-15 Thread Jie Yu (JIRA)
Jie Yu created MESOS-2741:
-

 Summary: Exposing Resources along with ResourceStatistics from 
resource monitor
 Key: MESOS-2741
 URL: https://issues.apache.org/jira/browse/MESOS-2741
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu


Right now, the resource monitor returns a Usage which contains ContainerId, 
ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
controller to calculate usage slack, or tell if a container is using revokable 
resources or not, we need to expose the Resources that are currently assigned 
to the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2741) Exposing Resources along with ResourceStatistics from resource monitor

2015-05-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-2741:
--
Description: 
Right now, the resource monitor returns a Usage which contains ContainerId, 
ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
controller to calculate usage slack, or tell if a container is using revocable 
resources or not, we need to expose the Resources that are currently assigned 
to the container.

This requires us to change the containerizer interface to return the Resources 
as well when calling 'usage()'.
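The reason the allocation must travel alongside the statistics can be shown with a toy calculation (the struct and field names are made up, not the real Usage/ResourceStatistics messages): usage slack is allocated-but-unused resource, which cannot be derived from sampled usage alone.

```cpp
#include <cassert>

// Illustrative pairing of a container's assigned Resources with its sampled
// ResourceStatistics, reduced here to a single cpus dimension.
struct Usage
{
  double cpusAllocated;  // From the Resources currently assigned.
  double cpusUsed;       // From ResourceStatistics (sampled usage).
};

// Slack the resource estimator could offer as oversubscribable revocable
// resources: allocation minus usage, clamped at zero.
double usageSlack(const Usage& usage)
{
  double slack = usage.cpusAllocated - usage.cpusUsed;
  return slack > 0.0 ? slack : 0.0;  // Never offer negative slack.
}
```

Without `cpusAllocated`, the estimator sees only `cpusUsed` and cannot tell how much of the allocation is idle, nor whether the container is running on revocable resources.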

  was:Right now, the resource monitor returns a Usage which contains 
ContainerId, ExecutorInfo and ResourceStatistics. In order for resource 
estimator/qos controller to calculate usage slack, or tell if a container is 
using revokable resources or not, we need to expose the Resources that are 
currently assigned to the container.


> Exposing Resources along with ResourceStatistics from resource monitor
> --
>
> Key: MESOS-2741
> URL: https://issues.apache.org/jira/browse/MESOS-2741
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>  Labels: twitter
>
> Right now, the resource monitor returns a Usage which contains ContainerId, 
> ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
> controller to calculate usage slack, or tell if a container is using 
> revocable resources or not, we need to expose the Resources that are 
> currently assigned to the container.
> This requires us to change the containerizer interface to return the 
> Resources as well when calling 'usage()'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2741) Exposing Resources along with ResourceStatistics from resource monitor

2015-05-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-2741:
--
Labels: twitter  (was: )

> Exposing Resources along with ResourceStatistics from resource monitor
> --
>
> Key: MESOS-2741
> URL: https://issues.apache.org/jira/browse/MESOS-2741
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>  Labels: twitter
>
> Right now, the resource monitor returns a Usage which contains ContainerId, 
> ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
> controller to calculate usage slack, or tell if a container is using 
> revocable resources or not, we need to expose the Resources that are 
> currently assigned to the container.
> This requires us to change the containerizer interface to return the 
> Resources as well when calling 'usage()'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2741) Exposing Resources along with ResourceStatistics from resource monitor

2015-05-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-2741:
--
Story Points: 5

> Exposing Resources along with ResourceStatistics from resource monitor
> --
>
> Key: MESOS-2741
> URL: https://issues.apache.org/jira/browse/MESOS-2741
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>  Labels: twitter
>
> Right now, the resource monitor returns a Usage which contains ContainerId, 
> ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
> controller to calculate usage slack, or tell if a container is using 
> revocable resources or not, we need to expose the Resources that are 
> currently assigned to the container.
> This requires us to change the containerizer interface to return the 
> Resources as well when calling 'usage()'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2738) Report resources allocated to default executors equally for frameworks and slaves in state.json

2015-05-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-2738:
--
Description: 
[~rcorral] recently observed that according to the master's and the slave's 
state.json summing up the resources allocated to tasks from different 
frameworks on a slave does not always match the total that is reported for the 
slave. The latter number is sometimes higher.

It would be desirable for tools that display allocation statistics to find 
balanced tallies.


  was:
[~rcorral] recently observed that according to state.json summing up the 
resources allocated to tasks from different frameworks on a slave does not 
match the total that is reported for the slave. The latter number is higher.

[~adam-mesos] and I have the strong suspicion that when in state.json the sum 
of all resources allocated to a framework is reported, this does not include 
the resources for default (command line) executors. However, when the resources 
of a slave are summed up in state.json, these resources are included in the 
total. Custom executor resources are included in both cases. Browsing master 
source code supports this theory.

It would be desirable for tools that display allocation statistics to find 
balanced tallies.

Possible alternative approaches:
- Exclude default executors from the sum for slaves.
- Include default executors for frameworks and their tasks.
- Emit a global record listing the canonical resource values for default 
executors, which are always the same. Then the sums of such resources can be 
determined by multiplying by the number of tasks involved.

Workaround for now: determine the latter amounts by reading Mesos source code 
and hardcode them into your external tool.


> Report resources allocated to default executors equally for frameworks and 
> slaves in state.json
> ---
>
> Key: MESOS-2738
> URL: https://issues.apache.org/jira/browse/MESOS-2738
> Project: Mesos
>  Issue Type: Improvement
>  Components: json api, master
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> [~rcorral] recently observed that, according to the master's and the slave's 
> state.json, the sum of the resources allocated to tasks from different 
> frameworks on a slave does not always match the total reported for the 
> slave. The latter number is sometimes higher.
> It would be desirable for tools that display allocation statistics to find 
> balanced tallies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2738) Reported used resources for tasks in frameworks do not match slave tally

2015-05-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-2738:
--
Summary: Reported used resources for tasks in frameworks do not match slave 
tally  (was: Report resources allocated to default executors equally for 
frameworks and slaves in state.json)

> Reported used resources for tasks in frameworks do not match slave tally
> 
>
> Key: MESOS-2738
> URL: https://issues.apache.org/jira/browse/MESOS-2738
> Project: Mesos
>  Issue Type: Improvement
>  Components: json api, master
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> [~rcorral] recently observed that, according to the master's and the slave's 
> state.json, the sum of the resources allocated to tasks from different 
> frameworks on a slave does not always match the total reported for the 
> slave. The latter number is sometimes higher.
> It would be desirable for tools that display allocation statistics to find 
> balanced tallies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2741) Exposing Resources along with ResourceStatistics from resource monitor

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2741:
--
Labels: mesosphere twitter  (was: twitter)

> Exposing Resources along with ResourceStatistics from resource monitor
> --
>
> Key: MESOS-2741
> URL: https://issues.apache.org/jira/browse/MESOS-2741
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>  Labels: mesosphere, twitter
>
> Right now, the resource monitor returns a Usage which contains ContainerId, 
> ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
> controller to calculate usage slack, or tell if a container is using 
> revokable resources or not, we need to expose the Resources that are 
> currently assigned to the container.
> This requires us to change the containerizer interface to also return the 
> Resources when 'usage()' is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2728) Introduce concept of cluster wide resources.

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2728:
--
Summary: Introduce concept of cluster wide resources.  (was: Introduce 
concept of clusterwise ressources.)

> Introduce concept of cluster wide resources.
> 
>
> Key: MESOS-2728
> URL: https://issues.apache.org/jira/browse/MESOS-2728
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>
> Some resources are not provided by a single node. Consider, for example, the 
> external network bandwidth of a cluster. Being a limited resource, it makes 
> sense for Mesos to manage it, yet it is not a resource offered by any single 
> node.
> Use Cases:
> 1. Network Bandwidth
> 2. IP Addresses
> 3. Global Service Ports
> 4. Distributed File System Storage
> 5. Software Licences



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2742) Architecture doc on global resources

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)
Niklas Quarfot Nielsen created MESOS-2742:
-

 Summary: Architecture doc on global resources
 Key: MESOS-2742
 URL: https://issues.apache.org/jira/browse/MESOS-2742
 Project: Mesos
  Issue Type: Task
Reporter: Niklas Quarfot Nielsen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2728) Introduce concept of cluster wide resources.

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2728:
--
Epic Name: Clusterwide Resources.  (was: Clusterwise Ressources.)

> Introduce concept of cluster wide resources.
> 
>
> Key: MESOS-2728
> URL: https://issues.apache.org/jira/browse/MESOS-2728
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>
> Some resources are not provided by a single node. Consider, for example, the 
> external network bandwidth of a cluster. Being a limited resource, it makes 
> sense for Mesos to manage it, yet it is not a resource offered by any single 
> node.
> Use Cases:
> 1. Network Bandwidth
> 2. IP Addresses
> 3. Global Service Ports
> 4. Distributed File System Storage
> 5. Software Licences



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2734) Update allocator to allocate revocable resources

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546162#comment-14546162
 ] 

Niklas Quarfot Nielsen commented on MESOS-2734:
---

Should we break this down into smaller pieces? Maybe include a ticket for 
documenting the semantic changes?

> Update allocator to allocate revocable resources
> 
>
> Key: MESOS-2734
> URL: https://issues.apache.org/jira/browse/MESOS-2734
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>  Labels: twitter
>
> The allocator maintains a 'revocable' resources collection, which tracks the 
> currently available revocable resources.
> Revocable resources are added to the DRF sorter allocations much like regular 
> resources during recoverResources() and allocate().
> The only difference is that, unlike regular resources, 'revocable' resources 
> are *not* updated in allocate() or recover(). They only get updated in 
> updateRevocableResources() call.
> The two main consequences of this design are:
> --> Revocable resources are accounted for in fair sharing which is great.
> --> Allocation for revocable resources only happens whenever there is a new 
> estimate from the slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2647) Slave should validate tasks using oversubscribed resources

2015-05-15 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2647:
--
Description: 
The latest oversubscribed resource estimate might render a revocable task 
launch invalid. Slave should check this and send TASK_LOST with appropriate 
REASON.

We need to add a new REASON for this (REASON_RESOURCE_OVERSUBSCRIBED?).

  was:
The latest oversubscribed resource estimate sent by the slave might render a 
revocable task launch invalid. Master should check this and send TASK_LOST with 
appropriate REASON.

We need to add a new REASON for this (REASON_RESOURCE_OVERSUBSCRIBED?).

Summary: Slave should validate tasks using oversubscribed resources  
(was: Master should validate tasks using oversubscribed resources)

> Slave should validate tasks using oversubscribed resources
> --
>
> Key: MESOS-2647
> URL: https://issues.apache.org/jira/browse/MESOS-2647
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>  Labels: twitter
>
> The latest oversubscribed resource estimate might render a revocable task 
> launch invalid. Slave should check this and send TASK_LOST with appropriate 
> REASON.
> We need to add a new REASON for this (REASON_RESOURCE_OVERSUBSCRIBED?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing

2015-05-15 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546167#comment-14546167
 ] 

Niklas Quarfot Nielsen commented on MESOS-2735:
---

Can you capture some of the recent discussions here? I wanted to understand how 
providing a callback to update the most recent value differs from having a 
callback hanging off the future returned by estimator::update().

> Change the interaction between the slave and the resource estimator from 
> polling to pushing 
> 
>
> Key: MESOS-2735
> URL: https://issues.apache.org/jira/browse/MESOS-2735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: twitter
>
> This will make the semantics clearer. The resource estimator can control 
> the rate at which it sends resource estimates to the slave.
> To avoid cyclic dependency, slave will register a callback with the resource 
> estimator and the resource estimator will simply invoke that callback when 
> there's a new estimation ready. The callback will be a defer to the slave's 
> main event queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1162) Add a 'Percentage' abstraction.

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-1162:
---
Priority: Minor  (was: Major)
Assignee: Marco Massenzio  (was: Isabel Jimenez)

> Add a 'Percentage' abstraction.
> ---
>
> Key: MESOS-1162
> URL: https://issues.apache.org/jira/browse/MESOS-1162
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Benjamin Mahler
>Assignee: Marco Massenzio
>Priority: Minor
>
> It is currently difficult to add a percentage-based flag, if one desires it 
> to be specified in the "0%"-"100%" form. This requires creating a {{string}} 
> flag and doing all the parsing / validation manually.
> An alternative is to use a {{double}} flag with 0.0-1.0 being the valid 
> range, however, this may not read as intuitively to operators.
> Another alternative is to use a {{double}} flag with 0.0-100.0 as the valid 
> range, with the '%' being implicit.
> However, these two alternative techniques can lead to confusion since it's 
> not clear how we're interpreting the value. Requiring the '%' symbol is nice 
> because it leaves no room for ambiguity.
> I would propose adding a 'Percentage' abstraction in stout that provides the 
> parsing logic for use in flags. Percentages can basically be a wrapper around 
> the underlying {{double}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1162) Add a 'Percentage' abstraction.

2015-05-15 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546181#comment-14546181
 ] 

Marco Massenzio commented on MESOS-1162:


As I've been adding some functionality around {{stout::BaseFlags}}, I'll be 
looking into doing this: it looks like a fun project, and simple enough that I 
won't cause too much damage.

> Add a 'Percentage' abstraction.
> ---
>
> Key: MESOS-1162
> URL: https://issues.apache.org/jira/browse/MESOS-1162
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Benjamin Mahler
>Assignee: Marco Massenzio
>Priority: Minor
>
> It is currently difficult to add a percentage-based flag, if one desires it 
> to be specified in the "0%"-"100%" form. This requires creating a {{string}} 
> flag and doing all the parsing  / validation manually.
> An alternative is to use a {{double}} flag with 0.0-1.0 being the valid 
> range, however, this may not read as intuitively to operators.
> Another alternative is to use a {{double}} flag with 0.0-100.0 as the valid 
> range, with the '%' being implicit.
> However, these two alternative techniques can lead to confusion since it's 
> not clear how we're interpreting the value. Requiring the '%' symbol is nice 
> because it leaves no room for ambiguity.
> I would propose adding a 'Percentage' abstraction in stout that provides the 
> parsing logic for use in flags. Percentages can basically be a wrapper around 
> the underlying {{double}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-350) Explore disk I/O isolation in cgroups

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-350:
--
Epic Name: Disk I/O Isolation

> Explore disk I/O isolation in cgroups
> -
>
> Key: MESOS-350
> URL: https://issues.apache.org/jira/browse/MESOS-350
> Project: Mesos
>  Issue Type: Epic
>Reporter: Sathya Hariesh
>Assignee: Joris Van Remoortere
>
> Currently there is no disk I/O isolation in place, so an executor can be 
> starved of disk when another executor performs a disk-heavy operation, such 
> as copying a multi-gigabyte file.
> At Twitter, the executor performs a few fsync operations and waits specific 
> periods for these operations to succeed. When these operations take longer 
> than expected during task termination, the result can be LOST tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-350) Explore disk I/O isolation in cgroups

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-350:
--
Summary: Explore disk I/O isolation in cgroups  (was: Explore disk io 
isolation in cgroups)

> Explore disk I/O isolation in cgroups
> -
>
> Key: MESOS-350
> URL: https://issues.apache.org/jira/browse/MESOS-350
> Project: Mesos
>  Issue Type: Epic
>Reporter: Sathya Hariesh
>Assignee: Joris Van Remoortere
>
> Currently there is no disk I/O isolation in place, so an executor can be 
> starved of disk when another executor performs a disk-heavy operation, such 
> as copying a multi-gigabyte file.
> At Twitter, the executor performs a few fsync operations and waits specific 
> periods for these operations to succeed. When these operations take longer 
> than expected during task termination, the result can be LOST tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2174) Fix Flaky Tests

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2174:
---
Epic Status: Done  (was: To Do)

> Fix Flaky Tests
> ---
>
> Key: MESOS-2174
> URL: https://issues.apache.org/jira/browse/MESOS-2174
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cody Maloney
>  Labels: mesosphere
>
> Flaky tests reduce our confidence in builds and make it hard to see the 
> signal through the noise. Let's track them and make sure we're making 
> progress towards getting rid of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MESOS-2174) Fix Flaky Tests

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio closed MESOS-2174.
--

"Closing" it for real now (following [~adam-mesos]'s comments: just "resolving" 
it does not make it go away from our boards).

> Fix Flaky Tests
> ---
>
> Key: MESOS-2174
> URL: https://issues.apache.org/jira/browse/MESOS-2174
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cody Maloney
>  Labels: mesosphere
>
> Flaky tests reduce our confidence in builds and make it hard to see the 
> signal through the noise. Let's track them and make sure we're making 
> progress towards getting rid of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (MESOS-2174) Fix Flaky Tests

2015-05-15 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio reopened MESOS-2174:


> Fix Flaky Tests
> ---
>
> Key: MESOS-2174
> URL: https://issues.apache.org/jira/browse/MESOS-2174
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cody Maloney
>  Labels: mesosphere
>
> Flaky tests reduce our confidence in builds and make it hard to see the 
> signal through the noise. Let's track them and make sure we're making 
> progress towards getting rid of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >