[jira] [Updated] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Domański updated MESOS-1774:
--
Target Version/s: 1.0.0, 0.20.1  (was: 1.0.0, 0.20.0, 0.20.1)

 Fix protobuf detection on systems with Python 3 as default
 --

 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
  Labels: build

 When configuring without bundled dependencies, usage of the *python* symbolic 
 link in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* 
 module to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread JIRA
Kamil Domański created MESOS-1774:
-

 Summary: Fix protobuf detection on systems with Python 3 as default
 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
./configure --disable-bundled
Reporter: Kamil Domański


When configuring without bundled dependencies, usage of the *python* symbolic link 
in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* module 
to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125498#comment-14125498
 ] 

Kamil Domański commented on MESOS-1774:
---

https://reviews.apache.org/r/25439/

 Fix protobuf detection on systems with Python 3 as default
 --

 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
  Labels: build

 When configuring without bundled dependencies, usage of the *python* symbolic 
 link in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* 
 module to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1775) Libprocess wants source for unbundled gmock

2014-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Domański updated MESOS-1775:
--
Affects Version/s: 0.20.0

 Libprocess wants source for unbundled gmock
 ---

 Key: MESOS-1775
 URL: https://issues.apache.org/jira/browse/MESOS-1775
 Project: Mesos
  Issue Type: Bug
  Components: build, libprocess
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
Priority: Minor
  Labels: build

 *gmock* is installed on my system. Yet with *--disable-bundled* the 
 libprocess configuration script is still searching for *gmock-all.cc* instead 
 of just the headers and libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Domański updated MESOS-1774:
--
Shepherd: Timothy St. Clair  (was: Kamil Domański)

 Fix protobuf detection on systems with Python 3 as default
 --

 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
  Labels: build

 When configuring without bundled dependencies, usage of the *python* symbolic 
 link in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* 
 module to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-1764) Minor Build Fixes from 0.20 release

2014-09-08 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121966#comment-14121966
 ] 

Timothy St. Clair edited comment on MESOS-1764 at 9/8/14 4:49 PM:
--

package config file
-reviews.apache.org/r/25355/-


was (Author: tstclair):
package config file
https://reviews.apache.org/r/25355/

 Minor Build Fixes from 0.20 release
 ---

 Key: MESOS-1764
 URL: https://issues.apache.org/jira/browse/MESOS-1764
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
Reporter: Timothy St. Clair
Assignee: Timothy St. Clair

 This ticket is a catch-all for minor issues caught during a rebase and 
 testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1772) ./include/process/future.hpp(274): error: no instance of overloaded function process::Future<T>::onReady

2014-09-08 Thread Dominic Hamon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125749#comment-14125749
 ] 

Dominic Hamon commented on MESOS-1772:
--

I know very little about the Intel C compiler, but my gut reaction is that we 
shouldn't support Yet Another Toolchain. In that case, we should check for this 
at configure time.



 ./include/process/future.hpp(274): error: no instance of overloaded function 
 process::Future<T>::onReady
 -

 Key: MESOS-1772
 URL: https://issues.apache.org/jira/browse/MESOS-1772
 Project: Mesos
  Issue Type: Bug
  Components: build
Reporter: Vinson Lee
Priority: Blocker

 build error with Intel C Compiler
 {noformat}
 libtool: compile:  /opt/intel/bin/icpc -DPACKAGE_NAME=\"libprocess\" 
 -DPACKAGE_TARNAME=\"libprocess\" -DPACKAGE_VERSION=\"0.0.1\" 
 -DPACKAGE_STRING=\"libprocess 0.0.1\" -DPACKAGE_BUGREPORT=\"\" 
 -DPACKAGE_URL=\"\" -DPACKAGE=\"libprocess\" -DVERSION=\"0.0.1\" 
 -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 
 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 
 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" 
 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -I. -I./include -I./3rdparty/stout/include 
 -I3rdparty/boost-1.53.0 -I3rdparty/libev-4.15 -I3rdparty/picojson-4f93734 
 -I3rdparty/glog-0.3.3/src -I3rdparty/ry-http-parser-1c3624a -g -g2 -O2 
 -std=c++11 -MT libprocess_la-http.lo -MD -MP -MF .deps/libprocess_la-http.Tpo 
 -c src/http.cpp  -fPIC -DPIC -o libprocess_la-http.o
 ./include/process/future.hpp(274): error: no instance of overloaded function 
 process::Future<T>::onReady [with T=std::string] matches the argument list
 argument types are: (std::_Bind<std::_Mem_fn<bool 
 (process::Future<std::string>::*)(const std::string &)> 
 (process::Future<std::string>, std::_Placeholder<1>)>, 
 process::Future<std::string>::Prefer)
   return onReady(std::forward<F>(f), Prefer());
  ^
   detected during:
 instantiation of "const process::Future<T> 
 &process::Future<T>::onReady(F &&) const [with T=std::string, 
 F=std::_Bind<std::_Mem_fn<bool (process::Future<std::string>::*)(const 
 std::string &)> (process::Future<std::string>, std::_Placeholder<1>)>]" at 
 line 777
 instantiation of "bool process::Promise<T>::associate(const 
 process::Future<T> &) [with T=std::string]" at line 1435
 instantiation of "void process::internal::thenf(const 
 std::shared_ptr<process::Promise<X>> &, const 
 std::function<process::Future<X> (const T &)> &, const process::Future<T> &) 
 [with T=Nothing, X=std::basic_string<char, std::char_traits<char>, 
 std::allocator<char>>]" at line 1508
 instantiation of "process::Future<X> 
 process::Future<T>::then(const std::function<process::Future<X> (const T &)> 
 &) const [with T=Nothing, X=std::basic_string<char, std::char_traits<char>, 
 std::allocator<char>>]" at line 355
 instantiation of "process::Future<X> process::Future<T>::then(F 
 &&, process::Future<T>::Prefer) const [with T=Nothing, 
 F=std::_Bind<process::Future<std::string> (*(int))(int)>, 
 X=std::basic_string<char, std::char_traits<char>, std::allocator<char>>]" at 
 line 369
 instantiation of "auto process::Future<T>::then(F &&) 
 const->decltype((<expression>)) [with T=Nothing, 
 F=std::_Bind<process::Future<std::string> (*(int))(int)>]" at line 160 of 
 src/http.cpp
 {noformat}
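
For context, here is a minimal stand-alone C++11 reduction of the pattern the 
compiler is rejecting: the result of std::bind over a member function is handed 
to a template overload set. The types and names below are illustrative, not 
libprocess's actual code; GCC and Clang accept this form, and the report 
suggests ICC's overload resolution does not.

{code}
#include <functional>
#include <iostream>
#include <string>

struct Future {
  // Stand-in for process::Future<T>::onReady: accepts any callable that
  // can be invoked with a const std::string&.
  template <typename F>
  void onReady(F&& f) { std::forward<F>(f)("ready"); }
};

struct Listener {
  bool notify(const std::string& s) {
    std::cout << "got: " << s << std::endl;
    return true;
  }
};

int main() {
  Future future;
  Listener listener;
  // The std::_Bind type in the error above comes from a call shaped
  // like this one.
  future.onReady(
      std::bind(&Listener::notify, &listener, std::placeholders::_1));
  return 0;
}
{code}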



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1765) Use PID namespace to avoid freezing cgroup

2014-09-08 Thread Cong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125756#comment-14125756
 ] 

Cong Wang commented on MESOS-1765:
--

[~yasumoto] Sure, here is the patch I sent to the Linux kernel: 
https://lkml.org/lkml/2014/9/4/646, which contains the description of the bug.

 Use PID namespace to avoid freezing cgroup
 --

 Key: MESOS-1765
 URL: https://issues.apache.org/jira/browse/MESOS-1765
 Project: Mesos
  Issue Type: Story
  Components: containerization
Reporter: Cong Wang

 There are some known kernel issues when we freeze the whole cgroup upon OOM. 
 Mesos could probably just use a PID namespace so that we would only need to 
 kill the init of the PID namespace, instead of freezing all the processes and 
 killing them one by one. But I am not quite sure whether this would break the 
 existing code.
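
To make the idea concrete, here is a rough stand-alone sketch (illustrative 
only, not Mesos code) of launching a task tree in its own PID namespace so that 
a single SIGKILL to the namespace's init reaps everything; it assumes Linux 
with CAP_SYS_ADMIN (e.g. run as root):

{code}
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

static char stack[1024 * 1024];

static int childMain(void*) {
  // This process is PID 1 inside the new namespace; every task process
  // descends from it.
  execlp("sh", "sh", "-c", "sleep 1000 & sleep 1000 & wait", (char*)NULL);
  _exit(1);
}

int main() {
  // Stacks grow down on common architectures, so pass the top of the buffer.
  pid_t pid = clone(childMain, stack + sizeof(stack),
                    CLONE_NEWPID | SIGCHLD, NULL);
  if (pid == -1) { perror("clone"); return 1; }

  // Killing the namespace's init terminates every process inside it --
  // no freezing and per-process killing required.
  kill(pid, SIGKILL);
  waitpid(pid, NULL, 0);
  return 0;
}
{code}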



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread Timothy St. Clair (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy St. Clair updated MESOS-1774:
-
Assignee: Timothy St. Clair

 Fix protobuf detection on systems with Python 3 as default
 --

 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
Assignee: Timothy St. Clair
  Labels: build

 When configuring without bundled dependencies, usage of the *python* symbolic 
 link in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* 
 module to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread Timothy St. Clair (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy St. Clair resolved MESOS-1774.
--
Resolution: Fixed

commit 18d3957f2742aa83e9a73a4c6ee09cb5419487f3
Author: Kamil Domański alabat...@gmail.com
Date:   Mon Sep 8 12:10:27 2014 -0500



 Fix protobuf detection on systems with Python 3 as default
 --

 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
Assignee: Timothy St. Clair
  Labels: build

 When configuring without bundled dependencies, usage of the *python* symbolic 
 link in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* 
 module to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1764) Minor Build Fixes from 0.20 release

2014-09-08 Thread Timothy St. Clair (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy St. Clair updated MESOS-1764:
-
Shepherd: Vinod Kone

 Minor Build Fixes from 0.20 release
 ---

 Key: MESOS-1764
 URL: https://issues.apache.org/jira/browse/MESOS-1764
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
Reporter: Timothy St. Clair
Assignee: Timothy St. Clair

 This ticket is a catch-all for minor issues caught during a rebase and 
 testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1771) introduce unique_ptr

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1771:
-
Description: 
* add unique_ptr to the configure check
* document use of unique_ptr in style guide
** use when possible, use std::move when necessary
* deprecate Owned in favour of unique_ptr
* Move raw pointers with ownership over to unique_ptr

  was:
* add unique_ptr to the configure check
* deprecate Owned in favour of unique_ptr
* Move raw pointers with ownership over to unique_ptr


 introduce unique_ptr
 

 Key: MESOS-1771
 URL: https://issues.apache.org/jira/browse/MESOS-1771
 Project: Mesos
  Issue Type: Improvement
Reporter: Dominic Hamon
Assignee: Dominic Hamon

 * add unique_ptr to the configure check
 * document use of unique_ptr in style guide
 ** use when possible, use std::move when necessary
 * deprecate Owned in favour of unique_ptr
 * Move raw pointers with ownership over to unique_ptr
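
As a rough sketch of what the migration would look like in practice (made-up 
types, not actual Mesos code): a raw owning pointer or Owned<T> becomes 
std::unique_ptr<T>, and every ownership transfer becomes an explicit std::move, 
visible at the call site.

{code}
#include <memory>
#include <utility>

struct Isolator { virtual ~Isolator() {} };
struct CgroupsIsolator : Isolator {};

// Takes ownership: callers must std::move their pointer in.
static void install(std::unique_ptr<Isolator> isolator) {
  // ... a registry would keep 'isolator' alive here ...
}

int main() {
  std::unique_ptr<Isolator> isolator(new CgroupsIsolator());
  install(std::move(isolator));  // 'isolator' is now empty (nullptr).
  return 0;
}
{code}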



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1771) introduce unique_ptr

2014-09-08 Thread Dominic Hamon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125783#comment-14125783
 ] 

Dominic Hamon commented on MESOS-1771:
--

Adding check to configure: https://reviews.apache.org/r/25448/

 introduce unique_ptr
 

 Key: MESOS-1771
 URL: https://issues.apache.org/jira/browse/MESOS-1771
 Project: Mesos
  Issue Type: Improvement
Reporter: Dominic Hamon
Assignee: Dominic Hamon

 * add unique_ptr to the configure check
 * document use of unique_ptr in style guide
 ** use when possible, use std::move when necessary
 * deprecate Owned in favour of unique_ptr
 * Move raw pointers with ownership over to unique_ptr



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1715) The slave does not send pending tasks / executors during re-registration.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1715:
-
Sprint: Q3 Sprint 4, Q3 Sprint 5  (was: Q3 Sprint 4)

 The slave does not send pending tasks / executors during re-registration.
 -

 Key: MESOS-1715
 URL: https://issues.apache.org/jira/browse/MESOS-1715
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

 In what looks like an oversight, the pending tasks and executors in the slave 
 (Framework::pending) are not sent in the re-registration message.
 For tasks, this can lead to spurious TASK_LOST notifications being generated 
 by the master when it falsely thinks the tasks are not present on the slave.
 For executors, this can lead to under-accounting in the master, causing an 
 overcommit on the slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1586) Isolate system directories, e.g., per-container /tmp

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1586:
-
Sprint: Q3 Sprint 1, Q3 Sprint 2, Q3 Sprint 3, Q3 Sprint 4, Q3 Sprint 5  
(was: Q3 Sprint 1, Q3 Sprint 2, Q3 Sprint 3, Q3 Sprint 4)

 Isolate system directories, e.g., per-container /tmp
 

 Key: MESOS-1586
 URL: https://issues.apache.org/jira/browse/MESOS-1586
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.20.0
Reporter: Ian Downes
Assignee: Ian Downes

 Ideally, tasks should not write outside their sandbox (executor work 
 directory) but pragmatically they may need to write to /tmp, /var/tmp, or 
 some other directory.
 1) We should include any such files in disk usage and quota.
 2) We should make these shared directories private, i.e., each container 
 has its own.
 3) We should make the lifetime of any such files the same as the executor 
 work directory.
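
A hedged sketch of what point 2) could look like on Linux (illustrative, not 
Mesos's isolator code): in a fresh mount namespace, bind a per-container 
directory over /tmp so writes land in the sandbox and share its lifetime. The 
work-directory path below is made up for the example.

{code}
#include <sched.h>
#include <sys/mount.h>
#include <cstdio>

int main() {
  // New mount namespace: the bind mount below stays invisible to the host.
  if (unshare(CLONE_NEWNS) != 0) { perror("unshare"); return 1; }

  // Stop mount events from propagating back to the host namespace.
  if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) {
    perror("mount MS_PRIVATE"); return 1;
  }

  // Hypothetical 'tmp' directory inside the executor work directory.
  const char* sandboxTmp = "/var/lib/mesos/work_dir/tmp";
  if (mount(sandboxTmp, "/tmp", NULL, MS_BIND, NULL) != 0) {
    perror("mount MS_BIND"); return 1;
  }

  // exec() the task here; its /tmp usage now falls under sandbox disk
  // accounting (point 1) and its files die with the sandbox (point 3).
  return 0;
}
{code}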



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1466) Race between executor exited event and launch task can cause overcommit of resources

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1466:
-
Sprint: Q3 Sprint 3, Q3 Sprint 4, Q3 Sprint 5  (was: Q3 Sprint 3, Q3 Sprint 
4)

 Race between executor exited event and launch task can cause overcommit of 
 resources
 

 Key: MESOS-1466
 URL: https://issues.apache.org/jira/browse/MESOS-1466
 Project: Mesos
  Issue Type: Bug
  Components: allocation, master
Reporter: Vinod Kone
Assignee: Benjamin Mahler
  Labels: reliability

 The following sequence of events can cause an overcommit:
 -- Launch task is called for a task whose executor is already running
 -- Executor's resources are not accounted for on the master
 -- Executor exits and the event is enqueued behind launch tasks on the master
 -- Master sends the task to the slave, which needs to commit resources 
 for the task and the (new) executor.
 -- Master processes the executor exited event and re-offers the executor's 
 resources, causing an overcommit of resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1721) Prevent overcommit of the slave for ports and ephemeral ports.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1721:
-
Sprint: Q3 Sprint 4, Q3 Sprint 5  (was: Q3 Sprint 4)

 Prevent overcommit of the slave for ports and ephemeral ports.
 --

 Key: MESOS-1721
 URL: https://issues.apache.org/jira/browse/MESOS-1721
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

 It's possible for the slave to be overcommitted (e.g. MESOS-1668). In the 
 case of named resources like ports and ephemeral_ports, this is problematic 
 as the resources needed by the tasks are in use.
 This ticket is to present the idea of rejecting tasks when the slave is 
 overcommitted on ports or ephemeral_ports. In order to ensure the master 
 reconciles state with the slave, we can also trigger a re-registration.
 For cpu / memory, this is less crucial, so preventing overcommit for these 
 will be punted for later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1592) Design inverse resource offer support

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1592:
-
Sprint: Q3 Sprint 3, Q3 Sprint 4, Q3 Sprint 5  (was: Q3 Sprint 3, Q3 Sprint 
4)

 Design inverse resource offer support
 -

 Key: MESOS-1592
 URL: https://issues.apache.org/jira/browse/MESOS-1592
 Project: Mesos
  Issue Type: Task
  Components: allocation
Reporter: Benjamin Mahler
Assignee: Alexandra Sava

 An inverse resource offer means that Mesos is requesting resources back 
 from the framework, possibly within some time interval.
 This can be leveraged initially to provide more automated cluster 
 maintenance, by offering schedulers the opportunity to move tasks to 
 compensate for planned maintenance. Operators can set a time limit on how 
 long to wait for schedulers to relocate tasks before the tasks are forcibly 
 terminated.
 Inverse resource offers have many other potential uses, as they open the 
 opportunity for the allocator to attempt to move tasks in the cluster through 
 the co-operation of the framework, possibly providing better 
 over-subscription, fairness, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1728) Libprocess: report bind parameters on failure

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1728:
-
Sprint: Q3 Sprint 4, Q3 Sprint 5  (was: Q3 Sprint 4)

 Libprocess: report bind parameters on failure
 -

 Key: MESOS-1728
 URL: https://issues.apache.org/jira/browse/MESOS-1728
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Nikita Vetoshkin
Assignee: Nikita Vetoshkin
Priority: Trivial

 When you attempt to start a slave or master and there's another one already 
 running there, it is nice to report the actual parameters to the 
 {{bind}} call that failed.
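
For illustration (not the actual libprocess patch), the improvement amounts to 
folding the address and port into the error message when {{bind}} fails:

{code}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main() {
  int fd = socket(AF_INET, SOCK_STREAM, 0);

  sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = inet_addr("127.0.0.1");
  addr.sin_port = htons(5051);  // Hypothetical port for the example.

  if (bind(fd, (sockaddr*)&addr, sizeof(addr)) != 0) {
    // The useful part: say *what* we tried to bind, not just that it failed.
    fprintf(stderr, "Failed to bind on %s:%d: %s\n",
            inet_ntoa(addr.sin_addr), ntohs(addr.sin_port), strerror(errno));
    return 1;
  }
  return 0;
}
{code}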



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1466) Race between executor exited event and launch task can cause overcommit of resources

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1466:
-
Sprint: Q3 Sprint 3, Q3 Sprint 4  (was: Q3 Sprint 3, Q3 Sprint 4, Q3 Sprint 
5)

 Race between executor exited event and launch task can cause overcommit of 
 resources
 

 Key: MESOS-1466
 URL: https://issues.apache.org/jira/browse/MESOS-1466
 Project: Mesos
  Issue Type: Bug
  Components: allocation, master
Reporter: Vinod Kone
Assignee: Benjamin Mahler
  Labels: reliability

 The following sequence of events can cause an overcommit:
 -- Launch task is called for a task whose executor is already running
 -- Executor's resources are not accounted for on the master
 -- Executor exits and the event is enqueued behind launch tasks on the master
 -- Master sends the task to the slave, which needs to commit resources 
 for the task and the (new) executor.
 -- Master processes the executor exited event and re-offers the executor's 
 resources, causing an overcommit of resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1425) LogZooKeeperTest.WriteRead test is flaky

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1425:
-
Sprint: Q3 Sprint 1, Q3 Sprint 2, Q3 Sprint 4  (was: Q3 Sprint 1, Q3 Sprint 
2, Q3 Sprint 4, Q3 Sprint 5)

 LogZooKeeperTest.WriteRead test is flaky
 

 Key: MESOS-1425
 URL: https://issues.apache.org/jira/browse/MESOS-1425
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.19.0
Reporter: Vinod Kone
Assignee: Jie Yu

 {code}
 [ RUN  ] LogZooKeeperTest.WriteRead
 I0527 23:23:48.286031  1352 zookeeper_test_server.cpp:158] Started 
 ZooKeeperTestServer on port 39446
 I0527 23:23:48.293916  1352 log_tests.cpp:1945] Using temporary directory 
 '/tmp/LogZooKeeperTest_WriteRead_Vyty8g'
 I0527 23:23:48.296430  1352 leveldb.cpp:176] Opened db in 2.459713ms
 I0527 23:23:48.296740  1352 leveldb.cpp:183] Compacted db in 286843ns
 I0527 23:23:48.296761  1352 leveldb.cpp:198] Created db iterator in 3083ns
 I0527 23:23:48.296772  1352 leveldb.cpp:204] Seeked to beginning of db in 
 4541ns
 I0527 23:23:48.296777  1352 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 87ns
 I0527 23:23:48.296788  1352 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0527 23:23:48.297499  1383 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 505340ns
 I0527 23:23:48.297513  1383 replica.cpp:320] Persisted replica status to 
 VOTING
 I0527 23:23:48.299492  1352 leveldb.cpp:176] Opened db in 1.73582ms
 I0527 23:23:48.299773  1352 leveldb.cpp:183] Compacted db in 263937ns
 I0527 23:23:48.299793  1352 leveldb.cpp:198] Created db iterator in 7494ns
 I0527 23:23:48.299806  1352 leveldb.cpp:204] Seeked to beginning of db in 
 235ns
 I0527 23:23:48.299813  1352 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 93ns
 I0527 23:23:48.299821  1352 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0527 23:23:48.300503  1380 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 492309ns
 I0527 23:23:48.300516  1380 replica.cpp:320] Persisted replica status to 
 VOTING
 I0527 23:23:48.302500  1352 leveldb.cpp:176] Opened db in 1.793829ms
 I0527 23:23:48.303642  1352 leveldb.cpp:183] Compacted db in 1.123929ms
 I0527 23:23:48.303669  1352 leveldb.cpp:198] Created db iterator in 5865ns
 I0527 23:23:48.303689  1352 leveldb.cpp:204] Seeked to beginning of db in 
 8811ns
 I0527 23:23:48.303705  1352 leveldb.cpp:273] Iterated through 1 keys in the 
 db in 9545ns
 I0527 23:23:48.303715  1352 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@712: Client 
 environment:zookeeper.version=zookeeper C client 3.4.5
 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@716: Client 
 environment:host.name=minerva
 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@723: Client 
 environment:os.name=Linux
 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@724: Client 
 environment:os.arch=3.2.0-57-generic
 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@725: Client 
 environment:os.version=#87-Ubuntu SMP Tue Nov 12 21:35:10 UTC 2013
 2014-05-27 23:23:48,303:1352(0x2b1173e2b700):ZOO_INFO@log_env@712: Client 
 environment:zookeeper.version=zookeeper C client 3.4.5
 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@716: Client 
 environment:host.name=minerva
 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@723: Client 
 environment:os.name=Linux
 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@724: Client 
 environment:os.arch=3.2.0-57-generic
 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@725: Client 
 environment:os.version=#87-Ubuntu SMP Tue Nov 12 21:35:10 UTC 2013
 2014-05-27 23:23:48,304:1352(0x2b1173a29700):ZOO_INFO@log_env@733: Client 
 environment:user.name=(null)
 I0527 23:23:48.303988  1380 log.cpp:238] Attempting to join replica to 
 ZooKeeper group
 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@733: Client 
 environment:user.name=(null)
 2014-05-27 23:23:48,304:1352(0x2b1173a29700):ZOO_INFO@log_env@741: Client 
 environment:user.home=/home/jenkins
 I0527 23:23:48.304198  1385 recover.cpp:425] Starting replica recovery
 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@741: Client 
 environment:user.home=/home/jenkins
 2014-05-27 23:23:48,304:1352(0x2b1173a29700):ZOO_INFO@log_env@753: Client 
 environment:user.dir=/tmp/LogZooKeeperTest_WriteRead_Vyty8g
 2014-05-27 23:23:48,304:1352(0x2b1173a29700):ZOO_INFO@zookeeper_init@786: 
 Initiating client connection, host=127.0.0.1:39446 sessionTimeout=5000 
 watcher=0x2b11708e98d0 sessionId=0 sessionPasswd=null 
 

[jira] [Updated] (MESOS-1752) Allow variadic templates

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1752:
-
Sprint: Q3 Sprint 4, Q3 Sprint 5  (was: Q3 Sprint 4)

 Allow variadic templates
 

 Key: MESOS-1752
 URL: https://issues.apache.org/jira/browse/MESOS-1752
 Project: Mesos
  Issue Type: Improvement
Reporter: Dominic Hamon
Assignee: Dominic Hamon
Priority: Minor
  Labels: c++11

 Add variadic templates to the C++11 configure check. Once there, we can start 
 using them in the code-base.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1758) Freezer failure leads to lost task during container destruction.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1758:
-
Sprint: Q3 Sprint 5

 Freezer failure leads to lost task during container destruction.
 

 Key: MESOS-1758
 URL: https://issues.apache.org/jira/browse/MESOS-1758
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Benjamin Mahler

 In the past we've seen numerous issues around the freezer. Lately, on the 
 2.6.44 kernel, we've seen issues where we're unable to freeze the cgroup:
 (1) An oom occurs.
 (2) No indication of oom in the kernel logs.
 (3) The slave is unable to freeze the cgroup.
 (4) The task is marked as lost.
 {noformat}
 I0903 16:46:24.956040 25469 mem.cpp:575] Memory limit exceeded: Requested: 
 15488MB Maximum Used: 15488MB
 MEMORY STATISTICS:
 cache 7958691840
 rss 8281653248
 mapped_file 9474048
 pgpgin 4487861
 pgpgout 522933
 pgfault 2533780
 pgmajfault 11
 inactive_anon 0
 active_anon 8281653248
 inactive_file 7631708160
 active_file 326852608
 unevictable 0
 hierarchical_memory_limit 16240345088
 total_cache 7958691840
 total_rss 8281653248
 total_mapped_file 9474048
 total_pgpgin 4487861
 total_pgpgout 522933
 total_pgfault 2533780
 total_pgmajfault 11
 total_inactive_anon 0
 total_active_anon 8281653248
 total_inactive_file 7631728640
 total_active_file 326852608
 total_unevictable 0
 I0903 16:46:24.956848 25469 containerizer.cpp:1041] Container 
 bbb9732a-d600-4c1b-b326-846338c608c3 has reached its limit for resource 
 mem(*):1.62403e+10 and will be terminated
 I0903 16:46:24.957427 25469 containerizer.cpp:909] Destroying container 
 'bbb9732a-d600-4c1b-b326-846338c608c3'
 I0903 16:46:24.958664 25481 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:34.959529 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:34.962070 25482 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.710848ms
 I0903 16:46:34.962658 25479 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:44.963349 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:44.965631 25472 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.588224ms
 I0903 16:46:44.966356 25472 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:54.967254 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:56.008447 25475 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 2.15296ms
 I0903 16:46:56.009071 25466 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:06.010329 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:06.012538 25467 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.643008ms
 I0903 16:47:06.013216 25467 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:12.516348 25480 slave.cpp:3030] Current usage 9.57%. Max allowed 
 age: 5.630238827780799days
 I0903 16:47:16.015192 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:16.017043 25486 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.511168ms
 I0903 16:47:16.017555 25480 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:19.862746 25483 http.cpp:245] HTTP request for 
 '/slave(1)/stats.json'
 E0903 16:47:24.960055 25472 slave.cpp:2557] Termination of executor 'E' of 
 framework '201104070004-002563-' failed: Failed to destroy container: 
 discarded future
 I0903 16:47:24.962054 25472 slave.cpp:2087] Handling status update TASK_LOST 
 (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 
 201104070004-002563- from @0.0.0.0:0
 I0903 16:47:24.963470 25469 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' 
 to 128MB for container bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:24.963541 25471 cpushare.cpp:338] Updated 'cpu.shares' to 256 
 (cpus 0.25) for container bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:24.964756 25471 

[jira] [Updated] (MESOS-1410) Keep terminal unacknowledged tasks in the master's state.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1410:
-
Assignee: Benjamin Mahler

 Keep terminal unacknowledged tasks in the master's state.
 -

 Key: MESOS-1410
 URL: https://issues.apache.org/jira/browse/MESOS-1410
 Project: Mesos
  Issue Type: Task
Affects Versions: 0.19.0
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
 Fix For: 0.21.0


 Once we are sending acknowledgments through the master as per MESOS-1409, we 
 need to keep terminal tasks that are *unacknowledged* in the Master's memory.
 This will allow us to identify these tasks to frameworks when we haven't yet 
 forwarded them an update. Without this, we're susceptible to MESOS-1389.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1410) Keep terminal unacknowledged tasks in the master's state.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1410:
-
Shepherd: Vinod Kone

 Keep terminal unacknowledged tasks in the master's state.
 -

 Key: MESOS-1410
 URL: https://issues.apache.org/jira/browse/MESOS-1410
 Project: Mesos
  Issue Type: Task
Affects Versions: 0.19.0
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
 Fix For: 0.21.0


 Once we are sending acknowledgments through the master as per MESOS-1409, we 
 need to keep terminal tasks that are *unacknowledged* in the Master's memory.
 This will allow us to identify these tasks to frameworks when we haven't yet 
 forwarded them an update. Without this, we're susceptible to MESOS-1389.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1410) Keep terminal unacknowledged tasks in the master's state.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1410:
-
Sprint: Q3 Sprint 5

 Keep terminal unacknowledged tasks in the master's state.
 -

 Key: MESOS-1410
 URL: https://issues.apache.org/jira/browse/MESOS-1410
 Project: Mesos
  Issue Type: Task
Affects Versions: 0.19.0
Reporter: Benjamin Mahler
 Fix For: 0.21.0


 Once we are sending acknowledgments through the master as per MESOS-1409, we 
 need to keep terminal tasks that are *unacknowledged* in the Master's memory.
 This will allow us to identify these tasks to frameworks when we haven't yet 
 forwarded them an update. Without this, we're susceptible to MESOS-1389.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1476) Provide endpoints for deactivating / activating slaves.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1476:
-
Sprint: Q3 Sprint 5

 Provide endpoints for deactivating / activating slaves.
 ---

 Key: MESOS-1476
 URL: https://issues.apache.org/jira/browse/MESOS-1476
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
Assignee: Alexandra Sava
  Labels: gsoc2014

 When performing maintenance operations on slaves, it is important to allow 
 these slaves to be drained of their tasks.
 The first essential primitive of draining slaves is to prevent them from 
 running more tasks. This can be achieved by deactivating them: stop sending 
 their resource offers to frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1739) Add Dynamic Slave Attributes

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1739:
-
Sprint: Q3 Sprint 5

 Add Dynamic Slave Attributes
 

 Key: MESOS-1739
 URL: https://issues.apache.org/jira/browse/MESOS-1739
 Project: Mesos
  Issue Type: Improvement
Reporter: Patrick Reilly
Assignee: Patrick Reilly

 Make it so that either via a slave restart or an out-of-process reconfigure 
 ping, the attributes and resources of a slave can be updated to be a superset 
 of what they used to be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1765) Use PID namespace to avoid freezing cgroup

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon reassigned MESOS-1765:


Assignee: Vinod Kone

 Use PID namespace to avoid freezing cgroup
 --

 Key: MESOS-1765
 URL: https://issues.apache.org/jira/browse/MESOS-1765
 Project: Mesos
  Issue Type: Story
  Components: containerization
Reporter: Cong Wang
Assignee: Vinod Kone

 There are some known kernel issues when we freeze the whole cgroup upon OOM. 
 Mesos could probably just use a PID namespace so that we would only need to 
 kill the init of the PID namespace, instead of freezing all the processes and 
 killing them one by one. But I am not quite sure whether this would break the 
 existing code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1758) Freezer failure leads to lost task during container destruction.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon reassigned MESOS-1758:


Assignee: Vinod Kone

 Freezer failure leads to lost task during container destruction.
 

 Key: MESOS-1758
 URL: https://issues.apache.org/jira/browse/MESOS-1758
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Benjamin Mahler
Assignee: Vinod Kone

 In the past we've seen numerous issues around the freezer. Lately, on the 
 2.6.44 kernel, we've seen issues where we're unable to freeze the cgroup:
 (1) An oom occurs.
 (2) No indication of oom in the kernel logs.
 (3) The slave is unable to freeze the cgroup.
 (4) The task is marked as lost.
 {noformat}
 I0903 16:46:24.956040 25469 mem.cpp:575] Memory limit exceeded: Requested: 
 15488MB Maximum Used: 15488MB
 MEMORY STATISTICS:
 cache 7958691840
 rss 8281653248
 mapped_file 9474048
 pgpgin 4487861
 pgpgout 522933
 pgfault 2533780
 pgmajfault 11
 inactive_anon 0
 active_anon 8281653248
 inactive_file 7631708160
 active_file 326852608
 unevictable 0
 hierarchical_memory_limit 16240345088
 total_cache 7958691840
 total_rss 8281653248
 total_mapped_file 9474048
 total_pgpgin 4487861
 total_pgpgout 522933
 total_pgfault 2533780
 total_pgmajfault 11
 total_inactive_anon 0
 total_active_anon 8281653248
 total_inactive_file 7631728640
 total_active_file 326852608
 total_unevictable 0
 I0903 16:46:24.956848 25469 containerizer.cpp:1041] Container 
 bbb9732a-d600-4c1b-b326-846338c608c3 has reached its limit for resource 
 mem(*):1.62403e+10 and will be terminated
 I0903 16:46:24.957427 25469 containerizer.cpp:909] Destroying container 
 'bbb9732a-d600-4c1b-b326-846338c608c3'
 I0903 16:46:24.958664 25481 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:34.959529 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:34.962070 25482 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.710848ms
 I0903 16:46:34.962658 25479 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:44.963349 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:44.965631 25472 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.588224ms
 I0903 16:46:44.966356 25472 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:54.967254 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:56.008447 25475 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 2.15296ms
 I0903 16:46:56.009071 25466 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:06.010329 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:06.012538 25467 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.643008ms
 I0903 16:47:06.013216 25467 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:12.516348 25480 slave.cpp:3030] Current usage 9.57%. Max allowed 
 age: 5.630238827780799days
 I0903 16:47:16.015192 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:16.017043 25486 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.511168ms
 I0903 16:47:16.017555 25480 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:19.862746 25483 http.cpp:245] HTTP request for 
 '/slave(1)/stats.json'
 E0903 16:47:24.960055 25472 slave.cpp:2557] Termination of executor 'E' of 
 framework '201104070004-002563-' failed: Failed to destroy container: 
 discarded future
 I0903 16:47:24.962054 25472 slave.cpp:2087] Handling status update TASK_LOST 
 (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 
 201104070004-002563- from @0.0.0.0:0
 I0903 16:47:24.963470 25469 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' 
 to 128MB for container bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:24.963541 25471 cpushare.cpp:338] Updated 'cpu.shares' to 256 
 (cpus 0.25) for container 

[jira] [Updated] (MESOS-1721) Prevent overcommit of the slave for ports and ephemeral ports.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1721:
-
Sprint: Q3 Sprint 4  (was: Q3 Sprint 4, Q3 Sprint 5)

 Prevent overcommit of the slave for ports and ephemeral ports.
 --

 Key: MESOS-1721
 URL: https://issues.apache.org/jira/browse/MESOS-1721
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

 It's possible for the slave to be overcommitted (e.g. MESOS-1668). In the 
 case of named resources like ports and ephemeral_ports, this is problematic 
 as the resources needed by the tasks are in use.
 This ticket is to present the idea of rejecting tasks when the slave is 
 overcommitted on ports or ephemeral_ports. In order to ensure the master 
 reconciles state with the slave, we can also trigger a re-registration.
 For cpu / memory, this is less crucial, so preventing overcommit for these 
 will be punted for later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1717) The slave does not show pending tasks in the JSON endpoints.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1717:
-
Story Points: 1

 The slave does not show pending tasks in the JSON endpoints.
 

 Key: MESOS-1717
 URL: https://issues.apache.org/jira/browse/MESOS-1717
 Project: Mesos
  Issue Type: Bug
  Components: json api, slave
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

 The slave does not show pending tasks in the /state.json endpoint.
 This is a bit tricky to add since we rely on knowing the executor directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1715) The slave does not send pending tasks / executors during re-registration.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1715:
-
Story Points: 3

 The slave does not send pending tasks / executors during re-registration.
 -

 Key: MESOS-1715
 URL: https://issues.apache.org/jira/browse/MESOS-1715
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

 In what looks like an oversight, the pending tasks and executors in the slave 
 (Framework::pending) are not sent in the re-registration message.
 For tasks, this can lead to spurious TASK_LOST notifications being generated 
 by the master when it falsely thinks the tasks are not present on the slave.
 For executors, this can lead to under-accounting in the master, causing an 
 overcommit on the slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1728) Libprocess: report bind parameters on failure

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1728:
-
Story Points: 1

 Libprocess: report bind parameters on failure
 -

 Key: MESOS-1728
 URL: https://issues.apache.org/jira/browse/MESOS-1728
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Nikita Vetoshkin
Assignee: Nikita Vetoshkin
Priority: Trivial

 When you attempt to start a slave or master and there's another one already 
 running there, it is nice to report the actual parameters to the 
 {{bind}} call that failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1392) Failure when znode is removed before we can read its contents.

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1392:
-
Story Points: 3

 Failure when znode is removed before we can read its contents.
 --

 Key: MESOS-1392
 URL: https://issues.apache.org/jira/browse/MESOS-1392
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.19.0
Reporter: Benjamin Mahler
Assignee: Yan Xu

 Looks like the following can occur when a znode goes away right before we can 
 read its contents:
 {noformat: title=Slave exit}
 I0520 16:33:45.721727 29155 group.cpp:382] Trying to create path 
 '/home/mesos/test/master' in ZooKeeper
 I0520 16:33:48.600837 29155 detector.cpp:134] Detected a new leader: 
 (id='2617')
 I0520 16:33:48.601428 29147 group.cpp:655] Trying to get 
 '/home/mesos/test/master/info_002617' in ZooKeeper
 Failed to detect a master: Failed to get data for ephemeral node 
 '/home/mesos/test/master/info_002617' in ZooKeeper: no node
 Slave Exit Status: 1
 {noformat}
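
A hedged sketch of one way to tolerate this race (not the actual fix that 
shipped), using the ZooKeeper C client's zoo_get(): treat ZNONODE on the read 
as "the leader already changed, detect again" rather than a fatal error.

{code}
#include <zookeeper/zookeeper.h>
#include <string>

// Returns 1 when data was read, 0 when the ephemeral node is already gone
// (the caller should simply re-run leader detection), -1 on a hard error.
int readLeaderInfo(zhandle_t* zh, const char* path, std::string* out) {
  char buffer[1024];
  int length = sizeof(buffer);

  int code = zoo_get(zh, path, 0, buffer, &length, NULL);
  if (code == ZNONODE) {
    return 0;   // Node removed between detection and read: benign race.
  }
  if (code != ZOK) {
    return -1;  // A genuine ZooKeeper failure.
  }

  out->assign(buffer, length > 0 ? length : 0);
  return 1;
}
{code}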



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1739) Add Dynamic Slave Attributes

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1739:
-
Story Points: 3

 Add Dynamic Slave Attributes
 

 Key: MESOS-1739
 URL: https://issues.apache.org/jira/browse/MESOS-1739
 Project: Mesos
  Issue Type: Improvement
Reporter: Patrick Reilly
Assignee: Patrick Reilly

 Make it so that either via a slave restart or an out-of-process reconfigure 
 ping, the attributes and resources of a slave can be updated to be a superset 
 of what they used to be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-703) master fails to respect updated FrameworkInfo when the framework scheduler restarts

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-703:

Sprint: Q3 Sprint 5

 master fails to respect updated FrameworkInfo when the framework scheduler 
 restarts
 ---

 Key: MESOS-703
 URL: https://issues.apache.org/jira/browse/MESOS-703
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.14.0
 Environment: ubuntu 13.04, mesos 0.14.0-rc3
Reporter: Jordan Curzon
Assignee: Vinod Kone

 When I first ran marathon it was running as a personal user and registered 
 with mesos-master as such due to putting an empty string in the user field. 
 When I restarted marathon as nobody, tasks were still being run as the 
 personal user which didn't exist on the slaves. I know marathon was trying to 
 send a FrameworkInfo with nobody listed as the user because I hard coded it 
 in. The tasks wouldn't run as nobody until I restarted the mesos-master. 
 Each time I restarted the marathon framework, it reregistered with 
 mesos-master and mesos-master wrote to the logs that it detected a failover 
 because the scheduler went away and then came back.
 I understand the scheduler failover, but shouldn't mesos-master respect an 
 updated FrameworkInfo when the scheduler re-registers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126053#comment-14126053
 ] 

Kamil Domański commented on MESOS-1774:
---

[~tstclair], I actually updated the review request, since the original patch 
changed the echoed message, but not the command.

 Fix protobuf detection on systems with Python 3 as default
 --

 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
Assignee: Timothy St. Clair
  Labels: build

 When configuring without bundled dependencies, usage of the *python* symbolic 
 link in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* 
 module to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Domański updated MESOS-1774:
--
Comment: was deleted

(was: [~tstclair], I actually updated the review request, since the original 
patch changed the echoed message, but not the command.)

 Fix protobuf detection on systems with Python 3 as default
 --

 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
Assignee: Timothy St. Clair
  Labels: build

 When configuring without bundled dependencies, usage of the *python* symbolic 
 link in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* 
 module to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Domański reopened MESOS-1774:
---

[~tstclair], I actually updated the review request, since the original patch 
changed the echoed message, but not the command.

 Fix protobuf detection on systems with Python 3 as default
 --

 Key: MESOS-1774
 URL: https://issues.apache.org/jira/browse/MESOS-1774
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
 Environment: Gentoo Linux
 ./configure --disable-bundled
Reporter: Kamil Domański
Assignee: Timothy St. Clair
  Labels: build

 When configuring without bundled dependencies, usage of the *python* symbolic 
 link in *m4/ac_python_module.m4* causes the detection of the *google.protobuf* 
 module to fail on systems with Python 3 set as default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1777) Design persistent resources

2014-09-08 Thread Jie Yu (JIRA)
Jie Yu created MESOS-1777:
-

 Summary: Design persistent resources
 Key: MESOS-1777
 URL: https://issues.apache.org/jira/browse/MESOS-1777
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1777) Design persistent resources

2014-09-08 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1777:
-
  Sprint: Q3 Sprint 5
Assignee: Jie Yu

 Design persistent resources
 ---

 Key: MESOS-1777
 URL: https://issues.apache.org/jira/browse/MESOS-1777
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1758) Freezer failure leads to lost task during container destruction.

2014-09-08 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126238#comment-14126238
 ] 

Vinod Kone commented on MESOS-1758:
---

Short-term fix: https://reviews.apache.org/r/25457/, until we get PID namespace 
support.

 Freezer failure leads to lost task during container destruction.
 

 Key: MESOS-1758
 URL: https://issues.apache.org/jira/browse/MESOS-1758
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Benjamin Mahler
Assignee: Vinod Kone

 In the past we've seen numerous issues around the freezer. Lately, on the 
 2.6.44 kernel, we've seen issues where we're unable to freeze the cgroup:
 (1) An oom occurs.
 (2) No indication of oom in the kernel logs.
 (3) The slave is unable to freeze the cgroup.
 (4) The task is marked as lost.
 {noformat}
 I0903 16:46:24.956040 25469 mem.cpp:575] Memory limit exceeded: Requested: 
 15488MB Maximum Used: 15488MB
 MEMORY STATISTICS:
 cache 7958691840
 rss 8281653248
 mapped_file 9474048
 pgpgin 4487861
 pgpgout 522933
 pgfault 2533780
 pgmajfault 11
 inactive_anon 0
 active_anon 8281653248
 inactive_file 7631708160
 active_file 326852608
 unevictable 0
 hierarchical_memory_limit 16240345088
 total_cache 7958691840
 total_rss 8281653248
 total_mapped_file 9474048
 total_pgpgin 4487861
 total_pgpgout 522933
 total_pgfault 2533780
 total_pgmajfault 11
 total_inactive_anon 0
 total_active_anon 8281653248
 total_inactive_file 7631728640
 total_active_file 326852608
 total_unevictable 0
 I0903 16:46:24.956848 25469 containerizer.cpp:1041] Container 
 bbb9732a-d600-4c1b-b326-846338c608c3 has reached its limit for resource 
 mem(*):1.62403e+10 and will be terminated
 I0903 16:46:24.957427 25469 containerizer.cpp:909] Destroying container 
 'bbb9732a-d600-4c1b-b326-846338c608c3'
 I0903 16:46:24.958664 25481 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:34.959529 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:34.962070 25482 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.710848ms
 I0903 16:46:34.962658 25479 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:44.963349 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:44.965631 25472 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.588224ms
 I0903 16:46:44.966356 25472 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:54.967254 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:56.008447 25475 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 2.15296ms
 I0903 16:46:56.009071 25466 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:06.010329 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:06.012538 25467 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.643008ms
 I0903 16:47:06.013216 25467 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:12.516348 25480 slave.cpp:3030] Current usage 9.57%. Max allowed 
 age: 5.630238827780799days
 I0903 16:47:16.015192 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:16.017043 25486 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.511168ms
 I0903 16:47:16.017555 25480 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:19.862746 25483 http.cpp:245] HTTP request for 
 '/slave(1)/stats.json'
 E0903 16:47:24.960055 25472 slave.cpp:2557] Termination of executor 'E' of 
 framework '201104070004-002563-' failed: Failed to destroy container: 
 discarded future
 I0903 16:47:24.962054 25472 slave.cpp:2087] Handling status update TASK_LOST 
 (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 
 201104070004-002563- from @0.0.0.0:0
 I0903 16:47:24.963470 25469 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' 
 to 128MB for container bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 

[jira] [Updated] (MESOS-1758) Freezer failure leads to lost task during container destruction.

2014-09-08 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1758:
--
Target Version/s: 0.20.1
   Fix Version/s: 0.21.0
Story Points: 2

commit 63ed9863f927beb2cf074aacc838fb601329
Author: Vinod Kone vinodk...@gmail.com
Date:   Mon Sep 8 15:40:54 2014 -0700

Added kill() to freezerTimedOut() in cgroups.cpp.
This is a short-term fix for MESOS-1758.

Review: https://reviews.apache.org/r/25457
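As a rough illustration of the idea behind the fix (not the actual cgroups.cpp 
code), a Python sketch that escalates to SIGKILL when a cgroup fails to reach 
the FROZEN state within a timeout; the path, timeout, and retry policy are 
assumptions:

{noformat}
import os
import signal
import time

def freeze_or_kill(cgroup, timeout=10.0):
    # Ask the freezer subsystem to freeze the cgroup.
    with open(os.path.join(cgroup, "freezer.state"), "w") as f:
        f.write("FROZEN")
    # Poll until the freeze settles or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        with open(os.path.join(cgroup, "freezer.state")) as f:
            if f.read().strip() == "FROZEN":
                return True
        time.sleep(0.1)
    # The freeze stalled (as in the logs above): SIGKILL the cgroup's
    # processes so that container destruction can make progress.
    with open(os.path.join(cgroup, "cgroup.procs")) as f:
        for pid in f.read().split():
            os.kill(int(pid), signal.SIGKILL)
    return False

# e.g. freeze_or_kill("/sys/fs/cgroup/freezer/mesos/<container id>")
{noformat}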


 Freezer failure leads to lost task during container destruction.
 

 Key: MESOS-1758
 URL: https://issues.apache.org/jira/browse/MESOS-1758
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Benjamin Mahler
Assignee: Vinod Kone
 Fix For: 0.21.0


 In the past we've seen numerous issues around the freezer. Lately, on the 
 2.6.44 kernel, we've seen issues where we're unable to freeze the cgroup:
 (1) An oom occurs.
 (2) No indication of oom in the kernel logs.
 (3) The slave is unable to freeze the cgroup.
 (4) The task is marked as lost.
 {noformat}
 I0903 16:46:24.956040 25469 mem.cpp:575] Memory limit exceeded: Requested: 
 15488MB Maximum Used: 15488MB
 MEMORY STATISTICS:
 cache 7958691840
 rss 8281653248
 mapped_file 9474048
 pgpgin 4487861
 pgpgout 522933
 pgfault 2533780
 pgmajfault 11
 inactive_anon 0
 active_anon 8281653248
 inactive_file 7631708160
 active_file 326852608
 unevictable 0
 hierarchical_memory_limit 16240345088
 total_cache 7958691840
 total_rss 8281653248
 total_mapped_file 9474048
 total_pgpgin 4487861
 total_pgpgout 522933
 total_pgfault 2533780
 total_pgmajfault 11
 total_inactive_anon 0
 total_active_anon 8281653248
 total_inactive_file 7631728640
 total_active_file 326852608
 total_unevictable 0
 I0903 16:46:24.956848 25469 containerizer.cpp:1041] Container 
 bbb9732a-d600-4c1b-b326-846338c608c3 has reached its limit for resource 
 mem(*):1.62403e+10 and will be terminated
 I0903 16:46:24.957427 25469 containerizer.cpp:909] Destroying container 
 'bbb9732a-d600-4c1b-b326-846338c608c3'
 I0903 16:46:24.958664 25481 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:34.959529 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:34.962070 25482 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.710848ms
 I0903 16:46:34.962658 25479 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:44.963349 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:44.965631 25472 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.588224ms
 I0903 16:46:44.966356 25472 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:54.967254 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:46:56.008447 25475 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 2.15296ms
 I0903 16:46:56.009071 25466 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:06.010329 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:06.012538 25467 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.643008ms
 I0903 16:47:06.013216 25467 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:12.516348 25480 slave.cpp:3030] Current usage 9.57%. Max allowed 
 age: 5.630238827780799days
 I0903 16:47:16.015192 25488 cgroups.cpp:2209] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:16.017043 25486 cgroups.cpp:1404] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
 1.511168ms
 I0903 16:47:16.017555 25480 cgroups.cpp:2192] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
 I0903 16:47:19.862746 25483 http.cpp:245] HTTP request for 
 '/slave(1)/stats.json'
 E0903 16:47:24.960055 25472 slave.cpp:2557] Termination of executor 'E' of 
 framework '201104070004-002563-' failed: Failed to destroy container: 
 discarded future
 I0903 16:47:24.962054 25472 slave.cpp:2087] Handling status update TASK_LOST 
 (UUID: 

[jira] [Updated] (MESOS-1476) Provide endpoints for deactivating / activating slaves.

2014-09-08 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1476:
---
Sprint:   (was: Mesos Q3 Sprint 5)

 Provide endpoints for deactivating / activating slaves.
 ---

 Key: MESOS-1476
 URL: https://issues.apache.org/jira/browse/MESOS-1476
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
  Labels: gsoc2014

 When performing maintenance operations on slaves, it is important to allow 
 these slaves to be drained of their tasks.
 The first essential primitive of draining slaves is to prevent them from 
 running more tasks. This can be achieved by deactivating them: stop sending 
 their resource offers to frameworks.
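 A purely illustrative sketch of the deactivation primitive (hypothetical 
 names, not the Mesos allocator API): deactivated slaves are simply skipped 
 when generating offers.
 {noformat}
 class OfferGenerator(object):
     # Hypothetical stand-in for the part of the master that decides
     # which slaves' resources get offered.
     def __init__(self):
         self.deactivated = set()

     def deactivate(self, slave_id):
         # The proposed endpoint would flip this switch.
         self.deactivated.add(slave_id)

     def activate(self, slave_id):
         self.deactivated.discard(slave_id)

     def offerable(self, slaves):
         # Only activated slaves have resources offered to frameworks.
         return [s for s in slaves if s not in self.deactivated]

 g = OfferGenerator()
 g.deactivate("slave-1")
 print(g.offerable(["slave-1", "slave-2"]))  # ['slave-2']
 {noformat}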



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1476) Provide endpoints for deactivating / activating slaves.

2014-09-08 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-1476:
--

Assignee: (was: Alexandra Sava)

Un-assigning for now since there is no longer a need for this with the updated 
maintenance design in MESOS-1474.

 Provide endpoints for deactivating / activating slaves.
 ---

 Key: MESOS-1476
 URL: https://issues.apache.org/jira/browse/MESOS-1476
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
  Labels: gsoc2014

 When performing maintenance operations on slaves, it is important to allow 
 these slaves to be drained of their tasks.
 The first essential primitive of draining slaves is to prevent them from 
 running more tasks. This can be achieved by deactivating them: stop sending 
 their resource offers to frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1592) Design inverse resource offer support

2014-09-08 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126421#comment-14126421
 ] 

Benjamin Mahler commented on MESOS-1592:


Moving this to reviewable as inverse offers were designed as part of the 
maintenance work: MESOS-1474.

We are currently considering how persistent resources will interact with 
inverse offers and the other maintenance primitives.

 Design inverse resource offer support
 -

 Key: MESOS-1592
 URL: https://issues.apache.org/jira/browse/MESOS-1592
 Project: Mesos
  Issue Type: Task
  Components: allocation
Reporter: Benjamin Mahler
Assignee: Alexandra Sava

 An inverse resource offer means that Mesos is requesting resources back 
 from the framework, possibly within some time interval.
 This can be leveraged initially to provide more automated cluster 
 maintenance, by offering schedulers the opportunity to move tasks to 
 compensate for planned maintenance. Operators can set a time limit on how 
 long to wait for schedulers to relocate tasks before the tasks are forcibly 
 terminated.
 Inverse resource offers have many other potential uses, as they open the 
 opportunity for the allocator to move tasks in the cluster through the 
 co-operation of the frameworks, potentially providing better 
 over-subscription, fairness, etc.
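 A hypothetical sketch of how a scheduler might react to an inverse offer; the 
 message shape and all names are assumptions, since the actual API was still 
 under design at this point:
 {noformat}
 import collections

 # Assumed shape of an inverse offer: which slave wants its resources
 # back, and by when (field and method names are illustrative).
 InverseOffer = collections.namedtuple("InverseOffer", ["slave_id", "deadline"])

 class ToyScheduler(object):
     def __init__(self, tasks):
         self.tasks = tasks  # task id -> slave id

     def inverse_offer(self, offer):
         # Co-operate: move tasks off the slave before the deadline;
         # past the operator's time limit they may be forcibly killed.
         for task, slave in list(self.tasks.items()):
             if slave == offer.slave_id:
                 print("relocating %s before %s" % (task, offer.deadline))
                 self.tasks[task] = "some-other-slave"

 s = ToyScheduler({"t1": "slave-1", "t2": "slave-2"})
 s.inverse_offer(InverseOffer("slave-1", "2014-09-15T00:00Z"))
 {noformat}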



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1717) The slave does not show pending tasks in the JSON endpoints.

2014-09-08 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1717:
---
Sprint: Q3 Sprint 4  (was: Q3 Sprint 4, Mesos Q3 Sprint 5)

 The slave does not show pending tasks in the JSON endpoints.
 

 Key: MESOS-1717
 URL: https://issues.apache.org/jira/browse/MESOS-1717
 Project: Mesos
  Issue Type: Bug
  Components: json api, slave
Reporter: Benjamin Mahler

 The slave does not show pending tasks in the /state.json endpoint.
 This is a bit tricky to add since we rely on knowing the executor directory.
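 Illustrative only (hypothetical field names, not the actual slave code): the 
 endpoint would need to report queued tasks whose sandbox directory does not 
 exist yet, something like:
 {noformat}
 def state_json(framework):
     # Hypothetical sketch: queued tasks are reported alongside
     # executors, with the sandbox directory left unset because it
     # does not exist until the executor is launched.
     return {
         "id": framework["id"],
         "executors": framework["executors"],  # directory is known
         "pending_tasks": [
             {"id": t, "directory": None}      # directory not yet known
             for t in framework["queued"]
         ],
     }

 print(state_json({"id": "f1", "executors": [], "queued": ["t1"]}))
 {noformat}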



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)