[jira] [Commented] (MESOS-1621) Docker run networking should be configurable and support bridge network

2014-09-15 Thread Steve Domin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133746#comment-14133746
 ] 

Steve Domin commented on MESOS-1621:


That makes sense. Approximately when can we expect 0.21.0 to be released?

> Docker run networking should be configurable and support bridge network
> ---
>
> Key: MESOS-1621
> URL: https://issues.apache.org/jira/browse/MESOS-1621
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: Docker
>
> Currently, to easily support running executors in a Docker image, we hardcode 
> --net=host into Docker run so that the slave and executor can reuse the same 
> mechanism to communicate, which is to pass the slave IP/PORT for the framework 
> to respond with its own hostname and port information to set up the tunnel.
> We want to see how to abstract this or even get rid of host networking 
> altogether if we have a good way to not rely on it.
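
One way to picture the request: the code that builds the `docker run` command line takes the network mode as a parameter instead of always appending `--net=host`. The sketch below is only an illustration. `NetworkMode` and `dockerRunArgs` are hypothetical names, not part of the Mesos Docker abstraction, and only the `host` and `bridge` values of Docker's `--net` flag are assumed.

```cpp
#include <string>
#include <vector>

// Hypothetical enum for this sketch; not a Mesos type.
enum class NetworkMode { HOST, BRIDGE };

// Builds the argument vector for `docker run`, with the network mode
// chosen by the caller rather than hardcoded to host networking.
std::vector<std::string> dockerRunArgs(
    const std::string& image,
    NetworkMode mode)
{
  std::vector<std::string> args = {"docker", "run"};

  switch (mode) {
    case NetworkMode::HOST:
      args.push_back("--net=host");
      break;
    case NetworkMode::BRIDGE:
      args.push_back("--net=bridge");
      break;
  }

  args.push_back(image);
  return args;
}
```

With bridge networking the executor no longer shares the slave's network namespace, so how the slave and executor reach each other (the "tunnel" described above) would still need to be worked out separately.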



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1741) mesos-slave shouldn't fail if dockerd is down

2014-09-15 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133864#comment-14133864
 ] 

Timothy St. Clair commented on MESOS-1741:
--

So unless there is clear reporting all the way through, I "kind of" prefer the 
current semantics.  Failing early gives a clear indication that something is 
wrong, vs. masking the error and rolling out a solution on a fleet of machines 
which imho could be worse.   

> mesos-slave shouldn't fail if dockerd is down
> -
>
> Key: MESOS-1741
> URL: https://issues.apache.org/jira/browse/MESOS-1741
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.20.1
>Reporter: Bhuvan Arumugam
> Fix For: 0.20.1
>
>
> When using {{--containerizers=docker,mesos}} for mesos-slave, it fails to come 
> up if dockerd is not running. It uses {{docker version}} to check docker 
> access, and {{docker version}} exits with status 1 if dockerd is not running.
> mesos-slave should launch with the other containerizer (mesos) if dockerd is down.
> {code}
> I0827 21:33:23.953763 19448 logging.cpp:142] INFO level logging started!
> I0827 21:33:23.954180 19448 main.cpp:126] Build: 2014-08-21 21:26:28 by 
> jenkins
> I0827 21:33:23.954190 19448 main.cpp:128] Version: 0.21.0
> I0827 21:33:23.954196 19448 main.cpp:135] Git SHA: 
> 70784a9f234b2902d6fee11298365d9b08756313
> Failed to create a containerizer: Could not create DockerContainerizer: 
> Failed to execute 'docker version': exited with status exited with status 1
> {code}
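
A rough sketch of the behavior requested here, under stated assumptions: each requested containerizer is tried in turn, and one that cannot be created is recorded rather than aborting startup. `Factory` and `createAvailable` are hypothetical names for illustration and do not correspond to the actual slave code.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical factory type for this sketch: returns true if the named
// containerizer could be created (e.g. 'docker version' succeeded).
using Factory = std::function<bool()>;

// Tries every requested containerizer in order and returns the names of
// those that could be created. Unknown or failing ones are recorded in
// 'skipped' instead of aborting startup.
std::vector<std::string> createAvailable(
    const std::vector<std::string>& requested,
    const std::map<std::string, Factory>& factories,
    std::vector<std::string>* skipped)
{
  std::vector<std::string> created;

  for (const std::string& name : requested) {
    auto it = factories.find(name);
    if (it != factories.end() && it->second()) {
      created.push_back(name);
    } else {
      skipped->push_back(name);
    }
  }

  return created;
}
```

A caller would presumably still exit when `created` is empty, and would need to report every entry in `skipped` clearly so that the failure is not silently masked.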





[jira] [Updated] (MESOS-1688) No offers if no memory is allocatable

2014-09-15 Thread Timothy St. Clair (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy St. Clair updated MESOS-1688:
-
Target Version/s: 0.20.1

Setting the Target Version to 0.20.1 to evaluate whether a fix is possible in 
this release cycle. 

> No offers if no memory is allocatable
> -
>
> Key: MESOS-1688
> URL: https://issues.apache.org/jira/browse/MESOS-1688
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.18.1, 0.18.2, 0.19.0, 0.19.1
>Reporter: Martin Weindel
>Priority: Critical
>
> The [Spark 
> scheduler|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala]
>  allocates memory only for the executor and cpu only for its tasks.
> So it can happen that nearly all memory is allocated by Spark executors 
> while all cpu resources are idle.
> In this case Mesos no longer offers resources, as less than MIN_MEM 
> (=32MB) memory is allocatable.
> This effectively causes a deadlock in the Spark job, as it is not offered 
> the cpu resources needed for launching new tasks.
> see {{HierarchicalAllocatorProcess::allocatable(const Resources&)}} called in 
> {{HierarchicalAllocatorProcess::allocate(const hashset&)}}
> {code}
> template 
> bool
> HierarchicalAllocatorProcess::allocatable(
>     const Resources& resources)
> {
>   ...
>   Option cpus = resources.cpus();
>   Option mem = resources.mem();
>   if (cpus.isSome() && mem.isSome()) {
>     return cpus.get() >= MIN_CPUS && mem.get() > MIN_MEM;
>   }
>   return false;
> }
> {code}
> A possible solution may be to drop the condition on allocatable memory 
> completely.
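
To make the reported scenario concrete, here is a simplified, standalone version of the check next to the variant suggested in the report. It uses plain numbers instead of the Mesos `Resources` and `Option` types and ignores the case where a resource is missing entirely. The function names are hypothetical, and the `MIN_CPUS` value is an assumption for illustration; only the 32MB threshold comes from the report.

```cpp
// Thresholds for this sketch. MIN_MEM_MB mirrors the 32MB quoted above;
// the MIN_CPUS value is an assumption made for illustration.
const double MIN_CPUS = 0.1;
const double MIN_MEM_MB = 32;

// Simplified form of the quoted check: both cpus and memory must be
// available before an offer is made.
bool allocatableCurrent(double cpus, double memMB)
{
  return cpus >= MIN_CPUS && memMB > MIN_MEM_MB;
}

// The variant suggested in the report: drop the condition on memory.
bool allocatableWithoutMemCheck(double cpus, double /* memMB */)
{
  return cpus >= MIN_CPUS;
}
```

For a slave with 8 idle cpus and 16MB of free memory, `allocatableCurrent(8, 16)` is false while `allocatableWithoutMemCheck(8, 16)` is true, which is the difference the Spark scheduler would need in order to be offered cpus for new tasks.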





[jira] [Commented] (MESOS-1621) Docker run networking should be configurable and support bridge network

2014-09-15 Thread Jay Buffington (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134102#comment-14134102
 ] 

Jay Buffington commented on MESOS-1621:
---

[~stevedomin] the current release cadence is about one release per month.

> Docker run networking should be configurable and support bridge network
> ---
>
> Key: MESOS-1621
> URL: https://issues.apache.org/jira/browse/MESOS-1621
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: Docker
>
> Currently, to easily support running executors in a Docker image, we hardcode 
> --net=host into Docker run so that the slave and executor can reuse the same 
> mechanism to communicate, which is to pass the slave IP/PORT for the framework 
> to respond with its own hostname and port information to set up the tunnel.
> We want to see how to abstract this or even get rid of host networking 
> altogether if we have a good way to not rely on it.





[jira] [Commented] (MESOS-1621) Docker run networking should be configurable and support bridge network

2014-09-15 Thread Steve Domin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134110#comment-14134110
 ] 

Steve Domin commented on MESOS-1621:


[~jaybuff] cool, thanks for the answer!

> Docker run networking should be configurable and support bridge network
> ---
>
> Key: MESOS-1621
> URL: https://issues.apache.org/jira/browse/MESOS-1621
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: Docker
>
> Currently, to easily support running executors in a Docker image, we hardcode 
> --net=host into Docker run so that the slave and executor can reuse the same 
> mechanism to communicate, which is to pass the slave IP/PORT for the framework 
> to respond with its own hostname and port information to set up the tunnel.
> We want to see how to abstract this or even get rid of host networking 
> altogether if we have a good way to not rely on it.





[jira] [Updated] (MESOS-1788) Add c++11 feature whitelist to style guide

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1788:
-
Issue Type: Documentation  (was: Technical task)
Parent: (was: MESOS-1793)

> Add c++11 feature whitelist to style guide
> --
>
> Key: MESOS-1788
> URL: https://issues.apache.org/jira/browse/MESOS-1788
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Dominic Hamon
>Priority: Minor
>
> We now support a subset of C++11 as checked by the {{configure}} script. It 
> would be useful to have documentation in the style guide regarding the 
> whitelist.





[jira] [Commented] (MESOS-1788) Add c++11 feature whitelist to style guide

2014-09-15 Thread Dominic Hamon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134137#comment-14134137
 ] 

Dominic Hamon commented on MESOS-1788:
--

Converted back to the Documentation type. We don't use 'Technical task', and 
this issue is specifically about adding to the documentation.

> Add c++11 feature whitelist to style guide
> --
>
> Key: MESOS-1788
> URL: https://issues.apache.org/jira/browse/MESOS-1788
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Dominic Hamon
>Priority: Minor
>
> We now support a subset of C++11 as checked by the {{configure}} script. It 
> would be useful to have documentation in the style guide regarding the 
> whitelist.





[jira] [Closed] (MESOS-1642) Slave should proceed with recovery if the old resources is a subset of the new resources.

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon closed MESOS-1642.

Resolution: Duplicate

> Slave should proceed with recovery if the old resources is a subset of the 
> new resources.
> -
>
> Key: MESOS-1642
> URL: https://issues.apache.org/jira/browse/MESOS-1642
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> This would simplify deployment a lot if we want to increase the slave 
> resources (or slave private resources) in the SlaveInfo. The current slave 
> will simply flap if it finds that the old and new SlaveInfo are not equal.
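
The subset condition in the title could look roughly like the following. This is a sketch under simplifying assumptions: resources are reduced to named scalar amounts, whereas the real Mesos `Resources` type also covers ranges and sets. `ScalarResources` and `isSubset` are hypothetical names.

```cpp
#include <map>
#include <string>

// Scalar resources keyed by name (e.g. "cpus", "mem"); a deliberately
// simplified stand-in for the Mesos Resources type.
using ScalarResources = std::map<std::string, double>;

// Returns true if every resource in 'oldResources' is present in
// 'newResources' with at least the same amount.
bool isSubset(
    const ScalarResources& oldResources,
    const ScalarResources& newResources)
{
  for (const auto& entry : oldResources) {
    auto it = newResources.find(entry.first);
    if (it == newResources.end() || it->second < entry.second) {
      return false;
    }
  }
  return true;
}
```

Recovery would then proceed when `isSubset(oldInfo, newInfo)` holds, instead of requiring the old and new values to be equal.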





[jira] [Resolved] (MESOS-1520) Mesos headers include stout

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon resolved MESOS-1520.
--
   Resolution: Fixed
Fix Version/s: 0.20.0

As per MESOS-1608, stout headers are now installed. If stout is ever separated, 
the configure scripts will take care of discovering availability.

> Mesos headers include stout
> ---
>
> Key: MESOS-1520
> URL: https://issues.apache.org/jira/browse/MESOS-1520
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: Zuyu Zhang
> Fix For: 0.20.0
>
>
> include/mesos/resources.hpp
> 30:#include 
> 31:#include 
> 32:#include 
> 33:#include 
> include/mesos/scheduler.hpp
> 365:   * because 'stout' is not visible from here.
> include/mesos/values.hpp
> 24:#include 





[jira] [Closed] (MESOS-1699) OsTest(s) sometimes are stuck forever

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon closed MESOS-1699.

Resolution: Cannot Reproduce

> OsTest(s) sometimes are stuck forever
> -
>
> Key: MESOS-1699
> URL: https://issues.apache.org/jira/browse/MESOS-1699
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
>Reporter: Vinod Kone
>
> Observed this quite a few times on the Apache CI. The tests time out somewhere 
> in the OsTests.
> E.g: 
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2310/consoleText
> {code}
> [--] 20 tests from OsTest
> [ RUN  ] OsTest.find
> [   OK ] OsTest.find (1 ms)
> [ RUN  ] OsTest.system
> sh: 1: invalid.command: not found
> [   OK ] OsTest.system (43 ms)
> [ RUN  ] OsTest.release
> [   OK ] OsTest.release (1 ms)
> [ RUN  ] OsTest.sysname
> [   OK ] OsTest.sysname (0 ms)
> [ RUN  ] OsTest.uname
> [   OK ] OsTest.uname (0 ms)
> [ RUN  ] OsTest.process
> [   OK ] OsTest.process (0 ms)
> [ RUN  ] OsTest.touch
> [   OK ] OsTest.touch (1 ms)
> [ RUN  ] OsTest.pstree
> Build timed out (after 180 minutes). Marking the build as failed.
> Build was aborted
> Sending e-mails to: d...@mesos.apache.org benjamin.hind...@gmail.com
> make[5]: *** wait: No child processes.  Stop.
> make[5]: *** Waiting for unfinished jobs
> make[5]: *** wait: No child processes.  Stop.
> make[3]: make[2]: *** [check-recursive] Terminated
> *** [check-recursive] Terminated
> make[4]: *** [check] Error 2
> make: *** [check-recursive] Terminated
> make[1]: *** [check] Terminated
> Finished: FAILURE
> {code}





[jira] [Closed] (MESOS-1735) Better Startup Failure For Duplicate Master

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon closed MESOS-1735.

Resolution: Won't Fix

The error being printed is not unique to running duplicate mesos masters as 
another process may be bound to the requested port. As such, the error message 
is accurate.

Correctly identifying that another master is already running is a larger scope 
issue that doesn't seem particularly useful to add.

> Better Startup Failure For Duplicate Master
> ---
>
> Key: MESOS-1735
> URL: https://issues.apache.org/jira/browse/MESOS-1735
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.20.0
> Environment: Ubuntu 12.04
>Reporter: Ken Sipe
>
> The error message is cryptic when starting a mesos-master when a mesos-master 
> is already running.   The error message is:
> mesos-master --ip=192.168.74.174 --work_dir=~/mesos
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0826 20:24:56.940961  3057 process.cpp:1632] Failed to initialize, bind: 
> Address already in use [98]
> *** Check failure stack trace: ***
> Aborted (core dumped)
> This can be a new person's first experience.  It isn't clear to them that the 
> process is already running.  And they are lost as to what to do next.
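
For illustration only, a friendlier message could be built from the `errno` of the failed `bind()` without claiming that the other process is definitely a master, which matches the closing comment's point that any process may hold the port. `describeBindFailure` is a hypothetical helper, not an existing Mesos or libprocess function.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper for this sketch: turns the errno from a failed
// bind() into a message that hints at the likely cause.
std::string describeBindFailure(int error, int port)
{
  std::string message =
    "Failed to bind to port " + std::to_string(port) + ": " +
    std::strerror(error);

  if (error == EADDRINUSE) {
    message +=
      ". Another process (possibly another mesos-master) is already"
      " listening on this port";
  }

  return message;
}
```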





[jira] [Closed] (MESOS-1742) Remove Using Directives

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon closed MESOS-1742.

Resolution: Won't Fix

> Remove Using Directives
> ---
>
> Key: MESOS-1742
> URL: https://issues.apache.org/jira/browse/MESOS-1742
> Project: Mesos
>  Issue Type: Story
>  Components: build
>Reporter: Jessica Hartog
>  Labels: build
>
> We tell developers to follow the Google C++ Style Guide 
> (http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml) which says, 
> "Do not use a using-directive."
> At the moment 134 files in src/ do not conform to this practice.
> $ grep -r -l "using namespace" src | wc -l
> 134
> Using directives increase compile time by requiring a larger lookup space for 
> unqualified ids.





[jira] [Comment Edited] (MESOS-1195) systemd.slice + cgroup enablement fails in multiple ways.

2014-09-15 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043981#comment-14043981
 ] 

Timothy St. Clair edited comment on MESOS-1195 at 9/15/14 7:41 PM:
---

*DONE* -reviews.apache.org/r/22977/-
*DONE* -reviews.apache.org/r/22979- 


was (Author: tstclair):
*DONE* -reviews.apache.org/r/22977/-
*DONE* -reviews.apache.org/r/22979- 
https://reviews.apache.org/r/22980
https://reviews.apache.org/r/22981/

> systemd.slice + cgroup enablement fails in multiple ways. 
> --
>
> Key: MESOS-1195
> URL: https://issues.apache.org/jira/browse/MESOS-1195
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.18.0
>Reporter: Timothy St. Clair
>Assignee: Timothy St. Clair
>
> When attempting to configure mesos to use systemd slices on a 'rawhide/f21' 
> machine, it fails creating the isolator: 
> I0407 12:39:28.035354 14916 containerizer.cpp:180] Using isolation: 
> cgroups/cpu,cgroups/mem
> Failed to create a containerizer: Could not create isolator cgroups/cpu: 
> Failed to create isolator: The cpu subsystem is co-mounted at 
> /sys/fs/cgroup/cpu with other subsytems
> -- details --
> /sys/fs/cgroup
> total 0
> drwxr-xr-x. 12 root root 280 Mar 18 08:47 .
> drwxr-xr-x.  6 root root   0 Mar 18 08:47 ..
> drwxr-xr-x.  2 root root   0 Mar 18 08:47 blkio
> lrwxrwxrwx.  1 root root  11 Mar 18 08:47 cpu -> cpu,cpuacct
> lrwxrwxrwx.  1 root root  11 Mar 18 08:47 cpuacct -> cpu,cpuacct
> drwxr-xr-x.  2 root root   0 Mar 18 08:47 cpu,cpuacct
> drwxr-xr-x.  2 root root   0 Mar 18 08:47 cpuset
> drwxr-xr-x.  2 root root   0 Mar 18 08:47 devices
> drwxr-xr-x.  2 root root   0 Mar 18 08:47 freezer
> drwxr-xr-x.  2 root root   0 Mar 18 08:47 hugetlb
> drwxr-xr-x.  3 root root   0 Apr  3 11:26 memory
> drwxr-xr-x.  2 root root   0 Mar 18 08:47 net_cls
> drwxr-xr-x.  2 root root   0 Mar 18 08:47 perf_event
> drwxr-xr-x.  4 root root   0 Mar 18 08:47 systemd





[jira] [Created] (MESOS-1795) Assertion failure in state abstraction crashes JVM

2014-09-15 Thread Connor Doyle (JIRA)
Connor Doyle created MESOS-1795:
---

 Summary: Assertion failure in state abstraction crashes JVM
 Key: MESOS-1795
 URL: https://issues.apache.org/jira/browse/MESOS-1795
 Project: Mesos
  Issue Type: Bug
  Components: java api
Affects Versions: 0.20.0
Reporter: Connor Doyle


Observed the following log output prior to a crash of the Marathon scheduler:

Sep 12 23:46:01 highly-available-457-540 marathon[11494]: F0912 23:46:01.771927 
11532 org_apache_mesos_state_AbstractState.cpp:145] CHECK_READY(*future): is 
PENDING 
Sep 12 23:46:01 highly-available-457-540 marathon[11494]: *** Check failure 
stack trace: ***
Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
0x7febc2663a2d  google::LogMessage::Fail()
Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
0x7febc26657e3  google::LogMessage::SendToLog()
Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
0x7febc2663648  google::LogMessage::Flush()
Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
0x7febc266603e  google::LogMessageFatal::~LogMessageFatal()
Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
0x7febc26588a3  Java_org_apache_mesos_state_AbstractState__1_1fetch_1get
Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
0x7febcd107d98  (unknown)

Listing 1: Crash log output.
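
The log shows a fatal `CHECK_READY` on a future that is still pending, which aborts the whole JVM from native code. A minimal sketch of the alternative, reporting the pending state to the caller instead of asserting, is given below. The `Future` struct and `tryGet` are hypothetical stand-ins rather than the libprocess types, and the actual fix may take a different approach.

```cpp
#include <string>

// Minimal stand-in for a future with explicit states; hypothetical, not
// the libprocess Future used by Mesos.
struct Future
{
  enum State { PENDING, READY, FAILED };
  State state = PENDING;
  std::string value;
};

// Returns false (rather than aborting the process) when the future has
// not completed, so a JNI caller could raise a Java exception instead.
bool tryGet(const Future& future, std::string* result)
{
  if (future.state != Future::READY) {
    return false;
  }
  *result = future.value;
  return true;
}
```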






[jira] [Commented] (MESOS-1795) Assertion failure in state abstraction crashes JVM

2014-09-15 Thread Connor Doyle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134406#comment-14134406
 ] 

Connor Doyle commented on MESOS-1795:
-

Review for a fix is in progress here: https://reviews.apache.org/r/25614

> Assertion failure in state abstraction crashes JVM
> --
>
> Key: MESOS-1795
> URL: https://issues.apache.org/jira/browse/MESOS-1795
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 0.20.0
>Reporter: Connor Doyle
>
> Observed the following log output prior to a crash of the Marathon scheduler:
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: F0912 
> 23:46:01.771927 11532 org_apache_mesos_state_AbstractState.cpp:145] 
> CHECK_READY(*future): is PENDING 
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: *** Check failure 
> stack trace: ***
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
> 0x7febc2663a2d  google::LogMessage::Fail()
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
> 0x7febc26657e3  google::LogMessage::SendToLog()
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
> 0x7febc2663648  google::LogMessage::Flush()
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
> 0x7febc266603e  google::LogMessageFatal::~LogMessageFatal()
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
> 0x7febc26588a3  Java_org_apache_mesos_state_AbstractState__1_1fetch_1get
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ 
> 0x7febcd107d98  (unknown)
> Listing 1: Crash log output.





[jira] [Created] (MESOS-1796) Support multiple working paths

2014-09-15 Thread Charles Allen (JIRA)
Charles Allen created MESOS-1796:


 Summary: Support multiple working paths
 Key: MESOS-1796
 URL: https://issues.apache.org/jira/browse/MESOS-1796
 Project: Mesos
  Issue Type: Wish
  Components: slave
Reporter: Charles Allen
Priority: Minor


As a framework developer, I would like the ability to have multiple working 
paths as part of a slave reporting its resources.

Currently, if a slave (like an EC2 instance) has multiple disks, the disks must 
be combined in an MD array or similar in order to be fully utilized in Mesos. 
This ask is to allow multiple disks to be mounted on multiple paths, and have 
the slave be able to support and report availability on these various working 
paths.





[jira] [Commented] (MESOS-1796) Support multiple working paths

2014-09-15 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134465#comment-14134465
 ] 

Charles Allen commented on MESOS-1796:
--

If there is a way to specify multiple working paths, then I have completely 
missed it in the docs.

> Support multiple working paths
> --
>
> Key: MESOS-1796
> URL: https://issues.apache.org/jira/browse/MESOS-1796
> Project: Mesos
>  Issue Type: Wish
>  Components: slave
>Reporter: Charles Allen
>Priority: Minor
>
> As a framework developer, I would like the ability to have multiple working 
> paths as part of a slave reporting its resources.
> Currently, if a slave (like an EC2 instance) has multiple disks, the disks 
> must be combined in an MD array or similar in order to be fully utilized in 
> Mesos. This ask is to allow multiple disks to be mounted on multiple paths, 
> and have the slave be able to support and report availability on these 
> various working paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1795) Assertion failure in state abstraction crashes JVM

2014-09-15 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-1795:
--
Assignee: Connor Doyle  (was: Niklas Quarfot Nielsen)

> Assertion failure in state abstraction crashes JVM
> --
>
> Key: MESOS-1795
> URL: https://issues.apache.org/jira/browse/MESOS-1795
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 0.20.0
>Reporter: Connor Doyle
>Assignee: Connor Doyle





[jira] [Assigned] (MESOS-1795) Assertion failure in state abstraction crashes JVM

2014-09-15 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen reassigned MESOS-1795:
-

Assignee: Niklas Quarfot Nielsen

> Assertion failure in state abstraction crashes JVM
> --
>
> Key: MESOS-1795
> URL: https://issues.apache.org/jira/browse/MESOS-1795
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 0.20.0
>Reporter: Connor Doyle
>Assignee: Niklas Quarfot Nielsen





[jira] [Updated] (MESOS-1795) Assertion failure in state abstraction crashes JVM

2014-09-15 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-1795:
--
Shepherd: Niklas Quarfot Nielsen

> Assertion failure in state abstraction crashes JVM
> --
>
> Key: MESOS-1795
> URL: https://issues.apache.org/jira/browse/MESOS-1795
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 0.20.0
>Reporter: Connor Doyle
>Assignee: Connor Doyle





[jira] [Commented] (MESOS-1795) Assertion failure in state abstraction crashes JVM

2014-09-15 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134472#comment-14134472
 ] 

Benjamin Mahler commented on MESOS-1795:


Do you understand what transpired?

> Assertion failure in state abstraction crashes JVM
> --
>
> Key: MESOS-1795
> URL: https://issues.apache.org/jira/browse/MESOS-1795
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 0.20.0
>Reporter: Connor Doyle
>Assignee: Connor Doyle





[jira] [Commented] (MESOS-1795) Assertion failure in state abstraction crashes JVM

2014-09-15 Thread Connor Doyle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134476#comment-14134476
 ] 

Connor Doyle commented on MESOS-1795:
-

[~bmahler] I believe so, and those conjectures are outlined in the review.  
However, if you or others can shed more light on the cause, that would be great.  
Unfortunately this has been a difficult issue to reproduce.

> Assertion failure in state abstraction crashes JVM
> --
>
> Key: MESOS-1795
> URL: https://issues.apache.org/jira/browse/MESOS-1795
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 0.20.0
>Reporter: Connor Doyle
>Assignee: Connor Doyle





[jira] [Commented] (MESOS-1392) Failure when znode is removed before we can read its contents.

2014-09-15 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134503#comment-14134503
 ] 

Yan Xu commented on MESOS-1392:
---

https://reviews.apache.org/r/25663

> Failure when znode is removed before we can read its contents.
> --
>
> Key: MESOS-1392
> URL: https://issues.apache.org/jira/browse/MESOS-1392
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
>Reporter: Benjamin Mahler
>Assignee: Yan Xu
>
> Looks like the following can occur when a znode goes away right before we can 
> read its contents:
> {noformat: title=Slave exit}
> I0520 16:33:45.721727 29155 group.cpp:382] Trying to create path 
> '/home/mesos/test/master' in ZooKeeper
> I0520 16:33:48.600837 29155 detector.cpp:134] Detected a new leader: 
> (id='2617')
> I0520 16:33:48.601428 29147 group.cpp:655] Trying to get 
> '/home/mesos/test/master/info_002617' in ZooKeeper
> Failed to detect a master: Failed to get data for ephemeral node 
> '/home/mesos/test/master/info_002617' in ZooKeeper: no node
> Slave Exit Status: 1
> {noformat}
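
One way to tolerate the race shown in the log above is to treat a missing znode as "the leader changed, detect again" instead of a fatal error. The following is a minimal sketch under assumptions: `ReadStatus`, `ReadResult`, `Detector`, `Reader`, and `detectWithRetry` are hypothetical names for illustration, not Mesos or ZooKeeper APIs, and the sketch does not claim to match the patch in the linked review.

```cpp
#include <functional>
#include <string>

// Hypothetical outcome of reading a znode; not a ZooKeeper type.
enum class ReadStatus { OK, NO_NODE, ERROR };

struct ReadResult
{
  ReadStatus status;
  std::string data;
};

// Hypothetical callbacks standing in for "find the current leader's znode"
// and "read that znode's contents".
using Detector = std::function<std::string()>;
using Reader = std::function<ReadResult(const std::string&)>;

// Returns true and fills `data` once a leader znode has been read.
// NO_NODE means the znode vanished between detection and read, so we
// detect again instead of failing. Any other error stays fatal.
bool detectWithRetry(
    const Detector& detect,
    const Reader& read,
    int maxAttempts,
    std::string* data)
{
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    const std::string path = detect();
    const ReadResult result = read(path);

    if (result.status == ReadStatus::OK) {
      *data = result.data;
      return true;
    }

    if (result.status != ReadStatus::NO_NODE) {
      return false; // A genuine error; do not mask it.
    }
    // NO_NODE: the leader changed under us; loop and detect again.
  }
  return false; // Gave up after maxAttempts.
}
```

Bounding the number of attempts keeps a persistently broken ensemble from turning into a silent infinite loop.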





[jira] [Issue Comment Deleted] (MESOS-1753) Allow default/deleted functions

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1753:
-
Comment: was deleted

(was: https://reviews.apache.org/r/25261/)

> Allow default/deleted functions
> ---
>
> Key: MESOS-1753
> URL: https://issues.apache.org/jira/browse/MESOS-1753
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Dominic Hamon
>Assignee: Dominic Hamon
>Priority: Minor
>  Labels: c++11
>
> Add default/delete functions to the configure script. Once there, we can 
> start using them across the code-base.
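
For reference, the C++11 feature in question looks like the sketch below, which is also roughly the kind of snippet a configure check could try to compile. The `Socket` class is hypothetical and is not taken from the Mesos code base.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical class for illustration only.
class Socket
{
public:
  explicit Socket(int fd) : fd_(fd) {}

  // Deleted: copying is rejected at compile time.
  Socket(const Socket&) = delete;
  Socket& operator=(const Socket&) = delete;

  // Defaulted: the compiler-generated versions are enough for this sketch.
  // (A real descriptor wrapper would also reset the moved-from object.)
  Socket(Socket&&) = default;
  Socket& operator=(Socket&&) = default;
  ~Socket() = default;

  int fd() const { return fd_; }

private:
  int fd_;
};

static_assert(!std::is_copy_constructible<Socket>::value, "copy is deleted");
static_assert(std::is_move_constructible<Socket>::value, "move is defaulted");
```

Compared with the pre-C++11 idiom of declaring private, unimplemented copy operations, `= delete` produces an error at the call site rather than at link time.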





[jira] [Updated] (MESOS-1753) Allow default/deleted functions

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1753:
-
Assignee: (was: Dominic Hamon)

> Allow default/deleted functions
> ---
>
> Key: MESOS-1753
> URL: https://issues.apache.org/jira/browse/MESOS-1753
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Dominic Hamon
>Priority: Minor
>  Labels: c++11
>
> Add default/delete functions to the configure script. Once there, we can 
> start using them across the code-base.





[jira] [Commented] (MESOS-740) Create an Archive abstraction.

2014-09-15 Thread Thomas Rampelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134593#comment-14134593
 ] 

Thomas Rampelberg commented on MESOS-740:
-

This is going to become a bigger issue with the new CLI. Right now, the CLI 
treats the master as the one true source of information. When you have a 
framework that is running a large number of tasks, the current buffer fills up 
quickly and you end up losing the task ID. For slaves that still have the 
files of a potentially failed task around (because the buffer there is 
totally different), you're no longer able to discover where the task ran from 
the master.

It would be great if the archive was a module that let you define what 
retention policy you had for older, completed tasks.

> Create an Archive abstraction.
> --
>
> Key: MESOS-740
> URL: https://issues.apache.org/jira/browse/MESOS-740
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>
> I was thinking about this lately given that the resource monitor, the slave, 
> and the master all keep archived information around at the cost of increased 
> memory consumption.
> An archive abstraction would simplify the code in these components, as well 
> as help ensure the master / slave remain below a certain amount of memory 
> consumption. This would prove useful if we were to start placing the slave 
> inside a cgroup with a memory limit.
> The archive could periodically monitor memory consumption (or use memory 
> threshold notifications) to decide when to GC some of the archived 
> information. This would be slightly analogous to how we GC the disk based on 
> disk usage.
> Thinking about what this may look like:
> 1. Archive is map-like and holds the data using . This would need 
> to be templated and as such may require that separate components create 
> separate Archives.
> 2. Archive distributes Futures that become ready when memory consumption 
> needs to be lowered. Components can set callbacks on these futures to 
> routines that remove data from their historical state. At this point Archive 
> is really just a MemoryMonitor abstraction.
> [~benjaminhindman] I heard you may have been thinking about this as well at 
> one point, let me know what ideas you have!
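
To make the first option concrete, here is a minimal sketch of a map-like, templated archive with a pluggable-in-spirit retention rule. Everything in it is an assumption for illustration: the `Archive` class, its `put`/`get`/`size` methods, and the "evict the oldest entry once a fixed capacity is reached" policy are hypothetical, and a memory-driven trigger as described above would replace the fixed capacity.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded archive: keeps at most `capacity` entries and
// evicts the oldest insertion first.
template <typename K, typename V>
class Archive
{
public:
  explicit Archive(std::size_t capacity) : capacity_(capacity) {}

  void put(const K& key, const V& value)
  {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = value; // Update in place; age is unchanged.
      return;
    }

    if (capacity_ == 0) {
      return; // Nothing can be retained.
    }

    if (entries_.size() == capacity_) {
      // Retention policy: drop the oldest entry to make room.
      index_.erase(entries_.front().first);
      entries_.pop_front();
    }

    entries_.emplace_back(key, value);
    index_[key] = std::prev(entries_.end());
  }

  // Returns nullptr if the key was never archived or has been evicted.
  const V* get(const K& key) const
  {
    auto it = index_.find(key);
    return it == index_.end() ? nullptr : &it->second->second;
  }

  std::size_t size() const { return entries_.size(); }

private:
  typedef std::list<std::pair<K, V>> Entries;

  std::size_t capacity_;
  Entries entries_; // Oldest entry at the front.
  std::unordered_map<K, typename Entries::iterator> index_;
};
```

Because the class is templated, each component (master, slave, resource monitor) would hold its own instance, which matches the caveat in point 1.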





[jira] [Commented] (MESOS-1621) Docker run networking should be configurable and support bridge network

2014-09-15 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134595#comment-14134595
 ] 

Timothy Chen commented on MESOS-1621:
-

[~bhuvan] From the reviewboard it seems there aren't any major 
comments, so I personally think it can go into 0.20.1, as this seems to be 
blocking a lot of adoption of the Docker + Mesos feature.
Let me know if you think anything else needs to change in the patch, or if you 
think otherwise.

> Docker run networking should be configurable and support bridge network
> ---
>
> Key: MESOS-1621
> URL: https://issues.apache.org/jira/browse/MESOS-1621
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: Docker
>
> Currently, to easily support running executors in a Docker image, we hardcode 
> --net=host into Docker run so the slave and executor can reuse the same 
> mechanism to communicate, which is to pass the slave IP/PORT for the framework 
> to respond with its own hostname and port information to set up the tunnel.
> We want to see how to abstract this or even get rid of host networking 
> altogether if we have a good way to not rely on it.





[jira] [Commented] (MESOS-1788) Add c++11 feature whitelist to style guide

2014-09-15 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134620#comment-14134620
 ] 

Alexander Rukletsov commented on MESOS-1788:


Sorry, this wasn't on purpose. It sneaked in together with the parent change.

> Add c++11 feature whitelist to style guide
> --
>
> Key: MESOS-1788
> URL: https://issues.apache.org/jira/browse/MESOS-1788
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Dominic Hamon
>Priority: Minor
>
> We now support a subset of C++11 as checked by the {{configure}} script. It 
> would be useful to have documentation in the style guide regarding the white 
> list.





[jira] [Updated] (MESOS-1771) introduce unique_ptr

2014-09-15 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1771:
-
Description: 
* add unique_ptr to the configure check
* document use of unique_ptr in style guide
** use when possible, use std::move when necessary
* move raw pointers to Owned to establish ownership
* deprecate Owned in favour of unique_ptr


  was:
* add unique_ptr to the configure check
* document use of unique_ptr in style guide
** use when possible, use std::move when necessary
* deprecate Owned in favour of unique_ptr
* Move raw pointers with ownership over to unique_ptr


> introduce unique_ptr
> 
>
> Key: MESOS-1771
> URL: https://issues.apache.org/jira/browse/MESOS-1771
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Dominic Hamon
>Assignee: Dominic Hamon
>
> * add unique_ptr to the configure check
> * document use of unique_ptr in style guide
> ** use when possible, use std::move when necessary
> * move raw pointers to Owned to establish ownership
> * deprecate Owned in favour of unique_ptr
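
As an illustration of the "use when possible, use std::move when necessary" guideline, a minimal sketch follows. `Isolator`, `Containerizer`, and `createIsolator` are hypothetical names chosen for the example and do not refer to the real Mesos classes.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical types for illustration only.
struct Isolator
{
  explicit Isolator(std::string n) : name(std::move(n)) {}
  std::string name;
};

class Containerizer
{
public:
  // Taking unique_ptr by value documents that ownership is transferred.
  explicit Containerizer(std::unique_ptr<Isolator> isolator)
    : isolator_(std::move(isolator)) {}

  const std::string& isolatorName() const { return isolator_->name; }

private:
  std::unique_ptr<Isolator> isolator_;
};

// Factory handing ownership to the caller; returning a temporary needs
// no std::move. (C++11 has no std::make_unique, hence the explicit new.)
inline std::unique_ptr<Isolator> createIsolator(const std::string& name)
{
  return std::unique_ptr<Isolator>(new Isolator(name));
}
```

A caller writes `Containerizer c(std::move(isolator));`, after which `isolator` is guaranteed to be null, so the transfer of ownership is visible at the call site.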





[jira] [Closed] (MESOS-1792) add os::shell with pluggable IO descriptors

2014-09-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Domański closed MESOS-1792.
-
  Resolution: Invalid
Assignee: (was: Kamil Domański)
Target Version/s:   (was: 1.0.0, 0.21.0)

> add os::shell with pluggable IO descriptors
> ---
>
> Key: MESOS-1792
> URL: https://issues.apache.org/jira/browse/MESOS-1792
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.20.0
>Reporter: Kamil Domański
>  Labels: features, patch, performance
>
> Add an overload of os::shell that allows running a command with stdin and/or 
> stdout substituted by file descriptors passed as parameters.
> This will allow piping a stream of data in and out of a process, e.g. 
> directly from download to extraction, as proposed by [~bernd-mesos] in 
> MESOS-1667.
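
The idea can be sketched with plain POSIX calls. This is not stout's `os::shell`; `shellWithFds` and `capture` are hypothetical helpers, and the convention that `-1` leaves a stream untouched is an assumption made for the example.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `command` via /bin/sh with stdin/stdout
// replaced by the given descriptors (-1 leaves that stream untouched).
// Returns the command's exit status, or -1 on failure.
int shellWithFds(const std::string& command, int stdinFd, int stdoutFd)
{
  pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }

  if (pid == 0) {
    // Child: wire up the descriptors, then exec the shell.
    if (stdinFd != -1 && dup2(stdinFd, STDIN_FILENO) == -1) _exit(127);
    if (stdoutFd != -1 && dup2(stdoutFd, STDOUT_FILENO) == -1) _exit(127);
    execl("/bin/sh", "sh", "-c", command.c_str(), static_cast<char*>(nullptr));
    _exit(127); // Only reached if exec failed.
  }

  int status = 0;
  if (waitpid(pid, &status, 0) == -1) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Example use: capture a command's stdout through a pipe. Waiting for the
// child before reading is only safe for output smaller than the pipe
// buffer; a streaming use case would read concurrently.
std::string capture(const std::string& command)
{
  int fds[2];
  if (pipe(fds) == -1) {
    return "";
  }

  shellWithFds(command, -1, fds[1]);
  close(fds[1]); // Close the write end so read() sees end-of-file.

  std::string output;
  char buffer[256];
  ssize_t n;
  while ((n = read(fds[0], buffer, sizeof(buffer))) > 0) {
    output.append(buffer, static_cast<size_t>(n));
  }
  close(fds[0]);
  return output;
}
```

For the download-to-extraction case mentioned above, the write end of one pipe would be passed as stdout of the first command and the read end as stdin of the second.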





[jira] [Commented] (MESOS-1770) Docker with command shell=true should override entrypoint

2014-09-15 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134979#comment-14134979
 ] 

Bhuvan Arumugam commented on MESOS-1770:


[~tnachen] The above patch breaks the build. Can you please fix it and post a 
revised patch?

> Docker with command shell=true should override entrypoint
> -
>
> Key: MESOS-1770
> URL: https://issues.apache.org/jira/browse/MESOS-1770
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: docker
>
> Currently with the new CommandInfo there is a shell flag that, if 
> enabled, wraps the command with /bin/sh -c when calling docker run.
> However, we don't override the entrypoint, so when a user specifies an 
> image with an entrypoint and also sets shell=true, /bin/sh -c becomes 
> part of the arguments to the entrypoint.
> I don't think there is any case where users expect /bin/sh -c to be an 
> argument to the entrypoint, so to cover cases where the shell is needed for 
> expanding environment variables, we should also override the entrypoint.
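
The proposed behaviour can be sketched as an argv builder. `dockerRunArgv` is a hypothetical helper, not the Mesos Docker abstraction; the only external fact it relies on is that `docker run` accepts `--entrypoint` before the image name and passes everything after the image as arguments.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the argument vector for `docker run`.
std::vector<std::string> dockerRunArgv(
    const std::string& image,
    const std::string& command,
    bool shell)
{
  std::vector<std::string> argv = {"docker", "run"};

  if (shell) {
    // Override the image's entrypoint so /bin/sh is the process that
    // runs, and pass "-c <command>" as its arguments.
    argv.push_back("--entrypoint");
    argv.push_back("/bin/sh");
    argv.push_back(image);
    argv.push_back("-c");
    argv.push_back(command);
  } else {
    // Leave the entrypoint alone; the command becomes its argument.
    argv.push_back(image);
    if (!command.empty()) {
      argv.push_back(command);
    }
  }

  return argv;
}
```

With `shell` set, `dockerRunArgv("busybox", "echo $HOME", true)` corresponds to `docker run --entrypoint /bin/sh busybox -c 'echo $HOME'`, so the shell expands the variable inside the container regardless of the image's own entrypoint.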





[jira] [Resolved] (MESOS-1741) mesos-slave shouldn't fail if dockerd is down

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam resolved MESOS-1741.

  Resolution: Won't Fix
   Fix Version/s: (was: 0.20.1)
Target Version/s: 0.21.0  (was: 0.20.1)

Marking it as Won't Fix, as we are fine with not bringing up mesos-slave when 
dockerd is down.

> mesos-slave shouldn't fail if dockerd is down
> -
>
> Key: MESOS-1741
> URL: https://issues.apache.org/jira/browse/MESOS-1741
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.20.1
>Reporter: Bhuvan Arumugam
>
> When using {{--containerizers=docker,mesos}} for mesos-slave, it fails to come 
> up if dockerd is not running. It uses {{docker version}} to check docker 
> access, and {{docker version}} exits with status 1 if dockerd is not running.
> mesos-slave should launch with the other containerizer (mesos) if dockerd is down.
> {code}
> I0827 21:33:23.953763 19448 logging.cpp:142] INFO level logging started!
> I0827 21:33:23.954180 19448 main.cpp:126] Build: 2014-08-21 21:26:28 by 
> jenkins
> I0827 21:33:23.954190 19448 main.cpp:128] Version: 0.21.0
> I0827 21:33:23.954196 19448 main.cpp:135] Git SHA: 
> 70784a9f234b2902d6fee11298365d9b08756313
> Failed to create a containerizer: Could not create DockerContainerizer: 
> Failed to execute 'docker version': exited with status exited with status 1
> {code}





[jira] [Commented] (MESOS-1621) Docker run networking should be configurable and support bridge network

2014-09-15 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134992#comment-14134992
 ] 

Bhuvan Arumugam commented on MESOS-1621:


[~tnachen] sure. Updated reviewboard thread to remind volunteers to 
review/submit the patch.

Retaining it in 0.20.1.

> Docker run networking should be configurable and support bridge network
> ---
>
> Key: MESOS-1621
> URL: https://issues.apache.org/jira/browse/MESOS-1621
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: Docker
>
> Currently, to easily support running executors in a Docker image, we hardcode 
> --net=host into Docker run so the slave and executor can reuse the same 
> mechanism to communicate, which is to pass the slave IP/PORT for the framework 
> to respond with its own hostname and port information to set up the tunnel.
> We want to see how to abstract this or even get rid of host networking 
> altogether if we have a good way to not rely on it.





[jira] [Updated] (MESOS-1728) Libprocess: report bind parameters on failure

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1728:
---
Fix Version/s: (was: 0.21.0)
   0.20.1

> Libprocess: report bind parameters on failure
> -
>
> Key: MESOS-1728
> URL: https://issues.apache.org/jira/browse/MESOS-1728
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Nikita Vetoshkin
>Assignee: Nikita Vetoshkin
>Priority: Trivial
> Fix For: 0.20.1
>
>
> When you attempt to start a slave or master and there's another one already 
> running there, it would be nice to report the actual parameters of the 
> {{bind}} call that failed.
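
A minimal sketch of the requested behaviour, using plain BSD sockets rather than libprocess: `bindFailureMessage` and `tryBind` are hypothetical helpers, and the message wording is an assumption, not the text of the actual patch.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: formats a bind failure so that it names the
// address and port that were passed to bind().
std::string bindFailureMessage(
    const std::string& ip,
    uint16_t port,
    int errnum)
{
  return "Failed to bind on " + ip + ":" + std::to_string(port) + ": " +
         std::strerror(errnum);
}

// Hypothetical helper: tries to bind a TCP socket on ip:port and returns
// an empty string on success or a descriptive message on failure.
std::string tryBind(const std::string& ip, uint16_t port)
{
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);

  if (inet_pton(AF_INET, ip.c_str(), &addr.sin_addr) != 1) {
    return "Failed to parse bind address '" + ip + "'";
  }

  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd == -1) {
    return std::string("Failed to create socket: ") + std::strerror(errno);
  }

  std::string error;
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == -1) {
    // Capture errno before close() can overwrite it.
    error = bindFailureMessage(ip, port, errno);
  }

  close(fd);
  return error;
}
```

When a second master is started on a port that is already taken, the message then reads along the lines of "Failed to bind on 0.0.0.0:5050: Address already in use" (the exact error text depends on the platform).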





[jira] [Updated] (MESOS-1705) SubprocessTest.Status sometimes flakes out

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1705:
---
Fix Version/s: (was: 0.21.0)
   0.20.1

> SubprocessTest.Status sometimes flakes out
> --
>
> Key: MESOS-1705
> URL: https://issues.apache.org/jira/browse/MESOS-1705
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.20.0
>Reporter: Timothy St. Clair
>Assignee: Vinod Kone
>Priority: Minor
>  Labels: build
> Fix For: 0.20.1
>
>
> It's a pretty rare event, but it has happened more than once.  
> [ RUN  ] SubprocessTest.Status
> *** Aborted at 1408023909 (unix time) try "date -d @1408023909" if you are 
> using GNU date ***
> PC: @   0x35700094b1 (unknown)
> *** SIGTERM (@0x3e841d8) received by PID 16872 (TID 0x7fa9ea426780) from 
> PID 16856; stack trace: ***
> @   0x3570435cb0 (unknown)
> @   0x35700094b1 (unknown)
> @   0x3570009d9f (unknown)
> @   0x357000e726 (unknown)
> @   0x3570015185 (unknown)
> @   0x5ead42 process::childMain()
> @   0x5ece8d std::_Function_handler<>::_M_invoke()
> @   0x5eac9c process::defaultClone()
> @   0x5ebbd4 process::subprocess()
> @   0x55a229 process::subprocess()
> @   0x55a846 process::subprocess()
> @   0x54224c SubprocessTest_Status_Test::TestBody()
> @ 0x7fa9ea460323 (unknown)
> @ 0x7fa9ea455b67 (unknown)
> @ 0x7fa9ea455c0e (unknown)
> @ 0x7fa9ea455d15 (unknown)
> @ 0x7fa9ea4593a8 (unknown)
> @ 0x7fa9ea459647 (unknown)
> @   0x422466 main
> @   0x3570421d65 (unknown)
> @   0x4260bd (unknown)
> [   OK ] SubprocessTest.Status (153 ms)





[jira] [Updated] (MESOS-1621) Docker run networking should be configurable and support bridge network

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1621:
---
Fix Version/s: 0.20.1

> Docker run networking should be configurable and support bridge network
> ---
>
> Key: MESOS-1621
> URL: https://issues.apache.org/jira/browse/MESOS-1621
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: Docker
> Fix For: 0.20.1
>
>
> Currently, to easily support running executors in a Docker image, we hardcode 
> --net=host into Docker run so the slave and executor can reuse the same 
> mechanism to communicate, which is to pass the slave IP/PORT for the framework 
> to respond with its own hostname and port information to set up the tunnel.
> We want to see how to abstract this or even get rid of host networking 
> altogether if we have a good way to not rely on it.





[jira] [Updated] (MESOS-1770) Docker with command shell=true should override entrypoint

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1770:
---
Fix Version/s: 0.20.1

> Docker with command shell=true should override entrypoint
> -
>
> Key: MESOS-1770
> URL: https://issues.apache.org/jira/browse/MESOS-1770
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: docker
> Fix For: 0.20.1
>
>
> Currently with the new CommandInfo there is a shell flag that, if 
> enabled, wraps the command with /bin/sh -c when calling docker run.
> However, we don't override the entrypoint, so when a user specifies an 
> image with an entrypoint and also sets shell=true, /bin/sh -c becomes 
> part of the arguments to the entrypoint.
> I don't think there is any case where users expect /bin/sh -c to be an 
> argument to the entrypoint, so to cover cases where the shell is needed for 
> expanding environment variables, we should also override the entrypoint.





[jira] [Updated] (MESOS-1764) Build Fixes from 0.20 release

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1764:
---
Fix Version/s: 0.20.1

> Build Fixes from 0.20 release
> -
>
> Key: MESOS-1764
> URL: https://issues.apache.org/jira/browse/MESOS-1764
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.20.0
>Reporter: Timothy St. Clair
>Assignee: Timothy St. Clair
> Fix For: 0.20.1
>
>
> This ticket is a catch-all for minor issues caught during a rebase and 
> testing.
> + Add package configuration file to deployment
> + Updates deploy_dir from localstatedir to sysconfdir





[jira] [Updated] (MESOS-1729) LogZooKeeperTest.WriteRead fails due to SIGPIPE (escalated to SIGABRT)

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1729:
---
Fix Version/s: (was: 0.21.0)
   0.20.1

> LogZooKeeperTest.WriteRead fails due to SIGPIPE (escalated to SIGABRT)
> --
>
> Key: MESOS-1729
> URL: https://issues.apache.org/jira/browse/MESOS-1729
> Project: Mesos
>  Issue Type: Bug
>  Components: build, test
>Affects Versions: 0.21.0
> Environment: OSX 10.9.4, clang 3.4.
> Same or very similar results on Linux
>Reporter: Till Toenshoff
>Assignee: Vinod Kone
>  Labels: test
> Fix For: 0.20.1
>
>
> The following is reported and 100% reproducible when running {{make check}} 
> on my OSX box.
> {noformat}
> [ RUN  ] LogZooKeeperTest.WriteRead
> I0821 21:18:34.960811 2078368528 jvm.cpp:572] Looking up method 
> (Ljava/lang/String;)V
> I0821 21:18:34.960934 2078368528 jvm.cpp:572] Looking up method 
> deleteOnExit()V
> I0821 21:18:34.961335 2078368528 jvm.cpp:572] Looking up method 
> (Ljava/io/File;Ljava/io/File;)V
> log4j:WARN No appenders could be found for logger 
> (org.apache.zookeeper.server.persistence.FileTxnSnapLog).
> log4j:WARN Please initialize the log4j system properly.
> I0821 21:18:35.004449 2078368528 jvm.cpp:572] Looking up method ()V
> I0821 21:18:35.005053 2078368528 jvm.cpp:572] Looking up method 
> (Lorg/apache/zookeeper/server/persistence/FileTxnSnapLog;Lorg/apache/zookeeper/server/ZooKeeperServer$DataTreeBuilder;)V
> I0821 21:18:35.025753 2078368528 jvm.cpp:572] Looking up method ()V
> I0821 21:18:35.032670 2078368528 jvm.cpp:572] Looking up method (I)V
> I0821 21:18:35.032873 2078368528 jvm.cpp:572] Looking up method 
> configure(Ljava/net/InetSocketAddress;I)V
> I0821 21:18:35.038020 2078368528 jvm.cpp:572] Looking up method 
> startup(Lorg/apache/zookeeper/server/ZooKeeperServer;)V
> I0821 21:18:35.093870 2078368528 jvm.cpp:572] Looking up method 
> getClientPort()I
> I0821 21:18:35.093925 2078368528 zookeeper_test_server.cpp:158] Started 
> ZooKeeperTestServer on port 52772
> I0821 21:18:35.094081 2078368528 log_tests.cpp:1945] Using temporary 
> directory '/tmp/LogZooKeeperTest_WriteRead_F8UzYv'
> I0821 21:18:35.095954 2078368528 leveldb.cpp:176] Opened db in 1815us
> I0821 21:18:35.096392 2078368528 leveldb.cpp:183] Compacted db in 428us
> I0821 21:18:35.096420 2078368528 leveldb.cpp:198] Created db iterator in 7us
> I0821 21:18:35.096432 2078368528 leveldb.cpp:204] Seeked to beginning of db 
> in 8us
> I0821 21:18:35.096442 2078368528 leveldb.cpp:273] Iterated through 0 keys in 
> the db in 8us
> I0821 21:18:35.096462 2078368528 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0821 21:18:35.097043 107220992 leveldb.cpp:306] Persisting metadata (8 
> bytes) to leveldb took 184us
> I0821 21:18:35.097075 107220992 replica.cpp:320] Persisted replica status to 
> VOTING
> I0821 21:18:35.099768 2078368528 leveldb.cpp:176] Opened db in 1673us
> I0821 21:18:35.100049 2078368528 leveldb.cpp:183] Compacted db in 270us
> I0821 21:18:35.100070 2078368528 leveldb.cpp:198] Created db iterator in 6us
> I0821 21:18:35.100080 2078368528 leveldb.cpp:204] Seeked to beginning of db 
> in 5us
> I0821 21:18:35.100088 2078368528 leveldb.cpp:273] Iterated through 0 keys in 
> the db in 5us
> I0821 21:18:35.100097 2078368528 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0821 21:18:35.100411 108294144 leveldb.cpp:306] Persisting metadata (8 
> bytes) to leveldb took 159us
> I0821 21:18:35.100435 108294144 replica.cpp:320] Persisted replica status to 
> VOTING
> I0821 21:18:35.101984 2078368528 leveldb.cpp:176] Opened db in 1224us
> I0821 21:18:35.102934 2078368528 leveldb.cpp:183] Compacted db in 942us
> I0821 21:18:35.102958 2078368528 leveldb.cpp:198] Created db iterator in 8us
> I0821 21:18:35.102972 2078368528 leveldb.cpp:204] Seeked to beginning of db 
> in 8us
> I0821 21:18:35.102984 2078368528 leveldb.cpp:273] Iterated through 1 keys in 
> the db in 9us
> I0821 21:18:35.102994 2078368528 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> 2014-08-21 21:18:35,103:6420(0x106641000):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2014-08-21 21:18:35,103:6420(0x106641000):ZOO_INFO@log_env@716: Client 
> environment:host.name=lobomacpro2.fritz.box
> 2014-08-21 21:18:35,103:6420(0x106641000):ZOO_INFO@log_env@723: Client 
> environment:os.name=Darwin
> 2014-08-21 21:18:35,103:6420(0x106641000):ZOO_INFO@log_env@724: Client 
> environment:os.arch=13.3.0
> 2014-08-21 21:18:35,103:6420(0x106641000):ZOO_INFO@log_env@725: Client 
> environment:os.version=Darwin Kernel Version 13.

[jira] [Updated] (MESOS-1740) Bad error message when docker containerizer isn't enabled

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1740:
---
Fix Version/s: 0.20.1

> Bad error message when docker containerizer isn't enabled
> -
>
> Key: MESOS-1740
> URL: https://issues.apache.org/jira/browse/MESOS-1740
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Jay Buffington
>Assignee: Timothy Chen
>Priority: Minor
>  Labels: docker
> Fix For: 0.20.1
>
>
> If I set container in TaskInfo's executor (aka DockerInfo) but I do not start 
> the slave with {{--containerizers=docker,...}} then I get this error message 
> in the log:
> {noformat}
> E0827 17:53:16.422735 20090 slave.cpp:2491] Container 'xxx' for executor 
> 'yyy' of framework 'zzz' failed to start: TaskInfo/ExecutorInfo not supported
> {noformat}
> A better error message would have been:
> {noformat}
> No enabled containerizers could create a container for the provided 
> TaskInfo/ExecutorInfo message.  Enabled containerizers are: mesos.
> {noformat}
> An even better error message would have been:
> {noformat}
> DockerInfo was sent, but docker containerizer is not enabled.  Try adding 
> --containerizers=docker,... to command line args
> {noformat}
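
The two proposed messages can be produced by one small function that knows which containerizers are enabled. This is a sketch only: `noContainerizerError` and its parameters are hypothetical, and the wording follows the suggestions above rather than any committed patch.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: composes the more descriptive error proposed in
// the ticket. `enabled` lists the containerizer names the slave was
// started with; `dockerInfoSet` says whether the task carried DockerInfo.
std::string noContainerizerError(
    const std::vector<std::string>& enabled,
    bool dockerInfoSet)
{
  bool dockerEnabled = false;
  std::string list;

  for (const std::string& name : enabled) {
    if (name == "docker") {
      dockerEnabled = true;
    }
    if (!list.empty()) {
      list += ",";
    }
    list += name;
  }

  if (dockerInfoSet && !dockerEnabled) {
    // The most specific message: name the missing containerizer.
    return "DockerInfo was sent, but the docker containerizer is not"
           " enabled. Enabled containerizers are: " + list + ".";
  }

  return "No enabled containerizers could create a container for the"
         " provided TaskInfo/ExecutorInfo message."
         " Enabled containerizers are: " + list + ".";
}
```

Listing the enabled containerizers in both branches lets the operator compare the log line directly against the slave's command line.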





[jira] [Updated] (MESOS-1762) Avoid docker pull on each container run

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1762:
---
Fix Version/s: 0.20.1

> Avoid docker pull on each container run
> ---
>
> Key: MESOS-1762
> URL: https://issues.apache.org/jira/browse/MESOS-1762
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Timothy Chen
> Fix For: 0.20.1
>
>
> Currently the docker containerizer does a docker pull on each run, and this 
> has several downsides:
> 1. Not able to run local images
> 2. Requires contacting the registry server on each run, so docker run 
> becomes unavailable if the registry server is down. It also limits 
> scalability.
> We want to avoid doing a pull every time. The downside, of course, is that 
> images without explicit tags (:latest) will not get the most recent version, 
> but this is the same behavior as calling docker run on the CLI.
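
The intended control flow is "check locally first, pull only on a miss". A minimal sketch follows, with the Docker interactions abstracted behind callbacks so the logic stands alone; `ImageOps` and `ensureImage` are hypothetical names, and how the local check is performed (for example via `docker inspect`) is left as an assumption.

```cpp
#include <functional>
#include <string>

// Hypothetical callbacks standing in for "is the image present locally"
// and "pull the image from the registry".
struct ImageOps
{
  std::function<bool(const std::string&)> existsLocally;
  std::function<bool(const std::string&)> pull;
};

// Returns true if the image is available locally after the call.
bool ensureImage(const ImageOps& ops, const std::string& image)
{
  if (ops.existsLocally(image)) {
    return true; // No registry round trip; local-only images work too.
  }
  return ops.pull(image); // Contact the registry only on a miss.
}
```

As noted above, the trade-off is that an image already cached under a moving tag such as `:latest` is never refreshed by this path.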





[jira] [Updated] (MESOS-1688) No offers if no memory is allocatable

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1688:
---
Fix Version/s: 0.20.1

> No offers if no memory is allocatable
> -
>
> Key: MESOS-1688
> URL: https://issues.apache.org/jira/browse/MESOS-1688
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.18.1, 0.18.2, 0.19.0, 0.19.1
>Reporter: Martin Weindel
>Priority: Critical
> Fix For: 0.20.1
>
>
> The [Spark 
> scheduler|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala]
>  allocates memory only for the executor and cpu only for its tasks.
> So it can happen that nearly all memory is allocated by Spark 
> executors, but all cpu resources are idle.
> In this case Mesos does not offer resources anymore, as less than MIN_MEM 
> (=32MB) memory is allocatable.
> This effectively causes a deadlock in the Spark job, as it is not offered 
> cpu resources needed for launching new tasks.
> see {{HierarchicalAllocatorProcess::allocatable(const Resources&)}} called in 
> {{HierarchicalAllocatorProcess::allocate(const hashset&)}}
> {code}
> template <class RoleSorter, class FrameworkSorter>
> bool
> HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocatable(
>     const Resources& resources)
> {
> ...
>   Option<double> cpus = resources.cpus();
>   Option<Bytes> mem = resources.mem();
>   if (cpus.isSome() && mem.isSome()) {
>     return cpus.get() >= MIN_CPUS && mem.get() > MIN_MEM;
>   }
>   return false;
> }
> {code}
> A possible solution may be to completely drop the condition on allocatable 
> memory.
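
The effect of that suggestion can be shown with a self-contained sketch. The `Resources` struct here is a simplified stand-in rather than the Mesos class, the value of `MIN_CPUS` is an assumption for the example (only the 32MB memory minimum is stated above), and `allocatableCpusOnly` illustrates the suggestion in this ticket rather than whatever fix is eventually committed.

```cpp
// Simplified stand-ins for illustration; not the Mesos types.
const double MIN_CPUS = 0.1;    // Assumed value for the example.
const double MIN_MEM_MB = 32.0; // 32MB, as stated in the ticket.

struct Resources
{
  double cpus;  // 0 when no cpus are left on the slave.
  double memMB; // 0 when no memory is left on the slave.
};

// Behaviour described in the ticket: both cpus and memory must be
// available, so idle cpus are never offered once memory runs out.
bool allocatableStrict(const Resources& resources)
{
  return resources.cpus >= MIN_CPUS && resources.memMB > MIN_MEM_MB;
}

// Suggested change: drop the condition on allocatable memory, so a
// framework that only needs cpus for its tasks still receives offers.
bool allocatableCpusOnly(const Resources& resources)
{
  return resources.cpus >= MIN_CPUS;
}
```

For a slave with 4 idle cpus and 16MB of free memory, the strict check refuses to make an offer while the cpu-only check allows one, which is exactly the Spark scenario described above.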





[jira] [Updated] (MESOS-1758) Freezer failure leads to lost task during container destruction.

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1758:
---
Fix Version/s: (was: 0.21.0)
   0.20.1

> Freezer failure leads to lost task during container destruction.
> 
>
> Key: MESOS-1758
> URL: https://issues.apache.org/jira/browse/MESOS-1758
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Benjamin Mahler
>Assignee: Vinod Kone
> Fix For: 0.20.1
>
>
> In the past we've seen numerous issues around the freezer. Lately, on the 
> 2.6.44 kernel, we've seen issues where we're unable to freeze the cgroup:
> (1) An oom occurs.
> (2) No indication of oom in the kernel logs.
> (3) The slave is unable to freeze the cgroup.
> (4) The task is marked as lost.
> {noformat}
> I0903 16:46:24.956040 25469 mem.cpp:575] Memory limit exceeded: Requested: 
> 15488MB Maximum Used: 15488MB
> MEMORY STATISTICS:
> cache 7958691840
> rss 8281653248
> mapped_file 9474048
> pgpgin 4487861
> pgpgout 522933
> pgfault 2533780
> pgmajfault 11
> inactive_anon 0
> active_anon 8281653248
> inactive_file 7631708160
> active_file 326852608
> unevictable 0
> hierarchical_memory_limit 16240345088
> total_cache 7958691840
> total_rss 8281653248
> total_mapped_file 9474048
> total_pgpgin 4487861
> total_pgpgout 522933
> total_pgfault 2533780
> total_pgmajfault 11
> total_inactive_anon 0
> total_active_anon 8281653248
> total_inactive_file 7631728640
> total_active_file 326852608
> total_unevictable 0
> I0903 16:46:24.956848 25469 containerizer.cpp:1041] Container 
> bbb9732a-d600-4c1b-b326-846338c608c3 has reached its limit for resource 
> mem(*):1.62403e+10 and will be terminated
> I0903 16:46:24.957427 25469 containerizer.cpp:909] Destroying container 
> 'bbb9732a-d600-4c1b-b326-846338c608c3'
> I0903 16:46:24.958664 25481 cgroups.cpp:2192] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:46:34.959529 25488 cgroups.cpp:2209] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:46:34.962070 25482 cgroups.cpp:1404] Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
> 1.710848ms
> I0903 16:46:34.962658 25479 cgroups.cpp:2192] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:46:44.963349 25488 cgroups.cpp:2209] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:46:44.965631 25472 cgroups.cpp:1404] Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
> 1.588224ms
> I0903 16:46:44.966356 25472 cgroups.cpp:2192] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:46:54.967254 25488 cgroups.cpp:2209] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:46:56.008447 25475 cgroups.cpp:1404] Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
> 2.15296ms
> I0903 16:46:56.009071 25466 cgroups.cpp:2192] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:47:06.010329 25488 cgroups.cpp:2209] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:47:06.012538 25467 cgroups.cpp:1404] Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
> 1.643008ms
> I0903 16:47:06.013216 25467 cgroups.cpp:2192] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:47:12.516348 25480 slave.cpp:3030] Current usage 9.57%. Max allowed 
> age: 5.630238827780799days
> I0903 16:47:16.015192 25488 cgroups.cpp:2209] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:47:16.017043 25486 cgroups.cpp:1404] Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 
> 1.511168ms
> I0903 16:47:16.017555 25480 cgroups.cpp:2192] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
> I0903 16:47:19.862746 25483 http.cpp:245] HTTP request for 
> '/slave(1)/stats.json'
> E0903 16:47:24.960055 25472 slave.cpp:2557] Termination of executor 'E' of 
> framework '201104070004-002563-' failed: Failed to destroy container: 
> discarded future
> I0903 16:47:24.962054 25472 slave.cpp:2087] Handling status update TASK_LOST 
> (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 
> 201104070004-002563- from @0.0.0.0:0
> I0903 16:47:24.963470 25469 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' 
> to 128MB for containe

[jira] [Updated] (MESOS-1732) Mesos containerizer doesn't reject tasks with container info set

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1732:
---
Fix Version/s: (was: 0.21.0)

> Mesos containerizer doesn't reject tasks with container info set
> 
>
> Key: MESOS-1732
> URL: https://issues.apache.org/jira/browse/MESOS-1732
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.20.0
>Reporter: Vinod Kone
>Assignee: Vinod Kone
> Fix For: 0.20.1
>
>
> DockerContainerizer doesn't accept tasks that do not have 
> TaskInfo.ContainerInfo set, but MesosContainerizer accepts tasks even if 
> TaskInfo.ContainerInfo is set. It should not.
> This means that currently, if a slave has to support both docker and non-docker 
> tasks, the order of containerizers in --containerizers is important, viz., 
> "docker,mesos" works but "mesos,docker" doesn't work when running a 
> docker task.
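
To illustrate why the order matters, here is a minimal sketch, assuming the slave tries each containerizer in the order given by --containerizers and uses the first one that accepts the task. The types and names (`Task`, `Containerizer`, `pick`, `kDocker`, `kMesosCurrent`, `kMesosStrict`) are hypothetical and are not the Mesos API; only the acceptance rules are taken from the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of a task: only tracks whether
// TaskInfo.ContainerInfo is set.
struct Task {
  bool hasContainerInfo;
};

// Hypothetical model of a containerizer's acceptance rules.
struct Containerizer {
  std::string name;
  bool acceptsWithContainerInfo;
  bool acceptsWithoutContainerInfo;

  bool accepts(const Task& task) const {
    return task.hasContainerInfo ? acceptsWithContainerInfo
                                 : acceptsWithoutContainerInfo;
  }
};

// Docker: only accepts tasks that have ContainerInfo set.
const Containerizer kDocker{"docker", true, false};
// Mesos as described in this ticket: accepts everything.
const Containerizer kMesosCurrent{"mesos", true, true};
// Mesos as the ticket proposes: rejects tasks with ContainerInfo set.
const Containerizer kMesosStrict{"mesos", false, true};

// Assumed selection rule: the first containerizer in the configured
// order that accepts the task wins. Returns "" if none accepts.
std::string pick(const std::vector<Containerizer>& order, const Task& task) {
  for (const Containerizer& c : order) {
    if (c.accepts(task)) {
      return c.name;
    }
  }
  return "";
}
```

Under this model, `pick({kMesosCurrent, kDocker}, Task{true})` selects "mesos" for a task that carries ContainerInfo, while `pick({kDocker, kMesosCurrent}, Task{true})` selects "docker". With `kMesosStrict` in place of `kMesosCurrent`, both orders select "docker" for that task and "mesos" for a task without ContainerInfo, so the order no longer matters.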





[jira] [Updated] (MESOS-1755) Add docker support to mesos-execute

2014-09-15 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1755:
---
Fix Version/s: (was: 0.21.0)
   0.20.1

> Add docker support to mesos-execute
> ---
>
> Key: MESOS-1755
> URL: https://issues.apache.org/jira/browse/MESOS-1755
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Timothy Chen
> Fix For: 0.20.1
>
>
> The fix for this is already committed at https://reviews.apache.org/r/24808/. 
> I'm creating this ticket to track that this patch gets included in the 0.20.1 
> release, since the Singularity framework apparently depends on this patch to 
> work with Docker (!?): 
> https://groups.google.com/forum/#!topic/singularity-users/GzzswbpI92E
> [~tnachen]: Can you confirm whether this has to be included in 0.20.1?





[jira] [Commented] (MESOS-1688) No offers if no memory is allocatable

2014-09-15 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135007#comment-14135007
 ] 

Bhuvan Arumugam commented on MESOS-1688:


[~tstclair] The patch has received several review comments. I'll wait a day 
before onboarding this bug onto the 0.20.1 train.

> No offers if no memory is allocatable
> -
>
> Key: MESOS-1688
> URL: https://issues.apache.org/jira/browse/MESOS-1688
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.18.1, 0.18.2, 0.19.0, 0.19.1
>Reporter: Martin Weindel
>Priority: Critical
> Fix For: 0.20.1
>
>
> The [Spark 
> scheduler|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala]
>  allocates memory only for the executor and cpu only for its tasks.
> So it can happen that nearly all memory is allocated by Spark executors 
> while all cpu resources are idle.
> In this case Mesos no longer offers resources, as less than MIN_MEM 
> (=32MB) of memory is allocatable.
> This effectively causes a deadlock in the Spark job, as it is not offered 
> the cpu resources needed for launching new tasks.
> see {{HierarchicalAllocatorProcess::allocatable(const Resources&)}} called in 
> {{HierarchicalAllocatorProcess::allocate(const hashset<SlaveID>&)}}
> {code}
> template <class RoleSorter, class FrameworkSorter>
> bool
> HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocatable(
>     const Resources& resources)
> {
> ...
>   Option<double> cpus = resources.cpus();
>   Option<Bytes> mem = resources.mem();
>   if (cpus.isSome() && mem.isSome()) {
>     return cpus.get() >= MIN_CPUS && mem.get() > MIN_MEM;
>   }
>   return false;
> }
> {code}
> A possible solution may be to completely drop the condition on allocatable 
> memory.
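
The scenario in this report can be reproduced with a small standalone sketch. It mirrors only the final condition of the quoted function, using plain doubles instead of the `Option` wrappers. `FreeResources`, `allocatableCurrent`, `allocatableCpuOnly`, and `checkSparkScenario` are hypothetical names; the 32 MB threshold comes from the report, while the `MIN_CPUS` value used here is an assumption.

```cpp
#include <cassert>

// Illustrative thresholds. MIN_MEM_MB = 32 is taken from the report;
// the MIN_CPUS value is an assumption made for this sketch.
constexpr double MIN_CPUS = 0.1;
constexpr double MIN_MEM_MB = 32.0;

// Hypothetical stand-in for the unallocated resources on one slave.
struct FreeResources {
  double cpus;
  double memMB;
};

// Mirrors the quoted condition: both cpu and memory must exceed
// their minimum for the slave's resources to be offered.
bool allocatableCurrent(const FreeResources& r) {
  return r.cpus >= MIN_CPUS && r.memMB > MIN_MEM_MB;
}

// The change suggested in the report: drop the memory condition.
bool allocatableCpuOnly(const FreeResources& r) {
  return r.cpus >= MIN_CPUS;
}

// Scenario from the report: executors hold almost all memory
// (16 MB left), while 8 cpus sit idle.
void checkSparkScenario() {
  const FreeResources slave{8.0, 16.0};
  assert(!allocatableCurrent(slave));  // no offer, so no new tasks
  assert(allocatableCpuOnly(slave));   // the idle cpus would be offered
}
```

Calling `checkSparkScenario()` passes both assertions: with the current condition the idle cpus are never offered, and with the memory condition dropped they are. Whether offering cpu-only resources is acceptable for other frameworks is a separate question that this sketch does not address.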


