[jira] [Created] (MESOS-2999) Implement a linux/iptables isolator
Stephan Erb created MESOS-2999: -- Summary: Implement a linux/iptables isolator Key: MESOS-2999 URL: https://issues.apache.org/jira/browse/MESOS-2999 Project: Mesos Issue Type: Story Components: containerization, isolation Reporter: Stephan Erb As a user of Mesos, I would like to have control over inbound and outbound network communication of a launched Mesos container. The intention is to gain improved security and isolation of user processes on the network level. *Example Use Cases*: * Prevent outgoing connections to external endpoints which have not been whitelisted (e.g., deny internet connections, or only allow connections to this one production database but not the others) * Prevent incoming connections from external systems or containers which have not been whitelisted (e.g., don't allow a rogue or even hijacked service to interfere with another service) The last use case is somewhat tricky due to the dynamic nature of a Mesos cluster but might be achieved using the available [DiscoveryInfo|https://github.com/apache/mesos/blob/master/docs/app-framework-development-guide.md#service-discovery] (e.g., block all connections from foreign environments). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
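The first use case above (only whitelisted outgoing connections) can be sketched as a small rule generator. This is only an illustration, not the proposed isolator: the chain name and the (destination, port) whitelist format are hypothetical, and the ticket does not say how a container's traffic would be directed into such a chain. The iptables options used (-N, -A, -d, -p, --dport, -j) are the standard ones.

```python
import ipaddress


def egress_whitelist_rules(chain, allowed):
    """Build iptables argument lists that accept only whitelisted destinations.

    chain   -- hypothetical per-container chain name (an assumption)
    allowed -- iterable of (destination, port) pairs; port may be None
               to allow every port on that destination
    """
    # Create the chain first; jumping into it (e.g. from OUTPUT) is left
    # out because the ticket does not specify how traffic is matched.
    rules = [["iptables", "-N", chain]]
    for destination, port in allowed:
        # ip_network() validates the address and raises ValueError on bad input.
        network = ipaddress.ip_network(destination)
        rule = ["iptables", "-A", chain, "-d", str(network)]
        if port is not None:
            rule += ["-p", "tcp", "--dport", str(port)]
        rule += ["-j", "ACCEPT"]
        rules.append(rule)
    # Everything that did not match a whitelist entry is rejected.
    rules.append(["iptables", "-A", chain, "-j", "REJECT"])
    return rules


if __name__ == "__main__":
    # Allow only one (made-up) production database endpoint.
    for rule in egress_whitelist_rules("mesos-example", [("10.0.0.5", 5432)]):
        print(" ".join(rule))
```

The function only builds the commands and never executes them, so it can be inspected without root privileges.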
[jira] [Created] (MESOS-3001) Create a demo HTTP API client
Marco Massenzio created MESOS-3001: -- Summary: Create a demo HTTP API client Key: MESOS-3001 URL: https://issues.apache.org/jira/browse/MESOS-3001 Project: Mesos Issue Type: Bug Components: framework Reporter: Marco Massenzio Assignee: Marco Massenzio We want to create a simple demo HTTP API client (in Java or Python) that can serve as an example framework for people who want to use the new API for their frameworks. The scope should be fairly limited (e.g., launching a simple container task?) but sufficient to exercise most of the new API endpoint messages/capabilities. Scope: TBD Non-Goals: - create a best-of-breed framework to deliver any specific functionality; - create an integration test for the HTTP API.
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617157#comment-14617157 ] haosdent commented on MESOS-2199: - Hi, [~idownes]. My test steps: {code} cd mesos/ rm -rf build mkdir -p build ./bootstrap cd build ../configure make check -j8 GTEST_FILTER=-* sudo ./bin/mesos-tests.sh --verbose --gtest_filter=SlaveTest.ROOT_RunTaskWithCommandInfoWithUser {code} The test passes for me. My test environment: {code} CentOS release 6.5 (Final) Linux test-2 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux g++ (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15) {code} Could you show me your test environment, or enable the verbose flags to display the log? Thank you in advance. Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser --- Key: MESOS-2199 URL: https://issues.apache.org/jira/browse/MESOS-2199 Project: Mesos Issue Type: Bug Components: test Reporter: Ian Downes Assignee: haosdent Labels: mesosphere Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))...
Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [--] 1 test from SlaveTest (10641 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (10658 ms total) {noformat}
[jira] [Updated] (MESOS-2332) Report per-container metrics for network bandwidth throttling
[ https://issues.apache.org/jira/browse/MESOS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-2332: -- Shepherd: Jie Yu (was: Ian Downes) Report per-container metrics for network bandwidth throttling - Key: MESOS-2332 URL: https://issues.apache.org/jira/browse/MESOS-2332 Project: Mesos Issue Type: Improvement Components: isolation Reporter: Paul Brett Assignee: Paul Brett Labels: features, twitter Fix For: 0.23.0 Export metrics from the network isolation to identify scope and duration of container throttling. Packet loss can be identified from the overlimits and requeues fields of the htb qdisc report for the virtual interface, e.g. {noformat} $ tc -s -d qdisc show dev mesos19223 qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 158213287452 bytes 1030876393 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc ingress : parent :fff1 Sent 119381747824 bytes 1144549901 pkt (dropped 2044879, overlimits 0 requeues 0) backlog 0b 0p requeues 0 {noformat} Note that since a packet can be examined multiple times before transmission, overlimits can exceed total packets sent. Add to the port_mapping isolator usage() and the container statistics protobuf. Carefully consider the naming (esp tx/rx) + commenting of the protobuf fields so it's clear what these represent and how they are different to the existing dropped packet counts from the network stack. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
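To make the counters named in this ticket concrete, the sketch below pulls the sent, dropped, overlimits and requeues values out of "tc -s" output like the sample quoted above. It is an illustration only: the ticket asks for these numbers to be exported from the port_mapping isolator's usage(), which is C++, and the function name and dictionary keys here are assumptions, not proposed protobuf field names.

```python
import re

# Matches the per-qdisc statistics line printed by "tc -s qdisc show".
STATS = re.compile(
    r"Sent (?P<bytes>\d+) bytes (?P<packets>\d+) pkt "
    r"\(dropped (?P<dropped>\d+), overlimits (?P<overlimits>\d+) "
    r"requeues (?P<requeues>\d+)\)"
)

# Statistics lines copied from the sample output quoted in the ticket.
SAMPLE = (
    "Sent 158213287452 bytes 1030876393 pkt "
    "(dropped 0, overlimits 0 requeues 0) "
    "Sent 119381747824 bytes 1144549901 pkt "
    "(dropped 2044879, overlimits 0 requeues 0)"
)


def parse_qdisc_stats(text):
    """Return one dict of integer counters per statistics line, in order."""
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in STATS.finditer(text)
    ]


if __name__ == "__main__":
    for counters in parse_qdisc_stats(SAMPLE):
        print(counters)
```

In the sample, the second entry belongs to the ingress qdisc and is the one reporting dropped packets. As the ticket notes, overlimits can exceed the number of packets sent, so it should not be read as a packet count.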
[jira] [Created] (MESOS-3007) Support systemd with Mesos containerizer
Artem Harutyunyan created MESOS-3007: Summary: Support systemd with Mesos containerizer Key: MESOS-3007 URL: https://issues.apache.org/jira/browse/MESOS-3007 Project: Mesos Issue Type: Epic Reporter: Artem Harutyunyan Fix For: 0.24.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3003) Support mounting in default configuration files/volumes into every new container
Timothy Chen created MESOS-3003: --- Summary: Support mounting in default configuration files/volumes into every new container Key: MESOS-3003 URL: https://issues.apache.org/jira/browse/MESOS-3003 Project: Mesos Issue Type: Improvement Reporter: Timothy Chen Most container images leave out system configuration (e.g., /etc/*) and expect the container runtime to mount in specific configuration as needed, such as /etc/resolv.conf from the host. We need to support mounting in specific configuration files for the command executor to work, and also allow the user to optionally define other configuration files to mount in via flags.
[jira] [Created] (MESOS-3000) Failing test - NsTest.ROOT_setns
Ian Downes created MESOS-3000: - Summary: Failing test - NsTest.ROOT_setns Key: MESOS-3000 URL: https://issues.apache.org/jira/browse/MESOS-3000 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.23.0 Reporter: Ian Downes Priority: Blocker Appears to be the same issue plaguing MESOS-2199 {noformat} [root@hostname build]# MESOS_VERBOSE=1 ./bin/mesos-tests.sh --gtest_filter=NsTest.ROOT_setns ... [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from NsTest [ RUN ] NsTest.ROOT_setns ABORT: (../../../3rdparty/libprocess/src/subprocess.cpp:163): Failed to os::execvpe in childMain: Permission denied*** Aborted at 1436292540 (unix time) try date -d @1436292540 if you are using GNU date *** PC: @ 0x7f7a1229e625 __GI_raise *** SIGABRT (@0xfffe0001) received by PID 1 (TID 0x7f7a19afc820) from PID 1; stack trace: *** @ 0x7f7a13421710 (unknown) @ 0x7f7a1229e625 __GI_raise @ 0x7f7a1229fe05 __GI_abort @ 0x860ba1 (unknown) @ 0x860bcf (unknown) @ 0x7f7a1826f118 (unknown) @ 0x7f7a18274594 (unknown) @ 0x7f7a18273b88 (unknown) @ 0x7f7a18273098 (unknown) @ 0x1180720 (unknown) @ 0x117a5d7 (unknown) @ 0x7f7a123548fd clone ../../src/tests/ns_tests.cpp:121: Failure Failed to wait 15secs for status [ FAILED ] NsTest.ROOT_setns (15004 ms) [--] 1 test from NsTest (15004 ms total) [--] Global test environment tear-down ../../src/tests/environment.cpp:441: Failure Failed Tests completed with child processes remaining: -+- 40531 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=NsTest.ROOT_setns \--- 40565 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=NsTest.ROOT_setns [==] 1 test from 1 test case ran. (15034 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] NsTest.ROOT_setns {noformat} Relevant strace for the forked child: {noformat} ... 
getpid()= 1 dup2(6, 0) = 0 dup2(7, 1) = 1 dup2(8, 2) = 2 close(6) = 0 close(7) = 0 close(8) = 0 execve(/home/idownes/workspace/mesos/build/src/setns-test-helper, [setns-test-helper, SetnsTestHelper], [/* 24 vars */]) = -1 EACCES (Permission denied) write(2, ABORT: (../../../3rdparty/libpro..., 62) = 62 write(2, Failed to os::execvpe in childMa..., 53) = 53 ... {noformat} Binary that it's trying to exec: {noformat} [root@hostname build]# stat /home/idownes/workspace/mesos/build/src/setns-test-helper File: `/home/idownes/workspace/mesos/build/src/setns-test-helper' Size: 7948Blocks: 16 IO Block: 4096 regular file Device: 801h/2049d Inode: 22949249Links: 1 Access: (0755/-rwxr-xr-x) Uid: (13118/ idownes) Gid: ( 1500/employee) Access: 2015-07-07 17:58:09.569861237 + Modify: 2015-07-07 17:58:09.573861290 + Change: 2015-07-07 17:58:09.573861290 + [root@hostname build]# /home/idownes/workspace/mesos/build/src/setns-test-helper Usage: /home/idownes/workspace/mesos/build/src/.libs/lt-setns-test-helper subcommand [OPTIONS] Available subcommands: help SetnsTestHelper {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3004) Support running the command executor with provisioned image for running a task in a container
Timothy Chen created MESOS-3004: --- Summary: Support running the command executor with provisioned image for running a task in a container Key: MESOS-3004 URL: https://issues.apache.org/jira/browse/MESOS-3004 Project: Mesos Issue Type: Improvement Reporter: Timothy Chen Mesos Containerizer uses the command executor to actually launch the user defined command, and the command executor then can communicate with the slave about the process lifecycle. When we provision a new container with the user specified image, we also need to be able to run the command executor in the container to support the same semantics. One approach is to dynamically mount in a static binary of the command executor with all its dependencies in a special directory so it doesn't interfere with the provisioned root filesystem and configure the mesos containerizer to run the command executor in that directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3002) Rename OptionT::get(const T _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van Remoortere reassigned MESOS-3002: --- Assignee: Joris Van Remoortere (was: Mark Wang) Rename OptionT::get(const T _t) to getOrElse() broke network isolator Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Joris Van Remoortere Priority: Blocker Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error: ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Trymesos::slave::Isolator* mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags)': ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Optionstd::basic_stringchar ::get(const char [1]) const' flags.resources.get(), ^ ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are: In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0, from ../../3rdparty/libprocess/include/process/check.hpp:19, from ../../3rdparty/libprocess/include/process/collect.hpp:7, from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30: ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T OptionT::get() const [with T = std::basic_stringchar] const T get() const { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T OptionT::get() [with T = std::basic_stringchar] T get() { assert(isSome()); return t; } ^ 
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1 make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make: *** [check-recursive] Error 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics
[ https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617110#comment-14617110 ] Adam B commented on MESOS-2993: --- [~pbrett] Is there a draft/review for this yet? Wondering if we can get this in for the next release candidate (rc2). Document per container unique egress flow and network queueing statistics -- Key: MESOS-2993 URL: https://issues.apache.org/jira/browse/MESOS-2993 Project: Mesos Issue Type: Bug Components: documentation, isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Document new network isolation capabilities in 0.23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617163#comment-14617163 ] haosdent commented on MESOS-2199: - My {{nobody}} user is the same as yours: {code} nobody:x:99:99:Nobody:/:/sbin/nologin {code}
[jira] [Updated] (MESOS-3002) Rename OptionT::get(const T _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-3002: -- Target Version/s: 0.23.0
[jira] [Created] (MESOS-3005) SSL tests can fail depending on hostname configuration
Joris Van Remoortere created MESOS-3005: --- Summary: SSL tests can fail depending on hostname configuration Key: MESOS-3005 URL: https://issues.apache.org/jira/browse/MESOS-3005 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Depending on how /etc/hosts is configured, the SSL tests can fail with a bad hostname match for the certificate. We can avoid this by explicitly matching the hostname for the certificate to the IP that will be used during the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617240#comment-14617240 ] Ian Downes commented on MESOS-2199: --- I have my build directory under my home directory and I do not want {{nobody}} (or anybody else) to have access to it. {noformat} [root@hostname build]# stat /home/idownes File: `/home/idownes' Size: 4096Blocks: 8 IO Block: 4096 directory Device: 801h/2049d Inode: 22807083Links: 11 Access: (0700/drwx--) Uid: (13118/ idownes) Gid: ( 1500/employee) Access: 2015-07-06 22:51:35.829848943 + Modify: 2015-07-06 21:58:32.348041134 + Change: 2015-07-06 21:58:32.348041134 + {noformat} My home directory is {{0700}} so naturally {{nobody}} does not have access: {noformat} [root@hostname build]# su -s /bin/sh nobody -c ls /home/idownes ls: cannot open directory /home/idownes: Permission denied {noformat} I think it's flawed to require global read access for the build directory...
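The failure discussed in this thread comes down to path traversal: a user such as {{nobody}} can only exec a binary if every directory above it is searchable by that user. A rough way to find the blocking component is sketched below. It is a simplification and not part of the Mesos test suite: it looks only at the "other" execute bit and ignores ownership, group membership and ACLs, and the function name is made up for this example.

```python
import os
import stat


def blocked_components(path):
    """List path components that lack the execute/search bit for "other".

    Walks from the given path up to the filesystem root. Only the
    "other" permission bit is checked (an assumption that fits a user
    like nobody who is neither owner nor group member).
    """
    blocked = []
    current = os.path.abspath(path)
    while True:
        mode = os.stat(current).st_mode
        if not mode & stat.S_IXOTH:
            blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the root directory
            break
        current = parent
    return blocked


if __name__ == "__main__":
    # Inspect the current working directory as an example.
    for component in blocked_components(os.getcwd()):
        print("not searchable by others:", component)
```

Run as root against the executor path from the log, a check like this would be expected to report the {{0700}} home directory shown above, even though the binary itself is {{0755}}.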
[jira] [Commented] (MESOS-1457) Process IDs should be required to be human-readable
[ https://issues.apache.org/jira/browse/MESOS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617296#comment-14617296 ] Till Toenshoff commented on MESOS-1457: --- I got pointed to this older issue as the patches did not get committed. Seems Palak's solution is acceptable. It would be great if we could indeed get a comment into the ProcessBase constructor stating something like the proposed {noformat} // Please provide a process ID prefix to ease debugging (See MESOS-1457). {noformat} [~PalakPC] could you possibly propose the above in a review-request and rebase those other two patches so we can get them committed? Process IDs should be required to be human-readable Key: MESOS-1457 URL: https://issues.apache.org/jira/browse/MESOS-1457 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Dominic Hamon Assignee: Palak Choudhary Priority: Minor When debugging, it's very useful to understand which processes are getting timeslices. As such, the human-readable names that can be passed to {{ProcessBase}} are incredibly valuable, however they are currently optional. If the constructor of {{ProcessBase}} took a mandatory string, every process would get a human-readable name and debugging would be much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3002) Rename OptionT::get(const T _t) to getOrElse() broke network isolator
Paul Brett created MESOS-3002: - Summary: Rename OptionT::get(const T _t) to getOrElse() broke network isolator Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error: ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Trymesos::slave::Isolator* mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags)': ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Optionstd::basic_stringchar ::get(const char [1]) const' flags.resources.get(), ^ ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are: In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0, from ../../3rdparty/libprocess/include/process/check.hpp:19, from ../../3rdparty/libprocess/include/process/collect.hpp:7, from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30: ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T OptionT::get() const [with T = std::basic_stringchar] const T get() const { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T OptionT::get() [with T = std::basic_stringchar] T get() { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1 make[2]: Leaving directory 
`/home/pbrett/sandbox/mesos.master/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make: *** [check-recursive] Error 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3002) Rename OptionT::get(const T _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-3002: -- Assignee: Mark Wang
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617180#comment-14617180 ] haosdent commented on MESOS-2199: - Could you show the permissions of your build dir? Your build path should be readable by other users. You can check this with {code} su -s /bin/sh nobody -c "ls your_build_dir_absolute_path" {code}
[jira] [Updated] (MESOS-2972) Serialize Docker image spec as protobuf
[ https://issues.apache.org/jira/browse/MESOS-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-2972: Description: The Docker image specification defines a schema for the metadata JSON that it puts into each image. Currently the Docker image provisioner needs to parse and understand this metadata JSON, so we should create an equivalent protobuf schema and use the JSON-to-protobuf conversion to read and validate the metadata. Serialize Docker image spec as protobuf --- Key: MESOS-2972 URL: https://issues.apache.org/jira/browse/MESOS-2972 Project: Mesos Issue Type: Improvement Reporter: Timothy Chen Labels: mesosphere
[jira] [Commented] (MESOS-2972) Serialize Docker image spec as protobuf
[ https://issues.apache.org/jira/browse/MESOS-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617190#comment-14617190 ] Timothy Chen commented on MESOS-2972: - Just updated the description. Serialize Docker image spec as protobuf --- Key: MESOS-2972 URL: https://issues.apache.org/jira/browse/MESOS-2972 Project: Mesos Issue Type: Improvement Reporter: Timothy Chen Labels: mesosphere The Docker image specification defines a schema for the metadata JSON that it puts into each image. Currently the Docker image provisioner needs to be able to parse and understand this metadata JSON, and we should create a protobuf equivalent schema so we can utilize the JSON to protobuf conversion to read and validate the metadata. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2972) Serialize Docker image spec as protobuf
[ https://issues.apache.org/jira/browse/MESOS-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-2972: Assignee: Lily Chen Serialize Docker image spec as protobuf --- Key: MESOS-2972 URL: https://issues.apache.org/jira/browse/MESOS-2972 Project: Mesos Issue Type: Improvement Reporter: Timothy Chen Assignee: Lily Chen Labels: mesosphere The Docker image specification defines a schema for the metadata JSON that it puts into each image. Currently the Docker image provisioner needs to be able to parse and understand this metadata JSON, and we should create a protobuf equivalent schema so we can utilize the JSON to protobuf conversion to read and validate the metadata. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Downes updated MESOS-3002: -- Priority: Blocker (was: Major)
Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Mark Wang Priority: Blocker
Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error:
../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const'
flags.resources.get(),
^
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are:
In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
from ../../3rdparty/libprocess/include/process/check.hpp:19,
from ../../3rdparty/libprocess/include/process/collect.hpp:7,
from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>]
const T& get() const { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>]
T& get() { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided
make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1
make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make: *** [check-recursive] Error 1
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
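To illustrate the kind of breakage reported above, here is a minimal sketch. The {{Option}} class below is a simplified stand-in written for this example, not stout's actual implementation, and {{resourcesOrEmpty}} is a hypothetical helper. It assumes that the call site at port_mapping.cpp:1103 still passed a default value to {{get()}} (the {{const char [1]}} argument in the error is presumably an empty string literal), and that the caller needs to switch to {{getOrElse()}}; the real fix may differ.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for stout's Option<T>; NOT the real class.
// Assumes T is default-constructible to keep the sketch short.
template <typename T>
class Option
{
public:
  Option() : some(false), t() {}
  Option(const T& _t) : some(true), t(_t) {}

  bool isSome() const { return some; }

  // The two zero-argument accessors listed as candidates in the
  // compiler output. Calling get("") no longer matches either of them.
  const T& get() const { assert(isSome()); return t; }
  T& get() { assert(isSome()); return t; }

  // Renamed form of the former one-argument get(): returns the stored
  // value, or the supplied fallback when nothing is stored.
  T getOrElse(const T& _t) const { return isSome() ? t : _t; }

private:
  bool some;
  T t;
};

// Hypothetical helper mirroring the failing call site: use the
// resources string if set, otherwise fall back to an empty string.
inline std::string resourcesOrEmpty(const Option<std::string>& resources)
{
  return resources.getOrElse("");
}
```

With the rename in place, any caller that still writes {{option.get(defaultValue)}} fails to compile exactly as shown in the log, because only the zero-argument overloads remain.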
[jira] [Updated] (MESOS-2640) Remove old frameworks and ec2 scripts from core Mesos repository
[ https://issues.apache.org/jira/browse/MESOS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-2640: -- Fix Version/s: 0.24.0 Remove old frameworks and ec2 scripts from core Mesos repository Key: MESOS-2640 URL: https://issues.apache.org/jira/browse/MESOS-2640 Project: Mesos Issue Type: Task Reporter: Yan Xu Assignee: Yan Xu Fix For: 0.24.0 As per discussion [on the dev list|http://www.mail-archive.com/dev@mesos.apache.org/msg31587.html] we'll remove the old and unmaintained frameworks code from the repo and move them to https://github.com/mesos/framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617165#comment-14617165 ]
Paul Brett commented on MESOS-3002: --- Mark - can you take a look at this. Thanks
Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Mark Wang
Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error:
../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const'
flags.resources.get(),
^
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are:
In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
from ../../3rdparty/libprocess/include/process/check.hpp:19,
from ../../3rdparty/libprocess/include/process/collect.hpp:7,
from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>]
const T& get() const { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>]
T& get() { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided
make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1
make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make: *** [check-recursive] Error 1
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3006) Add cgroups memory stats API
Jojy Varghese created MESOS-3006: Summary: Add cgroups memory stats API Key: MESOS-3006 URL: https://issues.apache.org/jira/browse/MESOS-3006 Project: Mesos Issue Type: Task Components: containerization, docker Environment: linux Reporter: Jojy Varghese Assignee: Jojy Varghese The cgroups API currently does not expose stats from the memory subsystem. Having this API would enable isolators to use its various fields (e.g., rss, rss_huge, writeback) in use cases like usage metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
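As a rough idea of what such a stats API could return, the sketch below parses text in the "<name> <value>" per-line layout used by the cgroup v1 {{memory.stat}} file into a map. The function name {{parseMemoryStat}} and the choice of a plain {{std::map}} are assumptions made for this example; the ticket does not specify the eventual Mesos interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses "<name> <value>" lines, as found in a
// cgroup v1 memory.stat file, into a name -> value map.
// Lines that do not contain a name followed by an unsigned integer
// are skipped rather than treated as errors.
inline std::map<std::string, std::uint64_t> parseMemoryStat(
    const std::string& text)
{
  std::map<std::string, std::uint64_t> stats;

  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string name;
    std::uint64_t value = 0;
    if (fields >> name >> value) {
      stats[name] = value;
    }
  }

  return stats;
}
```

An isolator could then look up fields such as {{rss}}, {{rss_huge}} or {{writeback}} by name when reporting usage metrics; reading the file itself from the cgroup hierarchy is left out of the sketch.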
[jira] [Updated] (MESOS-2119) Add Socket tests
[ https://issues.apache.org/jira/browse/MESOS-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2119: - Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 14 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Sprint 10, Mesosphere Sprint 11) Add Socket tests Key: MESOS-2119 URL: https://issues.apache.org/jira/browse/MESOS-2119 Project: Mesos Issue Type: Task Components: libprocess Reporter: Niklas Quarfot Nielsen Assignee: Joris Van Remoortere Labels: mesosphere Add more Socket-specific tests to get coverage during the move from libev to libevent (with and without SSL). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van Remoortere updated MESOS-3002: Shepherd: Benjamin Hindman Sprint: Mesosphere Sprint 14 Story Points: 1
Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Joris Van Remoortere Priority: Blocker
Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error:
../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const'
flags.resources.get(),
^
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are:
In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
from ../../3rdparty/libprocess/include/process/check.hpp:19,
from ../../3rdparty/libprocess/include/process/collect.hpp:7,
from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>]
const T& get() const { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>]
T& get() { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided
make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1
make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make: *** [check-recursive] Error 1
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam B updated MESOS-3002: -- Affects Version/s: (was: 0.23.0) 0.24.0
Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.24.0 Reporter: Paul Brett Assignee: Joris Van Remoortere Priority: Blocker
Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error:
../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const'
flags.resources.get(),
^
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are:
In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
from ../../3rdparty/libprocess/include/process/check.hpp:19,
from ../../3rdparty/libprocess/include/process/collect.hpp:7,
from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>]
const T& get() const { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>]
T& get() { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided
make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1
make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make: *** [check-recursive] Error 1
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam B updated MESOS-3002: -- Target Version/s: 0.24.0 (was: 0.23.0)
Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.24.0 Reporter: Paul Brett Assignee: Joris Van Remoortere Priority: Blocker
Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error:
../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const'
flags.resources.get(),
^
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are:
In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
from ../../3rdparty/libprocess/include/process/check.hpp:19,
from ../../3rdparty/libprocess/include/process/collect.hpp:7,
from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>]
const T& get() const { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>]
T& get() { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided
make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1
make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make: *** [check-recursive] Error 1
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2075) Add maintenance information to the replicated registry.
[ https://issues.apache.org/jira/browse/MESOS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Artem Harutyunyan updated MESOS-2075: - Sprint: Mesosphere Sprint 14 Labels: mesosphere twitter (was: twitter) Fix Version/s: 0.24.0
Add maintenance information to the replicated registry. ---
Key: MESOS-2075 URL: https://issues.apache.org/jira/browse/MESOS-2075 Project: Mesos Issue Type: Task Components: master Reporter: Benjamin Mahler Labels: mesosphere, twitter Fix For: 0.24.0
To achieve fault-tolerance for the maintenance primitives, we will need to add the maintenance information to the registry. The registry currently stores all of the slave information, which is quite large (~ 17MB for 50,000 slaves from my testing), which results in a protobuf object that is extremely expensive to copy. As far as I can tell, reads / writes to maintenance information are independent of reads / writes to the existing 'registry' information. So there are two approaches here:
h4. Add maintenance information to 'maintenance' key:
# The advantage of this approach is that we don't further grow the large Registry object.
# This approach assumes that writes to 'maintenance' are independent of writes to the 'registry'. If these writes are not independent, this approach requires that we add transactional support to the State abstraction.
# This approach requires adding compaction to LogStorage.
# This approach likely requires some refactoring to the Registrar.
h4. Add maintenance information to 'registry' key:
# The advantage of this approach is that it's the easiest to implement.
# This will further grow the single 'registry' object, but doesn't preclude it being split apart in the future.
# This approach may require using the diff support in LogStorage and/or adding compression support to LogStorage snapshots to deal with the increased size of the registry.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2996) Failing Docker tests on CentOS Linux release 7.1.1503.
[ https://issues.apache.org/jira/browse/MESOS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617429#comment-14617429 ]
Adam B commented on MESOS-2996: --- First two patches committed, but there's still a lingering issue, as mentioned in the above comment.
commit a925b77d53fabcc22e4b4988e18b40387e17b0ab
Author: Timothy Chen tnac...@apache.org
Date: Tue Jul 7 11:51:36 2015 -0700
Only run netcat tests when nc is available.
Review: https://reviews.apache.org/r/36216
commit eecf0d4a2a31506878c98c9dd562175816efdcbf
Author: Timothy Chen tnac...@apache.org
Date: Tue Jul 7 11:50:21 2015 -0700
Fix running docker executor tests.
Review: https://reviews.apache.org/r/36214
Failing Docker tests on CentOS Linux release 7.1.1503. --
Key: MESOS-2996 URL: https://issues.apache.org/jira/browse/MESOS-2996 Project: Mesos Issue Type: Bug Reporter: Joerg Schad Assignee: Timothy Chen Priority: Critical Labels: mesosphere
With Mesos 0.23 rc1 several tests fail on CentOS Linux release 7.1 (will add more detail shortly).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2996) Failing Docker tests on CentOS Linux release 7.1.1503.
[ https://issues.apache.org/jira/browse/MESOS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2996: -- Priority: Blocker (was: Critical) Failing Docker tests on CentOS Linux release 7.1.1503. -- Key: MESOS-2996 URL: https://issues.apache.org/jira/browse/MESOS-2996 Project: Mesos Issue Type: Bug Reporter: Joerg Schad Assignee: Timothy Chen Priority: Blocker Labels: mesosphere With Mesos 0.23 rc1 several tests fail on CentOS Linux release 7.1 (will add more detail shortly). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2946) Authorizer Module: Interface design
[ https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613069#comment-14613069 ]
Till Toenshoff edited comment on MESOS-2946 at 7/7/15 8:27 PM: ---
h4.Status Quo
As the current design stands, {{Authorizer}} is indeed an interface, but its default implementation is declared in the same header. Moreover, if one decides to create an alternative implementation for authorization, Mesos needs to be recompiled and all the places where the authorizer gets instantiated need to be updated.
h4.Design
Under the modularized version, the MVP for the {{Authorizer}} interface will look like:
{code}
class Authorizer
{
public:
  static Try<Authorizer*> create(const std::string& name);

  virtual ~Authorizer() {}

  virtual Try<Nothing> initialize(const Option<ACLs>& acls) = 0;

  virtual process::Future<bool> authorize(
      const ACL::RegisterFramework& request) = 0;
  virtual process::Future<bool> authorize(
      const ACL::RunTask& request) = 0;
  virtual process::Future<bool> authorize(
      const ACL::ShutdownFramework& request) = 0;

protected:
  Authorizer() {}
};
{code}
Here {{Authorizer::create(const std::string&)}} is the factory function, which will construct the default {{LocalAuthorizer}} if "local" is selected and will use the existing facilities within {{ModuleManager}} to load the appropriate module in any other case. In order to allow the {{LocalAuthorizer}} to play nicely with the general modules design, it needs a default constructor. This constraint leads to the existence of {{Authorizer::initialize(const Option<ACLs>&)}}, which is needed to pass initialization parameters to the {{LocalAuthorizer}}. Note that all other authorizers will use the {{ModuleManager}} mechanisms to pass initialization parameters. This follows the pattern used in the {{Authenticator}} module. The method {{Authorizer::initialize(const Option<ACLs>&)}} can be removed when we go to a modules-only implementation.
All other methods remain unchanged from the original {{Authorizer}} interface.
was (Author: arojas):
h4.Status Quo
As the current design stands, {{Authorizer}} is indeed an interface, but its default implementation is declared in the same header. Moreover, if one decides to create an alternative implementation for authorization, Mesos needs to be recompiled and all the places where the authorizer gets instantiated need to be updated.
h4.Design
Under the modularize version, the MVP for the {{Authorizer}} interface will look like:
{code}
class Authorizer
{
public:
  static Try<Authorizer*> create(const std::string& name);

  virtual ~Authorizer() {}

  virtual Try<Nothing> initialize(const Option<ACLs>& acls) = 0;

  virtual process::Future<bool> authorize(
      const ACL::RegisterFramework& request) = 0;
  virtual process::Future<bool> authorize(
      const ACL::RunTask& request) = 0;
  virtual process::Future<bool> authorize(
      const ACL::ShutdownFramework& request) = 0;

protected:
  Authorizer() {}
};
{code}
Where {{Authorizer::create(const std::string&)}} is the factory function which will construct the default {{LocalAuthorizer}} if local is selected and will use the existing facilities within {{ModuleManager}} to load the appropriate module in any other case. In order to allow the {{LocalAuthorizer}} to play nicely with the general modules design, it needs a default constructor. This constraint leads to the existence of {{Authorizer::initialize(const Option<ACLs>&)}} which is needed to pass initialization parameters to the {{LocalAuthorizer}}. Note that all other authorizers will use the {{ModuleManager}} mechanisms to pass initialization parameters. This follows the pattern used in the {{Authorizator}} module. The method {{Authorizer::initialize(const Option<ACLs>&)}} can be removed when we go to a modules only implementation. All other methods remain unchanged from the original {{Authorizer}} interface.
Authorizer Module: Interface design --- Key: MESOS-2946 URL: https://issues.apache.org/jira/browse/MESOS-2946 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff Assignee: Till Toenshoff Labels: mesosphere, module, security h4.Motivation Design an interface covering authorizer modules while staying minimally invasive in regards to changes to the existing {{LocalAuthorizer}} implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
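The factory-plus-initialize pattern described in the comment above can be sketched in a self-contained form. Everything below is a simplified stand-in for illustration only: plain {{bool}} and {{std::unique_ptr}} replace {{Try<>}} and {{process::Future<>}}, {{ACLs}} is reduced to a set of allowed principals, and {{ModuleRegistry}} and {{createAuthorizer}} are hypothetical substitutes for {{ModuleManager}} and {{Authorizer::create}}.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <set>
#include <string>

// Stand-in for the ACLs protobuf: just a set of allowed principals.
struct ACLs
{
  std::set<std::string> allowed;
};

// Reduced interface: synchronous bool results instead of Try/Future.
class Authorizer
{
public:
  virtual ~Authorizer() {}
  virtual bool initialize(const ACLs& acls) = 0;
  virtual bool authorize(const std::string& principal) = 0;
};

// Default-constructible, as the modules design requires; its
// parameters therefore arrive later through initialize().
class LocalAuthorizer : public Authorizer
{
public:
  LocalAuthorizer() : initialized(false), acls() {}

  bool initialize(const ACLs& _acls) override
  {
    acls = _acls;
    initialized = true;
    return true;
  }

  bool authorize(const std::string& principal) override
  {
    return initialized && acls.allowed.count(principal) > 0;
  }

private:
  bool initialized;
  ACLs acls;
};

// Hypothetical stand-in for ModuleManager: module name -> factory.
typedef std::function<std::unique_ptr<Authorizer>()> Factory;
typedef std::map<std::string, Factory> ModuleRegistry;

// "local" selects the built-in implementation; any other name is
// looked up among the loaded modules. Returns nullptr if unknown.
inline std::unique_ptr<Authorizer> createAuthorizer(
    const std::string& name, const ModuleRegistry& modules)
{
  if (name == "local") {
    return std::unique_ptr<Authorizer>(new LocalAuthorizer());
  }

  ModuleRegistry::const_iterator it = modules.find(name);
  if (it == modules.end()) {
    return nullptr;
  }
  return it->second();
}
```

The two-phase construction is the point of the sketch: because the factory can only call a default constructor, the {{LocalAuthorizer}} rejects every request until {{initialize()}} has supplied its ACLs.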
[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics
[ https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617498#comment-14617498 ] Paul Brett commented on MESOS-2993: --- Review draft available at https://reviews.apache.org/r/36281/ Document per container unique egress flow and network queueing statistics -- Key: MESOS-2993 URL: https://issues.apache.org/jira/browse/MESOS-2993 Project: Mesos Issue Type: Bug Components: documentation, isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Document new network isolation capabilities in 0.23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3008) Libevent SSL doesn't use EPOLL
Joris Van Remoortere created MESOS-3008: --- Summary: Libevent SSL doesn't use EPOLL Key: MESOS-3008 URL: https://issues.apache.org/jira/browse/MESOS-3008 Project: Mesos Issue Type: Improvement Components: libprocess Affects Versions: 0.23.0 Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere We currently disable epoll in libevent to allow SSL to work. It would be more scalable if we didn't have to do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam B updated MESOS-3002: -- Fix Version/s: (was: 0.23.0)
Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.24.0 Reporter: Paul Brett Assignee: Joris Van Remoortere Priority: Blocker Fix For: 0.24.0
Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error:
../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const'
flags.resources.get(),
^
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are:
In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
from ../../3rdparty/libprocess/include/process/check.hpp:19,
from ../../3rdparty/libprocess/include/process/collect.hpp:7,
from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>]
const T& get() const { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>]
T& get() { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided
make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1
make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make: *** [check-recursive] Error 1
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2076) Implement maintenance primitives in the Master.
[ https://issues.apache.org/jira/browse/MESOS-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2076: - Sprint: Mesosphere Sprint 14 Labels: mesosphere twitter (was: twitter) Implement maintenance primitives in the Master. --- Key: MESOS-2076 URL: https://issues.apache.org/jira/browse/MESOS-2076 Project: Mesos Issue Type: Task Components: master Reporter: Benjamin Mahler Labels: mesosphere, twitter The master will need to do a number of things to implement the maintenance primitives: # For slaves that have a maintenance window: #* For unused resources, offers must be augmented with an Unavailability. #* For used resources, inverse offers must be sent. # For inverse offers that are declined, we must filter these before sending them again. We must also store the decline reason, guard against OOMing. #* My hunch is that we'll not want to persist the reasons in the initial approach. # When the drain window is reached, we'll make a binary decision as to whether the slave was drained, based on whether it was empty. #* If drained, we deactivate this slave and store the fact that it was drained. #* If not drained, we leave this slave activated. # Recover the maintenance information upon failover. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3003) Support mounting in default configuration files/volumes into every new container
[ https://issues.apache.org/jira/browse/MESOS-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-3003: Component/s: containerization Support mounting in default configuration files/volumes into every new container Key: MESOS-3003 URL: https://issues.apache.org/jira/browse/MESOS-3003 Project: Mesos Issue Type: Improvement Components: containerization Reporter: Timothy Chen Labels: mesosphere Most container images leave out system configuration (e.g: /etc/*) and expect the container runtimes to mount in specific configurations as needed such as /etc/resolv.conf from the host into the container when needed. We need to support mounting in specific configuration files for command executor to work, and also allow the user to optionally define other configuration files to mount in as well via flags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3004) Support running the command executor with provisioned image for running a task in a container
[ https://issues.apache.org/jira/browse/MESOS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-3004: Component/s: containerization Support running the command executor with provisioned image for running a task in a container - Key: MESOS-3004 URL: https://issues.apache.org/jira/browse/MESOS-3004 Project: Mesos Issue Type: Improvement Components: containerization Reporter: Timothy Chen Labels: mesosphere Mesos Containerizer uses the command executor to actually launch the user defined command, and the command executor then can communicate with the slave about the process lifecycle. When we provision a new container with the user specified image, we also need to be able to run the command executor in the container to support the same semantics. One approach is to dynamically mount in a static binary of the command executor with all its dependencies in a special directory so it doesn't interfere with the provisioned root filesystem and configure the mesos containerizer to run the command executor in that directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3009) Reproduce systemd cgroup behavior
Artem Harutyunyan created MESOS-3009:

    Summary: Reproduce systemd cgroup behavior
    Key: MESOS-3009
    URL: https://issues.apache.org/jira/browse/MESOS-3009
    Project: Mesos
    Issue Type: Task
    Reporter: Artem Harutyunyan
    Assignee: Joris Van Remoortere

It has been observed that systemd reorganizes the cgroup hierarchy created by the Mesos slave. As a result, Mesos can no longer find its cgroups, and there is a chance that the isolation the Mesos slave put in place is undone.
[jira] [Updated] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van Remoortere updated MESOS-3002:
    Labels: mesosphere (was: )

Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator

    Key: MESOS-3002
    URL: https://issues.apache.org/jira/browse/MESOS-3002
    Project: Mesos
    Issue Type: Bug
    Components: isolation
    Affects Versions: 0.24.0
    Reporter: Paul Brett
    Assignee: Joris Van Remoortere
    Priority: Blocker
    Labels: mesosphere
    Fix For: 0.24.0

Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error:

../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const'
    flags.resources.get(""),
    ^
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are:
In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
    from ../../3rdparty/libprocess/include/process/check.hpp:19,
    from ../../3rdparty/libprocess/include/process/collect.hpp:7,
    from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>]
    const T& get() const { assert(isSome()); return t; }
    ^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>]
    T& get() { assert(isSome()); return t; }
    ^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided
make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1
make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make: *** [check-recursive] Error 1
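The compiler notes point at the cause: after the rename, get() takes no arguments, so a call site that still passes a default value has to switch to getOrElse(). A minimal Python sketch of the two accessors follows. The class and method names here are hypothetical analogues chosen for illustration; they are not stout's actual API.

```python
class Option:
    """Hypothetical Python analogue of an Option<T> type, for illustration only."""

    _NONE = object()  # sentinel meaning "no value present"

    def __init__(self, value=_NONE):
        self._value = value

    def is_some(self):
        return self._value is not Option._NONE

    def get(self):
        # Takes no arguments: the caller must already know a value is present.
        assert self.is_some()
        return self._value

    def get_or_else(self, default):
        # Plays the role of the old one-argument get(default) overload.
        return self._value if self.is_some() else default
```

Under this sketch, a caller that previously wrote the equivalent of `resources.get("")` would write `resources.get_or_else("")` instead.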
[jira] [Commented] (MESOS-2943) mesos fails to compile under mac when libssl and libevent are enabled
[ https://issues.apache.org/jira/browse/MESOS-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617501#comment-14617501 ]

Benjamin Hindman commented on MESOS-2943:

commit 971583522b3ada19f91d58fd89c0d3d17f5fef34
Author: Joris Van Remoortere joris.van.remoort...@gmail.com
Date: Tue Jul 7 14:54:56 2015 -0700

    MESOS-2943: Add comment for explicit return type.

    Review: https://reviews.apache.org/r/36267

mesos fails to compile under mac when libssl and libevent are enabled

    Key: MESOS-2943
    URL: https://issues.apache.org/jira/browse/MESOS-2943
    Project: Mesos
    Issue Type: Bug
    Components: libprocess
    Affects Versions: 0.23.0
    Reporter: Artem Harutyunyan
    Assignee: Joris Van Remoortere
    Priority: Blocker
    Labels: mesosphere
    Fix For: 0.23.0

../configure --enable-debug --enable-libevent --enable-ssl
make

produces the following error:

poll.cpp' || echo '../../../3rdparty/libprocess/'`src/libevent_poll.cpp
libtool: compile: g++ -DPACKAGE_NAME=\libprocess\ -DPACKAGE_TARNAME=\libprocess\ -DPACKAGE_VERSION=\0.0.1\ -DPACKAGE_STRING=\libprocess 0.0.1\ -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\libprocess\ -DVERSION=\0.0.1\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBCURL=1 -DHAVE_EVENT2_EVENT_H=1 -DHAVE_LIBEVENT=1 -DHAVE_EVENT2_THREAD_H=1 -DHAVE_LIBEVENT_PTHREADS=1 -DHAVE_OPENSSL_SSL_H=1 -DHAVE_LIBSSL=1 -DHAVE_LIBCRYPTO=1 -DHAVE_EVENT2_BUFFEREVENT_SSL_H=1 -DHAVE_LIBEVENT_OPENSSL=1 -DUSE_SSL_SOCKET=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBDL=1 -I. -I../../../3rdparty/libprocess -I../../../3rdparty/libprocess/include -I../../../3rdparty/libprocess/3rdparty/stout/include -I3rdparty/boost-1.53.0 -I3rdparty/libev-4.15 -I3rdparty/picojson-4f93734 -I3rdparty/glog-0.3.3/src -I3rdparty/ry-http-parser-1c3624a -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -g1 -O0 -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -MT libprocess_la-libevent_poll.lo -MD -MP -MF .deps/libprocess_la-libevent_poll.Tpo -c ../../../3rdparty/libprocess/src/libevent_poll.cpp -fno-common -DPIC -o libprocess_la-libevent_poll.o
mv -f .deps/libprocess_la-socket.Tpo .deps/libprocess_la-socket.Plo
mv -f .deps/libprocess_la-subprocess.Tpo .deps/libprocess_la-subprocess.Plo
mv -f .deps/libprocess_la-libevent.Tpo .deps/libprocess_la-libevent.Plo
mv -f .deps/libprocess_la-metrics.Tpo .deps/libprocess_la-metrics.Plo
In file included from ../../../3rdparty/libprocess/src/libevent_ssl_socket.cpp:11:
In file included from ../../../3rdparty/libprocess/include/process/queue.hpp:9:
../../../3rdparty/libprocess/include/process/future.hpp:849:7: error: no viable conversion from 'const process::Future<const process::Future<process::network::Socket> >' to 'const process::network::Socket'
    set(u);
    ^
../../../3rdparty/libprocess/src/libevent_ssl_socket.cpp:769:10: note: in instantiation of function template specialization 'process::Future<process::network::Socket>::Future<process::Future<const process::Future<process::network::Socket> > >' requested here
    return accept_queue.get()
    ^
../../../3rdparty/libprocess/include/process/socket.hpp:21:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const process::Future<const process::Future<process::network::Socket> >' to 'process::network::Socket &&' for 1st argument
    class Socket
    ^
../../../3rdparty/libprocess/include/process/socket.hpp:21:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const process::Future<const process::Future<process::network::Socket> >' to 'const process::network::Socket &' for 1st argument
    class Socket
    ^
../../../3rdparty/libprocess/include/process/future.hpp:411:21: note: passing argument to parameter '_t' here
    bool set(const T& _t);
    ^
1 error generated.
make[4]: *** [libprocess_la-libevent_ssl_socket.lo] Error 1
make[4]: *** Waiting for unfinished jobs
mv -f .deps/libprocess_la-libevent_poll.Tpo .deps/libprocess_la-libevent_poll.Plo
mv -f .deps/libprocess_la-openssl.Tpo .deps/libprocess_la-openssl.Plo
mv -f .deps/libprocess_la-process.Tpo .deps/libprocess_la-process.Plo
make[3]: ***
[jira] [Updated] (MESOS-2061) Add InverseOffer protobuf message.
[ https://issues.apache.org/jira/browse/MESOS-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2061:
    Sprint: Mesosphere Sprint 14
    Labels: mesosphere twitter (was: twitter)

Add InverseOffer protobuf message.

    Key: MESOS-2061
    URL: https://issues.apache.org/jira/browse/MESOS-2061
    Project: Mesos
    Issue Type: Task
    Reporter: Benjamin Mahler
    Labels: mesosphere, twitter

InverseOffer was defined as part of the maintenance work in MESOS-1474, design doc here: https://docs.google.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit?usp=sharing

{code}
// A request to deallocate or return any resources already
// being consumed by the framework.
message InverseOffer {
  required OfferID id = 1;
  required FrameworkID framework_id = 2;
  repeated Resource resources = 3;

  // The slave ID if the resources need to be released on a particular slave.
  optional SlaveID slave_id = 4;

  // The executor and task IDs if the resources need to be released on specific
  // executors and/or tasks.
  optional ExecutorID executor_id = 5;
  repeated TaskID task_ids = 6;

  // The resources specified in this offer will become unavailable
  // at the specified start time and for the specified duration. Any
  // tasks running using these resources might get killed when
  // these resources become unavailable.
  required Unavailability unavailability = 7;
}
{code}

This ticket captures the addition of the InverseOffer protobuf to mesos.proto; the necessary API changes for Event/Call and the language bindings will be tracked separately.
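Field numbers within a protobuf message must be unique, which is easy to get wrong when a message is drafted by hand in a ticket. A minimal sketch of a check in Python follows. The helper `duplicate_field_numbers` and the `Example` message are hypothetical and exist only for illustration; the regular expression only understands simple proto2-style field lines.

```python
import re
from collections import defaultdict

# Matches simple proto2 field declarations such as "optional SlaveID slave_id = 4;".
# Assumption: no nested messages, options, or map fields need to be handled.
FIELD_RE = re.compile(
    r"\b(?:required|optional|repeated)\s+[\w.]+\s+(\w+)\s*=\s*(\d+)\s*;"
)


def duplicate_field_numbers(message_body):
    """Return {field_number: [field names]} for numbers used more than once."""
    by_number = defaultdict(list)
    for name, number in FIELD_RE.findall(message_body):
        by_number[int(number)].append(name)
    return {n: names for n, names in by_number.items() if len(names) > 1}


# Hypothetical message, for illustration only.
EXAMPLE = """
message Example {
  required string id = 1;
  optional string owner = 2;
  repeated string tags = 2;
}
"""
```

Calling `duplicate_field_numbers(EXAMPLE)` reports that `owner` and `tags` share number 2. A real .proto file would be rejected by protoc for the same reason.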
[jira] [Updated] (MESOS-2076) Implement maintenance primitives in the Master.
[ https://issues.apache.org/jira/browse/MESOS-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2076:
    Fix Version/s: 0.24.0

Implement maintenance primitives in the Master.

    Key: MESOS-2076
    URL: https://issues.apache.org/jira/browse/MESOS-2076
    Project: Mesos
    Issue Type: Task
    Components: master
    Reporter: Benjamin Mahler
    Labels: mesosphere, twitter
    Fix For: 0.24.0

The master will need to do a number of things to implement the maintenance primitives:
# For slaves that have a maintenance window:
#* For unused resources, offers must be augmented with an Unavailability.
#* For used resources, inverse offers must be sent.
# For inverse offers that are declined, we must filter these before sending them again. We must also store the decline reasons while guarding against OOMing.
#* My hunch is that we'll not want to persist the reasons in the initial approach.
# When the drain window is reached, we'll make a binary decision as to whether the slave was drained, based on whether it was empty.
#* If drained, we deactivate this slave and store the fact that it was drained.
#* If not drained, we leave this slave activated.
# Recover the maintenance information upon failover.
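The binary drain decision in the list above can be sketched as follows. This is only an illustration under stated assumptions: `Slave` and `decide_on_drain_deadline` are hypothetical names, not Mesos master code, and "empty" is modeled simply as having no running tasks.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """Hypothetical stand-in for the master's view of a slave."""
    slave_id: str
    running_tasks: set = field(default_factory=set)
    active: bool = True
    drained: bool = False


def decide_on_drain_deadline(slave):
    """Apply the binary decision once the drain window is reached.

    A slave counts as drained only if it is empty (no running tasks).
    """
    if not slave.running_tasks:
        slave.active = False   # deactivate the drained slave
        slave.drained = True   # record the fact that it was drained
    # Otherwise the slave is left activated and is not marked drained.
    return slave.drained
```

Keeping the decision binary means there is no "partially drained" state to persist or to recover on failover.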
[jira] [Updated] (MESOS-2061) Add InverseOffer protobuf message.
[ https://issues.apache.org/jira/browse/MESOS-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2061:
    Fix Version/s: 0.24.0

Add InverseOffer protobuf message.

    Key: MESOS-2061
    URL: https://issues.apache.org/jira/browse/MESOS-2061
    Project: Mesos
    Issue Type: Task
    Reporter: Benjamin Mahler
    Labels: mesosphere, twitter
    Fix For: 0.24.0

InverseOffer was defined as part of the maintenance work in MESOS-1474, design doc here: https://docs.google.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit?usp=sharing

{code}
// A request to deallocate or return any resources already
// being consumed by the framework.
message InverseOffer {
  required OfferID id = 1;
  required FrameworkID framework_id = 2;
  repeated Resource resources = 3;

  // The slave ID if the resources need to be released on a particular slave.
  optional SlaveID slave_id = 4;

  // The executor and task IDs if the resources need to be released on specific
  // executors and/or tasks.
  optional ExecutorID executor_id = 5;
  repeated TaskID task_ids = 6;

  // The resources specified in this offer will become unavailable
  // at the specified start time and for the specified duration. Any
  // tasks running using these resources might get killed when
  // these resources become unavailable.
  required Unavailability unavailability = 7;
}
{code}

This ticket captures the addition of the InverseOffer protobuf to mesos.proto; the necessary API changes for Event/Call and the language bindings will be tracked separately.
[jira] [Created] (MESOS-3010) Review design document for maintenance primitives
Artem Harutyunyan created MESOS-3010: Summary: Review design document for maintenance primitives Key: MESOS-3010 URL: https://issues.apache.org/jira/browse/MESOS-3010 Project: Mesos Issue Type: Task Reporter: Artem Harutyunyan Priority: Blocker Following a suggestion from [~bmahler] we should review the design document [0] for maintenance primitives and consider adding support for explicit acknowledgement from frameworks. [0] - https://docs.google.com/document/d/1CIoOnBLFiEvmhOe-h_s8M4m9Qa7BLETuj_dSNJW959U -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1474) Provide cluster maintenance primitives for operators.
[ https://issues.apache.org/jira/browse/MESOS-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-1474:
    Labels: mesosphere twitter (was: twitter)

Provide cluster maintenance primitives for operators.

    Key: MESOS-1474
    URL: https://issues.apache.org/jira/browse/MESOS-1474
    Project: Mesos
    Issue Type: Epic
    Components: framework, master, slave
    Reporter: Benjamin Mahler
    Labels: mesosphere, twitter

Sometimes operators need to perform maintenance on a mesos cluster; we define maintenance here as anything that requires the tasks to be drained on the slave(s). Most mesos upgrades can be done without affecting running tasks, but there are situations where maintenance is task-affecting:
* Host maintenance (e.g. hardware repair, kernel upgrades).
* Non-recoverable slave upgrades (e.g. adjusting slave attributes).
* etc

In order to ensure operators don’t violate frameworks’ SLAs, schedulers need to be aware of planned unavailability events.

Maintenance awareness allows schedulers to avoid churn for long running tasks by placing them on machines not undergoing maintenance. If all resources are planned for maintenance, then the scheduler will prefer machines scheduled for maintenance least imminently.

Maintenance awareness is also crucial when a scheduler uses [persistent disk|https://issues.apache.org/jira/browse/MESOS-1554] resources, to ensure that the scheduler is aware of the expected duration of unavailability for a persistent disk resource (e.g. when using 3 1TB replicas, there is no need to replicate 1TB over the network when only 1 of the 3 replicas is going to be unavailable for a reboot (< 1 hour)).

There are a few primitives of interest here:
* Provide a way for operators to [fully shut down a slave|https://issues.apache.org/jira/browse/MESOS-1475] (killing all tasks underneath it). Colloquially known as a hard drain.
* Provide a way for operators to mark specific slaves as scheduled for maintenance. This will inform the scheduler about the scheduled unavailability of the resources.
* Provide a way for frameworks to be notified when resources are requested to be relinquished. This gives the framework a chance to proactively move a task before it may be forcibly killed by an operator. It also allows the automation of operations like: please drain these slaves within 1 hour.

See the [design doc|https://docs.google.com/a/twitter.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit#] for the latest details.
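The placement preference described in this epic (machines with no planned maintenance first, then those whose maintenance is least imminent) can be sketched as a simple sort. This is a hedged illustration only: `Offer` here is a hypothetical, simplified type, and `rank_for_long_running_task` is not part of any Mesos scheduler API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    """Hypothetical, simplified offer: only the fields this sketch needs."""
    hostname: str
    # Seconds since the epoch; None means no maintenance is planned.
    unavailability_start: Optional[float] = None


def rank_for_long_running_task(offers):
    """Order offers so that machines without planned maintenance come first,
    followed by machines whose maintenance is least imminent."""
    def key(offer):
        if offer.unavailability_start is None:
            return (0, 0.0)
        # Later start times sort earlier among machines with maintenance.
        return (1, -offer.unavailability_start)
    return sorted(offers, key=key)
```

A scheduler for long running tasks would then try the offers in the returned order.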
[jira] [Updated] (MESOS-2061) Add InverseOffer protobuf message.
[ https://issues.apache.org/jira/browse/MESOS-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2061:
    Target Version/s: 0.24.0
    Fix Version/s: (was: 0.24.0)

Add InverseOffer protobuf message.

    Key: MESOS-2061
    URL: https://issues.apache.org/jira/browse/MESOS-2061
    Project: Mesos
    Issue Type: Task
    Reporter: Benjamin Mahler
    Labels: mesosphere, twitter

InverseOffer was defined as part of the maintenance work in MESOS-1474, design doc here: https://docs.google.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit?usp=sharing

{code}
// A request to deallocate or return any resources already
// being consumed by the framework.
message InverseOffer {
  required OfferID id = 1;
  required FrameworkID framework_id = 2;
  repeated Resource resources = 3;

  // The slave ID if the resources need to be released on a particular slave.
  optional SlaveID slave_id = 4;

  // The executor and task IDs if the resources need to be released on specific
  // executors and/or tasks.
  optional ExecutorID executor_id = 5;
  repeated TaskID task_ids = 6;

  // The resources specified in this offer will become unavailable
  // at the specified start time and for the specified duration. Any
  // tasks running using these resources might get killed when
  // these resources become unavailable.
  required Unavailability unavailability = 7;
}
{code}

This ticket captures the addition of the InverseOffer protobuf to mesos.proto; the necessary API changes for Event/Call and the language bindings will be tracked separately.
[jira] [Updated] (MESOS-2075) Add maintenance information to the replicated registry.
[ https://issues.apache.org/jira/browse/MESOS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2075:
    Target Version/s: 0.24.0
    Fix Version/s: (was: 0.24.0)

Add maintenance information to the replicated registry.

    Key: MESOS-2075
    URL: https://issues.apache.org/jira/browse/MESOS-2075
    Project: Mesos
    Issue Type: Task
    Components: master
    Reporter: Benjamin Mahler
    Labels: mesosphere, twitter

To achieve fault-tolerance for the maintenance primitives, we will need to add the maintenance information to the registry.

The registry currently stores all of the slave information, which is quite large (~17MB for 50,000 slaves in my testing) and results in a protobuf object that is extremely expensive to copy.

As far as I can tell, reads / writes to maintenance information are independent of reads / writes to the existing 'registry' information. So there are two approaches here:

h4. Add maintenance information to 'maintenance' key:
# The advantage of this approach is that we don't further grow the large Registry object.
# This approach assumes that writes to 'maintenance' are independent of writes to the 'registry'. If these writes are not independent, this approach requires that we add transactional support to the State abstraction.
# This approach requires adding compaction to LogStorage.
# This approach likely requires some refactoring to the Registrar.

h4. Add maintenance information to 'registry' key:
# The advantage of this approach is that it's the easiest to implement.
# This will further grow the single 'registry' object, but doesn't preclude it being split apart in the future.
# This approach may require using the diff support in LogStorage and/or adding compression support to LogStorage snapshots to deal with the increased size of the registry.
[jira] [Updated] (MESOS-2076) Implement maintenance primitives in the Master.
[ https://issues.apache.org/jira/browse/MESOS-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2076:
    Target Version/s: 0.24.0
    Fix Version/s: (was: 0.24.0)

Implement maintenance primitives in the Master.

    Key: MESOS-2076
    URL: https://issues.apache.org/jira/browse/MESOS-2076
    Project: Mesos
    Issue Type: Task
    Components: master
    Reporter: Benjamin Mahler
    Labels: mesosphere, twitter

The master will need to do a number of things to implement the maintenance primitives:
# For slaves that have a maintenance window:
#* For unused resources, offers must be augmented with an Unavailability.
#* For used resources, inverse offers must be sent.
# For inverse offers that are declined, we must filter these before sending them again. We must also store the decline reasons while guarding against OOMing.
#* My hunch is that we'll not want to persist the reasons in the initial approach.
# When the drain window is reached, we'll make a binary decision as to whether the slave was drained, based on whether it was empty.
#* If drained, we deactivate this slave and store the fact that it was drained.
#* If not drained, we leave this slave activated.
# Recover the maintenance information upon failover.
[jira] [Updated] (MESOS-2061) Add InverseOffer protobuf message.
[ https://issues.apache.org/jira/browse/MESOS-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2061:
    Assignee: Joseph Wu

Add InverseOffer protobuf message.

    Key: MESOS-2061
    URL: https://issues.apache.org/jira/browse/MESOS-2061
    Project: Mesos
    Issue Type: Task
    Reporter: Benjamin Mahler
    Assignee: Joseph Wu
    Labels: mesosphere, twitter

InverseOffer was defined as part of the maintenance work in MESOS-1474, design doc here: https://docs.google.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit?usp=sharing

{code}
// A request to deallocate or return any resources already
// being consumed by the framework.
message InverseOffer {
  required OfferID id = 1;
  required FrameworkID framework_id = 2;
  repeated Resource resources = 3;

  // The slave ID if the resources need to be released on a particular slave.
  optional SlaveID slave_id = 4;

  // The executor and task IDs if the resources need to be released on specific
  // executors and/or tasks.
  optional ExecutorID executor_id = 5;
  repeated TaskID task_ids = 6;

  // The resources specified in this offer will become unavailable
  // at the specified start time and for the specified duration. Any
  // tasks running using these resources might get killed when
  // these resources become unavailable.
  required Unavailability unavailability = 7;
}
{code}

This ticket captures the addition of the InverseOffer protobuf to mesos.proto; the necessary API changes for Event/Call and the language bindings will be tracked separately.
[jira] [Updated] (MESOS-2066) Add optional 'Unavailability' to resource offers to provide maintenance awareness.
[ https://issues.apache.org/jira/browse/MESOS-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2066:
    Labels: mesosphere twitter (was: twitter)

Add optional 'Unavailability' to resource offers to provide maintenance awareness.

    Key: MESOS-2066
    URL: https://issues.apache.org/jira/browse/MESOS-2066
    Project: Mesos
    Issue Type: Task
    Reporter: Benjamin Mahler
    Assignee: Joseph Wu
    Labels: mesosphere, twitter

In order to inform frameworks about upcoming maintenance on offered resources, per MESOS-1474, we'd like to add optional 'Unavailability' information to offers:

{code}
message Unavailability {
  required Time start = 1;

  // The approximate duration of the unavailability,
  // if this is a transient unavailability.
  optional Duration duration = 2;
}

message Offer {
  required OfferID id = 1;
  required FrameworkID framework_id = 2;
  required SlaveID slave_id = 3;
  required string hostname = 4;
  repeated Resource resources = 5;
  repeated Attribute attributes = 7;
  repeated ExecutorID executor_ids = 6;

  // The resources specified in this offer will become unavailable
  // at the specified start time and for the specified duration. Any
  // tasks launched using these resources might get killed when
  // these resources become unavailable.
  optional Unavailability unavailability = 8;
}
{code}
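As a rough illustration of how a framework might consume this field, the sketch below checks whether a task is expected to finish before the resources become unavailable. It rests on assumptions: times are modeled as plain seconds rather than the Time and Duration messages, and `fits_before` and `is_transient` are hypothetical helpers, not part of the Mesos API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unavailability:
    """Mirrors the proposed message; times are plain seconds here (an assumption)."""
    start: float                      # when the resources become unavailable
    duration: Optional[float] = None  # approximate length, if transient


def fits_before(unavailability, now, expected_runtime):
    """True if a task started at `now` is expected to finish before the
    resources become unavailable, or if no unavailability is planned."""
    if unavailability is None:
        return True
    return now + expected_runtime <= unavailability.start


def is_transient(unavailability):
    """A duration is only present for a transient unavailability."""
    return unavailability.duration is not None
```

A framework could, for example, accept such an offer for short batch tasks and decline it for long running services.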
[jira] [Updated] (MESOS-3010) Review design document for maintenance primitives
[ https://issues.apache.org/jira/browse/MESOS-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-3010: --- Description: Following a suggestion from [~bmahler] we should review the design document [0] for maintenance primitives and consider adding support for explicit acknowledgement from frameworks. [0] - https://docs.google.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit was: Following a suggestion from [~bmahler] we should review the design document [0] for maintenance primitives and consider adding support for explicit acknowledgement from frameworks. [0] - https://docs.google.com/document/d/1CIoOnBLFiEvmhOe-h_s8M4m9Qa7BLETuj_dSNJW959U Review design document for maintenance primitives -- Key: MESOS-3010 URL: https://issues.apache.org/jira/browse/MESOS-3010 Project: Mesos Issue Type: Task Reporter: Artem Harutyunyan Priority: Blocker Following a suggestion from [~bmahler] we should review the design document [0] for maintenance primitives and consider adding support for explicit acknowledgement from frameworks. [0] - https://docs.google.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3005) SSL tests can fail depending on hostname configuration
[ https://issues.apache.org/jira/browse/MESOS-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617587#comment-14617587 ] Adam B commented on MESOS-3005: --- commit 13a4e81dfeb9ed5515a80c2071c7fcbb696d3450 Author: Joris Van Remoortere joris.van.remoort...@gmail.com Date: Tue Jul 7 15:53:40 2015 -0700 SSL: Fix connection issue on OSX. Using the protocol based size for the `connect()` argument. Review: https://reviews.apache.org/r/36246 SSL tests can fail depending on hostname configuration -- Key: MESOS-3005 URL: https://issues.apache.org/jira/browse/MESOS-3005 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Labels: libevent, mesosphere, ssl, tests Fix For: 0.23.0 Depending on how /etc/hosts is configured, the SSL tests can fail with a bad hostname match for the certificate. We can avoid this by explicitly matching the hostname for the certificate to the IP that will be used during the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2996) Failing Docker tests on CentOS Linux release 7.1.1503.
[ https://issues.apache.org/jira/browse/MESOS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617718#comment-14617718 ] Timothy Chen commented on MESOS-2996: - commit d959ea4359f1105ad6ad6dc59f49bf0ed5a6bb56 Author: Timothy Chen tnac...@apache.org Date: Tue Jul 7 14:40:34 2015 -0700 Remove os environment for docker executor enviornment setup. Review: https://reviews.apache.org/r/36282 Failing Docker tests on CentOS Linux release 7.1.1503. -- Key: MESOS-2996 URL: https://issues.apache.org/jira/browse/MESOS-2996 Project: Mesos Issue Type: Bug Reporter: Joerg Schad Assignee: Timothy Chen Priority: Blocker Labels: mesosphere With Mesos 0.23 rc1 several tests fail on CentOS Linux release 7.1 (will add more detail shortly). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2076) Implement maintenance primitives in the Master.
[ https://issues.apache.org/jira/browse/MESOS-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2076:
    Sprint: (was: Mesosphere Sprint 14)

Implement maintenance primitives in the Master.

    Key: MESOS-2076
    URL: https://issues.apache.org/jira/browse/MESOS-2076
    Project: Mesos
    Issue Type: Task
    Components: master
    Reporter: Benjamin Mahler
    Labels: mesosphere, twitter

The master will need to do a number of things to implement the maintenance primitives:
# For slaves that have a maintenance window:
#* For unused resources, offers must be augmented with an Unavailability.
#* For used resources, inverse offers must be sent.
# For inverse offers that are declined, we must filter these before sending them again. We must also store the decline reasons while guarding against OOMing.
#* My hunch is that we'll not want to persist the reasons in the initial approach.
# When the drain window is reached, we'll make a binary decision as to whether the slave was drained, based on whether it was empty.
#* If drained, we deactivate this slave and store the fact that it was drained.
#* If not drained, we leave this slave activated.
# Recover the maintenance information upon failover.
[jira] [Updated] (MESOS-3008) Libevent SSL doesn't use EPOLL
[ https://issues.apache.org/jira/browse/MESOS-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van Remoortere updated MESOS-3008:
    Labels: libevent libprocess mesosphere ssl (was: libevent libprocess ssl)

Libevent SSL doesn't use EPOLL

    Key: MESOS-3008
    URL: https://issues.apache.org/jira/browse/MESOS-3008
    Project: Mesos
    Issue Type: Improvement
    Components: libprocess
    Affects Versions: 0.23.0
    Reporter: Joris Van Remoortere
    Assignee: Joris Van Remoortere
    Labels: libevent, libprocess, mesosphere, ssl

We currently disable epoll in libevent to allow SSL to work. It would be more scalable if we didn't have to do that.
[jira] [Updated] (MESOS-3009) Reproduce systemd cgroup behavior
[ https://issues.apache.org/jira/browse/MESOS-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van Remoortere updated MESOS-3009:
    Labels: mesosphere (was: )

Reproduce systemd cgroup behavior

    Key: MESOS-3009
    URL: https://issues.apache.org/jira/browse/MESOS-3009
    Project: Mesos
    Issue Type: Task
    Reporter: Artem Harutyunyan
    Assignee: Joris Van Remoortere
    Labels: mesosphere

It has been observed that systemd reorganizes the cgroup hierarchy created by the Mesos slave. As a result, Mesos can no longer find its cgroups, and there is a chance that the isolation the Mesos slave put in place is undone.
[jira] [Updated] (MESOS-2066) Add optional 'Unavailability' to resource offers to provide maintenance awareness.
[ https://issues.apache.org/jira/browse/MESOS-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-2066:
    Assignee: Joseph Wu

Add optional 'Unavailability' to resource offers to provide maintenance awareness.

    Key: MESOS-2066
    URL: https://issues.apache.org/jira/browse/MESOS-2066
    Project: Mesos
    Issue Type: Task
    Reporter: Benjamin Mahler
    Assignee: Joseph Wu
    Labels: mesosphere, twitter

In order to inform frameworks about upcoming maintenance on offered resources, per MESOS-1474, we'd like to add optional 'Unavailability' information to offers:

{code}
message Unavailability {
  required Time start = 1;

  // The approximate duration of the unavailability,
  // if this is a transient unavailability.
  optional Duration duration = 2;
}

message Offer {
  required OfferID id = 1;
  required FrameworkID framework_id = 2;
  required SlaveID slave_id = 3;
  required string hostname = 4;
  repeated Resource resources = 5;
  repeated Attribute attributes = 7;
  repeated ExecutorID executor_ids = 6;

  // The resources specified in this offer will become unavailable
  // at the specified start time and for the specified duration. Any
  // tasks launched using these resources might get killed when
  // these resources become unavailable.
  optional Unavailability unavailability = 8;
}
{code}
[jira] [Updated] (MESOS-2997) SSL connection failure causes failed CHECK.
[ https://issues.apache.org/jira/browse/MESOS-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van Remoortere updated MESOS-2997:
    Labels: libprocess mesosphere ssl (was: libprocess ssl)

SSL connection failure causes failed CHECK.

    Key: MESOS-2997
    URL: https://issues.apache.org/jira/browse/MESOS-2997
    Project: Mesos
    Issue Type: Bug
    Components: libprocess
    Reporter: Joris Van Remoortere
    Assignee: Joris Van Remoortere
    Priority: Blocker
    Labels: libprocess, mesosphere, ssl

{code}
[ RUN ] SSLTest.BasicSameProcess
F0706 18:32:28.465451 238583808 libevent_ssl_socket.cpp:507] Check failed: 'self->bev' Must be non NULL
{code}
[jira] [Updated] (MESOS-2998) Disable Persistent Volumes, Dynamic Reservations via master flags
[ https://issues.apache.org/jira/browse/MESOS-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2998: -- Target Version/s: 0.24.0 (was: 0.23.0) Disable Persistent Volumes, Dynamic Reservations via master flags - Key: MESOS-2998 URL: https://issues.apache.org/jira/browse/MESOS-2998 Project: Mesos Issue Type: Improvement Components: master Affects Versions: 0.23.0 Reporter: Adam B Assignee: Michael Park Labels: mesosphere, persistence, reservations, volumes As an operator, I might not want frameworks using the experimental dynamic reservations/persistent volumes APIs in 0.23, since there are no ACLs or operator endpoints for me to manage them. That means that a rogue framework could start reserving resources and creating volumes with all resources provided, and I would have no way to clean them up. Is it possible to disable these features from the master (flags, etc.)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2199: -- Target Version/s: 0.24.0 (was: 0.23.0) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser --- Key: MESOS-2199 URL: https://issues.apache.org/jira/browse/MESOS-2199 Project: Mesos Issue Type: Bug Components: test Reporter: Ian Downes Assignee: haosdent Labels: mesosphere Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [--] 1 test from SlaveTest (10641 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (10658 ms total) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617519#comment-14617519 ] Adam B commented on MESOS-2199: --- commit de13d78b7c2a87162c77e7f296784913d90901fd Author: Adam B a...@mesosphere.io Date: Tue Jul 7 14:35:39 2015 -0700 Disabled ROOT_RunTaskWithCommandInfoWithUser for MESOS-2199. Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser --- Key: MESOS-2199 URL: https://issues.apache.org/jira/browse/MESOS-2199 Project: Mesos Issue Type: Bug Components: test Reporter: Ian Downes Assignee: haosdent Labels: mesosphere Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [--] 1 test from SlaveTest (10641 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (10658 ms total) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2075) Add maintenance information to the replicated registry.
[ https://issues.apache.org/jira/browse/MESOS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2075: - Sprint: (was: Mesosphere Sprint 14) Add maintenance information to the replicated registry. --- Key: MESOS-2075 URL: https://issues.apache.org/jira/browse/MESOS-2075 Project: Mesos Issue Type: Task Components: master Reporter: Benjamin Mahler Labels: mesosphere, twitter To achieve fault-tolerance for the maintenance primitives, we will need to add the maintenance information to the registry. The registry currently stores all of the slave information, which is quite large (~ 17MB for 50,000 slaves from my testing), which results in a protobuf object that is extremely expensive to copy. As far as I can tell, reads / writes to maintenance information are independent of reads / writes to the existing 'registry' information. So there are two approaches here:
h4. Add maintenance information to 'maintenance' key:
# The advantage of this approach is that we don't further grow the large Registry object.
# This approach assumes that writes to 'maintenance' are independent of writes to the 'registry'. If these writes are not independent, this approach requires that we add transactional support to the State abstraction.
# This approach requires adding compaction to LogStorage.
# This approach likely requires some refactoring to the Registrar.
h4. Add maintenance information to 'registry' key:
# The advantage of this approach is that it's the easiest to implement.
# This will further grow the single 'registry' object, but doesn't preclude it being split apart in the future.
# This approach may require using the diff support in LogStorage and/or adding compression support to LogStorage snapshots to deal with the increased size of the registry.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3005) SSL tests can fail depending on hostname configuration
[ https://issues.apache.org/jira/browse/MESOS-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van Remoortere updated MESOS-3005: Labels: libevent mesosphere ssl tests (was: libevent ssl tests) SSL tests can fail depending on hostname configuration -- Key: MESOS-3005 URL: https://issues.apache.org/jira/browse/MESOS-3005 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Labels: libevent, mesosphere, ssl, tests Fix For: 0.23.0 Depending on how /etc/hosts is configured, the SSL tests can fail with a bad hostname match for the certificate. We can avoid this by explicitly matching the hostname for the certificate to the IP that will be used during the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2991) Compilation Error on Mac OS 10.10.4 with clang 3.5.0
[ https://issues.apache.org/jira/browse/MESOS-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616609#comment-14616609 ] Alexander Rukletsov commented on MESOS-2991: Contributor: [~alex-mesos] Initial reviewer: [~mcypark] Final reviewer: [~adam-mesos] Compilation Error on Mac OS 10.10.4 with clang 3.5.0 Key: MESOS-2991 URL: https://issues.apache.org/jira/browse/MESOS-2991 Project: Mesos Issue Type: Bug Components: stout, test Affects Versions: 0.23.0 Reporter: Alexander Rukletsov Assignee: Michael Park Labels: mesosphere Compiling 0.23.0 (rc1) produces compilation errors on Mac OS 10.10.4 with {{g++}} based on LLVM 3.5. It looks like the issue was introduced in {{a5640ad813e6256b548fca068f04fd9fa3a03eda}}, https://reviews.apache.org/r/32838. In contrast to the commit message, compiling the rc with gcc4.4 on CentOS worked fine for me. According to 0.23 release notes and MESOS-2604, we should support clang 3.5.
{code}
../../../../../3rdparty/libprocess/3rdparty/stout/tests/os_tests.cpp:543:25: error: conversion from 'void ()' to 'const Option<void (*)()>' is ambiguous
Fork(dosetsid, // Great-great-granchild.
^~~~
../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:40:3: note: candidate constructor
Option(const T& _t) : state(SOME), t(_t) {}
^
../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:42:3: note: candidate constructor
Option(T&& _t) : state(SOME), t(std::move(_t)) {}
^
../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:45:3: note: candidate constructor [with U = void ()]
Option(const U& u) : state(SOME), t(u) {}
^
{code}
Compiler version:
{code}
$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.4.0
Thread model: posix
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2991) Compilation Error on Mac OS 10.10.4 with clang 3.5.0
[ https://issues.apache.org/jira/browse/MESOS-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-2991: --- Shepherd: Adam B Compilation Error on Mac OS 10.10.4 with clang 3.5.0 Key: MESOS-2991 URL: https://issues.apache.org/jira/browse/MESOS-2991 Project: Mesos Issue Type: Bug Components: stout, test Affects Versions: 0.23.0 Reporter: Alexander Rukletsov Assignee: Alexander Rukletsov Labels: mesosphere Compiling 0.23.0 (rc1) produces compilation errors on Mac OS 10.10.4 with {{g++}} based on LLVM 3.5. It looks like the issue was introduced in {{a5640ad813e6256b548fca068f04fd9fa3a03eda}}, https://reviews.apache.org/r/32838. In contrast to the commit message, compiling the rc with gcc4.4 on CentOS worked fine for me. According to 0.23 release notes and MESOS-2604, we should support clang 3.5.
{code}
../../../../../3rdparty/libprocess/3rdparty/stout/tests/os_tests.cpp:543:25: error: conversion from 'void ()' to 'const Option<void (*)()>' is ambiguous
Fork(dosetsid, // Great-great-granchild.
^~~~
../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:40:3: note: candidate constructor
Option(const T& _t) : state(SOME), t(_t) {}
^
../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:42:3: note: candidate constructor
Option(T&& _t) : state(SOME), t(std::move(_t)) {}
^
../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:45:3: note: candidate constructor [with U = void ()]
Option(const U& u) : state(SOME), t(u) {}
^
{code}
Compiler version:
{code}
$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.4.0
Thread model: posix
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches
[ https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616551#comment-14616551 ] haosdent commented on MESOS-2588: - Hi [~baotiao], I am not sure about this. Let's see what [~tnachen] thinks. Create pre-create hook before a Docker container launches - Key: MESOS-2588 URL: https://issues.apache.org/jira/browse/MESOS-2588 Project: Mesos Issue Type: Bug Components: docker Reporter: Timothy Chen Assignee: haosdent To support custom actions that run before a Docker container is launched, we should create an extensible hook that allows modules/hooks to be invoked before the container launches. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3011) Publish release documentation for major releases on website
Paul Brett created MESOS-3011: - Summary: Publish release documentation for major releases on website Key: MESOS-3011 URL: https://issues.apache.org/jira/browse/MESOS-3011 Project: Mesos Issue Type: Documentation Reporter: Paul Brett Currently, the website only provides a single version of the documentation. We should publish documentation for each release on the website independently (for example as https://mesos.apache.org/documentation/0.22/index.html, https://mesos.apache.org/documentation/0.23/index.html) and make latest redirect to the current version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2997) SSL connection failure causes failed CHECK.
[ https://issues.apache.org/jira/browse/MESOS-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2997: -- Fix Version/s: 0.23.0 SSL connection failure causes failed CHECK. --- Key: MESOS-2997 URL: https://issues.apache.org/jira/browse/MESOS-2997 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Labels: libprocess, mesosphere, ssl Fix For: 0.23.0 {code} [ RUN ] SSLTest.BasicSameProcess F0706 18:32:28.465451 238583808 libevent_ssl_socket.cpp:507] Check failed: 'self->bev' Must be non NULL {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2992) Improve attribute documentation to reflect current state
[ https://issues.apache.org/jira/browse/MESOS-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2992: -- Fix Version/s: 0.23.0 Improve attribute documentation to reflect current state Key: MESOS-2992 URL: https://issues.apache.org/jira/browse/MESOS-2992 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Timothy Chen Assignee: Timothy Chen Fix For: 0.23.0 Currently the attributes doc is out of date and doesn't reflect all of the attribute types we now support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3017) Make container-IP available via Master endpoint
Kapil Arya created MESOS-3017: - Summary: Make container-IP available via Master endpoint Key: MESOS-3017 URL: https://issues.apache.org/jira/browse/MESOS-3017 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Kapil Arya -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3000) Failing test - NsTest.ROOT_setns
[ https://issues.apache.org/jira/browse/MESOS-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617872#comment-14617872 ] haosdent commented on MESOS-3000: - Also because of other user could not access /home/idownes/workspace/mesos/build/src/setns-test-helper Failing test - NsTest.ROOT_setns Key: MESOS-3000 URL: https://issues.apache.org/jira/browse/MESOS-3000 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.23.0 Reporter: Ian Downes Priority: Blocker Appears to be the same issue plaguing MESOS-2199 {noformat} [root@hostname build]# MESOS_VERBOSE=1 ./bin/mesos-tests.sh --gtest_filter=NsTest.ROOT_setns ... [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from NsTest [ RUN ] NsTest.ROOT_setns ABORT: (../../../3rdparty/libprocess/src/subprocess.cpp:163): Failed to os::execvpe in childMain: Permission denied*** Aborted at 1436292540 (unix time) try date -d @1436292540 if you are using GNU date *** PC: @ 0x7f7a1229e625 __GI_raise *** SIGABRT (@0xfffe0001) received by PID 1 (TID 0x7f7a19afc820) from PID 1; stack trace: *** @ 0x7f7a13421710 (unknown) @ 0x7f7a1229e625 __GI_raise @ 0x7f7a1229fe05 __GI_abort @ 0x860ba1 (unknown) @ 0x860bcf (unknown) @ 0x7f7a1826f118 (unknown) @ 0x7f7a18274594 (unknown) @ 0x7f7a18273b88 (unknown) @ 0x7f7a18273098 (unknown) @ 0x1180720 (unknown) @ 0x117a5d7 (unknown) @ 0x7f7a123548fd clone ../../src/tests/ns_tests.cpp:121: Failure Failed to wait 15secs for status [ FAILED ] NsTest.ROOT_setns (15004 ms) [--] 1 test from NsTest (15004 ms total) [--] Global test environment tear-down ../../src/tests/environment.cpp:441: Failure Failed Tests completed with child processes remaining: -+- 40531 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=NsTest.ROOT_setns \--- 40565 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=NsTest.ROOT_setns [==] 1 test from 1 test case ran. 
(15034 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] NsTest.ROOT_setns {noformat} Relevant strace for the forked child: {noformat} ... getpid()= 1 dup2(6, 0) = 0 dup2(7, 1) = 1 dup2(8, 2) = 2 close(6) = 0 close(7) = 0 close(8) = 0 execve(/home/idownes/workspace/mesos/build/src/setns-test-helper, [setns-test-helper, SetnsTestHelper], [/* 24 vars */]) = -1 EACCES (Permission denied) write(2, ABORT: (../../../3rdparty/libpro..., 62) = 62 write(2, Failed to os::execvpe in childMa..., 53) = 53 ... {noformat} Binary that it's trying to exec: {noformat} [root@hostname build]# stat /home/idownes/workspace/mesos/build/src/setns-test-helper File: `/home/idownes/workspace/mesos/build/src/setns-test-helper' Size: 7948Blocks: 16 IO Block: 4096 regular file Device: 801h/2049d Inode: 22949249Links: 1 Access: (0755/-rwxr-xr-x) Uid: (13118/ idownes) Gid: ( 1500/employee) Access: 2015-07-07 17:58:09.569861237 + Modify: 2015-07-07 17:58:09.573861290 + Change: 2015-07-07 17:58:09.573861290 + [root@hostname build]# /home/idownes/workspace/mesos/build/src/setns-test-helper Usage: /home/idownes/workspace/mesos/build/src/.libs/lt-setns-test-helper subcommand [OPTIONS] Available subcommands: help SetnsTestHelper {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2715) Python egg build breakage
[ https://issues.apache.org/jira/browse/MESOS-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617879#comment-14617879 ] Greg Bowyer commented on MESOS-2715: I would leave out the Travis changes, I didn't get it to build. Python egg build breakage - Key: MESOS-2715 URL: https://issues.apache.org/jira/browse/MESOS-2715 Project: Mesos Issue Type: Bug Components: build, python api Reporter: Greg Bowyer Priority: Minor Labels: mesosphere Essentially a small build fix, the python setup.py for the native code does not add -std=c++11 to its compiler flags. This is probably a dup. Fix is here for the interested https://github.com/apache/mesos/pull/42 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3012) Support existing message passing optimization with Event/Call.
Benjamin Mahler created MESOS-3012: -- Summary: Support existing message passing optimization with Event/Call. Key: MESOS-3012 URL: https://issues.apache.org/jira/browse/MESOS-3012 Project: Mesos Issue Type: Task Reporter: Benjamin Mahler See the thread here: http://markmail.org/thread/wvapc7vkbv7z6gbx The scheduler driver currently sends framework messages directly to the slave, when possible: {noformat} (through master) Scheduler — Master — Slave Executor DriverDriver (skip master) {noformat} The slave always sends messages directly to the scheduler driver: {noformat} Scheduler Master Slave Executor DriverDriver (skip master) {noformat} In order for the scheduler driver to receive Events from the master, it needs enough information to continue directly sending messages to slaves. This was previously accomplished by sending the slave's pid inside the [offer message|https://github.com/apache/mesos/blob/0.23.0-rc1/src/messages/messages.proto#L168]:
{code}
message ResourceOffersMessage {
  repeated Offer offers = 1;
  repeated string pids = 2;
}
{code}
We could add an 'Address' to the Offer protobuf to provide the scheduler driver with the same information:
{code}
// NOTE: the field numbers in Address are illustrative.
message Address {
  required string ip = 1;
  required string hostname = 2;
  required uint32 port = 3;

  // All HTTP requests to this address must begin with this prefix.
  required string path_prefix = 4;
}

message Offer {
  required OfferID id = 1;
  required FrameworkID framework_id = 2;
  required SlaveID slave_id = 3;
  required string hostname = 4; // Deprecated in favor of 'address'.
  optional Address address = 8; // Obviates 'hostname'.
  ...
}
{code}
The path prefix is required for testing purposes, where we can have multiple slaves within a process (e.g. {{localhost:5051/slave(1)/state.json}} vs. {{localhost:5051/slave(2)/state.json}}). This provides enough information to allow the scheduler driver to continue to directly send messages to the slaves, which unblocks MESOS-2910. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
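To illustrate why the path prefix matters, here is a minimal sketch of how a client might turn the proposed 'Address' into a request URL. The `Address` dataclass mirrors the fields named in the ticket, but `endpoint_url` and the default `http` scheme are assumptions made for illustration, not part of any Mesos API.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Hypothetical mirror of the proposed 'Address' protobuf."""
    ip: str
    hostname: str
    port: int
    path_prefix: str  # e.g. "/slave(1)" when several slaves share a process


def endpoint_url(address: Address, path: str, scheme: str = "http") -> str:
    """Join the address, its path prefix, and an endpoint path into a URL.

    The scheme default is an assumption; the ticket does not name one.
    """
    prefix = address.path_prefix.strip("/")
    suffix = path.lstrip("/")
    # Drop empty segments so an empty prefix does not produce "//".
    segments = [segment for segment in (prefix, suffix) if segment]
    return f"{scheme}://{address.hostname}:{address.port}/" + "/".join(segments)


if __name__ == "__main__":
    first = Address("127.0.0.1", "localhost", 5051, "/slave(1)")
    second = Address("127.0.0.1", "localhost", 5051, "/slave(2)")
    print(endpoint_url(first, "state.json"))
    print(endpoint_url(second, "state.json"))
```

With two slaves in one test process, the host and port are identical, so the prefix is the only part of the URL that tells the two apart.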
[jira] [Commented] (MESOS-2910) Add an Event message handler to scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617784#comment-14617784 ] Benjamin Mahler commented on MESOS-2910: This was blocked because we don't have enough information in Event/Call to continue sending messages directly to the slaves. Linking in the blocking ticket. Add an Event message handler to scheduler driver Key: MESOS-2910 URL: https://issues.apache.org/jira/browse/MESOS-2910 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Benjamin Mahler Adding this handler lets master send Event messages to the driver. See MESOS-2909 for additional context. This ticket only tracks the installation of the handler and maybe handling of a single event for testing. Additional events handling will be captured in a different ticket(s). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3013) Extend DiscoveryInfo to include NetworkRequirement message
Kapil Arya created MESOS-3013: - Summary: Extend DiscoveryInfo to include NetworkRequirement message Key: MESOS-3013 URL: https://issues.apache.org/jira/browse/MESOS-3013 Project: Mesos Issue Type: Bug Reporter: Kapil Arya Assignee: Kapil Arya As per the [design doc|https://docs.google.com/document/d/17mXtAmdAXcNBwp_JfrxmZcQrs7EO6ancSbejrqjLQ0g], we need to enable frameworks to specify network requirements. The proposed message could be along the lines of:
{code}
// NOTE: enum values and field numbers are illustrative.
message NetworkRequirement {
  enum Protocol {
    IPv4 = 1;
    IPv6 = 2;
  }

  required Protocol protocol = 1;

  // A netgroup is the name given to a set of logically-related IPs that are
  // allowed to communicate within themselves. For example, one might want
  // to create separate netgroups for dev, testing, qa and prod deployment
  // environments.
  repeated string netgroups = 2;

  // Sticky IPs allow a framework to re-launch a task with the same IP on a
  // different Slave/Node.
  optional bool sticky = 3 [default = false];

  // A unique id that the framework uses to tag the assigned IP. This tag
  // can be later used to reclaim IP while relaunching the task.
  optional string id = 4;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3013) Extend DiscoveryInfo to include NetworkRequirement message
[ https://issues.apache.org/jira/browse/MESOS-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-3013: -- Issue Type: Task (was: Bug) Extend DiscoveryInfo to include NetworkRequirement message Key: MESOS-3013 URL: https://issues.apache.org/jira/browse/MESOS-3013 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Kapil Arya Labels: mesosphere As per the [design doc|https://docs.google.com/document/d/17mXtAmdAXcNBwp_JfrxmZcQrs7EO6ancSbejrqjLQ0g], we need to enable frameworks to specify network requirements. The proposed message could be along the lines of:
{code}
// NOTE: enum values and field numbers are illustrative.
message NetworkRequirement {
  enum Protocol {
    IPv4 = 1;
    IPv6 = 2;
  }

  required Protocol protocol = 1;

  // A netgroup is the name given to a set of logically-related IPs that are
  // allowed to communicate within themselves. For example, one might want
  // to create separate netgroups for dev, testing, qa and prod deployment
  // environments.
  repeated string netgroups = 2;

  // Sticky IPs allow a framework to re-launch a task with the same IP on a
  // different Slave/Node.
  optional bool sticky = 3 [default = false];

  // A unique id that the framework uses to tag the assigned IP. This tag
  // can be later used to reclaim IP while relaunching the task.
  optional string id = 4;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617869#comment-14617869 ] haosdent commented on MESOS-2199: - {quote} I think it's flawed to require global read access for the build directory... {quote} Hi [~idownes]. If the {{nobody}} user cannot read /home/idownes/build, it also cannot read or execute /home/idownes/build/src/.libs/lt-mesos-executor. So there is a contradiction:
1. If we want {{nobody}} to be able to execute /home/idownes/build/src/.libs/lt-mesos-executor, it needs r-x permissions on these directories: {code} /home /home/idownes /home/idownes/build /home/idownes/build/src /home/idownes/build/src/.libs {code}
2. If we don't want {{nobody}} to access /home/idownes/, it also cannot execute /home/idownes/build/src/.libs/lt-mesos-executor, because lt-mesos-executor lives under /home/idownes/.
Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser --- Key: MESOS-2199 URL: https://issues.apache.org/jira/browse/MESOS-2199 Project: Mesos Issue Type: Bug Components: test Reporter: Ian Downes Assignee: haosdent Labels: mesosphere Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. 
[--] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [--] 1 test from SlaveTest (10641 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (10658 ms total) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
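The permission argument in the comment above can be checked mechanically. The following is a minimal sketch, assuming a POSIX system, that lists the parent directories of a path and reports those missing the execute (search) bit for "other" users. The helper names `ancestors` and `blocked_for_others` are made up for this example, and looking only at the "other" bits is a simplification: real access checks also consider the owner and group bits and any ACLs.

```python
import os
import stat


def ancestors(path: str) -> list:
    """Return every parent directory of 'path', from the root downwards."""
    chain = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return list(reversed(chain))


def blocked_for_others(path: str) -> list:
    """Return the parent directories that lack o+x (search permission).

    Simplification: only the 'other' permission bits are inspected.
    """
    return [
        directory
        for directory in ancestors(path)
        if not os.stat(directory).st_mode & stat.S_IXOTH
    ]


if __name__ == "__main__":
    # Example: report which directories would stop an unrelated user
    # from reaching this script.
    for directory in blocked_for_others(__file__):
        print("missing o+x:", directory)
```

If any directory is reported, an unrelated user such as {{nobody}} cannot traverse it, so the binary underneath cannot be executed regardless of its own mode bits.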
[jira] [Comment Edited] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617869#comment-14617869 ] haosdent edited comment on MESOS-2199 at 7/8/15 2:41 AM: - {quote} I think it's flawed to require global read access for the build directory... {quote} Hi [~idownes]. If the {{nobody}} user cannot read /home/idownes/build, it also cannot read or execute /home/idownes/build/src/.libs/lt-mesos-executor. So there is a contradiction:
1. If we want {{nobody}} to be able to execute /home/idownes/build/src/.libs/lt-mesos-executor, it needs r-x permissions on these directories: {code} /home /home/idownes /home/idownes/build /home/idownes/build/src /home/idownes/build/src/.libs {code}
2. If we don't want {{nobody}} to access /home/idownes/, it also cannot execute /home/idownes/build/src/.libs/lt-mesos-executor, because lt-mesos-executor lives under /home/idownes/.
More details are [here|http://unix.stackexchange.com/questions/13858/do-the-parent-directorys-permissions-matter-when-accessing-a-subdirectory]. This also requires chmod o+x /home/idownes.
was (Author: haosd...@gmail.com): {quote} I think it's flawed to require global read access for the build directory... {quote} Hi [~idownes]. If the {{nobody}} user cannot read /home/idownes/build, it also cannot read or execute /home/idownes/build/src/.libs/lt-mesos-executor. So there is a contradiction:
1. If we want {{nobody}} to be able to execute /home/idownes/build/src/.libs/lt-mesos-executor, it needs r-x permissions on these directories: {code} /home /home/idownes /home/idownes/build /home/idownes/build/src /home/idownes/build/src/.libs {code}
2. If we don't want {{nobody}} to access /home/idownes/, it also cannot execute /home/idownes/build/src/.libs/lt-mesos-executor, because lt-mesos-executor lives under /home/idownes/.
More details are here. This also requires chmod o+x /home/idownes.
Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser --- Key: MESOS-2199 URL: https://issues.apache.org/jira/browse/MESOS-2199 Project: Mesos Issue Type: Bug Components: test Reporter: Ian Downes Assignee: haosdent Labels: mesosphere Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [--] 1 test from SlaveTest (10641 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (10658 ms total) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2996) Failing Docker tests on CentOS Linux release 7.1.1503.
[ https://issues.apache.org/jira/browse/MESOS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2996: -- Fix Version/s: 0.23.0 Failing Docker tests on CentOS Linux release 7.1.1503. -- Key: MESOS-2996 URL: https://issues.apache.org/jira/browse/MESOS-2996 Project: Mesos Issue Type: Bug Reporter: Joerg Schad Assignee: Timothy Chen Priority: Blocker Labels: mesosphere Fix For: 0.23.0 With Mesos 0.23 rc1 several tests fail on CentOS Linux release 7.1 (will add more detail shortly). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3016) Add task status update hooks for Master/Slave
Kapil Arya created MESOS-3016: - Summary: Add task status update hooks for Master/Slave Key: MESOS-3016 URL: https://issues.apache.org/jira/browse/MESOS-3016 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Kapil Arya The task termination hooks are needed for doing task-specific cleanup in Master/Slave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-3000) Failing test - NsTest.ROOT_setns
[ https://issues.apache.org/jira/browse/MESOS-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-3000: Comment: was deleted (was: Also because of other user could not access /home/idownes/workspace/mesos/build/src/setns-test-helper) Failing test - NsTest.ROOT_setns Key: MESOS-3000 URL: https://issues.apache.org/jira/browse/MESOS-3000 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.23.0 Reporter: Ian Downes Priority: Blocker Appears to be the same issue plaguing MESOS-2199 {noformat} [root@hostname build]# MESOS_VERBOSE=1 ./bin/mesos-tests.sh --gtest_filter=NsTest.ROOT_setns ... [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from NsTest [ RUN ] NsTest.ROOT_setns ABORT: (../../../3rdparty/libprocess/src/subprocess.cpp:163): Failed to os::execvpe in childMain: Permission denied*** Aborted at 1436292540 (unix time) try date -d @1436292540 if you are using GNU date *** PC: @ 0x7f7a1229e625 __GI_raise *** SIGABRT (@0xfffe0001) received by PID 1 (TID 0x7f7a19afc820) from PID 1; stack trace: *** @ 0x7f7a13421710 (unknown) @ 0x7f7a1229e625 __GI_raise @ 0x7f7a1229fe05 __GI_abort @ 0x860ba1 (unknown) @ 0x860bcf (unknown) @ 0x7f7a1826f118 (unknown) @ 0x7f7a18274594 (unknown) @ 0x7f7a18273b88 (unknown) @ 0x7f7a18273098 (unknown) @ 0x1180720 (unknown) @ 0x117a5d7 (unknown) @ 0x7f7a123548fd clone ../../src/tests/ns_tests.cpp:121: Failure Failed to wait 15secs for status [ FAILED ] NsTest.ROOT_setns (15004 ms) [--] 1 test from NsTest (15004 ms total) [--] Global test environment tear-down ../../src/tests/environment.cpp:441: Failure Failed Tests completed with child processes remaining: -+- 40531 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=NsTest.ROOT_setns \--- 40565 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=NsTest.ROOT_setns [==] 1 test from 1 test case ran. (15034 ms total) [ PASSED ] 0 tests. 
[ FAILED ] 1 test, listed below: [ FAILED ] NsTest.ROOT_setns {noformat} Relevant strace for the forked child: {noformat} ... getpid()= 1 dup2(6, 0) = 0 dup2(7, 1) = 1 dup2(8, 2) = 2 close(6) = 0 close(7) = 0 close(8) = 0 execve(/home/idownes/workspace/mesos/build/src/setns-test-helper, [setns-test-helper, SetnsTestHelper], [/* 24 vars */]) = -1 EACCES (Permission denied) write(2, ABORT: (../../../3rdparty/libpro..., 62) = 62 write(2, Failed to os::execvpe in childMa..., 53) = 53 ... {noformat} Binary that it's trying to exec: {noformat} [root@hostname build]# stat /home/idownes/workspace/mesos/build/src/setns-test-helper File: `/home/idownes/workspace/mesos/build/src/setns-test-helper' Size: 7948Blocks: 16 IO Block: 4096 regular file Device: 801h/2049d Inode: 22949249Links: 1 Access: (0755/-rwxr-xr-x) Uid: (13118/ idownes) Gid: ( 1500/employee) Access: 2015-07-07 17:58:09.569861237 + Modify: 2015-07-07 17:58:09.573861290 + Change: 2015-07-07 17:58:09.573861290 + [root@hostname build]# /home/idownes/workspace/mesos/build/src/setns-test-helper Usage: /home/idownes/workspace/mesos/build/src/.libs/lt-setns-test-helper subcommand [OPTIONS] Available subcommands: help SetnsTestHelper {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
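The strace above shows execve failing with EACCES even though stat reports the binary itself as 0755. That is consistent with an ancestor directory lacking search permission for other users, which is what the "chmod o+x /home/idownes" remark on MESOS-2199 points at. A minimal Python sketch for spotting such directories follows. The function name dirs_missing_other_search is hypothetical, and it inspects only the "other" execute bit, so ACLs and group membership are ignored.

```python
import os
import stat


def dirs_missing_other_search(path):
    """Return ancestor directories of `path` that lack the o+x (search) bit.

    Illustrative helper only: it looks at the plain mode bits and nothing else.
    """
    missing = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        mode = os.stat(current).st_mode
        if not mode & stat.S_IXOTH:
            missing.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return missing
```

For the report above, one would call dirs_missing_other_search("/home/idownes/workspace/mesos/build/src/setns-test-helper") on the affected host and look for /home/idownes in the result.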
[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics
[ https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617770#comment-14617770 ] Paul Brett commented on MESOS-2993: --- Update incorporating reviewer comments. Document per container unique egress flow and network queueing statistics -- Key: MESOS-2993 URL: https://issues.apache.org/jira/browse/MESOS-2993 Project: Mesos Issue Type: Bug Components: documentation, isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Document new network isolation capabilities in 0.23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3013) Extend DiscoveryInfo to include NetworkRequirement message
[ https://issues.apache.org/jira/browse/MESOS-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-3013: -- Labels: mesosphere (was: ) Extend DiscoveryInfo to include NetworkRequirement message Key: MESOS-3013 URL: https://issues.apache.org/jira/browse/MESOS-3013 Project: Mesos Issue Type: Bug Reporter: Kapil Arya Assignee: Kapil Arya Labels: mesosphere As per the [design doc|https://docs.google.com/document/d/17mXtAmdAXcNBwp_JfrxmZcQrs7EO6ancSbejrqjLQ0g], we need to enable frameworks to specify network requirements. The proposed message could be along the lines of: {code} message NetworkRequirement { enum Protocol { IPv4, IPv6 } required Protocol protocol; // A netgroup is the name given to a set of logically-related IPs that are // allowed to communicate within themselves. For example, one might want // to create separate netgroups for dev, testing, qa and prod deployment // environments. repeated string netgroups; // Sticky IPs allow a framework to re-launch a task with the same IP on a // different Slave/Node. optional bool sticky [default = false]; // A unique id that the framework uses to tag the assigned IP. This tag // can be later used to reclaim the IP while relaunching the task. optional string id; }; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
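To make the proposed fields concrete, here is a rough Python mirror of the NetworkRequirement message. It is only a sketch of the proposal as quoted, not generated protobuf code. The helper may_communicate is a hypothetical reading of the netgroup comment (two tasks may talk if they share a netgroup) and is not part of the ticket.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Protocol(Enum):
    IPv4 = 1
    IPv6 = 2


@dataclass
class NetworkRequirement:
    # Field names follow the proposed message; defaults mirror its
    # "optional" and "repeated" qualifiers.
    protocol: Protocol
    netgroups: List[str] = field(default_factory=list)
    sticky: bool = False
    id: Optional[str] = None


def may_communicate(a, b):
    """Assumed rule: True if the two requirements share at least one netgroup."""
    return bool(set(a.netgroups) & set(b.netgroups))
```

For example, a requirement tagged with netgroups ["prod"] and another tagged ["dev", "qa"] would not be allowed to communicate under this reading.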
[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches
[ https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616706#comment-14616706 ] chenzongzhi commented on MESOS-2588: If we want to set the container's cgroup limits from a script, we need to know the cgroup path; then we can change a value directly by writing to the corresponding file. For example, we can run echo 1 > /cgroup/cpu/docker/ce00c65f07924ab5225e655a4e2fc6e7f30e63e1ac7a49901463002946fd196f/cpu.cfs_period_us to implement our limitation. Or we can set the cgroup limit like MesosContainer. Create pre-create hook before a Docker container launches - Key: MESOS-2588 URL: https://issues.apache.org/jira/browse/MESOS-2588 Project: Mesos Issue Type: Bug Components: docker Reporter: Timothy Chen Assignee: haosdent To be able to support custom actions to be called before launching a docker container, we should create a hook that can be extensible and allow module/hooks to be performed before a docker container is launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
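The scripted approach described in the comment above can be sketched in Python as follows. The layout <root>/cpu/docker/<container id>/<control file> is taken from the path in the comment; the function name write_cgroup_value is hypothetical, and the cgroup mount point varies between systems, so it is passed in as a parameter.

```python
import os


def write_cgroup_value(cgroup_root, container_id, control, value):
    """Write `value` to <cgroup_root>/cpu/docker/<container_id>/<control>.

    Assumes the Docker cgroup for the container already exists, so the
    container ID must be known before this is called.
    """
    path = os.path.join(cgroup_root, "cpu", "docker", container_id, control)
    with open(path, "w") as f:
        f.write(str(value))
    return path
```

On a host laid out like the one in the comment, a call might look like write_cgroup_value("/cgroup", container_id, "cpu.cfs_period_us", 100000), where the value is purely illustrative.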
[jira] [Commented] (MESOS-2991) Compilation Error on Mac OS 10.10.4 with clang 3.5.0
[ https://issues.apache.org/jira/browse/MESOS-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616705#comment-14616705 ] Benjamin Hindman commented on MESOS-2991: - Hey folks, [~mcypark] and I came across this issue in the past and the solution was as simple as adding `&` in front of `dosetsid`, to clearly inform the compiler that we want this to be treated as a function pointer. I think this was lost in one of the patch sets that [~mcypark] and I had been working on, unfortunately, but has been added now. Can folks retry and see if this is still an issue? Compilation Error on Mac OS 10.10.4 with clang 3.5.0 Key: MESOS-2991 URL: https://issues.apache.org/jira/browse/MESOS-2991 Project: Mesos Issue Type: Bug Components: stout, test Affects Versions: 0.23.0 Reporter: Alexander Rukletsov Assignee: Michael Park Labels: mesosphere Compiling 0.23.0 (rc1) produces compilation errors on Mac OS 10.10.4 with {{g++}} based on LLVM 3.5. It looks like the issue was introduced in {{a5640ad813e6256b548fca068f04fd9fa3a03eda}}, https://reviews.apache.org/r/32838. In contrast to the commit message, compiling the rc with gcc4.4 on CentOS worked fine for me. According to 0.23 release notes and MESOS-2604, we should support clang 3.5. {code} ../../../../../3rdparty/libprocess/3rdparty/stout/tests/os_tests.cpp:543:25: error: conversion from 'void ()' to 'const Option<void (*)()>' is ambiguous Fork(dosetsid, // Great-great-granchild. 
^~~~ ../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:40:3: note: candidate constructor Option(const T _t) : state(SOME), t(_t) {} ^ ../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:42:3: note: candidate constructor Option(T _t) : state(SOME), t(std::move(_t)) {} ^ ../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:45:3: note: candidate constructor [with U = void ()] Option(const U u) : state(SOME), t(u) {} ^ {code} Compiler version: {code} $ g++ --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn) Target: x86_64-apple-darwin14.4.0 Thread model: posix {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches
[ https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617935#comment-14617935 ] haosdent commented on MESOS-2588: - Hi [~baotiao] You cannot do that before Docker launches. This hook is executed before the Docker container is created, so it cannot get the Docker container ID. I think your requirement is covered by this issue: https://issues.apache.org/jira/browse/MESOS-2154 Create pre-create hook before a Docker container launches - Key: MESOS-2588 URL: https://issues.apache.org/jira/browse/MESOS-2588 Project: Mesos Issue Type: Bug Components: docker Reporter: Timothy Chen Assignee: haosdent To be able to support custom actions to be called before launching a docker container, we should create a hook that can be extensible and allow module/hooks to be performed before a docker container is launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-2588) Create pre-create hook before a Docker container launches
[ https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-2588: Comment: was deleted (was: Hi [~baotiao] You cannot do that before Docker launches. This hook is executed before the Docker container is created, so it cannot get the Docker container ID. I think your requirement is covered by this issue: https://issues.apache.org/jira/browse/MESOS-2154) Create pre-create hook before a Docker container launches - Key: MESOS-2588 URL: https://issues.apache.org/jira/browse/MESOS-2588 Project: Mesos Issue Type: Bug Components: docker Reporter: Timothy Chen Assignee: haosdent To be able to support custom actions to be called before launching a docker container, we should create a hook that can be extensible and allow module/hooks to be performed before a docker container is launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches
[ https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617936#comment-14617936 ] haosdent commented on MESOS-2588: - Hi [~baotiao] You cannot do that before Docker launches. This hook is executed before the Docker container is created, so it cannot get the Docker container ID. I think your requirement is covered by this issue: https://issues.apache.org/jira/browse/MESOS-2154 Create pre-create hook before a Docker container launches - Key: MESOS-2588 URL: https://issues.apache.org/jira/browse/MESOS-2588 Project: Mesos Issue Type: Bug Components: docker Reporter: Timothy Chen Assignee: haosdent To be able to support custom actions to be called before launching a docker container, we should create a hook that can be extensible and allow module/hooks to be performed before a docker container is launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617397#comment-14617397 ] Adam B commented on MESOS-3002: --- The changes for MESOS-2800 to Rename Option<T>::get(const T& _t) to getOrElse() happened after the 0.23.0-rc1 cut and are not planned for cherry-picking into the release. The Fix Version of MESOS-2800 is 0.24.0, so the Affects Version of this ticket (MESOS-3002) is really 0.24.0, and hence its Target Version should also be 0.24.0. Please let me know otherwise if you actually saw this build error when building from the 0.23.0-rc1 tag. Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Joris Van Remoortere Priority: Blocker Change to Option from get() to getOrElse() breaks network isolator. Building with '../configure --with-network-isolator' generates the following error: ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)': ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const' flags.resources.get(""), ^ ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are: In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0, from ../../3rdparty/libprocess/include/process/check.hpp:19, from ../../3rdparty/libprocess/include/process/collect.hpp:7, from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30: ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with 
T = std::basic_string<char>] const T& get() const { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>] T& get() { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1 make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make: *** [check-recursive] Error 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
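The build error comes down to a call site in port_mapping.cpp that still passes a fallback value to get() after that overload was renamed to getOrElse(). As a rough Python analogue of the two accessors (this is not stout's C++ implementation; the class and the method name get_or_else are illustrative only):

```python
class Option:
    """Tiny stand-in for an optional value, for illustration only."""

    _NONE = object()  # sentinel meaning "no value present"

    def __init__(self, value=_NONE):
        self._value = value

    def is_some(self):
        return self._value is not Option._NONE

    def get(self):
        # Takes no arguments and requires a value, like the remaining get().
        assert self.is_some()
        return self._value

    def get_or_else(self, default):
        # The renamed accessor: fall back to `default` when empty.
        return self._value if self.is_some() else default
```

Under this analogy, the failing call corresponds to passing a fallback to get(), and the fix is to call the renamed accessor with that fallback instead.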
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617463#comment-14617463 ] Adam B commented on MESOS-2199: --- Good point, [~idownes]. I'll disable this test for now (0.23), and we can revisit the proper fix in 0.24. Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser --- Key: MESOS-2199 URL: https://issues.apache.org/jira/browse/MESOS-2199 Project: Mesos Issue Type: Bug Components: test Reporter: Ian Downes Assignee: haosdent Labels: mesosphere Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [--] 1 test from SlaveTest (10641 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (10658 ms total) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)