[jira] [Commented] (MESOS-1634) Calling stop on SchedulerDriver leaves Zookeeper connection left behind
[ https://issues.apache.org/jira/browse/MESOS-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495680#comment-14495680 ] Robert Lacroix commented on MESOS-1634: --- https://reviews.apache.org/r/33208/ > Calling stop on SchedulerDriver leaves Zookeeper connection left behind > --- > > Key: MESOS-1634 > URL: https://issues.apache.org/jira/browse/MESOS-1634 > Project: Mesos > Issue Type: Bug > Components: framework >Affects Versions: 0.18.0 >Reporter: Robert Lacroix >Assignee: Robert Lacroix > > When calling stop on SchedulerDriver, the Zookeeper connection of > ZooKeeperMasterDetector is not closed. This leaks connections to Zookeeper. > We should properly close them when stop is called. > {code} > $ netstat -an | grep 2181 | grep tcp4 | grep ESTABLISHED > 2 > 2014-07-23 17:46:53,840:26108(0x1246a8000):ZOO_INFO@check_events@1750: > session establishment complete on server [127.0.0.1:2181], > sessionId=0x14765c8e1a4000a, negotiated timeout=1 > $ netstat -an | grep 2181 | grep tcp4 | grep ESTABLISHED > 3 > I0723 17:48:57.354792 662249472 sched.cpp:730] Stopping framework > '20140723-174036-16777343-5050-26021-' > $ netstat -an | grep 2181 | grep tcp4 | grep ESTABLISHED > 3 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-1634) Calling stop on SchedulerDriver leaves Zookeeper connection left behind
[ https://issues.apache.org/jira/browse/MESOS-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Lacroix reassigned MESOS-1634: Assignee: Robert Lacroix
[jira] [Commented] (MESOS-2023) mesos-execute should allow setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495559#comment-14495559 ] haosdent commented on MESOS-2023: - [~adam-mesos] Could you help me review this? Thank you very much. > mesos-execute should allow setting environment variables > > > Key: MESOS-2023 > URL: https://issues.apache.org/jira/browse/MESOS-2023 > Project: Mesos > Issue Type: Improvement > Components: cli >Affects Versions: 0.20.1 >Reporter: Steven Schlansker >Assignee: haosdent > Labels: newbie > > mesos-execute does not allow setting various properties of the 'CommandInfo' > protobuf. Most notably, being able to set environment variables and URIs > would be very useful.
[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos
[ https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495550#comment-14495550 ]

haosdent commented on MESOS-2203:
---------------------------------

In lxc (https://github.com/lxc/lxc/blob/master/src/lxc/utils.h#L53-L66), they check __NR_setns like this:

{code}
/* Define setns() if missing from the C library */
#ifndef HAVE_SETNS
static inline int setns(int fd, int nstype)
{
#ifdef __NR_setns
	return syscall(__NR_setns, fd, nstype);
#elif defined(__NR_set_ns)
	return syscall(__NR_set_ns, fd, nstype);
#else
	errno = ENOSYS;
	return -1;
#endif
}
#endif
{code}

> Old Centos 6.5 kernels/headers not sufficient for building Mesos
> ----------------------------------------------------------------
>
> Key: MESOS-2203
> URL: https://issues.apache.org/jira/browse/MESOS-2203
> Project: Mesos
> Issue Type: Documentation
> Affects Versions: 0.21.0
> Environment: E.g. Centos 6.5 with kernel 2.6.32-279.14.1
> Reporter: Hans van den Bogert
> Priority: Minor
>
> Old kernels are not sufficient for building Mesos:
> bq.
> Error:
> bq. libtool: compile: g++ -DPACKAGE_NAME=\"mesos\"
> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\"
> "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\"
> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1
> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1
> -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1
> -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1
> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1
> -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -I. -I../../src -Wall -Werror
> -DLIBDIR=\"/var/scratch/vdbogert/lib\"
> -DPKGLIBEXECDIR=\"/var/scratch/vdbogert/libexec/mesos\"
> -DPKGDATADIR=\"/var/scratch/vdbogert/share/mesos\" -I../../include
> -I../../3rdparty/libprocess/include
> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include
> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0
> -I../3rdparty/libprocess/3rdparty/picojson-4f93734
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include
> -I../3rdparty/zookeeper-3.4.5/src/c/generated
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src
> -I/var/scratch/vdbogert//include/subversion-1 -I/usr/include/apr-1
> -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11
> -MT slave/containerizer/isolators/namespaces/libmesos_no_3rdparty_la-pid.lo
> -MD -MP -MF
> slave/containerizer/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
> -c ../../src/slave/containerizer/isolators/namespaces/pid.cpp -fPIC -DPIC
> -o
> slave/containerizer/isolators/namespaces/.libs/libmesos_no_3rdparty_la-pid.o
> In file included from /usr/include/sys/syscall.h:32:0,
> from ../../src/linux/ns.hpp:26,
> from
> ../../src/slave/containerizer/isolators/namespaces/pid.cpp:31:
> ../../src/linux/ns.hpp: In function 'Try<Nothing> ns::setns(const string&,
> const string&)':
> ../../src/linux/ns.hpp:167:23: error: '__NR_setns' was not declared in this
> scope
> int ret = ::syscall(SYS_setns, fd.get(), nstype.get());
> ^
> Perhaps this should be stated on:
> http://mesos.apache.org/gettingstarted/ because taking myself as example,
> this has cost me a lot of time to pinpoint.
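For illustration, the lxc-style guard quoted in the comment could be carried over to C++ roughly as follows. This is only a sketch and not the fix that went into Mesos: the helper name setnsCompat is hypothetical, and the only assumption about the platform is that <sys/syscall.h> either defines SYS_setns / __NR_setns or does not. With old headers the code still compiles and the call fails at runtime with ENOSYS instead of breaking the build.

```cpp
#include <cassert>
#include <cerrno>

#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper (not part of Mesos): call setns(2) through syscall()
// when the headers know the syscall number, otherwise report ENOSYS.
inline int setnsCompat(int fd, int nstype)
{
#if defined(SYS_setns)
  return static_cast<int>(::syscall(SYS_setns, fd, nstype));
#elif defined(__NR_setns)
  return static_cast<int>(::syscall(__NR_setns, fd, nstype));
#else
  // Headers too old to know about setns: fail at runtime, not at build time.
  (void) fd;
  (void) nstype;
  errno = ENOSYS;
  return -1;
#endif
}
```

A caller would check for -1 and inspect errno; an ENOSYS result means namespace support is unavailable on this build, which could then be surfaced as a clear error message rather than a compile failure.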
[jira] [Created] (MESOS-2619) Document master-scheduler communication
Connor Doyle created MESOS-2619:
-----------------------------------

Summary: Document master-scheduler communication
Key: MESOS-2619
URL: https://issues.apache.org/jira/browse/MESOS-2619
Project: Mesos
Issue Type: Bug
Components: documentation
Affects Versions: 0.22.0
Reporter: Connor Doyle

New users often stumble on the networking requirements for communication between schedulers and the Mesos master. It's not explicitly stated anywhere that the master has to talk back to the scheduler. Also, some configuration options (like the LIBPROCESS_PORT environment variable) are under-documented.

This problem is exacerbated as many new users start playing with Mesos and schedulers in unpredictable networking contexts (NAT, containers with bridged networking, etc.).
[jira] [Created] (MESOS-2618) Update C++ style guide on function definition / invocation formatting.
Till Toenshoff created MESOS-2618:
-------------------------------------

Summary: Update C++ style guide on function definition / invocation formatting.
Key: MESOS-2618
URL: https://issues.apache.org/jira/browse/MESOS-2618
Project: Mesos
Issue Type: Documentation
Reporter: Till Toenshoff
Priority: Minor

Our style guide currently suggests two options for function definitions / invocations that do not fit into a single line even when breaking after the opening argument bracket:

Fixed leading indentation (4 spaces):
{noformat}
// 4: OK.
allocator->resourcesRecovered(
    frameworkId,
    slaveId,
    resources,
    filters);
{noformat}

Variable leading indentation:
{noformat}
// 3: In this case, 3 is OK.
foobar(someArgument,
       someOtherArgument,
       theLastArgument);
{noformat}

A counter-case is mentioned for the latter:
{noformat}
// 3: Don't use in this case due to "jaggedness".
allocator->resourcesRecovered(frameworkId,
                              slaveId,
                              resources,
                              filters);
{noformat}

The problem here seems to be that it is not well defined when the counter-case applies.

We might want to consider...
A. removing the variable leading indentation option entirely
B. defining the exact limits on when "jaggedness" applies
[jira] [Commented] (MESOS-2191) Add ContainerId to the TaskStatus message
[ https://issues.apache.org/jira/browse/MESOS-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495123#comment-14495123 ] Benjamin Mahler commented on MESOS-2191: How does that work for custom executors running in the docker containerizer? Currently schedulers may not necessarily expect data messages to come from command / docker tasks. > Add ContainerId to the TaskStatus message > - > > Key: MESOS-2191 > URL: https://issues.apache.org/jira/browse/MESOS-2191 > Project: Mesos > Issue Type: Wish > Components: containerization >Reporter: Marcel Neuhausler >Assignee: Alexander Rojas > Labels: mesosphere > > {{TaskStatus}} provides the frameworks with certain information > ({{executorId}}, {{slaveId}}, etc.) which is useful when collecting > statistics about cluster performance; however, it is difficult to associate > tasks to the container it is executed since this information stays always > within mesos itself. Therefore it would be good to provide the framework > scheduler with this information, adding a new field in the {{TaskStatus}} > message. > See comments for a use case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2605) The slave sometimes does not send active executors during reregistration
[ https://issues.apache.org/jira/browse/MESOS-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495003#comment-14495003 ] Cody Maloney commented on MESOS-2605: - That sounds like this might be related to MESOS-2601 then. Mesos doesn't currently save what containerizer created / owns a container, and so it just tries to recover the container with all of them. > The slave sometimes does not send active executors during reregistration > > > Key: MESOS-2605 > URL: https://issues.apache.org/jira/browse/MESOS-2605 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Elizabeth Lingg >Assignee: Michael Park > Labels: mesosphere > > The slave sometimes does not send active executors during reregistration. > Framework checkpointing is enabled, and the executor successfully > reregisters. However, the tasks in that executor are LOST (by abnormal > executor termination) because the executor is removed by the mesos master as > unknown. See the example below, > task.journalnode.journalnode.NodeExecutor.1428609184051. 
> See the Slave Logs here for the Task: > {code} > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.778790 25126 status_update_manager.cpp:317] Received status update > TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task > task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.779013 25126 status_update_manager.hpp:346] Checkpointing UPDATE for > status update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for > task task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.781788 25123 slave.cpp:2753] Forwarding the update TASK_RUNNING > (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task > task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 to master@10.142.250.253:5050 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.781889 25123 slave.cpp:2686] Sending acknowledgement for status > update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task > task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 to executor(1)@10.168.119.78:47638 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.784503 25124 status_update_manager.cpp:389] Received status update > acknowledgement (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task > task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.784567 25124 status_update_manager.hpp:346] Checkpointing ACK for > status update TASK_RUNNING (UUID: 
4eb22075-c319-463d-8f70-94db9caa69c6) for > task task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 > {code} > Master Logs: > {code} > Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: W0409 > 20:19:43.008666 1067 master.cpp:4015] Executor > executor.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 possibly unknown to the slave > 20150407-233647-2059219722-5050-1659-S5 at slave(1)@10.168.119.78:5051 > (ec2-54-237-57-237.compute-1.amazonaws.com) > Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 > 20:19:43.008652 1074 hierarchical.hpp:648] Recovered cpus(*):0.1; > mem(*):1536 (total allocatable: cpus(*):3.5; mem(*):21113; disk(*):142210; > ports(*):[3889-5044, 5046-5049, 2182-2958, 2960-3887, 1025-2180, 8082-9041, > 9043-9159, 9161-, 5052-6999, 7002-7198, 7200-8079, 10001-65535]) on slave > 20150407-233647-2059219722-5050-1659-S5 from framework > 20150408-002100-4261056010-5050-1047-0008 > Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 > 20:19:43.008712 1067 master.cpp:4714] Removing executor > 'executor.journalnode.NodeExecutor.1428609184051' with resources cpus(*):0.1; > mem(*):1536 of framework 20150408-002100-4261056010-5050-1047-0008 on slave > 20150407-233647-2059219722-5050-1659-S5 at slave(1)@10.168.119.78:5051 > (ec2-54-237-57-237.compute-1.amazonaws.com) > Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 > 20:19:43.010372 1067 master.cpp:3295] Status update TASK_LOST (UUID: > e5532567-e5b2-4fca-87aa-f3f98e371640) for task >
[jira] [Commented] (MESOS-2605) The slave sometimes does not send active executors during reregistration
[ https://issues.apache.org/jira/browse/MESOS-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494989#comment-14494989 ]

Michael Park commented on MESOS-2605:
-------------------------------------

Reporting recent findings.

{code:title=master}
Apr 14 18:49:40 ip-10-168-90-31.ec2.internal mesos-master[1226]: W0414 18:49:40.078554 1248 master.cpp:4015] Executor executor.journalnode.NodeExecutor.1429034850690 of framework 20150408-055737-526034954-5050-1226-0393 possibly unknown to the slave 20150408-055737-526034954-5050-1226-S9 at slave(1)@10.154.8.101:5051 (ec2-54-237-83-163.compute-1.amazonaws.com)
{code}

{code:title=slave}
Apr 14 18:49:36 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:36.802649 18193 slave.cpp:4305] Recovering executor 'executor.journalnode.NodeExecutor.1429034850690' of framework 20150408-055737-526034954-5050-1226-0393
/* ... */
Apr 14 18:49:36 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:36.832767 18188 status_update_manager.cpp:205] Recovering executor 'executor.journalnode.NodeExecutor.1429034850690' of framework 20150408-055737-526034954-5050-1226-0393
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:36.857517 18189 docker.cpp:470] Recovering container '5338e6cf-03ac-4882-a08e-48bfd6d797dc' for executor 'executor.journalnode.NodeExecutor.1429034850690' of framework 20150408-055737-526034954-5050-1226-0393
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:36.870594 18190 containerizer.cpp:350] Recovering container '5338e6cf-03ac-4882-a08e-48bfd6d797dc' for executor 'executor.journalnode.NodeExecutor.1429034850690' of framework 20150408-055737-526034954-5050-1226-0393
{code}

So somehow we're calling {{recover}} in {{docker.cpp}} as well as in {{containerizer.cpp}}. But based on the fact that HDFS doesn't use {{docker}} at all, along with this log:

{code}
Apr 14 18:07:30 ip-10-154-8-101.ec2.internal mesos-slave[11172]: I0414 18:07:30.708111 11187 containerizer.cpp:472] Starting container '5338e6cf-03ac-4882-a08e-48bfd6d797dc' for executor 'executor.journalnode.NodeExecutor.1429034850690' of framework '20150408-055737-526034954-5050-1226-0393'
{code}

we should be calling it for {{containerizer.cpp}} only.

The slave proceeds to log the following sequence of events, which shows that we try to recover the containers via Docker and, when we can't find them, terminate the executor.

{code}
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:37.602605 18186 slave.cpp:3738] Sending reconnect request to executor executor.journalnode.NodeExecutor.1429034850690 of framework 20150408-055737-526034954-5050-1226-0393 at executor(1)@10.154.8.101:60097
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:37.611616 18186 slave.cpp:2321] Re-registering executor executor.journalnode.NodeExecutor.1429034850690 of framework 20150408-055737-526034954-5050-1226-0393
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: E0414 18:49:37.862635 18190 slave.cpp:2456] Failed to update resources for container 5338e6cf-03ac-4882-a08e-48bfd6d797dc of executor 'executor.journalnode.NodeExecutor.1429034850690' of framework 20150408-055737-526034954-5050-1226-0393, destroying container: Failed to 'docker inspect mesos-5338e6cf-03ac-4882-a08e-48bfd6d797dc': exit status = exited with status 1 stderr = Error: No such image or container: mesos-5338e6cf-03ac-4882-a08e-48bfd6d797dc
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: E0414 18:49:37.976002 18187 slave.cpp:3191] Termination of executor 'executor.journalnode.NodeExecutor.1429034850690' of framework '20150408-055737-526034954-5050-1226-0393' failed: Failed to kill the Docker container: Failed to 'docker stop -t 0 mesos-5338e6cf-03ac-4882-a08e-48bfd6d797dc': exit status = exited with status 1 stderr = Error response from daemon: No such container: mesos-5338e6cf-03ac-4882-a08e-48bfd6d797dc
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: E0414 18:49:37.977609 18187 slave.cpp:2653] Failed to update resources for container 5338e6cf-03ac-4882-a08e-48bfd6d797dc of executor executor.journalnode.NodeExecutor.1429034850690 running task task.journalnode.journalnode.NodeExecutor.1429034850690 on status update for terminal task, destroying container: Container '5338e6cf-03ac-4882-a08e-48bfd6d797dc' not found
/* ... */
Apr 14 18:49:40 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:40.079958 18188 slave.cpp:949] MPARK: Slave::doReliableRegistration
Apr 14 18:49:40 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:40.080099 18188 slave.cpp:1053] MPARK: Executor 'executor.namenode.NameNodeExecutor.1429034908782' is terminated!
Apr 14 18:49:40 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 18:49:4
[jira] [Commented] (MESOS-2191) Add ContainerId to the TaskStatus message
[ https://issues.apache.org/jira/browse/MESOS-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494987#comment-14494987 ] Marcel Neuhausler commented on MESOS-2191: -- Hi Timothy, Getting the "docker inspect json output" as part of the TaskInfo in the TASK_RUNNING TaskStatus message would be perfect :-) Thanks!
[jira] [Commented] (MESOS-1939) Enable multiple authentication methods in parallel
[ https://issues.apache.org/jira/browse/MESOS-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494872#comment-14494872 ] Adam B commented on MESOS-1939: --- Note that the Authorizer and Authenticator/Authenticatee are two different (sets of) interfaces. We have already turned the Authentication interfaces into Mesos Modules, but have yet to do the same for the Authorizer interface. This JIRA is specifically for Authentication (not Authorization), for example the slaves could use the default CRAMMD5Authenticatee (src/authentication/cram_md5/authenticatee.hpp) while frameworks could authenticate via a custom authentication module (e.g. Kerberos, PKI, etc.). In the master, you would specify multiple authenticator modules, and the master could have a collection (list, set) of them. On the authenticatee side, the framework/slave would (phase 1) be started with a single authenticatee type, or (phase 2) a list of authentication mechanisms, in some order of preference. The authenticatee would have to pass the authentication mechanism to the master (perhaps via the AuthenticateMessage) so that the master can know which Authenticator to use to authenticate the authenticatee. > Enable multiple authentication methods in parallel > -- > > Key: MESOS-1939 > URL: https://issues.apache.org/jira/browse/MESOS-1939 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Till Toenshoff >Priority: Minor > Labels: authentication > > The master (authenticator) should allow for multiple authentication > mechanisms to be used at the same time. That way, a slave could be > authenticated by mechanism FOO while the frameworks are authenticated by BAR. > The authenticatee should be allowed to select the desired mechanism (module). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
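To make the proposal in the comment above concrete, here is a minimal sketch of a master-side collection of authenticators keyed by mechanism name, with a "phase 2" selection from the authenticatee's preference list. Everything in it is an assumption for illustration: the Authenticator interface, SharedSecretAuthenticator, AuthenticatorRegistry and the mechanism strings are hypothetical and do not reflect the actual Mesos module interfaces, which are asynchronous and considerably richer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, synchronous stand-in for an authenticator module.
class Authenticator
{
public:
  virtual ~Authenticator() {}
  virtual bool authenticate(const std::string& credentials) = 0;
};

// Toy authenticator that accepts exactly one secret (illustration only).
class SharedSecretAuthenticator : public Authenticator
{
public:
  explicit SharedSecretAuthenticator(const std::string& _secret)
    : secret(_secret) {}

  virtual bool authenticate(const std::string& credentials)
  {
    return credentials == secret;
  }

private:
  const std::string secret;
};

// Hypothetical master-side collection of authenticators, keyed by mechanism.
class AuthenticatorRegistry
{
public:
  void add(
      const std::string& mechanism,
      std::shared_ptr<Authenticator> authenticator)
  {
    authenticators[mechanism] = authenticator;
  }

  // Returns the first mechanism in the authenticatee's preference order that
  // the master supports, or an empty string if there is no overlap.
  std::string select(const std::vector<std::string>& preferred) const
  {
    for (const std::string& mechanism : preferred) {
      if (authenticators.count(mechanism) > 0) {
        return mechanism;
      }
    }
    return "";
  }

  // Dispatches to the authenticator registered for the mechanism named by the
  // authenticatee; unknown mechanisms are rejected.
  bool authenticate(
      const std::string& mechanism,
      const std::string& credentials) const
  {
    auto it = authenticators.find(mechanism);
    if (it == authenticators.end()) {
      return false;
    }
    return it->second->authenticate(credentials);
  }

private:
  std::map<std::string, std::shared_ptr<Authenticator>> authenticators;
};
```

In this sketch the mechanism name plays the role of the field the authenticatee would pass to the master (perhaps via the AuthenticateMessage, as suggested above), so that a slave could authenticate with one mechanism while frameworks use another.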
[jira] [Commented] (MESOS-2233) Run ASF CI mesos builds inside docker
[ https://issues.apache.org/jira/browse/MESOS-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494617#comment-14494617 ]

Vinod Kone commented on MESOS-2233:
-----------------------------------

Some relevant discussions on Docker's GitHub.

https://github.com/docker/docker/issues/7276
https://github.com/Unitech/PM2/issues/1086

I'll test with the "--privileged" flag.

> Run ASF CI mesos builds inside docker
> -------------------------------------
>
> Key: MESOS-2233
> URL: https://issues.apache.org/jira/browse/MESOS-2233
> Project: Mesos
> Issue Type: Task
> Components: technical debt
> Reporter: Vinod Kone
> Assignee: Vinod Kone
> Labels: twitter
> Attachments: Dockerfile, supervisord.conf
>
>
> There are several limitations to the Mesos project's current state of CI,
> which is run on builds.a.o
> --> Only runs on Ubuntu
> --> Doesn't run any tests that deal with cgroups
> --> Doesn't run any tests that need root permissions
> Now that ASF CI supports docker
> (https://issues.apache.org/jira/browse/BUILDS-25), it would be great for the
> Mesos project to use it.
[jira] [Commented] (MESOS-2616) Update C++ style guide on variable naming.
[ https://issues.apache.org/jira/browse/MESOS-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494607#comment-14494607 ]

Till Toenshoff commented on MESOS-2616:
---------------------------------------

https://reviews.apache.org/r/32536/

> Update C++ style guide on variable naming.
> -------------------------------------------
>
> Key: MESOS-2616
> URL: https://issues.apache.org/jira/browse/MESOS-2616
> Project: Mesos
> Issue Type: Documentation
> Reporter: Till Toenshoff
> Assignee: Alexander Rukletsov
> Priority: Minor
>
> Our variable naming guide currently does not explain the use cases for
> leading or trailing underscores, which appear frequently in our codebase.
> We should correct that.
> The following was copied from the review description to allow discussion
> where needed:
> Documents the patterns we use to name variables and function arguments in our
> codebase.
> h4. Leading underscores to avoid ambiguity.
> We use this pattern extensively in libprocess, stout and mesos, a few
> examples below.
> * stout/try.hpp:105
> {noformat}
> Try(State _state, T* _t = NULL, const std::string& _message = "")
>   : state(_state), t(_t), message(_message) {}
> {noformat}
> * process/http.hpp:480
> {noformat}
> URL(const std::string& _scheme,
>     const std::string& _domain,
>     const uint16_t _port = 80,
>     const std::string& _path = "/",
>     const hashmap<std::string, std::string>& _query =
>       (hashmap<std::string, std::string>()),
>     const Option<std::string>& _fragment = None())
>   : scheme(_scheme),
>     domain(_domain),
>     port(_port),
>     path(_path),
>     query(_query),
>     fragment(_fragment) {}
> {noformat}
> * slave/containerizer/linux_launcher.cpp:56
> {noformat}
> LinuxLauncher::LinuxLauncher(
>     const Flags& _flags,
>     int _namespaces,
>     const string& _hierarchy)
>   : flags(_flags),
>     namespaces(_namespaces),
>     hierarchy(_hierarchy) {}
> {noformat}
> h4. Trailing underscores as prime symbols.
> We use this pattern in the code, though not extensively. We would like to see
> more pass-by-value instead of creating copies from a variable passed by const
> reference.
> * master.cpp:2942
> {noformat}
> // Create and add the slave id.
> SlaveInfo slaveInfo_ = slaveInfo;
> slaveInfo_.mutable_id()->CopyFrom(newSlaveId());
> {noformat}
> * slave.cpp:4180
> {noformat}
> ExecutorInfo executorInfo_ = executor->info;
> Resources resources = executorInfo_.resources();
> resources += taskInfo.resources();
> executorInfo_.mutable_resources()->CopyFrom(resources);
> {noformat}
> * status_update_manager.cpp:474
> {noformat}
> // Bounded exponential backoff.
> Duration duration_ =
>   std::min(duration * 2, STATUS_UPDATE_RETRY_INTERVAL_MAX);
> {noformat}
> * containerizer/mesos/containerizer.cpp:109
> {noformat}
> // Modify the flags to include any changes to isolation.
> Flags flags_ = flags;
> flags_.isolation = isolation;
> {noformat}
> h4. Passing arguments by value.
> * slave.cpp:2480
> {noformat}
> void Slave::statusUpdate(StatusUpdate update, const UPID& pid)
> {
>   ...
>   // Set the source before forwarding the status update.
>   update.mutable_status()->set_source(
>       pid == UPID() ? TaskStatus::SOURCE_SLAVE : TaskStatus::SOURCE_EXECUTOR);
>   ...
> }
> {noformat}
> * process/metrics/timer.hpp:103
> {noformat}
> static void _time(Time start, Timer that)
> {
>   const Time stop = Clock::now();
>   double value;
>   process::internal::acquire(&that.data->lock);
>   {
>     that.data->lastValue = T(stop - start).value();
>     value = that.data->lastValue.get();
>   }
>   process::internal::release(&that.data->lock);
>   that.push(value);
> }
> {noformat}
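The contrast between the trailing-underscore copy and pass-by-value can be shown side by side in a small self-contained sketch. The SlaveInfo struct and both function names below are hypothetical stand-ins (the real SlaveInfo is a protobuf message); the point is only that both variants leave the caller's object untouched, while the second avoids introducing a second name for the same thing.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the protobuf message used in the examples above.
struct SlaveInfo
{
  std::string id;
  std::string hostname;
};

// Pattern 1: take a const reference, then create a modifiable copy named with
// a trailing underscore (the "prime" of the original variable).
inline SlaveInfo withIdCopy(const SlaveInfo& slaveInfo, const std::string& id)
{
  SlaveInfo slaveInfo_ = slaveInfo;
  slaveInfo_.id = id;
  return slaveInfo_;
}

// Pattern 2: take the argument by value; the copy is made at the call site
// and the parameter itself is modified.
inline SlaveInfo withIdByValue(SlaveInfo slaveInfo, const std::string& id)
{
  slaveInfo.id = id;
  return slaveInfo;
}
```

With pass-by-value, a caller that no longer needs its object can hand it over with std::move and skip the copy entirely, which the const-reference variant cannot offer.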
[jira] [Commented] (MESOS-2233) Run ASF CI mesos builds inside docker
[ https://issues.apache.org/jira/browse/MESOS-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494526#comment-14494526 ] Vinod Kone commented on MESOS-2233: --- Definitely looks like an "apparmor" permissions issue (though it is weird that it only happens intermittently). Here's the relevant kernel log output from the latest failed build https://builds.apache.org/job/vinod-docker-multi/COMPILER=gcc,LABEL=docker%7C%7CHadoop,OS=ubuntu:14.10/113/consoleFull configure fails due to perm issue for 'sed'. {code} config.status: creating include/Makefile config.status: creating 3rdparty/Makefile sed: error while loading shared libraries: libpthread.so.0: failed to map segment from shared object: Permission denied config.status: creating 3rdparty/gmock_sources.cc config.status: executing depfiles commands config.status: executing libtool commands === configuring in 3rdparty/stout (/mesos/3rdparty/libprocess/3rdparty/stout) configure: running /bin/bash ./configure --disable-option-checking '--prefix=/usr/local' 'CC=gcc' 'CXX=g++' '--enable-shared=no' '--with-pic' --cache-file=/dev/null --srcdir=. 
{code} apparmor denies 'sed' {code} [16654475.072801] type=1400 audit(1429008557.507:152643): apparmor="DENIED" operation="file_mmap" profile="docker-default" name="s-0.23.0/_build/confwWH8Lp/subs.awk" pid=15083 comm="sed" requested_mask="mr" denied_mask="mr" fsuid=1000 ouid=0 [16657331.416018] docker0: port 1(vethad50be3) entered disabled state [16657331.416570] device vethad50be3 left promiscuous mode [16657331.416581] docker0: port 1(vethad50be3) entered disabled state [16658694.375946] device veth9c2198d entered promiscuous mode [16658694.376446] IPv6: ADDRCONF(NETDEV_UP): veth9c2198d: link is not ready [16658694.406234] IPv6: ADDRCONF(NETDEV_CHANGE): veth9c2198d: link becomes ready [16658694.406272] docker0: port 1(veth9c2198d) entered forwarding state [16658694.406277] docker0: port 1(veth9c2198d) entered forwarding state [16658695.048564] docker0: port 1(veth9c2198d) entered disabled state [16658695.049197] device veth9c2198d left promiscuous mode [16658695.049227] docker0: port 1(veth9c2198d) entered disabled state [16658698.266384] device veth3b62949 entered promiscuous mode [16658698.266793] IPv6: ADDRCONF(NETDEV_UP): veth3b62949: link is not ready [16658698.305110] IPv6: ADDRCONF(NETDEV_CHANGE): veth3b62949: link becomes ready [16658698.305145] docker0: port 1(veth3b62949) entered forwarding state [16658698.305152] docker0: port 1(veth3b62949) entered forwarding state [16658713.370021] docker0: port 1(veth3b62949) entered forwarding state [16662827.593876] docker0: port 1(veth3b62949) entered disabled state [16662827.594835] device veth3b62949 left promiscuous mode [16662827.594848] docker0: port 1(veth3b62949) entered disabled state [16662848.574731] device veth51c14c8 entered promiscuous mode [16662848.575105] IPv6: ADDRCONF(NETDEV_UP): veth51c14c8: link is not ready [16662848.606153] IPv6: ADDRCONF(NETDEV_CHANGE): veth51c14c8: link becomes ready [16662848.606193] docker0: port 1(veth51c14c8) entered forwarding state [16662848.606200] docker0: port 
1(veth51c14c8) entered forwarding state [16662849.211224] docker0: port 1(veth51c14c8) entered disabled state [16662849.212002] device veth51c14c8 left promiscuous mode [16662849.212015] docker0: port 1(veth51c14c8) entered disabled state [16662852.417932] device veth1fa6472 entered promiscuous mode [16662852.418403] IPv6: ADDRCONF(NETDEV_UP): veth1fa6472: link is not ready [16662852.444529] IPv6: ADDRCONF(NETDEV_CHANGE): veth1fa6472: link becomes ready [16662852.444578] docker0: port 1(veth1fa6472) entered forwarding state [16662852.444586] docker0: port 1(veth1fa6472) entered forwarding state [16662867.497495] docker0: port 1(veth1fa6472) entered forwarding state [1968.777955] docker0: port 1(veth1fa6472) entered disabled state [1968.778929] device veth1fa6472 left promiscuous mode [1968.778942] docker0: port 1(veth1fa6472) entered disabled state [16671013.616616] device veth98fd1d3 entered promiscuous mode [16671013.617074] IPv6: ADDRCONF(NETDEV_UP): veth98fd1d3: link is not ready [16671013.658662] IPv6: ADDRCONF(NETDEV_CHANGE): veth98fd1d3: link becomes ready [16671013.658709] docker0: port 1(veth98fd1d3) entered forwarding state [16671013.658715] docker0: port 1(veth98fd1d3) entered forwarding state [16671014.251222] docker0: port 1(veth98fd1d3) entered disabled state [16671014.252164] device veth98fd1d3 left promiscuous mode [16671014.252177] docker0: port 1(veth98fd1d3) entered disabled state [16671017.698131] device veth40c3c60 entered promiscuous mode [16671017.698709] IPv6: ADDRCONF(NETDEV_UP): veth40c3c60: link is not ready [16671017.742416] IPv6: ADDRCONF(NETDEV_CHANGE): veth40c3c60: link becomes ready [16671017.742465] docker0: port 1(veth40c3c60) entered forwarding state [16671017.742473] docker0: port 1(veth40c3c60) entered forwarding state [1
[jira] [Assigned] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information
[ https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-1865: --- Assignee: haosdent > Mesos APIs for non-leading masters should return copies of the leader's state > or an error, not a success with incorrect information > --- > > Key: MESOS-1865 > URL: https://issues.apache.org/jira/browse/MESOS-1865 > Project: Mesos > Issue Type: Bug > Components: json api >Affects Versions: 0.20.1 >Reporter: Steven Schlansker >Assignee: haosdent > > Some of the API endpoints, for example /master/tasks.json, will return bogus > information if you query a non-leading master: > {code} > [steven@Anesthetize:~]% curl > http://master1.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master2.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master3.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [ > { > "executor_id": "", > "framework_id": "20140724-231003-419644938-5050-1707-", > "id": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "name": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "resources": { > "cpus": 0.25, > "disk": 0, > {code} > This is very hard for end-users to work around. For example if I query > "which master is leading" followed by "leader: which tasks are running" it is > possible that the leader fails over in between, leaving me with an incorrect > answer and no way to know that this happened. > In my opinion the API should return the correct response (by asking the > current leader?) or an error (500 Not the leader?) but it's unacceptable to > return a successful wrong answer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information
[ https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494511#comment-14494511 ] haosdent commented on MESOS-1865: - [~stevenschlansker] Thank you, let me try to fix this. > Mesos APIs for non-leading masters should return copies of the leader's state > or an error, not a success with incorrect information > --- > > Key: MESOS-1865 > URL: https://issues.apache.org/jira/browse/MESOS-1865 > Project: Mesos > Issue Type: Bug > Components: json api >Affects Versions: 0.20.1 >Reporter: Steven Schlansker > > Some of the API endpoints, for example /master/tasks.json, will return bogus > information if you query a non-leading master: > {code} > [steven@Anesthetize:~]% curl > http://master1.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master2.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master3.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [ > { > "executor_id": "", > "framework_id": "20140724-231003-419644938-5050-1707-", > "id": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "name": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "resources": { > "cpus": 0.25, > "disk": 0, > {code} > This is very hard for end-users to work around. For example if I query > "which master is leading" followed by "leader: which tasks are running" it is > possible that the leader fails over in between, leaving me with an incorrect > answer and no way to know that this happened. > In my opinion the API should return the correct response (by asking the > current leader?) or an error (500 Not the leader?) but it's unacceptable to > return a successful wrong answer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-354) Oversubscribe resources
[ https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-354: - Issue Type: Epic (was: Story) Summary: Oversubscribe resources (was: oversubscribe resources) > Oversubscribe resources > --- > > Key: MESOS-354 > URL: https://issues.apache.org/jira/browse/MESOS-354 > Project: Mesos > Issue Type: Epic > Components: isolation, master, slave >Reporter: brian wickman >Priority: Minor > Attachments: mesos_virtual_offers.pdf > > > This proposal is predicated upon offer revocation. > The idea would be to add a new "revoked" status either by (1) piggybacking > off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a > new status update TASK_REVOKED. > In order to augment an offer with metadata about revocability, there are > options: > 1) Add a revocable boolean to the Offer and > a) offer only one type of Offer per slave at a particular time > b) offer both revocable and non-revocable resources at the same time but > require frameworks to understand that Offers can contain overlapping resources > 2) Add a revocable_resources field on the Offer which is a superset of the > regular resources field. By consuming > resources <= revocable_resources in > a launchTask, the Task becomes a revocable task. If launching a task with < > resources, the Task is non-revocable. > The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) > and non-revocable tasks are online higher-SLA tasks (e.g. services.) > Consider a non-revocable that asks for 4 cores, 8 GB RAM and 20 GB of disk. > One of these resources is a rate (4 cpu seconds per second) and two of them > are fixed values (8GB and 20GB respectively, though disk resources can be > further broken down into spindles - fixed - and iops - a rate.) In practice, > these are the maximum resources in the respective dimensions that this task > will use. 
In reality, we provision tasks at some factor below peak, and only > hit peak resource consumption in rare circumstances or perhaps at a diurnal > peak. > In the meantime, we stand to gain from offering the some constant factor of > the difference between (reserved - actual) of non-revocable tasks as > revocable resources, depending upon our tolerance for revocable task churn. > The main challenge is coming up with an accurate short / medium / long-term > prediction of resource consumption based upon current behavior. > In many cases it would be OK to be sloppy: > * CPU / iops / network IO are rates (compressible) and can often be OK > below guarantees for brief periods of time while task revocation takes place > * Memory slack can be provided by enabling swap and dynamically setting > swap paging boundaries. Should swap ever be activated, that would be a > signal to revoke. > The master / allocator would piggyback on the slave heartbeat mechanism to > learn of the amount of revocable resources available at any point in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
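A note on the slack idea described in the issue above: the proposal offers "some constant factor of the difference between (reserved - actual)" as revocable resources. The sketch below only illustrates that arithmetic. The names `Usage`, `revocable_slack` and `factor` are hypothetical and are not part of any Mesos API; clamping at zero and restricting the factor to [0, 1] are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """One resource dimension on a slave (hypothetical helper type)."""
    reserved: float  # amount promised to non-revocable tasks
    actual: float    # amount those tasks are currently measured to use


def revocable_slack(usage: Usage, factor: float) -> float:
    """Return how much of this resource could be offered as revocable.

    Computes factor * (reserved - actual). The result is clamped at zero
    (an assumption) so that a task running above its reservation never
    yields a negative offer.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    return max(0.0, factor * (usage.reserved - usage.actual))
```

For instance, with 4 cores reserved, 1 core in use and a factor of 0.5, this would offer 1.5 cores as revocable. A lower factor trades utilization for less revocable task churn, which is the tolerance knob the description mentions.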
[jira] [Comment Edited] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information
[ https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494499#comment-14494499 ] haosdent edited comment on MESOS-1865 at 4/14/15 6:08 PM: -- Hi, [~stevenschlansker] Sorry to could not got your idea. Do you mean the non-leading master should return error status code instead of return { "tasks": []} was (Author: haosd...@gmail.com): Hi, [~stevenschlansker] Sorry to could not got your idea. Do you mean the non-leading master should return error status code instead of return "{ "tasks": []}"? > Mesos APIs for non-leading masters should return copies of the leader's state > or an error, not a success with incorrect information > --- > > Key: MESOS-1865 > URL: https://issues.apache.org/jira/browse/MESOS-1865 > Project: Mesos > Issue Type: Bug > Components: json api >Affects Versions: 0.20.1 >Reporter: Steven Schlansker > > Some of the API endpoints, for example /master/tasks.json, will return bogus > information if you query a non-leading master: > {code} > [steven@Anesthetize:~]% curl > http://master1.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master2.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master3.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [ > { > "executor_id": "", > "framework_id": "20140724-231003-419644938-5050-1707-", > "id": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "name": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "resources": { > "cpus": 0.25, > "disk": 0, > {code} > This is very hard for end-users to work around. 
For example if I query > "which master is leading" followed by "leader: which tasks are running" it is > possible that the leader fails over in between, leaving me with an incorrect > answer and no way to know that this happened. > In my opinion the API should return the correct response (by asking the > current leader?) or an error (500 Not the leader?) but it's unacceptable to > return a successful wrong answer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information
[ https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494506#comment-14494506 ] Steven Schlansker commented on MESOS-1865: -- Yes. Or it should return the correct results. Really, it should do just about anything rather than returning a valid but incorrect result. > Mesos APIs for non-leading masters should return copies of the leader's state > or an error, not a success with incorrect information > --- > > Key: MESOS-1865 > URL: https://issues.apache.org/jira/browse/MESOS-1865 > Project: Mesos > Issue Type: Bug > Components: json api >Affects Versions: 0.20.1 >Reporter: Steven Schlansker > > Some of the API endpoints, for example /master/tasks.json, will return bogus > information if you query a non-leading master: > {code} > [steven@Anesthetize:~]% curl > http://master1.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master2.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master3.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [ > { > "executor_id": "", > "framework_id": "20140724-231003-419644938-5050-1707-", > "id": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "name": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "resources": { > "cpus": 0.25, > "disk": 0, > {code} > This is very hard for end-users to work around. For example if I query > "which master is leading" followed by "leader: which tasks are running" it is > possible that the leader fails over in between, leaving me with an incorrect > answer and no way to know that this happened. > In my opinion the API should return the correct response (by asking the > current leader?) or an error (500 Not the leader?) but it's unacceptable to > return a successful wrong answer. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
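To make the behavior requested in this thread concrete, here is a minimal sketch of an endpoint that refuses to answer when it is not the leader instead of returning an empty but successful `{"tasks": []}`. This is not Mesos code. The function `tasks_response`, the `leader_hint` parameter and the error body are hypothetical, and the 500 status is taken only from the reporter's tentative "500 Not the leader?" suggestion.

```python
import json
from typing import List, Optional, Tuple


def tasks_response(is_leader: bool,
                   tasks: List[dict],
                   leader_hint: Optional[str] = None) -> Tuple[int, str]:
    """Return (status_code, body) for a tasks query (illustrative only).

    A leading master answers with its task list. A non-leading master
    returns an error, so a client can never mistake stale or empty
    state for the real answer.
    """
    if is_leader:
        return 200, json.dumps({"tasks": tasks})

    body = {"error": "not the leader"}
    if leader_hint is not None:
        # Hypothetical extra field: tell the client where to retry.
        body["leader"] = leader_hint
    return 500, json.dumps(body)
```

With this shape the race described in the issue becomes detectable: if the leader fails over between the "which master is leading" query and the tasks query, the client receives an error rather than a valid-looking wrong answer.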
[jira] [Commented] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information
[ https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494499#comment-14494499 ] haosdent commented on MESOS-1865: - Hi, [~stevenschlansker] Sorry to could not got your idea. Do you mean the non-leading master should return error status code instead of return "{ "tasks": []}"? > Mesos APIs for non-leading masters should return copies of the leader's state > or an error, not a success with incorrect information > --- > > Key: MESOS-1865 > URL: https://issues.apache.org/jira/browse/MESOS-1865 > Project: Mesos > Issue Type: Bug > Components: json api >Affects Versions: 0.20.1 >Reporter: Steven Schlansker > > Some of the API endpoints, for example /master/tasks.json, will return bogus > information if you query a non-leading master: > {code} > [steven@Anesthetize:~]% curl > http://master1.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master2.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [] > } > [steven@Anesthetize:~]% curl > http://master3.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n > 10 > { > "tasks": [ > { > "executor_id": "", > "framework_id": "20140724-231003-419644938-5050-1707-", > "id": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "name": > "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db", > "resources": { > "cpus": 0.25, > "disk": 0, > {code} > This is very hard for end-users to work around. For example if I query > "which master is leading" followed by "leader: which tasks are running" it is > possible that the leader fails over in between, leaving me with an incorrect > answer and no way to know that this happened. > In my opinion the API should return the correct response (by asking the > current leader?) or an error (500 Not the leader?) 
but it's unacceptable to > return a successful wrong answer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1939) Enable multiple authentication methods in parallel
[ https://issues.apache.org/jira/browse/MESOS-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494494#comment-14494494 ] haosdent commented on MESOS-1939: - Hi, [~adam-mesos]. I could only find LocalAuthorizer (ACL) so far, so is it necessary to implement this? My idea for implementing it is to add a map containing the different authorizers to master.cpp. Is it OK to add a new field to master.cpp? > Enable multiple authentication methods in parallel > -- > > Key: MESOS-1939 > URL: https://issues.apache.org/jira/browse/MESOS-1939 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Till Toenshoff >Priority: Minor > Labels: authentication > > The master (authenticator) should allow for multiple authentication > mechanisms to be used at the same time. That way, a slave could be > authenticated by mechanism FOO while the frameworks are authenticated by BAR. > The authenticatee should be allowed to select the desired mechanism (module). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-354) oversubscribe resources
[ https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494492#comment-14494492 ] Niklas Quarfot Nielsen commented on MESOS-354: -- Folks, here is the architecture document we have been working on for introducing oversubscription in Mesos: https://docs.google.com/document/d/1pUnElxHy1uWfHY_FOvvRC73QaOGgdXE0OXN-gbxdXA0/edit# It is still work in progress, so feel free to add suggestions and raise concerns. > oversubscribe resources > --- > > Key: MESOS-354 > URL: https://issues.apache.org/jira/browse/MESOS-354 > Project: Mesos > Issue Type: Story > Components: isolation, master, slave >Reporter: brian wickman >Priority: Minor > Attachments: mesos_virtual_offers.pdf > > > This proposal is predicated upon offer revocation. > The idea would be to add a new "revoked" status either by (1) piggybacking > off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a > new status update TASK_REVOKED. > In order to augment an offer with metadata about revocability, there are > options: > 1) Add a revocable boolean to the Offer and > a) offer only one type of Offer per slave at a particular time > b) offer both revocable and non-revocable resources at the same time but > require frameworks to understand that Offers can contain overlapping resources > 2) Add a revocable_resources field on the Offer which is a superset of the > regular resources field. By consuming > resources <= revocable_resources in > a launchTask, the Task becomes a revocable task. If launching a task with < > resources, the Task is non-revocable. > The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) > and non-revocable tasks are online higher-SLA tasks (e.g. services.) > Consider a non-revocable that asks for 4 cores, 8 GB RAM and 20 GB of disk. 
> One of these resources is a rate (4 cpu seconds per second) and two of them > are fixed values (8GB and 20GB respectively, though disk resources can be > further broken down into spindles - fixed - and iops - a rate.) In practice, > these are the maximum resources in the respective dimensions that this task > will use. In reality, we provision tasks at some factor below peak, and only > hit peak resource consumption in rare circumstances or perhaps at a diurnal > peak. > In the meantime, we stand to gain from offering the some constant factor of > the difference between (reserved - actual) of non-revocable tasks as > revocable resources, depending upon our tolerance for revocable task churn. > The main challenge is coming up with an accurate short / medium / long-term > prediction of resource consumption based upon current behavior. > In many cases it would be OK to be sloppy: > * CPU / iops / network IO are rates (compressible) and can often be OK > below guarantees for brief periods of time while task revocation takes place > * Memory slack can be provided by enabling swap and dynamically setting > swap paging boundaries. Should swap ever be activated, that would be a > signal to revoke. > The master / allocator would piggyback on the slave heartbeat mechanism to > learn of the amount of revocable resources available at any point in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos
[ https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494469#comment-14494469 ] Vinod Kone commented on MESOS-2203: --- [~idownes] Looks like this code path was introduced for pid namespaces support? Is it possible to have a configure check for this? > Old Centos 6.5 kernels/headers not sufficient for building Mesos > > > Key: MESOS-2203 > URL: https://issues.apache.org/jira/browse/MESOS-2203 > Project: Mesos > Issue Type: Documentation >Affects Versions: 0.21.0 > Environment: E.g. Centos 6.5 with kernel 2.6.32-279.14.1 >Reporter: Hans van den Bogert >Priority: Minor > > Old kernels are not sufficient for building Mesos: > bq. > Error: > bq. libtool: compile: g++ -DPACKAGE_NAME=\"mesos\" > -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" > "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\" > -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 > -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 > -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 > -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 > -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 > -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 > -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 > -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -I. 
-I../../src -Wall -Werror > -DLIBDIR=\"/var/scratch/vdbogert/lib\" > -DPKGLIBEXECDIR=\"/var/scratch/vdbogert/libexec/mesos\" > -DPKGDATADIR=\"/var/scratch/vdbogert/share/mesos\" -I../../include > -I../../3rdparty/libprocess/include > -I../../3rdparty/libprocess/3rdparty/stout/include -I../include > -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 > -I../3rdparty/libprocess/3rdparty/picojson-4f93734 > -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src > -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src > -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src > -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include > -I../3rdparty/zookeeper-3.4.5/src/c/generated > -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src > -I/var/scratch/vdbogert//include/subversion-1 -I/usr/include/apr-1 > -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 > -MT slave/containerizer/isolators/namespaces/libmesos_no_3rdparty_la-pid.lo > -MD -MP -MF > slave/containerizer/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo > -c ../../src/slave/containerizer/isolators/namespaces/pid.cpp -fPIC -DPIC > -o > slave/containerizer/isolators/namespaces/.libs/libmesos_no_3rdparty_la-pid.o > In file included from /usr/include/sys/syscall.h:32:0, > from ../../src/linux/ns.hpp:26, > from > ../../src/slave/containerizer/isolators/namespaces/pid.cpp:31: > ../../src/linux/ns.hpp: In function 'Try ns::setns(const string&, > const string&)': > ../../src/linux/ns.hpp:167:23: error: '__NR_setns' was not declared in this > scope >int ret = ::syscall(SYS_setns, fd.get(), nstype.get()); >^ > Perhaps this should be stated on: > http://mesos.apache.org/gettingstarted/ because taking myself as example, > this has cost me a lot of time to pinpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
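On the question of a configure check raised in the comment above: the build fails because `__NR_setns` is not declared by the installed kernel headers. The sketch below only illustrates the idea of detecting that up front by scanning header text for the macro. It is an assumption-laden stand-in, not what a Mesos configure script does; a real check would more likely try to compile a small test program. The function name `header_defines_setns` is hypothetical, and which header file to read is left to the caller.

```python
import re

# Matches a preprocessor line such as "#define __NR_setns 308".
# The trailing \b prevents matching longer names like __NR_setns_extra.
_SETNS_DEFINE = re.compile(r"^\s*#\s*define\s+__NR_setns\b", re.MULTILINE)


def header_defines_setns(header_text: str) -> bool:
    """Return True if the given header text defines __NR_setns."""
    return _SETNS_DEFINE.search(header_text) is not None
```

A build script could read the platform's syscall-number header, pass its contents to this function, and stop with a clear message about outdated kernel headers instead of failing deep inside the compile of `pid.cpp`.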
[jira] [Commented] (MESOS-2605) The slave sometimes does not send active executors during reregistration
[ https://issues.apache.org/jira/browse/MESOS-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494339#comment-14494339 ] Elizabeth Lingg commented on MESOS-2605: As an additional comment, this behavior is observed on our Core OS Cluster. On this cluster, we have restarts of Mesos slaves as well as reboots of the machine. It seems to happen upon restarts of the Mesos slaves sometimes, but not all the time. > The slave sometimes does not send active executors during reregistration > > > Key: MESOS-2605 > URL: https://issues.apache.org/jira/browse/MESOS-2605 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Elizabeth Lingg >Assignee: Michael Park > Labels: mesosphere > > The slave sometimes does not send active executors during reregistration. > Framework checkpointing is enabled, and the executor successfully > reregisters. However, the tasks in that executor are LOST (by abnormal > executor termination) because the executor is removed by the mesos master as > unknown. See the example below, > task.journalnode.journalnode.NodeExecutor.1428609184051. 
> See the Slave Logs here for the Task: > {code} > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.778790 25126 status_update_manager.cpp:317] Received status update > TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task > task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.779013 25126 status_update_manager.hpp:346] Checkpointing UPDATE for > status update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for > task task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.781788 25123 slave.cpp:2753] Forwarding the update TASK_RUNNING > (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task > task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 to master@10.142.250.253:5050 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.781889 25123 slave.cpp:2686] Sending acknowledgement for status > update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task > task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 to executor(1)@10.168.119.78:47638 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.784503 25124 status_update_manager.cpp:389] Received status update > acknowledgement (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task > task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 > Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 > 19:53:06.784567 25124 status_update_manager.hpp:346] Checkpointing ACK for > status update TASK_RUNNING (UUID: 
4eb22075-c319-463d-8f70-94db9caa69c6) for > task task.journalnode.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 > {code} > Master Logs: > {code} > Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: W0409 > 20:19:43.008666 1067 master.cpp:4015] Executor > executor.journalnode.NodeExecutor.1428609184051 of framework > 20150408-002100-4261056010-5050-1047-0008 possibly unknown to the slave > 20150407-233647-2059219722-5050-1659-S5 at slave(1)@10.168.119.78:5051 > (ec2-54-237-57-237.compute-1.amazonaws.com) > Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 > 20:19:43.008652 1074 hierarchical.hpp:648] Recovered cpus(*):0.1; > mem(*):1536 (total allocatable: cpus(*):3.5; mem(*):21113; disk(*):142210; > ports(*):[3889-5044, 5046-5049, 2182-2958, 2960-3887, 1025-2180, 8082-9041, > 9043-9159, 9161-, 5052-6999, 7002-7198, 7200-8079, 10001-65535]) on slave > 20150407-233647-2059219722-5050-1659-S5 from framework > 20150408-002100-4261056010-5050-1047-0008 > Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 > 20:19:43.008712 1067 master.cpp:4714] Removing executor > 'executor.journalnode.NodeExecutor.1428609184051' with resources cpus(*):0.1; > mem(*):1536 of framework 20150408-002100-4261056010-5050-1047-0008 on slave > 20150407-233647-2059219722-5050-1659-S5 at slave(1)@10.168.119.78:5051 > (ec2-54-237-57-237.compute-1.amazonaws.com) > Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 > 20:19:43.010372 1067 master.cpp:3295] Status update TASK_LOST (UUID
[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos
[ https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494337#comment-14494337 ] haosdent commented on MESOS-2203: - I use CentOS 6.5 My kernel version: {quote} $ uname -a Linux xxx 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux $ cat /etc/issue CentOS release 6.5 (Final) Kernel \r on an \m {quote} My gcc version: {quote} $ g++ --version g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11) Copyright (C) 2010 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. {quote} And I could build success and pass all unit test cases in CentOS 6.5 > Old Centos 6.5 kernels/headers not sufficient for building Mesos > > > Key: MESOS-2203 > URL: https://issues.apache.org/jira/browse/MESOS-2203 > Project: Mesos > Issue Type: Documentation >Affects Versions: 0.21.0 > Environment: E.g. Centos 6.5 with kernel 2.6.32-279.14.1 >Reporter: Hans van den Bogert >Priority: Minor > > Old kernels are not sufficient for building Mesos: > bq. > Error: > bq. libtool: compile: g++ -DPACKAGE_NAME=\"mesos\" > -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" > "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\" > -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 > -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 > -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 > -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 > -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 > -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 > -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 > -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -I. 
-I../../src -Wall -Werror > -DLIBDIR=\"/var/scratch/vdbogert/lib\" > -DPKGLIBEXECDIR=\"/var/scratch/vdbogert/libexec/mesos\" > -DPKGDATADIR=\"/var/scratch/vdbogert/share/mesos\" -I../../include > -I../../3rdparty/libprocess/include > -I../../3rdparty/libprocess/3rdparty/stout/include -I../include > -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 > -I../3rdparty/libprocess/3rdparty/picojson-4f93734 > -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src > -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src > -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src > -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include > -I../3rdparty/zookeeper-3.4.5/src/c/generated > -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src > -I/var/scratch/vdbogert//include/subversion-1 -I/usr/include/apr-1 > -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 > -MT slave/containerizer/isolators/namespaces/libmesos_no_3rdparty_la-pid.lo > -MD -MP -MF > slave/containerizer/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo > -c ../../src/slave/containerizer/isolators/namespaces/pid.cpp -fPIC -DPIC > -o > slave/containerizer/isolators/namespaces/.libs/libmesos_no_3rdparty_la-pid.o > In file included from /usr/include/sys/syscall.h:32:0, > from ../../src/linux/ns.hpp:26, > from > ../../src/slave/containerizer/isolators/namespaces/pid.cpp:31: > ../../src/linux/ns.hpp: In function 'Try ns::setns(const string&, > const string&)': > ../../src/linux/ns.hpp:167:23: error: '__NR_setns' was not declared in this > scope >int ret = ::syscall(SYS_setns, fd.get(), nstype.get()); >^ > Perhaps this should be stated on: > http://mesos.apache.org/gettingstarted/ because taking myself as example, > this has cost me a lot of time to pinpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos
[ https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494307#comment-14494307 ] Mike Ringenburg commented on MESOS-2203: I'd second the suggestion to get this added to the getting started page - I just ran into it while building Mesos. Luckily a quick search brought me here, but it'd be nice to have it stated up front. > Old Centos 6.5 kernels/headers not sufficient for building Mesos > > > Key: MESOS-2203 > URL: https://issues.apache.org/jira/browse/MESOS-2203 > Project: Mesos > Issue Type: Documentation >Affects Versions: 0.21.0 > Environment: E.g. Centos 6.5 with kernel 2.6.32-279.14.1 >Reporter: Hans van den Bogert >Priority: Minor > > Old kernels are not sufficient for building Mesos: > bq. > Error: > bq. libtool: compile: g++ -DPACKAGE_NAME=\"mesos\" > -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" > "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\" > -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 > -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 > -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 > -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 > -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 > -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 > -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 > -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -I. 
-I../../src -Wall -Werror > -DLIBDIR=\"/var/scratch/vdbogert/lib\" > -DPKGLIBEXECDIR=\"/var/scratch/vdbogert/libexec/mesos\" > -DPKGDATADIR=\"/var/scratch/vdbogert/share/mesos\" -I../../include > -I../../3rdparty/libprocess/include > -I../../3rdparty/libprocess/3rdparty/stout/include -I../include > -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 > -I../3rdparty/libprocess/3rdparty/picojson-4f93734 > -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src > -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src > -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src > -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include > -I../3rdparty/zookeeper-3.4.5/src/c/generated > -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src > -I/var/scratch/vdbogert//include/subversion-1 -I/usr/include/apr-1 > -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 > -MT slave/containerizer/isolators/namespaces/libmesos_no_3rdparty_la-pid.lo > -MD -MP -MF > slave/containerizer/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo > -c ../../src/slave/containerizer/isolators/namespaces/pid.cpp -fPIC -DPIC > -o > slave/containerizer/isolators/namespaces/.libs/libmesos_no_3rdparty_la-pid.o > In file included from /usr/include/sys/syscall.h:32:0, > from ../../src/linux/ns.hpp:26, > from > ../../src/slave/containerizer/isolators/namespaces/pid.cpp:31: > ../../src/linux/ns.hpp: In function 'Try ns::setns(const string&, > const string&)': > ../../src/linux/ns.hpp:167:23: error: '__NR_setns' was not declared in this > scope >int ret = ::syscall(SYS_setns, fd.get(), nstype.get()); >^ > Perhaps this should be stated on: > http://mesos.apache.org/gettingstarted/ because taking myself as example, > this has cost me a lot of time to pinpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
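The error quoted above shows `SYS_setns` expanding to `__NR_setns`, which the installed kernel headers do not declare. A build could probe for this up front instead of failing deep inside the compile. The following is only a minimal sketch, assuming a Linux host with `<sys/syscall.h>`; `setnsHeaderSupport` is a hypothetical helper and not part of the Mesos tree.

```cpp
#include <cassert>
#include <string>

#if defined(__linux__)
#include <sys/syscall.h>  // Pulls in the kernel's __NR_* syscall numbers.
#endif

// Hypothetical helper (not part of Mesos): reports whether the kernel
// headers used for this compile define __NR_setns. The check is on
// __NR_setns rather than SYS_setns because, in the error quoted above,
// SYS_setns was defined but expanded to an undeclared __NR_setns.
inline std::string setnsHeaderSupport()
{
#if defined(__NR_setns)
  return "available";
#else
  return "missing";
#endif
}
```

A configure-time check could compile a probe like this and print a readable message when the result is "missing", rather than leaving the user to decode the compiler output.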
[jira] [Created] (MESOS-2617) The docker containerizer does not support configuring CFS quotas.
Steve Niemitz created MESOS-2617: Summary: The docker containerizer does not support configuring CFS quotas. Key: MESOS-2617 URL: https://issues.apache.org/jira/browse/MESOS-2617 Project: Mesos Issue Type: Bug Components: docker Affects Versions: 0.22.0 Reporter: Steve Niemitz -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2616) Update C++ style guide on variable naming.
[ https://issues.apache.org/jira/browse/MESOS-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2616: -- Description: Our variable naming guide currently is not really explaining use cases for leading or trailing underscores as found a lot within our codebase. We should correct that. The following was copied from the review description for allowing discussions where needed: Documents the patterns we use to name variables and function arguments in our codebase. h4.Leading underscores to avoid ambiguity. We use this pattern extensively in libprocess, stout and mesos, a few examples below. * stout/try.hpp:105 {noformat} Try(State _state, T* _t = NULL, const std::string& _message = "") : state(_state), t(_t), message(_message) {} {noformat} * process/http.hpp:480 {noformat} URL(const std::string& _scheme, const std::string& _domain, const uint16_t _port = 80, const std::string& _path = "/", const hashmap& _query = (hashmap()), const Option& _fragment = None()) : scheme(_scheme), domain(_domain), port(_port), path(_path), query(_query), fragment(_fragment) {} {noformat} * slave/containerizer/linux_launcher.cpp:56 {noformat} LinuxLauncher::LinuxLauncher( const Flags& _flags, int _namespaces, const string& _hierarchy) : flags(_flags), namespaces(_namespaces), hierarchy(_hierarchy) {} {noformat} h4.Trailing underscores as prime symbols. We use this pattern in the code, though not extensively. We would like to see more pass-by-value instead of creating copies from a variable passed by const reference. * master.cpp:2942 {noformat} // Create and add the slave id. 
SlaveInfo slaveInfo_ = slaveInfo; slaveInfo_.mutable_id()->CopyFrom(newSlaveId()); {noformat} * slave.cpp:4180 {noformat} ExecutorInfo executorInfo_ = executor->info; Resources resources = executorInfo_.resources(); resources += taskInfo.resources(); executorInfo_.mutable_resources()->CopyFrom(resources); {noformat} * status_update_manager.cpp:474 {noformat} // Bounded exponential backoff. Duration duration_ = std::min(duration * 2, STATUS_UPDATE_RETRY_INTERVAL_MAX); {noformat} * containerizer/mesos/containerizer.cpp:109 {noformat} // Modify the flags to include any changes to isolation. Flags flags_ = flags; flags_.isolation = isolation; {noformat} h4.Passing arguments by value. * slave.cpp:2480 {noformat} void Slave::statusUpdate(StatusUpdate update, const UPID& pid) { ... // Set the source before forwarding the status update. update.mutable_status()->set_source( pid == UPID() ? TaskStatus::SOURCE_SLAVE : TaskStatus::SOURCE_EXECUTOR); ... } {noformat} * process/metrics/timer.hpp:103 {noformat} static void _time(Time start, Timer that) { const Time stop = Clock::now(); double value; process::internal::acquire(&that.data->lock); { that.data->lastValue = T(stop - start).value(); value = that.data->lastValue.get(); } process::internal::release(&that.data->lock); that.push(value); } {noformat} was: Our variable naming guide currently is not really explaining use cases for leading or trailing underscores as found a lot within our codebase. We should correct that. > Update C++ style guide on variable naming. > --- > > Key: MESOS-2616 > URL: https://issues.apache.org/jira/browse/MESOS-2616 > Project: Mesos > Issue Type: Documentation >Reporter: Till Toenshoff >Assignee: Alexander Rukletsov >Priority: Minor > > Our variable naming guide currently is not really explaining use cases for > leading or trailing underscores as found a lot within our codebase. > We should correct that. 
> The following was copied from the review description for allowing discussions > where needed: > Documents the patterns we use to name variables and function arguments in our > codebase. > h4.Leading underscores to avoid ambiguity. > We use this pattern extensively in libprocess, stout and mesos, a few > examples below. > * stout/try.hpp:105 > {noformat} > Try(State _state, T* _t = NULL, const std::string& _message = "") > : state(_state), t(_t), message(_message) {} > {noformat} > * process/http.hpp:480 > {noformat} > URL(const std::string& _scheme, > const std::string& _domain, > const uint16_t _port = 80, > const std::string& _path = "/", > const hashmap& _query = > (hashmap()), > const Option& _fragment = None()) > : scheme(_scheme), > domain(_domain), > port(_port), > path(_path), > query(_query), > fragment(_fragment) {} > {noformat} > * slave/containerizer/linux_launcher.cpp:56 > {noformat} > LinuxLauncher::LinuxLauncher( > const Flags& _flags, > int _namespaces, > const string&
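The three naming patterns described in this issue (leading underscores for constructor arguments, a trailing underscore as a "prime" copy, and passing by value) can be put side by side in one small sketch. `Endpoint`, `withPort` and `withHost` are hypothetical names made up for illustration; none of them come from the Mesos codebase.

```cpp
#include <cassert>
#include <string>

// Hypothetical type for illustration only; not taken from the Mesos tree.
class Endpoint
{
public:
  // Leading underscores keep constructor arguments distinct from the
  // members they initialize.
  Endpoint(const std::string& _host, int _port)
    : host(_host), port(_port) {}

  std::string host;
  int port;
};

// Trailing underscore as a "prime": endpoint_ is a modified copy of
// endpoint, which is taken by const reference and left untouched.
inline Endpoint withPort(const Endpoint& endpoint, int port)
{
  Endpoint endpoint_ = endpoint;
  endpoint_.port = port;
  return endpoint_;
}

// Pass-by-value alternative: the argument is already a copy, so it can be
// modified in place and no primed variable is needed.
inline Endpoint withHost(Endpoint endpoint, const std::string& host)
{
  endpoint.host = host;
  return endpoint;
}
```

Both helpers leave the caller's object unchanged; the pass-by-value form simply moves the copy into the function signature, which is the direction the description says it would like to see more of.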
[jira] [Created] (MESOS-2616) Update C++ style guide on variable naming.
Till Toenshoff created MESOS-2616: - Summary: Update C++ style guide on variable naming. Key: MESOS-2616 URL: https://issues.apache.org/jira/browse/MESOS-2616 Project: Mesos Issue Type: Documentation Reporter: Till Toenshoff Assignee: Alexander Rukletsov Priority: Minor Our variable naming guide currently is not really explaining use cases for leading or trailing underscores as found a lot within our codebase. We should correct that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2191) Add ContainerId to the TaskStatus message
[ https://issues.apache.org/jira/browse/MESOS-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493819#comment-14493819 ] Timothy Chen commented on MESOS-2191: - Hi Marcel, Thanks for explaining the rationale for getting ContainerId. There are quite a number of requests asking for the container name, and others asking for various Docker container related information to be sent back to the scheduler for advanced use cases such as the one you described. What I'm currently considering is to send the whole docker inspect response back in TaskStatus, so when your scheduler sees a TaskStatus with TASK_RUNNING, the optional data field will contain the docker inspect JSON output string serialized into a byte array. Then you can get any information such as name, network settings, and volumes, all in one place. Let me know what you think and if you foresee problems with this. > Add ContainerId to the TaskStatus message > - > > Key: MESOS-2191 > URL: https://issues.apache.org/jira/browse/MESOS-2191 > Project: Mesos > Issue Type: Wish > Components: containerization >Reporter: Marcel Neuhausler >Assignee: Alexander Rojas > Labels: mesosphere > > {{TaskStatus}} provides the frameworks with certain information > ({{executorId}}, {{slaveId}}, etc.) which is useful when collecting > statistics about cluster performance; however, it is difficult to associate > a task with the container in which it is executed, since this information always stays > within Mesos itself. Therefore it would be good to provide the framework > scheduler with this information, adding a new field in the {{TaskStatus}} > message. > See comments for a use case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
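To make the proposal in this comment concrete, here is a rough sketch of how a scheduler might read a container name out of the `data` field. It rests on several assumptions: that the proposal is adopted as described, that the payload is the docker inspect JSON, and that the JSON has a string-valued "Name" key. `StatusSketch`, `firstStringField` and `containerName` are hypothetical stand-ins, not Mesos APIs, and the string scan is deliberately naive.

```cpp
#include <cassert>
#include <string>

// Stand-in for the relevant parts of TaskStatus (hypothetical; the real
// message is generated from the Mesos protobuf definitions).
struct StatusSketch
{
  bool running;      // True when the state is TASK_RUNNING.
  std::string data;  // Optional bytes field; assumed to hold inspect JSON.
};

// Naive lookup of the first `"key": "value"` pair in a JSON string. It does
// not handle escapes or nesting; a real scheduler should use a JSON parser.
inline std::string firstStringField(
    const std::string& json,
    const std::string& key)
{
  const std::string needle = "\"" + key + "\"";
  std::string::size_type pos = json.find(needle);
  if (pos == std::string::npos) {
    return "";
  }
  pos = json.find(':', pos + needle.size());
  if (pos == std::string::npos) {
    return "";
  }
  const std::string::size_type open = json.find('"', pos + 1);
  if (open == std::string::npos) {
    return "";
  }
  const std::string::size_type close = json.find('"', open + 1);
  if (close == std::string::npos) {
    return "";
  }
  return json.substr(open + 1, close - open - 1);
}

// Returns the container name for a running task, or "" otherwise.
inline std::string containerName(const StatusSketch& status)
{
  return status.running ? firstStringField(status.data, "Name") : "";
}
```

In a real scheduler the same lookup would sit in the status update callback and use a proper JSON library, since a substring scan can match a nested key of the same name.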
[jira] [Commented] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?
[ https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493786#comment-14493786 ] Matthias Veit commented on MESOS-2598: -- Sure. > Slave state.json frameworks.executors.queued_tasks wrong format? > > > Key: MESOS-2598 > URL: https://issues.apache.org/jira/browse/MESOS-2598 > Project: Mesos > Issue Type: Bug > Components: statistics >Affects Versions: 0.22.0 > Environment: Linux version 3.10.0-229.1.2.el7.x86_64 > (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat > 4.8.2-16) (GCC) ) #1 SMP Fri Mar 27 03:04:26 UTC 2015 >Reporter: Matthias Veit >Priority: Minor > Labels: newbie > > queued_tasks.executor_id is expected to be a string and not a complete json > object. It should have the very same format as the tasks array on the same > level. > Example, directly taken from slave > {noformat} > > "queued_tasks": [ > { > "data": "", > "executor_id": { > "command": { > "argv": [], > "uris": [ > { > "executable": false, > "value": > "http://downloads.foo.io/orchestra/storm-mesos/0.9.2-incubating-47-ovh.bb373df1c/storm-mesos-0.9.2-incubating.tgz" > } > ], > "value": "cd storm-mesos* && python bin/storm supervisor > storm.mesos.MesosSupervisor" > }, > "data": > "{\"assignmentid\":\"srv4.hw.ca1.foo.com\",\"supervisorid\":\"srv4.hw.ca1.foo.com-stage-ingestion-stats-slave-111-1428421145\"}", > "executor_id": "stage-ingestion-stats-slave-111-1428421145", > "framework_id": "20150401-160104-251662508-5050-2197-0002", > "name": "", > "resources": { > "cpus": 0.5, > "disk": 0, > "mem": 1000 > } > }, > "id": "srv4.hw.ca1.foo.com-31708", > "name": "worker srv4.hw.ca1.foo.com:31708", > "resources": { > "cpus": 1, > "disk": 0, > "mem": 5120, > "ports": "[31708-31708]" > }, > "slave_id": "20150327-025553-218108076-5050-4122-S0" > }, > ... > ] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
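Until the format is settled, a consumer of state.json may want to detect which shape `executor_id` has before reading it. The following is a minimal sketch of such a check, assuming the consumer only needs to tell a string from an object; `ValueKind` and `kindOf` are hypothetical names, and the scan ignores escapes and nesting, so a real client would use a JSON parser instead.

```cpp
#include <cassert>
#include <string>

enum class ValueKind { String, Object, Other, Missing };

// Classifies the JSON value that follows the first occurrence of `key`.
// Naive on purpose: no escape handling and no awareness of nesting.
inline ValueKind kindOf(const std::string& json, const std::string& key)
{
  const std::string needle = "\"" + key + "\"";
  std::string::size_type pos = json.find(needle);
  if (pos == std::string::npos) {
    return ValueKind::Missing;
  }
  pos = json.find(':', pos + needle.size());
  if (pos == std::string::npos) {
    return ValueKind::Missing;
  }
  // Skip whitespace between the colon and the value.
  pos = json.find_first_not_of(" \t\r\n", pos + 1);
  if (pos == std::string::npos) {
    return ValueKind::Missing;
  }
  if (json[pos] == '"') {
    return ValueKind::String;
  }
  return json[pos] == '{' ? ValueKind::Object : ValueKind::Other;
}
```

For the queued_tasks entry quoted in the issue, the first `executor_id` would classify as `ValueKind::Object`, whereas the reporter expects `ValueKind::String`, as in the tasks array.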