[jira] [Commented] (MESOS-1634) Calling stop on SchedulerDriver leaves Zookeeper connection left behind

2015-04-14 Thread Robert Lacroix (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495680#comment-14495680
 ] 

Robert Lacroix commented on MESOS-1634:
---

https://reviews.apache.org/r/33208/

> Calling stop on SchedulerDriver leaves Zookeeper connection left behind
> ---
>
> Key: MESOS-1634
> URL: https://issues.apache.org/jira/browse/MESOS-1634
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Affects Versions: 0.18.0
>Reporter: Robert Lacroix
>Assignee: Robert Lacroix
>
> When calling stop on SchedulerDriver, the Zookeeper connection of 
> ZooKeeperMasterDetector is not closed. This leaks connections to Zookeeper. 
> We should properly close them when stop is called.
> {code}
> $ netstat -an | grep 2181 | grep tcp4 | grep ESTABLISHED
> 2
> 2014-07-23 17:46:53,840:26108(0x1246a8000):ZOO_INFO@check_events@1750: 
> session establishment complete on server [127.0.0.1:2181], 
> sessionId=0x14765c8e1a4000a, negotiated timeout=1
> $ netstat -an | grep 2181 | grep tcp4 | grep ESTABLISHED
> 3
> I0723 17:48:57.354792 662249472 sched.cpp:730] Stopping framework 
> '20140723-174036-16777343-5050-26021-'
> $ netstat -an | grep 2181 | grep tcp4 | grep ESTABLISHED
> 3
> {code}
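
One way to picture the requested fix is to tie the ZooKeeper session's lifetime to the detector, and to release the detector when stop is called. The sketch below is only an illustration under that assumption: `FakeZkSession`, `FakeMasterDetector` and `FakeSchedulerDriver` are hypothetical stand-ins, not the Mesos classes, and a static counter replaces the real connection so the leak is observable without a ZooKeeper server.

```cpp
#include <memory>

// Hypothetical stand-in for a ZooKeeper session. It only counts live
// sessions, so "leaked connection" becomes a number we can check.
class FakeZkSession {
public:
  FakeZkSession() { ++live; }
  ~FakeZkSession() { --live; }
  FakeZkSession(const FakeZkSession&) = delete;
  FakeZkSession& operator=(const FakeZkSession&) = delete;

  static int liveSessions() { return live; }

private:
  static int live;
};

int FakeZkSession::live = 0;

// Hypothetical detector that owns exactly one session for its lifetime.
class FakeMasterDetector {
private:
  FakeZkSession session;
};

// Hypothetical driver: stop() releases the detector, which in turn
// destroys the session instead of leaving it open.
class FakeSchedulerDriver {
public:
  void start() {
    if (!detector) {
      detector.reset(new FakeMasterDetector());
    }
  }

  void stop() { detector.reset(); }

private:
  std::unique_ptr<FakeMasterDetector> detector;
};
```

With this ownership, the count of live sessions returns to its previous value after `stop()`, which is the behaviour the netstat output above shows to be missing.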



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1634) Calling stop on SchedulerDriver leaves Zookeeper connection left behind

2015-04-14 Thread Robert Lacroix (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Lacroix reassigned MESOS-1634:
-

Assignee: Robert Lacroix



[jira] [Commented] (MESOS-2023) mesos-execute should allow setting environment variables

2015-04-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495559#comment-14495559
 ] 

haosdent commented on MESOS-2023:
-

[~adam-mesos] Could you help me review this? Thank you very much.

> mesos-execute should allow setting environment variables
> 
>
> Key: MESOS-2023
> URL: https://issues.apache.org/jira/browse/MESOS-2023
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 0.20.1
>Reporter: Steven Schlansker
>Assignee: haosdent
>  Labels: newbie
>
> mesos-execute does not allow setting various properties of the 'CommandInfo' 
> protobuf.  Most notably, being able to set environment variables and URIs 
> would be very useful.





[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos

2015-04-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495550#comment-14495550
 ] 

haosdent commented on MESOS-2203:
-

lxc handles this in https://github.com/lxc/lxc/blob/master/src/lxc/utils.h#L53-L66 

It checks for __NR_setns like this:
{code}
/* Define setns() if missing from the C library */
#ifndef HAVE_SETNS
static inline int setns(int fd, int nstype)
{
#ifdef __NR_setns
    return syscall(__NR_setns, fd, nstype);
#elif defined(__NR_set_ns)
    return syscall(__NR_set_ns, fd, nstype);
#else
    errno = ENOSYS;
    return -1;
#endif
}
#endif
{code}
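
The build error in the issue description shows `ns.hpp` calling `::syscall(SYS_setns, ...)` directly, which fails to compile when the kernel headers do not define the syscall number. A guard in the same spirit as the lxc snippet might look like the following C++ sketch. The function name `setnsOrEnosys` is hypothetical, and this is not the change that was made in Mesos, only an illustration of the compile-time check.

```cpp
#include <cerrno>

#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical wrapper: issues the setns system call when the headers
// provide its number, and otherwise fails at run time with ENOSYS
// instead of failing at compile time.
inline int setnsOrEnosys(int fd, int nstype)
{
#if defined(SYS_setns)
  return ::syscall(SYS_setns, fd, nstype);
#elif defined(__NR_setns)
  return ::syscall(__NR_setns, fd, nstype);
#else
  (void) fd;
  (void) nstype;
  errno = ENOSYS;
  return -1;
#endif
}
```

The trade-off is that a binary built against old headers compiles but reports the missing feature only when the call is made, so callers need to handle the error.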

> Old Centos 6.5 kernels/headers not sufficient for building Mesos
> 
>
> Key: MESOS-2203
> URL: https://issues.apache.org/jira/browse/MESOS-2203
> Project: Mesos
>  Issue Type: Documentation
>Affects Versions: 0.21.0
> Environment: E.g. Centos 6.5 with kernel 2.6.32-279.14.1 
>Reporter: Hans van den Bogert
>Priority: Minor
>
> Old kernels are not sufficient for building Mesos:
> bq. 
> Error:
> bq. libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" 
> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" 
> "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\" 
> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 
> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 
> -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
> -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 
> -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -I. -I../../src -Wall -Werror 
> -DLIBDIR=\"/var/scratch/vdbogert/lib\" 
> -DPKGLIBEXECDIR=\"/var/scratch/vdbogert/libexec/mesos\" 
> -DPKGDATADIR=\"/var/scratch/vdbogert/share/mesos\" -I../../include 
> -I../../3rdparty/libprocess/include 
> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
> -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
> -I../3rdparty/zookeeper-3.4.5/src/c/generated 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I/var/scratch/vdbogert//include/subversion-1 -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 
> -MT slave/containerizer/isolators/namespaces/libmesos_no_3rdparty_la-pid.lo 
> -MD -MP -MF 
> slave/containerizer/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
>  -c ../../src/slave/containerizer/isolators/namespaces/pid.cpp  -fPIC -DPIC 
> -o 
> slave/containerizer/isolators/namespaces/.libs/libmesos_no_3rdparty_la-pid.o
> In file included from /usr/include/sys/syscall.h:32:0,
>  from ../../src/linux/ns.hpp:26,
>  from 
> ../../src/slave/containerizer/isolators/namespaces/pid.cpp:31:
> ../../src/linux/ns.hpp: In function 'Try ns::setns(const string&, 
> const string&)':
> ../../src/linux/ns.hpp:167:23: error: '__NR_setns' was not declared in this 
> scope
>int ret = ::syscall(SYS_setns, fd.get(), nstype.get());
>^
> Perhaps this should be stated on:
> http://mesos.apache.org/gettingstarted/ because taking myself as example, 
> this has cost me a lot of time to pinpoint.





[jira] [Created] (MESOS-2619) Document master-scheduler communication

2015-04-14 Thread Connor Doyle (JIRA)
Connor Doyle created MESOS-2619:
---

 Summary: Document master-scheduler communication
 Key: MESOS-2619
 URL: https://issues.apache.org/jira/browse/MESOS-2619
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.22.0
Reporter: Connor Doyle


New users often stumble on the networking requirements for communication 
between schedulers and the Mesos master.

It's not explicitly stated anywhere that the master has to talk back to the 
scheduler.  Also, some configuration options (like the LIBPROCESS_PORT 
environment variable) are under-documented.

This problem is exacerbated as many new users start playing with Mesos and 
schedulers in unpredictable networking contexts (NAT, containers with bridged 
networking, etc.).
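
As an illustration of why these settings matter, the sketch below prints the endpoint a scheduler process would be expected to advertise, based on the `LIBPROCESS_IP` and `LIBPROCESS_PORT` environment variables. The helper names `envOr` and `advertisedEndpoint` are hypothetical, and the placeholder strings for unset variables reflect an assumption about libprocess defaults (auto-detected address, ephemeral port) rather than documented behaviour.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or `fallback` when the
// variable is unset or empty.
std::string envOr(const char* name, const std::string& fallback)
{
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') {
    return fallback;
  }
  return std::string(value);
}

// Hypothetical helper: describes the address the master would have to
// reach in order to talk back to this scheduler.
std::string advertisedEndpoint()
{
  return envOr("LIBPROCESS_IP", "<auto-detected>") + ":" +
         envOr("LIBPROCESS_PORT", "<ephemeral>");
}
```

If the reported endpoint is not reachable from the master (for example a private address behind NAT), the scheduler can register but will not receive the master's messages.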





[jira] [Created] (MESOS-2618) Update C++ style guide on function definition / invocation formatting.

2015-04-14 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2618:
-

 Summary: Update C++ style guide on function definition / 
invocation formatting. 
 Key: MESOS-2618
 URL: https://issues.apache.org/jira/browse/MESOS-2618
 Project: Mesos
  Issue Type: Documentation
Reporter: Till Toenshoff
Priority: Minor


Our style guide currently suggests two options for function definitions / 
invocations that do not fit on a single line even when breaking after the 
opening argument bracket:

Fixed leading indentation (4 spaces):
{noformat}
// 4: OK.
allocator->resourcesRecovered(
    frameworkId,
    slaveId,
    resources,
    filters);
{noformat}

Variable leading indentation:
{noformat}
// 3: In this case, 3 is OK.
foobar(someArgument,
       someOtherArgument,
       theLastArgument);
{noformat}

The guide mentions a counter-case for the latter:
{noformat}
// 3: Don't use in this case due to "jaggedness".
allocator->resourcesRecovered(frameworkId,
                              slaveId,
                              resources,
                              filters);
{noformat}


The problem is that the guide does not define precisely when the counter-case 
applies.

We might want to consider...
A. removing the variable leading indentation option entirely
B. defining the exact limits of when "jaggedness" applies





[jira] [Commented] (MESOS-2191) Add ContainerId to the TaskStatus message

2015-04-14 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495123#comment-14495123
 ] 

Benjamin Mahler commented on MESOS-2191:


How does that work for custom executors running in the docker containerizer? 
Currently schedulers may not necessarily expect data messages to come from 
command / docker tasks.

> Add ContainerId to the TaskStatus message
> -
>
> Key: MESOS-2191
> URL: https://issues.apache.org/jira/browse/MESOS-2191
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Marcel Neuhausler
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> {{TaskStatus}} provides the frameworks with certain information 
> ({{executorId}}, {{slaveId}}, etc.) which is useful when collecting 
> statistics about cluster performance; however, it is difficult to associate 
> tasks to the container it is executed since this information stays always 
> within mesos itself. Therefore it would be good to provide the framework 
> scheduler with this information, adding a new field in the {{TaskStatus}} 
> message.
> See comments for a use case.





[jira] [Commented] (MESOS-2605) The slave sometimes does not send active executors during reregistration

2015-04-14 Thread Cody Maloney (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495003#comment-14495003
 ] 

Cody Maloney commented on MESOS-2605:
-

That sounds like this might be related to MESOS-2601 then. Mesos doesn't 
currently save which containerizer created / owns a container, and so it just 
tries to recover the container with all of them.
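
To make the idea concrete, here is a minimal sketch of what recovery could look like if the owning containerizer were recorded per container. Everything in it is hypothetical: the `planRecovery` function, the string container IDs and the containerizer names are illustration only and do not correspond to the Mesos checkpointing format or to the fix tracked in MESOS-2601.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical input: container ID -> name of the containerizer that
// launched it, as it might be checkpointed at launch time.
typedef std::map<std::string, std::string> Owners;

// Groups containers by their recorded owner, so that each containerizer
// is asked to recover only the containers it launched.
std::map<std::string, std::vector<std::string>> planRecovery(
    const Owners& owners)
{
  std::map<std::string, std::vector<std::string>> plan;
  for (const auto& entry : owners) {
    plan[entry.second].push_back(entry.first);
  }
  return plan;
}
```

With such a record, a container launched by the Mesos containerizer would never be handed to the Docker containerizer during recovery.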

> The slave sometimes does not send active executors during reregistration
> 
>
> Key: MESOS-2605
> URL: https://issues.apache.org/jira/browse/MESOS-2605
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Elizabeth Lingg
>Assignee: Michael Park
>  Labels: mesosphere
>
> The slave sometimes does not send active executors during reregistration. 
> Framework checkpointing is enabled, and the executor successfully 
> reregisters. However, the tasks in that executor are LOST (by abnormal 
> executor termination) because the executor is removed by the mesos master as 
> unknown. See the example below, 
> task.journalnode.journalnode.NodeExecutor.1428609184051.
> See the Slave Logs here for the Task:
> {code}
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.778790 25126 status_update_manager.cpp:317] Received status update 
> TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task 
> task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.779013 25126 status_update_manager.hpp:346] Checkpointing UPDATE for 
> status update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for 
> task task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.781788 25123 slave.cpp:2753] Forwarding the update TASK_RUNNING 
> (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task 
> task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008 to master@10.142.250.253:5050
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.781889 25123 slave.cpp:2686] Sending acknowledgement for status 
> update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task 
> task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008 to executor(1)@10.168.119.78:47638
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.784503 25124 status_update_manager.cpp:389] Received status update 
> acknowledgement (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task 
> task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.784567 25124 status_update_manager.hpp:346] Checkpointing ACK for 
> status update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for 
> task task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008
> {code}
> Master Logs:
> {code}
> Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: W0409 
> 20:19:43.008666  1067 master.cpp:4015] Executor 
> executor.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008 possibly unknown to the slave 
> 20150407-233647-2059219722-5050-1659-S5 at slave(1)@10.168.119.78:5051 
> (ec2-54-237-57-237.compute-1.amazonaws.com)
> Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 
> 20:19:43.008652  1074 hierarchical.hpp:648] Recovered cpus(*):0.1; 
> mem(*):1536 (total allocatable: cpus(*):3.5; mem(*):21113; disk(*):142210; 
> ports(*):[3889-5044, 5046-5049, 2182-2958, 2960-3887, 1025-2180, 8082-9041, 
> 9043-9159, 9161-, 5052-6999, 7002-7198, 7200-8079, 10001-65535]) on slave 
> 20150407-233647-2059219722-5050-1659-S5 from framework 
> 20150408-002100-4261056010-5050-1047-0008
> Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 
> 20:19:43.008712  1067 master.cpp:4714] Removing executor 
> 'executor.journalnode.NodeExecutor.1428609184051' with resources cpus(*):0.1; 
> mem(*):1536 of framework 20150408-002100-4261056010-5050-1047-0008 on slave 
> 20150407-233647-2059219722-5050-1659-S5 at slave(1)@10.168.119.78:5051 
> (ec2-54-237-57-237.compute-1.amazonaws.com)
> Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 
> 20:19:43.010372  1067 master.cpp:3295] Status update TASK_LOST (UUID: 
> e5532567-e5b2-4fca-87aa-f3f98e371640) for task 
> 

[jira] [Commented] (MESOS-2605) The slave sometimes does not send active executors during reregistration

2015-04-14 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494989#comment-14494989
 ] 

Michael Park commented on MESOS-2605:
-

Reporting recent findings.

{code:title=master}
Apr 14 18:49:40 ip-10-168-90-31.ec2.internal mesos-master[1226]: W0414 
18:49:40.078554  1248 master.cpp:4015] Executor 
executor.journalnode.NodeExecutor.1429034850690 of framework 
20150408-055737-526034954-5050-1226-0393 possibly unknown to the slave 
20150408-055737-526034954-5050-1226-S9 at slave(1)@10.154.8.101:5051 
(ec2-54-237-83-163.compute-1.amazonaws.com)
{code}

{code:title=slave}
Apr 14 18:49:36 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:36.802649 18193 slave.cpp:4305] Recovering executor 
'executor.journalnode.NodeExecutor.1429034850690' of framework 
20150408-055737-526034954-5050-1226-0393
/* ... */
Apr 14 18:49:36 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:36.832767 18188 status_update_manager.cpp:205] Recovering executor 
'executor.journalnode.NodeExecutor.1429034850690' of framework 
20150408-055737-526034954-5050-1226-0393
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:36.857517 18189 docker.cpp:470] Recovering container 
'5338e6cf-03ac-4882-a08e-48bfd6d797dc' for executor 
'executor.journalnode.NodeExecutor.1429034850690' of framework 
20150408-055737-526034954-5050-1226-0393
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:36.870594 18190 containerizer.cpp:350] Recovering container 
'5338e6cf-03ac-4882-a08e-48bfd6d797dc' for executor 
'executor.journalnode.NodeExecutor.1429034850690' of framework 
20150408-055737-526034954-5050-1226-0393
{code}

So somehow we're calling {{recover}} in {{docker.cpp}} as well as 
{{containerizer.cpp}}. But based on the fact that HDFS doesn't use {{docker}} 
at all, along with this log:

{code}
Apr 14 18:07:30 ip-10-154-8-101.ec2.internal mesos-slave[11172]: I0414 
18:07:30.708111 11187 containerizer.cpp:472] Starting container 
'5338e6cf-03ac-4882-a08e-48bfd6d797dc' for executor 
'executor.journalnode.NodeExecutor.1429034850690' of framework 
'20150408-055737-526034954-5050-1226-0393'
{code}

We should be calling it only for {{containerizer.cpp}}.

The slave proceeds to log the following sequence of events, which shows that we 
try to recover the containers through docker and, when we can't find them, 
terminate the executor.

{code}
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:37.602605 18186 slave.cpp:3738] Sending reconnect request to executor 
executor.journalnode.NodeExecutor.1429034850690 of framework 
20150408-055737-526034954-5050-1226-0393 at executor(1)@10.154.8.101:60097
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:37.611616 18186 slave.cpp:2321] Re-registering executor 
executor.journalnode.NodeExecutor.1429034850690 of framework 
20150408-055737-526034954-5050-1226-0393
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: E0414 
18:49:37.862635 18190 slave.cpp:2456] Failed to update resources for container 
5338e6cf-03ac-4882-a08e-48bfd6d797dc of executor 
'executor.journalnode.NodeExecutor.1429034850690' of framework 
20150408-055737-526034954-5050-1226-0393, destroying container: Failed to 
'docker inspect mesos-5338e6cf-03ac-4882-a08e-48bfd6d797dc': exit status = 
exited with status 1 stderr = Error: No such image or container: 
mesos-5338e6cf-03ac-4882-a08e-48bfd6d797dc
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: E0414 
18:49:37.976002 18187 slave.cpp:3191] Termination of executor 
'executor.journalnode.NodeExecutor.1429034850690' of framework 
'20150408-055737-526034954-5050-1226-0393' failed: Failed to kill the Docker 
container: Failed to 'docker stop -t 0 
mesos-5338e6cf-03ac-4882-a08e-48bfd6d797dc': exit status = exited with status 1 
stderr = Error response from daemon: No such container: 
mesos-5338e6cf-03ac-4882-a08e-48bfd6d797dc
/* ... */
Apr 14 18:49:37 ip-10-154-8-101.ec2.internal mesos-slave[18180]: E0414 
18:49:37.977609 18187 slave.cpp:2653] Failed to update resources for container 
5338e6cf-03ac-4882-a08e-48bfd6d797dc of executor 
executor.journalnode.NodeExecutor.1429034850690 running task 
task.journalnode.journalnode.NodeExecutor.1429034850690 on status update for 
terminal task, destroying container: Container 
'5338e6cf-03ac-4882-a08e-48bfd6d797dc' not found
/* ... */
Apr 14 18:49:40 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:40.079958 18188 slave.cpp:949] MPARK: Slave::doReliableRegistration
Apr 14 18:49:40 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:40.080099 18188 slave.cpp:1053] MPARK: Executor 
'executor.namenode.NameNodeExecutor.1429034908782' is terminated!
Apr 14 18:49:40 ip-10-154-8-101.ec2.internal mesos-slave[18180]: I0414 
18:49:4

[jira] [Commented] (MESOS-2191) Add ContainerId to the TaskStatus message

2015-04-14 Thread Marcel Neuhausler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494987#comment-14494987
 ] 

Marcel Neuhausler commented on MESOS-2191:
--

Hi Timothy,

Getting the "docker inspect json output" as part of the TaskInfo in the 
TASK_RUNNING TaskStatus message would be perfect :-)

Thanks!



[jira] [Commented] (MESOS-1939) Enable multiple authentication methods in parallel

2015-04-14 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494872#comment-14494872
 ] 

Adam B commented on MESOS-1939:
---

Note that the Authorizer and Authenticator/Authenticatee are two different 
(sets of) interfaces. We have already turned the Authentication interfaces into 
Mesos Modules, but have yet to do the same for the Authorizer interface.

This JIRA is specifically for Authentication (not Authorization), for example 
the slaves could use the default CRAMMD5Authenticatee 
(src/authentication/cram_md5/authenticatee.hpp) while frameworks could 
authenticate via a custom authentication module (e.g. Kerberos, PKI, etc.). In 
the master, you would specify multiple authenticator modules, and the master 
could have a collection (list, set) of them. On the authenticatee side, the 
framework/slave would (phase 1) be started with a single authenticatee type, or 
(phase 2) a list of authentication mechanisms, in some order of preference. The 
authenticatee would have to pass the authentication mechanism to the master 
(perhaps via the AuthenticateMessage) so that the master can know which 
Authenticator to use to authenticate the authenticatee.
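
The selection step described above could be sketched as follows. This is a hedged illustration only: the `Authenticator` interface, `NamedAuthenticator` and `AuthenticatorRegistry` are hypothetical names invented for the example and are not the Mesos module interfaces; real authenticators would also carry out the actual handshake.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; not the actual Mesos Authenticator API.
class Authenticator {
public:
  virtual ~Authenticator() {}
  virtual std::string mechanism() const = 0;
};

// Trivial implementation that only reports its mechanism name.
class NamedAuthenticator : public Authenticator {
public:
  explicit NamedAuthenticator(const std::string& _name) : name(_name) {}
  std::string mechanism() const override { return name; }

private:
  std::string name;
};

// Master-side collection of authenticators, keyed by mechanism name.
class AuthenticatorRegistry {
public:
  void add(std::unique_ptr<Authenticator> authenticator)
  {
    const std::string name = authenticator->mechanism();
    authenticators[name] = std::move(authenticator);
  }

  // Returns the authenticator for the first mechanism in the
  // authenticatee's preference list that the master supports, or
  // nullptr when there is no common mechanism.
  const Authenticator* select(
      const std::vector<std::string>& preferred) const
  {
    for (const std::string& mechanism : preferred) {
      auto found = authenticators.find(mechanism);
      if (found != authenticators.end()) {
        return found->second.get();
      }
    }
    return nullptr;
  }

private:
  std::map<std::string, std::unique_ptr<Authenticator>> authenticators;
};
```

Phase 1 corresponds to a preference list with a single entry; phase 2 to a longer list tried in order.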

> Enable multiple authentication methods in parallel
> --
>
> Key: MESOS-1939
> URL: https://issues.apache.org/jira/browse/MESOS-1939
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Till Toenshoff
>Priority: Minor
>  Labels: authentication
>
> The master (authenticator) should allow for multiple authentication 
> mechanisms to be used at the same time. That way, a slave could be 
> authenticated by mechanism FOO while the frameworks are authenticated by BAR.
> The authenticatee should be allowed to select the desired mechanism (module).





[jira] [Commented] (MESOS-2233) Run ASF CI mesos builds inside docker

2015-04-14 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494617#comment-14494617
 ] 

Vinod Kone commented on MESOS-2233:
---

Some relevant discussions on Docker's GitHub.

https://github.com/docker/docker/issues/7276

https://github.com/Unitech/PM2/issues/1086

I'll test with the "--privileged" flag.

> Run ASF CI mesos builds inside docker
> -
>
> Key: MESOS-2233
> URL: https://issues.apache.org/jira/browse/MESOS-2233
> Project: Mesos
>  Issue Type: Task
>  Components: technical debt
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>  Labels: twitter
> Attachments: Dockerfile, supervisord.conf
>
>
> There are several limitations to the Mesos project's current state of CI, 
> which is run on builds.a.o
> --> Only runs on Ubuntu
> --> Doesn't run any tests that deal with cgroups
> --> Doesn't run any tests that need root permissions
> Now that ASF CI supports docker 
> (https://issues.apache.org/jira/browse/BUILDS-25), it would be great for the 
> Mesos project to use it.





[jira] [Commented] (MESOS-2616) Update C++ style guide on variable naming.

2015-04-14 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494607#comment-14494607
 ] 

Till Toenshoff commented on MESOS-2616:
---

https://reviews.apache.org/r/32536/

> Update C++ style guide on variable naming. 
> ---
>
> Key: MESOS-2616
> URL: https://issues.apache.org/jira/browse/MESOS-2616
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Till Toenshoff
>Assignee: Alexander Rukletsov
>Priority: Minor
>
> Our variable naming guide currently does not explain the use cases for 
> leading or trailing underscores, which appear frequently in our codebase. 
> We should correct that.
> The following was copied from the review description for allowing discussions 
> where needed:
> Documents the patterns we use to name variables and function arguments in our 
> codebase.
> h4.Leading underscores to avoid ambiguity.
> We use this pattern extensively in libprocess, stout and mesos, a few 
> examples below.
> * stout/try.hpp:105
> {noformat}
> Try(State _state, T* _t = NULL, const std::string& _message = "")
>   : state(_state), t(_t), message(_message) {}
> {noformat}
> * process/http.hpp:480
> {noformat}
>   URL(const std::string& _scheme,
>       const std::string& _domain,
>       const uint16_t _port = 80,
>       const std::string& _path = "/",
>       const hashmap<std::string, std::string>& _query =
>         (hashmap<std::string, std::string>()),
>       const Option<std::string>& _fragment = None())
>     : scheme(_scheme),
>       domain(_domain),
>       port(_port),
>       path(_path),
>       query(_query),
>       fragment(_fragment) {}
> * slave/containerizer/linux_launcher.cpp:56
> {noformat}
> LinuxLauncher::LinuxLauncher(
>     const Flags& _flags,
>     int _namespaces,
>     const string& _hierarchy)
>   : flags(_flags),
>     namespaces(_namespaces),
>     hierarchy(_hierarchy) {}
> {noformat}
> h4.Trailing underscores as prime symbols.
> We use this pattern in the code, though not extensively. We would like to see 
> more pass-by-value instead of creating copies from a variable passed by const 
> reference.
> * master.cpp:2942
> {noformat}
> // Create and add the slave id.
> SlaveInfo slaveInfo_ = slaveInfo;
> slaveInfo_.mutable_id()->CopyFrom(newSlaveId());
> {noformat}
> * slave.cpp:4180
> {noformat}
> ExecutorInfo executorInfo_ = executor->info;
> Resources resources = executorInfo_.resources();
> resources += taskInfo.resources();
> executorInfo_.mutable_resources()->CopyFrom(resources);
> {noformat}
> * status_update_manager.cpp:474
> {noformat}
> // Bounded exponential backoff.
> Duration duration_ =
> std::min(duration * 2, STATUS_UPDATE_RETRY_INTERVAL_MAX);
> {noformat}
> * containerizer/mesos/containerizer.cpp:109
> {noformat}
> // Modify the flags to include any changes to isolation.
> Flags flags_ = flags;
> flags_.isolation = isolation;
> {noformat}
> h4.Passing arguments by value.
> * slave.cpp:2480
> {noformat}
> void Slave::statusUpdate(StatusUpdate update, const UPID& pid)
> {
>   ...
>   // Set the source before forwarding the status update.
>   update.mutable_status()->set_source(
>   pid == UPID() ? TaskStatus::SOURCE_SLAVE : TaskStatus::SOURCE_EXECUTOR);
>   ...
> }
> {noformat}
> * process/metrics/timer.hpp:103
> {noformat}
>   static void _time(Time start, Timer that)
>   {
>     const Time stop = Clock::now();
>     double value;
>     process::internal::acquire(&that.data->lock);
>     {
>       that.data->lastValue = T(stop - start).value();
>       value = that.data->lastValue.get();
>     }
>     process::internal::release(&that.data->lock);
>     that.push(value);
>   }
> {noformat}
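
To contrast the last two patterns side by side, here is a small self-contained sketch. The `Flags` struct and both function names are hypothetical and simplified for illustration; they are not taken from the Mesos codebase.

```cpp
#include <string>

// Simplified, hypothetical stand-in for a flags structure.
struct Flags {
  std::string isolation;
};

// Trailing-underscore pattern: take a const reference, then create a
// modified copy named with a trailing underscore.
Flags withIsolationCopy(const Flags& flags, const std::string& isolation)
{
  Flags flags_ = flags;
  flags_.isolation = isolation;
  return flags_;
}

// Pass-by-value pattern: the parameter is already a private copy, so it
// can be modified directly and no second name is needed.
Flags withIsolation(Flags flags, const std::string& isolation)
{
  flags.isolation = isolation;
  return flags;
}
```

Both functions leave the caller's object untouched; the pass-by-value form avoids the extra name and lets the caller move a temporary into the parameter.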





[jira] [Commented] (MESOS-2233) Run ASF CI mesos builds inside docker

2015-04-14 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494526#comment-14494526
 ] 

Vinod Kone commented on MESOS-2233:
---

Definitely looks like an "apparmor" permissions issue (though it is weird that 
it only happens intermittently).

Here's the relevant kernel log output from the latest failed build

https://builds.apache.org/job/vinod-docker-multi/COMPILER=gcc,LABEL=docker%7C%7CHadoop,OS=ubuntu:14.10/113/consoleFull

configure fails due to a permissions issue for 'sed'.

{code}
config.status: creating include/Makefile
config.status: creating 3rdparty/Makefile
sed: error while loading shared libraries: libpthread.so.0: failed to map 
segment from shared object: Permission denied
config.status: creating 3rdparty/gmock_sources.cc
config.status: executing depfiles commands
config.status: executing libtool commands
=== configuring in 3rdparty/stout (/mesos/3rdparty/libprocess/3rdparty/stout)
configure: running /bin/bash ./configure --disable-option-checking 
'--prefix=/usr/local'  'CC=gcc' 'CXX=g++' '--enable-shared=no' '--with-pic' 
--cache-file=/dev/null --srcdir=.
{code}

apparmor denies 'sed':
{code}
[16654475.072801] type=1400 audit(1429008557.507:152643): apparmor="DENIED" 
operation="file_mmap" profile="docker-default" 
name="s-0.23.0/_build/confwWH8Lp/subs.awk" pid=15083 comm="sed" 
requested_mask="mr" denied_mask="mr" fsuid=1000 ouid=0
[16657331.416018] docker0: port 1(vethad50be3) entered disabled state
[16657331.416570] device vethad50be3 left promiscuous mode
[16657331.416581] docker0: port 1(vethad50be3) entered disabled state
[16658694.375946] device veth9c2198d entered promiscuous mode
[16658694.376446] IPv6: ADDRCONF(NETDEV_UP): veth9c2198d: link is not ready
[16658694.406234] IPv6: ADDRCONF(NETDEV_CHANGE): veth9c2198d: link becomes ready
[16658694.406272] docker0: port 1(veth9c2198d) entered forwarding state
[16658694.406277] docker0: port 1(veth9c2198d) entered forwarding state
[16658695.048564] docker0: port 1(veth9c2198d) entered disabled state
[16658695.049197] device veth9c2198d left promiscuous mode
[16658695.049227] docker0: port 1(veth9c2198d) entered disabled state
[16658698.266384] device veth3b62949 entered promiscuous mode
[16658698.266793] IPv6: ADDRCONF(NETDEV_UP): veth3b62949: link is not ready
[16658698.305110] IPv6: ADDRCONF(NETDEV_CHANGE): veth3b62949: link becomes ready
[16658698.305145] docker0: port 1(veth3b62949) entered forwarding state
[16658698.305152] docker0: port 1(veth3b62949) entered forwarding state
[16658713.370021] docker0: port 1(veth3b62949) entered forwarding state
[16662827.593876] docker0: port 1(veth3b62949) entered disabled state
[16662827.594835] device veth3b62949 left promiscuous mode
[16662827.594848] docker0: port 1(veth3b62949) entered disabled state
[16662848.574731] device veth51c14c8 entered promiscuous mode
[16662848.575105] IPv6: ADDRCONF(NETDEV_UP): veth51c14c8: link is not ready
[16662848.606153] IPv6: ADDRCONF(NETDEV_CHANGE): veth51c14c8: link becomes ready
[16662848.606193] docker0: port 1(veth51c14c8) entered forwarding state
[16662848.606200] docker0: port 1(veth51c14c8) entered forwarding state
[16662849.211224] docker0: port 1(veth51c14c8) entered disabled state
[16662849.212002] device veth51c14c8 left promiscuous mode
[16662849.212015] docker0: port 1(veth51c14c8) entered disabled state
[16662852.417932] device veth1fa6472 entered promiscuous mode
[16662852.418403] IPv6: ADDRCONF(NETDEV_UP): veth1fa6472: link is not ready
[16662852.444529] IPv6: ADDRCONF(NETDEV_CHANGE): veth1fa6472: link becomes ready
[16662852.444578] docker0: port 1(veth1fa6472) entered forwarding state
[16662852.444586] docker0: port 1(veth1fa6472) entered forwarding state
[16662867.497495] docker0: port 1(veth1fa6472) entered forwarding state
[1968.777955] docker0: port 1(veth1fa6472) entered disabled state
[1968.778929] device veth1fa6472 left promiscuous mode
[1968.778942] docker0: port 1(veth1fa6472) entered disabled state
[16671013.616616] device veth98fd1d3 entered promiscuous mode
[16671013.617074] IPv6: ADDRCONF(NETDEV_UP): veth98fd1d3: link is not ready
[16671013.658662] IPv6: ADDRCONF(NETDEV_CHANGE): veth98fd1d3: link becomes ready
[16671013.658709] docker0: port 1(veth98fd1d3) entered forwarding state
[16671013.658715] docker0: port 1(veth98fd1d3) entered forwarding state
[16671014.251222] docker0: port 1(veth98fd1d3) entered disabled state
[16671014.252164] device veth98fd1d3 left promiscuous mode
[16671014.252177] docker0: port 1(veth98fd1d3) entered disabled state
[16671017.698131] device veth40c3c60 entered promiscuous mode
[16671017.698709] IPv6: ADDRCONF(NETDEV_UP): veth40c3c60: link is not ready
[16671017.742416] IPv6: ADDRCONF(NETDEV_CHANGE): veth40c3c60: link becomes ready
[16671017.742465] docker0: port 1(veth40c3c60) entered forwarding state
[16671017.742473] docker0: port 1(veth40c3c60) entered forwarding state
[1

[jira] [Assigned] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information

2015-04-14 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-1865:
---

Assignee: haosdent

> Mesos APIs for non-leading masters should return copies of the leader's state 
> or an error, not a success with incorrect information
> ---
>
> Key: MESOS-1865
> URL: https://issues.apache.org/jira/browse/MESOS-1865
> Project: Mesos
>  Issue Type: Bug
>  Components: json api
>Affects Versions: 0.20.1
>Reporter: Steven Schlansker
>Assignee: haosdent
>
> Some of the API endpoints, for example /master/tasks.json, will return bogus 
> information if you query a non-leading master:
> {code}
> [steven@Anesthetize:~]% curl 
> http://master1.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n 
> 10
> {
>   "tasks": []
> }
> [steven@Anesthetize:~]% curl 
> http://master2.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n 
> 10
> {
>   "tasks": []
> }
> [steven@Anesthetize:~]% curl 
> http://master3.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n 
> 10
> {
>   "tasks": [
> {
>   "executor_id": "",
>   "framework_id": "20140724-231003-419644938-5050-1707-",
>   "id": 
> "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db",
>   "name": 
> "pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db",
>   "resources": {
> "cpus": 0.25,
> "disk": 0,
> {code}
> This is very hard for end-users to work around.  For example if I query 
> "which master is leading" followed by "leader: which tasks are running" it is 
> possible that the leader fails over in between, leaving me with an incorrect 
> answer and no way to know that this happened.
> In my opinion the API should return the correct response (by asking the 
> current leader?) or an error (500 Not the leader?) but it's unacceptable to 
> return a successful wrong answer.
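
A minimal client-side sketch of the check the report describes, in Python. It assumes that each master's state document carries a `pid` field and a `leader` field of the form `master@host:port`; those field names, and the helpers `is_leader` and `tasks_from_leader`, are assumptions made for illustration, not a confirmed API. The sketch refuses to use a task list from a master that does not claim leadership. It cannot close the failover race described above, which is why an error from the server itself would still be needed.

```python
class NotLeaderError(Exception):
    """Raised when a response did not come from the leading master."""


def is_leader(state):
    """Return True if the state document claims to come from the leader.

    Assumes `state` has 'pid' and 'leader' keys (hypothetical field names).
    """
    pid = state.get("pid")
    leader = state.get("leader")
    return pid is not None and pid == leader


def tasks_from_leader(state, tasks_response):
    """Return the task list only when `state` says the master is leading."""
    if not is_leader(state):
        raise NotLeaderError(
            "master %r is not the leader (%r)"
            % (state.get("pid"), state.get("leader"))
        )
    return tasks_response.get("tasks", [])
```

A caller would fetch the state and the task list from the same master, pass both parsed JSON documents to `tasks_from_leader`, and retry against the reported leader when `NotLeaderError` is raised.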





[jira] [Commented] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information

2015-04-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494511#comment-14494511
 ] 

haosdent commented on MESOS-1865:
-

[~stevenschlansker] Thank you, let me try to fix this.






[jira] [Updated] (MESOS-354) Oversubscribe resources

2015-04-14 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-354:
-
Issue Type: Epic  (was: Story)
   Summary: Oversubscribe resources  (was: oversubscribe resources)

> Oversubscribe resources
> ---
>
> Key: MESOS-354
> URL: https://issues.apache.org/jira/browse/MESOS-354
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation, master, slave
>Reporter: brian wickman
>Priority: Minor
> Attachments: mesos_virtual_offers.pdf
>
>
> This proposal is predicated upon offer revocation.
> The idea would be to add a new "revoked" status either by (1) piggybacking 
> off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a 
> new status update TASK_REVOKED.
> In order to augment an offer with metadata about revocability, there are 
> options:
>   1) Add a revocable boolean to the Offer and
> a) offer only one type of Offer per slave at a particular time
> b) offer both revocable and non-revocable resources at the same time but 
> require frameworks to understand that Offers can contain overlapping resources
>   2) Add a revocable_resources field on the Offer which is a superset of the 
> regular resources field.  By consuming > resources <= revocable_resources in 
> a launchTask, the Task becomes a revocable task.  If launching a task with < 
> resources, the Task is non-revocable.
> The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) 
> and non-revocable tasks are online higher-SLA tasks (e.g. services.)
> Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk.  
> One of these resources is a rate (4 cpu seconds per second) and two of them 
> are fixed values (8GB and 20GB respectively, though disk resources can be 
> further broken down into spindles - fixed - and iops - a rate.)  In practice, 
> these are the maximum resources in the respective dimensions that this task 
> will use.  In reality, we provision tasks at some factor below peak, and only 
> hit peak resource consumption in rare circumstances or perhaps at a diurnal 
> peak.  
> In the meantime, we stand to gain from offering some constant factor of 
> the difference between (reserved - actual) of non-revocable tasks as 
> revocable resources, depending upon our tolerance for revocable task churn.  
> The main challenge is coming up with an accurate short / medium / long-term 
> prediction of resource consumption based upon current behavior.
> In many cases it would be OK to be sloppy:
>   * CPU / iops / network IO are rates (compressible) and can often be OK 
> below guarantees for brief periods of time while task revocation takes place
>   * Memory slack can be provided by enabling swap and dynamically setting 
> swap paging boundaries.  Should swap ever be activated, that would be a 
> signal to revoke.
> The master / allocator would piggyback on the slave heartbeat mechanism to 
> learn of the amount of revocable resources available at any point in time.
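
The slack computation in the proposal can be sketched in Python as follows. This is only an illustration of "some constant factor of the difference between (reserved - actual)": the function name `revocable_slack`, the resource names, and the usage figures in the example are hypothetical, and the proposal does not say how the factor or the usage prediction would be obtained.

```python
def revocable_slack(reserved, actual, factor):
    """Return per-resource revocable amounts: factor * (reserved - actual).

    `reserved` and `actual` map resource names to amounts. `factor` is the
    constant in [0, 1] chosen according to the tolerance for task churn.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    slack = {}
    for name, amount in reserved.items():
        used = actual.get(name, 0.0)
        # Usage at or above the reservation leaves nothing to offer.
        slack[name] = factor * max(0.0, amount - used)
    return slack
```

For the task in the description (4 cores, 8 GB RAM, 20 GB disk), an assumed current usage of 1 core, 6 GB RAM and 20 GB disk with a factor of 0.5 would yield 1.5 cores and 1 GB of RAM as revocable, and no revocable disk.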





[jira] [Comment Edited] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information

2015-04-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494499#comment-14494499
 ] 

haosdent edited comment on MESOS-1865 at 4/14/15 6:08 PM:
--

Hi, [~stevenschlansker] Sorry, I could not understand your idea. Do you mean the 
non-leading master should return an error status code instead of returning 
{ "tasks": []}?


was (Author: haosd...@gmail.com):
Hi, [~stevenschlansker] Sorry to could not got your idea. Do you mean the 
non-leading master should return error status code instead of return "{ 
"tasks": []}"?






[jira] [Commented] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information

2015-04-14 Thread Steven Schlansker (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494506#comment-14494506
 ] 

Steven Schlansker commented on MESOS-1865:
--

Yes.  Or it should return the correct results.  Really, it should do just about 
anything rather than returning a valid but incorrect result.






[jira] [Commented] (MESOS-1865) Mesos APIs for non-leading masters should return copies of the leader's state or an error, not a success with incorrect information

2015-04-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494499#comment-14494499
 ] 

haosdent commented on MESOS-1865:
-

Hi, [~stevenschlansker] Sorry, I could not understand your idea. Do you mean the 
non-leading master should return an error status code instead of returning 
"{ "tasks": []}"?






[jira] [Commented] (MESOS-1939) Enable multiple authentication methods in parallel

2015-04-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494494#comment-14494494
 ] 

haosdent commented on MESOS-1939:
-

Hi, [~adam-mesos] I could only find LocalAuthorizer (ACL) so far. So is it 
necessary to implement this?

My idea for implementing this is to add a map containing the different 
authorizers to master.cpp. Is it OK to add a new field to master.cpp?

> Enable multiple authentication methods in parallel
> --
>
> Key: MESOS-1939
> URL: https://issues.apache.org/jira/browse/MESOS-1939
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Till Toenshoff
>Priority: Minor
>  Labels: authentication
>
> The master (authenticator) should allow for multiple authentication 
> mechanisms to be used at the same time. That way, a slave could be 
> authenticated by mechanism FOO while the frameworks are authenticated by BAR.
> The authenticatee should be allowed to select the desired mechanism (module).
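
The idea of keeping several mechanisms side by side, keyed by name, can be sketched in Python as below. This is an illustration only and not the Mesos module interface: the class `AuthenticatorRegistry`, its methods, and the stand-in check functions are all hypothetical, and the mechanism names FOO and BAR are taken from the description.

```python
class AuthenticatorRegistry:
    """Maps a mechanism name to a callable that checks credentials."""

    def __init__(self):
        self._by_mechanism = {}

    def register(self, mechanism, authenticator):
        """Make `authenticator` available under the name `mechanism`."""
        self._by_mechanism[mechanism] = authenticator

    def mechanisms(self):
        """Return the names an authenticatee may choose from."""
        return sorted(self._by_mechanism)

    def authenticate(self, mechanism, credentials):
        """Check `credentials` with the mechanism the authenticatee selected."""
        authenticator = self._by_mechanism.get(mechanism)
        if authenticator is None:
            # Unknown mechanism: reject rather than fall back silently.
            return False
        return bool(authenticator(credentials))
```

With this shape, a slave could select FOO and a framework could select BAR against the same registry, and a request naming an unregistered mechanism is rejected.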





[jira] [Commented] (MESOS-354) oversubscribe resources

2015-04-14 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494492#comment-14494492
 ] 

Niklas Quarfot Nielsen commented on MESOS-354:
--

Folks,

here is the architecture document we have been working on for introducing 
oversubscription in Mesos: 
https://docs.google.com/document/d/1pUnElxHy1uWfHY_FOvvRC73QaOGgdXE0OXN-gbxdXA0/edit#

It is still a work in progress, so feel free to add suggestions and raise 
concerns.






[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos

2015-04-14 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494469#comment-14494469
 ] 

Vinod Kone commented on MESOS-2203:
---

[~idownes] Looks like this code path was introduced for pid namespaces support? 
Is it possible to have a configure check for this?

> Old Centos 6.5 kernels/headers not sufficient for building Mesos
> 
>
> Key: MESOS-2203
> URL: https://issues.apache.org/jira/browse/MESOS-2203
> Project: Mesos
>  Issue Type: Documentation
>Affects Versions: 0.21.0
> Environment: E.g. Centos 6.5 with kernel 2.6.32-279.14.1 
>Reporter: Hans van den Bogert
>Priority: Minor
>
> Old kernels are not sufficient for building Mesos:
> bq. 
> Error:
> bq. libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" 
> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" 
> "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\" 
> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 
> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 
> -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
> -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 
> -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -I. -I../../src -Wall -Werror 
> -DLIBDIR=\"/var/scratch/vdbogert/lib\" 
> -DPKGLIBEXECDIR=\"/var/scratch/vdbogert/libexec/mesos\" 
> -DPKGDATADIR=\"/var/scratch/vdbogert/share/mesos\" -I../../include 
> -I../../3rdparty/libprocess/include 
> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
> -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
> -I../3rdparty/zookeeper-3.4.5/src/c/generated 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I/var/scratch/vdbogert//include/subversion-1 -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 
> -MT slave/containerizer/isolators/namespaces/libmesos_no_3rdparty_la-pid.lo 
> -MD -MP -MF 
> slave/containerizer/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
>  -c ../../src/slave/containerizer/isolators/namespaces/pid.cpp  -fPIC -DPIC 
> -o 
> slave/containerizer/isolators/namespaces/.libs/libmesos_no_3rdparty_la-pid.o
> In file included from /usr/include/sys/syscall.h:32:0,
>  from ../../src/linux/ns.hpp:26,
>  from 
> ../../src/slave/containerizer/isolators/namespaces/pid.cpp:31:
> ../../src/linux/ns.hpp: In function 'Try ns::setns(const string&, 
> const string&)':
> ../../src/linux/ns.hpp:167:23: error: '__NR_setns' was not declared in this 
> scope
>int ret = ::syscall(SYS_setns, fd.get(), nstype.get());
>^
> Perhaps this should be stated on:
> http://mesos.apache.org/gettingstarted/ because taking myself as example, 
> this has cost me a lot of time to pinpoint.
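
The compile error above means the installed kernel headers do not define `__NR_setns`. A rough way to detect that before starting a build is to scan the syscall header for the definition, sketched here in Python. The helper names are hypothetical, and the header path in the usage note is an assumption that varies by distribution and architecture; this is not the configure check asked about in the comment, only an illustration of what such a check would look for.

```python
import re

# Matches a definition line such as "#define __NR_setns 308".
_SETNS_DEFINE = re.compile(r"^\s*#\s*define\s+__NR_setns\b", re.MULTILINE)


def header_defines_setns(header_text):
    """Return True if the header text defines the __NR_setns syscall number."""
    return _SETNS_DEFINE.search(header_text) is not None


def any_header_defines_setns(paths):
    """Check each readable file in `paths`; unreadable files are skipped."""
    for path in paths:
        try:
            with open(path, "r", errors="replace") as handle:
                if header_defines_setns(handle.read()):
                    return True
        except OSError:
            continue
    return False
```

For example, `any_header_defines_setns(["/usr/include/asm/unistd_64.h"])` returning False on a build host would suggest that the headers are too old for the pid namespace isolator.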





[jira] [Commented] (MESOS-2605) The slave sometimes does not send active executors during reregistration

2015-04-14 Thread Elizabeth Lingg (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494339#comment-14494339
 ] 

Elizabeth Lingg commented on MESOS-2605:


As an additional comment, this behavior is observed on our CoreOS cluster. On 
this cluster, we have restarts of Mesos slaves as well as reboots of the 
machine. It seems to happen on some restarts of the Mesos slaves, but not on 
all of them.

> The slave sometimes does not send active executors during reregistration
> 
>
> Key: MESOS-2605
> URL: https://issues.apache.org/jira/browse/MESOS-2605
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Elizabeth Lingg
>Assignee: Michael Park
>  Labels: mesosphere
>
> The slave sometimes does not send active executors during reregistration. 
> Framework checkpointing is enabled, and the executor successfully 
> reregisters. However, the tasks in that executor are LOST (by abnormal 
> executor termination) because the executor is removed by the mesos master as 
> unknown. See the example below, 
> task.journalnode.journalnode.NodeExecutor.1428609184051.
> See the Slave Logs here for the Task:
> {code}
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.778790 25126 status_update_manager.cpp:317] Received status update 
> TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task 
> task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.779013 25126 status_update_manager.hpp:346] Checkpointing UPDATE for 
> status update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for 
> task task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.781788 25123 slave.cpp:2753] Forwarding the update TASK_RUNNING 
> (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task 
> task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008 to master@10.142.250.253:5050
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.781889 25123 slave.cpp:2686] Sending acknowledgement for status 
> update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task 
> task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008 to executor(1)@10.168.119.78:47638
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.784503 25124 status_update_manager.cpp:389] Received status update 
> acknowledgement (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for task 
> task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008
> Apr 09 19:53:06 ip-10-168-119-78.ec2.internal mesos-slave[25116]: I0409 
> 19:53:06.784567 25124 status_update_manager.hpp:346] Checkpointing ACK for 
> status update TASK_RUNNING (UUID: 4eb22075-c319-463d-8f70-94db9caa69c6) for 
> task task.journalnode.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008
> {code}
> Master Logs:
> {code}
> Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: W0409 
> 20:19:43.008666  1067 master.cpp:4015] Executor 
> executor.journalnode.NodeExecutor.1428609184051 of framework 
> 20150408-002100-4261056010-5050-1047-0008 possibly unknown to the slave 
> 20150407-233647-2059219722-5050-1659-S5 at slave(1)@10.168.119.78:5051 
> (ec2-54-237-57-237.compute-1.amazonaws.com)
> Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 
> 20:19:43.008652  1074 hierarchical.hpp:648] Recovered cpus(*):0.1; 
> mem(*):1536 (total allocatable: cpus(*):3.5; mem(*):21113; disk(*):142210; 
> ports(*):[3889-5044, 5046-5049, 2182-2958, 2960-3887, 1025-2180, 8082-9041, 
> 9043-9159, 9161-, 5052-6999, 7002-7198, 7200-8079, 10001-65535]) on slave 
> 20150407-233647-2059219722-5050-1659-S5 from framework 
> 20150408-002100-4261056010-5050-1047-0008
> Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 
> 20:19:43.008712  1067 master.cpp:4714] Removing executor 
> 'executor.journalnode.NodeExecutor.1428609184051' with resources cpus(*):0.1; 
> mem(*):1536 of framework 20150408-002100-4261056010-5050-1047-0008 on slave 
> 20150407-233647-2059219722-5050-1659-S5 at slave(1)@10.168.119.78:5051 
> (ec2-54-237-57-237.compute-1.amazonaws.com)
> Apr 09 20:19:43 ip-10-142-250-253.ec2.internal mesos-master[1047]: I0409 
> 20:19:43.010372  1067 master.cpp:3295] Status update TASK_LOST (UUID

[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos

2015-04-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494337#comment-14494337
 ] 

haosdent commented on MESOS-2203:
-

I use CentOS 6.5

My kernel version:
{quote}
$ uname -a
Linux xxx 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 
x86_64 x86_64 GNU/Linux
$ cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m
{quote}

My gcc version:

{quote}
$ g++ --version
g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
{quote}

And I could build successfully and pass all unit test cases on CentOS 6.5.

> Old Centos 6.5 kernels/headers not sufficient for building Mesos
> 
>
> Key: MESOS-2203
> URL: https://issues.apache.org/jira/browse/MESOS-2203
> Project: Mesos
>  Issue Type: Documentation
>Affects Versions: 0.21.0
> Environment: E.g. Centos 6.5 with kernel 2.6.32-279.14.1 
>Reporter: Hans van den Bogert
>Priority: Minor
>
> Old kernels are not sufficient for building Mesos:
> bq. 
> Error:
> bq. libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" 
> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" 
> "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\" 
> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 
> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 
> -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
> -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 
> -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -I. -I../../src -Wall -Werror 
> -DLIBDIR=\"/var/scratch/vdbogert/lib\" 
> -DPKGLIBEXECDIR=\"/var/scratch/vdbogert/libexec/mesos\" 
> -DPKGDATADIR=\"/var/scratch/vdbogert/share/mesos\" -I../../include 
> -I../../3rdparty/libprocess/include 
> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
> -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
> -I../3rdparty/zookeeper-3.4.5/src/c/generated 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I/var/scratch/vdbogert//include/subversion-1 -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 
> -MT slave/containerizer/isolators/namespaces/libmesos_no_3rdparty_la-pid.lo 
> -MD -MP -MF 
> slave/containerizer/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
>  -c ../../src/slave/containerizer/isolators/namespaces/pid.cpp  -fPIC -DPIC 
> -o 
> slave/containerizer/isolators/namespaces/.libs/libmesos_no_3rdparty_la-pid.o
> In file included from /usr/include/sys/syscall.h:32:0,
>  from ../../src/linux/ns.hpp:26,
>  from 
> ../../src/slave/containerizer/isolators/namespaces/pid.cpp:31:
> ../../src/linux/ns.hpp: In function 'Try ns::setns(const string&, 
> const string&)':
> ../../src/linux/ns.hpp:167:23: error: '__NR_setns' was not declared in this 
> scope
>int ret = ::syscall(SYS_setns, fd.get(), nstype.get());
>^
> Perhaps this should be stated on:
> http://mesos.apache.org/gettingstarted/ because taking myself as example, 
> this has cost me a lot of time to pinpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos

2015-04-14 Thread Mike Ringenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494307#comment-14494307
 ] 

Mike Ringenburg commented on MESOS-2203:


I'd second the suggestion to get this added to the getting started page - I 
just ran into it while building Mesos.  Luckily a quick search brought me 
here, but it'd be nice to have it stated up front.



[jira] [Created] (MESOS-2617) The docker containerizer does not support configuring CFS quotas.

2015-04-14 Thread Steve Niemitz (JIRA)
Steve Niemitz created MESOS-2617:


 Summary: The docker containerizer does not support configuring CFS 
quotas.
 Key: MESOS-2617
 URL: https://issues.apache.org/jira/browse/MESOS-2617
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.22.0
Reporter: Steve Niemitz








[jira] [Updated] (MESOS-2616) Update C++ style guide on variable naming.

2015-04-14 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2616:
--
Description: 
Our variable naming guide does not currently explain the use cases for 
leading or trailing underscores, which appear frequently in our codebase. 

We should correct that.

The following was copied from the review description to allow discussion 
where needed:

Documents the patterns we use to name variables and function arguments in our 
codebase.

h4.Leading underscores to avoid ambiguity.

We use this pattern extensively in libprocess, stout and mesos; a few examples 
follow.

* stout/try.hpp:105
{noformat}
Try(State _state, T* _t = NULL, const std::string& _message = "")
  : state(_state), t(_t), message(_message) {}
{noformat}

* process/http.hpp:480
{noformat}
  URL(const std::string& _scheme,
      const std::string& _domain,
      const uint16_t _port = 80,
      const std::string& _path = "/",
      const hashmap<std::string, std::string>& _query =
        (hashmap<std::string, std::string>()),
      const Option<std::string>& _fragment = None())
    : scheme(_scheme),
      domain(_domain),
      port(_port),
      path(_path),
      query(_query),
      fragment(_fragment) {}
{noformat}

* slave/containerizer/linux_launcher.cpp:56
{noformat}
LinuxLauncher::LinuxLauncher(
    const Flags& _flags,
    int _namespaces,
    const string& _hierarchy)
  : flags(_flags),
    namespaces(_namespaces),
    hierarchy(_hierarchy) {}
{noformat}
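
The pattern in the examples above can be reduced to a self-contained sketch. The Endpoint class below is hypothetical and exists only to illustrate the naming; it is not taken from the codebase. An initializer such as host(host) would still compile, but inside the constructor body the argument would shadow the member, which is the ambiguity the leading underscore avoids.

```cpp
#include <cstdint>
#include <string>

// Hypothetical class, used only to illustrate the naming pattern.
class Endpoint
{
public:
  // The leading underscore keeps each constructor argument distinct from the
  // member it initializes, so `host(_host)` reads unambiguously.
  Endpoint(const std::string& _host, uint16_t _port = 80)
    : host(_host), port(_port) {}

  const std::string host;
  const uint16_t port;
};
```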

h4.Trailing underscores as prime symbols.

We use this pattern in the code, though not extensively. We would like to see 
more pass-by-value instead of creating copies from a variable passed by const 
reference.

* master.cpp:2942
{noformat}
// Create and add the slave id.
SlaveInfo slaveInfo_ = slaveInfo;
slaveInfo_.mutable_id()->CopyFrom(newSlaveId());
{noformat}

* slave.cpp:4180
{noformat}
ExecutorInfo executorInfo_ = executor->info;
Resources resources = executorInfo_.resources();
resources += taskInfo.resources();
executorInfo_.mutable_resources()->CopyFrom(resources);
{noformat}

* status_update_manager.cpp:474
{noformat}
// Bounded exponential backoff.
Duration duration_ =
  std::min(duration * 2, STATUS_UPDATE_RETRY_INTERVAL_MAX);
{noformat}

* containerizer/mesos/containerizer.cpp:109
{noformat}
// Modify the flags to include any changes to isolation.
Flags flags_ = flags;
flags_.isolation = isolation;
{noformat}

h4.Passing arguments by value.

* slave.cpp:2480
{noformat}
void Slave::statusUpdate(StatusUpdate update, const UPID& pid)
{
  ...
  // Set the source before forwarding the status update.
  update.mutable_status()->set_source(
      pid == UPID() ? TaskStatus::SOURCE_SLAVE : TaskStatus::SOURCE_EXECUTOR);
  ...
}
{noformat}

* process/metrics/timer.hpp:103
{noformat}
  static void _time(Time start, Timer that)
  {
    const Time stop = Clock::now();

    double value;

    process::internal::acquire(&that.data->lock);
    {
      that.data->lastValue = T(stop - start).value();
      value = that.data->lastValue.get();
    }
    process::internal::release(&that.data->lock);

    that.push(value);
  }
{noformat}
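
To make the trade-off between the last two patterns concrete, here is a minimal side-by-side sketch. The Update struct and both function names are hypothetical stand-ins, not code from Mesos. The first variant takes a const reference and modifies a trailing-underscore copy; the second takes the argument by value and modifies it directly.

```cpp
#include <string>

// Hypothetical type standing in for a message such as a status update.
struct Update
{
  std::string source;
  std::string payload;
};

// Variant 1: take a const reference, then make a modified copy named with a
// trailing underscore (a "prime" of the original).
Update withSourceByReference(const Update& update, const std::string& source)
{
  Update update_ = update;
  update_.source = source;
  return update_;
}

// Variant 2: take the argument by value. The caller's object is copied (or
// moved) into `update`, which can then be modified directly.
Update withSourceByValue(Update update, const std::string& source)
{
  update.source = source;
  return update;
}
```

Both variants leave the caller's object untouched. The by-value form avoids the second name and lets a caller that passes a temporary skip the copy.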


  was:
Our variable naming guide does not currently explain the use cases for 
leading or trailing underscores, which appear frequently in our codebase. 

We should correct that.


> Update C++ style guide on variable naming. 
> ---
>
> Key: MESOS-2616
> URL: https://issues.apache.org/jira/browse/MESOS-2616
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Till Toenshoff
>Assignee: Alexander Rukletsov
>Priority: Minor
>

[jira] [Created] (MESOS-2616) Update C++ style guide on variable naming.

2015-04-14 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2616:
-

 Summary: Update C++ style guide on variable naming. 
 Key: MESOS-2616
 URL: https://issues.apache.org/jira/browse/MESOS-2616
 Project: Mesos
  Issue Type: Documentation
Reporter: Till Toenshoff
Assignee: Alexander Rukletsov
Priority: Minor


Our variable naming guide does not currently explain the use cases for 
leading or trailing underscores, which appear frequently in our codebase. 

We should correct that.





[jira] [Commented] (MESOS-2191) Add ContainerId to the TaskStatus message

2015-04-14 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493819#comment-14493819
 ] 

Timothy Chen commented on MESOS-2191:
-

Hi Marcel,

Thanks for explaining the rationale for getting the ContainerId. There have 
been quite a number of requests for the container name, and also various 
requests for Docker container related information to be sent back to the 
scheduler for advanced use cases like the one you described.

What I'm currently considering is sending the whole docker inspect response 
back in TaskStatus, so that when your scheduler sees a TaskStatus with 
TASK_RUNNING, the optional data field will contain the Docker inspect JSON 
output string serialized into a byte array. Then you can get any information 
such as name, network settings, and volumes, all in one place.

Let me know what you think and if you foresee problems with this.

> Add ContainerId to the TaskStatus message
> -
>
> Key: MESOS-2191
> URL: https://issues.apache.org/jira/browse/MESOS-2191
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Marcel Neuhausler
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> {{TaskStatus}} provides the frameworks with certain information 
> ({{executorId}}, {{slaveId}}, etc.) which is useful when collecting 
> statistics about cluster performance; however, it is difficult to associate 
> a task with the container it is executed in, since this information always 
> stays within Mesos itself. Therefore it would be good to provide the 
> framework scheduler with this information by adding a new field to the 
> {{TaskStatus}} message.
> See comments for a use case.





[jira] [Commented] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?

2015-04-14 Thread Matthias Veit (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493786#comment-14493786
 ] 

Matthias Veit commented on MESOS-2598:
--

Sure.

> Slave state.json frameworks.executors.queued_tasks wrong format?
> 
>
> Key: MESOS-2598
> URL: https://issues.apache.org/jira/browse/MESOS-2598
> Project: Mesos
>  Issue Type: Bug
>  Components: statistics
>Affects Versions: 0.22.0
> Environment: Linux version 3.10.0-229.1.2.el7.x86_64 
> (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 
> 4.8.2-16) (GCC) ) #1 SMP Fri Mar 27 03:04:26 UTC 2015
>Reporter: Matthias Veit
>Priority: Minor
>  Labels: newbie
>
> queued_tasks.executor_id is expected to be a string, not a complete JSON 
> object. It should have the very same format as the tasks array on the same 
> level.
> Example, taken directly from a slave:
> {noformat}
>  
> "queued_tasks": [
>   {
>     "data": "",
>     "executor_id": {
>       "command": {
>         "argv": [],
>         "uris": [
>           {
>             "executable": false,
>             "value": "http://downloads.foo.io/orchestra/storm-mesos/0.9.2-incubating-47-ovh.bb373df1c/storm-mesos-0.9.2-incubating.tgz"
>           }
>         ],
>         "value": "cd storm-mesos* && python bin/storm supervisor storm.mesos.MesosSupervisor"
>       },
>       "data": "{\"assignmentid\":\"srv4.hw.ca1.foo.com\",\"supervisorid\":\"srv4.hw.ca1.foo.com-stage-ingestion-stats-slave-111-1428421145\"}",
>       "executor_id": "stage-ingestion-stats-slave-111-1428421145",
>       "framework_id": "20150401-160104-251662508-5050-2197-0002",
>       "name": "",
>       "resources": {
>         "cpus": 0.5,
>         "disk": 0,
>         "mem": 1000
>       }
>     },
>     "id": "srv4.hw.ca1.foo.com-31708",
>     "name": "worker srv4.hw.ca1.foo.com:31708",
>     "resources": {
>       "cpus": 1,
>       "disk": 0,
>       "mem": 5120,
>       "ports": "[31708-31708]"
>     },
>     "slave_id": "20150327-025553-218108076-5050-4122-S0"
>   },
>   ...
> ]
> {noformat}





[jira] [Commented] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?

2015-04-14 Thread Matthias Veit (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493787#comment-14493787
 ] 

Matthias Veit commented on MESOS-2598:
--

Sure.
