[jira] [Updated] (MESOS-6114) ClassNotFoundException shows when loading java class in framework

2016-10-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6114:
---
Fix Version/s: (was: 0.28.3)

> ClassNotFoundException shows when loading java class in framework
> -
>
> Key: MESOS-6114
> URL: https://issues.apache.org/jira/browse/MESOS-6114
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Affects Versions: 0.28.1, 0.28.2, 1.0.0, 1.0.1
> Environment: Mesos 0.28.1
> Marathon 1.1.1
> os   redhat-7.2 (x86-64)
> kernel   3.10.0-327 (x86-64)
> Java openjdk-1.8.0_65
>Reporter: Sam chen
>  Labels: patch
>
> 1. We are developing a "scheduler" and an "executor" using Java.
> 2. We use our own Java ClassLoader.
> 3. A "ClassNotFoundException" is thrown when loading a Java class.
> After investigating Mesos, we found that it creates the JVM via JNI. When it 
> calls AttachCurrentThread, it does not set the thread's context class loader. 
> The error log is below:
> I0823 05:54:38.074373 8 logging.cpp:188] INFO level logging started!
> I0823 05:54:38.076400 8 exec.cpp:143] Version: 0.28.1
> I0823 05:54:38.080590 52 exec.cpp:217] Executor registered on slave 
> 326a-43cc-42f7-8e55-648bdc8cc9d8-S12
> Exception in thread "Thread-17" java.lang.NoClassDefFoundError: 
> com/googlecode/aviator/ClassExpression
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
> at 
> com.googlecode.aviator.parser.AviatorClassLoader.defineClass(AviatorClassLoader.java:35)
> at 
> com.googlecode.aviator.code.asm.ASMCodeGenerator.getResult(ASMCodeGenerator.java:664)
> at 
> com.googlecode.aviator.code.OptimizeCodeGenerator.getResult(OptimizeCodeGenerator.java:367)
> at 
> com.googlecode.aviator.parser.ExpressionParser.parse(ExpressionParser.java:681)
> at 
> com.googlecode.aviator.AviatorEvaluator.innerCompile(AviatorEvaluator.java:468)
> at 
> com.googlecode.aviator.AviatorEvaluator.compile(AviatorEvaluator.java:447)
> at 
> com.googlecode.aviator.AviatorEvaluator.compile(AviatorEvaluator.java:495)
> at com.cusi.babel.rwsplit.sync.convertor.ExprRule.<init>(ExprRule.java:20)
> at 
> com.cusi.babel.rwsplit.sync.convertor.ConvertRuleFactory.createConvertRule(ConvertRuleFactory.java:15)
> at 
> com.cusi.babel.rwsplit.sync.convertor.ColumnRule.<init>(ColumnRule.java:29)
> at com.cusi.babel.rwsplit.sync.convertor.TaskRule.<init>(TaskRule.java:27)
> at com.cusi.babel.rwsplit.sync.task.Task.<init>(Task.java:27)
> at com.cusi.babel.rwsplit.sync.Engine.startTask(Engine.java:103)
> at 
> com.cusi.babel.rwsplit.sync.mesos.SyncExecutor.launchTask(SyncExecutor.java:82)
> Caused by: java.lang.ClassNotFoundException: 
> com.googlecode.aviator.ClassExpression
> at java.lang.ClassLoader.findClass(ClassLoader.java:530)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 17 more
> I0823 05:54:38.873301 52 exec.cpp:425] Deactivating the executor libprocess
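
For context, a minimal JNI sketch of the kind of fix being suggested (a hypothetical helper, not Mesos source): when a native thread is attached to the JVM, its context class loader defaults to the system class loader, so classes defined by a custom framework class loader become invisible; explicitly propagating a saved class loader avoids the NoClassDefFoundError above.

{code}
#include <jni.h>

// Hypothetical helper, not Mesos source: attach the current native thread
// to the JVM and propagate a previously captured application class loader.
// `loader` must be a JNI global reference to the framework's class loader.
void attachWithContextClassLoader(JavaVM* jvm, jobject loader)
{
  JNIEnv* env = nullptr;
  jvm->AttachCurrentThread(reinterpret_cast<void**>(&env), nullptr);

  jclass threadClass = env->FindClass("java/lang/Thread");

  jmethodID currentThread = env->GetStaticMethodID(
      threadClass, "currentThread", "()Ljava/lang/Thread;");

  jmethodID setContextClassLoader = env->GetMethodID(
      threadClass, "setContextClassLoader", "(Ljava/lang/ClassLoader;)V");

  // Equivalent to Thread.currentThread().setContextClassLoader(loader).
  jobject thread = env->CallStaticObjectMethod(threadClass, currentThread);
  env->CallVoidMethod(thread, setContextClassLoader, loader);
}
{code}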





[jira] [Commented] (MESOS-6134) Port CFS quota support to Docker Containerizer using command executor

2016-10-10 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562343#comment-15562343
 ] 

Alexander Rukletsov commented on MESOS-6134:


Retargeting this for 1.2; please speak up if you want to land this in 1.1.0.

> Port CFS quota support to Docker Containerizer using command executor
> -
>
> Key: MESOS-6134
> URL: https://issues.apache.org/jira/browse/MESOS-6134
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> MESOS-2154 only partially fixed CFS quota support in the Docker 
> Containerizer: that fix only works for custom executors.
> This tracks the fix for the command executor so we can declare the feature 
> complete.
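
For reference, the arithmetic involved is small; a sketch (constants assumed here to mirror the kernel defaults, not quoted from the Mesos source) of how a CPU allocation translates into a CFS quota per period:

{code}
#include <algorithm>
#include <chrono>

using std::chrono::microseconds;

// Assumed constants mirroring the kernel defaults: CFS enforces the quota
// within each 100ms period, and the smallest accepted quota is 1ms.
constexpr microseconds CPU_CFS_PERIOD{100000};
constexpr microseconds MIN_CPU_CFS_QUOTA{1000};

// A task with `cpus` CPUs may run `cpus * period` microseconds per period.
microseconds cfsQuota(double cpus)
{
  const auto quota = microseconds(
      static_cast<microseconds::rep>(cpus * CPU_CFS_PERIOD.count()));

  return std::max(quota, MIN_CPU_CFS_QUOTA);
}
{code}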





[jira] [Updated] (MESOS-6134) Port CFS quota support to Docker Containerizer using command executor.

2016-10-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6134:
---
Target Version/s: 1.2.0
   Fix Version/s: (was: 1.1.0)
 Summary: Port CFS quota support to Docker Containerizer using 
command executor.  (was: Port CFS quota support to Docker Containerizer using 
command executor)

> Port CFS quota support to Docker Containerizer using command executor.
> --
>
> Key: MESOS-6134
> URL: https://issues.apache.org/jira/browse/MESOS-6134
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> MESOS-2154 only partially fixed CFS quota support in the Docker 
> Containerizer: that fix only works for custom executors.
> This tracks the fix for the command executor so we can declare the feature 
> complete.





[jira] [Commented] (MESOS-6178) LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume is flaky

2016-10-10 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562347#comment-15562347
 ] 

Alexander Rukletsov commented on MESOS-6178:


[~greggomann] was this really fixed in 1.0.1 or just targeted? Could you please 
adjust target/fix versions?

> LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume is flaky
> -
>
> Key: MESOS-6178
> URL: https://issues.apache.org/jira/browse/MESOS-6178
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Greg Mann
>  Labels: mesosphere, tests
> Fix For: 1.0.1
>
>
> Observed on our internal CI, on both Ubuntu and CentOS:
> {code}
> [21:37:58] :   [Step 10/10] [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.874855 23546 cluster.cpp:157] 
> Creating default 'local' authorizer
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.884749 23546 leveldb.cpp:174] 
> Opened db in 9.758968ms
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.887332 23546 leveldb.cpp:181] 
> Compacted db in 2.564813ms
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.887353 23546 leveldb.cpp:196] 
> Created db iterator in 3316ns
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.887362 23546 leveldb.cpp:202] 
> Seeked to beginning of db in 596ns
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.887367 23546 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 358ns
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.887377 23546 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.887632 23566 recover.cpp:451] 
> Starting replica recovery
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.887719 23566 recover.cpp:477] 
> Replica is in EMPTY status
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888072 23565 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5938)@172.30.2.175:42074
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888202 23564 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888434 23567 recover.cpp:568] 
> Updating replica status to STARTING
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888592 23560 master.cpp:380] 
> Master 0b0fca1d-a807-4831-9cf1-70f31c3c25d3 (ip-172-30-2-175.mesosphere.io) 
> started on 172.30.2.175:42074
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888602 23560 master.cpp:382] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/5wLQhS/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/5wLQhS/master" --zk_session_timeout="10secs"
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888710 23560 master.cpp:432] 
> Master only allowing authenticated frameworks to register
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888715 23560 master.cpp:446] 
> Master only allowing authenticated agents to register
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888717 23560 master.cpp:459] 
> Master only allowing authenticated HTTP frameworks to register
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888720 23560 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/5wLQhS/credentials'
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.888793 23560 master.cpp:504] Using 
> default 'crammd5' authenticator
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.31 23560 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.64 23560 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> [21:37:58]W:   [Step 10/10] I0915 21:37:58.85 23560 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-sched

[jira] [Updated] (MESOS-6236) Launch subprocesses associated with specified namespaces.

2016-10-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6236:
---
Fix Version/s: (was: 1.1.0)

> Launch subprocesses associated with specified namespaces.
> -
>
> Key: MESOS-6236
> URL: https://issues.apache.org/jira/browse/MESOS-6236
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> Currently there is no standard way in Mesos to launch a child process in a 
> different namespace (e.g., {{net}}, {{mnt}}). A user may leverage 
> {{Subprocess}} and provide their own {{clone}} callback, but this approach is 
> error-prone.
> One possible solution is to implement a {{Subprocess}} child hook. In 
> [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we introduced 
> a child hook framework in {{Subprocess}} and implemented three child hooks: 
> {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing another 
> child hook, {{SETNS}}, so that other components (e.g., health checks) can use 
> it to enter the namespaces of a specific process, as sketched below.
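
A minimal sketch of what the body of such a {{SETNS}} hook could do (assumed code, not the eventual implementation), based on {{setns(2)}}:

{code}
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

#include <string>

// Assumed sketch of the work a SETNS child hook would perform between
// fork/clone and exec: enter one namespace (e.g., "net" or "mnt") of the
// target process `pid`.
static int enterNamespace(pid_t pid, const std::string& type)
{
  const std::string path =
    "/proc/" + std::to_string(pid) + "/ns/" + type;

  int fd = ::open(path.c_str(), O_RDONLY);
  if (fd < 0) {
    return -1;
  }

  // setns(2) associates the calling thread with the namespace behind `fd`;
  // passing 0 skips the namespace-type check.
  int result = ::setns(fd, 0);

  ::close(fd);
  return result;
}
{code}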





[jira] [Updated] (MESOS-6014) Create a CNI plugin that provides port mapping functionality for various CNI plugins.

2016-10-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6014:
---
Target Version/s: 1.1.0
   Fix Version/s: (was: 1.1.0)
 Summary: Create a CNI plugin that provides port mapping 
functionality for various CNI plugins.  (was: Create a CNI plugin that provides 
port mapping functionality for various CNI plugins)

> Create a CNI plugin that provides port mapping functionality for various CNI 
> plugins.
> -
>
> Key: MESOS-6014
> URL: https://issues.apache.org/jira/browse/MESOS-6014
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently there is no CNI plugin that supports port mapping. Given that the 
> unified containerizer is becoming the de-facto container runtime, having a 
> CNI plugin that provides port mapping is a must-have. It is primarily 
> required to support BRIDGE networking mode, similar to the Docker bridge 
> networking that users expect when using Docker containers. 
> While the most obvious use case is using the port-mapper plugin with the 
> bridge plugin, the port-mapping functionality itself is generic and should 
> be usable with any CNI plugin that needs it.
> Keeping port mapping as a CNI plugin gives operators the ability to use the 
> default port-mapper (CNI plugin) that Mesos provides, or to use their own 
> plugin.
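
One way the configuration could look (illustrative only; the plugin name and fields are hypothetical at this stage): the port-mapper is named as the network's plugin and wraps the real network plugin as a delegate, so it can install the port-mapping rules before delegating interface setup:

{noformat}
{
  "name": "mesos-bridge",
  "type": "mesos-cni-port-mapper",
  "chain": "MESOS-PORT-MAPPER",
  "delegate": {
    "type": "bridge",
    "bridge": "mesos-cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.0.0/16"
    }
  }
}
{noformat}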





[jira] [Commented] (MESOS-6264) Investigate the high memory usage of the default executor.

2016-10-10 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562350#comment-15562350
 ] 

Alexander Rukletsov commented on MESOS-6264:


Retargeting this for 1.2.0; please speak up if you want to land this in 1.1.0.

> Investigate the high memory usage of the default executor.
> --
>
> Key: MESOS-6264
> URL: https://issues.apache.org/jira/browse/MESOS-6264
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 1.1.0
>
> Attachments: pmap_output_for_the_default_executor.txt
>
>
> It seems that a default executor with two sleep tasks uses ~32 MB on 
> average, which can sometimes lead to it being killed in some tests, such as 
> {{SlaveRecoveryTest/0.ROOT_CGROUPS_ReconnectDefaultExecutor}} on our internal 
> CI. Attached is the {{pmap}} output for the default executor. Please note 
> that the command executor's memory usage is also pretty high (~26 MB).





[jira] [Updated] (MESOS-6264) Investigate the high memory usage of the default executor.

2016-10-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6264:
---
Target Version/s: 1.2.0
   Fix Version/s: (was: 1.1.0)

> Investigate the high memory usage of the default executor.
> --
>
> Key: MESOS-6264
> URL: https://issues.apache.org/jira/browse/MESOS-6264
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: mesosphere
> Attachments: pmap_output_for_the_default_executor.txt
>
>
> It seems that a default executor with two sleep tasks uses ~32 MB on 
> average, which can sometimes lead to it being killed in some tests, such as 
> {{SlaveRecoveryTest/0.ROOT_CGROUPS_ReconnectDefaultExecutor}} on our internal 
> CI. Attached is the {{pmap}} output for the default executor. Please note 
> that the command executor's memory usage is also pretty high (~26 MB).





[jira] [Updated] (MESOS-6290) Support nested containers for logger in Mesos Containerizer.

2016-10-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6290:
---
Target Version/s: 1.2.0
   Fix Version/s: (was: 1.1.0)

> Support nested containers for logger in Mesos Containerizer.
> 
>
> Key: MESOS-6290
> URL: https://issues.apache.org/jira/browse/MESOS-6290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer, logger, mesosphere
>
> Currently, there are two issues in the Mesos containerizer when using the 
> logger for nested containers:
> 1. An empty {{ExecutorInfo}} is passed to the logger when launching a nested 
> container, which could break some logger modules if any module tries to 
> access a required proto field (e.g., {{executorId}}).
> 2. The logger does not yet recover nested containers in 
> {{MesosContainerizer::recover}}.





[jira] [Commented] (MESOS-6290) Support nested containers for logger in Mesos Containerizer.

2016-10-10 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562364#comment-15562364
 ] 

Alexander Rukletsov commented on MESOS-6290:


Retargeting this for 1.2; please speak up if you want to land this in 1.1.0.

> Support nested containers for logger in Mesos Containerizer.
> 
>
> Key: MESOS-6290
> URL: https://issues.apache.org/jira/browse/MESOS-6290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer, logger, mesosphere
> Fix For: 1.1.0
>
>
> Currently, there are two issues in the Mesos containerizer when using the 
> logger for nested containers:
> 1. An empty {{ExecutorInfo}} is passed to the logger when launching a nested 
> container, which could break some logger modules if any module tries to 
> access a required proto field (e.g., {{executorId}}).
> 2. The logger does not yet recover nested containers in 
> {{MesosContainerizer::recover}}.





[jira] [Updated] (MESOS-4995) Make it possible to directly defer invocations of const member functions.

2016-10-11 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4995:
---
Summary: Make it possible to directly defer invocations of const member 
functions.  (was: Make it possible to directly defer invocations of const 
member functions)

> Make it possible to directly defer invocations of const member functions.
> -
>
> Key: MESOS-4995
> URL: https://issues.apache.org/jira/browse/MESOS-4995
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie++
>
> Currently libprocess' {{defer}} provides no overloads to invoke {{const}} 
> member functions.
> This has led to a situation where effectively {{const}} getters are often 
> not made {{const}}, purely to allow straightforward usage of {{defer}}, and 
> to surprising API choices motivated only by limitations in low-level 
> infrastructure (here: {{defer}}).
> We should augment {{defer}} with overloads that allow deferring invocations 
> of {{const}} member functions, and tighten up the interfaces of existing 
> code where possible.
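
A small illustration of the problem and the missing overload (a sketch, not libprocess source; the exact return type of {{defer}} is elided):

{code}
#include <cstddef>
#include <vector>

#include <process/process.hpp>

class QueueProcess : public process::Process<QueueProcess>
{
public:
  // An effectively const getter. Today, authors tend to drop the `const`
  // qualifier here purely so that `defer` accepts the member function.
  size_t size() const { return items.size(); }

private:
  std::vector<int> items;
};

// Without a const overload, this fails to compile:
//
//   process::defer(pid, &QueueProcess::size);
//
// A hypothetical additional overload would accept pointers to const
// member functions, e.g.:
//
//   template <typename R, typename T>
//   auto defer(const process::PID<T>& pid, R (T::*method)() const);
{code}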





[jira] [Updated] (MESOS-6355) Improvements to task group support.

2016-10-11 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6355:
---
Summary: Improvements to task group support.  (was: Improvements to task 
group support)

> Improvements to task group support.
> ---
>
> Key: MESOS-6355
> URL: https://issues.apache.org/jira/browse/MESOS-6355
> Project: Mesos
>  Issue Type: Epic
>Reporter: Vinod Kone
>
> This is a follow-up epic to MESOS-2249 to capture further improvements and 
> changes that need to be made to the MVP.





[jira] [Updated] (MESOS-2449) Support group of tasks (Pod) constructs and API in Mesos.

2016-10-11 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-2449:
---
Summary: Support group of tasks (Pod) constructs and API in Mesos.  (was: 
Support group of tasks (Pod) constructs and API in Mesos)

> Support group of tasks (Pod) constructs and API in Mesos.
> -
>
> Key: MESOS-2449
> URL: https://issues.apache.org/jira/browse/MESOS-2449
> Project: Mesos
>  Issue Type: Epic
>Reporter: Timothy Chen
>  Labels: mesosphere
>
> There is a common need among different frameworks to start a group of tasks 
> that are either dependent on or co-located with each other.
> Although a framework can schedule individual tasks within the same offer and 
> slave id, it doesn't have a way to describe dependencies, failure policies 
> (if one of the tasks fails), network setup, group container information, 
> etc.
> This epic is meant to start the discussion around the requirements folks 
> need, and see where we can lead this.





[jira] [Updated] (MESOS-4828) XFS disk quota isolator

2016-10-11 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4828:
---
Target Version/s: 1.2.0
   Fix Version/s: (was: 1.2.0)

> XFS disk quota isolator
> ---
>
> Key: MESOS-4828
> URL: https://issues.apache.org/jira/browse/MESOS-4828
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation
>Reporter: James Peach
>Assignee: James Peach
>
> Implement a disk resource isolator using XFS project quotas. Compared to the 
> {{posix/disk}} isolator, this doesn't need to scan the filesystem 
> periodically, and applications receive an {{EDQUOT}} error instead of being 
> summarily killed.
> This initial implementation only isolates sandbox directory resources, since 
> the isolator doesn't have any visibility into the lifecycle of volumes, 
> which is needed to assign and track project IDs.
> The build dependencies for this are the XFS headers (from xfsprogs-devel) and 
> libblkid. We need libblkid or the equivalent to map filesystem paths to block 
> devices in order to apply quotas, as sketched below.
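
For a sense of the mechanism, a hedged sketch (assumed interfaces, not the Mesos implementation) of setting a project-quota block limit via {{quotactl(2)}}:

{code}
#include <linux/dqblk_xfs.h>
#include <sys/quota.h>

#include <cstdint>
#include <cstring>
#include <string>

// Sketch: apply an XFS project-quota block hard limit. `device` is the
// block device backing the filesystem (the real isolator would resolve it
// with libblkid); `projectId` identifies the sandbox directory tree.
bool setProjectQuota(
    const std::string& device,
    uint32_t projectId,
    uint64_t limitBytes)
{
  fs_disk_quota_t quota;
  memset(&quota, 0, sizeof(quota));

  quota.d_version = FS_DQUOT_VERSION;
  quota.d_id = projectId;
  quota.d_flags = FS_PROJ_QUOTA;
  quota.d_fieldmask = FS_DQ_BHARD;
  quota.d_blk_hardlimit = limitBytes / 512;  // XFS counts 512-byte blocks.

  return ::quotactl(
      QCMD(Q_XSETQLIM, PRJQUOTA),
      device.c_str(),
      projectId,
      reinterpret_cast<caddr_t>(&quota)) == 0;
}
{code}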





[jira] [Updated] (MESOS-5344) Partition-aware Mesos frameworks

2016-10-11 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5344:
---
Target Version/s: 1.1.0

> Partition-aware Mesos frameworks
> 
>
> Key: MESOS-5344
> URL: https://issues.apache.org/jira/browse/MESOS-5344
> Project: Mesos
>  Issue Type: Epic
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere
>
> This epic covers three related tasks:
> 1. Allowing partitioned agents to reregister with the master. This allows 
> frameworks to control how tasks running on partitioned agents should be dealt 
> with.
> 2. Replacing the TASK_LOST task state with a set of more granular states with 
> more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and 
> GONE_BY_OPERATOR.
> 3. Allowing frameworks to be informed when a task that was running on a 
> partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states).
> These new behaviors will be guarded by the {{PARTITION_AWARE}} framework 
> capability.
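
For reference, a framework opts in by advertising the capability at registration time; a minimal sketch (assuming the {{FrameworkInfo::Capability}} enum value this epic introduces):

{code}
#include <mesos/mesos.hpp>

// Sketch: opt a framework into partition-aware semantics by advertising
// the PARTITION_AWARE capability in its FrameworkInfo.
mesos::FrameworkInfo makePartitionAwareFramework()
{
  mesos::FrameworkInfo framework;
  framework.set_user("framework-user");
  framework.set_name("partition-aware-framework");

  framework.add_capabilities()->set_type(
      mesos::FrameworkInfo::Capability::PARTITION_AWARE);

  return framework;
}
{code}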





[jira] [Updated] (MESOS-5344) Partition-aware Mesos frameworks

2016-10-11 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5344:
---
Fix Version/s: (was: 1.1.0)

> Partition-aware Mesos frameworks
> 
>
> Key: MESOS-5344
> URL: https://issues.apache.org/jira/browse/MESOS-5344
> Project: Mesos
>  Issue Type: Epic
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere
>
> This epic covers three related tasks:
> 1. Allowing partitioned agents to reregister with the master. This allows 
> frameworks to control how tasks running on partitioned agents should be dealt 
> with.
> 2. Replacing the TASK_LOST task state with a set of more granular states with 
> more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and 
> GONE_BY_OPERATOR.
> 3. Allowing frameworks to be informed when a task that was running on a 
> partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states).
> These new behaviors will be guarded by the {{PARTITION_AWARE}} framework 
> capability.





[jira] [Updated] (MESOS-6290) Support nested containers for logger in Mesos Containerizer.

2016-10-12 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6290:
---
Priority: Blocker  (was: Major)

> Support nested containers for logger in Mesos Containerizer.
> 
>
> Key: MESOS-6290
> URL: https://issues.apache.org/jira/browse/MESOS-6290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: containerizer, logger, mesosphere
>
> Currently, there are two issues in the Mesos containerizer when using the 
> logger for nested containers:
> 1. An empty {{ExecutorInfo}} is passed to the logger when launching a nested 
> container, which could break some logger modules if any module tries to 
> access a required proto field (e.g., {{executorId}}).
> 2. The logger does not yet recover nested containers in 
> {{MesosContainerizer::recover}}.





[jira] [Updated] (MESOS-6293) HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.

2016-10-13 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6293:
---
Shepherd: Alexander Rukletsov

> HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.
> 
>
> Key: MESOS-6293
> URL: https://issues.apache.org/jira/browse/MESOS-6293
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: health-check, mesosphere
>
> I see consistent failures of this test in the internal CI on *some* distros, 
> specifically CentOS 6 and Ubuntu 14, 15, and 16. The source of the health 
> check failure is always the same: {{curl}} cannot connect to the target:
> {noformat}
> Received task health update, healthy: false
> W0929 17:22:05.270992  2730 health_checker.cpp:204] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) couldn't connect to host
> I0929 17:22:05.273634 26850 slave.cpp:3609] Handling status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from executor(1)@172.30.2.20:58660
> I0929 17:22:05.274178 26844 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274226 26844 status_update_manager.cpp:377] Forwarding update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to the agent
> I0929 17:22:05.274314 26845 slave.cpp:4026] Forwarding the update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to master@172.30.2.20:38955
> I0929 17:22:05.274415 26845 slave.cpp:3920] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274436 26845 slave.cpp:3936] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for 
> task aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of 
> framework 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to 
> executor(1)@172.30.2.20:58660
> I0929 17:22:05.274534 26849 master.cpp:5661] Status update TASK_RUNNING 
> (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from agent 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-S0 at slave(77)@172.30.2.20:38955 
> (ip-172-30-2-20.mesosphere.io)
> ../../src/tests/health_check_tests.cpp:1398: Failure
> I0929 17:22:05.274567 26849 master.cpp:5723] Forwarding status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> Value of: statusHealth.get().healthy()
>   Actual: false
>   Expected: true
> I0929 17:22:05.274636 26849 master.cpp:7560] Updating the state of task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- (latest state: TASK_RUNNING, status 
> update state: TASK_RUNNING)
> I0929 17:22:05.274829 26844 sched.cpp:1025] Scheduler::statusUpdate took 
> 43297ns
> Received SHUTDOWN event
> {noformat}





[jira] [Updated] (MESOS-6387) Improve reporting of parallel test runner.

2016-10-13 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6387:
---
Labels: mesosphere newbie python  (was: newbie python)

> Improve reporting of parallel test runner.
> --
>
> Key: MESOS-6387
> URL: https://issues.apache.org/jira/browse/MESOS-6387
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie, python
>
> We should improve the logging of the parallel test runner. Improved logging 
> seems to require some parsing of GoogleTest output, e.g., to prevent 
> interleaving the output of concurrently executing tests.
> We should add a verbose mode which prints results as they arrive from 
> shards, so users are not left wondering whether tests are executing.
> We should also provide a way to properly unify output from different shards, 
> e.g., report passed and failed tests in a single list instead of listing all 
> shards separately (as it stands, failed tests from shards reported first are 
> harder to discover).





[jira] [Updated] (MESOS-6387) Improve reporting of parallel test runner.

2016-10-13 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6387:
---
Summary: Improve reporting of parallel test runner.  (was: Improve 
reporting of parallel test runner)

> Improve reporting of parallel test runner.
> --
>
> Key: MESOS-6387
> URL: https://issues.apache.org/jira/browse/MESOS-6387
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie, python
>
> We should improve the logging of the parallel test runner. Improved logging 
> seems to require some parsing of GoogleTest output, e.g., to prevent 
> interleaving the output of concurrently executing tests.
> We should add a verbose mode which prints results as they arrive from 
> shards, so users are not left wondering whether tests are executing.
> We should also provide a way to properly unify output from different shards, 
> e.g., report passed and failed tests in a single list instead of listing all 
> shards separately (as it stands, failed tests from shards reported first are 
> harder to discover).





[jira] [Commented] (MESOS-6387) Improve reporting of parallel test runner.

2016-10-13 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573204#comment-15573204
 ] 

Alexander Rukletsov commented on MESOS-6387:


The issue I have with parallel tests is that it is unclear how many tests have 
failed. I have to scroll up to gather a list of failed tests, because there is 
no aggregation among shards. For example, in this run two tests have actually 
failed, which is unclear from the output:
{noformat}
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DefaultExecutorTest.KillTaskGroupOnTaskFailure

 1 FAILED TEST
  YOU HAVE 4 DISABLED TESTS



[FAIL]
make[3]: *** [check-local] Error 2
make[2]: *** [check-am] Error 2
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}

> Improve reporting of parallel test runner.
> --
>
> Key: MESOS-6387
> URL: https://issues.apache.org/jira/browse/MESOS-6387
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie, python
>
> We should improve the logging of the parallel test runner. Improved logging 
> seems to require some parsing of GoogleTest output, e.g., to prevent 
> interleaving the output of concurrently executing tests.
> We should add a verbose mode which prints results as they arrive from 
> shards, so users are not left wondering whether tests are executing.
> We should also provide a way to properly unify output from different shards, 
> e.g., report passed and failed tests in a single list instead of listing all 
> shards separately (as it stands, failed tests from shards reported first are 
> harder to discover).
> Distinguishing reports from tests run in parallel and sequentially might be 
> useful as well.





[jira] [Commented] (MESOS-6140) Add a parallel test runner

2016-10-13 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573226#comment-15573226
 ] 

Alexander Rukletsov commented on MESOS-6140:


{noformat}
Commit: f67e8b572c3e8111f7ab79b65ed5e3dbdd20cb46 [f67e8b5]
Author: Benjamin Bannier benjamin.bann...@mesosphere.io
Date: 13 October 2016 at 23:23:08 GMT+2
Committer: Alexander Rukletsov al...@apache.org

Provided more information on failed parallel tests.

In case of failed tests the parallel test runner always prints the
full log of all failed shards. Since our tests usually produce more
than a screenful of output, only failed tests from the shard
reported last will be visible, which can give the impression that
only this shard failed.

This commit adds the number of failed shards to the output so users
get a hint that more failed tests might be reported off their
visible screen area.

Review: https://reviews.apache.org/r/52841/
{noformat}

> Add a parallel test runner
> --
>
> Key: MESOS-6140
> URL: https://issues.apache.org/jira/browse/MESOS-6140
> Project: Mesos
>  Issue Type: Improvement
>  Components: tests
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> In order to allow parallelization of the test execution we should add a 
> parallel test executor to Mesos, and subsequently activate it in the build 
> setup.





[jira] [Updated] (MESOS-6344) Allow `network/cni` isolator to take a search path for CNI plugins instead of single directory

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6344:
---
Priority: Blocker  (was: Major)

> Allow `network/cni` isolator to take a search path for CNI plugins instead of 
> single directory
> --
>
> Key: MESOS-6344
> URL: https://issues.apache.org/jira/browse/MESOS-6344
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Currently the `network/cni` isolator expects a single directory in the 
> `--network_cni_plugins_dir` flag. This is very limiting because it forces the 
> operator to put all the CNI plugins in the same directory. 
> With the Mesos port-mapper CNI plugin, this would also imply that the 
> operator would have to move this plugin from the Mesos installation directory 
> to the directory specified in `--network_cni_plugins_dir`. 
> To simplify the operator's experience, it would make sense for the 
> `--network_cni_plugins_dir` flag to take a set of directories instead of a 
> single directory. The `network/cni` isolator can then search this set of 
> directories to find a CNI plugin, as illustrated below.
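
For example, an agent could then be started with a search path (hypothetical value; the exact separator and locations are part of the proposed change, not settled here):

{noformat}
mesos-agent --network_cni_plugins_dir=/opt/cni/bin:/usr/libexec/mesos/cni ...
{noformat}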





[jira] [Updated] (MESOS-6023) Create a binary for the port-mapper plugin

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6023:
---
Priority: Blocker  (was: Major)

> Create a binary for the port-mapper plugin
> --
>
> Key: MESOS-6023
> URL: https://issues.apache.org/jira/browse/MESOS-6023
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>
> The CNI port mapper plugin needs to be a separate binary that will be invoked 
> by the `network/cni` isolator as a CNI plugin.





[jira] [Updated] (MESOS-6040) Add a CMake build for `mesos-port-mapper`

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6040:
---
Priority: Blocker  (was: Major)

> Add a CMake build for `mesos-port-mapper`
> -
>
> Key: MESOS-6040
> URL: https://issues.apache.org/jira/browse/MESOS-6040
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Once the port-mapper binary compiles with GNU make, we need to modify the 
> CMake build so it produces the port-mapper binary as well. 





[jira] [Updated] (MESOS-6014) Create a CNI plugin that provides port mapping functionality for various CNI plugins.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6014:
---
Priority: Blocker  (was: Major)

> Create a CNI plugin that provides port mapping functionality for various CNI 
> plugins.
> -
>
> Key: MESOS-6014
> URL: https://issues.apache.org/jira/browse/MESOS-6014
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Currently there is no CNI plugin that supports port mapping. Given that the 
> unified containerizer is becoming the de-facto container runtime, having a 
> CNI plugin that provides port mapping is a must-have. It is primarily 
> required to support BRIDGE networking mode, similar to the Docker bridge 
> networking that users expect when using Docker containers. 
> While the most obvious use case is using the port-mapper plugin with the 
> bridge plugin, the port-mapping functionality itself is generic and should 
> be usable with any CNI plugin that needs it.
> Keeping port mapping as a CNI plugin gives operators the ability to use the 
> default port-mapper (CNI plugin) that Mesos provides, or to use their own 
> plugin.





[jira] [Updated] (MESOS-2449) Support group of tasks (Pod) constructs and API in Mesos.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-2449:
---
Priority: Blocker  (was: Major)

> Support group of tasks (Pod) constructs and API in Mesos.
> -
>
> Key: MESOS-2449
> URL: https://issues.apache.org/jira/browse/MESOS-2449
> Project: Mesos
>  Issue Type: Epic
>Reporter: Timothy Chen
>Priority: Blocker
>  Labels: mesosphere
>
> There is a common need among different frameworks to start a group of tasks 
> that are either dependent on or co-located with each other.
> Although a framework can schedule individual tasks within the same offer and 
> slave id, it doesn't have a way to describe dependencies, failure policies 
> (if one of the tasks fails), network setup, group container information, 
> etc.
> This epic is meant to start the discussion around the requirements folks 
> need, and see where we can lead this.





[jira] [Commented] (MESOS-6283) Fix the Web UI allowing access to the task sandbox for nested containers.

2016-10-14 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575090#comment-15575090
 ] 

Alexander Rukletsov commented on MESOS-6283:


Folks, do you still plan to land it in 1.1.0?

> Fix the Web UI allowing access to the task sandbox for nested containers.
> -
>
> Key: MESOS-6283
> URL: https://issues.apache.org/jira/browse/MESOS-6283
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: Anand Mazumdar
>Assignee: haosdent
>Priority: Blocker
>  Labels: mesosphere
> Attachments: sandbox.gif
>
>
> Currently, the sandbox button for a child task is broken on the WebUI. It 
> does nothing and dies with an error that the executor for this task cannot be 
> found. We need to fix the WebUI to follow the symlink "tasks/taskId" and 
> display the task sandbox to the users.





[jira] [Commented] (MESOS-6335) Add user doc for task group tasks

2016-10-14 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575094#comment-15575094
 ] 

Alexander Rukletsov commented on MESOS-6335:


Is this still planned to land in 1.1.0?

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>






[jira] [Commented] (MESOS-6134) Port CFS quota support to Docker Containerizer using command executor.

2016-10-14 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575096#comment-15575096
 ] 

Alexander Rukletsov commented on MESOS-6134:


This issue is delaying the 1.1.0 release and has shown no progress in the last 
few days. It is being retargeted for 1.2.0.

> Port CFS quota support to Docker Containerizer using command executor.
> --
>
> Key: MESOS-6134
> URL: https://issues.apache.org/jira/browse/MESOS-6134
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> MESOS-2154 only partially fixed CFS quota support in the Docker 
> Containerizer: that fix only works for custom executors.
> This tracks the fix for the command executor so we can declare the feature 
> complete.





[jira] [Updated] (MESOS-6134) Port CFS quota support to Docker Containerizer using command executor.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6134:
---
Target Version/s: 1.2.0  (was: 1.1.0)

> Port CFS quota support to Docker Containerizer using command executor.
> --
>
> Key: MESOS-6134
> URL: https://issues.apache.org/jira/browse/MESOS-6134
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> MESOS-2154 only partially fixed CFS quota support in the Docker 
> Containerizer: that fix only works for custom executors.
> This tracks the fix for the command executor so we can declare the feature 
> complete.





[jira] [Updated] (MESOS-6376) Add documentation for capabilities support of the mesos containerizer

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6376:
---
Priority: Blocker  (was: Major)

> Add documentation for capabilities support of the mesos containerizer
> -
>
> Key: MESOS-6376
> URL: https://issues.apache.org/jira/browse/MESOS-6376
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Blocker
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-6376) Add documentation for capabilities support of the mesos containerizer

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6376:
---
Story Points: 3  (was: 1)

> Add documentation for capabilities support of the mesos containerizer
> -
>
> Key: MESOS-6376
> URL: https://issues.apache.org/jira/browse/MESOS-6376
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Blocker
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-6337) Nested containers getting killed before network isolation can be applied to them.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6337:
---
Target Version/s:   (was: 1.1.0)

> Nested containers getting killed before network isolation can be applied to 
> them.
> -
>
> Key: MESOS-6337
> URL: https://issues.apache.org/jira/browse/MESOS-6337
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Gilbert Song
>  Labels: mesosphere
>
> Seeing this odd behavior in one of our clusters:
> ```
> http.cpp:1948] Failed to launch nested container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: 
> Collect failed: Failed to seed container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: 
> Collect failed: Failed to setup hostname and network files: Failed to enter 
> the mount namespace of pid 21591: Pid 21591 does not exist
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.894485 
> 31531 containerizer.cpp:1931] Destroying container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e in 
> ISOLATING state
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.894439 
> 31531 containerizer.cpp:2300] Container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e has 
> exited
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.854456 
> 31534 systemd.cpp:96] Assigned child process '21591' to 
> 'mesos_executors.slice'
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.831861 
> 21580 process.cpp:882] Failed SSL connections will be downgraded to a non-SSL 
> socket
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set 
> LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831526 
> 21580 openssl.cpp:432] Will only verify peer certificate if presented!
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set 
> LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831521 
> 21580 openssl.cpp:426] Will not verify peer certificate!
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831511 
> 21580 openssl.cpp:421] CA directory path unspecified! NOTE: Set CA directory 
> path with LIBPROCESS_SSL_CA_DIR=
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.831405 
> 21580 openssl.cpp:399] Failed SSL connections will be downgraded to a non-SSL 
> socket
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: WARNING: Logging before 
> InitGoogleLogging() is written to STDERR
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.828413 
> 21581 process.cpp:882] Failed SSL connections will be downgraded to a non-SSL 
> socket
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set 
> LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
> ```
> The above log is "reverse" chronological order, so please read it bottom up.
> The relevant log is:
> ```
> http.cpp:1948] Failed to launch nested container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: 
> Collect failed: Failed to seed container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: 
> Collect failed: Failed to setup hostname and network files: Failed to enter 
> the mount namespace of pid 21591: Pid 21591 does not exist
> ```
> It looks like the nested container failed to launch because the `isolate` 
> call to the `network/cni` isolator failed. It seems that by the time the 
> isolator received the `isolate` call, the PID for the nested container had 
> already exited, so it couldn't enter its mount namespace to set up the 
> network files. 
> The odd thing here is that the nested container should have been frozen, and 
> hence not running, so it is not clear what killed the nested container. My 
> suspicion falls on systemd, since I also see this log message:
> ```
> Oct 07 18:02:31 ip-10-10-0-207 mesos-agent[31520]: I1007 18:02:31.473656 
> 31532 systemd.cpp:96] Assigned child process '1596' to 'mesos_executors.slice'
> ```





[jira] [Updated] (MESOS-6391) Command task's sandbox should not be owned by root if it uses container image.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6391:
---
Priority: Blocker  (was: Major)

> Command task's sandbox should not be owned by root if it uses container image.
> --
>
> Key: MESOS-6391
> URL: https://issues.apache.org/jira/browse/MESOS-6391
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.1
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Blocker
>
> Currently, if the task defines a container image, the command executor will 
> be run under root because it needs to perform pivot_root.
> That means if the task wants to run under an unprivileged user, the sandbox 
> of that task will not be writable by that user, because it's owned by root.





[jira] [Updated] (MESOS-5344) Partition-aware Mesos frameworks.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5344:
---
Summary: Partition-aware Mesos frameworks.  (was: Partition-aware Mesos 
frameworks)

> Partition-aware Mesos frameworks.
> -
>
> Key: MESOS-5344
> URL: https://issues.apache.org/jira/browse/MESOS-5344
> Project: Mesos
>  Issue Type: Epic
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere
>
> This epic covers three related tasks:
> 1. Allowing partitioned agents to reregister with the master. This allows 
> frameworks to control how tasks running on partitioned agents should be dealt 
> with.
> 2. Replacing the TASK_LOST task state with a set of more granular states with 
> more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and 
> GONE_BY_OPERATOR.
> 3. Allowing frameworks to be informed when a task that was running on a 
> partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states).
> These new behaviors will be guarded by the {{PARTITION_AWARE}} framework 
> capability.





[jira] [Created] (MESOS-6394) Improvements to partition-aware Mesos frameworks.

2016-10-14 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-6394:
--

 Summary: Improvements to partition-aware Mesos frameworks.
 Key: MESOS-6394
 URL: https://issues.apache.org/jira/browse/MESOS-6394
 Project: Mesos
  Issue Type: Epic
  Components: master
Reporter: Alexander Rukletsov
Assignee: Neil Conway


This is a follow-up epic to MESOS-5344 to capture further improvements and 
changes that need to be made to the MVP.





[jira] [Commented] (MESOS-6035) Add non-recursive version of cgroups::get

2016-10-14 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575167#comment-15575167
 ] 

Alexander Rukletsov commented on MESOS-6035:


{noformat}
Commit: fcd5106b5dfa14bc83eae68415bd4782c16f79a4 [fcd5106]
Parents: 9fc2901d23
Author: Alexander Rukletsov 
Date: 14 October 2016 at 14:10:23 GMT+2
Commit Date: 14 October 2016 at 14:15:02 GMT+2
Labels: HEAD -> master

Revert "Removed the expired TODO about non-recursive version...
`cgroups::get`."

This reverts commit e042aa071a77ef1922d9b1a93f6e8adf221979b3.

RR https://reviews.apache.org/r/51185/ should have been committed
together with https://reviews.apache.org/r/51031/. However, the
latter is not going to make it into the 1.1.0 release, hence the
former is reverted now to avoid confusion.
{noformat}

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need the top-level cgroups instead of all cgroups 
> recursively. Adding a non-recursive version would help avoid walking 
> unnecessary paths.
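
A possible shape for this (hypothetical signature, not the committed API) is an optional {{recursive}} flag that defaults to today's behavior:

{code}
#include <string>
#include <vector>

#include <stout/try.hpp>

namespace cgroups {

// Hypothetical extension: when `recursive` is false, return only the
// immediate children of `cgroup` instead of walking the whole subtree.
Try<std::vector<std::string>> get(
    const std::string& hierarchy,
    const std::string& cgroup = "/",
    bool recursive = true);

} // namespace cgroups {
{code}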





[jira] [Comment Edited] (MESOS-6035) Add non-recursive version of cgroups::get

2016-10-14 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575167#comment-15575167
 ] 

Alexander Rukletsov edited comment on MESOS-6035 at 10/14/16 12:16 PM:
---

{noformat}
Commit: fcd5106b5dfa14bc83eae68415bd4782c16f79a4 [fcd5106]
Author: Alexander Rukletsov 
Date: 14 October 2016 at 14:10:23 GMT+2
Commit Date: 14 October 2016 at 14:15:02 GMT+2

Revert "Removed the expired TODO about non-recursive version...
`cgroups::get`."

This reverts commit e042aa071a77ef1922d9b1a93f6e8adf221979b3.

RR https://reviews.apache.org/r/51185/ should have been committed
together with https://reviews.apache.org/r/51031/. However, the
latter is not going to make it into the 1.1.0 release, hence the
former is reverted now to avoid confusion.
{noformat}


was (Author: alexr):
{noformat}
Commit: fcd5106b5dfa14bc83eae68415bd4782c16f79a4 [fcd5106]
Parents: 9fc2901d23
Author: Alexander Rukletsov 
Date: 14 October 2016 at 14:10:23 GMT+2
Commit Date: 14 October 2016 at 14:15:02 GMT+2
Labels: HEAD -> master

Revert "Removed the expired TODO about non-recursive version...
`cgroups::get`."

This reverts commit e042aa071a77ef1922d9b1a93f6e8adf221979b3.

RR https://reviews.apache.org/r/51185/ should have been committed
together with https://reviews.apache.org/r/51031/. However, the
latter is not going to make it into the 1.1.0 release, hence the
former is reverted now to avoid confusion.
{noformat}

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need to get the top-level cgroups instead of getting 
> all cgroups recursively. Adding a non-recursive version could help avoid 
> traversing unnecessary paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6395) HealthChecker sends updates to executor via libprocess messaging.

2016-10-14 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-6395:
--

 Summary: HealthChecker sends updates to executor via libprocess 
messaging.
 Key: MESOS-6395
 URL: https://issues.apache.org/jira/browse/MESOS-6395
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov


Currently {{HealthChecker}} sends status updates via libprocess messaging to 
the executor's UPID. This seems unnecessary after refactoring the health 
checker into a library: a simple callback will do. Moreover, not requiring the 
executor's {{UPID}} will simplify creating a mocked {{HealthChecker}}.
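
A minimal sketch of the callback-based interface suggested here; the exact
factory signature and callback type are assumptions:

{code}
// Hypothetical factory: instead of an executor UPID, the caller hands
// the health checker a callback that is invoked on every health update.
Try<process::Owned<HealthChecker>> HealthChecker::create(
    const HealthCheck& check,
    const lambda::function<void(const TaskHealthStatus&)>& callback);
{code}

A mocked {{HealthChecker}} would then only need to capture the callback, with
no libprocess messaging involved.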



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5963) HealthChecker should not decide when to kill tasks and when to stop performing health checks.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5963:
---
Shepherd: Benjamin Mahler
Assignee: Alexander Rukletsov  (was: haosdent)
  Sprint: Mesosphere Sprint 45
Target Version/s: 1.2.0

> HealthChecker should not decide when to kill tasks and when to stop 
> performing health checks.
> -
>
> Key: MESOS-5963
> URL: https://issues.apache.org/jira/browse/MESOS-5963
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently, the {{HealthChecker}} library decides when a task should be killed 
> based on its health status. Moreover, it stops checking the task's health 
> after that. This seems unfortunate, because it's up to the executor and/or 
> framework to decide both when to kill tasks and when to health-check them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3959) Executor page of mesos ui does not show slave hostname.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3959:
---
Summary: Executor page of mesos ui does not show slave hostname.  (was: 
Executor page of mesos ui does not show slave hostname)

> Executor page of mesos ui does not show slave hostname.
> ---
>
> Key: MESOS-3959
> URL: https://issues.apache.org/jira/browse/MESOS-3959
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: Ian Babrou
>
> This is not really convenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6028) mesos-execute has a typo in volume help.

2016-10-14 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6028:
---
Summary: mesos-execute has a typo in volume help.  (was: typo in 
mesos-execute usage)

> mesos-execute has a typo in volume help.
> 
>
> Key: MESOS-6028
> URL: https://issues.apache.org/jira/browse/MESOS-6028
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Stéphane Cottin
>Assignee: Tomasz Janiszewski
>Priority: Minor
>
> s/docker_options/driver_options/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6376) Add documentation for capabilities support of the mesos containerizer

2016-10-17 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6376:
---
Target Version/s: 1.2.0  (was: 1.1.0)
Priority: Major  (was: Blocker)

> Add documentation for capabilities support of the mesos containerizer
> -
>
> Key: MESOS-6376
> URL: https://issues.apache.org/jira/browse/MESOS-6376
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6417) Introduce an extra 'unknown' health check state.

2016-10-19 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-6417:
--

 Summary: Introduce an extra 'unknown' health check state.
 Key: MESOS-6417
 URL: https://issues.apache.org/jira/browse/MESOS-6417
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov


There are three logical states regarding health checks:
1) no health checks;
2) a health check is defined, but no result is available yet;
3) a health check is defined, it is either healthy or not.

Currently, we do not distinguish between 1) and 2), which can be problematic 
for framework authors.
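
A sketch of how the three states could be modelled; the enum is hypothetical:

{code}
// Hypothetical representation of the three logical states above.
enum class TaskHealth
{
  NO_HEALTH_CHECK,  // 1) no health check is defined.
  UNKNOWN,          // 2) a check is defined, but no result yet.
  HEALTHY,          // 3a) the latest check passed.
  UNHEALTHY         // 3b) the latest check failed.
};
{code}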



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4292) Tests for quota with implicit roles.

2016-10-19 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4292:
---
Shepherd: Alexander Rukletsov
Assignee: Zhitao Li

> Tests for quota with implicit roles.
> 
>
> Key: MESOS-4292
> URL: https://issues.apache.org/jira/browse/MESOS-4292
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Alexander Rukletsov
>Assignee: Zhitao Li
>  Labels: mesosphere
>
> With the introduction of implicit roles (MESOS-3988), we should make sure 
> quota can be set for an inactive role (unknown to the master) and maybe 
> transition it to the active state.
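
For reference, setting quota goes through the master's {{/quota}} endpoint; a
sketch of such a request for a role the master has not seen yet (the JSON
shape follows the documented QuotaRequest format; authentication is elided):

{code}
// Sketch: request quota for a role that is unknown to the master.
process::Future<process::http::Response> response = process::http::post(
    master,   // The master's process::UPID.
    "quota",
    None(),   // Headers (credentials elided for brevity).
    "{\"role\": \"inactive-role\","
    " \"guarantee\": [{\"name\": \"cpus\", \"type\": \"SCALAR\","
    "                  \"scalar\": {\"value\": 1.0}}]}",
    "application/json");
{code}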



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6420) Mesos Agent leaking sockets when port mapping network isolator is ON

2016-10-25 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6420:
---
Target Version/s: 1.1.0  (was: 1.1.1, 1.2.0)

> Mesos Agent leaking sockets when port mapping network isolator is ON
> 
>
> Key: MESOS-6420
> URL: https://issues.apache.org/jira/browse/MESOS-6420
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network, slave
>Affects Versions: 1.0.2
>Reporter: Santhosh Shanmugham
> Fix For: 1.0.2, 1.2.0
>
>
> Mesos Agent leaks one socket per task launched and eventually runs out of 
> sockets. We were able to track it down to the network isolator 
> (port_mapping.cpp). When we turned off the port mapping isolator, no file 
> descriptors were leaked. The leaked fd is a SOCK_STREAM socket.
> Leaked Sockets:
> $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep 
> "can't"
> [sudo] password for sshanmugham:
> mesos-sla 57688 root   19u  sock0,6  0t0 2993216948 can't 
> identify protocol
> mesos-sla 57688 root   27u  sock0,6  0t0 2993216468 can't 
> identify protocol
> Extract from strace:
> ...
> [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494395 close(19)   = 0
> [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494844 close(19)   = 0
> [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.495565 close(19)   = 0
> [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496072 close(19)   = 0
> [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496758 close(19)   = 0
> [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497270 close(19)   = 0
> [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497698 close(19)   = 0
> [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498407 close(19)   = 0
> [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498899 close(19)   = 0
> [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 63682] 19:14:02.499091 close(18 
> [pid 57701] 19:14:02.499634 close(19)   = 0
> [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500044 close(19)   = 0
> [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500734 close(19)   = 0
> [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.501271 close(19)   = 0
> [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.502030 close(19)   = 0
> [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
> ...
> ...
> [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY  ...>
> [pid 57691] 19:18:03.461460 close(27)   = 0
> [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461632 close(3 
> [pid  6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY 
> [pid  6138] 19:18:03.462190 close(3 
> [pid 57691] 19:18:03.462374 close(27)   = 0
> [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 
> [pid  6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY 
> [pid  6138] 19:18:03.462678 close(3 
> [pid  6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY  ...>
> [pid 57691] 19:18:03.463046 close(27)   = 0
> [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.463225 close(3 
> [pid 57691] 19:18:03.463845 close(27)   = 0
> [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.464604 close(27)   = 0
> [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465074 close(27)   = 0
> [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465862 close(27)   = 0
> [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.466713 close(27)   = 0
> [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.467472 close(27)   = 0
> [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468012 close(27)   = 0
> [pid 57691] 19:18:03.468075 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468799 close(27)   = 0
> [pid 57691] 19:18:03.468950 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.469505 close(27)   = 0
> [pid 57691] 19:18:03.469578 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.470301 close(27)   = 0
> [pid 57691] 19:18:03.470353

[jira] [Updated] (MESOS-6420) Mesos Agent leaking sockets when port mapping network isolator is ON

2016-10-25 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6420:
---
Shepherd: Jie Yu
Target Version/s: 1.0.2, 1.1.0, 1.2.0  (was: 1.1.0)

> Mesos Agent leaking sockets when port mapping network isolator is ON
> 
>
> Key: MESOS-6420
> URL: https://issues.apache.org/jira/browse/MESOS-6420
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network, slave
>Affects Versions: 1.0.2
>Reporter: Santhosh Shanmugham
> Fix For: 1.0.2, 1.2.0
>
>
> Mesos Agent leaks one socket per task launched and eventually runs out of 
> sockets. We were able to track it down to the network isolator 
> (port_mapping.cpp). When we turned off the port mapping isolator, no file 
> descriptors were leaked. The leaked fd is a SOCK_STREAM socket.
> Leaked Sockets:
> $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep 
> "can't"
> [sudo] password for sshanmugham:
> mesos-sla 57688 root   19u  sock0,6  0t0 2993216948 can't 
> identify protocol
> mesos-sla 57688 root   27u  sock0,6  0t0 2993216468 can't 
> identify protocol
> Extract from strace:
> ...
> [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494395 close(19)   = 0
> [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494844 close(19)   = 0
> [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.495565 close(19)   = 0
> [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496072 close(19)   = 0
> [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496758 close(19)   = 0
> [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497270 close(19)   = 0
> [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497698 close(19)   = 0
> [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498407 close(19)   = 0
> [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498899 close(19)   = 0
> [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 63682] 19:14:02.499091 close(18 
> [pid 57701] 19:14:02.499634 close(19)   = 0
> [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500044 close(19)   = 0
> [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500734 close(19)   = 0
> [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.501271 close(19)   = 0
> [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.502030 close(19)   = 0
> [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
> ...
> ...
> [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY  ...>
> [pid 57691] 19:18:03.461460 close(27)   = 0
> [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461632 close(3 
> [pid  6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY 
> [pid  6138] 19:18:03.462190 close(3 
> [pid 57691] 19:18:03.462374 close(27)   = 0
> [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 
> [pid  6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY 
> [pid  6138] 19:18:03.462678 close(3 
> [pid  6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY  ...>
> [pid 57691] 19:18:03.463046 close(27)   = 0
> [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.463225 close(3 
> [pid 57691] 19:18:03.463845 close(27)   = 0
> [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.464604 close(27)   = 0
> [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465074 close(27)   = 0
> [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465862 close(27)   = 0
> [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.466713 close(27)   = 0
> [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.467472 close(27)   = 0
> [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468012 close(27)   = 0
> [pid 57691] 19:18:03.468075 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468799 close(27)   = 0
> [pid 57691] 19:18:03.468950 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.469505 close(27)   = 0
> [pid 57691] 19:18:03.469578 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.470301 close(27)

[jira] [Updated] (MESOS-6420) Mesos Agent leaking sockets when port mapping network isolator is ON

2016-10-25 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6420:
---
Priority: Blocker  (was: Major)

> Mesos Agent leaking sockets when port mapping network isolator is ON
> 
>
> Key: MESOS-6420
> URL: https://issues.apache.org/jira/browse/MESOS-6420
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network, slave
>Affects Versions: 1.0.2
>Reporter: Santhosh Shanmugham
>Priority: Blocker
> Fix For: 1.0.2, 1.2.0
>
>
> Mesos Agent leaks one socket per task launched and eventually runs out of 
> sockets. We were able to track it down to the network isolator 
> (port_mapping.cpp). When we turned off the port mapping isolator, no file 
> descriptors were leaked. The leaked fd is a SOCK_STREAM socket.
> Leaked Sockets:
> $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep 
> "can't"
> [sudo] password for sshanmugham:
> mesos-sla 57688 root   19u  sock0,6  0t0 2993216948 can't 
> identify protocol
> mesos-sla 57688 root   27u  sock0,6  0t0 2993216468 can't 
> identify protocol
> Extract from strace:
> ...
> [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494395 close(19)   = 0
> [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494844 close(19)   = 0
> [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.495565 close(19)   = 0
> [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496072 close(19)   = 0
> [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496758 close(19)   = 0
> [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497270 close(19)   = 0
> [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497698 close(19)   = 0
> [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498407 close(19)   = 0
> [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498899 close(19)   = 0
> [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 63682] 19:14:02.499091 close(18 
> [pid 57701] 19:14:02.499634 close(19)   = 0
> [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500044 close(19)   = 0
> [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500734 close(19)   = 0
> [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.501271 close(19)   = 0
> [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.502030 close(19)   = 0
> [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
> ...
> ...
> [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY  ...>
> [pid 57691] 19:18:03.461460 close(27)   = 0
> [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461632 close(3 
> [pid  6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY 
> [pid  6138] 19:18:03.462190 close(3 
> [pid 57691] 19:18:03.462374 close(27)   = 0
> [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 
> [pid  6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY 
> [pid  6138] 19:18:03.462678 close(3 
> [pid  6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY  ...>
> [pid 57691] 19:18:03.463046 close(27)   = 0
> [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.463225 close(3 
> [pid 57691] 19:18:03.463845 close(27)   = 0
> [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.464604 close(27)   = 0
> [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465074 close(27)   = 0
> [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465862 close(27)   = 0
> [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.466713 close(27)   = 0
> [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.467472 close(27)   = 0
> [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468012 close(27)   = 0
> [pid 57691] 19:18:03.468075 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468799 close(27)   = 0
> [pid 57691] 19:18:03.468950 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.469505 close(27)   = 0
> [pid 57691] 19:18:03.469578 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.470301 close(27)   = 0
> [pid 5769

[jira] [Updated] (MESOS-6446) WebUI redirect doesn't work with stats from /metric/snapshot

2016-10-25 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6446:
---
Target Version/s: 1.0.2, 1.1.0  (was: 1.0.2)

> WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6446
> URL: https://issues.apache.org/jira/browse/MESOS-6446
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: haosdent
>Priority: Blocker
> Attachments: Screen Shot 2016-10-21 at 12.04.23 PM.png
>
>
> After Mesos 1.0, the webUI redirect is hidden from the users, so you can go 
> to any of the masters and the webUI is populated with state.json from the 
> leading master. 
> This doesn't include stats from /metric/snapshot, though, as that endpoint is 
> not redirected. The user ends up seeing some fields with empty values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6455) DefaultExecutorTests fail when running on hosts without docker

2016-10-25 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6455:
---
Target Version/s: 1.1.0

> DefaultExecutorTests fail when running on hosts without docker 
> ---
>
> Key: MESOS-6455
> URL: https://issues.apache.org/jira/browse/MESOS-6455
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>
> {noformat:title=}
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_TaskRunning/1, where 
> GetParam() = "docker,mesos"
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_KillTask/1, where 
> GetParam() = "docker,mesos"
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_TaskUsesExecutor/1, 
> where GetParam() = "docker,mesos"
> {noformat}
> {noformat:title=}
> ../../src/tests/default_executor_tests.cpp:98: Failure
> slave: Failed to create containerizer: Could not create DockerContainerizer: 
> Failed to create docker: Failed to get docker version: Failed to execute 
> 'docker -H unix:///var/run/docker.sock --version': exited with status 127
> {noformat}
> Maybe we can put {{DOCKER_}} in the instantiation name and use another 
> instantiation for tests that don't require docker?
> /cc [~vinodkone] [~anandmazumdar]
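
A sketch of the suggested split; the instantiation names are placeholders:

{code}
// Instantiation that runs everywhere: mesos containerizer only.
INSTANTIATE_TEST_CASE_P(
    Containerizers,
    DefaultExecutorTest,
    ::testing::Values(std::string("mesos")));

// Docker-dependent instantiation: the DOCKER_ prefix would let the
// test filter skip it on hosts without docker.
INSTANTIATE_TEST_CASE_P(
    DOCKER_Containerizers,
    DefaultExecutorTest,
    ::testing::Values(std::string("docker,mesos")));
{code}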



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6420) Mesos Agent leaking sockets when port mapping network isolator is ON

2016-10-25 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605995#comment-15605995
 ] 

Alexander Rukletsov commented on MESOS-6420:


[~jieyu] could you cherry-pick it to 1.1.x?

> Mesos Agent leaking sockets when port mapping network isolator is ON
> 
>
> Key: MESOS-6420
> URL: https://issues.apache.org/jira/browse/MESOS-6420
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network, slave
>Affects Versions: 1.0.2
>Reporter: Santhosh Shanmugham
>Priority: Blocker
> Fix For: 1.0.2, 1.2.0
>
>
> Mesos Agent leaks one socket per task launched and eventually runs out of 
> sockets. We were able to track it down to the network isolator 
> (port_mapping.cpp). When we turned off the port mapping isolator, no file 
> descriptors were leaked. The leaked fd is a SOCK_STREAM socket.
> Leaked Sockets:
> $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep 
> "can't"
> [sudo] password for sshanmugham:
> mesos-sla 57688 root   19u  sock0,6  0t0 2993216948 can't 
> identify protocol
> mesos-sla 57688 root   27u  sock0,6  0t0 2993216468 can't 
> identify protocol
> Extract from strace:
> ...
> [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494395 close(19)   = 0
> [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494844 close(19)   = 0
> [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.495565 close(19)   = 0
> [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496072 close(19)   = 0
> [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496758 close(19)   = 0
> [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497270 close(19)   = 0
> [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497698 close(19)   = 0
> [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498407 close(19)   = 0
> [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498899 close(19)   = 0
> [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 63682] 19:14:02.499091 close(18 
> [pid 57701] 19:14:02.499634 close(19)   = 0
> [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500044 close(19)   = 0
> [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500734 close(19)   = 0
> [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.501271 close(19)   = 0
> [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.502030 close(19)   = 0
> [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
> ...
> ...
> [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY  ...>
> [pid 57691] 19:18:03.461460 close(27)   = 0
> [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461632 close(3 
> [pid  6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY 
> [pid  6138] 19:18:03.462190 close(3 
> [pid 57691] 19:18:03.462374 close(27)   = 0
> [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 
> [pid  6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY 
> [pid  6138] 19:18:03.462678 close(3 
> [pid  6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY  ...>
> [pid 57691] 19:18:03.463046 close(27)   = 0
> [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.463225 close(3 
> [pid 57691] 19:18:03.463845 close(27)   = 0
> [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.464604 close(27)   = 0
> [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465074 close(27)   = 0
> [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465862 close(27)   = 0
> [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.466713 close(27)   = 0
> [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.467472 close(27)   = 0
> [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468012 close(27)   = 0
> [pid 57691] 19:18:03.468075 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468799 close(27)   = 0
> [pid 57691] 19:18:03.468950 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.469505 close(27)   = 0
> [pid 57691] 19:18:03.469578 socket(PF_NETLINK, SOCK_RAW,

[jira] [Updated] (MESOS-6278) Add test cases for the HTTP health checks.

2016-10-27 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6278:
---
Summary: Add test cases for the HTTP health checks.  (was: Add test cases 
for the HTTP health checks)

> Add test cases for the HTTP health checks.
> --
>
> Key: MESOS-6278
> URL: https://issues.apache.org/jira/browse/MESOS-6278
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere, test
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6279) Add test cases for the TCP health check.

2016-10-27 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6279:
---
Summary: Add test cases for the TCP health check.  (was: Add test cases for 
the TCP health check)

> Add test cases for the TCP health check.
> 
>
> Key: MESOS-6279
> URL: https://issues.apache.org/jira/browse/MESOS-6279
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere, test
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6293) HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.

2016-10-27 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612004#comment-15612004
 ] 

Alexander Rukletsov commented on MESOS-6293:


https://reviews.apache.org/r/53226/

> HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.
> 
>
> Key: MESOS-6293
> URL: https://issues.apache.org/jira/browse/MESOS-6293
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: health-check, mesosphere
>
> I see consistent failures of this test in the internal CI on *some* distros, 
> specifically CentOS 6 and Ubuntu 14, 15, and 16. The source of the health 
> check failure is always the same: {{curl}} cannot connect to the target:
> {noformat}
> Received task health update, healthy: false
> W0929 17:22:05.270992  2730 health_checker.cpp:204] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) couldn't connect to host
> I0929 17:22:05.273634 26850 slave.cpp:3609] Handling status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from executor(1)@172.30.2.20:58660
> I0929 17:22:05.274178 26844 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274226 26844 status_update_manager.cpp:377] Forwarding update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to the agent
> I0929 17:22:05.274314 26845 slave.cpp:4026] Forwarding the update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to master@172.30.2.20:38955
> I0929 17:22:05.274415 26845 slave.cpp:3920] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274436 26845 slave.cpp:3936] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for 
> task aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of 
> framework 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to 
> executor(1)@172.30.2.20:58660
> I0929 17:22:05.274534 26849 master.cpp:5661] Status update TASK_RUNNING 
> (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from agent 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-S0 at slave(77)@172.30.2.20:38955 
> (ip-172-30-2-20.mesosphere.io)
> ../../src/tests/health_check_tests.cpp:1398: Failure
> I0929 17:22:05.274567 26849 master.cpp:5723] Forwarding status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> Value of: statusHealth.get().healthy()
>   Actual: false
>   Expected: true
> I0929 17:22:05.274636 26849 master.cpp:7560] Updating the state of task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- (latest state: TASK_RUNNING, status 
> update state: TASK_RUNNING)
> I0929 17:22:05.274829 26844 sched.cpp:1025] Scheduler::statusUpdate took 
> 43297ns
> Received SHUTDOWN event
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6493) Add test cases for the HTTPS health checks.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6493:
---
Fix Version/s: (was: 1.2.0)

> Add test cases for the HTTPS health checks.
> ---
>
> Key: MESOS-6493
> URL: https://issues.apache.org/jira/browse/MESOS-6493
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere, test
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6420) Mesos Agent leaking sockets when port mapping network isolator is ON

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6420:
---
Fix Version/s: 1.1.0

> Mesos Agent leaking sockets when port mapping network isolator is ON
> 
>
> Key: MESOS-6420
> URL: https://issues.apache.org/jira/browse/MESOS-6420
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network, slave
>Affects Versions: 1.0.2
>Reporter: Santhosh Shanmugham
>Priority: Blocker
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
>
> Mesos Agent leaks one socket per task launched and eventually runs out of 
> sockets. We were able to track it down to the network isolator 
> (port_mapping.cpp). When we turned off the port mapping isolator, no file 
> descriptors were leaked. The leaked fd is a SOCK_STREAM socket.
> Leaked Sockets:
> $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep 
> "can't"
> [sudo] password for sshanmugham:
> mesos-sla 57688 root   19u  sock0,6  0t0 2993216948 can't 
> identify protocol
> mesos-sla 57688 root   27u  sock0,6  0t0 2993216468 can't 
> identify protocol
> Extract from strace:
> ...
> [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494395 close(19)   = 0
> [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494844 close(19)   = 0
> [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.495565 close(19)   = 0
> [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496072 close(19)   = 0
> [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496758 close(19)   = 0
> [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497270 close(19)   = 0
> [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497698 close(19)   = 0
> [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498407 close(19)   = 0
> [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498899 close(19)   = 0
> [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 63682] 19:14:02.499091 close(18 
> [pid 57701] 19:14:02.499634 close(19)   = 0
> [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500044 close(19)   = 0
> [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500734 close(19)   = 0
> [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.501271 close(19)   = 0
> [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.502030 close(19)   = 0
> [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
> ...
> ...
> [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY  ...>
> [pid 57691] 19:18:03.461460 close(27)   = 0
> [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461632 close(3 
> [pid  6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY 
> [pid  6138] 19:18:03.462190 close(3 
> [pid 57691] 19:18:03.462374 close(27)   = 0
> [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 
> [pid  6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY 
> [pid  6138] 19:18:03.462678 close(3 
> [pid  6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY  ...>
> [pid 57691] 19:18:03.463046 close(27)   = 0
> [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.463225 close(3 
> [pid 57691] 19:18:03.463845 close(27)   = 0
> [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.464604 close(27)   = 0
> [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465074 close(27)   = 0
> [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465862 close(27)   = 0
> [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.466713 close(27)   = 0
> [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.467472 close(27)   = 0
> [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468012 close(27)   = 0
> [pid 57691] 19:18:03.468075 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468799 close(27)   = 0
> [pid 57691] 19:18:03.468950 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.469505 close(27)   = 0
> [pid 57691] 19:18:03.469578 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.470301 close(27)   = 0
> [pid 57691] 1

[jira] [Updated] (MESOS-6026) Tasks mistakenly marked as FAILED due to race b/w sendExecutorTerminatedStatusUpdate() and _statusUpdate().

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6026:
---
Summary: Tasks mistakenly marked as FAILED due to race b/w 
sendExecutorTerminatedStatusUpdate() and _statusUpdate().  (was: Tasks 
mistakenly marked as FAILED due to race b/w 
⁠sendExecutorTerminatedStatusUpdate()⁠ and ⁠_statusUpdate()⁠)

> Tasks mistakenly marked as FAILED due to race b/w 
> sendExecutorTerminatedStatusUpdate() and _statusUpdate().
> ---
>
> Key: MESOS-6026
> URL: https://issues.apache.org/jira/browse/MESOS-6026
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Kapil Arya
>Assignee: Benjamin Mahler
>  Labels: mesosphere
> Fix For: 1.0.2, 1.1.0
>
>
> Tasks can be mistakenly marked as FAILED due to a race between 
> {{sendExecutorTerminatedStatusUpdate()}} and {{_statusUpdate()}} that happens 
> when the task has just finished and the executor is exiting.
> Here is an example of slave log messages:
> {code}
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959374 
> 20418 slave.cpp:3211] Handling status update TASK_FINISHED (UUID: 
> fd79d0bd-4ece-41dc-bced-b93491f6bb2e) for task 291 of framework 
> 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 from executor(1)@10.10.0.205:53504
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959604 
> 20418 slave.cpp:3732] executor(1)@10.10.0.205:53504 exited
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959643 
> 20418 slave.cpp:4089] Executor '291' of framework 
> 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 exited with status 0
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959744 
> 20418 slave.cpp:3211] Handling status update TASK_FAILED (UUID: 
> b94722fb-1658-4936-b604-6d642ffe20a0) for task 291 of framework 
> 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 from @0.0.0.0:0
> {code}
> As can be noticed, the task is marked as TASK_FAILED after the executor has 
> exited.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6445) Reconciliation for unreachable agent after master failover is incorrect.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6445:
---
Summary: Reconciliation for unreachable agent after master failover is 
incorrect.  (was: Reconciliation for unreachable agent after master failover is 
incorrect)

> Reconciliation for unreachable agent after master failover is incorrect.
> 
>
> Key: MESOS-6445
> URL: https://issues.apache.org/jira/browse/MESOS-6445
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> {noformat}
> If the master fails over and an agent does not re-register within the
> `agent_reregister_timeout`, the master marks the agent as unreachable in
> the registry and sends `slaveLost` for it. However, we neglected to
> update the master's in-memory state for the newly unreachable agent;
> this meant that task reconciliation would return incorrect results
> (until/unless the next master failover).
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6455) DefaultExecutorTests fail when running on hosts without docker

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6455:
---
Target Version/s: 1.2.0  (was: 1.1.0)

> DefaultExecutorTests fail when running on hosts without docker 
> ---
>
> Key: MESOS-6455
> URL: https://issues.apache.org/jira/browse/MESOS-6455
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>
> {noformat:title=}
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_TaskRunning/1, where 
> GetParam() = "docker,mesos"
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_KillTask/1, where 
> GetParam() = "docker,mesos"
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_TaskUsesExecutor/1, 
> where GetParam() = "docker,mesos"
> {noformat}
> {noformat:title=}
> ../../src/tests/default_executor_tests.cpp:98: Failure
> slave: Failed to create containerizer: Could not create DockerContainerizer: 
> Failed to create docker: Failed to get docker version: Failed to execute 
> 'docker -H unix:///var/run/docker.sock --version': exited with status 127
> {noformat}
> Maybe we can put {{DOCKER_}} in the instantiation name and use another 
> instantiation for tests that don't require docker?
> /cc [~vinodkone] [~anandmazumdar]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6420) Mesos Agent leaking sockets when port mapping network isolator is ON

2016-10-28 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614902#comment-15614902
 ] 

Alexander Rukletsov commented on MESOS-6420:


Cherry-picked.

> Mesos Agent leaking sockets when port mapping network isolator is ON
> 
>
> Key: MESOS-6420
> URL: https://issues.apache.org/jira/browse/MESOS-6420
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network, slave
>Affects Versions: 1.0.2
>Reporter: Santhosh Shanmugham
>Priority: Blocker
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
>
> Mesos Agent leaks one socket per task launched and eventually runs out of 
> sockets. We were able to track it down to the network isolator 
> (port_mapping.cpp). When we turned off the port mapping isolator, no file 
> descriptors were leaked. The leaked fd is a SOCK_STREAM socket.
> Leaked Sockets:
> $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep 
> "can't"
> [sudo] password for sshanmugham:
> mesos-sla 57688 root   19u  sock0,6  0t0 2993216948 can't 
> identify protocol
> mesos-sla 57688 root   27u  sock0,6  0t0 2993216468 can't 
> identify protocol
> Extract from strace:
> ...
> [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494395 close(19)   = 0
> [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494844 close(19)   = 0
> [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.495565 close(19)   = 0
> [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496072 close(19)   = 0
> [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496758 close(19)   = 0
> [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497270 close(19)   = 0
> [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497698 close(19)   = 0
> [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498407 close(19)   = 0
> [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498899 close(19)   = 0
> [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 63682] 19:14:02.499091 close(18 
> [pid 57701] 19:14:02.499634 close(19)   = 0
> [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500044 close(19)   = 0
> [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500734 close(19)   = 0
> [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.501271 close(19)   = 0
> [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.502030 close(19)   = 0
> [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
> ...
> ...
> [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY  ...>
> [pid 57691] 19:18:03.461460 close(27)   = 0
> [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461632 close(3 
> [pid  6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY 
> [pid  6138] 19:18:03.462190 close(3 
> [pid 57691] 19:18:03.462374 close(27)   = 0
> [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 
> [pid  6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY 
> [pid  6138] 19:18:03.462678 close(3 
> [pid  6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY  ...>
> [pid 57691] 19:18:03.463046 close(27)   = 0
> [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.463225 close(3 
> [pid 57691] 19:18:03.463845 close(27)   = 0
> [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.464604 close(27)   = 0
> [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465074 close(27)   = 0
> [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465862 close(27)   = 0
> [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.466713 close(27)   = 0
> [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.467472 close(27)   = 0
> [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468012 close(27)   = 0
> [pid 57691] 19:18:03.468075 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468799 close(27)   = 0
> [pid 57691] 19:18:03.468950 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.469505 close(27)   = 0
> [pid 57691] 19:18:03.469578 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691]

[jira] [Updated] (MESOS-6455) DefaultExecutorTests fail when running on hosts without docker.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6455:
---
Summary: DefaultExecutorTests fail when running on hosts without docker.  
(was: DefaultExecutorTests fail when running on hosts without docker )

> DefaultExecutorTests fail when running on hosts without docker.
> ---
>
> Key: MESOS-6455
> URL: https://issues.apache.org/jira/browse/MESOS-6455
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>
> {noformat:title=}
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_TaskRunning/1, where 
> GetParam() = "docker,mesos"
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_KillTask/1, where 
> GetParam() = "docker,mesos"
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_TaskUsesExecutor/1, 
> where GetParam() = "docker,mesos"
> {noformat}
> {noformat:title=}
> ../../src/tests/default_executor_tests.cpp:98: Failure
> slave: Failed to create containerizer: Could not create DockerContainerizer: 
> Failed to create docker: Failed to get docker version: Failed to execute 
> 'docker -H unix:///var/run/docker.sock --version': exited with status 127
> {noformat}
> Maybe we can put {{DOCKER_}} in the instantiation name and use another 
> instantiation for tests that don't require docker?
> /cc [~vinodkone] [~anandmazumdar]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-28 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614924#comment-15614924
 ] 

Alexander Rukletsov commented on MESOS-6497:


Please post the complete chain.

https://reviews.apache.org/r/53246/
https://reviews.apache.org/r/53247/

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> calls provided the {{MasterInfo}} to the framework.
> cc [~vinodkone]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-28 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614923#comment-15614923
 ] 

Alexander Rukletsov commented on MESOS-6497:


Please post the complete chain.

https://reviews.apache.org/r/53246/
https://reviews.apache.org/r/53247/

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> calls provided the {{MasterInfo}} to the framework.
> cc [~vinodkone]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6497:
---
Comment: was deleted

(was: Please post the complete chain.

https://reviews.apache.org/r/53246/
https://reviews.apache.org/r/53247/)

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> calls provided the {{MasterInfo}} to the framework.
> cc [~vinodkone]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6119) TCP health checks are not portable.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6119:
---
Sprint: Mesosphere Sprint 42, Mesosphere Sprint 43, Mesosphere Sprint 44, 
Mesosphere Sprint 46  (was: Mesosphere Sprint 42, Mesosphere Sprint 43, 
Mesosphere Sprint 44, Mesosphere Sprint 45)

> TCP health checks are not portable.
> ---
>
> Key: MESOS-6119
> URL: https://issues.apache.org/jira/browse/MESOS-6119
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> MESOS-3567 introduced a dependency on "bash" for TCP health checks, which is 
> undesirable. We should implement a portable solution for TCP health checks.
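
A portable TCP health check reduces to a plain {{connect(2)}}; a minimal
POSIX sketch with no bash dependency (detailed error handling elided):

{code}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <string>

// Returns true if a TCP connection to ip:port succeeds, which is the
// whole contract of a TCP health check.
static bool tcpConnect(const std::string& ip, uint16_t port)
{
  int s = ::socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) {
    return false;
  }

  sockaddr_in addr = {};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  ::inet_pton(AF_INET, ip.c_str(), &addr.sin_addr);

  const bool connected = ::connect(
      s, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) == 0;

  ::close(s);
  return connected;
}
{code}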



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6457:
---
Sprint: Mesosphere Sprint 46  (was: Mesosphere Sprint 45)

> Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING
> 
>
> Key: MESOS-6457
> URL: https://issues.apache.org/jira/browse/MESOS-6457
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}} if, 
> for example, it starts or stops passing a health check after it enters the 
> {{TASK_KILLING}} state.
> I think that this behaviour is counterintuitive. It also makes life harder 
> for framework and tooling developers, since they have to track the complete 
> task status history in order to know whether a task is being killed.
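
A sketch of the guard an executor could apply so that health transitions no
longer move a task out of {{TASK_KILLING}}; the helper is hypothetical:

{code}
// Hypothetical filter in the executor's update path: once a task has
// entered TASK_KILLING, a health transition still updates the
// 'healthy' field of the status, but can no longer change the state.
TaskStatus onHealthTransition(const TaskStatus& latest, bool healthy)
{
  TaskStatus status = latest;
  status.set_healthy(healthy);

  if (latest.state() != TASK_KILLING) {
    status.set_state(TASK_RUNNING);  // Only flip the state pre-kill.
  }

  return status;
}
{code}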



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6184:
---
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 46  (was: Mesosphere Sprint 
44, Mesosphere Sprint 45)

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Blocker
>  Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, health checks use a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.
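> A minimal sketch of the proposed direction, assuming a child hook is simply a 
> callback run in the forked child before exec (illustrative only):
> {code}
> // Hypothetical sketch: a child hook that enters the task's namespaces.
> // Assumes `taskPid` and `namespaces` are in scope as in the snippet above.
> auto enterNamespaces = [=]() -> Try<Nothing> {
>   if (taskPid.isSome()) {
>     foreach (const string& ns, namespaces) {
>       Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>       if (setns.isError()) {
>         return Error("Failed to enter '" + ns + "': " + setns.error());
>       }
>     }
>   }
>   return Nothing();
> };
> {code}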



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5963) HealthChecker should not decide when to kill tasks and when to stop performing health checks.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5963:
---
Sprint: Mesosphere Sprint 46  (was: Mesosphere Sprint 45)

> HealthChecker should not decide when to kill tasks and when to stop 
> performing health checks.
> -
>
> Key: MESOS-5963
> URL: https://issues.apache.org/jira/browse/MESOS-5963
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently, the {{HealthChecker}} library decides when a task should be killed 
> based on its health status. Moreover, it stops checking the task's health 
> after that. This seems unfortunate, because it's up to the executor and/or 
> framework to decide both when to kill tasks and when to health check them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6395) HealthChecker sends updates to executor via libprocess messaging.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6395:
---
Sprint: Mesosphere Sprint 46  (was: Mesosphere Sprint 45)

> HealthChecker sends updates to executor via libprocess messaging.
> -
>
> Key: MESOS-6395
> URL: https://issues.apache.org/jira/browse/MESOS-6395
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently {{HealthChecker}} sends status updates via libprocess messaging to 
> the executor's UPID. This seems unnecessary after refactoring the health 
> checker into a library: a simple callback will do. Moreover, not requiring the 
> executor's {{UPID}} will simplify creating a mocked {{HealthChecker}}.
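> A minimal sketch of the callback-based interface, with illustrative names 
> (not the actual API):
> {code}
> // Hypothetical sketch: the library takes a callback at construction
> // time instead of an executor UPID to message.
> class HealthChecker
> {
> public:
>   HealthChecker(
>       const HealthCheck& check,
>       const std::function<void(const TaskHealthStatus&)>& callback)
>     : check_(check), callback_(callback) {}
>
>   // Invoked whenever a health check completes.
>   void handle(const TaskHealthStatus& status) { callback_(status); }
>
> private:
>   const HealthCheck check_;
>   const std::function<void(const TaskHealthStatus&)> callback_;
> };
> {code}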



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6171) Introduce "global" decision policy for unhealthy tasks.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6171:
---
Sprint: Mesosphere Sprint 44  (was: Mesosphere Sprint 44, Mesosphere Sprint 
45)

> Introduce "global" decision policy for unhealthy tasks.
> ---
>
> Key: MESOS-6171
> URL: https://issues.apache.org/jira/browse/MESOS-6171
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.0.0
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: health-check, mesosphere
>
> Currently, if the task is deemed unhealthy, i.e. it failed a health check a 
> certain number of times, it is killed by both default executors: 
> [command|https://github.com/apache/mesos/blob/b053572bc424478cafcd60d1bce078f5132c4590/src/launcher/executor.cpp#L299]
>  and 
> [docker|https://github.com/apache/mesos/blob/b053572bc424478cafcd60d1bce078f5132c4590/src/docker/executor.cpp#L315].
>  This is what can be called "local" kill policy.
> While a local kill policy can save some network traffic and unload the 
> scheduler, there are cases when a scheduler may want to decide what to do and 
> when to do it. This is what can be called a "global" policy: the health check 
> library reports whether a health check failed or succeeded, while the 
> executor forwards this update to the scheduler without taking any action.
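> A minimal sketch of the "global" variant, assuming a {{driver}} is in scope 
> (illustrative only):
> {code}
> // Hypothetical sketch: forward the health transition to the scheduler
> // instead of killing the task locally.
> void onHealthUpdate(const TaskHealthStatus& health)
> {
>   TaskStatus status;
>   status.mutable_task_id()->CopyFrom(health.task_id());
>   status.set_state(TASK_RUNNING);        // The task keeps running.
>   status.set_healthy(health.healthy());  // Report only the transition.
>   driver->sendStatusUpdate(status);      // Let the scheduler decide.
> }
> {code}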



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6514) Improvements to quota.

2016-10-31 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-6514:
--

 Summary: Improvements to quota.
 Key: MESOS-6514
 URL: https://issues.apache.org/jira/browse/MESOS-6514
 Project: Mesos
  Issue Type: Epic
  Components: allocation, master
Reporter: Alexander Rukletsov


This is a follow-up epic to MESOS-1791 to capture further improvements and 
changes that need to be made after the MVP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1791:
---
Summary: Introduce Master / Offer Resource Reservations aka Quota.  (was: 
Introduce Master / Offer Resource Reservations aka Quota)

> Introduce Master / Offer Resource Reservations aka Quota.
> -
>
> Key: MESOS-1791
> URL: https://issues.apache.org/jira/browse/MESOS-1791
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master, replicated log
>Reporter: Tom Arnfeld
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Currently Mesos supports the ability to reserve resources (for a given role) 
> on a per-slave basis, as introduced in MESOS-505. This allows you to almost 
> statically partition off a set of resources on a set of machines, to 
> guarantee certain types of frameworks get some resources.
> This is very useful, though it is also very useful to be able to control 
> these reservations through the master (instead of per-slave) for when I don't 
> care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of 
> (X,Y).
> I'm not sure what structure this could take, but apparently it has already 
> been discussed. Would this be a CLI flag? Could there be an (authenticated) 
> web interface to control these reservations?
> Follow-up epic: MESOS-6514.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1791:
---
Description: 
Currently Mesos supports the ability to reserve resources (for a given role) on 
a per-slave basis, as introduced in MESOS-505. This allows you to almost 
statically partition off a set of resources on a set of machines, to guarantee 
certain types of frameworks get some resources.

This is very useful, though it is also very useful to be able to control these 
reservations through the master (instead of per-slave) for when I don't care 
which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of (X,Y).

I'm not sure what structure this could take, but apparently it has already been 
discussed. Would this be a CLI flag? Could there be an (authenticated) web 
interface to control these reservations?

Follow-up epic: MESOS-6514.

  was:
Currently Mesos supports the ability to reserve resources (for a given role) on 
a per-slave basis, as introduced in MESOS-505. This allows you to almost 
statically partition off a set of resources on a set of machines, to guarantee 
certain types of frameworks get some resources.

This is very useful, though it is also very useful to be able to control these 
reservations through the master (instead of per-slave) for when I don't care 
which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of (X,Y).

I'm not sure what structure this could take, but apparently it has already been 
discussed. Would this be a CLI flag? Could there be an (authenticated) web 
interface to control these reservations?


> Introduce Master / Offer Resource Reservations aka Quota
> 
>
> Key: MESOS-1791
> URL: https://issues.apache.org/jira/browse/MESOS-1791
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master, replicated log
>Reporter: Tom Arnfeld
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Currently Mesos supports the ability to reserve resources (for a given role) 
> on a per-slave basis, as introduced in MESOS-505. This allows you to almost 
> statically partition off a set of resources on a set of machines, to 
> guarantee certain types of frameworks get some resources.
> This is very useful, though it is also very useful to be able to control 
> these reservations through the master (instead of per-slave) for when I don't 
> care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of 
> (X,Y).
> I'm not sure what structure this could take, but apparently it has already 
> been discussed. Would this be a CLI flag? Could there be an (authenticated) 
> web interface to control these reservations?
> Follow-up epic: MESOS-6514.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3875) Account dynamic reservations towards quota.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3875:
---
Summary: Account dynamic reservations towards quota.  (was: Account dynamic 
reservations towards quota)

> Account dynamic reservations towards quota.
> ---
>
> Key: MESOS-3875
> URL: https://issues.apache.org/jira/browse/MESOS-3875
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Dynamic reservations, whether allocated or not, should be accounted towards 
> the role's quota. This requires updates in at least two places:
> * The built-in allocator, which actually satisfies quota;
> * The sanity check in the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3875) Account dynamic reservations towards quota.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3875:
---
Shepherd:   (was: Joris Van Remoortere)
Assignee: (was: Alexander Rukletsov)

> Account dynamic reservations towards quota.
> ---
>
> Key: MESOS-3875
> URL: https://issues.apache.org/jira/browse/MESOS-3875
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Reporter: Alexander Rukletsov
>  Labels: mesosphere
>
> Dynamic reservations, whether allocated or not, should be accounted towards 
> the role's quota. This requires updates in at least two places:
> * The built-in allocator, which actually satisfies quota;
> * The sanity check in the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3938) Consider allowing setting quotas for the default '*' role.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3938:
---
Shepherd:   (was: Joris Van Remoortere)
Assignee: (was: Alexander Rukletsov)
 Summary: Consider allowing setting quotas for the default '*' role.  (was: 
Allow setting quotas for the default '*' role)

> Consider allowing setting quotas for the default '*' role.
> --
>
> Key: MESOS-3938
> URL: https://issues.apache.org/jira/browse/MESOS-3938
> Project: Mesos
>  Issue Type: Task
>Reporter: Alexander Rukletsov
>
> Investigate the use cases and implications of allowing quota to be set for 
> the '*' role. For example, having quota set for '*' can effectively reduce 
> the scope of the quota capacity heuristic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6184:
---
Target Version/s: 1.2.0

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Blocker
>  Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, health checks use a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6142) Frameworks may RESERVE for an arbitrary role.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6142:
---
Affects Version/s: 1.1.0
 Target Version/s: 1.2.0

> Frameworks may RESERVE for an arbitrary role.
> -
>
> Key: MESOS-6142
> URL: https://issues.apache.org/jira/browse/MESOS-6142
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>  Labels: mesosphere, reservations
>
> The master does not validate that resources from a reservation request have 
> the same role the framework is registered with. As a result, frameworks may 
> reserve resources for arbitrary roles.
> I've modified the role in [the {{ReserveThenUnreserve}} 
> test|https://github.com/apache/mesos/blob/bca600cf5602ed8227d91af9f73d689da14ad786/src/tests/reservation_tests.cpp#L117]
>  to "yoyo" and observed the following in the test's log:
> {noformat}
> I0908 18:35:43.379122 2138112 master.cpp:3362] Processing ACCEPT call for 
> offers: [ dfaf67e6-7c1c-4988-b427-c49842cb7bb7-O0 ] on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train) for framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- 
> (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116
> I0908 18:35:43.379170 2138112 master.cpp:3022] Authorizing principal 
> 'test-principal' to reserve resources 'cpus(yoyo, test-principal):1; 
> mem(yoyo, test-principal):512'
> I0908 18:35:43.379678 2138112 master.cpp:3642] Applying RESERVE operation for 
> resources cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 from 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.379767 2138112 master.cpp:7341] Sending checkpointed resources 
> cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.380273 3211264 slave.cpp:2497] Updated checkpointed resources 
> from  to cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512
> I0908 18:35:43.380574 2674688 hierarchical.cpp:760] Updated allocation of 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 from cpus(*):1; mem(*):512; 
> disk(*):470841; ports(*):[31000-32000] to ports(*):[31000-32000]; cpus(yoyo, 
> test-principal):1; disk(*):470841; mem(yoyo, test-principal):512 with RESERVE 
> operation
> {noformat}
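> A minimal sketch of the missing check, with illustrative names (not 
> necessarily where the fix will land):
> {code}
> // Hypothetical sketch: reject RESERVE operations whose resources
> // carry a role other than the framework's role.
> Option<Error> validate(
>     const Offer::Operation::Reserve& reserve,
>     const FrameworkInfo& frameworkInfo)
> {
>   foreach (const Resource& resource, reserve.resources()) {
>     if (resource.role() != frameworkInfo.role()) {
>       return Error(
>           "Resource '" + stringify(resource) + "' is reserved for role '" +
>           resource.role() + "', but the framework has role '" +
>           frameworkInfo.role() + "'");
>     }
>   }
>   return None();
> }
> {code}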



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6457:
---
Priority: Blocker  (was: Major)
 Summary: Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.  
(was: Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING)

> Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.
> -
>
> Key: MESOS-6457
> URL: https://issues.apache.org/jira/browse/MESOS-6457
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Blocker
>
> A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}}, if, 
> for example, it starts/stops passing a health check once it has entered the 
> {{TASK_KILLING}} state.
> I think that this behaviour is counterintuitive. It also makes the life of 
> framework/tools developers harder, since they have to keep track of the 
> complete task status history in order to know if a task is being killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6494) Clean up the flags parsing in the executors.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6494:
---
Summary: Clean up the flags parsing in the executors.  (was: Clean up the 
flags parsing in the executors)

> Clean up the flags parsing in the executors.
> 
>
> Key: MESOS-6494
> URL: https://issues.apache.org/jira/browse/MESOS-6494
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> The current executors and the executor libraries use a mix of {{stout::flags}} 
> and {{os::getenv}} to parse flags, leading to a lot of unnecessary and 
> sometimes duplicated code.
> This should be cleaned up, using only {{stout::flags}} to parse flags.
> Environment variables should be used for the flags that are common to ALL the 
> executors (listed in the Executor HTTP API doc).
> Command line parameters should be used for flags that apply only to 
> individual executors.
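> A minimal sketch of the {{stout::flags}}-only approach; the flag and its 
> default value are illustrative:
> {code}
> // Hypothetical sketch: one FlagsBase subclass replaces ad-hoc
> // os::getenv() lookups.
> struct ExecutorFlags : public flags::FlagsBase
> {
>   ExecutorFlags()
>   {
>     add(&ExecutorFlags::launcher_dir,
>         "launcher_dir",
>         "Directory path of Mesos binaries.",
>         "/usr/libexec/mesos");
>   }
>
>   std::string launcher_dir;
> };
>
> // load() consults both the command line and environment variables
> // with the given prefix, e.g. MESOS_EXECUTOR_LAUNCHER_DIR.
> ExecutorFlags flags;
> Try<flags::Warnings> load = flags.load("MESOS_EXECUTOR_", argc, argv);
> {code}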



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6517) Health checking only on 127.0.0.1 is limiting.

2016-10-31 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-6517:
--

 Summary: Health checking only on 127.0.0.1 is limiting.
 Key: MESOS-6517
 URL: https://issues.apache.org/jira/browse/MESOS-6517
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov


As of Mesos 1.1.0, HTTP and TCP health checks always use 127.0.0.1 as the 
target IP. This is not configurable. As a result, tasks should listen on all 
interfaces if they want to support HTTP and TCP health checks. However, there 
might be some cases where tasks or containers will end up binding to a specific 
IP address. 

To make health checking more robust we can:
* look at all interfaces in a given network namespace and perform the health 
check on all the IP addresses (see the sketch after this list);
* allow users to specify the IP to health check;
* deduce the target IP from the task's discovery information.
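A minimal sketch of the first option, enumerating the IPv4 addresses visible in 
the current network namespace (illustrative only):
{code}
// Hypothetical sketch: collect all local IPv4 addresses so each can
// be tried as a health check target.
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>

#include <string>
#include <vector>

static std::vector<std::string> localAddresses()
{
  std::vector<std::string> result;

  ifaddrs* addrs = nullptr;
  if (::getifaddrs(&addrs) != 0) {
    return result;
  }

  for (ifaddrs* ifa = addrs; ifa != nullptr; ifa = ifa->ifa_next) {
    if (ifa->ifa_addr != nullptr && ifa->ifa_addr->sa_family == AF_INET) {
      char buffer[INET_ADDRSTRLEN];
      const in_addr& addr =
        reinterpret_cast<sockaddr_in*>(ifa->ifa_addr)->sin_addr;
      if (::inet_ntop(AF_INET, &addr, buffer, sizeof(buffer)) != nullptr) {
        result.push_back(buffer);
      }
    }
  }

  ::freeifaddrs(addrs);
  return result;
}
{code}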



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6349) JSON Generation breaks if other locale than C is used.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6349:
---
Summary: JSON Generation breaks if other locale than C is used.  (was: JSON 
Generation breaks if other locale than C is used)

> JSON Generation breaks if other locale than C is used.
> --
>
> Key: MESOS-6349
> URL: https://issues.apache.org/jira/browse/MESOS-6349
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alexander Rojas
>
> In locales where the decimal separator is different from a {{.}}, e.g. Latin 
> American locales, European locales, and most of Asia, the JSON generated is 
> invalid, since it uses the system locale.
> For example, the following JSON is generated:
> {code}
> {
>   "float_number" : 1234567,9871
> }
> {code}
> Instead of the expected:
> {code}
> {
>   "float_number" : 1234567.9871
> }
> {code}
> This problem doesn't affect Mesos executables, since they completely ignore 
> the system's locale, but it does affect applications built in Java on top of 
> stout and libprocess, since the JVM sets the process locale to the system one.
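> A minimal sketch of a locale-independent fix: render numbers through a stream 
> imbued with the classic "C" locale (illustrative only):
> {code}
> // Hypothetical sketch: the decimal separator is always '.' regardless
> // of the process locale.
> #include <locale>
> #include <sstream>
> #include <string>
>
> static std::string jsonNumber(double value)
> {
>   std::ostringstream out;
>   out.imbue(std::locale::classic());
>   out.precision(17); // Enough digits to round-trip an IEEE 754 double.
>   out << value;
>   return out.str();
> }
> {code}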



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6349) JSON Generation breaks if other locale than C is used.

2016-10-31 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6349:
---
Target Version/s: 1.2.0

> JSON Generation breaks if other locale than C is used.
> --
>
> Key: MESOS-6349
> URL: https://issues.apache.org/jira/browse/MESOS-6349
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alexander Rojas
>
> In locales where the decimal separator is different from a {{.}}, e.g. Latin 
> American locales, European locales, and most of Asia, the JSON generated is 
> invalid, since it uses the system locale.
> For example, the following JSON is generated:
> {code}
> {
>   "float_number" : 1234567,9871
> }
> {code}
> Instead of the expected:
> {code}
> {
>   "float_number" : 1234567.9871
> }
> {code}
> This problem doesn't affect Mesos executables, since they completely ignore 
> the system's locale, but it does affect applications built in Java on top of 
> stout and libprocess, since the JVM sets the process locale to the system one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6494) Clean up the flags parsing in the executors.

2016-11-01 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625362#comment-15625362
 ] 

Alexander Rukletsov commented on MESOS-6494:


{noformat}
commit e802760c932e766775eba481d890d43232693815
Author: Gastón Kleiman gas...@mesosphere.com
Date:   Mon Oct 31 16:07:42 2016 -0700

Removed outdated TODO in stout::flags.

Review: https://reviews.apache.org/r/52878/
{noformat}
{noformat}
commit 0d116e099b6568dd0b99a973df57f3581b81a8a4
Author: Gastón Kleiman gas...@mesosphere.com
Date:   Mon Oct 31 16:09:04 2016 -0700

Added parsers for 'SlaveID', 'ExecutorID' and 'FrameworkID'.

Review: https://reviews.apache.org/r/53197/
{noformat}

> Clean up the flags parsing in the executors.
> 
>
> Key: MESOS-6494
> URL: https://issues.apache.org/jira/browse/MESOS-6494
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> The current executors and the executor libraries use a mix of {{stout::flags}} 
> and {{os::getenv}} to parse flags, leading to a lot of unnecessary and 
> sometimes duplicated code.
> This should be cleaned up, using only {{stout::flags}} to parse flags.
> Environment variables should be used for the flags that are common to ALL the 
> executors (listed in the Executor HTTP API doc).
> Command line parameters should be used for flags that apply only to 
> individual executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6457:
---
Target Version/s: 0.28.3, 1.0.2, 1.1.0  (was: 1.0.2, 1.1.0)

> Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.
> -
>
> Key: MESOS-6457
> URL: https://issues.apache.org/jira/browse/MESOS-6457
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.1
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Blocker
>
> A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}}, if, 
> for example, it starts/stops passing a health check once it has entered the 
> {{TASK_KILLING}} state.
> I think that this behaviour is counterintuitive. It also makes the life of 
> framework/tools developers harder, since they have to keep track of the 
> complete task status history in order to know if a task is being killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6457:
---
Affects Version/s: 0.28.2
   1.0.1

> Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.
> -
>
> Key: MESOS-6457
> URL: https://issues.apache.org/jira/browse/MESOS-6457
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.1
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Blocker
>
> A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}}, if, 
> for example, it starts/stops passing a health check once it has entered the 
> {{TASK_KILLING}} state.
> I think that this behaviour is counterintuitive. It also makes the life of 
> framework/tools developers harder, since they have to keep track of the 
> complete task status history in order to know if a task is being killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5597) Document Mesos "health check" feature.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5597:
---
Summary: Document Mesos "health check" feature.  (was: Document Mesos 
"health check" feature)

> Document Mesos "health check" feature.
> --
>
> Key: MESOS-5597
> URL: https://issues.apache.org/jira/browse/MESOS-5597
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Neil Conway
>  Labels: documentation, health-check, mesosphere
>
> We don't talk about this feature at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5597) Document Mesos "health check" feature.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5597:
---
Shepherd: Till Toenshoff
Assignee: Alexander Rukletsov
  Sprint: Mesosphere Sprint 46
Target Version/s: 1.2.0
  Issue Type: Documentation  (was: Bug)

> Document Mesos "health check" feature.
> --
>
> Key: MESOS-5597
> URL: https://issues.apache.org/jira/browse/MESOS-5597
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>  Labels: documentation, health-check, mesosphere
>
> We don't talk about this feature at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6524) Command health checks ignore some fields from CommandInfo.

2016-11-01 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-6524:
--

 Summary: Command health checks ignore some fields from CommandInfo.
 Key: MESOS-6524
 URL: https://issues.apache.org/jira/browse/MESOS-6524
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov
Priority: Minor


Command health checks are specified via {{CommandInfo}}, but ignore several of 
its fields, e.g. {{user}} and {{uris}}. We should either respect these fields 
or advise users not to use them, i.e. fail validation and/or print warnings.
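A minimal sketch of the validation variant (illustrative only):
{code}
// Hypothetical sketch: reject command health checks that set fields
// the checker currently ignores.
Option<Error> validateHealthCheckCommand(const CommandInfo& command)
{
  if (command.has_user()) {
    return Error("'CommandInfo.user' is not supported for health checks");
  }

  if (command.uris_size() > 0) {
    return Error("'CommandInfo.uris' is not supported for health checks");
  }

  return None();
}
{code}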



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2018) Dynamic Reservation

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-2018:
---
Priority: Major  (was: Critical)

> Dynamic Reservation
> ---
>
> Key: MESOS-2018
> URL: https://issues.apache.org/jira/browse/MESOS-2018
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, framework, master, slave
>Reporter: Adam B
>Assignee: Michael Park
>  Labels: mesosphere, offer, persistence, reservations, resource, 
> stateful, storage
>
> h3. Overview
> This is a feature to provide better support for running stateful services on 
> Mesos such as HDFS (Distributed Filesystem), Cassandra (Distributed 
> Database), or MySQL (Local Database).
> Current resource reservations (henceforth called "static" reservations) are 
> statically determined by the slave operator at slave start time, and 
> individual frameworks have no authority to reserve resources themselves.
> Dynamic reservations allow a framework to dynamically reserve offered 
> resources, such that those resources will only be re-offered to the same 
> framework (or other frameworks with the same role).
> This is especially useful if the framework's task stored some state on the 
> slave, and needs a guaranteed set of resources reserved so that it can 
> re-launch a task on the same slave to recover that state.
> h3. Planned Stages
> 1. MESOS-2489: Enable a framework to perform reservation operations.
> The goal of this stage is to allow the framework to send back a 
> Reserve/Unreserve operation which gets validated by the master and updates 
> the allocator resources. The allocator's {{allocate}} logic is left unchanged 
> and the resources get offered back to the framework's role as desired.
> 2. MESOS-2491: Persist the reservation state on the slave.
> The goal of this stage is to persist the reservation state on the slave. 
> Currently the master knows to store the persistent volumes in the 
> {{checkpointedResources}} data structure which gets sent to individual slaves 
> to be checkpointed. We will update the master such that dynamically reserved 
> resources are stored in the {{checkpointedResources}} as well. This stage 
> also involves subtasks such as updating the slave re(register) logic to 
> support slave re-starts.
> 3. MESOS-2600: Introduce reservation HTTP endpoints on the master.
> The goal of this stage is to enable operators to perform reservation 
> operations via HTTP endpoints on the master.
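> A minimal sketch of stage 1 from the scheduler's side, building the reserved 
> resource by hand to avoid version-specific helpers ({{offer}}, {{driver}} and 
> {{frameworkInfo}} assumed in scope; illustrative only):
> {code}
> // Hypothetical sketch: answer an offer with a RESERVE operation for
> // one CPU, reserved for the framework's role.
> Resource cpus;
> cpus.set_name("cpus");
> cpus.set_type(Value::SCALAR);
> cpus.mutable_scalar()->set_value(1);
> cpus.set_role(frameworkInfo.role());
> cpus.mutable_reservation()->set_principal(frameworkInfo.principal());
>
> Offer::Operation operation;
> operation.set_type(Offer::Operation::RESERVE);
> operation.mutable_reserve()->add_resources()->CopyFrom(cpus);
>
> driver->acceptOffers({offer.id()}, {operation});
> {code}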



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6247) Enable Framework to set weight.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6247:
---
  Priority: Major  (was: Critical)
Issue Type: Wish  (was: Bug)
   Summary: Enable Framework to set weight.  (was: Enable Framework to set 
weight)

> Enable Framework to set weight.
> ---
>
> Key: MESOS-6247
> URL: https://issues.apache.org/jira/browse/MESOS-6247
> Project: Mesos
>  Issue Type: Wish
>  Components: allocation
> Environment: all
>Reporter: Klaus Ma
>
> We'd like to let a framework set its weight when it registers, so that 
> frameworks can share resources based on weight within the same role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1256) Enable running all existing containerizer tests with different containerizer implementations.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1256:
---
  Priority: Major  (was: Critical)
Issue Type: Improvement  (was: Bug)

> Enable running all existing containerizer tests with different containerizer 
> implementations.
> -
>
> Key: MESOS-1256
> URL: https://issues.apache.org/jira/browse/MESOS-1256
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave, test
>Reporter: Benjamin Hindman
>
> Right now the tests are run only with the MesosContainerizer, but they should 
> be pluggable so that new containerizer implementations can be exercised 
> directly. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1547) option --cgroups_enable_cfs - not working on Oracle Linux

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1547:
---
Priority: Major  (was: Critical)

> option --cgroups_enable_cfs - not working on Oracle Linux 
> --
>
> Key: MESOS-1547
> URL: https://issues.apache.org/jira/browse/MESOS-1547
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.19.0
> Environment: Oracle Enterprise Linux - version 
> 2.6.39-400.215.3.el6uek.x86_64
>Reporter: Umesh Batra
>
> I am not able to make the option "cgroups_enable_cfs" work with OEL. 
>  
> A couple of posts that I came across talk about rebuilding the kernel with a 
> different configuration. 
>  
> I would appreciate it if you could confirm this or guide me on the same. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2759) Could not create MesosContainerizer: Could not create isolator network/port_mapping: Routing library check failed: Capability ROUTE_LINK_VETH_GET_PEER_OWN_REFERENCE is no

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-2759:
---
Target Version/s: 0.22.1, 0.22.0  (was: 0.22.0, 0.22.1)
Priority: Major  (was: Critical)

> Could not create MesosContainerizer: Could not create isolator 
> network/port_mapping: Routing library check failed: Capability 
> ROUTE_LINK_VETH_GET_PEER_OWN_REFERENCE is not available
> -
>
> Key: MESOS-2759
> URL: https://issues.apache.org/jira/browse/MESOS-2759
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anuj Gupta
>
> [root@localhost ~]# mesos-slave --master=127.0.0.1:5050 
> --log_dir=/var/log/mesos --work_dir=/var/lib/mesos 
> --isolation=cgroups/cpu,cgroups/mem,network/port_mapping 
> --resources=ephemeral_ports:[32768-57344] --ephemeral_ports_per_container=1024
> mesos-slave: /usr/lib64/libnl-3.so.200: no version information available 
> (required by /usr/local/lib/libmesos-0.22.0.so)
> mesos-slave: /usr/lib64/libnl-route-3.so.200: no version information 
> available (required by /usr/local/lib/libmesos-0.22.0.so)
> mesos-slave: /usr/lib64/libnl-3.so.200: no version information available 
> (required by /lib/libnl-idiag-3.so.200)
> I0521 14:10:16.727126 13214 logging.cpp:172] INFO level logging started!
> I0521 14:10:16.727409 13214 main.cpp:156] Build: 2015-05-21 13:21:45 by root
> I0521 14:10:16.727432 13214 main.cpp:158] Version: 0.22.0
> I0521 14:10:16.727727 13214 containerizer.cpp:110] Using isolation: 
> cgroups/cpu,cgroups/mem,network/port_mapping
> Failed to create a containerizer: Could not create MesosContainerizer: Could 
> not create isolator network/port_mapping: Routing library check failed: 
> Capability ROUTE_LINK_VETH_GET_PEER_OWN_REFERENCE is not available



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2828) Refactor 'model' functions and use JSON::Protobuf instead

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-2828:
---
  Priority: Major  (was: Critical)
Issue Type: Improvement  (was: Bug)

> Refactor 'model' functions and use JSON::Protobuf instead
> -
>
> Key: MESOS-2828
> URL: https://issues.apache.org/jira/browse/MESOS-2828
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rojas
>  Labels: mesosphere
>
> The current pattern used to serialize objects returned by endpoints requires 
> the use of a {{model}} function which receives the object to be serialized 
> and returns a JSON object. (See slave.cpp and master.cpp.)
> It would be better to use {{JSON::Protobuf}} to serialize protocol buffer 
> objects and, if possible, to write a {{jsonify}} function similar to 
> {{stringify}}.
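> A minimal sketch of the {{JSON::Protobuf}}-based direction, assuming a 
> {{taskInfo}} message in scope (illustrative only):
> {code}
> // Hypothetical sketch: serialize a protobuf message directly instead
> // of hand-writing a model() function for it.
> #include <stout/protobuf.hpp>
>
> JSON::Object object = JSON::protobuf(taskInfo);
> std::cout << stringify(object) << std::endl;
> {code}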



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3000) Failing test - NsTest.ROOT_setns

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3000:
---
Priority: Major  (was: Blocker)

> Failing test - NsTest.ROOT_setns
> 
>
> Key: MESOS-3000
> URL: https://issues.apache.org/jira/browse/MESOS-3000
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.0
>Reporter: Ian Downes
>Assignee: Chris Lambert
>
> Appears to be the same issue plaguing MESOS-2199
> {noformat}
> [root@hostname build]# MESOS_VERBOSE=1 ./bin/mesos-tests.sh 
> --gtest_filter=NsTest.ROOT_setns
> ...
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from NsTest
> [ RUN  ] NsTest.ROOT_setns
> ABORT: (../../../3rdparty/libprocess/src/subprocess.cpp:163): Failed to 
> os::execvpe in childMain: Permission denied*** Aborted at 1436292540 (unix 
> time) try "date -d @1436292540" if you are using GNU date ***
> PC: @ 0x7f7a1229e625 __GI_raise
> *** SIGABRT (@0xfffe0001) received by PID 1 (TID 0x7f7a19afc820) from PID 
> 1; stack trace: ***
> @ 0x7f7a13421710 (unknown)
> @ 0x7f7a1229e625 __GI_raise
> @ 0x7f7a1229fe05 __GI_abort
> @   0x860ba1 (unknown)
> @   0x860bcf (unknown)
> @ 0x7f7a1826f118 (unknown)
> @ 0x7f7a18274594 (unknown)
> @ 0x7f7a18273b88 (unknown)
> @ 0x7f7a18273098 (unknown)
> @  0x1180720 (unknown)
> @  0x117a5d7 (unknown)
> @ 0x7f7a123548fd clone
> ../../src/tests/ns_tests.cpp:121: Failure
> Failed to wait 15secs for status
> [  FAILED  ] NsTest.ROOT_setns (15004 ms)
> [--] 1 test from NsTest (15004 ms total)
> [--] Global test environment tear-down
> ../../src/tests/environment.cpp:441: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 40531 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests 
> --gtest_filter=NsTest.ROOT_setns
>  \--- 40565 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests 
> --gtest_filter=NsTest.ROOT_setns
> [==] 1 test from 1 test case ran. (15034 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] NsTest.ROOT_setns
> {noformat}
> Relevant strace for the forked child:
> {noformat}
> ...
> getpid()= 1
> dup2(6, 0) = 0
> dup2(7, 1) = 1
> dup2(8, 2) = 2
> close(6) = 0
> close(7) = 0
> close(8) = 0
> execve("/home/idownes/workspace/mesos/build/src/setns-test-helper", 
> ["setns-test-helper", "SetnsTestHelper"], [/* 24 vars */]) = -1 EACCES 
> (Permission denied)
> write(2, "ABORT: (../../../3rdparty/libpro"..., 62) = 62
> write(2, "Failed to os::execvpe in childMa"..., 53) = 53
> ...
> {noformat}
> Binary that it's trying to exec:
> {noformat}
> [root@hostname build]# stat 
> /home/idownes/workspace/mesos/build/src/setns-test-helper
>   File: `/home/idownes/workspace/mesos/build/src/setns-test-helper'
>   Size: 7948Blocks: 16 IO Block: 4096   regular file
> Device: 801h/2049d  Inode: 22949249Links: 1
> Access: (0755/-rwxr-xr-x)  Uid: (13118/ idownes)   Gid: ( 1500/employee)
> Access: 2015-07-07 17:58:09.569861237 +
> Modify: 2015-07-07 17:58:09.573861290 +
> Change: 2015-07-07 17:58:09.573861290 +
> [root@hostname build]# 
> /home/idownes/workspace/mesos/build/src/setns-test-helper
> Usage: /home/idownes/workspace/mesos/build/src/.libs/lt-setns-test-helper 
>  [OPTIONS]
> Available subcommands:
> help
> SetnsTestHelper
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3039) Allow executors binding IP to be different than Slave binding IP.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3039:
---
Issue Type: Improvement  (was: Bug)
   Summary: Allow executors binding IP to be different than Slave binding 
IP.  (was: Allow executors binding IP to be different than Slave binding IP)

> Allow executors binding IP to be different than Slave binding IP.
> -
>
> Key: MESOS-3039
> URL: https://issues.apache.org/jira/browse/MESOS-3039
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>Priority: Critical
>  Labels: mesosphere
>
> Currently, the Slave will bind either to the loopback IP (127.0.0.1) or to 
> the IP passed via the '--ip' flag. When it launches a containerized executor 
> (e.g, via Mesos Containerizer), the executor inherits the binding IP of the 
> Slave. This is due to the fact that the '--ip' flag sets the environment 
> variable `LIBPROCESS_IP` to the passed IP. The executor then inherits this 
> environment variable and is forced to bind to the Slave IP.
> If an executor is running in its own containerized environment, with a 
> separate IP than that of the Slave, currently there is no way of forcing it 
> to bind to its own IP. A potential solution is to use the executor 
> environment decorator hooks to update LIBPROCESS_IP environment variable for 
> the executor.
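> A minimal sketch of the decorator approach; the hook method follows the 
> module API, while {{lookupContainerIp}} is a hypothetical helper:
> {code}
> // Hypothetical sketch: override LIBPROCESS_IP for executors that run
> // with their own IP.
> class ExecutorIpHook : public Hook
> {
> public:
>   Result<Environment> slaveExecutorEnvironmentDecorator(
>       const ExecutorInfo& executorInfo) override
>   {
>     Environment environment;
>     Environment::Variable* variable = environment.add_variables();
>     variable->set_name("LIBPROCESS_IP");
>     variable->set_value(lookupContainerIp(executorInfo)); // Assumed helper.
>     return environment;
>   }
> };
> {code}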



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6280) Task group executor should support command health checks.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6280:
---
Priority: Critical  (was: Blocker)

> Task group executor should support command health checks.
> -
>
> Key: MESOS-6280
> URL: https://issues.apache.org/jira/browse/MESOS-6280
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Priority: Critical
>  Labels: mesosphere
>
> Currently, the default (aka pod) executor supports only HTTP and TCP health 
> checks. We should support command health checks as well.
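> A minimal sketch of what a command health check on a task could look like 
> once supported ({{task}} assumed in scope; illustrative only):
> {code}
> // Hypothetical sketch: attach a COMMAND health check to a task.
> HealthCheck healthCheck;
> healthCheck.set_type(HealthCheck::COMMAND);
> healthCheck.mutable_command()->set_value(
>     "curl -f http://localhost:8080/ping");
> healthCheck.set_interval_seconds(10);
> healthCheck.set_grace_period_seconds(30);
>
> task.mutable_health_check()->CopyFrom(healthCheck);
> {code}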



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6280) Task group executor should support command health checks.

2016-11-01 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6280:
---
Issue Type: Improvement  (was: Bug)

> Task group executor should support command health checks.
> -
>
> Key: MESOS-6280
> URL: https://issues.apache.org/jira/browse/MESOS-6280
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Priority: Critical
>  Labels: mesosphere
>
> Currently, the default (aka pod) executor supports only HTTP and TCP health 
> checks. We should support command health checks as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

