[jira] [Commented] (MESOS-4792) Remove src/common/date_utils.{c,h}pp

2016-02-29 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173384#comment-15173384
 ] 

Yong Tang commented on MESOS-4792:
--

The review request seems to have been merged. Thanks [~neilc] for the review.

> Remove src/common/date_utils.{c,h}pp
> 
>
> Key: MESOS-4792
> URL: https://issues.apache.org/jira/browse/MESOS-4792
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Assignee: Yong Tang
>Priority: Trivial
>  Labels: mesosphere, newbie, tech-debt
>
> AFAICT this is unused.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-02-29 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173311#comment-15173311
 ] 

Greg Mann commented on MESOS-4053:
--

I just ran into this while testing 0.27.2-rc1 on Ubuntu 14.04, using gcc, with 
libevent and SSL enabled. This was after running several rounds of tests on 
this machine with different builds of Mesos:

{code}
[ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
I0301 06:36:33.444818 45724 process.cpp:2492] Spawned process 
files@172.31.18.71:50382
I0301 06:36:33.444849 45739 process.cpp:2502] Resuming files@172.31.18.71:50382 
at 2016-03-01 06:36:33.444837888+00:00
I0301 06:36:33.445101 45745 process.cpp:2502] Resuming help@172.31.18.71:50382 
at 2016-03-01 06:36:33.445021952+00:00
I0301 06:36:33.458566 45724 process.cpp:2492] Spawned process 
__latch__(2)@172.31.18.71:50382
I0301 06:36:33.458591 45746 process.cpp:2502] Resuming 
__gc__@172.31.18.71:50382 at 2016-03-01 06:36:33.458576896+00:00
I0301 06:36:33.458600 45738 process.cpp:2502] Resuming 
__latch__(2)@172.31.18.71:50382 at 2016-03-01 06:36:33.458589952+00:00
I0301 06:36:33.458652 45738 process.cpp:2607] Cleaning up 
__latch__(2)@172.31.18.71:50382
../../src/tests/mesos.cpp:955: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy
---
We're very sorry but we can't seem to destroy existing
cgroups that we likely created as part of an earlier
invocation of the tests. Please manually destroy the cgroup
at '/sys/fs/cgroup/perf_event/mesos_test' by first
manually killing all the processes found in the file at 
'/sys/fs/cgroup/perf_event/mesos_test/tasks'
---
I0301 06:36:33.458739 45744 process.cpp:2502] Resuming 
__gc__@172.31.18.71:50382 at 2016-03-01 06:36:33.458727936+00:00
I0301 06:36:33.458854 45749 process.cpp:2502] Resuming 
AuthenticationRouter(1)@172.31.18.71:50382 at 2016-03-01 
06:36:33.458842112+00:00
I0301 06:36:33.461118 45724 process.cpp:2492] Spawned process 
__latch__(3)@172.31.18.71:50382
I0301 06:36:33.461139 45752 process.cpp:2502] Resuming 
__gc__@172.31.18.71:50382 at 2016-03-01 06:36:33.461127936+00:00
I0301 06:36:33.461161 45748 process.cpp:2502] Resuming 
__latch__(3)@172.31.18.71:50382 at 2016-03-01 06:36:33.461143040+00:00
../../src/tests/mesos.cpp:989: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy
I0301 06:36:33.461216 45748 process.cpp:2607] Cleaning up 
__latch__(3)@172.31.18.71:50382
I0301 06:36:33.461310 45742 process.cpp:2502] Resuming files@172.31.18.71:50382 
at 2016-03-01 06:36:33.461299968+00:00
I0301 06:36:33.461315 45740 process.cpp:2502] Resuming 
__gc__@172.31.18.71:50382 at 2016-03-01 06:36:33.461304064+00:00
I0301 06:36:33.461503 45742 process.cpp:2607] Cleaning up 
files@172.31.18.71:50382
[  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (17 ms)
{code}
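The advice in the failure banner above (kill every process listed in the cgroup's {{tasks}} file, then remove the cgroup) can be sketched in code. The following is a rough illustration of that manual cleanup, not Mesos's own {{cgroups::destroy}}:

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

#include <signal.h>
#include <unistd.h>

// Parse one PID per line, the format of a cgroup's 'tasks' file.
std::vector<pid_t> parsePids(std::istream& in) {
  std::vector<pid_t> pids;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty()) {
      pids.push_back(static_cast<pid_t>(std::stol(line)));
    }
  }
  return pids;
}

// Kill every process still attached to the cgroup, then remove the
// (now empty) cgroup directory. Returns true if the rmdir succeeded.
bool destroyCgroup(const std::string& cgroup) {
  std::ifstream tasks(cgroup + "/tasks");
  if (tasks) {
    for (pid_t pid : parsePids(tasks)) {
      ::kill(pid, SIGKILL);
    }
  }
  return ::rmdir(cgroup.c_str()) == 0;
}
```

For the path in the log above this would be {{destroyCgroup("/sys/fs/cgroup/perf_event/mesos_test")}}, run as root.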

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups 

[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-02-29 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173283#comment-15173283
 ] 

Guangya Liu commented on MESOS-3505:


[~xujyan] I have two questions I'd like you to confirm, can you help?

1) "For this reason, it's possible when an image with a repo:tag already cached 
locally on an agent host and a task requiring this repo:tag arrives, it's using 
an image that's different than the one the user intended."
> I did not quite catch what the problem is here: why would the task use an 
> image it did not intend to use? Do you mean that the image might be updated 
> in the remote registry while the local cache still holds the old one?

2) If we support pulling Docker images by digest and image ID, how does the end 
user get the digest or image ID before pulling?

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>
> A common way to specify a Docker image with the docker engine is through 
> {{repo:tag}}, which is convenient and sufficient for most people in most 
> scenarios. However this combination is neither precise nor immutable.
> For this reason, it's possible when an image with a {{repo:tag}} already 
> cached locally on an agent host and a task requiring this {{repo:tag}} 
> arrives, it's using an image that's different than the one the user intended.
> Docker CLI already supports referring to an image by {{repo@id}}, where the 
> ID can have two forms:
> * v1 Image ID
> * digest
> Native Mesos provisioner should support the same for Docker images. IMO it's 
> fine if image discovery by ID is not supported (and thus still requiring 
> {{repo:tag}} to be specified) (looks like [v2 
> registry|http://docs.docker.com/registry/spec/api/] does support it) but the 
> user can optionally specify an image ID and match it against the cached / 
> newly pulled image. If the ID doesn't match the cached image, the store can 
> re-pull it; if the ID doesn't match the newly pulled image (manifest), the 
> provisioner can fail the request without having the user unknowingly running 
> its task on the wrong image.





[jira] [Commented] (MESOS-4814) Implement private registry test with ssl.

2016-02-29 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173246#comment-15173246
 ] 

Guangya Liu commented on MESOS-4814:


One question: is SSL still needed for the registry puller after it was 
refactored to use the fetcher to pull Docker images?

> Implement private registry test with ssl.
> -
>
> Key: MESOS-4814
> URL: https://issues.apache.org/jira/browse/MESOS-4814
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer
>
> Test the unified containerizer using Docker images, with SSL enabled to test 
> the private registry.





[jira] [Updated] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-4825:

Shepherd: Joris Van Remoortere
Assignee: Klaus Ma

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Assignee: Klaus Ma
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.





[jira] [Commented] (MESOS-4447) Updated reserved() API

2016-02-29 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173241#comment-15173241
 ] 

Guangya Liu commented on MESOS-4447:


[~bmahler] The reasons I want to remove this API are:

1) The API {{Resources Resources::reserved(const string& role) const}} should 
be able to return either the reserved resources for a specified role or all 
reserved resources across roles in flatten mode. {{Optimistic Offer Phase 1}} 
depends heavily on flattening reserved resources across roles, because I need 
to translate those resources into allocation slack. So I am removing the 
{{reserved()}} overload and letting {{Resources Resources::reserved(const 
string& role) const}} work without a role specified.

2) If the {{reserved()}} overload is removed, any caller that used it can 
first get the reserved resources for the different roles and then build the 
hashmap itself. Currently only two places call the {{reserved()}} API.
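The merged signature being proposed can be sketched standalone. Below, {{std::optional}} stands in for stout's {{Option}}, and the {{Resource}}/{{Resources}} types are simplified stand-ins rather than the real Mesos classes:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins for the Mesos types.
struct Resource {
  std::string name;
  double value;
  std::string role;  // "*" means unreserved.
};

using Resources = std::vector<Resource>;

bool isReserved(const Resource& r,
                const std::optional<std::string>& role = std::nullopt) {
  bool reserved = r.role != "*";
  return role ? (reserved && *role == r.role) : reserved;
}

// With a defaulted optional role, one function covers both callers:
// reserved() returns all reserved resources across roles, while
// reserved("ads") still filters by a single role.
Resources reserved(const Resources& resources,
                   const std::optional<std::string>& role = std::nullopt) {
  Resources result;
  for (const Resource& r : resources) {
    if (isReserved(r, role)) {
      result.push_back(r);
    }
  }
  return result;
}
```

The defaulted parameter keeps existing role-filtering callers working while letting new callers omit the role entirely.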



> Updated reserved() API
> --
>
> Key: MESOS-4447
> URL: https://issues.apache.org/jira/browse/MESOS-4447
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> There are some problems for current {{reserve}} API. The problem is as 
> following:
> {code}
> hashmap<string, Resources> Resources::reserved() const
> {
>   hashmap<string, Resources> result;
>   foreach (const Resource& resource, resources) {
>     if (isReserved(resource)) {
>       result[resource.role()] += resource;
>     }
>   }
>   return result;
> }
> Resources Resources::reserved(const string& role) const
> {
>   return filter(lambda::bind(isReserved, lambda::_1, role));
> }
> bool Resources::isReserved(
>     const Resource& resource,
>     const Option<string>& role)
> {
>   if (role.isSome()) {
>     return !isUnreserved(resource) && role.get() == resource.role();
>   } else {
>     return !isUnreserved(resource);
>   }
> }
> {code}
> This means {{reserved(const string& role)}} gives callers no way to pass a 
> None() parameter to get all reserved resources in flatten mode.
> The solution is to remove {{reserved()}} and update {{reserved(const string& 
> role)}} to {{reserved(const Option<string>& role = None())}}.





[jira] [Commented] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173239#comment-15173239
 ] 

Joris Van Remoortere commented on MESOS-4825:
-

I can shepherd this.
I don't think we should reject if there is a version mismatch. That would 
prevent us from doing rolling upgrades.
We just want to update the version to the current one the agent is running, so 
that the {{/slaves}} endpoint reports it correctly, and any logic that is 
dependent on the slave's version works correctly.
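The behavior described here (record the agent's reported version on re-registration, never reject on mismatch) reduces to a small bookkeeping update. A toy sketch with a hypothetical {{Slave}} struct, not the master's actual code:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the master's per-agent bookkeeping.
struct Slave {
  std::string version;  // As reported at (re-)registration.
};

// On re-registration, accept the agent and record its current version so
// that endpoints like /slaves (and any version-dependent logic) stay
// correct across rolling upgrades. No rejection on mismatch.
void reregister(Slave& slave, const std::string& reportedVersion) {
  slave.version = reportedVersion;
}
```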

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.





[jira] [Commented] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173231#comment-15173231
 ] 

Klaus Ma commented on MESOS-4825:
-

[~jvanremoortere], got your point :). The version is updated if the master 
failed over, but is missed for normal re-register cases. It also seems we do 
not check the slave's version against the master's: if the slave's version is 
newer than the master's (slave at 0.28.0 vs. master at 0.27.0), should the 
master reject it because of the version mismatch? BTW, would you shepherd 
this? Or have you already posted an RR :).

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.





[jira] [Comment Edited] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173208#comment-15173208
 ] 

Joris Van Remoortere edited comment on MESOS-4825 at 3/1/16 4:18 AM:
-

[~klaus1982] Not all re-register paths construct a {{new Slave()}}:
https://github.com/apache/mesos/blob/0fd95ccc54e4d144c3eb60e98bf77d53b6bdab63/src/master/master.cpp#L4405-L4467


was (Author: jvanremoortere):
[~klaus1982]Not all re-register paths construct a {{new Slave()}}:
https://github.com/apache/mesos/blob/0fd95ccc54e4d144c3eb60e98bf77d53b6bdab63/src/master/master.cpp#L4405-L4467

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.





[jira] [Commented] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173208#comment-15173208
 ] 

Joris Van Remoortere commented on MESOS-4825:
-

[~klaus1982] Not all re-register paths construct a {{new Slave()}}:
https://github.com/apache/mesos/blob/0fd95ccc54e4d144c3eb60e98bf77d53b6bdab63/src/master/master.cpp#L4405-L4467

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.





[jira] [Commented] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173202#comment-15173202
 ] 

Klaus Ma commented on MESOS-4825:
-

[~jvanremoortere], I just checked the code, and it seems the version is 
updated: the new slave's version (MESOS_VERSION) is sent to the master and 
recorded when the master constructs a {{new Slave()}}.

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.





[jira] [Commented] (MESOS-3367) Mesos fetcher does not extract archives for URI with parameters

2016-02-29 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173132#comment-15173132
 ] 

Erik Weathers commented on MESOS-3367:
--

[~xujyan]: thanks for adding the response. I filed MESOS-4735 for specifying 
the output/result filename.

> Mesos fetcher does not extract archives for URI with parameters
> ---
>
> Key: MESOS-3367
> URL: https://issues.apache.org/jira/browse/MESOS-3367
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.22.1, 0.23.0
> Environment: DCOS 1.1
>Reporter: Renat Zubairov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere
>
> I'm deploying Marathon applications with sources served from S3. I'm using a 
> signed URL to give only temporary access to the S3 resources, so the URL of 
> the resource has some query parameters.
> So the URI is 'https://foo.com/file.tgz?hasi' and the fetcher stores it in a 
> file named 'file.tgz?hasi'; it then decides that the extension 'hasi' is not 
> tgz, so extraction is skipped, despite the fact that the MIME type of the 
> HTTP resource is 'application/x-tar'.
> Workaround - add additional parameter like '=.tgz'





[jira] [Commented] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-02-29 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173129#comment-15173129
 ] 

Erik Weathers commented on MESOS-4735:
--

[~gyliu]: sure, I'm talking about the filename within the executor sandbox.  
e.g., say that the {{CommandInfo.URI.value}} has the following URL:
{code:title=Example CommandInfo.URI.value}
http://hadoop-namenode.com:50070/webhdfs/v1/user/foo/bar-executor-binary.tgz?op=OPEN
{code}

In that case, using the current mesos fetcher behavior as of mesos-0.27.0, the 
downloaded file in the executor sandbox would be:
{code:title=Example Current Result Filename}
bar-executor-binary.tgz?op=OPEN
{code}

However, I would like to be able to override this result filename in the 
executor sandbox to be something like:
{code:title=Desired Result Filename}
bar-executor-binary.tgz
{code}

Benefits:
* avoids including the URI's query parameters in the filename (see MESOS-1686)
* lets the fetcher's extraction logic actually extract such executor bundles 
(tar, zip, etc.) (see MESOS-3367)
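A hedged sketch of the kind of helper this could enable: prefer an explicit override from the scheduler, otherwise take the URI's basename and drop any query or fragment suffix. The function name and signature are hypothetical, not the fetcher's actual API:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Derive the sandbox filename for a URI: use the explicit override if the
// scheduler provided one, otherwise take the basename of the URI path and
// strip any query ('?') or fragment ('#') suffix.
std::string sandboxFilename(const std::string& uri,
                            const std::optional<std::string>& override =
                                std::nullopt) {
  if (override) {
    return *override;
  }
  std::string name = uri.substr(uri.find_last_of('/') + 1);
  size_t cut = name.find_first_of("?#");
  if (cut != std::string::npos) {
    name = name.substr(0, cut);
  }
  return name;
}
```

Under this sketch the WebHDFS URL above would land in the sandbox as {{bar-executor-binary.tgz}}, so the extraction logic would recognize the {{.tgz}} suffix.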

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues where the {{CommandInfo.URI}} points to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename containing those elements. This also prevents extraction of such 
> files, since the extraction logic simply looks at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info. If this issue were 
> fixed, I could work around the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.





[jira] [Updated] (MESOS-4702) Document default value of "offer_timeout"

2016-02-29 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4702:
---
Shepherd: Vinod Kone

> Document default value of "offer_timeout"
> -
>
> Key: MESOS-4702
> URL: https://issues.apache.org/jira/browse/MESOS-4702
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: documentation, mesosphere, newbie
>
> There isn't a default value (i.e., offers do not time out by default), but we 
> should clarify this in {{flags.cpp}} and {{configuration.md}}.





[jira] [Updated] (MESOS-4790) Revert external linkage of symbols in master/constants.hpp

2016-02-29 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4790:
---
Shepherd: Benjamin Mahler

> Revert external linkage of symbols in master/constants.hpp
> --
>
> Key: MESOS-4790
> URL: https://issues.apache.org/jira/browse/MESOS-4790
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Neil Conway
>Priority: Trivial
>  Labels: mesosphere, newbie, tech-debt
>
> src/master/constants.hpp contains:
> {code}
> // TODO(bmahler): It appears there may be a bug with gcc-4.1.2 in which the
> // duration constants were not being initialized when having static linkage.
> // This issue did not manifest in newer gcc's. Specifically, 4.2.1 was ok.
> // So we've moved these to have external linkage but perhaps in the future
> // we can revert this.
> {code}
> From commit 232a23b2a2e11f4e905b834aa2a11afe5bf6438a. We should investigate 
> whether this is still necessary on supported compilers; it likely is not.





[jira] [Created] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-4825:
---

 Summary: Master's slave reregister logic does not update version 
field
 Key: MESOS-4825
 URL: https://issues.apache.org/jira/browse/MESOS-4825
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Joris Van Remoortere
Priority: Blocker
 Fix For: 0.28.0


The master's logic for reregistering a slave does not update the version field 
if the slave re-registers with a new version.





[jira] [Updated] (MESOS-4214) Introduce HTTP endpoint /weights for updating weight

2016-02-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4214:
--
Fix Version/s: (was: 0.28.0)

> Introduce HTTP endpoint /weights for updating weight
> 
>
> Key: MESOS-4214
> URL: https://issues.apache.org/jira/browse/MESOS-4214
> Project: Mesos
>  Issue Type: Task
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
>






[jira] [Updated] (MESOS-3945) Add operator documentation for /weight endpoint

2016-02-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3945:
--
Fix Version/s: (was: 0.28.0)

> Add operator documentation for /weight endpoint
> ---
>
> Key: MESOS-3945
> URL: https://issues.apache.org/jira/browse/MESOS-3945
> Project: Mesos
>  Issue Type: Task
>Reporter: James Wang
>Assignee: Yongqiao Wang
>
> This JIRA ticket will update the related docs to apply to dynamic weights, and 
> add a new operator guide for dynamic weights which describes basic usage of 
> the /weights endpoint.





[jira] [Created] (MESOS-4824) "filesystem/linux" isolator does not unmount orphaned persistent volumes

2016-02-29 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-4824:


 Summary: "filesystem/linux" isolator does not unmount orphaned 
persistent volumes
 Key: MESOS-4824
 URL: https://issues.apache.org/jira/browse/MESOS-4824
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.25.0, 0.24.0, 0.26.0, 0.27.0
Reporter: Joseph Wu
Assignee: Joseph Wu


A persistent volume can be orphaned when:
# A framework registers with checkpointing enabled.
# The framework starts a task + a persistent volume.
# The agent exits.  The task continues to run.
# Something wipes the agent's {{meta}} directory.  This removes the 
checkpointed framework info from the agent.
# The agent comes back and recovers.  The framework for the task is not found, 
so the task is considered orphaned now.

The agent currently does not unmount the persistent volume, saying (with 
{{GLOG_v=1}}) 
{code}
I0229 23:55:42.078940  5635 linux.cpp:711] Ignoring cleanup request for unknown 
container: a35189d3-85d5-4d02-b568-67f675b6dc97
{code}

Test implemented here: https://reviews.apache.org/r/44122/
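One way such orphaned mounts could be found and cleaned up is to scan {{/proc/self/mountinfo}} for mount targets under the agent work directory and unmount them deepest-first. This is a sketch under that assumption (the {{/var/lib/mesos}} prefix is an example work_dir), not the {{filesystem/linux}} isolator's actual cleanup:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

#include <sys/mount.h>

// Extract mount targets (the 5th field of each /proc/self/mountinfo line)
// that live under `prefix`, e.g. the agent work directory.
std::vector<std::string> mountsUnder(std::istream& mountinfo,
                                     const std::string& prefix) {
  std::vector<std::string> targets;
  std::string line;
  while (std::getline(mountinfo, line)) {
    std::istringstream fields(line);
    std::string id, parent, dev, root, target;
    fields >> id >> parent >> dev >> root >> target;
    if (target.compare(0, prefix.size(), prefix) == 0) {
      targets.push_back(target);
    }
  }
  return targets;
}

// Unmount everything under `prefix`, in reverse order so that nested
// mounts come off before their parents.
void unmountAllUnder(const std::string& prefix) {  // e.g. "/var/lib/mesos"
  std::ifstream mountinfo("/proc/self/mountinfo");
  std::vector<std::string> targets = mountsUnder(mountinfo, prefix);
  for (auto it = targets.rbegin(); it != targets.rend(); ++it) {
    ::umount(it->c_str());
  }
}
```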





[jira] [Commented] (MESOS-4811) Reusable/Cacheable Offer

2016-02-29 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172892#comment-15172892
 ] 

Qian Zhang commented on MESOS-4811:
---

So the idea is that when a framework launches a task with an offer, the unused 
resources in the offer are not recovered by the allocator, and the framework 
can continue to use those resources to launch subsequent tasks?

> Reusable/Cacheable Offer
> 
>
> Key: MESOS-4811
> URL: https://issues.apache.org/jira/browse/MESOS-4811
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Klaus Ma
>
> Currently, resources are returned to the allocator when a task finishes, and 
> those resources are not allocated to a framework until the next allocation 
> cycle. The performance is poor for short-running tasks (MESOS-3078). The 
> proposed solution is to let the framework keep using the offer until the 
> allocator decides to rescind it.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-29 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172869#comment-15172869
 ] 

James Peach commented on MESOS-4757:


That would work for Linux and BSD I think, but not for Darwin. I recommend 
against providing low-level APIs like {{setgroups}}. It's really easy to get 
this wrong with APIs at this level.

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in container's root filesystem. 
> We should instead, get the uid/gids before pivot_root, and call 
> setuid/setgroups after pivot_root.





[jira] [Updated] (MESOS-4702) Document default value of "offer_timeout"

2016-02-29 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4702:
---
  Sprint: Mesosphere Sprint 30
Story Points: 1

> Document default value of "offer_timeout"
> -
>
> Key: MESOS-4702
> URL: https://issues.apache.org/jira/browse/MESOS-4702
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: documentation, mesosphere, newbie
>
> There isn't a default value (i.e., offers do not time out by default), but we 
> should clarify this in {{flags.cpp}} and {{configuration.md}}.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-29 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172842#comment-15172842
 ] 

Jie Yu commented on MESOS-4757:
---

OK, I see. Maybe I can just use a large enough number (e.g., 65536)? I think 
getting this number from sysconf is the right way; I can easily change that.

I guess we need a broader discussion on whether we should do something like 
this or not (per your email reply and Ian's comment).
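The approach under discussion (resolve the uid and full group list from the host's user database *before* {{pivot_root}}, apply the numeric ids *after*, sizing the group buffer via {{sysconf}}) can be sketched as follows. This is an illustration of the idea on Linux, not the actual Mesos patch:

```cpp
#include <cassert>
#include <string>
#include <vector>

#include <grp.h>
#include <pwd.h>
#include <unistd.h>

struct Credentials {
  uid_t uid;
  gid_t gid;
  std::vector<gid_t> groups;
};

// Resolve credentials while /etc/passwd and /etc/group are still visible,
// i.e. before pivot_root into the container's root filesystem.
bool resolveCredentials(const std::string& user, Credentials* creds) {
  struct passwd* pw = ::getpwnam(user.c_str());
  if (pw == nullptr) {
    return false;
  }
  creds->uid = pw->pw_uid;
  creds->gid = pw->pw_gid;

  // Size the buffer from the system limit rather than hard-coding it.
  long max = ::sysconf(_SC_NGROUPS_MAX);
  int ngroups = static_cast<int>(max > 0 ? max : 65536);
  creds->groups.resize(ngroups);
  if (::getgrouplist(user.c_str(), pw->pw_gid,
                     creds->groups.data(), &ngroups) < 0) {
    return false;
  }
  creds->groups.resize(ngroups);
  return true;
}

// After pivot_root, no user database is needed: apply the numeric ids.
bool applyCredentials(const Credentials& creds) {
  return ::setgroups(creds.groups.size(), creds.groups.data()) == 0 &&
         ::setgid(creds.gid) == 0 &&
         ::setuid(creds.uid) == 0;
}
```

Note James's caution above: exposing {{setgroups}}-level primitives directly is easy to misuse, so in practice this would stay an internal detail of the containerizer.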


> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in container's root filesystem. 
> We should instead, get the uid/gids before pivot_root, and call 
> setuid/setgroups after pivot_root.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-29 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172836#comment-15172836
 ] 

James Peach commented on MESOS-4757:


This only works because you have < 16 groups.

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in container's root filesystem. 
> We should instead, get the uid/gids before pivot_root, and call 
> setuid/setgroups after pivot_root.





[jira] [Updated] (MESOS-3583) Introduce sessions in HTTP Scheduler API

2016-02-29 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3583:
--
Shepherd: Vinod Kone

> Introduce sessions in HTTP Scheduler API
> 
>
> Key: MESOS-3583
> URL: https://issues.apache.org/jira/browse/MESOS-3583
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Greg Mann
>  Labels: mesosphere, tech-debt
>
> Currently, the HTTP Scheduler API has no concept of sessions, i.e. a 
> {{SessionID}} or a {{TokenID}}. Such a concept would be useful in some 
> failure scenarios. As of now, if a framework fails over and then subscribes 
> again with the same {{FrameworkID}} with the {{force}} option set, the Mesos 
> master would subscribe it.
> If the previous instance of the framework/scheduler tries to send a Call, 
> e.g. {{Call::KILL}}, with the same previous {{FrameworkID}} set, it would 
> still be accepted by the master, leading to erroneously killing a task.
> This is possible because we currently have no way of distinguishing 
> connections. It used to work in the previous driver implementation because 
> the master also performed a {{UPID}} check to verify that they matched and 
> only then allowed the call.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-29 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172817#comment-15172817
 ] 

Ian Downes commented on MESOS-4757:
---

I skimmed the pull request and it looks reasonable.

[~jieyu] Then we should change the ownership of the sandbox to match? There 
doesn't have to be a mapping in the user/group database to set ownership:
{noformat}
[1500][idownes:~]$ touch foo
[1500][idownes:~]$ sudo chown 1234 foo
[1500][idownes:~]$ cat /etc/passwd | grep 1234
[1500][idownes:~]$ stat -f "%N: %u" foo
foo: 1234
{noformat}

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic 
> because /etc/passwd and /etc/group might be missing in the container's root 
> filesystem. We should instead get the uid/gids before pivot_root and call 
> setuid/setgroups after pivot_root.





[jira] [Commented] (MESOS-4312) Porting Mesos on Power (ppc64le)

2016-02-29 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172780#comment-15172780
 ] 

Vinod Kone commented on MESOS-4312:
---

Can you make this an epic? I think it more accurately depicts the work needed 
for this.

> Porting Mesos on Power (ppc64le)
> 
>
> Key: MESOS-4312
> URL: https://issues.apache.org/jira/browse/MESOS-4312
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> The goal of this ticket is to make IBM Power (ppc64le) a supported hardware 
> platform for Mesos. Currently the latest Mesos code cannot be successfully 
> built on ppc64le. We will resolve the build errors in this ticket, and also 
> make sure the Mesos test suite ("make check") can be run successfully on 
> ppc64le. 





[jira] [Updated] (MESOS-4700) Allow agent to configure net_cls handle minor range.

2016-02-29 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-4700:
-
Fix Version/s: 0.28.0

> Allow agent to configure net_cls handle minor range.
> 
>
> Key: MESOS-4700
> URL: https://issues.apache.org/jira/browse/MESOS-4700
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Avinash Sridharan
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> Bugs exist in some user libraries that prevent certain minor net_cls handles 
> from being used. It would be great if we could configure the minor range 
> through agent flags.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-29 Thread Cong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172684#comment-15172684
 ] 

Cong Wang commented on MESOS-4757:
--

Appc already fixed this: https://github.com/appc/spec/pull/315/files . Mesos 
could take a similar approach.

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic 
> because /etc/passwd and /etc/group might be missing in the container's root 
> filesystem. We should instead get the uid/gids before pivot_root and call 
> setuid/setgroups after pivot_root.





[jira] [Created] (MESOS-4823) Implement port forwarding in `network/cni` isolator

2016-02-29 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-4823:


 Summary: Implement port forwarding in `network/cni` isolator
 Key: MESOS-4823
 URL: https://issues.apache.org/jira/browse/MESOS-4823
 Project: Mesos
  Issue Type: Task
  Components: containerization
 Environment: linux
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan
Priority: Critical


Most Docker and Appc images wish to expose the ports that their 
micro-services listen on to the outside world. When containers run on bridged 
(or ptp) networking, this can be achieved by installing port forwarding rules 
on the agent (using iptables). This can be done in the `network/cni` isolator.

The reason we would like this functionality implemented in the `network/cni` 
isolator, and not in a CNI plugin, is that the CNI specification currently 
does not support specifying port forwarding rules. Further, to install these 
rules the isolator needs two pieces of information: the exposed ports and the 
IP address associated with the container. Both are available to the isolator.





[jira] [Assigned] (MESOS-4821) Introduce a port field in `ContainerConfig` in order to set exposed ports for a container.

2016-02-29 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan reassigned MESOS-4821:


Assignee: Avinash Sridharan

> Introduce a port field in `ContainerConfig` in order to set exposed ports for 
> a container.
> --
>
> Key: MESOS-4821
> URL: https://issues.apache.org/jira/browse/MESOS-4821
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Networking isolators such as `network/cni` need to learn about the ports 
> that a container wishes to expose to the outside world. This can be achieved 
> by adding a field to the `ContainerConfig` protobuf and allowing the 
> `Containerizer` or the framework to set this field to inform the isolator of 
> the ports that the container wishes to expose. 





[jira] [Updated] (MESOS-2950) Implement current mesos Authorizer in terms of generalized Authorizer interface

2016-02-29 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas updated MESOS-2950:
---
Shepherd: Vinod Kone  (was: Till Toenshoff)

> Implement current mesos Authorizer in terms of generalized Authorizer 
> interface
> ---
>
> Key: MESOS-2950
> URL: https://issues.apache.org/jira/browse/MESOS-2950
> Project: Mesos
>  Issue Type: Task
>  Components: master, security
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: acl, mesosphere, security
>
> In order to maintain compatibility with existing versions of Mesos, as well 
> as to prove the flexibility of the generalized {{mesos::Authorizer}} design, 
> the current authorization mechanism through ACL definitions needs to run 
> under the updated interface without any changes being noticeable to the 
> current authorization users.





[jira] [Created] (MESOS-4822) Add support for local image fetching in Appc provisioner.

2016-02-29 Thread Jojy Varghese (JIRA)
Jojy Varghese created MESOS-4822:


 Summary: Add support for local image fetching in Appc provisioner.
 Key: MESOS-4822
 URL: https://issues.apache.org/jira/browse/MESOS-4822
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Jojy Varghese
Assignee: Jojy Varghese


Currently the Appc image provisioner supports http(s) fetching. It would be 
valuable to add support for local file path (URI) based fetching.





[jira] [Created] (MESOS-4821) Introduce a port field in `ContainerConfig` in order to set exposed ports for a container.

2016-02-29 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-4821:


 Summary: Introduce a port field in `ContainerConfig` in order to 
set exposed ports for a container.
 Key: MESOS-4821
 URL: https://issues.apache.org/jira/browse/MESOS-4821
 Project: Mesos
  Issue Type: Task
  Components: containerization
 Environment: linux
Reporter: Avinash Sridharan


Networking isolators such as `network/cni` need to learn about the ports that 
a container wishes to expose to the outside world. This can be achieved by 
adding a field to the `ContainerConfig` protobuf and allowing the 
`Containerizer` or the framework to set this field to inform the isolator of 
the ports that the container wishes to expose. 
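A field along the proposed lines might look roughly like this in the `ContainerConfig` protobuf (the field name and number are purely illustrative, not the actual change):

```protobuf
message ContainerConfig {
  // ... existing fields ...

  // Ports the container wishes to have exposed to the outside world,
  // e.g. for the `network/cni` isolator to install port forwarding
  // rules. (Hypothetical field; name and number illustrative only.)
  repeated uint32 exposed_ports = 100;
}
```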





[jira] [Updated] (MESOS-4819) Add documentation for Appc image discovery.

2016-02-29 Thread Jojy Varghese (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jojy Varghese updated MESOS-4819:
-
Issue Type: Documentation  (was: Bug)

> Add documentation for Appc image discovery.
> ---
>
> Key: MESOS-4819
> URL: https://issues.apache.org/jira/browse/MESOS-4819
> Project: Mesos
>  Issue Type: Documentation
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere, unified-containerizer-mvp
>
> Add documentation for the Appc image discovery feature that covers:
> - Use case
> - Implementation detail (Simple discovery).





[jira] [Created] (MESOS-4820) Need to set `EXPOSED` ports from docker images into `ContainerConfig`

2016-02-29 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-4820:


 Summary: Need to set `EXPOSED` ports from docker images into 
`ContainerConfig`
 Key: MESOS-4820
 URL: https://issues.apache.org/jira/browse/MESOS-4820
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Avinash Sridharan
Priority: Critical


Most Docker images have an `EXPOSE` command associated with them. This tells 
the container run-time the TCP ports that the micro-service "wishes" to 
expose to the outside world. 

With the `Unified containerizer` project, since `MesosContainerizer` is going 
to natively support Docker images, it is imperative that the Mesos container 
run-time have a mechanism to expose ports listed in a Docker image. The first 
step is to extract this information from the Docker image and set it in the 
`ContainerConfig`. The `ContainerConfig` can then be used to pass this 
information to any isolator (e.g. the `network/cni` isolator) that will 
install port forwarding rules to expose the desired ports.





[jira] [Updated] (MESOS-4818) Add end to end testing for Appc images.

2016-02-29 Thread Jojy Varghese (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jojy Varghese updated MESOS-4818:
-
Issue Type: Task  (was: Bug)

> Add end to end testing for Appc images.
> ---
>
> Key: MESOS-4818
> URL: https://issues.apache.org/jira/browse/MESOS-4818
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere, unified-containerizer-mvp
>
> Add tests that cover the integration of the Appc provisioner feature with 
> the Mesos containerizer.
>  





[jira] [Created] (MESOS-4819) Add documentation for Appc image discovery.

2016-02-29 Thread Jojy Varghese (JIRA)
Jojy Varghese created MESOS-4819:


 Summary: Add documentation for Appc image discovery.
 Key: MESOS-4819
 URL: https://issues.apache.org/jira/browse/MESOS-4819
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Jojy Varghese
Assignee: Jojy Varghese


Add documentation for the Appc image discovery feature that covers:

- Use case
- Implementation detail (Simple discovery).





[jira] [Created] (MESOS-4818) Add end to end testing for Appc images.

2016-02-29 Thread Jojy Varghese (JIRA)
Jojy Varghese created MESOS-4818:


 Summary: Add end to end testing for Appc images.
 Key: MESOS-4818
 URL: https://issues.apache.org/jira/browse/MESOS-4818
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Jojy Varghese
Assignee: Jojy Varghese


Add tests that cover the integration of the Appc provisioner feature with 
the Mesos containerizer.
 





[jira] [Updated] (MESOS-4787) HTTP endpoint docs should use shorter paths

2016-02-29 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4787:
---
Sprint: Mesosphere Sprint 30

> HTTP endpoint docs should use shorter paths
> ---
>
> Key: MESOS-4787
> URL: https://issues.apache.org/jira/browse/MESOS-4787
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Kevin Klues
>Priority: Minor
>  Labels: documentation, mesosphere
>
> My understanding is that the recommended path for the v1 scheduler API is 
> {{/api/v1/scheduler}}, but the HTTP endpoint 
> [docs|http://mesos.apache.org/documentation/latest/endpoints/] for this 
> endpoint list the path as {{/master/api/v1/scheduler}}; the filename of the 
> doc page is also in the {{master}} subdirectory.
> Similarly, we document the master state endpoint as {{/master/state}}, 
> whereas the preferred name is now just {{/state}}, and so on for most of the 
> other endpoints. Unlike with the V1 API, we might want to consider backward 
> compatibility and document both forms -- not sure. But certainly it seems 
> like we should encourage people to use the shorter paths, not the longer ones.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-29 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172479#comment-15172479
 ] 

Jie Yu commented on MESOS-4757:
---

[~idownes] My main concern is the sandbox. Currently, the sandbox is prepared 
by the agent (thus using the agent's host user database when the chown 
happens), and we bind mount that directory into the container. Without user 
namespaces, I don't know whether using the container's database is desired or 
not.

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic 
> because /etc/passwd and /etc/group might be missing in the container's root 
> filesystem. We should instead get the uid/gids before pivot_root and call 
> setuid/setgroups after pivot_root.





[jira] [Updated] (MESOS-4787) HTTP endpoint docs should use shorter paths

2016-02-29 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4787:
---
Assignee: Kevin Klues

> HTTP endpoint docs should use shorter paths
> ---
>
> Key: MESOS-4787
> URL: https://issues.apache.org/jira/browse/MESOS-4787
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Kevin Klues
>Priority: Minor
>  Labels: documentation, mesosphere
>
> My understanding is that the recommended path for the v1 scheduler API is 
> {{/api/v1/scheduler}}, but the HTTP endpoint 
> [docs|http://mesos.apache.org/documentation/latest/endpoints/] for this 
> endpoint list the path as {{/master/api/v1/scheduler}}; the filename of the 
> doc page is also in the {{master}} subdirectory.
> Similarly, we document the master state endpoint as {{/master/state}}, 
> whereas the preferred name is now just {{/state}}, and so on for most of the 
> other endpoints. Unlike with the V1 API, we might want to consider backward 
> compatibility and document both forms -- not sure. But certainly it seems 
> like we should encourage people to use the shorter paths, not the longer ones.





[jira] [Updated] (MESOS-4205) Remove unnecessary master flags from Persistent Volume tests

2016-02-29 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4205:
-
Assignee: (was: Greg Mann)

> Remove unnecessary master flags from Persistent Volume tests
> 
>
> Key: MESOS-4205
> URL: https://issues.apache.org/jira/browse/MESOS-4205
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Greg Mann
>
> With the addition of implicit roles, some tests in 
> {{persistent_volume_tests.cpp}} no longer require the master flags that they 
> pass to their masters, which are just used to specify the available roles. 
> These should be removed.





[jira] [Updated] (MESOS-3961) Consider equality behavior for DiskInfo resource

2016-02-29 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3961:
-
Assignee: (was: Greg Mann)

> Consider equality behavior for DiskInfo resource
> 
>
> Key: MESOS-3961
> URL: https://issues.apache.org/jira/browse/MESOS-3961
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere, persistent-volumes
>
> Relevant code:
> {code}
> bool operator==(const Resource::DiskInfo& left, const Resource::DiskInfo& 
> right)
> {
>   // NOTE: We ignore 'volume' inside DiskInfo when doing comparison
>   // because it describes how this resource will be used which has
>   // nothing to do with the Resource object itself. A framework can
>   // use this resource and specify different 'volume' every time it
>   // uses it.
>   if (left.has_persistence() != right.has_persistence()) {
> return false;
>   }
>   if (left.has_persistence()) {
> return left.persistence().id() == right.persistence().id();
>   }
>   return true;
> }
> {code}
> A consequence of this behavior is that if you pass the wrong path to a 
> `destroy-volume` request (but there is a persistent volume that otherwise 
> matches the request), the path will be ignored and the volume will be 
> destroyed. It's not clear whether that is undesirable, but it does seem 
> surprising.





[jira] [Updated] (MESOS-3401) Add labels to Resources

2016-02-29 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3401:
-
Assignee: (was: Greg Mann)

> Add labels to Resources
> ---
>
> Key: MESOS-3401
> URL: https://issues.apache.org/jira/browse/MESOS-3401
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Adam B
>  Labels: external-volumes, mesosphere, resources
>
> Similar to how we have added labels to tasks/executors (MESOS-2120), and even 
> FrameworkInfo (MESOS-2841), we should extend Resource to allow arbitrary 
> key/value pairs.
> This could be used to specify that a cpu resource has a certain speed, that a 
> disk resource is SSD, or express any other metadata about a built-in or 
> custom resource type. Only the scalar quantity will be used for determining 
> fair share in the Mesos allocator. The rest will be passed on to frameworks 
> as 
> info they can use for scheduling decisions.
> This would require changes to how the slave specifies its `--resources` 
> (probably as json), how the slave/master reports resources in its web/json 
> API, and how resources are offered to frameworks.





[jira] [Updated] (MESOS-4817) Remove internal usage of deprecated *.json endpoints.

2016-02-29 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-4817:
---
  Sprint: Mesosphere Sprint 30
Story Points: 3

> Remove internal usage of deprecated *.json endpoints.
> -
>
> Key: MESOS-4817
> URL: https://issues.apache.org/jira/browse/MESOS-4817
> Project: Mesos
>  Issue Type: Task
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> We still use the deprecated *.json endpoints internally (UI, tests, 
> documentation). 





[jira] [Updated] (MESOS-4214) Introduce HTTP endpoint /weights for updating weight

2016-02-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4214:
--
Fix Version/s: 0.28.0

> Introduce HTTP endpoint /weights for updating weight
> 
>
> Key: MESOS-4214
> URL: https://issues.apache.org/jira/browse/MESOS-4214
> Project: Mesos
>  Issue Type: Task
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
> Fix For: 0.28.0
>
>






[jira] [Updated] (MESOS-3945) Add operator documentation for /weight endpoint

2016-02-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3945:
--
Fix Version/s: 0.28.0

> Add operator documentation for /weight endpoint
> ---
>
> Key: MESOS-3945
> URL: https://issues.apache.org/jira/browse/MESOS-3945
> Project: Mesos
>  Issue Type: Task
>Reporter: James Wang
>Assignee: Yongqiao Wang
> Fix For: 0.28.0
>
>
> This JIRA ticket will update the related docs to apply to dynamic weights, 
> and add a new operator guide for dynamic weights describing basic usage of 
> the /weights endpoint.





[jira] [Updated] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.

2016-02-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4810:
-
Assignee: Jie Yu

> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
> --
>
> Key: MESOS-4810
> URL: https://issues.apache.org/jira/browse/MESOS-4810
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.28.0
> Environment: CentOS 7 on AWS, both with or without SSL.
>Reporter: Bernd Mathiske
>Assignee: Jie Yu
>  Labels: docker, test
>
> {noformat}
> [09:46:46] :   [Step 11/11] [ RUN  ] 
> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.628413  1166 leveldb.cpp:174] 
> Opened db in 4.242882ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629926  1166 leveldb.cpp:181] 
> Compacted db in 1.483621ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629966  1166 leveldb.cpp:196] 
> Created db iterator in 15498ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629977  1166 leveldb.cpp:202] 
> Seeked to beginning of db in 1405ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629984  1166 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 239ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630015  1166 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630470  1183 recover.cpp:447] 
> Starting replica recovery
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630702  1180 recover.cpp:473] 
> Replica is in EMPTY status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.631767  1182 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (14567)@172.30.2.124:37431
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.632115  1183 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.632450  1186 recover.cpp:564] 
> Updating replica status to STARTING
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633476  1186 master.cpp:375] 
> Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) 
> started on 172.30.2.124:37431
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633491  1186 master.cpp:377] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" 
> --zk_session_timeout="10secs"
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633677  1186 master.cpp:422] 
> Master only allowing authenticated frameworks to register
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633685  1186 master.cpp:427] 
> Master only allowing authenticated slaves to register
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633692  1186 credentials.hpp:35] 
> Loading credentials for authentication from '/tmp/4UxXoW/credentials'
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633851  1183 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 1.191043ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633873  1183 replica.cpp:320] 
> Persisted replica status to STARTING
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633894  1186 master.cpp:467] Using 
> default 'crammd5' authenticator
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634003  1186 master.cpp:536] Using 
> default 'basic' HTTP authenticator
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634062  1184 recover.cpp:473] 
> Replica is in STARTING status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634109  1186 master.cpp:570] 
> Authorization enabled
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634249  1187 
> whitelist_watcher.cpp:77] No whitelist given
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634255  1184 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634884  1187 replica.cpp:673] 
> Replica in STARTING status received a broadcasted recover 

[jira] [Updated] (MESOS-4756) DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard is flaky on CentOS 6

2016-02-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4756:
-
Assignee: Jan Schlicht

> DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard is flaky on CentOS 6
> -
>
> Key: MESOS-4756
> URL: https://issues.apache.org/jira/browse/MESOS-4756
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 0.27
> Environment: Centos6 (AWS) + GCC 4.9
>Reporter: Joseph Wu
>Assignee: Jan Schlicht
>  Labels: mesosphere, tests
>
> {code}
> [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard
> I0224 17:50:26.577450 17755 leveldb.cpp:174] Opened db in 6.715352ms
> I0224 17:50:26.579607 17755 leveldb.cpp:181] Compacted db in 2.128954ms
> I0224 17:50:26.579648 17755 leveldb.cpp:196] Created db iterator in 16927ns
> I0224 17:50:26.579661 17755 leveldb.cpp:202] Seeked to beginning of db in 
> 1408ns
> I0224 17:50:26.579669 17755 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 343ns
> I0224 17:50:26.579721 17755 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0224 17:50:26.580185 17776 recover.cpp:447] Starting replica recovery
> I0224 17:50:26.580382 17776 recover.cpp:473] Replica is in EMPTY status
> I0224 17:50:26.581264 17770 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (14098)@172.30.2.121:33050
> I0224 17:50:26.581771 17772 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0224 17:50:26.582188 17771 recover.cpp:564] Updating replica status to 
> STARTING
> I0224 17:50:26.583030 17772 master.cpp:376] Master 

[jira] [Updated] (MESOS-4756) DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard is flaky on CentOS 6

2016-02-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4756:
-
Sprint: Mesosphere Sprint 30

> DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard is flaky on CentOS 6
> -
>
> Key: MESOS-4756
> URL: https://issues.apache.org/jira/browse/MESOS-4756
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 0.27
> Environment: Centos6 (AWS) + GCC 4.9
>Reporter: Joseph Wu
>Assignee: Jan Schlicht
>  Labels: mesosphere, tests
>
> {code}
> [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard
> I0224 17:50:26.577450 17755 leveldb.cpp:174] Opened db in 6.715352ms
> I0224 17:50:26.579607 17755 leveldb.cpp:181] Compacted db in 2.128954ms
> I0224 17:50:26.579648 17755 leveldb.cpp:196] Created db iterator in 16927ns
> I0224 17:50:26.579661 17755 leveldb.cpp:202] Seeked to beginning of db in 
> 1408ns
> I0224 17:50:26.579669 17755 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 343ns
> I0224 17:50:26.579721 17755 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0224 17:50:26.580185 17776 recover.cpp:447] Starting replica recovery
> I0224 17:50:26.580382 17776 recover.cpp:473] Replica is in EMPTY status
> I0224 17:50:26.581264 17770 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (14098)@172.30.2.121:33050
> I0224 17:50:26.581771 17772 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0224 17:50:26.582188 17771 recover.cpp:564] Updating replica status to 
> STARTING
> I0224 17:50:26.583030 17772 master.cpp:376] Master 
> 00a3ac12-9e76-48f5-92fa-48770b82035d (ip-172-30-2-121.mesosphere.io) started 
> on 172.30.2.121:33050
> I0224 17:50:26.583051 17772 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/jSZ9of/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/jSZ9of/master" 
> --zk_session_timeout="10secs"
> I0224 17:50:26.583328 17772 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0224 17:50:26.583336 17772 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0224 17:50:26.583343 17772 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/jSZ9of/credentials'
> I0224 17:50:26.583901 17772 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0224 17:50:26.584022 17772 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0224 17:50:26.584141 17772 master.cpp:571] Authorization enabled
> I0224 17:50:26.584234 17770 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 1.955608ms
> I0224 17:50:26.584264 17770 replica.cpp:320] Persisted replica status to 
> STARTING
> I0224 17:50:26.584285 17771 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0224 17:50:26.584295 17773 whitelist_watcher.cpp:77] No whitelist given
> I0224 17:50:26.584463 17775 recover.cpp:473] Replica is in STARTING status
> I0224 17:50:26.585260 17771 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (14100)@172.30.2.121:33050
> I0224 17:50:26.585553 1 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0224 17:50:26.586042 17773 recover.cpp:564] Updating replica status to VOTING
> I0224 17:50:26.586091 17770 master.cpp:1712] The newly elected leader is 
> master@172.30.2.121:33050 with id 00a3ac12-9e76-48f5-92fa-48770b82035d
> I0224 17:50:26.586122 17770 master.cpp:1725] Elected as the leading master!
> I0224 17:50:26.586146 17770 master.cpp:1470] Recovering from registrar
> I0224 17:50:26.586294 17773 registrar.cpp:307] Recovering registrar
> I0224 17:50:26.588148 17776 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 1.89126ms
> I0224 17:50:26.588171 17776 replica.cpp:320] Persisted replica 

[jira] [Created] (MESOS-4817) Remove internal usage of deprecated *.json endpoints.

2016-02-29 Thread Joerg Schad (JIRA)
Joerg Schad created MESOS-4817:
--

 Summary: Remove internal usage of deprecated *.json endpoints.
 Key: MESOS-4817
 URL: https://issues.apache.org/jira/browse/MESOS-4817
 Project: Mesos
  Issue Type: Task
Reporter: Joerg Schad
Assignee: Joerg Schad


We still use the deprecated *.json endpoints internally (UI, tests, documentation). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4664) Add allocator metrics.

2016-02-29 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172448#comment-15172448
 ] 

Benjamin Bannier commented on MESOS-4664:
-

@bmahler: In reviews, the question has come up a couple of times whether the 
number-of-runs counters (MESOS-4718 & MESOS-4719) are useful. I can imagine them 
being useful for determining whether the allocator makes progress in general, and 
how offers are being distributed among frameworks, especially together with the 
allocation time metric from MESOS-4721. Since you added them to the ticket 
initially, could you please confirm that you still think they are useful?

> Add allocator metrics.
> --
>
> Key: MESOS-4664
> URL: https://issues.apache.org/jira/browse/MESOS-4664
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>Priority: Critical
>
> There are currently no metrics that provide visibility into the allocator, 
> except for the event queue size. This makes monitoring and debugging 
> allocation behavior in a multi-framework setup difficult.
> Some thoughts for initial metrics to add:
> * How many allocation runs have completed? (counter): MESOS-4718
> * How many allocations each framework got? (counter): MESOS-4719
> * Current allocation breakdown: allocated / available / total (gauges): 
> MESOS-4720
> * Current maximum shares (gauges): MESOS-4724
> * How many active filters are there for the role / framework? (gauges): 
> MESOS-4722
> * How many frameworks are suppressing offers? (gauges)
> * How long does an allocation run take? (timers): MESOS-4721
> * Maintenance related metrics:
> ** How many maintenance events are active? (gauges)
> ** How many maintenance events are scheduled but not active (gauges)
> * Quota related metrics:
> ** How much quota is set for each role? (gauges)
> ** How much quota is satisfied? How much unsatisfied? (gauges): MESOS-4723
>  
> Some of these are already exposed from the master's metrics, but we should 
> not assume this within the allocator.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-29 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172440#comment-15172440
 ] 

Ian Downes commented on MESOS-4757:
---

IMHO this is incorrect and highlights the inconsistent relationship we have 
between the host and the container environments, mostly attributable to our 
history of running in the host context. Ideally, the container should be 
completely independent of the host configuration! It should not be resolving 
user/group names to uids/gids using the host's database. That is making huge 
assumptions about consistent configuration across a cluster -- and an external 
system to maintain it -- that are unnecessary and undesirable.

I suggest something like the following behavior when container images are used:
# If a job specifies a user and group name then the container image *must* 
include the necessary user and group database files and must resolve the names 
to ids. If not, then it fails.
# Support the job specifying uid and gid(s) directly.
# Also support picking the user and gid off a file in the image (I think appc 
supports this?).

If a container image is not used then fallback to the current (and terrible) 
behavior of using the host's databases.

Thoughts?

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in container's root filesystem. 
> We should instead, get the uid/gids before pivot_root, and call 
> setuid/setgroups after pivot_root.





[jira] [Commented] (MESOS-4816) Expose TaskInfo to Isolators

2016-02-29 Thread Connor Doyle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172432#comment-15172432
 ] 

Connor Doyle commented on MESOS-4816:
-

Maybe the {{update}} signature could be expanded like this?

{code:title=include/mesos/slave/isolator.hpp|borderStyle=solid}
  // Update the resources allocated to the container.
  virtual process::Future<Nothing> update(
      const ContainerID& containerId,
      const Resources& resources,
      const Option<TaskInfo>& taskInfo) = 0;
{code}

> Expose TaskInfo to Isolators
> 
>
> Key: MESOS-4816
> URL: https://issues.apache.org/jira/browse/MESOS-4816
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules, slave
>Reporter: Connor Doyle
>
> Authors of custom isolator modules frequently require access to the TaskInfo 
> in order to read custom metadata in task labels.
> Currently, it's possible to link containers to tasks within a module by 
> implementing both an isolator and the {{slaveRunTaskLabelDecorator}} hook, 
> and maintaining a shared map of containers to tasks.  This way works, but 
> adds unnecessary complexity.





[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.

2016-02-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4029:
--
Story Points: 3  (was: 2)

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Anand Mazumdar
>  Labels: flaky, flaky-test, mesosphere
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving directory `/home/vagrant/mesos/build/src'
> make[1]: *** [check] Error 2
> make[1]: 

[jira] [Created] (MESOS-4816) Expose TaskInfo to Isolators

2016-02-29 Thread Connor Doyle (JIRA)
Connor Doyle created MESOS-4816:
---

 Summary: Expose TaskInfo to Isolators
 Key: MESOS-4816
 URL: https://issues.apache.org/jira/browse/MESOS-4816
 Project: Mesos
  Issue Type: Improvement
  Components: modules, slave
Reporter: Connor Doyle


Authors of custom isolator modules frequently require access to the TaskInfo in 
order to read custom metadata in task labels.

Currently, it's possible to link containers to tasks within a module by 
implementing both an isolator and the {{slaveRunTaskLabelDecorator}} hook, and 
maintaining a shared map of containers to tasks.  This way works, but adds 
unnecessary complexity.





[jira] [Created] (MESOS-4815) Implement private registry test with authentication.

2016-02-29 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-4815:
---

 Summary: Implement private registry test with authentication.
 Key: MESOS-4815
 URL: https://issues.apache.org/jira/browse/MESOS-4815
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song


Test the unified containerizer using Docker images, with authentication 
enabled, against a private registry.





[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.

2016-02-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4029:
--
Shepherd: Vinod Kone

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Anand Mazumdar
>  Labels: flaky, flaky-test, mesosphere
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving directory `/home/vagrant/mesos/build/src'
> make[1]: *** [check] Error 2
> make[1]: Leaving 

[jira] [Updated] (MESOS-4390) Shared Volumes Design Doc

2016-02-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4390:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27, Mesosphere Sprint 28, Mesosphere Sprint 30)

> Shared Volumes Design Doc
> -
>
> Key: MESOS-4390
> URL: https://issues.apache.org/jira/browse/MESOS-4390
> Project: Mesos
>  Issue Type: Task
>Reporter: Adam B
>Assignee: Anindya Sinha
>  Labels: mesosphere
>
> Review & Approve design doc





[jira] [Created] (MESOS-4814) Implement private registry test with ssl.

2016-02-29 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-4814:
---

 Summary: Implement private registry test with ssl.
 Key: MESOS-4814
 URL: https://issues.apache.org/jira/browse/MESOS-4814
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song


Test the unified containerizer using Docker images, with SSL enabled, against 
the private registry.





[jira] [Updated] (MESOS-4390) Shared Volumes Design Doc

2016-02-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4390:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28, Mesosphere Sprint 30  
(was: Mesosphere Sprint 27, Mesosphere Sprint 28)

> Shared Volumes Design Doc
> -
>
> Key: MESOS-4390
> URL: https://issues.apache.org/jira/browse/MESOS-4390
> Project: Mesos
>  Issue Type: Task
>Reporter: Adam B
>Assignee: Anindya Sinha
>  Labels: mesosphere
>
> Review & Approve design doc





[jira] [Updated] (MESOS-4390) Shared Volumes Design Doc

2016-02-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4390:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27, Mesosphere Sprint 28, Mesosphere Sprint 29)

> Shared Volumes Design Doc
> -
>
> Key: MESOS-4390
> URL: https://issues.apache.org/jira/browse/MESOS-4390
> Project: Mesos
>  Issue Type: Task
>Reporter: Adam B
>Assignee: Anindya Sinha
>  Labels: mesosphere
>
> Review & Approve design doc





[jira] [Updated] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables

2016-02-29 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4609:
-
Sprint: Mesosphere Sprint 28  (was: Mesosphere Sprint 28, Mesosphere Sprint 
29)

> Subprocess should be more intelligent about setting/inheriting libprocess 
> environment variables 
> 
>
> Key: MESOS-4609
> URL: https://issues.apache.org/jira/browse/MESOS-4609
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> Mostly copied from [this 
> comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497]
> A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run 
> into some accidental fatalities:
> | || Subprocess uses libprocess || Subprocess is something else ||
> || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> 
> exit | Nothing happens (?) |
> || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | 
> Nothing happens (?) |
> (?) means this is usually the case, but not 100%.
> A complete fix would look something like:
> * If the {{subprocess}} call gets {{environment = None()}}, we should 
> automatically remove {{LIBPROCESS_PORT}} from the inherited environment.  
> * The parts of 
> [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265]
>  dealing with libprocess & libmesos should be refactored into libprocess as a 
> helper.  We would use this helper for the Containerizer, Fetcher, and 
> ContainerLogger module.
> * If the {{subprocess}} call is given {{LIBPROCESS_PORT == 
> os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var 
> locally.





[jira] [Created] (MESOS-4813) Implement base tests for unified container using local registry.

2016-02-29 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-4813:
---

 Summary: Implement base tests for unified container using local 
registry.
 Key: MESOS-4813
 URL: https://issues.apache.org/jira/browse/MESOS-4813
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song


Use the command-line executor to test shell commands with local Docker images.





[jira] [Commented] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-02-29 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172320#comment-15172320
 ] 

Jie Yu commented on MESOS-4697:
---

[~idownes] Thanks for the comment. There'll be a design doc for this which will 
list the motivations, how it'll be implemented, and how it can be extended. I 
will send it to the dev list to get feedback.

> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> Linux introduced the unified cgroup hierarchy in 3.16 ([The unified control 
> group hierarchy in 3.16|https://lwn.net/Articles/601840/], 
> [cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt])
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), much of the logic is the same. We are currently 
> duplicating a lot of code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> because we wanted each subsystem to be mounted under a different hierarchy. 
> This gradually became untrue with the unified cgroup hierarchy introduced in 
> kernel 3.16. Also, on some popular Linux distributions, some subsystems are 
> co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
> cpuacct). It becomes very hard to co-manage a hierarchy by two isolators.
> We can still introduce subsystem specific code under the unified cgroup 
> isolator (e.g., introduce a Subsystem abstraction?).





[jira] [Updated] (MESOS-4712) Remove 'force' field from the Subscribe Call in v1 Scheduler API

2016-02-29 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4712:
--
Shepherd: Vinod Kone
  Sprint: Mesosphere Sprint 30
Story Points: 5

> Remove 'force' field from the Subscribe Call in v1 Scheduler API
> 
>
> Key: MESOS-4712
> URL: https://issues.apache.org/jira/browse/MESOS-4712
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> We/I introduced the `force` field in the SUBSCRIBE call to deal with scheduler 
> partition cases. Having thought a bit more and discussed with a few other 
> folks ([~anandmazumdar], [~greggomann]), I think we can get away with not 
> having that field in the v1 API. The obvious advantage of removing the field 
> is that framework devs don't have to think about how/when to set the field 
> (the current semantics are a bit confusing).
> The new workflow when a master receives a SUBSCRIBE call is that master 
> always accepts this call and closes any existing connection (after sending 
> ERROR event) from the same scheduler (identified by framework id).  
> The expectation from schedulers is that they must close the old subscribe 
> connection before resending a new SUBSCRIBE call.
> Let's look at some tricky scenarios and see how this works and why it is safe.
> 1) Connection disconnection @ the scheduler but not @ the master
>
> Scheduler sees the disconnection and sends a new SUBSCRIBE call. Master sends 
> ERROR on the old connection (won't be received by the scheduler because the 
> connection is already closed) and closes it.
> 2) Connection disconnection @ master but not @ scheduler
> Scheduler realizes this from lack of HEARTBEAT events. It then closes its 
> existing connection and sends a new SUBSCRIBE call. Master accepts the new 
> SUBSCRIBE call. There is no old connection to close on the master as it is 
> already closed.
> 3) Scheduler failover but no disconnection @ master
> Newly elected scheduler sends a SUBSCRIBE call. Master sends ERROR event and 
> closes the old connection (won't be received because the old scheduler failed 
> over).
> 4) If Scheduler A got partitioned (but is alive and connected with master) 
> and Scheduler B got elected as new leader.
> When Scheduler B sends SUBSCRIBE, master sends ERROR and closes the 
> connection from Scheduler A. Master accepts Scheduler B's connection. 
> Typically Scheduler A aborts after receiving ERROR and gets restarted. After 
> restart it won't become the leader because Scheduler B is already elected.
> 5) Scheduler sends SUBSCRIBE, times out, closes the SUBSCRIBE connection (A) 
> and sends a new SUBSCRIBE (B). Master receives SUBSCRIBE (B) and then 
> receives SUBSCRIBE (A) but doesn't see A's disconnection yet.
> Master first accepts SUBSCRIBE (B). After it receives SUBSCRIBE (A), it sends 
> ERROR to SUBSCRIBE (B) and closes that connection. When it accepts SUBSCRIBE 
> (A) and tries to send SUBSCRIBED event the connection closure is detected. 
> Scheduler retries the SUBSCRIBE connection after a backoff. I think this race 
> is rare enough that it won't keep happening in a loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4798) Make existing scheduler library tests use the callback interface.

2016-02-29 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-4798:
-

Assignee: Anand Mazumdar

> Make existing scheduler library tests use the callback interface.
> -
>
> Key: MESOS-4798
> URL: https://issues.apache.org/jira/browse/MESOS-4798
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need to migrate the existing tests in {{src/tests/scheduler_tests.cpp}} 
> and {{src/tests/maintenance_tests.cpp}} to use the new callback interface 
> introduced in {{MESOS-3339}}. 
> For an example see {{SchedulerTest.SchedulerFailover}} which already uses 
> this new interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4583) Rename `examples/event_call_framework.cpp` to `examples/test_http_framework.cpp`

2016-02-29 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4583:
--
Sprint:   (was: Mesosphere Sprint 30)

> Rename `examples/event_call_framework.cpp` to 
> `examples/test_http_framework.cpp`
> 
>
> Key: MESOS-4583
> URL: https://issues.apache.org/jira/browse/MESOS-4583
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: mesosphere, newbie
>
> We already have {{examples/test_framework.cpp}} for testing {{PID}}-based 
> frameworks. We would ideally want to rename {{event_call_framework}} to 
> correctly reflect that it's an example of an HTTP-based framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3011) Publish release documentation for major releases on website

2016-02-29 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3011:
---
Sprint: Mesosphere Sprint 30

> Publish release documentation for major releases on website
> ---
>
> Key: MESOS-3011
> URL: https://issues.apache.org/jira/browse/MESOS-3011
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, project website
>Reporter: Paul Brett
>Assignee: Joerg Schad
>  Labels: documentation, mesosphere
>
> Currently, the website only provides a single version of the documentation.  
> We should publish documentation for each release on the website independently 
> (for example as https://mesos.apache.org/documentation/0.22/index.html, 
> https://mesos.apache.org/documentation/0.23/index.html) and make latest 
> redirect to the current version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1187) precision errors with allocation calculations

2016-02-29 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-1187:
--

Assignee: Neil Conway  (was: Klaus Ma)

> precision errors with allocation calculations
> -
>
> Key: MESOS-1187
> URL: https://issues.apache.org/jira/browse/MESOS-1187
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Reporter: aniruddha sathaye
>Assignee: Neil Conway
>  Labels: mesosphere
> Fix For: 0.28.0, 0.27.2, 0.26.1, 0.25.1, 0.24.2
>
>
> Since allocations are stored/transmitted as doubles, precision errors often 
> creep in. 
> We have seen erroneous share calculations happen purely because of floating 
> point arithmetic. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-02-29 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172285#comment-15172285
 ] 

Ian Downes commented on MESOS-4697:
---

Those are valid reasons to consider changing the current code, but can you talk 
more specifically about how a consolidated isolator would address, or 
deliberately not address, the original motivations (composability, 
extensibility, etc.)?

Can you confirm this would just be for isolation that manipulates cgroups? How 
would it be extensible? How would different functionality be selected, e.g., if 
there's a single cgroups isolator that includes net_cls but the operator does 
not want to enable it, instead selecting something like the current 
network/port_mapping isolator? Could (1) not be addressed by composition with a 
generic cgroups isolator? And could (2) be addressed by fixing the existing 
somewhat rigid isolator <-> controller mapping while still preserving the 
separation of resource isolation (e.g., cpu and cpuacct naturally belong 
together)? What hierarchy configuration(s) would the new isolator support? 
Would it be a new isolator, or would it replace the existing isolators?

> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> Linux has introduced the unified cgroup hierarchy since kernel 3.16 ([The 
> unified control group hierarchy in 3.16|https://lwn.net/Articles/601840/], 
> [cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt])
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), much of the logic is the same, so we are currently 
> duplicating a lot of code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> because we wanted each subsystem to be mounted under a different hierarchy. 
> This is gradually becoming untrue with the unified cgroup hierarchy introduced 
> in kernel 3.16. Also, on some popular Linux distributions, some subsystems are 
> co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
> cpuacct). It becomes very hard for two isolators to co-manage one hierarchy.
> We can still introduce subsystem-specific code under the unified cgroup 
> isolator (e.g., introduce a Subsystem abstraction?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4447) Updated reserved() API

2016-02-29 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172278#comment-15172278
 ] 

Benjamin Mahler commented on MESOS-4447:


Hm.. I can't tell why we're doing this change:
https://reviews.apache.org/r/42590/diff/3#1

If I remember correctly, we have these two overloads because the return types 
are different. For the first function {{hashmap<string, Resources> reserved()}}, 
we want to obtain a mapping of the reserved resources, indexed by 
the role. The second function {{Resources reserved(string role)}} is equivalent 
to an entry in the map returned by the first function. What are the issues with 
these, and why are you trying to consolidate them?

> Updated reserved() API
> --
>
> Key: MESOS-4447
> URL: https://issues.apache.org/jira/browse/MESOS-4447
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> There are some problems with the current {{reserved}} API. The problem is as 
> follows:
> {code}
> hashmap<string, Resources> Resources::reserved() const
> {
>   hashmap<string, Resources> result;
>   foreach (const Resource& resource, resources) {
>     if (isReserved(resource)) {
>       result[resource.role()] += resource;
>     }
>   }
>   return result;
> }
> Resources Resources::reserved(const string& role) const
> {
>   return filter(lambda::bind(isReserved, lambda::_1, role));
> }
> bool Resources::isReserved(
>     const Resource& resource,
>     const Option<string>& role)
> {
>   if (role.isSome()) {
>     return !isUnreserved(resource) && role.get() == resource.role();
>   } else {
>     return !isUnreserved(resource);
>   }
> }
> {code}
> This means {{reserved(const string& role)}} has no way to pass a None() 
> parameter to get all reserved resources in flattened form.
> The solution is to remove {{reserved()}} and update {{reserved(const string& 
> role)}} to {{reserved(const Option<string>& role = None())}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4812) Mesos fails to escape command health checks

2016-02-29 Thread Lukas Loesche (JIRA)
Lukas Loesche created MESOS-4812:


 Summary: Mesos fails to escape command health checks
 Key: MESOS-4812
 URL: https://issues.apache.org/jira/browse/MESOS-4812
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.25.0
Reporter: Lukas Loesche


As described in https://github.com/mesosphere/marathon/issues/
I would like to run a command health check
{noformat}
/bin/bash -c "

[jira] [Assigned] (MESOS-4812) Mesos fails to escape command health checks

2016-02-29 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-4812:
---

Assignee: Benjamin Bannier

> Mesos fails to escape command health checks
> ---
>
> Key: MESOS-4812
> URL: https://issues.apache.org/jira/browse/MESOS-4812
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Lukas Loesche
>Assignee: Benjamin Bannier
>
> As described in https://github.com/mesosphere/marathon/issues/
> I would like to run a command health check
> {noformat}
> /bin/bash -c " {noformat}
> The health check fails because Mesos, while running the command inside the 
> double quotes of an {{sh -c ""}} invocation, doesn't escape the double quotes 
> in the command.
> If I escape the double quotes myself, the command health check succeeds. But 
> this would mean that the user needs intimate knowledge of how Mesos executes 
> their commands, which can't be right.
> I was told this is not a Marathon but a Mesos issue, so I am opening this 
> JIRA. I don't know if this only affects the command health check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-02-29 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-4697:

Description: 
Linux has introduced the unified cgroup hierarchy since kernel 3.16 ([The 
unified control group hierarchy in 3.16|https://lwn.net/Articles/601840/], 
[cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt])

There are two motivations for this:
1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
mem, net_cls, etc.), much of the logic is the same, so we are currently 
duplicating a lot of code.
2) Initially, we decided to use a separate isolator for each cgroup subsystem 
because we wanted each subsystem to be mounted under a different hierarchy. 
This is gradually becoming untrue with the unified cgroup hierarchy introduced 
in kernel 3.16. Also, on some popular Linux distributions, some subsystems are 
co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
cpuacct). It becomes very hard for two isolators to co-manage one hierarchy.

We can still introduce subsystem-specific code under the unified cgroup 
isolator (e.g., introduce a Subsystem abstraction?).

  was:
Linux introduce the unified cgroup hierarchy since 3.16 
[https://lwn.net/Articles/601840/|The unified control group hierarchy in 3.16] 
[https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt|cgroup-v2]

There are two motivations for this:
1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
mem, net_cls, etc.), many of the logics are the same. We are currently 
duplicating a lot of the code.
2) Initially, we decided to use a separate isolator for each cgroup subsystem 
is because we want each subsystem to be mounted under a different hierarchy. 
This gradually become not true with unified cgroup hierarchy introduced in 
kernel 3.16. Also, on some popular linux distributions, some subsystems are 
co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
cpuacct). It becomes very hard to co-manage a hierarchy by two isolators.

We can still introduce subsystem specific code under the unified cgroup 
isolator (e.g., introduce a Subsystem abstraction?).


> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> Linux introduce the unified cgroup hierarchy since 3.16 [The unified control 
> group hierarchy in 3.16|https://lwn.net/Articles/601840/], 
> [cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt|]
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), many of the logics are the same. We are currently 
> duplicating a lot of the code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> is because we want each subsystem to be mounted under a different hierarchy. 
> This gradually become not true with unified cgroup hierarchy introduced in 
> kernel 3.16. Also, on some popular linux distributions, some subsystems are 
> co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
> cpuacct). It becomes very hard to co-manage a hierarchy by two isolators.
> We can still introduce subsystem specific code under the unified cgroup 
> isolator (e.g., introduce a Subsystem abstraction?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-02-29 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-4697:

Description: 
Linux introduce the unified cgroup hierarchy since 3.16 
[https://lwn.net/Articles/601840/|The unified control group hierarchy in 3.16] 
[https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt|cgroup-v2]

There are two motivations for this:
1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
mem, net_cls, etc.), many of the logics are the same. We are currently 
duplicating a lot of the code.
2) Initially, we decided to use a separate isolator for each cgroup subsystem 
is because we want each subsystem to be mounted under a different hierarchy. 
This gradually become not true with unified cgroup hierarchy introduced in 
kernel 3.16. Also, on some popular linux distributions, some subsystems are 
co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
cpuacct). It becomes very hard to co-manage a hierarchy by two isolators.

We can still introduce subsystem specific code under the unified cgroup 
isolator (e.g., introduce a Subsystem abstraction?).

  was:
There are two motivations for this:
1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
mem, net_cls, etc.), many of the logics are the same. We are currently 
duplicating a lot of the code.
2) Initially, we decided to use a separate isolator for each cgroup subsystem 
is because we want each subsystem to be mounted under a different hierarchy. 
This gradually become not true with unified cgroup hierarchy introduced in 
kernel 3.16. Also, on some popular linux distributions, some subsystems are 
co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
cpuacct). It becomes very hard to co-manage a hierarchy by two isolators.

We can still introduce subsystem specific code under the unified cgroup 
isolator (e.g., introduce a Subsystem abstraction?).


> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> Linux introduce the unified cgroup hierarchy since 3.16 
> [https://lwn.net/Articles/601840/|The unified control group hierarchy in 
> 3.16] 
> [https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt|cgroup-v2]
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), many of the logics are the same. We are currently 
> duplicating a lot of the code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> is because we want each subsystem to be mounted under a different hierarchy. 
> This gradually become not true with unified cgroup hierarchy introduced in 
> kernel 3.16. Also, on some popular linux distributions, some subsystems are 
> co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
> cpuacct). It becomes very hard to co-manage a hierarchy by two isolators.
> We can still introduce subsystem specific code under the unified cgroup 
> isolator (e.g., introduce a Subsystem abstraction?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-02-29 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-4697:

Attachment: cgroup_v2.pdf

> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), many of the logics are the same. We are currently 
> duplicating a lot of the code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> is because we want each subsystem to be mounted under a different hierarchy. 
> This gradually become not true with unified cgroup hierarchy introduced in 
> kernel 3.16. Also, on some popular linux distributions, some subsystems are 
> co-mounted within the same hierarchy (e.g., net_cls and net_prio, cpu and 
> cpuacct). It becomes very hard to co-manage a hierarchy by two isolators.
> We can still introduce subsystem specific code under the unified cgroup 
> isolator (e.g., introduce a Subsystem abstraction?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4796) Debug ability enhancement for unified container

2016-02-29 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172045#comment-15172045
 ] 

Guangya Liu commented on MESOS-4796:


Docker puller (both local and registry) review request: https://reviews.apache.org/r/44164/

> Debug ability enhancement for unified container
> ---
>
> Key: MESOS-4796
> URL: https://issues.apache.org/jira/browse/MESOS-4796
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The following are some starting points for what I want to do here after some 
> discussion with [~jieyu]; there will be more enhancements later. 
> docker/local_puller:
> LocalPullerProcess::extractLayer: add some detail on how the layer is extracted
> LocalPullerProcess::pull: update the message to add image info to the log
> docker/puller.cpp: 
> Puller::create: clarify which puller is in use: local or registry
> docker/registry_puller.cpp
> RegistryPullerProcess::pull: clarify which image is going to be pulled
> RegistryPullerProcess::__pull: add some detail on rootfs, layerPath, tar path, 
> JSON, etc. when creating the layer path.
> RegistryPullerProcess::fetchBlobs: update the log message for the reference: 
> stringify(reference)
> backends/bind.cpp:
> BindBackendProcess::provision: add more detail on provisioning, such as the 
> mount point.
> BindBackendProcess::destroy: add which rootfs is being destroyed.
> backends/copy.cpp:
> CopyBackendProcess::destroy: add which rootfs is being destroyed.
> CopyBackendProcess::provision: add more detail on provisioning, such as the 
> rootfs.
> mesos/isolators/docker/runtime.cpp
> add some logs to clarify how DockerRuntimeIsolatorProcess::prepare prepares 
> the docker runtime isolator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2728) Introduce concept of cluster wide resources.

2016-02-29 Thread James DeFelice (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James DeFelice updated MESOS-2728:
--
Description: 
There are resources which are not provided by a single node. Consider, for 
example, the external network bandwidth of a cluster. Being a limited resource, 
it makes sense for Mesos to manage it, but it is still not a resource offered 
by a single node. A cluster-wide resource is still consumed by a task, and when 
that task completes, the resources become available to be allocated to 
another framework/task.

Use Cases:
1. Network Bandwidth
2. IP Addresses
3. Global Service Ports
4. Distributed File System Storage
5. Software Licences
6. SAN Volumes



  was:
There are resources which are not provided by a single node. Consider for 
example a external Network Bandwidth of a cluster. Being a limited resource it 
makes sense for Mesos to manage it but still it is not a resource being offered 
by a single node. A cluster-wide resource is still consumed by a task, and when 
that task completes, the resources are then available to be allocated to 
another framework/task.

Use Cases:
1. Network Bandwidth
2. IP Addresses
3. Global Service Ports
2. Distributed File System Storage
3. Software Licences





> Introduce concept of cluster wide resources.
> 
>
> Key: MESOS-2728
> URL: https://issues.apache.org/jira/browse/MESOS-2728
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: external-volumes, mesosphere
>
> There are resources which are not provided by a single node. Consider, for 
> example, the external network bandwidth of a cluster. Being a limited 
> resource, it makes sense for Mesos to manage it, but it is still not a 
> resource offered by a single node. A cluster-wide resource is still consumed 
> by a task, and when that task completes, the resources become available to be 
> allocated to another framework/task.
> Use Cases:
> 1. Network Bandwidth
> 2. IP Addresses
> 3. Global Service Ports
> 4. Distributed File System Storage
> 5. Software Licences
> 6. SAN Volumes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-02-29 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3937:
--
Shepherd: Till Toenshoff  (was: Bernd Mathiske)

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] 

[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-02-29 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3937:
--
Assignee: Jan Schlicht

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading 

[jira] [Updated] (MESOS-4811) Reusable/Cacheable Offer

2016-02-29 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-4811:

Component/s: allocation

> Reusable/Cacheable Offer
> 
>
> Key: MESOS-4811
> URL: https://issues.apache.org/jira/browse/MESOS-4811
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Klaus Ma
>
> Currently, resources are returned to the allocator when a task finishes, and 
> are not allocated to a framework until the next allocation cycle. Performance 
> is poor for short-running tasks (MESOS-3078). The proposed solution is to let 
> the framework keep using the offer until the allocator decides to rescind it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4811) Reusable/Cacheable Offer

2016-02-29 Thread Klaus Ma (JIRA)
Klaus Ma created MESOS-4811:
---

 Summary: Reusable/Cacheable Offer
 Key: MESOS-4811
 URL: https://issues.apache.org/jira/browse/MESOS-4811
 Project: Mesos
  Issue Type: Bug
Reporter: Klaus Ma


Currently, resources are returned to the allocator when a task finishes, and 
are not allocated to a framework until the next allocation cycle. Performance 
is poor for short-running tasks (MESOS-3078). The proposed solution is to let 
the framework keep using the offer until the allocator decides to rescind it.





[jira] [Created] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.

2016-02-29 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4810:
-

 Summary: 
ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
 Key: MESOS-4810
 URL: https://issues.apache.org/jira/browse/MESOS-4810
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.28.0
 Environment: CentOS 7 on AWS, with or without SSL.
Reporter: Bernd Mathiske


{noformat}
[09:46:46] : [Step 11/11] [ RUN  ] 
ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand
[09:46:46]W: [Step 11/11] I0229 09:46:46.628413  1166 leveldb.cpp:174] 
Opened db in 4.242882ms
[09:46:46]W: [Step 11/11] I0229 09:46:46.629926  1166 leveldb.cpp:181] 
Compacted db in 1.483621ms
[09:46:46]W: [Step 11/11] I0229 09:46:46.629966  1166 leveldb.cpp:196] 
Created db iterator in 15498ns
[09:46:46]W: [Step 11/11] I0229 09:46:46.629977  1166 leveldb.cpp:202] 
Seeked to beginning of db in 1405ns
[09:46:46]W: [Step 11/11] I0229 09:46:46.629984  1166 leveldb.cpp:271] 
Iterated through 0 keys in the db in 239ns
[09:46:46]W: [Step 11/11] I0229 09:46:46.630015  1166 replica.cpp:779] 
Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
[09:46:46]W: [Step 11/11] I0229 09:46:46.630470  1183 recover.cpp:447] 
Starting replica recovery
[09:46:46]W: [Step 11/11] I0229 09:46:46.630702  1180 recover.cpp:473] 
Replica is in EMPTY status
[09:46:46]W: [Step 11/11] I0229 09:46:46.631767  1182 replica.cpp:673] 
Replica in EMPTY status received a broadcasted recover request from 
(14567)@172.30.2.124:37431
[09:46:46]W: [Step 11/11] I0229 09:46:46.632115  1183 recover.cpp:193] 
Received a recover response from a replica in EMPTY status
[09:46:46]W: [Step 11/11] I0229 09:46:46.632450  1186 recover.cpp:564] 
Updating replica status to STARTING
[09:46:46]W: [Step 11/11] I0229 09:46:46.633476  1186 master.cpp:375] 
Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) 
started on 172.30.2.124:37431
[09:46:46]W: [Step 11/11] I0229 09:46:46.633491  1186 master.cpp:377] Flags 
at startup: --acls="" --allocation_interval="1secs" 
--allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" 
--authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" 
--zk_session_timeout="10secs"
[09:46:46]W: [Step 11/11] I0229 09:46:46.633677  1186 master.cpp:422] 
Master only allowing authenticated frameworks to register
[09:46:46]W: [Step 11/11] I0229 09:46:46.633685  1186 master.cpp:427] 
Master only allowing authenticated slaves to register
[09:46:46]W: [Step 11/11] I0229 09:46:46.633692  1186 credentials.hpp:35] 
Loading credentials for authentication from '/tmp/4UxXoW/credentials'
[09:46:46]W: [Step 11/11] I0229 09:46:46.633851  1183 leveldb.cpp:304] 
Persisting metadata (8 bytes) to leveldb took 1.191043ms
[09:46:46]W: [Step 11/11] I0229 09:46:46.633873  1183 replica.cpp:320] 
Persisted replica status to STARTING
[09:46:46]W: [Step 11/11] I0229 09:46:46.633894  1186 master.cpp:467] Using 
default 'crammd5' authenticator
[09:46:46]W: [Step 11/11] I0229 09:46:46.634003  1186 master.cpp:536] Using 
default 'basic' HTTP authenticator
[09:46:46]W: [Step 11/11] I0229 09:46:46.634062  1184 recover.cpp:473] 
Replica is in STARTING status
[09:46:46]W: [Step 11/11] I0229 09:46:46.634109  1186 master.cpp:570] 
Authorization enabled
[09:46:46]W: [Step 11/11] I0229 09:46:46.634249  1187 
whitelist_watcher.cpp:77] No whitelist given
[09:46:46]W: [Step 11/11] I0229 09:46:46.634255  1184 hierarchical.cpp:144] 
Initialized hierarchical allocator process
[09:46:46]W: [Step 11/11] I0229 09:46:46.634884  1187 replica.cpp:673] 
Replica in STARTING status received a broadcasted recover request from 
(14569)@172.30.2.124:37431
[09:46:46]W: [Step 11/11] I0229 09:46:46.635278  1181 recover.cpp:193] 
Received a recover response from a replica in STARTING status
[09:46:46]W: [Step 11/11] I0229 09:46:46.635742  1187 recover.cpp:564] 
Updating replica status to VOTING
[09:46:46]W: [Step 11/11] I0229 09:46:46.636391  1180 master.cpp:1711] The 
newly 

[jira] [Updated] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.

2016-02-29 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4810:
--
Labels: docker test  (was: )

> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
> --
>
> Key: MESOS-4810
> URL: https://issues.apache.org/jira/browse/MESOS-4810
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.28.0
> Environment: CentOS 7 on AWS, with or without SSL.
>Reporter: Bernd Mathiske
>  Labels: docker, test
>
> {noformat}
> [09:46:46] :   [Step 11/11] [ RUN  ] 
> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.628413  1166 leveldb.cpp:174] 
> Opened db in 4.242882ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629926  1166 leveldb.cpp:181] 
> Compacted db in 1.483621ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629966  1166 leveldb.cpp:196] 
> Created db iterator in 15498ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629977  1166 leveldb.cpp:202] 
> Seeked to beginning of db in 1405ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629984  1166 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 239ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630015  1166 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630470  1183 recover.cpp:447] 
> Starting replica recovery
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630702  1180 recover.cpp:473] 
> Replica is in EMPTY status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.631767  1182 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (14567)@172.30.2.124:37431
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.632115  1183 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.632450  1186 recover.cpp:564] 
> Updating replica status to STARTING
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633476  1186 master.cpp:375] 
> Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) 
> started on 172.30.2.124:37431
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633491  1186 master.cpp:377] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" 
> --zk_session_timeout="10secs"
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633677  1186 master.cpp:422] 
> Master only allowing authenticated frameworks to register
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633685  1186 master.cpp:427] 
> Master only allowing authenticated slaves to register
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633692  1186 credentials.hpp:35] 
> Loading credentials for authentication from '/tmp/4UxXoW/credentials'
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633851  1183 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 1.191043ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633873  1183 replica.cpp:320] 
> Persisted replica status to STARTING
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633894  1186 master.cpp:467] Using 
> default 'crammd5' authenticator
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634003  1186 master.cpp:536] Using 
> default 'basic' HTTP authenticator
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634062  1184 recover.cpp:473] 
> Replica is in STARTING status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634109  1186 master.cpp:570] 
> Authorization enabled
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634249  1187 
> whitelist_watcher.cpp:77] No whitelist given
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634255  1184 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634884  1187 replica.cpp:673] 
> Replica in STARTING status received a broadcasted recover request from 
> 

[jira] [Commented] (MESOS-4047) MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky

2016-02-29 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171670#comment-15171670
 ] 

Bernd Mathiske commented on MESOS-4047:
---

https://reviews.apache.org/r/43799/

> MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky
> ---
>
> Key: MESOS-4047
> URL: https://issues.apache.org/jira/browse/MESOS-4047
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
> Environment: Ubuntu 14, gcc 4.8.4
>Reporter: Joseph Wu
>Assignee: Alexander Rojas
>  Labels: flaky, flaky-test
> Fix For: 0.28.0
>
>
> {code:title=Output from passed test}
> [--] 1 test from MemoryPressureMesosTest
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000430889 s, 2.4 GB/s
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> I1202 11:09:14.319327  5062 exec.cpp:134] Version: 0.27.0
> I1202 11:09:14.17  5079 exec.cpp:208] Executor registered on slave 
> bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> Registered executor on ubuntu
> Starting task 4e62294c-cfcf-4a13-b699-c6a4b7ac5162
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 5085
> I1202 11:09:14.391739  5077 exec.cpp:254] Received reconnect request from 
> slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> I1202 11:09:14.398598  5082 exec.cpp:231] Executor re-registered on slave 
> bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> Re-registered executor on ubuntu
> Shutting down
> Sending SIGTERM to process tree at pid 5085
> Killing the following process trees:
> [ 
> -+- 5085 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done 
>  \--- 5086 dd count=512 bs=1M if=/dev/zero of=./temp 
> ]
> [   OK ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (1096 ms)
> {code}
> {code:title=Output from failed test}
> [--] 1 test from MemoryPressureMesosTest
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000404489 s, 2.6 GB/s
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> I1202 11:09:15.509950  5109 exec.cpp:134] Version: 0.27.0
> I1202 11:09:15.568183  5123 exec.cpp:208] Executor registered on slave 
> 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0
> Registered executor on ubuntu
> Starting task 14b6bab9-9f60-4130-bdc4-44efba262bc6
> Forked command at 5132
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> I1202 11:09:15.665498  5129 exec.cpp:254] Received reconnect request from 
> slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0
> I1202 11:09:15.670995  5123 exec.cpp:381] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 5132
> ../../src/tests/containerizer/memory_pressure_tests.cpp:283: Failure
> (usage).failure(): Unknown container: ebe90e15-72fa-4519-837b-62f43052c913
> *** Aborted at 1449083355 (unix time) try "date -d @1449083355" if you are 
> using GNU date ***
> {code}
> Notice that in the failed test, the executor is asked to shutdown when it 
> tries to reconnect to the agent.





[jira] [Updated] (MESOS-4786) Example in C++ style guide uses wrong indention for wrapped line

2016-02-29 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4786:

Sprint: Mesosphere Sprint 29

> Example in C++ style guide uses wrong indention for wrapped line
> 
>
> Key: MESOS-4786
> URL: https://issues.apache.org/jira/browse/MESOS-4786
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Trivial
> Fix For: 0.28.0
>
>
> {code}
> Try long_name =
> ::protobuf::parse(
> request);
> {code}
> Here the second line should be indented by two spaces since it is a wrapped 
> assignment; the corresponding rule is laid out in the preceding paragraph.





[jira] [Updated] (MESOS-4784) SlaveTest.MetricsSlaveLaunchErrors test relies on implicit blocking behavior hitting the global metrics endpoint

2016-02-29 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4784:

Sprint: Mesosphere Sprint 29

> SlaveTest.MetricsSlaveLaunchErrors test relies on implicit blocking behavior 
> hitting the global metrics endpoint
> 
>
> Key: MESOS-4784
> URL: https://issues.apache.org/jira/browse/MESOS-4784
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
> Fix For: 0.28.0
>
>
> The test attempts to observe a change in the 
> {{slave/container_launch_errors}} metric, but does not wait for the 
> triggering action to take place. Currently the test passes because hitting 
> the endpoint blocks for some rate-limit-related time, which in many 
> circumstances provides enough wait time for the action to take place.





[jira] [Updated] (MESOS-4808) Allocation in batch instead of execute it every-time when addSlave/addFramework.

2016-02-29 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-4808:

Description: Currently, {{allocate()}} is executed every time a new 
slave/framework is registered; if lots of agents start at almost the same 
time, the allocation will keep running for a while. It's acceptable behaviour 
to allocate resources in the next allocation cycle. But when a task finishes, 
it's better to allocate ASAP although there are performance issues; refer to 
MESOS-3078 for more detail on short-running tasks.  (was: Currently, 
{{allocate()}} are executed every-time when a new slave/framework are 
registered; if there're lots of agent start all most the same time, the 
allocation will keep running for a while. It's acceptable behaviour to allocate 
resources in next allocation cycle. But when a task is finished, it's better to 
allocate ASAP; refer to MESOS-3078 for more detail on short running tasks.)

> Allocation in batch instead of execute it every-time when 
> addSlave/addFramework.
> 
>
> Key: MESOS-4808
> URL: https://issues.apache.org/jira/browse/MESOS-4808
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Klaus Ma
>  Labels: master, tech-debt
>
> Currently, {{allocate()}} is executed every time a new slave/framework is 
> registered; if lots of agents start at almost the same time, the allocation 
> will keep running for a while. It's acceptable behaviour to allocate 
> resources in the next allocation cycle. But when a task finishes, it's 
> better to allocate ASAP although there are performance issues; refer to 
> MESOS-3078 for more detail on short-running tasks.





[jira] [Updated] (MESOS-4808) Allocation in batch instead of execute it every-time when addSlave/addFramework.

2016-02-29 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-4808:

Assignee: (was: Klaus Ma)

> Allocation in batch instead of execute it every-time when 
> addSlave/addFramework.
> 
>
> Key: MESOS-4808
> URL: https://issues.apache.org/jira/browse/MESOS-4808
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Klaus Ma
>  Labels: master, tech-debt
>
> Currently, {{allocate()}} is executed every time a new slave/framework is 
> registered; if lots of agents start at almost the same time, the allocation 
> will keep running for a while. It's acceptable behaviour to allocate 
> resources in the next allocation cycle. But when a task finishes, it's 
> better to allocate ASAP; refer to MESOS-3078 for more detail on 
> short-running tasks.





[jira] [Updated] (MESOS-4806) LeveDBStateTests write to the current directory

2016-02-29 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4806:

Assignee: Benjamin Bannier

> LeveDBStateTests write to the current directory
> ---
>
> Key: MESOS-4806
> URL: https://issues.apache.org/jira/browse/MESOS-4806
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: newbie, parallel-tests
>
> All {{LevelDBStateTest}} tests write to the current directory. This is bad 
> for a number of reasons, e.g.,
> * should the test fail, data might be leaked to random locations,
> * the test cannot be executed from a read-only directory, or
> * executing tests from the same suite in parallel (e.g., with 
> {{gtest-parallel}}) would race on the existence of the created files and 
> show bogus behavior.
> The tests should probably be executed from a temporary directory, e.g., via 
> stout's {{TemporaryDirectoryTest}} fixture.





[jira] [Created] (MESOS-4809) Allow parallel execution of tests

2016-02-29 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4809:
---

 Summary: Allow parallel execution of tests
 Key: MESOS-4809
 URL: https://issues.apache.org/jira/browse/MESOS-4809
 Project: Mesos
  Issue Type: Epic
Reporter: Benjamin Bannier
Priority: Minor


We should allow parallel execution of tests. There are two flavors to this:

(a) tests are run in parallel in the same process, or
(b) tests are run in parallel in separate processes (e.g., with 
gtest-parallel).

While (a) likely has better overall performance, it depends on tests being 
independent of global state (e.g., the current directory, and others). On the 
other hand, (b) already improves execution time, and has much smaller 
requirements.

This epic tracks efforts to fix tests to allow scenario (b) above.





[jira] [Updated] (MESOS-4808) Allocation in batch instead of execute it every-time when addSlave/addFramework.

2016-02-29 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-4808:

Labels: master tech-debt  (was: )

> Allocation in batch instead of execute it every-time when 
> addSlave/addFramework.
> 
>
> Key: MESOS-4808
> URL: https://issues.apache.org/jira/browse/MESOS-4808
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>  Labels: master, tech-debt
>
> Currently, {{allocate()}} is executed every time a new slave/framework is 
> registered; if lots of agents start at almost the same time, the allocation 
> will keep running for a while. It's acceptable behaviour to allocate 
> resources in the next allocation cycle. But when a task finishes, it's 
> better to allocate ASAP; refer to MESOS-3078 for more detail on 
> short-running tasks.





[jira] [Created] (MESOS-4808) Allocation in batch instead of execute it every-time when addSlave/addFramework.

2016-02-29 Thread Klaus Ma (JIRA)
Klaus Ma created MESOS-4808:
---

 Summary: Allocation in batch instead of execute it every-time when 
addSlave/addFramework.
 Key: MESOS-4808
 URL: https://issues.apache.org/jira/browse/MESOS-4808
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Klaus Ma
Assignee: Klaus Ma


Currently, {{allocate()}} is executed every time a new slave/framework is 
registered; if lots of agents start at almost the same time, the allocation 
will keep running for a while. It's acceptable behaviour to allocate resources 
in the next allocation cycle. But when a task finishes, it's better to 
allocate ASAP; refer to MESOS-3078 for more detail on short-running tasks.





[jira] [Commented] (MESOS-3368) Add device support in cgroups abstraction

2016-02-29 Thread Abhishek Dasgupta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171593#comment-15171593
 ] 

Abhishek Dasgupta commented on MESOS-3368:
--

Please review the design doc at the link below and comment whether any more 
functionality is needed or any existing functionality should be modified for 
this issue:
https://docs.google.com/document/d/1mVyo-T_L0z-FMRkAxOR-ONdeaTxJIIGRbI6OyU5TlHU/edit?usp=sharing

> Add device support in cgroups abstraction
> -
>
> Key: MESOS-3368
> URL: https://issues.apache.org/jira/browse/MESOS-3368
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Abhishek Dasgupta
>
> Add support for [device 
> cgroups|https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt] to 
> aid isolators in controlling access to devices.
> In the future, we could think about how to enumerate and control access to 
> devices as a resource or as task/container policy.





[jira] [Updated] (MESOS-4806) LeveDBStateTests write to the current directory

2016-02-29 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4806:

 Labels: newbie parallel-tests  (was: )
Description: 
All {{LevelDBStateTest}} tests write to the current directory. This is bad for 
a number of reasons, e.g.,

* should the test fail, data might be leaked to random locations,
* the test cannot be executed from a read-only directory, or
* executing tests from the same suite in parallel (e.g., with 
{{gtest-parallel}}) would race on the existence of the created files and show 
bogus behavior.

The tests should probably be executed from a temporary directory, e.g., via 
stout's {{TemporaryDirectoryTest}} fixture.

> LeveDBStateTests write to the current directory
> ---
>
> Key: MESOS-4806
> URL: https://issues.apache.org/jira/browse/MESOS-4806
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Reporter: Benjamin Bannier
>  Labels: newbie, parallel-tests
>
> All {{LevelDBStateTest}} tests write to the current directory. This is bad 
> for a number of reasons, e.g.,
> * should the test fail, data might be leaked to random locations,
> * the test cannot be executed from a read-only directory, or
> * executing tests from the same suite in parallel (e.g., with 
> {{gtest-parallel}}) would race on the existence of the created files and 
> show bogus behavior.
> The tests should probably be executed from a temporary directory, e.g., via 
> stout's {{TemporaryDirectoryTest}} fixture.





[jira] [Created] (MESOS-4806) LeveDBStateTests write to the current directory

2016-02-29 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4806:
---

 Summary: LeveDBStateTests write to the current directory
 Key: MESOS-4806
 URL: https://issues.apache.org/jira/browse/MESOS-4806
 Project: Mesos
  Issue Type: Bug
  Components: test, tests
Reporter: Benjamin Bannier








[jira] [Created] (MESOS-4805) Update ry-http-parser-1c3624a to nodejs/http-parser 2.6.1

2016-02-29 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-4805:
-

 Summary: Update ry-http-parser-1c3624a to nodejs/http-parser 2.6.1
 Key: MESOS-4805
 URL: https://issues.apache.org/jira/browse/MESOS-4805
 Project: Mesos
  Issue Type: Improvement
Reporter: Qian Zhang
Assignee: Chen Zhiwei


See https://github.com/nodejs/http-parser/releases/tag/v2.6.1.
The motivation is that nodejs/http-parser 2.6.1 officially supports IBM Power 
(ppc64le), so this is needed by 
[MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].





[jira] [Created] (MESOS-4804) Update vendored protobuf to 2.6.1

2016-02-29 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-4804:
-

 Summary: Update vendored protobuf to 2.6.1
 Key: MESOS-4804
 URL: https://issues.apache.org/jira/browse/MESOS-4804
 Project: Mesos
  Issue Type: Improvement
Reporter: Qian Zhang
Assignee: Chen Zhiwei


See https://github.com/google/protobuf/releases/tag/v2.6.1 for improvements / 
bug fixes.
The motivation is that protobuf 2.6.1 officially supports IBM Power (ppc64le), 
so this is needed by 
[MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].




