[jira] [Commented] (MESOS-1806) Substituting etcd or ReplicatedLog for Zookeeper

2015-09-15 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746694#comment-14746694
 ] 

Shuai Lin commented on MESOS-1806:
--

Progress update:

I have rebased the code onto the latest upstream master and made some fixes so 
that it compiles. 

https://github.com/lins05/mesos/tree/etcd

However the etcd_test.sh script is failing:

{code:title=./src/tests/etcd_test.sh error output|borderStyle=solid}
Recovery failed: Failed to recover registrar: Failed to perform fetch within 
1mins
{code}

I'll check where the problem is this week.


> Substituting etcd or ReplicatedLog for Zookeeper
> 
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Task
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)





[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-09-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Description: 
This is the first basic step in ensuring the basic /call functionality: 

- Set up the route on the slave for "api/v1/executor" endpoint.
- The endpoint should perform basic header/protobuf validation and return 
{{501 NotImplemented}} for now.
- Introduce initial tests in executor_api_tests.cpp that just verify the status 
code.


  was:
This is the first basic step in ensuring the basic /call functionality: 
processing a
POST /call
and returning:
202 if all goes well;
401 if not authorized; and
403 if the request is malformed.

Also, we might need to store some identifier which enables us to reject calls 
to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE request.


> Slave : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> - Set up the route on the slave for "api/v1/executor" endpoint.
> - The endpoint should perform basic header/protobuf validation and return 
> {{501 NotImplemented}} for now.
> - Introduce initial tests in executor_api_tests.cpp that just verify the 
> status code.





[jira] [Updated] (MESOS-2907) Agent : Create Basic Functionality to handle /call endpoint

2015-09-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Description: 
This is the first basic step in ensuring the basic /call functionality: 

- Set up the route on the agent for "api/v1/executor" endpoint.
- The endpoint should perform basic header/protobuf validation and return {{501 
NotImplemented}} for now.
- Introduce initial tests in executor_api_tests.cpp that just verify the status 
code.
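
As a rough illustration of the route and 501 behavior described above, here is 
a hedged sketch using libprocess; the class name, handler name, and validation 
details are assumptions, not the actual patch:

{code}
// Hedged sketch: wire up "api/v1/executor", validate headers, return 501.
// The class/handler names here are illustrative, not the actual agent code.
#include <string>

#include <process/http.hpp>
#include <process/process.hpp>

#include <stout/none.hpp>
#include <stout/option.hpp>

using process::Future;
using process::http::NotImplemented;
using process::http::Request;
using process::http::Response;
using process::http::UnsupportedMediaType;

class AgentProcess : public process::Process<AgentProcess>
{
protected:
  virtual void initialize()
  {
    // Route requests for the v1 executor endpoint to our handler.
    route("/api/v1/executor", None(), &AgentProcess::executorApi);
  }

  Future<Response> executorApi(const Request& request)
  {
    // Basic header validation: accept only protobuf or JSON payloads.
    Option<std::string> contentType = request.headers.get("Content-Type");
    if (contentType.isNone() ||
        (contentType.get() != "application/x-protobuf" &&
         contentType.get() != "application/json")) {
      return UnsupportedMediaType("Expected protobuf or JSON request body");
    }

    // Protobuf validation of the Call message would go here; for now the
    // endpoint just answers 501.
    return NotImplemented();
  }
};
{code}

An initial test would then just POST to the endpoint and assert on the 501 
status code.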


  was:
This is the first basic step in ensuring the basic /call functionality: 

- Set up the route on the slave for "api/v1/executor" endpoint.
- The endpoint should perform basic header/protobuf validation and return {{501 
NotImplemented}} for now.
- Introduce initial tests in executor_api_tests.cpp that just verify the status 
code.



> Agent : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> - Set up the route on the agent for "api/v1/executor" endpoint.
> - The endpoint should perform basic header/protobuf validation and return 
> {{501 NotImplemented}} for now.
> - Introduce initial tests in executor_api_tests.cpp that just verify the 
> status code.





[jira] [Created] (MESOS-3435) Add Hyper as Mesos Docker alike support

2015-09-15 Thread Deshi Xiao (JIRA)
Deshi Xiao created MESOS-3435:
-

 Summary: Add Hyper as Mesos Docker alike support
 Key: MESOS-3435
 URL: https://issues.apache.org/jira/browse/MESOS-3435
 Project: Mesos
  Issue Type: Improvement
Reporter: Deshi Xiao


Hyper is a hypervisor-agnostic Docker engine; I hope Marathon can support it 
(https://github.com/mesosphere/marathon/issues/1815).
https://hyper.sh/

In an earlier discussion with Tim Chen about a possible implementation, he 
suggested first implementing the engine along the lines of 
mesos-src/docker/docker.hpp.







[jira] [Comment Edited] (MESOS-3421) Support sharing persistent volumes across task instances

2015-09-15 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743761#comment-14743761
 ] 

Anindya Sinha edited comment on MESOS-3421 at 9/15/15 9:47 PM:
---

I am proposing the following for persistent volumes to be shared across task 
containers:

i) Add an "optional bool shared" field to the "Persistence" section of the 
protobuf Resource.DiskInfo. It defaults to false, which retains the current 
behavior. Setting this to "true" when CREATE/DELETE of volumes is done marks 
them as sharable persistent volumes.
ii) All sharable persistent volumes shall be offered to the corresponding 
framework(s) matching the "role" (even if a task already uses this volume in 
its task constraint). Frameworks should therefore be able to use this resource 
in their task constraints to schedule tasks on the same slave.

The idea is to maintain the list of "shared" persistent volumes and offer them 
to the appropriate frameworks as valid resources, in spite of them being 
assigned to a scheduled task on that agent node.
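
For illustration, here is a hedged sketch of how a framework could build such 
a resource through the C++ protobuf API; {{set_shared()}} corresponds to the 
proposed field in (i) and does not exist in mesos.proto yet, everything else 
is existing API:

{code}
// Hedged sketch: set_shared() is the *proposed* field and does not exist
// in mesos.proto yet; the rest is the existing Resource/DiskInfo API.
#include <mesos/mesos.pb.h>

mesos::Resource makeSharedVolume()
{
  mesos::Resource volume;
  volume.set_name("disk");
  volume.set_type(mesos::Value::SCALAR);
  volume.mutable_scalar()->set_value(128);  // disk size in MB
  volume.set_role("webservice");            // offered to this role

  mesos::Resource::DiskInfo* disk = volume.mutable_disk();
  disk->mutable_persistence()->set_id("my-volume-id");
  disk->mutable_persistence()->set_shared(true);  // proposed; defaults to false

  disk->mutable_volume()->set_container_path("data");
  disk->mutable_volume()->set_mode(mesos::Volume::RW);

  return volume;
}
{code}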



was (Author: anindya.sinha):
I am proposing the following for persistent volumes to be shared across task 
containers:

i) Add an "optional bool shared" field to the "Persistence" section of the 
protobuf Resource.DiskInfo. It defaults to false, which retains the current 
behavior. Setting this to "true" when CREATE/DELETE of volumes is done marks 
them as sharable persistent volumes.
ii) All sharable persistent volumes shall be offered to the corresponding 
framework which owns these persistent volumes (even if a task already uses 
this volume in its task constraint). Frameworks should therefore be able to 
use this resource in their task constraints to schedule tasks on the same 
slave.

The idea is to maintain the list of "shared" persistent volumes and offer them 
to the appropriate frameworks as valid resources, in spite of them being 
assigned to a scheduled task on that agent node.


> Support sharing persistent volumes across task instances
> 
>
> Key: MESOS-3421
> URL: https://issues.apache.org/jira/browse/MESOS-3421
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 0.23.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>
> A service that needs persistent volume needs to have access to the same 
> persistent volume (RW) from multiple task(s) instances on the same agent 
> node. Currently, a persistent volume once offered to the framework(s) can be 
> scheduled to a task and until that tasks terminates, that persistent volume 
> cannot be used by another task.
> Explore providing the capability of sharing persistent volumes across task 
> instances scheduled on a single agent node.





[jira] [Commented] (MESOS-3421) Support sharing persistent volumes across task instances

2015-09-15 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746327#comment-14746327
 ] 

Anindya Sinha commented on MESOS-3421:
--

I have updated my earlier comment.
Since persistent volumes CREATED by a framework are per-role, the sharable 
persistent volumes shall be offered to all frameworks matching that role (and 
not ONLY to the framework that created them). I do not think we necessarily 
need to move to a per-framework persistent volume model before moving ahead 
with this JIRA.

Thanks for catching that.

> Support sharing persistent volumes across task instances
> 
>
> Key: MESOS-3421
> URL: https://issues.apache.org/jira/browse/MESOS-3421
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 0.23.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>
> A service that needs persistent volume needs to have access to the same 
> persistent volume (RW) from multiple task(s) instances on the same agent 
> node. Currently, a persistent volume once offered to the framework(s) can be 
> scheduled to a task and until that tasks terminates, that persistent volume 
> cannot be used by another task.
> Explore providing the capability of sharing persistent volumes across task 
> instances scheduled on a single agent node.





[jira] [Created] (MESOS-3434) Add inline build process for third-party libraries in Windows, using CMake

2015-09-15 Thread Alex Clemmer (JIRA)
Alex Clemmer created MESOS-3434:
---

 Summary: Add inline build process for third-party libraries in 
Windows, using CMake
 Key: MESOS-3434
 URL: https://issues.apache.org/jira/browse/MESOS-3434
 Project: Mesos
  Issue Type: Task
  Components: build
Reporter: Alex Clemmer
Assignee: haosdent


Right now, to build Mesos on Windows, we start the build process in VS (or 
NMake or whatever); when it fails, we have to go to the third-party libraries 
like glog and build them individually and separately.

A better idea would be to have batch scripts that will build them inline, as 
part of the normal build process. haosdent has a good start here: 
https://reviews.apache.org/r/37273/ and https://reviews.apache.org/r/37275/





[jira] [Commented] (MESOS-3136) COMMAND health checks with Marathon 0.10.0 are broken

2015-09-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744933#comment-14744933
 ] 

Adam B commented on MESOS-3136:
---

[~greggomann] is doing some (internal) backport testing for 
0.21.2, 0.22.2, 0.23.1, and 0.24.1 for the Docker versioning patches from MESOS-2986 
(one of which you [~haosd...@gmail.com] wrote). Although this patch is likely 
unrelated to those others, if we can land it soon, it may be critical enough to 
include in at least one of those patch releases. Let's bring it up on the 
release proposal email thread: http://search-hadoop.com/m/0Vlr6PBeaOUhF241

> COMMAND health checks with Marathon 0.10.0 are broken
> -
>
> Key: MESOS-3136
> URL: https://issues.apache.org/jira/browse/MESOS-3136
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Dr. Stefan Schimanski
>Assignee: haosdent
>Priority: Critical
>
> When deploying Mesos 0.23rc4 with the latest Marathon 0.10.0 RC3, COMMAND 
> health checks stop working. Rolling back to Mesos 0.22.1 fixes the problem.
> Containerizer is Docker.
> All packages are from official Mesosphere Ubuntu 14.04 sources.
> The issue must be analyzed further.





[jira] [Commented] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744970#comment-14744970
 ] 

haosdent commented on MESOS-3430:
-

I used CentOS 7.1 with XFS to test it. The behaviour on CentOS 7.1 is strange: 
after bind-mounting the sandbox onto itself ({{mount --bind sandbox sandbox}}) 
and marking it as shared, every bind mount point under the sandbox creates two 
records in /proc/self/mountinfo.

I can also reproduce this from the shell:
{code}
$ mkdir /tmp/sandbox
$ mkdir /tmp/source
$ mkdir /tmp/sandbox/target
$ mount --bind /tmp/sandbox /tmp/sandbox
$ mount --make-shared /tmp/sandbox
$ mount --bind /tmp/source /tmp/sandbox/target
{code}

{code}
104 38 8:3 /tmp/sandbox /tmp/sandbox rw,relatime shared:1 - xfs /dev/sda3 
rw,seclabel,attr2,inode64,noquota
107 104 8:3 /tmp/source /tmp/sandbox/target rw,relatime shared:1 - xfs 
/dev/sda3 rw,seclabel,attr2,inode64,noquota
108 38 8:3 /tmp/source /tmp/sandbox/target rw,relatime shared:1 - xfs /dev/sda3 
rw,seclabel,attr2,inode64,noquota
{code}

I think it may be caused by XFS; it needs more investigation.

A quick way to solve this is to mark the sandbox as slave first and then mark 
it as shared:
{code}
diff --git a/src/slave/containerizer/isolators/filesystem/linux.cpp 
b/src/slave/containerizer/isolators/filesystem/linux.cpp
index dbdbf87..9149838 100644
--- a/src/slave/containerizer/isolators/filesystem/linux.cpp
+++ b/src/slave/containerizer/isolators/filesystem/linux.cpp
@@ -312,6 +312,13 @@ Future

[jira] [Commented] (MESOS-3340) Command-line flags should take precedence over OS Env variables

2015-09-15 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744944#comment-14744944
 ] 

Klaus Ma commented on MESOS-3340:
-

Sure, I've sent an email to the user@ mailing list :).

> Command-line flags should take precedence over OS Env variables
> ---
>
> Key: MESOS-3340
> URL: https://issues.apache.org/jira/browse/MESOS-3340
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.24.0
>Reporter: Marco Massenzio
>Assignee: Klaus Ma
>  Labels: mesosphere, tech-debt
>
> Currently, it appears that re-defining a flag on the command-line that was 
> already defined via a OS Env var ({{MESOS_*}}) causes the Master to fail with 
> a not very helpful message.
> For example, if one has {{MESOS_QUORUM}} defined, this happens:
> {noformat}
> $ ./mesos-master --zk=zk://192.168.1.4/mesos --quorum=1 
> --hostname=192.168.1.4 --ip=192.168.1.4
> Duplicate flag 'quorum' on command line
> {noformat}
> which is not very helpful.
> Ideally, we would parse the flags with a "well-known" priority (command-line 
> first, environment last) - but at the very least, the error message should be 
> more helpful in explaining what the issue is.
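
A hedged sketch of the desired precedence, illustrative only (this is not how 
stout's flags parsing is actually implemented):

{code}
// Hedged sketch of "command line first, environment last"; illustrative
// only, not stout's actual flags implementation.
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

std::string resolveFlag(
    const std::map<std::string, std::string>& commandLine,
    const std::string& name,
    const std::string& defaultValue)
{
  // 1. A command-line value wins outright, even if MESOS_<NAME> is set.
  std::map<std::string, std::string>::const_iterator it =
    commandLine.find(name);
  if (it != commandLine.end()) {
    return it->second;
  }

  // 2. Otherwise fall back to the MESOS_<NAME> environment variable.
  std::string envName = "MESOS_" + name;
  for (size_t i = 0; i < envName.size(); i++) {
    envName[i] = toupper(static_cast<unsigned char>(envName[i]));
  }

  const char* env = std::getenv(envName.c_str());
  if (env != NULL) {
    return env;
  }

  // 3. Finally, the default.
  return defaultValue;
}
{code}

Under this precedence, {{--quorum=1}} would simply override an exported 
{{MESOS_QUORUM}} instead of failing with "Duplicate flag 'quorum' on command 
line".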





[jira] [Commented] (MESOS-2063) Add InverseOffer to C++ Scheduler API.

2015-09-15 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744909#comment-14744909
 ] 

Qian Zhang commented on MESOS-2063:
---

Currently there is no plan to add this to the old scheduler API; see the 
following mail thread for the detailed discussion:
http://www.mail-archive.com/dev@mesos.apache.org/msg33184.html

> Add InverseOffer to C++ Scheduler API.
> --
>
> Key: MESOS-2063
> URL: https://issues.apache.org/jira/browse/MESOS-2063
> Project: Mesos
>  Issue Type: Task
>  Components: c++ api
>Reporter: Benjamin Mahler
>Assignee: Qian Zhang
>  Labels: mesosphere, twitter
>
> The initial use case for InverseOffer in the framework API will be the 
> maintenance primitives in mesos: MESOS-1474.
> One way to add these to the C++ Scheduler API is to add a new callback:
> {code}
>   virtual void inverseResourceOffers(
>   SchedulerDriver* driver,
>   const std::vector<InverseOffer>& inverseOffers) = 0;
> {code}
> libmesos compatibility will need to be figured out here.
> We may want to leave the C++ binding untouched in favor of Event/Call, in 
> order to not break API compatibility for schedulers.





[jira] [Commented] (MESOS-3015) Add hooks for Slave exits

2015-09-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744908#comment-14744908
 ] 

Adam B commented on MESOS-3015:
---

This came up again in the context of an external storage manager that needs to 
know when a slave exits, so it can unmount any volumes that were attached to 
that slave. Maybe the real solution would be a Mesos event bus that the other 
process can listen to.

> Add hooks for Slave exits
> -
>
> Key: MESOS-3015
> URL: https://issues.apache.org/jira/browse/MESOS-3015
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> The hook will be triggered on slave exits. A master hook module can use this 
> to do Slave-specific cleanups.
> In our particular use case, the hook would trigger cleanup of IPs assigned to 
> the given Slave (see the [design doc | 
> https://docs.google.com/document/d/17mXtAmdAXcNBwp_JfrxmZcQrs7EO6ancSbejrqjLQ0g/edit#]).





[jira] [Assigned] (MESOS-3420) Resolve shutdown semantics for Machine/Down

2015-09-15 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-3420:
---

Assignee: Klaus Ma

> Resolve shutdown semantics for Machine/Down
> ---
>
> Key: MESOS-3420
> URL: https://issues.apache.org/jira/browse/MESOS-3420
> Project: Mesos
>  Issue Type: Task
>Reporter: Joris Van Remoortere
>Assignee: Klaus Ma
>  Labels: maintenance, mesosphere
>
> When an operator uses the {{machine/down}} endpoint, the master sends a 
> shutdown message to the agent.
> We need to discuss and resolve the semantics that we want regarding the 
> operators and frameworks knowing when their tasks are terminated.
> One option is to explicitly remove the agent from the master which will send 
> the {{TASK_LOST}} updates and {{SlaveLostMessage}} directly from the master. 
> The concern around this is that, during a network partition, or if the agent 
> was down at the time, these tasks could still be running.
> This is a general problem related to task life-times being dissociated from 
> the life-time of the agent.





[jira] [Created] (MESOS-3432) Unify the implementations of the image provisioners.

2015-09-15 Thread Jie Yu (JIRA)
Jie Yu created MESOS-3432:
-

 Summary: Unify the implementations of the image provisioners.
 Key: MESOS-3432
 URL: https://issues.apache.org/jira/browse/MESOS-3432
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu


The current design uses a separate provisioner implementation for each type of 
image (e.g., APPC, DOCKER).

This creates a lot of code duplication. Since we already have a unified 
provisioner backend (e.g., copy, bind, overlayfs), we should be able to unify 
the implementations of the image provisioners and hide the image-specific 
logic in the corresponding 'Store' implementation.





[jira] [Commented] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-15 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745881#comment-14745881
 ] 

Jie Yu commented on MESOS-3430:
---

[~haosd...@gmail.com] do you understand why this works? Marking the sandbox as 
slave first and then shared makes me feel that the first operation is a no-op.

> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Michael Park
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}





[jira] [Comment Edited] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745859#comment-14745859
 ] 

haosdent edited comment on MESOS-3430 at 9/15/15 6:15 PM:
--

Thank you very much! Ubuntu 14.04 (ext4) also has this problem if the root is 
marked as a shared mount point. Would it be simpler to use make-slave to stop 
propagation between mount points? That way we would not need to worry about 
the outside environment, e.g. if someone uses make-private/make-shared to 
change the root mount point while the slave is running.
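
For illustration, a hedged sketch of the make-slave idea using mount(2) 
directly (not the actual Mesos patch):

{code}
// Hedged sketch of the make-slave idea, using mount(2) directly;
// illustrative only, not the actual Mesos patch.
#include <stdio.h>
#include <sys/mount.h>

int isolateSandbox(const char* sandbox)
{
  // Self bind-mount so the sandbox becomes its own mount point.
  if (mount(sandbox, sandbox, NULL, MS_BIND, NULL) != 0) {
    perror("bind mount");
    return -1;
  }

  // MS_SLAVE: the sandbox still receives mounts propagated from its
  // parent, but mounts made under the sandbox no longer propagate back
  // out, regardless of how the outside environment is configured.
  if (mount(NULL, sandbox, NULL, MS_SLAVE, NULL) != 0) {
    perror("make-slave");
    return -1;
  }

  return 0;
}
{code}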


was (Author: haosd...@gmail.com):
Thank you very much! Ubuntu 14.04 (ext4) also has this problem if the root is 
marked as a shared mount point. Would it be simpler to use make-slave to stop 
propagation between mount points? That way we would not need to worry about 
the outside environment, e.g. if someone uses make-private/make-shared to 
change the root mount point while the slave is running.

> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Michael Park
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}





[jira] [Commented] (MESOS-3366) Allow resources/attributes discovery

2015-09-15 Thread Felix Abecassis (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745769#comment-14745769
 ] 

Felix Abecassis commented on MESOS-3366:


[~cdoyle] [~nnielsen] Thank you for the reviews on my initial patch.

Before I add proper comments and tests, are you fine with this API proposal in 
the first place?
For instance, how should we also tackle attributes decoration? Should we add a 
second hook or have one hook for both attributes and resources? I would say we 
should have 2 different hooks, but it might be at the cost of some code/tests 
duplication.
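
For concreteness, a hedged sketch of the two-hook option; all types and names 
below are illustrative placeholders, not the actual Mesos module API:

{code}
// Hedged sketch of the "two separate hooks" option; the types and names
// are illustrative placeholders, not the actual Mesos hook/module API.
#include <string>
#include <vector>

struct Resource { std::string name; double scalar; };
struct Attribute { std::string name; std::string value; };

class AgentDecoratorHook
{
public:
  virtual ~AgentDecoratorHook() {}

  // Called once at agent startup: probe the hardware and return extra
  // resources (e.g. gpus:4) to merge into the agent's total.
  virtual std::vector<Resource> resourcesDecorator() = 0;

  // A second, separate hook for attributes (e.g. rack:abc). Keeping the
  // two apart costs some code/test duplication but keeps each hook simple.
  virtual std::vector<Attribute> attributesDecorator() = 0;
};
{code}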

> Allow resources/attributes discovery
> 
>
> Key: MESOS-3366
> URL: https://issues.apache.org/jira/browse/MESOS-3366
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Felix Abecassis
>
> In heterogeneous clusters, tasks sometimes have strong constraints on the 
> type of hardware they need to execute on. The current solution is to use 
> custom resources and attributes on the agents. Detecting non-standard 
> resources/attributes requires wrapping the "mesos-slave" binary behind a 
> script and using custom code to probe the agent. Unfortunately, this approach 
> doesn't allow composition. The solution would be to provide a hook/module 
> mechanism to allow users to use custom code performing resources/attributes 
> discovery.
> Please review the detailed document below:
> https://docs.google.com/document/d/15OkebDezFxzeyLsyQoU0upB0eoVECAlzEkeg0HQAX9w
> Feel free to express comments/concerns by annotating the document or by 
> replying to this issue.





[jira] [Commented] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-15 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745814#comment-14745814
 ] 

Jie Yu commented on MESOS-3430:
---

We should add a CHECK and abort if the parent mount of work_dir is a shared 
mount.

> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Michael Park
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}





[jira] [Commented] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745859#comment-14745859
 ] 

haosdent commented on MESOS-3430:
-

Thank you very much! Ubuntu 14.04 (ext4) also has this problem if the root is 
marked as a shared mount point. Would it be simpler to use make-slave to stop 
propagation between mount points? That way we would not need to worry about 
the outside environment, e.g. if someone uses make-private/make-shared to 
change the root mount point while the slave is running.

> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Michael Park
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}





[jira] [Commented] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745892#comment-14745892
 ] 

haosdent commented on MESOS-3430:
-

According to this document, 
https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt, 
slave+shared is also a valid state, but I am not very clear about it.

I also found another problem; here is a simple experiment on Ubuntu 14.04:
{code}
$ mount --make-shared /
$ mkdir /tmp/sandbox
$ mkdir /tmp/source
$ mkdir /tmp/sandbox/target
$ mount --bind /tmp/sandbox /tmp/sandbox
$ mount --make-shared /tmp/sandbox
$ mount --bind /tmp/source /tmp/sandbox/target
{code}

This reproduces the problem:
{code}
45 22 8:3 /tmp/sandbox /tmp/sandbox rw,relatime - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
47 45 8:3 /tmp/source /tmp/sandbox/target rw,relatime shared:1 - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
48 22 8:3 /tmp/source /tmp/sandbox/target rw,relatime shared:1 - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
{code}

And if I change /tmp/sandbox to private before unmounting /tmp/sandbox/target, 
the two records cannot both be unmounted:
{code}
$ mount --make-private /tmp/sandbox
{code}

The first umount succeeds:
{code}
$ umount /tmp/sandbox/target
{code}

The second fails:
{code}
$ umount /tmp/sandbox/target
umount: /tmp/sandbox/target: not mounted
{code}

Yet the record still exists in /proc/self/mountinfo:
{code}
45 22 8:3 /tmp/sandbox /tmp/sandbox rw,relatime - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
48 22 8:3 /tmp/source /tmp/sandbox/target rw,relatime shared:1 - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
{code}



> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Michael Park
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}





[jira] [Commented] (MESOS-3386) Port remaining Stout tests to Windows

2015-09-15 Thread Alex Clemmer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745064#comment-14745064
 ] 

Alex Clemmer commented on MESOS-3386:
-

[~hartem] Oh sorry I didn't see this until just now. I'll do it tomorrow (it's 
already almost 2)

> Port remaining Stout tests to Windows
> -
>
> Key: MESOS-3386
> URL: https://issues.apache.org/jira/browse/MESOS-3386
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: build, mesosphere, tests
>
> Here is a concise list of the Stout tests that don't work yet, and their 
> dependencies, and comments about how hard they are to port. Asterisks are 
> next to tests that seem to block Windows MVP.
> {quote}
> *dynamiclibrary_tests.cpp -- depends on dynamic load libraries [probably 
> easy, just map to windows dll load API]
> *flags_tests.cpp -- depends on os.hpp [probably will "just work" if we port 
> os.hpp]
> *gzip_tests.cpp -- depends on gzip.hpp [need to make API-compatible impl of 
> gzip.hpp, which is a medium amount of work]
> *ip_tests.cpp -- depends on net.hpp and abort.hpp [will probably "just work" 
> after we port net.hpp]
> *mac_tests.cpp -- depends on abort.hpp and mac.hpp [may or may not be 
> nontrivial, will probably work if we can get mac.hpp]
> *os_tests.cpp -- depends on a bunch of stuff [probably hardest and most 
> important]
> *path_tests.cpp -- depends on os.hpp [will probably "just work" if we port 
> os.hpp]
> protobuf_tests.cpp -- depends on stout/protobuf.hpp (and it can't seem to 
> find the protobuf include dir)
> *sendfile_test.cpp -- depends on os.hpp and sendfile.hpp [simple port of 
> sendfile is possible; os.hpp is harder]
> signals_tests.cpp -- depends on os.hpp and signal.hpp [signals will probably 
> be easy; os.hpp is the hard part]
> *subcommand_tests.cpp -- depends on flags.hpp (which depends on os.hpp) 
> [probably will "just work" if we get os.hpp]
> svn_tests.cpp -- depends on libapr and libsvn [simple if we get windows to 
> pull these deps]
> {quote}





[jira] [Commented] (MESOS-3157) only perform batch resource allocations

2015-09-15 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745087#comment-14745087
 ] 

Klaus Ma commented on MESOS-3157:
-

[~bmahler], is it possible to ask the framework to handle short tasks? For 
the short-running-task case, one could build a framework that loads a 
long-running executor to run the tasks; the framework holds the resources 
until all tasks are done and dispatches the next task to the executor without 
any waiting time.

> only perform batch resource allocations
> ---
>
> Key: MESOS-3157
> URL: https://issues.apache.org/jira/browse/MESOS-3157
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: James Peach
>Assignee: James Peach
>
> Our deployment environments have a lot of churn, with many short-live 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (eg. a revive offers message takes too long to come to the head 
> of the queue).
> We have been running a patch to remove all the event-triggered allocations 
> and only allocate from the batch task 
> {{HierarchicalAllocatorProcess::batch}}. This works great and really improves 
> responsiveness.
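
For illustration, a hedged sketch of the batch-only pattern using a libprocess 
timer; the names are illustrative, not the actual allocator code:

{code}
// Hedged sketch of batch-only allocation: a libprocess timer drives
// allocate() at a fixed interval instead of on every framework/agent
// event. Illustrative only, not the actual HierarchicalAllocatorProcess.
#include <process/delay.hpp>
#include <process/process.hpp>

#include <stout/duration.hpp>

class BatchAllocatorProcess : public process::Process<BatchAllocatorProcess>
{
protected:
  virtual void initialize()
  {
    // Schedule the first batch allocation one interval from now.
    process::delay(Seconds(1), self(), &BatchAllocatorProcess::batch);
  }

  void batch()
  {
    allocate();  // one full allocation pass over all agents

    // Re-arm the timer: events that arrive in between only mutate state
    // and wait for the next batch rather than triggering an allocation.
    process::delay(Seconds(1), self(), &BatchAllocatorProcess::batch);
  }

private:
  void allocate()
  {
    // ... sort frameworks, compute offers, send them out ...
  }
};
{code}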





[jira] [Updated] (MESOS-3408) Labels field of FrameworkInfo should be added into v1 mesos.proto

2015-09-15 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang updated MESOS-3408:
--
Shepherd: Adam B

> Labels field of FrameworkInfo should be added into v1 mesos.proto
> -
>
> Key: MESOS-3408
> URL: https://issues.apache.org/jira/browse/MESOS-3408
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
> Fix For: 0.25.0
>
>
> In [MESOS-2841|https://issues.apache.org/jira/browse/MESOS-2841], a new field 
> "Labels" has been added into FrameworkInfo in mesos.proto, but is missed in 
> v1 mesos.proto.





[jira] [Updated] (MESOS-1845) CommandInfo tasks may fail when scheduled after another task with the same id has finished.

2015-09-15 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-1845:

Assignee: (was: Klaus Ma)

> CommandInfo tasks may fail when scheduled after another task with the same id 
> has finished.
> ---
>
> Key: MESOS-1845
> URL: https://issues.apache.org/jira/browse/MESOS-1845
> Project: Mesos
>  Issue Type: Bug
>Reporter: Andreas Raster
>
> I created a little test framework where I wanted to experiment with 
> scheduling tasks where running one task relies on the results of another, 
> previously run task. So in my test framework I would first schedule a task 
> that would append the string "foo" to a file, and after that one finishes I 
> would schedule a task that appends "bar" to the same file.
> This worked well when using ExecutorInfo, but when I switched to using 
> CommandInfo instead (specifying commands like 'echo foo >> /share/foobar.txt' 
> in set_value()), it would most of the time fail in the second step when 
> attempting to append "bar". Occasionally, but very rarely, it would work 
> though.
> I couldn't find any meaningful log messages indicating what exactly went 
> wrong. The slave log would indicate that the task's status changed to 
> TASK_FAILED and that the status update was sent correctly. The stdout log in 
> the Sandbox would indicate that the command 'exited with status 0'.
> I could work around the issue when I specified task ids that were always 
> unique. Previously I would reuse the id of a previously run task, one that 
> appended "foo" to a file, after it finished in the followup task that would 
> append "bar" to a file.
> It seems to me there might be something wrong when scheduling very short 
> running tasks with the same id quickly after each other.
> Source code for my foobar framework:
> http://paste.ubuntu.com/8459083
> Build with:
> g++ -std=c++0x -g -Wall foobar_framework.cpp -I. -L/usr/local/lib -lmesos -o 
> foobar-framework





[jira] [Comment Edited] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745606#comment-14745606
 ] 

Guangya Liu edited comment on MESOS-3422 at 9/15/15 3:39 PM:
-

I tested this on Ubuntu and it works well. [~vi...@twitter.com] could this be 
related to the platform? Thanks.

{code}
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from MasterSlaveReconciliationTest
[ RUN  ] MasterSlaveReconciliationTest.ReconcileLostTask
Using temporary directory 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn'
I0915 22:28:40.800787  3733 leveldb.cpp:176] Opened db in 252.206266ms
I0915 22:28:40.851069  3733 leveldb.cpp:183] Compacted db in 50.197346ms
I0915 22:28:40.851210  3733 leveldb.cpp:198] Created db iterator in 63324ns
I0915 22:28:40.851256  3733 leveldb.cpp:204] Seeked to beginning of db in 4562ns
I0915 22:28:40.851286  3733 leveldb.cpp:273] Iterated through 0 keys in the db 
in 322ns
I0915 22:28:40.871953  3733 replica.cpp:744] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0915 22:28:40.886368  3756 recover.cpp:449] Starting replica recovery
I0915 22:28:40.90  3756 recover.cpp:475] Replica is in EMPTY status
I0915 22:28:40.916332  3759 replica.cpp:641] Replica in EMPTY status received a 
broadcasted recover request
I0915 22:28:40.917351  3756 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I0915 22:28:40.918557  3755 recover.cpp:566] Updating replica status to STARTING
I0915 22:28:40.928189  3759 master.cpp:380] Master 
20150915-222840-16842879-54960-3733 (devstack007.cn.ibm.com) started on 
127.0.1.1:54960
I0915 22:28:40.928261  3759 master.cpp:382] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/credentials"
 --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_slave_ping_timeouts="5" --quiet="false" 
--recovery_slave_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
--registry_strict="true" --root_submissions="true" 
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
--user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/master" 
--zk_session_timeout="10secs"
I0915 22:28:40.993895  3759 master.cpp:427] Master only allowing authenticated 
frameworks to register
I0915 22:28:40.993962  3759 master.cpp:432] Master only allowing authenticated 
slaves to register
I0915 22:28:40.994010  3759 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/credentials'
I0915 22:28:40.994776  3759 master.cpp:471] Using default 'crammd5' 
authenticator
I0915 22:28:40.995053  3759 authenticator.cpp:512] Initializing server SASL
I0915 22:28:41.009496  3757 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 90.341573ms
I0915 22:28:41.009570  3757 replica.cpp:323] Persisted replica status to 
STARTING
I0915 22:28:41.010040  3756 recover.cpp:475] Replica is in STARTING status
I0915 22:28:41.011255  3757 replica.cpp:641] Replica in STARTING status 
received a broadcasted recover request
I0915 22:28:41.011551  3752 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I0915 22:28:41.012073  3756 recover.cpp:566] Updating replica status to VOTING
I0915 22:28:41.084720  3753 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 72.469042ms
I0915 22:28:41.084803  3753 replica.cpp:323] Persisted replica status to VOTING
I0915 22:28:41.084935  3752 recover.cpp:580] Successfully joined the Paxos group
I0915 22:28:41.085227  3752 recover.cpp:464] Recover process terminated
I0915 22:28:41.191287  3759 auxprop.cpp:66] Initialized in-memory auxiliary 
property plugin
I0915 22:28:41.191455  3759 master.cpp:508] Authorization enabled
I0915 22:28:41.192039  3758 hierarchical.hpp:408] Initialized hierarchical 
allocator process
I0915 22:28:41.210978  3752 whitelist_watcher.cpp:79] No whitelist given
I0915 22:28:41.226894  3757 master.cpp:1605] The newly elected leader is 
master@127.0.1.1:54960 with id 20150915-222840-16842879-54960-3733
I0915 22:28:41.227022  3757 master.cpp:1618] Elected as the leading mast

[jira] [Commented] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745606#comment-14745606
 ] 

Guangya Liu commented on MESOS-3422:


I tested this on Ubuntu and it works well. [~vi...@twitter.com] could this be 
related to the platform? Thanks.

[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from MasterSlaveReconciliationTest
[ RUN  ] MasterSlaveReconciliationTest.ReconcileLostTask
Using temporary directory 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn'
I0915 22:28:40.800787  3733 leveldb.cpp:176] Opened db in 252.206266ms
I0915 22:28:40.851069  3733 leveldb.cpp:183] Compacted db in 50.197346ms
I0915 22:28:40.851210  3733 leveldb.cpp:198] Created db iterator in 63324ns
I0915 22:28:40.851256  3733 leveldb.cpp:204] Seeked to beginning of db in 4562ns
I0915 22:28:40.851286  3733 leveldb.cpp:273] Iterated through 0 keys in the db 
in 322ns
I0915 22:28:40.871953  3733 replica.cpp:744] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0915 22:28:40.886368  3756 recover.cpp:449] Starting replica recovery
I0915 22:28:40.90  3756 recover.cpp:475] Replica is in EMPTY status
I0915 22:28:40.916332  3759 replica.cpp:641] Replica in EMPTY status received a 
broadcasted recover request
I0915 22:28:40.917351  3756 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I0915 22:28:40.918557  3755 recover.cpp:566] Updating replica status to STARTING
I0915 22:28:40.928189  3759 master.cpp:380] Master 
20150915-222840-16842879-54960-3733 (devstack007.cn.ibm.com) started on 
127.0.1.1:54960
I0915 22:28:40.928261  3759 master.cpp:382] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/credentials"
 --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_slave_ping_timeouts="5" --quiet="false" 
--recovery_slave_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
--registry_strict="true" --root_submissions="true" 
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
--user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/master" 
--zk_session_timeout="10secs"
I0915 22:28:40.993895  3759 master.cpp:427] Master only allowing authenticated 
frameworks to register
I0915 22:28:40.993962  3759 master.cpp:432] Master only allowing authenticated 
slaves to register
I0915 22:28:40.994010  3759 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/credentials'
I0915 22:28:40.994776  3759 master.cpp:471] Using default 'crammd5' 
authenticator
I0915 22:28:40.995053  3759 authenticator.cpp:512] Initializing server SASL
I0915 22:28:41.009496  3757 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 90.341573ms
I0915 22:28:41.009570  3757 replica.cpp:323] Persisted replica status to 
STARTING
I0915 22:28:41.010040  3756 recover.cpp:475] Replica is in STARTING status
I0915 22:28:41.011255  3757 replica.cpp:641] Replica in STARTING status 
received a broadcasted recover request
I0915 22:28:41.011551  3752 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I0915 22:28:41.012073  3756 recover.cpp:566] Updating replica status to VOTING
I0915 22:28:41.084720  3753 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 72.469042ms
I0915 22:28:41.084803  3753 replica.cpp:323] Persisted replica status to VOTING
I0915 22:28:41.084935  3752 recover.cpp:580] Successfully joined the Paxos group
I0915 22:28:41.085227  3752 recover.cpp:464] Recover process terminated
I0915 22:28:41.191287  3759 auxprop.cpp:66] Initialized in-memory auxiliary 
property plugin
I0915 22:28:41.191455  3759 master.cpp:508] Authorization enabled
I0915 22:28:41.192039  3758 hierarchical.hpp:408] Initialized hierarchical 
allocator process
I0915 22:28:41.210978  3752 whitelist_watcher.cpp:79] No whitelist given
I0915 22:28:41.226894  3757 master.cpp:1605] The newly elected leader is 
master@127.0.1.1:54960 with id 20150915-222840-16842879-54960-3733
I0915 22:28:41.227022  3757 master.cpp:1618] Elected as the leading master!
I0915 22:28:41.227073  3757 master.cpp:1378] Reco

[jira] [Issue Comment Deleted] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2015-09-15 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-3422:
---
Comment: was deleted

(was: [~vi...@twitter.com] I'm sorry that I updated the problem description by 
mistake, can you please help update the description again? Thanks!)

> MasterSlaveReconciliationTest.ReconcileLostTask test is flaky
> -
>
> Key: MESOS-3422
> URL: https://issues.apache.org/jira/browse/MESOS-3422
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Affects Versions: 0.25.0
> Environment: CentOS
>Reporter: Vinod Kone
>
> Observed this on internal CI
> {code}
> DEBUG: [--] 5 tests from MasterSlaveReconciliationTest
> DEBUG: [ RUN ] MasterSlaveReconciliationTest.SlaveReregisterTerminatedExecutor
> DEBUG: Using temporary directory 
> '/tmp/MasterSlaveReconciliationTest_SlaveReregisterTerminatedExecutor_QJPUzf'
> DEBUG: [ OK ] MasterSlaveReconciliationTest.SlaveReregisterTerminatedExecutor 
> (78 ms)
> DEBUG: [ RUN ] MasterSlaveReconciliationTest.ReconcileLostTask
> DEBUG: Using temporary directory 
> '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_16KDgE'
> DEBUG: tests/master_slave_reconciliation_tests.cpp:226: Failure
> DEBUG: Failed to wait 15secs for statusUpdateMessage
> DEBUG: tests/master_slave_reconciliation_tests.cpp:216: Failure
> DEBUG: Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(, _))...
> DEBUG: Expected: to be called once
> DEBUG: Actual: never called - unsatisfied and active
> DEBUG: I0914 08:51:27.825984 16062 leveldb.cpp:438] Reading position from 
> leveldb took 16151ns
> DEBUG: I0914 08:51:27.828069 16049 registrar.cpp:342] Successfully fetched 
> the registry (0B) in 7648us
> DEBUG: I0914 08:51:27.828119 16049 registrar.cpp:441] Applied 1 operations in 
> 2805ns; attempting to update the 'registry'
> DEBUG: I0914 08:51:27.829991 16066 log.cpp:685] Attempting to append 222 
> bytes to the log
> DEBUG: I0914 08:51:27.830029 16066 coordinator.cpp:341] Coordinator 
> attempting to write APPEND action at position 1
> DEBUG: I0914 08:51:27.830729 16053 replica.cpp:511] Replica received write 
> request for position 1
> DEBUG: I0914 08:51:27.831167 16053 leveldb.cpp:343] Persisting action (241 
> bytes) to leveldb took 414748ns
> DEBUG: I0914 08:51:27.831185 16053 replica.cpp:679] Persisted action at 1
> DEBUG: I0914 08:51:27.831493 16058 replica.cpp:658] Replica received learned 
> notice for position 1
> DEBUG: I0914 08:51:27.831698 16058 leveldb.cpp:343] Persisting action (243 
> bytes) to leveldb took 185223ns
> DEBUG: I0914 08:51:27.831714 16058 replica.cpp:679] Persisted action at 1
> DEBUG: I0914 08:51:27.831722 16058 replica.cpp:664] Replica learned APPEND 
> action at position 1
> DEBUG: I0914 08:51:27.831989 16056 registrar.cpp:486] Successfully updated 
> the 'registry' in 3.827968ms
> DEBUG: I0914 08:51:27.832041 16052 log.cpp:704] Attempting to truncate the 
> log to 1
> DEBUG: I0914 08:51:27.832093 16056 registrar.cpp:372] Successfully recovered 
> registrar
> DEBUG: I0914 08:51:27.832259 16072 coordinator.cpp:341] Coordinator 
> attempting to write TRUNCATE action at position 2
> DEBUG: I0914 08:51:27.832259 16062 master.cpp:1404] Recovered 0 slaves from 
> the Registry (183B) ; allowing 10mins for slaves to re-register
> DEBUG: I0914 08:51:27.832882 16060 replica.cpp:511] Replica received write 
> request for position 2
> DEBUG: I0914 08:51:27.833243 16060 leveldb.cpp:343] Persisting action (16 
> bytes) to leveldb took 340843ns
> DEBUG: I0914 08:51:27.833261 16060 replica.cpp:679] Persisted action at 2
> DEBUG: I0914 08:51:27.833593 16050 replica.cpp:658] Replica received learned 
> notice for position 2
> DEBUG: I0914 08:51:27.833724 16050 leveldb.cpp:343] Persisting action (18 
> bytes) to leveldb took 112560ns
> DEBUG: I0914 08:51:27.833755 16050 leveldb.cpp:401] Deleting ~1 keys from 
> leveldb took 16580ns
> DEBUG: I0914 08:51:27.833765 16050 replica.cpp:679] Persisted action at 2
> DEBUG: I0914 08:51:27.833775 16050 replica.cpp:664] Replica learned TRUNCATE 
> action at position 2
> DEBUG: I0914 08:51:27.843340 16057 http.cpp:333] HTTP POST for 
> /master/maintenance/schedule from 172.18.4.102:46471
> DEBUG: I0914 08:51:27.843801 16050 registrar.cpp:441] Applied 1 operations in 
> 25197ns; attempting to update the 'registry'
> DEBUG: I0914 08:51:27.845721 16068 log.cpp:685] Attempting to append 328 
> bytes to the log
> DEBUG: I0914 08:51:27.845772 16068 coordinator.cpp:341] Coordinator 
> attempting to write APPEND action at position 3
> DEBUG: I0914 08:51:27.846606 16052 replica.cpp:511] Replica received write 
> request for position 3
> DEBUG: I0914 08:51:27.847012 16052 leveldb.cpp:343] Persisting action (347 
> bytes) to leveldb took 387519ns
> DEBUG: 

[jira] [Updated] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2015-09-15 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-3422:
---
Description: 
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from MasterSlaveReconciliationTest
[ RUN  ] MasterSlaveReconciliationTest.ReconcileLostTask
Using temporary directory 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn'
I0915 22:28:40.800787  3733 leveldb.cpp:176] Opened db in 252.206266ms
I0915 22:28:40.851069  3733 leveldb.cpp:183] Compacted db in 50.197346ms
I0915 22:28:40.851210  3733 leveldb.cpp:198] Created db iterator in 63324ns
I0915 22:28:40.851256  3733 leveldb.cpp:204] Seeked to beginning of db in 4562ns
I0915 22:28:40.851286  3733 leveldb.cpp:273] Iterated through 0 keys in the db 
in 322ns
I0915 22:28:40.871953  3733 replica.cpp:744] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0915 22:28:40.886368  3756 recover.cpp:449] Starting replica recovery
I0915 22:28:40.90  3756 recover.cpp:475] Replica is in EMPTY status
I0915 22:28:40.916332  3759 replica.cpp:641] Replica in EMPTY status received a 
broadcasted recover request
I0915 22:28:40.917351  3756 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I0915 22:28:40.918557  3755 recover.cpp:566] Updating replica status to STARTING
I0915 22:28:40.928189  3759 master.cpp:380] Master 
20150915-222840-16842879-54960-3733 (devstack007.cn.ibm.com) started on 
127.0.1.1:54960
I0915 22:28:40.928261  3759 master.cpp:382] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/credentials"
 --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_slave_ping_timeouts="5" --quiet="false" 
--recovery_slave_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
--registry_strict="true" --root_submissions="true" 
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
--user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/master" 
--zk_session_timeout="10secs"
I0915 22:28:40.993895  3759 master.cpp:427] Master only allowing authenticated 
frameworks to register
I0915 22:28:40.993962  3759 master.cpp:432] Master only allowing authenticated 
slaves to register
I0915 22:28:40.994010  3759 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/credentials'
I0915 22:28:40.994776  3759 master.cpp:471] Using default 'crammd5' 
authenticator
I0915 22:28:40.995053  3759 authenticator.cpp:512] Initializing server SASL
I0915 22:28:41.009496  3757 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 90.341573ms
I0915 22:28:41.009570  3757 replica.cpp:323] Persisted replica status to 
STARTING
I0915 22:28:41.010040  3756 recover.cpp:475] Replica is in STARTING status
I0915 22:28:41.011255  3757 replica.cpp:641] Replica in STARTING status 
received a broadcasted recover request
I0915 22:28:41.011551  3752 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I0915 22:28:41.012073  3756 recover.cpp:566] Updating replica status to VOTING
I0915 22:28:41.084720  3753 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 72.469042ms
I0915 22:28:41.084803  3753 replica.cpp:323] Persisted replica status to VOTING
I0915 22:28:41.084935  3752 recover.cpp:580] Successfully joined the Paxos group
I0915 22:28:41.085227  3752 recover.cpp:464] Recover process terminated
I0915 22:28:41.191287  3759 auxprop.cpp:66] Initialized in-memory auxiliary 
property plugin
I0915 22:28:41.191455  3759 master.cpp:508] Authorization enabled
I0915 22:28:41.192039  3758 hierarchical.hpp:408] Initialized hierarchical 
allocator process
I0915 22:28:41.210978  3752 whitelist_watcher.cpp:79] No whitelist given
I0915 22:28:41.226894  3757 master.cpp:1605] The newly elected leader is 
master@127.0.1.1:54960 with id 20150915-222840-16842879-54960-3733
I0915 22:28:41.227022  3757 master.cpp:1618] Elected as the leading master!
I0915 22:28:41.227073  3757 master.cpp:1378] Recovering from registrar
I0915 22:28:41.227442  3756 registrar.cpp:309] Recovering registrar
I0915 22:28:41.228864  3759 lo

[jira] [Commented] (MESOS-3419) Add HELP message for reserve/unreserve endpoint

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745253#comment-14745253
 ] 

Guangya Liu commented on MESOS-3419:


[~mcypark] Any comments on the RR? Thanks.
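
For context, a rough sketch of what such a HELP message could look like, using 
the {{HELP()}}/{{TLDR()}}/{{USAGE()}}/{{DESCRIPTION()}} helpers from 
libprocess (the exact wording, usage string, and placement are assumptions 
here, not the committed patch):
{code}
#include <process/help.hpp>

using process::HELP;
using process::TLDR;
using process::USAGE;
using process::DESCRIPTION;

// Sketch only: help text for the /reserve endpoint, in the style of
// other master endpoints such as /health.
const std::string RESERVE_HELP = HELP(
    TLDR("Reserve resources dynamically on a specific slave."),
    USAGE("/master/reserve"),
    DESCRIPTION(
        "Returns 200 OK if the reservation was successful.",
        "Expects 'slaveId' and 'resources' parameters in the request body."));
{code}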

> Add HELP message for reserve/unreserve endpoint
> ---
>
> Key: MESOS-3419
> URL: https://issues.apache.org/jira/browse/MESOS-3419
> Project: Mesos
>  Issue Type: Task
>Affects Versions: 0.25.0
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 0.25.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2077) Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745262#comment-14745262
 ] 

Guangya Liu commented on MESOS-2077:


[~bmahler] could you please comment on the RR? Thanks!
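
For reference, a rough sketch of what the status update could carry; the 
reason name {{REASON_SLAVE_DRAINED}} is a proposal here, not an existing enum 
value:
{code}
// Sketch only: attach a drain-specific reason to the TASK_LOST update
// so operator-forced drains can be told apart from generic slave
// removals. REASON_SLAVE_DRAINED is hypothetical.
TaskStatus status;
status.mutable_task_id()->CopyFrom(task.task_id());
status.set_state(TASK_LOST);
status.set_source(TaskStatus::SOURCE_MASTER);
status.set_reason(TaskStatus::REASON_SLAVE_REMOVED);  // Today's generic reason.
// Proposed: status.set_reason(TaskStatus::REASON_SLAVE_DRAINED);
status.set_message("Slave was drained by the operator (SIGUSR1)");
{code}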

> Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.
> -
>
> Key: MESOS-2077
> URL: https://issues.apache.org/jira/browse/MESOS-2077
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Benjamin Mahler
>Assignee: Guangya Liu
>  Labels: mesosphere, twitter
>
> For maintenance, sometimes operators will force the drain of a slave (via 
> SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary 
> (e.g. bad hardware).
> To eliminate alerting noise, we'd like to add a 'Reason' that expresses the 
> forced drain of the slave, so that these are not treated as a generic 
> slave-removal TASK_LOST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2224) Add explanatory comments for Allocator interface

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745259#comment-14745259
 ] 

Guangya Liu commented on MESOS-2224:


[~mcypark] [~alex-mesos] could you help review this? It has already been 
through several rounds of review. Thanks!
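
As an illustration of the kind of comment being added (the wording here is 
illustrative, not the text under review):
{code}
class Allocator
{
public:
  // Adds a framework to the allocator. `used` describes the resources
  // the framework is already using (e.g. after a master failover), so
  // the allocator can account for them before making new offers.
  virtual void addFramework(
      const FrameworkID& frameworkId,
      const FrameworkInfo& frameworkInfo,
      const hashmap<SlaveID, Resources>& used) = 0;
};
{code}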

> Add explanatory comments for Allocator interface
> 
>
> Key: MESOS-2224
> URL: https://issues.apache.org/jira/browse/MESOS-2224
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Affects Versions: 0.25.0
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Minor
> Fix For: 0.25.0
>
>
> Allocator is the public API and it would be great to have comments on all 
> calls to be implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3184) Scheduler driver accepts (re-)registration message while re-authentication is in progress.

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727511#comment-14727511
 ] 

Guangya Liu edited comment on MESOS-3184 at 9/15/15 11:06 AM:
--

[~bmahler] could you please provide more detail on this? My understanding is 
that, on an authentication failure, the master sends a FrameworkErrorMessage 
to the framework, and the framework's registration fails.
{code}
if (authorizationError.isSome()) {
  LOG(INFO) << "Refusing subscription of framework"
            << " '" << frameworkInfo.name() << "'"
            << ": " << authorizationError.get().message;

  FrameworkErrorMessage message;
  message.set_message(authorizationError.get().message);
  http.send(message);
  http.close();
  return;
}
{code}


was (Author: gyliu):
[~bmahler] could you please provide more detail on this? My understanding is 
that, on an authentication failure, the master sends a FrameworkErrorMessage 
to the framework, and the framework's registration fails.

if (authorizationError.isSome()) {
  LOG(INFO) << "Refusing subscription of framework"
            << " '" << frameworkInfo.name() << "'"
            << ": " << authorizationError.get().message;

  FrameworkErrorMessage message;
  message.set_message(authorizationError.get().message);
  http.send(message);
  http.close();
  return;
}

> Scheduler driver accepts (re-)registration message while re-authentication is 
> in progress.
> --
>
> Key: MESOS-3184
> URL: https://issues.apache.org/jira/browse/MESOS-3184
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler driver
>Reporter: Benjamin Mahler
>Assignee: Guangya Liu
>
> The scheduler driver currently accepts (re-)registration messages while it 
> is re-authenticating with the master. This can occur due to a race between 
> the authentication timeout and the master sending a (re-)registration 
> message.
> This is fairly innocuous currently, but if the subsequent re-authentication 
> fails, the driver keeps retrying authentication while both the master and 
> driver continue to act as though the scheduler is registered.
> The authentication check in _(re-)registerFramework in the master doesn't 
> provide any benefit here; it is still a race, so it should likely be 
> removed as well.
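
A minimal sketch of a guard against the race described above; the member name 
{{authenticating}} and the exact placement are assumptions about the driver 
code, not the final patch:
{code}
// Sketch only: drop (re-)registration messages that race with an
// in-flight authentication attempt.
void SchedulerProcess::registered(
    const process::UPID& from,
    const FrameworkID& frameworkId,
    const MasterInfo& masterInfo)
{
  if (authenticating.isSome()) {
    VLOG(1) << "Ignoring framework registered message because "
            << "authentication is in progress";
    return;
  }

  // ... existing registration handling ...
}
{code}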



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1935) Replace hard-coded reap interval with a constant

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745263#comment-14745263
 ] 

Guangya Liu commented on MESOS-1935:


[~bmahler] [~alex-mesos] This has been through several rounds of review; 
could you take another look at the latest RR? Thanks!
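
The change itself is mechanical; roughly, and assuming the constant landed as 
{{MAX_REAP_INTERVAL()}} in {{reap.hpp}} per MESOS-1846:
{code}
#include <process/clock.hpp>
#include <process/reap.hpp>

// Before (hard-coded maximal reap interval):
//   Clock::advance(Seconds(1));
// After:
process::Clock::advance(process::MAX_REAP_INTERVAL());
{code}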

> Replace hard-coded reap interval with a constant
> 
>
> Key: MESOS-1935
> URL: https://issues.apache.org/jira/browse/MESOS-1935
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.25.0
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Trivial
>  Labels: newbie
> Fix For: 0.25.0
>
>
> With https://issues.apache.org/jira/browse/MESOS-1846 implemented, replace 
> the hard-coded value for the maximal reap interval (1s) with the constant 
> from {{reap.hpp}}. This will mostly affect tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2647) Slave should validate tasks using oversubscribed resources

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745268#comment-14745268
 ] 

Guangya Liu commented on MESOS-2647:


[~vi...@twitter.com] One question about this task: how can this case happen? 
In my understanding, when a task is launched, the revocable resources should 
be available, with no other task running on them. Revocable resources are 
only released when the QoS controller makes a correction by killing some 
executors/tasks. Comments? Thanks!
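
For concreteness, a rough sketch of the check the slave could perform; the 
names and the new reason are assumptions based on the description below:
{code}
// Sketch only: if the latest oversubscription estimate no longer
// covers the revocable resources a task asks for, reject the launch.
Resources available = info.resources() + oversubscribed;

if (!available.contains(task.resources())) {
  // Send TASK_LOST with a new, more specific reason, e.g. the
  // REASON_RESOURCE_OVERSUBSCRIBED proposed in this ticket.
}
{code}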

> Slave should validate tasks using oversubscribed resources
> --
>
> Key: MESOS-2647
> URL: https://issues.apache.org/jira/browse/MESOS-2647
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Guangya Liu
>  Labels: twitter
>
> The latest oversubscribed resource estimate might render a revocable task 
> launch invalid. The slave should check this and send TASK_LOST with an 
> appropriate REASON.
> We need to add a new REASON for this (REASON_RESOURCE_OVERSUBSCRIBED?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3169) FrameworkInfo should only be updated if the re-registration is valid

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745255#comment-14745255
 ] 

Guangya Liu commented on MESOS-3169:


[~jvanremoortere] can you help review the RR? Thanks.
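
The shape of the refactoring, roughly (the helper names here are assumptions; 
see the RR for the real structure):
{code}
// Sketch only: fully validate the re-registration before mutating any
// master state, so an invalid attempt cannot leave a partially
// updated FrameworkInfo behind.
Option<Error> error = validateFrameworkReregistration(frameworkInfo, from);

if (error.isSome()) {
  FrameworkErrorMessage message;
  message.set_message(error.get().message);
  send(from, message);
  return;
}

// Only now is it safe to update the stored FrameworkInfo.
framework->info.CopyFrom(frameworkInfo);
{code}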

> FrameworkInfo should only be updated if the re-registration is valid
> 
>
> Key: MESOS-3169
> URL: https://issues.apache.org/jira/browse/MESOS-3169
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.25.0
>Reporter: Joris Van Remoortere
>Assignee: Guangya Liu
>  Labels: framework, master, mesosphere, tech-debt
> Fix For: 0.25.0
>
>
> See Ben Mahler's comment in https://reviews.apache.org/r/32961/
> FrameworkInfo should not be updated if the re-registration is invalid. This 
> can happen in a few cases under the branching logic, so this requires some 
> refactoring.
> Notice that a {code}FrameworkErrorMessage{code} can be generated both inside 
> {code}else if (from != framework->pid){code} and inside 
> {code}failoverFramework(framework, from);{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3037) Add a QUIESCE call to the scheduler

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745258#comment-14745258
 ] 

Guangya Liu commented on MESOS-3037:


[~vi...@twitter.com] All of your comments have now been addressed; could you 
take another look? Thanks!
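
For reference, the driver-side translation sketched in the description below 
would look roughly like this (field names per the scheduler Call protobuf of 
this era; {{outstandingOffers}} is assumed):
{code}
// Sketch only: express deactivation as SUPPRESS plus DECLINEs.
Call suppress;
suppress.mutable_framework_id()->CopyFrom(frameworkId);
suppress.set_type(Call::SUPPRESS);
send(master.get(), suppress);

// Decline every outstanding offer so none are held back.
foreach (const OfferID& offerId, outstandingOffers) {
  Call decline;
  decline.mutable_framework_id()->CopyFrom(frameworkId);
  decline.set_type(Call::DECLINE);
  decline.mutable_decline()->add_offer_ids()->CopyFrom(offerId);
  send(master.get(), decline);
}
{code}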

> Add a QUIESCE call to the scheduler
> ---
>
> Key: MESOS-3037
> URL: https://issues.apache.org/jira/browse/MESOS-3037
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.25.0
>Reporter: Vinod Kone
>Assignee: Guangya Liu
>  Labels: September23th
> Fix For: 0.25.0
>
>
> SUPPRESS call is the complement to the current REVIVE call i.e., it will 
> inform Mesos to stop sending offers to the framework. 
> For the scheduler driver to send only Call messages (MESOS-2913), 
> DeactivateFrameworkMessage needs to be converted to Call(s). We can implement 
> this by having the driver send a SUPPRESS call followed by a DECLINE call for 
> outstanding offers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3431) Refactor Protobuf tests

2015-09-15 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3431:
--

 Summary: Refactor Protobuf tests
 Key: MESOS-3431
 URL: https://issues.apache.org/jira/browse/MESOS-3431
 Project: Mesos
  Issue Type: Task
  Components: test
Reporter: Alexander Rukletsov
Priority: Minor


The {{ProtobufTest.JSON}} test does several things simultaneously, including 
message instantiation, conversion, and parsing. We should split it into 
several independent tests that each check just one thing.
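
For example, the split could look like this (test names are illustrative):
{code}
// Illustrative only: one focused test per concern.
TEST(ProtobufTest, SerializeMessageToJSON)
{
  // Instantiate a message and check its JSON serialization.
}

TEST(ProtobufTest, ParseMessageFromJSON)
{
  // Parse a known-good JSON string and check the resulting message.
}
{code}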



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3408) Labels field of FrameworkInfo should be added into v1 mesos.proto

2015-09-15 Thread James DeFelice (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James DeFelice updated MESOS-3408:
--
Labels: mesosphere  (was: )

> Labels field of FrameworkInfo should be added into v1 mesos.proto
> -
>
> Key: MESOS-3408
> URL: https://issues.apache.org/jira/browse/MESOS-3408
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
> Fix For: 0.25.0
>
>
> In [MESOS-2841|https://issues.apache.org/jira/browse/MESOS-2841], a new field 
> "Labels" has been added into FrameworkInfo in mesos.proto, but is missed in 
> v1 mesos.proto.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-15 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745810#comment-14745810
 ] 

Jie Yu commented on MESOS-3430:
---

OK, the problem is that, by default, CentOS 7.1 marks the {{/}} mount as a 
shared mount:
{noformat}
[vagrant@localhost ~]$ cat /proc/self/mountinfo 
17 37 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
18 37 0:16 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs 
rw,seclabel
19 37 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs 
rw,seclabel,size=224872k,nr_inodes=56218,mode=755
20 18 0:15 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - 
securityfs securityfs rw
21 19 0:17 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
22 19 0:11 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts 
rw,seclabel,gid=5,mode=620,ptmxmode=000
23 37 0:18 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
24 18 0:19 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:8 - tmpfs tmpfs 
rw,seclabel,mode=755
25 24 0:20 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - 
cgroup cgroup 
rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
26 18 0:21 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:19 - pstore 
pstore rw
27 24 0:22 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:10 - 
cgroup cgroup rw,cpuset
28 24 0:23 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime 
shared:11 - cgroup cgroup rw,cpuacct,cpu
29 24 0:24 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:12 - 
cgroup cgroup rw,memory
30 24 0:25 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:13 - 
cgroup cgroup rw,devices
31 24 0:26 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 - 
cgroup cgroup rw,freezer
32 24 0:27 / /sys/fs/cgroup/net_cls rw,nosuid,nodev,noexec,relatime shared:15 - 
cgroup cgroup rw,net_cls
33 24 0:28 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:16 - 
cgroup cgroup rw,blkio
34 24 0:29 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime 
shared:17 - cgroup cgroup rw,perf_event
35 24 0:30 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:18 - 
cgroup cgroup rw,hugetlb
36 18 0:31 / /sys/kernel/config rw,relatime shared:20 - configfs configfs rw
37 1 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/centos-root 
rw,seclabel,attr2,inode64,noquota
38 18 0:14 / /sys/fs/selinux rw,relatime shared:21 - selinuxfs selinuxfs rw
39 17 0:32 / /proc/sys/fs/binfmt_misc rw,relatime shared:23 - autofs systemd-1 
rw,fd=33,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
40 19 0:33 / /dev/hugepages rw,relatime shared:24 - hugetlbfs hugetlbfs 
rw,seclabel
41 19 0:13 / /dev/mqueue rw,relatime shared:25 - mqueue mqueue rw,seclabel
42 18 0:7 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
44 37 8:1 / /boot rw,relatime shared:27 - xfs /dev/sda1 
rw,seclabel,attr2,inode64,noquota
45 37 0:35 / /vagrant rw,nodev,relatime shared:28 - vboxsf none rw
{noformat}
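
The usual remedy (an assumption here, not necessarily the committed fix) is 
to mark {{/}} as a recursive slave mount inside the container's new mount 
namespace, so unmounts there no longer interact with the host's shared 
propagation:
{code}
#include <stdio.h>
#include <sys/mount.h>

// Sketch only: after unshare(CLONE_NEWNS), remount "/" recursively as
// a slave mount to cut off the shared propagation from the host.
if (::mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) != 0) {
  perror("mount(MS_SLAVE | MS_REC)");
}
{code}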

> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Michael Park
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2015-09-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745611#comment-14745611
 ] 

Guangya Liu commented on MESOS-3422:


[~vi...@twitter.com] Sorry, I updated the problem description by mistake; 
could you please restore it? Thanks!

> MasterSlaveReconciliationTest.ReconcileLostTask test is flaky
> -
>
> Key: MESOS-3422
> URL: https://issues.apache.org/jira/browse/MESOS-3422
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Affects Versions: 0.25.0
> Environment: CentOS
>Reporter: Vinod Kone
>
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from MasterSlaveReconciliationTest
> [ RUN  ] MasterSlaveReconciliationTest.ReconcileLostTask
> Using temporary directory 
> '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn'
> I0915 22:28:40.800787  3733 leveldb.cpp:176] Opened db in 252.206266ms
> I0915 22:28:40.851069  3733 leveldb.cpp:183] Compacted db in 50.197346ms
> I0915 22:28:40.851210  3733 leveldb.cpp:198] Created db iterator in 63324ns
> I0915 22:28:40.851256  3733 leveldb.cpp:204] Seeked to beginning of db in 
> 4562ns
> I0915 22:28:40.851286  3733 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 322ns
> I0915 22:28:40.871953  3733 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0915 22:28:40.886368  3756 recover.cpp:449] Starting replica recovery
> I0915 22:28:40.90  3756 recover.cpp:475] Replica is in EMPTY status
> I0915 22:28:40.916332  3759 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0915 22:28:40.917351  3756 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0915 22:28:40.918557  3755 recover.cpp:566] Updating replica status to 
> STARTING
> I0915 22:28:40.928189  3759 master.cpp:380] Master 
> 20150915-222840-16842879-54960-3733 (devstack007.cn.ibm.com) started on 
> 127.0.1.1:54960
> I0915 22:28:40.928261  3759 master.cpp:382] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/credentials"
>  --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
> --registry_strict="true" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/master"
>  --zk_session_timeout="10secs"
> I0915 22:28:40.993895  3759 master.cpp:427] Master only allowing 
> authenticated frameworks to register
> I0915 22:28:40.993962  3759 master.cpp:432] Master only allowing 
> authenticated slaves to register
> I0915 22:28:40.994010  3759 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_2tUQZn/credentials'
> I0915 22:28:40.994776  3759 master.cpp:471] Using default 'crammd5' 
> authenticator
> I0915 22:28:40.995053  3759 authenticator.cpp:512] Initializing server SASL
> I0915 22:28:41.009496  3757 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 90.341573ms
> I0915 22:28:41.009570  3757 replica.cpp:323] Persisted replica status to 
> STARTING
> I0915 22:28:41.010040  3756 recover.cpp:475] Replica is in STARTING status
> I0915 22:28:41.011255  3757 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0915 22:28:41.011551  3752 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0915 22:28:41.012073  3756 recover.cpp:566] Updating replica status to VOTING
> I0915 22:28:41.084720  3753 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 72.469

[jira] [Assigned] (MESOS-3280) Master fails to access replicated log after network partition

2015-09-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-3280:
--

Assignee: Neil Conway

> Master fails to access replicated log after network partition
> -
>
> Key: MESOS-3280
> URL: https://issues.apache.org/jira/browse/MESOS-3280
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.23.0
> Environment: Zookeeper version 3.4.5--1
>Reporter: Bernd Mathiske
>Assignee: Neil Conway
>  Labels: mesosphere
>
> In a 5-node cluster with 3 masters and 2 slaves, and ZK on each node, when a 
> network partition is forced, all the masters apparently lose access to their 
> replicated log. The leading master halts, for unknown reasons, presumably 
> related to replicated log access. The others fail to recover from the 
> replicated log, also for unknown reasons. This could have to do with the ZK 
> setup, but it might also be a Mesos bug. 
> This was observed in a Chronos test drive scenario described in detail here:
> https://github.com/mesos/chronos/issues/511
> With setup instructions here:
> https://github.com/mesos/chronos/issues/508



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3433) Unmount work dir and persistent volume mounts of other containers in the new mount namespace.

2015-09-15 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-3433:
--
Assignee: Yan Xu

> Unmount work dir and persistent volume mounts of other containers in the new 
> mount namespace.
> -
>
> Key: MESOS-3433
> URL: https://issues.apache.org/jira/browse/MESOS-3433
> Project: Mesos
>  Issue Type: Task
>Reporter: Yan Xu
>Assignee: Yan Xu
>
> As described in this 
> [TODO|https://github.com/apache/mesos/blob/e601e469c64594dd8339352af405cbf26a574ea8/src/slave/containerizer/isolators/filesystem/linux.cpp#L418]:
> {noformat:title=}
>   // TODO(jieyu): Try to unmount work directory mounts and persistent
>   // volume mounts for other containers to release the extra
>   // references to those mounts.
> {noformat}
> This will be a best-effort attempt to alleviate the race condition between 
> the provisioner's container cleanup and new containers copying the host 
> mount table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3433) Unmount work dir and persistent volume mounts of other containers in the new mount namespace.

2015-09-15 Thread Yan Xu (JIRA)
Yan Xu created MESOS-3433:
-

 Summary: Unmount work dir and persistent volume mounts of other 
containers in the new mount namespace.
 Key: MESOS-3433
 URL: https://issues.apache.org/jira/browse/MESOS-3433
 Project: Mesos
  Issue Type: Task
Reporter: Yan Xu


As described in this 
[TODO|https://github.com/apache/mesos/blob/e601e469c64594dd8339352af405cbf26a574ea8/src/slave/containerizer/isolators/filesystem/linux.cpp#L418]:
{noformat:title=}
  // TODO(jieyu): Try to unmount work directory mounts and persistent
  // volume mounts for other containers to release the extra
  // references to those mounts.
{noformat}

This will be a best-effort attempt to alleviate the race condition between the 
provisioner's container cleanup and new containers copying the host mount 
table.
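
Roughly, the best-effort pass the TODO describes (names follow the Linux 
filesystem isolator; treat this as a sketch under those assumptions):
{code}
// Sketch only: in the container's new mount namespace, lazily unmount
// work-directory and persistent-volume mounts that belong to *other*
// containers, dropping this namespace's extra references to them.
foreach (const fs::MountInfoTable::Entry& entry, table.entries) {
  if (strings::startsWith(entry.target, flags.work_dir) &&
      !strings::startsWith(entry.target, directory)) {
    fs::unmount(entry.target, MNT_DETACH);  // Best effort; ignore errors.
  }
}
{code}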



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3433) Unmount work dir and persistent volume mounts of other containers in the new mount namespace.

2015-09-15 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-3433:
--
Sprint: Twitter Mesos Q3 Sprint 5

> Unmount work dir and persistent volume mounts of other containers in the new 
> mount namespace.
> -
>
> Key: MESOS-3433
> URL: https://issues.apache.org/jira/browse/MESOS-3433
> Project: Mesos
>  Issue Type: Task
>Reporter: Yan Xu
>Assignee: Yan Xu
>  Labels: twitter
>
> As described in this 
> [TODO|https://github.com/apache/mesos/blob/e601e469c64594dd8339352af405cbf26a574ea8/src/slave/containerizer/isolators/filesystem/linux.cpp#L418]:
> {noformat:title=}
>   // TODO(jieyu): Try to unmount work directory mounts and persistent
>   // volume mounts for other containers to release the extra
>   // references to those mounts.
> {noformat}
> This will be a best-effort attempt to alleviate the race condition between 
> the provisioner's container cleanup and new containers copying the host 
> mount table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3433) Unmount work dir and persistent volume mounts of other containers in the new mount namespace.

2015-09-15 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-3433:
--
Labels: twitter  (was: )

> Unmount work dir and persistent volume mounts of other containers in the new 
> mount namespace.
> -
>
> Key: MESOS-3433
> URL: https://issues.apache.org/jira/browse/MESOS-3433
> Project: Mesos
>  Issue Type: Task
>Reporter: Yan Xu
>Assignee: Yan Xu
>  Labels: twitter
>
> As described in this 
> [TODO|https://github.com/apache/mesos/blob/e601e469c64594dd8339352af405cbf26a574ea8/src/slave/containerizer/isolators/filesystem/linux.cpp#L418]:
> {noformat:title=}
>   // TODO(jieyu): Try to unmount work directory mounts and persistent
>   // volume mounts for other containers to release the extra
>   // references to those mounts.
> {noformat}
> This will be a best-effort attempt to alleviate the race condition between 
> the provisioner's container cleanup and new containers copying the host 
> mount table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3431) Refactor Protobuf tests

2015-09-15 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745359#comment-14745359
 ] 

Klaus Ma commented on MESOS-3431:
-

Sure, I'll handle it after MESOS-3405.

> Refactor Protobuf tests
> ---
>
> Key: MESOS-3431
> URL: https://issues.apache.org/jira/browse/MESOS-3431
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Alexander Rukletsov
>Assignee: Klaus Ma
>Priority: Minor
>
> The {{ProtobufTest.JSON}} test does several things simultaneously, including 
> message instantiation, conversion, and parsing. We should split it into 
> several independent tests that each check just one thing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3418) Factor out V1 API test helper functions

2015-09-15 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu reassigned MESOS-3418:
--

Assignee: Guangya Liu

> Factor out V1 API test helper functions
> ---
>
> Key: MESOS-3418
> URL: https://issues.apache.org/jira/browse/MESOS-3418
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joris Van Remoortere
>Assignee: Guangya Liu
>  Labels: beginner, mesosphere, newbie, v1_api
>
> We currently have some helper functionality for V1 API tests that is 
> duplicated across a few test files.
> Factor it out into a common place once the API has stabilized.
> {code}
> // Helper class for using EXPECT_CALL since the Mesos scheduler API
>   // is callback based.
>   class Callbacks
>   {
>   public:
> MOCK_METHOD0(connected, void(void));
> MOCK_METHOD0(disconnected, void(void));
> MOCK_METHOD1(received, void(const std::queue&));
>   };
> {code}
> {code}
> // Enqueues all received events into a libprocess queue.
> // TODO(jmlvanre): Factor this common code out of tests into V1
> // helper.
> ACTION_P(Enqueue, queue)
> {
>   std::queue events = arg0;
>   while (!events.empty()) {
> // Note that we currently drop HEARTBEATs because most of these tests
> // are not designed to deal with heartbeats.
> // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
> if (events.front().type() == Event::HEARTBEAT) {
>   VLOG(1) << "Ignoring HEARTBEAT event";
> } else {
>   queue->put(events.front());
> }
> events.pop();
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3431) Refactor Protobuf tests

2015-09-15 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-3431:
---

Assignee: Klaus Ma

> Refactor Protobuf tests
> ---
>
> Key: MESOS-3431
> URL: https://issues.apache.org/jira/browse/MESOS-3431
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Alexander Rukletsov
>Assignee: Klaus Ma
>Priority: Minor
>
> The {{ProtobufTest.JSON}} test does several things simultaneously, including 
> message instantiation, conversion, and parsing. We should split it into 
> several independent tests that each check just one thing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)