[jira] [Commented] (MESOS-4389) Master "roles" endpoint only shows active role

2016-01-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101410#comment-15101410
 ] 

Guangya Liu commented on MESOS-4389:


If you are using implicit roles, then this is the designed behavior; please refer to 
https://issues.apache.org/jira/browse/MESOS-4000

If you are not using implicit roles and set {{roles}} when starting up the master, 
then all roles should be listed by the endpoint.
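For example (a sketch, assuming the master's {{--roles}} flag and the default port 5050; host names and other flags are placeholders), starting the master with an explicit role whitelist makes both agent roles appear in the endpoint:
{code}
# Start the master with an explicit role whitelist (other flags omitted).
mesos-master --roles=busybox,ubuntu

# Both roles are then listed, whether or not a framework is registered in them.
curl http://<master-ip>:5050/roles
{code}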

> Master "roles" endpoint only shows active role
> --
>
> Key: MESOS-4389
> URL: https://issues.apache.org/jira/browse/MESOS-4389
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API, master
>Reporter: Fan Du
>
> Register two slaves to the master with roles "busybox" and "ubuntu" respectively, 
> then run Marathon with role "busybox". After this, checking the master's "roles" 
> endpoint only returns the default and active roles. Could this be improved to 
> show all available roles for easier checking?
> {code}
> {
> "roles": [
> {
> "frameworks": [],
> "name": "*",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> },
> {
> "frameworks": [
> "2caebb14-161f-4941-b8ab-8990cef01ac0-"
> ],
> "name": "busybox",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> }
> ]
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3838) Put authorize logic for teardown into a common function

2016-01-15 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-3838:
---
Target Version/s: 0.28.0
   Fix Version/s: 0.28.0

> Put authorize logic for teardown into a common function
> ---
>
> Key: MESOS-3838
> URL: https://issues.apache.org/jira/browse/MESOS-3838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 0.28.0
>
>
> Mesos now has {{authorizeTask}} and {{authorizeFramework}}, and may have 
> {{authorizeReserveResource}} and {{authorizeUnReserveResource}} later. 
> However, {{Master::Http::teardown()}} currently embeds its authorization logic 
> inline; it would be better to move the teardown authorization logic into a 
> common function {{authorizeTeardown()}}.
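A minimal sketch of what such a helper might look like, assuming the {{ACL::ShutdownFramework}} message and the {{Authorizer::authorize()}} overload already used by the inline handler; the name, placement and details are illustrative, not the final implementation:
{code}
// Hypothetical helper on Master, mirroring authorizeFramework()/authorizeTask().
// Assumes the usual mesos/master headers and the master's Option<Authorizer*>.
process::Future<bool> Master::authorizeTeardown(
    const Option<std::string>& principal,
    const FrameworkInfo& frameworkInfo)
{
  if (authorizer.isNone()) {
    return true; // Authorization is disabled; permit all teardowns.
  }

  mesos::ACL::ShutdownFramework request;

  if (principal.isSome()) {
    request.mutable_principals()->add_values(principal.get());
  } else {
    request.mutable_principals()->set_type(mesos::ACL::Entity::ANY);
  }

  if (frameworkInfo.has_principal()) {
    request.mutable_framework_principals()->add_values(frameworkInfo.principal());
  } else {
    request.mutable_framework_principals()->set_type(mesos::ACL::Entity::ANY);
  }

  return authorizer.get()->authorize(request);
}
{code}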



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4384) Documentation cannot link to external URLs that end in .md

2016-01-15 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101496#comment-15101496
 ] 

Joerg Schad commented on MESOS-4384:


FYI, links of the form 
[RecordIO](scheduler-http-api.md#recordio-response-format) in 
executor-http-api.md are also broken; they get rendered to 
/documentation/latest/executor-http-api/scheduler-http-api.md#recordio-response-format.

> Documentation cannot link to external URLs that end in .md
> --
>
> Key: MESOS-4384
> URL: https://issues.apache.org/jira/browse/MESOS-4384
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: documentation, mesosphere
>
> Per [~joerg84]: "In fact it seems that all links ending with .md are 
> interpreted as relative links on the webpage, i.e. 
> [label](https://test.com/foo.md) is rendered into a link to 
> https://test.com/foo/ with the text "label"."
> Currently the rakefile rewrites all such links with this overly general regex:
> {code}
> f.read.gsub(/\((.*)(\.md)\)/, '(/documentation/latest/\1/)')
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4389) Master "roles" endpoint only shows active role

2016-01-15 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101514#comment-15101514
 ] 

Fan Du commented on MESOS-4389:
---

Thanks for the pointer to implicit roles, I will give it a try.
The two slaves are configured with default roles (busybox and ubuntu) respectively, 
and the master has not set any {{roles}} on the command line. I realize that when 
{{roles}} is set, those roles become part of the whitelist,
which means they will show up when querying the roles endpoint.

> Master "roles" endpoint only shows active role
> --
>
> Key: MESOS-4389
> URL: https://issues.apache.org/jira/browse/MESOS-4389
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API, master
>Reporter: Fan Du
>
> Register two slaves to the master with roles "busybox" and "ubuntu" respectively, 
> then run Marathon with role "busybox". After this, checking the master's "roles" 
> endpoint only returns the default and active roles. Could this be improved to 
> show all available roles for easier checking?
> {code}
> {
> "roles": [
> {
> "frameworks": [],
> "name": "*",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> },
> {
> "frameworks": [
> "2caebb14-161f-4941-b8ab-8990cef01ac0-"
> ],
> "name": "busybox",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> }
> ]
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4389) Master "roles" endpoint only shows active role

2016-01-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101529#comment-15101529
 ] 

Guangya Liu commented on MESOS-4389:


Since the {{master has not set any roles in command line}}, you are using 
{{implicit roles}}. What you are describing here is actually the designed behaviour.

> Master "roles" endpoint only shows active role
> --
>
> Key: MESOS-4389
> URL: https://issues.apache.org/jira/browse/MESOS-4389
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API, master
>Reporter: Fan Du
>
> Register two slaves to the master with roles "busybox" and "ubuntu" respectively, 
> then run Marathon with role "busybox". After this, checking the master's "roles" 
> endpoint only returns the default and active roles. Could this be improved to 
> show all available roles for easier checking?
> {code}
> {
> "roles": [
> {
> "frameworks": [],
> "name": "*",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> },
> {
> "frameworks": [
> "2caebb14-161f-4941-b8ab-8990cef01ac0-"
> ],
> "name": "busybox",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> }
> ]
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4154) Rename shutdown_frameworks to teardown_framework

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4154:
--
Sprint: Mesosphere Sprint 27

> Rename shutdown_frameworks to teardown_framework
> 
>
> Key: MESOS-4154
> URL: https://issues.apache.org/jira/browse/MESOS-4154
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: acl, security
>
> Mesos now uses teardown framework to shut down a framework, but the 
> ACLs still use shutdown_framework; it is better to rename 
> shutdown_framework to teardown_framework in the ACLs to keep them consistent.
> This is a post-review request for https://reviews.apache.org/r/40829/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2379) Disabled master authentication causes authentication retries in the scheduler.

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2379:
--
Labels: authentication tech-debt  (was: tech-debt)

> Disabled master authentication causes authentication retries in the 
> scheduler. 
> ---
>
> Key: MESOS-2379
> URL: https://issues.apache.org/jira/browse/MESOS-2379
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Till Toenshoff
>  Labels: authentication, tech-debt
>
> The CRAM-MD5 authenticator relies upon shared credentials. Not supplying such 
> credentials while starting up a master effectively disables any 
> authentication.
> A framework (or slave) may still attempt to authenticate, which the master 
> answers with an {{AuthenticationErrorMessage}}. That in turn causes the 
> authenticatee to fail its {{authenticate}} promise, which in turn causes 
> the current framework driver implementation to retry authentication 
> indefinitely (and unthrottled).
> See: https://github.com/apache/mesos/blob/master/src/sched/sched.cpp#L372
> {noformat}
> if (reauthenticate || !future.isReady()) {
>   LOG(INFO)
> << "Failed to authenticate with master " << master.get() << ": "
> << (reauthenticate ? "master changed" :
>(future.isFailed() ? future.failure() : "future discarded"));
>   authenticating = None();
>   reauthenticate = false;
>   // TODO(vinod): Add a limit on number of retries.
>   dispatch(self(), &Self::authenticate); // Retry.
>   return;
> }
> {noformat}
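A minimal sketch of the kind of bounded, backed-off retry the TODO suggests; the counter {{authenticationRetries}}, the cap {{MAX_AUTHENTICATION_RETRIES}}, and the backoff policy below are illustrative and not part of the actual driver:
{code}
if (reauthenticate || !future.isReady()) {
  LOG(INFO)
    << "Failed to authenticate with master " << master.get() << ": "
    << (reauthenticate ? "master changed" :
       (future.isFailed() ? future.failure() : "future discarded"));

  authenticating = None();
  reauthenticate = false;

  // Hypothetical retry cap and exponential backoff instead of an
  // immediate, unbounded retry.
  if (++authenticationRetries > MAX_AUTHENTICATION_RETRIES) {
    LOG(ERROR) << "Giving up on authentication after "
               << MAX_AUTHENTICATION_RETRIES << " attempts";
    return;
  }

  Duration backoff = Seconds(1 << (authenticationRetries - 1));

  delay(backoff, self(), &Self::authenticate); // Retry later.
  return;
}
{code}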



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2379) Disabled master authentication causes authentication retries in the scheduler.

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2379:
--
Component/s: security

> Disabled master authentication causes authentication retries in the 
> scheduler. 
> ---
>
> Key: MESOS-2379
> URL: https://issues.apache.org/jira/browse/MESOS-2379
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Till Toenshoff
>  Labels: authentication, tech-debt
>
> The CRAM-MD5 authenticator relies upon shared credentials. Not supplying such 
> credentials while starting up a master effectively disables any 
> authentication.
> A framework (or slave) may still attempt to authenticate, which the master 
> answers with an {{AuthenticationErrorMessage}}. That in turn causes the 
> authenticatee to fail its {{authenticate}} promise, which in turn causes 
> the current framework driver implementation to retry authentication 
> indefinitely (and unthrottled).
> See: https://github.com/apache/mesos/blob/master/src/sched/sched.cpp#L372
> {noformat}
> if (reauthenticate || !future.isReady()) {
>   LOG(INFO)
> << "Failed to authenticate with master " << master.get() << ": "
> << (reauthenticate ? "master changed" :
>(future.isFailed() ? future.failure() : "future discarded"));
>   authenticating = None();
>   reauthenticate = false;
>   // TODO(vinod): Add a limit on number of retries.
>   dispatch(self(), &Self::authenticate); // Retry.
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2198) Scheduler#statusUpdate should not be called multiple times for the same status update

2016-01-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101548#comment-15101548
 ] 

Adam B commented on MESOS-2198:
---

Renaming the ticket to reflect our desire for improved documentation.

> Scheduler#statusUpdate should not be called multiple times for the same 
> status update
> -
>
> Key: MESOS-2198
> URL: https://issues.apache.org/jira/browse/MESOS-2198
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Robert Lacroix
>
> Currently Scheduler#statusUpdate can be called multiple times for the same 
> status update, for example when the slave retransmits a status update because 
> it's not acknowledged in time. Especially for terminal status updates this 
> can lead to unexpected scheduler behavior when task id's are being reused.
> Consider this scenario:
> * Scheduler schedules task
> * Task fails, slave sends TASK_FAILED
> * Scheduler is busy and libmesos doesn't acknowledge update in time
> * Slave retransmits TASK_FAILED
> * Scheduler eventually receives first TASK_FAILED and reschedules task
> * Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
> determine if the TASK_FAILED belongs to the first or second run of the task.
> It would be a lot better if libmesos would dedupe status updates and only 
> call Scheduler#statusUpdate once per status update it received. Retries with 
> the same UUID shouldn't cause Scheduler#statusUpdate to be executed again.
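A minimal, self-contained sketch of the kind of dedup guard this asks for; the class and names are illustrative, and a real driver would also need to keep re-sending acknowledgements for retransmitted updates:
{code}
#include <string>
#include <unordered_set>

// Remember the UUIDs of status updates already delivered to
// Scheduler::statusUpdate() so that agent retransmissions of the same
// update are not surfaced to the scheduler a second time.
class StatusUpdateDeduper
{
public:
  // Returns true if this UUID has not been seen before (deliver the update
  // to the scheduler); false if it is a retransmission (only re-acknowledge).
  bool shouldDeliver(const std::string& uuid)
  {
    return delivered.insert(uuid).second;
  }

private:
  std::unordered_set<std::string> delivered;
};
{code}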



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2198) Scheduler#statusUpdate should not be called multiple times for the same status update

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2198:
--
Description: 
Let's update the documentation for TaskID to indicate that reuse is not 
recommended.

-
Old Summary: Scheduler#statusUpdate should not be called multiple times for the 
same status update
Currently Scheduler#statusUpdate can be called multiple times for the same 
status update, for example when the slave retransmits a status update because 
it's not acknowledged in time. Especially for terminal status updates this can 
lead to unexpected scheduler behavior when task id's are being reused.

Consider this scenario:

* Scheduler schedules task
* Task fails, slave sends TASK_FAILED
* Scheduler is busy and libmesos doesn't acknowledge update in time
* Slave retransmits TASK_FAILED
* Scheduler eventually receives first TASK_FAILED and reschedules task
* Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
determine if the TASK_FAILED belongs to the first or second run of the task.

It would be a lot better if libmesos would dedupe status updates and only call 
Scheduler#statusUpdate once per status update it received. Retries with the 
same UUID shouldn't cause Scheduler#statusUpdate to be executed again.

  was:
Currently Scheduler#statusUpdate can be called multiple times for the same 
status update, for example when the slave retransmits a status update because 
it's not acknowledged in time. Especially for terminal status updates this can 
lead to unexpected scheduler behavior when task id's are being reused.

Consider this scenario:

* Scheduler schedules task
* Task fails, slave sends TASK_FAILED
* Scheduler is busy and libmesos doesn't acknowledge update in time
* Slave retransmits TASK_FAILED
* Scheduler eventually receives first TASK_FAILED and reschedules task
* Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
determine if the TASK_FAILED belongs to the first or second run of the task.

It would be a lot better if libmesos would dedupe status updates and only call 
Scheduler#statusUpdate once per status update it received. Retries with the 
same UUID shouldn't cause Scheduler#statusUpdate to be executed again.


> Scheduler#statusUpdate should not be called multiple times for the same 
> status update
> -
>
> Key: MESOS-2198
> URL: https://issues.apache.org/jira/browse/MESOS-2198
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Robert Lacroix
>
> Let's update the documentation for TaskID to indicate that reuse is not 
> recommended.
> -
> Old Summary: Scheduler#statusUpdate should not be called multiple times for 
> the same status update
> Currently Scheduler#statusUpdate can be called multiple times for the same 
> status update, for example when the slave retransmits a status update because 
> it's not acknowledged in time. Especially for terminal status updates this 
> can lead to unexpected scheduler behavior when task id's are being reused.
> Consider this scenario:
> * Scheduler schedules task
> * Task fails, slave sends TASK_FAILED
> * Scheduler is busy and libmesos doesn't acknowledge update in time
> * Slave retransmits TASK_FAILED
> * Scheduler eventually receives first TASK_FAILED and reschedules task
> * Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
> determine if the TASK_FAILED belongs to the first or second run of the task.
> It would be a lot better if libmesos would dedupe status updates and only 
> call Scheduler#statusUpdate once per status update it received. Retries with 
> the same UUID shouldn't cause Scheduler#statusUpdate to be executed again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2198) Document that TaskIDs should not be reused

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2198:
--
Summary: Document that TaskIDs should not be reused  (was: 
Scheduler#statusUpdate should not be called multiple times for the same status 
update)

> Document that TaskIDs should not be reused
> --
>
> Key: MESOS-2198
> URL: https://issues.apache.org/jira/browse/MESOS-2198
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Robert Lacroix
>
> Let's update the documentation for TaskID to indicate that reuse is not 
> recommended.
> -
> Old Summary: Scheduler#statusUpdate should not be called multiple times for 
> the same status update
> Currently Scheduler#statusUpdate can be called multiple times for the same 
> status update, for example when the slave retransmits a status update because 
> it's not acknowledged in time. Especially for terminal status updates this 
> can lead to unexpected scheduler behavior when task id's are being reused.
> Consider this scenario:
> * Scheduler schedules task
> * Task fails, slave sends TASK_FAILED
> * Scheduler is busy and libmesos doesn't acknowledge update in time
> * Slave retransmits TASK_FAILED
> * Scheduler eventually receives first TASK_FAILED and reschedules task
> * Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
> determine if the TASK_FAILED belongs to the first or second run of the task.
> It would be a lot better if libmesos would dedupe status updates and only 
> call Scheduler#statusUpdate once per status update it received. Retries with 
> the same UUID shouldn't cause Scheduler#statusUpdate to be executed again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2198) Document that TaskIDs should not be reused

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2198:
--
Component/s: documentation

> Document that TaskIDs should not be reused
> --
>
> Key: MESOS-2198
> URL: https://issues.apache.org/jira/browse/MESOS-2198
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, framework
>Reporter: Robert Lacroix
>
> Let's update the documentation for TaskID to indicate that reuse is not 
> recommended.
> -
> Old Summary: Scheduler#statusUpdate should not be called multiple times for 
> the same status update
> Currently Scheduler#statusUpdate can be called multiple times for the same 
> status update, for example when the slave retransmits a status update because 
> it's not acknowledged in time. Especially for terminal status updates this 
> can lead to unexpected scheduler behavior when task id's are being reused.
> Consider this scenario:
> * Scheduler schedules task
> * Task fails, slave sends TASK_FAILED
> * Scheduler is busy and libmesos doesn't acknowledge update in time
> * Slave retransmits TASK_FAILED
> * Scheduler eventually receives first TASK_FAILED and reschedules task
> * Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
> determine if the TASK_FAILED belongs to the first or second run of the task.
> It would be a lot better if libmesos would dedupe status updates and only 
> call Scheduler#statusUpdate once per status update it received. Retries with 
> the same UUID shouldn't cause Scheduler#statusUpdate to be executed again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2198) Document that TaskIDs should not be reused

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2198:
--
Description: 
Let's update the documentation for TaskID to indicate that reuse is not 
recommended, as per the discussion below.

-
Old Summary: Scheduler#statusUpdate should not be called multiple times for the 
same status update
Currently Scheduler#statusUpdate can be called multiple times for the same 
status update, for example when the slave retransmits a status update because 
it's not acknowledged in time. Especially for terminal status updates this can 
lead to unexpected scheduler behavior when task id's are being reused.

Consider this scenario:

* Scheduler schedules task
* Task fails, slave sends TASK_FAILED
* Scheduler is busy and libmesos doesn't acknowledge update in time
* Slave retransmits TASK_FAILED
* Scheduler eventually receives first TASK_FAILED and reschedules task
* Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
determine if the TASK_FAILED belongs to the first or second run of the task.

It would be a lot better if libmesos would dedupe status updates and only call 
Scheduler#statusUpdate once per status update it received. Retries with the 
same UUID shouldn't cause Scheduler#statusUpdate to be executed again.

  was:
Let's update the documentation for TaskID to indicate that reuse is not 
recommended.

-
Old Summary: Scheduler#statusUpdate should not be called multiple times for the 
same status update
Currently Scheduler#statusUpdate can be called multiple times for the same 
status update, for example when the slave retransmits a status update because 
it's not acknowledged in time. Especially for terminal status updates this can 
lead to unexpected scheduler behavior when task id's are being reused.

Consider this scenario:

* Scheduler schedules task
* Task fails, slave sends TASK_FAILED
* Scheduler is busy and libmesos doesn't acknowledge update in time
* Slave retransmits TASK_FAILED
* Scheduler eventually receives first TASK_FAILED and reschedules task
* Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
determine if the TASK_FAILED belongs to the first or second run of the task.

It would be a lot better if libmesos would dedupe status updates and only call 
Scheduler#statusUpdate once per status update it received. Retries with the 
same UUID shouldn't cause Scheduler#statusUpdate to be executed again.


> Document that TaskIDs should not be reused
> --
>
> Key: MESOS-2198
> URL: https://issues.apache.org/jira/browse/MESOS-2198
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, framework
>Reporter: Robert Lacroix
>
> Let's update the documentation for TaskID to indicate that reuse is not 
> recommended, as per the discussion below.
> -
> Old Summary: Scheduler#statusUpdate should not be called multiple times for 
> the same status update
> Currently Scheduler#statusUpdate can be called multiple times for the same 
> status update, for example when the slave retransmits a status update because 
> it's not acknowledged in time. Especially for terminal status updates this 
> can lead to unexpected scheduler behavior when task id's are being reused.
> Consider this scenario:
> * Scheduler schedules task
> * Task fails, slave sends TASK_FAILED
> * Scheduler is busy and libmesos doesn't acknowledge update in time
> * Slave retransmits TASK_FAILED
> * Scheduler eventually receives first TASK_FAILED and reschedules task
> * Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
> determine if the TASK_FAILED belongs to the first or second run of the task.
> It would be a lot better if libmesos would dedupe status updates and only 
> call Scheduler#statusUpdate once per status update it received. Retries with 
> the same UUID shouldn't cause Scheduler#statusUpdate to be executed again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1995) Provide a way for frameworks to clear their resource filters

2016-01-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101555#comment-15101555
 ] 

Adam B commented on MESOS-1995:
---

How does `reviveOffers()` not solve this? I thought that had been around for a 
long time.

> Provide a way for frameworks to clear their resource filters
> 
>
> Key: MESOS-1995
> URL: https://issues.apache.org/jira/browse/MESOS-1995
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>
> Frameworks can provide filters on resources via launchTasks() or 
> declineOffers(), but there is no way for them to clear previously set filters. 
> This makes it hard for frameworks to play nicely in a multi-framework 
> cluster. While frameworks can keep declining offers, it makes more sense to 
> give frameworks more control.
> Concrete example: The Jenkins framework currently disconnects itself when its 
> build queue is empty so that it doesn't keep receiving offers. This allows 
> other frameworks to get resources more quickly. This is a bit of a hack. It 
> would be great if Jenkins can do this more explicitly via an API call.
> We already have a no-op resourcesRequested() API call. I propose we just 
> clear the filters when we receive this on the allocator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1934) Tasks created with mesos-execute disappear from task list at termination

2016-01-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101556#comment-15101556
 ] 

Adam B commented on MESOS-1934:
---

The front page of the Mesos UI only shows the tasks from currently registered 
frameworks. Since the `mesos-execute` scheduler exits after its task completes, 
it goes into the completed frameworks list. Go to the "Frameworks" tab and you 
should see it there, and you can click through to get to its completed tasks.
If this is satisfactory, I'd like to close this issue as "Not a problem".

> Tasks created with mesos-execute disappear from task list at termination
> 
>
> Key: MESOS-1934
> URL: https://issues.apache.org/jira/browse/MESOS-1934
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.20.1
> Environment: Linux 3.13.0-34-generic kernel, Ubuntu 14.04
>Reporter: Ben Hardy
>Priority: Minor
>
> We are noticing that when tasks created with mesos-execute terminate, either 
> normally or abnormally, they disappear from the task list. They do not 
> appear in the "Completed" section as you would expect, or anywhere else on 
> the page.
> This makes problem diagnosis a bit inconvenient since one has to go digging 
> around in the logs on slave nodes to find out what went wrong, rather than 
> just being able to look at logs in the UI as you can with tasks submitted by 
> Singularity or Marathon.
> Not a big deal but would be a nice time saver, and make things more 
> consistent.
> Thanks,
> B



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1887) Allow to specify programs to use in the fetcher

2016-01-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101557#comment-15101557
 ] 

Adam B commented on MESOS-1887:
---

cc: [~jieyu] who is working on a related update to the fetcher

> Allow to specify programs to use in the fetcher
> ---
>
> Key: MESOS-1887
> URL: https://issues.apache.org/jira/browse/MESOS-1887
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Tobias Weingartner
>
> In environments with multiple different hadoop versions, the hadoop version 
> on the slave's {{$PATH}}, or found via {{$HADOOP_HOME}}, may not always be the 
> globally correct version of hadoop, i.e.:
>   * The slave may be fetching executors from hadoop1
>   * The execution may be against/for a hadoop2
> Unfortunately, {{hdfs://}} urls are not specific enough to be able to 
> disambiguate the particular version of hadoop you should use to access the 
> hadoop/hdfs cluster given in the url.
> Off hand, I can see two possible solutions:
>   * extra information from the framework in the task struct to help identify 
> the fetchers environment for fetching this particular executor (different 
> from the slave's environment, different from a task's environment).
>   * fetchers global environment (different from slave and task's 
> environment), allowing a different hadoop/hdfs cluster to be used to fetch 
> _all_ executors from within hdfs
>   * a method to replace the fetcher with site specific scripts/programs
> Other?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1280) Add replace task primitive

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-1280:
--
Labels: mesosphere  (was: )

> Add replace task primitive
> --
>
> Key: MESOS-1280
> URL: https://issues.apache.org/jira/browse/MESOS-1280
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api, master, slave
>Reporter: Niklas Quarfot Nielsen
>  Labels: mesosphere
>
> Also along the lines of MESOS-938, replaceTask would be one of a couple of 
> primitives needed to support various task replacement and scaling scenarios. 
> This replaceTask() version is significantly simpler than the first proposed 
> one; its only responsibility is to run a new task info on a running task's 
> resources.
> The running task will be killed as usual, but the newly freed resources will 
> never be announced and the new task will run on them instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1184) Support running nested slaves.

2016-01-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101565#comment-15101565
 ] 

Adam B commented on MESOS-1184:
---

Works for us.
https://github.com/mesosphere/mom

> Support running nested slaves.
> --
>
> Key: MESOS-1184
> URL: https://issues.apache.org/jira/browse/MESOS-1184
> Project: Mesos
>  Issue Type: Task
>Affects Versions: 0.18.0
>Reporter: Ian Downes
>
> We should be able to support (perhaps assuming root privileges) a slave 
> running within a container, i.e., nesting slaves.
> This could be very useful for testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4200) Test case(s) for weights + allocation behavior

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4200:
--
Sprint: Mesosphere Sprint 27

> Test case(s) for weights + allocation behavior
> --
>
> Key: MESOS-4200
> URL: https://issues.apache.org/jira/browse/MESOS-4200
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, test
>Reporter: Neil Conway
>Assignee: Yongqiao Wang
>  Labels: mesosphere, test, weight
>
> As far as I can see, we currently have NO test cases for behavior when 
> weights are defined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3421) Support sharing of resources across task instances

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3421:
--
Issue Type: Epic  (was: Improvement)

> Support sharing of resources across task instances
> --
>
> Key: MESOS-3421
> URL: https://issues.apache.org/jira/browse/MESOS-3421
> Project: Mesos
>  Issue Type: Epic
>  Components: general
>Affects Versions: 0.23.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: external-volumes, persistent-volumes
>
> A service that needs a persistent volume needs to have access to the same 
> persistent volume (RW) from multiple task instances on the same agent 
> node. Currently, a persistent volume, once offered to the framework(s), can be 
> scheduled to a task, and until that task terminates the persistent volume 
> cannot be used by another task.
> Explore providing the capability of sharing persistent volumes across task 
> instances scheduled on a single agent node.
> Based on discussion within the community, we would allow sharing of resources 
> in general, and add support to enable shareability for persistent volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3421) Support sharing of resources across task instances

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3421:
--
  Epic Name: Shared Resources
Component/s: (was: general)
 volumes
 isolation

> Support sharing of resources across task instances
> --
>
> Key: MESOS-3421
> URL: https://issues.apache.org/jira/browse/MESOS-3421
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation, volumes
>Affects Versions: 0.23.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: external-volumes, persistent-volumes
>
> A service that needs a persistent volume needs to have access to the same 
> persistent volume (RW) from multiple task instances on the same agent 
> node. Currently, a persistent volume, once offered to the framework(s), can be 
> scheduled to a task, and until that task terminates the persistent volume 
> cannot be used by another task.
> Explore providing the capability of sharing persistent volumes across task 
> instances scheduled on a single agent node.
> Based on discussion within the community, we would allow sharing of resources 
> in general, and add support to enable shareability for persistent volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4390) Shared Volumes Design Doc

2016-01-15 Thread Adam B (JIRA)
Adam B created MESOS-4390:
-

 Summary: Shared Volumes Design Doc
 Key: MESOS-4390
 URL: https://issues.apache.org/jira/browse/MESOS-4390
 Project: Mesos
  Issue Type: Task
Reporter: Adam B


Review & Approve



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4390) Shared Volumes Design Doc

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4390:
--
Shepherd: Adam B
Assignee: Anindya Sinha
  Sprint: Mesosphere Sprint 27
Story Points: 3
  Labels: mesosphere  (was: )
 Description: Review & Approve design doc  (was: Review & Approve)

> Shared Volumes Design Doc
> -
>
> Key: MESOS-4390
> URL: https://issues.apache.org/jira/browse/MESOS-4390
> Project: Mesos
>  Issue Type: Task
>Reporter: Adam B
>Assignee: Anindya Sinha
>  Labels: mesosphere
>
> Review & Approve design doc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1887) Allow to specify programs to use in the fetcher

2016-01-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101576#comment-15101576
 ] 

haosdent commented on MESOS-1887:
-

I saw that the new fetcher module adds --hadoop_client to specify the hadoop 
client path. I am not sure whether this flag will be passed at runtime or when 
starting the agent. If it is passed at runtime, I think this issue has already 
been resolved.

> Allow to specify programs to use in the fetcher
> ---
>
> Key: MESOS-1887
> URL: https://issues.apache.org/jira/browse/MESOS-1887
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, slave
>Reporter: Tobias Weingartner
>
> In environments with multiple different hadoop versions, the hadoop version 
> on the slave's {{$PATH}}, or found via {{$HADOOP_HOME}}, may not always be the 
> globally correct version of hadoop, i.e.:
>   * The slave may be fetching executors from hadoop1
>   * The execution may be against/for a hadoop2
> Unfortunately, {{hdfs://}} urls are not specific enough to be able to 
> disambiguate the particular version of hadoop you should use to access the 
> hadoop/hdfs cluster given in the url.
> Off hand, I can see two possible solutions:
>   * extra information from the framework in the task struct to help identify 
> the fetchers environment for fetching this particular executor (different 
> from the slave's environment, different from a task's environment).
>   * fetchers global environment (different from slave and task's 
> environment), allowing a different hadoop/hdfs cluster to be used to fetch 
> _all_ executors from within hdfs
>   * a method to replace the fetcher with site specific scripts/programs
> Other?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3421) Support sharing of resources across task instances

2016-01-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101579#comment-15101579
 ] 

Adam B commented on MESOS-3421:
---

Upgraded this to an Epic and broke the design doc out into its own task.

> Support sharing of resources across task instances
> --
>
> Key: MESOS-3421
> URL: https://issues.apache.org/jira/browse/MESOS-3421
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation, volumes
>Affects Versions: 0.23.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: external-volumes, persistent-volumes
>
> A service that needs a persistent volume needs to have access to the same 
> persistent volume (RW) from multiple task instances on the same agent 
> node. Currently, a persistent volume, once offered to the framework(s), can be 
> scheduled to a task, and until that task terminates the persistent volume 
> cannot be used by another task.
> Explore providing the capability of sharing persistent volumes across task 
> instances scheduled on a single agent node.
> Based on discussion within the community, we would allow sharing of resources 
> in general, and add support to enable shareability for persistent volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4330) MasterQuotaTest.NoAuthenticationNoAuthorization cannot be execute in isolation

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-4330:
---

Assignee: Benjamin Bannier

> MasterQuotaTest.NoAuthenticationNoAuthorization cannot be execute in isolation
> --
>
> Key: MESOS-4330
> URL: https://issues.apache.org/jira/browse/MESOS-4330
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere, tech-debt
>
> Executing {{MasterQuotaTest.NoAuthenticationNoAuthorization}} from 
> {{468b8ec}} under OS X 10.10.5 in isolation fails due to missing cleanup,
> {code}
> % ./bin/mesos-tests.sh 
> --gtest_filter=MasterQuotaTest.NoAuthenticationNoAuthorization
> Source directory: /ABC/DEF/src/mesos
> Build directory: /ABC/DEF/src/mesos/build
> -
> We cannot run any Docker tests because:
> Docker tests not supported on non-Linux systems
> -
> /usr/bin/nc
> /usr/bin/curl
> Note: Google Test filter = 
> MasterQuotaTest.NoAuthenticationNoAuthorization-HealthCheckTest.ROOT_DOCKER_DockerHealthyTask:HealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange:HierarchicalAllocator_BENCHMARK_Test.DeclineOffers:HookTest.ROOT_DOCKER_VerifySlavePreLaunchDockerHook:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerContainerizerTest.ROOT_DOCKER_ExecutorCleanupWhenLaunchFailed:DockerContainerizerTest.ROOT_DOCKER_FetchFailure:DockerContainerizerTest.ROOT_DOCKER_DockerPullFailure:DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_parsing_version:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:DockerTest.ROOT_DOCKER_MountRelative:DockerTest.ROOT_DOCKER_MountAbsolute:CopyBackendTest.ROOT_CopyBackend:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/0:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/1:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/2:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/3:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/4:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/5:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/6:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/7:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/8:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/9:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/10:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/11:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/12:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/13:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/14:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/15:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/16:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/17:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/18:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/19:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/20:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/21:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/22:SlaveAndFrameworkCount/Hierarch
icalAllocator_BENCHMARK_Test.AddAndUpdateSlave/23:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.Add

[jira] [Updated] (MESOS-4388) Need validate role name

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4388:
--
Labels: mesosphere persistent-volumes  (was: )

> Need validate role name
> ---
>
> Key: MESOS-4388
> URL: https://issues.apache.org/jira/browse/MESOS-4388
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: mesosphere, persistent-volumes
>
> As [~qianzhang] and [~adam-mesos] mentioned in 
> [MESOS-2210|https://issues.apache.org/jira/browse/MESOS-2210]
> - We should validate FrameworkInfo.role on registration; send FrameworkError 
> if invalid.
> - We could also validate roles in --weights; exit master if error.
> - We could also validate roles in --resources reservations; exit agent if 
> error.
> - We could also validate roles in operator-requested reservations and quota.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4330) MasterQuotaTest.NoAuthenticationNoAuthorization cannot be execute in isolation

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4330:

Assignee: (was: Benjamin Bannier)

> MasterQuotaTest.NoAuthenticationNoAuthorization cannot be execute in isolation
> --
>
> Key: MESOS-4330
> URL: https://issues.apache.org/jira/browse/MESOS-4330
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>  Labels: mesosphere, tech-debt
>
> Executing {{MasterQuotaTest.NoAuthenticationNoAuthorization}} from 
> {{468b8ec}} under OS X 10.10.5 in isolation fails due to missing cleanup,
> {code}
> % ./bin/mesos-tests.sh 
> --gtest_filter=MasterQuotaTest.NoAuthenticationNoAuthorization
> Source directory: /ABC/DEF/src/mesos
> Build directory: /ABC/DEF/src/mesos/build
> -
> We cannot run any Docker tests because:
> Docker tests not supported on non-Linux systems
> -
> /usr/bin/nc
> /usr/bin/curl
> Note: Google Test filter = 
> MasterQuotaTest.NoAuthenticationNoAuthorization-HealthCheckTest.ROOT_DOCKER_DockerHealthyTask:HealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange:HierarchicalAllocator_BENCHMARK_Test.DeclineOffers:HookTest.ROOT_DOCKER_VerifySlavePreLaunchDockerHook:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerContainerizerTest.ROOT_DOCKER_ExecutorCleanupWhenLaunchFailed:DockerContainerizerTest.ROOT_DOCKER_FetchFailure:DockerContainerizerTest.ROOT_DOCKER_DockerPullFailure:DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_parsing_version:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:DockerTest.ROOT_DOCKER_MountRelative:DockerTest.ROOT_DOCKER_MountAbsolute:CopyBackendTest.ROOT_CopyBackend:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/0:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/1:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/2:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/3:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/4:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/5:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/6:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/7:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/8:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/9:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/10:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/11:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/12:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/13:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/14:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/15:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/16:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/17:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/18:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/19:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/20:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/21:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/22:SlaveAndFrameworkCount/Hierarch
icalAllocator_BENCHMARK_Test.AddAndUpdateSlave/23:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/24:SlaveAndFrameworkC

[jira] [Updated] (MESOS-4130) Document how the fetcher can reach across a proxy connection.

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4130:
--
Attachment: signature.asc

Submitted.




> Document how the fetcher can reach across a proxy connection.
> -
>
> Key: MESOS-4130
> URL: https://issues.apache.org/jira/browse/MESOS-4130
> Project: Mesos
>  Issue Type: Documentation
>  Components: fetcher
>Reporter: Bernd Mathiske
>Assignee: Shuai Lin
>  Labels: mesosphere, newbie
> Attachments: signature.asc
>
>
> The fetcher uses libcurl for downloading content from HTTP, HTTPS, etc. There 
> is no source code in the pertinent parts of "net.hpp" that deals with proxy 
> settings. However, libcurl automatically picks up certain environment 
> variables and adjusts its settings accordingly. See "man libcurl-tutorial" 
> for details. See section "Proxies", subsection "Environment Variables". If 
> you follow this recipe in your Mesos agent startup script, you can use a 
> proxy. 
> We should document this in the fetcher (cache) doc 
> (http://mesos.apache.org/documentation/latest/fetcher/).
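For example (a sketch using the standard libcurl proxy environment variables; the proxy host, port, and agent flags are placeholders), exporting the variables in the agent startup script is enough for the fetcher to go through the proxy:
{code}
# In the Mesos agent startup script (values are placeholders):
export http_proxy=http://proxy.example.com:3128
export https_proxy=http://proxy.example.com:3128
export no_proxy=localhost,127.0.0.1

mesos-slave --master=<master-ip>:5050 ...
{code}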



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4329) SlaveTest.LaunchTaskInfoWithContainerInfo cannot be execute in isolation

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-4329:
---

Assignee: Benjamin Bannier

> SlaveTest.LaunchTaskInfoWithContainerInfo cannot be execute in isolation
> 
>
> Key: MESOS-4329
> URL: https://issues.apache.org/jira/browse/MESOS-4329
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere, tech-debt
>
> Executing {{SlaveTest.LaunchTaskInfoWithContainerInfo}} from {{468b8ec}} 
> under OS X 10.10.5 in isolation fails due to missing cleanup,
> {code}
> % ./bin/mesos-tests.sh 
> --gtest_filter=SlaveTest.LaunchTaskInfoWithContainerInfo
> Source directory: /ABC/DEF/src/mesos
> Build directory: /ABC/DEF/src/mesos/build
> -
> We cannot run any Docker tests because:
> Docker tests not supported on non-Linux systems
> -
> /usr/bin/nc
> /usr/bin/curl
> Note: Google Test filter = 
> SlaveTest.LaunchTaskInfoWithContainerInfo-HealthCheckTest.ROOT_DOCKER_DockerHealthyTask:HealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange:HierarchicalAllocator_BENCHMARK_Test.DeclineOffers:HookTest.ROOT_DOCKER_VerifySlavePreLaunchDockerHook:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerContainerizerTest.ROOT_DOCKER_ExecutorCleanupWhenLaunchFailed:DockerContainerizerTest.ROOT_DOCKER_FetchFailure:DockerContainerizerTest.ROOT_DOCKER_DockerPullFailure:DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_parsing_version:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:DockerTest.ROOT_DOCKER_MountRelative:DockerTest.ROOT_DOCKER_MountAbsolute:CopyBackendTest.ROOT_CopyBackend:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/0:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/1:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/2:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/3:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/4:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/5:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/6:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/7:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/8:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/9:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/10:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/11:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/12:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/13:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/14:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/15:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/16:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/17:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/18:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/19:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/20:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/21:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/22:SlaveAndFrameworkCount/HierarchicalAl
locator_BENCHMARK_Test.AddAndUpdateSlave/23:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/24:SlaveAndFram

[jira] [Updated] (MESOS-3608) Optionally install test binaries.

2016-01-15 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3608:
---
Summary: Optionally install test binaries.  (was: optionally install test 
binaries)

> Optionally install test binaries.
> -
>
> Key: MESOS-3608
> URL: https://issues.apache.org/jira/browse/MESOS-3608
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, test
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>  Labels: mesosphere
>
> Many of the tests in Mesos could be described as integration tests, since 
> they have external dependencies on kernel features, installed tools, 
> permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along 
> with my {{mesos}} RPM so that I can run the same tests in different 
> deployment environments.
> I propose a new configuration option named {{--enable-test-tools}} that will 
> install the tests into {{libexec/mesos/tests}}. I'll also need to make some 
> minor changes to tests so that helper tools can be found in this location as 
> well as in the build directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3763) Need for http::put request method

2016-01-15 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-3763:
---
Assignee: Yongqiao Wang  (was: Joerg Schad)

> Need for http::put request method
> -
>
> Key: MESOS-3763
> URL: https://issues.apache.org/jira/browse/MESOS-3763
> Project: Mesos
>  Issue Type: Task
>Reporter: Joerg Schad
>Assignee: Yongqiao Wang
>Priority: Minor
>  Labels: mesosphere
>
> We decided to create a more RESTful API for managing Quota requests.
> Therefore we also want to use the HTTP PUT method, and hence need to enable 
> libprocess/http to send PUT requests besides GET and POST requests.
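
For illustration only, a minimal sketch of the missing helper, assuming it 
simply mirrors the shape of the existing {{get}}/{{post}} helpers in libprocess 
({{process/http.hpp}}); the exact signature and defaults are assumptions, not 
the final interface:

{code}
// Hypothetical declaration of a PUT helper in process::http, mirroring the
// existing post() helper; all names and default arguments are assumptions.
Future<Response> put(
    const URL& url,
    const Option<Headers>& headers = None(),
    const Option<std::string>& body = None(),
    const Option<std::string>& contentType = None());
{code}

Callers could then issue a Quota update the same way they issue a POST today, 
just with PUT semantics.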



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4363) Add a roles field to FrameworkInfo

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4363:
--
Component/s: master
 framework

> Add a roles field to FrameworkInfo
> --
>
> Key: MESOS-4363
> URL: https://issues.apache.org/jira/browse/MESOS-4363
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, master
>Reporter: Benjamin Bannier
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> To represent multiple roles per framework, a new repeated string field for 
> roles is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4363) Add a roles field to FrameworkInfo

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4363:
--
  Sprint: Mesosphere Sprint 27
Story Points: 1

> Add a roles field to FrameworkInfo
> --
>
> Key: MESOS-4363
> URL: https://issues.apache.org/jira/browse/MESOS-4363
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, master
>Reporter: Benjamin Bannier
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> To represent multiple roles per framework, a new repeated string field for 
> roles is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4364) Add roles validation code to master

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4364:

  Sprint: Mesosphere Sprint 27
Story Points: 5
 Component/s: master

> Add roles validation code to master
> ---
>
> Key: MESOS-4364
> URL: https://issues.apache.org/jira/browse/MESOS-4364
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> A {{FrameworkInfo}} can only have one of role or roles. A natural location 
> for this appears to be under {{validation::operation::validate}}.
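
A minimal sketch of such a check, assuming the proposed repeated {{roles}} 
field exists alongside the current optional {{role}} field; the function name 
and placement are illustrative, not the final implementation:

{code}
// Illustrative only: reject a FrameworkInfo that sets both the legacy 'role'
// field and the proposed repeated 'roles' field.
Option<Error> validateRoles(const FrameworkInfo& frameworkInfo)
{
  if (frameworkInfo.has_role() && frameworkInfo.roles_size() > 0) {
    return Error("FrameworkInfo cannot set both 'role' and 'roles'");
  }

  return None();
}
{code}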



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4365) Add internal migration from role to roles to master

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4365:

  Sprint: Mesosphere Sprint 27
Story Points: 3
 Component/s: master

> Add internal migration from role to roles to master
> ---
>
> Key: MESOS-4365
> URL: https://issues.apache.org/jira/browse/MESOS-4365
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>  Labels: mesosphere
>
> If only the {{role}} field is given, add it as a single entry to {{roles}}. 
> Add a note to {{CHANGELOG}}/release notes on deprecation of the existing 
> {{role}} field. File a JIRA issue for removal of that migration code once the 
> deprecation cycle is over.
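
A minimal sketch of that migration step, again assuming the proposed repeated 
{{roles}} field; the helper name is illustrative:

{code}
// Illustrative only: if the framework still uses the legacy 'role' field and
// has not set 'roles', copy the value over so the rest of the code can rely
// on 'roles' alone.
void upgradeRoles(FrameworkInfo* frameworkInfo)
{
  if (frameworkInfo->has_role() && frameworkInfo->roles_size() == 0) {
    frameworkInfo->add_roles(frameworkInfo->role());
  }
}
{code}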



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4366) Migrate all existing uses of FrameworkInfo.role to FrameworkInfo.roles

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4366:

  Sprint: Mesosphere Sprint 27
Story Points: 3
 Component/s: slave
  master
  framework

> Migrate all existing uses of FrameworkInfo.role to FrameworkInfo.roles
> --
>
> Key: MESOS-4366
> URL: https://issues.apache.org/jira/browse/MESOS-4366
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, master, slave
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4367) Add tracking of the role a Resource was offered for

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4367:

  Sprint: Mesosphere Sprint 27
Story Points: 5
 Component/s: master

> Add tracking of the role a Resource was offered for
> ---
>
> Key: MESOS-4367
> URL: https://issues.apache.org/jira/browse/MESOS-4367
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> If a framework can have multiple roles, we need a way to identify which of 
> the framework's roles a resource was offered for (e.g., for resource 
> recovery and reconciliation).
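
Purely as an illustration of the kind of tracking meant here (not a committed 
design), the offered resources could be stamped with the role they were 
allocated to; the {{allocation_info}} field below is hypothetical:

{code}
// Illustrative only: annotate each resource with the role it was offered for,
// using a hypothetical 'allocation_info' message on Resource.
Resources allocatedTo(const Resources& resources, const std::string& role)
{
  Resources result;

  foreach (Resource resource, resources) {
    resource.mutable_allocation_info()->set_role(role);
    result += resource;
  }

  return result;
}
{code}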



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4368:

  Sprint: Mesosphere Sprint 27
Story Points: 3
 Component/s: allocation

> Make HierarchicalAllocatorProcess set a Resource's active role during 
> allocation
> 
>
> Key: MESOS-4368
> URL: https://issues.apache.org/jira/browse/MESOS-4368
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>  Labels: mesosphere
>
> The concrete implementation here depends on the implementation strategy used 
> to solve MESOS-4367.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation

2016-01-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4368:

Labels: mesosphere  (was: )

> Make HierarchicalAllocatorProcess set a Resource's active role during 
> allocation
> 
>
> Key: MESOS-4368
> URL: https://issues.apache.org/jira/browse/MESOS-4368
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>  Labels: mesosphere
>
> The concrete implementation here depends on the implementation strategy used 
> to solve MESOS-4367.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2179) ExamplesTest.NoExecutorFramework terminates with segmentation fault

2016-01-15 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101718#comment-15101718
 ] 

Joerg Schad commented on MESOS-2179:


Note the segfault occurs in the test framework, not the test itself (i.e. tests 
will continue afterwards).
So far I could not reproduce the behavior (even with CentOS 7 inside Docker). 
Will investigate further.

> ExamplesTest.NoExecutorFramework terminates with segmentation fault
> ---
>
> Key: MESOS-2179
> URL: https://issues.apache.org/jira/browse/MESOS-2179
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
> Environment: Centos7 inside Docker
> Mesos master commit: 49d4553a0645624179f17ed6da8d2443e88998bf
>Reporter: Cody Maloney
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: flaky, mesosphere
>
> {code}
> [ RUN  ] ExamplesTest.NoExecutorFramework
> ../../src/tests/script.cpp:83: Failure
> Failed
> no_executor_framework_test.sh terminated with signal Segmentation fault
> [  FAILED  ] ExamplesTest.NoExecutorFramework (2543 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4391) docker pull a remote image conflict

2016-01-15 Thread qinlu (JIRA)
qinlu created MESOS-4391:


 Summary: docker pull a remote image conflict
 Key: MESOS-4391
 URL: https://issues.apache.org/jira/browse/MESOS-4391
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.26.0
 Environment: CentOS Linux release 7.2.1511 (Core)
3.10.0-327.el7.x86_64
Reporter: qinlu


I run a Docker app with 3 tasks, and the Docker image does not exist on the 
slave, so it must be pulled from docker.io.
Marathon assigns 2 app instances to run on one slave, and the last one on 
another.

The journalctl log shows an error like this: level=error msg="HTTP Error" 
err="No such image: solr:latest" statusCode=404.

There are two processes pulling the image:

[root@** ~]# ps -ef|grep solr
root 30113 10735  0 12:17 ?00:00:00 docker -H 
unix:///var/run/docker.sock pull solr:latest
root 30114 10735  0 12:17 ?00:00:00 docker -H 
unix:///var/run/docker.sock pull solr:latest




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4391) docker pull a remote image conflict

2016-01-15 Thread qinlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinlu updated MESOS-4391:
-
Component/s: framework

> docker pull a remote image conflict
> ---
>
> Key: MESOS-4391
> URL: https://issues.apache.org/jira/browse/MESOS-4391
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework
>Affects Versions: 0.26.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> 3.10.0-327.el7.x86_64
>Reporter: qinlu
>
> I run a Docker app with 3 tasks, and the Docker image does not exist on the 
> slave, so it must be pulled from docker.io.
> Marathon assigns 2 app instances to run on one slave, and the last one on 
> another.
> The journalctl log shows an error like this: level=error msg="HTTP Error" 
> err="No such image: solr:latest" statusCode=404.
> There are two processes pulling the image:
> [root@** ~]# ps -ef|grep solr
> root 30113 10735  0 12:17 ?00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest
> root 30114 10735  0 12:17 ?00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation

2016-01-15 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht reassigned MESOS-4368:
---

Assignee: Jan Schlicht

> Make HierarchicalAllocatorProcess set a Resource's active role during 
> allocation
> 
>
> Key: MESOS-4368
> URL: https://issues.apache.org/jira/browse/MESOS-4368
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> The concrete implementation here depends on the implementation strategy used 
> to solve MESOS-4367.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4365) Add internal migration from role to roles to master

2016-01-15 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht reassigned MESOS-4365:
---

Assignee: Jan Schlicht

> Add internal migration from role to roles to master
> ---
>
> Key: MESOS-4365
> URL: https://issues.apache.org/jira/browse/MESOS-4365
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> If only the {{role}} field is given, add it as a single entry to {{roles}}. 
> Add a note to {{CHANGELOG}}/release notes on deprecation of the existing 
> {{role}} field. File a JIRA issue for removal of that migration code once the 
> deprecation cycle is over.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4392:
-

 Summary: Balance quota frameworks with non-quota, greedy 
frameworks.
 Key: MESOS-4392
 URL: https://issues.apache.org/jira/browse/MESOS-4392
 Project: Mesos
  Issue Type: Improvement
  Components: allocation, master
Reporter: Bernd Mathiske
Assignee: Alexander Rukletsov


Maximize resource utilization and minimize starvation risk for both quota 
frameworks and non-quota, greedy frameworks when competing with each other.

A greedy analytics batch system wants to use as much of the cluster as possible 
to maximize computational throughput. When a competing web service with fixed 
task size starts up, there must be sufficient resources to run it immediately. 
The operator can reserve these resources by setting quota. However, if these 
resources are kept idle until the service is in use, this is wasteful from the 
analytics job's point of view. On the other hand, the analytics job should hand 
back reserved resources to the service when needed to avoid starvation of the 
latter.

We can assume that often, the resources needed by the service will be of the 
non-revocable variety. Here we need to introduce clearer distinctions between 
oversubscribed and revocable resources that are not oversubscribed. An 
oversubscribed resource cannot be converted into a non-revocable resource, not 
even by preemption. In contrast, a non-oversubscribed, revocable resource can 
be converted into a non-revocable resource.

Another related topic is optimistic offers. The pertinent aspect in this 
context is again whether resources are oversubscribed or not.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4392:
--
Epic Name: Revocable by default

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4392:
--
Issue Type: Epic  (was: Improvement)

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4393) Draft design document for resource revocability by default.

2016-01-15 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4393:
-

 Summary: Draft design document for resource revocability by 
default.
 Key: MESOS-4393
 URL: https://issues.apache.org/jira/browse/MESOS-4393
 Project: Mesos
  Issue Type: Task
  Components: allocation, master
Reporter: Bernd Mathiske
Assignee: Alexander Rukletsov


Create a design document for setting offered resources as "revocable by 
default". Greedy frameworks can then temporarily use resources set aside to 
satisfy quota.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101838#comment-15101838
 ] 

Guangya Liu commented on MESOS-4392:


Just some early questions: it seems we want to enable quota resources to be 
lent out to other frameworks if the current framework is idle.

There is a problem with this: the current Quota logic only defines the {{Quota 
Guaranteed}} resources, which we can treat as the {{Minimum ownership}} of one 
role, and I know that there is a plan to introduce a {{Quota Limit}}, which 
would define a kind of {{Maximum ownership}} of resources reserved by a 
framework. I think that we should ONLY treat the resources between {{Quota 
Limit}} and {{Quota Guaranteed}} as quota revocable resources. The admin can 
define a small {{Quota Guaranteed}} if s/he does not want to waste too many 
resources, then the resources between {{Quota Limit}} and {{Quota Guaranteed}} 
can be treated as revocable resources and lent out to other frameworks.

With this solution, it seems the {{Quota Limit}} does not make much sense: the 
admin can set a large {{Quota Guaranteed}} and lend out its resources to other 
frameworks.

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101838#comment-15101838
 ] 

Guangya Liu edited comment on MESOS-4392 at 1/15/16 2:52 PM:
-

Just some early questions: it seems we want to enable quota resources to be 
lent out to other roles if the framework on the current role is idle.

There is a problem with this: the current Quota logic only defines the {{Quota 
Guaranteed}} resources, which we can treat as the {{Minimum ownership}} of one 
role, and I know that there is a plan to introduce a {{Quota Limit}}, which 
would define a kind of {{Maximum ownership}} of resources reserved by a role. 
I think that we should ONLY treat the resources between {{Quota Limit}} and 
{{Quota Guaranteed}} as quota revocable resources.

The admin can define a small {{Quota Guaranteed}} if s/he does not want to 
waste too many resources in case of idle frameworks, then the resources between 
{{Quota Limit}} and {{Quota Guaranteed}} can be treated as revocable resources 
and lent out to other frameworks.

With this solution, it seems the {{Quota Limit}} does not make much sense: the 
admin can set a large {{Quota Guaranteed}} and lend out its resources to other 
frameworks.


was (Author: gyliu):
Just some early questions: it seems we want to enable quota resources to be 
lent out to other frameworks if the current framework is idle.

There is a problem with this: the current Quota logic only defines the {{Quota 
Guaranteed}} resources, which we can treat as the {{Minimum ownership}} of one 
role, and I know that there is a plan to introduce a {{Quota Limit}}, which 
would define a kind of {{Maximum ownership}} of resources reserved by a 
framework. I think that we should ONLY treat the resources between {{Quota 
Limit}} and {{Quota Guaranteed}} as quota revocable resources. The admin can 
define a small {{Quota Guaranteed}} if s/he does not want to waste too many 
resources, then the resources between {{Quota Limit}} and {{Quota Guaranteed}} 
can be treated as revocable resources and lent out to other frameworks.

With this solution, it seems the {{Quota Limit}} does not make much sense: the 
admin can set a large {{Quota Guaranteed}} and lend out its resources to other 
frameworks.

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101838#comment-15101838
 ] 

Guangya Liu edited comment on MESOS-4392 at 1/15/16 2:53 PM:
-

Just some early questions: it seems we want to enable quota resources to be 
lent out to other roles if the framework on the current role is idle.

There is a problem with this: the current Quota logic only defines the {{Quota 
Guaranteed}} resources, which we can treat as the {{Minimum ownership}} of one 
role, and I know that there is a plan to introduce a {{Quota Limit}}, which 
would define a kind of {{Maximum ownership}} of resources reserved by a role. 
I think that we should ONLY treat the resources between {{Quota Limit}} and 
{{Quota Guaranteed}} as quota revocable resources.

The admin can define a small {{Quota Guaranteed}} if s/he does not want to 
waste too many resources in case of idle frameworks, then the resources between 
{{Quota Limit}} and {{Quota Guaranteed}} can be treated as revocable resources 
and lent out to other frameworks.

With this solution, it seems the {{Quota Limit}} does not make much sense: the 
admin can set a large {{Quota Guaranteed}} and lend out the current role's 
resources to other roles.


was (Author: gyliu):
Just some early questions: it seems we want to enable quota resources to be 
lent out to other roles if the framework on the current role is idle.

There is a problem with this: the current Quota logic only defines the {{Quota 
Guaranteed}} resources, which we can treat as the {{Minimum ownership}} of one 
role, and I know that there is a plan to introduce a {{Quota Limit}}, which 
would define a kind of {{Maximum ownership}} of resources reserved by a role. 
I think that we should ONLY treat the resources between {{Quota Limit}} and 
{{Quota Guaranteed}} as quota revocable resources.

The admin can define a small {{Quota Guaranteed}} if s/he does not want to 
waste too many resources in case of idle frameworks, then the resources between 
{{Quota Limit}} and {{Quota Guaranteed}} can be treated as revocable resources 
and lent out to other frameworks.

With this solution, it seems the {{Quota Limit}} does not make much sense: the 
admin can set a large {{Quota Guaranteed}} and lend out its resources to other 
frameworks.

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2918) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky

2016-01-15 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101871#comment-15101871
 ] 

Jan Schlicht commented on MESOS-2918:
-

I'd recommend adding a test filter for that, so that this test isn't executed 
if swap is enabled. Looking at the source code of the tests, this filter should 
also cover the check for a recent enough Linux kernel.
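
A minimal sketch of the swap part of such a filter, assuming a plain Linux 
{{sysinfo()}} probe is enough to detect configured swap (the real filter would 
live next to the existing test filters in the test environment):

{code}
#include <sys/sysinfo.h>

// Illustrative only: return true if any swap space is configured, so that the
// OOM-based test can be filtered out on such machines.
static bool swapEnabled()
{
  struct sysinfo info;
  return ::sysinfo(&info) == 0 && info.totalswap > 0;
}
{code}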

> CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky
> --
>
> Key: MESOS-2918
> URL: https://issues.apache.org/jira/browse/MESOS-2918
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, test
>Affects Versions: 0.23.0
>Reporter: Paul Brett
>Assignee: Chi Zhang
>  Labels: flaky, flaky-test, mesosphere, test, twitter
>
> This test fails when swap is enabled on the platform because it creates a 
> memory hog with the expectation that the OOM killer will kill the hog but 
> with swap enabled, the hog is just swapped out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4394) make check fail while

2016-01-15 Thread Paride Casulli (JIRA)
Paride Casulli created MESOS-4394:
-

 Summary: make check fail while
 Key: MESOS-4394
 URL: https://issues.apache.org/jira/browse/MESOS-4394
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.26.0
 Environment: ubuntu 14.04.3 LTS kernel 3.16.0-55-generic
Reporter: Paride Casulli


Hi, I have the following error during "make check":

[ RUN  ] MasterAuthorizationTest.SlaveRemoved
F0115 15:21:05.297031  3542 utils.cpp:47] CHECK_SOME(parse): syntax error at 
line 1 near: 
,"frameworks\/test-principal\/messages_processed":1,,"frameworks\/test-principal\/messages_received":1,,"master\/cpus_percent":0,,"master\/cpus_revocable_percent":0,,"master\/cpus_revocable_total":0,,"master\/cpus_revocable_used":0,,"master\/cpus_total":0,,"master\/cpus_used":0,,"master\/disk_percent":0,,"master\/disk_revocable_percent":0,,"master\/disk_revocable_total":0,,"master\/disk_revocable_used":0,,"master\/disk_total":0,,"master\/disk_used":0,,"master\/dropped_messages":0,,"master\/elected":1,,"master\/event_queue_dispatches":0,,"master\/event_queue_http_requests":0,,"master\/event_queue_messages":0,,"master\/frameworks_active":1,,"master\/frameworks_connected":1,,"master\/frameworks_disconnected":0,,"master\/frameworks_inactive":0,,"master\/invalid_executor_to_framework_messages":0,,"master\/invalid_framework_to_executor_messages":0,,"master\/invalid_status_update_acknowledgements":0,,"master\/invalid_status_updates":0,,"master\/mem_percent":0,,"master\/mem_revocable_percent":0,,"master\/mem_revocable_total":0,,"master\/mem_revocable_used":0,,"master\/mem_total":0,,"master\/mem_used":0,,"master\/messages_authenticate":2,,"master\/messages_deactivate_framework":0,,"master\/messages_decline_offers":0,,"master\/messages_executor_to_framework":0,,"master\/messages_exited_executor":0,,"master\/messages_framework_to_executor":0,,"master\/messages_kill_task":0,,"master\/messages_launch_tasks":1,,"master\/messages_reconcile_tasks":0,,"master\/messages_register_framework":0,,"master\/messages_register_slave":1,,"master\/messages_reregister_framework":0,,"master\/messages_reregister_slave":0,,"master\/messages_resource_request":0,,"master\/messages_revive_offers":0,,"master\/messages_status_update":0,,"master\/messages_status_update_acknowledgement":0,,"master\/messages_suppress_offers":0,,"master\/messages_unregister_framework":0,,"master\/messages_unregister_slave":1,,"master\/messages_update_slave":1,,"master\/outstanding_offers":0,,"master\/recovery_slave_removals":0,,"master\/slave_registrations":1,,"master\/slave_removals":1,,"master\/slave_removals\/reason_registered":0,,"master\/slave_removals\/reason_unhealthy":0,,"master\/slave_removals\/reason_unregistered":1,,"master\/slave_reregistrations":0,,"master\/slave_shutdowns_canceled":0,,"master\/slave_shutdowns_completed":0,,"master\/slave_shutdowns_scheduled":0,,"master\/slaves_active":0,,"master\/slaves_connected":0,,"master\/slaves_disconnected":0,,"master\/slaves_inactive":0,,"master\/task_lost\/source_master\/reason_slave_removed":1,,"master\/tasks_error":0,,"master\/tasks_failed":0,,"master\/tasks_finished":0,,"master\/tasks_killed":0,,"master\/tasks_lost":1,,"master\/tasks_running":0,,"master\/tasks_staging":1,,"master\/tasks_starting":0,,"master\/uptime_secs":0,072886016,"master\/valid_executor_to_framework_messages":0,,"master\/valid_framework_to_executor_messages":0,,"master\/valid_status_update_acknowledgements":0,,"master\/valid_status_updates":0,,"registrar\/queued_operations":0,,"registrar\/registry_size_bytes":182,,"registrar\/state_fetch_ms":12,898816,"registrar\/state_store_ms":5,33888,"registrar\/state_store_ms\/count":3,"registrar\/state_store_ms\/max":5,33888,"registrar\/state_store_ms\/min":5,106944,"registrar\/state_store_ms\/p50":5,12896,"registrar\/state_store_ms\/p90":5,296896,"registrar\/state_store_ms\/p95":5,317888,"registrar\/state_store_ms\/p99":5,3346816,"registrar\/state_store_ms\/p999":5,33846016,"registrar\/state_store_ms\/p":5,338838016,"scheduler\/event_queue_dispatches":1,,"scheduler\/event_queue_me
ssages":0,,"system\/cpus_total":4,,"system\/load_15min":0,26,"system\/load_1min":0,84,"system\/load_5min":0,49,"system\/mem_free_bytes":4223746048,,"system\/mem_total_bytes":6040068096,}
 
*** Check failure stack trace: ***
@ 0x2b95be981740  google::LogMessage::Fail()
@ 0x2b95be98168c  google::LogMessage::SendToLog()
@ 0x2b95be98108e  google::LogMessage::Flush()
@ 0x2b95be983fa2  google::LogMessageFatal::~LogMessageFatal()
@   0x9433b0  _CheckFatal::~_CheckFatal()
@  0x13a857c  mesos::internal::tests::Metrics()
@   0xe2c1e9  
mesos::internal::tests::MasterAuthorizationTest_SlaveRemoved_Test::TestBody()
@  0x15a0068  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x159afbe  
testing::inte

[jira] [Commented] (MESOS-4295) Change documentation links to "*.md"

2016-01-15 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101930#comment-15101930
 ] 

Joerg Schad commented on MESOS-4295:


Dev Mailing list discussion: 
http://www.mail-archive.com/dev@mesos.apache.org/msg34066.html 

> Change documentation links to "*.md"
> 
>
> Key: MESOS-4295
> URL: https://issues.apache.org/jira/browse/MESOS-4295
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: documentation, mesosphere, newbie
>
> Right now, links either use the form 
> {noformat}[label](/documentation/latest/foo/){noformat} or 
> {noformat}[label](foo.md){noformat}. We should probably switch to using the 
> latter form consistently -- it previews better on Github, and it will make it 
> easier to have multiple versions of the docs on the website at once in the 
> future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4195) Add dynamic reservation tests with no principal

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4195:
-
Sprint: Mesosphere Sprint 27

> Add dynamic reservation tests with no principal
> ---
>
> Key: MESOS-4195
> URL: https://issues.apache.org/jira/browse/MESOS-4195
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Greg Mann
>  Labels: mesosphere
>
> Currently, there exist no dynamic reservation tests that include 
> authorization of a framework that is registered with no principal. This 
> should be added in order to more comprehensively test the dynamic reservation 
> code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4384) Documentation cannot link to external URLs that end in .md

2016-01-15 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102015#comment-15102015
 ] 

Joseph Wu commented on MESOS-4384:
--

Note: I modified the regex here:
https://reviews.apache.org/r/42172/

> Documentation cannot link to external URLs that end in .md
> --
>
> Key: MESOS-4384
> URL: https://issues.apache.org/jira/browse/MESOS-4384
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: documentation, mesosphere
>
> Per [~joerg84]: "In fact it seems that all links ending with .md are 
> interpreted as relative links on the webpage, i.e. [label](https://test.com/foo.md) 
> is rendered into a link to https://test.com/foo/ with the text "label"."
> Currently the rakefile rewrites all such links with this too-general regex:
> {code}
> '''f.read.gsub(/\((.*)(\.md)\)/, '(/documentation/latest/\1/)')'''
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4195) Add dynamic reservation tests with no principal

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4195:
-
Labels: mesosphere tests  (was: mesosphere)

> Add dynamic reservation tests with no principal
> ---
>
> Key: MESOS-4195
> URL: https://issues.apache.org/jira/browse/MESOS-4195
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Greg Mann
>  Labels: mesosphere, tests
>
> Currently, there exist no dynamic reservation tests that include 
> authorization of a framework that is registered with no principal. This 
> should be added in order to more comprehensively test the dynamic reservation 
> code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4395) Add persistent volume endpoint tests with no principal

2016-01-15 Thread Greg Mann (JIRA)
Greg Mann created MESOS-4395:


 Summary: Add persistent volume endpoint tests with no principal
 Key: MESOS-4395
 URL: https://issues.apache.org/jira/browse/MESOS-4395
 Project: Mesos
  Issue Type: Bug
Reporter: Greg Mann
Assignee: Greg Mann


There are currently no persistent volume endpoint tests that do not use a 
principal; they should be added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4396) Adding Tachyon to the list of frameworks

2016-01-15 Thread Jiri Simsa (JIRA)
Jiri Simsa created MESOS-4396:
-

 Summary: Adding Tachyon to the list of frameworks
 Key: MESOS-4396
 URL: https://issues.apache.org/jira/browse/MESOS-4396
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Affects Versions: 0.26.0
Reporter: Jiri Simsa
Priority: Minor


The Tachyon project provides a Mesos framework. Update the Mesos documentation 
to reflect this fact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4382) Change the `principal` in `ReservationInfo` to optional

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4382:
-
Story Points: 1  (was: 2)

> Change the `principal` in `ReservationInfo` to optional
> ---
>
> Key: MESOS-4382
> URL: https://issues.apache.org/jira/browse/MESOS-4382
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere, reservations
>
> With the addition of HTTP endpoints for {{/reserve}} and {{/unreserve}}, it 
> is now desirable to allow dynamic reservations without a principal, in the 
> case where HTTP authentication is disabled. To allow for this, we will change 
> the {{principal}} field in {{ReservationInfo}} from required to optional. For 
> backwards-compatibility, however, the master should currently invalidate any 
> {{ReservationInfo}} messages that do not have this field set.
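
A minimal sketch of the backwards-compatibility check described above, 
assuming the field has already been made optional; the function name is 
illustrative:

{code}
// Illustrative only: during the deprecation period the master keeps rejecting
// reservations whose ReservationInfo does not carry a principal.
Option<Error> validateReservationPrincipal(const Resource& resource)
{
  if (resource.has_reservation() && !resource.reservation().has_principal()) {
    return Error("'ReservationInfo.principal' must be set");
  }

  return None();
}
{code}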



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4195) Add dynamic reservation tests with no principal

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-4195:


Assignee: Greg Mann

> Add dynamic reservation tests with no principal
> ---
>
> Key: MESOS-4195
> URL: https://issues.apache.org/jira/browse/MESOS-4195
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere, tests
>
> Currently, there exist no dynamic reservation tests that include 
> authorization of a framework that is registered with no principal. This 
> should be added in order to more comprehensively test the dynamic reservation 
> code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4295) Change documentation links to "*.md"

2016-01-15 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-4295:
---
Shepherd: Till Toenshoff

> Change documentation links to "*.md"
> 
>
> Key: MESOS-4295
> URL: https://issues.apache.org/jira/browse/MESOS-4295
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: documentation, mesosphere, newbie
>
> Right now, links either use the form 
> {noformat}[label](/documentation/latest/foo/){noformat} or 
> {noformat}[label](foo.md){noformat}. We should probably switch to using the 
> latter form consistently -- it previews better on Github, and it will make it 
> easier to have multiple versions of the docs on the website at once in the 
> future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4384) Documentation cannot link to external URLs that end in .md

2016-01-15 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-4384:
---
Shepherd: Till Toenshoff

> Documentation cannot link to external URLs that end in .md
> --
>
> Key: MESOS-4384
> URL: https://issues.apache.org/jira/browse/MESOS-4384
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: documentation, mesosphere
>
> Per [~joerg84]: "In fact it seems that all links ending with .md are 
> interpreted as relative links on the webpage, i.e. [label](https://test.com/foo.md) 
> is rendered into a link to https://test.com/foo/ with the text "label"."
> Currently the rakefile rewrites all such links with this too-general regex:
> {code}
> '''f.read.gsub(/\((.*)(\.md)\)/, '(/documentation/latest/\1/)')'''
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3826) Add an optional unique identifier for resource reservations

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3826:
---
  Sprint: Mesosphere Sprint 27
Story Points: 5

> Add an optional unique identifier for resource reservations
> ---
>
> Key: MESOS-3826
> URL: https://issues.apache.org/jira/browse/MESOS-3826
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Sargun Dhillon
>Assignee: Neil Conway
>  Labels: mesosphere, reservations
>
> Thanks to the resource reservation primitives, frameworks can reserve 
> resources. These reservations are per role, which means multiple frameworks 
> can share reservations. This can get very hairy, as multiple reservations can 
> occur on each agent. 
> It would be nice to be able to optionally, uniquely identify reservations by 
> ID, much like persistent volumes are today. This could be done by adding a 
> new protobuf field, such as Resource.ReservationInfo.id, which, if set at 
> reservation time, would come back when the reservation is advertised.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4384) Documentation cannot link to external URLs that end in .md

2016-01-15 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-4384:
---
Sprint: Mesosphere Sprint 27

> Documentation cannot link to external URLs that end in .md
> --
>
> Key: MESOS-4384
> URL: https://issues.apache.org/jira/browse/MESOS-4384
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: documentation, mesosphere
>
> Per [~joerg84]: "In fact it seems that all links ending with .md are 
> interpreted as relative links on the webpage, i.e. [label](https://test.com/foo.md) 
> is rendered into a link to https://test.com/foo/ with the text "label"."
> Currently the rakefile rewrites all such links with this too-general regex:
> {code}
> '''f.read.gsub(/\((.*)(\.md)\)/, '(/documentation/latest/\1/)')'''
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4102) Quota doesn't allocate resources on slave joining

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4102:

Target Version/s: 0.27.0
Priority: Blocker  (was: Major)

> Quota doesn't allocate resources on slave joining
> -
>
> Key: MESOS-4102
> URL: https://issues.apache.org/jira/browse/MESOS-4102
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere, quota
> Attachments: quota_absent_framework_test-1.patch
>
>
> See attached patch. {{framework1}} is not allocated any resources, despite 
> the fact that the resources on {{agent2}} can safely be allocated to it 
> without risk of violating {{quota1}}. If I understand the intended quota 
> behavior correctly, this doesn't seem intended.
> Note that if the framework is added _after_ the slaves are added, the 
> resources on {{agent2}} are allocated to {{framework1}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-01-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102041#comment-15102041
 ] 

ASF GitHub Bot commented on MESOS-4370:
---

GitHub user travishegner opened a pull request:

https://github.com/apache/mesos/pull/87

[MESOS-4370] Retrieve network mode (name) and use that to get that networks 
IP add…

…ress.

This patch will first query the docker API for the 
`HostConfig.NetworkMode`, which is populated with the network name 
(essentially what was passed via `--net <name>` to the docker run command). This 
name is then used as a key in `NetworkSettings.Networks.<name>.IPAddress` to 
get the IP address that is currently in use by the container.

It appears that even though the docker API has been set up to allow for 
multiple networks, our testing has indicated that it's still only applying one 
network to the container (the last one via the `--net` argument on the run 
line). I can only speculate that the docker API will change again in the near 
future, but I can't speculate how, so at least this fixes the problem as it 
stands right now. This patch is tested against Docker 1.9.1 on Ubuntu 14.04.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/travishegner/mesos master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mesos/pull/87.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #87


commit 75b98792b1d12bffacb74e1ce7444c67f0e0860e
Author: Travis Hegner 
Date:   2016-01-15T13:40:51Z

Retrieve network mode (name) and use that to get that networks IP address.




> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0
> Environment: Ubuntu 14.04
> Docker 1.9.1
>Reporter: Clint Armstrong
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result the mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> container's IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2742) Architecture doc on global resources

2016-01-15 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-2742:
---
Shepherd: Bernd Mathiske  (was: Adam B)

> Architecture doc on global resources
> 
>
> Key: MESOS-2742
> URL: https://issues.apache.org/jira/browse/MESOS-2742
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joerg Schad
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2742) Architecture doc on global resources

2016-01-15 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-2742:
---
  Sprint: Mesosphere Sprint 27
Story Points: 5  (was: 3)

> Architecture doc on global resources
> 
>
> Key: MESOS-2742
> URL: https://issues.apache.org/jira/browse/MESOS-2742
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joerg Schad
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4102) Quota doesn't allocate resources on slave joining.

2016-01-15 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4102:
---
Summary: Quota doesn't allocate resources on slave joining.  (was: Quota 
doesn't allocate resources on slave joining)

> Quota doesn't allocate resources on slave joining.
> --
>
> Key: MESOS-4102
> URL: https://issues.apache.org/jira/browse/MESOS-4102
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere, quota
> Attachments: quota_absent_framework_test-1.patch
>
>
> See attached patch. {{framework1}} is not allocated any resources, despite 
> the fact that the resources on {{agent2}} can safely be allocated to it 
> without risk of violating {{quota1}}. If I understand the intended quota 
> behavior correctly, this doesn't seem intended.
> Note that if the framework is added _after_ the slaves are added, the 
> resources on {{agent2}} are allocated to {{framework1}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3923) Implement AuthN handling for HTTP Scheduler API

2016-01-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3923:
--
Sprint:   (was: Mesosphere Sprint 27)

> Implement AuthN handling for HTTP Scheduler API
> ---
>
> Key: MESOS-3923
> URL: https://issues.apache.org/jira/browse/MESOS-3923
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API, master
>Affects Versions: 0.25.0
>Reporter: Ben Whitehead
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> If authentication(AuthN) is enabled on a master, frameworks attempting to use 
> the HTTP Scheduler API can't register.
> {code}
> $ cat /tmp/subscribe-943257503176798091.bin | http --print=HhBb --stream 
> --pretty=colors --auth verification:password1 POST :5050/api/v1/scheduler 
> Accept:application/x-protobuf Content-Type:application/x-protobuf
> POST /api/v1/scheduler HTTP/1.1
> Connection: keep-alive
> Content-Type: application/x-protobuf
> Accept-Encoding: gzip, deflate
> Accept: application/x-protobuf
> Content-Length: 126
> User-Agent: HTTPie/0.9.0
> Host: localhost:5050
> Authorization: Basic dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ==
> +-+
> | NOTE: binary data not shown in terminal |
> +-+
> HTTP/1.1 401 Unauthorized
> Date: Fri, 13 Nov 2015 20:00:45 GMT
> WWW-authenticate: Basic realm="Mesos master"
> Content-Length: 65
> HTTP schedulers are not supported when authentication is required
> {code}
> Authorization(AuthZ) is already supported for HTTP based frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2068) Add comments that explain framework, executor ID, and task life cycle in slave

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2068:
--
Component/s: documentation

> Add comments that explain framework, executor ID, and task life cycle in slave
> --
>
> Key: MESOS-2068
> URL: https://issues.apache.org/jira/browse/MESOS-2068
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation, slave
>Reporter: Bernd Mathiske
>Assignee: Bernd Mathiske
>Priority: Minor
>  Labels: mesosphere
>
> Fixing MESOS-947 was relatively difficult because the source code is mostly 
> the only source of information with regard to the life cycle of frameworks, 
> executors, and tasks in the slave. In particular, this leads to confusion 
> about whether there could be a TASK_LOST state at the beginning of 
> _runTask() when the framework is NULL. This shall be explained to the best of 
> the assignee's knowledge.
> For context see https://reviews.apache.org/r/27567
> with these comments:
> On Nov. 5, 2014, 7:50 p.m., Ben Mahler wrote:
> src/slave/slave.cpp, lines 1195-1200
> 
>A comment here as to why we don't need to send TASK_LOST would be much 
> appreciated! It's not obvious so someone might come along and add a TASK_LOST 
> to make sure we're not dropping the task on the floor, so context here would 
> be great!
> Bernd Mathiske wrote:
>Hah, thanks for sharing - I am not alone! :-) None of this was obvious to 
> me either, because there is no comment explaining the general life cycle of 
> anything. Once you understand the intended life cycle, there is now way there 
> can be a TASK_LOST situation here, though. Therefore I propose adding 
> comments describing the overall picture regarding frameworks, executor IDs 
> and task creation in the appropriate places, instead. I'll file a ticket if 
> you agree.
> Once you understand the intended life cycle, there is no way there can be a 
> TASK_LOST situation here, though.
> Phew! :)
> Could you distill your learnings into a comment here, and maybe make the log 
> message more informative? Even with an overall description as you mentioned, 
> dummies like me would still get confused here given the lack of _local_ 
> context. ;)
> - Ben



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1790) Add "chown" option to CommandInfo.URI

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-1790:
--
Sprint: Mesosphere Sprint 27

> Add "chown" option to CommandInfo.URI
> -
>
> Key: MESOS-1790
> URL: https://issues.apache.org/jira/browse/MESOS-1790
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Jim Klucar
>  Labels: myriad, newbie
> Attachments: 
> 0001-MESOS-1790-Adds-chown-option-to-CommandInfo.URI.patch
>
>
> The Mesos fetcher always chown()s the extracted executor URIs to the executor 
> user, but sometimes this is not desirable; e.g., the "setuid" bit gets lost 
> during chown() if the slave/fetcher is running as root. 
> It would be nice to give frameworks the ability to skip the chown.
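
A small Python sketch of the behavior described above; it assumes Linux, root privileges, and hypothetical path/uid/gid values. On Linux, chown() on an executable clears the setuid bit, which is the loss this ticket refers to:

{code}
# Sketch of the setuid-loss behavior described above. Assumes Linux, root
# privileges, and hypothetical path/uid/gid values.
import os

path = "/tmp/fetched-executable"  # hypothetical fetched artifact

with open(path, "w") as f:
    f.write("#!/bin/sh\necho hello\n")

os.chmod(path, 0o4755)  # rwsr-xr-x: setuid bit set
print(oct(os.stat(path).st_mode & 0o7777))  # 0o4755

os.chown(path, 1000, 1000)  # chown to the "executor user" (hypothetical uid/gid)
print(oct(os.stat(path).st_mode & 0o7777))  # 0o755: setuid bit cleared by chown()
{code}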



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3854) Finalize design for generalized Authorizer interface

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3854:
--
Sprint: Mesosphere Sprint 27

> Finalize design for generalized Authorizer interface
> 
>
> Key: MESOS-3854
> URL: https://issues.apache.org/jira/browse/MESOS-3854
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>  Labels: authorization, mesosphere
>
> Finalize the structure of ACLs and achieve consensus on the design doc 
> proposed in MESOS-2949.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4336) Document supported file types for archive extraction by fetcher

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-4336:
--

Assignee: Bernd Mathiske

> Document supported file types for archive extraction by fetcher
> ---
>
> Key: MESOS-4336
> URL: https://issues.apache.org/jira/browse/MESOS-4336
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, fetcher
>Reporter: Sunil Shah
>Assignee: Bernd Mathiske
>Priority: Trivial
>  Labels: documentation, mesosphere, newbie
>
> The Mesos fetcher extracts specified URIs if requested to do so by the 
> scheduler. However, the documentation at 
> http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file 
> types/extensions that will be extracted by the fetcher.
> [The relevant 
> code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63]
>  specifies an exhaustive list of extensions that will be extracted; the 
> documentation should be updated to match.
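
For illustration only, extension-based dispatch of this kind can be sketched as below; the extensions listed are examples, not the authoritative list, which lives in the linked fetcher.cpp:

{code}
# Illustrative only: pick an extraction command from the file extension, in the
# spirit of the fetcher code linked above. The extension list here is an
# example, NOT the authoritative list (see src/launcher/fetcher.cpp).
def extraction_command(path):
    if path.endswith((".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")):
        return ["tar", "-C", ".", "-xf", path]
    if path.endswith(".zip"):
        return ["unzip", "-o", path]
    if path.endswith(".gz"):
        return ["gzip", "-d", path]
    return None  # not a recognized archive: leave the file as fetched


print(extraction_command("executor.tgz"))  # ['tar', '-C', '.', '-xf', 'executor.tgz']
print(extraction_command("config.json"))   # None
{code}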



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4377) Document units associated with resource types

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4377:
---
  Sprint: Mesosphere Sprint 27
Story Points: 1

> Document units associated with resource types
> -
>
> Key: MESOS-4377
> URL: https://issues.apache.org/jira/browse/MESOS-4377
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: documentation, mesosphere
>
> We should document the units associated with memory and disk resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4376) Document semantics of `slaveLost`

2016-01-15 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4376:
--
Story Points: 2

> Document semantics of `slaveLost`
> -
>
> Key: MESOS-4376
> URL: https://issues.apache.org/jira/browse/MESOS-4376
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, scheduler driver
>Reporter: Neil Conway
>Assignee: Vinod Kone
>  Labels: documentation, mesosphere
>
> We should clarify the semantics of this callback:
> * Is it always invoked, or just a hint?
> * Can a slave ever come back from `slaveLost`?
> * What happens to persistent resources on a lost slave?
> The new HA framework development guide might be a good place to put (some 
> of?) this information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4375) Allow schemes in HDFS URI fetcher plugin to be configurable.

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4375:
--
Sprint: Mesosphere Sprint 27

> Allow schemes in HDFS URI fetcher plugin to be configurable.
> 
>
> Key: MESOS-4375
> URL: https://issues.apache.org/jira/browse/MESOS-4375
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: haosdent
>Priority: Critical
>
> hadoop commands can handle a variety of schemes. We should not hard-code the 
> supported schemes. Instead, we should allow operators to configure them through 
> agent flags. See comments in https://reviews.apache.org/r/41713/
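
One possible shape for such a flag, sketched in Python; the flag name {{--hdfs_uri_schemes}} and the default scheme list are hypothetical, not actual Mesos flags:

{code}
# Sketch of an operator-configurable scheme list for the HDFS URI fetcher
# plugin. The flag name and default list are hypothetical.
from urllib.parse import urlparse


def parse_schemes(flag_value, default="hdfs,hftp,s3,s3n"):
    # e.g. --hdfs_uri_schemes="hdfs,webhdfs,s3a" (hypothetical agent flag)
    return {s.strip().lower() for s in (flag_value or default).split(",") if s.strip()}


def hdfs_plugin_handles(uri, schemes):
    return urlparse(uri).scheme.lower() in schemes


schemes = parse_schemes("hdfs,webhdfs,s3a")
print(hdfs_plugin_handles("hdfs://namenode:8020/pkg/app.tgz", schemes))  # True
print(hdfs_plugin_handles("http://example.com/app.tgz", schemes))        # False
{code}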



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4336) Document supported file types for archive extraction by fetcher

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4336:
---
Sprint: Mesosphere Sprint 27

> Document supported file types for archive extraction by fetcher
> ---
>
> Key: MESOS-4336
> URL: https://issues.apache.org/jira/browse/MESOS-4336
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, fetcher
>Reporter: Sunil Shah
>Assignee: Bernd Mathiske
>Priority: Trivial
>  Labels: documentation, mesosphere, newbie
>
> The Mesos fetcher extracts specified URIs if requested to do so by the 
> scheduler. However, the documentation at 
> http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file 
> types/extensions that will be extracted by the fetcher.
> [The relevant 
> code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63]
>  specifies an exhaustive list of extensions that will be extracted; the 
> documentation should be updated to match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4377) Document units associated with resource types

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-4377:
--

Assignee: Neil Conway

> Document units associated with resource types
> -
>
> Key: MESOS-4377
> URL: https://issues.apache.org/jira/browse/MESOS-4377
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: documentation, mesosphere
>
> We should document the units associated with memory and disk resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1365) SlaveRecoveryTest/0.MultipleFrameworks is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-1365:
-
Shepherd: Vinod Kone
  Sprint: Q2'14 Sprint 2, Mesosphere Sprint 27  (was: Q2'14 Sprint 2)
Story Points: 2
  Labels: flaky flaky-test mesosphere  (was: flaky flaky-test)

> SlaveRecoveryTest/0.MultipleFrameworks is flaky
> ---
>
> Key: MESOS-1365
> URL: https://issues.apache.org/jira/browse/MESOS-1365
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Dominic Hamon
>Assignee: Greg Mann
>Priority: Minor
>  Labels: flaky, flaky-test, mesosphere
>
> --gtest_repeat=-1 --gtest_shuffle --gtest_break_on_failure
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.MultipleFrameworks
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0513 15:42:05.931761  4320 exec.cpp:131] Version: 0.19.0
> I0513 15:42:05.936698  4340 exec.cpp:205] Executor registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Registered executor on artoo
> Starting task 51991f97-f5fd-4905-ad0f-02668083af7c
> Forked command at 4367
> sh -c 'sleep 1000'
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0513 15:42:06.915061  4408 exec.cpp:131] Version: 0.19.0
> I0513 15:42:06.931149  4435 exec.cpp:205] Executor registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Registered executor on artoo
> Starting task eaf5d8d6-3a6c-4ee1-84c1-fae20fb1df83
> sh -c 'sleep 1000'
> Forked command at 4439
> I0513 15:42:06.998332  4340 exec.cpp:251] Received reconnect request from 
> slave 20140513-154204-16842879-51872-13062-0
> I0513 15:42:06.998414  4436 exec.cpp:251] Received reconnect request from 
> slave 20140513-154204-16842879-51872-13062-0
> I0513 15:42:07.006350  4437 exec.cpp:228] Executor re-registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Re-registered executor on artoo
> I0513 15:42:07.027039  4337 exec.cpp:378] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 4367
> Killing the following process trees:
> [ 
> -+- 4367 sh -c sleep 1000 
>  \--- 4368 sleep 1000 
> ]
> ../../src/tests/slave_recovery_tests.cpp:2807: Failure
> Value of: status1.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_KILLED
> Program received signal SIGSEGV, Segmentation fault.
> testing::UnitTest::AddTestPartResult (this=0x154dac0 
> , 
> result_type=testing::TestPartResult::kFatalFailure, file_name=0xeb6b6c 
> "../../src/tests/slave_recovery_tests.cpp", line_number=2807, message=..., 
> os_stack_trace=...) at gmock-1.6.0/gtest/src/gtest.cc:3795
> 3795  *static_cast(NULL) = 1;
> (gdb) bt
> #0  testing::UnitTest::AddTestPartResult (this=0x154dac0 
> , 
> result_type=testing::TestPartResult::kFatalFailure, file_name=0xeb6b6c 
> "../../src/tests/slave_recovery_tests.cpp", line_number=2807, message=..., 
> os_stack_trace=...) at gmock-1.6.0/gtest/src/gtest.cc:3795
> #1  0x00df98b9 in testing::internal::AssertHelper::operator= 
> (this=0x7fffb860, message=...) at gmock-1.6.0/gtest/src/gtest.cc:356
> #2  0x00cdfa57 in 
> SlaveRecoveryTest_MultipleFrameworks_Test::TestBody
>  (this=0x1954db0) at ../../src/tests/slave_recovery_tests.cpp:2807
> #3  0x00e22583 in 
> testing::internal::HandleSehExceptionsInMethodIfSupported void> (object=0x1954db0, method=&virtual testing::Test::TestBody(), 
> location=0xed0af0 "the test body") at gmock-1.6.0/gtest/src/gtest.cc:2090
> #4  0x00e12467 in 
> testing::internal::HandleExceptionsInMethodIfSupported 
> (object=0x1954db0, method=&virtual testing::Test::TestBody(), 
> location=0xed0af0 "the test body") at gmock-1.6.0/gtest/src/gtest.cc:2126
> #5  0x00e010d5 in testing::Test::Run (this=0x1954db0) at 
> gmock-1.6.0/gtest/src/gtest.cc:2161
> #6  0x00e01ceb in testing::TestInfo::Run (this=0x158cf80) at 
> gmock-1.6.0/gtest/src/gtest.cc:2338
> #7  0x00e02387 in testing::TestCase::Run (this=0x158a880) at 
> gmock-1.6.0/gtest/src/gtest.cc:2445
> #8  0x00e079ed in testing::internal::UnitTestImpl::RunAllTests 
> (this=0x1558b40) at gmock-1.6.0/gtest/src/gtest.cc:4237
> #9  0x00e1ec83 in 
> testing::internal::HandleSehExceptionsInMethodIfSupported  bool> (object=0x1558b40, method=(bool 
> (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * 
> const)) 0xe07700 , 
> location=0xed1219 "auxiliary test code (environments or event 
> listeners)") at gmock-1.6.0/gtest/src/gtest.cc:2090
> #10 0x00e14217 in 
> testing::internal::HandleExceptionsInMethodIfSupported  bool> (object=0x1558b40, method=(bool 
> (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * 
> const)) 0xe07700 , 
> location=0xed1219 "auxiliary test code (environments or event 
> li

[jira] [Updated] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail

2016-01-15 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-2017:
---
Labels: mesosphere  (was: twitter)

> Segfault with "Pure virtual method called" when tests fail
> --
>
> Key: MESOS-2017
> URL: https://issues.apache.org/jira/browse/MESOS-2017
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Yan Xu
>Assignee: Kevin Klues
>  Labels: mesosphere, tests
>
> The most recent one:
> {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
> I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
> I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
> I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
> I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
> 2018ns
> I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 335ns
> I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
> I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
> I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
> STARTING
> I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 591981ns
> I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
> STARTING
> I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
> I1030 05:55:06.940820 24489 master.cpp:312] Master 
> 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
> 67.195.81.187:40429
> I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
> I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
> I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.187:40429
> I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
> master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
> I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
> I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
> I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
> I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 536365ns
> I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
> VOTING
> I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
> group
> I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
> I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
> I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 806463ns
> I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
> I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 603843ns
> I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
> I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
> for position 0
> I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb 
> took 28437ns
> I1030 05:55:06.952896 244

[jira] [Updated] (MESOS-4334) Add documentation for the registry

2016-01-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4334:
--
Shepherd: Benjamin Mahler

> Add documentation for the registry
> --
>
> Key: MESOS-4334
> URL: https://issues.apache.org/jira/browse/MESOS-4334
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, master
>Reporter: Neil Conway
>Assignee: Anand Mazumdar
>  Labels: documentation, mesosphere, registry
>
> What information does the master store in the registry? What do operators 
> need to know about managing the registry?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1594) SlaveRecoveryTest/0.ReconcileKillTask is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-1594:


Assignee: Greg Mann

> SlaveRecoveryTest/0.ReconcileKillTask is flaky
> --
>
> Key: MESOS-1594
> URL: https://issues.apache.org/jira/browse/MESOS-1594
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
> Environment: Ubuntu 12.10 with GCC
>Reporter: Vinod Kone
>Assignee: Greg Mann
>  Labels: flaky
>
> Observed this on Jenkins.
> {code}
> [ RUN  ] SlaveRecoveryTest/0.ReconcileKillTask
> Using temporary directory '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG'
> I0714 15:08:43.915114 27216 leveldb.cpp:176] Opened db in 474.695188ms
> I0714 15:08:43.933645 27216 leveldb.cpp:183] Compacted db in 18.068942ms
> I0714 15:08:43.934129 27216 leveldb.cpp:198] Created db iterator in 7860ns
> I0714 15:08:43.934439 27216 leveldb.cpp:204] Seeked to beginning of db in 
> 2560ns
> I0714 15:08:43.934779 27216 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1400ns
> I0714 15:08:43.935098 27216 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0714 15:08:43.936027 27238 recover.cpp:425] Starting replica recovery
> I0714 15:08:43.936225 27238 recover.cpp:451] Replica is in EMPTY status
> I0714 15:08:43.936867 27238 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0714 15:08:43.937049 27238 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0714 15:08:43.937232 27238 recover.cpp:542] Updating replica status to 
> STARTING
> I0714 15:08:43.945600 27235 master.cpp:288] Master 
> 20140714-150843-16842879-55850-27216 (quantal) started on 127.0.1.1:55850
> I0714 15:08:43.945643 27235 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0714 15:08:43.945651 27235 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0714 15:08:43.945658 27235 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG/credentials'
> I0714 15:08:43.945808 27235 master.cpp:359] Authorization enabled
> I0714 15:08:43.946369 27235 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> master@127.0.1.1:55850
> I0714 15:08:43.946419 27235 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0714 15:08:43.946614 27235 master.cpp:1128] The newly elected leader is 
> master@127.0.1.1:55850 with id 20140714-150843-16842879-55850-27216
> I0714 15:08:43.946630 27235 master.cpp:1141] Elected as the leading master!
> I0714 15:08:43.946637 27235 master.cpp:959] Recovering from registrar
> I0714 15:08:43.946707 27235 registrar.cpp:313] Recovering registrar
> I0714 15:08:43.957895 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 20.529301ms
> I0714 15:08:43.957978 27238 replica.cpp:320] Persisted replica status to 
> STARTING
> I0714 15:08:43.958142 27238 recover.cpp:451] Replica is in STARTING status
> I0714 15:08:43.958664 27238 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0714 15:08:43.958762 27238 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0714 15:08:43.958945 27238 recover.cpp:542] Updating replica status to VOTING
> I0714 15:08:43.975685 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.646136ms
> I0714 15:08:43.976367 27238 replica.cpp:320] Persisted replica status to 
> VOTING
> I0714 15:08:43.976824 27241 recover.cpp:556] Successfully joined the Paxos 
> group
> I0714 15:08:43.977072 27242 recover.cpp:440] Recover process terminated
> I0714 15:08:43.980590 27236 log.cpp:656] Attempting to start the writer
> I0714 15:08:43.981385 27236 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0714 15:08:43.999141 27236 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 17.705787ms
> I0714 15:08:43.999222 27236 replica.cpp:342] Persisted promised to 1
> I0714 15:08:44.004451 27240 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0714 15:08:44.004914 27240 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0714 15:08:44.021456 27240 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 16.499775ms
> I0714 15:08:44.021533 27240 replica.cpp:676] Persisted action at 0
> I0714 15:08:44.022006 27240 replica.cpp:508] Replica received write request 
> for position 0
> I0714 15:08:44.022043 27240 leveldb.cpp:438] Reading position from leveldb 
> took 21376ns
> I0714 15:08:44.035969 27240 leveldb.cpp:343] Persisting action (14 bytes) to 
> 

[jira] [Updated] (MESOS-4298) Sync up configuration.md and flags.cpp

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4298:
---
Shepherd: Vinod Kone

> Sync up configuration.md and flags.cpp
> --
>
> Key: MESOS-4298
> URL: https://issues.apache.org/jira/browse/MESOS-4298
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Greg Mann
>  Labels: documentation, mesosphere, newbie
>
> Review https://reviews.apache.org/r/39923/ made some cleanup to 
> configuration.md, but the related flags.cpp was not updated; we should update 
> those files as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4377) Document units associated with resource types

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4377:
---
Shepherd: Vinod Kone

> Document units associated with resource types
> -
>
> Key: MESOS-4377
> URL: https://issues.apache.org/jira/browse/MESOS-4377
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: documentation, mesosphere
>
> We should document the units associated with memory and disk resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4336) Document supported file types for archive extraction by fetcher

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4336:
---
Shepherd: Bernd Mathiske

> Document supported file types for archive extraction by fetcher
> ---
>
> Key: MESOS-4336
> URL: https://issues.apache.org/jira/browse/MESOS-4336
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, fetcher
>Reporter: Sunil Shah
>Assignee: Bernd Mathiske
>Priority: Trivial
>  Labels: documentation, mesosphere, newbie
>
> The Mesos fetcher extracts specified URIs if requested to do so by the 
> scheduler. However, the documentation at 
> http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file 
> types/extensions that will be extracted by the fetcher.
> [The relevant 
> code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63]
>  specifies an exhaustive list of extensions that will be extracted; the 
> documentation should be updated to match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4200) Test case(s) for weights + allocation behavior

2016-01-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4200:
--
Story Points: 2

> Test case(s) for weights + allocation behavior
> --
>
> Key: MESOS-4200
> URL: https://issues.apache.org/jira/browse/MESOS-4200
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, test
>Reporter: Neil Conway
>Assignee: Yongqiao Wang
>  Labels: mesosphere, test, weight
>
> As far as I can see, we currently have NO test cases for behavior when 
> weights are defined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4376) Document semantics of `slaveLost`

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4376:
---
Shepherd: Vinod Kone

> Document semantics of `slaveLost`
> -
>
> Key: MESOS-4376
> URL: https://issues.apache.org/jira/browse/MESOS-4376
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, scheduler driver
>Reporter: Neil Conway
>Assignee: Vinod Kone
>  Labels: documentation, mesosphere
>
> We should clarify the semantics of this callback:
> * Is it always invoked, or just a hint?
> * Can a slave ever come back from `slaveLost`?
> * What happens to persistent resources on a lost slave?
> The new HA framework development guide might be a good place to put (some 
> of?) this information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail

2016-01-15 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-2017:
---
Sprint: Twitter Mesos Q4 Sprint 3, Mesosphere Sprint 27  (was: Twitter 
Mesos Q4 Sprint 3)

> Segfault with "Pure virtual method called" when tests fail
> --
>
> Key: MESOS-2017
> URL: https://issues.apache.org/jira/browse/MESOS-2017
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Yan Xu
>Assignee: Kevin Klues
>  Labels: twitter
>
> The most recent one:
> {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
> I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
> I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
> I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
> I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
> 2018ns
> I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 335ns
> I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
> I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
> I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
> STARTING
> I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 591981ns
> I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
> STARTING
> I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
> I1030 05:55:06.940820 24489 master.cpp:312] Master 
> 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
> 67.195.81.187:40429
> I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
> I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
> I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.187:40429
> I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
> master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
> I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
> I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
> I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
> I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 536365ns
> I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
> VOTING
> I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
> group
> I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
> I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
> I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 806463ns
> I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
> I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 603843ns
> I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
> I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
> for position 0
> I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from level

[jira] [Updated] (MESOS-1365) SlaveRecoveryTest/0.MultipleFrameworks is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-1365:
-
Story Points: 1  (was: 2)

> SlaveRecoveryTest/0.MultipleFrameworks is flaky
> ---
>
> Key: MESOS-1365
> URL: https://issues.apache.org/jira/browse/MESOS-1365
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Dominic Hamon
>Assignee: Greg Mann
>Priority: Minor
>  Labels: flaky, flaky-test, mesosphere
>
> --gtest_repeat=-1 --gtest_shuffle --gtest_break_on_failure
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.MultipleFrameworks
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0513 15:42:05.931761  4320 exec.cpp:131] Version: 0.19.0
> I0513 15:42:05.936698  4340 exec.cpp:205] Executor registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Registered executor on artoo
> Starting task 51991f97-f5fd-4905-ad0f-02668083af7c
> Forked command at 4367
> sh -c 'sleep 1000'
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0513 15:42:06.915061  4408 exec.cpp:131] Version: 0.19.0
> I0513 15:42:06.931149  4435 exec.cpp:205] Executor registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Registered executor on artoo
> Starting task eaf5d8d6-3a6c-4ee1-84c1-fae20fb1df83
> sh -c 'sleep 1000'
> Forked command at 4439
> I0513 15:42:06.998332  4340 exec.cpp:251] Received reconnect request from 
> slave 20140513-154204-16842879-51872-13062-0
> I0513 15:42:06.998414  4436 exec.cpp:251] Received reconnect request from 
> slave 20140513-154204-16842879-51872-13062-0
> I0513 15:42:07.006350  4437 exec.cpp:228] Executor re-registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Re-registered executor on artoo
> I0513 15:42:07.027039  4337 exec.cpp:378] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 4367
> Killing the following process trees:
> [ 
> -+- 4367 sh -c sleep 1000 
>  \--- 4368 sleep 1000 
> ]
> ../../src/tests/slave_recovery_tests.cpp:2807: Failure
> Value of: status1.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_KILLED
> Program received signal SIGSEGV, Segmentation fault.
> testing::UnitTest::AddTestPartResult (this=0x154dac0 
> , 
> result_type=testing::TestPartResult::kFatalFailure, file_name=0xeb6b6c 
> "../../src/tests/slave_recovery_tests.cpp", line_number=2807, message=..., 
> os_stack_trace=...) at gmock-1.6.0/gtest/src/gtest.cc:3795
> 3795  *static_cast(NULL) = 1;
> (gdb) bt
> #0  testing::UnitTest::AddTestPartResult (this=0x154dac0 
> , 
> result_type=testing::TestPartResult::kFatalFailure, file_name=0xeb6b6c 
> "../../src/tests/slave_recovery_tests.cpp", line_number=2807, message=..., 
> os_stack_trace=...) at gmock-1.6.0/gtest/src/gtest.cc:3795
> #1  0x00df98b9 in testing::internal::AssertHelper::operator= 
> (this=0x7fffb860, message=...) at gmock-1.6.0/gtest/src/gtest.cc:356
> #2  0x00cdfa57 in 
> SlaveRecoveryTest_MultipleFrameworks_Test::TestBody
>  (this=0x1954db0) at ../../src/tests/slave_recovery_tests.cpp:2807
> #3  0x00e22583 in 
> testing::internal::HandleSehExceptionsInMethodIfSupported void> (object=0x1954db0, method=&virtual testing::Test::TestBody(), 
> location=0xed0af0 "the test body") at gmock-1.6.0/gtest/src/gtest.cc:2090
> #4  0x00e12467 in 
> testing::internal::HandleExceptionsInMethodIfSupported 
> (object=0x1954db0, method=&virtual testing::Test::TestBody(), 
> location=0xed0af0 "the test body") at gmock-1.6.0/gtest/src/gtest.cc:2126
> #5  0x00e010d5 in testing::Test::Run (this=0x1954db0) at 
> gmock-1.6.0/gtest/src/gtest.cc:2161
> #6  0x00e01ceb in testing::TestInfo::Run (this=0x158cf80) at 
> gmock-1.6.0/gtest/src/gtest.cc:2338
> #7  0x00e02387 in testing::TestCase::Run (this=0x158a880) at 
> gmock-1.6.0/gtest/src/gtest.cc:2445
> #8  0x00e079ed in testing::internal::UnitTestImpl::RunAllTests 
> (this=0x1558b40) at gmock-1.6.0/gtest/src/gtest.cc:4237
> #9  0x00e1ec83 in 
> testing::internal::HandleSehExceptionsInMethodIfSupported  bool> (object=0x1558b40, method=(bool 
> (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * 
> const)) 0xe07700 , 
> location=0xed1219 "auxiliary test code (environments or event 
> listeners)") at gmock-1.6.0/gtest/src/gtest.cc:2090
> #10 0x00e14217 in 
> testing::internal::HandleExceptionsInMethodIfSupported  bool> (object=0x1558b40, method=(bool 
> (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * 
> const)) 0xe07700 , 
> location=0xed1219 "auxiliary test code (environments or event 
> listeners)") at gmock-1.6.0/gtest/src/gtest.cc:2126
> #11 0x00e076d7 in testing::UnitTest::Run (this=0x154dac0 
> ) at 
> gmock-1.6.0/gtest/src/gtest.cc:3872
> #1

[jira] [Updated] (MESOS-1594) SlaveRecoveryTest/0.ReconcileKillTask is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-1594:
-
Story Points: 1  (was: 2)

> SlaveRecoveryTest/0.ReconcileKillTask is flaky
> --
>
> Key: MESOS-1594
> URL: https://issues.apache.org/jira/browse/MESOS-1594
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
> Environment: Ubuntu 12.10 with GCC
>Reporter: Vinod Kone
>Assignee: Greg Mann
>  Labels: flaky, mesosphere
>
> Observed this on Jenkins.
> {code}
> [ RUN  ] SlaveRecoveryTest/0.ReconcileKillTask
> Using temporary directory '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG'
> I0714 15:08:43.915114 27216 leveldb.cpp:176] Opened db in 474.695188ms
> I0714 15:08:43.933645 27216 leveldb.cpp:183] Compacted db in 18.068942ms
> I0714 15:08:43.934129 27216 leveldb.cpp:198] Created db iterator in 7860ns
> I0714 15:08:43.934439 27216 leveldb.cpp:204] Seeked to beginning of db in 
> 2560ns
> I0714 15:08:43.934779 27216 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1400ns
> I0714 15:08:43.935098 27216 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0714 15:08:43.936027 27238 recover.cpp:425] Starting replica recovery
> I0714 15:08:43.936225 27238 recover.cpp:451] Replica is in EMPTY status
> I0714 15:08:43.936867 27238 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0714 15:08:43.937049 27238 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0714 15:08:43.937232 27238 recover.cpp:542] Updating replica status to 
> STARTING
> I0714 15:08:43.945600 27235 master.cpp:288] Master 
> 20140714-150843-16842879-55850-27216 (quantal) started on 127.0.1.1:55850
> I0714 15:08:43.945643 27235 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0714 15:08:43.945651 27235 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0714 15:08:43.945658 27235 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG/credentials'
> I0714 15:08:43.945808 27235 master.cpp:359] Authorization enabled
> I0714 15:08:43.946369 27235 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> master@127.0.1.1:55850
> I0714 15:08:43.946419 27235 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0714 15:08:43.946614 27235 master.cpp:1128] The newly elected leader is 
> master@127.0.1.1:55850 with id 20140714-150843-16842879-55850-27216
> I0714 15:08:43.946630 27235 master.cpp:1141] Elected as the leading master!
> I0714 15:08:43.946637 27235 master.cpp:959] Recovering from registrar
> I0714 15:08:43.946707 27235 registrar.cpp:313] Recovering registrar
> I0714 15:08:43.957895 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 20.529301ms
> I0714 15:08:43.957978 27238 replica.cpp:320] Persisted replica status to 
> STARTING
> I0714 15:08:43.958142 27238 recover.cpp:451] Replica is in STARTING status
> I0714 15:08:43.958664 27238 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0714 15:08:43.958762 27238 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0714 15:08:43.958945 27238 recover.cpp:542] Updating replica status to VOTING
> I0714 15:08:43.975685 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.646136ms
> I0714 15:08:43.976367 27238 replica.cpp:320] Persisted replica status to 
> VOTING
> I0714 15:08:43.976824 27241 recover.cpp:556] Successfully joined the Paxos 
> group
> I0714 15:08:43.977072 27242 recover.cpp:440] Recover process terminated
> I0714 15:08:43.980590 27236 log.cpp:656] Attempting to start the writer
> I0714 15:08:43.981385 27236 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0714 15:08:43.999141 27236 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 17.705787ms
> I0714 15:08:43.999222 27236 replica.cpp:342] Persisted promised to 1
> I0714 15:08:44.004451 27240 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0714 15:08:44.004914 27240 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0714 15:08:44.021456 27240 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 16.499775ms
> I0714 15:08:44.021533 27240 replica.cpp:676] Persisted action at 0
> I0714 15:08:44.022006 27240 replica.cpp:508] Replica received write request 
> for position 0
> I0714 15:08:44.022043 27240 leveldb.cpp:438] Reading position from leveldb 
> took 21376ns
> I0714 15:08:44.035969 27240 leveldb.cpp:343] Persisting action (14 by

[jira] [Updated] (MESOS-1594) SlaveRecoveryTest/0.ReconcileKillTask is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-1594:
-
Shepherd: Vinod Kone

> SlaveRecoveryTest/0.ReconcileKillTask is flaky
> --
>
> Key: MESOS-1594
> URL: https://issues.apache.org/jira/browse/MESOS-1594
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
> Environment: Ubuntu 12.10 with GCC
>Reporter: Vinod Kone
>Assignee: Greg Mann
>  Labels: flaky, mesosphere
>
> Observed this on Jenkins.
> {code}
> [ RUN  ] SlaveRecoveryTest/0.ReconcileKillTask
> Using temporary directory '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG'
> I0714 15:08:43.915114 27216 leveldb.cpp:176] Opened db in 474.695188ms
> I0714 15:08:43.933645 27216 leveldb.cpp:183] Compacted db in 18.068942ms
> I0714 15:08:43.934129 27216 leveldb.cpp:198] Created db iterator in 7860ns
> I0714 15:08:43.934439 27216 leveldb.cpp:204] Seeked to beginning of db in 
> 2560ns
> I0714 15:08:43.934779 27216 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1400ns
> I0714 15:08:43.935098 27216 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0714 15:08:43.936027 27238 recover.cpp:425] Starting replica recovery
> I0714 15:08:43.936225 27238 recover.cpp:451] Replica is in EMPTY status
> I0714 15:08:43.936867 27238 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0714 15:08:43.937049 27238 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0714 15:08:43.937232 27238 recover.cpp:542] Updating replica status to 
> STARTING
> I0714 15:08:43.945600 27235 master.cpp:288] Master 
> 20140714-150843-16842879-55850-27216 (quantal) started on 127.0.1.1:55850
> I0714 15:08:43.945643 27235 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0714 15:08:43.945651 27235 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0714 15:08:43.945658 27235 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG/credentials'
> I0714 15:08:43.945808 27235 master.cpp:359] Authorization enabled
> I0714 15:08:43.946369 27235 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> master@127.0.1.1:55850
> I0714 15:08:43.946419 27235 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0714 15:08:43.946614 27235 master.cpp:1128] The newly elected leader is 
> master@127.0.1.1:55850 with id 20140714-150843-16842879-55850-27216
> I0714 15:08:43.946630 27235 master.cpp:1141] Elected as the leading master!
> I0714 15:08:43.946637 27235 master.cpp:959] Recovering from registrar
> I0714 15:08:43.946707 27235 registrar.cpp:313] Recovering registrar
> I0714 15:08:43.957895 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 20.529301ms
> I0714 15:08:43.957978 27238 replica.cpp:320] Persisted replica status to 
> STARTING
> I0714 15:08:43.958142 27238 recover.cpp:451] Replica is in STARTING status
> I0714 15:08:43.958664 27238 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0714 15:08:43.958762 27238 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0714 15:08:43.958945 27238 recover.cpp:542] Updating replica status to VOTING
> I0714 15:08:43.975685 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.646136ms
> I0714 15:08:43.976367 27238 replica.cpp:320] Persisted replica status to 
> VOTING
> I0714 15:08:43.976824 27241 recover.cpp:556] Successfully joined the Paxos 
> group
> I0714 15:08:43.977072 27242 recover.cpp:440] Recover process terminated
> I0714 15:08:43.980590 27236 log.cpp:656] Attempting to start the writer
> I0714 15:08:43.981385 27236 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0714 15:08:43.999141 27236 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 17.705787ms
> I0714 15:08:43.999222 27236 replica.cpp:342] Persisted promised to 1
> I0714 15:08:44.004451 27240 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0714 15:08:44.004914 27240 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0714 15:08:44.021456 27240 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 16.499775ms
> I0714 15:08:44.021533 27240 replica.cpp:676] Persisted action at 0
> I0714 15:08:44.022006 27240 replica.cpp:508] Replica received write request 
> for position 0
> I0714 15:08:44.022043 27240 leveldb.cpp:438] Reading position from leveldb 
> took 21376ns
> I0714 15:08:44.035969 27240 leveldb.cpp:343] Persisting action (14 bytes) 

[jira] [Updated] (MESOS-3273) EventCall Test Framework is flaky

2016-01-15 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3273:
--
Assignee: Vinod Kone
  Sprint: Mesosphere Sprint 27
Story Points: 5

> EventCall Test Framework is flaky
> -
>
> Key: MESOS-3273
> URL: https://issues.apache.org/jira/browse/MESOS-3273
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>  Labels: flaky-test, tech-debt, twitter
> Attachments: asan.log
>
>
> Observed this on ASF CI. h/t [~haosd...@gmail.com]
> Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master.
> {code}
> [ RUN  ] ExamplesTest.EventCallFramework
> Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx'
> I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the 
> driver is aborted!
> Shutting down
> Sending SIGTERM to process tree at pid 26061
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26062
> Shutting down
> Killing the following process trees:
> [ 
> ]
> Sending SIGTERM to process tree at pid 26063
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26098
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26099
> Killing the following process trees:
> [ 
> ]
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on 
> 172.17.2.10:60249 for 16 cpus
> I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR
> I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0
> I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms
> I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms
> I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns
> I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in 
> 8429ns
> I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 4219ns
> I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery
> I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status
> I0813 19:55:17.181970 26126 master.cpp:378] Master 
> 20150813-195517-167907756-60249-26100 (297daca2d01a) started on 
> 172.17.2.10:60249
> I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: 
> --acls="permissive: false
> register_frameworks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   roles {
> type: SOME
> values: "*"
>   }
> }
> run_tasks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   users {
> type: SOME
> values: "mesos"
>   }
> }
> " --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" 
> --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" 
> --zk_session_timeout="10secs"
> I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated 
> frameworks to register
> I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated 
> slaves to register
> I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials'
> W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials 
> file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. 
> It is recommended that your credentials file is NOT accessible by others.
> I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0813 19:55:17.184306 26126 master.cpp:469] Using default 'crammd5' 
> authenticator
> I0813 19:55:17.184661 26126 authenticator

[jira] [Assigned] (MESOS-1365) SlaveRecoveryTest/0.MultipleFrameworks is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-1365:


Assignee: Greg Mann

> SlaveRecoveryTest/0.MultipleFrameworks is flaky
> ---
>
> Key: MESOS-1365
> URL: https://issues.apache.org/jira/browse/MESOS-1365
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Dominic Hamon
>Assignee: Greg Mann
>Priority: Minor
>  Labels: flaky, flaky-test
>
> --gtest_repeat=-1 --gtest_shuffle --gtest_break_on_failure
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.MultipleFrameworks
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0513 15:42:05.931761  4320 exec.cpp:131] Version: 0.19.0
> I0513 15:42:05.936698  4340 exec.cpp:205] Executor registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Registered executor on artoo
> Starting task 51991f97-f5fd-4905-ad0f-02668083af7c
> Forked command at 4367
> sh -c 'sleep 1000'
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0513 15:42:06.915061  4408 exec.cpp:131] Version: 0.19.0
> I0513 15:42:06.931149  4435 exec.cpp:205] Executor registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Registered executor on artoo
> Starting task eaf5d8d6-3a6c-4ee1-84c1-fae20fb1df83
> sh -c 'sleep 1000'
> Forked command at 4439
> I0513 15:42:06.998332  4340 exec.cpp:251] Received reconnect request from 
> slave 20140513-154204-16842879-51872-13062-0
> I0513 15:42:06.998414  4436 exec.cpp:251] Received reconnect request from 
> slave 20140513-154204-16842879-51872-13062-0
> I0513 15:42:07.006350  4437 exec.cpp:228] Executor re-registered on slave 
> 20140513-154204-16842879-51872-13062-0
> Re-registered executor on artoo
> I0513 15:42:07.027039  4337 exec.cpp:378] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 4367
> Killing the following process trees:
> [ 
> -+- 4367 sh -c sleep 1000 
>  \--- 4368 sleep 1000 
> ]
> ../../src/tests/slave_recovery_tests.cpp:2807: Failure
> Value of: status1.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_KILLED
> Program received signal SIGSEGV, Segmentation fault.
> testing::UnitTest::AddTestPartResult (this=0x154dac0 
> , 
> result_type=testing::TestPartResult::kFatalFailure, file_name=0xeb6b6c 
> "../../src/tests/slave_recovery_tests.cpp", line_number=2807, message=..., 
> os_stack_trace=...) at gmock-1.6.0/gtest/src/gtest.cc:3795
> 3795  *static_cast(NULL) = 1;
> (gdb) bt
> #0  testing::UnitTest::AddTestPartResult (this=0x154dac0 
> , 
> result_type=testing::TestPartResult::kFatalFailure, file_name=0xeb6b6c 
> "../../src/tests/slave_recovery_tests.cpp", line_number=2807, message=..., 
> os_stack_trace=...) at gmock-1.6.0/gtest/src/gtest.cc:3795
> #1  0x00df98b9 in testing::internal::AssertHelper::operator= 
> (this=0x7fffb860, message=...) at gmock-1.6.0/gtest/src/gtest.cc:356
> #2  0x00cdfa57 in 
> SlaveRecoveryTest_MultipleFrameworks_Test::TestBody
>  (this=0x1954db0) at ../../src/tests/slave_recovery_tests.cpp:2807
> #3  0x00e22583 in 
> testing::internal::HandleSehExceptionsInMethodIfSupported void> (object=0x1954db0, method=&virtual testing::Test::TestBody(), 
> location=0xed0af0 "the test body") at gmock-1.6.0/gtest/src/gtest.cc:2090
> #4  0x00e12467 in 
> testing::internal::HandleExceptionsInMethodIfSupported 
> (object=0x1954db0, method=&virtual testing::Test::TestBody(), 
> location=0xed0af0 "the test body") at gmock-1.6.0/gtest/src/gtest.cc:2126
> #5  0x00e010d5 in testing::Test::Run (this=0x1954db0) at 
> gmock-1.6.0/gtest/src/gtest.cc:2161
> #6  0x00e01ceb in testing::TestInfo::Run (this=0x158cf80) at 
> gmock-1.6.0/gtest/src/gtest.cc:2338
> #7  0x00e02387 in testing::TestCase::Run (this=0x158a880) at 
> gmock-1.6.0/gtest/src/gtest.cc:2445
> #8  0x00e079ed in testing::internal::UnitTestImpl::RunAllTests 
> (this=0x1558b40) at gmock-1.6.0/gtest/src/gtest.cc:4237
> #9  0x00e1ec83 in 
> testing::internal::HandleSehExceptionsInMethodIfSupported  bool> (object=0x1558b40, method=(bool 
> (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * 
> const)) 0xe07700 , 
> location=0xed1219 "auxiliary test code (environments or event 
> listeners)") at gmock-1.6.0/gtest/src/gtest.cc:2090
> #10 0x00e14217 in 
> testing::internal::HandleExceptionsInMethodIfSupported  bool> (object=0x1558b40, method=(bool 
> (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * 
> const)) 0xe07700 , 
> location=0xed1219 "auxiliary test code (environments or event 
> listeners)") at gmock-1.6.0/gtest/src/gtest.cc:2126
> #11 0x00e076d7 in testing::UnitTest::Run (this=0x154dac0 
> ) at 
> gmock-1.6.0/gtest/src/gtest.cc:3872
> #12 0x000
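
For context on the backtrace above: the real failure is the expectation on {{status1}} (TASK_FAILED instead of TASK_KILLED); the SIGSEGV that follows is gtest's {{--gtest_break_on_failure}} deliberately writing through a NULL pointer (gtest.cc:3795) so that gdb stops at the failing assertion. A minimal, self-contained C++ sketch of that mechanism follows; it is illustrative only, not Mesos or gtest source, and the function names are placeholders.

{code}
// Sketch of the --gtest_break_on_failure behavior seen at gtest.cc:3795:
// report the failure, then intentionally dereference NULL so an attached
// debugger halts at the point of failure instead of continuing.
#include <iostream>

void reportFailure(bool breakOnFailure, const char* message) {
  std::cerr << message << std::endl;
  if (breakOnFailure) {
    // Intentional crash: this is what produces the SIGSEGV in the backtrace.
    *static_cast<volatile int*>(nullptr) = 1;
  }
}

int main() {
  // Pass 'true' to reproduce the deliberate SIGSEGV under a debugger.
  reportFailure(false,
                "Value of: status1.get().state()\n"
                "  Actual: TASK_FAILED\n"
                "Expected: TASK_KILLED");
  return 0;
}
{code}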

[jira] [Updated] (MESOS-1594) SlaveRecoveryTest/0.ReconcileKillTask is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-1594:
-
  Sprint: Mesosphere Sprint 27
Story Points: 2
  Labels: flaky mesosphere  (was: flaky)

> SlaveRecoveryTest/0.ReconcileKillTask is flaky
> --
>
> Key: MESOS-1594
> URL: https://issues.apache.org/jira/browse/MESOS-1594
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
> Environment: Ubuntu 12.10 with GCC
>Reporter: Vinod Kone
>Assignee: Greg Mann
>  Labels: flaky, mesosphere
>
> Observed this on Jenkins.
> {code}
> [ RUN  ] SlaveRecoveryTest/0.ReconcileKillTask
> Using temporary directory '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG'
> I0714 15:08:43.915114 27216 leveldb.cpp:176] Opened db in 474.695188ms
> I0714 15:08:43.933645 27216 leveldb.cpp:183] Compacted db in 18.068942ms
> I0714 15:08:43.934129 27216 leveldb.cpp:198] Created db iterator in 7860ns
> I0714 15:08:43.934439 27216 leveldb.cpp:204] Seeked to beginning of db in 
> 2560ns
> I0714 15:08:43.934779 27216 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1400ns
> I0714 15:08:43.935098 27216 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0714 15:08:43.936027 27238 recover.cpp:425] Starting replica recovery
> I0714 15:08:43.936225 27238 recover.cpp:451] Replica is in EMPTY status
> I0714 15:08:43.936867 27238 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0714 15:08:43.937049 27238 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0714 15:08:43.937232 27238 recover.cpp:542] Updating replica status to 
> STARTING
> I0714 15:08:43.945600 27235 master.cpp:288] Master 
> 20140714-150843-16842879-55850-27216 (quantal) started on 127.0.1.1:55850
> I0714 15:08:43.945643 27235 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0714 15:08:43.945651 27235 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0714 15:08:43.945658 27235 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG/credentials'
> I0714 15:08:43.945808 27235 master.cpp:359] Authorization enabled
> I0714 15:08:43.946369 27235 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> master@127.0.1.1:55850
> I0714 15:08:43.946419 27235 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0714 15:08:43.946614 27235 master.cpp:1128] The newly elected leader is 
> master@127.0.1.1:55850 with id 20140714-150843-16842879-55850-27216
> I0714 15:08:43.946630 27235 master.cpp:1141] Elected as the leading master!
> I0714 15:08:43.946637 27235 master.cpp:959] Recovering from registrar
> I0714 15:08:43.946707 27235 registrar.cpp:313] Recovering registrar
> I0714 15:08:43.957895 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 20.529301ms
> I0714 15:08:43.957978 27238 replica.cpp:320] Persisted replica status to 
> STARTING
> I0714 15:08:43.958142 27238 recover.cpp:451] Replica is in STARTING status
> I0714 15:08:43.958664 27238 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0714 15:08:43.958762 27238 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0714 15:08:43.958945 27238 recover.cpp:542] Updating replica status to VOTING
> I0714 15:08:43.975685 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.646136ms
> I0714 15:08:43.976367 27238 replica.cpp:320] Persisted replica status to 
> VOTING
> I0714 15:08:43.976824 27241 recover.cpp:556] Successfully joined the Paxos 
> group
> I0714 15:08:43.977072 27242 recover.cpp:440] Recover process terminated
> I0714 15:08:43.980590 27236 log.cpp:656] Attempting to start the writer
> I0714 15:08:43.981385 27236 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0714 15:08:43.999141 27236 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 17.705787ms
> I0714 15:08:43.999222 27236 replica.cpp:342] Persisted promised to 1
> I0714 15:08:44.004451 27240 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0714 15:08:44.004914 27240 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0714 15:08:44.021456 27240 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 16.499775ms
> I0714 15:08:44.021533 27240 replica.cpp:676] Persisted action at 0
> I0714 15:08:44.022006 27240 replica.cpp:508] Replica received write request 
> for position 0
> I0714 15:08:44.022043 27240 leveldb.cpp:438] Reading position from leveldb 
> took 2

[jira] [Assigned] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail

2016-01-15 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-2017:
--

Assignee: Kevin Klues  (was: Yan Xu)

> Segfault with "Pure virtual method called" when tests fail
> --
>
> Key: MESOS-2017
> URL: https://issues.apache.org/jira/browse/MESOS-2017
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Yan Xu
>Assignee: Kevin Klues
>  Labels: twitter
>
> The most recent one:
> {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
> I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
> I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
> I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
> I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
> 2018ns
> I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 335ns
> I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
> I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
> I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
> STARTING
> I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 591981ns
> I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
> STARTING
> I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
> I1030 05:55:06.940820 24489 master.cpp:312] Master 
> 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
> 67.195.81.187:40429
> I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
> I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
> I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.187:40429
> I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
> master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
> I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
> I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
> I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
> I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 536365ns
> I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
> VOTING
> I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
> group
> I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
> I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
> I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 806463ns
> I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
> I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 603843ns
> I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
> I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
> for position 0
> I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb 
> took 28437ns
> I1030 05:55:06.952896 2447
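
For readers unfamiliar with the symptom in the summary: "Pure virtual method called" followed by an abort or crash typically means a virtual call reached an abstract base while the derived part of the object was not alive, for example during destruction. Below is a minimal sketch of one common way this happens; it is not Mesos code and the class names are placeholders.

{code}
// During ~Derived -> ~Base the dynamic type reverts to Base, so a virtual
// call dispatched from the base destructor lands on the pure virtual slot
// and the C++ runtime aborts with "pure virtual method called".
#include <iostream>

struct Base {
  virtual ~Base() { shutdown(); }     // Indirect virtual call from the dtor...
  void shutdown() { onShutdown(); }   // ...via a non-virtual helper.
  virtual void onShutdown() = 0;      // Pure virtual.
};

struct Derived : Base {
  void onShutdown() override { std::cout << "clean shutdown" << std::endl; }
};

int main() {
  Derived d;
  return 0;  // On destruction: "pure virtual method called", then terminate.
}
{code}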

[jira] [Commented] (MESOS-3987) /create-volumes, /destroy-volumes should be permissive under a master without authentication.

2016-01-15 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102119#comment-15102119
 ] 

Greg Mann commented on MESOS-3987:
--

It seems that this ticket is unnecessary: these endpoints already exhibit 
permissive behavior when authorization is disabled. Closing the ticket and 
opening MESOS-4395 to create tests for this case.
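
For reference, a minimal sketch of the "permissive when authorization is disabled" pattern described above; this is illustrative only, not the Mesos implementation, and the {{Authorizer}} and handler names are placeholders.

{code}
#include <iostream>
#include <optional>
#include <string>

// Placeholder authorizer: stands in for whatever ACL check is configured.
struct Authorizer {
  bool authorized(const std::string& principal) const {
    return principal == "ops";
  }
};

// If no authorizer is configured (authorization disabled), the request is
// allowed through; otherwise it is checked against the authorizer.
bool authorizeCreateVolumes(const std::optional<Authorizer>& authorizer,
                            const std::string& principal) {
  if (!authorizer) {
    return true;  // Permissive: no authorization configured.
  }
  return authorizer->authorized(principal);
}

int main() {
  std::cout << authorizeCreateVolumes(std::nullopt, "anyone") << std::endl;  // 1
  std::cout << authorizeCreateVolumes(Authorizer{}, "eve") << std::endl;     // 0
  return 0;
}
{code}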

> /create-volumes, /destroy-volumes should be permissive under a master without 
> authentication.
> -
>
> Key: MESOS-3987
> URL: https://issues.apache.org/jira/browse/MESOS-3987
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: authentication, mesosphere, persistent-volumes
>
> See MESOS-3940 for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail

2016-01-15 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-2017:
---
Shepherd: Benjamin Mahler

> Segfault with "Pure virtual method called" when tests fail
> --
>
> Key: MESOS-2017
> URL: https://issues.apache.org/jira/browse/MESOS-2017
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Yan Xu
>Assignee: Kevin Klues
>  Labels: mesosphere, tests
>
> The most recent one:
> {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
> I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
> I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
> I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
> I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
> 2018ns
> I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 335ns
> I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
> I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
> I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
> STARTING
> I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 591981ns
> I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
> STARTING
> I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
> I1030 05:55:06.940820 24489 master.cpp:312] Master 
> 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
> 67.195.81.187:40429
> I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
> I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
> I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.187:40429
> I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
> master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
> I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
> I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
> I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
> I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 536365ns
> I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
> VOTING
> I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
> group
> I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
> I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
> I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 806463ns
> I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
> I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 603843ns
> I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
> I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
> for position 0
> I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb 
> took 28437ns
> I1030 05:55:06.952896 24476 leveld
