[jira] [Commented] (MESOS-6082) Add scheduler Call and Event based metrics to the master.

2017-01-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824812#comment-15824812
 ] 

Anand Mazumdar commented on MESOS-6082:
---

[~a10gupta] Did you get a chance to update the review based on comments from 
[~zhitao]?

> Add scheduler Call and Event based metrics to the master.
> -
>
> Key: MESOS-6082
> URL: https://issues.apache.org/jira/browse/MESOS-6082
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Abhishek Dasgupta
>Priority: Critical
>
> Currently, the master only has metrics for the old-style messages and these 
> are re-used for calls unfortunately:
> {code}
>   // Messages from schedulers.
>   process::metrics::Counter messages_register_framework;
>   process::metrics::Counter messages_reregister_framework;
>   process::metrics::Counter messages_unregister_framework;
>   process::metrics::Counter messages_deactivate_framework;
>   process::metrics::Counter messages_kill_task;
>   process::metrics::Counter messages_status_update_acknowledgement;
>   process::metrics::Counter messages_resource_request;
>   process::metrics::Counter messages_launch_tasks;
>   process::metrics::Counter messages_decline_offers;
>   process::metrics::Counter messages_revive_offers;
>   process::metrics::Counter messages_suppress_offers;
>   process::metrics::Counter messages_reconcile_tasks;
>   process::metrics::Counter messages_framework_to_executor;
> {code}
> Now that we've introduced the Call/Event based API, we should have metrics 
> that reflect this. For example:
> {code}
> {
>   scheduler/calls: 100,
>   scheduler/calls/decline: 90,
>   scheduler/calls/accept: 10,
>   scheduler/calls/accept/operations/create: 1,
>   scheduler/calls/accept/operations/destroy: 0,
>   scheduler/calls/accept/operations/launch: 4,
>   scheduler/calls/accept/operations/launch_group: 2,
>   scheduler/calls/accept/operations/reserve: 1,
>   scheduler/calls/accept/operations/unreserve: 0,
>   scheduler/calls/kill: 0,
>   // etc
> }
> {code}
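
A minimal sketch of how the hierarchical counter names proposed above could be maintained, assuming a plain in-memory map rather than libprocess's `process::metrics::Counter`. The `CallMetrics` class and its method names are hypothetical and exist only for illustration; they are not part of Mesos.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a set of per-call counters. This is NOT the
// Mesos `process::metrics::Counter` API; it only illustrates the naming.
class CallMetrics
{
public:
  // Records one scheduler call of the given type, e.g. "accept".
  // Both the aggregate and the per-type counter are incremented.
  void recordCall(const std::string& type)
  {
    ++counters["scheduler/calls"];
    ++counters["scheduler/calls/" + type];
  }

  // Records one operation carried by an ACCEPT call, e.g. "launch".
  void recordAcceptOperation(const std::string& operation)
  {
    ++counters["scheduler/calls/accept/operations/" + operation];
  }

  // Returns the current value of a counter, or 0 if it was never touched.
  std::uint64_t get(const std::string& name) const
  {
    auto it = counters.find(name);
    return it == counters.end() ? 0 : it->second;
  }

private:
  std::map<std::string, std::uint64_t> counters;
};
```

Keeping the aggregate `scheduler/calls` counter separate from the per-type counters means a reader of the metrics does not have to sum the children to get the total.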



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6082) Add scheduler Call and Event based metrics to the master.

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824802#comment-15824802
 ] 

Adam B commented on MESOS-6082:
---

[~anandmazumdar] As this JIRA's Shepherd, could you please review the RRs so 
we can hopefully land this for Mesos 1.2?

> Add scheduler Call and Event based metrics to the master.
> -
>
> Key: MESOS-6082
> URL: https://issues.apache.org/jira/browse/MESOS-6082
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Abhishek Dasgupta
>Priority: Critical
>
> Currently, the master only has metrics for the old-style messages and these 
> are re-used for calls unfortunately:
> {code}
>   // Messages from schedulers.
>   process::metrics::Counter messages_register_framework;
>   process::metrics::Counter messages_reregister_framework;
>   process::metrics::Counter messages_unregister_framework;
>   process::metrics::Counter messages_deactivate_framework;
>   process::metrics::Counter messages_kill_task;
>   process::metrics::Counter messages_status_update_acknowledgement;
>   process::metrics::Counter messages_resource_request;
>   process::metrics::Counter messages_launch_tasks;
>   process::metrics::Counter messages_decline_offers;
>   process::metrics::Counter messages_revive_offers;
>   process::metrics::Counter messages_suppress_offers;
>   process::metrics::Counter messages_reconcile_tasks;
>   process::metrics::Counter messages_framework_to_executor;
> {code}
> Now that we've introduced the Call/Event based API, we should have metrics 
> that reflect this. For example:
> {code}
> {
>   scheduler/calls: 100,
>   scheduler/calls/decline: 90,
>   scheduler/calls/accept: 10,
>   scheduler/calls/accept/operations/create: 1,
>   scheduler/calls/accept/operations/destroy: 0,
>   scheduler/calls/accept/operations/launch: 4,
>   scheduler/calls/accept/operations/launch_group: 2,
>   scheduler/calls/accept/operations/reserve: 1,
>   scheduler/calls/accept/operations/unreserve: 0,
>   scheduler/calls/kill: 0,
>   // etc
> }
> {code}





[jira] [Commented] (MESOS-6804) Running 'tty' inside a debug container that has a tty reports "Not a tty"

2017-01-16 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824801#comment-15824801
 ] 

Kevin Klues commented on MESOS-6804:


I spent multiple days on this last week and it's proven to be much more 
complicated than originally anticipated. From talking to [~vi...@twitter.com] I 
think the plan is to actually remove this as a blocker and add a note in the 
CHANGELOG that this is a known "Critical Bug".

> Running 'tty' inside a debug container that has a tty reports "Not a tty"
> -
>
> Key: MESOS-6804
> URL: https://issues.apache.org/jira/browse/MESOS-6804
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: debugging, mesosphere
>
> We need to inject `/dev/console` into the container and map it to the slave 
> end of the TTY we are attached to.





[jira] [Commented] (MESOS-6551) Add attach/exec commands to the Mesos CLI

2017-01-16 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824800#comment-15824800
 ] 

Kevin Klues commented on MESOS-6551:


I would have liked to do it by 1.2, but it never got prioritized and I'm not 
sure I will have time to do it in the next week.

> Add attach/exec commands to the Mesos CLI
> -
>
> Key: MESOS-6551
> URL: https://issues.apache.org/jira/browse/MESOS-6551
> Project: Mesos
>  Issue Type: Task
>  Components: cli
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: debugging, mesosphere
>
> After all of this support has landed, we need to update the Mesos CLI to 
> implement {{attach}} and {{exec}} functionality as outlined in the Design Doc





[jira] [Commented] (MESOS-6623) Re-enable tests impacted by request streaming support

2017-01-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824797#comment-15824797
 ] 

Anand Mazumdar commented on MESOS-6623:
---

Removing the target version. I don't have any cycles in the coming week since I 
already have a couple of other 1.2 blockers on my plate.

> Re-enable tests impacted by request streaming support
> -
>
> Key: MESOS-6623
> URL: https://issues.apache.org/jira/browse/MESOS-6623
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, tests
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> We added support for HTTP request streaming in libprocess as part of 
> MESOS-6466. However, this broke a few tests that relied on HTTP request 
> filtering since the handlers no longer have access to the body of the request 
> when {{visit()}} is invoked. We would need to revisit how we do HTTP request 
> filtering and then re-enable these tests.





[jira] [Commented] (MESOS-5931) Support auto backend in Unified Containerizer.

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824798#comment-15824798
 ] 

Adam B commented on MESOS-5931:
---

Ping. Looks like we still need a review from the Shepherd, [~jieyu]

> Support auto backend in Unified Containerizer.
> --
>
> Key: MESOS-5931
> URL: https://issues.apache.org/jira/browse/MESOS-5931
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: backend, containerizer, mesosphere
>
> Currently in Unified Containerizer, copy backend will be selected by default. 
> This is not ideal, especially for production environments. It would take a 
> long time to prepare a huge container image by copying it from the store to 
> the provisioner.
> Ideally, we should support `auto backend`, which would 
> automatically/intelligently select the best/optimal backend for image 
> provisioner if the user does not specify one via the agent flag.
> We should have a logic design first in this ticket, to determine how we want 
> to choose the right backend (e.g., overlayfs or aufs should be preferred if 
> available from the kernel).
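
A minimal sketch of the selection logic described above, under stated assumptions: the backend names `overlay`, `aufs`, and `copy`, the preference order (overlay first, then aufs), and the `selectBackend` helper are all hypothetical here, since the ticket leaves the actual design open.

```cpp
#include <set>
#include <string>

// Hypothetical helper: picks an image provisioner backend from the set of
// filesystems the kernel reports as supported. The names and the ordering
// are assumptions for illustration, not the agreed Mesos design.
std::string selectBackend(const std::set<std::string>& supportedFilesystems)
{
  // Prefer union filesystems, which avoid copying the whole image.
  if (supportedFilesystems.count("overlay") > 0) {
    return "overlay";
  }

  if (supportedFilesystems.count("aufs") > 0) {
    return "aufs";
  }

  // Fall back to copying, which works everywhere but is slow for
  // large images.
  return "copy";
}
```

An explicit agent flag, when set, would simply bypass this function, so the automatic choice only applies when the user has not specified a backend.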





[jira] [Commented] (MESOS-6419) The 'master/teardown' endpoint should support tearing down 'unregistered_frameworks'.

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824796#comment-15824796
 ] 

Adam B commented on MESOS-6419:
---

[~neilc], [~vinodkone] I removed the 1.1.1 TargetVersion, since I doubt we'll 
backport this much code.
Since all those patches have already landed in master, is there anything else 
we need to do for Mesos 1.2? If not, let's resolve this ticket.

> The 'master/teardown' endpoint should support tearing down 
> 'unregistered_frameworks'.
> -
>
> Key: MESOS-6419
> URL: https://issues.apache.org/jira/browse/MESOS-6419
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.26.2, 0.27.3, 0.28.2, 1.0.1
>Reporter: Gilbert Song
>Assignee: Neil Conway
>Priority: Critical
>  Labels: endpoint, master
> Fix For: 1.2.0
>
>
> This issue is exposed from 
> [MESOS-6400](https://issues.apache.org/jira/browse/MESOS-6400). When a user 
> is trying to tear down an 'unregistered_framework' from the 'master/teardown' 
> endpoint, a bad request will be returned: `No framework found with specified 
> ID`.
> Ideally, we should support tearing down an unregistered framework, since 
> those frameworks may occur due to network partition, then all the orphan 
> tasks still occupy the resources. It would be a nightmare if a user has to 
> wait until the unregistered framework to get those resources back.
> This may be the initial implementation: 
> https://github.com/apache/mesos/commit/bb8375975e92ee722befb478ddc3b2541d1ccaa9





[jira] [Updated] (MESOS-6623) Re-enable tests impacted by request streaming support

2017-01-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6623:
--
Target Version/s:   (was: 1.2.0)

> Re-enable tests impacted by request streaming support
> -
>
> Key: MESOS-6623
> URL: https://issues.apache.org/jira/browse/MESOS-6623
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, tests
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> We added support for HTTP request streaming in libprocess as part of 
> MESOS-6466. However, this broke a few tests that relied on HTTP request 
> filtering since the handlers no longer have access to the body of the request 
> when {{visit()}} is invoked. We would need to revisit how we do HTTP request 
> filtering and then re-enable these tests.





[jira] [Updated] (MESOS-6419) The 'master/teardown' endpoint should support tearing down 'unregistered_frameworks'.

2017-01-16 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6419:
--
Target Version/s: 1.2.0  (was: 1.1.1, 1.2.0)

> The 'master/teardown' endpoint should support tearing down 
> 'unregistered_frameworks'.
> -
>
> Key: MESOS-6419
> URL: https://issues.apache.org/jira/browse/MESOS-6419
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.26.2, 0.27.3, 0.28.2, 1.0.1
>Reporter: Gilbert Song
>Assignee: Neil Conway
>Priority: Critical
>  Labels: endpoint, master
> Fix For: 1.2.0
>
>
> This issue is exposed from 
> [MESOS-6400](https://issues.apache.org/jira/browse/MESOS-6400). When a user 
> is trying to tear down an 'unregistered_framework' from the 'master/teardown' 
> endpoint, a bad request will be returned: `No framework found with specified 
> ID`.
> Ideally, we should support tearing down an unregistered framework, since 
> those frameworks may occur due to network partition, then all the orphan 
> tasks still occupy the resources. It would be a nightmare if a user has to 
> wait until the unregistered framework to get those resources back.
> This may be the initial implementation: 
> https://github.com/apache/mesos/commit/bb8375975e92ee722befb478ddc3b2541d1ccaa9





[jira] [Updated] (MESOS-6419) The 'master/teardown' endpoint should support tearing down 'unregistered_frameworks'.

2017-01-16 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6419:
--
Fix Version/s: 1.2.0

> The 'master/teardown' endpoint should support tearing down 
> 'unregistered_frameworks'.
> -
>
> Key: MESOS-6419
> URL: https://issues.apache.org/jira/browse/MESOS-6419
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.26.2, 0.27.3, 0.28.2, 1.0.1
>Reporter: Gilbert Song
>Assignee: Neil Conway
>Priority: Critical
>  Labels: endpoint, master
> Fix For: 1.2.0
>
>
> This issue is exposed from 
> [MESOS-6400](https://issues.apache.org/jira/browse/MESOS-6400). When a user 
> is trying to tear down an 'unregistered_framework' from the 'master/teardown' 
> endpoint, a bad request will be returned: `No framework found with specified 
> ID`.
> Ideally, we should support tearing down an unregistered framework, since 
> those frameworks may occur due to network partition, then all the orphan 
> tasks still occupy the resources. It would be a nightmare if a user has to 
> wait until the unregistered framework to get those resources back.
> This may be the initial implementation: 
> https://github.com/apache/mesos/commit/bb8375975e92ee722befb478ddc3b2541d1ccaa9





[jira] [Updated] (MESOS-6405) Benchmark call ingestion path on the Mesos master.

2017-01-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6405:
--
Target Version/s:   (was: 1.2.0)

> Benchmark call ingestion path on the Mesos master.
> --
>
> Key: MESOS-6405
> URL: https://issues.apache.org/jira/browse/MESOS-6405
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, scheduler api
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> [~drexin] reported on the user mailing 
> [list|http://mail-archives.apache.org/mod_mbox/mesos-user/201610.mbox/%3C6B42E374-9AB7--A315-A6558753E08B%40apple.com%3E]
>  that there seems to be a significant regression in performance on the call 
> ingestion path on the Mesos master wrt the scheduler driver (v0 API). 
> We should create a benchmark to first get a sense of the numbers and then go 
> about fixing the performance issues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6623) Re-enable tests impacted by request streaming support

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824791#comment-15824791
 ] 

Adam B commented on MESOS-6623:
---

Ping. Any chance of us getting this into Mesos 1.2 this week/month?

> Re-enable tests impacted by request streaming support
> -
>
> Key: MESOS-6623
> URL: https://issues.apache.org/jira/browse/MESOS-6623
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, tests
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> We added support for HTTP request streaming in libprocess as part of 
> MESOS-6466. However, this broke a few tests that relied on HTTP request 
> filtering since the handlers no longer have access to the body of the request 
> when {{visit()}} is invoked. We would need to revisit how we do HTTP request 
> filtering and then re-enable these tests.





[jira] [Commented] (MESOS-6551) Add attach/exec commands to the Mesos CLI

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824792#comment-15824792
 ] 

Adam B commented on MESOS-6551:
---

Ping. Any chance of us getting this into Mesos 1.2 this week/month?

> Add attach/exec commands to the Mesos CLI
> -
>
> Key: MESOS-6551
> URL: https://issues.apache.org/jira/browse/MESOS-6551
> Project: Mesos
>  Issue Type: Task
>  Components: cli
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: debugging, mesosphere
>
> After all of this support has landed, we need to update the Mesos CLI to 
> implement {{attach}} and {{exec}} functionality as outlined in the Design Doc





[jira] [Commented] (MESOS-6405) Benchmark call ingestion path on the Mesos master.

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824789#comment-15824789
 ] 

Adam B commented on MESOS-6405:
---

[~anandmazumdar], [~vinodkone], this ticket/patch hasn't been updated in 
months. Do you still think we can get it into Mesos 1.2?

> Benchmark call ingestion path on the Mesos master.
> --
>
> Key: MESOS-6405
> URL: https://issues.apache.org/jira/browse/MESOS-6405
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, scheduler api
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> [~drexin] reported on the user mailing 
> [list|http://mail-archives.apache.org/mod_mbox/mesos-user/201610.mbox/%3C6B42E374-9AB7--A315-A6558753E08B%40apple.com%3E]
>  that there seems to be a significant regression in performance on the call 
> ingestion path on the Mesos master wrt the scheduler driver (v0 API). 
> We should create a benchmark to first get a sense of the numbers and then go 
> about fixing the performance issues. 





[jira] [Commented] (MESOS-4766) Improve allocator performance.

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824787#comment-15824787
 ] 

Adam B commented on MESOS-4766:
---

[~mcypark], [~bmahler], is there anything else we want to do here for Mesos 1.2 
this week/month? None of the Unresolved issues in this Epic are targeted to 
1.2, although a few are "Reviewable" (but months old).
If there's nothing more to do here, we can either a) drop/defer the Target 
Version since we're not able to complete the Epic in 1.2, or b) clone the Epic, 
move the remaining issues to the new (1.3) Epic, and close this one as "Done" 
for 1.2.
If there is more work you'd like to do for 1.2, please target those JIRAs 
appropriately. Thanks.

> Improve allocator performance.
> --
>
> Key: MESOS-4766
> URL: https://issues.apache.org/jira/browse/MESOS-4766
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> This is an epic to track the various tickets around improving the performance 
> of the allocator, including the following:
> * Preventing un-necessary backup of the allocator.
> * Reducing the cost of allocations and allocator state updates.
> * Improving performance of the DRF sorter.
> * More benchmarking to simulate scenarios with performance issues.





[jira] [Comment Edited] (MESOS-6040) Add a CMake build for `mesos-port-mapper`

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824777#comment-15824777
 ] 

Adam B edited comment on MESOS-6040 at 1/17/17 12:45 AM:
-

[~jieyu] Could you take a look at this review request? It's been sitting idle 
for almost 2 weeks.


was (Author: adam-mesos):
[~jieyu] Could you take a look at this review request? It's been sti

> Add a CMake build for `mesos-port-mapper`
> -
>
> Key: MESOS-6040
> URL: https://issues.apache.org/jira/browse/MESOS-6040
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Once the port-mapper binary compiles with GNU make, we need to modify the 
> CMake to build the port-mapper binary as well. 





[jira] [Commented] (MESOS-6040) Add a CMake build for `mesos-port-mapper`

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824777#comment-15824777
 ] 

Adam B commented on MESOS-6040:
---

[~jieyu] Could you take a look at this review request? It's been sti

> Add a CMake build for `mesos-port-mapper`
> -
>
> Key: MESOS-6040
> URL: https://issues.apache.org/jira/browse/MESOS-6040
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Once the port-mapper binary compiles with GNU make, we need to modify the 
> CMake to build the port-mapper binary as well. 





[jira] [Commented] (MESOS-3601) Formalize all headers and metadata for HTTP API Event Stream

2017-01-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824773#comment-15824773
 ] 

Anand Mazumdar commented on MESOS-3601:
---

[~adam-mesos] Not much implementation-wise. Though, we have agreed on the 
approach to take based on the two proposed solutions mentioned in the design 
doc. I would update the ticket/design doc shortly with more details on the 
solution and start the implementation from tomorrow!

> Formalize all headers and metadata for HTTP API Event Stream
> 
>
> Key: MESOS-3601
> URL: https://issues.apache.org/jira/browse/MESOS-3601
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.24.0
> Environment: Mesos 0.24.0
>Reporter: Ben Whitehead
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: api, http, mesosphere, wireprotocol
>
> From an HTTP standpoint the current set of headers returned when connecting 
> to the HTTP scheduler API are insufficient. 
> {code:title=current headers}
> HTTP/1.1 200 OK
> Transfer-Encoding: chunked
> Date: Wed, 30 Sep 2015 21:07:16 GMT
> Content-Type: application/json
> {code}
> Since the response from Mesos is intended to function as a stream, 
> {{Connection: keep-alive}} should be specified so that the connection can 
> remain open.
> If RecordIO is going to be applied to the messages, the headers should 
> include the information necessary for a client to be able to detect RecordIO 
> and set up its response handlers appropriately.
> How RecordIO is expressed will come down to the semantics of what is actually 
> "Returned" as the response from {{POST /api/v1/scheduler}}.
> h4. Proposal
> One approach would be to leverage HTTP as much as possible, having a client 
> specify an {{Accept-Encoding}} along with the {{Accept}} header to indicate 
> that it can handle RecordIO {{Content-Encoding}} of {{Content-Type}} 
> messages.  (This approach allows for things like gzip to be woven in fairly 
> easily in the future)
> For this approach I would expect the following:
> {code:title=Request}
> POST /api/v1/scheduler HTTP/1.1
> Host: localhost:5050
> Accept: application/x-protobuf
> Accept-Encoding: recordio
> Content-Type: application/x-protobuf
> Content-Length: 35
> User-Agent: RxNetty Client
> {code}
> {code:title=Response}
> HTTP/1.1 200 OK
> Connection: keep-alive
> Transfer-Encoding: chunked
> Content-Type: application/x-protobuf
> Content-Encoding: recordio
> Cache-Control: no-transform
> {code}
> When Content-Encoding is used it is recommended to set {{Cache-Control: 
> no-transform}} to signal to any proxies that no transformation should be 
> applied to the content encoding [Section 14.11 RFC 
> 2616|http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11].
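
A minimal sketch of what a client would have to do once it detects RecordIO from the headers proposed above. It assumes the framing is a decimal byte length, a newline, and then exactly that many bytes of record data; the ticket itself does not spell out the framing, and `encodeRecord` and `decodeRecords` are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Frames one record as "<length>\n<bytes>". The framing is an assumption
// stated in the lead-in, not something defined in this ticket.
std::string encodeRecord(const std::string& record)
{
  return std::to_string(record.size()) + "\n" + record;
}

// Extracts every complete record from `data`. A trailing partial record
// (for example, a chunk that has not fully arrived yet) is left out.
std::vector<std::string> decodeRecords(const std::string& data)
{
  std::vector<std::string> records;
  std::size_t pos = 0;

  while (pos < data.size()) {
    std::size_t newline = data.find('\n', pos);
    if (newline == std::string::npos) {
      break; // The length prefix is still incomplete.
    }

    // std::stoul throws std::invalid_argument on a non-numeric prefix.
    std::size_t length = std::stoul(data.substr(pos, newline - pos));
    if (newline + 1 + length > data.size()) {
      break; // The record body is still incomplete.
    }

    records.push_back(data.substr(newline + 1, length));
    pos = newline + 1 + length;
  }

  return records;
}
```

Because chunk boundaries in a `Transfer-Encoding: chunked` response need not line up with record boundaries, a real client would keep the undecoded tail and prepend it to the next chunk; this sketch simply ignores the tail.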





[jira] [Commented] (MESOS-6665) io::redirect might cause stack overflow.

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824768#comment-15824768
 ] 

Adam B commented on MESOS-6665:
---

Any progress update, [~benjaminhindman]? Is there anything further that we want 
to do here for Mesos 1.2?
If so, let's add it to the current Mesosphere sprint and try to get it done 
ASAP so we can cut Mesos 1.2 this week/month.
If not, let's either close it out or drop/defer the TargetVersion.
Thanks!

> io::redirect might cause stack overflow.
> 
>
> Key: MESOS-6665
> URL: https://issues.apache.org/jira/browse/MESOS-6665
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Benjamin Hindman
>Priority: Blocker
>  Labels: mesosphere
>
> Can reproduce this on macOS sierra:
> {noformat}
> [--] 6 tests from IOTest
> [ RUN  ] IOTest.Poll
> [   OK ] IOTest.Poll (0 ms)
> [ RUN  ] IOTest.Read
> [   OK ] IOTest.Read (3 ms)
> [ RUN  ] IOTest.BufferedRead
> [   OK ] IOTest.BufferedRead (5 ms)
> [ RUN  ] IOTest.Write
> [   OK ] IOTest.Write (1 ms)
> [ RUN  ] IOTest.Redirect
> make[6]: *** [check-local] Illegal instruction: 4
> make[5]: *** [check-am] Error 2
> make[4]: *** [check-recursive] Error 1
> make[3]: *** [check] Error 2
> make[2]: *** [check-recursive] Error 1
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> (reverse-i-search)`k': make check -j3
> Jies-MacBook-Pro:build jie$ lldb 3rdparty/libprocess/libprocess-tests
> (lldb) target create "3rdparty/libprocess/libprocess-tests"
> Current executable set to '3rdparty/libprocess/libprocess-tests' (x86_64).
> (lldb) run --gtest_filter=IOTest.Redirect
> Process 26064 launched: 
> '/Users/jie/workspace/dist/mesos/build/3rdparty/libprocess/libprocess-tests' 
> (x86_64)
> Note: Google Test filter = IOTest.Redirect
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from IOTest
> [ RUN  ] IOTest.Redirect
> Process 26064 stopped
> * thread #2: tid = 0x152c5c, 0x7fffd6d463e0 
> libsystem_malloc.dylib`szone_malloc_should_clear + 78, stop reason = 
> EXC_BAD_ACCESS (code=2, address=0x7eb16ff8)
> frame #0: 0x7fffd6d463e0 
> libsystem_malloc.dylib`szone_malloc_should_clear + 78
> libsystem_malloc.dylib`szone_malloc_should_clear:
> ->  0x7fffd6d463e0 <+78>: movq   %rax, -0x78(%rbp)
> 0x7fffd6d463e4 <+82>: movq   0x10f0(%r12), %r13
> 0x7fffd6d463ec <+90>: leaq   (%rax,%rax,4), %r14
> 0x7fffd6d463f0 <+94>: shlq   $0x9, %r14
> (lldb) bt
> .
> frame #2794: 0x7fffd6ddb221 libsystem_pthread.dylib`thread_start + 13
> {noformat}
> Changing the test to redirect just 1KB of data hides the issue.





[jira] [Commented] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824765#comment-15824765
 ] 

Adam B commented on MESOS-6780:
---

[~tillt], [~alexr], is there anything further that we want to do here for Mesos 
1.2?
If so, let's add it to the current Mesosphere sprint and try to get it done 
ASAP so we can cut Mesos 1.2 this week/month.
If not, let's either close it out or drop/defer the TargetVersion.
Thanks!

> ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
> --
>
> Key: MESOS-6780
> URL: https://issues.apache.org/jira/browse/MESOS-6780
> Project: Mesos
>  Issue Type: Bug
> Environment: Mac OS 10.12, clang version 4.0.0 
> (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) 
> (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), 
> libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46
>Reporter: Benjamin Bannier
>Assignee: Till Toenshoff
>Priority: Blocker
>  Labels: mesosphere
> Attachments: attach_container_input_no_ssl.log
>
>
> The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both 
> {{/0}} and {{/1}}) fails consistently for me in an SSL-enabled, optimized 
> build.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContentType/AgentAPIStreamingTest
> [ RUN  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' 
> authorizer
> I1212 17:11:12.393844 17362944 master.cpp:380] Master 
> c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on 
> 172.18.8.114:51059
> I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master"
>  --zk_session_timeout="10secs"
> I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1212 17:11:12.394691 17362944 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials'
> I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL
> I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1212 17:11:12.411682 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1212 17:11:12.411775 17362944 master.cpp:584] Authorization enabled
> I1212 17:11:12.413318 16289792 master.cpp:2045] Elected as the leading master!
> I1212 17:11:12.413377 16289792 master.cpp:1568] Recovering from registrar
> I1212 17:11:12.417582 14143488 registrar.cpp:362] Successfully fetched the 
> registry (0B) in 4.131072ms
> I1212 17:11:12.417667 14143488 registrar.cpp:461] Applied 1 operations in 
> 27us; attempting to update the registry
> I1212 17:11:12.421799 14143488 registrar.cpp:506] Successfully updated the 
> registry in 4.10496ms
> I1212 17:11:12.421835 14143488 

[jira] [Updated] (MESOS-3601) Formalize all headers and metadata for HTTP API Event Stream

2017-01-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3601:
--
Story Points: 5

> Formalize all headers and metadata for HTTP API Event Stream
> 
>
> Key: MESOS-3601
> URL: https://issues.apache.org/jira/browse/MESOS-3601
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.24.0
> Environment: Mesos 0.24.0
>Reporter: Ben Whitehead
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: api, http, mesosphere, wireprotocol
>
> From an HTTP standpoint, the current set of headers returned when connecting 
> to the HTTP scheduler API is insufficient. 
> {code:title=current headers}
> HTTP/1.1 200 OK
> Transfer-Encoding: chunked
> Date: Wed, 30 Sep 2015 21:07:16 GMT
> Content-Type: application/json
> {code}
> Since the response from Mesos is intended to function as a stream, 
> {{Connection: keep-alive}} should be specified so that the connection can 
> remain open.
> If RecordIO is going to be applied to the messages, the headers should 
> include the information necessary for a client to detect RecordIO 
> and set up its response handlers appropriately.
> How RecordIO is expressed will come down to the semantics of what is actually 
> "Returned" as the response from {{POST /api/v1/scheduler}}.
> h4. Proposal
> One approach would be to leverage HTTP as much as possible, having a client 
> specify an {{Accept-Encoding}} along with the {{Accept}} header to indicate 
> that it can handle RecordIO {{Content-Encoding}} of {{Content-Type}} 
> messages. (This approach allows for things like gzip to be woven in fairly 
> easily in the future.)
> For this approach I would expect the following:
> {code:title=Request}
> POST /api/v1/scheduler HTTP/1.1
> Host: localhost:5050
> Accept: application/x-protobuf
> Accept-Encoding: recordio
> Content-Type: application/x-protobuf
> Content-Length: 35
> User-Agent: RxNetty Client
> {code}
> {code:title=Response}
> HTTP/1.1 200 OK
> Connection: keep-alive
> Transfer-Encoding: chunked
> Content-Type: application/x-protobuf
> Content-Encoding: recordio
> Cache-Control: no-transform
> {code}
> When Content-Encoding is used, it is recommended to set {{Cache-Control: 
> no-transform}} to signal to any proxies that no transformation should be 
> applied to the content encoding; see [Section 14.11 of RFC 
> 2616|http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11].
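
As a rough illustration of what a client has to do once the headers tell it the stream is RecordIO, the sketch below frames and incrementally decodes records. It assumes the framing is the record length in bytes as a decimal number, then a newline, then the record itself. `encodeRecord` and `RecordIODecoder` are hypothetical names used only for this example, not part of Mesos.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: frames one record as "<length>\n<bytes>".
std::string encodeRecord(const std::string& record)
{
  return std::to_string(record.size()) + "\n" + record;
}

// Hypothetical incremental decoder: feed it arbitrary chunks of the
// response body and it returns every record completed so far.
class RecordIODecoder
{
public:
  std::vector<std::string> feed(const std::string& chunk)
  {
    buffer_ += chunk;
    std::vector<std::string> records;

    while (true) {
      const std::size_t newline = buffer_.find('\n');
      if (newline == std::string::npos) {
        break; // The length prefix is not complete yet.
      }

      // Assumes a well-formed decimal length; a real client would
      // validate the prefix and fail the stream on garbage.
      const std::size_t length = std::stoul(buffer_.substr(0, newline));
      if (buffer_.size() < newline + 1 + length) {
        break; // The record body is not complete yet.
      }

      records.push_back(buffer_.substr(newline + 1, length));
      buffer_.erase(0, newline + 1 + length);
    }

    return records;
  }

private:
  std::string buffer_; // Bytes received but not yet decoded.
};
```

Because chunk boundaries of a chunked response need not line up with record boundaries, the decoder keeps leftover bytes between calls instead of assuming one chunk per record.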



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably

2017-01-16 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6780:
--
Labels: mesosphere  (was: )

> ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
> --
>
> Key: MESOS-6780
> URL: https://issues.apache.org/jira/browse/MESOS-6780
> Project: Mesos
>  Issue Type: Bug
> Environment: Mac OS 10.12, clang version 4.0.0 
> (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) 
> (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), 
> libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46
>Reporter: Benjamin Bannier
>Assignee: Till Toenshoff
>Priority: Blocker
>  Labels: mesosphere
> Attachments: attach_container_input_no_ssl.log
>
>
> The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both 
> {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized 
> build.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContentType/AgentAPIStreamingTest
> [ RUN  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' 
> authorizer
> I1212 17:11:12.393844 17362944 master.cpp:380] Master 
> c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on 
> 172.18.8.114:51059
> I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master"
>  --zk_session_timeout="10secs"
> I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1212 17:11:12.394691 17362944 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials'
> I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL
> I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1212 17:11:12.411682 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1212 17:11:12.411775 17362944 master.cpp:584] Authorization enabled
> I1212 17:11:12.413318 16289792 master.cpp:2045] Elected as the leading master!
> I1212 17:11:12.413377 16289792 master.cpp:1568] Recovering from registrar
> I1212 17:11:12.417582 14143488 registrar.cpp:362] Successfully fetched the 
> registry (0B) in 4.131072ms
> I1212 17:11:12.417667 14143488 registrar.cpp:461] Applied 1 operations in 
> 27us; attempting to update the registry
> I1212 17:11:12.421799 14143488 registrar.cpp:506] Successfully updated the 
> registry in 4.10496ms
> I1212 17:11:12.421835 14143488 registrar.cpp:392] Successfully recovered 
> registrar
> I1212 17:11:12.421998 17362944 master.cpp:1684] Recovered 0 agents from the 
> registry (136B); allowing 10mins for agents to re-register
> I1212 17:11:12.422780 3971208128 containerizer.cpp:220] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix

[jira] [Updated] (MESOS-3601) Formalize all headers and metadata for HTTP API Event Stream

2017-01-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3601:
--
Sprint: Mesosphere Sprint 49

> Formalize all headers and metadata for HTTP API Event Stream
> 
>
> Key: MESOS-3601
> URL: https://issues.apache.org/jira/browse/MESOS-3601
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.24.0
> Environment: Mesos 0.24.0
>Reporter: Ben Whitehead
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: api, http, mesosphere, wireprotocol
>
> From an HTTP standpoint, the current set of headers returned when connecting 
> to the HTTP scheduler API is insufficient. 
> {code:title=current headers}
> HTTP/1.1 200 OK
> Transfer-Encoding: chunked
> Date: Wed, 30 Sep 2015 21:07:16 GMT
> Content-Type: application/json
> {code}
> Since the response from Mesos is intended to function as a stream, 
> {{Connection: keep-alive}} should be specified so that the connection can 
> remain open.
> If RecordIO is going to be applied to the messages, the headers should 
> include the information necessary for a client to detect RecordIO 
> and set up its response handlers appropriately.
> How RecordIO is expressed will come down to the semantics of what is actually 
> "Returned" as the response from {{POST /api/v1/scheduler}}.
> h4. Proposal
> One approach would be to leverage HTTP as much as possible, having a client 
> specify an {{Accept-Encoding}} along with the {{Accept}} header to indicate 
> that it can handle RecordIO {{Content-Encoding}} of {{Content-Type}} 
> messages. (This approach allows for things like gzip to be woven in fairly 
> easily in the future.)
> For this approach I would expect the following:
> {code:title=Request}
> POST /api/v1/scheduler HTTP/1.1
> Host: localhost:5050
> Accept: application/x-protobuf
> Accept-Encoding: recordio
> Content-Type: application/x-protobuf
> Content-Length: 35
> User-Agent: RxNetty Client
> {code}
> {code:title=Response}
> HTTP/1.1 200 OK
> Connection: keep-alive
> Transfer-Encoding: chunked
> Content-Type: application/x-protobuf
> Content-Encoding: recordio
> Cache-Control: no-transform
> {code}
> When Content-Encoding is used, it is recommended to set {{Cache-Control: 
> no-transform}} to signal to any proxies that no transformation should be 
> applied to the content encoding; see [Section 14.11 of RFC 
> 2616|http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11].





[jira] [Updated] (MESOS-6665) io::redirect might cause stack overflow.

2017-01-16 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6665:
--
Labels: mesosphere  (was: )

> io::redirect might cause stack overflow.
> 
>
> Key: MESOS-6665
> URL: https://issues.apache.org/jira/browse/MESOS-6665
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Benjamin Hindman
>Priority: Blocker
>  Labels: mesosphere
>
> This can be reproduced on macOS Sierra:
> {noformat}
> [--] 6 tests from IOTest
> [ RUN  ] IOTest.Poll
> [   OK ] IOTest.Poll (0 ms)
> [ RUN  ] IOTest.Read
> [   OK ] IOTest.Read (3 ms)
> [ RUN  ] IOTest.BufferedRead
> [   OK ] IOTest.BufferedRead (5 ms)
> [ RUN  ] IOTest.Write
> [   OK ] IOTest.Write (1 ms)
> [ RUN  ] IOTest.Redirect
> make[6]: *** [check-local] Illegal instruction: 4
> make[5]: *** [check-am] Error 2
> make[4]: *** [check-recursive] Error 1
> make[3]: *** [check] Error 2
> make[2]: *** [check-recursive] Error 1
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> (reverse-i-search)`k': make check -j3
> Jies-MacBook-Pro:build jie$ lldb 3rdparty/libprocess/libprocess-tests
> (lldb) target create "3rdparty/libprocess/libprocess-tests"
> Current executable set to '3rdparty/libprocess/libprocess-tests' (x86_64).
> (lldb) run --gtest_filter=IOTest.Redirect
> Process 26064 launched: 
> '/Users/jie/workspace/dist/mesos/build/3rdparty/libprocess/libprocess-tests' 
> (x86_64)
> Note: Google Test filter = IOTest.Redirect
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from IOTest
> [ RUN  ] IOTest.Redirect
> Process 26064 stopped
> * thread #2: tid = 0x152c5c, 0x7fffd6d463e0 
> libsystem_malloc.dylib`szone_malloc_should_clear + 78, stop reason = 
> EXC_BAD_ACCESS (code=2, address=0x7eb16ff8)
> frame #0: 0x7fffd6d463e0 
> libsystem_malloc.dylib`szone_malloc_should_clear + 78
> libsystem_malloc.dylib`szone_malloc_should_clear:
> ->  0x7fffd6d463e0 <+78>: movq   %rax, -0x78(%rbp)
> 0x7fffd6d463e4 <+82>: movq   0x10f0(%r12), %r13
> 0x7fffd6d463ec <+90>: leaq   (%rax,%rax,4), %r14
> 0x7fffd6d463f0 <+94>: shlq   $0x9, %r14
> (lldb) bt
> .
> frame #2794: 0x7fffd6ddb221 libsystem_pthread.dylib`thread_start + 13
> {noformat}
> Changing the test to redirect just 1KB of data hides the issue.





[jira] [Commented] (MESOS-3601) Formalize all headers and metadata for HTTP API Event Stream

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824759#comment-15824759
 ] 

Adam B commented on MESOS-3601:
---

Any progress to report, [~anandmazumdar]? Should we add this to the Mesosphere 
sprint 49 if you're working on it?

> Formalize all headers and metadata for HTTP API Event Stream
> 
>
> Key: MESOS-3601
> URL: https://issues.apache.org/jira/browse/MESOS-3601
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.24.0
> Environment: Mesos 0.24.0
>Reporter: Ben Whitehead
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: api, http, mesosphere, wireprotocol
>
> From an HTTP standpoint, the current set of headers returned when connecting 
> to the HTTP scheduler API is insufficient. 
> {code:title=current headers}
> HTTP/1.1 200 OK
> Transfer-Encoding: chunked
> Date: Wed, 30 Sep 2015 21:07:16 GMT
> Content-Type: application/json
> {code}
> Since the response from Mesos is intended to function as a stream, 
> {{Connection: keep-alive}} should be specified so that the connection can 
> remain open.
> If RecordIO is going to be applied to the messages, the headers should 
> include the information necessary for a client to detect RecordIO 
> and set up its response handlers appropriately.
> How RecordIO is expressed will come down to the semantics of what is actually 
> "Returned" as the response from {{POST /api/v1/scheduler}}.
> h4. Proposal
> One approach would be to leverage HTTP as much as possible, having a client 
> specify an {{Accept-Encoding}} along with the {{Accept}} header to indicate 
> that it can handle RecordIO {{Content-Encoding}} of {{Content-Type}} 
> messages. (This approach allows for things like gzip to be woven in fairly 
> easily in the future.)
> For this approach I would expect the following:
> {code:title=Request}
> POST /api/v1/scheduler HTTP/1.1
> Host: localhost:5050
> Accept: application/x-protobuf
> Accept-Encoding: recordio
> Content-Type: application/x-protobuf
> Content-Length: 35
> User-Agent: RxNetty Client
> {code}
> {code:title=Response}
> HTTP/1.1 200 OK
> Connection: keep-alive
> Transfer-Encoding: chunked
> Content-Type: application/x-protobuf
> Content-Encoding: recordio
> Cache-Control: no-transform
> {code}
> When Content-Encoding is used, it is recommended to set {{Cache-Control: 
> no-transform}} to signal to any proxies that no transformation should be 
> applied to the content encoding; see [Section 14.11 of RFC 
> 2616|http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11].





[jira] [Commented] (MESOS-6804) Running 'tty' inside a debug container that has a tty reports "Not a tty"

2017-01-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824755#comment-15824755
 ] 

Adam B commented on MESOS-6804:
---

Any progress to report, [~klueska]? Should we add this ticket to Mesosphere 
Sprint 49?

> Running 'tty' inside a debug container that has a tty reports "Not a tty"
> -
>
> Key: MESOS-6804
> URL: https://issues.apache.org/jira/browse/MESOS-6804
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: debugging, mesosphere
>
> We need to inject `/dev/console` into the container and map it to the slave 
> end of the TTY we are attached to.





[jira] [Updated] (MESOS-6926) Log incoming bad requests to Mesos

2017-01-16 Thread Aaron Wood (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Wood updated MESOS-6926:
--
Summary: Log incoming bad requests to Mesos  (was: Log bad requests that 
come from frameworks )

> Log incoming bad requests to Mesos
> --
>
> Key: MESOS-6926
> URL: https://issues.apache.org/jira/browse/MESOS-6926
> Project: Mesos
>  Issue Type: Wish
>  Components: HTTP API
>Reporter: Aaron Wood
>Priority: Minor
>
> It would be very helpful to log bad requests, maybe with the v1 or v2 logging 
> levels. This would help in the case of MESOS-6917 assuming that there was no 
> crash in the first place.
> For example, if someone is sending invalid UUIDs (which I was) up to Mesos 
> and gets a bad request response, they should be able to see logs showing the 
> bad request and potentially the reason behind it.
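
One possible shape for this, sketched under assumptions: the UUID arrives as raw bytes and is acceptable only if it is exactly 16 bytes long. `validateUuid` and `describeBadRequest` are hypothetical helpers for illustration, not existing Mesos functions, and the message wording is made up.

```cpp
#include <cstddef>
#include <string>

// Hypothetical check: returns true if `bytes` looks like a UUID.
// On failure, stores a human-readable reason in `*reason`.
bool validateUuid(const std::string& bytes, std::string* reason)
{
  const std::size_t expected = 16; // Assumed size of a raw UUID.

  if (bytes.size() != expected) {
    *reason = "UUID must be " + std::to_string(expected) +
              " bytes, got " + std::to_string(bytes.size());
    return false;
  }

  return true;
}

// Hypothetical helper: builds the line a server could log before it
// answers with 400 Bad Request, so the reason is not lost.
std::string describeBadRequest(
    const std::string& path,
    const std::string& reason)
{
  return "Bad request to '" + path + "': " + reason;
}
```

The point of returning the reason separately is that the same text can go both into the log and into the body of the error response.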





[jira] [Created] (MESOS-6926) Log bad requests that come from frameworks

2017-01-16 Thread Aaron Wood (JIRA)
Aaron Wood created MESOS-6926:
-

 Summary: Log bad requests that come from frameworks 
 Key: MESOS-6926
 URL: https://issues.apache.org/jira/browse/MESOS-6926
 Project: Mesos
  Issue Type: Wish
  Components: HTTP API
Reporter: Aaron Wood
Priority: Minor


It would be very helpful to log bad requests, maybe with the v1 or v2 logging 
levels. This would help in the case of MESOS-6917 assuming that there was no 
crash in the first place.

For example, if someone is sending invalid UUIDs (which I was) up to Mesos and 
gets a bad request response, they should be able to see logs showing the bad 
request and potentially the reason behind it.





[jira] [Commented] (MESOS-4871) Make use of C++11 `override` keyword

2017-01-16 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824372#comment-15824372
 ] 

Benjamin Bannier commented on MESOS-4871:
-

This cleanup can be performed automatically with clang-tidy's 
{{modernize-use-override}} check. Once we implement a fix here we should make 
sure to add {{modernize-use-override}} to [our clang-tidy 
setup|https://github.com/apache/mesos/blob/e49a343e551fb9f332760ba2240f2805de35c1b8/support/mesos-tidy.sh#L29]
 and potentially also update the style guide to avoid confusion.

> Make use of C++11 `override` keyword
> 
>
> Key: MESOS-4871
> URL: https://issues.apache.org/jira/browse/MESOS-4871
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>  Labels: mesosphere, tech-debt
>
> Per the Google C++ style guide (as well as general common sense), we should 
> probably be using the {{override}} keyword to explicitly denote situations 
> where we expect a virtual member function declaration to override a virtual 
> function declared in a parent class.
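
A minimal illustration of what the keyword buys us. `Base`, `Derived` and `describe` are made-up names for this example, not Mesos classes.

```cpp
#include <string>

struct Base
{
  virtual ~Base() = default;
  virtual std::string name() const { return "base"; }
};

struct Derived : Base
{
  // `override` makes the compiler verify that this declaration
  // matches a virtual function in a base class.
  std::string name() const override { return "derived"; }

  // Without `override`, dropping the `const` would silently declare
  // a new, unrelated function instead of overriding:
  //   std::string name() { return "derived"; }
  // With `override`, the same mistake is a compile error.
};

// Calls made through the base class reach the derived implementation.
std::string describe(const Base& base)
{
  return base.name();
}
```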





[jira] [Updated] (MESOS-6676) Always re-link with scheduler during re-registration.

2017-01-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6676:
--
Fix Version/s: 1.0.3

> Always re-link with scheduler during re-registration.
> -
>
> Key: MESOS-6676
> URL: https://issues.apache.org/jira/browse/MESOS-6676
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
> Fix For: 1.1.1, 1.2.0, 1.0.3
>
>
> Scenario:
> # Framework registers with master using a non-zero {{failover_timeout}} and 
> is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master->scheduler link. This 
> could happen due to some transient network error, e.g., 1-way partition. The 
> master sends a {{FrameworkErrorMessage}} to the framework. The master marks 
> the framework as disconnected, but keeps the {{Framework*}} for it around in 
> {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is 
> dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master 
> link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the 
> master. It does _not_ set the {{force}} flag. This means we follow [this 
> code 
> path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771]
>  in the master, which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master -> 
> scheduler link is never re-established.
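
The scenario can be replayed with a toy model. This is only a sketch: `FrameworkState` and the functions below are hypothetical, and the assumption that a forced re-registration is the path that re-links is inferred from the description, not taken from the master code.

```cpp
// Toy model of the master's view of one framework.
struct FrameworkState
{
  bool connected = false; // Master considers the framework connected.
  bool linked = false;    // Master -> scheduler link is established.
};

// Step 2 of the scenario: the master sees an ExitedEvent for the link.
void exited(FrameworkState& framework)
{
  framework.connected = false;
  framework.linked = false;
}

// Behaviour described in the ticket: re-registration without `force`
// marks the framework connected again but never re-links (assumed:
// only the forced path links).
void reregisterWithoutRelink(FrameworkState& framework, bool force)
{
  if (force) {
    framework.linked = true;
  }
  framework.connected = true;
}

// Behaviour the ticket title asks for: always re-link.
void reregisterAlwaysRelink(FrameworkState& framework)
{
  framework.linked = true;
  framework.connected = true;
}
```

In the first variant a non-forced re-registration after `exited()` leaves the framework connected but unlinked, which is exactly the end state the scenario describes.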





[jira] [Commented] (MESOS-6907) FutureTest.After3 is flaky

2017-01-16 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824092#comment-15824092
 ] 

Alexander Rojas commented on MESOS-6907:


So, I verified that my theory was correct. Timers are executed in [{{void 
process::timedout()}}|https://github.com/apache/mesos/blob/77ddbb62dd2ab4faaa22de8355f4766e7bbe0f2d/3rdparty/libprocess/src/process.cpp#L739].
 Moreover, {{process::timedout()}} is not executed in any libprocess thread, 
but in the libevent loop 
[here|https://github.com/apache/mesos/blob/77ddbb62dd2ab4faaa22de8355f4766e7bbe0f2d/3rdparty/libprocess/src/process.cpp#L898],
 
[here|https://github.com/apache/mesos/blob/77ddbb62dd2ab4faaa22de8355f4766e7bbe0f2d/3rdparty/libprocess/src/clock.cpp#L206]
 and 
[here|https://github.com/apache/mesos/blob/77ddbb62dd2ab4faaa22de8355f4766e7bbe0f2d/3rdparty/libprocess/src/clock.cpp#L133].
 

The consequence is that timers are executed in batches, and the timers 
belonging to a batch are destroyed only after all the timers of that batch 
have been executed, which is the cause of the flakiness. It can be solved by 
forcing a second batch to run (since batches run on the same thread every 
time): create a second timer and manipulate the {{Clock}} so that the second 
timer is scheduled in a different, later batch, then wait for the thunk of 
that timer to be executed. I proposed a patch which does just that:

[r/55576/|https://reviews.apache.org/r/55576/]: Fixes FutureTest.After3 
flakiness.



> FutureTest.After3 is flaky
> --
>
> Key: MESOS-6907
> URL: https://issues.apache.org/jira/browse/MESOS-6907
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Alexander Rojas
>
> There is apparently a race condition between the time an instance of 
> {{Future}} goes out of scope and when the enclosing data is actually 
> deleted, if {{Future<T>::after(Duration, lambda::function<Future<T>(const Future<T>&)>)}} is called.
> The issue is more likely to occur if the machine is under load or if it is 
> not a very powerful one. The easiest way to reproduce it is to run:
> {code}
> $ stress -c 4 -t 2600 -d 2 -i 2 &
> $ ./libprocess-tests --gtest_filter="FutureTest.After3" --gtest_repeat=-1 
> --gtest_break_on_failure
> {code}
> An exploratory fix for the issue is to change the test to:
> {code}
> TEST(FutureTest, After3)
> {
>   Future<Nothing> future;
>   process::WeakFuture<Nothing> weak_future(future);
>   EXPECT_SOME(weak_future.get());
>   {
>     Clock::pause();
>     // The original future disappears here. After this call the
>     // original future goes out of scope and should not be reachable
>     // anymore.
>     future = future
>       .after(Milliseconds(1), [](Future<Nothing> f) {
>         f.discard();
>         return Nothing();
>       });
>     Clock::advance(Seconds(2));
>     Clock::settle();
>     AWAIT_READY(future);
>   }
>   if (weak_future.get().isSome()) {
>     os::sleep(Seconds(1));
>   }
>   EXPECT_NONE(weak_future.get());
>   EXPECT_FALSE(future.hasDiscard());
> }
> {code}
> The interesting thing about the fix is that both extra snippets are needed 
> (neither one alone is enough) to prevent the issue from happening.





[jira] [Assigned] (MESOS-6907) FutureTest.After3 is flaky

2017-01-16 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas reassigned MESOS-6907:
--

Assignee: Alexander Rojas

> FutureTest.After3 is flaky
> --
>
> Key: MESOS-6907
> URL: https://issues.apache.org/jira/browse/MESOS-6907
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>
> There is apparently a race condition between the time an instance of 
> {{Future}} goes out of scope and when the enclosing data is actually 
> deleted, if {{Future<T>::after(Duration, lambda::function<Future<T>(const Future<T>&)>)}} is called.
> The issue is more likely to occur if the machine is under load or if it is 
> not a very powerful one. The easiest way to reproduce it is to run:
> {code}
> $ stress -c 4 -t 2600 -d 2 -i 2 &
> $ ./libprocess-tests --gtest_filter="FutureTest.After3" --gtest_repeat=-1 
> --gtest_break_on_failure
> {code}
> An exploratory fix for the issue is to change the test to:
> {code}
> TEST(FutureTest, After3)
> {
>   Future<Nothing> future;
>   process::WeakFuture<Nothing> weak_future(future);
>   EXPECT_SOME(weak_future.get());
>   {
>     Clock::pause();
>     // The original future disappears here. After this call the
>     // original future goes out of scope and should not be reachable
>     // anymore.
>     future = future
>       .after(Milliseconds(1), [](Future<Nothing> f) {
>         f.discard();
>         return Nothing();
>       });
>     Clock::advance(Seconds(2));
>     Clock::settle();
>     AWAIT_READY(future);
>   }
>   if (weak_future.get().isSome()) {
>     os::sleep(Seconds(1));
>   }
>   EXPECT_NONE(weak_future.get());
>   EXPECT_FALSE(future.hasDiscard());
> }
> {code}
> The interesting thing about the fix is that both extra snippets are needed 
> (neither one alone is enough) to prevent the issue from happening.





[jira] [Assigned] (MESOS-6432) Roles with quota assigned can "game" the system to receive excessive resources.

2017-01-16 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-6432:
---

Assignee: Benjamin Bannier

> Roles with quota assigned can "game" the system to receive excessive 
> resources.
> ---
>
> Key: MESOS-6432
> URL: https://issues.apache.org/jira/browse/MESOS-6432
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>Priority: Critical
>
> The current implementation of quota allocation attempts to satisfy each 
> resource quota for a role, but in doing so can far exceed the quota assigned 
> to the role.
> For example, if a role has quota for {{\[30,20,10\]}}, it can consume up to 
> {{\[∞, ∞, 10\]}} or {{\[∞, 20, ∞\]}} or {{\[30, ∞, ∞\]}}, because we stop 
> allocating agents' resources to the role only once every resource in the 
> quota vector is satisfied!
> As a first step for preventing gaming, we could consider quota satisfied once 
> any of the resources in the vector has quota satisfied. This approach works 
> reasonably well for resources that are required and are present on every 
> agent (cpus, mem, disk). However, it doesn't work well for resources that are 
> optional / only present on some agents (e.g. gpus) (a.k.a. non-ubiquitous / 
> scarce resources). For this we would need to determine which agents have 
> resources that can satisfy the quota prior to performing the allocation.
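
To make the difference between the two stopping rules concrete, here is a toy sketch. It is not the hierarchical allocator: `Resources`, `satisfiedAll`, `satisfiedAny` and `allocate` are hypothetical names, and the allocator simply hands whole agents to a single role.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical resource vector with three components.
using Resources = std::array<double, 3>;

// Rule described above: quota counts as satisfied only once every
// component has reached its quota.
bool satisfiedAll(const Resources& allocated, const Resources& quota)
{
  for (std::size_t i = 0; i < quota.size(); ++i) {
    if (allocated[i] < quota[i]) {
      return false;
    }
  }
  return true;
}

// Proposed first step: quota counts as satisfied as soon as any
// component has reached its quota.
bool satisfiedAny(const Resources& allocated, const Resources& quota)
{
  for (std::size_t i = 0; i < quota.size(); ++i) {
    if (allocated[i] >= quota[i]) {
      return true;
    }
  }
  return false;
}

// Toy allocator: hands whole agents to the role until `satisfied`
// reports that the quota is met.
template <typename Satisfied>
Resources allocate(
    const std::vector<Resources>& agents,
    const Resources& quota,
    Satisfied satisfied)
{
  Resources allocated = {{0.0, 0.0, 0.0}};
  for (const Resources& agent : agents) {
    if (satisfied(allocated, quota)) {
      break;
    }
    for (std::size_t i = 0; i < allocated.size(); ++i) {
      allocated[i] += agent[i];
    }
  }
  return allocated;
}
```

With quota {{\[30,20,10\]}} and twenty agents each offering {{\[10,10,1\]}}, the "all" rule keeps allocating until the role holds {{\[100,100,10\]}}, while the "any" rule stops at {{\[20,20,2\]}}.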





[jira] [Comment Edited] (MESOS-6631) Disallow frameworks from modifying FrameworkInfo.roles.

2017-01-16 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805320#comment-15805320
 ] 

Benjamin Bannier edited comment on MESOS-6631 at 1/16/17 12:27 PM:
---

Reviews:

https://reviews.apache.org/r/55571/
https://reviews.apache.org/r/55271/


was (Author: bbannier):
Review: https://reviews.apache.org/r/55271/

> Disallow frameworks from modifying FrameworkInfo.roles.
> ---
>
> Key: MESOS-6631
> URL: https://issues.apache.org/jira/browse/MESOS-6631
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>
> In "phase 1" of the multi-role framework support, we want to preserve the 
> existing behavior of single-role framework support in that we disallow 
> frameworks from modifying their role.
> With multi-role framework support, we will initially disallow frameworks from 
> modifying the roles field. Note that in the case that the master has failed 
> over but the framework hasn't re-registered yet, we will use the framework 
> info from the agents to disallow changes to the roles field. We will treat 
> {{FrameworkInfo.roles}} as a set rather than a list, so ordering does not 
> matter for equality.
> One difference between {{role}} and {{roles}} is that we ignore 
> modifications to {{role}}, whereas modifications to {{roles}}, being a new 
> feature, can be disallowed by rejecting the framework subscription.
> Later, in phase 2, we will allow frameworks to modify their roles, see 
> MESOS-6627.
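
Treating the field as a set could look like the following sketch. `sameRoles` is a hypothetical helper, and plain string vectors stand in for the repeated {{roles}} field.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical check: true if both role lists name the same roles,
// ignoring order and duplicates.
bool sameRoles(
    const std::vector<std::string>& current,
    const std::vector<std::string>& requested)
{
  return std::set<std::string>(current.begin(), current.end()) ==
         std::set<std::string>(requested.begin(), requested.end());
}
```

A re-subscription that only reorders the roles would then compare equal, while adding or removing a role would be rejected.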





[jira] [Commented] (MESOS-6902) Add support for agent capabilities

2017-01-16 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823689#comment-15823689
 ] 

Jay Guo commented on MESOS-6902:


Some initial patches on protobuf messages:
https://reviews.apache.org/r/55562/
https://reviews.apache.org/r/55563/

> Add support for agent capabilities
> --
>
> Key: MESOS-6902
> URL: https://issues.apache.org/jira/browse/MESOS-6902
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Neil Conway
>Assignee: Jay Guo
>  Labels: mesosphere
>
> Similarly to how we might add support for master capabilities (MESOS-5675), 
> agent capabilities would also make sense: in a mixed cluster, the master 
> might have support for features that are not present on certain agents, and 
> vice versa.





[jira] [Commented] (MESOS-6854) Prevent launching MULTI_ROLE framework's tasks on agents without MULTI_ROLE support.

2017-01-16 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823641#comment-15823641
 ] 

Jay Guo commented on MESOS-6854:


[~bmahler] Consider the following upgrade scenario, in which the agent is not upgraded:
# start an old cluster consisting of a master, an agent and a framework
# the framework launches an executor on the agent
# upgrade the master to support multi-role
# upgrade the framework to support multi-role
# the framework wants to launch a task on the existing executor

Should we allow the last step?

> Prevent launching MULTI_ROLE framework's tasks on agents without MULTI_ROLE 
> support.
> 
>
> Key: MESOS-6854
> URL: https://issues.apache.org/jira/browse/MESOS-6854
> Project: Mesos
>  Issue Type: Task
>  Components: agent, master
> Reporter: Benjamin Mahler
> Assignee: Jay Guo
>
> The proposal for upgrades / backwards compatibility in phase 1 of multi-role 
> framework support is that we require that masters and agents are all upgraded 
> before a multi-role framework registers.
> We need to explicitly guard against this requirement being violated, since 
> it's common for old agents to show up in a cluster. The master can prevent 
> the launching of MULTI_ROLE frameworks' tasks on agents without MULTI_ROLE 
> framework support.
> If we were to naively let this happen, the old agent would think the 
> resources are allocated to the "*" role, and there would need to be master 
> logic to deal with the old agent not populating Resource.AllocationInfo.
> The guard will need to be either version based or agent capability based; 
> the latter seems like the stronger approach, given that some users upgrade 
> off of master rather than using release versions.
> We can initially start with the master-side guard, and have the agent send 
> the capability once the agent-side implementation is complete.
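
A capability-based master-side guard along the lines described above might look roughly like this. It is a sketch under assumptions, not the actual implementation: Capability, FrameworkSketch, AgentSketch and canLaunchOn are hypothetical names, and an agent that reports no capabilities is assumed to be an old agent.

```cpp
#include <cassert>
#include <set>

// Hypothetical capability enum; only the value relevant here is modeled.
enum class Capability
{
  MULTI_ROLE
};

// Hypothetical views of what the master knows about each party.
struct FrameworkSketch
{
  std::set<Capability> capabilities;
};

struct AgentSketch
{
  // Assumption: an old agent never reports capabilities, so this stays empty.
  std::set<Capability> capabilities;
};

// Returns true if the master may launch this framework's tasks on the agent.
// A MULTI_ROLE framework is only allowed on agents that report MULTI_ROLE;
// single-role frameworks are unaffected by the guard.
bool canLaunchOn(const FrameworkSketch& framework, const AgentSketch& agent)
{
  const bool frameworkIsMultiRole =
    framework.capabilities.count(Capability::MULTI_ROLE) > 0;

  const bool agentIsMultiRole =
    agent.capabilities.count(Capability::MULTI_ROLE) > 0;

  return !frameworkIsMultiRole || agentIsMultiRole;
}
```

Checking a reported capability rather than a version string keeps the guard meaningful for clusters built off of master, where version numbers do not reliably indicate which features are present.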


