[jira] [Created] (MESOS-6554) Create event stream capability in agent API

2016-11-04 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-6554:


 Summary: Create event stream capability in agent API
 Key: MESOS-6554
 URL: https://issues.apache.org/jira/browse/MESOS-6554
 Project: Mesos
  Issue Type: Wish
  Components: HTTP API
Reporter: Zhitao Li


Similar to the event stream API in the master, I hope we can have the same 
capability in the agent API.

Many container-related integration projects use APIs like 
[https://docs.docker.com/engine/reference/api/docker_remote_api_v1.24/#/monitor-dockers-events|docker events], 
and people need a comparable solution if they want to use the Mesos 
containerizer to run Docker containers.
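
For illustration, the master's existing v1 event stream ({{POST /api/v1}} with 
a {{SUBSCRIBE}} call) delivers events in RecordIO framing: each record is its 
length in bytes, a newline, then the serialized event. Below is a minimal 
sketch of a consumer for such a stream, assuming the agent stream would use the 
same framing; the function and the stand-in stream source are illustrative, not 
an existing Mesos API.

{noformat}
#include <iostream>
#include <istream>
#include <sstream>
#include <string>

// Read one RecordIO-framed record ("<length>\n<bytes>") from the stream.
// Returns false on EOF or malformed input.
bool readRecord(std::istream& in, std::string& record)
{
  std::string header;
  if (!std::getline(in, header)) {
    return false; // EOF.
  }

  size_t length = 0;
  try {
    length = std::stoul(header);
  } catch (...) {
    return false; // Malformed length prefix.
  }

  record.resize(length);
  in.read(&record[0], length);
  return static_cast<size_t>(in.gcount()) == length;
}

int main()
{
  // In practice the stream would be the chunked HTTP response body of the
  // SUBSCRIBE call; a stringstream stands in for it here.
  std::stringstream stream("20\n{\"type\":\"HEARTBEAT\"}");

  std::string record;
  while (readRecord(stream, record)) {
    std::cout << "event: " << record << std::endl; // e.g. JSON-encoded Event.
  }
}
{noformat}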



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6526) `mesos-containerizer launch --environment` exposes executor env vars in `ps`.

2016-11-04 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu reassigned MESOS-6526:
-

Assignee: Yan Xu

> `mesos-containerizer launch --environment` exposes executor env vars in `ps`.
> -
>
> Key: MESOS-6526
> URL: https://issues.apache.org/jira/browse/MESOS-6526
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: Yan Xu
>Priority: Critical
>
> With MESOS-6323, the helper {{mesos-containerizer launch}} takes a 
> {{--environment}} flag for the env vars used by the executor. This is 
> unpleasant because it is common practice to use env vars to hold sensitive 
> configuration, and now those values are visible to non-root users on the 
> host via the {{ps}} command.
> Given that we want to separate the environments of {{mesos-containerizer 
> launch}} and the executor itself, perhaps we can just package and serialize 
> the executor env vars into one env var {{MESOS_EXECUTOR_ENVIRONMENT}} and 
> pass that to {{mesos-containerizer launch}}, which could then read it through 
> a flag the usual way.
> In general, Mesos should do more to protect env vars, but I'll file separate 
> issues for that.
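
A rough sketch of the packaging idea, assuming a simple newline-separated 
{{key=value}} encoding; the encoding and helper names are hypothetical, since 
the ticket only proposes bundling the executor env vars into a single 
{{MESOS_EXECUTOR_ENVIRONMENT}} variable so that individual values no longer 
show up as command-line arguments in {{ps}}.

{noformat}
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

// Hypothetical encoding: pack the executor's env vars into a single string.
std::string packEnvironment(const std::map<std::string, std::string>& env)
{
  std::ostringstream out;
  for (const auto& entry : env) {
    out << entry.first << "=" << entry.second << "\n";
  }
  return out.str();
}

int main()
{
  std::map<std::string, std::string> executorEnv = {
    {"MESOS_SANDBOX", "/var/sandbox"},
    {"SECRET_TOKEN", "hunter2"} // Stays out of the helper's argv.
  };

  // The launcher would export one variable instead of passing a
  // `--environment` flag, so `ps` only shows the helper's own arguments.
  ::setenv("MESOS_EXECUTOR_ENVIRONMENT",
           packEnvironment(executorEnv).c_str(),
           1);

  // `mesos-containerizer launch` would then read and unpack it.
  const char* packed = ::getenv("MESOS_EXECUTOR_ENVIRONMENT");
  return packed == nullptr ? 1 : 0;
}
{noformat}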



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6553) Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` to `launcher->fork()`

2016-11-04 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6553:
--

 Summary: Update `MesosContainerizerProcess::_launch()` to pass 
`ContainerLaunchInfo` to `launcher->fork()`
 Key: MESOS-6553
 URL: https://issues.apache.org/jira/browse/MESOS-6553
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues


Currently, we receive a {{ContainerLaunchInfo}} from each of our isolators and 
extract information from it, which we then pass piece by piece to our 
{{launcher->fork()}} call as separate parameters.

Instead, we should construct a new {{ContainerLaunchInfo}} that is the 
concatenation of the ones returned by each isolator, and pass this single 
message down to {{launcher->fork()}} instead of building up individual 
arguments.
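
Since {{ContainerLaunchInfo}} is a protobuf message, the concatenation could be 
as simple as folding the per-isolator messages together with {{MergeFrom}}, 
which appends repeated fields (environment variables, pre-exec commands, etc.). 
A minimal sketch follows, assuming the generated header path; the real change 
would also need to detect conflicting singular fields.

{noformat}
#include <vector>

#include <mesos/slave/containerizer.pb.h>

// Fold the ContainerLaunchInfos returned by each isolator into one message.
// MergeFrom concatenates repeated fields and overwrites singular ones, so a
// production version should check that no two isolators set the same
// singular field to conflicting values.
mesos::slave::ContainerLaunchInfo mergeLaunchInfos(
    const std::vector<mesos::slave::ContainerLaunchInfo>& launchInfos)
{
  mesos::slave::ContainerLaunchInfo merged;

  for (const auto& launchInfo : launchInfos) {
    merged.MergeFrom(launchInfo);
  }

  return merged;
}
{noformat}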



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6552) Add ability to filter events on the subscriber stream for Master API.

2016-11-04 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6552:
--
Description: 
Currently, the v1 Master API allows an operator to subscribe to events 
happening on their cluster, e.g., any time a new task is launched/updated. 
However, there is currently no way for a subscriber to express interest in only 
a particular subset of events on the master, e.g., only agent-related events 
(addition/removal).

This would also take care of use cases where a subscriber is short-lived, i.e., 
only interested in seeing whether a particular task has been launched on the 
cluster by the framework, closing its connection thereafter. Currently, such 
subscribers also receive the entire snapshot of the cluster via the 
{{SNAPSHOT}} events, which can be rather huge for production clusters (we also 
don't support compression on the stream yet). Such subscribers would in the 
future be able to opt out of this event.

  was:
Currently, the v1 Master API allows an operator to subscribe to events 
happening on their cluster, e.g., any time a new task is launched/updated. 
However, there is currently no way for a subscriber to express interest in only 
a particular subset of events on the master, e.g., only task added/updated 
events.

This would also take care of use cases where a subscriber is short-lived, i.e., 
only interested in seeing whether a particular task has been launched on the 
cluster by the framework, closing its connection thereafter. Currently, such 
subscribers also receive the entire snapshot of the cluster via the 
{{SNAPSHOT}} events, which can be rather huge for production clusters (we also 
don't support compression on the stream yet). Such subscribers would in the 
future be able to opt out of this event.


> Add ability to filter events on the subscriber stream for Master API.
> -
>
> Key: MESOS-6552
> URL: https://issues.apache.org/jira/browse/MESOS-6552
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the v1 Master API allows an operator to subscribe to events 
> happening on their cluster, e.g., any time a new task is launched/updated. 
> However, there is currently no way for a subscriber to express interest in 
> only a particular subset of events on the master, e.g., only agent-related 
> events (addition/removal).
> This would also take care of use cases where a subscriber is short-lived, 
> i.e., only interested in seeing whether a particular task has been launched 
> on the cluster by the framework, closing its connection thereafter. 
> Currently, such subscribers also receive the entire snapshot of the cluster 
> via the {{SNAPSHOT}} events, which can be rather huge for production 
> clusters (we also don't support compression on the stream yet). Such 
> subscribers would in the future be able to opt out of this event.
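
Until such filtering lands server-side, a subscriber has to drop unwanted 
events client-side. A minimal sketch of that workaround against the v1 protos 
follows; the header path and the chosen event types are assumptions, and the 
proposed server-side filter API itself is not designed yet.

{noformat}
#include <set>

#include <mesos/v1/master/master.pb.h>

// Client-side workaround: keep only the event types the subscriber cares
// about, dropping everything else (including the large initial snapshot).
bool isInteresting(const mesos::v1::master::Event& event)
{
  static const std::set<mesos::v1::master::Event::Type> interesting = {
    mesos::v1::master::Event::TASK_ADDED,
    mesos::v1::master::Event::TASK_UPDATED
  };

  return interesting.count(event.type()) > 0;
}
{noformat}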



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6552) Add ability to filter events on the subscriber stream for Master API.

2016-11-04 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-6552:
-

 Summary: Add ability to filter events on the subscriber stream for 
Master API.
 Key: MESOS-6552
 URL: https://issues.apache.org/jira/browse/MESOS-6552
 Project: Mesos
  Issue Type: Improvement
Reporter: Anand Mazumdar


Currently, the v1 Master API allows an operator to subscribe to events 
happening on their cluster, e.g., any time a new task is launched/updated. 
However, there is currently no way for a subscriber to express interest in only 
a particular subset of events on the master, e.g., only task added/updated 
events.

This would also take care of use cases where a subscriber is short-lived, i.e., 
only interested in seeing whether a particular task has been launched on the 
cluster by the framework, closing its connection thereafter. Currently, such 
subscribers also receive the entire snapshot of the cluster via the 
{{SNAPSHOT}} events, which can be rather huge for production clusters (we also 
don't support compression on the stream yet). Such subscribers would in the 
future be able to opt out of this event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6551) Add attach/exec commands to the Mesos CLI

2016-11-04 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6551:
---
Summary: Add attach/exec commands to the Mesos CLI  (was: Add exec/attach 
commands to the Mesos CLI)

> Add attach/exec commands to the Mesos CLI
> -
>
> Key: MESOS-6551
> URL: https://issues.apache.org/jira/browse/MESOS-6551
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Once all of this support has landed, we need to update the Mesos CLI to 
> implement the {{attach}} and {{exec}} functionality outlined in the Design 
> Doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6551) Add exec/attach commands to the Mesos CLI

2016-11-04 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6551:
--

 Summary: Add exec/attach commands to the Mesos CLI
 Key: MESOS-6551
 URL: https://issues.apache.org/jira/browse/MESOS-6551
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues
Assignee: Kevin Klues


Once all of this support has landed, we need to update the Mesos CLI to 
implement the {{attach}} and {{exec}} functionality outlined in the Design Doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.

2016-11-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6544:
---
Sprint: Mesosphere Sprint 46

> MasterMaintenanceTest.InverseOffersFilters is flaky.
> 
>
> Key: MESOS-6544
> URL: https://issues.apache.org/jira/browse/MESOS-6544
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
> Fix For: 1.2.0
>
>
> This test can crash when launching two executors concurrently because the 
> test containerizer is not thread-safe! (see MESOS-6545).
> {noformat}
> [...truncated 78174 lines...]
> I1103 01:40:55.530350 29098 slave.cpp:974] Authenticating with master 
> master@172.17.0.2:58302
> I1103 01:40:55.530432 29098 slave.cpp:985] Using default CRAM-MD5 
> authenticatee
> I1103 01:40:55.530627 29098 slave.cpp:947] Detecting new master
> I1103 01:40:55.530675 29108 authenticatee.cpp:121] Creating new client SASL 
> connection
> I1103 01:40:55.530743 29098 slave.cpp:5587] Received oversubscribable 
> resources {} from the resource estimator
> I1103 01:40:55.530961 29099 master.cpp:6742] Authenticating 
> slave(150)@172.17.0.2:58302
> I1103 01:40:55.531070 29112 authenticator.cpp:414] Starting authentication 
> session for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.531328 29106 authenticator.cpp:98] Creating new server SASL 
> connection
> I1103 01:40:55.531561 29108 authenticatee.cpp:213] Received SASL 
> authentication mechanisms: CRAM-MD5
> I1103 01:40:55.531604 29108 authenticatee.cpp:239] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I1103 01:40:55.531713 29101 authenticator.cpp:204] Received SASL 
> authentication start
> I1103 01:40:55.531805 29101 authenticator.cpp:326] Authentication requires 
> more steps
> I1103 01:40:55.531921 29108 authenticatee.cpp:259] Received SASL 
> authentication step
> I1103 01:40:55.532120 29101 authenticator.cpp:232] Received SASL 
> authentication step
> I1103 01:40:55.532155 29101 auxprop.cpp:109] Request to lookup properties for 
> user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: false
> I1103 01:40:55.532179 29101 auxprop.cpp:181] Looking up auxiliary property 
> '*userPassword'
> I1103 01:40:55.532233 29101 auxprop.cpp:181] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I1103 01:40:55.532266 29101 auxprop.cpp:109] Request to lookup properties for 
> user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: true
> I1103 01:40:55.532289 29101 auxprop.cpp:131] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532305 29101 auxprop.cpp:131] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532335 29101 authenticator.cpp:318] Authentication success
> I1103 01:40:55.532413 29110 authenticatee.cpp:299] Authentication success
> I1103 01:40:55.532467 29108 master.cpp:6772] Successfully authenticated 
> principal 'test-principal' at slave(150)@172.17.0.2:58302
> I1103 01:40:55.532536 29111 authenticator.cpp:432] Authentication session 
> cleanup for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.532755 29098 slave.cpp:1069] Successfully authenticated with 
> master master@172.17.0.2:58302
> I1103 01:40:55.532997 29098 slave.cpp:1483] Will retry registration in 
> 12.590371ms if necessary
> I1103 01:40:55.533179 29108 master.cpp:5151] Registering agent at 
> slave(150)@172.17.0.2:58302 (maintenance-host-2) with id 
> 3167a687-904b-4b57-bc0f-91b67dc7e41d-S1
> I1103 01:40:55.533572 29112 registrar.cpp:461] Applied 1 operations in 
> 94467ns; attempting to update the registry
> I1103 01:40:55.546341 29107 slave.cpp:1483] Will retry registration in 
> 36.501523ms if necessary
> I1103 01:40:55.546461 29099 master.cpp:5139] Ignoring register agent message 
> from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
> in progress
> I1103 01:40:55.565403 29097 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 48.099208ms
> I1103 01:40:55.565495 29097 replica.cpp:708] Persisted action TRUNCATE at 
> position 4
> I1103 01:40:55.566788 29097 replica.cpp:691] Replica received learned notice 
> for position 4 from @0.0.0.0:0
> I1103 01:40:55.583937 29101 slave.cpp:1483] Will retry registration in 
> 26.127711ms if necessary
> I1103 01:40:55.584123 29112 master.cpp:5139] Ignoring register agent message 
> from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
> in progress
> I1103 01:40:55.609695 

[jira] [Updated] (MESOS-6545) TestContainerizer is not thread-safe.

2016-11-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6545:
---
Sprint: Mesosphere Sprint 46

> TestContainerizer is not thread-safe.
> -
>
> Key: MESOS-6545
> URL: https://issues.apache.org/jira/browse/MESOS-6545
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
> Fix For: 1.2.0
>
>
> The TestContainerizer is currently not backed by a Process and does not do 
> any explicit synchronization, so it is not thread-safe.
> Most tests currently cannot trip the concurrency issues, but this surfaced 
> recently in MESOS-6544.
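
A minimal sketch of the usual libprocess fix, kept separate from the actual 
TestContainerizer interface: hide the mutable state inside a {{Process}} and 
route every call through {{dispatch()}}, which serializes execution on that 
actor. Class and member names here are illustrative.

{noformat}
#include <set>
#include <string>

#include <process/dispatch.hpp>
#include <process/future.hpp>
#include <process/process.hpp>

// All state lives inside the Process; every call is serialized on its actor.
class SafeContainerizerProcess
  : public process::Process<SafeContainerizerProcess>
{
public:
  process::Future<bool> launch(const std::string& containerId)
  {
    return launched.insert(containerId).second;
  }

private:
  std::set<std::string> launched;
};

// The wrapper owns the process and forwards calls via dispatch(), so two
// concurrent launches can no longer race on `launched`.
class SafeContainerizer
{
public:
  SafeContainerizer() { process::spawn(process); }

  ~SafeContainerizer()
  {
    process::terminate(process);
    process::wait(process);
  }

  process::Future<bool> launch(const std::string& containerId)
  {
    return process::dispatch(
        process, &SafeContainerizerProcess::launch, containerId);
  }

private:
  SafeContainerizerProcess process;
};
{noformat}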



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6466) Add support for streaming HTTP requests in Mesos

2016-11-04 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637168#comment-15637168
 ] 

Anand Mazumdar edited comment on MESOS-6466 at 11/4/16 8:25 PM:


Review Chain: https://reviews.apache.org/r/53481/

Currently still left:
- Fix all tests relying on filtering HTTP events (see r53491 for more details)
- Parameterize the existing decoder tests so that they also exercise the 
streaming decoder.
- Include Ben's fix for streaming gzip decompression (MESOS-6530)


was (Author: anandmazumdar):
Review Chain: https://reviews.apache.org/r/53481/

Currently still left:
- Fix all tests relying on filtering HTTP events (see r53491 for more details)
- Parameterize the existing decoder tests so that they also exercise the 
streaming decoder.

> Add support for streaming HTTP requests in Mesos
> 
>
> Key: MESOS-6466
> URL: https://issues.apache.org/jira/browse/MESOS-6466
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Anand Mazumdar
>  Labels: debugging, mesosphere
>
> We already have support for streaming HTTP responses in Mesos. We now also 
> need to add support for streaming HTTP requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6366) Design doc for executor authentication

2016-11-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6366:
-
Description: (was: Produce a design for the passing of credentials to 
the agent, and their use in the following three scenarios:
* HTTP executor authentication
* Container image fetching
* Artifact fetching)

> Design doc for executor authentication
> --
>
> Key: MESOS-6366
> URL: https://issues.apache.org/jira/browse/MESOS-6366
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6365) Executor authentication

2016-11-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6365:
-
Epic Name: Executor Authentication  (was: Agent Secrets)

> Executor authentication
> ---
>
> Key: MESOS-6365
> URL: https://issues.apache.org/jira/browse/MESOS-6365
> Project: Mesos
>  Issue Type: Epic
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> To further develop the security of Mesos clusters, executors should be able 
> to authenticate with the Mesos agent. This will entail adding authentication 
> to the executor API and the default executor, as well as providing guidelines 
> for custom executor developers to enable authentication in their executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6365) Executor authentication

2016-11-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6365:
-
Description: To further develop the security of Mesos clusters, executors 
should be able to authenticate with the Mesos agent. This will entail adding 
authentication to the executor API and the default executor, as well as 
providing guidelines for custom executor developers to enable authentication in 
their executors.  (was: Three features are currently driving the need for a 
mechanism to pass secrets/credentials from the master to the agent:
* HTTP executor authentication
* Container image fetching
* Artifact fetching

We currently provide the ability to authenticate with a Docker registry, but 
the credentials used for this may only be set once on the agent via a 
command-line flag. Allowing operators to specify a Docker credential on a 
per-task basis requires a secret-passing mechanism.

We should design and implement a method for passing secrets that will work in 
all three of these scenarios.)

> Executor authentication
> ---
>
> Key: MESOS-6365
> URL: https://issues.apache.org/jira/browse/MESOS-6365
> Project: Mesos
>  Issue Type: Epic
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> To further develop the security of Mesos clusters, executors should be able 
> to authenticate with the Mesos agent. This will entail adding authentication 
> to the executor API and the default executor, as well as providing guidelines 
> for custom executor developers to enable authentication in their executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6366) Design doc for executor authentication

2016-11-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6366:
-
Summary: Design doc for executor authentication  (was: Design doc for agent 
secrets)

> Design doc for executor authentication
> --
>
> Key: MESOS-6366
> URL: https://issues.apache.org/jira/browse/MESOS-6366
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Produce a design for the passing of credentials to the agent, and their use 
> in the following three scenarios:
> * HTTP executor authentication
> * Container image fetching
> * Artifact fetching



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6365) Executor authentication

2016-11-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6365:
-
Summary: Executor authentication  (was: Agent secrets for executor 
authentication and fetching)

> Executor authentication
> ---
>
> Key: MESOS-6365
> URL: https://issues.apache.org/jira/browse/MESOS-6365
> Project: Mesos
>  Issue Type: Epic
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Three features are currently driving the need for a mechanism to pass 
> secrets/credentials from the master to the agent:
> * HTTP executor authentication
> * Container image fetching
> * Artifact fetching
> We currently provide the ability to authenticate with a Docker registry, but 
> the credentials used for this may only be set once on the agent via a 
> command-line flag. Allowing operators to specify a Docker credential on a 
> per-task basis requires a secret-passing mechanism.
> We should design and implement a method for passing secrets that will work in 
> all three of these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6550) Mesos master ui shows a Lost task as Running in Completed tasks section

2016-11-04 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637337#comment-15637337
 ] 

Yan Xu commented on MESOS-6550:
---

There are two issues:

1. Duplicate entries.
2. Tasks being transitioned to completed but with a RUNNING state.

The first issue actually also happens with a PARTITION_AWARE framework, in 
which case the task is transitioned from TASK_UNREACHABLE -> TASK_RUNNING after 
the unreachable agent comes back, but the stale entry is still in the 
"completed tasks" section. Perhaps this condition is worth a JIRA of its own?

> Mesos master ui shows a Lost task as Running in Completed tasks section
> ---
>
> Key: MESOS-6550
> URL: https://issues.apache.org/jira/browse/MESOS-6550
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Megha
> Attachments: screenshot-1.png
>
>
> This happens in particular when an agent is marked unreachable and, as a 
> result, the master marks tasks from partition-unaware frameworks as lost. 
> When the agent comes back up, we see another instance of such a task (from 
> the partition-unaware framework) shown as running in the master UI's 
> completed tasks section, even though the master already sent a kill for this 
> task and the agent acted on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4601) Don't dump stack trace on failure to bind()

2016-11-04 Thread Armand Grillet (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armand Grillet reassigned MESOS-4601:
-

Assignee: Armand Grillet

> Don't dump stack trace on failure to bind()
> ---
>
> Key: MESOS-4601
> URL: https://issues.apache.org/jira/browse/MESOS-4601
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Armand Grillet
>  Labels: errorhandling, libprocess, mesosphere, newbie
>
> We should do {{EXIT(EXIT_FAILURE)}} rather than {{LOG(FATAL)}}, both for this 
> code path and a few other expected error conditions in libprocess network 
> initialization.
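
A minimal sketch of the two failure styles, assuming stout's {{EXIT}} stream 
macro and glog's {{LOG(FATAL)}}; the surrounding function is illustrative.

{noformat}
#include <cstdlib>
#include <string>

#include <glog/logging.h>  // LOG(FATAL): aborts and dumps a stack trace.
#include <stout/exit.hpp>  // EXIT(status): prints the message and exits.

void onBindFailure(const std::string& error)
{
  // Current behavior: a stack trace, even though "address already in use"
  // is an expected, user-facing error.
  //   LOG(FATAL) << "Failed to bind: " << error;

  // Proposed behavior: report the error and exit with a failure status.
  EXIT(EXIT_FAILURE) << "Failed to bind: " << error;
}
{noformat}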



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2723) The mesos-execute tool does not support zk:// master URLs

2016-11-04 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-2723:
-
Shepherd: Joseph Wu

> The mesos-execute tool does not support zk:// master URLs
> -
>
> Key: MESOS-2723
> URL: https://issues.apache.org/jira/browse/MESOS-2723
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Tom Arnfeld
>Assignee: Armand Grillet
>  Labels: newbie
>
> It appears that the {{mesos-execute}} command line tool does its own PID 
> validation of the {{--master}} parameter, which prevents it from supporting 
> clusters managed with ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4601) Don't dump stack trace on failure to bind()

2016-11-04 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4601:
-
Shepherd: Joseph Wu

> Don't dump stack trace on failure to bind()
> ---
>
> Key: MESOS-4601
> URL: https://issues.apache.org/jira/browse/MESOS-4601
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Neil Conway
>  Labels: errorhandling, libprocess, mesosphere, newbie
>
> We should do {{EXIT(EXIT_FAILURE)}} rather than {{LOG(FATAL)}}, both for this 
> code path and a few other expected error conditions in libprocess network 
> initialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2723) The mesos-execute tool does not support zk:// master URLs

2016-11-04 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2723:
--
Assignee: Armand Grillet

> The mesos-execute tool does not support zk:// master URLs
> -
>
> Key: MESOS-2723
> URL: https://issues.apache.org/jira/browse/MESOS-2723
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Tom Arnfeld
>Assignee: Armand Grillet
>  Labels: newbie
>
> It appears that the {{mesos-execute}} command line tool does its own PID 
> validation of the {{--master}} parameter, which prevents it from supporting 
> clusters managed with ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6550) Mesos master ui shows a Lost task as Running in Completed tasks section

2016-11-04 Thread Megha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637109#comment-15637109
 ] 

Megha commented on MESOS-6550:
--

[~xujyan]

> Mesos master ui shows a Lost task as Running in Completed tasks section
> ---
>
> Key: MESOS-6550
> URL: https://issues.apache.org/jira/browse/MESOS-6550
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Megha
> Attachments: screenshot-1.png
>
>
> This happens in particular when an agent is marked unreachable and, as a 
> result, the master marks tasks from partition-unaware frameworks as lost. 
> When the agent comes back up, we see another instance of such a task (from 
> the partition-unaware framework) shown as running in the master UI's 
> completed tasks section, even though the master already sent a kill for this 
> task and the agent acted on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6550) Mesos master ui shows a Lost task as Running in Completed tasks section

2016-11-04 Thread Megha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megha updated MESOS-6550:
-
Attachment: screenshot-1.png

> Mesos master ui shows a Lost task as Running in Completed tasks section
> ---
>
> Key: MESOS-6550
> URL: https://issues.apache.org/jira/browse/MESOS-6550
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Megha
> Attachments: screenshot-1.png
>
>
> This happens in particular when an agent is marked unreachable and, as a 
> result, the master marks tasks from partition-unaware frameworks as lost. 
> When the agent comes back up, we see another instance of such a task (from 
> the partition-unaware framework) shown as running in the master UI's 
> completed tasks section, even though the master already sent a kill for this 
> task and the agent acted on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6550) Mesos master ui shows a Lost task as Running in Completed tasks section

2016-11-04 Thread Megha (JIRA)
Megha created MESOS-6550:


 Summary: Mesos master ui shows a Lost task as Running in Completed 
tasks section
 Key: MESOS-6550
 URL: https://issues.apache.org/jira/browse/MESOS-6550
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Megha


This happens in particular when an agent is marked unreachable and, as a 
result, the master marks tasks from partition-unaware frameworks as lost. 
When the agent comes back up, we see another instance of such a task (from 
the partition-unaware framework) shown as running in the master UI's 
completed tasks section, even though the master already sent a kill for this 
task and the agent acted on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-6549) Asynchronous dir removal in agent GC

2016-11-04 Thread Jacob Janco (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Janco updated MESOS-6549:
---
Comment: was deleted

(was: https://reviews.apache.org/r/53479/)

> Asynchronous dir removal in agent GC
> 
>
> Key: MESOS-6549
> URL: https://issues.apache.org/jira/browse/MESOS-6549
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jacob Janco
>Assignee: Jacob Janco
>  Labels: gc
>
> In src/slave/gc.cpp: 
>   // TODO(bmahler): Other dispatches can block waiting for a removal
>   // operation. To fix this, the removal operation can be done
>   // asynchronously in another thread.
> We did see this occur in our clusters, where rmdir operations can take 
> seconds to complete, blocking other queued events and leading to, for 
> example, long task launch latencies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6549) Asynchronous dir removal in agent GC

2016-11-04 Thread Jacob Janco (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637001#comment-15637001
 ] 

Jacob Janco commented on MESOS-6549:


https://reviews.apache.org/r/53479/

> Asynchronous dir removal in agent GC
> 
>
> Key: MESOS-6549
> URL: https://issues.apache.org/jira/browse/MESOS-6549
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jacob Janco
>Assignee: Jacob Janco
>  Labels: gc
>
> In src/slave/gc.cpp: 
>   // TODO(bmahler): Other dispatches can block waiting for a removal
>   // operation. To fix this, the removal operation can be done
>   // asynchronously in another thread.
> We did see this occur in our clusters, where rmdir operations can take 
> seconds to complete, blocking other queued events and leading to, for 
> example, long task launch latencies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6549) Asynchronous dir removal in agent GC

2016-11-04 Thread Jacob Janco (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636997#comment-15636997
 ] 

Jacob Janco commented on MESOS-6549:


https://reviews.apache.org/r/53479/

> Asynchronous dir removal in agent GC
> 
>
> Key: MESOS-6549
> URL: https://issues.apache.org/jira/browse/MESOS-6549
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jacob Janco
>Assignee: Jacob Janco
>  Labels: gc
>
> In src/slave/gc.cpp: 
>   // TODO(bmahler): Other dispatches can block waiting for a removal
>   // operation. To fix this, the removal operation can be done
>   // asynchronously in another thread.
> We did see this occur in our clusters, where rmdir operations can take 
> seconds to complete, blocking other queued events and leading to, for 
> example, long task launch latencies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6549) Asynchronous dir removal in agent GC

2016-11-04 Thread Jacob Janco (JIRA)
Jacob Janco created MESOS-6549:
--

 Summary: Asynchronous dir removal in agent GC
 Key: MESOS-6549
 URL: https://issues.apache.org/jira/browse/MESOS-6549
 Project: Mesos
  Issue Type: Improvement
Reporter: Jacob Janco
Assignee: Jacob Janco


In src/slave/gc.cpp: 
  // TODO(bmahler): Other dispatches can block waiting for a removal
  // operation. To fix this, the removal operation can be done
  // asynchronously in another thread.

We did see this occur in our clusters, where rmdir operations can take seconds 
to complete, blocking other queued events and leading to, for example, long 
task launch latencies.
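
A minimal sketch of the idea using plain {{std::async}}; the actual fix would 
presumably use libprocess facilities rather than a raw thread, and the helper 
name is illustrative.

{noformat}
#include <future>
#include <string>

#include <stout/nothing.hpp>
#include <stout/os/rmdir.hpp>
#include <stout/try.hpp>

// Run the (potentially slow, recursive) rmdir on a separate thread so the
// GC actor's queue is not blocked while a large sandbox is being removed.
std::future<Try<Nothing>> removeAsync(const std::string& path)
{
  return std::async(std::launch::async, [path]() {
    return os::rmdir(path); // Recursive by default; may take seconds.
  });
}
{noformat}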




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6520) Make errno an explicit argument for ErrnoError.

2016-11-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636698#comment-15636698
 ] 

James Peach commented on MESOS-6520:


| Support explicit error codes in ErrnoError and SocketError. | 
https://reviews.apache.org/r/53474/ |
| Use explicit error codes in ErrnoError and SocketError. | 
https://reviews.apache.org/r/53475/ |

> Make errno an explicit argument for ErrnoError.
> ---
>
> Key: MESOS-6520
> URL: https://issues.apache.org/jira/browse/MESOS-6520
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Make {{errno}} an explicit argument to {{ErrnoError}}. Right now, the 
> constructor to {{ErrnoError}} references {{errno}} directly, which makes it 
> awkward to pass a custom {{errno}} value (you have to set {{errno}} globally).
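
A minimal sketch of the shape being proposed, not the actual stout 
implementation: the error code becomes an explicit constructor argument that 
defaults to the global {{errno}}, so callers can report a saved code without 
mutating the global.

{noformat}
#include <cerrno>
#include <cstring>
#include <string>

struct ErrnoErrorSketch
{
  explicit ErrnoErrorSketch(int _code = errno, const std::string& message = "")
    : code(_code),
      description(message.empty()
          ? std::string(::strerror(_code))
          : message + ": " + ::strerror(_code)) {}

  const int code;
  const std::string description;
};

// Usage: report an error code captured earlier, even if later calls have
// since clobbered the global errno.
//
//   int saved = errno;
//   ...                      // other work that may change errno
//   ErrnoErrorSketch error(saved, "Failed to bind");
{noformat}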



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6142) Frameworks may RESERVE for an arbitrary role.

2016-11-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1785#comment-1785
 ] 

Gastón Kleiman edited comment on MESOS-6142 at 11/4/16 12:52 PM:
-

Patches:

https://reviews.apache.org/r/52642/
https://reviews.apache.org/r/53470/


was (Author: gkleiman):
Patch: https://reviews.apache.org/r/52642/

> Frameworks may RESERVE for an arbitrary role.
> -
>
> Key: MESOS-6142
> URL: https://issues.apache.org/jira/browse/MESOS-6142
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>Priority: Critical
>  Labels: mesosphere, reservations
>
> The master does not validate that resources from a reservation request have 
> the same role the framework is registered with. As a result, frameworks may 
> reserve resources for arbitrary roles.
> I've modified the role in [the {{ReserveThenUnreserve}} 
> test|https://github.com/apache/mesos/blob/bca600cf5602ed8227d91af9f73d689da14ad786/src/tests/reservation_tests.cpp#L117]
>  to "yoyo" and observed the following in the test's log:
> {noformat}
> I0908 18:35:43.379122 2138112 master.cpp:3362] Processing ACCEPT call for 
> offers: [ dfaf67e6-7c1c-4988-b427-c49842cb7bb7-O0 ] on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train) for framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- 
> (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116
> I0908 18:35:43.379170 2138112 master.cpp:3022] Authorizing principal 
> 'test-principal' to reserve resources 'cpus(yoyo, test-principal):1; 
> mem(yoyo, test-principal):512'
> I0908 18:35:43.379678 2138112 master.cpp:3642] Applying RESERVE operation for 
> resources cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 from 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.379767 2138112 master.cpp:7341] Sending checkpointed resources 
> cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.380273 3211264 slave.cpp:2497] Updated checkpointed resources 
> from  to cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512
> I0908 18:35:43.380574 2674688 hierarchical.cpp:760] Updated allocation of 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 from cpus(*):1; mem(*):512; 
> disk(*):470841; ports(*):[31000-32000] to ports(*):[31000-32000]; cpus(yoyo, 
> test-principal):1; disk(*):470841; mem(yoyo, test-principal):512 with RESERVE 
> operation
> {noformat}
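
A minimal sketch of the missing check (names and placement are illustrative, 
not the actual master validation code): every resource in a RESERVE operation 
should carry the role the framework registered with.

{noformat}
#include <string>

#include <mesos/resources.hpp>

#include <stout/error.hpp>
#include <stout/none.hpp>
#include <stout/option.hpp>
#include <stout/stringify.hpp>

// Returns an Error if any resource in the RESERVE operation is reserved for
// a role other than the framework's role; None() means the check passed.
Option<Error> validateReserveRole(
    const mesos::Resources& resources,
    const std::string& frameworkRole)
{
  for (const mesos::Resource& resource : resources) {
    if (resource.role() != frameworkRole) {
      return Error(
          "Resource " + stringify(resource) + " is reserved for role '" +
          resource.role() + "', but the framework has role '" +
          frameworkRole + "'");
    }
  }

  return None();
}
{noformat}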



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error

2016-11-04 Thread yongyu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635833#comment-15635833
 ] 

yongyu edited comment on MESOS-6532 at 11/4/16 9:49 AM:


has been fixed.


was (Author: 2507697...@qq.com):
now fix it.

> I use mesos container type, I set CommandInfo command set shell cmd, eg: 
> python a.py "xx  xxx", but get error
> -
>
> Key: MESOS-6532
> URL: https://issues.apache.org/jira/browse/MESOS-6532
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Affects Versions: 1.0.1
>Reporter: yongyu
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error

2016-11-04 Thread yongyu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635833#comment-15635833
 ] 

yongyu commented on MESOS-6532:
---

now fix it.

> I use mesos container type, I set CommandInfo command set shell cmd, eg: 
> python a.py "xx  xxx", but get error
> -
>
> Key: MESOS-6532
> URL: https://issues.apache.org/jira/browse/MESOS-6532
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Affects Versions: 1.0.1
>Reporter: yongyu
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6548) Support NUMA for tasks

2016-11-04 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6548:

Summary: Support NUMA for tasks  (was: numa-isolator)

> Support NUMA for tasks
> --
>
> Key: MESOS-6548
> URL: https://issues.apache.org/jira/browse/MESOS-6548
> Project: Mesos
>  Issue Type: Epic
>Reporter: haosdent
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6548) numa-isolator

2016-11-04 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6548:

Epic Name: numa-isolator  (was: Support NUMA for tasks)

> numa-isolator
> -
>
> Key: MESOS-6548
> URL: https://issues.apache.org/jira/browse/MESOS-6548
> Project: Mesos
>  Issue Type: Epic
>Reporter: haosdent
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6548) numa-isolator

2016-11-04 Thread haosdent (JIRA)
haosdent created MESOS-6548:
---

 Summary: numa-isolator
 Key: MESOS-6548
 URL: https://issues.apache.org/jira/browse/MESOS-6548
 Project: Mesos
  Issue Type: Epic
Reporter: haosdent






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6541) Mesos test should mount cgroups_root

2016-11-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635760#comment-15635760
 ] 

haosdent commented on MESOS-6541:
-

We don't run test cases after {{clone()}} for now, and only Linux supports 
this. To make all test cases run in an isolated environment, we have used 
Docker in the Apache Jenkins CI. I still prefer the current way of running test 
cases without {{CLONE_NEWNS}}; it would expose problems more obviously if Mesos 
has leaked mounts.

> Mesos test should mount cgroups_root
> 
>
> Key: MESOS-6541
> URL: https://issues.apache.org/jira/browse/MESOS-6541
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, test
>Reporter: Yan Xu
>
> Currently, on hosts without a prior cgroups setup where sysfs is mounted at 
> /sys, Mesos tests fail like this:
> {noformat:title=}
> [ RUN  ] HTTPCommandExecutorTest.TerminateWithACK
> F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] 
> CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux 
> launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/freezer': 
> Failed to create directory '/sys/fs/cgroup/freezer': No such file or 
> directory
> {noformat}
> This is because the agent chooses to use {{LinuxLauncher}} based on 
> availability of the {{freezer}} subsystem alone. However, for it to work, one 
> needs to do the following
> {noformat:title=}
> mount -t tmpfs cgroup_root /sys/fs/cgroup
> {noformat}
> in order to make {{/sys/fs/cgroup}} writable.
> I have always run this command manually when the failure happens, but it can 
> be baffling, especially to new developers. Mesos tests should just mount it 
> if it's not already mounted.
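
A minimal sketch of what "just mount it" could look like in the test setup, 
equivalent to the command above; privilege checks and detection of an already 
usable {{/sys/fs/cgroup}} are omitted, and the function name is illustrative.

{noformat}
#include <sys/mount.h>

#include <cerrno>
#include <cstring>
#include <iostream>

// Equivalent of `mount -t tmpfs cgroup_root /sys/fs/cgroup`, making the
// cgroups root writable so the freezer hierarchy can be created beneath it.
bool mountCgroupsRoot()
{
  if (::mount("cgroup_root", "/sys/fs/cgroup", "tmpfs", 0, nullptr) != 0) {
    std::cerr << "Failed to mount tmpfs at /sys/fs/cgroup: "
              << ::strerror(errno) << std::endl;
    return false;
  }

  return true;
}
{noformat}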



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6541) Mesos test should mount cgroups_root

2016-11-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635759#comment-15635759
 ] 

haosdent commented on MESOS-6541:
-

We don't run test cases after {{clone()}} for now, and only Linux supports 
this. To make all test cases run in an isolated environment, we have used 
Docker in the Apache Jenkins CI. I still prefer the current way of running test 
cases without {{CLONE_NEWNS}}; it would expose problems more obviously if Mesos 
has leaked mounts.

> Mesos test should mount cgroups_root
> 
>
> Key: MESOS-6541
> URL: https://issues.apache.org/jira/browse/MESOS-6541
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, test
>Reporter: Yan Xu
>
> Currently, on hosts without a prior cgroups setup where sysfs is mounted at 
> /sys, Mesos tests fail like this:
> {noformat:title=}
> [ RUN  ] HTTPCommandExecutorTest.TerminateWithACK
> F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] 
> CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux 
> launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/freezer': 
> Failed to create directory '/sys/fs/cgroup/freezer': No such file or 
> directory
> {noformat}
> This is because the agent chooses to use {{LinuxLauncher}} based on 
> availability of the {{freezer}} subsystem alone. However, for it to work, one 
> needs to do the following
> {noformat:title=}
> mount -t tmpfs cgroup_root /sys/fs/cgroup
> {noformat}
> in order to make {{/sys/fs/cgroup}} writable.
> I have always run this command manually when the failure happens, but it can 
> be baffling, especially to new developers. Mesos tests should just mount it 
> if it's not already mounted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-6541) Mesos test should mount cgroups_root

2016-11-04 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6541:

Comment: was deleted

(was: We don't run test cases after {{clone()}} for now, and only Linux 
supports this. To make all test cases run in an isolated environment, we have 
used Docker in the Apache Jenkins CI. I still prefer the current way of running 
test cases without {{CLONE_NEWNS}}; it would expose problems more obviously if 
Mesos has leaked mounts.)

> Mesos test should mount cgroups_root
> 
>
> Key: MESOS-6541
> URL: https://issues.apache.org/jira/browse/MESOS-6541
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, test
>Reporter: Yan Xu
>
> Currently, on hosts without a prior cgroups setup where sysfs is mounted at 
> /sys, Mesos tests fail like this:
> {noformat:title=}
> [ RUN  ] HTTPCommandExecutorTest.TerminateWithACK
> F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] 
> CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux 
> launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/freezer': 
> Failed to create directory '/sys/fs/cgroup/freezer': No such file or 
> directory
> {noformat}
> This is because the agent chooses to use {{LinuxLauncher}} based on 
> availability of the {{freezer}} subsystem alone. However, for it to work, one 
> needs to do the following
> {noformat:title=}
> mount -t tmpfs cgroup_root /sys/fs/cgroup
> {noformat}
> in order to make {{/sys/fs/cgroup}} writable.
> I have always run this command manually when the failure happens, but it can 
> be baffling, especially to new developers. Mesos tests should just mount it 
> if it's not already mounted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6418) Avoid popup a new window when open stdout/stderr of the executor

2016-11-04 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6418:

Component/s: webui

> Avoid popup a new window when open stdout/stderr of the executor
> 
>
> Key: MESOS-6418
> URL: https://issues.apache.org/jira/browse/MESOS-6418
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: haosdent
>Assignee: haosdent
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6418) Avoid popup a new window when open stdout/stderr of the executor

2016-11-04 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-6418:
---

Assignee: haosdent

> Avoid popup a new window when open stdout/stderr of the executor
> 
>
> Key: MESOS-6418
> URL: https://issues.apache.org/jira/browse/MESOS-6418
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: haosdent
>Assignee: haosdent
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6465) Add a task_id -> container_id mapping in state.json

2016-11-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635558#comment-15635558
 ] 

Jie Yu commented on MESOS-6465:
---

https://reviews.apache.org/r/53467/

> Add a task_id -> container_id mapping in state.json
> ---
>
> Key: MESOS-6465
> URL: https://issues.apache.org/jira/browse/MESOS-6465
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Jie Yu
>  Labels: debugging, mesosphere
>
> Currently, there is no way to get the {{container_id}} of a task by hitting 
> the Mesos master alone. You must first hit the master to get the {{task_id 
> -> agent_id}} and {{task_id -> executor_id}} mappings, then hit the 
> corresponding agent with the {{agent_id}} to get the {{executor_id -> 
> container_id}} mapping.
> It would simplify things a lot if the {{container_id}} information were 
> immediately available in the {{/tasks}} and {{/state}} endpoints of the 
> master itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6465) Add a task_id -> container_id mapping in state.json

2016-11-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635506#comment-15635506
 ] 

Jie Yu commented on MESOS-6465:
---

OK, the plan is to ask the default executor to set the container_id in the 
container status, and the agent will use that ID to get the proper container 
status for the nested container.

Given that ContainerStatus is part of the task status update, this mapping 
information will be available on the master.

> Add a task_id -> container_id mapping in state.json
> ---
>
> Key: MESOS-6465
> URL: https://issues.apache.org/jira/browse/MESOS-6465
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Jie Yu
>  Labels: debugging, mesosphere
>
> Currently, there is no way to get the {{container_id}} of a task by hitting 
> the Mesos master alone. You must first hit the master to get the {{task_id 
> -> agent_id}} and {{task_id -> executor_id}} mappings, then hit the 
> corresponding agent with the {{agent_id}} to get the {{executor_id -> 
> container_id}} mapping.
> It would simplify things a lot if the {{container_id}} information were 
> immediately available in the {{/tasks}} and {{/state}} endpoints of the 
> master itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.

2016-11-04 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635400#comment-15635400
 ] 

Benjamin Mahler commented on MESOS-6544:


This will be fixed via MESOS-6545.

> MasterMaintenanceTest.InverseOffersFilters is flaky.
> 
>
> Key: MESOS-6544
> URL: https://issues.apache.org/jira/browse/MESOS-6544
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> This test can crash when launching two executors concurrently because the 
> test containerizer is not thread-safe! (see MESOS-6545).
> {noformat}
> [...truncated 78174 lines...]
> I1103 01:40:55.530350 29098 slave.cpp:974] Authenticating with master 
> master@172.17.0.2:58302
> I1103 01:40:55.530432 29098 slave.cpp:985] Using default CRAM-MD5 
> authenticatee
> I1103 01:40:55.530627 29098 slave.cpp:947] Detecting new master
> I1103 01:40:55.530675 29108 authenticatee.cpp:121] Creating new client SASL 
> connection
> I1103 01:40:55.530743 29098 slave.cpp:5587] Received oversubscribable 
> resources {} from the resource estimator
> I1103 01:40:55.530961 29099 master.cpp:6742] Authenticating 
> slave(150)@172.17.0.2:58302
> I1103 01:40:55.531070 29112 authenticator.cpp:414] Starting authentication 
> session for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.531328 29106 authenticator.cpp:98] Creating new server SASL 
> connection
> I1103 01:40:55.531561 29108 authenticatee.cpp:213] Received SASL 
> authentication mechanisms: CRAM-MD5
> I1103 01:40:55.531604 29108 authenticatee.cpp:239] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I1103 01:40:55.531713 29101 authenticator.cpp:204] Received SASL 
> authentication start
> I1103 01:40:55.531805 29101 authenticator.cpp:326] Authentication requires 
> more steps
> I1103 01:40:55.531921 29108 authenticatee.cpp:259] Received SASL 
> authentication step
> I1103 01:40:55.532120 29101 authenticator.cpp:232] Received SASL 
> authentication step
> I1103 01:40:55.532155 29101 auxprop.cpp:109] Request to lookup properties for 
> user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: false
> I1103 01:40:55.532179 29101 auxprop.cpp:181] Looking up auxiliary property 
> '*userPassword'
> I1103 01:40:55.532233 29101 auxprop.cpp:181] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I1103 01:40:55.532266 29101 auxprop.cpp:109] Request to lookup properties for 
> user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: true
> I1103 01:40:55.532289 29101 auxprop.cpp:131] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532305 29101 auxprop.cpp:131] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532335 29101 authenticator.cpp:318] Authentication success
> I1103 01:40:55.532413 29110 authenticatee.cpp:299] Authentication success
> I1103 01:40:55.532467 29108 master.cpp:6772] Successfully authenticated 
> principal 'test-principal' at slave(150)@172.17.0.2:58302
> I1103 01:40:55.532536 29111 authenticator.cpp:432] Authentication session 
> cleanup for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.532755 29098 slave.cpp:1069] Successfully authenticated with 
> master master@172.17.0.2:58302
> I1103 01:40:55.532997 29098 slave.cpp:1483] Will retry registration in 
> 12.590371ms if necessary
> I1103 01:40:55.533179 29108 master.cpp:5151] Registering agent at 
> slave(150)@172.17.0.2:58302 (maintenance-host-2) with id 
> 3167a687-904b-4b57-bc0f-91b67dc7e41d-S1
> I1103 01:40:55.533572 29112 registrar.cpp:461] Applied 1 operations in 
> 94467ns; attempting to update the registry
> I1103 01:40:55.546341 29107 slave.cpp:1483] Will retry registration in 
> 36.501523ms if necessary
> I1103 01:40:55.546461 29099 master.cpp:5139] Ignoring register agent message 
> from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
> in progress
> I1103 01:40:55.565403 29097 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 48.099208ms
> I1103 01:40:55.565495 29097 replica.cpp:708] Persisted action TRUNCATE at 
> position 4
> I1103 01:40:55.566788 29097 replica.cpp:691] Replica received learned notice 
> for position 4 from @0.0.0.0:0
> I1103 01:40:55.583937 29101 slave.cpp:1483] Will retry registration in 
> 26.127711ms if necessary
> I1103 01:40:55.584123 29112 master.cpp:5139] Ignoring register agent message 
> from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
> in progress
> I1103 

[jira] [Assigned] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.

2016-11-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-6544:
--

Assignee: Benjamin Mahler

> MasterMaintenanceTest.InverseOffersFilters is flaky.
> 
>
> Key: MESOS-6544
> URL: https://issues.apache.org/jira/browse/MESOS-6544
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> This test can crash when launching two executors concurrently because the 
> test containerizer is not thread-safe! (see MESOS-6545).
> {noformat}
> [...truncated 78174 lines...]
> I1103 01:40:55.530350 29098 slave.cpp:974] Authenticating with master 
> master@172.17.0.2:58302
> I1103 01:40:55.530432 29098 slave.cpp:985] Using default CRAM-MD5 
> authenticatee
> I1103 01:40:55.530627 29098 slave.cpp:947] Detecting new master
> I1103 01:40:55.530675 29108 authenticatee.cpp:121] Creating new client SASL 
> connection
> I1103 01:40:55.530743 29098 slave.cpp:5587] Received oversubscribable 
> resources {} from the resource estimator
> I1103 01:40:55.530961 29099 master.cpp:6742] Authenticating 
> slave(150)@172.17.0.2:58302
> I1103 01:40:55.531070 29112 authenticator.cpp:414] Starting authentication 
> session for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.531328 29106 authenticator.cpp:98] Creating new server SASL 
> connection
> I1103 01:40:55.531561 29108 authenticatee.cpp:213] Received SASL 
> authentication mechanisms: CRAM-MD5
> I1103 01:40:55.531604 29108 authenticatee.cpp:239] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I1103 01:40:55.531713 29101 authenticator.cpp:204] Received SASL 
> authentication start
> I1103 01:40:55.531805 29101 authenticator.cpp:326] Authentication requires 
> more steps
> I1103 01:40:55.531921 29108 authenticatee.cpp:259] Received SASL 
> authentication step
> I1103 01:40:55.532120 29101 authenticator.cpp:232] Received SASL 
> authentication step
> I1103 01:40:55.532155 29101 auxprop.cpp:109] Request to lookup properties for 
> user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: false
> I1103 01:40:55.532179 29101 auxprop.cpp:181] Looking up auxiliary property 
> '*userPassword'
> I1103 01:40:55.532233 29101 auxprop.cpp:181] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I1103 01:40:55.532266 29101 auxprop.cpp:109] Request to lookup properties for 
> user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: true
> I1103 01:40:55.532289 29101 auxprop.cpp:131] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532305 29101 auxprop.cpp:131] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532335 29101 authenticator.cpp:318] Authentication success
> I1103 01:40:55.532413 29110 authenticatee.cpp:299] Authentication success
> I1103 01:40:55.532467 29108 master.cpp:6772] Successfully authenticated 
> principal 'test-principal' at slave(150)@172.17.0.2:58302
> I1103 01:40:55.532536 29111 authenticator.cpp:432] Authentication session 
> cleanup for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.532755 29098 slave.cpp:1069] Successfully authenticated with 
> master master@172.17.0.2:58302
> I1103 01:40:55.532997 29098 slave.cpp:1483] Will retry registration in 
> 12.590371ms if necessary
> I1103 01:40:55.533179 29108 master.cpp:5151] Registering agent at 
> slave(150)@172.17.0.2:58302 (maintenance-host-2) with id 
> 3167a687-904b-4b57-bc0f-91b67dc7e41d-S1
> I1103 01:40:55.533572 29112 registrar.cpp:461] Applied 1 operations in 
> 94467ns; attempting to update the registry
> I1103 01:40:55.546341 29107 slave.cpp:1483] Will retry registration in 
> 36.501523ms if necessary
> I1103 01:40:55.546461 29099 master.cpp:5139] Ignoring register agent message 
> from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
> in progress
> I1103 01:40:55.565403 29097 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 48.099208ms
> I1103 01:40:55.565495 29097 replica.cpp:708] Persisted action TRUNCATE at 
> position 4
> I1103 01:40:55.566788 29097 replica.cpp:691] Replica received learned notice 
> for position 4 from @0.0.0.0:0
> I1103 01:40:55.583937 29101 slave.cpp:1483] Will retry registration in 
> 26.127711ms if necessary
> I1103 01:40:55.584123 29112 master.cpp:5139] Ignoring register agent message 
> from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
> in progress
> I1103 01:40:55.609695 29097 leveldb.cpp:341] 

[jira] [Assigned] (MESOS-6545) TestContainerizer is not thread-safe.

2016-11-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-6545:
--

Assignee: Benjamin Mahler

> TestContainerizer is not thread-safe.
> -
>
> Key: MESOS-6545
> URL: https://issues.apache.org/jira/browse/MESOS-6545
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> The TestContainerizer is currently not backed by a Process and does not do 
> any explicit synchronization, so it is not thread-safe.
> Most tests currently cannot trip the concurrency issues, but this surfaced 
> recently in MESOS-6544.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)