[jira] [Created] (MESOS-4773) CMake: Build master executable.

2016-02-24 Thread Diana Arroyo (JIRA)
Diana Arroyo created MESOS-4773:
---

 Summary: CMake: Build master executable.
 Key: MESOS-4773
 URL: https://issues.apache.org/jira/browse/MESOS-4773
 Project: Mesos
  Issue Type: Task
  Components: cmake
Reporter: Diana Arroyo








[jira] [Commented] (MESOS-4754) The "executors" field is exposed under a backwards incompatible schema.

2016-02-24 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166848#comment-15166848
 ] 

Michael Park commented on MESOS-4754:
-

{noformat}
commit d99c778de22954c0b3f7089be45ef250386fccd1
Author: Michael Park 
Date:   Wed Feb 24 22:35:39 2016 -0800

Added missing `json` declaration for `ExecutorInfo`.

Review: https://reviews.apache.org/r/43937/
{noformat}
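
The underlying problem was that 0.27 rendered {{ExecutorInfo}} through the 
generic protobuf-to-JSON path. A rough sketch of the kind of {{json}} overload 
the fix adds, assuming Mesos' jsonify-style declarations (see the review for 
the exact code):

{code}
void json(JSON::ObjectWriter* writer, const ExecutorInfo& executorInfo)
{
  // Flatten the wrapped ID messages back into plain strings, matching the
  // 0.26.0 schema shown in the description below.
  writer->field("executor_id", executorInfo.executor_id().value());
  writer->field("name", executorInfo.name());
  writer->field("framework_id", executorInfo.framework_id().value());
  writer->field("command", executorInfo.command());
  writer->field("resources", Resources(executorInfo.resources()));
}
{code}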

> The "executors" field is exposed under a backwards incompatible schema.
> ---
>
> Key: MESOS-4754
> URL: https://issues.apache.org/jira/browse/MESOS-4754
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
> Fix For: 0.27.2
>
>
> In 0.26.0, the master's {{/state}} endpoint generated the following:
> {code}
> {
>   /* ... */
>   "frameworks": [
> {
>   /* ... */
>   "executors": [
> {
>   "command": {
> "argv": [],
> "uris": [],
> "value": 
> "/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
>   },
>   "executor_id": "default",
>   "framework_id": "0ea528a9-64ba-417f-98ea-9c4b8d418db6-",
>   "name": "Long Lived Executor (C++)",
>   "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
>   },
>   "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0"
> }
>   ],
>   /* ... */
> }
>   ]
>   /* ... */
> }
> {code}
> In 0.27.1, the {{ExecutorInfo}} is mistakenly exposed in the raw protobuf 
> schema:
> {code}
> {
>   /* ... */
>   "frameworks": [
> {
>   /* ... */
>   "executors": [
> {
>   "command": {
> "shell": true,
> "value": 
> "/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
>   },
>   "executor_id": {
> "value": "default"
>   },
>   "framework_id": {
> "value": "368a5a49-480b-41f6-a13b-24a69c92a72e-"
>   },
>   "name": "Long Lived Executor (C++)",
>   "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0",
>   "source": "cpp_long_lived_framework"
> }
>   ],
>   /* ... */
> }
>   ]
>   /* ... */
> }
> {code}
> This is a backwards incompatible API change.





[jira] [Updated] (MESOS-4772) TaskInfo/ExecutorInfo should include owner information

2016-02-24 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4772:
--
Labels: authorization mesosphere ownership security  (was: authorization 
ownership security)

> TaskInfo/ExecutorInfo should include owner information
> --
>
> Key: MESOS-4772
> URL: https://issues.apache.org/jira/browse/MESOS-4772
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Adam B
>  Labels: authorization, mesosphere, ownership, security
>
> We need a way to assign fine-grained ownership to tasks/executors so that 
> multi-user frameworks can tell Mesos to associate the task with a user 
> identity (rather than just the framework principal+role). Then, when an HTTP 
> user requests to view the task's sandbox contents, or kill the task, or list 
> all tasks, the authorizer can determine whether to allow/deny/filter the 
> request based on finer-grained, user-level ownership.
> Some systems may want TaskInfo.owner to represent a group rather than an 
> individual user. That's fine as long as the framework sets the field to the 
> group ID in such a way that a group-aware authorizer can interpret it.





[jira] [Created] (MESOS-4772) TaskInfo/ExecutorInfo should include owner information

2016-02-24 Thread Adam B (JIRA)
Adam B created MESOS-4772:
-

 Summary: TaskInfo/ExecutorInfo should include owner information
 Key: MESOS-4772
 URL: https://issues.apache.org/jira/browse/MESOS-4772
 Project: Mesos
  Issue Type: Improvement
  Components: security
Reporter: Adam B


We need a way to assign fine-grained ownership to tasks/executors so that 
multi-user frameworks can tell Mesos to associate the task with a user identity 
(rather than just the framework principal+role). Then, when an HTTP user 
requests to view the task's sandbox contents, or kill the task, or list all 
tasks, the authorizer can determine whether to allow/deny/filter the request 
based on finer-grained, user-level ownership.
Some systems may want TaskInfo.owner to represent a group rather than an 
individual user. That's fine as long as the framework sets the field to the 
group ID in such a way that a group-aware authorizer can interpret it.
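
To make the intent concrete, here is a hypothetical sketch of an owner-aware 
authorization check; the {{owner}} field and the function itself are invented 
for illustration and are not an existing Mesos API:

{code}
// Purely illustrative: 'TaskInfo' has no 'owner' field yet.
bool authorizeKillTask(const std::string& httpUser, const TaskInfo& task)
{
  if (!task.has_owner()) {
    // Fall back to coarser checks (framework principal/role).
    return false;
  }

  // A group-aware authorizer would instead test membership of 'httpUser'
  // in the group named by 'task.owner()'.
  return httpUser == task.owner();
}
{code}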





[jira] [Commented] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-24 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166618#comment-15166618
 ] 

haosdent commented on MESOS-4676:
-

I also found this issue: https://github.com/docker/docker/issues/19950, which 
says:
{quote}
This error is from stdcopy package which muxes stdout/stderr streams. It seems 
like now it writes something weird; I think it can also be golang version 
change.
{quote}
And I could reproduce it with the example code from that issue: 
https://gist.github.com/dpiddy/0c460a8bb297ee19a7a0

I also verified that adding {{-t}} to {{docker run}} avoids this problem.
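
For example, a minimal reproduction along these lines (hypothetical commands, 
not taken from the test):

{noformat}
# The demuxer race is timing-dependent; interleaved writes to stdout and
# stderr go through docker's stdcopy demultiplexer:
docker run --rm alpine sh -c 'echo foo; echo bar 1>&2'

# With a TTY allocated, both streams are merged into one and the
# demultiplexer (and its race) is bypassed:
docker run --rm -t alpine sh -c 'echo foo; echo bar 1>&2'
{noformat}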

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, test
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
> credentials for authentication from '/tmp/HncLLj/credentials'
> [18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using 
> default 'crammd5' authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using 
> default 'basic' HTTP authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] 
> Authorization enabled
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
> whitelist given
> [18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
> metadata (8 bytes) to leveldb took 1.517992ms
> [18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.263144  1754 recover.cpp:473] Replica is 
> in STARTING status
> [18:06:25][Step 8/8] 

[jira] [Updated] (MESOS-3078) Recovered resources are not re-allocated until the next allocation delay.

2016-02-24 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-3078:

Assignee: (was: Klaus Ma)

> Recovered resources are not re-allocated until the next allocation delay.
> -
>
> Key: MESOS-3078
> URL: https://issues.apache.org/jira/browse/MESOS-3078
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>
> Currently, when resources are recovered, we do not perform an allocation for 
> that slave. Rather, we wait until the next allocation interval.
> For small task, high throughput frameworks, this can have a significant 
> impact on overall throughput, see the following thread:
> http://markmail.org/thread/y6mzfwzlurv6nik3
> We should consider immediately performing a re-allocation for the slave upon 
> resource recovery.
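
A minimal sketch of the proposed change in the hierarchical allocator 
(simplified; the real {{recoverResources}} first does bookkeeping for the 
sorters and filters):

{code}
void HierarchicalAllocatorProcess::recoverResources(
    const FrameworkID& frameworkId,
    const SlaveID& slaveId,
    const Resources& resources,
    const Option<Filters>& filters)
{
  // ... existing bookkeeping: return 'resources' to the slave's available
  // pool and update the sorters ...

  // Proposed: immediately perform an event-driven allocation for this
  // slave instead of waiting for the next batch allocation interval.
  allocate(slaveId);
}
{code}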





[jira] [Commented] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.

2016-02-24 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166531#comment-15166531
 ] 

haosdent commented on MESOS-4677:
-

Nice analysis!

{quote}
cgroups.procs doesn't change since exec doesn't change the PID. But there may 
be a race between updating the "threads" (cgroup/tasks) and us reading the 
cgroup/tasks.
{quote}
I think the cgroup/tasks value was always the same as cgroup/cgroup.procs here 
before, because we only have "cat". According to your analysis, cgroup/tasks 
would also change here, right?

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
> ---
>
> Key: MESOS-4677
> URL: https://issues.apache.org/jira/browse/MESOS-4677
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.27
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: flaky, test
>
> This test fails very often when run on CentOS 7, but may also fail elsewhere 
> sometimes. Unfortunately, it tends to only fail when --verbose is not set. 
> The output is this:
> {noformat}
> [21:45:21][Step 8/8] [ RUN  ] 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: 
> Failure
> [21:45:21][Step 8/8] Value of: usage.get().threads()
> [21:45:21][Step 8/8]   Actual: 0
> [21:45:21][Step 8/8] Expected: 1U
> [21:45:21][Step 8/8] Which is: 1
> [21:45:21][Step 8/8] [  FAILED  ] 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
> {noformat}





[jira] [Assigned] (MESOS-4771) Document the network/cni isolator.

2016-02-24 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang reassigned MESOS-4771:
-

Assignee: Qian Zhang

> Document the network/cni isolator.
> --
>
> Key: MESOS-4771
> URL: https://issues.apache.org/jira/browse/MESOS-4771
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> We need to document this isolator in mesos-containerizer.md (e.g., how to 
> configure it, what the prerequisites are, etc.).





[jira] [Updated] (MESOS-4768) MasterMaintenanceTest.InverseOffers is flaky

2016-02-24 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4768:
-
Shepherd: Joris Van Remoortere
Assignee: Joseph Wu
  Sprint: Mesosphere Sprint 29

> MasterMaintenanceTest.InverseOffers is flaky
> 
>
> Key: MESOS-4768
> URL: https://issues.apache.org/jira/browse/MESOS-4768
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 0.28.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere, test
>
> [MESOS-4169] significantly sped up this test, but also surfaced some more 
> flakiness.  This can be fixed in the same way as [MESOS-4059].
> Verbose logs from ASF Centos7 build:
> {code}
> [ RUN  ] MasterMaintenanceTest.InverseOffers
> I0224 22:35:53.714018  1948 leveldb.cpp:174] Opened db in 2.034387ms
> I0224 22:35:53.714663  1948 leveldb.cpp:181] Compacted db in 608839ns
> I0224 22:35:53.714709  1948 leveldb.cpp:196] Created db iterator in 19043ns
> I0224 22:35:53.714844  1948 leveldb.cpp:202] Seeked to beginning of db in 
> 2330ns
> I0224 22:35:53.714956  1948 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 518ns
> I0224 22:35:53.715092  1948 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0224 22:35:53.715646  1968 recover.cpp:447] Starting replica recovery
> I0224 22:35:53.715915  1981 recover.cpp:473] Replica is in EMPTY status
> I0224 22:35:53.717067  1972 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (4533)@172.17.0.1:36678
> I0224 22:35:53.717445  1981 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0224 22:35:53.717888  1978 recover.cpp:564] Updating replica status to 
> STARTING
> I0224 22:35:53.718585  1979 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 525061ns
> I0224 22:35:53.718618  1979 replica.cpp:320] Persisted replica status to 
> STARTING
> I0224 22:35:53.718827  1982 recover.cpp:473] Replica is in STARTING status
> I0224 22:35:53.719728  1969 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (4534)@172.17.0.1:36678
> I0224 22:35:53.719974  1971 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0224 22:35:53.720369  1970 recover.cpp:564] Updating replica status to VOTING
> I0224 22:35:53.720789  1982 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 322308ns
> I0224 22:35:53.720823  1982 replica.cpp:320] Persisted replica status to 
> VOTING
> I0224 22:35:53.720968  1982 recover.cpp:578] Successfully joined the Paxos 
> group
> I0224 22:35:53.721101  1982 recover.cpp:462] Recover process terminated
> I0224 22:35:53.721698  1982 master.cpp:376] Master 
> aab18b61-7811-4c43-a672-d1a63818c880 (4db5fa128d2d) started on 
> 172.17.0.1:36678
> I0224 22:35:53.721719  1982 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_http="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/MjbcWP/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/MjbcWP/master" --zk_session_timeout="10secs"
> I0224 22:35:53.722039  1982 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I0224 22:35:53.722053  1982 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0224 22:35:53.722061  1982 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/MjbcWP/credentials'
> I0224 22:35:53.722394  1982 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0224 22:35:53.722525  1982 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0224 22:35:53.722661  1982 master.cpp:571] Authorization enabled
> I0224 22:35:53.722813  1968 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0224 22:35:53.722846  1980 whitelist_watcher.cpp:77] No whitelist given
> I0224 22:35:53.724957  1977 master.cpp:1712] The newly elected leader is 
> master@172.17.0.1:36678 with id 

[jira] [Created] (MESOS-4771) Document the network/cni isolator.

2016-02-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4771:
-

 Summary: Document the network/cni isolator.
 Key: MESOS-4771
 URL: https://issues.apache.org/jira/browse/MESOS-4771
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


We need to document this isolator in mesos-containerizer.md (e.g., how to 
configure it, what the prerequisites are, etc.).





[jira] [Assigned] (MESOS-4764) The network/cni isolator should report assigned IP address.

2016-02-24 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang reassigned MESOS-4764:
-

Assignee: Qian Zhang

> The network/cni isolator should report assigned IP address. 
> 
>
> Key: MESOS-4764
> URL: https://issues.apache.org/jira/browse/MESOS-4764
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> In order for service discovery to work in some cases, the network/cni 
> isolator needs to report the assigned IP address through the 
> isolator->status() interface.





[jira] [Assigned] (MESOS-3078) Recovered resources are not re-allocated until the next allocation delay.

2016-02-24 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-3078:
---

Assignee: Klaus Ma

> Recovered resources are not re-allocated until the next allocation delay.
> -
>
> Key: MESOS-3078
> URL: https://issues.apache.org/jira/browse/MESOS-3078
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Klaus Ma
>
> Currently, when resources are recovered, we do not perform an allocation for 
> that slave. Rather, we wait until the next allocation interval.
> For small task, high throughput frameworks, this can have a significant 
> impact on overall throughput, see the following thread:
> http://markmail.org/thread/y6mzfwzlurv6nik3
> We should consider immediately performing a re-allocation for the slave upon 
> resource recovery.





[jira] [Created] (MESOS-4770) Investigate performance improvements for 'Resources' class.

2016-02-24 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-4770:
--

 Summary: Investigate performance improvements for 'Resources' 
class.
 Key: MESOS-4770
 URL: https://issues.apache.org/jira/browse/MESOS-4770
 Project: Mesos
  Issue Type: Improvement
Reporter: Benjamin Mahler
Priority: Critical


We currently have performance issues under heavy usage of the {{Resources}} 
class, and we tend to work around them in callers (e.g., by reducing the number 
of Resources arithmetic operations in the calling code).

The implementation of {{Resources}} currently consists of wrapping underlying 
{{Resource}} protobuf objects and manipulating them. This is fairly expensive 
compared to doing things more directly with C++ objects.

This ticket is to explore the performance improvements from using C++ objects 
directly instead of working off of the {{Resource}} protobuf objects.
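
As an illustration of the gap, a hypothetical flat value type (the name and 
layout are invented here) that hot-path arithmetic could operate on, converting 
to and from the {{Resource}} protobufs only at the API boundary:

{code}
// Hypothetical: plain C++ storage for the common scalar resources, so that
// arithmetic avoids protobuf allocation and message traversal entirely.
struct ScalarResources
{
  double cpus = 0.0;
  double mem = 0.0;   // In megabytes.
  double disk = 0.0;  // In megabytes.

  ScalarResources& operator+=(const ScalarResources& that)
  {
    cpus += that.cpus;
    mem += that.mem;
    disk += that.disk;
    return *this;
  }
};
{code}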





[jira] [Updated] (MESOS-4769) Update state endpoints to allow clients to determine how many resources for a given role have been used

2016-02-24 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-4769:
---
Labels: mesosphere  (was: )

> Update state endpoints to allow clients to determine how many resources for a 
> given role have been used
> ---
>
> Key: MESOS-4769
> URL: https://issues.apache.org/jira/browse/MESOS-4769
> Project: Mesos
>  Issue Type: Task
>Affects Versions: 0.27.1
>Reporter: Michael Gummelt
>  Labels: mesosphere
>
> AFAICT, this is currently impossible.  Say I have a cluster with 4 CPUs 
> reserved for {{spark}} and 4 CPUs unreserved, I have a framework registered as 
> {{spark}}, and I would like to determine how many CPUs reserved for {{spark}} 
> have been used.  AFAIK, there are two endpoints with interesting information: 
> {{/master/state}} and {{/master/roles}}.  Both endpoints tell me how many 
> resources are used by the framework registered as {{spark}}, but they don't 
> tell me which role those resources belong to (i.e., are they reserved or 
> unreserved).
> A simple fix would be to update {{/master/roles}} to split out resources into 
> "reserved" and "unreserved".  However, this will fail to solve the problem if 
> (and hopefully when) Mesos supports multi-role frameworks.





[jira] [Created] (MESOS-4769) Update state endpoints to allow clients to determine how many resources for a given role have been used

2016-02-24 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-4769:
--

 Summary: Update state endpoints to allow clients to determine how 
many resources for a given role have been used
 Key: MESOS-4769
 URL: https://issues.apache.org/jira/browse/MESOS-4769
 Project: Mesos
  Issue Type: Task
Affects Versions: 0.27.1
Reporter: Michael Gummelt


AFAICT, this is currently impossible.  Say I have a cluster with 4 CPUs reserved 
for {{spark}} and 4 CPUs unreserved, I have a framework registered as {{spark}}, 
and I would like to determine how many CPUs reserved for {{spark}} have been 
used.  AFAIK, there are two endpoints with interesting information: 
{{/master/state}} and {{/master/roles}}.  Both endpoints tell me how many 
resources are used by the framework registered as {{spark}}, but they don't 
tell me which role those resources belong to (i.e., are they reserved or 
unreserved).

A simple fix would be to update {{/master/roles}} to split out resources into 
"reserved" and "unreserved".  However, this will fail to solve the problem if 
(and hopefully when) Mesos supports multi-role frameworks.
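
For illustration, a {{/master/roles}} entry for {{spark}} might then look 
something like this (hypothetical schema, not an existing endpoint format):

{code}
{
  "name": "spark",
  "frameworks": ["<framework_id>"],
  "resources": {
    "reserved":   {"cpus": 4, "mem": 1024, "disk": 0},
    "unreserved": {"cpus": 2, "mem": 512, "disk": 0}
  }
}
{code}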





[jira] [Created] (MESOS-4768) MasterMaintenanceTest.InverseOffers is flaky

2016-02-24 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-4768:


 Summary: MasterMaintenanceTest.InverseOffers is flaky
 Key: MESOS-4768
 URL: https://issues.apache.org/jira/browse/MESOS-4768
 Project: Mesos
  Issue Type: Bug
  Components: tests
Affects Versions: 0.28.0
Reporter: Joseph Wu


[MESOS-4169] significantly sped up this test, but also surfaced some more 
flakiness.  This can be fixed in the same way as [MESOS-4059].

Verbose logs from ASF Centos7 build:
{code}
[ RUN  ] MasterMaintenanceTest.InverseOffers
I0224 22:35:53.714018  1948 leveldb.cpp:174] Opened db in 2.034387ms
I0224 22:35:53.714663  1948 leveldb.cpp:181] Compacted db in 608839ns
I0224 22:35:53.714709  1948 leveldb.cpp:196] Created db iterator in 19043ns
I0224 22:35:53.714844  1948 leveldb.cpp:202] Seeked to beginning of db in 2330ns
I0224 22:35:53.714956  1948 leveldb.cpp:271] Iterated through 0 keys in the db 
in 518ns
I0224 22:35:53.715092  1948 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0224 22:35:53.715646  1968 recover.cpp:447] Starting replica recovery
I0224 22:35:53.715915  1981 recover.cpp:473] Replica is in EMPTY status
I0224 22:35:53.717067  1972 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (4533)@172.17.0.1:36678
I0224 22:35:53.717445  1981 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0224 22:35:53.717888  1978 recover.cpp:564] Updating replica status to STARTING
I0224 22:35:53.718585  1979 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 525061ns
I0224 22:35:53.718618  1979 replica.cpp:320] Persisted replica status to 
STARTING
I0224 22:35:53.718827  1982 recover.cpp:473] Replica is in STARTING status
I0224 22:35:53.719728  1969 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (4534)@172.17.0.1:36678
I0224 22:35:53.719974  1971 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0224 22:35:53.720369  1970 recover.cpp:564] Updating replica status to VOTING
I0224 22:35:53.720789  1982 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 322308ns
I0224 22:35:53.720823  1982 replica.cpp:320] Persisted replica status to VOTING
I0224 22:35:53.720968  1982 recover.cpp:578] Successfully joined the Paxos group
I0224 22:35:53.721101  1982 recover.cpp:462] Recover process terminated
I0224 22:35:53.721698  1982 master.cpp:376] Master 
aab18b61-7811-4c43-a672-d1a63818c880 (4db5fa128d2d) started on 172.17.0.1:36678
I0224 22:35:53.721719  1982 master.cpp:378] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/MjbcWP/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
--work_dir="/tmp/MjbcWP/master" --zk_session_timeout="10secs"
I0224 22:35:53.722039  1982 master.cpp:425] Master allowing unauthenticated 
frameworks to register
I0224 22:35:53.722053  1982 master.cpp:428] Master only allowing authenticated 
slaves to register
I0224 22:35:53.722061  1982 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/MjbcWP/credentials'
I0224 22:35:53.722394  1982 master.cpp:468] Using default 'crammd5' 
authenticator
I0224 22:35:53.722525  1982 master.cpp:537] Using default 'basic' HTTP 
authenticator
I0224 22:35:53.722661  1982 master.cpp:571] Authorization enabled
I0224 22:35:53.722813  1968 hierarchical.cpp:144] Initialized hierarchical 
allocator process
I0224 22:35:53.722846  1980 whitelist_watcher.cpp:77] No whitelist given
I0224 22:35:53.724957  1977 master.cpp:1712] The newly elected leader is 
master@172.17.0.1:36678 with id aab18b61-7811-4c43-a672-d1a63818c880
I0224 22:35:53.725000  1977 master.cpp:1725] Elected as the leading master!
I0224 22:35:53.725023  1977 master.cpp:1470] Recovering from registrar
I0224 22:35:53.725306  1967 registrar.cpp:307] Recovering registrar
I0224 22:35:53.725808  1977 log.cpp:659] Attempting to start the writer
I0224 22:35:53.727145  1973 replica.cpp:493] Replica received implicit promise 
request from (4536)@172.17.0.1:36678 with proposal 1
I0224 22:35:53.727728  1973 

[jira] [Updated] (MESOS-4573) Design doc for scheduler HTTP Stream IDs

2016-02-24 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4573:
-
Description: This ticket is for the design of HTTP stream IDs, for use with 
HTTP schedulers. These IDs allow Mesos to distinguish between different 
instances of HTTP framework schedulers.  (was: This ticket is for the design of 
an HTTP session protocol for use with HTTP schedulers.)

> Design doc for scheduler HTTP Stream IDs
> 
>
> Key: MESOS-4573
> URL: https://issues.apache.org/jira/browse/MESOS-4573
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: http, mesosphere
>
> This ticket is for the design of HTTP stream IDs, for use with HTTP 
> schedulers. These IDs allow Mesos to distinguish between different instances 
> of HTTP framework schedulers.





[jira] [Updated] (MESOS-4573) Design doc for scheduler HTTP Stream IDs

2016-02-24 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4573:
-
Summary: Design doc for scheduler HTTP Stream IDs  (was: Design doc for 
scheduler HTTP sessions)

> Design doc for scheduler HTTP Stream IDs
> 
>
> Key: MESOS-4573
> URL: https://issues.apache.org/jira/browse/MESOS-4573
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: http, mesosphere
>
> This ticket is for the design of an HTTP session protocol for use with HTTP 
> schedulers.





[jira] [Commented] (MESOS-4573) Design doc for scheduler HTTP Stream IDs

2016-02-24 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163863#comment-15163863
 ] 

Greg Mann commented on MESOS-4573:
--

The design document can be found here: 
https://docs.google.com/document/d/141wvs8upivIRw7I-tW5pW9ABP2gXKMmCB8hsV36ELc0/edit?usp=sharing

> Design doc for scheduler HTTP Stream IDs
> 
>
> Key: MESOS-4573
> URL: https://issues.apache.org/jira/browse/MESOS-4573
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: http, mesosphere
>
> This ticket is for the design of an HTTP session protocol for use with HTTP 
> schedulers.





[jira] [Updated] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-24 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4676:
-
  Sprint: Mesosphere Sprint 29
Story Points: 2

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, test
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
> credentials for authentication from '/tmp/HncLLj/credentials'
> [18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using 
> default 'crammd5' authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using 
> default 'basic' HTTP authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] 
> Authorization enabled
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
> whitelist given
> [18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
> metadata (8 bytes) to leveldb took 1.517992ms
> [18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.263144  1754 recover.cpp:473] Replica is 
> in STARTING status
> [18:06:25][Step 8/8] I0215 17:06:25.264010  1757 master.cpp:1712] The newly 
> elected leader is master@172.30.2.239:39785 with id 
> 112363e2-c680-4946-8fee-d0626ed8b21e
> [18:06:25][Step 8/8] I0215 17:06:25.264044  1757 master.cpp:1725] Elected as 
> the leading master!
> [18:06:25][Step 8/8] I0215 17:06:25.264061  1757 master.cpp:1470] Recovering 
> from registrar
> [18:06:25][Step 8/8] I0215 17:06:25.264117  1760 replica.cpp:673] Replica in 
> STARTING 

[jira] [Assigned] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-24 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-4676:


Assignee: Joseph Wu

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, test
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
> credentials for authentication from '/tmp/HncLLj/credentials'
> [18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using 
> default 'crammd5' authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using 
> default 'basic' HTTP authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] 
> Authorization enabled
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
> whitelist given
> [18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
> metadata (8 bytes) to leveldb took 1.517992ms
> [18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.263144  1754 recover.cpp:473] Replica is 
> in STARTING status
> [18:06:25][Step 8/8] I0215 17:06:25.264010  1757 master.cpp:1712] The newly 
> elected leader is master@172.30.2.239:39785 with id 
> 112363e2-c680-4946-8fee-d0626ed8b21e
> [18:06:25][Step 8/8] I0215 17:06:25.264044  1757 master.cpp:1725] Elected as 
> the leading master!
> [18:06:25][Step 8/8] I0215 17:06:25.264061  1757 master.cpp:1470] Recovering 
> from registrar
> [18:06:25][Step 8/8] I0215 17:06:25.264117  1760 replica.cpp:673] Replica in 
> STARTING status received a 

[jira] [Commented] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-24 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163720#comment-15163720
 ] 

Joseph Wu commented on MESOS-4676:
--

Based on the linked issue ("Bug report for Docker 1.9.1 on Fedora"), it looks 
like docker has some sort of race when the containerized process writes to both 
stdout & stderr at the same time.

To mitigate the test hitting this:
* Try separating the two {{echo}} commands.
* Try using the {{unbuffer}} utility, e.g. {{unbuffer echo foo; unbuffer echo 
bar 1>&2}}.  See https://github.com/docker/docker/issues/1385
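
Concretely, the test's container command could change along these lines (sketch 
only; {{unbuffer}} ships with the {{expect}} package and would need to be 
present in the image):

{noformat}
# Current (racy): both writes can reach docker's log demuxer at once.
sh -c 'echo foo; echo bar 1>&2'

# Mitigation: run each write through unbuffer, as suggested above.
sh -c 'unbuffer echo foo; unbuffer echo bar 1>&2'
{noformat}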

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>  Labels: flaky, mesosphere, test
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
> credentials for authentication from '/tmp/HncLLj/credentials'
> [18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using 
> default 'crammd5' authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using 
> default 'basic' HTTP authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] 
> Authorization enabled
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
> whitelist given
> [18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
> metadata (8 bytes) to leveldb took 1.517992ms
> [18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.263144  1754 recover.cpp:473] Replica is 
> in STARTING status
> [18:06:25][Step 8/8] I0215 17:06:25.264010  1757 master.cpp:1712] The newly 
> elected leader is 

[jira] [Created] (MESOS-4767) Apply batching to allocation events to reduce allocator backlogging.

2016-02-24 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-4767:
--

 Summary: Apply batching to allocation events to reduce allocator 
backlogging.
 Key: MESOS-4767
 URL: https://issues.apache.org/jira/browse/MESOS-4767
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Benjamin Mahler


Per the 
[discussion|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
 that came out of MESOS-3157, we'd like to batch together outstanding 
allocation dispatches in order to avoid backing up the allocator.
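
A sketch of the batching idea ({{triggerAllocation}}, {{batchedAllocate}}, and 
the {{allocationPending}} flag are invented names, not existing allocator 
members):

{code}
// Coalesce a burst of allocation triggers into a single dispatch.
void HierarchicalAllocatorProcess::triggerAllocation()
{
  if (allocationPending) {
    // A dispatch is already queued; when it runs it will observe all
    // state changes made in the meantime, so this trigger can be dropped.
    return;
  }

  allocationPending = true;
  dispatch(self(), &HierarchicalAllocatorProcess::batchedAllocate);
}

void HierarchicalAllocatorProcess::batchedAllocate()
{
  allocationPending = false;
  allocate(); // One allocation pass over all agents.
}
{code}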





[jira] [Updated] (MESOS-4694) DRFAllocator takes very long to allocate resources with a large number of frameworks

2016-02-24 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-4694:
---
Issue Type: Improvement  (was: Bug)

> DRFAllocator takes very long to allocate resources with a large number of 
> frameworks
> 
>
> Key: MESOS-4694
> URL: https://issues.apache.org/jira/browse/MESOS-4694
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Affects Versions: 0.26.0, 0.27.0, 0.27.1
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>
> With a growing number of connected frameworks, the allocation time grows to 
> very high numbers. The addition of quota in 0.27 had an additional impact on 
> these numbers. Running `mesos-tests.sh --benchmark 
> --gtest_filter=HierarchicalAllocator_BENCHMARK_Test.DeclineOffers` gives us 
> the following numbers:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.921202secs to make 200 offers
> round 1 allocate took 2.85045secs to make 200 offers
> round 2 allocate took 2.823768secs to make 200 offers
> {noformat}
> Increasing the number of frameworks to 2000:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 28.209454secs to make 2000 offers
> round 1 allocate took 28.469419secs to make 2000 offers
> round 2 allocate took 28.138086secs to make 2000 offers
> {noformat}
> I was able to reduce this time by a substantial amount. After applying the 
> patches:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 1.016226secs to make 2000 offers
> round 1 allocate took 1.102729secs to make 2000 offers
> round 2 allocate took 1.102624secs to make 2000 offers
> {noformat}
> And with 2000 frameworks:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 12.563203secs to make 2000 offers
> round 1 allocate took 12.437517secs to make 2000 offers
> round 2 allocate took 12.470708secs to make 2000 offers
> {noformat}
> The patches do 3 things to improve the performance of the allocator:
> 1) The total values in the DRFSorter will be pre-calculated per resource type.
> 2) In the allocate method, when no resources are available to allocate, we 
> break out of the innermost loop to prevent looping over a large number of 
> frameworks when we have nothing to allocate.
> 3) When a framework suppresses offers, we remove it from the sorter instead 
> of just calling continue in the allocation loop - this greatly improves 
> performance in the sorter and prevents looping over frameworks that don't 
> need resources.
> Assuming that most of the frameworks behave nicely and suppress offers when 
> they have nothing to schedule, it is fair to assume that point 3) has the 
> biggest impact on the performance. If we suppress offers for 90% of the 
> frameworks in the benchmark test, we see the following numbers:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 200 slaves and 2000 frameworks
> round 0 allocate took 11626us to make 200 offers
> round 1 allocate took 22890us to make 200 offers
> round 2 allocate took 21346us to make 200 offers
> {noformat}
> And for 200 frameworks:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 1.11178secs to make 2000 offers
> round 1 allocate took 1.062649secs to make 2000 offers
> round 2 allocate took 1.080181secs to make 2000 offers
> {noformat}
> Review requests:
> https://reviews.apache.org/r/43665/
> https://reviews.apache.org/r/43666/
> 
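
For reference, point 3) above amounts to something like the following in the 
allocator (a sketch; the actual patches are in the review requests, and the 
member names here may differ):

{code}
void HierarchicalAllocatorProcess::suppressOffers(
    const FrameworkID& frameworkId)
{
  frameworks[frameworkId].suppressed = true;

  // Rather than 'continue'-ing past suppressed frameworks on every
  // iteration of the allocation loop, deactivate the framework in its
  // role's sorter so the loop never visits it at all.
  const std::string& role = frameworks[frameworkId].role;
  frameworkSorters[role]->deactivate(frameworkId.value());
}
{code}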

[jira] [Created] (MESOS-4766) Improve allocator performance.

2016-02-24 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-4766:
--

 Summary: Improve allocator performance.
 Key: MESOS-4766
 URL: https://issues.apache.org/jira/browse/MESOS-4766
 Project: Mesos
  Issue Type: Epic
  Components: allocation
Reporter: Benjamin Mahler
Priority: Critical


This is an epic to track the various tickets around improving the performance 
of the allocator, including the following:

* Preventing unnecessary backup of the allocator.
* Reducing the cost of allocations and allocator state updates.
* Improving performance of the DRF sorter.
* More benchmarking to simulate scenarios with performance issues.





[jira] [Commented] (MESOS-4738) Expose egress bandwidth as a resource

2016-02-24 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163563#comment-15163563
 ] 

Jie Yu commented on MESOS-4738:
---

Do we have a shepherd for this work? [~idownes], are you going to shepherd it?

> Expose egress bandwidth as a resource
> -
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Assignee: Cong Wang
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we 
> cannot fundamentally limit ingress network bandwidth, having it as a 
> resource so that we can drop packets above a specific limit would be attractive. 
> It would be nice to expose egress and ingress bandwidth as an agent resource, 
> perhaps with a default of 10,000 mbps, and we can allow people to adjust as 
> needed. Alternatively, a more advanced design would involve generating 
> heuristics based on an analysis of the network MII / PHY. 





[jira] [Assigned] (MESOS-4738) Expose egress bandwidth as a resource

2016-02-24 Thread Cong Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cong Wang reassigned MESOS-4738:


Assignee: Cong Wang

> Expose egress bandwidth as a resource
> -
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Assignee: Cong Wang
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we 
> cannot fundamentally limit ingress network bandwidth, having it as a 
> resource so that we can drop packets above a specific limit would be attractive. 
> It would be nice to expose egress and ingress bandwidth as an agent resource, 
> perhaps with a default of 10,000 mbps, and we can allow people to adjust as 
> needed. Alternatively, a more advanced design would involve generating 
> heuristics based on an analysis of the network MII / PHY. 





[jira] [Created] (MESOS-4765) Add equality operator for `process::http::URL` objects.

2016-02-24 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4765:
-

 Summary: Add equality operator for `process::http::URL` objects.
 Key: MESOS-4765
 URL: https://issues.apache.org/jira/browse/MESOS-4765
 Project: Mesos
  Issue Type: Task
  Components: HTTP API, libprocess
Reporter: Anand Mazumdar
Priority: Minor


Currently two {{process::http::URL}} objects cannot be compared. It would be 
good to add an equality operator for comparing them. This might require a 
hostname lookup provided that the {{URL}} object was constructed from 
{{domain}} and not from {{net::IP}}.

The other details can be similar to the equality operator semantics of the 
corresponding Java 7 URL object: 
https://docs.oracle.com/javase/7/docs/api/java/net/URL.html#equals(java.lang.Object)

It would also allow us to get rid of the corresponding {{URL}} object 
comparison in {{type_utils.cpp}}, which just compares the serialized strings:

{code}
// TODO(bmahler): Leverage process::http::URL for equality.
bool operator==(const URL& left, const URL& right)
{
  return left.SerializeAsString() == right.SerializeAsString();
}
{code}
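
For illustration, a structural comparison might look roughly like the sketch 
below. The member names ({{scheme}}, {{domain}}, {{ip}}, {{port}}, {{path}}, 
{{query}}, {{fragment}}) are assumptions about {{process::http::URL}}'s public 
fields, and the hostname-lookup question above is left aside:

{code}
// A sketch only: member names are assumed, not taken from the actual
// libprocess headers. Compares URLs field by field instead of comparing
// their serialized string forms.
bool operator==(const process::http::URL& left,
                const process::http::URL& right)
{
  return left.scheme == right.scheme &&
         left.domain == right.domain &&  // may need a hostname lookup instead
         left.ip == right.ip &&
         left.port == right.port &&
         left.path == right.path &&
         left.query == right.query &&
         left.fragment == right.fragment;
}
{code}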



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4641) Support Container Network Interface (CNI).

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4641:
--
Epic Name: CNI Support  (was: cni)

> Support Container Network Interface (CNI).
> --
>
> Key: MESOS-4641
> URL: https://issues.apache.org/jira/browse/MESOS-4641
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> CoreOS developed the Container Network Interface (CNI), a proposed standard 
> for configuring network interfaces for Linux containers. Many CNI plugins 
> (e.g., calico) have already been developed.
> https://coreos.com/blog/rkt-cni-networking.html
> https://github.com/appc/cni/blob/master/SPEC.md
> Kubernetes supports CNI as well.
> http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html
> In the context of Unified Containerizer, it would be nice if we can have a 
> 'network/cni' isolator which will speak the CNI protocol and prepare the 
> network for the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4764) The network/cni isolator should report assigned IP address.

2016-02-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4764:
-

 Summary: The network/cni isolator should report assigned IP 
address. 
 Key: MESOS-4764
 URL: https://issues.apache.org/jira/browse/MESOS-4764
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


In order for service discovery to work in some cases, the network/cni isolator 
needs to report the assigned IP address through the isolator->status() 
interface.
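
A rough sketch of what that could look like, assuming {{status()}} returns a 
{{ContainerStatus}} that carries {{NetworkInfo}} (the hard-coded address 
stands in for whatever the CNI plugin actually assigned):

{code}
// A sketch only; the real isolator would look up the address it recorded
// when the CNI plugin attached the container to the network.
process::Future<mesos::ContainerStatus> status(
    const mesos::ContainerID& containerId)
{
  mesos::ContainerStatus status;

  // Report the IP assigned by the CNI plugin so that service discovery
  // systems can pick it up from the container status.
  mesos::NetworkInfo* networkInfo = status.add_network_infos();
  networkInfo->add_ip_addresses()->set_ip_address("172.16.0.2");

  return status;
}
{code}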



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4763) Add test mock for CNI plugins.

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4763:
--
Labels: mesosphere  (was: )

> Add test mock for CNI plugins.
> --
>
> Key: MESOS-4763
> URL: https://issues.apache.org/jira/browse/MESOS-4763
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> In order to test the network/cni isolator, we need to mock the behavior of a 
> CNI plugin. One option is to write a mock script which acts as a CNI plugin. 
> The isolator will talk to the mock script the same way it talks to an actual 
> CNI plugin.
> The mock script can just join the host network?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.

2016-02-24 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-4677:


Assignee: Joseph Wu

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
> ---
>
> Key: MESOS-4677
> URL: https://issues.apache.org/jira/browse/MESOS-4677
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.27
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: flaky, test
>
> This test fails very often when run on CentOS 7, but may also fail elsewhere 
> sometimes. Unfortunately, it tends to only fail when --verbose is not set. 
> The output is this:
> {noformat}
> [21:45:21][Step 8/8] [ RUN  ] 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: 
> Failure
> [21:45:21][Step 8/8] Value of: usage.get().threads()
> [21:45:21][Step 8/8]   Actual: 0
> [21:45:21][Step 8/8] Expected: 1U
> [21:45:21][Step 8/8] Which is: 1
> [21:45:21][Step 8/8] [  FAILED  ] 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4763) Add test mock for CNI plugins.

2016-02-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4763:
-

 Summary: Add test mock for CNI plugins.
 Key: MESOS-4763
 URL: https://issues.apache.org/jira/browse/MESOS-4763
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Avinash Sridharan


In order to test the network/cni isolator, we need to mock the behavior of a 
CNI plugin. One option is to write a mock script which acts as a CNI plugin. 
The isolator will talk to the mock script the same way it talks to an actual 
CNI plugin.

The mock script can just join the host network?
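
For illustration, a minimal mock along these lines (written here in C++, 
though a shell script would work just as well) only needs to honor the 
{{CNI_COMMAND}} environment variable and print a CNI result JSON on stdout:

{code}
// A sketch only: a trivial stand-in for a CNI plugin. Per the CNI spec,
// the operation arrives in the CNI_COMMAND environment variable and the
// result is a JSON document on stdout.
#include <cstdlib>
#include <cstring>
#include <iostream>

int main()
{
  const char* command = ::getenv("CNI_COMMAND");
  if (command == nullptr) {
    std::cerr << "CNI_COMMAND not set" << std::endl;
    return 1;
  }

  if (::strcmp(command, "ADD") == 0) {
    // Pretend the container joined the host network: return a fixed address.
    std::cout << R"({"ip4": {"ip": "127.0.0.1/8"}})" << std::endl;
  } else if (::strcmp(command, "DEL") == 0) {
    // Nothing to tear down for the mock.
  }

  return 0;
}
{code}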



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4742) Design doc for CNI isolator

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4742:
--
Labels: mesosphere  (was: )

> Design doc for CNI isolator
> ---
>
> Key: MESOS-4742
> URL: https://issues.apache.org/jira/browse/MESOS-4742
> Project: Mesos
>  Issue Type: Documentation
>  Components: isolation
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> This ticket is for the design of isolator for Container Network Interface 
> (CNI).
> Design doc: 
> https://docs.google.com/document/d/1FFZwPHPZqS17cRQvsbbWyQbZpwIoHFR_N6AAApRv514/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4762) Setup proper DNS resolver for containers in network/cni isolator.

2016-02-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4762:
-

 Summary: Setup proper DNS resolver for containers in network/cni 
isolator.
 Key: MESOS-4762
 URL: https://issues.apache.org/jira/browse/MESOS-4762
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Avinash Sridharan


Please get more context from the design doc (MESOS-4742).

The CNI plugin will return the DNS information about the network. The 
network/cni isolator needs to properly set up /etc/resolv.conf for the 
container (a rough sketch follows the list below). We should consider the 
following cases:
1) container is using host filesystem
2) container is using a different filesystem
3) custom executor and command executor
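
The sketch mentioned above, covering the core of cases 1) and 2) (the helper 
name and the rootfs handling are illustrative, not the actual isolator code):

{code}
// A sketch only: render the nameservers returned by the CNI plugin into
// <rootfs>/etc/resolv.conf. Pass "/" as rootfs for case 1) (host
// filesystem); pass the container's rootfs for case 2).
#include <sstream>
#include <string>
#include <vector>

#include <stout/nothing.hpp>
#include <stout/os.hpp>
#include <stout/path.hpp>
#include <stout/try.hpp>

Try<Nothing> writeResolvConf(
    const std::string& rootfs,
    const std::vector<std::string>& nameservers)
{
  std::ostringstream out;
  for (const std::string& nameserver : nameservers) {
    out << "nameserver " << nameserver << "\n";
  }

  return os::write(path::join(rootfs, "etc", "resolv.conf"), out.str());
}
{code}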



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4762) Setup proper DNS resolver for containers in network/cni isolator.

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4762:
--
Labels: mesosphere  (was: )

> Setup proper DNS resolver for containers in network/cni isolator.
> -
>
> Key: MESOS-4762
> URL: https://issues.apache.org/jira/browse/MESOS-4762
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Please get more context from the design doc (MESOS-4742).
> The CNI plugin will return the DNS information about the network. The 
> network/cni isolator needs to properly set up /etc/resolv.conf for the 
> container. We should consider the following cases:
> 1) container is using host filesystem
> 2) container is using a different filesystem
> 3) custom executor and command executor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4761) Add agent flags to allow operators to specify CNI plugin and config directories.

2016-02-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4761:
-

 Summary: Add agent flags to allow operators to specify CNI plugin 
and config directories.
 Key: MESOS-4761
 URL: https://issues.apache.org/jira/browse/MESOS-4761
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Qian Zhang


According to design doc, we plan to add the following flags:

“--network_cni_plugins_dir”
Location of the CNI plugin binaries. The “network/cni” isolator will look for 
CNI plugins under this directory and execute them to add containers to, or 
delete containers from, CNI networks. It is the operator’s responsibility to 
install the CNI plugin binaries in the specified directory.

“--network_cni_config_dir”
Location of the CNI network configuration files. For each network that 
containers launched on the Mesos agent can connect to, the operator should 
install a network configuration file in JSON format in the specified directory.
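
A sketch of how the two flags might be declared with stout's flags library 
(the struct shape and the use of {{Option}} are assumptions; only the flag 
names come from the design doc):

{code}
// A sketch only, not the actual agent flags code.
#include <string>

#include <stout/flags.hpp>
#include <stout/option.hpp>

struct Flags : virtual flags::FlagsBase
{
  Flags()
  {
    add(&Flags::network_cni_plugins_dir,
        "network_cni_plugins_dir",
        "Directory where the CNI plugin binaries are installed.");

    add(&Flags::network_cni_config_dir,
        "network_cni_config_dir",
        "Directory containing the CNI network configuration files (JSON).");
  }

  Option<std::string> network_cni_plugins_dir;
  Option<std::string> network_cni_config_dir;
};
{code}

An agent could then be launched with, e.g., 
{{--network_cni_plugins_dir=/opt/cni/bin --network_cni_config_dir=/etc/cni/net.d}} 
(the paths here are just examples).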



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4742) Design doc for CNI isolator

2016-02-24 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-4742:
-
Issue Type: Documentation  (was: Bug)

> Design doc for CNI isolator
> ---
>
> Key: MESOS-4742
> URL: https://issues.apache.org/jira/browse/MESOS-4742
> Project: Mesos
>  Issue Type: Documentation
>  Components: isolation
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> This ticket is for the design of isolator for Container Network Interface 
> (CNI).
> Design doc: 
> https://docs.google.com/document/d/1FFZwPHPZqS17cRQvsbbWyQbZpwIoHFR_N6AAApRv514/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4758) Add a 'name' field into NetworkInfo.

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4758:
--
Issue Type: Task  (was: Bug)

> Add a 'name' field into NetworkInfo.
> 
>
> Key: MESOS-4758
> URL: https://issues.apache.org/jira/browse/MESOS-4758
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> This allows the framework writer to specify the name of the network they want 
> their container to join.
> Why not use 'groups'? Because there might be multiple groups under a single 
> network (e.g., admin vs. user, public vs. private, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4759) Add network/cni isolator for Mesos containerizer.

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4759:
--
Issue Type: Task  (was: Bug)

> Add network/cni isolator for Mesos containerizer.
> -
>
> Key: MESOS-4759
> URL: https://issues.apache.org/jira/browse/MESOS-4759
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> See the design doc for more context (MESOS-4742).
> The isolator will interact with CNI plugins to create the network for the 
> container to join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4759) Add network/cni isolator for Mesos containerizer.

2016-02-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4759:
-

 Summary: Add network/cni isolator for Mesos containerizer.
 Key: MESOS-4759
 URL: https://issues.apache.org/jira/browse/MESOS-4759
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu


See the design doc for more context (MESOS-4742).

The isolator will interact with CNI plugins to create the network for the 
container to join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4759) Add network/cni isolator for Mesos containerizer.

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4759:
--
Shepherd: Jie Yu

> Add network/cni isolator for Mesos containerizer.
> -
>
> Key: MESOS-4759
> URL: https://issues.apache.org/jira/browse/MESOS-4759
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> See the design doc for more context (MESOS-4742).
> The isolator will interact with CNI plugins to create the network for the 
> container to join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4759) Add network/cni isolator for Mesos containerizer.

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4759:
--
Assignee: Qian Zhang

> Add network/cni isolator for Mesos containerizer.
> -
>
> Key: MESOS-4759
> URL: https://issues.apache.org/jira/browse/MESOS-4759
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> See the design doc for more context (MESOS-4742).
> The isolator will interact with CNI plugins to create the network for the 
> container to join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-02-24 Thread Michael Browning (JIRA)
Michael Browning created MESOS-4760:
---

 Summary: Expose metrics and gauges for fetcher cache usage and hit 
rate
 Key: MESOS-4760
 URL: https://issues.apache.org/jira/browse/MESOS-4760
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, statistics
Reporter: Michael Browning
Priority: Minor


To evaluate the fetcher cache and calibrate the value of the fetcher_cache_size 
flag, it would be useful to have metrics and gauges on agents that expose 
operational statistics like cache hit rate, occupied cache size, and time spent 
downloading resources that were not present.
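
For example, the hit/miss side of this could be a couple of counters in the 
libprocess metrics library; the sketch below uses made-up metric names for 
illustration:

{code}
// A sketch only: hypothetical fetcher cache counters. The names are
// illustrative; gauges for occupied cache size and download time would
// follow the same pattern.
#include <process/metrics/counter.hpp>
#include <process/metrics/metrics.hpp>

struct FetcherCacheMetrics
{
  FetcherCacheMetrics()
    : hits("fetcher/cache_hits"),
      misses("fetcher/cache_misses")
  {
    process::metrics::add(hits);
    process::metrics::add(misses);
  }

  ~FetcherCacheMetrics()
  {
    process::metrics::remove(hits);
    process::metrics::remove(misses);
  }

  process::metrics::Counter hits;
  process::metrics::Counter misses;
};

// Each fetch does ++metrics.hits or ++metrics.misses; the hit rate is then
// hits / (hits + misses), computed from /metrics/snapshot by the operator.
{code}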



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4758) Add a 'name' field into NetworkInfo.

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4758:
--
Story Points: 1

> Add a 'name' field into NetworkInfo.
> 
>
> Key: MESOS-4758
> URL: https://issues.apache.org/jira/browse/MESOS-4758
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> This allows the framework writer to specify the name of the network they want 
> their container to join.
> Why not use 'groups'? Because there might be multiple groups under a single 
> network (e.g., admin vs. user, public vs. private, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4758) Add a 'name' field into NetworkInfo.

2016-02-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4758:
-

 Summary: Add a 'name' field into NetworkInfo.
 Key: MESOS-4758
 URL: https://issues.apache.org/jira/browse/MESOS-4758
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu
Assignee: Qian Zhang


This allows the framework writer to specify the name of the network they want 
their container to join.

Why not use 'groups'? Because there might be multiple groups under a single 
network (e.g., admin vs. user, public vs. private, etc.).
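
A sketch of how a framework might use the proposed field ({{set_name}} is the 
field being proposed here; it does not exist yet):

{code}
// A sketch only: the framework names the network its container should join.
mesos::ContainerInfo container;
container.set_type(mesos::ContainerInfo::MESOS);

mesos::NetworkInfo* network = container.add_network_infos();
network->set_name("public");  // proposed field: join the network named "public"
{code}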



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4742) Design doc for CNI isolator

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4742:
--
Description: 
This ticket is for the design of isolator for Container Network Interface (CNI).

Design doc: 
https://docs.google.com/document/d/1FFZwPHPZqS17cRQvsbbWyQbZpwIoHFR_N6AAApRv514/edit?usp=sharing

  was:This ticket is for the design of isolator for Container Network Interface 
(CNI).


> Design doc for CNI isolator
> ---
>
> Key: MESOS-4742
> URL: https://issues.apache.org/jira/browse/MESOS-4742
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> This ticket is for the design of isolator for Container Network Interface 
> (CNI).
> Design doc: 
> https://docs.google.com/document/d/1FFZwPHPZqS17cRQvsbbWyQbZpwIoHFR_N6AAApRv514/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-4757:
-

Assignee: Jie Yu

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in the container's root 
> filesystem. We should instead get the uid/gids before pivot_root, and call 
> setuid/setgroups after pivot_root.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-24 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4757:
-

 Summary: Mesos containerizer should get uid/gids before pivot_root.
 Key: MESOS-4757
 URL: https://issues.apache.org/jira/browse/MESOS-4757
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu


Currently, we call os::su(user) after pivot_root. This is problematic because 
/etc/passwd and /etc/group might be missing in the container's root filesystem. 
We should instead get the uid/gids before pivot_root, and call setuid/setgroups 
after pivot_root.
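
A sketch of the proposed ordering (error handling abbreviated, the pivot_root 
step itself elided, and the function name illustrative):

{code}
// A sketch only. The key point: resolve the user against the *host*
// /etc/passwd and /etc/group before pivot_root, and only apply the
// already-resolved ids afterwards.
#include <grp.h>
#include <pwd.h>
#include <unistd.h>

int launchAsUser(const char* user)
{
  // 1. Resolve uid/gids while the host filesystem is still visible.
  struct passwd* pwd = ::getpwnam(user);
  if (pwd == nullptr) {
    return -1;
  }

  int ngroups = 64;
  gid_t groups[64];
  if (::getgrouplist(user, pwd->pw_gid, groups, &ngroups) == -1) {
    return -1;
  }

  // 2. pivot_root into the container's root filesystem (elided).

  // 3. Apply the pre-resolved ids; no /etc/passwd lookup is needed anymore.
  if (::setgroups(ngroups, groups) != 0 ||
      ::setgid(pwd->pw_gid) != 0 ||
      ::setuid(pwd->pw_uid) != 0) {
    return -1;
  }

  return 0;
}
{code}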



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4738) Expose egress bandwidth as a resource

2016-02-24 Thread Sargun Dhillon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163473#comment-15163473
 ] 

Sargun Dhillon commented on MESOS-4738:
---

Yeah, we're using DRR for egress load balancing at the moment with our security 
stuff.

> Expose egress bandwidth as a resource
> -
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we cannot 
> fundamentally limit ingress network bandwidth, exposing it as a resource so 
> that we can drop packets above a specific limit would be attractive. It would 
> be nice to expose egress and ingress bandwidth as an agent resource, perhaps 
> with a default of 10,000 Mbps, and allow people to adjust it as needed. 
> Alternatively, a more advanced design would involve generating heuristics 
> based on an analysis of the network MII / PHY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4756) DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard is flaky on CentOS 6

2016-02-24 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-4756:


 Summary: DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard 
is flaky on CentOS 6
 Key: MESOS-4756
 URL: https://issues.apache.org/jira/browse/MESOS-4756
 Project: Mesos
  Issue Type: Bug
  Components: tests
Affects Versions: 0.27
 Environment: Centos6 (AWS) + GCC 4.9
Reporter: Joseph Wu


{code}
[ RUN  ] DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard
I0224 17:50:26.577450 17755 leveldb.cpp:174] Opened db in 6.715352ms
I0224 17:50:26.579607 17755 leveldb.cpp:181] Compacted db in 2.128954ms
I0224 17:50:26.579648 17755 leveldb.cpp:196] Created db iterator in 16927ns
I0224 17:50:26.579661 17755 leveldb.cpp:202] Seeked to beginning of db in 1408ns
I0224 17:50:26.579669 17755 leveldb.cpp:271] Iterated through 0 keys in the db 
in 343ns
I0224 17:50:26.579721 17755 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0224 17:50:26.580185 17776 recover.cpp:447] Starting replica recovery
I0224 17:50:26.580382 17776 recover.cpp:473] Replica is in EMPTY status
I0224 17:50:26.581264 17770 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (14098)@172.30.2.121:33050
I0224 17:50:26.581771 17772 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0224 17:50:26.582188 17771 recover.cpp:564] Updating replica status to STARTING
I0224 17:50:26.583030 17772 master.cpp:376] Master 
00a3ac12-9e76-48f5-92fa-48770b82035d (ip-172-30-2-121.mesosphere.io) started on 
172.30.2.121:33050
I0224 17:50:26.583051 17772 master.cpp:378] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/jSZ9of/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/jSZ9of/master" 
--zk_session_timeout="10secs"
I0224 17:50:26.583328 17772 master.cpp:423] Master only allowing authenticated 
frameworks to register
I0224 17:50:26.583336 17772 master.cpp:428] Master only allowing authenticated 
slaves to register
I0224 17:50:26.583343 17772 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/jSZ9of/credentials'
I0224 17:50:26.583901 17772 master.cpp:468] Using default 'crammd5' 
authenticator
I0224 17:50:26.584022 17772 master.cpp:537] Using default 'basic' HTTP 
authenticator
I0224 17:50:26.584141 17772 master.cpp:571] Authorization enabled
I0224 17:50:26.584234 17770 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 1.955608ms
I0224 17:50:26.584264 17770 replica.cpp:320] Persisted replica status to 
STARTING
I0224 17:50:26.584285 17771 hierarchical.cpp:144] Initialized hierarchical 
allocator process
I0224 17:50:26.584295 17773 whitelist_watcher.cpp:77] No whitelist given
I0224 17:50:26.584463 17775 recover.cpp:473] Replica is in STARTING status
I0224 17:50:26.585260 17771 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (14100)@172.30.2.121:33050
I0224 17:50:26.585553 1 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0224 17:50:26.586042 17773 recover.cpp:564] Updating replica status to VOTING
I0224 17:50:26.586091 17770 master.cpp:1712] The newly elected leader is 
master@172.30.2.121:33050 with id 00a3ac12-9e76-48f5-92fa-48770b82035d
I0224 17:50:26.586122 17770 master.cpp:1725] Elected as the leading master!
I0224 17:50:26.586146 17770 master.cpp:1470] Recovering from registrar
I0224 17:50:26.586294 17773 registrar.cpp:307] Recovering registrar
I0224 17:50:26.588148 17776 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 1.89126ms
I0224 17:50:26.588171 17776 replica.cpp:320] Persisted replica status to VOTING
I0224 17:50:26.588260 17772 recover.cpp:578] Successfully joined the Paxos group
I0224 17:50:26.588440 17772 recover.cpp:462] Recover process terminated
I0224 17:50:26.588770 17773 log.cpp:659] Attempting to start the writer
I0224 17:50:26.589782 17770 replica.cpp:493] Replica received implicit promise 
request from (14101)@172.30.2.121:33050 with proposal 1
I0224 17:50:26.591498 17770 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb 

[jira] [Commented] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.

2016-02-24 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163417#comment-15163417
 ] 

Joseph Wu commented on MESOS-4677:
--

My guess is this:
# The first {{usage = isolator.get()->usage(containerId);}} comes right after 
we isolate the test process, by writing to {{cgroup.procs}}.  Underneath, the 
cgroups API probably blocks the write from completing until the cgroups are 
updated.
# We do an {{os::close}} on a parent pipe to trigger the test process into 
{{exec}} ing.
# We immediately call {{usage = isolator.get()->usage(containerId);}} again.
# {{cgroup.procs}} doesn't change since {{exec}} doesn't change the PID.  But 
there may be a race between the kernel updating the "threads" file 
({{cgroup/tasks}}) and us reading {{cgroup/tasks}}.

We can either:
* Import the {{cgroups.h}} header and use {{cgroups_lock}}/{{cgroups_unlock}} 
to synchronize.
* Add a sleep between closing the parent pipe and calling {{->usage(...)}}.
* Do some sort of operation on the test process (which would confirm that it 
has finished {{exec}} ing).  In this case we can write to the {{cat}} test 
process and read the echoed result, as sketched below.
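
A sketch of that third option ({{inputPipe}}/{{outputPipe}} are placeholders 
for the test's actual pipe fds):

{code}
// A sketch only: write a byte to the 'cat' test process and wait for it to
// be echoed back. Once the echo arrives, the child must have finished
// exec-ing, so reading cgroup/tasks afterwards should be race-free.
char token = 'x';
ASSERT_EQ(1, ::write(inputPipe[1], &token, 1));   // feed 'cat'

char echoed = '\0';
ASSERT_EQ(1, ::read(outputPipe[0], &echoed, 1));  // 'cat' echoed it back
ASSERT_EQ(token, echoed);

usage = isolator.get()->usage(containerId);       // now safe to re-check
{code}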

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
> ---
>
> Key: MESOS-4677
> URL: https://issues.apache.org/jira/browse/MESOS-4677
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.27
>Reporter: Bernd Mathiske
>  Labels: flaky, test
>
> This test fails very often when run on CentOS 7, but may also fail elsewhere 
> sometimes. Unfortunately, it tends to only fail when --verbose is not set. 
> The output is this:
> {noformat}
> [21:45:21][Step 8/8] [ RUN  ] 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: 
> Failure
> [21:45:21][Step 8/8] Value of: usage.get().threads()
> [21:45:21][Step 8/8]   Actual: 0
> [21:45:21][Step 8/8] Expected: 1U
> [21:45:21][Step 8/8] Which is: 1
> [21:45:21][Step 8/8] [  FAILED  ] 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4602) Invalid usage of ATOMIC_FLAG_INIT in member initialization

2016-02-24 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163404#comment-15163404
 ] 

Anand Mazumdar commented on MESOS-4602:
---

{code}
commit fd1101db8af8f3ea684a09e2f1d79f5fa9b69496
Author: Yong Tang yong.tang.git...@outlook.com
Date:   Tue Feb 23 10:47:15 2016 +0100

Fixed invalid usage of ATOMIC_FLAG_INIT in libprocess.

Review: https://reviews.apache.org/r/43859/
{code}

> Invalid usage of ATOMIC_FLAG_INIT in member initialization
> --
>
> Key: MESOS-4602
> URL: https://issues.apache.org/jira/browse/MESOS-4602
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Benjamin Bannier
>Assignee: Yong Tang
>  Labels: newbie, tech-debt
>
> MESOS-2925 fixed a few instances where {{ATOMIC_FLAG_INIT}} was used in 
> initializer lists, but failed to fix 
> {{3rdparty/libprocess/src/libevent_ssl_socket.cpp}} (even though the 
> corresponding header was touched).
> There, {{LibeventSSLSocketImpl}}'s {{lock}} member is still (incorrectly) 
> initialized in initializer lists, even though the member is already 
> initialized in the class declaration, so it appears they should be dropped.
> Clang from trunk incorrectly diagnoses the initializations in the initializer 
> lists as benign redundant braces in initialization of a scalar, but they 
> should be fixed for the reasons stated in MESOS-2925.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4602) Invalid usage of ATOMIC_FLAG_INIT in member initialization

2016-02-24 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163399#comment-15163399
 ] 

Yong Tang commented on MESOS-4602:
--

The patch has been applied. Thanks [~bbannier] and [~tillt] for the reviews.

> Invalid usage of ATOMIC_FLAG_INIT in member initialization
> --
>
> Key: MESOS-4602
> URL: https://issues.apache.org/jira/browse/MESOS-4602
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Benjamin Bannier
>Assignee: Yong Tang
>  Labels: newbie, tech-debt
>
> MESOS-2925 fixed a few instances where {{ATOMIC_FLAG_INIT}} was used in 
> initializer lists, but failed to fix 
> {{3rdparty/libprocess/src/libevent_ssl_socket.cpp}} (even though the 
> corresponding header was touched).
> There, {{LibeventSSLSocketImpl}}'s {{lock}} member is still (incorrectly) 
> initialized in initializer lists, even though the member is already 
> initialized in the class declaration, so it appears they should be dropped.
> Clang from trunk incorrectly diagnoses the initializations in the initializer 
> lists as benign redundant braces in initialization of a scalar, but they 
> should be fixed for the reasons stated in MESOS-2925.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-02-24 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163382#comment-15163382
 ] 

Greg Mann commented on MESOS-4492:
--

Sure, I'm happy to help review! [~fan.du], if you could post a link to the 
review request here when you submit it, I'll have a look :-)

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4047) MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky

2016-02-24 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4047:
-
 Assignee: Alexander Rojas  (was: Joseph Wu)
   Sprint: Mesosphere Sprint 23, Mesosphere Sprint 24, Mesosphere 
Sprint 29  (was: Mesosphere Sprint 23, Mesosphere Sprint 24)
Fix Version/s: 0.28.0

> MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky
> ---
>
> Key: MESOS-4047
> URL: https://issues.apache.org/jira/browse/MESOS-4047
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
> Environment: Ubuntu 14, gcc 4.8.4
>Reporter: Joseph Wu
>Assignee: Alexander Rojas
>  Labels: flaky, flaky-test
> Fix For: 0.27.0, 0.28.0
>
>
> {code:title=Output from passed test}
> [--] 1 test from MemoryPressureMesosTest
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000430889 s, 2.4 GB/s
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> I1202 11:09:14.319327  5062 exec.cpp:134] Version: 0.27.0
> I1202 11:09:14.17  5079 exec.cpp:208] Executor registered on slave 
> bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> Registered executor on ubuntu
> Starting task 4e62294c-cfcf-4a13-b699-c6a4b7ac5162
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 5085
> I1202 11:09:14.391739  5077 exec.cpp:254] Received reconnect request from 
> slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> I1202 11:09:14.398598  5082 exec.cpp:231] Executor re-registered on slave 
> bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> Re-registered executor on ubuntu
> Shutting down
> Sending SIGTERM to process tree at pid 5085
> Killing the following process trees:
> [ 
> -+- 5085 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done 
>  \--- 5086 dd count=512 bs=1M if=/dev/zero of=./temp 
> ]
> [   OK ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (1096 ms)
> {code}
> {code:title=Output from failed test}
> [--] 1 test from MemoryPressureMesosTest
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000404489 s, 2.6 GB/s
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> I1202 11:09:15.509950  5109 exec.cpp:134] Version: 0.27.0
> I1202 11:09:15.568183  5123 exec.cpp:208] Executor registered on slave 
> 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0
> Registered executor on ubuntu
> Starting task 14b6bab9-9f60-4130-bdc4-44efba262bc6
> Forked command at 5132
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> I1202 11:09:15.665498  5129 exec.cpp:254] Received reconnect request from 
> slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0
> I1202 11:09:15.670995  5123 exec.cpp:381] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 5132
> ../../src/tests/containerizer/memory_pressure_tests.cpp:283: Failure
> (usage).failure(): Unknown container: ebe90e15-72fa-4519-837b-62f43052c913
> *** Aborted at 1449083355 (unix time) try "date -d @1449083355" if you are 
> using GNU date ***
> {code}
> Notice that in the failed test, the executor is asked to shutdown when it 
> tries to reconnect to the agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4738) Expose egress bandwidth as a resource

2016-02-24 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163362#comment-15163362
 ] 

Avinash Sridharan edited comment on MESOS-4738 at 2/24/16 5:30 PM:
---

I agree with [~w013ccw] on this, we don't have a clean way of putting ingress 
bandwidth limits and hence should focus on egress. As far as treating egress 
bandwidth as a resource, we can treat this resource as a minimal rate guarantee 
rather than a fixed rate allocation. Using a DRR scheduler 
(https://en.wikipedia.org/wiki/Deficit_round_robin) on the egress rates should 
guarantee a minimum rate to each container. I think qdisc does support DRR 
(http://manpages.ubuntu.com/manpages/raring/man8/tc-drr.8.html).


was (Author: avin...@mesosphere.io):
I agree with [~w013ccw] on this, we don't have a clean way of putting ingress 
bandwidth limits and hence should focus on egress. As far as treating egress 
bandwidth as a resource, we can treat this resource as a minimal rate guarantee 
rather than a fixed rate allocation. Using a DRR scheduler 
(https://en.wikipedia.org/wiki/Deficit_round_robin) on the egress rates should 
guarantee a minimum rate to each container. I think qdisc does support DRR.

> Expose egress bandwidth as a resource
> -
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we cannot 
> fundamentally limit ingress network bandwidth, exposing it as a resource so 
> that we can drop packets above a specific limit would be attractive. It would 
> be nice to expose egress and ingress bandwidth as an agent resource, perhaps 
> with a default of 10,000 Mbps, and allow people to adjust it as needed. 
> Alternatively, a more advanced design would involve generating heuristics 
> based on an analysis of the network MII / PHY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4738) Expose egress bandwidth as a resource

2016-02-24 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163362#comment-15163362
 ] 

Avinash Sridharan commented on MESOS-4738:
--

I agree with [~w013ccw] on this, we don't have a clean way of putting ingress 
bandwidth limits and hence should focus on egress. As far as treating egress 
bandwidth as a resource, we can treat this resource as a minimal rate guarantee 
rather than a fixed rate allocation. Using a DRR scheduler 
(https://en.wikipedia.org/wiki/Deficit_round_robin) on the egress rates should 
guarantee a minimum rate to each container. I think qdisc does support DRR.

> Expose egress bandwidth as a resource
> -
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we cannot 
> fundamentally limit ingress network bandwidth, exposing it as a resource so 
> that we can drop packets above a specific limit would be attractive. It would 
> be nice to expose egress and ingress bandwidth as an agent resource, perhaps 
> with a default of 10,000 Mbps, and allow people to adjust it as needed. 
> Alternatively, a more advanced design would involve generating heuristics 
> based on an analysis of the network MII / PHY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-02-24 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163353#comment-15163353
 ] 

haosdent commented on MESOS-4492:
-

Keep in mind that we can also RESERVE and UNRESERVE through the HTTP 
endpoints. We need to track those as well.

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-02-24 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163335#comment-15163335
 ] 

haosdent commented on MESOS-4492:
-

It seems you have not submitted the patch yet?

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4755) Update roleSorter when slave active/deactive

2016-02-24 Thread Klaus Ma (JIRA)
Klaus Ma created MESOS-4755:
---

 Summary: Update roleSorter when slave active/deactive
 Key: MESOS-4755
 URL: https://issues.apache.org/jira/browse/MESOS-4755
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Klaus Ma
Assignee: Klaus Ma


Currently, the total resources of {{roleSorter}} are not updated when an agent 
is activated or deactivated.
We need to remove {{slave.total}} from the {{roleSorter}} when an agent is 
deactivated, and add it back when the agent is activated again.
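
A rough sketch of the intended shape of the fix (the {{Sorter}} calls and 
field names below are assumptions about the allocator internals, not verified 
signatures):

{code}
// A sketch only, not actual allocator code.
void HierarchicalAllocatorProcess::deactivateSlave(const SlaveID& slaveId)
{
  CHECK(slaves.contains(slaveId));
  slaves[slaveId].activated = false;

  // Proposed: stop counting this agent's resources in the role sorter.
  roleSorter->remove(slaveId, slaves[slaveId].total);
}

void HierarchicalAllocatorProcess::activateSlave(const SlaveID& slaveId)
{
  CHECK(slaves.contains(slaveId));
  slaves[slaveId].activated = true;

  // Proposed: add them back once the agent is active again.
  roleSorter->add(slaveId, slaves[slaveId].total);
}
{code}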




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4747) ContainerLoggerTest.MesosContainerizerRecover cannot be executed in isolation

2016-02-24 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4747:

Sprint: Mesosphere Sprint 29
Labels: mesosphere  (was: )

> ContainerLoggerTest.MesosContainerizerRecover cannot be executed in isolation
> -
>
> Key: MESOS-4747
> URL: https://issues.apache.org/jira/browse/MESOS-4747
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> Some cleanup of spawned processes is missing in 
> {{ContainerLoggerTest.MesosContainerizerRecover}}, so when the test is run 
> in isolation, the global teardown may find lingering processes.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContainerLoggerTest
> [ RUN  ] ContainerLoggerTest.MesosContainerizerRecover
> [   OK ] ContainerLoggerTest.MesosContainerizerRecover (13 ms)
> [--] 1 test from ContainerLoggerTest (13 ms total)
> [--] Global test environment tear-down
> ../../src/tests/environment.cpp:728: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 7112 /SOME/PATH/src/mesos/build/src/.libs/mesos-tests 
> --gtest_filter=ContainerLoggerTest.MesosContainerizerRecover
>  \--- 7130 (sh)
> [==] 1 test from 1 test case ran. (23 ms total)
> [  PASSED  ] 1 test.
> [  FAILED  ] 0 tests, listed below:
>  0 FAILED TESTS
> {code}
> Observed on OS X with clang-trunk and an unoptimized build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4754) The "executors" field is exposed under a backwards incompatible schema.

2016-02-24 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162684#comment-15162684
 ] 

Michael Park edited comment on MESOS-4754 at 2/24/16 10:08 AM:
---

The issue here is that even though {{src/common/http.cpp}} has a definition of
{{void json(JSON::ObjectWriter* writer, const ExecutorInfo& executorInfo);}},
its declaration is missing from {{src/common/http.hpp}}.

We would have liked this to cause a compiler error, but it didn't because of 
the generic {{json}} function for protobuf messages:
{{inline void json(ObjectWriter* writer, const google::protobuf::Message& 
message)}}, which can jsonify {{ExecutorInfo}} using the protobuf schema.

The resolution will be the following:
  1. Add the missing declaration of {{void json(JSON::ObjectWriter* writer, 
const ExecutorInfo& executorInfo);}} to {{src/common/http.hpp}}
  2. Make the generic {{json}} function that handles protobuf messages to 
require explicit opt-in.

{code}
-writer->field("cgroup_info", status.cgroup_info());
+writer->field("cgroup_info", JSON::Protobuf(status.cgroup_info()));
{code}


was (Author: mcypark):
The issue here is that even though {{src/common/http.cpp}} has a definition of
{{void json(JSON::ObjectWriter* writer, const ExecutorInfo& executorInfo);}},
its declaration is missing from {{src/common/http.hpp}}.

We would have liked this to cause a compiler error, but it didn't because of 
the generic {{json}} function for protobuf messages:
{{inline void json(ObjectWriter* writer, const google::protobuf::Message& 
message)}}, which can jsonify {{ExecutorInfo}} using the protobuf schema.

The resolution will be the following:
  1. Add the missing declaration of {{void json(JSON::ObjectWriter* writer, 
const ExecutorInfo& executorInfo);}} to {{src/common/http.hpp}}
  2. Make the generic {{json}} function that handles protobuf messages to 
required explicit opt-in.

{code}
-writer->field("cgroup_info", status.cgroup_info());
+writer->field("cgroup_info", JSON::Protobuf(status.cgroup_info()));
{code}

> The "executors" field is exposed under a backwards incompatible schema.
> ---
>
> Key: MESOS-4754
> URL: https://issues.apache.org/jira/browse/MESOS-4754
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
> Fix For: 0.27.2
>
>
> In 0.26.0, the master's {{/state}} endpoint generated the following:
> {code}
> {
>   /* ... */
>   "frameworks": [
> {
>   /* ... */
>   "executors": [
> {
>   "command": {
> "argv": [],
> "uris": [],
> "value": 
> "/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
>   },
>   "executor_id": "default",
>   "framework_id": "0ea528a9-64ba-417f-98ea-9c4b8d418db6-",
>   "name": "Long Lived Executor (C++)",
>   "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
>   },
>   "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0"
> }
>   ],
>   /* ... */
> }
>   ]
>   /* ... */
> }
> {code}
> In 0.27.1, the {{ExecutorInfo}} is mistakenly exposed in the raw protobuf 
> schema:
> {code}
> {
>   /* ... */
>   "frameworks": [
> {
>   /* ... */
>   "executors": [
> {
>   "command": {
> "shell": true,
> "value": 
> "/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
>   },
>   "executor_id": {
> "value": "default"
>   },
>   "framework_id": {
> "value": "368a5a49-480b-41f6-a13b-24a69c92a72e-"
>   },
>   "name": "Long Lived Executor (C++)",
>   "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0",
>   "source": "cpp_long_lived_framework"
> }
>   ],
>   /* ... */
> }
>   ]
>   /* ... */
> }
> {code}
> This is a backwards incompatible API change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4754) The "executors" field is exposed under a backwards incompatible schema.

2016-02-24 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162684#comment-15162684
 ] 

Michael Park edited comment on MESOS-4754 at 2/24/16 10:07 AM:
---

The issue here is that even though {{src/common/http.cpp}} has a definition of
{{void json(JSON::ObjectWriter* writer, const ExecutorInfo& executorInfo);}},
its declaration is missing from {{src/common/http.hpp}}.

We would have liked this to cause a compiler error, but it didn't because of 
the generic {{json}} function for protobuf messages:
{{inline void json(ObjectWriter* writer, const google::protobuf::Message& 
message)}}, which can jsonify {{ExecutorInfo}} using the protobuf schema.

The resolution will be the following:
  1. Add the missing declaration of {{void json(JSON::ObjectWriter* writer, 
const ExecutorInfo& executorInfo);}} to {{src/common/http.hpp}}
  2. Make the generic {{json}} function that handles protobuf messages to 
required explicit opt-in.

{code}
-writer->field("cgroup_info", status.cgroup_info());
+writer->field("cgroup_info", JSON::Protobuf(status.cgroup_info()));
{code}


was (Author: mcypark):
The issue here is that even though {{src/common/http.cpp}} has a definition of
{{void json(JSON::ObjectWriter* writer, const ExecutorInfo& executorInfo);}},
its declaration is missing from {{src/common/http.hpp}}.

This should have caused a compiler error, but it did not because we provide a 
generic {{json}} function for protobuf messages:
{{inline void json(ObjectWriter* writer, const google::protobuf::Message& 
message)}}, which can jsonify {{ExecutorInfo}} using the protobuf schema.

The resolution will be the following:
  1. Add the missing declaration of {{void json(JSON::ObjectWriter* writer, 
const ExecutorInfo& executorInfo);}} to {{src/common/http.hpp}}
  2. Make the generic {{json}} function that handles protobuf messages to 
required explicit opt-in.

{code}
-writer->field("cgroup_info", status.cgroup_info());
+writer->field("cgroup_info", JSON::Protobuf(status.cgroup_info()));
{code}

> The "executors" field is exposed under a backwards incompatible schema.
> ---
>
> Key: MESOS-4754
> URL: https://issues.apache.org/jira/browse/MESOS-4754
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
> Fix For: 0.27.2
>
>
> In 0.26.0, the master's {{/state}} endpoint generated the following:
> {code}
> {
>   /* ... */
>   "frameworks": [
> {
>   /* ... */
>   "executors": [
> {
>   "command": {
> "argv": [],
> "uris": [],
> "value": 
> "/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
>   },
>   "executor_id": "default",
>   "framework_id": "0ea528a9-64ba-417f-98ea-9c4b8d418db6-",
>   "name": "Long Lived Executor (C++)",
>   "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
>   },
>   "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0"
> }
>   ],
>   /* ... */
> }
>   ]
>   /* ... */
> }
> {code}
> In 0.27.1, the {{ExecutorInfo}} is mistakenly exposed in the raw protobuf 
> schema:
> {code}
> {
>   /* ... */
>   "frameworks": [
> {
>   /* ... */
>   "executors": [
> {
>   "command": {
> "shell": true,
> "value": 
> "/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
>   },
>   "executor_id": {
> "value": "default"
>   },
>   "framework_id": {
> "value": "368a5a49-480b-41f6-a13b-24a69c92a72e-"
>   },
>   "name": "Long Lived Executor (C++)",
>   "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0",
>   "source": "cpp_long_lived_framework"
> }
>   ],
>   /* ... */
> }
>   ]
>   /* ... */
> }
> {code}
> This is a backwards incompatible API change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4743) Mesos fetcher not working correctly on docker apps on CoreOS

2016-02-24 Thread Guillermo Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162717#comment-15162717
 ] 

Guillermo Rodriguez commented on MESOS-4743:


Yes, it is running inside a container. I installed 0.27.1 but still see the 
same issue. I hope MESOS-4249 fixes it.

Thanks!!!


> Mesos fetcher not working correctly on docker apps on CoreOS
> 
>
> Key: MESOS-4743
> URL: https://issues.apache.org/jira/browse/MESOS-4743
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 0.26.0
>Reporter: Guillermo Rodriguez
>
> I initially sent this issue to the Marathon group. They asked me to send it 
> here. This is the original thread:
> https://github.com/mesosphere/marathon/issues/3179
> Then they closed it so I had to ask again with more proof.
> https://github.com/mesosphere/marathon/issues/3213
> In a nutshell, when I start a Marathon task that uses the URI while running 
> on CoreOS, the file is effectively fetched but not passed to the container. I 
> can see the file in the Mesos UI but the file is not in the container. It is, 
> however, downloaded to another folder.
> It is very simple to test. The original ticket has two files attached: a 
> Marathon JSON for a Prometheus server and a prometheus.yml config file. The 
> objective is to start Prometheus with the config file.
> CoreOS 899.6
> Mesos 0.26
> Marathon 0.15.2
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4753) Add executor state when reporting resource usage

2016-02-24 Thread Fan Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Du updated MESOS-4753:
--
Description: 
The slave reports the resource usage of each executor so that the resource 
estimator can feed the master with revocable resources. It would be better to 
also include the executor state when reporting usage, so that the resource 
estimator can easily focus on *RUNNING* executors only.

It's possible to call {{Slave::getExecutor}} in the estimator, but that may 
not be in sync with the reported resource usage.

  was:
The slave reports the resource usage of each executor so that the resource 
estimator can feed the master with revocable resources. It would be better to 
also include the executor state when reporting usage, so that the resource 
estimator can easily focus on *RUNNING* executors only.

It's possible to call {code} Slave::getExecutor {code} in the estimator, but 
that may not be in sync with the reported resource usage.


> Add executor state when reporting resource usage
> 
>
> Key: MESOS-4753
> URL: https://issues.apache.org/jira/browse/MESOS-4753
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave, statistics
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> The slave reports the resource usage of each executor so that the resource 
> estimator can feed the master with revocable resources. It would be better 
> to also include the executor state when reporting usage, so that the 
> resource estimator can easily focus on *RUNNING* executors only.
> It's possible to call {{Slave::getExecutor}} in the estimator, but that may 
> not be in sync with the reported resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4754) The "executors" field is exposed under a backwards incompatible schema.

2016-02-24 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162684#comment-15162684
 ] 

Michael Park commented on MESOS-4754:
-

The issue here is that even though {{src/common/http.cpp}} has a definition of
{{void json(JSON::ObjectWriter* writer, const ExecutorInfo& executorInfo);}},
its declaration is missing from {{src/common/http.hpp}}.

This should have caused a compiler error, but it did not because we provide a 
generic {{json}} function for protobuf messages:
{{inline void json(ObjectWriter* writer, const google::protobuf::Message& 
message)}}, which can jsonify {{ExecutorInfo}} using the protobuf schema.

The resolution will be the following:
  1. Add the missing declaration of {{void json(JSON::ObjectWriter* writer, 
const ExecutorInfo& executorInfo);}} to {{src/common/http.hpp}}
  2. Make the generic {{json}} function that handles protobuf messages 
require explicit opt-in.

{code}
-writer->field("cgroup_info", status.cgroup_info());
+writer->field("cgroup_info", JSON::Protobuf(status.cgroup_info()));
{code}

> The "executors" field is exposed under a backwards incompatible schema.
> ---
>
> Key: MESOS-4754
> URL: https://issues.apache.org/jira/browse/MESOS-4754
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
> Fix For: 0.27.2
>
>
> In 0.26.0, the master's {{/state}} endpoint generated the following:
> {code}
> {
>   /* ... */
>   "frameworks": [
> {
>   /* ... */
>   "executors": [
> {
>   "command": {
> "argv": [],
> "uris": [],
> "value": 
> "/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
>   },
>   "executor_id": "default",
>   "framework_id": "0ea528a9-64ba-417f-98ea-9c4b8d418db6-",
>   "name": "Long Lived Executor (C++)",
>   "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
>   },
>   "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0"
> }
>   ],
>   /* ... */
> }
>   ]
>   /* ... */
> }
> {code}
> In 0.27.1, the {{ExecutorInfo}} is mistakenly exposed in the raw protobuf 
> schema:
> {code}
> {
>   /* ... */
>   "frameworks": [
> {
>   /* ... */
>   "executors": [
> {
>   "command": {
> "shell": true,
> "value": 
> "/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
>   },
>   "executor_id": {
> "value": "default"
>   },
>   "framework_id": {
> "value": "368a5a49-480b-41f6-a13b-24a69c92a72e-"
>   },
>   "name": "Long Lived Executor (C++)",
>   "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0",
>   "source": "cpp_long_lived_framework"
> }
>   ],
>   /* ... */
> }
>   ]
>   /* ... */
> }
> {code}
> This is a backwards incompatible API change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4754) The "executors" field is exposed under a backwards incompatible schema.

2016-02-24 Thread Michael Park (JIRA)
Michael Park created MESOS-4754:
---

 Summary: The "executors" field is exposed under a backwards 
incompatible schema.
 Key: MESOS-4754
 URL: https://issues.apache.org/jira/browse/MESOS-4754
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Michael Park
Assignee: Michael Park
 Fix For: 0.27.2


In 0.26.0, the master's {{/state}} endpoint generated the following:

{code}
{
  /* ... */
  "frameworks": [
{
  /* ... */
  "executors": [
{
  "command": {
"argv": [],
"uris": [],
"value": 
"/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
  },
  "executor_id": "default",
  "framework_id": "0ea528a9-64ba-417f-98ea-9c4b8d418db6-",
  "name": "Long Lived Executor (C++)",
  "resources": {
"cpus": 0,
"disk": 0,
"mem": 0
  },
  "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0"
}
  ],
  /* ... */
}
  ]
  /* ... */
}
{code}

In 0.27.1, the {{ExecutorInfo}} is mistakenly exposed in the raw protobuf 
schema:

{code}
{
  /* ... */
  "frameworks": [
{
  /* ... */
  "executors": [
{
  "command": {
"shell": true,
"value": 
"/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"
  },
  "executor_id": {
"value": "default"
  },
  "framework_id": {
"value": "368a5a49-480b-41f6-a13b-24a69c92a72e-"
  },
  "name": "Long Lived Executor (C++)",
  "slave_id": "8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0",
  "source": "cpp_long_lived_framework"
}
  ],
  /* ... */
}
  ]
  /* ... */
}
{code}

This is a backwards incompatible API change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4747) ContainerLoggerTest.MesosContainerizerRecover cannot be executed in isolation

2016-02-24 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4747:

Shepherd: Adam B

> ContainerLoggerTest.MesosContainerizerRecover cannot be executed in isolation
> -
>
> Key: MESOS-4747
> URL: https://issues.apache.org/jira/browse/MESOS-4747
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
> Fix For: 0.28.0
>
>
> Some cleanup of spawned processes is missing in 
> {{ContainerLoggerTest.MesosContainerizerRecover}}, so when the test is run 
> in isolation, the global teardown may find lingering processes.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContainerLoggerTest
> [ RUN  ] ContainerLoggerTest.MesosContainerizerRecover
> [   OK ] ContainerLoggerTest.MesosContainerizerRecover (13 ms)
> [--] 1 test from ContainerLoggerTest (13 ms total)
> [--] Global test environment tear-down
> ../../src/tests/environment.cpp:728: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 7112 /SOME/PATH/src/mesos/build/src/.libs/mesos-tests 
> --gtest_filter=ContainerLoggerTest.MesosContainerizerRecover
>  \--- 7130 (sh)
> [==] 1 test from 1 test case ran. (23 ms total)
> [  PASSED  ] 1 test.
> [  FAILED  ] 0 tests, listed below:
>  0 FAILED TESTS
> {code}
> Observed on OS X with clang-trunk and an unoptimized build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)