[jira] [Issue Comment Deleted] (MESOS-5184) Mesos does not validate role info when framework registered with specified role
[ https://issues.apache.org/jira/browse/MESOS-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Qiu updated MESOS-5184: Comment: was deleted (was: We also need to validate role when update weight and quota.) > Mesos does not validate role info when framework registered with specified > role > --- > > Key: MESOS-5184 > URL: https://issues.apache.org/jira/browse/MESOS-5184 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.28.0 >Reporter: Liqiang Lin > Fix For: 0.29.0 > > > When framework registered with specified role, Mesos does not validate the > role info. It will accept the subscription and send unreserved resources as > offer to the framework. > {code} > # cat register.json > { > "framework_id": {"value" : "test1"}, > "type":"SUBSCRIBE", > "subscribe":{ > "framework_info":{ > "user":"root", > "name":"test1", > "failover_timeout":60, > "role":"/test/test1", > "id":{"value":"test1"}, > "principal":"test1", > "capabilities":[{"type":"REVOCABLE_RESOURCES"}] > }, > "force":true > } > } > # curl -v http://192.168.56.110:5050/api/v1/scheduler -H "Content-type: > application/json" -X POST -d @register.json > * Hostname was NOT found in DNS cache > * Trying 192.168.56.110... 
> * Connected to 192.168.56.110 (192.168.56.110) port 5050 (#0) > > POST /api/v1/scheduler HTTP/1.1 > > User-Agent: curl/7.35.0 > > Host: 192.168.56.110:5050 > > Accept: */* > > Content-type: application/json > > Content-Length: 265 > > > * upload completely sent off: 265 out of 265 bytes > < HTTP/1.1 200 OK > < Date: Wed, 06 Apr 2016 21:34:18 GMT > < Transfer-Encoding: chunked > < Mesos-Stream-Id: 8b2c6740-b619-49c3-825a-e6ae780f4edc > < Content-Type: application/json > < > 69 > {"subscribed":{"framework_id":{"value":"test1"}},"type":"SUBSCRIBED"}20 > {"type":"HEARTBEAT"}1531 > {"offers":{"offers":[{"agent_id":{"value":"2cd5576e-6260-4262-a62c-b0dc45c86c45-S0"},"attributes":[{"name":"mesos_agent_type","text":{"value":"IBM_MESOS_EGO"},"type":"TEXT"},{"name":"hostname","text":{"value":"mesos2"},"type":"TEXT"}],"framework_id":{"value":"test1"},"hostname":"mesos2","id":{"value":"5b84aad8-dd60-40b3-84c2-93be6b7aa81c-O0"},"resources":[{"name":"disk","role":"*","scalar":{"value":20576.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"},{"name":"mem","role":"*","scalar":{"value":3952.0},"type":"SCALAR"},{"name":"cpus","role":"*","scalar":{"value":4.0},"type":"SCALAR"}],"url":{"address":{"hostname":"mesos2","ip":"192.168.56.110","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2cd5576e-6260-4262-a62c-b0dc45c86c45-S1"},"attributes":[{"name":"mesos_agent_type","text":{"value":"IBM_MESOS_EGO"},"type":"TEXT"},{"name":"hostname","text":{"value":"mesos1"},"type":"TEXT"}],"framework_id":{"value":"test1"},"hostname":"mesos1","id":{"value":"5b84aad8-dd60-40b3-84c2-93be6b7aa81c-O1"},"resources":[{"name":"disk","role":"*","scalar":{"value":21468.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"},{"name":"mem","role":"*","scalar":{"value":3952.0},"type":"SCALAR"},{"name":"cpus","role":"*","scalar":{"value":4.0},"type":"SCALAR"}],"url":{"address":{"hostname":"mesos1","ip":"192.168.56.111","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}20 > {"type":"HEARTBEAT"}20 > {code} > As you can see, the role under which the framework registered is "/test/test1", which > is an invalid role according to > [MESOS-2210|https://issues.apache.org/jira/browse/MESOS-2210]. > And the Mesos master log: > {code} > I0407 05:34:18.132333 20672 master.cpp:2107] Received subscription request > for HTTP framework 'test1' > I0407 05:34:18.133515 20672 master.cpp:2198] Subscribing framework 'test1' > with checkpointing disabled and capabilities [ REVOCABLE_RESOURCES ] > I0407 05:34:18.135027 20674 hierarchical.cpp:264] Added framework test1 > I0407 05:34:18.138746 20672 master.cpp:5659] Sending 2 offers to framework > test1 (test1) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
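The missing check the ticket asks for could look roughly like the sketch below. This is a hypothetical illustration, not the actual Mesos implementation; the exact rule set (no slashes or whitespace, no empty name, no "." or "..", no leading "-") is an assumption based on the discussion in MESOS-2210:

```python
def is_valid_role(role: str) -> bool:
    """Reject role names like "/test/test1" before subscribing a framework.

    Rules assumed from MESOS-2210; not the real Mesos validation code.
    """
    if role in ("", ".", ".."):
        return False
    if role.startswith("-"):
        return False
    # Slashes, backslashes, and whitespace are disallowed.
    return not any(c in role for c in "/\\ \t\n")
```

With a check like this in the SUBSCRIBE path, the call above could be rejected with an error instead of producing offers.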
[jira] [Commented] (MESOS-5184) Mesos does not validate role info when framework registered with specified role
[ https://issues.apache.org/jira/browse/MESOS-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236654#comment-15236654 ] Jian Qiu commented on MESOS-5184: - We also need to validate role when update weight and quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5060) Requesting /files/read.json with a negative length value causes subsequent /files requests to 404.
[ https://issues.apache.org/jira/browse/MESOS-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236611#comment-15236611 ] zhou xing commented on MESOS-5060: -- Hi Greg, thanks for the reminder! Could you help shepherd this ticket? Otherwise I will send a mail to the mailing list to find a shepherd. > Requesting /files/read.json with a negative length value causes subsequent > /files requests to 404. > -- > > Key: MESOS-5060 > URL: https://issues.apache.org/jira/browse/MESOS-5060 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.23.0 > Environment: Mesos 0.23.0 on CentOS 6, also Mesos 0.28.0 on OSX >Reporter: Tom Petr >Assignee: zhou xing >Priority: Minor > Fix For: 0.29.0 > > > I accidentally hit a slave's /files/read.json endpoint with a negative length > (ex. http://hostname:5051/files/read.json?path=XXX&offset=0&length=-100). The > HTTP request timed out after 30 seconds with nothing relevant in the slave > logs, and subsequent calls to any of the /files endpoints on that slave > immediately returned an HTTP 404 response. We ultimately got things working > again by restarting the mesos-slave process (checkpointing FTW!), but it'd be > wise to guard against negative lengths on the slave's end too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
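A guard of the kind the reporter suggests could be sketched as follows. The function name and error strings are illustrative, not Mesos's actual files module; the idea is simply to reject bad parameters before they reach the file-reading code:

```python
from typing import Optional

def validate_read_params(offset: int, length: Optional[int]) -> Optional[str]:
    # Return an error message (which the caller would map to HTTP 400)
    # instead of letting a negative value hang the request or corrupt
    # subsequent /files requests.
    if offset < 0:
        return "Negative offset provided: {}.".format(offset)
    if length is not None and length < 0:
        return "Negative length provided: {}.".format(length)
    return None
```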
[jira] [Commented] (MESOS-4828) XFS disk quota isolator
[ https://issues.apache.org/jira/browse/MESOS-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236604#comment-15236604 ] Yan Xu commented on MESOS-4828: --- [~jieyu] Hey, the main reason is for consistency with {{posix/disk}}. I realize that there hasn't been too strict of a convention to follow. I don't have a strong preference about it but was aiming for consistency. If we agree to start to use "disk/du" I think "disk/xfs" is fine. Of course leaving "posix/disk" for a deprecation cycle is reasonable. > XFS disk quota isolator > --- > > Key: MESOS-4828 > URL: https://issues.apache.org/jira/browse/MESOS-4828 > Project: Mesos > Issue Type: Epic > Components: isolation >Reporter: James Peach >Assignee: James Peach > > Implement a disk resource isolator using XFS project quotas. Compared to the > {{posix/disk}} isolator, this doesn't need to scan the filesystem > periodically, and applications receive an {{EDQUOT}} error instead of being > summarily killed. > This initial implementation only isolates sandbox directory resources, since > isolation doesn't have any visibility into the lifecycle of volumes, > which is needed to assign and track project IDs. > The build dependencies for this are the XFS headers (from xfsprogs-devel) and > libblkid. We need libblkid or the equivalent to map filesystem paths to block > devices in order to apply quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
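The behavioral difference the epic describes, an application seeing {{EDQUOT}} on write rather than being summarily killed, can be illustrated with a small helper. This is an illustration only (it assumes a POSIX platform where errno defines EDQUOT), not part of the isolator:

```python
import errno

def is_quota_exceeded(exc: OSError) -> bool:
    # Under an XFS project quota, a write that exceeds the sandbox limit
    # fails with EDQUOT ("Disk quota exceeded"); the task can catch this
    # and react, instead of being killed as with the posix/disk isolator.
    return exc.errno == errno.EDQUOT
```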
[jira] [Updated] (MESOS-5184) Mesos does not validate role info when framework registered with specified role
[ https://issues.apache.org/jira/browse/MESOS-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liqiang Lin updated MESOS-5184: --- Description: When framework registered with specified role, Mesos does not validate the role info. It will accept the subscription and send unreserved resources as offer to the framework. {code} # cat register.json { "framework_id": {"value" : "test1"}, "type":"SUBSCRIBE", "subscribe":{ "framework_info":{ "user":"root", "name":"test1", "failover_timeout":60, "role":"/test/test1", "id":{"value":"test1"}, "principal":"test1", "capabilities":[{"type":"REVOCABLE_RESOURCES"}] }, "force":true } } # curl -v http://192.168.56.110:5050/api/v1/scheduler -H "Content-type: application/json" -X POST -d @register.json * Hostname was NOT found in DNS cache * Trying 192.168.56.110... * Connected to 192.168.56.110 (192.168.56.110) port 5050 (#0) > POST /api/v1/scheduler HTTP/1.1 > User-Agent: curl/7.35.0 > Host: 192.168.56.110:5050 > Accept: */* > Content-type: application/json > Content-Length: 265 > * upload completely sent off: 265 out of 265 bytes < HTTP/1.1 200 OK < Date: Wed, 06 Apr 2016 21:34:18 GMT < Transfer-Encoding: chunked < Mesos-Stream-Id: 8b2c6740-b619-49c3-825a-e6ae780f4edc < Content-Type: application/json < 69 {"subscribed":{"framework_id":{"value":"test1"}},"type":"SUBSCRIBED"}20 {"type":"HEARTBEAT"}1531 
{"offers":{"offers":[{"agent_id":{"value":"2cd5576e-6260-4262-a62c-b0dc45c86c45-S0"},"attributes":[{"name":"mesos_agent_type","text":{"value":"IBM_MESOS_EGO"},"type":"TEXT"},{"name":"hostname","text":{"value":"mesos2"},"type":"TEXT"}],"framework_id":{"value":"test1"},"hostname":"mesos2","id":{"value":"5b84aad8-dd60-40b3-84c2-93be6b7aa81c-O0"},"resources":[{"name":"disk","role":"*","scalar":{"value":20576.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"},{"name":"mem","role":"*","scalar":{"value":3952.0},"type":"SCALAR"},{"name":"cpus","role":"*","scalar":{"value":4.0},"type":"SCALAR"}],"url":{"address":{"hostname":"mesos2","ip":"192.168.56.110","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2cd5576e-6260-4262-a62c-b0dc45c86c45-S1"},"attributes":[{"name":"mesos_agent_type","text":{"value":"IBM_MESOS_EGO"},"type":"TEXT"},{"name":"hostname","text":{"value":"mesos1"},"type":"TEXT"}],"framework_id":{"value":"test1"},"hostname":"mesos1","id":{"value":"5b84aad8-dd60-40b3-84c2-93be6b7aa81c-O1"},"resources":[{"name":"disk","role":"*","scalar":{"value":21468.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"},{"name":"mem","role":"*","scalar":{"value":3952.0},"type":"SCALAR"},{"name":"cpus","role":"*","scalar":{"value":4.0},"type":"SCALAR"}],"url":{"address":{"hostname":"mesos1","ip":"192.168.56.111","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}20 {"type":"HEARTBEAT"}20 {code} As you can see, the role under which the framework registered is "/test/test1", which is an invalid role according to [MESOS-2210|https://issues.apache.org/jira/browse/MESOS-2210]. And the Mesos master log: {code} I0407 05:34:18.132333 20672 master.cpp:2107] Received subscription request for HTTP framework 'test1' I0407 05:34:18.133515 20672 master.cpp:2198] Subscribing framework 'test1' with checkpointing disabled and capabilities [
REVOCABLE_RESOURCES ] I0407 05:34:18.135027 20674 hierarchical.cpp:264] Added framework test1 I0407 05:34:18.138746 20672 master.cpp:5659] Sending 2 offers to framework test1 (test1) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5185) Accessibility for Mesos Web UI
haosdent created MESOS-5185: --- Summary: Accessibility for Mesos Web UI Key: MESOS-5185 URL: https://issues.apache.org/jira/browse/MESOS-5185 Project: Mesos Issue Type: Epic Components: webui Reporter: haosdent Priority: Minor Currently, the Mesos Web UI does not fully support accessibility features for people with disabilities. For example, the Web UI could support screen readers that read page content aloud for blind users. We could fix such issues by making Mesos Web UI pages conform to the [WAI-ARIA standard|https://www.w3.org/WAI/intro/aria]. We could update the webui according to the [Accessibility Design Guidelines for the Web|https://msdn.microsoft.com/en-us/library/aa291312(v=vs.71).aspx] and https://www.w3.org/standards/webdesign/accessibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5184) Mesos does not validate role info when framework registered with specified role
Liqiang Lin created MESOS-5184: -- Summary: Mesos does not validate role info when framework registered with specified role Key: MESOS-5184 URL: https://issues.apache.org/jira/browse/MESOS-5184 Project: Mesos Issue Type: Bug Components: general Affects Versions: 0.28.0 Reporter: Liqiang Lin Fix For: 0.29.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5183) Provide backup/restore functionality for the registry.
Benjamin Mahler created MESOS-5183: -- Summary: Provide backup/restore functionality for the registry. Key: MESOS-5183 URL: https://issues.apache.org/jira/browse/MESOS-5183 Project: Mesos Issue Type: Epic Components: master Reporter: Benjamin Mahler Priority: Critical Currently there is no built-in support for backup/restore of the registry state. The current suggestion is to back up the LevelDB directories across each master and to restore them. This can be error prone and it requires that operators deal directly with the underlying storage layer. Ideally, the master provides a means to extract the complete registry contents for backup purposes, and has the ability to restore its state from a backup. As a note, the {{/registrar(1)/registry}} endpoint currently provides an ability to extract the state as JSON. There is currently no built-in support for restoring from backups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
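Until built-in support exists, the extraction half can be approximated against the {{/registrar(1)/registry}} endpoint the ticket mentions. A minimal sketch (the helper names are ours, and it assumes a reachable master; restore has no equivalent endpoint):

```python
import json
import urllib.request

def registry_backup_url(master: str) -> str:
    # The /registrar(1)/registry endpoint serves the registry state as JSON.
    return master.rstrip("/") + "/registrar(1)/registry"

def backup_registry(master: str) -> dict:
    # Fetch and parse the current registry contents for backup purposes.
    with urllib.request.urlopen(registry_backup_url(master)) as resp:
        return json.load(resp)
```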
[jira] [Created] (MESOS-5182) mesos-executor (CommandScheduler) does not accept offer with revocable resources
Liqiang Lin created MESOS-5182: -- Summary: mesos-executor (CommandScheduler) does not accept offer with revocable resources Key: MESOS-5182 URL: https://issues.apache.org/jira/browse/MESOS-5182 Project: Mesos Issue Type: Bug Components: framework Affects Versions: 0.28.0 Reporter: Liqiang Lin Fix For: 0.29.0 Currently mesos-executor (CommandScheduler) does not accept offers that contain revocable resources. As a result, this example framework cannot be used to verify cases where tasks are launched with revocable resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
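The gap can be pictured with a small predicate: a scheduler that only counts non-revocable CPUs sees zero usable CPUs in an offer that carries only revocable ones, and declines it. The sketch below is illustrative (field names loosely mirror the v1 JSON offers shown elsewhere in this digest, where a revocable resource carries a {{revocable}} field), not the CommandScheduler's code:

```python
def usable_cpus(offer: dict, include_revocable: bool) -> float:
    # Sum the cpus in an offer, optionally counting revocable resources.
    total = 0.0
    for r in offer.get("resources", []):
        if r.get("name") != "cpus":
            continue
        if "revocable" in r and not include_revocable:
            continue  # a naive scheduler skips revocable resources entirely
        total += r["scalar"]["value"]
    return total
```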
[jira] [Commented] (MESOS-5148) Supporting Container Images in Mesos Containerizer doesn't work by using marathon api
[ https://issues.apache.org/jira/browse/MESOS-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236424#comment-15236424 ] wangqun commented on MESOS-5148: [~haosd...@gmail.com] Thanks for helping me understand it. > Supporting Container Images in Mesos Containerizer doesn't work by using > marathon api > - > > Key: MESOS-5148 > URL: https://issues.apache.org/jira/browse/MESOS-5148 > Project: Mesos > Issue Type: Bug >Reporter: wangqun > > Hi > I used the Marathon API to create tasks to test Supporting Container > Images in Mesos Containerizer. > My steps are the following: > 1) Run the process on the master node. > sudo /usr/sbin/mesos-master --zk=zk://10.0.0.4:2181/mesos --port=5050 > --log_dir=/var/log/mesos --cluster=mesosbay --hostname=10.0.0.4 --ip=10.0.0.4 > --quorum=1 --work_dir=/var/lib/mesos > 2) Run the process on the slave node. > sudo /usr/sbin/mesos-slave --master=zk://10.0.0.4:2181/mesos > --log_dir=/var/log/mesos --containerizers=docker,mesos > --executor_registration_timeout=5mins --hostname=10.0.0.5 --ip=10.0.0.5 > --isolation=docker/runtime,filesystem/linux --work_dir=/tmp/mesos/slave > --image_providers=docker --executor_environment_variables="{}" > 3) Create a JSON file to specify the container to be managed by Mesos. 
> sudo touch mesos.json > sudo vim mesos.json > { > "container": { > "type": "MESOS", > "docker": { > "image": "library/redis" > } > }, > "id": "ubuntumesos", > "instances": 1, > "cpus": 0.5, > "mem": 512, > "uris": [], > "cmd": "ping 8.8.8.8" > } > 4)sudo curl -X POST -H "Content-Type: application/json" > localhost:8080/v2/apps -d...@mesos.json > 5)sudo curl http://localhost:8080/v2/tasks > {"tasks":[{"id":"ubuntumesos.fc1879be-fc9f-11e5-81e0-024294de4967","host":"10.0.0.5","ipAddresses":[],"ports":[31597],"startedAt":"2016-04-07T09:06:24.900Z","stagedAt":"2016-04-07T09:06:16.611Z","version":"2016-04-07T09:06:14.354Z","slaveId":"058fb5a7-9273-4bfa-83bb-8cb091621e19-S1","appId":"/ubuntumesos","servicePorts":[1]}]} > 6) sudo docker run -ti --net=host redis redis-cli > Could not connect to Redis at 127.0.0.1:6379: Connection refused > not connected> > 7) > I0409 01:43:48.774868 3492 slave.cpp:3886] Executor > 'ubuntumesos.a0b45838-fdf0-11e5-8b4b-0242e2dedfce' of framework > ffb72d7c-dd63-4c30-abea-bb746ab2c326- exited with status 0 > I0409 01:43:48.781307 3492 slave.cpp:3990] Cleaning up executor > 'ubuntumesos.a0b45838-fdf0-11e5-8b4b-0242e2dedfce' of framework > ffb72d7c-dd63-4c30-abea-bb746ab2c326- at executor(1)@10.0.0.5:60134 > I0409 01:43:48.808364 3492 slave.cpp:4078] Cleaning up framework > ffb72d7c-dd63-4c30-abea-bb746ab2c326- > I0409 01:43:48.811336 3493 gc.cpp:55] Scheduling > '/tmp/mesos/slave/slaves/da0e09ff-d5b2-4680-bd7e-b58a2a206497-S0/frameworks/ffb72d7c-dd63-4c30-abea-bb746ab2c326-/executors/ubuntumesos.a0b45838-fdf0-11e5-8b4b-0242e2dedfce/runs/24d0872d-1ba1-4384-be11-a20c82893ea4' > for gc 6.9070953778days in the future > I0409 01:43:48.817401 3493 gc.cpp:55] Scheduling > '/tmp/mesos/slave/slaves/da0e09ff-d5b2-4680-bd7e-b58a2a206497-S0/frameworks/ffb72d7c-dd63-4c30-abea-bb746ab2c326-/executors/ubuntumesos.a0b45838-fdf0-11e5-8b4b-0242e2dedfce' > for gc 6.9065992889days in the future > I0409 01:43:48.823158 3493 gc.cpp:55] Scheduling > 
'/tmp/mesos/slave/meta/slaves/da0e09ff-d5b2-4680-bd7e-b58a2a206497-S0/frameworks/ffb72d7c-dd63-4c30-abea-bb746ab2c326-/executors/ubuntumesos.a0b45838-fdf0-11e5-8b4b-0242e2dedfce/runs/24d0872d-1ba1-4384-be11-a20c82893ea4' > for gc 6.9065273185days in the future > I0409 01:43:48.826216 3491 status_update_manager.cpp:282] Closing status > update streams for framework ffb72d7c-dd63-4c30-abea-bb746ab2c326- > I0409 01:43:48.835602 3493 gc.cpp:55] Scheduling > '/tmp/mesos/slave/meta/slaves/da0e09ff-d5b2-4680-bd7e-b58a2a206497-S0/frameworks/ffb72d7c-dd63-4c30-abea-bb746ab2c326-/executors/ubuntumesos.a0b45838-fdf0-11e5-8b4b-0242e2dedfce' > for gc 6.9064716444days in the future > I0409 01:43:48.838580 3493 gc.cpp:55] Scheduling > '/tmp/mesos/slave/slaves/da0e09ff-d5b2-4680-bd7e-b58a2a206497-S0/frameworks/ffb72d7c-dd63-4c30-abea-bb746ab2c326-' > for gc 6.9041064889days in the future > I0409 01:43:48.844699 3493 gc.cpp:55] Scheduling > '/tmp/mesos/slave/meta/slaves/da0e09ff-d5b2-4680-bd7e-b58a2a206497-S0/frameworks/ffb72d7c-dd63-4c30-abea-bb746ab2c326-' > for gc 6.902654163days in the future > I0409 01:44:01.623440 3494 slave.cpp:4374] Current disk usage 27.10%. Max > allowed age: 4.403153217546436days > I0409 01:44:32.339310 3494 slave.cpp:1361] Got assigned task > ubuntumesos.9ab04999-fdf4-11e5-8b4b-0242e2dedfce for framework > ffb72d7c-dd63-4c30-abea-bb746ab2c326-
[jira] [Updated] (MESOS-5173) Allow master/agent to take multiple --modules flags
[ https://issues.apache.org/jira/browse/MESOS-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-5173: -- Sprint: Mesosphere Sprint 33 > Allow master/agent to take multiple --modules flags > --- > > Key: MESOS-5173 > URL: https://issues.apache.org/jira/browse/MESOS-5173 > Project: Mesos > Issue Type: Task >Reporter: Kapil Arya >Assignee: Kapil Arya > Labels: mesosphere > Fix For: 0.29.0 > > > When loading multiple modules into the master/agent, one has to merge all module > metadata (library name, module name, parameters, etc.) into a single JSON > file which is then passed to the --modules flag. This quickly becomes > cumbersome, especially if the modules come from different > vendors/developers. > An alternative would be to allow multiple invocations of the --modules flag that > can then be passed on to the module manager. That way, each flag corresponds > to just one module library and the modules from that library. > Another approach is to create a new flag (e.g., --modules-dir) that contains > a path to a directory containing multiple JSON files. One can think > of it as analogous to systemd units. The operator drops a new file > into this directory and the file is automatically picked up by the > master/agent module manager. Further, the naming scheme can also be inherited > to prefix the filename with an "NN_" to signify load order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
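The "--modules-dir" idea with "NN_" prefixes could behave like this small sketch (illustrative, not the eventual implementation): files are processed in lexicographic order, so a numeric prefix determines load order, just as with systemd units:

```python
def module_load_order(filenames):
    # A "10_", "20_" numeric prefix makes lexicographic order equal to the
    # intended load order; non-JSON files in the directory are ignored.
    return sorted(n for n in filenames if n.endswith(".json"))
```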
[jira] [Updated] (MESOS-5171) Expose state/state.hpp to public headers
[ https://issues.apache.org/jira/browse/MESOS-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-5171: -- Sprint: Mesosphere Sprint 33 > Expose state/state.hpp to public headers > > > Key: MESOS-5171 > URL: https://issues.apache.org/jira/browse/MESOS-5171 > Project: Mesos > Issue Type: Task > Components: replicated log >Reporter: Kapil Arya >Assignee: Kapil Arya > Labels: mesosphere > Fix For: 0.29.0 > > > We want the modules to be able to use the replicated log along with the APIs to > communicate with ZooKeeper. This change would require us to expose at least > the following headers: state/storage.hpp, and any additional files that > state.hpp depends on (e.g., zookeeper/authentication.hpp). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5180) Scheduler driver does not detect disconnection with master and reregister.
[ https://issues.apache.org/jira/browse/MESOS-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-5180: -- Description: The existing implementation of the scheduler driver does not re-register with the master under some network partition cases. When a scheduler registers with the master: 1) master links to the framework 2) framework links to the master It is possible for either of these links to break *without* the master changing. (Currently, the scheduler driver will only re-register if the master changes). If both links break or if just link (1) breaks, the master views the framework as {{inactive}} and {{disconnected}}. This means the framework will not receive any more events (such as offers) from the master until it re-registers. There is currently no way for the scheduler to detect a one-way link breakage. If link (2) breaks, it makes (almost) no difference to the scheduler. The scheduler usually uses the link to send messages to the master, but libprocess will create another socket if the persistent one is not available. To fix link breakages for (1+2) and (2), the scheduler driver should implement an {{::exited}} event handler for the master's {{pid}} and trigger a master (re-)detection upon a disconnection. This in turn should make the driver (re-)register with the master. The scheduler library already does this: https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L395 See the related issue MESOS-5181 for link (1) breakage. was: The existing implementation of the scheduler driver does not re-register with the master under some network partition cases. When a scheduler registers with the master: 1) master links to the framework 2) framework links to the master It is possible for either of these links to break *without* the master changing. (Currently, the scheduler driver will only re-register if the master changes). 
If both links break or if just link (1) breaks, the master views the framework as {{inactive}} and {{disconnected}}. This means the framework will not receive any more events (such as offers) from the master until it re-registers. There is currently no way for the scheduler to detect a one-way link breakage. if link (2) breaks, it makes (almost) no difference to the scheduler. The scheduler usually uses the link to send messages to the master, but libprocess will create another socket if the persistent one is not available. To fix link breakages for (1+2) and (2), the scheduler driver should implement a `::exited` event handler for the master's {{pid}} and re-register in this case. See the related issue MESOS-5181 for link (1) breakage. > Scheduler driver does not detect disconnection with master and reregister. > -- > > Key: MESOS-5180 > URL: https://issues.apache.org/jira/browse/MESOS-5180 > Project: Mesos > Issue Type: Bug > Components: scheduler driver >Affects Versions: 0.24.0 >Reporter: Joseph Wu >Assignee: Anand Mazumdar > Labels: mesosphere > > The existing implementation of the scheduler driver does not re-register with > the master under some network partition cases. > When a scheduler registers with the master: > 1) master links to the framework > 2) framework links to the master > It is possible for either of these links to break *without* the master > changing. (Currently, the scheduler driver will only re-register if the > master changes). > If both links break or if just link (1) breaks, the master views the > framework as {{inactive}} and {{disconnected}}. This means the framework > will not receive any more events (such as offers) from the master until it > re-registers. There is currently no way for the scheduler to detect a > one-way link breakage. > if link (2) breaks, it makes (almost) no difference to the scheduler. 
The > scheduler usually uses the link to send messages to the master, but > libprocess will create another socket if the persistent one is not available. > To fix link breakages for (1+2) and (2), the scheduler driver should > implement a `::exited` event handler for the master's {{pid}} and trigger a > master (re-)detection upon a disconnection. This in turn should make the > driver (re)-register with the master. The scheduler library already does > this: > https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L395 > See the related issue MESOS-5181 for link (1) breakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
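The proposed fix above (an `::exited` handler for the master's pid that triggers re-detection and re-registration) can be sketched as follows. This is an illustrative Python model, not Mesos' actual C++ driver code; the names `SchedulerDriver` and `on_master_exited` are hypothetical stand-ins for the libprocess machinery.

```python
class SchedulerDriver:
    """Illustrative model of the proposed fix: treat a broken
    persistent link to the master as a disconnection and
    re-register, even if the master itself has not changed."""

    def __init__(self, detector):
        self.detector = detector  # callable that finds the current master
        self.connected = False
        self.master = None

    def register(self, master):
        self.master = master
        self.connected = True

    def on_master_exited(self, pid):
        # Proposed `::exited` handler: the link to the master broke.
        # Instead of staying silently "registered" on the scheduler
        # side, trigger master (re-)detection, which in turn causes
        # re-registration.
        if self.master is not None and pid == self.master:
            self.connected = False
            self.register(self.detector())


# Usage: simulate a link breakage without a master change.
driver = SchedulerDriver(detector=lambda: "master@10.0.0.1:5050")
driver.register("master@10.0.0.1:5050")
driver.on_master_exited("master@10.0.0.1:5050")
assert driver.connected  # the driver re-registered after the break
```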
[jira] [Updated] (MESOS-5180) Scheduler driver does not detect disconnection with master and reregister.
[ https://issues.apache.org/jira/browse/MESOS-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-5180: -- Story Points: 2 (was: 1) > Scheduler driver does not detect disconnection with master and reregister. > -- > > Key: MESOS-5180 > URL: https://issues.apache.org/jira/browse/MESOS-5180 > Project: Mesos > Issue Type: Bug > Components: scheduler driver >Affects Versions: 0.24.0 >Reporter: Joseph Wu >Assignee: Anand Mazumdar > Labels: mesosphere > > The existing implementation of the scheduler driver does not re-register with > the master under some network partition cases. > When a scheduler registers with the master: > 1) master links to the framework > 2) framework links to the master > It is possible for either of these links to break *without* the master > changing. (Currently, the scheduler driver will only re-register if the > master changes). > If both links break or if just link (1) breaks, the master views the > framework as {{inactive}} and {{disconnected}}. This means the framework > will not receive any more events (such as offers) from the master until it > re-registers. There is currently no way for the scheduler to detect a > one-way link breakage. > if link (2) breaks, it makes (almost) no difference to the scheduler. The > scheduler usually uses the link to send messages to the master, but > libprocess will create another socket if the persistent one is not available. > To fix link breakages for (1+2) and (2), the scheduler driver should > implement a `::exited` event handler for the master's {{pid}} and re-register > in this case. > See the related issue MESOS-5181 for link (1) breakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5181) Master should reject calls from the scheduler driver if the scheduler is not connected.
[ https://issues.apache.org/jira/browse/MESOS-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-5181: -- Description: When a scheduler registers, the master will create a link from master to scheduler. If this link breaks, the master will consider the scheduler {{inactive}} and mark it as {{disconnected}}. This causes a couple problems: 1) Master does not send offers to {{inactive}} schedulers. But these schedulers might consider themselves "registered" in a one-way network partition scenario. 2) Any calls from the {{inactive}} scheduler is still accepted, which leaves the scheduler in a starved, but semi-functional state. See the related issue for more context: MESOS-5180 There should be an additional guard for registered, but {{inactive}} schedulers here: https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/master.cpp#L1977 The HTTP API already does this: https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/http.cpp#L459 Since the scheduler driver cannot return a 403, it may be necessary to return a {{Event::ERROR}} and force the scheduler to abort. was: When a scheduler registers, the master will create a link from master to scheduler. If this link breaks, the master will consider the scheduler {{inactive}} and {{disconnected}}. This causes a couple problems: 1) Master does not send offers to {{inactive}} schedulers. But these schedulers are still considered "registered". 2) Any calls from the {{inactive}} scheduler is still accepted, which leaves the scheduler in a starved, but semi-functional state. 
See the related issue for more context: MESOS-5180 There should be an additional guard for registered, but {{inactive}} schedulers here: https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/master.cpp#L1977 The HTTP API already does this: https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/http.cpp#L459 Since the scheduler driver cannot return a 403, it may be necessary to return a {{Event::ERROR}} and force the scheduler to abort. > Master should reject calls from the scheduler driver if the scheduler is not > connected. > --- > > Key: MESOS-5181 > URL: https://issues.apache.org/jira/browse/MESOS-5181 > Project: Mesos > Issue Type: Bug > Components: scheduler driver >Affects Versions: 0.24.0 >Reporter: Joseph Wu >Assignee: Anand Mazumdar > Labels: mesosphere > > When a scheduler registers, the master will create a link from master to > scheduler. If this link breaks, the master will consider the scheduler > {{inactive}} and mark it as {{disconnected}}. > This causes a couple problems: > 1) Master does not send offers to {{inactive}} schedulers. But these > schedulers might consider themselves "registered" in a one-way network > partition scenario. > 2) Any calls from the {{inactive}} scheduler is still accepted, which leaves > the scheduler in a starved, but semi-functional state. > See the related issue for more context: MESOS-5180 > There should be an additional guard for registered, but {{inactive}} > schedulers here: > https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/master.cpp#L1977 > The HTTP API already does this: > https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/http.cpp#L459 > Since the scheduler driver cannot return a 403, it may be necessary to return > a {{Event::ERROR}} and force the scheduler to abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
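The guard described above can be sketched as follows. This is a hedged Python illustration, not the actual C++ code in `master.cpp`; `handle_scheduler_call` and the `registered`/`active` flags are hypothetical names modeling the master's framework state.

```python
class Event:
    ERROR = "ERROR"


def handle_scheduler_call(framework, call):
    """Illustrative guard: reject calls from registered-but-inactive
    schedulers instead of silently accepting them."""
    if not framework.get("registered"):
        return (Event.ERROR, "Framework is not registered")
    if not framework.get("active"):
        # The scheduler driver cannot receive an HTTP 403, so the
        # rejection is surfaced as an Event::ERROR, forcing the
        # scheduler to abort rather than run starved but
        # semi-functional.
        return (Event.ERROR, "Framework is registered but inactive")
    return ("ACCEPTED", None)


# An inactive scheduler's call is rejected rather than accepted.
result = handle_scheduler_call({"registered": True, "active": False}, "KILL")
assert result[0] == Event.ERROR
```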
[jira] [Updated] (MESOS-5180) Scheduler driver does not detect disconnection with master and reregister.
[ https://issues.apache.org/jira/browse/MESOS-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-5180: - Description: The existing implementation of the scheduler driver does not re-register with the master under some network partition cases. When a scheduler registers with the master: 1) master links to the framework 2) framework links to the master It is possible for either of these links to break *without* the master changing. (Currently, the scheduler driver will only re-register if the master changes). If both links break or if just link (1) breaks, the master views the framework as {{inactive}} and {{disconnected}}. This means the framework will not receive any more events (such as offers) from the master until it re-registers. There is currently no way for the scheduler to detect a one-way link breakage. if link (2) breaks, it makes (almost) no difference to the scheduler. The scheduler usually uses the link to send messages to the master, but libprocess will create another socket if the persistent one is not available. To fix link breakages for (1+2) and (2), the scheduler driver should implement a `::exited` event handler for the master's {{pid}} and re-register in this case. See the related issue MESOS-5181 for link (1) breakage. was: The existing implementation of the scheduler driver does not re-register with the master under some network partition cases. When a scheduler registers with the master: 1) master links to the framework 2) framework links to the master It is possible for either of these links to break *without* the master changing. (Currently, the scheduler driver will only re-register if the master changes). If both links break or if just link (1) breaks, the master views the framework as {{inactive}} and {{disconnected}}. This means the framework will not receive any more events (such as offers) from the master until it re-registers. 
There is currently no way for the scheduler to detect a one-way link breakage. if link (2) breaks, it makes (almost) no difference to the scheduler. The scheduler usually uses the link to send messages to the master, but libprocess will create another socket if the persistent one is not available. To fix link breakages for (1+2) and (2), the scheduler driver should implement a `::exited` event handler for the master's {{pid}} and re-register in this case. See the related issue [TODO] for link (1) breakage. > Scheduler driver does not detect disconnection with master and reregister. > -- > > Key: MESOS-5180 > URL: https://issues.apache.org/jira/browse/MESOS-5180 > Project: Mesos > Issue Type: Bug > Components: scheduler driver >Affects Versions: 0.24.0 >Reporter: Joseph Wu >Assignee: Anand Mazumdar > Labels: mesosphere > > The existing implementation of the scheduler driver does not re-register with > the master under some network partition cases. > When a scheduler registers with the master: > 1) master links to the framework > 2) framework links to the master > It is possible for either of these links to break *without* the master > changing. (Currently, the scheduler driver will only re-register if the > master changes). > If both links break or if just link (1) breaks, the master views the > framework as {{inactive}} and {{disconnected}}. This means the framework > will not receive any more events (such as offers) from the master until it > re-registers. There is currently no way for the scheduler to detect a > one-way link breakage. > if link (2) breaks, it makes (almost) no difference to the scheduler. The > scheduler usually uses the link to send messages to the master, but > libprocess will create another socket if the persistent one is not available. > To fix link breakages for (1+2) and (2), the scheduler driver should > implement a `::exited` event handler for the master's {{pid}} and re-register > in this case. 
> See the related issue MESOS-5181 for link (1) breakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5181) Master should reject calls from the scheduler driver if the scheduler is not connected.
Joseph Wu created MESOS-5181: Summary: Master should reject calls from the scheduler driver if the scheduler is not connected. Key: MESOS-5181 URL: https://issues.apache.org/jira/browse/MESOS-5181 Project: Mesos Issue Type: Bug Components: scheduler driver Affects Versions: 0.24.0 Reporter: Joseph Wu Assignee: Anand Mazumdar When a scheduler registers, the master will create a link from master to scheduler. If this link breaks, the master will consider the scheduler {{inactive}} and {{disconnected}}. This causes a couple of problems: 1) Master does not send offers to {{inactive}} schedulers. But these schedulers are still considered "registered". 2) Any calls from the {{inactive}} scheduler are still accepted, which leaves the scheduler in a starved but semi-functional state. See the related issue for more context: MESOS-5180 There should be an additional guard for registered, but {{inactive}} schedulers here: https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/master.cpp#L1977 The HTTP API already does this: https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/http.cpp#L459 Since the scheduler driver cannot return a 403, it may be necessary to return an {{Event::ERROR}} and force the scheduler to abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5180) Scheduler driver does not detect disconnection with master and reregister.
Joseph Wu created MESOS-5180: Summary: Scheduler driver does not detect disconnection with master and reregister. Key: MESOS-5180 URL: https://issues.apache.org/jira/browse/MESOS-5180 Project: Mesos Issue Type: Bug Components: scheduler driver Affects Versions: 0.24.0 Reporter: Joseph Wu Assignee: Anand Mazumdar The existing implementation of the scheduler driver does not re-register with the master under some network partition cases. When a scheduler registers with the master: 1) master links to the framework 2) framework links to the master It is possible for either of these links to break *without* the master changing. (Currently, the scheduler driver will only re-register if the master changes). If both links break or if just link (1) breaks, the master views the framework as {{inactive}} and {{disconnected}}. This means the framework will not receive any more events (such as offers) from the master until it re-registers. There is currently no way for the scheduler to detect a one-way link breakage. if link (2) breaks, it makes (almost) no difference to the scheduler. The scheduler usually uses the link to send messages to the master, but libprocess will create another socket if the persistent one is not available. To fix link breakages for (1+2) and (2), the scheduler driver should implement a `::exited` event handler for the master's {{pid}} and re-register in this case. See the related issue [TODO] for link (1) breakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5166) ExamplesTest.DynamicReservationFramework is slow
[ https://issues.apache.org/jira/browse/MESOS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Klaus Ma reassigned MESOS-5166: --- Assignee: Klaus Ma > ExamplesTest.DynamicReservationFramework is slow > > > Key: MESOS-5166 > URL: https://issues.apache.org/jira/browse/MESOS-5166 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier >Assignee: Klaus Ma > Labels: examples, mesosphere > > For an unoptimized build under OS X > {{ExamplesTest.DynamicReservationFramework}} currently takes more than 13 > seconds on my machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5179) Enhance the error message for Duration flag
Guangya Liu created MESOS-5179: -- Summary: Enhance the error message for Duration flag Key: MESOS-5179 URL: https://issues.apache.org/jira/browse/MESOS-5179 Project: Mesos Issue Type: Bug Reporter: Guangya Liu Assignee: Guangya Liu Enhance the error message for https://github.com/apache/mesos/blob/4dfa91fc21f80204f5125b2e2f35c489f8fb41d8/3rdparty/libprocess/3rdparty/stout/include/stout/duration.hpp#L70 to list all of the supported duration units. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
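The requested enhancement amounts to enumerating the accepted units in the parse error. A minimal Python sketch (not the actual stout C++ implementation; the unit list mirrors the suffixes stout's `Duration` accepts):

```python
import re

# Units accepted by stout's Duration (ns, us, ms, secs, mins, hrs,
# days, weeks), mapped to seconds. Listing them in the error message
# is the enhancement this ticket asks for.
UNITS = {
    "ns": 1e-9, "us": 1e-6, "ms": 1e-3,
    "secs": 1.0, "mins": 60.0, "hrs": 3600.0,
    "days": 86400.0, "weeks": 604800.0,
}


def parse_duration(value):
    """Parse strings like '5mins' into seconds; on failure, name
    every supported unit in the error instead of a bare rejection."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]+)", value)
    if not match or match.group(2) not in UNITS:
        raise ValueError(
            "Failed to parse duration '%s': expected a number followed "
            "by one of %s" % (value, ", ".join(sorted(UNITS))))
    return float(match.group(1)) * UNITS[match.group(2)]


assert parse_duration("5mins") == 300.0
```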
[jira] [Updated] (MESOS-5170) Adapt json creation for authorization based endpoint filtering.
[ https://issues.apache.org/jira/browse/MESOS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5170: -- Fix Version/s: 0.29.0 > Adapt json creation for authorization based endpoint filtering. > --- > > Key: MESOS-5170 > URL: https://issues.apache.org/jira/browse/MESOS-5170 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: authorization, mesosphere, security > Fix For: 0.29.0 > > > For authorization based endpoint filtering we need to adapt the json endpoint > creation as discussed in MESOS-4931. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5170) Adapt json creation for authorization based endpoint filtering.
[ https://issues.apache.org/jira/browse/MESOS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5170: -- Assignee: Joerg Schad > Adapt json creation for authorization based endpoint filtering. > --- > > Key: MESOS-5170 > URL: https://issues.apache.org/jira/browse/MESOS-5170 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: authorization, mesosphere, security > Fix For: 0.29.0 > > > For authorization based endpoint filtering we need to adapt the json endpoint > creation as discussed in MESOS-4931. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5169) Introduce new Authorizer Actions for Authorized based filtering of endpoints.
[ https://issues.apache.org/jira/browse/MESOS-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5169: -- Assignee: Joerg Schad Sprint: Mesosphere Sprint 33 Fix Version/s: 0.29.0 Description: For authorization based endpoint filtering we need to introduce the authorizer actions outlined via MESOS-4932. (was: For authorization based endpoint filtering we need to introduce the authorizer actions outlined via MESOS-493.) Component/s: security > Introduce new Authorizer Actions for Authorized based filtering of endpoints. > - > > Key: MESOS-5169 > URL: https://issues.apache.org/jira/browse/MESOS-5169 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: authorization, mesosphere, security > Fix For: 0.29.0 > > > For authorization based endpoint filtering we need to introduce the > authorizer actions outlined via MESOS-4932. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5168) Benchmark overhead of authorization based filtering.
[ https://issues.apache.org/jira/browse/MESOS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5168: -- Assignee: Joerg Schad Sprint: Mesosphere Sprint 33 Fix Version/s: 0.29.0 > Benchmark overhead of authorization based filtering. > > > Key: MESOS-5168 > URL: https://issues.apache.org/jira/browse/MESOS-5168 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: authorization, mesosphere, security > Fix For: 0.29.0 > > > When adding authorization based filtering as outlined in MESOS-4931 we need > to be careful, especially for performance-critical endpoints such as /state. > We should ensure via a benchmark that performance does not degrade below an > acceptable level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3578) ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky
[ https://issues.apache.org/jira/browse/MESOS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236150#comment-15236150 ] Anand Mazumdar commented on MESOS-3578: --- Logs from another ASF CI run. {code} [ RUN ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization E0411 17:14:46.692386 32652 shell.hpp:106] Command 'hadoop version 2>&1' failed; this is the output: sh: 1: hadoop: not found I0411 17:14:46.692488 32652 fetcher.cpp:59] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127 I0411 17:14:46.692757 32652 local_puller.cpp:90] Creating local puller with docker registry '/tmp/s6Ahtf/images' I0411 17:14:46.695791 32678 metadata_manager.cpp:159] Looking for image 'abc' I0411 17:14:46.696559 32678 local_puller.cpp:142] Untarring image 'abc' from '/tmp/s6Ahtf/images/abc.tar' to '/tmp/s6Ahtf/store/staging/qf0NsJ' I0411 17:14:46.741811 32685 local_puller.cpp:162] The repositories JSON file for image 'abc' is '{"abc":{"latest":"456"}}' I0411 17:14:46.742210 32685 local_puller.cpp:290] Extracting layer tar ball '/tmp/s6Ahtf/store/staging/qf0NsJ/123/layer.tar to rootfs '/tmp/s6Ahtf/store/staging/qf0NsJ/123/rootfs' I0411 17:14:46.747326 32685 local_puller.cpp:290] Extracting layer tar ball '/tmp/s6Ahtf/store/staging/qf0NsJ/456/layer.tar to rootfs '/tmp/s6Ahtf/store/staging/qf0NsJ/456/rootfs' ../../src/tests/containerizer/provisioner_docker_tests.cpp:210: Failure (imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, /tmp/s6Ahtf/store/staging/qf0NsJ/123/layer.tar, -C, /tmp/s6Ahtf/store/staging/qf0NsJ/123/rootfs' failed: tar: This does not look like a tar archive tar: Exiting with failure status due to previous errors [ FAILED ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization (204 ms) {code} > 
ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky > -- > > Key: MESOS-3578 > URL: https://issues.apache.org/jira/browse/MESOS-3578 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Anand Mazumdar > Labels: flaky-test, mesosphere > > Showed up on ASF CI: > https://builds.apache.org/job/Mesos/881/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull > {code} > [ RUN ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization > Using temporary directory > '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE' > I0929 02:36:44.066397 30457 local_puller.cpp:127] Untarring image from > '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/store/staging/aZND7C' > to > '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/images/abc:latest.tar' > ../../src/tests/containerizer/provisioner_docker_tests.cpp:843: Failure > (layers).failure(): Collect failed: Untar failed with exit code: exited with > status 2 > [ FAILED ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization > (181 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5064) Remove default value for the agent `work_dir`
[ https://issues.apache.org/jira/browse/MESOS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220643#comment-15220643 ] Greg Mann edited comment on MESOS-5064 at 4/11/16 10:58 PM: Reviews here: https://reviews.apache.org/r/46003/ https://reviews.apache.org/r/46005/ https://reviews.apache.org/r/46004/ https://reviews.apache.org/r/45562/ was (Author: greggomann): Reviews here: https://reviews.apache.org/r/46003/ https://reviews.apache.org/r/46005/ https://reviews.apache.org/r/46004/ https://reviews.apache.org/r/45562/ https://reviews.apache.org/r/46038/ > Remove default value for the agent `work_dir` > - > > Key: MESOS-5064 > URL: https://issues.apache.org/jira/browse/MESOS-5064 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Greg Mann > > Following a crash report from the user we need to be more explicit about the > dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove > the default value for the {{\-\-work_dir}} flag, forcing users to explicitly > set the work directory for the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5159) Add test to verify error when requesting fractional GPUs
[ https://issues.apache.org/jira/browse/MESOS-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236104#comment-15236104 ] Kevin Klues commented on MESOS-5159: Updated to fail with TASK_ERROR semantics based on MESOS-5178 https://reviews.apache.org/r/45970/ > Add test to verify error when requesting fractional GPUs > > > Key: MESOS-5159 > URL: https://issues.apache.org/jira/browse/MESOS-5159 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: gpu, mesosphere > > Fractional GPU requests should immediately cause a TASK_FAILED without ever > launching the task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5178) Add logic to validate for non-fractional GPU requests in the master
Kevin Klues created MESOS-5178: -- Summary: Add logic to validate for non-fractional GPU requests in the master Key: MESOS-5178 URL: https://issues.apache.org/jira/browse/MESOS-5178 Project: Mesos Issue Type: Task Reporter: Kevin Klues Assignee: Kevin Klues We should not put this logic directly into the 'Resources::validate()' function. The primary reason is that the existing 'Resources::validate()' function doesn't consider the semantics of any particular resource when performing its validation (it only makes sure that the fields in the 'Resource' protobuf message are correctly formed). Since a fractional 'gpus' resource is actually well-formed (and only semantically incorrect), we should push this validation logic up into the master. Moreover, the existing logic to construct a 'Resources' object from a 'RepeatedPtrField' silently drops any resources that don't pass 'Resources::validate()'. This means that if we were to push the non-fractional 'gpus' validation into 'Resources::validate()', the 'gpus' resources would just be silently dropped rather than causing a TASK_ERROR in the master. This is obviously *not* the desired behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
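The semantic check the ticket proposes, sketched in Python (illustrative only; `validate_task_resources` is a hypothetical name, and the real check lives in the master's C++ task validation, not in `Resources::validate()`):

```python
def validate_task_resources(resources):
    """Master-side semantic check: a fractional 'gpus' resource is
    well-formed, so generic Resources::validate()-style checks pass
    it. Enforcing whole-number gpus here means the task gets a
    TASK_ERROR with a message, instead of the resource being
    silently dropped during Resources construction."""
    for resource in resources:
        if resource["name"] == "gpus":
            value = resource["scalar"]["value"]
            if value != int(value):
                return "Task requested fractional gpus: %s" % value
    return None  # no validation error


# A half-GPU request is rejected with an explanation.
error = validate_task_resources([{"name": "gpus", "scalar": {"value": 0.5}}])
assert error is not None
```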
[jira] [Updated] (MESOS-5174) Update the balloon-framework to run on test clusters
[ https://issues.apache.org/jira/browse/MESOS-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-5174: - Sprint: Mesosphere Sprint 33 > Update the balloon-framework to run on test clusters > > > Key: MESOS-5174 > URL: https://issues.apache.org/jira/browse/MESOS-5174 > Project: Mesos > Issue Type: Improvement > Components: framework, technical debt >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere, tech-debt > > There are a couple of problems with the balloon framework that prevent it > from being deployed (easily) on an actual cluster: > * The framework accepts 100% of memory in an offer. This means the expected > behavior (finish or OOM) is dependent on the offer size. > * The framework assumes the {{balloon-executor}} binary is available on each > agent. This is generally only true in the build environment or in > single-agent test environments. > * The framework does not specify CPUs with the executor. This is required by > many isolators. > * The executor's {{TASK_FINISHED}} logic path was untested and is flaky. > * The framework has no metrics. > * The framework only launches a single task and then exits. With this > behavior, we can't have useful metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4541) Default work_dir slave to /var/lib/mesos instead of /tmp
[ https://issues.apache.org/jira/browse/MESOS-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235989#comment-15235989 ] Greg Mann commented on MESOS-4541: -- In MESOS-5064, we've opted for eliminating the default {{work_dir}} for the agent; this will require users to specify the work directory explicitly. Closing this ticket as "Won't Fix". > Default work_dir slave to /var/lib/mesos instead of /tmp > > > Key: MESOS-4541 > URL: https://issues.apache.org/jira/browse/MESOS-4541 > Project: Mesos > Issue Type: Improvement >Reporter: Nick van 't Hart > > Centos cleanup Daily systemd service > /usr/lib/systemd/system/systemd-tmpfiles-clean.service > # This file is part of systemd. > # > # systemd is free software; you can redistribute it and/or modify it > # under the terms of the GNU Lesser General Public License as published by > # the Free Software Foundation; either version 2.1 of the License, or > # (at your option) any later version. > [Unit] > Description=Cleanup of Temporary Directories > Documentation=man:tmpfiles.d(5) man:systemd-tmpfiles(8) > DefaultDependencies=no > Wants=local-fs.target > After=systemd-readahead-collect.service systemd-readahead-replay.service > local-fs.target > Before=sysinit.target shutdown.target > ConditionDirectoryNotEmpty=|/usr/lib/tmpfiles.d > ConditionDirectoryNotEmpty=|/usr/local/lib/tmpfiles.d > ConditionDirectoryNotEmpty=|/etc/tmpfiles.d > ConditionDirectoryNotEmpty=|/run/tmpfiles.d > [Service] > Type=oneshot > ExecStart=/usr/bin/systemd-tmpfiles --clean > IOSchedulingClass=idle > http://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html > systemd-tmpfiles creates, deletes, and cleans up volatile and temporary files > and directories, based on the configuration file format and location > specified in tmpfiles.d(5). 
> /usr/lib/tmpfiles.d/tmp.conf > delete all files older than 10 days in /tmp/* > change default work_dir for mesos from /tmp to /var/lib/mesos/ > Problems: > - mesos slave crashes when deploying from marathon (state of running tasks lost) > - mesos slave restart recovery will not work, because > /tmp/mesos/meta/slaves/latest could not be found > For now, maybe add some extra documentation for the work_dir option when using it in > production. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5064) Remove default value for the agent `work_dir`
[ https://issues.apache.org/jira/browse/MESOS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-5064: - Story Points: 2 (was: 1) > Remove default value for the agent `work_dir` > - > > Key: MESOS-5064 > URL: https://issues.apache.org/jira/browse/MESOS-5064 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Greg Mann > > Following a crash report from the user we need to be more explicit about the > dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove > the default value for the {{\-\-work_dir}} flag, forcing users to explicitly > set the work directory for the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235933#comment-15235933 ] Benjamin Mahler commented on MESOS-4705: Which patch? This one? https://reviews.apache.org/r/44379/ It still does not contain the information related to perf stat formats that [~haosd...@gmail.com] provided earlier in this thread. Can you add that? With respect to https://reviews.apache.org/r/44255/, happy to discuss further, but let's do that outside of this ticket since it is not related. > Slave failed to sample container with perf event > > > Key: MESOS-4705 > URL: https://issues.apache.org/jira/browse/MESOS-4705 > Project: Mesos > Issue Type: Bug > Components: cgroups, isolation >Affects Versions: 0.27.1 >Reporter: Fan Du >Assignee: Fan Du > > When sampling container with perf event on Centos7 with kernel > 3.10.0-123.el7.x86_64, slave complained with below error spew: > {code} > E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: > Failed to parse perf sample: Failed to parse perf sample line > '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': > Unexpected number of fields > {code} > it's caused by the current perf format [assumption | > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] > with kernel version below 3.12 > On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below: > value,unit,event,cgroup,running,ratio > A local modification fixed this error on my test bed, please review this > ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
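The parsing fix discussed above can be sketched as follows. This is a hedged Python illustration, not Mesos' actual `perf.cpp`: it assumes a 3-field format (value,event,cgroup) for the layout Mesos expects, and accepts the 6-field format (value,unit,event,cgroup,running,ratio) that the ticket reports for kernel 3.10.0-123.el7.

```python
def parse_perf_sample_line(line):
    """Accept both the assumed 3-field perf sample layout
    (value,event,cgroup) and the 6-field layout
    (value,unit,event,cgroup,running,ratio) seen on
    3.10.0-123.el7, instead of failing on an unexpected
    field count."""
    fields = line.split(",")
    if len(fields) == 3:
        value, event, cgroup = fields
    elif len(fields) == 6:
        value, _unit, event, cgroup, _running, _ratio = fields
    else:
        raise ValueError(
            "Unexpected number of fields in perf sample line: %r" % line)
    return {"value": value, "event": event, "cgroup": cgroup}


# The exact line from the ticket's error message parses cleanly.
sample = parse_perf_sample_line(
    "25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,"
    "10059827422,100.00")
assert sample["event"] == "cycles"
```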
[jira] [Commented] (MESOS-5176) LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume is flaky
[ https://issues.apache.org/jira/browse/MESOS-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235893#comment-15235893 ] Greg Mann commented on MESOS-5176: -- [~kaysoky] > LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume is flaky > - > > Key: MESOS-5176 > URL: https://issues.apache.org/jira/browse/MESOS-5176 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: CentOS 7, with libevent and SSL enabled >Reporter: Greg Mann > Labels: mesosphere > > Observed on the internal Mesosphere CI: > {code} > [07:10:58] : [Step 11/11] [ RUN ] > LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume > [07:10:58]W: [Step 11/11] I0410 07:10:58.289384 32129 cluster.cpp:149] > Creating default 'local' authorizer > [07:10:58]W: [Step 11/11] I0410 07:10:58.317526 32129 leveldb.cpp:174] > Opened db in 27.91929ms > [07:10:58]W: [Step 11/11] I0410 07:10:58.318943 32129 leveldb.cpp:181] > Compacted db in 1.383973ms > [07:10:58]W: [Step 11/11] I0410 07:10:58.318989 32129 leveldb.cpp:196] > Created db iterator in 18603ns > [07:10:58]W: [Step 11/11] I0410 07:10:58.319000 32129 leveldb.cpp:202] > Seeked to beginning of db in 1529ns > [07:10:58]W: [Step 11/11] I0410 07:10:58.319008 32129 leveldb.cpp:271] > Iterated through 0 keys in the db in 358ns > [07:10:58]W: [Step 11/11] I0410 07:10:58.319046 32129 replica.cpp:779] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [07:10:58]W: [Step 11/11] I0410 07:10:58.319627 32143 recover.cpp:447] > Starting replica recovery > [07:10:58]W: [Step 11/11] I0410 07:10:58.319852 32143 recover.cpp:473] > Replica is in EMPTY status > [07:10:58]W: [Step 11/11] I0410 07:10:58.320796 32145 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > (17047)@172.30.2.121:48158 > [07:10:58]W: [Step 11/11] I0410 07:10:58.321202 32146 recover.cpp:193] > Received a recover response from a replica in EMPTY status > [07:10:58]W: 
[Step 11/11] I0410 07:10:58.321650 32150 recover.cpp:564] > Updating replica status to STARTING > [07:10:58]W: [Step 11/11] I0410 07:10:58.323005 32149 master.cpp:382] > Master 57a2cf4e-da76-4801-a887-c0c84ad59d0d (ip-172-30-2-121.mesosphere.io) > started on 172.30.2.121:48158 > [07:10:58]W: [Step 11/11] I0410 07:10:58.323022 32149 master.cpp:384] Flags > at startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/fWC4sn/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/fWC4sn/master" > --zk_session_timeout="10secs" > [07:10:58]W: [Step 11/11] I0410 07:10:58.323227 32149 master.cpp:433] > Master only allowing authenticated frameworks to register > [07:10:58]W: [Step 11/11] I0410 07:10:58.323237 32149 master.cpp:438] > Master only allowing authenticated agents to register > [07:10:58]W: [Step 11/11] I0410 07:10:58.323243 32149 credentials.hpp:37] > Loading credentials for authentication from '/tmp/fWC4sn/credentials' > [07:10:58]W: [Step 11/11] I0410 07:10:58.323498 32149 master.cpp:480] Using > default 'crammd5' authenticator > [07:10:58]W: [Step 11/11] I0410 07:10:58.323616 32149 master.cpp:551] Using > default 'basic' HTTP authenticator > 
[07:10:58]W: [Step 11/11] I0410 07:10:58.323739 32149 master.cpp:589] > Authorization enabled > [07:10:58]W: [Step 11/11] I0410 07:10:58.323884 32150 > whitelist_watcher.cpp:77] No whitelist given > [07:10:58]W: [Step 11/11] I0410 07:10:58.323920 32143 hierarchical.cpp:142] > Initialized hierarchical allocator process > [07:10:58]W: [Step 11/11] I0410 07:10:58.324103 32148 leveldb.cpp:304] > Persisting metadata (8 bytes) to leveldb took 2.27166ms > [07:10:58]W: [Step 11/11] I0410 07:10:58.324126 32148 replica.cpp:320] > Persisted replica status to STARTING > [07:10:58]W: [Step 11/11] I0410 07:10:58.324322 32146 recover.cpp:473] > Replica is in STARTING status > [07:10:58]W: [Step 11/11] I0410
[jira] [Commented] (MESOS-5177) LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutorWithVolumes is flaky
[ https://issues.apache.org/jira/browse/MESOS-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235896#comment-15235896 ] Greg Mann commented on MESOS-5177: -- [~tnachen] > LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutorWithVolumes > is flaky > > > Key: MESOS-5177 > URL: https://issues.apache.org/jira/browse/MESOS-5177 > Project: Mesos > Issue Type: Bug > Components: isolation, tests >Affects Versions: 0.28.0 > Environment: CentOS 7, with libevent and SSL enabled >Reporter: Greg Mann > Labels: mesosphere > > Observed on the internal Mesosphere CI: > {code} > [19:35:11] : [Step 11/11] [ RUN ] > LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutorWithVolumes > [19:35:11]W: [Step 11/11] I0411 19:35:11.907374 31187 cluster.cpp:149] > Creating default 'local' authorizer > [19:35:11]W: [Step 11/11] I0411 19:35:11.912621 31187 leveldb.cpp:174] > Opened db in 5.045872ms > [19:35:11]W: [Step 11/11] I0411 19:35:11.914330 31187 leveldb.cpp:181] > Compacted db in 1.6835ms > [19:35:11]W: [Step 11/11] I0411 19:35:11.914373 31187 leveldb.cpp:196] > Created db iterator in 17681ns > [19:35:11]W: [Step 11/11] I0411 19:35:11.914386 31187 leveldb.cpp:202] > Seeked to beginning of db in 1769ns > [19:35:11]W: [Step 11/11] I0411 19:35:11.914393 31187 leveldb.cpp:271] > Iterated through 0 keys in the db in 306ns > [19:35:11]W: [Step 11/11] I0411 19:35:11.914429 31187 replica.cpp:779] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [19:35:11]W: [Step 11/11] I0411 19:35:11.914922 31206 recover.cpp:447] > Starting replica recovery > [19:35:11]W: [Step 11/11] I0411 19:35:11.915133 31206 recover.cpp:473] > Replica is in EMPTY status > [19:35:11]W: [Step 11/11] I0411 19:35:11.916041 31203 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > (16968)@172.30.2.184:40532 > [19:35:11]W: [Step 11/11] I0411 19:35:11.916425 31202 recover.cpp:193] > Received a 
recover response from a replica in EMPTY status > [19:35:11]W: [Step 11/11] I0411 19:35:11.916898 31201 recover.cpp:564] > Updating replica status to STARTING > [19:35:11]W: [Step 11/11] I0411 19:35:11.917946 31207 master.cpp:382] > Master abd3c4ca-5e96-4cbe-8814-a9c5ebd1767b (ip-172-30-2-184.mesosphere.io) > started on 172.30.2.184:40532 > [19:35:11]W: [Step 11/11] I0411 19:35:11.917966 31207 master.cpp:384] Flags > at startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/0PzkwC/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/0PzkwC/master" > --zk_session_timeout="10secs" > [19:35:11]W: [Step 11/11] I0411 19:35:11.918198 31207 master.cpp:433] > Master only allowing authenticated frameworks to register > [19:35:11]W: [Step 11/11] I0411 19:35:11.918207 31207 master.cpp:438] > Master only allowing authenticated agents to register > [19:35:11]W: [Step 11/11] I0411 19:35:11.918213 31207 credentials.hpp:37] > Loading credentials for authentication from '/tmp/0PzkwC/credentials' > [19:35:11]W: [Step 11/11] I0411 19:35:11.918454 31207 master.cpp:480] Using > default 'crammd5' authenticator > [19:35:11]W: [Step 11/11] I0411 19:35:11.918587 31207 
master.cpp:551] Using > default 'basic' HTTP authenticator > [19:35:11]W: [Step 11/11] I0411 19:35:11.918615 31205 leveldb.cpp:304] > Persisting metadata (8 bytes) to leveldb took 1.524112ms > [19:35:11]W: [Step 11/11] I0411 19:35:11.918644 31205 replica.cpp:320] > Persisted replica status to STARTING > [19:35:11]W: [Step 11/11] I0411 19:35:11.918750 31207 master.cpp:589] > Authorization enabled > [19:35:11]W: [Step 11/11] I0411 19:35:11.918856 31204 recover.cpp:473] > Replica is in STARTING status > [19:35:11]W: [Step 11/11] I0411 19:35:11.918908 31201 hierarchical.cpp:142] > Initialized hierarchical allocator process > [19:35:11]W: [Step 11/11] I0411 19:35:11.918912 3
[jira] [Created] (MESOS-5177) LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutorWithVolumes is flaky
Greg Mann created MESOS-5177: Summary: LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutorWithVolumes is flaky Key: MESOS-5177 URL: https://issues.apache.org/jira/browse/MESOS-5177 Project: Mesos Issue Type: Bug Components: isolation, tests Affects Versions: 0.28.0 Environment: CentOS 7, with libevent and SSL enabled Reporter: Greg Mann Observed on the internal Mesosphere CI: {code} [19:35:11] : [Step 11/11] [ RUN ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutorWithVolumes [19:35:11]W: [Step 11/11] I0411 19:35:11.907374 31187 cluster.cpp:149] Creating default 'local' authorizer [19:35:11]W: [Step 11/11] I0411 19:35:11.912621 31187 leveldb.cpp:174] Opened db in 5.045872ms [19:35:11]W: [Step 11/11] I0411 19:35:11.914330 31187 leveldb.cpp:181] Compacted db in 1.6835ms [19:35:11]W: [Step 11/11] I0411 19:35:11.914373 31187 leveldb.cpp:196] Created db iterator in 17681ns [19:35:11]W: [Step 11/11] I0411 19:35:11.914386 31187 leveldb.cpp:202] Seeked to beginning of db in 1769ns [19:35:11]W: [Step 11/11] I0411 19:35:11.914393 31187 leveldb.cpp:271] Iterated through 0 keys in the db in 306ns [19:35:11]W: [Step 11/11] I0411 19:35:11.914429 31187 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [19:35:11]W: [Step 11/11] I0411 19:35:11.914922 31206 recover.cpp:447] Starting replica recovery [19:35:11]W: [Step 11/11] I0411 19:35:11.915133 31206 recover.cpp:473] Replica is in EMPTY status [19:35:11]W: [Step 11/11] I0411 19:35:11.916041 31203 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (16968)@172.30.2.184:40532 [19:35:11]W: [Step 11/11] I0411 19:35:11.916425 31202 recover.cpp:193] Received a recover response from a replica in EMPTY status [19:35:11]W: [Step 11/11] I0411 19:35:11.916898 31201 recover.cpp:564] Updating replica status to STARTING [19:35:11]W: [Step 11/11] I0411 19:35:11.917946 31207 master.cpp:382] Master abd3c4ca-5e96-4cbe-8814-a9c5ebd1767b 
(ip-172-30-2-184.mesosphere.io) started on 172.30.2.184:40532 [19:35:11]W: [Step 11/11] I0411 19:35:11.917966 31207 master.cpp:384] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/0PzkwC/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/0PzkwC/master" --zk_session_timeout="10secs" [19:35:11]W: [Step 11/11] I0411 19:35:11.918198 31207 master.cpp:433] Master only allowing authenticated frameworks to register [19:35:11]W: [Step 11/11] I0411 19:35:11.918207 31207 master.cpp:438] Master only allowing authenticated agents to register [19:35:11]W: [Step 11/11] I0411 19:35:11.918213 31207 credentials.hpp:37] Loading credentials for authentication from '/tmp/0PzkwC/credentials' [19:35:11]W: [Step 11/11] I0411 19:35:11.918454 31207 master.cpp:480] Using default 'crammd5' authenticator [19:35:11]W: [Step 11/11] I0411 19:35:11.918587 31207 master.cpp:551] Using default 'basic' HTTP authenticator [19:35:11]W: [Step 11/11] I0411 19:35:11.918615 31205 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.524112ms [19:35:11]W: [Step 11/11] I0411 19:35:11.918644 31205 replica.cpp:320] Persisted replica status to STARTING [19:35:11]W: [Step 11/11] I0411 19:35:11.918750 
31207 master.cpp:589] Authorization enabled [19:35:11]W: [Step 11/11] I0411 19:35:11.918856 31204 recover.cpp:473] Replica is in STARTING status [19:35:11]W: [Step 11/11] I0411 19:35:11.918908 31201 hierarchical.cpp:142] Initialized hierarchical allocator process [19:35:11]W: [Step 11/11] I0411 19:35:11.918912 31208 whitelist_watcher.cpp:77] No whitelist given [19:35:11]W: [Step 11/11] I0411 19:35:11.919694 31202 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (16970)@172.30.2.184:40532 [19:35:11]W: [Step 11/11] I0411 19:35:11.920127 31205 recover.cpp:193] Received a recover response from a replica in STARTING status [19:35:11]W: [Step 11/11] I0411
[jira] [Created] (MESOS-5176) LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume is flaky
Greg Mann created MESOS-5176: Summary: LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume is flaky Key: MESOS-5176 URL: https://issues.apache.org/jira/browse/MESOS-5176 Project: Mesos Issue Type: Bug Components: tests Environment: CentOS 7, with libevent and SSL enabled Reporter: Greg Mann Observed on the internal Mesosphere CI: {code} [07:10:58] : [Step 11/11] [ RUN ] LinuxFilesystemIsolatorTest.ROOT_RecoverOrphanedPersistentVolume [07:10:58]W: [Step 11/11] I0410 07:10:58.289384 32129 cluster.cpp:149] Creating default 'local' authorizer [07:10:58]W: [Step 11/11] I0410 07:10:58.317526 32129 leveldb.cpp:174] Opened db in 27.91929ms [07:10:58]W: [Step 11/11] I0410 07:10:58.318943 32129 leveldb.cpp:181] Compacted db in 1.383973ms [07:10:58]W: [Step 11/11] I0410 07:10:58.318989 32129 leveldb.cpp:196] Created db iterator in 18603ns [07:10:58]W: [Step 11/11] I0410 07:10:58.319000 32129 leveldb.cpp:202] Seeked to beginning of db in 1529ns [07:10:58]W: [Step 11/11] I0410 07:10:58.319008 32129 leveldb.cpp:271] Iterated through 0 keys in the db in 358ns [07:10:58]W: [Step 11/11] I0410 07:10:58.319046 32129 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [07:10:58]W: [Step 11/11] I0410 07:10:58.319627 32143 recover.cpp:447] Starting replica recovery [07:10:58]W: [Step 11/11] I0410 07:10:58.319852 32143 recover.cpp:473] Replica is in EMPTY status [07:10:58]W: [Step 11/11] I0410 07:10:58.320796 32145 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (17047)@172.30.2.121:48158 [07:10:58]W: [Step 11/11] I0410 07:10:58.321202 32146 recover.cpp:193] Received a recover response from a replica in EMPTY status [07:10:58]W: [Step 11/11] I0410 07:10:58.321650 32150 recover.cpp:564] Updating replica status to STARTING [07:10:58]W: [Step 11/11] I0410 07:10:58.323005 32149 master.cpp:382] Master 57a2cf4e-da76-4801-a887-c0c84ad59d0d (ip-172-30-2-121.mesosphere.io) started on 172.30.2.121:48158 
[07:10:58]W: [Step 11/11] I0410 07:10:58.323022 32149 master.cpp:384] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/fWC4sn/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/fWC4sn/master" --zk_session_timeout="10secs" [07:10:58]W: [Step 11/11] I0410 07:10:58.323227 32149 master.cpp:433] Master only allowing authenticated frameworks to register [07:10:58]W: [Step 11/11] I0410 07:10:58.323237 32149 master.cpp:438] Master only allowing authenticated agents to register [07:10:58]W: [Step 11/11] I0410 07:10:58.323243 32149 credentials.hpp:37] Loading credentials for authentication from '/tmp/fWC4sn/credentials' [07:10:58]W: [Step 11/11] I0410 07:10:58.323498 32149 master.cpp:480] Using default 'crammd5' authenticator [07:10:58]W: [Step 11/11] I0410 07:10:58.323616 32149 master.cpp:551] Using default 'basic' HTTP authenticator [07:10:58]W: [Step 11/11] I0410 07:10:58.323739 32149 master.cpp:589] Authorization enabled [07:10:58]W: [Step 11/11] I0410 07:10:58.323884 32150 whitelist_watcher.cpp:77] No whitelist given [07:10:58]W: [Step 11/11] I0410 07:10:58.323920 32143 hierarchical.cpp:142] Initialized hierarchical allocator process [07:10:58]W: [Step 11/11] I0410 
07:10:58.324103 32148 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 2.27166ms [07:10:58]W: [Step 11/11] I0410 07:10:58.324126 32148 replica.cpp:320] Persisted replica status to STARTING [07:10:58]W: [Step 11/11] I0410 07:10:58.324322 32146 recover.cpp:473] Replica is in STARTING status [07:10:58]W: [Step 11/11] I0410 07:10:58.325204 32143 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (17049)@172.30.2.121:48158 [07:10:58]W: [Step 11/11] I0410 07:10:58.325527 32145 recover.cpp:193] Received a recover response from a replica in STARTING status [07:10:58]W: [Step 11/11] I0410 07:10:58.325860 32150 master.cpp:1832] The newly elected leader is m
[jira] [Commented] (MESOS-5175) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is flaky
[ https://issues.apache.org/jira/browse/MESOS-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235886#comment-15235886 ] Greg Mann commented on MESOS-5175: -- [~jieyu] > LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is flaky > - > > Key: MESOS-5175 > URL: https://issues.apache.org/jira/browse/MESOS-5175 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: CentOS 7 with SSL and libevent enabled >Reporter: Greg Mann > Labels: mesosphere > > Observed on the internal Mesosphere CI: > {code} > [07:12:07] : [Step 11/11] [ RUN ] > LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint > [07:12:08]W: [Step 11/11] I0410 07:12:08.906998 32129 linux.cpp:81] Making > '/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH' > a shared mount > [07:12:08]W: [Step 11/11] I0410 07:12:08.923028 32129 > linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [07:12:08]W: [Step 11/11] I0410 07:12:08.923751 32144 > containerizer.cpp:682] Starting container > '86d04a91-e7b0-4b8f-9706-b9969796b5d1' for executor 'test_executor' of > framework '' > [07:12:08]W: [Step 11/11] I0410 07:12:08.924296 32148 provisioner.cpp:285] > Provisioning image rootfs > '/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab' > for container 86d04a91-e7b0-4b8f-9706-b9969796b5d1 > [07:12:08]W: [Step 11/11] I0410 07:12:08.924885 32145 copy.cpp:127] Copying > layer path '/tmp/WwQa3Q/test_image' to rootfs > '/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab' > [07:12:13]W: [Step 11/11] I0410 
07:12:13.627612 32145 linux.cpp:355] Bind > mounting work directory from '/tmp/WwQa3Q/sandbox' to > '/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab/mnt/mesos/sandbox' > for container 86d04a91-e7b0-4b8f-9706-b9969796b5d1 > [07:12:13]W: [Step 11/11] I0410 07:12:13.648669 32147 > linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS > [07:12:13]W: [Step 11/11] + > /mnt/teamcity/work/4240ba9ddd0997c3/build/src/mesos-containerizer mount > --help=false --operation=make-rslave --path=/ > [07:12:13]W: [Step 11/11] + grep -E > /mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/.+ > /proc/self/mountinfo > [07:12:13]W: [Step 11/11] + grep -v 86d04a91-e7b0-4b8f-9706-b9969796b5d1 > [07:12:13]W: [Step 11/11] + cut '-d ' -f5 > [07:12:13]W: [Step 11/11] + xargs --no-run-if-empty umount -l > [07:12:13]W: [Step 11/11] + mount -n --rbind /tmp/WwQa3Q > /mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab/mnt/mesos/sandbox/mountpoint > [07:12:13] : [Step 11/11] Changing root to > /mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab > [07:12:13]W: [Step 11/11] I0410 07:12:13.827551 32145 > containerizer.cpp:1674] Executor for container > '86d04a91-e7b0-4b8f-9706-b9969796b5d1' has exited > [07:12:13]W: [Step 11/11] I0410 07:12:13.827607 32145 > containerizer.cpp:1439] Destroying container > '86d04a91-e7b0-4b8f-9706-b9969796b5d1' > [07:12:13]W: [Step 11/11] I0410 07:12:13.830469 32145 cgroups.cpp:2676] > Freezing cgroup > 
/sys/fs/cgroup/freezer/mesos/86d04a91-e7b0-4b8f-9706-b9969796b5d1 > [07:12:13]W: [Step 11/11] I0410 07:12:13.832928 32143 cgroups.cpp:1409] > Successfully froze cgroup > /sys/fs/cgroup/freezer/mesos/86d04a91-e7b0-4b8f-9706-b9969796b5d1 after > 2.412032ms > [07:12:13]W: [Step 11/11] I0410 07:12:13.835292 32150 cgroups.cpp:2694] > Thawing cgroup > /sys/fs/cgroup/freezer/mesos/86d04a91-e7b0-4b8f-9706-b9969796b5d1 > [07:12:13]W: [Step 11/11] I0410 07:12:13.837411 32150 cgroups.cpp:1438] > Successfullly thawed cgroup > /sys/fs/cgroup/freezer/mesos/86d04a91-e7b0-4b8f-9706-b9969796b5d1 after > 2.07616ms > [07:12:13]W: [Step 11/11] I0410 07:12:13.840045 32148 linux.cpp:817] > Unmounting sandbox/work directory > '/mn
[jira] [Created] (MESOS-5175) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is flaky
Greg Mann created MESOS-5175: Summary: LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is flaky Key: MESOS-5175 URL: https://issues.apache.org/jira/browse/MESOS-5175 Project: Mesos Issue Type: Bug Components: tests Environment: CentOS 7 with SSL and libevent enabled Reporter: Greg Mann Observed on the internal Mesosphere CI: {code} [07:12:07] : [Step 11/11] [ RUN ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint [07:12:08]W: [Step 11/11] I0410 07:12:08.906998 32129 linux.cpp:81] Making '/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH' a shared mount [07:12:08]W: [Step 11/11] I0410 07:12:08.923028 32129 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [07:12:08]W: [Step 11/11] I0410 07:12:08.923751 32144 containerizer.cpp:682] Starting container '86d04a91-e7b0-4b8f-9706-b9969796b5d1' for executor 'test_executor' of framework '' [07:12:08]W: [Step 11/11] I0410 07:12:08.924296 32148 provisioner.cpp:285] Provisioning image rootfs '/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab' for container 86d04a91-e7b0-4b8f-9706-b9969796b5d1 [07:12:08]W: [Step 11/11] I0410 07:12:08.924885 32145 copy.cpp:127] Copying layer path '/tmp/WwQa3Q/test_image' to rootfs '/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab' [07:12:13]W: [Step 11/11] I0410 07:12:13.627612 32145 linux.cpp:355] Bind mounting work directory from '/tmp/WwQa3Q/sandbox' to 
'/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab/mnt/mesos/sandbox' for container 86d04a91-e7b0-4b8f-9706-b9969796b5d1 [07:12:13]W: [Step 11/11] I0410 07:12:13.648669 32147 linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS [07:12:13]W: [Step 11/11] + /mnt/teamcity/work/4240ba9ddd0997c3/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/ [07:12:13]W: [Step 11/11] + grep -E /mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/.+ /proc/self/mountinfo [07:12:13]W: [Step 11/11] + grep -v 86d04a91-e7b0-4b8f-9706-b9969796b5d1 [07:12:13]W: [Step 11/11] + cut '-d ' -f5 [07:12:13]W: [Step 11/11] + xargs --no-run-if-empty umount -l [07:12:13]W: [Step 11/11] + mount -n --rbind /tmp/WwQa3Q /mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab/mnt/mesos/sandbox/mountpoint [07:12:13] : [Step 11/11] Changing root to /mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab [07:12:13]W: [Step 11/11] I0410 07:12:13.827551 32145 containerizer.cpp:1674] Executor for container '86d04a91-e7b0-4b8f-9706-b9969796b5d1' has exited [07:12:13]W: [Step 11/11] I0410 07:12:13.827607 32145 containerizer.cpp:1439] Destroying container '86d04a91-e7b0-4b8f-9706-b9969796b5d1' [07:12:13]W: [Step 11/11] I0410 07:12:13.830469 32145 cgroups.cpp:2676] Freezing cgroup /sys/fs/cgroup/freezer/mesos/86d04a91-e7b0-4b8f-9706-b9969796b5d1 [07:12:13]W: [Step 11/11] I0410 07:12:13.832928 32143 cgroups.cpp:1409] Successfully 
froze cgroup /sys/fs/cgroup/freezer/mesos/86d04a91-e7b0-4b8f-9706-b9969796b5d1 after 2.412032ms [07:12:13]W: [Step 11/11] I0410 07:12:13.835292 32150 cgroups.cpp:2694] Thawing cgroup /sys/fs/cgroup/freezer/mesos/86d04a91-e7b0-4b8f-9706-b9969796b5d1 [07:12:13]W: [Step 11/11] I0410 07:12:13.837411 32150 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/86d04a91-e7b0-4b8f-9706-b9969796b5d1 after 2.07616ms [07:12:13]W: [Step 11/11] I0410 07:12:13.840045 32148 linux.cpp:817] Unmounting sandbox/work directory '/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromHostSandboxMountPoint_aSovaH/provisioner/containers/86d04a91-e7b0-4b8f-9706-b9969796b5d1/backends/copy/rootfses/104f1991-f54a-4dd0-ab92-48ff2d3bebab/mnt/mesos/sandbox' for container 86d04a91-e7b0-4b8f-9706-b9969796b5d1 [07:12:13]W: [Step 11/11] I0410 07:12:13.840504 32150 provisioner.cpp:330] Destr
[jira] [Updated] (MESOS-5139) ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky
[ https://issues.apache.org/jira/browse/MESOS-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-5139: Sprint: Mesosphere Sprint 33 Story Points: 2 > ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky > -- > > Key: MESOS-5139 > URL: https://issues.apache.org/jira/browse/MESOS-5139 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.0 > Environment: Ubuntu14.04 >Reporter: Vinod Kone >Assignee: Gilbert Song > Labels: mesosphere > > Found this on ASF CI while testing 0.28.1-rc2 > {code} > [ RUN ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar > E0406 18:29:30.870481 520 shell.hpp:93] Command 'hadoop version 2>&1' > failed; this is the output: > sh: 1: hadoop: not found > E0406 18:29:30.870576 520 fetcher.cpp:59] Failed to create URI fetcher > plugin 'hadoop': Failed to create HDFS client: Failed to execute 'hadoop > version 2>&1'; the command was either not found or exited with a non-zero > exit status: 127 > I0406 18:29:30.871052 520 local_puller.cpp:90] Creating local puller with > docker registry '/tmp/3l8ZBv/images' > I0406 18:29:30.873325 539 metadata_manager.cpp:159] Looking for image 'abc' > I0406 18:29:30.874438 539 local_puller.cpp:142] Untarring image 'abc' from > '/tmp/3l8ZBv/images/abc.tar' to '/tmp/3l8ZBv/store/staging/5tw8bD' > I0406 18:29:30.901916 547 local_puller.cpp:162] The repositories JSON file > for image 'abc' is '{"abc":{"latest":"456"}}' > I0406 18:29:30.902304 547 local_puller.cpp:290] Extracting layer tar ball > '/tmp/3l8ZBv/store/staging/5tw8bD/123/layer.tar to rootfs > '/tmp/3l8ZBv/store/staging/5tw8bD/123/rootfs' > I0406 18:29:30.909144 547 local_puller.cpp:290] Extracting layer tar ball > '/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar to rootfs > '/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' > ../../src/tests/containerizer/provisioner_docker_tests.cpp:183: Failure > (imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, > 
/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar, -C, > /tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' failed: tar: This does not look > like a tar archive > tar: Exiting with failure status due to previous errors > [ FAILED ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar (243 ms) > {code} --
[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags
[ https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235826#comment-15235826 ] Vinod Kone commented on MESOS-3781: --- This is the workflow we use: Open --> Accept --> In Progress --> Reviewable. There should be buttons up top for making these transitions; sometimes they are at the top level and sometimes they are underneath the "Workflow" button. Once a ticket is "In Progress", you can click the "Post Review" button to post the RB link as a comment and transition the ticket to "Reviewable". > Replace Master/Slave Terminology Phase I - Add duplicate agent flags > - > > Key: MESOS-3781 > URL: https://issues.apache.org/jira/browse/MESOS-3781 > Project: Mesos > Issue Type: Task >Reporter: Diana Arroyo >Assignee: Jay Guo > --
[jira] [Updated] (MESOS-3214) Replace boost foreach with range-based for
[ https://issues.apache.org/jira/browse/MESOS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-3214: Sprint: Mesosphere Sprint 33 > Replace boost foreach with range-based for > -- > > Key: MESOS-3214 > URL: https://issues.apache.org/jira/browse/MESOS-3214 > Project: Mesos > Issue Type: Task > Components: stout >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > It's desirable to replace the boost {{foreach}} macro with the C++11 > range-based {{for}}. This will help avoid some of the pitfalls of boost > {{foreach}} such as dealing with types with commas in them, as well as > improving compiler diagnostics by avoiding the macro expansion. > One way to accomplish this is to replace the existing {{foreach (const Elem& > elem, container)}} pattern with {{for (const Elem& elem : container)}}. We > could support {{foreachkey}} and {{foreachvalue}} semantics via adaptors > {{keys}} and {{values}} which would be used like this: {{for (const Key& key > : keys(container))}}, {{for (const Value& value : values(container))}}. This > leaves {{foreachpair}} which cannot be used with {{for}}. I think it would be > desirable to support {{foreachpair}} for cases where the implicit unpacking > is useful. > Another approach is to keep {{foreach}}, {{foreachpair}}, {{foreachkey}} and > {{foreachvalue}}, but simply implement them based on range-based {{for}}. For > example, {{#define foreach(elem, container) for (elem : container)}}. While > the consistency in the names is desirable, the unnecessary indirection of the > macro definition is not. > It's unclear to me which approach we would favor in Mesos, so please share > your thoughts and preferences. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3214) Replace boost foreach with range-based for
[ https://issues.apache.org/jira/browse/MESOS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park reassigned MESOS-3214: --- Assignee: Michael Park > Replace boost foreach with range-based for > -- > > Key: MESOS-3214 > URL: https://issues.apache.org/jira/browse/MESOS-3214 > Project: Mesos > Issue Type: Task > Components: stout >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > It's desirable to replace the boost {{foreach}} macro with the C++11 > range-based {{for}}. This will help avoid some of the pitfalls of boost > {{foreach}} such as dealing with types with commas in them, as well as > improving compiler diagnostics by avoiding the macro expansion. > One way to accomplish this is to replace the existing {{foreach (const Elem& > elem, container)}} pattern with {{for (const Elem& elem : container)}}. We > could support {{foreachkey}} and {{foreachvalue}} semantics via adaptors > {{keys}} and {{values}} which would be used like this: {{for (const Key& key > : keys(container))}}, {{for (const Value& value : values(container))}}. This > leaves {{foreachpair}} which cannot be used with {{for}}. I think it would be > desirable to support {{foreachpair}} for cases where the implicit unpacking > is useful. > Another approach is to keep {{foreach}}, {{foreachpair}}, {{foreachkey}} and > {{foreachvalue}}, but simply implement them based on range-based {{for}}. For > example, {{#define foreach(elem, container) for (elem : container)}}. While > the consistency in the names is desirable, the unnecessary indirection of the > macro definition is not. > It's unclear to me which approach we would favor in Mesos, so please share > your thoughts and preferences. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4882) Add support for command and arguments to mesos-execute.
[ https://issues.apache.org/jira/browse/MESOS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-4882: --- Sprint: Mesosphere Sprint 33 Affects Version/s: 0.28.0 0.27.2 Story Points: 5 Labels: cli mesosphere (was: ) Description: {{CommandInfo}} protobuf support two kinds of command: {code} // There are two ways to specify the command: // 1) If 'shell == true', the command will be launched via shell //(i.e., /bin/sh -c 'value'). The 'value' specified will be //treated as the shell command. The 'arguments' will be ignored. // 2) If 'shell == false', the command will be launched by passing //arguments to an executable. The 'value' specified will be //treated as the filename of the executable. The 'arguments' //will be treated as the arguments to the executable. This is //similar to how POSIX exec families launch processes (i.e., //execlp(value, arguments(0), arguments(1), ...)). {code} The mesos-execute cannot handle 2) now, enabling 2) can help with testing and running one off tasks. was: The commandInfo support two kind of command: {code} // There are two ways to specify the command: // 1) If 'shell == true', the command will be launched via shell //(i.e., /bin/sh -c 'value'). The 'value' specified will be //treated as the shell command. The 'arguments' will be ignored. // 2) If 'shell == false', the command will be launched by passing //arguments to an executable. The 'value' specified will be //treated as the filename of the executable. The 'arguments' //will be treated as the arguments to the executable. This is //similar to how POSIX exec families launch processes (i.e., //execlp(value, arguments(0), arguments(1), ...)). {code} The mesos-execute cannot handle 2) now, enabling 2) can help some unit test with isolator. Issue Type: Improvement (was: Bug) Summary: Add support for command and arguments to mesos-execute. (was: Enabled mesos-execute treat command as executable value and arguments.) 
> Add support for command and arguments to mesos-execute. > --- > > Key: MESOS-4882 > URL: https://issues.apache.org/jira/browse/MESOS-4882 > Project: Mesos > Issue Type: Improvement >Affects Versions: 0.28.0, 0.27.2 >Reporter: Guangya Liu >Assignee: Guangya Liu > Labels: cli, mesosphere > > The {{CommandInfo}} protobuf supports two kinds of command: > {code} > // There are two ways to specify the command: > // 1) If 'shell == true', the command will be launched via shell > // (i.e., /bin/sh -c 'value'). The 'value' specified will be > // treated as the shell command. The 'arguments' will be ignored. > // 2) If 'shell == false', the command will be launched by passing > // arguments to an executable. The 'value' specified will be > // treated as the filename of the executable. The 'arguments' > // will be treated as the arguments to the executable. This is > // similar to how POSIX exec families launch processes (i.e., > // execlp(value, arguments(0), arguments(1), ...)). > {code} > mesos-execute cannot currently handle 2); enabling it would help with testing > and running one-off tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5155) Consolidate authorization actions for quota.
[ https://issues.apache.org/jira/browse/MESOS-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235759#comment-15235759 ] Alexander Rukletsov commented on MESOS-5155: I'm afraid so, because we also have to update ACLs which we've published in 0.27. > Consolidate authorization actions for quota. > > > Key: MESOS-5155 > URL: https://issues.apache.org/jira/browse/MESOS-5155 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Zhitao Li > Labels: mesosphere > > We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. In > retrospect, it was a mistake to introduce multiple actions. > The actions that are not symmetrical are register/teardown and dynamic > reservations. They are implemented this way because the entities that > perform one action differ from the entities that perform the other. For > example, register framework is issued by a framework, teardown by an > operator. What is a good way to identify a framework? The role it runs in > may differ on each launch and makes no sense in a multi-role framework > setup; better is a sort of group id, which is its principal. Dynamic > reservations and persistent volumes can be issued by both frameworks and > operators, hence similar reasoning applies. > Now, quota is associated with a role and set only by operators. Do we need > to care about the principals that set it? Not that much. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4908) Tasks cannot be killed forcefully.
[ https://issues.apache.org/jira/browse/MESOS-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-4908: --- Sprint: Mesosphere Sprint 33 Story Points: 5 > Tasks cannot be killed forcefully. > -- > > Key: MESOS-4908 > URL: https://issues.apache.org/jira/browse/MESOS-4908 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > Currently there is no way for a scheduler to instruct the executor to kill a > certain task immediately, skipping any possible timeouts and / or kill > policies. This may be desirable in cases like, e.g., the kill policy is 10 > minutes but something went wrong, so the scheduler decides to issue a > forceful kill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5139) ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky
[ https://issues.apache.org/jira/browse/MESOS-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-5139: - Labels: mesosphere (was: ) > ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky > -- > > Key: MESOS-5139 > URL: https://issues.apache.org/jira/browse/MESOS-5139 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.0 > Environment: Ubuntu14.04 >Reporter: Vinod Kone >Assignee: Gilbert Song > Labels: mesosphere > > Found this on ASF CI while testing 0.28.1-rc2 > {code} > [ RUN ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar > E0406 18:29:30.870481 520 shell.hpp:93] Command 'hadoop version 2>&1' > failed; this is the output: > sh: 1: hadoop: not found > E0406 18:29:30.870576 520 fetcher.cpp:59] Failed to create URI fetcher > plugin 'hadoop': Failed to create HDFS client: Failed to execute 'hadoop > version 2>&1'; the command was either not found or exited with a non-zero > exit status: 127 > I0406 18:29:30.871052 520 local_puller.cpp:90] Creating local puller with > docker registry '/tmp/3l8ZBv/images' > I0406 18:29:30.873325 539 metadata_manager.cpp:159] Looking for image 'abc' > I0406 18:29:30.874438 539 local_puller.cpp:142] Untarring image 'abc' from > '/tmp/3l8ZBv/images/abc.tar' to '/tmp/3l8ZBv/store/staging/5tw8bD' > I0406 18:29:30.901916 547 local_puller.cpp:162] The repositories JSON file > for image 'abc' is '{"abc":{"latest":"456"}}' > I0406 18:29:30.902304 547 local_puller.cpp:290] Extracting layer tar ball > '/tmp/3l8ZBv/store/staging/5tw8bD/123/layer.tar to rootfs > '/tmp/3l8ZBv/store/staging/5tw8bD/123/rootfs' > I0406 18:29:30.909144 547 local_puller.cpp:290] Extracting layer tar ball > '/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar to rootfs > '/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' > ../../src/tests/containerizer/provisioner_docker_tests.cpp:183: Failure > (imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, > 
/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar, -C, > /tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' failed: tar: This does not look > like a tar archive > tar: Exiting with failure status due to previous errors > [ FAILED ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar (243 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235678#comment-15235678 ] Adam B commented on MESOS-1739: --- This is why we suggest that changes like this will need to notify (all?) frameworks of the change in attributes, so the framework can make the right choice about what to do with its tasks based on the new information. I'm not sure, however, how we should handle frameworks that don't understand the new "attributes changed" message. > Allow slave reconfiguration on restart > -- > > Key: MESOS-1739 > URL: https://issues.apache.org/jira/browse/MESOS-1739 > Project: Mesos > Issue Type: Epic >Reporter: Patrick Reilly > Labels: external-volumes, mesosphere, myriad > > Make it so that either via a slave restart or an out-of-process "reconfigure" > ping, the attributes and resources of a slave can be updated to be a superset > of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5155) Consolidate authorization actions for quota.
[ https://issues.apache.org/jira/browse/MESOS-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-5155: --- Sprint: Mesosphere Sprint 33 > Consolidate authorization actions for quota. > > > Key: MESOS-5155 > URL: https://issues.apache.org/jira/browse/MESOS-5155 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Zhitao Li > Labels: mesosphere > > We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. In > retrospect, it was a mistake to introduce multiple actions. > The actions that are not symmetrical are register/teardown and dynamic > reservations. They are implemented this way because the entities that > perform one action differ from the entities that perform the other. For > example, register framework is issued by a framework, teardown by an > operator. What is a good way to identify a framework? The role it runs in > may differ on each launch and makes no sense in a multi-role framework > setup; better is a sort of group id, which is its principal. Dynamic > reservations and persistent volumes can be issued by both frameworks and > operators, hence similar reasoning applies. > Now, quota is associated with a role and set only by operators. Do we need > to care about the principals that set it? Not that much. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4941) Support update existing quota.
[ https://issues.apache.org/jira/browse/MESOS-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-4941: --- Shepherd: Alexander Rukletsov (was: Joris Van Remoortere) Sprint: Mesosphere Sprint 33 Story Points: 8 > Support update existing quota. > -- > > Key: MESOS-4941 > URL: https://issues.apache.org/jira/browse/MESOS-4941 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Zhitao Li >Assignee: Zhitao Li > Labels: Quota, mesosphere > > We want to support updating an existing quota without the cycle of delete and > recreate. This avoids the possible starvation risk of losing the quota > between delete and recreate, and also makes the interface friendly. > Design doc: > https://docs.google.com/document/d/1c8fJY9_N0W04FtUQ_b_kZM6S0eePU7eYVyfUP14dSys -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5174) Update the balloon-framework to run on test clusters
[ https://issues.apache.org/jira/browse/MESOS-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235663#comment-15235663 ] Joseph Wu commented on MESOS-5174: -- || Review || Summary || | https://reviews.apache.org/r/45604/ | First 4 bullet points in the description | | https://reviews.apache.org/r/45905/ | Metrics | > Update the balloon-framework to run on test clusters > > > Key: MESOS-5174 > URL: https://issues.apache.org/jira/browse/MESOS-5174 > Project: Mesos > Issue Type: Improvement > Components: framework, technical debt >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere, tech-debt > > There are a couple of problems with the balloon framework that prevent it > from being deployed (easily) on an actual cluster: > * The framework accepts 100% of memory in an offer. This means the expected > behavior (finish or OOM) is dependent on the offer size. > * The framework assumes the {{balloon-executor}} binary is available on each > agent. This is generally only true in the build environment or in > single-agent test environments. > * The framework does not specify CPUs with the executor. This is required by > many isolators. > * The executor's {{TASK_FINISHED}} logic path was untested and is flaky. > * The framework has no metrics. > * The framework only launches a single task and then exits. With this > behavior, we can't have useful metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5155) Consolidate authorization actions for quota.
[ https://issues.apache.org/jira/browse/MESOS-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235662#comment-15235662 ] Adam B commented on MESOS-5155: --- +1 to getting rid of DESTROY_QUOTA_WITH_PRINCIPAL, but has it made it into a release already? Do we need to put it through a deprecation cycle? > Consolidate authorization actions for quota. > > > Key: MESOS-5155 > URL: https://issues.apache.org/jira/browse/MESOS-5155 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Zhitao Li > Labels: mesosphere > > We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. In > retrospect, it was a mistake to introduce multiple actions. > The actions that are not symmetrical are register/teardown and dynamic > reservations. They are implemented this way because the entities that > perform one action differ from the entities that perform the other. For > example, register framework is issued by a framework, teardown by an > operator. What is a good way to identify a framework? The role it runs in > may differ on each launch and makes no sense in a multi-role framework > setup; better is a sort of group id, which is its principal. Dynamic > reservations and persistent volumes can be issued by both frameworks and > operators, hence similar reasoning applies. > Now, quota is associated with a role and set only by operators. Do we need > to care about the principals that set it? Not that much. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5174) Update the balloon-framework to run on test clusters
Joseph Wu created MESOS-5174: Summary: Update the balloon-framework to run on test clusters Key: MESOS-5174 URL: https://issues.apache.org/jira/browse/MESOS-5174 Project: Mesos Issue Type: Improvement Components: framework, technical debt Reporter: Joseph Wu Assignee: Joseph Wu There are a couple of problems with the balloon framework that prevent it from being deployed (easily) on an actual cluster: * The framework accepts 100% of memory in an offer. This means the expected behavior (finish or OOM) is dependent on the offer size. * The framework assumes the {{balloon-executor}} binary is available on each agent. This is generally only true in the build environment or in single-agent test environments. * The framework does not specify CPUs with the executor. This is required by many isolators. * The executor's {{TASK_FINISHED}} logic path was untested and is flaky. * The framework has no metrics. * The framework only launches a single task and then exits. With this behavior, we can't have useful metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5155) Consolidate authorization actions for quota.
[ https://issues.apache.org/jira/browse/MESOS-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhitao Li reassigned MESOS-5155: Assignee: Zhitao Li > Consolidate authorization actions for quota. > > > Key: MESOS-5155 > URL: https://issues.apache.org/jira/browse/MESOS-5155 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Zhitao Li > Labels: mesosphere > > We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. In > retrospect, it was a mistake to introduce multiple actions. > The actions that are not symmetrical are register/teardown and dynamic > reservations. They are implemented this way because the entities that > perform one action differ from the entities that perform the other. For > example, register framework is issued by a framework, teardown by an > operator. What is a good way to identify a framework? The role it runs in > may differ on each launch and makes no sense in a multi-role framework > setup; better is a sort of group id, which is its principal. Dynamic > reservations and persistent volumes can be issued by both frameworks and > operators, hence similar reasoning applies. > Now, quota is associated with a role and set only by operators. Do we need > to care about the principals that set it? Not that much. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate
[ https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhitao Li updated MESOS-4760: - Assignee: Michael Browning (was: Zhitao Li) > Expose metrics and gauges for fetcher cache usage and hit rate > -- > > Key: MESOS-4760 > URL: https://issues.apache.org/jira/browse/MESOS-4760 > Project: Mesos > Issue Type: Improvement > Components: fetcher, statistics >Reporter: Michael Browning >Assignee: Michael Browning >Priority: Minor > Labels: features, fetcher, statistics, uber > > To evaluate the fetcher cache and calibrate the value of the > fetcher_cache_size flag, it would be useful to have metrics and gauges on > agents that expose operational statistics like cache hit rate, occupied cache > size, and time spent downloading resources that were not present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate
[ https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhitao Li reassigned MESOS-4760: Assignee: Zhitao Li > Expose metrics and gauges for fetcher cache usage and hit rate > -- > > Key: MESOS-4760 > URL: https://issues.apache.org/jira/browse/MESOS-4760 > Project: Mesos > Issue Type: Improvement > Components: fetcher, statistics >Reporter: Michael Browning >Assignee: Zhitao Li >Priority: Minor > Labels: features, fetcher, statistics, uber > > To evaluate the fetcher cache and calibrate the value of the > fetcher_cache_size flag, it would be useful to have metrics and gauges on > agents that expose operational statistics like cache hit rate, occupied cache > size, and time spent downloading resources that were not present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3739) Mesos does not set Content-Type for 400 Bad Request
[ https://issues.apache.org/jira/browse/MESOS-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-3739: -- Assignee: Vinod Kone Sprint: Mesosphere Sprint 33 > Mesos does not set Content-Type for 400 Bad Request > --- > > Key: MESOS-3739 > URL: https://issues.apache.org/jira/browse/MESOS-3739 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0, 0.24.1, 0.25.0 >Reporter: Ben Whitehead >Assignee: Vinod Kone > Labels: mesosphere > > While integrating with the HTTP Scheduler API I encountered the following > scenario. > The message below was serialized to protobuf and sent as the POST body > {code:title=message} > call { > type: ACKNOWLEDGE, > acknowledge: { > uuid: , > agentID: { value: "20151012-182734-16777343-5050-8978-S2" }, > taskID: { value: "task-1" } > } > } > {code} > {code:title=Request Headers} > POST /api/v1/scheduler HTTP/1.1 > Content-Type: application/x-protobuf > Accept: application/x-protobuf > Content-Length: 73 > Host: localhost:5050 > User-Agent: RxNetty Client > {code} > I received the following response > {code:title=Response Headers} > HTTP/1.1 400 Bad Request > Date: Wed, 14 Oct 2015 23:21:36 GMT > Content-Length: 74 > Failed to validate Scheduler::Call: Expecting 'framework_id' to be present > {code} > Even though my accept header made no mention of {{text/plain}} the message > body returned to me is {{text/plain}}. Additionally, there is no > {{Content-Type}} header set on the response so I can't even do anything > intelligently in my response handler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5166) ExamplesTest.DynamicReservationFramework is slow
[ https://issues.apache.org/jira/browse/MESOS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-5166: Sprint: Mesosphere Sprint 33 > ExamplesTest.DynamicReservationFramework is slow > > > Key: MESOS-5166 > URL: https://issues.apache.org/jira/browse/MESOS-5166 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier > Labels: examples, mesosphere > > For an unoptimized build under OS X > {{ExamplesTest.DynamicReservationFramework}} currently takes more than 13 > seconds on my machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5166) ExamplesTest.DynamicReservationFramework is slow
[ https://issues.apache.org/jira/browse/MESOS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-5166: Shepherd: Michael Park > ExamplesTest.DynamicReservationFramework is slow > > > Key: MESOS-5166 > URL: https://issues.apache.org/jira/browse/MESOS-5166 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier > Labels: examples, mesosphere > > For an unoptimized build under OS X > {{ExamplesTest.DynamicReservationFramework}} currently takes more than 13 > seconds on my machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4689) Design doc for v1 Operator API
[ https://issues.apache.org/jira/browse/MESOS-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4689: -- Sprint: Mesosphere Sprint 29, Mesosphere Sprint 33 (was: Mesosphere Sprint 29) > Design doc for v1 Operator API > -- > > Key: MESOS-4689 > URL: https://issues.apache.org/jira/browse/MESOS-4689 > Project: Mesos > Issue Type: Documentation >Reporter: Vinod Kone >Assignee: Kevin Klues > > We need to design how the v1 operator API (all the HTTP endpoints exposed by > master/agent that are not for scheduler/executor interactions) looks and > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3558) Implement HTTPCommandExecutor that uses the Executor Library
[ https://issues.apache.org/jira/browse/MESOS-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-3558: -- Sprint: Mesosphere Sprint 33 > Implement HTTPCommandExecutor that uses the Executor Library > -- > > Key: MESOS-3558 > URL: https://issues.apache.org/jira/browse/MESOS-3558 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar >Assignee: Qian Zhang > Labels: mesosphere > > Instead of using the {{MesosExecutorDriver}}, we should make the > {{CommandExecutor}} in {{src/launcher/executor.cpp}} use the new Executor > HTTP Library that we create in {{MESOS-3550}}. > This would act as a good validation of the {{HTTP API}} implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5173) Allow master/agent to take multiple --modules flags
[ https://issues.apache.org/jira/browse/MESOS-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya reassigned MESOS-5173: - Assignee: Kapil Arya > Allow master/agent to take multiple --modules flags > --- > > Key: MESOS-5173 > URL: https://issues.apache.org/jira/browse/MESOS-5173 > Project: Mesos > Issue Type: Task >Reporter: Kapil Arya >Assignee: Kapil Arya > Labels: mesosphere > Fix For: 0.29.0 > > > When loading multiple modules into master/agent, one has to merge all module > metadata (library name, module name, parameters, etc.) into a single JSON > file which is then passed to the --modules flag. This quickly becomes > cumbersome, especially if the modules come from different > vendors/developers. > An alternative would be to allow multiple invocations of the --modules flag, > each of which is passed on to the module manager. That way, each flag > corresponds to just one module library and the modules from that library. > Another approach is to introduce a new flag (e.g., --modules-dir) that takes > a path to a directory containing multiple JSON files. One can think > of it as analogous to systemd units: the operator drops a new file > into this directory and the file is automatically picked up by the > master/agent module manager. Further, a naming scheme that prefixes the > filename with "NN_" could signify load order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5119) Support directory structure in CommandInfo.URI.filename in fetcher
[ https://issues.apache.org/jira/browse/MESOS-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Browning reassigned MESOS-5119: --- Assignee: Michael Browning > Support directory structure in CommandInfo.URI.filename in fetcher > -- > > Key: MESOS-5119 > URL: https://issues.apache.org/jira/browse/MESOS-5119 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Yan Xu >Assignee: Michael Browning > > In MESOS-4735, {{CommandInfo.URI.filename}} is added but there is no > validation to make sure it's a simple basename, so people can actually > specify the filename to be something like {{path/to/file}} but the validation > [won't catch it|https://reviews.apache.org/r/45046/#comment190155]. The fetch > will fail later in {{download()}} because it cannot open a destination file > without its parent directory. > Instead of fixing this by disallowing such output filename, we could actually > support this behavior. There are use cases where multiple fetch targets have > the same basename but they are organized by a directory hierarchy. > {noformat:title=} > root/app.dat > root/parent/app.dat > root/parent/child/app.dat > {noformat} > It looks to me that supporting this is straightforward and we just need to 1) > make sure the output path is within the sandbox and 2) recursively mkdirs for > the parent dirs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5145) protobuf vendored but its dependencies are not
[ https://issues.apache.org/jira/browse/MESOS-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-5145: - Labels: mesosphere (was: ) > protobuf vendored but its dependencies are not > -- > > Key: MESOS-5145 > URL: https://issues.apache.org/jira/browse/MESOS-5145 > Project: Mesos > Issue Type: Bug > Components: build >Reporter: David Robinson > Labels: mesosphere > > Updating [protobuf from 2.5 to > 2.6.1|https://github.com/apache/mesos/commit/51872fba7f94d80e55c9cc9b46f96780a938f626] > has caused Mesos builds to fail if pypi.python.org is unreachable. > Protobuf-2.6.1 requires > [google-apputils|https://pypi.python.org/pypi/google-apputils] and if it's > not available the build process will attempt to download it from pypi. > Prior to this change it was possible to build Mesos without Internet access. > If the build process reaches out to arbitrary things on the Internet it's > impossible to guarantee build reproducibility. > {noformat:title=snippet from setup.py in protobuf-2.6.1.tar.gz} > setup(name = 'protobuf', > version = '2.6.1', > ... > setup_requires = ['google-apputils'], > ... > ) > {noformat} > {noformat:title=snippet from build log} > 08:20:49 DEBUG: Building protobuf Python egg ... 
> 08:20:49 DEBUG: cd ../3rdparty/libprocess/3rdparty/protobuf-2.6.1/python && > \ > 08:20:49 DEBUG: CC="gcc" \ > 08:20:49 DEBUG: CXX="g++" \ > 08:20:49 DEBUG: CFLAGS="-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic > -Wno-unused-local-typedefs" \ > 08:20:49 DEBUG: CXXFLAGS="-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic > -Wno-unused-local-typedefs -Wno-maybe-uninitialized -std=c++11" > \ > 08:20:49 DEBUG: > PYTHONPATH=/builddir/build/BUILD/mesos-0.29.0/3rdparty/distribute-0.6.26 > \ > 08:20:49 DEBUG: /usr/bin/python2.7 setup.py build bdist_egg > 08:20:49 DEBUG: Download error on > http://pypi.python.org/simple/google-apputils/: [Errno 111] Connection > refused -- Some packages may not be found! > 08:20:49 DEBUG: Download error on > http://pypi.python.org/simple/google-apputils/: [Errno 111] Connection > refused -- Some packages may not be found! > 08:20:49 DEBUG: Couldn't find index page for 'google-apputils' (maybe > misspelled?) > 08:20:49 DEBUG: Download error on http://pypi.python.org/simple/: [Errno 111] > Connection refused -- Some packages may not be found! 
> 08:20:49 DEBUG: No local packages or download links found for google-apputils > 08:20:49 DEBUG: Traceback (most recent call last): > 08:20:49 DEBUG: File "setup.py", line 200, in > 08:20:49 DEBUG: "Protocol Buffers are Google's data interchange format.", > 08:20:49 DEBUG: File "/usr/lib64/python2.7/distutils/core.py", line 111, in > setup > 08:20:49 DEBUG: _setup_distribution = dist = klass(attrs) > 08:20:49 DEBUG: File > "/builddir/build/BUILD/mesos-0.29.0/3rdparty/distribute-0.6.26/setuptools/dist.py", > line 221, in __init__ > 08:20:49 DEBUG: self.fetch_build_eggs(attrs.pop('setup_requires')) > 08:20:49 DEBUG: File > "/builddir/build/BUILD/mesos-0.29.0/3rdparty/distribute-0.6.26/setuptools/dist.py", > line 245, in fetch_build_eggs > 08:20:49 DEBUG: parse_requirements(requires), > installer=self.fetch_build_egg > 08:20:49 DEBUG: File > "/builddir/build/BUILD/mesos-0.29.0/3rdparty/distribute-0.6.26/pkg_resources.py", > line 580, in resolve > 08:20:49 DEBUG: dist = best[req.key] = env.best_match(req, self, > installer) > 08:20:49 DEBUG: File > "/builddir/build/BUILD/mesos-0.29.0/3rdparty/distribute-0.6.26/pkg_resources.py", > line 825, in best_match > 08:20:49 DEBUG: return self.obtain(req, installer) # try and > download/install > 08:20:49 DEBUG: File > "/builddir/build/BUILD/mesos-0.29.0/3rdparty/distribute-0.6.26/pkg_resources.py", > line 837, in obtain > 08:20:49 DEBUG: return installer(requirement) > 08:20:49 DEBUG: File > "/builddir/build/BUILD/mesos-0.29.0/3rdparty/distribute-0.6.26/setuptools/dist.py", > line 294, in fetch_build_egg > 08:20:49 DEBUG: return cmd.easy_install(req) > 08:20:49 DEBUG: File > "/builddir/build/BUILD/mesos-0.29.0/3rdparty/distribute-0.6.26/setuptools/command/easy_install.py", > line 584, in easy_install > 08:20:49 DEBUG: raise DistutilsError(msg) > 08:20:49 DEBUG: distutils.errors.DistutilsError: Could not find suitable > distribution for Requirement.parse('google-apputils') > {noformat} -- This message was sent by Atlassian JIRA 
(v6.3.4#6332)
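The failure above boils down to {{setup_requires}} triggering a network fetch at build time. As an illustration only (not a proposed Mesos patch), a build wrapper could fail fast with a clear message when a build-time requirement is not importable locally; the distribution-to-module mapping for google-apputils below is an assumption of this sketch:

```python
import importlib.util

def missing_build_requires(setup_requires):
    """Return the build-time requirements that are not importable
    locally, so an offline build can abort with a clear error instead
    of reaching out to pypi. Distribution names don't always match
    module names; the google-apputils mapping here is an assumption."""
    known_modules = {"google-apputils": "google.apputils"}
    missing = []
    for req in setup_requires:
        module = known_modules.get(req, req.replace("-", "_"))
        # Only the top-level package needs to be locatable for this check.
        if importlib.util.find_spec(module.split(".")[0]) is None:
            missing.append(req)
    return missing
```

Checking this list before invoking `setup.py` would preserve the pre-2.6.1 property that a build either succeeds or fails deterministically without Internet access.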
[jira] [Commented] (MESOS-2533) Support HTTP checks in Mesos health check program
[ https://issues.apache.org/jira/browse/MESOS-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235528#comment-15235528 ] haosdent commented on MESOS-2533: - [~medzin] After reading the issue, how about using the Marathon command health check as a workaround for now? Marathon's command health check depends on the Mesos health check as well. Since it will still take some time to move this patch forward and get it merged, I'm afraid you won't be able to use it soon. > Support HTTP checks in Mesos health check program > - > > Key: MESOS-2533 > URL: https://issues.apache.org/jira/browse/MESOS-2533 > Project: Mesos > Issue Type: Bug >Reporter: Niklas Quarfot Nielsen >Assignee: haosdent > Labels: mesosphere > > Currently, only commands are supported but our health check protobuf enables > users to encode HTTP checks as well. We should wire this up in the health > check program or remove the http field from the protobuf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5064) Remove default value for the agent `work_dir`
[ https://issues.apache.org/jira/browse/MESOS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220643#comment-15220643 ] Greg Mann edited comment on MESOS-5064 at 4/11/16 5:18 PM: --- Reviews here: https://reviews.apache.org/r/46003/ https://reviews.apache.org/r/46005/ https://reviews.apache.org/r/46004/ https://reviews.apache.org/r/45562/ https://reviews.apache.org/r/46038/ was (Author: greggomann): Reviews here: https://reviews.apache.org/r/45562/ https://reviews.apache.org/r/45563/ > Remove default value for the agent `work_dir` > - > > Key: MESOS-5064 > URL: https://issues.apache.org/jira/browse/MESOS-5064 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Greg Mann > > Following a crash report from the user we need to be more explicit about the > dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove > the default value for the {{\-\-work_dir}} flag, forcing users to explicitly > set the work directory for the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2533) Support HTTP checks in Mesos health check program
[ https://issues.apache.org/jira/browse/MESOS-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235486#comment-15235486 ] Adam Medziński commented on MESOS-2533: --- [~haosd...@gmail.com] thanks for the suggestion, but I asked about this because of my issue on the Marathon GitHub: https://github.com/mesosphere/marathon/issues/3728 > Support HTTP checks in Mesos health check program > - > > Key: MESOS-2533 > URL: https://issues.apache.org/jira/browse/MESOS-2533 > Project: Mesos > Issue Type: Bug >Reporter: Niklas Quarfot Nielsen >Assignee: haosdent > Labels: mesosphere > > Currently, only commands are supported but our health check protobuf enables > users to encode HTTP checks as well. We should wire this up in the health > check program or remove the http field from the protobuf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4922) Setup proper /etc/hostname, /etc/hosts and /etc/resolv.conf for containers in network/cni isolator.
[ https://issues.apache.org/jira/browse/MESOS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235468#comment-15235468 ] Jie Yu commented on MESOS-4922: --- commit 00141e4a56a81525fec1f86f2b212dcbc04e3a8c Author: Avinash sridharan Date: Mon Apr 11 09:51:16 2016 -0700 Adding a stout interface for `sethostname` system call in linux. Review: https://reviews.apache.org/r/45953/ > Setup proper /etc/hostname, /etc/hosts and /etc/resolv.conf for containers in > network/cni isolator. > --- > > Key: MESOS-4922 > URL: https://issues.apache.org/jira/browse/MESOS-4922 > Project: Mesos > Issue Type: Bug > Components: isolation >Reporter: Qian Zhang >Assignee: Avinash Sridharan > Labels: mesosphere > > The network/cni isolator needs to properly setup /etc/hostname and /etc/hosts > for the container with a hostname (e.g., randomly generated) and the assigned > IP returned by CNI plugin. > We should consider the following cases: > 1) container is using host filesystem > 2) container is using a different filesystem > 3) custom executor and command executor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5173) Allow master/agent to take multiple --modules flags
Kapil Arya created MESOS-5173: - Summary: Allow master/agent to take multiple --modules flags Key: MESOS-5173 URL: https://issues.apache.org/jira/browse/MESOS-5173 Project: Mesos Issue Type: Task Reporter: Kapil Arya Fix For: 0.29.0 When loading multiple modules into master/agent, one has to merge all module metadata (library name, module name, parameters, etc.) into a single json file which is then passed on to the --modules flag. This quickly becomes cumbersome, especially if the modules are coming from different vendors/developers. An alternative would be to allow multiple invocations of the --modules flag that can then be passed on to the module manager. That way, each flag corresponds to just one module library and the modules from that library. Another approach is to create a new flag (e.g., --modules-dir) that contains a path to a directory containing multiple json files. One can think of it as analogous to systemd units. The operator drops a new file into this directory and the file is automatically picked up by the master/agent module manager. Further, the naming scheme can prefix the filename with an "NN_" to signify load order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
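The --modules-dir idea sketched above — a drop-in directory with "NN_"-prefixed files controlling load order — could look roughly like this. This is only an illustration in Python (the module manager is C++), and the manifest shape simply mirrors the existing --modules JSON format:

```python
import json
import os

def load_modules_dir(path):
    """Merge Mesos-style module manifests from a drop-in directory.

    Files are processed in sorted filename order, so an 'NN_' prefix
    (e.g. 10_foo.json before 20_bar.json) controls load order, much
    like systemd unit drop-ins. The {"libraries": [...]} layout mirrors
    the --modules JSON format; this is a sketch, not Mesos code."""
    merged = {"libraries": []}
    for name in sorted(os.listdir(path)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(path, name)) as f:
            merged["libraries"].extend(json.load(f).get("libraries", []))
    return merged
```

Each file then describes exactly one vendor's module library, and ordering stays explicit without any single merged file to maintain.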
[jira] [Updated] (MESOS-4891) Add a '/containers' endpoint to the agent to list all the active containers.
[ https://issues.apache.org/jira/browse/MESOS-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4891: -- Shepherd: Jie Yu > Add a '/containers' endpoint to the agent to list all the active containers. > > > Key: MESOS-4891 > URL: https://issues.apache.org/jira/browse/MESOS-4891 > Project: Mesos > Issue Type: Improvement > Components: slave >Reporter: Jie Yu >Assignee: Jay Guo > Labels: mesosphere > Attachments: screenshot.png > > > This endpoint will be similar to /monitor/statistics.json endpoint, but it'll > also contain the 'container_status' about the container (see ContainerStatus > in mesos.proto). We'll eventually deprecate the /monitor/statistics.json > endpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4891) Add a '/containers' endpoint to the agent to list all the active containers.
[ https://issues.apache.org/jira/browse/MESOS-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4891: -- Story Points: 8 > Add a '/containers' endpoint to the agent to list all the active containers. > > > Key: MESOS-4891 > URL: https://issues.apache.org/jira/browse/MESOS-4891 > Project: Mesos > Issue Type: Improvement > Components: slave >Reporter: Jie Yu >Assignee: Jay Guo > Labels: mesosphere > Attachments: screenshot.png > > > This endpoint will be similar to /monitor/statistics.json endpoint, but it'll > also contain the 'container_status' about the container (see ContainerStatus > in mesos.proto). We'll eventually deprecate the /monitor/statistics.json > endpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5172) Registry puller cannot fetch blobs correctly from some private repos.
Gilbert Song created MESOS-5172: --- Summary: Registry puller cannot fetch blobs correctly from some private repos. Key: MESOS-5172 URL: https://issues.apache.org/jira/browse/MESOS-5172 Project: Mesos Issue Type: Bug Components: containerization Reporter: Gilbert Song Assignee: Gilbert Song When the registry puller is pulling a private repository from some private registry (e.g., quay.io), errors may occur when fetching blobs, even though fetching the repo's manifest completes correctly. The error message is `Unexpected HTTP response '400 Bad Request' when trying to download the blob`. This may arise from the blob-fetching logic, or from an incorrectly formatted URI when requesting blobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
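For context on where a malformed URI could come from: the Docker Registry v2 API addresses blobs at /v2/&lt;name&gt;/blobs/&lt;digest&gt;, where &lt;name&gt; for a private repository includes its namespace. A sketch with illustrative names (this is not the puller's actual code):

```python
def blob_url(registry, repository, digest):
    """Build a Docker Registry v2 blob URL.

    'repository' must be the full name including namespace (e.g.
    'myorg/myrepo' on quay.io); dropping the namespace or mangling the
    digest is an easy way to get a '400 Bad Request'. All values here
    are illustrative."""
    return "https://{}/v2/{}/blobs/{}".format(registry, repository, digest)
```

Comparing the URI the puller actually emits against this shape would be a quick first check when debugging the 400 response.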
[jira] [Updated] (MESOS-4944) Improve overlay backend so that it's writable
[ https://issues.apache.org/jira/browse/MESOS-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4944: -- Shepherd: Jie Yu Sprint: Mesosphere Sprint 32 Labels: mesosphere (was: ) > Improve overlay backend so that it's writable > - > > Key: MESOS-4944 > URL: https://issues.apache.org/jira/browse/MESOS-4944 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu >Assignee: Shuai Lin > Labels: mesosphere > Fix For: 0.29.0 > > > Currently, the overlay backend will provision a read-only FS. We can use an > empty directory from the container sandbox to act as the upper layer so that > it's writable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4944) Improve overlay backend so that it's writable
[ https://issues.apache.org/jira/browse/MESOS-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4944: -- Story Points: 5 > Improve overlay backend so that it's writable > - > > Key: MESOS-4944 > URL: https://issues.apache.org/jira/browse/MESOS-4944 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu >Assignee: Shuai Lin > Labels: mesosphere > Fix For: 0.29.0 > > > Currently, the overlay backend will provision a read-only FS. We can use an > empty directory from the container sandbox to act as the upper layer so that > it's writable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4944) Improve overlay backend so that it's writable
[ https://issues.apache.org/jira/browse/MESOS-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4944: -- Component/s: containerization > Improve overlay backend so that it's writable > - > > Key: MESOS-4944 > URL: https://issues.apache.org/jira/browse/MESOS-4944 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu >Assignee: Shuai Lin > Labels: mesosphere > Fix For: 0.29.0 > > > Currently, the overlay backend will provision a read-only FS. We can use an > empty directory from the container sandbox to act as the upper layer so that > it's writable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
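The idea in MESOS-4944 — an empty directory in the container sandbox acting as the writable upper layer — maps onto overlayfs mount options roughly as below. This Python sketch only builds the option string (the isolator itself is C++), and the overlay_upper/overlay_work directory names are invented:

```python
import os

def overlay_mount_options(lower_layers, sandbox):
    """Build overlayfs options for a writable provisioned filesystem.

    overlayfs stacks 'lowerdir' entries left to right (topmost first)
    and requires 'workdir' to live on the same filesystem as
    'upperdir'; placing both in the container sandbox makes the mount
    writable, per MESOS-4944. Directory names are invented for this
    sketch."""
    upper = os.path.join(sandbox, "overlay_upper")
    work = os.path.join(sandbox, "overlay_work")
    os.makedirs(upper, exist_ok=True)
    os.makedirs(work, exist_ok=True)
    return "lowerdir={},upperdir={},workdir={}".format(
        ":".join(lower_layers), upper, work)
```

Without an upperdir, overlayfs mounts read-only, which is exactly the current limitation the ticket describes.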
[jira] [Created] (MESOS-5171) Expose state/state.hpp to public headers
Kapil Arya created MESOS-5171: - Summary: Expose state/state.hpp to public headers Key: MESOS-5171 URL: https://issues.apache.org/jira/browse/MESOS-5171 Project: Mesos Issue Type: Task Components: replicated log Reporter: Kapil Arya Assignee: Kapil Arya Fix For: 0.29.0 We want the Modules to be able to use the replicated log along with the APIs to communicate with Zookeeper. This change would require us to expose at least the following headers: state/storage.hpp, and any additional files that state.hpp depends on (e.g., zookeeper/authentication.hpp). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5170) Adapt json creation for authorization based endpoint filtering.
[ https://issues.apache.org/jira/browse/MESOS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joerg Schad updated MESOS-5170: --- Story Points: 5 (was: 3) Labels: authorization mesosphere security (was: mesosphere security) > Adapt json creation for authorization based endpoint filtering. > --- > > Key: MESOS-5170 > URL: https://issues.apache.org/jira/browse/MESOS-5170 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad > Labels: authorization, mesosphere, security > > For authorization based endpoint filtering we need to adapt the json endpoint > creation as discussed in MESOS-4931. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5169) Introduce new Authorizer Actions for Authorized based filtering of endpoints.
[ https://issues.apache.org/jira/browse/MESOS-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joerg Schad updated MESOS-5169: --- Labels: authorization mesosphere security (was: ) > Introduce new Authorizer Actions for Authorized based filtering of endpoints. > - > > Key: MESOS-5169 > URL: https://issues.apache.org/jira/browse/MESOS-5169 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad > Labels: authorization, mesosphere, security > > For authorization based endpoint filtering we need to introduce the > authorizer actions outlined via MESOS-4931. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5168) Benchmark overhead of authorization based filtering.
[ https://issues.apache.org/jira/browse/MESOS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joerg Schad updated MESOS-5168: --- Labels: authorization mesosphere security (was: mesosphere security) > Benchmark overhead of authorization based filtering. > > > Key: MESOS-5168 > URL: https://issues.apache.org/jira/browse/MESOS-5168 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad > Labels: authorization, mesosphere, security > > When adding authorization based filtering as outlined in MESOS-4931 we need > to be careful, especially for performance-critical endpoints such as /state. > We should ensure via a benchmark that performance does not degrade below an > acceptable state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5168) Benchmark overhead of authorization based filtering.
Joerg Schad created MESOS-5168: -- Summary: Benchmark overhead of authorization based filtering. Key: MESOS-5168 URL: https://issues.apache.org/jira/browse/MESOS-5168 Project: Mesos Issue Type: Improvement Reporter: Joerg Schad When adding authorization based filtering as outlined in MESOS-4931 we need to be careful, especially for performance-critical endpoints such as /state. We should ensure via a benchmark that performance does not degrade below an acceptable state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5168) Benchmark overhead of authorization based filtering.
[ https://issues.apache.org/jira/browse/MESOS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joerg Schad updated MESOS-5168: --- Labels: mesosphere security (was: ) > Benchmark overhead of authorization based filtering. > > > Key: MESOS-5168 > URL: https://issues.apache.org/jira/browse/MESOS-5168 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad > Labels: mesosphere, security > > When adding authorization based filtering as outlined in MESOS-4931 we need > to be careful, especially for performance-critical endpoints such as /state. > We should ensure via a benchmark that performance does not degrade below an > acceptable state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5170) Adapt json creation for authorization based endpoint filtering.
[ https://issues.apache.org/jira/browse/MESOS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joerg Schad updated MESOS-5170: --- Labels: mesosphere security (was: mesosphere) > Adapt json creation for authorization based endpoint filtering. > --- > > Key: MESOS-5170 > URL: https://issues.apache.org/jira/browse/MESOS-5170 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad > Labels: mesosphere, security > > For authorization based endpoint filtering we need to adapt the json endpoint > creation as discussed in MESOS-4931. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5170) Adapt json creation for authorization based endpoint filtering.
Joerg Schad created MESOS-5170: -- Summary: Adapt json creation for authorization based endpoint filtering. Key: MESOS-5170 URL: https://issues.apache.org/jira/browse/MESOS-5170 Project: Mesos Issue Type: Improvement Reporter: Joerg Schad For authorization based endpoint filtering we need to adapt the json endpoint creation as discussed in MESOS-4931. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5169) Introduce new Authorizer Actions for Authorized based filtering of endpoints.
Joerg Schad created MESOS-5169: -- Summary: Introduce new Authorizer Actions for Authorized based filtering of endpoints. Key: MESOS-5169 URL: https://issues.apache.org/jira/browse/MESOS-5169 Project: Mesos Issue Type: Improvement Reporter: Joerg Schad For authorization based endpoint filtering we need to introduce the authorizer actions outlined via MESOS-4931. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5168) Benchmark overhead of authorization based filtering.
[ https://issues.apache.org/jira/browse/MESOS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joerg Schad updated MESOS-5168: --- Story Points: 3 > Benchmark overhead of authorization based filtering. > > > Key: MESOS-5168 > URL: https://issues.apache.org/jira/browse/MESOS-5168 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad > > When adding authorization based filtering as outlined in MESOS-4931 we need > to be careful, especially for performance-critical endpoints such as /state. > We should ensure via a benchmark that performance does not degrade below an > acceptable state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4931) Authorization based filtering for endpoints.
[ https://issues.apache.org/jira/browse/MESOS-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joerg Schad updated MESOS-4931: --- Description: Some endpoints such as /state should be filtered depending on which information the user is authorized to see. For example a user should be only able to see tasks he is authorized to see. (was: Some endpoints -such as state- should be filtered depending on which information the user is authorized to see. For example a user should be only able to see tasks he is authorized to see.) > Authorization based filtering for endpoints. > > > Key: MESOS-4931 > URL: https://issues.apache.org/jira/browse/MESOS-4931 > Project: Mesos > Issue Type: Epic > Components: security >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: authorization, mesosphere, security > Fix For: 0.29.0 > > > Some endpoints such as /state should be filtered depending on which > information the user is authorized to see. For example a user should be only > able to see tasks he is authorized to see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
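One way to picture the filtering this epic describes: given a /state-style document and a per-task authorization predicate (standing in here for the real Authorizer, which Mesos implements in C++), strip the tasks the caller may not see. The document shape is deliberately simplified:

```python
def filter_state(state, is_authorized):
    """Return a copy of a simplified /state document with tasks the
    caller is not authorized to see removed. 'is_authorized' is a
    caller-supplied predicate; a real implementation would consult the
    configured Authorizer rather than a plain function."""
    filtered = dict(state)
    filtered["frameworks"] = [
        # dict(fw, tasks=...) copies each framework, so the original
        # state document is left untouched.
        dict(fw, tasks=[t for t in fw.get("tasks", []) if is_authorized(t)])
        for fw in state.get("frameworks", [])
    ]
    return filtered
```

The copy-then-filter shape also hints at the performance concern raised in MESOS-5168: filtering happens per request, on top of the existing JSON generation.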
[jira] [Updated] (MESOS-5130) Enable `network/cni` isolator in `MesosContainerizer` as the default `network` isolator.
[ https://issues.apache.org/jira/browse/MESOS-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-5130: - Sprint: Mesosphere Sprint 32 Description: Currently there are no default `network` isolators for `MesosContainerizer`. With the development of the `network/cni` isolator we have an interface to run Mesos on a multitude of IP networks. Given that it's based on an open standard (the CNI spec) which is gathering a lot of traction from vendors (calico, weave, coreOS) and already works on some default networks (bridge, ipvlan, macvlan), it makes sense to make it the default network isolator. (was: The CNI network isolator needs to be enabled by default. ) > Enable `network/cni` isolator in `MesosContainerizer` as the default > `network` isolator. > > > Key: MESOS-5130 > URL: https://issues.apache.org/jira/browse/MESOS-5130 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > Currently there are no default `network` isolators for `MesosContainerizer`. > With the development of the `network/cni` isolator we have an interface to > run Mesos on a multitude of IP networks. Given that it's based on an open > standard (the CNI spec) which is gathering a lot of traction from vendors > (calico, weave, coreOS) and already works on some default networks (bridge, > ipvlan, macvlan), it makes sense to make it the default network isolator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5064) Remove default value for the agent `work_dir`
[ https://issues.apache.org/jira/browse/MESOS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-5064: - Summary: Remove default value for the agent `work_dir` (was: Document avoiding using `/tmp` as agent’s work directory in production) > Remove default value for the agent `work_dir` > - > > Key: MESOS-5064 > URL: https://issues.apache.org/jira/browse/MESOS-5064 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Greg Mann > > Following a crash report from the user we need to be more explicit about the > dangers of using {{/tmp}} as agent {{work_dir}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5064) Remove default value for the agent `work_dir`
[ https://issues.apache.org/jira/browse/MESOS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-5064: - Description: Following a crash report from the user we need to be more explicit about the dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove the default value for the {{\-\-work_dir}} flag, forcing users to explicitly set the work directory for the agent. (was: Following a crash report from the user we need to be more explicit about the dangers of using {{/tmp}} as agent {{work_dir}}) > Remove default value for the agent `work_dir` > - > > Key: MESOS-5064 > URL: https://issues.apache.org/jira/browse/MESOS-5064 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Greg Mann > > Following a crash report from the user we need to be more explicit about the > dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove > the default value for the {{\-\-work_dir}} flag, forcing users to explicitly > set the work directory for the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3084) PPC64LE architecture support on third-party libraries
[ https://issues.apache.org/jira/browse/MESOS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235275#comment-15235275 ] Neil Conway commented on MESOS-3084: Hi [~ykrips] -- I believe that this JIRA (and the associated review requests) can be closed, because the libraries in question have been updated. Is that correct? Thanks. > PPC64LE architecture support on third-party libraries > - > > Key: MESOS-3084 > URL: https://issues.apache.org/jira/browse/MESOS-3084 > Project: Mesos > Issue Type: Improvement > Components: general, libprocess >Affects Versions: 0.22.1 > Environment: Ubuntu 14.04 ppc64le >Reporter: Jihun Kang >Assignee: Jihun Kang >Priority: Minor > > Some third-party libraries fell behind their upstream development cycle; newer > upstream releases already support the ppc64 and ppc64le architectures, but these > changes have not been applied to the vendored copies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5151) Marathon Pass Dynamic Value with Parameters Resource in Docker Configuration
[ https://issues.apache.org/jira/browse/MESOS-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235206#comment-15235206 ] Greg Mann commented on MESOS-5151: -- [~jesada], thanks for the extra information. I believe that Marathon already supports passing arbitrary command-line parameters to the Docker CLI; for example, see the JSON provided at the bottom of this Github issue: https://github.com/mesosphere/marathon/issues/3111 > Marathon Pass Dynamic Value with Parameters Resource in Docker Configuration > > > Key: MESOS-5151 > URL: https://issues.apache.org/jira/browse/MESOS-5151 > Project: Mesos > Issue Type: Wish > Components: docker >Affects Versions: 0.28.0 > Environment: software >Reporter: Jesada Gonkratoke > > "parameters": [ >{ "key": "add-host", "value": "dockerhost:$(hostname -i)" } > ] > }, > # I want to add dynamic host ip -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2533) Support HTTP checks in Mesos health check program
[ https://issues.apache.org/jira/browse/MESOS-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235105#comment-15235105 ] haosdent commented on MESOS-2533: - [~medzin] Thank you for your inquiry. This patch is stale now; I think I need to rebase it again. But if you want HTTP checks now, you could use a command check as a workaround. For example, you could use {{curl xxx}} as your health check command. > Support HTTP checks in Mesos health check program > - > > Key: MESOS-2533 > URL: https://issues.apache.org/jira/browse/MESOS-2533 > Project: Mesos > Issue Type: Bug >Reporter: Niklas Quarfot Nielsen >Assignee: haosdent > Labels: mesosphere > > Currently, only commands are supported but our health check protobuf enables > users to encode HTTP checks as well. We should wire this up in the health > check program or remove the http field from the protobuf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
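The curl-style workaround suggested above amounts to: perform an HTTP GET and report success only for a healthy status. A minimal Python equivalent (the URL and timeout are illustrative; a real command health check would run such a script and key off its exit status):

```python
from urllib.error import URLError
from urllib.request import urlopen

def http_health_check(url, timeout=5):
    """Return True iff 'url' answers with a 2xx status within 'timeout'.

    Equivalent in spirit to using 'curl --fail <url>' as a command
    health check while native HTTP checks are not yet wired up."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout: all count as unhealthy.
        return False
```

A health check command would wrap this in a tiny script exiting 0 on True and 1 on False, which is all the command-based checker observes.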
[jira] [Commented] (MESOS-3070) Master CHECK failure if a framework uses duplicated task id.
[ https://issues.apache.org/jira/browse/MESOS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235104#comment-15235104 ] Klaus Ma commented on MESOS-3070: - Ping [~vinodkone]/[~jieyu] :). > Master CHECK failure if a framework uses duplicated task id. > > > Key: MESOS-3070 > URL: https://issues.apache.org/jira/browse/MESOS-3070 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 0.22.1 >Reporter: Jie Yu >Assignee: Klaus Ma > > We observed this in one of our testing clusters. > One framework (under development) keeps launching tasks using the same > task_id. We don't expect the master to crash even if the framework is not > doing what it's supposed to do. However, under a certain series of events, this can > happen and keep crashing the master. > 1) frameworkA launches task 'task_id_1' on slaveA > 2) master fails over > 3) slaveA has not re-registered yet > 4) frameworkA re-registers and launches task 'task_id_1' on slaveB > 5) slaveA re-registers and adds task 'task_id_1' to frameworkA > 6) CHECK failure in addTask > {noformat} > I0716 21:52:50.759305 28805 master.hpp:159] Adding task 'task_id_1' with > resources cpus(*):4; mem(*):32768 on slave > 20150417-232509-1735470090-5050-48870-S25 (hostname) > ... > ... > F0716 21:52:50.760136 28805 master.hpp:362] Check failed: > !tasks.contains(task->task_id()) Duplicate task 'task_id_1' of framework > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.
[ https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235077#comment-15235077 ] Yong Tang commented on MESOS-4070: -- Hi [~jieyu], I am wondering if you have had a chance to take a look at the review request: https://reviews.apache.org/r/45011/ And since you initially opened this JIRA ticket (MESOS-4070), would it be possible for you to shepherd it? Thanks. > numify() handles negative numbers inconsistently. > - > > Key: MESOS-4070 > URL: https://issues.apache.org/jira/browse/MESOS-4070 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Jie Yu >Assignee: Yong Tang > Labels: tech-debt > > As pointed out by [~neilc] in this review: > https://reviews.apache.org/r/40988 > {noformat} > Try num2 = numify("-10"); > EXPECT_SOME_EQ(-10, num2); > // TODO(neilc): This is inconsistent with the handling of non-hex numbers. > EXPECT_ERROR(numify("-0x10")); > {noformat}
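For comparison, Python's int(s, 0) exhibits the consistent behavior the ticket asks of stout's numify(): a leading '-' composes the same way for decimal and hexadecimal literals. This sketch is only a reference point, not a proposed implementation:

```python
def numify(s):
    """Parse a decimal or hexadecimal string, treating a leading '-'
    uniformly (the consistency MESOS-4070 asks stout's numify() for).
    base=0 lets the '0x' prefix select the base automatically."""
    return int(s, 0)
```

Under this behavior, "-0x10" parses to -16 exactly as "-10" parses to -10, instead of the hex case erroring out.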
[jira] [Commented] (MESOS-4621) --disable-optimize triggers optimized builds.
[ https://issues.apache.org/jira/browse/MESOS-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235072#comment-15235072 ] Yong Tang commented on MESOS-4621: -- Hi [~tillt], just wondering if you get a chance to take a look at the review request for: https://reviews.apache.org/r/44911/ Or do you think we should close this issue MESOS-4621 and continue on the other issue MESOS-2537 by [~jamespeach]? > --disable-optimize triggers optimized builds. > - > > Key: MESOS-4621 > URL: https://issues.apache.org/jira/browse/MESOS-4621 > Project: Mesos > Issue Type: Bug >Reporter: Till Toenshoff >Assignee: Yong Tang >Priority: Minor > > The toggle-logic of the build configuration argument {{optimize}} appears to > be implemented incorrectly. When using the perfectly legal invocation; > {noformat} > ../configure --disable-optimize > {noformat} > What you get here is enabled optimizing {{O2}}. > {noformat} > ccache g++ -Qunused-arguments -fcolor-diagnostics > -DPACKAGE_NAME=\"libprocess\" -DPACKAGE_TARNAME=\"libprocess\" > -DPACKAGE_VERSION=\"0.0.1\" -DPACKAGE_STRING=\"libprocess\ 0.0.1\" > -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"libprocess\" > -DVERSION=\"0.0.1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 > -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 > -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 > -DLT_OBJDIR=\".libs/\" -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 > -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 > -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBCURL=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 > -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBDL=1 -I. 
> -I../../../../3rdparty/libprocess/3rdparty > -I../../../../3rdparty/libprocess/3rdparty/stout/include -Iprotobuf-2.5.0/src > -Igmock-1.7.0/gtest/include -Igmock-1.7.0/include -isystem boost-1.53.0 > -Ipicojson-1.3.0 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -Iglog-0.3.3/src > -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include > -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 > -I/usr/include/apr-1.0 -O2 -Wno-unused-local-typedef -std=c++11 > -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT > stout_tests-flags_tests.o -MD -MP -MF .deps/stout_tests-flags_tests.Tpo -c -o > stout_tests-flags_tests.o `test -f 'stout/tests/flags_tests.cpp' || echo > '../../../../3rdparty/libprocess/3rdparty/'`stout/tests/flags_tests.cpp > {noformat} > It would be more straightforward for this argument to actually disable > optimization.
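The bug pattern here, a disable flag that nonetheless enables the feature, can be modeled outside autoconf. A hypothetical Python sketch of the intended toggle (the function name and default are illustrative, not the actual configure.ac logic):

```python
def optimize_flags(args):
    """Model of the intended --enable-optimize/--disable-optimize toggle:
    emit -O2 only when optimization has not been disabled. Illustrative
    sketch; the real logic lives in Mesos' configure.ac m4 macros."""
    enabled = True  # assumed default; last flag on the command line wins
    for arg in args:
        if arg == "--enable-optimize":
            enabled = True
        elif arg == "--disable-optimize":
            enabled = False
    return ["-O2"] if enabled else ["-O0"]

# The reported bug is that Mesos' configure produced -O2 in this case:
assert optimize_flags(["--disable-optimize"]) == ["-O0"]
assert optimize_flags([]) == ["-O2"]
```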
[jira] [Updated] (MESOS-5167) Add tests for `network/cni` isolator
[ https://issues.apache.org/jira/browse/MESOS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Zhang updated MESOS-5167: -- Shepherd: Jie Yu > Add tests for `network/cni` isolator > > > Key: MESOS-5167 > URL: https://issues.apache.org/jira/browse/MESOS-5167 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Qian Zhang >Assignee: Qian Zhang > > We need to add tests to verify the functionality of the `network/cni` isolator.
[jira] [Created] (MESOS-5167) Add tests for `network/cni` isolator
Qian Zhang created MESOS-5167: - Summary: Add tests for `network/cni` isolator Key: MESOS-5167 URL: https://issues.apache.org/jira/browse/MESOS-5167 Project: Mesos Issue Type: Task Components: test Reporter: Qian Zhang Assignee: Qian Zhang We need to add tests to verify the functionality of the `network/cni` isolator.
[jira] [Created] (MESOS-5166) ExamplesTest.DynamicReservationFramework is slow
Benjamin Bannier created MESOS-5166: --- Summary: ExamplesTest.DynamicReservationFramework is slow Key: MESOS-5166 URL: https://issues.apache.org/jira/browse/MESOS-5166 Project: Mesos Issue Type: Bug Components: test Reporter: Benjamin Bannier For an unoptimized build under OS X {{ExamplesTest.DynamicReservationFramework}} currently takes more than 13 seconds on my machine.
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234997#comment-15234997 ] Deshi Xiao commented on MESOS-1739: --- Logging yujie's comment here: {quote} This is a high-level question: I am not sure whether adding attributes is safe. For instance, my framework has the following rule: only schedule tasks to agents that do not have the attribute "not_safe". Now, say agent A initially lacks that attribute. My framework lands several tasks on that agent. Later, when the agent restarts, the operator adds the new attribute "not_safe". Suddenly, I have tasks running on unsafe boxes. Oops. {quote} > Allow slave reconfiguration on restart > -- > > Key: MESOS-1739 > URL: https://issues.apache.org/jira/browse/MESOS-1739 > Project: Mesos > Issue Type: Epic >Reporter: Patrick Reilly > Labels: external-volumes, mesosphere, myriad > > Make it so that either via a slave restart or an out-of-process "reconfigure" > ping, the attributes and resources of a slave can be updated to be a superset > of what they used to be.
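The invariant in the quoted comment, never place tasks on agents carrying a given attribute, amounts to a placement-time filter on offers; the danger is that nothing re-evaluates tasks already running when an agent's attributes change on restart. A hypothetical sketch (names are illustrative, not Mesos API):

```python
def schedulable(agent_attributes, forbidden="not_safe"):
    """Placement-time check: accept an offer only if the agent does not
    carry the forbidden attribute. Illustrative sketch of the framework
    rule described in the comment; not actual Mesos scheduler code."""
    return forbidden not in agent_attributes

# At launch time, agent A looks safe, so tasks land on it:
assert schedulable({"hostname": "agent-a"})

# After a restart, the operator adds "not_safe". New offers are rejected,
# but tasks already running on agent A are never re-checked, which is the
# hazard raised against allowing attribute changes on restart:
assert not schedulable({"hostname": "agent-a", "not_safe": "true"})
```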
[jira] [Created] (MESOS-5165) Add tests to ensure the installed Python tools work
Benjamin Bannier created MESOS-5165: --- Summary: Add tests to ensure the installed Python tools work Key: MESOS-5165 URL: https://issues.apache.org/jira/browse/MESOS-5165 Project: Mesos Issue Type: Bug Components: python api, test Reporter: Benjamin Bannier We should check at least that the installed tools are complete, and probably also add some integration tests.
[jira] [Updated] (MESOS-5010) Installation of mesos python package is incomplete
[ https://issues.apache.org/jira/browse/MESOS-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-5010: -- Sprint: Mesosphere Sprint 32 Fix Version/s: 0.29.0 > Installation of mesos python package is incomplete > -- > > Key: MESOS-5010 > URL: https://issues.apache.org/jira/browse/MESOS-5010 > Project: Mesos > Issue Type: Bug > Components: python api >Affects Versions: 0.26.0, 0.28.0, 0.27.2, 0.29.0 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > Fix For: 0.29.0 > > > The installation of the mesos python package is incomplete, i.e., the files > {{cli.py}}, {{futures.py}}, and {{http.py}} are not installed. > {code} > % ../configure --enable-python > % make install DESTDIR=$PWD/D > % PYTHONPATH=$PWD/D/usr/local/lib/python2.7/site-packages:$PYTHONPATH python > -c 'from mesos import http' > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ImportError: cannot import name http > {code} > This appears to be first broken with {{d1d70b9}} (MESOS-3969, [Upgraded > bundled pip to 7.1.2.|https://reviews.apache.org/r/40630]). Bisecting in > {{pip}}-land shows that our install becomes broken for {{pip-6.0.1}} and > later (we are using {{pip-7.1.2}}).
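A completeness check of the kind MESOS-5165 asks for can be as simple as attempting to import each expected submodule and reporting the failures. A generic sketch (the helper name is made up; the `mesos.*` module list comes from this ticket):

```python
import importlib

def missing_modules(names):
    """Return the subset of dotted module names that fail to import.
    Hypothetical helper for an install-completeness test."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# For the broken install in this ticket, one would expect
#   missing_modules(["mesos.cli", "mesos.futures", "mesos.http"])
# to report all three. Demonstrated here against stdlib names:
assert missing_modules(["json", "no_such_module_xyz"]) == ["no_such_module_xyz"]
```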